Channel: Hortonworks » All Topics

HBase Issue | Google protobuf tag mismatch error while deserialising SCAN string


Replies: 0

Context: I am in the process of migrating my MR jobs on HBase from CDH 2.0.0-cdh4.5.0 (Hadoop1) to HDP 2.2.0.0-2041 (YARN).
After minor changes the code was compiled against HDP 2.2.0.0-2041.

Problem: I am trying to run an Oozie workflow that executes a series of MR jobs after creating a scan on HBase. The scan is
created programmatically and then serialised-deserialised before handing it to the mapper to fetch batches from HBase.
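
For reference, here is a minimal sketch of that round trip as the HDP 2.2 HBase (0.98) client serialises it, i.e. protobuf-encoded and Base64-wrapped; the ScanStringUtil class name is illustrative only, while ProtobufUtil, ClientProtos and Base64 are the HBase client utilities involved:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
import org.apache.hadoop.hbase.protobuf.generated.ClientProtos;
import org.apache.hadoop.hbase.util.Base64;

// Illustrative helper mirroring the protobuf-based encoding that the
// 0.96+/0.98 TableInputFormat expects for the scan string. A string
// produced by the older Writable-based (CDH4 / 0.94) encoding cannot be
// parsed this way and fails with an InvalidProtocolBufferException.
public final class ScanStringUtil {

  // Serialise a Scan to the Base64 string handed to TableInputFormat.
  public static String convertScanToString(Scan scan) throws IOException {
    ClientProtos.Scan proto = ProtobufUtil.toScan(scan);
    return Base64.encodeBytes(proto.toByteArray());
  }

  // Deserialise the Base64 string back into a Scan.
  public static Scan convertStringToScan(String base64) throws IOException {
    ClientProtos.Scan proto = ClientProtos.Scan.parseFrom(Base64.decode(base64));
    return ProtobufUtil.toScan(proto);
  }
}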

Issue: When TableInputFormat internally tries to deserialise the scan string, it throws an error indicating that, under
the hood, Google protobuf was not able to deserialise the string. The stack trace looks
as follows:

Exception in thread "main" java.io.IOException: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group
tag did not match expected tag.
at com.flipkart.yarn.test.TestScanSerialiseDeserialise.convertStringToScan(TestScanSerialiseDeserialise.java:37)
at com.flipkart.yarn.test.TestScanSerialiseDeserialise.main(TestScanSerialiseDeserialise.java:25)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
at com.google.protobuf.CodedInputStream.readGroup(CodedInputStream.java:241)
at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:488)
at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Scan.<init>(ClientProtos.java:13718)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Scan.<init>(ClientProtos.java:13676)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Scan$1.parsePartialFrom(ClientProtos.java:13868)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Scan$1.parsePartialFrom(ClientProtos.java:13863)
at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:141)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Scan.parseFrom(ClientProtos.java:14555)
at com.flipkart.yarn.test.TestScanSerialiseDeserialise.convertStringToScan(TestScanSerialiseDeserialise.java:35)
… 1 more

Reproducible: I am able to reproduce this in the sample code I have attached: https://drive.google.com/file/d/0B5-H2DFQJJZeNWllejlVSjRMbDA/view?usp=sharing

Possible causes: I suspect that I missed supplying some dependency or that there is a dependency mismatch in the
underlying jars.

I would appreciate any help in solving this.


Tez and Java heap space


Replies: 2

Hi,

When running Pig on Tez in HDP 2.2 (installed with RPMs) I get OutOfMemoryError: Java heap space.

What configurations should I fiddle with?

Running: 0 Failed: 1 Killed: 6208 FailedTaskAttempts: 61, diagnostics=Vertex failed, vertexName=scope-560, vertexId=vertex_1419952718679_0021_1_06, diagnostics=[Task failed, taskId=task_1419952718679_0021_1_06_000084, diagnostics=[TaskAttempt 0 failed, info=[Error: Fatal Error cause TezChild exit.:java.lang.OutOfMemoryError: Java heap space
at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.<init>(DefaultSorter.java:140)
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:114)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.initializeOutputs(PigProcessor.java:299)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:181)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

hue-plugin jar file


Replies: 0

I’ve downloaded HDP 2.2 for CentOS 6 (the Linux servers have no internet access) and installed the hue-plugins RPM as instructed, but there is no jar file in the RPM:

rpm -q --filesbypkg hue-plugins-2.6.1.2.2.4.2-2.el6.x86_64
hue-plugins /usr/lib/hadoop/lib

ls -al /usr/lib/hadoop/lib
total 584
drwxr-xr-x 2 root root 4096 Mar 31 22:53 .
drwxr-xr-x 3 root root 4096 May 19 14:17 ..
-rw-r--r-- 1 root root 29407 Apr 13 19:29 ambari-log4j-2.0.0.151.jar
-rw-r--r-- 1 root root 551290 Apr 13 19:29 postgresql-9.1-901-1.jdbc4.jar

I was expecting a file something like hue-plugins-*.jar somewhere on the linux server.

Any ideas? Thanks.

oozie is not allowed to impersonate


Replies: 3

Hello,
I am attempting to submit an Oozie job via my personal login credentials, and I get an error stating "JA002: User:oozie is not allowed to impersonate user 'firstname_lastname'".

I had previously modified the Custom core-site.xml in HDFS via Ambari to reflect the following two changes:
hadoop.proxyuser.oozie.groups = *
hadoop.proxyuser.ooozie.hosts = *

(I stopped the HDFS service and saved changes, and started service back up prior to testing)

Additionally under oozie configuration in Ambari, I modified the following advanced variable:
oozie.service.WorkflowAppService.system.libpath = /user/oozie/share/lib

Any ideas would be appreciated. (This is for Hadoop 2.0 on a 10 node cluster)

distcp can't talk to target cluster


Replies: 3

We’re looking at backup methods for our Hadoop clusters. We found out quickly that distcp will not work to copy data between our clusters because the NameNodes have been configured on networks that are private to each cluster. Therefore the DataNodes from our primary cluster cannot talk to the NameNodes on the target cluster.

However, our edge nodes for our two clusters have been configured so that they can, in fact, talk to both private networks. The network designers have placed nice 1G pipes on the edgenode hosts for the purpose of allowing the edge nodes to move data in and out of the clusters.

The network team is of the opinion that Falcon can extract data from the DataNodes in the cluster through the single edge node and transfer the data to the target cluster. My understanding is that Falcon is built on top of distcp and, if distcp can’t move the data, then Falcon can’t move the data.

Who is correct? Can Falcon funnel all the data in a transfer through a single edge node, as they believe? [I realize that, even if possible, it presents a bottleneck, but that’s a different issue right now; the question is whether it can be done.]

Falcon and scheduling


Replies: 1

Hi,

I am planning to use Falcon in our project to get source and target file names (for ETL) dynamically using the feed name.

I have a few questions. It would be very helpful if someone could guide me.

1. I have set up Oozie workflows which I am planning to run through Falcon jobs. All these Oozie workflows should run one after another. I know that I can schedule a job on a Falcon process, but how can I set a dependency between these jobs so that they run sequentially? I don’t want to schedule all my Falcon jobs.

2. Is there an option in the feed file to start a job once the feed is received for today? Once all my source files are received, I want to start the Falcon job. I don’t want to schedule it at a specific time.

3. How do I tag metadata (the schema of the file) in the feed file?

Falcon UI


Replies: 3

We have HDP 2.2 with Falcon 0.6.0 installed on our cluster. We are able to schedule and run Falcon processes, but I can’t find the Falcon dashboards. The Falcon UI on port 15000 shows nothing but three tabs with entity configurations. I would appreciate it if someone could point me to the Lineage and other Falcon dashboards. Thanks!

Ambari agent installation failing


Replies: 4

Hi,

I’m trying to register 9 hosts using Ambari for HDP 2.2. Out of 9, 8 got registered, but for one I’m getting the error below:

==========================
Creating target directory…
==========================

Command start time 2015-02-16 17:10:11

Warning: Permanently added 'l1039lab.sss.se.scania.com' (RSA) to the list of known hosts.
Connection to l1039lab.sss.se.scania.com closed.
SSH command execution finished
host=l1039lab.sss.se.scania.com, exitcode=0
Command end time 2015-02-16 17:10:11

==========================
Copying common functions script…
==========================

Command start time 2015-02-16 17:10:11

scp /usr/lib/python2.6/site-packages/ambari_commons
host=l1039lab.sss.se.scania.com, exitcode=0
Command end time 2015-02-16 17:10:12

==========================
Copying OS type check script…
==========================

Command start time 2015-02-16 17:10:12

scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py
host=l1039lab.sss.se.scania.com, exitcode=0
Command end time 2015-02-16 17:10:12

==========================
Running OS type check…
==========================

Command start time 2015-02-16 17:10:12
Cluster primary/cluster OS type is redhat6 and local/current OS type is redhat6

Connection to l1039lab.sss.se.scania.com closed.
SSH command execution finished
host=l1039lab.sss.se.scania.com, exitcode=0
Command end time 2015-02-16 17:10:13

==========================
Checking 'sudo' package on remote host…
==========================

Command start time 2015-02-16 17:10:13
sudo-1.8.6p3-12.el6.x86_64

Connection to l1039lab.sss.se.scania.com closed.
SSH command execution finished
host=l1039lab.sss.se.scania.com, exitcode=0
Command end time 2015-02-16 17:10:13

==========================
Copying repo file to 'tmp' folder…
==========================

Command start time 2015-02-16 17:10:13

scp /etc/yum.repos.d/ambari.repo
host=l1039lab.sss.se.scania.com, exitcode=0
Command end time 2015-02-16 17:10:14

==========================
Moving file to repo dir…
==========================

Command start time 2015-02-16 17:10:14

Connection to l1039lab.sss.se.scania.com closed.
SSH command execution finished
host=l1039lab.sss.se.scania.com, exitcode=0
Command end time 2015-02-16 17:10:14

==========================
Copying setup script file…
==========================

Command start time 2015-02-16 17:10:14

scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py
host=l1039lab.sss.se.scania.com, exitcode=0
Command end time 2015-02-16 17:10:15

==========================
Running setup agent script…
==========================

Command start time 2015-02-16 17:10:15
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
/bin/sh: /usr/sbin/ambari-agent: No such file or directory
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Restarting ambari-agent
Verifying Python version compatibility…
Using python /usr/bin/python2.6
ambari-agent is not running. No PID found at /var/run/ambari-agent/ambari-agent.pid
Verifying Python version compatibility…
Using python /usr/bin/python2.6
Checking for previously running Ambari Agent…
Starting ambari-agent
Verifying ambari-agent process status…
ERROR: ambari-agent start failed. For more details, see /var/log/ambari-agent/ambari-agent.out:
====================
from Controller import AGENT_AUTO_RESTART_EXIT_CODE
File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 35, in <module>
from Heartbeat import Heartbeat
File "/usr/lib/python2.6/site-packages/ambari_agent/Heartbeat.py", line 29, in <module>
from HostInfo import HostInfo
File "/usr/lib/python2.6/site-packages/ambari_agent/HostInfo.py", line 32, in <module>
from PackagesAnalyzer import PackagesAnalyzer
File "/usr/lib/python2.6/site-packages/ambari_agent/PackagesAnalyzer.py", line 28, in <module>
from ambari_commons import OSCheck, OSConst, Firewall
ImportError: No module named ambari_commons
====================
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log
tail: cannot open `/var/log/ambari-agent/ambari-agent.log' for reading: No such file or directory
tail: cannot open `/var/log/ambari-agent/ambari-agent.log' for reading: No such file or directory
tail: cannot open `/var/log/ambari-agent/ambari-agent.log' for reading: No such file or directory

Connection to l1039lab.sss.se.scania.com closed.
SSH command execution finished
host=l1039lab.sss.se.scania.com, exitcode=255
Command end time 2015-02-16 17:10:38

ERROR: Bootstrap of host l1039lab.sss.se.scania.com fails because previous action finished with non-zero exit code (255)
ERROR MESSAGE: tcgetattr: Invalid argument
Connection to l1039lab.sss.se.scania.com closed.

STDOUT: This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
/bin/sh: /usr/sbin/ambari-agent: No such file or directory
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Restarting ambari-agent
Verifying Python version compatibility…
Using python /usr/bin/python2.6
ambari-agent is not running. No PID found at /var/run/ambari-agent/ambari-agent.pid
Verifying Python version compatibility…
Using python /usr/bin/python2.6
Checking for previously running Ambari Agent…
Starting ambari-agent
Verifying ambari-agent process status…
ERROR: ambari-agent start failed. For more details, see /var/log/ambari-agent/ambari-agent.out:
====================
from Controller import AGENT_AUTO_RESTART_EXIT_CODE
File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 35, in <module>
from Heartbeat import Heartbeat
File "/usr/lib/python2.6/site-packages/ambari_agent/Heartbeat.py", line 29, in <module>
from HostInfo import HostInfo
File "/usr/lib/python2.6/site-packages/ambari_agent/HostInfo.py", line 32, in <module>
from PackagesAnalyzer import PackagesAnalyzer
File "/usr/lib/python2.6/site-packages/ambari_agent/PackagesAnalyzer.py", line 28, in <module>
from ambari_commons import OSCheck, OSConst, Firewall
ImportError: No module named ambari_commons
====================
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log
tail: cannot open `/var/log/ambari-agent/ambari-agent.log' for reading: No such file or directory
tail: cannot open `/var/log/ambari-agent/ambari-agent.log' for reading: No such file or directory
tail: cannot open `/var/log/ambari-agent/ambari-agent.log' for reading: No such file or directory

Connection to l1039lab.sss.se.scania.com closed.

Note:

This machine previously had ONLY an Ambari SERVER installation (i.e. no HDP etc.), and I followed the referenced link plus manual deletion of directories containing the word ‘ambari’.

What shall I do for a clean uninstall of Ambari to resolve the above issue so that the machine can be registered?

Thanks and regards!


NFS service via Ambari


Replies: 0

I have an HDP-2.2.4.2-2 9-node cluster running.

There are multiple users who want to upload their files from their Windows/Linux desktops onto HDFS, either via tools like WinSCP or by mapping HDFS as a network drive.

I think the ‘HDFS NFS Gateway’ is the way to go, but I couldn’t find a way to set it up via Ambari (I have done an automated installation of HDP via Ambari).
I came across a Hortonworks doc link for starting the NFS service, but should I proceed with these manual changes on a node, or can I achieve it via Ambari?

Ambari 2.0 showing wrong information


Replies: 0

Hi everyone,

I just had a big crash on my HDP cluster which forced me to reboot the master node (I’m running some tests on a 5-node cluster).
Everything seems to be running fine, but Ambari is showing that most services on the master node are down (which is not true).
For instance, Ambari says that the DataNode is not running on node 1, but it appears among the live DataNodes in the NameNode WebUI.

Is there a command I could run to flush any wrong information in Ambari? I restarted the ambari-agent on the master but nothing has changed.
Thank you for any hint.

Regards,
Orlando

Ambari UI stuck at "Loading…"


Replies: 4

Hi,

I am setting up a four-node cluster with Ambari on CentOS 6.0. I installed Ambari 1.5 on the master node, but at the step of opening the Ambari login page in my node’s browser, the Login and Password fields are not visible on the webpage. Instead, I just see the text “Loading…” on the screen.
Then I tried removing Ambari 1.5 and installed Ambari 1.4, but the issue still persists.

Can anyone tell me the reason for it and its solution?

Thanks,
Mitali

Oozie cannot see Teradata Jars


Replies: 1

ERROR sqoop.ConnFactory: Sqoop could not found specified connection manager class org.apache.sqoop.teradata.TeradataConnManager. Please check that you've specified the class correctly.
Stdoutput 15/05/05 16:29:50 ERROR tool.BaseSqoopTool: Got error creating database manager: java.io.IOException: java.lang.ClassNotFoundException: org.apache.sqoop.teradata.TeradataConnManager
Stdoutput at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:166)

The following command runs just fine from the command line but fails via Oozie. I cannot seem to get Oozie to see the jars (my guess). I have the job.properties file configured with oozie.use.system.libpath=true, oozie-site.xml has oozie.service.WorkflowAppService.system.libpath = /user/${user.name}/share/lib, and I am running the command as user oozie. I have added the jars to hdfs://nn/user/oozie/share/lib/sqoop but no luck. Help?

WORKFLOW.XML relevant snippet

<start to="SyncFDMMaster"/>
<action name="SyncFDMMaster">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>sqoop.connection.factories</name>
<value>org.apache.sqoop.teradata.TeradataConnManager</value>
</property>
</configuration>
<exec>/bin/bash</exec>
<argument>syncmaster.sh</argument>
<file>syncmaster.sh</file>

job.properties

nameNode=hdfs://xxxx:8020
jobTracker=xxxx:8050
queueName=workflows
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/oozie/jobs/l48_fdm_mastersync

resourcemanager upgrade issue


Replies: 0

When the ResourceManager in HDP 2.2 is upgraded, it continuously logs the messages below,
even though I cleaned up

yarn.nodemanager.recovery.dir
yarn.resourcemanager.zk-state-store.parent-path

The cluster is otherwise running fine.

2015-05-26 12:00:00,000 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 11.145.19.88:35500:null (DIGEST-MD5: IO error acquiring password)

2015-05-26 12:00:00,002 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8030: readAndProcess from client 11.145.19.88 threw exception [org.apache.hadoop.security.token.SecretManager$InvalidToken: appattempt_1431936504871_0100_000001 not found in AMRMTokenSecretManager.]

2015-05-26 12:00:00,002 INFO org.apache.hadoop.ipc.Server: IPC Server handler 51 on 8021, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 11.145.19.73:57928 Call#15200325 Retry#0

org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1432630400522_0001' doesn't exist in RM.

at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:324)

at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:170)

at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:401)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

Spark 1.3 Upgrade on HDP-2.2.4.2-2


Replies: 6

I have successfully installed HDP-2.2.4.2-2 and Spark 1.2, and now I have a requirement to test Spark 1.3.1 integration with YARN and also to integrate with Ambari/ZooKeeper.

Please advise whether this is a viable option.

Thanks a lot in advance.

Not able to open Hortonworks Sandbox IP in browser


Replies: 0

When I open Sandbox 2.2.4 through VMware Player 7.0, it shows me the address to open in a browser, but when I try to open it, the page is not accessible. I am also not even able to ping the IP mentioned in the VM. It asks me to open 192.168.40.128.


Bad substitution error with Spark on HDP 2.2


Replies: 3

Hi,

I’m facing an issue running Spark on YARN.

YARN is installed through Ambari, HDP v2.2.0.0-2041, with Spark 1.2.
After submitting a Spark job through YARN, I get the following error message:
Stack trace: ExitCodeException exitCode=1: /hdp/hadoop/yarn/local/usercache/leads_user/appcache/application_1420759015115_0012/container_1420759015115_0012_02_000001/launch_container.sh: line 27: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:$PWD/__app__.jar:$PWD/*: bad substitution

at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

I’ve followed the instructions given in the technical preview.

I’ve set the mentioned configurations in the spark-defaults.conf file inside the conf folder. I’ve also checked using verbose logging, and it is picking up those parameters, but I’m still getting the same error.

In verbose mode it prints the following:

System properties:
spark.executor.memory -> 3G
SPARK_SUBMIT -> true
spark.executor.extraJavaOptions -> -Dhdp.version=2.2.0.0-2041 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/hdfs/heapDump/ -XX:+UseCompressedOops
spark.app.name -> com.xxx.xxx.xxxxxx
spark.driver.extraJavaOptions -> -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions -> -Dhdp.version=2.2.0.0-2041
spark.master -> yarn-cluster

Any idea what the problem could be here?

How to deploy Hive queries in different environments


Replies: 0

What is the deployment strategy for Hive-related scripts? For SQL we have DACPAC; is there any such component for Hive?

Is there any API to get the status of a job submitted through ODBC?

HBase Hive Integration


Replies: 1

I have created an HBase table with the number of versions set to 5:

create 'tablename',{NAME => 'cf', VERSIONS => 5}
and inserted two rows (row1 and row2):

put 'tablename','row1','cf:id','row1id'
put 'tablename','row1','cf:name','row1name'
put 'tablename','row2','cf:id','row2id'
put 'tablename','row2','cf:name','row2name'
put 'tablename','row2','cf:name','row2nameupdate'
put 'tablename','row2','cf:name','row2nameupdateagain'

I tried to select the data using scan and I get the latest updated data,
and when I tried to select the different versions of the data using the command below, I got the different versions:

scan 'tablename',{RAW => true, VERSIONS => 5}
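
For comparison, here is a minimal sketch of the same multi-version read through the HBase 0.98 Java client API (the table, row, and column names are the ones from the shell commands above; this goes through the HBase client directly, not Hive):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: fetch up to 5 stored versions of cf:name for row2 via the
// HBase client, analogous to what VERSIONS => 5 shows in the shell.
public class VersionedRead {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "tablename");
    try {
      Get get = new Get(Bytes.toBytes("row2"));
      get.setMaxVersions(5);  // ask for up to 5 versions, not just the latest
      Result result = table.get(get);
      for (Cell cell : result.getColumnCells(Bytes.toBytes("cf"), Bytes.toBytes("name"))) {
        System.out.println(cell.getTimestamp() + " -> "
            + Bytes.toString(CellUtil.cloneValue(cell)));
      }
    } finally {
      table.close();
    }
  }
}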

Now I created a Hive external table to point to this HBase table:

CREATE EXTERNAL TABLE hive_timestampupdate(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
TBLPROPERTIES ("hbase.table.name" = "tablename");

select * from hive_timestampupdate

When I query the table hive_timestampupdate, I’m able to see the data.

By default I’m getting the latest updated data based on the timestamp.
Here, too, I want to query the data of the different versions.

**I’m looking for a Hive command that will fetch the different versions of the HBase data.**

Any help please.

Thanks in Advance.

Cannot login to Ambari


Replies: 3

I am new to Hadoop and started learning it recently. I installed the Hortonworks Sandbox today. I was able to log in to Hue using 127.0.0.1:8000, but not to Ambari on 127.0.0.1:8080. It asks for a username and password, for which I provided admin/admin, but it keeps asking me to enter the username/password. Could you please help me?

Unable to login to Ambari on Sandbox via port 8080


Replies: 4

I downloaded the HDP 2.2.4 sandbox and tried to connect to Ambari on port 8080. It does not work. Is there anything else I need to do to get it to work? I am following this tutorial: http://hortonworks.com/hadoop-tutorial/simulating-transporting-realtime-events-stream-apache-kafka/. According to step 1 of the tutorial, we should be able to run the VM and go to http://127.0.0.1:8080. I know by now that we actually need to go to the address that is provided by the sandbox when we run it (such as http://192.168.115.129). I am able to get to the sandbox via port 8000, but not to Ambari on port 8080.
