Channel: Hortonworks » All Topics

SparkPi example fails with FileNotFoundException

Replies: 0

I am able to run the wordcount example. However, when I tried SparkPi, there is an exception in the log. Part of the log is as follows:

14/06/05 05:45:12 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/,null}
14/06/05 05:45:12 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:56831
14/06/05 05:45:12 INFO ui.SparkUI: Started Spark Web UI at http://sandbox.hortonworks.com:56831
14/06/05 05:45:12 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: spark-examples_2.10-0.9.1.2.1.1.0-22.jar (No such file or directory)), was the --addJars option used?
14/06/05 05:45:12 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
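For reference, the error suggests the examples jar was given as a bare file name and resolved against the working directory. On Spark 0.9.x there is no spark-submit yet; YARN jobs go through the yarn.Client, and passing the jar by absolute path is the usual fix. A minimal sketch, assuming HDP-style install paths (verify the exact jar names under /usr/lib/spark):

# paths below are assumptions for an HDP 2.1 sandbox; check with `ls /usr/lib/spark`
export SPARK_JAR=/usr/lib/spark/assembly/lib/spark-assembly_2.10-0.9.1.2.1.1.0-22-hadoop2.4.0.jar
/usr/lib/spark/bin/spark-class org.apache.spark.deploy.yarn.Client \
  --jar /usr/lib/spark/examples/lib/spark-examples_2.10-0.9.1.2.1.1.0-22.jar \
  --class org.apache.spark.examples.SparkPi \
  --args yarn-standalone \
  --num-workers 2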


Hue 2.3 – REST Exception in Pig queries

Replies: 0

Hi,

I have a problem with Hue 2.3.1-385 when issuing any Pig query from the web interface.

RestException at /pig/start_job/
{"error":"Can not create a Path from a null string"} (error 500)

Exception Location: /usr/lib/hue/desktop/core/src/desktop/lib/rest/http_client.py in execute, line 176
Python Executable: /usr/bin/python2.6

Pig runs fine from the shell so it’s evidently an HTTP problem.

Here’s an excerpt from the traceback:

Environment:

Request Method: POST
Request URL: https://unece-sandbox.ichec.ie:8000/pig/start_job/
Django Version: 1.2.3
Python Version: 2.6.6
Installed Applications:
[...]
Installed Middleware:
[...]

Traceback:
File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/core/handlers/base.py" in get_response
100. response = callback(request, *callback_args, **callback_kwargs)
File "/usr/lib/hue/apps/pig/src/pig/views.py" in start_job
189. job = t.pig_query(pig_file=script_file, execute=execute, statusdir=statusdir, callback=callback, arg=args)
File "/usr/lib/hue/apps/pig/src/pig/templeton.py" in pig_query
99. return self.post("pig", data)
File "/usr/lib/hue/apps/pig/src/pig/templeton.py" in post
39. response = self.client.execute("POST", url, params=params, data=data)
File "/usr/lib/hue/desktop/core/src/desktop/lib/rest/http_client.py" in execute
176. raise self._exc_class(ex)

Exception Type: RestException at /pig/start_job/
Exception Value: {"error":"Can not create a Path from a null string"} (error 500)
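Since http_client.py line 176 is just surfacing a WebHCat (Templeton) response, it can help to take Hue out of the loop and POST a trivial Pig job straight at WebHCat; "Can not create a Path from a null string" typically means a path-valued parameter (the script file or statusdir) never reached the server. A hedged sketch, with host, port, and user as assumptions:

# WebHCat normally listens on 50111; substitute your server and user
curl -s -d user.name=hue \
     -d execute="fs -ls /" \
     -d statusdir=/tmp/pig.test.out \
     http://localhost:50111/templeton/v1/pig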

Thank you for your help!

Why does Hadoop use LongWritable or IntWritable by default?

Replies: 1

Why does Hadoop use LongWritable or IntWritable by default? Why didn't the Hadoop framework use some other class for writing?

Files are already split into blocks when copying from local to HDFS

Replies: 1

Files are already split into blocks across the data nodes when a file is copied from local to HDFS, so what is the use of input splits in the MapReduce framework?
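The short answer: blocks are the physical unit of storage, while input splits are a logical slicing computed at job-submission time to decide how many map tasks run and where; nothing on disk is re-split. A worked example as a sketch (jar and driver names are hypothetical; the property name is the Hadoop 1.x one):

# A 1 GB file with 128 MB blocks occupies 8 physical blocks. By default
# FileInputFormat creates one split per block (8 map tasks), but the split
# size can be lowered without touching the stored blocks:
hadoop jar my-job.jar MyDriver \
  -D mapred.max.split.size=67108864 \
  input/ output/
# a 64 MB max split size yields 16 logical splits (16 map tasks) over the same 8 blocks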

Cannot run M/R job in standalone mode

Replies: 0

Hello,

I am facing issues with running an M/R job in standalone mode on Hadoop 2.2.0 (HDP 2.0). I am getting the following error:
14/06/04 13:55:04 INFO mapreduce.Job: The url to track the job: http://hadooptools:8088/proxy/application_1401791587704_0007/
14/06/04 13:55:04 INFO mapreduce.Job: Running job: job_1401791587704_0007
14/06/04 13:55:08 INFO mapreduce.Job: Job job_1401791587704_0007 running in uber mode : false
14/06/04 13:55:08 INFO mapreduce.Job: map 0% reduce 0%
14/06/04 13:55:08 INFO mapreduce.Job: Job job_1401791587704_0007 failed with state FAILED due to: Application application_1401791587704_0007 failed 2 times due to AM Container for appattempt_1401791587704_0007_000002 exited with exitCode: -1000 due to: File file:/user/hdfs/.staging/job_1401791587704_0007/job.jar does not exist
.Failing this attempt.. Failing the application.
14/06/04 13:55:08 INFO mapreduce.Job: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at com.ncr.bigdata.mr.MaxTemperatureDriver.run(MaxTemperatureDriver.java:46)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.ncr.bigdata.mr.MaxTemperatureDriver.main(MaxTemperatureDriver.java:54)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

BUT

bash-4.1$ ls -al /user/hdfs/.staging/job_1401791587704_0007/
total 108
drwx------. 2 hdfs hadoop 4096 Jun 4 13:55 .
drwx------. 6 hdfs hadoop 4096 Jun 4 13:55 ..
-rw-r--r--. 1 hdfs hadoop 7767 Jun 4 13:55 job.jar
-rw-r--r--. 1 hdfs hadoop 72 Jun 4 13:55 .job.jar.crc
-rw-r--r--. 1 hdfs hadoop 157 Jun 4 13:55 job.split
-rw-r--r--. 1 hdfs hadoop 12 Jun 4 13:55 .job.split.crc
-rw-r--r--. 1 hdfs hadoop 42 Jun 4 13:55 job.splitmetainfo
-rw-r--r--. 1 hdfs hadoop 12 Jun 4 13:55 .job.splitmetainfo.crc
-rw-r--r--. 1 hdfs hadoop 67865 Jun 4 13:55 job.xml
-rw-r--r--. 1 hdfs hadoop 540 Jun 4 13:55 .job.xml.crc
bash-4.1$ whoami
hdfs

Has anybody experienced the same, or does anyone have a clue what might be wrong?

<property>
<name>fs.default.name</name>
<value>file:///</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>local</value>
</property>
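Worth noting: the job plainly reached YARN (see the application_* ID and tracking URL above), so the cluster's configuration is still winning over this snippet. On Hadoop 2.x the local runner is selected by mapreduce.framework.name, not mapred.job.tracker. A minimal sketch that forces it per run, possible here because the driver already goes through ToolRunner (jar name and paths are assumptions):

hadoop jar max-temperature.jar com.ncr.bigdata.mr.MaxTemperatureDriver \
  -D mapreduce.framework.name=local \
  -D fs.defaultFS=file:/// \
  input/ncdc output/max-temp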

Unable to Create Table in HCatalog

Replies: 1

Getting this error

HCatClient error on create table: {"statement":"use default; create table nyse_stocks(exchange string, stock_symbol string, date string, stock_price_open double, stock_price_high double, stock_price_low double, stock_price_close double, stock_volume bigint, stock_price_adj_close double) row format delimited fields terminated by '\\t';","error":"unable to create table: nyse_stocks","exec":{"stdout":"","stderr":"which: no /usr/lib/hadoop/bin/hadoop in ((null))\ndirname: missing operand\nTry `dirname --help' for more information.\n Command was terminated due to timeout(60000ms). See templeton.exec.timeout property","exitcode":143}} (error 500)
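The stderr is the telling part: WebHCat's shelled-out command could not find the hadoop binary ("which: no /usr/lib/hadoop/bin/hadoop") and was then killed by templeton.exec.timeout, so the DDL itself is probably fine. One way to confirm is to run the same statement straight through the Hive CLI, bypassing WebHCat; a sketch:

hive -e "use default; create table nyse_stocks(
  exchange string, stock_symbol string, date string,
  stock_price_open double, stock_price_high double, stock_price_low double,
  stock_price_close double, stock_volume bigint, stock_price_adj_close double)
  row format delimited fields terminated by '\t';"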

Can anyone help? I am totally new to this and just following the tutorials.

Thanks & Regards
Puneet

.pigbootup Problem in labs

Replies: 0

I'm trying to run through the Pig labs in the Sandbox using the EXACT code and data set provided by Hortonworks. The syntax check passes, but after a minute I get a job failure that seems to point to the fact that Pig cannot find the file ".pigbootup". I telnetted into the sandbox and did a sudo find for that file, and sure enough it's nowhere on the Sandbox image. If I launch grunt from the command line, I get the same reference to that file missing, but grunt does come up.
Anybody have a workaround for this? It would be GREAT to load files into HCat and run Pig scripts against them. Grunt is clunky and lame. I can send more info if anybody wants.
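For anyone hitting the same thing, two hedged workarounds, on the assumption that Pig merely looks for an optional bootup file in the submitting user's home directory:

# create the (empty) file Pig expects
touch ~/.pigbootup
# or point Pig at an explicit bootup file via its property (script name is hypothetical)
pig -Dpig.load.default.statements=/dev/null myscript.pig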

Parameters of a logistic regression model

Replies: 0

Hello,

I managed to run Hadoop in a VirtualBox VM, install Spark, install Python 2.7, and run IPython.

Finally I could run the Python examples of LogisticRegressionWithSGD in IPython :-D yeah!

Now I would like to check the model by looking at its parameters…

How can I display the model parameters after doing

model = LogisticRegressionWithSGD.train(parsedData)
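A minimal sketch, assuming the PySpark MLlib API of this era: the returned LogisticRegressionModel exposes its parameters as attributes.

print(model.weights)    # coefficient vector, one entry per feature
print(model.intercept)  # intercept term (0.0 unless trained with intercept=True)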

Thanks in advance for your reply.


HBase REST or Thrift in Sandbox

Replies: 3

Please help me locate the HBase REST or Thrift connection points in the Sandbox.
Sorry if this is obvious, but I did not find them.
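They are not necessarily running out of the box; a sketch for starting both from the HBase install, assuming the stock defaults (REST API on hbase.rest.port 8080, Thrift RPC on 9090):

/usr/lib/hbase/bin/hbase-daemon.sh start rest    # then try http://<sandbox-ip>:8080/version
/usr/lib/hbase/bin/hbase-daemon.sh start thrift  # Thrift RPC on port 9090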

Problem accessing HBase REST endpoint

Replies: 0

Hi,
I'm trying to access the HBase REST endpoint on the Hortonworks Sandbox (running on Hyper-V on Windows 8).
I can hit http://192.168.56.101:8085/rest.jsp and see the RESTServer home page

However, if I try to hit /version (which I should be able to), I see this:

I figured restarting the REST endpoint would be handy, so I issued "/usr/lib/hbase/bin/hbase-daemon.sh stop rest" and was told:
"no rest to stop because kill -0 of pid 19934 failed with status 1"
Not being a Linux guy myself, I didn't really know what to do with that. Working on the assumption that it wasn't a problem, I issued "/usr/lib/hbase/bin/hbase-daemon.sh start rest". If I look in the log I see this:
(I’ve highlighted what I think are the pertinent parts)

I still can't hit the /version endpoint though; this time I tried with curl:

So at this point I’m rather nonplussed. Can anyone tell me what the problem is here? Hopefully I’ve provided enough info so that someone can diagnose it.
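One thing worth ruling out first: rest.jsp on 8085 is served by the REST info (web UI) server, while API resources such as /version live on the service port, hbase.rest.port, which defaults to 8080. A hedged check:

# the info UI and the REST API are separate listeners; /version belongs to the latter
curl -v http://192.168.56.101:8080/version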

Thanks in advance
JT

HBase not starting up

Replies: 0

Hi all,

This week my HDFS ran out of space, so I changed the HDFS location from /var/Hadoop to my new partition /Hadoop/Hadoop/…..

I copied the data over and all services were running fine again.
Except HBase is not starting anymore…

When I start HBase I get no errors; after 5 seconds the ‘HBase master’ gets replaced by ‘Standby HBase master’, and 10 seconds later the ‘HBase master’ goes offline.

How can I solve this?

I get the following errors in my log file:

2014-06-06 16:03:03,740 INFO [RpcServer.listener,port=60000] ipc.RpcServer: RpcServer.listener,port=60000: stopping
2014-06-06 16:03:03,743 INFO [master:server01:60000] master.HMaster: Stopping infoServer
2014-06-06 16:03:03,743 INFO [master:server01:60000.archivedHFileCleaner] cleaner.HFileCleaner: master:server01:60000.archivedHFileCleaner exiting
2014-06-06 16:03:03,744 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2014-06-06 16:03:03,745 INFO [master:server01:60000.oldLogCleaner] cleaner.LogCleaner: master:server01:60000.oldLogCleaner exiting
2014-06-06 16:03:03,745 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2014-06-06 16:03:03,745 INFO [master:server01:60000.oldLogCleaner] master.ReplicationLogCleaner: Stopping replicationLogCleaner-0x1467147ae200012, quorum=server03.rdo01.local:2181,server01.rdo01.local:2181,server02.rdo01.local:2181, baseZNode=/hbase-unsecure
2014-06-06 16:03:03,746 INFO [master:server01:60000] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
2014-06-06 16:03:03,748 INFO [master:server01:60000.oldLogCleaner] zookeeper.ZooKeeper: Session: 0x1467147ae200012 closed
2014-06-06 16:03:03,748 INFO [master:server01:60000-EventThread] zookeeper.ClientCnxn: EventThread shut down
2014-06-06 16:03:03,854 DEBUG [master:server01:60000] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@23d72e0d
2014-06-06 16:03:03,854 INFO [master:server01:60000] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1467147ae200011
2014-06-06 16:03:03,856 INFO [master:server01:60000] zookeeper.ZooKeeper: Session: 0x1467147ae200011 closed
2014-06-06 16:03:03,856 INFO [master:server01:60000-EventThread] zookeeper.ClientCnxn: EventThread shut down
2014-06-06 16:03:03,856 INFO [server01.rdo01.local,60000,1402063374987.splitLogManagerTimeoutMonitor] master.SplitLogManager$TimeoutMonitor: server01.rdo01.local,60000,1402063374987.splitLogManagerTimeoutMonitor exiting
2014-06-06 16:03:03,858 INFO [master:server01:60000] zookeeper.ZooKeeper: Session: 0x346714989710014 closed
2014-06-06 16:03:03,858 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2014-06-06 16:03:03,858 INFO [master:server01:60000] master.HMaster: HMaster main thread exiting
2014-06-06 16:03:03,859 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: HMaster Aborted
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:192)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2889)
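Since the abort followed the HDFS relocation, a sensible first check is whether hbase.rootdir still points at a location that exists and is owned by hbase. A sketch (the config path and rootdir are assumptions for an HDP-style layout):

# confirm where HBase thinks its data lives
grep -A1 'hbase.rootdir' /etc/hbase/conf/hbase-site.xml
# confirm that location actually exists in HDFS
sudo -u hdfs hadoop fs -ls /apps/hbase/data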

Greetings,

Merlijn

Welcome to the new Hortonworks Sandbox!

Replies: 18

I'm excited to announce the availability of the new Hortonworks Sandbox. This forum is the destination for any issue or question you may have on the use of the Sandbox. I'd love to hear your feedback on the Sandbox: the tutorials, the functionality, or any other feature. Feel free to post your comments here.

For more information about the Sandbox, check out this blog http://hortonworks.com/blog/hortonworks-sandbox-the-fastest-on-ramp-to-apache-hadoop/.

Incorrect Log Path

Replies: 1

I've noticed the following when starting MapReduce manually in my HDP 1.3.3 environment:

--> starting MapReduce
+ sudo -u mapred -H bash -c '/cloud/hadoop/bin/hadoop-daemon.sh --config /cloud/hadoop/conf start jobtracker'
starting jobtracker, logging to /var/log/hadoop/mapred/hadoop-mapred-jobtracker-integrate.out
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /var/log/hadoop-mapreduce/mapred/hadoop-mapreduce.jobsummary.log (No such file or directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileOutputStream.<init>(FileOutputStream.java:142)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:290)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:164)
at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:216)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:257)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:133)

So, it looks like /var/log/hadoop-mapreduce/ doesn't exist. Is this a bug or something I'm missing?
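If the directory simply was never created, recreating it with the expected owner is a plausible fix; a sketch (the owner and group are assumptions for an HDP 1.x layout):

sudo mkdir -p /var/log/hadoop-mapreduce/mapred
sudo chown -R mapred:hadoop /var/log/hadoop-mapreduce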

Thanks,

BeeswaxException when creating table from the Twitter API

Replies: 6

I followed tutorial 13 on Windows 7:

http://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-sentiment-data/

When I went to step 4 to browse the data in tweets_raw, I received the following:
Could not read table
BeeswaxException(handle=QueryHandle(log_context='ae18ae74-518f-400b-b4b0-d399ed78e194', id='ae18ae74-518f-400b-b4b0-d399ed78e194'), log_context='ae18ae74-518f-400b-b4b0-d399ed78e194', SQLState=' ', _message=None, errorCode=0)

And when I click on "tweets_raw", I received:
Error getting table description

Traceback (most recent call last): File "/usr/lib/hue/apps/hcatalog/src/hcatalog/views.py", line 145, in describe_table table_desc_extended = HCatClient(request.user.username).describe_table_extended(table, db=database) File "/usr/lib/hue/apps/hcatalog/src/hcatalog/hcat_client.py", line 143, in describe_table_extended raise Exception(error) Exception: Could not get table description (extended): {"errorDetail":"java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe org.openx.data.jsonserde.JsonSerDe does not exist)\n\tat org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:274)\n\tat org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:265)\n\tat org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:597)\n\tat org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:170)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:991)\n\tat org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:924)\n\tat org.apache.hadoop.hive.ql.exec.DDLTask.showTableStatus(DDLTask.java:2689)\n\tat org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:343)\n\tat org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)\n\tat org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)\n\tat
……
Other tables could not be created either.
What can I do to create a table?
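The root cause in the traceback is that org.openx.data.jsonserde.JsonSerDe is not on Hive's classpath, so every table declared with that SerDe becomes unreadable. A hedged sketch, where /path/to stands in for wherever the tutorial's SerDe jar was downloaded:

# register the SerDe for one session and retry the read
hive -e "ADD JAR /path/to/json-serde-with-dependencies.jar; SELECT * FROM tweets_raw LIMIT 5;"
# or install it permanently so Beeswax/HCatalog pick it up too
sudo cp /path/to/json-serde-with-dependencies.jar /usr/lib/hive/lib/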

NameNode as datanode (error)

Replies: 0

I've got a situation where the NameNode is being used as a datanode and mappers are being run there. Not good. For some reason, HoClient.getActiveTrackerNames() is returning the NameNode as well as all the datanodes. What configuration mess-up could cause this?

I'm trying to get Sqoop to run my mappers, exactly one per datanode. I've written my InputFormat class to create splits with exactly one hostname (datanodes only) per split, yet my splits are being ignored. Why is the JobTracker ignoring them? The host names I'm submitting are exactly as getActiveTrackerNames() returns, sans the :nnnnn, as suggested in getActiveServersList() by Boris Lublinsky and Mike Segel. BTW, this system works in that other Hadoop environment. This is an Amazon-hosted system.
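A first hedged check is whether worker daemons are running on the NameNode host at all; if so, stopping them and pruning the host from the slaves file keeps them from coming back (paths assume an HDP 1.x layout):

# on the namenode host
sudo -u mapred /usr/lib/hadoop/bin/hadoop-daemon.sh stop tasktracker
sudo -u hdfs   /usr/lib/hadoop/bin/hadoop-daemon.sh stop datanode
# then remove this host from conf/slaves so a cluster-wide start leaves it out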


Defining required resource node affinity

Replies: 0

Hi,
I asked Arun during Hadoop Summit; he says there should be a way to define node affinity for app containers.
In our case, I'd like to have anti-affinity over racks (spread the containers as far apart as possible).
I looked for it in the resource spec file but couldn't find it:

http://slider.incubator.apache.org/slider_specs/resource_specification.html

It is not urgent, but it would be great to know whether it is possible.
Thanks,
Ofir


Error installing Services After Initial Installation

Replies: 0

Hi. We have a 4-node cluster: 3 datanodes + ZooKeeper, and 1 node dedicated to the NameNode.

We have installed:

When attempting to add Hive to this existing installation, it requires me to include HDFS, YARN/MR2, Tez, and ZooKeeper (despite these all already being installed). When arriving at the "Install, Start and Test" portion of the Add Service Wizard, I get the error:
"Attempted to create services which already exist; ,clusterName=GoBIg serviceNames=TEZ, ZOOKEEPER,MAPREDUCE2,HDFS, YARN"

As far as I can tell, there is no way to install Hive through Ambari after the initial installation has already been done.
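There is one route, though: Ambari's REST API can create a single service against an existing cluster, sidestepping the wizard. A sketch, with the cluster name taken from the error above and the Ambari host and credentials as assumptions:

curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  http://ambari-host:8080/api/v1/clusters/GoBIg/services/HIVE
# components and host assignments must then be added via the same API before installing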

Unable to start Hue on HDP-2.1.2

Tableau and HW's Sandbox

Replies: 17

Is it possible to connect to the Sandbox from Tableau? If yes, how?

Thanks
Bob
