Channel: Hortonworks » All Topics

Sqoop import from Teradata to Hive


Replies: 0

I am trying to connect to a Teradata DB to import its data into Hive. So initially I tried to test the connection by just listing the tables in the Teradata DB, using the Teradata connector for Hadoop (the Hortonworks one, not Cloudera's).

connector: teradata-connector-1.3.3.jar
HDP 2.2
Sqoop version: 1.4.5

Sqoop command: sqoop list-tables --connect jdbc:teradata://<TD_DB_hostname>/Database=c_test --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username xxxx --password xxxxx

Error:
15/03/13 18:00:20 ERROR sqoop.ConnFactory: Sqoop could not found specified connection manager class org.apache.sqoop.teradata.TeradataConnManager. Please check that you’ve specified the class correctly.
15/03/13 18:00:20 ERROR tool.BaseSqoopTool: Got error creating database manager: java.io.IOException: java.lang.ClassNotFoundException: org.apache.sqoop.teradata.TeradataConnManager

I have extracted the jar and cannot see this class in it (see the check sketched below). Has anyone else faced this problem using Hortonworks, and do you know how to solve it?
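For reference, a quick way to confirm which jar (if any) actually contains the class is to list the jar contents; a sketch (the search paths are assumptions from my setup):

jar tf teradata-connector-1.3.3.jar | grep -i ConnManager
# search any other locally installed Teradata/Sqoop connector jars too
find /usr/lib/sqoop /usr/hdp -name '*teradata*.jar' 2>/dev/null \
  | xargs -I{} sh -c 'echo {}; jar tf {} | grep -i ConnManager'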


Tutorial: Real-Time Data Ingestion in HBase & Hive using Storm Bolt


Replies: 6

Hi there,
Is anybody experiencing problems with this tutorial?
I submitted the topology; all the services for Storm and Kafka are started.
I then issued the command to start the 'TruckEventsProducer' Kafka producer, and I can see events being produced and logs sent to the screen.
But the data is not being persisted: the Kafka spout is not emitting anything (when I view the Storm UI, the KafkaSpout emitted counter is not updating... it stays at 0). When I check the log files for the worker task for the TruckEventProcessor in /var/log/storm...
I see the following:

13:01:45 b.s.d.worker [INFO] Launching worker for truck-event-processor-1-1422017673 on 8c75249c-e8e9-4d31-9908-579f25c4fb88:6701 with id 48ed2969-334c-479c-9b03-2a31053fa65c
13:01:45 b.s.d.worker [ERROR] Error on initialization of server mk-worker
java.io.IOException: No such file or directory

I've tried resubmitting this topology several times, and I always get the error.
I also made sure I cleaned out storm.local.dir (/hadoop/storm) before each run; the reset steps I use are sketched below.
I was able to get everything working in "Ingesting and processing Realtime events with Apache Storm":
the topology submitted for tutorial2 was fine, but the one submitted for this exercise, tutorial3 (storm jar target/Tutorial-1.0-SNAPSHOT.jar com.hortonworks.tutorials.tutorial3.TruckEventProcessingTopology), doesn't seem to process.
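For completeness, the reset steps I run between attempts are roughly these (the "storm" user/group and the cleanup commands are assumptions from my setup, not from the tutorial):

storm kill truck-event-processor
sleep 30                            # wait for the topology to be removed
rm -rf /hadoop/storm/*              # clear storm.local.dir on each node
chown -R storm:storm /hadoop/storm  # make sure the storm user owns the dir
storm jar target/Tutorial-1.0-SNAPSHOT.jar \
  com.hortonworks.tutorials.tutorial3.TruckEventProcessingTopology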

Does anybody have any ideas, please?
Thank you

Changing status of different services on Sandbox 2.2


Replies: 0

I freshly installed Sandbox 2.2 on VirtualBox. I'm even able to log in to the web interface. But I don't want unnecessary services running in the background; I only want Flume, Kafka and Storm. I went through various forums and they said you can stop or start services from the "SERVICES" tab, but to my dismay I'm unable to find that in either 2.2 or 2.1.
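In the meantime I read that services can apparently also be stopped through the Ambari REST API; something like this (untested sketch; the cluster name "Sandbox" and the admin/admin credentials are assumptions about the default sandbox setup):

# stop a service (e.g. HBASE) by setting its desired state to INSTALLED
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop HBASE"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://127.0.0.1:8080/api/v1/clusters/Sandbox/services/HBASE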

Still, any help on where the menu is would be appreciated.

Thanks.

MapReduce - GC overhead limit exceeded, Container killed by the ApplicationMaster


Replies: 1

Hello,

I am running a MapReduce job which internally uses Pig, and I am using it for a dedup process. It works perfectly fine, as I have run it successfully many times with inputs of millions of rows. Now I am using 2 billion records as input, and the MapReduce job fails in the final reduce steps with the error below.

Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Container exited with a non-zero exit code 255
attempt_1418075527007_2636_r_000655_1  100.00  FAILED  reduce > reduce  dinfhdp07.wellcare.com:8042 logs
Sat, 03 Jan 2015 08:59:42 GMT - Sat, 03 Jan 2015 12:44:43 GMT (3hrs, 45mins, 1sec)
AttemptID:attempt_1418075527007_2636_r_000655_1 Timed out after 1000 secs
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

attempt_1418075527007_2636_r_000656_0  100.00  FAILED  reduce > reduce  xxxx.xxxx.com:8042 logs
Sat, 03 Jan 2015 05:15:23 GMT - Sat, 03 Jan 2015 15:08:59 GMT (9hrs, 53mins, 36sec)
Error: GC overhead limit exceeded
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.
I am using the following settings:

(Hive)
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;
SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET hive.exec.max.dynamic.partitions.pernode=5555;
SET mapreduce.job.reduces=5;
SET mapred.reduce.tasks=5;
SET hive.exec.max.dynamic.partitions=5555;

(Pig)
SET default_parallel 0;
SET mapreduce.job.reduces 0;
SET mapreduce.task.timeout 1000000;
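Based on the GC error, for the next run I am considering giving each reducer more heap and spreading the shuffle over more reducers; roughly these Pig settings (the values are guesses to be tuned against cluster capacity):

-- container size and heap (~80% of the container) for reducers
SET mapreduce.reduce.memory.mb 8192;
SET mapreduce.reduce.java.opts '-Xmx6553m';
-- more reducers so each one handles a smaller share of the 2B records
SET mapreduce.job.reduces 50;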

Task Type Total Complete
Map 6470 6470
Reduce 867 865
Attempt Type Failed Killed Successful
Maps 0 0 6470
Reduces 4 2 865

Counter information (columns are map, reduce, total):

GC time elapsed (ms) 1344654 231634 1576288
Input split bytes 527116674 0 527116674
Map input records 2535338499 0 2535338499
Map output bytes 989378975223 0 989378975223
Map output materialized bytes 999553986159 0 999553986159
Map output records 2535338499 0

Containers are not being utilized


Replies: 3

Hi, the installation of HDP 2.1.5 using Ambari 1.6.1 was successful on a cluster of:
1 NN (4x1TB), 1 Secondary NN (4x1TB), 3 DN (10x4TB physical machines, with 1000Mb/s full-duplex network)

I am trying to run the TeraGen 50GB benchmark with the following properties set in the config files.
The calculations were provided by the hdp Python script:

Using cores=24 memory=126GB disks=12 hbase=False
Profile: cores=24 memory=128000MB reserved=1GB usableMem=125GB disks=12
Num Container=22
Container Ram=5120MB
Used Ram=110GB
Unused Ram=1GB
yarn.scheduler.minimum-allocation-mb=5120
yarn.scheduler.maximum-allocation-mb=112640
yarn.nodemanager.resource.memory-mb=112640
mapreduce.map.memory.mb=5120
mapreduce.map.java.opts=-Xmx4096m
mapreduce.reduce.memory.mb=5120
mapreduce.reduce.java.opts=-Xmx4096m
yarn.app.mapreduce.am.resource.mb=5120
yarn.app.mapreduce.am.command-opts=-Xmx4096m
mapreduce.task.io.sort.mb=1792
tez.am.resource.memory.mb=5120
tez.am.java.opts=-Xmx4096m
hive.tez.container.size=5120
hive.tez.java.opts=-Xmx4096m
hive.auto.convert.join.noconditionaltask.size=1342177000

Submitting the job shows only 3 containers running, and TeraGen 50GB takes 10min 15secs to complete; the command I use is sketched below.

I would like to improve performance and reduce the execution time to ~2 mins.

Please help with the configuration of my cluster to boost performance.
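For reference, the kind of command I run is roughly this (50GB = 500,000,000 rows of 100 bytes; the jar path, map count and memory figures are assumptions to be tuned, and smaller containers only help if yarn.scheduler.minimum-allocation-mb is lowered accordingly):

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen \
  -Dmapreduce.job.maps=60 -Dmapreduce.map.memory.mb=2048 \
  -Dmapreduce.map.java.opts=-Xmx1638m \
  500000000 /tmp/teragen-50GB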

Thanks

YARN JMX QueueMetrics "q0=root" vs "q0=root,q1=default"


Replies: 2

In the ResourceManager's JMX, can anyone explain to me the difference between Hadoop:service=ResourceManager,name=QueueMetrics with "q0=root,q1=default" and with "q0=root"?

I've noticed most of the metrics are the same, but "q0=root" always shows 0 active users, while "q0=root,q1=default" shows a positive integer for active users, as expected, when queries are running in Hive/Tez.
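To compare them side by side, I pull both beans from the RM's JMX servlet and diff the output (the RM host/port below are placeholders):

curl -s 'http://rm-host:8088/jmx?qry=Hadoop:service=ResourceManager,name=QueueMetrics,q0=root' > root.json
curl -s 'http://rm-host:8088/jmx?qry=Hadoop:service=ResourceManager,name=QueueMetrics,q0=root,q1=default' > root-default.json
diff root.json root-default.json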

HCatalog/Hive table creation does not import data into /app/hive/warehouse folder


Replies: 5

I ran into a very weird problem with a Hadoop cluster (HDP 2.2) I set up in Amazon EC2 (3 data nodes + one name node + one secondary name node). The Hue server runs on the main name node and the Hive server runs on the secondary name node.

I was using the Hue web interface to create the table "mytable" in HCatalog from a CSV file loaded into HDFS. The table creation returned successfully without error, and the table was created and displayed in the Hue web interface. However, when I tried to query the table, it returned 0 records. In the /app/hive/warehouse folder I could see that the table folder "mytable" was created, but the CSV file was never copied into it. I reproduced the same behavior using the hive shell.

If I manually upload the CSV file to the /app/hive/warehouse/mytable directory (/app/hive/warehouse is set by hive.metastore.warehouse.dir in hive-site.xml), I can query the data with Hive without issue. That more or less rules out a permission issue, since the same hue user can upload files to that HDFS directory.

If I do the same operations in the HDP sandbox VM, everything works as expected: after the table creation, the /app/hive/warehouse/mytable folder contains the CSV file I imported into the table.
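The hive-shell repro was roughly the following (the column layout and file name here are just examples, not my real schema):

CREATE TABLE mytable (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-- this should move the CSV into /app/hive/warehouse/mytable, but the folder stays empty
LOAD DATA INPATH '/user/hue/mytable.csv' INTO TABLE mytable;
SELECT COUNT(*) FROM mytable;   -- returns 0 on the EC2 cluster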

Any help is highly appreciated.

Make the Internet Better for Us All


Replies: 0

http://www.eteamz.com/pettibolos/

To be clear: Twitter has denied claims that it’s specifically singling out Tor users for phone number confirmation. Twitter says instead that spam-like behavior is being flagged for additional checks.


HiveServer2 generates a directory in HDFS every 30 seconds


Replies: 6

Hi,

After installing HDP 2.2 with Ambari, I receive the following messages every 30 seconds in hiveserver2.log:


2014-12-17 14:46:42,347 INFO [HiveServer2-Handler-Pool: Thread-33]: thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(232)) - Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V6
2014-12-17 14:46:42,356 INFO [HiveServer2-Handler-Pool: Thread-33]: session.SessionState (SessionState.java:createPath(558)) - Created local directory: /tmp/02a6d127-155c-4bde-9133-2a8cf901084a_resources
2014-12-17 14:46:42,363 INFO [HiveServer2-Handler-Pool: Thread-33]: session.SessionState (SessionState.java:createPath(558)) - Created HDFS directory: /tmp/hive/hive/02a6d127-155c-4bde-9133-2a8cf901084a
2014-12-17 14:46:42,366 INFO [HiveServer2-Handler-Pool: Thread-33]: session.SessionState (SessionState.java:createPath(558)) - Created local directory: /tmp/hive/02a6d127-155c-4bde-9133-2a8cf901084a
2014-12-17 14:46:42,373 INFO [HiveServer2-Handler-Pool: Thread-33]: session.SessionState (SessionState.java:createPath(558)) - Created HDFS directory: /tmp/hive/hive/02a6d127-155c-4bde-9133-2a8cf901084a/_tmp_space.db
2014-12-17 14:46:42,374 INFO [HiveServer2-Handler-Pool: Thread-33]: session.SessionState (SessionState.java:start(460)) - No Tez session required at this point. hive.execution.engine=mr.

Can I disable this behavior? The process creates a new directory in HDFS every 30 seconds.
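To identify the client that opens a new session every 30 seconds (possibly some monitoring check), I am planning to watch who connects to HiveServer2; a sketch, assuming the default thrift port 10000:

netstat -tnp | grep :10000   # the foreign address column shows the connecting host/process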

Regards

Insert Arabic data


Replies: 1

I cannot insert Arabic data into Hive; a Unicode exception appears.
I tried changing the Python file's encoding to utf-8,
but it was useless. What can I do? I'm stuck! :(
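In case the data file itself is not UTF-8, converting it before loading might help; a sketch (the source encoding WINDOWS-1256, a common Arabic codepage, is a guess):

file -i data.csv                                           # shows the detected encoding
iconv -f WINDOWS-1256 -t UTF-8 data.csv > data_utf8.csv    # convert before loading into Hive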

HDP 2.2 Nagios/Ganglia


Replies: 2

I noticed that Nagios (fault monitoring) and Ganglia (performance monitoring) are now "deprecated" as of HDP 2.2.

Is there a replacement for those? What's the recommended strategy for fault/performance monitoring in HDP 2.2?

Talk About Whatever You Want Right Now


Replies: 0

So you want to join the military police? Well then, get ready to get blasted with a level-one contamination of oleoresin capsicum.

Falcon UI


Replies: 1

We have HDP 2.2 with Falcon 0.6.0 installed on our cluster. We are able to schedule and run Falcon processes, but I can't find the Falcon dashboards. The Falcon UI on port 15000 shows nothing but three tabs with entity configurations. I'd appreciate it if someone could point me to the Lineage and other Falcon dashboards. Thanks!

Falcon server failure


Replies: 1

The Falcon server keeps failing shortly after coming up. Can you suggest why this is happening?
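I am going to check the Falcon application log around the time of the crash; something like this (the log location is an assumption; adjust to your install):

tail -n 200 /var/log/falcon/falcon.application.log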

Unable to access Ambari from web browser


Replies: 1

I downloaded Sandbox 2.2 for VirtualBox. While going through one of the tutorials, I checked that Ambari was working as expected, and I was able to access it from Firefox on my host machine.
However, now I am unable to log in; I get a message telling me to make sure the Ambari Server is running and that I have permission to access Ambari from this machine.
I see AmbariServer if I run jps in the sandbox terminal.

I suspect this problem is due to too little RAM on my system, but I would still like to check if there are ways to verify that.
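A few checks I can run inside the sandbox VM (standard commands; the log path is the usual default):

ambari-server status
netstat -tln | grep 8080                             # the Ambari web UI port
free -m                                              # how much RAM the VM actually has
tail -n 50 /var/log/ambari-server/ambari-server.log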


Could not find HDP packages during installation


Replies: 1

Hi,
I'm trying to install HDP manually on an Ubuntu 14.04.2 virtual machine.
I followed the manual installation guide, but when trying to install the packages with:
apt-get install hadoop hadoop-hdfs libhdfs0 hadoop-yarn hadoop-mapreduce hadoop-client openssl
the packages could not be found.

Any ideas on how to solve this problem?
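While waiting, I suspect the Hortonworks apt repository was never added, so apt has nowhere to find the packages. The steps I plan to re-check look roughly like this (the repo URL pattern and <HDP_VERSION> placeholder are assumptions; the exact URL, the GPG key step, and whether Ubuntu 14.04 is on the supported OS list for my HDP version all need verifying against the docs):

wget -O /etc/apt/sources.list.d/hdp.list \
  http://public-repo-1.hortonworks.com/HDP/ubuntu14/2.x/updates/<HDP_VERSION>/hdp.list
# add the Hortonworks GPG key as described in the install docs, then:
apt-get update
apt-get install hadoop hadoop-hdfs libhdfs0 hadoop-yarn hadoop-mapreduce hadoop-client openssl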

ambari install fails with NameError: name 'functions' is not defined


Replies: 1

Hi,

I'm trying to install the Hortonworks 2.2 stack on an 8-node (6 data, 2 head, 1 management) cluster with Ambari 1.7. The data node installations work without any issue.

But the HCat, Hive and HDFS client installations fail on the master nodes. Any idea what could be wrong? (And why does it look for 2.0.6 stack packages in a 2.2 installation?)

The error I see is the following:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/hcat_client.py", line 43, in <module>
    HCatClient().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/hcat_client.py", line 28, in install
    self.configure(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/hcat_client.py", line 31, in configure
    import params
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/params.py", line 119, in <module>
    kinit_path_local = functions.get_kinit_path(["/usr/bin", "/usr/kerberos/bin", "/usr/sbin"])
NameError: name 'functions' is not defined
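(As I understand it, Ambari's HDP 2.2 stack definition inherits common service scripts from the 2.0.6 stack, which would explain the 2.0.6 paths.) What I plan to check on a failing node, using the paths from the traceback above; clearing the agent cache is a workaround sketch, so verify before running it on production nodes:

grep -n "functions" \
  /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/HIVE/package/scripts/params.py
# if the cached scripts look stale, clearing the cache makes the agent re-download them
rm -rf /var/lib/ambari-agent/cache/stacks
ambari-agent restart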


Ranger user-sync with LDAP authentication – error


Replies: 0

Hello
We are trying to connect Ranger to our LDAP server (Microsoft Active Directory).
We filled the install.properties file with all the correct values:
SYNC_SOURCE = ldap
SYNC_LDAP_URL = ldap://<LDAP_FQDN:389>
SYNC_LDAP_BIND_DN = cn=<USER>,ou=Users,dc=<domain_name>,dc=local
SYNC_LDAP_BIND_PASSWORD = password

However, after running setup.sh and starting the user-sync service, usersync.log shows:
"ERROR UserGroupSync [UnixUserSyncThread] - Failed to initialize UserGroup source/sink. Will retry after 300000 milliseconds. Error details:
javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903C5, comment: AcceptSecurityContext error, data 52e, v2580^@]"

The error suggests that it's a credentials issue; however, the error remains no matter which user & password we provide (and they are 100% correct).
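To take Ranger out of the picture, we are testing the same bind DN and password directly with ldapsearch (placeholders as in the config above). Note that AD's "data 52e" code specifically means the credentials were rejected, and AD binds sometimes only succeed with the userPrincipalName form (user@domain.local) rather than a CN-style DN:

ldapsearch -x -H ldap://<LDAP_FQDN>:389 \
  -D 'cn=<USER>,ou=Users,dc=<domain_name>,dc=local' \
  -w '<password>' \
  -b 'dc=<domain_name>,dc=local' '(sAMAccountName=<USER>)'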

Any ideas?

Adi

Ambari on Banana Pi with CentOS


Replies: 0

Hi there,

I'm trying to install Hadoop with Ambari on a small Banana Pi cluster in order to test and evaluate Hadoop in a small-cluster environment.
I'm using CentOS 6.6 on a VM for the ambari-server... so far so good. I have tried ambari-server 1.4, 1.5 and 1.7 to provision the Banana Pis on Fedora, but it isn't working.

Lately, on 1.7, I get the following error: REASON: Ambari Server java process died with exitcode 255. Check /var/log/ambari-server/ambari-server.out for more information.

With 1.4 I got to installer step 4 (where I could select individual components, e.g. HDFS, Hive...). I selected everything, but I can't get to step 5; the JS console outputs the following error: Uncaught TypeError: Cannot read property 'get' of undefined... perhaps a bug in the Ember app, but I couldn't identify the problem.
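To dig into the exit code 255, I am checking the server log and the JVM (standard paths). One thing I can't rule out: the Banana Pi nodes are ARM boards, and as far as I know HDP packages are built only for x86_64, which could explain why provisioning them fails on every Ambari version:

cat /var/log/ambari-server/ambari-server.out   # the file the error message points to
java -version
uname -m                                       # prints the CPU architecture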

Can anyone help me provision the cluster?
