Channel: Hortonworks » All Topics

I tried to learn HBase using the Apache site and with the help of the book "HBase in Action"


Replies: 2

I did my Hadoop, HBase and ZooKeeper configuration with the help of the Apache documents. Now I have started learning HBase, but I see a lot of Java code in these documents and I do not know where I am supposed to use it. I tried to run the same code in the HBase shell, but it does not work. Could you please suggest the best way to learn HBase?

If possible, please help me with video tutorials. I want to become an HBase developer.
Thanks and regards,
satya ch
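
A hedged pointer on where that Java code runs: the snippets in the HBase reference guide and in "HBase in Action" are client programs that are compiled and run on the JVM against the cluster, not commands for the hbase shell. A minimal sketch, assuming a client class saved as MyHBaseClient.java (a made-up name) on a node with the HBase client installed:

# Compile against the jars that ship with HBase; `hbase classpath` prints them
javac -cp "$(hbase classpath)" MyHBaseClient.java
# Run the compiled class with the same classpath so it can reach ZooKeeper and the region servers
java -cp ".:$(hbase classpath)" MyHBaseClient

The hbase shell only accepts its own JRuby-based commands (create, put, scan, ...), which is why pasting Java into it fails.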


Decimal data type not supported in SPARK->HiveQL?


Replies: 1

I’m running the Spark “technical preview” (http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/) on a cluster running HDP 2.1.7.

I can perform basic hive queries via a HiveContext in pyspark or spark-shell, but when I query a table (simple select * from table) that has ‘decimal’ columns, I get an error like this:

14/12/04 11:32:55 INFO parse.ParseDriver: Parse Completed
java.lang.RuntimeException: Unsupported dataType: decimal(19,6)
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:233)
at org.apache.spark.sql.hive.MetastoreRelation$SchemaAttribute.toAttribute(HiveMetastoreCatalog.scala:308)
at org.apache.spark.sql.hive.MetastoreRelation$$anonfun$9.apply(HiveMetastoreCatalog.scala:318)
at org.apache.spark.sql.hive.MetastoreRelation$$anonfun$9.apply(HiveMetastoreCatalog.scala:318)
…..

Is decimal not a supported data type for this scenario?

Can we not use Spark with our application unless we change the data types?
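
For what it’s worth, a hedged workaround sketch (not confirmed against that exact preview build): if Spark SQL’s Hive metastore integration cannot map the decimal(19,6) columns, exposing the table through a view that casts them to double sometimes sidesteps the problem, at the cost of exact precision. The table and column names below are made up for illustration:

# Create a Spark-friendly view in Hive, then query the view from the HiveContext
hive -e "CREATE VIEW my_table_for_spark AS
         SELECT id, CAST(amount AS DOUBLE) AS amount_dbl
         FROM my_table;"

In pyspark or spark-shell you would then select from my_table_for_spark instead of the base table.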

HCatInputFormat Exception on HDP 2.1


Replies: 0

I’m trying to read a Hive table as input to an MR job and I get the following exception…

[rt2357@104-04-01 ~]$ hadoop jar ./platform-persistence-mapreduce-0.0.1-SNAPSHOT.jar com.att.bdcoe.platform.persistence.mapreduce.jobs.CLFHiveBulkLoader -q “104-03-02.c.datamaster.bigtdata.io,104-03-03.c.datamaster.bigtdata.io,104-04-03.c.datamaster.bigtdata.io” -t clf_csv
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/users/rt2357/platform-common-0.0.1-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/users/rt2357/platform-persistence-api-0.0.1-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
12-15-14 15:10:10,405 INFO metastore:297 – Trying to connect to metastore with URI thrift://104-03-02.c.datamaster.bigtdata.io:9083
12-15-14 15:10:10,446 INFO metastore:385 – Connected to metastore.
12-15-14 15:10:11,659 INFO TimelineClientImpl:123 – Timeline service address: http://104-03-03.c.datamaster.bigtdata.io:8188/ws/v1/timeline/
12-15-14 15:10:11,668 INFO RMProxy:92 – Connecting to ResourceManager at 104-04-02.c.datamaster.bigtdata.io/10.0.28.117:8050
12-15-14 15:10:12,965 INFO deprecation:1009 – mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
12-15-14 15:10:12,986 INFO FileInputFormat:247 – Total input paths to process : 1
12-15-14 15:10:12,995 INFO FileInputFormat:247 – Total input paths to process : 1
12-15-14 15:10:13,062 INFO JobSubmitter:396 – number of splits:2
12-15-14 15:10:13,206 INFO JobSubmitter:479 – Submitting tokens for job: job_1418440661430_28689
12-15-14 15:10:13,381 INFO YarnClientImpl:236 – Submitted application application_1418440661430_28689
12-15-14 15:10:13,409 INFO Job:1289 – The url to track the job: http://104-04-02.c.datamaster.bigtdata.io:8088/proxy/application_1418440661430_28689/
12-15-14 15:10:13,410 INFO Job:1334 – Running job: job_1418440661430_28689
12-15-14 15:15:33,371 INFO Job:1355 – Job job_1418440661430_28689 running in uber mode : false
12-15-14 15:15:33,372 INFO Job:1362 – map 0% reduce 0%
12-15-14 15:15:58,528 INFO Job:1441 – Task Id : attempt_1418440661430_28689_m_000000_0, Status : FAILED
Error: org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(Lorg/apache/hadoop/hive/serde2/Deserializer;Lorg/apache/hadoop/conf/Configuration;Ljava/util/Properties;Ljava/util/Properties;)V
12-15-14 15:15:59,546 INFO Job:1441 – Task Id : attempt_1418440661430_28689_m_000001_0, Status : FAILED
Error: org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(Lorg/apache/hadoop/hive/serde2/Deserializer;Lorg/apache/hadoop/conf/Configuration;Ljava/util/Properties;Ljava/util/Properties;)V
12-15-14 15:16:21,648 INFO Job:1441 – Tas
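
A hedged observation about the failed attempts: the Error line is the signature of org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(...), which usually surfaces as a NoSuchMethodError when the task containers load Hive/HCatalog classes that do not match the cluster’s Hive version (for example, older jars bundled inside the application jar). A sketch of one way to run the job against the cluster’s own jars; the paths and behavior are assumptions, not a confirmed fix:

# Put the cluster's HCatalog/Hive client classpath in front of the job's own jars.
# `hcat -classpath` prints it on HDP; if your hcat build lacks that flag, list the
# hive-hcatalog-core and hive-exec jars from /usr/lib/hive-hcatalog and /usr/lib/hive here instead.
export HADOOP_CLASSPATH=$(hcat -classpath)
hadoop jar ./platform-persistence-mapreduce-0.0.1-SNAPSHOT.jar \
  com.att.bdcoe.platform.persistence.mapreduce.jobs.CLFHiveBulkLoader \
  -q "104-03-02.c.datamaster.bigtdata.io,104-03-03.c.datamaster.bigtdata.io,104-04-03.c.datamaster.bigtdata.io" -t clf_csv

If the driver uses ToolRunner, shipping the same jars with -libjars (so they reach the map tasks) is the other half of the usual fix.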

ambari-server upgrade error (1.6 > 1.7 doc)


Replies: 2

Hi All,

I followed the Ambari 1.6 to 1.7 upgrade document:

$ yum upgrade ambari-server ambari-log4j >> OK
[root@hdp sbin]# ambari-server upgrade
Using python /usr/bin/python2.6
Upgrading ambari-server
Traceback (most recent call last):
File “/usr/sbin/ambari-server.py”, line 46, in <module>
from ambari_commons import OSCheck, OSConst, Firewall
ImportError: cannot import name Firewall

Does anyone have any idea? Thanks.
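
A hedged guess at a next step (not an official fix): "ImportError: cannot import name Firewall" often means an older copy of the ambari_commons Python package from 1.6.x is still on the path and shadowing the 1.7.0 one. A quick check and a package refresh; the site-packages path below is an assumption for CentOS with Python 2.6:

# Which ambari_commons is Python actually importing?
python2.6 -c "import ambari_commons; print(ambari_commons.__file__)"
ls /usr/lib/python2.6/site-packages/ | grep -i ambari
# Reinstall the server package so the Python modules match the 1.7.0 binaries, then retry
yum reinstall -y ambari-server
ambari-server upgrade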

Kerberos and Ambari (1.7.0)


Replies: 0

Dear all,
I’m building a Docker container (CentOS 6) which contains a full Ambari setup that works pretty well.

I wanted to secure it with Kerberos, and that is where the problems start.
I’m facing the issue that it is impossible to start the DataNode through the Ambari UI; what confuses me even more is that the NameNode can be started without a problem.

error msg from /var/lib/ambari-agent/data/errors-144.txt:
Fail: Execution of 'ulimit -c unlimited; su -s /bin/bash - root -c 'export HADOOP_LIBEXEC_DIR=/usr/hdp/current/hadoop-client/libexec && /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode'' returned 1. starting datanode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-datanode-95a1a98c04e2.out

setup steps:
usr/sbin/kadmin.local -q “addprinc root/admin”
/sbin/service krb5kdc start
/sbin/service kadmin start
mkdir /etc/security/keytabs
cd /etc/security/keytabs
#run kadmin.local
addprinc -randkey ambari-qa@HDUSER
addprinc -randkey hdfs@HDUSER
addprinc -randkey HTTP/95a1a98c04e2@HDUSER
addprinc -randkey yarn/95a1a98c04e2@HDUSER
addprinc -randkey dn/95a1a98c04e2@HDUSER
addprinc -randkey falcon/95a1a98c04e2@HDUSER
addprinc -randkey jhs/95a1a98c04e2@HDUSER
addprinc -randkey hive/95a1a98c04e2@HDUSER
addprinc -randkey knox/95a1a98c04e2@HDUSER
addprinc -randkey nagios/95a1a98c04e2@HDUSER
addprinc -randkey nn/95a1a98c04e2@HDUSER
addprinc -randkey nm/95a1a98c04e2@HDUSER
addprinc -randkey oozie/95a1a98c04e2@HDUSER
addprinc -randkey rm/95a1a98c04e2@HDUSER
addprinc -randkey zookeeper/95a1a98c04e2@HDUSER

xst -norandkey -k smokeuser.headless.keytab ambari-qa@HDUSER
xst -norandkey -k hdfs.headless.keytab hdfs@HDUSER
xst -norandkey -k spnego.service.keytab HTTP/95a1a98c04e2@HDUSER
xst -norandkey -k yarn.service.keytab yarn/95a1a98c04e2@HDUSER
xst -norandkey -k dn.service.keytab dn/95a1a98c04e2@HDUSER
xst -norandkey -k falcon.service.keytab falcon/95a1a98c04e2@HDUSER
xst -norandkey -k jhs.service.keytab jhs/95a1a98c04e2@HDUSER
xst -norandkey -k hive.service.keytab hive/95a1a98c04e2@HDUSER
xst -norandkey -k knox.service.keytab knox/95a1a98c04e2@HDUSER
xst -norandkey -k nagios.service.keytab nagios/95a1a98c04e2@HDUSER
xst -norandkey -k nn.service.keytab nn/95a1a98c04e2@HDUSER
xst -norandkey -k nm.service.keytab nm/95a1a98c04e2@HDUSER
xst -norandkey -k oozie.service.keytab oozie/95a1a98c04e2@HDUSER
xst -norandkey -k rm.service.keytab rm/95a1a98c04e2@HDUSER
xst -norandkey -k zk.service.keytab zookeeper/95a1a98c04e2@HDUSER
#exit kadmin.local

Then I give the right permissions to the files in /etc/security/keytabs/, and then:
ambari-server stop
ambari-agent stop
/sbin/service krb5kdc restart
/sbin/service kadmin restart
ambari-server start
ambari-agent start

Then I start the security procedure through Ambari (Admin -> Security -> Enable Kerberos).

I don’t see where my issue is here, but the fact is that Ambari can’t finish enabling Kerberos – it cannot start the DataNode.
My container also has plenty of ports open.

Thank you in advance for your help.
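
A hedged first check before going through the Ambari wizard again: confirm on the container that the DataNode principal and keytab created above actually work, and look at the .out file from the error message, since on a secured cluster the DataNode start-up failure usually lands there rather than in Ambari.

# Keytab/principal names follow the ones created above
klist -kt /etc/security/keytabs/dn.service.keytab
kinit -kt /etc/security/keytabs/dn.service.keytab dn/95a1a98c04e2@HDUSER
# The real failure reason is typically in the daemon's .out/.log file
tail -n 50 /var/log/hadoop/hdfs/hadoop-hdfs-datanode-95a1a98c04e2.out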

How to connect to Internet inside the sandbox?


Replies: 1

Hi Experts,
I currently need to access the Internet from inside the cluster, but it seems that I don’t have Internet access in the HDP sandbox right now.
I am using HDP 2.2 in VirtualBox. My host environment is Windows. I just imported the appliance without any changes.
How can I change the configuration to access the Internet from VirtualBox, for example to use wget/yum to download packages?
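
A few hedged checks from inside the VM (the stock appliance normally uses a NAT adapter, so outbound access tends to work once DNS resolves); the nameserver value below is just an example:

ping -c 3 8.8.8.8                                # raw connectivity through the NAT adapter
cat /etc/resolv.conf                             # is any nameserver configured?
echo "nameserver 8.8.8.8" >> /etc/resolv.conf    # temporary DNS fix if the file is empty
curl -I http://public-repo-1.hortonworks.com/    # can yum reach the HDP repos?

If the ping to the IP fails too, check the VM’s network adapter in VirtualBox (Settings -> Network) and make sure it is attached to NAT or a bridged adapter.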

Query on bucketed table stored as ORCfile


Replies: 1

Hello – I want to test out the new ACID transaction capabilities of Hive 0.14, so I’ve got the HDP 2.2 Preview Sandbox up and running. I was able to import my data into HDFS and create an external table on my delimited data.

Next I created a bucketed ORCfile table:
create table diskavailable_orc_bucketed (location string,gbtotal string,gbfree string,servertype string,updated string)
partitioned by (ds string)
clustered by (updated) into 256 buckets
stored as orc;

and now I’m having issues copying my data into it with this query:
CREATE TEMPORARY FUNCTION rowSequence AS 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
insert overwrite table diskavailable_orc_bucketed PARTITION (ds) select rowSequence() as ds, * from diskavailable_orc;

I am using the hive-contrib-0.14.0.jar file as it has the rowSequence capability I need for the partitioning. When I run the two lines above using Hue interface it just keeps saying “Waiting for query…” (screenshot here: http://postimg.org/image/wp0k5qygd/ ) and never seems to execute.

Does it look like I’ve taken some wrong steps, or is something happening with my Sandbox? I’m not getting the expected result – the data copied into the diskavailable_orc_bucketed table.

Any help would be greatly appreciated. Thank you.
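
A hedged aside on the ACID goal itself: the CREATE TABLE above is bucketed but not marked transactional, and Hive 0.14 also needs its transaction manager switched on before ACID operations work. A sketch of the documented settings (the compactor properties really belong in hive-site.xml on the metastore side; verify the values against your Sandbox):

hive -e "
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
ALTER TABLE diskavailable_orc_bucketed SET TBLPROPERTIES ('transactional'='true');
"

That doesn’t explain the “Waiting for query…” hang by itself – running the same INSERT from the Hive CLI or Beeline instead of Hue is a quick way to see whether the job is actually being submitted.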

Confirming and Registering Hosts fails – where is the setup script getting the domain name?


Replies: 0

I have a cluster of three hosts. The Ambari server is on the host that will become the namenode of the cluster. During the Confirm Hosts step, Ambari can see the host it is on. It cannot see the other two hosts. The error is toward the end of the registration log:
WARNING 2014-12-16 15:14:19,341 main.py:235 – Unable to determine the IP address of the Ambari server ‘namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode’
INFO 2014-12-16 15:14:19,341 NetUtil.py:48 –

This https://namenode.localdomain.com.namenode…. – where is this coming from? I have checked the following (a couple of extra checks are sketched after the list):

I have re-started all the nodes in the cluster. Several times.
I can reach all nodes from all nodes using ping and their fully qualified domain names
I can reach the data nodes from the name node using password-less ssh.
I have changed the name of the machines to match their names in the hosts files on each machine.
I have checked the /etc/ambari-agent/conf/ambari-agent.ini file.
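
A couple of hedged checks to add on each agent host, since the repeated “.namenode” suffix looks like hostname/DNS expansion rather than something Ambari invents:

hostname -f                                                       # must return the clean FQDN
grep -A2 '^\[server\]' /etc/ambari-agent/conf/ambari-agent.ini    # the hostname= value the agent uses to reach Ambari

If hostname -f already shows the repeated suffix, the problem is in the OS hostname/resolver configuration, not in Ambari.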

Installation HDP 2.2


Replies: 0

I got an error during package installation:
FLUME: D:\HadoopInstallFiles\HadoopPackages\hdp-2.2.0.0-winpkg\resources\winpkg.ps1 “D:\HadoopInstallFiles\HadoopPackages\hdp-2.2.0.0-winpkg\resources\flume-1.5.1.2.2.0.0-2041.winpkg\resources\flume-1.5.1.2.2.0.0-2041.zip” utils unzip “D:\hdp”
WINPKG FAILURE: The term ‘Split-Path’ is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.

The root of this error is a wrong file name for flume-1.5.1.2.2.0.0-2041.zip in the archive. The script expects the name to be flume-1.5.1.2.2.0.0-2041.zip, but in the package this file is named lume-1.5.1.2.2.0.0-2041-bin.zip.
How can I fix it?

Cannot connect to 127.0.0.1:8888


Replies: 6

I have already downloaded Sandbox 2.0 on my laptop running Windows 8.0, but I cannot connect to the server at 127.0.0.1:8888/.

I have used 3 different browsers – Chrome, IE and Firefox – and none of them work.
I have also disabled the firewall on my computer, but it still does not work.

I just bought this new laptop with an i5-4200U 1.6 GHz with Turbo Boost up to 2.6 GHz, an NVIDIA GeForce GT 720M with 2 GB dedicated VRAM, and 8 GB DDR3L memory. I am using a family broadband LAN.

Please advise before I can proceed on the tutorial lesson.

AD LDAP authentication issues


Replies: 3

We have successfully configured Ambari with Active Directory to the point where we can see a successful authentication and authorization call, but we are never actually logged in. The successful calls, taken from the ambari-server.log, are shown below…

[root@cmhlpbigdapp01 conf]# grep succ auth.txt
07:02:51,729 DEBUG [qtp1171366715-24 - /api/v1/users/showard?fields=*,privileges/PrivilegeInfo/cluster_name,privileges/PrivilegeInfo/permission_name&_=1418817771197] BasicAuthenticationFilter:171 - Authentication success: org.springframework.security.authentication.UsernamePasswordAuthenticationToken@c89bf82e: Principal: org.springframework.security.ldap.userdetails.LdapUserDetailsImpl@c89b8f82: Dn: cn=Howard\, Steve,ou=DC1,ou=User Accounts,ou=ILM Managed,dc=fake,dc=domain; Username: showard; Password: [PROTECTED]; Enabled: true; AccountNonExpired: true; CredentialsNonExpired: true; AccountNonLocked: true; Granted Authorities: ; Credentials: [PROTECTED]; Authenticated: true; Details: org.springframework.security.web.authentication.WebAuthenticationDetails@ffff8868: RemoteIpAddress: 1.1.1.1; SessionId: null; Not granted any authorities
07:02:51,763 DEBUG [qtp1171366715-24 - /api/v1/users/showard?fields=*,privileges/PrivilegeInfo/cluster_name,privileges/PrivilegeInfo/permission_name&_=1418817771197] FilterSecurityInterceptor:215 - Authorization successful
[root@cmhlpbigdapp01 conf]#

The browser returns the following text…

Unable to connect to Ambari Server. Confirm Ambari Server is running and you can reach Ambari Server from this machine.

The log also has the following…

07:02:51,840 DEBUG [qtp1171366715-24 - /api/v1/users/showard?fields=*,privileges/PrivilegeInfo/cluster_name,privileges/PrivilegeInfo/permission_name&_=1418817771197] HttpSessionSecurityContextRepository:292 - SecurityContext stored to HttpSession: 'org.springframework.security.core.context.SecurityContextImpl@c89bf82e: Authentication: org.springframework.security.authentication.UsernamePasswordAuthenticationToken@c89bf82e: Principal: org.springframework.security.ldap.userdetails.LdapUserDetailsImpl@c89b8f82: Dn: cn=Howard\, Steve,ou=DC1,ou=User Accounts,ou=ILM Managed,dc=fake,dc=domain; Username: showard; Password: [PROTECTED]; Enabled: true; AccountNonExpired: true; CredentialsNonExpired: true; AccountNonLocked: true; Granted Authorities: ; Credentials: [PROTECTED]; Authenticated: true; Details: org.springframework.security.web.authentication.WebAuthenticationDetails@ffff8868: RemoteIpAddress: 1.1.1.1; SessionId: null; Not granted any authorities'

Our understanding is that once these accounts are authenticated we should be able to add them to local Ambari roles. What are we missing?
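
A hedged thought, given the “Not granted any authorities” in the log: in Ambari 1.7 the LDAP/AD users usually have to be synced into Ambari before privileges can be assigned to them, and the “Unable to connect to Ambari Server” text can also just be the UI failing an API call after login. Two things worth trying (the sync-ldap command exists in 1.7; the users file is a plain list of names and its path here is an example):

# What LDAP settings is the server actually using?
grep -i ldap /etc/ambari-server/conf/ambari.properties
# Pull the AD users into Ambari so roles/privileges can be granted to them
ambari-server sync-ldap --users /tmp/ad-users.txt    # or: ambari-server sync-ldap --all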

Hortonworks Teradata Connector Incremental import : Teradata table to Hive table


Replies: 1

Hi,

I am looking to do an incremental import of a Teradata table into a Hive table.

There is a sample for incremental import in the documentation.
Referring to it, I wrote:
$SQOOP_HOME/bin/sqoop import -libjars $LIB_JARS --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://xxx.xxx.xxx.xxx/terdatadb --username testuser --password testpwd --query 'select * from bookinfo where last_modified_col > 2014-12-17 12:55:43.100000 AND $CONDITIONS' --target-dir /user/hive/warehouse/bookinfo --split-by bookid

LOGS:
——-
14/12/17 18:49:41 INFO teradata.TeradataSqoopImportHelper: Import query select * from bookinfo where last_modified_col > 2014-12-17 12:55:43.100000 AND (1 = 1)
14/12/17 18:49:52 ERROR teradata.TeradataSqoopImportHelper: Exception running Teradata import job
com.teradata.connector.common.exception.ConnectorException: com.teradata.jdbc.jdbc_4.util.JDBCException: [Teradata Database] [TeraJDBC 14.10.00.26] [Error 3706] [SQLState 42000] Syntax error: expected something between an integer and the integer ’12’.
at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeDatabaseSQLException(ErrorFactory.java:307)
at com.teradata.jdbc.jdbc_4.statemachine.ReceiveInitSubState.action(ReceiveInitSubState.java:109)
at com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.subStateMachine(StatementReceiveState.java:314)
at com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.action(StatementReceiveState.java:202)
at com.teradata.jdbc.jdbc_4.statemachine.StatementController.runBody(StatementController.java:123)
at com.teradata.jdbc.jdbc_4.statemachine.StatementController.run(StatementController.java:114)
at com.teradata.jdbc.jdbc_4.TDStatement.executeStatement(TDStatement.java:384)
at com.teradata.jdbc.jdbc_4.TDStatement.prepareRequest(TDStatement.java:562)
at com.teradata.jdbc.jdbc_4.TDPreparedStatement.<init>(TDPreparedStatement.java:114)
at com.teradata.jdbc.jdk6.JDK6_SQL_PreparedStatement.<init>(JDK6_SQL_PreparedStatement.java:29)
at com.teradata.jdbc.jdk6.JDK6_SQL_Connection.constructPreparedStatement(JDK6_SQL_Connection.java:81)
at com.teradata.jdbc.jdbc_4.TDSession.prepareStatement(TDSession.java:1501)
at com.teradata.jdbc.jdbc_4.TDSession.prepareStatement(TDSession.java:1545)
at com.teradata.jdbc.jdbc_4.TDSession.prepareStatement(TDSession.java:1531)
at com.teradata.connector.teradata.db.TeradataConnection.getColumnDescsForSQL(TeradataConnection.java:963)
at com.teradata.connector.teradata.db.TeradataConnection.getColumnNamesForSQL(TeradataConnection.java:908)
at com.teradata.connector.teradata.utils.TeradataUtils.validateInputTeradataProperties(TeradataUtils.java:298)
at com.teradata.connector.teradata.processor.TeradataInputProcessor.validateConfiguration(TeradataInputProcessor.java:83)

Source Teradata table has ‘last_modified_col’ as timestamp & ‘bookid’ is primary key.

Could you kindly help me understand what the exception message is pointing to?

Thanks,
-Nirmal
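
A hedged reading of the Teradata error (“expected something between an integer and the integer ’12’”): the timestamp in the WHERE clause is being parsed as bare numbers because it is not quoted as a literal inside the query. A sketch of the same command with the value wrapped as a TIMESTAMP literal; the outer quotes are switched to double quotes so the inner single quotes survive the shell, and $CONDITIONS is escaped for the same reason:

$SQOOP_HOME/bin/sqoop import -libjars $LIB_JARS \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --connect jdbc:teradata://xxx.xxx.xxx.xxx/terdatadb \
  --username testuser --password testpwd \
  --query "select * from bookinfo where last_modified_col > TIMESTAMP '2014-12-17 12:55:43.100000' AND \$CONDITIONS" \
  --target-dir /user/hive/warehouse/bookinfo --split-by bookid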


Sqoop import using Netezza file location


Replies: 1

How do you specify where the external table lands when you do a Sqoop import out of Netezza using the Netezza driver? Or is it some type of transient external table?

Thank you

HowTo: Import .mdb into Hadoop


Replies: 1

Dear Community,

I would like to import a locally stored .mdb (MS Access) file into Hadoop, but sadly all my attempts have failed so far. I tried it with Sqoop, but I don’t know how to modify the import path for my local machine.
Any idea how to do this?
Any Idea how to do this?

Thanks a lot in advance!

Best wishes
Jonas
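
A hedged alternative route, since Sqoop expects a JDBC endpoint rather than a local file: export the Access tables to CSV on a machine that can reach the cluster, push the files into HDFS, and define a Hive external table on top. The paths, table and column names below are made up for illustration; mdb-export (from mdbtools) is optional – exporting from Access itself works just as well.

# 1. Export a table to CSV, e.g.: mdb-export mydata.mdb customers > customers.csv
# 2. Copy the CSV into HDFS
hdfs dfs -mkdir -p /user/jonas/access_import/customers
hdfs dfs -put customers.csv /user/jonas/access_import/customers/
# 3. Expose it to Hive as an external table
hive -e "CREATE EXTERNAL TABLE customers (id INT, name STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         LOCATION '/user/jonas/access_import/customers';"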

Sqoop Import from Teradata : Import Hive table's column schema is missing


Replies: 3

Hi All,

I am able to import a Teradata table into a Hive table using:

$SQOOP_HOME/bin/sqoop import -libjars $LIB_JARS -Dteradata.db.input.target.database=hive_database -Dteradata.db.input.target.table.schema="Id int, PaperTitle string, PaperYear string, ConferenceId int, JournalId int, Keyword string, last_modified_col timestamp" -Dteradata.db.input.split.by.column=Id --connection-manager org.apache.sqoop.teradata.TeradataConnManager --connect jdbc:teradata://192.168.199.137/testtd --username testtd --password etloffload --table Paper_STAGE --hive-import

The Hive table is created and loaded with data.

However, I see that

-Dteradata.db.input.target.table.schema="Id int, PaperTitle string, PaperYear string, ConferenceId int, JournalId int, Keyword string, last_modified_col timestamp"

is always required for the import.

If I remove this property I get the below exception:
14/10/09 13:01:21 INFO processor.HiveOutputProcessor: hive table hive_database.Paper_STAGE does not exist
14/10/09 13:01:21 WARN tool.ConnectorJobRunner: com.teradata.connector.common.exception.ConnectorException: The output post processor returns 1
14/10/09 13:01:21 INFO processor.TeradataInputProcessor: input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor starts at: 1412839881857
14/10/09 13:01:26 INFO processor.TeradataInputProcessor: input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor ends at: 1412839881857
14/10/09 13:01:26 INFO processor.TeradataInputProcessor: the total elapsed time of input postprocessor com.teradata.connector.teradata.processor.TeradataSplitByHashProcessor is: 5s
14/10/09 13:01:26 ERROR teradata.TeradataSqoopImportHelper: Exception running Teradata import job
com.teradata.connector.common.exception.ConnectorException: Import Hive table’s column schema is missing
at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:104)
at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:48)
at org.apache.sqoop.teradata.TeradataSqoopImportHelper.runJob(TeradataSqoopImportHelper.java:370)
at org.apache.sqoop.teradata.TeradataConnManager.importTable(TeradataConnManager.java:504)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:413)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)

Is -Dteradata.db.input.target.table.schema always required for a first-time import?
Do I need to map the Teradata data types to Hive data types manually?
And how about handling Teradata-specific data types like PERIOD and INTERVAL, and also CLOB and BLOB?
Can these be mapped to 'string' in Hive?

Thanks,
-Nirmal

Cannot log in to Apache Ambari


Replies: 2

I have followed the installation instructions completely and am now on the step to go to http://{main.install.hostname}:8080. I am using NoMachine and working through the terminal as well as the browser. When I click the browser button and try to go to the address, I get an “Unable to connect – Firefox can’t establish a connection to the server at node1:8080.” error.

Is there some kind of security that I am missing that would explain why I cannot get to this address?

Here are the instructions I am following:

http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_using_Ambari_book/content/ambari-chap3-1.html
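
A few hedged checks on the Ambari host itself (node1 is the hostname taken from the error message); if any of these fail, the browser error is expected:

ambari-server status              # is the server process actually running?
netstat -tnlp | grep 8080         # is anything listening on port 8080?
service iptables status           # a default CentOS firewall will block 8080 from other machines
curl -I http://node1:8080/        # reachable from the server machine itself?

Also note that the linked document is for Ambari with HDP 1.3; the port is the same, but the rest of the steps may not match a current install.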

Ambari cannot register datanodes


Replies: 4

Here is another question about getting hosts to register. I am using Ambari 1.7.0 on CentOS 6 machines. I am trying to install HDP 2.1.

First here is the hosts file I am using. Note each node has the same hosts file:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.200.144 datanode10.localdomain.com
192.168.200.107 datanode01.localdomain.com
192.168.200.143 namenode.localdomain.com

Also, I can ping each machine from any machine.
I can SSH without a password from the name node into the other datanodes.
I disabled selinux and iptables on all machines.

I am following the startup procedure listed here: https://cwiki.apache.org/confluence/display/AMBARI/Install+Ambari+1.7.0+from+Public+Repositories. Please note that these install instructions mention nothing about iptables or selinux. People on the mailing list have told me that I need to disable those items.

Ambari can discover the namenode it is sitting on. It cannot discover the datanodes. I get this error from the registration log file:

==========================
Running setup agent script…
==========================

Verifying ambari-agent process status…
Ambari Agent successfully started
Agent PID at: /var/run/ambari-agent/ambari-agent.pid
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log
(“WARNING 2014-12-17 10:43:08,349 NetUtil.py:92 – Server at https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440 is not reachable, sleeping for 10 seconds…

Why is the namenode being appended to the namenode.localdomain.com URL? Why is the script considering this a valid URL and not throwing an error?
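
A hedged observation (same symptom as the “Confirming and Registering Hosts” question above): a hostname that keeps re-appending its own short name is usually the resolver’s search/domain expansion at work, not something the Ambari scripts build themselves. Checks worth running on each datanode:

cat /etc/resolv.conf                                  # a stray "search"/"domain" line can produce the repeated ".namenode" suffix
hostname -f                                           # should print the clean FQDN from /etc/hosts
getent hosts namenode.localdomain.com                 # does it resolve to 192.168.200.143 exactly once?
python -c "import socket; print socket.getfqdn()"     # what the agent's Python code will see (Python 2 on CentOS 6)

If getfqdn() already returns the mangled name, fixing /etc/resolv.conf (or making sure /etc/hosts wins via /etc/nsswitch.conf) should let the agents register.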
