References
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_table_access_mapreduce.html
https://github.com/cloudera/hcatalog-examples.git
Commands:
# Put every Hive jar on the client classpath (glob instead of parsing `ls` output).
HIVE_LIB=/logdata/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib
for jarfile in "$HIVE_LIB"/*.jar; do
  HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$jarfile"
done
export HADOOP_CLASSPATH
hadoop jar bigdata-mapreduce-0.0.1-SNAPSHOT.jar com.yeahmobi.bigdata.mapreduce.GroupByAge age age_group
Exceptions
14/11/05 10:56:44 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/11/05 10:56:44 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
14/11/05 10:56:44 INFO metastore.ObjectStore: ObjectStore, initialize called
14/11/05 10:56:44 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
14/11/05 10:56:47 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
14/11/05 10:56:47 INFO metastore.MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
14/11/05 10:56:48 INFO metastore.ObjectStore: Initialized ObjectStore
14/11/05 10:56:48 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.12.0
14/11/05 10:56:49 INFO metastore.HiveMetaStore: 0: get_databases: NonExistentDatabaseUsedForHealthCheck
14/11/05 10:56:49 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_databases: NonExistentDatabaseUsedForHealthCheck
14/11/05 10:56:49 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=groups
14/11/05 10:56:49 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=groups
14/11/05 10:56:49 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
14/11/05 10:56:49 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
Exception in thread "main" java.io.IOException: NoSuchObjectException(message:default.groups table not found)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:88)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:55)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:47)
at com.cloudera.test.UseHCat.run(UseHCat.java:81)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.cloudera.test.UseHCat.main(UseHCat.java:108)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: NoSuchObjectException(message:default.groups table not found)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1377)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at com.sun.proxy.$Proxy12.get_table(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:898)
at org.apache.hcatalog.common.HCatUtil.getTable(HCatUtil.java:194)
at org.apache.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:105)
at org.apache.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:86)
... 11 more
14/11/05 10:56:51 INFO metastore.HiveMetaStore: 1: Shutting down the object store...
14/11/05 10:56:51 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=Shutting down the object store...
14/11/05 10:56:51 INFO metastore.HiveMetaStore: 1: Metastore shutdown complete.
14/11/05 10:56:51 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=Metastore shutdown complete.
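The table was not found. The log above shows the client initializing its own ObjectStore, which suggests it was apparently using a local embedded metastore rather than the cluster's Hive metastore. Adding the Hive client configuration XML to the project source fixed it.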
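As an alternative to bundling the XML in the project source, the client configuration can be put on the classpath of the JVM that `hadoop jar` starts. The following is only a sketch under assumptions: `add_hive_conf` is a hypothetical helper, the file name `hive-site.xml` is assumed (the note only says "Hive client configuration XML"), and the directory you pass in depends on your installation. It affects the submitting client only, which is where the exception above was thrown.

```shell
# Hypothetical helper: prepend a Hive conf directory to HADOOP_CLASSPATH.
# Assumption: the client configuration file is named hive-site.xml.
add_hive_conf() {
  local conf_dir="$1"
  if [ ! -f "$conf_dir/hive-site.xml" ]; then
    echo "no hive-site.xml in $conf_dir" >&2
    return 1
  fi
  # Prepend so this configuration is found before any other copy on the classpath.
  export HADOOP_CLASSPATH="$conf_dir${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
}
```

Call it with the directory that holds your Hive client configuration before running `hadoop jar`; it returns a non-zero status and leaves `HADOOP_CLASSPATH` untouched if the file is missing.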
2014-11-25 06:59:33,953 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hcatalog.mapreduce.HCatOutputFormat not found
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hcatalog.mapreduce.HCatOutputFormat not found
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:467)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:368)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1477)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1474)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hcatalog.mapreduce.HCatOutputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:232)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:463)
... 8 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hcatalog.mapreduce.HCatOutputFormat not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045)
... 10 more
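The Hive-related classes could not be found on the cluster side (the error comes from MRAppMaster, not from the client). Because the MR job is launched through ToolRunner, passing the Hive jars with the -libjars option fixed it.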
hadoop jar bigdata-mapreduce-0.0.1-SNAPSHOT.jar com.yeahmobi.bigdata.mapreduce.GroupByAge -libjars /logdata/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/hive-hcatalog-core-0.13.1-cdh5.2.0.jar,/logdata/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/hive-exec-0.13.1-cdh5.2.0.jar,/logdata/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/hive-metastore-0.13.1-cdh5.2.0.jar age age_group
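The -libjars value in the command above is a comma-separated list, which is easy to get wrong when typed by hand. A minimal sketch for assembling it follows; `build_libjars` is a hypothetical helper, not part of Hadoop, and it silently skips names that do not exist, so check the result before using it.

```shell
# Hypothetical helper: build the comma-separated list that -libjars expects.
# $1 is the directory holding the jars; the remaining arguments are jar file
# names (shell glob patterns are also accepted).
build_libjars() {
  local dir="$1"; shift
  local list="" pattern jar
  for pattern in "$@"; do
    for jar in "$dir"/$pattern; do
      [ -f "$jar" ] || continue          # skip names that matched nothing
      list="${list:+$list,}$jar"         # add a comma only between entries
    done
  done
  printf '%s\n' "$list"
}

# Usage with the jar names from the command above:
# LIBJARS=$(build_libjars /logdata/CDH-5.2.0-1.cdh5.2.0.p0.36/jars \
#   hive-hcatalog-core-0.13.1-cdh5.2.0.jar hive-exec-0.13.1-cdh5.2.0.jar hive-metastore-0.13.1-cdh5.2.0.jar)
# hadoop jar bigdata-mapreduce-0.0.1-SNAPSHOT.jar com.yeahmobi.bigdata.mapreduce.GroupByAge -libjars "$LIBJARS" age age_group
```

Note that -libjars has to come before the job's own arguments (age age_group), as in the original command, because the generic options are parsed before the remaining arguments are handed to the job.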
If the table has many fields, writing the Java code for this feels rather tedious; loading the data directly is probably easier.