Spark SQL and Hive Integration on YARN: Solving a Classic Error

1. Versions

Spark 2.3.0, Hive 1.2.1

2. Symptoms

When the application jar is submitted to YARN via spark-submit, it fails with:

org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3646)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:231)
        at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:215)
        at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:338)
        at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:299)
        at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:274)
        at org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$client(HiveClientImpl.scala:243)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:265)
        at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210)
        at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209)
        at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:255)
        at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:339)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:197)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:197)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:197)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
        at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:196)
        at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
        at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:69)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69)
        at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
        at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
        at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
        at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
        at com.daqsoft.Spark_Hive_Select$.main(Spark_Hive_Select.scala:36)
        at com.daqsoft.Spark_Hive_Select.main(Spark_Hive_Select.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:892)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
        at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_all_functions(ThriftHiveMetastore.java:3457)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_all_functions(ThriftHiveMetastore.java:3445)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllFunctions(HiveMetaStoreClient.java:2196)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:105)
        at com.sun.proxy.$Proxy31.getAllFunctions(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2134)
        at com.sun.proxy.$Proxy31.getAllFunctions(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3643)
        ... 44 more

Problem analysis: at first this looked like a problem with the metastore database itself, but none of the usual fixes helped.

The real cause turned out to be the referenced lib jars: the CDH cluster originally shipped Hive 1.1.0, and although Hive had been upgraded to 1.2.1, the jars actually being loaded were still the 1.1.0 versions. The driver log makes the mismatch visible:

hive.HiveUtils: Initializing HiveMetastoreConnection version 1.1.0 using file:/opt/cloudera/parcels/CDH-5.13.2-1.cdh5.13.2.p0.3/lib/hadoop/../hive/lib/accumulo-core-1.6.0.jar:file:/opt/cloudera/parcels/CDH-5.13.2-1.cdh5.13.2.p0.3/lib/hadoop/../hive/lib/accumulo-fate-1.6.0.jar:file:/opt/cloudera/parcels/CDH-5.13.2-1.cdh5.13.2.p0.3/lib/hadoop/../hive/lib/accumulo-start-1.6.0.jar
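A quick sanity check for this kind of mismatch is to extract the client version from the HiveUtils initialization line. The sketch below runs the extraction on a sample of the log line above; in practice you would pipe in the driver stdout or the output of yarn logs instead of the hard-coded string:

```shell
# Extract the Hive metastore client version Spark reports at startup.
# The log line here is a shortened sample of the one from this cluster;
# replace the echo with your actual driver log source.
log_line='hive.HiveUtils: Initializing HiveMetastoreConnection version 1.1.0 using file:/opt/cloudera/parcels/CDH-5.13.2-1.cdh5.13.2.p0.3/lib/hadoop/../hive/lib/accumulo-core-1.6.0.jar'
echo "$log_line" | grep -oE 'version [0-9]+(\.[0-9]+)+'
```

If this prints a version other than the one your metastore is actually running, the client jars are the first thing to suspect.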

3. Solution

Replace the Hive lib jars on the server with the 1.2.1 versions. On this cluster they live under:

/opt/cloudera/parcels/CDH/lib/hive

drwxr-xr-x. 4 root root 8192 Oct 9 11:27 lib

drwxr-xr-x. 4 root root 8192 Feb 3 2018 lib_back


Be sure to back up the existing directory first (kept here as lib_back), then copy the 1.2.1 jars into a fresh directory named lib.
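The backup-and-swap step can be sketched as follows. This is only an outline: the target path is the one from this cluster, while the source path for the 1.2.1 jars is a hypothetical placeholder you must adjust to wherever your Hive 1.2.1 distribution is unpacked:

```shell
# Swap the Hive 1.1.0 client jars for 1.2.1 on a CDH node (sketch).
cd /opt/cloudera/parcels/CDH/lib/hive

# Keep the original 1.1.0 jars as a backup before touching anything.
mv lib lib_back

# Copy the 1.2.1 jars in as the new lib directory.
# /path/to/apache-hive-1.2.1-bin is a hypothetical location; adjust it.
cp -r /path/to/apache-hive-1.2.1-bin/lib lib
```

Repeat on every node that serves Hive client jars to Spark, then resubmit the job.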

Problem solved!
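As an aside, and not what was done here: Spark can also be told explicitly which Hive metastore client version to use via the standard spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars options, which avoids touching the cluster's Hive libs. A hedged sketch, where the main class comes from the stack trace above and the application jar name is a placeholder:

```shell
# Alternative sketch: pin the metastore client version at submit time.
# Spark 2.3's builtin Hive client is 1.2.1, matching the upgraded metastore,
# so "builtin" works for this combination; your-app.jar is a placeholder.
spark-submit \
  --class com.daqsoft.Spark_Hive_Select \
  --master yarn \
  --conf spark.sql.hive.metastore.version=1.2.1 \
  --conf spark.sql.hive.metastore.jars=builtin \
  your-app.jar
```

This is worth trying before replacing jars on every node, since it keeps the fix scoped to the Spark application.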
