Issues when integrating spark-3.0.1 with hive-1.1.0

Problem 1

If the application's main jar bundles jetty or Spring Boot web dependencies, they may conflict with the jetty that ships with hive-1.1.0, showing up as:

ClassNotFoundException: org.apache.geronimo.components.jaspi.AuthConfigFactoryImpl

Solution: download geronimo-jaspi-2.0.0.jar separately and put it into the jars directory under the Spark installation root.

geronimo-jaspi-2.0.0.jar

Link: https://pan.baidu.com/s/1ThAjKRznK2CUDaiXYPzKXg  Extraction code: b0b0
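
For reference, a minimal sketch of putting the jar into place (SPARK_HOME and the /opt path below are assumptions about the local layout, not from the original setup):

# assumption: SPARK_HOME points at the spark-3.0.1 installation directory
export SPARK_HOME=/opt/spark-3.0.1
# copy the downloaded jar onto Spark's classpath and confirm it is there
cp geronimo-jaspi-2.0.0.jar "$SPARK_HOME/jars/"
ls "$SPARK_HOME/jars/" | grep geronimo-jaspi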

Problem 2

Hive queries failing with "Unable to fetch table test_table. Invalid method name: 'get_table_req'" with spark 3.0.0 & Hive 1.1.0

Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table test Invalid method name: 'get_table_req'
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table zps_xxx. Invalid method name: 'get_table_req'
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table test. Invalid method name: 'get_table_req'
        at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1282)
        at org.apache.spark.sql.hive.client.HiveClientImpl.getRawTableOption(HiveClientImpl.scala:392)
        at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$tableExists$1(HiveClientImpl.scala:406)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
        at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:291)
        at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:224)
        at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:223)
        at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
        at org.apache.spark.sql.hive.client.HiveClientImpl.tableExists(HiveClientImpl.scala:406)
        at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$tableExists$1(HiveExternalCatalog.scala:854)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
        ... 111 more

Solution:

Spark 3.0 talks to the metastore through its built-in Hive 2.3.7 client by default, which calls the Thrift method get_table_req; the Hive 1.1.0 metastore does not implement that method. Point Spark at an older metastore client by adding the following options to the spark launch command:

 --conf spark.sql.hive.metastore.version=1.2.1 \
 --conf spark.sql.hive.metastore.jars=/opt/hive_jars/* \
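
Put together, a hedged example of a full launch command (the master, the main class com.example.Main, and the application jar my-spark-app.jar are placeholders, not part of the original setup):

# example launch command; only the two metastore options come from this article
spark-submit \
  --master yarn \
  --conf spark.sql.hive.metastore.version=1.2.1 \
  --conf spark.sql.hive.metastore.jars=/opt/hive_jars/* \
  --class com.example.Main \
  my-spark-app.jar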

In spark-env.sh, remember to set the classpath variables for dependent components such as hbase and hive:

#!/usr/bin/env bash

# native Hadoop libraries (e.g. compression codecs) from the CDH parcel
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:\
/opt/cloudera/parcels/CDH/lib/hadoop/lib/native

# make the GPLEXTRAS and CDH native libraries visible at runtime
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:\
/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native:\
/opt/cloudera/parcels/CDH/lib/hadoop/lib/native

# Hadoop configuration plus the Hadoop/HDFS/YARN/MapReduce jars from the CDH parcel
export SPARK_DIST_CLASSPATH=/etc/hadoop/conf:\
/opt/cloudera/parcels/CDH/lib/hadoop/lib/*:\
/opt/cloudera/parcels/CDH/lib/hadoop/*:\
/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/:\
/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/lib/*:\
/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/*:\
/opt/cloudera/parcels/CDH/lib/hadoop-yarn/lib/*:\
/opt/cloudera/parcels/CDH/lib/hadoop-yarn/*:\
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*:\
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/*:\
/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/*

# append the Hive 1.x jars referenced by spark.sql.hive.metastore.jars and the HBase jars
export SPARK_DIST_CLASSPATH=$SPARK_DIST_CLASSPATH:/opt/hive_jars/*:/opt/cloudera/parcels/CDH/lib/hbase/*:/opt/cloudera/parcels/CDH/lib/hbase/lib/*
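
After restarting with these settings, a quick sanity check (assuming spark-sql is on the PATH) is to hit the metastore directly:

# verify that Spark can now reach the Hive 1.1.0 metastore without the get_table_req error
spark-sql \
  --conf spark.sql.hive.metastore.version=1.2.1 \
  --conf spark.sql.hive.metastore.jars=/opt/hive_jars/* \
  -e "show databases"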
