A roundup of problems when reading Hive from Spark

Hive's execution engine is Tez. How should Spark be configured?

Reading Hive data from Spark failed. Following instructions found online, I copied hive-site.xml from Hive's conf directory into Spark's conf directory and added the Hive metastore URI:


<property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop102:9083</value>
</property>
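For completeness, the same setup in application code rather than spark-shell: a minimal sketch of a Spark 2.x application with Hive support enabled (the metastore URI matches the hive-site.xml above; the app name is arbitrary):

import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming Spark 2.x: enable Hive support explicitly.
// spark-shell does this automatically when hive-site.xml is in Spark's conf directory.
val spark = SparkSession.builder()
  .appName("hive-read-example")
  .config("hive.metastore.uris", "thrift://hadoop102:9083")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("show databases").show()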

Then start the Hive metastore service:

bin/hive --service metastore
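If the metastore needs to survive the terminal session, a common variant (standard practice, not part of the original steps) is to run it in the background:

nohup bin/hive --service metastore > metastore.log 2>&1 &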


But launching Spark and reading from Hive still failed:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:529)
  at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:114)
  at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.SessionNotRunning
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 12 more

The root cause is a conflict between Spark and the Tez engine: the copied hive-site.xml still sets hive.execution.engine=tez, so Spark's embedded Hive client tries to start a Tez session, but the Tez jars are not on Spark's classpath (hence the NoClassDefFoundError). Since Spark SQL runs queries with its own engine anyway, switch the engine back to plain MapReduce in the hive-site.xml under Spark's conf directory (Hive's own copy can keep using Tez):


<property>
    <name>hive.execution.engine</name>
    <value>mr</value>
</property>
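After the change, a quick sanity check in spark-shell (the database and table names below are placeholders; use whatever exists in your warehouse):

// In spark-shell: the Hive client should now initialize without Tez on the classpath.
spark.sql("show databases").show()
spark.sql("select * from some_db.some_table limit 10").show()   // hypothetical names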


The Hive table is LZO-compressed, and querying its data fails. How do I fix it?

Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
  ... 92 more

The error means the LZO codec class could not be found. One suggestion found online is to add the Hadoop common jars to the classpath in spark-env.sh under Spark's conf directory:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/hadoop/share/hadoop/common/*

With that configured, startup failed again: Spark 2.x has dropped SPARK_CLASSPATH and tells you to use --driver-class-path instead.

So I relaunched with --driver-class-path:

bin/spark-shell --master yarn --deploy-mode client --driver-class-path /opt/module/hadoop-2.7.2/share/hadoop/common/*

It still failed after startup:

java.lang.NoSuchMethodError: org.apache.hadoop.io.retry.RetryPolicies.retryOtherThanRemoteException(Lorg/apache/hadoop/io/retry/RetryPolicy;Ljava/util/Map;)Lorg/apache/hadoop/io/retry/RetryPolicy;

The NoSuchMethodError is a Hadoop version conflict: the wildcard puts every jar under hadoop/common on the driver classpath, where they clash with the Hadoop classes Spark ships with. Pointing at just the LZO jar avoids the conflict:

bin/spark-shell --master yarn --deploy-mode client --driver-class-path /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar

This time it started normally.
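One caveat: --driver-class-path only affects the driver JVM. If executors later hit the same LzoCodec error while decompressing splits, shipping the jar to them with --jars as well is a common variant (a sketch, not something verified here):

bin/spark-shell --master yarn --deploy-mode client \
  --driver-class-path /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar \
  --jars /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar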


Spark fails to read an HBase-backed Hive table

error in initSerDe: java.lang.ClassNotFoundException Class org.apache.hadoop.hive.hbase.HBaseSerDe not found java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.hbase.HBaseSerDe not found

Copy the following jars from the hbase/lib directory into Spark's jars directory:

  • hbase-protocol-1.1.2.jar
  • hbase-client-1.1.2.jar
  • hbase-common-1.1.2.jar
  • hbase-server-1.1.2.jar
  • hive-hbase-handler-1.2.1.jar
  • metrics-core-2.2.0.jar

Then copy hbase-site.xml into Spark's conf directory and restart spark-shell.
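To confirm the SerDe now loads, query an HBase-backed table from spark-shell (the table name below is a placeholder):

// In spark-shell: HBaseSerDe should now resolve from the copied hive-hbase-handler jar.
spark.sql("select * from hbase_backed_table limit 10").show()   // hypothetical table name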
