Hitting "java.lang.IllegalStateException: unread block data" when integrating Spark with HBase

Problem description:

A Spark program that reads data from HBase fails after being submitted in standalone mode. The exception stack trace is as follows:

2018-02-24 10:05:32,012 INFO  [dag-scheduler-event-loop] scheduler.DAGScheduler: ResultStage 0 (count at HbaseApiDemo.scala:22) failed in 1.099 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.1.109, executor 0): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2773)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1599)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:312)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Solution

After a good deal of fiddling and some searching online, the cause turned out to be that the executors were missing the jars the application needs at runtime. (This assumes the jars the driver needs at startup have already been configured or specified on the command line.)

My application simply reads data from HBase, so at runtime it depends on the following HBase-related jars:

/home/zzhan/workspace/hbase-1.2.6/lib/hbase-client-1.2.6.jar
/home/zzhan/workspace/hbase-1.2.6/lib/hbase-common-1.2.6.jar
/home/zzhan/workspace/hbase-1.2.6/lib/hbase-server-1.2.6.jar
/home/zzhan/workspace/hbase-1.2.6/lib/zookeeper-3.4.6.jar
/home/zzhan/workspace/hbase-1.2.6/lib/hbase-protocol-1.2.6.jar
/home/zzhan/workspace/hbase-1.2.6/lib/htrace-core-3.1.0-incubating.jar
/home/zzhan/workspace/hbase-1.2.6/lib/metrics-core-2.2.0.jar

So all we need to do is add these jars to the classpath the executors use at runtime.
Method 1:
Specify the jars on the command line with --jars when submitting the application:

spark-2.2.1/bin/spark-submit --master "spark://master:7077" --jars /home/zzhan/workspace/hbase-1.2.6/lib/hbase-client-1.2.6.jar,/home/zzhan/workspace/hbase-1.2.6/lib/hbase-common-1.2.6.jar,/home/zzhan/workspace/hbase-1.2.6/lib/hbase-server-1.2.6.jar,/home/zzhan/workspace/hbase-1.2.6/lib/zookeeper-3.4.6.jar,/home/zzhan/workspace/hbase-1.2.6/lib/hbase-protocol-1.2.6.jar,/home/zzhan/workspace/hbase-1.2.6/lib/htrace-core-3.1.0-incubating.jar,/home/zzhan/workspace/hbase-1.2.6/lib/metrics-core-2.2.0.jar --driver-class-path /home/zzhan/workspace/spark-2.2.1/jars/hbase/*:/home/zzhan/workspace/hbase-1.2.6/conf --class HbaseApiDemo jobs/sparkdemo_2.11-0.3.jar
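Typing out a long comma-separated jar list by hand is error-prone. A small helper can build the list from a directory; this is just a sketch (the `join_jars` name is made up here), and note that it picks up every jar directly under the directory, so if you only want a subset, copy those jars into their own directory first:

```shell
#!/bin/sh
# join_jars DIR SEP: print every *.jar directly under DIR, joined by SEP.
# Use SEP="," for --jars and SEP=":" for spark.executor.extraClassPath.
join_jars() {
  dir=$1
  sep=$2
  # find prints one jar path per line; paste joins the lines with SEP
  find "$dir" -maxdepth 1 -name '*.jar' | sort | paste -s -d "$sep" -
}

# Example with this post's layout:
#   JARS=$(join_jars /home/zzhan/workspace/spark-2.2.1/jars/hbase ,)
#   spark-2.2.1/bin/spark-submit --jars "$JARS" ...
```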

Method 2:
Configure spark-defaults.conf to specify the jars the driver and executors depend on:

spark.driver.extraClassPath        /home/zzhan/workspace/spark-2.2.1/jars/hbase/*
# spark.executor.extraClassPath      /home/zzhan/workspace/spark-2.2.1/jars/hbase/*
# jars are separated by colons
spark.executor.extraClassPath      /home/zzhan/workspace/hbase-1.2.6/lib/hbase-client-1.2.6.jar:/home/zzhan/workspace/hbase-1.2.6/lib/hbase-common-1.2.6.jar:/home/zzhan/workspace/hbase-1.2.6/lib/hbase-server-1.2.6.jar:/home/zzhan/workspace/hbase-1.2.6/lib/zookeeper-3.4.6.jar:/home/zzhan/workspace/hbase-1.2.6/lib/hbase-protocol-1.2.6.jar:/home/zzhan/workspace/hbase-1.2.6/lib/htrace-core-3.1.0-incubating.jar:/home/zzhan/workspace/hbase-1.2.6/lib/metrics-core-2.2.0.jar

Tip: you can also collect the required jars into a single directory and reference it with a wildcard, as the spark.driver.extraClassPath line above does.
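One way to set that up (a sketch; the `collect_jars` helper name is made up, and the paths in the example follow this post's layout):

```shell
#!/bin/sh
# collect_jars SRC DEST JAR...: copy the named jars from SRC into DEST,
# creating DEST if needed, so a single wildcard entry (DEST/*) covers them all.
collect_jars() {
  src=$1
  dest=$2
  shift 2
  mkdir -p "$dest"
  for jar in "$@"; do
    cp "$src/$jar" "$dest/"
  done
}

# Example with this post's layout:
#   collect_jars /home/zzhan/workspace/hbase-1.2.6/lib \
#                /home/zzhan/workspace/spark-2.2.1/jars/hbase \
#                hbase-client-1.2.6.jar hbase-common-1.2.6.jar \
#                hbase-server-1.2.6.jar zookeeper-3.4.6.jar \
#                hbase-protocol-1.2.6.jar htrace-core-3.1.0-incubating.jar \
#                metrics-core-2.2.0.jar
```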

References:
http://blog.csdn.net/u010842515/article/details/51451883
https://stackoverflow.com/questions/34901331/spark-hbase-error-java-lang-illegalstateexception-unread-block-data
