How-to: resolve "java.io.NotSerializableException" when reading an HBase table from Spark

While reading an HBase table from Spark code written in Scala, the following error occurred:

15/10/28 16:39:00 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 10.0 (TID 2536, slave14.dc.tj): java.lang.RuntimeException: java.io.NotSerializableException: org.apache.hadoop.hbase.io.ImmutableBytesWritable
Serialization stack:
        - object not serializable (class: org.apache.hadoop.hbase.io.ImmutableBytesWritable, value: 30 30 5f 39 39 39 38 38 33)
        - field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
        - object (class scala.Tuple2, (30 30 5f 39 39 39 38 38 33,keyval......
        at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
        at org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:153)
        at org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:1190)
        ......

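Spark's default Java serializer can only handle classes that implement java.io.Serializable, and org.apache.hadoop.hbase.io.ImmutableBytesWritable does not. The solution is to switch Spark to KryoSerializer, which does not have that requirement. Set it on the SparkConf before the SparkContext is created:
sparkconf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")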
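
For context, here is a minimal sketch of where that setting might sit in a complete job. The application name, the table name "my_table", and the object name are placeholders invented for illustration, and registering the HBase classes with Kryo is an optional tuning step, not something the fix above depends on. It assumes the Spark and HBase client jars are on the classpath.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the names are illustrative only.
object ReadHBaseWithKryo {
  def main(args: Array[String]): Unit = {
    // The serializer must be configured before the SparkContext is created.
    val sparkConf = new SparkConf()
      .setAppName("read-hbase-with-kryo") // placeholder application name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Optional: registering classes lets Kryo write a short id
      // instead of the full class name with every record.
      .registerKryoClasses(Array(classOf[ImmutableBytesWritable], classOf[Result]))

    val sc = new SparkContext(sparkConf)

    // "my_table" is a placeholder; replace it with the real table name.
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

    // Each record is a (row key, row content) pair.
    val rows = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Converting the row key to a String yields an RDD of plain,
    // serializable values that is safe to cache, shuffle, or collect.
    val rowKeys = rows.map { case (key, _) => Bytes.toString(key.copyBytes()) }
    rowKeys.take(10).foreach(println)

    sc.stop()
  }
}
```

Mapping the HBase types to plain Scala values right after the read, as the last step does, is a complementary approach: those values can be serialized regardless of which serializer is configured.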

Reference:
http://spark.apache.org/docs/latest/tuning.html#data-serialization
