Exception in thread "main" org.apache.spark.SparkException: Task not serializable

运行一个spark应用时报如下错误:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2287)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$flatMapValues$1.apply(PairRDDFunctions.scala:769)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$flatMapValues$1.apply(PairRDDFunctions.scala:768)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.PairRDDFunctions.flatMapValues(PairRDDFunctions.scala:768)
    at com.my.spark.project.userbehaviour.UserClickTrackETL$.main(UserClickTrackETL.scala:72)
    at com.my.spark.project.userbehaviour.UserClickTrackETL.main(UserClickTrackETL.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
    - object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@1192b58e)
    - field (class: com.my.spark.project.userbehaviour.UserClickTrackETL$$anonfun$4, name: sc$1, type: class org.apache.spark.SparkContext)
	- object (class com.my.spark.project.userbehaviour.UserClickTrackETL$$anonfun$4, )
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
    ... 20 more

异常中提示是org.apache.spark.SparkContext没有序列化,
出错的代码片段

    val sessionRDD = parsedLogRDD.groupBy((rowLog: TrackerLog) => rowLog.getCookie.toString, partitioner).flatMapValues {
      case iter =>
        //对每一个cookie下的日志按照日志的时间升序排序
        val sortedParseLogs = iter.toArray.sortBy(_.getLogServerTime.toString)
        //对每一个cookie下的日志进行遍历,按30分钟切割会话
        val sessionParsedLogsResult = cutSession(sortedParseLogs)
        //获取hdfs中的cookie_label.txt
        val cookieLabelRDD = sc.textFile(s"${trackerDataPath}/cookie_label.txt");
        val cLArray = cookieLabelRDD.map(line => {
          val labels = line.split("\\|")
          labels(0) -> labels(1)
        }).collect()

其中变量sc是SparkContext的实例,它是运行在Driver端的,而flatMapValues是运行在Executor端的,所以报错,因此将sc的使用移出flatMapValues即可。
总结可得,我们需要弄清一个spark应用那些代码段是运行在Driver端,哪些是运行在Executor端,一般RDD的Api是运行在Exectuor端的。

你可能感兴趣的:(问题解决整理)