java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access

 

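Copyright notice: this is an original article by the blogger and may not be reposted without the blogger's permission.

Original post: https://blog.csdn.net/qq_21439395/article/details/80412688

Contact QQ: 824203453

Follow on Bilibili for more video content: https://space.bilibili.com/383891492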

 

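After integrating Spark Streaming 2.2 with Kafka 0.10, filtering the same RDD several times and then joining the results fails with the following error:

java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access

The code is as follows: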
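import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// conf is an existing SparkConf; batches are 3 seconds long
val ssc: StreamingContext = new StreamingContext(conf, Seconds(3))
// Kafka parameter configuration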
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "hdp-02:9092,hdp-03:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer], 
  "group.id" -> "group_hello",
  "auto.offset.reset" -> "earliest"
)

val topics = Array("helloTopic8")
val directStream: InputDStream[ConsumerRecord[String, String]] = KafkaUtils.createDirectStream(ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
)

val maped: DStream[String] = directStream.map(crd => (crd.value())) /*.cache()*/

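// foreachRDD returns Unit, so its result is not assigned to a val
maped.foreachRDD(rdds => {

  // Whether the business logic makes sense is beside the point: running two operations
  // on the same RDD and then joining the results fails here with:
  // Caused by: java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access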
  val rdd1 = rdds.filter(_.equals("a")).map((_, 1))
  val rdd2 = rdds.filter(_.equals("b")).map((_, 1))
  rdd1.leftOuterJoin(rdd2).foreach(println)
})

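Cause:

Both RDDs are derived from the same data. Nothing is cached, so when the action runs, each branch recomputes its lineage and reads the data from Kafka separately. (One partition of the RDD corresponds to one partition of the topic, so the data of each Kafka partition is read twice here.) The Kafka 0.10 integration caches consumers on each executor, keyed by group, topic, and partition, so two tasks that read the same partition can end up using the same KafkaConsumer from different threads at the same time. A KafkaConsumer may only be used by one thread at a time, so it throws this exception.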
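
To make the failure mode concrete, here is a minimal sketch in plain Scala (no Spark or Kafka required) of a single-owner guard similar in spirit to the check that produces this exception. The names SingleThreadedClient, withClient, and OwnershipDemo are hypothetical and exist only for illustration; the real KafkaConsumer's internals differ in detail.

```scala
import java.util.ConcurrentModificationException
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-in for a client that only one thread may use at a time.
class SingleThreadedClient {
  // Thread currently inside the client, or null when it is free.
  private val owner = new AtomicReference[Thread](null)

  // Runs body while holding the client; fails fast if another thread holds it.
  def withClient[A](body: => A): A = {
    if (!owner.compareAndSet(null, Thread.currentThread()))
      throw new ConcurrentModificationException(
        "client is not safe for multi-threaded access")
    try body
    finally owner.set(null) // release only after a successful acquire
  }
}

object OwnershipDemo {
  // Returns true if a second thread is rejected while the first holds the client.
  def secondCallerRejected(): Boolean = {
    val client = new SingleThreadedClient
    val entered = new CountDownLatch(1) // signals that the first thread is inside
    val done = new CountDownLatch(1)    // lets the first thread leave

    val first = new Thread(new Runnable {
      def run(): Unit = client.withClient {
        entered.countDown()
        done.await()
      }
    })
    first.start()
    entered.await() // the first thread now owns the client

    val rejected =
      try { client.withClient(()); false }
      catch { case _: ConcurrentModificationException => true }

    done.countDown()
    first.join()
    rejected
  }
}
```

Two Spark tasks that share one cached consumer play the roles of the two threads here: whichever arrives second while the first is still reading gets the exception.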

Solution: one workable fix is to cache or checkpoint the RDD. This ensures that the original data in Kafka is consumed only once, and every subsequent read is served from the cache.

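    // Requires: import org.apache.spark.storage.StorageLevel
    maped.foreachRDD(rdd => {
      // Alternative: val rdds = rdd.cache()
      val rdds = rdd.persist(StorageLevel.DISK_ONLY)
      val rdd1 = rdds.filter(_.equals("a")).map((_, 1))
      val rdd2 = rdds.filter(_.equals("b")).map((_, 1))
      rdd1.leftOuterJoin(rdd2).foreach(println)
      // Release the persisted data once this batch has been processed
      rdds.unpersist()
    })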
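
Persisting is not the only possible route. As a hedged alternative sketch: the Spark Streaming + Kafka 0.10 integration guide describes a setting, spark.streaming.kafka.consumer.cache.enabled, that turns off the executor-side consumer cache, and mentions it as a workaround for SPARK-19185. Whether your exact Spark build honors this setting is an assumption you should check against that version's guide. The application name below is hypothetical.

```scala
import org.apache.spark.SparkConf

// Assumption: the running Spark version supports this property (verify in its Kafka 0.10 integration guide).
val conf = new SparkConf()
  .setAppName("kafka-direct-demo") // hypothetical application name
  .set("spark.streaming.kafka.consumer.cache.enabled", "false")
```

With the cache disabled, tasks no longer share a cached consumer, but each Kafka partition is still read from the broker once per branch. Persisting the RDD as shown above avoids that repeated read.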
 

There are several Stack Overflow threads on this problem. They are listed here for reference:

https://stackoverflow.com/questions/44530234/kafkaconsumer-is-not-safe-for-multi-threading-access 

https://stackoverflow.com/questions/45115905/concurrent-exception-for-kafkaconsumer-is-not-safe-for-multi-threaded-access 

https://stackoverflow.com/questions/42762020/kafkaconsumer-is-not-safe-for-multi-threaded-access-from-sparkstreaming 

https://stackoverflow.com/questions/41245752/spark2-0-2-java-util-concurrentmodificationexception-kafkaconsumer-is-not-safe 

https://blog.csdn.net/NeverKnowPig/article/details/78460031

 
