Spark Streaming reading from Kafka

For testing, a Kafka producer is started locally in a Windows environment. For installing and starting Kafka on Windows, see this blog post: https://blog.csdn.net/shenyanwei/article/details/90374859
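Besides the local Kafka setup, the project needs Spark Streaming and the spark-streaming-kafka-0-10 integration on the classpath. A minimal sbt sketch follows; the Spark version here is an assumption, so match it to your own environment:

// build.sbt (sketch): the version 2.4.4 is an assumption, adjust to your Spark/Scala setup
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "2.4.4",
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.4.4"
)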

The code is as follows:

import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * @Author Justice
  * @Date 2019/12/26 14:44
  */
object sparkKafkaDemo {
  def main(args: Array[String]): Unit = {
    // create the StreamingContext with a 1-second batch interval
    val sparkConf = new SparkConf().setAppName("sparkKafkaDemo").setMaster("local[4]")
    val ssc = new StreamingContext(sparkConf,Seconds(1))

    // Kafka consumer parameters
    val kafkaParams: Map[String, Object] = Map[String, Object](
      ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "127.0.0.1:9092",
      ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
      ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
      ConsumerConfig.GROUP_ID_CONFIG -> "test-consumer-group",
      ConsumerConfig.AUTO_OFFSET_RESET_CONFIG -> "latest",
      ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> (false: java.lang.Boolean))

    // topics to subscribe to
    val topics = Array("test")

    val streams = KafkaUtils.createDirectStream(
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
    )

    // print each record's value; each RDD is an RDD[ConsumerRecord[String, String]]
    streams.foreachRDD(rdd => {
      rdd.foreachPartition(partition => {
        partition.foreach(record => {
          println(record.value() + "=================================")
        })
      })
    })
    ssc.start()
    ssc.awaitTermination()

  }


}

The producer sends data:
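For reference, a minimal Scala producer sketch that sends a few test messages to the "test" topic; the object name ProducerDemo and the message contents are illustrative, not the producer actually used in the original test:

import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object ProducerDemo {
  def main(args: Array[String]): Unit = {
    // connect to the same local broker the streaming job reads from
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)

    val producer = new KafkaProducer[String, String](props)
    // send a few test messages to the "test" topic consumed by sparkKafkaDemo
    (1 to 5).foreach(i => producer.send(new ProducerRecord[String, String]("test", s"hello kafka $i")))
    producer.close()
  }
}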

Console output:

(screenshot of the console output)

This code does not manage offsets manually: auto-commit is disabled (ENABLE_AUTO_COMMIT_CONFIG = false) and the consumed offsets are never committed, so on a restart the consumer falls back to the AUTO_OFFSET_RESET_CONFIG policy, which can mean lost data with "latest" or duplicate consumption with "earliest". Manual offset management and writing data from Spark Streaming to Kafka will be covered in a follow-up post.
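For illustration only, a minimal sketch of the manual-commit pattern from the spark-streaming-kafka-0-10 integration: after processing each batch, the batch's offset ranges are committed back to Kafka through the CanCommitOffsets interface on the original stream.

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

streams.foreachRDD(rdd => {
  // the offset ranges covered by this batch
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // process the records first...
  rdd.foreach(record => println(record.value()))
  // ...then commit the offsets back to Kafka asynchronously
  streams.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
})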
