Spark Streaming: Reading Data from Kafka

1. Add Maven Dependencies

 
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
</dependencies>
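The 0-8 integration above matches the consumer API used in the code below. For brokers running Kafka 0.10 or later, Spark ships a separate artifact; a sketch assuming the same Spark version (note that its API differs from the 0-8 one shown here, using ConsumerStrategies instead of Decoder type parameters):

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>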

2. Reading Kafka Data with Spark Streaming

import kafka.serializer.StringDecoder
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * @author fczheng 
  *  
  */
object SparkStreaming03_Kafka {
    def main(args: Array[String]): Unit = {
        
        // Create the Spark configuration object
        val conf: SparkConf = new SparkConf().setAppName("SparkStreaming03_Kafka").setMaster("local[*]")
        
        // Create the streaming context with a 3-second batch interval
        val ssc: StreamingContext = new StreamingContext(conf, Seconds(3))
        
        // Kafka parameters
        val brokers = "hadoop102:9092,hadoop103:9092,hadoop104:9092"
        val topic = "first"
        val group = "bigdata"
        val deserialization = "org.apache.kafka.common.serialization.StringDeserializer"
    
        val kafkaParams = Map(
            ConsumerConfig.GROUP_ID_CONFIG -> group,
            ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> brokers,
            ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> deserialization,
            ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> deserialization
        )
        // Create a receiver-less direct stream on the topic;
        // each record arrives as a (key, value) pair
        val dStream: InputDStream[(String, String)] = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
            ssc, kafkaParams, Set(topic))
        
        
        // Print only the message values (the second element of each pair)
        dStream.map(_._2).print()
        ssc.start()
        ssc.awaitTermination()
    }
}
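
The value stream can be transformed like any other DStream before printing. A minimal sketch of a per-batch word count, reusing the dStream defined above in place of the dStream.map(_._2).print() line (splitting on a single space is an assumption about the input format):

        // Take the message value from each (key, value) pair, split it into
        // words, and count occurrences within each 3-second batch
        dStream.map(_._2)
            .flatMap(_.split(" "))
            .map((_, 1))
            .reduceByKey(_ + _)
            .print()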

3. Produce Data in Kafka

(a) Create the topic

bin/kafka-topics.sh --zookeeper hadoop102:2181 --create --replication-factor 3 --partitions 3 --topic first
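
To confirm the topic was created with the expected partition and replica assignment (an optional check, same ZooKeeper address assumed):

bin/kafka-topics.sh --zookeeper hadoop102:2181 --describe --topic first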

(b) Produce data

[hadoop@hadoop102 kafka_2.11-0.11.0.2]$ bin/kafka-console-producer.sh --broker-list hadoop102:9092 --topic first
>haddoop
>hadoop hive spark
>aa bbb cc aa
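
As an independent check that the messages reached Kafka, a console consumer can read them back (same broker as above assumed):

bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first --from-beginning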

4. Results

-------------------------------------------
Time: 1570784403000 ms
-------------------------------------------
haddoop

-------------------------------------------
Time: 1570784406000 ms
-------------------------------------------

-------------------------------------------
Time: 1570784409000 ms
-------------------------------------------
hadoop hive spark

-------------------------------------------
Time: 1570784412000 ms
-------------------------------------------
aa bbb cc aa

-------------------------------------------
Time: 1570784415000 ms
-------------------------------------------
