Spark Streaming + Kafka Integration Example

Abstract: This article walks through a working example of integrating Spark Streaming with Kafka.

Project source download: https://github.com/appleappleapple/BigDataLearning

1. Project directory structure

[Figure 1: project directory structure]

2. Add the Maven dependencies (pom.xml)


<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>com.lin</groupId>
	<artifactId>SparkStreaming-Demo</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>${project.artifactId}</name>
	<description>My wonderful scala app</description>
	<inceptionYear>2015</inceptionYear>
	<licenses>
		<license>
			<name>My License</name>
			<url>http://....</url>
			<distribution>repo</distribution>
		</license>
	</licenses>

	<properties>
		<maven.compiler.source>1.8</maven.compiler.source>
		<maven.compiler.target>1.8</maven.compiler.target>
		<encoding>UTF-8</encoding>
		<scala.version>2.11.5</scala.version>
		<scala.compat.version>2.11</scala.compat.version>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-log4j12</artifactId>
			<version>1.7.8</version>
		</dependency>

		<dependency>
			<groupId>org.scala-lang</groupId>
			<artifactId>scala-library</artifactId>
			<version>${scala.version}</version>
		</dependency>

		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-core_2.11</artifactId>
			<version>2.1.0</version>
		</dependency>

		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-streaming_2.11</artifactId>
			<version>2.1.0</version>
		</dependency>

		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-streaming-kafka_2.11</artifactId>
			<version>1.6.1</version>
		</dependency>
	</dependencies>

	<build>
		<sourceDirectory>src/main/scala</sourceDirectory>
		<testSourceDirectory>src/test/scala</testSourceDirectory>
		<resources>
			<resource>
				<directory>src/main/resources</directory>
				<targetPath>${basedir}/target/classes</targetPath>
				<includes>
					<include>**/*.properties</include>
					<include>**/*.xml</include>
				</includes>
				<filtering>true</filtering>
			</resource>
			<resource>
				<directory>src/main/resources</directory>
				<targetPath>${basedir}/target/resources</targetPath>
				<includes>
					<include>**/*.properties</include>
					<include>**/*.xml</include>
				</includes>
				<filtering>true</filtering>
			</resource>
		</resources>
		<plugins>
			<plugin>
				<!-- see http://davidb.github.com/scala-maven-plugin -->
				<groupId>net.alchim31.maven</groupId>
				<artifactId>scala-maven-plugin</artifactId>
				<version>3.2.0</version>
				<executions>
					<execution>
						<goals>
							<goal>compile</goal>
							<goal>testCompile</goal>
						</goals>
						<configuration>
							<args>
								<arg>-dependencyfile</arg>
								<arg>${project.build.directory}/.scala_dependencies</arg>
							</args>
						</configuration>
					</execution>
				</executions>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-surefire-plugin</artifactId>
				<version>2.18.1</version>
				<configuration>
					<useFile>false</useFile>
					<disableXmlReport>true</disableXmlReport>
					<includes>
						<include>**/*Test.*</include>
						<include>**/*Suite.*</include>
					</includes>
				</configuration>
			</plugin>
			<plugin>
				<artifactId>maven-assembly-plugin</artifactId>
				<version>2.6</version>
				<configuration>
					<descriptorRefs>
						<descriptorRef>jar-with-dependencies</descriptorRef>
					</descriptorRefs>
				</configuration>
			</plugin>
		</plugins>
	</build>
</project>
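Note: spark-streaming-kafka_2.11:1.6.1 is the pre-Spark-2.x artifact name; it provides the org.apache.spark.streaming.kafka.KafkaUtils API used in the next step. If you prefer the dependency versions aligned with Spark 2.1.0, the renamed artifact for the same Kafka 0.8 integration is spark-streaming-kafka-0-8_2.11. A minimal sketch of the swapped-in dependency (verify against your cluster before adopting it):

		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
			<version>2.1.0</version>
		</dependency>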

3. Write the streaming computation

package com.lin.demo

import org.apache.spark.SparkConf
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.Durations
import org.apache.spark.streaming.kafka.KafkaUtils
import kafka.serializer.StringDecoder

object KafkaWordCount {
  def main(args: Array[String]): Unit = {
    // Run locally with at least 2 threads: one to receive/schedule work and one to process it.
    val sparkConf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Durations.seconds(3))
    val kafkaParams = Map[String, String]("metadata.broker.list" -> "127.0.0.1:9092")
    // Put every topic you want to read into this Set; the direct stream
    // can consume several topics in parallel.
    val topics = Set("linlin")
    // Kafka records come back as (key, value) pairs; we only need to split the value.
    val lines = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)
    val words = lines.flatMap(_._2.split(" "))
    words.foreachRDD { rdd =>
      println(s"Read ${rdd.count()} records from topics $topics")
    }

    words.print()
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }

}
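
createDirectStream reads from the brokers without a receiver and tracks offsets itself. For contrast, here is a minimal sketch of the older receiver-based API from the same library, reusing the ssc defined above and assuming the same local ZooKeeper and topic (the group id "spark-demo-group" is a hypothetical name):

    // Receiver-based alternative: consumes through a receiver and
    // tracks offsets in ZooKeeper rather than in the stream itself.
    val receiverStream = KafkaUtils.createStream(
      ssc,                  // StreamingContext from above
      "127.0.0.1:2181",     // ZooKeeper quorum
      "spark-demo-group",   // consumer group id (hypothetical)
      Map("linlin" -> 1))   // topic -> number of receiver threads
    receiverStream.map(_._2).print()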

4. Start ZooKeeper and Kafka

Start ZooKeeper:

[Figure 2: ZooKeeper startup console]
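
The screenshot shows ZooKeeper starting. With the scripts bundled in a stock Kafka distribution (an assumption; adjust paths to your installation), the equivalent command is:

    bin/zookeeper-server-start.sh config/zookeeper.properties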

Start Kafka:

[Figure 3: Kafka startup console]
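
Again assuming the stock distribution scripts, the broker is started with the command below; if topic auto-creation is disabled on your broker, you also need to create the "linlin" topic before producing to it:

    bin/kafka-server-start.sh config/server.properties
    bin/kafka-topics.sh --create --zookeeper 127.0.0.1:2181 --replication-factor 1 --partitions 1 --topic linlin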


5. Send messages

package com.lin.demo.producer;

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class KafkaProducer {
	private final Producer<String, String> producer;
	public final static String TOPIC = "linlin";

	private KafkaProducer() {
		Properties props = new Properties();
		// Kafka broker address (host:port)
		props.put("metadata.broker.list", "127.0.0.1:9092");
		// ZooKeeper address (legacy setting; the 0.8 producer does not use it)
		props.put("zk.connect", "127.0.0.1:2181");

		// serializer class for message values
		props.put("serializer.class", "kafka.serializer.StringEncoder");
		// serializer class for message keys
		props.put("key.serializer.class", "kafka.serializer.StringEncoder");

		// -1: wait until the message is acknowledged by all in-sync replicas
		props.put("request.required.acks", "-1");

		producer = new Producer<String, String>(new ProducerConfig(props));
	}

	void produce() {
		int messageNo = 1000;

		// Send messages indefinitely, one per loop iteration.
		while (true) {
			String key = String.valueOf(messageNo);
			String data = "INFO JobScheduler: Finished job streaming job 1493090727000 ms.0 from job set of time 1493090727000 ms" + key;
			producer.send(new KeyedMessage<String, String>(TOPIC, key, data));
			System.out.println(data);
			messageNo++;
		}
	}

	public static void main(String[] args) {
		new KafkaProducer().produce();
	}
}
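
For a quick sanity check without compiling the Java producer, the console producer that ships with Kafka can feed the same topic (a minimal sketch, assuming the stock distribution scripts and the broker address above); type lines and press Enter to send each one:

    bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic linlin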

6. Verify

Run the streaming job from step 3 and the producer from step 5 at the same time. The producer console prints each message as it is sent, and every 3 seconds the streaming job prints the batch record count followed by the (word, count) pairs.

[Figures 4 and 5: console output of the producer and of the Spark Streaming word count]

Project source download: https://github.com/appleappleapple/BigDataLearning
