os: Windows 10
zookeeper: zookeeper-3.4.6
kafka: kafka_2.11-1.1.0
scala: scala-2.11.8
java: jdk1.8.0_111
IntelliJ IDEA: 14.1.4
1. Download the Zookeeper package and unzip it to a directory of your choice, e.g. D:\envpath\zookeeper-3.4.6.
2. In the conf folder, rename zoo_sample.cfg to zoo.cfg and adjust the configuration:
# Change this entry:
dataDir=D:/envpath/zookeeper-3.4.6/data
# Add this entry:
dataLogDir=D:/envpath/zookeeper-3.4.6/logs
3. Add the environment variable ZOOKEEPER_HOME=D:\envpath\zookeeper-3.4.6 and append %ZOOKEEPER_HOME%\bin to Path.
4. Start Zookeeper by entering the following in cmd:
zkServer
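To confirm that Zookeeper is actually up, a quick check is to connect with the bundled CLI client from a second cmd window (this assumes the default client port 2181 from zoo_sample.cfg):
zkCli -server 127.0.0.1:2181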
1. Download the Kafka package and unzip it to a directory of your choice.
2. In the config folder, edit server.properties and change the log path setting:
log.dirs=D:/envpath/kafka_2.11-1.1.0/logs
3. From the Kafka installation directory, start Kafka:
.\bin\windows\kafka-server-start.bat .\config\server.properties
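The examples below all use a topic named test-music-topic. It can be created up front with the script that ships with Kafka (optional if the broker is left with automatic topic creation enabled); the single partition and replication factor of 1 used here are assumptions suited to a one-node setup:
.\bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test-music-topic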
<properties>
    <kafka.version>1.1.0</kafka.version>
</properties>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>${kafka.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>${kafka.version}</version>
</dependency>
The producer generates the data. A series of parameters is configured in props; each is described below:
Parameter | Meaning |
---|---|
bootstrap.servers | List of broker addresses for the Kafka connection, in host[:port] format; multiple addresses are separated by commas, e.g. kafka01:9092,kafka02:9092. |
acks | The number of acknowledgements Kafka must send back for a message. 0 means no acknowledgement is required; 1 means the leader broker's acknowledgement is enough; all means every in-sync replica must acknowledge. Default: 1. |
retries | Number of resend attempts. Messages are automatically resent after network failures. With acks=0 this setting has no effect, since the producer receives no acknowledgement and cannot tell whether a resend is needed. |
batch.size | Batch size in bytes. Messages sent to a broker are grouped into batches, one batch per partition; a batch that is too small reduces throughput, and a size of 0 disables batching altogether. Default: 16384 (16 KB). |
linger.ms | Linger time. A message is not sent immediately but waits up to this long so it can be sent together with other messages in one batch. Default: 0. |
buffer.memory | Size of the memory buffer that holds messages waiting to be sent. When messages are produced faster than the server can accept them, the producer blocks for up to max.block.ms and then throws an exception. Default: 33554432 (32 MB). |
key.serializer | Serializer class for message keys. There is no default, so this must be configured or an error is thrown. |
value.serializer | Serializer class for message values. There is no default, so this must be configured or an error is thrown. |
The full code is as follows:
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

import scala.util.Random

object MessageProducer {
  val topic = "test-music-topic"

  def main(args: Array[String]) {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("acks", "1")
    props.put("retries", "0")
    props.put("batch.size", "16384")
    props.put("linger.ms", "1")
    props.put("buffer.memory", "33554432")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)

    // Randomly combine a user, a song, an operation, and a timestamp into one CSV message.
    val users = Array("Tim", "Mary", "Jack", "Edward", "Milly", "Jackson")
    val musics = Array("Life is like a boat", "Lemon", "Rain", "Fish in the pool", "City of Starts", "Summer", "Planet")
    val operations = Array("like", "download", "store", "delete")
    val random = new Random()
    val num = 10
    // Note: `0 to num` is inclusive, so 11 messages (keys 0..10) are sent.
    for (i <- 0 to num) {
      val message = users(random.nextInt(users.length)) + "," +
        musics(random.nextInt(musics.length)) + "," +
        operations(random.nextInt(operations.length)) + "," +
        System.currentTimeMillis()
      producer.send(new ProducerRecord[String, String](topic, Integer.toString(i), message))
      println(message)
    }
    producer.close()
  }
}
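Note that send() is asynchronous and returns a Future immediately. As a minimal sketch (a hypothetical addition, not part of the program above, with an invented key and message), blocking on that Future confirms that a record actually reached the broker:
// Hypothetical delivery check: get() blocks until the broker acknowledges.
val metadata = producer.send(new ProducerRecord[String, String](topic, "0", "probe")).get()
println(s"partition = ${metadata.partition()}, offset = ${metadata.offset()}")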
The Consumer consumes the data. Its key.deserializer and value.deserializer settings are mandatory and mirror the producer's key.serializer and value.serializer. The code is as follows.
import java.util.{Collections, Properties}

import org.apache.kafka.clients.consumer.KafkaConsumer

import scala.collection.JavaConverters._

object MessageConsumer {
  val topic = "test-music-topic"

  def main(args: Array[String]) {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("group.id", "something")
    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList(topic))
    // Poll in an endless loop; each poll returns the records that arrived
    // since the previous one.
    while (true) {
      val records = consumer.poll(100)
      for (record <- records.asScala) {
        println(s"offset = ${record.offset()}, key = ${record.key()}, value = ${record.value()}")
      }
    }
  }
}
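The loop above never exits, so the consumer is never closed. As a hedged sketch of a cleaner shutdown (assuming the same consumer as above): a JVM shutdown hook calls wakeup(), which makes the blocked poll() throw a WakeupException that can be caught in order to close the consumer.
// Hypothetical graceful-shutdown sketch around the poll loop.
Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
  def run(): Unit = consumer.wakeup() // interrupts a blocked poll()
}))
try {
  while (true) {
    val records = consumer.poll(100)
    for (record <- records.asScala) println(record.value())
  }
} catch {
  case _: org.apache.kafka.common.errors.WakeupException => // expected on shutdown
} finally {
  consumer.close()
}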
Run the Consumer first. It prints its configuration and then stops producing output, because the producer has not generated any messages yet. Then run the producer, which sends its 11 messages (keys 0 through 10, since the loop 0 to num is inclusive) and shuts down automatically once it is done.
The producer's output is as follows:
2018-06-12 19:56:22,699 INFO [org.apache.kafka.common.utils.AppInfoParser] - Kafka version : 1.1.0
2018-06-12 19:56:22,699 INFO [org.apache.kafka.common.utils.AppInfoParser] - Kafka commitId : fdcf75ea326b8e07
Tim,Lemon,like,1528804583071
2018-06-12 19:56:23,528 INFO [org.apache.kafka.clients.Metadata] - Cluster ID: u88DYmIoSJCSkoWG2EdXDQ
Jackson,City of Starts,like,1528804583552
Tim,Rain,delete,1528804583553
Mary,City of Starts,like,1528804583553
Jack,Lemon,like,1528804583556
Edward,Lemon,download,1528804583556
Tim,Lemon,download,1528804583558
Milly,Fish in the pool,like,1528804583558
Tim,Planet,download,1528804583559
Edward,Rain,like,1528804583559
Tim,Rain,like,1528804583559
2018-06-12 19:56:23,559 INFO [org.apache.kafka.clients.producer.KafkaProducer] - [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
As soon as the producer sends the messages, the consumer receives them; its output is as follows.
2018-06-12 19:56:16,833 INFO [org.apache.kafka.clients.consumer.internals.AbstractCoordinator] - [Consumer clientId=consumer-1, groupId=something] Successfully joined group with generation 29
2018-06-12 19:56:16,833 INFO [org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] - [Consumer clientId=consumer-1, groupId=something] Setting newly assigned partitions [test-music-topic-0]
offset = 121, key = 0, value = Tim,Lemon,like,1528804583071
offset = 122, key = 1, value = Jackson,City of Starts,like,1528804583552
offset = 123, key = 2, value = Tim,Rain,delete,1528804583553
offset = 124, key = 3, value = Mary,City of Starts,like,1528804583553
offset = 125, key = 4, value = Jack,Lemon,like,1528804583556
offset = 126, key = 5, value = Edward,Lemon,download,1528804583556
offset = 127, key = 6, value = Tim,Lemon,download,1528804583558
offset = 128, key = 7, value = Milly,Fish in the pool,like,1528804583558
offset = 129, key = 8, value = Tim,Planet,download,1528804583559
offset = 130, key = 9, value = Edward,Rain,like,1528804583559
offset = 131, key = 10, value = Tim,Rain,like,1528804583559
<properties>
    <spark.version>2.2.0</spark.version>
</properties>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
This job receives the data generated by the producer and prints it. The code is as follows:
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

object UserActionStreaming {
  def main(args: Array[String]) {
    val group = "something"
    val topics = "test-music-topic"
    val conf = new SparkConf().setAppName("pvuv").setMaster("local[3]")
    val sc = new SparkContext(conf)
    // Batch interval of 10 seconds.
    val ssc = new StreamingContext(sc, Seconds(10))
    ssc.checkpoint("data/spark/checkpoint")
    val topicSets = topics.split(",").toSet
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> group,
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topicSets, kafkaParams)
    )
    // Print each (key, value) pair as it arrives.
    stream.map(record => (record.key(), record.value())).foreachRDD(rdd => rdd.foreach(println))
    ssc.start()
    ssc.awaitTermination()
  }
}
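The checkpoint directory configured above is only genuinely needed once stateful operations are used. As a hypothetical extension (not part of the original job; it would go before ssc.start() and reuses the stream defined above), a sliding-window count of events per user could look roughly like this:
// Hypothetical stateful extension: count events per user over a 60-second
// window sliding every 10 seconds; the inverse function (_ - _) keeps the
// computation incremental, which is what requires checkpointing.
val userCounts = stream
  .map(record => (record.value().split(",")(0), 1L)) // user name is the first CSV field
  .reduceByKeyAndWindow(_ + _, _ - _, Seconds(60), Seconds(10))
userCounts.print()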
Run UserActionStreaming first, then run the producer; UserActionStreaming only receives messages, and thus only produces output, after the producer has sent them. The producer's full run output is shown below.
2018-06-12 15:21:17,472 INFO [org.apache.kafka.clients.producer.ProducerConfig] - ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [localhost:9092]
buffer.memory = 33554432
client.id =
compression.type = none
connections.max.idle.ms = 540000
enable.idempotence = false
... ...
2018-06-12 15:21:18,033 INFO [org.apache.kafka.common.utils.AppInfoParser] - Kafka version : 1.1.0
2018-06-12 15:21:18,033 INFO [org.apache.kafka.common.utils.AppInfoParser] - Kafka commitId : fdcf75ea326b8e07
2018-06-12 15:21:18,629 INFO [org.apache.kafka.clients.Metadata] - Cluster ID: u88DYmIoSJCSkoWG2EdXDQ
Jack,Rain,delete,1528788078392
Milly,Planet,like,1528788078674
Jack,Lemon,delete,1528788078675
Mary,Rain,store,1528788078675
Edward,Planet,store,1528788078675
Milly,Fish in the pool,delete,1528788078675
Milly,City of Starts,download,1528788078676
Mary,Planet,delete,1528788078680
Milly,Fish in the pool,like,1528788078680
Jack,Summer,like,1528788078680
Jackson,Fish in the pool,store,1528788078680
2018-06-12 15:21:18,681 INFO [org.apache.kafka.clients.producer.KafkaProducer] - [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
The output of UserActionStreaming is as follows.
2018-06-12 15:21:20,134 INFO [org.apache.kafka.common.utils.AppInfoParser] - Kafka version : 1.1.0
2018-06-12 15:21:20,134 INFO [org.apache.kafka.common.utils.AppInfoParser] - Kafka commitId : fdcf75ea326b8e07
2018-06-12 15:21:20,138 INFO [org.apache.spark.streaming.kafka010.CachedKafkaConsumer] - Initial fetch for spark-executor-something test-music-topic 0 55
2018-06-12 15:21:20,151 INFO [org.apache.kafka.clients.Metadata] - Cluster ID: u88DYmIoSJCSkoWG2EdXDQ
(0,Jack,Rain,delete,1528788078392)
(1,Milly,Planet,like,1528788078674)
(2,Jack,Lemon,delete,1528788078675)
(3,Mary,Rain,store,1528788078675)
(4,Edward,Planet,store,1528788078675)
(5,Milly,Fish in the pool,delete,1528788078675)
(6,Milly,City of Starts,download,1528788078676)
(7,Mary,Planet,delete,1528788078680)
(8,Milly,Fish in the pool,like,1528788078680)
(9,Jack,Summer,like,1528788078680)
(10,Jackson,Fish in the pool,store,1528788078680)