1. Download ZooKeeper: http://zookeeper.apache.org/releases.html
2. Extract it, rename zoo_sample.cfg in the conf folder to zoo.cfg, and edit the configuration (the resulting file is shown after this list):
# Change this entry:
dataDir=D:/dzy/envpath/zookeeper-3.4.14/data
# Add this entry:
dataLogDir=D:/dzy/envpath/zookeeper-3.4.14/logs
3. Add the environment variable ZOOKEEPER_HOME=D:\dzy\envpath\zookeeper-3.4.14 (the directory extracted above) and append %ZOOKEEPER_HOME%\bin to Path.
4. Start ZooKeeper: open a cmd window and run: zkServer
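For reference, after these edits zoo.cfg typically looks like the following; tickTime, initLimit, syncLimit, and clientPort are the defaults carried over from zoo_sample.cfg, and only the two directory entries were changed or added:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=D:/dzy/envpath/zookeeper-3.4.14/data
dataLogDir=D:/dzy/envpath/zookeeper-3.4.14/logs
clientPort=2181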
1. Download and extract Kafka: http://kafka.apache.org/downloads.html
2. In the config folder, edit server.properties and change the log directory setting:
log.dirs=D:/dzy/envpath/kafka_2.11-2.1.1/logs
3. Start the Kafka broker (ZooKeeper must already be running); from the Kafka install directory run:
.\bin\windows\kafka-server-start.bat .\config\server.properties
4. Using Kafka and common commands:
# Create a topic: go to \bin\windows under the Kafka install directory, Shift + right-click, choose "Open command window here", then run:
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
# Create a producer and a consumer to test the broker.
# Open new command windows in \bin\windows under the Kafka install directory; the producer and the consumer each need their own window.
# Start the producer:
kafka-console-producer.bat --broker-list localhost:9092 --topic test
# Start the consumer:
kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic test
# Type a few messages in the producer window; if they appear in the consumer window, Kafka is working.
List topics
kafka-topics.bat --list --zookeeper localhost:2181
Describe a topic
kafka-topics.bat --describe --zookeeper localhost:2181 --topic [topic name]
Read messages from the beginning (the console consumer's old --zookeeper option was removed in Kafka 2.0, so use --bootstrap-server)
kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic [topic name] --from-beginning
Delete a topic
kafka-run-class.bat kafka.admin.TopicCommand --delete --topic [topic_to_delete] --zookeeper localhost:2181
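The same deletion can also be done through kafka-topics.bat, which wraps kafka.admin.TopicCommand. Note that deletion only takes effect when delete.topic.enable=true in server.properties (the default since Kafka 1.0):
kafka-topics.bat --delete --zookeeper localhost:2181 --topic [topic_to_delete]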
pom.xml
<repositories>
    <repository>
        <id>nexus-aliyun</id>
        <name>nexus-aliyun</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        <releases>
            <enabled>true</enabled>
        </releases>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.12</artifactId>
        <version>1.1.0</version>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.16</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
        <version>2.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.3.1</version>
    </dependency>
</dependencies>
<build>
    <sourceDirectory>src/main/com.bonc.spark_test</sourceDirectory>
    <testSourceDirectory>src/test/test</testSourceDirectory>
</build>
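To confirm that these dependencies resolve and to see what they pull in transitively (this is where the Jackson conflict described further below shows up), Maven's standard goal is:
mvn dependency:tree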
Spark Streaming program that simply prints the messages:
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
/**
* Created by DZY on 2019/4/19.
*/
object StreamingKafka {
  def main(args: Array[String]): Unit = {
    // No master is set here; supply it via spark-submit, or add setMaster for local runs (see below)
    val conf = new SparkConf().setAppName("sparkStreamingTest").set("spark.streaming.stopGracefullyOnShutdown", "true")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Kafka consumer configuration
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "streaming-kafka-test",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val topics = Array("test")
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topics, kafkaParams)
    )

    // Print the value of every record in each batch
    stream.foreachRDD(_.foreachPartition(_.foreach(record => println(record.value()))))

    ssc.start()
    ssc.awaitTermination()
  }
}
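Since enable.auto.commit is false, the program above never stores its offsets, so a restart simply picks up from the latest messages. A minimal sketch of committing offsets back to Kafka after each batch, using the HasOffsetRanges / CanCommitOffsets API from spark-streaming-kafka-0-10 (it would replace the single foreachRDD line above):

stream.foreachRDD { rdd =>
  // Capture the offset ranges this batch covers before doing anything else with the RDD
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.foreachPartition(_.foreach(record => println(record.value())))
  // Commit the offsets back to Kafka once the batch's output has finished
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}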
If Spark throws an error when reading from Kafka because another artifact pulls in a higher Jackson version, exclude the transitive Jackson dependency from the offending artifact:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.12</artifactId>
    <version>1.1.0</version>
    <exclusions>
        <exclusion>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
Running Spark locally on Windows also needs the Hadoop winutils binaries:
1. Download hadoop-common-2.2.0-bin and extract it to some directory:
https://github.com/srccodes/hadoop-common-2.2.0-bin
2. Set hadoop.home.dir to that directory, before the SparkConf is created:
System.setProperty("hadoop.home.dir", "D:\\dzy\\envpath\\hadoop-common-2.2.0-bin-master")
To run the program locally in the IDE, set the master explicitly on the SparkConf (on a cluster the master is normally passed to spark-submit instead; see the sketch after the list below):
val conf = new SparkConf().setMaster("local[2]").setAppName("sparkStreamingTest").set("spark.streaming.stopGracefullyOnShutdown", "true")
local — run locally with a single thread
local[K] — run locally with K threads (K cores)
local[*] — run locally with as many threads as there are available cores
spark://HOST:PORT — connect to the given Spark standalone cluster master; the port must be specified
mesos://HOST:PORT — connect to the given Mesos cluster; the port must be specified
yarn-client (client mode) — connect to a YARN cluster; HADOOP_CONF_DIR must be configured
yarn-cluster (cluster mode) — connect to a YARN cluster; HADOOP_CONF_DIR must be configured
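When submitting to a cluster, the master is usually not hardcoded but passed on the command line; since Spark 2.x the documented form for YARN is --master yarn together with --deploy-mode client|cluster rather than the yarn-client/yarn-cluster strings above. A sketch, where the jar name spark-test.jar is a placeholder for whatever the project actually builds (the class may also need its package prefix):
spark-submit --class StreamingKafka --master local[2] spark-test.jar
spark-submit --class StreamingKafka --master yarn --deploy-mode client spark-test.jar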
//Set the log level to cut down console noise
ssc.sparkContext.setLogLevel("WARN")
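Putting the local-run tweaks together, the start of main() would look roughly like this (the hadoop.home.dir path is the example directory from above):

def main(args: Array[String]): Unit = {
  // Point Spark at the winutils binaries when running on Windows
  System.setProperty("hadoop.home.dir", "D:\\dzy\\envpath\\hadoop-common-2.2.0-bin-master")
  val conf = new SparkConf()
    .setMaster("local[2]")
    .setAppName("sparkStreamingTest")
    .set("spark.streaming.stopGracefullyOnShutdown", "true")
  val ssc = new StreamingContext(conf, Seconds(2))
  // Keep the console readable
  ssc.sparkContext.setLogLevel("WARN")
  // ... Kafka parameters and stream setup as in the program above ...
}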