Running a Spark Streaming + Kafka test program locally in IDEA on Windows

1. Install ZooKeeper

1. Download ZooKeeper: http://zookeeper.apache.org/releases.html

2. Unzip it, rename zoo_sample.cfg in the conf folder to zoo.cfg, and change the following settings:

# change this setting:
dataDir=D:/dzy/envpath/zookeeper-3.4.14/data
# add this setting:
dataLogDir=D:/dzy/envpath/zookeeper-3.4.14/logs
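
For reference, a minimal single-node zoo.cfg might look like the sketch below (based on zoo_sample.cfg; clientPort 2181 is the ZooKeeper default that the Kafka setup below relies on):

tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=D:/dzy/envpath/zookeeper-3.4.14/data
dataLogDir=D:/dzy/envpath/zookeeper-3.4.14/logs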

3. Add the environment variable ZOOKEEPER_HOME=D:\dzy\envpath\zookeeper-3.4.14 and append %ZOOKEEPER_HOME%\bin to Path.

4. Start ZooKeeper: in a cmd window, run zkServer.
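
To confirm ZooKeeper is actually listening, you can connect with the bundled client from the same bin directory (2181 is the default client port):

zkCli.cmd -server 127.0.0.1:2181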

2. Install Kafka

1. Download and unzip Kafka: http://kafka.apache.org/downloads.html

2. In the config folder, edit server.properties and change the log directory setting:

log.dirs=D:/dzy/envpath/kafka_2.11-2.1.1/logs
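
Besides log.dirs, the rest of this walkthrough assumes the broker is reachable at localhost:9092 and ZooKeeper at localhost:2181. If your setup differs, these are the keys to check in server.properties (example values for a single local broker):

broker.id=0
listeners=PLAINTEXT://localhost:9092
zookeeper.connect=localhost:2181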

3. Start the Kafka server:

.\bin\windows\kafka-server-start.bat .\config\server.properties

4. Using Kafka and common commands:

# Create a topic: go to \bin\windows under the Kafka install directory, Shift + right-click, choose "Open command window here", then run:
kafka-topics.bat  --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

# Create a producer and a consumer to test the broker.
# Open new command windows in \bin\windows under the Kafka install directory; the producer and the consumer each need their own window.
# Start the producer:
kafka-console-producer.bat  --broker-list localhost:9092  --topic test

# Start the consumer:
kafka-console-consumer.bat  --bootstrap-server localhost:9092  --topic test

# Type some messages in the producer window; if they show up in the consumer window, Kafka is working. (A code-based producer sketch follows the command list below.)
List topics
kafka-topics.bat --list --zookeeper localhost:2181

Describe a topic
kafka-topics.bat --describe --zookeeper localhost:2181 --topic [topic name]

Read messages from the beginning (the console consumer in Kafka 2.x takes --bootstrap-server, not --zookeeper)
kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic [topic name] --from-beginning

Delete a topic
kafka-topics.bat --delete --zookeeper localhost:2181 --topic [topic name]
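
If you prefer driving the test from code rather than the console producer, a minimal Scala producer sketch (same topic "test" and broker localhost:9092 as above; the object name ProducerTest is just an example) could look like this:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ProducerTest {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // send a few test messages to the "test" topic
    (1 to 5).foreach(i => producer.send(new ProducerRecord[String, String]("test", s"message-$i")))
    producer.close()
  }
}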

3. Code

pom.xml

    <repositories>
        <repository>
            <id>nexus-aliyun</id>
            <name>nexus-aliyun</name>
            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.12</artifactId>
            <version>1.1.0</version>
        </dependency>

        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.16</version>
        </dependency>

        <!-- Spark Streaming integration for Kafka 0.10+ -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/com.bonc.spark_test</sourceDirectory>
        <testSourceDirectory>src/test/test</testSourceDirectory>
    </build>

The Spark Streaming program, which simply prints each message:

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe


/**
  * Created by DZY on 2019/4/19.
  */
object StreamingKafka {
  def main(args: Array[String]): Unit = {
    
    val conf = new SparkConf().setAppName("sparkStremingTest").set("spark.streaming.stopGracefullyOnShutdown", "true")
    val ssc = new StreamingContext(conf, Seconds(2))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "streaming-kafka-test",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val topics = Array("test")
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topics, kafkaParams)
    )
    
    stream.foreachRDD(_.foreachPartition(_.foreach(record => println(record.value()))))

    ssc.start()
    ssc.awaitTermination()

  }
}
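
Once the plain pass-through works, a natural next step is some per-batch processing. As a sketch (not part of the original program), the foreachRDD line above could be replaced with a word count over the message values:

    // split each message value into words and count them per micro-batch
    val counts = stream
      .map(_.value())
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.print()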

4. Troubleshooting

1.com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.9.4

Spark fails when reading from Kafka because another dependency pulls in a newer Jackson; exclude the transitive Jackson from that dependency:

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.12</artifactId>
            <version>1.1.0</version>
            <exclusions>
                <exclusion>
                    <groupId>com.fasterxml.jackson.core</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

2. Fixing "failed to locate the winutils binary in the hadoop binary path" when running Spark

1. Download hadoop-common-2.2.0-bin and unzip it to some directory:

 https://github.com/srccodes/hadoop-common-2.2.0-bin
 

2. Set hadoop.home.dir

System.setProperty("hadoop.home.dir", "D:\\dzy\\envpath\\hadoop-common-2.2.0-bin-master")
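
The property has to be set before the SparkContext/StreamingContext is created, so put it at the very top of main, for example:

  def main(args: Array[String]): Unit = {
    // point Hadoop at the directory that contains bin\winutils.exe (path is an example)
    System.setProperty("hadoop.home.dir", "D:\\dzy\\envpath\\hadoop-common-2.2.0-bin-master")
    val conf = new SparkConf().setAppName("sparkStremingTest").set("spark.streaming.stopGracefullyOnShutdown", "true")
    val ssc = new StreamingContext(conf, Seconds(2))
    // ...
  }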

3. A master URL must be set in your configuration

val conf = new SparkConf().setMaster("local[2]").setAppName("sparkStremingTest").set("spark.streaming.stopGracefullyOnShutdown", "true")
local              run locally with a single thread
local[K]           run locally with K threads (K cores)
local[*]           run locally with as many threads as there are available cores
spark://HOST:PORT  connect to the given Spark standalone cluster master; the port must be specified
mesos://HOST:PORT  connect to the given Mesos cluster; the port must be specified
yarn-client        client mode: connect to a YARN cluster; HADOOP_CONF_DIR must be configured
yarn-cluster       cluster mode: connect to a YARN cluster; HADOOP_CONF_DIR must be configured

4. Setting the log level of the test program

    // set the log level
    ssc.sparkContext.setLogLevel("WARN")
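
Alternatively, a log4j.properties on the classpath applies the level from startup (for a default Maven layout that would be src/main/resources; this pom overrides sourceDirectory, so adjust accordingly). A minimal sketch:

log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n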

 
