Spark Streaming 2.4 integration with Kafka 0.10

The Maven dependencies are as follows:


<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.shufang</groupId>
    <artifactId>sparkstreaming-kafka-offset</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <!-- Spark core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.0</version>
        </dependency>

        <!-- Spark SQL -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.4.0</version>
        </dependency>

        <!-- Spark Streaming -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>2.4.0</version>
        </dependency>

        <!-- MySQL JDBC driver -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.47</version>
        </dependency>

        <!-- Spark Streaming integration with Kafka 0.10+ -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>2.4.0</version>
        </dependency>

    </dependencies>

</project>

A simple example program follows.

Prerequisites:

1. Start ZooKeeper (and the Kafka broker).

2. Start the Spark Streaming program.

3. Start kafka-console-producer, keeping its serializers consistent with the deserializers configured on the consumer side.

package com.shufang.sparkstreaming

import java.util.Collections

import com.shufang.utils.SparkApiUtil
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka010._


object KafkaStreamDemo {

  def main(args: Array[String]): Unit = {

    // Obtain the StreamingContext (ssc)
    val ssc: StreamingContext = SparkApiUtil.getSsc("kafkastream", "local[*]", 5)

    // Kafka consumer configuration
    val kafkaParams = Map[String, Object](
      ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "localhost:9092",
      ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
      ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
      ConsumerConfig.AUTO_OFFSET_RESET_CONFIG -> "latest",
      ConsumerConfig.GROUP_ID_CONFIG -> "console-group",
      // This must be written as a java.lang.Boolean rather than a plain true, because the map's value type is Object
      ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> (true: java.lang.Boolean)
    )


    // Topics to consume
    val topics: Iterable[String] = Array("console-topic").toIterable


    // Create the direct stream; remember to specify the key/value generic types
    val kafkaStream = KafkaUtils.createDirectStream[String, String](
      ssc,
      // PreferBrokers assumes executors run on the same hosts as the Kafka brokers; otherwise PreferConsistent is the usual choice
      LocationStrategies.PreferBrokers,
      // Remember to add the generic types on Subscribe as well, otherwise they cannot be inferred automatically
      ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
    )

    // Consume the stream and run the computation...
    kafkaStream.map(_.value()).print()

    // Start the computation
    ssc.start()

    ssc.awaitTermination()
  }
}
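
The SparkApiUtil helper used above is not shown in this post. A minimal sketch of what getSsc could look like, assuming its parameters are the app name, the master URL, and the batch interval in seconds:

package com.shufang.utils

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SparkApiUtil {

  // Build a StreamingContext from an app name, a master URL and a batch interval in seconds
  def getSsc(appName: String, master: String, batchSeconds: Long): StreamingContext = {
    val conf = new SparkConf().setAppName(appName).setMaster(master)
    new StreamingContext(conf, Seconds(batchSeconds))
  }
}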


// If you do not want offsets to be committed automatically, set
ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> (false: java.lang.Boolean)

// Then make the processing logic idempotent or transactional to achieve exactly-once semantics
For example, a Redis Set can be used to guarantee idempotency.
Commit the offsets manually, but make sure the commit and the processing run inside the same transactional operation.
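
A minimal sketch of the manual-commit path with the 0-10 integration: commitAsync writes the offsets back to Kafka only after the batch has been processed. For a strictly transactional guarantee, the offsets would instead be stored together with the results in an external store (for example MySQL; the pom above already includes mysql-connector-java).

// With enable.auto.commit set to false, commit offsets only after the batch is processed
kafkaStream.foreachRDD { rdd =>
  // Offset ranges are only available on the stream returned directly by createDirectStream
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // Process the batch; the write to the sink should be idempotent (e.g. a Redis Set)
  // or be part of the same transaction as the offset update
  rdd.map(_.value()).foreach(println)

  // Asynchronously commit the consumed offsets back to Kafka
  kafkaStream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}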
