Flink + Kafka: committing offsets

Kafka Consumers Offset Committing Behaviour Configuration
The Flink Kafka Consumer allows configuring the behaviour of how offsets are committed back to Kafka brokers (or Zookeeper in 0.8). Note that the Flink Kafka Consumer does not rely on the committed offsets for fault tolerance guarantees. The committed offsets are only a means to expose the consumer’s progress for monitoring purposes.

The way to configure offset commit behaviour is different, depending on whether or not checkpointing is enabled for the job.

  • Checkpointing disabled: if checkpointing is disabled, the Flink Kafka Consumer relies on the automatic periodic offset committing capability of the internally used Kafka clients. Therefore, to disable or enable offset committing, simply set the enable.auto.commit (or auto.commit.enable for Kafka 0.8) / auto.commit.interval.ms keys to appropriate values in the provided Properties configuration.

  • Checkpointing enabled: if checkpointing is enabled, the Flink Kafka Consumer will commit the offsets stored in the checkpointed states when the checkpoints are completed. This ensures that the committed offsets in the Kafka brokers are consistent with the offsets in the checkpointed states. Users can choose to disable or enable offset committing by calling the setCommitOffsetsOnCheckpoints(boolean) method on the consumer (by default, the behaviour is true). Note that in this scenario, the automatic periodic offset committing settings in Properties are completely ignored.

What this means
How the Flink Kafka consumer commits offsets depends on whether checkpointing is enabled for the job.

With checkpointing disabled, offset committing relies on the Kafka client's auto-commit. Set the enable.auto.commit and auto.commit.interval.ms parameters in the consumer Properties, and offsets will be auto-committed to Kafka periodically at the fixed interval, as in the snippet below.
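For example, a minimal sketch of the checkpointing-disabled configuration (the broker address, group id, and the 5-second interval are placeholder values reused from the examples further down):

scala

import java.util.Properties

val properties = new Properties
properties.setProperty("bootstrap.servers", "47.110.138.240:9092")
properties.setProperty("group.id", "consumer1")
// With checkpointing disabled, these two keys control offset committing:
// the Kafka client auto-commits offsets in the background every 5 seconds.
properties.setProperty("enable.auto.commit", "true")
properties.setProperty("auto.commit.interval.ms", "5000")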

With checkpointing enabled, the offsets the job consumes are managed and made fault-tolerant by Flink itself in state. Committing offsets to Kafka then serves only as external progress monitoring, so that you can see the job's consumption position and lag in near real time. Call setCommitOffsetsOnCheckpoints(true) so that offsets are committed to Kafka whenever a checkpoint completes. The commit interval is then tied to the checkpoint interval, so the lag seen from the Kafka side may not be fully real-time; if the checkpoint interval is long, the lag curve may look like a sawtooth.

Now, without further ado, the code.

scala


import java.util.{Date, Properties}

import org.apache.flink.api.common.functions.MapFunction
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
import org.apache.kafka.common.serialization.StringDeserializer


object Main {

  def main(args: Array[String]): Unit = {

    // Initialize the environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Checkpoint interval in milliseconds
    env.enableCheckpointing(5000)
    // Set the checkpointing mode to exactly-once (this is the default)
    env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)

    // Kafka consumer; "test" is the topic to consume
    val consumer = new FlinkKafkaConsumer010[String]("test", new SimpleStringSchema(), getKafkaConfig)
    // Start consuming from the earliest record in the topic
    // consumer.setStartFromEarliest()
    // Commit offsets back to Kafka when checkpoints complete (the on-checkpoint mode)
    consumer.setCommitOffsetsOnCheckpoints(true)

    // Add the consumer as a source
    val stream = env.addSource(consumer)
    // Parallelism
    // stream.setParallelism(3)
    stream.map(new MapFunction[String, String]() {
      override def map(value: String): String = {
        new Date().toString + ":  " + value
      }
    }).print

    // Launch the job
    env.execute("Flink Streaming")

  }

  /**
   * Kafka consumer configuration
   *
   * @return consumer properties
   */
  def getKafkaConfig: Properties = {

    val properties = new Properties
    properties.setProperty("bootstrap.servers", "47.110.138.240:9092")
    properties.setProperty("group.id", "consumer1")
    // If true, the consumer's offsets are periodically committed to Kafka in the background.
    // With checkpointing enabled, this setting is ignored: offsets are committed on checkpoints instead.
    properties.setProperty("enable.auto.commit", "false")
    // Auto-commit interval (also ignored when checkpointing is enabled).
    //    properties.setProperty("auto.commit.interval.ms", "500")
    properties.setProperty("key.deserializer", classOf[StringDeserializer].getName)
    properties.setProperty("value.deserializer", classOf[StringDeserializer].getName)

    properties
  }
}

java

package com.inhertech;


import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

import java.util.Date;
import java.util.Properties;

public class ReadFromKafka {

    public static void main(String[] args) throws Exception {
        // Set up the environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Enable checkpointing; without it, setCommitOffsetsOnCheckpoints has no effect
        env.enableCheckpointing(5000);

        // Kafka consumer configuration
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "47.110.138.240:9092");
        // zookeeper.connect is only required by the Kafka 0.8 connector; the 0.10 consumer ignores it
        properties.setProperty("zookeeper.connect", "47.110.138.240:2181");
        properties.setProperty("group.id", "console-consumer-44911");
        // Ignored when checkpointing is enabled; offsets are committed on checkpoints instead
        properties.setProperty("enable.auto.commit", "false");
        // Auto-commit interval in milliseconds
        //properties.setProperty("auto.commit.interval.ms", "500");


        FlinkKafkaConsumer010<String> consumer = new FlinkKafkaConsumer010<>("test", new SimpleStringSchema(), properties);

        // Commit offsets back to Kafka when checkpoints complete (the on-checkpoint mode)
        consumer.setCommitOffsetsOnCheckpoints(true);

        DataStream<String> stream = env.addSource(consumer);

        stream.map(new MapFunction<String, String>() {

            @Override
            public String map(String value) throws Exception {
                return new Date().toString() + ":  " + value;
            }

        }).print();

        env.execute("Flink Streaming");

    }

}
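To verify from the Kafka side that the committed offsets really do advance once per checkpoint, you can query the group's committed offsets. Below is a minimal sketch, assuming kafka-clients 2.0+ on the classpath (listConsumerGroupOffsets is not available in the 0.10 client); the broker address and group id are the ones from the Scala example above:

scala

import java.util.Properties

import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig}

object CheckCommittedOffsets {

  def main(args: Array[String]): Unit = {
    val props = new Properties
    props.setProperty(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "47.110.138.240:9092")

    val admin = AdminClient.create(props)
    try {
      // Committed offsets for the group, keyed by topic-partition.
      // With on-checkpoint committing, these should advance once per checkpoint interval.
      val offsets = admin.listConsumerGroupOffsets("consumer1")
        .partitionsToOffsetAndMetadata()
        .get()
      offsets.forEach((tp, om) => println(s"$tp -> ${om.offset()}"))
    } finally {
      admin.close()
    }
  }
}

Alternatively, the kafka-consumer-groups.sh tool that ships with Kafka gives the same view from the command line, including the lag column.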
