[Flink Basics] -- Two Ways to Write to Kafka

Method 1: write to Kafka round-robin

1. A Kafka sink with exactly-once semantics that distributes records round-robin across partitions.

2. Round-robin: when creating the FlinkKafkaProducer, pass an empty customPartitioner; each sink subtask then spreads its records over all partitions of the Kafka topic in round-robin fashion.

3. Note: with this approach the sink parallelism does not need to be matched to the topic's partition count. A usage sketch follows the producer code below.

import java.util.{Optional, Properties}

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer
import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaPartitioner
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper
import org.apache.kafka.clients.producer.ProducerConfig

  /**
    * Builds an exactly-once Kafka sink that writes round-robin across partitions.
    *
    * @param brokerList      Kafka broker list (bootstrap.servers)
    * @param maxMessageSize  max request size, in MB
    * @param topic           target topic
    * @param compressionType compression type (e.g. lz4, gzip)
    * @return FlinkKafkaProducer
    */
  def producerToKafkaRobin(
    brokerList: String,
    maxMessageSize: Int,
    topic: String,
    compressionType: String
  ): FlinkKafkaProducer[String] = {
    val producerProperties = new Properties()
    // raise the limit on request size (maxMessageSize is in MB)
    producerProperties.setProperty(
      ProducerConfig.MAX_REQUEST_SIZE_CONFIG,
      (maxMessageSize * 1048576).toString
    )

    // enable compression
    producerProperties.setProperty(
      ProducerConfig.COMPRESSION_TYPE_CONFIG,
      compressionType
    )

    // configure bootstrap.servers
    producerProperties.setProperty(
      ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
      brokerList
    )

    // must not exceed the broker's transaction.max.timeout.ms (15 minutes by default);
    // 12 minutes is used here
    producerProperties.setProperty(
      "transaction.timeout.ms",
      (1000 * 60 * 12).toString
    )

    // acks = all (-1): the leader waits for the full ISR, so acknowledged records are not lost
    producerProperties.setProperty(ProducerConfig.ACKS_CONFIG, "all")

    // idempotence is required for exactly-once
    producerProperties.setProperty(
      ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG,
      "true"
    )

    // with retries set, a Kafka partition leader switch does not force a Flink restart;
    // the producer retries up to 5 times instead
    producerProperties.setProperty(ProducerConfig.RETRIES_CONFIG, "5")
    // retries can reorder messages; for strict ordering, cap in-flight requests at 1
    producerProperties.setProperty(
      ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION,
      "1"
    )

    new FlinkKafkaProducer[String](
      topic,
      new KeyedSerializationSchemaWrapper[String](new SimpleStringSchema()),
      producerProperties,
      // an empty partitioner lets Kafka's own partitioner spread keyless records round-robin
      Optional.empty[FlinkKafkaPartitioner[String]](),
      FlinkKafkaProducer.Semantic.EXACTLY_ONCE,
      FlinkKafkaProducer.DEFAULT_KAFKA_PRODUCERS_POOL_SIZE
    )
  }
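A minimal usage sketch, not from the original post; `env`, the broker list, port, and topic names are placeholders. EXACTLY_ONCE commits Kafka transactions when Flink checkpoints complete, so checkpointing must be enabled:

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.enableCheckpointing(60 * 1000) // transactions are committed on checkpoint completion

val stream: DataStream[String] = env.socketTextStream("localhost", 9999)
stream.addSink(
  producerToKafkaRobin(
    brokerList = "broker1:9092,broker2:9092",
    maxMessageSize = 10, // MB
    topic = "demo-topic",
    compressionType = "lz4"
  )
)
env.execute("kafka-round-robin-sink")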

 

Method 2: fixed partitioning (FlinkFixedPartitioner)

 1. A Kafka sink with exactly-once semantics.
 2. Fixed partitioning: each sink subtask writes to a single Kafka partition (subtask index modulo partition count, sketched below).
 3. Note: with this approach the sink parallelism must be set deliberately. If it is smaller than the topic's partition count, some partitions receive no data; if it is larger, several subtasks share one partition. Setting the parallelism equal to the partition count is the usual choice.
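For reference, the partition choice inside Flink's FlinkFixedPartitioner boils down to the following (simplified; the real method also receives the record, key, and target topic):

// each parallel subtask always lands on the same partition
def partition(parallelInstanceId: Int, partitions: Array[Int]): Int =
  partitions(parallelInstanceId % partitions.length)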

/**
    * Builds an exactly-once Kafka sink with a fixed subtask-to-partition mapping.
    *
    * @param brokerList      Kafka broker list (bootstrap.servers)
    * @param maxMessageSize  max request size, in MB
    * @param topic           target topic
    * @param compressionType compression type (e.g. lz4, gzip)
    * @return FlinkKafkaProducer
    */
  def producerToKafkaFixed(
    brokerList: String,
    maxMessageSize: Int,
    topic: String,
    compressionType: String
  ): FlinkKafkaProducer[String] = {

    val producerProperties = new Properties()
    // raise the limit on request size (maxMessageSize is in MB)
    producerProperties.setProperty(
      ProducerConfig.MAX_REQUEST_SIZE_CONFIG,
      (maxMessageSize * 1048576).toString
    )
    // enable compression
    producerProperties.setProperty(
      ProducerConfig.COMPRESSION_TYPE_CONFIG,
      compressionType
    )

    // configure bootstrap.servers
    producerProperties.setProperty(
      ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
      brokerList
    )

    // must not exceed the broker's transaction.max.timeout.ms (15 minutes by default);
    // 12 minutes is used here
    producerProperties.setProperty(
      "transaction.timeout.ms",
      (1000 * 60 * 12).toString
    )

    // acks = all (-1): the leader waits for the full ISR, so acknowledged records are not lost
    producerProperties.setProperty(ProducerConfig.ACKS_CONFIG, "all")

    // idempotence is required for exactly-once
    producerProperties.setProperty(
      ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG,
      "true"
    )
    // keep at most one in-flight request so retries cannot reorder messages
    producerProperties.setProperty(
      ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION,
      "1"
    )

    new FlinkKafkaProducer[String](
      topic,
      // with a KafkaSerializationSchema, the ProducerRecord built by the schema
      // determines each record's placement (see the class below)
      new LatestSimpleStringSchema(topic),
      producerProperties,
      FlinkKafkaProducer.Semantic.EXACTLY_ONCE
    )

  }
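And a minimal wiring sketch under the same assumptions as above (`stream` is a DataStream[String]; `partitionCount` is a placeholder that must match the actual topic), with the sink parallelism pinned to the partition count:

val partitionCount = 8 // number of partitions of demo-topic
stream
  .addSink(producerToKafkaFixed("broker1:9092,broker2:9092", 10, "demo-topic", "lz4"))
  .setParallelism(partitionCount)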

 

The LatestSimpleStringSchema.java class:

import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.nio.charset.StandardCharsets;

public class LatestSimpleStringSchema implements KafkaSerializationSchema<String> {

    private static final long serialVersionUID = 1221534846982366764L;

    private final String topic;

    public LatestSimpleStringSchema(String topic) {
        this.topic = topic;
    }

    @Override
    public ProducerRecord<byte[], byte[]> serialize(String message, Long timestamp) {
        // no key and no explicit partition: placement is left to the Kafka producer
        return new ProducerRecord<>(topic, message.getBytes(StandardCharsets.UTF_8));
    }
}
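Since a KafkaSerializationSchema builds the ProducerRecord itself, it can also pin records to an explicit partition. A hypothetical Scala variant (PinnedPartitionSchema and its constructor arguments are illustrative, not part of the original post):

import java.nio.charset.StandardCharsets

import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema
import org.apache.kafka.clients.producer.ProducerRecord

class PinnedPartitionSchema(topic: String, partition: Int)
    extends KafkaSerializationSchema[String] {

  override def serialize(element: String, timestamp: java.lang.Long): ProducerRecord[Array[Byte], Array[Byte]] = {
    // ProducerRecord(topic, partition, key, value): an explicit partition index
    // bypasses Kafka's partitioner entirely
    new ProducerRecord[Array[Byte], Array[Byte]](
      topic,
      partition,
      null,
      element.getBytes(StandardCharsets.UTF_8)
    )
  }
}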

Reference

https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-producer-partitioning-scheme

 
