1. A Kafka sink with exactly-once semantics that writes to Kafka in round-robin fashion.
2. Round-robin mode: when constructing the FlinkKafkaProducer, pass an empty customPartitioner (Optional.empty()); each sink subtask then spreads its records across all partitions of the Kafka topic in round-robin fashion.
3. Note: with this approach there is no need to set an explicit sink parallelism (see the usage sketch after the method below).
/**
 * Build an exactly-once FlinkKafkaProducer with no custom partitioner, so that
 * records are spread across the topic's partitions in round-robin fashion.
 *
 * @param brokerList      Kafka broker list (bootstrap.servers)
 * @param maxMessageSize  maximum request size, in MB
 * @param topic           target topic
 * @param compressionType compression type (e.g. lz4, snappy, gzip)
 * @return FlinkKafkaProducer
 */
def producerToKafkaRobin(
    brokerList: String,
    maxMessageSize: Int,
    topic: String,
    compressionType: String
): FlinkKafkaProducer[String] = {
  val producerProperties = new Properties()
  // Raise the maximum request size (maxMessageSize is in MB, converted to bytes)
  producerProperties.setProperty(
    ProducerConfig.MAX_REQUEST_SIZE_CONFIG,
    (maxMessageSize * 1048576).toString
  )
  // Enable compression
  producerProperties.setProperty(
    ProducerConfig.COMPRESSION_TYPE_CONFIG,
    compressionType
  )
  // Configure bootstrap.servers
  producerProperties.setProperty(
    ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
    brokerList
  )
  // Transaction timeout: 12 minutes here, so it stays below the broker's
  // transaction.max.timeout.ms (15 minutes by default)
  producerProperties.setProperty(
    "transaction.timeout.ms",
    (1000 * 60 * 12).toString
  )
  // acks=-1 (all in-sync replicas must acknowledge) to avoid data loss
  producerProperties.setProperty(ProducerConfig.ACKS_CONFIG, "-1")
  // Idempotence must be enabled for exactly-once
  producerProperties.setProperty(
    "enable.idempotence",
    "true"
  )
  // With retries set, a Kafka partition leader switch does not fail the Flink job;
  // the producer retries up to 5 times instead.
  producerProperties.setProperty(ProducerConfig.RETRIES_CONFIG, "5")
  // Retries can reorder messages; if strict ordering is required,
  // keep max.in.flight.requests.per.connection at 1.
  producerProperties.setProperty(
    ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION,
    "1"
  )
  new FlinkKafkaProducer[String](
    topic,
    new KeyedSerializationSchemaWrapper[String](new SimpleStringSchema()),
    producerProperties,
    Optional.empty(),
    FlinkKafkaProducer.Semantic.EXACTLY_ONCE,
    FlinkKafkaProducer.DEFAULT_KAFKA_PRODUCERS_POOL_SIZE
  )
}
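A minimal usage sketch, assuming producerToKafkaRobin is in scope and using placeholder broker, topic, and parameter values. EXACTLY_ONCE commits its Kafka transactions on checkpoint completion, so checkpointing must be enabled on the environment; no explicit sink parallelism is set for this variant.

import org.apache.flink.streaming.api.scala._

object RobinSinkExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // EXACTLY_ONCE commits Kafka transactions when checkpoints complete,
    // so checkpointing must be enabled.
    env.enableCheckpointing(60 * 1000L)

    val stream: DataStream[String] = env.fromElements("a", "b", "c")

    // No explicit sink parallelism is needed for the round-robin variant.
    stream.addSink(
      producerToKafkaRobin(
        brokerList = "broker1:9092,broker2:9092", // hypothetical broker list
        maxMessageSize = 10,                      // 10 MB request size cap
        topic = "output-topic",                   // hypothetical topic
        compressionType = "lz4"
      )
    )

    env.execute("robin-sink-example")
  }
}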
1. A Kafka sink with exactly-once semantics, using a fixed partition mapping.
2. fixedPartition: each FlinkKafkaProducer subtask writes to one Kafka partition. In this mode the producer parallelism should be no smaller than the number of partitions of the target topic, otherwise some partitions receive no data.
3. Note: with this approach the sink parallelism must be set deliberately, so that sink parallelism >= number of topic partitions (see the usage sketch after the method below).
/**
 * Build an exactly-once FlinkKafkaProducer where each sink subtask writes to
 * a fixed Kafka partition.
 *
 * @param brokerList      Kafka broker list (bootstrap.servers)
 * @param maxMessageSize  maximum request size, in MB
 * @param topic           target topic
 * @param compressionType compression type (e.g. lz4, snappy, gzip)
 * @return FlinkKafkaProducer
 */
def producerToKafkaFixed(
    brokerList: String,
    maxMessageSize: Int,
    topic: String,
    compressionType: String
): FlinkKafkaProducer[String] = {
  val producerProperties = new Properties()
  // Raise the maximum request size (maxMessageSize is in MB, converted to bytes)
  producerProperties.setProperty(
    ProducerConfig.MAX_REQUEST_SIZE_CONFIG,
    (maxMessageSize * 1048576).toString
  )
  // Enable compression
  producerProperties.setProperty(
    ProducerConfig.COMPRESSION_TYPE_CONFIG,
    compressionType
  )
  // Configure bootstrap.servers
  producerProperties.setProperty(
    ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
    brokerList
  )
  // Transaction timeout: 12 minutes here, so it stays below the broker's
  // transaction.max.timeout.ms (15 minutes by default)
  producerProperties.setProperty(
    "transaction.timeout.ms",
    (1000 * 60 * 12).toString
  )
  // acks=-1 (all in-sync replicas must acknowledge) to avoid data loss
  producerProperties.setProperty(ProducerConfig.ACKS_CONFIG, "-1")
  // Idempotence must be enabled for exactly-once
  producerProperties.setProperty(
    "enable.idempotence",
    "true"
  )
  // Keep at most one in-flight request per connection to preserve ordering
  producerProperties.setProperty(
    ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION,
    "1"
  )
  new FlinkKafkaProducer[String](
    topic,
    new LatestSimpleStringSchema(topic),
    producerProperties,
    FlinkKafkaProducer.Semantic.EXACTLY_ONCE
  )
}
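A minimal usage sketch for the fixed-partition variant, assuming producerToKafkaFixed is in scope and that the hypothetical output topic has 4 partitions; the sink parallelism is set explicitly so that every partition is covered.

import org.apache.flink.streaming.api.scala._

object FixedSinkExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.enableCheckpointing(60 * 1000L) // checkpointing is required for EXACTLY_ONCE

    val stream: DataStream[String] = env.fromElements("a", "b", "c")

    // Sink parallelism >= number of topic partitions (assumed to be 4 here),
    // so every partition receives data.
    stream.addSink(
      producerToKafkaFixed(
        brokerList = "broker1:9092,broker2:9092", // hypothetical broker list
        maxMessageSize = 10,                      // 10 MB request size cap
        topic = "output-topic",                   // hypothetical 4-partition topic
        compressionType = "lz4"
      )
    ).setParallelism(4)

    env.execute("fixed-sink-example")
  }
}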
LatestSimpleStringSchema.java (the serialization schema used above, implemented in Java):
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.nio.charset.StandardCharsets;

public class LatestSimpleStringSchema implements KafkaSerializationSchema<String> {

    private static final long serialVersionUID = 1221534846982366764L;

    private final String topic;

    public LatestSimpleStringSchema(String topic) {
        super();
        this.topic = topic;
    }

    @Override
    public ProducerRecord<byte[], byte[]> serialize(String message, Long timestamp) {
        // No key and no explicit partition: partitioning is left to the producer.
        return new ProducerRecord<>(topic, message.getBytes(StandardCharsets.UTF_8));
    }
}
Reference
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-producer-partitioning-scheme