FlinkKafkaProducer exceptions with Semantic.EXACTLY_ONCE

When writing data to Kafka with Flink's FlinkKafkaProducer, the same code in the same environment works fine under Semantic.AT_LEAST_ONCE, but as soon as it is switched to Semantic.EXACTLY_ONCE the Flink job fails with the following exception:

org.apache.kafka.common.KafkaException: Unexpected error in InitProducerIdResponse; The transaction timeout is larger than the maximum value allowed by the broker (as configured by transaction.max.timeout.ms).
	at org.apache.kafka.clients.producer.internals.TransactionManager$InitProducerIdHandler.handleResponse(TransactionManager.java:984)
	at org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:909)
	at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
	at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:557)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:549)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:288)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:235)
	at java.lang.Thread.run(Thread.java:748)

Per the Flink documentation, the transaction timeout has to be adjusted first:

"thus transaction.max.timeout.ms should be increased before using the Semantic.EXACTLY_ONCE mode"

The mismatch behind the error: FlinkKafkaProducer sets the producer's transaction.timeout.ms to 1 hour by default, while the broker caps transactions with transaction.max.timeout.ms, which defaults to 15 minutes. Either raise the broker cap or lower the producer timeout to fit within it:

// 900000 ms = 15 minutes, equal to the broker's default transaction.max.timeout.ms cap
producerConfig.setProperty(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "900000");
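
The broker-side alternative, which is what the documentation literally recommends, is to raise the cap itself. A sketch of server.properties, assuming you can change the broker configuration:

    # server.properties (broker side): raise the cap so Flink's default 1-hour
    # producer transaction timeout fits under it
    transaction.max.timeout.ms=3600000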

With that error fixed, another one showed up almost immediately.
With FlinkKafkaProducer011, checkpoints failed continuously, yet neither the JobManager nor the TaskManager logs showed any error; the whole job seemed to be stuck in a paused state.
So I switched to FlinkKafkaProducer, the unversioned producer from the universal Kafka connector, and updated the code as follows:

Properties producerConfig = new Properties();
producerConfig.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, KAFKA_SINK_BOOTSTRAP);
// Keep within the broker's transaction.max.timeout.ms cap (default 15 minutes)
producerConfig.setProperty(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "900000");
producerConfig.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
producerConfig.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// The configuration 'transaction.state.log.min.isr' or 'transaction.state.log.replication.factor'
// was supplied but isn't a known config:
// producerConfig.put("transaction.state.log.replication.factor", 1);
// producerConfig.put("transaction.state.log.min.isr", 1);
producerConfig.setProperty(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-transaction");
producerConfig.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

FlinkKafkaProducer<String> kafkaProducer = new FlinkKafkaProducer<>(
        KAFKA_SINK_TOPIC_GENERAL,
        (KafkaSerializationSchema<String>) (element, timestamp) ->
                new ProducerRecord<>(KAFKA_SINK_TOPIC_GENERAL, element.getBytes()),
        producerConfig,
        FlinkKafkaProducer.Semantic.EXACTLY_ONCE);
transferStream.addSink(kafkaProducer);
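
One prerequisite worth making explicit: Semantic.EXACTLY_ONCE commits its Kafka transactions only when a checkpoint completes, so checkpointing must be enabled, otherwise read_committed consumers never see the data. A minimal sketch, with an illustrative 60-second interval (env is the job's StreamExecutionEnvironment):

    // EXACTLY_ONCE pre-commits on the checkpoint barrier and commits when the
    // checkpoint completes; without checkpointing, transactions are never committed.
    env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);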

This time there was a concrete error message:

2020-08-12 09:32:27
org.apache.kafka.common.errors.TimeoutException: org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 60000ms.
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 60000ms.
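
The 60000 ms in this message is the producer's max.block.ms (default 60 seconds), which bounds how long initTransactions() waits for the transaction coordinator. Raising it, as sketched below with an illustrative value, only buys more waiting time; it does not address whatever is keeping the coordinator from responding:

    // max.block.ms bounds how long initTransactions() blocks (default 60000 ms);
    // raising it is a stopgap if the transaction coordinator is slow, not a fix.
    producerConfig.setProperty(ProducerConfig.MAX_BLOCK_MS_CONFIG, "120000");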

My guess was that the earlier, inconsistent ProducerConfig had left a stale transaction on the Kafka server side holding things locked, so I switched to a different topic and tested again, which produced the following warnings:

2020-08-12 10:08:53.153 [ForkJoinPool.commonPool-worker-5] WARN  org.apache.kafka.clients.producer.ProducerConfig  ------ The configuration 'transaction.state.log.replication.factor' was supplied but isn't a known config.
2020-08-12 10:08:53.153 [ForkJoinPool.commonPool-worker-5] WARN  org.apache.kafka.clients.producer.ProducerConfig  ------ The configuration 'transaction.state.log.min.isr' was supplied but isn't a known config.

This puzzled me: the Kafka community's advice is to set exactly these parameters, yet FlinkKafkaProducer does not accept them. At that point I gave up. Fortunately, our current business does not have such a strict consistency requirement for this data, so for now we run with Semantic.AT_LEAST_ONCE and leave this problem for later.
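
For the record, the warnings themselves have a simple explanation: transaction.state.log.replication.factor and transaction.state.log.min.isr are broker settings for the internal __transaction_state topic, not producer settings, so the producer client is right to report them as unknown. They belong in the broker's server.properties; on a single-broker test cluster the defaults (replication factor 3, min ISR 2) cannot be met, which is a common cause of the "Timeout expired while initializing transactional state" error seen earlier. A sketch, assuming a single-broker dev cluster:

    # server.properties (broker side), single-broker dev cluster:
    transaction.state.log.replication.factor=1
    transaction.state.log.min.isr=1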

References:
https://stackoverflow.com/questions/54295588/kafka-streams-failed-to-rebalance-error
https://blog.csdn.net/querydata_boke/article/details/105393438
