In the previous post, Kafka编程示例-Producer, we implemented example producer code against the Kafka 1.0 client. Compared with the producer, the consumer code is somewhat more involved, because it usually means starting multiple threads to consume from Kafka. This post implements an example of a single consumer using multiple threads to consume multiple partitions of a topic.
I strongly recommend first reading the post Kafka Consumer多线程实例; this article was written with it as a reference.
Source code repository
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.12</artifactId>
    <version>1.0.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
        </exclusion>
    </exclusions>
</dependency>
This dependency pulls two jars into your project:
org.apache.kafka:kafka-clients:1.0.0
org.apache.kafka:kafka_2.12:1.0.0
The code consists of three classes:
ConsumerHandler10: the business layer calls this class's execute method to consume messages.
ConsumerWorker: an implementation of Runnable. ConsumerHandler10 maintains a thread pool; each polled record is wrapped in a worker and submitted to that pool.
ConsumerCallback: the consumer callback interface, implemented per business scenario to process each received message. It decouples consuming messages from processing them.
The ConsumerHandler10 class
package com.russell.bigdata.kafka.handler;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
/**
* @author liumenghao
* @Date 2019/3/2
*/
@Slf4j
public class ConsumerHandler10 {
private final KafkaConsumer<String, String> consumer;
private ExecutorService executors;
private ConsumerCallback consumerCallback;
/**
 * Constructor: builds the consumer config, creates the consumer and subscribes to the topic.
 *
 * @param brokerList Kafka broker list (bootstrap.servers)
 * @param groupId    consumer group id
 * @param topic      topic to subscribe to
 * @param callback   business callback invoked for each message
 */
public ConsumerHandler10(String brokerList, String groupId, String topic, ConsumerCallback callback) {
Properties props = createConsumerConfig(brokerList, groupId);
this.consumerCallback = callback;
consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList(topic));
}
/**
 * Starts consuming: polls records in a loop and submits each one to the worker pool.
 *
 * @param workerNum number of worker threads to start, usually the topic's partition count
 */
public void execute(int workerNum) {
executors = new ThreadPoolExecutor(workerNum, workerNum, 0L, TimeUnit.MILLISECONDS,
new ArrayBlockingQueue<>(1000), new ThreadPoolExecutor.CallerRunsPolicy());
while (true) {
    // poll with a 200 ms timeout and hand every returned record to the pool
    ConsumerRecords<String, String> records = consumer.poll(200);
    for (final ConsumerRecord<String, String> record : records) {
        executors.submit(new ConsumerWorker(record, consumerCallback));
    }
}
}
/**
 * Closes the consumer and releases resources.
 */
public void shutdown() {
    if (consumer != null) {
        consumer.close();
    }
    if (executors != null) {
        executors.shutdown();
        try {
            // give in-flight workers up to 10 seconds to finish
            if (!executors.awaitTermination(10, TimeUnit.SECONDS)) {
                log.info("Timeout.... Ignore for this case");
            }
        } catch (InterruptedException ignored) {
            log.error("Other thread interrupted this shutdown, ignore for this case.");
            Thread.currentThread().interrupt();
        }
    }
}
/**
 * Builds the consumer configuration.
 *
 * @param kafkaBroker Kafka broker list (bootstrap.servers)
 * @param groupId     consumer group id
 * @return the consumer Properties
 */
private Properties createConsumerConfig(String kafkaBroker, String groupId) {
Properties props = new Properties();
props.put("bootstrap.servers", kafkaBroker);
props.put("group.id", groupId);
// commit offsets automatically
props.put("enable.auto.commit", true);
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
return props;
}
}
The ConsumerWorker class
package com.russell.bigdata.kafka.handler;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.ConsumerRecord;
/**
* @author liumenghao
* @Date 2019/3/2
*/
@Slf4j
public class ConsumerWorker implements Runnable {
    private final ConsumerRecord<String, String> consumerRecord;
    private final ConsumerCallback callback;

    public ConsumerWorker(ConsumerRecord<String, String> consumerRecord, ConsumerCallback consumerCallback) {
        this.consumerRecord = consumerRecord;
        this.callback = consumerCallback;
    }

    @Override
    public void run() {
        String topic = consumerRecord.topic();
        String message = consumerRecord.value();
        int partition = consumerRecord.partition();
        long offset = consumerRecord.offset();
        log.info("thread: {}, topic: {}, partition: {}, offset: {}", Thread.currentThread().getName(),
                topic, partition, offset);
        callback.callback(topic, message);
    }
}
The ConsumerCallback interface
package com.russell.bigdata.kafka.handler;
/**
* @author liumenghao
* @Date 2019/2/28
*/
public interface ConsumerCallback {
/**
 * Callback invoked for each consumed message ==> the message-processing logic.
 *
 * @param topic   topic the message came from
 * @param message message payload
 */
void callback(String topic, String message);
}
The business layer creates a ConsumerHandler10 instance, passing in the Kafka broker addresses, the consumer group's groupId, the topic to consume, and a ConsumerCallback carrying the message-processing logic. It then calls the execute method to start the consumer. workerNum is usually set to the topic's partition count: since a partition can only be consumed by a single consumer within a group, starting as many worker threads as there are partitions maximizes concurrent processing.
package com.russell.bigdata.kafka.example;
import com.russell.bigdata.kafka.common.KafkaTopicType;
import com.russell.bigdata.kafka.handler.ConsumerHandler10;
import lombok.extern.slf4j.Slf4j;
import static com.russell.bigdata.kafka.common.Constants.KAFKA_BROKER;
/**
* For testing: demonstrates how to use the Kafka Java consumer API.
*
* @author liumenghao
* @Date 2019/2/22
*/
@Slf4j
public class ConsumerTest {
public static void main(String[] args) throws Exception {
String groupId = "kafka_example";
init(groupId);
}
public static void init(String groupId) {
String topic0 = KafkaTopicType.THREE_PARTITION_TOPIC.getName();
ConsumerHandler10 consumer = new ConsumerHandler10(KAFKA_BROKER, groupId, topic0,
(topic, message) -> doProcessMessage(topic, message));
// the topic used for testing has three partitions
consumer.execute(3);
}
public static void doProcessMessage(String topic, String message) {
switch (topic) {
case "kafka_partitions_topic": {
log.info(message);
break;
}
default: {
log.info("topic 无效");
}
}
}
}
To see the multiple threads at work, first create a topic with 3 partitions; the creation command is covered in Kafka环境搭建, and a programmatic alternative is sketched below.
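If you would rather create the topic from Java instead of the command line, here is a minimal sketch using the AdminClient that ships with kafka-clients 1.0.0; the broker address, topic name, and replication factor below are assumptions matching this article's single-broker test setup:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class TopicCreator {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // assumed broker address; replace with your own KAFKA_BROKER value
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 1 (single-broker test setup)
            NewTopic topic = new NewTopic("kafka_partitions_topic", 3, (short) 1);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}

With the topic in place, start the producer code from Kafka编程示例-Producer to write messages into it, then run this article's consumer test code. The output looks like this: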
test produce message 5
thread: pool-1-thread-3, topic: kafka_partitions_topic, partition: 0, offset: 386
test produce message 6
thread: pool-1-thread-2, topic: kafka_partitions_topic, partition: 2, offset: 387
test produce message 7
thread: pool-1-thread-1, topic: kafka_partitions_topic, partition: 1, offset: 387
test produce message 8
thread: pool-1-thread-3, topic: kafka_partitions_topic, partition: 0, offset: 387
test produce message 9
thread: pool-1-thread-2, topic: kafka_partitions_topic, partition: 2, offset: 388
test produce message 10
thread: pool-1-thread-1, topic: kafka_partitions_topic, partition: 1, offset: 388
As the output shows, the consumer did start 3 threads to consume the messages. Note that the thread pool does not guarantee a fixed partition-to-thread mapping; in this run each partition's records simply happened to be handled by the same worker thread.
This article only provides entry-level example code; the implementation details should not be hard to work out by reading the source. It uses automatic offset commits; the Kafka client also supports committing offsets manually, which you can explore on your own. A sketch of the manual approach follows.
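For reference, here is a minimal sketch of a manual-commit poll loop. It assumes the same Properties built by createConsumerConfig above, but with auto commit disabled; processRecord is a hypothetical stand-in for your business logic:

// assumes props was built as in createConsumerConfig above
props.put("enable.auto.commit", "false"); // disable automatic offset commits
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("kafka_partitions_topic"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(200);
    for (ConsumerRecord<String, String> record : records) {
        processRecord(record); // hypothetical business logic
    }
    // commit the offsets returned by this poll only after all records are processed
    consumer.commitSync();
}

Note that combining manual commits with the asynchronous worker pool used in this article is more involved, since records may still be in flight when poll returns; the sketch above therefore processes records synchronously in a single thread for simplicity.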
If you have any questions, feel free to leave a comment so we can discuss!