[Java Big Data Primer] Kafka Programming Example: Consumer

The previous post, Kafka编程示例-Producer, walked through sample producer code for the 1.0 Kafka client. The consumer side is somewhat more involved, because it typically starts multiple threads to consume from Kafka. This post implements an example of a single consumer with multiple worker threads consuming messages from multiple partitions.

I strongly recommend first reading the post Kafka Consumer多线程实例; this article is written with it as a reference.

Source code repository

1. Adding the pom dependency

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.12</artifactId>
    <version>1.0.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
        </exclusion>
    </exclusions>
</dependency>

This dependency pulls two jars into your project:

  • org.apache.kafka:kafka-clients:1.0.0

  • org.apache.kafka:kafka_2.12:1.0.0
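If your project only needs the client APIs (producer and consumer), depending on kafka-clients alone is usually enough; the kafka_2.12 artifact additionally pulls in the Scala broker classes. A minimal alternative:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>1.0.0</version>
</dependency>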

2. Writing the example code

The code consists of three classes:

  • ConsumerHandler10: the business layer calls this class's execute method to consume messages

  • ConsumerWorker: an implementation of the Runnable interface; ConsumerHandler10 maintains a thread pool, and each pooled thread runs a worker that consumes messages

  • ConsumerCallback: the consumer's callback interface, implemented per business scenario to process the received messages; it decouples consuming messages from processing them

The ConsumerHandler10 class

package com.russell.bigdata.kafka.handler;

import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * @author liumenghao
 * @Date 2019/3/2
 */
@Slf4j
public class ConsumerHandler10 {

    private final KafkaConsumer<String, String> consumer;

    private ExecutorService executors;

    private ConsumerCallback consumerCallback;

    /**
     * Constructor.
     *
     * @param brokerList Kafka broker list
     * @param groupId    consumer group id
     * @param topic      topic to subscribe to
     * @param callback   message-processing callback
     */
    public ConsumerHandler10(String brokerList, String groupId, String topic, ConsumerCallback callback) {
        Properties props = createConsumerConfig(brokerList, groupId);
        this.consumerCallback = callback;
        consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));
    }

    /**
     * Starts the consumer loop.
     *
     * @param workerNum number of worker threads to start, usually the topic's partition count
     */
    public void execute(int workerNum) {
        // CallerRunsPolicy provides back-pressure: when the queue is full,
        // the polling thread runs the task itself instead of discarding it.
        executors = new ThreadPoolExecutor(workerNum, workerNum, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1000), new ThreadPoolExecutor.CallerRunsPolicy());
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(200);
            for (final ConsumerRecord<String, String> record : records) {
                executors.submit(new ConsumerWorker(record, consumerCallback));
            }
        }
    }

    /**
     * Closes the consumer and releases resources.
     */
    public void shutdown() {
        if (consumer != null) {
            consumer.close();
        }
        if (executors != null) {
            executors.shutdown();
            try {
                if (!executors.awaitTermination(10, TimeUnit.SECONDS)) {
                    log.info("Timeout.... Ignore for this case");
                }
            } catch (InterruptedException ignored) {
                log.error("Other thread interrupted this shutdown, ignore for this case.");
                Thread.currentThread().interrupt();
            }
        }
    }

    /**
     * Builds the consumer configuration.
     *
     * @param kafkaBroker Kafka broker list
     * @param groupId     consumer group id
     * @return consumer properties
     */
    private Properties createConsumerConfig(String kafkaBroker, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", kafkaBroker);
        props.put("group.id", groupId);

        // Commit offsets automatically
        props.put("enable.auto.commit", true);
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        return props;
    }
}
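One caveat: KafkaConsumer is not thread-safe, so shutdown() above must be invoked from the thread that runs the poll loop; calling consumer.close() from another thread throws ConcurrentModificationException. A common remedy, sketched below as an assumed extension of this class (not part of the original code), is the wakeup() pattern:

// Sketch (assumed extension): wakeup() is the only KafkaConsumer method that
// may safely be called from another thread; it makes a blocked poll() throw
// WakeupException so the polling thread can exit and clean up itself.
public void requestShutdown() {
    consumer.wakeup();
}

// The poll loop in execute() would then be wrapped like this:
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(200);
        for (final ConsumerRecord<String, String> record : records) {
            executors.submit(new ConsumerWorker(record, consumerCallback));
        }
    }
} catch (org.apache.kafka.common.errors.WakeupException e) {
    // Expected during shutdown; fall through to cleanup.
} finally {
    consumer.close();
    executors.shutdown();
}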

The ConsumerWorker class

package com.russell.bigdata.kafka.handler;

import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.ConsumerRecord;

/**
 * @author liumenghao
 * @Date 2019/3/2
 */
@Slf4j
public class ConsumerWorker implements Runnable {

    private ConsumerRecord<String, String> consumerRecord;

    private ConsumerCallback callback;

    public ConsumerWorker(ConsumerRecord<String, String> consumerRecord, ConsumerCallback consumerCallback) {
        this.consumerRecord = consumerRecord;
        this.callback = consumerCallback;
    }


    @Override
    public void run() {
        String topic = consumerRecord.topic();
        String message = consumerRecord.value();
        int partition = consumerRecord.partition();
        long offset = consumerRecord.offset();
        log.info("thread: {}, topic: {}, partition: {}, offset: {}",
                Thread.currentThread().getName(), topic, partition, offset);
        callback.callback(topic, message);
    }
}

The ConsumerCallback interface

package com.russell.bigdata.kafka.handler;

/**
 * @author liumenghao
 * @Date 2019/2/28
 */
public interface ConsumerCallback {

    /**
     * Callback invoked for each consumed message; this is where the
     * message-processing logic lives.
     *
     * @param topic   topic the message came from
     * @param message message payload
     */
    void callback(String topic, String message);
}
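Because ConsumerCallback declares a single abstract method, implementations can be written as lambdas (as the test code below does). A trivial illustrative example:

ConsumerCallback printer = (topic, message) ->
        System.out.println("received from " + topic + ": " + message);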

The business layer creates a ConsumerHandler10 instance, passing the Kafka broker addresses, the consumer group's groupId, the topic to consume, and the message-processing logic as a ConsumerCallback. It then calls execute to start the consumer. workerNum is usually set to the topic's partition count: a partition can only be consumed by one consumer within a consumer group, so starting as many worker threads as there are partitions maximizes concurrent processing.

3. Testing the code

package com.russell.bigdata.kafka.example;

import com.russell.bigdata.kafka.common.KafkaTopicType;
import com.russell.bigdata.kafka.handler.ConsumerHandler10;
import lombok.extern.slf4j.Slf4j;

import static com.russell.bigdata.kafka.common.Constants.KAFKA_BROKER;

/**
 * Test driver showing how to use the Kafka Java consumer API.
 *
 * @author liumenghao
 * @Date 2019/2/22
 */
@Slf4j
public class ConsumerTest {

    public static void main(String[] args) throws Exception {
        String groupId = "kafka_example";
        init(groupId);
    }

    public static void init(String groupId) {
        String topic0 = KafkaTopicType.THREE_PARTITION_TOPIC.getName();
        ConsumerHandler10 consumer = new ConsumerHandler10(KAFKA_BROKER, groupId, topic0,
                (topic, message) -> doProcessMessage(topic, message));
        // The test topic has three partitions
        consumer.execute(3);
    }

    public static void doProcessMessage(String topic, String message) {
        switch (topic) {
            case "kafka_partitions_topic": {
                log.info(message);
                break;
            }
            default: {
                log.info("invalid topic");
            }
        }
    }
}

To see the multiple threads at work, first create a topic with three partitions (the creation command can be found in Kafka环境搭建), then start the producer code from Kafka编程示例-Producer to write messages to that topic, and finally start this post's consumer test code to consume them.
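For completeness, below is a minimal sketch of creating the test topic programmatically with the AdminClient that ships in kafka-clients 1.0.0; the broker address is an assumption for illustration.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTestTopic {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // Topic name, partition count, replication factor.
            NewTopic topic = new NewTopic("kafka_partitions_topic", 3, (short) 1);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}

One test run then produced the following output: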

test data 5
thread: pool-1-thread-3, topic: kafka_partitions_topic, partition: 0, offset: 386
test data 6
thread: pool-1-thread-2, topic: kafka_partitions_topic, partition: 2, offset: 387
test data 7
thread: pool-1-thread-1, topic: kafka_partitions_topic, partition: 1, offset: 387
test data 8
thread: pool-1-thread-3, topic: kafka_partitions_topic, partition: 0, offset: 387
test data 9
thread: pool-1-thread-2, topic: kafka_partitions_topic, partition: 2, offset: 388
test data 10
thread: pool-1-thread-1, topic: kafka_partitions_topic, partition: 1, offset: 388

As the output shows, the consumer did start three threads to consume Kafka messages, and in this run each partition's records were handled by a single thread. (Strictly speaking, the thread pool does not guarantee a fixed one-thread-per-partition mapping; any worker may pick up records from any partition.)

4. Summary

This post only provides entry-level example code; the implementation details should not be hard to work out by reading the source. It uses automatic offset commits, but the Kafka client also supports committing offsets manually, which is worth exploring on your own.
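For reference, here is a minimal sketch of the manual-commit variant. It assumes the same consumer and properties as above, with auto-commit disabled; process(record) is a hypothetical placeholder for your handling logic.

// Sketch: manual offset commits (assumes auto-commit is disabled).
props.put("enable.auto.commit", "false");

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(200);
    for (ConsumerRecord<String, String> record : records) {
        process(record); // hypothetical message-handling logic
    }
    // Synchronously commit the offsets returned by this poll; this blocks
    // until the broker acknowledges the commit.
    consumer.commitSync();
}

Note that combining manual commits with the asynchronous worker pool above needs extra care: commitSync() called right after submit() acknowledges records that the workers may not have finished processing, so a failure could lose messages.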

If you have any questions, feel free to leave a comment so we can discuss!
