I spent some time on Kafka and wrote code to consume data from it. I can feel that my skills still fall short:
- I cannot manipulate the data freely; my grasp of data structures and of Spark's RDD operations is weak.
- I cannot organize code well; I haven't internalized design patterns, and my object-oriented thinking is still immature.
Design of the consumer program
- A queue stores the records waiting to be consumed.
- A queue stores the offsets waiting to be committed; the processing threads hand them back to the consumer, which commits them.
- Each partition gets its own processing thread; the partition-to-processor mapping is kept in a map.
- Once a certain number of records has been processed, or a certain interval has elapsed since the last commit, the poll thread commits the offsets.
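The per-partition dispatch in the list above can be sketched without the Kafka client, in plain Java (all class, method, and thread names here are illustrative, not from the project): a map from partition to worker guarantees that each partition's records are handled in order by exactly one thread.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class PartitionDispatcher {
    // one worker (thread + queue) per partition, mirroring recordProcessorTasks/Threads
    private final Map<Integer, Worker> workers = new ConcurrentHashMap<>();

    static class Worker implements Runnable {
        final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        volatile int processed = 0;

        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    String record = queue.take(); // block until a record arrives
                    processed++;                  // stand-in for real business logic
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // respond to interrupt and exit
            }
        }
    }

    // called by the poll thread for every record: create the partition's worker
    // on first sight, then enqueue the record for it
    public void dispatch(int partition, String record) throws InterruptedException {
        Worker worker = workers.computeIfAbsent(partition, p -> {
            Worker w = new Worker();
            Thread t = new Thread(w, "processor-" + p);
            t.setDaemon(true); // daemon, so the JVM can exit while workers block on take()
            t.start();
            return w;
        });
        worker.queue.put(record);
    }

    public int workerCount() {
        return workers.size();
    }
}
```

`computeIfAbsent` makes the "does this partition already have a thread?" check atomic, which is exactly the per-record test the weaknesses section below complains about.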
Weaknesses:
- Each round processes too little data, and every single record is checked to see whether its partition already has a processing thread.
- The way the topic is obtained is not elegant (it is hard-coded).
Flow chart
Below is the multi-threaded consumer implementation.
1. The coordinator
/**
 * Starts the consumer threads (MsgReceiver), keeps references to them, keeps the
 * processing tasks and threads (RecordProcessor), and shuts all of them down.
 * Created by stillcoolme on 2018/10/12.
 */
public class KafkaMultiProcessorMain {
    private static final Logger logger = LoggerFactory.getLogger(KafkaMultiProcessorMain.class);
    // consumer properties loaded from a properties file
    private Properties consumerProps = new Properties();
    // Kafka consumer configuration
    Map<String, Object> consumerConfig;
    // per-topic configuration
    Map<String, Object> topicConfig;
    // subscribed topic
    private String alarmTopic;
    // consumer threads
    private Thread[] threads;
    // maps from a partition to its processing task and to its processing thread
    ConcurrentHashMap<TopicPartition, RecordProcessor> recordProcessorTasks = new ConcurrentHashMap<>();
    ConcurrentHashMap<TopicPartition, Thread> recordProcessorThreads = new ConcurrentHashMap<>();

    public void setAlarmTopic(String alarmTopic) {
        this.alarmTopic = alarmTopic;
    }

    public static void main(String[] args) {
        KafkaMultiProcessorMain kafkaMultiProcessor = new KafkaMultiProcessorMain();
        // setting the topic like this is not elegant!!!
        kafkaMultiProcessor.setAlarmTopic("picrecord");
        kafkaMultiProcessor.init(null);
    }
    private void init(String consumerPropPath) {
        getConsumerProps(consumerPropPath);
        consumerConfig = getConsumerConfig();
        int threadsNum = 3;
        logger.info("create " + threadsNum + " threads to consume kafka warn msg");
        threads = new Thread[threadsNum];
        for (int i = 0; i < threadsNum; i++) {
            MsgReceiver msgReceiver = new MsgReceiver(consumerConfig, alarmTopic, recordProcessorTasks, recordProcessorThreads);
            Thread thread = new Thread(msgReceiver);
            threads[i] = thread;
        }
        for (int i = 0; i < threadsNum; i++) {
            threads[i].start();
        }
        logger.info("finish creating " + threadsNum + " threads to consume kafka warn msg");
    }
    // stop the threads that were started
    public void destroy() {
        closeRecordProcessThreads();
        closeKafkaConsumer();
    }

    private void closeRecordProcessThreads() {
        logger.debug("start to interrupt record process threads");
        for (Map.Entry<TopicPartition, Thread> entry : recordProcessorThreads.entrySet()) {
            Thread thread = entry.getValue();
            thread.interrupt();
        }
        logger.debug("finish interrupting record process threads");
    }

    private void closeKafkaConsumer() {
        logger.debug("start to interrupt kafka consumer threads");
        // interrupt the threads; their run() methods are written to respond to the interrupt signal
        for (int i = 0; i < threads.length; i++) {
            threads[i].interrupt();
        }
        logger.debug("finish interrupting consumer threads");
    }
    private Map<String, Object> getConsumerConfig() {
        return ImmutableMap.<String, Object>builder()
                .put("bootstrap.servers", consumerProps.getProperty("bootstrap.servers"))
                .put("group.id", consumerProps.getProperty("group.id"))
                .put("enable.auto.commit", "false")
                .put("session.timeout.ms", "30000")
                .put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
                .put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
                .put("max.poll.records", 1000)
                .build();
    }
    /**
     * Load the consumer properties, either from the given path or from the
     * consumer.properties file on the classpath.
     *
     * @param proPath path to a properties file; may be null or empty
     */
    private void getConsumerProps(String proPath) {
        InputStream inStream = null;
        try {
            if (StringUtils.isNotEmpty(proPath)) {
                inStream = new FileInputStream(proPath);
            } else {
                inStream = this.getClass().getClassLoader().getResourceAsStream("consumer.properties");
            }
            consumerProps.load(inStream);
        } catch (IOException e) {
            logger.error("failed to read the consumer properties file: " + e.getMessage(), e);
        } finally {
            if (null != inStream) {
                try {
                    inStream.close();
                } catch (IOException e) {
                    logger.error("failed to close the consumer properties file: " + e.getMessage(), e);
                }
            }
        }
    }
}
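For reference, a minimal consumer.properties that getConsumerProps above could load might look like this (the broker addresses and group id are illustrative values, not from the project):

```properties
bootstrap.servers=kafka1:9092,kafka2:9092
group.id=alarm-consumer-group
```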
2. The consumer task MsgReceiver
/**
 * Polls Kafka and hands records over to RecordProcessor for processing.
 * Created by zhangjianhua on 2018/10/12.
 */
public class MsgReceiver implements Runnable {
    private static final Logger logger = LoggerFactory.getLogger(MsgReceiver.class);
    // offsets the processing threads hand back for this consumer thread to commit
    // (the type and field name were lost in the original listing and are reconstructed)
    private BlockingQueue<Map<TopicPartition, OffsetAndMetadata>> commitQueue;
3. The message-processing task RecordProcessor
public class RecordProcessor implements Runnable {
    private static Logger logger = LoggerFactory.getLogger(RecordProcessor.class);
    // records sent over by the MsgReceiver thread
    private BlockingQueue<ConsumerRecords<String, String>> queue = new LinkedBlockingQueue<>();
    // queue used to hand offsets back to the consumer thread for committing
    // (the element type was lost in the original listing and is reconstructed)
    private BlockingQueue<Map<TopicPartition, OffsetAndMetadata>> commitQueue;
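The commit policy described earlier (commit after a batch of records, or after a fixed interval since the last commit) reduces to a small piece of state. A plain-Java sketch of that trigger, independent of the Kafka client (class name and thresholds are illustrative):

```java
// Decides when the processor should hand offsets back for committing:
// after batchSize records, or after intervalMs since the last commit.
public class CommitPolicy {
    private final int batchSize;
    private final long intervalMs;
    private int uncommitted = 0;
    private long lastCommitTime;

    public CommitPolicy(int batchSize, long intervalMs, long now) {
        this.batchSize = batchSize;
        this.intervalMs = intervalMs;
        this.lastCommitTime = now;
    }

    // called once per processed record; returns true when a commit is due
    public boolean recordProcessed(long now) {
        uncommitted++;
        if (uncommitted >= batchSize || now - lastCommitTime >= intervalMs) {
            uncommitted = 0;
            lastCommitTime = now;
            return true;
        }
        return false;
    }
}
```

When `recordProcessed` returns true, the processor would put the offsets on `commitQueue` for the poll thread to commit.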
Improvements
- Abstract RecordProcessor into a BaseProcessor parent class, so that when the business requirements call for a different RecordProcessor it can be swapped in flexibly.
- Build the RecordProcessor via reflection? Put the fully qualified class name of the concrete RecordProcessor in the configuration file, then pass it in when creating the MsgReceiver.
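The reflection idea in the second bullet can be sketched like this (the property key, base class, and example subclass are all made up for illustration): the concrete processor class comes from configuration, and the caller only ever sees the base type.

```java
import java.util.Properties;

// Hypothetical base class that concrete processors would extend
abstract class BaseProcessor implements Runnable {
    public abstract void process(String record);
    public void run() { /* the queue-draining loop would go here */ }
}

// Example concrete processor, selected purely by configuration
class LoggingProcessor extends BaseProcessor {
    public void process(String record) {
        System.out.println("processed: " + record);
    }
}

public class ProcessorFactory {
    // read the fully qualified class name from config and instantiate it reflectively
    public static BaseProcessor create(Properties config) throws Exception {
        String className = config.getProperty("record.processor.class");
        return (BaseProcessor) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }
}
```

Swapping processors then means editing one line of configuration rather than recompiling the consumer.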
References
- Kafka Consumer多线程实例: as that article describes, it maintains multiple workers to do the business processing, using a ThreadPoolExecutor thread pool.
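For contrast with the thread-per-partition design above, the worker-pool approach from that article can be sketched as follows (a simplification with illustrative names; note that a shared pool loses per-partition ordering unless records are routed to a fixed worker):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class WorkerPoolConsumer {
    // fixed pool shared by all partitions, instead of one thread per partition
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    final AtomicInteger processed = new AtomicInteger();

    // the poll thread submits each record to the pool; any idle worker picks it up
    public void submit(String record) {
        workers.execute(() -> processed.incrementAndGet()); // business logic stand-in
    }

    public void shutdown() throws InterruptedException {
        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```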