Kafka--Consuming from a Specified Timestamp

Kafka added a time index file in version 0.10.1.0, so messages can be located by timestamp.

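Open question: can the timestamp be set by the user, and if user-supplied timestamps are written out of order, how is the index handled?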

How it works

[Figure 1: image not included]
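
As a rough illustration of how a timestamp lookup can work, here is a toy in-memory model. It assumes a sparse index of (timestamp, offset) pairs whose timestamps only ever increase, followed by a short forward scan of the log. The class `TimeIndexSketch`, its `indexInterval` parameter, and the method names are hypothetical and are not Kafka's broker code. The sketch also touches on the out-of-order question above: a late message with an older timestamp never adds an index entry, and the lookup returns the earliest offset whose timestamp is greater than or equal to the target, which is the contract documented for `offsetsForTimes()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of one log segment plus a sparse time index. Not Kafka's real code. */
class TimeIndexSketch {
    // Message timestamps; the list position doubles as the offset.
    private final List<Long> timestamps = new ArrayList<>();
    // Sparse index: timestamp -> offset, with strictly increasing timestamps.
    private final TreeMap<Long, Long> timeIndex = new TreeMap<>();
    // Hypothetical knob: try to add an index entry every N appended messages.
    private final int indexInterval;
    private long maxTimestamp = -1L;
    private long offsetOfMaxTimestamp = -1L;

    TimeIndexSketch(int indexInterval) {
        this.indexInterval = indexInterval;
    }

    void append(long timestamp) {
        long offset = timestamps.size();
        timestamps.add(timestamp);
        // Remember the largest timestamp seen so far and where it occurred.
        if (timestamp > maxTimestamp) {
            maxTimestamp = timestamp;
            offsetOfMaxTimestamp = offset;
        }
        // Only write an entry if it keeps the index strictly increasing.
        boolean due = timestamps.size() % indexInterval == 0;
        boolean grows = timeIndex.isEmpty() || maxTimestamp > timeIndex.lastKey();
        if (due && grows) {
            timeIndex.put(maxTimestamp, offsetOfMaxTimestamp);
        }
    }

    /** Earliest offset whose timestamp is >= target, or -1 if there is none. */
    long offsetForTime(long target) {
        // Every message up to this entry's offset is older than the target,
        // so the scan can safely start there instead of at offset 0.
        Map.Entry<Long, Long> entry = timeIndex.lowerEntry(target);
        int start = (entry == null) ? 0 : entry.getValue().intValue();
        for (int offset = start; offset < timestamps.size(); offset++) {
            if (timestamps.get(offset) >= target) {
                return offset;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        TimeIndexSketch log = new TimeIndexSketch(2);
        // Offsets 2 and 4 carry timestamps older than their predecessors.
        for (long ts : new long[] {100, 200, 150, 300, 250, 400}) {
            log.append(ts);
        }
        System.out.println(log.offsetForTime(150)); // prints 1
        System.out.println(log.offsetForTime(250)); // prints 3
        System.out.println(log.offsetForTime(401)); // prints -1
    }
}
```

Note that the lookup for 150 returns offset 1 (timestamp 200) even though offset 2 carries exactly 150: with out-of-order timestamps, records after the returned offset may still be older than the target.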

Usage

Suppose the requirement is to start consuming from the offset as of half an hour ago. Sample code:

package com.bonc.rdpe.kafka110.consumer;

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class TimestampConsumer {

    public static void main(String[] args) {

        Properties props = new Properties();
        props.put("bootstrap.servers", "rdpecore4:9092,rdpecore5:9092,rdpecore6:9092");
        props.put("group.id", "dev3-yangyunhe-topic001-group001");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        String topic = "dev3-yangyunhe-topic001";

        try {
            // Fetch the partition metadata of the topic
            List<PartitionInfo> partitionInfos = consumer.partitionsFor(topic);
            List<TopicPartition> topicPartitions = new ArrayList<>();

            Map<TopicPartition, Long> timestampsToSearch = new HashMap<>();
            DateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            Date now = new Date();
            long nowTime = now.getTime();
            System.out.println("Current time: " + df.format(now));
            long fetchDataTime = nowTime - 1000 * 60 * 30;  // timestamp of 30 minutes ago

            for (PartitionInfo partitionInfo : partitionInfos) {
                TopicPartition tp = new TopicPartition(partitionInfo.topic(), partitionInfo.partition());
                topicPartitions.add(tp);
                timestampsToSearch.put(tp, fetchDataTime);
            }

            consumer.assign(topicPartitions);

            // Look up, for each partition, the offset as of 30 minutes ago
            Map<TopicPartition, OffsetAndTimestamp> map = consumer.offsetsForTimes(timestampsToSearch);

            System.out.println("Setting the initial offset of each partition...");
            for (Map.Entry<TopicPartition, OffsetAndTimestamp> entry : map.entrySet()) {
                // The value is null when the partition has no message whose
                // timestamp is greater than or equal to the requested one
                OffsetAndTimestamp offsetTimestamp = entry.getValue();
                if (offsetTimestamp != null) {
                    int partition = entry.getKey().partition();
                    long timestamp = offsetTimestamp.timestamp();
                    long offset = offsetTimestamp.offset();
                    System.out.println("partition = " + partition +
                            ", time = " + df.format(new Date(timestamp)) +
                            ", offset = " + offset);
                    // Set the offset from which this partition will be read
                    consumer.seek(entry.getKey(), offset);
                }
            }
            System.out.println("Finished setting the initial offset of each partition...");

            while (true) {
                // Clients from 2.0 onward should prefer poll(Duration.ofMillis(1000))
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println("partition = " + record.partition() + ", offset = " + record.offset());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            consumer.close();
        }
    }
}

Output:
Current time: 2018-07-16 10:15:09
Setting the initial offset of each partition...
partition = 2, time = 2018-07-16 09:45:10, offset = 727
partition = 0, time = 2018-07-16 09:45:09, offset = 727
partition = 1, time = 2018-07-16 09:45:10, offset = 727
Finished setting the initial offset of each partition...
partition = 1, offset = 727
partition = 1, offset = 728
partition = 1, offset = 729
......
partition = 2, offset = 727
partition = 2, offset = 728
partition = 2, offset = 729
......
partition = 0, offset = 727
partition = 0, offset = 728
partition = 0, offset = 729
......
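
The example skips any partition for which `offsetsForTimes()` returns null, so such a partition starts from wherever the consumer would otherwise start (its committed offset or the `auto.offset.reset` policy). One possible alternative is an explicit fallback, for instance to the partition's end offset as reported by `consumer.endOffsets()`. The sketch below shows only that merging step in plain Java. `StartOffsetResolver` and `resolve` are hypothetical names, and the maps are assumed to have been filled from the two consumer calls beforehand (with `TopicPartition` as the key type in real code).

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative helper, not part of the Kafka client. */
class StartOffsetResolver {

    /**
     * @param byTimestamp offset found per partition; null where the lookup found nothing
     * @param endOffsets  fallback offset per partition (assumed: the current end offset)
     * @return the offset to seek to for every partition that has one
     */
    static <K> Map<K, Long> resolve(Map<K, Long> byTimestamp, Map<K, Long> endOffsets) {
        Map<K, Long> result = new HashMap<>();
        for (Map.Entry<K, Long> e : byTimestamp.entrySet()) {
            Long offset = e.getValue();
            if (offset == null) {
                // No message at or after the requested time: wait for new messages.
                offset = endOffsets.get(e.getKey());
            }
            if (offset != null) {
                result.put(e.getKey(), offset);
            }
        }
        return result;
    }
}
```

Each entry of the returned map would then be passed to `consumer.seek(partition, offset)`. Whether falling back to the end offset is the right policy depends on the application; seeking to the beginning is just as easy to substitute.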

Notes:

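1. seek() only works on partitions that are currently assigned to the consumer, so the simplest way to consume by timestamp is manual assignment with assign(). With subscribe(), the seek would have to happen in a ConsumerRebalanceListener once partitions have been assigned.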

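2. Look up the offset for the timestamp with offsetsForTimes().

3. Call seek() on each partition with the offset that was found.

4. Call consumer.poll(). To stop at a given end timestamp, check the timestamp of each record after poll() and decide whether to stop consuming.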

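5. What about multiple partitions? offsetsForTimes() takes a map from partition to timestamp, so all partitions (each with its own timestamp if needed) are looked up in a single call, as the example above does.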
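
Note 4 can be sketched as a small bookkeeping helper that decides, record by record, whether a partition has reached the end timestamp. `EndTimestampTracker`, `accept`, and `allFinished` are hypothetical names, not Kafka APIs. In the poll loop one would call `tracker.accept(record.partition(), record.timestamp())`, optionally `consumer.pause(...)` a partition once it is finished, and leave the loop when `allFinished()` returns true. The sketch assumes timestamps are roughly increasing within a partition. A partition that receives no further messages never reaches the end timestamp, so a real loop would likely also need a timeout.

```java
import java.util.HashSet;
import java.util.Set;

/** Tracks which partitions have reached the end timestamp. Illustrative only. */
class EndTimestampTracker {
    private final long endTimestamp;
    // Partitions that have not yet produced a record at or past the end timestamp.
    private final Set<Integer> active;

    EndTimestampTracker(long endTimestamp, Set<Integer> partitions) {
        this.endTimestamp = endTimestamp;
        this.active = new HashSet<>(partitions);
    }

    /**
     * Returns true if the record should be processed. The first record at or
     * past the end timestamp marks its partition as finished, and every later
     * record from that partition is rejected as well.
     */
    boolean accept(int partition, long recordTimestamp) {
        if (!active.contains(partition)) {
            return false;
        }
        if (recordTimestamp >= endTimestamp) {
            active.remove(partition);
            return false;
        }
        return true;
    }

    /** True once every tracked partition has reached the end timestamp. */
    boolean allFinished() {
        return active.isEmpty();
    }
}
```

An alternative design is to call `offsetsForTimes()` a second time with the end timestamp and stop each partition at the returned offset, which avoids inspecting record timestamps in the loop.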

 
