Notes on auto.offset.reset

Kafka consumers obtain message data by pulling. Two consumer APIs are available: the high-level API and the low-level API.

1 Consumers and partitions

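1) Having more consumers than partitions is wasteful. By design, Kafka does not allow concurrent consumption of a single partition within a consumer group, so the number of consumers should not exceed the number of partitions.
2) With fewer consumers than partitions, one consumer handles several partitions. The key is to balance the consumer count against the partition count; otherwise data is drained from the partitions unevenly. Ideally the partition count is an integer multiple of the consumer count, so choosing the partition count matters. With 24 partitions, for example, it is easy to pick a consumer count.
3) When a consumer reads from several partitions, ordering across them is not guaranteed. Kafka only guarantees ordering within a single partition; across partitions, the order depends on the order in which you read.
4) Adding or removing consumers, brokers, or partitions triggers a rebalance, after which the partitions assigned to each consumer may change.
5) The high-level API blocks when no data is available.
6) Abruptly stopping a consumer or a broker can cause messages to be read more than once. To avoid this, call Thread.sleep(10000) before shutdown so that the consumer has time to sync its offsets to ZooKeeper.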
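
Points 1) and 2) can be illustrated with a small model. The sketch below is not Kafka's own assignment code; it is a simplified, hypothetical model of range-style assignment, in which partitions are split into contiguous ranges and the first few consumers take one extra partition when the counts do not divide evenly. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of range-style partition assignment within one consumer group. */
final class RangeAssignmentSketch {

    /**
     * Returns, for each consumer index, the partition ids it would own.
     * The first (partitions % consumers) consumers receive one extra partition.
     */
    static List<List<Integer>> assign(int partitions, int consumers) {
        if (partitions < 0 || consumers <= 0) {
            throw new IllegalArgumentException("need partitions >= 0 and consumers > 0");
        }
        List<List<Integer>> result = new ArrayList<>();
        int base = partitions / consumers;
        int extra = partitions % consumers;
        int next = 0;
        for (int c = 0; c < consumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                owned.add(next++);
            }
            result.add(owned);
        }
        return result;
    }

    public static void main(String[] args) {
        // More consumers than partitions: the surplus consumer owns nothing.
        System.out.println(assign(2, 3));   // [[0], [1], []]
        // Partition count is not a multiple of the consumer count: uneven load.
        System.out.println(assign(10, 4));  // [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
        // 24 partitions over 6 consumers: every consumer owns the same number.
        System.out.println(assign(24, 6).get(0).size()); // 4
    }
}
```

In this model, 24 partitions divide evenly among 2, 3, 4, 6, 8, or 12 consumers, which is why such a partition count leaves room to change the consumer count later.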

2 Some questions about auto.offset.reset

The following is test code for a Kafka consumer:

<code class="hljs java">import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class kafkaConsumer extends Thread {

    private String topic;

    public kafkaConsumer(String topic) {
        super();
        this.topic = topic;
    }

    @Override
    public void run() {
        ConsumerConnector consumer = createConsumer();
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, 1); // request one stream (one consumer thread) for this topic
        Map<String, List<KafkaStream<byte[], byte[]>>> messageStreams = consumer.createMessageStreams(topicCountMap);
        KafkaStream<byte[], byte[]> stream = messageStreams.get(topic).get(0); // the single stream requested above
        ConsumerIterator<byte[], byte[]> iterator = stream.iterator();
        while (iterator.hasNext()) { // blocks until a message is available
            String message = new String(iterator.next().message());
            System.out.println("Received: " + message);
        }
    }

    private ConsumerConnector createConsumer() {
        Properties properties = new Properties();
        properties.put("zookeeper.connect", "ip1:2181,ip2:2181,ip3:2181"); // ZooKeeper connection string
        properties.put("group.id", "group03");
        return Consumer.createJavaConsumerConnector(new ConsumerConfig(properties));
    }

    public static void main(String[] args) {
        new kafkaConsumer("user").start(); // uses the topic "user", already created in the Kafka cluster
    }
}
</code>

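auto.offset.reset defaults to largest. So what does auto.offset.reset do? It defines how the consumer behaves when it finds no initial offset in ZooKeeper, or when it finds that the offset is invalid. The common settings are:

  1. smallest: automatically reset the offset to the smallest offset;
  2. largest: automatically reset the offset to the largest offset;
  3. anything else: throw an exception;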

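I ran into the following situation: produce some data, stop the producer thread, then consume with the consumer code above. The consumer finds no data to consume.

The reason is that the initial offset is invalid by default, and auto.offset.reset defaults to largest, which automatically sets the offset to the largest offset. Since no producer is pushing data to Kafka at that point, there is naturally nothing to consume. If a producer were pushing data to Kafka at that moment, the code would consume from the latest position.

If the following setting is added to the code: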

<code class="hljs java">properties.put("auto.offset.reset", "smallest");</code>

then a consumer thread started after the producer thread has stopped can consume the previously produced data.
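
The behavior described in this section can be modeled in a few lines. The sketch below is a hypothetical model of the decision, not Kafka's implementation: `resolveStart`, its parameters, and the exception type are assumptions made for illustration. It uses a made-up scenario of 100 messages produced before the producer stopped.

```java
/** Hypothetical model of how the starting offset is chosen; not Kafka's own code. */
final class OffsetResetSketch {

    /**
     * committed: the stored offset, or null when the group has none yet.
     * earliest:  the first offset still available in the partition.
     * latest:    the offset the next produced message would receive.
     * policy:    the value of auto.offset.reset.
     */
    static long resolveStart(Long committed, long earliest, long latest, String policy) {
        boolean valid = committed != null && committed >= earliest && committed <= latest;
        if (valid) {
            return committed; // a usable offset exists, so the policy is not consulted
        }
        switch (policy) {
            case "smallest":
                return earliest;
            case "largest":
                return latest;
            default:
                // Stand-in for "anything else: throw an exception".
                throw new IllegalStateException("no valid offset and auto.offset.reset=" + policy);
        }
    }

    public static void main(String[] args) {
        // Assumed scenario: offsets 0..99 were written, then the producer stopped.
        long earliest = 0;
        long latest = 100;
        long startLargest = resolveStart(null, earliest, latest, "largest");
        long startSmallest = resolveStart(null, earliest, latest, "smallest");
        System.out.println("largest: readable = " + (latest - startLargest));   // 0
        System.out.println("smallest: readable = " + (latest - startSmallest)); // 100
    }
}
```

Note that in this model the policy only matters when there is no valid stored offset. Once group03 has stored an offset, changing auto.offset.reset alone would not move it.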

3 Tools for the high-level consumer

3.1 kafka.tools.ConsumerOffsetChecker
Use the following command to view the current offsets of a group:

<code class="hljs bash">./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group group03</code>

Or specify a topic:

<code class="hljs bash">./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic user01 --group group03</code>

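In the output of the command above, pid is the partition number of the topic; in that output the topic user0 has 10 partitions. After running only the producer thread for a while, run the command above again to see how the output changes.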
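
As far as I recall, the tool prints for each partition the group's offset, the log size, and the lag between them. The sketch below shows that arithmetic under this assumption. The numbers are hypothetical and not taken from a real cluster, and `ConsumerLagSketch` is a made-up name.

```java
/** Hypothetical illustration of the per-partition lag arithmetic. */
final class ConsumerLagSketch {

    /** Messages written to the partition that the group has not consumed yet. */
    static long lag(long logSize, long committedOffset) {
        return logSize - committedOffset;
    }

    public static void main(String[] args) {
        // Made-up values for three partitions (pid 0..2).
        long[] committed = {120, 80, 0};
        long[] logSize = {150, 80, 45};
        long total = 0;
        for (int pid = 0; pid < committed.length; pid++) {
            long partitionLag = lag(logSize[pid], committed[pid]);
            total += partitionLag;
            System.out.println("pid=" + pid + " offset=" + committed[pid]
                    + " logSize=" + logSize[pid] + " lag=" + partitionLag);
        }
        System.out.println("total lag=" + total); // total lag=75
    }
}
```

With only the producer running, one would expect the log size to grow while the group's offset stays put, so the lag grows.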

3.2 kafka.tools.UpdateOffsetsInZK

<code class="hljs bash">./kafka-run-class.sh kafka.tools.UpdateOffsetsInZK earliest config/consumer.properties user01</code>

The command takes three arguments:
[earliest | latest]: where to set the offset
consumer.properties: the path to the configuration file
topic: the topic name, here user01
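
The configuration file presumably needs at least the same two settings used in the consumer code earlier, zookeeper.connect and group.id, so that the tool knows which group's offsets to update. That is an assumption here, not something the tool's documentation is quoted on. Under that assumption, a quick check of a properties file's contents could look like this (`ConsumerPropsCheck` and `missingKeys` are hypothetical names):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: reports which of the assumed required keys are absent. */
final class ConsumerPropsCheck {

    // Assumed to be required, mirroring the consumer code in section 2.
    private static final String[] REQUIRED = {"zookeeper.connect", "group.id"};

    static List<String> missingKeys(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        String complete = "zookeeper.connect=ip1:2181,ip2:2181,ip3:2181\ngroup.id=group03\n";
        String incomplete = "zookeeper.connect=ip1:2181\n";
        System.out.println(missingKeys(complete));   // []
        System.out.println(missingKeys(incomplete)); // [group.id]
    }
}
```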
