Kafka's ZookeeperConsumer fetches data through the following steps.
The entry point is this method on ZookeeperConsumerConnector:

def consume[T](topicCountMap: scala.collection.Map[String,Int], decoder: Decoder[T]): Map[String,List[KafkaStream[T]]]
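For orientation, here is a minimal usage sketch under the 0.7-era Scala API; the property keys, the topic name "events", and the choice of StringDecoder are illustrative assumptions, not quoted from the source:

import java.util.Properties
import kafka.consumer.{ConsumerConfig, ZookeeperConsumerConnector}
import kafka.serializer.StringDecoder

val props = new Properties()
props.put("zk.connect", "localhost:2181") // assumed ZooKeeper address
props.put("groupid", "example-group")     // assumed consumer group id
val connector = new ZookeeperConsumerConnector(new ConsumerConfig(props))
// request one KafkaStream for the topic "events"; the decoder turns raw bytes into T
val streams = connector.consume(Map("events" -> 1), new StringDecoder)
for (stream <- streams("events"); msgAndMeta <- stream)
  println(msgAndMeta.message)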
After the client starts, it registers a ZKRebalancerListener on the consumer registration path to watch for child-node changes. Internally, the listener creates a thread that periodically checks whether the watched event has fired (i.e., whether the set of consumers changed). If nothing changed, it waits for one second; once a change occurs, it calls the syncedRebalance method to rebalance the consumers.
while (!isShuttingDown.get) {
  try {
    lock.lock()
    try {
      if (!isWatcherTriggered)
        cond.await(1000, TimeUnit.MILLISECONDS) // wake up periodically so that it can check the shutdown flag
    } finally {
      doRebalance = isWatcherTriggered
      isWatcherTriggered = false
      lock.unlock()
    }
    if (doRebalance)
      syncedRebalance
  } catch {
    case t => error("error during syncedRebalance", t)
  }
}
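The other half of this handshake is the ZooKeeper callback; a minimal sketch of its shape (the signature follows zkclient's IZkChildListener, and the body is inferred from the loop above rather than quoted from the source):

def handleChildChange(parentPath: String, curChilds: java.util.List[String]) {
  lock.lock()
  try {
    // only record that the watch fired and wake the watcher-executor thread;
    // the actual rebalance happens in syncedRebalance on that thread
    isWatcherTriggered = true
    cond.signalAll()
  } finally {
    lock.unlock()
  }
}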
Internally, syncedRebalance calls def rebalance(cluster: Cluster): Boolean to carry out the actual work.
The pseudocode of this method is:
// Shut down all the data fetchers
closeFetchers
// Release ownership of all partitions
releasePartitionOwnership
// Compute, per the assignment rule, the partitions this consumer now owns and save them into topicRegistry
topicRegistry = getCurrentConsumerPartitionInfo
// Update and restart the fetchers
updateFetchers
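The assignment rule the old consumer uses here is range assignment: sort a topic's partitions and the consumer thread ids, then hand each consumer a contiguous slice, with the first (partitions % consumers) consumers taking one extra partition. An illustrative sketch of the arithmetic (variable names are mine, not the original code):

// curPartitions: the topic's partition ids, sorted; curConsumers: sorted consumer thread ids
val nPartsPerConsumer = curPartitions.size / curConsumers.size
val nConsumersWithExtraPart = curPartitions.size % curConsumers.size
val myIndex = curConsumers.indexOf(consumerThreadId)
// consumers whose index is below the remainder own one extra partition
val startPart = nPartsPerConsumer * myIndex + myIndex.min(nConsumersWithExtraPart)
val nParts = nPartsPerConsumer + (if (myIndex < nConsumersWithExtraPart) 1 else 0)
val myPartitions = curPartitions.slice(startPart, startPart + nParts)

For example, with 5 partitions and 2 consumers, consumer 0 takes partitions 0-2 and consumer 1 takes partitions 3-4.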
updateFetcher is implemented as follows.
private def updateFetcher(cluster: Cluster) {
  // walk the current consumer's partition info saved in topicRegistry
  // and rebuild the fetcher's partition list
  var allPartitionInfos: List[PartitionTopicInfo] = Nil
  for (partitionInfos <- topicRegistry.values)
    for (partition <- partitionInfos.values)
      allPartitionInfos ::= partition
  info("Consumer " + consumerIdString + " selected partitions : " +
    allPartitionInfos.sortWith((s, t) => s.partition < t.partition).map(_.toString).mkString(","))
  fetcher match {
    case Some(f) =>
      // initialize the fetcher and start it via startConnections
      f.startConnections(allPartitionInfos, cluster)
    case None =>
  }
}
When the Fetcher runs startConnections, it first groups the topicInfos by broker id:
for (info <- topicInfos) {
  m.get(info.brokerId) match {
    case None => m.put(info.brokerId, List(info))
    case Some(lst) => m.put(info.brokerId, info :: lst)
  }
}

It then checks that the broker behind each group of topicInfos is registered in the current cluster:
val brokers = ids.map { id =>
  cluster.getBroker(id) match {
    case Some(broker) => broker
    case None => throw new IllegalStateException("Broker " + id + " is unavailable, fetchers could not be started")
  }
}

Finally, it creates one FetcherRunnable thread per broker and starts it, as sketched below. Each of these threads keeps fetching data from its broker and inserting the data into the internal blocking queues.
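A rough sketch of that per-broker startup (the FetcherRunnable constructor arguments here are an assumption based on the surrounding code, not a verbatim quote):

// m: the brokerId -> List[PartitionTopicInfo] map built above
for ((brokerId, infos) <- m) {
  val broker = brokers.find(_.id == brokerId).get
  // one long-lived thread per broker: it loops multifetch -> enqueue
  val fetcherThread = new FetcherRunnable("FetchRunnable-" + brokerId, zkClient, config, broker, infos)
  fetcherThread.start()
}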
Inside the thread, the fetch loop looks like this:

// Build one FetchRequest per partition
val fetches = partitionTopicInfos.map(info =>
  new FetchRequest(info.topic, info.partition.partId, info.getFetchOffset, config.fetchSize))
// Execute the fetches as a single batch
val response = simpleConsumer.multifetch(fetches: _*)
....
// Walk the data returned for each partition
for ((messages, infopti) <- response.zip(partitionTopicInfos)) {
  try {
    var done = false
    // The offset stored in ZK may no longer exist on the Kafka brokers, e.g. when the
    // consumer has been down so long that the data at that offset has already been
    // expired and cleaned up on the cluster
    if (messages.getErrorCode == ErrorMapping.OffsetOutOfRangeCode) {
      info("offset for " + infopti + " out of range")
      // see if we can fix this error
      val resetOffset = resetConsumerOffsets(infopti.topic, infopti.partition)
      if (resetOffset >= 0) {
        infopti.resetFetchOffset(resetOffset)
        infopti.resetConsumeOffset(resetOffset)
        done = true
      }
    }
    // On success, enqueue the messages: the partition info, the fetched message set and
    // the fetch offset used are wrapped into a FetchedDataChunk and put into the
    // partition's internal queue, i.e. chunkQueue.put(new FetchedDataChunk(messages, this, fetchOffset))
    if (!done)
      read += infopti.enqueue(messages, infopti.getFetchOffset)
  }
  ....
}
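The resetConsumerOffsets call above works roughly as follows: depending on the autooffset.reset setting, it asks the broker for the earliest or latest available offset and rewinds to it. A sketch under that assumption (not the verbatim source; the real method also writes the new offset back to ZK):

private def resetConsumerOffsets(topic: String, partition: Partition): Long = {
  // pick the rewind target from the autooffset.reset policy
  val offsetTime = config.autoOffsetReset match {
    case OffsetRequest.SmallestTimeString => OffsetRequest.EarliestTime
    case OffsetRequest.LargestTimeString  => OffsetRequest.LatestTime
    case _ => return -1 // unknown policy: report failure to the caller
  }
  // ask the broker which offset corresponds to that time
  val offsets = simpleConsumer.getOffsetsBefore(topic, partition.partId, offsetTime, 1)
  offsets(0)
}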
The client drains data from the partitions' internal queues through a ConsumerIterator. ConsumerIterator implements the IteratorTemplate interface and keeps an Iterator in its internal field current; each call to makeNext checks it first, taking items from current when it has any and pulling a new chunk from the queue otherwise.

protected def makeNext(): MessageAndMetadata[T] = {
  var currentDataChunk: FetchedDataChunk = null
  // if we don't have an iterator, get one: first try the internal field
  var localCurrent = current.get()
  if (localCurrent == null || !localCurrent.hasNext) {
    // nothing left in the internal field, so consult the timeout setting
    if (consumerTimeoutMs < 0)
      // negative (-1) means never time out: if no new data arrives,
      // the client thread blocks in channel.take
      currentDataChunk = channel.take
    else {
      // a timeout is set: poll returns after at most that long, and a null
      // result means no new data arrived, so throw a timeout exception
      currentDataChunk = channel.poll(consumerTimeoutMs, TimeUnit.MILLISECONDS)
      if (currentDataChunk == null) {
        // reset state to make the iterator re-iterable
        resetState()
        throw new ConsumerTimeoutException
      }
    }
    // Kafka enqueues the shutdown command as just another data chunk,
    // which preserves message ordering
    if (currentDataChunk eq ZookeeperConsumerConnector.shutdownCommand) {
      debug("Received the shutdown command")
      channel.offer(currentDataChunk)
      return allDone
    } else {
      currentTopicInfo = currentDataChunk.topicInfo
      if (currentTopicInfo.getConsumeOffset != currentDataChunk.fetchOffset) {
        error("consumed offset: %d doesn't match fetch offset: %d for %s;\n Consumer may lose data"
          .format(currentTopicInfo.getConsumeOffset, currentDataChunk.fetchOffset, currentTopicInfo))
        currentTopicInfo.resetConsumeOffset(currentDataChunk.fetchOffset)
      }
      // turn the chunk's messages into an iterator
      localCurrent = if (enableShallowIterator) currentDataChunk.messages.shallowIterator
                     else currentDataChunk.messages.iterator
      // store the new iterator in current so the next call can read from it directly
      current.set(localCurrent)
    }
  }
  // take the next message and record its offset in consumedOffset
  val item = localCurrent.next()
  consumedOffset = item.offset
  // decode the message, wrap it with its topic into a MessageAndMetadata and return it
  new MessageAndMetadata(decoder.toEvent(item.message), currentTopicInfo.topic)
}
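The shutdown chunk checked for above is injected by the connector when it shuts down; a sketch of that side (names assumed; queues stands for all the partition chunk queues of this connector):

private def sendShutdownToAllQueues() {
  for (queue <- queues) {
    queue.clear() // drop anything not yet consumed
    queue.put(ZookeeperConsumerConnector.shutdownCommand) // sentinel chunk
  }
}

Note how makeNext re-offers the command (channel.offer(currentDataChunk)) before returning allDone, so anything else draining the same queue sees it too.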
ConsumerIterator's next method:

override def next(): MessageAndMetadata[T] = {
  val item = super.next()
  if (consumedOffset < 0)
    throw new IllegalStateException("Offset returned by the message set is invalid %d".format(consumedOffset))
  // use the consumedOffset set by makeNext to advance the topicInfo's consume offset
  currentTopicInfo.resetConsumeOffset(consumedOffset)
  val topic = currentTopicInfo.topic
  trace("Setting %s consumed offset to %d".format(topic, consumedOffset))
  ConsumerTopicStat.getConsumerTopicStat(topic).recordMessagesPerTopic(1)
  ConsumerTopicStat.getConsumerAllTopicStat().recordMessagesPerTopic(1)
  // return the item produced by makeNext
  item
}

KafkaStream is a further wrapper around ConsumerIterator; calling the stream's next method yields the data (internally this just delegates to ConsumerIterator's next), as the sketch below shows.
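A sketch of how thin that wrapper is (the constructor's field list is assumed from the calls shown above, not quoted from the source):

import java.util.concurrent.BlockingQueue

class KafkaStream[T](queue: BlockingQueue[FetchedDataChunk],
                     consumerTimeoutMs: Int,
                     decoder: Decoder[T],
                     enableShallowIterator: Boolean)
    extends Iterable[MessageAndMetadata[T]] {
  // a single ConsumerIterator does all of the work shown above
  private val iter = new ConsumerIterator[T](queue, consumerTimeoutMs, decoder, enableShallowIterator)
  // so stream.iterator.next() is exactly ConsumerIterator.next()
  def iterator: ConsumerIterator[T] = iter
}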
This article is reposted from Tian Jiaguo: http://www.tianjiaguo.com/system-architecture/kafka/kafka的zookeeperconsumer实现/