The data-fetching flow of Kafka's ZookeeperConsumer works as follows:
The entry point is this method on ZookeeperConsumerConnector:

def consume[T](topicCountMap: scala.collection.Map[String,Int], decoder: Decoder[T]): Map[String,List[KafkaStream[T]]]
After the client starts, it registers a ZKRebalancerListener that watches for child-node changes under the consumer registration directory. The ZKRebalancerListener creates an internal thread that periodically checks whether the watched event has fired (i.e. whether the set of consumers has changed). If nothing has changed, it waits for up to one second; once a change is detected, it calls syncedRebalance to rebalance the consumers.
while (!isShuttingDown.get) {
  try {
    lock.lock()
    try {
      if (!isWatcherTriggered)
        cond.await(1000, TimeUnit.MILLISECONDS) // wake up periodically so that it can check the shutdown flag
    } finally {
      doRebalance = isWatcherTriggered
      isWatcherTriggered = false
      lock.unlock()
    }
    if (doRebalance)
      syncedRebalance
  } catch {
    case t => error("error during syncedRebalance", t)
  }
}
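The isWatcherTriggered flag that this loop polls is set by the listener's ZooKeeper callback. A minimal sketch of that side, assuming the I0Itec ZkClient IZkChildListener callback that ZKRebalancerListener implements (reconstructed from the loop above, not quoted from the source):

def handleChildChange(parentPath: String, curChilds: java.util.List[String]) {
  lock.lock()
  try {
    // record that the set of consumers changed and wake up the watcher-executor thread
    isWatcherTriggered = true
    cond.signalAll()
  } finally {
    lock.unlock()
  }
}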
syncedRebalance internally calls def rebalance(cluster: Cluster): Boolean to do the actual work. Pseudocode for this method:
// close all fetchers
closeFetchers
// release ownership of the partitions
releasePartitionOwnership
// compute the partitions this consumer now owns, per the assignment rule, and save them into topicRegistry
topicRegistry = getCurrentConsumerPartitionInfo
// update and restart the fetchers
updateFetchers
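The assignment rule in the third step is a range assignment: the sorted partition list is split evenly across the sorted consumer threads, and the first (total % consumers) threads each get one extra partition. A standalone sketch of the idea (function and parameter names are hypothetical, for illustration only, not the Kafka source):

// Returns the slice of `partitions` owned by the consumer thread at `myIndex`
// among `nConsumers` threads, under range assignment.
def assignRange(partitions: Seq[Int], myIndex: Int, nConsumers: Int): Seq[Int] = {
  val nPartsPerConsumer = partitions.size / nConsumers
  val nExtra = partitions.size % nConsumers // first nExtra consumers get one extra partition
  val start = myIndex * nPartsPerConsumer + math.min(myIndex, nExtra)
  val count = nPartsPerConsumer + (if (myIndex < nExtra) 1 else 0)
  partitions.slice(start, start + count)
}

For example, with 5 partitions and 2 consumer threads, thread 0 owns partitions 0, 1, 2 and thread 1 owns partitions 3, 4.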
updateFetcher is implemented like this:
private def updateFetcher(cluster: Cluster) {
  // walk the current consumer's partition info saved in topicRegistry
  // and rebuild the fetcher's partition list from it
  var allPartitionInfos : List[PartitionTopicInfo] = Nil
  for (partitionInfos <- topicRegistry.values)
    for (partition <- partitionInfos.values)
      allPartitionInfos ::= partition
  info("Consumer " + consumerIdString + " selected partitions : " +
    allPartitionInfos.sortWith((s,t) => s.partition < t.partition).map(_.toString).mkString(","))
  fetcher match {
    case Some(f) =>
      // call the fetcher's startConnections method to initialize and start it
      f.startConnections(allPartitionInfos, cluster)
    case None =>
  }
}
In startConnections, the Fetcher first groups the topicInfos by broker id:
for(info <- topicInfos) {
  m.get(info.brokerId) match {
    case None => m.put(info.brokerId, List(info))
    case Some(lst) => m.put(info.brokerId, info :: lst)
  }
}
It then checks that the broker behind each group of topicInfos is registered in the current cluster:
val brokers = ids.map { id =>
  cluster.getBroker(id) match {
    case Some(broker) => broker
    case None => throw new IllegalStateException("Broker " + id + " is unavailable, fetchers could not be started")
  }
}
Finally, it creates and starts one FetcherRunnable thread per broker. This thread keeps fetching data from the broker and inserting it into the partitions' internal blocking queues.
// create one FetchRequest per partition
val fetches = partitionTopicInfos.map(info =>
  new FetchRequest(info.topic, info.partition.partId, info.getFetchOffset, config.fetchSize))
// execute the fetches as one batch
val response = simpleConsumer.multifetch(fetches : _*)
...
// walk through the returned data
for((messages, infopti) <- response.zip(partitionTopicInfos)) {
  try {
    var done = false
    // the offset stored in ZooKeeper may no longer exist on the brokers, e.g. when the
    // consumer has been down for a long time and the data at that offset has already
    // been expired and cleaned up on the Kafka cluster
    if(messages.getErrorCode == ErrorMapping.OffsetOutOfRangeCode) {
      info("offset for " + infopti + " out of range")
      // see if we can fix this error
      val resetOffset = resetConsumerOffsets(infopti.topic, infopti.partition)
      if(resetOffset >= 0) {
        infopti.resetFetchOffset(resetOffset)
        infopti.resetConsumeOffset(resetOffset)
        done = true
      }
    }
    // on success, enqueue the messages: the partition info, the fetched messages and the
    // fetch offset used are wrapped into a FetchedDataChunk and put into the partition's
    // internal queue (chunkQueue.put(new FetchedDataChunk(messages, this, fetchOffset)))
    if (!done)
      read += infopti.enqueue(messages, infopti.getFetchOffset)
  } catch {
    ... // error handling elided
  }
}
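resetConsumerOffsets resolves the out-of-range error according to the autooffset.reset config. A sketch of that logic, assuming the smallest/largest policy values and the 0.7-era SimpleConsumer.getOffsetsBefore call (an illustration of the behavior described above, not the literal Kafka source):

private def resetConsumerOffsets(topic: String, partition: Partition): Long = {
  // pick a target "time" according to the configured policy
  val time = config.autoOffsetReset match {
    case "smallest" => OffsetRequest.EarliestTime // restart from the oldest retained message
    case "largest"  => OffsetRequest.LatestTime   // skip ahead to the newest message
    case _          => return -1                  // unknown policy: leave the error unresolved
  }
  // ask the broker which offset corresponds to that time and take the first result
  // (the real implementation also writes the new offset back to ZooKeeper)
  val offsets = simpleConsumer.getOffsetsBefore(topic, partition.partId, time, 1)
  offsets(0)
}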
The client uses a ConsumerIterator to keep pulling data from the partitions' internal queues. ConsumerIterator implements the IteratorTemplate interface and keeps an internal Iterator field named current; each call to makeNext checks it first, taking from it if it has data and falling back to the queue otherwise.
protected def makeNext(): MessageAndMetadata[T] = {
  var currentDataChunk: FetchedDataChunk = null
  // if we don't have an iterator, get one from the internal field current
  var localCurrent = current.get()
  if(localCurrent == null || !localCurrent.hasNext) {
    // nothing left in the internal iterator, so check the timeout setting
    if (consumerTimeoutMs < 0)
      currentDataChunk = channel.take // negative (-1) means never time out; with no new data the client thread blocks here on channel.take
    else {
      // a timeout is set: poll returns within that time; a null return means no new data arrived, so throw a timeout exception
      currentDataChunk = channel.poll(consumerTimeoutMs, TimeUnit.MILLISECONDS)
      if (currentDataChunk == null) {
        // reset state to make the iterator re-iterable
        resetState()
        throw new ConsumerTimeoutException
      }
    }
    // Kafka puts the shutdown command into the queue as a data chunk too; this preserves message ordering
    if(currentDataChunk eq ZookeeperConsumerConnector.shutdownCommand) {
      debug("Received the shutdown command")
      channel.offer(currentDataChunk)
      return allDone
    } else {
      currentTopicInfo = currentDataChunk.topicInfo
      if (currentTopicInfo.getConsumeOffset != currentDataChunk.fetchOffset) {
        error("consumed offset: %d doesn't match fetch offset: %d for %s;\n Consumer may lose data"
          .format(currentTopicInfo.getConsumeOffset, currentDataChunk.fetchOffset, currentTopicInfo))
        currentTopicInfo.resetConsumeOffset(currentDataChunk.fetchOffset)
      }
      // turn the messages in the chunk into an iterator
      localCurrent = if (enableShallowIterator) currentDataChunk.messages.shallowIterator
                     else currentDataChunk.messages.iterator
      // store the new iterator in current, so the next call can read from it directly
      current.set(localCurrent)
    }
  }
  // take the next message and set consumedOffset from its offset
  val item = localCurrent.next()
  consumedOffset = item.offset
  // decode the message, wrap it with its topic into a MessageAndMetadata object, and return it
  new MessageAndMetadata(decoder.toEvent(item.message), currentTopicInfo.topic)
}
ConsumerIterator's next method:
override def next(): MessageAndMetadata[T] = {
  val item = super.next()
  if(consumedOffset < 0)
    throw new IllegalStateException("Offset returned by the message set is invalid %d".format(consumedOffset))
  // use the consumedOffset set by makeNext to update the topicInfo's consume offset
  currentTopicInfo.resetConsumeOffset(consumedOffset)
  val topic = currentTopicInfo.topic
  trace("Setting %s consumed offset to %d".format(topic, consumedOffset))
  ConsumerTopicStat.getConsumerTopicStat(topic).recordMessagesPerTopic(1)
  ConsumerTopicStat.getConsumerAllTopicStat().recordMessagesPerTopic(1)
  // return the item produced by makeNext
  item
}
KafkaStream is a thin wrapper around ConsumerIterator: calling the stream's next method gives you the data (internally it just calls ConsumerIterator's next).
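Putting the pieces together, a minimal usage sketch (the topic name "events", the single-stream count, and the StringDecoder are assumptions for illustration; the consume signature is the entry point shown at the top, and KafkaStream is assumed iterable, as its next method suggests):

// consumerConnector is an already-constructed ZookeeperConsumerConnector
val streams = consumerConnector.consume(Map("events" -> 1), new StringDecoder)
// each KafkaStream iterates via ConsumerIterator, pulling chunks from the
// partitions' internal queues as described above
for (stream <- streams("events"); msgAndMeta <- stream)
  println("topic=" + msgAndMeta.topic + " message=" + msgAndMeta.message)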
This article is reposted from Tian Jiaguo: http://www.tianjiaguo.com/system-architecture/kafka/kafka的zookeeperconsumer实现/