Introduction
1. How do the Producer and the Replicas mechanism achieve At-Least-Once semantics on the produce side?
2. How do the Consumer and the GroupCoordinator achieve load balancing and reliable offset management?
3. How does Kafka 0.11.0.0 implement efficient Exactly-Once semantics? //TODO:
Kafka 0.10.0 does not yet implement Exactly-Once[1]. Although the consume side can achieve Exactly-Once by committing offsets transactionally together with its processing results, the produce path can still write duplicate log entries whenever the ack response is lost, so for this version the recommended way to get Exactly-Once is to make the consumer side idempotent.
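A minimal sketch of the consumer-side idempotence recommended above. `IdempotentApplier` and its `lastApplied` map are hypothetical names; in a real system the map stands in for state persisted atomically with the processing side effect:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;

// Hypothetical helper: turns at-least-once delivery into effectively-once
// processing by skipping records whose offset was already applied.
public class IdempotentApplier {
    // "topic-partition" -> highest offset already applied; in practice this must
    // be persisted in the same transaction as the processing side effect
    private final Map<String, Long> lastApplied = new HashMap<>();

    public void apply(ConsumerRecord<String, String> record) {
        String key = record.topic() + "-" + record.partition();
        Long last = lastApplied.get(key);
        if (last != null && record.offset() <= last) {
            return; // duplicate from a producer retry or a replayed fetch: skip
        }
        // ... apply the record's side effect here ...
        lastApplied.put(key, record.offset());
    }
}
```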
一.Index
1.1 Produce Flow
org.apache.kafka.clients.producer.KafkaProducer.KafkaProducer
->KafkaProducer.KafkaProducer()
->KafkaProducer.partitioner = config.getConfiguredInstance(ProducerConfig.PARTITIONER_CLASS_CONFIG, Partitioner.class);
->retryBackoffMs, METADATA_MAX_AGE_CONFIG, maxRequestSize, totalMemorySize, compressionType, maxBlockTimeMs, requestTimeoutMs
->this.metadata.update(Cluster.bootstrap(addresses), time.milliseconds());
->org.apache.kafka.common.network.ChannelBuilders.create
->PlaintextChannelBuilder.configure //skip the security-related parts for now
->NetworkClient client = new NetworkClient(new Selector(..channelBuilder..), ..)
->client.metadataUpdater=new DefaultMetadataUpdater(metadata)
->this.selector = selector;
->this.inFlightRequests = new InFlightRequests(maxInFlightRequestsPerConnection);
->this.connectionStates = new ClusterConnectionStates(reconnectBackoffMs)
->KafkaProducer.sender = new Sender(client,this.metadata,..)
->KafkaProducer.ioThread = new KafkaThread(ioThreadName, this.sender, true);
->org.apache.kafka.clients.producer.internals.Sender.run //the Sender thread
->while (running)
->org.apache.kafka.clients.producer.internals.Sender.run
->RecordAccumulator.ReadyCheckResult result= Sender.accumulator.ready(cluster, now);//get the list of partitions with data ready to send
->if (result.unknownLeadersExist) this.metadata.requestUpdate() // if there are any partitions whose leaders are not known yet, force metadata update
->client.ready(node, now)//check that metadata is current, the connection is established, the Channel is ready, and inFlightRequests is below the threshold; nodes that fail any check are removed
->Map<Integer, List<RecordBatch>> batches = Sender.accumulator.drain(cluster,result.readyNodes,this.maxRequestSize,now)
->guaranteeMessageOrder?this.accumulator.mutePartition(batch.topicPartition)//with max.in.flight.requests.per.connection=1 the producer sends in strict order (see the config sketch after this subtree)
->this.accumulator.mutePartition(batch.topicPartition);//sets RecordAccumulator.muted so that subsequent drain passes skip the TopicPartition that is currently in flight
->List<RecordBatch> expiredBatches = this.accumulator.abortExpiredBatches(this.requestTimeout, now); //batches that timed out and will not be retried: reclaim their memory and call RecordBatch.done to run the callbacks
->RecordBatch.maybeExpire
->RecordBatch.done(new TimeoutException())
->RecordBatch.thunks.callback.onCompletion(null, exception);
->produceFuture.done(topicPartition, baseOffset, exception)
->List<ClientRequest> requests = Sender.createProduceRequests(batches, now)
->Sender.produceRequest
->ProduceRequest request = new ProduceRequest(acks, timeout, produceRecordsByPartition);
->RequestSend send = new RequestSend(Integer.toString(destination),this.client.nextRequestHeader(ApiKeys.PRODUCE),request.toStruct());
->new ClientRequest(now, acks != 0, send, callback);
->NetworkClient.send(request, now); //Queue up the given request for sending
->NetworkClient.doSend
->NetworkClient.inFlightRequests.add(request)
->NetworkClient.Selector.send
->KafkaChannel.send= send
->this.transportLayer.addInterestOps(SelectionKey.OP_WRITE);
->Sender.NetworkClient.poll(pollTimeout, now); //Do actual reads and writes to sockets
->metadataUpdater.maybeUpdate(now) //decide whether metadata needs updating; if so, queue a MetadataRequest
->NetworkClient.selector.poll(Utils.min(timeout, metadataTimeout, requestTimeoutMs)) //Do I/O. Reads, writes, connection establishment
->Selector.pollSelectionKeys //NIO reads and writes
->↓[Leader Broker]↓KafkaRequestHandler.run
->kafka.server.KafkaApis.handle
->kafka.server.KafkaApis.handleProducerRequest[TODO://]
->ReplicaManager.appendMessages
->ReplicaManager.appendToLocalLog
->Partition.appendMessagesToLeader
->if (inSyncSize < minIsr && requiredAcks == -1) {throw new NotEnoughReplicasException()}//if acks=-1 and the ISR size is below min.insync.replicas, throw and refuse the write
-> val info = log.append(messages, assignOffsets = true)
->val segment = maybeRoll(validMessages.sizeInBytes) // maybe roll the log if this segment is full
->segment.append(appendInfo.firstOffset, validMessages)// now append to the log
->updateLogEndOffset(appendInfo.lastOffset + 1) //update the log's LEO (log end offset)
->replicaManager.tryCompleteDelayedFetch(new TopicPartitionOperationKey(this.topic, this.partitionId))
->DelayedFetch.tryComplete
->DelayedFetch.onComplete
->ReplicaManager.readFromLocalLog
->if (readOnlyCommitted) localReplica.highWatermark.messageOffset //if only committed messages may be read, read up to the HW
->Partition.maybeIncrementLeaderHW(leaderReplica) //advance the HW so consumers can see messages newly replicated by the ISR
->if ReplicaManager.delayedRequestRequired //if acks=all (or more data is still being appended, etc.), register a delayed request
->delayedProducePurgatory.tryCompleteElseWatch(delayedProduce, producerRequestKeys)
->DelayedProduce.tryComplete
->Partition.checkEnoughReplicasReachOffset //when acks=all, check whether the ISR has caught up to this append's offset
-> if (leaderReplica.highWatermark.messageOffset >= requiredOffset && minIsr <= curInSyncReplicas.size) //the append succeeds once the HW has reached requiredOffset and the ISR size meets the minimum
->sendResponseCallback
->Selector.addToCompletedReceives//Selector.stagedReceives->Selector.completedReceives
->Selector.maybeCloseOldestConnection//close idle connections
->NetworkClient.handleCompletedSends //start building the List<ClientResponse>; sends that expect no response complete here
->Selector.completedSends
->InFlightRequests.requests.get(node)
->NetworkClient.handleCompletedReceives //parse received responses into the List<ClientResponse>
->Selector.completedReceives, NetworkClient.source
->NetworkClient.parseResponse
->NetworkClient.MetadataUpdater.maybeHandleCompletedReceive(req, now, body)
->responses.add(new ClientResponse(req, now, false, body))
->NetworkClient.handleDisconnections //handle disconnected connections
->NetworkClient.processDisconnection
->metadataUpdater.requestUpdate()
->NetworkClient.handleConnections //record connections established during the last poll
->NetworkClient.handleTimedOutRequests //handle timed-out requests
->this.inFlightRequests.getNodesWithTimedOutRequests(now, this.requestTimeoutMs)
->processDisconnection(responses, nodeId, now);
->metadataUpdater.requestUpdate();
->response.request().callback().onComplete(response) //run the callbacks
->Sender.handleProduceResponse
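The `guaranteeMessageOrder` branch above is driven by configuration. A hedged sketch of the producer settings involved, using the standard `ProducerConfig` keys (the broker address is a placeholder):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

// Settings for ordered, at-least-once delivery on the 0.10.x producer.
public class OrderedProducerConfig {
    static Properties ordered() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // leader waits for the ISR (subject to min.insync.replicas)
        props.put(ProducerConfig.RETRIES_CONFIG, 3);  // resend on lost acks -> at-least-once (may duplicate)
        // with >1 in-flight request, a retried batch can overtake a later one and reorder the log
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
        return props;
    }
}
```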
->KafkaProducer.keySerializer, KafkaProducer.valueSerializer
->KafkaProducer.interceptors
->KafkaProducer.send(ProducerRecord<K, V> record, Callback callback)
->KafkaProducer.doSend()
->KafkaProducer.waitOnMetadata //set Metadata.needUpdate=true, wake the client, then wait for the metadata update; NetworkClient.poll will call metadataUpdater.maybeUpdate(now) to send the MetadataRequest (a usage sketch follows at the end of this flow)
->KafkaProducer.metadata.add(topic)
->int version = metadata.requestUpdate()
->serializedKey = keySerializer.serialize(record.topic(), record.key());serializedValue = valueSerializer.serialize(record.topic(), record.value());
->KafkaProducer.partition(record, serializedKey, serializedValue, metadata.fetch());
->RecordAccumulator.append //the record is stored in the RecordBatch of the corresponding TopicPartition in RecordAccumulator.batches
->RecordAccumulator.tryAppend
->RecordBatch.tryAppend->RecordBatch.records.append(offsetCounter++, timestamp, key, value);//add to MemoryRecords
->RecordBatch.thunks.add(new Thunk(callback, future));//save the callback
->//if no batch is available, allocate a new RecordBatch for this TopicPartition
->BufferPool.allocate
->MemoryRecords records = MemoryRecords.emptyRecords(buffer, compression, this.batchSize);
->RecordBatch batch = new RecordBatch(tp, records, time.milliseconds());
->RecordBatch.tryAppend(timestamp, key, value, callback, time.milliseconds())
->dq.addLast(batch);
->if (result.batchIsFull || result.newBatchCreated) {this.sender.wakeup();} //Waking up the sender
->KafkaProducer.close
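A minimal usage sketch of the send path traced above; the topic name, key/value, and broker address are placeholders. The callback is the `Thunk` stored by RecordAccumulator.append and runs from handleProduceResponse:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProduceExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder
        props.put("acks", "all");                        // at-least-once: wait for the ISR
        props.put("retries", 3);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // doSend(): wait for metadata, serialize, pick a partition, append to the
        // RecordAccumulator; the Sender thread drains the batch asynchronously.
        producer.send(new ProducerRecord<>("demo-topic", "key", "value"), (metadata, exception) -> {
            if (exception != null) {
                exception.printStackTrace(); // batch expired or all retries exhausted
            } else {
                System.out.printf("appended to %s-%d @ offset %d%n",
                        metadata.topic(), metadata.partition(), metadata.offset());
            }
        });
        producer.close(); // flush outstanding batches
    }
}
```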
1.2 Consume Flow
org.apache.kafka.clients.consumer.KafkaConsumer
->KafkaConsumer.KafkaConsumer()
->KafkaConsumer.metadata.update(Cluster.bootstrap(addresses), 0)
->NetworkClient netClient = new NetworkClient()
->KafkaConsumer.client = new ConsumerNetworkClient(netClient...)
->client = new ConsumerNetworkClient()
->this.subscriptions = new SubscriptionState(offsetResetStrategy);
->List<PartitionAssignor> assignors = config.getConfiguredInstances(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,PartitionAssignor.class);
->this.interceptors = interceptorList
->this.coordinator = new ConsumerCoordinator() //initialize the coordinator logic for this consumer group
->ConsumerCoordinator.metadata.requestUpdate();
->ConsumerCoordinator.addMetadataListener //a metadata change must trigger partition reassignment
->onMetadataUpdate->ConsumerCoordinator.subscriptions.needsPartitionAssignment=true //check if there are any changes to the metadata which should trigger a rebalance
->if (autoCommitEnabled) //if auto-commit is enabled (a config sketch follows this subtree)
-> this.autoCommitTask = new AutoCommitTask(autoCommitIntervalMs);
->AutoCommitTask.run
->ConsumerCoordinator.commitOffsetsAsync
->ConsumerCoordinator.lookupCoordinator()//find the Coordinator for this group
->AbstractCoordinator.sendGroupCoordinatorRequest //send the GroupCoordinatorRequest
->Node node = this.client.leastLoadedNode();//the node with the fewest in-flight requests
->GroupCoordinatorRequest metadataRequest = new GroupCoordinatorRequest(this.groupId);
->AbstractCoordinator.client.send(node, ApiKeys.GROUP_COORDINATOR, metadataRequest) [Broker TODO://]
->AbstractCoordinator.handleGroupMetadataResponse
->AbstractCoordinator.coordinator=new Node(Integer.MAX_VALUE - groupCoordinatorResponse.node().id(), groupCoordinatorResponse.node().host(), groupCoordinatorResponse.node().port())//save the coordinator as a dedicated Node; the id offset keeps the coordinator connection separate from ordinary broker connections
->ConsumerCoordinator.doCommitOffsetsAsync
->ConsumerCoordinator.sendOffsetCommitRequest(offsets);
->OffsetCommitRequest req = new OffsetCommitRequest()
->ConsumerNetworkClient.send
->ConsumerNetworkClient.unsent.put(node, nodeUnsent);
->OffsetCommitResponseHandler.handle //response callback
->ConsumerCoordinator.subscriptions.committed(tp, offsetAndMetadata) // update the local cache only if the partition is still assigned
->ConsumerNetworkClient.pollNoWakeup//trigger the actual I/O
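A hedged sketch of the configuration that enables the AutoCommitTask above (group id and broker address are placeholders). Because the task commits on a timer, offsets can be committed before processing finishes or lost work can be replayed after a crash, depending on timing:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class AutoCommitConfig {
    static Properties autoCommit() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");            // placeholder
        // AutoCommitTask fires commitOffsetsAsync every interval
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
        props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "5000");
        return props;
    }
}
```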
->KafkaConsumer.subscribe //a usage sketch follows this subtree
->KafkaConsumer.subscriptions.subscribe(topics, listener);
->SubscriptionState.subscribe
->setSubscriptionType(SubscriptionType.AUTO_TOPICS)
->SubscriptionState.changeSubscription
->SubscriptionState.subscription.addAll(topicsToSubscribe);
->SubscriptionState.groupSubscription.addAll(topicsToSubscribe)
->SubscriptionState.needsPartitionAssignment = true;
->org.apache.kafka.clients.Metadata.topics.addAll(topics)
->KafkaConsumer.metadata.setTopics(subscriptions.groupSubscription());
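A usage sketch of the subscribe path just traced, assuming an already constructed consumer; the listener's callbacks are fired from onJoinPrepare/onJoinComplete during the rebalance traced below:

```java
import java.util.Arrays;
import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SubscribeExample {
    static void subscribe(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Arrays.asList("demo-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // commit or checkpoint progress here before the partitions move away
            }
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // optionally seek() to externally stored offsets here
            }
        });
    }
}
```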
->KafkaConsumer.poll
->KafkaConsumer.pollOnce// this does any needed heart-beating, auto-commits, and offset updates.
->AbstractCoordinator.ensureActiveGroup //Ensure that the group is active (i.e. joined and synced)
->AbstractCoordinator.ensureCoordinatorReady //make sure the Coordinator can be resolved from the groupId
->[Broker]kafka.server.KafkaApis.handleGroupCoordinatorRequest //the coordinator is the leader broker of the __consumer_offsets partition selected by hashing the groupId
->ConsumerCoordinator.onJoinPrepare
->ConsumerCoordinator.commitOffsetsSync //commit the current offsets before rebalancing
->ConsumerCoordinator.subscriptions.needsPartitionAssignment=true
->AbstractCoordinator.sendJoinGroupRequest
->JoinGroupRequest request = new JoinGroupRequest()
->ConsumerNetworkClient.send(coordinator, ApiKeys.JOIN_GROUP, request)
->↓[GroupCoordinator]↓kafka.server.KafkaApis.handleJoinGroupRequest // join group
->GroupCoordinator.handleJoinGroup
->if (group == null) group = groupManager.addGroup(new GroupMetadata(groupId, protocolType)) //if no group metadata exists, create it
->GroupCoordinator.doJoinGroup
->Case => memberId == JoinGroupRequest.UNKNOWN_MEMBER_ID //if the memberId is unknown, generate one and return it to the client
->GroupCoordinator.addMemberAndRebalance
->val memberId = clientId + "-" + group.generateMemberIdSuffix
->val member = new MemberMetadata(memberId, group.groupId, clientId, clientHost, sessionTimeoutMs, protocols)
->group.add(member.memberId, member)
->GroupCoordinator.maybePrepareRebalance
->GroupCoordinator.prepareRebalance
-> group.transitionTo(PreparingRebalance)
->joinPurgatory.tryCompleteElseWatch(new DelayedJoin(this, group, rebalanceTimeout), Seq(groupKey)) //register a DelayedJoin and wait for all known members to re-join or for the timeout to expire
->group.notYetRejoinedMembers.isEmpty //all known members have re-joined
->GroupCoordinator.onCompleteJoin
->group.initNextGeneration() //start a new generation and transition to AwaitingSync
->generationId += 1
->transitionTo(AwaitingSync)
->member.awaitingJoinCallback(joinResult) //send the join response to every member
->val responseBody = new JoinGroupResponse(joinResult.errorCode, joinResult.generationId, joinResult.subProtocol,joinResult.memberId, joinResult.leaderId, members)
->requestChannel.sendResponse(new RequestChannel.Response(request, new ResponseSend(request.connectionId, responseHeader, responseBody)))
->GroupCoordinator.completeAndScheduleNextHeartbeatExpiration //refresh this member's heartbeat expiration
->ELSE =>
->GroupCoordinator.updateMemberAndRebalance
->GroupCoordinator.maybePrepareRebalance //same as above
->↓[Consumer]↓JoinGroupResponseHandler.handle
->AbstractCoordinator.this.memberId = joinResponse.memberId()
->AbstractCoordinator.this.generation = joinResponse.generationId();
->if (joinResponse.isLeader())
->[Leader]AbstractCoordinator.onJoinLeader
->Map<String, ByteBuffer> groupAssignment = ConsumerCoordinator.performAssignment
->ConsumerCoordinator.performAssignment //the partition assignment is computed on the consumer (group leader) side
->PartitionAssignor assignor = lookupAssignor(assignmentStrategy);
->......
->SyncGroupRequest request = new SyncGroupRequest(groupId, generation, memberId, groupAssignment);
->AbstractCoordinator.sendSyncGroupRequest
->client.send(coordinator, ApiKeys.SYNC_GROUP, request).compose(new SyncGroupResponseHandler())
->↓[GroupCoordinator]↓KafkaApis.handleSyncGroupRequest
->GroupCoordinator.handleSyncGroup
->GroupCoordinator.doSyncGroup
->completeAndScheduleNextHeartbeatExpiration(group, group.get(memberId)) //refresh heartbeat
->if (memberId == group.leaderId)
->GroupMetadataManager.store
->GroupMetadataManager.prepareStoreGroup
->val groupMetadataMessageSet = Map(groupMetadataPartition ->new ByteBufferMessageSet(config.offsetsTopicCompressionCodec, message)) //write the assignment as a message into the corresponding __consumer_offsets partition
->replicaManager.appendMessages()
->DelayedStore(groupMetadataMessageSet, putCacheCallback)//wait for the replicas to acknowledge the write
->GroupCoordinator.setAndPropagateAssignment(group, assignment) //on success, set and propagate the assignment
->GroupCoordinator.propagateAssignment
->member.awaitingSyncCallback(member.assignment, errorCode) //send the assignment back to the consumer
->val responseBody = new SyncGroupResponse(errorCode, ByteBuffer.wrap(memberState))
->requestChannel.sendResponse(new Response(request, new ResponseSend(request.connectionId, responseHeader, responseBody)))
->completeAndScheduleNextHeartbeatExpiration(group, member)//refresh the member's heartbeat
->group.transitionTo(Stable) //transition to the Stable state
->↓[Consumer]↓SyncGroupResponseHandler.handle
->[Follower]AbstractCoordinator.onJoinFollower
->AbstractCoordinator.sendSyncGroupRequest//send follower's sync group with an empty assignment
->ConsumerCoordinator.onJoinComplete
->Assignment assignment = ConsumerProtocol.deserializeAssignment(assignmentBuffer);
->ConsumerCoordinator.SubscriptionState.assignFromSubscribed(assignment.partitions()); //update SubscriptionState.assignment
->ConsumerRebalanceListener.onPartitionsAssigned(assigned);
->AbstractCoordinator.HeartbeatTask.reset
->AbstractCoordinator.HeartbeatTask.run //the task that sends periodic heartbeats
->AbstractCoordinator.sendHeartbeatRequest
->HeartbeatRequest req = new HeartbeatRequest(this.groupId, this.generation, this.memberId)
->client.send(coordinator, ApiKeys.HEARTBEAT, req)
->HeartbeatCompletionHandler.handle
->KafkaConsumer.updateFetchPositions //ask the leader(s) to refresh the offsets held in subscriptions
->KafkaConsumer.Fetcher.resetOffsetsIfNeeded(partitions); //if the offset must be reset, request the HW from the leader replica (see the seek sketch after this subtree)
->Fetcher.resetOffset
->Fetcher.sendListOffsetRequest //sent to the leader of the TopicPartition
->Node node = info.leader();
->ListOffsetRequest request = new ListOffsetRequest(-1, partitions);
->client.send(node, ApiKeys.LIST_OFFSETS, request)
->↓[Leader]↓KafkaApis.handleOffsetRequest //served by the partition leader
->val hw = localReplica.highWatermark.messageOffset
->Fetcher.handleListOffsetResponse
->Fetcher.subscriptions.seek(partition, offset);
->coordinator.refreshCommittedOffsetsIfNeeded();//refresh the previously committed consume offsets
->ConsumerCoordinator.fetchCommittedOffsets
->ConsumerCoordinator.sendOffsetFetchRequest
->OffsetFetchRequest request = new OffsetFetchRequest(this.groupId, new ArrayList<>(partitions));
->client.send(coordinator, ApiKeys.OFFSET_FETCH, request)
->↓[GroupCoordinator]↓KafkaApis.handleOffsetFetchRequest
->GroupCoordinator.handleFetchOffsets
->GroupMetadataManager.getOffsets
->GroupMetadataManager.offsetsCache.get(...)//read directly from the offsets cache
->OffsetFetchResponseHandler.handle
->client.poll(future);
->ConsumerCoordinator.subscriptions.committed(tp, entry.getValue());//store into ConsumerCoordinator's SubscriptionState
->fetcher.updateFetchPositions(partitions);
->Fetcher.fetchedRecords//Return the fetched records, empty the record buffer and update the consumed position; records already buffered in Fetcher.completedFetches are returned first
->Fetcher.sendFetches //if the buffered data is not enough, immediately send fetch requests for more
->Fetcher.createFetchRequests //build one FetchRequest per node
->client.send(fetchEntry.getKey(), ApiKeys.FETCH, request)
->Fetcher.completedFetches.add(new CompletedFetch(partition, fetchOffset, fetchData, metricAggregator));//fetched data is appended to Fetcher.completedFetches
->client.poll(timeout, now);
->↓[Leader]↓KafkaApis.handleFetchRequest //same log-read path as the follower fetch in 1.3, except that the HW caps what a consumer may read
->Fetcher.fetchedRecords()//same as above
->KafkaConsumer.Fetcher.sendFetches//send the next round of fetch requests
->ConsumerNetworkClient.poll
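A sketch of overriding the fetch position that updateFetchPositions would otherwise resolve. Without a committed offset, the reset falls back to auto.offset.reset (resolved against the leader's log via a ListOffsetRequest); `seek()` sets the position explicitly so no reset is needed. The partition and offset are placeholders, and the 0.10-style Collection overload is assumed:

```java
import java.util.Collections;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SeekExample {
    static void rewind(KafkaConsumer<String, String> consumer) {
        TopicPartition tp = new TopicPartition("demo-topic", 0); // placeholder
        consumer.seek(tp, 42L); // the next poll() fetches from offset 42
        // or: consumer.seekToBeginning(Collections.singletonList(tp));
    }
}
```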
->KafkaConsumer.commitSync //a consume-and-commit sketch follows this flow
->ConsumerCoordinator.commitOffsetsSync
->ConsumerCoordinator.sendOffsetCommitRequest
->OffsetCommitRequest req = new OffsetCommitRequest();
->client.send(coordinator, ApiKeys.OFFSET_COMMIT, req)
->↓[GroupCoordinator]↓KafkaApis.handleOffsetCommitRequest
->GroupCoordinator.handleCommitOffsets //wait for the offsets to be appended to the offsets topic, then update the cache
->delayedOffsetStore = Some(groupManager.prepareStoreOffsets(groupId, memberId, generationId,offsetMetadata, responseCallback))
->replicaManager.appendMessages()
->GroupMetadataManager.offsetsCache.put(key, offsetAndMetadata) //update cache
->OffsetCommitResponseHandler.handle
->ConsumerCoordinator.subscriptions.committed(tp, offsetAndMetadata);// update the local cache
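Putting the poll and commit paths together, a minimal at-least-once consume loop (addresses, group id, and topic are placeholders). Offsets are committed only after processing, so a crash before commitSync replays records; that is exactly why the introduction recommends idempotent processing:

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumeLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder
        props.put("group.id", "demo-group");            // placeholder
        props.put("enable.auto.commit", "false");       // commit manually below
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000); // heartbeats + fetch
                for (ConsumerRecord<String, String> record : records) {
                    // ... process the record ...
                }
                consumer.commitSync(); // OffsetCommitRequest to the GroupCoordinator
            }
        }
    }
}
```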
1.3 Replicas Mechanism
↓[Controller]↓kafka.server.KafkaApis.createTopic
->kafka.admin.AdminUtils.createTopic
->val brokerMetadatas = getBrokerMetadatas(zkUtils, rackAwareMode)//fetch rack information
->AdminUtils.assignReplicasToBrokers(brokerMetadatas, partitions, replicationFactor)//compute each partition's AR (assigned replicas) list
->AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK //persist the AR list to ZooKeeper [NewPartition]; a topic-creation sketch follows at the end of this flow
->[NewPartition -> OnlinePartition]kafka.controller.PartitionStateMachine.handleStateChange//trigger the NewPartition -> OnlinePartition transition
->PartitionStateMachine.initializeLeaderAndIsrForPartition(topicAndPartition) //initialize leader and isr path for new partition
->val leaderIsrAndControllerEpoch = new LeaderIsrAndControllerEpoch(new LeaderAndIsr(leader, liveAssignedReplicas.toList),controller.epoch) //the first replica in the AR list becomes the leader and the live assigned replicas form the initial ISR
->zkUtils.createPersistentPath(getTopicPartitionLeaderAndIsrPath(..), ..) //write the partition's LeaderAndIsr to ZooKeeper
->controllerContext.partitionLeadershipInfo.put(topicAndPartition, leaderIsrAndControllerEpoch) //cache the leader/ISR info in the controllerContext
->ControllerBrokerRequestBatch.addLeaderAndIsrRequestForBrokers //send a LeaderAndIsrRequest to the brokers
->controller.sendRequest(broker, ApiKeys.LEADER_AND_ISR, None, leaderAndIsrRequest, null)
->↓[Broker]↓kafka.server.KafkaApis.handleLeaderAndIsrRequest //the broker handles the LeaderAndIsrRequest
->kafka.server.ReplicaManager.becomeLeaderOrFollower
->partitionLeaderEpoch < stateInfo.leaderEpoch//compare the controller epoch with the broker's recorded epoch to reject stale Controller requests
->↓[Leader]↓kafka.server.ReplicaManager.makeLeaders //a replica becomes leader iff the leader id in the request equals the current broker id
->replicaFetcherManager.removeFetcherForPartitions(partitionState.keySet.map(new TopicAndPartition(_))) //First stop fetchers for all the partitions
->kafka.cluster.Partition.makeLeader //reset the remote replicas' LEO and record the new leader and ISR
->Partition.getOrCreateReplica
->Case Local //the replica lives on this broker
->val log = logManager.createLog(TopicAndPartition(topic, partitionId), config) //attach the local Log object; holding a Log is what distinguishes a local replica from a remote one
->val checkpoint = replicaManager.highWatermarkCheckpoints(log.dir.getParentFile.getAbsolutePath) //load the HW checkpoint file
->val offset = offsetMap.getOrElse(TopicAndPartition(topic, partitionId), 0L).min(log.logEndOffset) //take this partition's checkpointed HW if present, capped at the log end offset
->val localReplica = new Replica(replicaId, this, time, offset, Some(log))
->Case remote
->val remoteReplica = new Replica(replicaId, this, time)
->Partition.assignedReplicaMap.putIfNotExists(replica.brokerId, replica)//register the replica in the AR map
->isNewLeader
->leaderReplica.convertHWToLocalOffsetMetadata()
->Replica.highWatermarkMetadata = log.get.convertToOffsetMetadata(highWatermarkMetadata.messageOffset)//set the local replica's HW
->kafka.cluster.Replica.updateLogReadResult //update the remote replica's LEO metadata
->Partition.maybeIncrementLeaderHW //take the minimum LEO across the ISR and advance the HW (a sketch follows this branch)
->kafka.cluster.Partition.tryCompleteDelayedRequests[TODO://check the DelayedFetch and DelayedProduce completion conditions]
->replicaManager.tryCompleteDelayedFetch(requestKey)
->replicaManager.tryCompleteDelayedProduce(requestKey)
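A simplified illustration of the HW rule applied by Partition.maybeIncrementLeaderHW, not the broker code itself (which also tracks full offset metadata): the high watermark is the minimum LEO across the ISR, and it only moves forward.

```java
import java.util.Arrays;
import java.util.List;

public class HighWatermark {
    // newHW = min(LEO over ISR), never below the current HW
    static long maybeIncrement(long currentHw, List<Long> isrLogEndOffsets) {
        long newHw = isrLogEndOffsets.stream().min(Long::compare).orElse(currentHw);
        return Math.max(currentHw, newHw);
    }

    public static void main(String[] args) {
        // leader LEO=10, followers at 8 and 9 -> HW advances to 8; consumers may read up to 8
        System.out.println(maybeIncrement(5L, Arrays.asList(10L, 8L, 9L)));
    }
}
```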
->↓[Follower]↓kafka.server.ReplicaManager.makeFollowers
->AbstractFetcherManager.removeFetcherForPartitions //first stop the existing fetchers
->AbstractFetcherThread.removePartitions
->kafka.log.LogManager.truncateTo(partitionsToMakeFollower.map(partition => (new TopicAndPartition(partition), partition.getOrCreateReplica().highWatermark.messageOffset)).toMap)//truncate the log to the HW
->tryCompleteDelayedProduce(topicPartitionOperationKey)
->tryCompleteDelayedFetch(topicPartitionOperationKey)
->kafka.server.AbstractFetcherManager.addFetcherForPartitions //start the fetcher tasks
->createFetcherThread(brokerAndFetcherId.fetcherId, brokerAndFetcherId.broker)
->ReplicaFetcherManager.createFetcherThread
->ReplicaFetcherThread.ReplicaFetcherThread()
->fetcherThreadMap.put(brokerAndFetcherId, fetcherThread)
->fetcherThread.start
->AbstractFetcherThread.doWork //start fetching data from the leader
->new FetchRequest(new JFetchRequest(replicaId, maxWait, minBytes, requestMap.asJava))
->AbstractFetcherThread.processFetchRequest
->ReplicaFetcherThread.fetch
->ReplicaFetcherThread.sendRequest(ApiKeys.FETCH, Some(fetchRequestVersion), fetchRequest.underlying)
->↓[Leader]↓kafka.server.KafkaApis.handleFetchRequest //the leader serves the fetch request
->kafka.server.ReplicaManager.fetchMessages
->ReplicaManager.readFromLocalLog// read from local logs
->kafka.server.ReplicaManager.updateFollowerLogReadResults //if the fetch request came from a follower
->Partition.updateReplicaLogReadResult //update the replica's LEO metadata
->Partition.maybeExpandIsr //if the replica's LEO has reached the HW, add it to the ISR (Partition.inSyncReplicas) and update ZooKeeper
->val newInSyncReplicas = inSyncReplicas + replica
->Partition.updateIsr
->Partition.maybeIncrementLeaderHW(leaderReplica)
->tryCompleteDelayedProduce(new TopicPartitionOperationKey(topicAndPartition)) [TODO://]
->val delayedFetch = new DelayedFetch(timeout, fetchMetadata, this, responseCallback) //if the fetch cannot be satisfied yet, create a DelayedFetch to wait; otherwise return the data immediately
->delayedFetchPurgatory.tryCompleteElseWatch(delayedFetch, delayedFetchKeys)
->sendResponseCallback
->ReplicaFetcherThread.processPartitionData //process fetched data
->replica.log.get.append(messageSet, assignOffsets = false)
->ControllerBrokerRequestBatch.addUpdateMetadataRequestForBrokers //Send UpdateMetadataRequest to the given brokers
->PartitionStateMachine.partitionState.put(topicAndPartition, OnlinePartition)//the Controller records the partition state as OnlinePartition
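For completeness, a hedged sketch of driving the createTopic flow above from code, using the 0.10-era Scala admin API (kafka.admin.AdminUtils / kafka.utils.ZkUtils) from Java. The ZooKeeper address, topic name, and counts are placeholders; min.insync.replicas is the topic config behind the acks=-1 check in 1.1:

```java
import java.util.Properties;
import kafka.admin.AdminUtils;
import kafka.admin.RackAwareMode;
import kafka.utils.ZkUtils;

public class CreateTopicExample {
    public static void main(String[] args) {
        ZkUtils zkUtils = ZkUtils.apply("zk1:2181", 30000, 30000, false); // placeholder ZK address
        Properties topicConfig = new Properties();
        topicConfig.put("min.insync.replicas", "2"); // acks=-1 writes fail below this ISR size
        AdminUtils.createTopic(zkUtils, "demo-topic", 3 /*partitions*/, 3 /*replication*/,
                topicConfig, RackAwareMode.Enforced$.MODULE$); // rack-aware AR assignment, as in 1.3
        zkUtils.close();
    }
}
```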
二.References
- http://www.jasongj.com/tags/Kafka/
- https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
- Apache Kafka源码剖析 (Apache Kafka Source Code Analysis)
- Kafka权威指南 (Kafka: The Definitive Guide)
三.Next
APACHE HBASE 1.2.0 CODE REVIEW
[1] Exactly-Once support is provided from version 0.11.0 onward. ↩