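Let's first look at the situations that trigger a Rebalance:
1. A new consumer joins the Consumer Group.
2. A consumer crashes and goes offline.
3. A consumer voluntarily leaves the Consumer Group.
4. The number of partitions changes for any Topic that the Consumer Group subscribes to.
5. A consumer calls unsubscribe to cancel its subscription to a Topic.
Next, we analyze how the Rebalance operation is implemented.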
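Phase 1
The first step of a Rebalance is to find the GroupCoordinator. In this phase the consumer sends a GroupCoordinatorRequest to one Broker in the Kafka cluster (any Broker can answer; the least-loaded one is chosen) and handles the returned GroupCoordinatorResponse.
1. First, check whether the GroupCoordinator needs to be looked up again. This mainly checks whether the coordinator field is null and whether the connection to the GroupCoordinator is healthy.
public boolean coordinatorUnknown() {
    // check whether the coordinator field is null
    if (coordinator == null)
        return true;
    // check whether the connection is healthy
    if (client.connectionFailed(coordinator)) {
        // clear the matching requests from the unsent collection and set the coordinator field to null
        coordinatorDead();
        return true;
    }
    return false;
}
2. Find the least-loaded Node in the cluster and create a GroupCoordinatorRequest. Calling client.send puts the request into the unsent queue to wait for sending and returns a RequestFuture.
3. Call ConsumerNetworkClient.poll to send the GroupCoordinatorRequest.
4. Check the state of the RequestFuture. If it failed with a RetriableException, call ConsumerNetworkClient.awaitMetadataUpdate(), which blocks until the cluster metadata recorded in Metadata has been updated, then go back to step 1. Any other exception is rethrown.
5. If the GroupCoordinator node was found but the network connection to it failed, clear its requests from unsent, set the coordinator field to null, back off for a while, and go back to step 1.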
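Step 2 relies on picking the "least-loaded" node. As a rough illustration only, the sketch below treats load as the number of in-flight requests per node. The class name LeastLoadedNodeSketch and the map-based input are hypothetical, and the real client also takes connection state into account, which is omitted here.

```java
import java.util.Map;

// Hypothetical, simplified model: the least-loaded node is the one with the
// fewest in-flight requests. Connection state is deliberately ignored here.
final class LeastLoadedNodeSketch {

    /** Returns the id of the node with the fewest in-flight requests, or -1 if there are no nodes. */
    static int leastLoadedNode(Map<Integer, Integer> inFlightByNode) {
        int best = -1;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Integer> entry : inFlightByNode.entrySet()) {
            if (entry.getValue() < bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }
}
```

Returning -1 for an empty map plays the role of the "no broker available" case, which the consumer handles by failing the future instead of sending a request.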
Now let's walk through this process in the source:
public void ensureCoordinatorReady() {
    // loop for as long as the GroupCoordinator still needs to be looked up
    while (coordinatorUnknown()) {
        // put a GroupCoordinatorRequest into the unsent queue to wait for sending
        RequestFuture<Void> future = sendGroupCoordinatorRequest();
        // block until the response arrives
        client.poll(future);
        // check whether the request failed
        if (future.failed()) {
            // for a retriable exception, block until the metadata has been updated
            if (future.isRetriable())
                client.awaitMetadataUpdate();
            else
                throw future.exception();
        } else if (coordinator != null && client.connectionFailed(coordinator)) {
            // we found the coordinator, but the connection has failed, so mark
            // it dead and backoff before retrying discovery
            coordinatorDead();
            time.sleep(retryBackoffMs);
        }
    }
}
Next, let's look at how the methods it calls are implemented:
private RequestFuture<Void> sendGroupCoordinatorRequest() {
    // find the least-loaded node
    Node node = this.client.leastLoadedNode();
    if (node == null) {
        // no node is available, so return a future that has already failed
        return RequestFuture.noBrokersAvailable();
    } else {
        // create the request
        GroupCoordinatorRequest metadataRequest = new GroupCoordinatorRequest(this.groupId);
        // ConsumerNetworkClient.send adds the request to the unsent collection
        return client.send(node, ApiKeys.GROUP_COORDINATOR, metadataRequest)
                .compose(new RequestFutureAdapter<ClientResponse, Void>() {
                    @Override
                    public void onSuccess(ClientResponse response, RequestFuture<Void> future) {
                        handleGroupMetadataResponse(response, future);
                    }
                });
    }
}
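The server answers the request with a Response, and the consumer handles it as follows when it arrives:
1. Call coordinatorUnknown to check whether a GroupCoordinator has already been found and connected. If so, ignore this Response; because of the retry mechanism the request may have been sent more than once.
2. Parse the Response to obtain the GroupCoordinator's information.
3. Build a Node object, assign it to the coordinator field, and try to connect to the GroupCoordinator.
4. If the consumer has already joined a group (generation > 0), reset the HeartbeatTask scheduled task.
5. Finally, call RequestFuture.complete to propagate the event that a GroupCoordinatorResponse was received successfully.
6. If the error code in the Response is not NONE, propagate the exception instead.
private void handleGroupMetadataResponse(ClientResponse resp, RequestFuture<Void> future) {
    log.debug("Received group coordinator response {}", resp);
    // first check whether a coordinator is already known
    if (!coordinatorUnknown()) {
        // one is already known and connected, so ignore this Response
        future.complete(null);
    } else {
        // parse the response body
        GroupCoordinatorResponse groupCoordinatorResponse = new GroupCoordinatorResponse(resp.responseBody());
        // read the error code
        Errors error = Errors.forCode(groupCoordinatorResponse.errorCode());
        // no error
        if (error == Errors.NONE) {
            // build the coordinator node; the id MAX_VALUE - node.id keeps this
            // connection separate from the regular connection to the same broker
            this.coordinator = new Node(Integer.MAX_VALUE - groupCoordinatorResponse.node().id(),
                    groupCoordinatorResponse.node().host(),
                    groupCoordinatorResponse.node().port());
            log.info("Discovered coordinator {} for group {}.", coordinator, groupId);
            // try to connect to the coordinator
            client.tryConnect(coordinator);
            // restart the heartbeat task if we are already a member of the group
            if (generation > 0)
                heartbeatTask.reset();
            // propagate the success event
            future.complete(null);
        } else if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
            // an error occurred, so propagate it as an exception
            future.raise(new GroupAuthorizationException(groupId));
        } else {
            future.raise(error);
        }
    }
}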
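The Node built for the coordinator does not reuse the broker's own id. The small sketch below isolates that arithmetic to show why the two ids cannot collide for realistic broker ids, so the network layer can keep a dedicated connection for coordinator traffic. The class and method names are hypothetical; only the expression Integer.MAX_VALUE - id comes from the source above.

```java
// Hypothetical helper isolating the id arithmetic used when the coordinator Node is built.
final class CoordinatorNodeIdSketch {

    /** Maps a broker id to the id used for the coordinator connection to that same broker. */
    static int coordinatorConnectionId(int brokerId) {
        return Integer.MAX_VALUE - brokerId;
    }

    /** True when the derived id differs from the broker id, i.e. the two connections stay distinct. */
    static boolean isDistinct(int brokerId) {
        return coordinatorConnectionId(brokerId) != brokerId;
    }
}
```

Because Integer.MAX_VALUE is odd, no integer broker id satisfies id == Integer.MAX_VALUE - id, so the derived id always differs from the broker's own id.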
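Phase 2
Once the GroupCoordinator has been found, the consumer enters the JoinGroup phase, in which it sends a JoinGroupRequest to the GroupCoordinator and handles the response.
1. First, call SubscriptionState.partitionsAutoAssigned to check whether the Consumer's subscription mode is AUTO_TOPICS or AUTO_PATTERN. USER_ASSIGNED needs no Rebalance, because the user specifies the partitions directly.
2. If the subscription mode is AUTO_PATTERN, check whether Metadata needs to be updated.
public void ensureFreshMetadata() {
    // check whether Metadata needs an update; if so, block until the update finishes
    if (this.metadata.updateRequested() || this.metadata.timeToNextUpdate(time.milliseconds()) == 0)
        awaitMetadataUpdate();
}
The ConsumerCoordinator constructor registers a listener on Metadata. When Metadata is updated, the listener filters Topics with the regular expression held in SubscriptionState and updates the subscription information in SubscriptionState. It also records a snapshot of the current Metadata in the metadataSnapshot field.
3. Call ConsumerCoordinator.needRejoin to decide whether a JoinGroupRequest must be sent to join the Consumer Group. It returns true only when the AUTO_TOPICS or AUTO_PATTERN mode is in use and either the parent class reports that a rejoin is needed or a partition assignment is still pending:
public boolean needRejoin() {
    return subscriptions.partitionsAutoAssigned() &&
            (super.needRejoin() || subscriptions.partitionAssignmentNeeded());
}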
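The condition in needRejoin can be restated as a small standalone function, which makes the role of each flag easier to see. This is only a sketch: the enum, class, and parameter names are hypothetical stand-ins for the state held by SubscriptionState and the parent coordinator class.

```java
// Hypothetical restatement of the needRejoin condition with the state passed in explicitly.
final class NeedRejoinSketch {

    enum SubscriptionType { AUTO_TOPICS, AUTO_PATTERN, USER_ASSIGNED }

    static boolean needRejoin(SubscriptionType type,
                              boolean rejoinNeeded,
                              boolean needsPartitionAssignment) {
        // only the two automatic modes take part in a Rebalance
        boolean autoAssigned = type == SubscriptionType.AUTO_TOPICS
                || type == SubscriptionType.AUTO_PATTERN;
        return autoAssigned && (rejoinNeeded || needsPartitionAssignment);
    }
}
```

With USER_ASSIGNED the function is false regardless of the two flags, which matches step 1: manually assigned partitions never trigger a JoinGroupRequest.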
4. Call onJoinPrepare to prepare for sending the Request. It does three things: first, if automatic offset commit is enabled, it commits offsets synchronously, which may block the thread; second, it invokes the callback of the ConsumerRebalanceListener registered in SubscriptionState; third, it sets the needsPartitionAssignment field of SubscriptionState to true and shrinks the groupSubscription collection:
protected void onJoinPrepare(int generation, String memberId) {
    // if automatic offset commit is enabled, commit offsets synchronously
    maybeAutoCommitOffsetsSync();
    // invoke the listener's callback
    ConsumerRebalanceListener listener = subscriptions.listener();
    log.info("Revoking previously assigned partitions {} for group {}", subscriptions.assignedPartitions(), groupId);
    try {
        Set<TopicPartition> revoked = new HashSet<>(subscriptions.assignedPartitions());
        listener.onPartitionsRevoked(revoked);
    } catch (WakeupException e) {
        throw e;
    } catch (Exception e) {
        log.error("User provided listener {} for group {} failed on partition revocation",
                listener.getClass().getName(), groupId, e);
    }
    // set needsPartitionAssignment in SubscriptionState to true and shrink the groupSubscription collection
    assignmentSnapshot = null;
    subscriptions.needReassignment();
}
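5. Call needRejoin again, then call ensureCoordinatorReady to make sure the GroupCoordinator has been found and a connection to it is established.
6. If there are still pending requests to the Node that hosts the GroupCoordinator, block until they have all been sent and their responses received, then go back to step 5.
7. Call sendJoinGroupRequest to create the JoinGroupRequest, and call ConsumerNetworkClient.send to put it into unsent, where it is buffered until it is sent:
private RequestFuture<ByteBuffer> sendJoinGroupRequest() {
    if (coordinatorUnknown())
        return RequestFuture.coordinatorNotAvailable();
    // send a join group request to the coordinator
    log.info("(Re-)joining group {}", groupId);
    // create the request
    JoinGroupRequest request = new JoinGroupRequest(
            groupId,
            this.sessionTimeoutMs,
            this.memberId,
            protocolType(),
            metadata());
    // add the request to the unsent queue
    return client.send(coordinator, ApiKeys.JOIN_GROUP, request)
            .compose(new JoinGroupResponseHandler());
}
8. Add a RequestFutureListener to the RequestFuture returned in step 7.
9. Call ConsumerNetworkClient.poll to send the JoinGroupRequest; this call blocks until the response arrives.
10. Check RequestFuture.failed(). On UnknownMemberIdException, RebalanceInProgressException, or IllegalGenerationException, retry immediately; on any other retriable exception, back off and then retry; otherwise rethrow the exception.
That completes the overall flow of Phase 2. Now let's look at the source: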
public void ensurePartitionAssignment() {
    // is the subscription one of the two automatically assigned modes?
    if (subscriptions.partitionsAutoAssigned()) {
        // for pattern subscriptions, check whether Metadata needs to be updated
        if (subscriptions.hasPatternSubscription())
            client.ensureFreshMetadata();
        ensureActiveGroup();
    }
}
public void ensureActiveGroup() {
    // check whether we need to join the Group at all
    if (!needRejoin())
        return;
    // preparation before joining
    if (needsJoinPrepare) {
        onJoinPrepare(generation, memberId);
        needsJoinPrepare = false;
    }
    // check again whether we need to join the Group
    while (needRejoin()) {
        ensureCoordinatorReady();
        // check whether there are still pending requests to the coordinator node
        if (client.pendingRequestCount(this.coordinator) > 0) {
            // if so, block until those requests have completed
            client.awaitPendingRequests(this.coordinator);
            continue;
        }
        // build the Request, add it to the unsent queue, and get back a RequestFuture
        RequestFuture<ByteBuffer> future = sendJoinGroupRequest();
        // add a listener to the RequestFuture
        future.addListener(new RequestFutureListener<ByteBuffer>() {
            @Override
            public void onSuccess(ByteBuffer value) {
                // handle join completion in the callback so that the callback will be invoked
                // even if the consumer is woken up before finishing the rebalance
                onJoinComplete(generation, memberId, protocol, value);
                needsJoinPrepare = true;
                heartbeatTask.reset();
            }
            @Override
            public void onFailure(RuntimeException e) {
                // we handle failures below after the request finishes. if the join completes
                // after having been woken up, the exception is ignored and we will rejoin
            }
        });
        // block until the Response arrives
        client.poll(future);
        // handle a failure, if any
        if (future.failed()) {
            RuntimeException exception = future.exception();
            if (exception instanceof UnknownMemberIdException ||
                    exception instanceof RebalanceInProgressException ||
                    exception instanceof IllegalGenerationException)
                continue;
            else if (!future.isRetriable())
                throw exception;
            time.sleep(retryBackoffMs);
        }
    }
}
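The failure branch at the end of ensureActiveGroup distinguishes three outcomes. The sketch below spells them out as a pure function; the enum and class names are hypothetical, and the failure kinds are a simplified stand-in for the exception types checked in the source above.

```java
// Hypothetical classification of a failed JoinGroup attempt, mirroring the if/else chain above.
final class JoinFailureSketch {

    enum Failure { UNKNOWN_MEMBER_ID, REBALANCE_IN_PROGRESS, ILLEGAL_GENERATION, OTHER_RETRIABLE, NON_RETRIABLE }

    enum Action { RETRY_NOW, BACKOFF_THEN_RETRY, RETHROW }

    static Action classify(Failure failure) {
        switch (failure) {
            case UNKNOWN_MEMBER_ID:
            case REBALANCE_IN_PROGRESS:
            case ILLEGAL_GENERATION:
                // the loop continues at once, without sleeping
                return Action.RETRY_NOW;
            case OTHER_RETRIABLE:
                // sleep for retryBackoffMs, then loop again
                return Action.BACKOFF_THEN_RETRY;
            default:
                // anything else is propagated to the caller
                return Action.RETHROW;
        }
    }
}
```

The three "retry now" cases are the ones where the group state on the coordinator has changed, so waiting would not help; the consumer simply rejoins.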
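After the server receives the JoinGroupRequest it sends back a JoinGroupResponse. Here is how the consumer handles it:
1. Parse the JoinGroupResponse, read the memberId, generation, and other information assigned by the GroupCoordinator, and store them locally.
2. The consumer uses leaderId to check whether it is the Leader. If it is, it enters onJoinLeader; otherwise it enters onJoinFollower.
3. The Leader looks up the PartitionAssignor that matches the partition assignment strategy named in the group_protocol field of the JoinGroupResponse.
4. The Leader deserializes the members field of the JoinGroupResponse to obtain the Topics subscribed to by every consumer in the Consumer Group, and adds them to its SubscriptionState.groupSubscription collection. A Follower only cares about the Topics it subscribes to itself.
5. Update Metadata.
6. Take a snapshot of Metadata.
7. Call PartitionAssignor.assign to assign the partitions.
8. Serialize the assignment and return it in a Map whose key is the consumer's memberId and whose value is the serialized assignment as a ByteBuffer.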
public void handle(JoinGroupResponse joinResponse, RequestFuture<ByteBuffer> future) {
    Errors error = Errors.forCode(joinResponse.errorCode());
    // the error code is NONE
    if (error == Errors.NONE) {
        log.debug("Received successful join group response for group {}: {}", groupId, joinResponse.toStruct());
        // store the memberId and generation assigned by the coordinator
        AbstractCoordinator.this.memberId = joinResponse.memberId();
        AbstractCoordinator.this.generation = joinResponse.generationId();
        // clear the rejoin flag
        AbstractCoordinator.this.rejoinNeeded = false;
        // store the selected assignment strategy
        AbstractCoordinator.this.protocol = joinResponse.groupProtocol();
        sensors.joinLatency.record(response.requestLatencyMs());
        // continue according to this consumer's role
        if (joinResponse.isLeader()) {
            onJoinLeader(joinResponse).chain(future);
        } else {
            onJoinFollower().chain(future);
        }
    }
    ...
}
private RequestFuture<ByteBuffer> onJoinLeader(JoinGroupResponse joinResponse) {
    try {
        // perform steps 3 to 8
        Map<String, ByteBuffer> groupAssignment = performAssignment(joinResponse.leaderId(), joinResponse.groupProtocol(),
                joinResponse.members());
        // create and send the SyncGroupRequest
        SyncGroupRequest request = new SyncGroupRequest(groupId, generation, memberId, groupAssignment);
        log.debug("Sending leader SyncGroup for group {} to coordinator {}: {}", groupId, this.coordinator, request);
        return sendSyncGroupRequest(request);
    } catch (RuntimeException e) {
        return RequestFuture.failure(e);
    }
}
protected Map<String, ByteBuffer> performAssignment(String leaderId,
                                                    String assignmentStrategy,
                                                    Map<String, ByteBuffer> allSubscriptions) {
    // look up the PartitionAssignor used for this assignment
    PartitionAssignor assignor = lookupAssignor(assignmentStrategy);
    if (assignor == null)
        throw new IllegalStateException("Coordinator selected invalid assignment protocol: " + assignmentStrategy);
    Set<String> allSubscribedTopics = new HashSet<>();
    Map<String, Subscription> subscriptions = new HashMap<>();
    for (Map.Entry<String, ByteBuffer> subscriptionEntry : allSubscriptions.entrySet()) {
        Subscription subscription = ConsumerProtocol.deserializeSubscription(subscriptionEntry.getValue());
        subscriptions.put(subscriptionEntry.getKey(), subscription);
        allSubscribedTopics.addAll(subscription.topics());
    }
    // record the Topics subscribed to by all consumers in the group
    this.subscriptions.groupSubscribe(allSubscribedTopics);
    metadata.setTopics(this.subscriptions.groupSubscription());
    // update the metadata
    client.ensureFreshMetadata();
    // record the snapshot the assignment is based on
    assignmentSnapshot = metadataSnapshot;
    // assign the partitions
    Map<String, Assignment> assignment = assignor.assign(metadata.fetch(), subscriptions);
    // serialize the assignment of each member into groupAssignment
    Map<String, ByteBuffer> groupAssignment = new HashMap<>();
    for (Map.Entry<String, Assignment> assignmentEntry : assignment.entrySet()) {
        ByteBuffer buffer = ConsumerProtocol.serializeAssignment(assignmentEntry.getValue());
        groupAssignment.put(assignmentEntry.getKey(), buffer);
    }
    return groupAssignment;
}
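The call to assignor.assign is where partitions are actually divided among members, and its behavior depends on the configured strategy. To give a feel for what such a strategy computes, here is a minimal sketch of a range-style split for a single topic. It is an illustration under stated assumptions, not Kafka's assignor: the class name, the single-topic restriction, and the plain Integer partition ids are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical range-style split of one topic's partitions across sorted member ids.
final class RangeAssignSketch {

    static Map<String, List<Integer>> assign(List<String> memberIds, int numPartitions) {
        Map<String, List<Integer>> result = new TreeMap<>();
        if (memberIds.isEmpty())
            return result;
        // sort members so every consumer would compute the same result
        List<String> sorted = new ArrayList<>(memberIds);
        Collections.sort(sorted);
        int perConsumer = numPartitions / sorted.size();
        int extra = numPartitions % sorted.size();
        for (int i = 0; i < sorted.size(); i++) {
            // the first `extra` members each receive one additional partition
            int start = perConsumer * i + Math.min(i, extra);
            int length = perConsumer + (i < extra ? 1 : 0);
            List<Integer> partitions = new ArrayList<>();
            for (int p = start; p < start + length; p++)
                partitions.add(p);
            result.put(sorted.get(i), partitions);
        }
        return result;
    }
}
```

For members c1 and c2 and five partitions, this gives c1 the partitions 0, 1, 2 and c2 the partitions 3, 4. In the real flow the Leader would then serialize each member's share, as performAssignment does above.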
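Phase 3
Once the partition assignment is complete, the consumer enters the Synchronizing Group State phase. The main logic is to send a SyncGroupRequest to the GroupCoordinator and handle the SyncGroupResponse.
Sending the SyncGroupRequest works as follows:
1. After obtaining the serialized partition assignment, the Leader wraps it in a SyncGroupRequest. A Follower also sends a SyncGroupRequest, but its assignment part is empty.
2. Call ConsumerNetworkClient.send to put the request into the unsent collection to wait for sending.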
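The only difference between the Leader's and a Follower's SyncGroupRequest is the assignment payload. A minimal sketch of that choice follows; the class and method names are hypothetical, and the payload is modeled as a plain map from memberId to serialized assignment.

```java
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.Map;

// Hypothetical helper: picks the assignment payload carried by a SyncGroupRequest.
final class SyncGroupPayloadSketch {

    static Map<String, ByteBuffer> payload(boolean isLeader, Map<String, ByteBuffer> computedAssignment) {
        // the Leader ships the assignment for the whole group; a Follower sends nothing
        return isLeader ? computedAssignment : Collections.<String, ByteBuffer>emptyMap();
    }
}
```

Every member, Leader or Follower, then receives its own share of the assignment in the SyncGroupResponse.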
Now let's look at how the SyncGroupResponse is handled:
public void handle(SyncGroupResponse syncResponse,
                   RequestFuture<ByteBuffer> future) {
    Errors error = Errors.forCode(syncResponse.errorCode());
    // no error: propagate the partition assignment
    if (error == Errors.NONE) {
        log.info("Successfully joined group {} with generation {}", groupId, generation);
        sensors.syncLatency.record(response.requestLatencyMs());
        future.complete(syncResponse.memberAssignment());
    } else {
        // an error occurred: mark rejoinNeeded as true
        AbstractCoordinator.this.rejoinNeeded = true;
        ...
    }
}
The partitions obtained from the SyncGroupResponse are finally processed by ConsumerCoordinator.onJoinComplete:
protected void onJoinComplete(int generation,
                              String memberId,
                              String assignmentStrategy,
                              ByteBuffer assignmentBuffer) {
    // if the metadata snapshot changed since the assignment was computed, request a reassignment
    if (assignmentSnapshot != null && !assignmentSnapshot.equals(metadataSnapshot)) {
        subscriptions.needReassignment();
        return;
    }
    // look up the assignor for the current assignment strategy
    PartitionAssignor assignor = lookupAssignor(assignmentStrategy);
    if (assignor == null)
        throw new IllegalStateException("Coordinator selected invalid assignment protocol: " + assignmentStrategy);
    // deserialize the assignment
    Assignment assignment = ConsumerProtocol.deserializeAssignment(assignmentBuffer);
    // set needsFetchCommittedOffsets to true so the last committed offsets are fetched from the server
    subscriptions.needRefreshCommits();
    // fill in the assignment
    subscriptions.assignFromSubscribed(assignment.partitions());
    // assignor callback
    assignor.onAssignment(assignment);
    // restart the auto-commit scheduled task
    if (autoCommitEnabled)
        autoCommitTask.reschedule();
    // invoke the listener's callback
    ConsumerRebalanceListener listener = subscriptions.listener();
    try {
        Set<TopicPartition> assigned = new HashSet<>(subscriptions.assignedPartitions());
        listener.onPartitionsAssigned(assigned);
    } catch (WakeupException e) {
        throw e;
    } catch (Exception e) {
        log.error("User provided listener {} for group {} failed on partition assignment",
                listener.getClass().getName(), groupId, e);
    }
}
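1. Compare the snapshot saved before the assignment in Phase 2 with the latest snapshot. If they differ, Topics were added or removed or a partition count changed during the assignment, so the partitions must be assigned again.
2. Deserialize the partitions assigned to this consumer and add them to the SubscriptionState.assignment collection. From then on the consumer consumes the partitions listed in this collection, and needsPartitionAssignment is set to false.
3. Call the onAssignment callback, which is an empty implementation by default.
4. If automatic offset commit is enabled, restart the AutoCommitTask scheduled task.
5. Invoke the listener registered in SubscriptionState.
6. Reset needsJoinPrepare to true to prepare for the next Rebalance.
7. Reset the HeartbeatTask scheduled task so that heartbeats are sent periodically.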
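Step 1 depends on comparing two metadata snapshots by value. As a rough sketch of that idea, the class below models a snapshot as a map from Topic name to partition count and mirrors the null check and equals comparison made at the top of onJoinComplete. The class name and the choice of what the snapshot stores are assumptions of this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value-type snapshot: topic name -> number of partitions.
final class MetadataSnapshotSketch {

    private final Map<String, Integer> partitionsPerTopic;

    MetadataSnapshotSketch(Map<String, Integer> partitionsPerTopic) {
        // defensive copy so later changes to the input do not alter the snapshot
        this.partitionsPerTopic = new HashMap<>(partitionsPerTopic);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other)
            return true;
        if (!(other instanceof MetadataSnapshotSketch))
            return false;
        return partitionsPerTopic.equals(((MetadataSnapshotSketch) other).partitionsPerTopic);
    }

    @Override
    public int hashCode() {
        return partitionsPerTopic.hashCode();
    }

    /** Same shape as the check in onJoinComplete: stale only if a snapshot was saved and it differs. */
    static boolean assignmentIsStale(MetadataSnapshotSketch atAssignment, MetadataSnapshotSketch current) {
        return atAssignment != null && !atAssignment.equals(current);
    }
}
```

A Follower never saves an assignment snapshot, so the first argument is null for it and the check is skipped; only the Leader, which computed the assignment, can detect that its input has gone stale.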