Kafka Source Code: KafkaConsumer Analysis, Rebalance

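Let's first look at the situations that trigger a Rebalance:
1. A new consumer joins the Consumer Group.
2. A consumer crashes and goes offline.
3. A consumer voluntarily leaves the Consumer Group.
4. The partition count of any Topic subscribed to by the Consumer Group changes.
5. A consumer calls unsubscribe to cancel its subscription to a Topic.
The rest of this article analyzes how the Rebalance operation is implemented.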
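Phase 1
The first step of a Rebalance is to locate the GroupCoordinator. In this phase the consumer sends a GroupCoordinatorRequest to one of the Brokers in the Kafka cluster (the least-loaded one, as step 2 shows) and processes the GroupCoordinatorResponse that comes back.
1. First check whether the GroupCoordinator needs to be looked up again. This mainly checks whether the coordinator field is null and whether the connection to the GroupCoordinator is healthy.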

    public boolean coordinatorUnknown() {
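        // Check whether the coordinator field is null
        if (coordinator == null)
            return true;
        // Check whether the connection has failed
        if (client.connectionFailed(coordinator)) {
            // Clear the matching requests from the unsent collection and set coordinator to null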
            coordinatorDead();
            return true;
        }

        return false;
    }

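2. Find the least-loaded Node in the cluster and create a GroupCoordinatorRequest. Call client.send to put the request into the unsent queue to await sending; it returns a RequestFuture.
3. Call ConsumerNetworkClient.poll to send the GroupCoordinatorRequest.
4. Check the state of the RequestFuture. If it failed with a RetriableException, call ConsumerNetworkClient.awaitMetadataUpdate(), which blocks until the cluster metadata recorded in Metadata has been updated, then go back to step 1. Any other exception is rethrown.
5. If the GroupCoordinator node was found but the network connection to it failed, clear its requests from unsent, set the coordinator field to null, back off for a while, and go back to step 1.
Now let's step through the source code for this process: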

public void ensureCoordinatorReady() {
        // Keep looping for as long as the GroupCoordinator still needs to be located
        while (coordinatorUnknown()) {
            // Put a GroupCoordinatorRequest into the unsent queue to await sending
            RequestFuture<Void> future = sendGroupCoordinatorRequest();
            // Block until the response arrives
            client.poll(future);
            // Check whether the request failed
            if (future.failed()) {
                // For a retriable exception, block until the metadata has been updated
                if (future.isRetriable())
                    client.awaitMetadataUpdate();
                else
                    throw future.exception();
            } else if (coordinator != null && client.connectionFailed(coordinator)) {
                // we found the coordinator, but the connection has failed, so mark
                // it dead and backoff before retrying discovery
                coordinatorDead();
                time.sleep(retryBackoffMs);
            }

        }
    }
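
The control flow of this loop can be isolated in a small, self-contained sketch. Everything in it is hypothetical scaffolding rather than Kafka code: the Attempt enum, the Stats counters, and the scripted queue of outcomes stand in for the real network calls, and the sketch only counts the points where the real method would refresh metadata or back off.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Hypothetical model of the discovery loop; none of these types are Kafka classes. */
final class CoordinatorDiscoverySketch {

    /** Scripted outcome of one lookup attempt (assumed names). */
    enum Attempt { RETRIABLE_ERROR, FATAL_ERROR, FOUND_BUT_DISCONNECTED, FOUND_AND_CONNECTED }

    /** Counters recording what the loop did. */
    static final class Stats {
        int attempts;
        int metadataRefreshes;
        int backoffs;
    }

    /** Consumes scripted outcomes until the coordinator is found and connected. */
    static Stats discover(Deque<Attempt> script) {
        Stats stats = new Stats();
        boolean coordinatorUnknown = true;
        while (coordinatorUnknown) {
            // remove() throws NoSuchElementException if the script runs out
            Attempt attempt = script.remove();
            stats.attempts++;
            switch (attempt) {
                case RETRIABLE_ERROR:
                    // real code: client.awaitMetadataUpdate()
                    stats.metadataRefreshes++;
                    break;
                case FATAL_ERROR:
                    // real code: throw future.exception()
                    throw new IllegalStateException("non-retriable failure");
                case FOUND_BUT_DISCONNECTED:
                    // real code: coordinatorDead(); time.sleep(retryBackoffMs)
                    stats.backoffs++;
                    break;
                case FOUND_AND_CONNECTED:
                    coordinatorUnknown = false;
                    break;
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        Deque<Attempt> script = new ArrayDeque<>(Arrays.asList(
                Attempt.RETRIABLE_ERROR, Attempt.FOUND_BUT_DISCONNECTED, Attempt.FOUND_AND_CONNECTED));
        Stats stats = discover(script);
        System.out.println(stats.attempts + " attempts, " + stats.metadataRefreshes
                + " metadata refreshes, " + stats.backoffs + " backoffs");
    }
}
```

With the script used in main, the loop runs three times, refreshing metadata once and backing off once before the coordinator is considered ready.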

Now let's look at how the methods it calls are implemented:

private RequestFuture<Void> sendGroupCoordinatorRequest() {
        // Find the least-loaded node
        Node node = this.client.leastLoadedNode();
        if (node == null) {
            // No node is available, so return a future that has already failed
            return RequestFuture.noBrokersAvailable();
        } else {
            // Create the request
            GroupCoordinatorRequest metadataRequest = new GroupCoordinatorRequest(this.groupId);
            // Call ConsumerNetworkClient.send to add it to the unsent collection
            return client.send(node, ApiKeys.GROUP_COORDINATOR, metadataRequest)
                    .compose(new RequestFutureAdapter<ClientResponse, Void>() {
                        @Override
                        public void onSuccess(ClientResponse response, RequestFuture<Void> future) {
                            handleGroupMetadataResponse(response, future);
                        }
                    });
        }
    }

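The server returns a Response to the request. When the Response arrives, it is processed as follows:
1. Call coordinatorUnknown to check whether a GroupCoordinator has already been found and connected. If so, ignore this Response; because requests can be resent, a Response may arrive after discovery has already succeeded.
2. Parse the Response to obtain the GroupCoordinator's node information.
3. Build a Node object, assign it to the coordinator field, and try to connect to the GroupCoordinator.
4. Reset the HeartbeatTask timer. The code does this only when generation > 0, that is, when the consumer is already a member of the group.
5. Call RequestFuture.complete to propagate the successful GroupCoordinatorResponse event.
6. If the error code in the Response is not NONE, propagate the failure instead.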

private void handleGroupMetadataResponse(ClientResponse resp, RequestFuture<Void> future) {
        log.debug("Received group coordinator response {}", resp);
        // First check whether a coordinator is already known
        if (!coordinatorUnknown()) {
            // If so, ignore this Response
            future.complete(null);
        } else {
            // Parse the response body
            GroupCoordinatorResponse groupCoordinatorResponse = new GroupCoordinatorResponse(resp.responseBody());
            // Get the error code
            Errors error = Errors.forCode(groupCoordinatorResponse.errorCode());
            // No error
            if (error == Errors.NONE) {
                // Create the coordinator node
                this.coordinator = new Node(Integer.MAX_VALUE - groupCoordinatorResponse.node().id(),
                        groupCoordinatorResponse.node().host(),
                        groupCoordinatorResponse.node().port());

                log.info("Discovered coordinator {} for group {}.", coordinator, groupId);
                // Try to connect to the coordinator
                client.tryConnect(coordinator);

                // Reset the heartbeat task if this consumer is already a group member
                if (generation > 0)
                    heartbeatTask.reset();
                // Propagate the success event
                future.complete(null);
            } else if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
                // On error, propagate the exception
                future.raise(new GroupAuthorizationException(groupId));
            } else {
                future.raise(error);
            }
        }
    }
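
One detail worth isolating is the node id passed to the Node constructor: Integer.MAX_VALUE minus the broker id. The result differs from the broker's own id, so the client tracks the coordinator as a node of its own. The likely purpose is to keep coordinator traffic on a connection separate from ordinary requests to the same broker, although the excerpt above does not state the motivation. The helper names below are hypothetical and exist only to show the mapping.

```java
/** Hypothetical helpers illustrating the coordinator node id mapping. */
final class CoordinatorNodeIdSketch {

    /** Id under which the client would track the coordinator for the given broker. */
    static int coordinatorNodeId(int brokerId) {
        return Integer.MAX_VALUE - brokerId;
    }

    /** Inverse mapping: recovers the broker id from a coordinator node id. */
    static int brokerIdOf(int coordinatorNodeId) {
        return Integer.MAX_VALUE - coordinatorNodeId;
    }

    public static void main(String[] args) {
        for (int brokerId = 0; brokerId < 3; brokerId++) {
            System.out.println("broker " + brokerId + " -> coordinator node " + coordinatorNodeId(brokerId));
        }
    }
}
```

Because the mapping is its own inverse, the broker id can always be recovered from the coordinator's node id.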

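Phase 2
Once the GroupCoordinator has been located, the consumer enters the JoinGroup phase, in which it sends a JoinGroupRequest to the GroupCoordinator and processes the response.
1. First call SubscriptionState.partitionsAutoAssigned to check whether the consumer's subscription type is AUTO_TOPICS or AUTO_PATTERN. USER_ASSIGNED needs no Rebalance because the user specifies the partitions directly.
2. If the subscription type is AUTO_PATTERN, check whether the Metadata needs to be updated: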

public void ensureFreshMetadata() {
        // Check whether the Metadata needs updating; if so, block until the update completes
        if (this.metadata.updateRequested() || this.metadata.timeToNextUpdate(time.milliseconds()) == 0)
            awaitMetadataUpdate();
    }

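The ConsumerCoordinator constructor registers a listener on Metadata. Whenever Metadata is updated, the listener filters Topics with the regular expression held in SubscriptionState and updates the subscription information in SubscriptionState. It also records a snapshot of the current Metadata in the metadataSnapshot field.
3. Call ConsumerCoordinator.needRejoin to decide whether a JoinGroupRequest must be sent to join the Consumer Group. It returns true only in AUTO_TOPICS or AUTO_PATTERN mode, and only when the parent class reports that a rejoin is needed or a partition assignment is pending: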

public boolean needRejoin() {
        return subscriptions.partitionsAutoAssigned() &&
                (super.needRejoin() || subscriptions.partitionAssignmentNeeded());
    }

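4. Call onJoinPrepare to get ready before the Request is sent. It does three things: first, if automatic offset commit is enabled, it commits offsets synchronously, which may block the thread; second, it invokes the callback on the ConsumerRebalanceListener registered in SubscriptionState; third, it sets SubscriptionState's needsPartitionAssignment field to true and shrinks the groupSubscription set: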

protected void onJoinPrepare(int generation, String memberId) {
        // If automatic offset commit is enabled, commit offsets synchronously
        maybeAutoCommitOffsetsSync();

        // Invoke the listener's callback
        ConsumerRebalanceListener listener = subscriptions.listener();
        log.info("Revoking previously assigned partitions {} for group {}", subscriptions.assignedPartitions(), groupId);
        try {
            Set<TopicPartition> revoked = new HashSet<>(subscriptions.assignedPartitions());
            listener.onPartitionsRevoked(revoked);
        } catch (WakeupException e) {
            throw e;
        } catch (Exception e) {
            log.error("User provided listener {} for group {} failed on partition revocation",
                    listener.getClass().getName(), groupId, e);
        }
        // Clear the snapshot, then set needsPartitionAssignment to true and shrink groupSubscription
        assignmentSnapshot = null;
        subscriptions.needReassignment();
    }

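5. Call needRejoin again, then call ensureCoordinatorReady to make sure the GroupCoordinator has been found and a connection to it established.
6. If requests to the GroupCoordinator's Node are still pending, block until all of them have been sent and their responses received, then go back to step 5.
7. Call sendJoinGroupRequest to create the JoinGroupRequest, and call ConsumerNetworkClient.send to put it into unsent to await sending: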

private RequestFuture<ByteBuffer> sendJoinGroupRequest() {
        if (coordinatorUnknown())
            return RequestFuture.coordinatorNotAvailable();

        // send a join group request to the coordinator
        log.info("(Re-)joining group {}", groupId);
        // Create the request
        JoinGroupRequest request = new JoinGroupRequest(
                groupId,
                this.sessionTimeoutMs,
                this.memberId,
                protocolType(),
                metadata());

        // Add the request to the unsent queue
        return client.send(coordinator, ApiKeys.JOIN_GROUP, request)
                .compose(new JoinGroupResponseHandler());
    }

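8. Add a RequestFutureListener to the RequestFuture returned in step 7.
9. Call ConsumerNetworkClient.poll to send the JoinGroupRequest; this blocks until the response arrives.
10. Check RequestFuture.failed(). If the failure is an UnknownMemberIdException, RebalanceInProgressException, or IllegalGenerationException, retry immediately; if it is any other non-retriable exception, rethrow it; otherwise back off for retryBackoffMs and retry.
That covers the overall flow of Phase 2. Here is the source code: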

public void ensurePartitionAssignment() {
        // Is the subscription one of the two auto-assigned types?
        if (subscriptions.partitionsAutoAssigned()) {
            // For a pattern subscription, check whether the Metadata needs updating
            if (subscriptions.hasPatternSubscription())
                client.ensureFreshMetadata();
            ensureActiveGroup();
        }
    }

public void ensureActiveGroup() {
        // Check whether the Group needs to be (re)joined
        if (!needRejoin())
            return;
        // Preparation before joining
        if (needsJoinPrepare) {
            onJoinPrepare(generation, memberId);
            needsJoinPrepare = false;
        }
        // Check again whether the Group needs to be joined
        while (needRejoin()) {
            ensureCoordinatorReady();

            // Check whether requests to the coordinator node are still pending
            if (client.pendingRequestCount(this.coordinator) > 0) {
                // If so, block until those requests have completed
                client.awaitPendingRequests(this.coordinator);
                continue;
            }
            // Build the Request, add it to the unsent queue, and get back a RequestFuture
            RequestFuture<ByteBuffer> future = sendJoinGroupRequest();
            // Add a listener to the RequestFuture
            future.addListener(new RequestFutureListener<ByteBuffer>() {
                @Override
                public void onSuccess(ByteBuffer value) {
                    // handle join completion in the callback so that the callback will be invoked
                    // even if the consumer is woken up before finishing the rebalance
                    onJoinComplete(generation, memberId, protocol, value);
                    needsJoinPrepare = true;
                    heartbeatTask.reset();
                }

                @Override
                public void onFailure(RuntimeException e) {
                    // we handle failures below after the request finishes. if the join completes
                    // after having been woken up, the exception is ignored and we will rejoin
                }
            });
            // Block until the response arrives
            client.poll(future);
            // Handle the failure, if there was one
            if (future.failed()) {
                RuntimeException exception = future.exception();
                if (exception instanceof UnknownMemberIdException ||
                        exception instanceof RebalanceInProgressException ||
                        exception instanceof IllegalGenerationException)
                    continue;
                else if (!future.isRetriable())
                    throw exception;
                time.sleep(retryBackoffMs);
            }
        }
    }
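
The failure handling at the end of the loop sorts errors into three groups. The sketch below restates that decision as a pure function. The JoinError and Action enums are hypothetical names introduced for illustration, with OTHER_RETRIABLE and OTHER_FATAL standing in for whatever future.isRetriable() reports for the remaining exceptions.

```java
/** Hypothetical restatement of the failure branch in ensureActiveGroup; not Kafka code. */
final class JoinFailureSketch {

    /** Assumed error categories; the first three mirror the exception types checked by name. */
    enum JoinError { UNKNOWN_MEMBER_ID, REBALANCE_IN_PROGRESS, ILLEGAL_GENERATION, OTHER_RETRIABLE, OTHER_FATAL }

    /** What the loop does next. */
    enum Action { REJOIN_IMMEDIATELY, BACKOFF_THEN_REJOIN, RETHROW }

    static Action classify(JoinError error) {
        switch (error) {
            case UNKNOWN_MEMBER_ID:
            case REBALANCE_IN_PROGRESS:
            case ILLEGAL_GENERATION:
                // real code: continue, with no backoff
                return Action.REJOIN_IMMEDIATELY;
            case OTHER_RETRIABLE:
                // real code: time.sleep(retryBackoffMs), then loop again
                return Action.BACKOFF_THEN_REJOIN;
            default:
                // real code: throw exception
                return Action.RETHROW;
        }
    }

    public static void main(String[] args) {
        for (JoinError error : JoinError.values()) {
            System.out.println(error + " -> " + classify(error));
        }
    }
}
```

The three named errors skip the backoff because they indicate that the group's state has moved on, so the consumer simply needs to join again.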

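After receiving the JoinGroupRequest, the server sends back a JoinGroupResponse. It is processed as follows:
1. Parse the JoinGroupResponse to get the memberId, generation, and other information assigned by the GroupCoordinator, and store them locally.
2. The consumer checks the leaderId to see whether it is the Leader. If it is, it enters onJoinLeader; otherwise it enters onJoinFollower.
3. The Leader looks up the PartitionAssignor that matches the partition assignment strategy named in the group_protocol field of the JoinGroupResponse.
4. The Leader deserializes the members field of the JoinGroupResponse to obtain the Topics subscribed to by every consumer in the Consumer Group, and adds them to its SubscriptionState.groupSubscription set. A Follower only cares about the Topics it subscribes to itself.
5. Update the Metadata.
6. Take a snapshot of the Metadata.
7. Call PartitionAssignor.assign to assign the partitions.
8. Serialize the assignment and return it in a Map whose key is a consumer's memberId and whose value is the serialized assignment as a ByteBuffer.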

public void handle(JoinGroupResponse joinResponse, RequestFuture<ByteBuffer> future) {
            Errors error = Errors.forCode(joinResponse.errorCode());
            // No error
            if (error == Errors.NONE) {
                log.debug("Received successful join group response for group {}: {}", groupId, joinResponse.toStruct());
                // Record the memberId and generation assigned by the coordinator
                AbstractCoordinator.this.memberId = joinResponse.memberId();
                AbstractCoordinator.this.generation = joinResponse.generationId();
                // Clear the rejoin flag
                AbstractCoordinator.this.rejoinNeeded = false;
                // Record the assignment strategy chosen by the coordinator
                AbstractCoordinator.this.protocol = joinResponse.groupProtocol();
                sensors.joinLatency.record(response.requestLatencyMs());
                // Handle the response according to this member's role
                if (joinResponse.isLeader()) {
                    onJoinLeader(joinResponse).chain(future);
                } else {
                    onJoinFollower().chain(future);
                }
            }
            ...
        }
private RequestFuture<ByteBuffer> onJoinLeader(JoinGroupResponse joinResponse) {
        try {
            // Perform steps 3 to 8
            Map<String, ByteBuffer> groupAssignment = performAssignment(joinResponse.leaderId(), joinResponse.groupProtocol(),
                    joinResponse.members());
            // Create and send the SyncGroupRequest
            SyncGroupRequest request = new SyncGroupRequest(groupId, generation, memberId, groupAssignment);
            log.debug("Sending leader SyncGroup for group {} to coordinator {}: {}", groupId, this.coordinator, request);
            return sendSyncGroupRequest(request);
        } catch (RuntimeException e) {
            return RequestFuture.failure(e);
        }
    }
protected Map<String, ByteBuffer> performAssignment(String leaderId,
                                                        String assignmentStrategy,
                                                        Map<String, ByteBuffer> allSubscriptions) {
        // Look up the PartitionAssignor used for partition assignment
        PartitionAssignor assignor = lookupAssignor(assignmentStrategy);
        if (assignor == null)
            throw new IllegalStateException("Coordinator selected invalid assignment protocol: " + assignmentStrategy);

        Set<String> allSubscribedTopics = new HashSet<>();
        Map<String, Subscription> subscriptions = new HashMap<>();
        for (Map.Entry<String, ByteBuffer> subscriptionEntry : allSubscriptions.entrySet()) {
            Subscription subscription = ConsumerProtocol.deserializeSubscription(subscriptionEntry.getValue());
            subscriptions.put(subscriptionEntry.getKey(), subscription);
            allSubscribedTopics.addAll(subscription.topics());
        }

        // Store the Topics subscribed to by all consumers
        this.subscriptions.groupSubscribe(allSubscribedTopics);
        metadata.setTopics(this.subscriptions.groupSubscription());

        // Update the metadata
        client.ensureFreshMetadata();
        // Record the snapshot
        assignmentSnapshot = metadataSnapshot;
        // Assign the partitions
        Map<String, Assignment> assignment = assignor.assign(metadata.fetch(), subscriptions);
        // Serialize the assignment into groupAssignment
        Map<String, ByteBuffer> groupAssignment = new HashMap<>();
        for (Map.Entry<String, Assignment> assignmentEntry : assignment.entrySet()) {
            ByteBuffer buffer = ConsumerProtocol.serializeAssignment(assignmentEntry.getValue());
            groupAssignment.put(assignmentEntry.getKey(), buffer);
        }

        return groupAssignment;
    }
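
To make the assign step concrete, here is a deliberately simplified sketch of a partition assignment for a single topic. It is not Kafka's RangeAssignor or RoundRobinAssignor: the class, the method, and the use of plain partition numbers in place of TopicPartition objects are all assumptions made for illustration. It sorts the member ids so that the outcome is deterministic and then deals the partitions out one at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical single-topic round-robin assignment; not one of Kafka's assignors. */
final class RoundRobinAssignSketch {

    /** Maps each member id to the partition numbers it would own. */
    static Map<String, List<Integer>> assign(Collection<String> memberIds, int partitionCount) {
        // Sort and de-duplicate member ids so every caller computes the same result
        List<String> members = new ArrayList<>(new TreeSet<>(memberIds));
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        if (members.isEmpty()) {
            return assignment;
        }
        // Deal partitions out in turn: partition p goes to member p mod memberCount
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(Arrays.asList("c2", "c1"), 5));
    }
}
```

In the real performAssignment, the per-member result is then serialized with ConsumerProtocol.serializeAssignment before being placed in groupAssignment; the sketch stops before that step.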

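Phase 3
Once partition assignment is complete, the consumer enters the Synchronizing Group State phase. The main logic is to send a SyncGroupRequest to the GroupCoordinator and process the SyncGroupResponse.
The SyncGroupRequest is sent as follows:
1. Once it has the serialized assignment, the Leader wraps it in a SyncGroupRequest. A Follower also builds a SyncGroupRequest, but leaves the assignment part empty.
2. Call ConsumerNetworkClient.send to put the request into the unsent collection to await sending.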
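
The difference between the two roles comes down to the payload each one attaches. A minimal sketch, with hypothetical names and byte arrays standing in for the serialized ByteBuffer values:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the SyncGroupRequest payload per role; not Kafka code. */
final class SyncGroupPayloadSketch {

    /** The Leader sends the whole group's assignment; a Follower sends an empty map. */
    static Map<String, byte[]> payloadFor(boolean isLeader, Map<String, byte[]> groupAssignment) {
        if (isLeader) {
            return groupAssignment;
        }
        return Collections.<String, byte[]>emptyMap();
    }

    public static void main(String[] args) {
        Map<String, byte[]> groupAssignment = new HashMap<>();
        groupAssignment.put("member-1", new byte[] {1});
        groupAssignment.put("member-2", new byte[] {2});
        System.out.println("leader entries: " + payloadFor(true, groupAssignment).size());
        System.out.println("follower entries: " + payloadFor(false, groupAssignment).size());
    }
}
```

Either way every member sends a SyncGroupRequest, and each one receives its own assignment back in the SyncGroupResponse.
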
Now let's look at how the SyncGroupResponse is handled:

public void handle(SyncGroupResponse syncResponse,
                           RequestFuture<ByteBuffer> future) {
            Errors error = Errors.forCode(syncResponse.errorCode());
            // No error: propagate the partition assignment
            if (error == Errors.NONE) {
                log.info("Successfully joined group {} with generation {}", groupId, generation);
                sensors.syncLatency.record(response.requestLatencyMs());
                future.complete(syncResponse.memberAssignment());
            } else {
                // On error, set rejoinNeeded to true
                AbstractCoordinator.this.rejoinNeeded = true;
                ...
            }
        }

The partition assignment obtained from the SyncGroupResponse is finally handled by ConsumerCoordinator.onJoinComplete:

protected void onJoinComplete(int generation,
                                  String memberId,
                                  String assignmentStrategy,
                                  ByteBuffer assignmentBuffer) {
        // If the snapshot has changed, the partitions must be assigned again
        if (assignmentSnapshot != null && !assignmentSnapshot.equals(metadataSnapshot)) {
            subscriptions.needReassignment();
            return;
        }
        // Look up the assignor for the current assignment strategy
        PartitionAssignor assignor = lookupAssignor(assignmentStrategy);
        if (assignor == null)
            throw new IllegalStateException("Coordinator selected invalid assignment protocol: " + assignmentStrategy);
        // Deserialize the assignment
        Assignment assignment = ConsumerProtocol.deserializeAssignment(assignmentBuffer);

        // Set needsFetchCommittedOffsets to true so the last committed offsets are fetched from the server
        subscriptions.needRefreshCommits();

        // Populate the assignment
        subscriptions.assignFromSubscribed(assignment.partitions());

        // Assignor callback
        assignor.onAssignment(assignment);

        // Restart the auto-commit timer task
        if (autoCommitEnabled)
            autoCommitTask.reschedule();

        // Invoke the listener's callback
        ConsumerRebalanceListener listener = subscriptions.listener();
        try {
            Set<TopicPartition> assigned = new HashSet<>(subscriptions.assignedPartitions());
            listener.onPartitionsAssigned(assigned);
        } catch (WakeupException e) {
            throw e;
        } catch (Exception e) {
            log.error("User provided listener {} for group {} failed on partition assignment",
                    listener.getClass().getName(), groupId, e);
        }
    }

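1. Compare the snapshot saved before assignment in Phase 2 with the latest snapshot. If they differ, Topics were added or removed or partition counts changed while the assignment was in progress, so the partitions must be assigned again.
2. Deserialize the partitions assigned to this consumer and add them to the SubscriptionState.assignment collection. From then on the consumer consumes from the partitions in this collection. needsPartitionAssignment is set to false.
3. Call the onAssignment callback, which is an empty implementation by default.
4. If automatic offset commit is enabled, restart the AutoCommitTask timer.
5. Invoke the listener registered in SubscriptionState.
6. Reset needsJoinPrepare to true in preparation for the next Rebalance.
7. Reset the HeartbeatTask so that heartbeats are sent on schedule.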
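
Step 1 is the guard against metadata changing during the rebalance. The sketch below models a snapshot as a map from Topic name to partition count; that representation and all the names in it are assumptions for illustration, not the actual type of metadataSnapshot.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the snapshot comparison in onJoinComplete; not Kafka code. */
final class SnapshotCheckSketch {

    /** True when an assignment snapshot was recorded and no longer matches the latest metadata. */
    static boolean assignmentIsStale(Map<String, Integer> assignmentSnapshot,
                                     Map<String, Integer> metadataSnapshot) {
        return assignmentSnapshot != null && !assignmentSnapshot.equals(metadataSnapshot);
    }

    public static void main(String[] args) {
        Map<String, Integer> atAssignment = new HashMap<>();
        atAssignment.put("orders", 3);
        Map<String, Integer> latest = new HashMap<>(atAssignment);
        System.out.println("unchanged metadata stale? " + assignmentIsStale(atAssignment, latest));
        latest.put("orders", 4); // the topic gained a partition during the rebalance
        System.out.println("changed metadata stale? " + assignmentIsStale(atAssignment, latest));
    }
}
```

A null assignmentSnapshot, which is what onJoinPrepare leaves behind unless performAssignment later records one, skips the check entirely.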
