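Interviewer: How does a RocketMQ Consumer do load balancing? For example, if 5 Consumer processes consume the same Topic and that Topic has only 4 queues, what happens? Conversely, what happens when there are fewer Consumers than queues?
Candidate: Completely lost.
Let's find the answer in the source, starting with RebalancePushImpl, the rebalance implementation used by push Consumers: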
public class RebalancePushImpl extends RebalanceImpl {
private final DefaultMQPushConsumerImpl defaultMQPushConsumerImpl;
public RebalancePushImpl(String consumerGroup, MessageModel messageModel,
AllocateMessageQueueStrategy allocateMessageQueueStrategy,
MQClientInstance mQClientFactory, DefaultMQPushConsumerImpl defaultMQPushConsumerImpl) {
// Simple enough: it just calls the constructor of the parent class RebalanceImpl
super(consumerGroup, messageModel, allocateMessageQueueStrategy, mQClientFactory);
this.defaultMQPushConsumerImpl = defaultMQPushConsumerImpl;
}
}
public abstract class RebalanceImpl {
// Simple: it only initializes some fields. The interesting part is doRebalance below
public RebalanceImpl(String consumerGroup, MessageModel messageModel,
AllocateMessageQueueStrategy allocateMessageQueueStrategy,
MQClientInstance mQClientFactory) {
this.consumerGroup = consumerGroup;
this.messageModel = messageModel;
this.allocateMessageQueueStrategy = allocateMessageQueueStrategy;
this.mQClientFactory = mQClientFactory;
}
/**
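* Allocates message queues. The name borrows Spring's convention: a doXXX method is where the real work happens.
*
* @param isOrder whether the messages are ordered (true: ordered; false: not ordered)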
*/
public void doRebalance(final boolean isOrder) {
// Allocate the message queues of every subscribed topic
Map<String, SubscriptionData> subTable = this.getSubscriptionInner();
if (subTable != null) {
for (final Map.Entry<String, SubscriptionData> entry : subTable.entrySet()) {
final String topic = entry.getKey();
try {
// This is the key call
this.rebalanceByTopic(topic, isOrder);
} catch (Throwable e) {
if (!topic.startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
log.warn("rebalanceByTopic Exception", e);
}
}
}
}
// Remove the message queues of topics this consumer no longer subscribes to
this.truncateMessageQueueNotMyTopic();
}
}
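// RebalanceImpl#rebalanceByTopic (excerpt)
private void rebalanceByTopic(final String topic, final boolean isOrder) {
switch (messageModel) {
case CLUSTERING: {
// Get the queues of the topic and the consumers of the group; for example, mqSet looks like this: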
/**
* 0 = {MessageQueue@2151} "MessageQueue [topic=myTopic001, brokerName=broker-a, queueId=3]"
* 1 = {MessageQueue@2152} "MessageQueue [topic=myTopic001, brokerName=broker-a, queueId=0]"
* 2 = {MessageQueue@2153} "MessageQueue [topic=myTopic001, brokerName=broker-a, queueId=2]"
* 3 = {MessageQueue@2154} "MessageQueue [topic=myTopic001, brokerName=broker-a, queueId=1]"
*/
Set<MessageQueue> mqSet = this.topicSubscribeInfoTable.get(topic);
// Client ids (cid) of all Consumers in the group, e.g. 172.16.20.246@7832
List<String> cidAll = this.mQClientFactory.findConsumerIdList(topic, consumerGroup);
if (mqSet != null && cidAll != null) {
List<MessageQueue> mqAll = new ArrayList<MessageQueue>();
// Why copy the set into a list? Because it has to be sorted
mqAll.addAll(mqSet);
// Sort both the queues and the consumer ids: every client computes the allocation on its own, and sorting guarantees they all see the same order
Collections.sort(mqAll);
Collections.sort(cidAll);
// The default is org.apache.rocketmq.client.consumer.rebalance.AllocateMessageQueueAveragely
AllocateMessageQueueStrategy strategy = this.allocateMessageQueueStrategy;
// Allocate the queues according to the allocation strategy
List<MessageQueue> allocateResult = null;
try {
// This is the real star of the show: strategy.allocate()
allocateResult = strategy.allocate(
this.consumerGroup,
this.mQClientFactory.getClientId(),
mqAll,
cidAll);
} catch (Throwable e) {
return;
}
// ... the rest (updating the local queue table with allocateResult) is omitted
}
}
}
}
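The sort matters because every client runs this allocation on its own, with no coordination. The sketch below is a minimal illustration, assuming queues and client ids are plain strings; `SortDemo`, `allocateAveragely` and `allocateSorted` are hypothetical names, and `allocateAveragely` copies the arithmetic of the default averaging strategy shown next. Two clients that received the lists in different orders claim overlapping queues unless both sort first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortDemo {
    // Hypothetical helper: same arithmetic as AllocateMessageQueueAveragely,
    // but with queues modelled as plain strings.
    public static List<String> allocateAveragely(String currentCid, List<String> mqAll, List<String> cidAll) {
        List<String> result = new ArrayList<>();
        int index = cidAll.indexOf(currentCid);
        int mod = mqAll.size() % cidAll.size();
        int averageSize = mqAll.size() <= cidAll.size() ? 1
                : (mod > 0 && index < mod ? mqAll.size() / cidAll.size() + 1 : mqAll.size() / cidAll.size());
        int startIndex = (mod > 0 && index < mod) ? index * averageSize : index * averageSize + mod;
        int range = Math.min(averageSize, mqAll.size() - startIndex);
        for (int i = 0; i < range; i++) {
            result.add(mqAll.get((startIndex + i) % mqAll.size()));
        }
        return result;
    }

    // Each client sorts its own copies before allocating, as rebalanceByTopic does.
    public static List<String> allocateSorted(String currentCid, List<String> mqs, List<String> cids) {
        List<String> mqAll = new ArrayList<>(mqs);
        List<String> cidAll = new ArrayList<>(cids);
        Collections.sort(mqAll);
        Collections.sort(cidAll);
        return allocateAveragely(currentCid, mqAll, cidAll);
    }

    public static void main(String[] args) {
        // Client A and client B learned about queues and consumers in different orders.
        List<String> mqsSeenByA = List.of("q3", "q0", "q2", "q1");
        List<String> cidsSeenByA = List.of("cidB", "cidA");
        List<String> mqsSeenByB = List.of("q0", "q3", "q1", "q2");
        List<String> cidsSeenByB = List.of("cidA", "cidB");

        // Without sorting: each client uses whatever order it happened to receive.
        System.out.println("unsorted A=" + allocateAveragely("cidA", mqsSeenByA, cidsSeenByA)
                + " B=" + allocateAveragely("cidB", mqsSeenByB, cidsSeenByB));
        // With sorting: both clients agree on the order, so their shares are disjoint.
        System.out.println("sorted   A=" + allocateSorted("cidA", mqsSeenByA, cidsSeenByA)
                + " B=" + allocateSorted("cidB", mqsSeenByB, cidsSeenByB));
    }
}
```

Running `main` prints `unsorted A=[q2, q1] B=[q1, q2]` (both clients grab q1 and q2 while q0 and q3 are orphaned) and then `sorted   A=[q0, q1] B=[q2, q3]`.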
public class AllocateMessageQueueAveragely implements AllocateMessageQueueStrategy {
private final InternalLogger log = ClientLogger.getLog();
@Override
public List<MessageQueue> allocate(String consumerGroup, String currentCID, List<MessageQueue> mqAll,
List<String> cidAll) {
/**
* Parameter-validation code removed.
*/
List<MessageQueue> result = new ArrayList<MessageQueue>();
/**
* Index of the current Consumer. This is one of the main reasons for the sorting above:
* Collections.sort(mqAll);
* Collections.sort(cidAll);
*/
int index = cidAll.indexOf(currentCID);
// Remainder: how many queues cannot be split evenly. E.g. mqAll.size() is 4 (4 queues) and cidAll.size() is 5 (5 consumers), so mod = 4 % 5 = 4
int mod = mqAll.size() % cidAll.size();
// Number of queues for this consumer (average share)
// e.g. 4 queues, 5 consumers, index 1: 4 <= 5 ? 1 : (4 > 0 && 1 < 4 ? 4 / 5 + 1 : 4 / 5) = 1
int averageSize =
mqAll.size() <= cidAll.size() ? 1 : (mod > 0 && index < mod ? mqAll.size() / cidAll.size()
+ 1 : mqAll.size() / cidAll.size());
// With a remainder, consumers with index in [0, mod) share it, each taking one extra queue; every other consumer starts after skipping those mod extra queues
int startIndex = (mod > 0 && index < mod) ? index * averageSize : index * averageSize + mod;
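// Number of queues actually assigned. Math.min() is needed because when mqAll.size() < cidAll.size(), some consumers get no queue: their startIndex is past the end, so range <= 0 and the loop does not run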
int range = Math.min(averageSize, mqAll.size() - startIndex);
for (int i = 0; i < range; i++) {
result.add(mqAll.get((startIndex + i) % mqAll.size()));
}
return result;
}
}
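This algorithm looks messy and overly complicated! Honestly, it is fairly complex and long-winded, but plugging in concrete numbers gives the following table: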
4 queues | 2 Consumers (divides evenly) | 3 Consumers (does not divide evenly) | 5 Consumers (not every Consumer gets a queue) |
---|---|---|---|
queue[0] | Consumer[0] | Consumer[0] | Consumer[0] |
queue[1] | Consumer[0] | Consumer[0] | Consumer[1] |
queue[2] | Consumer[1] | Consumer[1] | Consumer[2] |
queue[3] | Consumer[1] | Consumer[2] | Consumer[3] |
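The table can be checked with a small sketch that copies the arithmetic of `allocate()` but uses integer positions in place of `MessageQueue` objects and client ids. This assumes both lists are already sorted, so a position is all that matters; `AverageAllocationTable` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

public class AverageAllocationTable {
    // Same arithmetic as AllocateMessageQueueAveragely.allocate(); queues and
    // consumers are identified by their index in the (already sorted) lists.
    public static List<Integer> allocate(int index, int queueCount, int consumerCount) {
        List<Integer> result = new ArrayList<>();
        int mod = queueCount % consumerCount;
        int averageSize = queueCount <= consumerCount ? 1
                : (mod > 0 && index < mod ? queueCount / consumerCount + 1 : queueCount / consumerCount);
        int startIndex = (mod > 0 && index < mod) ? index * averageSize : index * averageSize + mod;
        int range = Math.min(averageSize, queueCount - startIndex);
        for (int i = 0; i < range; i++) {
            result.add((startIndex + i) % queueCount);
        }
        return result;
    }

    public static void main(String[] args) {
        int queueCount = 4;
        for (int consumerCount : new int[] {2, 3, 5}) {
            System.out.println(consumerCount + " consumers:");
            for (int c = 0; c < consumerCount; c++) {
                // An empty list means this consumer is idle.
                System.out.println("  Consumer[" + c + "] -> queues " + allocate(c, queueCount, consumerCount));
            }
        }
    }
}
```

For 5 Consumers the last line is `Consumer[4] -> queues []`: the fifth Consumer gets nothing, which is the column the table cannot fill.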
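This yields the following rule of thumb (and the best comeback for the interviewer):
- At least as many queues as Consumers: the queues are split as evenly as possible; if they do not divide evenly, the first mod Consumers (mod = queue count % Consumer count, in sorted clientId order) each take one extra queue.
- More Consumers than queues: each of the first N Consumers (N = queue count) gets exactly one queue, and the remaining Consumers get nothing and sit idle, which wastes them. With 5 Consumers on a 4-queue Topic, one Consumer is idle.

There are several queue allocation (i.e. load-balancing) strategies to choose from: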
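- AllocateMessageQueueAveragely: the default strategy described above
- AllocateMessageQueueAveragelyByCircle: round-robin; the Consumers take one queue each in turn, in a circle
- AllocateMessageQueueConsistentHash: consistent hashing
- AllocateMachineRoomNearby: nearby principle; queues are preferably consumed by Consumers close to them (in the same machine room)
- AllocateMessageQueueByConfig: the queues are assigned through configuration

So when does rebalancing actually happen? For that we have to start from the Consumer startup code, beginning with the Consumer's start() method: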
public class DefaultMQPushConsumerImpl implements MQConsumerInner {
private MQClientInstance mQClientFactory;
// Entry point for starting the Consumer
public synchronized void start() throws MQClientException {
this.mQClientFactory = MQClientManager.getInstance().getOrCreateMQClientInstance(
this.defaultMQPushConsumer, this.rpcHook);
// Calls MQClientInstance.start(); let's follow it.
mQClientFactory.start();
}
}
Let's see what mQClientFactory.start() does:
public class MQClientInstance {
private final RebalanceService rebalanceService;
public void start() throws MQClientException {
synchronized (this) {
// Calls RebalanceService.start(); don't panic, let's keep following it
this.rebalanceService.start();
}
}
}
Now let's see what rebalanceService.start() does, starting with its parent class ServiceThread:
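/*
* First, note that it is a thread task: it implements the Runnable interface.
* Second, the start() method called in the previous step is simply thread.start(), so it effectively runs RebalanceService's run() method on a new thread.
*/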
public abstract class ServiceThread implements Runnable {
public void start() {
this.thread = new Thread(this, getServiceName());
this.thread.setDaemon(isDaemon);
this.thread.start();
}
}
Finally, let's look at RebalanceService.run():
public class RebalanceService extends ServiceThread {
/**
* Wait interval in milliseconds; 20s by default
*/
private static long waitInterval =
Long.parseLong(System.getProperty(
"rocketmq.client.rebalance.waitInterval", "20000"));
@Override
public void run() {
while (!this.isStopped()) {
// Wait up to 20s (or until woken up early), then run doRebalance
this.waitForRunning(waitInterval);
this.mqClientFactory.doRebalance();
}
}
}
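At this point the mystery is solved.
When a Consumer goes down, the other Consumers take over the queues it was consuming within at most 20s by default; likewise, when a new Consumer connects, rebalancing completes within 20s, giving the new Consumer a chance to consume the messages in the queues.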
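What the surviving Consumers compute after a failure can be sketched by rerunning the averaging arithmetic with a shorter client list. This is a simplified model, not RocketMQ code: `FailoverDemo` and `assignAll` are hypothetical names, both lists are assumed to be sorted already, and the model ignores how long it takes for a dead Consumer to disappear from cidAll.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FailoverDemo {
    // Averaging arithmetic copied from AllocateMessageQueueAveragely; both lists are assumed sorted.
    static List<String> allocate(String cid, List<String> mqAll, List<String> cidAll) {
        List<String> result = new ArrayList<>();
        int index = cidAll.indexOf(cid);
        int mod = mqAll.size() % cidAll.size();
        int averageSize = mqAll.size() <= cidAll.size() ? 1
                : (mod > 0 && index < mod ? mqAll.size() / cidAll.size() + 1 : mqAll.size() / cidAll.size());
        int startIndex = (mod > 0 && index < mod) ? index * averageSize : index * averageSize + mod;
        int range = Math.min(averageSize, mqAll.size() - startIndex);
        for (int i = 0; i < range; i++) {
            result.add(mqAll.get((startIndex + i) % mqAll.size()));
        }
        return result;
    }

    // Hypothetical helper: what every live Consumer would compute in one rebalance round.
    public static Map<String, List<String>> assignAll(List<String> mqAll, List<String> liveCids) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String cid : liveCids) {
            assignment.put(cid, allocate(cid, mqAll, liveCids));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> queues = List.of("q0", "q1", "q2", "q3");
        System.out.println("before:   " + assignAll(queues, List.of("c1", "c2", "c3")));
        // c2 crashes: once it has dropped out of cidAll, the next round hands its queue to a survivor.
        System.out.println("c2 down:  " + assignAll(queues, List.of("c1", "c3")));
        // c4 joins: it receives a share at the next round.
        System.out.println("c4 joins: " + assignAll(queues, List.of("c1", "c3", "c4")));
    }
}
```

The three lines show `{c1=[q0, q1], c2=[q2], c3=[q3]}`, then `{c1=[q0, q1], c3=[q2, q3]}` (c3 takes over c2's queue), then `{c1=[q0, q1], c3=[q2], c4=[q3]}` (c3 hands q3 to the newcomer).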
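Wait, something seems off: a newly started Consumer has to wait 20s before load balancing kicks in? That would be absurd. There must be more to it.
Indeed: when a new Consumer starts, it immediately wakes up the sleeping thread and makes it run this.mqClientFactory.doRebalance() right away. Here is the source: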
public class DefaultMQPushConsumerImpl implements MQConsumerInner {
// Entry point for starting the Consumer
public synchronized void start() throws MQClientException {
// See that? Just as the name says: rebalance immediately
this.mQClientFactory.rebalanceImmediately();
}
}
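The "sleep for up to 20s, but wake up early on demand" pattern can be sketched with plain java.util.concurrent classes. This is a simplified stand-in, not RocketMQ's ServiceThread (which, as far as we know, uses its own resettable latch): the hypothetical `MiniRebalanceService` blocks on `Semaphore.tryAcquire` with a timeout, and `wakeup()` releases a permit so the loop runs at once. In the RocketMQ source, rebalanceImmediately() appears to delegate to RebalanceService.wakeup(), which plays the same role.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MiniRebalanceService implements Runnable {
    private final long waitIntervalMs;
    private final Semaphore wakeupSignal = new Semaphore(0);
    private final AtomicInteger rebalanceCount = new AtomicInteger();
    private volatile boolean stopped = false;

    public MiniRebalanceService(long waitIntervalMs) {
        this.waitIntervalMs = waitIntervalMs;
    }

    // Like ServiceThread.start(): run this Runnable on its own daemon thread.
    public Thread start() {
        Thread t = new Thread(this, "MiniRebalanceService");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Analogue of rebalanceImmediately(): cut the current wait short.
    public void wakeup() {
        wakeupSignal.release();
    }

    public void stop() {
        stopped = true;
        wakeupSignal.release();
    }

    public int rebalanceCount() {
        return rebalanceCount.get();
    }

    @Override
    public void run() {
        while (!stopped) {
            try {
                // Block until wakeup() is called or the interval elapses, whichever comes first.
                wakeupSignal.tryAcquire(waitIntervalMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (stopped) {
                return;
            }
            doRebalance();
        }
    }

    private void doRebalance() {
        // Placeholder for mqClientFactory.doRebalance(): just count the rounds.
        rebalanceCount.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        // 20s interval as in the default; the wakeup makes the first rebalance happen at once.
        MiniRebalanceService service = new MiniRebalanceService(20_000);
        service.start();
        service.wakeup();
        Thread.sleep(200);
        System.out.println("rebalances after wakeup: " + service.rebalanceCount());
        service.stop();
    }
}
```

One design difference: a Semaphore accumulates permits, so several quick `wakeup()` calls would trigger several back-to-back rebalances; RocketMQ's ServiceThread appears to guard against this with an atomic "notified" flag.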