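The previous article, "Sending and Receiving Messages", only briefly covered the send flow and skipped many details. This article looks in detail at how RocketMQ makes message sending highly available.
Production environments usually do not allow a topic to be created automatically when a message is sent: autoCreateTopicEnable is set to false, and the required topics are created in advance through the console. The tests here use the default configuration, which leaves auto-creation enabled. So where in the source code is the topic created, and how?
Before sending a message, the producer first obtains the topic's publish information.
DefaultMQProducerImpl#sendDefaultImpl sends the message.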
TopicPublishInfo topicPublishInfo = this.tryToFindTopicPublishInfo(msg.getTopic());
public class TopicPublishInfo {
private boolean orderTopic = false;
private boolean haveTopicRouterInfo = false;
private List<MessageQueue> messageQueueList = new ArrayList<MessageQueue>();
private volatile ThreadLocalIndex sendWhichQueue = new ThreadLocalIndex();
private TopicRouteData topicRouteData;
}
public class MessageQueue implements Comparable<MessageQueue>, Serializable {
private static final long serialVersionUID = 6191200464116433425L;
private String topic;
private String brokerName;
private int queueId;
}
public class TopicRouteData extends RemotingSerializable {
private String orderTopicConf;
private List<QueueData> queueDatas;
private List<BrokerData> brokerDatas;
private HashMap<String/* brokerAddr */, List<String>/* Filter Server */> filterServerTable;
}
public class QueueData implements Comparable<QueueData> {
private String brokerName;
private int readQueueNums;
private int writeQueueNums;
private int perm;
private int topicSynFlag;
}
public class BrokerData implements Comparable<BrokerData> {
private String cluster;
private String brokerName;
private HashMap<Long/* brokerId */, String/* broker address */> brokerAddrs;
}
Preparing to fetch the route information.
private TopicPublishInfo tryToFindTopicPublishInfo(final String topic) {
TopicPublishInfo topicPublishInfo = this.topicPublishInfoTable.get(topic);
if (null == topicPublishInfo || !topicPublishInfo.ok()) {
this.topicPublishInfoTable.putIfAbsent(topic, new TopicPublishInfo());
this.mQClientFactory.updateTopicRouteInfoFromNameServer(topic);
topicPublishInfo = this.topicPublishInfoTable.get(topic);
}
if (topicPublishInfo.isHaveTopicRouterInfo() || topicPublishInfo.ok()) {
return topicPublishInfo;
} else {
this.mQClientFactory.updateTopicRouteInfoFromNameServer(topic, true, this.defaultMQProducer);
topicPublishInfo = this.topicPublishInfoTable.get(topic);
return topicPublishInfo;
}
}
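The method first looks up the topic by name in the local cache and returns the entry if it is usable. When the topic has not been created yet, nothing usable is found, so the code enters the first if block, stores an empty TopicPublishInfo, and calls updateTopicRouteInfoFromNameServer(topic). This first update achieves nothing and only logs an error, because the NameServer has no information about the topic yet. The code then falls into the else branch and fetches the route again, this time passing true to indicate that the default topic should be used.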
public static final String AUTO_CREATE_TOPIC_KEY_TOPIC = "TBW102"; // Will be created at broker when isAutoCreateTopicEnable
The route information of the default topic TBW102 is fetched from the NameServer and used to build a TopicPublishInfo object.
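The two-stage lookup described above can be sketched in isolation. This is a minimal illustration, not the RocketMQ implementation: RouteLookupSketch, findQueueCount, and the Function standing in for the NameServer are hypothetical names, and the route is reduced to a single queue count.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for the producer's two-stage route lookup. */
final class RouteLookupSketch {
    static final String DEFAULT_TOPIC = "TBW102";

    // Local cache: topic -> number of write queues (a simplification of TopicPublishInfo).
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();
    // Assumed NameServer query: returns null when the topic is unknown.
    private final Function<String, Integer> nameServer;

    RouteLookupSketch(Function<String, Integer> nameServer) {
        this.nameServer = nameServer;
    }

    int findQueueCount(String topic) {
        Integer cached = cache.get(topic);
        if (cached != null) {
            return cached; // cache hit
        }
        // First attempt: ask for the topic itself.
        Integer route = nameServer.apply(topic);
        if (route == null) {
            // Second attempt: borrow the route of the default topic.
            route = nameServer.apply(DEFAULT_TOPIC);
        }
        if (route == null) {
            throw new IllegalStateException("No route found for topic " + topic);
        }
        cache.put(topic, route);
        return route;
    }
}
```

A topic that the NameServer does not know yet ends up with the route of TBW102, which is why a producer can send to a brand-new topic when auto-creation is enabled.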
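MQClientInstance#updateTopicRouteInfoFromNameServer updates the topic route information.
// Whether a new topic should be created from the default topic
if (isDefault && defaultMQProducer != null) {
// The second call lands here and fetches the route of the default topic TBW102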
topicRouteData = this.mQClientAPIImpl.getDefaultTopicRouteInfoFromNameServer(defaultMQProducer.getCreateTopicKey(),
1000 * 3);
if (topicRouteData != null) {
for (QueueData data : topicRouteData.getQueueDatas()) {
// Set the read and write queue counts; both default to 4
int queueNums = Math.min(defaultMQProducer.getDefaultTopicQueueNums(), data.getReadQueueNums());
data.setReadQueueNums(queueNums);
data.setWriteQueueNums(queueNums);
}
}
} else {
// The first call lands here: it asks the NameServer for the topic's route, gets no data, and throws an exception that is caught and logged later
topicRouteData = this.mQClientAPIImpl.getTopicRouteInfoFromNameServer(topic, 1000 * 3);
}
A route query is sent to the NameServer with request code RequestCode.GET_ROUTEINTO_BY_TOPIC. The NameServer calls pickupTopicRouteData and returns the result.
public TopicRouteData pickupTopicRouteData(final String topic) {
TopicRouteData topicRouteData = new TopicRouteData();
...
// Look up the queue data for the topic in the cache
List<QueueData> queueDataList = this.topicQueueTable.get(topic);
...
// Look up the broker details (brokerId -> address) by broker name in the cache
BrokerData brokerData = this.brokerAddrTable.get(brokerName);
...
// Wrap the data in a TopicRouteData and return it to the client
}
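Once the route information of topic TBW102 has been obtained, the client's route cache is updated.
if (topicRouteData != null) {
// Fetch the previously cached route
TopicRouteData old = this.topicRouteTable.get(topic);
// Check whether it has changed
boolean changed = topicRouteDataIsChange(old, topicRouteData);
if (!changed) {
// Otherwise check whether the publish info held by the producers is still valid; for consumers, check whether the rebalance cache has changed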
changed = this.isNeedUpdateTopicRouteInfo(topic);
} else {
log.info("the topic[{}] route info changed, old[{}] ,new[{}]", topic, old, topicRouteData);
}
if (changed) {
// Something changed, so the local cache must be updated
TopicRouteData cloneTopicRouteData = topicRouteData.cloneTopicRouteData();
// Update the broker addresses
for (BrokerData bd : topicRouteData.getBrokerDatas()) {
this.brokerAddrTable.put(bd.getBrokerName(), bd.getBrokerAddrs());
}
// Update the topic publish info
{
// Convert the data format
TopicPublishInfo publishInfo = topicRouteData2TopicPublishInfo(topic, topicRouteData);
publishInfo.setHaveTopicRouterInfo(true);
Iterator<Entry<String, MQProducerInner>> it = this.producerTable.entrySet().iterator();
while (it.hasNext()) {
// Every producer instance needs the updated route information
Entry<String, MQProducerInner> entry = it.next();
MQProducerInner impl = entry.getValue();
if (impl != null) {
// This ends up updating the local cache table
impl.updateTopicPublishInfo(topic, publishInfo);
}
}
}
// Update sub info: the consumer side is omitted here and will be analyzed in a later article
...
log.info("topicRouteTable.put. Topic = {}, TopicRouteData[{}]", topic, cloneTopicRouteData);
// Put it into the cache
this.topicRouteTable.put(topic, cloneTopicRouteData);
return true;
}
}
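Why convert the data format with topicRouteData2TopicPublishInfo? The conversion builds the list of queues used for load-balanced sending: queues without write permission and brokers that have no Master address (Slave-only entries) are dropped, in preparation for selecting a queue when a message is sent.
List<QueueData> qds = route.getQueueDatas();
// Sort by broker name (the default ordering)
Collections.sort(qds);
for (QueueData qd : qds) {
// Queues without write permission are not added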
if (PermName.isWriteable(qd.getPerm())) {
BrokerData brokerData = null;
for (BrokerData bd : route.getBrokerDatas()) {
// Find the BrokerData whose broker name matches this QueueData
if (bd.getBrokerName().equals(qd.getBrokerName())) {
brokerData = bd;
break;
}
}
if (null == brokerData) {
continue;
}
// Brokers without a MASTER address are not added
if (!brokerData.getBrokerAddrs().containsKey(MixAll.MASTER_ID)) {
continue;
}
// Create one MessageQueue per write queue
for (int i = 0; i < qd.getWriteQueueNums(); i++) {
MessageQueue mq = new MessageQueue(topic, qd.getBrokerName(), i);
info.getMessageQueueList().add(mq);
}
}
}
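The filtering performed by this conversion can be reproduced with plain collections. The sketch below is a simplification under stated assumptions: PublishQueueSketch and its Queue class are hypothetical, each queue is represented by a "brokerName-queueId" label, and the write-permission bit value is assumed rather than taken from PermName.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PublishQueueSketch {
    static final int PERM_WRITE = 1 << 1; // assumed bit value for write permission
    static final long MASTER_ID = 0L;     // brokerId 0 denotes the Master

    /** Hypothetical, trimmed-down counterpart of QueueData. */
    static final class Queue {
        final String brokerName;
        final int writeQueueNums;
        final int perm;

        Queue(String brokerName, int writeQueueNums, int perm) {
            this.brokerName = brokerName;
            this.writeQueueNums = writeQueueNums;
            this.perm = perm;
        }
    }

    /** Returns one "brokerName-queueId" label per writable queue on a broker that has a Master. */
    static List<String> build(List<Queue> queues, Map<String, Map<Long, String>> brokerAddrs) {
        List<String> result = new ArrayList<>();
        for (Queue q : queues) {
            if ((q.perm & PERM_WRITE) == 0) {
                continue; // no write permission
            }
            Map<Long, String> addrs = brokerAddrs.get(q.brokerName);
            if (addrs == null || !addrs.containsKey(MASTER_ID)) {
                continue; // unknown broker, or no Master address registered
            }
            for (int i = 0; i < q.writeQueueNums; i++) {
                result.add(q.brokerName + "-" + i);
            }
        }
        return result;
    }
}
```

With three brokers where one is read-only and one has only a Slave address, only the queues of the remaining broker end up in the send list.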
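With the route information ready, the next step is to select a Broker and a queue. Back in sendDefaultImpl, the following code obtains the queue; lastBrokerName is the name of the Broker the previous attempt was sent to.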
MessageQueue mqSelected = this.selectOneMessageQueue(topicPublishInfo, lastBrokerName);
public MessageQueue selectOneMessageQueue(final TopicPublishInfo tpInfo, final String lastBrokerName) {
return this.mqFaultStrategy.selectOneMessageQueue(tpInfo, lastBrokerName);
}
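This calls the mqFaultStrategy fault-tolerance strategy. By default sendLatencyFaultEnable = false, so the strategy is not in effect.
With the strategy disabled, selectOneMessageQueue is called directly and returns a MessageQueue by round-robin. If there is a lastBrokerName, it tries to send to a different Broker, which spreads the load and improves availability.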
public MessageQueue selectOneMessageQueue(final String lastBrokerName) {
if (lastBrokerName == null) {
return selectOneMessageQueue();
} else {
int index = this.sendWhichQueue.getAndIncrement();
for (int i = 0; i < this.messageQueueList.size(); i++) {
int pos = Math.abs(index++) % this.messageQueueList.size();
if (pos < 0)
pos = 0;
MessageQueue mq = this.messageQueueList.get(pos);
// Skip queues on lastBrokerName
if (!mq.getBrokerName().equals(lastBrokerName)) {
return mq;
}
}
return selectOneMessageQueue();
}
}
public MessageQueue selectOneMessageQueue() {
// Round-robin; the index is kept in a ThreadLocal
int index = this.sendWhichQueue.getAndIncrement();
int pos = Math.abs(index) % this.messageQueueList.size();
if (pos < 0)
pos = 0;
return this.messageQueueList.get(pos);
}
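The round-robin selection above can be tried out on its own. The following is a minimal sketch rather than the RocketMQ code: RoundRobinSketch is a hypothetical class, queues are identified by their list index, and Math.floorMod replaces the Math.abs plus "pos < 0" guard (Math.abs(Integer.MIN_VALUE) is still negative, which is what that guard protects against). Unlike the original, the fallback reuses the same index instead of incrementing the counter a second time.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinSketch {
    // brokerOfQueue.get(i) is the name of the broker hosting queue i.
    private final List<String> brokerOfQueue;
    private final AtomicInteger counter;

    RoundRobinSketch(List<String> brokerOfQueue, int start) {
        this.brokerOfQueue = brokerOfQueue;
        this.counter = new AtomicInteger(start);
    }

    /** Returns the index of the next queue, skipping queues on lastBrokerName when possible. */
    int next(String lastBrokerName) {
        int size = brokerOfQueue.size();
        int index = counter.getAndIncrement();
        if (lastBrokerName != null) {
            for (int i = 0; i < size; i++) {
                int pos = Math.floorMod(index + i, size);
                if (!brokerOfQueue.get(pos).equals(lastBrokerName)) {
                    return pos;
                }
            }
        }
        // No previous broker, or every queue lives on lastBrokerName: plain round-robin.
        return Math.floorMod(index, size);
    }
}
```

For queues hosted on brokers [a, a, b, b] and a counter starting at 0, the calls next(null), next("a"), next("b") walk forward through the list while stepping over the broker that just failed.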
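If sendLatencyFaultEnable is turned on, Brokers that are marked as not available are filtered out on top of the round-robin (increment and modulo) selection.
Use case: if sending to a Broker fails, the next increment-and-modulo pick may still land on the same faulty Broker. Is there a strategy that avoids sending to that Broker for a certain period of time?
latencyFaultTolerance backs off from Brokers that failed earlier for a certain period of time. It is the core of highly available message sending.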
public class LatencyFaultToleranceImpl implements LatencyFaultTolerance<String> {
private final ConcurrentHashMap<String, FaultItem> faultItemTable = new ConcurrentHashMap<String, FaultItem>(16);
}
class FaultItem implements Comparable<FaultItem> {
private final String name;
private volatile long currentLatency;
private volatile long startTimestamp;
}
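If a send fails, the failing Broker is recorded by calling updateFaultItem.
It takes three parameters: the Broker name, the latency of this send, and whether to isolate the Broker.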
public void updateFaultItem(final String brokerName, final long currentLatency, boolean isolation) {
if (this.sendLatencyFaultEnable) {
long duration = computeNotAvailableDuration(isolation ? 30000 : currentLatency);
this.latencyFaultTolerance.updateFaultItem(brokerName, currentLatency, duration);
}
}
computeNotAvailableDuration calculates how long the Broker should be avoided.
private long[] latencyMax = {
50L, 100L, 550L, 1000L, 2000L, 3000L, 15000L};
private long[] notAvailableDuration = {
0L, 0L, 30000L, 60000L, 120000L, 180000L, 600000L};
private long computeNotAvailableDuration(final long currentLatency) {
for (int i = latencyMax.length - 1; i >= 0; i--) {
if (currentLatency >= latencyMax[i])
return this.notAvailableDuration[i];
}
return 0;
}
For example, if the currentLatency of the last request was at least 550 ms, the Broker is avoided for 30000 ms; at 1000 ms or more, it is avoided for 60000 ms.
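The lookup is easy to check by hand. The sketch below copies the two tables from the snippet above into a hypothetical BackoffTableSketch class and prints the back-off for a few sample latencies; the sample values are chosen for illustration only.

```java
final class BackoffTableSketch {
    private static final long[] LATENCY_MAX =
        {50L, 100L, 550L, 1000L, 2000L, 3000L, 15000L};
    private static final long[] NOT_AVAILABLE_DURATION =
        {0L, 0L, 30000L, 60000L, 120000L, 180000L, 600000L};

    /** Scans from the largest threshold down and returns the matching back-off in milliseconds. */
    static long backoffMillis(long latencyMillis) {
        for (int i = LATENCY_MAX.length - 1; i >= 0; i--) {
            if (latencyMillis >= LATENCY_MAX[i]) {
                return NOT_AVAILABLE_DURATION[i];
            }
        }
        return 0L;
    }

    public static void main(String[] args) {
        long[] samples = {40L, 120L, 600L, 1200L, 30000L};
        for (long latency : samples) {
            System.out.println(latency + " ms -> back off " + backoffMillis(latency) + " ms");
        }
    }
}
```

Latencies below 550 ms cause no back-off at all. When isolation is true, updateFaultItem passes 30000 instead of the measured latency, which maps to the last row of the table, a back-off of 600000 ms.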
The computed back-off is then cached.
LatencyFaultToleranceImpl#updateFaultItem
public void updateFaultItem(final String name, final long currentLatency, final long notAvailableDuration) {
FaultItem old = this.faultItemTable.get(name);
if (null == old) {
final FaultItem faultItem = new FaultItem(name);
faultItem.setCurrentLatency(currentLatency);
faultItem.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);
old = this.faultItemTable.putIfAbsent(name, faultItem);
if (old != null) {
old.setCurrentLatency(currentLatency);
old.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);
}
} else {
old.setCurrentLatency(currentLatency);
old.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);
}
}
If the cache already holds an entry for this Broker, it is updated; otherwise a new FaultItem is created. FaultItem.startTimestamp is the time from which the Broker can be used again.
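The idea of storing "usable again from" timestamps can be sketched without the FaultItem class. FaultTableSketch below is hypothetical and passes the current time in as a parameter so that the behavior is deterministic; it also assumes that a Broker with no entry counts as available.

```java
import java.util.concurrent.ConcurrentHashMap;

final class FaultTableSketch {
    // brokerName -> timestamp (ms) from which the broker may be used again
    private final ConcurrentHashMap<String, Long> availableFrom = new ConcurrentHashMap<>();

    /** Records a fault: the broker is avoided until nowMillis + notAvailableDuration. */
    void recordFault(String brokerName, long nowMillis, long notAvailableDuration) {
        availableFrom.put(brokerName, nowMillis + notAvailableDuration);
    }

    /** A broker is available if it has no entry or its back-off window has elapsed. */
    boolean isAvailable(String brokerName, long nowMillis) {
        Long from = availableFrom.get(brokerName);
        return from == null || nowMillis >= from;
    }
}
```

Recording a fault at t = 1000 with a 30000 ms back-off keeps the broker unavailable until t = 31000, after which it is eligible for selection again.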
When the next send request is made, the avoidance strategy is applied.
public MessageQueue selectOneMessageQueue(final TopicPublishInfo tpInfo, final String lastBrokerName) {
if (this.sendLatencyFaultEnable) {
try {
int index = tpInfo.getSendWhichQueue().getAndIncrement();
for (int i = 0; i < tpInfo.getMessageQueueList().size(); i++) {
int pos = Math.abs(index++) % tpInfo.getMessageQueueList().size();
if (pos < 0)
pos = 0;
MessageQueue mq = tpInfo.getMessageQueueList().get(pos);
if (latencyFaultTolerance.isAvailable(mq.getBrokerName())) {
if (null == lastBrokerName || mq.getBrokerName().equals(lastBrokerName))
return mq;
}
}
final String notBestBroker = latencyFaultTolerance.pickOneAtLeast();
int writeQueueNums = tpInfo.getQueueIdByBroker(notBestBroker);
if (writeQueueNums > 0) {
final MessageQueue mq = tpInfo.selectOneMessageQueue();
if (notBestBroker != null) {
mq.setBrokerName(notBestBroker);
mq.setQueueId(tpInfo.getSendWhichQueue().getAndIncrement() % writeQueueNums);
}
return mq;
} else {
latencyFaultTolerance.remove(notBestBroker);
}
} catch (Exception e) {
log.error("Error occurred when selecting message queue", e);
}
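// If the strategy fails, pick one by round-robin
return tpInfo.selectOneMessageQueue();
}
// Fault tolerance disabled: round-robin while avoiding lastBrokerName
return tpInfo.selectOneMessageQueue(lastBrokerName);
}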
Step 1: iterate over the message queues and check whether each Broker is available.
LatencyFaultToleranceImpl.FaultItem#isAvailable
public boolean isAvailable() {
return (System.currentTimeMillis() - startTimestamp) >= 0;
}
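Step 2: if step 1 finds no Broker, possibly because the back-off windows have not elapsed yet, pickOneAtLeast is called to choose the least-bad Broker from the fault records.
Step 3: if that Broker has no write queues, it is removed from the fault records.
Step 4: if none of the above yields a Broker, one is chosen by round-robin so that the message is still sent.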
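Steps 1 and 4 can be condensed into a few lines. This is a simplified sketch, not the MQFaultStrategy code: FaultAwareSelectSketch is a hypothetical name, the availability check is injected as a Predicate, and steps 2 and 3 (pickOneAtLeast and the removal of brokers without write queues) are left out.

```java
import java.util.List;
import java.util.function.Predicate;

final class FaultAwareSelectSketch {
    /**
     * Walks the queues round-robin from startIndex and returns the index of the
     * first queue whose broker is available; falls back to plain round-robin.
     */
    static int select(List<String> brokerOfQueue, int startIndex, Predicate<String> isAvailable) {
        int size = brokerOfQueue.size();
        for (int i = 0; i < size; i++) {
            int pos = Math.floorMod(startIndex + i, size);
            if (isAvailable.test(brokerOfQueue.get(pos))) {
                return pos; // step 1: first queue on an available broker
            }
        }
        // Step 4: nothing is available, so still pick a queue to keep messages flowing.
        return Math.floorMod(startIndex, size);
    }
}
```

The fallback matters: even when every Broker is inside its back-off window, the producer still returns a queue instead of refusing to send.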
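At this point the topic still has not been created. That happens only when the Broker receives the new message, checks whether the topic's configuration exists, and creates it if it does not.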
AbstractSendMessageProcessor#msgCheck
protected RemotingCommand msgCheck(final ChannelHandlerContext ctx,
final SendMessageRequestHeader requestHeader, final RemotingCommand response) {
// Check for write permission
// Check whether the topic is the system default topic
// Look up the TopicConfig of this topic in the cache
TopicConfig topicConfig =
this.brokerController.getTopicConfigManager().selectTopicConfig(requestHeader.getTopic());
// If there is no TopicConfig, create one
topicConfig = this.brokerController.getTopicConfigManager().createTopicInSendMessageMethod(
requestHeader.getTopic(),
requestHeader.getDefaultTopic(),
RemotingHelper.parseChannelRemoteAddr(ctx.channel()),
requestHeader.getDefaultTopicQueueNums(), topicSysFlag);
// Check that the queue id is within the number of write queues; return an error if it is not
}
TopicConfigManager#createTopicInSendMessageMethod
The topic is created from the default topic, after checking whether autoCreateTopicEnable is on.
TopicConfig defaultTopicConfig = this.topicConfigTable.get(defaultTopic);
if (defaultTopicConfig != null) {
if (defaultTopic.equals(MixAll.AUTO_CREATE_TOPIC_KEY_TOPIC)) {
if (!this.brokerController.getBrokerConfig().isAutoCreateTopicEnable()) {
// Auto-creation is disabled: reset the default topic's permission to read/write, which drops the inherit bit
defaultTopicConfig.setPerm(PermName.PERM_READ | PermName.PERM_WRITE);
}
}
...
// Check that the default topic's permission can be inherited
if (PermName.isInherited(defaultTopicConfig.getPerm())) {
// Create a new TopicConfig
topicConfig = new TopicConfig(topic);
...
topicConfig.setReadQueueNums(queueNums);
topicConfig.setWriteQueueNums(queueNums);
int perm = defaultTopicConfig.getPerm();
perm &= ~PermName.PERM_INHERIT;
topicConfig.setPerm(perm);
...
}
}
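The permission handling is plain bit arithmetic, which a small sketch makes concrete. PermSketch and derivePerm are hypothetical, and the bit values below are assumptions meant to mirror PermName; check them against the RocketMQ version in use.

```java
final class PermSketch {
    // Assumed bit layout, intended to mirror PermName.
    static final int PERM_READ = 1 << 2;
    static final int PERM_WRITE = 1 << 1;
    static final int PERM_INHERIT = 1;

    static boolean isInherited(int perm) {
        return (perm & PERM_INHERIT) == PERM_INHERIT;
    }

    /**
     * Returns the permission a new topic would inherit from the default topic,
     * or -1 if the default topic's permission cannot be inherited.
     */
    static int derivePerm(int defaultTopicPerm, boolean autoCreateTopicEnable) {
        if (!autoCreateTopicEnable) {
            // Same effect as setPerm(PERM_READ | PERM_WRITE): the inherit bit is gone.
            defaultTopicPerm = PERM_READ | PERM_WRITE;
        }
        if (!isInherited(defaultTopicPerm)) {
            return -1;
        }
        // The new topic gets the default topic's permission without the inherit bit.
        return defaultTopicPerm & ~PERM_INHERIT;
    }
}
```

Under these assumed values, a default topic with read, write, and inherit set (7) yields a new topic with read and write (6), while disabling auto-creation makes the inheritance check fail so that no topic is created.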
If a TopicConfig was created, the cache is updated, the data version is bumped, and the topic information is persisted to the local topics.json file.
if (topicConfig != null) {
log.info("Create new topic by default topic:[{}] config:[{}] producer:[{}]",
defaultTopic, topicConfig, remoteAddress);
this.topicConfigTable.put(topic, topicConfig);
this.dataVersion.nextVersion();
createNew = true;
this.persist();
}
A heartbeat is then forced to all NameServers using invokeOneway, with request code RequestCode.REGISTER_BROKER.
if (createNew) {
this.brokerController.registerBrokerAll(false, true,true);
}
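The NameServer accepts the registration request and updates its local caches, which include the topic configuration.
After startup, the Broker also sends heartbeats to the NameServer periodically.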
this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
try {
BrokerController.this.registerBrokerAll(true, false, brokerConfig.isForceRegister());
} catch (Throwable e) {
log.error("registerBrokerAll Exception", e);
}
}
}, 1000 * 10, Math.max(10000, Math.min(brokerConfig.getRegisterNameServerPeriod(), 60000)), TimeUnit.MILLISECONDS);
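The period expression in the last line clamps the configured value into the range from 10 s to 60 s. A minimal sketch of that arithmetic, using a hypothetical HeartbeatPeriodSketch class:

```java
final class HeartbeatPeriodSketch {
    /** Clamps the configured heartbeat period (ms) into [10000, 60000]. */
    static long effectivePeriodMillis(long configuredMillis) {
        return Math.max(10000, Math.min(configuredMillis, 60000));
    }
}
```

A configured period of 5000 ms is therefore raised to 10000 ms, 30000 ms is used as is, and 120000 ms is capped at 60000 ms.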
The key part is how the NameServer adds or updates a Broker.
RouteInfoManager#registerBroker
public RegisterBrokerResult registerBroker(
final String clusterName, // cluster name
final String brokerAddr, // Broker address
final String brokerName, // Broker name
final long brokerId, // master/slave identifier (0 is the Master)
final String haServerAddr, // HA server address
final TopicConfigSerializeWrapper topicConfigWrapper, // topic configuration
final List<String> filterServerList, // message filter server list
final Channel channel) {
...}
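Step 1: take the write lock so that heartbeats are processed serially, then update the clusterAddrTable cache, which holds the Broker cluster information.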
this.lock.writeLock().lockInterruptibly();
Set<String> brokerNames = this.clusterAddrTable.get(clusterName);
if (null == brokerNames) {
brokerNames = new HashSet<String>();
this.clusterAddrTable.put(clusterName, brokerNames);
}
brokerNames.add(brokerName);
Step 2: update the brokerAddrTable cache with the Broker's basic information. A brokerId of 0 denotes the Master.
boolean registerFirst = false;
BrokerData brokerData = this.brokerAddrTable.get(brokerName);
if (null == brokerData) {
registerFirst = true;
brokerData = new BrokerData(clusterName, brokerName, new HashMap<Long, String>());
this.brokerAddrTable.put(brokerName, brokerData);
}
// On a master/slave switch, remove the previous address record first
...
String oldAddr = brokerData.getBrokerAddrs().put(brokerId, brokerAddr);
// Whether this is the first registration
registerFirst = registerFirst || (null == oldAddr);
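Step 3: for a Master Broker, if the topic configuration has changed (detected by comparing DataVersion) or this is the first registration, update the message queues in the topicQueueTable cache.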
if (null != topicConfigWrapper
&& MixAll.MASTER_ID == brokerId) {
if (this.isBrokerTopicConfigChanged(brokerAddr, topicConfigWrapper.getDataVersion())
|| registerFirst) {
ConcurrentMap<String, TopicConfig> tcTable =
topicConfigWrapper.getTopicConfigTable();
if (tcTable != null) {
for (Map.Entry<String, TopicConfig> entry : tcTable.entrySet()) {
this.createAndUpdateQueueData(brokerName, entry.getValue());
}
}
}
}
Step 4: update the brokerLiveTable cache, which tracks whether a Broker is alive.
BrokerLiveInfo prevBrokerLiveInfo = this.brokerLiveTable.put(brokerAddr,
new BrokerLiveInfo(
System.currentTimeMillis(),
topicConfigWrapper.getDataVersion(),
channel,
haServerAddr));
Step 5: update the filterServerTable cache of message filter servers.
if (filterServerList != null) {
if (filterServerList.isEmpty()) {
this.filterServerTable.remove(brokerAddr);
} else {
this.filterServerTable.put(brokerAddr, filterServerList);
}
}
Step 6: for a Slave Broker, look up its Master's address and return it in the result.
if (MixAll.MASTER_ID != brokerId) {
String masterAddr = brokerData.getBrokerAddrs().get(MixAll.MASTER_ID);
if (masterAddr != null) {
BrokerLiveInfo brokerLiveInfo = this.brokerLiveTable.get(masterAddr);
if (brokerLiveInfo != null) {
result.setHaServerAddr(brokerLiveInfo.getHaServerAddr());
result.setMasterAddr(masterAddr);
}
}
}
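Steps 1 and 2 of the registration can be sketched with ordinary maps guarded by a write lock. BrokerRegistrySketch is a hypothetical, heavily reduced stand-in for RouteInfoManager: it keeps only the cluster and address tables and ignores topic configuration, liveness, and filter servers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class BrokerRegistrySketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Set<String>> clusterAddrTable = new HashMap<>();
    private final Map<String, Map<Long, String>> brokerAddrTable = new HashMap<>();

    /** Registers a broker address; returns true if this brokerId is new for the broker name. */
    boolean register(String clusterName, String brokerName, long brokerId, String brokerAddr) {
        lock.writeLock().lock(); // heartbeats are handled one at a time
        try {
            clusterAddrTable.computeIfAbsent(clusterName, k -> new HashSet<>()).add(brokerName);
            Map<Long, String> addrs =
                brokerAddrTable.computeIfAbsent(brokerName, k -> new HashMap<>());
            String oldAddr = addrs.put(brokerId, brokerAddr);
            return oldAddr == null; // first registration of this brokerId
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Returns the Master address (brokerId 0) for a broker name, or null if unknown. */
    String masterAddr(String brokerName) {
        lock.readLock().lock();
        try {
            Map<Long, String> addrs = brokerAddrTable.get(brokerName);
            return addrs == null ? null : addrs.get(0L);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Registering the same Master twice reports a first registration only once, which corresponds to the registerFirst flag that decides whether the topic queue data is refreshed in step 3.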
After the client starts, the producer fetches topic route information from the NameServer periodically, every 30 s.
this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
try {
MQClientInstance.this.updateTopicRouteInfoFromNameServer();
} catch (Exception e) {
log.error("ScheduledTask updateTopicRouteInfoFromNameServer exception", e);
}
}
}, 10, this.clientConfig.getPollNameServerInterval(), TimeUnit.MILLISECONDS);
Producers and consumers share the same client instance, so the subscribed topics and the published topics are refreshed together.
private final ConcurrentMap<String/* group */, MQProducerInner> producerTable = new ConcurrentHashMap<String, MQProducerInner>();
private final ConcurrentMap<String/* group */, MQConsumerInner> consumerTable = new ConcurrentHashMap<String, MQConsumerInner>();
public void updateTopicRouteInfoFromNameServer() {
Set<String> topicList = new HashSet<String>();
...
// Collect every topic from producerTable and consumerTable into topicList
for (String topic : topicList) {
this.updateTopicRouteInfoFromNameServer(topic, false, null);
}
}
Each topic then goes through updateTopicRouteInfoFromNameServer, which refreshes the local cache and adjusts the send queues.
int timesTotal = communicationMode == CommunicationMode.SYNC ? 1 + this.defaultMQProducer.getRetryTimesWhenSendFailed() : 1;
int times = 0;
for (; times < timesTotal; times++) {
...
sendResult = this.sendKernelImpl(msg, mq, communicationMode, sendCallback, topicPublishInfo, timeout - costTime);
...
return sendResult;
}
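The loop above bounds the number of attempts: a synchronous send is tried 1 + retryTimesWhenSendFailed times, while other modes get a single attempt inside this loop. A minimal sketch of that accounting follows; SyncRetrySketch is hypothetical, brokers are assumed to have distinct names and are visited round-robin, and the Predicate stands in for the actual network call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class SyncRetrySketch {
    /** Attempts the send on successive brokers and returns the brokers that were tried. */
    static List<String> send(List<String> brokers, boolean sync,
                             int retryTimesWhenSendFailed, Predicate<String> trySend) {
        int timesTotal = sync ? 1 + retryTimesWhenSendFailed : 1;
        List<String> attempted = new ArrayList<>();
        for (int times = 0; times < timesTotal; times++) {
            // Round-robin over distinct brokers, so a retry never reuses the broker that just failed
            // (as long as there is more than one broker).
            String broker = brokers.get(times % brokers.size());
            attempted.add(broker);
            if (trySend.test(broker)) {
                break; // success: stop retrying
            }
        }
        return attempted;
    }
}
```

With three brokers and retryTimesWhenSendFailed set to 2, a synchronous send that keeps failing touches all three brokers once, whereas a non-synchronous send stops after the first attempt and leaves any retry to the callback path.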
MQClientAPIImpl#sendMessageAsync
...
this.remotingClient.invokeAsync(addr, request, timeoutMillis, new InvokeCallback() {
...
// Once a response arrives, it is handled and the method returns.
if (response != null) {
...
} else {
// times starts at 0 and is incremented on every retry; once it exceeds retryTimesWhenSendFailed, no further retry is made
onExceptionImpl(brokerName, msg, 0L, request, sendCallback, topicPublishInfo, instance,
retryTimesWhenSendFailed, times, ex, context, true, producer);
}
}
);
That concludes the analysis. Reading it alongside the official documentation makes things clearer.