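Half message (the Prepare message sent by the producer): a message that has been sent to the MQ Server but cannot yet be consumed by consumers. It is held on the MQ Server and becomes consumable only after the producer sends a second confirmation.
Message check-back: unexpected failures may prevent the producer from sending the second confirmation. When the MQ Server finds a half message that has been pending for too long, it sends a check-back request to the producer, asking it to confirm the half message.
RocketMQ applies the idea of 2PC to commit transactional messages, and adds compensation logic to handle messages whose second phase times out or fails.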
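The half message, second confirmation, and check-back described above can be illustrated with a toy in-memory model. This is only a sketch of the idea and not RocketMQ's implementation: the class `ToyTransactionBroker`, its `State` enum, and all method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy in-memory model of the half-message protocol; not RocketMQ's implementation. */
class ToyTransactionBroker {
    enum State { COMMIT, ROLLBACK, UNKNOWN }

    // Half messages: stored on the broker but invisible to consumers.
    private final Map<String, String> halfMessages = new LinkedHashMap<>();
    // Messages that consumers are allowed to see.
    private final List<String> visibleMessages = new ArrayList<>();

    /** Phase one: store the half message. */
    void sendHalf(String txId, String body) {
        halfMessages.put(txId, body);
    }

    /** Phase two: the producer's second confirmation. */
    void endTransaction(String txId, State state) {
        if (state == State.UNKNOWN) {
            return; // keep the half message and wait for a later check-back
        }
        String body = halfMessages.remove(txId);
        if (body != null && state == State.COMMIT) {
            visibleMessages.add(body);
        }
    }

    /** Compensation: ask the producer about every half message that is still pending. */
    void checkBack(Function<String, State> producerCheck) {
        for (String txId : new ArrayList<>(halfMessages.keySet())) {
            endTransaction(txId, producerCheck.apply(txId));
        }
    }

    List<String> visible() {
        return visibleMessages;
    }

    int pendingHalfCount() {
        return halfMessages.size();
    }
}
```

In this model a message reaches `visible()` only after a COMMIT, and a half message whose confirmation was lost stays pending until `checkBack` resolves it.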
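Advantages:
System decoupling: messages are stored independently, and the systems complete the transaction by exchanging messages
Low complexity and simple to implement
Disadvantages:
Each message requires two requests (the half message plus a commit or rollback)
The business code must implement an interface for checking back the message state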
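For details, see another blog post:
Distributed Transactions: 2PC, 3PC, TCC, and RocketMQ Transactional Messages Explained and Compared (with Detailed Diagrams)
Below is the transactional message diagram from the official website. It is already quite detailed and is not hard to follow alongside the source code. I will walk through it by the step numbers in the diagram.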
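Half message (stored under the RMQ_SYS_TRANS_HALF_TOPIC topic; the original topic is backed up in the message's own property map so that consumers cannot consume it)
Check-back, to check the local transaction status
For the source code, we will start from the following entry points: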
org.apache.rocketmq.example.transaction.TransactionProducer
org.apache.rocketmq.example.transaction.TransactionListenerImpl
org.apache.rocketmq.client.producer.TransactionMQProducer
org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl
Transactional message handling and the scheduled check-back flow:
org.apache.rocketmq.broker.transaction.queue.TransactionalMessageBridge
org.apache.rocketmq.broker.transaction.queue.TransactionalMessageServiceImpl
org.apache.rocketmq.broker.transaction.AbstractTransactionalMessageCheckListener
org.apache.rocketmq.broker.transaction.TransactionalMessageCheckService
org.apache.rocketmq.broker.processor.EndTransactionProcessor
The transactional message demo is in the official source at org.apache.rocketmq.example.transaction.TransactionProducer
public class TransactionProducer {
    public static void main(String[] args) throws MQClientException, InterruptedException {
        // When RocketMQ finds a `Prepared message`, it uses the strategy implemented by this listener to decide the transaction's outcome
        TransactionListener transactionListener = new TransactionListenerImpl();
        // Build the producer for transactional messages
        TransactionMQProducer producer = new TransactionMQProducer("please_rename_unique_group_name");
        producer.setNamesrvAddr("127.0.0.1:9876");
        producer.setVipChannelEnabled(true);
        ExecutorService executorService = new ThreadPoolExecutor(2, 5, 100, TimeUnit.SECONDS, new ArrayBlockingQueue<Runnable>(2000), new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread thread = new Thread(r);
                thread.setName("client-transaction-msg-check-thread");
                return thread;
            }
        });
        // Set the executor that runs the check-back requests
        producer.setExecutorService(executorService);
        // Set the class that decides the transaction's outcome
        producer.setTransactionListener(transactionListener);
        producer.start();
        // Send 10 transactional messages
        String[] tags = new String[]{"TagA", "TagB", "TagC", "TagD", "TagE"};
        for (int i = 0; i < 10; i++) {
            try {
                Message msg =
                    new Message("TopicA", tags[i % tags.length], "KEY" + i,
                        ("Hello RocketMQ " + i).getBytes(RemotingHelper.DEFAULT_CHARSET));
                // Send the transactional message
                SendResult sendResult = producer.sendMessageInTransaction(msg, null);
                System.out.printf("%s%n", sendResult);
                Thread.sleep(10);
            } catch (MQClientException | UnsupportedEncodingException e) {
                e.printStackTrace();
            }
        }
        ...
    }
}
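Main logic: create the TransactionListener, build the TransactionMQProducer, set the executor and the listener, start the producer, and then send 10 transactional messages.
Next, let's look at TransactionListenerImpl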
public class TransactionListenerImpl implements TransactionListener {
    private AtomicInteger transactionIndex = new AtomicInteger(0);
    private ConcurrentHashMap<String, Integer> localTrans = new ConcurrentHashMap<>();

    /**
     * Execute the local transaction
     */
    @Override
    public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
        int value = transactionIndex.getAndIncrement();
        // Assign a different local transaction status to each message
        int status = value % 3;
        localTrans.put(msg.getTransactionId(), status);
        return LocalTransactionState.UNKNOW;
    }

    /**
     * Transaction check-back method
     *
     * @return the state of the local transaction
     */
    @Override
    public LocalTransactionState checkLocalTransaction(MessageExt msg) {
        // Look up the status recorded for this transaction id
        Integer status = localTrans.get(msg.getTransactionId());
        if (null != status) {
            switch (status) {
                case 0:
                    return LocalTransactionState.UNKNOW;
                case 1:
                    return LocalTransactionState.COMMIT_MESSAGE;
                case 2:
                    return LocalTransactionState.ROLLBACK_MESSAGE;
                default:
                    return LocalTransactionState.COMMIT_MESSAGE;
            }
        }
        return LocalTransactionState.COMMIT_MESSAGE;
    }
}
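As you can see, it implements two methods: one that executes the local transaction and one that checks back the local transaction state. The check-back method checkLocalTransaction is used later.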
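The demo listener always returns UNKNOW from executeLocalTransaction, so every message is resolved by check-back. As a contrast, here is a minimal sketch of a listener that records the real outcome of the local work and answers the check-back from that record. It is self-contained and does not use the RocketMQ types: `TxState` and `RecordingListener` are hypothetical stand-ins, and the in-memory map is an assumption (a real system would more likely persist the outcome together with the business data).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for RocketMQ's LocalTransactionState. */
enum TxState { COMMIT_MESSAGE, ROLLBACK_MESSAGE, UNKNOW }

/** Hypothetical listener that remembers how each local transaction ended. */
class RecordingListener {
    // Outcome of each local transaction, keyed by transaction id.
    private final Map<String, TxState> outcomes = new ConcurrentHashMap<>();

    /** Runs the local work and records whether it succeeded. */
    TxState executeLocalTransaction(String transactionId, Runnable localWork) {
        TxState state;
        try {
            localWork.run();
            state = TxState.COMMIT_MESSAGE;
        } catch (RuntimeException e) {
            state = TxState.ROLLBACK_MESSAGE;
        }
        outcomes.put(transactionId, state);
        return state;
    }

    /** Check-back: answer from the recorded outcome, UNKNOW if nothing is recorded yet. */
    TxState checkLocalTransaction(String transactionId) {
        return outcomes.getOrDefault(transactionId, TxState.UNKNOW);
    }
}
```

Returning UNKNOW for an unrecorded transaction is a design choice of this sketch: it keeps the half message pending instead of committing a message whose local transaction may never have run.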
First, let's look at the sendMessageInTransaction method called in the demo
/**
 * Send a transactional message
 */
@Override
public TransactionSendResult sendMessageInTransaction(final Message msg,
    final Object arg) throws MQClientException {
    ...
    // Set the topic
    msg.setTopic(NamespaceUtil.wrapNamespace(this.getNamespace(), msg.getTopic()));
    // Send the transactional message
    return this.defaultMQProducerImpl.sendMessageInTransaction(msg, null, arg);
}

/**
 * The sending process of a transactional message
 */
public TransactionSendResult sendMessageInTransaction(final Message msg,
    final LocalTransactionExecuter localTransactionExecuter, final Object arg)
    throws MQClientException {
    // Get the check listener
    TransactionListener transactionListener = getCheckListener();
    ...
    SendResult sendResult = null;
    // Mark the message as a transactional (prepared) message
    MessageAccessor.putProperty(msg, MessageConst.PROPERTY_TRANSACTION_PREPARED, "true");
    // Set the producer group
    MessageAccessor.putProperty(msg, MessageConst.PROPERTY_PRODUCER_GROUP, this.defaultMQProducer.getProducerGroup());
    // Step 1: send the transactional (half) message
    sendResult = this.send(msg);
    ...
    LocalTransactionState localTransactionState = LocalTransactionState.UNKNOW;
    Throwable localException = null;
    // Branch on the send status
    switch (sendResult.getSendStatus()) {
        // Sent successfully
        case SEND_OK: {
            if (sendResult.getTransactionId() != null) {
                // Save the transaction id on the message
                msg.putUserProperty("__transactionId__", sendResult.getTransactionId());
            }
            String transactionId = msg.getProperty(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX);
            if (null != transactionId && !"".equals(transactionId)) {
                msg.setTransactionId(transactionId);
            }
            if (null != localTransactionExecuter) {
                // Step 2: the message was sent successfully, so execute the local transaction associated with it
                localTransactionState = localTransactionExecuter.executeLocalTransactionBranch(msg, arg);
            } else if (transactionListener != null) {
                ...
            }
            // If the local transaction returned no state, default to UNKNOW
            if (null == localTransactionState) {
                localTransactionState = LocalTransactionState.UNKNOW;
            }
            // The state after executing the local transaction is not commit
            if (localTransactionState != LocalTransactionState.COMMIT_MESSAGE) {
                ...
            }
        }
        break;
        ...
    }
    // Step 3: end the transaction
    // endTransaction() sends a request to the broker (MQ Server) to update the final state of the transactional message
    this.endTransaction(sendResult, localTransactionState, localException);
    // Build the transactionSendResult
    ...
    return transactionSendResult;
}
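The main logic has just three steps: send the half message, execute the local transaction, and end the transaction.
The endTransaction() method sends the local transaction state (Commit/Rollback) to the MQ Server to update the final state of the transactional message: either the message index becomes visible to consumers, or the message is marked as deleted.
On the MQ Server, half messages are handled in the TransactionalMessageBridge class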
/**
 * Asynchronously store the half message
 */
public CompletableFuture<PutMessageResult> asyncPutHalfMessage(MessageExtBrokerInner messageInner) {
    // Persist the message to disk; once it is persisted, the client is notified asynchronously
    return store.asyncPutMessage(parseHalfMessageInner(messageInner));
}

/**
 * Convert the message into a half message
 */
private MessageExtBrokerInner parseHalfMessageInner(MessageExtBrokerInner msgInner) {
    // Back up the message's topic in its own property map
    MessageAccessor.putProperty(msgInner, MessageConst.PROPERTY_REAL_TOPIC, msgInner.getTopic());
    // Back up the message's queueId in its own property map
    MessageAccessor.putProperty(msgInner, MessageConst.PROPERTY_REAL_QUEUE_ID, String.valueOf(msgInner.getQueueId()));
    msgInner.setSysFlag(MessageSysFlag.resetTransactionValue(msgInner.getSysFlag(), MessageSysFlag.TRANSACTION_NOT_TYPE));
    // Set the topic to RMQ_SYS_TRANS_HALF_TOPIC, a separate topic that consumers do not consume
    msgInner.setTopic(TransactionalMessageUtil.buildHalfTopic());
    // Set the queueId to 0
    msgInner.setQueueId(0);
    msgInner.setPropertiesString(MessageDecoder.messageProperties2String(msgInner.getProperties()));
    return msgInner;
}
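This does the following: it backs up the message's original topic and queueId in its property map, resets the transaction type in sysFlag to TRANSACTION_NOT_TYPE, sets the topic to RMQ_SYS_TRANS_HALF_TOPIC, and sets the queueId to 0.
The topic and queueId are changed mainly to distinguish the message from normal messages, so that consumers cannot consume it.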
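The topic swap described above can be reduced to a few lines. The following is a minimal sketch under stated assumptions: `SketchMessage` and `HalfMessageSketch` are hypothetical classes, and the property keys `REAL_TOPIC` and `REAL_QID` are placeholder names chosen for illustration rather than values taken from MessageConst.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified message; not one of RocketMQ's message classes. */
class SketchMessage {
    String topic;
    int queueId;
    final Map<String, String> properties = new HashMap<>();

    SketchMessage(String topic, int queueId) {
        this.topic = topic;
        this.queueId = queueId;
    }
}

class HalfMessageSketch {
    static final String HALF_TOPIC = "RMQ_SYS_TRANS_HALF_TOPIC";
    static final String REAL_TOPIC = "REAL_TOPIC";  // assumed property key
    static final String REAL_QUEUE_ID = "REAL_QID"; // assumed property key

    /** Back up the real destination in the property map, then redirect to the half topic. */
    static SketchMessage toHalf(SketchMessage msg) {
        msg.properties.put(REAL_TOPIC, msg.topic);
        msg.properties.put(REAL_QUEUE_ID, String.valueOf(msg.queueId));
        msg.topic = HALF_TOPIC;
        msg.queueId = 0;
        return msg;
    }
}
```

Because consumers subscribe to the business topic, a message parked under the half topic is never pulled by them, while the backed-up properties keep enough information to deliver it later.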
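After the client has received the send response for the half message and executed the local transaction, it sends a Commit/Rollback request to the MQ Server. This request is handled mainly in EndTransactionProcessor.processRequest(). Let's look at the source: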
/**
 * [Key point] Handle the EndTransactionRequest
 */
@Override
public RemotingCommand processRequest(ChannelHandlerContext ctx, RemotingCommand request) throws
    RemotingCommandException {
    final RemotingCommand response = RemotingCommand.createResponseCommand(null);
    final EndTransactionRequestHeader requestHeader =
        (EndTransactionRequestHeader)request.decodeCommandCustomHeader(EndTransactionRequestHeader.class);
    ...
    OperationResult result = new OperationResult();
    // Commit the transaction
    if (MessageSysFlag.TRANSACTION_COMMIT_TYPE == requestHeader.getCommitOrRollback()) {
        // Look up the original prepared message in the commitLog; this requires the producer to send the half message and the commit to the same broker
        result = this.brokerController.getTransactionalMessageService().commitMessage(requestHeader);
        if (result.getResponseCode() == ResponseCode.SUCCESS) {
            // Check that the message found matches the one in the request
            RemotingCommand res = checkPrepareMessage(result.getPrepareMessage(), requestHeader);
            if (res.getCode() == ResponseCode.SUCCESS) {
                // Rebuild the prepareMessage as the message to be delivered to consumers
                MessageExtBrokerInner msgInner = endMessageTransaction(result.getPrepareMessage());
                ...
                // Store the message through the MessageStore interface, using the real topic and queueId
                RemotingCommand sendResult = sendFinalMessage(msgInner);
                if (sendResult.getCode() == ResponseCode.SUCCESS) {
                    // Mark the prepareMessage as deleted
                    this.brokerController.getTransactionalMessageService().deletePrepareMessage(result.getPrepareMessage());
                }
                return sendResult;
            }
            return res;
        }
    } else if (MessageSysFlag.TRANSACTION_ROLLBACK_TYPE == requestHeader.getCommitOrRollback()) {
        // A rollback was received; look up the original Prepare message
        result = this.brokerController.getTransactionalMessageService().rollbackMessage(requestHeader);
        if (result.getResponseCode() == ResponseCode.SUCCESS) {
            // Check that the message found matches the one in the request
            RemotingCommand res = checkPrepareMessage(result.getPrepareMessage(), requestHeader);
            if (res.getCode() == ResponseCode.SUCCESS) {
                // Mark the prepareMessage as deleted
                this.brokerController.getTransactionalMessageService().deletePrepareMessage(result.getPrepareMessage());
            }
            return res;
        }
    }
    ...
    // Return the response
    return response;
}
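The main processing logic is as follows: for a commit, the broker looks up the original half message, checks that it matches the request, rebuilds it with the real topic and queueId, stores it, and then marks the half message as deleted; for a rollback, it looks up and checks the half message and then marks it as deleted.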
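The two branches can be condensed into a toy model. This is a sketch rather than the broker's code: `EndTransactionSketch`, its fields, the integer constants, and the `"REAL_TOPIC"`/`"BODY"` keys are all hypothetical, and rejecting a request for an already resolved half message is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the commit/rollback branches; names are illustrative only. */
class EndTransactionSketch {
    static final int COMMIT = 1;
    static final int ROLLBACK = 2;

    // Half messages by offset; each holds its backed-up "REAL_TOPIC" and a "BODY".
    final Map<Long, Map<String, String>> halfLog = new HashMap<>();
    // Offsets of half messages that have been marked as deleted (already resolved).
    final Set<Long> deletedHalfOffsets = new HashSet<>();
    // Messages delivered to their real topic, recorded as "topic:body".
    final List<String> delivered = new ArrayList<>();

    /** Returns true if the request was applied, false if there is no pending half message for it. */
    boolean process(long halfOffset, int commitOrRollback) {
        Map<String, String> half = halfLog.get(halfOffset);
        if (half == null || deletedHalfOffsets.contains(halfOffset)) {
            return false;
        }
        if (commitOrRollback == COMMIT) {
            // Rebuild the message under its real topic so that consumers can see it.
            delivered.add(half.get("REAL_TOPIC") + ":" + half.get("BODY"));
        } else if (commitOrRollback != ROLLBACK) {
            return false; // unknown type: leave the half message pending
        }
        // Both branches end by marking the half message as deleted.
        deletedHalfOffsets.add(halfOffset);
        return true;
    }
}
```

Note that the half message is never modified in place here: a commit appends a new record and both branches only add a deletion mark, which mirrors the append-only style of the walkthrough.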
The scheduled check-back is mainly implemented in the onWaitEnd method of the TransactionalMessageCheckService class
public class TransactionalMessageCheckService extends ServiceThread {
    private BrokerController brokerController;

    public TransactionalMessageCheckService(BrokerController brokerController) {
        this.brokerController = brokerController;
    }

    @Override
    public String getServiceName() {
        return TransactionalMessageCheckService.class.getSimpleName();
    }

    @Override
    public void run() {
        ...
        // Interval between check-back rounds
        long checkInterval = brokerController.getBrokerConfig().getTransactionCheckInterval();
        while (!this.isStopped()) {
            this.waitForRunning(checkInterval);
        }
        ...
    }

    /**
     * run()->waitForRunning()->onWaitEnd()
     */
    @Override
    protected void onWaitEnd() {
        // Timeout
        long timeout = brokerController.getBrokerConfig().getTransactionTimeOut();
        // Maximum number of checks, 15 by default; if the transaction state is still unknown after that, the broker by default discards the half message, which amounts to a rollback
        int checkMax = brokerController.getBrokerConfig().getTransactionCheckMax();
        long begin = System.currentTimeMillis();
        ...
        // Start the check-back
        this.brokerController.getTransactionalMessageService().check(timeout, checkMax, this.brokerController.getTransactionalMessageCheckListener());
        ...
    }
}
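The check limit carried by checkMax can be pictured as a counter stored on the message itself. The sketch below is an illustration under assumptions, not the broker's needDiscard method: `CheckLimitSketch` and the `CHECK_TIMES` property key are hypothetical names.

```java
import java.util.Map;

/** Sketch of a per-message check limit; the property key is a made-up name. */
class CheckLimitSketch {
    static final String CHECK_TIMES = "CHECK_TIMES"; // hypothetical property key

    /**
     * Returns true once the message has already been checked checkMax times;
     * otherwise records one more check in the message properties and returns false.
     */
    static boolean needDiscard(Map<String, String> properties, int checkMax) {
        int checkedSoFar = Integer.parseInt(properties.getOrDefault(CHECK_TIMES, "0"));
        if (checkedSoFar >= checkMax) {
            return true;
        }
        properties.put(CHECK_TIMES, String.valueOf(checkedSoFar + 1));
        return false;
    }
}
```

Keeping the counter in the message properties means the limit survives across check rounds without any extra state on the checking thread.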
Next, let's look at the implementation of this.brokerController.getTransactionalMessageService().check()
/**
 * Iterate over the half messages that have been neither committed nor rolled back, and send
 * check-back requests to their producers to obtain the local transaction state
 */
@Override
public void check(long transactionTimeout, int transactionCheckMax,
    AbstractTransactionalMessageCheckListener listener) {
    // Get all queues of the half-message topic
    String topic = TopicValidator.RMQ_SYS_TRANS_HALF_TOPIC;
    Set<MessageQueue> msgQueues = transactionalMessageBridge.fetchMessageQueues(topic);
    ...
    // Check every message queue
    for (MessageQueue messageQueue : msgQueues) {
        // Filter and validate the messages
        ...
        // single thread
        int getMessageNullCount = 1;
        long newOffset = halfOffset;
        long i = halfOffset;
        // Start iterating over the half messages
        while (true) {
            ...
            // Fetch the half message
            GetResult getResult = getHalfMsg(messageQueue, i);
            MessageExt msgExt = getResult.getMsg();
            // The half message does not exist
            if (msgExt == null) {
                ...
            }
            // Whether the half message should be discarded or skipped
            if (needDiscard(msgExt, transactionCheckMax) || needSkip(msgExt)) {
                listener.resolveDiscardMsg(msgExt);
                newOffset = i + 1;
                i++;
                continue;
            }
            ...
            List<MessageExt> opMsg = pullResult.getMsgFoundList();
            // Decide whether the message needs a check-back
            boolean isNeedCheck = (opMsg == null && valueOfCurrentMinusBorn > checkImmunityTime)
                || (opMsg != null && (opMsg.get(opMsg.size() - 1).getBornTimestamp() - startTime > transactionTimeout))
                || (valueOfCurrentMinusBorn <= -1);
            // A check-back is needed
            if (isNeedCheck) {
                if (!putBackHalfMsgQueue(msgExt, i)) {
                    continue;
                }
                // Send the check-back request
                listener.resolveHalfMsg(msgExt);
            } else {
                pullResult = fillOpRemoveMap(removeMap, opQueue, pullResult.getNextBeginOffset(), halfOffset, doneOpOffset);
                log.debug("The miss offset:{} in messageQueue:{} need to get more opMsg, result is:{}", i,
                    messageQueue, pullResult);
                continue;
            }
            newOffset = i + 1;
            i++;
        }
        ...
    }
}
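The isNeedCheck condition in the loop above is easier to read when restated as a pure function. This is a sketch with illustrative names: `NeedCheckSketch` and its parameters are hypothetical, and the meaning given to each parameter is my reading of the variable names in the excerpt, since their definitions are elided there.

```java
/** Restates the isNeedCheck condition as a pure function; names are illustrative. */
class NeedCheckSketch {
    /**
     * @param lastOpBornTime born timestamp of the newest op message fetched, or null if none was fetched
     * @param ageMillis      current time minus the half message's born time (assumed meaning)
     * @param immunityMillis period during which a fresh half message is not checked (assumed meaning)
     * @param startTime      time at which this check round started (assumed meaning)
     * @param timeoutMillis  transaction timeout
     */
    static boolean isNeedCheck(Long lastOpBornTime, long ageMillis, long immunityMillis,
                               long startTime, long timeoutMillis) {
        // No op message was found and the half message is older than the immunity period.
        boolean noOpAndPastImmunity = lastOpBornTime == null && ageMillis > immunityMillis;
        // Op messages were found, but the newest one was born more than the timeout after the round started.
        boolean newestOpBeyondTimeout = lastOpBornTime != null
                && lastOpBornTime - startTime > timeoutMillis;
        // A negative age means the born time lies in the future.
        boolean negativeAge = ageMillis <= -1;
        return noOpAndPastImmunity || newestOpBeyondTimeout || negativeAge;
    }
}
```

For example, with a 6000 ms immunity period, a half message aged 7000 ms with no op message needs a check, while one aged 1000 ms does not.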
Most of the code above judges and filters messages; the key part is:
// A check-back is needed
if (isNeedCheck) {
    if (!putBackHalfMsgQueue(msgExt, i)) {
        continue;
    }
    // Send the check-back request
    listener.resolveHalfMsg(msgExt);
}
Let's look at these two methods
/**
 * For performance (sequential writes): each time, the half message is read from disk into memory and then
 * written to disk again, so it is appended at the latest physical offset instead of modifying the
 * original half message in place
 */
private boolean putBackHalfMsgQueue(MessageExt msgExt, long offset) {
    // Write the half message again after reading it
    PutMessageResult putMessageResult = putBackToHalfQueueReturnResult(msgExt);
    ...
}

public void resolveHalfMsg(final MessageExt msgExt) {
    // Run the check-back on the executor's thread pool
    executorService.execute(new Runnable() {
        @Override
        public void run() {
            try {
                // Send the check-back request
                sendCheckMessage(msgExt);
            } catch (Exception e) {
                LOGGER.error("Send check message error!", e);
            }
        }
    });
}

/**
 * Send the check-back request
 */
public void sendCheckMessage(MessageExt msgExt) throws Exception {
    // Build the check-back request header
    CheckTransactionStateRequestHeader checkTransactionStateRequestHeader = new CheckTransactionStateRequestHeader();
    ...
    if (channel != null) {
        // Execute the check-back request asynchronously
        brokerController.getBroker2Client().checkProducerTransactionState(groupId, channel, checkTransactionStateRequestHeader, msgExt);
    } else {
        ...
    }
}
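The check-back request compensates for messages whose Commit or Rollback phase in 2PC timed out or failed. After receiving the check-back request, the client sends Commit/Rollback to the MQ Server, which brings the flow back to the EndTransactionProcessor logic described above.
That completes the source walkthrough of transactional messages; you should now have a general picture of the logic.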
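Transactional messages only guarantee that the local transaction and the sending of the MQ message are atomic; they do not guarantee that the consumer will consume the message successfully.
A distributed transaction guarantees that multiple operations are atomic: if one fails, all of them are rolled back.
This article has covered the use of transactional messages, message production, the sending and parsing of half messages, how consumers consume the messages, and compensation for abnormal cases, through both prose and source analysis, along with some reflections on the source design.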