RocketMQ Flush-to-Disk Strategy: Source Code Walkthrough and Summary

The Broker persists messages to disk through the CommitLog class. We skip the earlier steps of the send path and start directly from the key method, putMessage(...).

public class CommitLog {

    ......
    /**
     * Appends a message and returns the result.
     *
     * @param msg the message
     * @return the put result
     */
    public PutMessageResult putMessage(final MessageExtBrokerInner msg) {

        ......
        // Get the mapped file to write to
        MappedFile unlockMappedFile = null;
        MappedFile mappedFile = this.mappedFileQueue.getLastMappedFile();

        // Acquire the append lock so that only one thread can put data at a time
        lockForPutMessage();                                           //##1
        try {
            long beginLockTimestamp = this.defaultMessageStore.getSystemClock().now();
            this.beginTimeInLock = beginLockTimestamp;

            // Set the store timestamp here, while holding the lock, to keep
            // messages globally ordered
            msg.setStoreTimestamp(beginLockTimestamp);

            // Create a new mapped file when none exists or the last one is full
            if (null == mappedFile || mappedFile.isFull()) {
                mappedFile = this.mappedFileQueue.getLastMappedFile(0); // Mark: NewFile may be cause noise
            }
            if (null == mappedFile) {
                log.error("create maped file1 error, topic: " + msg.getTopic() + " clientAddr: " + msg.getBornHostString());
                beginTimeInLock = 0;
                return new PutMessageResult(PutMessageStatus.CREATE_MAPEDFILE_FAILED, null);
            }

            // Append the message to the MappedFile's MappedByteBuffer/writeBuffer and advance wrotePosition; nothing is committed or flushed yet
            result = mappedFile.appendMessage(msg, this.appendMessageCallback);           //##2
            switch (result.getStatus()) {
                case PUT_OK:
                    break;
                case END_OF_FILE: // Not enough room left in the file for this message: create a new MappedFile and append there
                    unlockMappedFile = mappedFile;
                    // Create a new file, re-write the message
                    mappedFile = this.mappedFileQueue.getLastMappedFile(0);
                    if (null == mappedFile) {
                        // XXX: warn and notify me
                        log.error("create maped file2 error, topic: " + msg.getTopic() + " clientAddr: " + msg.getBornHostString());
                        beginTimeInLock = 0;
                        return new PutMessageResult(PutMessageStatus.CREATE_MAPEDFILE_FAILED, result);
                    }
                    result = mappedFile.appendMessage(msg, this.appendMessageCallback);
                    break;
                case MESSAGE_SIZE_EXCEEDED:
                case PROPERTIES_SIZE_EXCEEDED:
                    beginTimeInLock = 0;
                    return new PutMessageResult(PutMessageStatus.MESSAGE_ILLEGAL, result);
                case UNKNOWN_ERROR:
                    beginTimeInLock = 0;
                    return new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, result);
                default:
                    beginTimeInLock = 0;
                    return new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, result);
            }

            eclipseTimeInLock = this.defaultMessageStore.getSystemClock().now() - beginLockTimestamp;
            beginTimeInLock = 0;
        } finally {
            // Release the lock
            releasePutMessageLock();
        }

        if (eclipseTimeInLock > 500) {
            log.warn("[NOTIFYME]putMessage in lock cost time(ms)={}, bodyLength={} AppendMessageResult={}", eclipseTimeInLock, msg.getBody().length, result);
        }

        if (null != unlockMappedFile && this.defaultMessageStore.getMessageStoreConfig().isWarmMapedFileEnable()) {
            this.defaultMessageStore.unlockMappedFile(unlockMappedFile);
        }

        PutMessageResult putMessageResult = new PutMessageResult(PutMessageStatus.PUT_OK, result);

        // Statistics
        storeStatsService.getSinglePutMessageTopicTimesTotal(msg.getTopic()).incrementAndGet();
        storeStatsService.getSinglePutMessageTopicSizeTotal(topic).addAndGet(result.getWroteBytes());

        // Perform a synchronous or asynchronous flush/commit
        GroupCommitRequest request = null;
        // Synchronization flush
        if (FlushDiskType.SYNC_FLUSH == this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
            final GroupCommitService service = (GroupCommitService)this.flushCommitLogService;
            if (msg.isWaitStoreMsgOK()) {
                request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
                service.putRequest(request);
                boolean flushOK = request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
                if (!flushOK) {
                    log.error("do groupcommit, wait for flush failed, topic: " + msg.getTopic() + " tags: " + msg.getTags()
                        + " client address: " + msg.getBornHostString());
                    putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_DISK_TIMEOUT);
                }
            } else {
                service.wakeup();
            }
        }
        // Asynchronous flush
        else {
            if (!this.defaultMessageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
                flushCommitLogService.wakeup(); // async flush via MappedByteBuffer (the default)   //##4
            } else {
                commitLogService.wakeup();  // async flush via write buffer + FileChannel           //##3
            }
        }

        ......
    }


}
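
The synchronous branch above blocks the sending thread on request.waitForFlush(...) until the flush thread reports back or the timeout expires. The following is a minimal sketch of that handshake built on a CountDownLatch. FlushRequestSketch and its methods are hypothetical names for illustration, not RocketMQ's GroupCommitRequest.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a flush request that a producer thread can wait on. */
final class FlushRequestSketch {
    private final long nextOffset;                    // offset that must be on disk
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile boolean flushOK = false;

    FlushRequestSketch(long nextOffset) {
        this.nextOffset = nextOffset;
    }

    long nextOffset() {
        return nextOffset;
    }

    /** Called by the flush thread once it knows whether the offset was flushed. */
    void complete(boolean ok) {
        this.flushOK = ok;
        latch.countDown();
    }

    /** Called by the producer thread; returns false on timeout or interruption. */
    boolean waitForFlush(long timeoutMs) {
        try {
            latch.await(timeoutMs, TimeUnit.MILLISECONDS);
            return flushOK;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

If complete(true) is never called, waitForFlush returns false after the timeout, which corresponds to the FLUSH_DISK_TIMEOUT status set in putMessage.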

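1 . lockForPutMessage(): unless useReentrantLockWhenPutMessage is enabled, the lock is a spin lock built on an AtomicBoolean. Blocking and resuming a thread requires context switches, so when the lock is held only very briefly, spinning costs far less than blocking, even though the CPU busy-waits for as long as the thread spins.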

    /**
     * Spin until the lock is acquired.
     * Acquires the putMessage lock.
     */
    private void lockForPutMessage() {
        if (this.defaultMessageStore.getMessageStoreConfig().isUseReentrantLockWhenPutMessage()) {
            putMessageNormalLock.lock();
        } else {
            boolean flag;
            do {  // spin lock on an AtomicBoolean: loop until the current thread acquires the lock
                flag = this.putMessageSpinLock.compareAndSet(true, false);
            }
            while (!flag);
        }
    }
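
To make the spin lock concrete, here is a minimal, self-contained sketch of the same idea. As in the snippet above, true means "free" and the lock is taken with compareAndSet(true, false). The class name SpinLockSketch, the unlock() implementation, and the demo helper are assumptions for illustration only.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Minimal spin lock sketch; true means the lock is free. */
final class SpinLockSketch {
    private final AtomicBoolean free = new AtomicBoolean(true);

    void lock() {
        // Busy-wait until this thread flips the flag from free (true) to held (false).
        while (!free.compareAndSet(true, false)) {
            // spin
        }
    }

    void unlock() {
        // Volatile write: publishes the critical section to the next owner.
        free.set(true);
    }

    /** Starts `threads` workers that each increment a shared counter `perThread` times under the lock. */
    static int demo(int threads, int perThread) {
        SpinLockSketch lock = new SpinLockSketch();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }
}
```

Because every increment happens while the lock is held, SpinLockSketch.demo(4, 10000) returns 40000. The critical section here is tiny, which is the situation in which spinning is worthwhile.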

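2 . mappedFile.appendMessage(msg, this.appendMessageCallback): depending on whether the write buffer pool is enabled, this step writes the message either into the pooled writeBuffer or into mappedByteBuffer. The default is mappedByteBuffer, whose memory-mapped backing makes writes very fast. At this point the message has only been written into an in-memory buffer; it has not yet been synced to the data file.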

    /**
     * Appends a message to the file.
     * In practice this writes into the mapped file's buffer.
     *
     * @param msg the message
     * @param cb  the append callback
     * @return the append result
     */
    public AppendMessageResult appendMessage(final MessageExtBrokerInner msg, final AppendMessageCallback cb) {
        assert msg != null;
        assert cb != null;

        int currentPos = this.wrotePosition.get();

        if (currentPos < this.fileSize) {    
            //Use writeBuffer if the write buffer pool is enabled, otherwise mappedByteBuffer
            ByteBuffer byteBuffer = writeBuffer != null ? writeBuffer.slice() : this.mappedByteBuffer.slice();
            byteBuffer.position(currentPos);
            AppendMessageResult result =
                cb.doAppend(this.getFileFromOffset(), byteBuffer, this.fileSize - currentPos, msg);
            this.wrotePosition.addAndGet(result.getWroteBytes());
            this.storeTimestamp = result.getStoreTimestamp();
            return result;
        }

        log.error("MappedFile.appendMessage return null, wrotePosition: " + currentPos + " fileSize: "
            + this.fileSize);
        return new AppendMessageResult(AppendMessageStatus.UNKNOWN_ERROR);
    }
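
The buffer selection and the slice()/position() idiom can be tried in isolation. The sketch below is an assumption-laden stand-in: AppendSketch is a hypothetical class, a heap ByteBuffer plays the role of the memory-mapped region, and a direct ByteBuffer plays the role of the pooled writeBuffer.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the append path of a mapped file. */
final class AppendSketch {
    private final ByteBuffer mappedByteBuffer; // stands in for the memory-mapped region
    private final ByteBuffer writeBuffer;      // null when the write buffer pool is disabled
    private final AtomicInteger wrotePosition = new AtomicInteger(0);
    private final int fileSize;

    AppendSketch(int fileSize, boolean useWriteBuffer) {
        this.fileSize = fileSize;
        this.mappedByteBuffer = ByteBuffer.allocate(fileSize);
        this.writeBuffer = useWriteBuffer ? ByteBuffer.allocateDirect(fileSize) : null;
    }

    /** Appends the payload; returns the bytes written, or -1 if it does not fit. */
    int append(byte[] payload) {
        int currentPos = wrotePosition.get();
        if (payload.length > fileSize - currentPos) {
            return -1;
        }
        // slice() shares the underlying memory but has its own position,
        // so the source buffer's position is never disturbed.
        ByteBuffer target = writeBuffer != null ? writeBuffer.slice() : mappedByteBuffer.slice();
        target.position(currentPos);
        target.put(payload);
        wrotePosition.addAndGet(payload.length);
        return payload.length;
    }

    int wrotePosition() {
        return wrotePosition.get();
    }

    /** Reads one byte by absolute index from the chosen buffer (for inspection only). */
    byte byteAt(boolean fromWriteBuffer, int index) {
        return (fromWriteBuffer ? writeBuffer : mappedByteBuffer).get(index);
    }
}
```

With the write buffer enabled, appended bytes show up only in writeBuffer and the mapped region stays untouched, which is why that strategy needs a separate commit step later.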

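3 . Asynchronous flush with the write buffer pool enabled: commitLogService.wakeup(). When the Broker starts, commitLogService is instantiated as a CommitRealTimeService, which runs on its own background thread, and its start() method is called during startup. Once started, the service loops: it attempts a commit, then waits for the configured interval (commitIntervalCommitLog, 200 ms by default) before the next attempt. An attempt may commit nothing if the preconditions are not met. Calling commitLogService.wakeup() wakes the commit thread immediately, so that it attempts a commit as soon as a message arrives.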

    /**
     * Thread service that commits the CommitLog in real time
     */
    class CommitRealTimeService extends FlushCommitLogService {

        ......
        @Override
        public void run() {
            CommitLog.log.info(this.getServiceName() + " service started");
            while (!this.isStopped()) {
                int interval = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getCommitIntervalCommitLog();
                int commitDataLeastPages = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getCommitCommitLogLeastPages();
                int commitDataThoroughInterval = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getCommitCommitLogThoroughInterval();

                long begin = System.currentTimeMillis();
                //If at least 200 ms have passed since the last commit, force a commit in this round
                if (begin >= (this.lastCommitTimestamp + commitDataThoroughInterval)) {
                    this.lastCommitTimestamp = begin;
                    commitDataLeastPages = 0;
                }

                try {
                    //A commit needs at least 4 pages (16 KB) in the buffer, unless 200 ms have passed since the last commit
                    boolean result = CommitLog.this.mappedFileQueue.commit(commitDataLeastPages);
                    long end = System.currentTimeMillis();
                    //result == false means data in writeBuffer was committed to the fileChannel,
                    //either because writeBuffer held at least 16 KB or because 200 ms had passed since the last commit
                    if (!result) {   
                        this.lastCommitTimestamp = end;
                        //now wake up flush thread.
                        flushCommitLogService.wakeup();
                    }

                    if (end - begin > 500) {
                        log.info("Commit data to file costs {} ms", end - begin);
                    }

                    // Wait for the next round
                    this.waitForRunning(interval);
                } catch (Throwable e) {
                    CommitLog.log.error(this.getServiceName() + " service has exception. ", e);
                }
            }
            //When the loop exits, i.e. the CommitLog is stopping, force a final commit
            boolean result = false;
            for (int i = 0; i < RETRY_TIMES_OVER && !result; i++) {
                result = CommitLog.this.mappedFileQueue.commit(0);
                CommitLog.log.info(this.getServiceName() + " service shutdown, retry " + (i + 1) + " times " + (result ? "OK" : "Not OK"));
            }
            CommitLog.log.info(this.getServiceName() + " service end");
        }
    }

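A commit moves the data in the pooled writeBuffer into the fileChannel. Either of two conditions triggers it:

  1. The number of pages buffered in writeBuffer reaches the minimum commit threshold, 4 by default, i.e. writeBuffer holds at least 4*4KB=16KB of uncommitted data.
  2. At least 200 ms have passed since the last commit. The main benefit of the write buffer pool is that several messages can be merged into a single write to the fileChannel, which improves IO performance to some extent. If little data has accumulated for a while, IO pressure is low, so that time can just as well be used to write out whatever is pending.

A commit only moves data from the buffer pool into the fileChannel; at this point the data has still not been synced to the data file on disk.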
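
The page threshold can be expressed as a small predicate. This is a sketch under assumptions: a 4 KB page size, a comparison of page indexes rather than raw byte counts, and hypothetical names (PageThresholdSketch, isAble). It is meant to illustrate the rule described above, not to reproduce MappedFile exactly.

```java
/** Hypothetical helper that decides whether enough data is pending to commit or flush. */
final class PageThresholdSketch {
    static final int OS_PAGE_SIZE = 4 * 1024; // assumed page size: 4 KB

    /**
     * @param donePos    position already committed (or flushed)
     * @param writePos   position written so far
     * @param leastPages minimum number of pending pages; 0 means "anything pending is enough"
     */
    static boolean isAble(int donePos, int writePos, int leastPages) {
        if (leastPages > 0) {
            // Compare page indexes: the write position must be at least
            // `leastPages` pages ahead of the position already handled.
            return (writePos / OS_PAGE_SIZE) - (donePos / OS_PAGE_SIZE) >= leastPages;
        }
        return writePos > donePos;
    }
}
```

With leastPages = 4, isAble(0, 16384, 4) is true while isAble(0, 16383, 4) is false. With leastPages = 0, which is what the 200 ms rule switches to, a single pending byte is enough.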

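4 . Asynchronous flush with the write buffer pool disabled (the default): flushCommitLogService.wakeup(). When the Broker starts in asynchronous mode, flushCommitLogService is instantiated as a FlushRealTimeService. Like commitLogService, it runs on its own background thread and starts looping at Broker startup, attempting a flush every 500 ms. A flush has preconditions of its own; wakeup() wakes the thread immediately so that it attempts a flush as soon as a message arrives.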

    /**
     * Thread service that flushes the CommitLog in real time
     */
    class FlushRealTimeService extends FlushCommitLogService {
        ......
        @Override
        public void run() {
            CommitLog.log.info(this.getServiceName() + " service started");

            while (!this.isStopped()) {
                boolean flushCommitLogTimed = CommitLog.this.defaultMessageStore.getMessageStoreConfig().isFlushCommitLogTimed();
                int interval = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushIntervalCommitLog();
                int flushPhysicQueueLeastPages = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushCommitLogLeastPages();
                int flushPhysicQueueThoroughInterval = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushCommitLogThoroughInterval();

                // Print flush progress
                boolean printFlushProgress = false;
                long currentTimeMillis = System.currentTimeMillis();
                // If more than 10 seconds have passed since the last forced flush, flush even if fewer than flushPhysicQueueLeastPages (4 pages, 16 KB) have been written
                if (currentTimeMillis >= (this.lastFlushTimestamp + flushPhysicQueueThoroughInterval)) {
                    this.lastFlushTimestamp = currentTimeMillis;
                    flushPhysicQueueLeastPages = 0;
                    printFlushProgress = (printTimes++ % 10) == 0;
                }

                try {
                    // Whether flushing runs on a fixed timer; false by default
                    if (flushCommitLogTimed) {
                        Thread.sleep(interval);
                    } else {  // With no wakeup(), flush once per 500 ms wait; a wakeup() after a commit ends the current wait at once or, if the thread is not waiting, skips the next wait
                        this.waitForRunning(interval);
                    }

                    if (printFlushProgress) {
                        this.printFlushProgress();
                    }

                    // flush commitLog
                    long begin = System.currentTimeMillis();
                    //A flush normally needs at least 4 pages of data, i.e. 16 KB
                    CommitLog.this.mappedFileQueue.flush(flushPhysicQueueLeastPages);
                    long storeTimestamp = CommitLog.this.mappedFileQueue.getStoreTimestamp();
                    if (storeTimestamp > 0) {
                        CommitLog.this.defaultMessageStore.getStoreCheckpoint().setPhysicMsgTimestamp(storeTimestamp);
                    }
                    long past = System.currentTimeMillis() - begin;
                    if (past > 500) {
                        log.info("Flush data to disk costs {} ms", past);
                    }
                } catch (Throwable e) {
                    CommitLog.log.warn(this.getServiceName() + " service has exception. ", e);
                    this.printFlushProgress();
                }
            }

            // Normal shutdown, to ensure that all the flush before exit
            boolean result = false;
            for (int i = 0; i < RETRY_TIMES_OVER && !result; i++) {
                result = CommitLog.this.mappedFileQueue.flush(0);
                CommitLog.log.info(this.getServiceName() + " service shutdown, retry " + (i + 1) + " times " + (result ? "OK" : "Not OK"));
            }

            this.printFlushProgress();

            CommitLog.log.info(this.getServiceName() + " service end");
        }

    }

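Whether or not the write buffer pool is enabled, in asynchronous mode the flush itself is always performed by FlushRealTimeService. After a successful commit, CommitRealTimeService calls flushCommitLogService.wakeup(), asking FlushRealTimeService to try to sync the in-memory data to disk.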
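
The wakeup()/waitForRunning() behaviour that both services rely on (end the current wait at once, or skip the next wait if the thread is busy) can be sketched with a plain monitor. WakeableWaiterSketch is a hypothetical, simplified class written for this article; it is not RocketMQ's ServiceThread.

```java
/** Hypothetical, simplified waiter: a pending wakeup makes the next wait return at once. */
final class WakeableWaiterSketch {
    private final Object monitor = new Object();
    private boolean notified = false; // guarded by monitor

    /** Called by other threads, e.g. after a message was appended or committed. */
    void wakeup() {
        synchronized (monitor) {
            notified = true;
            monitor.notifyAll();
        }
    }

    /** Returns true if a wakeup arrived (or was already pending), false on timeout. */
    boolean waitForRunning(long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        synchronized (monitor) {
            try {
                while (!notified) {
                    long remaining = deadline - System.nanoTime();
                    if (remaining <= 0) {
                        return false; // timed out without a wakeup
                    }
                    monitor.wait(remaining / 1_000_000L, (int) (remaining % 1_000_000L));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            notified = false; // consume the wakeup
            return true;
        }
    }
}
```

If wakeup() is called while the service thread is busy flushing, the flag stays set and the following waitForRunning(...) returns immediately, so the next round starts without the usual pause.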

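Whether in-memory data is actually synced to disk, i.e. flushed, depends on the following conditions.

  1. If more than 10 s have passed since the last forced flush, all other conditions are ignored and the flush goes ahead. As a result, even if the server crashes, roughly the last 10 s of data at most is lost, which improves the reliability of the message queue.
  2. Otherwise, the unflushed data must reach the configured minimum number of pages, 4 by default, i.e. at least 4*4KB=16KB of newly written data. With the write buffer enabled this means at least 16 KB appended to the fileChannel; without it, at least 16 KB appended to mappedByteBuffer.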
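
The interplay of the two conditions comes down to choosing a page threshold for each round: 0 once the thorough interval has elapsed, the configured minimum otherwise. Below is a minimal sketch of that decision; FlushDecisionSketch and leastPagesFor are hypothetical names, and the clock is passed in explicitly so the logic can be checked without sleeping.

```java
/** Hypothetical helper mirroring how the flush loop picks its page threshold each round. */
final class FlushDecisionSketch {
    private final long thoroughIntervalMs; // e.g. 10_000 for the 10 s rule
    private final int leastPages;          // e.g. 4 pages = 16 KB
    private long lastFlushTimestamp;

    FlushDecisionSketch(long startMs, long thoroughIntervalMs, int leastPages) {
        this.lastFlushTimestamp = startMs;
        this.thoroughIntervalMs = thoroughIntervalMs;
        this.leastPages = leastPages;
    }

    /** Returns 0 (flush whatever is pending) once the interval has elapsed, else the normal minimum. */
    int leastPagesFor(long nowMs) {
        if (nowMs >= lastFlushTimestamp + thoroughIntervalMs) {
            lastFlushTimestamp = nowMs; // restart the interval, as the loop above does
            return 0;
        }
        return leastPages;
    }
}
```

Starting at time 0 with a 10 s interval and a 4-page minimum, the threshold is 4 at 500 ms, drops to 0 at 10000 ms, and returns to 4 at 10500 ms.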

Summary:

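  1. Asynchronous flush has two strategies: writeBuffer+fileChannel, and mappedByteBuffer.
  2. When data is first written, the writeBuffer+fileChannel strategy writes into the pooled writeBuffer, while the other writes into mappedByteBuffer.
  3. Compared with mappedByteBuffer, the writeBuffer+fileChannel strategy adds a write buffer pool and therefore an extra commit step: when 200 ms have passed since the last commit, or the buffered data reaches the minimum commit page count, the data in writeBuffer is committed to the fileChannel.
  4. Both strategies flush under the same conditions: either the data written since the last flush reaches the minimum page count, or more than 10 s have passed.
  5. The default mappedByteBuffer strategy is recommended: memory-mapped writes are already very fast, so there is no need to enable the extra write buffer.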
