RocketMQ Storage, Part 2

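Part 1 mainly covered the files that back the store and the classes that manage them. How a message is converted into its stored form, and some of the design around that, was only partly covered.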

DefaultAppendMessageCallback, continued

The previous part covered how the message size is calculated. Once the size is known, the remaining checks can run:

            // Exceeds the maximum message
            if (msgLen > this.maxMessageSize) {
                // The message is too large
                CommitLog.log.warn("message size exceeded, msg total size: " + msgLen + ", msg body size: " + bodyLength
                    + ", maxMessageSize: " + this.maxMessageSize);
                return new AppendMessageResult(AppendMessageStatus.MESSAGE_SIZE_EXCEEDED);
            }

            // Determines whether there is sufficient free space
            if ((msgLen + END_FILE_MIN_BLANK_LENGTH) > maxBlank) {
                // The message does not fit in the buffer, so the file has to be marked as full.
                this.resetByteBuffer(this.msgStoreItemMemory, maxBlank);
                // END_FILE_MIN_BLANK_LENGTH bytes are reserved at the end of the file. A reader that finds
                // BLANK_MAGIC_CODE there knows this file is exhausted and moves on to the next one
                // 1 TOTALSIZE
                this.msgStoreItemMemory.putInt(maxBlank); // first write the length of the remaining space
                // 2 MAGICCODE
                this.msgStoreItemMemory.putInt(CommitLog.BLANK_MAGIC_CODE); // marker meaning the end of the file has been reached
                // 3 The remaining space may be any value
                // Here the length of the specially set maxBlank
                final long beginTimeMills = CommitLog.this.defaultMessageStore.now();
                byteBuffer.put(this.msgStoreItemMemory.array(), 0, maxBlank);
                return new AppendMessageResult(AppendMessageStatus.END_OF_FILE, wroteOffset, maxBlank, msgId, msgInner.getStoreTimestamp(),
                    queueOffset, CommitLog.this.defaultMessageStore.now() - beginTimeMills);
            }

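First, a message must not be too long; one that exceeds the limit is rejected as illegal. maxBlank is the remaining capacity of the mappedFile. Note that the free-space check compares the message length plus END_FILE_MIN_BLANK_LENGTH against that remaining capacity, and END_FILE_MIN_BLANK_LENGTH is 8 bytes by default. Why? Reading on: msgStoreItemMemory is reset with its limit set to maxBlank, then 4 bytes holding maxBlank are written, followed by 4 bytes holding CommitLog.BLANK_MAGIC_CODE. The message size calculation likewise starts with a 4-byte total size and a 4-byte MAGICCODE.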

    public final static int MESSAGE_MAGIC_CODE = -626843481;
    protected final static int BLANK_MAGIC_CODE = -875286124;

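Compare the two codes: one marks a message, the other marks blank space. However messages are placed into a MappedFile, the file eventually fills up or has to be closed, so how does a reader know there are no more messages in it? BLANK_MAGIC_CODE is the answer: it says that the messages stored in this file end here and that everything after it is empty. A lookup that reads a BLANK record can stop scanning this file and continue with the next one. Every full file therefore ends with a BLANK_MAGIC_CODE record, which is why 8 bytes are always kept in reserve. The returned AppendMessageResult carries the status END_OF_FILE, telling the caller that the file is full and a new MappedFile must be created. One more detail: after invoking the callback, MappedFile adds the number of bytes reported in the result to its own wrote position. So even though the message itself was not written, MappedFile still has to be told how many bytes this call consumed, which is why wroteBytes in the AppendMessageResult is set to maxBlank.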
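The end-of-file handling can be tried out in isolation. The following is a minimal sketch, not RocketMQ code: the record layout is cut down to TOTALSIZE, MAGICCODE and a body, and the class and method names (EndOfFileMarkerSketch, append, countMessages) are made up for illustration. Only the two magic codes and the 8-byte reserve come from the source above.

```java
import java.nio.ByteBuffer;

final class EndOfFileMarkerSketch {
    static final int MESSAGE_MAGIC_CODE = -626843481;   // value quoted above
    static final int BLANK_MAGIC_CODE = -875286124;     // value quoted above
    static final int END_FILE_MIN_BLANK_LENGTH = 4 + 4; // TOTALSIZE + MAGICCODE

    /**
     * Writes one simplified record, or the end-of-file marker if the record does not fit.
     * Returns the number of bytes consumed. Must not be called again once the marker is written.
     */
    static int append(ByteBuffer file, byte[] body) {
        int msgLen = 4 + 4 + body.length; // hypothetical layout: TOTALSIZE, MAGICCODE, BODY
        int maxBlank = file.remaining();
        if (msgLen + END_FILE_MIN_BLANK_LENGTH > maxBlank) {
            file.putInt(maxBlank);         // 1 TOTALSIZE: the whole remaining space
            file.putInt(BLANK_MAGIC_CODE); // 2 MAGICCODE: only blank space follows
            file.position(file.limit());   // 3 the rest may hold any value
            return maxBlank;
        }
        file.putInt(msgLen);
        file.putInt(MESSAGE_MAGIC_CODE);
        file.put(body);
        return msgLen;
    }

    /** Scans from the start and counts records until a non-message magic code is found. */
    static int countMessages(ByteBuffer file) {
        ByteBuffer view = file.duplicate();
        view.flip(); // limit = bytes written so far, position = 0
        int count = 0;
        while (view.remaining() >= END_FILE_MIN_BLANK_LENGTH) {
            int start = view.position();
            int totalSize = view.getInt();
            int magic = view.getInt();
            if (magic != MESSAGE_MAGIC_CODE) {
                break; // blank marker (or unwritten bytes): stop reading this file
            }
            count++;
            view.position(start + totalSize);
        }
        return count;
    }
}
```

Because a record is only accepted when msgLen + 8 still fits, at least 8 bytes are always left for the marker, so the reader never has to guess where the data ends.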

            // Initialization of storage space
            this.resetByteBuffer(msgStoreItemMemory, msgLen);
            // 1 TOTALSIZE
            this.msgStoreItemMemory.putInt(msgLen);
            // 2 MAGICCODE
            this.msgStoreItemMemory.putInt(CommitLog.MESSAGE_MAGIC_CODE); // marks this record as a message
            // 3 BODYCRC
            this.msgStoreItemMemory.putInt(msgInner.getBodyCRC());
            // 4 QUEUEID
            this.msgStoreItemMemory.putInt(msgInner.getQueueId());
            // 5 FLAG
            this.msgStoreItemMemory.putInt(msgInner.getFlag());
            // 6 QUEUEOFFSET
            this.msgStoreItemMemory.putLong(queueOffset);
            // 7 PHYSICALOFFSET
            this.msgStoreItemMemory.putLong(fileFromOffset + byteBuffer.position());
            // 8 SYSFLAG
            this.msgStoreItemMemory.putInt(msgInner.getSysFlag());
            // 9 BORNTIMESTAMP
            this.msgStoreItemMemory.putLong(msgInner.getBornTimestamp());
            // 10 BORNHOST
            this.resetByteBuffer(hostHolder, 8);
            this.msgStoreItemMemory.put(msgInner.getBornHostBytes(hostHolder));
            // 11 STORETIMESTAMP
            this.msgStoreItemMemory.putLong(msgInner.getStoreTimestamp());
            // 12 STOREHOSTADDRESS
            this.resetByteBuffer(hostHolder, 8);
            this.msgStoreItemMemory.put(msgInner.getStoreHostBytes(hostHolder));
            //this.msgBatchMemory.put(msgInner.getStoreHostBytes());
            // 13 RECONSUMETIMES
            this.msgStoreItemMemory.putInt(msgInner.getReconsumeTimes());
            // 14 Prepared Transaction Offset
            this.msgStoreItemMemory.putLong(msgInner.getPreparedTransactionOffset());
            // 15 BODY
            this.msgStoreItemMemory.putInt(bodyLength);
            if (bodyLength > 0)
                this.msgStoreItemMemory.put(msgInner.getBody());
            // 16 TOPIC
            this.msgStoreItemMemory.put((byte) topicLength);
            this.msgStoreItemMemory.put(topicData);
            // 17 PROPERTIES
            this.msgStoreItemMemory.putShort((short) propertiesLength);
            if (propertiesLength > 0)
                this.msgStoreItemMemory.put(propertiesData);

            final long beginTimeMills = CommitLog.this.defaultMessageStore.now();
            // Write messages to the queue buffer
            byteBuffer.put(this.msgStoreItemMemory.array(), 0, msgLen);

            AppendMessageResult result = new AppendMessageResult(AppendMessageStatus.PUT_OK, wroteOffset, msgLen, msgId,
                msgInner.getStoreTimestamp(), queueOffset, CommitLog.this.defaultMessageStore.now() - beginTimeMills);

            switch (tranType) {
                case MessageSysFlag.TRANSACTION_PREPARED_TYPE:
                case MessageSysFlag.TRANSACTION_ROLLBACK_TYPE:
                    break;
                case MessageSysFlag.TRANSACTION_NOT_TYPE:
                case MessageSysFlag.TRANSACTION_COMMIT_TYPE:
                    // The next update ConsumeQueue information
                    // For a non-transactional or committed message, advance the queue offset
                    CommitLog.this.topicQueueTable.put(key, ++queueOffset);
                    break;
                default:
                    break;
            }
            return result;

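Next the message is serialized into bytes: the fields are written into msgStoreItemMemory in a fixed order, and msgStoreItemMemory is finally copied into byteBuffer. Note the queueOffset parameter, written as the sixth field. When the message is non-transactional or a transaction commit, ++queueOffset is stored back into topicQueueTable, so queueOffset increases by one per message. What is it for?
That covers the append method of DefaultAppendMessageCallback in broad strokes. This article only looks at appending a single message; a batch variant also exists and works on much the same principle.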
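The queueOffset bookkeeping can be sketched on its own. This is an illustration under assumptions, not the CommitLog implementation: the key format topic + "-" + queueId, the TranType enum, and the class name are hypothetical, and the sketch simply returns whatever counter value it read.

```java
import java.util.HashMap;
import java.util.Map;

final class QueueOffsetSketch {
    enum TranType { NOT, PREPARED, COMMIT, ROLLBACK } // simplified stand-in for MessageSysFlag

    private final Map<String, Long> topicQueueTable = new HashMap<>();

    /** Returns the current queue offset for the queue and advances it only for visible messages. */
    long assign(String topic, int queueId, TranType type) {
        String key = topic + "-" + queueId; // assumed key format
        long queueOffset = topicQueueTable.getOrDefault(key, 0L);
        if (type == TranType.NOT || type == TranType.COMMIT) {
            // Only non-transactional and committed messages take a slot in the queue.
            topicQueueTable.put(key, queueOffset + 1);
        }
        return queueOffset;
    }
}
```

Each topic and queue pair gets its own counter, so the offsets within one queue are dense: 0, 1, 2, and so on.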
Back to the rest of the putMessage method in CommitLog:

        if (null != unlockMappedFile && this.defaultMessageStore.getMessageStoreConfig().isWarmMapedFileEnable()) {
            this.defaultMessageStore.unlockMappedFile(unlockMappedFile);
        }

        PutMessageResult putMessageResult = new PutMessageResult(PutMessageStatus.PUT_OK, result);

        // Statistics
        storeStatsService.getSinglePutMessageTopicTimesTotal(msg.getTopic()).incrementAndGet();
        storeStatsService.getSinglePutMessageTopicSizeTotal(topic).addAndGet(result.getWroteBytes());

        handleDiskFlush(result, putMessageResult, msg);
        handleHA(result, putMessageResult, msg);

        return putMessageResult;

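This code first calls unlockMappedFile on the previous file. That releases the memory lock taken on the file during warm-up rather than the mapping itself; the file is full, so from now on it is only read. Then the statistics are updated: the number of messages put for the topic and their total size.
Next comes the disk flush. The broker supports both synchronous and asynchronous flushing. Synchronous flushing ensures that acknowledged messages are not lost, at a considerable cost in performance; with asynchronous flushing, messages can be lost. Let's see how each is implemented.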

    public void handleDiskFlush(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
        // Synchronization flush
        if (FlushDiskType.SYNC_FLUSH == this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
            final GroupCommitService service = (GroupCommitService) this.flushCommitLogService;
            if (messageExt.isWaitStoreMsgOK()) {
                GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
                service.putRequest(request);
                boolean flushOK = request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
                if (!flushOK) {
                    log.error("do groupcommit, wait for flush failed, topic: " + messageExt.getTopic() + " tags: " + messageExt.getTags()
                        + " client address: " + messageExt.getBornHostString());
                    putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_DISK_TIMEOUT);
                }
            } else {
                service.wakeup();
            }
        }
        // Asynchronous flush
        else {
            if (!this.defaultMessageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
                flushCommitLogService.wakeup();
            } else {
                commitLogService.wakeup();
            }
        }
    }

Start with the asynchronous mode. It only calls wakeup(), a ServiceThread method, to wake the flush thread. Take a look at FlushRealTimeService, the real-time flush service.
Inside its thread loop:

                try {
                    if (flushCommitLogTimed) {
                        Thread.sleep(interval);
                    } else {
                        this.waitForRunning(interval);
                    }

                    if (printFlushProgress) {
                        this.printFlushProgress();
                    }

                    long begin = System.currentTimeMillis();
                    CommitLog.this.mappedFileQueue.flush(flushPhysicQueueLeastPages);
                    long storeTimestamp = CommitLog.this.mappedFileQueue.getStoreTimestamp();
                    if (storeTimestamp > 0) {
                        // update the physical message timestamp
                        CommitLog.this.defaultMessageStore.getStoreCheckpoint().setPhysicMsgTimestamp(storeTimestamp);
                    }
                    long past = System.currentTimeMillis() - begin;
                    if (past > 500) {
                        log.info("Flush data to disk costs {} ms", past);
                    }
                } catch (Throwable e) {
                    CommitLog.log.warn(this.getServiceName() + " service has exception. ", e);
                    this.printFlushProgress();
                }

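Before each flush the thread waits or sleeps for a while, then calls flush on mappedFileQueue and updates the stored-message timestamp in StoreCheckpoint. Asynchronous flushing is simple: any other thread can wake the flush thread to trigger a flush.
In synchronous mode a GroupCommitRequest is created with its nextOffset field set to the message's start offset plus the number of bytes written. The request is added to the request list inside GroupCommitService. The request also holds a CountDownLatch, so the call to request.waitForFlush() blocks until the flush completes or the timeout expires.
A fragment of GroupCommitService: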

        private volatile List<GroupCommitRequest> requestsWrite = new ArrayList<GroupCommitRequest>();
        private volatile List<GroupCommitRequest> requestsRead = new ArrayList<GroupCommitRequest>();

        public synchronized void putRequest(final GroupCommitRequest request) {
            synchronized (this.requestsWrite) {
                this.requestsWrite.add(request);
            }
            if (hasNotified.compareAndSet(false, true)) {
                waitPoint.countDown(); // notify
            }
        }

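Two lists are defined, one for writing requests and one for reading them. putRequest adds the request to requestsWrite while holding the lock on requestsWrite, then wakes the ServiceThread. When the thread wakes up, onWaitEnd is called, and GroupCommitService implements it by calling swapRequests():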

        private void swapRequests() {
            List<GroupCommitRequest> tmp = this.requestsWrite;
            this.requestsWrite = this.requestsRead;
            this.requestsRead = tmp;
        }

This simply swaps the read and write lists. Once awake, the thread calls doCommit:

        private void doCommit() {
            synchronized (this.requestsRead) {
                if (!this.requestsRead.isEmpty()) {
                    for (GroupCommitRequest req : this.requestsRead) {
                        // There may be a message in the next file, so a maximum of
                        // two times the flush
                        boolean flushOK = false;
                        for (int i = 0; i < 2 && !flushOK; i++) {
                            flushOK = CommitLog.this.mappedFileQueue.getFlushedWhere() >= req.getNextOffset();

                            if (!flushOK) {
                                CommitLog.this.mappedFileQueue.flush(0);
                            }
                        }

                        req.wakeupCustomer(flushOK);
                    }

                    long storeTimestamp = CommitLog.this.mappedFileQueue.getStoreTimestamp();
                    if (storeTimestamp > 0) {
                        CommitLog.this.defaultMessageStore.getStoreCheckpoint().setPhysicMsgTimestamp(storeTimestamp);
                    }

                    this.requestsRead.clear();
                } else {
                    // Because of individual messages is set to not sync flush, it
                    // will come to this process
                    CommitLog.this.mappedFileQueue.flush(0);
                }
            }
        }

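The read list is locked first. Each request is then visited in turn, comparing the mappedFileQueue's flushedWhere with the request's nextOffset. If flushedWhere is greater than or equal to nextOffset, the data is already on disk and the thread waiting on the request is woken immediately; otherwise mappedFileQueue.flush is called. The check runs at most twice, because the message may extend into the next file. That is how synchronous flushing is implemented, and its performance is indeed not great.
After the flush comes handleHA(), which handles high availability. How it works will be covered in a later article on the Master/Slave feature.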
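The interplay between the waiting producer and the flush thread can be sketched as follows. This is a simplified model under stated assumptions, not the GroupCommitService code: a single lock object guards the swap (the original synchronizes on the list objects), two AtomicLong counters stand in for the mapped file queue, and the names GroupCommitSketch, append and flushOnce are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class GroupCommitSketch {
    static final class Request {
        final long nextOffset; // start offset + bytes written
        private final CountDownLatch latch = new CountDownLatch(1);
        private volatile boolean flushOk;

        Request(long nextOffset) { this.nextOffset = nextOffset; }

        void wakeup(boolean ok) { this.flushOk = ok; latch.countDown(); }

        /** Blocks until the flush thread answers or the timeout expires. */
        boolean waitForFlush(long timeoutMillis) {
            try {
                return latch.await(timeoutMillis, TimeUnit.MILLISECONDS) && flushOk;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    private final Object lock = new Object();
    private List<Request> requestsWrite = new ArrayList<>();
    private List<Request> requestsRead = new ArrayList<>();
    private final AtomicLong written = new AtomicLong(); // bytes appended so far
    private final AtomicLong flushed = new AtomicLong(); // stand-in for getFlushedWhere()

    void append(long bytes) { written.addAndGet(bytes); }

    long flushedWhere() { return flushed.get(); }

    void putRequest(Request request) {
        synchronized (lock) { requestsWrite.add(request); }
    }

    /** One pass of the flush thread; must only be called from that single thread. */
    void flushOnce() {
        synchronized (lock) { // swap, so producers keep adding while this pass flushes
            List<Request> tmp = requestsWrite;
            requestsWrite = requestsRead;
            requestsRead = tmp;
        }
        for (Request req : requestsRead) {
            boolean ok = flushed.get() >= req.nextOffset;
            if (!ok) {
                flushed.set(written.get()); // stand-in for mappedFileQueue.flush(0)
                ok = flushed.get() >= req.nextOffset;
            }
            req.wakeup(ok);
        }
        requestsRead.clear();
    }
}
```

The swap is what keeps producers from waiting on the disk: they only contend for the short critical section that adds to the write list, while the slow flush works on the list it has just taken over.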

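A few questions to consider:
1. The messages stored in the commitLog belong to all kinds of topics and include delayed, transactional, and ordinary messages. How are they told apart for consumption?
2. Since the commitLog holds every message, a poorly designed lookup would be very inefficient and would slow consumption down.
3. There are also special requirements, such as searching messages by keyword or time range. These need a well-designed mechanism to make lookups efficient and keep consumption fast.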

ReputMessageService

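This class is an inner class of DefaultMessageStore. It extends ServiceThread, so it is also a thread class. It has a single field, reputFromOffset, which can be read as the replay offset. Since it is a thread, look at its run method: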

        public void run() {
            DefaultMessageStore.log.info(this.getServiceName() + " service started");

            while (!this.isStopped()) {
                try {
                    Thread.sleep(1);
                    this.doReput();
                } catch (Exception e) {
                    DefaultMessageStore.log.warn(this.getServiceName() + " service has exception. ", e);
                }
            }

            DefaultMessageStore.log.info(this.getServiceName() + " service end");
        }

It is another self-looping thread that keeps calling doReput().

        private void doReput() {
            if (this.reputFromOffset < DefaultMessageStore.this.commitLog.getMinOffset()) {
                log.warn("The reputFromOffset={} is smaller than minPyOffset={}, this usually indicate that the dispatch behind too much and the commitlog has expired.",
                    this.reputFromOffset, DefaultMessageStore.this.commitLog.getMinOffset());
                this.reputFromOffset = DefaultMessageStore.this.commitLog.getMinOffset();
            }
            for (boolean doNext = true; this.isCommitLogAvailable() && doNext; ) {

                if (DefaultMessageStore.this.getMessageStoreConfig().isDuplicationEnable()
                    && this.reputFromOffset >= DefaultMessageStore.this.getConfirmOffset()) {
                    break;
                }

                SelectMappedBufferResult result = DefaultMessageStore.this.commitLog.getData(reputFromOffset);
                if (result != null) {
                    try {
                        this.reputFromOffset = result.getStartOffset();

                        for (int readSize = 0; readSize < result.getSize() && doNext; ) {
                            DispatchRequest dispatchRequest =
                                DefaultMessageStore.this.commitLog.checkMessageAndReturnSize(result.getByteBuffer(), false, false);
                            int size = dispatchRequest.getBufferSize() == -1 ? dispatchRequest.getMsgSize() : dispatchRequest.getBufferSize();

                            if (dispatchRequest.isSuccess()) {
                                if (size > 0) {
                                    DefaultMessageStore.this.doDispatch(dispatchRequest);

                                    if (BrokerRole.SLAVE != DefaultMessageStore.this.getMessageStoreConfig().getBrokerRole()
                                        && DefaultMessageStore.this.brokerConfig.isLongPollingEnable()) {
                                        DefaultMessageStore.this.messageArrivingListener.arriving(dispatchRequest.getTopic(),
                                            dispatchRequest.getQueueId(), dispatchRequest.getConsumeQueueOffset() + 1,
                                            dispatchRequest.getTagsCode(), dispatchRequest.getStoreTimestamp(),
                                            dispatchRequest.getBitMap(), dispatchRequest.getPropertiesMap());
                                    }

                                    this.reputFromOffset += size;
                                    readSize += size;
                                    if (DefaultMessageStore.this.getMessageStoreConfig().getBrokerRole() == BrokerRole.SLAVE) {
                                        DefaultMessageStore.this.storeStatsService
                                            .getSinglePutMessageTopicTimesTotal(dispatchRequest.getTopic()).incrementAndGet();
                                        DefaultMessageStore.this.storeStatsService
                                            .getSinglePutMessageTopicSizeTotal(dispatchRequest.getTopic())
                                            .addAndGet(dispatchRequest.getMsgSize());
                                    }
                                } else if (size == 0) {
                                    // reached the end of the file, move on to the next one
                                    this.reputFromOffset = DefaultMessageStore.this.commitLog.rollNextFile(this.reputFromOffset);
                                    readSize = result.getSize();
                                }
                            } else if (!dispatchRequest.isSuccess()) {

                                if (size > 0) {
                                    log.error("[BUG]read total count not equals msg total size. reputFromOffset={}", reputFromOffset);
                                    this.reputFromOffset += size;
                                } else {
                                    doNext = false;
                                    // If user open the dledger pattern or the broker is master node,
                                    // it will not ignore the exception and fix the reputFromOffset variable
                                    if (DefaultMessageStore.this.getMessageStoreConfig().isEnableDLegerCommitLog() ||
                                        DefaultMessageStore.this.brokerConfig.getBrokerId() == MixAll.MASTER_ID) {
                                        log.error("[BUG]dispatch message to consume queue error, COMMITLOG OFFSET: {}",
                                            this.reputFromOffset);
                                        this.reputFromOffset += result.getSize() - readSize;
                                    }
                                }
                            }
                        }
                    } finally {
                        result.release();
                    }
                } else {
                    doNext = false;
                }
            }
        }
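In the size == 0 branch above, rollNextFile has to turn an offset somewhere inside one file into the start offset of the next file. Its body is not shown in this article, so the following is only a sketch of the arithmetic that achieves this, assuming fixed-size files whose start offsets are multiples of the file size; the class name and the parameter mappedFileSize are chosen for illustration.

```java
final class RollNextFileSketch {
    /** Returns the first offset of the file that follows the one containing {@code offset}. */
    static long rollNextFile(long offset, int mappedFileSize) {
        // Drop the position inside the current file, then step one whole file forward.
        return offset + mappedFileSize - offset % mappedFileSize;
    }

    public static void main(String[] args) {
        // With 1024-byte files, any offset in [0, 1023] rolls over to 1024.
        System.out.println(rollNextFile(1000, 1024));
    }
}
```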

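Starting from reputFromOffset, the method takes the remaining message data from the commitLog's MappedFile as a SelectMappedBufferResult. As mentioned in the MappedFile discussion, a SelectMappedBufferResult may hold many messages rather than just one, because the slice runs from reputFromOffset to the MappedFile's wrotePosition. Once the SelectMappedBufferResult is available, the loop iterates over it. The ByteBuffer in result is read sequentially, so its internal position advances with each read and never needs resetting. CommitLog.checkMessageAndReturnSize then extracts the key facts about one message: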

    public class DispatchRequest {
        private final String topic;
        private final int queueId;
        private final long commitLogOffset;
        private int msgSize;
        private final long tagsCode;
        private final long storeTimestamp;
        private final long consumeQueueOffset;
        private final String keys;
        private final boolean success;
        private final String uniqKey;

        private final int sysFlag;
        private final long preparedTransactionOffset;
        private final Map<String, String> propertiesMap;
        private byte[] bitMap;

        private int bufferSize = -1;//the buffer size maybe larger than the msg size if the message is wrapped by something
        // .....
    }

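The result contains only key attributes such as topic, queueId, and msgSize; what they are used for comes next. The dispatchRequest can come back in different states, for example when the read hits BLANK_MAGIC_CODE. When dispatchRequest reports success, the read was normal. If size is greater than 0, there is a message. If size is 0, the end of the file was reached and reading has to continue in the next file, so commitLog.rollNextFile(reputFromOffset) moves to the start offset of the next file. When there is a message, doDispatch(request) is called first to dispatch it; then, if the conditions hold, the message-arriving listener is notified; reputFromOffset is advanced by the message size; and finally some statistics are updated. The real work of the replay thread happens in doDispatch():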

    public void doDispatch(DispatchRequest req) {
        // The message is on disk; follow-up work remains, e.g. building the index and adding it to the consume queue
        for (CommitLogDispatcher dispatcher : this.dispatcherList) {
            dispatcher.dispatch(req);
        }
    }

CommitLogDispatcher is the dispatch interface; doDispatch just iterates over its implementations. Which implementations are there?

CommitLogDispatcherBuildConsumeQueue: building the consume queue

        @Override
        public void dispatch(DispatchRequest request) {
            final int tranType = MessageSysFlag.getTransactionValue(request.getSysFlag());
            switch (tranType) {
                case MessageSysFlag.TRANSACTION_NOT_TYPE:
                case MessageSysFlag.TRANSACTION_COMMIT_TYPE:
                    DefaultMessageStore.this.putMessagePositionInfo(request);
                    break;
                case MessageSysFlag.TRANSACTION_PREPARED_TYPE:
                case MessageSysFlag.TRANSACTION_ROLLBACK_TYPE:
                    break;
            }
        }

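The transaction type is checked first. For a non-transactional message or a transaction commit, putMessagePositionInfo is called; for the other transaction types nothing is done.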

    public void putMessagePositionInfo(DispatchRequest dispatchRequest) {
        // find the consume queue, then update it
        ConsumeQueue cq = this.findConsumeQueue(dispatchRequest.getTopic(), dispatchRequest.getQueueId());
        cq.putMessagePositionInfoWrapper(dispatchRequest);
    }

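This is where ConsumeQueue comes in. At produce time the queue number under the topic has already been chosen, so a consumer group likewise needs to know which consume queue it is reading. In general one producer-side queue corresponds to one consume queue, unless read/write permissions restrict it.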

    public ConsumeQueue findConsumeQueue(String topic, int queueId) {
        ConcurrentMap<Integer, ConsumeQueue> map = consumeQueueTable.get(topic);
        if (null == map) {
            ConcurrentMap<Integer, ConsumeQueue> newMap = new ConcurrentHashMap<Integer, ConsumeQueue>(128);
            ConcurrentMap<Integer, ConsumeQueue> oldMap = consumeQueueTable.putIfAbsent(topic, newMap);
            if (oldMap != null) {
                map = oldMap;
            } else {
                map = newMap;
            }
        }

        ConsumeQueue logic = map.get(queueId);
        if (null == logic) {
            ConsumeQueue newLogic = new ConsumeQueue(
                topic,
                queueId,
                StorePathConfigHelper.getStorePathConsumeQueue(this.messageStoreConfig.getStorePathRootDir()),
                this.getMessageStoreConfig().getMappedFileSizeConsumeQueue(),
                this);
            ConsumeQueue oldLogic = map.putIfAbsent(queueId, newLogic);
            if (oldLogic != null) {
                logic = oldLogic;
            } else {
                logic = newLogic;
            }
        }

        return logic;
    }
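The lookup above applies the same get-then-putIfAbsent idiom twice, once per map level, so that two threads racing to create the same queue end up sharing one instance. On Java 8 and later the same behaviour can be expressed with computeIfAbsent. The sketch below shows only that idiom; the Queue class is a hypothetical stand-in for ConsumeQueue, and nothing here is taken from DefaultMessageStore.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ConsumeQueueTableSketch {
    /** Hypothetical stand-in for ConsumeQueue. */
    static final class Queue {
        final String topic;
        final int queueId;

        Queue(String topic, int queueId) {
            this.topic = topic;
            this.queueId = queueId;
        }
    }

    // topic -> (queueId -> queue)
    private final ConcurrentMap<String, ConcurrentMap<Integer, Queue>> table = new ConcurrentHashMap<>();

    /** Returns the queue for (topic, queueId), creating it atomically on first use. */
    Queue find(String topic, int queueId) {
        return table
            .computeIfAbsent(topic, t -> new ConcurrentHashMap<>(128))
            .computeIfAbsent(queueId, id -> new Queue(topic, id));
    }
}
```

One difference worth knowing: with putIfAbsent a losing thread may construct an object that is then thrown away, whereas ConcurrentHashMap.computeIfAbsent runs the factory at most once per key.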

On the lookup: within one broker a topic name is unique, but a topic can have several ConsumeQueues, each identified by a queueId. Let's get familiar with the ConsumeQueue's fields:

    public ConsumeQueue(
        final String topic,
        final int queueId,
        final String storePath,
        final int mappedFileSize,
        final DefaultMessageStore defaultMessageStore) {
        this.storePath = storePath;
        this.mappedFileSize = mappedFileSize;
        this.defaultMessageStore = defaultMessageStore;

        this.topic = topic;
        this.queueId = queueId;

        String queueDir = this.storePath
            + File.separator + topic
            + File.separator + queueId;

        this.mappedFileQueue = new MappedFileQueue(queueDir, mappedFileSize, null);

        this.byteBufferIndex = ByteBuffer.allocate(CQ_STORE_UNIT_SIZE);

        if (defaultMessageStore.getMessageStoreConfig().isEnableConsumeQueueExt()) {
            this.consumeQueueExt = new ConsumeQueueExt(
                topic,
                queueId,
                StorePathConfigHelper.getStorePathConsumeQueueExt(defaultMessageStore.getMessageStoreConfig().getStorePathRootDir()),
                defaultMessageStore.getMessageStoreConfig().getMappedFileSizeConsumeQueueExt(),
                defaultMessageStore.getMessageStoreConfig().getBitMapLengthConsumeQueueExt()
            );
        }
    }

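This is the ConsumeQueue constructor. It takes topic and queueId and also creates a MappedFileQueue, which tells us that a consume queue stores data too, just not the same content as the CommitLog. It also sets the file size mappedFileSize, the storage root path, and so on.
    this.byteBufferIndex = ByteBuffer.allocate(CQ_STORE_UNIT_SIZE);
This line allocates CQ_STORE_UNIT_SIZE = 20 bytes. Why 20? More on that below. With the ConsumeQueue found by topic and queueId, cq.putMessagePositionInfoWrapper is called next.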

    public void putMessagePositionInfoWrapper(DispatchRequest request) {
        final int maxRetries = 30;
        boolean canWrite = this.defaultMessageStore.getRunningFlags().isCQWriteable();
        for (int i = 0; i < maxRetries && canWrite; i++) {
            long tagsCode = request.getTagsCode();
            if (isExtWriteEnable()) {
                // ...
            }
            boolean result = this.putMessagePositionInfo(request.getCommitLogOffset(),
                request.getMsgSize(), tagsCode, request.getConsumeQueueOffset());
            if (result) {
                this.defaultMessageStore.getStoreCheckpoint().setLogicsMsgTimestamp(request.getStoreTimestamp());
                return;
            } else {
                // XXX: warn and notify me
                log.warn("[BUG]put commit log position info to " + topic + ":" + queueId + " " + request.getCommitLogOffset()
                    + " failed, retry " + i + " times");

                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    log.warn("", e);
                }
            }
        }

        // XXX: warn and notify me
        log.error("[BUG]consume queue can not write, {} {}", this.topic, this.queueId);
        this.defaultMessageStore.getRunningFlags().makeLogicsQueueError();
    }

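putMessagePositionInfo is called, and on success the logicsMsgTimestamp in StoreCheckpoint is updated. On failure the call is retried, up to 30 times with a one-second pause between attempts.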

    private boolean putMessagePositionInfo(final long offset, final int size, final long tagsCode,
        final long cqOffset) {

        if (offset + size <= this.maxPhysicOffset) {
            log.warn("Maybe try to build consume queue repeatedly maxPhysicOffset={} phyOffset={}", maxPhysicOffset, offset);
            return true;
        }

        this.byteBufferIndex.flip(); // sets limit = position, then resets position to 0
        this.byteBufferIndex.limit(CQ_STORE_UNIT_SIZE);
        this.byteBufferIndex.putLong(offset);
        this.byteBufferIndex.putInt(size);
        this.byteBufferIndex.putLong(tagsCode);

        // cqOffset is a sequence number; the real byte offset is that number times CQ_STORE_UNIT_SIZE
        final long expectLogicOffset = cqOffset * CQ_STORE_UNIT_SIZE;

        MappedFile mappedFile = this.mappedFileQueue.getLastMappedFile(expectLogicOffset);
        if (mappedFile != null) {

            if (mappedFile.isFirstCreateInQueue() && cqOffset != 0 && mappedFile.getWrotePosition() == 0) {
                // First file in the queue, non-zero consume queue offset, nothing written yet: reset the flush and commit offsets
                this.minLogicOffset = expectLogicOffset;
                this.mappedFileQueue.setFlushedWhere(expectLogicOffset);
                this.mappedFileQueue.setCommittedWhere(expectLogicOffset);
                // and fill the space before that offset with blank entries
                this.fillPreBlank(mappedFile, expectLogicOffset);
                log.info("fill pre blank space " + mappedFile.getFileName() + " " + expectLogicOffset + " "
                    + mappedFile.getWrotePosition());
            }

            if (cqOffset != 0) {
                // check that expectLogicOffset matches the position that will actually be written
                long currentLogicOffset = mappedFile.getWrotePosition() + mappedFile.getFileFromOffset();

                if (expectLogicOffset < currentLogicOffset) {
                    log.warn("Build  consume queue repeatedly, expectLogicOffset: {} currentLogicOffset: {} Topic: {} QID: {} Diff: {}",
                        expectLogicOffset, currentLogicOffset, this.topic, this.queueId, expectLogicOffset - currentLogicOffset);
                    return true;
                }

                if (expectLogicOffset != currentLogicOffset) {
                    LOG_ERROR.warn(
                        "[BUG]logic queue order maybe wrong, expectLogicOffset: {} currentLogicOffset: {} Topic: {} QID: {} Diff: {}",
                        expectLogicOffset,
                        currentLogicOffset,
                        this.topic,
                        this.queueId,
                        expectLogicOffset - currentLogicOffset
                    );
                }
            }
            this.maxPhysicOffset = offset + size;
            // append the byte array to the mappedFile
            return mappedFile.appendMessage(this.byteBufferIndex.array());
        }
        return false;
    }

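putMessagePositionInfo is the real core. Its parameters are the physical offset offset, the message size size, the tags code tagsCode, and the logical sequence number cqOffset (when a message is stored, a sequence offset is looked up by topic in CommitLog's topicQueueTable, and it is incremented by 1 after a successful store). byteBufferIndex, as noted, has a fixed length of 20 bytes. What goes into it? The 8-byte physical offset, the 4-byte message size, and the 8-byte tagsCode, exactly 20 bytes. So when a message is turned into a ConsumeQueue entry, only three values are stored, in a fixed 20 bytes. expectLogicOffset is where the byteBufferIndex content starts, and the MappedFile is obtained from the MappedFileQueue. Now check the size of one MappedFile in a ConsumeQueue: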

    public int getMappedFileSizeConsumeQueue() {

        int factor = (int) Math.ceil(this.mappedFileSizeConsumeQueue / (ConsumeQueue.CQ_STORE_UNIT_SIZE * 1.0));
        return (int) (factor * ConsumeQueue.CQ_STORE_UNIT_SIZE);
    }

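It is a multiple of 20, so one file holds exactly factor complete entries, and no end-of-file marker like the one in the message store is needed. Back to storing the consume entry: once a suitable MappedFile is found, the code checks whether it is the first file created in the queue with nothing written yet while cqOffset is non-zero. Such a file needs initializing: the minimum logical offset minLogicOffset is set, the flush and commit positions are updated, and fillPreBlank fills the leading space. The rest is offset validation, after which maxPhysicOffset (the largest physical offset recorded for this queue) is updated and the byteBufferIndex byte array is appended to the mappedFile.
So what characterizes ConsumeQueue?
1. Its entry format is fixed at 20 bytes.
2. It gives an indirect view of how a topic's messages are stored.
3. The commitLog stores every message, while the ConsumeQueue keeps a compact record per topic and queueId, which makes later locating and lookup easier.
4. System-defined topics, such as delay topics or retry topics, can be managed and allocated separately.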
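The fixed entry size is what makes the ConsumeQueue cheap to address: entry number n always lives at byte n * 20. A small sketch of the encoding and of the two calculations discussed above follows. The class and method names are invented for illustration; only the field order, CQ_STORE_UNIT_SIZE = 20 and the rounding formula come from the code shown in this article.

```java
import java.nio.ByteBuffer;

final class ConsumeQueueEntrySketch {
    static final int CQ_STORE_UNIT_SIZE = 20; // 8 (offset) + 4 (size) + 8 (tagsCode)

    /** Serializes one entry in the order used by putMessagePositionInfo. */
    static byte[] encode(long commitLogOffset, int size, long tagsCode) {
        ByteBuffer buf = ByteBuffer.allocate(CQ_STORE_UNIT_SIZE);
        buf.putLong(commitLogOffset);
        buf.putInt(size);
        buf.putLong(tagsCode);
        return buf.array();
    }

    /** cqOffset is an entry number; its byte position is a multiple of the unit size. */
    static long logicOffset(long cqOffset) {
        return cqOffset * CQ_STORE_UNIT_SIZE;
    }

    /** Rounds a configured file size up to a whole number of entries. */
    static int alignedFileSize(int configuredSize) {
        int factor = (int) Math.ceil(configuredSize / (CQ_STORE_UNIT_SIZE * 1.0));
        return factor * CQ_STORE_UNIT_SIZE;
    }
}
```

For example, a configured size of 30 bytes would be rounded up to 40, so that the file ends exactly on an entry boundary.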

CommitLogDispatcherBuildIndex: the index-building dispatcher

    class CommitLogDispatcherBuildIndex implements CommitLogDispatcher {

        @Override
        public void dispatch(DispatchRequest request) {
            if (DefaultMessageStore.this.messageStoreConfig.isMessageIndexEnable()) {
                DefaultMessageStore.this.indexService.buildIndex(request);
            }
        }
    }

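To make message lookup easier, an index is introduced, much as in database design, so that a message's position can be located quickly. In RocketMQ the index is stored in a structure that combines an array with linked lists, similar to HashMap. Indexes are managed through the IndexService.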

    public void buildIndex(DispatchRequest req) {
        IndexFile indexFile = retryGetAndCreateIndexFile();
        if (indexFile != null) {
            long endPhyOffset = indexFile.getEndPhyOffset();
            DispatchRequest msg = req;
            String topic = msg.getTopic();
            String keys = msg.getKeys();
            if (msg.getCommitLogOffset() < endPhyOffset) {
                return;
            }

            final int tranType = MessageSysFlag.getTransactionValue(msg.getSysFlag());
            switch (tranType) {
                case MessageSysFlag.TRANSACTION_NOT_TYPE:
                case MessageSysFlag.TRANSACTION_PREPARED_TYPE:
                case MessageSysFlag.TRANSACTION_COMMIT_TYPE:
                    break;
                case MessageSysFlag.TRANSACTION_ROLLBACK_TYPE:
                    return;
            }

            if (req.getUniqKey() != null) {
                indexFile = putKey(indexFile, msg, buildKey(topic, req.getUniqKey()));
                if (indexFile == null) {
                    log.error("putKey error commitlog {} uniqkey {}", req.getCommitLogOffset(), req.getUniqKey());
                    return;
                }
            }

            if (keys != null && keys.length() > 0) {
                String[] keyset = keys.split(MessageConst.KEY_SEPARATOR);
                for (int i = 0; i < keyset.length; i++) {
                    String key = keyset[i];
                    if (key.length() > 0) {
                        indexFile = putKey(indexFile, msg, buildKey(topic, key));
                        if (indexFile == null) {
                            log.error("putKey error commitlog {} uniqkey {}", req.getCommitLogOffset(), req.getUniqKey());
                            return;
                        }
                    }
                }
            }
        } else {
            log.error("build index error, stop building index");
        }
    }
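The key handling in buildIndex can be isolated into a few lines. In this sketch the separator value (a single space) and the topic + "#" + key format are assumptions, since neither MessageConst.KEY_SEPARATOR nor buildKey is shown in this article; the class and method names are likewise invented.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexKeySketch {
    static final String KEY_SEPARATOR = " "; // assumed value of MessageConst.KEY_SEPARATOR

    /** Assumed key format: the topic scopes the key so equal keys in different topics do not collide. */
    static String buildKey(String topic, String key) {
        return topic + "#" + key;
    }

    /** Lists every index key one message would be stored under. */
    static List<String> indexKeys(String topic, String uniqKey, String keys) {
        List<String> result = new ArrayList<>();
        if (uniqKey != null) {
            result.add(buildKey(topic, uniqKey));
        }
        if (keys != null && keys.length() > 0) {
            for (String key : keys.split(KEY_SEPARATOR)) {
                if (key.length() > 0) { // skip empty parts from repeated separators
                    result.add(buildKey(topic, key));
                }
            }
        }
        return result;
    }
}
```

A message with a unique key and two user keys therefore produces three index entries, all pointing at the same commit log offset.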

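This fragment shows the structure and content of what gets indexed. It first obtains the IndexFile, then stores entries built from the topic, keys, and uniqKey of the DispatchRequest. keys in particular holds several keywords; it is split into individual keys, and each is combined with the topic to form the final key that is stored. So one message can be indexed under several keywords as well as under one unique key. How does IndexFile store the index content?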

    public IndexFile(final String fileName, final int hashSlotNum, final int indexNum,
        final long endPhyOffset, final long endTimestamp) throws IOException {
        // Maximum size in bytes of one index file: header (40) + hash slots + index entries
        int fileTotalSize =
            IndexHeader.INDEX_HEADER_SIZE + (hashSlotNum * hashSlotSize) + (indexNum * indexSize);
        this.mappedFile = new MappedFile(fileName, fileTotalSize);
        this.fileChannel = this.mappedFile.getFileChannel();
        this.mappedByteBuffer = this.mappedFile.getMappedByteBuffer();
        this.hashSlotNum = hashSlotNum;
        this.indexNum = indexNum;

        ByteBuffer byteBuffer = this.mappedByteBuffer.slice();
        this.indexHeader = new IndexHeader(byteBuffer);

        // On creation, seed the header with the starting offset and timestamp
        if (endPhyOffset > 0) {
            this.indexHeader.setBeginPhyOffset(endPhyOffset);
            this.indexHeader.setEndPhyOffset(endPhyOffset);
        }

        if (endTimestamp > 0) {
            this.indexHeader.setBeginTimestamp(endTimestamp);
            this.indexHeader.setEndTimestamp(endTimestamp);
        }
    }

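The constructor does three things:
1. It computes the file size fileTotalSize from its components: INDEX_HEADER_SIZE, the hash slot region, and the index entry region. This also shows that an index file has three parts: the header, the hash slots, and the index entries.
2. It records the number of hash slots and the maximum number of index entries.
3. It creates the IndexHeader, which holds the header data.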
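The size formula can be checked with a short calculation. This is only a sketch: the 4-byte slot and 20-byte entry sizes come from the putKey walkthrough later in the article, while the slot and entry counts used in `main` (5,000,000 and 20,000,000) are assumed defaults rather than values taken from the text. The sketch uses `long` arithmetic to stay safe for larger counts; the original constructor uses `int`.

```java
final class IndexFileSizeSketch {
    static final int INDEX_HEADER_SIZE = 40; // header size stated in the article
    static final int HASH_SLOT_SIZE = 4;     // one int per hash slot
    static final int INDEX_SIZE = 20;        // 4 + 8 + 4 + 4 bytes per index entry

    /** Total file size: header, then the slot region, then the entry region. */
    static long fileTotalSize(int hashSlotNum, int indexNum) {
        return INDEX_HEADER_SIZE
            + (long) hashSlotNum * HASH_SLOT_SIZE
            + (long) indexNum * INDEX_SIZE;
    }

    /** Byte offset at which the index entry region begins. */
    static long indexRegionStart(int hashSlotNum) {
        return INDEX_HEADER_SIZE + (long) hashSlotNum * HASH_SLOT_SIZE;
    }

    public static void main(String[] args) {
        int hashSlotNum = 5_000_000; // assumed default
        int indexNum = 20_000_000;   // assumed default
        // prints 420000040
        System.out.println(fileTotalSize(hashSlotNum, indexNum));
        // prints 20000040
        System.out.println(indexRegionStart(hashSlotNum));
    }
}
```

Under those assumed defaults a single index file is roughly 400 MB, almost all of it entry data.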

    private AtomicLong beginTimestamp = new AtomicLong(0);
    private AtomicLong endTimestamp = new AtomicLong(0);
    private AtomicLong beginPhyOffset = new AtomicLong(0);
    private AtomicLong endPhyOffset = new AtomicLong(0);
    private AtomicInteger hashSlotCount = new AtomicInteger(0);

    private AtomicInteger indexCount = new AtomicInteger(1);

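IndexHeader has six fields: the begin and end timestamps, the begin and end physical offsets, the hash slot count, and the index count. Because IndexHeader is also persisted to disk, the fields determine its size: four longs (8 bytes each) plus two ints (4 bytes each), 40 bytes in total.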
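The 40-byte figure can be sketched as a fixed layout in a ByteBuffer. The field offsets below assume the fields are written in declaration order; the article does not show the real offsets, so treat them and the class name `IndexHeaderLayoutSketch` as illustrative.

```java
import java.nio.ByteBuffer;

final class IndexHeaderLayoutSketch {
    // Assumed offsets: four longs in declaration order, followed by two ints.
    static final int BEGIN_TIMESTAMP = 0;
    static final int END_TIMESTAMP = 8;
    static final int BEGIN_PHY_OFFSET = 16;
    static final int END_PHY_OFFSET = 24;
    static final int HASH_SLOT_COUNT = 32;
    static final int INDEX_COUNT = 36;
    static final int INDEX_HEADER_SIZE = 4 * Long.BYTES + 2 * Integer.BYTES; // 40

    /** Serializes the six header fields into a 40-byte buffer. */
    static ByteBuffer write(long beginTs, long endTs, long beginOffset, long endOffset,
                            int hashSlotCount, int indexCount) {
        ByteBuffer buf = ByteBuffer.allocate(INDEX_HEADER_SIZE);
        buf.putLong(BEGIN_TIMESTAMP, beginTs);
        buf.putLong(END_TIMESTAMP, endTs);
        buf.putLong(BEGIN_PHY_OFFSET, beginOffset);
        buf.putLong(END_PHY_OFFSET, endOffset);
        buf.putInt(HASH_SLOT_COUNT, hashSlotCount);
        buf.putInt(INDEX_COUNT, indexCount);
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer header = write(1_700_000_000_000L, 1_700_000_060_000L, 0L, 4096L, 3, 4);
        // prints 40 bytes, indexCount=4
        System.out.println(INDEX_HEADER_SIZE + " bytes, indexCount=" + header.getInt(INDEX_COUNT));
    }
}
```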

    public boolean putKey(final String key, final long phyOffset, final long storeTimestamp) {
        if (this.indexHeader.getIndexCount() < this.indexNum) {
            int keyHash = indexKeyHashMethod(key); // hash the key to choose a slot
            int slotPos = keyHash % this.hashSlotNum; // slot number
            int absSlotPos = IndexHeader.INDEX_HEADER_SIZE + slotPos * hashSlotSize; // absolute position: the header plus this slot's offset

            FileLock fileLock = null;

            try {

                // fileLock = this.fileChannel.lock(absSlotPos, hashSlotSize,
                // false);
                // The slot holds the number of the entry most recently put into it; initially 0.
                // How do we know the number of the new entry? From indexCount in the header.
                int slotValue = this.mappedByteBuffer.getInt(absSlotPos);
                if (slotValue <= invalidIndex || slotValue > this.indexHeader.getIndexCount()) {
                    slotValue = invalidIndex;
                }

                long timeDiff = storeTimestamp - this.indexHeader.getBeginTimestamp();

                timeDiff = timeDiff / 1000;

                if (this.indexHeader.getBeginTimestamp() <= 0) {
                    timeDiff = 0;
                } else if (timeDiff > Integer.MAX_VALUE) {
                    timeDiff = Integer.MAX_VALUE;
                } else if (timeDiff < 0) {
                    timeDiff = 0;
                }

                // Absolute position at which the new entry is written
                int absIndexPos =
                    IndexHeader.INDEX_HEADER_SIZE + this.hashSlotNum * hashSlotSize
                        + this.indexHeader.getIndexCount() * indexSize;

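                // One entry takes 20 bytes: 4 bytes key hash, 8 bytes physical offset of the message,
                // 4 bytes time difference, 4 bytes number of the previous entry in the same slot.
                // The design resembles HashMap: an array of slots plus linked lists.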
                this.mappedByteBuffer.putInt(absIndexPos, keyHash);
                this.mappedByteBuffer.putLong(absIndexPos + 4, phyOffset);
                this.mappedByteBuffer.putInt(absIndexPos + 4 + 8, (int) timeDiff); // a time difference instead of a full timestamp, to save space
                this.mappedByteBuffer.putInt(absIndexPos + 4 + 8 + 4, slotValue);

                // Point the slot at the new entry so the next put into this slot can link back to it
                this.mappedByteBuffer.putInt(absSlotPos, this.indexHeader.getIndexCount());

                if (this.indexHeader.getIndexCount() <= 1) {
                    // First entry in the file: record the begin physical offset and the begin store timestamp
                    this.indexHeader.setBeginPhyOffset(phyOffset);
                    this.indexHeader.setBeginTimestamp(storeTimestamp);
                }

                this.indexHeader.incHashSlotCount();
                this.indexHeader.incIndexCount();
                this.indexHeader.setEndPhyOffset(phyOffset);
                this.indexHeader.setEndTimestamp(storeTimestamp);

                return true;
            } catch (Exception e) {
                log.error("putKey exception, Key: " + key + " KeyHashCode: " + key.hashCode(), e);
            } finally {
                if (fileLock != null) {
                    try {
                        fileLock.release();
                    } catch (IOException e) {
                        log.error("Failed to release the lock", e);
                    }
                }
            }
        } else {
            log.warn("Over index file capacity: index count = " + this.indexHeader.getIndexCount()
                + "; index max num = " + this.indexNum);
        }

        return false;
    }

How does IndexFile store a key, and which data structure does it rely on? The basic layout of an index file is as follows:

(Figure: index file design)

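Reading the file both horizontally and vertically makes the internal design clear.
putKey first checks that the file still has room for another entry. It computes keyHash from the key and takes keyHash modulo the number of slots to get slotPos, the slot the key maps to. slotPos is only a relative slot number; the real position must also skip the header, so absSlotPos is the slot's absolute position. Reading 4 bytes at absSlotPos gives slotValue, the number of the entry most recently put into that slot.
Why store timeDiff, a time difference converted to seconds? The IndexHeader already records the file's begin timestamp, so an entry's time can be recovered as the begin timestamp plus the difference. That shrinks the field from an 8-byte long to a 4-byte int, which saves a lot of disk space, at the cost of second-level precision.
The indexCount in the IndexHeader is the number of the next entry, which gives absIndexPos, the entry's absolute position. Each entry takes 20 bytes. Besides the key hash, the physical offset, and the time difference, it also stores the number of the previous entry in the same hash slot, so the entries of one slot form a singly linked list. After writing the entry, putKey stores the new entry's number in the slot, increments the index count in the header, and updates the end physical offset and the end store timestamp.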
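The position arithmetic and the time-difference encoding described above can be sketched in isolation. The constants mirror the sizes used in the article (40-byte header, 4-byte slot, 20-byte entry); the class and method names are hypothetical, and the hash is assumed to be non-negative, as the modulo step requires.

```java
final class IndexPositionSketch {
    static final int INDEX_HEADER_SIZE = 40;
    static final int HASH_SLOT_SIZE = 4;
    static final int INDEX_SIZE = 20;

    /** Absolute byte position of the slot for a (non-negative) key hash. */
    static int absSlotPos(int keyHash, int hashSlotNum) {
        int slotPos = keyHash % hashSlotNum;
        return INDEX_HEADER_SIZE + slotPos * HASH_SLOT_SIZE;
    }

    /** Absolute byte position of entry number indexCount. */
    static int absIndexPos(int hashSlotNum, int indexCount) {
        return INDEX_HEADER_SIZE + hashSlotNum * HASH_SLOT_SIZE + indexCount * INDEX_SIZE;
    }

    /** Millisecond timestamps in, clamped second-granularity difference out. */
    static int encodeTimeDiff(long storeTimestamp, long beginTimestamp) {
        long timeDiff = (storeTimestamp - beginTimestamp) / 1000;
        if (beginTimestamp <= 0 || timeDiff < 0) {
            return 0;
        }
        if (timeDiff > Integer.MAX_VALUE) {
            return Integer.MAX_VALUE;
        }
        return (int) timeDiff;
    }

    /** Recovers the entry time (rounded down to whole seconds after the begin timestamp). */
    static long decodeTime(long beginTimestamp, int timeDiff) {
        return beginTimestamp + timeDiff * 1000L;
    }

    public static void main(String[] args) {
        System.out.println(absSlotPos(1_234_567, 100));  // prints 308
        System.out.println(absIndexPos(100, 1));         // prints 460
        long begin = 1_700_000_000_000L;
        int diff = encodeTimeDiff(begin + 12_345, begin);
        System.out.println(diff);                        // prints 12
        System.out.println(decodeTime(begin, diff));     // prints 1700000012000
    }
}
```

The round trip loses the 345 milliseconds, which illustrates the precision given up in exchange for the smaller field.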
With the write path in place, how is a lookup implemented? Here is the query method in IndexFile:

    public void selectPhyOffset(final List<Long> phyOffsets, final String key, final int maxNum,
        final long begin, final long end, boolean lock) {
        if (this.mappedFile.hold()) {
            // Locate the hash slot to get an entry number, read that entry, then follow each entry's previous-entry number to walk the chain
            int keyHash = indexKeyHashMethod(key);
            int slotPos = keyHash % this.hashSlotNum;
            int absSlotPos = IndexHeader.INDEX_HEADER_SIZE + slotPos * hashSlotSize;

            FileLock fileLock = null;
            try {
                if (lock) {
                    // fileLock = this.fileChannel.lock(absSlotPos,
                    // hashSlotSize, true);
                }
                int slotValue = this.mappedByteBuffer.getInt(absSlotPos);
                if (slotValue <= invalidIndex || slotValue > this.indexHeader.getIndexCount()
                    || this.indexHeader.getIndexCount() <= 1) {
                } else {
                    for (int nextIndexToRead = slotValue; ; ) {
                        if (phyOffsets.size() >= maxNum) {
                            break;
                        }

                        int absIndexPos =
                            IndexHeader.INDEX_HEADER_SIZE + this.hashSlotNum * hashSlotSize
                                + nextIndexToRead * indexSize;

                        int keyHashRead = this.mappedByteBuffer.getInt(absIndexPos);
                        long phyOffsetRead = this.mappedByteBuffer.getLong(absIndexPos + 4);

                        long timeDiff = (long) this.mappedByteBuffer.getInt(absIndexPos + 4 + 8);
                        int prevIndexRead = this.mappedByteBuffer.getInt(absIndexPos + 4 + 8 + 4);

                        if (timeDiff < 0) {
                            break;
                        }

                        timeDiff *= 1000L;

                        long timeRead = this.indexHeader.getBeginTimestamp() + timeDiff;
                        boolean timeMatched = (timeRead >= begin) && (timeRead <= end);

                        if (keyHash == keyHashRead && timeMatched) {
                            phyOffsets.add(phyOffsetRead);
                        }

                        if (prevIndexRead <= invalidIndex
                            || prevIndexRead > this.indexHeader.getIndexCount()
                            || prevIndexRead == nextIndexToRead || timeRead < begin) {
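                            // Stop when: 1. there is no previous entry; 2. the previous entry number exceeds the index count;
                            // 3. the entry points to itself; 4. the entry's time is earlier than the query's begin time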
                            break;
                        }

                        nextIndexToRead = prevIndexRead;
                    }
                }
            } catch (Exception e) {
                log.error("selectPhyOffset exception ", e);
            } finally {
                if (fileLock != null) {
                    try {
                        fileLock.release();
                    } catch (IOException e) {
                        log.error("Failed to release the lock", e);
                    }
                }

                this.mappedFile.release();
            }
        }
    }

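The lookup computes keyHash from the query key, finds which slot it belongs to, and derives the slot's absolute position absSlotPos. Reading 4 bytes at absSlotPos yields the number of the most recent entry in that slot, from which absIndexPos, the entry's absolute position, is computed. The entry's fields are then read and compared against the query's time range and key hash to collect matching physical offsets. Next, nextIndexToRead is set to prevIndexRead, the entry number the current entry points to, and the loop continues. The loop ends when the previous-entry number is 0 or otherwise invalid, when the entry's time is earlier than the query's begin time, or when maxNum offsets have been collected.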
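The slot-plus-linked-list structure is easier to see in a toy version. The sketch below keeps the same byte layout in a heap ByteBuffer but leaves out the header fields, time filtering, and locking. `MiniIndexSketch` and its `hash` function are hypothetical stand-ins (the real indexKeyHashMethod is not shown in the article), and the buffer is used in place of the memory-mapped file.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class MiniIndexSketch {
    static final int HEADER = 40, SLOT_SIZE = 4, INDEX_SIZE = 20, INVALID_INDEX = 0;

    final int slotNum;
    final ByteBuffer buf;
    int indexCount = 1; // entry numbers start at 1 so that 0 can mean "no entry"

    MiniIndexSketch(int slotNum, int indexNum) {
        this.slotNum = slotNum;
        this.buf = ByteBuffer.allocate(HEADER + slotNum * SLOT_SIZE + indexNum * INDEX_SIZE);
    }

    // Assumption: any non-negative hash works for this illustration.
    static int hash(String key) {
        return key.hashCode() & Integer.MAX_VALUE;
    }

    int slotPos(int keyHash) { return HEADER + (keyHash % slotNum) * SLOT_SIZE; }

    int entryPos(int n) { return HEADER + slotNum * SLOT_SIZE + n * INDEX_SIZE; }

    void put(String key, long phyOffset) {
        int keyHash = hash(key);
        int slot = slotPos(keyHash);
        int pos = entryPos(indexCount);
        buf.putInt(pos, keyHash);
        buf.putLong(pos + 4, phyOffset);
        buf.putInt(pos + 12, 0);                // time difference omitted in this sketch
        buf.putInt(pos + 16, buf.getInt(slot)); // link to the slot's previous newest entry
        buf.putInt(slot, indexCount++);         // the slot now points at the new entry
    }

    /** Walks the slot's chain from newest to oldest, collecting offsets whose hash matches. */
    List<Long> select(String key, int maxNum) {
        List<Long> offsets = new ArrayList<>();
        int keyHash = hash(key);
        int next = buf.getInt(slotPos(keyHash));
        while (next > INVALID_INDEX && next < indexCount && offsets.size() < maxNum) {
            int pos = entryPos(next);
            if (buf.getInt(pos) == keyHash) {
                offsets.add(buf.getLong(pos + 4));
            }
            int prev = buf.getInt(pos + 16);
            if (prev == next) {
                break; // guard against a self-referencing entry
            }
            next = prev;
        }
        return offsets;
    }

    public static void main(String[] args) {
        MiniIndexSketch index = new MiniIndexSketch(1, 8); // one slot forces every key to collide
        index.put("T#a", 100L);
        index.put("T#b", 200L);
        index.put("T#a", 300L);
        System.out.println(index.select("T#a", 10)); // prints [300, 100]
    }
}
```

Results come back newest first because each slot points at its latest entry. Only the hash is compared, so two different keys with the same hash would both match; a caller would presumably need to check the actual message keys afterwards.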

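RocketMQ's storage consists mainly of these kinds of data: the message data itself, the consume queue data, and the index data. The design uses many threads to decouple related pieces of work. For example, storing a message also requires adding it to the corresponding consume queue. But in RocketMQ the message append runs inside a locked, synchronized block, so to keep it fast the block does as little as possible and anything that can be decoupled is handled asynchronously: the write path only puts the message into the MappedFile. A message needs a second round of processing before it can be queried efficiently, so a replay (reput) thread drives that second pass, which includes building the consume queue and adding index entries.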
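The split between a short locked write path and a separate replay pass can be sketched as follows. This is a deliberately simplified model, not RocketMQ's implementation: `ReputSketch`, its in-memory list standing in for the commit log, and the `Consumer`-based dispatchers are all hypothetical, and `reputOnce` is called directly here where the real system would presumably run it in a loop on a dedicated thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ReputSketch {
    private final List<String> commitLog = new ArrayList<>();            // stand-in for the commit log
    private final List<Consumer<String>> dispatchers = new ArrayList<>(); // e.g. consume queue, index
    private int reputFromOffset = 0;                                      // next entry to replay

    void addDispatcher(Consumer<String> dispatcher) {
        dispatchers.add(dispatcher);
    }

    // Write path: holds the lock only long enough to append the message.
    synchronized void putMessage(String msg) {
        commitLog.add(msg);
    }

    // Replay path: reads everything appended since the last pass and hands it to each dispatcher.
    void reputOnce() {
        List<String> batch;
        synchronized (this) {
            batch = new ArrayList<>(commitLog.subList(reputFromOffset, commitLog.size()));
            reputFromOffset = commitLog.size();
        }
        for (String msg : batch) {
            for (Consumer<String> dispatcher : dispatchers) {
                dispatcher.accept(msg); // runs outside the write lock
            }
        }
    }

    public static void main(String[] args) {
        List<String> consumeQueue = new ArrayList<>();
        List<String> index = new ArrayList<>();
        ReputSketch store = new ReputSketch();
        store.addDispatcher(consumeQueue::add);
        store.addDispatcher(index::add);

        store.putMessage("m1");
        store.putMessage("m2");
        System.out.println(consumeQueue); // prints [] because nothing has been replayed yet
        store.reputOnce();
        System.out.println(consumeQueue); // prints [m1, m2]
        System.out.println(index);        // prints [m1, m2]
    }
}
```

The point of the design is that the dispatchers never run inside the write lock, so slow index or queue updates do not slow down message appends.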
