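I recently needed MetaQ for a project at work and took the opportunity to read through its source code, so I am writing up a source-code analysis of MetaQ. First, some background; readers already familiar with MetaQ can skip the next section.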
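1. Introduction to MetaQ
MetaQ (full name Metamorphosis) is a high-performance, highly available, scalable distributed message middleware. It features sequential writes for message storage, high throughput, and support for local and XA transactions, which suits it to high-throughput, ordered-message, broadcast, and log-data transport scenarios. MetaQ is widely used across Alibaba's subsidiaries, relaying more than 25 billion messages per day. Its main uses are asynchronous decoupling, MySQL data replication, and log collection.
Main features
- Producers, servers, and consumers can all be distributed
- Message storage uses sequential writes
- Very high performance and throughput
- Supports message ordering
- Supports local and XA transactions
- Clients pull messages; random reads are served through the sendfile system call (zero-copy), and data is pulled in batches
- Supports consumer-side transactions
- Supports broadcast mode
- Supports asynchronous message sending
- Supports the HTTP protocol
- Supports message retry and recovery
- Data migration and capacity expansion are transparent to users
- Consumption state is kept on the client
- Supports two HA modes: synchronous and asynchronous replication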
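2. Overall architecture of MetaQ
A message middleware can deliver messages to consumers in two ways: push or pull. Middleware with very strict real-time requirements tends to favor push, as ActiveMQ does, whereas MetaQ has consumers actively pull messages, coordinated through Zookeeper. MetaQ has three parts: Server, Client (message producers and consumers), and the Zookeeper coordinator. The overall structure is as follows: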
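3. Project structure
The MetaQ source is available at https://github.com/killme2008/Metamorphosis (this series is based on MetaQ 1.4.5). After downloading it we find the following main modules. Here is a brief description of how they are divided: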
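- metamorphosis-client: producer and consumer clients
- metamorphosis-client-extension: extended client. It can store messages whose consumption failed into notify (not provided), and lets meta act as a log4j appender so that messages can be sent to meta transparently through the log4j API
- metamorphosis-commons: code shared by client and server
- metamorphosis-dashboard: web dashboard for MetaQ
- metamorphosis-example: client usage examples
- metamorphosis-http-client: client that uses the HTTP protocol
- metamorphosis-server: the server project
- metamorphosis-server-wrapper: extended server that integrates other plugins into the server, provides extension features, and supports highly available synchronous and asynchronous replication
- metamorphosis-storm-spout: feeds meta messages into a Twitter Storm cluster for real-time analysis
- metamorphosis-tools: tools for server management and operations
The module dependency graph is roughly as follows:
The source analysis is split into two stages, Server and Client, with the Server first. The next article takes us into MetaQ proper.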
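Messages are MetaQ's most important resource and a concept we must understand before going further, since everything revolves around them. MetaQ's Message class defines the message format:
- public class Message implements Serializable {
- private long id; // message ID
- private String topic; // topic the message belongs to
- private byte[] data; // message payload
- private String attribute; // message attribute
- private int flag; // attribute flag bits; set when the attribute is not null
- private Partition partition; // partition under the topic; think of it as the queue within the topic that the message is sent to
- private transient boolean readOnly; // whether the message is read-only
- private transient boolean rollbackOnly = false; // whether the message must be rolled back; used by the transaction implementation
- }
From this definition we can see that every message has a unique ID, belongs to a topic, is sent to one partition of that topic, and carries a few basic attributes.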
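MetaQ consists of the Broker, the Client, and Zookeeper. The system structure is as follows:
We start with the Broker source.
The Broker revolves around sending and consuming messages; from the Broker's point of view this is the handling of input and output streams. Along that main line, the Broker is divided into these modules: network transport, message storage, message statistics, and transactions. This article starts with the message storage module, which is fairly self-contained.
Before analyzing the storage module we need to know an important Broker class, MetaConfig. MetaConfig is the Broker's configuration loader; every module obtains its configuration through it, so it runs through all modules. MetaConfig implements the MetaConfigMBean interface, defined as follows: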
- public interface MetaConfigMBean {
- /**
- * Reload topics configuration
- */
- public void reload();
- /** Close partitions */
- public void closePartitions(String topic, int start, int end);
- /** Open all partitions of a topic */
- public void openPartitions(String topic);
- }
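MetaConfig is registered with the MBeanServer, so the configuration can be reloaded and partitions can be closed and opened over JMX. So that reloaded configuration takes effect immediately, MetaConfig has a built-in notification mechanism: register a listener with MetaConfig to follow changes to a given property. Listeners must implement the PropertyChangeListener interface.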
- public void addPropertyChangeListener(final String propertyName, final PropertyChangeListener listener) {
- this.propertyChangeSupport.addPropertyChangeListener(propertyName, listener);
- }
- public void removePropertyChangeListener(final PropertyChangeListener listener) {
- this.propertyChangeSupport.removePropertyChangeListener(listener);
- }
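The notification mechanism above is the standard java.beans.PropertyChangeSupport pattern. The following is a minimal sketch of it, not MetaConfig itself: ConfigSketch and its single field are hypothetical stand-ins. Note that firing with null as both the old and the new value always delivers the event, so listeners are expected to re-read the configuration themselves.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for MetaConfig; only the notification plumbing is modeled.
class ConfigSketch {
    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private int unflushInterval = 10000;

    void addPropertyChangeListener(String propertyName, PropertyChangeListener listener) {
        support.addPropertyChangeListener(propertyName, listener);
    }

    int getUnflushInterval() {
        return unflushInterval;
    }

    void setUnflushInterval(int value) {
        this.unflushInterval = value;
        // With null old/new values the event is always fired; listeners read the new state themselves.
        support.firePropertyChange("unflushInterval", null, null);
    }

    // Registers one listener, changes the property once, and returns what the listener observed.
    static List<String> demo() {
        List<String> seen = new ArrayList<>();
        ConfigSketch config = new ConfigSketch();
        config.addPropertyChangeListener("unflushInterval",
                evt -> seen.add(evt.getPropertyName() + "=" + config.getUnflushInterval()));
        config.setUnflushInterval(5000);
        return seen;
    }
}
```

Listeners registered under a property name are only called for events carrying that name, which is how a single MetaConfig can serve several modules with different interests.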
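MetaConfig currently fires three kinds of event notifications: the configuration file changed (configFileChecksum), the topics changed (topics, topicConfigMap), and the storage flush interval changed (unflushInterval). The code is as follows:
- // configFileChecksum notification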
- private Ini createIni(final File file) throws IOException, InvalidFileFormatException {
- ……
- this.propertyChangeSupport.firePropertyChange("configFileChecksum", null, null);
- return conf;
- }
- public void setConfigFileChecksum(long configFileChecksum) {
- this.configFileChecksum = configFileChecksum;
- this.propertyChangeSupport.firePropertyChange("configFileChecksum", null, null);
- }
- // topics, topicConfigMap and unflushInterval notifications
- private void populateTopicsConfig(final Ini conf) {
- ……
- if (!newTopicConfigMap.equals(this.topicConfigMap)) {
- this.topics = newTopics;
- this.topicConfigMap = newTopicConfigMap;
- this.propertyChangeSupport.firePropertyChange("topics", null, null);
- this.propertyChangeSupport.firePropertyChange("topicConfigMap", null, null);
- }
- this.propertyChangeSupport.firePropertyChange("unflushInterval", null, null);
- ……
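Note that reload only reloads the topic configuration; global configuration is not affected.
That is enough preamble. Let us move on to the storage module itself.
The Broker's storage module holds messages sent by Clients until they are consumed. The Broker stores messages in files. The class diagram of the storage module is as follows:
MessageSet represents a set of messages, which may be a whole file or part of a file. It is defined as follows: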
- /**
- * A set of messages
- */
- public interface MessageSet {
- public MessageSet slice(long offset, long limit) throws IOException; // return a sub-set of messages
- public void write(GetCommand getCommand, SessionContext ctx);
- public long append(ByteBuffer buff) throws IOException; // append a message; it is not yet on disk, and flush must be called to guarantee that
- public void flush() throws IOException; // flush to disk
- public void read(final ByteBuffer bf, long offset) throws IOException; // read messages
- public void read(final ByteBuffer bf) throws IOException; // read messages
- public long getMessageCount(); // number of messages in this set
- }
FileMessageSet implements both MessageSet and Closeable. It implements Closeable so that, when the file is closed, pending content is flushed to disk and the file channel is closed:
- public void close() throws IOException {
- if (!this.channel.isOpen()) {
- return;
- }
- // Before closing, make sure the content is flushed to disk rather than left in the cache
- if (this.mutable) {
- this.flush();
- }
- // Close the file channel
- this.channel.close();
- }
Now let us look at the FileMessageSet class in detail:
- public class FileMessageSet implements MessageSet, Closeable {
- ……
- private final FileChannel channel; // the underlying file channel
- private final AtomicLong messageCount; // number of messages
- private final AtomicLong sizeInBytes;
- private final AtomicLong highWaterMark; // high-water mark of data known to be on disk
- private final long offset; // offset of this view (slice) into the file
- private boolean mutable; // whether the set can be appended to
- public FileMessageSet(final FileChannel channel, final long offset, final long limit, final boolean mutable) throws IOException {
- this.channel = channel;
- this.offset = offset;
- this.messageCount = new AtomicLong(0);
- this.sizeInBytes = new AtomicLong(0);
- this.highWaterMark = new AtomicLong(0);
- this.mutable = mutable;
- if (mutable) {
- final long startMs = System.currentTimeMillis();
- final long truncated = this.recover();
- if (this.messageCount.get() > 0) {
- log.info("Recovery succeeded in " + (System.currentTimeMillis() - startMs) / 1000 + " seconds. " + truncated + " bytes truncated.");
- }
- } else {
- try {
- this.sizeInBytes.set(Math.min(channel.size(), limit) - offset);
- this.highWaterMark.set(this.sizeInBytes.get());
- } catch (final Exception e) {
- log.error("Set sizeInBytes error", e);
- }
- }
- }
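- // Note the mutable property. When mutable is true, the constructor calls recover(), which validates the integrity of the file content (described in detail below). When mutable is false, the file cannot be modified, and both the on-disk high-water mark and sizeInBytes are set to the size of the file region covered by this set.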
- ……
- public long append(final ByteBuffer buf) throws IOException {
- // When mutable is false, appending to the end of the file is not allowed
- if (!this.mutable) {
- throw new UnsupportedOperationException("Immutable message set");
- }
- final long offset = this.sizeInBytes.get();
- int sizeInBytes = 0;
- while (buf.hasRemaining()) {
- sizeInBytes += this.channel.write(buf);
- }
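- // At this point the content has not been forced to disk; it is still in the file cache, and flush must be called at a suitable time
- // The on-disk high-water mark is not updated here; it only moves once the data is guaranteed to be on disk
- this.sizeInBytes.addAndGet(sizeInBytes);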
- this.messageCount.incrementAndGet();
- return offset;
- }
- // Flush to disk
- public void flush() throws IOException {
- this.channel.force(true);
- this.highWaterMark.set(this.sizeInBytes.get());
- }
- ……
- @Override
- public MessageSet slice(final long offset, final long limit) throws IOException {
- // Return an immutable message set over the requested range
- return new FileMessageSet(this.channel, offset, limit, false);
- }
- static final Logger transferLog = LoggerFactory.getLogger("TransferLog");
- @Override
- public void read(final ByteBuffer bf, final long offset) throws IOException {
- // Read content into the buffer
- int size = 0;
- while (bf.hasRemaining()) {
- final int l = this.channel.read(bf, offset + size);
- if (l < 0) {
- break;
- }
- size += l;
- }
- }
- @Override
- public void read(final ByteBuffer bf) throws IOException {
- this.read(bf, this.offset);
- }
- // The write methods below provide zero-copy transfer
- @Override
- public void write(final GetCommand getCommand, final SessionContext ctx) {
- final IoBuffer buf = this.makeHead(getCommand.getOpaque(), this.sizeInBytes.get());
- // transfer to socket
- this.tryToLogTransferInfo(getCommand, ctx.getConnection());
- ctx.getConnection().transferFrom(buf, null, this.channel, this.offset, this.sizeInBytes.get());
- }
- public long write(final WritableByteChannel socketChanel) throws IOException {
- try {
- return this.getFileChannel().transferTo(this.offset, this.getSizeInBytes(), socketChanel);
- } catch (final IOException e) {
- // Check to see if the IOException is being thrown due to
- // http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5103988
- final String message = e.getMessage();
- if (message != null && message.contains("temporarily unavailable")) {
- return 0;
- }
- throw e;
- }
- }
- ……
- private static boolean fastBoot = Boolean.valueOf(System.getProperty("meta.fast_boot", "false"));
- private long recover() throws IOException {
- // If the system property meta.fast_boot is true, start in fast-boot mode: the file is not checked for corruption and its content is not validated
- if (fastBoot) {
- final long size = this.channel.size();
- this.sizeInBytes.set(size);
- this.highWaterMark.set(size);
- this.messageCount.set(0);
- this.channel.position(size);
- return 0;
- }
- if (!this.mutable) {
- throw new UnsupportedOperationException("Immutable message set");
- }
- final long len = this.channel.size();
- final ByteBuffer buf = ByteBuffer.allocate(MessageUtils.HEADER_LEN);
- long validUpTo = 0L;
- long next = 0L;
- long msgCount = 0;
- do {
- next = this.validateMessage(buf, validUpTo, len);
- if (next >= 0) {
- msgCount++;
- validUpTo = next;
- }
- } while (next >= 0);
- this.channel.truncate(validUpTo);
- this.sizeInBytes.set(validUpTo);
- this.highWaterMark.set(validUpTo);
- this.messageCount.set(msgCount);
- this.channel.position(validUpTo);
- return len - validUpTo;
- }
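- // On-disk layout of a message:
- /*
- * 20-byte header:
- *   message length (4 bytes), including attribute and payload
- *   checksum (4 bytes)
- *   message id (8 bytes)
- *   message flag (4 bytes)
- * followed by `length` bytes of content
- */
- // The checksum is written to disk together with the message. On read it is recomputed and
- // compared with the stored value to detect corrupted or tampered messages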
- private long validateMessage(final ByteBuffer buf, final long start, final long len) throws IOException {
- buf.rewind();
- long read = this.channel.read(buf);
- if (read < MessageUtils.HEADER_LEN) {
- return -1;
- }
- buf.flip();
- final int messageLen = buf.getInt();
- final long next = start + MessageUtils.HEADER_LEN + messageLen;
- if (next > len) {
- return -1;
- }
- final int checksum = buf.getInt();
- if (messageLen < 0) {
- // Corrupted data
- return -1;
- }
- final ByteBuffer messageBuffer = ByteBuffer.allocate(messageLen);
- // long curr = start + MessageUtils.HEADER_LEN;
- while (messageBuffer.hasRemaining()) {
- read = this.channel.read(messageBuffer);
- if (read < 0) {
- throw new IOException("文件在recover过程中被修改");
- }
- // curr += read;
- }
- if (CheckSum.crc32(messageBuffer.array()) != checksum) {
- // CRC32 of the content does not match the stored checksum
- return -1;
- } else {
- return next;
- }
- }
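FileMessageSet is the basic storage element on the MetaQ server. It represents read and write operations on one file, can validate the integrity and consistency of the file content, and provides statistics. Next we look at how MessageStore combines individual FileMessageSets. Put simply, MetaQ's storage design is well worth learning from.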
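To make the on-disk format and the recovery check concrete, here is a small sketch that encodes one message with the 20-byte header described in the listing and validates it again. It is an illustration under assumptions, not MetaQ's code: MessageFrameSketch is a hypothetical class, it works on an in-memory ByteBuffer instead of a FileChannel, and it assumes the stored checksum is the low 32 bits of java.util.zip.CRC32 over the content.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical helper illustrating the header layout: length, checksum, id, flag, then content.
class MessageFrameSketch {
    static final int HEADER_LEN = 20; // 4 (length) + 4 (checksum) + 8 (id) + 4 (flag)

    static int crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return (int) crc.getValue(); // assumption: checksum stored as the low 32 bits
    }

    static ByteBuffer encode(long id, int flag, byte[] content) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LEN + content.length);
        buf.putInt(content.length);
        buf.putInt(crc32(content));
        buf.putLong(id);
        buf.putInt(flag);
        buf.put(content);
        buf.flip();
        return buf;
    }

    // Returns the offset just past the message starting at 'start', or -1 if it is truncated or corrupt.
    static int validate(ByteBuffer file, int start) {
        if (file.limit() - start < HEADER_LEN) {
            return -1; // not even a full header
        }
        int len = file.getInt(start);
        if (len < 0) {
            return -1; // corrupted length field
        }
        long next = (long) start + HEADER_LEN + len;
        if (next > file.limit()) {
            return -1; // content runs past the end of the file
        }
        int checksum = file.getInt(start + 4);
        byte[] content = new byte[len];
        for (int i = 0; i < len; i++) {
            content[i] = file.get(start + HEADER_LEN + i);
        }
        return crc32(content) == checksum ? (int) next : -1;
    }
}
```

A recovery pass in this style calls validate repeatedly from offset 0 and truncates the file at the first position where it returns -1, which mirrors what recover() does with validUpTo.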
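I should have described how the Broker organizes stored messages earlier. We already know that a message belongs to a partition under a topic, and storage is organized the same way. The structure is as follows:
So for each topic, the partition is the smallest unit. The external API is mainly provided by MessageStore: one MessageStore instance represents one partition, and the partition holds the actual content. In MetaQ a partition is stored as a combination of multiple files, that is, a MessageStore is made up of multiple FileMessageSets, each wrapped in a Segment inside MessageStore. The code is as follows (MessageStore is a class worth analyzing carefully):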
- public class MessageStore extends Thread implements Closeable {
- …….
- }
MessageStore extends Thread, mainly in order to implement asynchronous writes:
- public MessageStore(final String topic, final int partition, final MetaConfig metaConfig,
- final DeletePolicy deletePolicy, final long offsetIfCreate) throws IOException {
- this.metaConfig = metaConfig; // global configuration
- this.topic = topic; // current topic
- final TopicConfig topicConfig = this.metaConfig.getTopicConfig(this.topic);
- String dataPath = metaConfig.getDataPath(); // storage path for this partition
- if (topicConfig != null) {
- dataPath = topicConfig.getDataPath();
- }
- final File parentDir = new File(dataPath);
- this.checkDir(parentDir); // check that the parent directory exists
- this.partitionDir = new File(dataPath + File.separator + topic + "-" + partition);
- this.checkDir(this.partitionDir);
- // this.topic = topic;
- this.partition = partition; // current partition
- this.unflushed = new AtomicInteger(0); // number of messages not yet flushed
- this.lastFlushTime = new AtomicLong(SystemTimer.currentTimeMillis()); // time of the last flush
- this.unflushThreshold = topicConfig.getUnflushThreshold(); // maximum number of unflushed messages; beyond this a force to disk is triggered, default 1000
- this.deletePolicy = deletePolicy; // with multi-file storage, consumed or expired messages must be deleted to make room for new ones; archive and expiry-delete policies are provided by default
- // Make a copy to avoid getting it again and again.
- this.maxTransferSize = metaConfig.getMaxTransferSize();
- // With asynchronous writes enabled, this is the amount of data written between flushes; it is also the upper bound on message size for that path: a message longer than this is written synchronously
- this.maxTransferSize = this.maxTransferSize > ONE_M_BYTES ? ONE_M_BYTES : this.maxTransferSize;
- // Check directory and load exists segments.
- this.checkDir(this.partitionDir);
- this.loadSegments(offsetIfCreate);
- if (this.useGroupCommit()) {
- this.start();
- }
- }
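The constructor first reads the configuration. Because MessageStore uses multi-file storage, it then checks that the parent directory exists. Finally it loads and validates existing data, and if asynchronous writes are configured it starts the asynchronous write thread (unflushThreshold <= 0 is taken to mean asynchronous writes are enabled).
Near the end of the constructor, loadSegments() is called to load and validate the files. Let us see what that method actually does: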
- private void loadSegments(final long offsetIfCreate) throws IOException {
- final List<Segment> accum = new ArrayList<Segment>();
- final File[] ls = this.partitionDir.listFiles();
- if (ls != null) {
- // Walk all data files with the .meta suffix in the directory and load every one of them as immutable
- for (final File file : ls) {
- if (file.isFile() && file.toString().endsWith(FILE_SUFFIX)) {
- if (!file.canRead()) {
- throw new IOException("Could not read file " + file);
- }
- final String filename = file.getName();
- final long start = Long.parseLong(filename.substring(0, filename.length() - FILE_SUFFIX.length()));
- // Load as immutable first
- accum.add(new Segment(start, file, false));
- }
- }
- }
- if (accum.size() == 0) {
- // No usable file: create one, with offsets starting at offsetIfCreate
- final File newFile = new File(this.partitionDir, this.nameFromOffset(offsetIfCreate));
- accum.add(new Segment(offsetIfCreate, newFile));
- } else {
- // At least one file exists: validate, and sort by start in ascending order
- Collections.sort(accum, new Comparator<Segment>() {
- @Override
- public int compare(final Segment o1, final Segment o2) {
- if (o1.start == o2.start) {
- return 0;
- } else if (o1.start > o2.start) {
- return 1;
- } else {
- return -1;
- }
- }
- });
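- // Check that the files are contiguous; if not, the data files are considered damaged or tampered with and an exception is thrown
- this.validateSegments(accum);
- // Reopen the last file as mutable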
- final Segment last = accum.remove(accum.size() - 1);
- last.fileMessageSet.close();
- log.info("Loading the last segment in mutable mode and running recover on " + last.file.getAbsolutePath());
- final Segment mutable = new Segment(last.start, last.file);
- accum.add(mutable);
- log.info("Loaded " + accum.size() + " segments...");
- }
- this.segments = new SegmentList(accum.toArray(new Segment[accum.size()]));
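- // Multiple segments are organized through SegmentList, which keeps deletions and additions consistent under concurrent access. SegmentList does not use Java's synchronized keyword; it synchronizes with something like a CAS (compare-and-swap) primitive instead, which is much cheaper because contention is rare in practice. The class is simple, so it is not analyzed here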
- }
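MessageStore organizes storage in Segments. A Segment wraps a FileMessageSet, which does the actual reading and writing, and MessageStore chains multiple Segments end to end. The chaining rule is: the message file of the first Segment is named 0.meta, and the second is named after the start position of the first file plus the size of the first Segment, and so on. This is illustrated below (assuming every file is 1024 bytes):
Why design it this way? Mainly to make lookups fast. MessageStore makes the last Segment the mutable one because it is the tail of the log: messages are ordered, so new messages must be appended to the last Segment.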
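The naming rule is what makes lookup cheap: because each file name is the absolute offset of its first byte, finding the file for a given offset is a binary search over the sorted start offsets. The sketch below illustrates this under assumptions. SegmentLookupSketch is a hypothetical class, segments are reduced to two parallel arrays, and the name is produced with String.format rather than the NumberFormat used in MetaQ, with the same zero padding to 20 digits.

```java
import java.util.Locale;

// Hypothetical illustration of offset-based file naming and segment lookup.
class SegmentLookupSketch {
    // A file is named after the absolute offset of its first byte, zero-padded to 20 digits.
    static String nameFromOffset(long offset) {
        return String.format(Locale.ROOT, "%020d", offset) + ".meta";
    }

    // Returns the index of the segment containing 'offset', or -1 if the offset is at or past the tail.
    static int find(long[] starts, long[] sizes, long offset) {
        int n = starts.length;
        if (n == 0) {
            return -1;
        }
        if (offset < starts[0]) {
            throw new ArrayIndexOutOfBoundsException(); // data before the head was already deleted
        }
        if (offset >= starts[n - 1] + sizes[n - 1]) {
            return -1;
        }
        int low = 0;
        int high = n - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            if (offset < starts[mid]) {
                high = mid - 1;
            } else if (offset >= starts[mid] + sizes[mid]) {
                low = mid + 1;
            } else {
                return mid; // starts[mid] <= offset < starts[mid] + sizes[mid]
            }
        }
        return -1;
    }
}
```

With three segments starting at 0, 1024 and 2048, offset 1500 falls in the second file, and the position inside that file is simply 1500 - 1024.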
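Now look at what validateSegments() does:
- private void validateSegments(final List<Segment> segments) {
- // Verify that the Segments, sorted in ascending order, are contiguous, so that missing or damaged files are detected. This is a coarse check; the content itself is validated in FileMessageSet by comparing checksums, as described earlier. Together the two checks narrow from file level down to message level and ensure the content is essentially intact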
- this.writeLock.lock();
- try {
- for (int i = 0; i < segments.size() - 1; i++) {
- final Segment curr = segments.get(i);
- final Segment next = segments.get(i + 1);
- if (curr.start + curr.size() != next.start) {
- throw new IllegalStateException("The following segments don't validate: "
- + curr.file.getAbsolutePath() + ", " + next.file.getAbsolutePath());
- }
- }
- } finally {
- this.writeLock.unlock();
- }
- }
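Formatting on ITEye is tedious, so from here on the analysis is given directly in code comments.
- // There are two ways to append a message: synchronous and asynchronous
- public void append(final long msgId, final PutCommand req, final AppendCallback cb) {
- // First wrap the content in the message storage format described earlier
- this.appendBuffer(MessageUtils.makeMessageBuffer(msgId, req), cb);
- }
- // Wrapper class for an asynchronous write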
- private static class WriteRequest {
- public final ByteBuffer buf;
- public final AppendCallback cb;
- public Location result;
- public WriteRequest(final ByteBuffer buf, final AppendCallback cb) {
- super();
- this.buf = buf;
- this.cb = cb;
- }
- }
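- // A nice design choice here is the use of a callback, which makes asynchronous writes easy to implement
- // AppendCallback receives the Location where the message was written (start position and message length). The Location is not relative to position 0 of the current Segment; it is offset by the Segment's start value (which is also the file name), so a later query can use it directly to find the file holding the message
- // This is why file names chain end to end: a binary search over them locates a message quickly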
- private void appendBuffer(final ByteBuffer buffer, final AppendCallback cb) {
- if (this.closed) {
- throw new IllegalStateException("Closed MessageStore.");
- }
- // If asynchronous writes are enabled and the message is shorter than maxTransferSize, put it on the asynchronous write queue
- if (this.useGroupCommit() && buffer.remaining() < this.maxTransferSize) {
- this.bufferQueue.offer(new WriteRequest(buffer, cb));
- } else {
- Location location = null;
- final int remainning = buffer.remaining();
- this.writeLock.lock();
- try {
- final Segment cur = this.segments.last();
- final long offset = cur.start + cur.fileMessageSet.append(buffer);
- this.mayBeFlush(1);
- this.mayBeRoll();
- location = Location.create(offset, remainning);
- } catch (final IOException e) {
- log.error("Append file failed", e);
- location = Location.InvalidLocaltion;
- } finally {
- this.writeLock.unlock();
- if (cb != null) {
- cb.appendComplete(location);
- }
- // The callback is invoked once the data has been written to the file cache
- }
- }
- }
- ……
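- // Decide whether asynchronous writes are enabled. unflushThreshold <= 0 means asynchronous writes; unflushThreshold = 1 means synchronous writes, where every message is flushed to disk; any other unflushThreshold > 0 relies on the threshold-based group flush or the timed flush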
- private boolean useGroupCommit() {
- return this.unflushThreshold <= 0;
- }
- @Override
- public void run() {
- // Requests waiting for a force to disk
- final LinkedList<WriteRequest> toFlush = new LinkedList<WriteRequest>();
- WriteRequest req = null;
- long lastFlushPos = 0;
- Segment last = null;
- // Loop while the store is open and the thread has not been interrupted
- while (!this.closed && !Thread.currentThread().isInterrupted()) {
- try {
- if (last == null) {
- // Get the last segment; messages are written to its file
- last = this.segments.last();
- lastFlushPos = last.fileMessageSet.highWaterMark();
- }
- if (req == null) {
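- // toFlush is empty in two cases: a flush has just completed, or nothing has been queued for writing. When toFlush is empty we call bufferQueue.take(), which blocks until a request arrives; when it is not empty we call bufferQueue.poll(), which returns immediately. This avoids busy-waiting without delaying a pending flush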
- if (toFlush.isEmpty()) {
- req = this.bufferQueue.take();
- } else {
- req = this.bufferQueue.poll();
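- // A null request means no more messages are waiting, so what is in the file cache must be flushed to disk to avoid losing it; the same applies once more than maxTransferSize bytes have been written since the last flush
- // Note one consequence: when the last segment's file is nearly full, no new segment is rolled at that moment; the message is appended to the existing segment, so the file can grow beyond the configured segment size
- if (req == null || last.fileMessageSet.getSizeInBytes() > lastFlushPos + this.maxTransferSize) {
- // Force, to make sure the content is on disk
- last.fileMessageSet.flush();
- lastFlushPos = last.fileMessageSet.highWaterMark();
- // Notify the callbacks
- // Asynchronous writes are more reliable than threshold-based group writes: the callback fires only after the data has been forced to disk, whereas a group write calls back as soon as the message is in the file cache and can therefore lose data (unless unflushThreshold = 1)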
- for (final WriteRequest request : toFlush) {
- request.cb.appendComplete(request.result);
- }
- toFlush.clear();
- // Check whether a roll is needed
- this.mayBeRoll();
- // If the file was switched, fetch last again
- if (this.segments.last() != last) {
- last = null;
- }
- continue;
- }
- }
- }
- if (req == null) {
- continue;
- }
- // Write to the file and compute the write position
- final int remainning = req.buf.remaining();
- // Write position = start value of the current segment + current length of the file
- final long offset = last.start + last.fileMessageSet.append(req.buf);
- req.result = Location.create(offset, remainning);
- if (req.cb != null) {
- toFlush.add(req);
- }
- req = null;
- } catch (final IOException e) {
- log.error("Append message failed,*critical error*,the group commit thread would be terminated.", e);
- // TODO: an I/O error cannot be handled here; simply break out?
- break;
- } catch (final InterruptedException e) {
- // ignore
- }
- }
- // terminated
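- // Before the store closes, the messages still in the write queue are written to the last file. Even if the last Segment is full, no new Segment is rolled; messages keep going into the last one, so here too a Segment can exceed the configured size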
- try {
- for (WriteRequest request : this.bufferQueue) {
- final int remainning = request.buf.remaining();
- final long offset = last.start + last.fileMessageSet.append(request.buf);
- if (request.cb != null) {
- request.cb.appendComplete(Location.create(offset, remainning));
- }
- }
- this.bufferQueue.clear();
- } catch (IOException e) {
- log.error("Append message failed", e);
- }
- }
- ……
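- // Append multiple messages; the write position is reported through the callback
- public void append(final List<Long> msgIds, final List<PutCommand> putCmds, final AppendCallback cb) {
- this.appendBuffer(MessageUtils.makeMessageBuffer(msgIds, putCmds), cb);
- }
- /**
- * Replay a transactional operation: if the message was not stored successfully, store it again and return the new position
- */
- public void replayAppend(final long offset, final int length, final int checksum, final List<Long> msgIds,
- final List<PutCommand> reqs, final AppendCallback cb) throws IOException {
- // If the message was not stored, store it again at the tail of the last Segment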
- final Segment segment = this.findSegment(this.segments.view(), offset);
- if (segment == null) {
- this.append(msgIds, reqs, cb);
- } else {
- final MessageSet messageSet = segment.fileMessageSet.slice(offset - segment.start, offset - segment.start + length);
- final ByteBuffer buf = ByteBuffer.allocate(length);
- messageSet.read(buf, offset - segment.start);
- buf.flip();
- final byte[] bytes = new byte[buf.remaining()];
- buf.get(bytes);
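- // This checksum covers the whole stored record, which differs from the per-message checksum; do not confuse the two
- final int checkSumInDisk = CheckSum.crc32(bytes);
- // Not stored: store it again
- if (checksum != checkSumInDisk) {
- this.append(msgIds, reqs, cb);
- } else {
- // The message was stored correctly; nothing to do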
- if (cb != null) {
- this.notifyCallback(cb, null);
- }
- }
- }
- }
- // Decide whether to roll. If the size of the last segment is >= the configured segment size, create a new segment and make it the last one; the previous last segment is flushed once and its mutable flag is set to false
- private void mayBeRoll() throws IOException {
- if (this.segments.last().fileMessageSet.getSizeInBytes() >= this.metaConfig.getMaxSegmentSize()) {
- this.roll();
- }
- }
- String nameFromOffset(final long offset) {
- final NumberFormat nf = NumberFormat.getInstance();
- nf.setMinimumIntegerDigits(20);
- nf.setMaximumFractionDigits(0);
- nf.setGroupingUsed(false);
- return nf.format(offset) + FILE_SUFFIX;
- }
- private long nextAppendOffset() throws IOException {
- final Segment last = this.segments.last();
- last.fileMessageSet.flush();
- return last.start + last.size();
- }
- private void roll() throws IOException {
- final long newOffset = this.nextAppendOffset();
- // The file of the new segment is named after the start position of the previous last segment plus that segment's size
- final File newFile = new File(this.partitionDir, this.nameFromOffset(newOffset));
- this.segments.last().fileMessageSet.flush();
- this.segments.last().fileMessageSet.setMutable(false);
- this.segments.append(new Segment(newOffset, newFile));
- }
- // Decide whether to flush to disk. There are two triggers: the unflushed-message threshold is reached, or the flush interval has elapsed
- private void mayBeFlush(final int numOfMessages) throws IOException {
- if (this.unflushed.addAndGet(numOfMessages) > this.metaConfig.getTopicConfig(this.topic).getUnflushThreshold()
- || SystemTimer.currentTimeMillis() - this.lastFlushTime.get() > this.metaConfig.getTopicConfig(this.topic).getUnflushInterval()) {
- this.flush0();
- }
- }
- // Flush to disk
- public void flush() throws IOException {
- this.writeLock.lock();
- try {
- this.flush0();
- } finally {
- this.writeLock.unlock();
- }
- }
- private void flush0() throws IOException {
- if (this.useGroupCommit()) {
- return;
- }
- // Only the last segment is mutable, that is, writable, so only the last segment needs flushing
- this.segments.last().fileMessageSet.flush();
- this.unflushed.set(0);
- this.lastFlushTime.set(SystemTimer.currentTimeMillis());
- }
- @Override
- public void close() throws IOException {
- this.closed = true;
- this.interrupt();
- // Wait for the write thread to finish writing the messages left in the asynchronous queue
- try {
- this.join(500);
- } catch (InterruptedException e) {
- Thread.currentThread().interrupt();
- }
- // Close the segments, making sure all content has been flushed to disk
- for (final Segment segment : this.segments.view()) {
- segment.fileMessageSet.close();
- }
- }
- // Return segment information, mainly each segment's start position and size
- public List<SegmentInfo> getSegmentInfos() {
- final List<SegmentInfo> rt = new ArrayList<SegmentInfo>();
- for (final Segment seg : this.segments.view()) {
- rt.add(new SegmentInfo(seg.start, seg.size()));
- }
- return rt;
- }
- /**
- * Return the current maximum readable offset
- */
- // Note that messages still in the file cache are not readable. getSizeInBytes() returns the total size on disk and in cache, so it can be used to work out how much content is still in the cache
- public long getMaxOffset() {
- final Segment last = this.segments.last();
- if (last != null) {
- return last.start + last.size();
- } else {
- return 0;
- }
- }
- /**
- * Return the current minimum readable offset
- */
- public long getMinOffset() {
- Segment first = this.segments.first();
- if (first != null) {
- return first.start;
- } else {
- return 0;
- }
- }
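- /**
- * Return the MessageSet for the given offset and maxSize. Returns null when offset is past the maximum offset,
- * and throws ArrayIndexOutOfBoundsException when offset is below the minimum offset
- */
- // The Javadoc above already explains the behavior clearly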
- public MessageSet slice(final long offset, final int maxSize) throws IOException {
- final Segment segment = this.findSegment(this.segments.view(), offset);
- if (segment == null) {
- return null;
- } else {
- return segment.fileMessageSet.slice(offset - segment.start, offset - segment.start + maxSize);
- }
- }
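- /**
- * Find the file for the given offset. Returns null if past the tail, throws ArrayIndexOutOfBoundsException if before the head
- */
- // Locate the segment for a given position. Thanks to the file layout described earlier,
- // a binary search can be used here, which is very efficient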
- Segment findSegment(final Segment[] segments, final long offset) {
- if (segments == null || segments.length < 1) {
- return null;
- }
- // If the old data no longer exists, return the oldest data available
- final Segment last = segments[segments.length - 1];
- // Before the head: throw
- if (offset < segments[0].start) {
- throw new ArrayIndexOutOfBoundsException();
- }
- // Exactly at the tail or out of range: return null
- if (offset >= last.start + last.size()) {
- return null;
- }
- // Binary search by offset
- int low = 0;
- int high = segments.length - 1;
- while (low <= high) {
- final int mid = high + low >>> 1;
- final Segment found = segments[mid];
- if (found.contains(offset)) {
- return found;
- } else if (offset < found.start) {
- high = mid - 1;
- } else {
- low = mid + 1;
- }
- }
- return null;
- }
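The take/poll loop in run() is easier to see in isolation. The following sketch keeps only that idea: block when nothing is waiting for a flush, poll without blocking when something is, and invoke the callbacks only after the flush. It is a simplified illustration, not MessageStore: GroupCommitSketch and Request are hypothetical, the "file" is just a byte counter, flushing is a no-op, and the maxTransferSize trigger and segment rolling are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

// Hypothetical, stripped-down version of the asynchronous write thread.
class GroupCommitSketch extends Thread {
    private static final class Request {
        final byte[] data;
        final LongConsumer callback;
        long offset;

        Request(byte[] data, LongConsumer callback) {
            this.data = data;
            this.callback = callback;
        }
    }

    private final BlockingQueue<Request> queue = new LinkedBlockingQueue<>();
    private volatile boolean closed = false;
    private long size = 0; // bytes appended so far; touched only by this thread

    void append(byte[] data, LongConsumer callback) {
        queue.offer(new Request(data, callback));
    }

    void shutdown() {
        closed = true;
        interrupt();
    }

    @Override
    public void run() {
        List<Request> toFlush = new ArrayList<>();
        while (!closed) {
            try {
                // Block only when nothing awaits a flush; otherwise look for more work without waiting.
                Request req = toFlush.isEmpty() ? queue.take() : queue.poll();
                if (req == null) {
                    // Queue drained: a real store would force the file to disk here, then call back.
                    for (Request r : toFlush) {
                        r.callback.accept(r.offset);
                    }
                    toFlush.clear();
                    continue;
                }
                req.offset = size; // position at which this message starts
                size += req.data.length;
                toFlush.add(req);
            } catch (InterruptedException e) {
                // Woken up by shutdown(); the loop condition decides whether to exit.
            }
        }
    }

    // Appends three 5-byte messages and returns the offsets reported to the callbacks.
    static List<Long> demo() {
        GroupCommitSketch store = new GroupCommitSketch();
        store.setDaemon(true);
        store.start();
        List<Long> offsets = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(3);
        for (int i = 0; i < 3; i++) {
            store.append(new byte[5], off -> {
                offsets.add(off);
                done.countDown();
            });
        }
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        store.shutdown();
        return new ArrayList<>(offsets);
    }
}
```

How many messages share one flush depends on timing, but the callbacks always arrive in append order, and never before the flush that covers them.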
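We have now gone through the most important class of the Broker's storage in detail. Next we look at its deletion policies. As described earlier, MessageStore organizes storage as multiple files, and since storage space is finite, some policy must delete old files to make room for new messages.
MetaQ allows custom deletion policies, which must implement the DeletePolicy interface. Two policies are provided by default: delete on expiry (DiscardDeletePolicy) and archive on expiry, then delete (ArchiveDeletePolicy). Both are simple: DiscardDeletePolicy deletes files older than a given age, and ArchiveDeletePolicy packs them into a backup first and then deletes them.
How is a custom policy recognized and used? MetaQ defines DeletePolicyFactory, which supplies every deletion policy instance. DeletePolicyFactory exposes a registration mechanism and creates instances through reflection, so every custom policy must have a no-argument constructor. The instance-creation code in DeletePolicyFactory is as follows: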
- public static DeletePolicy getDeletePolicy(String values) {
- String[] tmps = values.split(",");
- String name = tmps[0];
- Class<? extends DeletePolicy> clazz = policyMap.get(name);
- if (clazz == null) {
- throw new UnknownDeletePolicyException(name);
- }
- try {
- // Calls the class's newInstance() directly, which requires a no-argument constructor
- DeletePolicy deletePolicy = clazz.newInstance();
- String[] initValues = null;
- if (tmps.length >= 2) {
- initValues = new String[tmps.length - 1];
- System.arraycopy(tmps, 1, initValues, 0, tmps.length - 1);
- }
- deletePolicy.init(initValues);
- return deletePolicy;
- }
- catch (Exception e) {
- throw new MetamorphosisServerStartupException("New delete policy `" + name + "` failed", e);
- }
- }
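A registry of this kind can be sketched in a few lines. The example below is an illustration under assumptions rather than MetaQ's API: PolicySketch, DiscardSketch, PolicyFactorySketch, and the policy name "discard" are all hypothetical. It also differs from the listing in two small ways: it uses getDeclaredConstructor().newInstance() instead of the deprecated Class.newInstance(), and it passes an empty array rather than null when the spec carries no parameters.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical policy interface standing in for DeletePolicy.
interface PolicySketch {
    void init(String... values);
}

// Hypothetical policy; its single parameter is a maximum age in hours.
class DiscardSketch implements PolicySketch {
    long maxHours;

    @Override
    public void init(String... values) {
        this.maxHours = Long.parseLong(values[0]);
    }
}

// Hypothetical factory: maps a policy name to a class and instantiates it by reflection.
class PolicyFactorySketch {
    private static final Map<String, Class<? extends PolicySketch>> POLICIES = new ConcurrentHashMap<>();

    static void register(String name, Class<? extends PolicySketch> clazz) {
        POLICIES.put(name, clazz);
    }

    // spec has the form "name,param1,param2,..."
    static PolicySketch get(String spec) {
        String[] parts = spec.split(",");
        Class<? extends PolicySketch> clazz = POLICIES.get(parts[0]);
        if (clazz == null) {
            throw new IllegalArgumentException("Unknown delete policy: " + parts[0]);
        }
        try {
            // Requires an accessible no-argument constructor on the policy class.
            PolicySketch policy = clazz.getDeclaredConstructor().newInstance();
            policy.init(Arrays.copyOfRange(parts, 1, parts.length));
            return policy;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("New delete policy `" + parts[0] + "` failed", e);
        }
    }
}
```

After PolicyFactorySketch.register("discard", DiscardSketch.class), the configuration string "discard,168" yields a DiscardSketch initialized with 168 hours.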
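How are DeletePolicy and MessageStore brought together? The glue is MessageStoreManager. It is the steward of the storage module: it handles contact with the other modules and manages every MessageStore together with its deletion policy. MessageStoreManager is another class worth analyzing carefully.
- private final ConcurrentHashMap<String/* topic */, ConcurrentHashMap<Integer/* partition */, MessageStore>> stores = new ConcurrentHashMap<String, ConcurrentHashMap<Integer, MessageStore>>();
- // As described earlier, a topic has multiple partitions and each partition corresponds to one MessageStore instance; partitions are identified by number, and stores is organized accordingly
- private final MetaConfig metaConfig;
- // Configuration
- private ScheduledThreadPoolExecutor scheduledExecutorService;// =
- // Executors.newScheduledThreadPool(2);
- // Scheduling service that flushes the MessageStore instances, forcing data to disk
- private final DeletePolicy deletePolicy;
- // Deletion policy selector. One policy per topic is used here rather than one policy instance per MessageStore; a policy instance is shared by the MessageStore instances of the same topic
- private DeletePolicySelector deletePolicySelector;
- public static final int HALF_DAY = 1000 * 60 * 60 * 12;
- // Set of topic patterns
- private final Set<Pattern> topicsPatSet = new HashSet<Pattern>();
- private final ConcurrentHashMap<Integer, ScheduledFuture<?>> unflushIntervalMap = new ConcurrentHashMap<Integer, ScheduledFuture<?>>();
- // As described earlier, MessageStore has two flush modes, threshold-based and timed;
- // unflushIntervalMap holds the timed flush tasks
- private Scheduler scheduler;
- // Scheduler used to run the deletion jobs on a timer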
- public MessageStoreManager(final MetaConfig metaConfig, final DeletePolicy deletePolicy) {
- this.metaConfig = metaConfig;
- this.deletePolicy = deletePolicy;
- // Create the policy selector
- this.newDeletePolicySelector();
- // Add an anonymous listener on the topic list; when the list changes, rebuild the topic set and recreate the selector
- this.metaConfig.addPropertyChangeListener("topics", new PropertyChangeListener() {
- public void propertyChange(final PropertyChangeEvent evt) {
- MessageStoreManager.this.makeTopicsPatSet();
- MessageStoreManager.this.newDeletePolicySelector();
- }
- });
- // Add an anonymous listener on unflushInterval; when it changes, reschedule the flush tasks
- this.metaConfig.addPropertyChangeListener("unflushInterval", new PropertyChangeListener() {
- public void propertyChange(final PropertyChangeEvent evt) {
- MessageStoreManager.this.scheduleFlushTask();
- }
- });
- this.makeTopicsPatSet();
- // Initialize the scheduler
- this.initScheduler();
- // Timed flush; the method is documented in detail by its author, so it is not explained here
- this.scheduleFlushTask();
- }
MessageStoreManager implements the Service interface: init is called at startup and dispose at shutdown.
- public void init() {
- // Load and validate existing data
- try {
- this.loadMessageStores(this.metaConfig);
- } catch (final IOException e) {
- log.error("load message stores failed", e);
- throw new MetamorphosisServerStartupException("Initilize message store manager failed", e);
- } catch (InterruptedException e) {
- Thread.currentThread().interrupt();
- }
- this.startScheduleDeleteJobs();
- }
- //
- private Set<File> getDataDirSet(final MetaConfig metaConfig) throws IOException {
- final Set<String> paths = new HashSet<String>();
- // public data path
- // (the shared data directory)
- paths.add(metaConfig.getDataPath());
- // topic data path
- // (per-topic data directories)
- for (final String topic : metaConfig.getTopics()) {
- final TopicConfig topicConfig = metaConfig.getTopicConfig(topic);
- if (topicConfig != null) {
- paths.add(topicConfig.getDataPath());
- }
- }
- final Set<File> fileSet = new HashSet<File>();
- for (final String path : paths) {
- // Verify that the data directory exists
- fileSet.add(this.getDataDir(path));
- }
- return fileSet;
- }
- private void loadMessageStores(final MetaConfig metaConfig) throws IOException, InterruptedException {
- // Get the list of data directories, then load the data under each one
- for (final File dir : this.getDataDirSet(metaConfig)) {
- this.loadDataDir(metaConfig, dir);
- }
- }
- private void loadDataDir(final MetaConfig metaConfig, final File dir) throws IOException, InterruptedException {
- log.warn("Begin to scan data path:" + dir.getAbsolutePath());
- final long start = System.currentTimeMillis();
- final File[] ls = dir.listFiles();
- int nThreads = Runtime.getRuntime().availableProcessors() + 1;
- ExecutorService executor = Executors.newFixedThreadPool(nThreads);
- int count = 0;
- // Wrap the loading and validation of each partition's data into a task
- List<Callable<MessageStore>> tasks = new ArrayList<Callable<MessageStore>>();
- for (final File subDir : ls) {
- if (!subDir.isDirectory()) {
- log.warn("Ignore not directory path:" + subDir.getAbsolutePath());
- } else {
- final String name = subDir.getName();
- final int index = name.lastIndexOf('-');
- if (index < 0) {
- log.warn("Ignore invlaid directory:" + subDir.getAbsolutePath());
- continue;
- }
- // Wrap as a task
- tasks.add(new Callable<MessageStore>() {
- // The call method does the actual loading and validation of the partition data
- @Override
- public MessageStore call() throws Exception {
- log.warn("Loading data directory:" + subDir.getAbsolutePath() + "...");
- final String topic = name.substring(0, index);
- final int partition = Integer.parseInt(name.substring(index + 1));
- // Constructing a MessageStore loads and validates its data automatically; the deletion policy for the topic is selected and passed in at construction time
- final MessageStore messageStore = new MessageStore(topic, partition, metaConfig,
- MessageStoreManager.this.deletePolicySelector.select(topic, MessageStoreManager.this.deletePolicy));
- return messageStore;
- }
- });
- count++;
- if (count % nThreads == 0 || count == ls.length) {
- // If parallel loading is configured, load in parallel
- if (metaConfig.isLoadMessageStoresInParallel()) {
- this.loadStoresInParallel(executor, tasks);
- } else {
- // Load and validate the data serially
- this.loadStores(tasks);
- }
- }
- }
- }
- executor.shutdownNow();
- log.warn("End to scan data path in " + (System.currentTimeMillis() - start) / 1000 + " secs");
- }
One thing init does is load and validate existing data, which can be done in two ways: serially or in parallel.
- // Serial loading does the work on the main thread. It is slower, but it does not scramble the order of the log output
- private void loadStores(List<Callable<MessageStore>> tasks) throws IOException, InterruptedException {
- for (Callable<MessageStore> task : tasks) {
- MessageStore messageStore;
- try {
- messageStore = task.call();
- ConcurrentHashMap<Integer/* partition */, MessageStore> map = this.stores.get(messageStore.getTopic());
- if (map == null) {
- map = new ConcurrentHashMap<Integer, MessageStore>();
- this.stores.put(messageStore.getTopic(), map);
- }
- map.put(messageStore.getPartition(), messageStore);
- } catch (IOException e) {
- throw e;
- } catch (InterruptedException e) {
- throw e;
- } catch (Exception e) {
- throw new IllegalStateException(e);
- }
- }
- tasks.clear();
- }
- // Parallel loading speeds up startup when there is a lot of data, but it scrambles the order of the startup log; it is disabled by default.
- private void loadStoresInParallel(ExecutorService executor, List<Callable<MessageStore>> tasks) throws InterruptedException {
- CompletionService<MessageStore> completionService = new ExecutorCompletionService<MessageStore>(executor);
- for (Callable<MessageStore> task : tasks) {
- completionService.submit(task);
- }
- for (int i = 0; i < tasks.size(); i++) {
- try {
- // Block until the next task has finished, so that all tasks are complete when the loop ends
- MessageStore messageStore = completionService.take().get();
- ConcurrentHashMap<Integer/* partition */, MessageStore> map = this.stores.get(messageStore.getTopic());
- if (map == null) {
- map = new ConcurrentHashMap<Integer, MessageStore>();
- this.stores.put(messageStore.getTopic(), map);
- }
- map.put(messageStore.getPartition(), messageStore);
- } catch (ExecutionException e) {
- throw ThreadUtils.launderThrowable(e);
- }
- }
- tasks.clear();
- }
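The submit-then-take pattern used by loadStoresInParallel can be tried out without any MetaQ classes. The sketch below is only an illustration under stated assumptions: ParallelLoadSketch, loadAll and demoTasks are hypothetical names, and the tasks return integers instead of MessageStore instances.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ParallelLoadSketch {
    /** Submits every task, then collects results in completion order and returns their sum. */
    static int loadAll(List<Callable<Integer>> tasks, int nThreads) {
        ExecutorService executor = Executors.newFixedThreadPool(nThreads);
        try {
            CompletionService<Integer> completionService =
                    new ExecutorCompletionService<Integer>(executor);
            for (Callable<Integer> task : tasks) {
                completionService.submit(task);
            }
            int total = 0;
            for (int i = 0; i < tasks.size(); i++) {
                // take() blocks until some task has finished; get() rethrows its failure, if any.
                total += completionService.take().get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    /** Hypothetical stand-ins for the per-partition load tasks: task i simply returns i. */
    static List<Callable<Integer>> demoTasks(int n) {
        List<Callable<Integer>> tasks = new ArrayList<Callable<Integer>>();
        for (int i = 1; i <= n; i++) {
            final int value = i;
            tasks.add(new Callable<Integer>() {
                @Override
                public Integer call() {
                    return value;
                }
            });
        }
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println(loadAll(demoTasks(4), 2));
    }
}
```

Because results arrive in completion order rather than submission order, any logging done while collecting them is interleaved, which matches the remark above about the startup log order.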
When MessageStoreManager is shut down, its dispose method is called to make sure all resources are released properly.
- public void dispose() {
- // Shut down the scheduler and the scheduled thread pool
- this.scheduledExecutorService.shutdown();
- if (this.scheduler != null) {
- try {
- this.scheduler.shutdown(true);
- } catch (final SchedulerException e) {
- log.error("Shutdown quartz scheduler failed", e);
- }
- }
- // Make sure every MessageStore instance is closed properly
- for (final ConcurrentHashMap<Integer/* partition */, MessageStore> subMap : MessageStoreManager.this.stores.values()) {
- if (subMap != null) {
- for (final MessageStore msgStore : subMap.values()) {
- if (msgStore != null) {
- try {
- msgStore.close();
- } catch (final Throwable e) {
- log.error("Try to run close " + msgStore.getTopic() + "," + msgStore.getPartition() + " failed", e);
- }
- }
- }
- }
- }
- // Clear the stores map
- this.stores.clear();
- }
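MessageStoreManager exposes two methods for obtaining a MessageStore: getMessageStore(final String topic, final int partition) and getOrCreateMessageStore(final String topic, final int partition) throws IOException.
getMessageStore() looks the MessageStore up in stores and returns null if it does not exist. getOrCreateMessageStore() first checks whether the topic has been configured and throws an exception if it has not; otherwise it checks whether stores already holds a MessageStore instance, returning it if so, or creating one, putting it into stores and returning it if not.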
- public MessageStore getMessageStore(final String topic, final int partition) {
- final ConcurrentHashMap<Integer/* partition */, MessageStore> map = this.stores.get(topic);
- if (map == null) {
- // If there is no MessageStore map for this topic, return null directly
- return null;
- }
- return map.get(partition);
- }
- Collection<MessageStore> getMessageStoresByTopic(final String topic) {
- final ConcurrentHashMap<Integer/* partition */, MessageStore> map = this.stores.get(topic);
- if (map == null) {
- return Collections.emptyList();
- }
- return map.values();
- }
- public MessageStore getOrCreateMessageStore(final String topic, final int partition) throws IOException {
- return this.getOrCreateMessageStoreInner(topic, partition, 0);
- }
- public MessageStore getOrCreateMessageStore(final String topic, final int partition, final long offsetIfCreate) throws IOException {
- return this.getOrCreateMessageStoreInner(topic, partition, offsetIfCreate);
- }
- private MessageStore getOrCreateMessageStoreInner(final String topic, final int partition, final long offsetIfCreate) throws IOException {
- // Check whether the topic is accepted, i.e. whether it matches a pattern in topicsPatSet
- if (!this.isLegalTopic(topic)) {
- throw new IllegalTopicException("The server do not accept topic " + topic);
- }
- // Check whether the partition number is valid
- if (partition < 0 || partition >= this.getNumPartitions(topic)) {
- log.warn("Wrong partition " + partition + ",valid partitions (0," + (this.getNumPartitions(topic) - 1) + ")");
- throw new WrongPartitionException("wrong partition " + partition);
- }
- ConcurrentHashMap<Integer/* partition */, MessageStore> map = this.stores.get(topic);
- // If no map exists for this topic yet, create one and put it into stores
- if (map == null) {
- map = new ConcurrentHashMap<Integer, MessageStore>();
- final ConcurrentHashMap<Integer/* partition */, MessageStore> oldMap = this.stores.putIfAbsent(topic, map);
- if (oldMap != null) {
- map = oldMap;
- }
- }
- // If the map already holds a MessageStore for this partition number, return it; otherwise create one and put it into the map
- MessageStore messageStore = map.get(partition);
- if (messageStore != null) {
- return messageStore;
- } else {
- // Lock on the interned topic string (a special case)
- synchronized (topic.intern()) {
- messageStore = map.get(partition);
- // double check
- if (messageStore != null) {
- return messageStore;
- }
- messageStore = new MessageStore(topic, partition, this.metaConfig, this.deletePolicySelector.select(topic, this.deletePolicy), offsetIfCreate);
- log.info("Created a new message storage for topic=" + topic + ",partition=" + partition);
- map.put(partition, messageStore);
- }
- }
- return messageStore;
- }
- boolean isLegalTopic(final String topic) {
- for (final Pattern pat : this.topicsPatSet) {
- if (pat.matcher(topic).matches()) {
- return true;
- }
- }
- return false;
- }
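The difference between the two lookup methods, and the double-checked creation inside getOrCreateMessageStoreInner, can be condensed into a small stand-alone sketch. This is an illustration, not the MetaQ implementation: StoreRegistrySketch and Store are hypothetical names, the topic and partition validation is left out, and the sketch locks the per-topic map rather than topic.intern(), which is a deliberate substitution on my part.

```java
import java.util.concurrent.ConcurrentHashMap;

final class StoreRegistrySketch {
    /** Hypothetical stand-in for MessageStore. */
    static final class Store {
        final String topic;
        final int partition;

        Store(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }
    }

    private final ConcurrentHashMap<String, ConcurrentHashMap<Integer, Store>> stores =
            new ConcurrentHashMap<String, ConcurrentHashMap<Integer, Store>>();

    /** Pure lookup: never creates anything, returns null when nothing is registered. */
    Store get(String topic, int partition) {
        ConcurrentHashMap<Integer, Store> map = stores.get(topic);
        return map == null ? null : map.get(partition);
    }

    /** Lookup that creates the store on first use, at most once per (topic, partition). */
    Store getOrCreate(String topic, int partition) {
        ConcurrentHashMap<Integer, Store> map = stores.get(topic);
        if (map == null) {
            map = new ConcurrentHashMap<Integer, Store>();
            // putIfAbsent returns the map installed by a racing thread, if there was one.
            ConcurrentHashMap<Integer, Store> old = stores.putIfAbsent(topic, map);
            if (old != null) {
                map = old;
            }
        }
        Store store = map.get(partition);
        if (store != null) {
            return store;
        }
        synchronized (map) {
            // Double check: another thread may have created the store while we waited.
            store = map.get(partition);
            if (store == null) {
                store = new Store(topic, partition);
                map.put(partition, store);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        StoreRegistrySketch registry = new StoreRegistrySketch();
        System.out.println(registry.get("demo", 0) == null);
        System.out.println(registry.getOrCreate("demo", 0) == registry.getOrCreate("demo", 0));
    }
}
```

The lock matters because constructing a store is expensive and has side effects on disk; putIfAbsent alone would still allow two threads to build the same partition's store concurrently and throw one away.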
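MessageStoreManager ties the MessageStore instances and the delete policies together neatly, and it also provides periodic flushing, which improves data reliability. It is also the interface through which other modules reach the storage module.
In my view the weak spot in MessageStoreManager's design is topicsPatSet. When the topic list changes, topicsPatSet is not cleared first; the new entries are simply added. The MessageStore instances of the affected topics are not re-initialized either, so if a MessageStore instance already exists, a new delete-policy configuration does not take effect. My suggestion is to re-initialize the whole storage module whenever the topic list changes, to keep things consistent.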
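The Broker receives messages sent by Producers (on the Client side) and returns messages to Consumers (also Clients). For the Broker this all comes down to handling network input and output streams.
The Broker uses gecko, an NIO framework developed inside Taobao, as its network transport. gecko supports the following features:
1. Custom protocols; protocols are extensible, compact and efficient
2. Automatic reconnection management; reconnection is initiated by the client
3. Heartbeat checks, so that failed connections are detected promptly
4. A request/response model that supports both synchronous and asynchronous calls
5. Grouped connection management, with correct handling of connection groups across reconnects
6. Four models for sending requests: (1) to a single connection; (2) to one connection within a group, chosen by a definable selection strategy; (3) to all connections within a group; (4) to several groups, where the request to each group follows (2)
7. A programming model that is as simple, easy to use and transparent as possible, hiding the complexity of network handling from upper-layer code
8. Highly configurable, including network parameters and service-layer parameters
9. Highly reliable: network errors are handled correctly and there are safeguards against hazards such as memory leaks
10. Extensible
Time permitting, I may also write up an analysis of the gecko source.
Because the network module is tightly coupled with the other modules and, unlike the storage module, cannot be analyzed in isolation, from this article on I look at the Broker as a whole.
First, the Broker's startup class, MetamorphosisStartup: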
- public static void main(final String[] args) {
- final String configFilePath = getConfigFilePath(args);
- final MetaConfig metaConfig = getMetaConfig(configFilePath);
- final MetaMorphosisBroker server = new MetaMorphosisBroker(metaConfig);
- server.start();
- }
MetamorphosisStartup first loads the configuration file, then constructs a MetaMorphosisBroker instance and calls its start method. MetaMorphosisBroker is the Broker's real startup class.
MetaMorphosisBroker implements the MetaMorphosisBrokerMBean interface, so it can be shut down over JMX. Let us see what happens when a MetaMorphosisBroker instance is constructed.
- public MetaMorphosisBroker(final MetaConfig metaConfig) {
- // Configuration
- this.metaConfig = metaConfig;
- // The NIO server that the Broker exposes
- this.remotingServer = newRemotingServer(metaConfig);
- // Thread pool manager; lets the NIO server process requests on multiple threads under concurrency to improve performance
- this.executorsManager = new ExecutorsManager(metaConfig);
- // Globally unique id generator
- this.idWorker = new IdWorker(metaConfig.getBrokerId());
- // Storage module manager
- this.storeManager = new MessageStoreManager(metaConfig, this.newDeletePolicy(metaConfig));
- // Statistics module manager
- this.statsManager = new StatsManager(this.metaConfig, this.storeManager, this.remotingServer);
- // zookeeper client; as described earlier, metaq uses zookeeper as the coordinator: the Broker registers itself in zookeeper and also queries data from it
- this.brokerZooKeeper = new BrokerZooKeeper(metaConfig);
- // Processor for network input/output streams
- final BrokerCommandProcessor next = new BrokerCommandProcessor(this.storeManager, this.executorsManager, this.statsManager, this.remotingServer, metaConfig, this.idWorker, this.brokerZooKeeper);
- // Transaction storage engine
- JournalTransactionStore transactionStore = null;
- try {
- transactionStore = new JournalTransactionStore(metaConfig.getDataLogPath(), this.storeManager, metaConfig);
- } catch (final Exception e) {
- throw new MetamorphosisServerStartupException("Initializing transaction store failed.", e);
- }
- // Transaction-aware processor for network input/output streams; the design follows the chain-of-responsibility pattern and uses the transaction storage engine to store intermediate results
- this.brokerProcessor = new TransactionalCommandProcessor(metaConfig, this.storeManager, this.idWorker, next, transactionStore, this.statsManager);
- // JVM shutdown hook; it does its best to shut MetaMorphosisBroker down cleanly when the JVM exits
- this.shutdownHook = new ShutdownHook();
- // Register the hook
- Runtime.getRuntime().addShutdownHook(this.shutdownHook);
- // Register the MBean; because MetaMorphosisBroker implements MetaMorphosisBrokerMBean, it can register itself with the MBeanServer
- MetaMBeanServer.registMBean(this, null);
- }
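The constructor wires TransactionalCommandProcessor in front of BrokerCommandProcessor by passing the latter in as next. The shape of that chain can be sketched in a few lines. Everything here is hypothetical and simplified (string commands, a made-up "tx:" prefix, invented class names); it only illustrates the delegation pattern and says nothing about how MetaQ actually decides what to journal.

```java
final class ProcessorChainSketch {
    /** Hypothetical, string-based stand-in for CommandProcessor. */
    interface Processor {
        String process(String command);
    }

    /** End of the chain: handles every command directly. */
    static final class PlainProcessor implements Processor {
        @Override
        public String process(String command) {
            return "stored:" + command;
        }
    }

    /** Front of the chain: handles transactional commands itself and delegates the rest. */
    static final class TransactionalProcessor implements Processor {
        private final Processor next;

        TransactionalProcessor(Processor next) {
            this.next = next;
        }

        @Override
        public String process(String command) {
            if (command.startsWith("tx:")) {
                // In this sketch "journaled" marks commands that the transactional link took over.
                return "journaled:" + command.substring(3);
            }
            return next.process(command);
        }
    }

    public static void main(String[] args) {
        Processor chain = new TransactionalProcessor(new PlainProcessor());
        System.out.println(chain.process("tx:put a"));
        System.out.println(chain.process("put b"));
    }
}
```

Callers only ever hold a reference to the head of the chain, which is why every Processor registered later receives brokerProcessor rather than next.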
As we saw, startup calls MetaMorphosisBroker's start() method. Here is what start() actually does:
- public synchronized void start() {
- // If the broker is already running, do not start it again
- if (!this.shutdown) {
- return;
- }
- this.shutdown = false;
- // Initialize the storage module, loading and verifying existing data
- this.storeManager.init();
- // Initialize the thread pools
- this.executorsManager.init();
- // Initialize the statistics module
- this.statsManager.init();
- // Register the processors with the NIO server
- this.registerProcessors();
- try {
- // Start the NIO server
- this.remotingServer.start();
- } catch (final NotifyRemotingException e) {
- throw new MetamorphosisServerStartupException("start remoting server failed", e);
- }
- try {
- // Create an ephemeral node under /brokers/ids, named after the broker id
- this.brokerZooKeeper.registerBrokerInZk();
- // If this is the master node, create the /brokers/ids/master_config_checksum node
- this.brokerZooKeeper.registerMasterConfigFileChecksumInZk();
- // Add a topic-list listener; when the topic list changes, topic and partition information is re-registered with zookeeper
- this.addTopicsChangeListener();
- // Register topic and partition information
- this.registerTopicsInZk();
- // Flag that topic and partition registration succeeded
- this.registerZkSuccess = true;
- } catch (final Exception e) {
- this.registerZkSuccess = false;
- throw new MetamorphosisServerStartupException("Register broker to zk failed", e);
- }
- log.info("Starting metamorphosis server...");
- // Initialize the input/output stream processor
- this.brokerProcessor.init();
- log.info("Start metamorphosis server successfully");
- }
Next, let us look at each MetaMorphosisBroker method that start() calls, beginning with registerProcessors():
- private void registerProcessors() {
- // Register the processor for Get commands
- this.remotingServer.registerProcessor(GetCommand.class, new GetProcessor(this.brokerProcessor, this.executorsManager.getGetExecutor()));
- // Register the processor for Put commands
- this.remotingServer.registerProcessor(PutCommand.class, new PutProcessor(this.brokerProcessor, this.executorsManager.getUnOrderedPutExecutor()));
- // Processor for querying the nearest valid offset
- this.remotingServer.registerProcessor(OffsetCommand.class, new OffsetProcessor(this.brokerProcessor, this.executorsManager.getGetExecutor()));
- // Heartbeat processor
- this.remotingServer.registerProcessor(HeartBeatRequestCommand.class, new VersionProcessor(this.brokerProcessor));
- // Register the processor for quit commands
- this.remotingServer.registerProcessor(QuitCommand.class, new QuitProcessor(this.brokerProcessor));
- // Register the processor for statistics queries
- this.remotingServer.registerProcessor(StatsCommand.class, new StatsProcessor(this.brokerProcessor));
- // Register the processor for transaction commands
- this.remotingServer.registerProcessor(TransactionCommand.class, new TransactionProcessor(this.brokerProcessor, this.executorsManager.getUnOrderedPutExecutor()));
- }
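Registering one processor per command class amounts to a type-keyed dispatch table. The sketch below shows that idea under loud assumptions: it is not Gecko's implementation, all names (ProcessorRegistrySketch, Command, Handler, GetRequest, PutRequest) are hypothetical, and it matches on the exact runtime class, whereas the real registration above also uses an interface (HeartBeatRequestCommand) as a key.

```java
import java.util.HashMap;
import java.util.Map;

final class ProcessorRegistrySketch {
    /** Hypothetical marker type for decoded commands. */
    interface Command {
    }

    static final class GetRequest implements Command {
    }

    static final class PutRequest implements Command {
    }

    /** Hypothetical handler type; returns a string so the result is easy to inspect. */
    interface Handler<T extends Command> {
        String handle(T command);
    }

    private final Map<Class<? extends Command>, Handler<? extends Command>> handlers =
            new HashMap<Class<? extends Command>, Handler<? extends Command>>();

    <T extends Command> void register(Class<T> type, Handler<T> handler) {
        handlers.put(type, handler);
    }

    @SuppressWarnings("unchecked")
    <T extends Command> String dispatch(T command) {
        // Safe in practice: register() only ever pairs a Class<T> with a Handler<T>.
        Handler<T> handler = (Handler<T>) handlers.get(command.getClass());
        if (handler == null) {
            return "no processor";
        }
        return handler.handle(command);
    }

    public static void main(String[] args) {
        ProcessorRegistrySketch registry = new ProcessorRegistrySketch();
        registry.register(GetRequest.class, new Handler<GetRequest>() {
            @Override
            public String handle(GetRequest command) {
                return "get handled";
            }
        });
        System.out.println(registry.dispatch(new GetRequest()));
        System.out.println(registry.dispatch(new PutRequest()));
    }
}
```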
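With these processors in place, each kind of request can be handled and answered. Next comes the addTopicsChangeListener() method.
- // addTopicsChangeListener is simple: it watches the configured topic list for changes. As described earlier, MetaConfig provides a listener mechanism for topic-list changes; this method registers an anonymous listener with MetaConfig that re-registers with zookeeper whenever the list changes
- private void addTopicsChangeListener() {
- // Listen for changes to the topics list and register them in zk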
- this.metaConfig.addPropertyChangeListener("topics", new PropertyChangeListener() {
- public void propertyChange(final PropertyChangeEvent evt) {
- try {
- MetaMorphosisBroker.this.registerTopicsInZk();
- } catch (final Exception e) {
- log.error("Register topic in zk failed", e);
- }
- }
- });
- }
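The listener registration above uses the standard java.beans event types. A minimal sketch of the publishing side follows, assuming a PropertyChangeSupport-based implementation; TopicsConfigSketch is a hypothetical stand-in, and whether MetaConfig is built exactly this way is an assumption on my part.

```java
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TopicsConfigSketch {
    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private List<String> topics = new ArrayList<String>();

    void addPropertyChangeListener(String propertyName, PropertyChangeListener listener) {
        support.addPropertyChangeListener(propertyName, listener);
    }

    void setTopics(List<String> newTopics) {
        List<String> old = this.topics;
        this.topics = new ArrayList<String>(newTopics);
        // PropertyChangeSupport fires no event when old and new values are equal and non-null.
        support.firePropertyChange("topics", old, this.topics);
    }

    List<String> getTopics() {
        return topics;
    }

    public static void main(String[] args) {
        TopicsConfigSketch config = new TopicsConfigSketch();
        config.addPropertyChangeListener("topics", new PropertyChangeListener() {
            @Override
            public void propertyChange(PropertyChangeEvent evt) {
                // This is where the broker would re-register its topics.
                System.out.println("topics changed to " + evt.getNewValue());
            }
        });
        config.setTopics(Arrays.asList("orders", "logs"));
    }
}
```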
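One more method is called during MetaMorphosisBroker startup: registerTopicsInZk(), which registers topic and partition information with zookeeper. Before analyzing it, it is worth a detour into the structure the Broker registers in zk. The code is in the MetaZookeeper class of the common project, and the structure is shared by Broker and Client.
There are four kinds of root directory in zk:
1) /consumers: holds the consumer list and the consumption records. Consumers are organized by group; the main structure is:
/consumers/xxGroup/ids/xxConsumerId:DATA (the DATA after ":" is the data stored on node xxConsumerId). A consumer id within the group; DATA is the list of subscribed topics, separated by ","
/consumers/xxGroup/offsets/xxTopic/partitionN:DATA. The consumption progress of partition N of the topic within the group; DATA is the actual progress value for partition N of the topic
/consumers/xxGroup/owners/xxTopic/partitionN:DATA. The consumer of partition N of the topic within the group; DATA is the consumer id, meaning that the data of partition N under xxTopic is consumed by that consumer
2) /brokers/ids: holds the Broker list. If a Broker loses its connection to Zookeeper, its record under /brokers/ids is removed automatically. Example:
/brokers/ids/xxBroker
3) /brokers/topics-pub: holds the list of published topics and, for each, the Brokers that messages can be sent to. Example:
/brokers/topics-pub/xxTopic/xxBroker
/brokers/topics-pub records the Brokers that accept messages for xxTopic, i.e. how many Brokers are allowed to store data that Clients send to the topic
4) /brokers/topics-sub: holds the list of subscribable topics and, for each, the Brokers that can be subscribed to. Example:
/brokers/topics-sub/xxTopic/xxBroker
/brokers/topics-sub records the Brokers from which xxTopic can be subscribed, i.e. how many Brokers allow Clients to subscribe to the topic's data
The code is as follows: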
- public MetaZookeeper(final ZkClient zkClient, final String root) {
- // zk client
- this.zkClient = zkClient;
- // Root path, empty by default
- this.metaRoot = this.normalize(root);
- // The consumer list described above
- this.consumersPath = this.metaRoot + "/consumers";
- // The broker list described above
- this.brokerIdsPath = this.metaRoot + "/brokers/ids";
- // /brokers/topics-pub as described above
- this.brokerTopicsPubPath = this.metaRoot + "/brokers/topics-pub";
- // /brokers/topics-sub as described above
- this.brokerTopicsSubPath = this.metaRoot + "/brokers/topics-sub";
- }
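To make the layout concrete, here is a minimal sketch that builds the paths listed above from a root. It is an illustration only: ZkPathsSketch and its helper methods are hypothetical, the normalize shown here is my own guess at a reasonable normalization (the behaviour of MetaZookeeper's normalize is not shown in this article), and the partition is passed as an opaque string because the exact node naming is not covered here.

```java
final class ZkPathsSketch {
    final String consumersPath;
    final String brokerIdsPath;
    final String brokerTopicsPubPath;
    final String brokerTopicsSubPath;

    ZkPathsSketch(String root) {
        String metaRoot = normalize(root);
        this.consumersPath = metaRoot + "/consumers";
        this.brokerIdsPath = metaRoot + "/brokers/ids";
        this.brokerTopicsPubPath = metaRoot + "/brokers/topics-pub";
        this.brokerTopicsSubPath = metaRoot + "/brokers/topics-sub";
    }

    /** Assumed normalization: null or blank becomes "", otherwise one leading '/' and no trailing '/'. */
    static String normalize(String root) {
        if (root == null) {
            return "";
        }
        String r = root.trim();
        while (r.endsWith("/")) {
            r = r.substring(0, r.length() - 1);
        }
        if (r.isEmpty()) {
            return "";
        }
        return r.startsWith("/") ? r : "/" + r;
    }

    String consumerIdPath(String group, String consumerId) {
        return consumersPath + "/" + group + "/ids/" + consumerId;
    }

    String offsetPath(String group, String topic, String partition) {
        return consumersPath + "/" + group + "/offsets/" + topic + "/" + partition;
    }

    String ownerPath(String group, String topic, String partition) {
        return consumersPath + "/" + group + "/owners/" + topic + "/" + partition;
    }

    public static void main(String[] args) {
        ZkPathsSketch paths = new ZkPathsSketch("meta");
        System.out.println(paths.brokerIdsPath);
        System.out.println(paths.offsetPath("xxGroup", "xxTopic", "0"));
    }
}
```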
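The more involved details will be analyzed later; for now it is enough to understand this storage structure.
Back to the main thread: registerTopicsInZk registers topic and partition information with zookeeper.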
- private void registerTopicsInZk() throws Exception {
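- // First register the configured topics with zookeeper
- for (final String topic : this.metaConfig.getTopics()) {
- this.brokerZooKeeper.registerTopicInZk(topic, true);
- }
- // Then register the topics that were loaded from disk
- // As the code below shows, a topic that is no longer configured but was configured earlier and still has messages is registered with zk anyway. To some extent I consider this a poor design. Why?
- // Answer: MessageStoreManager, analyzed earlier, has getMessageStore() and getOrCreateMessageStore(). getMessageStore() does not check whether its topic argument is in topicsPatSet (which contains only the configured topics), whereas getOrCreateMessageStore() does. So asking getOrCreateMessageStore() for a MessageStore whose topic is not in topicsPatSet throws an exception, while getMessageStore() does not, which is confusing. In my view, if configuration changes are to be hot-reloaded, unloading first and then reloading would be more appropriate, and both getOrCreateMessageStore() and getMessageStore() should check topicsPatSet, for consistency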
- for (final String topic : this.storeManager.getMessageStores().keySet()) {
- this.brokerZooKeeper.registerTopicInZk(topic, true);
- }
- }
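MetaMorphosisBroker has two more methods: newDeletePolicy() and stop(). newDeletePolicy() creates the global delete policy for the storage module; this global policy is the one used when no more specific delete policy has been configured.
- // Global delete policy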
- private DeletePolicy newDeletePolicy(final MetaConfig metaConfig) {
- final String deletePolicy = metaConfig.getDeletePolicy();
- if (deletePolicy != null) {
- return DeletePolicyFactory.getDeletePolicy(deletePolicy);
- }
- return null;
- }
The stop() method releases resources when MetaMorphosisBroker shuts down, doing its best to make sure MetaQ closes cleanly.
- public synchronized void stop() {
- // If the broker is already shut down, do nothing
- if (this.shutdown) {
- return;
- }
- log.info("Stopping metamorphosis server...");
- this.shutdown = true;
- // Close the zk connection and unregister the configuration related to this node
- this.brokerZooKeeper.close(this.registerZkSuccess);
- try {
- // Waiting for zookeeper to notify clients.
- Thread.sleep(this.brokerZooKeeper.getZkConfig().zkSyncTimeMs);
- } catch (InterruptedException e) {
- // ignore
- }
- // Release the thread pools
- this.executorsManager.dispose();
- // Release the storage module
- this.storeManager.dispose();
- // Release the statistics module
- this.statsManager.dispose();
- // Stop the NIO server
- try {
- this.remotingServer.stop();
- } catch (final NotifyRemotingException e) {
- log.error("Shutdown remoting server failed", e);
- }
- // Release the input/output stream processor
- this.brokerProcessor.dispose();
- // Stop the embedded zk server, if one was started
- EmbedZookeeperServer.getInstance().stop();
- // Remove the shutdown hook
- if (!this.runShutdownHook && this.shutdownHook != null) {
- Runtime.getRuntime().removeShutdownHook(this.shutdownHook);
- }
- log.info("Stop metamorphosis server successfully");
- }
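The interplay of the shutdown flag and the JVM shutdown hook in start() and stop() can be reduced to a short sketch. LifecycleSketch is a hypothetical class; in particular, the assumption that the hook sets runShutdownHook before calling stop() is mine, since the ShutdownHook class itself is not shown in this article.

```java
final class LifecycleSketch {
    private boolean shutdown = true;
    private volatile boolean runShutdownHook = false;
    private final Thread shutdownHook;
    int starts = 0;
    int stops = 0;

    LifecycleSketch() {
        this.shutdownHook = new Thread(new Runnable() {
            @Override
            public void run() {
                // Assumed behaviour: mark that the JVM is exiting, then stop normally.
                runShutdownHook = true;
                stop();
            }
        });
        Runtime.getRuntime().addShutdownHook(this.shutdownHook);
    }

    synchronized void start() {
        if (!this.shutdown) {
            return; // already running
        }
        this.shutdown = false;
        this.starts++;
    }

    synchronized void stop() {
        if (this.shutdown) {
            return; // already stopped
        }
        this.shutdown = true;
        this.stops++;
        // Removing a hook while the JVM is shutting down throws IllegalStateException,
        // so the hook is only removed when stop() was called by the application itself.
        if (!this.runShutdownHook) {
            Runtime.getRuntime().removeShutdownHook(this.shutdownHook);
        }
    }

    public static void main(String[] args) {
        LifecycleSketch broker = new LifecycleSketch();
        broker.start();
        broker.start();
        broker.stop();
        broker.stop();
        System.out.println(broker.starts + " start, " + broker.stops + " stop");
    }
}
```

Both guards make the methods idempotent: a second start() or stop() is a no-op, which is what lets the hook and an explicit stop() coexist.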
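As mentioned, MetaQ uses gecko as its network transport, and Gecko organizes traffic as requests and responses. MetaQ defines its request and response commands accordingly; because both Client and Broker need them, they live in MetaEncodeCommand in the common project:
- public String GET_CMD = "get"; // request to fetch data
- public String RESULT_CMD = "result"; // result response (carries no messages)
- public String OFFSET_CMD = "offset"; // request for the nearest valid offset
- public String PUT_CMD = "put"; // request to send a message
- public String SYNC_CMD = "sync"; // request to synchronize data
- public String QUIT_CMD = "quit"; // quit request; after the client sends this command, the server closes the connection
- public String VERSION_CMD = "version"; // request for the server version, also used as the heartbeat
- public String STATS_CMD = "stats"; // request for statistics
- public String TRANS_CMD = "transaction"; // transaction request
- // There is one more response with no header constant defined here: the returned message set
When analyzing the Broker startup class MetaMorphosisBroker we looked at registerProcessors(): the Broker registers a different processor for each kind of request (see that method for details). MetaQ's transport uses a text protocol, which makes it very transparent; MetaEncodeCommand defines the request types.
Broker commands are divided into requests and responses. First, the class diagram for requests:
Note: because of a problem with the diagramming tool, the implementation relationship between AbstractRequestCommand and RequestCommand was drawn as a dependency. Please read it as an implementation relationship, and read later class diagrams the same way (a dependency arrow between two interfaces, or between two classes, really does denote a dependency).
All request commands extend AbstractRequestCommand and implement the RequestCommand interface. RequestCommand is defined by Gecko; once encoded, every command can be transported by Gecko, and the wire protocol is a transparent text protocol. AbstractRequestCommand defines the basic attributes topic and opaque.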
- public abstract class AbstractRequestCommand implements RequestCommand, MetaEncodeCommand {
- private Integer opaque; // Identifies the request; it is echoed back in the response so the client can tell which request a response belongs to
- private String topic;
- }
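The class diagram for response commands is as follows:
Response commands fall into just two kinds: responses that carry a message set (DataCommand) and responses that carry any other result (BooleanCommand). The messages carried in a DataCommand have the same format as the on-disk storage structure. This raises the Broker's throughput, because parsing and validating messages is left to the Client, making full use of the Client's computing power. The awkward part is that every other kind of result has to go through BooleanCommand, which has only the attributes code and message. code returns the response status code, so something like a statistics result has to be carried in the message attribute. The expressiveness of such responses is therefore limited, and everything must first be converted to a string. The details:
- /**
- * Acknowledgement command; the wire format is: result code length opaque\r\n message
- */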
- public class BooleanCommand extends AbstractResponseCommand implements BooleanAckCommand {
- private String message;
- /**
- * status code in http protocol
- */
- private final int code; // response status code
- public BooleanCommand(final int code, final String message, final Integer opaque) {
- super(opaque);
- this.code = code;
- switch (this.code) {
- case HttpStatus.Success:
- this.setResponseStatus(ResponseStatus.NO_ERROR);
- break;
- default:
- this.setResponseStatus(ResponseStatus.ERROR);
- break;
- }
- this.message = message;
- }
- public String getErrorMsg() {
- return this.message;
- }
- public int getCode() {
- return this.code;
- }
- public void setErrorMsg(final String errorMsg) {
- this.message = errorMsg;
- }
- public IoBuffer encode() {
- // Encode the result so that it can be sent over the network
- final byte[] bytes = ByteUtils.getBytes(this.message);
- final int messageLen = bytes == null ? 0 : bytes.length;
- final IoBuffer buffer = IoBuffer.allocate(11 + ByteUtils.stringSize(this.code)
- + ByteUtils.stringSize(this.getOpaque()) + ByteUtils.stringSize(messageLen) + messageLen);
- ByteUtils.setArguments(buffer, MetaEncodeCommand.RESULT_CMD, this.code, messageLen, this.getOpaque());
- if (bytes != null) {
- buffer.put(bytes);
- }
- buffer.flip();
- return buffer;
- }
- }
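The wire format in the Javadoc above, result code length opaque\r\n message, also explains the constant 11 in encode(): "result" is 6 bytes, the three separating spaces add 3, and "\r\n" adds 2. The sketch below reproduces that layout with plain JDK classes. ResultLineSketch is a hypothetical name, and the use of UTF-8 for the message bytes is an assumption, since ByteUtils is not shown here.

```java
import java.nio.charset.StandardCharsets;

final class ResultLineSketch {
    /** Builds "result <code> <bodyLength> <opaque>\r\n<body>" as raw bytes. */
    static byte[] encode(int code, String message, int opaque) {
        byte[] body = message == null ? new byte[0] : message.getBytes(StandardCharsets.UTF_8);
        // The length field counts body BYTES, not characters, so the receiver knows how much to read.
        String head = "result " + code + " " + body.length + " " + opaque + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[headBytes.length + body.length];
        System.arraycopy(headBytes, 0, out, 0, headBytes.length);
        System.arraycopy(body, 0, out, headBytes.length, body.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = encode(200, "ok", 7);
        System.out.println(new String(bytes, StandardCharsets.UTF_8).replace("\r\n", "\\r\\n"));
    }
}
```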
As mentioned, every request carries an opaque attribute that is echoed back in the response. AbstractResponseCommand defines that echoed opaque attribute.
- /**
- * Base class for response commands
- */
- public abstract class AbstractResponseCommand implements ResponseCommand, MetaEncodeCommand {
- private Integer opaque;
- private InetSocketAddress responseHost; // I have not yet found where responseHost and responseTime are used; presumably the author reserved them
- private long responseTime;
- private ResponseStatus responseStatus; // response status
- }
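Readers who have studied the MetaQ source may have noticed that DataCommand's encode() is an empty implementation. The author left a comment explaining it, and the Broker does run without problems, but from a design point of view it is still a small wart. The code: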
- public class DataCommand extends AbstractResponseCommand {
- private final byte[] data;
- ……
- @Override
- public IoBuffer encode() {
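- // Original author's comment: do nothing; sending a data command is done by transferTo instead
- // My comment: because the Broker uses the zeroCopy mechanism in BrokerCommandProcessor, this encode method is never called. But if zeroCopy is later made configurable and this method is still unimplemented, decoding will break: the configuration switch is easy to add, and this spot is easy to forget.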
- return null;
- }
- }
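The transferTo mentioned in the author's comment is FileChannel.transferTo from java.nio, which lets the operating system move file bytes to the target channel without staging them in a user-space buffer when the target supports it. The sketch below shows the call pattern only. TransferToSketch and its methods are hypothetical names, and because the demo target wraps a ByteArrayOutputStream rather than a socket, no real zero-copy takes place in it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TransferToSketch {
    /** Sends count bytes of the file, starting at offset, to the target channel. */
    static long send(Path file, long offset, long count, WritableByteChannel target) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < count) {
                // transferTo may move fewer bytes than requested, so loop until done.
                long n = channel.transferTo(offset + sent, count - sent, target);
                if (n <= 0) {
                    break; // end of file reached or target cannot accept more
                }
                sent += n;
            }
            return sent;
        }
    }

    /** Writes a small temp file and transfers a slice of it into memory. */
    static byte[] demo() {
        try {
            Path tmp = Files.createTempFile("segment", ".meta");
            try {
                Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                send(tmp, 2, 5, Channels.newChannel(sink));
                return sink.toByteArray();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new String(demo(), StandardCharsets.US_ASCII));
    }
}
```

Since the bytes go straight from the file to the channel, the command object never needs to serialize its payload, which is why an empty encode() works as long as this path is always taken.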
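Some may ask: when does Gecko call MetaEncodeCommand's encode() method to turn a command into readable text on the wire, and when does it wrap incoming network data into Command objects?
Some readers may already have noticed that, when introducing the Broker startup class MetaMorphosisBroker, I skipped one method: newRemotingServer(), which creates the Gecko server.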
- private static RemotingServer newRemotingServer(final MetaConfig metaConfig) {
- final ServerConfig serverConfig = new ServerConfig();
- serverConfig.setWireFormatType(new MetamorphosisWireFormatType()); // Registers a MetamorphosisWireFormatType instance, which is responsible for encoding and decoding Commands
- serverConfig.setPort(metaConfig.getServerPort());
- final RemotingServer server = RemotingFactory.newRemotingServer(serverConfig);
- return server;
- }
This method registers a MetamorphosisWireFormatType instance, which is responsible for encoding and decoding Commands. MetamorphosisWireFormatType extends WireFormatType.
- public class MetamorphosisWireFormatType extends WireFormatType {
- public static final String SCHEME = "meta";
- public String getScheme() {
- return SCHEME;
- }
- public String name() {
- return "metamorphosis";
- }
- public CodecFactory newCodecFactory() {
- return new MetaCodecFactory();
- }
- public CommandFactory newCommandFactory() {
- return new MetaCommandFactory();
- }
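MetamorphosisWireFormatType does not do the encoding and decoding itself; it hands that job to MetaCodecFactory. There is also the newCommandFactory() method, which is mainly used for connection heartbeat checks. Let us look at these two classes in turn, MetaCommandFactory and MetaCodecFactory; both are nested classes of MetamorphosisWireFormatType.
MetaCommandFactory, used for heartbeat checks, has two methods: createHeartBeatCommand(), which creates a heartbeat request, and createBooleanAckCommand(), which answers a heartbeat request: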
- static class MetaCommandFactory implements CommandFactory {
- public BooleanAckCommand createBooleanAckCommand(final CommandHeader request, final ResponseStatus responseStatus, final String errorMsg) {
- // Answer a heartbeat request
- int httpCode = -1;
- switch (responseStatus) {
- case NO_ERROR:
- httpCode = HttpStatus.Success;
- break;
- case THREADPOOL_BUSY:
- case NO_PROCESSOR:
- httpCode = HttpStatus.ServiceUnavilable;
- break;
- case TIMEOUT:
- httpCode = HttpStatus.GatewayTimeout;
- break;
- default:
- httpCode = HttpStatus.InternalServerError;
- break;
- }
- return new BooleanCommand(httpCode, errorMsg, request.getOpaque());
- }
- public HeartBeatRequestCommand createHeartBeatCommand() {
- // As mentioned earlier, VersionCommand doubles as the heartbeat; this is where it is used
- return new VersionCommand(OpaqueGenerator.getNextOpaque());
- }
- }
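MetaCodecFactory is the most important class in MetaQ's network transport, on both Broker and Client, since both need to encode and decode. It is responsible for encoding and decoding commands. It must implement the CodecFactory interface defined by Gecko so that Gecko can use it; CodecFactory defines just two methods, which return the encoder and the decoder. (Because both Client and Broker use MetamorphosisWireFormatType, it lives in the common project.)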
- static class MetaCodecFactory implements CodecFactory {
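- // Return the decoder
- @Override
- public Decoder getDecoder() {
- return new Decoder() {
- // Gecko calls this method at the appropriate time, passing the received data in the buff argument;
- // the implementation parses the contents of buff and wraps them in the matching Command type
- public Object decode(final IoBuffer buff, final Session session) {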
- if (buff == null || !buff.hasRemaining()) {
- return null;
- }
- buff.mark();
- // Find the first {'\r', '\n'}, i.e. the end of the command line (excluding data); among the request commands, currently only PutCommand and SyncCommand carry a data part, all others consist of the command line alone
- final int index = LINE_MATCHER.matchFirst(buff);
- if (index >= 0) {
- // Read the command line
- final byte[] bytes = new byte[index - buff.position()];
- buff.get(bytes);
- // Skip \r\n
- buff.position(buff.position() + 2);
- // Convert the command bytes to a string
- final String line = ByteUtils.getString(bytes);
- if (log.isDebugEnabled()) {
- log.debug("Receive command:" + line);
- }
- // Split the line on spaces
- final String[] sa = SPLITER.split(line);
- if (sa == null || sa.length == 0) {
- throw new MetaCodecException("Blank command line.");
- }
- // Dispatch on the first letter of the command
- final byte op = (byte) sa[0].charAt(0);
- switch (op) {
- case 'p':
- // 'p' means a put command (see the command names defined in MetaEncodeCommand); parse it as put. The exact format is documented in the comments of each command's implementation class, and partly in the comments of the methods below
- return this.decodePut(buff, sa);
- case 'g':
- // 'g' means a get command
- return this.decodeGet(sa);
- case 't':
- // 't' means a transaction command
- return this.decodeTransaction(sa);
- case 'r':
- // 'r' means a result response
- return this.decodeBoolean(buff, sa);
- case 'v':
- // 'v' may be a heartbeat request or a data response, so a more detailed check is needed
- if (sa[0].equals("value")) {
- return this.decodeData(buff, sa);
- } else {
- return this.decodeVersion(sa);
- }
- case 's':
- // 's' may be a stats request or a sync request, so a more detailed check is needed
- if (sa[0].equals("stats")) {
- return this.decodeStats(sa);
- } else {
- return this.decodeSync(buff, sa);
- }
- case 'o':
- // 'o' means a request for the nearest valid offset
- return this.decodeOffset(sa);
- case 'q':
- // 'q' means a request to close the connection
- return this.decodeQuit();
- default:
- throw new MetaCodecException("Unknow command:" + line);
- }
- } else {
- return null;
- }
- }
- private Object decodeQuit() {
- return new QuitCommand();
- }
- private Object decodeVersion(final String[] sa) {
- if (sa.length >= 2) {
- return new VersionCommand(Integer.parseInt(sa[1]));
- } else {
- return new VersionCommand(Integer.MAX_VALUE);
- }
- }
- // offset topic group partition offset opaque\r\n
- private Object decodeOffset(final String[] sa) {
- this.assertCommand(sa[0], "offset");
- return new OffsetCommand(sa[1], sa[2], Integer.parseInt(sa[3]), Long.parseLong(sa[4]), Integer.parseInt(sa[5]));
- }
- // stats item opaque\r\n
- // opaque may be omitted
- private Object decodeStats(final String[] sa) {
- this.assertCommand(sa[0], "stats");
- int opaque = Integer.MAX_VALUE;
- if (sa.length >= 3) {
- opaque = Integer.parseInt(sa[2]);
- }
- String item = null;
- if (sa.length >= 2) {
- item = sa[1];
- }
- return new StatsCommand(opaque, item);
- }
- // value totalLen opaque\r\n data
- private Object decodeData(final IoBuffer buff, final String[] sa) {
- this.assertCommand(sa[0], "value");
- final int valueLen = Integer.parseInt(sa[1]);
- if (buff.remaining() < valueLen) {
- buff.reset();
- return null;
- } else {
- final byte[] data = new byte[valueLen];
- buff.get(data);
- return new DataCommand(data, Integer.parseInt(sa[2]));
- }
- }
- /**
- * result code length opaque\r\n message
- *
- * @param buff
- * @param sa
- * @return
- */
- private Object decodeBoolean(final IoBuffer buff, final String[] sa) {
- this.assertCommand(sa[0], "result");
- final int valueLen = Integer.parseInt(sa[2]);
- if (valueLen == 0) {
- return new BooleanCommand(Integer.parseInt(sa[1]), null, Integer.parseInt(sa[3]));
- } else {
- if (buff.remaining() < valueLen) {
- buff.reset();
- return null;
- } else {
- final byte[] data = new byte[valueLen];
- buff.get(data);
- return new BooleanCommand(Integer.parseInt(sa[1]), ByteUtils.getString(data), Integer.parseInt(sa[3]));
- }
- }
- }
- // get topic group partition offset maxSize opaque\r\n
- private Object decodeGet(final String[] sa) {
- this.assertCommand(sa[0], "get");
- return new GetCommand(sa[1], sa[2], Integer.parseInt(sa[3]), Long.parseLong(sa[4]), Integer.parseInt(sa[5]), Integer.parseInt(sa[6]));
- }
- // transaction key sessionId type [timeout] [unique qualifier]
- // opaque\r\n
- private Object decodeTransaction(final String[] sa) {
- this.assertCommand(sa[0], "transaction");
- final TransactionId transactionId = this.getTransactionId(sa[1]);
- final TransactionType type = TransactionType.valueOf(sa[3]);
- switch (sa.length) {
- case 7:
- // Both include timeout and unique qualifier.
- int timeout = Integer.valueOf(sa[4]);
- String uniqueQualifier = sa[5];
- TransactionInfo info = new TransactionInfo(transactionId, sa[2], type, uniqueQualifier, timeout);
- return new TransactionCommand(info, Integer.parseInt(sa[6]));
- case 6:
- // Maybe timeout or unique qualifier
- if (StringUtils.isNumeric(sa[4])) {
- timeout = Integer.valueOf(sa[4]);
- info = new TransactionInfo(transactionId, sa[2], type, null, timeout);
- return new TransactionCommand(info, Integer.parseInt(sa[5]));
- } else {
- uniqueQualifier = sa[4];
- info = new TransactionInfo(transactionId, sa[2], type, uniqueQualifier, 0);
- return new TransactionCommand(info, Integer.parseInt(sa[5]));
- }
- case 5:
- // Without timeout and unique qualifier.
- info = new TransactionInfo(transactionId, sa[2], type, null);
- return new TransactionCommand(info, Integer.parseInt(sa[4]));
- default:
- throw new MetaCodecException("Invalid transaction command:" + StringUtils.join(sa));
- }
- }
- private TransactionId getTransactionId(final String s) {
- return TransactionId.valueOf(s);
- }
- // sync topic partition value-length flag msgId
- // opaque\r\n
- private Object decodeSync(final IoBuffer buff, final String[] sa) {
- this.assertCommand(sa[0], "sync");
- final int valueLen = Integer.parseInt(sa[3]);
- if (buff.remaining() < valueLen) {
- buff.reset();
- return null;
- } else {
- final byte[] data = new byte[valueLen];
- buff.get(data);
- switch (sa.length) {
- case 7:
- // old master before 1.4.4
- return new SyncCommand(sa[1], Integer.parseInt(sa[2]), data, Integer.parseInt(sa[4]), Long.valueOf(sa[5]), -1, Integer.parseInt(sa[6]));
- case 8:
- // new master since 1.4.4
- return new SyncCommand(sa[1], Integer.parseInt(sa[2]), data, Integer.parseInt(sa[4]), Long.valueOf(sa[5]), Integer.parseInt(sa[6]), Integer.parseInt(sa[7]));
- default:
- throw new MetaCodecException("Invalid Sync command:" + StringUtils.join(sa));
- }
- }
- }
- // put topic partition value-length flag checksum
- // [transactionKey]
- // opaque\r\n
- private Object decodePut(final IoBuffer buff, final String[] sa) {
- this.assertCommand(sa[0], "put");
- final int valueLen = Integer.parseInt(sa[3]);
- if (buff.remaining() < valueLen) {
- buff.reset();
- return null;
- } else {
- final byte[] data = new byte[valueLen];
- buff.get(data);
- switch (sa.length) {
- case 6:
- // old clients before 1.4.4
- return new PutCommand(sa[1], Integer.parseInt(sa[2]), data, null, Integer.parseInt(sa[4]), Integer.parseInt(sa[5]));
- case 7:
- // either transaction command or new clients since
- // 1.4.4
- String slot = sa[5];
- char firstChar = slot.charAt(0);
- if (Character.isDigit(firstChar) || '-' == firstChar) {
- // slot is checksum.
- int checkSum = Integer.parseInt(slot);
- return new PutCommand(sa[1], Integer.parseInt(sa[2]), data, Integer.parseInt(sa[4]), checkSum, null, Integer.parseInt(sa[6]));
- } else {
- // slot is transaction id.
- return new PutCommand(sa[1], Integer.parseInt(sa[2]), data, this.getTransactionId(slot), Integer.parseInt(sa[4]), Integer.parseInt(sa[6]));
- }
- case 8:
- // New clients since 1.4.4
- // A transaction command
- return new PutCommand(sa[1], Integer.parseInt(sa[2]), data, Integer.parseInt(sa[4]), Integer.parseInt(sa[5]), this.getTransactionId(sa[6]), Integer.parseInt(sa[7]));
- default:
- throw new MetaCodecException("Invalid put command:" + StringUtils.join(sa));
- }
- }
- }
- private void assertCommand(final String cmd, final String expect) {
- if (!expect.equals(cmd)) {
- throw new MetaCodecException("Expect " + expect + ",but was " + cmd);
- }
- }
- };
- }
- @Override
- public Encoder getEncoder() {
- // Return the encoder
- return new Encoder() {
- @Override
- public IoBuffer encode(final Object message, final Session session) {
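- // The framework calls the encoder's encode() method at the appropriate time. As noted earlier, a DataCommand response would cause trouble if zeroCopy were not used, and this is why: without zeroCopy, a DataCommand instance is handed to Gecko and ends up in this method, but DataCommand.encode() does not produce the DataCommand wire format, so the decoder on the other side cannot recognize it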
- return ((MetaEncodeCommand) message).encode();
- }
- };
- }
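The decoder's core technique is the same for every command with a body: read the header line up to "\r\n", take the body length from one of its fields, and rewind if the body has not fully arrived. The sketch below isolates that technique with java.nio.ByteBuffer. It is not the Gecko IoBuffer API; LineFrameDecoderSketch, Frame and the lengthFieldIndex parameter are hypothetical, and position(start) plays the role of the mark()/reset() pair above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class LineFrameDecoderSketch {
    /** Hypothetical decoded frame: the space-separated header fields plus the raw body. */
    static final class Frame {
        final String[] header;
        final byte[] body;

        Frame(String[] header, byte[] body) {
            this.header = header;
            this.body = body;
        }
    }

    /**
     * Decodes one frame, or returns null and restores the buffer position when the frame
     * is incomplete. A negative lengthFieldIndex means the command has no body.
     */
    static Frame decode(ByteBuffer buf, int lengthFieldIndex) {
        int start = buf.position();
        int crlf = -1;
        for (int i = start; i + 1 < buf.limit(); i++) {
            if (buf.get(i) == '\r' && buf.get(i + 1) == '\n') {
                crlf = i;
                break;
            }
        }
        if (crlf < 0) {
            return null; // header line not complete yet, nothing consumed
        }
        byte[] line = new byte[crlf - start];
        buf.get(line);
        buf.position(buf.position() + 2); // skip \r\n
        String[] header = new String(line, StandardCharsets.UTF_8).split(" ");
        int bodyLen = lengthFieldIndex < 0 ? 0 : Integer.parseInt(header[lengthFieldIndex]);
        if (buf.remaining() < bodyLen) {
            buf.position(start); // rewind so the next call sees the header again
            return null;
        }
        byte[] body = new byte[bodyLen];
        buf.get(body);
        return new Frame(header, body);
    }

    public static void main(String[] args) {
        // Old-style put header: put topic partition value-length flag opaque
        ByteBuffer buf = ByteBuffer.wrap("put demo 0 5 0 1\r\nhello".getBytes(StandardCharsets.UTF_8));
        Frame frame = decode(buf, 3);
        System.out.println(frame.header[0] + " -> " + new String(frame.body, StandardCharsets.UTF_8));
    }
}
```

Rewinding on a short body is what makes the decoder safe against TCP fragmentation: the framework can call it again once more bytes have arrived, and it will parse the same header from the start.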
We also saw earlier that MetaMorphosisBroker registers the mapping from request type to Processor at startup. Here is the code again:
- private void registerProcessors() {
- this.remotingServer.registerProcessor(GetCommand.class, new GetProcessor(this.brokerProcessor, this.executorsManager.getGetExecutor()));
- this.remotingServer.registerProcessor(PutCommand.class, new PutProcessor(this.brokerProcessor, this.executorsManager.getUnOrderedPutExecutor()));
- this.remotingServer.registerProcessor(OffsetCommand.class, new OffsetProcessor(this.brokerProcessor, this.executorsManager.getGetExecutor()));
- this.remotingServer.registerProcessor(HeartBeatRequestCommand.class, new VersionProcessor(this.brokerProcessor));
- this.remotingServer.registerProcessor(QuitCommand.class, new QuitProcessor(this.brokerProcessor));
- this.remotingServer.registerProcessor(StatsCommand.class, new StatsProcessor(this.brokerProcessor));
- this.remotingServer.registerProcessor(TransactionCommand.class, new TransactionProcessor(this.brokerProcessor, this.executorsManager.getUnOrderedPutExecutor()));
- }
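Based on the registered types, Gecko invokes the matching processor for each decoded command instance and returns the response for that request. So what do the individual processors actually do? Because they are the Broker's request handling, all the Processors live in the server project. First, the class diagram:
All processors implement the RequestProcessor interface, which is defined by Gecko. RequestProcessor defines only two methods:
- public interface RequestProcessor<T extends RequestCommand> {
- /**
- * Handle a request
- *
- * @param request the request command
- * @param conn the connection the request came from
- */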
- public void handleRequest(T request, Connection conn);
- /**
- * A user-supplied thread pool; if provided, all request handling runs inside this pool
- *
- * @return
- */
- public ThreadPoolExecutor getExecutor();
- }
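The Javadoc on getExecutor() says that request handling runs in the supplied pool when one is provided. A minimal sketch of what that contract implies for a dispatcher follows. It is based on that Javadoc alone and is not Gecko's implementation: ExecutorDispatchSketch and its Processor interface are hypothetical, running inline when no pool is supplied is my assumption, and a real framework would not block on the result the way this demo does.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ExecutorDispatchSketch {
    /** Hypothetical, simplified processor: returns a string instead of writing to a connection. */
    interface Processor {
        String handleRequest(String request);

        ExecutorService getExecutor();
    }

    /** Runs the processor in its own pool when it supplies one, otherwise on the calling thread. */
    static String dispatch(final Processor processor, final String request) {
        ExecutorService pool = processor.getExecutor();
        if (pool == null) {
            return processor.handleRequest(request);
        }
        try {
            return pool.submit(new Callable<String>() {
                @Override
                public String call() {
                    return processor.handleRequest(request);
                }
            }).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Processor that reports the name of the thread that handled the request. */
    static Processor threadReporting(final ExecutorService pool) {
        return new Processor() {
            @Override
            public String handleRequest(String request) {
                return Thread.currentThread().getName();
            }

            @Override
            public ExecutorService getExecutor() {
                return pool;
            }
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println("inline: " + dispatch(threadReporting(null), "get"));
            System.out.println("pooled: " + dispatch(threadReporting(pool), "get"));
        } finally {
            pool.shutdown();
        }
    }
}
```

This is why registerProcessors() hands the get pool to GetProcessor and OffsetProcessor but the put pool to PutProcessor and TransactionProcessor: reads and writes are kept from starving each other.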
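Putting this together with the previous article, MetaQ's overall network processing flow is roughly as follows:
The previous two articles covered how MetaQ uses Gecko to move data over the network. Today I go one step further and look at how the Broker handles each command (transaction handling is left aside for now).
Still in MetaMorphosisBroker's registerProcessors() method, notice that every Processor instance is constructed with a brokerProcessor variable, whose type is CommandProcessor. Each Processor in fact delegates its business logic to the CommandProcessor. Take the source of GetProcessor as an example:
- public class GetProcessor implements RequestProcessor<GetCommand> {
- public static final Logger log = LoggerFactory.getLogger(GetProcessor.class);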
- private final ThreadPoolExecutor executor;
- private final CommandProcessor processor;
- public GetProcessor(final CommandProcessor processor, final ThreadPoolExecutor executor) {
- this.processor = processor;
- this.executor = executor;
- }
- @Override
- public ThreadPoolExecutor getExecutor() {
- return this.executor;
- }
- @Override
- public void handleRequest(final GetCommand request, final Connection conn) {
- // The Processor does not handle the business logic itself; it hands it to CommandProcessor's processGetCommand() and simply returns the result to the client
- final ResponseCommand response = this.processor.processGetCommand(request, SessionContextHolder.getOrCreateSessionContext(conn, null));
- if (response != null) {
- RemotingUtils.response(conn, response);
- }
- }
- }
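The CommandProcessor business-logic module uses a chain-of-responsibility design. For now there are only two kinds of processing units: one with transaction handling (TransactionalCommandProcessor) and one without (BrokerCommandProcessor). As usual, the class diagram first:
The CommandProcessor interface is defined as follows: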
- public interface CommandProcessor extends Service {
- // Handles a Put command; the result is returned through the PutCallback
- public void processPutCommand(final PutCommand request, final SessionContext sessionContext, final PutCallback cb) throws Exception;
- // Handles a Get command
- public ResponseCommand processGetCommand(GetCommand request, final SessionContext ctx);
- /**
- * Under conditions that cannot use notify-remoting directly.
- */
- // Handles a Get command; the zeroCopy flag decides whether zero-copy is used
- public ResponseCommand processGetCommand(GetCommand request, final SessionContext ctx, final boolean zeroCopy);
- // Handles a query for the nearest available offset
- public ResponseCommand processOffsetCommand(OffsetCommand request, final SessionContext ctx);
- // Handles a quit request
- public void processQuitCommand(QuitCommand request, final SessionContext ctx);
- // Handles a version query (the method name is spelled "Vesion" in the code)
- public ResponseCommand processVesionCommand(VersionCommand request, final SessionContext ctx);
- // Handles a statistics request
- public ResponseCommand processStatCommand(StatsCommand request, final SessionContext ctx);
- // The methods below are transaction-related and are not covered for now
- public void removeTransaction(final XATransactionId xid);
- public Transaction getTransaction(final SessionContext context, final TransactionId xid) throws MetamorphosisException, XAException;
- public void forgetTransaction(final SessionContext context, final TransactionId xid) throws Exception;
- public void rollbackTransaction(final SessionContext context, final TransactionId xid) throws Exception;
- public void commitTransaction(final SessionContext context, final TransactionId xid, final boolean onePhase) throws Exception;
- public int prepareTransaction(final SessionContext context, final TransactionId xid) throws Exception;
- public void beginTransaction(final SessionContext context, final TransactionId xid, final int seconds) throws Exception;
- public TransactionId[] getPreparedTransactions(final SessionContext context, String uniqueQualifier) throws Exception;
- }
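The chain-of-responsibility arrangement mentioned above can be sketched as follows. This is only an illustration of the pattern, not the MetaQ classes: `CommandChainSketch`, `PlainProcessor` and `TransactionalProcessor` are hypothetical, the interface is cut down to two methods, and the way the real TransactionalCommandProcessor decides what to forward is not shown in this article.

```java
/** Sketch of a two-link processing chain; names are hypothetical, not MetaQ classes. */
class CommandChainSketch {

    /** Cut-down stand-in for the CommandProcessor interface. */
    interface Processor {
        String processPut(String message, boolean inTransaction);
        String processGet(String topic);
    }

    /** Plays the role of the non-transactional unit at the end of the chain. */
    static final class PlainProcessor implements Processor {
        @Override
        public String processPut(String message, boolean inTransaction) {
            return "stored:" + message;
        }

        @Override
        public String processGet(String topic) {
            return "messages of " + topic;
        }
    }

    /** Plays the role of the transactional unit: handles what it owns, forwards the rest. */
    static final class TransactionalProcessor implements Processor {
        private final Processor next;

        TransactionalProcessor(Processor next) {
            this.next = next;
        }

        @Override
        public String processPut(String message, boolean inTransaction) {
            if (inTransaction) {
                // Only transactional puts are handled by this link.
                return "held until commit:" + message;
            }
            return next.processPut(message, false);
        }

        @Override
        public String processGet(String topic) {
            // Reads are not transactional here, so they go straight down the chain.
            return next.processGet(topic);
        }
    }

    /** Wires the chain: transactional link first, plain link last. */
    static Processor buildChain() {
        return new TransactionalProcessor(new PlainProcessor());
    }

    public static void main(String[] args) {
        Processor chain = buildChain();
        System.out.println(chain.processPut("m1", false));
        System.out.println(chain.processPut("m2", true));
        System.out.println(chain.processGet("topic-a"));
    }
}
```

Because both links share one interface, the Processors registered with the network layer only ever see a single CommandProcessor reference and do not need to know whether transactions are enabled.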
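Attentive readers will notice that nearly every method takes a SessionContext parameter. SessionContext carries the connection information and is created by the Broker; see the getOrCreateSessionContext() method of SessionContextHolder, which is called when a Processor delegates the business logic to the CommandProcessor.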
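A get-or-create helper of this kind is commonly built on a per-connection attribute map. The sketch below shows one way to do it under that assumption; `SessionContextSketch`, `Conn`, `Session` and the attribute key are hypothetical, and the real SessionContextHolder may store the context differently.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Sketch of a per-connection get-or-create; all names are hypothetical. */
class SessionContextSketch {

    /** Stand-in for a network connection that can carry attributes. */
    static final class Conn {
        private final ConcurrentMap<String, Object> attributes = new ConcurrentHashMap<>();

        Object getAttribute(String key) {
            return attributes.get(key);
        }

        /** Returns the previous value, or null if this call stored the value. */
        Object setAttributeIfAbsent(String key, Object value) {
            return attributes.putIfAbsent(key, value);
        }
    }

    /** Stand-in for SessionContext: it remembers which connection it belongs to. */
    static final class Session {
        final Conn connection;

        Session(Conn connection) {
            this.connection = connection;
        }
    }

    static final String SESSION_KEY = "session-context";

    /** Returns the session bound to the connection, creating it on first use. */
    static Session getOrCreate(Conn conn) {
        Session existing = (Session) conn.getAttribute(SESSION_KEY);
        if (existing != null) {
            return existing;
        }
        Session created = new Session(conn);
        // putIfAbsent makes sure two racing threads end up sharing one session.
        Session raced = (Session) conn.setAttributeIfAbsent(SESSION_KEY, created);
        return raced != null ? raced : created;
    }

    public static void main(String[] args) {
        Conn a = new Conn();
        Conn b = new Conn();
        System.out.println(getOrCreate(a) == getOrCreate(a)); // same connection, same session
        System.out.println(getOrCreate(a) == getOrCreate(b)); // different connections differ
    }
}
```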
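BrokerCommandProcessor and TransactionalCommandProcessor are really the glue between the modules: they coordinate the modules' functions into a whole that is exposed to the outside. The implementation of BrokerCommandProcessor is not hard to follow. Let's go through the class in detail: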
- // Business logic for Put requests
- @Override
- public void processPutCommand(final PutCommand request, final SessionContext sessionContext, final PutCallback cb) {
- final String partitionString = this.metaConfig.getBrokerId() + "-" + request.getPartition();
- // Update statistics
- this.statsManager.statsPut(request.getTopic(), partitionString, 1);
- this.statsManager.statsMessageSize(request.getTopic(), request.getData().length);
- int partition = -1;
- try {
- // If the target partition has been closed, reject the message
- if (this.metaConfig.isClosedPartition(request.getTopic(), request.getPartition())) {
- log.warn("Can not put message to partition " + request.getPartition() + " for topic=" + request.getTopic() + ",it was closed");
- if (cb != null) {
- cb.putComplete(new BooleanCommand(HttpStatus.Forbidden, this.genErrorMessage(request.getTopic(), request.getPartition()) + "Detail:partition[" + partitionString + "] has been closed", request.getOpaque()));
- }
- return;
- }
- partition = this.getPartition(request);
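- // Get the MessageStore instance for this topic partition
- final MessageStore store = this.storeManager.getOrCreateMessageStore(request.getTopic(), partition);
- // If the topic was added dynamically, it needs to be registered in zk
- // So far I honestly have not figured out what the next line is for.
- // If the topic is not configured on this Broker, the isLegalTopic() check in MessageStoreManager fails and throws, so the next line can never run for it; and a Client must publish the topic before sending to the Broker, which ensures the topic is already published in zk.
- this.brokerZooKeeper.registerTopicInZk(request.getTopic(), false);
- // Set a unique id
- final long messageId = this.idWorker.nextId();
- // Store the message. As covered in an earlier article, Broker storage uses callbacks, which makes an asynchronous implementation easy; the code is simple, so it is not analyzed here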
- store.append(messageId, request, new StoreAppendCallback(partition, partitionString, request, messageId, cb));
- } catch (final Exception e) {
- // An exception occurred: record the failed put in the statistics
- this.statsManager.statsPutFailed(request.getTopic(), partitionString, 1);
- log.error("Put message failed", e);
- if (cb != null) {
- // Return the result
- cb.putComplete(new BooleanCommand(HttpStatus.InternalServerError, this.genErrorMessage(request.getTopic(), partition) + "Detail:" + e.getMessage(), request.getOpaque()));
- }
- }
- }
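The callback style used by store.append(...) in the Put path can be shown with a small in-memory sketch. Everything in it is hypothetical: `AppendCallbackSketch`, `InMemoryStore`, the `AppendCallback` signature and the "Success id offset" reply text are made up for illustration and are not the MessageStore API.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of callback-style append; the store and reply format are hypothetical. */
class AppendCallbackSketch {

    /** Called by the store once the message has been written. */
    interface AppendCallback {
        void appendComplete(long offset);
    }

    /** Toy store: keeps messages in memory and hands out byte offsets. */
    static final class InMemoryStore {
        private final List<byte[]> messages = new ArrayList<>();
        private long nextOffset = 0;

        synchronized void append(byte[] data, AppendCallback callback) {
            long offset = nextOffset;
            messages.add(data.clone());
            nextOffset += data.length;
            // A real store could invoke this later, for example after a group flush.
            callback.appendComplete(offset);
        }
    }

    /** Mirrors the shape of the Put path: append, then build the reply inside the callback. */
    static String put(InMemoryStore store, long messageId, byte[] data) {
        StringBuilder reply = new StringBuilder();
        store.append(data, offset ->
            reply.append("Success ").append(messageId).append(' ').append(offset));
        return reply.toString();
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        System.out.println(put(store, 7L, new byte[] {1, 2, 3}));
        System.out.println(put(store, 8L, new byte[] {4, 5}));
    }
}
```

The caller never inspects a return value from append; the reply is produced wherever the store decides the write is complete, which is what makes it easy to switch between synchronous and asynchronous flushing.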
- @Override
- // Business logic for Get requests
- public ResponseCommand processGetCommand(final GetCommand request, final SessionContext ctx) {
- // zeroCopy by default
- return this.processGetCommand(request, ctx, true);
- }
- @Override
- public ResponseCommand processGetCommand(final GetCommand request, final SessionContext ctx, final boolean zeroCopy) {
- // Read the query parameters
- final String group = request.getGroup();
- final String topic = request.getTopic();
- // Update statistics (request count)
- this.statsManager.statsGet(topic, group, 1);
- // If the partition is closed, reading is forbidden --wuhua
- if (this.metaConfig.isClosedPartition(topic, request.getPartition())) {
- log.warn("can not get message for topic=" + topic + " from partition " + request.getPartition() + ",it closed,");
- return new BooleanCommand(HttpStatus.Forbidden, "Partition[" + this.metaConfig.getBrokerId() + "-" + request.getPartition() + "] has been closed", request.getOpaque());
- }
- // Get the MessageStore instance for the topic partition; if it does not exist, return NotFound
- final MessageStore store = this.storeManager.getMessageStore(topic, request.getPartition());
- if (store == null) {
- // Update statistics
- this.statsManager.statsGetMiss(topic, group, 1);
- return new BooleanCommand(HttpStatus.NotFound, "The topic `" + topic + "` in partition `" + request.getPartition() + "` is not exists", request.getOpaque());
- }
- // If the requested max size is <= 0, the request is invalid
- if (request.getMaxSize() <= 0) {
- return new BooleanCommand(HttpStatus.BadRequest, "Bad request,invalid max size:" + request.getMaxSize(), request.getOpaque());
- }
- try {
- // Read the message set starting at request.getOffset()
- final MessageSet set = store.slice(request.getOffset(), Math.min(this.metaConfig.getMaxTransferSize(), request.getMaxSize()));
- // If the message set is not null
- if (set != null) {
- // Check zeroCopy: if set, write directly; otherwise wrap the message set in a DataCommand, which is why DataCommand has to implement encode(), as mentioned earlier
- if (zeroCopy) {
- set.write(request, ctx);
- return null;
- } else {
- // refer to the code of line 440 in MessageStore
- // create two copies of byte array including the byteBuffer
- // and new bytes
- // this may not a good use case of Buffer
- final ByteBuffer byteBuffer = ByteBuffer.allocate(Math.min(this.metaConfig.getMaxTransferSize(), request.getMaxSize()));
- set.read(byteBuffer);
- byteBuffer.flip();
- final byte[] bytes = new byte[byteBuffer.remaining()];
- byteBuffer.get(bytes);
- return new DataCommand(bytes, request.getOpaque());
- }
- } else {
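- // A null message set is treated as an invalid request
- // Update statistics
- this.statsManager.statsGetMiss(topic, group, 1);
- this.statsManager.statsGetFailed(topic, group, 1);
- // When the requested offset is greater than the actual maximum, return the actual maximum offset to the client.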
- final long maxOffset = store.getMaxOffset();
- final long requestOffset = request.getOffset();
- if (requestOffset > maxOffset && (this.metaConfig.isUpdateConsumerOffsets() || requestOffset == Long.MAX_VALUE)) {
- log.info("offset[" + requestOffset + "] is exceeded,tell the client real max offset: " + maxOffset + ",topic=" + topic + ",group=" + group);
- this.statsManager.statsOffset(topic, group, 1);
- return new BooleanCommand(HttpStatus.Moved, String.valueOf(maxOffset), request.getOpaque());
- } else {
- return new BooleanCommand(HttpStatus.NotFound, "Could not find message at position " + requestOffset, request.getOpaque());
- }
- }
- } catch (final ArrayIndexOutOfBoundsException e) {
- log.error("Could not get message from position " + request.getOffset() + ",it is out of bounds,topic=" + topic);
- // Tell the client the nearest available offset
- this.statsManager.statsGetMiss(topic, group, 1);
- this.statsManager.statsGetFailed(topic, group, 1);
- final long validOffset = store.getNearestOffset(request.getOffset());
- this.statsManager.statsOffset(topic, group, 1);
- return new BooleanCommand(HttpStatus.Moved, String.valueOf(validOffset), request.getOpaque());
- } catch (final Throwable e) {
- log.error("Could not get message from position " + request.getOffset(), e);
- this.statsManager.statsGetFailed(topic, group, 1);
- return new BooleanCommand(HttpStatus.InternalServerError, this.genErrorMessage(request.getTopic(), request.getPartition()) + "Detail:" + e.getMessage(), request.getOpaque());
- }
- }
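Two details of the Get path are easy to miss when reading the listing: the decision taken when slice() returns null, and the buffer handling in the non-zero-copy branch. The sketch below isolates both. `GetPathSketch` and its method names are hypothetical, the status values are plain strings rather than MetaQ's HttpStatus constants, and `copyOut` uses a byte array as a stand-in for `set.read(byteBuffer)`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch of two pieces of the Get path; names are hypothetical. */
class GetPathSketch {

    /** The broker-side cap always wins over the size requested by the client. */
    static int transferSize(int brokerMaxTransferSize, int requestedMaxSize) {
        return Math.min(brokerMaxTransferSize, requestedMaxSize);
    }

    /** Mirrors the branch taken when no message set was found at the requested offset. */
    static String onMissingSlice(long requestOffset, long maxOffset, boolean updateConsumerOffsets) {
        if (requestOffset > maxOffset
                && (updateConsumerOffsets || requestOffset == Long.MAX_VALUE)) {
            // The client is pointed at the real maximum offset.
            return "Moved " + maxOffset;
        }
        return "NotFound " + requestOffset;
    }

    /** Non-zero-copy branch: fill a buffer, flip it, then copy out only what was written. */
    static byte[] copyOut(byte[] stored, int maxSize) {
        ByteBuffer buffer = ByteBuffer.allocate(maxSize);
        buffer.put(stored, 0, Math.min(stored.length, maxSize)); // stands in for set.read(byteBuffer)
        buffer.flip(); // limit = bytes written, position = 0
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(transferSize(1024, 4096));
        System.out.println(onMissingSlice(Long.MAX_VALUE, 500L, false));
        System.out.println(onMissingSlice(600L, 500L, false));
        byte[] out = copyOut("hello".getBytes(StandardCharsets.US_ASCII), 3);
        System.out.println(new String(out, StandardCharsets.US_ASCII));
    }
}
```

Note that a request offset of Long.MAX_VALUE yields Moved even when updateConsumerOffsets is off, which gives a client a way to ask for the current end of the partition.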
- // Business logic for queries of the nearest available offset
- @Override
- public ResponseCommand processOffsetCommand(final OffsetCommand request, final SessionContext ctx) {
- // Update statistics
- this.statsManager.statsOffset(request.getTopic(), request.getGroup(), 1);
- // Get the MessageStore instance for the topic partition
- final MessageStore store = this.storeManager.getMessageStore(request.getTopic(), request.getPartition());
- // If it is null, return NotFound
- if (store == null) {
- return new BooleanCommand(HttpStatus.NotFound, "The topic `" + request.getTopic() + "` in partition `" + request.getPartition() + "` is not exists", request.getOpaque());
- }
- // Get the nearest available offset for the topic partition
- final long offset = store.getNearestOffset(request.getOffset());
- return new BooleanCommand(HttpStatus.Success, String.valueOf(offset), request.getOpaque());
- }
- // Business logic for quit requests
- @Override
- public void processQuitCommand(final QuitCommand request, final SessionContext ctx) {
- try {
- if (ctx.getConnection() != null) {
- // Close the connection to the client
- ctx.getConnection().close(false);
- }
- } catch (final NotifyRemotingException e) {
- // ignore
- }
- }
- // Business logic for version queries
- @Override
- public ResponseCommand processVesionCommand(final VersionCommand request, final SessionContext ctx) {
- // Return the current Broker version
- return new BooleanCommand(HttpStatus.Success, BuildProperties.VERSION, request.getOpaque());
- }
- // Business logic for statistics queries
- @Override
- public ResponseCommand processStatCommand(final StatsCommand request, final SessionContext ctx) {
- // Check the item: if it equals "config", send the whole configuration file
- final String item = request.getItem();
- if ("config".equals(item)) {
- return this.processStatsConfig(request, ctx);
- } else {
- // Otherwise fetch the statistics from the stats module and return them to the client
- final String statsInfo = this.statsManager.getStatsInfo(item);
- return new BooleanCommand(HttpStatus.Success, statsInfo, request.getOpaque());
- }
- }
- // Read the configuration file and send its contents to the client with zeroCopy; the response is built in the BooleanCommand format
- @SuppressWarnings("resource")
- private ResponseCommand processStatsConfig(final StatsCommand request, final SessionContext ctx) {
- try {
- final FileChannel fc = new FileInputStream(this.metaConfig.getConfigFilePath()).getChannel();
- // result code length opaque\r\n
- IoBuffer buf = IoBuffer.allocate(11 + 3 + ByteUtils.stringSize(fc.size()) + ByteUtils.stringSize(request.getOpaque()));
- ByteUtils.setArguments(buf, MetaEncodeCommand.RESULT_CMD, HttpStatus.Success, fc.size(), request.getOpaque());
- buf.flip();
- ctx.getConnection().transferFrom(buf, null, fc, 0, fc.size(), request.getOpaque(),
- new SingleRequestCallBackListener() {
- @Override
- public void onResponse(ResponseCommand responseCommand, Connection conn) {
- this.closeChannel();
- }
- @Override
- public void onException(Exception e) {
- this.closeChannel();
- }
- private void closeChannel() {
- try {
- fc.close();
- } catch (IOException e) {
- log.error("IOException while stats config", e);
- }
- }
- @Override
- public ThreadPoolExecutor getExecutor() {
- return null;
- }
- }, 5000, TimeUnit.MILLISECONDS);
- } catch (FileNotFoundException e) {
- log.error("Config file not found:" + this.metaConfig.getConfigFilePath(), e);
- return new BooleanCommand(HttpStatus.InternalServerError, "Config file not found:" + this.metaConfig.getConfigFilePath(), request.getOpaque());
- } catch (IOException e) {
- log.error("IOException while stats config", e);
- return new BooleanCommand(HttpStatus.InternalServerError, "Read config file error:" + e.getMessage(), request.getOpaque());
- } catch (NotifyRemotingException e) {
- log.error("NotifyRemotingException while stats config", e);
- }
- return null;
- }
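processStatsConfig() pushes the file through Gecko's connection API. The underlying idea, handing a FileChannel region to the output channel instead of reading it into a byte array first, can be tried with the plain JDK. The sketch below uses `FileChannel.transferTo` as a stand-in; `FileTransferSketch` and its methods are hypothetical, and the kernel-level zero-copy benefit only materializes when the target is a socket channel, not the in-memory sink used here for the demo.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch of channel-to-channel file transfer with the JDK; names are hypothetical. */
class FileTransferSketch {

    /** Sends the whole file to the target channel and returns the number of bytes sent. */
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = fc.size();
            long sent = 0;
            while (sent < size) {
                // transferTo may move fewer bytes than requested, so loop until done.
                long n = fc.transferTo(sent, size - sent, target);
                if (n <= 0) {
                    break;
                }
                sent += n;
            }
            return sent;
        }
    }

    /** Writes the content to a temp file, transfers it, and returns what arrived. */
    static String roundTrip(String content) {
        try {
            Path tmp = Files.createTempFile("server", ".ini");
            try {
                Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                try (WritableByteChannel channel = Channels.newChannel(sink)) {
                    sendFile(tmp, channel);
                }
                return new String(sink.toByteArray(), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("brokerId=0\nnumPartitions=1\n"));
    }
}
```

In the sketch the transfer is synchronous, so try-with-resources can close the channel. In the listing above the transfer completes asynchronously, which is why the FileChannel is closed inside the callback listener and the method carries @SuppressWarnings("resource").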
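If transactions are not used, the Broker has now completed the flow of receiving data from the network -> processing the request (storing messages, querying results, and so on) -> returning the result. The analysis of the Broker's most basic flow is essentially complete.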