SEDA

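SEDA (staged event-driven architecture) is the concurrency model Cassandra uses for its operations. SEDA decomposes an application into stages separated by event queues and introduces the concept of dynamic resource controllers, which let the application adjust itself continuously to changing load. The model is event-driven: when a request arrives, an event is constructed and placed on a stage's request queue; the stage takes the event from its queue and processes it, then constructs event_next and places it on the request queue of stage_next. Stages are connected only through queues, each stage is managed on its own, and the stages are decoupled from one another. Events are handled asynchronously.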
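The event flow described above can be sketched in a few lines of Java. This is only an illustration of the staged, queue-connected idea; SedaSketch, MiniStage, and the "parse" and "respond" stages are hypothetical names, not Cassandra classes, and the sketch leaves out the dynamic resource controllers entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical two-stage pipeline; none of these names come from Cassandra. */
final class SedaSketch
{
    /** A stage owns a request queue and one thread that drains it. */
    static final class MiniStage<I, O>
    {
        private final BlockingQueue<I> requests = new LinkedBlockingQueue<>();

        MiniStage(String name, Function<I, O> handler, Consumer<O> next)
        {
            Thread worker = new Thread(() -> {
                try
                {
                    while (true)
                    {
                        I event = requests.take();         // wait for the next event
                        next.accept(handler.apply(event)); // build event_next and hand it on
                    }
                }
                catch (InterruptedException e)
                {
                    Thread.currentThread().interrupt();    // exit the loop when interrupted
                }
            }, name);
            worker.setDaemon(true);
            worker.start();
        }

        void enqueue(I event)
        {
            requests.add(event);
        }
    }

    /** Pushes each raw request through parse -> respond and collects the results in order. */
    static List<String> process(List<String> rawRequests)
    {
        BlockingQueue<String> results = new LinkedBlockingQueue<>();
        MiniStage<Integer, String> respond = new MiniStage<>("respond", n -> "result=" + (n * 2), results::add);
        MiniStage<String, Integer> parse = new MiniStage<>("parse", Integer::parseInt, respond::enqueue);

        for (String raw : rawRequests)
            parse.enqueue(raw);

        List<String> out = new ArrayList<>();
        try
        {
            for (int i = 0; i < rawRequests.size(); i++)
            {
                String result = results.poll(5, TimeUnit.SECONDS);
                if (result == null)
                    throw new IllegalStateException("timed out waiting for a result");
                out.add(result);
            }
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }
}
```

Neither stage calls the other directly: the only coupling is the enqueue call on the next stage's queue, so each stage could be given its own thread count or queue bound without touching the other.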

image

Threaded server design: Each incoming request is dispatched to a separate thread, which performs the entire processing for the request and returns a result to the client. Edges represent control flow between components. Note that other I/O operations, such as disk access, are not shown here, but are incorporated within each thread's request processing.

image

cassandra.concurrent

This is the package that holds Cassandra's implementation of its SEDA-based concurrency model. It is the foundation of the whole model.

StageManager

The main job of the StageManager class is to maintain a map of stages.

EnumMap<Stage, LocalAwareExecutorService> stages = new EnumMap<>(Stage.class);

static {
    stages.put(Stage.TRACING, tracingExecutor());
......

}

A separate ExecutorService is created for each value of the Stage enum. Callers look up the ExecutorService for a given stage and run that stage's tasks on it.

Ex:

ListenableFutureTask task = ListenableFutureTask.create(runnable, null);
StageManager.getStage(Stage.GOSSIP).execute(task);

StageManager.getStage(Stage.TRACING).execute(new WrappedRunnable(){});
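The registry pattern used by StageManager can be reproduced with plain JDK classes. The sketch below is an assumption-laden illustration, not Cassandra code: StageRegistrySketch, DemoStage, and the "-stage" thread naming are hypothetical, and every stage gets a single-threaded executor for simplicity.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical enum-keyed executor registry in the spirit of StageManager. */
final class StageRegistrySketch
{
    /** Hypothetical stage names; Cassandra's Stage enum defines its own values. */
    enum DemoStage { READ, WRITE }

    private static final Map<DemoStage, ExecutorService> STAGES = new EnumMap<>(DemoStage.class);

    static
    {
        // One executor per enum value, each with a recognisable thread name.
        for (DemoStage stage : DemoStage.values())
        {
            STAGES.put(stage, Executors.newSingleThreadExecutor(runnable -> {
                Thread thread = new Thread(runnable, stage + "-stage");
                thread.setDaemon(true);
                return thread;
            }));
        }
    }

    static ExecutorService getStage(DemoStage stage)
    {
        return STAGES.get(stage);
    }

    /** Runs a trivial task on the given stage and returns the name of the thread that ran it. */
    static String threadNameFor(DemoStage stage)
    {
        Callable<String> task = () -> Thread.currentThread().getName();
        try
        {
            return getStage(stage).submit(task).get();
        }
        catch (InterruptedException | ExecutionException e)
        {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling threadNameFor(DemoStage.READ) shows that the task ran on the READ stage's own thread, which is the point of the lookup: the caller chooses a stage, not a thread.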

LocalAwareExecutorService

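The abstract interface for all of Cassandra's thread pools. It extends the ExecutorService interface, which provides the basic task model, and adds two methods of its own: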

    // we need a way to inject a TraceState directly into the Executor context without going through
    // the global Tracing sessions; see CASSANDRA-5668
    public void execute(Runnable command, ExecutorLocals locals);

    // permits executing in the context of the submitting thread
    public void maybeExecuteImmediately(Runnable command);

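ExecutorLocals is the class Cassandra uses to carry trace state. Calling execute(Runnable command, ExecutorLocals locals) hands that state to the thread that runs the task, which is how a request can be traced across threads.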
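To see why an executor needs an explicit way to receive this state, consider what happens to thread-local data when a task moves to another thread. The sketch below is a minimal illustration under stated assumptions: TRACE_ID is a hypothetical stand-in for trace state, withTrace is a hypothetical helper, and none of this is the ExecutorLocals API.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical demonstration of carrying thread-local state into an executor thread. */
final class ContextPropagationSketch
{
    /** Hypothetical stand-in for per-request trace state. */
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    /** Wraps a task so it runs with the given trace id, restoring the worker's old value afterwards. */
    static Runnable withTrace(String traceId, Runnable task)
    {
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(traceId);
            try
            {
                task.run();
            }
            finally
            {
                TRACE_ID.set(previous);
            }
        };
    }

    /** Submits a task from the calling thread and reports the trace id the worker thread observed. */
    static String traceSeenByWorker(String traceId, boolean propagate)
    {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try
        {
            TRACE_ID.set(traceId);            // the state lives on the submitting thread
            String captured = TRACE_ID.get(); // capture it at submission time
            String[] seen = new String[1];
            Runnable task = () -> seen[0] = TRACE_ID.get();
            executor.submit(propagate ? withTrace(captured, task) : task).get();
            return seen[0];
        }
        catch (InterruptedException | ExecutionException e)
        {
            throw new IllegalStateException(e);
        }
        finally
        {
            TRACE_ID.remove();
            executor.shutdown();
        }
    }
}
```

Without the wrapper the worker thread sees no trace id at all, because a ThreadLocal value does not follow a task across threads. Passing the captured state along with the task is the same idea that the extra execute overload expresses.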

cassandra-concurrent.jpg

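Only two implementations are in common use:

SEPExecutor: the executor holds a reference to a SharedExecutorPool. SharedExecutorPool is not written as a singleton class, but Cassandra uses it through a single static instance.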

public static final SharedExecutorPool SHARED = new SharedExecutorPool("SharedPool");

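The newExecutor method creates a SEPExecutor, passes the pool itself into the new executor, and keeps a list of the SEPExecutors it has created. This means that all SEPExecutors share a single SharedExecutorPool.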

    public LocalAwareExecutorService newExecutor(int maxConcurrency, int maxQueuedTasks, String jmxPath, String name)
    {
        SEPExecutor executor = new SEPExecutor(this, maxConcurrency, maxQueuedTasks, jmxPath, name);
        executors.add(executor);
        return executor;
    }

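When execute is called on a SEPExecutor, the executor wraps the Runnable in a FutureTask and puts it on its queue. These tasks are ultimately run by SEPWorkers. When the executor's execute method is called for the first time, or for the first time after the workers have stopped, SharedExecutorPool creates a SEPWorker in the Work.SPINNING state. The SEPWorker is a self-looping worker; its nested class Work is used to manage the worker's state and to hold a SEPExecutor, so that the SEPWorker can find the executor it is serving. Because a Runnable submitted through execute is picked up by a worker in the Work.SPINNING state, the task is not run immediately: the SEPWorker assigns itself a SEPExecutor and switches state while it spins. The maybeExecuteImmediately method mentioned above, by contrast, runs the task right away.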

// The four Work states:
static final Work STOP_SIGNALLED = new Work();
static final Work STOPPED = new Work();
static final Work SPINNING = new Work();
static final Work WORKING = new Work();

Work work = schedule(Work.SPINNING);
new SEPWorker(workerId.incrementAndGet(), work, this);

Once a SEPWorker has been created it starts its thread, and the thread spins in a loop running work for the pool.

SEPWorker(Long workerId, Work initialState, SharedExecutorPool pool)
    {
        this.pool = pool;
        this.workerId = workerId;
        thread = new FastThreadLocalThread(this, pool.poolName + "-Worker-" + workerId);
        thread.setDaemon(true);
        set(initialState);
        thread.start();
    }

public void run()
    {
        SEPExecutor assigned = null;
        Runnable task = null;
        try
        {
            while (true)
            {
                if (isSpinning() && !selfAssign())
                {
                    doWaitSpin();
                    continue;
                }
                ...
            }
        }
    }

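SEPExecutor, SEPWorker, and SharedExecutorPool together form a complete thread pool system. What sets it apart from other thread pools is that all workers share one pool: an idle worker is free to look for an executor that has work to run, and each executor holds the runnables waiting to be executed.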
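The division of labour between pool, executor, and worker can be illustrated with a much simpler model. The sketch below is hypothetical throughout: SharedPoolSketch and MiniExecutor are invented names, a Semaphore stands in for the per-executor concurrency limit, and idle workers simply park for a short time instead of following Cassandra's Work state machine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical, heavily simplified shared pool; not Cassandra's implementation. */
final class SharedPoolSketch
{
    /** A logical executor: a task queue plus a cap on how many workers may serve it at once. */
    static final class MiniExecutor
    {
        private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
        private final Semaphore permits;

        MiniExecutor(int maxConcurrency) { permits = new Semaphore(maxConcurrency); }

        /** Only queues the task; a shared worker picks it up later. */
        void execute(Runnable task) { tasks.add(task); }
    }

    private final List<MiniExecutor> executors = new CopyOnWriteArrayList<>();
    private final List<Thread> workers = new ArrayList<>();

    MiniExecutor newExecutor(int maxConcurrency)
    {
        MiniExecutor executor = new MiniExecutor(maxConcurrency);
        executors.add(executor);
        return executor;
    }

    /** Scans the executors once and runs at most one task; returns false if none was available. */
    private boolean runOne()
    {
        for (MiniExecutor executor : executors)
        {
            if (!executor.permits.tryAcquire())
                continue; // this executor is already at its concurrency limit
            try
            {
                Runnable task = executor.tasks.poll();
                if (task != null)
                {
                    task.run();
                    return true;
                }
            }
            finally
            {
                executor.permits.release();
            }
        }
        return false;
    }

    void startWorkers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Thread worker = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted())
                    if (!runOne())
                        LockSupport.parkNanos(100_000L); // idle: back off briefly, then look again
            }, "shared-worker-" + i);
            worker.setDaemon(true);
            worker.start();
            workers.add(worker);
        }
    }

    void stop()
    {
        for (Thread worker : workers)
            worker.interrupt();
    }

    /** Queues five tasks on each of two executors and lets three shared workers drain them. */
    static int demo()
    {
        SharedPoolSketch pool = new SharedPoolSketch();
        MiniExecutor reads = pool.newExecutor(2);
        MiniExecutor writes = pool.newExecutor(1);
        AtomicInteger ran = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(10);
        Runnable task = () -> { ran.incrementAndGet(); done.countDown(); };
        for (int i = 0; i < 5; i++)
        {
            reads.execute(task);
            writes.execute(task);
        }
        pool.startWorkers(3);
        try
        {
            if (!done.await(5, TimeUnit.SECONDS))
                throw new IllegalStateException("timed out waiting for tasks");
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        finally
        {
            pool.stop();
        }
        return ran.get();
    }
}
```

The property this shares with the design described above is that threads belong to the pool, not to any one executor. Three workers serve both queues, and the per-executor permit count is the only thing that limits how many of them work for the same executor at once.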

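JMXEnabledThreadPoolExecutor:
JMXEnabledThreadPoolExecutor is comparatively simple. It extends ThreadPoolExecutor (by way of DebuggableThreadPoolExecutor), behaves like its parent class for the most part, and adds a few extensions: for example, the exception handling added in DebuggableThreadPoolExecutor, and the ThreadPoolMetrics and MBeanWrapper objects that are set up when the executor is constructed.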

public static final RejectedExecutionHandler blockingExecutionHandler = new RejectedExecutionHandler()
    {...... };
public DebuggableThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime, TimeUnit unit, BlockingQueue<Runnable> workQueue, ThreadFactory threadFactory)
    {
        super(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue, threadFactory);
        allowCoreThreadTimeOut(true);

        // block task submissions until queue has room.
        // this is fighting TPE's design a bit because TPE rejects if queue.offer reports a full queue.
        // we'll just override this with a handler that retries until it gets in.  ugly, but effective.
        // (there is an extensive analysis of the options here at
        //  http://today.java.net/pub/a/today/2008/10/23/creating-a-notifying-blocking-thread-pool-executor.html)
        this.setRejectedExecutionHandler(blockingExecutionHandler);
    }
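The comments in this constructor describe a handler that makes task submission block until the queue has room. The body of blockingExecutionHandler is elided above, so the following is only a sketch of that general technique using standard JDK calls; BlockingSubmitSketch, BLOCK_UNTIL_ROOM, the 100 ms retry interval, and the pool sizes are all assumptions made for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical sketch of a rejection handler that blocks the submitter instead of failing. */
final class BlockingSubmitSketch
{
    /** On rejection, keep offering the task to the queue until it fits or the pool shuts down. */
    static final RejectedExecutionHandler BLOCK_UNTIL_ROOM = (task, executor) -> {
        BlockingQueue<Runnable> queue = executor.getQueue();
        try
        {
            while (!queue.offer(task, 100, TimeUnit.MILLISECONDS))
            {
                if (executor.isShutdown())
                    throw new RejectedExecutionException("executor has shut down");
            }
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new RejectedExecutionException("interrupted while waiting for queue space", e);
        }
    };

    /** Submits more tasks than the one-slot queue can hold and returns how many completed. */
    static int runAll(int taskCount)
    {
        ThreadPoolExecutor executor =
            new ThreadPoolExecutor(1, 1, 1, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
        executor.setRejectedExecutionHandler(BLOCK_UNTIL_ROOM);
        AtomicInteger completed = new AtomicInteger();
        try
        {
            for (int i = 0; i < taskCount; i++)
            {
                executor.execute(() -> {
                    LockSupport.parkNanos(2_000_000L); // simulate a little work so the queue fills up
                    completed.incrementAndGet();
                });
            }
            executor.shutdown();
            if (!executor.awaitTermination(10, TimeUnit.SECONDS))
                throw new IllegalStateException("tasks did not finish in time");
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }
}
```

With the default AbortPolicy the same loop would throw RejectedExecutionException as soon as the single queue slot was taken. Here the submitting thread waits instead, which turns the bounded queue into back-pressure on the producer.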

public JMXEnabledThreadPoolExecutor(int corePoolSize,
                                        int maxPoolSize,
                                        long keepAliveTime,
                                        TimeUnit unit,
                                        BlockingQueue<Runnable> workQueue,
                                        NamedThreadFactory threadFactory,
                                        String jmxPath)
    {
        super(corePoolSize, maxPoolSize, keepAliveTime, unit, workQueue, threadFactory);
        super.prestartAllCoreThreads();
        metrics = new ThreadPoolMetrics(this, jmxPath, threadFactory.id);

        mbeanName = "org.apache.cassandra." + jmxPath + ":type=" + threadFactory.id;
        MBeanWrapper.instance.registerMBean(this, mbeanName);
    }

ThreadPoolMetrics: monitors the metrics of this thread pool, built on the MetricRegistry from the metrics-core jar.
MBeanWrapper: has two implementations, NoOpMBeanWrapper and PlatformMBeanWrapper. NoOpMBeanWrapper does nothing. PlatformMBeanWrapper uses the JMX MBeanServer class.

Cassandra startup

Cassandra is started through the main function of the CassandraDaemon class.

    public static void main(String[] args)
    {
        instance.activate();
    }

At startup, Cassandra needs to do the following:

Initialize the configuration

public void applyConfig()
    {
        DatabaseDescriptor.daemonInitialization();
    }

DatabaseDescriptor is Cassandra's configuration management class. The configuration properties of a Cassandra node are stored in the Config class.

public static void daemonInitialization() throws ConfigurationException
    {
        daemonInitialization(DatabaseDescriptor::loadConfig); // loadConfig loads the Config into DatabaseDescriptor
    }

Cassandra read and write operations

We can communicate with Cassandra through the thrift package. CassandraServer is the entry point for read and write commands.

1. Read

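Related topics:

SEDA model: the core idea behind Cassandra
ThreadPoolExecutor: thread management
Metric: monitoring of thread invocations
MBeanServer:
Netty: Cassandra's CQL3 protocol communication is built on the Netty framework

References: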

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services
Understanding Cassandra Code Base

Cassandra source code reading (unfinished)