Java Concurrency Framework Disruptor Source Code Analysis: RingBuffer

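  • Java Concurrency Framework Disruptor Source Code Analysis: RingBuffer
    • Introduction to Disruptor
    • Introduction to RingBuffer
    • RingBuffer Source Code Analysis
      • Initialization
      • Write Operation
      • Read Operation
      • Summary
    • References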

Introduction to Disruptor

The official documentation describes Disruptor as follows:

The LMAX Disruptor is a high performance inter-thread messaging library. It grew out of LMAX’s research into concurrency, performance and non-blocking algorithms and today forms a core part of their Exchange’s infrastructure.

Disruptor owes its high performance mainly to two things:
1. The lock-free data structure RingBuffer.
2. Cache-line padding, which avoids false sharing.

In this article we first introduce the ring buffer, then dig into the source code to see how Disruptor operates on its RingBuffer without locks.

Introduction to RingBuffer

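A ring buffer (also called a circular buffer) is a data structure that represents a fixed-size buffer whose head and tail are connected, which makes it well suited to buffering data streams. A ring buffer is usually implemented on top of an array, so it is friendly to the CPU cache and performs better than a linked list.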

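A ring buffer has four key parameters:
1. The memory address of the buffer.
2. The length of the buffer.
3. The start of the valid data stored in the buffer: the read pointer.
4. The end of the valid data stored in the buffer: the write pointer.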
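
These four parameters map directly onto a handful of fields. The following is a minimal single-threaded sketch, not Disruptor code; the class name SimpleRingBuffer and the methods offer and poll are made up for illustration.

```java
// Illustrative single-threaded ring buffer; all names are hypothetical, not from Disruptor.
final class SimpleRingBuffer {
    private final long[] slots;   // the backing memory
    private final int capacity;   // the buffer length
    private long readPos = 0;     // read pointer: position of the oldest valid element
    private long writePos = 0;    // write pointer: position where the next element goes

    SimpleRingBuffer(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
        this.slots = new long[capacity];
    }

    // Stores a value; returns false when the buffer is full.
    boolean offer(long value) {
        if (writePos - readPos == capacity) {
            return false;
        }
        slots[(int) (writePos % capacity)] = value;
        writePos++;
        return true;
    }

    // Removes the oldest value; returns null when the buffer is empty.
    Long poll() {
        if (readPos == writePos) {
            return null;
        }
        long value = slots[(int) (readPos % capacity)];
        readPos++;
        return value;
    }
}
```

Both pointers only ever increase; the position inside the array is obtained by wrapping them around the capacity, which is the same idea Disruptor applies to its sequence numbers.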

Next, let's dig into the source code of Disruptor's RingBuffer and see how it achieves lock-free reads and writes.

RingBuffer Source Code Analysis

Before reading the source, it helps to know how Disruptor is used; see the Disruptor Getting Started guide.

Initialization

Let's start with the constructor of the RingBuffer class:

public final class RingBuffer<E> extends RingBufferFields<E> implements Cursored, EventSequencer<E>, EventSink<E> 
{
    RingBuffer(EventFactory<E> eventFactory, Sequencer sequencer)
    {
        super(eventFactory, sequencer);
    }
}

abstract class RingBufferFields<E> extends RingBufferPad
{
    private static final int BUFFER_PAD;
    private static final long REF_ARRAY_BASE;
    private static final int REF_ELEMENT_SHIFT;
    private static final Unsafe UNSAFE = Util.getUnsafe();

    static
    {
        // Scale factor for addressing array elements, i.e. the number of bytes each element reference occupies
        final int scale = UNSAFE.arrayIndexScale(Object[].class);
        if (4 == scale)
        {
            REF_ELEMENT_SHIFT = 2;
        }
        else if (8 == scale)
        {
            REF_ELEMENT_SHIFT = 3;
        }
        else
        {
            throw new IllegalStateException("Unknown pointer size");
        }
        BUFFER_PAD = 128 / scale;
        // Including the buffer pad in the array base offset
        REF_ARRAY_BASE = UNSAFE.arrayBaseOffset(Object[].class) + (BUFFER_PAD << REF_ELEMENT_SHIFT);
    }

    RingBufferFields(EventFactory<E> eventFactory, Sequencer sequencer)
    {
        this.sequencer = sequencer;
        this.bufferSize = sequencer.getBufferSize();

        if (bufferSize < 1)
        {
            throw new IllegalArgumentException("bufferSize must not be less than 1");
        }
        if (Integer.bitCount(bufferSize) != 1)
        {
            // bufferSize must be a power of two
            throw new IllegalArgumentException("bufferSize must be a power of 2");
        }

        this.indexMask = bufferSize - 1;
        this.entries = new Object[sequencer.getBufferSize() + 2 * BUFFER_PAD];
        fill(eventFactory);
    }
    // Only the middle bufferSize elements are initialized; the padding slots at both ends stay null
    private void fill(EventFactory<E> eventFactory)
    {
        for (int i = 0; i < bufferSize; i++)
        {
            entries[BUFFER_PAD + i] = eventFactory.newInstance();
        }
    }    
}   

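A brief note on the two constructor parameters:
1. Sequencer: the controller that producers use to access the buffer. It holds references to the consumers' sequences and, after a new event is published, notifies any waiting SequenceBarrier through the WaitStrategy.
2. EventFactory: the factory that creates the initial elements stored in the RingBuffer.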

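The constructor enforces "bufferSize must be a power of 2". The purpose is to let the code locate a read or write element in memory with a bit operation, which is cheaper than the modulo (%) operation. In this respect RingBuffer matches kfifo in the Linux kernel.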
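
A small sketch of that idea, using a hypothetical helper named indexOf: when bufferSize is a power of two, sequence & (bufferSize - 1) selects the same slot as sequence % bufferSize for non-negative sequences.

```java
// Illustrative only: maps an ever-increasing sequence number to a slot index.
final class IndexMaskDemo {
    // bufferSize is assumed to be a power of two, so bufferSize - 1 is an all-ones bit mask.
    static int indexOf(long sequence, int bufferSize) {
        return (int) (sequence & (bufferSize - 1));
    }

    public static void main(String[] args) {
        int bufferSize = 8;
        for (long sequence = 0; sequence < 20; sequence++) {
            // The mask and the modulo agree for every non-negative sequence.
            if (indexOf(sequence, bufferSize) != sequence % bufferSize) {
                throw new AssertionError("mismatch at " + sequence);
            }
        }
        System.out.println("mask and modulo agree for sequences 0..19");
    }
}
```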

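The allocated entries array actually holds bufferSize + 2 * BUFFER_PAD slots. BUFFER_PAD array elements occupy 128 bytes, so 128 bytes of padding are added both before and after the usable part of the array, mainly to prevent false sharing.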
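
To make the arithmetic concrete, here is a sketch that mirrors the calculation with plain integers instead of Unsafe. The class and method names are illustrative; scale stands for the reference size reported by arrayIndexScale, which the code above accepts only as 4 or 8 bytes.

```java
// Illustrative arithmetic only; Disruptor itself computes byte offsets through Unsafe.
final class PaddingDemo {
    static final int PAD_BYTES = 128;

    // Number of array slots that make up 128 bytes of padding.
    static int bufferPad(int scale) {
        return PAD_BYTES / scale;
    }

    // Total length of the entries array, padding included.
    static int arrayLength(int bufferSize, int scale) {
        return bufferSize + 2 * bufferPad(scale);
    }

    // Array index that holds the event for a given sequence (bufferSize is a power of two).
    static int slotIndex(long sequence, int bufferSize, int scale) {
        return bufferPad(scale) + (int) (sequence & (bufferSize - 1));
    }
}
```

With 4-byte references and a bufferSize of 1024, for example, the pad is 32 slots on each side, the array has 1088 slots, and sequence 0 lives at index 32.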

Write Operation

The official documentation gives the following example of writing data:

import com.lmax.disruptor.RingBuffer;
import java.nio.ByteBuffer;

// The custom event type stored in the RingBuffer
public class LongEvent {
    private long value;

    public void set(long value) {
        this.value = value;
    }
}

public class LongEventProducer {
    private final RingBuffer<LongEvent> ringBuffer;

    public LongEventProducer(RingBuffer<LongEvent> ringBuffer) {
        this.ringBuffer = ringBuffer;
    }

    public void onData(ByteBuffer bb) {
        long sequence = ringBuffer.next();  // claim the next sequence to write
        try {
            LongEvent event = ringBuffer.get(sequence); // fetch the preallocated event for that sequence
            event.set(bb.getLong(0));  // write the data
        } finally {
            ringBuffer.publish(sequence);   // publish
        }
    }
}

As the code shows, a write to the RingBuffer takes three steps:
1. Claim the next slot.
2. Write the data.
3. Publish.

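To claim the next writable sequence, the producer calls the next method of RingBuffer, which delegates to the Sequencer method of the same name. Sequencer has two implementations: the single-producer SingleProducerSequencer and the multi-producer MultiProducerSequencer. The difference is that multiple producers must compete for the next write slot, while a single producer faces no such contention. Let's look at the multi-producer version first: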

public final class RingBuffer<E> extends RingBufferFields<E> implements Cursored, EventSequencer<E>, EventSink<E>
{
    @Override
    public long next()
    {
        return sequencer.next();
    }   
}   

public final class MultiProducerSequencer extends AbstractSequencer
{
    @Override
    public long next()
    {
        return next(1);
    }

    // Allows claiming several write slots at once
    @Override
    public long next(int n)
    {
        if (n < 1)
        {
            throw new IllegalArgumentException("n must be > 0");
        }

        long current;
        long next;

        do
        {
            // cursor is the current write pointer position
            current = cursor.get();
            next = current + n;

            long wrapPoint = next - bufferSize;
            // cachedGatingSequence is the cached position of the slowest consumer (read pointer)
            long cachedGatingSequence = gatingSequenceCache.get();

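            // The buffer may be full (or the cached value is stale): re-read the slowest consumer and wait if it really is full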
            if (wrapPoint > cachedGatingSequence || cachedGatingSequence > current)
            {
                long gatingSequence = Util.getMinimumSequence(gatingSequences, current);

                if (wrapPoint > gatingSequence)
                {
                    waitStrategy.signalAllWhenBlocking();
                    LockSupport.parkNanos(1); // TODO, should we spin based on the wait strategy?
                    continue;
                }

                gatingSequenceCache.set(gatingSequence);
            }
            // Otherwise advance cursor with a CAS operation
            else if (cursor.compareAndSet(current, next))
            {
                break;
            }
        }
        while (true);

        return next;
    }   
}   

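MultiProducerSequencer updates the write pointer with a CAS operation. This is its main difference from SingleProducerSequencer: with a single producer there is no write contention, so the position is set directly. The two cases are kept separate because a CAS still carries some cost, and a plain assignment is cheaper when there is no contention.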
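
The contrast can be sketched in isolation with java.util.concurrent.atomic.AtomicLong. This is a simplified illustration with hypothetical names (ClaimDemo, claimWithCas, SingleProducerCursor), and it deliberately leaves out the "is the buffer full" check that the real sequencers perform.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only; not the library's implementation.
final class ClaimDemo {
    // Multi-producer style: retry the CAS until this thread wins the next n slots.
    static long claimWithCas(AtomicLong cursor, int n) {
        long current;
        long next;
        do {
            current = cursor.get();
            next = current + n;
        } while (!cursor.compareAndSet(current, next));
        return next;
    }

    // Single-producer style: only one thread ever writes, so a plain field is enough.
    static final class SingleProducerCursor {
        private long nextValue = -1L;

        long claim(int n) {
            nextValue += n;
            return nextValue;
        }
    }

    // Starts several producers and returns how many distinct sequences they claimed.
    static int distinctClaims(int threads, int perThread) {
        AtomicLong cursor = new AtomicLong(-1L);
        Set<Long> claimed = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    claimed.add(claimWithCas(cursor, 1));
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return claimed.size();
    }
}
```

Because a failed CAS simply retries with a fresh value of the cursor, no two producers can ever be handed the same sequence, and no lock is held while they compete.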

Read Operation

To consume data, implement the EventHandler interface and register the handler with the Disruptor.

public class LongEventHandler implements EventHandler<LongEvent> {
    public void onEvent(LongEvent event, long sequence, boolean endOfBatch) {
        System.out.println(Thread.currentThread().getName() + " Event: " + event);
    }
}

Disruptor<LongEvent> disruptor = new Disruptor<>(factory, bufferSize, Executors.newFixedThreadPool(3));
// Connect the handler
disruptor.handleEventsWith(new LongEventHandler());

The code with which Disruptor attaches an EventHandler looks like this:

public class Disruptor<T>
{
    public EventHandlerGroup<T> handleEventsWith(final EventHandler<? super T>... handlers)
    {
        return createEventProcessors(new Sequence[0], handlers);
    }
    EventHandlerGroup<T> createEventProcessors(final Sequence[] barrierSequences,
        final EventHandler<? super T>[] eventHandlers)
    {
        checkNotStarted();

        // Each Sequence holds the position a consumer has most recently read; once read, the slot may be overwritten by producers
        final Sequence[] processorSequences = new Sequence[eventHandlers.length];
        // Consumers ask the SequenceBarrier for the next consumable sequence; all handlers created here share one SequenceBarrier
        final SequenceBarrier barrier = ringBuffer.newBarrier(barrierSequences);

        // Each eventHandler is an independent consumer; every event is delivered to all of the handlers
        for (int i = 0, eventHandlersLength = eventHandlers.length; i < eventHandlersLength; i++)
        {
            final EventHandler<? super T> eventHandler = eventHandlers[i];

            final BatchEventProcessor<T> batchEventProcessor =
                new BatchEventProcessor<T>(ringBuffer, barrier, eventHandler);

            if (exceptionHandler != null)
            {
                batchEventProcessor.setExceptionHandler(exceptionHandler);
            }

            consumerRepository.add(batchEventProcessor, eventHandler, barrier);
            processorSequences[i] = batchEventProcessor.getSequence();
        }

        updateGatingSequencesForNextInChain(barrierSequences, processorSequences);

        return new EventHandlerGroup<T>(this, consumerRepository, processorSequences);
    }       
}

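The class that actually polls for and processes data is BatchEventProcessor. Roughly, it does the following:
1. Obtains the sequence up to which data is readable.
2. Processes the events one by one.
3. Updates the position of the data already read.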

public final class BatchEventProcessor<T> implements EventProcessor {
    @Override
    public void run() {
        if (!running.compareAndSet(false, true)) {
            throw new IllegalStateException("Thread is already running");
        }
        sequenceBarrier.clearAlert();

        notifyStart();

        T event = null;
        long nextSequence = sequence.get() + 1L;
        try {
            while (true) {
                try {
                    // Wait for the next batch of readable events
                    final long availableSequence = sequenceBarrier.waitFor(nextSequence);
                    if (batchStartAware != null) {
                        batchStartAware.onBatchStart(availableSequence - nextSequence + 1);
                    }

                    // Process them one by one
                    while (nextSequence <= availableSequence) {
                        // Look up the event by its sequence
                        event = dataProvider.get(nextSequence);
                        // Hand the event to the eventHandler
                        eventHandler.onEvent(event, nextSequence, nextSequence == availableSequence);
                        nextSequence++;
                    }

                    // Record the position up to which events have been consumed
                    sequence.set(availableSequence);
                } catch (final TimeoutException e) {
                    notifyTimeout(sequence.get());
                } catch (final AlertException ex) {
                    if (!running.get()) {
                        break;
                    }
                } catch (final Throwable ex) {
                    exceptionHandler.handleEventException(ex, nextSequence, event);
                    sequence.set(nextSequence);
                    nextSequence++;
                }
            }
        } finally {
            notifyShutdown();
            running.set(false);
        }
    }
}

The key step here is obtaining the readable sequence, so let's take a closer look at the waitFor method of ProcessingSequenceBarrier:

final class ProcessingSequenceBarrier implements SequenceBarrier
{
    @Override
    public long waitFor(final long sequence)
        throws AlertException, InterruptedException, TimeoutException
    {
        checkAlert();
        // waitStrategy defaults to BlockingWaitStrategy
        long availableSequence = waitStrategy.waitFor(sequence, cursorSequence, dependentSequence, this);

        if (availableSequence < sequence)
        {
            return availableSequence;
        }

        return sequencer.getHighestPublishedSequence(sequence, availableSequence);
    }   
}   

This method mainly calls the waitFor method of the waitStrategy. Taking the default wait strategy as an example, the code is:

public final class BlockingWaitStrategy implements WaitStrategy
{
    private final Lock lock = new ReentrantLock();
    private final Condition processorNotifyCondition = lock.newCondition();

    @Override
    public long waitFor(long sequence, Sequence cursorSequence, Sequence dependentSequence, SequenceBarrier barrier)
        throws AlertException, InterruptedException
    {
        long availableSequence;
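        // cursorSequence acts as the write pointer and sequence is the position the consumer wants to read; if the former is smaller, nothing has been published there yet and the consumer must wait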
        if (cursorSequence.get() < sequence)
        {
            lock.lock();
            try
            {
                while (cursorSequence.get() < sequence)
                {
                    barrier.checkAlert();
                    processorNotifyCondition.await();
                }
            }
            finally
            {
                lock.unlock();
            }
        }

        // When consumers do not depend on each other, dependentSequence is simply cursorSequence
        // When they do, dependentSequence wraps the group of Sequences depended on, and get() returns the position of the slowest one
        while ((availableSequence = dependentSequence.get()) < sequence)
        {
            barrier.checkAlert();
        }

        return availableSequence;
    }
}    
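
The consumer-side wait above needs a matching wake-up when a producer publishes. The sketch below shows that pairing in isolation with a lock and a condition. It is a simplified stand-in with hypothetical names (BlockingWaitDemo, publish, demo), not the library's code, and it leaves out alerts and dependent sequences.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: one cursor, one lock, one condition.
final class BlockingWaitDemo {
    private final Lock lock = new ReentrantLock();
    private final Condition published = lock.newCondition();
    private final AtomicLong cursor = new AtomicLong(-1L);

    // Consumer side: block until the cursor has reached the requested sequence.
    long waitFor(long sequence) throws InterruptedException {
        lock.lock();
        try {
            while (cursor.get() < sequence) {
                published.await();
            }
        } finally {
            lock.unlock();
        }
        return cursor.get();
    }

    // Producer side: advance the cursor, then wake every waiting consumer.
    void publish(long sequence) {
        cursor.set(sequence);
        lock.lock();
        try {
            published.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Publishes sequence 0 from another thread and returns what the waiting consumer observes.
    static long demo() {
        BlockingWaitDemo demo = new BlockingWaitDemo();
        Thread producer = new Thread(() -> demo.publish(0L));
        producer.start();
        try {
            return demo.waitFor(0L);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1L;
        }
    }
}
```

The consumer re-checks the cursor in a loop while holding the lock, and the producer signals only after taking the same lock, so a wake-up cannot slip in between the check and the await.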

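BatchEventProcessor fits the case where a consumer group contains only one consumer. What happens when a single group has several consumers? That case uses WorkerPool: a WorkerPool contains multiple WorkProcessor consumers, and each WorkProcessor polls for and consumes data. The corresponding Disruptor method for creating such a consumer group is: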

public class Disruptor<T>
{
    public EventHandlerGroup<T> handleEventsWithWorkerPool(final WorkHandler<T>... workHandlers)
    {
        return createWorkerPool(new Sequence[0], workHandlers);
    }
    EventHandlerGroup<T> createWorkerPool(
        final Sequence[] barrierSequences, final WorkHandler<? super T>[] workHandlers)
    {
        final SequenceBarrier sequenceBarrier = ringBuffer.newBarrier(barrierSequences);
        final WorkerPool<T> workerPool = new WorkerPool<T>(ringBuffer, sequenceBarrier, exceptionHandler, workHandlers);

        consumerRepository.add(workerPool, sequenceBarrier);

        final Sequence[] workerSequences = workerPool.getWorkerSequences();

        updateGatingSequencesForNextInChain(barrierSequences, workerSequences);

        return new EventHandlerGroup<T>(this, consumerRepository, workerSequences);
    }       
}

public final class WorkerPool<T>
{
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final Sequence workSequence = new Sequence(Sequencer.INITIAL_CURSOR_VALUE);
    private final RingBuffer<T> ringBuffer;
    // WorkProcessors are created to wrap each of the provided WorkHandlers
    private final WorkProcessor<?>[] workProcessors;
    public WorkerPool(
        final RingBuffer<T> ringBuffer,
        final SequenceBarrier sequenceBarrier,
        final ExceptionHandler<? super T> exceptionHandler,
        final WorkHandler<? super T>... workHandlers)
    {
        this.ringBuffer = ringBuffer;
        final int numWorkers = workHandlers.length;
        workProcessors = new WorkProcessor[numWorkers];

        for (int i = 0; i < numWorkers; i++)
        {
            workProcessors[i] = new WorkProcessor<T>(
                ringBuffer,
                sequenceBarrier,
                workHandlers[i],
                exceptionHandler,
                workSequence);
        }
    }
}    

Now let's look at WorkProcessor, the class that polls for and processes data:

public final class WorkProcessor<T>
    implements EventProcessor
{
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final Sequence sequence = new Sequence(Sequencer.INITIAL_CURSOR_VALUE);
    private final RingBuffer<T> ringBuffer;
    private final SequenceBarrier sequenceBarrier;
    private final WorkHandler<? super T> workHandler;
    private final ExceptionHandler<? super T> exceptionHandler;
    private final Sequence workSequence;
    private final TimeoutHandler timeoutHandler;

    public WorkProcessor(
        final RingBuffer<T> ringBuffer,
        final SequenceBarrier sequenceBarrier,
        final WorkHandler<? super T> workHandler,
        final ExceptionHandler<? super T> exceptionHandler,
        final Sequence workSequence)
    {
        this.ringBuffer = ringBuffer;
        this.sequenceBarrier = sequenceBarrier;
        this.workHandler = workHandler;
        this.exceptionHandler = exceptionHandler;
        this.workSequence = workSequence;

        if (this.workHandler instanceof EventReleaseAware)
        {
            ((EventReleaseAware) this.workHandler).setEventReleaser(eventReleaser);
        }

        timeoutHandler = (workHandler instanceof TimeoutHandler) ? (TimeoutHandler) workHandler : null;
    }   
    @Override
    public void run()
    {
        // A processor may be run by only one thread
        if (!running.compareAndSet(false, true))
        {
            throw new IllegalStateException("Thread is already running");
        }
        sequenceBarrier.clearAlert();

        notifyStart();

        boolean processedSequence = true;
        long cachedAvailableSequence = Long.MIN_VALUE;
        long nextSequence = sequence.get();
        T event = null;
        while (true)
        {
            try
            {
                if (processedSequence)
                {
                    processedSequence = false;
                    do
                    {
                        nextSequence = workSequence.get() + 1L;
                        sequence.set(nextSequence - 1L);
                    }
                    // Consumers in one group share a single workSequence and use CAS to compete for the next sequence to read
                    while (!workSequence.compareAndSet(nextSequence - 1L, nextSequence));
                }

                // When the readable sequence cachedAvailableSequence is at or beyond nextSequence, process one event
                if (cachedAvailableSequence >= nextSequence)
                {
                    event = ringBuffer.get(nextSequence);
                    workHandler.onEvent(event);
                    processedSequence = true;
                }
                else
                {
                    // Ask the barrier for the readable sequence
                    cachedAvailableSequence = sequenceBarrier.waitFor(nextSequence);
                }
            }
            catch (final TimeoutException e)
            {
                notifyTimeout(sequence.get());
            }
            catch (final AlertException ex)
            {
                if (!running.get())
                {
                    break;
                }
            }
            catch (final Throwable ex)
            {
                // handle, mark as processed, unless the exception handler threw an exception
                exceptionHandler.handleEventException(ex, nextSequence, event);
                processedSequence = true;
            }
        }

        notifyShutdown();
        running.set(false);
    }   
}   

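WorkProcessor differs from the single-consumer BatchEventProcessor in two ways:
1. Besides asking the sequenceBarrier for the readable sequence, consumers in the same group must access events mutually exclusively, which the shared workSequence guarantees.
2. BatchEventProcessor can process a whole batch of events per request, whereas here only one event is processed at a time.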
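
The effect of the shared workSequence can be sketched without the library. In this illustration (hypothetical names WorkClaimDemo, claimNext, eachEventHandledOnce) every worker claims one sequence at a time by CAS, so each event is handled by exactly one worker in the group.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only; waiting for producers and error handling are omitted.
final class WorkClaimDemo {
    // Claims exactly one sequence from the counter shared by the whole group.
    static long claimNext(AtomicLong workSequence) {
        long next;
        do {
            next = workSequence.get() + 1L;
        } while (!workSequence.compareAndSet(next - 1L, next));
        return next;
    }

    // Lets several workers drain `events` sequences and checks that none was handled twice or skipped.
    static boolean eachEventHandledOnce(int workers, int events) {
        AtomicLong workSequence = new AtomicLong(-1L);
        AtomicIntegerArray handled = new AtomicIntegerArray(events);
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                long sequence;
                while ((sequence = claimNext(workSequence)) < events) {
                    handled.incrementAndGet((int) sequence);
                }
            });
            threads[w].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        for (int i = 0; i < events; i++) {
            if (handled.get(i) != 1) {
                return false;
            }
        }
        return true;
    }
}
```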

Summary

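On the producer side, the Sequencer object plays the role of the write pointer; on the consumer side, the Sequence object plays the role of the read pointer. SequenceBarrier establishes the dependencies among consumers and between consumers and the RingBuffer: it computes the next consumable position from the producers' write pointer and the read pointers of the consumers it depends on.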
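
As a rough illustration of that calculation, the following sketch treats every position as a plain long. The name availableSequence is hypothetical, and the real barrier additionally handles waiting and alerts.

```java
// Illustrative only: a consumer may read no further than the producer's cursor
// and no further than the slowest consumer it depends on.
final class BarrierDemo {
    static long availableSequence(long cursor, long... dependentSequences) {
        long available = cursor;
        for (long dependent : dependentSequences) {
            available = Math.min(available, dependent);
        }
        return available;
    }
}
```

With no dependencies the result is just the cursor; with dependencies it is the position of the slowest upstream consumer, since an upstream consumer can never be ahead of the cursor.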

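With multiple producers, thread safety is the job of MultiProducerSequencer; with multiple consumers in one group, it is the job of WorkProcessor. Both resolve contention for read and write slots with CAS operations, which cost less than a heavyweight lock.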

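The chosen WaitStrategy determines how consumers wait when the RingBuffer is empty. When the RingBuffer is full, the producer code shown above simply retries after LockSupport.parkNanos(1).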

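Both the producer side and the consumer side distinguish the contended case from the uncontended one: SingleProducerSequencer versus MultiProducerSequencer, and BatchEventProcessor versus WorkProcessor. The point is to optimize the uncontended case: under contention a CAS is used, and without contention even the CAS is unnecessary, which is faster still.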

References

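  1. Disruptor Getting Started, official documentation
  2. Disruptor Getting Started, official documentation (Chinese translation)
  3. The Disruptor concurrency framework (translated articles)
  4. Wiki: Circular buffer
  5. Disruptor user guide
  6. Implementation details of Disruptor 3.0 (with many class diagrams)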
