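Netty Series: ByteBuf

逅弈 · Please credit the original source when reposting. Thank you!

What Is ByteBuf

Data travels across the network as a stream of bytes. Java's standard NIO library provides the ByteBuffer class as a byte container, but ByteBuffer is awkward to use, mainly because flip() must be called to switch between writing and reading. Netty therefore designed its own byte container, ByteBuf. ByteBuf is an abstract class with the following characteristics: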

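It defines two separate indexes, one for reading and one for writing
No flip() call is needed to switch between reading and writing
It implements the ReferenceCounted interface and so supports reference counting
It supports pooling
Its methods can be chained
Its capacity grows on demand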

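The Structure of ByteBuf

A ByteBuf is built around two pointers: the read pointer readerIndex and the write pointer writerIndex. Bytes can be read from a ByteBuf in two ways, random access and sequential access. Random access uses an index, just like reading a byte[] array; sequential access reads the bytes through the read pointer. Why two pointers? I can think of two reasons: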

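ByteBuffer has only one index, so flip() must be called to switch back and forth between write mode and read mode. With two indexes, reads and writes do not interfere with each other and can be interleaved without any mode switch.
All network communication involves moving sequences of bytes, and the two pointers support exactly that, very efficiently: incoming bytes are appended at writerIndex while bytes already received are consumed from readerIndex.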
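
The flip() requirement behind the first reason can be observed with the JDK class alone. The sketch below uses only java.nio.ByteBuffer; the class name FlipDemo and its helper methods are made up for illustration.

```java
import java.nio.ByteBuffer;

final class FlipDemo {
    /** Writes three bytes, flips, then reads: returns the first byte written. */
    static byte readAfterFlip() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put((byte) 1).put((byte) 2).put((byte) 3); // position = 3, limit = 8
        buf.flip();                                    // position = 0, limit = 3
        return buf.get();
    }

    /** Same writes but no flip: the read starts at position 3, past the written data. */
    static byte readWithoutFlip() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put((byte) 1).put((byte) 2).put((byte) 3);
        return buf.get(); // reads the untouched slot at index 3
    }

    public static void main(String[] args) {
        System.out.println(readAfterFlip());   // prints 1
        System.out.println(readWithoutFlip()); // prints 0
    }
}
```

With a single position shared by reads and writes, forgetting flip() silently reads the wrong region, which is the kind of mistake the two-index design avoids.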

The simplified diagram below shows how readerIndex and writerIndex divide a ByteBuf into three regions:


 +-------------------+------------------+------------------+
 | discardable bytes |  readable bytes  |  writable bytes  |
 |                   |     (CONTENT)    |                  |
 +-------------------+------------------+------------------+
 |                   |                  |                  |
 0      <=      readerIndex   <=   writerIndex    <=    capacity

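The three regions in the diagram are explained below:
discardable bytes
Bytes that have already been read and can be discarded. Everything in the range 0 to readerIndex counts as discardable. Calling discardReadBytes() reclaims this space; like ByteBuffer's compact() method, it moves the readable bytes to the front, sets readerIndex to 0, and sets writerIndex to oldWriterIndex - oldReaderIndex.
readable bytes
Bytes that have not been read yet; this is the actual content held by the ByteBuf. Note that once the readable bytes are exhausted, another read attempt throws IndexOutOfBoundsException, so it is best to check byteBuf.isReadable() before reading: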

while (byteBuf.isReadable()) {
    // read content
}

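writable bytes
The space still available for writing in the current ByteBuf. Once writerIndex reaches capacity, nothing more can be written at the current capacity. The default maximum capacity of a ByteBuf is Integer.MAX_VALUE. If no more writable space can be made available and data is still written to the ByteBuf, IndexOutOfBoundsException is thrown, so it is best to check byteBuf.isWritable() before writing: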

while (byteBuf.isWritable()) {
    // write content
}
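
To make the three regions concrete, here is a minimal toy model of a two-index buffer. It is only a sketch: TwoIndexBuffer is a hypothetical class, not Netty's ByteBuf, and it assumes a fixed capacity with no on-demand growth. The method names mirror the ByteBuf methods mentioned above.

```java
/** Toy model of the two-index layout; NOT Netty's ByteBuf (fixed capacity, no growth). */
final class TwoIndexBuffer {
    private final byte[] data;
    private int readerIndex;
    private int writerIndex;

    TwoIndexBuffer(int capacity) {
        data = new byte[capacity];
    }

    boolean isReadable() { return readerIndex < writerIndex; }
    boolean isWritable() { return writerIndex < data.length; }
    int readerIndex() { return readerIndex; }
    int writerIndex() { return writerIndex; }

    void writeByte(int value) {
        if (!isWritable()) {
            throw new IndexOutOfBoundsException("no writable bytes");
        }
        data[writerIndex++] = (byte) value;
    }

    byte readByte() {
        if (!isReadable()) {
            throw new IndexOutOfBoundsException("no readable bytes");
        }
        return data[readerIndex++];
    }

    /** Moves the readable bytes to the front, reclaiming the discardable region. */
    void discardReadBytes() {
        int readable = writerIndex - readerIndex;
        System.arraycopy(data, readerIndex, data, 0, readable);
        writerIndex = readable; // oldWriterIndex - oldReaderIndex
        readerIndex = 0;
    }

    public static void main(String[] args) {
        TwoIndexBuffer buf = new TwoIndexBuffer(4);
        for (int i = 1; i <= 4; i++) {
            buf.writeByte(i);
        }
        buf.readByte();
        buf.readByte();
        buf.discardReadBytes();
        System.out.println(buf.readerIndex() + " " + buf.writerIndex()); // prints 0 2
    }
}
```

After two of four bytes are read, discardReadBytes() turns the two discardable slots back into writable space without touching the unread content.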

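ByteBuf has many methods for getting bytes, but not all of them move readerIndex:
Methods whose names start with get only read the bytes at the given index; they do not move the read pointer.
Methods whose names start with read or skip read or skip the specified bytes and advance the read pointer. If the argument to a read method is a ByteBuf or a byte[] (as opposed to calls such as readByte() or readInt()), the bytes are transferred from the current ByteBuf into the destination ByteBuf or byte[], and the read pointer of the current ByteBuf advances by the number of bytes transferred.

Setting and writing work the same way: methods starting with set only update the bytes at the given index, while methods starting with write write at the current writerIndex and advance writerIndex.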

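In the JDK's ByteBuffer, mark() saves the current position in the mark variable, and reset() restores the position to the saved value.
Netty's ByteBuf has a similar API. Because ByteBuf has both a read index and a write index, there are four related methods: markReaderIndex, resetReaderIndex, markWriterIndex, and resetWriterIndex.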
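
The JDK behavior that these methods mirror is easy to check without any dependency. The sketch below uses java.nio.ByteBuffer only (MarkResetDemo is a made-up name): an absolute get(int) leaves the position alone, much like the get methods of ByteBuf, a relative get() advances it, much like the read methods, and mark()/reset() save and restore it.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class MarkResetDemo {
    /** Records the buffer position after each step. */
    static int[] positions() {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {10, 20, 30, 40});
        int[] seen = new int[4];

        buf.get(2);               // absolute read: position is not moved
        seen[0] = buf.position();

        buf.get();                // relative read: position advances by one
        seen[1] = buf.position();

        buf.mark();               // remember the current position
        buf.get();
        buf.get();
        seen[2] = buf.position();

        buf.reset();              // jump back to the marked position
        seen[3] = buf.position();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(positions())); // prints [0, 1, 3, 1]
    }
}
```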

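Creating a ByteBuf

Now that we know what a ByteBuf is and how it is structured, how do we create one? The recommended approach is to go through a ByteBufAllocator, or the Unpooled helper class built on top of one, rather than calling an implementation's constructor. Looking at the source of Unpooled, we can see that it offers many static methods for creating a ByteBuf, but they all boil down to the following few, differing only in their parameters: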

// Allocate a ByteBuf on the heap with the given initial and maximum capacity
public static ByteBuf buffer(int initialCapacity, int maxCapacity) {
    return ALLOC.heapBuffer(initialCapacity, maxCapacity);
}
// Allocate a ByteBuf off the heap (direct memory) with the given initial and maximum capacity
public static ByteBuf directBuffer(int initialCapacity, int maxCapacity) {
    return ALLOC.directBuffer(initialCapacity, maxCapacity);
}
// Wrap a byte[] in a ByteBuf and return it
public static ByteBuf wrappedBuffer(byte[] array) {
    if (array.length == 0) {
        return EMPTY_BUFFER;
    }
    return new UnpooledHeapByteBuf(ALLOC, array, array.length);
}
// Return a composite ByteBuf with the given maximum number of components
public static CompositeByteBuf compositeBuffer(int maxNumComponents) {
    return new CompositeByteBuf(ALLOC, false, maxNumComponents);
}
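
The heap, direct, and wrapping styles have close counterparts in the JDK, which makes the differences easy to try out without Netty on the classpath. The sketch below uses java.nio.ByteBuffer as a stand-in (CreationStylesDemo is a hypothetical name); it is an analogy, not the Unpooled API.

```java
import java.nio.ByteBuffer;

final class CreationStylesDemo {
    /** A heap buffer is backed by an accessible byte[]. */
    static boolean heapIsArrayBacked() {
        return ByteBuffer.allocate(16).hasArray();
    }

    /** A direct buffer lives outside the Java heap. */
    static boolean directIsDirect() {
        return ByteBuffer.allocateDirect(16).isDirect();
    }

    /** Wrapping shares the array: a later change to the array shows through the buffer. */
    static byte wrapSharesArray() {
        byte[] array = {1, 2, 3};
        ByteBuffer wrapped = ByteBuffer.wrap(array);
        array[0] = 42;
        return wrapped.get(0);
    }

    public static void main(String[] args) {
        System.out.println(heapIsArrayBacked()); // prints true
        System.out.println(directIsDirect());    // prints true
        System.out.println(wrapSharesArray());   // prints 42
    }
}
```

The last method shows why wrapping is cheap: no bytes are copied, so the wrapper and the original array see the same data.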

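Besides the wrapping method above, there are other commonly used ones, for example overloads that take a ByteBuf, overloads that take a native ByteBuffer, and overloads that take a memory address and a size.
There are also methods whose names start with copy. These call buffer(int initialCapacity, int maxCapacity) or directBuffer(int initialCapacity, int maxCapacity) and then write the given content into the newly created ByteBuf before returning it.
All of these methods rely on a static variable named ALLOC to actually create the ByteBuf, and ALLOC is a ByteBufAllocator: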

private static final ByteBufAllocator ALLOC = UnpooledByteBufAllocator.DEFAULT;

ByteBufAllocator is the interface dedicated to allocating ByteBuf instances; its unpooled implementation is UnpooledByteBufAllocator. In that class, UnpooledByteBufAllocator.DEFAULT is a static final field:

/**
 * Default instance which uses leak-detection for direct buffers.
 */
public static final UnpooledByteBufAllocator DEFAULT = new UnpooledByteBufAllocator(PlatformDependent.directBufferPreferred());

UnpooledByteBufAllocator in turn extends the abstract class AbstractByteBufAllocator. Taking the creation of a heap ByteBuf as an example, here is the implementation, which lives in AbstractByteBufAllocator:

@Override
public ByteBuf heapBuffer(int initialCapacity, int maxCapacity) {
    if (initialCapacity == 0 && maxCapacity == 0) {
        return emptyBuf;
    }
    validate(initialCapacity, maxCapacity);
    return newHeapBuffer(initialCapacity, maxCapacity);
}
// Abstract method that creates the heap ByteBuf
protected abstract ByteBuf newHeapBuffer(int initialCapacity, int maxCapacity);

The abstract method is implemented by each concrete ByteBufAllocator; for the unpooled case, that is UnpooledByteBufAllocator:

@Override
protected ByteBuf newHeapBuffer(int initialCapacity, int maxCapacity) {
    // If the platform supports Unsafe, create the Unsafe-based heap ByteBuf;
    // otherwise create the plain heap ByteBuf
    return PlatformDependent.hasUnsafe()
            ? new InstrumentedUnpooledUnsafeHeapByteBuf(this, initialCapacity, maxCapacity)
            : new InstrumentedUnpooledHeapByteBuf(this, initialCapacity, maxCapacity);
}

Both classes ultimately end up in UnpooledHeapByteBuf. The only difference is that they extend different parent classes, which read and write bytes differently:

private static final class InstrumentedUnpooledUnsafeHeapByteBuf extends UnpooledUnsafeHeapByteBuf {
    // omitted
}
private static final class InstrumentedUnpooledHeapByteBuf extends UnpooledHeapByteBuf {
    // omitted
}

For example, UnpooledUnsafeHeapByteBuf and UnpooledHeapByteBuf implement _getByte() differently:

// Implementation in UnpooledUnsafeHeapByteBuf
@Override
protected byte _getByte(int index) {
    return UnsafeByteBufUtil.getByte(array, index);
}
// UnsafeByteBufUtil.getByte eventually calls PlatformDependent0.getByte
static byte getByte(byte[] data, int index) {
    return UNSAFE.getByte(data, BYTE_ARRAY_BASE_OFFSET + index);
}
// Implementation in UnpooledHeapByteBuf
@Override
protected byte _getByte(int index) {
    return HeapByteBufUtil.getByte(array, index);
}
// HeapByteBufUtil.getByte simply reads the byte from the array by index
static byte getByte(byte[] memory, int index) {
    return memory[index];
}
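
The allocation path traced in this section follows the template-method pattern: the abstract base class validates the arguments once, then hands off to a hook that each concrete allocator implements. A stripped-down sketch of that shape follows. SketchAllocator and PlainSketchAllocator are hypothetical names, a plain byte[] stands in for the ByteBuf, and the validation rule is a simplifying assumption rather than a copy of Netty's validate().

```java
/** Toy stand-in for the abstract allocator; names and rules are illustrative only. */
abstract class SketchAllocator {
    /** Entry point: checks the arguments, then delegates to the subclass hook. */
    final byte[] heapBuffer(int initialCapacity, int maxCapacity) {
        if (initialCapacity < 0 || initialCapacity > maxCapacity) {
            throw new IllegalArgumentException(
                    "initialCapacity: " + initialCapacity + ", maxCapacity: " + maxCapacity);
        }
        return newHeapBuffer(initialCapacity, maxCapacity);
    }

    /** Hook implemented by each concrete allocator. */
    protected abstract byte[] newHeapBuffer(int initialCapacity, int maxCapacity);
}

/** Simplest possible implementation: always allocates a fresh array. */
final class PlainSketchAllocator extends SketchAllocator {
    @Override
    protected byte[] newHeapBuffer(int initialCapacity, int maxCapacity) {
        return new byte[initialCapacity];
    }

    public static void main(String[] args) {
        SketchAllocator alloc = new PlainSketchAllocator();
        System.out.println(alloc.heapBuffer(16, 64).length); // prints 16
    }
}
```

Because callers only see heapBuffer(), a pooled or Unsafe-based variant can be swapped in by providing another subclass, without changing any calling code.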

Types of ByteBuf

ByteBuf has three main subclasses:

EmptyByteBuf
WrappedByteBuf
AbstractByteBuf

The most commonly used is AbstractByteBuf. AbstractByteBuf has a subclass, AbstractReferenceCountedByteBuf, whose main job is to implement reference counting, and AbstractReferenceCountedByteBuf in turn has many subclasses:

CompositeByteBuf
ReadOnlyByteBufferBuf
UnpooledHeapByteBuf
UnpooledDirectByteBuf
PooledByteBuf
UnpooledUnsafeDirectByteBuf
AbstractPooledDerivedByteBuf
FixedCompositeByteBuf

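Each of these classes has its own subclasses as well. The full inheritance hierarchy can be generated with IntelliJ IDEA's Show Diagram feature, so it is not described in detail here.

One class that deserves special attention is CompositeByteBuf. Netty uses it to combine multiple ByteBuf instances without copying their contents. Internally it keeps a field named components of type ComponentList, an inner class that extends ArrayList, as shown below: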

private static final class ComponentList extends ArrayList<Component> {
    ComponentList(int initialCapacity) {
        super(initialCapacity);
    }
    // Expose this methods so we not need to create a new subList just to remove a range of elements.
    @Override
    public void removeRange(int fromIndex, int toIndex) {
        super.removeRange(fromIndex, toIndex);
    }
}

As shown above, ComponentList is a List of Component objects. Component is also an inner class, and it holds a final ByteBuf field, as shown below:

private static final class Component {
    final ByteBuf buf;
    final int length;
    int offset;
    int endOffset;
    Component(ByteBuf buf) {
        this.buf = buf;
        length = buf.readableBytes();
    }
    void freeIfNecessary() {
        // We should not get a NPE here. If so, it must be a bug.
        buf.release(); 
    }
}

Adding a ByteBuf to a CompositeByteBuf actually wraps the ByteBuf in a Component and adds that to components, as the following code shows:

private int addComponent0(boolean increaseWriterIndex, int cIndex, ByteBuf buffer) {
    assert buffer != null;
    boolean wasAdded = false;
    try {
        checkComponentIndex(cIndex);
        int readableBytes = buffer.readableBytes();
        // No need to consolidate - just add a component to the list.
        // Wrap the ByteBuf in a Component
        @SuppressWarnings("deprecation")
        Component c = new Component(buffer.order(ByteOrder.BIG_ENDIAN).slice());
        if (cIndex == components.size()) {
            wasAdded = components.add(c);
            if (cIndex == 0) {
                c.endOffset = readableBytes;
            } else {
                Component prev = components.get(cIndex - 1);
                c.offset = prev.endOffset;
                c.endOffset = c.offset + readableBytes;
            }
        } else {
            components.add(cIndex, c);
            wasAdded = true;
            if (readableBytes != 0) {
                updateComponentOffsets(cIndex);
            }
        }
        if (increaseWriterIndex) {
            writerIndex(writerIndex() + buffer.readableBytes());
        }
        return cIndex;
    } finally {
        if (!wasAdded) {
            buffer.release();
        }
    }
}
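
The offset bookkeeping in addComponent0 is what allows a composite to serve reads without copying anything. The toy below illustrates the idea for the append case only. CompositeSketch and Part are hypothetical names, plain byte[] arrays stand in for the component ByteBufs, and the lookup is a linear scan chosen for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy composite view over several byte arrays; not Netty's CompositeByteBuf. */
final class CompositeSketch {
    private static final class Part {
        final byte[] bytes;
        final int offset;    // index of this part's first byte within the composite
        final int endOffset; // offset + length

        Part(byte[] bytes, int offset) {
            this.bytes = bytes;
            this.offset = offset;
            this.endOffset = offset + bytes.length;
        }
    }

    private final List<Part> parts = new ArrayList<>();

    /** Appends a part whose offset starts where the previous part ended. Nothing is copied. */
    void addComponent(byte[] bytes) {
        int offset = parts.isEmpty() ? 0 : parts.get(parts.size() - 1).endOffset;
        parts.add(new Part(bytes, offset));
    }

    int capacity() {
        return parts.isEmpty() ? 0 : parts.get(parts.size() - 1).endOffset;
    }

    /** Finds the part that covers the index and reads from its backing array. */
    byte getByte(int index) {
        for (Part p : parts) {
            if (index >= p.offset && index < p.endOffset) {
                return p.bytes[index - p.offset];
            }
        }
        throw new IndexOutOfBoundsException("index: " + index);
    }

    public static void main(String[] args) {
        CompositeSketch composite = new CompositeSketch();
        composite.addComponent(new byte[] {1, 2});    // e.g. a header
        composite.addComponent(new byte[] {3, 4, 5}); // e.g. a body
        System.out.println(composite.capacity());     // prints 5
        System.out.println(composite.getByte(2));     // prints 3
    }
}
```

Since each part keeps a reference to its original array, the composite behaves like one contiguous buffer while the header and body stay where they were allocated.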

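Pooling of ByteBuf

Objects that are used frequently or are expensive to create are commonly pooled to improve performance; familiar examples include thread pools, database connection pools, and the string constant pool. In Netty, ByteBuf is one such frequently used object, since all inbound and outbound data passes through it, and fortunately Netty pools ByteBuf as well.
Reference counting is the key technique behind ByteBuf pooling in Netty. It is not limited to pooled buffers, though; unpooled ones are reference counted too.
ByteBuf implements the ReferenceCounted interface, which marks a class as one whose lifetime is managed by reference counting.
Let's look at the methods declared by ReferenceCounted: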

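public interface ReferenceCounted {
    // Return the current reference count; 0 means the object has been deallocated
    int refCnt();
    // Increase the reference count by 1
    ReferenceCounted retain();
    // Increase the reference count by increment
    ReferenceCounted retain(int increment);
    // Decrease the reference count by 1
    boolean release();
    // Decrease the reference count by decrement; when the count reaches 0 the object
    // is deallocated and true is returned
    boolean release(int decrement);
}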

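Every reference-counted object maintains its own count. The count is 1 when the object is created; retain() increments it and release() decrements it, and when it reaches 0 the object is deallocated.
Each ByteBuf subclass decides for itself how to deallocate. A pooled ByteBuf is returned to the pool; an unpooled one drops its reference to the underlying byte array or frees the corresponding off-heap memory.
For reference-counted ByteBufs, this is implemented by the release() method of AbstractReferenceCountedByteBuf, which delegates to release0(). Here is the implementation: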

private boolean release0(int decrement) {
    // AtomicIntegerFieldUpdater.getAndAdd returns the previous value and then performs the add
    int oldRef = refCntUpdater.getAndAdd(this, -decrement);
    // If oldRef == decrement, this call releases the last references,
    // so the object can be deallocated by calling deallocate()
    if (oldRef == decrement) {
        deallocate();
        return true;
    // If the previous count is smaller than the amount being released, or decrement is negative,
    // an IllegalReferenceCountException is thrown
    } else if (oldRef < decrement || oldRef - decrement > oldRef) {
        // Ensure we don't over-release, and avoid underflow.
        // The reference count is restored to its previous value here
        refCntUpdater.getAndAdd(this, decrement);
        throw new IllegalReferenceCountException(oldRef, decrement);
    }
    return false;
}
// Deallocation is an abstract method implemented by each subclass
protected abstract void deallocate();
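
The same retain/release life cycle can be modeled in a few lines with an AtomicInteger. This is only a sketch of the counting rules described above: RefCountedSketch is a hypothetical class, IllegalStateException stands in for Netty's IllegalReferenceCountException, and deallocation is reduced to setting a flag.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy reference-counted resource; a simplified model, not Netty's implementation. */
final class RefCountedSketch {
    private final AtomicInteger refCnt = new AtomicInteger(1); // count is 1 on creation
    private volatile boolean deallocated;

    int refCnt() { return refCnt.get(); }
    boolean isDeallocated() { return deallocated; }

    /** Adds one reference; fails if the object has already been released completely. */
    RefCountedSketch retain() {
        for (;;) {
            int current = refCnt.get();
            if (current == 0) {
                throw new IllegalStateException("already released");
            }
            if (refCnt.compareAndSet(current, current + 1)) {
                return this;
            }
        }
    }

    /** Drops one reference; returns true only for the call that releases the last one. */
    boolean release() {
        int old = refCnt.getAndDecrement();
        if (old == 1) {
            deallocated = true; // stand-in for deallocate()
            return true;
        }
        if (old < 1) {
            refCnt.getAndIncrement(); // undo the decrement before reporting the error
            throw new IllegalStateException("released too many times");
        }
        return false;
    }

    public static void main(String[] args) {
        RefCountedSketch resource = new RefCountedSketch();
        resource.retain();                      // count: 2
        System.out.println(resource.release()); // prints false
        System.out.println(resource.release()); // prints true
    }
}
```

As in release0, an over-release first undoes its own decrement and only then throws, so the counter never stays negative.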

Now let's see how the individual ByteBuf implementations handle deallocation.

Unpooled heap ByteBuf (UnpooledHeapByteBuf):

@Override
protected void deallocate() {
    freeArray(array);
    // Drop the reference to the byte[]
    array = null;
}

Unpooled direct ByteBuf (UnpooledDirectByteBuf):

@Override
protected void deallocate() {
    ByteBuffer buffer = this.buffer;
    if (buffer == null) {
        return;
    }
    this.buffer = null;
    // Free the direct buffer
    if (!doNotFree) {
        freeDirect(buffer);
    }
}

The pooled heap ByteBuf (PooledHeapByteBuf) and the pooled direct ByteBuf (PooledDirectByteBuf) both use the implementation in PooledByteBuf:

@Override
protected final void deallocate() {
    if (handle >= 0) {
        final long handle = this.handle;
        this.handle = -1;
        memory = null;
        tmpNioBuf = null;
        chunk.arena.free(chunk, handle, maxLength, cache);
        chunk = null;
        // Recycle this ByteBuf object, i.e. put it back into the pool
        recycle();
    }
}
private void recycle() {
    recyclerHandle.recycle(this);
}
static final class DefaultHandle<T> implements Handle<T> {
    private Stack<?> stack;
    // This field holds the pooled ByteBuf object
    private Object value;
    DefaultHandle(Stack<?> stack) {
        this.stack = stack;
    }
    @Override
    public void recycle(Object object) {
        if (object != value) {
            throw new IllegalArgumentException("object does not belong to handle");
        }
        // Push this handle onto the stack
        stack.push(this);
    }
}

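As we can see, releasing a pooled ByteBuf is really a matter of recycling the object.
However, DefaultHandle.recycle() never assigns value; it only compares object with value. So when is the ByteBuf stored?
The answer lies in where DefaultHandle's value field is assigned:

A pooled ByteBuf is obtained from the Recycler through get() from the very start. At that point the stack holds no ByteBuf yet, so the call ends up in the newObject method supplied by the ByteBuf class, which creates the actual ByteBuf object; that object is then stored in the value field of a new handle. Once the ByteBuf has been recycled, a later request only needs to pop a handle from the stack and return its value field, as the following code shows: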

class PooledHeapByteBuf extends PooledByteBuf<byte[]> {
    static PooledHeapByteBuf newInstance(int maxCapacity) {
        // Obtain a ByteBuf from the Recycler
        PooledHeapByteBuf buf = RECYCLER.get();
        buf.reuse(maxCapacity);
        return buf;
    }
}
public abstract class Recycler<T> {
    public final T get() {
        if (maxCapacityPerThread == 0) {
            return newObject((Handle<T>) NOOP_HANDLE);
        }
        // Get the stack from the ThreadLocal
        Stack<T> stack = threadLocal.get();
        // Pop a handle from the stack
        DefaultHandle<T> handle = stack.pop();
        // If there is no handle, create a new one
        if (handle == null) {
            handle = stack.newHandle();
            handle.value = newObject(handle);
        }
        // Return the handle's value field, which is a ByteBuf
        return (T) handle.value;
    }
}
static final class Stack<T> {
    // Pop a handle from the stack; its value field is a ByteBuf
    DefaultHandle<T> pop() {
        int size = this.size;
        if (size == 0) {
            if (!scavenge()) {
                return null;
            }
            size = this.size;
        }
        size --;
        DefaultHandle ret = elements[size];
        elements[size] = null;
        if (ret.lastRecycledId != ret.recycleId) {
            throw new IllegalStateException("recycled multiple times");
        }
        ret.recycleId = 0;
        ret.lastRecycledId = 0;
        this.size = size;
        return ret;
    }
}

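When a ByteBuf is released and recycled, the handle only compares the object being recycled with the one stored in its value field and then pushes itself onto the stack. The following code pushes the handle onto the stack: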

static final class Stack<T> {
    // Push the handle onto the stack
    void push(DefaultHandle<?> item) {
        Thread currentThread = Thread.currentThread();
        if (threadRef.get() == currentThread) {
            // The current thread owns this stack, so push immediately
            pushNow(item);
        } else {
            // The current thread does not own this stack, so push later
            pushLater(item, currentThread);
        }
    }
}
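
Stripped of handles and cross-thread hand-off, the get/recycle cycle comes down to a per-thread stack of reusable instances. The sketch below is a deliberately simplified model of that idea. PoolSketch is a hypothetical class, not Netty's Recycler: it has no capacity limit, and an object recycled on another thread simply lands on that thread's stack instead of being queued for its owner.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Toy per-thread object pool; a simplified model of the idea, not Netty's Recycler. */
final class PoolSketch<T> {
    private final ThreadLocal<Deque<T>> stack =
            ThreadLocal.withInitial(() -> new ArrayDeque<T>());
    private final Supplier<T> factory; // plays the role of newObject()

    PoolSketch(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Pops a recycled instance if one is available, otherwise creates a new one. */
    T get() {
        T pooled = stack.get().pollFirst();
        return pooled != null ? pooled : factory.get();
    }

    /** Puts an instance on the calling thread's stack for later reuse. */
    void recycle(T object) {
        stack.get().addFirst(object);
    }

    public static void main(String[] args) {
        PoolSketch<StringBuilder> pool = new PoolSketch<>(StringBuilder::new);
        StringBuilder first = pool.get(); // stack is empty, so the factory is used
        pool.recycle(first);
        System.out.println(pool.get() == first); // prints true
    }
}
```

Keeping the stack in a ThreadLocal means the common path needs no locking, which is the same motivation behind the pushNow branch above.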

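As we have seen, ByteBuf allocation goes through the ByteBufAllocator interface, which has two implementations: PooledByteBufAllocator and UnpooledByteBufAllocator. Netty 4.1 uses PooledByteBufAllocator by default, so allocated ByteBufs are pooled, but the default can be changed by setting an allocator through ChannelConfig or on the ServerBootstrap.
The discussion of pooling above looked only at the reference-counting side. The actual PooledByteBufAllocator implements an efficient memory-allocation approach known as jemalloc, which is used by many modern operating systems.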

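Zero-Copy in ByteBuf

One reason Netty achieves high performance is its zero-copy support in ByteBuf. I will discuss that in a separate article.

Acknowledgments:

Many thanks to 闪电侠 for clearing up my questions about the reference-count release logic.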
