死磕Netty源码之内存分配详解(一)(PooledByteBufAllocator)

前言

为了避免频繁的内存分配给系统带来负担以及GC对系统性能带来波动,Netty4使用了内存池来管理内存的分配和回收,Netty内存池参考了Slab分配和Buddy分配思想。Slab分配是将内存分割成大小不等的内存块,在用户线程请求时根据请求的内存大小分配最为贴近Size的内存快,减少内存碎片同时避免了内存浪费。Buddy分配是把一块内存块等量分割回收时候进行合并,尽可能保证系统中有足够大的连续内存

关于我:http://huangth.com

GitHub地址:https://github.com/RobertoHuang

免责声明:本系列博客并非原创,主要借鉴和抄袭闪电侠,占小狼等知名博主博客。如有侵权请及时联系

内存数据结构

内存分级从上到下主要分为: Arena、ChunkList、Chunk、Page、SubPage,他们关系如下图

PooledArena是一块连续的内存块,为了优化并发性能在Netty内存池中存在一个由多个Arena组成的数组,在多个线程进行内存分配时会按照轮询策略选择一个Arena进行内存分配。一个PoolArena内存块是由两个SubPagePools(用来存储零碎内存)和多个ChunkList组成,两个SubpagePools数组分别为tinySubpagePools和smallSubpagePools。每个ChunkList里包含多个Chunk按照双向链表排列,每个Chunk里包含多个Page(默认2048个),每个Page(默认大小为8k字节)由多个Subpage组成。Subpage由M个”块”构成,块的大小由第一次申请内存大小决定。当分配一次内存之后此page会被加入到PoolArena的tinySubpagePools或smallSubpagePools中,下次分配时就如果”块”大小相同则由其直接分配

当利用Arena来进行分配内存时,根据申请内存的大小有不同的策略。例如:如果申请内存的大小小于512时,则首先在cache尝试分配,如果分配不成功则会在tinySubpagePools尝试分配,如果分配不成功,则会在PoolChunk重新找一个PoolSubpage来进行内存分配,分配之后将此PoolSubpage保存到tinySubpagePools中

内存分配源码分析

Netty内存分配包括了堆内存和非堆内存(Direct内存),但是内存分配的核心算法是类似,所以从堆内存分配代码入手

public static void main(String[] args) {
    PooledByteBufAllocator pooledByteBufAllocator = new PooledByteBufAllocator(false);
    ByteBuf byteBuf = pooledByteBufAllocator.heapBuffer(252);
    System.out.println(byteBuf);
}

1.堆内存(HeapByteBuf)字节缓冲区:特点是内存的分配和回收速度快可以被JVM自动回收,缺点就是如果进行Socket的IO读写,需要额外做一次内存复制(将堆内存对应的缓冲区复制到内核Channel中),性能会有一定程度的下降

2.直接内存(DirectByteBuf)字节缓冲区:非堆内存它在堆外进行内存分配,相比于堆内存它的分配和回收速度会慢一些,但是将它写入或者从Socket Channel中读取时,由于少一次内存复制速度比堆内存快(需要手动维护GC)

Netty的最佳实践是在I/O通信线程的读写缓冲区使用DirectByteBuf,后端业务消息的编解码模块使用HeapByteBuf

PooledByteBufAllocator

静态代码

以下是PooledByteBufAllocator比较重要的一些静态属性,它们在静态代码块中被初始化

// 默认堆内存类型PoolArena个数
private static final int DEFAULT_NUM_HEAP_ARENA;
// 默认直接内存类型PoolArena个数
private static final int DEFAULT_NUM_DIRECT_ARENA;

// 默认页大小 => 8K
private static final int DEFAULT_PAGE_SIZE;
// 每个chunk中的page是用平衡二叉树映射管理每个PoolSubpage是否被分配
// maxOrder为树的深度,深度为maxOrder层的节点数量为1 << maxOrder 默认maxOrder => 11
private static final int DEFAULT_MAX_ORDER;

// tiny cache的大小 => 默认512
private static final int DEFAULT_TINY_CACHE_SIZE;
// small cache的大小 => 默认256
private static final int DEFAULT_SMALL_CACHE_SIZE;
// normal cache的大小 => 默认64
private static final int DEFAULT_NORMAL_CACHE_SIZE;

// 最小PAGE_SIZE => 默认4K
private static final int MIN_PAGE_SIZE = 4096;
// 最大Chunk的大小 => 默认等于2的30次方 即1G
private static final int MAX_CHUNK_SIZE = (int) (((long) Integer.MAX_VALUE + 1) / 2);

以上这些常量除了最后两个都是在如下的static块中进行初始化

static {
    // 1.对DEFAULT_PAGE_SIZE进行初始化默认是8K
    // 用户可以通过设置io.netty.allocator.pageSize来设置
    int defaultPageSize = SystemPropertyUtil.getInt("io.netty.allocator.pageSize", 8192);
    Throwable pageSizeFallbackCause = null;
    try {
        // 2.检查pageSize是否大于MIN_PAGE_SIZE(4K)且是2的幂次方
        validateAndCalculatePageShifts(defaultPageSize);
    } catch (Throwable t) {
        pageSizeFallbackCause = t;
        defaultPageSize = 8192;
    }
    DEFAULT_PAGE_SIZE = defaultPageSize;
    // 3.对树的深度DEFAULT_MAX_ORDER进行初始化 默认是11
    // 用户可以通过io.netty.allocator.maxOrder来进行设置
    int defaultMaxOrder = SystemPropertyUtil.getInt("io.netty.allocator.maxOrder", 11);
    Throwable maxOrderFallbackCause = null;
    try {
        // 4.校验maxOrder 期望值(0-14之间)
        validateAndCalculateChunkSize(DEFAULT_PAGE_SIZE, defaultMaxOrder);
    } catch (Throwable t) {
        maxOrderFallbackCause = t;
        defaultMaxOrder = 11;
    }
    // 5.对树的深度DEFAULT_MAX_ORDER进行初始化 默认值为11
    // 用户可以通过io.netty.allocator.maxOrder来进行设置
    DEFAULT_MAX_ORDER = defaultMaxOrder;


    final Runtime runtime = Runtime.getRuntime();
    final int defaultMinNumArena = NettyRuntime.availableProcessors() * 2;
    // 6.初始化默认chunk的大小,为PageSize * (2的maxOrder幂)
    final int defaultChunkSize = DEFAULT_PAGE_SIZE << DEFAULT_MAX_ORDER;
    // 7.计算PoolAreana的个数 PoolArena默认为:cpu核心线程数与最大堆内存/2/(3*chunkSize)这两个数中的较小者
    // 这里的除以2是为了确保系统分配的所有PoolArena占用的内存不超过系统可用内存的一半,这里的除以3是为了保证每个PoolArena至少可以由3个PoolChunk组成
    // 用户可以通过io.netty.allocator.numHeapArenas/numDirectArenas来进行修改
    DEFAULT_NUM_HEAP_ARENA = Math.max(0,
            SystemPropertyUtil.getInt(
                    "io.netty.allocator.numHeapArenas",
                    (int) Math.min(
                            defaultMinNumArena,
                            runtime.maxMemory() / defaultChunkSize / 2 / 3)));
    DEFAULT_NUM_DIRECT_ARENA = Math.max(0,
            SystemPropertyUtil.getInt(
                    "io.netty.allocator.numDirectArenas",
                    (int) Math.min(
                            defaultMinNumArena,
                            PlatformDependent.maxDirectMemory() / defaultChunkSize / 2 / 3)));

    // 8.对cache sizes进行了设置
    DEFAULT_TINY_CACHE_SIZE = SystemPropertyUtil.getInt("io.netty.allocator.tinyCacheSize", 512);
    DEFAULT_SMALL_CACHE_SIZE = SystemPropertyUtil.getInt("io.netty.allocator.smallCacheSize", 256);
    DEFAULT_NORMAL_CACHE_SIZE = SystemPropertyUtil.getInt("io.netty.allocator.normalCacheSize", 64);

    // 9.对其他常量进行设置
    DEFAULT_MAX_CACHED_BUFFER_CAPACITY = SystemPropertyUtil.getInt("io.netty.allocator.maxCachedBufferCapacity", 32 * 1024);
    DEFAULT_CACHE_TRIM_INTERVAL = SystemPropertyUtil.getInt("io.netty.allocator.cacheTrimInterval", 8192);
    DEFAULT_USE_CACHE_FOR_ALL_THREADS = SystemPropertyUtil.getBoolean("io.netty.allocator.useCacheForAllThreads", true);
    DEFAULT_DIRECT_MEMORY_CACHE_ALIGNMENT = SystemPropertyUtil.getInt("io.netty.allocator.directMemoryCacheAlignment", 0);
}

构造方法

public PooledByteBufAllocator(boolean preferDirect) {
    this(preferDirect, DEFAULT_NUM_HEAP_ARENA, DEFAULT_NUM_DIRECT_ARENA, DEFAULT_PAGE_SIZE, DEFAULT_MAX_ORDER);
}

public PooledByteBufAllocator(boolean preferDirect, int nHeapArena, int nDirectArena, int pageSize, int maxOrder) {
    this(preferDirect, nHeapArena, nDirectArena, pageSize, maxOrder, DEFAULT_TINY_CACHE_SIZE, DEFAULT_SMALL_CACHE_SIZE, DEFAULT_NORMAL_CACHE_SIZE);
}

public PooledByteBufAllocator(boolean preferDirect, int nHeapArena, int nDirectArena, int pageSize, int maxOrder, int tinyCacheSize, int smallCacheSize, int normalCacheSize) {
    this(preferDirect, nHeapArena, nDirectArena, pageSize, maxOrder, tinyCacheSize, smallCacheSize, normalCacheSize, DEFAULT_USE_CACHE_FOR_ALL_THREADS, DEFAULT_DIRECT_MEMORY_CACHE_ALIGNMENT);
}

public PooledByteBufAllocator(boolean preferDirect, int nHeapArena, int nDirectArena, int pageSize, int maxOrder, int tinyCacheSize, int smallCacheSize, int normalCacheSize, boolean useCacheForAllThreads, int directMemoryCacheAlignment) {
    super(preferDirect);
    // 1.实例化了threadCache 字段
    threadCache = new PoolThreadLocalCache(useCacheForAllThreads);
    // 2.使用了默认的值初始化了如下的字段
    this.tinyCacheSize = tinyCacheSize;
    this.smallCacheSize = smallCacheSize;
    this.normalCacheSize = normalCacheSize;
    // 3.chunkSize初始化 其值为pageSize*2^maxOrder
    chunkSize = validateAndCalculateChunkSize(pageSize, maxOrder);

    if (nHeapArena < 0) {
        throw new IllegalArgumentException("nHeapArena: " + nHeapArena + " (expected: >= 0)");
    }

    if (nDirectArena < 0) {
        throw new IllegalArgumentException("nDirectArea: " + nDirectArena + " (expected: >= 0)");
    }

    if (directMemoryCacheAlignment < 0) {
        throw new IllegalArgumentException("directMemoryCacheAlignment: " + directMemoryCacheAlignment + " (expected: >= 0)");
    }

    if (directMemoryCacheAlignment > 0 && !isDirectMemoryCacheAlignmentSupported()) {
        throw new IllegalArgumentException("directMemoryCacheAlignment is not supported");
    }

    if ((directMemoryCacheAlignment & -directMemoryCacheAlignment) != directMemoryCacheAlignment) {
        throw new IllegalArgumentException("directMemoryCacheAlignment: " + directMemoryCacheAlignment + " (expected: power of two)");
    }

    // 4.检查pageSize是否大于4K且为2的幂次方如果不是则抛异常
    // 如果是则返回Integer.SIZE - 1 - Integer.numberOfLeadingZeros(pageSize)的结果
    // 通俗的说pageShifts就是pageSize二进制表示时尾部0的个数 pageSize是8192时它的二进制表示是10000000000000,那么这个pageShifts就是13
    int pageShifts = validateAndCalculatePageShifts(pageSize);

    // 5.实例化heapArenas数组
    if (nHeapArena > 0) {
        heapArenas = newArenaArray(nHeapArena);
        List metrics = new ArrayList(heapArenas.length);
        for (int i = 0; i < heapArenas.length; i ++) {
            PoolArena.HeapArena arena = new PoolArena.HeapArena(this, pageSize, maxOrder, pageShifts, chunkSize, directMemoryCacheAlignment);
            heapArenas[i] = arena;
            metrics.add(arena);
        }
        heapArenaMetrics = Collections.unmodifiableList(metrics);
    } else {
        heapArenas = null;
        heapArenaMetrics = Collections.emptyList();
    }

    // 6.实例化directArenas数组
    if (nDirectArena > 0) {
        directArenas = newArenaArray(nDirectArena);
        List metrics = new ArrayList(directArenas.length);
        for (int i = 0; i < directArenas.length; i ++) {
            PoolArena.DirectArena arena = new PoolArena.DirectArena(this, pageSize, maxOrder, pageShifts, chunkSize, directMemoryCacheAlignment);
            directArenas[i] = arena;
            metrics.add(arena);
        }
        directArenaMetrics = Collections.unmodifiableList(metrics);
    } else {
        directArenas = null;
        directArenaMetrics = Collections.emptyList();
    }
    metric = new PooledByteBufAllocatorMetric(this);
}

总结:PooledByteBufAllocator类中主要完成了一些参数的初始化操作,最重要的为HeapArena数组和DirectArena数组。在利用PooledByteBufAllocator分配内存时其实就是利用Arena数组中的元素来完成。在下一篇博客中将对Arena进行详细介绍

你可能感兴趣的:(Netty源码深度解析,死磕Netty源码)