Go语言——内存管理

参考：

图解 TCMalloc

Golang 内存管理

Go 内存管理

问题

内存碎片：避免内存碎片，提高内存利用率。
多线程：稳定性，效率问题。

内存分配

内存划分

arena即为所谓的堆区，应用中需要的内存从这里分配, 大小为512G，为了方便管理把arena区域划分成一个个的page，每个page为8KB,一共有512GB/8KB个页
spans区域存放span的指针，每个指针对应一个page，所以span区域的大小为(512GB/8KB) * 指针大小8byte = 512M
bitmap区域大小也是通过arena计算出来512GB / (指针大小(8 byte) * 8 / 2) = 16G，用于表示arena区域中哪些地址保存了对象, 并且对象中哪些地址包含了指针，主要用于GC。

分配细节

object size > 32K，则使用 mheap 直接分配。
object size < 16 byte，不包含指针使用 mcache 的小对象分配器 tiny 直接分配；包含指针分配策略与[16 B, 32 K]类似。
object size >= 16 byte && size <=32K byte 时，先使用 mcache 中对应的 size class 分配。
如果 mcache 对应的 size class 的 span 已经没有可用的块，则向 mcentral 请求。
如果 mcentral 也没有可用的块，则向 mheap 申请，并切分。
如果 mheap 也没有合适的 span，则向操作系统申请。

span

可以看出span是一个非常重要的数据结构，每个span包含若干个连续的page。

小对象分配会在span page中划分更小的粒度；大对象通过多页实现。

size class

go1.10\src\runtime\sizeclasses.go

// class  bytes/obj  bytes/span  objects  tail waste  max waste
//     1          8        8192     1024           0     87.50%
//     2         16        8192      512           0     43.75%
//     3         32        8192      256           0     46.88%
//     4         48        8192      170          32     31.52%
//     5         64        8192      128           0     23.44%
//     6         80        8192      102          32     19.07%
//     7         96        8192       85          32     15.95%
//     8        112        8192       73          16     13.56%
//     9        128        8192       64           0     11.72%
//    10        144        8192       56         128     11.82%

//    ...
//    65      28672       57344        2           0      4.91%
//    66      32768       32768        1           0     12.50%

上表中每列含义如下：

class： class ID，每个span结构中都有一个class ID, 表示该span可处理的对象类型
bytes/obj：该class代表对象的字节数
bytes/span：每个span占用堆的字节数，也即页数*页大小
objects: 每个span可分配的对象个数，也即（bytes/spans）/（bytes/obj）
tail bytes: 每个span产生的内存碎片，也即（bytes/spans）%（bytes/obj）

上表可见最大的对象是32K大小，超过32K大小的由特殊的class表示，该class ID为0，每个class只包含一个对象。所以上面只有列出了1-66。

有点像装箱算法，按照规格分配，减少内存碎片。

struct

span是内存管理的基本单位，每个span用来管子特定的size class对象，根据size class，span将若干个页分成块进行管理。

go1.10\src\runtime\mheap.go

type mspan struct {
    next *mspan     // next span in list, or nil if none
    prev *mspan     // previous span in list, or nil if none
   
    startAddr uintptr // address of first byte of span aka s.base()
    npages    uintptr // number of pages in span
    
    nelems uintptr // number of object in the span.
    
    allocBits  *gcBits
    gcmarkBits *gcBits
    
    allocCount  uint16     // number of allocated objects
    spanclass   spanClass  // size class and noscan (uint8)
    
    elemsize    uintptr    // computed from sizeclass or from npages
}

以size class 10为例，npages=1，nelems=56，spanclass=10，elemsize=144；startAddr指arena区位置；next和prev指spans区，span链表；allocBits是一个bitmap，标记分配块分配情况，这个设计我也用过，之前用redis bitmap实现了IPAM。

cache

从上面我们知道go通过span来分配内存，那在哪里用span？通过之前的学习Go语言——goroutine并发模型，我们知道每个P都有mcache，通过mcache管理每个G需要的内存。

go1.10\src\runtime\mcache.go

type mcache struct {
   tiny             uintptr
   tinyoffset       uintptr
    
   alloc [numSpanClasses]*mspan // spans to allocate from, indexed by spanClass
}

numSpanClasses = _NumSizeClasses << 1
_NumSizeClasses = 67

alloc是span数组，长度是67 << 1，说明每种size class有2组元素。第一组span对象中包含了指针，叫做scan，表示需要gc scan；第二组没有指针，叫做noscan。提高gc scan性能。

mcache初始没有span，G先从central动态申请span，并缓存在cache。

central

go1.10\src\runtime\mcentral.go

type mcentral struct {
   lock      mutex
   spanclass spanClass
   nonempty  mSpanList // list of spans with a free object, ie a nonempty free list
   empty     mSpanList // list of spans with no free objects (or cached in an mcache)

   // nmalloc is the cumulative count of objects allocated from
   // this mcentral, assuming all spans in mcaches are
   // fully-allocated. Written atomically, read under STW.
   nmalloc uint64
}

lock: 多个G并发从central申请span，所以需要lock，保证一致性
spanclass : 每个mcentral管理着一组有相同size class的span列表
nonempty: 指还有内存可用的span列表
empty: 指没有内存可用的span列表
nmalloc: 指累计分配的对象个数

线程从central获取span步骤如下：

加锁
从nonempty列表获取一个可用span，并将其从链表中删除
将取出的span放入empty链表
将span返回给线程
解锁
线程将该span缓存进cache

线程将span归还步骤如下：

加锁
将span从empty列表删除
将span加入nonempty列表
解锁

heap

central只管理特定的size class span，所以必然有一个更上层的数据结构，管理所有的sizeclass central，这就是heap。

go1.10\src\runtime\mheap.go

type mheap struct {
   lock      mutex
   
   spans []*mspan

   // Malloc stats.
   largealloc  uint64                  // bytes allocated for large objects
   nlargealloc uint64                  // number of large object allocations
   largefree   uint64                  // bytes freed for large objects (>maxsmallsize)
   nlargefree  uint64                  // number of frees for large objects (>maxsmallsize)
    
   // range of addresses we might see in the heap
   bitmap        uintptr // Points to one byte past the end of the bitmap
   bitmap_mapped uintptr

   arena_start uintptr
   arena_used  uintptr // Set with setArenaUsed.

   arena_alloc uintptr
   arena_end   uintptr

   arena_reserved bool

   central [numSpanClasses]struct {
      mcentral mcentral
      pad      [sys.CacheLineSize - unsafe.Sizeof(mcentral{})%sys.CacheLineSize]byte
   }
}

spans：映射span -> page
large：大对象，>32K
bitmap： gc
arena： arena区相关信息，pages，堆区
central：通过size class管理span，每种size class对应两个central

heap

code

前面都是背景知识，现在开始具体的分配过程

go1.10\src\runtime\malloc.go

注释

小对象

申请

// Allocating a small object proceeds up a hierarchy of caches:
//
// 1. Round the size up to one of the small size classes
//    and look in the corresponding mspan in this P's mcache.
//    Scan the mspan's free bitmap to find a free slot.
//    If there is a free slot, allocate it.
//    This can all be done without acquiring a lock.
//
// 2. If the mspan has no free slots, obtain a new mspan
//    from the mcentral's list of mspans of the required size
//    class that have free space.
//    Obtaining a whole span amortizes the cost of locking
//    the mcentral.
//
// 3. If the mcentral's mspan list is empty, obtain a run
//    of pages from the mheap to use for the mspan.
//
// 4. If the mheap is empty or has no page runs large enough,
//    allocate a new group of pages (at least 1MB) from the
//    operating system. Allocating a large run of pages
//    amortizes the cost of talking to the operating system.

将申请的内存大小，向上取整成对应的size class；并且从P的mcache中找对应的mspan。
如果mspan有空闲slot，就分配。这个过程不需要lock，因为只有一个G会向P申请。
如果mspan没有空闲slot，就从mcentral获取新的mspan。这个过程需要lock，因为会有多个G同时申请。
如果mcentral没有mspan，就从mheap申请。
如果mheap空间不足，就想OS申请一组page，最少1MB。

释放

// Sweeping an mspan and freeing objects on it proceeds up a similar
// hierarchy:
//
// 1. If the mspan is being swept in response to allocation, it
//    is returned to the mcache to satisfy the allocation.
//
// 2. Otherwise, if the mspan still has allocated objects in it,
//    it is placed on the mcentral free list for the mspan's size
//    class.
//
// 3. Otherwise, if all objects in the mspan are free, the mspan
//    is now "idle", so it is returned to the mheap and no longer
//    has a size class.
//    This may coalesce it with adjacent idle mspans.
//
// 4. If an mspan remains idle for long enough, return its pages
//    to the operating system.

如果mspan在响应分配时被扫描，就返回mcache以满足分配。（不是很理解）
如果mspan中有分配的对象，就将它放置到mcentral的free list中
如果mspan空闲，就返回mheap，并且不关联size class，即变成page
如果msapn空闲很久，就把page还给OS，缩容。

大对象

// Allocating and freeing a large object uses the mheap
// directly, bypassing the mcache and mcentral.

大对象都是直接操作mheap，跳过mcache和mcentral

实现

tiny( < 16 byte )

off := c.tinyoffset
// Align tiny pointer for required (conservative) alignment.
if size&7 == 0 {
   off = round(off, 8)
} else if size&3 == 0 {
   off = round(off, 4)
} else if size&1 == 0 {
   off = round(off, 2)
}
if off+size <= maxTinySize && c.tiny != 0 {
   // The object fits into existing tiny block.
   x = unsafe.Pointer(c.tiny + off)
   c.tinyoffset = off + size
   c.local_tinyallocs++
   mp.mallocing = 0
   releasem(mp)
   return x
}
// Allocate a new maxTinySize block.
span := c.alloc[tinySpanClass]
v := nextFreeFast(span)
if v == 0 {
   v, _, shouldhelpgc = c.nextFree(tinySpanClass)
}
x = unsafe.Pointer(v)
(*[2]uint64)(x)[0] = 0
(*[2]uint64)(x)[1] = 0
// See if we need to replace the existing tiny block with the new one
// based on amount of remaining free space.
if size < c.tinyoffset || c.tiny == 0 {
   c.tiny = uintptr(x)
   c.tinyoffset = size
}
size = maxTinySize

tinyoffset表示tiny当前分配到什么地址了，之后的分配根据 tinyoffset 寻址。

先根据要分配的对象大小进行地址对齐，比如size是8的倍数，tinyoffset 和 8 对齐。
然后就是进行分配。如果tiny剩余的空间不够用，则重新申请一个 16 byte 的内存块，并分配给 object。
如果新块剩余空间比老快大，就用新的内存块替换。

large（> 32 k）

var s *mspan
shouldhelpgc = true
systemstack(func() {
   s = largeAlloc(size, needzero, noscan)
})
s.freeindex = 1
s.allocCount = 1
x = unsafe.Pointer(s.base())
size = s.elemsize

func largeAlloc(size uintptr, needzero bool, noscan bool) *mspan {
   if size+_PageSize < size {
      throw("out of memory")
   }
   npages := size >> _PageShift
   if size&_PageMask != 0 {
      npages++
   }

   // Deduct credit for this span allocation and sweep if
   // necessary. mHeap_Alloc will also sweep npages, so this only
   // pays the debt down to npage pages.
   deductSweepCredit(npages*_PageSize, npages)

   s := mheap_.alloc(npages, makeSpanClass(0, noscan), true, needzero)
   if s == nil {
      throw("out of memory")
   }
   s.limit = s.base() + size
   heapBitsForSpan(s.base()).initSpan(s)
   return s
}

跳过了mspan和mcentral，直接在mheap上面分配。

normal（[16 b, 32 k]）

var sizeclass uint8
if size <= smallSizeMax-8 {
   sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
} else {
   sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
}
size = uintptr(class_to_size[sizeclass])
spc := makeSpanClass(sizeclass, noscan)
span := c.alloc[spc]
v := nextFreeFast(span)
if v == 0 {
   v, span, shouldhelpgc = c.nextFree(spc)
}
x = unsafe.Pointer(v)
if needzero && span.needzero != 0 {
   memclrNoHeapPointers(unsafe.Pointer(v), size)
}

对于 size 介于 16 ~ 32K byte 的内存分配

先计算应该分配的 sizeclass，
然后去 mcache 里面 alloc[sizeclass] 申请，
如果 mcache.alloc[sizeclass] 不足以申请，则 mcache 向 mcentral 申请，然后再分配。
mcentral 给 mcache 分配完之后会判断自己需不需要扩充，如果需要则向 mheap 申请。

计算出 sizeclass，那么就可以去 mcache.alloc[sizeclass] 分配了，注意这是一个 mspan 指针，真正的分配函数是 nextFreeFast() 函数:

// nextFreeFast returns the next free object if one is quickly available.
// Otherwise it returns 0.
func nextFreeFast(s *mspan) gclinkptr {
   theBit := sys.Ctz64(s.allocCache) // Is there a free object in the allocCache?
   if theBit < 64 {
      result := s.freeindex + uintptr(theBit)
      if result < s.nelems {
         freeidx := result + 1
         if freeidx%64 == 0 && freeidx != s.nelems {
            return 0
         }
         s.allocCache >>= uint(theBit + 1)
         s.freeindex = freeidx
         s.allocCount++
         return gclinkptr(result*s.elemsize + s.base())
      }
   }
   return 0
}

allocCache 这里是用位图表示内存是否可用，1 表示可用。然后通过 span 里面的 freeindex 和 elemsize 来计算地址即可。
当mcache没有可用地址时，通过nextFree向mcentral甚至mheap申请:

func (c *mcache) nextFree(spc spanClass) (v gclinkptr, s *mspan, shouldhelpgc bool) {
   s = c.alloc[spc]
   shouldhelpgc = false
   freeIndex := s.nextFreeIndex()
    
   if freeIndex == s.nelems {
      systemstack(func() {
         c.refill(spc)
      })
      shouldhelpgc = true
      s = c.alloc[spc]

      freeIndex = s.nextFreeIndex()
   }

   if freeIndex >= s.nelems {
      throw("freeIndex is not valid")
   }

   v = gclinkptr(freeIndex*s.elemsize + s.base())
   s.allocCount++
   return
}

mcache 向 mcentral，如果 mcentral 不够，则向 mheap 申请。

// Gets a span that has a free object in it and assigns it
// to be the cached span for the given sizeclass. Returns this span.
func (c *mcache) refill(spc spanClass) {
   _g_ := getg()

   _g_.m.locks++
   // Return the current cached span to the central lists.
   s := c.alloc[spc]

   // Get a new cached span from the central lists.
   s = mheap_.central[spc].mcentral.cacheSpan()

   c.alloc[spc] = s
   _g_.m.locks--
}

Go语言——内存管理

Go语言——内存管理

问题

内存分配

分配细节

span

size class

struct

cache

central

heap

code

注释

小对象

大对象

实现

tiny( < 16 byte )

large（> 32 k）

normal（[16 b, 32 k]）

你可能感兴趣的:(Go语言——内存管理)