golang map源码分析

1. 前言

map是CS中非常基础的数据结构，关于golang map的基本使用，这里不再赘述，可以参考官方文档。
golang的map实现是基于hash查找表，并且基于链表来解决hash碰撞问题。

2. 环境信息

go版本：go1.15.4 darwin/amd64

3. go map数据结构分析

map的基础结构体是hmap，该结构体存在文件runtime/map.go中
hmap源码：

// A header for a Go map.
type hmap struct {
    // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
    // Make sure this stays in sync with the compiler's definition.
    count     int // # live cells == size of map.  Must be first (used by len() builtin)
    flags     uint8
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    hash0     uint32 // hash seed
    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)
    extra *mapextra // optional fields
}

// mapextra holds fields that are not present on all maps.
type mapextra struct {
    // If both key and elem do not contain pointers and are inline, then we mark bucket
    // type as containing no pointers. This avoids scanning such maps.
    // However, bmap.overflow is a pointer. In order to keep overflow buckets
    // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
    // overflow and oldoverflow are only used if key and elem do not contain pointers.
    // overflow contains overflow buckets for hmap.buckets.
    // oldoverflow contains overflow buckets for hmap.oldbuckets.
    // The indirection allows to store a pointer to the slice in hiter.
    overflow    *[]*bmap
    oldoverflow *[]*bmap

    // nextOverflow holds a pointer to a free overflow bucket.
    nextOverflow *bmap
}

count：map中kv对的数量；
flags：map的一些标志位；
B：map中bucket数量为2^B个；意味着此时map数据结构中可以存储loadFactor * 2^B个数据，如果超过，则需要扩容；todo
noverflow：map中溢出bucket的近似数量；todo
hash0：hash函数的种子；
buckets：map中bucket的首指针，map中一共有2^B个bucket；如果count==0的情况下，该字段可能是nil；
oldbuckets：map中旧bucket的首指针，该字段只有在map扩容的时候，才不等于nil；todo
nevacuate：map中bucket迁移数量，至多有此数量的bucket从旧bucket迁移到新bucket；todo
extra：扩展字段；

bmap是bucket真正的结构体

// A bucket for a Go map.
type bmap struct {
    // tophash generally contains the top byte of the hash value
    // for each key in this bucket. If tophash[0] < minTopHash,
    // tophash[0] is a bucket evacuation state instead.
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt elems.
    // NOTE: packing all the keys together and then all the elems together makes the
    // code a bit more complicated than alternating key/elem/key/elem/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}

tophash：存储hash值的高8位；
keys：key数组，隐藏字段；
values：value数组，隐藏字段；
overflow：溢出buceket指针，隐藏字段；

bmap.tophash中除了存储hash值的高8位，也可以用来存储一些状态码。

    // Possible tophash values. We reserve a few possibilities for special marks.
    // Each bucket (including its overflow buckets, if any) will have either all or none of its
    // entries in the evacuated* states (except during the evacuate() method, which only happens
    // during map writes and thus no one else can observe the map during that time).
    emptyRest      = 0 // this cell is empty, and there are no more non-empty cells at higher indexes or overflows.
    emptyOne       = 1 // this cell is empty
    evacuatedX     = 2 // key/elem is valid.  Entry has been evacuated to first half of larger table.
    evacuatedY     = 3 // same as above, but evacuated to second half of larger table.
    evacuatedEmpty = 4 // cell is empty, bucket is evacuated.
    minTopHash     = 5 // minimum tophash for a normal filled cell.

bmap结构图

bmap.png

hmap结构图

gomap.png

下面我们重点分析一下map的创建和增删改查操作，我们会展示源码，同时在源码上增加中文注释，作为对源码的分析；golang编译器会根据不同情况，调用不同的函数，我们下面分析的是runtime/map.go文件中的基本函数；一些其他优化函数，例如runtime/map_faststr.go中对map[string]type类型的优化，感兴趣的同学可以自行查看。

3.1. map创建

示例代码

func main() {
    m1 := make(map[string]string)
    m2 := make(map[string]string, 9)
}

我们可以通过汇编编译代码看到go map创建调用的底层函数是makemap，该函数存在文件runtime/map.go中；事实上，不同的map声明方式，go标准编译器选择不同的函数调用，例如m1 := make(map[string]string)代码，编译器会调用函数runtime.makemap_small，但是大部分场景下都是调用makemap。下面我们分析下函数makemap：

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
    // 检查申请的map空间是否超过内存限制
    mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
    if overflow || mem > maxAlloc {
        hint = 0
    }
    // 初始化hmap
    // initialize Hmap
    if h == nil {
        h = new(hmap)
    }
    // hash初始种子
    h.hash0 = fastrand()
    // 计算B
    // Find the size parameter B which will hold the requested # of elements.
    // For hint < 0 overLoadFactor returns false since hint < bucketCnt.
    B := uint8(0)
    for overLoadFactor(hint, B) {
        B++
    }
    h.B = B

    // allocate initial hash table
    // if B == 0, the buckets field is allocated lazily later (in mapassign)
    // If hint is large zeroing this memory could take a while.
    if h.B != 0 {
        var nextOverflow *bmap
        // 调用函数makeBucketArray，分配bucket和溢出bucket的内存
        h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
        if nextOverflow != nil {
            h.extra = new(mapextra)
            h.extra.nextOverflow = nextOverflow
        }
    }

    return h
}

// makeBucketArray initializes a backing array for map buckets.
// 1<= 4，则表示申请的map空间较大，我们事先申请一些溢出bucket，这样可以提高效率
    // For small b, overflow buckets are unlikely.
    // Avoid the overhead of the calculation.
    if b >= 4 {
        // Add on the estimated number of overflow buckets
        // required to insert the median number of elements
        // used with this value of b.
        nbuckets += bucketShift(b - 4)
        sz := t.bucket.size * nbuckets
        up := roundupsize(sz)
        if up != sz {
            nbuckets = up / t.bucket.size
        }
    }

    if dirtyalloc == nil {
        buckets = newarray(t.bucket, int(nbuckets))
    } else {
        // dirtyalloc was previously generated by
        // the above newarray(t.bucket, int(nbuckets))
        // but may not be empty.
        buckets = dirtyalloc
        size := t.bucket.size * nbuckets
        if t.bucket.ptrdata != 0 {
            memclrHasPointers(buckets, size)
        } else {
            memclrNoHeapPointers(buckets, size)
        }
    }

    if base != nbuckets {
        // We preallocated some overflow buckets.
        // To keep the overhead of tracking these overflow buckets to a minimum,
        // we use the convention that if a preallocated overflow bucket's overflow
        // pointer is nil, then there are more available by bumping the pointer.
        // We need a safe non-nil pointer for the last overflow bucket; just use buckets.
        // nextOverflow是溢出bucket的首地址；
        // last是最后一个溢出bucket的首地址；
        // 每个溢出bucket对应的bmap结构体中的溢出bucket指针都是nil；但是last的溢出bucket指针是bucket的起始地址；
        nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
        last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
        last.setoverflow(t, (*bmap)(buckets))
    }
    return buckets, nextOverflow
}

总结：

函数makemap会根据不同的声明方式和参数，决定map的初始化空间大小；

map中kv都存储在bucket中，每个bucket可以存8对kv；

如果len(map) > 0，则map中至少存在一个bucket，所以map的空间都是8的整数倍；

如果map申请空间较大（大于等于128），表示出现key碰撞的几率较大，则会事先创建一些溢出bucket，以备后期使用；

3.2. map查找元素

示例代码

func main() { m1 := make(map[int8]int) m1[1] = 1 v, ok := m1[1] fmt.Println(v, ok) }

map查找元素操作，底层调用的函数mapaccess1和mapaccess2，该函数存在文件runtime/map.go中；这两个函数基本一致，只是函数mapaccess2会返回bool类型，表示key是否存在。事实上，对于不同的map key类型，编译器会调用不同的函数来实现map的增删改查操作，其中针对特殊key类型的优化函数，存在文件runtime/map_fast32.go，runtime/map_fast64.go和runtime/map_faststr.go中；例如，如果key的类型是string，map的查找操作会调用优化函数mapaccess1_faststr和mapaccess2_faststr。本文只分析基本的函数，对于优化函数，感兴趣的同学可以自行查看源码。下面我们分析函数mapaccess2：

func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) { // 启用数据竞争检测 if raceenabled && h != nil { callerpc := getcallerpc() pc := funcPC(mapaccess2) racereadpc(unsafe.Pointer(h), callerpc, pc) raceReadObjectPC(t.key, key, callerpc, pc) } // 启用-msan检测 if msanenabled && h != nil { msanread(key, t.key.size) } if h == nil || h.count == 0 { if t.hashMightPanic() { t.hasher(key, 0) // see issue 23734 } return unsafe.Pointer(&zeroVal[0]), false } // map不支持并发安全，并发读写会产生panic if h.flags&hashWriting != 0 { throw("concurrent map read and map write") } // 计算hash值 hash := t.hasher(key, uintptr(h.hash0)) // m表示map中bucket数量 m := bucketMask(h.B) // 利用`hash mod m`可以计算bucket索引，b表示对应bucket的首地址 b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize))) // map正在迁移的场景，如果map正在迁移，则优先从oldbuckets中查找kv if c := h.oldbuckets; c != nil { // map是否在扩容迁移，如果是扩容迁移，则oldbuckets实际的bucket数量是m的一半（扩容会让bucket数量增加一倍） if !h.sameSizeGrow() { // There used to be half as many buckets; mask down one more power of two. m >>= 1 } // 根据hash值，查找oldbuckets中对应的bucket地址 oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize))) // 如果oldb的标志位不是撤离状态，则我们从oldb中查找kv if !evacuated(oldb) { b = oldb } } // top表示hash的高8位，如果hash高8位小于5，则top需要加上5；因为5表示`minTopHash`，top如果是小于等于5，都是表示特殊状态；正常的key的top值都是大于5的 top := tophash(hash) bucketloop: // 逐个查找对应bucket和其溢出bucket for ; b != nil; b = b.overflow(t) { // 一个bucket有8对kv，逐个查找 for i := uintptr(0); i < bucketCnt; i++ { if b.tophash[i] != top { // 如果b.tophash[i] == emptyRest，表示剩下的kv对都是空的，所以直接跳出循环 if b.tophash[i] == emptyRest { break bucketloop } continue } // 查找对应的key的地址 k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize)) if t.indirectkey() { k = *((*unsafe.Pointer)(k)) } // 比较key是否相等 if t.key.equal(key, k) { // 如果key相等，则找到对应的value e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize)) if t.indirectelem() { e = *((*unsafe.Pointer)(e)) } // 返回value return e, true } } } // 返回对应的0值 return unsafe.Pointer(&zeroVal[0]), false }

总结：

判断是否并发读写，如果是，则抛出panic；

计算hash值，根据hash的地位找到对应的bucket，根据高8位，找到对应的kv槽位；

map迁移场景下，优先从oldbuckets中查找kv；

比较key，相等则返回value，不等则返回0值；

map kv定位过程如下图：

mapkey.png

3.4. map新增元素和更新元素

示例代码

func main() { m1 := make(map[int8]int) m1[1] = 1 m1[2] = 2 m1[1] = 11 fmt.Println(m1) }

map的新增和更新元素操作，都会调用函数mapassign，该函数存在文件runtime/map.go中。

// Like mapaccess, but allocates a slot for the key if it is not present in the map. func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer { if h == nil { panic(plainError("assignment to entry in nil map")) } if raceenabled { callerpc := getcallerpc() pc := funcPC(mapassign) racewritepc(unsafe.Pointer(h), callerpc, pc) raceReadObjectPC(t.key, key, callerpc, pc) } if msanenabled { msanread(key, t.key.size) } // map不支持并发读写 if h.flags&hashWriting != 0 { throw("concurrent map writes") } // 计算hash值 hash := t.hasher(key, uintptr(h.hash0)) // map状态设置为hashWriting // Set hashWriting after calling t.hasher, since t.hasher may panic, // in which case we have not actually done a write. h.flags ^= hashWriting // 如果map没有初始化bucket，此时会申请bucket空间 if h.buckets == nil { h.buckets = newobject(t.bucket) // newarray(t.bucket, 1) } again: // 根据hash值，计算bucket索引 bucket := hash & bucketMask(h.B) // 判断map是否正在扩容 if h.growing() { // 函数growWork是将hmp.oldbuckets中对应的bucket迁移到新的buckets中 growWork(t, h, bucket) } // 目标bucket的首地址 b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize))) top := tophash(hash) var inserti *uint8 var insertk unsafe.Pointer var elem unsafe.Pointer bucketloop: for { // 遍历tophash查找key是否已经存在，或者是否有空位插入kv for i := uintptr(0); i < bucketCnt; i++ { if b.tophash[i] != top { // tophash中可能有多个空位，我们记录第一个空位的索引，后面的空位跳过 if isEmpty(b.tophash[i]) && inserti == nil { inserti = &b.tophash[i] insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize)) elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize)) } // tophash值表示剩余都是空位，则直接结束循环，因为后面全是空位，不会有相同的key在后面的槽位，此次操作必然是插入，而不是更新 if b.tophash[i] == emptyRest { break bucketloop } continue } k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize)) if t.indirectkey() { k = *((*unsafe.Pointer)(k)) } if !t.key.equal(key, k) { continue } // already have a mapping for key. Update it. if t.needkeyupdate() { typedmemmove(t.key, k, key) } elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize)) goto done } // 如果bucket是满的，而且没有发现相同的key，则继续查找溢出bucket ovf := b.overflow(t) if ovf == nil { break } b = ovf } // Did not find mapping for key. Allocate new cell & add entry. // If we hit the max load factor or we have too many overflow buckets, // and we're not already in the middle of growing, start growing. // 程序运行到此处，必然是由于没有找到相同的key，此次操作是插入，不是更新； // 插入一对kv，我们需要判断map是否需要扩容； // overLoadFactor函数用来判断map是否由于数据太多，需要增量1倍扩容； // tooManyOverflowBuckets函数用来判断map是否需要等量迁移，map由于删除操作，溢出bucket很多，但是数据分布很稀疏，我们可以通过等量迁移，将数据更加紧凑的存储在一起，节约空间； // 具体可以看evacuate函数分析； if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) { // hashGrow函数主要是设置hmap.flags为扩容状态，申请新的内存空间用来扩容，同时设置hmap.oldbuckets为原来的hmap.buckets hashGrow(t, h) goto again // Growing the table invalidates everything, so try again } // inserti == nil表示没有插入的槽位，需要申请溢出bucket if inserti == nil { // all current buckets are full, allocate a new one. newb := h.newoverflow(t, b) inserti = &newb.tophash[0] insertk = add(unsafe.Pointer(newb), dataOffset) elem = add(insertk, bucketCnt*uintptr(t.keysize)) } // store new key/elem at insert position if t.indirectkey() { kmem := newobject(t.key) *(*unsafe.Pointer)(insertk) = kmem insertk = kmem } if t.indirectelem() { vmem := newobject(t.elem) *(*unsafe.Pointer)(elem) = vmem } typedmemmove(t.key, insertk, key) *inserti = top h.count++ done: // 设置flags，并写入value if h.flags&hashWriting == 0 { throw("concurrent map writes") } h.flags &^= hashWriting if t.indirectelem() { elem = *((*unsafe.Pointer)(elem)) } return elem } // 迁移oldbucket中的对应bucket func growWork(t *maptype, h *hmap, bucket uintptr) { // make sure we evacuate the oldbucket corresponding // to the bucket we're about to use evacuate(t, h, bucket&h.oldbucketmask()) // evacuate one more oldbucket to make progress on growing if h.growing() { evacuate(t, h, h.nevacuate) } } // bucket迁移函数 func evacuate(t *maptype, h *hmap, oldbucket uintptr) { // old bucket索引 b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))) // 如果是等量迁移，则newbit表示bucket数量；如果是增量迁移，newbit表示增量前的bucket数量； newbit := h.noldbuckets() // 待迁移bucket是否是迁移状态 if !evacuated(b) { // TODO: reuse overflow buckets instead of using new ones, if there // is no iterator using the old buckets. (If !oldIterator.) // xy contains the x and y (low and high) evacuation destinations. // 同一个hash值，在新旧buckets中对应的bucket索引可能是不一样的； // 例如hash值是1001，旧buckets数量是8，新buckets数量是16，那么该hash值在旧buckets中索引是1，新buckets中索引是9； // x表示新旧索引不变的情况下，新bucket的索引；y表示新索引增加newbit情况下，新bucket的索引； var xy [2]evacDst x := &xy[0] x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize))) x.k = add(unsafe.Pointer(x.b), dataOffset) x.e = add(x.k, bucketCnt*uintptr(t.keysize)) if !h.sameSizeGrow() { // Only calculate y pointers if we're growing bigger. // Otherwise GC can see bad pointers. y := &xy[1] y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize))) y.k = add(unsafe.Pointer(y.b), dataOffset) y.e = add(y.k, bucketCnt*uintptr(t.keysize)) } for ; b != nil; b = b.overflow(t) { // 待迁移bucket中kv的首地址 k := add(unsafe.Pointer(b), dataOffset) e := add(k, bucketCnt*uintptr(t.keysize)) for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) { top := b.tophash[i] // 如果tophash为空，则跳过，这样就可以让数据紧凑，节约内存空间； if isEmpty(top) { b.tophash[i] = evacuatedEmpty continue } if top < minTopHash { throw("bad map state") } k2 := k if t.indirectkey() { k2 = *((*unsafe.Pointer)(k2)) } var useY uint8 if !h.sameSizeGrow() { // Compute hash to make our evacuation decision (whether we need // to send this key/elem to bucket x or bucket y). hash := t.hasher(k2, uintptr(h.hash0)) if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) { // If key != key (NaNs), then the hash could be (and probably // will be) entirely different from the old hash. Moreover, // it isn't reproducible. Reproducibility is required in the // presence of iterators, as our evacuation decision must // match whatever decision the iterator made. // Fortunately, we have the freedom to send these keys either // way. Also, tophash is meaningless for these kinds of keys. // We let the low bit of tophash drive the evacuation decision. // We recompute a new random tophash for the next level so // these keys will get evenly distributed across all buckets // after multiple grows. useY = top & 1 top = tophash(hash) } else { if hash&newbit != 0 { useY = 1 } } } // 检查迁移状态 if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY { throw("bad evacuatedN") } // 设置tophash值 b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY dst := &xy[useY] // evacuation destination // dst用来接收迁移的bucket（包括溢出bucket）中的kv； // 迁移过来的有效kv数量达到8之后，dst会申请溢出bucket； if dst.i == bucketCnt { dst.b = h.newoverflow(t, dst.b) dst.i = 0 dst.k = add(unsafe.Pointer(dst.b), dataOffset) dst.e = add(dst.k, bucketCnt*uintptr(t.keysize)) } dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check if t.indirectkey() { *(*unsafe.Pointer)(dst.k) = k2 // copy pointer } else { typedmemmove(t.key, dst.k, k) // copy elem } if t.indirectelem() { *(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e) } else { typedmemmove(t.elem, dst.e, e) } dst.i++ // These updates might push these pointers past the end of the // key or elem arrays. That's ok, as we have the overflow pointer // at the end of the bucket to protect against pointing past the // end of the bucket. dst.k = add(dst.k, uintptr(t.keysize)) dst.e = add(dst.e, uintptr(t.elemsize)) } } // 迁移完成后，清理bucket kv和溢出bucket的指针；保留tophash; // Unlink the overflow buckets & clear key/elem to help GC. if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 { b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)) // Preserve b.tophash because the evacuation // state is maintained there. ptr := add(b, dataOffset) n := uintptr(t.bucketsize) - dataOffset memclrHasPointers(ptr, n) } } // hmap.nevacuate累加 if oldbucket == h.nevacuate { advanceEvacuationMark(h, t, newbit) } } func advanceEvacuationMark(h *hmap, t *maptype, newbit uintptr) { h.nevacuate++ // Experiments suggest that 1024 is overkill by at least an order of magnitude. // Put it in there as a safeguard anyway, to ensure O(1) behavior. stop := h.nevacuate + 1024 if stop > newbit { stop = newbit } for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) { h.nevacuate++ } if h.nevacuate == newbit { // newbit == # of oldbuckets // Growing is all done. Free old main bucket array. h.oldbuckets = nil // Can discard old overflow buckets as well. // If they are still referenced by an iterator, // then the iterator holds a pointers to the slice. if h.extra != nil { h.extra.oldoverflow = nil } h.flags &^= sameSizeGrow } } func hashGrow(t *maptype, h *hmap) { // If we've hit the load factor, get bigger. // Otherwise, there are too many overflow buckets, // so keep the same number of buckets and "grow" laterally. bigger := uint8(1) if !overLoadFactor(h.count+1, h.B) { bigger = 0 h.flags |= sameSizeGrow } oldbuckets := h.buckets newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil) flags := h.flags &^ (iterator | oldIterator) if h.flags&iterator != 0 { flags |= oldIterator } // commit the grow (atomic wrt gc) h.B += bigger h.flags = flags h.oldbuckets = oldbuckets h.buckets = newbuckets h.nevacuate = 0 h.noverflow = 0 if h.extra != nil && h.extra.overflow != nil { // Promote current overflow buckets to the old generation. if h.extra.oldoverflow != nil { throw("oldoverflow is not nil") } h.extra.oldoverflow = h.extra.overflow h.extra.overflow = nil } if nextOverflow != nil { if h.extra == nil { h.extra = new(mapextra) } h.extra.nextOverflow = nextOverflow } // the actual copying of the hash table data is done incrementally // by growWork() and evacuate(). }

总结：

map优先检查是否有相同的key，如果有，则表示是更新操作；

如果没有相同的key，则表示是插入操作；如果有空位，则在第一个空位处插入；如果没有空位，则增加一个溢出bucket，在溢出bucket中插入；插入操作可能会触发扩容操作；

map不是一次性完成扩容的，而是逐步完成扩容的；当在一个bucket中执行插入操作的时候，如果发现需要扩容，则会把这个bucket（包含溢出bucket）全部迁移到新申请的buckets空间中，同时多扩容一个bucket（个人理解是加速扩容速度，否则因为个别bucket一直没有使用，导致map一直维护新旧两个buckets）；

map库容分为等量迁移和加倍扩容：等量迁移是为了让稀疏的数据分布更加紧凑（由于删除操作，map可能会很稀疏），加倍扩容是由于插入数据过多，申请一个加倍的空间来存储kv，同时加倍扩容也会删除空的槽位，让数据分布紧凑；

3.5. map删除元素

示例代码

func main() { m1 := make(map[int8]int) m1[1] = 1 delete(m1, 1) }

map删除元素操作调用的底层函数是mapdelete该函数存在文件runtime/map.go中.

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) { if raceenabled && h != nil { callerpc := getcallerpc() pc := funcPC(mapdelete) racewritepc(unsafe.Pointer(h), callerpc, pc) raceReadObjectPC(t.key, key, callerpc, pc) } if msanenabled && h != nil { msanread(key, t.key.size) } // h == nil || h.count == 0的时候，直接返回； // 不过如果map的key类型是无法比较的话，这里会报错runtime error: hash of unhashable type xxx // 所以会调用一次t.hasher函数，该函数会报合适的panic，可以参考issue 23734：https://github.com/golang/go/issues/23734 if h == nil || h.count == 0 { if t.hashMightPanic() { t.hasher(key, 0) // see issue 23734 } return } // map不支持并发读写 if h.flags&hashWriting != 0 { throw("concurrent map writes") } hash := t.hasher(key, uintptr(h.hash0)) // Set hashWriting after calling t.hasher, since t.hasher may panic, // in which case we have not actually done a write (delete). h.flags ^= hashWriting // bucket索引 bucket := hash & bucketMask(h.B) // 如果map正在扩容过程中，此时会优先扩容，一次扩容2个bucket； if h.growing() { growWork(t, h, bucket) } b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize))) bOrig := b top := tophash(hash) search: for ; b != nil; b = b.overflow(t) { for i := uintptr(0); i < bucketCnt; i++ { if b.tophash[i] != top { // 如果top=emptyRest，则表示后面的槽位都是空的，所以直接返回； if b.tophash[i] == emptyRest { break search } continue } k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize)) k2 := k if t.indirectkey() { k2 = *((*unsafe.Pointer)(k2)) } if !t.key.equal(key, k2) { continue } // 删除kv // Only clear key if there are pointers in it. if t.indirectkey() { *(*unsafe.Pointer)(k) = nil } else if t.key.ptrdata != 0 { memclrHasPointers(k, t.key.size) } e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize)) if t.indirectelem() { *(*unsafe.Pointer)(e) = nil } else if t.elem.ptrdata != 0 { memclrHasPointers(e, t.elem.size) } else { memclrNoHeapPointers(e, t.elem.size) } // 删除kv之后，首先将top值修改为emptyOne，如果后续的kv都没有，会将当前top值修改为emptyRest； // 同时，当前top值修改，可能会导致之前的top值也需要相应修改； b.tophash[i] = emptyOne // If the bucket now ends in a bunch of emptyOne states, // change those to emptyRest states. // It would be nice to make this a separate function, but // for loops are not currently inlineable. // 如果已经是当前bucket的最后一个元素，则会继续寻找溢出bucket； if i == bucketCnt-1 { if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest { goto notLast } } else { // 如果下一个top值不是emptyRest，则表示当前的top值不需要修改成emptyRest； if b.tophash[i+1] != emptyRest { goto notLast } } // 循环修改top值； // 由于当前top值修改为emptyRest，可能导致前一个top值或者前一个bucket的最后一个top值也要相应修改； for { b.tophash[i] = emptyRest if i == 0 { if b == bOrig { break // beginning of initial bucket, we're done. } // Find previous bucket, continue at its last entry. c := b for b = bOrig; b.overflow(t) != c; b = b.overflow(t) { } i = bucketCnt - 1 } else { i-- } if b.tophash[i] != emptyOne { break } } notLast: h.count-- break search } } if h.flags&hashWriting == 0 { throw("concurrent map writes") } h.flags &^= hashWriting }

总结：

删除操作也不可以并发；

删除时候，也会触发扩容迁移；个人理解，go map不会一次性完成扩容迁移，这样应该比较消耗时间和性能，go map通过用户行为不断触发扩容迁移（一次就会扩容迁移2个bucket），这样虽然会有较长时间保留着old buckets，但是对map响应和用户体验影响较小，所以应该是一种折中和平衡的方案；

删除时候，会依次遍历改变top值；

3.6. map遍历元素

示例代码

func main() { m1 := make(map[int8]int) m1[1] = 1 for k,v := range m1 { fmt.Println(k,v) } }

map遍历元素分为两步，首先调用函数mapiterinit，初始化迭代器结构体hiter；然后调用函数mapiternext来循环遍历kv；下面我们首先看下迭代器hiter的结构，然后分析一下函数mapiterinit和函数mapiternext源码，这两个函数都存在于文件runtime/map.go中。
迭代器hiter的结构

// A hash iteration structure. // If you modify hiter, also change cmd/compile/internal/gc/reflect.go to indicate // the layout of this structure. type hiter struct { key unsafe.Pointer // Must be in first position. Write nil to indicate iteration end (see cmd/internal/gc/range.go). elem unsafe.Pointer // Must be in second position (see cmd/internal/gc/range.go). t *maptype h *hmap buckets unsafe.Pointer // bucket ptr at hash_iter initialization time bptr *bmap // current bucket overflow *[]*bmap // keeps overflow buckets of hmap.buckets alive oldoverflow *[]*bmap // keeps overflow buckets of hmap.oldbuckets alive startBucket uintptr // bucket iteration started at offset uint8 // intra-bucket offset to start from during iteration (should be big enough to hold bucketCnt-1) wrapped bool // already wrapped around from end of bucket array to beginning B uint8 i uint8 bucket uintptr checkBucket uintptr }

key：key指针；

elem：elem指针；

bptr：当前正待遍历的bucket指针；

startBucket：遍历起始的bucket索引；

offset：遍历每个bucket的时候，起始的cell索引；

wrapped：map遍历一般是从中间的bucket开始往末尾bucket遍历，如果已经到了末尾，则会继续从头开始遍历；该标志位为真时候，表示开始从头开始遍历；

i：当前cell索引；

bucket：当前bucket索引；

checkBucket：需要检查的bucket索引；当map遍历的之前，map正在扩容迁移(growing)过程中，此时找到一个待遍历的bucket，我们会先找到旧bucket，如果旧bucket还没有迁移，同时我们知道，如果迁移结束，该bucket中的kv肯定会迁移到2个bucket（例如B=1，旧的buckets是b0和b1；扩容后B=2，新的buckets是b0，b1，b2，b3，根据之前扩容迁移的过程分析，旧的b0会迁移到新的b0和b2）；所以map只会返回最终会迁移到新bucket的kv；checkBucket就是上述场景下的bucket索引；

迭代函数源码：

// mapiterinit initializes the hiter struct used for ranging over maps. // The hiter struct pointed to by 'it' is allocated on the stack // by the compilers order pass or on the heap by reflect_mapiterinit. // Both need to have zeroed hiter since the struct contains pointers. func mapiterinit(t *maptype, h *hmap, it *hiter) { if raceenabled && h != nil { callerpc := getcallerpc() racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiterinit)) } // 遍历没有初始化的map不会报错 if h == nil || h.count == 0 { return } if unsafe.Sizeof(hiter{})/sys.PtrSize != 12 { throw("hash_iter size incorrect") // see cmd/compile/internal/gc/reflect.go } it.t = t it.h = h // grab snapshot of bucket state it.B = h.B it.buckets = h.buckets if t.bucket.ptrdata == 0 { // Allocate the current slice and remember pointers to both current and old. // This preserves all relevant overflow buckets alive even if // the table grows and/or overflow buckets are added to the table // while we are iterating. h.createOverflow() it.overflow = h.extra.overflow it.oldoverflow = h.extra.oldoverflow } // 每次map遍历的起始bucket槽位和起始cell槽位都是随机的，原因就是这两个槽位是根据随机数来产生的 // decide where to start r := uintptr(fastrand()) if h.B > 31-bucketCntBits { r += uintptr(fastrand()) << 31 } it.startBucket = r & bucketMask(h.B) it.offset = uint8(r >> h.B & (bucketCnt - 1)) // iterator state it.bucket = it.startBucket // 修改hmap状态，原子操作 // Remember we have an iterator. // Can run concurrently with another mapiterinit(). if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator { atomic.Or8(&h.flags, iterator|oldIterator) } mapiternext(it) } func mapiternext(it *hiter) { h := it.h if raceenabled { callerpc := getcallerpc() racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiternext)) } if h.flags&hashWriting != 0 { throw("concurrent map iteration and map write") } t := it.t bucket := it.bucket b := it.bptr i := it.i checkBucket := it.checkBucket next: if b == nil { // bucket表示当前的`bucket`索引，wrapped表示是否从头遍历了； // map一般是从中间`bucket`开始遍历，如果遍历到末尾则wrapped=true，bucket=0，从头开始继续遍历； // 所以下面的判断条件如果为真，就是已经遍历结束了； if bucket == it.startBucket && it.wrapped { // end of iteration it.key = nil it.elem = nil return } // 遍历之后，h.B可能继续变大 if h.growing() && it.B == h.B { // Iterator was started in the middle of a grow, and the grow isn't done yet. // If the bucket we're looking at hasn't been filled in yet (i.e. the old // bucket hasn't been evacuated) then we need to iterate through the old // bucket and only return the ones that will be migrated to this bucket. oldbucket := bucket & it.h.oldbucketmask() b = (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))) if !evacuated(b) { checkBucket = bucket } else { b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize))) checkBucket = noCheck } } else { b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize))) checkBucket = noCheck } bucket++ // bucket遍历到末尾后，从头开始继续遍历 if bucket == bucketShift(it.B) { bucket = 0 it.wrapped = true } i = 0 } for ; i < bucketCnt; i++ { offi := (i + it.offset) & (bucketCnt - 1) if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty { // TODO: emptyRest is hard to use here, as we start iterating // in the middle of a bucket. It's feasible, just tricky. continue } k := add(unsafe.Pointer(b), dataOffset+uintptr(offi)*uintptr(t.keysize)) if t.indirectkey() { k = *((*unsafe.Pointer)(k)) } e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+uintptr(offi)*uintptr(t.elemsize)) if checkBucket != noCheck && !h.sameSizeGrow() { // Special case: iterator was started during a grow to a larger size // and the grow is not done yet. We're working on a bucket whose // oldbucket has not been evacuated yet. Or at least, it wasn't // evacuated when we started the bucket. So we're iterating // through the oldbucket, skipping any keys that will go // to the other new bucket (each oldbucket expands to two // buckets during a grow). if t.reflexivekey() || t.key.equal(k, k) { // If the item in the oldbucket is not destined for // the current new bucket in the iteration, skip it. hash := t.hasher(k, uintptr(h.hash0)) // 跳过不会迁移到当前bucket的kv if hash&bucketMask(it.B) != checkBucket { continue } } else { // Hash isn't repeatable if k != k (NaNs). We need a // repeatable and randomish choice of which direction // to send NaNs during evacuation. We'll use the low // bit of tophash to decide which way NaNs go. // NOTE: this case is why we need two evacuate tophash // values, evacuatedX and evacuatedY, that differ in // their low bit. if checkBucket>>(it.B-1) != uintptr(b.tophash[offi]&1) { continue } } } if (b.tophash[offi] != evacuatedX && b.tophash[offi] != evacuatedY) || !(t.reflexivekey() || t.key.equal(k, k)) { // This is the golden data, we can return it. // OR // key!=key, so the entry can't be deleted or updated, so we can just return it. // That's lucky for us because when key!=key we can't look it up successfully. it.key = k if t.indirectelem() { e = *((*unsafe.Pointer)(e)) } it.elem = e } else { // The hash table has grown since the iterator was started. // The golden data for this key is now somewhere else. // Check the current hash table for the data. // This code handles the case where the key // has been deleted, updated, or deleted and reinserted. // NOTE: we need to regrab the key as it has potentially been // updated to an equal() but not identical key (e.g. +0.0 vs -0.0). rk, re := mapaccessK(t, h, k) if rk == nil { continue // key has been deleted } it.key = rk it.elem = re } it.bucket = bucket if it.bptr != b { // avoid unnecessary write barrier; see issue 14921 it.bptr = b } it.i = i + 1 it.checkBucket = checkBucket return } b = b.overflow(t) i = 0 goto next } // returns both key and elem. Used by map iterator func mapaccessK(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, unsafe.Pointer) { if h == nil || h.count == 0 { return nil, nil } hash := t.hasher(key, uintptr(h.hash0)) m := bucketMask(h.B) b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize))) if c := h.oldbuckets; c != nil { if !h.sameSizeGrow() { // There used to be half as many buckets; mask down one more power of two. m >>= 1 } oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize))) if !evacuated(oldb) { b = oldb } } top := tophash(hash) bucketloop: for ; b != nil; b = b.overflow(t) { for i := uintptr(0); i < bucketCnt; i++ { if b.tophash[i] != top { if b.tophash[i] == emptyRest { break bucketloop } continue } k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize)) if t.indirectkey() { k = *((*unsafe.Pointer)(k)) } if t.key.equal(key, k) { e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize)) if t.indirectelem() { e = *((*unsafe.Pointer)(e)) } return k, e } } } return nil, nil }

总结：

map遍历首先会初始化迭代器hiter，然后调用遍历函数mapiternext；

map遍历的起始bucket和起始cell都是随机的；

如果map遍历前，map进入一个growing过程，则map遍历效果等效于该growing全部结束后的的效果；也就是说，一个新bucket，可能还没有迁移进数据，但是map可以正常返回未来会迁移进入该bucket的数据；

4. 其他

如何获取调用的具体map函数

准备代码

package main import ( "fmt" ) func main() { m1 := make(map[string]string, 9) fmt.Println(m1) for i := 0; i < 20; i++ { str := fmt.Sprintf("%d", i) m1[str] = str } a := m1["0"] b, ok := m1["0"] fmt.Println(a,b,ok) }

打印汇编代码命令

go tool compile -N -l -S main.go > main.txt

根据汇编代码，查找调用函数CALL runtime.makemap(SB)

5. 参考

Golang map源码详解

go基础之map

年度最佳【golang】map详解

golang map源码分析

1. 前言

2. 环境信息

3. go map数据结构分析

bmap结构图

hmap结构图

3.1. map创建

3.2. map查找元素

3.4. map新增元素和更新元素

3.5. map删除元素

3.6. map遍历元素

4. 其他

5. 参考

你可能感兴趣的:(golang map源码分析)