1. 前言
map是CS中非常基础的数据结构,关于golang map的基本使用,这里不再赘述,可以参考官方文档。
golang的map实现是基于hash查找表,并且基于链表来解决hash碰撞问题。
2. 环境信息
- go版本:go1.15.4 darwin/amd64
3. go map数据结构分析
map的基础结构体是hmap
,该结构体存在文件runtime/map.go
中hmap
源码:
// A header for a Go map.
type hmap struct {
// Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
// Make sure this stays in sync with the compiler's definition.
count int // # live cells == size of map. Must be first (used by len() builtin)
flags uint8
B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
hash0 uint32 // hash seed
buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated)
extra *mapextra // optional fields
}
// mapextra holds fields that are not present on all maps.
type mapextra struct {
// If both key and elem do not contain pointers and are inline, then we mark bucket
// type as containing no pointers. This avoids scanning such maps.
// However, bmap.overflow is a pointer. In order to keep overflow buckets
// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
// overflow and oldoverflow are only used if key and elem do not contain pointers.
// overflow contains overflow buckets for hmap.buckets.
// oldoverflow contains overflow buckets for hmap.oldbuckets.
// The indirection allows to store a pointer to the slice in hiter.
overflow *[]*bmap
oldoverflow *[]*bmap
// nextOverflow holds a pointer to a free overflow bucket.
nextOverflow *bmap
}
count
:map中kv对的数量;flags
:map的一些标志位;B
:map中bucket数量为2^B
个;意味着此时map数据结构中可以存储loadFactor * 2^B
个数据,如果超过,则需要扩容;todonoverflow
:map中溢出bucket的近似数量;todohash0
:hash函数的种子;buckets
:map中bucket的首指针,map中一共有2^B
个bucket;如果count==0的情况下,该字段可能是nil;oldbuckets
:map中旧bucket的首指针,该字段只有在map扩容的时候,才不等于nil;todonevacuate
:map中bucket迁移数量,至多有此数量的bucket从旧bucket迁移到新bucket;todoextra
:扩展字段;
bmap
是bucket真正的结构体
// A bucket for a Go map.
type bmap struct {
// tophash generally contains the top byte of the hash value
// for each key in this bucket. If tophash[0] < minTopHash,
// tophash[0] is a bucket evacuation state instead.
tophash [bucketCnt]uint8
// Followed by bucketCnt keys and then bucketCnt elems.
// NOTE: packing all the keys together and then all the elems together makes the
// code a bit more complicated than alternating key/elem/key/elem/... but it allows
// us to eliminate padding which would be needed for, e.g., map[int64]int8.
// Followed by an overflow pointer.
}
tophash
:存储hash值的高8位;keys
:key数组,隐藏字段;values
:value数组,隐藏字段;overflow
:溢出buceket指针,隐藏字段;
bmap.tophash
中除了存储hash值的高8位,也可以用来存储一些状态码。
// Possible tophash values. We reserve a few possibilities for special marks.
// Each bucket (including its overflow buckets, if any) will have either all or none of its
// entries in the evacuated* states (except during the evacuate() method, which only happens
// during map writes and thus no one else can observe the map during that time).
emptyRest = 0 // this cell is empty, and there are no more non-empty cells at higher indexes or overflows.
emptyOne = 1 // this cell is empty
evacuatedX = 2 // key/elem is valid. Entry has been evacuated to first half of larger table.
evacuatedY = 3 // same as above, but evacuated to second half of larger table.
evacuatedEmpty = 4 // cell is empty, bucket is evacuated.
minTopHash = 5 // minimum tophash for a normal filled cell.
bmap结构图
hmap结构图
下面我们重点分析一下map的创建和增删改查操作,我们会展示源码,同时在源码上增加中文注释,作为对源码的分析;golang编译器会根据不同情况,调用不同的函数,我们下面分析的是runtime/map.go
文件中的基本函数;一些其他优化函数,例如runtime/map_faststr.go
中对map[string]type
类型的优化,感兴趣的同学可以自行查看。
3.1. map创建
示例代码
func main() {
m1 := make(map[string]string)
m2 := make(map[string]string, 9)
}
我们可以通过汇编编译代码看到go map创建调用的底层函数是makemap
,该函数存在文件runtime/map.go
中;事实上,不同的map声明方式,go标准编译器选择不同的函数调用,例如m1 := make(map[string]string)
代码,编译器会调用函数runtime.makemap_small
,但是大部分场景下都是调用makemap
。下面我们分析下函数makemap
:
// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
// 检查申请的map空间是否超过内存限制
mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
if overflow || mem > maxAlloc {
hint = 0
}
// 初始化hmap
// initialize Hmap
if h == nil {
h = new(hmap)
}
// hash初始种子
h.hash0 = fastrand()
// 计算B
// Find the size parameter B which will hold the requested # of elements.
// For hint < 0 overLoadFactor returns false since hint < bucketCnt.
B := uint8(0)
for overLoadFactor(hint, B) {
B++
}
h.B = B
// allocate initial hash table
// if B == 0, the buckets field is allocated lazily later (in mapassign)
// If hint is large zeroing this memory could take a while.
if h.B != 0 {
var nextOverflow *bmap
// 调用函数makeBucketArray,分配bucket和溢出bucket的内存
h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
if nextOverflow != nil {
h.extra = new(mapextra)
h.extra.nextOverflow = nextOverflow
}
}
return h
}
// makeBucketArray initializes a backing array for map buckets.
// 1<= 4,则表示申请的map空间较大,我们事先申请一些溢出bucket,这样可以提高效率
// For small b, overflow buckets are unlikely.
// Avoid the overhead of the calculation.
if b >= 4 {
// Add on the estimated number of overflow buckets
// required to insert the median number of elements
// used with this value of b.
nbuckets += bucketShift(b - 4)
sz := t.bucket.size * nbuckets
up := roundupsize(sz)
if up != sz {
nbuckets = up / t.bucket.size
}
}
if dirtyalloc == nil {
buckets = newarray(t.bucket, int(nbuckets))
} else {
// dirtyalloc was previously generated by
// the above newarray(t.bucket, int(nbuckets))
// but may not be empty.
buckets = dirtyalloc
size := t.bucket.size * nbuckets
if t.bucket.ptrdata != 0 {
memclrHasPointers(buckets, size)
} else {
memclrNoHeapPointers(buckets, size)
}
}
if base != nbuckets {
// We preallocated some overflow buckets.
// To keep the overhead of tracking these overflow buckets to a minimum,
// we use the convention that if a preallocated overflow bucket's overflow
// pointer is nil, then there are more available by bumping the pointer.
// We need a safe non-nil pointer for the last overflow bucket; just use buckets.
// nextOverflow是溢出bucket的首地址;
// last是最后一个溢出bucket的首地址;
// 每个溢出bucket对应的bmap结构体中的溢出bucket指针都是nil;但是last的溢出bucket指针是bucket的起始地址;
nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
last.setoverflow(t, (*bmap)(buckets))
}
return buckets, nextOverflow
}
总结:
- 函数
makemap
会根据不同的声明方式和参数,决定map的初始化空间大小; - map中kv都存储在bucket中,每个bucket可以存8对kv;
- 如果
len(map) > 0
,则map中至少存在一个bucket,所以map的空间都是8的整数倍; - 如果map申请空间较大(大于等于128),表示出现key碰撞的几率较大,则会事先创建一些溢出bucket,以备后期使用;
3.2. map查找元素
示例代码
func main() {
m1 := make(map[int8]int)
m1[1] = 1
v, ok := m1[1]
fmt.Println(v, ok)
}
map查找元素操作,底层调用的函数mapaccess1
和mapaccess2
,该函数存在文件runtime/map.go
中;这两个函数基本一致,只是函数mapaccess2
会返回bool类型,表示key是否存在。事实上,对于不同的map key类型,编译器会调用不同的函数来实现map的增删改查操作,其中针对特殊key类型的优化函数,存在文件runtime/map_fast32.go
,runtime/map_fast64.go
和runtime/map_faststr.go
中;例如,如果key的类型是string,map的查找操作会调用优化函数mapaccess1_faststr
和mapaccess2_faststr
。本文只分析基本的函数,对于优化函数,感兴趣的同学可以自行查看源码。下面我们分析函数mapaccess2
:
func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {
// 启用数据竞争检测
if raceenabled && h != nil {
callerpc := getcallerpc()
pc := funcPC(mapaccess2)
racereadpc(unsafe.Pointer(h), callerpc, pc)
raceReadObjectPC(t.key, key, callerpc, pc)
}
// 启用-msan检测
if msanenabled && h != nil {
msanread(key, t.key.size)
}
if h == nil || h.count == 0 {
if t.hashMightPanic() {
t.hasher(key, 0) // see issue 23734
}
return unsafe.Pointer(&zeroVal[0]), false
}
// map不支持并发安全,并发读写会产生panic
if h.flags&hashWriting != 0 {
throw("concurrent map read and map write")
}
// 计算hash值
hash := t.hasher(key, uintptr(h.hash0))
// m表示map中bucket数量
m := bucketMask(h.B)
// 利用`hash mod m`可以计算bucket索引,b表示对应bucket的首地址
b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
// map正在迁移的场景,如果map正在迁移,则优先从oldbuckets中查找kv
if c := h.oldbuckets; c != nil {
// map是否在扩容迁移,如果是扩容迁移,则oldbuckets实际的bucket数量是m的一半(扩容会让bucket数量增加一倍)
if !h.sameSizeGrow() {
// There used to be half as many buckets; mask down one more power of two.
m >>= 1
}
// 根据hash值,查找oldbuckets中对应的bucket地址
oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))
// 如果oldb的标志位不是撤离状态,则我们从oldb中查找kv
if !evacuated(oldb) {
b = oldb
}
}
// top表示hash的高8位,如果hash高8位小于5,则top需要加上5;因为5表示`minTopHash`,top如果是小于等于5,都是表示特殊状态;正常的key的top值都是大于5的
top := tophash(hash)
bucketloop:
// 逐个查找对应bucket和其溢出bucket
for ; b != nil; b = b.overflow(t) {
// 一个bucket有8对kv,逐个查找
for i := uintptr(0); i < bucketCnt; i++ {
if b.tophash[i] != top {
// 如果b.tophash[i] == emptyRest,表示剩下的kv对都是空的,所以直接跳出循环
if b.tophash[i] == emptyRest {
break bucketloop
}
continue
}
// 查找对应的key的地址
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
if t.indirectkey() {
k = *((*unsafe.Pointer)(k))
}
// 比较key是否相等
if t.key.equal(key, k) {
// 如果key相等,则找到对应的value
e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
if t.indirectelem() {
e = *((*unsafe.Pointer)(e))
}
// 返回value
return e, true
}
}
}
// 返回对应的0值
return unsafe.Pointer(&zeroVal[0]), false
}
总结:
- 判断是否并发读写,如果是,则抛出panic;
- 计算hash值,根据hash的地位找到对应的bucket,根据高8位,找到对应的kv槽位;
- map迁移场景下,优先从oldbuckets中查找kv;
- 比较key,相等则返回value,不等则返回0值;
- map kv定位过程如下图:
3.4. map新增元素和更新元素
示例代码
func main() {
m1 := make(map[int8]int)
m1[1] = 1
m1[2] = 2
m1[1] = 11
fmt.Println(m1)
}
map的新增和更新元素操作,都会调用函数mapassign
,该函数存在文件runtime/map.go
中。
// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
if h == nil {
panic(plainError("assignment to entry in nil map"))
}
if raceenabled {
callerpc := getcallerpc()
pc := funcPC(mapassign)
racewritepc(unsafe.Pointer(h), callerpc, pc)
raceReadObjectPC(t.key, key, callerpc, pc)
}
if msanenabled {
msanread(key, t.key.size)
}
// map不支持并发读写
if h.flags&hashWriting != 0 {
throw("concurrent map writes")
}
// 计算hash值
hash := t.hasher(key, uintptr(h.hash0))
// map状态设置为hashWriting
// Set hashWriting after calling t.hasher, since t.hasher may panic,
// in which case we have not actually done a write.
h.flags ^= hashWriting
// 如果map没有初始化bucket,此时会申请bucket空间
if h.buckets == nil {
h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
}
again:
// 根据hash值,计算bucket索引
bucket := hash & bucketMask(h.B)
// 判断map是否正在扩容
if h.growing() {
// 函数growWork是将hmp.oldbuckets中对应的bucket迁移到新的buckets中
growWork(t, h, bucket)
}
// 目标bucket的首地址
b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
top := tophash(hash)
var inserti *uint8
var insertk unsafe.Pointer
var elem unsafe.Pointer
bucketloop:
for {
// 遍历tophash查找key是否已经存在,或者是否有空位插入kv
for i := uintptr(0); i < bucketCnt; i++ {
if b.tophash[i] != top {
// tophash中可能有多个空位,我们记录第一个空位的索引,后面的空位跳过
if isEmpty(b.tophash[i]) && inserti == nil {
inserti = &b.tophash[i]
insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
}
// tophash值表示剩余都是空位,则直接结束循环,因为后面全是空位,不会有相同的key在后面的槽位,此次操作必然是插入,而不是更新
if b.tophash[i] == emptyRest {
break bucketloop
}
continue
}
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
if t.indirectkey() {
k = *((*unsafe.Pointer)(k))
}
if !t.key.equal(key, k) {
continue
}
// already have a mapping for key. Update it.
if t.needkeyupdate() {
typedmemmove(t.key, k, key)
}
elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
goto done
}
// 如果bucket是满的,而且没有发现相同的key,则继续查找溢出bucket
ovf := b.overflow(t)
if ovf == nil {
break
}
b = ovf
}
// Did not find mapping for key. Allocate new cell & add entry.
// If we hit the max load factor or we have too many overflow buckets,
// and we're not already in the middle of growing, start growing.
// 程序运行到此处,必然是由于没有找到相同的key,此次操作是插入,不是更新;
// 插入一对kv,我们需要判断map是否需要扩容;
// overLoadFactor函数用来判断map是否由于数据太多,需要增量1倍扩容;
// tooManyOverflowBuckets函数用来判断map是否需要等量迁移,map由于删除操作,溢出bucket很多,但是数据分布很稀疏,我们可以通过等量迁移,将数据更加紧凑的存储在一起,节约空间;
// 具体可以看evacuate函数分析;
if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
// hashGrow函数主要是设置hmap.flags为扩容状态,申请新的内存空间用来扩容,同时设置hmap.oldbuckets为原来的hmap.buckets
hashGrow(t, h)
goto again // Growing the table invalidates everything, so try again
}
// inserti == nil表示没有插入的槽位,需要申请溢出bucket
if inserti == nil {
// all current buckets are full, allocate a new one.
newb := h.newoverflow(t, b)
inserti = &newb.tophash[0]
insertk = add(unsafe.Pointer(newb), dataOffset)
elem = add(insertk, bucketCnt*uintptr(t.keysize))
}
// store new key/elem at insert position
if t.indirectkey() {
kmem := newobject(t.key)
*(*unsafe.Pointer)(insertk) = kmem
insertk = kmem
}
if t.indirectelem() {
vmem := newobject(t.elem)
*(*unsafe.Pointer)(elem) = vmem
}
typedmemmove(t.key, insertk, key)
*inserti = top
h.count++
done:
// 设置flags,并写入value
if h.flags&hashWriting == 0 {
throw("concurrent map writes")
}
h.flags &^= hashWriting
if t.indirectelem() {
elem = *((*unsafe.Pointer)(elem))
}
return elem
}
// 迁移oldbucket中的对应bucket
func growWork(t *maptype, h *hmap, bucket uintptr) {
// make sure we evacuate the oldbucket corresponding
// to the bucket we're about to use
evacuate(t, h, bucket&h.oldbucketmask())
// evacuate one more oldbucket to make progress on growing
if h.growing() {
evacuate(t, h, h.nevacuate)
}
}
// bucket迁移函数
func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
// old bucket索引
b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
// 如果是等量迁移,则newbit表示bucket数量;如果是增量迁移,newbit表示增量前的bucket数量;
newbit := h.noldbuckets()
// 待迁移bucket是否是迁移状态
if !evacuated(b) {
// TODO: reuse overflow buckets instead of using new ones, if there
// is no iterator using the old buckets. (If !oldIterator.)
// xy contains the x and y (low and high) evacuation destinations.
// 同一个hash值,在新旧buckets中对应的bucket索引可能是不一样的;
// 例如hash值是1001,旧buckets数量是8,新buckets数量是16,那么该hash值在旧buckets中索引是1,新buckets中索引是9;
// x表示新旧索引不变的情况下,新bucket的索引;y表示新索引增加newbit情况下,新bucket的索引;
var xy [2]evacDst
x := &xy[0]
x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
x.k = add(unsafe.Pointer(x.b), dataOffset)
x.e = add(x.k, bucketCnt*uintptr(t.keysize))
if !h.sameSizeGrow() {
// Only calculate y pointers if we're growing bigger.
// Otherwise GC can see bad pointers.
y := &xy[1]
y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
y.k = add(unsafe.Pointer(y.b), dataOffset)
y.e = add(y.k, bucketCnt*uintptr(t.keysize))
}
for ; b != nil; b = b.overflow(t) {
// 待迁移bucket中kv的首地址
k := add(unsafe.Pointer(b), dataOffset)
e := add(k, bucketCnt*uintptr(t.keysize))
for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
top := b.tophash[i]
// 如果tophash为空,则跳过,这样就可以让数据紧凑,节约内存空间;
if isEmpty(top) {
b.tophash[i] = evacuatedEmpty
continue
}
if top < minTopHash {
throw("bad map state")
}
k2 := k
if t.indirectkey() {
k2 = *((*unsafe.Pointer)(k2))
}
var useY uint8
if !h.sameSizeGrow() {
// Compute hash to make our evacuation decision (whether we need
// to send this key/elem to bucket x or bucket y).
hash := t.hasher(k2, uintptr(h.hash0))
if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {
// If key != key (NaNs), then the hash could be (and probably
// will be) entirely different from the old hash. Moreover,
// it isn't reproducible. Reproducibility is required in the
// presence of iterators, as our evacuation decision must
// match whatever decision the iterator made.
// Fortunately, we have the freedom to send these keys either
// way. Also, tophash is meaningless for these kinds of keys.
// We let the low bit of tophash drive the evacuation decision.
// We recompute a new random tophash for the next level so
// these keys will get evenly distributed across all buckets
// after multiple grows.
useY = top & 1
top = tophash(hash)
} else {
if hash&newbit != 0 {
useY = 1
}
}
}
// 检查迁移状态
if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
throw("bad evacuatedN")
}
// 设置tophash值
b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
dst := &xy[useY] // evacuation destination
// dst用来接收迁移的bucket(包括溢出bucket)中的kv;
// 迁移过来的有效kv数量达到8之后,dst会申请溢出bucket;
if dst.i == bucketCnt {
dst.b = h.newoverflow(t, dst.b)
dst.i = 0
dst.k = add(unsafe.Pointer(dst.b), dataOffset)
dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
}
dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
if t.indirectkey() {
*(*unsafe.Pointer)(dst.k) = k2 // copy pointer
} else {
typedmemmove(t.key, dst.k, k) // copy elem
}
if t.indirectelem() {
*(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
} else {
typedmemmove(t.elem, dst.e, e)
}
dst.i++
// These updates might push these pointers past the end of the
// key or elem arrays. That's ok, as we have the overflow pointer
// at the end of the bucket to protect against pointing past the
// end of the bucket.
dst.k = add(dst.k, uintptr(t.keysize))
dst.e = add(dst.e, uintptr(t.elemsize))
}
}
// 迁移完成后,清理bucket kv和溢出bucket的指针;保留tophash;
// Unlink the overflow buckets & clear key/elem to help GC.
if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
// Preserve b.tophash because the evacuation
// state is maintained there.
ptr := add(b, dataOffset)
n := uintptr(t.bucketsize) - dataOffset
memclrHasPointers(ptr, n)
}
}
// hmap.nevacuate累加
if oldbucket == h.nevacuate {
advanceEvacuationMark(h, t, newbit)
}
}
func advanceEvacuationMark(h *hmap, t *maptype, newbit uintptr) {
h.nevacuate++
// Experiments suggest that 1024 is overkill by at least an order of magnitude.
// Put it in there as a safeguard anyway, to ensure O(1) behavior.
stop := h.nevacuate + 1024
if stop > newbit {
stop = newbit
}
for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {
h.nevacuate++
}
if h.nevacuate == newbit { // newbit == # of oldbuckets
// Growing is all done. Free old main bucket array.
h.oldbuckets = nil
// Can discard old overflow buckets as well.
// If they are still referenced by an iterator,
// then the iterator holds a pointers to the slice.
if h.extra != nil {
h.extra.oldoverflow = nil
}
h.flags &^= sameSizeGrow
}
}
func hashGrow(t *maptype, h *hmap) {
// If we've hit the load factor, get bigger.
// Otherwise, there are too many overflow buckets,
// so keep the same number of buckets and "grow" laterally.
bigger := uint8(1)
if !overLoadFactor(h.count+1, h.B) {
bigger = 0
h.flags |= sameSizeGrow
}
oldbuckets := h.buckets
newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)
flags := h.flags &^ (iterator | oldIterator)
if h.flags&iterator != 0 {
flags |= oldIterator
}
// commit the grow (atomic wrt gc)
h.B += bigger
h.flags = flags
h.oldbuckets = oldbuckets
h.buckets = newbuckets
h.nevacuate = 0
h.noverflow = 0
if h.extra != nil && h.extra.overflow != nil {
// Promote current overflow buckets to the old generation.
if h.extra.oldoverflow != nil {
throw("oldoverflow is not nil")
}
h.extra.oldoverflow = h.extra.overflow
h.extra.overflow = nil
}
if nextOverflow != nil {
if h.extra == nil {
h.extra = new(mapextra)
}
h.extra.nextOverflow = nextOverflow
}
// the actual copying of the hash table data is done incrementally
// by growWork() and evacuate().
}
总结:
- map优先检查是否有相同的key,如果有,则表示是更新操作;
- 如果没有相同的key,则表示是插入操作;如果有空位,则在第一个空位处插入;如果没有空位,则增加一个溢出bucket,在溢出bucket中插入;插入操作可能会触发扩容操作;
- map不是一次性完成扩容的,而是逐步完成扩容的;当在一个bucket中执行插入操作的时候,如果发现需要扩容,则会把这个bucket(包含溢出bucket)全部迁移到新申请的buckets空间中,同时多扩容一个bucket(个人理解是加速扩容速度,否则因为个别bucket一直没有使用,导致map一直维护新旧两个buckets);
- map库容分为等量迁移和加倍扩容:等量迁移是为了让稀疏的数据分布更加紧凑(由于删除操作,map可能会很稀疏),加倍扩容是由于插入数据过多,申请一个加倍的空间来存储kv,同时加倍扩容也会删除空的槽位,让数据分布紧凑;
3.5. map删除元素
示例代码
func main() {
m1 := make(map[int8]int)
m1[1] = 1
delete(m1, 1)
}
map删除元素操作调用的底层函数是mapdelete
该函数存在文件runtime/map.go
中.
func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
if raceenabled && h != nil {
callerpc := getcallerpc()
pc := funcPC(mapdelete)
racewritepc(unsafe.Pointer(h), callerpc, pc)
raceReadObjectPC(t.key, key, callerpc, pc)
}
if msanenabled && h != nil {
msanread(key, t.key.size)
}
// h == nil || h.count == 0的时候,直接返回;
// 不过如果map的key类型是无法比较的话,这里会报错runtime error: hash of unhashable type xxx
// 所以会调用一次t.hasher函数,该函数会报合适的panic,可以参考issue 23734:https://github.com/golang/go/issues/23734
if h == nil || h.count == 0 {
if t.hashMightPanic() {
t.hasher(key, 0) // see issue 23734
}
return
}
// map不支持并发读写
if h.flags&hashWriting != 0 {
throw("concurrent map writes")
}
hash := t.hasher(key, uintptr(h.hash0))
// Set hashWriting after calling t.hasher, since t.hasher may panic,
// in which case we have not actually done a write (delete).
h.flags ^= hashWriting
// bucket索引
bucket := hash & bucketMask(h.B)
// 如果map正在扩容过程中,此时会优先扩容,一次扩容2个bucket;
if h.growing() {
growWork(t, h, bucket)
}
b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
bOrig := b
top := tophash(hash)
search:
for ; b != nil; b = b.overflow(t) {
for i := uintptr(0); i < bucketCnt; i++ {
if b.tophash[i] != top {
// 如果top=emptyRest,则表示后面的槽位都是空的,所以直接返回;
if b.tophash[i] == emptyRest {
break search
}
continue
}
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
k2 := k
if t.indirectkey() {
k2 = *((*unsafe.Pointer)(k2))
}
if !t.key.equal(key, k2) {
continue
}
// 删除kv
// Only clear key if there are pointers in it.
if t.indirectkey() {
*(*unsafe.Pointer)(k) = nil
} else if t.key.ptrdata != 0 {
memclrHasPointers(k, t.key.size)
}
e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
if t.indirectelem() {
*(*unsafe.Pointer)(e) = nil
} else if t.elem.ptrdata != 0 {
memclrHasPointers(e, t.elem.size)
} else {
memclrNoHeapPointers(e, t.elem.size)
}
// 删除kv之后,首先将top值修改为emptyOne,如果后续的kv都没有,会将当前top值修改为emptyRest;
// 同时,当前top值修改,可能会导致之前的top值也需要相应修改;
b.tophash[i] = emptyOne
// If the bucket now ends in a bunch of emptyOne states,
// change those to emptyRest states.
// It would be nice to make this a separate function, but
// for loops are not currently inlineable.
// 如果已经是当前bucket的最后一个元素,则会继续寻找溢出bucket;
if i == bucketCnt-1 {
if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
goto notLast
}
} else { // 如果下一个top值不是emptyRest,则表示当前的top值不需要修改成emptyRest;
if b.tophash[i+1] != emptyRest {
goto notLast
}
}
// 循环修改top值;
// 由于当前top值修改为emptyRest,可能导致前一个top值或者前一个bucket的最后一个top值也要相应修改;
for {
b.tophash[i] = emptyRest
if i == 0 {
if b == bOrig {
break // beginning of initial bucket, we're done.
}
// Find previous bucket, continue at its last entry.
c := b
for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
}
i = bucketCnt - 1
} else {
i--
}
if b.tophash[i] != emptyOne {
break
}
}
notLast:
h.count--
break search
}
}
if h.flags&hashWriting == 0 {
throw("concurrent map writes")
}
h.flags &^= hashWriting
}
总结:
- 删除操作也不可以并发;
- 删除时候,也会触发扩容迁移;个人理解,go map不会一次性完成扩容迁移,这样应该比较消耗时间和性能,go map通过用户行为不断触发扩容迁移(一次就会扩容迁移2个bucket),这样虽然会有较长时间保留着old buckets,但是对map响应和用户体验影响较小,所以应该是一种折中和平衡的方案;
- 删除时候,会依次遍历改变top值;
3.6. map遍历元素
示例代码
func main() {
m1 := make(map[int8]int)
m1[1] = 1
for k,v := range m1 {
fmt.Println(k,v)
}
}
map
遍历元素分为两步,首先调用函数mapiterinit
,初始化迭代器结构体hiter
;然后调用函数mapiternext
来循环遍历kv;下面我们首先看下迭代器hiter
的结构,然后分析一下函数mapiterinit
和函数mapiternext
源码,这两个函数都存在于文件runtime/map.go
中。
迭代器hiter
的结构
// A hash iteration structure.
// If you modify hiter, also change cmd/compile/internal/gc/reflect.go to indicate
// the layout of this structure.
type hiter struct {
key unsafe.Pointer // Must be in first position. Write nil to indicate iteration end (see cmd/internal/gc/range.go).
elem unsafe.Pointer // Must be in second position (see cmd/internal/gc/range.go).
t *maptype
h *hmap
buckets unsafe.Pointer // bucket ptr at hash_iter initialization time
bptr *bmap // current bucket
overflow *[]*bmap // keeps overflow buckets of hmap.buckets alive
oldoverflow *[]*bmap // keeps overflow buckets of hmap.oldbuckets alive
startBucket uintptr // bucket iteration started at
offset uint8 // intra-bucket offset to start from during iteration (should be big enough to hold bucketCnt-1)
wrapped bool // already wrapped around from end of bucket array to beginning
B uint8
i uint8
bucket uintptr
checkBucket uintptr
}
key
:key指针;elem
:elem指针;bptr
:当前正待遍历的bucket
指针;startBucket
:遍历起始的bucket索引;offset
:遍历每个bucket的时候,起始的cell索引;wrapped
:map遍历一般是从中间的bucket开始往末尾bucket遍历,如果已经到了末尾,则会继续从头开始遍历;该标志位为真时候,表示开始从头开始遍历;i
:当前cell索引;bucket
:当前bucket索引;checkBucket
:需要检查的bucket索引;当map遍历的之前,map正在扩容迁移(growing)过程中,此时找到一个待遍历的bucket,我们会先找到旧bucket,如果旧bucket还没有迁移,同时我们知道,如果迁移结束,该bucket中的kv肯定会迁移到2个bucket(例如B=1,旧的buckets是b0和b1;扩容后B=2,新的buckets是b0,b1,b2,b3,根据之前扩容迁移的过程分析,旧的b0会迁移到新的b0和b2);所以map只会返回最终会迁移到新bucket的kv;checkBucket就是上述场景下的bucket索引;
迭代函数源码:
// mapiterinit initializes the hiter struct used for ranging over maps.
// The hiter struct pointed to by 'it' is allocated on the stack
// by the compilers order pass or on the heap by reflect_mapiterinit.
// Both need to have zeroed hiter since the struct contains pointers.
func mapiterinit(t *maptype, h *hmap, it *hiter) {
if raceenabled && h != nil {
callerpc := getcallerpc()
racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiterinit))
}
// 遍历没有初始化的map不会报错
if h == nil || h.count == 0 {
return
}
if unsafe.Sizeof(hiter{})/sys.PtrSize != 12 {
throw("hash_iter size incorrect") // see cmd/compile/internal/gc/reflect.go
}
it.t = t
it.h = h
// grab snapshot of bucket state
it.B = h.B
it.buckets = h.buckets
if t.bucket.ptrdata == 0 {
// Allocate the current slice and remember pointers to both current and old.
// This preserves all relevant overflow buckets alive even if
// the table grows and/or overflow buckets are added to the table
// while we are iterating.
h.createOverflow()
it.overflow = h.extra.overflow
it.oldoverflow = h.extra.oldoverflow
}
// 每次map遍历的起始bucket槽位和起始cell槽位都是随机的,原因就是这两个槽位是根据随机数来产生的
// decide where to start
r := uintptr(fastrand())
if h.B > 31-bucketCntBits {
r += uintptr(fastrand()) << 31
}
it.startBucket = r & bucketMask(h.B)
it.offset = uint8(r >> h.B & (bucketCnt - 1))
// iterator state
it.bucket = it.startBucket
// 修改hmap状态,原子操作
// Remember we have an iterator.
// Can run concurrently with another mapiterinit().
if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator {
atomic.Or8(&h.flags, iterator|oldIterator)
}
mapiternext(it)
}
func mapiternext(it *hiter) {
h := it.h
if raceenabled {
callerpc := getcallerpc()
racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiternext))
}
if h.flags&hashWriting != 0 {
throw("concurrent map iteration and map write")
}
t := it.t
bucket := it.bucket
b := it.bptr
i := it.i
checkBucket := it.checkBucket
next:
if b == nil {
// bucket表示当前的`bucket`索引,wrapped表示是否从头遍历了;
// map一般是从中间`bucket`开始遍历,如果遍历到末尾则wrapped=true,bucket=0,从头开始继续遍历;
// 所以下面的判断条件如果为真,就是已经遍历结束了;
if bucket == it.startBucket && it.wrapped {
// end of iteration
it.key = nil
it.elem = nil
return
}
// 遍历之后,h.B可能继续变大
if h.growing() && it.B == h.B {
// Iterator was started in the middle of a grow, and the grow isn't done yet.
// If the bucket we're looking at hasn't been filled in yet (i.e. the old
// bucket hasn't been evacuated) then we need to iterate through the old
// bucket and only return the ones that will be migrated to this bucket.
oldbucket := bucket & it.h.oldbucketmask()
b = (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
if !evacuated(b) {
checkBucket = bucket
} else {
b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
checkBucket = noCheck
}
} else {
b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))
checkBucket = noCheck
}
bucket++
// bucket遍历到末尾后,从头开始继续遍历
if bucket == bucketShift(it.B) {
bucket = 0
it.wrapped = true
}
i = 0
}
for ; i < bucketCnt; i++ {
offi := (i + it.offset) & (bucketCnt - 1)
if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty {
// TODO: emptyRest is hard to use here, as we start iterating
// in the middle of a bucket. It's feasible, just tricky.
continue
}
k := add(unsafe.Pointer(b), dataOffset+uintptr(offi)*uintptr(t.keysize))
if t.indirectkey() {
k = *((*unsafe.Pointer)(k))
}
e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+uintptr(offi)*uintptr(t.elemsize))
if checkBucket != noCheck && !h.sameSizeGrow() {
// Special case: iterator was started during a grow to a larger size
// and the grow is not done yet. We're working on a bucket whose
// oldbucket has not been evacuated yet. Or at least, it wasn't
// evacuated when we started the bucket. So we're iterating
// through the oldbucket, skipping any keys that will go
// to the other new bucket (each oldbucket expands to two
// buckets during a grow).
if t.reflexivekey() || t.key.equal(k, k) {
// If the item in the oldbucket is not destined for
// the current new bucket in the iteration, skip it.
hash := t.hasher(k, uintptr(h.hash0))
// 跳过不会迁移到当前bucket的kv
if hash&bucketMask(it.B) != checkBucket {
continue
}
} else {
// Hash isn't repeatable if k != k (NaNs). We need a
// repeatable and randomish choice of which direction
// to send NaNs during evacuation. We'll use the low
// bit of tophash to decide which way NaNs go.
// NOTE: this case is why we need two evacuate tophash
// values, evacuatedX and evacuatedY, that differ in
// their low bit.
if checkBucket>>(it.B-1) != uintptr(b.tophash[offi]&1) {
continue
}
}
}
if (b.tophash[offi] != evacuatedX && b.tophash[offi] != evacuatedY) ||
!(t.reflexivekey() || t.key.equal(k, k)) {
// This is the golden data, we can return it.
// OR
// key!=key, so the entry can't be deleted or updated, so we can just return it.
// That's lucky for us because when key!=key we can't look it up successfully.
it.key = k
if t.indirectelem() {
e = *((*unsafe.Pointer)(e))
}
it.elem = e
} else {
// The hash table has grown since the iterator was started.
// The golden data for this key is now somewhere else.
// Check the current hash table for the data.
// This code handles the case where the key
// has been deleted, updated, or deleted and reinserted.
// NOTE: we need to regrab the key as it has potentially been
// updated to an equal() but not identical key (e.g. +0.0 vs -0.0).
rk, re := mapaccessK(t, h, k)
if rk == nil {
continue // key has been deleted
}
it.key = rk
it.elem = re
}
it.bucket = bucket
if it.bptr != b { // avoid unnecessary write barrier; see issue 14921
it.bptr = b
}
it.i = i + 1
it.checkBucket = checkBucket
return
}
b = b.overflow(t)
i = 0
goto next
}
// returns both key and elem. Used by map iterator
func mapaccessK(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, unsafe.Pointer) {
if h == nil || h.count == 0 {
return nil, nil
}
hash := t.hasher(key, uintptr(h.hash0))
m := bucketMask(h.B)
b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))
if c := h.oldbuckets; c != nil {
if !h.sameSizeGrow() {
// There used to be half as many buckets; mask down one more power of two.
m >>= 1
}
oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))
if !evacuated(oldb) {
b = oldb
}
}
top := tophash(hash)
bucketloop:
for ; b != nil; b = b.overflow(t) {
for i := uintptr(0); i < bucketCnt; i++ {
if b.tophash[i] != top {
if b.tophash[i] == emptyRest {
break bucketloop
}
continue
}
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
if t.indirectkey() {
k = *((*unsafe.Pointer)(k))
}
if t.key.equal(key, k) {
e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
if t.indirectelem() {
e = *((*unsafe.Pointer)(e))
}
return k, e
}
}
}
return nil, nil
}
总结:
map
遍历首先会初始化迭代器hiter
,然后调用遍历函数mapiternext
;map
遍历的起始bucket和起始cell都是随机的;- 如果
map
遍历前,map
进入一个growing过程,则map
遍历效果等效于该growing全部结束后的的效果;也就是说,一个新bucket,可能还没有迁移进数据,但是map
可以正常返回未来会迁移进入该bucket的数据;
4. 其他
如何获取调用的具体
map
函数准备代码
package main import ( "fmt" ) func main() { m1 := make(map[string]string, 9) fmt.Println(m1) for i := 0; i < 20; i++ { str := fmt.Sprintf("%d", i) m1[str] = str } a := m1["0"] b, ok := m1["0"] fmt.Println(a,b,ok) }
打印汇编代码命令
go tool compile -N -l -S main.go > main.txt
- 根据汇编代码,查找调用函数
CALL runtime.makemap(SB)