Does Go's tricolor marking scan objects in DFS or BFS order?

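While reading the garbage collector chapter of 左神's new book 《Go 语言设计与实现》, I ran into a question. It took me a while to sort out, so I am writing it down here.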

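Go's garbage collector is built on a mark-sweep algorithm. It abstracts object state into three colors: black (live objects), grey (live objects in an intermediate state), and white (potential garbage, which is also the default state of every object). Note that no object carries an actual field recording its color.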

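The whole marking phase is the process of turning reachable white objects black:
1. First, put the ROOT objects (global variables, objects on goroutine stacks, and so on) into the grey set.
2. Pick one grey object, mark it black, and put every child object it references into the grey set.
3. Repeat step 2 until the grey set is empty.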
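
The three steps above can be sketched on a toy object graph. This is only an illustration under stated assumptions: the `object` type, the explicit `color` values, and the `mark` and `demoColors` helpers are all hypothetical, and the real runtime stores no color field at all.

```go
package main

import "fmt"

// color is the abstract tricolor state. It exists only in this sketch;
// the runtime has no such field.
type color int

const (
	white color = iota // not reached yet: potential garbage
	grey               // reached, references not scanned yet
	black              // reached and fully scanned
)

// object is a hypothetical heap object holding references to other objects.
type object struct {
	name string
	refs []*object
}

// mark applies the three steps to a toy heap and returns each object's final color.
func mark(roots, heap []*object) map[string]color {
	colors := make(map[*object]color, len(heap)) // missing entries read as white
	var greySet []*object

	// Step 1: the roots go into the grey set.
	for _, r := range roots {
		colors[r] = grey
		greySet = append(greySet, r)
	}
	// Steps 2 and 3: blacken one grey object, grey its white children, repeat.
	for len(greySet) > 0 {
		obj := greySet[len(greySet)-1] // which end we take from is an arbitrary choice here
		greySet = greySet[:len(greySet)-1]
		colors[obj] = black
		for _, child := range obj.refs {
			if colors[child] == white {
				colors[child] = grey
				greySet = append(greySet, child)
			}
		}
	}

	result := make(map[string]color, len(heap))
	for _, o := range heap {
		result[o.name] = colors[o]
	}
	return result
}

// demoColors builds a -> b -> c, a -> c, plus an unreachable d -> a,
// and marks starting from the single root a.
func demoColors() map[string]color {
	c := &object{name: "c"}
	b := &object{name: "b", refs: []*object{c}}
	a := &object{name: "a", refs: []*object{b, c}}
	d := &object{name: "d", refs: []*object{a}}
	return mark([]*object{a}, []*object{a, b, c, d})
}

func main() {
	for name, col := range demoColors() {
		fmt.Printf("%s is black: %v\n", name, col == black)
	}
}
```

In this sketch `d` stays white because nothing reachable from the root points to it, even though `d` itself points to a live object.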

The book's illustration of this process looks like a typical depth-first search.

[Figure: illustration from 《Go 语言设计与实现》]

The illustration in 刘丹冰's 《Golang 修养之路》, on the other hand, looks like a typical breadth-first search.

[Figure: illustration from 《Golang 修养之路》]

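What puzzled me is whether the marking process is depth-first or breadth-first, because many articles and blog posts never state it clearly. For a learner this detail does not really affect understanding the overall GC flow, but it is exactly the kind of detail I love to dig into :)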

Working through the book and the source code, I arrived at an answer: depth-first. Below is the rough path through the code, based on Go 1.15.2:

gcStart is the common entry point for the three conditions that trigger a GC in Go:

func gcStart(trigger gcTrigger) {
    ......
    // Start the background mark workers
    gcBgMarkStartWorkers()
    ......
}

Start the background mark workers:

func gcBgMarkStartWorkers() {
    // Background marking is performed by per-P G's. Ensure that
    // each P has a background GC G.
    for _, p := range allp {
        if p.gcBgMarkWorker == 0 {
            // Create a goroutine for each P to run the background mark work
            go gcBgMarkWorker(p)
            ......
        }
    }
}

Create a goroutine for each P to run the background mark work:

func gcBgMarkWorker(_p_ *p) {
    ......
    for {
        // Go to sleep until woken by gcController.findRunnable.
        // We can't releasem yet since even the call to gopark
        // may be preempted.
        // Put the current G to sleep
        gopark(func(g *g, parkp unsafe.Pointer) bool {
            park := (*parkInfo)(parkp)

            // The worker G is no longer running, so it's
            // now safe to allow preemption.
            releasem(park.m.ptr())

            // If the worker isn't attached to its P,
            // attach now. During initialization and after
            // a phase change, the worker may have been
            // running on a different P. As soon as we
            // attach, the owner P may schedule the
            // worker, so this must be done after the G is
            // stopped.
            if park.attach != 0 {
                p := park.attach.ptr()
                park.attach.set(nil)
                // cas the worker because we may be
                // racing with a new worker starting
                // on this P.
                // Store the current G in the P's gcBgMarkWorker field
                if !p.gcBgMarkWorker.cas(0, guintptr(unsafe.Pointer(g))) {
                    // The P got a new worker.
                    // Exit this worker.
                    return false
                }
            }
            return true
        }, unsafe.Pointer(park), waitReasonGCWorkerIdle, traceEvGoBlock, 0)

        ......

        systemstack(func() {
            // Mark our goroutine preemptible so its stack
            // can be scanned. This lets two mark workers
            // scan each other (otherwise, they would
            // deadlock). We must not modify anything on
            // the G stack. However, stack shrinking is
            // disabled for mark workers, so it is safe to
            // read from the G stack.
            // Set the G's status to waiting so that its stack can be scanned
            casgstatus(gp, _Grunning, _Gwaiting)
            switch _p_.gcMarkWorkerMode {
            default:
                throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
            case gcMarkWorkerDedicatedMode:
                // In this mode the P is dedicated to marking
                gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
                if gp.preempt {
                    // We were preempted. This is
                    // a useful signal to kick
                    // everything out of the run
                    // queue so it can run
                    // somewhere else.
                    // On preemption, kick every G in the local run queue to the global run queue
                    lock(&sched.lock)
                    for {
                        gp, _ := runqget(_p_)
                        if gp == nil {
                            break
                        }
                        globrunqput(gp)
                    }
                    unlock(&sched.lock)
                }
                // Go back to draining, this time
                // without preemption.
                // Continue marking
                gcDrain(&_p_.gcw, gcDrainFlushBgCredit)
            case gcMarkWorkerFractionalMode:
                // Run marking
                gcDrain(&_p_.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
            case gcMarkWorkerIdleMode:
                // Mark until preempted or until a certain amount of work is done
                gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
            }
            // Restore the G's status to running
            casgstatus(gp, _Gwaiting, _Grunning)
        })
        ......
    }
}

The parked G above is checked for and woken up in the scheduling loop:

func schedule() {
    ......
    // GC is in progress: look for a GC worker g
    if gp == nil && gcBlackenEnabled != 0 {
        gp = gcController.findRunnableGCWorker(_g_.m.p.ptr())
        tryWakeP = tryWakeP || gp != nil
    }
    ......
    // Start running it
    execute(gp, inheritTime)
}

Run the marking:

func gcDrain(gcw *gcWork, flags gcDrainFlags) {
    .......
    // Drain heap marking jobs.
    // Stop if we're preemptible or if someone wants to STW.
    for !(gp.preempt && (preemptible || atomic.Load(&sched.gcwaiting) != 0)) {
        // Try to keep work available on the global queue. We used to
        // check if there were waiting workers, but it's better to
        // just keep work available than to make workers wait. In the
        // worst case, we'll do O(log(_WorkbufSize)) unnecessary
        // balances.
        // Move part of the local work back to the global queue
        if work.full == 0 {
            gcw.balance()
        }

        // Get an object to scan: try the fast path first, then fall back to the slow path
        b := gcw.tryGetFast()
        if b == 0 {
            b = gcw.tryGet()
            if b == 0 {
                // Flush the write barrier
                // buffer; this may create
                // more work.
                wbBufFlush(nil, 0)
                b = gcw.tryGet()
            }
        }
        if b == 0 {
            // Unable to get work.
            break
        }
        // Scan the object we got
        scanobject(b, gcw)
        ......
    }
}

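gcw is owned by each P, so there is no concurrent access to worry about here. It follows the same per-P design as GMP and mcache, which reduces lock contention.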

func (w *gcWork) tryGetFast() uintptr {
    wbuf := w.wbuf1
    if wbuf == nil {
        return 0
    }
    if wbuf.nobj == 0 {
        return 0
    }
    // Take one object from the tail and decrement the count. The key point: the tail.
    wbuf.nobj--
    return wbuf.obj[wbuf.nobj]
}

// slow path
func (w *gcWork) tryGet() uintptr {
    wbuf := w.wbuf1
    if wbuf == nil {
        w.init()
        wbuf = w.wbuf1
        // wbuf is empty at this point.
    }
    // The first buf is empty
    if wbuf.nobj == 0 {
        // Swap the first and second bufs
        w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
        wbuf = w.wbuf1
        // Both are empty
        if wbuf.nobj == 0 {
            owbuf := wbuf
            // Try to get a non-empty buf from the global list
            wbuf = trygetfull()
            // The global list has none either
            if wbuf == nil {
                return 0
            }
            // Return the old empty buf to the global list
            putempty(owbuf)
            w.wbuf1 = wbuf
        }
    }
    // Return the last object in the buf
    wbuf.nobj--
    return wbuf.obj[wbuf.nobj]
}

Try to get a non-empty buf from the global list:

// trygetfull tries to get a full or partially empty workbuffer.
// If one is not immediately available return nil
//go:nowritebarrier
func trygetfull() *workbuf {
    b := (*workbuf)(work.full.pop())
    if b != nil {
        b.checknonempty()
        return b
    }
    return b
}

This is the runtime's own lock-free stack :) I learned something here: a for loop plus atomic operations implement the stack's pop.

// lfstack is the head of a lock-free stack.
func (head *lfstack) pop() unsafe.Pointer {
    for {
        old := atomic.Load64((*uint64)(head))
        if old == 0 {
            return nil
        }
        node := lfstackUnpack(old)
        next := atomic.Load64(&node.next)
        if atomic.Cas64((*uint64)(head), old, next) {
            return unsafe.Pointer(node)
        }
    }
}

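That covers how an object to scan is taken from the grey set. With the object in hand, the next step is scanobject(b, gcw), which has two pieces of logic worth noting: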

func scanobject(b uintptr, gcw *gcWork) {
    // Find the bits for b and the size of the object at b.
    //
    // b is either the beginning of an object, in which case this
    // is the size of the object to scan, or it points to an
    // oblet, in which case we compute the size to scan below.
    // Get the heapBits for b
    hbits := heapBitsForAddr(b)
    // Get the span
    s := spanOfUnchecked(b)
    // Object size for this span
    n := s.elemsize
    if n == 0 {
        throw("scanobject n == 0")
    }
    // Large objects over 128KB are broken into oblets, which are put into the grey set to be scanned later
    if n > maxObletBytes {
        if b == s.base() {
            ......
            // Enqueue the other oblets to scan later.
            // Some oblets may be in b's scalar tail, but
            // these will be marked as "no more pointers",
            // so we'll drop out immediately when we go to
            // scan those.
            for oblet := b + maxObletBytes; oblet < s.base()+s.elemsize; oblet += maxObletBytes {
                if !gcw.putFast(oblet) {
                    gcw.put(oblet)
                }
            }
        }

        // Compute the size of the oblet. Since this object
        // must be a large object, s.base() is the beginning
        // of the object.
        n = s.base() + s.elemsize - b
        if n > maxObletBytes {
            n = maxObletBytes
        }
    }
    // Scan one pointer-sized word at a time
    var i uintptr
    for i = 0; i < n; i += sys.PtrSize {
        // Find bits for this word.
        if i != 0 {
            // Avoid needless hbits.next() on last iteration.
            hbits = hbits.next()
        }
        // Load bits once. See CL 22712 and issue 16973 for discussion.
        bits := hbits.bits()
        // During checkmarking, 1-word objects store the checkmark
        // in the type bit for the one word. The only one-word objects
        // are pointers, or else they'd be merged with other non-pointer
        // data into larger allocations.
        if i != 1*sys.PtrSize && bits&bitScan == 0 {
            break // no more pointers in this object (determined from the bitmap bits)
        }
        if bits&bitPointer == 0 {
            continue // not a pointer
        }

        // Work here is duplicated in scanblock and above.
        // If you make changes here, make changes there too.
        // Load the word at offset i: a potential pointer to another object
        obj := *(*uintptr)(unsafe.Pointer(b + i))

        // At this point we have extracted the next potential pointer.
        // Quickly filter out nil and pointers back to the current object.
        if obj != 0 && obj-b >= n {
            // Test if obj points into the Go heap and, if so,
            // mark the object.
            //
            // Note that it's possible for findObject to
            // fail if obj points to a just-allocated heap
            // object because of a race with growing the
            // heap. In this case, we know the object was
            // just allocated and hence will be marked by
            // allocation itself.
            // Objects allocated during marking are marked black directly (hybrid write barrier).
            // Find the object this pointer refers to and color it.
            if obj, span, objIndex := findObject(obj, b, i); obj != 0 {
                greyobject(obj, b, i, span, gcw, objIndex)
            }
        }
    }
    ......
}
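
The oblet splitting at the top of scanobject can be illustrated with a small standalone sketch. The value of maxObletBytes matches the 128KB mentioned in the comment above; the `oblet` type and the `splitOblets` helper are hypothetical and only reproduce the address arithmetic, not the queuing or the bitmap handling.

```go
package main

import "fmt"

// maxObletBytes is the largest chunk scanned in one go (128KB, as in the excerpt above).
const maxObletBytes = 128 << 10

// oblet is a hypothetical record of one chunk: its start address and the bytes to scan.
type oblet struct {
	start, size uintptr
}

// splitOblets lists the chunks a large object at base with size elemsize is scanned in.
// Each start address corresponds to one b that would be handed to scanobject.
func splitOblets(base, elemsize uintptr) []oblet {
	var out []oblet
	for b := base; b < base+elemsize; b += maxObletBytes {
		// Same computation as in scanobject: bytes left, capped at maxObletBytes.
		n := base + elemsize - b
		if n > maxObletBytes {
			n = maxObletBytes
		}
		out = append(out, oblet{start: b, size: n})
	}
	return out
}

func main() {
	// A 300KB object at a made-up address splits into 128KB + 128KB + 44KB.
	for _, o := range splitOblets(0x1000, 300<<10) {
		fmt.Printf("start=%#x size=%dKB\n", o.start, o.size>>10)
	}
}
```

Because every oblet other than the first is pushed into gcw as a separate work item, several workers can end up scanning different parts of the same large object.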

Find the object by its index and color it:

func greyobject(obj, base, off uintptr, span *mspan, gcw *gcWork, objIndex uintptr) {
    // obj should be start of allocation, and so must be at least pointer-aligned.
    if obj&(sys.PtrSize-1) != 0 {
        throw("greyobject: obj not pointer-aligned")
    }
    mbits := span.markBitsForIndex(objIndex)
    // checkmark verifies that every reachable object was marked correctly; it is used only for debugging
    if useCheckmark {
        if !mbits.isMarked() {
            printlock()
            print("runtime:greyobject: checkmarks finds unexpected unmarked object obj=", hex(obj), "\n")
            print("runtime: found obj at *(", hex(base), "+", hex(off), ")\n")

            // Dump the source (base) object
            gcDumpObject("base", base, off)

            // Dump the object
            gcDumpObject("obj", obj, ^uintptr(0))

            getg().m.traceback = 2
            throw("checkmark found unmarked object")
        }
        hbits := heapBitsForAddr(obj)
        if hbits.isCheckmarked(span.elemsize) {
            return
        }
        hbits.setCheckmarked(span.elemsize)
        if !hbits.isCheckmarked(span.elemsize) {
            throw("setCheckmarked and isCheckmarked disagree")
        }
    } else {
        if debug.gccheckmark > 0 && span.isFree(objIndex) {
            print("runtime: marking free object ", hex(obj), " found at *(", hex(base), "+", hex(off), ")\n")
            gcDumpObject("base", base, off)
            gcDumpObject("obj", obj, ^uintptr(0))
            getg().m.traceback = 2
            throw("marking free object")
        }

        // If marked we have nothing to do.
        if mbits.isMarked() {
            return
        }
        // Set the mark bit: the object is now known to be live
        mbits.setMarked()

        // Mark span.
        arena, pageIdx, pageMask := pageIndexOf(span.base())
        if arena.pageMarks[pageIdx]&pageMask == 0 {
            atomic.Or8(&arena.pageMarks[pageIdx], pageMask)
        }

        // If this is a noscan object, fast-track it to black
        // instead of greying it.
        if span.spanclass.noscan() {
            gcw.bytesMarked += uint64(span.elemsize)
            return
        }
    }

    // Queue the obj for scanning. The PREFETCH(obj) logic has been removed but
    // seems like a nice optimization that can be added back in.
    // There needs to be time between the PREFETCH and the use.
    // Previously we put the obj in an 8 element buffer that is drained at a rate
    // to give the PREFETCH time to do its work.
    // Use of PREFETCHNTA might be more appropriate than PREFETCH
    // Try to put the object into the gcWork's local buffer, or the global queue, to be scanned later
    if !gcw.putFast(obj) {
        gcw.put(obj)
    }
}
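
One point deserves special mention; it took me a long time to figure out. The name greyobject() is confusing: does it grey the object? In fact, an mspan uses the gcmarkBits bitmap to record whether each object has been marked by the GC, so the bitmap only knows two states, marked and unmarked. mbits.setMarked() sets the bit at the object's index in gcmarkBits to 1. Grey is an abstract intermediate state with no dedicated bit: putting an object into gcw is what makes it grey. So greyobject() does two things. It sets the mark bit for the object's slot, which records that the object is live, and then it puts the object's address into the grey set so that the references it holds get scanned later. The object counts as black once it has been taken out of gcw and scanned. The relationship between the slot (the index in the bitmap) and the object (the address queued in gcw) is a little subtle and worth reading twice.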
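
To make the "mark bit plus queue membership" idea concrete, here is a minimal sketch. Everything in it is hypothetical (`spanSketch`, the `refs` map, `colorOf`, `scanNext`); it only models the bookkeeping, assuming one mark bit per object slot and a single work queue.

```go
package main

import "fmt"

// spanSketch is a hypothetical stand-in for a span plus a gcWork queue.
type spanSketch struct {
	markBits []uint8       // one bit per object slot, like gcmarkBits
	queue    []int         // slots waiting to be scanned, like gcw
	refs     map[int][]int // slot -> slots it points to (toy object graph)
}

func newSpanSketch(slots int, refs map[int][]int) *spanSketch {
	return &spanSketch{markBits: make([]uint8, (slots+7)/8), refs: refs}
}

func (s *spanSketch) isMarked(i int) bool { return s.markBits[i/8]&(uint8(1)<<uint(i%8)) != 0 }
func (s *spanSketch) setMarked(i int)     { s.markBits[i/8] |= uint8(1) << uint(i%8) }

// greyobject sets the mark bit and queues the slot; already-marked slots are skipped.
func (s *spanSketch) greyobject(i int) {
	if s.isMarked(i) {
		return
	}
	s.setMarked(i)
	s.queue = append(s.queue, i)
}

// scanNext takes one slot from the tail of the queue and greys everything it points to.
func (s *spanSketch) scanNext() bool {
	if len(s.queue) == 0 {
		return false
	}
	i := s.queue[len(s.queue)-1]
	s.queue = s.queue[:len(s.queue)-1]
	for _, child := range s.refs[i] {
		s.greyobject(child)
	}
	return true
}

// colorOf derives the abstract color: it is never stored anywhere.
func (s *spanSketch) colorOf(i int) string {
	if !s.isMarked(i) {
		return "white"
	}
	for _, q := range s.queue {
		if q == i {
			return "grey" // marked and still waiting in the queue
		}
	}
	return "black" // marked and already scanned
}

func main() {
	s := newSpanSketch(3, map[int][]int{0: {1}})
	s.greyobject(0)
	fmt.Println(s.colorOf(0), s.colorOf(1), s.colorOf(2)) // grey white white
	s.scanNext()
	fmt.Println(s.colorOf(0), s.colorOf(1), s.colorOf(2)) // black grey white
}
```

The color is computed from two independent facts, which is why no dedicated "grey" bit is needed in this model.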

Try to put it into the gcWork's local buffer, or the global queue:

func (w *gcWork) putFast(obj uintptr) bool {
    w.checkPut(obj, nil)

    wbuf := w.wbuf1
    if wbuf == nil {
        return false
    } else if wbuf.nobj == len(wbuf.obj) {
        return false
    }

    // Append at the tail. Note this.
    wbuf.obj[wbuf.nobj] = obj
    wbuf.nobj++
    return true
}
// slow path
func (w *gcWork) put(obj uintptr) {
    w.checkPut(obj, nil)

    flushed := false
    wbuf := w.wbuf1
    // Record that this may acquire the wbufSpans or heap lock to
    // allocate a workbuf.
    lockWithRankMayAcquire(&work.wbufSpans.lock, lockRankWbufSpans)
    lockWithRankMayAcquire(&mheap_.lock, lockRankMheap)
    if wbuf == nil {
        w.init()
        wbuf = w.wbuf1
        // wbuf is empty at this point.
    } else if wbuf.nobj == len(wbuf.obj) {
        w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
        wbuf = w.wbuf1
        if wbuf.nobj == len(wbuf.obj) {
            putfull(wbuf)
            w.flushedWork = true
            wbuf = getempty()
            w.wbuf1 = wbuf
            flushed = true
        }
    }
    // Append at the tail. Note this.
    wbuf.obj[wbuf.nobj] = obj
    wbuf.nobj++

    ......
}

func putfull(b *workbuf) {
    b.checknonempty()
    work.full.push(&b.node)
}
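
The way put and tryGet cooperate on the two local buffers can be sketched in isolation. This is a simplified model, not the runtime's code: `gcWorkSketch`, the tiny `bufCap`, and the plain slice standing in for the global work.full list are all assumptions made for illustration.

```go
package main

import "fmt"

// bufCap is deliberately tiny so that the buffer swap is easy to trigger.
const bufCap = 4

// workbuf mirrors the shape used above: a count plus a fixed array used as a stack.
type workbuf struct {
	nobj int
	obj  [bufCap]uintptr
}

// gcWorkSketch is a hypothetical, single-threaded model of a per-P gcWork.
type gcWorkSketch struct {
	wbuf1, wbuf2 *workbuf
	full         []*workbuf // stands in for the global work.full list
}

func newGCWorkSketch() *gcWorkSketch {
	return &gcWorkSketch{wbuf1: &workbuf{}, wbuf2: &workbuf{}}
}

// put appends at the tail of wbuf1, swapping or flushing buffers when it is full.
func (w *gcWorkSketch) put(obj uintptr) {
	if w.wbuf1.nobj == bufCap {
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		if w.wbuf1.nobj == bufCap {
			w.full = append(w.full, w.wbuf1) // hand a full buffer to the global list
			w.wbuf1 = &workbuf{}
		}
	}
	w.wbuf1.obj[w.wbuf1.nobj] = obj
	w.wbuf1.nobj++
}

// tryGet removes from the tail of wbuf1, refilling from wbuf2 or the global list.
// It returns 0 when no work is left, like the runtime version.
func (w *gcWorkSketch) tryGet() uintptr {
	if w.wbuf1.nobj == 0 {
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		if w.wbuf1.nobj == 0 {
			if len(w.full) == 0 {
				return 0
			}
			w.wbuf1 = w.full[len(w.full)-1]
			w.full = w.full[:len(w.full)-1]
		}
	}
	w.wbuf1.nobj--
	return w.wbuf1.obj[w.wbuf1.nobj]
}

func main() {
	w := newGCWorkSketch()
	for i := uintptr(1); i <= 6; i++ {
		w.put(i)
	}
	for obj := w.tryGet(); obj != 0; obj = w.tryGet() {
		fmt.Print(obj, " ") // 6 5 4 3 2 1
	}
	fmt.Println()
}
```

Within one buffer the order is strictly last in, first out. Once buffers are exchanged with the global list (and, in the real runtime, with other Ps), the overall order is only approximately LIFO.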

The lock-free stack again: a for loop plus atomic operations implement the stack's push.

func (head *lfstack) push(node *lfnode) {
    node.pushcnt++
    new := lfstackPack(node, node.pushcnt)
    if node1 := lfstackUnpack(new); node1 != node {
        print("runtime: lfstack.push invalid packing: node=", node, " cnt=", hex(node.pushcnt), " packed=", hex(new), " -> node=", node1, "\n")
        throw("lfstack.push")
    }
    for {
        old := atomic.Load64((*uint64)(head))
        node.next = old
        if atomic.Cas64((*uint64)(head), old, new) {
            break
        }
    }
}
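
The "for loop plus compare-and-swap" pattern used by lfstack can be tried out in ordinary Go code. The sketch below is a plain Treiber-style stack built on `atomic.Pointer` (Go 1.19 or later) and is not the runtime's implementation: the runtime packs the node address and a push counter into one uint64, whereas this version assumes garbage-collected, freshly allocated nodes. The `stack` and `node` types and the `concurrentPushCount` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// node is one element of the stack; next is only written before the node is published.
type node struct {
	value int
	next  *node
}

// stack is a lock-free LIFO: head always points at the most recently pushed node.
type stack struct {
	head atomic.Pointer[node]
}

// push retries the CAS until the new node has been installed as head.
func (s *stack) push(v int) {
	n := &node{value: v}
	for {
		old := s.head.Load()
		n.next = old
		if s.head.CompareAndSwap(old, n) {
			return
		}
	}
}

// pop retries the CAS until it has detached the current head, or reports an empty stack.
func (s *stack) pop() (int, bool) {
	for {
		old := s.head.Load()
		if old == nil {
			return 0, false
		}
		if s.head.CompareAndSwap(old, old.next) {
			return old.value, true
		}
	}
}

// concurrentPushCount pushes from several goroutines, then counts what can be popped.
func concurrentPushCount(workers, perWorker int) int {
	var s stack
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(base int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				s.push(base + i)
			}
		}(w * perWorker)
	}
	wg.Wait()
	count := 0
	for {
		if _, ok := s.pop(); !ok {
			return count
		}
		count++
	}
}

func main() {
	var s stack
	s.push(1)
	s.push(2)
	s.push(3)
	v, _ := s.pop()
	fmt.Println(v)                            // 3: last in, first out
	fmt.Println(concurrentPushCount(8, 100)) // 800: no push is lost
}
```

A failed CAS simply means another goroutine changed head in the meantime, so the loop reloads head and tries again; no lock is ever held.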

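At this point the grey object has been blackened: the objects it points to have been put into the grey set, and the drain loop moves on to take the next object.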

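Summary:

The whole scan uses a last-in, first-out stack, which plays the role of the call stack in a recursive traversal, and so implements depth-first search. The full GC code is really hard to read, so corrections are welcome if I got anything wrong.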
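
The effect of the work list's discipline on visiting order can be checked on a toy tree. This sketch is not runtime code: the `graph` map, `traverse`, and `orderString` are hypothetical, and objects are marked when they are queued, as greyobject does. On a tree, taking from the tail gives a depth-first order and taking from the head gives a breadth-first order.

```go
package main

import (
	"fmt"
	"strings"
)

// graph is a small tree: A -> B, C; B -> D, E; C -> F, G.
var graph = map[string][]string{
	"A": {"B", "C"},
	"B": {"D", "E"},
	"C": {"F", "G"},
}

// traverse visits everything reachable from root. The only difference between
// the two modes is which end of the work list the next object is taken from.
func traverse(root string, lifo bool) []string {
	marked := map[string]bool{root: true}
	work := []string{root}
	var order []string
	for len(work) > 0 {
		var cur string
		if lifo {
			cur = work[len(work)-1] // tail: stack behaviour
			work = work[:len(work)-1]
		} else {
			cur = work[0] // head: queue behaviour
			work = work[1:]
		}
		order = append(order, cur)
		for _, child := range graph[cur] {
			if !marked[child] {
				marked[child] = true // mark when queued, as greyobject does
				work = append(work, child)
			}
		}
	}
	return order
}

// orderString joins the visiting order for easy comparison.
func orderString(lifo bool) string {
	return strings.Join(traverse("A", lifo), " ")
}

func main() {
	fmt.Println("LIFO:", orderString(true))  // A C G F B E D
	fmt.Println("FIFO:", orderString(false)) // A B C D E F G
}
```

With the tail-first discipline, the whole subtree under C is finished before B is touched, which is the depth-first behaviour described in the summary.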

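Image sources:

Go 语言设计与实现, garbage collector chapter (垃圾收集器)
Golang三色标记+混合写屏障GC模式全分析 (a full analysis of Golang's tricolor marking and hybrid write barrier GC)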
