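Basic concepts
Tri-color marking and write barriers
- Initially, all objects are white.
- Scan the objects reachable from the roots, mark them grey, and put them on the work queue.
- Take a grey object off the queue, mark the objects it references grey and enqueue them, then mark the object itself black.
- The write barrier watches for modifications to object memory and re-colors the affected objects or puts them back on the queue.
Once all scanning and marking is done, only white and black objects remain, representing garbage to be reclaimed and live objects respectively; the sweep only has to reclaim the memory of the white objects.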
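The marking loop described above can be sketched as a small single-threaded simulation. This is only an illustration: the `object` type, the `mark` and `sweep` functions, and the slice used as a work queue are all hypothetical names for this sketch, and it leaves out the write barrier because nothing mutates the graph while it runs.

```go
package main

import "fmt"

type color int

const (
	white color = iota // not reached yet; garbage if still white at the end
	grey               // reached, but its references are not scanned yet
	black              // reached and fully scanned
)

// object is a hypothetical heap object used only in this sketch.
type object struct {
	name  string
	color color
	refs  []*object
}

// mark runs tri-color marking starting from the given roots.
func mark(roots []*object) {
	var queue []*object
	// shade turns a white object grey and queues it for scanning.
	shade := func(o *object) {
		if o.color == white {
			o.color = grey
			queue = append(queue, o)
		}
	}
	for _, r := range roots {
		shade(r)
	}
	for len(queue) > 0 {
		o := queue[0]
		queue = queue[1:]
		for _, ref := range o.refs {
			shade(ref)
		}
		o.color = black // all references have been shaded
	}
}

// sweep returns the names of the objects that are still white.
func sweep(heap []*object) []string {
	var freed []string
	for _, o := range heap {
		if o.color == white {
			freed = append(freed, o.name)
		}
	}
	return freed
}

func main() {
	a := &object{name: "a"}
	b := &object{name: "b"}
	c := &object{name: "c"}
	d := &object{name: "d"} // not reachable from the root
	a.refs = []*object{b}
	b.refs = []*object{c, a} // cycle back to a
	d.refs = []*object{c}
	heap := []*object{a, b, c, d}

	mark([]*object{a})
	fmt.Println(sweep(heap)) // [d]
}
```

Note that `d` is reclaimed even though it points at a live object: only being reachable from a root keeps an object alive.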
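Process
A GC cycle can be divided into the following steps:
1. Sweep termination
a. Stop the world (STW, which pauses all user goroutines). This causes all Ps to reach a GC safe point. User code cannot touch memory while the world is stopped.
b. Sweep any spans that are still unswept. Unswept spans only exist if this GC cycle was forced to start earlier than expected.
2. Mark phase
a. Change gcphase from _GCoff to _GCmark, enable the write barrier, enable mutator assists, and enqueue the root marking jobs. No object may be scanned until every P has enabled the write barrier, which is achieved using STW.
b. Start the world. From this point on, GC work is done by the mark workers started by the scheduler and by assists performed as part of allocation. For every pointer write, the write barrier shades (greys) both the overwritten pointer and the new pointer value. Newly allocated objects are immediately marked black.
c. GC performs the root marking jobs: scanning all stacks, shading all globals, and shading any heap pointers held in off-heap runtime data structures. Scanning a stack stops the goroutine, shades every pointer found on its stack, and then resumes the goroutine.
d. GC drains the work queue of grey objects, scanning each grey object to black and shading every pointer found in it (which may in turn add those pointers to the work queue).
e. Because GC work is spread across local caches, GC uses a distributed termination algorithm to detect when there are no more root marking jobs or grey objects. At that point GC transitions to mark termination.
3. Mark termination
a. Stop the world.
b. Set gcphase to _GCmarktermination, and disable workers and assists.
c. Perform housekeeping such as flushing the mcaches.
4. Sweep phase
a. Set gcphase to _GCoff, set up the sweep state, and disable the write barrier.
b. Start the world. From this point on, newly allocated objects are white.
c. GC sweeps concurrently in the background and in response to allocation.
5. Once enough allocation has taken place, the sequence repeats from step 1.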
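To keep track of which state is active in which phase, the cycle can be modelled as a tiny state machine. The `collector` type and its fields are hypothetical and exist only for this sketch; the real runtime keeps this state in several globals. The sketch just records, for each phase, the value of gcphase, whether the world is stopped, and whether the write barrier is on.

```go
package main

import "fmt"

// phase models the three gcphase values named in the steps above.
type phase int

const (
	gcOff phase = iota
	gcMark
	gcMarkTermination
)

var phaseNames = [...]string{"_GCoff", "_GCmark", "_GCmarktermination"}

func (p phase) String() string { return phaseNames[p] }

// collector is a hypothetical model of the state the steps manipulate.
type collector struct {
	phase        phase
	worldStopped bool
	writeBarrier bool
	log          []string
}

func (c *collector) record(step string) {
	c.log = append(c.log, fmt.Sprintf("%s: phase=%v stw=%v wb=%v",
		step, c.phase, c.worldStopped, c.writeBarrier))
}

// cycle walks through one full GC cycle.
func (c *collector) cycle() {
	// Sweep termination: stop the world, finish any leftover sweeping.
	c.worldStopped = true
	c.record("sweep termination")

	// Mark: switch phase and enable the write barrier, then restart the world.
	c.phase, c.writeBarrier = gcMark, true
	c.worldStopped = false
	c.record("mark")

	// Mark termination: stop the world again and switch phase.
	c.worldStopped = true
	c.phase = gcMarkTermination
	c.record("mark termination")

	// Sweep: back to _GCoff, write barrier off, world restarted.
	c.phase, c.writeBarrier = gcOff, false
	c.worldStopped = false
	c.record("sweep")
}

func main() {
	var c collector
	c.cycle()
	for _, line := range c.log {
		fmt.Println(line)
	}
}
```

The model makes one point easy to see: the world is stopped only at the two phase boundaries, while marking and sweeping themselves run concurrently with user code.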
Initialization
mgc.go
gcinit initializes gcPercent and triggerRatio:
func gcinit() {
// No sweep on the first cycle.
mheap_.sweepdone = 1
// Set a reasonable initial GC trigger.
memstats.triggerRatio = 7 / 8.0
// Set gcpercent from the environment. This will also compute
// and set the GC trigger and goal.
_ = setGCPercent(readgogc())
}
func readgogc() int32 {
p := gogetenv("GOGC")
if p == "off" {
return -1
}
if n, ok := atoi32(p); ok {
return n
}
return 100
}
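The decision logic of readgogc can be reproduced in ordinary user code. In the sketch below, `parseGOGC` is a hypothetical helper, and `strconv.ParseInt` stands in for the runtime's internal `atoi32`; the two may differ on edge cases, so treat this as an approximation of the behaviour rather than a copy of it.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseGOGC maps a GOGC string to a GC percentage the way readgogc does:
// "off" disables the collector (-1), a valid integer is used as is,
// and anything else (including the empty string) falls back to 100.
func parseGOGC(s string) int32 {
	if s == "off" {
		return -1
	}
	// Assumption: ParseInt approximates the runtime-internal atoi32.
	if n, err := strconv.ParseInt(s, 10, 32); err == nil {
		return int32(n)
	}
	return 100
}

func main() {
	for _, v := range []string{"", "off", "50", "abc"} {
		fmt.Printf("GOGC=%q -> %d\n", v, parseGOGC(v))
	}
}
```

At run time the same setting can be changed with `debug.SetGCPercent` from `runtime/debug`, which returns the previous value.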
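When allocating heap memory for an object, mallocgc checks the GC trigger condition and, depending on the current state, either starts a collection or takes part in a mutator assist.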
malloc.go
// assistG is the G to charge for this allocation, or nil if
// GC is not currently active.
var assistG *g
if gcBlackenEnabled != 0 { // mutator assist
// Charge the current user G for this allocation.
assistG = getg()
if assistG.m.curg != nil {
assistG = assistG.m.curg
}
// Charge the allocation against the G. We'll account
// for internal fragmentation at the end of mallocgc.
assistG.gcAssistBytes -= int64(size)
if assistG.gcAssistBytes < 0 {
// This G is in debt. Assist the GC to correct
// this before allocating. This must happen
// before disabling preemption.
gcAssistAlloc(assistG)
}
}
// New objects are allocated black directly.
// Allocate black during GC.
// All slots hold nil so no scanning is needed.
// This may be racing with GC so do it atomically if there can be
// a race marking the bit.
if gcphase != _GCoff {
gcmarknewobject(uintptr(x), size, scanSize)
}
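The assist accounting in the excerpt above boils down to a per-goroutine balance that goes into debt as the goroutine allocates. A minimal sketch, assuming a hypothetical `goroutineAccount` type and the simplification that one assist clears the debt exactly (the real gcAssistAlloc does actual marking work and its accounting is more involved):

```go
package main

import "fmt"

// goroutineAccount is a hypothetical stand-in for the per-G assist fields.
type goroutineAccount struct {
	assistBytes int64 // positive: credit, negative: debt
	assists     int   // how many times this goroutine had to assist
}

// assist pretends to do enough marking work to pay off the debt.
// Assumption: the debt is cleared exactly; no extra credit is earned.
func (g *goroutineAccount) assist() {
	g.assists++
	g.assistBytes = 0
}

// alloc charges an allocation of size bytes against the goroutine.
// While marking is inactive nothing is charged.
func (g *goroutineAccount) alloc(size int64, markActive bool) {
	if !markActive {
		return
	}
	g.assistBytes -= size
	if g.assistBytes < 0 {
		// In debt: help the collector before the allocation proceeds.
		g.assist()
	}
}

func main() {
	g := &goroutineAccount{assistBytes: 100}
	g.alloc(64, true) // credit 100 -> 36, no assist
	g.alloc(64, true) // 36 -> -28, assist brings it back to 0
	g.alloc(64, false)
	fmt.Println(g.assistBytes, g.assists) // 0 1
}
```

The effect is back-pressure: a goroutine that allocates quickly during marking is made to contribute marking work in proportion to its allocations.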
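When the current span in the mcache has no free object left, the allocator refills it and returns shouldhelpgc=true. Based on this flag, mallocgc tests the gcTrigger condition and starts a collection if it is met.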
if freeIndex == s.nelems {
// The span is full.
if uintptr(s.allocCount) != s.nelems {
println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
throw("s.allocCount != s.nelems && freeIndex == s.nelems")
}
c.refill(spc)
shouldhelpgc = true
s = c.alloc[spc]
freeIndex = s.nextFreeIndex()
}
The gcTrigger test conditions, from mgc.go, are shown below; heap_live is the total size of live objects.
// test reports whether the trigger condition is satisfied, meaning
// that the exit condition for the _GCoff phase has been met. The exit
// condition should be tested when allocating.
func (t gcTrigger) test() bool {
if !memstats.enablegc || panicking != 0 || gcphase != _GCoff {
return false
}
switch t.kind {
case gcTriggerHeap:
// Non-atomic access to heap_live for performance. If
// we are going to trigger on this, this thread just
// atomically wrote heap_live anyway and we'll see our
// own write.
return memstats.heap_live >= memstats.gc_trigger
case gcTriggerTime:
if gcpercent < 0 {
return false
}
lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
return lastgc != 0 && t.now-lastgc > forcegcperiod
case gcTriggerCycle:
// t.n > work.cycles, but accounting for wraparound.
return int32(t.n-work.cycles) > 0
}
return true
}
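The three trigger kinds can be exercised in isolation with a small model. Everything here (`state`, `trigger`, the constant names) is hypothetical and only mirrors the shape of the test method above. The part worth looking at is the cycle comparison: subtracting two uint32 counters and converting the result to int32 gives the right answer even after the counter wraps around.

```go
package main

import "fmt"

type triggerKind int

const (
	triggerHeap triggerKind = iota
	triggerTime
	triggerCycle
)

// forcePeriod mirrors forcegcperiod: two minutes in nanoseconds.
const forcePeriod int64 = 2 * 60 * 1e9

// state is a hypothetical snapshot of the values the test consults.
type state struct {
	heapLive  uint64
	gcTrigger uint64
	lastGC    int64 // nanoseconds; 0 means no GC has run yet
	cycles    uint32
	gcPercent int32
}

type trigger struct {
	kind triggerKind
	now  int64  // used by triggerTime
	n    uint32 // used by triggerCycle
}

func (t trigger) test(s state) bool {
	switch t.kind {
	case triggerHeap:
		return s.heapLive >= s.gcTrigger
	case triggerTime:
		if s.gcPercent < 0 {
			return false // collector disabled
		}
		return s.lastGC != 0 && t.now-s.lastGC > forcePeriod
	case triggerCycle:
		// t.n > s.cycles, but correct across uint32 wraparound.
		return int32(t.n-s.cycles) > 0
	}
	return true
}

func main() {
	s := state{heapLive: 5 << 20, gcTrigger: 4 << 20, lastGC: 1, cycles: 0xFFFFFFFF, gcPercent: 100}
	fmt.Println(trigger{kind: triggerHeap}.test(s))                     // true
	fmt.Println(trigger{kind: triggerTime, now: 1 + forcePeriod}.test(s)) // false: not strictly greater
	fmt.Println(trigger{kind: triggerCycle, n: 1}.test(s))              // true: 1 is "after" 0xFFFFFFFF
}
```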
gcStart
// gcStart starts the GC. It transitions from _GCoff to _GCmark (if
// debug.gcstoptheworld == 0) or performs all of GC (if
// debug.gcstoptheworld != 0).
//
// This may return without performing this transition in some cases,
// such as when called on a system stack or with locks held.
func gcStart(trigger gcTrigger) {
// Pick up the remaining unswept/not being swept spans concurrently
for trigger.test() && sweepone() != ^uintptr(0) {
sweep.nbgsweep++
}
// gcBgMarkStartWorkers prepares background mark worker goroutines.
gcBgMarkStartWorkers()
// gcResetMarkState resets global state prior to marking (concurrent or STW) and resets the stack scan state of all Gs.
systemstack(gcResetMarkState)
systemstack(stopTheWorldWithSema)
// Enter concurrent mark phase and enable
// write barriers.
systemstack(func() {
finishsweep_m()
})
gcController.startCycle()
setGCPhase(_GCmark)
// gcBgMarkPrepare sets up state for background marking.
gcBgMarkPrepare() // Must happen before assist enable.
// gcMarkRootPrepare queues root scanning jobs (stacks, globals, and
// some miscellany) and initializes scanning-related state.
gcMarkRootPrepare()
// gcMarkTinyAllocs greys all active tiny alloc blocks.
gcMarkTinyAllocs()
// Concurrent mark.
systemstack(func() {
now = startTheWorldWithSema(trace.enabled)
work.pauseNS += now - work.pauseStart
work.tMark = now
})
}
Next the background workers (gcBgMarkWorker) start running.
gcBgMarkStartWorkers prepares the background mark worker goroutines; these goroutines do not run until the mark phase.
func gcBgMarkStartWorkers() {
// Background marking is performed by per-P G's. Ensure that
// each P has a background GC G.
for _, p := range allp {
if p.gcBgMarkWorker == 0 {
go gcBgMarkWorker(p)
notetsleepg(&work.bgMarkReady, -1)
noteclear(&work.bgMarkReady)
}
}
}
In each worker goroutine:
func gcBgMarkWorker(_p_ *p) {
for {
gopark(func(g *g, parkp unsafe.Pointer) bool {
}
systemstack(func() {
// Mark our goroutine preemptible so its stack
// can be scanned. This lets two mark workers
// scan each other (otherwise, they would
// deadlock). We must not modify anything on
// the G stack. However, stack shrinking is
// disabled for mark workers, so it is safe to
// read from the G stack.
casgstatus(gp, _Grunning, _Gwaiting)
switch _p_.gcMarkWorkerMode {
default:
throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
case gcMarkWorkerDedicatedMode:
gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
if gp.preempt {
// We were preempted. This is
// a useful signal to kick
// everything out of the run
// queue so it can run
// somewhere else.
lock(&sched.lock)
for {
gp, _ := runqget(_p_)
if gp == nil {
break
}
globrunqput(gp)
}
unlock(&sched.lock)
}
// Go back to draining, this time
// without preemption.
gcDrain(&_p_.gcw, gcDrainFlushBgCredit)
case gcMarkWorkerFractionalMode:
gcDrain(&_p_.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
case gcMarkWorkerIdleMode:
gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
}
casgstatus(gp, _Gwaiting, _Grunning)
if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
gcMarkDone()
})
}
Here the worker calls gcDrain and then gcMarkDone.
func gcDrain(gcw *gcWork, flags gcDrainFlags) {
// Drain root marking jobs.
if work.markrootNext < work.markrootJobs {
for !(preemptible && gp.preempt) {
job := atomic.Xadd(&work.markrootNext, +1) - 1
if job >= work.markrootJobs {
break
}
markroot(gcw, job)
if check != nil && check() {
goto done
}
}
}
}
// Drain heap marking jobs.
for !(preemptible && gp.preempt) {
// Try to keep work available on the global queue. We used to
// check if there were waiting workers, but it's better to
// just keep work available than to make workers wait. In the
// worst case, we'll do O(log(_WorkbufSize)) unnecessary
// balances.
if work.full == 0 {
gcw.balance()
}
b := gcw.tryGetFast()
if b == 0 {
b = gcw.tryGet()
if b == 0 {
// Flush the write barrier
// buffer; this may create
// more work.
wbBufFlush(nil, 0)
b = gcw.tryGet()
}
}
if b == 0 {
// Unable to get work.
break
}
scanobject(b, gcw)
// Flush background scan work credit to the global
// account if we've accumulated enough locally so
// mutator assists can draw on it.
if gcw.scanWork >= gcCreditSlack {
atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
if flushBgCredit {
gcFlushBgCredit(gcw.scanWork - initScanWork)
initScanWork = 0
}
checkWork -= gcw.scanWork
gcw.scanWork = 0
if checkWork <= 0 {
checkWork += drainCheckThreshold
if check != nil && check() {
break
}
}
}
}
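These loops mark the roots and then the heap, after which marking is complete.
Marking ends in setGCPhase(_GCoff), which also turns off the write barrier; gcSweep(work.mode) is then called to start sweeping.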
func gcMarkDone() {
gcMarkTermination(nextTriggerRatio)
}
func gcMarkTermination(nextTriggerRatio float64) {
systemstack(func() {
// marking is complete so we can turn the write barrier off
setGCPhase(_GCoff)
gcSweep(work.mode)
})
// Prepare workbufs for freeing by the sweeper. We do this
// asynchronously because it can take non-trivial time.
prepareFreeWorkbufs()
// Free stack spans. This must be done between GC cycles.
systemstack(freeStackSpans)
// Ensure all mcaches are flushed. Each P will flush its own
// mcache before allocating, but idle Ps may not. Since this
// is necessary to sweep all spans, we need to ensure all
// mcaches are flushed before we start the next GC cycle.
systemstack(func() {
forEachP(func(_p_ *p) {
_p_.mcache.prepareForSweep()
})
})
}
func gcSweep(mode gcMode) {
if !_ConcurrentSweep || mode == gcForceBlockMode {
// Special case synchronous sweep.
// Record that no proportional sweeping has to happen.
lock(&mheap_.lock)
mheap_.sweepPagesPerByte = 0
unlock(&mheap_.lock)
// Sweep all spans eagerly.
for sweepone() != ^uintptr(0) {
sweep.npausesweep++
}
// Free workbufs eagerly.
prepareFreeWorkbufs()
for freeSomeWbufs(false) {
}
// All "free" events for this mark/sweep cycle have
// now happened, so we can make this profile cycle
// available immediately.
mProf_NextCycle()
mProf_Flush()
return
}
}
Freeing stack spans
// Prepare workbufs for freeing by the sweeper. We do this
// asynchronously because it can take non-trivial time.
prepareFreeWorkbufs()
// Free stack spans. This must be done between GC cycles.
systemstack(freeStackSpans) // freeStackSpans frees unused stack spans at the end of GC.
Time-triggered
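This goroutine forces a call to gcStart when too much time has passed since the last collection. An init function in proc.go starts forcegchelper, which parks and waits. While polling, sysmon checks whether a GC is due; if so, it makes forcegchelper runnable again and executes unlock(&forcegc.lock), so that forcegchelper starts running.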
// forcegcperiod is the maximum time in nanoseconds between garbage
// collections. If we go this long without a garbage collection, one
// is forced to run.
//
// This is a variable for testing purposes. It normally doesn't change.
var forcegcperiod int64 = 2 * 60 * 1e9
// start forcegc helper goroutine
func init() {
go forcegchelper()
}
func forcegchelper() {
forcegc.g = getg()
for {
lock(&forcegc.lock)
if forcegc.idle != 0 {
throw("forcegc: phase error")
}
atomic.Store(&forcegc.idle, 1)
goparkunlock(&forcegc.lock, waitReasonForceGGIdle, traceEvGoBlock, 1)
// this goroutine is explicitly resumed by sysmon
if debug.gctrace > 0 {
println("GC forced")
}
// Time-triggered, fully concurrent.
gcStart(gcTrigger{kind: gcTriggerTime, now: nanotime()})
}
}
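The park-and-wake relationship between forcegchelper and sysmon can be imitated with two channels. This is a deterministic toy, not the runtime mechanism: `helper`, `monitor`, and the injected clock readings are hypothetical, and sending a timestamp on a channel stands in for putting the parked goroutine back on the run queue.

```go
package main

import "fmt"

// forcePeriod mirrors forcegcperiod: two minutes in nanoseconds.
const forcePeriod = int64(2 * 60 * 1e9)

// helper blocks on wake, runs one "collection" per wake-up, then blocks again.
func helper(wake <-chan int64, collected chan<- int64) {
	for now := range wake {
		collected <- now // stands in for gcStart(gcTrigger{kind: gcTriggerTime, ...})
	}
	close(collected)
}

// monitor plays the role of sysmon: for each clock reading it checks whether
// the last collection is older than forcePeriod and, if so, wakes the helper.
// It returns the number of forced collections.
func monitor(ticks []int64, lastGC int64, wake chan<- int64, collected <-chan int64) int {
	forced := 0
	for _, now := range ticks {
		if now-lastGC > forcePeriod {
			wake <- now
			lastGC = <-collected // wait for the collection and record its time
			forced++
		}
	}
	close(wake)
	return forced
}

func main() {
	minute := int64(60e9)
	wake := make(chan int64)
	collected := make(chan int64)
	go helper(wake, collected)

	// Clock readings at 1, 2, 3, 4 and 6 minutes; last collection at time 0.
	ticks := []int64{1 * minute, 2 * minute, 3 * minute, 4 * minute, 6 * minute}
	fmt.Println(monitor(ticks, 0, wake, collected)) // 2
}
```

Collections are forced at the 3-minute and 6-minute readings: in both cases more than two minutes have passed since the previous one.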
The main function in proc.go starts sysmon:
func main() {
systemstack(func() {
newm(sysmon, nil)
})
}
In the sysmon function:
noteclear(&sched.sysmonnote)
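Miscellaneous
How a worker obtains work in gcDrain
gcw.balance() redistributes the work held in the gcWork by moving part of it to the global queue.
It first checks w.wbuf2; if that buffer is not empty, it calls work.full.push(&b.node) to move the buffer to work.full.
Otherwise it checks whether w.wbuf1 holds more than 4 objects, and if so hands half of them off to the global queue.
Only after that does scanobject start.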
func gcDrain(gcw *gcWork, flags gcDrainFlags) {
if work.full == 0 {
gcw.balance()
}
b := gcw.tryGetFast()
if b == 0 {
b = gcw.tryGet()
if b == 0 {
// Flush the write barrier
// buffer; this may create
// more work.
wbBufFlush(nil, 0)
b = gcw.tryGet()
}
}
scanobject(b, gcw)
}
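The balancing rule described above (give away the secondary buffer if it has anything, otherwise split the primary buffer once it holds more than 4 objects) can be modelled with plain slices. The `gcWork` and `globalQueue` types below are simplified, hypothetical stand-ins; the real buffers are fixed-size workbufs on a lock-free list.

```go
package main

import "fmt"

// gcWork is a simplified model of a per-worker pair of work buffers.
type gcWork struct {
	wbuf1, wbuf2 []uintptr
}

// globalQueue is a simplified model of the global list of full buffers.
type globalQueue struct {
	full [][]uintptr
}

// balance moves some local work to the global queue so that other
// workers have something to pick up.
func (w *gcWork) balance(g *globalQueue) {
	if len(w.wbuf2) != 0 {
		// The secondary buffer has work: hand it over whole.
		g.full = append(g.full, w.wbuf2)
		w.wbuf2 = nil
		return
	}
	if n := len(w.wbuf1); n > 4 {
		// Hand off half of the primary buffer, keep the rest.
		half := n / 2
		handoff := append([]uintptr(nil), w.wbuf1[n-half:]...)
		w.wbuf1 = w.wbuf1[:n-half]
		g.full = append(g.full, handoff)
	}
}

func main() {
	g := &globalQueue{}
	w := &gcWork{wbuf1: []uintptr{1, 2, 3, 4, 5, 6}}
	w.balance(g)
	fmt.Println(len(w.wbuf1), len(g.full), len(g.full[0])) // 3 1 3
}
```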
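tryGet first tries to take an object from w.wbuf1. If that buffer is empty, it swaps w.wbuf1 and w.wbuf2 (in a way similar to mcache); if it still finds nothing, it calls work.full.pop() to fetch a buffer from the global queue. work.full is a lock-free data structure implemented with CAS (compare-and-swap) instructions.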
var work struct {
full lfstack // lock-free list of full blocks workbuf
empty lfstack // lock-free list of empty blocks workbuf
// ... remaining fields omitted
}
func (w *gcWork) tryGet() uintptr {
wbuf := w.wbuf1
if wbuf == nil {
w.init()
wbuf = w.wbuf1
// wbuf is empty at this point.
}
if wbuf.nobj == 0 {
w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
wbuf = w.wbuf1
if wbuf.nobj == 0 {
owbuf := wbuf
wbuf = trygetfull()
if wbuf == nil {
return 0
}
putempty(owbuf)
w.wbuf1 = wbuf
}
}
wbuf.nobj--
return wbuf.obj[wbuf.nobj]
}
Memory statistics
Two data structures are involved: mstats and MemStats. Users obtain the statistics by calling runtime.ReadMemStats.
Note that ReadMemStats performs a STW.
func ReadMemStats(m *MemStats) {
stopTheWorld("read mem stats")
systemstack(func() {
readmemstats_m(m)
})
startTheWorld()
}
Call sites of stopTheWorld
debug.go:29: stopTheWorld("GOMAXPROCS")
export_debuglog_test.go:38: stopTheWorld("ResetDebugLog")
export_test.go:244: stopTheWorld("CountPagesInUse")
export_test.go:288: stopTheWorld("ReadMemStatsSlow")
heapdump.go:21: stopTheWorld("write heap dump")
mprof.go:729: stopTheWorld("profile")
mprof.go:782: stopTheWorld("stack trace")
mstats.go:446: stopTheWorld("read mem stats")
proc.go:963:func stopTheWorld(reason string) {
trace.go:185: stopTheWorld("start tracing")
trace.go:276: stopTheWorld("stop tracing")
stopTheWorldWithSema
1. func gcStart(trigger gcTrigger) {}
2. func gcMarkDone() {}