- 后端早读课翻译计划第四篇-

- 翻译自： a-journey-with-go

欢迎关注微信公众号： 后端早读课

本文详细讲述了 Golang 中，堆栈设计理念以及演变过程。描述了从 Segment Stack 到 Contiguous Stack 、初始堆栈大小从 8Kb 到 2Kb 的原因。

Illustration created for “A Journey With Go”, made from the original Go Gopher, created by Renee French.

ℹ️ 文章基于 Go 1.12.

Go 提供了一个轻量且智能的协程管理机制。轻量是因为协程堆栈初始化只有 2Kb，智能是因为协程堆栈可以根据我们的需要自动增加 / 减少。

堆栈的大小定义，我们可以在这里找到 runtime/stack.go:

// The minimum size of stack used by Go code
_StackMin = 2048

我们需要注意的是，它曾经在以下版本的时间里进行过优化：

Go 1.2: 协程堆栈从 4Kb 增长到 8Kb.
Go 1.4: 协程堆栈从 8Kb 减少到 2Kb.

协程堆栈大小的变化主要是因为堆栈分配策略的变化。在文章后面我们一会儿将会提到这个问题。

默认的堆栈大小有的时候并不能满足我们运行的程序。这时候 Go 就会自动的调整堆栈大小。

动态堆栈大小

如果 Go 可以自动的增长栈空间大小，那么也意味着他可以决定堆栈大小到底有没有必要需要修改。让我们看一个例子，分析一下它是怎么工作的：

func main() {
   a := 1
   b := 2

   r := max(a, b)
   println(`max: `+strconv.Itoa(r))
}

func max(a int, b int) int {
   if a >= b {
      return a
   }

   return b
}

这个例子只是计算了两个数字中最大的一个。为了了解 Go 是如何管理协程堆栈分配的，我们可以看下 Go 的编译流程代码，通过命令: go build -gcflags -S main.go. 输出 —— 我只保留了与堆栈有关的一些行 —— 它给我们一些有趣的信息，这些内容展示了 Go 都做了什么：



"".main STEXT size=186 args=0x0 locals=0x70
   0x0000 00000 (/go/src/main.go:5)    TEXT   "".main(SB), 
   ABIInternal, $112-0
   [...]
   0x00b0 00176 (/go/src/main.go:5) CALL   
   runtime.morestack_noctxt(SB)
[...]
0x0000 00000 (/go/src/main.go:13)    TEXT   "".max(SB), 
NOSPLIT|ABIInternal, $0-24
有两条指令涉及到栈大小的更改：

- CALL runtime.morestack_noctxt: 这个方法会在需要的时候增加堆栈大小。

-NOSPLIT: 这条指令的意味着堆栈不需要溢出检测，他与指令  //go:nosplit .比较相似。

我们看到这个方法：runtime.morestack_noctxt ，他会调用 runtime/stack.go 中的 newstack 方法:


func newstack() {
   [...]
   // Allocate a bigger segment and move the stack.
   oldsize := gp.stack.hi - gp.stack.lo
   newsize := oldsize * 2
   if newsize > maxstacksize {
       print("runtime: goroutine stack exceeds ", maxstacksize, "-byte limit\n")
      throw("stack overflow")
   }

   // The goroutine must be executing in order to call newstack,
   // so it must be Grunning (or Gscanrunning).
   casgstatus(gp, _Grunning, _Gcopystack)

   // The concurrent GC will not scan the stack while we are doing the copy since
   // the gp is in a Gcopystack status.
   copystack(gp, newsize, true)
   if stackDebug >= 1 {
      print("stack grow done\n")
   }
   casgstatus(gp, _Gcopystack, _Grunning)
}

首先根据 gp.stack.hi 和 gp.stack.lo 的边界来计算堆栈的大小，他们是指向堆栈头部和尾部的指针。


type stack struct {
   lo uintptr
   hi uintptr
}

然后堆栈大小被乘以 2 倍，如果它没有达到最大值的话 —— 最大值与系统架构有关。


// Max stack size is 1 GB on 64-bit, 250 MB on 32-bit.
// Using decimal instead of binary GB and MB because
// they look nicer in the stack overflow failure message.
if sys.PtrSize == 8 {
   maxstacksize = 1000000000
} else {
   maxstacksize = 250000000
}

现在我们已经了解了运行机制，我们来写个简单的例子来验证以上的内容。为了 debug，我们需要设置 stackDebug 常量，它在上面 newstack 的方法里会打印一些 debug 信息，运行：


func main() {
   var x [10]int
   a(x)
}

//go:noinline
func a(x [10]int) {
   println(`func a`)
   var y [100]int
   b(y)
}

//go:noinline
func b(x [100]int) {
   println(`func b`)
   var y [1000]int
   c(y)
}

//go:noinline
func c(x [1000]int) {
   println(`func c`)
}

//go:noinline 指令是为了避免编译时把所有的方法都放到一行。如果都放到一行的话，我们将看不到每个方法开始时候的堆栈动态增长。

下面是一部分的 debug 日志：

runtime: newstack sp=0xc00002e6d8 stack=[0xc00002e000, 0xc00002e800]
stack grow done
func a
runtime: newstack sp=0xc000076888 stack=[0xc000076000, 0xc000077000]
stack grow done
runtime: newstack sp=0xc00003f888 stack=[0xc00003e000, 0xc000040000]
stack grow done
runtime: newstack sp=0xc000081888 stack=[0xc00007e000, 0xc000082000]
stack grow done
func b
runtime: newstack sp=0xc0000859f8 stack=[0xc000082000, 0xc00008a000]
func c

我们可以看到堆栈一共有 4 次增长。其实，方法开始会将堆栈增长到它需要的大小。就像我们在代码中看到的，堆栈的边界定义了堆栈的大小，所以我们可以计算每一个新的堆栈的大小 —— newstack stack=[...] 指令提供了当前堆栈边界的指针：

runtime: newstack sp=0xc00002e6d8 stack=[0xc00002e000, 0xc00002e800]
0xc00002e800 - 0xc00002e000 = 2048
runtime: newstack sp=0xc000076888 stack=[0xc000076000, 0xc000077000]
0xc000077000 - 0xc000076000 = 4096
runtime: newstack sp=0xc00003f888 stack=[0xc00003e000, 0xc000040000]
0xc000040000 - 0xc00003e000 = 8192
runtime: newstack sp=0xc000081888 stack=[0xc00007e000, 0xc000082000]
0xc000082000 - 0xc00007e000 = 16384
runtime: newstack sp=0xc0000859f8 stack=[0xc000082000, 0xc00008a000]
0xc00008a000 - 0xc000082000 = 32768

我们可以看到在编译时 Goroutine 的栈空间初始大小为 2Kb ，在函数起始的地方增长到它所需要的大小，直到大小已经满足运行条件或者达到了系统限制。

堆栈分配管理

动态堆栈分配系统并不是唯一影响我们应用原因。不过，堆栈分配方式也可能会对应用产生很大的影响。通过两个完整的日志跟踪让我们试着理解它是如何管理堆栈的。让我们尝试从前两个堆栈增长的跟踪中了解 Go 是如何进行堆栈管理的：

runtime: newstack sp=0xc00002e6d8 stack=[0xc00002e000, 0xc00002e800]
copystack gp=0xc000000300 [0xc00002e000 0xc00002e6e0 0xc00002e800] -> [0xc000076000 0xc000076ee0 0xc000077000]/4096
stackfree 0xc00002e000 2048
stack grow done
runtime: newstack sp=0xc000076888 stack=[0xc000076000, 0xc000077000]
copystack gp=0xc000000300 [0xc000076000 0xc000076890 0xc000077000] -> [0xc00003e000 0xc00003f890 0xc000040000]/8192
stackfree 0xc000076000 4096
stack grow done

第一条指令显示了当前堆栈的地址， stack=[0xc00002e000, 0xc00002e800] ，并把他复制到新的堆栈里，并且是之前的二倍大小， copystack [0xc00002e000 [...] 0xc00002e800] -> [0xc000076000 [...] 0xc000077000] ，4096 字节的长度和我们上面看到的一样。然后之前的堆栈将被释放： stackfree 0xc00002e000 。我们画了个图可以帮助理解上面的逻辑：

Golang stack growth with contiguous stack

copystack 指令复制了整个堆栈，并把所有的地址都移向新的堆栈。我们可以通过一段简短的代码来很容易的发现这个现象：

func main() {
   var x [10]int
   println(&x)
   a(x)
   println(&x)
}

打印出来的地址为

0xc00002e738
[...]
0xc000089f38

地址 0xc00002e738 是被包含在我们之前看到的堆栈地址之中 stack=[0xc00002e000, 0xc00002e800] ，同样的 0xc000089f38 这个地址也是包含在后一个堆栈之中 stack=[0xc000082000, 0xc00008a000] ，这两个 stack 地址是我们上面通过 debug 模式追踪到的。这也证明了确实所有的值都已经从老的堆栈移到了新的堆栈里。

另外，有趣的是，当垃圾回收被触发时，堆栈会缩小（译者注：一点也不 interesting）。

在我们的例子中，在函数调用之后，堆栈中除了主函数外没有其他的有效函数调用，所以在垃圾回收启动的时候，系统会将堆栈进行缩减。为了证明这个问题，我们可以强制进行垃圾回收：

func main() {
   var x [10]int
   println(&x)
   a(x)
   runtime.GC()
   println(&x)
}

Debug 程序会展示出堆栈缩减的日志：

func c
shrinking stack 32768->16384
copystack gp=0xc000000300 [0xc000082000 0xc000089e60 0xc00008a000] -> [0xc00007e000 0xc000081e60 0xc000082000]/16384

正如我们看到的这样，堆栈大小被缩减为原来的一半，并重用了之前的堆栈地址 stack=[0xc00007e000, 0xc000082000] ，同样在 runtime/stack.go — shrinkstack() 中我们可以看到，缩减函数默认就是将当前堆栈大小除以 2：

oldsize := gp.stack.hi - gp.stack.lo
newsize := oldsize / 2

连续堆栈 VS 分段堆栈

将堆栈复制到更大的堆栈空间中的策略称之为连续堆栈（contiguous stack），与分段堆栈（segmented stack）正好相反。Go 在 1.3 版本中迁移到了连续堆栈的策略。为了看看他们的不同，我们可以在 Go 1.2 版本中跑相同的例子看看。同样，我们需要修改 stackDebug 变量来展示 Debug 跟踪信息。为此，由于 Go 1.2 的 runtime 是用 C 语言写的，所以我们只能重新编译源代码.。这里是例子的运行结果：

func a
runtime: newstack framesize=0x3e90 argsize=0x320 sp=0x7f8875953848 stack=[0x7f8875952000, 0x7f8875953fa0]
   -> new stack [0xc21001d000, 0xc210021950]
func b
func c
runtime: oldstack gobuf={pc:0x400cff sp:0x7f8875953858 lr:0x0} cret=0x1 argsize=0x320

当前的堆栈 stack=[0x7f8875952000, 0x7f8875953fa0] 大小是 8Kb （8192 字节 + 堆栈顶部的大小），同时新的堆栈创建大小为 18864 字节 ( 18768 字节 + 堆栈顶部的大小）。（译者注：这里比较难理解
0x7f8875953fa0 - 0x7f8875952000 并不到 8Kb，应该是笔误，应该是 8096 字节）

内存大小分配的逻辑如下：

// allocate new segment.
framesize += argsize;
framesize += StackExtra;   // room for more functions, Stktop.
if(framesize < StackMin)
   framesize = StackMin;
framesize += StackSystem;

其中常量 StackExtra 是 2048 , StackMin 是 8192 ， StackSystem 从 0 到 512 都有可能（译者注：根据平台来判断的）

所以我们新的堆栈包括了：16016 (frame size) + 800 (arguments) + 2048 (StackExtra) + 0 (StackSystem)。

一旦所有的函数都调用完毕，新的堆栈将被释放（log runtime: oldstack)。这个行为是迫使 Golang 团队转移到连续堆栈的原因之一：

当前分段堆栈机制有一个 “热分离（ hot split）”的问题 —— 如果堆栈快满了，那么函数调用会引起一个新的堆栈块被分配。当所有的函>数调用返回时，新的堆栈块也被回收了。如果同样的调用方式密集地重复发生，分配 / 回收将会导致大量的开销。

https://docs.google.com/document/d/1wAaf1rYoM4S4gtnPh0zOlGzWtrZFQ5suE8qr2sD8uWQ/pub

因为这个问题，Go 1.2 将最小堆栈大小增长到了 8Kb。之后因为实现了连续堆栈，则将堆栈大小缩减回了 2Kb。

下图是分段堆栈的演示图：

Golang stack growth with segmented stack

总结

Go 的堆栈管理是非常高效的，而且容易理解。Golang 不是唯一一个没有选择分段堆栈的语言， Rust 语言因为同样的原因而没有选择这个方案。

如果你想了解更深入的堆栈内容，可以阅读 Dave Cheney 的博客文章，该文章讨论了 redzone ，还有 Bill Kennedy 的文章解释了堆栈中的 frames。

阅读原文：https://medium.com/a-journey-with-go/go-how-does-the-goroutine-stack-size-evolve-447fc02085e5

扩展阅读：

contiguous stack

contiguous stack in Go 1.3

StackExtra

StackMin

StackSystem

talks about the redzone

frames in the stack

欢迎关注微信公众号： 后端早读课

Go 协程堆栈设计进化之旅

动态堆栈大小

堆栈分配管理

连续堆栈 VS 分段堆栈

总结

扩展阅读：

你可能感兴趣的:(Go 协程堆栈设计进化之旅)