Using golang pprof to debug CPU, heap, GC, and escape analysis

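Tuning a golang service generally starts from CPU, heap, and GC.
CPU analysis samples the running program to show how much time each function consumes; the unit is ms.
Heap analysis samples the program's allocations to show how much memory each function consumes; the unit is MB.
GC analysis records the state of each collection: the number of collections, the heap size before and after each collection, the live heap size, and so on.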

Code

package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// JR is the sample record stored in the map that the program profiles.
type JR struct {
	Name string          `json:"name"`
	Data json.RawMessage `json:"data"`
}

func main() {
	var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to `file`")
	var memprofile = flag.String("memprofile", "", "write memory profile to `file`")
	flag.Parse()
	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal("could not create CPU profile: ", err)
		}
		defer f.Close()
		if err := pprof.StartCPUProfile(f); err != nil {
			log.Fatal("could not start CPU profile: ", err)
		}
		defer pprof.StopCPUProfile()
	}

	// Workload to profile: fill a map with 100000 small records.
	var cm = make(map[string]JR)
	var tmp string
	for i := 0; i < 100000; i++ {
		tmp = fmt.Sprintf("user_%d", i)
		cm[tmp] = JR{Name: tmp, Data: []byte(`{"kk":9}`)}
	}
	fmt.Println(len(cm))

	if *memprofile != "" {
		f, err := os.Create(*memprofile)
		if err != nil {
			log.Fatal("could not create memory profile: ", err)
		}
		defer f.Close()
		runtime.GC() // get up-to-date statistics
		if err := pprof.WriteHeapProfile(f); err != nil {
			log.Fatal("could not write memory profile: ", err)
		}
	}
}

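1. GC

Run
GODEBUG=gctrace=1 go run main.go 2>xx.log
The runtime writes the gctrace output to standard error, so 2>xx.log redirects it into xx.log in the current directory; without the redirect (GODEBUG=gctrace=1 go run main.go) it is printed to the console.
The go command and the compile and link steps it starts are themselves Go programs that inherit GODEBUG, so the log below contains their GC traces as well as those of the program itself.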

gc 1 @0.802s 0%: 0+0+0 ms clock, 0+0/0/0+0 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
gc 2 @2.229s 0%: 0+0.97+0 ms clock, 0+0/0.97/1.9+0 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
gc 3 @3.462s 0%: 0+0+0.99 ms clock, 0+0/0/0+3.9 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
gc 4 @4.528s 0%: 0+1.0+0 ms clock, 0+0/0/0+0 ms cpu, 4->4->1 MB, 5 MB goal, 4 P
gc 5 @5.179s 0%: 0+0+0 ms clock, 0+0/0/0+0 ms cpu, 4->4->1 MB, 5 MB goal, 4 P
gc 6 @5.190s 0%: 0+1.0+1.0 ms clock, 0+1.0/1.0/2.0+4.0 ms cpu, 4->4->1 MB, 5 MB goal, 4 P
gc 7 @5.268s 0%: 0+0+0 ms clock, 0+0/0/0+0 ms cpu, 4->4->1 MB, 5 MB goal, 4 P
gc 8 @5.314s 0%: 0+0+0 ms clock, 0+0/0/0+0 ms cpu, 4->4->1 MB, 5 MB goal, 4 P
gc 9 @5.440s 0%: 0+0.87+0 ms clock, 0+0/0/0+0 ms cpu, 4->4->1 MB, 5 MB goal, 4 P
# command-line-arguments
gc 1 @0.005s 0%: 0+1.9+1.0 ms clock, 0+0.99/0.99/1.9+4.1 ms cpu, 4->5->4 MB, 5 MB goal, 4 P
gc 2 @0.010s 0%: 0+2.0+0 ms clock, 0+0.99/0/3.0+0 ms cpu, 6->6->4 MB, 8 MB goal, 4 P
# command-line-arguments
gc 1 @0.002s 0%: 0+1.9+0 ms clock, 0+0/1.9/0+0 ms cpu, 4->5->4 MB, 5 MB goal, 4 P
gc 2 @0.010s 0%: 0+2.9+0 ms clock, 0+0/1.9/0+0 ms cpu, 7->8->7 MB, 9 MB goal, 4 P
gc 3 @0.016s 0%: 0+5.0+0 ms clock, 0+0/5.0/0+0 ms cpu, 13->14->13 MB, 15 MB goal, 4 P
gc 4 @0.099s 0%: 0+5.8+0 ms clock, 0+5.8/4.8/0+0 ms cpu, 26->26->24 MB, 27 MB goal, 4 P
gc 5 @0.297s 0%: 0+13+0 ms clock, 0+1.9/13/6.9+0 ms cpu, 46->48->42 MB, 48 MB goal, 4 P
gc 1 @0.006s 0%: 0+1.9+0 ms clock, 0+0/1.9/0+0 ms cpu, 5->5->3 MB, 6 MB goal, 4 P
gc 2 @0.012s 0%: 0+1.9+1.0 ms clock, 0+0/0.99/2.9+4.0 ms cpu, 8->8->6 MB, 9 MB goal, 4 P
gc 3 @0.023s 0%: 0+6.9+0 ms clock, 0+0/6.9/0+0 ms cpu, 16->16->13 MB, 17 MB goal, 4 P

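Explanation:

gc 1 @0.038s 1%: 0.55+0.12+0.081 ms clock, 2.2+0/0.42/1.1+0.32 ms cpu, 4->4->0 MB, 5 MB goal, 4 P

1: the GC cycle number; this is the first collection.

@0.038s: time elapsed since the program started.

1%: percentage of the total run time spent in GC since the program started.

0.55+0.12+0.081 ms clock: wall-clock time of the GC phases: STW (stop-the-world) sweep termination, concurrent mark and scan, and STW mark termination.

2.2+0/0.42/1.1+0.32 ms cpu: CPU time of the same phases; the middle phase is split into assist, background, and idle GC time.

4->4->0 MB: heap size when the GC started, heap size when it ended, and live heap size.

5 MB goal: the target heap size.

4 P: number of processors used.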
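
The field layout described above can be checked mechanically. The following is a minimal sketch, not part of the original program: the gcTrace type and parseGCTrace function are hypothetical names, and the regular expression assumes the gctrace line format shown in this article (newer Go versions add fields between the goal and the P count, which the pattern tolerates).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// gcTrace holds the gctrace fields explained above (hypothetical type).
type gcTrace struct {
	Num         int     // GC cycle number
	ElapsedSec  float64 // seconds since program start
	GCPercent   int     // share of run time spent in GC
	HeapStartMB int     // heap size when the GC started
	HeapEndMB   int     // heap size when the GC ended
	HeapLiveMB  int     // live heap size
	GoalMB      int     // target heap size
	Procs       int     // number of Ps used
}

// gcLine matches one gctrace line; the clock and cpu timings are skipped.
var gcLine = regexp.MustCompile(
	`^gc (\d+) @([\d.]+)s (\d+)%: .* (\d+)->(\d+)->(\d+) MB, (\d+) MB goal,.* (\d+) P`)

// parseGCTrace extracts the numeric fields from a single gctrace line.
func parseGCTrace(line string) (gcTrace, error) {
	m := gcLine.FindStringSubmatch(line)
	if m == nil {
		return gcTrace{}, fmt.Errorf("not a gctrace line: %q", line)
	}
	elapsed, err := strconv.ParseFloat(m[2], 64)
	if err != nil {
		return gcTrace{}, err
	}
	// Convert the integer capture groups in order.
	ints := make([]int, 0, 7)
	for _, idx := range []int{1, 3, 4, 5, 6, 7, 8} {
		n, err := strconv.Atoi(m[idx])
		if err != nil {
			return gcTrace{}, err
		}
		ints = append(ints, n)
	}
	return gcTrace{
		Num: ints[0], ElapsedSec: elapsed, GCPercent: ints[1],
		HeapStartMB: ints[2], HeapEndMB: ints[3], HeapLiveMB: ints[4],
		GoalMB: ints[5], Procs: ints[6],
	}, nil
}

func main() {
	line := "gc 1 @0.038s 1%: 0.55+0.12+0.081 ms clock, 2.2+0/0.42/1.1+0.32 ms cpu, 4->4->0 MB, 5 MB goal, 4 P"
	tr, err := parseGCTrace(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", tr)
}
```

Feeding every line of xx.log through a parser like this makes it easy to skip the non-trace lines (such as "# command-line-arguments") and to plot the live heap over time.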

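2. CPU analysis

Run
go run main.go --cpuprofile=cpu.prof
This creates cpu.prof in the current directory. Then run
go tool pprof cpu.prof
to enter the interactive CPU analysis mode. If a binary is passed as the first argument it should be the compiled executable, not main.go; profiles written by recent Go versions carry their own symbol information, so the binary can be omitted.
top10 lists the 10 most expensive entries
web renders the call graph in the browser (requires Graphviz)
The other commands are not covered here.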

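      flat  flat%   sum%        cum   cum%
      10ms   100%   100%       10ms   100%  runtime.cgocall
      ....
flat: time spent in the function itself, not counting the functions it calls.
cum: cumulative time, covering the function itself plus the whole call stack below it.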
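
To make the flat/cum distinction concrete, here is a small sketch under stated assumptions: leaf, parent, and profileParent are hypothetical names, and the profile is written to an in-memory buffer instead of a file. In a profile of this program, leaf would be expected to carry most of the flat time, while parent would show little flat time but a cum value that includes leaf.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// leaf does the actual arithmetic, so CPU samples taken here count
// toward its own flat time.
//
//go:noinline
func leaf(n int) int {
	s := 0
	for i := 0; i < n; i++ {
		s += i % 7
	}
	return s
}

// parent does almost no work itself; its cum time includes every
// call to leaf.
//
//go:noinline
func parent(rounds, n int) int {
	total := 0
	for r := 0; r < rounds; r++ {
		total += leaf(n)
	}
	return total
}

// profileParent runs parent under the CPU profiler and returns the raw
// profile bytes together with the computed result.
func profileParent(rounds, n int) ([]byte, int, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, 0, err
	}
	result := parent(rounds, n)
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), result, nil
}

func main() {
	data, result, err := profileParent(2000, 1000000)
	if err != nil {
		fmt.Println("could not profile:", err)
		return
	}
	fmt.Println("result:", result, "profile bytes:", len(data))
}
```

Writing the returned bytes to a file and opening it with go tool pprof gives the same top10 view as above.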

3. Heap analysis

Run
go run main.go --memprofile=mem.prof
go tool pprof mem.prof
top10
web

Everything works the same way as for CPU; only the unit changes from ms to MB.
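
As a cross-check on what the heap profile reports, the live heap can also be read directly from the runtime. This is a rough sketch, not the article's program: buildUsers, liveHeapBytes, and heapGrowth are hypothetical helpers, the map stores plain byte slices instead of JR values, and the profile goes to an in-memory buffer.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// buildUsers mirrors the loop in the article: n map entries, each with
// a small JSON payload.
func buildUsers(n int) map[string][]byte {
	m := make(map[string][]byte, n)
	for i := 0; i < n; i++ {
		m[fmt.Sprintf("user_%d", i)] = []byte(`{"kk":9}`)
	}
	return m
}

// liveHeapBytes forces a GC and reports the bytes of heap objects that
// are still allocated afterwards.
func liveHeapBytes() uint64 {
	runtime.GC()
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.HeapAlloc
}

// heapGrowth reports the live heap before and after building n entries,
// plus the size of a heap profile taken while the map is still alive.
func heapGrowth(n int) (before, after uint64, profileLen int, err error) {
	before = liveHeapBytes()
	users := buildUsers(n)
	after = liveHeapBytes()
	var buf bytes.Buffer
	err = pprof.WriteHeapProfile(&buf)
	runtime.KeepAlive(users) // keep the map reachable until here
	return before, after, buf.Len(), err
}

func main() {
	before, after, size, err := heapGrowth(100000)
	if err != nil {
		fmt.Println("could not write heap profile:", err)
		return
	}
	fmt.Printf("live heap before: %d bytes, after: %d bytes, profile: %d bytes\n",
		before, after, size)
}
```

The difference between the two readings should be in the same range as the in-use space that top10 attributes to the map-building code.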

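4. Escape analysis

Run
go build -gcflags '-m -l' main.go or go build -gcflags '-m -m' main.go
-m prints the compiler's optimization decisions, including escape analysis, and -m -m prints them in more detail. The first form also disables inlining.
go build -gcflags '-m -l' main.go: one -l disables inlining.
go build -gcflags '-m -l -l' main.go: two -l request a more aggressive inlining level than the default.
go build -gcflags '-m -l -l -l' main.go: three -l request even more aggressive inlining; the binary gets larger, and the result is unstable and may have bugs.
Four is not discussed here. The recommendation is zero (the default) or two (aggressive inlining). The levels above one are experimental compiler settings, so their effect depends on the Go version.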

$ go build -gcflags '-m -l -l' main.go
# command-line-arguments
.\main.go:25:14: "could not create CPU profile: " escapes to heap
.\main.go:25:14: err escapes to heap
.\main.go:27:34: f escapes to heap
.\main.go:28:14: "could not start CPU profile: " escapes to heap
.\main.go:28:14: err escapes to heap
.\main.go:37:21: i escapes to heap
.\main.go:38:39: ([]byte)("{\"kk\":9}") escapes to heap
.\main.go:40:17: len(cm) escapes to heap
.\main.go:45:14: "could not create memory profile: " escapes to heap
.\main.go:45:14: err escapes to heap
.\main.go:48:35: f escapes to heap
.\main.go:49:14: "could not write memory profile: " escapes to heap
.\main.go:49:14: err escapes to heap
.\main.go:25:13: main ... argument does not escape
.\main.go:28:13: main ... argument does not escape
.\main.go:34:15: main make(map[string]JR) does not escape
.\main.go:37:20: main ... argument does not escape
.\main.go:40:13: main ... argument does not escape
.\main.go:45:13: main ... argument does not escape
.\main.go:49:13: main ... argument does not escape
<autogenerated>:1: os.(*File).close .this does not escape
<autogenerated>:1: os.(*File).isdir .this does not escape

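A few conclusions first:
1. String literals (for example var a = "") escape when they are passed to a function that takes interface{} arguments, as the log.Fatal calls in the output above show.
2. Arguments passed to log and fmt functions escape, and a function that returns a pointer to one of its local variables makes that variable escape.
3. Passing a pointer down into a function does not by itself make the variable escape, as long as the callee does not store or return it.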
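
Conclusions 2 and 3 can be observed with a short experiment. The sketch below is an illustration under assumptions: point, newPoint, sum, heapAllocs, and stackAllocs are hypothetical names, inlining is disabled with //go:noinline so the calls stay visible to escape analysis, and testing.AllocsPerRun is used only as a convenient allocation counter. Building it with go build -gcflags '-m' would typically report p in newPoint as moved to heap and the parameter of sum as not escaping.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// newPoint returns a pointer to a local variable, so p has to outlive
// the call and is allocated on the heap.
//
//go:noinline
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

// sum only reads through the pointer it receives; it neither stores nor
// returns it, so the caller's variable can stay on the stack.
//
//go:noinline
func sum(p *point) int {
	return p.x + p.y
}

// heapAllocs measures the average allocations per call when the pointer
// is returned from a function.
func heapAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		_ = sum(newPoint(1, 2))
	})
}

// stackAllocs measures the average allocations per call when a pointer
// to a local variable is only passed down.
func stackAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		p := point{1, 2}
		_ = sum(&p)
	})
}

func main() {
	fmt.Println("allocations per call, pointer returned:", heapAllocs())
	fmt.Println("allocations per call, pointer passed down:", stackAllocs())
}
```

The first figure should be at least one allocation per call and the second zero, which is the difference the GC ends up paying for.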

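The consequence of escaping is that objects are allocated on the heap and live longer, the number of live objects stays high, GC runs more often, and stop-the-world pauses get longer.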

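To sum up, CPU, heap, GC, and escape analysis are the directions we need to optimize.
If a bottleneck remains after these code-level optimizations, there are some other tuning options.
Linux
cat /proc/cpuinfo | grep "processor" | wc -l shows the number of logical cores; set the worker count of proxies such as nginx or haproxy to that number.
ps aux|head -1;ps aux|grep -v PID|sort -rn -k +3|head lists the processes with the highest CPU usage (column 3 of ps aux is %CPU; sort on column 4 for memory). Stop the ones that are not needed.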
