关于go性能监控与分析pprof

环境说明：

机器：MacBook Pro
内存：16G
CPU: 2.3 GHz 双核Intel Core i5
Go版本：go1.13.5

一.准备

测试代码：文件名：simpletrie.go

package simpletrie

type Node struct { // 节点
 Char     rune
 Children []*Node
}

func NewNode(r rune) *Node { // 新建节点
 return &Node{Char: r}
}

func (n *Node) insert(r rune) *Node { // 添加子node
 child := n.get(r)
 if child == nil {
     child = NewNode(r)
     n.Children = append(n.Children, child)
 }

 return child
}

func (n *Node) get(r rune) *Node { // 获取指定节点node
 for _, child := range n.Children {
     if child.Char == r {
         return child
     }
 }
 return nil
}

type Trie struct { // 树
 Root *Node
}

func NewTrie() *Trie { // 新建树
 var r rune
 trie := Trie{Root: NewNode(r)}
 return &trie
}

func (tr *Trie) Build(word string) { // 根据指定的内容构建tree
 node := tr.Root
 runeArr := []rune(word)
 for _, char := range runeArr {
     child := node.insert(char)
     node = child
 }
}

func (tr *Trie) Has(word string) bool { // 是否包括指定的内容在tree中
 node := tr.Root
 runeArr := []rune(word)
 for _, char := range runeArr {
     found := node.get(char)
     if found == nil {
         return false
     }
     node = found
 }
 return true
}

测试用例: 需要注意当前测试文件名：simpletrie_cpu_test.go,本测试用例主要用于cpu相关测试。

package simpletrie

import (
 "bufio"
 "log"
 "os"
 "runtime/pprof"
 "testing"
)

func BenchmarkTrieFind(b *testing.B) {
 cpuProfile, _ := os.Create("cpu_profile")
 pprof.StartCPUProfile(cpuProfile)
 defer pprof.StopCPUProfile()

 var trie1 = NewTrie()
 file, err := os.Open("./20k.txt")
 if err != nil {
     log.Fatal(err)
 }
 defer file.Close()

 scanner := bufio.NewScanner(file)

 for scanner.Scan() {
     trie1.Build(scanner.Text())
 }

 b.ResetTimer()

 for i := 0; i < b.N; i++ {
     trie1.Has("42082")
     trie1.Has("oops")
     trie1.Has("Supercalifragilisticexpialidocious")
 }
}

测试文件： 20k.txt

安装Graphviz

在mac上直接通过brew install graphviz
开始进行测试：

首先切换simpletrie_cpu_test.go所在的目录下，执行如下代码：
```
go test -test.bench=".*"
```

接着会在当前目录下生成cpu_profile

二.pprof 部分

cpu_profile

# 引用
"runtime/pprof"

# 代码
cpuProfile, _ := os.Create("cpu_profile")
pprof.StartCPUProfile(cpuProfile)
defer pprof.StopCPUProfile()

(1) 进入pprof

go tool pprof cpu_profile

输出如下内容：需要说明一下若是在多核环境，取样时间(Total samples)占比>100%属于正常现象。

Type: cpu    # profile 类型
Time: Dec 27, 2019 at 4:19pm (CST) # 开始时间
Duration: 1.11s, Total samples = 980ms (88.22%) # 分析数据抽样概况
Entering interactive mode (type "help" for commands, "o" for options)

(2) 了解pprof提供的功能

(pprof) help # 查看对应的交互操作命令

效果如下：

pprof-help

(3) 将cpu_profile输出svg，查看每个方法调用消耗的时间：方块越大代表消耗越大。

cpu-profile

(4) 使用top查看最高消耗的运行点：默认是10条，可以使用topN来输出指定最高的N条信息

Showing nodes accounting for 920ms, 93.88% of 980ms total   # 抽样数据消耗总占比
Showing top 10 nodes out of 52                              # 输出最高消耗的前10条
      flat  flat%   sum%        cum   cum%   
     230ms 23.47% 23.47%      450ms 45.92%  runtime.stringtoslicerune
     220ms 22.45% 45.92%      220ms 22.45%  go-demos/simpletrie.(*Node).get
      80ms  8.16% 54.08%      750ms 76.53%  go-demos/simpletrie.(*Trie).Has
      80ms  8.16% 62.24%       80ms  8.16%  runtime.madvise
      70ms  7.14% 69.39%       70ms  7.14%  runtime.procyield
      70ms  7.14% 76.53%       70ms  7.14%  runtime.pthread_cond_wait
      60ms  6.12% 82.65%       60ms  6.12%  runtime.pthread_cond_signal
      50ms  5.10% 87.76%      200ms 20.41%  runtime.rawruneslice
      30ms  3.06% 90.82%      140ms 14.29%  runtime.mallocgc
      30ms  3.06% 93.88%       30ms  3.06%  runtime.nextFreeFast

输出内容说明

flat: 是指该函数执行耗时, 程序总耗时 980ms, runtime.stringtoslicerune 的 230ms 占了23.47%
sum: 当前函数与排在它上面的其他函数的 flat 占比总和, 比如 23.47% + 22.45% = 45.92%

cum: 是指该函数加上在该函数调用之前累计的总耗时,可通过前面提供的svg图片进行查看比较清晰

这上面的svg图片中runtime.stringtoslicerune耗时是最多的，自己定义的函数go-demos/simpletrie.(*Node).get和go-demos/simpletrie.(*Trie).Has两个函数耗时较长。

cpu-stringtoslicerune

如上图说明：比如simpletrie.Has方法自身消耗 70ms，占总抽样时间840ms的8.33%；其引用simpletrie.get和runtime.stringtoslicerune共计消耗时间：450ms(200ms + 250ms)分别占总抽样时间的23.81%和29.76%。以此类推其他函数调用的总耗时以及自身耗时。

(5) 使用list根据指定的参数来输出指定函数相关数据：

(pprof) list .get  # 查看get函数                   
Total: 980ms       # 总共执行时长
ROUTINE ======================== go-demos/simpletrie.(*Node).get in 
~/gospace/src/go-demos/simpletrie/simple_trie.go  # 函数位置
     220ms      220ms (flat, cum) 22.45% of Total  
         .          .     18:
         .          .     19:  return child
         .          .     20:}
         .          .     21:
         .          .     22:func (n *Node) get(r rune) *Node { 
     100ms      100ms     23:  for _, child := range n.Children { # 循环耗时
      50ms       50ms     24:      if child.Char == r {  # 判断耗时
      70ms       70ms     25:          return child   # 返回结果耗时
         .          .     26:      }
         .          .     27:  }
         .          .     28:  return nil
         .          .     29:}
         .          .     30:

mem_profile

引入memory profile生成

# 引用
"runtime/pprof"

# 代码
memProfile, _ := os.Create("mem_profile")
pprof.WriteHeapProfile(memProfile)

测试用例

package simpletrie

import (
 "bufio"
 "log"
 "os"
 "runtime/pprof"
 "testing"
)

func BenchmarkTrieFindMem(b *testing.B) {
  // 定义memory profile
 memProfile, _ := os.Create("mem_profile")
 
 var trie1 = NewTrie()
 file, err := os.Open("./20k.txt")
 if err != nil {
     log.Fatal(err)
 }
 defer file.Close()

 scanner := bufio.NewScanner(file)

 for scanner.Scan() {
     trie1.Build(scanner.Text())
 }

 b.ResetTimer()

 for i := 0; i < b.N; i++ {
     trie1.Has("42082")
     trie1.Has("oops")
     trie1.Has("Supercalifragilisticexpialidocious")
 }
 // 收集memory信息 
 pprof.WriteHeapProfile(memProfile)
}

 生成mem_profile文件

 go test -bench=".*" -benchmem # 所有方法
 或
 go test -v -bench=BenchmarkTrieFindMem -benchmem #指定方法

(1) 进入pprof

go tool pprof mem_profile

输出如下内容：需要说明一下若是在多核环境，取样时间(Total samples)占比>100%属于正常现象。

Type: inuse_space
Time: Dec 31, 2019 at 5:39pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)

(2) 了解pprof提供的功能

(pprof) help # 查看对应的交互操作命令

效果如下：

memory-help

(3) 将mem_profile输出svg，查看每个方法调用消耗的时间：方块越大代表内存越大。

mem-profile

(4) 使用top查看最高消耗的运行点：默认是10条，可以使用topN来输出指定最高的N条信息

(pprof) top
Showing nodes accounting for 2.50MB, 100% of 2.50MB total
      flat  flat%   sum%        cum   cum%
    2.50MB   100%   100%     2.50MB   100%  go-demos/simpletrie.NewNode
         0     0%   100%     2.50MB   100%  go-demos/simpletrie.(*Node).insert
         0     0%   100%     2.50MB   100%  go-demos/simpletrie.(*Trie).Build
         0     0%   100%     2.50MB   100%  go-demos/simpletrie.BenchmarkTrie_Has
         0     0%   100%     2.50MB   100%  testing.(*B).run1.func1
         0     0%   100%     2.50MB   100%  testing.(*B).runN

输出内容说明

flat: 是指该函数执行耗时, 程序总内存 2.5M, simpletrie.NewNode 的2.5M 占了100%
sum: 当前函数与排在它上面的其他函数的 flat 占比总和, 比如 100% + 0% = 100%
cum: 是指该函数加上在该函数调用之前累计的总耗时,可通过前面提供的svg图片进行查看比较清晰。

mem-statitcs-svg

如上图说明：比如simpletrie.NewNode方法内存占用100%，占总抽样内存2.50M的100%。以此类推其他函数调用的总耗时以及自身耗时。

内存Type: inuse_space 是监控内存的默认选项, 还可以选

-alloc_space,

-inuse_objects,

-alloc_objects

inuse_space 是正在使用的内存大小,

alloc_space是从头到尾一共分配了的内存大小(包括已经回收了的),

后缀为 _objects 的是相应的对象数

(5) 使用list根据指定的参数来输出指定函数相关数据：

(pprof) list .NewNode  # 查看get函数                   
Total: 2.50MB  # 总内存
ROUTINE ======================== go-demos/simpletrie.NewNode in 
~/gospace/src/go-demos/simpletrie/simple_trie.go  
   2.50MB     2.50MB (flat, cum)   100% of Total  # 
        .          .      4:   Char     rune
        .          .      5:   Children []*Node
        .          .      6:}
        .          .      7:
        .          .      8:func NewNode(r rune) *Node {
   2.50MB     2.50MB      9:   return &Node{Char: r}
        .          .     10:}
        .          .     11:
        .          .     12:func (n *Node) insert(r rune) *Node {
        .          .     13:   child := n.get(r)
        .          .     14:   if child == nil {

三. http服务的pprof

在实际应用中，可能需要针对 http 服务进行监控，而Go 已对 pprof 做了一些封装在 net/http/pprof来支持对http服务pprof的支持。

1.测试用例

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof"
)

func main() {
    http.Handle("/", &SimpleHandler{})
    http.Handle("/hello", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("world"))
    }))
    http.HandleFunc("/ping", hello)
    http.ListenAndServe(":8080", nil)
}

func hello(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("pong"))
}

type SimpleHandler struct{}

func (m *SimpleHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("a Simple Handler ServeHTTP"))
}

2.测试脚本

wrk -c 200 -t 4 -d 3m http://localhost:8080/hello

输出结果如下：

Running 3m test @ http://localhost:8080/hello
  4 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.13ms    5.96ms 247.15ms   92.60%
    Req/Sec    16.44k     3.89k   44.20k    74.76%
  11760293 requests in 3.00m, 1.33GB read
  Socket errors: connect 0, read 2, write 0, timeout 0
Requests/sec:  65298.89
Transfer/sec:      7.54MB

接着使用go tool pprof获取对应的内容：每30s收集一次profile内容

# 注意下面交互命令转义符号 否则会导致命令不能执行
go tool pprof http://127.0.0.1:8080/debug/pprof/profile\?seconds\=30

结果如下：内容分析见前面

Type: cpu
Time: Jan 2, 2020 at 2:22pm (CST)
Duration: 30.14s, Total samples = 53.04s (176.01%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 50.81s, 95.80% of 53.04s total
Dropped 223 nodes (cum <= 0.27s)
Showing top 10 nodes out of 52
      flat  flat%   sum%        cum   cum%
    49.48s 93.29% 93.29%     49.51s 93.34%  syscall.syscall
     0.42s  0.79% 94.08%     25.10s 47.32%  bufio.(*Writer).Flush
     0.38s  0.72% 94.80%      0.38s  0.72%  runtime.pthread_cond_signal
     0.35s  0.66% 95.46%      0.35s  0.66%  runtime.kevent
     0.05s 0.094% 95.55%     25.45s 47.98%  net/http.(*conn).readRequest
     0.03s 0.057% 95.61%     25.27s 47.64%  net.(*conn).Read
     0.03s 0.057% 95.66%      0.29s  0.55%  runtime.gcAssistAlloc1
     0.03s 0.057% 95.72%      0.33s  0.62%  runtime.gentraceback
     0.02s 0.038% 95.76%      0.38s  0.72%  net/http.(*connReader).backgroundRead
     0.02s 0.038% 95.80%     25.27s 47.64%  net/http.readRequest
(pprof)

其他几个指标也差不多, heap, alloc…同上在此就不罗列了。

四. Flame Graph 火焰图

从 Go 1.11 开始, 火焰图被集成进入 Go 官方的 pprof 库.接下来看看go tool pprof的使用

go tool pprof -http=":8888" [binary] [profile]

1.使用实例

go tool pprof -http=":8888" cpu_profile
或
go tool pprof -http=":8888" mem_profile

访问：http://localhost:8888/ui/flamegraph

flame-graph

同样也支持top/list等操作，全部在web页面完成
也可以结合pprof web和http服务

go tool pprof -http=":8888"  http://127.0.0.1:8080/debug/pprof/profile\?seconds\=30

pprof-http-flame-graph

引用

http-pprof
runtime-pprof
profiling-go-with-pprof
how-i-investigated-memory-leaks-in-go-using-pprof-on-a-large-codebase