Issue 04: Prometheus Data Collection (Part 3)

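Author of this issue: 罗韦 (Luo Wei)

R&D engineer at the 爱可生 (ActionSky) Shanghai R&D center, mainly responsible for the monitoring and alerting features of the DMP platform.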

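Prometheus monitors all kinds of objects, and those objects share no common standard. To solve this, Prometheus defines a monitoring specification: sample data that conforms to it can be scraped and parsed by Prometheus. In the Prometheus monitoring system, an Exporter is a component that collects monitoring data and exposes it in the form the specification requires. A different Exporter can be implemented for each kind of monitored object, which solves the problem of inconsistent standards. Broadly speaking, any program that can provide monitoring samples to Prometheus can be called an Exporter, and an Exporter instance is the "target" we talked about in the previous issue.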

How an Exporter runs

An Exporter can run in one of two ways.

Integrated into the application

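With the Client Libraries provided by Prometheus, it is easy to add monitoring code to an application. This approach exposes the program's internal state to Prometheus and suits projects that need many custom metrics. Some open source projects already ship with native support for Prometheus monitoring, such as Kubernetes and etcd.

Running standalone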

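In many cases the monitored object cannot provide a monitoring endpoint itself. Possible reasons include:

1. The project was released early and does not support the Prometheus monitoring interface, for example MySQL and Redis.

2. The monitored object cannot serve HTTP directly, for example when monitoring Linux system status metrics.

In these cases, users can run a standalone Exporter. Besides writing their own, users can pick from the many standalone Exporters the Prometheus community provides; common ones are Node Exporter and MySQL Server Exporter. More details are available on the official site: https://prometheus.io/docs/instrumenting/exporters/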

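Exporter interface data specification

An Exporter exposes sample data to Prometheus as text over an HTTP endpoint. The format is simple, has no nesting, and is easy to read. The text for each metric looks like this: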

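# HELP <metric name> <metric description>
# TYPE <metric name> <metric type>
<metric name>{<label name>="<label value>",<label name>="<label value>"...} <sample value 1> <timestamp>
<metric name>{<label name>="<label value>",<label name>="<label value>"...} <sample value 2> <timestamp>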
...
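
The line format above can be sketched in a few lines of Go using only the standard library. This is a minimal illustration, not how client_golang renders its output: the function name formatSample is hypothetical, the optional timestamp is left out, and label names are sorted only to make the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelEscaper escapes the three characters that must be escaped
// inside a label value: backslash, double quote and newline.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatSample (hypothetical helper) renders one sample line:
// <metric name>{<label>="<value>",...} <sample value>
func formatSample(name string, labels map[string]string, value float64) string {
	if len(labels) == 0 {
		return fmt.Sprintf("%s %v", name, value)
	}
	names := make([]string, 0, len(labels))
	for k := range labels {
		names = append(names, k)
	}
	sort.Strings(names) // deterministic label order
	pairs := make([]string, 0, len(names))
	for _, k := range names {
		pairs = append(pairs, fmt.Sprintf(`%s="%s"`, k, labelEscaper.Replace(labels[k])))
	}
	return fmt.Sprintf("%s{%s} %v", name, strings.Join(pairs, ","), value)
}

func main() {
	fmt.Println("# HELP disk_available_bytes Disk space available in bytes")
	fmt.Println("# TYPE disk_available_bytes gauge")
	fmt.Println(formatSample("disk_available_bytes",
		map[string]string{"device": "/dev/sda"}, 82115880))
}
```

The last line printed is disk_available_bytes{device="/dev/sda"} 8.211588e+07; the label value is always wrapped in double quotes.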

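A line that starts with # followed by "HELP" is parsed by Prometheus as the metric's description, which usually describes where the data comes from.
A line that starts with # followed by "TYPE" is parsed by Prometheus as the metric's type. The supported types are Counter, Gauge, Histogram, Summary, and Untyped. As explained in an earlier article, Prometheus does not distinguish between types when it stores data, so if you cannot decide whether a metric should be a Counter or a Gauge, you can try Untyped.
A line that starts with # but is followed by neither "HELP" nor "TYPE" is treated as a comment and ignored during parsing.
If a metric has several samples, the combination of label values of each sample should be unique.
Each line corresponds to one sample.
The timestamp should be the time the data was collected. It is optional: if the Exporter does not provide one, Prometheus Server sets the timestamp to the current time when it scrapes the sample.
Metrics of type Summary and Histogram must provide two extra lines giving the sum of all samples and the number of samples, named <metric name>_sum and <metric name>_count.
Sample data format for the Summary type:
1. Based on the quantiles the Exporter provides, the samples are computed and split into several lines. Each line is distinguished by the label "quantile", whose values cover all the quantiles the Exporter provides.
2. The lines must be ordered by increasing value of the label "quantile".
3. For example, a metric named x with quantiles 0.5, 0.9, and 0.99 is exposed to Prometheus in the following format: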

# HELP x balabala
# TYPE x summary
x{quantile="0.5"} value1
x{quantile="0.9"} value2
x{quantile="0.99"} value3
x_sum sum(values)
x_count count(values)
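
To make value1..value3, sum(values), and count(values) concrete, here is a small sketch that computes them with the plain nearest-rank method over a made-up set of observations. Note the assumption: this exact calculation is a stand-in for illustration, whereas client libraries such as client_golang estimate quantiles with a streaming approximation, so their results can differ slightly on large data sets. The function name quantile is hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile (hypothetical helper) returns the q-quantile of values using
// the nearest-rank method. It returns NaN when values is empty.
func quantile(values []float64, q float64) float64 {
	if len(values) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), values...) // keep the input untouched
	sort.Float64s(sorted)
	rank := int(math.Ceil(q * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	observations := []float64{10, 20, 30, 45, 56, 80} // illustrative data
	sum := 0.0
	for _, v := range observations {
		sum += v
	}
	// Lines are emitted in increasing order of the "quantile" label.
	for _, q := range []float64{0.5, 0.9, 0.99} {
		fmt.Printf("x{quantile=\"%v\"} %v\n", q, quantile(observations, q))
	}
	fmt.Printf("x_sum %v\n", sum)
	fmt.Printf("x_count %d\n", len(observations))
}
```

For these six observations the three quantile lines carry 30, 80, and 80, with x_sum 241 and x_count 6.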

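Sample data format for the Histogram type:
1. Based on the Bucket values the Exporter provides, the samples are computed and split into several lines. Each line is distinguished by the label "le", whose values are the Buckets the Exporter provides.
2. The lines must be ordered by increasing value of the label "le".
3. There must be one line with the label le="+Inf", whose value is the total number of samples of the metric.
4. For example, a metric named x with Buckets 20, 50, and 70 is exposed to Prometheus in the following format: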

# HELP x The temperature of cpu
# TYPE x histogram
x_bucket{le="20"} value1
x_bucket{le="50"} value2
x_bucket{le="70"} value3
x_bucket{le="+Inf"} count(values)
x_sum sum(values)
x_count count(values)
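
The bucket values are cumulative: each le line counts every observation less than or equal to that bound, which is why the le="+Inf" line always equals the total count. The following sketch shows the counting rule with the standard library only; the function name cumulativeBuckets and the observations are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// cumulativeBuckets (hypothetical helper) counts, for each upper bound,
// how many observations are <= that bound. bounds must be sorted in
// increasing order. The last count belongs to le="+Inf".
func cumulativeBuckets(bounds, observations []float64) (counts []uint64, sum float64) {
	all := append(append([]float64{}, bounds...), math.Inf(1))
	counts = make([]uint64, len(all))
	for _, v := range observations {
		sum += v
		for i, le := range all {
			if v <= le {
				counts[i]++
			}
		}
	}
	return counts, sum
}

func main() {
	bounds := []float64{20, 50, 70}
	observations := []float64{30, 43, 56, 58, 65, 70} // illustrative data
	counts, sum := cumulativeBuckets(bounds, observations)
	for i, le := range bounds {
		fmt.Printf("x_bucket{le=\"%v\"} %d\n", le, counts[i])
	}
	fmt.Printf("x_bucket{le=\"+Inf\"} %d\n", counts[len(bounds)])
	fmt.Printf("x_sum %v\n", sum)
	fmt.Printf("x_count %d\n", len(observations))
}
```

With these observations the bucket lines carry 0, 2, 6, and 6, followed by x_sum 322 and x_count 6. An observation of exactly 70 falls into the le="70" bucket because the comparison is "less than or equal".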

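This text format also has drawbacks:
1. The text can become very verbose.
2. When parsing, Prometheus cannot verify whether the HELP and TYPE fields are missing. Without HELP, the origin of a sample may be hard to determine; without TYPE, Prometheus has no way of knowing the sample's type.
3. Compared with protobuf, the text format Prometheus uses applies no compression at all, so it is more expensive to parse.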
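
Because of drawback 2, it can help to check an Exporter's output for missing TYPE lines before shipping it. The sketch below applies the line rules described earlier (HELP, TYPE, plain comment, sample) and reports sample names that have no TYPE line. It is a simplified illustration with hypothetical function names, not a full parser: it only handles the _bucket, _sum, and _count suffixes and does not validate label syntax.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// lineKind classifies one line of the text format.
func lineKind(line string) string {
	switch {
	case line == "":
		return "empty"
	case strings.HasPrefix(line, "# HELP "):
		return "help"
	case strings.HasPrefix(line, "# TYPE "):
		return "type"
	case strings.HasPrefix(line, "#"):
		return "comment"
	default:
		return "sample"
	}
}

// sampleName returns the text before the first '{' or space.
func sampleName(line string) string {
	if i := strings.IndexAny(line, "{ "); i >= 0 {
		return line[:i]
	}
	return line
}

// isTyped reports whether name, or its base name without a
// _bucket/_sum/_count suffix, has been declared by a TYPE line.
func isTyped(typed map[string]bool, name string) bool {
	if typed[name] {
		return true
	}
	for _, suffix := range []string{"_bucket", "_sum", "_count"} {
		if base := strings.TrimSuffix(name, suffix); base != name && typed[base] {
			return true
		}
	}
	return false
}

// missingType lists sample names that appear without a preceding TYPE line.
func missingType(payload string) []string {
	typed := map[string]bool{}
	reported := map[string]bool{}
	var missing []string
	sc := bufio.NewScanner(strings.NewReader(payload))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch lineKind(line) {
		case "type":
			if f := strings.Fields(line); len(f) >= 3 { // "#", "TYPE", name, ...
				typed[f[2]] = true
			}
		case "sample":
			name := sampleName(line)
			if !isTyped(typed, name) && !reported[name] {
				reported[name] = true
				missing = append(missing, name)
			}
		}
	}
	return missing
}

func main() {
	payload := "# TYPE x histogram\nx_bucket{le=\"+Inf\"} 6\nx_sum 322\nx_count 6\n" +
		"# just a comment\ny{device=\"/dev/sda\"} 1\n"
	fmt.Println(missingType(payload))
}
```

Here only y is reported, since x and its suffixed series are covered by the TYPE line.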

MySQL Server Exporter

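For MySQL, a widely used relational database, Prometheus officially provides the MySQL Server Exporter. It supports MySQL 5.6 and later; on versions below 5.6, some metrics may not be supported.
The MySQL Server Exporter covers the commonly used global status/variables, schema/table statistics, user statistics, InnoDB information, and information about replication and group replication, so its metrics are fairly comprehensive. However, the metrics it exposes carry no identifier for the MySQL instance, so when one host runs several MySQL instances and several MySQL Server Exporters are needed to monitor them, it becomes hard to tell the instances apart. For usage details, see: https://github.com/prometheus/mysqld_exporter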

Node Exporter

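The official Prometheus Node Exporter monitors *NIX systems and hardware. Its metrics include CPU usage and configuration, system load average, memory information, network status, file system statistics, disk usage statistics, and more. The available metrics vary by system: for example, diskstats supports Darwin, Linux, and OpenBSD, while loadavg supports Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, and Solaris. Node Exporter metrics carry no identifier for the host; identifying labels can be added on the Prometheus Server side with the relabel feature. For usage details, see: https://github.com/prometheus/node_exporter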

How to implement an Exporter

Writing a simple Exporter

Using the prometheus/client_golang package, let's write a simple Exporter that covers the four metric types Prometheus supports:

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// A GaugeVec lets the metric carry labels; here it gets one label, "device".
	diskAvailable = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "disk_available_bytes",
		Help: "Disk space available in bytes",
	}, []string{"device"})

	tasksTotal = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "test_tasks_total",
		Help: "Total number of test tasks",
	})

	taskDuration = prometheus.NewSummary(prometheus.SummaryOpts{
		Name: "task_duration_seconds",
		Help: "Duration of task in seconds",
		// A Summary needs its quantiles (each mapped to an allowed error).
		Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
	})

	cpuTemperature = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "cpu_temperature",
		Help: "The temperature of cpu",
		// A Histogram needs its Buckets.
		Buckets: []float64{20, 50, 70, 80},
	})
)

func init() {
	// Register the metrics with the default registry.
	prometheus.MustRegister(diskAvailable)
	prometheus.MustRegister(tasksTotal)
	prometheus.MustRegister(taskDuration)
	prometheus.MustRegister(cpuTemperature)
}

func main() {
	// Simulate collecting monitoring data.
	fakeData()

	// Expose the samples with promhttp.Handler().
	// By default Prometheus scrapes samples from the "/metrics" path.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":10000", nil))
}

func fakeData() {
	tasksTotal.Inc()
	// Set the "device" label of this sample to "/dev/sda".
	diskAvailable.With(prometheus.Labels{"device": "/dev/sda"}).Set(82115880)

	taskDuration.Observe(10)
	taskDuration.Observe(20)
	taskDuration.Observe(30)
	taskDuration.Observe(45)
	taskDuration.Observe(56)
	taskDuration.Observe(80)

	cpuTemperature.Observe(30)
	cpuTemperature.Observe(43)
	cpuTemperature.Observe(56)
	cpuTemperature.Observe(58)
	cpuTemperature.Observe(65)
	cpuTemperature.Observe(70)
}

Next, build and run our Exporter:

GOOS=linux GOARCH=amd64 go build -o my_exporter main.go
./my_exporter &

Once the Exporter is running, it still has to be added to the Prometheus configuration file so that Prometheus can scrape data from it.

static_configs:
- targets: ['localhost:9090','172.17.0.3:10000']

The newly added Exporter now shows up on the targets page of Prometheus.


Requesting the "/metrics" endpoint returns the following data:

Gauge
Because we used a GaugeVec, the sample carries a label.

# HELP disk_available_bytes Disk space available in bytes
# TYPE disk_available_bytes gauge
disk_available_bytes{device="/dev/sda"} 8.211588e+07

Counter

# HELP test_tasks_total Total number of test tasks
# TYPE test_tasks_total counter
test_tasks_total 1

Summary

# HELP task_duration_seconds Duration of task in seconds
# TYPE task_duration_seconds summary
task_duration_seconds{quantile="0.5"} 30
task_duration_seconds{quantile="0.9"} 80
task_duration_seconds{quantile="0.99"} 80
task_duration_seconds_sum 241
task_duration_seconds_count 6

Histogram

# HELP cpu_temperature The temperature of cpu
# TYPE cpu_temperature histogram
cpu_temperature_bucket{le="20"} 0
cpu_temperature_bucket{le="50"} 2
cpu_temperature_bucket{le="70"} 6
cpu_temperature_bucket{le="80"} 6
cpu_temperature_bucket{le="+Inf"} 6
cpu_temperature_sum 322
cpu_temperature_count 6

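Considerations when implementing an Exporter

In the example above, all metrics are initialized when the program starts. An Exporter built this way usually goes on to start a sampling goroutine that periodically collects data and updates the metrics. The latest samples stay in memory, and when a request from Prometheus Server arrives, the Exporter returns whatever is in memory. The advantages of this approach are that the sampling frequency is easy to control and that there is no need to worry about concurrent sampling competing for resources. Its drawbacks are: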

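1. Samples are never cleaned up automatically. When an object that has already been sampled goes away, Prometheus Server can still scrape its samples, although they stopped being updated the moment the object disappeared. The Exporter therefore has to provide its own mechanism for cleaning up data of invalid objects.
2. Because requests from Prometheus Server are answered from memory, Prometheus Server cannot tell when the Exporter's sampling goroutine hangs, and the data it scrapes may be stale.
3. The data Prometheus Server scrapes is not sampled at scrape time, so consistency at a given point in time cannot be guaranteed.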
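
One way to soften drawbacks 1 and 2 of this first approach is to remember when each sample was last refreshed and to drop entries that are older than some limit when serving a request. The sketch below uses only the standard library; the type sampleCache, the helper demoSnapshot, and the 30-second limit are assumptions for illustration. With client_golang, a comparable cleanup for a GaugeVec would go through its DeleteLabelValues method.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// sampleCache (hypothetical) keeps the latest value per device together
// with the time it was last refreshed by the sampling goroutine.
type sampleCache struct {
	mu      sync.Mutex
	values  map[string]float64
	updated map[string]time.Time
}

func newSampleCache() *sampleCache {
	return &sampleCache{values: map[string]float64{}, updated: map[string]time.Time{}}
}

// set stores a fresh value; the sampling goroutine would call this.
func (c *sampleCache) set(device string, value float64, now time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[device] = value
	c.updated[device] = now
}

// snapshot returns the values to expose and removes every entry that
// has not been refreshed within maxAge.
func (c *sampleCache) snapshot(now time.Time, maxAge time.Duration) map[string]float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[string]float64, len(c.values))
	for device, t := range c.updated {
		if now.Sub(t) > maxAge {
			delete(c.values, device) // stale: the object may be gone
			delete(c.updated, device)
			continue
		}
		out[device] = c.values[device]
	}
	return out
}

// demoSnapshot simulates one device that stopped updating and one that did not.
func demoSnapshot() map[string]float64 {
	c := newSampleCache()
	start := time.Unix(0, 0)
	c.set("/dev/sda", 82115880, start)                 // last seen at t=0s
	c.set("/dev/sdb", 1024, start.Add(50*time.Second)) // last seen at t=50s
	return c.snapshot(start.Add(60*time.Second), 30*time.Second)
}

func main() {
	fmt.Println(demoSnapshot())
}
```

At the simulated scrape time, /dev/sda is 60 seconds old and is dropped, while /dev/sdb is 10 seconds old and is kept. Choosing the age limit is a trade-off: too short and a slow sampler makes healthy series flap, too long and stale data lingers.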

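The other approach is the one used by MySQL Server Exporter and Node Exporter, and it is also the one Prometheus officially recommends. Each time a request from Prometheus Server arrives, the Exporter initializes new metrics and starts a sampling goroutine. Unlike in the first approach, these metrics live only for the duration of the request. The sampling goroutine then collects all samples and returns them to Prometheus Server. Compared with the first approach, the data is collected at scrape time, which guarantees consistency at that point in time, and because the metrics are re-initialized on every request, no stale samples remain. The second approach has drawbacks of its own, though: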

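1. When several scrape requests arrive at the same time, the resources consumed by concurrent sampling need to be kept under control.

2. When several scrape requests arrive at the same time, the same metric is read several times within a short period. For a metric that changes slowly, the repeated reads add little value but still consume resources.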
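
The second approach can be sketched with the standard library alone: the handler collects fresh samples on every request, and a mutex serializes collection so that concurrent scrapes do not sample at the same time (a simple answer to drawback 1). The names scrapeHandler and scrapeOnce and the collect callback are hypothetical. With client_golang, this pattern is usually expressed by implementing the prometheus.Collector interface (Describe and Collect) instead of writing the text by hand.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"sync"
)

// scrapeHandler (hypothetical) calls collect on every request, so the
// samples only live for the duration of that request.
func scrapeHandler(collect func() map[string]float64) http.Handler {
	var mu sync.Mutex // at most one collection runs at a time
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		samples := collect()
		mu.Unlock()

		devices := make([]string, 0, len(samples))
		for d := range samples {
			devices = append(devices, d)
		}
		sort.Strings(devices) // stable output order

		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprintln(w, "# HELP disk_available_bytes Disk space available in bytes")
		fmt.Fprintln(w, "# TYPE disk_available_bytes gauge")
		for _, d := range devices {
			// %q is adequate here for plain ASCII label values.
			fmt.Fprintf(w, "disk_available_bytes{device=%q} %v\n", d, samples[d])
		}
	})
}

// scrapeOnce performs one in-memory request against the handler.
func scrapeOnce(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Body.String()
}

func main() {
	calls := 0
	h := scrapeHandler(func() map[string]float64 {
		calls++ // stands in for real, possibly expensive, collection
		return map[string]float64{"/dev/sda": 82115880}
	})
	fmt.Print(scrapeOnce(h))
	fmt.Print(scrapeOnce(h))
	fmt.Println("collections:", calls)
}
```

Two scrapes trigger two collections, which is exactly the cost drawback 2 describes. A real Exporter that wanted to avoid it for slowly changing metrics could keep the last result for a short time, at the price of giving up some of the point-in-time freshness.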

Do you have any questions about this topic, or is there anything else you would like to know? Leave a comment and let the editor know!