Original: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/
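Prometheus supports two types of rules that can be configured and then evaluated at regular intervals: recording rules and alerting rules. To include rules in Prometheus, create a file containing the necessary rule statements and have Prometheus load it via the rule_files field in the Prometheus configuration. Rule files use YAML.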
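As a minimal sketch of the loading step described above, a Prometheus configuration could reference rule files as follows. The file paths and the 30s interval are illustrative assumptions, not values taken from the original text.

```yaml
# prometheus.yml (fragment); the paths below are hypothetical
global:
  evaluation_interval: 30s    # default interval at which rules are evaluated

rule_files:
  - "rules/example.rules.yml"
  - "rules/*.alerts.yml"      # entries may be glob patterns
```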
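Rule files can be reloaded at runtime by sending SIGHUP to the Prometheus process. The changes are applied only if all rule files are well-formatted.
To quickly check whether a rule file is syntactically correct without starting a Prometheus server, install and run Prometheus's promtool command-line utility: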
go get github.com/prometheus/prometheus/cmd/promtool
promtool check rules /path/to/example.rules.yml
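When the file is syntactically valid, the checker prints a textual representation of the parsed rules to standard output and then exits with a return status of 0.
If there are any syntax errors or invalid input arguments, it prints an error message to standard error and exits with a return status of 1.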
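Recording rules let you precompute frequently needed or computationally expensive expressions and save their results as a new set of time series. Querying the precomputed result is then often much faster than evaluating the original expression every time it is needed. This is especially useful for dashboards, which need to query the same expression repeatedly every time they refresh.
Recording and alerting rules exist in rule groups. Rules within a group are run sequentially at a regular interval.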
The syntax of a rule file is:
groups:
  [ - <rule_group> ]
A simple example rule file:
groups:
  - name: example
    rules:
    - record: job:http_inprogress_requests:sum
      expr: sum(http_inprogress_requests) by (job)
The syntax of a rule group (<rule_group>) is:
# The name of the group. Must be unique within a file.
name: <string>

# How often rules in the group are evaluated.
[ interval: <duration> | default = global.evaluation_interval ]

rules:
  [ - <rule> ... ]
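To illustrate the group fields above, here is a hedged sketch of a group that sets its own interval and relies on rules being run in order. The group name, metric names, and the 1m interval are hypothetical and chosen only for illustration.

```yaml
groups:
  - name: example_with_interval    # hypothetical group name, unique within the file
    interval: 1m                   # overrides global.evaluation_interval for this group
    rules:
      # Runs first: per-job request rate over the last 5 minutes.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      # Runs second, so it can build on the series recorded by the rule above.
      - record: total:http_requests:rate5m
        expr: sum(job:http_requests:rate5m)
```

Keeping dependent rules in the same group, in dependency order, is the reason the second expression can refer to the first rule's output.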
The syntax for recording rules is:
# The name of the time series to output to. Must be a valid metric name.
record: <string>

# The PromQL expression to evaluate. Every evaluation cycle this is evaluated at the current time, and the result recorded as a new set of time series with the metric name as given by 'record'.
expr: <string>

# Labels to add or overwrite before storing the result.
labels:
  [ <labelname>: <labelvalue> ]
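As a small sketch of the recording rule fields, the example below reuses the expression from the simple rule file and attaches an extra label to the result. The group name and the team label are assumptions made for illustration.

```yaml
groups:
  - name: example_labels           # hypothetical group name
    rules:
      - record: job:http_inprogress_requests:sum
        expr: sum(http_inprogress_requests) by (job)
        labels:
          team: backend            # hypothetical label, added to every resulting series
```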
The syntax for alerting rules is:
# The name of the alert. Must be a valid metric name.
alert: <string>

# The PromQL expression to evaluate. Every evaluation cycle this is evaluated at the current time, and all resultant time series become pending/firing alerts.
expr: <string>

# Alerts are considered firing once they have been returned for this long.
# Alerts which have not yet fired for long enough are considered pending.
[ for: <duration> | default = 0s ]

# Labels to add or overwrite for each alert.
labels:
  [ <labelname>: <tmpl_string> ]

# Annotations to add to each alert.
annotations:
  [ <labelname>: <tmpl_string> ]
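The alerting rule fields can be put together as in the following sketch. The alert name, the recorded metric it queries, the 0.5 threshold, and the label and annotation values are all illustrative assumptions.

```yaml
groups:
  - name: example_alerts           # hypothetical group name
    rules:
      - alert: HighRequestLatency
        # Assumes a series named job:request_latency_seconds:mean5m already exists.
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m                   # pending until the condition has held for 10 minutes
        labels:
          severity: page
        annotations:
          summary: High request latency
```

With for set to 10m, a series returned by the expression stays pending until it has been returned continuously for ten minutes, and only then is it considered firing.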