Prometheus Learning Series (38): Alerting Configuration

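Alertmanager is configured via command-line flags and a configuration file. While the command-line flags configure immutable system parameters, the configuration file defines inhibition rules, notification routing, and notification receivers.

A visual editor can assist in building routing trees.

To view all available command-line flags, run alertmanager -h

Alertmanager can reload its configuration at runtime. If the new configuration is not well-formed, the changes are not applied and an error is logged. A configuration reload is triggered by sending SIGHUP to the process or by sending an HTTP POST request to the /-/reload endpoint.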

1. Configuration file

To specify which configuration file to load, use the --config.file flag:

./alertmanager --config.file=simple.yml

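The file is written in YAML format, defined by the scheme described below. Brackets indicate that a parameter is optional. For non-list parameters the value is set to the specified default.

Generic placeholders are defined as follows:

  • <duration>: a duration matching the regular expression [0-9]+(ms|[smhdwy])
  • <labelname>: a string matching the regular expression [a-zA-Z_][a-zA-Z0-9_]*
  • <labelvalue>: a string of unicode characters
  • <filepath>: a valid path in the current working directory
  • <boolean>: a boolean that can take the values true or false
  • <string>: a regular string
  • <secret>: a regular string that is a secret, such as a password
  • <tmpl_string>: a string that is template-expanded before use
  • <tmpl_secret>: a string that is template-expanded before use and that is a secret

The other placeholders are specified separately.

A valid example file can be found here.

The global configuration specifies parameters that are valid in all other configuration contexts. They also serve as defaults for other configuration sections.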

global:
  # ResolveTimeout is the time after which an alert is declared resolved
  # if it has not been updated.
  [ resolve_timeout: <duration> | default = 5m ]

  # The default SMTP From header field.
  [ smtp_from: <tmpl_string> ]
  # The default SMTP smarthost used for sending emails, including port number.
  # Port number usually is 25, or 587 for SMTP over TLS (sometimes referred to as STARTTLS).
  # Example: smtp.example.org:587
  [ smtp_smarthost: <string> ]
  # The default hostname to identify to the SMTP server.
  [ smtp_hello: <string> | default = "localhost" ]
  [ smtp_auth_username: <string> ]
  # SMTP Auth using LOGIN and PLAIN.
  [ smtp_auth_password: <secret> ]
  # SMTP Auth using PLAIN.
  [ smtp_auth_identity: <string> ]
  # SMTP Auth using CRAM-MD5.
  [ smtp_auth_secret: <secret> ]
  # The default SMTP TLS requirement.
  [ smtp_require_tls: <boolean> | default = true ]

  # The API URL to use for Slack notifications.
  [ slack_api_url: <secret> ]
  [ victorops_api_key: <secret> ]
  [ victorops_api_url: <string> | default = "https://alert.victorops.com/integrations/generic/20131114/alert/" ]
  [ pagerduty_url: <string> | default = "https://events.pagerduty.com/v2/enqueue" ]
  [ opsgenie_api_key: <secret> ]
  [ opsgenie_api_url: <string> | default = "https://api.opsgenie.com/" ]
  [ hipchat_api_url: <string> | default = "https://api.hipchat.com/" ]
  [ hipchat_auth_token: <secret> ]
  [ wechat_api_url: <string> | default = "https://qyapi.weixin.qq.com/cgi-bin/" ]
  [ wechat_api_secret: <secret> ]
  [ wechat_api_corp_id: <string> ]

  # The default HTTP client configuration
  [ http_config: <http_config> ]

# Files from which custom notification template definitions are read.
# The last component may use a wildcard matcher, e.g. 'templates/*.tmpl'.
templates:
  [ - <filepath> ... ]

# The root node of the routing tree.
route: <route>

# A list of notification receivers.
receivers:
  - <receiver> ...

# A list of inhibition rules.
inhibit_rules:
  [ - <inhibit_rule> ... ]
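
As a rough illustration of how the top-level sections fit together, here is a minimal sketch of a complete configuration file. Every value (host names, addresses, the receiver name team-mails) is a hypothetical placeholder; only the key names are taken from the schema above.

```yaml
# Minimal sketch; all values are made-up examples.
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.org:587'   # hypothetical SMTP relay
  smtp_from: 'alertmanager@example.org'    # hypothetical sender address

# The root route has no matchers, so it matches every alert.
route:
  receiver: 'team-mails'

receivers:
  - name: 'team-mails'                     # hypothetical receiver name
    email_configs:
      - to: 'team@example.org'             # hypothetical recipient
```

The optional templates and inhibit_rules sections are left out here to keep the sketch short.
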
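2. <route>

A route block defines a node in a routing tree and its children. Its optional configuration parameters are inherited from its parent node if not set.

Every alert enters the routing tree at the configured top-level route, which must match all alerts (i.e. it must not have any configured matchers). It then traverses the child nodes. If continue is set to false, it stops after the first matching child. If continue is true on a matching node, the alert continues matching against subsequent siblings. If an alert does not match any children of a node (no matching child nodes, or none exist), the alert is handled based on the configuration parameters of the current node.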

[ receiver: <string> ]
# The labels by which incoming alerts are grouped together. For example,
# multiple alerts coming in for cluster=A and alertname=LatencyHigh would
# be batched into a single group.
#
# To aggregate by all possible labels use the special value '...' as the sole label name, for example:
# group_by: ['...']
# This effectively disables aggregation entirely, passing through all
# alerts as-is. This is unlikely to be what you want, unless you have
# a very low alert volume or your upstream notification system performs
# its own grouping.
[ group_by: '[' <labelname>, ... ']' ]

# Whether an alert should continue matching subsequent sibling nodes.
[ continue: <boolean> | default = false ]

# A set of equality matchers an alert has to fulfill to match the node.
match:
  [ <labelname>: <labelvalue>, ... ]

# A set of regex-matchers an alert has to fulfill to match the node.
match_re:
  [ <labelname>: <regex>, ... ]

# How long to initially wait to send a notification for a group
# of alerts. Allows to wait for an inhibiting alert to arrive or collect
# more initial alerts for the same group. (Usually ~0s to few minutes.)
[ group_wait: <duration> | default = 30s ]

# How long to wait before sending a notification about new alerts that
# are added to a group of alerts for which an initial notification has
# already been sent. (Usually ~5m or more.)
[ group_interval: <duration> | default = 5m ]

# How long to wait before sending a notification again if it has already
# been sent successfully for an alert. (Usually ~3h or more).
[ repeat_interval: <duration> | default = 4h ]

# Zero or more child routes.
routes:
  [ - <route> ... ]

Example:

# The root route with all parameters, which are inherited by the child
# routes if they are not overwritten.
route:
  receiver: 'default-receiver'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  group_by: [cluster, alertname]
  # All alerts that do not match the following child routes
  # will remain at the root node and be dispatched to 'default-receiver'.
  routes:
  # All alerts with service=mysql or service=cassandra
  # are dispatched to the database pager.
  - receiver: 'database-pager'
    group_wait: 10s
    match_re:
      service: mysql|cassandra
  # All alerts with the team=frontend label match this sub-route.
  # They are grouped by product and environment rather than cluster
  # and alertname.
  - receiver: 'frontend-pager'
    group_by: [product, environment]
    match:
      team: frontend
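
The example above leaves continue at its default of false. As a sketch of the sibling-matching behaviour, the fragment below adds a child route that sets continue: true. The receiver names audit-log and database-pager and the label values are assumptions for illustration, and each receiver would still need an entry under receivers.

```yaml
route:
  receiver: 'default-receiver'
  routes:
    # Critical alerts match here first. Because continue is true,
    # they are also tested against the sibling routes that follow.
    - receiver: 'audit-log'          # hypothetical receiver
      match:
        severity: critical
      continue: true
    # An alert with severity=critical and service=mysql is therefore
    # dispatched to both 'audit-log' and 'database-pager'.
    - receiver: 'database-pager'
      match_re:
        service: mysql|cassandra
```

With continue left at false on the first child, the same alert would stop at audit-log and never reach the database pager.
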
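3. <inhibit_rule>

An inhibition rule mutes an alert (the target) matching one set of matchers while an alert (the source) matching another set of matchers exists. The target and source alerts must have equal values for every label name listed in equal.

Semantically, a missing label and a label with an empty value are the same thing. Therefore, if all the label names listed in equal are missing from both the source and the target alert, the inhibition rule still applies.

To prevent an alert from inhibiting itself, an inhibition rule never inhibits an alert that matches both the target and the source side of the rule. Even so, we recommend choosing target and source matchers so that no alert ever matches both sides. Such rules are much easier to reason about and never trigger this special case.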

# Matchers that have to be fulfilled in the alerts to be muted.
target_match:
  [ <labelname>: <labelvalue>, ... ]
target_match_re:
  [ <labelname>: <regex>, ... ]

# Matchers for which one or more alerts have to exist for the
# inhibition to take effect.
source_match:
  [ <labelname>: <labelvalue>, ... ]
source_match_re:
  [ <labelname>: <regex>, ... ]

# Labels that must have an equal value in the source and target
# alert for the inhibition to take effect.
[ equal: '[' <labelname>, ... ']' ]
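
To make the source/target relationship concrete, here is a small sketch of a rule that mutes warning-level alerts while a critical alert with the same identity is firing. The label names severity, alertname, cluster, and service are assumptions about how the alerts are labelled, not something this reference prescribes.

```yaml
inhibit_rules:
  # While any alert with severity=critical (source) is firing,
  # mute alerts with severity=warning (target) ...
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    # ... but only when both alerts carry the same values for these labels.
    equal: ['alertname', 'cluster', 'service']
```

Because the source and target matchers require different values of the same label, no alert can match both sides, which avoids the self-inhibition special case.
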
4. <http_config>

An http_config configures the HTTP client that a receiver uses to communicate with HTTP-based API services.

# Note that `basic_auth`, `bearer_token` and `bearer_token_file` options are
# mutually exclusive.

# Sets the `Authorization` header with the configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]

# Sets the `Authorization` header with the configured bearer token.
[ bearer_token: <secret> ]

# Sets the `Authorization` header with the bearer token read from the configured file.
[ bearer_token_file: <filepath> ]

# Configures the TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]
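
As an illustration of where an http_config sits, the sketch below attaches one to a webhook receiver and uses basic authentication through a proxy. The URLs, user name, and file path are hypothetical.

```yaml
receivers:
  - name: 'internal-hook'                      # hypothetical receiver name
    webhook_configs:
      - url: 'https://hooks.example.internal/alerts'
        http_config:
          # basic_auth, bearer_token and bearer_token_file are mutually
          # exclusive, so only one of them is set here.
          basic_auth:
            username: 'alertmanager'
            # Reading the password from a file keeps it out of this config.
            password_file: '/etc/alertmanager/secrets/hook-password'
          proxy_url: 'http://proxy.example.internal:3128'
```
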
5. <tls_config>

A tls_config configures TLS connections.

# CA certificate to validate the server certificate with.
[ ca_file: <filepath> ]

# Certificate and key files for client cert authentication to the server.
[ cert_file: <filepath> ]
[ key_file: <filepath> ]

# ServerName extension to indicate the name of the server.
# http://tools.ietf.org/html/rfc4366#section-3.1
[ server_name: <string> ]

# Disable validation of the server certificate.
[ insecure_skip_verify: <boolean> | default = false ]
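
A possible use of these settings is client-certificate authentication against an internal endpoint. The fragment below is only a sketch: the file paths and host name are hypothetical, and the certificate files are assumed to exist.

```yaml
# Fragment of a receiver definition; all paths and names are made up.
webhook_configs:
  - url: 'https://hooks.example.internal/alerts'
    http_config:
      tls_config:
        # CA used to validate the server certificate.
        ca_file: 'tls/ca.crt'
        # Client certificate and key presented to the server.
        cert_file: 'tls/client.crt'
        key_file: 'tls/client.key'
        server_name: 'hooks.example.internal'
```

insecure_skip_verify is left at its default of false so that the server certificate is still validated.
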
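6. <receiver>

A receiver is a named configuration of one or more notification integrations.

We are not actively adding new receivers; we recommend implementing custom notification integrations via the webhook receiver.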

# The unique name of the receiver.
name: <string>

# Configurations for several notification integrations.
email_configs:
  [ - <email_config>, ... ]
hipchat_configs:
  [ - <hipchat_config>, ... ]
pagerduty_configs:
  [ - <pagerduty_config>, ... ]
pushover_configs:
  [ - <pushover_config>, ... ]
slack_configs:
  [ - <slack_config>, ... ]
opsgenie_configs:
  [ - <opsgenie_config>, ... ]
webhook_configs:
  [ - <webhook_config>, ... ]
victorops_configs:
  [ - <victorops_config>, ... ]
wechat_configs:
  [ - <wechat_config>, ... ]
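
Since one receiver can bundle several integrations, a single route can notify more than one channel. The sketch below assumes the SMTP settings and slack_api_url are already set in the global section; the receiver name, address, and channel are hypothetical.

```yaml
receivers:
  - name: 'team-ops'                 # hypothetical receiver name
    # Both integrations fire for every notification sent to 'team-ops'.
    email_configs:
      - to: 'ops@example.org'
    slack_configs:
      - channel: '#ops-alerts'
```
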
7. <email_config>

# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = false ]

# The email address to send notifications to.
to: <tmpl_string>

# The sender address.
[ from: <tmpl_string> | default = global.smtp_from ]

# The SMTP host through which emails are sent.
[ smarthost: <string> | default = global.smtp_smarthost ]

# The hostname to identify to the SMTP server.
[ hello: <string> | default = global.smtp_hello ]

# SMTP authentication information.
[ auth_username: <string> | default = global.smtp_auth_username ]
[ auth_password: <secret> | default = global.smtp_auth_password ]
[ auth_secret: <secret> | default = global.smtp_auth_secret ]
[ auth_identity: <string> | default = global.smtp_auth_identity ]

# The SMTP TLS requirement.
[ require_tls: <boolean> | default = global.smtp_require_tls ]

# TLS configuration.
tls_config:
  [ <tls_config> ]

# The HTML body of the email notification.
[ html: <tmpl_string> | default = '{{ template "email.default.html" . }}' ]
# The text body of the email notification.
[ text: <tmpl_string> ]

# Further headers email header key/value pairs. Overrides any headers
# previously set by the notification implementation.
[ headers: { <string>: <tmpl_string>, ... } ]
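
The following sketch shows an email integration that overrides the global SMTP defaults and sets a custom subject line. The addresses, host, and credentials are hypothetical, and the label alertname is assumed to be present on the alerts.

```yaml
email_configs:
  - to: 'oncall@example.org'
    from: 'alertmanager@example.org'
    smarthost: 'smtp.example.org:587'
    auth_username: 'alertmanager'
    auth_password: 'hypothetical-password'   # placeholder secret
    require_tls: true
    send_resolved: true
    headers:
      # Header values are template-expanded before the mail is sent.
      Subject: '[{{ .Status }}] {{ .CommonLabels.alertname }}'
```
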
8. <hipchat_config>

HipChat notifications use a Build Your Own integration.

# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = false ]

# The HipChat Room ID.
room_id: <tmpl_string>
# The auth token.
[ auth_token: <secret> | default = global.hipchat_auth_token ]
# The URL to send API requests to.
[ api_url: <string> | default = global.hipchat_api_url ]

# See https://www.hipchat.com/docs/apiv2/method/send_room_notification
# A label to be shown in addition to the sender's name.
[ from: <tmpl_string> | default = '{{ template "hipchat.default.from" . }}' ]
# The message body.
[ message: <tmpl_string> | default = '{{ template "hipchat.default.message" . }}' ]
# Whether this message should trigger a user notification.
[ notify: <boolean> | default = false ]
# Determines how the message is treated by the alertmanager and rendered inside HipChat. Valid values are 'text' and 'html'.
[ message_format: <string> | default = 'text' ]
# Background color for message.
[ color: <tmpl_string> | default = '{{ if eq .Status "firing" }}red{{ else }}green{{ end }}' ]

# The HTTP client's configuration.
[ http_config: <http_config> | default = global.http_config ]
9. <pagerduty_config>

PagerDuty notifications are sent via the PagerDuty API. PagerDuty provides documentation on how to set up the integration.

# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = true ]

# The following two options are mutually exclusive.
# The PagerDuty integration key (when using PagerDuty integration type `Events API v2`).
routing_key: <tmpl_secret>
# The PagerDuty integration key (when using PagerDuty integration type `Prometheus`).
service_key: <tmpl_secret>

# The URL to send API requests to
[ url: <string> | default = global.pagerduty_url ]

# The client identification of the Alertmanager.
[ client: <tmpl_string> | default = '{{ template "pagerduty.default.client" . }}' ]
# A backlink to the sender of the notification.
[ client_url: <tmpl_string> | default = '{{ template "pagerduty.default.clientURL" . }}' ]

# A description of the incident.
[ description: <tmpl_string> | default = '{{ template "pagerduty.default.description" .}}' ]

# Severity of the incident.
[ severity: <tmpl_string> | default = 'error' ]

# A set of arbitrary key/value pairs that provide further detail
# about the incident.
[ details: { <string>: <tmpl_string>, ... } | default = {
  firing:       '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
  resolved:     '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
  num_firing:   '{{ .Alerts.Firing | len }}'
  num_resolved: '{{ .Alerts.Resolved | len }}'
} ]

# Images to attach to the incident.
images:
  [ <image_config> ... ]

# Links to attach to the incident.
links:
  [ <link_config> ... ]

# The HTTP client's configuration.
[ http_config: <http_config> | default = global.http_config ]
9.1 <image_config>

The fields are documented in the PagerDuty API documentation.

source: <tmpl_string>
alt: <tmpl_string>
text: <tmpl_string>
9.2 <link_config>

The fields are documented in the PagerDuty API documentation.

href: <tmpl_string>
text: <tmpl_string>
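
Putting the PagerDuty pieces together, a sketch of an Events API v2 integration with one attached link might look like this. The key and URL are hypothetical placeholders; routing_key and service_key are mutually exclusive, so only routing_key is set.

```yaml
pagerduty_configs:
  - routing_key: 'hypothetical-integration-key'   # placeholder secret
    severity: 'critical'
    links:
      # Each entry uses the href and text fields of a link configuration.
      - href: 'https://runbooks.example.org/latency'
        text: 'Runbook'
```
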
10. <pushover_config>

Pushover notifications are sent via the Pushover API.

# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = true ]

# The recipient user’s user key.
user_key: <secret>

# Your registered application’s API token, see https://pushover.net/apps
token: <secret>

# Notification title.
[ title: <tmpl_string> | default = '{{ template "pushover.default.title" . }}' ]

# Notification message.
[ message: <tmpl_string> | default = '{{ template "pushover.default.message" . }}' ]

# A supplementary URL shown alongside the message.
[ url: <tmpl_string> | default = '{{ template "pushover.default.url" . }}' ]

# Priority, see https://pushover.net/api#priority
[ priority: <tmpl_string> | default = '{{ if eq .Status "firing" }}2{{ else }}0{{ end }}' ]

# How often the Pushover servers will send the same notification to the user.
# Must be at least 30 seconds.
[ retry: <duration> | default = 1m ]

# How long your notification will continue to be retried for, unless the user
# acknowledges the notification.
[ expire: <duration> | default = 1h ]

# The HTTP client's configuration.
[ http_config: <http_config> | default = global.http_config ]
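
As a small sketch of how retry and expire interact, the fragment below keeps the default priority and shortens the retry window. The user key and token are hypothetical placeholders.

```yaml
pushover_configs:
  - user_key: 'hypothetical-user-key'    # placeholder secret
    token: 'hypothetical-app-token'      # placeholder secret
    # Re-send every 2 minutes (the minimum allowed is 30 seconds) ...
    retry: 2m
    # ... and stop after 30 minutes unless the user acknowledges earlier.
    expire: 30m
```
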
11. <slack_config>

Slack notifications are sent via Slack webhooks. The notification contains an attachment.

# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = false ]

# The Slack webhook URL.
[ api_url: <secret> | default = global.slack_api_url ]

# The channel or user to send notifications to.
channel: <tmpl_string>

# API request data as defined by the Slack webhook API.
[ icon_emoji: <tmpl_string> ]
[ icon_url: <tmpl_string> ]
[ link_names: <boolean> | default = false ]
[ username: <tmpl_string> | default = '{{ template "slack.default.username" . }}' ]
# The following parameters define the attachment.
actions:
  [ <action_config> ... ]
[ callback_id: <tmpl_string> | default = '{{ template "slack.default.callbackid" . }}' ]
[ color: <tmpl_string> | default = '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}' ]
[ fallback: <tmpl_string> | default = '{{ template "slack.default.fallback" . }}' ]
fields:
  [ <field_config> ... ]
[ footer: <tmpl_string> | default = '{{ template "slack.default.footer" . }}' ]
[ pretext: <tmpl_string> | default = '{{ template "slack.default.pretext" . }}' ]
[ short_fields: <boolean> | default = false ]
[ text: <tmpl_string> | default = '{{ template "slack.default.text" . }}' ]
[ title: <tmpl_string> | default = '{{ template "slack.default.title" . }}' ]
[ title_link: <tmpl_string> | default = '{{ template "slack.default.titlelink" . }}' ]
[ image_url: <tmpl_string> ]
[ thumb_url: <tmpl_string> ]

# The HTTP client's configuration.
[ http_config: <http_config> | default = global.http_config ]
11.1 <action_config>

The fields are documented in the Slack API documentation.

type: <tmpl_string>
text: <tmpl_string>
url: <tmpl_string>
[ style: <tmpl_string> | default = '' ]
11.2 <field_config>

The fields are documented in the Slack API documentation.

title: <tmpl_string>
value: <tmpl_string>
[ short: <boolean> | default = slack_config.short_fields ]
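
To show how the attachment parameters, fields, and actions combine, here is a sketch of a Slack integration. The channel name is hypothetical, the webhook URL is assumed to come from global.slack_api_url, and the labels alertname and cluster are assumed to exist on the alerts.

```yaml
slack_configs:
  - channel: '#alerts'
    send_resolved: true
    title: '{{ .CommonLabels.alertname }} ({{ .Status }})'
    fields:
      # Rendered as a short (side-by-side) field in the attachment.
      - title: 'Cluster'
        value: '{{ .CommonLabels.cluster }}'
        short: true
    actions:
      # A link button pointing back to this Alertmanager.
      - type: 'button'
        text: 'Open Alertmanager'
        url: '{{ .ExternalURL }}'
```
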
12. <opsgenie_config>

OpsGenie notifications are sent via the OpsGenie API.

# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = true ]

# The API key to use when talking to the OpsGenie API.
[ api_key: <secret> | default = global.opsgenie_api_key ]

# The host to send OpsGenie API requests to.
[ api_url: <string> | default = global.opsgenie_api_url ]

# Alert text limited to 130 characters.
[ message: <tmpl_string> ]

# A description of the incident.
[ description: <tmpl_string> | default = '{{ template "opsgenie.default.description" . }}' ]

# A backlink to the sender of the notification.
[ source: <tmpl_string> | default = '{{ template "opsgenie.default.source" . }}' ]

# A set of arbitrary key/value pairs that provide further detail
# about the incident.
[ details: { <string>: <tmpl_string>, ... } ]

# Comma separated list of team responsible for notifications.
[ teams: <tmpl_string> ]

# Comma separated list of tags attached to the notifications.
[ tags: <tmpl_string> ]

# Additional alert note.
[ note: <tmpl_string> ]

# Priority level of alert. Possible values are P1, P2, P3, P4, and P5.
[ priority: <tmpl_string> ]

# The HTTP client's configuration.
[ http_config: <http_config> | default = global.http_config ]
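
One way to use the templated priority field is to derive it from a label. The sketch below assumes alerts carry a severity label; the API key, team names, and tags are hypothetical.

```yaml
opsgenie_configs:
  - api_key: 'hypothetical-api-key'      # placeholder secret
    message: '{{ .CommonLabels.alertname }}'
    # Map the (assumed) severity label onto an OpsGenie priority.
    priority: '{{ if eq .CommonLabels.severity "critical" }}P1{{ else }}P3{{ end }}'
    # Both fields take comma-separated lists.
    teams: 'database,platform'
    tags: 'prometheus,production'
```
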
13. <victorops_config>

VictorOps notifications are sent out via the VictorOps API.

# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = true ]

# The API key to use when talking to the VictorOps API.
[ api_key: <secret> | default = global.victorops_api_key ]

# The VictorOps API URL.
[ api_url: <string> | default = global.victorops_api_url ]

# A key used to map the alert to a team.
routing_key: <tmpl_string>

# Describes the behavior of the alert (CRITICAL, WARNING, INFO).
[ message_type: <tmpl_string> | default = 'CRITICAL' ]

# Contains summary of the alerted problem.
[ entity_display_name: <tmpl_string> | default = '{{ template "victorops.default.entity_display_name" . }}' ]

# Contains long explanation of the alerted problem.
[ state_message: <tmpl_string> | default = '{{ template "victorops.default.state_message" . }}' ]

# The monitoring tool the state message is from.
[ monitoring_tool: <tmpl_string> | default = '{{ template "victorops.default.monitoring_tool" . }}' ]

# The HTTP client's configuration.
[ http_config: <http_config> | default = global.http_config ]
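
A brief sketch of a VictorOps integration follows. The API key and routing key are hypothetical, and the severity label used to pick the message type is an assumption about how the alerts are labelled.

```yaml
victorops_configs:
  - api_key: 'hypothetical-api-key'      # placeholder secret
    routing_key: 'database-team'         # hypothetical team routing key
    # Choose the message type from the (assumed) severity label.
    message_type: '{{ if eq .CommonLabels.severity "critical" }}CRITICAL{{ else }}WARNING{{ end }}'
```
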
14. <webhook_config>

The webhook receiver allows configuring a generic receiver.

# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = true ]

# The endpoint to send HTTP POST requests to.
url: <string>

# The HTTP client's configuration.
[ http_config: <http_config> | default = global.http_config ]

Alertmanager sends HTTP POST requests to the configured endpoint in the following JSON format:

{
  "version": "4",
  "groupKey": <string>,    // key identifying the group of alerts (e.g. to deduplicate)
  "status": "<resolved|firing>",
  "receiver": <string>,
  "groupLabels": <object>,
  "commonLabels": <object>,
  "commonAnnotations": <object>,
  "externalURL": <string>,  // backlink to the Alertmanager.
  "alerts": [
    {
      "status": "<resolved|firing>",
      "labels": <object>,
      "annotations": <object>,
      "startsAt": "<rfc3339>",
      "endsAt": "<rfc3339>",
      "generatorURL": <string> // identifies the entity that caused the alert
    },
    ...
  ]
}
 
  

There is a list of integrations with this feature.

15. <wechat_config>

WeChat notifications are sent via the WeChat API.

# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = false ]

# The API key to use when talking to the WeChat API.
[ api_secret: <secret> | default = global.wechat_api_secret ]

# The WeChat API URL.
[ api_url: <string> | default = global.wechat_api_url ]

# The corp id for authentication.
[ corp_id: <string> | default = global.wechat_api_corp_id ]

# API request data as defined by the WeChat API.
[ message: <tmpl_string> | default = '{{ template "wechat.default.message" . }}' ]
[ agent_id: <string> | default = '{{ template "wechat.default.agent_id" . }}' ]
[ to_user: <string> | default = '{{ template "wechat.default.to_user" . }}' ]
[ to_party: <string> | default = '{{ template "wechat.default.to_party" . }}' ]
[ to_tag: <string> | default = '{{ template "wechat.default.to_tag" . }}' ]
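
For completeness, a sketch of a WeChat integration is shown below. All identifiers (corp id, secret, agent id, party id) are hypothetical placeholders, and the API URL is left at the global default.

```yaml
wechat_configs:
  - corp_id: 'hypothetical-corp-id'
    api_secret: 'hypothetical-secret'    # placeholder secret
    agent_id: '1000002'                  # hypothetical application id
    # Deliver to a department (party) rather than to individual users.
    to_party: '2'                        # hypothetical party id
```
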

16.

Official Prometheus website: https://prometheus.io/
My GitHub: https://github.com/Alrights/prometheus
