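My previous post, 《Prometheus - SSL 证书过期监控》 ("Prometheus - SSL certificate expiry monitoring"), set up a Grafana dashboard for SSL certificate expiry. This time I'll add alerting on top of it, which is what we were really after all along.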
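1. First, find out where Prometheus loads rule files from. This is the `rule_files` setting in `prometheus.yml`, and it must cover the rule file written in the next step.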
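2. Write the alert rules:

```shell
vim /home/data/prometheus/rules/ssl_cert_alerts.yml
```

```yaml
groups:
  - name: "SSL certificate expiry reminders"
    rules:
      - alert: "SSL certificate expires in <30 days"
        # job must match the scrape job name configured in the previous post
        expr: probe_ssl_earliest_cert_expiry{job="SSL证书时间"} - time() < 86400 * 30
        for: 0s
        labels:
          severity: "notice"
        annotations:
          summary: "{{ $labels.instance }} SSL certificate expires within 30 days, please renew it in time!"
          description: "{{ $labels.instance }} SSL certificate expires within 30 days, please renew it in time!"
      - alert: "SSL certificate expires in <7 days"
        expr: probe_ssl_earliest_cert_expiry{job="SSL证书时间"} - time() < 86400 * 7
        for: 0s
        labels:
          severity: "warning"
        annotations:
          summary: "{{ $labels.instance }} SSL certificate expires within 7 days, please renew it in time!"
          description: "{{ $labels.instance }} SSL certificate expires within 7 days, please renew it in time!"
      - alert: "SSL certificate expires in <1 day"
        expr: probe_ssl_earliest_cert_expiry{job="SSL证书时间"} - time() < 86400 * 1
        for: 0s
        labels:
          severity: "disaster"
        annotations:
          summary: "{{ $labels.instance }} SSL certificate expires within 1 day, please renew it in time!"
          description: "{{ $labels.instance }} SSL certificate expires within 1 day, please renew it in time!"
```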
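Each expression subtracts the current time from the certificate's expiry timestamp and compares the remaining seconds against `86400 * N`, that is, N days. Every comparison is "less than", so the thresholds overlap: a certificate with under 1 day left satisfies all three expressions at once. The small sketch below only replays that arithmetic in Python to make the overlap visible. The function name and the list of day thresholds are illustrative and are not part of Prometheus.

```python
# Day thresholds taken from the three rules above (86400 * 30, * 7, * 1).
THRESHOLD_DAYS = [30, 7, 1]
SECONDS_PER_DAY = 86400


def firing_thresholds(cert_expiry_ts: float, now: float) -> list[int]:
    """Return the day thresholds whose rule expression
    probe_ssl_earliest_cert_expiry - time() < 86400 * N would be true.

    Hypothetical helper for illustration only; it mimics the PromQL
    arithmetic and does not talk to Prometheus.
    """
    remaining = cert_expiry_ts - now  # seconds until the certificate expires
    return [days for days in THRESHOLD_DAYS if remaining < SECONDS_PER_DAY * days]


if __name__ == "__main__":
    now = 1_700_000_000.0
    # A hypothetical certificate expiring 5 days from "now".
    print(firing_thresholds(now + 5 * SECONDS_PER_DAY, now))  # [30, 7]
```

An already-expired certificate gives a negative remaining time, so it keeps all three rules firing until it is renewed.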
3. Restart Prometheus so that it loads the new rule file:

```shell
docker restart prometheus
```
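After the restart, you can confirm that the rules were actually loaded through Prometheus's HTTP API endpoint `/api/v1/rules`. For every alerting rule it reports the name and the current state (`inactive`, `pending` or `firing`). The sketch below assumes Prometheus is reachable at `http://localhost:9090`; that address is an assumption, so adjust it to your deployment. The parsing is kept separate from the HTTP call so it can be tried on any saved response.

```python
import json
import urllib.request

# Assumption: Prometheus listens on its default port on this host.
PROMETHEUS_URL = "http://localhost:9090"


def alerting_rule_states(rules_response: dict) -> dict[str, str]:
    """Map each alerting rule name to its state from a /api/v1/rules body.

    Recording rules are skipped; only rules with type "alerting" are kept.
    """
    states = {}
    for group in rules_response["data"]["groups"]:
        for rule in group["rules"]:
            if rule.get("type") == "alerting":
                states[rule["name"]] = rule.get("state", "unknown")
    return states


def fetch_rules(base_url: str = PROMETHEUS_URL) -> dict:
    """Fetch the currently loaded rule groups from Prometheus."""
    with urllib.request.urlopen(f"{base_url}/api/v1/rules", timeout=10) as resp:
        return json.load(resp)
```

Running `print(alerting_rule_states(fetch_rules()))` should list the three SSL rules. If they are missing, the rule file is most likely not covered by `rule_files`, or it is not visible inside the container.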
1. Edit the Alertmanager configuration file:

```shell
vim /home/data/alertmanager/conf/config.yml
```

```yaml
global:
  resolve_timeout: 5m
route:
  group_wait: 0s
  group_interval: 5s
  repeat_interval: 1m
  group_by: ['instance']
  receiver: 'web.hook.prometheusalert'
receivers:
  - name: 'web.hook.prometheusalert'
    webhook_configs:
      - url: 'http://YourDingTalk_IP:8060/dingtalk/webhook1/send'
```
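With `group_by: ['instance']`, Alertmanager bundles all alerts that share the same `instance` label into one group, and each group goes to the receiver as a single notification. The three SSL rules for one domain therefore tend to arrive together. The timing settings work as follows:

- `group_wait: 0s` sends the first notification for a new group right away.
- `group_interval: 5s` controls how soon newly added alerts in an existing group are sent.
- `repeat_interval: 1m` re-sends a still-firing group roughly every minute.

The sketch below is a simplified model of the grouping step only. It is not Alertmanager's code, and the alert dictionaries are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts: list[dict], group_by: list[str]) -> dict[tuple, list[dict]]:
    """Bucket alerts the way a route with `group_by` does: alerts with equal
    values for every group_by label land in the same group.

    Simplification: a missing label is treated as the empty string.
    """
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


if __name__ == "__main__":
    alerts = [  # hypothetical alerts for two monitored sites
        {"labels": {"alertname": "cert<30d", "instance": "https://a.example"}},
        {"labels": {"alertname": "cert<7d", "instance": "https://a.example"}},
        {"labels": {"alertname": "cert<30d", "instance": "https://b.example"}},
    ]
    for key, members in group_alerts(alerts, ["instance"]).items():
        print(key, [a["labels"]["alertname"] for a in members])
```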
2. Restart Alertmanager:

```shell
docker restart alertmanager
```
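1. Edit the prometheus-webhook-dingtalk configuration file. `webhook1` is the target name used in the Alertmanager webhook URL above, and `secret` is the signing secret of the DingTalk robot:

```shell
vim /home/data/dingtalk/conf/config.yml
```

```yaml
templates:
  - /etc/prometheus-webhook-dingtalk/templates/default.tmpl
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=8cf8d025f***a4537b22
    secret: SECb***95fbab
    mention:
      all: true
```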
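When the robot uses DingTalk's signature ("加签") security setting, each request must carry a millisecond `timestamp` and a `sign` value. prometheus-webhook-dingtalk computes both itself when `secret` is set. The sketch below reproduces the documented scheme, which is useful if you want to test the robot with a hand-built request: HMAC-SHA256 over `"<timestamp>\n<secret>"` keyed with the secret, then Base64, then URL-encoding. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
from typing import Optional


def dingtalk_sign(secret: str, timestamp_ms: int) -> str:
    """HMAC-SHA256 of "<timestamp>\\n<secret>" keyed by the secret,
    Base64-encoded and then URL-encoded."""
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(secret.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    return urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))


def signed_webhook_url(webhook_url: str, secret: str,
                       timestamp_ms: Optional[int] = None) -> str:
    """Append timestamp and sign to a robot URL that already has ?access_token=."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # DingTalk expects milliseconds
    sign = dingtalk_sign(secret, timestamp_ms)
    return f"{webhook_url}&timestamp={timestamp_ms}&sign={sign}"
```

DingTalk only accepts timestamps close to the current time, so a signed URL should be used right after it is generated.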
2. Edit the message template (`...` marks parts of the file omitted here).

Note: the `templates` path in the configuration above does not match this host path because DingTalk runs in a container. The configuration refers to the path inside the container.

```shell
vim /home/data/dingtalk/templates/default.tmpl
```

```
...
...
{{/* Firing */}}
{{ define "default.__text_alert_list" }}{{ range . }}
**Fired at:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**Summary:** {{ .Annotations.summary }}
**Description:** {{ .Annotations.description }}
**Dashboard:** [grafana](http://grafana_ip:8000/grafana/d/GuJ5DHMnz/fu-wu-qi-jian-kong-tu-biao?orgId=1)
**Details:**
{{ range .Labels.SortedPairs }}{{ if and (ne (.Name) "severity") (ne (.Name) "summary") }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}{{ end }}
{{ end }}{{ end }}
{{/* Resolved */}}
{{ define "default.__text_resolved_list" }}{{ range . }}
**Fired at:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**Resolved at:** {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
**Summary:** {{ .Annotations.summary }}
**Dashboard:** [grafana](http://grafana_ip:8000/grafana/d/GuJ5DHMnz/fu-wu-qi-jian-kong-tu-biao?orgId=1)
**Details:**
{{ range .Labels.SortedPairs }}{{ if and (ne (.Name) "severity") (ne (.Name) "summary") }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}{{ end }}
{{ end }}{{ end }}
...
...
```
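The firing template does two things for each alert:

- It converts `StartsAt` (UTC) to Asia/Shanghai time using the Go layout `2006.01.02 15:04:05`.
- It lists every label except `severity` and `summary`, sorted by name.

The Python sketch below roughly mirrors that logic so the expected output is easy to picture. It is not the template engine, and it skips the `markdown | html` escaping and the Grafana link. One assumption: Asia/Shanghai is modelled as a fixed UTC+8 offset (China has not used daylight saving time since 1991), which avoids depending on system tzdata. Sub-second digits are dropped because the output only shows whole seconds.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC+8 instead of the Asia/Shanghai tz database entry.
SHANGHAI = timezone(timedelta(hours=8))
HIDDEN_LABELS = {"severity", "summary"}  # same exclusions as the template


def format_time(rfc3339: str) -> str:
    """Rough equivalent of dateInZone "2006.01.02 15:04:05" ... "Asia/Shanghai"."""
    value = rfc3339.replace("Z", "+00:00")
    value = re.sub(r"\.\d+", "", value)  # drop fractional seconds (truncate)
    ts = datetime.fromisoformat(value)
    return ts.astimezone(SHANGHAI).strftime("%Y.%m.%d %H:%M:%S")


def render_firing(alert: dict) -> str:
    """Build the firing message body for one alert, like the template above."""
    lines = [
        f"**Fired at:** {format_time(alert['startsAt'])}",
        f"**Summary:** {alert['annotations'].get('summary', '')}",
        f"**Description:** {alert['annotations'].get('description', '')}",
        "**Details:**",
    ]
    for name, value in sorted(alert["labels"].items()):  # like SortedPairs
        if name not in HIDDEN_LABELS:
            lines.append(f"> - {name}: {value}")
    return "\n".join(lines)
```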
3. Restart DingTalk:

```shell
docker restart dingtalk
```
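You don't have to wait for a real certificate to get close to expiry to check the whole chain. You can post a synthetic alert straight to Alertmanager's `/api/v2/alerts` endpoint. The sketch below assumes Alertmanager's default port 9093 on `localhost`. The alert name and label values are hypothetical test values. Because `endsAt` is set a few minutes ahead, Alertmanager should treat the alert as resolved after that time. Webhook receivers send resolved notifications by default, so a "resolved" message should follow as well.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumption: Alertmanager listens on its default port on this host.
ALERTMANAGER_URL = "http://localhost:9093"


def build_test_alert_request(instance: str, base_url: str = ALERTMANAGER_URL,
                             minutes: int = 5) -> urllib.request.Request:
    """Build a POST to /api/v2/alerts carrying one synthetic alert."""
    now = datetime.now(timezone.utc)
    alert = {
        "labels": {
            "alertname": "SSLCertTestAlert",  # hypothetical, for testing only
            "instance": instance,             # grouped by this label
            "severity": "test",
        },
        "annotations": {
            "summary": f"{instance} test alert, please ignore",
            "description": "Synthetic alert posted by hand to test routing.",
        },
        "startsAt": now.isoformat(),
        # Once endsAt has passed, Alertmanager considers the alert resolved.
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }
    return urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps([alert]).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(build_test_alert_request("https://example.com"), timeout=10)` should produce a DingTalk message almost immediately, given `group_wait: 0s`.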
1. DingTalk alert (firing) notification
2. DingTalk alert-resolved notification
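Overall, none of this is difficult. The key is to keep the whole chain clear in your head: Prometheus rule → Alertmanager → prometheus-webhook-dingtalk → DingTalk robot. Beyond that, just be careful while configuring each piece. Next time I'll dig into how alerting works internally and when exactly alerts are fired.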