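Contents
Preface
1. Install Monitoring
2. Install prometheus-webhook-dingtalk
2.1 Configure the DingTalk alert configuration file
2.2 Create the DingTalk alert template
2.3 Create the dingtalk ConfigMap
2.4 Install dingtalk
2.5 How to call dingtalk
3. Configure alerting
3.1 Configure alert receivers
3.2 Configure the default receiver for the route
4. Testing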
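Preface
Reference: the RancherLabs article "Rancher2.6全新Monitoring快速入门" (Rancher 2.6 Monitoring Quick Start), published on the RancherLabs CSDN blog.
This post supplements the configuration notes for Rancher Monitoring with a walkthrough of sending Alertmanager alerts to DingTalk. The alerts are pushed to a DingTalk robot by installing the prometheus-webhook-dingtalk plugin. Details follow.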
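1. Install Monitoring
In the Rancher UI, go to Cluster > Cluster Tools > Monitoring.
Adjust the deployment options to your needs, for example persistent storage and data retention time.
After installation, the dashboards are available under Cluster > Monitoring.
From there you can open the Alertmanager, Grafana, and Prometheus web UIs.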
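2. Install prometheus-webhook-dingtalk
Note: create the DingTalk robot first and enable the secret (signature) option in its security settings. The robot's webhook URL and secret are used in the configuration file below.
2.1 Configure the DingTalk alert configuration file
Create the dingtalk config.yml: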
# Replace url and secret with your own robot's webhook URL and secret
templates:
  - /etc/prometheus-webhook-dingtalk/dingding.tmpl
targets:
  k8s:
    url: https://oapi.dingtalk.com/robot/send?access_token=<your-webhook-access-token>
    secret: "<your-webhook-secret>"
    message:
      title: '{{ template "ops.title" . }}'
      text: '{{ template "ops.content" . }}'
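To make clear what the secret is for, here is a minimal sketch of the signing scheme described in DingTalk's custom robot documentation: an HMAC-SHA256 over the millisecond timestamp, a newline, and the secret, which is then base64-encoded and URL-encoded. As far as I know, prometheus-webhook-dingtalk does this for you when secret is set, so nothing below needs to be deployed. The function names dingtalk_sign and signed_url are hypothetical and only serve the illustration.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
from typing import Optional


def dingtalk_sign(secret: str, timestamp_ms: int) -> str:
    """Hypothetical helper: URL-encoded signature for a DingTalk robot request."""
    # The string to sign is "<timestamp in ms>\n<secret>", keyed with the secret itself.
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    # Base64-encode the raw digest, then URL-encode it for use in a query string.
    return urllib.parse.quote_plus(base64.b64encode(digest))


def signed_url(webhook_url: str, secret: str, timestamp_ms: Optional[int] = None) -> str:
    """Hypothetical helper: append timestamp and sign to a robot webhook URL."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    sign = dingtalk_sign(secret, timestamp_ms)
    # Assumes webhook_url already carries ?access_token=..., as in config.yml.
    return f"{webhook_url}&timestamp={timestamp_ms}&sign={sign}"
```

If the secret in config.yml does not match the one shown in the robot's security settings, the signature will not verify and DingTalk is expected to reject the message.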
2.2 Create the DingTalk alert template
dingding.tmpl:
{{ define "__subject" }}
[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}]
{{ end }}
{{ define "__alert_list" }}{{ range . }}
---
[Alert source]: gz-test-k8s
[Severity]: {{ index .Labels.severity }}
[Alert name]: {{ index .Labels.alertname }}
[Instance]: {{ index .Labels.instance }}
[Summary]: {{ index .Annotations.summary }}
[Description]: {{ index .Annotations.description }}
[Started at]: {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
{{ end }}{{ end }}
{{ define "__resolved_list" }}{{ range . }}
---
[Alert source]: gz-test-k8s
[Severity]: {{ index .Labels.severity }}
[Alert name]: {{ index .Labels.alertname }}
[Instance]: {{ index .Labels.instance }}
[Description]: {{ index .Annotations.description }}
[Status]: Resolved
[Started at]: {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
[Resolved at]: {{ (.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
{{ end }}{{ end }}
{{ define "ops.title" }}
{{ template "__subject" . }}
{{ end }}
{{ define "ops.content" }}
{{ if gt (len .Alerts.Firing) 0 }}
**{{ .Alerts.Firing | len }} alert(s) firing**
{{ template "__alert_list" .Alerts.Firing }}
---
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}
**{{ .Alerts.Resolved | len }} alert(s) resolved**
{{ template "__resolved_list" .Alerts.Resolved }}
{{ end }}
{{ end }}
{{ define "ops.link.title" }}{{ template "ops.title" . }}{{ end }}
{{ define "ops.link.content" }}{{ template "ops.content" . }}{{ end }}
{{ template "ops.title" . }}
{{ template "ops.content" . }}
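The expression .StartsAt.Add 28800e9 in the template may look opaque: 28800e9 is a duration in nanoseconds, i.e. 28800 seconds or 8 hours, which shifts the timestamp to UTC+8 before formatting. The sketch below reproduces the same arithmetic in Python, assuming the alert timestamps arrive in UTC. The helper name to_utc8 is hypothetical, and the strftime pattern is my equivalent of the Go layout "2006-01-02 15:04:05".

```python
from datetime import datetime, timedelta, timezone

# Same literal as in the template: 28800e9 nanoseconds = 28800 s = 8 h.
UTC8_OFFSET_NS = 28800e9


def to_utc8(ts: datetime) -> str:
    """Hypothetical helper: shift a UTC timestamp by 8 hours and format it."""
    shifted = ts + timedelta(seconds=UTC8_OFFSET_NS / 1e9)
    # Python counterpart of the Go layout "2006-01-02 15:04:05".
    return shifted.strftime("%Y-%m-%d %H:%M:%S")
```

For a cluster in another time zone, only the offset literal in the template would need to change, for example 3600e9 for UTC+1.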
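2.3 Create the dingtalk ConfigMap
Save the two files above as config.yml and dingding.tmpl, then create a ConfigMap from them with kubectl: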
$ kubectl create configmap dingtalk-cm --from-file=config.yml=config.yml --from-file=dingding.tmpl=dingding.tmpl -n cattle-monitoring-system
$ kubectl get cm dingtalk-cm -n cattle-monitoring-system
NAME DATA AGE
dingtalk-cm 2 87m
2.4 Install dingtalk
Deployment manifest dingtalk.yaml:
apiVersion: v1
kind: Service
metadata:
  name: dingtalk
  namespace: cattle-monitoring-system
spec:
  selector:
    app: dingtalk
  ports:
    - name: http
      protocol: TCP
      port: 8060
      targetPort: 8060
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dingtalk
  namespace: cattle-monitoring-system
  labels:
    app: dingtalk
spec:
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  selector:
    matchLabels:
      app: dingtalk
  template:
    metadata:
      labels:
        app: dingtalk
    spec:
      restartPolicy: "Always"
      containers:
        - name: dingtalk
          image: timonwong/prometheus-webhook-dingtalk
          imagePullPolicy: "IfNotPresent"
          volumeMounts:
            - name: dingtalk-conf
              mountPath: /etc/prometheus-webhook-dingtalk/
          resources:
            limits:
              cpu: "400m"
              memory: "500Mi"
            requests:
              cpu: "100m"
              memory: "100Mi"
          ports:
            - containerPort: 8060
              name: http
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            periodSeconds: 5
            initialDelaySeconds: 30
            successThreshold: 1
            tcpSocket:
              port: 8060
          livenessProbe:
            tcpSocket:
              port: 8060
            initialDelaySeconds: 30
            periodSeconds: 10
      volumes:
        - name: dingtalk-conf
          configMap:
            name: dingtalk-cm
Deploy the manifest into the cattle-monitoring-system namespace (for example with kubectl apply -f dingtalk.yaml), then check that the Deployment and Service are up:
$ kubectl get deployments.apps dingtalk -n cattle-monitoring-system
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
dingtalk   1/1     1            1           68m
$ kubectl get svc dingtalk -n cattle-monitoring-system
NAME       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
dingtalk   ClusterIP   10.43.245.38   <none>        8060/TCP   69m
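2.5 How to call dingtalk
# When dingtalk runs in the same namespace as Alertmanager, it can be called as follows
# http://dingtalk:8060/dingtalk/<your target name in config.yml>/send
# For example, with the k8s target configured above, the address is
http://dingtalk:8060/dingtalk/k8s/send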
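The address pattern above can be sketched as a small helper. The short host name dingtalk only resolves from within the same namespace. For a caller in another namespace, the sketch assumes the usual Kubernetes service DNS form service.namespace.svc. The function name dingtalk_send_url and its parameters are hypothetical.

```python
from typing import Optional


def dingtalk_send_url(
    target: str,
    namespace: Optional[str] = None,
    service: str = "dingtalk",
    port: int = 8060,
) -> str:
    """Hypothetical helper: build the send URL for a target defined in config.yml."""
    # Same namespace: the bare service name is enough.
    # Other namespace (assumption): use the <service>.<namespace>.svc DNS name.
    host = service if namespace is None else f"{service}.{namespace}.svc"
    return f"http://{host}:{port}/dingtalk/{target}/send"
```

Calling dingtalk_send_url("k8s") yields the address shown above. Passing namespace="cattle-monitoring-system" gives the form a client outside that namespace would presumably need.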
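3. Configure alerting
3.1 Configure alert receivers
Go to Cluster > Monitoring > Alerting > Routes and Receivers > Receivers > Create.
Create a default receiver that receives both email and DingTalk alerts. For DingTalk, add a webhook whose URL is the dingtalk service address shown above (http://dingtalk:8060/dingtalk/k8s/send).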
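3.2 Configure the default receiver for the route
Go to Cluster > Monitoring > Alerting > Routes and Receivers > Routes.
The root route matches all alerts and cannot be deleted. Set its receiver to the default receiver created above.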
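Why the root route acts as a catch-all can be illustrated with a deliberately simplified model of Alertmanager routing: an alert is handed to the first child route whose label matchers all match, and falls back to the current route's receiver otherwise. This sketch ignores regex matchers, the continue option, and grouping. The Route class and pick_receiver function are hypothetical names, not Alertmanager or Rancher APIs.

```python
from typing import Dict, List, Optional


class Route:
    """Hypothetical, simplified route: a receiver, equality matchers, child routes."""

    def __init__(
        self,
        receiver: str,
        match: Optional[Dict[str, str]] = None,
        routes: Optional[List["Route"]] = None,
    ) -> None:
        self.receiver = receiver
        self.match = match or {}   # an empty matcher set matches every alert
        self.routes = routes or []


def pick_receiver(route: Route, labels: Dict[str, str]) -> str:
    """Return the receiver for an alert with the given labels."""
    for child in route.routes:
        # A child route applies only if all of its matchers equal the alert's labels.
        if all(labels.get(key) == value for key, value in child.match.items()):
            return pick_receiver(child, labels)
    # No child matched: the alert stays with this route's receiver.
    return route.receiver
```

With a root route whose receiver is the default one and no child routes, every alert ends up at the default receiver, which is the setup this section describes.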
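4. Testing
After installation, Rancher Monitoring ships with default alerting rules for components, nodes, and pods. They can be viewed, modified, or extended under Cluster > Monitoring > Advanced > PrometheusRule, as needed.
With the receiver and route configured as described in this article, a triggered alert produces both an email notification and a DingTalk alert.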
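Rather than waiting for a real rule to fire, the dingtalk service can be exercised by hand. The sketch below builds a minimal payload in the Alertmanager webhook format (version 4) and posts it to the send URL. The alert name, label values, and function names are made up for illustration, and the payload contains only the fields the template in this article reads, plus the standard envelope.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(alertname: str = "DingtalkTest", severity: str = "warning") -> dict:
    """Hypothetical helper: a single firing alert in Alertmanager webhook format."""
    labels = {"alertname": alertname, "severity": severity, "instance": "test-instance"}
    annotations = {"summary": "Test alert", "description": "Manual test of the DingTalk webhook"}
    alert = {
        "status": "firing",
        "labels": labels,
        "annotations": annotations,
        "startsAt": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "endsAt": "0001-01-01T00:00:00Z",  # zero time: the alert has not ended
        "generatorURL": "",
    }
    return {
        "version": "4",
        "groupKey": "manual-test",
        "status": "firing",
        "receiver": "k8s",  # illustrative receiver name
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "",
        "alerts": [alert],
    }


def send_test_alert(url: str, payload: dict) -> int:
    """Hypothetical helper: POST the payload and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

One way to use it from a workstation, assuming kubectl access: run kubectl -n cattle-monitoring-system port-forward svc/dingtalk 8060:8060 and then call send_test_alert("http://localhost:8060/dingtalk/k8s/send", build_test_alert()). Note that this sends a real message to the robot's DingTalk group.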