1. Background
This article is a follow-up to "k8s prometheus adapter: extending k8s with dynamic scaling based on prometheus".
It focuses on how to configure alertmanager so that it pushes alerts via a webhook.
The overall workflow is shown in the figure below:
2. Configuration Steps
2.1 Implementing the alert server
The alert server receives webhook calls from alertmanager and, following whatever business rules apply, notifies the people concerned.
Typically this means sending SMS messages, emails, and so on.
To keep things simple, we use the flask framework and a small program to simulate this process.
Write the alert server program, app.py:
from flask import Flask, request
import json

app = Flask(__name__)

@app.route('/send', methods=['POST'])
def send():
    try:
        # Parse the JSON body posted by alertmanager
        data = json.loads(request.data)
        alerts = data['alerts']
        for i in alerts:
            # Simulate an SMS notification for each alert
            print('SEND SMS: ' + str(i))
    except Exception as e:
        print(e)
    return 'ok'
This code exposes a single endpoint (/send) that receives the webhook call,
parses the alert payload, and sends an SMS for each alert (here it simply prints it ^_^|||).
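Before containerizing it, you can smoke-test the endpoint locally. This is just a quick sketch: it assumes the app has been started with flask run on port 5000, and the payload is a hypothetical, minimal imitation of the Alertmanager webhook body containing only the alerts field that app.py actually reads:

curl -X POST http://127.0.0.1:5000/send \
  -H 'Content-Type: application/json' \
  -d '{"alerts": [{"status": "firing", "labels": {"alertname": "TestAlert", "severity": "warning"}}]}'

If everything works, the server prints a SEND SMS line and responds with ok.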
2.2 Building and deploying the alert server image
Write the Dockerfile:
FROM python
RUN pip install flask
COPY app.py /app.py
COPY run.sh /run.sh
RUN chmod +x /run.sh
EXPOSE 5000
ENTRYPOINT ["/run.sh"]
The run.sh startup script looks like this:
#!/bin/bash
cd /
export FLASK_APP=app.py
flask run -h 0.0.0.0
Put app.py and run.sh in the same directory, then run the following command in that directory to build the image:
docker build -t image.docker.ssdc.solutions/ctsi/flask-alert:1.0 .
Push the image to the private registry:
docker push image.docker.ssdc.solutions/ctsi/flask-alert:1.0
Deploy the alert server by defining a Deployment and Service in flask-alert.yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: flask-alert-deployment
  namespace: monitoring
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: flask-alert
    spec:
      containers:
      - name: flask-alert
        image: image.docker.ssdc.solutions/ctsi/flask-alert:1.0
        imagePullPolicy: Always
        ports:
        - containerPort: 5000
          name: http
        volumeMounts:
        - name: localtime
          mountPath: /etc/localtime
      volumes:
      - name: localtime
        hostPath:
          path: /etc/localtime
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: flask-alert-service
  name: flask-alert-service
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http
    port: 5000
    targetPort: http
  selector:
    app: flask-alert
kubectl create -f flask-alert.yaml
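Optionally, confirm that the Deployment and the headless Service came up as expected (a quick sanity check, assuming you have kubectl access to the cluster):

kubectl get pods -n monitoring -l app=flask-alert
kubectl get svc flask-alert-service -n monitoring

The pod should reach the Running state, and the Service should show a CLUSTER-IP of None, since it is headless.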
2.3 Configuring the alertmanager webhook address
prometheus alertmanager supports automatic discovery and reloading of its configuration,
so we only need to regenerate the configuration secret.
First, delete the existing configuration:
kubectl delete secret alertmanager-main -n monitoring
Then write a webhook configuration file named alertmanager.yaml:
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://flask-alert-service.monitoring:5000/send'
Note that the url must match the address exposed by flask-alert-service: the service name plus its namespace (flask-alert-service.monitoring), port 5000, and the /send path. You can check that the name resolves inside the cluster as shown below.
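A quick way to verify the in-cluster DNS name is to run nslookup from a throwaway pod (busybox:1.28 is used here because its nslookup behaves well for this kind of check):

kubectl run -i --tty --rm dns-test --image=busybox:1.28 --restart=Never -- \
  nslookup flask-alert-service.monitoring.svc.cluster.local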
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring
Confirm that the alertmanager configuration has been updated correctly.
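One way to check, assuming the standard kube-prometheus layout (the secret key is alertmanager.yaml and Alertmanager is exposed through the alertmanager-main Service), is to decode the secret directly, or to port-forward to the Alertmanager UI and look at its Status page:

kubectl get secret alertmanager-main -n monitoring \
  -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d

kubectl port-forward svc/alertmanager-main -n monitoring 9093:9093
# then open http://localhost:9093/#/status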
Then check the logs of the flask-alert-deployment container: it has received webhook alerts from alertmanager
and has already simulated sending the SMS messages!
* Serving Flask app "app.py"
* Environment: production
WARNING: Do not use the development server in a production environment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
172.20.198.85 - - [23/Aug/2018 12:55:29] "POST /send HTTP/1.1" 200 -
172.20.198.85 - - [23/Aug/2018 12:55:44] "POST /send HTTP/1.1" 200 -
172.20.198.85 - - [23/Aug/2018 12:55:44] "POST /send HTTP/1.1" 200 -
172.20.179.99 - - [23/Aug/2018 12:55:45] "POST /send HTTP/1.1" 200 -
172.20.179.99 - - [23/Aug/2018 12:55:45] "POST /send HTTP/1.1" 200 -
172.20.198.85 - - [23/Aug/2018 12:55:47] "POST /send HTTP/1.1" 200 -
172.20.179.99 - - [23/Aug/2018 12:55:47] "POST /send HTTP/1.1" 200 -
172.20.198.85 - - [23/Aug/2018 12:55:47] "POST /send HTTP/1.1" 200 -
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubePersistentVolumeUsageCritical', 'endpoint': 'https-metrics', 'instance': '192.168.13.34:10250', 'job': 'kubelet', 'namespace': 'default', 'persistentvolumeclaim': 'jenkins', 'prometheus': 'monitoring/k8s', 'service': 'kubelet', 'severity': 'critical'}, 'annotations': {'message': 'The persistent volume claimed by jenkins in namespace default has 0% free.', 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeusagecritical'}, 'startsAt': '2018-08-20T06:01:26.025235557Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=100+%2A+kubelet_volume_stats_available_bytes%7Bjob%3D%22kubelet%22%7D+%2F+kubelet_volume_stats_capacity_bytes%7Bjob%3D%22kubelet%22%7D+%3C+3&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'TargetDown', 'job': 'msapi-service', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'description': '100% of msapi-service targets are down.', 'summary': 'Targets are down'}, 'startsAt': '2018-08-23T00:50:45.022073385Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=100+%2A+%28count+by%28job%29+%28up+%3D%3D+0%29+%2F+count+by%28job%29+%28up%29%29+%3E+10&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'DeadMansSwitch', 'prometheus': 'monitoring/k8s', 'severity': 'none'}, 'annotations': {'description': 'This is a DeadMansSwitch meant to ensure that the entire Alerting pipeline is functional.', 'summary': 'Alerting DeadMansSwitch'}, 'startsAt': '2018-08-15T01:10:15.022073385Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=vector%281%29&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'DeadMansSwitch', 'prometheus': 'monitoring/k8s', 'severity': 'none'}, 'annotations': {'description': 'This is a DeadMansSwitch meant to ensure that the entire Alerting pipeline is functional.', 'summary': 'Alerting DeadMansSwitch'}, 'startsAt': '2018-08-15T01:10:15.022073385Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=vector%281%29&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'TargetDown', 'job': 'msapi-service', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'description': '100% of msapi-service targets are down.', 'summary': 'Targets are down'}, 'startsAt': '2018-08-23T00:50:45.022073385Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=100+%2A+%28count+by%28job%29+%28up+%3D%3D+0%29+%2F+count+by%28job%29+%28up%29%29+%3E+10&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubeVersionMismatch', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'message': 'There are 2 different versions of Kubernetes components running.', 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch'}, 'startsAt': '2018-08-15T10:14:47.588420806Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=count%28count+by%28gitVersion%29+%28kubernetes_build_info%7Bjob%21%3D%22kube-dns%22%7D%29%29+%3E+1&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubeVersionMismatch', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'message': 'There are 2 different versions of Kubernetes components running.', 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch'}, 'startsAt': '2018-08-15T10:14:47.588420806Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=count%28count+by%28gitVersion%29+%28kubernetes_build_info%7Bjob%21%3D%22kube-dns%22%7D%29%29+%3E+1&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubeClientErrors', 'instance': '192.168.13.31:10250', 'job': 'kubelet', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'message': "Kubernetes API server client 'kubelet/192.168.13.31:10250' is experiencing 5% errors.'", 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors'}, 'startsAt': '2018-08-15T01:28:17.588420806Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%7Bcode%21~%222..%22%7D%5B5m%5D%29%29+%2A+100+%2F+sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%5B5m%5D%29%29+%3E+1&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubeClientErrors', 'instance': '192.168.13.34:10250', 'job': 'kubelet', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'message': "Kubernetes API server client 'kubelet/192.168.13.34:10250' is experiencing 12% errors.'", 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors'}, 'startsAt': '2018-08-15T01:28:17.588420806Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%7Bcode%21~%222..%22%7D%5B5m%5D%29%29+%2A+100+%2F+sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%5B5m%5D%29%29+%3E+1&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubeClientErrors', 'instance': '192.168.13.36:10250', 'job': 'kubelet', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'message': "Kubernetes API server client 'kubelet/192.168.13.36:10250' is experiencing 12% errors.'", 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors'}, 'startsAt': '2018-08-15T01:28:17.588420806Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%7Bcode%21~%222..%22%7D%5B5m%5D%29%29+%2A+100+%2F+sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%5B5m%5D%29%29+%3E+1&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubeClientErrors', 'instance': '192.168.13.31:10250', 'job': 'kubelet', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'message': "Kubernetes API server client 'kubelet/192.168.13.31:10250' is experiencing 5% errors.'", 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors'}, 'startsAt': '2018-08-15T01:28:17.588420806Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%7Bcode%21~%222..%22%7D%5B5m%5D%29%29+%2A+100+%2F+sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%5B5m%5D%29%29+%3E+1&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubeClientErrors', 'instance': '192.168.13.34:10250', 'job': 'kubelet', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'message': "Kubernetes API server client 'kubelet/192.168.13.34:10250' is experiencing 12% errors.'", 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors'}, 'startsAt': '2018-08-15T01:28:17.588420806Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%7Bcode%21~%222..%22%7D%5B5m%5D%29%29+%2A+100+%2F+sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%5B5m%5D%29%29+%3E+1&g0.tab=1'}