Prometheus Alertmanager Webhook Configuration Tutorial

1. Background

This article is a follow-up to "k8s prometheus adapter — extending Kubernetes to implement dynamic scaling based on Prometheus".

It focuses on how to configure Alertmanager so that alerts are pushed out via a webhook.

The workflow looks like this:

[Figure 1: Alertmanager webhook workflow]

2. Configuration Steps

2.1 Implementing the alert server

The alert server receives webhook calls from Alertmanager and then, according to whatever business rules apply, sends notifications to the right people.

Typically this means sending SMS messages, emails, and so on.

Here, to keep things simple, we use the Flask framework and a small program to simulate the process.

Write the alert server program, app.py:

from flask import Flask, request
import json

app = Flask(__name__)

@app.route('/send', methods=['POST'])
def send():
    try:
        # Parse the JSON payload that Alertmanager posts to the webhook
        data = json.loads(request.data)
        alerts = data['alerts']
        for alert in alerts:
            # A real implementation would call an SMS or email gateway here
            print('SEND SMS: ' + str(alert))
    except Exception as e:
        print(e)
    return 'ok'

The main job of this code is to expose an endpoint (/send) that receives the webhook call, parses the alert content, and "sends an SMS" (here it simply prints the alert ^_^|||).
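As a quick local sanity check (not part of the deployment itself), you can run the app on port 5000, for example with flask run, and post a hand-crafted payload in Alertmanager's webhook format; the alert fields below are made up purely for illustration:

# assumes the Flask app is already running locally on port 5000
curl -X POST http://127.0.0.1:5000/send \
  -H 'Content-Type: application/json' \
  -d '{"alerts": [{"status": "firing", "labels": {"alertname": "TestAlert", "severity": "warning"}}]}'

If everything works, the app prints a SEND SMS line and responds with ok.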

2.2 Building and deploying the alert server image

Write the Dockerfile:

FROM python 
RUN pip install flask
COPY app.py /app.py
COPY run.sh /run.sh
RUN chmod +x /run.sh

EXPOSE 5000

ENTRYPOINT ["/run.sh"]

The run.sh startup script referenced above contains:

#!/bin/bash
cd /
export FLASK_APP=app.py
flask run -h 0.0.0.0

Place app.py, run.sh, and the Dockerfile in the same directory, then build the image from that directory:

docker build -t image.docker.ssdc.solutions/ctsi/flask-alert:1.0 .
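Optionally, before pushing, you can run the freshly built image locally to confirm that the container starts and answers on port 5000; the flask-alert-test container name below is just a throwaway for this check:

# run the image locally, hit the endpoint, then clean up
docker run -d -p 5000:5000 --name flask-alert-test image.docker.ssdc.solutions/ctsi/flask-alert:1.0
curl -X POST http://127.0.0.1:5000/send -H 'Content-Type: application/json' -d '{"alerts": []}'
docker rm -f flask-alert-test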

Push the image to the private registry:

docker push image.docker.ssdc.solutions/ctsi/flask-alert:1.0 

Deploy the alert server by defining a Deployment and Service in flask-alert.yaml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: flask-alert-deployment
  namespace: monitoring
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: flask-alert
    spec:
      containers:
        - name: flask-alert
          image: image.docker.ssdc.solutions/ctsi/flask-alert:1.0
          imagePullPolicy: Always
          ports:
            - containerPort: 5000
              name: http
          volumeMounts:
            - name: localtime
              mountPath: /etc/localtime
      volumes:
        - name: localtime
          hostPath:
            path: /etc/localtime
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: flask-alert-service
  name: flask-alert-service
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http
    port: 5000
    targetPort: http
  selector:
    app: flask-alert

kubectl create -f flask-alert.yaml
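Check that the Deployment and the headless Service have come up; the names and the app=flask-alert label below simply follow the manifest above:

# verify the alert server pod and service in the monitoring namespace
kubectl get pods -n monitoring -l app=flask-alert
kubectl get svc flask-alert-service -n monitoring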

2.3 Configuring the Alertmanager webhook address

Prometheus Alertmanager supports automatic discovery and reloading of its configuration,

so all we need to do is regenerate it.

First, delete the existing configuration secret:

kubectl delete secret alertmanager-main -n monitoring

Next, write a webhook configuration file named alertmanager.yaml:

global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://flask-alert-service.monitoring:5000/send' 

Note that the url here must match the address exposed by flask-alert-service: the in-cluster DNS name of the service in the monitoring namespace, on port 5000, with the /send path.
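If amtool (the Alertmanager command-line tool) happens to be available locally, you can validate the file before loading it; this step is optional:

# check the Alertmanager configuration syntax
amtool check-config alertmanager.yaml

Then recreate the secret from the new configuration file: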

kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring
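To double-check what Alertmanager will load, you can print the stored configuration back out of the secret; the key name matches the file name passed to --from-file:

# decode the alertmanager.yaml key from the recreated secret
kubectl get secret alertmanager-main -n monitoring -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d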

Confirm that the Alertmanager configuration has been updated correctly:

[Figure 2: the updated Alertmanager configuration]

Then look at the container logs of flask-alert-deployment: you can see that webhook alerts from Alertmanager have been received,

and that the SMS sending action has been simulated!
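One way to pull the logs is via the pod label from the Deployment template (a sketch; any equivalent kubectl logs invocation works):

# print logs from the flask-alert pod(s)
kubectl logs -n monitoring -l app=flask-alert

The output looks like this: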

 * Serving Flask app "app.py"
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
172.20.198.85 - - [23/Aug/2018 12:55:29] "POST /send HTTP/1.1" 200 -
172.20.198.85 - - [23/Aug/2018 12:55:44] "POST /send HTTP/1.1" 200 -
172.20.198.85 - - [23/Aug/2018 12:55:44] "POST /send HTTP/1.1" 200 -
172.20.179.99 - - [23/Aug/2018 12:55:45] "POST /send HTTP/1.1" 200 -
172.20.179.99 - - [23/Aug/2018 12:55:45] "POST /send HTTP/1.1" 200 -
172.20.198.85 - - [23/Aug/2018 12:55:47] "POST /send HTTP/1.1" 200 -
172.20.179.99 - - [23/Aug/2018 12:55:47] "POST /send HTTP/1.1" 200 -
172.20.198.85 - - [23/Aug/2018 12:55:47] "POST /send HTTP/1.1" 200 -
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubePersistentVolumeUsageCritical', 'endpoint': 'https-metrics', 'instance': '192.168.13.34:10250', 'job': 'kubelet', 'namespace': 'default', 'persistentvolumeclaim': 'jenkins', 'prometheus': 'monitoring/k8s', 'service': 'kubelet', 'severity': 'critical'}, 'annotations': {'message': 'The persistent volume claimed by jenkins in namespace default has 0% free.', 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepersistentvolumeusagecritical'}, 'startsAt': '2018-08-20T06:01:26.025235557Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=100+%2A+kubelet_volume_stats_available_bytes%7Bjob%3D%22kubelet%22%7D+%2F+kubelet_volume_stats_capacity_bytes%7Bjob%3D%22kubelet%22%7D+%3C+3&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'TargetDown', 'job': 'msapi-service', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'description': '100% of msapi-service targets are down.', 'summary': 'Targets are down'}, 'startsAt': '2018-08-23T00:50:45.022073385Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=100+%2A+%28count+by%28job%29+%28up+%3D%3D+0%29+%2F+count+by%28job%29+%28up%29%29+%3E+10&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'DeadMansSwitch', 'prometheus': 'monitoring/k8s', 'severity': 'none'}, 'annotations': {'description': 'This is a DeadMansSwitch meant to ensure that the entire Alerting pipeline is functional.', 'summary': 'Alerting DeadMansSwitch'}, 'startsAt': '2018-08-15T01:10:15.022073385Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=vector%281%29&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'DeadMansSwitch', 'prometheus': 'monitoring/k8s', 'severity': 'none'}, 'annotations': {'description': 'This is a DeadMansSwitch meant to ensure that the entire Alerting pipeline is functional.', 'summary': 'Alerting DeadMansSwitch'}, 'startsAt': '2018-08-15T01:10:15.022073385Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=vector%281%29&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'TargetDown', 'job': 'msapi-service', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'description': '100% of msapi-service targets are down.', 'summary': 'Targets are down'}, 'startsAt': '2018-08-23T00:50:45.022073385Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=100+%2A+%28count+by%28job%29+%28up+%3D%3D+0%29+%2F+count+by%28job%29+%28up%29%29+%3E+10&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubeVersionMismatch', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'message': 'There are 2 different versions of Kubernetes components running.', 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch'}, 'startsAt': '2018-08-15T10:14:47.588420806Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=count%28count+by%28gitVersion%29+%28kubernetes_build_info%7Bjob%21%3D%22kube-dns%22%7D%29%29+%3E+1&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubeVersionMismatch', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'message': 'There are 2 different versions of Kubernetes components running.', 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeversionmismatch'}, 'startsAt': '2018-08-15T10:14:47.588420806Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=count%28count+by%28gitVersion%29+%28kubernetes_build_info%7Bjob%21%3D%22kube-dns%22%7D%29%29+%3E+1&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubeClientErrors', 'instance': '192.168.13.31:10250', 'job': 'kubelet', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'message': "Kubernetes API server client 'kubelet/192.168.13.31:10250' is experiencing 5% errors.'", 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors'}, 'startsAt': '2018-08-15T01:28:17.588420806Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%7Bcode%21~%222..%22%7D%5B5m%5D%29%29+%2A+100+%2F+sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%5B5m%5D%29%29+%3E+1&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubeClientErrors', 'instance': '192.168.13.34:10250', 'job': 'kubelet', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'message': "Kubernetes API server client 'kubelet/192.168.13.34:10250' is experiencing 12% errors.'", 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors'}, 'startsAt': '2018-08-15T01:28:17.588420806Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%7Bcode%21~%222..%22%7D%5B5m%5D%29%29+%2A+100+%2F+sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%5B5m%5D%29%29+%3E+1&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubeClientErrors', 'instance': '192.168.13.36:10250', 'job': 'kubelet', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'message': "Kubernetes API server client 'kubelet/192.168.13.36:10250' is experiencing 12% errors.'", 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors'}, 'startsAt': '2018-08-15T01:28:17.588420806Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%7Bcode%21~%222..%22%7D%5B5m%5D%29%29+%2A+100+%2F+sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%5B5m%5D%29%29+%3E+1&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubeClientErrors', 'instance': '192.168.13.31:10250', 'job': 'kubelet', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'message': "Kubernetes API server client 'kubelet/192.168.13.31:10250' is experiencing 5% errors.'", 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors'}, 'startsAt': '2018-08-15T01:28:17.588420806Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%7Bcode%21~%222..%22%7D%5B5m%5D%29%29+%2A+100+%2F+sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%5B5m%5D%29%29+%3E+1&g0.tab=1'}
SEND SMS: {'status': 'firing', 'labels': {'alertname': 'KubeClientErrors', 'instance': '192.168.13.34:10250', 'job': 'kubelet', 'prometheus': 'monitoring/k8s', 'severity': 'warning'}, 'annotations': {'message': "Kubernetes API server client 'kubelet/192.168.13.34:10250' is experiencing 12% errors.'", 'runbook_url': 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeclienterrors'}, 'startsAt': '2018-08-15T01:28:17.588420806Z', 'endsAt': '0001-01-01T00:00:00Z', 'generatorURL': 'http://prometheus-k8s-0:9090/graph?g0.expr=sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%7Bcode%21~%222..%22%7D%5B5m%5D%29%29+%2A+100+%2F+sum+by%28instance%2C+job%29+%28rate%28rest_client_requests_total%5B5m%5D%29%29+%3E+1&g0.tab=1'} 

 
