1. Preparation

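Note: all pods run in the monitoring namespace.
This article is based on https://github.com/coreos/kube-prometheus.
Because the officially maintained version ran into problems in our existing deployment environment, some changes were made below; they do not affect the overall result.
The Alertmanager component uses the official YAML manifests unchanged.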

2. Preparing the YAML for the Alertmanager services

2.1. Download the official YAML

mkdir kube-prometheus 
cd kube-prometheus 
git clone https://github.com/coreos/kube-prometheus
cd kube-prometheus/manifests
mkdir prometheus-alertmanager
mv alertmanager*  prometheus-alertmanager

2.2. Create the Alertmanager service

cd prometheus-alertmanager
kubectl apply -f .  
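The Alertmanager object in these manifests is a custom resource handled by the Prometheus Operator, so `kubectl apply` only succeeds once the operator and its CRDs are installed. The sketch below assumes the operator was deployed beforehand and that its StatefulSet is named alertmanager-main (the pod names in 2.3 follow that pattern). The function names are hypothetical; they only wrap standard kubectl commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for step 2.2 (names are illustrative, not from kube-prometheus).
NS=monitoring

# Succeeds only if the Alertmanager CRD installed by the Prometheus Operator exists.
check_alertmanager_crd() {
  kubectl get crd alertmanagers.monitoring.coreos.com >/dev/null 2>&1
}

# Blocks until every replica of the operator-managed StatefulSet is ready,
# or fails after 5 minutes.
wait_for_alertmanager() {
  kubectl -n "$NS" rollout status statefulset/alertmanager-main --timeout=300s
}
```

Usage, from inside the prometheus-alertmanager directory: `check_alertmanager_crd && kubectl apply -f . && wait_for_alertmanager`.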

2.3. Check Alertmanager status

[root@jenkins prometheus-alertmanager]#  kubectl get pod -n monitoring -o wide | grep alertmanager
alertmanager-main-0                       2/2     Running   0          36d     10.65.1.136     node02               
alertmanager-main-1                       2/2     Running   0          26d     10.65.4.246     node03               
alertmanager-main-2                       2/2     Running   0          36d     10.65.0.53      node01               
Each pod serves its own Alertmanager web UI, which can be opened at:
http://10.65.1.136:9093/#/alerts
http://10.65.4.246:9093/#/alerts
http://10.65.0.53:9093/#/alerts
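Rather than copying pod IPs by hand, the per-pod URLs can be generated with a jsonpath query. This is a sketch; it assumes the pods carry the label app=alertmanager (the selector shown for the alertmanager-operated Service below), and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one web UI URL per Alertmanager pod.
# Assumes the pods are labelled app=alertmanager, as in the output above.
list_alertmanager_urls() {
  kubectl -n monitoring get pods -l app=alertmanager \
    -o jsonpath='{range .items[*]}http://{.status.podIP}:9093/#/alerts{"\n"}{end}'
}
```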
[root@jenkins prometheus-alertmanager]#  kubectl get service -n monitoring -o wide | grep alertmanager   
alertmanager-main         ClusterIP   10.64.215.237           9093/TCP            43d   alertmanager=main,app=alertmanager
alertmanager-operated     ClusterIP   None                    9093/TCP,6783/TCP   36d   app=alertmanager
The same UI is also reachable through the ClusterIP of the alertmanager-main Service:
http://10.64.215.237:9093/#/alerts
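Pod IPs and ClusterIPs are normally only reachable from nodes or hosts that can route to the cluster network. From a workstation that cannot, `kubectl port-forward` is one option. A minimal sketch (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: forward a local port (default 9093) to port 9093 of the
# alertmanager-main Service. Runs in the foreground until interrupted (Ctrl-C).
forward_alertmanager_ui() {
  local local_port="${1:-9093}"
  kubectl -n monitoring port-forward svc/alertmanager-main "${local_port}:9093"
}
```

For example, `forward_alertmanager_ui 19093` and then open http://localhost:19093/#/alerts. Note that forwarding to a Service attaches to a single backing pod, so this shows one replica's view, not all three.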

3. Example: configuring an Alertmanager webhook address

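The Alertmanager deployed here picks up configuration changes and reloads them automatically, so all we need to do is regenerate the configuration.

First, delete the existing configuration:

kubectl delete secret alertmanager-main -n monitoring

Then write a webhook configuration file named alertmanager.yaml. The alert-sending service used here is https://github.com/qist/msg-sender.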

global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://msg-sender.monitoring:4000/sender/wechat'

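Note: the url must match the address where msg-sender is exposed; here that is the msg-sender Service in the monitoring namespace, port 4000.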
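Inside the cluster, a host such as msg-sender.monitoring resolves through cluster DNS as `<service>.<namespace>`, so a quick way to catch a mismatch is to check that this Service exists and exposes the port used in the URL. A sketch; the helper name is hypothetical and it assumes kubectl access to the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if Service $1 in namespace $2 exposes port $3.
# For http://msg-sender.monitoring:4000/... the arguments are: msg-sender monitoring 4000
service_exposes_port() {
  local svc="$1" ns="$2" port="$3" p
  # If the Service does not exist, kubectl prints nothing and the loop is skipped.
  for p in $(kubectl -n "$ns" get svc "$svc" -o jsonpath='{.spec.ports[*].port}'); do
    [ "$p" = "$port" ] && return 0
  done
  return 1
}
```

Usage: `service_exposes_port msg-sender monitoring 4000 && echo "webhook target looks right"`.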

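Recreate the Secret from this file (the key inside the Secret must be alertmanager.yaml, which --from-file takes from the file name):

kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring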
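Deleting and recreating leaves a short window with no Secret. An alternative is to render the Secret locally and `kubectl apply` it, which updates it in place. A sketch with a hypothetical function name; on kubectl 1.14 (the version used in this series) the flag is the boolean `--dry-run`, while newer kubectl releases expect `--dry-run=client`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: update the alertmanager-main Secret in place.
# The key=path form pins the key to alertmanager.yaml whatever the local file is called.
# On newer kubectl, replace --dry-run with --dry-run=client.
update_alertmanager_secret() {
  local file="${1:-alertmanager.yaml}"
  kubectl -n monitoring create secret generic alertmanager-main \
    --from-file=alertmanager.yaml="$file" --dry-run -o yaml |
    kubectl apply -f -
}
```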

Confirm that Alertmanager picked up the new configuration. The Config section of the Status page in the web UI should now show:

global:
  resolve_timeout: 5m
  http_config: {}
  smtp_hello: localhost
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  hipchat_api_url: https://api.hipchat.com/
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: webhook
  group_by:
  - alertname
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
receivers:
- name: webhook
  webhook_configs:
  - send_resolved: true
    http_config: {}
    url: http://msg-sender.monitoring:4000/sender/wechat
templates: []
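The UI shows the configuration Alertmanager actually loaded, with defaults filled in (the extra global fields above), while the Secret holds exactly what was submitted. Printing the stored copy helps when a change does not show up. A sketch; the function name is hypothetical and it assumes GNU base64.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the alertmanager.yaml stored in the Secret.
# The dot in the key name must be escaped inside the jsonpath expression.
show_alertmanager_secret() {
  kubectl -n monitoring get secret alertmanager-main \
    -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
}
```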

Next, check the msg-sender container logs: they show that the webhook alert from Alertmanager was received and that the WeChat send was carried out.
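To exercise the webhook path without waiting for a real Prometheus alert, a synthetic alert can be posted to Alertmanager's HTTP API. This sketch assumes an Alertmanager release that serves the v2 API (/api/v2/alerts); the function name and the alert labels are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: push a synthetic alert to Alertmanager.
# $1 is the Alertmanager base URL, e.g. http://10.64.215.237:9093
send_test_alert() {
  local am_url="$1"
  curl -sS -XPOST -H 'Content-Type: application/json' \
    "${am_url}/api/v2/alerts" \
    -d '[{"labels":{"alertname":"WebhookTest","severity":"info"},"annotations":{"summary":"test alert for msg-sender"}}]'
}
```

With the configuration above, the notification should reach msg-sender after group_wait (30s). Because the alert carries no end time, Alertmanager should treat it as resolved after resolve_timeout (5m), and since send_resolved is true a second, resolved notification should follow.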

tail -n 10 msg-sender2019-06-19.log
INFO: 2019/06/19 09:29:02 http.go:238: {"errcode":0,"errmsg":"ok","invaliduser":""}
INFO: 2019/06/19 09:29:02 http.go:231: #sendWechat# client:1.8.17.209:41088, to:huangdaquan, requestType:application/x-www-form-urlencoded, content:2019-06-19 09:29:01 platform bulletin is not available!

Next: Kubernetes production installation and deployment based on Kubernetes v1.14.0: deploying Prometheus and Grafana