alert-manager alerting

Local testing

Package download page

  1. Start Alertmanager and Prometheus on your own machine
./prometheus --config.file=prometheus.yml
./alertmanager --config.file=alertmanager.yml
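
To confirm both processes came up, you can hit their health endpoints (each exposes /-/healthy on its default port):

curl -s http://localhost:9090/-/healthy
curl -s http://localhost:9093/-/healthy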

Appendix:

  1. prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s 

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

rule_files:
  - "first_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'export'
    static_configs:
    - targets: ['10.211.55.14:9100','10.211.55.15:9100']
  - job_name: 'alertmanager'
    static_configs:
    - targets: ['localhost:9093']
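
Before starting Prometheus, this config (and the rule files it references) can be validated with promtool, which ships in the same tarball:

./promtool check config prometheus.yml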

  2. first_rules.yml

groups:
- name: test-rule
  rules:
  - alert: HighCPU
    expr: 100 - avg(irate(node_cpu_seconds_total{job="export",mode="idle"}[5m])) by (instance) * 100 > 0.1
    for: 1m
    labels:
      severity: warning 
    annotations:
      summary: "{{$labels.instance}}: high CPU usage detected ({{$labels.job}})"
      description: "{{$labels.instance}}: CPU usage is above 0.1% (current value is: {{ $value }})"
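
The 0.1% threshold is presumably set this low so the alert fires quickly during testing. The rule file can also be checked with promtool before reloading:

./promtool check rules first_rules.yml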
  3. alertmanager.yml
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'wzl19971123'
  smtp_require_tls: false


route:
  receiver: mail

receivers:
- name: 'mail'
  email_configs:
  - to: '[email protected]'
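
Likewise, amtool (bundled with the Alertmanager tarball) can validate this file before starting:

./amtool check-config alertmanager.yml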
  2. Start node-exporter on the two virtual machines
    ./node_exporter
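
To verify each exporter is actually serving metrics (node_exporter listens on 9100 by default):

curl -s http://10.211.55.14:9100/metrics | head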

For some reason, sending the alert email failed later on. Could it be the 4h issue??
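
One plausible cause: with a bare route like the one above, Alertmanager falls back to its defaults (group_wait: 30s, group_interval: 5m, repeat_interval: 4h), so an alert that keeps firing is only re-sent every 4 hours. For testing, these can be shortened explicitly, e.g.:

route:
  receiver: mail
  group_wait: 10s      # how long to buffer before the first notification of a new group
  group_interval: 1m   # how long to wait before notifying about new alerts in an existing group
  repeat_interval: 5m  # how often to re-send a notification for a still-firing alert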

Testing in the staging environment

This article served as the guide.

  1. Alert rules
    List all rules (most of them ship with prometheus-operator):
kubectl get PrometheusRule -n prometheus

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: prometheus-operator
    chart: prometheus-operator-6.6.1
    heritage: Tiller
    release: prometheus-operator
  name: new-rule
  namespace: prometheus
spec:
  groups:
  - name: new.rules
    rules:
    - alert: MemoryInsufficient
      annotations:
        summary: memory is exhausted
        description: 'host:{{$labels.node_name}}  Address:{{$labels.instance}}: Memory Usage is above 90% (current value is: {{ $value }})'
      expr: |
        (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Cached_bytes + node_memory_Buffers_bytes)) / node_memory_MemTotal_bytes * 100 > 90
      for: 3m
      labels:
        severity: critical
    - alert: DiskStorageInsufficient
      annotations:
        summary: disk storage is exhausted
        description: 'host:{{$labels.node_name}}  Address:{{$labels.instance}}: Disk Storage Usage is above 90% (current value is: {{ $value }})'
      expr: |
        (node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_free_bytes{mountpoint="/"}) / node_filesystem_size_bytes{mountpoint="/"} * 100 > 90
      for: 3m
      labels:
        severity: critical
    - alert: ChoerodonServiceDown
      annotations:
        summary: Choerodon Service unavailable
        description: '{{$labels.pod_name}} is unavailable'
      expr: |
        up{job="kubernetes-pod-choerodon"}==0
      for: 3m
      labels:
        severity: critical
    - alert: NodeDown
      annotations:
        summary: A node is unavailable
        description: 'host:{{$labels.node}}  Address:{{$labels.instance}} is unavailable'
      expr: |
        up{job="node-exporter"}==0
      for: 3m
      labels:
        severity: critical

Apply the alert rules:

kubectl apply -f prometheus-testRules.yaml
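
To confirm the rule object exists and was picked up, list it and then check Status -> Rules in the Prometheus UI (prometheus-operated is the headless service the operator creates; the name may differ depending on the chart):

kubectl get prometheusrule new-rule -n prometheus
kubectl port-forward svc/prometheus-operated 9090 -n prometheus
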
  2. Alertmanager email configuration
    The alerting configuration is wrapped in a Secret:
kubectl get secret -n prometheus
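
To inspect the current config, decode the Secret's data field (this assumes the key inside the Secret is alertmanager.yaml, which is what prometheus-operator expects):

kubectl get secret alertmanager-prometheus-operator-alertmanager -n prometheus \
  -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d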

The alertmanager.yaml configuration file:

global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'admin123'
  smtp_require_tls: false

route:
  receiver: default
  routes:
  - receiver: mail
    match:
      alertname: NodeDown
  - receiver: mail
    match:
      alertname: ChoerodonServiceDown
  - receiver: mail
    match:
      alertname: MemoryInsufficient

receivers:
- name: 'default'
  email_configs:
  - to: '[email protected]'
- name: 'mail'
  email_configs:
  - to: '[email protected]'

My approach was to generate a Secret from this config file, then replace the base64 payload in the default alertmanager-prometheus-operator-alertmanager Secret.

kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n prometheus

Compare the two Secrets and swap in the new base64 value:

kubectl edit secret alertmanager-main -n prometheus
kubectl edit secret alertmanager-prometheus-operator-alertmanager -n prometheus
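
Alternatively, the manual compare-and-paste can be skipped by patching the Secret in one step (a sketch; -w0 is GNU base64, use base64 | tr -d '\n' on macOS):

kubectl patch secret alertmanager-prometheus-operator-alertmanager -n prometheus \
  --type merge -p "{\"data\":{\"alertmanager.yaml\":\"$(base64 -w0 alertmanager.yaml)\"}}"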

Extension:
If you also have a webhook receiver:

receivers:
- name: 'web.hook'
  email_configs:
  - to: '[email protected]'
  webhook_configs:
  - url: 'http://localhost:8060/dingtalk/webhook1/send'
    send_resolved: false
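
To test a receiver end-to-end without waiting for a real alert, a synthetic alert can be pushed straight into Alertmanager's v2 API (adjust host/port as needed):

curl -XPOST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"TestAlert","severity":"warning"},"annotations":{"summary":"manual test alert"}}]'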
