Part 3: Installing Alertmanager
1. Download and install
https://prometheus.io/download/
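Extract the archive into /apps, the install directory used throughout this section:
tar -zxvf alertmanager-0.15.2.linux-amd64.tar.gz -C /apps && mv /apps/alertmanager-0.15.2.linux-amd64 /apps/alertmanager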
Edit /apps/alertmanager/alertmanager.yml (the file name the systemd unit below passes to --config.file):
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'tong161430222'
  smtp_require_tls: false
#templates:
#  - '/apps/alertmanager/template/*.tmpl'
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: default-receiver
receivers:
  - name: 'default-receiver'
    email_configs:
      - to: '[email protected]'
        html: '{{ template "alert.html" . }}'
        headers: { Subject: "[WARN] Alert email test" }
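Note: the alert email can use the default template or a custom one. The html: line above references a custom template named "alert.html"; to use it, uncomment templates: and point it at the directory holding your .tmpl files. To use the default template instead, delete the html: line.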
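To make group_by concrete: Alertmanager puts alerts whose values for the listed labels are identical into one group and sends one notification per group (the first after group_wait, later changes at most every group_interval, and an unchanged group is re-sent after repeat_interval). The Python sketch below models only the grouping step. The alert dicts, the example instance addresses and the function name group_alerts are hypothetical illustrations, not part of Alertmanager.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Group alerts by the values of the group_by labels.

    Each alert is a plain dict of label name -> value (a simplification of a
    real Alertmanager alert). A label missing from an alert is treated as an
    empty value, as Prometheus does.
    """
    groups = defaultdict(list)
    for labels in alerts:
        # The group key uses only the group_by labels; other labels such as
        # instance do not split the group.
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Hypothetical alerts from two nodes.
alerts = [
    {"alertname": "NodeMemoryUsage", "instance": "10.0.0.1:9100"},
    {"alertname": "NodeMemoryUsage", "instance": "10.0.0.2:9100"},
    {"alertname": "InstanceDown", "instance": "10.0.0.1:9100"},
]
for key, members in group_alerts(alerts, ["alertname", "cluster", "service"]).items():
    print(dict(key)["alertname"], len(members))
```

With the group_by used above, the two NodeMemoryUsage alerts share one group (and therefore one email), while InstanceDown gets its own.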
Create the service account, give it ownership of the install directory, and create the systemd unit:
# useradd prometheus
# chown -R prometheus:prometheus /apps/alertmanager
# vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/apps/alertmanager/alertmanager --config.file=/apps/alertmanager/alertmanager.yml --storage.path=/apps/alertmanager/data
Restart=on-failure
[Install]
WantedBy=multi-user.target
Enable and start the service:
# systemctl enable alertmanager.service
# systemctl start alertmanager.service
Access the web UI at http://ip:9093 (replace ip with the server's address).
2. Connect Prometheus to Alertmanager
Edit prometheus.yml and change the following settings:
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["localhost:9093"]
      #- alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/apps/prometheus/node_down.yml"
  - "/apps/prometheus/memory_over.yml"
  # - "second_rules.yml"
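Save the file and restart Prometheus.
Rule files (they are loaded through rule_files in prometheus.yml, so Prometheus must be restarted after a rule file is changed):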
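Once the alerting section points at localhost:9093, Prometheus pushes firing alerts to Alertmanager as a JSON list of alerts over HTTP. The sketch below builds a similar request by hand, which can help check that Alertmanager and the email receiver work before any rule fires. It assumes the v1 endpoint /api/v1/alerts of the 0.15.x release installed above (later Alertmanager releases replaced it with /api/v2/alerts); build_alert_payload and post_alert are hypothetical helper names.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_alert_payload(labels, annotations=None, starts_at=None):
    """Return the JSON body for Alertmanager's v1 alerts endpoint.

    The body is a list of alerts, each with labels, annotations and an
    RFC 3339 startsAt timestamp. endsAt is omitted, so Alertmanager resolves
    the alert by itself after its resolve_timeout.
    """
    starts_at = starts_at or datetime.now(timezone.utc)
    alert = {
        "labels": labels,
        "annotations": annotations or {},
        "startsAt": starts_at.isoformat(),
    }
    return json.dumps([alert]).encode("utf-8")


def post_alert(base_url, payload, timeout=10):
    """POST the payload to <base_url>/api/v1/alerts and return the HTTP status."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/alerts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For a manual test on the Alertmanager host, something like post_alert("http://localhost:9093", build_alert_payload({"alertname": "TestAlert"})) should return 200 if the alert was accepted; the email then follows after group_wait.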
Memory usage alert rule file:
cat memory_over.yml
groups:
- name: memory alert rules
  rules:
  - alert: NodeMemoryUsage
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
    for: 1m
    labels:
      user: prometheus
    annotations:
      summary: "{{$labels.instance}}: High memory usage detected"
      description: "{{$labels.instance}}: Memory usage is above 80% (current value: {{ $value }})"
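The expression counts free, buffer and page-cache memory as available and alerts when the remainder exceeds 80% of MemTotal. node_exporter reads these values from /proc/meminfo (reported in kB) and exposes them in bytes, so the arithmetic can be checked offline. The sketch below is a simplified illustration: parse_meminfo and memory_usage_percent are hypothetical helpers, and the real exporter also renames fields such as Active(anon).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field -> bytes.

    Values with a "kB" unit are multiplied by 1024, which is how node_exporter
    turns them into node_memory_<Field>_bytes. Field names are kept raw here.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue  # skip blank or malformed lines
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        values[name.strip()] = value
    return values


def memory_usage_percent(m):
    """Same arithmetic as the NodeMemoryUsage expr: total - (free + buffers + cached)."""
    used = m["MemTotal"] - (m["MemFree"] + m["Buffers"] + m["Cached"])
    return used / m["MemTotal"] * 100
```

On a Linux host, memory_usage_percent(parse_meminfo(open("/proc/meminfo").read())) gives the value the rule compares against 80. Note the comparison is strict: with MemTotal 1000 kB and 200 kB of free + buffers + cached, usage is exactly 80% and the rule does not match.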
Node liveness (up/down) alert rule file:
cat node_down.yml
groups:
- name: node liveness alert rules
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      user: prometheus
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
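The for: 1m clause means that a matching series first makes the alert pending; it only turns firing (and is sent to Alertmanager) once the expression has stayed true for the whole duration, and if it turns false in between, the timer starts over. The sketch below models that state machine for a single series, assuming evaluations every 15 s (the evaluation_interval in Prometheus's sample config; the real value comes from your prometheus.yml). AlertState is a hypothetical name, and resolved-alert notifications are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Tracks one alerting series across rule evaluations (simplified model)."""

    active_since: Optional[float] = None  # time the expr first became true

    def evaluate(self, now, expr_true, hold_for):
        """Return "inactive", "pending" or "firing" for one evaluation at time now.

        hold_for plays the role of the rule's for: duration, in seconds.
        """
        if not expr_true:
            # The series no longer matches: the alert is dropped or resolved
            # and the hold timer restarts from scratch next time.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= hold_for:
            return "firing"
        return "pending"


state = AlertState()
for t in range(0, 75, 15):  # evaluations at 0, 15, 30, 45, 60 s with up == 0
    print(t, state.evaluate(t, expr_true=True, hold_for=60))
```

In this run the alert is pending at 0 to 45 s and becomes firing at 60 s, one full minute after up == 0 was first observed, so a node that comes back within a minute never produces an email.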
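3. Verify the result
Lower the threshold in the memory alert rule (the "> 80" in memory_over.yml) and restart Prometheus so the rule is reloaded. Once the alert fires, the following email is received: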