Two servers:
pro_server: monitoring platform plus analysis and dashboard platform, running Oracle Linux 7.3
exp_agent: the monitored server, running Oracle Linux 7.3
Software versions:
prometheus-2.3.2.linux-amd64.tar.gz
alertmanager-0.15.2.linux-amd64.tar.gz
node_exporter-0.16.0.linux-amd64.tar.gz
Before starting, stop the Linux firewall on both servers:
# systemctl stop firewalld
# systemctl disable firewalld
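Stopping firewalld entirely is the quickest route. Where that is not acceptable, one alternative is to open only the ports used in this guide. The sketch below assumes firewalld's standard firewall-cmd client is available; the function name open_ports and the FIREWALL_CMD override variable are hypothetical names introduced only for this example.

```shell
# Open TCP ports in firewalld instead of disabling the firewall.
# open_ports and FIREWALL_CMD are illustrative names, not part of firewalld.
open_ports() {
    local fw="${FIREWALL_CMD:-firewall-cmd}"
    local port
    for port in "$@"; do
        # --permanent writes the rule to the saved configuration
        "$fw" --permanent --add-port="${port}/tcp" || return 1
    done
    # Reload so the permanent rules take effect in the running firewall
    "$fw" --reload
}
```

On pro_server this might be called as open_ports 9090 9093 3000 (Prometheus, Alertmanager, Grafana), and on exp_agent as open_ports 9100 (node_exporter).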
Installing Prometheus server
Install Prometheus server on the monitoring host:
[root@pro_server ~]# tar xf prometheus-2.3.2.linux-amd64.tar.gz
[root@pro_server ~]# cd prometheus-2.3.2.linux-amd64
[root@pro_server prometheus-2.3.2.linux-amd64]# ./prometheus --config.file=prometheus.yml
Then open http://<server IP address>:9090 in a browser to verify that Prometheus was installed successfully; the Prometheus web UI should load.
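The same check can be made from a terminal. This is a minimal sketch that assumes curl is installed and relies on the /-/healthy endpoint that Prometheus 2.x exposes; check_endpoint is a hypothetical helper name.

```shell
# Return 0 if an HTTP endpoint answers with a success status, non-zero otherwise.
# check_endpoint is an illustrative helper, not a Prometheus tool.
check_endpoint() {
    local url="$1"
    # -f: treat HTTP errors as failures, -s: no progress output,
    # -S: still print errors, --max-time: give up after 5 seconds
    curl -fsS --max-time 5 -o /dev/null "$url"
}
```

For example, check_endpoint http://localhost:9090/-/healthy && echo "Prometheus is up" prints the message only when the server responds.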
Installing node_exporter
Install node_exporter on the monitored host:
[root@exp_agent ~]# tar xf node_exporter-0.16.0.linux-amd64.tar.gz
[root@exp_agent ~]# cd node_exporter-0.16.0.linux-amd64
[root@exp_agent node_exporter-0.16.0.linux-amd64]# ./node_exporter &
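Next, add the node on the monitoring host.
Open #prometheus_path#/prometheus.yml and, under scrape_configs, add a new job with the node's IP address and port. YAML is sensitive to spaces and indentation.
  - job_name: 'exporter'
    static_configs:
      - targets: ['10.18.34.72:9100']
After adding it, restart Prometheus so that the change is loaded, then open the Prometheus web page and check under Status -> Targets that the new node is listed.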
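Besides the Targets page, the built-in up metric can be queried through the Prometheus HTTP API to confirm that the new target is being scraped. A minimal sketch, assuming curl is installed and Prometheus listens on localhost:9090; prom_query and the PROM_URL override variable are hypothetical names.

```shell
# Run an instant PromQL query against the Prometheus HTTP API and print the JSON reply.
# prom_query and PROM_URL are illustrative names for this sketch.
prom_query() {
    local base="${PROM_URL:-http://localhost:9090}"
    # -G sends the data as URL query parameters;
    # --data-urlencode escapes the PromQL expression
    curl -fsS --max-time 5 -G --data-urlencode "query=$1" "${base}/api/v1/query"
}
```

Calling prom_query 'up{job="exporter"}' returns a JSON document; a sample value of "1" means the last scrape of that target succeeded, and "0" means it failed.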
Installing alertmanager
[root@pro_server ~]# tar xf alertmanager-0.15.2.linux-amd64.tar.gz
[root@pro_server ~]# cd alertmanager-0.15.2.linux-amd64
Configuring the rules
[root@pro_server alertmanager-0.15.2.linux-amd64]# mkdir /etc/prometheus
[root@pro_server alertmanager-0.15.2.linux-amd64]# vi /etc/prometheus/alert.rules
Add the following to alert.rules:
groups:
- name: web.hook
  rules:
  # Alert for any instance that is unreachable for more than 1 minute.
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
Then open #prometheus_path#/prometheus.yml and add the alerting and rule_files sections:
alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - "localhost:9093"
rule_files:
  - /etc/prometheus/alert.rules
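Because indentation mistakes in these files are easy to make, it may help to validate the configuration before restarting Prometheus. The sketch below assumes the promtool binary that ships in the Prometheus tarball and its check config subcommand, which also checks the rule files that the configuration references. The function name validate_prometheus_config and the PROMTOOL override variable are hypothetical.

```shell
# Validate a Prometheus configuration file (and the rule files it lists) with promtool.
# validate_prometheus_config and PROMTOOL are illustrative names for this sketch.
validate_prometheus_config() {
    local promtool="${PROMTOOL:-./promtool}"
    local config="${1:-prometheus.yml}"
    # promtool exits non-zero if the file does not parse or a rule is invalid
    "$promtool" check config "$config"
}
```

Run from the Prometheus directory, validate_prometheus_config with no arguments checks prometheus.yml in the current directory.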
Finally, configure #alertmanagers_path#/alertmanager.yml:
global:
  smtp_smarthost: 'smtp.qq.com:587'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'password'  # not the mailbox password; use the mailbox authorization code
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  email_configs:
  - to: '[email protected]'  # mailbox that receives the alert emails
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
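The Alertmanager file can be checked in a similar way before the service is started. This sketch assumes the amtool binary included in the Alertmanager tarball and its check-config subcommand; validate_alertmanager_config and the AMTOOL override variable are hypothetical names.

```shell
# Validate an Alertmanager configuration file with amtool.
# validate_alertmanager_config and AMTOOL are illustrative names for this sketch.
validate_alertmanager_config() {
    local amtool="${AMTOOL:-./amtool}"
    local config="${1:-alertmanager.yml}"
    # amtool exits non-zero if the configuration cannot be loaded
    "$amtool" check-config "$config"
}
```

A YAML syntax error, such as a // comment left in the file, should be reported here rather than at Alertmanager startup.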
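To test that alerting is wired up correctly, start Alertmanager (./alertmanager --config.file=alertmanager.yml), restart Prometheus, and then kill the node_exporter process on the monitored host. The InstanceDown alert should fire once the target has been down for a minute.
Also check the mailbox for the alert email.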
Installing grafana
The Grafana installation procedure differs by operating system; see http://docs.grafana.org/installation/ for details.
On Linux, for example:
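Either install directly from the package URL:
sudo yum install https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.1.4-1.x86_64.rpm
or install the dependencies and then the downloaded RPM by hand (the RPM path below is specific to the original machine and will differ on yours):
sudo yum install initscripts fontconfig
sudo rpm -Uvh /var/tmp/yum-root-G9wqx0/grafana-5.1.4-1.x86_64.rpm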
After installation, start the service, either through the init script:
service grafana-server start
or through systemd:
systemctl daemon-reload
systemctl start grafana-server
systemctl status grafana-server   # check the service status
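Then open http://<server IP address>:3000 in a browser to verify that Grafana was installed successfully; the Grafana login page should load.
Then sign up or log in to get started.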