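I started out with the prometheus operator, which deploys everything in one step. Deployment was indeed simple, but it was inconvenient to manage and use, so I dropped it.
After that I set up Prometheus monitoring on its own, installing Prometheus only on CentOS 7 and using it to monitor process services, MySQL and Docker containers. Grafana displays the data in a web page and alertmanager sends the alerts, so all the basic functions were in place.
Later I moved on to monitoring Kubernetes and found that monitoring a k8s cluster meant deploying a Prometheus inside the cluster itself. So I deployed one in the cluster, together with node_exporter and kube-state-metrics, which monitor the host information of the cluster nodes and the state of pods and nodes respectively.
Now to the main topic.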
Download page for the Prometheus components: https://prometheus.io/download/
1.1 Download prometheus-2.13.0.linux-amd64.tar.gz from the web page, or run:
wget -c https://github.com/prometheus/prometheus/releases/download/v2.13.0/prometheus-2.13.0.linux-amd64.tar.gz
1.2 Download node_exporter-0.18.1.linux-amd64.tar.gz from the web page, or run:
wget -c https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.g
1.3 Download process-exporter-0.5.0.linux-amd64.tar.gz from the web page, or run:
wget -c https://github.com/ncabatoff/process-exporter/releases/download/v0.5.0/process-exporter-0.5.0.linux-amd64.tar.gz
1.4 Download alertmanager-0.19.0.linux-amd64.tar.gz from the web page, or run:
wget https://github.com/prometheus/alertmanager/releases/download/v0.19.0/alertmanager-0.19.0.linux-amd64.tar.gz
1.5 Download grafana-6.4.1.linux-amd64.tar.gz from the web page, or run:
wget https://dl.grafana.com/oss/release/grafana-6.4.1.linux-amd64.tar.gz
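2. Extract
First create the prometheus, node_exporter, process_exporter, alertmanager and grafana directories under /opt/. The tar commands that follow expect the downloaded tarballs to be in /opt.
mkdir -p /opt/prometheus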
mkdir -p /opt/node_exporter
mkdir -p /opt/process_exporter
mkdir -p /opt/alertmanager
mkdir -p /opt/grafana
tar zxf /opt/prometheus-2.13.0.linux-amd64.tar.gz -C /opt/prometheus --strip-components=1
tar zxf /opt/node_exporter-0.18.1.linux-amd64.tar.gz -C /opt/node_exporter --strip-components=1
tar zxf /opt/process-exporter-0.5.0.linux-amd64.tar.gz -C /opt/process_exporter --strip-components=1
tar zxf /opt/alertmanager-0.19.0.linux-amd64.tar.gz -C /opt/alertmanager --strip-components=1
tar zxf /opt/grafana-6.4.1.linux-amd64.tar.gz -C /opt/grafana --strip-components=1
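Because --strip-components=1 drops the top-level directory of each tarball, it is worth confirming that every main binary ended up where the later steps expect it. The following is a minimal sketch of such a check; check_layout is a hypothetical helper written for this walkthrough, and the binary names are the ones these release tarballs are expected to contain.

```shell
#!/usr/bin/env bash
# check_layout is a hypothetical helper, not part of any of the tools.
# It reports whether each component's main binary exists under the given root.
check_layout() {
  local root="${1:-/opt}" missing=0 path
  for path in \
    prometheus/prometheus \
    node_exporter/node_exporter \
    process_exporter/process-exporter \
    alertmanager/alertmanager \
    grafana/bin/grafana-server
  do
    if [ -x "$root/$path" ]; then
      echo "ok       $root/$path"
    else
      echo "missing  $root/$path"
      missing=1
    fi
  done
  # Non-zero exit status if anything was missing.
  return "$missing"
}
```

Calling check_layout /opt after the tar commands prints one line per component and returns a non-zero status if any binary is absent.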
3. Configuration
vim /opt/prometheus/prometheus.yml
global:
  scrape_interval: 15s        # assumed value, chosen to match evaluation_interval
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['192.168.1.131:9093']
rule_files:
  - "rules.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['192.168.1.131:9090']
  - job_name: 'node_exporter'
    static_configs:
    - targets: ['192.168.1.131:9100']
  - job_name: 'process'
    static_configs:
    - targets: ['192.168.1.131:9256']
vim /opt/prometheus/rules.yml
groups:
- name: process-monitoring
  rules:
  - alert: docker_status
    expr: namedprocess_namegroup_num_procs{groupname="map[:dockerd]", job="process"} == 0
    for: 30s
    labels:
      area: A
    annotations:
      summary: "docker process on {{ $labels.instance }} is down"
- name: host-status-alerts
  rules:
  - alert: host_status
    expr: up == 0
    for: 1m
    labels:
      status: critical
    annotations:
      summary: "{{$labels.instance}}: server is down"
      description: "{{$labels.instance}}: server has been unreachable for more than 1 minute"
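A syntax slip in prometheus.yml or rules.yml (a missing comma in a label matcher, for instance) stops Prometheus from loading the file, so it can help to validate both before starting the server. The sketch below wraps promtool, which ships in the Prometheus tarball next to the prometheus binary; validate_prometheus is a hypothetical helper name, and the default directory is the one assumed throughout this post.

```shell
#!/usr/bin/env bash
# validate_prometheus is a hypothetical wrapper around promtool.
validate_prometheus() {
  local dir="${1:-/opt/prometheus}"
  # "check config" parses prometheus.yml and the rule files it references.
  "$dir/promtool" check config "$dir/prometheus.yml" || return 1
  # "check rules" validates the rule file on its own.
  "$dir/promtool" check rules "$dir/rules.yml"
}
```

Running validate_prometheus with no argument checks the files under /opt/prometheus and returns a non-zero status if promtool rejects either of them.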
vim /opt/alertmanager/alertmanager.yml
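global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.exmail.qq.com:465'   # SMTP server (host:port)
  smtp_from: '[email protected]'              # sender address
  smtp_auth_username: '[email protected]'     # mailbox account name
  smtp_auth_password: 'Caitong12316'         # mailbox password or authorization code
  smtp_require_tls: false
route:
  group_by: ['alertname']   # label(s) used to group alerts
  group_wait: 30s           # how long to wait before sending the first notification for a new group
  group_interval: 5m        # how long to wait before notifying about new alerts added to a group
  repeat_interval: 2h       # how often a notification is repeated
  receiver: email
receivers:
- name: 'email'             # receiver name
  email_configs:            # email settings
  - to: '[email protected]'  # address that receives the alerts
    headers: { Subject: "[WARN] Alert email" }   # subject of the alert email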
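The Alertmanager tarball also includes amtool, whose check-config subcommand parses a configuration file and reports errors without starting the service. A minimal sketch, assuming the /opt/alertmanager layout used in this post (validate_alertmanager is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# validate_alertmanager is a hypothetical wrapper around amtool.
validate_alertmanager() {
  local dir="${1:-/opt/alertmanager}"
  # Exits non-zero if alertmanager.yml cannot be parsed or is invalid.
  "$dir/amtool" check-config "$dir/alertmanager.yml"
}
```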
vim /opt/process_exporter/process_conf.yml
process_names:
  - name: "{{.Matches}}"
    cmdline:
    - 'dockerd'
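With name set to "{{.Matches}}", the group name that process-exporter reports is the printed form of the regex match map, which is why the alert rule refers to groupname="map[:dockerd]". Once process-exporter is running (see the start commands in step 4), the actual group names can be read from its metrics endpoint. A small sketch, where list_process_groups is a hypothetical helper and the default URL is the target from prometheus.yml:

```shell
#!/usr/bin/env bash
# list_process_groups is a hypothetical helper.
# It prints the per-group process counts exposed by process-exporter.
list_process_groups() {
  local url="${1:-http://192.168.1.131:9256/metrics}"
  # Keep only the sample lines of the metric used in the alert rule.
  curl -s "$url" | grep '^namedprocess_namegroup_num_procs'
}
```

If the groupname label in the output differs from the one in rules.yml, the docker_status alert will never match, so the rule should be adjusted to whatever this prints.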
4. Start commands
# Run node_exporter in the background
nohup /opt/node_exporter/node_exporter > /opt/node_exporter/node_exporter.stdout 2>&1 &
# Run prometheus in the background
nohup /opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml > /opt/prometheus/prometheus.stdout 2>&1 &
# Run process-exporter in the background
nohup /opt/process_exporter/process-exporter -config.path /opt/process_exporter/process_conf.yml > /opt/process_exporter/process-exporter.stdout 2>&1 &
# Run alertmanager in the background
nohup /opt/alertmanager/alertmanager --config.file=/opt/alertmanager/alertmanager.yml > /opt/alertmanager/alertmanager.stdout 2>&1 &
# Run grafana in the background
nohup /opt/grafana/bin/grafana-server -homepath /opt/grafana > /opt/grafana/grafana.stdout 2>&1 &
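Since everything is started with nohup, a process that dies on startup fails silently. One way to check is to probe each service over HTTP. The sketch below assumes the host and ports used in this post; check_services is a hypothetical helper, /-/ready is the readiness endpoint of Prometheus and Alertmanager, /api/health is Grafana's health endpoint, and the two exporters are probed through /metrics.

```shell
#!/usr/bin/env bash
# check_services is a hypothetical helper that probes each component once.
check_services() {
  local host="${1:-192.168.1.131}" failed=0 url
  for url in \
    "http://$host:9090/-/ready" \
    "http://$host:9100/metrics" \
    "http://$host:9256/metrics" \
    "http://$host:9093/-/ready" \
    "http://$host:3000/api/health"
  do
    # -f makes curl exit non-zero on HTTP errors; --max-time bounds the wait.
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "up    $url"
    else
      echo "down  $url"
      failed=1
    fi
  done
  # Non-zero exit status if any endpoint did not answer.
  return "$failed"
}
```

If a service shows as down, its .stdout file under /opt is the first place to look.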
5. Access:
http://192.168.1.131:9090
http://192.168.1.131:3000 (admin/admin)