Changelog:
2019.08.23 Added an /etc/init.d/prometheus startup script for CentOS 6.
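1. Preface:
Our department's monitoring system, Zabbix 2.0, is showing its age, and its dated, monotonous interface does nothing to hold my interest. I have been learning k8s lately and came across an article about a monitoring stack built on Prometheus + Grafana with very slick dashboards. Out with the old, in with the new, and looks do matter, so I set one up to experiment with. Learning makes me happy, after all!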
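2. A brief look at Prometheus:
Anyone who has used Zabbix knows its architecture: C (agent/server) + PHP (frontend) + MySQL (storage). We have 1,800+ hosts, 80,000+ monitored items, and nearly 80 GB of monitoring data per month (the history, history_uint and similar tables). Even with the tables partitioned by month, the database is under heavy load. The frontend is also plain, and custom development is hard (truth be told, our small shop has no dedicated PHP developer in this department, let alone anyone for C).
Prometheus, by contrast, is written in Go (Golang is hot! One classmate even announced on WeChat Moments that Go is the best language in the world!). It is modeled on Google's Borgmon monitoring system and is often described as its open-source counterpart (much as k8s is to Google Borg).
Monitoring data is stored in its own purpose-built time-series database (TSDB). Data is collected with a pull model: the server actively scrapes an HTTP endpoint exposed by each monitored node. (Written in Go and talking to an HTTP port on every node: doesn't that remind you of Xiaomi's open-falcon?) And so on...
Prometheus has many more features. A later article will cover them in detail and compare it with Zabbix; this post only scratches the surface.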
3. Installing the Prometheus server:
Prometheus website: https://prometheus.io/
wget https://github.com/prometheus/prometheus/releases/download/v2.12.0/prometheus-2.12.0.linux-amd64.tar.gz
tar zxvf prometheus-2.12.0.linux-amd64.tar.gz
cd prometheus-2.12.0.linux-amd64
After extracting, the two main files in the directory are:
prometheus #the Prometheus binary
prometheus.yml #the Prometheus configuration file
Take a look at the prometheus.yml configuration file:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
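No changes are needed for now. Note the last line: localhost:9090 is the address Prometheus scrapes itself on, and it is also where the web UI is served.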
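As the file grows, it helps to see at a glance which endpoints Prometheus will scrape. The following is a minimal sketch, not an official tool: list_targets is a hypothetical helper name, and it only understands the inline `targets: [...]` form shown in this article (commented-out lines and multi-line lists are ignored).

```shell
# list_targets FILE: print every host:port found in inline "targets: [...]" lines.
# Hypothetical helper for illustration; it is a text filter, not a YAML parser.
list_targets() {
    grep -E '^[[:space:]]*-?[[:space:]]*targets:[[:space:]]*\[' "$1" \
        | sed -E 's/.*\[(.*)\].*/\1/' \
        | tr ',' '\n' \
        | sed -E "s/^[[:space:]]*['\"]?//; s/['\"]?[[:space:]]*$//" \
        | sed '/^$/d'
}

# Usage:
#   list_targets prometheus.yml
```

For a real syntax check, the promtool binary shipped in the same tarball can validate the file with `promtool check config prometheus.yml`.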
Create the systemd unit file (adjust the ExecStart line to your own paths; --storage.tsdb.path is where the monitoring data is stored) (for CentOS 7):
vim /etc/systemd/system/prometheus.service
[Unit]
Description=prometheus
After=network.target
[Service]
Type=simple
User=root
ExecStart=/data/prometheus/prometheus/prometheus --config.file=/data/prometheus/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus/storage
Restart=on-failure
[Install]
WantedBy=multi-user.target
systemctl daemon-reload #make systemd pick up the new unit file
systemctl start prometheus #start Prometheus
systemctl status prometheus #check that Prometheus is running
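systemctl status only shows that the process exists, not that it is answering requests yet. The following is a rough sketch of a retry loop that can wrap any probe command. wait_until is a hypothetical helper name, and the usage line assumes Prometheus is listening on localhost:9090 as configured above, where it exposes a /-/healthy endpoint.

```shell
# wait_until RETRIES DELAY CMD...: rerun CMD until it succeeds or RETRIES attempts
# have been made. Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
    local retries=$1 delay=$2
    shift 2
    local attempt=1
    while [ "$attempt" -le "$retries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (assumes the default listen address):
#   wait_until 10 1 curl -fsS http://localhost:9090/-/healthy && echo "prometheus is up"
```

Passing the probe as arguments keeps the loop reusable, for example for node_exporter on port 9100 later in this post.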
On CentOS 6, write an /etc/init.d/prometheus startup script instead:
#!/bin/bash
#
# Comments to support chkconfig
# chkconfig: 2345 98 02
# description: prometheus service script
#
# Source function library.
. /etc/init.d/functions

### Default variables
prog_name="prometheus"
config_file="/data/${prog_name}/${prog_name}/${prog_name}.yml"
prog_path="/data/${prog_name}/${prog_name}/${prog_name}"
data_path="/data/${prog_name}/storage"
pidfile="/var/run/${prog_name}.pid"
prog_logs="/var/log/${prog_name}.log"
# Startup options. Prometheus listens on port 9090 by default; add
# --web.enable-lifecycle if you want to reload the config over HTTP.
options="--config.file=${config_file} --storage.tsdb.path=${data_path}"
DESC="Prometheus Server"

# Check if requirements are met
[ -x "${prog_path}" ] || exit 1

RETVAL=0

start(){
    action $"Starting $DESC..." su -s /bin/sh -c "nohup $prog_path $options >> $prog_logs 2>&1 &" 2> /dev/null
    RETVAL=$?
    # Record the PID so that stop() can find the process
    PID=$(pidof ${prog_path})
    [ ! -z "${PID}" ] && echo ${PID} > ${pidfile}
    echo
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$prog_name
    return $RETVAL
}

stop(){
    echo -n $"Shutting down $prog_name: "
    killproc -p ${pidfile}
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/$prog_name
    return $RETVAL
}

restart() {
    stop
    start
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    status)
        status $prog_path
        RETVAL=$?
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart|status}"
        RETVAL=1
esac
exit $RETVAL
Open http://127.0.0.1:9090 to see the monitoring page (open-falcon users, tell me whether it looks familiar; just kidding):
Click Status --> Targets at the top to see the monitored targets (only the server itself for now).
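4. Installing the monitored side:
Download the node_exporter package (the host agent is called node_exporter, echoing the "node" term used in k8s; the Zabbix equivalent is agentd).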
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
tar zxvf node_exporter-0.18.1.linux-amd64.tar.gz
cd node_exporter-0.18.1.linux-amd64
Create the systemd unit file (adjust the ExecStart line to your own paths):
vim /etc/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
After=network.target
[Service]
Type=simple
User=root
ExecStart=/data/prometheus_node/prometheus_node/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target
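systemctl daemon-reload #make systemd pick up the new unit file
systemctl start node_exporter #start the exporter
systemctl status node_exporter #check the exporter's status
curl 127.0.0.1:9100/metrics #returns the monitoring data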
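The /metrics output is plain text: comment lines starting with #, then one sample per line in the form `name{labels} value`. To pull a single value out of that stream, something like the sketch below may be enough. metric_value is a hypothetical helper name; it returns the first sample with a matching metric name and assumes the value is the last field on the line, which holds when no timestamps are emitted.

```shell
# metric_value NAME: read exposition-format text on stdin and print the value of
# the first sample whose metric name is exactly NAME (labels are ignored).
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                     # skip HELP/TYPE comment lines
        {
            metric = $1
            sub(/[{].*/, "", metric)      # drop the {label="..."} part, if any
            if (metric == name) { print $NF; exit }
        }'
}

# Usage (node_load1 is the 1-minute load average exported by node_exporter):
#   curl -s 127.0.0.1:9100/metrics | metric_value node_load1
```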
5. Adding the monitored node on the Prometheus server:
Edit prometheus.yml on the server and append the following at the end (change the IP to match your environment):
  - job_name: 'monitor_nodes'
    static_configs:
    - targets: ['10.1.129.121:9100']
      labels:
        instance: node01
systemctl restart prometheus #restart the server
Refresh the page to see the new node:
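With more than a handful of hosts, typing the job stanza by hand gets tedious. The sketch below generates it from a list of hosts. gen_node_job is a hypothetical helper name, and using the host string itself as the instance label is only an illustrative choice (the example above uses node01).

```shell
# gen_node_job JOB PORT HOST...: print a scrape job with one target per host,
# indented to sit under the existing scrape_configs: key.
gen_node_job() {
    local job=$1 port=$2
    shift 2
    printf "  - job_name: '%s'\n" "$job"
    printf "    static_configs:\n"
    local host
    for host in "$@"; do
        printf "    - targets: ['%s:%s']\n" "$host" "$port"
        printf "      labels:\n"
        printf "        instance: %s\n" "$host"
    done
}

# Usage: review the output first, then append it to the config and restart.
#   gen_node_job monitor_nodes 9100 10.1.129.121 10.1.129.122
```

Appending the output to prometheus.yml only works while scrape_configs is the last section of the file, as it is in the default configuration.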
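6. Installing Grafana:
The default Prometheus UI isn't much to look at either; fortunately, Grafana can give it a flashy front end. (Grafana website: https://grafana.com; it also supports Zabbix, by the way.)
Install Grafana (https://grafana.com/grafana/download):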
wget https://dl.grafana.com/oss/release/grafana-6.3.3-1.x86_64.rpm
sudo yum localinstall grafana-6.3.3-1.x86_64.rpm
systemctl start grafana-server #start Grafana
systemctl status grafana-server #check Grafana's status
Open http://10.1.129.86:3000 to configure it (the default username/password is admin/admin; you will be prompted to change the password on first login).
Select Prometheus as the data source:
Configure it as needed and click Save:
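The same data source can probably be created without the UI, through Grafana's HTTP API (POST /api/datasources). The sketch below only builds the JSON body. datasource_payload is a hypothetical helper name, the Prometheus URL http://localhost:9090 is an assumption that Grafana runs on the same host as Prometheus, and the values are inserted without JSON escaping, so keep them free of quotes.

```shell
# datasource_payload NAME URL: print a JSON body describing a Prometheus data
# source that Grafana reaches server-side ("proxy" access) and uses as default.
datasource_payload() {
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$1" "$2"
}

# Usage (assumes the default admin/admin login has not been changed yet):
#   datasource_payload Prometheus http://localhost:9090 \
#     | curl -s -u admin:admin -H 'Content-Type: application/json' \
#            -X POST -d @- http://10.1.129.86:3000/api/datasources
```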
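Then click to import a dashboard template:
You can browse this page for a suitable dashboard, download its JSON file, and import it: https://grafana.com/grafana/dashboards
For example, download this one: https://grafana.com/grafana/dashboards/8919. Its description notes that the pie chart plugin must be installed; follow the instructions as needed, then download the JSON and import it:
The final result:
That completes a Prometheus + Grafana test environment, and this post ends here.
A deeper look at Prometheus + Grafana will follow in the next post. Thanks!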