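Prometheus Database Monitoring System
Environment preparation:
Internal IP    Role                 Installed software
10.0.0.123     Monitoring server    Prometheus (server software), Grafana (data visualization)
10.0.0.236     Monitored client     node_exporter (collects host metrics), mysqld_exporter (collects MySQL metrics)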
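1. Deploy Prometheus on the server:
1.1 Install the Go environment on the Prometheus server (optional: the release tarball used below ships precompiled binaries, so Go is only needed when building from source):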
[root@prometheus ~]# yum install go -y
[root@prometheus ~]# go version
go version go1.13.11 linux/amd64
1.2 Download the release package from the official Prometheus project and extract it:
[root@prometheus ~]# wget https://github.com/prometheus/prometheus/releases/download/v2.19.0/prometheus-2.19.0.linux-amd64.tar.gz
[root@prometheus ~]# tar zxvf prometheus-2.19.0.linux-amd64.tar.gz
[root@prometheus ~]# mv prometheus-2.19.0.linux-amd64 /opt/prometheus
1.3 Edit the Prometheus configuration file:
[root@prometheus ~]# cd /opt/prometheus
[root@prometheus /opt/prometheus]# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
      labels:
        instance: prometheus
  - job_name: 'mysql'
    static_configs:
    - targets: ['10.0.0.236:9104']
      labels:
        instance: '10.0.0.236'
  - job_name: 'node1'
    static_configs:
    - targets: ['10.0.0.236:9100']
      labels:
        instance: 'nd1'
  - job_name: 'node2'
    static_configs:
    - targets: ['10.0.0.123:9100']
      labels:
        instance: 'nd2'
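Before starting Prometheus it can help to validate the edited file. The sketch below wraps promtool, which ships in the same release tarball next to the prometheus binary. The function name check_prometheus_config and the PROMTOOL override variable are assumptions made for this example, not part of Prometheus.

```shell
# Validate a Prometheus config with promtool before (re)starting the server.
# PROMTOOL is a hypothetical override; by default the promtool binary from
# the tarball extracted to /opt/prometheus is used.
check_prometheus_config() {
    local config="${1:-/opt/prometheus/prometheus.yml}"
    local promtool="${PROMTOOL:-/opt/prometheus/promtool}"
    if [ ! -x "$promtool" ]; then
        echo "promtool not found at $promtool" >&2
        return 127
    fi
    # "promtool check config <file>" exits non-zero when the file is invalid.
    "$promtool" check config "$config"
}
```

For example, running check_prometheus_config /opt/prometheus/prometheus.yml prints promtool's verdict and returns its exit status, so a start script can stop early when the file does not parse.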
1.4 Ways to start Prometheus:
Method 1:
[root@prometheus /opt/prometheus]# nohup ./prometheus --config.file=./prometheus.yml &
Method 2:
[root@prometheus /opt/prometheus]# ./prometheus &
Before starting with method 2, clear up the following problems if they appear:
Startup problem 1:
level=error ts=2018-11-19T06:01:05.697957445Z caller=main.go:625
err="opening storage failed: lock DB directory: resource temporarily unavailable
Fix: make sure no other Prometheus process is still running (it would hold the lock), then delete the lock file
rm -f /opt/prometheus/data/lock
Startup problem 2:
level=error ts=2018-11-19T06:04:47.83421089Z caller=main.go:625
err="error starting web server: listen tcp 0.0.0.0:9090: bind: address already in use"
Fix: find the PID of the process using port 9090 and kill it
lsof -i :9090
kill -9 <PID>
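A plain lsof -i :9090 also lists clients that are merely connected to the port, and kill -9 gives Prometheus no chance to shut down cleanly. The following is a minimal sketch that keeps only listening processes, assuming lsof's default column layout; the helper name listening_pids is made up for this example.

```shell
# Read "lsof -i" output on stdin and print the unique PIDs of processes
# that are LISTENing (connected clients on the same port are skipped).
listening_pids() {
    # NR > 1 skips the header row; the PID is the second column.
    awk 'NR > 1 && /\(LISTEN\)/ { print $2 }' | sort -un
}
```

One possible use is lsof -nP -i :9090 | listening_pids | xargs -r kill, which sends the default SIGTERM first; kill -9 can remain the fallback if the process does not exit.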
1.5 After startup, open Prometheus in a browser; the page shown in the figure below confirms that Prometheus started successfully:
http://10.0.0.123:9090/
2. Deploy node_exporter on the client 10.0.0.236:
2.1 Install and start node_exporter:
[root@zabbix ~]# wget https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz
[root@zabbix ~]# tar zxvf node_exporter-1.0.1.linux-amd64.tar.gz
[root@zabbix ~]# mv node_exporter-1.0.1.linux-amd64 /opt/node_exporter
[root@zabbix ~]# cd /opt/node_exporter/
[root@zabbix /opt/node_exporter]# nohup ./node_exporter &
2.2 Check that the port is listening:
[root@zabbix /opt/node_exporter]# tail -n 10 nohup.out
level=info ts=2020-06-24T01:34:43.927Z caller=node_exporter.go:112 collector=timex
level=info ts=2020-06-24T01:34:43.927Z caller=node_exporter.go:112 collector=udp_queues
level=info ts=2020-06-24T01:34:43.927Z caller=node_exporter.go:112 collector=uname
level=info ts=2020-06-24T01:34:43.927Z caller=node_exporter.go:112 collector=vmstat
level=info ts=2020-06-24T01:34:43.927Z caller=node_exporter.go:112 collector=xfs
level=info ts=2020-06-24T01:34:43.927Z caller=node_exporter.go:112 collector=zfs
level=info ts=2020-06-24T01:34:43.932Z caller=node_exporter.go:191 msg="Listening on" address=:9100
level=info ts=2020-06-24T01:34:43.932Z caller=tls_config.go:170 msg="TLS is disabled and it cannot be enabled on the fly." http2=false
2.3 From the internal network, view the client metrics at http://10.0.0.236:9100/metrics:
Note: if the information above appears, the data is being collected.
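The same check can be scripted instead of done in a browser. This is a small sketch that pulls one value out of the Prometheus text format; the helper name metric_value is an assumption, and it only handles simple samples whose name and labels contain no spaces.

```shell
# Read Prometheus text-format metrics on stdin and print the value of every
# sample whose first field (metric name plus labels, if any) equals $1.
# "# HELP" and "# TYPE" lines never match because their first field is "#".
metric_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}
```

For example, curl -s http://10.0.0.236:9100/metrics | metric_value node_load1 would print the 1-minute load average reported by node_exporter.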
3. Deploy mysqld_exporter on the client 10.0.0.236:
3.1 Install mysqld_exporter:
[root@zabbix ~]#wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.12.1/mysqld_exporter-0.12.1.linux-amd64.tar.gz
[root@zabbix ~]#tar zxvf mysqld_exporter-0.12.1.linux-amd64.tar.gz
[root@zabbix ~]# mv mysqld_exporter-0.12.1.linux-amd64 /opt/mysqld_exporter
3.2 Configure the database connection: create a monitoring user and grant it local and remote access:
Allow remote login:
mysql[(none)]>create user 'mysql_monitor'@'10.0.0.%' identified by 'monitor123456';
mysql[(none)]>grant replication client,process on *.* to mysql_monitor@"10.0.0.%" identified by "monitor123456";
mysql[(none)]>#grant select on performance_schema.* to mysql_monitor@"10.0.0.%";
Allow local login:
mysql[(none)]>create user 'mysql_monitor'@'localhost' identified by 'monitor123456';
mysql[(none)]>grant replication client,process on *.* to mysql_monitor@"localhost" identified by "monitor123456";
mysql[(none)]>#grant select on performance_schema.* to mysql_monitor@"localhost";
3.3 In the mysqld_exporter install directory, create the .my.cnf configuration file:
[root@zabbix ~]#cd /opt/mysqld_exporter
[root@zabbix /opt/mysqld_exporter]#vim .my.cnf
[client]
host=10.0.0.236
port=3306
user=mysql_monitor
password=monitor123456
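Because .my.cnf stores the password in clear text, it is reasonable to keep the file readable by its owner only. The sketch below writes the same five lines with restricted permissions; the function name write_my_cnf and its argument order are assumptions made for this example.

```shell
# Write a [client] section for mysqld_exporter, readable by the owner only.
# Arguments: path host port user password
write_my_cnf() {
    local path="$1" host="$2" port="$3" user="$4" password="$5"
    (
        umask 077   # a newly created file gets mode 600
        printf '[client]\nhost=%s\nport=%s\nuser=%s\npassword=%s\n' \
            "$host" "$port" "$user" "$password" > "$path"
    ) && chmod 600 "$path"   # also tighten a file that already existed
}
```

With the values used above, the call would be write_my_cnf /opt/mysqld_exporter/.my.cnf 10.0.0.236 3306 mysql_monitor monitor123456.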
3.4 Start mysqld_exporter and check the port:
[root@zabbix /opt/mysqld_exporter]#nohup ./mysqld_exporter --config.my-cnf=.my.cnf &
[root@zabbix /opt/mysqld_exporter]# tail -f nohup.out
time="2020-06-22T17:08:30+08:00" level=info msg="Build context (go=go1.12.7, user=root@0b3e56a7bc0a, date=20190729-12:35:58)" source="mysqld_exporter.go:258"
time="2020-06-22T17:08:30+08:00" level=info msg="Enabled scrapers:" source="mysqld_exporter.go:269"
time="2020-06-22T17:08:30+08:00" level=info msg=" --collect.global_status" source="mysqld_exporter.go:273"
time="2020-06-22T17:08:30+08:00" level=info msg=" --collect.global_variables" source="mysqld_exporter.go:273"
time="2020-06-22T17:08:30+08:00" level=info msg=" --collect.slave_status" source="mysqld_exporter.go:273"
time="2020-06-22T17:08:30+08:00" level=info msg=" --collect.info_schema.innodb_cmp" source="mysqld_exporter.go:273"
time="2020-06-22T17:08:30+08:00" level=info msg=" --collect.info_schema.innodb_cmpmem" source="mysqld_exporter.go:273"
time="2020-06-22T17:08:30+08:00" level=info msg=" --collect.info_schema.query_response_time" source="mysqld_exporter.go:273"
time="2020-06-22T17:08:30+08:00" level=info msg="Listening on :9104" source="mysqld_exporter.go:283"
time="2020-06-22T17:08:30+08:00" level=fatal msg="listen tcp :9104: bind: address already in use" source="mysqld_exporter.go:284"
Note: the final level=fatal line means another process, most likely an earlier mysqld_exporter instance, already holds port 9104; stop it before starting a new one.
3.5 From the internal network, view the MySQL metrics at http://10.0.0.236:9104/metrics
Note: if the information above appears, the data is being collected.
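A page full of metrics does not by itself prove that the exporter reached MySQL; mysqld_exporter reports that in its mysql_up gauge. This is a hedged sketch of a scriptable check; the helper name mysql_is_up is an assumption.

```shell
# Read exporter metrics on stdin; succeed only if a "mysql_up 1" sample exists.
mysql_is_up() {
    awk '$1 == "mysql_up" { found = 1; up = ($2 == 1) }
         END { exit !(found && up) }'
}
```

It could be used as curl -s http://10.0.0.236:9104/metrics | mysql_is_up && echo "MySQL reachable", which prints nothing when the exporter cannot log in with the .my.cnf credentials.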
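3.6 In the Prometheus UI, open Status -> Targets and check that the targets are listed; a green "UP" state means Prometheus is scraping them successfully.
Note: the mysql and node jobs for 10.0.0.236 were already added to the main configuration file when Prometheus was installed in section 1, so the steps are not repeated here. The result:
4. Deploy oracledb_exporter on the Oracle database server:
4.1 First install the Go environment on the Oracle database server: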
[root@PWSNBUTEST ~]# yum install -y go
[root@PWSNBUTEST ~]# go version
go version go1.13.11 linux/amd64
4.2 Install the Oracle client on the Oracle database server:
Download the Oracle Instant Client package: https://www.oracle.com/database/technologies/instant-client/linux-x86-64-downloads.html
Install the Oracle client RPM:
[root@PWSNBUTEST ~]# ll
-rw-r--r-- 1 root root 52826628 Jun 29 14:38 oracle-instantclient12.2-basic-12.2.0.1.0-1.x86_64.rpm
-rw-r--r-- 1 root root 708104 Jun 29 14:41 oracle-instantclient12.2-sqlplus-12.2.0.1.0-1.x86_64.rpm
[root@PWSNBUTEST ~]# rpm -ivh oracle-instantclient12.2-basic-12.2.0.1.0-1.x86_64.rpm
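Create the directory for the network configuration (use the version directory of the client you actually installed; the RPMs above are 12.2, while the paths below use 19.3):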
[root@PWSNBUTEST ~]#mkdir -p /usr/lib/oracle/19.3/client64/network/admin
Create the tnsnames.ora file (HOST, PORT and SERVICE_NAME must be set for your database):
[root@PWSNBUTEST ~]#vim /usr/lib/oracle/19.3/client64/network/admin/tnsnames.ora
ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = $ip)(PORT = $port))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = $sid)
    )
  )
Configure the environment variables: append the following lines to ~/.bashrc and leave the existing content untouched.
[root@PWSNBUTEST ~]#vim ~/.bashrc
export ORACLE_HOME=/usr/lib/oracle/19.3/client64
export TNS_ADMIN=$ORACLE_HOME/network/admin
export NLS_LANG='simplified chinese_china'.ZHS16GBK
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$ORACLE_HOME/bin:$PATH
Apply the new environment variables:
[root@PWSNBUTEST ~]#source ~/.bashrc
4.3 Install the SQL*Plus package of the Oracle client on the Oracle database server:
Download the Oracle client SQL*Plus package: https://www.oracle.com/database/technologies/instant-client/linux-x86-64-downloads.html
Install the SQL*Plus RPM:
[root@PWSNBUTEST ~]# ll
-rw-r--r-- 1 root root 52826628 Jun 29 14:38 oracle-instantclient12.2-basic-12.2.0.1.0-1.x86_64.rpm
-rw-r--r-- 1 root root 708104 Jun 29 14:41 oracle-instantclient12.2-sqlplus-12.2.0.1.0-1.x86_64.rpm
[root@PWSNBUTEST ~]# rpm -ivh oracle-instantclient12.2-sqlplus-12.2.0.1.0-1.x86_64.rpm
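4.3 Monitoring Oracle requires a third-party exporter:
Look up the Oracle exporter in the official exporter list: https://prometheus.io/docs/instrumenting/exporters/
The list links to the third-party Oracle exporter, which is a Git project: https://github.com/iamseth/oracledb_exporter
Upload the file below to the target server. The Oracle client must be installed, otherwise the exporter cannot connect to the database. The binary downloaded here is oracledb_exporter.linux-amd64, chosen because the installed Oracle client is version 12 (the 12.2 RPMs above). The client version must match the one the oracledb_exporter build expects, otherwise oracledb_exporter will not start.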
[root@PWSNBUTEST ~]# ls -ltr
-rw-r--r-- 1 oracle dba 5502288 9月 5 13:57 oracledb_exporter.linux-amd64
[root@PWSNBUTEST ~]#chmod +x oracledb_exporter.linux-amd64
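Next, set the environment variable the exporter reads by running export directly on the command line (the system user is used here):
export DATA_SOURCE_NAME=<user>/<password>@<database service name>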
export DATA_SOURCE_NAME=user/password@//myhost:1521/service
For example: export DATA_SOURCE_NAME=system/oracle@//ip:1521/testdb
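The connection string follows the fixed pattern user/password@//host:port/service shown above. The small sketch below assembles it from its parts so that a typo in the separators is less likely; the function name build_dsn is an assumption, and it does no escaping of special characters in the password.

```shell
# Build an oracledb_exporter DATA_SOURCE_NAME value.
# Arguments: user password host port service
build_dsn() {
    local user="$1" password="$2" host="$3" port="$4" service="$5"
    printf '%s/%s@//%s:%s/%s\n' "$user" "$password" "$host" "$port" "$service"
}
```

It could be used as export DATA_SOURCE_NAME="$(build_dsn user password myhost 1521 service)", which yields the same string as the generic form above.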
Start the service in the background:
[root@PWSNBUTEST ~]# cd /prometheus/
[root@PWSNBUTEST prometheus]# nohup ./oracledb_exporter &
Check nohup.out for errors:
[root@PWSNBUTEST prometheus]# cat nohup.out
time="2020-06-29T15:39:46+08:00" level=info msg="Starting oracledb_exporter 0.0.5" source="main.go:394"
time="2020-06-29T15:39:47+08:00" level=info msg="Listening on :9161" source="main.go:402"
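Verify in a browser that data is being served: http://<database IP>:9161/
4.4 Add the job to prometheus.yml on the Prometheus server and restart the service (stop the running Prometheus process first, otherwise the new one fails with the "address already in use" error from section 1.4):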
[root@prome /opt/prometheus]# vim prometheus.yml
  - job_name: 'oracledb'
    static_configs:
    - targets: ['<Oracle database server IP>:9161']
[root@prometheus /opt/prometheus]# nohup ./prometheus --config.file=./prometheus.yml &
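4.5 In the Prometheus UI, open Status -> Targets and check that the target is listed; a green "UP" state means Prometheus is scraping it successfully.
5. Deploy Grafana on the server:
5.1 Download the Grafana package on the Prometheus server: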
[root@prometheus ~]#wget https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm/grafana-6.0.0-1.x86_64.rpm
5.2 Install and start Grafana:
[root@prometheus ~]# mv grafana-6.0.0-1.x86_64.rpm /opt/
[root@prometheus ~]# cd /opt/
[root@prometheus /opt/]# rpm -ivh grafana-6.0.0-1.x86_64.rpm
[root@prometheus ~]# systemctl start grafana-server
[root@prometheus ~]# systemctl enable grafana-server
5.3 Check the Grafana port (Grafana listens on port 3000):
[root@prometheus ~]# netstat -lntup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 7100/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 7332/master
tcp6 0 0 :::9100 :::* LISTEN 14514/./node_export
tcp6 0 0 :::22 :::* LISTEN 7100/sshd
tcp6 0 0 :::3000 :::* LISTEN 15019/grafana-serve
tcp6 0 0 ::1:25 :::* LISTEN 7332/master
tcp6 0 0 :::9090 :::* LISTEN 15144/./prometheus
udp 0 0 127.0.0.1:323 0.0.0.0:* 6448/chronyd
udp6 0 0 ::1:323 :::* 6448/chronyd
5.4 Open Grafana in a browser (http://10.0.0.123:3000/; initial username/password: admin/admin):
6. Have Grafana pull data from Prometheus:
6.1 Add a data source with the following basic configuration:
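The data source can also be created through Grafana's HTTP API (POST /api/datasources) instead of the web form. The sketch below only builds the JSON body; the helper name datasource_payload is an assumption, as are the choices of proxy access and making the source the default, and the values are not JSON-escaped.

```shell
# Print a JSON body describing a Prometheus data source for Grafana.
# Arguments: name url
datasource_payload() {
    local name="$1" url="$2"
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' \
        "$name" "$url"
}
```

With the hosts used in this guide, one possible call is curl -s -u admin:admin -H 'Content-Type: application/json' -X POST http://10.0.0.123:3000/api/datasources -d "$(datasource_payload prometheus http://10.0.0.123:9090)".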
6.2 Import a dashboard for node monitoring:
Step 1: Go to https://grafana.com/grafana/dashboards to find a host-monitoring dashboard; search for Node Exporter:
Step 2: Open the first result and copy its dashboard ID, 8919
Step 3: Back in Grafana, go to Dashboard -> Import
Step 4: Enter dashboard ID 8919, select prometheus as the data source, then click Import:
6.3 Import a dashboard for MySQL monitoring:
Step 1: Go to https://grafana.com/grafana/dashboards, search for the keyword mysql overview, and find the matching dashboard, ID 7362:
Step 2: Back in Grafana, go to Dashboard -> Import
Step 3: Enter dashboard ID 7362 and select prometheus as the data source
6.4 Import a dashboard for Oracle monitoring:
Step 1: Go to https://grafana.com/grafana/dashboards and search for the keyword oracle:
Step 2: Pick the template you need and copy its dashboard ID, 3333 or 11121
Step 3: Back in Grafana, go to Dashboard -> Import
Step 4: Enter dashboard ID 3333 and select prometheus as the data source