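Monitoring server hardware and software requires a good monitoring tool, and Prometheus is one such tool.
It can monitor a large number of databases and application servers (via ...exporter programs, each essentially an agent). Prometheus can also monitor the nodes themselves, including CPU, memory, disk and network usage.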
// --------------------------------------------------------------------------------
References
https://www.digitalocean.com/community/tutorials/how-to-use-prometheus-to-monitor-your-ubuntu-14-04-server
// --------------------------------------------------------------------------------
Downloads
https://github.com/prometheus/prometheus/releases/download/0.15.1/prometheus-0.15.1.linux-amd64.tar.gz
https://github.com/prometheus/node_exporter/releases/download/0.11.0/node_exporter-0.11.0.linux-amd64.tar.gz
Download node_exporter into /opt/linuxsir, then extract it. Adjust the file name if your version number differs:
cd /opt/linuxsir
mkdir node_exporter
cd node_exporter
tar -xzvf ../node_exporter-0.11.0.linux-amd64.tar.gz
Download Prometheus into /opt/linuxsir, then extract it. Adjust the file name if your version number differs:
cd /opt/linuxsir
mkdir prometheus
cd prometheus
tar -xzvf ../prometheus-0.15.1.linux-amd64.tar.gz
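The download-and-extract steps can also be scripted. The sketch below is hypothetical: `release_url` and `extract` are illustrative helper names. The URL pattern is inferred from the two download links above, so it may not hold for other releases.

```python
import tarfile
from pathlib import Path


def release_url(project: str, version: str, platform: str = "linux-amd64") -> str:
    """Build a release tarball URL (hypothetical helper; pattern inferred from the links above)."""
    return (
        f"https://github.com/prometheus/{project}/releases/download/"
        f"{version}/{project}-{version}.{platform}.tar.gz"
    )


def extract(tarball: Path, dest: Path) -> list[str]:
    """Python equivalent of: mkdir dest && cd dest && tar -xzvf tarball."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as tar:
        tar.extractall(dest)  # only use with trusted archives such as official releases
        return tar.getnames()


if __name__ == "__main__":
    base = Path("/opt/linuxsir")  # download directory assumed by the steps above
    for project, version in [("node_exporter", "0.11.0"), ("prometheus", "0.15.1")]:
        tarball = base / f"{project}-{version}.linux-amd64.tar.gz"
        print(release_url(project, version), "->", tarball)
        # once the file is downloaded: extract(tarball, base / project)
```

Downloading itself is left to wget or a browser. The sketch only prints where each file comes from and where it is expected to land.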
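Configuration file
Create the configuration file prometheus.yml in /opt/linuxsir/prometheus with the following content (target_groups is the syntax of this 0.15.x release; later Prometheus versions name this key static_configs):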
scrape_configs:
  - job_name: "node"
    scrape_interval: "5s"
    target_groups:
      - targets: ['192.168.31.119:9100']
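YAML expresses nesting through indentation, so the leading spaces in prometheus.yml matter. job_name, scrape_interval and target_groups are keys of one list item under scrape_configs. As a sketch, the file can be generated for any list of targets so the indentation is always right. The helper `render_config` is hypothetical and emits the 0.15.x layout.

```python
def render_config(job: str, targets: list[str], interval: str = "5s") -> str:
    """Render prometheus.yml in the 0.15.x layout (target_groups).

    Indentation carries the nesting: job_name, scrape_interval and
    target_groups are keys of one list item under scrape_configs.
    """
    target_list = ", ".join(f"'{t}'" for t in targets)
    return (
        "scrape_configs:\n"
        f'  - job_name: "{job}"\n'
        f'    scrape_interval: "{interval}"\n'
        "    target_groups:\n"
        f"      - targets: [{target_list}]\n"
    )


if __name__ == "__main__":
    print(render_config("node", ["192.168.31.119:9100"]), end="")
```

Redirect the output into the file, e.g. `python3 render_config.py > /opt/linuxsir/prometheus/prometheus.yml` (the script name is an assumption).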
For details, see:
https://www.digitalocean.com/community/tutorials/how-to-use-prometheus-to-monitor-your-ubuntu-14-04-server
// --------------------------------------------------------------------------------
Start node_exporter (it listens on port 9100):
cd /opt/linuxsir
cd node_exporter
./node_exporter &
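To stop node_exporter, find its process ID (run as root; otherwise netstat hides the PIDs of processes owned by other users):
netstat -ntlp|grep 9100
The last column shows PID/program name, e.g. 1234/node_exporter. Stop that process:
kill <PID>
If it does not exit, use kill -9 <PID>.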
Start Prometheus (its web UI listens on port 9090):
cd /opt/linuxsir
cd prometheus
./prometheus --config.file=prometheus.yml &
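To stop Prometheus, find its process ID (again as root):
netstat -ntlp|grep 9090
The last column shows PID/program name, e.g. 5678/prometheus. Stop that process:
kill <PID>
A plain kill sends SIGTERM, which lets Prometheus shut down cleanly and write its data to disk. Use kill -9 <PID> only if it does not exit.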
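Both stop procedures look up the PID of the process listening on a port and then signal it. Below is a minimal Python sketch of the same idea. It assumes the net-tools `netstat` is installed and the script runs as root; `pid_listening_on` and `stop` are hypothetical names. Unlike `grep 9100`, it compares the port exactly, so a socket on port 19100 is not matched by mistake.

```python
import os
import signal
import subprocess
from typing import Optional


def pid_listening_on(netstat_output: str, port: int) -> Optional[int]:
    """Return the PID listening on `port` in `netstat -ntlp` output, or None."""
    for line in netstat_output.splitlines():
        cols = line.split()
        # Data rows: Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
        if len(cols) < 7 or cols[5] != "LISTEN":
            continue  # skip header lines and non-listening rows
        local_port = cols[3].rsplit(":", 1)[-1]  # works for 0.0.0.0:9100 and :::9100
        if local_port == str(port) and cols[6] != "-":  # "-" means the PID is hidden
            return int(cols[6].split("/", 1)[0])
    return None


def stop(port: int) -> bool:
    """Send SIGTERM (what a plain `kill <PID>` sends) to the process on `port`."""
    out = subprocess.run(["netstat", "-ntlp"], capture_output=True, text=True).stdout
    pid = pid_listening_on(out, port)
    if pid is None:
        return False
    os.kill(pid, signal.SIGTERM)
    return True
```

Call `stop(9100)` for node_exporter and `stop(9090)` for Prometheus.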
Open the Prometheus web UI:
http://192.168.31.119:9090/
Node information, including CPU, memory, disk and network usage, is shown on the console page:
http://192.168.31.119:9090/consoles/node.html
// --------------------------------------------------------------------------------
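Sample queries
Enter the following expressions in the Prometheus web UI at http://192.168.31.119:9090/ to display the data Prometheus has collected. These metric names come from node_exporter 0.11.0; newer node_exporter releases renamed several of them, e.g. node_cpu became node_cpu_seconds_total.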
CPU (percentage of time spent in user mode and in system mode, averaged over all cores)
sum(rate(node_cpu{job='node',mode='user'}[5m])) * 100 / count(count by (cpu)(node_cpu{job='node'}))
sum(rate(node_cpu{job='node',mode='system'}[5m])) * 100 / count(count by (cpu)(node_cpu{job='node'}))
Add the two results to get the combined user + system CPU usage.
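The CPU query works as follows. node_cpu is a per-core counter of seconds spent in each mode, so its rate is the fraction of each second a core spent in that mode. Summing over cores and dividing by the number of cores (the `count by (cpu)` part) gives an average percentage. The sketch below uses a simplified two-sample rate; PromQL's rate() also extrapolates to the window edges and handles counter resets. The function names and sample values are hypothetical.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase between the first and last (timestamp, value) samples.

    Simplified stand-in for PromQL rate(): no extrapolation, no counter resets.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def cpu_mode_percent(per_cpu: dict[str, list[tuple[float, float]]]) -> float:
    """Mirror of sum(rate(node_cpu{mode=M}[5m])) * 100 / count(count by (cpu)(node_cpu)).

    per_cpu maps a cpu label (e.g. "cpu0") to node_cpu samples for one mode M.
    """
    busy = sum(simple_rate(samples) for samples in per_cpu.values())  # CPU-seconds per second
    return busy * 100 / len(per_cpu)  # len(per_cpu) plays the role of the core count


# Two cores observed over a 5-minute (300 s) window.
user = cpu_mode_percent({"cpu0": [(0, 1000), (300, 1150)], "cpu1": [(0, 2000), (300, 2030)]})
system = cpu_mode_percent({"cpu0": [(0, 400), (300, 430)], "cpu1": [(0, 500), (300, 530)]})
print(f"user {user:.1f}%  system {system:.1f}%  total {user + system:.1f}%")
```

Here cpu0 spent 150 s of 300 s in user mode (0.5) and cpu1 spent 30 s (0.1), so user usage is (0.5 + 0.1) * 100 / 2 = 30%.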
Memory (total bytes, free bytes, and the free fraction)
node_memory_MemTotal{job='node'}
node_memory_MemFree{job='node'}
node_memory_MemFree{job='node'}/node_memory_MemTotal{job='node'}
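The third memory query is a plain division of two gauges, both in bytes. A hypothetical formatting helper shows how the ratio reads as a percentage. The sample values are made up.

```python
def mem_summary(mem_total: int, mem_free: int) -> str:
    """Format node_memory_MemFree / node_memory_MemTotal (both in bytes)."""
    ratio = mem_free / mem_total  # same value as the third memory query
    gib = 1024 ** 3
    return f"{mem_free / gib:.1f} GiB free of {mem_total / gib:.1f} GiB ({ratio:.1%})"


print(mem_summary(8 * 1024 ** 3, 2 * 1024 ** 3))  # 2.0 GiB free of 8.0 GiB (25.0%)
```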
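Disk (bytes read and written per second on sda; node_disk_sectors_* counts 512-byte sectors, hence the * 512)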
rate(node_disk_sectors_read{job='node', device='sda' }[5m]) * 512
rate(node_disk_sectors_written{job='node', device='sda' }[5m]) * 512
Add the two results to get the total disk throughput.
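The disk queries convert sector counters into bytes. rate() gives sectors per second, and each sector counted in /proc/diskstats is 512 bytes. Below is a worked sketch with made-up counter values. The function name is hypothetical, and the two-sample rate ignores rate()'s extrapolation.

```python
SECTOR_BYTES = 512  # sector counters count 512-byte units, hence "* 512" in the queries


def disk_bytes_per_second(sectors_start: float, sectors_end: float, seconds: float) -> float:
    """rate(node_disk_sectors_*[5m]) * 512, simplified to two samples of the counter."""
    return (sectors_end - sectors_start) / seconds * SECTOR_BYTES


read_bps = disk_bytes_per_second(1_000, 61_000, 300)  # 200 sectors/s
write_bps = disk_bytes_per_second(0, 30_000, 300)     # 100 sectors/s
print(read_bps, write_bps, read_bps + write_bps)      # 102400.0 51200.0 153600.0
```

The last value (153600 bytes/s, i.e. 150 KiB/s) corresponds to adding the read and write queries.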
Network (bytes received and transmitted per second on each interface, excluding the loopback interface lo)
rate(node_network_receive_bytes{job='node', device!='lo'}[5m])
rate(node_network_transmit_bytes{job='node', device!='lo'}[5m])
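The network queries return one series per interface. The `device!='lo'` matcher drops loopback, and rate() turns the byte counters into bytes per second, treating any drop in a counter (e.g. after a reboot) as a reset. The sketch below mimics that reset handling but omits rate()'s extrapolation. It assumes the samples span the whole window; function names and values are hypothetical.

```python
def counter_increase(values: list[float]) -> float:
    """Total increase of a counter; a drop is treated as a reset to zero, as rate() does.

    Simplified: PromQL's rate() additionally extrapolates to the window edges.
    """
    increase = 0.0
    for prev, cur in zip(values, values[1:]):
        increase += (cur - prev) if cur >= prev else cur
    return increase


def network_rates(samples: dict[str, list[float]], window_seconds: float) -> dict[str, float]:
    """Bytes/second per interface, skipping loopback like the device!='lo' matcher."""
    return {
        device: counter_increase(values) / window_seconds
        for device, values in samples.items()
        if device != "lo"
    }


# eth1's counter drops from 8000 to 1000, i.e. it was reset during the window.
rx = network_rates(
    {"lo": [0, 900_000], "eth0": [0, 30_000, 60_000], "eth1": [5_000, 8_000, 1_000]},
    window_seconds=300,
)
print(rx)
```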
// --------------------------------------------------------------------------------
Further reading
Understanding machine CPU usage:
https://www.robustperception.io/understanding-machine-cpu-usage/
Querying Prometheus from Go and exporting to CSV:
https://github.com/ryotarai/prometheus-query
Querying Prometheus from Python and exporting to CSV:
https://www.robustperception.io/prometheus-query-results-as-csv/