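1. Problem statement
At work, the VMs in our OpenStack cluster need basic performance monitoring. Installing node_exporter by hand on every newly started VM and then editing prometheus.yml each time is a nightmare for lazy programmers like us, so we designed a monitoring setup based on Prometheus + Consul.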
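2. Solution
1. Bake node_exporter into the image so that it is deployed automatically on every VM.
2. Write a small program that registers node_exporter with Consul automatically, and bake it into the image alongside node_exporter.
3. Configure Prometheus to discover node_exporter instances through Consul.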
3. Deploying the Consul cluster
3.1 Cluster plan
OS | Hostname | IP |
---|---|---|
CentOS 7.7 | compute-7-1 | 172.16.100.71 |
CentOS 7.7 | compute-7-2 | 172.16.100.72 |
CentOS 7.7 | compute-7-3 | 172.16.100.73 |
3.1 Downloading and installing Consul
Consul v1.7.2
Install Consul on every node:
$ wget https://releases.hashicorp.com/consul/1.7.2/consul_1.7.2_linux_amd64.zip
$ unzip consul_1.7.2_linux_amd64.zip
$ mv consul /usr/bin/
$ mkdir -p /data/consul
$ mkdir -p /etc/consul.d
$ useradd consul
$ chown -R consul:consul /data/consul
Edit the configuration file on each node (compute-7-1, compute-7-2, and compute-7-3 in turn):
$ vim /etc/consul.d/consul_config.json
{
"bootstrap_expect": 1,
"datacenter": "sibat_consul",
"data_dir": "/data/consul",
"node_name": "compute-7-1",
"server": true,
"client_addr": "0.0.0.0",
"ui": true,
"bind_addr": "172.16.100.71"
}
$ vim /etc/consul.d/consul_config.json
{
"bootstrap_expect": 1,
"datacenter": "sibat_consul",
"data_dir": "/data/consul",
"node_name": "compute-7-2",
"server": true,
"client_addr": "0.0.0.0",
"ui": true,
"bind_addr": "172.16.100.72"
}
$ vim /etc/consul.d/consul_config.json
{
"bootstrap_expect": 1,
"datacenter": "sibat_consul",
"data_dir": "/data/consul",
"node_name": "compute-7-3",
"server": true,
"client_addr": "0.0.0.0",
"ui": true,
"bind_addr": "172.16.100.73"
}
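The three files differ only in `node_name` and `bind_addr`, so they can be generated from the cluster plan instead of being edited by hand. A minimal Python sketch follows; the `NODES` mapping and the `render_config` helper are illustrative names of my own, not part of Consul.

```python
import json

# Hostname -> IP, copied from the cluster plan table above.
NODES = {
    "compute-7-1": "172.16.100.71",
    "compute-7-2": "172.16.100.72",
    "compute-7-3": "172.16.100.73",
}

def render_config(node_name: str, bind_addr: str) -> str:
    """Return the initial consul_config.json content for one node."""
    config = {
        "bootstrap_expect": 1,
        "datacenter": "sibat_consul",
        "data_dir": "/data/consul",
        "node_name": node_name,
        "server": True,
        "client_addr": "0.0.0.0",
        "ui": True,
        "bind_addr": bind_addr,
    }
    return json.dumps(config, indent=2)

if __name__ == "__main__":
    for name, ip in NODES.items():
        print(f"# {name}")
        print(render_config(name, ip))
```

Each printed block can then be written to /etc/consul.d/consul_config.json on the matching host.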
On every node, create the systemd unit, start Consul, and enable it at boot:
$ vim /usr/lib/systemd/system/consul.service
[Unit]
Description=Consul agent
Documentation=https://www.consul.io/docs
[Service]
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -config-file /etc/consul.d/consul_config.json
KillMode=process
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
$ systemctl daemon-reload && systemctl start consul && systemctl enable consul
3.2 Configuring the master token
Bootstrap the master token:
$ curl \
--request PUT \
http://172.16.100.71:8500/v1/acl/bootstrap
`{"ID":"8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"}`
Generate the gossip encryption key (the `encrypt` setting):
$ consul keygen
gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=
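The key printed by `consul keygen` is a base64-encoded random value; the one above decodes to 32 bytes. The sketch below shows how such a key could be produced and checked in Python. `new_gossip_key` and `key_length` are hypothetical helper names, and in practice `consul keygen` itself remains the safer choice.

```python
import base64
import os

def new_gossip_key(num_bytes: int = 32) -> str:
    """Return num_bytes of OS randomness, base64-encoded like a gossip key."""
    return base64.b64encode(os.urandom(num_bytes)).decode("ascii")

def key_length(key: str) -> int:
    """Decode a base64 key and return its length in bytes."""
    return len(base64.b64decode(key, validate=True))

if __name__ == "__main__":
    # Length of the key generated in this article, then of a fresh one.
    print(key_length("gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8="))
    print(key_length(new_gossip_key()))
```

A length check like this is a cheap way to catch a key that was truncated when copied between nodes.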
3.3 Applying the master token
compute-7-1:
{
"bootstrap_expect": 1,
"datacenter": "sibat_consul",
"primary_datacenter":"sibat_consul",
"data_dir": "/data/consul",
"start_join":[
"172.16.100.72",
"172.16.100.73"
],
"retry_join":[
"172.16.100.72",
"172.16.100.73"
],
"connect":{
"enabled": true
},
"server": true,
"client_addr": "0.0.0.0",
"ui": true,
"node_name": "compute-7-1",
"bind_addr": "172.16.100.71",
"advertise_addr": "172.16.100.71",
"enable_script_checks": false,
"enable_local_script_checks": true,
"log_file": "/var/log",
"log_rotate_bytes": 300000000,
"log_rotate_duration": "360h",
"log_level": "info",
"encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
"acl": {
"enabled": true,
"default_policy": "deny",
"enable_token_persistence": true,
"tokens": {
"master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
}
}
}
compute-7-2
{
"datacenter": "sibat_consul",
"primary_datacenter":"sibat_consul",
"data_dir": "/data/consul",
"connect":{
"enabled": true
},
"server": true,
"client_addr": "0.0.0.0",
"ui": true,
"node_name": "compute-7-2",
"bind_addr": "172.16.100.72",
"advertise_addr": "172.16.100.72",
"enable_script_checks": false,
"enable_local_script_checks": true,
"log_file": "/var/log",
"log_rotate_bytes": 300000000,
"log_rotate_duration": "360h",
"log_level": "info",
"acl_datacenter": "sibat_consul",
"encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
"acl": {
"enabled": true,
"default_policy": "deny",
"enable_token_persistence": true,
"tokens": {
"master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
}
}
}
compute-7-3
{
"datacenter": "sibat_consul",
"primary_datacenter":"sibat_consul",
"data_dir": "/data/consul",
"connect":{
"enabled": true
},
"server": true,
"client_addr": "0.0.0.0",
"ui": true,
"node_name": "compute-7-3",
"bind_addr": "172.16.100.73",
"advertise_addr": "172.16.100.73",
"enable_script_checks": false,
"enable_local_script_checks": true,
"log_file": "/var/log",
"log_rotate_bytes": 300000000,
"log_rotate_duration": "360h",
"log_level": "info",
"acl_datacenter": "sibat_consul",
"encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
"acl": {
"enabled": true,
"default_policy": "deny",
"enable_token_persistence": true,
"tokens": {
"master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
}
}
}
Restart Consul on all three nodes.
Restart the slave nodes (compute-7-2 and compute-7-3) first:
$ systemctl restart consul
$ systemctl restart consul
Then restart the master node (compute-7-1):
$ systemctl restart consul
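After the restart, the server logs show permission-related errors. According to the official documentation, this happens because no agent token is configured, so we also need to create an agent token: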
$ curl --request PUT --header "X-Consul-Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c" --data '{
"Name": "Agent Token",
"Type": "client",
"Rules": "node \"\" { policy = \"write\" } service \"\" { policy = \"read\" }"
}' http://172.16.100.71:8500/v1/acl/create
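The nested quoting in the `Rules` field is easy to get wrong by hand. As a sketch, the same request body can be built in Python and serialized with `json.dumps`, which takes care of the escaping. `agent_token_payload` is an illustrative helper of my own, and nothing is sent to Consul here.

```python
import json

# ACL rules in the same form as the curl call above:
# write access to all nodes, read access to all services.
AGENT_RULES = 'node "" { policy = "write" } service "" { policy = "read" }'

def agent_token_payload(name: str = "Agent Token") -> str:
    """Return the JSON body for the /v1/acl/create request used above."""
    return json.dumps({"Name": name, "Type": "client", "Rules": AGENT_RULES})

if __name__ == "__main__":
    print(agent_token_payload())
```

The printed string can be passed to `curl --data` unchanged, wrapped in single quotes.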
3.4 Applying the agent token
compute-7-1:
{
"bootstrap_expect": 1,
"datacenter": "sibat_consul",
"primary_datacenter":"sibat_consul",
"data_dir": "/data/consul",
"start_join":[
"172.16.100.72",
"172.16.100.73"
],
"retry_join":[
"172.16.100.72",
"172.16.100.73"
],
"connect":{
"enabled": true
},
"server": true,
"client_addr": "0.0.0.0",
"ui": true,
"node_name": "compute-7-1",
"bind_addr": "172.16.100.71",
"advertise_addr": "172.16.100.71",
"enable_script_checks": false,
"enable_local_script_checks": true,
"log_file": "/var/log",
"log_rotate_bytes": 300000000,
"log_rotate_duration": "360h",
"log_level": "info",
"encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
"acl": {
"enabled": true,
"default_policy": "deny",
"enable_token_persistence": true,
"tokens": {
"master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
"agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
}
}
}
compute-7-2
{
"datacenter": "sibat_consul",
"primary_datacenter":"sibat_consul",
"data_dir": "/data/consul",
"connect":{
"enabled": true
},
"server": true,
"client_addr": "0.0.0.0",
"ui": true,
"node_name": "compute-7-2",
"bind_addr": "172.16.100.72",
"advertise_addr": "172.16.100.72",
"enable_script_checks": false,
"enable_local_script_checks": true,
"log_file": "/var/log",
"log_rotate_bytes": 300000000,
"log_rotate_duration": "360h",
"log_level": "info",
"acl_datacenter": "sibat_consul",
"encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
"acl": {
"enabled": true,
"default_policy": "deny",
"enable_token_persistence": true,
"tokens": {
"master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
"agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
}
}
}
compute-7-3
{
"datacenter": "sibat_consul",
"primary_datacenter":"sibat_consul",
"data_dir": "/data/consul",
"connect":{
"enabled": true
},
"server": true,
"client_addr": "0.0.0.0",
"ui": true,
"node_name": "compute-7-3",
"bind_addr": "172.16.100.73",
"advertise_addr": "172.16.100.73",
"enable_script_checks": false,
"enable_local_script_checks": true,
"log_file": "/var/log",
"log_rotate_bytes": 300000000,
"log_rotate_duration": "360h",
"log_level": "info",
"acl_datacenter": "sibat_consul",
"encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
"acl": {
"enabled": true,
"default_policy": "deny",
"enable_token_persistence": true,
"tokens": {
"master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
"agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
}
}
}
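Compared with section 3.3, the only change in these files is the extra `agent` entry under `acl.tokens`. A hedged sketch of applying that change programmatically rather than retyping each file is shown below; `add_agent_token` is a hypothetical helper, and the sample input is trimmed to the relevant keys.

```python
import json

def add_agent_token(config_text: str, agent_token: str) -> str:
    """Return the config with acl.tokens.agent set, leaving other keys as-is."""
    config = json.loads(config_text)
    tokens = config.setdefault("acl", {}).setdefault("tokens", {})
    tokens["agent"] = agent_token
    return json.dumps(config, indent=2)

if __name__ == "__main__":
    before = """
    {
      "node_name": "compute-7-2",
      "acl": {
        "enabled": true,
        "tokens": {"master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"}
      }
    }
    """
    print(add_agent_token(before, "883efc94-0c59-c46f-67cf-4644ac4adad2"))
```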
Restart Consul on all three nodes.
Restart the slave nodes (compute-7-2 and compute-7-3) first:
$ systemctl restart consul
$ systemctl restart consul
Then restart the master node (compute-7-1):
$ systemctl restart consul
Once the cluster has stabilized, the UI is available at http://172.16.100.71:8500
4. Integrating with Prometheus
$ sudo vim /etc/prometheus/prometheus.yml
...
  - job_name: 'OpenStack-vms'
    consul_sd_configs:
      - server: "172.16.100.71:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
      - server: "172.16.100.72:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
      - server: "172.16.100.73:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: ".*OpenStack-vms.*"
        replacement: OpenStack-vms
        action: keep
        target_label: env
      - regex: __meta_consul_service_metadata_(.+)
        action: labelmap
...
$ sudo systemctl restart prometheus
After the restart, the job_name configured above shows up in the Prometheus UI.
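The two relabel rules in this job can be emulated in a few lines to see which targets survive. The sketch below is a simplified model in Python, not Prometheus code: `keep` and `labelmap` are my own function names, the sample target is hypothetical, and the model ignores `replacement` and `target_label`, which the `keep` action does not use as far as I know. It relies on two documented behaviours: relabel regexes are anchored at both ends, and Consul service discovery joins the tags with commas and wraps the result in commas.

```python
import re

def keep(labels: dict, source_label: str, pattern: str) -> bool:
    """Emulate `action: keep`: the anchored regex must match the label value."""
    return re.fullmatch(pattern, labels.get(source_label, "")) is not None

def labelmap(labels: dict, pattern: str) -> dict:
    """Emulate `action: labelmap` with the default replacement `$1`."""
    mapped = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(pattern, name)
        if match:
            mapped[match.group(1)] = value
    return mapped

if __name__ == "__main__":
    # Hypothetical discovered target with two tags and one metadata entry.
    target = {
        "__address__": "10.0.0.5:9100",
        "__meta_consul_tags": ",OpenStack-vms,node-exporter,",
        "__meta_consul_service_metadata_project": "demo",
    }
    if keep(target, "__meta_consul_tags", ".*OpenStack-vms.*"):
        print(labelmap(target, "__meta_consul_service_metadata_(.+)"))
```

One consequence worth checking in a real deployment: a service registered without a tag containing `OpenStack-vms` is dropped by the first rule.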
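5. Automatic VM registration
Problem: none of the stock components offers a particularly clean way to register automatically. I first used a shell script in rc.local that registered via curl, but registrations were sometimes missed, and the parallel service startup on CentOS 7 made the process unreliable. I also found that Consul does not behave as a strongly consistent registry: the same service ID can sometimes end up registered on different nodes at the same time.
So I wrote a small Go program that registers node_exporter automatically. It is started at boot by systemd and uses an algorithm to avoid duplicate registrations and to spread registrations evenly across the Consul nodes.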
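The article does not describe the algorithm used by the Go program, so the following is only one plausible approach, sketched in Python under my own assumptions: derive a stable service ID from the VM's address and port, then hash that ID to pick a single Consul server. The same VM then always registers with the same server (no duplicates), while different VMs spread across the servers. The helper names, the service name, and the `/metrics` health check are all hypothetical; the JSON field names follow Consul's `/v1/agent/service/register` API.

```python
import hashlib
import json

CONSUL_SERVERS = [
    "172.16.100.71:8500",
    "172.16.100.72:8500",
    "172.16.100.73:8500",
]

def service_id(address: str, port: int) -> str:
    """Derive a stable ID so re-registering the same VM overwrites itself."""
    return f"node-exporter-{address}-{port}"

def pick_server(sid: str, servers: list) -> str:
    """Map a service ID to one server deterministically via a hash."""
    digest = hashlib.sha256(sid.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

def registration_body(address: str, port: int) -> str:
    """Build the JSON body for PUT /v1/agent/service/register."""
    return json.dumps({
        "ID": service_id(address, port),
        "Name": "node-exporter",
        "Tags": ["node-exporter"],
        "Address": address,
        "Port": port,
        "Check": {
            "HTTP": f"http://{address}:{port}/metrics",
            "Interval": "5s",
            "Timeout": "5s",
            "DeregisterCriticalServiceAfter": "5s",
        },
    })

if __name__ == "__main__":
    sid = service_id("10.0.0.5", 9100)
    print(pick_server(sid, CONSUL_SERVERS))
    print(registration_body("10.0.0.5", 9100))
```

A hash-based choice needs no coordination between VMs, at the cost of remapping some VMs whenever the server list changes.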
5.1 Node_Exporter
$ wget https://github.com/prometheus/node_exporter/releases/download/v1.0.0/node_exporter-1.0.0.linux-amd64.tar.gz
$ tar -zxvf node_exporter-1.0.0.linux-amd64.tar.gz -C /usr/local/
$ mv /usr/local/node_exporter-1.0.0.linux-amd64 /usr/local/node_exporter
$ vim /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter: the monitoring system
Documentation=http://prometheus.io/docs/
[Service]
User=nobody
ExecStart=/usr/local/node_exporter/node_exporter
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
$ systemctl daemon-reload && systemctl start node_exporter && systemctl enable node_exporter
5.2 The consulR registration program
Install consulR:
$ wget https://github.com/FrankenFuncc/consul-registy-service/releases/download/202006161758/consulR.zip
$ unzip consulR.zip
$ cd consulR
$ chmod +x consulR
$ mv consulR /usr/local/
$ mkdir -p /data/consul/logs /etc/consul
Configuration file:
$ vim /etc/consul/consulR.yaml
System:
  ServiceName: consul-registy-service
  ListenAddress: 0.0.0.0
  Port: 9984
  # This IP and port are used to determine the IP address of the outbound NIC
  FindAddress: 8.8.8.8:80
Logs:
  LogFilePath: /data/consul/consul.log
  LogLevel: info
Consul:
  Address: 172.16.100.71:8500,172.16.100.72:8500,172.16.100.73:8500
  # Consul master token
  Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c
  CheckTimeout: 5s
  CheckInterval: 5s
  CheckDeregisterCriticalServiceAfter: true
  CheckDeregisterCriticalServiceAfterTime: 5s
Service:
  Tag: node-exporter
  # If Address is empty, the outbound NIC IP found via FindAddress is used
  Address:
  Port: 9100
$ chown -R nobody.nobody /etc/consul/consulR.yaml
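The `FindAddress` setting suggests the common trick of asking the kernel which local address it would use to reach a remote endpoint. Whether consulR does exactly this is an assumption on my part; the Python sketch below, with the hypothetical helper `outbound_ip`, only illustrates the technique.

```python
import socket

def outbound_ip(probe: str = "8.8.8.8:80"):
    """Return the local IP the kernel would use to reach `probe`, or None."""
    host, port = probe.rsplit(":", 1)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket only selects a route; no packet is sent.
        sock.connect((host, int(port)))
        return sock.getsockname()[0]
    except OSError:
        return None  # no route to the probe address
    finally:
        sock.close()

if __name__ == "__main__":
    print(outbound_ip())
```

Because nothing is transmitted, the probe address does not need to be reachable; it only has to be routable from the VM.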
Manage it with systemd:
$ vim /usr/lib/systemd/system/consulR.service
[Unit]
Description=Consul
After=network-online.target
[Service]
User=nobody
ExecStart=/usr/local/consulR --confpath=/etc/consul/consulR.yaml
Restart=on-failure
RestartSec=1
[Install]
WantedBy=multi-user.target
Start it and enable it at boot:
$ systemctl daemon-reload && systemctl start consulR && systemctl enable consulR
Shut down the VM:
$ poweroff
Build the image:
$ qemu-img convert -c -O qcow2 disk centos-fantasy.qcow2
$ openstack image create "CentOS7-Fantasy" --file centos-fantasy.qcow2 --disk-format qcow2 --container-format bare --public
Any VM created from this image automatically registers port 9100 with the Consul cluster and is then discovered by Prometheus.
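6. Visualization
Import dashboard template 8919 into Grafana.
The instance selector then lists the automatically discovered hosts together with their monitoring details.
Pretty simple, right?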