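Original article: https://blog.51cto.com/john88wang/1745824
I. Background
Our production services use RabbitMQ as their message-queue middleware, so monitoring RabbitMQ is an important job for the operations team. This article walks through, from start to finish, how to monitor RabbitMQ with Zabbix.
II. What to monitor in RabbitMQ
RabbitMQ officially provides two ways to manage and monitor a broker.
1. Managing and monitoring with rabbitmqctl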
Usage:
rabbitmqctl [-n
List virtual hosts
# rabbitmqctl list_vhosts
List queues
# rabbitmqctl list_queues
List exchanges
# rabbitmqctl list_exchanges
List users
# rabbitmqctl list_users
List connections
# rabbitmqctl list_connections
List consumers
# rabbitmqctl list_consumers
Show the application environment
# rabbitmqctl environment
Show the number of unacknowledged messages in each queue
# rabbitmqctl list_queues name messages_unacknowledged
Show the memory used by each queue, in bytes
# rabbitmqctl list_queues name memory
Show the number of messages ready for delivery in each queue
# rabbitmqctl list_queues name messages_ready
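To use these rabbitmqctl listings from a script, the tab-separated output has to be parsed. The sketch below is a minimal example, assuming Python 3.7+ and that `-q` suppresses the informational banner; lines with the wrong number of fields and a header row (which some newer versions may print) are skipped anyway. The function names `parse_list_queues` and `run_list_queues` are hypothetical.

```python
import subprocess


def parse_list_queues(output, columns):
    """Parse tab-separated `rabbitmqctl list_queues <columns...>` output into dicts."""
    rows = []
    for line in output.splitlines():
        fields = line.split("\t")
        # Skip lines with the wrong number of fields (e.g. a "Listing queues ..."
        # banner when several columns are requested) and any header row.
        if len(fields) != len(columns) or fields == list(columns):
            continue
        # Counters come back as text; convert purely numeric fields to int.
        rows.append({c: int(v) if v.isdigit() else v for c, v in zip(columns, fields)})
    return rows


def run_list_queues(columns=("name", "messages_ready", "messages_unacknowledged")):
    """Run rabbitmqctl on the local node (needs sufficient privileges) and parse it."""
    out = subprocess.check_output(["rabbitmqctl", "-q", "list_queues", *columns], text=True)
    return parse_list_queues(out, list(columns))
```

Only `parse_list_queues` is pure; keeping it separate from the subprocess call makes it easy to test against captured output.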
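2. Monitoring and managing with the RabbitMQ Management plugin
Enable the Management plugin
# rabbitmq-plugins enable rabbitmq_management
http://172.28.2.157:15672/
Open a URL like this one in a browser to view the state of RabbitMQ.
http://172.28.2.157:15672/cli/rabbitmqadmin
Download the rabbitmqadmin management tool from this URL.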
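Get the list of vhosts
# curl -i -u guest:guest http://localhost:15672/api/vhosts
Get the list of channels, sorted by publish rate in descending order and limited to selected columns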
# curl -i -u guest:guest "http://localhost:15672/api/channels?sort=message_stats.publish_details.rate&sort_reverse=true&columns=name,message_stats.publish_details.rate,message_stats.deliver_get_details.rate"
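The same request can be built from Python. This is a sketch using only the Python 3 standard library; `build_api_url` and `make_request` are hypothetical helper names, the host and guest/guest credentials are simply the defaults used in the curl examples, and actually sending the request needs a reachable broker.

```python
import base64
import urllib.parse
import urllib.request


def build_api_url(host, path, params=None, port=15672, scheme="http"):
    """Build a Management API URL such as http://localhost:15672/api/channels?..."""
    url = "{0}://{1}:{2}/api/{3}".format(scheme, host, port, path)
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def make_request(url, user="guest", password="guest"):
    """Equivalent of `curl -u user:password`: attach an HTTP Basic Authorization header."""
    token = base64.b64encode("{0}:{1}".format(user, password).encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", "Basic " + token)
    return req


channels_url = build_api_url(
    "localhost",
    "channels",
    [
        ("sort", "message_stats.publish_details.rate"),
        ("sort_reverse", "true"),
        ("columns", "name,message_stats.publish_details.rate,"
                    "message_stats.deliver_get_details.rate"),
    ],
)
request = make_request(channels_url)
# Against a live broker: json.loads(urllib.request.urlopen(request).read())
```

urlencode() percent-encodes the commas in the columns list as %2C, which the server decodes like any other query string.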
Show overview information
# curl -i -u guest:guest "http://localhost:15672/api/overview"
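management_version: version of the management plugin
cluster_name: name of the whole RabbitMQ cluster, set with rabbitmqctl set_cluster_name
publish: total number of messages published (under message_stats)
queue_totals: total, ready and unacknowledged message counts across all queues
statistics_db_event_queue: number of events not yet processed by the statistics database
consumers: number of consumers
queues: number of queues
exchanges: number of exchanges
connections: number of connections
channels: number of channels
(consumers, queues, exchanges, connections and channels are reported under object_totals.)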
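To show how these fields are nested in the JSON, the helper below (hypothetical name `summarize_overview`) pulls them out of an already parsed /api/overview response. The sample document is trimmed and made up for illustration. Sections such as message_stats may be absent or empty on a broker that has not moved any messages yet, hence the `or {}` fallbacks.

```python
import json


def summarize_overview(overview):
    """Flatten the counters described above from a parsed /api/overview response."""
    object_totals = overview.get("object_totals") or {}
    queue_totals = overview.get("queue_totals") or {}
    message_stats = overview.get("message_stats") or {}
    summary = {}
    for name in ("connections", "channels", "consumers", "exchanges", "queues"):
        summary[name] = object_totals.get(name, 0)
    for name in ("messages", "messages_ready", "messages_unacknowledged"):
        summary[name] = queue_totals.get(name, 0)
    summary["publish"] = message_stats.get("publish", 0)
    summary["statistics_db_event_queue"] = overview.get("statistics_db_event_queue", 0)
    return summary


# Trimmed, made-up sample; a real response contains many more fields.
sample = json.loads("""
{"management_version": "3.5.3",
 "message_stats": {"publish": 3870},
 "queue_totals": {"messages": 5, "messages_ready": 3, "messages_unacknowledged": 2},
 "object_totals": {"consumers": 4, "queues": 2, "exchanges": 8,
                   "connections": 3, "channels": 6},
 "statistics_db_event_queue": 0}
""")
print(summarize_overview(sample))
```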
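Show node information
# curl -i -u guest:guest "http://localhost:15672/api/nodes"
disk_free: free disk space, in bytes
disk_free_limit: disk alarm threshold, in bytes
fd_used: number of file descriptors in use
fd_total: number of file descriptors available
io_read_avg_time: average time of a read operation, in milliseconds
io_read_bytes: total bytes read from disk
io_read_count: total number of read operations
io_seek_avg_time: average time of a seek operation, in milliseconds
io_seek_count: total number of seek operations
io_sync_avg_time: average time of an fsync operation, in milliseconds
io_sync_count: total number of fsync operations
io_write_avg_time: average time of a disk write operation, in milliseconds
io_write_bytes: total bytes written to disk
io_write_count: total number of disk write operations
mem_used: memory in use, in bytes
mem_limit: memory alarm threshold, in bytes; by default 40% of total physical memory
mnesia_disk_tx_count: number of Mnesia transactions that required a disk write
mnesia_ram_tx_count: number of Mnesia transactions that did not require a disk write
msg_store_write_count: number of messages written to the message store
msg_store_read_count: number of messages read from the message store
proc_used: number of Erlang processes in use
proc_total: maximum number of Erlang processes
queue_index_journal_write_count: number of records written to the queue index journal. Each record represents a message that was published to a queue, delivered from a queue, or acknowledged in a queue
queue_index_read_count: number of records read from the queue index
queue_index_write_count: number of records written to the queue index
sockets_used: number of file descriptors used as sockets
partitions: list of network partitions this node is seeing
uptime: time since the Erlang VM started, in milliseconds
run_queue: number of Erlang processes waiting to run
processors: number of cores detected and usable by Erlang
net_ticktime: the node's current kernel net_ticktime setting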
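The most useful alarms come from pairs of these fields: fd_used/fd_total, mem_used/mem_limit, proc_used/proc_total, and disk_free compared with disk_free_limit. The sketch below computes those ratios for one entry of the /api/nodes list. The function names are hypothetical, and the thresholds (80% of a limit, twice the disk limit) are arbitrary example values, not RabbitMQ defaults.

```python
def node_usage(node):
    """Usage ratios for one entry of the /api/nodes list (1.0 means at the limit)."""
    return {
        "fd": node["fd_used"] / node["fd_total"],
        "mem": node["mem_used"] / node["mem_limit"],
        "proc": node["proc_used"] / node["proc_total"],
    }


def near_limit(node, threshold=0.8, disk_margin=2.0):
    """Names of resources that look close to their limit.

    threshold: flag fd/mem/proc when used/limit >= threshold.
    disk_margin: flag disk when disk_free < disk_margin * disk_free_limit
    (the disk alarm itself fires once disk_free drops below disk_free_limit).
    """
    usage = node_usage(node)
    flagged = [name for name in ("fd", "mem", "proc") if usage[name] >= threshold]
    if node["disk_free"] < disk_margin * node["disk_free_limit"]:
        flagged.append("disk")
    return flagged
```

In Zabbix the same logic would normally live in triggers on the raw items; the sketch only makes the relationships between the fields explicit.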
Show channel information
# curl -i -u guest:guest "http://localhost:15672/api/channels"
Show exchange information
# curl -i -u guest:guest "http://localhost:15672/api/exchanges"
Show queue information
# curl -i -u guest:guest "http://localhost:15672/api/queues"
Show vhost information
# curl -i -u guest:guest "http://localhost:15672/api/vhosts/?name=/"
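When a query targets a specific vhost, for example /api/queues/<vhost> or /api/queues/<vhost>/<name>, the vhost is part of the URL path and must be percent-encoded, so the default vhost / becomes %2F (the script below uses the equivalent lowercase %2f). A small sketch with a hypothetical helper name, assuming Python 3:

```python
from urllib.parse import quote


def queue_url(host, vhost, queue=None, port=15672):
    """Build /api/queues/<vhost>[/<queue>] with path segments percent-encoded.

    safe="" makes quote() encode "/" too, so the default vhost "/" becomes %2F
    instead of being read as a path separator.
    """
    segments = [quote(vhost, safe="")]
    if queue is not None:
        segments.append(quote(queue, safe=""))
    return "http://{0}:{1}/api/queues/{2}".format(host, port, "/".join(segments))
```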
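III. Writing the monitoring script and adding the Zabbix configuration file
The monitoring script has three parts: monitoring the overview, monitoring the node information of the current host, and monitoring each queue.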
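The queue and node parts rely on Zabbix low-level discovery: list_queues() and list_nodes() in the script return lists of macro dictionaries such as {#VHOSTNAME}/{#QUEUENAME}. Zabbix expects discovery output as JSON with those entries under a "data" key (Zabbix 4.2 and later also accept a bare array). A minimal Python 3 sketch with the hypothetical helper name `queue_discovery`, working on an already parsed /api/queues list:

```python
import json


def queue_discovery(queues):
    """Turn a parsed /api/queues list into Zabbix low-level discovery JSON."""
    data = [{"{#VHOSTNAME}": q["vhost"], "{#QUEUENAME}": q["name"]} for q in queues]
    return json.dumps({"data": data}, sort_keys=True)


print(queue_discovery([{"vhost": "/", "name": "app000", "messages": 0}]))
```

Item prototypes can then reference {#VHOSTNAME} and {#QUEUENAME} in keys such as rabbitmq.queues[{#VHOSTNAME},queue_messages,{#QUEUENAME}].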
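It is adapted from a script found online: I added many new monitoring items and removed the filter used in the original script.
A side note: don't use code from the Internet as-is. Analyze it against your own requirements, which also improves your own coding skills; if you only ever copy and paste, you will never get better.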
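The queue part of the script does not answer one Zabbix item at a time; it writes one line per item into a temporary file and pushes the whole batch with zabbix_sender -i. Each input line has the form `<host> <key> <value>`, where `-` stands for the host name taken from the configuration file or the -s option. The Python 3 sketch below shows the same idea with hypothetical helper names; it builds an argument list for subprocess.run() instead of a shell string, which avoids quoting problems with vhost or queue names. The zabbix_sender and agent configuration paths are the ones used in the script.

```python
import os
import tempfile


def sender_line(vhost, item, queue_name, value):
    """One zabbix_sender input line; '-' means 'use the default host name'."""
    key = '"rabbitmq.queues[{0},queue_{1},{2}]"'.format(vhost, item, queue_name)
    return "- {0} {1}".format(key, value)


def write_sender_file(lines):
    """Write the lines to a temp file that survives close(); the caller deletes it."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("\n".join(lines) + "\n")
        return f.name


def sender_command(path, conf="/opt/app/zabbix/conf/zabbix_agentd.conf",
                   sender="/opt/app/zabbix/sbin/zabbix_sender", host=None):
    """argv list for subprocess.run(); no shell=True, so no shell quoting issues."""
    cmd = [sender, "-c", conf, "-i", path]
    if host:
        cmd += ["-s", host]
    return cmd
```

In use, the caller would run subprocess.run(sender_command(path)) and then remove the file with os.unlink(path), which is what check_queue() in the script does after sending.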
rabbitmq_status.py
#!/usr/bin/env /usr/bin/python
'''Python module to query the RabbitMQ Management Plugin REST API and get
results that can then be used by Zabbix.
https://github.com/jasonmcintosh/rabbitmq-zabbix
'''
'''
This script is tested on RabbitMQ 3.5.3
'''
import json
import optparse
import socket
import urllib2
import subprocess
import tempfile
import os
import logging

logging.basicConfig(filename='/opt/logs/zabbix/rabbitmq_zabbix.log', level=logging.WARNING,
                    format='%(asctime)s %(levelname)s: %(message)s')

class RabbitMQAPI(object):
    '''Class for RabbitMQ Management API'''

    def __init__(self, user_name='guest', password='guest', host_name='',
                 protocol='http', port=15672, conf='/opt/app/zabbix/conf/zabbix_agentd.conf',
                 senderhostname=None):
        self.user_name = user_name
        self.password = password
        self.host_name = host_name or socket.gethostname()
        self.protocol = protocol
        self.port = port
        self.conf = conf or '/opt/app/zabbix/conf/zabbix_agentd.conf'
        self.senderhostname = senderhostname if senderhostname else host_name

    def call_api(self, path):
        '''
        All URIs serve only resources of type application/json and require HTTP
        basic authentication. The default username and password are guest/guest.
        %2f is the encoding of the default virtual host '/'.
        '''
        url = '{0}://{1}:{2}/api/{3}'.format(self.protocol, self.host_name, self.port, path)
        password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(None, url, self.user_name, self.password)
        handler = urllib2.HTTPBasicAuthHandler(password_mgr)
        logging.debug('Issue a rabbit API call to get data on ' + path)
        # json.loads() converts JSON text into Python objects;
        # json.dumps() does the reverse
        return json.loads(urllib2.build_opener(handler).open(url).read())

    def list_queues(self):
        ''' curl -i -u guest:guest http://localhost:15672/api/queues
        return a list
        '''
        queues = []
        for queue in self.call_api('queues'):
            logging.debug("Discovered queue " + queue['name'])
            element = {'{#VHOSTNAME}': queue['vhost'],
                       '{#QUEUENAME}': queue['name']}
            queues.append(element)
            logging.debug('Discovered queue ' + queue['vhost'] + '/' + queue['name'])
        return queues

    def list_nodes(self):
        '''Lists all rabbitMQ nodes in the cluster'''
        nodes = []
        for node in self.call_api('nodes'):
            # We need to return the node name, because Zabbix
            # does not support @ as an item parameter
            name = node['name'].split('@')[1]
            element = {'{#NODENAME}': name,
                       '{#NODETYPE}': node['type']}
            nodes.append(element)
            logging.debug('Discovered nodes ' + name + '/' + node['type'])
        return nodes

    def check_queue(self):
        '''Send the details of every queue to Zabbix and return zabbix_sender's exit code.'''
        return_code = 0
        # tempfile creates a named temporary file on disk; because delete=False,
        # it is not removed when closed, so zabbix_sender can still read it
        rdatafile = tempfile.NamedTemporaryFile(delete=False)
        for queue in self.call_api('queues'):
            self._get_queue_data(queue, rdatafile)
        rdatafile.close()
        return_code = self._send_queue_data(rdatafile)
        # os.unlink() removes the temporary file
        os.unlink(rdatafile.name)
        return return_code

    def _get_queue_data(self, queue, tmpfile):
        '''Prepare the queue data for sending'''
        '''
        ### one single queue's information like this #####
        ### curl -i -u guest:guest http://localhost:15672/api/queues dumps a list ###
        {"memory":32064,"message_stats":{"ack":3870,"ack_details":{"rate":0.0},"deliver":3871,"deliver_details":{"rate":0.0},"deliver_get":3871,"deliver_get_details":{"rate":0.0},"disk_writes":3870,"disk_writes_details":{"rate":0.0},"publish":3870,"publish_details":{"rate":0.0},"redeliver":1,"redeliver_details":{"rate":0.0}},"messages":0,"messages_details":{"rate":0.0},"messages_ready":0,"messages_ready_details":{"rate":0.0},"messages_unacknowledged":0,"messages_unacknowledged_details":{"rate":0.0},"idle_since":"2016-03-01 22:04:22","consumer_utilisation":"","policy":"","exclusive_consumer_tag":"","consumers":4,"recoverable_slaves":"","state":"running","messages_ram":0,"messages_ready_ram":0,"messages_unacknowledged_ram":0,"messages_persistent":0,"message_bytes":0,"message_bytes_ready":0,"message_bytes_unacknowledged":0,"message_bytes_ram":0,"message_bytes_persistent":0,"disk_reads":0,"disk_writes":3870,"backing_queue_status":{"q1":0,"q2":0,"delta":["delta",0,0,0],"q3":0,"q4":0,"len":0,"target_ram_count":"infinity","next_seq_id":3870,"avg_ingress_rate":0.060962064328682466,"avg_egress_rate":0.060962064328682466,"avg_ack_ingress_rate":0.060962064328682466,"avg_ack_egress_rate":0.060962064328682466},"name":"app000","vhost":"/","durable":true,"auto_delete":false,"arguments":{},"node":"rabbit@test2"}
        '''
        for item in ['memory', 'messages', 'messages_ready', 'messages_unacknowledged', 'consumers']:
            # key = rabbitmq.queues[/,queue_memory,queue.helloWorld]
            key = '"rabbitmq.queues[{0},queue_{1},{2}]"'.format(queue['vhost'], item, queue['name'])
            # if item is in queue, value = queue[item], else value = 0
            value = queue.get(item, 0)
            logging.debug("SENDER_DATA: - %s %s" % (key, value))
            tmpfile.write("- %s %s\n" % (key, value))
        ## This is a non standard bit of information added after the standard items
        for item in ['deliver_get', 'publish']:
            key = '"rabbitmq.queues[{0},queue_message_stats_{1},{2}]"'.format(queue['vhost'], item, queue['name'])
            value = queue.get('message_stats', {}).get(item, 0)
            logging.debug("SENDER_DATA: - %s %s" % (key, value))
            tmpfile.write("- %s %s\n" % (key, value))

    def _send_queue_data(self, tmpfile):
        '''Send the queue data to Zabbix.'''
        '''Get key value from temp file. '''
        args = '/opt/app/zabbix/sbin/zabbix_sender -c {0} -i {1}'
        if self.senderhostname:
            args = args + " -s " + self.senderhostname
        return_code = 0
        process = subprocess.Popen(args.format(self.conf, tmpfile.name), shell=True,
                                   stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = process.communicate()
        logging.debug("Finished sending data")
        return_code = process.wait()
        logging.info("Found return code of " + str(return_code))
        if return_code != 0:
            logging.warning(out)
            logging.warning(err)
        else:
            logging.debug(err)
            logging.debug(out)
        return return_code

    def check_aliveness(self):
        '''Check the aliveness status of a given vhost. '''
        '''virtual host '/' should be encoded as '%2f' '''
        return self.call_api('aliveness-test/%2f')['status']

    def check_overview(self, item):
        '''First, check the overview specific items'''
        ''' curl -i -u guest:guest http://localhost:15672/api/overview '''
        ## rabbitmq[overview,connections]
        if item in ['channels', 'connections', 'consumers', 'exchanges', 'queues']:
            return self.call_api('overview').get('object_totals').get(item, 0)
        ## rabbitmq[overview,messages]
        elif item in ['messages', 'messages_ready', 'messages_unacknowledged']:
            return self.call_api('overview').get('queue_totals').get(item, 0)
        elif item == 'message_stats_deliver_get':
            return self.call_api('overview').get('message_stats', {}).get('deliver_get', 0)
        elif item == 'message_stats_publish':
            return self.call_api('overview').get('message_stats', {}).get('publish', 0)
        elif item == 'message_stats_ack':
            return self.call_api('overview').get('message_stats', {}).get('ack', 0)
        elif item == 'message_stats_redeliver':
            return self.call_api('overview').get('message_stats', {}).get('redeliver', 0)
        elif item == 'rabbitmq_version':
            return self.call_api('overview').get('rabbitmq_version', 'None')

    def check_server(self, item, node_name):
        '''Return the value for a specific item in a node's details. '''
        '''curl -i -u guest:guest http://localhost:15672/api/nodes'''
        '''return a list'''
        # hostname hk-prod-mq1.example.com
        # self.call_api('nodes')[0]['name'] rabbit@hk-prod-mq1
        node_name = node_name.split('.')[0]
        for nodeData in self.call_api('nodes'