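I. Use Case
Our production services use RabbitMQ as the message queue middleware, so monitoring RabbitMQ is an important task for the operations team. This article walks through, from start to finish, how to monitor RabbitMQ with Zabbix.
II. Key RabbitMQ Monitoring Points
RabbitMQ officially provides two ways to manage and monitor a broker.
1. Managing and monitoring with rabbitmqctl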
Usage:
rabbitmqctl [-n
List virtual hosts
# rabbitmqctl list_vhosts
List queues
# rabbitmqctl list_queues
List exchanges
# rabbitmqctl list_exchanges
List users
# rabbitmqctl list_users
List connections
# rabbitmqctl list_connections
List consumers
# rabbitmqctl list_consumers
Show the broker environment
# rabbitmqctl environment
Show the number of unacknowledged messages in each queue
# rabbitmqctl list_queues name messages_unacknowledged
Show the memory used by each queue
# rabbitmqctl list_queues name memory
Show the number of messages ready for delivery in each queue
# rabbitmqctl list_queues name messages_ready
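The list_queues commands above print one queue per line, which makes them easy to post-process. The following is a minimal Python 3 sketch, assuming the output is tab-separated and may contain a banner or header line. The function names and the sample text are hypothetical and only illustrate the idea.

```python
def parse_list_queues(output, columns):
    """Parse tab-separated `rabbitmqctl list_queues` output into a list of dicts.

    Lines whose field count differs from len(columns) (for example a
    "Listing queues ..." banner) and a header row repeating the column
    names are skipped.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) != len(columns) or fields == list(columns):
            continue
        rows.append(dict(zip(columns, fields)))
    return rows


def queues_over_threshold(rows, column, threshold):
    """Return the names of queues whose numeric `column` exceeds `threshold`."""
    return [row["name"] for row in rows if int(row[column]) > threshold]


# Hypothetical sample of: rabbitmqctl list_queues name messages_unacknowledged
sample = "Listing queues ...\napp000\t0\napp001\t25\n"
rows = parse_list_queues(sample, ["name", "messages_unacknowledged"])
print(queues_over_threshold(rows, "messages_unacknowledged", 10))
```

In practice the text would come from running the command (for example with subprocess) rather than from a literal string.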
2. Monitoring and managing with the RabbitMQ Management plugin
Enable the Management plugin
# rabbitmq-plugins enable rabbitmq_management
http://172.28.2.157:15672/
Opening a URL like the one above in a browser shows the status of RabbitMQ.
http://172.28.2.157:15672/cli/rabbitmqadmin
The rabbitmqadmin management tool can be downloaded from the URL above.
Get the list of vhosts
# curl -i -u guest:guest http://localhost:15672/api/vhosts
Get the list of channels, sorted by publish rate and restricted to selected columns
# curl -i -u guest:guest "http://localhost:15672/api/channels?sort=message_stats.publish_details.rate&sort_reverse=true&columns=name,message_stats.publish_details.rate,message_stats.deliver_get_details.rate"
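The sort, sort_reverse and columns parameters in the request above can also be assembled programmatically. This is a small Python 3 sketch. The helper name api_url is hypothetical, and the port and protocol defaults simply mirror the curl examples.

```python
from urllib.parse import urlencode


def api_url(host, path, params=None, port=15672, protocol="http"):
    """Build a management API URL; `params` is a list of (name, value) pairs."""
    url = "{0}://{1}:{2}/api/{3}".format(protocol, host, port, path)
    if params:
        # Keep commas unescaped so the columns list stays readable.
        url += "?" + urlencode(params, safe=",")
    return url


url = api_url("localhost", "channels", [
    ("sort", "message_stats.publish_details.rate"),
    ("sort_reverse", "true"),
    ("columns", "name,message_stats.publish_details.rate,"
                "message_stats.deliver_get_details.rate"),
])
print(url)
```

The printed URL is the same one passed to curl above.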
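Show overview information
# curl -i -u guest:guest "http://localhost:15672/api/overview"
management_version Version of the management plugin
cluster_name Name of the whole RabbitMQ cluster, set with rabbitmqctl set_cluster_name
publish Total number of messages published
queue_totals Totals of ready messages, unacknowledged messages, uncommitted messages, and so on
statistics_db_event_queue Number of events not yet processed by the statistics database
consumers Number of consumers
queues Number of queues
exchanges Number of exchanges
connections Number of connections
channels Number of channels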
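To show how these fields are grouped in the response, here is a minimal Python 3 sketch that picks a single metric out of an already decoded overview document. It assumes the counters sit under object_totals, queue_totals and message_stats, which is how the script later in this article reads them. The sample values are made up.

```python
import json

OBJECT_ITEMS = ("channels", "connections", "consumers", "exchanges", "queues")
QUEUE_ITEMS = ("messages", "messages_ready", "messages_unacknowledged")
STATS_PREFIX = "message_stats_"


def overview_metric(overview, item):
    """Return one metric from a decoded /api/overview response (0 if absent)."""
    if item in OBJECT_ITEMS:
        return overview.get("object_totals", {}).get(item, 0)
    if item in QUEUE_ITEMS:
        return overview.get("queue_totals", {}).get(item, 0)
    if item.startswith(STATS_PREFIX):
        return overview.get("message_stats", {}).get(item[len(STATS_PREFIX):], 0)
    return overview.get(item, 0)


# Hypothetical, heavily trimmed response body.
sample = json.loads("""
{"management_version": "3.5.3",
 "object_totals": {"consumers": 4, "queues": 2, "exchanges": 9,
                   "connections": 3, "channels": 5},
 "queue_totals": {"messages": 7, "messages_ready": 5,
                  "messages_unacknowledged": 2},
 "message_stats": {"publish": 3870, "deliver_get": 3871}}
""")
print(overview_metric(sample, "queues"))
print(overview_metric(sample, "messages_ready"))
print(overview_metric(sample, "message_stats_publish"))
```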
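Show node information
# curl -i -u guest:guest "http://localhost:15672/api/nodes"
disk_free Free disk space, in bytes
disk_free_limit Threshold of free disk space below which the disk alarm goes off
fd_used Number of file descriptors in use
fd_total Number of file descriptors available
io_read_avg_time Average time of each read operation, in milliseconds
io_read_bytes Total number of bytes read from disk
io_read_count Total number of read operations
io_seek_avg_time Average time of each seek operation, in milliseconds
io_seek_count Total number of seek operations
io_sync_avg_time Average time of each fsync operation, in milliseconds
io_sync_count Total number of fsync operations
io_write_avg_time Average time of each disk write operation, in milliseconds
io_write_bytes Total number of bytes written to disk
io_write_count Total number of disk write operations
mem_used Memory used, in bytes
mem_limit Memory alarm threshold; by default 40% of total physical memory
mnesia_disk_tx_count Number of Mnesia transactions that required writes to disk
mnesia_ram_tx_count Number of Mnesia transactions that did not require writes to disk
msg_store_write_count Number of messages written to the message store
msg_store_read_count Number of messages read from the message store
proc_used Number of Erlang processes in use
proc_total Maximum number of Erlang processes
queue_index_journal_write_count Number of records written to the queue index journal. Each record represents a message being published to a queue, delivered from a queue, or acknowledged in a queue
queue_index_read_count Number of records read from the queue index
queue_index_write_count Number of records written to the queue index
sockets_used Number of file descriptors used as sockets
partitions List of network partitions this node is seeing
uptime Time since the Erlang VM started, in milliseconds
run_queue Number of Erlang processes waiting to run
processors Number of cores detected and usable by Erlang
net_ticktime Current kernel net_ticktime setting for the node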
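Several of these fields come in used/limit pairs, so they are more useful for alerting as ratios than as raw numbers. The sketch below (Python 3) derives a few such values from one decoded node entry. The function name, the output keys and the sample numbers are illustrative assumptions, not part of the API.

```python
def node_usage(node):
    """Derive percentage usage figures from one entry of /api/nodes."""

    def percent(used, total):
        # Avoid division by zero if a limit is reported as 0.
        return round(100.0 * used / total, 2) if total else 0.0

    return {
        "fd_pct": percent(node["fd_used"], node["fd_total"]),
        "mem_pct": percent(node["mem_used"], node["mem_limit"]),
        "proc_pct": percent(node["proc_used"], node["proc_total"]),
        # True when free disk space has dropped below the alarm threshold.
        "disk_below_limit": node["disk_free"] < node["disk_free_limit"],
    }


# Hypothetical values for a single node.
sample_node = {
    "fd_used": 50, "fd_total": 200,
    "mem_used": 100, "mem_limit": 400,
    "proc_used": 10, "proc_total": 1000,
    "disk_free": 10, "disk_free_limit": 5,
}
print(node_usage(sample_node))
```

A Zabbix trigger could then fire on, say, fd_pct above some chosen percentage instead of hard-coding absolute descriptor counts per host.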
View channel information
# curl -i -u guest:guest "http://localhost:15672/api/channels"
View exchange information
# curl -i -u guest:guest "http://localhost:15672/api/exchanges"
View queue information
# curl -i -u guest:guest "http://localhost:15672/api/queues"
View vhost information
# curl -i -u guest:guest "http://localhost:15672/api/vhosts/?name=/"
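When a vhost name appears as a path segment, the default vhost "/" has to be percent-encoded, and every request needs HTTP basic authentication. Here is a minimal Python 3 sketch of both steps using only the standard library. The helper names are hypothetical, and guest/guest is just the default account used in the curl examples.

```python
import base64
from urllib.parse import quote
from urllib.request import Request


def queue_path(vhost, queue):
    """Return the API path of a single queue, percent-encoding both parts."""
    return "queues/{0}/{1}".format(quote(vhost, safe=""), quote(queue, safe=""))


def build_request(url, user="guest", password="guest"):
    """Create a request object carrying an HTTP basic authentication header."""
    credentials = "{0}:{1}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return Request(url, headers={"Authorization": "Basic " + token})


path = queue_path("/", "app000")
request = build_request("http://localhost:15672/api/" + path)
print(path)
print(request.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would perform the actual call. That step is left out here so the sketch runs without a broker.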
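III. Writing the Monitoring Script and the Zabbix Configuration File
The monitoring script covers three parts: the overview, the node information of the current host, and the individual queues.
It is based on a script found online, modified to add many more monitored items and to drop the filter feature of the original script.
A side note: code found online should not be used as-is. Analyze it against your own requirements, which also improves your own coding skills. If you only ever copy and paste, you will never get better.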
rabbitmq_status.py
#!/usr/bin/env /usr/bin/python
'''Python module to query the RabbitMQ Management Plugin REST API and get
results that can then be used by Zabbix.
https://github.com/jasonmcintosh/rabbitmq-zabbix
'''
'''
This script is tested on RabbitMQ 3.5.3
'''
import json
import optparse
import socket
import urllib2
import subprocess
import tempfile
import os
import logging

logging.basicConfig(filename='/opt/logs/zabbix/rabbitmq_zabbix.log',
                    level=logging.WARNING,
                    format='%(asctime)s %(levelname)s: %(message)s')


class RabbitMQAPI(object):
    '''Class for RabbitMQ Management API'''

    def __init__(self, user_name='guest', password='guest', host_name='',
                 protocol='http', port=15672,
                 conf='/opt/app/zabbix/conf/zabbix_agentd.conf',
                 senderhostname=None):
        self.user_name = user_name
        self.password = password
        self.host_name = host_name or socket.gethostname()
        self.protocol = protocol
        self.port = port
        self.conf = conf or '/opt/app/zabbix/conf/zabbix_agentd.conf'
        self.senderhostname = senderhostname if senderhostname else host_name

    def call_api(self, path):
        '''
        All URIs serve only resources of type application/json and require
        HTTP basic authentication. The default username and password is
        guest/guest.
        %2f is the encoding of the default virtual host '/'
        '''
        url = '{0}://{1}:{2}/api/{3}'.format(self.protocol, self.host_name,
                                             self.port, path)
        password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(None, url, self.user_name, self.password)
        handler = urllib2.HTTPBasicAuthHandler(password_mgr)
        logging.debug('Issue a rabbit API call to get data on ' + path)
        # json.loads() converts JSON text to Python data;
        # json.dumps() converts Python data to JSON text
        return json.loads(urllib2.build_opener(handler).open(url).read())

    def list_queues(self):
        '''
        curl -i -u guest:guest http://localhost:15672/api/queues
        return a list
        '''
        queues = []
        for queue in self.call_api('queues'):
            logging.debug("Discovered queue " + queue['name'])
            element = {'{#VHOSTNAME}': queue['vhost'],
                       '{#QUEUENAME}': queue['name']}
            queues.append(element)
            logging.debug('Discovered queue ' + queue['vhost'] + '/' + queue['name'])
        return queues

    def list_nodes(self):
        '''Lists all rabbitMQ nodes in the cluster'''
        nodes = []
        for node in self.call_api('nodes'):
            # We need to return the node name, because Zabbix
            # does not support @ as an item parameter
            name = node['name'].split('@')[1]
            element = {'{#NODENAME}': name,
                       '{#NODETYPE}': node['type']}
            nodes.append(element)
            logging.debug('Discovered nodes ' + name + '/' + node['type'])
        return nodes

    def check_queue(self):
        '''Collect the data of every queue and send it to Zabbix.'''
        return_code = 0
        # Create a named temporary file. Because delete=False, it is not
        # removed on close, so zabbix_sender can still read it afterwards.
        rdatafile = tempfile.NamedTemporaryFile(delete=False)
        for queue in self.call_api('queues'):
            self._get_queue_data(queue, rdatafile)
        rdatafile.close()
        return_code = self._send_queue_data(rdatafile)
        # os.unlink removes the temporary file
        os.unlink(rdatafile.name)
        return return_code

    def _get_queue_data(self, queue, tmpfile):
        '''Prepare the queue data for sending'''
        '''
        ### one single queue's information looks like this #####
        ### curl -i -u guest:guest http://localhost:15672/api/queues dumps a list ###
        {"memory":32064,"message_stats":{"ack":3870,"ack_details":{"rate":0.0},"deliver":3871,"deliver_details":{"rate":0.0},"deliver_get":3871,"deliver_get_details":{"rate":0.0},"disk_writes":3870,"disk_writes_details":{"rate":0.0},"publish":3870,"publish_details":{"rate":0.0},"redeliver":1,"redeliver_details":{"rate":0.0}},"messages":0,"messages_details":{"rate":0.0},"messages_ready":0,"messages_ready_details":{"rate":0.0},"messages_unacknowledged":0,"messages_unacknowledged_details":{"rate":0.0},"idle_since":"2016-03-01 22:04:22","consumer_utilisation":"","policy":"","exclusive_consumer_tag":"","consumers":4,"recoverable_slaves":"","state":"running","messages_ram":0,"messages_ready_ram":0,"messages_unacknowledged_ram":0,"messages_persistent":0,"message_bytes":0,"message_bytes_ready":0,"message_bytes_unacknowledged":0,"message_bytes_ram":0,"message_bytes_persistent":0,"disk_reads":0,"disk_writes":3870,"backing_queue_status":{"q1":0,"q2":0,"delta":["delta",0,0,0],"q3":0,"q4":0,"len":0,"target_ram_count":"infinity","next_seq_id":3870,"avg_ingress_rate":0.060962064328682466,"avg_egress_rate":0.060962064328682466,"avg_ack_ingress_rate":0.060962064328682466,"avg_ack_egress_rate":0.060962064328682466},"name":"app000","vhost":"/","durable":true,"auto_delete":false,"arguments":{},"node":"rabbit@test2"}
        '''
        for item in ['memory', 'messages', 'messages_ready',
                     'messages_unacknowledged', 'consumers']:
            # key = rabbitmq.queues[/,queue_memory,queue.helloWorld]
            key = '"rabbitmq.queues[{0},queue_{1},{2}]"'.format(queue['vhost'], item, queue['name'])
            # if item is in queue, value = queue[item], else value = 0
            value = queue.get(item, 0)
            logging.debug("SENDER_DATA: - %s %s" % (key, value))
            tmpfile.write("- %s %s\n" % (key, value))
        # This is a non standard bit of information added after the standard items
        for item in ['deliver_get', 'publish']:
            key = '"rabbitmq.queues[{0},queue_message_stats_{1},{2}]"'.format(queue['vhost'], item, queue['name'])
            value = queue.get('message_stats', {}).get(item, 0)
            logging.debug("SENDER_DATA: - %s %s" % (key, value))
            tmpfile.write("- %s %s\n" % (key, value))

    def _send_queue_data(self, tmpfile):
        '''Send the queue data to Zabbix.'''
        '''Get key value from temp file.'''
        args = '/opt/app/zabbix/sbin/zabbix_sender -c {0} -i {1}'
        if self.senderhostname:
            args = args + " -s " + self.senderhostname
        return_code = 0
        process = subprocess.Popen(args.format(self.conf, tmpfile.name),
                                   shell=True, stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE)
        out, err = process.communicate()
        logging.debug("Finished sending data")
        return_code = process.wait()
        logging.info("Found return code of " + str(return_code))
        if return_code != 0:
            logging.warning(out)
            logging.warning(err)
        else:
            logging.debug(err)
            logging.debug(out)
        return return_code

    def check_aliveness(self):
        '''Check the aliveness status of a given vhost.'''
        '''virtual host '/' should be encoded as '/%2f' '''
        return self.call_api('aliveness-test/%2f')['status']

    def check_overview(self, item):
        '''First, check the overview specific items'''
        ''' curl -i -u guest:guest http://localhost:15672/api/overview '''
        # rabbitmq[overview,connections]
        if item in ['channels', 'connections', 'consumers', 'exchanges', 'queues']:
            return self.call_api('overview').get('object_totals').get(item, 0)
        # rabbitmq[overview,messages]
        elif item in ['messages', 'messages_ready', 'messages_unacknowledged']:
            return self.call_api('overview').get('queue_totals').get(item, 0)
        elif item == 'message_stats_deliver_get':
            return self.call_api('overview').get('message_stats', {}).get('deliver_get', 0)
        elif item == 'message_stats_publish':
            return self.call_api('overview').get('message_stats', {}).get('publish', 0)
        elif item == 'message_stats_ack':
            return self.call_api('overview').get('message_stats', {}).get('ack', 0)
        elif item == 'message_stats_redeliver':
            return self.call_api('overview').get('message_stats', {}).get('redeliver', 0)
        elif item == 'rabbitmq_version':
            return self.call_api('overview').get('rabbitmq_version', 'None')

    def check_server(self, item, node_name):
        '''Return the value for a specific item in a node's details.'''
        '''curl -i -u guest:guest http://localhost:15672/api/nodes'''
        '''return a list'''
        # hostname                            hk-prod-mq1.example.com
        # self.call_api('nodes')[0]['name']   rabbit@hk-prod-mq1
        node_name = node_name.split('.')[0]
        for nodeData in self.call_api('nodes'):
            if node_name in nodeData['name']:
                return nodeData.get(item, 0)
        return 'Not Found'


def main():
    '''Command-line parameters and decoding for Zabbix use/consumption.'''
    choices = ['list_queues', 'list_nodes', 'queues', 'check_aliveness',
               'overview', 'server']
    parser = optparse.OptionParser()
    parser.add_option('--username', help='RabbitMQ API username', default='guest')
    parser.add_option('--password', help='RabbitMQ API password', default='guest')
    parser.add_option('--hostname', help='RabbitMQ API host', default=socket.gethostname())
    parser.add_option('--protocol', help='RabbitMQ API protocol (http or https)', default='http')
    parser.add_option('--port', help='RabbitMQ API port', type='int', default=15672)
    parser.add_option('--check', type='choice', choices=choices, help='Type of check')
    parser.add_option('--metric', help='Which metric to evaluate', default='')
    parser.add_option('--node', help='Which node to check (valid for --check=server)')
    parser.add_option('--conf', default='/opt/app/zabbix/conf/zabbix_agentd.conf')
    parser.add_option('--senderhostname', default='',
                      help='Allows including a sender parameter on calls to zabbix_sender')
    (options, args) = parser.parse_args()
    if not options.check:
        parser.error('At least one check should be specified')
    logging.debug("Started trying to process data")
    api = RabbitMQAPI(user_name=options.username, password=options.password,
                      host_name=options.hostname, protocol=options.protocol,
                      port=options.port, conf=options.conf,
                      senderhostname=options.senderhostname)
    if options.check == 'list_queues':
        print json.dumps({'data': api.list_queues()}, indent=4, separators=(',', ':'))
    elif options.check == 'list_nodes':
        print json.dumps({'data': api.list_nodes()}, indent=4, separators=(',', ':'))
    elif options.check == 'queues':
        print api.check_queue()
    elif options.check == 'check_aliveness':
        print api.check_aliveness()
    elif options.check == 'overview':
        # rabbitmq[overview,connections]
        # --check=overview --metric=connections
        if not options.metric:
            parser.error('Missing required parameter: "metric"')
        else:
            # The overview is cluster-wide, so --node makes no difference here
            print api.check_overview(options.metric)
    elif options.check == 'server':
        # rabbitmq[server,sockets_used]
        # --check=server --metric=sockets_used
        if not options.metric:
            parser.error('Missing required parameter: "metric"')
        else:
            if options.node:
                print api.check_server(options.metric, options.node)
            else:
                print api.check_server(options.metric, api.host_name)


if __name__ == '__main__':
    main()
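How the script works:
It uses the urllib2 module (Python 2) to call the RabbitMQ API.
It processes the data returned by the API.
Overview and node data are collected through zabbix_agent, while queue data is pushed to Zabbix with zabbix_sender. Before zabbix_sender can push anything, a zabbix_agent key has to trigger the push actively.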
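The push step can be illustrated in isolation. The following Python 3 sketch builds the lines that would go into the zabbix_sender input file for one queue, using the same key layout as rabbitmq_status.py, where "-" stands for the sender's own host name. The function name and the sample queue are hypothetical.

```python
QUEUE_ITEMS = ("memory", "messages", "messages_ready",
               "messages_unacknowledged", "consumers")
STATS_ITEMS = ("deliver_get", "publish")


def sender_lines(queue):
    """Return zabbix_sender input lines ("- <key> <value>") for one queue."""
    lines = []
    for item in QUEUE_ITEMS:
        key = '"rabbitmq.queues[{0},queue_{1},{2}]"'.format(
            queue["vhost"], item, queue["name"])
        # Missing fields are reported as 0, as in the original script.
        lines.append("- {0} {1}".format(key, queue.get(item, 0)))
    for item in STATS_ITEMS:
        key = '"rabbitmq.queues[{0},queue_message_stats_{1},{2}]"'.format(
            queue["vhost"], item, queue["name"])
        value = queue.get("message_stats", {}).get(item, 0)
        lines.append("- {0} {1}".format(key, value))
    return lines


# Hypothetical, trimmed entry from /api/queues.
sample_queue = {"vhost": "/", "name": "app000", "memory": 32064,
                "consumers": 4, "message_stats": {"publish": 3870}}
for line in sender_lines(sample_queue):
    print(line)
```

Each line targets a trapper-style item on the Zabbix side, so one agent check can deliver every queue metric in a single zabbix_sender run.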
rabbitmq_status.conf
UserParameter=rabbitmq.discovery_queue,/usr/bin/python /opt/app/zabbix/sbin/rabbitmq_status.py --check=list_queues
UserParameter=rabbitmq.queues,/usr/bin/python /opt/app/zabbix/sbin/rabbitmq_status.py --check=queues
UserParameter=rabbitmq[*],/usr/bin/python /opt/app/zabbix/sbin/rabbitmq_status.py --check=$1 --metric=$2
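The rabbitmq.discovery_queue key is expected to return low-level discovery JSON. As a rough Python 3 illustration of that format, the sketch below turns a list of queues into a payload carrying the {#VHOSTNAME} and {#QUEUENAME} macros that the script emits. The function name and the sample queues are assumptions made for the example.

```python
import json


def discovery_payload(queues):
    """Build Zabbix low-level discovery JSON from decoded /api/queues entries."""
    data = [{"{#VHOSTNAME}": queue["vhost"], "{#QUEUENAME}": queue["name"]}
            for queue in queues]
    return json.dumps({"data": data}, indent=4, separators=(",", ":"))


# Hypothetical queues on the default vhost.
payload = discovery_payload([{"vhost": "/", "name": "app000"},
                             {"vhost": "/", "name": "app001"}])
print(payload)
```

An item prototype in the template could then use a key such as rabbitmq.queues[{#VHOSTNAME},queue_memory,{#QUEUENAME}], which matches the keys the script sends. The actual template is not reproduced here, so treat that key as an assumption.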
IV. Adding the Zabbix Monitoring Template
See the attachment for the template.
References:
http://blog.thomasvandoren.com/monitoring-rabbitmq-queues-with-zabbix.html
http://www.rabbitmq.com/how.html#management
https://github.com/alfss/zabbix-rabbitmq
https://cdn.rawgit.com/rabbitmq/rabbitmq-management/rabbitmq_v3_6_0/priv/www/api/index.html
https://github.com/jasonmcintosh/rabbitmq-zabbix
http://chase-seibert.github.io/blog/2011/07/01/checking-rabbitmq-queue-sizeage-with-nagios.html