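To avoid a single point of failure, Redis in our production environment has been upgraded to cluster mode, so the cluster needs monitoring that raises an alert as soon as any node fails. Redis ships with the redis-cli client, whose cluster info command reports the state of the cluster. We can wrap that command in a shell script and have Zabbix call the script to monitor the cluster.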
I. Using the cluster info command
Command format:
redis-cli -h [hostname] -p [port] -a [password] cluster info
1. Query the cluster status (querying any one master node is enough)
/data/redis/bin/redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info
/data/redis/bin/redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info | grep "cluster_state"
/data/redis/bin/redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info | grep -w "cluster_state" | awk -F':' '{print $2}'|grep -c "ok"
/data/redis/bin/redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info | grep -w "cluster_known_nodes" | awk -F':' '{print $2}'
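The grep/awk pipelines above all do the same job: pick one key:value line out of the cluster info reply and print its value. As a minimal sketch, that logic can be wrapped in a single shell function. The function names cluster_field and sample_reply, and the sample values, are made up for illustration; a real reply contains more fields, and its lines may end in a carriage return, which the sketch strips.

```shell
#!/bin/bash
# cluster_field KEY: read `cluster info` output on stdin and print the value of KEY.
# (Hypothetical helper name; it is not part of redis-cli.)
cluster_field() {
    tr -d '\r' | awk -F':' -v key="$1" '$1 == key {print $2}'
}

# Illustrative sample only; real values depend on the cluster.
sample_reply() {
    printf 'cluster_state:ok\r\ncluster_slots_assigned:16384\r\ncluster_known_nodes:6\r\ncluster_size:3\r\n'
}

sample_reply | cluster_field cluster_state
sample_reply | cluster_field cluster_known_nodes
```

Against a live node, the real command would be piped into the function instead of the sample, for example `/data/redis/bin/redis-cli -h xxx.xxx.xxx.xxx -a 'password' -p 7001 cluster info | cluster_field cluster_state`.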
II. Create the monitoring script
1. Write the script
vim /etc/zabbix/zabbix_agentd.d/redis_cluster.sh
#!/bin/bash
# Print one field of `cluster info` for Zabbix. Usage: redis_cluster.sh <key>
REDISCLI="/data/redis/bin/redis-cli"
HOST="xxx.xxx.xxx.xxx"
PORT=7001
PASS="password"

# Print the value of field $1 from the cluster info reply
# (prints nothing if the node cannot be reached).
get_field() {
    "$REDISCLI" -h "$HOST" -a "$PASS" -p "$PORT" cluster info 2>/dev/null \
        | tr -d '\r' | awk -F':' -v key="$1" '$1 == key {print $2}'
}

usage() {
    echo -e "\033[33mUsage: $0 {cluster_state|cluster_slots_assigned|cluster_slots_ok|cluster_slots_pfail|cluster_slots_fail|cluster_known_nodes|cluster_size|cluster_current_epoch|cluster_my_epoch|cluster_stats_messages_ping_sent|cluster_stats_messages_pong_sent|cluster_stats_messages_sent|cluster_stats_messages_ping_received|cluster_stats_messages_pong_received|cluster_stats_messages_meet_received|cluster_stats_messages_received}\033[0m"
}

# Exactly one argument (the field name) is required.
if [[ $# -ne 1 ]]; then
    usage
    exit 1
fi

case $1 in
cluster_state)
    # Print 1 when cluster_state is "ok", otherwise 0.
    result=$(get_field cluster_state | grep -c "ok")
    echo "$result"
    ;;
cluster_slots_assigned|cluster_slots_ok|cluster_slots_pfail|cluster_slots_fail|\
cluster_known_nodes|cluster_size|cluster_current_epoch|cluster_my_epoch|\
cluster_stats_messages_ping_sent|cluster_stats_messages_pong_sent|\
cluster_stats_messages_sent|cluster_stats_messages_ping_received|\
cluster_stats_messages_pong_received|cluster_stats_messages_meet_received|\
cluster_stats_messages_received)
    # All other fields are printed as reported by Redis.
    result=$(get_field "$1")
    echo "$result"
    ;;
*)
    usage
    ;;
esac
2. Make the script executable
chmod +x /etc/zabbix/zabbix_agentd.d/redis_cluster.sh
3. Test the script
Check the number of nodes in the Redis cluster:
/etc/zabbix/zabbix_agentd.d/redis_cluster.sh cluster_known_nodes
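Checking one key at a time gets tedious with sixteen keys. The following is a minimal sketch of a smoke test that calls the script once per key and reports any key that returns nothing. The function name check_keys is hypothetical, and the script path is passed as an argument so the same loop works for any copy of the script.

```shell
#!/bin/bash
# check_keys SCRIPT: call SCRIPT once per key and print "key=value".
# Returns non-zero if any key produced empty output.
check_keys() {
    script=$1
    failed=0
    for key in cluster_state cluster_slots_assigned cluster_slots_ok \
               cluster_slots_pfail cluster_slots_fail cluster_known_nodes \
               cluster_size cluster_current_epoch cluster_my_epoch \
               cluster_stats_messages_ping_sent cluster_stats_messages_pong_sent \
               cluster_stats_messages_sent cluster_stats_messages_ping_received \
               cluster_stats_messages_pong_received \
               cluster_stats_messages_meet_received cluster_stats_messages_received
    do
        value=$("$script" "$key")
        if [ -n "$value" ]; then
            echo "$key=$value"
        else
            echo "$key: no output" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Run only when the script from this article is installed.
SCRIPT=/etc/zabbix/zabbix_agentd.d/redis_cluster.sh
if [ -x "$SCRIPT" ]; then
    check_keys "$SCRIPT" || echo "some keys returned no output" >&2
fi
```

An empty value usually means the node could not be reached or the password is wrong, because the script discards the error output of redis-cli.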
III. Create the Redis cluster monitoring configuration file
1. Write the Redis monitoring configuration file
vim /etc/zabbix/zabbix_agentd.d/redis.conf
UserParameter=Redis.Cluster[*],/etc/zabbix/zabbix_agentd.d/redis_cluster.sh $1
2. Restart zabbix-agent
systemctl restart zabbix-agent
3. Test from the Zabbix server
zabbix_get -s xxx.xxx.xxx.xxx -p 10050 -k "Redis.Cluster[cluster_slots_ok]"
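When an item key is unknown or the script fails, zabbix_get typically prints an error string (for example one starting with ZBX_NOTSUPPORTED) rather than a number, so it is worth checking that the reply is numeric. A minimal sketch follows; the function name is_numeric and the variable names are hypothetical, and the call to zabbix_get is skipped when the tool is not installed.

```shell
#!/bin/bash
# is_numeric VALUE: succeed only if VALUE is a non-negative integer.
is_numeric() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

AGENT="xxx.xxx.xxx.xxx"   # placeholder: address of the host running zabbix-agent
KEY="Redis.Cluster[cluster_slots_ok]"

if command -v zabbix_get >/dev/null 2>&1; then
    value=$(zabbix_get -s "$AGENT" -p 10050 -k "$KEY")
    if is_numeric "$value"; then
        echo "OK: $KEY = $value"
    else
        echo "Unexpected reply for $KEY: $value" >&2
    fi
fi
```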
IV. Create and import the monitoring template
1. Create the template file
Note: the alert thresholds in this template are cluster current epoch < 6, cluster known nodes < 6, and cluster_state not ok.
redis_cluster_templates.xml is a Zabbix 3.4 template export dated 2019-12-06T05:43:56Z. Its contents are as follows.
Template: Template Redis Cluster (description: Redis cluster monitoring template), in group Templates
Applications: Redis cluster node, Redis cluster slots, Redis cluster stats messages
Items (each item is named after its key parameter; type 0 (Zabbix agent), value type 3 (numeric unsigned), update interval 30s, history 90d, trends 365d):
  Application Redis cluster node:
    Redis.Cluster[cluster_current_epoch]
    Redis.Cluster[cluster_known_nodes]
    Redis.Cluster[cluster_my_epoch]
    Redis.Cluster[cluster_size]
  Application Redis cluster slots:
    Redis.Cluster[cluster_slots_assigned]
    Redis.Cluster[cluster_slots_fail]
    Redis.Cluster[cluster_slots_ok]
    Redis.Cluster[cluster_slots_pfail]
  No application:
    Redis.Cluster[cluster_state]
  Application Redis cluster stats messages:
    Redis.Cluster[cluster_stats_messages_meet_received]
    Redis.Cluster[cluster_stats_messages_ping_received]
    Redis.Cluster[cluster_stats_messages_ping_sent]
    Redis.Cluster[cluster_stats_messages_pong_received]
    Redis.Cluster[cluster_stats_messages_pong_sent]
    Redis.Cluster[cluster_stats_messages_received]
    Redis.Cluster[cluster_stats_messages_sent]
Triggers:
  Name: cluster current epoch < 6
    Expression: {Template Redis Cluster:Redis.Cluster[cluster_current_epoch].last()}<6
    Priority: 3 (Average)
  Name: cluster known nodes < 6
    Expression: {Template Redis Cluster:Redis.Cluster[cluster_known_nodes].last()}<6
    Priority: 3 (Average)
  Name: Redis Cluster is down
    Expression: {Template Redis Cluster:Redis.Cluster[cluster_state].last()}=0
    Priority: 4 (High)
Graphs (each 900 x 200; every graph item refers to host Template Redis Cluster; line color in parentheses):
  Redis cluster node:
    Redis.Cluster[cluster_current_epoch] (1A7C11)
    Redis.Cluster[cluster_known_nodes] (F63100)
    Redis.Cluster[cluster_my_epoch] (2774A4)
    Redis.Cluster[cluster_size] (A54F10)
  Redis cluster slots:
    Redis.Cluster[cluster_slots_assigned] (1A7C11)
    Redis.Cluster[cluster_slots_fail] (F63100)
    Redis.Cluster[cluster_slots_ok] (2774A4)
    Redis.Cluster[cluster_slots_pfail] (A54F10)
  Redis cluster stats messages:
    Redis.Cluster[cluster_stats_messages_meet_received] (1A7C11)
    Redis.Cluster[cluster_stats_messages_ping_received] (F63100)
    Redis.Cluster[cluster_stats_messages_ping_sent] (2774A4)
    Redis.Cluster[cluster_stats_messages_pong_received] (A54F10)
    Redis.Cluster[cluster_stats_messages_pong_sent] (FC6EA3)
    Redis.Cluster[cluster_stats_messages_received] (6C59DC)
    Redis.Cluster[cluster_stats_messages_sent] (AC8C14)
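The three triggers in this template fire when cluster_current_epoch is below 6, when cluster_known_nodes is below 6, or when cluster_state is 0. As a rough sketch of that logic outside Zabbix, the function below applies the same comparisons to three values. The function name evaluate_triggers is hypothetical, and the threshold of 6 is simply the one this template uses; it would need adjusting for a cluster of a different size.

```shell
#!/bin/bash
# evaluate_triggers STATE NODES EPOCH
#   STATE: 1 if cluster_state is ok, else 0 (as returned by the monitoring script)
#   NODES: value of cluster_known_nodes
#   EPOCH: value of cluster_current_epoch
# Prints one line per condition that would fire; returns non-zero if any fired.
evaluate_triggers() {
    state=$1
    nodes=$2
    epoch=$3
    fired=0
    if [ "$state" -eq 0 ]; then
        echo "Redis Cluster is down"
        fired=1
    fi
    if [ "$nodes" -lt 6 ]; then
        echo "cluster known nodes < 6"
        fired=1
    fi
    if [ "$epoch" -lt 6 ]; then
        echo "cluster current epoch < 6"
        fired=1
    fi
    return "$fired"
}

# Example with made-up values: the state is ok, but only 5 nodes are known.
evaluate_triggers 1 5 6 || echo "at least one trigger condition is met"
```

In Zabbix itself the comparison is applied to the last collected value of each item, which is what the last() function in the trigger expressions refers to.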
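2. Import the template
In the Zabbix web interface, go to Configuration > Templates > Import.
Click "Choose File", select redis_cluster_templates.xml, and import it.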