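0. Background
Redis runs in cluster mode with 560 master nodes and a 1:1 master-to-replica ratio, 16 nodes per machine. On master node A, info memory showed a single Redis instance with used_memory_rss_human at 9.2G, well above the configured maxmemory_human of 6G.
1. Check client connections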
[root@ A ~]# redis-cli -p 9001 info clients
# Clients
connected_clients:4705
client_longest_output_list:204928
client_biggest_input_buf:0
blocked_clients:0
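client_longest_output_list shows a large amount of output queued up. A closer look shows that every Redis instance on the machine is in the same state.
Polling the client output list length for each instance on machine A: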
CLIENT_A:9001# Clients
client_longest_output_list:491335
CLIENT_A:9002# Clients
client_longest_output_list:520840
CLIENT_A:9003# Clients
client_longest_output_list:486498
CLIENT_A:9004# Clients
client_longest_output_list:485387
CLIENT_A:9005# Clients
client_longest_output_list:480000
CLIENT_A:9006# Clients
client_longest_output_list:537211
CLIENT_A:9007# Clients
client_longest_output_list:487594
CLIENT_A:9008# Clients
client_longest_output_list:490037
CLIENT_A:9009# Clients
client_longest_output_list:478734
CLIENT_A:9010# Clients
client_longest_output_list:524717
CLIENT_A:9011# Clients
client_longest_output_list:487200
CLIENT_A:9012# Clients
client_longest_output_list:491687
CLIENT_A:9013# Clients
client_longest_output_list:483900
CLIENT_A:9014# Clients
client_longest_output_list:524557
CLIENT_A:9015# Clients
client_longest_output_list:497967
CLIENT_A:9016# Clients
client_longest_output_list:484250
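The per-port check above can be scripted. The following is a minimal sketch, assuming the 16 instances listen on local ports 9001 to 9016 as in the listing; the function names longest_output_list and poll_all are illustrative and not part of the original procedure.

```shell
#!/bin/sh
# Extract client_longest_output_list from `INFO clients` text on stdin.
# INFO replies use CRLF line endings, so carriage returns are stripped first.
longest_output_list() {
    tr -d '\r' | awk -F: '$1 == "client_longest_output_list" { print $2 }'
}

# Poll every local instance (ports 9001-9016 are assumed from the listing).
poll_all() {
    port=9001
    while [ "$port" -le 9016 ]; do
        n=$(redis-cli -p "$port" info clients | longest_output_list)
        printf '%s: %s\n' "$port" "$n"
        port=$((port + 1))
    done
}
```

Calling poll_all on machine A prints one line per port, which makes it easy to see whether the backlog is confined to one instance or shared by all of them.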
2. List clients whose output memory is non-zero
[root@ A redis-node]# redis-cli -p 9001 client list |grep -v "omem=0"
id=843 addr=client_B:53475 fd=2248 name= age=582784 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=456078 omem=7446059139 events=rw cmd=replconf
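flags=S and cmd=replconf show that this client is a replica connection, so it is master-replica replication that is holding the output memory (omem is about 7.4 GB, with 456078 entries queued in oll).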
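The grep above can be extended to pull out just the fields that matter. This is a sketch only; the function name big_output_clients and the optional byte threshold are assumptions introduced for illustration.

```shell
#!/bin/sh
# Read `CLIENT LIST` text on stdin. For every client whose omem exceeds the
# threshold in bytes (first argument, default 0), print addr, flags, cmd and
# omem converted to MiB.
big_output_clients() {
    awk -v min="${1:-0}" '
    {
        addr = ""; flags = ""; cmd = ""; omem = 0
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == "addr")  addr  = kv[2]
            if (kv[1] == "flags") flags = kv[2]
            if (kv[1] == "cmd")   cmd   = kv[2]
            if (kv[1] == "omem")  omem  = kv[2] + 0
        }
        if (omem > min + 0)
            printf "%s flags=%s cmd=%s omem=%.0fMiB\n", addr, flags, cmd, omem / 1048576
    }'
}
```

For example, `redis-cli -p 9001 client list | big_output_clients 1073741824` would list only clients holding more than 1 GiB of output buffer.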
3. Check the stats
client_B, the replica of the problematic machine A:
[root@localhost 9001]# redis-cli -p 9001
127.0.0.1:9001> info stats
# Stats
total_connections_received:2
total_commands_processed:25
instantaneous_ops_per_sec:0
total_net_input_bytes:1280055129
total_net_output_bytes:36461
instantaneous_input_kbps:192.22
instantaneous_output_kbps:0.00
rejected_connections:0
Client D, the replica of the healthy machine C:
[root@CLIENT D redis-node]# redis-cli -p 9001
127.0.0.1:9001> info Stats
# Stats
total_connections_received:10
total_commands_processed:23220212769
instantaneous_ops_per_sec:18281
total_net_input_bytes:3405817622262
total_net_output_bytes:59530419
instantaneous_input_kbps:2882.54
instantaneous_output_kbps:0.03
rejected_connections:0
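The input rate differs by roughly 15x (192.22 vs. 2882.54 kbps) while all other configuration is identical, so a network problem is suspected.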
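The comparison between the two replicas can be reduced to a single number. A minimal sketch, assuming the INFO text of each replica is available on stdin; info_field and rate_ratio are hypothetical helper names.

```shell
#!/bin/sh
# Print one field from `INFO` text on stdin, e.g. instantaneous_input_kbps.
info_field() {
    tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# Ratio of two rates, to one decimal place; guards against division by zero.
rate_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (b + 0 == 0) { print "n/a" } else { printf "%.1f\n", a / b }
    }'
}
```

With the values captured above, `rate_ratio 2882.54 192.22` prints 15.0: the healthy replica receives replication data about fifteen times faster than client_B.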
4. Check network latency
Machine A to its replica client_B:
[root@A redis-node]# traceroute client_B
traceroute to client_B (client_B), 30 hops max, 60 byte packets
1 *.*.*.254 0.752 ms 0.809 ms 0.900 ms
2 *.*.*.133 2.164 ms 2.170 ms 2.157 ms
3 * * *
4 * * *
5 *.*.*.2 163.542 ms 163.940 ms 164.181 ms
6 client_B (client_B) 160.995 ms 156.202 ms 158.603 ms
Machine C to its replica Client D:
[root@SHJX-YSP-GKTZCQ-SEV-128-35 ~]# traceroute Client D
traceroute to Client D (Client D), 30 hops max, 60 byte packets
1 *.*.*.254 0.732 ms 0.799 ms 0.897 ms
2 *.*.*.133 3.147 ms 3.182 ms 3.168 ms
3 * * *
4 * * *
5 *.*.*.2 5.537 ms 5.788 ms 5.961 ms
6 Client D (Client D) 1.953 ms 1.960 ms 1.956 ms
Testing the path from machine A to Client D:
[root@ 42 9001]# traceroute Client D
traceroute to Client D (Client D), 30 hops max, 60 byte packets
1 *.*.*.254 (10.180.128.254) 0.738 ms 0.800 ms 0.889 ms
2 *.*.*.133 (10.180.176.133) 2.156 ms 2.190 ms 2.177 ms
3 * * *
4 * * *
5 *.*.*.2 5.768 ms 5.942 ms 6.114 ms
6 Client D (Client D) 1.940 ms 1.964 ms 1.948 ms
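The final hops to client_B take about 160 ms, while Client D answers in about 2 ms from both machine C and machine A. The problem is therefore on the network path between machine A and client_B.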
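The three traces can be compared by the mean round-trip time of their last responding hop. This is a sketch under the assumption that traceroute prints timings in the "value ms" form shown above; last_hop_rtt is an illustrative name.

```shell
#!/bin/sh
# Read traceroute output on stdin and print the mean RTT (ms) of the last hop
# that returned at least one timing. Hops shown as "* * *" are skipped.
last_hop_rtt() {
    awk '
    {
        sum = 0; n = 0
        for (i = 2; i <= NF; i++)
            if ($i == "ms") { sum += $(i - 1); n++ }
        if (n > 0) { last = sum / n; found = 1 }
    }
    END { if (found) printf "%.3f\n", last }'
}
```

Piping `traceroute client_B` into last_hop_rtt gives a single figure per target that can be compared across machines.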
5. Solution
Move the replica from client_B to client E.
Memory usage of port 9001 on machine A after the new replica started:
127.0.0.1:9001> info memory
# Memory
used_memory:2771776520
used_memory_human:2.58G
used_memory_rss:3403390976
used_memory_rss_human:3.17G
used_memory_peak:18154232496
used_memory_peak_human:16.91G
used_memory_peak_perc:15.27%
used_memory_overhead:758586508
used_memory_startup:8105584
used_memory_dataset:2013190012
used_memory_dataset_perc:72.84%
total_system_memory:540783263744
total_system_memory_human:503.64G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:6442450944
maxmemory_human:6.00G
maxmemory_policy:volatile-ttl
mem_fragmentation_ratio:1.23
mem_allocator:jemalloc-4.0.3
active_defrag_running:0
lazyfree_pending_objects:0
127.0.0.1:9001>
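To keep an eye on this after the fix, the RSS can be expressed as a share of maxmemory. A minimal sketch; the function name rss_vs_maxmemory is hypothetical, and it only compares the two fields without judging what an acceptable share is.

```shell
#!/bin/sh
# Read `INFO memory` text on stdin and print RSS as a percentage of maxmemory.
rss_vs_maxmemory() {
    tr -d '\r' | awk -F: '
    $1 == "used_memory_rss" { rss = $2 }
    $1 == "maxmemory"       { max = $2 }
    END {
        if (max + 0 == 0) { print "maxmemory not set"; exit }
        printf "rss is %.1f%% of maxmemory\n", rss / max * 100
    }'
}
```

For the output above, `redis-cli -p 9001 info memory | rss_vs_maxmemory` reports 52.8%, compared with a value well above 100% before the replica was moved.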
6. Observation after the fix
Replication status on Client E:
127.0.0.1:9001> info stats
# Stats
total_connections_received:37
total_commands_processed:5318696
instantaneous_ops_per_sec:19310
total_net_input_bytes:1711079396
total_net_output_bytes:4954958
instantaneous_input_kbps:2967.88
instantaneous_output_kbps:70.17
rejected_connections:0
sync_full:0
Output of port 9001 on machine A after the change:
[root@A redis-node]# redis-cli -p 9001 info clients
# Clients
connected_clients:4695
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:
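7. Summary:
When Redis instances with identical settings show different memory usage, first compare the info output against that of a healthy machine. Look at the Stats section first, together with the local redis.log.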