聊聊redis的slowlog与latency monitor

序

本文主要研究一下redis的slowlog与latency monitor

slowlog

redis在2.2.12版本引入了slowlog，用于记录超过指定执行时间的命令，这个执行时间不包括诸如与客户端通信的IO操作耗时，是实实在在的命令执行的耗时。主要有如下操作：

查看slowlog的数量

127.0.0.1:6379> slowlog len
(integer) 1024

查看slowlog的执行耗时阈值

127.0.0.1:6379> config get slowlog-log-slower-than
1) "slowlog-log-slower-than"
2) "1000"

设置slowlog的执行耗时阈值

127.0.0.1:6379> config set slowlog-log-slower-than 1000
OK

查看slowlog保存数量的阈值

127.0.0.1:6379> config get slowlog-max-len
1) "slowlog-max-len"
2) "1024"

设置slowlog保存数量的阈值

127.0.0.1:6379> config set slowlog-max-len 1024
OK

查询slowlog

127.0.0.1:6379> slowlog get 1
1) 1) (integer) 76016
   2) (integer) 1537250266
   3) (integer) 48296
   4) 1) "COMMAND"

第一行是命令id，第二行是timestamp，第三行是执行耗时，第四行是命令及参数

清除slowlog记录

127.0.0.1:6379> slowlog reset
OK

latency monitor

redis在2.8.13版本引入了latency monitoring，这里主要是监控latency spikes(延时毛刺)。它基于事件机制进行监控，command事件是监控命令执行latency，fast-command事件是监控时间复杂度为O(1)及O(logN)命令的latency，fork事件则监控redis执行系统调用fork(2)的latency。主要有如下操作：

设置/开启latency monitor

127.0.0.1:6379> config set latency-monitor-threshold 100
OK

读取latency monitor配置

127.0.0.1:6379> config get latency-monitor-threshold
1) "latency-monitor-threshold"
2) "100"

获取最近的latency

127.0.0.1:6379> debug sleep 1
OK
(1.01s)
127.0.0.1:6379> debug sleep .25
OK
127.0.0.1:6379> latency latest
1) 1) "command"
   2) (integer) 1537268070
   3) (integer) 250
   4) (integer) 1010

返回事件名、发生的时间戳、最近的延时( 毫秒)、最大的延时( 毫秒)

查看某一事件的延时历史

127.0.0.1:6379> latency history command
1) 1) (integer) 1537268064
   2) (integer) 1010
2) 1) (integer) 1537268070
   2) (integer) 250

查看事件延时图

127.0.0.1:6379> latency reset command
(integer) 0
127.0.0.1:6379> debug sleep .1
OK
127.0.0.1:6379> debug sleep .2
OK
127.0.0.1:6379> debug sleep .3
OK
127.0.0.1:6379> debug sleep .4
OK
127.0.0.1:6379> debug sleep .5
OK
(0.50s)
127.0.0.1:6379> latency graph
(error) ERR syntax error
127.0.0.1:6379> latency graph command
command - high 500 ms, low 100 ms (all time high 500 ms)
--------------------------------------------------------------------------------
   _#
  _||
 _|||
_||||

22117
4062s
ssss

重置/清空事件数据

127.0.0.1:6379> latency reset command
(integer) 1
127.0.0.1:6379> latency history command
(empty list or set)
127.0.0.1:6379> latency latest
(empty list or set)

诊断建议

127.0.0.1:6379> latency doctor
Dave, I have observed latency spikes in this Redis instance. You don't mind talking about it, do you Dave?

1. command: 6 latency spikes (average 257ms, mean deviation 142ms, period 3.83 sec). Worst all time event 500ms.

I have a few advices for you:

- Check your Slow Log to understand what are the commands you are running which are too slow to execute. Please check http://redis.io/commands/slowlog for more information.
- Deleting, expiring or evicting (because of maxmemory policy) large objects is a blocking operation. If you have very large objects that are often deleted, expired, or evicted, try to fragment those objects into multiple smaller objects.

小结

redis的slowlog在2.2.12版本引入，latency monitor在2.8.13版本引入
slowlog仅仅是记录纯命令的执行耗时，不包括与客户端的IO交互及redis的fork等耗时
latency monitor监控的latency spikes则范围广一点，不仅包括命令执行，也包括fork(2)系统调用，key过期等操作的耗时

doc

slowlog
latency-monitor
latency-monitor中文版