1.便于统计分析,以往只能一个节点一个节点看,采集到一起方便分析
2.慢日志持久化,类比MySQL,mysql是有日志文件的,而redis是内存队列FIFO
3.redis是单线程的slowlog的监控分析很有意义,严重会导致系统雪崩
slowlog get [len]
和 slowlog reset
。
slowlog get [len]
命令获取指定长度的慢查询列表。
|
核心代码,关于采集的参数代码中有详细注释
public void slowLogProducer(Cluster cluster) {
try {
List addressList = NetUtil.getHostByAddress(cluster.getAddress());
for (Host host : addressList) {
List slowList = JedisUtil.getClusterSlowLog(new ConnectionParam(host.getIp(), host.getPort(), cluster.getRedisPassword()));
List binarySlowLogs = ClusterBinarySlowLog.from(slowList);
Map redisInfo = JedisUtil.getMapInfo(new ConnectionParam(host.getIp(), host.getPort(), cluster.getRedisPassword()));
for (ClusterBinarySlowLog log : binarySlowLogs) {
List args = log.getArgs();
String type = args.get(0);
List commands = args.subList(1, args.size());
String command = StringUtils.join(commands, " ");
RedisBinarySlowLog redisBinarySlowLog = new RedisBinarySlowLog();
redisBinarySlowLog.setRedisName(cluster.getClusterName());
redisBinarySlowLog.setRedisRole(redisInfo.get("role"));
redisBinarySlowLog.setRedisVersion(redisInfo.get("redis_version"));
redisBinarySlowLog.setServerHost(host.getIp() + ":" + host.getPort());
redisBinarySlowLog.setSlowDateTime(DateUtil.getFormatDateTime(log.getTimeStamp() * 1000));
redisBinarySlowLog.setRunTime(log.getExecutionTime());
redisBinarySlowLog.setType(type);
redisBinarySlowLog.setCommand(command);
redisBinarySlowLog.setClientHost(log.getClientAddress());
redisBinarySlowLog.setClientName(log.getClientName());
slowLogger.info(JSONObject.fromObject(redisBinarySlowLog).toString());
}
}
} catch (Exception e) {
logger.error("SlowLogSchedule SlowLogProducer get SlowLog from redis exception", e);
}
}
public class RedisBinarySlowLog {
// redis集群的名称
private String RedisName;
// redis节点的角色
private String RedisRole;
// redis集群的版本
private String RedisVersion;
// redis的节点ip:port
private String serverHost;
// slowlog产生的时间
private String slowDateTime;
// 慢查询的执行时长
private long runTime;
// 慢查询命令的类型
private String type;
// 慢查询命令参数
private String command;
// 发起慢查询命令的客户端ip:port
private String clientHost;
// 发起慢查询命令的客户端name
private String clientName;
……
|
通过logback.xml配置落盘,EFK采集展示
上面返回了两个慢查询命令,其中每行的含义如下:
在redis cluster中又会多出来两行参数,发起redis slow log的慢查询客户端
2020-09-04 10:36:14.518|{"redisVersion":"4.0.10","clientHost":"10.10.202.166:38822","redisName":"xueqiu-push-user-sep","clientName":"","redisRole":"slave","slowDateTime":"2020-03-23 22:37:39","runTime":26364,"type":"INFO","command":"","serverHost":"10.10.52.100:24692"}
2020-09-04 10:36:14.519|{"redisVersion":"4.0.10","clientHost":"10.10.202.166:58979","redisName":"xueqiu-push-user-sep","clientName":"","redisRole":"slave","slowDateTime":"2020-03-23 20:36:39","runTime":17576,"type":"INFO","command":"","serverHost":"10.10.52.100:24692"}
2020-09-04 10:36:14.519|{"redisVersion":"4.0.10","clientHost":"10.10.202.166:33727","redisName":"xueqiu-push-user-sep","clientName":"","redisRole":"slave","slowDateTime":"2020-03-23 10:19:39","runTime":20188,"type":"INFO","command":"","serverHost":"10.10.52.100:24692"}
2020-09-04 10:36:15.319|{"redisVersion":"4.0.10","clientHost":"192.168.64.110:57824","redisName":"xueqiu-push-user-sep","clientName":"","redisRole":"slave","slowDateTime":"2019-10-10 14:51:30","runTime":29046,"type":"bgsave","command":"","serverHost":"10.10.52.100:24696"}
2020-09-04 10:36:15.319|{"redisVersion":"4.0.10","clientHost":"192.168.64.110:53587","redisName":"xueqiu-push-user-sep","clientName":"","redisRole":"slave","slowDateTime":"2019-10-10 14:28:43","runTime":44932,"type":"bgsave","command":"","serverHost":"10.10.52.100:24696"}
2020-09-04 10:36:15.319|{"redisVersion":"4.0.10","clientHost":"192.168.64.51:58326","redisName":"xueqiu-push-user-sep","clientName":"lettuce#ClusterTopologyRefresh","redisRole":"slave","slowDateTime":"2019-08-29 18:08:05","runTime":20747,"type":"CLUSTER","command":"NODES","serverHost":"10.10.52.100:24696"}
2020-09-04 10:36:15.319|{"redisVersion":"4.0.10","clientHost":"192.168.64.31:59840","redisName":"xueqiu-push-user-sep","clientName":"lettuce#ClusterTopologyRefresh","redisRole":"slave","slowDateTime":"2019-08-27 14:09:32","runTime":22212,"type":"CLUSTER","command":"NODES","serverHost":"10.10.52.100:24696"}
2020-09-10 15:02:15.194|{"redisVersion":"2.8.14","clientHost":"","redisName":"snowx-push","clientName":"","redisRole":"master","slowDateTime":"2017-01-02 20:51:33","runTime":12224,"type":"bgsave","command":"","serverHost":"10.10.201.100:21201"}
2020-09-10 15:02:15.194|{"redisVersion":"2.8.14","clientHost":"","redisName":"status","clientName":"","redisRole":"master","slowDateTime":"2019-10-17 08:11:44","runTime":10121,"type":"INFO","command":"","serverHost":"10.10.201.100:6402"}
2020-09-10 15:02:15.194|{"redisVersion":"2.8.14","clientHost":"","redisName":"stock-quote","clientName":"","redisRole":"master","slowDateTime":"2020-05-14 23:47:44","runTime":11084,"type":"SETEX","command":"accer_5m:GE 1728000 0.04","serverHost":"10.10.27.110:18501"}
2020-09-10 15:02:15.194|{"redisVersion":"2.8.14","clientHost":"","redisName":"status","clientName":"","redisRole":"master","slowDateTime":"2019-10-17 07:37:09","runTime":23488,"type":"INFO","command":"","serverHost":"10.10.201.100:6402"}
2020-09-10 15:02:15.194|{"redisVersion":"2.8.14","clientHost":"","redisName":"snowx-push","clientName":"","redisRole":"master","slowDateTime":"2017-01-02 18:41:31","runTime":23067,"type":"bgsave","command":"","serverHost":"10.10.201.100:21201"}
2020-09-10 15:02:15.194|{"redisVersion":"2.8.14","clientHost":"","redisName":"status","clientName":"","redisRole":"master","slowDateTime":"2019-10-17 05:59:46","runTime":18328,"type":"INFO","command":"","serverHost":"10.10.201.100:6402"}
2020-09-10 15:02:15.195|{"redisVersion":"2.8.14","clientHost":"","redisName":"snowx-push","clientName":"","redisRole":"master","slowDateTime":"2017-01-02 18:27:11","runTime":71880,"type":"bgsave","command":"","serverHost":"10.10.201.100:21201"}
2020-09-10 15:02:15.195|{"redisVersion":"2.8.14","clientHost":"","redisName":"stock-quote","clientName":"","redisRole":"master","slowDateTime":"2020-05-14 02:00:06","runTime":11504,"type":"SETEX","command":"accer_5m:KEY 1728000 -0.05","serverHost":"10.10.27.110:18501"}
2020-09-10 15:02:15.195|{"redisVersion":"2.8.14","clientHost":"","redisName":"status","clientName":"","redisRole":"master","slowDateTime":"2019-10-17 04:44:08","runTime":18322,"type":"INFO","command":"","serverHost":"10.10.201.100:6402"}
2020-09-10 15:02:15.195|{"redisVersion":"2.8.14","clientHost":"","redisName":"snowx-push","clientName":"","redisRole":"master","slowDateTime":"2015-12-24 13:51:49","runTime":19850,"type":"INFO","command":"","serverHost":"10.10.201.100:21201"}
2020-09-10 15:02:15.195|{"redisVersion":"2.8.14","clientHost":"","redisName":"stock-quote","clientName":"","redisRole":"master","slowDateTime":"2020-05-13 09:52:05","runTime":12074,"type":"ZREMRANGEBYSCORE","command":"price_statistics:SZ002888 20.2 20.2","serverHost":"10.10.27.110:18501"}
2020-09-10 15:02:15.195|{"redisVersion":"2.8.14","clientHost":"","redisName":"snowx-push","clientName":"","redisRole":"master","slowDateTime":"2015-12-23 20:59:28","runTime":10008,"type":"INFO","command":"","serverHost":"10.10.201.100:21201"}
2020-09-10 15:02:15.195|{"redisVersion":"2.8.14","clientHost":"","redisName":"status","clientName":"","redisRole":"master","slowDateTime":"2019-10-17 00:05:39","runTime":19104,"type":"INFO","command":"","serverHost":"10.10.201.100:6402"}
2020-09-10 15:02:15.196|{"redisVersion":"2.8.14","clientHost":"","redisName":"snowx-push","clientName":"","redisRole":"master","slowDateTime":"2015-12-18 03:00:48","runTime":15257,"type":"INFO","command":"","serverHost":"10.10.201.100:21201"}
2020-09-10 15:02:15.196|{"redisVersion":"2.8.14","clientHost":"","redisName":"status","clientName":"","redisRole":"master","slowDateTime":"2019-10-16 23:52:09","runTime":20171,"type":"INFO","command":"","serverHost":"10.10.201.100:6402"}
2020-09-10 15:02:15.196|{"redisVersion":"2.8.14","clientHost":"","redisName":"stock-quote","clientName":"","redisRole":"master","slowDateTime":"2020-05-12 14:51:00","runTime":14154,"type":"ZREMRANGEBYSCORE","command":"capital_net_flows_day_l2:SZ002008 1.5892128E12 1.5892128E12","serverHost":"10.10.27.110:18501"}
slowlog reset
命令是清空慢日志队列。
按照最大的估算:
一个redis节点一天0.3MB,
那么线上sentinel节点356个master,按照一个master两个slave
线上cluster集群28个,按照几个集群主从共8个节点
那么就是0.3*(356*3+28*8)=387.6MB,这个还是按照每天每个节点都有慢查询估算了,实际1/10都没有,目标就是消灭慢查询
slowlog get和reset是非原子的,会不会漏?
slowlog多长时间采集一次合适?
slowlog-log-slower-than: 慢查询阀值,单位微秒. 默认100000(10毫秒);
生产环境设置1ms,因为Redis是single thread,如果命令都是1ms以上,则实例的吞吐量只有1000QPS.
slowlog-max-len: 慢查询存储的最大个数,我们默认1000;
生产设置设置大于1024,因为slowlog会省略过多的参数,慢查询不会占用过多的内存;
慢查询队列满后,淘汰最老的慢查询实体。
如MySQL/MongoDB等常见数据库,慢查询的query_time都会包含命令所有耗时,包含锁等待这类时间;
而Redis的慢查询query_time只记录自己“被cpu服务的时间”,不包含排队等待、IO等待(如AOF SYNC)这类时间。
理解这点非常重要定:请求的响应时间(R),服务时间(S), 排队延时(Q). R = S + Q
# Clients
connected_clients:10
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
The Redis Slow Log is a system to log queries that exceeded a
specified execution time. The execution time does not include I/O
operations like talking with the client, sending the reply and so forth,
but just the time needed to actually execute the command (this is the only
stage of command execution where the thread is blocked and can not serve
other requests in the meantime).
所以RD和DBA在设计keyspace和访问模式时,应尽量避免使用耗时较大的命令。
根据经验在实际生产中,Redis单实例能处理4~5w的QPS, 如果大量的redis命令大量耗时大于1ms, 其实QPS只能达到1000基于几百。
Redis出现耗时大的命令,导致其他所有请求被阻塞等待,redis处理能力急剧退化,易导致整个服务链雪崩。