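Author: RogerZhuo
Source: DBACoder
Redis 4.0 adds many attractive new features that are all very useful for fine-grained operation and management of Redis (my guess is that this has a lot to do with antirez joining Redis Labs). This series of short posts introduces the usage and effect of the following new features.
Redis Memory Command: detailed analysis of memory usage, memory diagnostics, and reclaiming fragmented memory;
PSYNC2: fixes the inability to do a partial resynchronization after a failover or a replica restart; PSYNC3 is already on the way;
LazyFree: no more fear that deleting a big key will trigger a cluster failover;
LFU: support for an approximate LFU memory eviction algorithm;
Active Memory Defragmentation: reclaims fragmented memory very effectively (experimental);
Modules: open up more possibilities for Redis (it feels like the stage when mongo/mysql introduced storage engines);
As there is no detailed official documentation yet and my spare time is limited, please go easy on me. :)
This post starts with the first feature, the memory command.
Redis 4.0 introduces a new command, memory, which has 5 subcommands;
they let us look more deeply into how Redis uses memory internally.
The memory help command lists the 4 subcommands other than memory doctor;
The 5 subcommands in brief: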
MEMORY USAGE key [SAMPLES count] - "Estimate memory usage of key"
MEMORY STATS - "Show memory usage details"
MEMORY PURGE - "Ask the allocator to release memory"
MEMORY DOCTOR - "A better observability on the Redis memory usage."
MEMORY MALLOC-STATS - "Show allocator internal stats"
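This post briefly describes what each memory subcommand is for and part of how it is implemented.
Before Redis 4.0, the only way to estimate a key's memory usage was the DEBUG OBJECT command (the serializedlength field), but that value is so far from the real figure that it is of little use as a reference.
Note: you can also analyze an RDB file with an rdb tool to estimate the memory a key actually uses.
In the following example, the serialized length of k1 is 7.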
127.0.0.1:6379> set k1 value1
OK
127.0.0.1:6379> DEBUG OBJECT k1
xx refcount:1 encoding:embstr serializedlength:7 lru:7723189 lru_seconds_idle:160
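The usage subcommand is very simple to use: run memory usage followed by the key name. If the key exists, it returns an estimate of the memory actually used by the key's value; if the key does not exist, it returns nil.
Example: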
127.0.0.1:6379> set k1 value1
OK
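127.0.0.1:6379> memory usage k1 // the value of k1 uses 57 bytes of memory
(integer) 57
127.0.0.1:6379> memory usage aaa // key aaa does not exist, so nil is returned
(nil)
In the build tested here, memory usage does not include the memory used by the key string itself:
127.0.0.1:6379> set k1 a // key is 2 characters long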
OK
127.0.0.1:6379> memory usage k1
(integer) 52
127.0.0.1:6379> set k111111111111 a // key is 13 characters long
OK
127.0.0.1:6379> memory usage k111111111111 // same value, different key length: usage reports the same memory usage
(integer) 52
memory usage also does not include the memory used by the key's expire:
127.0.0.1:6379> memory usage k1
(integer) 52
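127.0.0.1:6379> expire k1 10000 // set a TTL on k1
(integer) 1
127.0.0.1:6379> memory usage k1 // usage does not include the memory used by the TTL
(integer) 52
For collection data types (everything except string), the usage subcommand samples elements in a way similar to LRU SAMPLES: by default it samples 5 elements, averages their size, and multiplies the average by the number of elements to get the memory usage (the next section covers this in detail). The result is therefore an approximation; you can of course specify the number of SAMPLES yourself.
Example: generate a hash key, hkey, with 1,000,000 fields, where the value of each field has a random length from 1 to 1024 bytes.
127.0.0.1:6379> hlen hkey // hkey has 1,000,000 fields; each field's value is between 1 and 1024 bytes long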
(integer) 1000000
127.0.0.1:6379> MEMORY usage hkey // default SAMPLES of 5: hkey is estimated at 521588753 bytes
(integer) 521588753
127.0.0.1:6379> MEMORY usage hkey SAMPLES 1000 // SAMPLES 1000: hkey is estimated at 617977753 bytes
(integer) 617977753
127.0.0.1:6379> MEMORY usage hkey SAMPLES 10000 // SAMPLES 10000: hkey is estimated at 624950853 bytes
(integer) 624950853
Because the algorithm samples and averages, specify a larger SAMPLES count to get a more accurate figure for a key. Bigger is not always better, though: the larger the count, the more CPU time memory usage consumes.
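To make the sample-and-average idea concrete, here is a minimal Python sketch. It is not Redis code: the function name estimate_total_size and the list of per-element sizes are hypothetical, and it only mirrors the idea of averaging the first few elements and scaling by the element count.

```python
def estimate_total_size(element_sizes, sample_size=5):
    """Average the first `sample_size` sizes and scale by the element count."""
    if not element_sizes:
        return 0
    samples = element_sizes[:sample_size]  # only the leading elements are inspected
    # Integer arithmetic: (sum of samples / number of samples) * number of elements
    return sum(samples) * len(element_sizes) // len(samples)


# Hypothetical per-field sizes: five small fields followed by one large one.
sizes = [10, 20, 30, 40, 50, 1000]
print(estimate_total_size(sizes))                  # estimate from 5 samples
print(estimate_total_size(sizes, sample_size=6))   # estimate from all 6 elements
print(sum(sizes))                                  # true total, for comparison
```

With only 5 samples the large element is never seen, so the estimate is far too low; sampling every element gives the exact total, at the cost of walking the whole collection.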
The time complexity of memory usage depends on the specified SAMPLES count.
As the example below shows, SAMPLES 1000 took 0.176 ms and SAMPLES 100000 took 14.65 ms.
127.0.0.1:6379> SLOWLOG get
1) 1) (integer) 3
3) (integer) 14651
4) 1) "MEMORY"
2) "usage"
3) "hkey"
4) "SAMPLES"
5) "100000"
2) 1) (integer) 1
3) (integer) 176
4) 1) "MEMORY"
2) "usage"
3) "hkey"
4) "SAMPLES"
5) "1000"
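SLOWLOG reports execution time in microseconds, so the two entries above can be turned into a rough per-sample cost. The small Python sketch below does that arithmetic; the helper name per_sample_cost_us is hypothetical.

```python
def per_sample_cost_us(duration_us, samples):
    """Microseconds spent per sampled element."""
    return duration_us / samples


# SAMPLES count -> execution time in microseconds, taken from the SLOWLOG entries above.
runs = {1000: 176, 100000: 14651}
for samples, duration_us in runs.items():
    cost = per_sample_cost_us(duration_us, samples)
    print(f"SAMPLES {samples}: {duration_us / 1000} ms total, {cost:.3f} us per sample")
```

Both runs come out at well under a microsecond per sample, which suggests the cost grows roughly linearly with the SAMPLES count in this test.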
Note: for the expire memory usage of the whole instance, see overhead.hashtable.expires under the memory stats subcommand below.
The entry point of the memory command is memoryCommand (in object.c):
/* The memory command will eventually be a complete interface for the
* memory introspection capabilities of Redis.
*
* Usage: MEMORY usage */
void memoryCommand(client *c) {
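The core function for the memory usage calculation is objectComputeSize (in object.c).
To keep this post short, only a small part of the code is shown here:
// The function comment already explains how the value is measured and how sampling is handled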
/* Returns the size in bytes consumed by the key's value in RAM.
* Note that the returned value is just an approximation, especially in the
* case of aggregated data types where only "sample_size" elements
* are checked and averaged to estimate the total size. */
#define OBJ_COMPUTE_SIZE_DEF_SAMPLES 5 /* Default sample size. */ // for collection types, 5 elements are sampled by default
size_t objectComputeSize(robj *o, size_t sample_size) {
    if (o->type == OBJ_STRING) { // String type
        /* --- omitted --- */
    } else if (o->type == OBJ_LIST) { // List type
        if (o->encoding == OBJ_ENCODING_QUICKLIST) {
            quicklist *ql = o->ptr;
            quicklistNode *node = ql->head;
            asize = sizeof(*o)+sizeof(quicklist);
            // Walk the first sample_size nodes of the list and sum their sizes
            do {
                elesize += sizeof(quicklistNode)+ziplistBlobLen(node->zl);
                samples++;
            } while ((node = node->next) && samples < sample_size);
            asize += (double)elesize/samples*listTypeLength(o); // average x number of list elements = estimated total memory
        } /* --- omitted --- */
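Before Redis 4.0, info memory was the only way to see the overall memory usage of a Redis instance; details such as the cost of expires, client output buffers, and query buffers were hard to see directly. The memory stats command exists to expose these internal details.
memory stats takes no arguments and returns the memory usage details of the current instance; it is cheap to run, so it can be used for monitoring collection. In the example below it returns 33 lines covering 16 items; the next section explains what each item means.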
127.0.0.1:6379> memory stats
1) "peak.allocated"
2) (integer) 3211205544
3) "total.allocated"
4) (integer) 875852320
5) "startup.allocated"
6) (integer) 765608
7) "replication.backlog"
8) (integer) 117440512
9) "clients.slaves"
10) (integer) 16858
11) "clients.normal"
12) (integer) 49630
13) "aof.buffer"
14) (integer) 0
15) "db.0"
16) 1) "overhead.hashtable.main"
2) (integer) 48388888
3) "overhead.hashtable.expires"
4) (integer) 104
17) "overhead.total"
18) (integer) 166661600
19) "keys.count"
20) (integer) 1000007
21) "keys.bytes-per-key"
22) (integer) 875
23) "dataset.bytes"
24) (integer) 709190720
25) "dataset.percentage"
26) "81.042335510253906"
27) "peak.percentage"
28) "27.274873733520508"
29) "fragmentation"
30) "0.90553224086761475"
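For monitoring, the flat name/value reply above is easier to work with as a dictionary. The following Python sketch shows one way to convert it; it assumes the client library hands back the raw reply as a list alternating names and values (with the per-db entry as a nested list), and the function name stats_to_dict is hypothetical.

```python
def stats_to_dict(reply):
    """Turn a flat [name, value, name, value, ...] reply into a dict.

    Nested lists (such as the per-db entry) are converted recursively.
    """
    result = {}
    for name, value in zip(reply[0::2], reply[1::2]):
        if isinstance(value, list):
            value = stats_to_dict(value)
        result[name] = value
    return result


# A shortened copy of the reply shown above.
reply = [
    "peak.allocated", 3211205544,
    "total.allocated", 875852320,
    "db.0", ["overhead.hashtable.main", 48388888,
             "overhead.hashtable.expires", 104],
    "fragmentation", "0.90553224086761475",
]
stats = stats_to_dict(reply)
print(stats["total.allocated"])
print(stats["db.0"]["overhead.hashtable.main"])
```

Some client libraries may already return a mapping for this command, in which case no conversion is needed.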
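peak.allocated: the peak number of bytes allocated by the allocator since Redis started; same as used_memory_peak in info memory
total.allocated: the number of bytes currently allocated by the allocator; same as used_memory in info memory
startup.allocated: the number of bytes Redis used once startup completed; initial_memory_usage; /* Bytes used after initialization. */
replication.backlog: the number of bytes used by the replication backlog; set with the repl-backlog-size parameter, default 1MB; in the example above it is 117440512 bytes, i.e. 112MB. (Each instance has only one backlog.)
Notes: 1. Once replication is enabled, replication.backlog equals repl-backlog-size whether or not the backlog has been filled.
I think reporting repl_backlog_histlen here would make more sense, and I do not quite understand the intent behind this choice.
2. A slave also keeps a backlog, so that after it is promoted to master
it can still serve PSYNC (this is also the basis of PSYNC 2.0 in Redis 4.0).
clients.slaves: on the master side, the number of bytes consumed by all slave clients (a very important metric).
Each slave has exactly one client connection to the master, shown with flag S in the client list output. The memory counted here is each slave client's query buffer, client output buffer, and the client struct itself.
With this metric you can effectively monitor and analyze the output buffer consumed by slave clients and tune "client-output-buffer-limit" more precisely.
In the example below, the slave client limit is set very high and the client output buffers take a great deal of memory: clients.slaves has reached about 3GB. Previously this could only be analyzed through the omem field of client list.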
127.0.0.1:6379> memory stats
1) "peak.allocated"
2) (integer) 38697041192
9) "clients.slaves" // slave clients are holding a large amount of client output buffer memory
10) (integer) 3312505550
11) "clients.normal"
12) (integer) 2531130
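clients.normal: the number of bytes consumed by all normal Redis clients (very important)
That is, the memory used by all clients with flag N: query buffer + client output buffer + the client struct itself.
It is computed in the same way as clients.slaves. This item is very useful for spotting clients that write or read data abnormally.
127.0.0.1:6379> memory stats // other output omitted
9) "clients.slaves" // slave client memory is mostly client output buffer; see connection id=10520 below
10) (integer) 10256918
11) "clients.normal" // memory used by normal clients, usually query buffer and client output buffer
12) (integer) 102618310
127.0.0.1:6379> client list // only a few test clients are shown; look at omem and qbuf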
id=10520 addr=xx:60592 fd=8 flags=S qbuf=0 qbuf-free=0 obl=0 oll=2 omem=10240060 events=rw cmd=replconf
id=10591 addr=xx:56055 fd=10 flags=N qbuf=5799889 qbuf-free=4440113 obl=0 oll=0 omem=0 events=r cmd=set
id=10592 addr=xx:56056 fd=11 flags=N qbuf=10121401 qbuf-free=118601 obl=0 oll=0 omem=0 events=r cmd=set
id=10593 addr=xx:56057 fd=12 flags=N qbuf=0 qbuf-free=10240002 obl=0 oll=0 omem=0 events=r cmd=set
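To find which clients hold the buffers, the client list lines above can be parsed and summed per flag. The Python sketch below is only an approximation of what Redis counts: it adds qbuf, qbuf-free and omem and ignores the client struct itself, and the helper names are hypothetical.

```python
def parse_client_line(line):
    """Split one CLIENT LIST line into a dict of field name -> string value."""
    fields = {}
    for token in line.split():
        name, _, value = token.partition("=")
        fields[name] = value
    return fields


def buffer_bytes_by_flag(lines):
    """Sum query buffer (used + free) and output buffer bytes per client flag."""
    totals = {}
    for line in lines:
        client = parse_client_line(line)
        used = int(client["qbuf"]) + int(client["qbuf-free"]) + int(client["omem"])
        totals[client["flags"]] = totals.get(client["flags"], 0) + used
    return totals


lines = [
    "id=10520 addr=xx:60592 fd=8 flags=S qbuf=0 qbuf-free=0 obl=0 oll=2 omem=10240060 events=rw cmd=replconf",
    "id=10591 addr=xx:56055 fd=10 flags=N qbuf=5799889 qbuf-free=4440113 obl=0 oll=0 omem=0 events=r cmd=set",
    "id=10592 addr=xx:56056 fd=11 flags=N qbuf=10121401 qbuf-free=118601 obl=0 oll=0 omem=0 events=r cmd=set",
    "id=10593 addr=xx:56057 fd=12 flags=N qbuf=0 qbuf-free=10240002 obl=0 oll=0 omem=0 events=r cmd=set",
]
print(buffer_bytes_by_flag(lines))
```

Since only a few of the test clients are listed, the sums are expected to be lower than the clients.slaves and clients.normal totals reported by memory stats.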
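aof.buffer: the number of bytes used by the AOF buffer; usually small, it only grows large during an AOF rewrite. Workloads that enable AOF or run BGREWRITEAOF regularly may use more memory here, so keep an eye on this item.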
overhead.hashtable.main:
overhead.hashtable.expires:
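overhead.total: the total number of bytes of Redis overhead, i.e. the total memory allocated by the allocator (total.allocated) minus the memory actually used to store data. overhead.total is made up of 7 parts, as follows:
Formula: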
overhead.total=startup.allocated + replication.backlog + clients.slaves
+ clients.normal + aof.buffer + overhead.hashtable.main
+ overhead.hashtable.expires
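Worked example (using the memory stats output above); the calculation checks out.
765608 + 117440512 + 16858 + 49630 + 0 + 48388888 + 104 = 166661600
And in the example: 17) "overhead.total" = 166661600
In principle this overhead (overhead.total) should be kept as small as possible; with detailed monitoring now available, it is much easier to know where to start.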
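The same check can be scripted. This Python sketch recomputes overhead.total from its seven parts using the figures from the memory stats example; the dictionary layout and the name overhead_total are illustrative, not a Redis API.

```python
# The seven components of overhead.total, with values from the example output.
stats = {
    "startup.allocated": 765608,
    "replication.backlog": 117440512,
    "clients.slaves": 16858,
    "clients.normal": 49630,
    "aof.buffer": 0,
    "overhead.hashtable.main": 48388888,
    "overhead.hashtable.expires": 104,
}


def overhead_total(parts):
    """Sum all overhead components."""
    return sum(parts.values())


print(overhead_total(stats))
```

On an instance with several databases, each db contributes its own pair of hashtable entries; the example here has only db.0.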
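keys.count: the number of keys in the whole instance; same as the value returned by dbsize when only one database is in use
keys.bytes-per-key: the average number of bytes per key, with the overhead also spread across every key. Do not use this value as the average key size of the business data.
Formula:
keys.bytes-per-key = (total.allocated-startup.allocated)/keys.count
Worked example:
(875852320-765608)/1000007 = 875.08 // matches the 875 shown above
dataset.bytes: the memory taken by Redis data, i.e. the total allocated memory minus the total overhead.
Formula:
dataset.bytes = total.allocated - overhead.total
Worked example:
875852320 - 166661600 = 709190720 // same as "dataset.bytes" in the example above
dataset.percentage: the memory taken by Redis data as a percentage of total allocated memory, net of startup.allocated (important);
Formula:
dataset.percentage = dataset.bytes/(total.allocated-startup.allocated) * 100%
Worked example:
709190720/(875852320-765608) * 100% = 81.042336750% // matches dataset.percentage in the example above
It indicates how memory-efficient the data storage of the workload is.
peak.percentage: current memory usage as a percentage of the peak
Formula:
peak.percentage = total.allocated/peak.allocated * 100%
Worked example:
875852320/3211205544 * 100% = 27.274875681393% // matches peak.percentage in the example above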
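The four formulas above can be checked together in a few lines of Python. This is a sketch that follows the formulas as stated in this post; the function name derived_metrics is hypothetical, and integer division is assumed for keys.bytes-per-key because the example output shows a whole number.

```python
def derived_metrics(total, startup, peak, overhead, keys):
    """Recompute the derived MEMORY STATS fields from the raw counters."""
    net = total - startup          # memory allocated after startup
    dataset = total - overhead     # memory attributed to the data itself
    return {
        "keys.bytes-per-key": net // keys,
        "dataset.bytes": dataset,
        "dataset.percentage": dataset * 100 / net,
        "peak.percentage": total * 100 / peak,
    }


metrics = derived_metrics(
    total=875852320,      # total.allocated
    startup=765608,       # startup.allocated
    peak=3211205544,      # peak.allocated
    overhead=166661600,   # overhead.total
    keys=1000007,         # keys.count
)
for name, value in metrics.items():
    print(name, value)
```

The percentages differ from the Redis output only in the last few digits, which is consistent with Redis computing them in single-precision floats.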
fragmentation: the memory fragmentation ratio of Redis (very important); none of the items above account for memory fragmentation
/* Fragmentation = RSS / allocated-bytes */, same as mem_fragmentation_ratio in info memory
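The memory doctor command analyzes the state of Redis memory usage and, based on a series of simple checks, gives diagnostic advice that is of some reference value.
Run memory doctor in redis-cli; if memory usage is clearly unreasonable, it reports the problem together with suggestions for handling it.
Example: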
127.0.0.1:6379> memory doctor
"Sam, I detected a few issues in this Redis instance memory implants:\n\n
Peak memory: In the past this instance used more than 150% the memory that is currently using.
The allocator is normally not able to release memory after a peak,
so you can expect to see a big fragmentation ratio,
however this is actually harmless and is only due to the memory peak, and if the Redis instance Resident Set Size (RSS) is currently bigger than expected,
the memory will be used as soon as you fill the Redis instance with more data.
If the memory peak was only occasional and you want to try to reclaim memory,
please try the MEMORY PURGE command, otherwise the only other option is to
shutdown and restart the instance.\n\n
I'm here to keep you safe, Sam. I want to help you.\n"
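memory doctor is mostly a list of condition checks; for each condition that is met, it outputs the finding and a suggestion.
The main checks are listed below; if any one of them is met, a diagnosis and suggestion are given:
used_memory is less than 5MB: doctor considers memory usage too small and does no further diagnosis
peak allocated memory is more than 1.5 times the current total_allocated, which may mean RSS is much larger than used_memory
the memory fragmentation ratio is greater than 1.4
normal clients use more than 200KB each on average
slave clients use more than 10MB each on average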
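The checks listed above are simple enough to reproduce outside Redis, for example in a monitoring script. The Python sketch below applies the same thresholds; it is an illustration, not the Redis implementation, and the function name diagnose and its parameters are hypothetical (the finding names reuse the flag names from the C source).

```python
MB = 1024 * 1024


def diagnose(total_allocated, peak_allocated, fragmentation,
             clients_normal, num_normal, clients_slaves, num_slaves):
    """Return the list of findings, using the thresholds described above."""
    if total_allocated < 5 * MB:
        return ["empty"]  # almost empty instance: no further checks
    findings = []
    if peak_allocated / total_allocated > 1.5:
        findings.append("big_peak")
    if fragmentation > 1.4:
        findings.append("high_frag")
    if num_normal > 0 and clients_normal / num_normal > 200 * 1024:
        findings.append("big_client_buf")
    if num_slaves > 0 and clients_slaves / num_slaves > 10 * MB:
        findings.append("big_slave_buf")
    return findings


# Counters from the memory stats example; the client counts (2 and 1) are assumed.
print(diagnose(875852320, 3211205544, 0.9055, 49630, 2, 16858, 1))
```

For the example instance only the peak check fires, which matches the "Peak memory" report shown in the memory doctor output.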
The doctor command is implemented by getMemoryDoctorReport(void) in the object.c source file.
The core code is as follows:
sds getMemoryDoctorReport(void) {
int empty = 0; /* Instance is empty or almost empty. */
int big_peak = 0; /* Memory peak is much larger than used mem. */
int high_frag = 0; /* High fragmentation. */
int big_slave_buf = 0; /* Slave buffers are too big. */
int big_client_buf = 0; /* Client buffers are too big. */
int num_reports = 0;
struct redisMemOverhead *mh = getMemoryOverheadData(); // fetch the memory metrics used by the checks below
if (mh->total_allocated < (1024*1024*5)) { // under 5MB: treat the instance as almost empty and skip the other checks
empty = 1;
num_reports++;
} else {
---- omitted ---
/* Fragmentation is higher than 1.4? */
if (mh->fragmentation > 1.4) {
high_frag = 1;
num_reports++;
}
/* Slaves using more than 10 MB each? */
if (numslaves > 0 && mh->clients_slaves / numslaves > (1024*1024*10)) {
big_slave_buf = 1;
num_reports++;
}
}
sds s;
if (num_reports == 0) {
s = sdsnew(
"Hi Sam, I can't find any memory issue in your instance. "
"I can only account for what occurs on this base.\n");
} else if (empty == 1) {
s = sdsnew(
"Hi Sam, this instance is empty or is using very little memory, "
"my issues detector can't be used in these conditions. "
"Please, leave for your mission on Earth and fill it with some data. "
"The new Sam and I will be back to our programming as soon as I "
"finished rebooting.\n");
} else {
if (high_frag) {
s = sdscatprintf(s," * High fragmentation: This instance has a memory fragmentation greater than 1.4 (this means that the Resident Set Size of the Redis process is much larger than the sum of the logical allocations Redis performed). This problem is usually due either to a large peak memory (check if there is a peak memory entry above in the report) or may result from a workload that causes the allocator to fragment memory a lot. If the problem is a large peak memory, then there is no issue. Otherwise, make sure you are using the Jemalloc allocator and not the default libc malloc. Note: The currently used allocator is \"%s\".\n\n", ZMALLOC_LIB);
}
if (big_slave_buf) {
s = sdscat(s," * Big slave buffers: The slave output buffers in this instance are greater than 10MB for each slave (on average). This likely means that there is some slave instance that is struggling receiving data, either because it is too slow or because of networking issues. As a result, data piles on the master output buffers. Please try to identify what slave is not receiving data correctly and why. You can use the INFO output in order to check the slaves delays and the CLIENT LIST command to check the output buffers of each slave.\n\n");
}
if (big_client_buf) {
}
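The memory purge command calls into jemalloc to release memory, trying to return to the operating system the memory that the Redis process holds but is not effectively using, i.e. what is commonly called memory fragmentation.
memory purge only works on instances that use jemalloc as the allocator.
The Redis memory fragmentation ratio is a headache for DBAs. For example, when a service is retired and a large number of keys are deleted, Redis does not promptly return the "cleaned up" memory to the operating system, although Redis itself can reuse it.
Redis 4.0 provides two mechanisms to deal with memory fragmentation: one is the memory purge command; the other is Active memory defragmentation, which is still experimental but reclaims memory very efficiently. This section covers only memory purge.
memory purge is simple to use and has no noticeable performance impact. In my tests, however, it did not reclaim fragmentation very effectively: with mem_fragmentation_ratio at 2, running purge reclaimed almost nothing.
In the example below, mem_fragmentation_ratio is 8.2; after running memory purge it drops to 7.31, with 0.28GB reclaimed. As of 4.0, the reclaim efficiency is not ideal.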
127.0.0.1:6379> info memory
# Memory
used_memory:344944360
used_memory_human:328.96M
used_memory_rss:2828042240
used_memory_rss_human:2.63G
mem_fragmentation_ratio:8.20
mem_allocator:jemalloc-4.0.3
active_defrag_running:0
lazyfree_pending_objects:0
127.0.0.1:6379> memory purge
OK
127.0.0.1:6379> info memory
# Memory
used_memory:344942912
used_memory_human:328.96M
used_memory_rss:2522521600
used_memory_rss_human:2.35G
used_memory_dataset_perc:86.02%
mem_fragmentation_ratio:7.31
mem_allocator:jemalloc-4.0.3
active_defrag_running:0
lazyfree_pending_objects:0
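The effect of the purge can be quantified from the two info memory outputs above. The Python sketch below parses the relevant fields and recomputes the fragmentation ratio (RSS divided by used memory) and the reclaimed RSS; the helper names parse_info and fragmentation_ratio are hypothetical, and only the two fields needed are copied in.

```python
def parse_info(text):
    """Parse 'name:value' lines from INFO output, skipping blanks and section headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(":")
        info[name] = value
    return info


def fragmentation_ratio(info):
    """RSS divided by the bytes Redis has allocated."""
    return int(info["used_memory_rss"]) / int(info["used_memory"])


before = parse_info("# Memory\nused_memory:344944360\nused_memory_rss:2828042240\n")
after = parse_info("# Memory\nused_memory:344942912\nused_memory_rss:2522521600\n")

reclaimed = int(before["used_memory_rss"]) - int(after["used_memory_rss"])
print(round(fragmentation_ratio(before), 2), round(fragmentation_ratio(after), 2))
print(reclaimed, "bytes reclaimed,", round(reclaimed / 1024 ** 3, 2), "GB")
```

Tracking both numbers before and after a purge makes it easy to judge whether the command is worth running on a given instance.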
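The memory purge command only takes effect with the jemalloc allocator.
The actual release of memory happens inside jemalloc, which I have not fully understood;
interested readers can read the logic in memoryCommand() in the object.c source file, shown below: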
else if (!strcasecmp(c->argv[1]->ptr,"purge") && c->argc == 2) {
#if defined(USE_JEMALLOC) // check whether the malloc used by this Redis build is jemalloc
char tmp[32];
unsigned narenas = 0;
size_t sz = sizeof(unsigned);
if (!je_mallctl("arenas.narenas", &narenas, &sz, NULL, 0)) { // call into jemalloc
sprintf(tmp, "arena.%d.purge", narenas);
if (!je_mallctl(tmp, NULL, 0, NULL, 0)) {
addReply(c, shared.ok);
return;
}
}
addReplyError(c, "Error purging dirty pages");
#else
addReply(c, shared.ok);
/* Nothing to do for other allocators. */
#endif
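This command prints the allocator's internal state and currently supports only jemalloc. It should be quite useful for developers working on the source code. A brief example: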
127.0.0.1:6379> memory malloc-stats
___ Begin jemalloc statistics ___
Version: 4.0.3-0-ge9192eacf8935e29fc62fddc2701f7942b1cc02c
Assertions disabled
Run-time option settings:
opt.abort: false
opt.lg_chunk: 21
opt.dss: "secondary"
opt.narenas: 96
opt.lg_dirty_mult: 3 (arenas.lg_dirty_mult: 3)
opt.stats_print: false
opt.junk: "false"
opt.quarantine: 0
opt.redzone: false
opt.zero: false
opt.tcache: true
opt.lg_tcache_max: 15
CPUs: 24
Arenas: 96
Pointer size: 8
Quantum size: 8
Page size: 4096
Min active:dirty page ratio per arena: 8:1
Maximum thread-cached size class: 32768
Chunk size: 2097152 (2^21)
Allocated: 345935320, active: 350318592, metadata: 65191296, resident: 455610368, mapped: 2501902336
Current active ceiling: 352321536
arenas[0]:
assigned threads: 1
dss allocation precedence: secondary
min active:dirty page ratio: 8:1
dirty pages: 85527:10020 active:dirty, 82084 sweeps, 112369 madvises, 665894 purged
allocated nmalloc ndalloc nrequests
small: 311397848 12825077 11674091 32248603
large: 983040 1850 1842 1854
huge: 33554432 8 6 8
total: 345935320 12826935 11675939 32250465
-- omitted --
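The memory command gives us an intuitive view of how Redis memory is distributed, helping us understand memory usage and optimize a workload's memory usage in a targeted way, especially the purge, stats, and usage subcommands. I expect the memory command to become even more capable in future versions :)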
-END-