Weird change of "memory used"

Background

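This started with a question on Stack Overflow. My first reaction was that it probably came down to the OBJ_ENCODING_ZIPLIST/OBJ_ENCODING_HT encoding switch or to redis rehashing. After reading the question carefully, though, it turned out to have nothing at all to do with either.
As a self-styled veteran troubleshooter, I see troubleshooting as three steps: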

  1. Understand the problem
  2. Reproduce the problem
  3. Solve the problem

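In my humble opinion, many people invest far too little in the "understand the problem" step. Sometimes the customer's description is unclear and we fail to guide them toward describing their problem well. Sometimes we are too quick to judge, pinning the issue on a particular cause from a few faint clues without corroborating it with data from several sources. And sometimes we sense that the problem may be complicated but force it into the shape of a simple one to avoid embarrassment, forgetting that "being embarrassed once and mending the fence beats being embarrassed for a lifetime without knowing it". I have my moments like this too, of course; let us all keep at it.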

Understanding the problem

One of the existing answers did not look quite right to me, so I confirmed the details with the asker:

@Karthikeyan Gopall, I think it's better if you provide two more details: 1. How do you get the memory used? Do you use redis-cli to run 'info memory' and read the field "used_memory" or "used_memory_rss", or do you get it somewhere else? 2. Your memory allocator, which you can get with "./redis-cli info | grep mem_allocator"
@sel-fish 1) I will get it from "used_memory". Before starting the process I will take the used_memory, after the process I will do the same and difference of them is what I have mentioned in the graph. 2) mem_allocator:jemalloc-3.6.0

Reproducing the problem

I reproduced the problem. The result did not quite match the chart in the question, but the two share some common features:

Solving the problem

Reading the code

Looking at the redis source, the call chain is:

hsetCommand
ziplistPush
ziplistResize
zrealloc

In the asker's scenario every hset triggers a zrealloc, but a realloc does not necessarily increase used_memory. Part of zrealloc in src/zmalloc.c:

    oldsize = zmalloc_size(ptr);
    newptr = realloc(ptr,size);
    if (!newptr) zmalloc_oom_handler(size);

    update_zmalloc_stat_free(oldsize);
    update_zmalloc_stat_alloc(zmalloc_size(newptr));

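If oldsize (the size the allocator reports for the existing block) is already larger than the requested size, realloc does not actually obtain new space. used_memory then stays flat, which is exactly the phenomenon described in the question. With libc's malloc this does not happen: each realloc obtains new space, or at least the value returned by the malloc_size interface changes every time: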
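The accounting described here can be tried outside redis. The following is a minimal sketch, not redis code: `used_memory`, `tracked_realloc`, and `count_visible_growths` are hypothetical names, and it assumes the platform exposes `malloc_usable_size` (glibc, musl, jemalloc) or `malloc_size` (macOS). It grows one buffer in fixed steps, much as a ziplist grows on each hset, and counts how many reallocs actually moved the counter.

```c
#include <assert.h>
#include <stdlib.h>
#if defined(__APPLE__)
#include <malloc/malloc.h>
#define usable_size(p) malloc_size(p)
#else
#include <malloc.h>
#define usable_size(p) malloc_usable_size(p)
#endif

/* Hypothetical stand-in for the counter behind redis' used_memory. */
static size_t used_memory = 0;

/* Mirrors the accounting in the zrealloc excerpt: subtract the old usable
 * size, add the new one. A NULL ptr behaves like malloc. */
static void *tracked_realloc(void *ptr, size_t size) {
    size_t oldsize = ptr ? usable_size(ptr) : 0;
    void *newptr = realloc(ptr, size);
    assert(newptr != NULL);
    used_memory -= oldsize;
    used_memory += usable_size(newptr);
    return newptr;
}

/* Grow one buffer from `step` to `limit` bytes in `step`-byte increments.
 * Stores the number of reallocs in *calls and returns how many of them
 * actually changed used_memory. The buffer is released before returning. */
static size_t count_visible_growths(size_t step, size_t limit, size_t *calls) {
    void *buf = NULL;
    size_t visible = 0;
    assert(step > 0);
    *calls = 0;
    for (size_t n = step; n <= limit; n += step) {
        size_t before = used_memory;
        buf = tracked_realloc(buf, n);
        (*calls)++;
        if (used_memory != before) visible++;
    }
    if (buf) {
        used_memory -= usable_size(buf);
        free(buf);
    }
    return visible;
}
```

Calling `count_visible_growths(20, 8192, &calls)` under different allocators should show how often the counter moves relative to the number of reallocs. The exact ratio depends on the allocator, so no particular value is claimed here.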

So my hypothesis was that this behavior comes from jemalloc's allocation logic.

Confirming the hypothesis

Test directly how much memory jemalloc actually allocates for each requested size:
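One way such a test could be written is sketched below. It is an assumption about how the measurement can be done, not the original test program: `usable_for` and `print_size_steps` are hypothetical names, and the code reports whatever allocator the process is actually using. To probe jemalloc rather than the system allocator, the program would need to be linked against jemalloc or started with it preloaded; how to do that depends on your installation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#if defined(__APPLE__)
#include <malloc/malloc.h>
#define usable_size(p) malloc_size(p)
#else
#include <malloc.h>
#define usable_size(p) malloc_usable_size(p)
#endif

/* Usable size the active allocator hands back for a fresh request. */
static size_t usable_for(size_t request) {
    void *p = malloc(request);
    assert(p != NULL);
    size_t got = usable_size(p);
    free(p);
    return got;
}

/* Print one line each time the usable size changes while the request grows
 * from 1 to max bytes. Returns the number of changes that were printed. */
static size_t print_size_steps(size_t max) {
    size_t steps = 0;
    size_t last = 0;
    for (size_t request = 1; request <= max; request++) {
        size_t got = usable_for(request);
        if (got != last) {
            printf("from request %zu bytes: usable %zu bytes\n", request, got);
            last = got;
            steps++;
        }
    }
    return steps;
}
```

Running `print_size_steps(16384)` once per allocator and comparing the printed boundaries shows where a growing buffer would stop or start moving used_memory.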

Re-examining the phenomenon

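The change in memory used that I measured still differed from the asker's. From the "Understanding the problem" section we already know that the asker's mem_allocator is jemalloc-3.6.0, so my guess is that jemalloc's logic changed between versions: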
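For orientation, the size classes that the jemalloc 3.x manual lists for 64-bit builds with 4 KiB pages can be modelled as below. This is an illustrative model under that assumption, written from the documented table rather than taken from jemalloc's source. `model_size_class` is a hypothetical name, requests beyond the "large" range are not modelled, and to my knowledge later jemalloc versions use a different layout, so check the manual of the version you actually run.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model only: rounds a request up to the size class that the
 * jemalloc 3.x manual documents for 64-bit systems with 4 KiB pages.
 * Small classes use a spacing that widens with size; above 3584 bytes the
 * model rounds up to whole 4 KiB pages ("large" classes). */
static size_t model_size_class(size_t request) {
    static const size_t limit[]   = {128, 256, 512, 1024, 2048, 3584};
    static const size_t spacing[] = { 16,  32,  64,  128,  256,  512};
    const size_t groups = sizeof limit / sizeof limit[0];
    assert(groups == sizeof spacing / sizeof spacing[0]);

    if (request <= 8) return 8;  /* smallest documented class */
    for (size_t i = 0; i < groups; i++) {
        if (request <= limit[i])
            return (request + spacing[i] - 1) / spacing[i] * spacing[i];
    }
    return (request + 4095) / 4096 * 4096;  /* page-multiple classes */
}
```

Under this model a buffer that grows from 3584 to 3585 bytes moves from the 3584-byte class to the 4096-byte one, and one byte past 4096 it jumps to 8192. With 1000 hashes growing in lockstep, a jump like that would show up as a sudden step in used_memory.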

Answering the question

As you constantly have 1000 keys in redis, only the number of fields in each hash key changes, and your field count stays below 512, this phenomenon is caused only by jemalloc.

Here is the behavior when I use libc as my mem_allocator:

You can rebuild redis with:

    make MALLOC=libc

Run your test again, and see what you will get.

To answer your questions:

  1. Can someone explain me about the internal happenings of hashmap in terms of memory and resizing? What is the logic followed during resizing?

    As mentioned, what you encountered has nothing to do with redis itself. jemalloc does it this way to improve its efficiency.

  2. If I have to loose double the memory just for storing one more entry that is 215 to 216, why can't I restrict my application to have a hashes less than 215 always, unless and until the system needs it at the most.

    Of course you can, as long as you are able to restrict the number of fields.

  3. Suppose if I want to store 1 million hashes each consisting of 250 values I need 800MB. If I split them into 2 hashes of 125 values ie, 2 million hashes of 125 values I need 500MB. In this way I am saving 300 MB which is huge!!. Is this calcuation right? Am I missing something in this case?

    I don't think that's a proper way to do it. Maybe you can save some memory this way. However, there are disadvantages: as you split 1 million hashes into 2 million, redis will do rehashing (which costs some extra space), and it will take more time to find a key because there is a higher chance of hash collisions.

Extending the problem

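Any problem can be extended, and solving it is only one stage. Try to branch out from the problem at hand and generalize from it; that is how you get more out of it, sometimes insights, sometimes deeper questions or questions in other areas. This one, for example, leads to: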

  1. How jemalloc works
  2. Whether our own products can offer finer-grained monitoring data on memory usage
