Error reported on the test server:
Caused by: redis.clients.jedis.exceptions.JedisDataException:
MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error......
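Roughly, this means that Redis is configured to persist with RDB, but the latest RDB snapshot failed. Because the stop-writes-on-bgsave-error option is enabled, commands that modify data are rejected once an RDB save fails. Clients such as Jedis then receive the error above when they issue a write.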
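The condition described above can be checked from the text that the Redis INFO command returns. The following is a minimal sketch: the sample text is illustrative (not captured from the test server), the helper names are hypothetical, and the check is simplified, since the real server also looks at whether any save points are configured.

```python
def parse_info(text):
    """Parse the 'key:value' lines of a Redis INFO reply into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Persistence"
        key, _, value = line.partition(":")
        info[key] = value
    return info


def writes_blocked(info, stop_writes_on_bgsave_error=True):
    """Simplified check: writes are refused when the option is on
    and the last background save failed."""
    return stop_writes_on_bgsave_error and info.get("rdb_last_bgsave_status") == "err"


# Illustrative sample only; real output contains many more fields.
SAMPLE = """# Persistence
loading:0
rdb_bgsave_in_progress:0
rdb_last_bgsave_status:err
"""

if __name__ == "__main__":
    print(writes_blocked(parse_info(SAMPLE)))
```

On a live server the same field can be read with the INFO command in redis-cli; rdb_last_bgsave_status returns to ok after the next successful background save.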
Checking the Redis server log:
WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
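The log shows that overcommit_memory is set to 0, so Redis background saves can fail when available memory is low. At this point it is clear that a full disk is not the cause; in fact the test server still has plenty of free disk space.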
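To confirm what the log warning says, the current setting can be read straight from procfs. This is a small sketch, assuming a Linux host; the function names are made up for illustration, and the path is the one used in the Redis FAQ.

```python
from pathlib import Path

OVERCOMMIT_PATH = "/proc/sys/vm/overcommit_memory"

# Meaning of each value as documented in the proc(5) man page.
MODES = {
    0: "heuristic overcommit",
    1: "always overcommit",
    2: "never overcommit",
}


def read_overcommit_mode(path=OVERCOMMIT_PATH):
    """Return the integer stored in the overcommit_memory file."""
    return int(Path(path).read_text().strip())


def describe_mode(mode):
    """Translate the numeric mode into a short description."""
    return MODES.get(mode, "unknown value")


if __name__ == "__main__":
    # The file only exists on Linux, so guard the read.
    if Path(OVERCOMMIT_PATH).exists():
        mode = read_overcommit_mode()
        print(f"vm.overcommit_memory = {mode} ({describe_mode(mode)})")
    else:
        print("overcommit_memory is not available on this system")
```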
After reading the relevant Redis FAQ entry and a Stack Overflow thread, the cause became broadly clear.
https://redis.io/topics/faq#background-saving-fails-with-a-fork-error-under-linux-even-if-i-have-a-lot-of-free-ram
https://stackoverflow.com/questions/19581059/misconf-redis-is-configured-to-save-rdb-snapshots
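Memory usage on the test server at the time: 128 MB available.
A scheduled job runs on the test server at night. It pushes hotel static-information JSON data onto a Redis list for the business system to pop and consume. The full data for one hotel averages about 0.5 MB, and the test server has just under 700 hotels, which makes this the most data-heavy use of Redis on the test server. During earlier integration testing between the two systems, not much unconsumed JSON data lingered in Redis; the produce-to-consume rate ratio was roughly 3:2.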
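A back-of-the-envelope calculation puts these numbers next to the 128 MB of available memory. It is only a sketch under stated assumptions: 700 is used as an upper bound for the hotel count, every hotel is assumed to weigh exactly 0.5 MB, and the backlog estimate assumes the consumer runs concurrently at 2/3 of the producer's rate for the whole job. With those assumptions the full payload is 350 MB, and the backlog when the producer finishes is about 117 MB.

```python
HOTELS = 700            # upper bound; the post says "just under 700"
MB_PER_HOTEL = 0.5      # average size of one hotel's JSON data
PRODUCE_RATE = 3        # produce : consume is roughly 3 : 2
CONSUME_RATE = 2
AVAILABLE_MB = 128      # available memory observed on the test server


def total_payload_mb(hotels, mb_per_hotel):
    """Size of the list if nothing is consumed at all."""
    return hotels * mb_per_hotel


def backlog_mb(hotels, mb_per_hotel, produce_rate, consume_rate):
    """Unconsumed data left when the producer finishes, assuming the
    consumer runs concurrently at a constant fraction of the produce rate."""
    unconsumed_fraction = 1 - consume_rate / produce_rate
    return hotels * mb_per_hotel * unconsumed_fraction


if __name__ == "__main__":
    total = total_payload_mb(HOTELS, MB_PER_HOTEL)
    backlog = backlog_mb(HOTELS, MB_PER_HOTEL, PRODUCE_RATE, CONSUME_RATE)
    print(f"worst case (no consumption): {total:.0f} MB")
    print(f"backlog with concurrent consumption: {backlog:.0f} MB")
    print(f"available memory: {AVAILABLE_MB} MB")
```

Either figure is of the same order as, or larger than, the available memory, which is consistent with a fork failing under overcommit_memory = 0.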
Overcommit threshold set in the server's kernel
Background saving fails with a fork() error under Linux even if I have a lot of free RAM!
Short answer:
echo 1 > /proc/sys/vm/overcommit_memory
:)
And now the long one:
Redis background saving schema relies on the copy-on-write semantic of fork in modern operating systems: Redis forks (creates a child process) that is an exact copy of the parent. The child process dumps the DB on disk and finally exits. In theory the child should use as much memory as the parent being a copy, but actually thanks to the copy-on-write semantic implemented by most modern operating systems the parent and child process will share the common memory pages. A page will be duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can't tell in advance how much memory the child will take, so if the overcommit_memory setting is set to zero fork will fail unless there is as much free RAM as required to really duplicate all the parent memory pages, with the result that if you have a Redis dataset of 3 GB and just 2 GB of free memory it will fail.
Setting overcommit_memory to 1 tells Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.
A good source to understand how Linux Virtual Memory works and other alternatives for overcommit_memory and overcommit_ratio is this classic from Red Hat Magazine, "Understanding Virtual Memory". Beware, this article had 1 and 2 configuration values for overcommit_memory reversed: refer to the proc(5) man page for the right meaning of the available values.
The above is from the Redis FAQ: even when there is plenty of memory, it still recommends changing the kernel parameter overcommit_memory.
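The echo command in the FAQ only lasts until the next reboot; the Redis log message also names the persistent variant, adding 'vm.overcommit_memory = 1' to /etc/sysctl.conf. The sketch below shows one way to make that edit idempotent. It works on the file's text only, the function name is hypothetical, and actually writing /etc/sysctl.conf (as root) and running sysctl is left to the operator.

```python
KEY = "vm.overcommit_memory"


def ensure_overcommit_setting(conf_text, value=1):
    """Return sysctl.conf text that contains exactly one active
    'vm.overcommit_memory = <value>' line. Comments are left untouched."""
    wanted = f"{KEY} = {value}"
    out = []
    found = False
    for line in conf_text.splitlines():
        # Commented lines start with '#', so their key never equals KEY.
        key = line.split("=", 1)[0].strip()
        if key == KEY:
            if not found:
                out.append(wanted)  # replace the first active setting in place
                found = True
            continue                # drop any duplicate active settings
        out.append(line)
    if not found:
        out.append(wanted)          # setting was absent: append it
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    before = "# kernel tuning\nvm.overcommit_memory=0\nvm.swappiness = 10\n"
    print(ensure_overcommit_setting(before), end="")
```

Running the function twice gives the same result as running it once, so it is safe to call from a provisioning script.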
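Looking it up, the kernel parameter vm.overcommit_memory accepts three values:
0: heuristic overcommit (the default). Overcommit is allowed, but obviously excessive requests are refused.
1: always overcommit. Memory requests are never refused on overcommit grounds.
2: never overcommit. Total committed memory may not exceed CommitLimit.
CommitLimit is the overcommit threshold: once the total memory requested exceeds CommitLimit, the system counts as overcommitted.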
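As a rough illustration of how CommitLimit is derived, the kernel documentation gives it as swap plus a percentage (vm.overcommit_ratio, default 50) of physical RAM, with huge pages excluded. The sketch below reproduces that arithmetic; the function names and the sample sizes are assumptions for illustration, and the alternative vm.overcommit_kbytes setting is ignored.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, hugetlb_kb=0):
    """CommitLimit = (RAM - huge pages) * overcommit_ratio / 100 + swap."""
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


def is_overcommitted(committed_as_kb, limit_kb):
    """True when the total committed memory exceeds CommitLimit."""
    return committed_as_kb > limit_kb


if __name__ == "__main__":
    # Illustrative machine: 8,000,000 kB of RAM and 2,000,000 kB of swap.
    limit = commit_limit_kb(8_000_000, 2_000_000)
    print(limit)                                # 6000000
    print(is_overcommitted(7_000_000, limit))   # True
```

On a real host the corresponding figures appear as CommitLimit and Committed_AS in /proc/meminfo.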
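The heuristic overcommit algorithm is implemented in a kernel function; it can roughly be understood as follows:
A single allocation request may not exceed [free memory + free swap + page cache size + the reclaimable part of SLAB]; otherwise the request fails.
To avoid blocking the main process while saving data to disk, Redis forks a copy of the main process and performs the save inside the forked child. If the main process uses 4 GB of memory, forking the child may in the worst case need an additional 4 GB. When that much memory is not available, the fork fails, and so does the save to disk.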
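The heuristic rule and the 4 GB example above can be sketched as a check against /proc/meminfo. This follows the simplified description in this post rather than the actual kernel code, and kernel versions differ in the details. The field names MemFree, SwapFree, Cached and SReclaimable are real /proc/meminfo fields; the sample values and function names are invented for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict
    (values are in kB for the fields used here)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def heuristic_allows(request_kb, meminfo):
    """Simplified rule: a single request must fit into
    free memory + free swap + page cache + reclaimable SLAB."""
    budget_kb = sum(
        meminfo.get(field, 0)
        for field in ("MemFree", "SwapFree", "Cached", "SReclaimable")
    )
    return request_kb <= budget_kb


# Illustrative values only, not taken from the test server.
SAMPLE_MEMINFO = """MemFree:          131072 kB
Cached:           524288 kB
SwapFree:              0 kB
SReclaimable:      65536 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE_MEMINFO)
    four_gb_kb = 4 * 1024 * 1024
    print(heuristic_allows(four_gb_kb, info))   # False: a 4 GB fork would be refused
    print(heuristic_allows(350 * 1024, info))   # True: 350 MB fits the budget
```

To run it against a real machine, pass the contents of /proc/meminfo to parse_meminfo instead of the sample text.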