One of our company's products (running on Linux) had recently been hanging for no obvious reason. At first I suspected the driver of one of our virtual devices, but the hangs continued even after the code was changed. I then wrote a script that dumped assorted system information to a log file every hour. After the next hang the log showed that available memory had fallen steadily from over 900 MB at the start to roughly 100 MB in the last record. Was the hang caused by running out of memory? I couldn't be sure yet, so I decided to write a small program to see what state the kernel gets into when memory is exhausted.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main()
{
    char *p = NULL;
    int count = 1;

    while (1) {
        /* grab 10 MB per iteration */
        p = (char *)malloc(1024*1024*10);
        if (!p) {
            printf("malloc error!\n");
            return -1;
        }
        /* touch every page so the allocation is actually backed by memory */
        memset(p, 0, 1024*1024*10);
        printf("malloc %dM memory\n", 10*count++);
        usleep(500000);
    }
}
I ran this program on two different Linux platforms and got completely different results:
Platform 1: Red Hat Linux release 8.0 (2.4.18-14), 1 GB physical RAM. When the program had malloc'ed 2890 MB it was killed by the system's OOM killer: Out of Memory: Killed process 6448 (loop_malloc)
Platform 2: Red Hat Enterprise Linux WS release 3 (2.4.21-50.EL), 1 GB physical RAM. When the program had malloc'ed 1460 MB the system died completely: ssh and http were unreachable, although ping still answered.
The product that kept hanging runs the Platform 2 kernel, so memory exhaustion was looking more and more likely as the cause. What I couldn't explain was why the OOM killer never kicked in on Platform 2: was it never invoked, or did it fail to find a suitable process to kill? Answering that means digging into the kernel source, and given my ignorance of kernel internals I had to ask my colleague 萝卜 for help...
From another angle, losing that much available memory in just two days looked very much like a memory leak. The original script collected only limited information, so I extended it to also record each process's VSZ and RSS. Sure enough, one process had a huge and steadily growing VSZ and RSS, and a run under valgrind confirmed the leak. The picture was now fairly clear: a process leaks memory -> memory is exhausted -> the system hangs. Below is the material I gathered over those few days, tidied up.
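As an aside, here is a contrived leak (not the product's code, just an illustration) of the kind of thing valgrind's memcheck reports as "definitely lost" when run with valgrind --leak-check=full:

/* leak_demo.c - a deliberately leaky toy program, purely to illustrate the
 * valgrind workflow. Build with -g and run: valgrind --leak-check=full ./leak_demo
 * memcheck reports the lost bytes together with the allocation call stack,
 * which is the kind of information that points at the offending code path. */
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int i;

    for (i = 0; i < 100; i++) {
        char *buf = malloc(4096);    /* never freed: shows up as "definitely lost" */
        if (!buf)
            return -1;
        memset(buf, 0, 4096);
    }
    return 0;                        /* 100 * 4096 bytes leaked at exit */
}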
【About /proc/meminfo】
MemTotal: Total usable ram (i.e. physical ram minus a few reserved bits and the kernel binary code)
MemFree: The sum of LowFree+HighFree
Buffers: Relatively temporary storage for raw disk blocks; shouldn't get tremendously large (20MB or so)
Cached: in-memory cache for files read from the disk (the pagecache). Doesn't include SwapCached
SwapCached: Memory that once was swapped out, is swapped back in but still also is in the swapfile (if memory is needed it doesn't need to be swapped out AGAIN because it is already in the swapfile. This saves I/O)
Active: Memory that has been used more recently and usually not reclaimed unless absolutely necessary.
Inactive: Memory which has been less recently used. It is more eligible to be reclaimed for other purposes
HighTotal, HighFree: Highmem is all memory above ~860MB of physical memory. Highmem areas are for use by userspace programs, or for the pagecache. The kernel must use tricks to access this memory, making it slower to access than lowmem.
LowTotal, LowFree: Lowmem is memory which can be used for everything that highmem can be used for, but it is also available for the kernel's use for its own data structures. Among many other things, it is where everything from the Slab is allocated. Bad things happen when you're out of lowmem.
SwapTotal: total amount of swap space available
SwapFree: Memory which has been evicted from RAM, and is temporarily on the disk
Slab: in-kernel data structures cache
CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'),this is the total amount of memory currently available to be allocated on the system. This limit is only adhered to if strict overcommit accounting is enabled (mode 2 in 'vm.overcommit_memory').The CommitLimit is calculated with the following formula: CommitLimit = ('vm.overcommit_ratio' * Physical RAM) + Swap For example, on a system with 1G of physical RAM and 7G of swap with a `vm.overcommit_ratio` of 30 it would yield a CommitLimit of 7.3G. For more details, see the memory overcommit documentation in vm/overcommit-accounting.
Committed_AS: The amount of memory presently allocated on the system. The committed memory is a sum of all of the memory which has been allocated by processes, even if it has not been "used" by them as of yet. A process which malloc()'s 1G of memory, but only touches 300M of it will only show up as using 300M of memory even if it has the address space allocated for the entire 1G. This 1G is memory which has been "committed" to by the VM and can be used at any time by the allocating application. With strict overcommit enabled on the system (mode 2 in 'vm.overcommit_memory'), allocations which would exceed the CommitLimit (detailed above) will not be permitted. This is useful if one needs to guarantee that processes will not fail due to lack of memory once that memory has been successfully allocated.
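The hourly logging mentioned at the top can pull these fields straight out of /proc/meminfo. Below is a minimal sketch; the choice of fields is mine, and the exact set available differs between kernel versions (CommitLimit, for instance, only shows up on kernels with strict overcommit accounting support):

/* meminfo_dump.c - print a handful of /proc/meminfo fields. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *wanted[] = { "MemTotal:", "MemFree:", "Buffers:", "Cached:",
                             "LowFree:", "Committed_AS:", "CommitLimit:" };
    char line[256];
    unsigned i;
    FILE *fp = fopen("/proc/meminfo", "r");

    if (!fp) {
        perror("fopen /proc/meminfo");
        return -1;
    }
    while (fgets(line, sizeof(line), fp)) {
        for (i = 0; i < sizeof(wanted) / sizeof(wanted[0]); i++) {
            if (strncmp(line, wanted[i], strlen(wanted[i])) == 0)
                fputs(line, stdout);       /* each line already ends in '\n' */
        }
    }
    fclose(fp);
    return 0;
}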
【About free】
On Linux we normally check memory with the free command:
[root@scs-2 tmp]# free
             total       used       free     shared    buffers     cached
Mem:       3266180    3250004      16176          0     110652    2668236
-/+ buffers/cache:      471116    2795064
Swap:      2048276      80160    1968116
The difference: used/free on the second line (Mem) versus used/free on the third line (-/+ buffers/cache). It is a difference of viewpoint. The second line is the OS's view: to the OS, buffers and cached both count as used, so available memory is 16176 KB and used memory is 3250004 KB, which covers the kernel (OS), applications (X, oracle, etc.), plus buffers and cached.
The third line is the application's view: to an application, buffers/cached is as good as available, because buffers/cached exist only to speed up file access and are reclaimed quickly as soon as an application needs the memory. So from the application's point of view, available memory = free + buffers + cached. In the example above: 2795064 = 16176 + 110652 + 2668236.
When we look at free memory with the free command the free figure often looks tiny. That is because Linux works on the principle that idle memory is wasted memory, so it buffers and caches as much data as it can for later reuse; that memory can still be reclaimed immediately when something else needs it. So the memory genuinely available = free + buffers + cached = total - used.
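For the record, the used figure on the -/+ buffers/cache line is the same logic run in reverse, and the two rows are consistent with the totals:
used = 3250004 - 110652 - 2668236 = 471116
free = 3266180 - 471116 = 2795064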
【About /proc/pid/status】
We can see a process's virtual memory (VSZ) and resident memory (RSS) with ps aux or top, or read them straight from /proc/<pid>/status.
VmSize (KB): size of the task's virtual address space (total_vm - reserved_vm), where total_vm is the size of the process's address space and reserved_vm counts the physical pages the process holds in reserved or special memory regions
VmLck (KB): amount of physical memory the task has locked; locked memory cannot be swapped out to disk (locked_vm)
VmRSS (KB): amount of physical memory the application is actually using, i.e. the rss value reported by ps (rss)
VmData (KB): size of the program's data segment, in virtual memory, holding initialized data (total_vm - shared_vm - stack_vm)
VmStk (KB): size of the task's user-space stack (stack_vm)
VmExe (KB): size of the executable virtual memory the program owns, i.e. the code segment, excluding libraries used by the task (end_code - start_code)
VmLib (KB): size of the libraries mapped into the task's virtual address space (exec_lib)
VmPTE (KB): total size of the process's page tables
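These lines are easy to pull programmatically as well. A minimal sketch that prints VmSize and VmRSS for the current process (replace "self" with a pid to watch another process, which is essentially what the extended logging script recorded as VSZ/RSS):

/* vm_status.c - print VmSize and VmRSS from /proc/self/status. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];
    FILE *fp = fopen("/proc/self/status", "r");

    if (!fp) {
        perror("fopen /proc/self/status");
        return -1;
    }
    while (fgets(line, sizeof(line), fp)) {
        if (strncmp(line, "VmSize:", 7) == 0 || strncmp(line, "VmRSS:", 6) == 0)
            fputs(line, stdout);
    }
    fclose(fp);
    return 0;
}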
【About the OOM killer】
The Out-of-Memory (OOM) Killer is a protection mechanism: when Linux runs low on memory it kills off non-essential processes so that things don't get too far out of hand.
Addressing on a 32-bit CPU is limited, and the Linux kernel defines three zones:
# DMA: 0x00000000 - 0x00FFFFFF (0 - 16 MB)
# LowMem: 0x01000000 - 0x37FFFFFF (16 - 896 MB) - size: 880 MB
# HighMem: 0x38000000 - <hardware specific>
When is the OOM killer triggered? From what I could find, basically in two situations:
1. Low memory is exhausted, even if plenty of high memory is still free
2. Low memory is fragmented and a request for a contiguous region cannot be satisfied
The common complaint is that the OOM killer fires while high memory is still plentiful, or fires because of fragmentation. The remedies:
1. Upgrade to 64-bit Linux; this is the best solution.
2. On 32-bit Linux, the best option is to run a hugemem kernel; alternatively, set /proc/sys/vm/lower_zone_protection to 250 or higher.
# echo "250" > /proc/sys/vm/lower_zone_protection
To have it applied at boot, add the following to /etc/sysctl.conf:
vm.lower_zone_protection = 250
3. The weakest workaround is to disable the oom-killer altogether; this may leave the system hung, so use it with caution.
To turn the oom-killer off/on:
# echo "0" > /proc/sys/vm/oom-kill
# echo "1" > /proc/sys/vm/oom-kill
To have the setting applied at boot, add the following to /etc/sysctl.conf:
vm.oom-kill = 0
My problem, though, is that the OOM killer was not triggered when it should have been, so none of these helped me -_-|
【About overcommit_memory】
The Linux kernel supports the following overcommit handling modes
0 - Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.
1 - Always overcommit. Appropriate for some scientific applications.
2 - Don't overcommit. The total address space commit for the system is not permitted to exceed swap + a configurable percentage (default is 50) of physical RAM. Depending on the percentage you use, in most situations this means a process will not be killed while accessing pages but will receive errors on memory allocation as appropriate.
The overcommit policy is set via the sysctl `vm.overcommit_memory'.
The overcommit percentage is set via `vm.overcommit_ratio'.
The current overcommit limit and amount committed are viewable in
/proc/meminfo as CommitLimit and Committed_AS respectively.
------------------------------------------------------------------------------------------
# echo 2 > /proc/sys/vm/overcommit_memory
# echo 0 > /proc/sys/vm/overcommit_ratio
-------------------------------------------------------------------------------------------
Results of my own testing:
overcommit_memory == 2: once physical memory is used up, starting any program just reports "out of memory";
overcommit_memory == 1: more physical memory is released from the buffers, which suits large scientific applications, but the oom-killer mechanism still works;
overcommit_memory == 0: the system default; less physical memory is released, and the oom-killer mechanism is very visibly at work.
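The Committed_AS example from the kernel documentation above (malloc 1 GB but touch only 300 MB) is easy to reproduce and makes the difference between the modes concrete. A minimal sketch, with sizes picked purely for illustration:

/* overcommit_demo.c - allocate a large block but touch only part of it.
 * Under modes 0/1 the malloc normally succeeds because untouched pages are
 * not yet backed by RAM: the whole 1 GB counts toward Committed_AS but only
 * the touched 300 MB appears in VmRSS. Under mode 2 with a low CommitLimit
 * the malloc itself may return NULL up front, which would send the
 * loop_malloc test at the top through its "malloc error!" branch instead. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t total = (size_t)1024 * 1024 * 1024;   /* 1 GB of address space */
    size_t touch = (size_t)1024 * 1024 * 300;    /* touch only 300 MB of it */
    char *p = malloc(total);

    if (!p) {
        printf("malloc failed up front (e.g. strict overcommit, mode 2)\n");
        return -1;
    }
    printf("1 GB allocated; touching 300 MB...\n");
    memset(p, 0, touch);                         /* commits 300 MB of real memory */
    printf("done - compare VmSize and VmRSS in /proc/self/status now\n");
    getchar();                                   /* pause so the numbers can be inspected */
    free(p);
    return 0;
}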