环境centos5.5x86_64,这台web给数据采集用的,主要是抓取数据.启动过错出错信息:
Call Trace:
[<ffffffff802bf8db>] out_of_memory+0x8b/0x203
[<ffffffff8020f8a2>] __alloc_pages+0x27f/0x308
[<ffffffff8025d117>] getnstimeofday+0x10/0x28
[<ffffffff80213836>] __do_page_cache_readahead+0xc6/0x1ab
[<ffffffff802140ff>] filemap_nopage+0x14c/0x360
[<ffffffff80208ebc>] __handle_mm_fault+0x443/0x1348
[<ffffffff8020622a>] hypercall_page+0x22a/0x1000
[<ffffffff8026769a>] do_page_fault+0xf7b/0x12e0
[<ffffffff8026ef4b>] monotonic_clock+0x35/0x7b
[<ffffffff80262dd3>] thread_return+0x6c/0x113
[<ffffffff8026082b>] error_exit+0x0/0x6e
DMA per-cpu:
cpu 0 hot: high 186, batch 31 used:26
cpu 0 cold: high 62, batch 15 used:58
cpu 1 hot: high 186, batch 31 used:35
cpu 1 cold: high 62, batch 15 used:49
DMA32 per-cpu: empty
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages: 5760kB (0kB HighMem)
Active:46859 inactive:24599 dirty:0 writeback:0 unstable:0 free:1440 slab:28082 mapped-file:5 mapped-anon:76709 pagetables:390527
DMA free:5760kB min:5800kB low:7248kB high:8700kB active:187436kB inactive:98396kB present:2105344kB pages_scanned:328111276 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 16*4kB 68*8kB 2*16kB 4*32kB 2*64kB 2*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 5760kB
DMA32: empty
Normal: empty
HighMem: empty
45 pagecache pages
Swap cache: add 14188563, delete 14188552, find 585539/2267924, race 4109+888
Free swap = 0kB
Total swap = 4192956kB
重启后查看message错误日志:
Dec 9 12:09:33 SN7x kernel: INFO: task httpd:31276 blocked for more than 120 seconds.
Dec 9 12:09:33 SN7x kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 9 12:09:33 SN7x kernel: httpd D 011d5cb2df1c1910 0 31276 6472 31284 31269 (NOTLB)
Dec 9 12:09:33 SN7x kernel: ffff88005e587d28 0000000000000286 0000000050c40dea ffff88007f67d678
Dec 9 12:09:33 SN7x kernel: 0000000000000008 ffff8800614cf820 ffffffff804f4b80 0000000000003d62
Dec 9 12:09:33 SN7x kernel: ffff8800614cfa08 ffffffff00000001
Dec 9 12:09:33 SN7x kernel: Call Trace:
Dec 9 12:09:34 SN7x kernel: [<ffffffff802b653d>] delayacct_end+0x5d/0x86
Dec 9 12:09:34 SN7x kernel: [<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b
Dec 9 12:09:35 SN7x kernel: [<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14
Dec 9 12:09:35 SN7x kernel: [<ffffffff802cb8e0>] remove_exclusive_swap_page+0x10b/0x116
Dec 9 12:09:35 SN7x kernel: [<ffffffff80222073>] generic_file_aio_write+0x4e/0xc1
Dec 9 12:09:35 SN7x kernel: [<ffffffff8804c1b6>] :ext3:ext3_file_write+0x16/0x91
Dec 9 12:09:35 SN7x kernel: [<ffffffff80218705>] do_sync_write+0xc7/0x104
Dec 9 12:09:35 SN7x kernel: [<ffffffff802676cd>] do_page_fault+0xfae/0x12e0
Dec 9 12:09:35 SN7x kernel: [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
Dec 9 12:09:35 SN7x kernel: [<ffffffff802172b2>] vfs_write+0xce/0x174
Dec 9 12:09:35 SN7x kernel: [<ffffffff80217aff>] sys_write+0x45/0x6e
Dec 9 12:09:36 SN7x kernel: [<ffffffff802602f9>] tracesys+0xab/0xb6
查询结果:
httpd:31276 blocked for more than 120 seconds. httpd进程阻塞了120s,查了下:
This is a know bug. By default Linux uses up to 40% of the available memory for file system caching.
原理:linux会设置40%的可用内存用来做系统cache,当flush数据时这40%内存中的数据由于和IO同步问题导致超时(120s)简单讲就是设置在文件 /etc/sysctl.conf中加入 “vm.dirty_ratio=10″
目前处理方案:
更改maxclients,将其进程数的最大值设大点或
临时处理更改了下内核参数有待后续观察:
'echo 0 > /proc/sys/kernel/hung_task_timeout_secs.
在进程处理于可中断的睡眠状态S 时,进程要等待 如 套接字、信号量被挂起,进入到对应事件等待队列中的的这个时间太长,占用了队列空间使进程长时间处于阻塞状态,影响了进程队列的有序调度。