Date: 2016.02.15
On my first workday of the year, I was surprised to find that the logs on two of our servers contained errors pointing at a kernel bug.
Feb 14 15:24:15 svn3 kernel: <EOI>
Feb 14 15:24:15 svn3 kernel: swapper: page allocation failure. order:1, mode:0x20
Feb 14 15:24:15 svn3 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-431.el6.x86_64 #1
Feb 14 15:24:15 svn3 kernel: Call Trace:
Feb 14 15:24:15 svn3 kernel: <IRQ> [<ffffffff8112f9e7>] ? __alloc_pages_nodemask+0x757/0x8d0
Feb 14 15:24:15 svn3 kernel: [<ffffffff8116e482>] ? kmem_getpages+0x62/0x170
Feb 14 15:24:15 svn3 kernel: [<ffffffff8116f09a>] ? fallback_alloc+0x1ba/0x270
Feb 14 15:24:15 svn3 kernel: [<ffffffff8116eaef>] ? cache_grow+0x2cf/0x320
Feb 14 15:24:15 svn3 kernel: [<ffffffff8116ee19>] ? ____cache_alloc_node+0x99/0x160
Feb 14 15:24:15 svn3 kernel: [<ffffffff8116fd9b>] ? kmem_cache_alloc+0x11b/0x190
Feb 14 15:24:15 svn3 kernel: [<ffffffff8144c1b8>] ? sk_prot_alloc+0x48/0x1c0
Feb 14 15:24:15 svn3 kernel: [<ffffffff8144d3c2>] ? sk_clone+0x22/0x2e0
Feb 14 15:24:15 svn3 kernel: [<ffffffff8149ebf6>] ? inet_csk_clone+0x16/0xd0
Feb 14 15:24:15 svn3 kernel: [<ffffffff814b84c3>] ? tcp_create_openreq_child+0x23/0x470
Feb 14 15:24:15 svn3 kernel: [<ffffffff814b5c7d>] ? tcp_v4_syn_recv_sock+0x4d/0x310
Feb 14 15:24:15 svn3 kernel: [<ffffffff814b8266>] ? tcp_check_req+0x226/0x460
Feb 14 15:24:15 svn3 kernel: [<ffffffff814b56bb>] ? tcp_v4_do_rcv+0x35b/0x490
Feb 14 15:24:15 svn3 kernel: [<ffffffffa0261557>] ? ipv4_confirm+0x87/0x1d0 [nf_conntrack_ipv4]
Feb 14 15:24:15 svn3 kernel: [<ffffffff814b6f4a>] ? tcp_v4_rcv+0x51a/0x900
Feb 14 15:24:15 svn3 kernel: [<ffffffff814941a0>] ? ip_local_deliver_finish+0x0/0x2d0
Feb 14 15:24:15 svn3 kernel: [<ffffffff8149427d>] ? ip_local_deliver_finish+0xdd/0x2d0
Feb 14 15:24:15 svn3 kernel: [<ffffffff81494508>] ? ip_local_deliver+0x98/0xa0
Feb 14 15:24:15 svn3 kernel: [<ffffffff814939cd>] ? ip_rcv_finish+0x12d/0x440
Feb 14 15:24:15 svn3 kernel: [<ffffffff81493f55>] ? ip_rcv+0x275/0x350
Feb 14 15:24:15 svn3 kernel: [<ffffffff8145b54b>] ? __netif_receive_skb+0x4ab/0x750
Feb 14 15:24:15 svn3 kernel: [<ffffffff810153a3>] ? native_sched_clock+0x13/0x80
Feb 14 15:24:15 svn3 kernel: [<ffffffff8145f1b8>] ? netif_receive_skb+0x58/0x60
Feb 14 15:24:15 svn3 kernel: [<ffffffff812686f9>] ? blk_peek_request+0x189/0x210
Feb 14 15:24:15 svn3 kernel: [<ffffffffa010079c>] ? xennet_poll+0x91c/0xe50 [xen_netfront]
Feb 14 15:24:15 svn3 kernel: [<ffffffffa004953c>] ? do_blkif_request+0x4c/0x560 [xen_blkfront]
Feb 14 15:24:15 svn3 kernel: [<ffffffff81267570>] ? freed_request+0x50/0x80
Feb 14 15:24:15 svn3 kernel: [<ffffffff81460b43>] ? net_rx_action+0x103/0x2f0
Feb 14 15:24:15 svn3 kernel: [<ffffffff810e6ef2>] ? handle_IRQ_event+0x92/0x170
Feb 14 15:24:15 svn3 kernel: [<ffffffff8107a8e1>] ? __do_softirq+0xc1/0x1e0
Feb 14 15:24:15 svn3 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Feb 14 15:24:15 svn3 kernel: [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
Feb 14 15:24:15 svn3 kernel: [<ffffffff8107a795>] ? irq_exit+0x85/0x90
Feb 14 15:24:15 svn3 kernel: [<ffffffff81325b95>] ? xen_evtchn_do_upcall+0x35/0x50
Feb 14 15:24:15 svn3 kernel: [<ffffffff8100c433>] ? xen_hvm_callback_vector+0x13/0x20
Feb 14 15:24:15 svn3 kernel: <EOI> [<ffffffff8103eacb>] ? native_safe_halt+0xb/0x10
Feb 14 15:24:15 svn3 kernel: [<ffffffff810167bd>] ? default_idle+0x4d/0xb0
Feb 14 15:24:15 svn3 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Feb 14 15:24:15 svn3 kernel: [<ffffffff8150cbea>] ? rest_init+0x7a/0x80
Feb 14 15:24:15 svn3 kernel: [<ffffffff81c26f8f>] ? start_kernel+0x424/0x430
Feb 14 15:24:15 svn3 kernel: [<ffffffff81c2633a>] ? x86_64_start_reservations+0x125/0x129
Feb 14 15:24:15 svn3 kernel: [<ffffffff81c26453>] ? x86_64_start_kernel+0x115/0x124
Feb 14 15:25:54 svn3 kernel: swapper: page allocation failure. order:1, mode:0x20
Feb 14 15:25:54 svn3 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-431.el6.x86_64 #1
Feb 14 15:25:54 svn3 kernel: Call Trace:
Feb 14 15:25:54 svn3 kernel: <IRQ> [<ffffffff8112f9e7>] ? __alloc_pages_nodemask+0x757/0x8d0
Feb 14 15:25:54 svn3 kernel: [<ffffffff8116e482>] ? kmem_getpages+0x62/0x170
Feb 14 15:25:54 svn3 kernel: [<ffffffff8116f09a>] ? fallback_alloc+0x1ba/0x270
Feb 14 15:25:54 svn3 kernel: [<ffffffff8116eaef>] ? cache_grow+0x2cf/0x320
Feb 14 15:25:54 svn3 kernel: [<ffffffff8116ee19>] ? ____cache_alloc_node+0x99/0x160
Feb 14 15:25:54 svn3 kernel: [<ffffffff8116fd9b>] ? kmem_cache_alloc+0x11b/0x190
Feb 14 15:25:54 svn3 kernel: [<ffffffff8144c1b8>] ? sk_prot_alloc+0x48/0x1c0
Feb 14 15:25:54 svn3 kernel: [<ffffffff8144d3c2>] ? sk_clone+0x22/0x2e0
Feb 14 15:25:54 svn3 kernel: [<ffffffff8149ebf6>] ? inet_csk_clone+0x16/0xd0
Feb 14 15:25:54 svn3 kernel: [<ffffffff814b84c3>] ? tcp_create_openreq_child+0x23/0x470
Feb 14 15:25:54 svn3 kernel: [<ffffffff814b5c7d>] ? tcp_v4_syn_recv_sock+0x4d/0x310
Feb 14 15:25:54 svn3 kernel: [<ffffffff814b8266>] ? tcp_check_req+0x226/0x460
Feb 14 15:25:54 svn3 kernel: [<ffffffff814b56bb>] ? tcp_v4_do_rcv+0x35b/0x490
Feb 14 15:25:54 svn3 kernel: [<ffffffffa0261557>] ? ipv4_confirm+0x87/0x1d0 [nf_conntrack_ipv4]
Feb 14 15:25:54 svn3 kernel: [<ffffffff814b6f4a>] ? tcp_v4_rcv+0x51a/0x900
Feb 14 15:25:54 svn3 kernel: [<ffffffff814941a0>] ? ip_local_deliver_finish+0x0/0x2d0
Feb 14 15:25:54 svn3 kernel: [<ffffffff8149427d>] ? ip_local_deliver_finish+0xdd/0x2d0
Feb 14 15:25:54 svn3 kernel: [<ffffffff81494508>] ? ip_local_deliver+0x98/0xa0
Feb 14 15:25:54 svn3 kernel: [<ffffffff814939cd>] ? ip_rcv_finish+0x12d/0x440
Feb 14 15:25:54 svn3 kernel: [<ffffffff81493f55>] ? ip_rcv+0x275/0x350
Feb 14 15:25:54 svn3 kernel: [<ffffffff8145b54b>] ? __netif_receive_skb+0x4ab/0x750
Feb 14 15:25:54 svn3 kernel: [<ffffffff8145f1b8>] ? netif_receive_skb+0x58/0x60
Feb 14 15:25:54 svn3 kernel: [<ffffffffa010079c>] ? xennet_poll+0x91c/0xe50 [xen_netfront]
Feb 14 15:25:54 svn3 kernel: [<ffffffffa004953c>] ? do_blkif_request+0x4c/0x560 [xen_blkfront]
Feb 14 15:25:54 svn3 kernel: [<ffffffff810ec18e>] ? rcu_start_gp+0x1be/0x230
Feb 14 15:25:54 svn3 kernel: [<ffffffff8126826d>] ? __blk_put_request+0x7d/0xd0
Feb 14 15:25:54 svn3 kernel: [<ffffffff81460b43>] ? net_rx_action+0x103/0x2f0
Feb 14 15:25:54 svn3 kernel: [<ffffffff810e6ef2>] ? handle_IRQ_event+0x92/0x170
Feb 14 15:25:54 svn3 kernel: [<ffffffff8107a8e1>] ? __do_softirq+0xc1/0x1e0
Feb 14 15:25:54 svn3 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Feb 14 15:25:54 svn3 kernel: [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
Feb 14 15:25:54 svn3 kernel: [<ffffffff8107a795>] ? irq_exit+0x85/0x90
Feb 14 15:25:54 svn3 kernel: [<ffffffff81325b95>] ? xen_evtchn_do_upcall+0x35/0x50
Feb 14 15:25:54 svn3 kernel: [<ffffffff8100c433>] ? xen_hvm_callback_vector+0x13/0x20
Feb 14 15:25:54 svn3 kernel: <EOI> [<ffffffff8103eacb>] ? native_safe_halt+0xb/0x10
Feb 14 15:25:54 svn3 kernel: [<ffffffff810167bd>] ? default_idle+0x4d/0xb0
Feb 14 15:25:54 svn3 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Feb 14 15:25:54 svn3 kernel: [<ffffffff8150cbea>] ? rest_init+0x7a/0x80
Feb 14 15:25:54 svn3 kernel: [<ffffffff81c26f8f>] ? start_kernel+0x424/0x430
Feb 14 15:25:54 svn3 kernel: [<ffffffff81c2633a>] ? x86_64_start_reservations+0x125/0x129
Feb 14 15:25:54 svn3 kernel: [<ffffffff81c26453>] ? x86_64_start_kernel+0x115/0x124
Searching on Baidu turned up the following explanation:
After this message shows up in dmesg, the machine's load starts to climb, even though free shows a large amount of memory still held as buffers/cache.
Since reclaimable memory is plentiful, this failure should not happen. My system is CentOS 6; after a round of searching on the CentOS forums, someone replied that it is a kernel bug.
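Before blaming the kernel, it is worth confirming that memory really is sitting in buffers/cache rather than being genuinely exhausted. A sketch of the diagnostic commands (the "-/+ buffers/cache" row is the procps output layout on CentOS 6):

```shell
# Count how many allocation failures have been logged so far
dmesg | grep -c "page allocation failure"

# free shows how much memory is actually reclaimable: on CentOS 6 the
# "-/+ buffers/cache" row is the figure that matters, not the top row
free -m
```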
A temporary workaround is:
sysctl -w vm.zone_reclaim_mode=1
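sysctl -w only changes the running kernel; to keep the workaround across reboots it can also be written to /etc/sysctl.conf. A sketch (requires root; /etc/sysctl.conf is the standard location on CentOS 6):

```shell
# Apply immediately to the running kernel
sysctl -w vm.zone_reclaim_mode=1

# Persist across reboots: append to /etc/sysctl.conf and reload
echo "vm.zone_reclaim_mode = 1" >> /etc/sysctl.conf
sysctl -p
```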
Digging deeper:
The kernel documentation (Documentation/sysctl/vm.txt) describes zone_reclaim_mode as follows:
Zone_reclaim_mode allows someone to set more or less aggressive approaches to
reclaim memory when a zone runs out of memory. If it is set to zero then no
zone reclaim occurs. Allocations will be satisfied from other zones / nodes
in the system.
This is value ORed together of
1 = Zone reclaim on
2 = Zone reclaim writes dirty pages out
4 = Zone reclaim swaps pages
zone_reclaim_mode is set during bootup to 1 if it is determined that pages
from remote zones will cause a measurable performance reduction. The
page allocator will then reclaim easily reusable pages (those page
cache pages that are currently not used) before allocating off node pages.
It may be beneficial to switch off zone reclaim if the system is
used for a file server and all of memory should be used for caching files
from disk. In that case the caching effect is more important than
data locality.
Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes. Zone
reclaim will write out dirty pages if a zone fills up and so effectively
throttle the process. This may decrease the performance of a single process
since it cannot use all of system memory to buffer the outgoing writes
anymore but it preserve the memory on other nodes so that the performance
of other processes running on other nodes will not be affected.
Allowing regular swap effectively restricts allocations to the local
node unless explicitly overridden by memory policies or cpuset
configurations
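As the documentation says, the three flags combine by bitwise OR; for example, enabling zone reclaim together with dirty-page writeback means writing the value 3. A small sketch using the flag values quoted above:

```shell
# Flag values from the kernel documentation quoted above
ZONE_RECLAIM_ON=1      # zone reclaim on
ZONE_RECLAIM_WRITE=2   # zone reclaim writes dirty pages out
ZONE_RECLAIM_SWAP=4    # zone reclaim swaps pages

# 1 | 2 = 3: reclaim locally, and allow reclaim to write out dirty pages
MODE=$(( ZONE_RECLAIM_ON | ZONE_RECLAIM_WRITE ))
echo "$MODE"   # prints 3

# Applying it would look like (requires root, so commented out here):
# sysctl -w vm.zone_reclaim_mode=$MODE
```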
In short, this parameter tells the kernel to reclaim buffer/cache pages directly when a zone runs short of memory, rather than failing the allocation.
Original source: http://www.zbuse.com/2014/07/837.html