Tags
PostgreSQL, Linux, page allocation failure, memory
Background
The Linux kernel fails to allocate memory pages.
After a certain amount of memory is in use, the system hangs.
Errors similar to the following may appear in dmesg. The system hangs, cannot be connected to, and has to be restarted to recover.
page allocation failure
Oct 24 11:27:42 kernel: : [21289.479063] python2.6: page allocation failure. order:1, mode:0x20
kernel: swapper: page allocation failure. order:1, mode:0x20
kernel: Pid: 0, comm: swapper Not tainted 2.6.32-358.2.1.el6.x86_64 #1
kernel: Call Trace:
kernel: <IRQ> [<ffffffff8112c207>] ? __alloc_pages_nodemask+0x757/0x8d0
kernel: [<ffffffff81166ab2>] ? kmem_getpages+0x62/0x170
kernel: [<ffffffff811676ca>] ? fallback_alloc+0x1ba/0x270
kernel: [<ffffffff8116711f>] ? cache_grow+0x2cf/0x320
kernel: [<ffffffff81167449>] ? ____cache_alloc_node+0x99/0x160
kernel: [<ffffffff811683cb>] ? kmem_cache_alloc+0x11b/0x190
kernel: [<ffffffff81439d58>] ? sk_prot_alloc+0x48/0x1c0
kernel: [<ffffffff8143ae32>] ? sk_clone+0x22/0x2e0
kernel: [<ffffffff81489d66>] ? inet_csk_clone+0x16/0xd0
kernel: [<ffffffff814a2c73>] ? tcp_create_openreq_child+0x23/0x450
kernel: [<ffffffff814a046d>] ? tcp_v4_syn_recv_sock+0x4d/0x310
kernel: [<ffffffff814a2a16>] ? tcp_check_req+0x226/0x460
kernel: [<ffffffff8149ff0b>] ? tcp_v4_do_rcv+0x35b/0x430
kernel: [<ffffffff81082034>] ? mod_timer+0x144/0x220
kernel: [<ffffffff814a171e>] ? tcp_v4_rcv+0x4fe/0x8d0
kernel: [<ffffffff814a171e>] ? tcp_v4_rcv+0x4fe/0x8d0
kernel: [<ffffffff8147f50d>] ? ip_local_deliver_finish+0xdd/0x2d0
kernel: [<ffffffff8147f798>] ? ip_local_deliver+0x98/0xa0
kernel: [<ffffffff8147ec5d>] ? ip_rcv_finish+0x12d/0x440
kernel: [<ffffffff8147f1e5>] ? ip_rcv+0x275/0x350
kernel: [<ffffffff814483bb>] ? __netif_receive_skb+0x4ab/0x750
kernel: [<ffffffff8144a798>] ? netif_receive_skb+0x58/0x60
kernel: [<ffffffffa008b975>] ? vmxnet3_rq_rx_complete+0x365/0x890 [vmxnet3]
kernel: [<ffffffff8128d2b0>] ? swiotlb_map_page+0x0/0x100
kernel: [<ffffffffa008c0f3>] ? vmxnet3_poll_rx_only+0x43/0xc0 [vmxnet3]
kernel: [<ffffffff8144cf63>] ? net_rx_action+0x103/0x2f0
kernel: [<ffffffff81076fb1>] ? __do_softirq+0xc1/0x1e0
kernel: [<ffffffff810e1720>] ? handle_IRQ_event+0x60/0x170
kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
kernel: [<ffffffff81076d95>] ? irq_exit+0x85/0x90
kernel: [<ffffffff81516f15>] ? do_IRQ+0x75/0xf0
kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
kernel: <EOI> [<ffffffff8103b90b>] ? native_safe_halt+0xb/0x10
kernel: [<ffffffff8101495d>] ? default_idle+0x4d/0xb0
kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
kernel: [<ffffffff81506d9c>] ? start_secondary+0x2ac/0x2ef
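The fields that matter most in the failure message are order and mode. As a rough sketch (Python, not part of the original article), the helper below decodes them. It assumes 4 KiB pages and the low GFP flag bits as defined in the 2.6.32 kernel headers; newer kernels number these bits differently. The names parse_failure, PAGE_SIZE and GFP_LOW_BITS_2_6_32 are hypothetical and exist only for illustration.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86_64

# Assumption: low GFP bits as in the 2.6.32 headers; other kernels differ.
GFP_LOW_BITS_2_6_32 = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

FAILURE_RE = re.compile(
    r"(?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def parse_failure(line):
    """Extract process name, order and mode from one failure line.

    Returns None when the line is not a page allocation failure message.
    """
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    order = int(match.group("order"))
    mode = int(match.group("mode"), 16)
    # An order-N request asks for 2**N physically contiguous pages.
    pages = 1 << order
    flags = [name for bit, name in sorted(GFP_LOW_BITS_2_6_32.items())
             if mode & bit]
    return {
        "comm": match.group("comm"),
        "order": order,
        "pages": pages,
        "bytes": pages * PAGE_SIZE,
        "flags": flags,
    }


if __name__ == "__main__":
    sample = ("Oct 24 11:27:42 kernel: : [21289.479063] python2.6: "
              "page allocation failure. order:1, mode:0x20")
    print(parse_failure(sample))
```

For the logged line (order:1, mode:0x20) this reports a request for 2 contiguous pages (8192 bytes) with only __GFP_HIGH set, which is what GFP_ATOMIC expands to on a 2.6.32 kernel.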
Solution - Upgrading Kernel Version
1. Upgrade to kernel-2.6.32-358.el6 or higher. (This only alleviates the problem; it does not eliminate it.)
Update to kernel-2.6.32-358.el6 or higher, which contains the enhancement described in the Root cause section below. Please note that this update (or a newer one) does not completely eliminate the possibility of a page allocation failure. The workaround described below also works on 2.6.32-358.el6 and newer if the issue persists after the update.
Solution - Modifying Kernel Parameters
vi /etc/sysctl.conf or vi /etc/sysctl.d/xxx.conf
vm.zone_reclaim_mode = 1
vm.min_free_kbytes = 512000
sysctl -w vm.zone_reclaim_mode=1
sysctl -w vm.min_free_kbytes=512000
The following tunables can be used in an attempt to alleviate or prevent the reported condition:
Increase the vm.min_free_kbytes value, for example to a value higher than a single allocation request.
Change vm.zone_reclaim_mode to 1 if it is set to zero, so that the system can reclaim memory from cached memory.
Both settings can be set in /etc/sysctl.conf and loaded using sysctl -p /etc/sysctl.conf.
For more information on these tunables, install the kernel-doc package and refer to the file /usr/share/doc/kernel-doc-2.6.32/Documentation/sysctl/vm.txt.
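To confirm that the two settings actually took effect, the current values can be read back from /proc/sys/vm. The following is a minimal Python sketch, not part of the original article. The function names read_vm_tunable and check_tunables are hypothetical, and the proc_root parameter exists only so that the code can be pointed at a test directory.

```python
from pathlib import Path


def read_vm_tunable(name, proc_root="/proc/sys/vm"):
    """Read one integer-valued vm tunable, e.g. 'min_free_kbytes'."""
    return int(Path(proc_root, name).read_text().strip())


def check_tunables(wanted, proc_root="/proc/sys/vm"):
    """Compare current values against the wanted ones.

    Returns {name: (current, wanted)} for every tunable that differs.
    """
    mismatches = {}
    for name, target in wanted.items():
        current = read_vm_tunable(name, proc_root)
        if current != target:
            mismatches[name] = (current, target)
    return mismatches


if __name__ == "__main__":
    # Values taken from the sysctl settings shown above.
    wanted = {"zone_reclaim_mode": 1, "min_free_kbytes": 512000}
    for name, (current, target) in check_tunables(wanted).items():
        print(f"vm.{name} is {current}, expected {target}")
```

An empty result means both tunables already hold the configured values.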
Root cause
Before RHEL 6.4, kswapd did not try to free contiguous pages.
This can cause GFP_ATOMIC allocation requests to fail repeatedly
when nothing else in the system defragments memory.
With RHEL 6.4 and newer, kswapd compacts (defragments) free memory when required.
Please note that allocation failures can still happen,
for example when a large burst of GFP_ATOMIC allocations occurs that kswapd struggles to keep up with.
However, these allocations should eventually succeed.
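Whether free memory is fragmented can be checked from /proc/buddyinfo, which lists the number of free blocks per order for each zone (the first count is order 0, the next order 1, and so on). The sketch below is in Python and is not part of the original article. The helper names parse_buddyinfo and free_blocks_at_or_above are hypothetical, and the sample line in the usage part is made up for illustration, not taken from the affected system.

```python
def parse_buddyinfo(text):
    """Parse /proc/buddyinfo content into {(node, zone): [count per order]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone Normal c0 c1 c2 ..."
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(count) for count in parts[4:]]
    return zones


def free_blocks_at_or_above(counts, order):
    """Number of free blocks that can satisfy a request of the given order."""
    return sum(counts[order:])


if __name__ == "__main__":
    # Illustrative, made-up sample: plenty of single pages, nothing larger.
    sample = "Node 0, zone   Normal    500      0      0      0      0\n"
    for key, counts in parse_buddyinfo(sample).items():
        print(key, "blocks usable for order 1:",
              free_blocks_at_or_above(counts, 1))
```

A zone with many order-0 blocks but none at order 1 or above has free memory yet cannot serve an order:1 request like the one in the log without reclaim or compaction.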
There are also other more specific cases that can result in page allocation failures and cause additional issues.
Please refer to the following articles for more information
Zone_reclaim_mode Interpretation
Zone_reclaim_mode allows someone to set more or less aggressive approaches to reclaim memory when a zone runs out of memory. If it is set to zero then no zone reclaim occurs. Allocations will be satisfied from other zones / nodes in the system.
The value is ORed together from:
1 = Zone reclaim on
2 = Zone reclaim writes dirty pages out
4 = Zone reclaim swaps pages
zone_reclaim_mode is set during bootup to 1 if it is determined that pages from remote zones will cause a measurable performance reduction. The page allocator will then reclaim easily reusable pages (those page cache pages that are currently not used) before allocating off node pages.
0: It may be beneficial to switch off zone reclaim if the system is used for a file server and all of memory should be used for caching files from disk. In that case the caching effect is more important than data locality.
2: Allowing zone reclaim to write out pages stops processes that are writing large amounts of data from dirtying pages on other nodes. Zone reclaim will write out dirty pages if a zone fills up and so effectively throttle the process. This may decrease the performance of a single process, since it cannot use all of system memory to buffer the outgoing writes anymore, but it preserves the memory on other nodes so that the performance of other processes running on other nodes will not be affected.
4: Allowing regular swap effectively restricts allocations to the local node unless explicitly overridden by memory policies or cpuset configurations.
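Because the value is a bitmask, a setting such as 3 or 5 combines several behaviours. A small Python sketch (not part of the original article; the name decode_zone_reclaim_mode is hypothetical) that spells out which bits a given value enables:

```python
# Bit meanings as listed in the kernel documentation quoted above.
ZONE_RECLAIM_BITS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def decode_zone_reclaim_mode(value):
    """Return the list of behaviours enabled by a zone_reclaim_mode value."""
    if not 0 <= value <= 7:
        raise ValueError("zone_reclaim_mode must be between 0 and 7")
    return [desc for bit, desc in sorted(ZONE_RECLAIM_BITS.items())
            if value & bit]


if __name__ == "__main__":
    for value in (0, 1, 3, 5):
        print(value, decode_zone_reclaim_mode(value) or ["no zone reclaim"])
```

With the value 1 recommended earlier, only plain zone reclaim is enabled; dirty-page writeout and swapping during zone reclaim stay off.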
References
http://www.zbuse.com/2014/07/837.html
https://serverfault.com/questions/236170/page-allocation-failure-am-i-running-out-of-memory
https://access.redhat.com/solutions/90883
Linux page allocation failure problem handling - lowmem_reserve_ratio