Adjusting vm.min_free_kbytes caused a GI failure: kernel: oracle: page allocation failure

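There is an 11.2.0.4 RAC test environment where the customer reported that memory was occasionally exhausted during the early-morning RMAN full backup, causing the database to restart. The environment is not covered by the maintenance contract, but they asked us to help. I guessed that vm.min_free_kbytes had not been configured. I had adjusted this parameter many times before and it had always gone through cleanly, so I did not think much about it and made the change during the day.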

The machine has a little over 370 GB of memory, and the instance's SGA + PGA = 260 GB. I planned to reserve 50 GB.

I added the following setting and ran sysctl -p to apply it:

vm.min_free_kbytes = 52428800
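
As a sanity check on the numbers above, here is a minimal Python sketch. It converts the configured value back to GiB and compares a proposed vm.min_free_kbytes against the MemFree field of /proc/meminfo-style text. The 370 and 260 figures are taken from this post and treated as GiB, which is an assumption. The helper names and the sample meminfo text are hypothetical and were not captured from db1.

```python
KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib):
    """Convert a size in kB (as used by vm.min_free_kbytes) to GiB."""
    return kib / KIB_PER_GIB


def parse_meminfo_kib(text, field):
    """Return the value in kB of one field from /proc/meminfo-style text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)


def headroom_kib(meminfo_text, proposed_min_free_kib):
    """MemFree minus the proposed reserve; negative means free memory
    is already below the value about to be configured."""
    return parse_meminfo_kib(meminfo_text, "MemFree") - proposed_min_free_kib


PROPOSED_MIN_FREE_KIB = 52428800   # value from the post
TOTAL_GIB = 370                    # approximate, from the post
SGA_PGA_GIB = 260                  # from the post

reserve_gib = kib_to_gib(PROPOSED_MIN_FREE_KIB)
remaining_gib = TOTAL_GIB - SGA_PGA_GIB - reserve_gib

# Illustrative values only, NOT taken from the affected host.
SAMPLE_MEMINFO = """MemTotal:       388000000 kB
MemFree:          8000000 kB
"""

if __name__ == "__main__":
    print(f"reserve: {reserve_gib:.1f} GiB")
    print(f"left after SGA+PGA and reserve: {remaining_gib:.1f} GiB")
    print(f"headroom: {headroom_kib(SAMPLE_MEMINFO, PROPOSED_MIN_FREE_KIB)} kB")
```

On a real host the text would come from reading /proc/meminfo. A negative headroom roughly means the kernel would be below its new minimum watermark the moment the setting takes effect, which is a reason to raise the value in smaller steps or during a quiet period.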

A few minutes later db1 was no longer healthy: the load from oraagent.bin went up, and the query command crsctl status res -t could not be run on db1.


Checking the cluster log:

2023-06-16 15:14:03.998:
[ohasd(9796)]CRS-2878:Failed to restart resource 'ora.gpnpd'
2023-06-16 15:14:04.056:
[ohasd(9796)]CRS-2878:Failed to restart resource 'ora.mdnsd'
2023-06-16 15:14:07.504:
[gpnpd(15816)]CRS-2328:GPNPD started on node db1.
2023-06-16 15:14:10.523:
[gpnpd(15816)]CRS-2338:Clusterwide GPnP profile updates may be impaired.
2023-06-16 15:14:18.528:
[gpnpd(15816)]CRS-2301:GPnP: rdp_Work: work function for "Oracle Apple DNS-SD Provider" failed with RDE-00023.
 
2023-06-16 15:14:26.529:
[gpnpd(15816)]CRS-2301:GPnP: rdp_Work: work function for "Oracle Apple DNS-SD Provider" failed with RDE-00023.
 
2023-06-16 15:14:34.530:
[gpnpd(15816)]CRS-2301:GPnP: rdp_Work: work function for "Oracle Apple DNS-SD Provider" failed with RDE-00023.
 
2023-06-16 15:14:42.531:
[gpnpd(15816)]CRS-2301:GPnP: rdp_Work: work function for "Oracle Apple DNS-SD Provider" failed with RDE-00023.
 
2023-06-16 15:14:50.532:
[gpnpd(15816)]CRS-2301:GPnP: rdp_Work: work function for "Oracle Apple DNS-SD Provider" failed with RDE-00023.
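
When an alert log repeats like this, a quick tally per process and message code makes the pattern easier to see. The following is a small Python sketch, assuming the two-line layout shown above (a timestamp line followed by a `[process(pid)]CRS-nnnn:` line). The function name and regular expression are my own, and the sample is a shortened copy of the entries above.

```python
import re
from collections import Counter

# Matches message lines such as "[gpnpd(15816)]CRS-2301:GPnP: ..."
ENTRY = re.compile(r"^\[(?P<proc>\w+)\((?P<pid>\d+)\)\](?P<code>CRS-\d+):")


def count_codes(log_text):
    """Count occurrences of each (process, CRS code) pair in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            counts[(match.group("proc"), match.group("code"))] += 1
    return counts


SAMPLE = """2023-06-16 15:14:03.998:
[ohasd(9796)]CRS-2878:Failed to restart resource 'ora.gpnpd'
2023-06-16 15:14:07.504:
[gpnpd(15816)]CRS-2328:GPNPD started on node db1.
2023-06-16 15:14:18.528:
[gpnpd(15816)]CRS-2301:GPnP: rdp_Work: work function for "Oracle Apple DNS-SD Provider" failed with RDE-00023.
2023-06-16 15:14:26.529:
[gpnpd(15816)]CRS-2301:GPnP: rdp_Work: work function for "Oracle Apple DNS-SD Provider" failed with RDE-00023.
"""

if __name__ == "__main__":
    for (proc, code), n in count_codes(SAMPLE).most_common():
        print(proc, code, n)
```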

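I waited more than 40 minutes and the same errors kept coming, and I could not find a similar case on MOS. The instance was still running normally; only the cluster stack was broken. I requested a maintenance window from the customer. The instance shut down cleanly, but GI could not be stopped normally, so I rebooted the host directly. After the reboot the services were back to normal.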

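Fortunately this was only a test environment. It is a lesson learned, and I will be more careful with this kind of change in future. The system messages log confirms that memory allocation failures appeared right after vm.min_free_kbytes was adjusted: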

Jun 16 15:08:38 db1 kernel: oracle: page allocation failure: order:0, mode:0x20
Jun 16 15:08:38 db1 kernel: Pid: 16474, comm: oracle Tainted: GF          O 3.8.13-16.2.1.el6uek.x86_64 #1
Jun 16 15:08:38 db1 kernel: Call Trace:
Jun 16 15:08:38 db1 kernel:   [] warn_alloc_failed+0xf3/0x160
Jun 16 15:08:38 db1 kernel: [] ? default_spin_lock_flags+0x9/0x10
Jun 16 15:08:38 db1 kernel: [] __alloc_pages_slowpath+0x4a6/0x7b0
Jun 16 15:08:38 db1 kernel: [] ? zone_watermark_ok+0x1f/0x30
Jun 16 15:08:38 db1 kernel: [] __alloc_pages_nodemask+0x2fb/0x320
Jun 16 15:08:38 db1 kernel: [] alloc_pages_current+0xe3/0x1c0
Jun 16 15:08:38 db1 kernel: [] __netdev_alloc_frag+0x99/0x150
Jun 16 15:08:38 db1 kernel: [] __netdev_alloc_skb+0x9a/0xe0
Jun 16 15:08:38 db1 kernel: [] igb_fetch_rx_buffer+0x7a/0x1e0 [igb]
Jun 16 15:08:38 db1 kernel: [] igb_clean_rx_irq+0xa5/0x420 [igb]
Jun 16 15:08:38 db1 kernel: [] igb_poll+0x65/0xb0 [igb]
Jun 16 15:08:38 db1 kernel: [] net_rx_action+0x105/0x2b0
Jun 16 15:08:38 db1 kernel: [] __do_softirq+0xd7/0x240
Jun 16 15:08:38 db1 kernel: [] ? _raw_spin_lock+0xe/0x20
Jun 16 15:08:38 db1 kernel: [] call_softirq+0x1c/0x30
Jun 16 15:08:38 db1 kernel: [] do_softirq+0x65/0xa0
Jun 16 15:08:38 db1 kernel: [] irq_exit+0xbd/0xe0
Jun 16 15:08:38 db1 kernel: [] do_IRQ+0x66/0xe0
Jun 16 15:08:38 db1 kernel: [] ? sched_open+0x20/0x20
Jun 16 15:08:38 db1 kernel: [] common_interrupt+0x6d/0x6d
Jun 16 15:08:38 db1 kernel:   [] ? seq_open+0x4f/0xb0
Jun 16 15:08:38 db1 kernel: [] ? do_dentry_open+0x259/0x2d0
Jun 16 15:08:38 db1 kernel: [] ? do_dentry_open+0x23e/0x2d0
Jun 16 15:08:38 db1 kernel: [] finish_open+0x35/0x50
Jun 16 15:08:38 db1 kernel: [] do_last+0x436/0x7b0
Jun 16 15:08:38 db1 kernel: [] ? inode_permission+0x18/0x50
Jun 16 15:08:38 db1 kernel: [] ? link_path_walk+0x24d/0x420
Jun 16 15:08:38 db1 kernel: [] path_openat+0xb3/0x480
Jun 16 15:08:38 db1 kernel: [] do_filp_open+0x49/0xa0
Jun 16 15:08:38 db1 kernel: [] ? _raw_spin_lock+0xe/0x20
Jun 16 15:08:38 db1 kernel: [] ? __alloc_fd+0xb5/0x160
Jun 16 15:08:38 db1 kernel: [] do_sys_open+0x108/0x1f0
Jun 16 15:08:38 db1 kernel: [] sys_open+0x21/0x30
Jun 16 15:08:38 db1 kernel: [] system_call_fastpath+0x16/0x1b
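
The first line of the trace above carries the key facts, and a short Python sketch can pull them out. `order` is the power-of-two number of contiguous pages requested, so order:0 is a single page, which is 4096 bytes with the usual x86_64 page size (passed in as a parameter here). `mode` is the raw allocation flag mask. As far as I know, on 3.x kernels like the 3.8.13 UEK shown here, bit 0x20 is `__GFP_HIGH`, which is what `GFP_ATOMIC` expands to; treat that reading as an assumption and check the headers of the exact kernel. It would fit the call trace, where the allocation comes from the igb receive path in softirq context and cannot wait for memory to be reclaimed. The function name and regular expression are my own.

```python
import re

LINE = ("Jun 16 15:08:38 db1 kernel: oracle: "
        "page allocation failure: order:0, mode:0x20")

PATTERN = re.compile(
    r"kernel: (?P<comm>\S+): page allocation failure: "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def decode_alloc_failure(line, page_size=4096):
    """Extract the process name, order, size and flag mask from one log line.

    Returns None if the line is not a page allocation failure message.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    order = int(match.group("order"))
    pages = 1 << order  # 2**order contiguous pages were requested
    return {
        "comm": match.group("comm"),
        "order": order,
        "pages": pages,
        "bytes": pages * page_size,
        "mode": int(match.group("mode"), 16),
    }


if __name__ == "__main__":
    print(decode_alloc_failure(LINE))
```

A single-page request failing on a machine with hundreds of GB of RAM is a strong hint that the limit being hit is a watermark rather than real exhaustion or fragmentation.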
