Database outages caused by abrt-server

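A newly built 19c environment kept crashing and restarting at irregular intervals after migration. After logging in to check, the messages log repeatedly showed errors related to abrt-hook-ccpp.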

Jul 20 00:18:32 db1 abrt-hook-ccpp: Process 59611 (ocssd.bin) of user 1000 killed by SIGABRT - dumping core
Jul 20 00:18:32 db1 abrt-server: Executable '/u01/app/19.0.0/grid/bin/ocssd.bin' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Jul 20 00:18:32 db1 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2023-07-20-00:18:32-59611' exited with 1
Jul 20 00:18:32 db1 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2023-07-20-00:18:32-59611'
Jul 20 00:18:33 db1 journal: Oracle Clusterware: 2023-07-20 00:18:33.435#012[(78481)]CRS-8500:Oracle Clusterware OCTSSD process is starting with operating system process ID 78481
Jul 20 01:10:21 db1 abrt-hook-ccpp: Process 66350 (ocssd.bin) of user 1000 killed by SIGABRT - dumping core
Jul 20 01:10:21 db1 abrt-server: Executable '/u01/app/19.0.0/grid/bin/ocssd.bin' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Jul 20 01:10:21 db1 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2023-07-20-01:10:21-66350' exited with 1
Jul 20 01:10:21 db1 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2023-07-20-01:10:21-66350'
Jul 20 01:16:11 db1 avahi-daemon[4529]: Withdrawing address record for 172.16.66.206 on bond0.
Jul 20 01:16:17 db1 abrt-hook-ccpp: Process 90643 (ocssd.bin) of user 1000 killed by SIGABRT - dumping core
Jul 20 01:16:17 db1 abrt-server: Executable '/u01/app/19.0.0/grid/bin/ocssd.bin' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Jul 20 01:16:17 db1 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2023-07-20-01:16:17-90643' exited with 1
Jul 20 01:16:17 db1 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2023-07-20-01:16:17-90643'
Jul 20 01:16:18 db1 journal: Oracle Clusterware: 2023-07-20 01:16:18.725#012[(106875)]CRS-8500:Oracle Clusterware OCTSSD process is starting with operating system process ID 106875
Jul 20 01:16:41 db1 systemd: Removed slice User Slice of oracle.
Jul 20 01:26:15 db1 avahi-daemon[4529]: Withdrawing address record for 172.16.66.206 on bond0.
Jul 20 01:26:21 db1 kernel: F 4853131.713/230719172621 ora_rms0_zhfwdb[111488] oracleafd:14:1016:Failed to find device handle: [2]
Jul 20 01:26:23 db1 abrt-hook-ccpp: Process 106938 (ocssd.bin) of user 1000 killed by SIGABRT - dumping core
Jul 20 01:26:23 db1 abrt-server: Executable '/u01/app/19.0.0/grid/bin/ocssd.bin' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Jul 20 01:26:23 db1 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2023-07-20-01:26:23-106938' exited with 1
Jul 20 01:26:23 db1 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2023-07-20-01:26:23-106938'
Jul 22 14:07:01 db1 abrt-hook-ccpp[22622]: Process 22621 (oracle) of user 1001 killed by SIGABRT - dumping core
Jul 22 14:07:01 db1 abrt-server[22625]: Executable '/u01/app/oracle/product/19.0.0/db_1/bin/oracle' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Jul 22 14:07:01 db1 abrt-server[22625]: 'post-create' on '/var/spool/abrt/ccpp-2023-07-22-14:07:01-22621' exited with 1
Jul 22 14:07:01 db1 abrt-server[22625]: Deleting problem directory '/var/spool/abrt/ccpp-2023-07-22-14:07:01-22621'
Jul 22 18:07:47 db1 abrt-hook-ccpp[143932]: Process 143931 (oracle) of user 1001 killed by SIGABRT - dumping core
Jul 22 18:07:47 db1 abrt-server[143937]: Executable '/u01/app/oracle/product/19.0.0/db_1/bin/oracle' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Jul 22 18:07:47 db1 abrt-server[143937]: 'post-create' on '/var/spool/abrt/ccpp-2023-07-22-18:07:47-143931' exited with 1
Jul 22 18:07:47 db1 abrt-server[143937]: Deleting problem directory '/var/spool/abrt/ccpp-2023-07-22-18:07:47-143931'
Jul 23 10:00:55 db1 abrt-hook-ccpp[57997]: Process 57996 (oracle) of user 1001 killed by SIGABRT - dumping core
Jul 23 10:00:55 db1 abrt-server[58000]: Executable '/u01/app/oracle/product/19.0.0/db_1/bin/oracle' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Jul 23 10:00:55 db1 abrt-server[58000]: 'post-create' on '/var/spool/abrt/ccpp-2023-07-23-10:00:55-57996' exited with 1
Jul 23 10:00:55 db1 abrt-server[58000]: Deleting problem directory '/var/spool/abrt/ccpp-2023-07-23-10:00:55-57996'
Jul 23 18:02:23 db1 abrt-hook-ccpp[8282]: Process 8281 (oracle) of user 1001 killed by SIGABRT - dumping core
Jul 23 18:02:23 db1 abrt-server[8283]: Executable '/u01/app/oracle/product/19.0.0/db_1/bin/oracle' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Jul 23 18:02:23 db1 abrt-server[8283]: 'post-create' on '/var/spool/abrt/ccpp-2023-07-23-18:02:23-8281' exited with 1
Jul 23 18:02:23 db1 abrt-server[8283]: Deleting problem directory '/var/spool/abrt/ccpp-2023-07-23-18:02:23-8281'
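
To see at a glance which processes were aborted and how often, the abrt-hook-ccpp lines can be grouped by process name. The following is only a sketch: the function name summarize_abrt_crashes is made up for illustration, and the log path in the usage comment is an assumption that may differ on your system.

```shell
# Summarize abrt-hook-ccpp crash lines read from stdin.
# Prints one line per (process name, uid, signal) with an occurrence count.
# summarize_abrt_crashes is a hypothetical helper name, not part of abrt.
summarize_abrt_crashes() {
  sed -n 's/.*abrt-hook-ccpp[^:]*: Process [0-9]* (\([^)]*\)) of user \([0-9]*\) killed by \(SIG[A-Z0-9]*\).*/\1 uid=\2 \3/p' \
    | sort | uniq -c
}

# Usage (log path is an assumption; adjust as needed):
#   summarize_abrt_crashes < /var/log/messages
```

For the excerpt above, this should show that only ocssd.bin (user 1000) and oracle (user 1001) were affected, both killed by SIGABRT.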

Fix: stop and disable all abrt-related services (postfix was disabled as well):

  # systemctl disable abrt-ccpp.service && systemctl stop abrt-ccpp.service
  # systemctl disable abrt-oops.service && systemctl stop abrt-oops.service
  # systemctl disable abrtd.service && systemctl stop abrtd.service
  # systemctl disable abrt-vmcore.service && systemctl stop abrt-vmcore.service
  # systemctl disable abrt-xorg.service && systemctl stop abrt-xorg.service
  # systemctl disable postfix.service; systemctl stop postfix.service
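
To confirm that the abrt units really ended up stopped and disabled, their state can be queried with systemctl is-enabled and systemctl is-active. This is a minimal sketch rather than part of the original procedure: check_abrt_units is a hypothetical helper name, and the unit list simply mirrors the abrt services named above (postfix is left out).

```shell
# Print enabled/active state for each abrt unit disabled in the steps above.
# check_abrt_units is a hypothetical helper name, not part of abrt or systemd.
check_abrt_units() {
  for unit in abrt-ccpp abrt-oops abrtd abrt-vmcore abrt-xorg; do
    # Both queries return non-zero for disabled/inactive units, so ignore the exit code.
    enabled=$(systemctl is-enabled "$unit.service" 2>/dev/null) || true
    active=$(systemctl is-active "$unit.service" 2>/dev/null) || true
    echo "$unit.service enabled=${enabled:-unknown} active=${active:-unknown}"
  done
}
```

Run check_abrt_units as root after the commands above. Each unit would be expected to report enabled=disabled and active=inactive; any other value suggests the unit is still running or is not installed on that host.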

After the change, the system was observed for a little over a week. It ran normally and the problem did not recur.

Other workaround (not verified):
sed -i "s/ProcessUnpackaged = yes/ProcessUnpackaged = no/g"  /etc/abrt/abrt-action-save-package-data.conf
sed -i "s/MaxCrashReportsSize = 5000/MaxCrashReportsSize = 0/g"  /etc/abrt/abrt.conf
systemctl restart abrtd.service

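Check when the abnormal events occurred:
Use abrt-cli list to see which process each core record belongs to and when it was triggered.
Delete redundant records: abrt-cli rm /var/spool/abrt/ccpp-2020-xxx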
