Database restart caused by "semctl(156532736, 0, IPC_RMID, ...) failed: Invalid argument"

Contents

Environment

Symptoms

Cause

Solution

Environment

Platform: Linux x86-64, Red Hat Enterprise Linux 7

Version: 9.5

Symptoms

The errors below appear in the database log at irregular intervals, and each occurrence causes the database to restart.

 

FATAL,XX000,semop(id=157450268) failed: Invalid argument

FATAL,XX000,semop(id=157843496) failed: Invalid argument

PANIC,XX000,queueing for lock while waiting on another one

terminating any other active server processes

WARNING,57P02,terminating connection because of crash of another server process,The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.,In a moment you should be able to reconnect to the database and repeat your command.

archiver process (PID 3766) exited with exit code 1

FATAL,57P03,the database system is in recovery mode

all server processes terminated; reinitializing

could not remove shared memory segment /PostgreSQL.44345806: No such file or directory

semctl(156532736, 0, IPC_RMID, ...) failed: Invalid argument

semctl(156565505, 0, IPC_RMID, ...) failed: Invalid argument

FATAL,57P03,the database system is in recovery mode

database system was interrupted; last known up at 2018-12-27 04:54:36 CST

database system was not properly shut down; automatic recovery in progress

redo starts at 5E7/7036BD30

FATAL,57P03,the database system is in recovery mode

invalid record length at 5E7/75359EC8

redo done at 5E7/75359EA0

last completed transaction was at log time 2018-12-27 05:06:26.652179+08

MultiXact member wraparound protections are now enabled

autovacuum launcher started

database system is ready to accept connections
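To see how often this signature has occurred, the log can be scanned for the two semaphore messages shown above. The following is a minimal sketch, not an official tool: the function name count_ipc_failures is hypothetical, and the log file path passed to it is whatever your installation uses.

```shell
#!/bin/sh
# Sketch: count log lines that match the semaphore failure signature
# ("semop(...) failed: Invalid argument" or "semctl(...) failed: Invalid argument").
# count_ipc_failures is a hypothetical helper name, not part of PostgreSQL.
# Usage: count_ipc_failures LOGFILE
count_ipc_failures() {
    # grep -c prints the number of matching lines (0 when there are none).
    grep -Ec 'sem(op|ctl)\(.*\) failed: Invalid argument' "$1"
}
```

A non-zero count together with the "all server processes terminated; reinitializing" message suggests that the restarts share this cause, although the count alone does not prove it.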

 

Cause

The problem occurs because the RemoveIPC parameter is set to yes.

 

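The RemoveIPC parameter in /etc/systemd/logind.conf controls whether System V IPC objects are removed when a user fully logs out.

The parameter has been on by default since systemd 212 (2014-03-25), and RHEL 7 uses systemd 219 or later. RHEL 7 reportedly ships with the parameter off by default, so on an affected host it has presumably been set to yes explicitly; verify the value rather than assume it.

When RemoveIPC=yes, the semaphore objects used by the PostgreSQL server are removed at seemingly random times (in practice, whenever the operating-system user that owns them fully logs out). The server then crashes with log entries like this: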

LOG: semctl(1234567890, 0, IPC_RMID, ...) failed: Invalid argument

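Shared memory segments that still have processes attached are not cleaned up, so systemd does not remove a segment that is in use. Semaphores have no concept of attached processes, so they are removed even while they are still in use.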
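The difference is visible with ipcs: on Linux, `ipcs -m` lists an nattch (attach count) column for each shared memory segment, while `ipcs -s` has no such column for semaphore arrays. The sketch below counts the semaphore arrays owned by one user. It assumes the util-linux `ipcs -s` layout (key, semid, owner, perms, nsems), and the helper name count_sem_arrays and the user name postgres are illustrative.

```shell
#!/bin/sh
# Sketch: count System V semaphore arrays owned by a user,
# reading `ipcs -s` output on stdin (util-linux column layout assumed).
# count_sem_arrays is a hypothetical helper name.
# Usage: ipcs -s | count_sem_arrays postgres
count_sem_arrays() {
    # Data rows start with a hex key (0x...); the owner is the third column.
    awk -v user="$1" '$1 ~ /^0x/ && $3 == user { n++ } END { print n + 0 }'
}
```

If the count drops to zero while the server is still running, the semaphores have been removed from under it.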

Solution

Check the value of the RemoveIPC parameter on every host that runs a PostgreSQL server.
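One way to run this check is to read the value set in the configuration file. The sketch below is only an approximation under stated assumptions: removeipc_value is a hypothetical helper, it looks at a single file, and it ignores any drop-in files (for example under /etc/systemd/logind.conf.d/) that could override the main file. When no uncommented line is present it prints "unset", meaning the built-in default of the installed systemd build applies.

```shell
#!/bin/sh
# Sketch: report the RemoveIPC value explicitly set in a logind.conf-style file.
# removeipc_value is a hypothetical helper name; drop-in directories are NOT consulted.
# Usage: removeipc_value [FILE]   (FILE defaults to /etc/systemd/logind.conf)
removeipc_value() {
    conf="${1:-/etc/systemd/logind.conf}"
    # Keep only uncommented RemoveIPC= lines; if there are several, use the last one.
    value=$(sed -n 's/^[[:space:]]*RemoveIPC[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$conf" 2>/dev/null | tail -n 1)
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf 'unset\n'
    fi
}
```

Running removeipc_value with no argument on each host gives a quick first answer; a result of yes or unset is worth following up.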

1) Edit the parameter in /etc/systemd/logind.conf:

vi /etc/systemd/logind.conf  

RemoveIPC=no
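The same edit can be scripted instead of made by hand in vi. This is a minimal sketch, assuming GNU or BSD sed and a hypothetical helper name set_removeipc_no; it rewrites any existing RemoveIPC line (commented out or not) and appends one if none exists, leaving a .bak copy of the original file. The final comment reflects common practice, not a step stated in this article: the new value is normally picked up only after systemd-logind is restarted or the host is rebooted.

```shell
#!/bin/sh
# Sketch: set RemoveIPC=no in a logind.conf-style file, idempotently.
# set_removeipc_no is a hypothetical helper name.
# Usage: set_removeipc_no [FILE]   (FILE defaults to /etc/systemd/logind.conf; needs root there)
set_removeipc_no() {
    conf="${1:-/etc/systemd/logind.conf}"
    if grep -q '^[[:space:]]*#\{0,1\}[[:space:]]*RemoveIPC[[:space:]]*=' "$conf"; then
        # Rewrite every existing RemoveIPC line, commented out or not; keep a backup as FILE.bak.
        sed -i.bak 's/^[[:space:]]*#\{0,1\}[[:space:]]*RemoveIPC[[:space:]]*=.*/RemoveIPC=no/' "$conf"
    else
        # No such line yet: append one (assumes [Login] is the last section in the file).
        printf 'RemoveIPC=no\n' >> "$conf"
    fi
}

# Assumption, not from this article: apply the change afterwards with
#   systemctl restart systemd-logind
# or by rebooting the host.
```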

 

For more solutions, log in to the HighGo support platform: https://support.highgo.com/#/index/docContent/9a9643d502394501

 
