Oracle database error ORA-27303: clients cannot connect!

Contents

  • Symptoms
  • Analysis
  • Conclusion
  • Resolution
  • Retrospective

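Symptoms

Right after our company's product was installed, clients suddenly could no longer connect to the customer's database. The database alert log contained a large number of ORA-27300 errors. Below is the first occurrence of ORA-27300.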

Thu Dec 31 15:49:14 2020
Archived Log entry 1248193 added for thread 1 sequence 1248219 ID 0x348d2cd1 dest 1:
Thu Dec 31 15:49:39 2020
Errors in file /odata/oracle/app/diag/rdbms/wind/WIND/trace/WIND_j000_125990.trc:
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (dba), current egid = 1001 (oinstall)
Thu Dec 31 15:49:40 2020

The system log on the Red Hat 7 host for the same period:

Dec 31 15:40:01 db1-gss systemd: Starting Session 63477 of user root.
Dec 31 15:49:20 db1-gss systemd: Reloading.
Dec 31 15:49:20 db1-gss systemd-sysv-generator[125498]: Configuration file /usr/lib/systemd/system/dbackup3-agent.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
Dec 31 15:49:20 db1-gss systemd: Configuration file /usr/lib/systemd/system/dbackup3-agent.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
Dec 31 15:49:20 db1-gss systemd: Configuration file /usr/lib/systemd/system/dbackup3-agent.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
Dec 31 15:49:20 db1-gss systemd: Reloading.
Dec 31 15:49:20 db1-gss systemd-sysv-generator[125515]: Configuration file /usr/lib/systemd/system/dbackup3-agent.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
Dec 31 15:49:20 db1-gss systemd: Configuration file /usr/lib/systemd/system/dbackup3-agent.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
Dec 31 15:49:20 db1-gss systemd: Configuration file /usr/lib/systemd/system/dbackup3-agent.service is marked world-writable. Please remove world writability permission bits. Proceeding anyway.
Dec 31 15:49:20 db1-gss systemd: Started dbackup3 agent daemon.
Dec 31 15:49:20 db1-gss systemd: Starting dbackup3 agent daemon...
Dec 31 15:49:21 db1-gss systemd: Stopping dbackup3 agent daemon...
Dec 31 15:49:21 db1-gss systemd: Started dbackup3 agent daemon.

Further entries from the database alert log:

: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (dba), current egid = 1001 (oinstall)
Process J000 died, see its trace file
Thu Dec 31 16:36:55 2020
kkjcre1p: unable to spawn jobq slave process
Thu Dec 31 16:36:55 2020
Errors in file :
Thu Dec 31 16:36:56 2020
Errors in file /odata/oracle/app/diag/rdbms/wind/WIND/trace/WIND_j000_348517.trc:
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (dba), current egid = 1001 (oinstall)
Process J000 died, see its trace file
Thu Dec 31 16:36:57 2020

Contents of the trace file:

Trace file /odata/oracle/app/diag/rdbms/wind/WIND/trace/WIND_j000_348517.trc
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
ORACLE_HOME = /odata/oracle/app/oracle/product/12.1.0/dbhome_1
System name:    Linux
Node name:      db1-gss
Release:        3.10.0-693.el7.x86_64
Version:        #1 SMP Thu Jul 6 19:56:57 EDT 2017
Machine:        x86_64
Instance name: WIND
Redo thread mounted by this instance: 1
Oracle process number: 0
Unix process pid: 348517, image:


*** 2020-12-31 16:36:56.427
Unexpected error 27140 in job slave process
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (dba), current egid = 1001 (oinstall)
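
The key line in all of these excerpts is ORA-27303, which names both the egid the instance was started with and the egid of the process that just failed. As a minimal sketch (the function name egid_pairs is made up for illustration, and the input is assumed to be alert-log or trace text in the format shown above), the distinct pairs can be pulled out like this:

```shell
# Print each distinct startup/current egid pair reported by ORA-27303 lines.
# Reads alert-log or trace text on stdin; lines in any other format are ignored.
egid_pairs() {
  sed -n 's/^ORA-27303: additional information: startup egid = \([0-9]*\) (\([^)]*\)), current egid = \([0-9]*\) (\([^)]*\)).*/startup=\1(\2) current=\3(\4)/p' |
    sort -u
}
```

Feeding it the trace file above, for example with egid_pairs < WIND_j000_348517.trc, should reduce the repeated error stacks to a single startup/current pair.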

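Analysis

On the face of it, the customer's database ran into trouble exactly when our product was installed for the first time, so the customer suspected our software. According to the database logs, however, the instance was last started on Thu Mar 05 22:27:32 2020 and had been running for more than nine months without a restart.
The trace shows that when the instance was started nine months earlier its egid (effective group ID) was 1000 (dba), whereas the newly spawned process runs with egid 1001 (oinstall).
Check the attributes of the oracle executable: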

[oracle@db1-gss trace]$ ls -l /odata/oracle/app/oracle/product/12.1.0/dbhome_1/bin/oracle
-rwsr-s--x. 1 oracle oinstall 323649840 Dec 27  2019 /odata/oracle/app/oracle/product/12.1.0/dbhome_1/bin/oracl

The file's group is oinstall.
Next, check the oracle processes that are already running; their group ID is 1000:

[root@db1-gss ~]# ps -eo pid,stat,pri,uid,gid,cmd |grep oracle
 56474 Ss    19  1000  1000 oracleWIND (LOCAL=NO)
 72953 S     19     0     0 su - oracle
 97310 S+    19     0     0 grep --color=auto oracle
198434 Ss    19  1000  1000 oracleWIND (LOCAL=NO)
198452 Ss    19  1000  1000 oracleWIND (LOCAL=NO)
311705 Ss    19  1000  1000 oracleWIND (LOCAL=NO)
313106 Ssl   19  1000  1000 /odata/oracle/app/oracle/product/12.1.0/dbhome_1/bin/tnslsnr LISTENER -inherit
313475 Ss    19  1000  1000 oracleWIND (LOCAL=NO)
327504 Ss    19  1000  1000 oracleWIND (LOCAL=NO)
327637 Ss    19  1000  1000 oracleWIND (LOCAL=NO)
331933 Ss    19  1000  1000 oracleWIND (LOCAL=NO)
338102 Ss    19  1000  1000 oracleWIND (LOCAL=NO)

Check the oracle user; its primary group is 1000 (dba):

# id oracle
uid=1000(oracle) gid=1000(dba) groups=1000(dba),1001(oinstall)
[root@db1-gss ~]#
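
The manual comparison above (group of the executable versus group of the running processes) can be scripted. The following is only a sketch: the function name is hypothetical, and it assumes that server and background processes are named oracle<SID> or ora_* as in the ps listing above.

```shell
# Reads "pid egid command..." lines on stdin (for example the output of
# `ps -eo pid=,egid=,args=`) and prints the PIDs of Oracle server or
# background processes whose effective GID differs from the GID given as $1.
# Assumption: such processes are named oracle<SID> or ora_<name>_<SID>.
oracle_pids_with_other_egid() {
  awk -v want="$1" '$3 ~ /^ora(cle|_)/ && $2 != want { print $1 }'
}
```

On a live host it could be driven with: ps -eo pid=,egid=,args= | oracle_pids_with_other_egid "$(stat -c %g "$ORACLE_HOME/bin/oracle")". Any PID it prints belongs to a process that was started before the group of the executable changed.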

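Conclusion

The oracle executable on disk now belongs to group oinstall and has the setgid bit set, so every newly spawned Oracle process runs with egid 1001 (oinstall). The instance, however, was started nine months earlier with egid 1000 (dba), which is still the oracle user's primary group. When our software was installed, the database had to spawn a job queue slave process (J000). That new process came up as oinstall, which did not match the dba egid recorded at instance startup, so it could not attach to the post/wait facility.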
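
The setgid bit on the oracle executable (the second "s" in -rwsr-s--x) is what ties the effective group of a new process to the group of the file rather than to the primary group of the user who launches it. A small sketch of that rule, with a made-up function name and an ls -l line as input:

```shell
# Given one `ls -l` line on stdin and the launching user's primary group as $1,
# print the group a new process started from that file would run under.
egid_source() {
  awk -v ug="$1" '{
    # Character 7 of the mode string is the group-execute slot; "s" means setgid.
    if (substr($1, 7, 1) == "s") print $4 " (from setgid bit)"
    else print ug " (from user primary group)"
  }'
}
```

Applied to the listing of the oracle executable from the analysis, with dba as the user's primary group, it reports oinstall, which matches the "current egid" in ORA-27303.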

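Resolution

The plan was to shut down the database and change the oracle user's primary group from dba to oinstall. (The transcript below was captured on a host named oracle18, so the numeric IDs differ from those on db1-gss.)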

[root@oracle18 orcl]# id oracle
uid=54321(oracle) gid=54322(dba) groups=54322(dba),54321(oinstall),54323(oper),54324(backupdba),54325(dgdba),54326(kmdba),54330(racdba)
[root@oracle18 orcl]# usermod -g  oinstall oracle
[root@oracle18 orcl]# id oracle
uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall),54322(dba),54323(oper),54324(backupdba),54325(dgdba),54326(kmdba),54330(racdba)
[root@oracle18 orcl]# 

But the database then refused to shut down:

SQL> shutdown immediate;
ERROR:
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (dba), current egid =
1001 (oinstall)

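The database could only be shut down after both the oracle user and the oracle executable had been switched back to the "wrong" group, the one the running instance had been started with.
After that, the group was set to the correct value and the database was started again, successfully!
Clients still could not connect afterwards: the listener process had the same problem with the wrong group ID, and restarting it resolved the issue.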
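
The order of operations matters here, so it may help to see the steps laid out. The sketch below only prints commands and runs nothing; the function name is hypothetical, and mode 6751 is an assumption derived from the -rwsr-s--x listing shown earlier (chgrp can clear the setuid/setgid bits, so the mode is reapplied after each group change). Treat it as a checklist to review, not as a tested procedure.

```shell
# Dry run: print the recovery steps described above without executing them.
#   $1 = group the running instance was started with (dba in this incident)
#   $2 = group the installation is supposed to use (oinstall in this incident)
#   $3 = path to the oracle executable
print_recovery_plan() {
  startup_group=$1
  target_group=$2
  oracle_bin=$3
  cat <<EOF
# 1. as root: match the running instance so that shutdown can attach
usermod -g $startup_group oracle
chgrp $startup_group $oracle_bin
chmod 6751 $oracle_bin
# 2. as oracle: stop the instance
echo 'shutdown immediate' | sqlplus -S / as sysdba
# 3. as root: switch to the intended group
usermod -g $target_group oracle
chgrp $target_group $oracle_bin
chmod 6751 $oracle_bin
# 4. as oracle: start the instance, then restart the listener
echo 'startup' | sqlplus -S / as sysdba
lsnrctl stop
lsnrctl start
EOF
}
```

For this incident the call would be print_recovery_plan dba oinstall followed by the path of the oracle executable under ORACLE_HOME/bin.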

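Retrospective

Once again we took the blame for the customer. They had changed the group ID without realizing it, and the problem surfaced the first time our product caused a slave process to be spawned!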
