Case study: Oracle database fails to start after upgrading from 11.2.0.3.7 to 11.2.0.3.11 - ORA-00600 [kfioTranslateIO03] and [17090]

1. Environment

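A batch of databases was being prepared for go-live. They had been installed at version 11.2.0.3 and patched with a PSU to 11.2.0.3.7, but the latest PSU for this release had already reached .11. To avoid having to take the databases down for patching after go-live (for example, after a security scan flags them), we decided to apply the latest PSU, 11.2.0.3.11 (Patch ID: 18522512), before going live.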

Blog URL: http://blog.csdn.net/hw_libo/article/details/39672901


2. Alert log

Mon Sep 29 14:50:23 2014
ALTER DATABASE   MOUNT
Mon Sep 29 14:50:26 2014
Sweep [inc][280114]: completed
Sweep [inc][280113]: completed
Sweep [inc2][280114]: completed
Sweep [inc2][280113]: completed
NOTE: Loaded library: System 
ORA-15025: could not open disk "/dev/diskgroup/dg_ora"
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
SUCCESS: diskgroup DG_ORA was mounted
ERROR: failed to establish dependency between database NDADB and diskgroup resource ora.DG_ORA.dg
Errors in file /opt/oracle/diag/rdbms/ndadb/NDADB/trace/NDADB_ckpt_15674.trc  (incident=288113):
ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/ndadb/NDADB/incident/incdir_288113/NDADB_ckpt_15674_i288113.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /opt/oracle/diag/rdbms/ndadb/NDADB/trace/NDADB_ckpt_15674.trc  (incident=288114):
ORA-00600: internal error code, arguments: [17090], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/ndadb/NDADB/incident/incdir_288114/NDADB_ckpt_15674_i288114.trc
Dumping diagnostic data in directory=[cdmp_20140929145027], requested by (instance=1, osid=15674 (CKPT)), summary=[incident=288113].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ERROR: unrecoverable error ORA-600 raised in ASM I/O path; terminating process 15674 
Dumping diagnostic data in directory=[cdmp_20140929145028], requested by (instance=1, osid=15674 (CKPT)), summary=[incident=288114].
PMON (ospid: 15585): terminating the instance due to error 469
System state dump requested by (instance=1, osid=15585 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /opt/oracle/diag/rdbms/ndadb/NDADB/trace/NDADB_diag_15634.trc
Dumping diagnostic data in directory=[cdmp_20140929145030], requested by (instance=1, osid=15585 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 15585

Checking the resource status:

NDADB01:~ # crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.DG_DATA.dg ora....up.type ONLINE    ONLINE    ndadb01     
ora.DG_ORA.dg  ora....up.type ONLINE    ONLINE    ndadb01     
ora....ER.lsnr ora....er.type ONLINE    ONLINE    ndadb01     
ora.asm        ora.asm.type   ONLINE    ONLINE    ndadb01     
ora.cssd       ora.cssd.type  ONLINE    ONLINE    ndadb01     
ora.diskmon    ora....on.type OFFLINE   OFFLINE               
ora.evmd       ora.evm.type   ONLINE    ONLINE    ndadb01     
ora.ons        ora.ons.type   OFFLINE   OFFLINE

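Note: the database is started by the VCS (Veritas Cluster Server) two-node failover cluster, so no RDBMS resource appears in this list.

The CRS and ASM logs were also checked, and both looked normal.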


3. Resolving the problem with the MOS note

A search on MOS (My Oracle Support) turned up:

ORA-00600 [kfioTranslateIO03] [17090] (Doc ID 1336846.1)

Key checkpoint:
Case #1 ] Group permission of "oracle" executable from RDBMS home should have the same group information for ASM devices according to note 1084186.1.

$ ls -l $GRID_HOME/bin/oracle
-rwsr-s--x 1 grid oinstall 228954465 Jul 1 13:37 /oh1/grid/product/11.2.0/bin/oracle

$ ls -l $RDBMS_HOME/bin/oracle
-rwsr-s--x 1 oracle asmadmin 228954465 Jul 1 13:37 /oh1/oracle/product/11.2.0/bin/oracle
Root cause: the OS group that owns the oracle executable must have read/write permission on the ASM disk devices.
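The requirement above can be checked quickly from the shell. The following is a minimal sketch, not an Oracle tool: `binary_group_can_rw_disk` is a hypothetical helper that assumes GNU coreutils `stat`. It only compares the group owning the binary with the group owning the disk device and checks that the disk's group permission bits include read and write; it ignores "other" permissions, ACLs, supplementary groups, and whether the setgid bit is actually set on the binary.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): succeeds (exit 0) if the group that owns
# BINARY also owns DISK and that group has read+write permission on DISK.
# Assumes GNU stat; -L follows symlinks such as udev/ASMLib device aliases.
binary_group_can_rw_disk() {
  local binary="$1" disk="$2"
  local bin_group disk_group disk_mode
  bin_group=$(stat -L -c '%G' "$binary") || return 2
  disk_group=$(stat -L -c '%G' "$disk") || return 2
  disk_mode=$(stat -L -c '%a' "$disk") || return 2   # octal, e.g. 660
  # Extract the group permission digit from the octal mode.
  local group_digit=$(( (8#$disk_mode >> 3) & 7 ))
  # Same group, and group bits include read (4) and write (2).
  [[ "$bin_group" == "$disk_group" ]] && (( (group_digit & 6) == 6 ))
}
```

Usage, with the paths from this case: `binary_group_can_rw_disk /opt/oracle/product/11gR2/db/bin/oracle /dev/diskgroup/dg_ora || echo "group mismatch"`. Assuming the ASM disk is group-owned by asmadmin (the disk's ownership is not shown in the original output), the broken binary owned by oracle:oinstall would fail this check.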

Solution:
Please execute the following action plan from note 1084186.1.

$ su - grid
$ cd /bin
$ ./setasmgidwrap o=<11.2 RDBMS Home>/bin/oracle

On inspection, the permissions on the oracle user's $ORACLE_HOME/bin/oracle were indeed wrong:

## The grid user's $ORACLE_HOME/bin/oracle has the correct permissions
NDADB01:/dev/diskgroup # su - grid
grid@NDADB01:~> ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x 1 grid oinstall 204902468 2014-09-29 10:37 /opt/oracrs/product/11gR2/grid/bin/oracle

## The oracle user's $ORACLE_HOME/bin/oracle has the wrong permissions
NDADB01:/dev/diskgroup # su - oracle 
oracle@NDADB01:~> ls -l $ORACLE_HOME/bin/oracle
-rwxr-x--x 1 oracle oinstall 233461759 2014-09-29 11:53 /opt/oracle/product/11gR2/db/bin/oracle
## It should be:
oracle@NDADB01:~> ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x 1 oracle asmadmin 233461759 2014-09-29 15:39 /opt/oracle/product/11gR2/db/bin/oracle
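To compare the binary against the expected state shown above (mode -rwsr-s--x, which is octal 6751, and group asmadmin) without eyeballing `ls` output, a hypothetical helper like the one below could be used. It is a sketch that assumes GNU `stat` and that asmadmin is the ASM OSDBA group on this host, as in the listing above; `check_oracle_binary` is not part of any Oracle tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): verify that an oracle binary has mode 6751
# (-rwsr-s--x: setuid + setgid, rwx/r-x/--x) and the expected owning group.
# Returns 0 and prints "OK: ..." on success, 1 on a mismatch, 2 if stat fails.
check_oracle_binary() {
  local binary="$1" expected_group="$2"
  local mode group
  mode=$(stat -c '%a' "$binary") || return 2    # e.g. 6751 or 751
  group=$(stat -c '%G' "$binary") || return 2   # e.g. asmadmin
  if [[ "$mode" != "6751" ]]; then
    echo "bad mode: $mode (expected 6751, i.e. -rwsr-s--x)" >&2
    return 1
  fi
  if [[ "$group" != "$expected_group" ]]; then
    echo "bad group: $group (expected $expected_group)" >&2
    return 1
  fi
  echo "OK: $binary is $mode $group"
}
```

Usage: `check_oracle_binary /opt/oracle/product/11gR2/db/bin/oracle asmadmin`. Against the broken file listed earlier (-rwxr-x--x oracle oinstall) it would report `bad mode: 751` and return 1.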

Following the MOS note, the fix is:

NDADB01:/dev/diskgroup # su - grid
grid@NDADB01:~> cd $ORACLE_HOME/bin
grid@NDADB01:/opt/oracrs/product/11gR2/grid/bin> ./setasmgidwrap o=/opt/oracle/product/11gR2/db/bin/oracle  ## the path given here is the oracle user's $ORACLE_HOME/bin/oracle
grid@NDADB01:/opt/oracrs/product/11gR2/grid/bin> ls -l /opt/oracle/product/11gR2/db/bin/oracle 
-rwsr-s--x 1 oracle asmadmin 233461759 2014-09-29 11:45 /opt/oracle/product/11gR2/db/bin/oracle

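Note: I had first tried to fix this file's permissions by hand with chmod u+s, chmod g+s and the like, but the database still would not start, so that did not solve the problem. (chmod by itself only sets the setuid/setgid bits; it does not change the file's group from oinstall to asmadmin.)

Then restart HAS (this is a two-node HA failover setup, not RAC):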

NDADB01:~ # crsctl stop has -f

NDADB01:~ # crsctl start has

After this, the database status was normal, no data had been lost, and the problem was resolved.


-- Bosco  QQ:375612082

---- END ----

