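1. Problem Description
A backup was taken of a host already running AIX 6.1 + HACMP + Oracle RAC 10.2.0.5, and that backup was used for a full operating-system restore onto a new POWER 5 machine. After the restore, the Oracle software environment was already in place, and the next task was to create a single-instance database (single DB) in that environment.
I tried to create the single-instance database manually, but starting the instance in NOMOUNT state failed with: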
SQL> startup nomount;
ORA-29702: error occurred in Cluster Group Service operation
As a result, the database creation could not proceed.
2. Problem Analysis
First, check the database alert log, which contains the following:
$ more alert_test1.log
Tue May 10 07:28:58 GMT+08:00 2011
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Tue May 10 07:28:59 GMT+08:00 2011
Errors in file /oracle/admin/test1/udump/test1_ora_2425116.trc:
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:skgxnqtsz failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: SKGXN not av
clsssinit ret = 21
interconnect information is not available from OCR
WARNING: No cluster interconnect has been specified. Depending on
the communication driver configured Oracle cluster traffic
may be directed to the public interface of this machine.
Oracle recommends that RAC clustered databases be co# more
: A file or directory in the path name does not exist.
The alert log references /oracle/admin/test1/udump/test1_ora_2425116.trc. The contents of that trace file are:
$ more /oracle/admin/test1/udump/test1_ora_2425116.trc
/oracle/admin/test1/udump/test1_ora_2425116.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
Oracle process number: 0
Unix process pid: 2425116, image: oracle@localhost
*** 2011-05-10 07:28:58.792
Number of resource hash buckets is 16
Parsing user specified table space list to be ignored
2011-05-10 07:28:59.810: [ COMMCRS]clsc_set_clsd_NS_trace: called before init completed
2011-05-10 07:28:59.906: [ CSSCLNT]clsssinit: error(32 PROC-32: Cluster Ready Services on the local node is not running Messaging error [9]) in OCR initialization
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:skgxnqtsz failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: SKGXN not av
Number of resource hash buckets is 16
* kjfcnfy: kjinumbuckets = 8
Dynamic strand is set to TRUE
Running with 2 shared and 48 private strand(s). Zero-copy redo is FALSE
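When reading logs like the two above, it can help to pull out just the distinct error codes before searching a knowledge base. The following is a minimal sketch and not part of the original procedure: the function name `list_error_codes` is hypothetical, and it assumes at most one ORA- code per line, which holds for the excerpts shown here.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct ORA-nnnnn codes found on stdin.
# Assumption: at most one ORA- code per line (true for the excerpts above).
list_error_codes() {
    sed -n 's/.*\(ORA-[0-9][0-9]*\).*/\1/p' | sort -u
}

# Demo on lines copied from the trace file shown above.
list_error_codes <<'EOF'
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:skgxnqtsz failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: SKGXN not av
ORA-27504: IPC error creating OSD context
EOF
```

On a real system the input would be the alert log or trace file itself, for example `list_error_codes < /oracle/admin/test1/udump/test1_ora_2425116.trc`.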
Searching Metalink for ORA-29702 turned up a very useful note:
Starting Instance Fails with ORA-29702 [ID 216030.1]
Modified: 16-SEP-2010  Type: PROBLEM  Status: PUBLISHED
fact: Oracle Server Enterprise Edition 8
fact: Oracle Parallel Server (OPS)
fact: AIX-Based Systems
symptom: Starting database fails
symptom: ORA-29702: error occurred in Cluster Group Service operation
symptom: Mounting database in Non-OPS (exclusive) mode
symptom: Environment not configured for OPS
cause: Oracle in IBM RS6000 SP installs the Parallel Server Option as the
default option. As a result, Oracle tries to communicate with GMS or Cluster
Manager during startup, but the environment is not configured to work in
Parallel Server mode.
fix:
Relink Oracle to disable Parallel Server Option:
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk no_parropt
$ make -f ins_rdbms.mk install
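The situation described in this note matches the problem here. The Oracle binary restored from the RAC host is still linked with the Real Application Clusters option (see the banner in the trace file), but no cluster services are running on the new machine (the trace reports PROC-32). What needs to be done is therefore to turn off the parallel server option, so that a single-node database rather than a RAC database can be created.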
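Before and after relinking, the state of the option can be checked from the library itself. The make output in the next section shows the mechanism: `no_parropt` deletes `kcsm.o` from `libknlopt.a` and adds `ksnkcs.o`, and the link step tests for `kcsm.o` with `ar -X64 t`. The sketch below applies the same test. The function name `rac_link_state` and its three output labels are hypothetical, and `-X64` is the AIX `ar` flag taken from that make output.

```shell
#!/bin/sh
# Hypothetical helper: classify an `ar t` member listing of libknlopt.a
# read from stdin.
#   kcsm.o present   -> RAC / parallel server option linked in
#   ksnkcs.o present -> option turned off
rac_link_state() {
    members=$(cat)
    if printf '%s\n' "$members" | grep '^kcsm\.o$' >/dev/null 2>&1; then
        echo rac_on
    elif printf '%s\n' "$members" | grep '^ksnkcs\.o$' >/dev/null 2>&1; then
        echo rac_off
    else
        echo unknown
    fi
}

# Runs only where ORACLE_HOME points at a real installation.
# -X64 is AIX-specific (64-bit archive members).
lib="${ORACLE_HOME:-}/rdbms/lib/libknlopt.a"
if [ -n "${ORACLE_HOME:-}" ] && [ -f "$lib" ]; then
    ar -X64 t "$lib" | rac_link_state
fi
```

On the restored host this would be expected to report `rac_on` before the relink and `rac_off` afterwards, assuming the library layout matches the paths in the make output.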
3. Solution
Run the commands from the fix section of the Metalink note to disable the parallel server option:
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk no_parropt
rm -f /oracle/product/10.2.0/db/lib/libskgxp10.a
cp /oracle/product/10.2.0/db/lib//libskgxpd.a /oracle/product/10.2.0/db/lib/libskgxp10.a
rm -f /oracle/product/10.2.0/db/lib/libskgxn2.a
cp /oracle/product/10.2.0/db/lib//libskgxns.a /oracle/product/10.2.0/db/lib/libskgxn2.a
/bin/ar -X64 d /oracle/product/10.2.0/db/rdbms/lib/libknlopt.a kcsm.o
/bin/ar -X64 cr /oracle/product/10.2.0/db/rdbms/lib/libknlopt.a /oracle/product/10.2.0/db/rdbms/lib/ksnkcs.o
Target "no_parropt" is up to date.
$ make -f ins_rdbms.mk install
chmod 755 /oracle/product/10.2.0/db/bin
rm -f oracle dbv tstshm maxmem orapwd dbfsize cursize genoci extproc extproc32 hsalloci hsots hsdepxa dgmgrl dumpsga mapsga osh sbttest expdp impdp imp exp sqlldr rman hsodbc tg4sybs nid extjob extjobo genezi ikfod grdcscan /oracle/product/10.2.0/db/rdbms/lib/ksms.s /oracle/product/10.2.0/db/rdbms/lib/ksms.o
- Linking DB*Verify utility (dbv)
……
/bin/ar -X64 t /oracle/product/10.2.0/db/rdbms/lib/libknlopt.a | grep '^'kcsm.o > /dev/null 2>&1 ; then echo "-lha_gs_r -lha_em_r -lpthreads"; fi` -locijdbcst10 -lwwg -bpT:0x100000000 -bpD:0x110000000 -bforceimprw
mv -f /oracle/product/10.2.0/db/bin/oracle /oracle/product/10.2.0/db/bin/oracleO
mv /oracle/product/10.2.0/db/rdbms/lib/oracle /oracle/product/10.2.0/db/bin/oracle
chmod 6751 /oracle/product/10.2.0/db/bin/oracle
Target "install" is up to date.
After the relink completes, start the instance again:
SQL> startup nomount;
ORACLE instance started.
Total System Global Area 1073741824 bytes
Fixed Size 2101912 bytes
Variable Size 545262952 bytes
Database Buffers 524288000 bytes
Redo Buffers 2088960 bytes
With that, the problem is solved: the instance starts in NOMOUNT state, and the manual database creation can proceed.
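As an extra check, the option list in the connect banner can be inspected. The trace file above shows "Real Application Clusters" in that banner for the original binary, and it should no longer appear once the option is unlinked. This is a sketch under assumptions: the function name `banner_has_rac` is hypothetical, the instance must already be started (an idle instance prints no option banner), and the phrase is assumed to sit on a single line as it does in the trace file.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the SQL*Plus connect output on stdin
# lists the RAC option in its "With the ... options" banner.
banner_has_rac() {
    grep 'Real Application Clusters' >/dev/null 2>&1
}

# Runs only where sqlplus is on PATH; connects as SYSDBA via OS authentication.
if command -v sqlplus >/dev/null 2>&1; then
    if echo exit | sqlplus "/ as sysdba" | banner_has_rac; then
        echo "RAC option still reported in banner"
    else
        echo "RAC option not reported in banner"
    fi
fi
```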