ORA-29702


Instances Abort With ORA-29702 When The Server is rebooted or shut down [ID 752399.1]

--------------------------------------------------------------------------------

Modified 23-MAR-2009 Type PROBLEM Status PUBLISHED

In this Document
Symptoms
Cause
Solution
References

--------------------------------------------------------------------------------

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.4
Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.1.0.7
Linux x86-64

Symptoms
When the server is rebooted or shut down, all instances on the server abort with ORA-29702.

Last log entries in the alert.log look like:
Error: KGXGN aborts the instance (6)
Errors in file /HOME/oracle/admin/+ASM/bdump/+asm1_lmon_8981.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
System state dump is made for local instance
System State dumped to trace file /HOME/oracle/admin/+ASM/bdump/+asm1_diag_8977.trc
Trace dumping is performing id=[cdmp_20080929110159]


Cause
During a reboot or shutdown of a server, K96init.crs is called only after the operating system has stopped other services such as the network. The instances on the server therefore crash because they lose the private interconnect.
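
The ordering can be read from the link names: at shutdown the K links in a runlevel directory run in ascending order of their two-digit prefix, so a K96 link runs after any lower-numbered kill link. The following is a minimal sketch that compares the two prefixes. It assumes a SysV-style rc directory and a network kill link named K<NN>network (the usual name on Red Hat-style systems, but check your own); the function name crs_kill_order and the default directory are illustrative and not part of the note.

```shell
# Report whether init.crs is stopped before or after the network service
# in one runlevel directory (default: /etc/rc.d/rc6.d, the reboot runlevel).
# Assumption: kill links are named K<NN>init.crs and K<NN>network.
crs_kill_order() {
    dir="${1:-/etc/rc.d/rc6.d}"
    # Extract the two-digit prefix of each kill link, if present.
    crs=$(ls "$dir" 2>/dev/null | sed -n 's/^K\([0-9][0-9]\)init\.crs$/\1/p' | head -n 1)
    net=$(ls "$dir" 2>/dev/null | sed -n 's/^K\([0-9][0-9]\)network$/\1/p' | head -n 1)
    if [ -z "$crs" ] || [ -z "$net" ]; then
        echo "unknown"
    elif [ "$crs" -lt "$net" ]; then
        # Lower numbers run first at shutdown.
        echo "crs-before-network"
    else
        echo "crs-after-network"
    fi
}
```

Running crs_kill_order with no argument inspects /etc/rc.d/rc6.d; a result of crs-after-network corresponds to the situation described in this note.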

Solution
Replace the K96 links with K19 links.

Please run the following steps on all nodes in the cluster:


1. Shut down the Clusterware.
2. As root, run the script below to replace the K96 links with K19 links.
3. Start up the Clusterware.

Script:
# Link name prefixes: start link, new kill link, old kill link to remove
RC_START=S96
RC_KILL=K19
RC_KILL_OLD=K96
# Runlevels in which Clusterware is started, stopped, and all runlevels
RCSDIR="/etc/rc.d/rc3.d /etc/rc.d/rc5.d"
RCKDIR="/etc/rc.d/rc0.d /etc/rc.d/rc1.d /etc/rc.d/rc2.d /etc/rc.d/rc4.d /etc/rc.d/rc6.d"
RCALLDIR="/etc/rc.d/rc0.d /etc/rc.d/rc1.d /etc/rc.d/rc2.d /etc/rc.d/rc3.d /etc/rc.d/rc4.d /etc/rc.d/rc5.d /etc/rc.d/rc6.d"
ID=/etc/init.d
if [ -z "$RMF" ]; then RMF="/bin/rm -f"; fi
if [ -z "$LNS" ]; then LNS="/bin/ln -s"; fi
if [ -z "$ECHO" ]; then ECHO=/bin/echo; fi
# Clean up any old init.crs scripts
for rc in $RCALLDIR
do
    $RMF $rc/"$RC_START"init.crs
    $RMF $rc/"$RC_KILL"init.crs
    $RMF $rc/"$RC_KILL_OLD"init.crs
done
# Install new ones
for rc in $RCSDIR
do
    $LNS $ID/init.crs $rc/"$RC_START"init.crs || { $ECHO $?; exit 1; }
done
for rc in $RCKDIR
do
    $LNS $ID/init.crs $rc/"$RC_KILL"init.crs || { $ECHO $?; exit 1; }
done
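
Once the script has run, the result can be checked before the Clusterware is restarted. The sketch below is not part of the original note. It assumes the same directory layout as the script (rc0.d to rc6.d under /etc/rc.d) and the same link names; the function name verify_crs_links is illustrative.

```shell
# Verify the init.crs links under an rc.d base directory (default /etc/rc.d).
# Prints one line per problem and returns non-zero if any were found.
verify_crs_links() {
    base="${1:-/etc/rc.d}"
    problems=0
    # No old K96 link may remain in any runlevel.
    for n in 0 1 2 3 4 5 6; do
        link="$base/rc$n.d/K96init.crs"
        if [ -e "$link" ] || [ -L "$link" ]; then
            echo "stale: $link"
            problems=$((problems + 1))
        fi
    done
    # The new K19 kill link must exist in the runlevels listed in RCKDIR.
    for n in 0 1 2 4 6; do
        link="$base/rc$n.d/K19init.crs"
        if [ ! -L "$link" ]; then
            echo "missing: $link"
            problems=$((problems + 1))
        fi
    done
    # The S96 start link must exist in the runlevels listed in RCSDIR.
    for n in 3 5; do
        link="$base/rc$n.d/S96init.crs"
        if [ ! -L "$link" ]; then
            echo "missing: $link"
            problems=$((problems + 1))
        fi
    done
    [ "$problems" -eq 0 ]
}
```

Calling verify_crs_links with no argument checks /etc/rc.d; no output and a zero exit status mean the links match what the script is meant to create.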

References
BUG:7326677 - WHEN NODE IS REBOOTED, RAC ASM CRASHES DURING SHUTDOWN WITH ORA-29702
BUG:7496341 - FIX FOR 4587300 DOESN'T EXIST IN 11.1.0.7
//////////////////////////////////////////////////////////////////////////////////////
One of the Instances Fails to Start After Reboot With ORA-29702 [ID 788455.1]

--------------------------------------------------------------------------------

Modified 26-JUN-2009 Type PROBLEM Status REVIEWED

In this Document
Symptoms
Cause
Solution
References

--------------------------------------------------------------------------------

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.4 to 10.2.0.4
This problem can occur on any platform.
Symptoms
After a node is rebooted, one of the instances fails to start automatically, but can be started manually.

Database Alert log reports:
==================
Thu Feb 5 15:19:07 2009
Error: KGXGN aborts the instance (6)
Thu Feb 5 15:19:07 2009
Errors in file /oneport/apps/oracle/admin/dbname/bdump/<instance>_lmon_14390.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
Thu Feb 5 15:19:07 2009
System state dump is made for local instance
System State dumped to trace file /oneport/apps/oracle/admin/dbname/bdump/<instance>_diag_14386.trc
Thu Feb 5 15:19:08 2009
Trace dumping is performing id=[cdmp_20090205151907]
Thu Feb 5 15:19:13 2009
Instance terminated by LMON, pid = 14390
Thu Feb 5 15:38:08 2009
Starting ORACLE instance (normal)

From <instance>_diag_14386.trc, we see:
===========================
*** 2009-02-05 15:19:07.218
2009-02-05 15:19:07.217: [ CSSCLNT]clsssRecvMsg: comm error received, comrc 11, con (1068dea10), msg (ffffffff7fffd9e8), msgl 144
2009-02-05 15:19:07.289: [ CSSCLNT]clssgsGGetStatus: communications failed (0/3/1)
2009-02-05 15:19:07.289: [ CSSCLNT]clssgsGGetStatus: returning 8
kgxgnpstat: received ABORT event from CLSS
CM problem, please abort
*** 2009-02-05 15:19:07.289
Node monitor becomes unavailable for service
2009-02-05 15:19:07.497: [ CSSCLNT]clsssRecvMsg: comm error received, comrc 11, con (1068dea10), msg (ffffffff7fffd9e8), msgl 144
2009-02-05 15:19:07.498: [ CSSCLNT]clssgsGGetStatus: communications failed (0/3/1)
2009-02-05 15:19:07.498: [ CSSCLNT]clssgsGGetStatus: returning 8
kgxgnpstat: received ABORT event from CLSS
CM problem, please abort


The crsd.log shows the database instances being started before the ASM instances.


Shortly after the reboot, ora.<dbname>.<instance>.inst.log reports:


startup
ORA-1565 error in identifying +<ASM disk>/../<spfile>


Cause
The instances were missing the ASM dependency.

crs_stat -p ora.<dbname>.<instance>.inst shows that REQUIRED_RESOURCES is empty. It should contain the name of the ASM resource, so that ASM is started before the database instance.
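
The check described above can be scripted. This is a minimal sketch, assuming that crs_stat -p prints the profile as NAME=value lines; it accepts both the REQUIRED_RESOURCE and REQUIRED_RESOURCES spellings of the attribute. The function name required_resources is illustrative and not an Oracle utility.

```shell
# Read "crs_stat -p" output on stdin and print the value of the
# required-resources attribute. Returns 1 if it is empty or absent.
# Usage (as the Clusterware owner), with your own resource name:
#   crs_stat -p ora.DBNAME.INSTANCE.inst | required_resources
required_resources() {
    # Match REQUIRED_RESOURCE= or REQUIRED_RESOURCES= and keep the value.
    value=$(sed -n 's/^REQUIRED_RESOURCES\{0,1\}=//p' | head -n 1)
    if [ -n "$value" ]; then
        echo "$value"
    else
        echo "(empty)"
        return 1
    fi
}
```

An output of (empty) for an instance resource matches the condition described in this note.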

Solution
Add the ASM dependency to each affected instance manually:

srvctl modify instance -d <database name> -i <instance name> -s <ASM instance name>

References
NOTE:387217.1 - INSTANCE NOT STARTING AUTOMATICALLY BY CRS WHEN ASM IS USED
