Last updated: 2014-02-17    Type: BULLETIN
In this Document
Applies to:
Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.

Purpose:
This note provides a method for diagnosing 11gR2 and 12c Grid Infrastructure startup issues. It applies both to freshly installed environments (while root.sh or rootupgrade.sh is running) and to previously working environments that have failed. For root.sh issues, refer to note 1053970.1 for more information.

Scope:
This note is intended for cluster/RAC database administrators and Oracle Support engineers.

Details:

Startup sequence:
In brief, the operating system starts the ohasd process; ohasd starts agents to bring up the daemons (gipcd, mdnsd, gpnpd, ctssd, ocssd, crsd, evmd, asm, ...); crsd in turn starts agents to bring up user resources (database, SCAN, listener, etc.).

Cluster status:
To check the cluster status:
$GRID_HOME/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

$GRID_HOME/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME                 TARGET   STATE      SERVER     STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm                  1    ONLINE     ONLINE     rac1     Started
ora.crsd                 1    ONLINE     ONLINE     rac1
ora.cssd                 1    ONLINE     ONLINE     rac1
ora.cssdmonitor          1    ONLINE     ONLINE     rac1
ora.ctssd                1    ONLINE     ONLINE     rac1     OBSERVER
ora.diskmon              1    ONLINE     ONLINE     rac1
ora.drivers.acfs         1    ONLINE     ONLINE     rac1
ora.evmd                 1    ONLINE     ONLINE     rac1
ora.gipcd                1    ONLINE     ONLINE     rac1
ora.gpnpd                1    ONLINE     ONLINE     rac1
ora.mdnsd                1    ONLINE     ONLINE     rac1

For 11.2.0.2 and later releases, there are two additional resources:
ora.cluster_interconnect.haip    1    ONLINE    ONLINE    rac1
ora.crf                          1    ONLINE    ONLINE    rac1

For 11.2.0.3 and later on non-Exadata systems, ora.diskmon will be OFFLINE.

For 12c and later releases, the ora.storage resource appears:
ora.storage

An offline resource can be started with, for example:
$GRID_HOME/bin/crsctl start res ora.crsd -init
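As a quick way to locate which link in the startup chain above is broken, the -init resource states can be swept in one pass. A minimal sketch, assuming crsctl lives under GRID_HOME (the /ocw/grid default is just the placeholder path used elsewhere in this note):

```shell
#!/bin/sh
# Sketch: sweep the state of the lower-stack (-init) resources.
# GRID_HOME default is a placeholder; point it at your real Grid home.
GRID_HOME=${GRID_HOME:-/ocw/grid}

check_init_resources() {
    crsctl="$GRID_HOME/bin/crsctl"
    if [ -x "$crsctl" ]; then
        "$crsctl" stat res -t -init
    else
        echo "crsctl not found under $GRID_HOME"
    fi
}

check_init_resources
```

Any resource shown OFFLINE here (other than ora.diskmon on 11.2.0.3+ non-Exadata systems) marks where to start reading logs.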
Issue 1: OHASD does not start
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
cat /etc/inittab|grep init.ohasd
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1
who -r
ps -ef|grep init.ohasd|grep -v grep
root 2279 1 0 18:14 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run

Note: Oracle Linux 6 (OL6) and Red Hat Enterprise Linux 6 (RHEL6) no longer use /etc/inittab, so init.ohasd is instead configured under /etc/init and started from there; either way, the process "/etc/init.d/init.ohasd run" should still be running.
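Whether init.ohasd is wired in through /etc/inittab or through Upstart can be determined with a short script. A sketch; the exact file names under /etc/init vary by release:

```shell
#!/bin/sh
# Sketch: determine how init.ohasd is configured to start on this node.
check_ohasd_entry() {
    if [ -f /etc/inittab ] && grep -q init.ohasd /etc/inittab 2>/dev/null; then
        echo "init.ohasd configured via /etc/inittab"
    elif ls /etc/init/*ohasd* >/dev/null 2>&1; then
        echo "init.ohasd configured via Upstart (/etc/init)"
    else
        echo "no init.ohasd entry found"
    fi
    # Independently, the runtime process should exist either way:
    ps -ef | grep '[i]nit.ohasd' || echo "init.ohasd process not running"
}

check_ohasd_entry
```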
cd /etc/init.d
nohup ./init.ohasd run &
$GRID_HOME/bin/crsctl enable crs
$GRID_HOME/bin/crsctl config crs
Feb 29 16:20:36 racnode1 logger: Oracle Cluster Ready Services startup disabled.
Feb 29 16:20:36 racnode1 logger: Could not access /var/opt/oracle/scls_scr/racnode1/root/ohasdstr
Jan 20 20:46:51 rac1 logger: Oracle HA daemon is enabled for autostart.
To see why the OS does not start "init.ohasd run" in a timely fashion, the ohasd startup script can be modified temporarily. The first change records a timestamp file whenever autostart fires; the second adds a delay before ohasd starts. Change:

case `$CAT $AUTOSTARTFILE` in
  enable*)
    $LOGERR "Oracle HA daemon is enabled for autostart."

to:

case `$CAT $AUTOSTARTFILE` in
  enable*)
    /bin/touch /tmp/ohasd.start."`date`"
    $LOGERR "Oracle HA daemon is enabled for autostart."

and change:

case `$CAT $AUTOSTARTFILE` in
  enable*)
    $LOGERR "Oracle HA daemon is enabled for autostart."

to:

case `$CAT $AUTOSTARTFILE` in
  enable*)
    /bin/sleep 120
    $LOGERR "Oracle HA daemon is enabled for autostart."
Jan 20 20:46:51 rac1 logger: Oracle HA daemon is enabled for autostart.
..
Jan 20 20:46:57 rac1 logger: exec /ocw/grid/perl/bin/perl -I/ocw/grid/perl/lib /ocw/grid/bin/crswrapexece.pl /ocw/grid/crs/install/s_crsconfig_rac1_env.txt /ocw/grid/bin/ohasd.bin "reboot"
ls -l $GRID_HOME/cdata/*.olr
-rw------- 1 root oinstall 272756736 Feb 2 18:20 rac1.olr
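Whether the OLR file is present can be confirmed, and a manual backup attempted, with a short check. A sketch, using the placeholder paths from this note; the $GRID_HOME default and the <hostname>.olr naming are assumptions to adapt:

```shell
#!/bin/sh
# Sketch: confirm the OLR file exists, then try a manual local backup.
GRID_HOME=${GRID_HOME:-/ocw/grid}

check_olr() {
    host=$(hostname -s 2>/dev/null || hostname)
    olr="$GRID_HOME/cdata/$host.olr"
    if [ -f "$olr" ]; then
        ls -l "$olr"
        # On a healthy node this should succeed (run as root):
        "$GRID_HOME/bin/ocrconfig" -local -manualbackup
    else
        echo "OLR not found: $olr"
    fi
}

check_olr
```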
..
2010-01-24 22:59:10.470: [ default][1373676464] Initializing OLR
2010-01-24 22:59:10.472: [ OCROSD][1373676464]utopen:6m':failed in stat OCR file/disk /ocw/grid/cdata/rac1.olr, errno=2, os err string=No such file or directory
2010-01-24 22:59:10.472: [ OCROSD][1373676464]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory
2010-01-24 22:59:10.473: [ OCRRAW][1373676464]proprinit: Could not open raw device
2010-01-24 22:59:10.473: [ OCRAPI][1373676464]a_init:16!: Backend init unsuccessful : [26]
2010-01-24 22:59:10.473: [ CRSOCR][1373676464] OCR context init failure. Error: PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
2010-01-24 22:59:10.473: [ default][1373676464] OLR initalization failured, rc=26
2010-01-24 22:59:10.474: [ default][1373676464]Created alert : (:OHAS00106:) : Failed to initialize Oracle Local Registry
2010-01-24 22:59:10.474: [ default][1373676464][PANIC] OHASD exiting; Could not init OLR
..
2010-01-24 23:01:46.275: [ OCROSD][1228334000]utread:3: Problem reading buffer 1907f000 buflen 4096 retval 0 phy_offset 102400 retry 5
2010-01-24 23:01:46.275: [ OCRRAW][1228334000]propriogid:1_1: Failed to read the whole bootblock. Assumes invalid format.
2010-01-24 23:01:46.275: [ OCRRAW][1228334000]proprioini: all disks are not OCR/OLR formatted
2010-01-24 23:01:46.275: [ OCRRAW][1228334000]proprinit: Could not open raw device
2010-01-24 23:01:46.275: [ OCRAPI][1228334000]a_init:16!: Backend init unsuccessful : [26]
2010-01-24 23:01:46.276: [ CRSOCR][1228334000] OCR context init failure. Error: PROCL-26: Error while accessing the physical storage
2010-01-24 23:01:46.276: [ default][1228334000] OLR initalization failured, rc=26
2010-01-24 23:01:46.276: [ default][1228334000]Created alert : (:OHAS00106:) : Failed to initialize Oracle Local Registry
2010-01-24 23:01:46.277: [ default][1228334000][PANIC] OHASD exiting; Could not init OLR
..
2010-11-07 03:00:08.932: [ default][1] Created alert : (:OHAS00102:) : OHASD is not running as privileged user
2010-11-07 03:00:08.932: [ default][1][PANIC] OHASD exiting: must be run as privileged user
ohasd.bin comes up, but "crsctl stat res -t -init" shows no resources and "ocrconfig -local -manualbackup" fails
..
2010-08-04 13:13:11.102: [ CRSPE][35] Resources parsed
2010-08-04 13:13:11.103: [ CRSPE][35] Server [] has been registered with the PE data model
2010-08-04 13:13:11.103: [ CRSPE][35] STARTUPCMD_REQ = false:
2010-08-04 13:13:11.103: [ CRSPE][35] Server [] has changed state from [Invalid/unitialized] to [VISIBLE]
2010-08-04 13:13:11.103: [ CRSOCR][31] Multi Write Batch processing...
2010-08-04 13:13:11.103: [ default][35] Dump State Starting ...
..
2010-08-04 13:13:11.112: [ CRSPE][35] SERVERS: :VISIBLE:address{{Absolute|Node:0|Process:-1|Type:1}}; recovered state:VISIBLE. Assigned to no pool
------------- SERVER POOLS:
Free [min:0][max:-1][importance:0] NO SERVERS ASSIGNED
2010-08-04 13:13:11.113: [ CRSPE][35] Dumping ICE contents...:ICE operation count: 0
2010-08-04 13:13:11.113: [ default][35] Dump State Done.
2010-06-29 10:31:01.570: [ COMMCRS][1206901056]clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=procr_local_conn_0_PROL))
2010-06-29 10:31:01.571: [ OCRSRV][1217390912] th_listen: CLSCLISTEN failed clsc_ret= 3, addr= [(ADDRESS=(PROTOCOL=ipc)(KEY=procr_local_conn_0_PROL))]
2010-06-29 10:31:01.571: [ OCRSRV][3267002960]th_init: Local listener did not reach valid state
Feb 20 10:47:08 racnode1 OHASD[9566]: OHASD exiting; Directory /ocw/grid/log/racnode1/ohasd not found.
..
15058/1: 0.1995 close(2147483646) Err#9 EBADF
15058/1: 0.1996 close(2147483645) Err#9 EBADF
..
_close
sclssutl_closefiledescriptors
main
..
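The pattern above (a daemon closing file descriptors counting down from 2147483646) suggests the process inherited an extremely high, or unlimited, open-file limit and is spending its startup time in a close() loop. The limits the daemon would inherit can be inspected with ulimit; a sketch:

```shell
#!/bin/sh
# Sketch: inspect the open-file limits that ohasd would inherit.
check_fd_limits() {
    soft=$(ulimit -Sn)
    hard=$(ulimit -Hn)
    echo "open files: soft=$soft hard=$hard"
    if [ "$hard" = "unlimited" ]; then
        echo "WARNING: hard nofile limit is unlimited; a daemon that closes all descriptors at startup may loop for a very long time"
    fi
}

check_fd_limits
```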
12. ohasd.bin starts normally, but "crsctl check crs" shows only the following line:
CRS-4638: Oracle High Availability Services is online
and "crsctl stat res -p -init" returns nothing. This is caused by a corrupted OLR; refer to note 1193643.1 to restore it.

13. If ohasd still fails to start, review the ohasd log.

Issue 2: OHASD agents do not start
2011-05-03 11:11:13.189
[ohasd(25303)] CRS-5828:Could not start agent '/ocw/grid/bin/orarootagent_grid'. Details at (:CRSAGF00130:) {0:0:2} in /ocw/grid/log/racnode1/ohasd/ohasd.log.
2011-05-03 12:03:17.491: [ AGFW][1117866336] {0:0:184} Created alert : (:CRSAGF00130:) : Failed to start the agent /ocw/grid/bin/orarootagent_grid
2011-05-03 12:03:17.491: [ AGFW][1117866336] {0:0:184} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.diskmon 1 1] ID 4098:403
2011-05-03 12:03:17.491: [ AGFW][1117866336] {0:0:184} Can not stop the agent: /ocw/grid/bin/orarootagent_grid because pid is not initialized
..
2011-05-03 12:03:17.492: [ CRSPE][1128372576] {0:0:184} Fatal Error from AGFW Proxy: Unable to start the agent process
2011-05-03 12:03:17.492: [ CRSPE][1128372576] {0:0:184} CRS-2674: Start of 'ora.diskmon' on 'racnode1' failed
..
2011-06-27 22:34:57.805: [ AGFW][1131669824] {0:0:2} Created alert : (:CRSAGF00123:) : Failed to start the agent process: /ocw/grid/bin/cssdagent Category: -1 Operation: fail Loc: canexec2 OS error: 0 Other : no exe permission, file [/ocw/grid/bin/cssdagent]
2011-06-27 22:34:57.805: [ AGFW][1131669824] {0:0:2} Created alert : (:CRSAGF00126:) : Agent start failed
..
2011-06-27 22:34:57.806: [ AGFW][1131669824] {0:0:2} Created alert : (:CRSAGF00123:) : Failed to start the agent process: /ocw/grid/bin/cssdmonitor Category: -1 Operation: fail Loc: canexec2 OS error: 0 Other : no exe permission, file [/ocw/grid/bin/cssdmonitor]
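The "no exe permission" alerts above can be confirmed directly against the agent binaries. A sketch; the list of binaries is illustrative rather than exhaustive, and $GRID_HOME is the placeholder path from this note:

```shell
#!/bin/sh
# Sketch: verify the OHASD agent binaries are present and executable.
GRID_HOME=${GRID_HOME:-/ocw/grid}

check_agents() {
    for bin in orarootagent oraagent cssdagent cssdmonitor; do
        f="$GRID_HOME/bin/$bin"
        if [ -x "$f" ]; then
            ls -l "$f"
        else
            echo "missing or not executable: $f"
        fi
    done
}

check_agents
```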
Issue 3: OCSSD.BIN does not start
2010-02-02 18:00:16.251: [ GPnP][408926240]clsgpnpm_exchange: [at clsgpnpm.c:1175] Calling "ipc://GPNPD_rac1", try 4 of 500...
2010-02-02 18:00:16.263: [ GPnP][408926240]clsgpnp_profileVerifyForCall: [at clsgpnp.c:1867] Result: (87) CLSGPNP_SIG_VALPEER. Profile verified. prf=0x165160d0
2010-02-02 18:00:16.263: [ GPnP][408926240]clsgpnp_profileGetSequenceRef: [at clsgpnp.c:841] Result: (0) CLSGPNP_OK. seq of p=0x165160d0 is '6'=6
2010-02-02 18:00:16.263: [ GPnP][408926240]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2186] Result: (0) CLSGPNP_OK. Successful get-profile CALL to remote "ipc://GPNPD_rac1" disco ""
2010-02-03 22:26:17.057: [ GPnP][3852126240]clsgpnpm_connect: [at clsgpnpm.c:1100] GIPC gipcretConnectionRefused (29) gipcConnect(ipc-ipc://GPNPD_rac1)
2010-02-03 22:26:17.057: [ GPnP][3852126240]clsgpnpm_connect: [at clsgpnpm.c:1101] Result: (48) CLSGPNP_COMM_ERR. Failed to connect to call url "ipc://GPNPD_rac1"
2010-02-03 22:26:17.057: [ GPnP][3852126240]clsgpnp_getProfileEx: [at clsgpnp.c:546] Result: (13) CLSGPNP_NO_DAEMON. Can't get GPnP service profile from local GPnP daemon
2010-02-03 22:26:17.057: [ default][3852126240]Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2010-02-03 22:26:17.057: [ CSSD][3852126240] clsgpnp_getProfile failed, rc(13)
2010-02-03 22:37:22.212: [ CSSD][2330355744]clssnmReadDiscoveryProfile: voting file discovery string(/share/storage/di*)
..
2010-02-03 22:37:22.227: [ CSSD][1145538880] clssnmvDiskVerify: Successful discovery of 0 disks
2010-02-03 22:37:22.227: [ CSSD][1145538880]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2010-02-03 22:37:22.227: [ CSSD][1145538880]clssnmvFindInitialConfigs: No voting files found
2010-02-03 22:37:22.228: [ CSSD][1145538880]###################################
2010-02-03 22:37:22.228: [ CSSD][1145538880]clssscExit: CSSD signal 11 in thread clssnmvDDiscThread
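When CSSD finds no voting files, compare what the cluster thinks its voting files are against what the discovery string actually matches. A sketch; the /share/storage/di* pattern is taken from the log above and must be replaced with your own discovery string:

```shell
#!/bin/sh
# Sketch: list configured voting files and check the discovery string
# still matches devices. Paths below are placeholders from this note.
GRID_HOME=${GRID_HOME:-/ocw/grid}

check_votedisks() {
    crsctl="$GRID_HOME/bin/crsctl"
    if [ -x "$crsctl" ]; then
        "$crsctl" query css votedisk
    else
        echo "crsctl not found under $GRID_HOME"
    fi
    # The discovery string itself (e.g. /share/storage/di*) must match devices:
    ls -l /share/storage/di* 2>/dev/null || echo "no devices match the discovery string"
}

check_votedisks
```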
2010-05-02 03:11:19.033: [ CSSD][1197668093]clssnmCompleteInitVFDiscovery: Detected voting file add in progress for CIN 0:1134513465:0, waiting for configuration to complete 0:1134513098:0
2010-02-03 23:26:25.804: [GIPCXCPT][1206540320]gipcmodGipcPassInitializeNetwork: failed to find any interfaces in clsinet, ret gipcretFail (1)
2010-02-03 23:26:25.804: [GIPCGMOD][1206540320]gipcmodGipcPassInitializeNetwork: EXCEPTION[ ret gipcretFail (1) ] failed to determine host from clsinet, using default
..
2010-02-03 23:26:25.810: [ CSSD][1206540320]clsssclsnrsetup: gipcEndpoint failed, rc 39
2010-02-03 23:26:25.811: [ CSSD][1206540320]clssnmOpenGIPCEndp: failed to listen on gipc addr gipc://rac1:nm_eotcs- ret 39
2010-02-03 23:26:25.811: [ CSSD][1206540320]clssscmain: failed to open gipc endp
2010-09-20 11:52:54.014: [ CSSD][1103055168]clssnmvDHBValidateNCopy: node 1, racnode1, has a disk HB, but no network HB, DHB has rcfg 180441784, wrtcnt, 453, LATS 328297844, lastSeqNo 452, uniqueness 1284979488, timestamp 1284979973/329344894
2010-09-20 11:52:54.016: [ CSSD][1078421824]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
..   >>>> after a long delay
2010-09-20 12:02:39.578: [ CSSD][1103055168]clssnmvDHBValidateNCopy: node 1, racnode1, has a disk HB, but no network HB, DHB has rcfg 180441784, wrtcnt, 1037, LATS 328883434, lastSeqNo 1036, uniqueness 1284979488, timestamp 1284980558/329930254
2010-09-20 12:02:39.895: [ CSSD][1107286336]clssgmExecuteClientRequest: MAINT recvd from proc 2 (0xe1ad870)
2010-09-20 12:02:39.895: [ CSSD][1107286336]clssgmShutDown: Received abortive shutdown request from client.
2010-09-20 12:02:39.895: [ CSSD][1107286336]###################################
2010-09-20 12:02:39.895: [ CSSD][1107286336]clssscExit: CSSD aborting from thread GMClientListener
2010-09-20 12:02:39.895: [ CSSD][1107286336]###################################
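"Has a disk HB, but no network HB" means the nodes still share the voting disk but cannot communicate over the private interconnect. Basic reachability can be probed per peer; a sketch, where racnode2-priv is an assumed private hostname to substitute for your own:

```shell
#!/bin/sh
# Sketch: probe private-interconnect connectivity to a peer node.
# PEER is an assumed private hostname; substitute your own.
PEER=${PEER:-racnode2-priv}

check_interconnect() {
    if command -v ping >/dev/null 2>&1; then
        if ping -c 2 -w 5 "$PEER" >/dev/null 2>&1; then
            echo "interconnect ping to $PEER: OK"
        else
            echo "interconnect ping to $PEER: FAILED"
        fi
    else
        echo "ping utility not available"
    fi
}

check_interconnect
```

A successful ping is necessary but not sufficient; firewall rules and multicast issues (see note 1564555.1 in the references) can still block the network heartbeat.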
$GRID_HOME/bin/lsnodes -n
racnode1    1
racnode1    0
2010-08-30 18:28:13.207: [ CSSD][36]clssnm_skgxninit: skgxncin failed, will retry
2010-08-30 18:28:14.207: [ CSSD][36]clssnm_skgxnmon: skgxn init failed
2010-08-30 18:28:14.208: [ CSSD][36]###################################
2010-08-30 18:28:14.208: [ CSSD][36]clssscExit: CSSD signal 11 in thread skgxnmon
$INSTALL_SOURCE/install/lsnodes -v
5. Running "crsctl" from the wrong GRID_HOME
2012-11-14 10:21:44.014: [ CSSD][1086675264]ASSERT clssnm1.c 3248
2012-11-14 10:21:44.014: [ CSSD][1086675264](:CSSNM00056:)clssnmvStartDiscovery: Terminating because of the release version(11.2.0.2.0) of this node being lesser than the active version(11.2.0.3.0) that the cluster is at
2012-11-14 10:21:44.014: [ CSSD][1086675264]###################################
2012-11-14 10:21:44.014: [ CSSD][1086675264]clssscExit: CSSD aborting from thread clssnmvDDiscThread#
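The release/active version mismatch in the log can be confirmed by printing both versions on the affected node. A sketch, with the usual placeholder GRID_HOME:

```shell
#!/bin/sh
# Sketch: compare this node's release version with the cluster's
# active version. GRID_HOME default is a placeholder path.
GRID_HOME=${GRID_HOME:-/ocw/grid}

check_versions() {
    crsctl="$GRID_HOME/bin/crsctl"
    if [ -x "$crsctl" ]; then
        "$crsctl" query crs releaseversion
        "$crsctl" query crs activeversion
    else
        echo "crsctl not found under $GRID_HOME"
    fi
}

check_versions
```

As the log above shows, a node whose release version is lower than the cluster's active version will not be allowed to join.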
Issue 4: CRSD.BIN does not start
2010-02-03 22:37:51.638: [ CSSCLNT][1548456880]clssscConnect: gipc request failed with 29 (0x16)
2010-02-03 22:37:51.638: [CSSCLNT][1548456880]clsssInitNative: connect failed, rc 29
2010-02-03 22:37:51.639: [ CRSRTI][1548456880] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2010-02-03 22:22:55.186: [ OCRASM][2603807664]proprasmo: Error in open/create file in dg [GI]
[ OCRASM][2603807664]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
2010-02-03 22:22:55.189: [ OCRASM][2603807664]proprasmo: kgfoCheckMount returned [7]
2010-02-03 22:22:55.189: [ OCRASM][2603807664]proprasmo: The ASM instance is down
2010-02-03 22:22:55.190: [ OCRRAW][2603807664]proprioo: Failed to open [+GI]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2010-02-03 22:22:55.190: [ OCRRAW][2603807664]proprioo: No OCR/OLR devices are usable
2010-02-03 22:22:55.190: [ OCRASM][2603807664]proprasmcl: asmhandle is NULL
2010-02-03 22:22:55.190: [ OCRRAW][2603807664]proprinit: Could not open raw device
2010-02-03 22:22:55.190: [ OCRASM][2603807664]proprasmcl: asmhandle is NULL
2010-02-03 22:22:55.190: [ OCRAPI][2603807664]a_init:16!: Backend init unsuccessful : [26]
2010-02-03 22:22:55.190: [ CRSOCR][2603807664] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge ORA-15077: could not locate ASM instance serving a required diskgroup] [7]
2010-02-03 22:22:55.190: [ CRSD][2603807664][PANIC] CRSD exiting: Could not init OCR, code: 26
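ORA-15077 above means no ASM instance is serving the diskgroup that holds the OCR. Whether a local ASM instance is up at all can be checked from the process list; a sketch:

```shell
#!/bin/sh
# Sketch: check whether a local ASM instance is running by looking
# for its pmon process.
check_asm() {
    if ps -ef 2>/dev/null | grep '[a]sm_pmon' >/dev/null; then
        echo "ASM instance is running:"
        ps -ef | grep '[a]sm_pmon'
    else
        echo "no ASM pmon process found; start ASM and mount the OCR diskgroup first"
    fi
}

check_asm
```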
2010-02-03 23:14:33.583: [ OCROSD][2346668976]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory
2010-02-03 23:14:33.583: [ OCRRAW][2346668976]proprinit: Could not open raw device
2010-02-03 23:14:33.583: [ default][2346668976]a_init:7!: Backend init unsuccessful : [26]
2010-02-03 23:14:34.587: [ OCROSD][2346668976]utopen:6m':failed in stat OCR file/disk /share/storage/ocr, errno=2, os err string=No such file or directory
2010-02-03 23:14:34.587: [ OCROSD][2346668976]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory
2010-02-03 23:14:34.587: [ OCRRAW][2346668976]proprinit: Could not open raw device
2010-02-03 23:14:34.587: [ default][2346668976]a_init:7!: Backend init unsuccessful : [26]
2010-02-03 23:14:35.589: [ CRSD][2346668976][PANIC] CRSD exiting: OCR device cannot be initialized, error: 1:26
2010-02-03 23:19:38.417: [ default][3360863152]a_init:7!: Backend init unsuccessful : [26]
2010-02-03 23:19:39.429: [ OCRRAW][3360863152]propriogid:1_2: INVALID FORMAT
2010-02-03 23:19:39.429: [ OCRRAW][3360863152]proprioini: all disks are not OCR/OLR formatted
2010-02-03 23:19:39.429: [ OCRRAW][3360863152]proprinit: Could not open raw device
2010-02-03 23:19:39.429: [ default][3360863152]a_init:7!: Backend init unsuccessful : [26]
2010-02-03 23:19:40.432: [ CRSD][3360863152][PANIC] CRSD exiting: OCR device cannot be initialized, error: 1:26
2010-03-10 11:45:12.510: [ OCRASM][611467760]proprasmo: Error in open/create file in dg [SYSTEMDG]
[ OCRASM][611467760]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=1031, loc=kgfokge
ORA-01031: insufficient privileges
2010-03-10 11:45:12.528: [ OCRASM][611467760]proprasmo: kgfoCheckMount returned [7]
2010-03-10 11:45:12.529: [ OCRASM][611467760]proprasmo: The ASM instance is down
2010-03-10 11:45:12.529: [ OCRRAW][611467760]proprioo: Failed to open [+SYSTEMDG]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2010-03-10 11:45:12.529: [ OCRRAW][611467760]proprioo: No OCR/OLR devices are usable
2010-03-10 11:45:12.529: [ OCRASM][611467760]proprasmcl: asmhandle is NULL
2010-03-10 11:45:12.529: [ OCRRAW][611467760]proprinit: Could not open raw device
2010-03-10 11:45:12.529: [ OCRASM][611467760]proprasmcl: asmhandle is NULL
2010-03-10 11:45:12.529: [ OCRAPI][611467760]a_init:16!: Backend init unsuccessful : [26]
2010-03-10 11:45:12.530: [ CRSOCR][611467760] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=1031, loc=kgfokge ORA-01031: insufficient privileges] [7]
2012-03-04 21:34:23.139: [ OCRASM][3301265904]proprasmo: Error in open/create file in dg [OCR]
[ OCRASM][3301265904]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=12547, loc=kgfokge
2012-03-04 21:34:23.139: [ OCRASM][3301265904]ASM Error Stack : ORA-12547: TNS:lost contact
2012-03-04 21:34:23.633: [ OCRASM][3301265904]proprasmo: kgfoCheckMount returned [7]
2012-03-04 21:34:23.633: [ OCRASM][3301265904]proprasmo: The ASM instance is down
2012-03-04 21:34:23.634: [ OCRRAW][3301265904]proprioo: Failed to open [+OCR]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2012-03-04 21:34:23.634: [ OCRRAW][3301265904]proprioo: No OCR/OLR devices are usable
2012-03-04 21:34:23.635: [ OCRASM][3301265904]proprasmcl: asmhandle is NULL
2012-03-04 21:34:23.636: [ GIPC][3301265904] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5326]
2012-03-04 21:34:23.639: [ default][3301265904]clsvactversion:4: Retrieving Active Version from local storage.
2012-03-04 21:34:23.643: [ OCRRAW][3301265904]proprrepauto: The local OCR configuration matches with the configuration published by OCR Cache Writer. No repair required.
2012-03-04 21:34:23.645: [ OCRRAW][3301265904]proprinit: Could not open raw device
2012-03-04 21:34:23.646: [ OCRASM][3301265904]proprasmcl: asmhandle is NULL
2012-03-04 21:34:23.650: [ OCRAPI][3301265904]a_init:16!: Backend init unsuccessful : [26]
2012-03-04 21:34:23.651: [ CRSOCR][3301265904] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ORA-12547: TNS:lost contact
2012-03-04 21:34:23.652: [ CRSMAIN][3301265904] Created alert : (:CRSD00111:) : Could not init OCR, error: PROC-26: Error while accessing the physical storage ORA-12547: TNS:lost contact
2012-03-04 21:34:23.652: [ CRSD][3301265904][PANIC] CRSD exiting: Could not init OCR, code: 26
-rwsr-s--x 1 grid oinstall 184431149 Feb 2 20:37 /ocw/grid/bin/oracle
2010-05-11 11:16:38.578: [ OCRASM][18]proprasmo: Error in open/create file in dg [OCRMIR]
[ OCRASM][18]SLOS : SLOS: cat=8, opn=kgfoOpenFile01, dep=15056, loc=kgfokge
ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +OCRMIR.255.4294967295
ORA-17503: ksfdopn:2 Failed to open file +OCRMIR.255.4294967295
ORA-15001: diskgroup "OCRMIR
..
2010-05-11 11:16:38.647: [ OCRASM][18]proprasmo: kgfoCheckMount returned [6]
2010-05-11 11:16:38.648: [ OCRASM][18]proprasmo: The ASM disk group OCRMIR is not found or not mounted
2010-05-11 11:16:38.648: [ OCRASM][18]proprasmdvch: Failed to open OCR location [+OCRMIR] error [26]
2010-05-11 11:16:38.648: [ OCRRAW][18]propriodvch: Error [8] returned device check for [+OCRMIR]
2010-05-11 11:16:38.648: [ OCRRAW][18]dev_replace: non-master could not verify the new disk (8)
[ OCRSRV][18]proath_invalidate_action: Failed to replace [+OCRMIR] [8]
[ OCRAPI][18]procr_ctx_set_invalid_no_abort: ctx set to invalid
..
2010-05-11 11:16:46.587: [ OCRMAS][19]th_master:91: Comparing device hash ids between local and master failed
2010-05-11 11:16:46.587: [ OCRMAS][19]th_master:91 Local dev (1862408427, 1028247821, 0, 0, 0)
2010-05-11 11:16:46.587: [ OCRMAS][19]th_master:91 Master dev (1862408427, 1859478705, 0, 0, 0)
2010-05-11 11:16:46.587: [ OCRMAS][19]th_master:9: Shutdown CacheLocal. my hash ids don't match
[ OCRAPI][19]procr_ctx_set_invalid_no_abort: ctx set to invalid
[ OCRAPI][19]procr_ctx_set_invalid: aborting...
2010-05-11 11:16:46.587: [ CRSD][19] Dump State Starting ...
2010-02-14 17:40:57.927: [ora.crsd][1243486528] [check] PID FILE doesn't exist.
..
2010-02-14 17:41:57.927: [ clsdmt][1092499776]Creating PID [30269] file for home /ocw/grid host racnode1 bin crs to /ocw/grid/crs/init/
2010-02-14 17:41:57.927: [ clsdmt][1092499776]Error3 -2 writing PID [30269] to the file []
2010-02-14 17:41:57.927: [ clsdmt][1092499776]Failed to record pid for CRSD
2010-02-14 17:41:57.927: [ clsdmt][1092499776]Terminating process
2010-02-14 17:41:57.927: [ default][1092499776] CRSD exiting on stop request from clsdms_thdmai
2011-04-06 15:53:38.777: [ora.crsd][1160390976] [check] PID will be looked for in /ocw/grid/crs/init/racnode1.pid
2011-04-06 15:53:38.778: [ora.crsd][1160390976] [check] PID which will be monitored will be 1535       >> 1535 is output of "cat /ocw/grid/crs/init/racnode1.pid"
2011-04-06 15:53:38.965: [ COMMCRS][1191860544]clsc_connect: (0x2aaab400b0b0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=racnode1DBG_CRSD))
[ clsdmc][1160390976]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=racnode1DBG_CRSD)) with status 9
2011-04-06 15:53:38.966: [ora.crsd][1160390976] [check] Error = error 9 encountered when connecting to CRSD
2011-04-06 15:53:39.023: [ora.crsd][1160390976] [check] Calling PID check for daemon
2011-04-06 15:53:39.023: [ora.crsd][1160390976] [check] Trying to check PID = 1535
2011-04-06 15:53:39.203: [ora.crsd][1160390976] [check] PID check returned ONLINE CLSDM returned OFFLINE
2011-04-06 15:53:39.203: [ora.crsd][1160390976] [check] DaemonAgent::check returned 5
2011-04-06 15:53:39.203: [ AGFW][1160390976] check for resource: ora.crsd 1 1 completed with status: FAILED
2011-04-06 15:53:39.203: [ AGFW][1170880832] ora.crsd 1 1 state changed from: UNKNOWN to: FAILED
..
2011-04-06 15:54:10.511: [ AGFW][1167522112] ora.crsd 1 1 state changed from: UNKNOWN to: CLEANING
..
2011-04-06 15:54:10.513: [ora.crsd][1146542400] [clean] Trying to stop PID = 1535
..
2011-04-06 15:54:11.514: [ora.crsd][1146542400] [clean] Trying to check PID = 1535
ls -l /ocw/grid/crs/init/*pid
-rwxr-xr-x 1 ogrid oinstall 5 Feb 17 11:00 /ocw/grid/crs/init/racnode1.pid

cat /ocw/grid/crs/init/*pid
1535

ps -ef| grep 1535
root 1535 1 0 Mar30 ? 00:00:00 iscsid

>> Note: process 1535 is not crsd.bin
# > $GRID_HOME/crs/init/
# $GRID_HOME/bin/crsctl stop res ora.crsd -init
# $GRID_HOME/bin/crsctl start res ora.crsd -init
2010-02-03 23:34:28.412: [ GPnP][2235814832]clsgpnp_Init: [at clsgpnp0.c:837] GPnP client pid=867, tl=3, f=0
2010-02-03 23:34:28.428: [ OCRAPI][2235814832]clsu_get_private_ip_addresses: no ip addresses found.
..
2010-02-03 23:34:28.434: [ OCRAPI][2235814832]a_init:13!: Clusterware init unsuccessful : [44]
2010-02-03 23:34:28.434: [ CRSOCR][2235814832] OCR context init failure. Error: PROC-44: Error in network address and interface operations Network address and interface operations error [7]
2010-02-03 23:34:28.434: [ CRSD][2235814832][PANIC] CRSD exiting: Could not init OCR, code: 44
2009-12-10 06:28:31.974: [ OCRMAS][20]proath_connect_master:1: could not connect to master clsc_ret1 = 9, clsc_ret2 = 9
2009-12-10 06:28:31.974: [ OCRMAS][20]th_master:11: Could not connect to the new master
2009-12-10 06:29:01.450: [ CRSMAIN][2] Policy Engine is not initialized yet!
2009-12-10 06:29:31.489: [ CRSMAIN][2] Policy Engine is not initialized yet!
2009-12-31 00:42:08.110: [ COMMCRS][10]clsc_receive: (102b03250) Error receiving, ns (12535, 12560), transport (505, 145, 0)
Issue 5: GPNPD.BIN does not start

1. Name resolution is not working
2010-05-13 12:48:11.540: [ GPnP][1171126592]clsgpnpm_exchange: [at clsgpnpm.c:1175] Calling "tcp://node2:9393", try 1 of 3...
2010-05-13 12:48:11.540: [ GPnP][1171126592]clsgpnpm_connect: [at clsgpnpm.c:1015] ENTRY
2010-05-13 12:48:11.541: [ GPnP][1171126592]clsgpnpm_connect: [at clsgpnpm.c:1066] GIPC gipcretFail (1) gipcConnect(tcp-tcp://node2:9393)
2010-05-13 12:48:11.541: [ GPnP][1171126592]clsgpnpm_connect: [at clsgpnpm.c:1067] Result: (48) CLSGPNP_COMM_ERR. Failed to connect to call url "tcp://node2:9393"
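GPnP peer calls such as "tcp://node2:9393" depend on the node name resolving consistently. The basics can be checked per name; a sketch (node2 is taken from the log above, and see note 1054902.1 in the references for the full validation procedure):

```shell
#!/bin/sh
# Sketch: basic name-resolution checks for a cluster node name.
NODE=${NODE:-node2}

check_name_resolution() {
    # /etc/hosts entry, if any
    grep -w "$NODE" /etc/hosts 2>/dev/null || echo "$NODE not in /etc/hosts"
    # DNS / NSS lookup, using whichever tool is available
    if command -v getent >/dev/null 2>&1; then
        getent hosts "$NODE" || echo "$NODE does not resolve via getent"
    elif command -v nslookup >/dev/null 2>&1; then
        nslookup "$NODE" || echo "$NODE does not resolve via nslookup"
    else
        echo "no lookup tool available"
    fi
}

check_name_resolution
```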
Issue 6: Other daemons fail to start. Common causes:
2010-02-02 12:55:20.485: [ COMMCRS][1121433920]
clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_GIPCD))
2010-02-02 12:55:20.485: [ clsdmt][1110944064]Fail to listen to (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_GIPCD))
2012-07-22 00:15:16.565: [ default][1]clsvactversion:4: Retrieving Active Version from local storage.
2012-07-22 00:15:16.575: [ CTSS][1]clsctss_r_av3: Invalid active version [] retrieved from OLR. Returns [19].
2012-07-22 00:15:16.585: [ CTSS][1](:ctss_init16:): Error [19] retrieving active version. Returns [19].
2012-07-22 00:15:16.585: [ CTSS][1]ctss_main: CTSS init failed [19]
2012-07-22 00:15:16.585: [ CTSS][1]ctss_main: CTSS daemon aborting [19].
2012-07-22 00:15:16.585: [ CTSS][1]CTSS daemon aborting

Issue 7: CRSD agents do not start
$GRID_HOME/bin/crsctl stat res -t
Issue 8: HAIP does not start

HAIP can fail to start for many reasons, for example:
[ohasd(891)]CRS-2807:Resource 'ora.cluster_interconnect.haip' failed to start automatically.
Refer to note 1210883.1 for more information about HAIP.

Network and Name Resolution Verification
Log File Location, Ownership and Permissions
In a Grid Infrastructure environment: assume a Grid Infrastructure environment where the node name is rac1, the CRS owner is grid, and there are two separate RDBMS owners, rdbmsap and rdbmsar. The following are the normal settings under $GRID_HOME/log:
Note that most sub-directories inherit the ownership and permissions of their parent directory; the above is only a reference for judging whether a recursive ownership or permission change has occurred in the CRS home. If a working node of the same version is available, use it as the reference. In an Oracle Restart environment: the following shows the permission and ownership settings of $GRID_HOME/log in an Oracle Restart environment:
Network Socket File Location, Ownership and Permissions
2011-06-18 14:07:28.545: [ COMMCRS][772]clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=racnode1DBG_EVMD))
2011-06-18 14:07:28.545: [ clsdmt][515]Fail to listen to (ADDRESS=(PROTOCOL=ipc)(KEY=lena042DBG_EVMD))
2011-06-18 14:07:28.545: [ clsdmt][515]Terminating process
2011-06-18 14:07:28.559: [ default][515] EVMD exiting on stop request from clsdms_thdmai
CRS-5017: The resource action "ora.evmd start" encountered the following error:
CRS-2674: Start of 'ora.evmd' on 'racnode1' failed ..
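When socket files under /var/tmp/.oracle (or /tmp/.oracle, depending on platform) are stale or carry the wrong ownership, a commonly used recovery is to remove them while the entire stack is down on that node so they are recreated cleanly at the next startup. A minimal sketch that defaults to a dry run; verify against My Oracle Support guidance for your version before running it for real:

```shell
#!/bin/sh
# Sketch: clear stale Clusterware socket files. Run as root, and ONLY with
# the whole GI stack stopped on this node; files are recreated at startup.
# DRYRUN=1 (the default) only reports what would be removed.
SOCKET_DIRS="/var/tmp/.oracle /tmp/.oracle"
DRYRUN=${DRYRUN:-1}

clear_socket_files() {
    for d in $SOCKET_DIRS; do
        if [ -d "$d" ]; then
            if [ "$DRYRUN" = 1 ]; then
                echo "would remove socket files under $d"
            else
                rm -rf "$d"
                echo "removed $d"
            fi
        else
            echo "$d does not exist, nothing to do"
        fi
    done
}

clear_socket_files
```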
In a Grid Infrastructure cluster environment, an example follows:
drwxrwxrwt 2 root oinstall 4096 Feb 2 21:25 .oracle
./.oracle:
drwxrwxrwt 2 root oinstall 4096 Feb 2 21:25 .
srwxrwx--- 1 grid oinstall 0 Feb 2 18:00 master_diskmon
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:00 mdnsd
-rw-r--r-- 1 grid oinstall 5 Feb 2 18:00 mdnsd.pid
prw-r--r-- 1 root root     0 Feb 2 13:33 npohasd
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:00 ora_gipc_GPNPD_rac1
-rw-r--r-- 1 grid oinstall 0 Feb 2 13:34 ora_gipc_GPNPD_rac1_lock
srwxrwxrwx 1 grid oinstall 0 Feb 2 13:39 s#11724.1
srwxrwxrwx 1 grid oinstall 0 Feb 2 13:39 s#11724.2
srwxrwxrwx 1 grid oinstall 0 Feb 2 13:39 s#11735.1
srwxrwxrwx 1 grid oinstall 0 Feb 2 13:39 s#11735.2
srwxrwxrwx 1 grid oinstall 0 Feb 2 13:45 s#12339.1
srwxrwxrwx 1 grid oinstall 0 Feb 2 13:45 s#12339.2
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:01 s#6275.1
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:01 s#6275.2
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:01 s#6276.1
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:01 s#6276.2
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:01 s#6278.1
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:01 s#6278.2
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:00 sAevm
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:00 sCevm
srwxrwxrwx 1 root root     0 Feb 2 18:01 sCRSD_IPC_SOCKET_11
srwxrwxrwx 1 root root     0 Feb 2 18:01 sCRSD_UI_SOCKET
srwxrwxrwx 1 root root     0 Feb 2 21:25 srac1DBG_CRSD
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:00 srac1DBG_CSSD
srwxrwxrwx 1 root root     0 Feb 2 18:00 srac1DBG_CTSSD
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:00 srac1DBG_EVMD
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:00 srac1DBG_GIPCD
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:00 srac1DBG_GPNPD
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:00 srac1DBG_MDNSD
srwxrwxrwx 1 root root     0 Feb 2 18:00 srac1DBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:01 sLISTENER
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:01 sLISTENER_SCAN2
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:01 sLISTENER_SCAN3
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:00 sOCSSD_LL_rac1_
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:00 sOCSSD_LL_rac1_eotcs
-rw-r--r-- 1 grid oinstall 0 Feb 2 18:00 sOCSSD_LL_rac1_eotcs_lock
-rw-r--r-- 1 grid oinstall 0 Feb 2 18:00 sOCSSD_LL_rac1__lock
srwxrwxrwx 1 root root     0 Feb 2 18:00 sOHASD_IPC_SOCKET_11
srwxrwxrwx 1 root root     0 Feb 2 18:00 sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:00 sOracle_CSS_LclLstnr_eotcs_1
-rw-r--r-- 1 grid oinstall 0 Feb 2 18:00 sOracle_CSS_LclLstnr_eotcs_1_lock
srwxrwxrwx 1 root root     0 Feb 2 18:01 sora_crsqs
srwxrwxrwx 1 root root     0 Feb 2 18:00 sprocr_local_conn_0_PROC
srwxrwxrwx 1 root root     0 Feb 2 18:00 sprocr_local_conn_0_PROL
srwxrwxrwx 1 grid oinstall 0 Feb 2 18:00 sSYSTEM.evm.acceptor.auth
In an Oracle Restart environment, an example of the output follows:
drwxrwxrwt 2 root oinstall 4096 Feb 2 21:25 .oracle
./.oracle:
srwxrwx--- 1 grid oinstall 0 Aug  1 17:23 master_diskmon
prw-r--r-- 1 grid oinstall 0 Oct 31  2009 npohasd
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 s#14478.1
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 s#14478.2
srwxrwxrwx 1 grid oinstall 0 Jul 14 08:02 s#2266.1
srwxrwxrwx 1 grid oinstall 0 Jul 14 08:02 s#2266.2
srwxrwxrwx 1 grid oinstall 0 Jul  7 10:59 s#2269.1
srwxrwxrwx 1 grid oinstall 0 Jul  7 10:59 s#2269.2
srwxrwxrwx 1 grid oinstall 0 Jul 31 22:10 s#2313.1
srwxrwxrwx 1 grid oinstall 0 Jul 31 22:10 s#2313.2
srwxrwxrwx 1 grid oinstall 0 Jun 29 21:58 s#2851.1
srwxrwxrwx 1 grid oinstall 0 Jun 29 21:58 s#2851.2
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 srac1DBG_CSSD
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 srac1DBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sEXTPROC1521
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sOCSSD_LL_rac1_
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sOCSSD_LL_rac1_localhost
-rw-r--r-- 1 grid oinstall 0 Aug  1 17:23 sOCSSD_LL_rac1_localhost_lock
-rw-r--r-- 1 grid oinstall 0 Aug  1 17:23 sOCSSD_LL_rac1__lock
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sOHASD_IPC_SOCKET_11
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sgrid_CSS_LclLstnr_localhost_1
-rw-r--r-- 1 grid oinstall 0 Aug  1 17:23 sgrid_CSS_LclLstnr_localhost_1_lock
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sprocr_local_conn_0_PROL

Diagnostic File Collection
References:

BUG:10105195 - PROC-32 ACCESSING OCR; CRS DOES NOT COME UP ON NODE
NOTE:1323698.1 - Troubleshooting CRSD Start up Issue
NOTE:1325718.1 - OHASD not Starting After Reboot on SLES
NOTE:1077094.1 - How to fix the "DiscoveryString in profile.xml" or "asm_diskstring in ASM" if set wrongly
NOTE:1068835.1 - What to Do if 11gR2 Grid Infrastructure is Unhealthy
NOTE:942166.1 - How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation
NOTE:969254.1 - How to Proceed from Failed Upgrade to 11gR2 Grid Infrastructure on Linux/Unix
NOTE:10105195.8 - Bug 10105195 - Clusterware fails to start after reboot due to gpnpd fails to start
NOTE:1053147.1 - 11gR2 Clusterware and Grid Home - What You Need to Know
NOTE:1053970.1 - Troubleshooting 11.2 Grid Infrastructure root.sh Issues
NOTE:1069182.1 - OHASD Failed to Start: Inappropriate ioctl for device
NOTE:1054902.1 - How to Validate Network and Name Resolution Setup for the Clusterware and RAC
BUG:11834289 - OHASD FAILED TO START TIMELY
NOTE:1564555.1 - 11.2.0.3 PSU5/PSU6/PSU7 or 12.1.0.1 CSSD Fails to Start if Multicast Fails on Private Network
NOTE:1427234.1 - autorun file for ohasd is missing