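After upgrading Clusterware and the Oracle 10gR2 software to 10.2.0.4 and rebooting the system, CRS on node 1 would not start; running crsctl start crs made the system reboot immediately.
The CRS and CSS logs are shown below.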
- crsd.log:
- 2012-12-25 08:11:56.757: [ CSSCLNT][1226828528]clsssInitNative: connect failed, rc 9
- 2012-12-25 08:11:56.757: [ CRSRTI][1226828528]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
- 2012-12-25 08:11:58.252: [ COMMCRS][1099401536]clsc_connect: (0xe18010) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
- 2012-12-25 08:11:58.252: [ CSSCLNT][1226828528]clsssInitNative: connect failed, rc 9
- 2012-12-25 08:11:58.252: [ CRSRTI][1226828528]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
- 2012-12-25 08:11:59.789: [ COMMCRS][1099401536]clsc_connect: (0xe18010) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
- 2012-12-25 08:11:59.789: [ CSSCLNT][1226828528]clsssInitNative: connect failed, rc 9
- 2012-12-25 08:11:59.789: [ CRSRTI][1226828528]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
- 2012-12-25 08:12:01.586: [ COMMCRS][1099401536]clsc_connect: (0xe18010) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
- 2012-12-25 08:12:01.586: [ CSSCLNT][1226828528]clsssInitNative: connect failed, rc 9
- 2012-12-25 08:12:01.586: [ CRSRTI][1226828528]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
- 2012-12-25 08:12:04.174: [ COMMCRS][1099401536]clsc_connect: (0xe18010) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
- 2012-12-25 08:12:04.174: [ CSSCLNT][1226828528]clsssInitNative: connect failed, rc 9
- 2012-12-25 08:12:04.175: [ CRSRTI][1226828528]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
- ocssd.log:
- [ CSSD]2012-12-25 09:58:03.233 >USER: Copyright 2012, Oracle version 10.2.0.4.0
- [ CSSD]2012-12-25 09:58:03.233 >USER: CSS daemon log for node rac1, number 1, in cluster crs
- [ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_CSSD))
- [ CSSD]2012-12-25 09:58:03.337 [547869936] >TRACE: clssscmain: local-only set to false
- [ CSSD]2012-12-25 09:58:03.351 [547869936] >TRACE: clssnmReadNodeInfo: added node 1 (rac1) to cluster
- [ CSSD]2012-12-25 09:58:03.386 [547869936] >TRACE: clssnmReadNodeInfo: added node 2 (rac2) to cluster
- [ CSSD]2012-12-25 09:58:04.159 [1138325824] >TRACE: clssnm_skgxninit: Compatible vendor clusterware not in use
- [ CSSD]2012-12-25 09:58:04.159 [1138325824] >TRACE: clssnm_skgxnmon: skgxn init failed
- [ CSSD]2012-12-25 09:58:04.341 [547869936] >TRACE: clssnmNMInitialize: misscount set to (300)
- [ CSSD]2012-12-25 09:58:04.342 [547869936] >TRACE: clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 150000 ms, reconfig start (misscount) 300000 ms
- [ CSSD]2012-12-25 09:58:04.350 [547869936] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw4)
- [ CSSD]2012-12-25 09:58:04.350 [1138325824] >TRACE: clssnmvDPT: spawned for disk 0 (/dev/raw/raw4)
- [ CSSD]2012-12-25 09:58:06.389 [1138325824] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw4)
- [ CSSD]2012-12-25 09:58:06.457 [547869936] >TRACE: clssnmFatalInit: fatal mode enabled
- [ CSSD]2012-12-25 09:58:06.522 [1148815680] >TRACE: clssnmvKillBlockThread: spawned for disk 0 (/dev/raw/raw4) initial sleep interval (1000)ms
- [ CSSD]2012-12-25 09:58:06.531 [1169795392] >TRACE: clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=rac1-priv)(PORT=49895))
- [ CSSD]2012-12-25 09:58:06.542 [1169795392] >TRACE: clssnmClusterListener: Probing node rac2 (2), probcon(0x1422bd90)
- [ CSSD]2012-12-25 09:58:06.582 [1169795392] >TRACE: clssnmConnComplete: MSGSRC 2, type 6, node 2, flags 0x0001, con 0x1422bd90, probe 0x1422bd90
- [ CSSD]2012-12-25 09:58:06.582 [1169795392] >TRACE: clssnmConnComplete: node 2, rac2, con(0x1422bd90), probcon(0x1422bd90), ninfcon((nil)), node unique 1356444601, prev unique 0, msg unique 1356444601 node state 0
- [ CSSD]2012-12-25 09:58:06.582 [1169795392] >TRACE: clssnmConnComplete: connected to node 2 (con 0x1422bd90), ninfcon (0x1422bd90), state (0), flag (1037)
- [ CSSD]2012-12-25 09:58:06.594 [1138325824] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(2797) LATS(207944) Disk lastSeqNo(2797)
- [ CSSD]2012-12-25 09:58:06.756 [1092946240] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
- [ CSSD]2012-12-25 09:58:06.756 [1092946240] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
- [ CSSD]2012-12-25 09:58:06.817 [1201264960] >TRACE: clssgmPeerListener: Listening on (ADDRESS=(PROTOCOL=tcp)(DEV=20)(HOST=10.0.0.154)(PORT=33670))
- [ CSSD]2012-12-25 09:58:08.725 [1169795392] >TRACE: clssnmHandleSync: diskTimeout set to (297000)ms
- [ CSSD]2012-12-25 09:58:08.725 [1169795392] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[rac2] seq[0] sync[2]
- [ CSSD]2012-12-25 09:58:08.725 [1232734528] >TRACE: clssnmRcfgMgrThread: initial lastleader(2) unique(1356444601)
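The crsd.log excerpt above repeats the same two symptoms: no listener on the CSS IPC endpoint, and CSS reported as not ready. As a rough aid, the sketch below counts both in log text read from standard input. The function name summarize_crsd_log is hypothetical and not part of any Oracle tool.

```shell
# Hypothetical helper: count CSS connection symptoms in crsd.log text on stdin.
summarize_crsd_log() {
    awk '
        /CSS is not ready/            { not_ready++ }    # crsd waiting for CSS
        /clsc_connect:.*no listener/  { no_listener++ }  # CSS IPC endpoint absent
        END { printf "css_not_ready=%d no_listener=%d\n", not_ready, no_listener }
    '
}
```

It could be fed the real file, for example `summarize_crsd_log < "$ORA_CRS_HOME/log/rac1/crsd/crsd.log"`, where that log location is an assumption about this installation.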
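All nodes could ping each other, but the logs kept suggesting an inter-node communication problem. I restored the OCR, but the problem persisted. The whole procedure is recorded here.
1. Stop CRS on both nodes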
# crsctl stop crs
2. Run the crs/root.sh script on each node
--Node 1
[root@rac1 ~]# /u01/app/oracle/product/10.2.0/crs/root.sh
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured
Oracle CRS stack is already configured and will be running under init(1M)
To get past the "already configured" message above, delete /etc/oracle/scls_scr/<node_name>/oracle/cssfatal on both nodes, then rerun the crs/root.sh script.
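The deletion can be rehearsed before it is done for real. The sketch below is an illustration only: remove_cssfatal, SCLS_ROOT and APPLY are hypothetical names, and the only path taken from the text is /etc/oracle/scls_scr/<node_name>/oracle/cssfatal. By default it just prints what it would remove.

```shell
# Hypothetical helper: print (or, with APPLY=1, perform) removal of the cssfatal marker.
# SCLS_ROOT defaults to the directory named in the text; override it to test on a copy.
remove_cssfatal() {
    node_name=$1
    scls_root=${SCLS_ROOT:-/etc/oracle/scls_scr}
    target="$scls_root/$node_name/oracle/cssfatal"
    if [ ! -e "$target" ]; then
        echo "not found: $target"
        return 1
    fi
    if [ "${APPLY:-0}" = "1" ]; then
        rm -f -- "$target" && echo "removed: $target"
    else
        echo "would remove: $target"   # dry run, nothing is deleted
    fi
}
```

On this cluster it would be run as root once per node, for example `remove_cssfatal rac1` on rac1 and `remove_cssfatal rac2` on rac2, adding APPLY=1 only after the dry-run output looks right.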
- --Node 1
- [root@rac1 oracle]# /u01/app/oracle/product/10.2.0/crs/root.sh
- WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
- WARNING: directory '/u01/app/oracle/product' is not owned by root
- WARNING: directory '/u01/app/oracle' is not owned by root
- WARNING: directory '/u01/app' is not owned by root
- WARNING: directory '/u01' is not owned by root
- Checking to see if Oracle CRS stack is already configured
- Setting the permissions on OCR backup directory
- Setting up NS directories
- Oracle Cluster Registry configuration upgraded successfully
- WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
- WARNING: directory '/u01/app/oracle/product' is not owned by root
- WARNING: directory '/u01/app/oracle' is not owned by root
- WARNING: directory '/u01/app' is not owned by root
- WARNING: directory '/u01' is not owned by root
- Successfully accumulated necessary OCR keys.
- Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
- node <nodenumber>: <nodename> <private interconnect name> <hostname>
- node 1: rac1 rac1-priv rac1
- node 2: rac2 rac2-priv rac2
- Creating OCR keys for user 'root', privgrp 'root'..
- Operation successful.
- Now formatting voting device: /dev/raw/raw4
- Format of 1 voting devices complete.
- Startup will be queued to init within 30 seconds.
- Adding daemons to inittab
- Expecting the CRS daemons to be up within 600 seconds.
- CSS is active on these nodes.
- rac1
- CSS is inactive on these nodes.
- rac2
- Local node checking complete.
- Run root.sh on remaining nodes to start CRS daemons.
- --Node 2
- [root@rac2 ~]# /u01/app/oracle/product/10.2.0/crs/root.sh
- WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
- WARNING: directory '/u01/app/oracle/product' is not owned by root
- WARNING: directory '/u01/app/oracle' is not owned by root
- WARNING: directory '/u01/app' is not owned by root
- WARNING: directory '/u01' is not owned by root
- Checking to see if Oracle CRS stack is already configured
- Setting the permissions on OCR backup directory
- Setting up NS directories
- Oracle Cluster Registry configuration upgraded successfully
- WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
- WARNING: directory '/u01/app/oracle/product' is not owned by root
- WARNING: directory '/u01/app/oracle' is not owned by root
- WARNING: directory '/u01/app' is not owned by root
- WARNING: directory '/u01' is not owned by root
- clscfg: EXISTING configuration version 3 detected.
- clscfg: version 3 is 10G Release 2.
- Successfully accumulated necessary OCR keys.
- Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
- node <nodenumber>: <nodename> <private interconnect name> <hostname>
- node 1: rac1 rac1-priv rac1
- node 2: rac2 rac2-priv rac2
- clscfg: Arguments check out successfully.
- NO KEYS WERE WRITTEN. Supply -force parameter to override.
- -force is destructive and will destroy any previous cluster
- configuration.
- Oracle Cluster Registry for cluster has already been initialized
- Startup will be queued to init within 30 seconds.
- Adding daemons to inittab
- Expecting the CRS daemons to be up within 600 seconds.
- CSS is active on these nodes.
- rac1
- rac2
- CSS is active on all nodes.
- Waiting for the Oracle CRSD and EVMD to start
- Oracle CRS stack installed and running under init(1M)
- Running vipca(silent) for configuring nodeapps
- Creating VIP application resource on (2) nodes...
- Creating GSD application resource on (2) nodes...
- Creating ONS application resource on (2) nodes...
- Starting VIP application resource on (2) nodes...
- Starting GSD application resource on (2) nodes...
- Starting ONS application resource on (2) nodes...
- Done.
- [root@rac2 ~]# crsctl check crs
- CSS appears healthy
- CRS appears healthy
- EVM appears healthy
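The three "appears healthy" lines are what to look for after root.sh finishes. As a small sketch, the function below reads `crsctl check crs` output on standard input and succeeds only when CSS, CRS and EVM all report healthy. The name crs_all_healthy is hypothetical, and the expected wording is taken from the output shown above.

```shell
# Hypothetical helper: exit 0 only if CSS, CRS and EVM all "appear healthy".
crs_all_healthy() {
    awk '
        $2 == "appears" && $3 == "healthy" { seen[$1] = 1 }
        END {
            ok = ("CSS" in seen) && ("CRS" in seen) && ("EVM" in seen)
            exit (ok ? 0 : 1)
        }
    '
}
```

A possible use is `crsctl check crs | crs_all_healthy && echo "stack is up"` on each node.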
3. Run the two scripts from the Clusterware upgrade on each node
- --Node 1
- [root@rac1 oracle]# /u01/app/oracle/product/10.2.0/crs/bin/crsctl stop crs
- Stopping resources. This could take several minutes.
- Successfully stopped CRS resources.
- Stopping CSSD.
- Shutting down CSS daemon.
- Shutdown request successfully issued.
- [root@rac1 oracle]# /u01/app/oracle/product/10.2.0/crs/install/root102.sh
- WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
- WARNING: directory '/u01/app/oracle/product' is not owned by root
- WARNING: directory '/u01/app/oracle' is not owned by root
- WARNING: directory '/u01/app' is not owned by root
- WARNING: directory '/u01' is not owned by root
- Preparing to recopy patched init and RC scripts.
- Recopying init and RC scripts.
- Startup will be queued to init within 30 seconds.
- Starting up the CRS daemons.
- Waiting for the patched CRS daemons to start.
- This may take a while on some systems.
- .
- 10204 patch successfully applied.
- clscfg: EXISTING configuration version 3 detected.
- clscfg: version 3 is 10G Release 2.
- Successfully accumulated necessary OCR keys.
- Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
- node <nodenumber>: <nodename> <private interconnect name> <hostname>
- node 1: rac1 rac1-priv rac1
- Creating OCR keys for user 'root', privgrp 'root'..
- Operation successful.
- clscfg -upgrade completed successfully
- [root@rac1 oracle]# /etc/init.d/init.crs enable
- Automatic startup enabled for system boot.
- --Node 2
- [root@rac2 ~]# /u01/app/oracle/product/10.2.0/crs/bin/crsctl stop crs
- Stopping resources. This could take several minutes.
- Successfully stopped CRS resources.
- Stopping CSSD.
- Shutting down CSS daemon.
- Shutdown request successfully issued.
- You have new mail in /var/spool/mail/root
- [root@rac2 ~]# /u01/app/oracle/product/10.2.0/crs/install/root102.sh
- WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
- WARNING: directory '/u01/app/oracle/product' is not owned by root
- WARNING: directory '/u01/app/oracle' is not owned by root
- WARNING: directory '/u01/app' is not owned by root
- WARNING: directory '/u01' is not owned by root
- Preparing to recopy patched init and RC scripts.
- Recopying init and RC scripts.
- Startup will be queued to init within 30 seconds.
- Starting up the CRS daemons.
- Waiting for the patched CRS daemons to start.
- This may take a while on some systems.
- .
- 10204 patch successfully applied.
- clscfg: EXISTING configuration version 3 detected.
- clscfg: version 3 is 10G Release 2.
- Successfully accumulated necessary OCR keys.
- Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
- node <nodenumber>: <nodename> <private interconnect name> <hostname>
- node 2: rac2 rac2-priv rac2
- Creating OCR keys for user 'root', privgrp 'root'..
- Operation successful.
- clscfg -upgrade completed successfully
- [root@rac2 ~]# crs_stat -t
- Name Type Target State Host
- ------------------------------------------------------------
- ora.rac1.gsd application ONLINE ONLINE rac1
- ora.rac1.ons application ONLINE ONLINE rac1
- ora.rac1.vip application ONLINE ONLINE rac1
- ora.rac2.gsd application ONLINE ONLINE rac2
- ora.rac2.ons application ONLINE ONLINE rac2
- ora.rac2.vip application ONLINE ONLINE rac2
- [root@rac2 ~]# /etc/init.d/init.crs enable
- Automatic startup enabled for system boot.
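The `crs_stat -t` listing above shows every nodeapp with Target and State both ONLINE. With more resources it can help to print only the exceptions. The sketch below assumes the five-column layout shown above (Name, Type, Target, State, Host) and resource names starting with "ora."; crs_not_online is a hypothetical name.

```shell
# Hypothetical helper: from `crs_stat -t` output on stdin, print the names of
# resources whose Target or State column is not ONLINE.
crs_not_online() {
    awk '$1 ~ /^ora\./ && ($3 != "ONLINE" || $4 != "ONLINE") { print $1 }'
}
```

For example, `crs_stat -t | crs_not_online` prints nothing when everything is online.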
4. Add ASM
- [oracle@rac1 db_1]$ srvctl add asm -n rac1 -i ASM1 -o /u01/app/oracle/product/10.2.0/db_1
- [oracle@rac1 db_1]$ srvctl add asm -n rac2 -i ASM2 -o /u01/app/oracle/product/10.2.0/db_1
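The two srvctl commands above follow one pattern: node N gets instance ASMN under the same Oracle home. A minimal sketch that prints those commands for a list of nodes is given below. print_add_asm_cmds is a hypothetical name, and numbering instances in node order is an assumption taken from the two commands shown. It only prints the commands and does not run srvctl.

```shell
# Hypothetical helper: print one "srvctl add asm" command per node.
# Usage: print_add_asm_cmds <oracle_home> <node>...
print_add_asm_cmds() {
    oracle_home=$1
    shift
    i=1
    for node in "$@"; do
        # Assumption: instance names are ASM1, ASM2, ... in node order.
        echo "srvctl add asm -n $node -i ASM$i -o $oracle_home"
        i=$((i + 1))
    done
}
```

Calling `print_add_asm_cmds /u01/app/oracle/product/10.2.0/db_1 rac1 rac2` reproduces the two commands used here, which can be reviewed and then run as the oracle user.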
5. Then create the database.