Environment:
The application is SAP ERP 6.0.
The database is Oracle 11g RAC + ASM.
During a routine health check, transaction code DB02 was run in SAP.
Checking the clusterware status showed that only one node was left:
bash-3.00$ ./crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ACFS.dg
ONLINE ONLINE r3prddb01
ora.ARCH.dg
ONLINE ONLINE r3prddb01
ora.DATA.dg
ONLINE ONLINE r3prddb01
ora.LISTENER.lsnr
ONLINE ONLINE r3prddb01
ora.MLOG.dg
ONLINE ONLINE r3prddb01
ora.OLOG.dg
ONLINE ONLINE r3prddb01
ora.RECO.dg
ONLINE ONLINE r3prddb01
ora.VCR.dg
ONLINE ONLINE r3prddb01
ora.acfs.acfs.acfs
ONLINE ONLINE r3prddb01
ora.asm
ONLINE ONLINE r3prddb01
ora.gsd
OFFLINE OFFLINE r3prddb01
ora.net1.network
ONLINE ONLINE r3prddb01
ora.ons
ONLINE ONLINE r3prddb01
ora.registry.acfs
ONLINE ONLINE r3prddb01
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE r3prddb01
ora.cvu
1 ONLINE ONLINE r3prddb01
ora.oc4j
1 ONLINE ONLINE r3prddb01
ora.p01.db
1 ONLINE ONLINE r3prddb01 Open
2 ONLINE OFFLINE
ora.r3prddb01.vip
1 ONLINE ONLINE r3prddb01
ora.r3prddb02.vip
1 ONLINE INTERMEDIATE r3prddb01 FAILED OVER
ora.scan1.vip
1 ONLINE ONLINE r3prddb01
bash-3.00$
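Scanning the crsctl output above for anything that is not fully ONLINE is error-prone by eye. The filtering can be sketched in shell; check_crs is a hypothetical helper, and the sample lines fed to it are copied from the output above (in practice you would pipe the live crsctl output into it):

```shell
# Hypothetical helper: read `crsctl status resource -t` output on stdin and
# print each resource line that is OFFLINE, INTERMEDIATE, or FAILED OVER,
# prefixed with the resource name it belongs to.
check_crs() {
  awk '/OFFLINE|INTERMEDIATE|FAILED OVER/ {print prev, $0}
       /^ora\./                           {prev = $0}'
}

# Demo with lines taken from the crsctl output shown above.
cat <<'EOF' | check_crs
ora.p01.db
      1        ONLINE  ONLINE       r3prddb01                Open
      2        ONLINE  OFFLINE
ora.r3prddb02.vip
      1        ONLINE  INTERMEDIATE r3prddb01                FAILED OVER
EOF
# -> flags instance 2 of ora.p01.db and the failed-over r3prddb02 VIP
```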
Alert log of the database instance on db01:
Sat Aug 23 11:17:20 2014
Reconfiguration started (old inc 4, new inc 6)
List of instances:
1 (myinst: 1)
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Sat Aug 23 11:17:20 2014
LMS 1: 6 GCS shadows cancelled, 0 closed, 0 Xw survived
Sat Aug 23 11:17:20 2014
LMS 0: 5 GCS shadows cancelled, 1 closed, 0 Xw survived
Sat Aug 23 11:17:20 2014
LMS 2: 5 GCS shadows cancelled, 2 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Sat Aug 23 11:17:20 2014
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
parallel recovery started with 31 processes
Started redo scan
Completed redo scan
read 58319 KB redo, 11434 data blocks need recovery
Started redo application at
Thread 2: logseq 49560, block 81359
Recovery of Online Redo Log: Thread 2 Group 44 Seq 49560 Reading mem 0
Mem# 0: +OLOG/p01/onlinelog/log_g44m1.dbf
Mem# 1: +MLOG/p01/onlinelog/log_g44m2.dbf
Sat Aug 23 11:17:25 2014
Setting Resource Manager plan SCHEDULER[0x447BF]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Sat Aug 23 11:17:25 2014
minact-scn: master found reconf/inst-rec before recscn scan old-inc#:6 new-inc#:6
Completed redo application of 48.31MB
Completed instance recovery at
Thread 2: logseq 49560, block 197998, scn 11518227963
10738 data blocks read, 11483 data blocks written, 58319 redo k-bytes read
Thread 2 advanced to log sequence 49561 (thread recovery)
Redo thread 2 internally disabled at seq 49561 (SMON)
Sat Aug 23 11:17:27 2014
Archived Log entry 91800 added for thread 2 sequence 49560 ID 0x592ddd4a dest 1:
Sat Aug 23 11:17:27 2014
ARC0: Archiving disabled thread 2 sequence 49561
Archived Log entry 91801 added for thread 2 sequence 49561 ID 0x592ddd4a dest 1:
minact-scn: master continuing after IR
minact-scn: Master considers inst:2 dead
Sat Aug 23 11:17:28 2014
Beginning log switch checkpoint up to RBA [0xa4e4.2.10], SCN: 11518240393
Thread 1 advanced to log sequence 42212 (LGWR switch)
Current log# 35 seq# 42212 mem# 0: +OLOG/p01/onlinelog/log_g35m1.dbf
Current log# 35 seq# 42212 mem# 1: +MLOG/p01/onlinelog/log_g35m2.dbf
Archived Log entry 91802 added for thread 1 sequence 42211 ID 0x592ddd4a dest 1:
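The surviving instance performed instance recovery of the dead thread (thread 2): it scanned the redo, found 58319 KB of redo and 11434 data blocks needing recovery, and completed recovery within a few seconds. A minimal sketch of pulling those milestones out of an alert log follows; recovery_events is a hypothetical helper, and the heredoc stands in for the real alert log file:

```shell
# Hypothetical helper: filter the instance-recovery milestones out of an
# alert log fed on stdin. The pattern matches the message texts seen above.
recovery_events() {
  grep -E 'Reconfiguration (started|complete)|instance recovery|data blocks need recovery|Completed redo application'
}

# Demo with lines taken from the db01 alert log above.
recovery_events <<'EOF'
Reconfiguration started (old inc 4, new inc 6)
Beginning instance recovery of 1 threads
read 58319 KB redo, 11434 data blocks need recovery
Completed redo application of 48.31MB
Completed instance recovery at
EOF
```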
Alert log of the ASM instance (ASM01) on DB01:
Sat Aug 23 11:17:20 2014
Reconfiguration started (old inc 4, new inc 6)
List of instances:
1 (myinst: 1)
Global Resource Directory frozen
* dead instance detected - domain 1 invalid = TRUE
* dead instance detected - domain 2 invalid = TRUE
* dead instance detected - domain 3 invalid = TRUE
* dead instance detected - domain 4 invalid = TRUE
* dead instance detected - domain 5 invalid = TRUE
* dead instance detected - domain 6 invalid = TRUE
* dead instance detected - domain 7 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Sat Aug 23 11:17:20 2014
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Sat Aug 23 11:17:20 2014
NOTE: SMON starting instance recovery for group ACFS domain 1 (mounted)
NOTE: F1X0 found on disk 1 au 49 fcn 0.12248
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
NOTE: starting recovery of thread=2 ckpt=19.43 group=1 (ACFS)
NOTE: SMON waiting for thread 2 recovery enqueue
NOTE: SMON about to begin recovery lock claims for diskgroup 1 (ACFS)
NOTE: SMON successfully validated lock domain 1
NOTE: advancing ckpt for thread=2 ckpt=19.43
NOTE: SMON did instance recovery for group ACFS domain 1
Sat Aug 23 11:17:20 2014
ASM Volume(VDBG) - Unable to send message 'disk status' to the volume driver.
NOTE: SMON starting instance recovery for group ARCH domain 2 (mounted)
NOTE: F1X0 found on disk 9 au 113 fcn 0.41343439
NOTE: starting recovery of thread=2 ckpt=77.3254 group=2 (ARCH)
NOTE: SMON waiting for thread 2 recovery enqueue
NOTE: SMON about to begin recovery lock claims for diskgroup 2 (ARCH)
NOTE: SMON successfully validated lock domain 2
NOTE: advancing ckpt for thread=2 ckpt=77.3254
NOTE: SMON did instance recovery for group ARCH domain 2
ASM Volume(VDBG) - Unable to send message 'disk status' to the volume driver.
NOTE: SMON starting instance recovery for group DATA domain 3 (mounted)
NOTE: F1X0 found on disk 15 au 60241 fcn 0.5143392
NOTE: starting recovery of thread=2 ckpt=22.3858 group=3 (DATA)
NOTE: SMON waiting for thread 2 recovery enqueue
NOTE: SMON about to begin recovery lock claims for diskgroup 3 (DATA)
NOTE: SMON successfully validated lock domain 3
NOTE: advancing ckpt for thread=2 ckpt=22.3858
NOTE: SMON did instance recovery for group DATA domain 3
ASM Volume(VDBG) - Unable to send message 'disk status' to the volume driver.
NOTE: SMON starting instance recovery for group MLOG domain 4 (mounted)
NOTE: F1X0 found on disk 3 au 639 fcn 0.120137
NOTE: starting recovery of thread=2 ckpt=23.2161 group=4 (MLOG)
NOTE: SMON waiting for thread 2 recovery enqueue
NOTE: SMON about to begin recovery lock claims for diskgroup 4 (MLOG)
NOTE: SMON successfully validated lock domain 4
NOTE: advancing ckpt for thread=2 ckpt=23.2161
NOTE: SMON did instance recovery for group MLOG domain 4
ASM Volume(VDBG) - Unable to send message 'disk status' to the volume driver.
NOTE: SMON starting instance recovery for group OLOG domain 5 (mounted)
NOTE: F1X0 found on disk 3 au 637 fcn 0.121291
NOTE: starting recovery of thread=2 ckpt=23.2261 group=5 (OLOG)
NOTE: SMON waiting for thread 2 recovery enqueue
NOTE: SMON about to begin recovery lock claims for diskgroup 5 (OLOG)
NOTE: SMON successfully validated lock domain 5
NOTE: advancing ckpt for thread=2 ckpt=23.2261
NOTE: SMON did instance recovery for group OLOG domain 5
ASM Volume(VDBG) - Unable to send message 'disk status' to the volume driver.
NOTE: SMON starting instance recovery for group RECO domain 6 (mounted)
NOTE: F1X0 found on disk 11 au 11 fcn 0.2264
NOTE: starting recovery of thread=2 ckpt=19.6 group=6 (RECO)
NOTE: SMON waiting for thread 2 recovery enqueue
NOTE: SMON about to begin recovery lock claims for diskgroup 6 (RECO)
NOTE: SMON successfully validated lock domain 6
NOTE: advancing ckpt for thread=2 ckpt=19.6
NOTE: SMON did instance recovery for group RECO domain 6
ASM Volume(VDBG) - Unable to send message 'disk status' to the volume driver.
NOTE: SMON starting instance recovery for group VCR domain 7 (mounted)
NOTE: F1X0 found on disk 2 au 177 fcn 0.1216
NOTE: starting recovery of thread=2 ckpt=16.13 group=7 (VCR)
NOTE: SMON waiting for thread 2 recovery enqueue
NOTE: SMON about to begin recovery lock claims for diskgroup 7 (VCR)
NOTE: SMON successfully validated lock domain 7
NOTE: advancing ckpt for thread=2 ckpt=16.13
NOTE: SMON did instance recovery for group VCR domain 7
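Each mounted diskgroup is a separate lock domain, and the surviving ASM instance recovers them one by one, as the repeating NOTE blocks above show. To summarize which groups were recovered, the "SMON did instance recovery" lines can be extracted; recovered_groups is a hypothetical helper, and the field position assumes the exact NOTE format shown above:

```shell
# Hypothetical helper: list the diskgroup names from the ASM alert log's
# "NOTE: SMON did instance recovery for group <NAME> domain <N>" lines.
# The group name is the 8th whitespace-separated field in that format.
recovered_groups() {
  awk '/SMON did instance recovery for group/ {print $8}'
}

# Demo with lines taken from the ASM alert log above.
recovered_groups <<'EOF'
NOTE: SMON did instance recovery for group ACFS domain 1
NOTE: SMON did instance recovery for group ARCH domain 2
NOTE: SMON did instance recovery for group DATA domain 3
EOF
# -> ACFS ARCH DATA, one per line
```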
From the logs above, the failure occurred at 11:17 on August 23. We contacted colleagues in the data center, who found a hardware alarm for this machine on the M9000: a memory fault. A repair was requested and the machine was taken in for service.
The next day the memory fault on node 02 was fixed and its operating system was booted. After a short wait the clusterware came up automatically, and the database instance on 02 recovered as well.
This machine is configured to start the clusterware automatically with the operating system, which is also the default after installation.
To disable automatic startup, run crsctl disable crs; to re-enable it, run crsctl enable crs.
With autostart disabled, the stack must be started manually with crsctl start crs.
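For reference, the autostart behavior described above can be inspected and toggled as follows. These are standard crsctl commands, run as root from the Grid Infrastructure home's bin directory:

```shell
# Run as root from <GRID_HOME>/bin.
# crsctl config crs reports whether Oracle High Availability Services
# autostart is currently enabled on this node.
./crsctl config crs    # show current autostart setting
./crsctl disable crs   # do not start the stack at OS boot
./crsctl enable crs    # start the stack at OS boot (the default)
./crsctl start crs     # manual start, needed when autostart is disabled
```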
bash-3.00$ ./crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ACFS.dg
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.ARCH.dg
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.DATA.dg
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.LISTENER.lsnr
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.MLOG.dg
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.OLOG.dg
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.RECO.dg
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.VCR.dg
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.acfs.acfs.acfs
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.asm
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.gsd
OFFLINE OFFLINE r3prddb01
OFFLINE OFFLINE r3prddb02
ora.net1.network
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.ons
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
ora.registry.acfs
ONLINE ONLINE r3prddb01
ONLINE ONLINE r3prddb02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE r3prddb01
ora.cvu
1 ONLINE ONLINE r3prddb01
ora.oc4j
1 ONLINE ONLINE r3prddb01
ora.p01.db
1 ONLINE ONLINE r3prddb01 Open
2 ONLINE ONLINE r3prddb02 Open
ora.r3prddb01.vip
1 ONLINE ONLINE r3prddb01
ora.r3prddb02.vip
1 ONLINE ONLINE r3prddb02
ora.scan1.vip
1 ONLINE ONLINE r3prddb01
Checking from DB02 confirms that both instances are now running.
Source: ITPUB Blog, http://blog.itpub.net/27771627/viewspace-1257887/