Dear all,
I have a RAC system with 2 DB nodes (db1 and db2), both installed on VMware 6.5. They run on 2 ESXi servers, ESX1 and ESX2 respectively. Two storage systems (storage1 and storage2) provide the storage for this system. The picture below describes the DB system more clearly.
I have some diskgroups that have 2 ASM disks each, and they are all normal redundancy. For the OCR and voting disks, I have created 2 diskgroups, OCR and VOTE, which have 3 disks each: 2 of them are on storage1 and storage2, and the third is stored on an NFS server, which is one of my VMs. I used the dd command to create a virtual disk there and shared it between db1 and db2. This NFS VM is configured to use local disk resources.
Diskgroup   ASM Disk 1 resource   ASM Disk 2 resource   ASM Disk 3 resource
OCR         storage 1             storage 2             NFS disk
VOTE        storage 1             storage 1             NFS disk
ARCHIVE     storage 1             storage 1
DATA        storage 1             storage 1
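To make the layout in the table easier to reason about, here is a minimal Python sketch that counts how many disks of each diskgroup disappear when one storage source is lost. It takes the table exactly as written and rests on two assumptions: each ASM disk is its own failure group (the ASM default when none is specified), and a normal-redundancy diskgroup tolerates the loss of at most one failure group. The names `LAYOUT`, `disks_lost` and `tolerates_loss` are made up for this illustration.

```python
# Disk placement copied from the table above (taken literally).
LAYOUT = {
    "OCR": ["storage 1", "storage 2", "NFS disk"],
    "VOTE": ["storage 1", "storage 1", "NFS disk"],
    "ARCHIVE": ["storage 1", "storage 1"],
    "DATA": ["storage 1", "storage 1"],
}


def disks_lost(disks, failed_source):
    """Count the disks of one diskgroup that sit on the failed source."""
    return sum(1 for source in disks if source == failed_source)


def tolerates_loss(disks, failed_source):
    """Assumption: each disk is its own failure group, and normal
    redundancy tolerates losing at most one failure group."""
    return disks_lost(disks, failed_source) <= 1


if __name__ == "__main__":
    for failed in ("storage 1", "storage 2", "NFS disk"):
        for name, disks in LAYOUT.items():
            status = "ok" if tolerates_loss(disks, failed) else "at risk"
            print(f"lose {failed}: {name} -> {status}")
```

This is only a model of the placement, not of what ASM actually does at runtime; the real failure group assignment would have to be checked in the ASM instance itself.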
In my test case, I powered off node db1 and broke the connection between storage1 and the physical machine ESX2 (on which node db2 is running). As I understand it, OCR and VOTE should stay online in this case, because only 1 of their 3 disks is lost, and the DB should stay online as well. But it did not.
After the connection was blocked, the CRS service on db2 went down:
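My expectation rests on the majority rule for voting files: a node has to reach more than half of the configured voting files to stay in the cluster. A minimal sketch of that arithmetic, where `required_voting_files` and `node_survives` are hypothetical helper names used only for illustration:

```python
def required_voting_files(configured):
    """Strict majority: more than half of the configured voting files."""
    return configured // 2 + 1


def node_survives(configured, accessible):
    """True while the node still reaches a majority of voting files."""
    return accessible >= required_voting_files(configured)


if __name__ == "__main__":
    # 3 voting files, 1 lost: 2 remain and 2 are required.
    print(required_voting_files(3), node_survives(3, 2))
    # A second loss would drop the node below the majority.
    print(node_survives(3, 1))
```

With 3 voting files the minimum is 2, which matches the CRS-1672 message in the log below, so losing a single voting file should not by itself take the node down.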
$ crsctl check cluster -all
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
The following entries were written to the CRS and ASM alert logs:
CRS alert log:
2017-12-26 21:32:34.467 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvWorkerThread_0 not scheduled for 72230 msecs.
2017-12-26 21:32:42.468 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 80020 msecs.
2017-12-26 21:32:42.468 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvWorkerThread_0 not scheduled for 80230 msecs.
2017-12-26 21:32:50.469 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 88020 msecs.
2017-12-26 21:32:50.469 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvWorkerThread_0 not scheduled for 88230 msecs.
2017-12-26 21:32:58.470 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 96020 msecs.
2017-12-26 21:32:58.470 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvWorkerThread_0 not scheduled for 96230 msecs.
2017-12-26 21:33:01.711 [OCSSD(15404)]CRS-1615: No I/O has completed after 50% of the maximum interval. Voting file /dev/oracleasm/disks/OCR1_2 will be considered not functional in 99740 milliseconds
2017-12-26 21:34:31.477 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 8020 msecs.
2017-12-26 21:34:39.477 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 16020 msecs.
2017-12-26 21:34:41.477 [OCSSD(15404)]CRS-1604: CSSD voting file is offline: /dev/oracleasm/disks/OCR1_2; details at (:CSSNM00058:) in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/ocssd.trc.
2017-12-26 21:34:41.477 [OCSSD(15404)]CRS-1672: The number of voting files currently available 2 has fallen to the minimum number of voting files required 2.
2017-12-26 21:39:28.099 [ORAAGENT(5555)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/ohasd_oraagent_oracle.trc"
2017-12-26 21:39:28.160 [ORAAGENT(17427)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_oraagent_oracle.trc"
2017-12-26 21:39:28.162 [ORAAGENT(5555)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/ohasd_oraagent_oracle.trc"
2017-12-26 21:39:28.190 [ORAAGENT(17427)]CRS-5011: Check of resource "_mgmtdb" failed: details at "(:CLSN00007:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_oraagent_oracle.trc"
2017-12-26 21:39:28.197 [ORAAGENT(17427)]CRS-5011: Check of resource "_mgmtdb" failed: details at "(:CLSN00007:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_oraagent_oracle.trc"
2017-12-26 21:39:28.213 [ORAAGENT(17427)]CRS-5011: Check of resource "_mgmtdb" failed: details at "(:CLSN00007:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_oraagent_oracle.trc"
2017-12-26 21:39:28.231 [ORAAGENT(17427)]CRS-5011: Check of resource "_mgmtdb" failed: details at "(:CLSN00007:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_oraagent_oracle.trc"
2017-12-26 21:39:28.416 [ORAAGENT(17427)]CRS-5017: The resource action "ora.cdb.db start" encountered the following error:
2017-12-26 21:39:28.416+ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+NPCDBDG/CDB/spfileCDB.ora'
ORA-17503: ksfdopn:2 Failed to open file +NPCDBDG/CDB/spfileCDB.ora
ORA-01092: ORACLE instance terminated. Disconnection forced
. For details refer to "(:CLSN00107:)" in "/vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_oraagent_oracle.trc".
2017-12-26 21:39:28.494 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 120040 msecs.
2017-12-26 21:39:29.449 [CRSD(17321)]CRS-2878: Failed to restart resource 'ora.cdb.db'
2017-12-26 21:39:29.449 [CRSD(17321)]CRS-2769: Unable to failover resource 'ora.cdb.db'.
2017-12-26 21:39:36.495 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 128040 msecs.
2017-12-26 21:39:44.495 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 136040 msecs.
2017-12-26 21:39:52.496 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 144040 msecs.
2017-12-26 21:42:26.402 [CRSD(17321)]CRS-1024: The Cluster Ready Service on this node terminated because the ASM instance was not active on this node. Details at (:PROCR00009:) in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd.trc.
2017-12-26 21:42:26.410 [ORAROOTAGENT(17431)]CRS-5822: Agent '/vendor/app/' disconnected from server. Details at (:CRSAGF00117:) {0:3:7} in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_orarootagent_root.trc.
2017-12-26 21:42:26.411 [ORAAGENT(17427)]CRS-5822: Agent '/vendor/app/' disconnected from server. Details at (:CRSAGF00117:) {0:1:41} in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_oraagent_oracle.trc.
2017-12-26 21:42:26.412 [SCRIPTAGENT(3165)]CRS-5822: Agent '/vendor/app/' disconnected from server. Details at (:CRSAGF00117:) {0:5:18} in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd_scriptagent_oracle.trc.
2017-12-26 21:42:26.461 [CRSD(1773)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 1773
2017-12-26 21:42:33.505 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 120050 msecs.
2017-12-26 21:42:35.822 [CRSD(1773)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd.trc.
2017-12-26 21:42:41.506 [OCSSD(15404)]CRS-1719: Cluster Synchronization Service daemon (CSSD) clssnmvDiskPingThread_0 not scheduled for 128050 msecs.
2017-12-26 21:42:44.098 [CRSD(1773)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd.trc.
2017-12-26 21:42:44.101 [CRSD(1773)]CRS-0804: Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
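To read a long alert log like the one above as a timeline, a small script can pull the timestamp, the daemon and the CRS message code out of each line. This is only a sketch: the regular expression is fitted to the line format shown above, and `parse_crs_line` and `first_seen` are names invented for this illustration.

```python
import re

# Matches lines such as:
# 2017-12-26 21:34:41.477 [OCSSD(15404)]CRS-1604: CSSD voting file is offline: ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<daemon>[A-Z]+)\((?P<pid>\d+)\)\]"
    r"(?P<code>CRS-\d+): (?P<text>.*)$"
)


def parse_crs_line(line):
    """Return a dict for a CRS alert log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def first_seen(lines):
    """Map each CRS code to the timestamp of its first occurrence."""
    seen = {}
    for line in lines:
        entry = parse_crs_line(line)
        if entry and entry["code"] not in seen:
            seen[entry["code"]] = entry["ts"]
    return seen


if __name__ == "__main__":
    # Sample lines copied from the log above; the ORA- line is skipped.
    sample = [
        "2017-12-26 21:34:41.477 [OCSSD(15404)]CRS-1672: The number of voting files currently available 2 has fallen to the minimum number of voting files required 2.",
        "ORA-01565: error in identifying file '+NPCDBDG/CDB/spfileCDB.ora'",
        "2017-12-26 21:42:35.822 [CRSD(1773)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /vendor/app/oracle/diag/crs/p-db-02/crs/trace/crsd.trc.",
    ]
    for code, ts in first_seen(sample).items():
        print(ts, code)
```

Continuation lines that carry only an ORA- error have no timestamp and are ignored by this sketch, so they would need separate handling if they matter for the analysis.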
ASM alert log:
See the attached file.
I don't understand why the DB still goes down even though all diskgroups are configured with normal redundancy.
Thank you,