Oracle Database - Enterprise Edition - Version 11.2.0.3 to 11.2.0.3 [Release 11.2]
Information in this document applies to any platform.
On 11.2.0.3 (prior to the 11.2.0.3.4 PSU), one of the cluster nodes may intermittently experience a CRS restart (without a node reboot), with ocssd.log reporting "clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1". As a result, the ASM and database instances on the affected node are also restarted. The cause is a race condition when different threads check voting disk availability. The issue was reported and fixed in unpublished bug 13869978.
It only affects clusters with a single voting disk/file running Grid Infrastructure 11.2.0.3 without the 11.2.0.3.4 PSU applied.
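The exposure condition above can be written as a small check. This is a minimal sketch, not an Oracle-provided tool: the function name is_exposed and the convention of passing the Grid Infrastructure version as a dotted string such as 11.2.0.3.2 (fifth field taken as the PSU level) are assumptions made for illustration. It does not handle the Windows bundle numbering.

```python
def is_exposed(gi_version: str, voting_disk_count: int) -> bool:
    """Return True if the configuration matches the conditions in this note.

    gi_version is assumed to be a dotted string such as "11.2.0.3.2";
    a missing fifth field is treated as PSU level 0 (base release).
    """
    parts = [int(p) for p in gi_version.strip().split(".")]
    # Pad to five fields so "11.2.0.3" is read as "11.2.0.3.0".
    parts += [0] * (5 - len(parts))
    base, psu = parts[:4], parts[4]
    # Affected: 11.2.0.3 below PSU 4, with exactly one voting disk/file.
    return base == [11, 2, 0, 3] and psu < 4 and voting_disk_count == 1
```

For example, is_exposed("11.2.0.3.2", 1) is true, while the same version with three voting disks, or 11.2.0.3.4 with one voting disk, is not.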
<grid-home>/log/<node>/cssd/ocssd.log shows the following:
2012-05-28 07:45:32.823: [ CSSD][1075423552](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1
2012-05-28 07:45:32.835: [ CSSD][1075423552]###################################
2012-05-28 07:45:32.835: [ CSSD][1075423552]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
2012-05-28 07:45:32.835: [ CSSD][1075423552]###################################
2012-05-28 07:45:32.835: [ CSSD][1075423552](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2012-05-28 07:45:32.849: [ CSSD][1075423552]
----- Call Stack Trace -----
2012-05-28 07:45:32.857: [ CSSD][1075423552]calling call entry argument values in hex
2012-05-28 07:45:32.858: [ CSSD][1075423552]location type point (? means dubious value)
2012-05-28 07:45:32.859: [ CSSD][1075423552]-------------------- -------- -------------------- ----------------------------
2012-05-28 07:45:32.881: [ CSSD][1075423552]clssscExit()+740 call kgdsdst() 000000000 ? 000000000 ?
2012-05-28 07:45:32.884: [ CSSD][1075423552]clssnmvDiskCheck()+ call clssscExit() 2AAAAC477780 ? 000000002 ?
2012-05-28 07:45:32.887: [ CSSD][1075423552]clssnmvDiskPingMoni call clssnmvDiskCheck() 2AAAAC477780 ? 2AAAAC0A3C40 ?
2012-05-28 07:45:32.888: [ CSSD][1075423552]torThread()+423 04019A0B8 ? 000000000 ?
2012-05-28 07:45:32.890: [ CSSD][1075423552]clssscthrdmain()+25 call clssnmvDiskPingMoni 2AAAAC477780 ? 2AAAAC0A3C40 ?
In some cases, the following may also appear in ocssd.log:
2012-03-20 23:11:19.337: [ CSSD][3956]clssnmFindVFByVDIN: Requested guid 0b11163b-77614f16-bf6dea8e-e0b9a98b, vdisk guid 0b11163b-77614f16-bf6dea8e-e0b9a98b (0000000007D8E248) - len 16, vfile (0000000007D8B980), link (0000000007D8B980)
2012-03-20 23:11:19.337: [ CSSD][3956]clssnmFindVFByVDIN: Voting file not found - queue(0000000007CF8AC0), prev (0000000007D8B980), next (0000000007D8B980)
2012-03-20 23:11:19.337: [ CSSD][3956]clssnmvDiskCheck: No voting file found for guid 0b11163b-77614f16-bf6dea8e-e0b9a98b
By contrast, when there is a genuine voting disk I/O issue, messages such as the following are usually seen in ocssd.log before cssd aborts:
2012-05-22 14:13:21.939: [ CSSD][1101846848]clssnmvDiskCheck: (ORCL:DATA01) No I/O completed after 75% maximum time, 27000 ms, will be considered unusable in 6640 ms
..
2012-05-22 14:13:26.408: [ CSSD][1101846848]clssnmvDiskCheck: (ORCL:DATA01) No I/O completed after 90% maximum time, 27000 ms, will be considered unusable in 2170 ms
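OR
If access to the voting disk is lost entirely rather than slow, an OS error is printed instead. If neither the I/O warnings nor an OS error precede the abort, a storage problem is unlikely and the restart is consistent with this bug.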
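The distinction between the two log patterns can be sketched as a simple scan. The snippet below is an illustrative helper, not part of any Oracle tooling: the function name classify_ocssd_log and its return labels are hypothetical, and the patterns are taken only from the messages quoted in this note. It does not detect the OS-error case, because the exact text of that message is not shown here, and it treats any earlier I/O warning in the excerpt as relevant regardless of how long before the abort it occurred.

```python
import re

# Patterns copied from the ocssd.log messages quoted in this note.
ABORT = re.compile(
    r"clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available"
)
IO_WARNING = re.compile(
    r"clssnmvDiskCheck: \(.*?\) No I/O completed after \d+% maximum time"
)


def classify_ocssd_log(lines):
    """Classify an ocssd.log excerpt (hypothetical labels).

    Returns "io_issue" if the abort is preceded by I/O timeout warnings,
    "possible_bug_13869978" if the abort appears without them, and
    "no_abort" if the abort message is not present at all.
    """
    saw_io_warning = False
    for line in lines:
        if IO_WARNING.search(line):
            saw_io_warning = True
        elif ABORT.search(line):
            return "io_issue" if saw_io_warning else "possible_bug_13869978"
    return "no_abort"
```

A caller would typically pass the lines of the excerpt leading up to the abort, for instance by iterating over an open ocssd.log file. The result is only a hint and should be confirmed by reviewing the log manually.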
Use 3 or more voting disks/files instead of 1.
If the voting disk is on ASM, move the voting disk to a normal or high redundancy diskgroup. Please refer to note 428681.1 OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE) for instructions to move voting disks.
As a best practice, configure multiple voting disks.
It is also recommended to apply the latest PSU/bundle patch, as the fix is included in 11.2.0.3 GI PSU 4 and above, and in 11.2.0.3 Windows Patch Bundle 11 and above.
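To verify the voting disk count against this recommendation, the output of "crsctl query css votedisk" can be inspected. The following sketch assumes that the command output ends with a summary line of the form "Located N voting disk(s)."; that format, along with the helper names count_voting_disks and meets_recommendation, is an assumption for illustration and should be checked against the actual output on the system.

```python
import re


def count_voting_disks(crsctl_output: str) -> int:
    """Extract the voting disk count from captured command output.

    Assumes a summary line such as "Located 1 voting disk(s)." is present.
    """
    match = re.search(r"Located (\d+) voting disk", crsctl_output)
    if match is None:
        raise ValueError("summary line not found in output")
    return int(match.group(1))


def meets_recommendation(count: int) -> bool:
    """This note recommends 3 or more voting disks/files instead of 1."""
    return count >= 3
```

The text passed to count_voting_disks would be captured by running the command as the Grid Infrastructure owner; the sketch deliberately leaves that step out so that it makes no assumptions about paths or environment.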
Solution:
Apply 11.2.0.3 GI PSU 4 or later.