RAC 11gR2: bug triggered by a single voting disk (ID 1466639.1)

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.3 to 11.2.0.3 [Release 11.2]
Information in this document applies to any platform.

DESCRIPTION

On 11.2.0.3 (prior to the 11.2.0.3.4 PSU), one of the cluster nodes may intermittently experience a CRS restart (without a node reboot), with ocssd.log pointing to the message "clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1". As a result, the ASM and database instances on the affected node are also restarted. The cause is a race condition when voting disk availability is checked from different threads. It was reported and fixed in unpublished bug 13869978.

OCCURRENCE

It only affects clusters that have a single voting disk/file and run Grid Infrastructure 11.2.0.3 without the 11.2.0.3.4 PSU applied.
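
The exposure condition above can be written as a small check. This is a minimal sketch, not an Oracle-provided tool: the function name `is_exposed` is hypothetical, and it assumes the Grid Infrastructure version is given as a dotted string whose fifth field is the PSU level (so "11.2.0.3" without a fifth field is treated as having no PSU). It does not handle the Windows patch bundle numbering.

```python
def is_exposed(gi_version: str, voting_files: int) -> bool:
    """Return True if the cluster matches the exposure condition in this note.

    Assumption: gi_version looks like "11.2.0.3.2", where the fifth field
    is the PSU level. Windows bundle numbers are not covered here.
    """
    parts = [int(p) for p in gi_version.split(".")]
    # Pad to five fields so "11.2.0.3" is read as 11.2.0.3.0 (no PSU).
    parts += [0] * (5 - len(parts))
    base, psu = tuple(parts[:4]), parts[4]
    # Affected: base release 11.2.0.3, PSU below 4, exactly one voting file.
    return base == (11, 2, 0, 3) and psu < 4 and voting_files == 1
```

For example, `is_exposed("11.2.0.3.2", 1)` is true, while the same cluster with three voting files, or with PSU 4 applied, is not flagged.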

SYMPTOMS

<grid-home>/log/<node>/cssd/ocssd.log shows the following:

2012-05-28 07:45:32.823: [    CSSD][1075423552](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1
2012-05-28 07:45:32.835: [    CSSD][1075423552]###################################
2012-05-28 07:45:32.835: [    CSSD][1075423552]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
2012-05-28 07:45:32.835: [    CSSD][1075423552]###################################
2012-05-28 07:45:32.835: [    CSSD][1075423552](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2012-05-28 07:45:32.849: [    CSSD][1075423552]


----- Call Stack Trace -----
2012-05-28 07:45:32.857: [    CSSD][1075423552]calling              call     entry                argument values in hex
2012-05-28 07:45:32.858: [    CSSD][1075423552]location             type     point                (? means dubious value)
2012-05-28 07:45:32.859: [    CSSD][1075423552]-------------------- -------- -------------------- ----------------------------
2012-05-28 07:45:32.881: [    CSSD][1075423552]clssscExit()+740     call     kgdsdst()            000000000 ? 000000000 ?
2012-05-28 07:45:32.884: [    CSSD][1075423552]clssnmvDiskCheck()+  call     clssscExit()         2AAAAC477780 ? 000000002 ?
2012-05-28 07:45:32.887: [    CSSD][1075423552]clssnmvDiskPingMoni  call     clssnmvDiskCheck()   2AAAAC477780 ? 2AAAAC0A3C40 ?
2012-05-28 07:45:32.888: [    CSSD][1075423552]torThread()+423                                    04019A0B8 ? 000000000 ?
2012-05-28 07:45:32.890: [    CSSD][1075423552]clssscthrdmain()+25  call     clssnmvDiskPingMoni  2AAAAC477780 ? 2AAAAC0A3C40 ?

In some cases, the following may also show up in ocssd.log:

2012-03-20 23:11:19.337: [    CSSD][3956]clssnmFindVFByVDIN: Requested guid 0b11163b-77614f16-bf6dea8e-e0b9a98b, vdisk guid 0b11163b-77614f16-bf6dea8e-e0b9a98b (0000000007D8E248) - len 16, vfile (0000000007D8B980), link (0000000007D8B980)
2012-03-20 23:11:19.337: [    CSSD][3956]clssnmFindVFByVDIN: Voting file not found - queue(0000000007CF8AC0), prev (0000000007D8B980), next (0000000007D8B980)
2012-03-20 23:11:19.337: [    CSSD][3956]clssnmvDiskCheck: No voting file found for guid 0b11163b-77614f16-bf6dea8e-e0b9a98b

By contrast, if there is a genuine voting disk I/O issue, the following is usually seen in ocssd.log before cssd aborts the node:

2012-05-22 14:13:21.939: [    CSSD][1101846848]clssnmvDiskCheck: (ORCL:DATA01) No I/O completed after 75% maximum time, 27000 ms, will be considered unusable in 6640 ms
..
2012-05-22 14:13:26.408: [    CSSD][1101846848]clssnmvDiskCheck: (ORCL:DATA01) No I/O completed after 90% maximum time, 27000 ms, will be considered unusable in 2170 ms

OR
If access to the voting disk is down rather than slow, an OS error will be printed.
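
To tell the two cases apart in practice, the log signatures quoted above can be scanned for mechanically. The following is a rough sketch only: the function name `classify_ocssd_log` and its return labels are made up for illustration, the patterns are taken from the sample lines in this note, and it does not look for the OS error mentioned for a disk that is down. It considers every line it is given, so pass only the lines shortly before the abort.

```python
import re

# Patterns copied from the ocssd.log samples in this note.
ABORT = re.compile(
    r"clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available"
)
SLOW_IO = re.compile(
    r"clssnmvDiskCheck: \(.+?\) No I/O completed after \d+% maximum time"
)

def classify_ocssd_log(lines):
    """Classify an ocssd.log excerpt around a CSSD abort.

    Returns one of three hypothetical labels:
      "matches-bug-signature": abort with no preceding slow-I/O warning
      "io-problem-likely":     abort preceded by a slow-I/O warning
      "no-abort-found":        the abort message is not in the excerpt
    """
    saw_slow_io = False
    for line in lines:
        if SLOW_IO.search(line):
            saw_slow_io = True
        elif ABORT.search(line):
            return "io-problem-likely" if saw_slow_io else "matches-bug-signature"
    return "no-abort-found"
```

A "matches-bug-signature" result is only a hint that the node restart fits the pattern described here; it is not a confirmation of bug 13869978.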

WORKAROUND

Use 3 or more voting disks/files instead of 1.
If the voting disk is on ASM, move it to a normal or high redundancy diskgroup. Refer to note 428681.1 OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE) for instructions on moving voting disks.

As a best practice, configuring multiple voting disks is recommended.
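
One way to confirm how many voting files a cluster currently has is to parse the output of `crsctl query css votedisk`. The sketch below is an illustration under an assumption: it relies on the output ending with a summary line of the form "Located N voting disk(s).", which should be verified on the actual system. The function name `count_voting_files` is hypothetical.

```python
import re

def count_voting_files(crsctl_output: str) -> int:
    """Extract the voting file count from `crsctl query css votedisk` output.

    Assumption: the output contains a summary line such as
    "Located 1 voting disk(s)." If it does not, ValueError is raised
    instead of guessing a count.
    """
    match = re.search(r"Located (\d+) voting disk\(s\)", crsctl_output)
    if match is None:
        raise ValueError("no 'Located N voting disk(s)' summary line found")
    return int(match.group(1))
```

The text passed in would typically be captured by running the command as the Grid Infrastructure owner; a result of 1 on an unpatched 11.2.0.3 cluster means the workaround above applies.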

PATCHES

Applying the latest PSU/bundle patch is recommended, as the fix is included in 11.2.0.3 GI PSU 4 and above and in 11.2.0.3 Windows Patch Bundle 11 and above.


Solution:

Apply 11.2.0.3 GI PSU 4 or a later patch.


