ASM 11g New Features - How ASM Disk Resync Works. [ID 466326.1]
Modified 05-MAR-2012   Type BULLETIN   Status PUBLISHED
In this Document
Purpose
Scope and Application
ASM Fast Disk Resync Overview
ASM 11g New Features - How ASM Disk Resync Works.
References
Oracle Database 11g introduces several scalability and performance improvements for ASM. One of them is the ASM Fast Disk Resync feature, which quickly resynchronizes ASM disks within a disk group after a transient disk path failure, provided the disk media itself is not corrupted. Any failure that renders a failure group temporarily unavailable is considered a transient failure. Disk path malfunctions, such as cable disconnections, host bus adapter or controller failures, or disk power supply interruptions, can cause transient failures. The duration of a fast mirror resync depends on the duration of the outage, and a resynchronization is typically much shorter than the time required to completely rebuild an entire ASM disk group.
1) In Oracle Database 10g, when a disk was taken offline because it was corrupted or the database could not read from or write to it, ASM rebuilt the contents of the offline disk onto the remaining disks through a rebalance. This was a relatively costly operation that could take hours to complete, even if the disk failure was only transient.
2) Oracle Database 11g introduces the ASM Fast Mirror Resync feature, which significantly reduces the time required to resynchronize a disk after a transient failure. When a disk goes offline, ASM does not rebalance the other disks; instead, it tracks the allocation units that are modified during the outage. Writes to data held on the failed disk continue on the mirror copies stored on the other available disks. When the disk is repaired and brought back online, only the data that belongs to this disk and was modified during the outage is resynchronized. This avoids the heavy rebalance activity.
3) ASM fast disk resync significantly reduces the time required to resynchronize a transient failure of a disk. When a disk goes offline following a transient failure, ASM tracks the extents that are modified during the outage. When the transient failure is repaired, ASM can quickly resynchronize only the ASM disk extents that have been affected during the outage.
4) This feature assumes that the content of the affected ASM disks has not been damaged or modified.
5) When an ASM disk path fails, the ASM disk is taken offline but not dropped if you have set the DISK_REPAIR_TIME attribute for the corresponding disk group. The setting for this attribute determines the duration of a disk outage that ASM tolerates while still being able to resynchronize after you complete the repair.
Note: The tracking mechanism uses one bit for each modified allocation unit, which keeps it very efficient.
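To make the one-bit-per-allocation-unit idea concrete, here is a minimal Python sketch of a stale-AU bitmap. It is a conceptual model under stated assumptions, not Oracle's internal implementation: the class `StaleAUBitmap` and the helper `bitmap_bytes` are hypothetical names, and the 1 MiB AU size in the usage line is an assumed value. It shows why tracking is cheap (a 1 TiB disk needs only 128 KiB of bits) and why resync copies only the AUs that were actually written.

```python
class StaleAUBitmap:
    """Hypothetical model: one bit per allocation unit (AU) of an offline disk.

    A set bit means the AU was written (on its mirror copies) while the disk
    was offline, so it is stale and must be copied back during resync.
    """

    def __init__(self, au_count: int) -> None:
        self.au_count = au_count
        # One bit per AU, packed eight to a byte.
        self.bits = bytearray((au_count + 7) // 8)

    def mark_modified(self, au: int) -> None:
        if not 0 <= au < self.au_count:
            raise IndexError(f"AU {au} out of range")
        self.bits[au // 8] |= 1 << (au % 8)

    def is_stale(self, au: int) -> bool:
        return bool(self.bits[au // 8] & (1 << (au % 8)))

    def stale_aus(self) -> list[int]:
        """AUs a resync would copy, in ascending order."""
        return [au for au in range(self.au_count) if self.is_stale(au)]


def bitmap_bytes(disk_bytes: int, au_bytes: int) -> int:
    """Size of a one-bit-per-AU bitmap for a disk of the given size."""
    au_count = -(-disk_bytes // au_bytes)  # ceiling division
    return (au_count + 7) // 8


if __name__ == "__main__":
    tracker = StaleAUBitmap(au_count=1000)
    for au in (3, 42, 42, 999):  # AU 42 is written twice but tracked once
        tracker.mark_modified(au)
    print(tracker.stale_aus())          # [3, 42, 999]: resync copies 3 AUs, not 1000
    print(bitmap_bytes(2**40, 2**20))   # 131072 bytes (128 KiB) for 1 TiB of 1 MiB AUs
```

A full rebalance would move every allocated AU of the disk; in this model the resync cost grows only with the number of distinct AUs written during the outage.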
Requirements:
1) The disk group must use NORMAL or HIGH redundancy.
2) The disk group attributes COMPATIBLE.ASM and COMPATIBLE.RDBMS must be set to 11.1.0.0.0 or higher.
3) Set the DISK_REPAIR_TIME disk group attribute, which specifies how long ASM keeps an offline disk (the time allowed for the repair) before dropping it. The default is 3.6 hours.
Example:
SQL> ALTER DISKGROUP dgroupA SET ATTRIBUTE 'DISK_REPAIR_TIME'='3H';
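The three requirements above can be checked mechanically. The Python sketch below is a hypothetical helper, not an Oracle tool: `parse_repair_time`, `version_at_least` and `fast_resync_problems` are illustrative names. It assumes that a DISK_REPAIR_TIME value is a number followed by m/M (minutes) or h/H (hours), with a bare number read as hours, and it compares dotted version strings numerically so that 11.1.0.0.0 counts as satisfying 11.1.

```python
from fractions import Fraction

# The default repair time stated above: 3.6 hours.
DEFAULT_DISK_REPAIR_TIME = "3.6h"


def parse_repair_time(value: str) -> float:
    """Convert a DISK_REPAIR_TIME string such as '3H' or '90m' to minutes.

    Assumption: a trailing m/M means minutes, h/H means hours, and a bare
    number is read as hours.
    """
    text = value.strip()
    if text and text[-1].isalpha():
        number, unit = text[:-1], text[-1].lower()
    else:
        number, unit = text, "h"
    if unit not in ("m", "h"):
        raise ValueError(f"unsupported unit in {value!r}")
    # Fraction avoids float rounding for values such as 3.6.
    minutes = Fraction(number) * (60 if unit == "h" else 1)
    return float(minutes)


def version_at_least(version: str, minimum: str) -> bool:
    """Compare dotted versions such as '11.1.0.0.0' numerically."""
    a = [int(part) for part in version.split(".")]
    b = [int(part) for part in minimum.split(".")]
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a >= b


def fast_resync_problems(redundancy: str, compatible_asm: str,
                         compatible_rdbms: str) -> list[str]:
    """Return the unmet fast-resync requirements (empty list if none)."""
    problems = []
    if redundancy.upper() not in ("NORMAL", "HIGH"):
        problems.append("redundancy must be NORMAL or HIGH")
    for name, value in (("COMPATIBLE.ASM", compatible_asm),
                        ("COMPATIBLE.RDBMS", compatible_rdbms)):
        if not version_at_least(value, "11.1"):
            problems.append(f"{name} must be 11.1 or higher, got {value}")
    return problems


if __name__ == "__main__":
    print(parse_repair_time("3H"))                      # 180.0
    print(parse_repair_time(DEFAULT_DISK_REPAIR_TIME))  # 216.0
    print(fast_resync_problems("NORMAL", "11.1.0.0.0", "10.1.0.0.0"))
```

In practice the attribute values would be read from the disk group itself; here they are passed in as plain strings to keep the sketch self-contained.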
4) The disk must be offline (automatically, because of a hardware failure, or manually, for maintenance operations) and must not have been dropped.
To take a disk offline, use the ALTER DISKGROUP … OFFLINE DISK (or OFFLINE DISKS IN FAILGROUP) command.
Example:
ALTER DISKGROUP dgroupA OFFLINE DISKS IN FAILGROUP controller2 DROP AFTER 5H;
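The repair time is associated with the disk group. The DROP AFTER clause of the OFFLINE command, as in the example above, overrides the disk group's repair time for that offline operation. To change the disk group's repair time itself, use:
SQL> ALTER DISKGROUP dgroupA SET ATTRIBUTE 'DISK_REPAIR_TIME'='3H';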
Additional Manual Offline Disk Operations Examples:
SQL> ALTER DISKGROUP DG1 OFFLINE DISK DG1_0003;
SQL> ALTER DISKGROUP DG1 OFFLINE DISK DG1_0003 DROP AFTER 1H;
SQL> ALTER DISKGROUP DG1 OFFLINE DISKS IN FAILGROUP FG1;
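The interaction between DISK_REPAIR_TIME and DROP AFTER can be modeled as a per-disk timer. The Python sketch below is illustrative only: `OfflineDisk` and its methods are hypothetical names, and times are in minutes. It encodes the rule described above (DROP AFTER, when given, overrides the disk group attribute) plus one assumption taken from Oracle's documentation rather than from this note: offline time is counted only while the disk group is mounted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OfflineDisk:
    """Hypothetical model of an offline ASM disk and its repair timer."""
    name: str
    diskgroup_repair_minutes: float             # DISK_REPAIR_TIME of the disk group
    drop_after_minutes: Optional[float] = None  # DROP AFTER clause, if one was given
    offline_minutes: float = 0.0                # offline time counted so far

    def repair_minutes(self) -> float:
        # DROP AFTER on the OFFLINE statement overrides the disk group attribute.
        if self.drop_after_minutes is not None:
            return self.drop_after_minutes
        return self.diskgroup_repair_minutes

    def tick(self, minutes: float, diskgroup_mounted: bool = True) -> None:
        # Assumption: offline time accrues only while the disk group is mounted.
        if diskgroup_mounted:
            self.offline_minutes += minutes

    def remaining_minutes(self) -> float:
        return max(0.0, self.repair_minutes() - self.offline_minutes)

    def should_drop(self) -> bool:
        # Once the timer expires the disk is dropped, and a fast resync is no
        # longer possible for it.
        return self.offline_minutes >= self.repair_minutes()


if __name__ == "__main__":
    # Disk group repair time 3H (180 minutes), overridden by DROP AFTER 1H.
    disk = OfflineDisk("DG1_0003", diskgroup_repair_minutes=180, drop_after_minutes=60)
    disk.tick(45)
    disk.tick(30, diskgroup_mounted=False)  # not counted
    print(disk.remaining_minutes(), disk.should_drop())  # 15.0 False
    disk.tick(15)
    print(disk.should_drop())  # True
```

The design keeps the override decision in one method, so the drop check never needs to know where the effective repair time came from.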
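5) After the transient failure on the affected disks has been corrected, you must explicitly bring the disks back online. ASM then resynchronizes only the allocation units that were modified while the disks were offline.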
Examples:
SQL> ALTER DISKGROUP DG1 ONLINE DISK DG1_0003;
SQL> ALTER DISKGROUP DG1 ONLINE DISKS IN FAILGROUP FG1 POWER 8 WAIT;
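What happens when a disk is brought back online can be sketched as copying only the stale AUs from a current mirror copy. This Python model is a simplification under stated assumptions: `online_and_resync` is a hypothetical function, each disk is represented as a list of per-AU contents, and `mirror` stands for whichever surviving copy holds the current data. It does not model the POWER or WAIT options.

```python
def online_and_resync(offline_disk: list[bytes], mirror: list[bytes],
                      stale: set[int]) -> int:
    """Hypothetical model of ONLINE DISK: copy only stale AUs from the mirror.

    offline_disk and mirror hold per-AU contents; stale holds the AU numbers
    recorded while the disk was offline. Returns the number of AUs copied.
    """
    for au in sorted(stale):
        offline_disk[au] = mirror[au]
    copied = len(stale)
    stale.clear()  # the disk is current again, so tracking starts over
    return copied


if __name__ == "__main__":
    mirror = [b"a", b"b", b"c", b"d"]
    disk = list(mirror)
    # While the disk is offline, AUs 1 and 3 are rewritten on the mirror only.
    mirror[1], mirror[3] = b"B", b"D"
    stale = {1, 3}
    copied = online_and_resync(disk, mirror, stale)
    print(copied, disk == mirror)  # 2 True
```

Untouched AUs (0 and 2 here) are never read or written, which is where the time saving over a rebalance comes from.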
6) If you cannot repair a failure group that is in the offline state, you can use the ALTER DISKGROUP DROP DISKS IN FAILGROUP command with the FORCE option. This ensures that data originally stored on these disks is reconstructed from redundant copies of the data and stored on other disks in the same disk group.
Example:
SQL> ALTER DISKGROUP dgroupA DROP DISKS IN FAILGROUP controller2 FORCE;
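The reconstruction described in item 6 can be pictured as re-mirroring every extent that lost a copy. The Python sketch below is a toy model, not ASM's actual placement algorithm: `remirror_after_drop` is a hypothetical name, extents are mapped to the failure groups holding their copies, and new copies go to the least-loaded surviving failure group that does not already hold the extent, reflecting the rule that copies of an extent live in different failure groups.

```python
def remirror_after_drop(extents: dict[int, list[str]], failgroups: list[str],
                        dropped: str) -> dict[int, list[str]]:
    """Hypothetical model of re-mirroring after DROP DISKS IN FAILGROUP ... FORCE.

    extents maps an extent number to the failure groups that hold its copies
    (two for NORMAL, three for HIGH redundancy). Each copy lost with the
    dropped failure group is rebuilt from a surviving copy and placed in a
    surviving failure group that does not already hold that extent.
    """
    survivors = [fg for fg in failgroups if fg != dropped]
    # Current copies per surviving failure group, used to spread new copies.
    load = {fg: 0 for fg in survivors}
    for holders in extents.values():
        for fg in holders:
            if fg in load:
                load[fg] += 1

    rebuilt = {}
    for ext, holders in sorted(extents.items()):
        kept = [fg for fg in holders if fg != dropped]
        if not kept:
            raise RuntimeError(f"extent {ext}: no surviving copy to rebuild from")
        for _ in range(len(holders) - len(kept)):
            candidates = [fg for fg in survivors if fg not in kept]
            if not candidates:
                raise RuntimeError(f"extent {ext}: not enough failure groups left")
            target = min(candidates, key=lambda fg: load[fg])  # least-loaded
            kept.append(target)
            load[target] += 1
        rebuilt[ext] = kept
    return rebuilt


if __name__ == "__main__":
    extents = {0: ["FG1", "FG2"], 1: ["FG2", "FG3"], 2: ["FG1", "FG3"]}
    print(remirror_after_drop(extents, ["FG1", "FG2", "FG3"], "FG1"))
    # {0: ['FG2', 'FG3'], 1: ['FG2', 'FG3'], 2: ['FG3', 'FG2']}
```

With only two failure groups in a NORMAL redundancy disk group, the model raises an error after one is dropped, since no second failure group is left to hold the rebuilt copy.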