Oracle Study Notes: Backing Up and Restoring the ASM Disk Header (2)
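III. Backup and Restore with KFED
This works like the dd method: export the ASM disk header, then import it again. One thing to watch out for is that the disk header may change between the export and the import. Check these fields before importing.
For example: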
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATA_0000 ; 0x028: length=9
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA_0000 ; 0x068: length=9
kfdhdb.crestmp.hi: 32937833 ; 0x0a8: HOUR=0x9 DAYS=0x1b MNTH=0x5 YEAR=0x7da
kfdhdb.mntstmp.hi: 32937834 ; 0x0b0: HOUR=0xa DAYS=0x1b MNTH=0x5 YEAR=0x7da
kfdhdb.secsize: 512 ; 0x0b8: 0x0200
kfdhdb.blksize: 4096 ; 0x0ba: 0x1000
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize: 51200 ; 0x0c4: 0x0000c800
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
kfdhdb.dbcompat: 168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi: 32937833 ; 0x0e4: HOUR=0x9 DAYS=0x1b MNTH=0x5 YEAR=0x7da
kfdhdb.grpstmp.lo: 1704339456 ; 0x0e8: USEC=0x0 MSEC=0x18a SECS=0x19 MINS=0x19
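What the fields above mean:
dsknum: disk number
grptyp: redundancy type of the disk group (here EXTERNAL REDUNDANCY). The main types are:
NORMAL REDUNDANCY - two-way mirroring, requiring two failure groups.
HIGH REDUNDANCY - three-way mirroring, requiring three failure groups.
EXTERNAL REDUNDANCY - no mirroring, for disks that are already protected by hardware mirroring or RAID.
hdrsts: disk header status (3 = KFDHDR_MEMBER)
dskname: disk name within ASM
grpname: disk group name
fgname: failure group name
crestmp.hi: creation time of the disk (the disk group's own creation time is in grpstmp)
mntstmp.hi: mount time
blksize: metadata block size, 4096 bytes
ausize: allocation unit (AU) size, 1 MB by default
dsksize: disk size in AUs (51200 x 1 MB = 50 GB here)
f1b1locn: AU number holding file 1 block 1, i.e. the file directory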
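The crestmp, mntstmp and grpstmp values are packed integers that kfed decodes in its annotation column. Comparing the raw numbers with kfed's decoding in the dumps in this post suggests the bit layout below. That layout is my inference from these samples, not a documented Oracle format. Here is a small POSIX sh sketch that decodes the values, which helps when timestamps have to be copied by hand while rebuilding a header. The function names are hypothetical helpers.

```shell
#!/bin/sh
# Decode the packed ASM timestamp words (crestmp, mntstmp, grpstmp).
# Bit layout inferred from the kfed dumps in this post, not from Oracle docs:
#   .hi = YEAR<<14 | MNTH<<10 | DAYS<<5 | HOUR
#   .lo = MINS<<26 | SECS<<20 | MSEC<<10 | USEC
# USEC (bits 0-9) was always 0 in these dumps and is not printed.

# Date part of a .hi word, printed as YYYY-MM-DD HH
decode_stmp_hi() {
    hi=$1
    printf '%04d-%02d-%02d %02d\n' \
        $(( hi >> 14 )) \
        $(( (hi >> 10) & 15 )) \
        $(( (hi >> 5) & 31 )) \
        $(( hi & 31 ))
}

# Time part of a .lo word, printed as MM:SS.mmm (minutes, seconds, milliseconds)
decode_stmp_lo() {
    lo=$1
    printf '%02d:%02d.%03d\n' \
        $(( (lo >> 26) & 63 )) \
        $(( (lo >> 20) & 63 )) \
        $(( (lo >> 10) & 1023 ))
}
```

For the grpstmp lines in the dump above, decode_stmp_hi 32937833 prints 2010-05-27 09 and decode_stmp_lo 1704339456 prints 25:25.394. These match kfed's own HOUR/DAYS/MNTH/YEAR and MSEC/SECS/MINS annotations.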
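One point is worth stressing. If a disk group contains several disks that were all added to it at the same time, their disk headers are nearly identical. So when one disk header in such a group is corrupted, you can export the header of another disk in the same group and adjust the disk-specific fields (dsknum, dskname, fgname and the timestamps, as in section IV). Then import it into the corrupted disk.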
3.1 Back up the ASM disk header with KFED
SYS@anqing2(rac2)> select path from v$asm_disk;

PATH
--------------------------------------------------------------------------------
/dev/mapper/datap1
/dev/mapper/frap1
[oracle@rac2 u01]$ kfed read /dev/mapper/datap1 text=/u01/datap1disker
[oracle@rac2 u01]$ ll datap1disker
-rw-r--r-- 1 oracle oinstall 6607 Sep 2 10:48 datap1disker
[oracle@rac2 u01]$ cat datap1disker
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check: 868534624 ; 0x00c: 0x33c4c960
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISKDATA ; 0x000: length=12
kfdhdb.driver.reserved[0]: 1096040772 ; 0x008: 0x41544144
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATA ; 0x028: length=4
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA ; 0x068: length=4
kfdhdb.capname: ; 0x088: length=0
kfdhdb.crestmp.hi: 32952076 ; 0x0a8: HOUR=0xc DAYS=0x18 MNTH=0x3 YEAR=0x7db
kfdhdb.crestmp.lo: 3374491648 ; 0x0ac: USEC=0x0 MSEC=0xaa SECS=0x12 MINS=0x32
kfdhdb.mntstmp.hi: 32957488 ; 0x0b0: HOUR=0x10 DAYS=0x1 MNTH=0x9 YEAR=0x7db
kfdhdb.mntstmp.lo: 2804987904 ; 0x0b4: USEC=0x0 MSEC=0x2e SECS=0x33 MINS=0x29
kfdhdb.secsize: 512 ; 0x0b8: 0x0200
kfdhdb.blksize: 4096 ; 0x0ba: 0x1000
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact: 113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize: 11993 ; 0x0c4: 0x00002ed9
kfdhdb.pmcnt: 2 ; 0x0c8: 0x00000002
kfdhdb.fstlocn: 1 ; 0x0cc: 0x00000001
kfdhdb.altlocn: 2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
kfdhdb.redomirrors[0]: 0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]: 0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]: 0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]: 0 ; 0x0de: 0x0000
kfdhdb.dbcompat: 168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi: 32952076 ; 0x0e4: HOUR=0xc DAYS=0x18 MNTH=0x3 YEAR=0x7db
kfdhdb.grpstmp.lo: 3374396416 ; 0x0e8: USEC=0x0 MSEC=0x4d SECS=0x12 MINS=0x32
.....
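Section 3.1 saves only the kfed text dump. The official procedure in section IV also keeps a raw dd copy of the first 4 KB. Below is a sketch that does both for one disk. It assumes kfed is on the PATH and skips the text dump when it is not. The default directory /u01/asm_hdr_backup and the file naming are arbitrary choices of mine, not Oracle conventions.

```shell
#!/bin/sh
# Back up the header block (first 4 KB) of one ASM disk as a raw dd image,
# plus a kfed text dump when kfed is available.
# The default directory /u01/asm_hdr_backup is an arbitrary example.
backup_asm_header() {
    dev=$1
    dir=${2:-/u01/asm_hdr_backup}
    name=$(basename "$dev")
    mkdir -p "$dir" || return 1
    # Raw copy of the 4 KB header block, restorable later with dd
    dd if="$dev" of="$dir/$name.hdr4k" bs=4096 count=1 2>/dev/null || return 1
    # Text copy in the form kfed merge reads back (see 3.3)
    if command -v kfed >/dev/null 2>&1; then
        kfed read "$dev" text="$dir/$name.kfed.txt" || return 1
    fi
}
```

For the two disks listed above, run `for d in /dev/mapper/datap1 /dev/mapper/frap1; do backup_asm_header "$d"; done`. Keeping the raw copy alongside the text dump means there is still something to fall back on if a later kfed merge does not produce a mountable header.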
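3.2 Wipe the ASM disk header
The first 4 KB of the disk header is cleared because leftover garbage bits make the check value (the block checksum, kfbh.check) come out wrong. If you clear the header first and then merge, the checksum is computed correctly. Without clearing, the first 4 KB holds not only the merged header information but also other corrupted bytes, so the merge produces a bad checksum. Even if you patch the hex value of check by hand, the disk group still cannot be mounted. v$asm_disk then shows header_status as PROVISIONED (a bad check value shows INCOMPATIBLE). Clear the first 4 KB before merging so that check comes out right.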
[oracle@rac2 u01]$ dd if=/dev/zero of=/dev/mapper/datap1 bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000133662 seconds, 30.6 MB/s
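This dd destroys the header outright, so it is worth refusing to run it unless a backup is already in place. The sketch below is such a guard. The function name and the rule "a 4096-byte backup file must exist" are my own additions, not part of the original procedure.

```shell
#!/bin/sh
# Zero the first 4 KB of a disk, but only after confirming that a raw
# 4096-byte backup of that header exists. Hypothetical helper, not an Oracle tool.
wipe_asm_header() {
    dev=$1
    backup=$2
    # Refuse to proceed without a 4096-byte backup file
    if [ ! -f "$backup" ] || [ "$(wc -c < "$backup")" -ne 4096 ]; then
        echo "no valid 4 KB backup at $backup; refusing to wipe $dev" >&2
        return 1
    fi
    # conv=notrunc matters when $dev is a regular file (e.g. a test image):
    # without it dd would truncate the file to 4 KB. Block devices are unaffected.
    dd if=/dev/zero of="$dev" bs=4096 count=1 conv=notrunc 2>/dev/null
}
```

A call would look like `wipe_asm_header /dev/mapper/datap1 /path/to/datap1.hdr4k`, where the backup path is whatever you saved earlier. The conv=notrunc flag also makes it safe to rehearse the whole procedure on a file-backed test image first.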
Note:
I ran this dd while the database was open. Let's look at the state at this point.
SYS@anqing2(rac2)> select name,state,offline_disks from v$asm_diskgroup;

NAME                           STATE       OFFLINE_DISKS
------------------------------ ----------- -------------
DATA                           CONNECTED               0
FRA                            CONNECTED               0

SYS@anqing2(rac2)> select mount_status,header_status,state,path from v$asm_disk;

MOUNT_S HEADER_STATU STATE    PATH
------- ------------ -------- ----------------------------------------
OPENED  UNKNOWN      NORMAL   /dev/mapper/datap1
OPENED  UNKNOWN      NORMAL   /dev/mapper/frap1
Run a transaction:
SYS@anqing2(rac2)> create table d1 as select * from all_objects;

Table created.

SYS@anqing2(rac2)> select count(*) from d1;

  COUNT(*)
----------
     49868

Transactions still work normally.
Now restart the ASM instance.
SYS@+ASM2(rac2)> shutdown immediate
ASM diskgroups dismounted
ASM instance shutdown
SYS@+ASM2(rac2)> startup
ASM instance started

Total System Global Area   92274688 bytes
Fixed Size                  1265960 bytes
Variable Size              65842904 bytes
ASM Cache                  25165824 bytes
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
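Only after the restart does the earlier dd damage take effect. While the instance was running it kept using the header information it had already read. The header is read again only when the disks are discovered at startup.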
3.3 Restore with kfed merge
As mentioned earlier, check the previously exported content before restoring with merge, because it may have changed.
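One way to do that check is to dump the same disk again while its header is still healthy and list the fields whose values differ from the saved file. For a disk that is already damaged, dump a healthy disk of the same group instead. This is a sketch that assumes the "name: value ; offset: ..." text layout kfed prints. Fields such as kfbh.check and mntstmp can legitimately differ. The ones to look at closely are those listed at the start of this section. The function name is a hypothetical helper.

```shell
#!/bin/sh
# List fields whose values differ between two kfed text dumps,
# printed as "field: old -> new".
kfed_diff_fields() {
    awk '
        # Split "name: value ; 0xNNN: ..." into name and value
        {
            name = $0; sub(/:.*/, "", name)
            val = $0;  sub(/^[^:]*:[ \t]*/, "", val); sub(/[ \t]*;.*/, "", val)
        }
        # First file: remember every value by field name
        NR == FNR { old[name] = val; next }
        # Second file: report fields present in both whose value changed
        (name in old) && old[name] != val {
            printf "%s: %s -> %s\n", name, old[name], val
        }
    ' "$1" "$2"
}
```

For example, take a fresh dump with `kfed read /dev/mapper/datap1 text=/tmp/datap1.now`, then run `kfed_diff_fields /u01/datap1disker /tmp/datap1.now` to compare it with the file saved in 3.1.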
Here I simply merge it back with kfed and then mount the DATA disk group.
[oracle@rac2 u01]$ kfed merge /dev/mapper/datap1 text=/u01/datap1disker
SYS@+ASM2(rac2)> select name,state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
DATA                           DISMOUNTED
FRA                            MOUNTED

SYS@+ASM2(rac2)> alter diskgroup DATA mount;

Diskgroup altered.

SYS@+ASM2(rac2)> select name,state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
FRA                            MOUNTED

The disk group mounts successfully.
IV. Rebuilding the ASM Disk Header
Oracle official document:
Creating a New ASM Disk Header After Existing One Is Corrupted
http://blog.csdn.net/tianlesoftware/article/details/6740716
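ASM is quite fragile in this area. If we have no backup of the disk header, or kfed merge fails as well, one last resort remains: rebuild the disk header. A rebuild does not succeed in every case. If it fails, the only remaining option is to re-create the disk group and restore the whole database from backup.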
Oracle 11g introduced the AMDU tool, but it can also be used with 10g. See MOS note [ID 553639.1]:
AMDU is a tool introduced in 11g where it is possible to extract all the available metadata from one or more ASM disks, generate formatted block printouts from the dump output, extract one or more files from a diskgroup (mounted/unmounted) and write them to the OS file system. This tool is very important when dealing with internal errors related to the ASM metadata. Although this tool was released with 11g, it can be used with ASM 10g.
In addition, in 11gR2 the asmcmd commands md_backup and md_restore can also be used for backups. For how to use them, see Eygle's blog:
http://www.eygle.com/archives/2011/03/asm_md_backup_md_restore.html
Querying the x$kfdat fixed table shows how many AUs each file# uses:
SYS@+ASM2(rac2)> select group_kfdat group#, FNUM_KFDAT file#, sum(1) AU_used from x$kfdat where v_kfdat='V' group by group_kfdat, FNUM_KFDAT, v_kfdat;

    GROUP#      FILE#    AU_USED
---------- ---------- ----------
         1          0          2
         1          1          2
         1          2          1
         1          3         85
         1          4          2
         1          5          1
         1          6          1

The items that matter when rebuilding a disk header are the file directory and the disk directory:
(1). File#0, AU=0: disk header (disk name, etc.), Allocation Table (AT) and Free Space Table (FST)
(2). File#0, AU=1: Partner Status Table (PST)
(3). File#1: File Directory (files and their extent pointers)
(4). File#2: Disk Directory
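kfed addresses metadata by AU number and block number (its aunum= and blknum= arguments). On disk that is simply a byte offset of aunum x ausize + blknum x blksize. The same arithmetic lets you take a raw dd copy of a metadata block before experimenting with it. Below is a sketch that assumes the 1 MB AU and 4 KB block sizes seen in this post's headers. The function names are hypothetical.

```shell
#!/bin/sh
# Locate an ASM metadata block by AU number and block number, the same
# coordinates kfed takes as aunum= and blknum=.
# Defaults assume this post's kfdhdb.ausize (1048576) and kfdhdb.blksize (4096).

# Byte offset of block <blknum> in allocation unit <aunum>
asm_block_offset() {
    aunum=$1
    blknum=$2
    ausize=${3:-1048576}
    blksize=${4:-4096}
    echo $(( aunum * ausize + blknum * blksize ))
}

# Copy that single block to a file with dd; the source disk is only read.
dump_asm_block() {
    dev=$1 aunum=$2 blknum=$3 out=$4
    ausize=${5:-1048576}
    blksize=${6:-4096}
    skip=$(( $(asm_block_offset "$aunum" "$blknum" "$ausize" "$blksize") / blksize ))
    dd if="$dev" of="$out" bs="$blksize" skip="$skip" count=1 2>/dev/null
}
```

With the defaults, `asm_block_offset 2 2` prints 2105344. That is the byte position of the block that `kfed read ... aunum=2 blknum=2` shows in the official example.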
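Notes:
1. Use KFED version 10.2.0.2 or later; otherwise you will hit bug 5039964.
2. The approach to rebuilding the disk header is:
1). Find the file directory, then use the file directory to find the disk directory;
2). From the disk directory, get the disk's information and edit a disk header text file by hand. Finally, kfed merge it into the corresponding disk to regenerate the disk header.
3). The file directory is usually at au=2 on one of the disks in the disk group. If disks have been dropped from or added to the group, the file directory is not necessarily at au=2 and has to be located by hand.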
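Locating the file directory comes down to two checks, both of which the official example below also performs. The first is which disk reports a non-zero kfdhdb.f1b1locn. The second, if that disk is the one that was lost, is which AU starts with a file directory block. Here is a minimal POSIX sh sketch of both checks. It assumes kfed is on the PATH and prints fields in the "name: value ; offset: ..." layout shown in this post. It uses only the "kfed read <disk>" and "kfed read <disk> aunum=N" forms that appear here. The function names are hypothetical helpers, not Oracle tools.

```shell
#!/bin/sh
# Helpers for locating the file directory with kfed.

# Print "<disk> <f1b1locn>" for each disk given. A non-zero f1b1locn marks
# the disk holding file 1 block 1, and the value is the AU it lives in.
show_f1b1() {
    for d in "$@"; do
        loc=$(kfed read "$d" 2>/dev/null |
              awk '/^kfdhdb\.f1b1locn:/ { print $2; exit }')
        echo "$d ${loc:-unreadable}"
    done
}

# Print "<au> <block type>" for block 0 of every AU on one disk, so the
# output can be filtered for KFBTYP_FILEDIR or KFBTYP_LISTHEAD.
# $2 is the number of AUs to scan (disk size in bytes / AU size); any
# further arguments are passed to kfed unchanged (e.g. an AU-size option).
list_au_types() {
    dev=$1
    nau=$2
    shift 2
    au=0
    while [ "$au" -lt "$nau" ]; do
        btype=$(kfed read "$dev" aunum="$au" "$@" 2>/dev/null |
                awk '/^kfbh\.type:/ { print $NF; exit }')
        echo "$au ${btype:-unreadable}"
        au=$((au + 1))
    done
}
```

For the disks in the official example, `show_f1b1 /ocfs02/asm/data01 /ocfs02/asm/data02` should report 2 for data01 and 0 for data02. On the lost disk, `list_au_types <disk> <number of AUs> | grep KFBTYP_FILEDIR` lists candidate AUs. Reading a large disk one AU at a time is slow, so start by checking the AUs near the beginning of the disk.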
4.1 The official example
For this test we have 3 ASM disks in an external redundancy diskgroup. For the test we will wipe out the header for ASM disk 3 (data03):
/ocfs02/asm/data01
/ocfs02/asm/data02
/ocfs02/asm/data03
1. Make sure all ASM instances are shut down.

2. Make a backup of the first 4k of the bad disk with dd:
dd if= of= bs=4096 count=1

3. Check the existing disks and see which one has "file 1 block 1". To find the disk with f1b1, run:
kfed read | grep f1b1
Example:
$ kfed read /ocfs02/asm/data01 | grep f1b1
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
$ kfed read /ocfs02/asm/data02 | grep f1b1
kfdhdb.f1b1locn: 0 ; 0x0d4: 0x00000000
Since data01 has a non-zero value, data01 is the disk with "file 1 block 1". Confirm this by checking that you see "KFBTYP_LISTHEAD" in allocation unit 2:
kfed read aunum=2 | grep kfbh.type
Also specify the AU size with AUSZ=# if using a non-default allocation unit size.
Example:
$ kfed read /ocfs02/asm/data01 aunum=2 | grep kfbh.type
kfbh.type: 5 ; 0x002: KFBTYP_LISTHEAD
If the lost disk is the "file 1 block 1" disk, scan every AU of the bad disk until you find a header which claims to be FILE_DIRECTORY (KFBTYP_FILEDIR). Once you find it, you can set f1b1locn to that AU number and continue. If the file directory cannot be found anywhere, we have no choice but to re-create the diskgroup and restore from a backup.

4. Make a copy of a good disk header with kfed. The disk must NOT be the one that contains f1b1, and must be in the SAME diskgroup as the bad disk. In our example this is data02:
kfed read > fix.txt
Example:
$ kfed read /ocfs02/asm/data02 > fix.txt

5. Edit fix.txt and change the following fields to the proper values (use the ASM alert log for reference):
kfdhdb.dsknum
kfdhdb.dskname
kfdhdb.fgname
Example: check the alert log for the proper names:
NOTE: cache opening disk 0 of grp 1:DATA_0000 path:/ocfs02/asm/data01
NOTE: cache opening disk 1 of grp 1:DATA_0001 path:/ocfs02/asm/data02
NOTE: cache opening disk 2 of grp 1:DATA_0002 path:/ocfs02/asm/data03
Old values from fix.txt:
kfdhdb.dsknum: 1 ; 0x024: 0x0001
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATA_0001 ; 0x028: length=9
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA_0001 ; 0x068: length=9
New values in fix.txt:
kfdhdb.dsknum: 2 ; 0x024: 0x0002
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATA_0002 ; 0x028: length=9
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA_0002 ; 0x068: length=9

6. Find the disk directory by dumping aunum=2 and blknum=2 on the disk with f1b1:
kfed read aunum=2 blknum=2 | more
Example:
$ kfed read /ocfs02/asm/data01 aunum=2 blknum=2 | more
kfffde[0].xptr.au: 2 ; 0x4a0: 0x00000002
kfffde[0].xptr.disk: 2 ; 0x4a4: 0x0002
kfffde[0].xptr.flags: 0 ; 0x4a6: L=0 E=0 D=0 S=0
kfffde[0].xptr.chk: 42 ; 0x4a7: 0x2a
kfffde[1].xptr.au: 4294967295 ; 0x4a8: 0xffffffff
kfffde[1].xptr.disk: 65535 ; 0x4ac: 0xffff
kfffde[1].xptr.flags: 0 ; 0x4ae: L=0 E=0 D=0 S=0
kfffde[1].xptr.chk: 42 ; 0x4af: 0x2a
After the initial file directory header you will see the extent map. If the diskgroup uses external redundancy, each entry refers to one extent of the file. For normal redundancy every pair of entries is an extent set; similarly, for high redundancy every three entries ([0][1][2]) form an extent set. Here we see that the disk directory is at au = 2 on disk number = 2. In this example it turned out to be at AU 2, but it is not guaranteed to always be there.

7. Once the disk directory location is found, find the information for your disk number:
kfed read aunum=2 blknum=0 | more
Example:
$ kfed read /ocfs02/asm/data02 aunum=2 blknum=0 | more
kfbh.type: 6 ; 0x002: KFBTYP_DISKDIR
...
kfddde[0].entry.incarn: 1 ; 0x024: A=1 NUMM=0x0
...
kfddde[2].dsknum: 2 ; 0x3b4: 0x0002
kfddde[2].state: 2 ; 0x3b6: KFDSTA_NORMAL
kfddde[2].ub1spare: 0 ; 0x3b7: 0x00
kfddde[2].dskname: DATA_0002 ; 0x3b8: length=9
kfddde[2].fgname: DATA_0002 ; 0x3d8: length=9
kfddde[2].crestmp.hi: 32885842 ; 0x3f8: HOUR=0x12 DAYS=0x2 MNTH=0x3 YEAR=0x7d7
kfddde[2].crestmp.lo: 3860343808 ; 0x3fc: USEC=0x0 MSEC=0x20b SECS=0x21 MINS=0x39
kfddde[2].failstmp.hi: 0 ; 0x400: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[2].failstmp.lo: 0 ; 0x404: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
The various kfddde entries are the disk directory entries. Only entries whose entry.incarn shows A=1 are allocated entries. You might find entries with dskname populated, but if A=0 the entry has been deleted.

8. Now go back to fix.txt and adjust crestmp.hi and crestmp.lo to match what the disk directory shows. If they are already the same, leave them.
Example:
Before:
kfdhdb.crestmp.hi: 32879468 ; 0x0a8: HOUR=0xc DAYS=0x1b MNTH=0xc YEAR=0x7d6
kfdhdb.crestmp.lo: 296378368 ; 0x0ac: USEC=0x0 MSEC=0x298 SECS=0x1a MINS=0x4
kfdhdb.mntstmp.hi: 32879468 ; 0x0b0: HOUR=0xc DAYS=0x1b MNTH=0xc YEAR=0x7d6
kfdhdb.mntstmp.lo: 309633024 ; 0x0b4: USEC=0x0 MSEC=0x128 SECS=0x27 MINS=0x4
After:
kfdhdb.crestmp.hi: 32885842 ; 0x0a8: HOUR=0x12 DAYS=0x2 MNTH=0x3 YEAR=0x7d7
kfdhdb.crestmp.lo: 3860343808 ; 0x0ac: USEC=0x0 MSEC=0x20b SECS=0x21 MINS=0x39
kfdhdb.mntstmp.hi: 32885842 ; 0x0b0: HOUR=0x12 DAYS=0x2 MNTH=0x3 YEAR=0x7d7
kfdhdb.mntstmp.lo: 3870944256 ; 0x0b4: USEC=0x0 MSEC=0x27b SECS=0x2b MINS=0x39

9. Do a kfed merge to write the new header into the disk using fix.txt:
kfed merge text=fix.txt
Example:
kfed merge /ocfs02/asm/data03 text=fix.txt
If you are using ASMLIB, at this point you also need to run the following to fix the ASMLIB portion of the header:
/etc/init.d/oracleasm force-renamedisk /dev/sdbg1
/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks

10. Start the ASM instance in nomount mode:
SQL> startup nomount;

11. Check v$asm_disk.header_status to verify that the disk header is in the "MEMBER" state.
Example:
SQL> select path, header_status from v$asm_disk where path like '%data03%';

PATH
--------------------------------------------------------------------------------
HEADER_STATU
------------
/ocfs02/asm/data03
MEMBER

12. Mount the diskgroup:
alter diskgroup mount;
If the diskgroup fails to mount at this point, consider either re-creating the diskgroup and restoring, or engaging BDE to assist. You may also want to clear the first 4k of the disk with dd and then do the kfed merge again, in case extra characters are causing problems (MAKE SURE YOU HAVE A BACKUP OF THE FIRST 4K FIRST):
Example:
dd if= of= bs=4096 count=1
dd if=/dev/zero of= bs=4096 count=1

4.2 Notes
Every disk group in my test environment has only one disk, so there is no second disk in the group to copy a header from. I could therefore only restore from a backup, not test a rebuild. A rebuild would need the following values, with dsknum, dskname and fgname located through the file directory and disk directory (the official example reads them from the ASM alert log):
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATA ; 0x028: length=4
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA ; 0x068: length=4
and the following from the disk directory:
kfdhdb.crestmp.hi: 32885842 ; 0x0a8: HOUR=0x12 DAYS=0x2 MNTH=0x3 YEAR=0x7d7
kfdhdb.crestmp.lo: 3860343808 ; 0x0ac: USEC=0x0 MSEC=0x20b SECS=0x21 MINS=0x39
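Steps 5 and 8 of the official example, and the value list in 4.2, come down to replacing a handful of values in fix.txt. Below is a hedged awk-based sketch that swaps the value of one named field. It leaves the field name and everything from " ;" onward untouched. The official example also updates the annotation after ";" (for instance 0x0002 for dsknum 2), which this sketch does not do, so edit those by hand if you want them to match. The function name is hypothetical.

```shell
#!/bin/sh
# Replace the value of one field in a kfed text dump such as fix.txt.
# Only the value between "name:" and " ;" changes; the ";" annotation
# (offset, hex, length=...) is left exactly as it was.
set_kfed_field() {
    file=$1 field=$2 value=$3
    if awk -v f="$field" -v v="$value" '
        {
            name = $0
            sub(/:.*/, "", name)
            # RSTART marks the blanks just before the first ";"
            if (name == f && match($0, /[ \t]*;/)) {
                $0 = f ": " v substr($0, RSTART)
                found = 1
            }
            print
        }
        END { if (!found) exit 1 }
    ' "$file" > "$file.tmp"; then
        mv "$file.tmp" "$file"
    else
        # Field not found (or awk failed): leave fix.txt as it was
        rm -f "$file.tmp"
        return 1
    fi
}
```

With the official example's values, the edits would be:
set_kfed_field fix.txt kfdhdb.dsknum 2
set_kfed_field fix.txt kfdhdb.dskname DATA_0002
set_kfed_field fix.txt kfdhdb.fgname DATA_0002
set_kfed_field fix.txt kfdhdb.crestmp.hi 32885842
set_kfed_field fix.txt kfdhdb.crestmp.lo 3860343808
A misspelled field name makes the function return non-zero without touching the file, rather than silently doing nothing.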
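After regenerating the disk header, restore it with kfed merge; for the detailed steps, follow the official example above. In short: nothing matters more than having a backup.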