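In a production environment, LUNs had been provisioned on the storage array and aggregated with multipath, but through inexperience I forgot that udev also had to be configured.
When I added the disk to the diskgroup, the database reported insufficient permissions. I naively assumed the only problem was that the character device was owned by root:disk, which prevented it from being added to the diskgroup.
So I used chown on both nodes to change the ownership of the character device to grid:oinstall. Because of how udev works on SUSE, however, the ownership automatically reverted from grid:oinstall to root:disk while the device was being added to the diskgroup, and the add failed. By then part of the disk header had already been written and HEADER_STATUS was MEMBER, so the disk could neither be added to the diskgroup again nor dropped from it, which left me stuck.
Because this was production, I did not dare experiment there, so I reproduced and solved the problem in a test environment.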
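The udev configuration that was missed here usually comes down to a rule that pins the owner, group and mode of the device node, so that udev does not reset them to root:disk. The following is a minimal sketch, assuming a raw-device rule in the common KERNEL/OWNER/GROUP/MODE form. The helper name asm_raw_rule and the grid:asmadmin ownership are illustrative assumptions, not values taken from the production system.

```shell
# Print a udev rule that pins ownership of one raw device (sketch only).
# Arguments: kernel device name (e.g. raw6), owner, group.
# All values are illustrative; use the ones that match your installation.
asm_raw_rule() {
    printf 'KERNEL=="%s", OWNER="%s", GROUP="%s", MODE="0660"\n' "$1" "$2" "$3"
}

asm_raw_rule raw6 grid asmadmin
```

The printed line would go into a rules file under /etc/udev/rules.d/ on both nodes. The file name is up to the administrator.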
Reproducing the problem:
1 Confirm that HEADER_STATUS of /dev/raw/raw6 is CANDIDATE, which means the disk can be added to an ASM diskgroup.
SQL> select PATH,NAME,DISK_NUMBER,GROUP_NUMBER,STATE,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS from v$asm_disk;
PATH                           NAME                 DISK_NUMBER GROUP_NUMBER STATE            MOUNT_STATUS   HEADER_STATUS            MODE_STATUS
------------------------------ -------------------- ----------- ------------ ---------------- -------------- ------------------------ --------------
/dev/raw/raw6                                                 0            0 NORMAL           CLOSED         CANDIDATE                ONLINE
/dev/raw/raw1                  DATA_0000                      0            2 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw3                  DATA_0002                      2            2 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw4                  DATA1_0000                     0            1 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw5                  DATA1_0001                     1            1 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw2                  DATA_0001                      1            2 NORMAL           CACHED         MEMBER                   ONLINE
2 List the diskgroups. The test will add /dev/raw/raw6 to diskgroup DATA.
SQL> select GROUP_NUMBER,NAME from v$asm_diskgroup;
GROUP_NUMBER NAME
------------ --------------------
           1 DATA1
           2 DATA
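3 While /dev/raw/raw6 is being added to diskgroup DATA (wait about 3 s), open another window and change the ownership of /dev/raw/raw6 from grid:asmadmin to root:disk. The following error is then reported: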
SQL> alter diskgroup DATA add disk '/dev/raw/raw6';
alter diskgroup DATA add disk '/dev/raw/raw6'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15075: disk(s) are not visible cluster-wide
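Because the failure depends on the device ownership changing underneath ASM, it can help to check the ownership on each node before and after the add. The following is a minimal sketch, assuming GNU stat as shipped on Linux. The function name check_owner is an illustrative assumption.

```shell
# Return 0 if the owner:group of a device node matches the expected value,
# 1 if it differs, 2 if the node cannot be inspected.
# Relies on GNU stat (-c), which Linux distributions provide.
check_owner() {
    actual=$(stat -c '%U:%G' "$1") || return 2
    if [ "$actual" = "$2" ]; then
        return 0
    fi
    echo "$1 is $actual, expected $2" >&2
    return 1
}

# Example (run on every node before "alter diskgroup ... add disk"):
# check_owner /dev/raw/raw6 grid:asmadmin
```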
4 Restart udev (that is, change the ownership of /dev/raw/raw6 back to grid:asmadmin) and query again.
SQL> select PATH,NAME,DISK_NUMBER,GROUP_NUMBER,STATE,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS from v$asm_disk;
PATH                           NAME                 DISK_NUMBER GROUP_NUMBER STATE            MOUNT_STATUS   HEADER_STATUS            MODE_STATUS
------------------------------ -------------------- ----------- ------------ ---------------- -------------- ------------------------ --------------
/dev/raw/raw6                                                 0            0 NORMAL           CLOSED         MEMBER                   ONLINE
/dev/raw/raw1                  DATA_0000                      0            2 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw3                  DATA_0002                      2            2 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw4                  DATA1_0000                     0            1 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw5                  DATA1_0001                     1            1 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw2                  DATA_0001                      1            2 NORMAL           CACHED         MEMBER                   ONLINE
5 Trying to add /dev/raw/raw6 to DATA again now fails with an error, so the fault has been reproduced.
SQL> alter diskgroup DATA add disk '/dev/raw/raw6';
alter diskgroup DATA add disk '/dev/raw/raw6'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/raw/raw6' belongs to diskgroup "DATA"
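The stuck state can be recognised in the v$asm_disk output: the header says MEMBER but the disk has no NAME and GROUP_NUMBER is 0. The following is a minimal sketch that filters spooled query output for such rows. It assumes the column order used in the queries above and whitespace-separated output. The function name find_orphan_members is an illustrative assumption.

```shell
# Read the v$asm_disk query output on stdin and print the path of every disk
# whose header is MEMBER although it belongs to no diskgroup (GROUP_NUMBER 0).
# Such rows have an empty NAME, so they split into 7 whitespace-separated
# fields: PATH DISK_NUMBER GROUP_NUMBER STATE MOUNT_STATUS HEADER_STATUS MODE_STATUS
find_orphan_members() {
    awk 'NF == 7 && $3 == 0 && $6 == "MEMBER" { print $1 }'
}
```

Rows for healthy members carry a NAME and therefore have 8 fields, so they are skipped, as are the header and separator lines.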
Solution:
Oracle's official documentation (1453285.1) gives the following three solutions:
Method 1:
1 Back up the problem disk
dd if=/dev/oracleasm/disks/EMCORA43 of=/tmp/EMCORA43.txt bs=1M count=10
2 Change the disk permissions
chmod 000 /dev/oracleasm/disks/EMCORA43
3 Remount the diskgroup. Once you have confirmed that the diskgroup mounts normally, use dd to wipe the disk header, then add the disk again.
dd if=/dev/zero of=/dev/oracleasm/disks/EMCORA43 bs=1M count=1
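Wiping the header cannot be undone, so it may be worth verifying the backup before running the dd wipe. The following is a minimal sketch, assuming GNU dd and cmp. The function name backup_header and its arguments are illustrative and are not part of the Oracle note.

```shell
# Copy the first 10 MB of a disk to a backup file and verify the copy,
# mirroring the dd backup step of Method 1. Returns non-zero on any failure.
backup_header() {
    disk=$1
    backup=$2
    dd if="$disk" of="$backup" bs=1M count=10 2>/dev/null || return 1
    # Read the same byte range again and compare it with the backup file.
    dd if="$disk" bs=1M count=10 2>/dev/null | cmp -s - "$backup"
}

# Example:
# backup_header /dev/oracleasm/disks/EMCORA43 /tmp/EMCORA43.txt && echo "backup verified"
```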
Method 2:
Delete the disk with oracleasm
/etc/init.d/oracleasm deletedisk EMCORA43
Method 3:
Use the force option to forcibly add the disk again.
alter diskgroup XXXX add disk '/dev/oracleasm/disks/EMCORA43' force;
This experiment uses the third method.
The details are as follows:
SQL> select NAME, TOTAL_MB, FREE_MB, (TOTAL_MB - FREE_MB) USED_MB, to_char((TOTAL_MB - FREE_MB) * 100 / TOTAL_MB,'99.99') || '%' USED_PRECENT from v$asm_diskgroup;
NAME                   TOTAL_MB    FREE_MB    USED_MB USED_PRECENT
-------------------- ---------- ---------- ---------- --------------
DATA1                      1998       1901         97   4.85%
DATA                       2997        630       2367  78.98%
SQL> select PATH,NAME,DISK_NUMBER,GROUP_NUMBER,STATE,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS from v$asm_disk;
PATH                           NAME                 DISK_NUMBER GROUP_NUMBER STATE            MOUNT_STATUS   HEADER_STATUS            MODE_STATUS
------------------------------ -------------------- ----------- ------------ ---------------- -------------- ------------------------ --------------
/dev/raw/raw6                                                 0            0 NORMAL           CLOSED         MEMBER                   ONLINE
/dev/raw/raw1                  DATA_0000                      0            2 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw3                  DATA_0002                      2            2 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw4                  DATA1_0000                     0            1 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw5                  DATA1_0001                     1            1 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw2                  DATA_0001                      1            2 NORMAL           CACHED         MEMBER                   ONLINE
6 rows selected.
SQL> alter diskgroup DATA add disk '/dev/raw/raw6';
alter diskgroup DATA add disk '/dev/raw/raw6'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/raw/raw6' belongs to diskgroup "DATA"
SQL> alter diskgroup DATA add disk '/dev/raw/raw6' force;
Diskgroup altered.
SQL> select PATH,NAME,DISK_NUMBER,GROUP_NUMBER,STATE,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS from v$asm_disk;
PATH                           NAME                 DISK_NUMBER GROUP_NUMBER STATE            MOUNT_STATUS   HEADER_STATUS            MODE_STATUS
------------------------------ -------------------- ----------- ------------ ---------------- -------------- ------------------------ --------------
/dev/raw/raw1                  DATA_0000                      0            2 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw3                  DATA_0002                      2            2 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw4                  DATA1_0000                     0            1 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw5                  DATA1_0001                     1            1 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw2                  DATA_0001                      1            2 NORMAL           CACHED         MEMBER                   ONLINE
/dev/raw/raw6                  DATA_0005                      5            2 NORMAL           CACHED         MEMBER                   ONLINE
6 rows selected.
SQL> select NAME, TOTAL_MB, FREE_MB, (TOTAL_MB - FREE_MB) USED_MB, to_char((TOTAL_MB - FREE_MB) * 100 / TOTAL_MB,'99.99') || '%' USED_PRECENT from v$asm_diskgroup;
NAME                   TOTAL_MB    FREE_MB    USED_MB USED_PRECENT
-------------------- ---------- ---------- ---------- --------------
DATA1                      1998       1901         97   4.85%
DATA                       3497       1128       2369  67.74%
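In actual testing, both the first and the third method worked (this test environment does not use oracleasm, so the second method was not tested).
Because the first method is the safer one, it was chosen to fix the problem in production.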
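Follow-up:
1. A few days later the problem was fixed successfully with the first method in production on pboss2-rdb1/2 (the Jiangsu database). This raised another issue: udev behaves somewhat differently on SUSE and RHEL. On RHEL, after editing the rules file under /etc/udev/rules.d/, running start_udev is enough. On SUSE you have to run /etc/init.d/boot.udev restart to reload the udev configuration and then simulate a hot-plug with udevadm test /sys/block/xxx before the settings take effect. I was not aware of this difference at the time and could only make the change take effect by rebooting the system.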
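The difference between the two distributions can be summarised as a small dry-run helper that only prints the commands named above and does not execute them. This is a sketch for the releases used in this post. The function name udev_reload_cmds is an illustrative assumption, and xxx stays a placeholder for the real block device name.

```shell
# Print (do not run) the commands for applying edited udev rules.
# $1 is the distribution ("rhel" or "suse"); $2 is the block device name
# under /sys/block and is only needed for SUSE.
udev_reload_cmds() {
    case "$1" in
        rhel)
            echo "start_udev"
            ;;
        suse)
            echo "/etc/init.d/boot.udev restart"
            echo "udevadm test /sys/block/$2"
            ;;
        *)
            echo "unknown distribution: $1" >&2
            return 1
            ;;
    esac
}

udev_reload_cmds suse xxx
```

Printing instead of executing keeps the helper safe to try on a production node. The output can be reviewed and then run by hand.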
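2. While a rebalance was running after the asmdisk had been added to the diskgroup, I changed the asmdisk ownership in two ways. The first was to run the operating system's chown command directly on the asmdisk on both nodes: the rebalance carried on normally and the diskgroup was unaffected. The second was to change the asmdisk attributes through udev and restart udev to apply them: the diskgroup failed and was dismounted immediately, and the database instance went down. After I restored the asmdisk ownership and ran alter diskgroup xxx mount again, the rebalance resumed, and the database was normal after a restart.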
From the ITPUB blog, link: http://blog.itpub.net/31441616/viewspace-2146709/. If you repost this, please credit the source; otherwise legal liability will be pursued.