1. Routine daily check.
2. A query of the database storage size returned 0 for the ASM disk group, which is abnormal:
SQL> select name,total_mb,free_mb from v$asm_diskgroup;
NAME TOTAL_MB FREE_MB
------------------------------ ------------------- --------------
HUAWEI 0 0
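The empty disk group was the first visible symptom, so it is a natural candidate for the daily check. Below is a minimal bash sketch. It assumes the output of the query above is piped in as plain SQL*Plus text: a header, a dashed separator, then one row per disk group. The function name flag_empty_diskgroups is hypothetical.

```bash
#!/usr/bin/env bash
# Print the name of every ASM disk group whose TOTAL_MB is 0, reading the
# plain-text result of the v$asm_diskgroup query on stdin.
flag_empty_diskgroups() {
  awk '
    /^-+/ { in_rows = 1; next }                  # data rows follow the dashed separator
    in_rows && NF == 3 && $2 == 0 { print $1 }   # columns: NAME TOTAL_MB FREE_MB
  '
}

# Example (assumed invocation, run as the oracle user):
#   sqlplus -s "/ as sysdba" <<'EOF' | flag_empty_diskgroups
#   select name,total_mb,free_mb from v$asm_diskgroup;
#   EOF
```

Any name it prints is worth investigating before users notice I/O errors.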
3. Querying the data files failed, reporting that an I/O operation could not be submitted to a disk:
SQL>select name from v$datafile;
Select name from v$datafile;
*
ERROR at line 1:
ORA-00204: error in reading (block 1, # blocks 1) of control file
ORA-00202: control file: '+HUAWEI/oxxx/controlfile/controlfile'
ORA-15081: failed to submit an I/O operation to a disk
These errors show that the mounted ASM storage could not be accessed normally.
4. Checked the Oracle alert log:
Tue Aug 30 08:53:20 2016
Errors in file /u01/app/oracle/diag/rdbms/oxxx/oxxx/trace/oxxx_m000_125893.trc:
ORA-15025: could not open disk "/dev/mapper/huawei10"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/oxxx/oxxx/trace/oxxx_m000_125893.trc:
ORA-15025: could not open disk "/dev/mapper/huawei10"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 985 in group [1.1532792394] from disk HUAWEI_0009 allocation unit 674552 reason error; if possible, will try another mirror side
Errors in file /u01/app/oracle/diag/rdbms/oxxx/oxxx/trace/oxxx_m000_125893.trc:
ORA-00202: control file: '+HUAWEI/oxxx/controlfile/controlfile'
ORA-15081: failed to submit an I/O operation to a disk
Errors in file /u01/app/oracle/diag/rdbms/oxxx/oxxx/trace/oxxx_m000_125893.trc:
ORA-00204: error in reading (block 1, # blocks 1) of control file
ORA-00202: control file: '+HUAWEI/oxxx/controlfile/controlfile'
ORA-15081: failed to submit an I/O operation to a disk
The "Permission denied" errors on /dev/mapper/huawei10 suggest a permission problem on the ASM storage devices.
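Pulling the affected disk paths out of the alert log makes it easier to see which devices ASM could not open. A minimal bash sketch that reads alert-log text on stdin and prints each distinct path named in an ORA-15025 message. The function name is hypothetical, and the log location in the comment assumes the usual ADR layout (alert_<SID>.log in the trace directory).

```bash
#!/usr/bin/env bash
# Print each distinct disk path named in ORA-15025 ("could not open disk")
# messages found in alert-log text on stdin.
failed_disk_paths() {
  grep -o 'ORA-15025: could not open disk "[^"]*"' \
    | sed 's/.*"\(.*\)"/\1/' \
    | sort -u
}

# Example (assumed alert log location):
#   failed_disk_paths < /u01/app/oracle/diag/rdbms/oxxx/oxxx/trace/alert_oxxx.log
```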
5. Checked the permissions of the files under /dev/mapper/. The ASM storage devices are owned by root, which may be the problem:
[oracle@fxxxx trace]$ ls -lth /dev/mapper/
total 0
lrwxrwxrwx 1 root root 8 Aug 30 09:50 huawei05 -> ../dm-11
lrwxrwxrwx 1 root root 8 Aug 30 09:50 huawei06 -> ../dm-10
lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei08 -> ../dm-9
lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei07 -> ../dm-8
lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei09 -> ../dm-7
lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei10 -> ../dm-6
lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei11 -> ../dm-5
lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei12 -> ../dm-4
lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei01 -> ../dm-3
lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei02 -> ../dm-2
lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei03 -> ../dm-1
lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei04 -> ../dm-0
crw-rw---- 1 root root 10, 58 Aug 30 09:50 control
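One detail when reading this listing: the lrwxrwxrwx entries are symbolic links, and Linux ignores a symlink's own permission bits when a file is opened through it. What matters is the owner, group and mode of the target node (/dev/dm-6 for huawei10, and so on). A minimal bash sketch, assuming GNU coreutils (readlink -f, stat -c); show_device_owner is a hypothetical name.

```bash
#!/usr/bin/env bash
# For each path given, follow symlinks to the real node and print
# "owner:group octal-mode resolved-path".
show_device_owner() {
  local p target
  for p in "$@"; do
    target=$(readlink -f -- "$p") || continue   # resolve ../dm-N style links
    stat -c '%U:%G %a %n' -- "$target"
  done
}

# Example:
#   show_device_owner /dev/mapper/huawei*
```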
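6. Checked the multipath status. One of the storage paths had failed, but since the other path was still active, this alone should not affect database reads and writes:
[root@fxxxx ~]# multipath -ll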
huawei10 (360022a11000ae70e000d26f40000004d) dm-6 HUAWEI,S5500T
size=1.0T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 8:0:0:6 sdh 8:112 active ready running
`- 8:0:1:6 sdt 65:48 failed faulty running
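Failed paths such as sdt above can be picked out of multipath -ll output mechanically. A minimal bash/awk sketch, assuming the layout shown here: a map header line containing dm-N, followed by path lines that carry a host:channel:target:lun field. list_faulty_paths is a hypothetical name, and the parsing has not been checked against every multipath-tools version.

```bash
#!/usr/bin/env bash
# Read `multipath -ll` output on stdin and print "map device H:C:T:L"
# for every path line marked failed or faulty.
list_faulty_paths() {
  awk '
    # Map header, e.g. "huawei10 (3600...) dm-6 HUAWEI,S5500T"
    $0 !~ /^[ |`]/ {
      for (i = 1; i <= NF; i++) if ($i ~ /^dm-[0-9]+$/) { map = $1; break }
    }
    # Path line, e.g. "`- 8:0:1:6 sdt 65:48 failed faulty running"
    {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) {
          if ($0 ~ /failed|faulty/) print map, $(i + 1), $i
          break
        }
    }
  '
}

# Example (as root):
#   multipath -ll | list_faulty_paths
```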
7. Tried to remount the ASM storage: shut down the database, shut down and restarted the ASM instance, and checked the ASM alert log:
SQL> ALTER DISKGROUP ALL MOUNT
Tue Aug 30 09:08:31 2016
NOTE: failed to discover disks from gpnp profile asm diskstring
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_126447.trc:
ORA-29786: SIHA attribute GET failed with error [Attribute 'ASM_DISKSTRING' sts[200] lsts[0]]
Tue Aug 30 09:08:43 2016
NOTE: failed to discover disks from gpnp profile asm diskstring
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_126447.trc:
ORA-29786: SIHA attribute GET failed with error [Attribute 'ASM_DISKSTRING' sts[200] lsts[0]]
Tue Aug 30 09:13:12 2016
NOTE: failed to discover disks from gpnp profile asm diskstring
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_126447.trc:
ORA-29786: SIHA attribute GET failed with error [Attribute 'ASM_DISKSTRING' sts[200] lsts[0]]
NOTE: failed to discover disks from gpnp profile asm diskstring
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_126447.trc:
ORA-29786: SIHA attribute GET failed with error [Attribute 'ASM_DISKSTRING' sts[200] lsts[0]]
Tue Aug 30 09:13:21 2016
Shutting down instance (immediate)
Shutting down instance: further logons disabled
Stopping background process MMNL
Stopping background process MMON
License high water mark = 1
SQL> ALTER DISKGROUP ALL DISMOUNT
Tue Aug 30 09:13:24 2016
Stopping background process VKTM
Tue Aug 30 09:13:26 2016
Instance shutdown complete
Disk discovery through the ASM_DISKSTRING parameter failed, confirming that the ASM instance was not mounting the storage correctly.
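ASM only uses devices that match ASM_DISKSTRING and that it can open as the Grid Infrastructure owner, so a quick way to preview what discovery will find is to expand the pattern as that user and test access. A minimal bash sketch, assuming a single shell-glob pattern (ASM_DISKSTRING may hold a comma-separated list; pass each entry separately). check_discovery_access is hypothetical, and it only tests read/write permission, not whether a device carries an ASM header.

```bash
#!/usr/bin/env bash
# Expand a discovery pattern (shell glob) and report, for each match,
# whether the *current* user can open it for read and write.
# Run it as the grid user to approximate what ASM itself sees.
check_discovery_access() {
  local pattern=$1 dev found=0
  for dev in $pattern; do
    [ -e "$dev" ] || continue          # an unmatched glob stays literal; skip it
    found=1
    if [ -r "$dev" ] && [ -w "$dev" ]; then
      echo "OK       $dev"
    else
      echo "NOACCESS $dev"
    fi
  done
  [ "$found" -eq 1 ] || echo "NOMATCH  $pattern"
}

# Example (as grid):
#   check_discovery_access '/dev/mapper/huawei*'
```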
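8. Rebooted the server (restarting the multipathd service may also be worth trying first).
Checked the multipath status again; both paths are now active: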
# multipath -ll
huawei10 (360022a11000ae70e000d26f40000004d) dm-6 HUAWEI,S5500T
size=1.0T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 8:0:0:6 sdh 8:112 active ready running
`- 8:0:1:6 sdt 65:48 active ready running
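9. Reset the ASM disk discovery string (ASM_DISKSTRING).
Edited the ASM instance parameter file and set ASM_DISKSTRING=/dev/raw/raw*, so that ASM searches for its disks under /dev/raw/raw*.
Bound each multipath block device to a raw character device: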
raw /dev/raw/raw01 /dev/mapper/huawei01
raw /dev/raw/raw02 /dev/mapper/huawei02
raw /dev/raw/raw03 /dev/mapper/huawei03
raw /dev/raw/raw04 /dev/mapper/huawei04
raw /dev/raw/raw05 /dev/mapper/huawei05
raw /dev/raw/raw06 /dev/mapper/huawei06
raw /dev/raw/raw07 /dev/mapper/huawei07
raw /dev/raw/raw08 /dev/mapper/huawei08
raw /dev/raw/raw09 /dev/mapper/huawei09
raw /dev/raw/raw10 /dev/mapper/huawei10
raw /dev/raw/raw11 /dev/mapper/huawei11
raw /dev/raw/raw12 /dev/mapper/huawei12
Added these bindings to a startup script so that they still take effect after the server reboots:
#cat /etc/rc.local
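Because the /dev/raw nodes are created anew at every boot, the ownership and permission change from the next step has to be re-applied at boot as well, not only the bindings. A minimal bash sketch that prints candidate rc.local lines for the twelve LUNs above. The function name is hypothetical, it names the nodes raw1 to raw12 to match the ls listing that follows, and mode 660 is an assumption taken from Oracle's usual guidance for ASM devices rather than the 777 shown in that listing.

```bash
#!/usr/bin/env bash
# Print rc.local lines that bind huawei01..huawei12 to raw1..raw12 and
# then hand the raw nodes to the grid user. Review before appending.
emit_raw_bindings() {
  local i n
  for i in $(seq 1 12); do
    n=$(printf '%02d' "$i")              # multipath aliases are zero-padded
    echo "raw /dev/raw/raw$i /dev/mapper/huawei$n"
  done
  echo "chown grid:asmadmin /dev/raw/raw*"
  echo "chmod 660 /dev/raw/raw*"         # assumed mode; adjust to local policy
}

# Example (as root, after reviewing the output):
#   emit_raw_bindings >> /etc/rc.local
```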
Changed the ownership and permissions of the raw character devices:
[grid@fxxxx ~]$ ll /dev/raw
total 0
crwxrwxrwx 1 grid asmadmin 162, 1 Aug 30 10:30 raw1
crwxrwxrwx 1 grid asmadmin 162, 2 Aug 30 10:30 raw2
crwxrwxrwx 1 grid asmadmin 162, 3 Aug 30 10:30 raw3
crwxrwxrwx 1 grid asmadmin 162, 4 Aug 30 10:30 raw4
crwxrwxrwx 1 grid asmadmin 162, 5 Aug 30 10:30 raw5
crwxrwxrwx 1 grid asmadmin 162, 6 Aug 30 10:30 raw6
crwxrwxrwx 1 grid asmadmin 162, 7 Aug 30 10:30 raw7
crwxrwxrwx 1 grid asmadmin 162, 8 Aug 30 10:30 raw8
crwxrwxrwx 1 grid asmadmin 162, 9 Aug 30 10:30 raw9
crwxrwxrwx 1 grid asmadmin 162, 10 Aug 30 10:30 raw10
crwxrwxrwx 1 grid asmadmin 162, 11 Aug 30 10:30 raw11
crwxrwxrwx 1 grid asmadmin 162, 12 Aug 30 10:30 raw12
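After changing ownership it is worth checking every node rather than eyeballing the listing. A minimal bash sketch, assuming GNU stat; check_owner_mode is hypothetical, and the expected owner and mode are passed in as arguments (for example grid:asmadmin and 660, an assumed target that would flag the 777 nodes above).

```bash
#!/usr/bin/env bash
# Report every path whose owner:group or octal mode differs from the
# expected values. Prints nothing when all paths match.
check_owner_mode() {
  local want_owner=$1 want_mode=$2 dev have
  shift 2
  for dev in "$@"; do
    have=$(stat -L -c '%U:%G %a' -- "$dev") || continue
    [ "$have" = "$want_owner $want_mode" ] \
      || echo "MISMATCH $dev: $have (want $want_owner $want_mode)"
  done
}

# Example:
#   check_owner_mode grid:asmadmin 660 /dev/raw/raw[0-9]*
```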
10. Started the ASM instance; it came up normally with no errors.
11. Started the database; it came up normally.
12. Queried the ASM disks:
SQL> select name,path,total_mb,free_mb from v$asm_disk;
NAME PATH TOTAL_MB FREE_MB
----------------------------- ---------------------- -------------------------- --------------
HUAWEI_0011 /dev/raw/raw12 1048576 58506
HUAWEI_0010 /dev/raw/raw11 1048576 58507
HUAWEI_0009 /dev/raw/raw10 1048576 58528
HUAWEI_0008 /dev/raw/raw9 1048576 58480
HUAWEI_0007 /dev/raw/raw8 1048576 58528
HUAWEI_0006 /dev/raw/raw7 1048576 58504
HUAWEI_0005 /dev/raw/raw6 1048576 58481
HUAWEI_0004 /dev/raw/raw5 1048576 58507
HUAWEI_0003 /dev/raw/raw4 1048576 58483
HUAWEI_0002 /dev/raw/raw3 1048576 58506
HUAWEI_0001 /dev/raw/raw2 1048576 58514
HUAWEI_0000 /dev/raw/raw1 1048576 58516
12 rows selected.
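The per-disk figures can be totalled to see how full the disk group is. A minimal bash/awk sketch over the query output above; sum_asm_disk_space is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sum TOTAL_MB and FREE_MB over the rows of
#   select name,path,total_mb,free_mb from v$asm_disk;
# read on stdin, and print the totals plus the free percentage.
sum_asm_disk_space() {
  awk '
    NF == 4 && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ { total += $3; free += $4 }
    END {
      if (total > 0)
        printf "total_mb=%d free_mb=%d free_pct=%.1f\n", total, free, 100 * free / total
    }
  '
}
```

Fed the twelve rows above, it should print total_mb=12582912 free_mb=702060 free_pct=5.6, so the disk group was roughly 94% used at the time.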
13. Queried the database data files again:
SQL>select name from v$datafile;
NAME
---------------------------------------------------------------------
+HUAWEI/oxxx/datafile/system.677.849506399
+HUAWEI/oxxx/datafile/sysaux.587.849504601
+HUAWEI/oxxx/datafile/undotbs1.256.849452711
+HUAWEI/oxxx/datafile/users.559.849503517
+HUAWEI/oxxx/datafile/main.689.849506465
+HUAWEI/oxxx/datafile/main.687.849506455
+HUAWEI/oxxx/datafile/main.686.849506451
+HUAWEI/oxxx/datafile/main.688.849506461
…………
+HUAWEI/oxxx/datafile/t_kk_cltgxx.1077.908358177
+HUAWEI/oxxx/datafile/t_kk_cltgxx.1078.908358179
+HUAWEI/oxxx/datafile/t_kk_cltgxx.1079.908358183
796 rows selected.
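14. The database is back to normal.
Summary: The database errors were caused by the ASM storage not being mounted correctly. The root cause was insufficient permissions on the device files matched by the old ASM_DISKSTRING: ASM runs as the grid user, while the storage devices were owned by root. After remapping the devices ASM reads from and resetting their permissions, database access returned to normal.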
20160831
段亚东