Oracle server daily check: incident report for a fault found during a routine database check

1. Routine daily check.

2. A query of the database storage size returned 0, which indicated a problem.

SQL> select name,total_mb,free_mb from v$asm_diskgroup;

NAME                             TOTAL_MB    FREE_MB
------------------------------ ---------- ----------
HUAWEI                                  0          0
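
A daily check like this one can flag the condition automatically. The following is a minimal sketch, not the author's script: `flag_empty_diskgroups` is a hypothetical helper that assumes the query output is piped in as whitespace-separated `NAME TOTAL_MB FREE_MB` rows.

```shell
#!/bin/sh
# Hypothetical helper: reads "NAME TOTAL_MB FREE_MB" rows on stdin and
# prints the name of every disk group that reports a total size of 0 MB.
# Heading and separator rows are skipped because their second field is
# not a plain number.
flag_empty_diskgroups() {
    awk 'NF == 3 && $2 ~ /^[0-9]+$/ && $2 == 0 { print $1 }'
}
```

Fed the row shown above, it prints `HUAWEI`; a healthy disk group produces no output.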

3. Querying the database data files failed with an error reporting that an I/O operation could not be submitted.

SQL> select name from v$datafile;

Select name from v$datafile;

*

ERROR at line 1:

ORA-00204: error in reading (block 1, # blocks 1) of control file

ORA-00202: control file: '+HUAWEI/oxxx/controlfile/controlfile'

ORA-15081: failed to submit an I/O operation to a disk

The error output confirms that the mounted ASM storage could not be accessed normally.

4. Review the Oracle alert log.

Tue Aug 30 08:53:20 2016

Errors in file /u01/app/oracle/diag/rdbms/oxxx/oxxx/trace/oxxx_m000_125893.trc:

ORA-15025: could not open disk "/dev/mapper/huawei10"

ORA-27041: unable to open file

Linux-x86_64 Error: 13: Permission denied

Additional information: 3

Errors in file /u01/app/oracle/diag/rdbms/oxxx/oxxx/trace/oxxx_m000_125893.trc:

ORA-15025: could not open disk "/dev/mapper/huawei10"

ORA-27041: unable to open file

Linux-x86_64 Error: 13: Permission denied

Additional information: 3

WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 985 in group [1.1532792394] from disk HUAWEI_0009 allocation unit 674552 reason error; if possible, will try another mirror side

Errors in file /u01/app/oracle/diag/rdbms/oxxx/oxxx/trace/oxxx_m000_125893.trc:

ORA-00202: control file: '+HUAWEI/oxxx/controlfile/controlfile'

ORA-15081: failed to submit an I/O operation to a disk

Errors in file /u01/app/oracle/diag/rdbms/oxxx/oxxx/trace/oxxx_m000_125893.trc:

ORA-00204: error in reading (block 1, # blocks 1) of control file

ORA-00202: control file: '+HUAWEI/oxxx/controlfile/controlfile'

ORA-15081: failed to submit an I/O operation to a disk

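Based on the log, the likely cause is a permission problem on the ASM storage devices.

5. Check the permissions of the files under /dev/mapper/. The ASM storage devices are owned by root, which may be the problem.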
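
To see at a glance which devices are affected, the ORA-15025 lines can be reduced to a list of distinct paths. This is a sketch only; `denied_asm_disks` is a hypothetical helper name, and it assumes the message format shown in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: prints each distinct device path named in
# ORA-15025 messages read from stdin (for example, the alert log).
denied_asm_disks() {
    sed -n 's/.*ORA-15025: could not open disk "\([^"]*\)".*/\1/p' | sort -u
}
```

Applied to the excerpt above, it prints the single path `/dev/mapper/huawei10`.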

[oracle@fxxxx trace]$ ls -lth /dev/mapper/

total 0

lrwxrwxrwx 1 root root 8 Aug 30 09:50 huawei05 -> ../dm-11

lrwxrwxrwx 1 root root 8 Aug 30 09:50 huawei06 -> ../dm-10

lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei08 -> ../dm-9

lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei07 -> ../dm-8

lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei09 -> ../dm-7

lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei10 -> ../dm-6

lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei11 -> ../dm-5

lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei12 -> ../dm-4

lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei01 -> ../dm-3

lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei02 -> ../dm-2

lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei03 -> ../dm-1

lrwxrwxrwx 1 root root 7 Aug 30 09:50 huawei04 -> ../dm-0

crw-rw---- 1 root root 10, 58 Aug 30 09:50 control
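
Note that the huawei* entries above are symbolic links, so `ls -l` reports the owner of the link rather than of the dm-N device it points to. A minimal sketch for checking the device itself, assuming GNU coreutils `stat` (the `-c` format option); `target_perms` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints "owner:group mode path" for the file each
# argument resolves to. stat -L follows symbolic links, so the owner and
# mode shown belong to the underlying device, not to the link.
target_perms() {
    for link in "$@"; do
        stat -L -c '%U:%G %a %n' "$link"
    done
}
```

It could be called as `target_perms /dev/mapper/huawei*`. A device that ASM has to open should show an owner or group that the grid user belongs to.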

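6. Check the multipath status of the storage. Some of the redundant paths have failed, but this alone should not affect database reads and writes.

[root@fxxxx ~]# multipath -ll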

huawei10 (360022a11000ae70e000d26f40000004d) dm-6 HUAWEI,S5500T

size=1.0T features='0' hwhandler='0' wp=rw

`-+- policy='round-robin 0' prio=1 status=active

|- 8:0:0:6 sdh 8:112 active ready running

`- 8:0:1:6 sdt 65:48 failed faulty running
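
With twelve LUNs, picking the failed paths out of the full `multipath -ll` output by eye is tedious. The following sketch filters them, assuming the output layout shown above; `failed_paths` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `multipath -ll` output on stdin and prints
# "<map> <device>" for every path reported as failed or faulty.
failed_paths() {
    awk '
        # Map header line: name, then the WWID in parentheses.
        $2 ~ /^\(/ { map = $1 }
        # Path line in a failed state: the device name follows the H:C:T:L field.
        / failed / || / faulty / {
            for (i = 1; i < NF; i++)
                if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) print map, $(i + 1)
        }
    '
}
```

For the output above, `multipath -ll | failed_paths` would print `huawei10 sdt`.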

7. Try remounting the ASM storage. Shut down the database first.

Shut down and restart the ASM instance, then check the ASM log.

SQL> ALTER DISKGROUP ALL MOUNT

Tue Aug 30 09:08:31 2016

NOTE: failed to discover disks from gpnp profile asm diskstring

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_126447.trc:

ORA-29786: SIHA attribute GET failed with error [Attribute 'ASM_DISKSTRING' sts[200] lsts[0]]

Tue Aug 30 09:08:43 2016

NOTE: failed to discover disks from gpnp profile asm diskstring

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_126447.trc:

ORA-29786: SIHA attribute GET failed with error [Attribute 'ASM_DISKSTRING' sts[200] lsts[0]]

Tue Aug 30 09:13:12 2016

NOTE: failed to discover disks from gpnp profile asm diskstring

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_126447.trc:

ORA-29786: SIHA attribute GET failed with error [Attribute 'ASM_DISKSTRING' sts[200] lsts[0]]

NOTE: failed to discover disks from gpnp profile asm diskstring

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_126447.trc:

ORA-29786: SIHA attribute GET failed with error [Attribute 'ASM_DISKSTRING' sts[200] lsts[0]]

Tue Aug 30 09:13:21 2016

Shutting down instance (immediate)

Shutting down instance: further logons disabled

Stopping background process MMNL

Stopping background process MMON

License high water mark = 1

SQL> ALTER DISKGROUP ALL DISMOUNT

Tue Aug 30 09:13:24 2016

Stopping background process VKTM

Tue Aug 30 09:13:26 2016

Instance shutdown complete

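Disk discovery through the ASM_DISKSTRING parameter failed, which confirms that the ASM instance had not mounted the storage properly.

8. Reboot the server (restarting the multipathd service may also be worth trying).

Check the multipath status again: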

# multipath -ll

huawei10 (360022a11000ae70e000d26f40000004d) dm-6 HUAWEI,S5500T

size=1.0T features='0' hwhandler='0' wp=rw

`-+- policy='round-robin 0' prio=1 status=active

|- 8:0:0:6 sdh 8:112 active ready running

`- 8:0:1:6 sdt 65:48 active ready running

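9. Reset ASM_DISKSTRING for the ASM storage

Edit the ASM instance parameter file and set:

ASM_DISKSTRING=/dev/raw/raw*

so that ASM searches /dev/raw/raw* when discovering its storage devices.

Bind the storage block devices to raw character devices: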

raw /dev/raw/raw01 /dev/mapper/huawei01

raw /dev/raw/raw02 /dev/mapper/huawei02

raw /dev/raw/raw03 /dev/mapper/huawei03

raw /dev/raw/raw04 /dev/mapper/huawei04

raw /dev/raw/raw05 /dev/mapper/huawei05

raw /dev/raw/raw06 /dev/mapper/huawei06

raw /dev/raw/raw07 /dev/mapper/huawei07

raw /dev/raw/raw08 /dev/mapper/huawei08

raw /dev/raw/raw09 /dev/mapper/huawei09

raw /dev/raw/raw10 /dev/mapper/huawei10

raw /dev/raw/raw11 /dev/mapper/huawei11

raw /dev/raw/raw12 /dev/mapper/huawei12
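
Typing twelve near-identical lines invites mistakes, so the same bindings can be generated in a loop. This is a sketch under the naming shown above; `print_raw_bindings` is a hypothetical helper that only prints the commands and executes nothing.

```shell
#!/bin/sh
# Hypothetical helper: prints one raw binding command per LUN, numbered
# 01..N with two-digit zero padding as in the list above.
# Nothing is executed; review the output before running it as root.
print_raw_bindings() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'raw /dev/raw/raw%02d /dev/mapper/huawei%02d\n' "$i" "$i"
        i=$((i + 1))
    done
}
```

`print_raw_bindings 12` reproduces the twelve lines above, which can then be reviewed and appended to the startup script.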

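Write these bindings into a startup script so that they persist across reboots:

# cat /etc/rc.local

Change the ownership and permissions of the character devices:

[grid@fxxxx ~]$ ll /dev/raw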

total 0

crwxrwxrwx 1 grid asmadmin 162, 1 Aug 30 10:30 raw1

crwxrwxrwx 1 grid asmadmin 162, 2 Aug 30 10:30 raw2

crwxrwxrwx 1 grid asmadmin 162, 3 Aug 30 10:30 raw3

crwxrwxrwx 1 grid asmadmin 162, 4 Aug 30 10:30 raw4

crwxrwxrwx 1 grid asmadmin 162, 5 Aug 30 10:30 raw5

crwxrwxrwx 1 grid asmadmin 162, 6 Aug 30 10:30 raw6

crwxrwxrwx 1 grid asmadmin 162, 7 Aug 30 10:30 raw7

crwxrwxrwx 1 grid asmadmin 162, 8 Aug 30 10:30 raw8

crwxrwxrwx 1 grid asmadmin 162, 9 Aug 30 10:30 raw9

crwxrwxrwx 1 grid asmadmin 162, 10 Aug 30 10:30 raw10

crwxrwxrwx 1 grid asmadmin 162, 11 Aug 30 10:30 raw11

crwxrwxrwx 1 grid asmadmin 162, 12 Aug 30 10:30 raw12
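
The listing shows the resulting ownership but not the commands used to set it. A minimal sketch of how those commands could be generated follows; `print_raw_perms` is a hypothetical helper, and the owner and mode are parameters because the exact values are a site decision. It only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: prints the chown and chmod commands for each
# device given as an argument. Nothing is executed.
# Usage: print_raw_perms OWNER:GROUP MODE DEVICE...
print_raw_perms() {
    owner=$1
    mode=$2
    shift 2
    for dev in "$@"; do
        printf 'chown %s %s\nchmod %s %s\n' "$owner" "$dev" "$mode" "$dev"
    done
}
```

For example, `print_raw_perms grid:asmadmin 0660 /dev/raw/raw*` emits one pair of commands per device. The 0660 mode here is an assumption: it is narrower than the rwxrwxrwx shown in the listing and is still sufficient when ASM runs as grid in the asmadmin group.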

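10. Start the ASM instance. It starts normally with no errors.

11. Start the database. It starts normally.

12. Query the ASM disk information for the disk group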

SQL> select name,path,total_mb,free_mb from v$asm_disk;

NAME          PATH              TOTAL_MB    FREE_MB
------------- --------------- ---------- ----------
HUAWEI_0011   /dev/raw/raw12     1048576      58506
HUAWEI_0010   /dev/raw/raw11     1048576      58507
HUAWEI_0009   /dev/raw/raw10     1048576      58528
HUAWEI_0008   /dev/raw/raw9      1048576      58480
HUAWEI_0007   /dev/raw/raw8      1048576      58528
HUAWEI_0006   /dev/raw/raw7      1048576      58504
HUAWEI_0005   /dev/raw/raw6      1048576      58481
HUAWEI_0004   /dev/raw/raw5      1048576      58507
HUAWEI_0003   /dev/raw/raw4      1048576      58483
HUAWEI_0002   /dev/raw/raw3      1048576      58506
HUAWEI_0001   /dev/raw/raw2      1048576      58514
HUAWEI_0000   /dev/raw/raw1      1048576      58516

12 rows selected.
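
The per-disk figures can be totalled to confirm that the disk group no longer reports a size of 0, as it did in step 2. This is a sketch assuming the four-column layout of the query above; `asm_disk_usage` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads "NAME PATH TOTAL_MB FREE_MB" rows on stdin
# and prints the disk count, summed sizes and overall free percentage.
# Rows whose third and fourth fields are not plain numbers are ignored.
asm_disk_usage() {
    awk '
        $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ { total += $3; free += $4; n++ }
        END {
            if (n > 0 && total > 0)
                printf "%d disks, total %d MB, free %d MB (%.1f%% free)\n",
                       n, total, free, 100 * free / total
        }
    '
}
```

For the twelve rows above, the totals are 12582912 MB in all and 702060 MB free, or about 5.6% free.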

13. Query the database data file information

SQL> select name from v$datafile;

NAME

---------------------------------------------------------------------

+HUAWEI/oxxx/datafile/system.677.849506399

+HUAWEI/oxxx/datafile/sysaux.587.849504601

+HUAWEI/oxxx/datafile/undotbs1.256.849452711

+HUAWEI/oxxx/datafile/users.559.849503517

+HUAWEI/oxxx/datafile/main.689.849506465

+HUAWEI/oxxx/datafile/main.687.849506455

+HUAWEI/oxxx/datafile/main.686.849506451

+HUAWEI/oxxx/datafile/main.688.849506461

…………

+HUAWEI/oxxx/datafile/t_kk_cltgxx.1077.908358177

+HUAWEI/oxxx/datafile/t_kk_cltgxx.1078.908358179

+HUAWEI/oxxx/datafile/t_kk_cltgxx.1079.908358183

796 rows selected.

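14. The database is back to normal.

Summary: The database errors occurred because the ASM storage was not mounted correctly. The root cause was insufficient permissions on the device files discovered through the old ASM_DISKSTRING setting: ASM runs as the grid user, while the storage devices were owned by root. After the storage was remapped to a new discovery path and the permissions were reset, database access returned to normal.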

20160831

段亚东
