A failure caused by a missing "+" before the ASM disk group name when creating a tablespace in a RAC environment


Environment:

OS: SUSE Linux 11

Grid Infrastructure: 11.2.0.4

Oracle DB: 11.2.0.4

ASM storage, 4-node RAC


Symptoms:

Node 2 alert log

Thu Dec 18 15:30:16 2014
Errors in file /oracle/app/oracle/diag/rdbms/DEVDB/DEVDB2/trace/DEVDB2_dbw0_58174.trc:
ORA-01157: cannot identify/lock data file 133 - see DBWR trace file
ORA-01110: data file 133: '/oracle/app/oracle/product/11.2.0/db_1/dbs/DEVDBDATA'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /oracle/app/oracle/diag/rdbms/DEVDB/DEVDB2/trace/DEVDB2_dbw0_58174.trc:
ORA-01186: file 133 failed verification tests
ORA-01157: cannot identify/lock data file 133 - see DBWR trace file
ORA-01110: data file 133: '/oracle/app/oracle/product/11.2.0/db_1/dbs/DEVDBDATA'
File 133 not verified due to error ORA-01157
Errors in file /oracle/app/oracle/diag/rdbms/DEVDB/DEVDB2/trace/DEVDB2_dbw0_58174.trc:
ORA-01157: cannot identify/lock data file 133 - see DBWR trace file
ORA-01110: data file 133: '/oracle/app/oracle/product/11.2.0/db_1/dbs/DEVDBDATA'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /oracle/app/oracle/diag/rdbms/DEVDB/DEVDB2/trace/DEVDB2_dbw0_58174.trc:
ORA-01186: file 133 failed verification tests
ORA-01157: cannot identify/lock data file 133 - see DBWR trace file
ORA-01110: data file 133: '/oracle/app/oracle/product/11.2.0/db_1/dbs/DEVDBDATA'
File 133 not verified due to error ORA-01157


Node 3 alert log

Thu Dec 18 15:30:17 2014
Errors in file /oracle/app/oracle/diag/rdbms/DEVDB/DEVDB3/trace/DEVDB3_dbw0_42554.trc:
ORA-01157: cannot identify/lock data file 133 - see DBWR trace file
ORA-01110: data file 133: '/oracle/app/oracle/product/11.2.0/db_1/dbs/DEVDBDATA'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /oracle/app/oracle/diag/rdbms/DEVDB/DEVDB3/trace/DEVDB3_dbw0_42554.trc:
ORA-01186: file 133 failed verification tests
ORA-01157: cannot identify/lock data file 133 - see DBWR trace file
ORA-01110: data file 133: '/oracle/app/oracle/product/11.2.0/db_1/dbs/DEVDBDATA'
File 133 not verified due to error ORA-01157
Errors in file /oracle/app/oracle/diag/rdbms/DEVDB/DEVDB3/trace/DEVDB3_dbw0_42554.trc:
ORA-01157: cannot identify/lock data file 133 - see DBWR trace file
ORA-01110: data file 133: '/oracle/app/oracle/product/11.2.0/db_1/dbs/DEVDBDATA'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /oracle/app/oracle/diag/rdbms/DEVDB/DEVDB3/trace/DEVDB3_dbw0_42554.trc:
ORA-01186: file 133 failed verification tests
ORA-01157: cannot identify/lock data file 133 - see DBWR trace file
ORA-01110: data file 133: '/oracle/app/oracle/product/11.2.0/db_1/dbs/DEVDBDATA'
File 133 not verified due to error ORA-01157


Node 4 alert log

Thu Dec 18 15:30:16 2014
Errors in file /oracle/app/oracle/diag/rdbms/DEVDB/DEVDB4/trace/DEVDB4_dbw0_55436.trc:
ORA-01157: cannot identify/lock data file 133 - see DBWR trace file
ORA-01110: data file 133: '/oracle/app/oracle/product/11.2.0/db_1/dbs/DEVDBDATA'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /oracle/app/oracle/diag/rdbms/DEVDB/DEVDB4/trace/DEVDB4_dbw0_55436.trc:
ORA-01186: file 133 failed verification tests
ORA-01157: cannot identify/lock data file 133 - see DBWR trace file
ORA-01110: data file 133: '/oracle/app/oracle/product/11.2.0/db_1/dbs/DEVDBDATA'
File 133 not verified due to error ORA-01157
Errors in file /oracle/app/oracle/diag/rdbms/DEVDB/DEVDB4/trace/DEVDB4_dbw0_55436.trc:
ORA-01157: cannot identify/lock data file 133 - see DBWR trace file
ORA-01110: data file 133: '/oracle/app/oracle/product/11.2.0/db_1/dbs/DEVDBDATA'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /oracle/app/oracle/diag/rdbms/DEVDB/DEVDB4/trace/DEVDB4_dbw0_55436.trc:
ORA-01186: file 133 failed verification tests
ORA-01157: cannot identify/lock data file 133 - see DBWR trace file
ORA-01110: data file 133: '/oracle/app/oracle/product/11.2.0/db_1/dbs/DEVDBDATA'
File 133 not verified due to error ORA-01157


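Analysis:

Checking the logs showed that nodes 2, 3 and 4 of the 4-node RAC reported abnormal ORA errors, while node 1 had no such errors. However, node 1's alert log recorded a CREATE TABLESPACE statement in the same time window.

My first guess was that a datafile was corrupted, so that the instances on the other nodes could not identify or lock it. A closer look showed that the datafile's location was wrong: data file 133 maps to /oracle/app/oracle/product/11.2.0/db_1/dbs/DEVDBDATA, but in an ASM environment the database files are stored in ASM disk groups, not on the local file system. The file itself had therefore been created incorrectly. Since node 1's alert log recorded a tablespace creation during the failure window, I checked that statement for problems. The statement was: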

create tablespace  TBS_DATA datafile 'DEVDBDATA' size 4G
autoextend off
extent management local
segment space management auto;

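The problem is obvious at a glance: the "+" in front of the ASM disk group name is missing, and that is the cause of the failure. Without the "+", Oracle treats DEVDBDATA as an ordinary file name rather than a disk group, and resolves the relative name under $ORACLE_HOME/dbs on the node that runs the statement. The tablespace was created successfully, but no datafile was generated in ASM; instead, a non-shared local file was created on node 1. Because this file exists only on node 1 and is not shared with the other nodes, the instances on those nodes cannot identify or lock it.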
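A query along the following lines could confirm the diagnosis from inside the database. It is only a sketch and assumes that every datafile in this database is expected to live in an ASM disk group, so that any name not starting with "+" is suspect.

```sql
-- List datafiles whose names do not start with '+', i.e. files outside ASM.
-- Assumption: all datafiles of this database are supposed to be in ASM.
SELECT file#, name
  FROM v$datafile
 WHERE name NOT LIKE '+%'
 ORDER BY file#;
```

In an incident like this one, the local path reported in ORA-01110 would be expected to show up in the result.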

Listing the local directory on node 1 shows the datafile that was created:

ora11g@PROD1:/oracle/app/oracle/product/11.2.0/db_1/dbs> ls -lrt
total 4198440
-rw-r--r-- 1 ora11g oinstall       2851 2009-05-15 13:35 init.ora
-rw-rw---- 1 ora11g asmadmin       1544 2014-08-14 12:03 hc_PROD1.dat
-rw-r----- 1 ora11g oinstall       1536 2014-08-14 14:40 orapwPROD11
-rw-rw---- 1 ora11g asmadmin       1544 2014-08-14 14:42 hc_PROD11.dat
-rw-r----- 1 ora11g oinstall         41 2014-08-14 15:20 initDEVDB1.ora
-rw-r----- 1 ora11g oinstall       1536 2014-11-23 05:51 orapwDEVDB1
-rw-rw---- 1 ora11g asmadmin       1544 2014-11-23 05:55 hc_DEVDB1.dat
-rw-r----- 1 ora11g asmadmin 4294975488 2014-12-18 17:42 DEVDBDATA
ora11g@PROD1:/oracle/app/oracle/product/11.2.0/db_1/dbs> 

The last entry above, DEVDBDATA (4294975488 bytes), is the 4 GB local datafile that was created.


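With the cause identified as the faulty tablespace creation, I dropped the tablespace and re-created it, which resolved the problem.

Statement to drop the tablespace:

drop tablespace TBS_DATA including contents and datafiles;

Statement to re-create the tablespace: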

create tablespace  TBS_DATA datafile '+DEVDBDATA' size 4G
autoextend off
extent management local
segment space management auto;
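After re-creating the tablespace, it is worth checking where its datafile actually landed. The query below is a minimal sketch of such a check; it uses the tablespace name TBS_DATA from the statement above.

```sql
-- Show the datafile(s) of the re-created tablespace.
-- For an ASM-backed file, file_name should start with '+'.
SELECT file_id,
       file_name,
       bytes / 1024 / 1024 AS size_mb
  FROM dba_data_files
 WHERE tablespace_name = 'TBS_DATA';
```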


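Summary: the most important requirement in a RAC environment is that storage is shared. In this case, omitting the "+" turned what should have been shared ASM storage into non-shared local storage, and that caused the failure. It is a reminder to take extra care when creating tablespaces in a RAC environment.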
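One possible way to reduce the risk of this kind of typo is to let Oracle Managed Files choose the location, so that no file name has to be typed at all. The sketch below is not what was done in this incident; it assumes the site is willing to set db_create_file_dest to the disk group, and the tablespace name TBS_DATA_OMF is hypothetical. Changing this parameter affects where later datafiles are created by default, so it should be reviewed before use.

```sql
-- Assumption: the disk group +DEVDBDATA is the intended default location.
-- SID='*' applies the setting to all RAC instances.
ALTER SYSTEM SET db_create_file_dest = '+DEVDBDATA' SCOPE=BOTH SID='*';

-- With db_create_file_dest set, the datafile name can be omitted
-- and Oracle generates the file inside the disk group.
-- TBS_DATA_OMF is a hypothetical name used only for illustration.
CREATE TABLESPACE TBS_DATA_OMF DATAFILE SIZE 4G
AUTOEXTEND OFF
EXTENT MANAGEMENT LOCAL
SEGMENT SPACE MANAGEMENT AUTO;
```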










