The alert log reported errors, so I opened the corresponding trace file:
oel57t1:oracle:rac2 > more /u01/oracle/admin/rac/bdump/rac2_j000_13831.trc
/u01/oracle/admin/rac/bdump/rac2_j000_13831.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options
ORACLE_HOME = /u01/oracle/product/10.2.0/db_1
System name: Linux
Node name: oel57t1
Release: 2.6.32-200.13.1.el5uek
Version: #1 SMP Wed Jul 27 21:02:33 EDT 2011
Machine: x86_64
Instance name: rac2
Redo thread mounted by this instance: 2
Oracle process number: 24
Unix process pid: 13831, image: oracle@oel57t1 (J000)
*** ACTION NAME:(RLM$EVTCLEANUP) 2013-04-10 13:09:59.155
*** MODULE NAME:(DBMS_SCHEDULER) 2013-04-10 13:09:59.155
*** SERVICE NAME:(SYS$USERS) 2013-04-10 13:09:59.155
*** SESSION ID:(139.14308) 2013-04-10 13:09:59.155
*** 2013-04-10 13:09:59.155
ORA-12012: error on auto execute of job 42567
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '+DATA/rac/sysaux01.dbf'
ORA-06512: at "EXFSYS.DBMS_RLMGR_DR", line 15
ORA-06512: at line 1
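When a trace file is long, it can help to pull out just the error codes and the data files they name. The following is a minimal sketch, not part of the original troubleshooting session; the function names `ora_codes` and `ora_datafiles` are hypothetical helpers, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
# Print each distinct ORA-nnnnn code in a trace file, in order of first appearance.
ora_codes() {
    grep -o 'ORA-[0-9][0-9]*' "$1" | awk '!seen[$0]++'
}

# Print "<file number> <file name>" for every ORA-01110 line in a trace file.
ora_datafiles() {
    sed -n "s/^ORA-01110: data file \([0-9][0-9]*\): '\([^']*\)'.*/\1 \2/p" "$1"
}
```

For example, `ora_datafiles /u01/oracle/admin/rac/bdump/rac2_j000_13831.trc` would list the data file that the job could not read.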
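A check showed that hardware was not the problem at this point, so the next step was to look at the state of the data files: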
SQL> select file#,status from v$datafile;
FILE# STATUS
---------- ---------------------
1 SYSTEM
2 ONLINE
3 RECOVER
4 ONLINE
5 ONLINE
6 ONLINE
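On a database with many data files, the one row that is not ONLINE or SYSTEM is easy to miss. As a small sketch (the function name `needs_attention` is hypothetical, and it assumes the two-column `file# status` listing shown above is saved or piped as plain text), the listing can be filtered like this:

```shell
# Read a two-column "FILE# STATUS" listing on stdin and print only the
# data files whose status is neither ONLINE nor SYSTEM.
needs_attention() {
    awk '$1 ~ /^[0-9]+$/ && $2 != "ONLINE" && $2 != "SYSTEM" { print $1, $2 }'
}
```

The header and separator lines are skipped because their first field is not a number.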
SQL> select file_id,bytes,status,online_status from dba_data_files;
FILE_ID BYTES STATUS ONLINE_STATUS
---------- ---------- --------------------------- ---------------------
6 36700160 AVAILABLE ONLINE
5 104857600 AVAILABLE ONLINE
4 5242880 AVAILABLE ONLINE
3 AVAILABLE RECOVER
2 57671680 AVAILABLE ONLINE
1 524288000 AVAILABLE SYSTEM
Seeing RECOVER, the natural next step is to query V$RECOVER_FILE; from there the picture becomes clear:
SQL> select * from V$RECOVER_FILE;
SQL> alter database recover datafile 3;
alter database recover datafile 3
*
ERROR at line 1:
ORA-00279: change 16799532025 generated at 04/04/2013 14:56:19 needed for thread 1
ORA-00289: suggestion : /soft/archivelog/1_49_809951357.arc
ORA-00280: change 16799532025 for thread 1 is in sequence #49
Copy the archived log 1_49_809951357.arc from rac1 to rac2, then run the recovery again.
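The file to copy is the one named in the ORA-00289 line. The sketch below pulls that path out of the SQL*Plus output and prints a copy command without running it. It uses scp purely for illustration (the original transfer was done with ftp), and it assumes the archived log sits under the same path on the source node; the host string `oracle@rac1` and both function names are hypothetical.

```shell
# Read SQL*Plus output on stdin and print the path suggested by ORA-00289.
suggested_archive() {
    sed -n 's/^ORA-00289: *suggestion *: *\(.*[^ ]\) *$/\1/p'
}

# Print (do not run) a copy command for one archived log.
# $1 = source host (hypothetical), $2 = path, assumed identical on both nodes.
copy_command() {
    printf 'scp %s:%s %s\n' "$1" "$2" "$2"
}
```

Printing the command first makes it easy to check the host and path before anything is transferred.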
SQL> alter database recover tablespace sysaux;
alter database recover tablespace sysaux
*
ERROR at line 1:
ORA-00275: media recovery has already been started
SQL> alter database recover cancel;
Database altered.
SQL> recover datafile 3;
ORA-00279: change 16799532025 generated at 04/04/2013 14:56:19 needed for thread 1
ORA-00289: suggestion : /soft/archivelog/1_49_809951357.arc
ORA-00280: change 16799532025 for thread 1 is in sequence #49
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
AUTO
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
ORA-00279: change 16799532025 generated at 04/04/2013 14:56:19 needed for thread 1
ORA-00289: suggestion : /soft/archivelog/1_49_809951357.arc
ORA-00280: change 16799532025 for thread 1 is in sequence #49
.............
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
Log applied.
Media recovery complete.
The recovery has now succeeded; all that remains is to bring the tablespace online.
SQL> alter tablespace sysaux online;
Tablespace altered.
SQL> select * from V$RECOVER_FILE;
no rows selected
SQL> select file_id,bytes,status,online_status from dba_data_files;
At this point everything is fine: all the files are ONLINE.
Checking from the second node:
SQL> select file_id,bytes,status,online_status from dba_data_files;
FILE_ID BYTES STATUS ONLINE_STATUS
---------- ---------- --------------------------- ---------------------
6 36700160 AVAILABLE ONLINE
5 104857600 AVAILABLE ONLINE
4 5242880 AVAILABLE ONLINE
3 618659840 AVAILABLE ONLINE
2 57671680 AVAILABLE ONLINE
1 524288000 AVAILABLE SYSTEM
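Root cause analysis and summary of the whole process:
On the 4th the storage had a problem, which took +DATA/rac/sysaux01.dbf offline.
Because SYSAUX is a somewhat special tablespace, ordinary application queries and updates are not affected, so the problem went unnoticed.
An error is raised only when something such as a scheduler job needs to touch it,
for example ORA-12012: error on auto execute of job 1
The fix itself is straightforward:
Because the hardware problem had already been resolved, recovery is simply a matter of running recover datafile.
Each of the two RAC instances archives to its own local disk, so the node running the recovery must be able to read the archived logs of both instances; ftp was used to collect all of them on the node that ran the recovery.
The online redo logs are shared anyway, so they need no special handling and the recovery can read them directly.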
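Before starting the recovery, it can be worth confirming that no archived log is missing from the directory on the recovering node. The sketch below assumes the files are named thread_sequence_resetlogsid.arc, which is inferred from the name 1_49_809951357.arc and is not confirmed by the post; the function name `missing_sequences` and the upper sequence bound are hypothetical and would need to be supplied for each thread.

```shell
# Print every sequence number in [first, last] for which no archived log
# named <thread>_<sequence>_<resetlogs id>.arc exists in the directory.
# $1 = directory, $2 = thread, $3 = first sequence, $4 = last sequence,
# $5 = resetlogs id (the third field of the file name).
missing_sequences() {
    dir=$1 thread=$2 first=$3 last=$4 rid=$5
    seq_no=$first
    while [ "$seq_no" -le "$last" ]; do
        [ -f "$dir/${thread}_${seq_no}_${rid}.arc" ] || echo "$seq_no"
        seq_no=$((seq_no + 1))
    done
}
```

Running it once per thread, for example `missing_sequences /soft/archivelog 1 49 <last> 809951357`, prints nothing when every log in the range is present.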
Fri Apr 5 03:00:37 2013
Errors in file /u01/oracle/admin/rac/bdump/rac1_dbw0_7587.trc:
ORA-01148: cannot refresh file size for datafile 3
ORA-01110: data file 3: '+DATA/rac/sysaux01.dbf'
ORA-09925: Unable to create audit trail file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9925