测试环境控制文件被误删的故障处理

今天在测试库执行select * from v$log的时候报错
select * from v$log
              *
ERROR at line 1:
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/opt/oracle/oradata/control01.ctl'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

报错信息很明确,控制文件找不到了。根据控制文件的路径,通过操作系统命令来查看,文件确实已经不在了。
ls -l /opt/oracle/oradata/control01.ctl
ls: cannot access /opt/oracle/oradata/control01.ctl: No such file or directory


select * from v$controlfile;
STATUS         NAME                                     IS_REC BLOCK_SIZE FILE_SIZE_BLKS
-------------- ---------------------------------------- ------ ---------- --------------
               /opt/oracle/oradata/control01.ctl        NO          16384           1972


糟糕的是,测试库的控制文件只有一份,如果有多份的话,通过拷贝+重命名可以解决一个控制文件丢失的问题(关闭数据库后,进行拷贝+重命名)。

在这种情况下,要不停掉数据库,手工创建控制文件,或者第二种办法:由于数据库还是打开的状态,虽然文件被删除,但是实例(进程)依然会对访问过的文件保持打开状态。

ps -ef | grep lgwr
oracle   11780     1  0 Jun27 ?        00:05:02 ora_lgwr_fio

cd /proc/11780/fd

ls -ltr
total 0
lr-x------. 1 oracle oinstall 64 Jul  9 09:25 9 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul  9 09:25 8 -> /dev/null
lrwx------. 1 oracle oinstall 64 Jul  9 09:25 7 -> /opt/oracle/products/11.2.0/dbs/hc_fio.dat
lr-x------. 1 oracle oinstall 64 Jul  9 09:25 6 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul  9 09:25 5 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul  9 09:25 4 -> /dev/null
lr-x------. 1 oracle oinstall 64 Jul  9 09:25 3 -> /dev/null
l-wx------. 1 oracle oinstall 64 Jul  9 09:25 27 -> /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_lgwr_11780.trm
l-wx------. 1 oracle oinstall 64 Jul  9 09:25 26 -> /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_lgwr_11780.trc
lrwx------. 1 oracle oinstall 64 Jul  9 09:25 257 -> /dev/dfa
lrwx------. 1 oracle oinstall 64 Jul  9 09:25 256 -> /opt/oracle/oradata/control01.ctl (deleted)

可以看到LGWR进程保持了对文件  256 -> /opt/oracle/oradata/control01.ctl (deleted) 的访问,可以通过拷贝命令把文件还原。

cp 256 /opt/oracle/oradata/control01.ctl
chown oracle.oinstall /opt/oracle/oradata/control01.ctl

再查看v$log,已经可已正常查看了。

select * from v$log;


    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARCHIV STATUS                           FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- ---------- ------ -------------------------------- ------------- -------------------
NEXT_CHANGE# NEXT_TIME
------------ -------------------
         1          1       3436 1073741824        512          1 YES    INACTIVE                               2011282 2014-07-09 09:18:19
     3684369 2014-07-09 09:20:28


         2          1       3437 1073741824        512          1 YES    ACTIVE                                 3684369 2014-07-09 09:20:28
     4951745 2014-07-09 09:25:08


         3          1       3438 1073741824        512          1 NO     CURRENT                                4951745 2014-07-09 09:25:08
  2.8147E+14



但是紧接着,实例突然就DOWN掉了,查看ALERT文件。
********************* ATTENTION: ********************
 The controlfile header block returned by the OS
 has a sequence number that is too old.
 The controlfile might be corrupted.
 PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE
 without following the steps below.
 RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE
 TO THE DATABASE, if the controlfile is truly corrupted.
 In order to re-start the instance safely,
 please do the following:
 (1) Save all copies of the controlfile for later
     analysis and contact your OS vendor and Oracle support.
 (2) Mount the instance and issue:
     ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
 (3) Unmount the instance.
 (4) Use the script in the trace file to
     RE-CREATE THE CONTROLFILE and open the database.
*****************************************************
CKPT (ospid: 11782): terminating the instance
Wed Jul 09 09:30:08 2014
System state dump requested by (instance=1, osid=11782 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /opt/oracle/diag/rdbms/dlsp/fio/trace/fio_diag_11761.trc
Wed Jul 09 09:30:08 2014
ORA-1092 : opitsk aborting process
Non critical error ORA-48913 caught while writing to trace file "/opt/oracle/diag/rdbms/dlsp/fio/trace/fio_diag_11761.trc"
Error message: ORA-48913: Writing into trace file failed, file size limit [2000000] reached
Writing to the above trace file is disabled for now on...
Dumping diagnostic data in directory=[cdmp_20140709093008], requested by (instance=1, osid=11782 (CKPT)), summary=[abnormal instance termination].
Instance terminated by CKPT, pid = 11782

基本的意思是,控制文件太旧了,然后检查点进程终止了实例。

尝试打开数据库

@>startup
ORACLE instance started.


Total System Global Area 1.9108E+10 bytes
Fixed Size                  2236328 bytes
Variable Size            2952790104 bytes
Database Buffers         1.6106E+10 bytes
Redo Buffers               46886912 bytes
Database mounted.
ORA-00338: log 1 of thread 1 is more recent than control file
ORA-00312: online log 1 thread 1: '+DG_DATA/dlsp/redo01.log'
提示日志里记录的信息(检查点信息)比控制文件新。

看来只能重新创建控制文件了
alter database backup controlfile to trace as '/tmp/a.txt';

@>startup nomount
ORACLE instance started.


Total System Global Area 1.9108E+10 bytes
Fixed Size                  2236328 bytes
Variable Size            2952790104 bytes
Database Buffers         1.6106E+10 bytes
Redo Buffers               46886912 bytes
@>
@>CREATE CONTROLFILE REUSE DATABASE "DLSP" NORESETLOGS FORCE LOGGING ARCHIVELOG
  2      MAXLOGFILES 60
  3      MAXLOGMEMBERS 5
  4      MAXDATAFILES 2000
  5      MAXINSTANCES 1
  6      MAXLOGHISTORY 8631
  7  LOGFILE
  8    GROUP 1 '+DG_DATA/dlsp/redo01.log'  SIZE 1024M BLOCKSIZE 512,
  9    GROUP 2 '+DG_DATA/dlsp/redo02.log'  SIZE 1024M BLOCKSIZE 512,
 10    GROUP 3 '+DG_DATA/dlsp/redo03.log'  SIZE 1024M BLOCKSIZE 512
 11  -- STANDBY LOGFILE
 12  DATAFILE
 13    '+DG_DATA/dlsp/system01.dbf',
 14    '+DG_DATA/dlsp/sysaux01.dbf',
 15    '+DG_DATA/dlsp/undotbs01.dbf',
 16    '+DG_DATA/dlsp/user01.dbf'
 17  CHARACTER SET UTF8
 18  ;


Control file created.


@>alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '+DG_DATA/dlsp/system01.dbf'
提示数据库需要恢复。

@>recover database; 
Media recovery complete.
@>alter database open;


Database altered.


@>col name for a40
@>select * from v$controlfile;


STATUS         NAME                                     IS_REC BLOCK_SIZE FILE_SIZE_BLKS
-------------- ---------------------------------------- ------ ---------- --------------
               /opt/oracle/oradata/control01.ctl        NO          16384           1972

至此整个恢复过程结束。

虽然理论上可以把误删除的控制文件、数据文件、日志文件找回来(数据库必须没关闭!!),但是就像本案例看到的,可能会遭遇这样那样的错误,DBA对数据库的操作一定要谨慎。


来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/22034023/viewspace-1213644/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/22034023/viewspace-1213644/

你可能感兴趣的:(测试环境控制文件被误删的故障处理)