一次误操作引起的Oracle数据库大恢复


事情起由是在Oracle 10g手动建库脚本中看到dbms_backup_restore.zerodbid(0)过程,其中作用是修改数据库的dbid。于是想通过该存储直接在sqlplus中执行修改dbid。
修改之前记录其dbid
引用
SQL> select dbid from v$database;

      DBID
----------
1488207495


修改dbid

引用
SQL> exec dbms_backup_restore.zerodbid(0);

PL/SQL procedure successfully completed.



貌似执行成功了,但随后alert日志显示ckpt进程将数据实例终止
引用
Tue Mar  9 01:43:22 2010
CKPT: terminating instance due to error 1242
Instance terminated by CKPT, pid = 16653
Tue Mar  9 01:43:53 2010


再次启动数据库报错
引用
Tue Mar  9 01:56:09 2010
Errors in file /ora10g/app/admin/ldbra/udump/ldbra_ora_12275.trc:
ORA-01221: data file 1 is not the same file to a background process
ORA-1221 signalled during: ALTER DATABASE OPEN...


dump Oracle数据文件头
引用
SQL> ALTER SESSION SET EVENTS 'immediate trace name file_hdrs level 3';


通过跟踪文件可以看到dbid以被重置为0
引用
V10 STYLE FILE HEADER:
        Compatibility Vsn = 169870080=0xa200300
        Db ID=0=0x0, Db Name='LDBRA'
        Activation ID=0=0x0
        Control Seq=8122=0x1fba, File size=65280=0xff00
        File Number=1, Blksiz=8192, File Type=3 DATA


还有一种途径是通过bbed工具观察

引用
   struct kcvfhhdr, 76 bytes                @20     
      ub4 kccfhswv                          @20       0x00000000
      ub4 kccfhcvn                          @24       0x0a200300
      ub4 kccfhdbi                          @28       0x00000000


当然第一反应是重建控制文件,看看能不能恢复成功

引用
SQL> alter database backup controlfile to trace;

Database altered.

STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "LDBRA" RESETLOGS  ARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 '/ora10g/app/oradata/ldbra/redo01.log'  SIZE 50M,
  GROUP 2 '/ora10g/app/oradata/ldbra/redo02.log'  SIZE 50M,
  GROUP 3 '/ora10g/app/oradata/ldbra/redo03.log'  SIZE 50M
-- STANDBY LOGFILE
DATAFILE
  '/ora10g/app/oradata/ldbra/system01.dbf',
  '/ora10g/app/oradata/ldbra/undotbs01.dbf',
  '/ora10g/app/oradata/ldbra/sysaux01.dbf',
  '/ora10g/app/oradata/ldbra/users01.dbf',
  '/ora10g/app/oradata/ldbra/example01.dbf',
  '/ora10g/app/product/10.2.0/db_1/dbs/company.dbf',
  '/ora10g/app/product/10.2.0/db_1/dbs/streams.dbf'
CHARACTER SET ZHS16GBK
;


郁闷的是重建控制文件不成功:

引用
CREATE CONTROLFILE REUSE DATABASE "LDBRA" RESETLOGS  NOARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-01227: log  is inconsistent with other logs


想到还有另外一种语法重建控制文件(重建控制文件之前,备份controlfile和online redolog):
引用
Create controlfile reuse set database "LDBRA"
MAXINSTANCES 8
MAXLOGHISTORY 1
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
Datafile
'/ora10g/app/oradata/ldbra/system01.dbf',
'/ora10g/app/oradata/ldbra/undotbs01.dbf',
'/ora10g/app/oradata/ldbra/sysaux01.dbf',
'/ora10g/app/oradata/ldbra/users01.dbf',
'/ora10g/app/oradata/ldbra/example01.dbf',
'/ora10g/app/product/10.2.0/db_1/dbs/ company.dbf',
'/ora10g/app/product/10.2.0/db_1/dbs/streams.dbf'
LOGFILE GROUP 1 ('/ora10g/app/oradata/ldbra/redo01.log') SIZE 51200K,
GROUP 2 ('/ora10g/app/oradata/ldbra/redo02.log') SIZE 51200K,
GROUP 3 ('/ora10g/app/oradata/ldbra/redo03.log') SIZE 51200K RESETLOGS; 


似乎重建成功了!但是进行recover的时候报错了!
引用
SQL> RECOVER DATABASE USING BACKUP CONTROLFILE;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [kcvhvdf_1], [], [], [], [], [], [],
[]


可以看到控制文件在重建的过程中进行了dbid重置
引用
SQL> select dbid from v$database;

      DBID
----------
1498845164


问题到这里似乎失去了头绪,呵呵,拷回之前备份的控制文件替换刚建的控制文件。因为我采用的是resetlog选项创建控制文件,从理论上来讲,应该是会重置redolog的,即重新创建redolog。但是目前采用此选项确报ORA-01227错误。不可思议!后来一想可能是跟数据文件中的dbid为0有关。于是采用终极修复方法,bbed!首先将所有数据文件的dbid用bbed工具重置为1488207495,其次将fuzzy标记打为0x2000(因为数据库被ckpt进程异常终止,将标记打为0x2000表示数据库是shutdown immediate关闭),采用上述方法之后控制文件成功创建!
引用
SQL> STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "LDBRA" RESETLOGS  ARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size                  1271616 bytes
Variable Size             461375680 bytes
Database Buffers          608174080 bytes
Redo Buffers                2920448 bytes
  GROUP 1 '/ora10g/app/oradata/ldbra/redo01.log'  SIZE 50M,
  GROUP 2 '/ora10g/app/oradata/ldbra/redo02.log'  SIZE 50M,
  GROUP 3 '/ora10g/app/oradata/ldbra/redo03.log'  SIZE 50M
-- STANDBY LOGFILE
DATAFILE
  '/ora10g/app/oradata/ldbra/system01.dbf',
  '/ora10g/app/oradata/ldbra/undotbs01.dbf',
  '/ora10g/app/oradata/ldbra/sysaux01.dbf',
  '/ora10g/app/oradata/ldbra/users01.dbf',
  '/ora10g/app/oradata/ldbra/example01.dbf',
  '/ora10g/app/product/10.2.0/db_1/dbs/company.dbf',
  '/ora10g/app/product/10.2.0/db_1/dbs/streams.dbf'
CHARACTER SET ZHS16GBK
21  ;

Control file created.


尝试打开数据库
SQL> alter database open RESETLOGS;
出现数据库挂起状态,后台alert日志显示[2662]错误,呵呵,看到这个错误,希望就来了!
引用
SMON: enabling cache recovery
Tue Mar  9 03:11:38 2010
Errors in file /ora10g/app/admin/ldbra/udump/ldbra_ora_13676.trc:
ORA-00600: internal error code, arguments: [2662], [2268], [3799096903], [2268], [3799098345], [8388617], [], []
Tue Mar  9 03:11:40 2010
Errors in file /ora10g/app/admin/ldbra/udump/ldbra_ora_13676.trc:
ORA-00600: internal error code, arguments: [2662], [2268], [3799096903], [2268], [3799098345], [8388617], [], []
Tue Mar  9 03:11:40 2010


由于shutdown abort实例不起作用,就采用杀Oracle进程,删除共享内存段的做法,将挂起的数据库实力强制abort:
1、杀Oracle核心进程
引用
[ora10g@test bdump]$ ps -ef|grep ora_
ora10g   14431     1  0 Feb21 ?        00:01:32 ora_pmon_streams
ora10g   14433     1  0 Feb21 ?        00:00:46 ora_psp0_streams
ora10g   14435     1  0 Feb21 ?        00:00:47 ora_mman_streams
ora10g   14437     1  0 Feb21 ?        00:06:57 ora_dbw0_streams
ora10g   14439     1  0 Feb21 ?        00:06:24 ora_lgwr_streams
ora10g   14441     1  0 Feb21 ?        00:46:13 ora_ckpt_streams
ora10g   14443     1  0 Feb21 ?        00:01:02 ora_smon_streams
ora10g   14445     1  0 Feb21 ?        00:00:00 ora_reco_streams
ora10g   14447     1  0 Feb21 ?        00:05:53 ora_cjq0_streams
ora10g   14449     1  0 Feb21 ?        00:03:15 ora_mmon_streams
ora10g   14451     1  0 Feb21 ?        00:02:47 ora_mmnl_streams
ora10g   14453     1  0 Feb21 ?        00:00:01 ora_d000_streams
ora10g   14455     1  0 Feb21 ?        00:00:03 ora_s000_streams
ora10g   14460     1  0 Feb21 ?        00:00:05 ora_qmnc_streams
ora10g   14468     1  0 Feb21 ?        00:00:08 ora_q000_streams
ora10g   14470     1  0 Feb21 ?        00:00:02 ora_q001_streams
ora10g   13622     1  0 03:08 ?        00:00:00 ora_j000_streams
ora10g   13710 12028  0 03:13 pts/5    00:00:00 grep ora_
[ora10g@test bdump]$ kill -9 14431 14433 14435 14437 14439 14441 14443 14445 14447 14449 14451 14453 14455 14460 14468 14470 13622


2、删除Oracle 共享内存段
引用
[ora10g@test bdump]$ ipcs

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status     
0xcc481b8c 1441796    ora10g    640        599785472  0                      
0x40b3b558 2818054    ora10g    640        1077936128 0                      

------ Semaphore Arrays --------
key        semid      owner      perms      nsems    
0x0d908ec4 360448     ora10g    640        154      

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

[root@test ~]# ipcrm -m 1441796
[root@test ~]# ipcrm -s 360448


再次尝试将实例打开,这里用到了10015事件。
引用
SQL> alter session set events '10015 trace name adjust_scn level 1';

Session altered.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '/ora10g/app/oradata/ldbra/system01.dbf'


SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.


后续工作就是将tempfile添加到temp表空间中,终于恢复成功。
引用
SQL> alter tablespace temp add tempfile '/ora10g/app/oradata/ldbra/temp01.dbf' size 50m reuse;

Tablespace altered.

你可能感兴趣的:(oracle,sql,cache,脚本)