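This all started when I came across the dbms_backup_restore.zerodbid(0) procedure in an Oracle 10g manual database creation script; its purpose is to change the database's dbid. So I decided to run the procedure directly in sqlplus to change the dbid.
Record the dbid before changing it
Quote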
SQL> select dbid from v$database;
DBID
----------
1488207495
Change the dbid
Quote
SQL> exec dbms_backup_restore.zerodbid(0);
PL/SQL procedure successfully completed.
The call appeared to succeed, but shortly afterwards the alert log showed the CKPT process terminating the instance
Quote
Tue Mar 9 01:43:22 2010
CKPT: terminating instance due to error 1242
Instance terminated by CKPT, pid = 16653
Tue Mar 9 01:43:53 2010
Starting the database again fails with an error
Quote
Tue Mar 9 01:56:09 2010
Errors in file /ora10g/app/admin/ldbra/udump/ldbra_ora_12275.trc:
ORA-01221: data file 1 is not the same file to a background process
ORA-1221 signalled during: ALTER DATABASE OPEN...
Dump the Oracle datafile headers
Quote
SQL> ALTER SESSION SET EVENTS 'immediate trace name file_hdrs level 3';
The trace file shows that the dbid has been reset to 0
Quote
V10 STYLE FILE HEADER:
Compatibility Vsn = 169870080=0xa200300
Db ID=0=0x0, Db Name='LDBRA'
Activation ID=0=0x0
Control Seq=8122=0x1fba, File size=65280=0xff00
File Number=1, Blksiz=8192, File Type=3 DATA
Another way to check is with the bbed tool
Quote
struct kcvfhhdr, 76 bytes @20
ub4 kccfhswv @20 0x00000000
ub4 kccfhcvn @24 0x0a200300
ub4 kccfhdbi @28 0x00000000
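The bbed output above places kccfhdbi at offset 28 of the file header block. As a minimal sketch, the same field can be read straight from a copy of a datafile. This assumes the 8192-byte block size reported in the header dump, that the file header is block 1 of the file, and a little-endian platform such as Linux x86. `read_dbid` is a hypothetical helper name, not an Oracle tool.

```python
import struct

BLOCK_SIZE = 8192   # Blksiz reported in the header dump
HEADER_BLOCK = 1    # assumption: block 0 is the OS header, block 1 the file header
DBID_OFFSET = 28    # kccfhdbi @28 in the bbed output


def read_dbid(path, block_size=BLOCK_SIZE):
    """Return the 4-byte dbid stored in the datafile header (little-endian assumed)."""
    with open(path, "rb") as f:
        f.seek(HEADER_BLOCK * block_size + DBID_OFFSET)
        raw = f.read(4)
    if len(raw) != 4:
        raise ValueError("file too short to contain a datafile header")
    return struct.unpack("<I", raw)[0]
```

Run against a copy of the file rather than the live datafile; a result of 0 would match the zeroed header shown above.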
Naturally, my first instinct was to rebuild the control file and see whether the database could be recovered
Quote
SQL> alter database backup controlfile to trace;
Database altered.
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "LDBRA" RESETLOGS ARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
MAXINSTANCES 8
MAXLOGHISTORY 292
LOGFILE
GROUP 1 '/ora10g/app/oradata/ldbra/redo01.log' SIZE 50M,
GROUP 2 '/ora10g/app/oradata/ldbra/redo02.log' SIZE 50M,
GROUP 3 '/ora10g/app/oradata/ldbra/redo03.log' SIZE 50M
-- STANDBY LOGFILE
DATAFILE
'/ora10g/app/oradata/ldbra/system01.dbf',
'/ora10g/app/oradata/ldbra/undotbs01.dbf',
'/ora10g/app/oradata/ldbra/sysaux01.dbf',
'/ora10g/app/oradata/ldbra/users01.dbf',
'/ora10g/app/oradata/ldbra/example01.dbf',
'/ora10g/app/product/10.2.0/db_1/dbs/company.dbf',
'/ora10g/app/product/10.2.0/db_1/dbs/streams.dbf'
CHARACTER SET ZHS16GBK
;
Frustratingly, rebuilding the control file failed:
Quote
CREATE CONTROLFILE REUSE DATABASE "LDBRA" RESETLOGS NOARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-01227: log is inconsistent with other logs
Then I remembered there is another syntax for rebuilding the control file (back up the controlfile and the online redo logs before rebuilding):
Quote
Create controlfile reuse set database "LDBRA"
MAXINSTANCES 8
MAXLOGHISTORY 1
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
Datafile
'/ora10g/app/oradata/ldbra/system01.dbf',
'/ora10g/app/oradata/ldbra/undotbs01.dbf',
'/ora10g/app/oradata/ldbra/sysaux01.dbf',
'/ora10g/app/oradata/ldbra/users01.dbf',
'/ora10g/app/oradata/ldbra/example01.dbf',
'/ora10g/app/product/10.2.0/db_1/dbs/company.dbf',
'/ora10g/app/product/10.2.0/db_1/dbs/streams.dbf'
LOGFILE GROUP 1 ('/ora10g/app/oradata/ldbra/redo01.log') SIZE 51200K,
GROUP 2 ('/ora10g/app/oradata/ldbra/redo02.log') SIZE 51200K,
GROUP 3 ('/ora10g/app/oradata/ldbra/redo03.log') SIZE 51200K RESETLOGS;
The rebuild seemed to succeed! But the recover step then failed!
Quote
SQL> RECOVER DATABASE USING BACKUP CONTROLFILE;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [kcvhvdf_1], [], [], [], [], [], [],
[]
We can see that the dbid was reset while the control file was being rebuilt
Quote
SQL> select dbid from v$database;
DBID
----------
1498845164
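At this point I seemed to have lost the thread, heh, so I copied the previously backed-up control file back over the newly created one. Because I had created the control file with the RESETLOGS option, in theory the redo logs should be reset, that is, re-created. Yet with this option the statement still failed with ORA-01227, which made no sense. Then it occurred to me that the dbid of 0 in the datafile headers might be the cause. So I turned to the ultimate repair tool, bbed! First I used bbed to reset the dbid in every datafile to 1488207495. Next I set the fuzzy flag to 0x2000 (the instance had been terminated abnormally by CKPT, and setting the flag to 0x2000 marks the database as having been closed with shutdown immediate). After these changes the control file was created successfully!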
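When patching a header value by hand it helps to know the hexadecimal form of the dbid and the order in which its bytes sit on disk. The following is a small sketch, assuming a little-endian platform (Linux x86); `dbid_to_header_bytes` is a hypothetical helper name and is not part of bbed.

```python
import struct


def dbid_to_header_bytes(dbid):
    """Return the dbid as space-separated hex bytes in little-endian (on-disk) order."""
    return " ".join(f"{b:02x}" for b in struct.pack("<I", dbid))


if __name__ == "__main__":
    original_dbid = 1488207495          # value recorded before zerodbid was run
    print(hex(original_dbid))           # 0x58b43e87
    print(dbid_to_header_bytes(original_dbid))  # 87 3e b4 58
```

On a big-endian platform the byte order would be reversed, so check the platform before writing anything back.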
Quote
SQL> STARTUP NOMOUNT
ORACLE instance started.
Total System Global Area 1073741824 bytes
Fixed Size 1271616 bytes
Variable Size 461375680 bytes
Database Buffers 608174080 bytes
Redo Buffers 2920448 bytes
CREATE CONTROLFILE REUSE DATABASE "LDBRA" RESETLOGS ARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
MAXINSTANCES 8
MAXLOGHISTORY 292
LOGFILE
GROUP 1 '/ora10g/app/oradata/ldbra/redo01.log' SIZE 50M,
GROUP 2 '/ora10g/app/oradata/ldbra/redo02.log' SIZE 50M,
GROUP 3 '/ora10g/app/oradata/ldbra/redo03.log' SIZE 50M
-- STANDBY LOGFILE
DATAFILE
'/ora10g/app/oradata/ldbra/system01.dbf',
'/ora10g/app/oradata/ldbra/undotbs01.dbf',
'/ora10g/app/oradata/ldbra/sysaux01.dbf',
'/ora10g/app/oradata/ldbra/users01.dbf',
'/ora10g/app/oradata/ldbra/example01.dbf',
'/ora10g/app/product/10.2.0/db_1/dbs/company.dbf',
'/ora10g/app/product/10.2.0/db_1/dbs/streams.dbf'
CHARACTER SET ZHS16GBK
;
Control file created.
Try to open the database
SQL> alter database open RESETLOGS;
The database hung, and the alert log showed an ORA-00600 [2662] error. Heh, seeing this error meant there was hope!
Quote
SMON: enabling cache recovery
Tue Mar 9 03:11:38 2010
Errors in file /ora10g/app/admin/ldbra/udump/ldbra_ora_13676.trc:
ORA-00600: internal error code, arguments: [2662], [2268], [3799096903], [2268], [3799098345], [8388617], [], []
Tue Mar 9 03:11:40 2010
Errors in file /ora10g/app/admin/ldbra/udump/ldbra_ora_13676.trc:
ORA-00600: internal error code, arguments: [2662], [2268], [3799096903], [2268], [3799098345], [8388617], [], []
Tue Mar 9 03:11:40 2010
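The arguments of the [2662] error can be decoded to see how far apart the SCNs are. The sketch below assumes the commonly documented argument layout for [2662]: current SCN wrap, current SCN base, dependent SCN wrap, dependent SCN base, and the data block address (DBA) where the dependent SCN was found. It also assumes the usual 10-bit relative file number and 22-bit block number split of a DBA. `decode_2662` is a hypothetical helper name.

```python
import re


def decode_2662(line):
    """Decode the numeric arguments of an ORA-00600 [2662] alert log line."""
    args = [int(x) for x in re.findall(r"\[(\d+)\]", line)]
    if len(args) < 6 or args[0] != 2662:
        raise ValueError("not an ORA-00600 [2662] line")
    _, cur_wrap, cur_base, dep_wrap, dep_base, dba = args[:6]
    current_scn = (cur_wrap << 32) | cur_base      # SCN = wrap * 2**32 + base
    dependent_scn = (dep_wrap << 32) | dep_base
    return {
        "current_scn": current_scn,
        "dependent_scn": dependent_scn,
        "gap": dependent_scn - current_scn,        # how far the database SCN lags
        "file": dba >> 22,                         # relative file number
        "block": dba & 0x3FFFFF,                   # block number within the file
    }


if __name__ == "__main__":
    msg = ("ORA-00600: internal error code, arguments: [2662], [2268], "
           "[3799096903], [2268], [3799098345], [8388617], [], []")
    print(decode_2662(msg))
```

For the line in the alert log above this gives a gap of 1442 and a DBA of file 2, block 9, which is a small gap and fits with the low adjust_scn level used later.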
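Since shutdown abort had no effect on the instance, I killed the Oracle processes and removed the shared memory segments to force the hung instance to abort:
1. Kill the Oracle core processes
Quote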
[ora10g@test bdump]$ ps -ef|grep ora_
ora10g 14431 1 0 Feb21 ? 00:01:32 ora_pmon_streams
ora10g 14433 1 0 Feb21 ? 00:00:46 ora_psp0_streams
ora10g 14435 1 0 Feb21 ? 00:00:47 ora_mman_streams
ora10g 14437 1 0 Feb21 ? 00:06:57 ora_dbw0_streams
ora10g 14439 1 0 Feb21 ? 00:06:24 ora_lgwr_streams
ora10g 14441 1 0 Feb21 ? 00:46:13 ora_ckpt_streams
ora10g 14443 1 0 Feb21 ? 00:01:02 ora_smon_streams
ora10g 14445 1 0 Feb21 ? 00:00:00 ora_reco_streams
ora10g 14447 1 0 Feb21 ? 00:05:53 ora_cjq0_streams
ora10g 14449 1 0 Feb21 ? 00:03:15 ora_mmon_streams
ora10g 14451 1 0 Feb21 ? 00:02:47 ora_mmnl_streams
ora10g 14453 1 0 Feb21 ? 00:00:01 ora_d000_streams
ora10g 14455 1 0 Feb21 ? 00:00:03 ora_s000_streams
ora10g 14460 1 0 Feb21 ? 00:00:05 ora_qmnc_streams
ora10g 14468 1 0 Feb21 ? 00:00:08 ora_q000_streams
ora10g 14470 1 0 Feb21 ? 00:00:02 ora_q001_streams
ora10g 13622 1 0 03:08 ? 00:00:00 ora_j000_streams
ora10g 13710 12028 0 03:13 pts/5 00:00:00 grep ora_
[ora10g@test bdump]$ kill -9 14431 14433 14435 14437 14439 14441 14443 14445 14447 14449 14451 14453 14455 14460 14468 14470 13622
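Typing a long PID list by hand is error-prone. As a rough sketch, the PIDs can be collected from `ps -ef` text by matching the background process name pattern ora_<name>_<sid>. It assumes the eight-column Linux `ps -ef` layout shown above; `background_pids` is a hypothetical helper, and the instance name passed in must be checked carefully so that another instance on the same host is not affected.

```python
def background_pids(ps_output, sid):
    """Collect PIDs of ora_<name>_<sid> processes from `ps -ef` text."""
    pids = []
    suffix = "_" + sid
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue                      # skip blank or malformed lines
        command = fields[7]               # first word of the CMD column
        if command.startswith("ora_") and command.endswith(suffix):
            pids.append(int(fields[1]))   # second column is the PID
    return pids


if __name__ == "__main__":
    sample = """\
ora10g 14431 1 0 Feb21 ? 00:01:32 ora_pmon_streams
ora10g 14441 1 0 Feb21 ? 00:46:13 ora_ckpt_streams
ora10g 13710 12028 0 03:13 pts/5 00:00:00 grep ora_
"""
    print(background_pids(sample, "streams"))
```

The `grep ora_` line is skipped because its command word is `grep`, not a process name starting with `ora_`.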
2. Remove the Oracle shared memory segments
Quote
[ora10g@test bdump]$ ipcs
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0xcc481b8c 1441796 ora10g 640 599785472 0
0x40b3b558 2818054 ora10g 640 1077936128 0
------ Semaphore Arrays --------
key semid owner perms nsems
0x0d908ec4 360448 ora10g 640 154
------ Message Queues --------
key msqid owner perms used-bytes messages
[root@test ~]# ipcrm -m 1441796
[root@test ~]# ipcrm -s 360448
Try to open the instance again, this time using event 10015.
Quote
SQL> alter session set events '10015 trace name adjust_scn level 1';
Session altered.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '/ora10g/app/oradata/ldbra/system01.dbf'
SQL> recover database;
Media recovery complete.
SQL> alter database open;
Database altered.
The remaining work was to add a tempfile to the temp tablespace, and with that the recovery finally succeeded.
Quote
SQL> alter tablespace temp add tempfile '/ora10g/app/oradata/ldbra/temp01.dbf' size 50m reuse;
Tablespace altered.