Christmas 2008: An Extremely Complex and Difficult Oracle Database Recovery

Link: http://www.eygle.com/archives/2008/12/difficult_recovery_christmas.html

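Yesterday, the database at a friend's company crashed.

This once again confirms a point I have made repeatedly: databases need rest too.
Whenever a holiday comes around, databases often decide to take the day off on their own.

As I have said before, the end of the year is hard to get through. It is the season when database incidents pile up, and here it is again. I remember another Christmas, when Biti and I were together in Beijing: a friend in Shanghai had a database crash in just the same way, and we guided him remotely through recovering the data.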

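Here is what happened this time.
1. First the host went down with disk errors
When you see errors like the following, your data is usually in serious danger: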

Dec 24 13:52:13 kernel: sda5: rw=0, want=18298437640, limit=163846872
Dec 24 13:52:13 kernel: attempt to access beyond end of device
Dec 24 13:52:13 kernel: sda5: rw=0, want=10384710304, limit=163846872
Dec 24 13:52:13 kernel: attempt to access beyond end of device
Dec 24 13:52:13 kernel: sda5: rw=0, want=8756273744, limit=163846872
Dec 24 13:52:13 kernel: attempt to access beyond end of device
Dec 24 13:52:13 kernel: sda5: rw=0, want=5023902272, limit=163846872
Dec 24 13:52:13 kernel: attempt to access beyond end of device
Dec 24 13:52:13 kernel: sda5: rw=0, want=6730428824, limit=163846872
Dec 24 13:52:13 kernel: attempt to access beyond end of device
Dec 24 13:52:13 kernel: sda5: rw=0, want=8884660792, limit=163846872
Dec 24 13:52:13 kernel: attempt to access beyond end of device
Dec 24 13:52:13 kernel: sda5: rw=0, want=9182513808, limit=163846872
Dec 24 13:52:13 kernel: attempt to access beyond end of device
Dec 24 13:52:13 kernel: sda5: rw=0, want=5002858800, limit=163846872
Dec 24 13:52:13 kernel: attempt to access beyond end of device
Dec 24 13:52:13 kernel: sda5: rw=0, want=6730428824, limit=163846872
Dec 24 13:52:13 kernel: attempt to access beyond end of device
Dec 24 13:52:13 kernel: sda5: rw=0, want=15872410168, limit=163846872
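
To make the severity of these messages concrete, here is a minimal Python sketch (not part of the original recovery) that pulls the `want` and `limit` values out of kernel lines like the ones above and flags every request that points past the end of the device. The function name `out_of_range_requests` and the regular expression are illustrative assumptions. The kernel usually counts both values in 512-byte sectors, but the comparison itself does not depend on the unit.

```python
import re

# Matches kernel lines such as:
# "kernel: sda5: rw=0, want=18298437640, limit=163846872"
BAD_IO = re.compile(
    r"(?P<dev>\w+): rw=(?P<rw>\d+), want=(?P<want>\d+), limit=(?P<limit>\d+)"
)


def out_of_range_requests(log_text):
    """Return (device, want, limit) for each request that lies past the device end."""
    hits = []
    for line in log_text.splitlines():
        m = BAD_IO.search(line)
        if m and int(m["want"]) > int(m["limit"]):
            hits.append((m["dev"], int(m["want"]), int(m["limit"])))
    return hits


# Two of the lines from the log above, used as sample input.
sample = """\
Dec 24 13:52:13 kernel: sda5: rw=0, want=18298437640, limit=163846872
Dec 24 13:52:13 kernel: attempt to access beyond end of device
Dec 24 13:52:13 kernel: sda5: rw=0, want=10384710304, limit=163846872
"""

for dev, want, limit in out_of_range_requests(sample):
    print(f"{dev}: want is {want / limit:.1f}x the device limit")
```

Every request in the log asks for an offset dozens of times larger than the partition itself, which suggests the block addresses being read from the filesystem metadata are already garbage.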


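2. Massive data file corruption

This time was no exception: a large number of files were damaged, and dbv reported many errors like these: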

[oracle@stat datafile]$ dbv file=o1_mf_system_29448mn7_.dbf blocksize=8192

DBVERIFY: Release 10.2.0.2.0 - Production on Thu Dec 25 22:17:52 2008

Copyright (c) 1982, 2005, Oracle. All rights reserved.

DBVERIFY - Verification starting : FILE = o1_mf_system_29448mn7_.dbf
Page 40 is influx - most likely media corrupt
Corrupt block relative dba: 0x00400028 (file 1, block 40)
Fractured block found during dbv:
Data in bad block:
type: 6 format: 2 rdba: 0x00400028
last change scn: 0x0000.18990f0e seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0xbc120601
check value in block header: 0xc0cb
computed block checksum: 0xb003

Page 232 is influx - most likely media corrupt
Corrupt block relative dba: 0x004000e8 (file 1, block 232)
Fractured block found during dbv:
Data in bad block:
type: 6 format: 2 rdba: 0x004000e8
last change scn: 0x0000.18991b98 seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x6c440601
check value in block header: 0x8d7f
computed block checksum: 0x77dc
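
The numbers in this dbv output can be cross-checked by hand. The Python sketch below is an illustration added here, not something from the original recovery. It rests on two assumptions that are commonly documented for Oracle block formats but are not stated in this post: a relative DBA in a smallfile tablespace packs a 10-bit relative file number above a 22-bit block number, and the 4-byte tail of a consistent block is built from the low 16 bits of the SCN base followed by the block type and the sequence byte. The helper names `decode_rdba` and `expected_tail` are hypothetical.

```python
def decode_rdba(rdba):
    """Split a 32-bit relative DBA into (relative file number, block number).

    Assumes the smallfile layout: top 10 bits = file, low 22 bits = block.
    """
    return rdba >> 22, rdba & 0x3FFFFF


def expected_tail(scn_base, block_type, seq):
    """Tail a consistent block would carry (assumed layout):
    low 16 bits of the SCN base, then the block type, then the seq byte.
    """
    return ((scn_base & 0xFFFF) << 16) | (block_type << 8) | seq


# (rdba, scn base, type, seq, tail reported by dbv), copied from the output above
blocks = [
    (0x00400028, 0x18990F0E, 0x06, 0x01, 0xBC120601),
    (0x004000E8, 0x18991B98, 0x06, 0x01, 0x6C440601),
]

for rdba, scn_base, btype, seq, tail in blocks:
    file_no, block_no = decode_rdba(rdba)
    want = expected_tail(scn_base, btype, seq)
    status = "consistent" if want == tail else "fractured"
    print(
        f"file {file_no}, block {block_no}: "
        f"expected tail 0x{want:08x}, found 0x{tail:08x} -> {status}"
    )
```

Under these assumptions the decoded file and block numbers agree with what dbv prints, and in both blocks the tail does not match the header, which is consistent with dbv calling them fractured: the head and the tail of the block come from different writes.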

3. Control file corruption
Starting the database produced the following errors:

Wed Dec 24 17:08:52 2008
ALTER DATABASE MOUNT
Wed Dec 24 17:08:56 2008
Errors in file /opt/oracle/admin/stat/udump/stat_ora_4630.trc:
ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [11258908], [10375171], [0x0], [], [], [], []
Wed Dec 24 17:08:57 2008
ORA-600 signalled during: ALTER DATABASE MOUNT...
Wed Dec 24 17:09:01 2008
Starting ORACLE instance (normal)
Wed Dec 24 17:16:22 2008
Corrupt block 1 found during reading backup piece, file=/opt/oracle/product/db10g/dbs/snapcf_stat.f, corr_type=2

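4. After repeated checks, this environment was confirmed to be a write-off

5. An incomplete backup
The backup mechanism set up earlier let me find a series of backup sets on a remote host, but no control file.
Using the backup sets, dbms_backup_restore, and other means, I first restored the data files and then tried to start the database.

6. Forced open
Opening the database with a forced resetlogs raised an ORA-600 [4000] error: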

Wed Dec 24 18:56:00 2008
Errors in file /opt/oracle/admin/stat/udump/stat_ora_21479.trc:
ORA-00600: internal error code, arguments: [4000], [15], [], [], [], [], [], []
Wed Dec 24 18:56:01 2008
Errors in file /opt/oracle/admin/stat/udump/stat_ora_21479.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [15], [], [], [], [], [], []

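7. Fixing the ORA-600 [4000] error with BBED
There is not much to say here: BBED was the only way through. I repaired the problem data blocks and tried to open the database again.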


8. Hitting an ORA-600 [2662] error

This error is easy to resolve by following the example on my site:

Wed Dec 24 21:13:17 2008
Errors in file /opt/oracle/admin/stat/udump/stat_ora_28316.trc:
ORA-00600: internal error code, arguments: [2662], [0], [412717646], [0], [412772634], [8389633], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [0], [412717644], [0], [412772634], [8389633], [], []
Wed Dec 24 21:13:18 2008
Errors in file /opt/oracle/admin/stat/udump/stat_ora_28316.trc:
ORA-00600: internal error code, arguments: [2662], [0], [412717647], [0], [412772634], [8389633], [], []
ORA-00600: internal error code, arguments: [2662], [0], [412717646], [0], [412772634], [8389633], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [0], [412717644], [0], [412772634], [8389633], [], []
Wed Dec 24 21:13:18 2008
Errors in file /opt/oracle/admin/stat/udump/stat_ora_28316.trc:
ORA-00600: internal error code, arguments: [2662], [0], [412717647], [0], [412772634], [8389633], [], []
ORA-00600: internal error code, arguments: [2662], [0], [412717646], [0], [412772634], [8389633], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [0], [412717644], [0], [412772634], [8389633], [], []
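
For readers who want to see what these arguments say, the following Python sketch parses one of the lines above. It is an added illustration, not the fix used in this recovery. It assumes the commonly cited reading of ORA-600 [2662], which the post itself does not spell out: the arguments are the current SCN wrap and base, the dependent SCN wrap and base, and the DBA of the block that carried the dependent SCN. The function name `parse_ora600_2662` and the 10-bit/22-bit DBA split are likewise assumptions.

```python
import re


def parse_ora600_2662(line):
    """Extract the populated arguments of an ORA-600 [2662] message.

    Assumed argument order: [2662], current SCN wrap, current SCN base,
    dependent SCN wrap, dependent SCN base, DBA of the offending block.
    """
    args = re.findall(r"\[(\d+)\]", line)
    if len(args) < 6 or args[0] != "2662":
        raise ValueError("not a complete ORA-600 [2662] line")
    cur_wrap, cur_base, dep_wrap, dep_base, dba = map(int, args[1:6])
    current_scn = (cur_wrap << 32) | cur_base
    dependent_scn = (dep_wrap << 32) | dep_base
    return {
        "current_scn": current_scn,
        "dependent_scn": dependent_scn,
        "scn_gap": dependent_scn - current_scn,
        # Assumed smallfile DBA layout: top 10 bits = file, low 22 bits = block.
        "dba_file": dba >> 22,
        "dba_block": dba & 0x3FFFFF,
    }


line = (
    "ORA-00600: internal error code, arguments: [2662], [0], [412717646], "
    "[0], [412772634], [8389633], [], []"
)
info = parse_ora600_2662(line)
print(info)
```

Read this way, the block holds an SCN several tens of thousands ahead of the database's current SCN, which is what one would expect after a forced resetlogs on files restored without a matching control file.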


9. Resolving the ORA-600 [4097] error

Next came an ORA-600 [4097] error. This one is also easy to resolve: sort out the UNDO tablespace and it goes away.
Wed Dec 24 21:18:12 2008
Errors in file /opt/oracle/admin/stat/bdump/stat_j000_28723.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []

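10. Resolving some other minor problems
10,000 words omitted here. In the end, the user's database was recovered!

Everything used in this case is covered in detail on my site, but solving this failure took applying all of it together. It was quite a Christmas test for me!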


