Database Crash Troubleshooting, Part 1

Incident date: 2008-06-21

Site: Shihezi People's Hospital (石河子人民医院)

Platform: Windows 2000 SP2, Oracle 9.2.0.1

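Symptom 1: At about 9 a.m. on 2008-06-21, users reported that the system was unusable. After the server was restarted, the database would not open. The alert log showed the following errors.

    Before the restart: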

Sat Jun 21 09:47:40 2008

Errors in file d:\oracle\admin\shzrmyy\udump\shzrmyy_ora_660.trc:

Sat Jun 21 09:47:56 2008

Errors in file d:\oracle\admin\shzrmyy\udump\shzrmyy_ora_5640.trc:

…...

…...

…...

Errors in file d:\oracle\admin\shzrmyy\udump\shzrmyy_ora_4300.trc:

Sat Jun 21 09:48:42 2008

Errors in file d:\oracle\admin\shzrmyy\udump\shzrmyy_ora_1208.trc:

Sat Jun 21 09:49:35 2008

KCF: write/open error block=0x3c399 nline=1

file=39 E:\DATABASE\SHZRMYY\COMM_LONG_IDX_01.ORA

error=27072 txt: 'OSD-04008: WriteFile() failure, unable to write to file

O/S-Error: (OS 1450) Insufficient system resources exist to complete the requested service.'

Automatic datafile offline due to write error on

file 39: E:\DATABASE\SHZRMYY\COMM_LONG_IDX_01.ORA

KCF: write/open error block=0x8860 nline=1

file=20 E:\DATABASE\SHZRMYY\COMM_IDX_01.ORA

error=27072 txt: 'OSD-04008: WriteFile() failure, unable to write to file

O/S-Error: (OS 1450) Insufficient system resources exist to complete the requested service.'

Automatic datafile offline due to write error on

file 20: E:\DATABASE\SHZRMYY\COMM_IDX_01.ORA

KCF: write/open error block=0x273a nline=1

file=2 E:\DATABASE\SHZRMYY\UNDOTBS01.DBF

error=27072 txt: 'OSD-04008: WriteFile() failure, unable to write to file

O/S-Error: (OS 1450) Insufficient system resources exist to complete the requested service.'

Automatic datafile offline due to write error on

file 2: E:\DATABASE\SHZRMYY\UNDOTBS01.DBF

KCF: write/open error block=0x25c43 nline=1

file=65 E:\DATABASE\SHZRMYY\INPSICK_TAB_02.ORA

error=27072 txt: 'OSD-04008: WriteFile() failure, unable to write to file

O/S-Error: (OS 1450) Insufficient system resources exist to complete the requested service.'

Automatic datafile offline due to write error on

file 65: E:\DATABASE\SHZRMYY\INPSICK_TAB_02.ORA

Sat Jun 21 09:49:36 2008

Errors in file d:\oracle\admin\shzrmyy\bdump\shzrmyy_smon_2456.trc:

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'

KCF: write/open error block=0x32ae6 nline=1

file=33 E:\DATABASE\SHZRMYY\INPSICK_PRES_LONG_TAB_01.ORA

error=27072 txt: 'OSD-04008: WriteFile() failure, unable to write to file

O/S-Error: (OS 1450) Insufficient system resources exist to complete the requested service.'

Automatic datafile offline due to write error on

file 33: E:\DATABASE\SHZRMYY\INPSICK_PRES_LONG_TAB_01.ORA

Sat Jun 21 09:49:36 2008

Errors in file d:\oracle\admin\shzrmyy\bdump\shzrmyy_smon_2456.trc:

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'

Sat Jun 21 09:49:36 2008

Errors in file d:\oracle\admin\shzrmyy\bdump\shzrmyy_smon_2456.trc:

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'

Sat Jun 21 09:49:37 2008

Errors in file d:\oracle\admin\shzrmyy\bdump\shzrmyy_smon_2456.trc:

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'

Sat Jun 21 09:49:38 2008

Errors in file d:\oracle\admin\shzrmyy\bdump\shzrmyy_smon_2456.trc:

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'

Sat Jun 21 09:49:48 2008

Errors in file d:\oracle\admin\shzrmyy\bdump\shzrmyy_smon_2456.trc:

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'

Sat Jun 21 09:49:57 2008

Errors in file d:\oracle\admin\shzrmyy\bdump\shzrmyy_qmn0_3456.trc:

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'

Sat Jun 21 09:50:02 2008

Errors in file d:\oracle\admin\shzrmyy\bdump\shzrmyy_pmon_2488.trc:

ORA-00221: error on write to controlfile

Sat Jun 21 09:50:03 2008

Errors in file d:\oracle\admin\shzrmyy\bdump\shzrmyy_qmn0_3456.trc:

ORA-00603: ORACLE server session terminated by fatal error

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'

Sat Jun 21 09:50:06 2008

Errors in file d:\oracle\admin\shzrmyy\udump\shzrmyy_j001_6572.trc:

ORA-00603: ORACLE server session terminated by fatal error

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'

Sat Jun 21 09:50:20 2008

Instance terminated by CKPT, pid = 624

    After the restart:

SMON: enabling tx recovery

Sat Jun 21 10:39:24 2008

Database Characterset is ZHS16GBK

Sat Jun 21 10:39:24 2008

SMON: about to recover undo segment 1

SMON: mark undo segment 1 as needs recovery

SMON: about to recover undo segment 2

SMON: mark undo segment 2 as needs recovery

SMON: about to recover undo segment 3

SMON: mark undo segment 3 as needs recovery

SMON: about to recover undo segment 4

SMON: mark undo segment 4 as needs recovery

SMON: about to recover undo segment 5

SMON: mark undo segment 5 as needs recovery

SMON: about to recover undo segment 6

SMON: mark undo segment 6 as needs recovery

SMON: about to recover undo segment 7

SMON: mark undo segment 7 as needs recovery

SMON: about to recover undo segment 8

SMON: mark undo segment 8 as needs recovery

SMON: about to recover undo segment 9

SMON: mark undo segment 9 as needs recovery

SMON: about to recover undo segment 10

SMON: mark undo segment 10 as needs recovery

Sat Jun 21 10:39:24 2008

Errors in file d:\oracle\admin\shzrmyy\bdump\shzrmyy_smon_2212.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-00376: file 2 cannot be read at this time

ORA-01110: data file 2: 'E:\DATABASE\SHZRMYY\UNDOTBS01.DBF'
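
An excerpt this long is easier to reason about once it is reduced to the list of datafiles Oracle took offline. The following is a minimal sketch, not part of the original troubleshooting: it assumes the alert log uses the same two-line pattern seen above (an "Automatic datafile offline" line followed by a "file N: path" line). The function name and the SAMPLE text are illustrative only; SAMPLE reuses two entries from the excerpt.

```python
import re

# Marker line Oracle writes just before naming the file it took offline.
OFFLINE_MARKER = "Automatic datafile offline due to write error on"
# Matches the line that follows the marker, e.g. "file 39: E:\...\X.ORA".
FILE_LINE = re.compile(r"^file\s+(\d+):\s+(.+)$")


def offlined_datafiles(alert_text):
    """Return {file_number: path} for each datafile taken offline."""
    offlined = {}
    expecting_file = False
    for raw in alert_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # the excerpt has blank lines between entries
        if line == OFFLINE_MARKER:
            expecting_file = True
            continue
        if expecting_file:
            match = FILE_LINE.match(line)
            if match:
                offlined[int(match.group(1))] = match.group(2)
            expecting_file = False
    return offlined


# Two entries copied from the excerpt above (raw string keeps backslashes).
SAMPLE = r"""
Automatic datafile offline due to write error on

file 39: E:\DATABASE\SHZRMYY\COMM_LONG_IDX_01.ORA

KCF: write/open error block=0x273a nline=1

Automatic datafile offline due to write error on

file 2: E:\DATABASE\SHZRMYY\UNDOTBS01.DBF
"""

if __name__ == "__main__":
    for number, path in sorted(offlined_datafiles(SAMPLE).items()):
        print(number, path)
```

Run against the full excerpt, this kind of summary makes it obvious that the undo datafile (file 2) was only one of several files hit by the same write error.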

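Step 1:

    Based on the trace files and the users' reports, the initial diagnosis was that the undo datafile had been damaged when the database crashed, so we restored the undo datafile from backup and recovered it. The result was the same: the database still would not open, and the error still pointed at the undo file.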

Step 2:

    Performed a complete recovery of the whole database from the hot backup. The failure persisted.

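Step 3:

    Used the hidden Oracle parameter _corrupted_rollback_segments to disable the rollback segments named in the alert log. The database could then be opened, but the application raised errors in many places, so the database was still unusable.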
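
The post does not show how the affected segments were collected. As a rough sketch, assuming the "SMON: mark undo segment N as needs recovery" lines from the post-restart log are the source, the segment numbers could be extracted as below. The function name and SAMPLE are illustrative. The parameter itself takes segment names rather than numbers; the names are not given in the post, so that lookup is deliberately left out.

```python
import re

# Line SMON writes when it flags an undo segment during startup.
NEEDS_RECOVERY = re.compile(
    r"^SMON: mark undo segment (\d+) as needs recovery$"
)


def undo_segments_needing_recovery(alert_text):
    """Return undo segment numbers flagged by SMON, in first-seen order."""
    seen = []
    for raw in alert_text.splitlines():
        match = NEEDS_RECOVERY.match(raw.strip())
        if match:
            number = int(match.group(1))
            if number not in seen:  # the log may repeat a segment
                seen.append(number)
    return seen


# First two segments copied from the post-restart excerpt.
SAMPLE = """
SMON: about to recover undo segment 1
SMON: mark undo segment 1 as needs recovery
SMON: about to recover undo segment 2
SMON: mark undo segment 2 as needs recovery
"""

if __name__ == "__main__":
    print(undo_segments_needing_recovery(SAMPLE))
```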

Symptom 2: On checking the database status, the datafiles of several tablespaces showed a size of 0 bytes.

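Step 4:

    Other engineers we consulted suggested that an inconsistent control file might be causing the errors. We re-created the control file and opened the database with RESETLOGS; the database opened normally and service was restored.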

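Results and follow-up

    The steps above show that the crash was indeed caused by a problem with the control file or the online redo log files.

    Investigation still in progress:

    Follow-up 1: Re-created the control file with NORESETLOGS and opened the database; it opened normally and service was restored.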

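    Follow-up 2: Using the original control file, started the instance to the MOUNT state and queried the checkpoints. Every file reported with an error in the alert log needed recovery and had a checkpoint lower than the database checkpoint; the checkpoints of all other files matched the database checkpoint.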
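
The comparison in follow-up 2 can be sketched as follows. This is an illustration under assumptions, not the queries actually run: the post does not say which views were used, though on a mounted instance values like these could be read from CHECKPOINT_CHANGE# in V$DATAFILE_HEADER and V$DATABASE. The file numbers 2, 20, 33, 39 and 65 come from the alert log; file 1 and all checkpoint values are hypothetical placeholders.

```python
from typing import Dict, List


def files_behind_database(
    database_checkpoint: int, file_checkpoints: Dict[int, int]
) -> List[int]:
    """Return file numbers whose checkpoint is lower than the database's."""
    return sorted(
        file_no
        for file_no, checkpoint in file_checkpoints.items()
        if checkpoint < database_checkpoint
    )


# Hypothetical checkpoint values; only the relation between them matters.
DATABASE_CHECKPOINT = 1000
FILE_CHECKPOINTS = {
    1: 1000,   # hypothetical healthy file: matches the database checkpoint
    2: 950,    # UNDOTBS01.DBF
    20: 950,   # COMM_IDX_01.ORA
    33: 950,   # INPSICK_PRES_LONG_TAB_01.ORA
    39: 950,   # COMM_LONG_IDX_01.ORA
    65: 950,   # INPSICK_TAB_02.ORA
}

if __name__ == "__main__":
    print(files_behind_database(DATABASE_CHECKPOINT, FILE_CHECKPOINTS))
```

Files returned by such a check are the ones that still need recovery before they can be brought back online.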

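    Analysis: Based on the alert log, the first assumption was datafile corruption. This site's system does have known problems, and disk array read/write failures had damaged datafiles before, so we started with a complete recovery of a single file, and the very first file failed. Because this is a critical online system where data loss should be avoided if at all possible, we next tried a complete recovery from the full set of backups. When that also failed, we opened the database with the hidden parameter, but the application still could not run normally. This shows that hidden parameters must not be used casually: using one leaves a considerable amount of follow-up work to do on the database. In the end, re-creating the control file resolved the crash.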

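    Open question: What kind of control file damage produces this problem?

    Unresolved: It is still unclear where exactly the damage was. The control file and the log files were both readable and reported no errors. In a situation like this, how can one quickly determine which files are damaged? Because of constraints at the site, an incomplete recovery with the original control file could no longer be attempted.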

Source: ITPUB Blog, http://blog.itpub.net/4670/viewspace-369143/. Please credit the source when reposting; otherwise legal action may be pursued.
