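==Sleepless Tonight==
Not sleepy at all. I really want to sleep but cannot: this is the hard road to recovery after losing the online redo logs.
Time: 2010-06-16 21:00:00, discovered that the DB had a problem starting up.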
os: Solaris
db: oracle 10.2.0.3.0
Last backup: 2010-05-18
Tue May 18 19:15:20 2010
Starting control autobackup
Tue May 18 19:16:05 2010
Control autobackup written to SBT_TAPE device
comment 'API Version 2.0,MMS Version 5.0.0.0',
media '0010L3'
handle 'c-3901651628-20100518-0a'
First occurrence of the error: 2010-06-15 04:16:02
Tue Jun 15 04:16:02 2010
Errors in file /oracle/app/admin/baan/bdump/baan_lgwr_2900.trc:
ORA-00345: redo log write error block 733854 count 2
ORA-00312: online log 5 thread 1: '/oraarch/logBAAN2.ora'
ORA-27063: number of bytes read/written is incorrect
SVR4 Error: 5: I/O error
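Error: this error kept recurring, so I concluded that the online log file was already corrupted at that point.
Attempt: so I tried to run a recovery.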
startup pfile=/export/home/oracle/init.ora mount
recover database until cancel;
alter database open resetlogs;
An error occurred:
*** 2010-06-16 22:02:02.863
ORA-00313: open failed for members of log group 6 of thread 1
ORA-00312: online log 6 thread 1: '/oraarch/logBAAN3.ora'
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3
tkcrrsarc: (WARN) Failed to find ARCH for message (message:0x1)
tkcrrpa: (WARN) Failed initial attempt to send ARCH message (message:0x1)
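At first I thought the archiver process was the problem, so I started by turning it off.
In mount mode: noarchivelog
The database was now in noarchivelog mode.
The system still would not start normally.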
Error: ORA-00600: internal error code, arguments: [4000]
A rather gut-wrenching error.
The authoritative explanation I found reads as follows:
ERROR:
ORA-600 [4000] [a]
VERSIONS:
version 6.0 to 9.2
DESCRIPTION:
This has the potential to be a very serious error.
It means that Oracle has tried to find an undo segment number in the
dictionary cache and failed.
ARGUMENTS:
Arg [a] Undo segment number
FUNCTIONALITY:
KERNEL TRANSACTION UNDO
IMPACT:
INSTANCE FAILURE - Instance will not restart
STATEMENT FAILURE
So the problem lay with a transaction, and I started trying again with:
*._allow_resetlogs_corruption=TRUE
*._corrupted_rollback_segments=(...)
I went from 1 all the way up to 40 and none of it worked.
Still the same error. My mood was very low.
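Analysis: after reading some material I found that an uncommitted transaction might be involved, so I took off running down that narrow path.
Locating: clues taken from the trace file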
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4000], [2], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
==The problem is in the obj$ table; presumably a transaction on that object was never processed in time
Object id on Block? Y
seg/obj: 0x12 csc: 0x02.1132f200 itc: 1 flg: - typ: 1 - DATA
fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0002.012.0008fb24 0x008030b4.a76c.05 --U- 1 fsc 0x0000.1132f201
data_block_dump,data header at 0x509b94044
===============
tsiz: 0x1fb8
hsiz: 0xea
pbl: 0x509b94044
bdba: 0x0040007a ==>(122)
76543210
==> Flag
C=Committed U=Committed Upper Bound T=Active at CSC
B=Rollback of this UBA gives before image of ITL
Flag combinations include:
CB-- Tx is Committed ,rollback of this UBA gives prev ITL
------ Active Tx - look at RBS header to see if really active
--U- Upper Bound Commit
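The hex fields in the dump above can be unpacked by hand. The following is a minimal sketch, assuming the usual 10g encoding in which a 32-bit data block address keeps the relative file number in its top 10 bits and the block number in its low 22 bits, and in which an Xid reads as undo segment number, slot, sequence. The helper names are hypothetical and are not part of any Oracle tool.

```python
def split_dba(dba: int) -> tuple[int, int]:
    """Split a 32-bit data block address into (relative file#, block#).

    Assumption: top 10 bits are the file number, low 22 bits the block number.
    """
    return dba >> 22, dba & 0x3FFFFF


def parse_xid(xid: str) -> tuple[int, int, int]:
    """Parse an ITL Xid 'usn.slot.sequence' (all hex) into integers."""
    usn, slot, seq = (int(part, 16) for part in xid.split("."))
    return usn, slot, seq


def parse_uba(uba: str) -> tuple[tuple[int, int], int, int]:
    """Parse an ITL Uba 'dba.sequence.record' (all hex)."""
    dba, seq, rec = (int(part, 16) for part in uba.split("."))
    return split_dba(dba), seq, rec


# Values copied from the block dump above.
bdba = split_dba(0x0040007A)
xid = parse_xid("0x0002.012.0008fb24")
uba = parse_uba("0x008030b4.a76c.05")

print("block address  -> file %d, block %d" % bdba)
print("xid            -> undo segment %d, slot %d, sequence %d" % xid)
print("undo block     -> file %d, block %d" % uba[0])
```

Read this way, the block is file 1, block 122, and the Xid points at undo segment 2, which is the same number that appears as the second argument of the ORA-00600 [4000] error.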
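Tool: bbed, used to change the flag in the data block so that the transaction state looks normal
Process: the steps
1. Back up the datafile. This is a DBA's top priority.
2. bbed FILENAME=a
3. Find the flag and change it from 20 to 80 (128): modify, then sum.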
modify 128 block 123 offset 60
dump block 123 offset 60
sum block 123 apply
4. Verify once it is done.
5. Try whether the database can be opened.
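Why offset 60, and why 20 to 80, can be reconstructed with a little arithmetic. This is only a sketch under assumptions that do not come from the trace itself: a 20-byte cache block header, a 24-byte fixed transaction header, 24-byte ITL entries laid out as Xid (8 bytes), Uba (8 bytes), flag/lock (2 bytes), and a big-endian platform so that the flag nibble sits in the first of those two bytes. All constant and function names are hypothetical.

```python
# Assumed structure sizes (not taken from the trace file).
CACHE_HEADER_SIZE = 20   # block header before the transaction layer
TXN_HEADER_SIZE = 24     # fixed part holding typ, seg/obj, csc, itc, ...
ITL_ENTRY_SIZE = 24      # one ITL slot
XID_SIZE = 8
UBA_SIZE = 8

# Assumed flag bits in the high nibble of the flag byte, in display order.
FLAG_BITS = {0x80: "C", 0x40: "B", 0x20: "U", 0x10: "T"}


def itl_flag_offset(itl_index: int) -> int:
    """Byte offset of the flag byte for the 1-based ITL slot number."""
    itl_start = (CACHE_HEADER_SIZE + TXN_HEADER_SIZE
                 + (itl_index - 1) * ITL_ENTRY_SIZE)
    return itl_start + XID_SIZE + UBA_SIZE


def flag_string(flag_byte: int) -> str:
    """Render a flag byte the way the block dump prints it, e.g. '--U-'."""
    return "".join(letter if flag_byte & bit else "-"
                   for bit, letter in FLAG_BITS.items())


print(itl_flag_offset(1))                   # offset of the flag for ITL 0x01
print(flag_string(0x20), "->", flag_string(0x80))
```

Under these assumptions the flag byte of ITL 0x01 lands at offset 60, and writing 0x80 (128) over 0x20 turns the upper bound commit marker `--U-` into a plain committed marker `C---`.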
Open: started directly with the spfile; all the parameters in init.ora were discarded.
alter database open hung
Errors in file /oracle/app/admin/baan/udump/baan_ora_12698.trc:
ORA-00600: internal error code, arguments: [2662], [2], [288551426], [2], [288569530], [8388617], [], []
2662 is caused by inconsistent SCNs.
Restart
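The arguments of the ORA-00600 [2662] error can be read as two SCNs and a block address. The sketch below assumes the commonly cited layout: current SCN wrap, current SCN base, dependent SCN wrap, dependent SCN base, then the data block address of the block that carries the dependent SCN. That layout is an assumption and is not stated in the trace; the variable names are hypothetical.

```python
def full_scn(wrap: int, base: int) -> int:
    """Combine SCN wrap and base into a single integer (wrap * 2**32 + base)."""
    return (wrap << 32) | base


# Arguments copied from the ORA-00600 [2662] line above.
args = [2, 288551426, 2, 288569530, 8388617]

current_scn = full_scn(args[0], args[1])
dependent_scn = full_scn(args[2], args[3])
gap = dependent_scn - current_scn

# Assumed DBA encoding: top 10 bits file number, low 22 bits block number.
file_no, block_no = args[4] >> 22, args[4] & 0x3FFFFF

print("current SCN   :", current_scn)
print("dependent SCN :", dependent_scn)
print("database SCN is behind by", gap)
print("block with the newer SCN: file %d, block %d" % (file_no, block_no))
```

If that reading is right, the database SCN trails the SCN stored in a block of file 2 by 18104, so the SCN has to be pushed forward before the database will open.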
alter session set events '10015 trace name adjust_scn level 4096';
alter database open;
The system started right up.
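As a rough check on the level used above: a minimal sketch, assuming the commonly described behaviour that `adjust_scn level N` raises the SCN to at least N * 2**30. This is an assumption about an undocumented event, not something the trace confirms, and the names below are hypothetical.

```python
SCN_PER_LEVEL = 2 ** 30  # assumed SCN increment represented by one level


def min_level_for(scn: int) -> int:
    """Smallest level whose target SCN is at or above the given SCN."""
    return -(-scn // SCN_PER_LEVEL)  # ceiling division


def target_wrap_base(level: int) -> tuple[int, int]:
    """Target SCN for a level, split back into (wrap, base)."""
    target = level * SCN_PER_LEVEL
    return target >> 32, target & 0xFFFFFFFF


# Dependent SCN from the ORA-00600 [2662] arguments: wrap 2, base 288569530.
dependent_scn = (2 << 32) | 288569530

print("smallest sufficient level:", min_level_for(dependent_scn))
print("level 4096 targets wrap/base:", target_wrap_base(4096))
```

Under that assumption a much smaller level would already have been enough, and level 4096 pushes the SCN far past the inconsistent value, which fits with the database opening afterwards.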