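Today I'd like to share an ORA-600 error I ran into recently, the one in the title: ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1]. This post has two parts: the first reproduces the error, and the second resolves it.
I. Simulating and reproducing the problem
Test environment:
Database version: 19.3.0.0.0, non-CDB
session1 - build the test data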
create user test123 identified by oracle;
grant dba to test123;
SQL> conn test123/oracle
Connected.
SQL> create table t as select * from dba_objects;
Table created.
SQL> insert into t select * from t;
75040 rows created.
SQL> /
150080 rows created.
SQL> /
300160 rows created.
SQL> /
600320 rows created.
SQL> /
1200640 rows created.
session2 - break the environment (kill pmon, then delete the online redo logs)
[ora11g@primary ~]$ cd /ora11g/app/ora11g/
[ora11g@primary ora11g]$ ps -ef|grep pmon
ora11g 2177 1 0 10:45 ? 00:00:00 ora_pmon_ora11g
ora11g 4545 2776 0 11:21 pts/1 00:00:00 grep --color=auto pmon
[ora11g@primary ora11g]$ kill -9 2177
[ora11g@primary ora11g]$ ls -rtl redo0*
-rw-r----- 1 ora11g oinstall 52429312 Feb 8 10:48 redo01.log
-rw-r----- 1 ora11g oinstall 52429312 Feb 8 10:48 redo02.log
-rw-r----- 1 ora11g oinstall 52429312 Feb 8 11:21 redo03.log
[ora11g@primary ora11g]$
[ora11g@primary ora11g]$ rm -rf redo0*
[ora11g@primary ora11g]$ ls -rtl redo0*
ls: cannot access redo0*: No such file or directory
session3 - reproduce the problem
SQL> startup
ORACLE instance started.
Total System Global Area 1946154480 bytes
Fixed Size 8898032 bytes
Variable Size 1543503872 bytes
Database Buffers 385875968 bytes
Redo Buffers 7876608 bytes
Database mounted.
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/ora11g/app/ora11g/redo03.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 7
SQL> recover database until cancel;
ORA-00279: change 2437332 generated at 02/08/2022 10:48:38 needed for thread 1
ORA-00289: suggestion :
/ora11g/soft/ORA11G/archivelog/2022_02_08/o1_mf_1_9_%u_.arc
ORA-00280: change 2437332 for thread 1 is in sequence #9
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/ora11g/app/ora11g/system01.dbf'
ORA-01112: media recovery not started
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [],
[], [], [], [], [], [], []
Process ID: 4771
Session ID: 3121 Serial number: 2817
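II. Handling the failure
Note: on a cluster, shut down the other nodes first.
1. Opening the database raises ORA-01113 and ORA-01110; recovering that datafile then raises ORA-00742 and ORA-00312.
2. Set the hidden parameters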
alter system set "_allow_resetlogs_corruption"=true scope=spfile;
alter system set "_allow_error_simulation"=true scope=spfile;
Then open the database with resetlogs.
3. The open fails with the following error
SQL> alter database open RESETLOGS;
alter database open RESETLOGS
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [],
[], [], [], [], [], [], []
Process ID: 60930
Session ID: 244 Serial number: 49624
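4. Set an event to advance the SCN
alter system set event="21307096 trace name context forever, level 3" scope=spfile;
If the error below has not appeared yet, restart a few more times to advance the SCN further.
Restart and open the database (the same handling applies to 4193 and 4195, which also point to a rollback segment problem)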
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [4194], [57], [35], [], [], [], [],
[], [], [], [], []
Process ID: 5313
Session ID: 3121 Serial number: 41735
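This error means that the information recorded in redo does not match the information recorded in the undo (rollback) segment.
Arg [a] - maximum undo record number in the undo block
Arg [b] - undo record number from the redo block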
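When these stacks show up in an alert log or a terminal capture, it can help to pull the ORA-00600 arguments out programmatically. The following is only a minimal sketch: the names parse_ora600 and describe_4194 are made up for this post, and it assumes that Arg [a] and Arg [b] are the two values that follow 4194 in the argument list.

```python
import re

# Matches the bracketed argument list of an ORA-00600 message.
# The list may wrap onto the next line, as it does in SQL*Plus output.
ORA600_RE = re.compile(
    r"ORA-00600: internal error code, arguments: ((?:\[[^\]]*\](?:,\s*)?)+)"
)


def parse_ora600(text):
    """Return the non-empty ORA-00600 arguments found in text, or [] if none."""
    match = ORA600_RE.search(text)
    if match is None:
        return []
    args = re.findall(r"\[([^\]]*)\]", match.group(1))
    return [arg for arg in args if arg]


def describe_4194(args):
    """Label Arg [a] and Arg [b] of an ORA-600 [4194] argument list.

    Assumption: [a] and [b] are the two values right after 4194.
    Returns None for any other argument list.
    """
    if len(args) < 3 or args[0] != "4194":
        return None
    return {
        "max_undo_record_in_undo_block": int(args[1]),
        "undo_record_from_redo_block": int(args[2]),
    }


if __name__ == "__main__":
    stack = (
        "ORA-00600: internal error code, arguments: [4194], [57], [35], [], [], [], [],\n"
        "[], [], [], [], []"
    )
    print(describe_4194(parse_ora600(stack)))
```

For the stack shown above, the two labelled values are 57 and 35, which is the mismatch the error reports.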
According to the MOS note:
Step by step to resolve ORA-600 4194 4193 4197 on database crash (Doc ID 1428786.1)
Procedure:
create pfile='/tmp/initsid.ora' from spfile;
vi /tmp/initsid.ora
Add:
*.undo_management = manual
*.event='21307096 trace name context forever, level 3','10513 trace name context forever, level 2'
(Event 10513 prevents SMON from rolling back transactions.)
startup restrict pfile='/tmp/initsid.ora'
(The database is now open read/write.)
select tablespace_name, status, segment_name from dba_rollback_segs where status != 'OFFLINE';
TABLESPACE_NAME STATUS SEGMENT_NAME
------------------------------ ---------------- ------------------------------
SYSTEM ONLINE SYSTEM
UNDOTBS1 PARTLY AVAILABLE _SYSSMU4_1254879796$
shutdown immediate;
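Edit the parameter file and add the hidden parameter below, listing the rollback segments returned by the query above (if the query returns no online undo segment at all, this parameter is not needed):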
*._corrupted_rollback_segments=(_SYSSMU4_1254879796$)
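When the query returns many segments, typing the list by hand is error-prone. Below is a rough sketch of how the pfile line could be generated from the pasted SQL*Plus output. The function name corrupted_segments_line is hypothetical, and the sketch assumes the three-column layout shown above, with the segment name as the last field of every data row; the SYSTEM segment is left out, as in the example above.

```python
def corrupted_segments_line(query_output):
    """Build the *._corrupted_rollback_segments pfile line.

    query_output is the pasted SQL*Plus result of the dba_rollback_segs
    query (columns: tablespace_name, status, segment_name).
    Returns None when there is no non-SYSTEM segment to list.
    """
    segments = []
    for line in query_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line or unrelated text
        if fields[0] == "TABLESPACE_NAME":
            continue  # column header
        if set(line.strip()) <= {"-", " "}:
            continue  # dashed separator under the header
        # STATUS may contain a space (PARTLY AVAILABLE), so take the last field.
        name = fields[-1]
        if name != "SYSTEM" and name not in segments:
            segments.append(name)
    if not segments:
        return None
    return "*._corrupted_rollback_segments=(" + ",".join(segments) + ")"


if __name__ == "__main__":
    sample = """\
TABLESPACE_NAME STATUS SEGMENT_NAME
------------------------------ ---------------- ------------------------------
SYSTEM ONLINE SYSTEM
UNDOTBS1 PARTLY AVAILABLE _SYSSMU4_1254879796$
"""
    print(corrupted_segments_line(sample))
```

Check the generated line against the query output before putting it into the pfile; the script only reformats text and knows nothing about the state of the database.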
startup restrict pfile='/tmp/initsid.ora'
Rebuild the undo tablespace
Single instance:
create undo tablespace <new undo tablespace> datafile <datafile> size 2000M;
e.g.: create undo tablespace undo1 datafile '/ora11g/app/ora11g/undo1.dbf' size 200M;
drop tablespace <old undo tablespace> including contents and datafiles;
e.g.: drop tablespace UNDOTBS1 including contents and datafiles;
Cluster:
create undo tablespace undo3 datafile '+DATA/.../undo3.dbf' size 1G;
create undo tablespace undo4 datafile '+DATA/.../undo4.dbf' size 1G;
drop tablespace UNDOTBS1 including contents and datafiles;
drop tablespace UNDOTBS2 including contents and datafiles;
shutdown immediate;
startup nomount;
Use the original parameter file here, not the edited pfile.
Single instance:
alter system set undo_tablespace = '<new undo tablespace>' scope=spfile;
e.g.: alter system set undo_tablespace=undo1 scope=spfile;
Cluster:
alter system set undo_tablespace=undo3 scope=spfile sid='<node 1 instance>';
alter system set undo_tablespace=undo4 scope=spfile sid='<node 2 instance>';
shutdown immediate;
startup;
Because a database repaired this way was opened by abnormal means, take a backup as soon as it is open.