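Resolving a case of ORA-00600 [4000], [2426] + ORA-00600 [4000], [2411]

A friend runs an Oracle database for development on Linux, version 10.2.0.1.0.

During an RMAN backup, an ORA-00600 error was raised. After exiting RMAN, any query against DBA_JOBS raised ORA-00600 as well, and the database then shut itself down.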

 

I. Symptoms

I took a look at alert.log. The relevant part is as follows:

Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Mon Oct 15 14:55:00 2012
ALTER DATABASE OPEN
Mon Oct 15 14:55:00 2012
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=16, OS id=4306
Mon Oct 15 14:55:00 2012
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=17, OS id=4308
Mon Oct 15 14:55:00 2012
Thread 1 opened at log sequence 29
  Current log# 1 seq# 29 mem# 0: /opt/ora10g/product/10.2.0/oradata/tftdb/REDO01.LOG
Successful open of redo thread 1
Mon Oct 15 14:55:00 2012
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Oct 15 14:55:00 2012
ARC0: STARTING ARCH PROCESSES
Mon Oct 15 14:55:00 2012
SMON: enabling cache recovery
Mon Oct 15 14:55:00 2012
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Mon Oct 15 14:55:00 2012
ARC2: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC0: Becoming the heartbeat ARCH
ARC2 started with pid=18, OS id=4310
Mon Oct 15 14:55:00 2012
Successfully onlined Undo Tablespace 1.
Mon Oct 15 14:55:00 2012
SMON: enabling tx recovery
Mon Oct 15 14:55:00 2012
Database Characterset is ZHS16GBK
Mon Oct 15 14:55:00 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_smon_4289.trc:
ORA-00600: internal error code, arguments: [4000], [2426], [], [], [], [], [], []
replication_dependency_tracking turned off (no async multimaster replication found)
Mon Oct 15 14:55:00 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/udump/tftdb_ora_4304.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 14:55:01 2012
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Mon Oct 15 14:55:01 2012
Starting background process QMNC
QMNC started with pid=19, OS id=4312
Mon Oct 15 14:55:01 2012
db_recovery_file_dest_size of 2048 MB is 48.96% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Mon Oct 15 14:55:01 2012
Completed: ALTER DATABASE OPEN
Mon Oct 15 14:55:01 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_4293.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 14:55:01 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_4293.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 14:55:01 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_4293.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 14:55:02 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_4293.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 14:55:06 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_4293.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 14:55:06 2012

As the log shows, the SMON process already hits the error while the database is starting up.
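To see at a glance which processes raise the error and with which arguments, the alert log excerpt above can be tallied by trace file. The following is a minimal sketch and was not part of the original troubleshooting. The function name summarize_ora600 is hypothetical, and the code assumes the 10g alert.log layout shown above, in which an "Errors in file ...:" line is immediately followed by the ORA-00600 line.

```python
import re
from collections import Counter

# Matches lines such as:
#   Errors in file /.../bdump/tftdb_smon_4289.trc:
TRACE_RE = re.compile(r"^Errors in file (\S+):\s*$")
# Matches the ORA-00600 line and captures the bracketed argument list.
ORA600_RE = re.compile(r"^ORA-00600: internal error code, arguments: (.*)$")
ARG_RE = re.compile(r"\[([^\]]*)\]")


def summarize_ora600(lines):
    """Count ORA-00600 occurrences per (trace file, non-empty arguments)."""
    counts = Counter()
    trace_file = None
    for raw in lines:
        line = raw.strip()
        m = TRACE_RE.match(line)
        if m:
            # Remember the trace file for the error line that follows.
            trace_file = m.group(1)
            continue
        m = ORA600_RE.match(line)
        if m:
            # Keep only the arguments that are actually filled in.
            args = tuple(a for a in ARG_RE.findall(m.group(1)) if a)
            counts[(trace_file, args)] += 1
        # Any non-"Errors in file" line ends the pairing with a trace file.
        trace_file = None
    return counts


if __name__ == "__main__":
    sample = [
        "Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_smon_4289.trc:",
        "ORA-00600: internal error code, arguments: [4000], [2426], [], [], [], [], [], []",
        "Mon Oct 15 14:55:01 2012",
        "Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_4293.trc:",
        "ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []",
    ]
    for (trace, args), n in summarize_ora600(sample).items():
        print(n, trace, args)
```

Applied to the full log above, such a tally separates the single SMON error with [2426] from the repeated [2411] errors raised by CJQ0 and by one user session.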

 

II. Analysis

There are two main errors: ORA-00600 [4000], [2426], [], [], [], [], [], [] and ORA-00600 [4000], [2411], [], [], [], [], [], []

1) ORA-00600 [4000], [2426], [], [], [], [], [], []

Opening tftdb_smon_4289.trc shows the following:

/opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_smon_4289.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /opt/ora10g/product/10.2.0/db_1
System name:	Linux
Node name:	node1
Release:	2.6.18-194.el5
Version:	#1 SMP Tue Mar 16 21:52:43 EDT 2010
Machine:	i686
Instance name: tftdb
Redo thread mounted by this instance: 1
Oracle process number: 8
Unix process pid: 4289, image: oracle@node1 (SMON)

*** SERVICE NAME:() 2012-10-15 14:55:00.738
*** SESSION ID:(164.1) 2012-10-15 14:55:00.738
*** 2012-10-15 14:55:00.738
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4000], [2426], [], [], [], [], [], []
Current SQL statement for this session:
select smontabv.cnt, smontab.time_mp,    smontab.scn, smontab.num_mappings, smontab.tim_scn_map, smontab.orig_thread    
from smon_scn_time smontab, 
  (select max(scn) scnmax,count(*)+sum(NVL2(TIM_SCN_MAP,NUM_MAPPINGS,0)) cnt                 
     from smon_scn_time where thread=0) smontabv   
where smontab.scn = smontabv.scnmax and thread=0
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst()+27          call     ksedst1()            0 ? 1 ?
ksedmp()+557         call     ksedst()             0 ? D ? CBD2D20 ? 2A ?
                                                   CBD2D20 ? 2A ?

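The problem appears to lie in the smon_scn_time table. The SMON process updates this table roughly every 6 seconds, writing the SCN-to-time mapping into it. It is created with the following statements: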

create cluster smon_scn_to_time (
  thread number                         /* thread, compatibility */
)
/
create index smon_scn_to_time_idx on cluster smon_scn_to_time
/
create table smon_scn_time (
  thread number,                         /* thread, compatibility */
  time_mp number,                        /* time this recent scn represents */
  time_dp date,                          /* time as date, compatibility */
  scn_wrp number,                        /* scn.wrp, compatibility */
  scn_bas number,                        /* scn.bas, compatibility */
  num_mappings number,
  tim_scn_map raw(1200),
  scn number default 0,                  /* scn */
  orig_thread number default 0           /* for downgrade */
) cluster smon_scn_to_time (thread)
/

create unique index smon_scn_time_tim_idx on smon_scn_time(time_mp)
/

create unique index smon_scn_time_scn_idx on smon_scn_time(scn)
/


Run a query against it:

select  count(*) from sys.smon_scn_time;

ERROR at line 1:
ORA-00600: internal error code, arguments: [4000], [2521], [], [], [], [], [],
[]

However, the following statement runs correctly:

select * from smon_scn_time where rownum<1000;


With rownum<2000, however, the ORA-00600 error is raised as well and the database goes down.

Suspecting corrupt blocks, I ran a block check with dbv:

[oracle@node1 tftdb]$ dbv file=SYSTEM01.DBF blocksize=8192 

DBVERIFY: Release 10.2.0.1.0 - Production on Mon Oct 15 15:46:44 2012

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = SYSTEM01.DBF


DBVERIFY - Verification complete

Total Pages Examined         : 88320
Total Pages Processed (Data) : 50252
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 19344
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1785
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 16939
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 4458516 (0.4458516)
[oracle@node1 tftdb]$


No corrupt blocks were found.
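When several data files have to be checked, reading each dbv summary by eye gets tedious. The sketch below parses the "Total Pages ..." counters from output like the one above and reports whether every failing or corrupt counter is zero. It is only an illustration: the names parse_dbv_summary and dbv_is_clean are hypothetical, and the parsing assumes the 10.2 output layout shown above.

```python
import re

# Matches summary lines such as:
#   Total Pages Failing   (Data) : 0
COUNTER_RE = re.compile(r"^Total Pages (.+?)\s*:\s*(\d+)\s*$")


def parse_dbv_summary(text):
    """Return the 'Total Pages ...' counters as a dict of name -> int."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER_RE.match(line.strip())
        if m:
            # Collapse the padding inside names like 'Failing   (Data)'.
            name = " ".join(m.group(1).split())
            counters[name] = int(m.group(2))
    return counters


def dbv_is_clean(counters):
    """True only if failing/corrupt counters were found and all are zero."""
    checked = [value for name, value in counters.items()
               if name.startswith("Failing") or name == "Marked Corrupt"]
    return bool(checked) and all(value == 0 for value in checked)


if __name__ == "__main__":
    sample = """
Total Pages Examined         : 88320
Total Pages Processed (Data) : 50252
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 19344
Total Pages Failing   (Index): 0
Total Pages Marked Corrupt   : 0
"""
    print(dbv_is_clean(parse_dbv_summary(sample)))
```

As this case shows, a clean dbv summary only means that dbv found no failing or corrupt pages in that file. The queries against smon_scn_time still failed.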

 

2) ORA-00600 [4000], [2411], [], [], [], [], [], []

Running select * from dba_jobs also raises ORA-00600, after which the database goes down.

The background log shows:

Mon Oct 15 14:55:01 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_4293.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []


 

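III. Solution

Since select * from smon_scn_time where rownum<1000 runs but rownum<2000 fails, the most likely cause is bad data in the smon_scn_time table.

After taking a full backup of the database, try emptying that table.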

sql> conn / as sysdba

sql> alter system set events '12500 trace name context forever, level 10';
-- Stop SMON from recording the SCN-to-time mapping.

sql> delete from sys.smon_scn_time;
delete from sys.smon_scn_time
                *
ERROR at line 1:
ORA-00600: internal error code, arguments: [4000], [2521], [], [], [], [], [],
[]
sql> truncate cluster sys.smon_scn_to_time;

sql> alter system set events '12500 trace name context off';

sql> shutdown immediate;

sql> startup


After the restart, ORA-00600 [4000], [2426], [], [], [], [], [], [] no longer appears, but ORA-00600 [4000], [2411], [], [], [], [], [], [] is still raised, as shown below:

ARC1: STARTING ARCH PROCESSES COMPLETE
ARC1: Becoming the heartbeat ARCH
ARC2 started with pid=18, OS id=5087
Mon Oct 15 16:17:41 2012
Successfully onlined Undo Tablespace 1.
Mon Oct 15 16:17:41 2012
SMON: enabling tx recovery
Mon Oct 15 16:17:41 2012
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Mon Oct 15 16:17:41 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/udump/tftdb_ora_5081.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Starting background process QMNC
QMNC started with pid=19, OS id=5089
Mon Oct 15 16:17:42 2012
db_recovery_file_dest_size of 2048 MB is 49.06% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Mon Oct 15 16:17:42 2012
Completed: ALTER DATABASE OPEN
Mon Oct 15 16:17:42 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_5070.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 16:17:42 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_5070.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 16:17:42 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_5070.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 16:17:43 2012

The trace file contains the following:

/opt/ora10g/product/10.2.0/admin/tftdb/udump/tftdb_ora_5081.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /opt/ora10g/product/10.2.0/db_1
System name:	Linux
Node name:	node1
Release:	2.6.18-194.el5
Version:	#1 SMP Tue Mar 16 21:52:43 EDT 2010
Machine:	i686
Instance name: tftdb
Redo thread mounted by this instance: 1
Oracle process number: 15
Unix process pid: 5081, image: oracle@node1 (TNS V1-V3)

*** SERVICE NAME:(SYS$USERS) 2012-10-15 16:17:41.308
*** SESSION ID:(159.3) 2012-10-15 16:17:41.308
tkcrrsarc: (WARN) Failed to find ARCH for message (message:0x1)
tkcrrpa: (WARN) Failed initial attempt to send ARCH message (message:0x1)
*** 2012-10-15 16:17:41.590
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Current SQL statement for this session:
select count(*) from dba_jobs where what = 'sys.dbms_aqadm_sys.register_driver();' and instance = :1
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst()+27          call     ksedst1()            0 ? 1 ?
ksedmp()+557         call     ksedst()             0 ? 13 ? CBD2D20 ? 2A ?
                                                   CBD2D20 ? 2A ?
ksfdmp()+19          call     ksedmp()             3 ? BFB0C3CC ? AC152A0 ?



Querying dba_jobs still brings the database down.

 

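It looks like the data behind DBA_JOBS is damaged as well (DBA_JOBS is a view over SYS.JOB$). I deleted that data and then recreated the jobs.

The database returned to normal, and the ORA-00600 errors above no longer appear.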
