Oracle Startup Failure Case: ORA-600 [4193] Error

Operating system: Oracle Linux 5
Database: Oracle 11gR2 (11.2.0.3.0)

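I. Symptoms:
1. A test was run that damaged the current redo log group and then recovered from it.
2. When the database was started afterwards, it raised ORA-600 [4193].
3. The instance was then forcibly terminated.
Check the alert log: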
[oracle@ocm1 ~]$ tail  -f /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/alert_enmoedu.log  
Block recovery completed at rba 5.111.16, scn 0.1430641
Errors in file /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/enmoedu_pmon_10635.trc  (incident=36027):
ORA-00600: internal error code, arguments: [4193], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/incident/incdir_36027/enmoedu_pmon_10635_i36027.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue Dec 13 12:54:04 2016
Dumping diagnostic data in directory=[cdmp_20161213125404], requested by (instance=1, osid=10635 (PMON)), summary=[incident=36027].
Errors in file /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/enmoedu_pmon_10635.trc:
ORA-00600: internal error code, arguments: [4193], [], [], [], [], [], [], [], [], [], [], []
PMON (ospid: 10635): terminating the instance due to error 472
System state dump requested by (instance=1, osid=10635 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/enmoedu_diag_10653.trc
Dumping diagnostic data in directory=[cdmp_20161213125405], requested by (instance=1, osid=10635 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 10635
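
When scanning a long alert log for this kind of failure, it can help to pull out just the ORA-00600 arguments. The following is a minimal Python sketch, assuming the line format shown in the log above; the function name `ora600_args` is illustrative and not part of any Oracle tool.

```python
import re

# Matches the ORA-00600 line format seen in the alert log above.
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: (.*)")

def ora600_args(line):
    """Return the non-empty bracketed arguments of an ORA-00600 line, or None."""
    match = ORA600_RE.search(line)
    if match is None:
        return None
    # Keep only the arguments that actually carry a value, e.g. '4193'.
    return [arg for arg in re.findall(r"\[([^\]]*)\]", match.group(1)) if arg]

sample = ("ORA-00600: internal error code, arguments: "
          "[4193], [], [], [], [], [], [], [], [], [], [], []")
print(ora600_args(sample))
```

The first argument identifies the internal error (here 4193); in some traces the following arguments carry extra values, as in the MOS example later in this article.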

Check the trace file:

[oracle@ocm1 ~]$ more /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/incident/incdir_38544/enmoedu_mmon_3181_i38544.trc
......
----- START DDE Action: 'dumpKernelDiagState' (Sync) -----
------- Kernel Diag Dump -------
dbkcBSExt: 0
dbkedDefDump info:
  Internal err count: 4
  Error Flags: 0x0
  Exception: FALSE
Bootstrapping info:
  Flags: 0x17
  Options: 0x806
  Diag Dest: /u01/app/oracle
  DB Unique name: enmoedu
  Instance Name: enmoedu
------- END Kernel Diag Dump -------
----- END DDE Action: 'dumpKernelDiagState' (SUCCESS, 0 csec) -----
----- START DDE Action: 'xdb_dump_buckets' (Sync) -----
----- END DDE Action: 'xdb_dump_buckets' (FAILURE, 0 csec) -----
----- START DDE Action: 'dumpKGERing' (Sync) -----
----- END DDE Action: 'dumpKGERing' (SUCCESS, 0 csec) -----
----- START DDE Action: 'dumpKGEState' (Sync) -----
kgepgtfr      0x7fffb09068d0
kgepgtba      0x7fffb09107a8
kgepgter      5
kgepgpar      kgepgbpa      0xbaf48c5
kgepgepa      0xbaf5064
kgepgtfd      21
kgepgdmc      0
kgepgflg      0x8
kgepg_stkgfr  (nil)
kgepgkgsmp    0xbaf3fa0
kgepgspm      4
kgepg_ba_set_in_eh          0x7fffb09114b0
kgepg_kgecatch_set_in_eh_ba (nil)
kge_ba_set_in_eh_funcloc    0x9b975bc
kge_ba_set_in_eh_fileloc    0x9b97890
------------------- start error stack dump with barriers
    at 0x7fffb09107a8
   ORA-00603: ORACLE server session terminated by fatal error
   ORA-24557: error 600 encountered while handling error 600; exiting server process
   ORA-00600: internal error code, arguments: [4193], [], [], [], [], [], [], [], [], [], [], []
    at 0x7fffb09114b0
   ORA-00600: internal error code, arguments: [4193], [], [], [], [], [], [], [], [], [], [], []
   ORA-00600: internal error code, arguments: [4193], [], [], [], [], [], [], [], [], [], [], []
    at 0x7fffb0915ed8
------------------- end   error stack dump with barriers
----- END DDE Action: 'dumpKGEState' (SUCCESS, 0 csec) -----
----- START DDE Action: 'kpuActionDefault' (Sync) -----
Begin OCI Current State Dump
End OCI Current State Dump
Begin OCI Call Context Dump
End OCI Call Context Dump
Begin Process state dump.
ttcdrvdmplocation: msg-0 ln-0 reporting 0
HST is NULL or no two task connection
End Process state dump.
----- END DDE Action: 'kpuActionDefault' (SUCCESS, 0 csec) -----
----- END DDE Actions Dump (total 1 csec) -----
End of Incident Dump

According to MOS, this error is usually related to undo segments.
II. Solution:

1. Generate a pfile from the spfile

01:55:31 SYS@ enmoedu>create pfile from spfile;
File created.

2. Edit the pfile
[oracle@ocm1 dbs]$ vi initenmoedu.ora 
#*.undo_tablespace='UNDOTBS1'
undo_management = 'MANUAL'
rollback_segments = 'SYSTEM'
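
The same pfile edit can be scripted instead of done by hand in vi. This is only a sketch in Python, assuming the pfile is plain `name=value` text as generated by `create pfile from spfile`; the function name `switch_to_manual_undo` and the sample `db_name` line are illustrative assumptions.

```python
def switch_to_manual_undo(pfile_text):
    """Comment out undo_tablespace/undo_management and append manual-undo settings."""
    out = []
    for line in pfile_text.splitlines():
        name = line.split("=", 1)[0].strip().lower()
        already_commented = line.lstrip().startswith("#")
        if not already_commented and name.endswith(("undo_tablespace", "undo_management")):
            out.append("#" + line)  # disable the automatic-undo setting
        else:
            out.append(line)
    # The two parameters added in step 2 above.
    out.append("undo_management = 'MANUAL'")
    out.append("rollback_segments = 'SYSTEM'")
    return "\n".join(out) + "\n"

# Illustrative input only; a real pfile contains many more parameters.
original = "*.db_name='enmoedu'\n*.undo_tablespace='UNDOTBS1'\n"
print(switch_to_manual_undo(original))
```

The function only transforms text; writing the result back to initenmoedu.ora is left to the caller, so the original file can be kept as a backup.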


3. Start the instance with the pfile
01:58:48 SYS@ enmoedu>startup mount pfile='$ORACLE_HOME/dbs/initenmoedu.ora';
ORACLE instance started.
Total System Global Area  521936896 bytes
Fixed Size                  2229944 bytes
Variable Size             360712520 bytes
Database Buffers          155189248 bytes
Redo Buffers                3805184 bytes
Database mounted.
Elapsed: 00:00:00.00

02:00:07 SYS@ enmoedu>show parameter undo

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      MANUAL
undo_retention                       integer     900
undo_tablespace                      string

4. Open the database
02:00:16 SYS@ enmoedu>alter database open;

Database altered.
The database now opens normally:
[oracle@ocm1 ~]$ tail  -f /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/alert_enmoedu.log 
alter database open
Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
 read 43 KB redo, 36 data blocks need recovery
Started redo application at
 Thread 1: logseq 7, block 3
Recovery of Online Redo Log: Thread 1 Group 1 Seq 7 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/enmoedu/redo01.log
Completed redo application of 0.03MB
Completed crash recovery at
 Thread 1: logseq 7, block 90, scn 1491526
 36 data blocks read, 36 data blocks written, 43 redo k-bytes read
Wed Dec 14 02:00:26 2016
Thread 1 advanced to log sequence 8 (thread open)
Thread 1 opened at log sequence 8
  Current log# 2 seq# 8 mem# 0: /u01/app/oracle/oradata/enmoedu/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Dec 14 02:00:26 2016
SMON: enabling cache recovery
Undo initialization finished serial:0 start:479574 end:479584 diff:10 (0 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Dec 14 02:00:27 2016
QMNC started with pid=20, OS id=3522 
Completed: alter database open
Wed Dec 14 02:00:28 2016
Starting background process CJQ0
Wed Dec 14 02:00:28 2016
CJQ0 started with pid=26, OS id=3554 
Wed Dec 14 02:00:30 2016
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Wed Dec 14 02:00:51 2016
Errors in file /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/enmoedu_j001_3562.trc:
ORA-01552: cannot use system rollback segment for non-system tablespace 'TEMP'

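The ORA-01552 at the end of the log is expected while undo_management is MANUAL: only the SYSTEM rollback segment is available, and it cannot be used for changes in non-system tablespaces.

5. Drop the original undo tablespace and create a new one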
02:00:27 SYS@ enmoedu>drop tablespace undotbs1 including contents and datafiles;
Tablespace dropped.

02:03:32 SYS@ enmoedu>create undo tablespace undotbs1
02:03:39   2  datafile '/u01/app/oracle/oradata/enmoedu/undotbs01.dbf' size 100m
02:03:50   3  autoextend on;

Tablespace created.

6. Shut down the database and restart it with the spfile
02:04:02 SYS@ enmoedu>shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

02:05:45 SYS@ enmoedu>startup

ORACLE instance started.
Total System Global Area  521936896 bytes
Fixed Size                  2229944 bytes
Variable Size             360712520 bytes
Database Buffers          155189248 bytes
Redo Buffers                3805184 bytes
Database mounted.
Database opened.

Check the alert log: the database starts normally and the problem is solved.

[oracle@ocm1 ~]$ tail  -f /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/alert_enmoedu.log 
ALTER DATABASE OPEN
Thread 1 opened at log sequence 8
  Current log# 2 seq# 8 mem# 0: /u01/app/oracle/oradata/enmoedu/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
[3870] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:808234 end:808274 diff:40 (0 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Dec 14 02:05:55 2016
QMNC started with pid=20, OS id=3874 
Completed: ALTER DATABASE OPEN
Wed Dec 14 02:05:56 2016
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Starting background process CJQ0
Wed Dec 14 02:05:56 2016
CJQ0 started with pid=22, OS id=3902 

Appendix: (reposted from MOS documents)

ORA-600 [4193] is also an undo-related error, and MOS has several notes about it.

I. MOS notes

1.1 ORA-600 [4193] When Trying To Open The Database [ID 763566.1]

Symptoms

Copying database from one server to another server and getting an ORA-600 [4193] error when trying to open the database on the destination server.

-- After copying a database from one server to another, this error is raised when trying to open it on the destination.

Cause

The online redo logs were copied when the source database was open, online redo logs should never be copied when the database is open.

-- The cause is that the online redo logs were copied along with everything else while the database was open. Online redo logs should not be copied while the database is open.

Solution

In this instance the datafiles were copied properly after the tablespaces were put into backup mode; however, online redo logs should only be copied if the source database is shut down first. The source database needed to remain open, so the datafiles were copied again (with the tablespaces in backup mode), a number of archive logs were transferred to the new server, and after the last archive log was applied the database could be opened with resetlogs, at which point new online redo logs were created on the destination server.

-- Datafiles can be copied once the tablespaces are in backup mode, but online redo logs may only be copied after the database is shut down. If the database has to stay open, copy only the datafiles, transfer the archive logs, and finally open the database with open resetlogs; the online redo logs are recreated automatically during the open.

1.2 ORA-600 [4193] When Opening Or Shutting Down A Database [ID 452662.1]

1.2.1 Symptoms

Errors in alert.log:

Tue Jul 17 13:38:13 2007 
Errors in file /home/Oracle/oracle/product/10.2.0/yms/rdbms/log/yms_smon_8337.trc: 
ORA-00607: Internal error occurred while making a change to a data block 
ORA-00600: internal error code, arguments: [4193], [3552], [3554], [], [], [] 

yms_smon_8337.trc:

SO: 0xdfaec728, type: 24, owner: 0xdf266580, flag: INIT/-/-/0x00 

          (buffer) PR: 0xdf1f1338 FLG: 0x1000 
          class bit: 0x80000 
          kcbbfbp: [BH: 0xded4bf40, LINK: 0xdfaec768] 
          kcbbfbx[0]: [BH: 0xdece41d8, LINK: 0xdfaec788] 
          where: ktuwh01: ktugus, why: 0 
        buffer tsn: 2 rdba: 0x00c00002 (3/2) 
        scn: 0x0000.03c95628 seq: 0x01 flg: 0x00 tail: 0x56280e01 
        frmt: 0x02 chkval: 0x0000 type: 0x0e=KTU UNDO HEADER W/UNLIMITED EXTENTS 
      BH (0xdece41d8) file#: 3 rdba: 0x00c003b6 (3/950) class: 20 ba: 0x11d6ba000 
        set: 6 blksize: 8192 bsi: 0 set-flg: 0 pwbcnt: 0 
        dbwrid: 0 obj: -1 objn: 0 tsn: 2 afn: 3 
        hash: [df870f70,df870f70] lru: [dece4488,dece4028] 
        obj-flags: object_ckpt_list 
        ckptq: [dedac4a0,ded47cb8] fileq: [dedac500,ded47cc8] objq: [ded47d78,db7bfd78] 
        use: [dfaec788,dfaec788] wait: [NULL] 
        st: XCURRENT md: EXCL tch: 0 
        flags: mod_started gotten_in_current_mode block_written_once 
        change state: ACTIVE 
        change count: 1 
        LRBA: [0xac3.4de07.0] HSCN: [0xffff.ffffffff] HSUB: [65535] 
        Using State Objects 
          ---------------------------------------- 
          SO: 0xdfaec728, type: 24, owner: 0xdf266580, flag: INIT/-/-/0x00 
          (buffer) PR: 0xdf1f1338 FLG: 0x1000 
          class bit: 0x80000 
          kcbbfbp: [BH: 0xded4bf40, LINK: 0xdfaec768] 
          kcbbfbx[0]: [BH: 0xdece41d8, LINK: 0xdfaec788] 
          where: ktuwh01: ktugus, why: 0 
        buffer tsn: 2 rdba: 0x00c003b6 (3/950) 
        scn: 0x0000.03be3c7d seq: 0x5a flg: 0x04 tail: 0x3c7d025a 
        frmt: 0x02 chkval: 0x0868 type: 0x02=KTU UNDO BLOCK 
      ---------------------------------------- 
Error 607 in redo application callback 
TYP:0 CLS:20 AFN:3 DBA:0x00c003b6 OBJ:4294967295 SCN:0x0000.03be3c7d SEQ: 90 OP:5.1 
ktudb redo: siz: 132 spc: 4462 flg: 0x0012 seq: 0x0de2 rec: 0x09  

UNDO BLK: 
xid: 0x0002.045.00006c61 seq:0xde0 cnt: 0x60 irb: 0x60 icl: 0x0 flg: 0x0000

1.2.2 Cause

When we try to apply redo to an undo block (forward changes are made by the application of redo to a block) we check that the seq# in the undo record matches the seq# in the redo record.

-- At startup the database has to roll forward. During roll forward, redo is applied to undo blocks, and the seq# in the undo record is checked against the seq# in the redo record.

These seq# should be the same because when we apply a redo record we must apply it to the correct version of the block.

-- Normally the two seq# values are identical.

We can only apply a redo record to a block that contains the same seq# as in the redo record.

-- The redo record is applied to the undo record only when they match.

If the seq# do not match then ORA-600 [4193] [a] [b] is raised.

Arg [a] Undo record seq number --> seq: 0xde0 = 3552
Arg [b] Redo record seq number --> seq: 0x0de2 = 3554

-- If they do not match, ORA-600 [4193] [a] [b] is raised, where a is the seq# in the undo record and b is the seq# in the redo record. The seq values in the trace are hexadecimal; the to_number() function converts them to decimal:

SYS@anqing1(rac1)>  Select to_number('de0','xxxx') from dual;

TO_NUMBER('DE0','XXXX')

-----------------------

                   3552

This implies some kind of block corruption in either the redo or the undo block.

-- When the redo record and the undo record do not match, ORA-600 [4193] is raised.
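
The hexadecimal-to-decimal conversion of the two seq# values can also be done outside the database. A small Python sketch, using the values from the trace above (the helper name `seq_to_int` is just for illustration):

```python
def seq_to_int(hex_text):
    """Convert a hexadecimal seq# from a trace file (with or without 0x) to decimal."""
    return int(hex_text, 16)

undo_seq = seq_to_int("0xde0")    # seq in the UNDO BLK line of the trace
redo_seq = seq_to_int("0x0de2")   # seq in the ktudb redo line of the trace

print(undo_seq, redo_seq)         # the two values reported as ORA-600 [4193] arguments
print("mismatch:", undo_seq != redo_seq)
```

The decimal results correspond to the [3552] and [3554] arguments in the ORA-00600 line of the alert log excerpt.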

Related articles:

Oracle datafile block format notes: http://www.linuxidc.com/Linux/2012-08/66994.htm
Roll forward and roll back during Oracle instance recovery: http://www.linuxidc.com/Linux/2011-03/33907.htm

1.2.3 Solution

1.2.3.1 If Database is opened:

-- With the database open, the fix is as follows:

1) Find out the rollback segment, based on the first part of the xid: 0x0002.045.00006c61

  usn=2 is the segment_id

    select segment_name,status from dba_rollback_segs where segment_id=2;

    RS_DATA1   ONLINE

2) Dump the transaction table of the rollback segment to see if all transactions are committed:

    alter system dump undo header RS_DATA1;

Oracle undo dump notes: http://www.linuxidc.com/Linux/2012-08/66995.htm

3) Check the trace file created under user_dump_dest

     In the trace file search for the Keyword "TRN TBL" 

TRN TBL::  
index state cflags wrap#   uel   scn            dba 
----------------------------------------------------------------------------- 
0x00   9     0x00 0x21eb1 0x0023 0x0000.d28c43e9 0x00000000 ......

state=9 means transaction is committed

4) offline the rollback segment:

     alter rollback segment rs_data1 offline; 
      select status from dba_rollback_segs where segment_id=2; 
5)   if STATUS=OFFLINE

      drop rollback segment RS_DATA1;
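
Steps 1) and 3) above involve reading values out of trace text by eye. The sketch below shows one way to do that in Python, assuming the xid and TRN TBL layouts shown in this note; the function names are illustrative, and the only rule it applies is the one stated above, namely that state 9 means the transaction is committed.

```python
def parse_xid(xid):
    """Split an xid such as '0x0002.045.00006c61' into (usn, slot, wrap) integers."""
    usn, slot, wrap = xid.split(".")
    return int(usn, 16), int(slot, 16), int(wrap, 16)

def uncommitted_slots(trn_tbl_lines):
    """Return the index of every TRN TBL row whose state column is not 9."""
    pending = []
    for line in trn_tbl_lines:
        fields = line.split()
        # Data rows start with a hex index such as 0x00; skip headers and rules.
        if len(fields) < 2 or not fields[0].startswith("0x"):
            continue
        if fields[1] != "9":
            pending.append(fields[0])
    return pending

usn, slot, wrap = parse_xid("0x0002.045.00006c61")
print("segment_id for dba_rollback_segs:", usn)

rows = [
    "index state cflags wrap#   uel   scn            dba",
    "0x00   9     0x00 0x21eb1 0x0023 0x0000.d28c43e9 0x00000000",
]
print("rows not in state 9:", uncommitted_slots(rows))
```

An empty list from `uncommitted_slots` corresponds to the condition in step 3) under which the segment can be taken offline and dropped.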

1.2.3.2 If Database doesn't open:

-- If the database cannot be opened, proceed as follows:

1.   a) If using rollback segments, remove the rollback_segments line from init.ora, and open the database

      b) If using undo segments, set undo_management = manual in init.ora/spfile, and try to open the database.

2. If the database opens, all transactions are committed, and you can drop the rollback segment or the undo tablespace

1.3 ORA-600 [4193] caused by a bug

MOS:

ORA-600 [4193] "seq# mismatch while adding undo record" [ID 39282.1]

Bug 8240762 - Undo corruptions with ORA-600[4193]/ORA-600 [4194] or ORA-600 [4137] [ID 8240762.8]

Undo corruption may be caused after a shrink, and the same undo block may be used for two different transactions, causing several internal errors like:

  ORA-600 [4193] / ORA-600 [4194] for new transactions

 ORA-600 [4137] for a transaction rollback

Undo segment shrink is internally done by Oracle.

-- Undo corruption caused by undo segment shrink.

Workaround

      Drop the undo segment.

Affects:

Product (Component)

Oracle Server (Rdbms)

Range of versions believed to be affected

Versions >= 10.2 but BELOW 11.2

Versions confirmed as being affected

  • 10.2.0.4
  • 10.2.0.3

Platforms affected

Generic (all / most platforms affected)

Fixed:

This issue is fixed in

  • 11.2.0.1 (Base Release)
  • 11.1.0.7.10 Patch Set Update
  • 10.2.0.5 (Server Patch Set)
  • 11.1.0.7 Patch 42 on Windows Platforms 
  • 10.2.0.4 Patch 40 on Windows Platforms

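Databases from Oracle 10.2 up to, but not including, 11.2 can hit Bug 8240762, which corrupts undo. The bug is fixed in 10.2.0.5 (and in the other releases listed above). If it occurs, dropping the affected undo segment is enough.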
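
The version ranges quoted from the bug note can be turned into a quick check. This Python sketch is only a rough reading of the list above: the function name `possibly_affected` is made up, the platform-specific Windows patches are ignored, and the result is no substitute for checking the MOS note itself.

```python
def possibly_affected(version):
    """Rough check of a dotted version string against the ranges quoted above."""
    v = tuple(int(part) for part in version.split("."))
    # "Versions >= 10.2 but BELOW 11.2"
    if v < (10, 2) or v >= (11, 2):
        return False
    if v[:2] == (10, 2):
        return v < (10, 2, 0, 5)        # fixed in the 10.2.0.5 patch set
    if v[:2] == (11, 1):
        return v < (11, 1, 0, 7, 10)    # fixed in the 11.1.0.7.10 PSU
    # Anything else inside the range has no fix listed, so treat it as affected.
    return True

for release in ("10.2.0.4", "10.2.0.5", "11.1.0.7", "11.2.0.3.0"):
    print(release, possibly_affected(release))
```

For the 11.2.0.3.0 database used in the case at the top of this article the function returns False, which agrees with 11.2.0.1 being listed as the base release containing the fix.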

