Notes on ORA-00600 [kcratr_nab_less_than_odr]

Environment:
DB:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

OS:
$ cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 8)

This is a clone of a production database, used for routine testing. When I logged in, I found the database in the NOMOUNT state.
oracle@wimngNB_test:~ $ sqlplus /nolog

SQL*Plus: Release 11.2.0.1.0 Production on Tue Feb 7 13:31:29 2012

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

SQL> conn / as sysdba
Connected.
SQL> select status from v$instance;

STATUS
------------
STARTED

SQL> SELECT OPEN_MODE FROM V$DATABASE;
SELECT OPEN_MODE FROM V$DATABASE
                      *
ERROR at line 1:
ORA-01507: database not mounted


SQL> alter database mount;
alter database mount
*
ERROR at line 1:
ORA-00214: control file '/orasys/flash_recovery_area/wimng2/control02.ctl'
version 140340 inconsistent with file '/data/oradata/wimng2/control01.ctl'
version 140332


SQL> alter system set control_files='/orasys/flash_recovery_area/wimng2/control02.ctl' scope=spfile;

System altered.

SQL> shutdown immediate;
ORA-01507: database not mounted


ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 2042241024 bytes
Fixed Size                  1337548 bytes
Variable Size            1509951284 bytes
Database Buffers          520093696 bytes
Redo Buffers               10858496 bytes
Database mounted.
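
In the session above, control_files was pointed at the copy that reported the higher version (control02.ctl, version 140340). As a minimal sketch of that selection step, the following Python snippet parses the ORA-00214 text and returns the copy with the highest version number. The function name `newest_control_file` is hypothetical, and the regular expression assumes the message wording shown above.

```python
import re

# The ORA-00214 text from the session above, joined onto one line.
message = (
    "ORA-00214: control file '/orasys/flash_recovery_area/wimng2/control02.ctl' "
    "version 140340 inconsistent with file '/data/oradata/wimng2/control01.ctl' "
    "version 140332"
)

def newest_control_file(ora_00214_text):
    """Return (path, version) of the control file copy with the highest version.

    Hypothetical helper: it only understands the message wording shown above.
    """
    pairs = re.findall(r"file '([^']+)'\s+version (\d+)", ora_00214_text)
    if not pairs:
        raise ValueError("no control file versions found in message")
    copies = [(path, int(version)) for path, version in pairs]
    return max(copies, key=lambda item: item[1])

newest = newest_control_file(message)
print(newest)
```

The other copy is left on disk untouched, so it can still be inspected later if needed.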

The database mounted, but opening it raised ORA-00600:
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1],
[1468], [57304], [57605], [], [], [], [], [], [], []


SQL> select status from v$instance;

STATUS
------------
MOUNTED

 

-----------------------------------------------------------------------------------------------------------------------------------


Meaning of the arguments in [kcratr_nab_less_than_odr], [1], [1468], [57304], [57605], [], [], [], [], [], [], []:

(a) redo thread id
(b) redo log sequence
(c) NAB (next available block)
(d) on-disk RBA block number
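
To make the mapping concrete, here is a small Python sketch that labels the four numeric arguments from the error above and computes how far the NAB falls short of the on-disk RBA block. The `KcratrArgs` type and `parse_kcratr_args` function are hypothetical names introduced only for illustration.

```python
import re
from typing import NamedTuple

class KcratrArgs(NamedTuple):
    thread: int         # (a) redo thread id
    sequence: int       # (b) redo log sequence
    nab: int            # (c) NAB
    on_disk_block: int  # (d) on-disk RBA block number

error_text = (
    "ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], "
    "[1468], [57304], [57605], [], [], [], [], [], [], []"
)

def parse_kcratr_args(text):
    """Extract the first four numeric bracketed arguments (hypothetical helper)."""
    numbers = [int(n) for n in re.findall(r"\[(\d+)\]", text)]
    if "kcratr_nab_less_than_odr" not in text or len(numbers) < 4:
        raise ValueError("expected kcratr_nab_less_than_odr with four numeric arguments")
    return KcratrArgs(*numbers[:4])

args = parse_kcratr_args(error_text)
missing_blocks = args.on_disk_block - args.nab
print(args)
print("blocks between NAB and on-disk RBA:", missing_blocks)
```

For this incident the difference is 57605 - 57304 = 301 redo blocks.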


 

Checking the trace file:

2012-02-07 13:40:53.366569 :80000687:KFNU:kfn.c@2200:kfnPrepareASM(): kfnPrepareASM force=0 state_kfnsg=0x7
2012-02-07 13:40:53.366569*:80000688:CACHE_RCV:kcv.c@16365:kcvcrv(): kcvcrv: Calling kctrec()
2012-02-07 13:40:53.366569*:80000689:CACHE_RCV:kct.c@4163:kctrec(): kctrec: Entering kctrec()
2012-02-07 13:40:53.413557*:8000068A:CACHE_RCV:kct.c@4271:kctrec(): kctrec: thread 1 cf thread ckpt: logseq 1468, block 2,scn 25917106
2012-02-07 13:40:53.413557*:8000068B:CACHE_RCV:kct.c@4285:kctrec(): kctrec: Checkpoint progress record contents
2012-02-07 13:40:53.413557*:8000068C:CACHE_RCV:kct.c@4287:kctrec(): kctrec:    kcccpsta 2, kcccpflg 0, kcccpdrt 48, kcccplrba 0x0005bc.0000dfd8.0000 kcccpodr 0x0005bc.0000e105.0000
2012-02-07 13:40:53.413557*:8000068D:CACHE_RCV:kct.c@4299:kctrec(): kctrec:    kcccpods 0x0000.018be694, kcccpodt 773934914, kcccprlc 753362405, kcccprls 0x0000.00000001, kcccphbt 774572255, kcccpmid 1635578584
2012-02-07 13:40:53.413557*:8000068E:CACHE_RCV:kct.c@4311:kctrec(): kctrec:    kcccpsdr 0x0005bc.00000001.0000, kcccpfbend (krfbafln 0, krfbathr 0, krfbaseq 0, krfbabno 0 krfbabof 0), kcccprsv 0
2012-02-07 13:40:53.413557*:8000068F:CACHE_RCV:kct.c@4360:kctrec(): kctrec: cache-low rba: logseq 1468, block 57304
2012-02-07 13:40:53.413557*:80000690:CACHE_RCV:kct.c@4374:kctrec(): kctrec: on-disk rba: logseq 1468, block 57605, scn 25945748
2012-02-07 13:40:53.413557*:80000691:CACHE_RCV:kct.c@4450:kctrec(): kctrec:
Current ckpt RBA < cache-low RBA, adjusted ckpt RBA to cache low RBA, zeroed ckpt SCN and timestamp to 0
2012-02-07 13:40:53.413557*:80000692:CACHE_RCV:kct.c@4604:kctrec(): kctrec:
Recovery starting point for thread 1 -  logseq 1468, block 57304, scn 0
2012-02-07 13:40:53.449498*:80000693:CACHE_RCV:kct.c@4664:kctrec(): kctrec: Do thread recovery, calling kcratr()
2012-02-07 13:40:53.456376 :80000694:CACHE_RCV:kcra.c@1517:kcratr(): kcratr: Entering kcratr()
2012-02-07 13:40:53.458293 :80000695:CACHE_RCV:kcra.c@1541:kcratr(): kcratr: Started redo scan
2012-02-07 13:40:53.458293*:80000696:CACHE_RCV:kcra.c@1862:kcratr_scan(): kcratr_scan: Entering kcratr_scan()
2012-02-07 13:40:53.458293*:80000697:CACHE_RCV:kcra.c@2000:kcratr_scan(): kcratr_scan: Log not open, opening online log for thread 1, RBA 0x0005bc.0000dfd8.0000, SCN 0x0000.00000000
2012-02-07 13:40:53.694427*:800006A4:CACHE_RCV:kcra.c@2036:kcratr_scan(): kcratr_scan: End of curr thread reached
2012-02-07 13:40:53.694427*:800006A5:CACHE_RCV:kcra.c@2038:kcratr_scan(): kcratr_scan:    end rcv RBA 0x0005bc.0000dfd8.   0, end rcv SCN 0x0000.018b76b3 end SCN timestamp 773895659, NAB 57304
2012-02-07 13:40:53.694427*:800006A6:CACHE_RCV:kcra.c@2048:kcratr_scan(): kcratr_scan:   (Previous) highest SCN seen in the redo stream 0x0000.00000000
2012-02-07 13:40:53.694427*:800006A7:CACHE_RCV:kcra.c@2162:kcratr_scan(): kcratr_scan: Exiting kcratr_scan()
2012-02-07 13:40:53.702245 :800006A8:CACHE_RCV:kcra.c@1559:kcratr(): kcratr: Completed redo scan, read 0 KB redo, 0 data blocks need recovery

 

 

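The trace shows that the database needs to be recovered up to RBA block 57605 (the on-disk RBA). For some reason, however, instance recovery using thread 1, sequence# 1468 could only reach block 57304. As a result, the database could not be opened normally.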
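
The hexadecimal RBAs in the trace can be checked against the decimal values in the error message. The sketch below decodes the `seq.block.offset` notation copied from the kcccplrba, kcccpodr and "end rcv RBA" entries. `decode_rba` is a hypothetical helper, and comparing RBAs as plain tuples is an illustration, not Oracle's internal logic.

```python
def decode_rba(text):
    """Split hex 'seq.block.offset' notation into a tuple of integers.

    int(..., 16) accepts the leading '0x' and surrounding spaces, so the
    strings can be pasted from the trace unchanged.
    """
    seq, block, offset = (int(part, 16) for part in text.split("."))
    return seq, block, offset

cache_low = decode_rba("0x0005bc.0000dfd8.0000")  # kcccplrba
on_disk = decode_rba("0x0005bc.0000e105.0000")    # kcccpodr
end_rcv = decode_rba("0x0005bc.0000dfd8.   0")    # end rcv RBA from kcratr_scan

print("cache-low RBA:", cache_low)
print("on-disk RBA:  ", on_disk)
print("scan stopped before on-disk RBA:", end_rcv < on_disk)
```

The decoded values match the trace lines "cache-low rba: logseq 1468, block 57304" and "on-disk rba: logseq 1468, block 57605".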

 

 

Solution:

 

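First I tried an incomplete recovery:
SQL>RECOVER DATABASE UNTIL CANCEL USING BACKUP CONTROLFILE;
It still failed; I forgot to write down the exact error code.


So I decided to recreate the control file: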

SQL>ALTER DATABASE BACKUP CONTROLFILE TO TRACE AS '/data/backup/1';

 

SQL>SHUTDOWN IMMEDIATE;

 

SQL>STARTUP NOMOUNT
ORACLE instance started.

Total System Global Area 2042241024 bytes
Fixed Size                  1337548 bytes
Variable Size            1509951284 bytes
Database Buffers          520093696 bytes
Redo Buffers               10858496 bytes

SQL>CREATE CONTROLFILE REUSE DATABASE "WIMNG2" NORESETLOGS ARCHIVELOG
  2       MAXLOGFILES     16
  3       MAXLOGMEMBERS   3
  4       MAXDATAFILES    100
  5       MAXINSTANCES    8
  6       MAXLOGHISTORY   292
  7  LOGFILE
  8    GROUP 1 '/data/oradata/wimng2/redo01.log'  SIZE 50M BLOCKSIZE 512,
  9    GROUP 2 '/data/oradata/wimng2/redo02.log'  SIZE 50M BLOCKSIZE 512,
 10    GROUP 3 '/data/oradata/wimng2/redo03.log'  SIZE 50M BLOCKSIZE 512
 11  -- STANDBY LOGFILE
 12  DATAFILE
 13    '/data/oradata/wimng2/system01.dbf',
 14    '/data/oradata/wimng2/sysaux01.dbf',
 15    '/data/oradata/wimng2/undotbs01.dbf',
 16    '/data/oradata/wimng2/users01.dbf',
 17    '/data/oradata/wimng2/data01.dbf',
 18    '/data/oradata/wimng2/data02.dbf',
 19    '/data/oradata/wimng2/data03.dbf',
 20    '/data/oradata/wimng2/data04.dbf',
 21    '/data/oradata/wimng2/data05.dbf',
 22    '/data/oradata/wimng2/data06.dbf'
 23* CHARACTER SET WE8ISO8859P15
 
SQL> RECOVER DATABASE;
Media recovery complete.


SQL> ALTER DATABASE OPEN;

Database altered.

 

 

Below is Liu Maclean's analysis of the trace file, along with the related information he provided:

 

ODM finding

 

ORA-600 [kcratr_nab_less_than_odr] during Instance Recovery after Database Crash [ID 1299564.1]

Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.2 - Release: 11.2 to 11.2
Information in this document applies to any platform.
Symptoms
Trying to open a Database after a Crash caused by Storage Problems the Instance Recovery fails with :
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [219], [25020], [25021], []

The Database can't open at this Point. In the corresponding Tracefile we can find this Error Callstack:


dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=1h50ks4ncswfn) -----
ALTER DATABASE OPEN

----- Call Stack Trace -----
ksedst1 <- ksedst <- dbkedDefDump <- ksedmp <- dbgexPhaseII <- dbgexProcessError <- dbgePostErrorKGE  <- kgeasnmierr <- kcratr_odr_check  <- kcratr <- kctrec <- kcvcrv <- kcfopd <- adbdrv <- opiexe <- opiosq0 <- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- ssthrdmain <- main <- start


Cause
This Problem is caused by Storage Problem of the Database Files. The Subsystem (eg. SAN) crashed while the Database was open. The Database then crashed since the Database Files were not accessible anymore. This caused a lost Write into the Online RedoLogs and so Instance Recovery is not possible and raising the ORA-600.
Solution
There are two possible Solutions:

1. If you could restore your Storage Environment and the Online RedoLogs from the Time of the crash you can try a manual Recovery followed by a RESETLOGS:

SQL> startup mount;

SQL> recover database until cancel using backup controlfile;

-> manually provide Online RedoLog containing the last (current) Sequence when asked, eg.

ORA-00279: change 100000 generated at xx/xx/xxxx xx:xx:xx needed for thread 1
ORA-00289: suggestion :
/flash_recovery/archivelog/xxxx_xx_xx/o1_mf_1_100_%u_.arc
ORA-00280: change 100000 for thread 1 is in sequence #100

Specify log: {=suggested | filename | AUTO | CANCEL}
/ora/oradata/dbtest/redo04_1.rdo
Log applied.
Media recovery complete.

SQL> alter database open resetlogs;

2. If  step1. fails or you don't have the full Set of Files you have to restore and recover the Database from a recent Backup.


Alter database open fails with ORA-00600 kcratr_nab_less_than_odr [ID 1296264.1]

Applies to:
Oracle Server - Standard Edition - Version: 11.2.0.1 and later   [Release: 11.2 and later ]
Information in this document applies to any platform.
Symptoms
After Power Fail Alter database open fails with

ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr]

Changes
Power failure
Cause
There was a power failure causing logical corruption in controlfile
Solution

Option a
------------

SQL>Startup mount ;

SQL>Show parameter control_files

Query 1
------------

sql>select a.member,a.group#,b.status from v$logfile a ,v$log b where a.group#=b.group# and b.status='CURRENT'

Note down the name of the redo log

SQL>Shutdown abort ;

Take an OS-level backup of the controlfile (this is to ensure we have a backup of the current state of the controlfile)

SQL>Startup mount ;

SQL>recover database using backup controlfile until cancel ;

Enter location of redo log shown as current in Query 1 when prompted for recovery

Hit Enter

SQL>Alter database open resetlogs ;






Option b
-----------


Recreate the controlfile using the Controlfile recreation script

With database in mount stage

rman target /

rman> spool log to '/tmp/rman.log';

Rman> list backup ;

Rman > exit

Keep this log handy

Go to sqlplus

SQL> Show parameter control_files

Keep this location handy.

SQL>oradebug setmypid

SQL>Alter session set tracefile_identifier='controlfilerecreate' ;

SQL>Alter database backup controlfile to trace ;

SQL>Oradebug tracefile_name ; --> This command will give the path and name of the trace file


Go to this location ,Open this trace file and select the controlfile recreation script with NO Resetlogs option


SQL>Shutdown immediate;


Rename the existing controlfile to _old ---> This is Important as we need to have a backup of existing controlfile since we plan to recreate it

SQL>Startup nomount

Now run the Controlfile recreation script with NO Resetlogs mode

SQL>Alter database open ;

For database version 10g and above

Once database is opened you can recatalog the rman backup information present in the list /tmp/rman.log using

Rman> Catalog start with '' ;


Once the database has been opened using option a or option b, it is recommended to take a hot backup of the database.
The same steps are applicable to RAC if all instances are down with the same error.

 

 

1. kcratr_nab_less_than_odr may be triggered by storage problems:

Trying to open a Database after a Crash caused by Storage Problems the Instance Recovery fails with


2. Analyzing the trace:


Dump continued from file: /orasys/diag/rdbms/wimng2/wimng2/trace/wimng2_ora_29785.trc
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [1468], [57304], [57605], [], [], [], [], [], [], []

========= Dump for incident 16953 (ORA 600 [kcratr_nab_less_than_odr]) ========

*** 2012-02-07 13:40:54.447
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=a01hp0psv0rrh) -----
alter database open

----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+41        call     kgdsdst()            BFFE7388 ? 2 ?
ksedst1()+77         call     skdstdst()           BFFE7388 ? 0 ? 1 ? AB8E3A8 ?
                                                   853C46E ? AB8E3A8 ?
ksedst()+33          call     ksedst1()            0 ? 1 ?
dbkedDefDump()+2699  call     ksedst()             0 ? 5AF911 ? BFFE74AC ?
                                                   1007B40C ? BFFE7794 ? 0 ?
ksedmp()+47          call     dbkedDefDump()       3 ? 2 ?
ksfdmp()+59          call     ksedmp()             3EB ? BFFE92D0 ? DFBE5A3 ?
                                                   106AD160 ? 3EB ? 106AD160 ?
dbgexPhaseII()+1725  call     00000000             106AD160 ? 3EB ?
dbgexProcessError()  call     dbgexPhaseII()       B7FEB598 ? B7DBC888 ?
+2089                                              BFFECBA4 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   106AD160 ? B7FDD0D4 ? 258 ?
47
kgeadse()+286        call     00000000             106AD160 ? B7FDD0D4 ? 258 ?
kgerinv_internal()+  call     kgeadse()            106AD160 ? B7FDD0D4 ? 258 ?
47                                                 FD8DC58 ? 0 ? 4 ? BFFED45C ?
kgerinv()+41         call     kgerinv_internal()   106AD160 ? B7FDD0D4 ?
                                                   FD8DC58 ? 258 ? 0 ? 4 ?
                                                   BFFED45C ?
kgeasnmierr()+47     call     kgerinv()            106AD160 ? B7FDD0D4 ?
                                                   FD8DC58 ? 4 ? BFFED45C ?
kcratr_odr_check()+  call     kgeasnmierr()        106AD160 ? B7FDD0D4 ?
204                                                FD8DC58 ? 4 ? 0 ? 1 ?
kcratr()+1806        call     kcratr_odr_check()   BFFED6EC ? 0 ? F386D53 ? 0 ?
                                                   9 ? F386D53 ?
kctrec()+9311        call     kcratr()             BFFED6EC ? BFFF45D0 ? 0 ?
kcvcrv()+5906        call     kctrec()             BFFF5868 ? 0 ? B7FD0BD0 ?
                                                   B7FD122C ? B7E1BE00 ? 0 ?

  

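Some basic concepts that appear in the trace:

low RBA: the RBA of the first change made to a data block in the buffer cache. Dirty blocks are ordered by low RBA on the checkpoint queue.
high RBA: the RBA of the most recent change made to a data block in the buffer cache.
checkpoint RBA: the RBA of the first change to the first dirty block on the checkpoint queue (as of each time the queue is cleaned); all dirty data before this RBA has already been written to disk.
on-disk RBA: the address of the last position LGWR has written in the log files.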

 

Redo Byte Address (RBA)
the log file sequence number (4 bytes)
the log file block number (4 bytes)
the byte offset into the block at which the redo record starts (2 bytes)
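
The three fields add up to 10 bytes. As a rough illustration of that layout, the Python sketch below packs and unpacks an RBA with the standard `struct` module. The little-endian byte order and the name `RBA_LAYOUT` are assumptions made for the example; they are not taken from Oracle's actual on-disk encoding.

```python
import struct

# 4-byte sequence, 4-byte block number, 2-byte offset, no padding.
# "<" (little-endian) is an assumption for this sketch only.
RBA_LAYOUT = struct.Struct("<IIH")

# The on-disk RBA from the trace: logseq 1468, block 57605, offset 0.
packed = RBA_LAYOUT.pack(1468, 57605, 0)
unpacked = RBA_LAYOUT.unpack(packed)

print("size in bytes:", RBA_LAYOUT.size)
print("round trip:", unpacked)
```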

With respect to a dirty block in the buffer cache, the low RBA is the address of the redo for the first change that was applied to the block since it was last clean, and the high RBA is the address of the redo for the most recent change to have been applied to the block.
Dirty buffers are maintained on the buffer cache checkpoint queues in low RBA order. The checkpoint RBA is the point up to which DBWn has written buffers from the checkpoint queues if incremental checkpointing is enabled — otherwise it is the RBA of last full thread checkpoint. The checkpoint RBA is copied into the checkpoint progress record of the controlfile by the checkpoint heartbeat once every 3 seconds. Instance recovery, when needed, begins from the checkpoint RBA recorded in the controlfile. The target RBA is the point up to which DBWn should seek to advance the checkpoint RBA to satisfy instance recovery objectives.
The on-disk RBA is the point up to which LGWR has flushed the redo thread to the online log files. DBWn may not write a block for which the high RBA is beyond the on-disk RBA. Otherwise transaction recovery (rollback) would not be possible, because the redo needed to undo a change is always in the same redo record as the redo for the change itself.
The term sync RBA is sometimes used to refer to the point up to which LGWR is required to sync the thread. However, this is not a full RBA — only a redo block number is used at this point.
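
The rule that DBWn may not write a block whose high RBA is beyond the on-disk RBA can be stated as a simple ordering check. The sketch below models RBAs as (sequence, block, offset) tuples; the function `dbwn_may_write` and the two sample buffer values are hypothetical and only illustrate the comparison.

```python
def dbwn_may_write(buffer_high_rba, on_disk_rba):
    """A dirty buffer is writable only if its redo is already on disk.

    RBAs are (sequence, block, offset) tuples, so Python's tuple ordering
    compares sequence first, then block, then offset.
    """
    return buffer_high_rba <= on_disk_rba

on_disk_rba = (1468, 57605, 0)    # on-disk RBA reported in the trace

flushed_buffer = (1468, 57400, 16)    # hypothetical: redo already written by LGWR
unflushed_buffer = (1468, 57610, 0)   # hypothetical: redo still in the log buffer

print(dbwn_may_write(flushed_buffer, on_disk_rba))
print(dbwn_may_write(unflushed_buffer, on_disk_rba))
```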

 

Thanks again to Liu Maclean and 惜分飞 for their help.

 

 

 
