一套Windows上的11.2.0.1单实例数据库在database open阶段出现了ORA-07445:core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []内部错误,具体的出错日志如下:
LOG CONTENT
=======================ALERT.LOG============================
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 2
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
ARCH: Warning; less destinations available than specified
by LOG_ARCHIVE_MIN_SUCCEED_DEST init.ora parameter
Autotune of undo retention is turned on.
IMODE=BR
ILAT =84
2011-08-01 13:13:47.068000 +08:00
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.
Using parameter settings in server-side spfile C:\APP\PRODUCT\11.2.0\DBHOME_1\DATABASE\SPFILEG11R2.ORA
System parameters with non-default values:
_spin_count = 2000
processes = 500
event = "10500 trace name context forever,level 8:10013 trace name context forever,level 10:
10015 trace name context forever,level 10"
sga_max_size = 600M
shared_pool_size = 152M
large_pool_size = 32M
java_pool_size = 4M
streams_pool_size = 0
_db_file_direct_io_count = 12
sga_target = 0
memory_target = 0
control_files = "C:\APP\ORADATA\G11R2\CONTROLFILE\O1_MF_6VWCSH9J_.CTL"
control_files = "C:\APP\FLASH_RECOVERY_AREA\G11R2\CONTROLFILE\O1_MF_6VWCSHNF_.CTL"
db_block_checksum = "TRUE"
db_block_size = 8192
db_cache_size = 196M
_shared_io_pool_size = 0
compatible = "11.2.0.0.0"
log_archive_dest_2 = "service=stdby optional lgwr sync affirm valid_for=(online_logfiles,all_roles)"
log_buffer = 10485760
db_create_file_dest = "C:\app\oradata"
db_recovery_file_dest = "C:\app\flash_recovery_area"
db_recovery_file_dest_size= 500000M
undo_tablespace = "UNDOTBS1"
_kgl_bucket_count = 2
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
session_cached_cursors = 300
audit_file_dest = "C:\APP\ADMIN\G11R2\ADUMP"
optimizer_features_enable= "10.2.0.4"
audit_trail = "DB"
cell_offload_plan_display= "ALWAYS"
db_name = "G11R2"
open_cursors = 3000
_optimizer_extended_cursor_sharing_rel= "NONE"
pga_aggregate_target = 300M
diagnostic_dest = "C:\APP"
2011-08-01 13:13:48.164000 +08:00
PMON started with pid=2, OS id=984
VKTM started with pid=3, OS id=3656 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
GEN0 started with pid=4, OS id=5824
DIAG started with pid=5, OS id=5832
DBRM started with pid=6, OS id=2784
PSP0 started with pid=7, OS id=2500
DIA0 started with pid=8, OS id=5320
MMAN started with pid=9, OS id=4128
DBW0 started with pid=10, OS id=5852
LGWR started with pid=11, OS id=3960
CKPT started with pid=12, OS id=4472
SMON started with pid=13, OS id=5788
RECO started with pid=14, OS id=6036
MMON started with pid=15, OS id=5740
MMNL started with pid=16, OS id=2112
ORACLE_BASE from environment = C:\app
alter database mount exclusive
2011-08-01 13:13:52.390000 +08:00
Sweep [inc][135908]: completed
NSS2 started with pid=19, OS id=2728
Sweep [inc][135901]: completed
Successful mount of redo thread 1, with mount id 2704081164
Database mounted in Exclusive Mode
2011-08-01 13:13:53.413000 +08:00
Lost write protection disabled
2011-08-01 13:13:54.578000 +08:00
Sweep [inc][135897]: completed
Sweep [inc2][135908]: completed
Sweep [inc2][135901]: completed
Sweep [inc2][135897]: completed
2011-08-01 13:13:55.788000 +08:00
Completed: alter database mount exclusive
alter database open
Beginning crash recovery of 1 threads
parallel recovery started with 3 processes
2011-08-01 13:13:56.959000 +08:00
Started redo scan
Completed redo scan
read 0 KB redo, 0 data blocks need recovery
Started redo application at
Thread 1: logseq 867, block 88140, scn 9122496
Recovery of Online Redo Log: Thread 1 Group 3 Seq 867 Reading mem 0
Mem# 0: C:\APP\ORADATA\G11R2\ONLINELOG\O1_MF_3_6VWCSMPO_.LOG
Mem# 1: C:\APP\FLASH_RECOVERY_AREA\G11R2\ONLINELOG\O1_MF_3_6VWCSNGX_.LOG
Completed redo application of 0.00MB
Completed crash recovery at
Thread 1: logseq 867, block 88140, scn 9142497
0 data blocks read, 0 data blocks written, 0 redo k-bytes read
2011-08-01 13:13:58.738000 +08:00
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=22, OS id=4784
2011-08-01 13:13:59.765000 +08:00
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
ARC1 started with pid=24, OS id=2780
ARC2 started with pid=25, OS id=1288
ARC1: Archival started
LGWR: Primary database is in MAXIMUM AVAILABILITY mode
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
ARC3 started with pid=26, OS id=3876
2011-08-01 13:14:00.828000 +08:00
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
NSS2 started with pid=19, OS id=5156
2011-08-01 13:14:29.008000 +08:00
ORA-16198: LGWR received timedout error from KSR
2011-08-01 13:14:35.980000 +08:00
Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_lgwr_3960.trc:
ORA-16198: Timeout incurred on internal channel during remote archival
LGWR: Error 16198 verifying archivelog destination LOG_ARCHIVE_DEST_2
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Continuing...
ARCH: LGWR is scheduled to archive destination LOG_ARCHIVE_DEST_2 after log switch
2011-08-01 13:14:38.629000 +08:00
Trying to expand controlfile section 11 for Oracle Managed Files
Exception [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466]
Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_arc0_4784.trc (incident=136091):
ORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []
Incident details in: c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136091\g11r2_arc0_4784_i136091.trc
2011-08-01 13:14:40.283000 +08:00
Trace dumping is performing id=[cdmp_20110801131440]
2011-08-01 13:14:52.417000 +08:00
Sweep [inc][136091]: completed
Sweep [inc2][136091]: completed
2011-08-01 13:14:59.805000 +08:00
ARC2: Detected ARCH process failure
ARC2: STARTING ARCH PROCESSES
ARC0 started with pid=19, OS id=5016
2011-08-01 13:15:00.836000 +08:00
ARC0: Archival started
ARC2: STARTING ARCH PROCESSES COMPLETE
2011-08-01 13:15:36.689000 +08:00
Deleted Oracle managed file C:\APP\FLASH_RECOVERY_AREA\G11R2\ARCHIVELOG\2011_08_01\O1_MF_1_866_73DFKWRK_.ARC
2011-08-01 13:15:38.013000 +08:00
Error 12154 received logging on to the standby
Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trc:
ORA-12154: TNS:could not resolve the connect identifier specified
ARCH: Error 12154 Creating archive log file to 'stdby'
Trying to expand controlfile section 11 for Oracle Managed Files
Exception [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466]
Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trc (incident=136051):
ORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []
Incident details in: c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136051\g11r2_ora_4852_i136051.trc
2011-08-01 13:15:39.680000 +08:00
Trace dumping is performing id=[cdmp_20110801131539]
2011-08-01 13:15:42.782000 +08:00
PMON (ospid: 984): terminating the instance due to error 397
2011-08-01 13:15:50.520000 +08:00
Instance terminated by PMON, pid = 984
=============================g11r2_ora_4852_i136051.trc=============================
Dump file c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136051\g11r2_ora_4852_i136051.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1
CPU : 4 - type 586, 2 Physical Cores
Process Affinity : 0x0x00000000
Memory (Avail/Total): Ph:2122M/3566M, Ph+PgF:5413M/7130M, VA:1084M/2047M
Instance name: g11r2
Redo thread mounted by this instance: 1
Oracle process number: 17
Windows thread id: 4852, image: ORACLE.EXE (SHAD)
*** 2011-08-01 13:15:38.527
*** SESSION ID:(197.1) 2011-08-01 13:15:38.527
*** CLIENT ID:() 2011-08-01 13:15:38.527
*** SERVICE NAME:() 2011-08-01 13:15:38.527
*** MODULE NAME:(oradim.exe) 2011-08-01 13:15:38.527
*** ACTION NAME:() 2011-08-01 13:15:38.527
Dump continued from file: c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trc
ORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []
========= Dump for incident 136051 (ORA 7445 [kcflfi()+466]) ========
----- Beginning of Customized Incident Dump(s) -----
Exception [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466]
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
Process Id: 0x000010bc Thread Id : 0x000012f4 Time : Mon Aug 01 13:15:38
Excp. Code: 0xc0000094 Excp. Type: INT_DIVIDE Flags: 0x00000000
------------------- Registers ----------------------------
eip = 0500282e esp = 0d9f525c ebp = 0d9f577c edi = 37eefe00 esi = 00000265
eax = 00000265 ebx = 00000000 ecx = 089ee234 edx = 00000000
ecs = 0000001b eds = 00000023 ees = 00000023 ess = 00000023
egs = 00000000 efs = 0000003b
eflags = 00010246
------------------- End of Registers ---------------------
*** 2011-08-01 13:15:38.536
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=a01hp0psv0rrh) -----
alter database open
----------- messages from pre-loading .sym files:
Symbol file C:\app\product\11.2.0\dbhome_1\RDBMS\ADMIN\oracommon11.SYM does not match binary.
Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are different
Symbol file C:\app\product\11.2.0\dbhome_1\RDBMS\ADMIN\oraclsra11.SYM does not match binary.
Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are different
----------- end of messages from pre-loading .sym files
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
Symbol file C:\app\product\11.2.0\dbhome_1\BIN\oracommon11.SYM does not match binary.
Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are different
Symbol file C:\app\product\11.2.0\dbhome_1\BIN\oraclsra11.SYM does not match binary.
Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are different
EnumerateLoadedModules64 failed with error -1073741819
Symbol file oraclsra11.SYM does not match binary.
Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are different
Symbol file oracommon11.SYM does not match binary.
Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are different
__VInfreq__kcflfi() 00000000
+466
_kccrszf()+287 CALLrel _kcflfi() 0 318345B8 34 31C0DD40 4000
265 4 7FFFFFFF 1 0 0
_kccrsd_expd()+1418 CALLrel _kccrszf() D9F7CEC 268 264
_kccwnc_reuse_expan CALLrel _kccrsd_expd() D9F7CEC B 38
d()+640
__VInfreq__kccwnc() CALLrel _kccwnc_reuse_expan D9F7CEC B 26
+235 d()
_krse_arc_complete( CALLrel _kccwnc() D9F7CEC D9F6D38 B
)+1615
_krse_arc_driver_co CALLrel _krse_arc_complete( D9F78AC
re()+1307 )
_krse_arc_driver()+ CALLrel _krse_arc_driver_co D9F7CEC 1 D9F7C6C 0 0 D9F7CC8
274 re() 0 0 0 0 0 0 0
_krsq_arch_to_force CALLrel _krse_arc_driver() D9F7CEC 1 D9F7C6C 0 0 D9F7CC8
_switch()+196 0 0 0 0 0 0 0
__VInfreq__kcttsc() CALLrel _krsq_arch_to_force D9F7CEC 1
+129 _switch()
_kcfopd()+1504 CALLrel _kcttsc() 2
_adbdrv()+16700 CALLrel _kcfopd() 0 0 0 0 D9FBBF8
_opiexe()+13594 CALLrel _adbdrv() 4A C0000094 33644518 D9FBD38
6D60697 2F3FC5F0
_opiosq0()+6248 CALLrel _opiexe() 4 0 D9FC704
_kpooprx()+277 CALLrel _opiosq0() 3 E D9FC970 A4 0
_kpoal8()+632 CALLrel _kpooprx() D9FF074 D9FD3F8 13 1 0 A4
_opiodr()+1248 CALLreg 00000000 5E 1C D9FF070
___dyn_tls_init_cal CALLreg 00000000 5E 1C D9FF070 1
lback()+2935122
_opitsk()+1404 CALL??? 00000000 C9A10E8 5E D9FF070 0 D9FED00
D9FF19C 53E52E 0 D9FF1C8
_opiino()+980 CALLrel _opitsk() 0 0
_opiodr()+1248 CALLreg 00000000 3C 4 D9FFBC4
_opidrv()+1201 CALLrel _opiodr() 3C 4 D9FFBC4 0
_sou2o()+55 CALLrel _opidrv() 3C 4 D9FFBC4
_opimai_real()+124 CALLrel _sou2o() D9FFBD4 3C 4 D9FFBC4
_opimai()+125 CALLrel _opimai_real() 2 D9FFBFC
_OracleThreadStart@ CALLrel _opimai() 2 D9FFF3C 0 70 FFFFFFFF
4()+830 FFFFFFFF
___dyn_tls_init_cal CALLptr 00000000 901FF6C D9FFFD4 776437F5
lback()+366382316 901FF6C 765D34CB 0
___dyn_tls_init_cal CALLreg 00000000 901FF6C 765D34CB 0 0 901FF6C
lback()+367384440 0
___dyn_tls_init_cal CALLrel ___dyn_tls_init_cal 401326 901FF6C 0 0 0 0
lback()+367384392 lback()+367384403
00000000 CALL??? 00000000
--------------------- Binary Stack Dump ---------------------
..................
从以上日志中可以看到在”Trying to expand controlfile section 11 for Oracle Managed Files“扩扎控制文件过程中出现了 _kccwnc_reuse_expan->_kccrsd_expd->_kccrszf->_kcflfi->_VInfreq__kcflfi() 函数的7445错误,kcf意为(manages and coordinates operations on the control file(s),kcf.c),是在处理日志文件中引发了INT_DIVIDE_BY_ZERO除数为零的代码bug。 通过7445和kcflfi关键词在MOS上搜索没有太大的发现,说明该Bug的处罚几率非常低,正好让我碰到说明是某些特殊参数的设置引起了该问题。 目标锁定启动日志中的非默认隐藏参数"_db_file_direct_io_count",该参数决定了直接路径读写的IO大小,从9i开始该参数的单位调整为bytes而非原先的blocks,之前因为对该参数进行一些测试所以设置了一个较小值。
Parameter: DB_FILE_DIRECT_IO_COUNT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Versions: 8.0 - 8.1
This parameter is hidden in 9.0 onwards.
Parameter type: integer
Parameter class: dynamic, scope = ALTER SYSTEM DEFERRED
Default value: 64
Range of values: operating system-dependent
Description:
~~~~~~~~~~~~
DB_FILE_DIRECT_IO_COUNT is used to specify the number of blocks to be used
for IO operations done by backup, restore or direct path read and write
functions. The IO buffer size is a product of DB_FILE_DIRECT_IO_COUNT and
DB_BLOCK_SIZE. The IO buffer size cannot exceed max_IO_size for your
platform.
Assigning a high value to this parameter results in greater use of PGA or
SGA memory.
o In Oracle8i, minimize the number of I/O requests by setting the
DB_FILE_DIRECT_IO_COUNT instance parameter so that
DB_BLOCK_SIZE x DB_FILE_DIRECT_IO_COUNT = max_io_size of system
In Oracle8i the default for this is 64 blocks.
(In Oracle9i, it is replaced by _DB_FILE_DIRECT_IO_COUNT which governs
the size of direct I/Os in BYTES (not blocks). The default is 1Mb but
will be sized down if the max_io_size of the system is smaller.)
ORA-19863 during RMAN duplicate
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.3
This problem can occur on any platform.
Symptoms
-- Problem Statement:
Duplicate failed during the datafile restore stage:
Starting restore at 2008-Apr-09 09:28:24
using channel ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: starting datafile backupset restore
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00003 to /u06/oradata/hcmprdc/sysaux01.dbf
...
restoring datafile 00121 to /u06/oradata/hcmprdc/waapp.dbf
channel ORA_AUX_DISK_1: reading from backup piece
/u04/oradata/flash_recovery_area/HCMPRD/mdjd8s5v_1_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 04/09/2008 09:28:28
RMAN-03015: error occurred in stored script Memory Script
ORA-19870: error reading backup piece /u04/oradata/flash_recovery_area/HCMPRD/mdjd8s5v_1_1
ORA-19863: device block size 1040384 is larger than max allowed: 262144
Cause
The database parameter _db_file_direct_io_count in the target and auxiliary instance does not match.
Solution
-- To implement the solution:
Ensure that parameter _db_file_direct_io_count on the target and auxiliary database the same
_DB_FILE_DIRECT_IO_COUNT need to be set to the same value between the source database
where the backup was taken and the target database where the backup is being restored.
2.0 Size of Input/Output Buffers
================================
a. input buffers
----------------
NOTE : DB_FILE_DIRECT_IO_COUNT is not available in Oracle9i onwards.
In Oracle9i, it is replaced by a hidden _DB_FILE_DIRECT_IO_COUNT which
governs the size of direct I/Os in BYTES (not blocks). The default is
1Mb butwill be sized down if the max_io_size of the system is smaller.
The input buffer size is:
buffersize = db_block_size * db_file_direct_io_count
As there are 4 input buffers, the total input buffer memory use per channel is:
memory(input) = #buffers * #files * buffersize
= 4 * #files * buffersize
For example, if 2 channels are used, and each of these channels backs up 3
files, then for each channel
memory(input) = 4 * 3 * db_block_size * db_file_direct_io_count
b. output buffers
-----------------
For disk channels, the output buffer size is:
buffersize = db_block_size * db_file_direct_io_count
For SBT_TAPE channels, the output buffer size in Oracle8/8i is o/s dependant. (On Solaris,
this defaults to 64k) On 9i/10g it defaults to 256k for all platforms. The BLKSIZE argument to 'allocate channel...' can be
used to override the default value.
As there are 4 output buffers,
memory(output) = #buffers * buffersize
= 4 * buffersize
一般来说使用该隐藏参数的默认值即可,通过重置该参数后修复启动问题:
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE 11.2.0.1.0 Production
TNS for 32-bit Windows: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production
SQL> alter system reset "_db_file_direct_io_count" scope=spfile;
System altered.
SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
FROM SYS.x$ksppi x, SYS.x$ksppcv y
WHERE x.inst_id = USERENV ('Instance')
AND y.inst_id = USERENV ('Instance')
AND x.indx = y.indx
AND x.ksppinm LIKE '%db_file_direct_io_count%'
/
NAME VALUE DESCRIB
------------------------------ -------------------- ------------------------------
_db_file_direct_io_count 1048576 Sequential I/O buf size
windows上的11gr2默认该参数为1MB