5月26日某客户告警日志中出现的ORA-00600 [4400],[48]错误记录,并产生了 trace文件:,
*** SERVICE NAME:(ETL) 2010-05-26 16:45:45.930
*** SESSION ID:(262.12024) 2010-05-26 16:45:45.930
*** 2010-05-26 16:45:45.930
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], []
ORA-10387: parallel query server interrupt (normal)
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c bl ksedst1 0FFFFFFFF ? 000000020 ?
ksedmp+0290 bl ksedst 1048DFFC0 ?
ksfdmp+0018 bl 03F5B014
kgeriv+0108 bl _ptrgl
kgeasi+0118 bl kgeriv 10496FEC8 ? 000000002 ?
700000010008000 ? 110000AD0 ?
110190290 ?
ktcddt+012c bl kgeasi 110190110 ? 110450040 ?
113000001130 ? 200000002 ?
100000001 ? 000000000 ?
000000030 ? 000000013 ?
ktcsod+0384 bl ktcddt 0000003C0 ? 000000018 ?
1048D4808 ?
kssdch_stage+0758 bl _ptrgl
....................................................
----- End of Call Stack Trace -----
******************* Dumping process map ****************
377144 : ora_p001_ETL
100000000 93710K read/exec oracle
1100007ef 2372K read/write oracle
9fffffff0000000 44K read/exec /usr/ccs/bin/usla64
9fffffff000b30a 0K read/write /usr/ccs/bin/usla64
900000000375880 484K read/exec /usr/lib/liblvm.a[shr_64.o]
9001000a0121ac8 135K read/write /usr/lib/liblvm.a[shr_64.o]
900000000426180 74K read/exec /usr/lib/libcfg.a[shr_64.o]
9001000a00f5ed0 26K read/write /usr/lib/libcfg.a[shr_64.o]
90000000034e280 2K read/exec /usr/lib/libcrypt.a[shr_64.o]
9001000a00d4760 0K read/write /usr/lib/libcrypt.a[shr_64.o]
90000000069a7b0 4K read/exec /usr/lib/libc.a[aio_64.o]
9001000a022a340 0K read/write /usr/lib/libc.a[aio_64.o]
90000000035f300 85K read/exec /usr/lib/libodm.a[shr_64.o]
9001000a00d5c08 35K read/write /usr/lib/libodm.a[shr_64.o]
9000000003ef080 83K read/exec /usr/lib/libperfstat.a[shr_64.o]
9001000a01d0818 9K read/write /usr/lib/libperfstat.a[shr_64.o]
900000000358000 0K read/exec /usr/lib/libdl.a[shr_64.o]
9001000a0219000 0K read/write /usr/lib/libdl.a[shr_64.o]
9000000007ff100 8588K read/exec /oracle/product/10.2.0/lib/libjox10.a[shr.o]
8001000a0000038 585K read/write /oracle/product/10.2.0/lib/libjox10.a[shr.o]
9000000004a0000 228K read/exec /usr/lib/libpthreads.a[shr_xpg5_64.o]
9001000a0144000 558K read/write /usr/lib/libpthreads.a[shr_xpg5_64.o]
900000000045500 2966K read/exec /usr/lib/libc.a[shr_64.o]
9001000a0000788 844K read/write /usr/lib/libc.a[shr_64.o]
Total 110843K
******************* End of process map dump ****************
===================================================
PROCESS STATE
-------------
Process global information:
process: 7000000cf48bf98, call: 7000000b40ebdf0, xact: 7000000cdece7a8, curses: 7000000cf666540, usrses: 7000000cf666540
----------------------------------------
SO: 7000000cf48bf98, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=43, calls cur/top: 7000000b40ebdf0/7000000b40ebdf0, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 115 0 4
last post received-location: kslpsr
last process to post me: 7000000cf479c38 1 6
last post sent: 0 0 251
last post sent-location: kxfpqr: QC
last process posted by me: 7000000cf48a7f8 10 0
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 7000000cf50bd50
O/S info: user: oracle, term: UNKNOWN, ospid: 377144
OSD pid info: Unix process pid: 377144, image: oracle@etl_a (P001)
SO: 7000000cf666540, type: 4, owner: 7000000cf48bf98, flag: INIT/-/-/0x00
(session) sid: 262 trans: 7000000cdece7a8, creator: 7000000cf48bf98, flag: (c0100041) USR/- BSY/-/-/DEL/-/-
DID: 0001-0028-000FFB1E, short-term DID: 0001-002B-00151DA1
txn branch: 7000000cdf412c8
oct: 0, prv: 0, sql: 0, psql: 0, user: 26/ETL
O/S info: user: bo, term: pts/0, ospid: 377144, machine: etl_a
program: oracle@etl_a (P001)
last wait for 'SQL*Net message from dblink' blocking sess=0x0 seq=45 wait_time=476 seconds since wait started=0
driver id=0, #bytes=1, =0
Dumping Session Wait History
for 'SQL*Net message from dblink' count=1 wait_time=476
driver id=0, #bytes=1, =0
for 'SQL*Net message to dblink' count=1 wait_time=1
driver id=0, #bytes=1, =0
for 'log file sync' count=1 wait_time=7483
buffer#=ca9, =0, =0
for 'log file sync' count=1 wait_time=358
buffer#=ca9, =0, =0
for 'PX Deq: Execution Msg' count=1 wait_time=15527
sleeptime/senderid=1001ffff, passes=1, =0
for 'log file sync' count=1 wait_time=11855
buffer#=c57, =0, =0
for 'PX Deq: Execution Msg' count=1 wait_time=277142
sleeptime/senderid=1001ffff, passes=1, =0
for 'direct path write' count=1 wait_time=25
file number=5, first dba=103ca, block cnt=3
for 'direct path write' count=1 wait_time=6
file number=4, first dba=1abe1, block cnt=8
for 'row cache lock' count=1 wait_time=3307
cache id=5, mode=0, request=5
temporary object counter: 0
----------------------------------------
Virtual Thread:
kgskvt: 7000000c0e86b58, sess: 7000000cf666540, vc: 0, proc: 7000000cf48bf98
consumer group cur: OTHER_GROUPS (upd? 0), mapped: DEFAULT_CONSUMER_GROUP, orig:
vt_state: 0x200, vt_flags: 0x30, blkrun: 0
is_assigned: 1, in_sched: 0 (0)
vt_active: 0 (pending: 0)
used quanta: 0 (cg: 0)
cpu start time: 0, quantum status: 0x0
quantum checks to skip: 0, check thresh: 0
idle time: 0, active time: 0 (cg: 0)
cpu yields: 0 (cg: 0), waits: 0 (cg: 0), wait time: 0 (cg: 0)
queued time outs: 0, time: 0 (cur 0, cg 0)
calls aborted: 0, num est exec limit hit: 0
undo current: 0k max: 0k
----------------------------------------
UOL used : 0 locks(used=0, free=2)
KGX Atomic Operation Log 7000000bdcf0568
Mutex 0(0, 0) idn 0 oper NONE
Cursor Parent uid 262 efd 9 whr 11 slp 0
oper=NONE pt1=7000000c69d0878 pt2=7000000bef37090 pt3=7000000b47119c8
pt4=0 u41=1 stt=0
KGX Atomic Operation Log 7000000bdcf05b0
Mutex 7000000bdc2d028(0, 2) idn 0 oper NONE
Cursor Stat uid 262 efd 12 whr 1 slp 0
oper=NONE pt1=7000000bdc2cef8 pt2=0 pt3=0
pt4=0 u41=0 stt=0
KGX Atomic Operation Log 7000000bdcf05f8
Mutex 0(0, 0) idn 0 oper NONE
Library Cache uid 262 efd 0 whr 0 slp 0
----------------------------------------
SO: 7000000cf7e3ec8, type: 5, owner: 7000000cf666540, flag: INIT/-/-/0x00
(enqueue) DX-00000010-00000000 DID: 0001-002B-00151DA9
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0x7000000cf8bd0b8, mode: X, lock_flag: 0x0
own: 0x7000000cf666540, sess: 0x7000000cf666540, proc: 0x7000000cf48bf98, prv: 0x7000000cf8bd0c8
从trace文件名就可以看出出错的是某个并行子进程,其最近调用堆栈为ktcsod->ktcddt->kgeasi(报错),ktcsod与ktcddt均为Oracle中内核事务控制模块函数 (Kernel Transaction Control),又该并行子进程最近等待为"SQL*Net message from dblink"事件,共等待476s(wait_time=476 seconds)。trace显示该并行会话包含分布式事务锁:
SO: 7000000cf7e3ec8, type: 5, owner: 7000000cf666540, flag: INIT/-/-/0x00
(enqueue) DX-00000010-00000000 DID: 0001-002B-00151DA9
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0x7000000cf8bd0b8, mode: X, lock_flag: 0x0
own: 0x7000000cf666540, sess: 0x7000000cf666540, proc: 0x7000000cf48bf98, prv: 0x7000000cf8bd0c8
即DX(Distributed transaction entry)锁;可以猜测该并行会话在等待远程事务被清理,但等待超时,故报错。 查询metalink可以发现Bug 5223587与该错误一致,该Bug的特征为: 1. 出现ORA-600[4400],且调用栈为ktcddt ,说明涉及到了分布式事务 2. trace显示当时无打开的游标 3. 可以看到当时的等待为'SQL*Net message from dblink'事件 Metalink文档[ID