客户的数据库alert报错如下:
Wed Aug 28 16:12:33 BEIST 2013
Errors in file /home/oracle/app/admin/ccdb/udump/ccdb1_ora_303156.trc:
ORA-00600: 内部错误代码, 参数: [kcblasm_1], [103], [], [], [], [], [], []
Wed Aug 28 16:12:37 BEIST 2013
Trace dumping is performing id=[cdmp_20130828161237]
呵呵,ORA-600错误,查看dump文件
[oracle@ccdb01]$more /home/oracle/app/admin/ccdb/udump/ccdb1_ora_303156.trc
发现错误发生时候的正在执行的sql
Current SQL statement for this session:
select t.GSDM,t.XSDH,t.MDDM,t.JZRQ,t.FPH,t.DJLX,t.GKXM,t.KPRQ,t.YYYID,t.SKYID,t.XSJE,BZ,t.QKJE,XSLB,t.SKYSSBM,t.THHBJ,t.THHLX,t.THHYY,t.DGYDM,t.LSKBJ,t.SKRQ,
t.HYKH,t.SPJF,t.GKSJ,t.QTDH,t.AZBJ,t.SHFS,t.KHDZ_CS,t.KHDZ_QX,t.KHDZ_XZ,t.KHDZ_HN,t.KHDZ_QC,t.KHDZ_JTDD,t.KHDZ_XXDZ,t.JLJDH,t.KHBM,d.thdh,d.zpbj,d.xssl,d.gys
,d.kcbj,d.lsdj,d.cxje,d.ythdh,d.spbm,d.zspthdh,d.fhrq,d.fhr,d.ywlx,d.ysbj from XSDD t, xsspmx d where t.gsdm=d.gsdm and t.xsdh = d.xsdh and t.KPRQ between to
_date('2011-08-09','yyyy-mm-dd') and to_date( '2013-08-28','yyyy-mm-dd') and t.GSDM= '2401'
看样子很恐怖啊
[oracle@ccdb01]$sqlplus "/as sysdba"
SQL*Plus: Release 10.2.0.5.0 - Production on Thu Aug 29 10:31:30 2013
Copyright (c) 1982, 2010, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
SQL> select OWNER,TABLE_NAME,NUM_ROWS,BLOCKS,EMPTY_BLOCKS,TABLE_LOCK from dba_tables
2 where TABLE_NAME = 'XSSPMX';
OWNER TABLE_NAME NUM_ROWS
------------------------------ ------------------------------ ----------
BLOCKS EMPTY_BLOCKS TABLE_LO
---------- ------------ --------
JYCC XSSPMX 108855862
2406631 0 ENABLED
SQL> select OWNER,TABLE_NAME,NUM_ROWS,BLOCKS,EMPTY_BLOCKS,TABLE_LOCK from dba_tables where TABLE_NAME = 'XSDD';
OWNER TABLE_NAME NUM_ROWS
------------------------------ ------------------------------ ----------
BLOCKS EMPTY_BLOCKS TABLE_LO
---------- ------------ --------
JYCC XSDD 103267707
2816593 0 ENABLED
继续跟踪检查sql语句执行计划:
explain plan for
select t.GSDM,t.XSDH,t.MDDM,t.JZRQ,t.FPH,t.DJLX,t.GKXM,t.KPRQ,t.YYYID,t.SKYID,t.XSJE,BZ,t.QKJE,XSLB,t.SKYSSBM,t.THHBJ,t.THHLX,t.THHYY,t.DGYDM,t.LSKBJ,t.SKRQ,
t.HYKH,t.SPJF,t.GKSJ,t.QTDH,t.AZBJ,t.SHFS,t.KHDZ_CS,t.KHDZ_QX,t.KHDZ_XZ,t.KHDZ_HN,t.KHDZ_QC,t.KHDZ_JTDD,t.KHDZ_XXDZ,t.JLJDH,t.KHBM,d.thdh,d.zpbj,d.xssl,d.gys
,d.kcbj,d.lsdj,d.cxje,d.ythdh,d.spbm,d.zspthdh,d.fhrq,d.fhr,d.ywlx,d.ysbj
from JYCC.XSDD t, JYCC.xsspmx d
where t.gsdm=d.gsdm and t.xsdh = d.xsdh
and t.KPRQ between to_date('2011-08-09','yyyy-mm-dd')
and to_date( '2013-08-28','yyyy-mm-dd')
and t.GSDM= '2401';
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1923470307
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2535K| 672M| | 1180K (1)| 03:56:11 |
|* 1 | HASH JOIN | | 2535K| 672M| 210M| 1180K (1)| 03:56:11 |
|* 2 | TABLE ACCESS FULL | XSSPMX | 2142K| 185M| | 530K (1)| 01:46:04 |
|* 3 | TABLE ACCESS FULL | XSDD | 1949K| 347M| | 621K (1)| 02:04:20 |
-------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T"."GSDM"="D"."GSDM" AND "T"."XSDH"="D"."XSDH")
2 - filter("D"."GSDM"='2401')
3 - filter("T"."GSDM"='2401' AND "T"."KPRQ">=TO_DATE(' 2011-08-09
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T"."KPRQ"<=TO_DATE(' 2013-08-28
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
19 rows selected.
这是一条很BT的sql语句,触发bug也是正常,引起ORA-600错误也不足为奇了。可以发现:hash join时候,Direct IO最大限制4096,从执行计划中可以看出,hash join的build table表的cardinality非常大。
解决办法,肯定是优化sql语句是最好的。
查询MOS:该问题符合ORA-600 [kcblasm_1] In 10.2.0.5. (文档 ID 1133845.1)的描述, …
而且Bug 9781592 : ORA-600 [KCBLASM_1] [103] DURING HASH JOIN QUERY USE ON DATABASE IN DWH ,ORA-600 [kcblasm_1]表现特征为
ALERT日志中:
ORA-00600: internal error code, arguments: [kcblasm_1], [103], [], [], [], [], [], []
dump文件中call stack和以下类似
kcblasm1 kcblasm kxhfFndFreeSlot kxhfNewBuffer qerhjGetNewBuffer ksxb1bqb kxhrPack
This problem is hit only in 10.2.0.5, up to PSU 10.2.0.5.4 in which problem is fixed.
Problem is affecting 10.2.0.5 to 10.2.0.5.3.
将数据库升级psu到10.2.0.5.4和11.2可以修正该问题,对于10.2.0.5.0到10.2.0.5.3的版本,也可以直接打PATCH 7612454来避免改错误。
临时的解决方法:
此外是因为hash join造成的设置_hash_join_enabled为 false
如果是hash group by造成的设置_gby_hash_aggregation_enable为false
因此对于这次错误,是因为hash join造成的设置
alter session set "_hash_join_enabled"= false; or hint /*+ opt_param(‘_gby_hash_aggregation_enabled’,'false')
来临时解决。
但是,根本的解决问题的方法
1.优化超级BT的sql语句,避免遇到bug;
2.升级
(1)将数据库升级psu到10.2.0.5.4和11.2可以修正该问题
(2)对于10.2.0.5.0到10.2.0.5.3的版本,打PATCH 7612454来避免改错误(该补丁替换lib中的kcbl.o文件)。