某行综合报表平台自上线以来,ETL耗时越来越长,日批处理耗时约为4小时30分钟到6小时,月批处理耗时约为12小时到14小时,并且耗时越来越长,RAC节点间流量达到80M/S,冲突非常大,CPU很高。
库的问题非常多,比如该sql:
Elapsed Time (s) |
CPU Time (s) |
Executions |
Elap per Exec (s) |
% Total DB Time |
SQL Id |
SQL Module |
SQL Text |
19,578 |
658 |
3 |
6525.84 |
2.31 |
|
select * from ( select row$_.*... |
可见该语句运行了6525.84秒,将近2个小时。该语句非常长,我们仅截取其中一小段核心语句进行深入分析:
select sum(nvl(a.LOAN_BAL, 0)) as O4, a.DUE_BILL_ID as F4, SUBSTR(a.ORG_CD, 1, 6) as D4, SUBSTR(a.ORG_CD, 1, 4) as B4, SUBSTR(a.ORG_CD, 1, 2) as A4 from ADBCDW.AC_HB_DUE_BILL_NPL_MONTH a where (SUBSTR(a.RPT_END_DATE, 1, 6) = '201112') and (a.ORG_CD like '3599%') and ((1 > 2 OR a.ORG_CD like '3599%') AND (SUBSTR(a.NPL_FORM_DATE, 1, 4) < '2012')) group by a.DUE_BILL_ID, SUBSTR(a.ORG_CD, 1, 6), SUBSTR(a.ORG_CD, 1, 4), SUBSTR(a.ORG_CD, 1, 2) PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------------- Plan hash value: 2416238426 ----------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 192 | 69364 (2)| 00:20:49 | | | | 1 | HASH GROUP BY | | 3 | 192 | 69364 (2)| 00:20:49 | | | | 2 | PARTITION RANGE ALL| | 4 | 256 | 69363 (2)| 00:20:49 | 1 | 38 | |* 3 | TABLE ACCESS FULL | AC_HB_DUE_BILL_NPL_MONTH | 4 | 256 | 69363 (2)| 00:20:49 | 1 | 38 | ----------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("A"."ORG_CD" LIKE '3599%' AND "A"."ORG_CD" LIKE '3599%' AND SUBSTR("A"."RPT_END_DATE",1,6)='201112' AND SUBSTR("A"."NPL_FORM_DATE",1,4)<'2012')
该sql的问题如下:
1. 该表按RPT_END_DATE字段进行了按月的范围分区,但RPT_END_DATE字段设计为VARCHAR2(8)。如按月查询,语句中将写成SUBSTR(a.RPT_END_DATE, 1, 6) = '201112'。这样,将导致Oracle无法采用分区裁剪功能,也无法使用建立在RPT_END_DATE字段之上的索引,最终导致全表扫描。
2.表分区极不合理,分区数据量不一,有的几千条,有的4K多万条,索引建了一堆,没有几个有用的。
对该sql进行了初步的优化,梳理索引后,优化对比如下:
|
优化前 |
优化后 |
|
|
|
响应速度 |
00:04:08.24 |
00:00:00.97 |
内存消耗 |
153017 *32K = 4,781M |
370*32K=11.56M |
I/O消耗 |
109014 *32K= 3,406M |
242*32K=7.56M |
可见很多性能问题都是设计不合理、开发不合理导致。该库经过彻底优化、索引梳理,后又经过存储升级,迁移至sam后,性能问题基本解决。