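Background: a custom script that scans the database for potentially risky SQL flagged the statement below for several risks. Its execution time was 33 minutes.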
SELECT T1.DATA_DT,
T1.BRANCH_NO,
T5.FINANCE_ORG_NO,
DECODE(T2.CUST_TYP, '01', '1', '02', '0', NULL),
SUBSTR(T1.KEY_1, 4, 16),
SUBSTR(T1.KEY_1, 4, 16),
T4.PROD_TYP_CD,
TO_VALID_DATE(2415020 + T1.ACCT_OPEN_DT),
CASE
WHEN T4.PROD_CD IN ('31060010', '41060010', '31060020', '41170010') THEN
TO_DATE('19990101', 'YYYYMMDD')
WHEN T4.PROD_CD IN ('41090310', '41090110', '41090010', '41080010',
'31260010', '31080020', '31080010', '41160010', '31090310', '31090110',
'31090010','31090410') THEN
TO_DATE('19990107', 'YYYYMMDD')
WHEN T4.PROD_CD = '41110010' THEN
NULL
ELSE
TO_DATE(SUBSTR(T1.OD_VISA_AREA, 23, 8), 'YYYY-MM-DD')
END,
T1.CURRENCY,
T1.CURR_BAL,
CASE
WHEN T4.PROD_TYP_CD IN ('D02', 'D03') THEN
'RF02'
ELSE
'RF01'
END,
CASE T1.VAR_INT_RATE
WHEN 0 THEN
(SELECT B.RATE
FROM S_CCC_MMMM_H_TTTT B
WHERE B.BASE_ID = SUBSTR(T3.CR_ID_DET, 1, 4)
AND B.DATA_DT = TO_DATE('2017-12-31', 'YYYY-MM-DD')
AND TRIM(B.RATE_ID) = SUBSTR(T3.CR_ID_DET, 5, 3)
AND TO_CHAR(TO_VALID_DATE(2415020 + (999999999 - B.BASM_DATE)),
'YYYYMMDD') || B.BASM_TIME =
(SELECT MAX(TO_CHAR(TO_VALID_DATE(2415020 +
(999999999 - S.BASM_DATE)),
'YYYYMMDD') || S.BASM_TIME)
FROM S_CCC_MMMM_H_TTTT S
WHERE S.BASE_ID = B.BASE_ID
AND S.DATA_DT = TO_DATE('2017-12-31', 'YYYY-MM-DD')
AND S.RATE_ID = B.RATE_ID
AND TO_VALID_DATE(2415020 + (999999999 - S.BASM_DATE)) <=
TO_DATE('2017-12-31', 'YYYY-MM-DD')))
ELSE
T1.VAR_INT_RATE
END,
T2.CUST_NM,
''
FROM S_CCC_MMMM T1
LEFT JOIN S_I_CCCC_OOOO T2 ON LPAD(T2.CUST_NUM, 16, '0') =
T1.CUSTOMER_NO
INNER JOIN S_SSS_DDDD T3 ON T3.SYS = 'INV' -- INV means deposits
AND T3.TYPE = T1.ACCT_TYPE
AND T3.INT_CAT = T1.INT_CAT
INNER JOIN M_CCCC_DDD_CCC_SSS_TMP T4 ON T1.ACCT_TYPE || T1.INT_CAT = T4.PROD_CD
AND T4.PROD_TYP_CD <> 'D051'
AND T4.BIZ_CLS_CD <> '1'
INNER JOIN JRTJAPP.RRR_OOO_OOOO T5 ON T1.BRANCH_NO = T5.ORG_NO
WHERE T1.CURR_BAL <> 0
AND T1.DATA_DT = TO_DATE('2017-12-31', 'YYYY-MM-DD')
AND T1.CURRENCY = 'CNY'
AND SUBSTR(T1.GL_CLASS_CODE, 10, 8) <> '13090101';
Plan hash value: 1747367332
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                        |     1 |   407 |   538K  (1)| 01:47:38 |       |       |
|*  1 |  FILTER                       |                        |       |       |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE      |                        |     1 |    72 |     13  (0)| 00:00:01 |     2 |     2 |
|*  3 |    TABLE ACCESS FULL          | S_CCC_MMMM_H_TTTT      |     1 |    72 |     13  (0)| 00:00:01 |     2 |     2 |
|   4 |   SORT AGGREGATE              |                        |     1 |    59 |            |          |       |       |
|   5 |    PARTITION RANGE SINGLE     |                        |     1 |    59 |     13  (0)| 00:00:01 |     2 |     2 |
|*  6 |     TABLE ACCESS FULL         | S_CCC_MMMM_H_TTTT      |     1 |    59 |     13  (0)| 00:00:01 |     2 |     2 |
|*  7 |  HASH JOIN OUTER              |                        |     1 |   407 |   538K  (1)| 01:47:38 |       |       |
|   8 |   NESTED LOOPS                |                        |       |       |            |          |       |       |
|   9 |    NESTED LOOPS               |                        |     1 |   387 |   490K  (1)| 01:38:03 |       |       |
|* 10 |     HASH JOIN                 |                        |     1 |   377 |   490K  (1)| 01:38:03 |       |       |
|* 11 |      HASH JOIN                |                        |     1 |   354 |   490K  (1)| 01:38:03 |       |       |
|* 12 |       TABLE ACCESS FULL       | S_CCC_MMMM             |     1 |   313 |   490K  (1)| 01:38:03 |       |       |
|* 13 |       TABLE ACCESS FULL       | M_CCCC_DDD_CCC_SSS_TMP |   158 |  6478 |      5  (0)| 00:00:01 |       |       |
|* 14 |      TABLE ACCESS FULL        | S_SSS_DDDD             |  1100 | 25300 |      7  (0)| 00:00:01 |       |       |
|* 15 |     INDEX RANGE SCAN          | PK_RRR_OOO_OOOO        |     1 |       |      1  (0)| 00:00:01 |       |       |
|  16 |    TABLE ACCESS BY INDEX ROWID| RRR_OOO_OOOO           |     1 |    10 |      2  (0)| 00:00:01 |       |       |
|  17 |   TABLE ACCESS FULL           | S_I_CCCC_OOOO          |   23M |  456M |  47781  (2)| 00:09:34 |       |       |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TO_CHAR("TO_VALID_DATE"(2415020+(999999999-"B"."BASM_DATE")),'YYYYMMDD')||TO_CHAR("B"."BASM_TIME")
= (SELECT MAX(TO_CHAR("TO_VALID_DATE"(2415020+(999999999-"S"."BASM_DATE")),'YYYYMMDD')||TO_CHAR("S"."BASM_TIME")
) FROM "S_CCC_MMMM_H_TTTT" "S" WHERE "S"."BASE_ID"=:B1 AND "S"."DATA_DT"=TO_DATE(' 2017-12-31 00:00:00',
'syyyy-mm-dd hh24:mi:ss') AND "S"."RATE_ID"=:B2 AND "TO_VALID_DATE"(2415020+(999999999-"S"."BASM_DATE"))<=TO_DAT
E(' 2017-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
3 - filter("B"."DATA_DT"=TO_DATE(' 2017-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"B"."BASE_ID"=SUBSTR(:B1,1,4) AND TRIM("B"."RATE_ID")=SUBSTR(:B2,5,3))
6 - filter("S"."BASE_ID"=:B1 AND "S"."DATA_DT"=TO_DATE(' 2017-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"S"."RATE_ID"=:B2 AND "TO_VALID_DATE"(2415020+(999999999-"S"."BASM_DATE"))<=TO_DATE(' 2017-12-31 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
7 - access("T1"."CUSTOMER_NO"=LPAD("T2"."CUST_NUM"(+),16,'0'))
10 - access("T3"."TYPE"="T1"."ACCT_TYPE" AND "T3"."INT_CAT"="T1"."INT_CAT")
11 - access("T4"."PROD_CD"="T1"."ACCT_TYPE"||"T1"."INT_CAT")
12 - filter("T1"."DATA_DT"=TO_DATE(' 2017-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T1"."CURR_BAL"<>0
AND SUBSTR("T1"."GL_CLASS_CODE",10,8)<>'13090101' AND "T1"."CURRENCY"='CNY')
13 - filter("T4"."PROD_TYP_CD"<>'D051' AND "T4"."BIZ_CLS_CD"<>'1')
14 - filter("T3"."SYS"='INV')
15 - access("T1"."BRANCH_NO"="T5"."ORG_NO")
Note
-----
- dynamic sampling used for this statement (level=2)
First, use the custom script to collect some information about the tables and indexes the SQL touches:
OWNER    OBJECT_TYPE     OBJECT_NAME             ALIAS    PARTITIONED  SIZE_MB  NUM_ROWS  ESTIMATE_P  LAST_ANALYZED  STATUS
-------  --------------  ----------------------  -------  -----------  -------  --------  ----------  -------------  ---------------
JRTJ     TABLE           M_CCCC_DDD_CCC_SSS_TMP  T4       NO             0.125            %                          stats stale
JRTJAPP  TABLE           RRR_OOO_OOOO            T5       NO             0.375      3043  100%        2018/1/2 19:2  stats not stale
JRTJ     TABLE           S_CCC_MMMM_H_TTTT       B | S    YES            1.125      6867  100%        2018/1/2 19:2  stats not stale
JRTJ     TABLE           S_SSS_DDDD              T3       NO            0.1875      1100  100%        2018/1/2 19:1  stats not stale
JRTJ     TABLE           S_CCC_MMMM              T1       NO             14144  36104177  100%        2018/1/2 19:2  stats not stale
JRTJ     TABLE           S_I_CCCC_OOOO           T2       NO              1411  23955089  100%        2018/1/2 19:2  stats not stale
JRTJAPP  INDEX (UNIQUE)  PK_RRR_OOO_OOOO         NOALIAS  NO             0.125      3043  100%        2018/1/2 19:2  stats not stale
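The script that produced this listing is not shown. As a rough sketch of where similar columns can come from, and assuming read access to the DBA_TAB_STATISTICS dictionary view, a query along these lines reports row counts, sample size, last analysis time and the stale flag for the tables involved. The DBMS_STATS call afterwards is one possible way to refresh the table reported as stale; whether that is appropriate for a _TMP table in this environment is an assumption.

```sql
-- Sketch only: this is not the author's script.
-- Reads optimizer statistics metadata for the tables used by the statement.
SELECT owner,
       table_name,
       num_rows,
       sample_size,
       last_analyzed,
       stale_stats            -- 'YES' when Oracle considers the statistics stale
  FROM dba_tab_statistics
 WHERE owner IN ('JRTJ', 'JRTJAPP')
   AND table_name IN ('S_CCC_MMMM', 'S_I_CCCC_OOOO', 'S_SSS_DDDD',
                      'M_CCCC_DDD_CCC_SSS_TMP', 'S_CCC_MMMM_H_TTTT',
                      'RRR_OOO_OOOO')
   AND object_type = 'TABLE'; -- table-level rows only, no partition rows

-- Assumption: gathering statistics on the stale table is acceptable here.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'JRTJ',
                                tabname => 'M_CCCC_DDD_CCC_SSS_TMP');
END;
/
```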
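Analysis: the way the statement is written and its execution plan point to three possible sources of performance trouble: the scalar subquery, the FILTER with two child nodes, and the nested loops.
I. Scalar subquery
1. How to spot a scalar subquery
A subquery that sits in the SELECT list (after SELECT and before FROM) and is correlated with the main query is a scalar subquery.
In the execution plan it shows up as a step that has a sibling ID and whose parent is not a join operation (NESTED LOOPS, HASH JOIN, SORT MERGE JOIN and so on). See ID=1 and ID=7 in the plan above.
2. What is wrong with scalar subqueries
A scalar subquery is executed once for each row the main query returns (Oracle can reuse cached results for repeated input values, but the row count still drives the cost). Its effect on performance therefore depends mainly on how many rows the main query returns.
As a rule of thumb, with fewer than 1,000 rows from the main query the scalar subquery has no noticeable effect. Problems only start above about 10,000 rows (this depends on the database server; on very fast hardware the threshold is closer to 100,000).
So the drawback is clear: the cost of the whole statement changes with the size of the main query's result set, and if that result set grows over time, the performance problem surfaces only gradually.
Once it does cause a performance problem, the only fix is to rewrite the SQL. There is nothing else to tune.
3. How to tell whether the scalar subquery is hurting this SQL
For a quick check, comment out the scalar subquery and run the SQL again to see whether it gets faster.
For an accurate check, run a count(*) on the main query and see whether the row count exceeds the threshold (10,000).
Both methods take a lot of time, so the scalar subquery is usually judged last, by elimination.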
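The count(*) check can be sketched directly against the tables of this statement. The query below keeps the joins and filters of the main query and drops the SELECT list. It is an illustration under two assumptions: the LEFT JOIN to S_I_CCCC_OOOO (T2) matches at most one row per account and can be left out, and the subquery is only needed for rows with VAR_INT_RATE = 0 because it sits in the WHEN 0 branch of the CASE.

```sql
-- Sketch: how many rows does the main query return, and for how many of
-- them would the scalar subquery be needed (VAR_INT_RATE = 0)?
-- Assumption: T2 (LEFT JOIN) does not multiply rows, so it is omitted.
SELECT COUNT(*)                                        AS main_rows,
       COUNT(CASE WHEN T1.VAR_INT_RATE = 0 THEN 1 END) AS subquery_rows
  FROM S_CCC_MMMM T1
 INNER JOIN S_SSS_DDDD T3
    ON T3.SYS = 'INV'
   AND T3.TYPE = T1.ACCT_TYPE
   AND T3.INT_CAT = T1.INT_CAT
 INNER JOIN M_CCCC_DDD_CCC_SSS_TMP T4
    ON T1.ACCT_TYPE || T1.INT_CAT = T4.PROD_CD
   AND T4.PROD_TYP_CD <> 'D051'
   AND T4.BIZ_CLS_CD <> '1'
 INNER JOIN JRTJAPP.RRR_OOO_OOOO T5
    ON T1.BRANCH_NO = T5.ORG_NO
 WHERE T1.CURR_BAL <> 0
   AND T1.DATA_DT = TO_DATE('2017-12-31', 'YYYY-MM-DD')
   AND T1.CURRENCY = 'CNY'
   AND SUBSTR(T1.GL_CLASS_CODE, 10, 8) <> '13090101';
```

If subquery_rows stays well below the 10,000 threshold, the scalar subquery is unlikely to be the bottleneck.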
4. How to fix a performance problem caused by a scalar subquery
Rewrite the scalar subquery as an equivalent outer join.
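A minimal sketch of such a rewrite, on hypothetical tables ORDERS and CUSTOMERS that are not part of this case. It assumes CUSTOMERS.cust_id is unique. Without that guarantee the two forms differ: the scalar subquery fails with ORA-01427 when it finds more than one row, while the join silently returns duplicates.

```sql
-- Hypothetical tables: ORDERS(order_id, cust_id), CUSTOMERS(cust_id, cust_name).
-- Assumption: CUSTOMERS.cust_id is unique, so the join cannot multiply rows.

-- Before: the subquery in the SELECT list is evaluated per ORDERS row.
SELECT o.order_id,
       (SELECT c.cust_name
          FROM customers c
         WHERE c.cust_id = o.cust_id) AS cust_name
  FROM orders o;

-- After: the same result as an outer join, which the optimizer is free to
-- run as one hash join. LEFT (not INNER) keeps ORDERS rows without a match,
-- mirroring the NULL the scalar subquery would have returned.
SELECT o.order_id,
       c.cust_name
  FROM orders o
  LEFT JOIN customers c
    ON c.cust_id = o.cust_id;
```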
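II. FILTER with two child nodes
A FILTER with at most one child ID in the execution plan is just a simple filter and causes no performance problem.
With two or more child IDs, FILTER acts as a join method similar to nested loops. Here ID=1 FILTER has two children, ID=2 and ID=4.
This means that if ID=2 PARTITION RANGE SINGLE returns N rows,
ID=4 SORT AGGREGATE is executed N times. In other words, the leaf node under ID=4, ID=6 TABLE ACCESS FULL | S_CCC_MMMM_H_TTTT, is executed N times.
The threshold for this N is again 10,000.
So we need to know how many rows ID=3 TABLE ACCESS FULL | S_CCC_MMMM_H_TTTT returns.
The listing above shows that NUM_ROWS of S_CCC_MMMM_H_TTTT is 6867,
and the table is filtered as well: 3 - filter("B"."DATA_DT"=TO_DATE(' 2017-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"B"."BASE_ID"=SUBSTR(:B1,1,4) AND TRIM("B"."RATE_ID")=SUBSTR(:B2,5,3))
So ID=3 certainly returns fewer than 10,000 rows, and so does its parent, ID=2.
As analysed above, ID=6 TABLE ACCESS FULL | S_CCC_MMMM_H_TTTT under ID=4 is executed N times. Since N < 10,000, there is no performance problem here.
If the table at ID=6 were large, however, this would still be a serious problem (a single TABLE ACCESS FULL on a large table is hard to accept, let alone several thousand of them).
In this case the table at ID=6, S_CCC_MMMM_H_TTTT, is small, only about 1 MB. Otherwise an index on its join columns would be needed to avoid thousands of TABLE ACCESS FULL operations.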
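For the case where the table were large, such an index could look like the sketch below. It was not created in this tuning exercise, and the index name is hypothetical. The columns come from the ID=6 predicates ("S"."BASE_ID"=:B1 AND "S"."RATE_ID"=:B2); LOCAL is chosen because the listing shows the table as partitioned.

```sql
-- Sketch only: hypothetical index name, not part of the original fix.
-- ID=6 filters on BASE_ID and RATE_ID, so both columns can be used there.
-- ID=3 wraps RATE_ID in TRIM(), so only the leading BASE_ID column would
-- help that step.
CREATE INDEX IDX_H_TTTT_BASE_RATE
    ON JRTJ.S_CCC_MMMM_H_TTTT (BASE_ID, RATE_ID)
    LOCAL;
```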
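III. The driving row source of the nested loops may return a very large result set
The driving row source of the ID=9 NESTED LOOPS is the result of the ID=10 HASH JOIN (a large result set here easily becomes a performance problem). Look at ID=11, 12 and 13.
ID=12: table S_CCC_MMMM has NUM_ROWS of 36104177. It is a fact table.
ID=13: table M_CCCC_DDD_CCC_SSS_TMP was checked and is small. It is a parameter table. When a fact table is joined to a parameter table, the result set is roughly as large as the fact table's own row count (the parameter table filters almost nothing).
The fact table does have its own filter, 12 - filter("T1"."DATA_DT"=TO_DATE(' 2017-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T1"."CURR_BAL"<>0
AND SUBSTR("T1"."GL_CLASS_CODE",10,8)<>'13090101' AND "T1"."CURRENCY"='CNY'), but what comes back is still on the order of 100,000 rows. The plan estimates Rows=1 for this step, which explains why the optimizer picked nested loops. In reality the driving row source returns 100,000+ rows
and the driven table is accessed 100,000+ times. This is the real bottleneck. Changing the NESTED LOOPS to a HASH JOIN is all that is needed.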
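One way to confirm the actual size of the driving row source, rather than infer it, is to collect runtime row counts. The sketch below counts the T1, T4, T3 join that feeds the nested loops and then prints the plan with estimated and actual rows side by side. It uses the GATHER_PLAN_STATISTICS hint and DBMS_XPLAN.DISPLAY_CURSOR and assumes a SQL*Plus session with SELECT privilege on the V$ views that DISPLAY_CURSOR reads.

```sql
-- In SQL*Plus, SERVEROUTPUT must be off so that DISPLAY_CURSOR with NULL
-- arguments picks up the statement that was just executed.
SET SERVEROUTPUT OFF

-- Step 1: run the driving join once with runtime statistics collection.
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)
  FROM S_CCC_MMMM T1
 INNER JOIN S_SSS_DDDD T3
    ON T3.SYS = 'INV'
   AND T3.TYPE = T1.ACCT_TYPE
   AND T3.INT_CAT = T1.INT_CAT
 INNER JOIN M_CCCC_DDD_CCC_SSS_TMP T4
    ON T1.ACCT_TYPE || T1.INT_CAT = T4.PROD_CD
   AND T4.PROD_TYP_CD <> 'D051'
   AND T4.BIZ_CLS_CD <> '1'
 WHERE T1.CURR_BAL <> 0
   AND T1.DATA_DT = TO_DATE('2017-12-31', 'YYYY-MM-DD')
   AND T1.CURRENCY = 'CNY'
   AND SUBSTR(T1.GL_CLASS_CODE, 10, 8) <> '13090101';

-- Step 2: show the plan of that statement with Starts, E-Rows and A-Rows.
SELECT *
  FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

A large gap between E-Rows and A-Rows on the S_CCC_MMMM step would back up the reading that the nested loops were chosen on a bad estimate.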
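IV. Using hints
First, pay attention to the format and the spaces. The tables inside USE_HASH can be separated by commas or by spaces.
Then the join order. USE_HASH(T1,T4,T3,T5) on its own sometimes cannot control the order, so ORDERED is added to force the join order. With the wrong order a Cartesian product can appear.
For example, with WHERE A.ID=B.ID AND A.ID=C.ID, if B and C are joined first and the result is then joined to A, the missing join condition between B and C produces a Cartesian product. The chance is small, and it only happens when statistics are stale.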
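The A, B, C example can be written out as follows. The tables are hypothetical, each with an ID column. The first form relies on ORDERED, which follows the order of the FROM clause. The second uses the LEADING hint to state the order inside the hint itself. Both are sketches of the idea, not statements taken from this case.

```sql
-- Hypothetical tables A, B, C, each with an ID column.
-- ORDERED joins in FROM-clause order: A first, then B, then C, so every
-- join step has a join predicate back to A and no Cartesian product arises.
-- USE_HASH(B C) asks for a hash join when B and C are joined in.
SELECT /*+ ORDERED USE_HASH(B C) */
       A.ID
  FROM A, B, C
 WHERE A.ID = B.ID
   AND A.ID = C.ID;

-- Alternative: LEADING spells out the join order in the hint,
-- independent of how the FROM clause happens to be written.
SELECT /*+ LEADING(A B C) USE_HASH(B C) */
       A.ID
  FROM A, B, C
 WHERE A.ID = B.ID
   AND A.ID = C.ID;
```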
So the optimized SQL is as follows:
SELECT /*+ USE_HASH(T1,T4,T3,T5) ORDERED */ T1.DATA_DT,
T1.BRANCH_NO,
T5.FINANCE_ORG_NO,
DECODE(T2.CUST_TYP, '01', '1', '02', '0', NULL),
SUBSTR(T1.KEY_1, 4, 16),
SUBSTR(T1.KEY_1, 4, 16),
T4.PROD_TYP_CD,
TO_VALID_DATE(2415020 + T1.ACCT_OPEN_DT),
CASE
WHEN T4.PROD_CD IN ('31060010', '41060010', '31060020', '41170010') THEN
TO_DATE('19990101', 'YYYYMMDD')
WHEN T4.PROD_CD IN ('41090310', '41090110', '41090010', '41080010',
'31260010', '31080020', '31080010', '41160010', '31090310', '31090110',
'31090010','31090410') THEN
TO_DATE('19990107', 'YYYYMMDD')
WHEN T4.PROD_CD = '41110010' THEN
NULL
ELSE
TO_DATE(SUBSTR(T1.OD_VISA_AREA, 23, 8), 'YYYY-MM-DD')
END,
T1.CURRENCY,
T1.CURR_BAL,
CASE
WHEN T4.PROD_TYP_CD IN ('D02', 'D03') THEN
'RF02'
ELSE
'RF01'
END,
CASE T1.VAR_INT_RATE
WHEN 0 THEN
(SELECT B.RATE
FROM S_CCC_MMMM_H_TTTT B
WHERE B.BASE_ID = SUBSTR(T3.CR_ID_DET, 1, 4)
AND B.DATA_DT = TO_DATE('2017-12-31', 'YYYY-MM-DD')
AND TRIM(B.RATE_ID) = SUBSTR(T3.CR_ID_DET, 5, 3)
AND TO_CHAR(TO_VALID_DATE(2415020 + (999999999 - B.BASM_DATE)),
'YYYYMMDD') || B.BASM_TIME =
(SELECT MAX(TO_CHAR(TO_VALID_DATE(2415020 +
(999999999 - S.BASM_DATE)),
'YYYYMMDD') || S.BASM_TIME)
FROM S_CCC_MMMM_H_TTTT S
WHERE S.BASE_ID = B.BASE_ID
AND S.DATA_DT = TO_DATE('2017-12-31', 'YYYY-MM-DD')
AND S.RATE_ID = B.RATE_ID
AND TO_VALID_DATE(2415020 + (999999999 - S.BASM_DATE)) <=
TO_DATE('2017-12-31', 'YYYY-MM-DD')))
ELSE
T1.VAR_INT_RATE
END,
T2.CUST_NM,
''
FROM S_CCC_MMMM T1
LEFT JOIN S_I_CCCC_OOOO T2 ON LPAD(T2.CUST_NUM, 16, '0') =
T1.CUSTOMER_NO
INNER JOIN S_SSS_DDDD T3 ON T3.SYS = 'INV' -- INV means deposits
AND T3.TYPE = T1.ACCT_TYPE
AND T3.INT_CAT = T1.INT_CAT
INNER JOIN M_CCCC_DDD_CCC_SSS_TMP T4 ON T1.ACCT_TYPE || T1.INT_CAT = T4.PROD_CD
AND T4.PROD_TYP_CD <> 'D051'
AND T4.BIZ_CLS_CD <> '1'
INNER JOIN JRTJAPP.RRR_OOO_OOOO T5 ON T1.BRANCH_NO = T5.ORG_NO
WHERE T1.CURR_BAL <> 0
AND T1.DATA_DT = TO_DATE('2017-12-31', 'YYYY-MM-DD')
AND T1.CURRENCY = 'CNY'
AND SUBSTR(T1.GL_CLASS_CODE, 10, 8) <> '13090101';
Plan hash value: 3497671024
-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                        |     1 |   407 |   538K  (1)| 01:47:38 |       |       |
|*  1 |  FILTER                  |                        |       |       |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE |                        |     1 |    72 |     13  (0)| 00:00:01 |     2 |     2 |
|*  3 |    TABLE ACCESS FULL     | S_CCC_MMMM_H_TTTT      |     1 |    72 |     13  (0)| 00:00:01 |     2 |     2 |
|   4 |   SORT AGGREGATE         |                        |     1 |    59 |            |          |       |       |
|   5 |    PARTITION RANGE SINGLE|                        |     1 |    59 |     13  (0)| 00:00:01 |     2 |     2 |
|*  6 |     TABLE ACCESS FULL    | S_CCC_MMMM_H_TTTT      |     1 |    59 |     13  (0)| 00:00:01 |     2 |     2 |
|*  7 |  HASH JOIN OUTER         |                        |     1 |   407 |   538K  (1)| 01:47:38 |       |       |
|*  8 |   HASH JOIN              |                        |     1 |   387 |   490K  (1)| 01:38:03 |       |       |
|*  9 |    HASH JOIN             |                        |     1 |   377 |   490K  (1)| 01:38:03 |       |       |
|* 10 |     HASH JOIN            |                        |     1 |   354 |   490K  (1)| 01:38:03 |       |       |
|* 11 |      TABLE ACCESS FULL   | S_CCC_MMMM             |     1 |   313 |   490K  (1)| 01:38:03 |       |       |
|* 12 |      TABLE ACCESS FULL   | M_CCCC_DDD_CCC_SSS_TMP |   158 |  6478 |      5  (0)| 00:00:01 |       |       |
|* 13 |     TABLE ACCESS FULL    | S_SSS_DDDD             |  1100 | 25300 |      7  (0)| 00:00:01 |       |       |
|  14 |    TABLE ACCESS FULL     | RRR_OOO_OOOO           |  3043 | 30430 |     13  (0)| 00:00:01 |       |       |
|  15 |   TABLE ACCESS FULL      | S_I_CCCC_OOOO          |   23M |  456M |  47781  (2)| 00:09:34 |       |       |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TO_CHAR("TO_VALID_DATE"(2415020+(999999999-"B"."BASM_DATE")),'YYYYMMDD')||TO_CHAR("B"."BASM_T
IME")= (SELECT MAX(TO_CHAR("TO_VALID_DATE"(2415020+(999999999-"S"."BASM_DATE")),'YYYYMMDD')||TO_CHAR("S"."B
ASM_TIME")) FROM "S_CCC_MMMM_H_TTTT" "S" WHERE "S"."BASE_ID"=:B1 AND "S"."DATA_DT"=TO_DATE(' 2017-12-31
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "S"."RATE_ID"=:B2 AND
"TO_VALID_DATE"(2415020+(999999999-"S"."BASM_DATE"))<=TO_DATE(' 2017-12-31 00:00:00', 'syyyy-mm-dd
hh24:mi:ss')))
3 - filter("B"."DATA_DT"=TO_DATE(' 2017-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"B"."BASE_ID"=SUBSTR(:B1,1,4) AND TRIM("B"."RATE_ID")=SUBSTR(:B2,5,3))
6 - filter("S"."BASE_ID"=:B1 AND "S"."DATA_DT"=TO_DATE(' 2017-12-31 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "S"."RATE_ID"=:B2 AND "TO_VALID_DATE"(2415020+(999999999-"S"."BASM_DATE"))<=TO_DATE('
2017-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
7 - access("T1"."CUSTOMER_NO"=LPAD("T2"."CUST_NUM"(+),16,'0'))
8 - access("T1"."BRANCH_NO"="T5"."ORG_NO")
9 - access("T3"."TYPE"="T1"."ACCT_TYPE" AND "T3"."INT_CAT"="T1"."INT_CAT")
10 - access("T4"."PROD_CD"="T1"."ACCT_TYPE"||"T1"."INT_CAT")
11 - filter("T1"."DATA_DT"=TO_DATE(' 2017-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"T1"."CURR_BAL"<>0 AND SUBSTR("T1"."GL_CLASS_CODE",10,8)<>'13090101' AND "T1"."CURRENCY"='CNY')
12 - filter("T4"."PROD_TYP_CD"<>'D051' AND "T4"."BIZ_CLS_CD"<>'1')
13 - filter("T3"."SYS"='INV')
Note
-----
- dynamic sampling used for this statement (level=2)
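After the change the result comes back in 7 minutes. The scalar subquery and the FILTER are both still in the plan, but since they are not the bottleneck they do not need to be dealt with.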