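Topic Description
While modifying a stored procedure, we resolved a performance problem caused by redundant columns in the GROUP BY clause of a SQL statement.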
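Problem Statement
1
While optimizing stored procedure 15-PRC_EXPRESS_SPECIAL_REBATE_INVOICE, we found that the SQL statement at line 338 had a performance problem caused by too many columns in its GROUP BY clause. The original statement is as follows: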
INSERT INTO tt_express_invoice (
.........
SELECT
e.express_invoice_code,
e.bill_code,
e.cust_code,
e.cust_dept,
e.vip_code,
e.vip_dept,
e.start_dt,
e.end_dt,
e.bill_start_dt,
e.bill_end_dt,
1,
e.BILLING_VERSION_NO,
e.bill_period,
e.bill_type,
SUM(s.discount_amt),
V_EXPRESS_SPECIAL_ZK,
e.gl_dt,
1,
e.BILL_JOB_BATCH_NO,
e.bill_batch_no,
NOW(),
v_invoice_batch_no,
2
FROM
(SELECT DISTINCT
e.express_invoice_code,
e.bill_code,
e.cust_code,
e.cust_dept,
e.vip_code,
e.vip_dept,
e.start_dt,
e.end_dt,
e.gl_dt,
e.bill_start_dt,
e.bill_end_dt,
e.BILLING_VERSION_NO,
e.bill_period,
e.bill_type,
e.BILL_JOB_BATCH_NO,
e.bill_batch_no
FROM
tt_express_invoice e
WHERE e.`invoice_batch_no` = v_invoice_batch_no
AND NOT EXISTS
(SELECT
1
FROM
tt_express_invoice t
WHERE t.`cust_code` = e.`cust_code`
AND t.`cust_dept` = e.`cust_dept`
AND t.`bill_start_dt` = e.`bill_start_dt`
AND t.`bill_end_dt` = e.`bill_end_dt`
AND t.`invoice_batch_no` = v_invoice_batch_no
AND t.`FEE_TYPE_CODE` = V_EXPRESS_SPECIAL_ZK)) e,
tt_special_rebate s
WHERE e.`cust_code` = s.`customer_code`
AND e.`cust_dept` = s.`customer_dept`
AND e.`bill_start_dt` = s.`start_dt`
AND e.`bill_end_dt` = s.`end_dt`
AND s.`billing_flag` = 1
AND s.`rebate_type` = 1
GROUP BY e.express_invoice_code,
e.bill_code,
e.cust_code,
e.cust_dept,
e.vip_code,
e.vip_dept,
e.start_dt,
e.end_dt,
e.bill_start_dt,
e.bill_end_dt,
e.billing_version_no,
e.bill_period,
e.bill_type,
e.gl_dt,
bill_job_batch_no ;
Analysis
1
Analysis shows that in the clause
GROUP BY e.express_invoice_code,
e.bill_code,
e.cust_code,
e.cust_dept,
e.vip_code,
e.vip_dept,
e.start_dt,
e.end_dt,
e.bill_start_dt,
e.bill_end_dt,
e.billing_version_no,
e.bill_period,
e.bill_type,
e.gl_dt,
bill_job_batch_no
the column e.express_invoice_code is the primary key, so each of its values identifies exactly one record,
and the other columns in the clause come from the same table as e.express_invoice_code.
Solution
1
In the statement, change
GROUP BY e.express_invoice_code,
e.bill_code,
e.cust_code,
e.cust_dept,
e.vip_code,
e.vip_dept,
e.start_dt,
e.end_dt,
e.bill_start_dt,
e.bill_end_dt,
e.billing_version_no,
e.bill_period,
e.bill_type,
e.gl_dt,
bill_job_batch_no
to
GROUP BY e.express_invoice_code
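Root Cause
1
The GROUP BY clause contained redundant columns. When e.express_invoice_code uniquely determines the other columns in the clause, grouping by it alone forms exactly the same groups, so the query returns the same result set before and after the optimization.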
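The equivalence claimed above can be checked on a toy data set. The following is a minimal sketch, not the production statement: it uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also accepts non-grouped columns in the SELECT list), and the tables `invoice` and `rebate` with their rows are hypothetical, cut-down versions of tt_express_invoice and tt_special_rebate.

```python
import sqlite3

# Hypothetical, cut-down stand-ins for tt_express_invoice and tt_special_rebate.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (
    express_invoice_code TEXT PRIMARY KEY,
    cust_code TEXT,
    cust_dept TEXT
);
CREATE TABLE rebate (
    customer_code TEXT,
    customer_dept TEXT,
    discount_amt INTEGER
);
INSERT INTO invoice VALUES ('INV1', 'C1', 'D1'), ('INV2', 'C2', 'D1');
INSERT INTO rebate VALUES ('C1', 'D1', 10), ('C1', 'D1', 5), ('C2', 'D1', 7);
""")

# Shared SELECT and join; only the GROUP BY clause differs between the two runs.
select_part = """
SELECT e.express_invoice_code, e.cust_code, e.cust_dept, SUM(s.discount_amt)
FROM invoice e
JOIN rebate s
  ON e.cust_code = s.customer_code
 AND e.cust_dept = s.customer_dept
"""
order_part = " ORDER BY e.express_invoice_code"

# Long form: group by the key plus every column that depends on it.
full = conn.execute(
    select_part
    + " GROUP BY e.express_invoice_code, e.cust_code, e.cust_dept"
    + order_part
).fetchall()

# Short form: group by the primary key only.
short = conn.execute(
    select_part + " GROUP BY e.express_invoice_code" + order_part
).fetchall()

print(full == short)
```

Both forms build one group per invoice, because cust_code and cust_dept never vary within a single express_invoice_code. Whether MySQL accepts the short form depends on the server version and on the sql_mode setting (ONLY_FULL_GROUP_BY), so it should be verified on the target server.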
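Key Points
1
The GROUP BY column list can be reduced in this way only when the columns that remain in the clause uniquely identify a row, and the columns removed from the clause come from the same table as the remaining ones.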
2
See Section 1.8.1, "MySQL Extensions to Standard SQL", in the MySQL 5.6 manual: "You don't need to name all selected columns in the GROUP BY clause. This gives better performance for some very specific, but quite normal queries. See Section 12.19, “Functions and Modifiers for Use with GROUP BY Clauses”."
Oracle's syntax, by contrast, requires that when GROUP BY is used for grouped aggregation or deduplication, every non-aggregated column in the SELECT list must also appear in the GROUP BY clause.
3
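For details, see Section 12.19.3, "MySQL Handling of GROUP BY", in the manual. It points out that you can use this feature (the GROUP BY clause does not have to list every column in the SELECT list) to avoid unnecessary column sorting and grouping and so get better performance.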
4
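The optimized SQL can avoid a second sort. Before the optimization, the rows had to be sorted on multiple columns before the data was fetched from the clustered index. After the optimization, because express_invoice_code is the primary key and InnoDB orders its clustered index by primary key, the rows are already in the order the grouping needs.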
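Before applying the reduction described in the first key point, it can help to confirm on real data that the remaining column really determines the columns being dropped. The helper below is a minimal sketch of such a check, again using Python's sqlite3 as a stand-in for MySQL. The function name `key_determines_columns` and the table `invoice_rows` are hypothetical and exist only for illustration.

```python
import sqlite3


def key_determines_columns(conn, table, key, columns):
    """Return True if every value of `key` maps to exactly one combination of `columns`."""
    cols = ", ".join(columns)
    # Identifiers are interpolated directly into the SQL, so pass only trusted names.
    sql = f"""
        SELECT COUNT(*) FROM (
            SELECT {key}
            FROM (SELECT DISTINCT {key}, {cols} FROM {table}) AS d
            GROUP BY {key}
            HAVING COUNT(*) > 1
        ) AS violations
    """
    # A key value with more than one distinct combination is a violation.
    return conn.execute(sql).fetchone()[0] == 0


# Hypothetical sample data: INV1 appears twice with identical values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice_rows (
    express_invoice_code TEXT,
    cust_code TEXT,
    bill_type TEXT
);
INSERT INTO invoice_rows VALUES
    ('INV1', 'C1', 'A'),
    ('INV1', 'C1', 'A'),
    ('INV2', 'C2', 'B');
""")

before = key_determines_columns(
    conn, "invoice_rows", "express_invoice_code", ["cust_code", "bill_type"]
)

# Add a row that gives INV2 a second bill_type, which breaks the dependency.
conn.execute("INSERT INTO invoice_rows VALUES ('INV2', 'C2', 'C')")
after = key_determines_columns(
    conn, "invoice_rows", "express_invoice_code", ["cust_code", "bill_type"]
)
cust_only = key_determines_columns(
    conn, "invoice_rows", "express_invoice_code", ["cust_code"]
)

print(before, after, cust_only)
```

If the check returns False for any dropped column, grouping by the key alone would merge rows that the longer GROUP BY kept apart, and the shortened clause would no longer be equivalent.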