Query performance depends heavily on hardware; hardware effects are not considered here.
Original statement:
EXPLAIN EXTENDED SELECT sql_no_cache
L_RETURNFLAG,
L_LINESTATUS,
SUM(L_QUANTITY) AS SUM_QTY,
SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE,
SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS SUM_DISC_PRICE,
SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT) * (1 + L_TAX)) AS SUM_CHARGE,
AVG(L_QUANTITY) AS AVG_QTY,
AVG(L_EXTENDEDPRICE) AS AVG_PRICE,
AVG(L_DISCOUNT) AS AVG_DISC,
COUNT(*) AS COUNT_ORDER
FROM
LINEITEM
WHERE
L_SHIPDATE <= DATE'1998-12-01' - INTERVAL '90' DAY
GROUP BY
L_RETURNFLAG,
L_LINESTATUS
ORDER BY
L_RETURNFLAG,
L_LINESTATUS;
+----+-------------+----------+------+---------------+------+---------+------+---------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------+---------------+------+---------+------+---------+----------+----------------------------------------------+
| 1 | SIMPLE | LINEITEM | ALL | i_l_shipdate | NULL | NULL | NULL | 5959532 | 50.00 | Using where; Using temporary; Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+---------+----------+----------------------------------------------+
1 row in set, 1 warning (0.22 sec)
The plan shows a full table scan (type ALL, key NULL).
Query time: 4 rows in set (48.45 sec)
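The optimizer's `filtered` value of 50.00 is only an estimate. A quick way to see how much of the table the WHERE clause really keeps is sketched below; it assumes the same LINEITEM table and relies on MySQL treating a comparison as 0 or 1 inside SUM. The actual fraction on a given data set may differ from the estimate.

```sql
-- Cutoff date that the WHERE clause evaluates to
SELECT DATE '1998-12-01' - INTERVAL '90' DAY AS cutoff;  -- 1998-09-02

-- Fraction of LINEITEM rows that satisfy the predicate
SELECT SUM(L_SHIPDATE <= DATE '1998-12-01' - INTERVAL '90' DAY) / COUNT(*)
       AS matched_fraction
FROM LINEITEM;
```

If the fraction is large, a range scan on L_SHIPDATE cannot skip many rows, which would help explain why the optimizer prefers a full scan here.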
Check the number of distinct values in each column involved:
mysql> select count(distinct L_SHIPDATE),count(distinct L_RETURNFLAG),count(distinct L_LINESTATUS),count(distinct L_QUANTITY),count(distinct L_EXTENDEDPRICE),count(distinct L_DISCOUNT),count(distinct L_TAX) from LINEITEM\G
*************************** 1. row ***************************
count(distinct L_SHIPDATE): 2526
count(distinct L_RETURNFLAG): 3
count(distinct L_LINESTATUS): 2
count(distinct L_QUANTITY): 50
count(distinct L_EXTENDEDPRICE): 933900
count(distinct L_DISCOUNT): 11
count(distinct L_TAX): 9
1 row in set (26.26 sec)
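The distinct counts above are easier to compare as selectivity ratios (distinct values divided by total rows). The following is a minimal sketch, assuming the same LINEITEM table; the `sel_*` aliases are illustrative names, not part of the original session.

```sql
-- Selectivity = distinct values / total rows; values closer to 1 are more selective
SELECT
  COUNT(DISTINCT L_SHIPDATE)   / COUNT(*) AS sel_shipdate,
  COUNT(DISTINCT L_RETURNFLAG) / COUNT(*) AS sel_returnflag,
  COUNT(DISTINCT L_LINESTATUS) / COUNT(*) AS sel_linestatus
FROM LINEITEM;
```

With only 3 and 2 distinct values, L_RETURNFLAG and L_LINESTATUS are far less selective than L_SHIPDATE, which is why the usual guideline would put L_SHIPDATE first in the index.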
Add a covering index:
mysql> alter table lineitem add index idx_merge(`l_shipDATE`,`l_returnflag`,`l_linestatus`,`l_extendedprice`,`l_quantity`,`l_discount`,`l_tax`);
Check the query execution plan:
+----+-------------+----------+-------+------------------------+-----------+---------+------+---------+----------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+-------+------------------------+-----------+---------+------+---------+----------+-----------------------------------------------------------+
| 1 | SIMPLE | LINEITEM | range | i_l_shipdate,idx_merge | idx_merge | 4 | NULL | 2979766 | 100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+----------+-------+------------------------+-----------+---------+------+---------+----------+-----------------------------------------------------------+
1 row in set, 1 warning (0.01 sec)
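The plan now uses the covering index for a range scan, but it still needs a temporary table and a filesort.
Query time: 4 rows in set (18.01 sec)
Change the index column order: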
mysql> alter table lineitem drop index idx_merge,add index idx_merge(`l_returnflag`,`l_linestatus`,`l_shipDATE`,`l_extendedprice`,`l_quantity`,`l_discount`,`l_tax`);
Check the execution plan:
+----+-------------+----------+-------+------------------------+-----------+---------+------+---------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+-------+------------------------+-----------+---------+------+---------+----------+--------------------------+
| 1 | SIMPLE | LINEITEM | index | i_l_shipdate,idx_merge | idx_merge | 44 | NULL | 5959532 | 50.00 | Using where; Using index |
+----+-------------+----------+-------+------------------------+-----------+---------+------+---------+----------+--------------------------+
1 row in set, 1 warning (0.10 sec)
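The temporary table and the filesort are gone.
However, the rows estimate (5959532) shows that the whole index is scanned again (type index) rather than a range of it.
Query time at this point: 4 rows in set (15.07 sec), about 3 s faster than the previous 18.01 sec.
The per-state profile timings show that almost all of the time is spent in the Sending data state, where rows are read and processed; the main bottleneck is CPU processing time.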
mysql> show profile for query 2;
+----------------------+-----------+
| Status | Duration |
+----------------------+-----------+
| starting | 0.000129 |
| checking permissions | 0.000009 |
| Opening tables | 0.000018 |
| init | 0.000039 |
| System lock | 0.000015 |
| optimizing | 0.000011 |
| statistics | 0.000001 |
| preparing | 0.000041 |
| Sorting result | 0.000006 |
| executing | 0.000005 |
| Sending data | 15.066530 |
| end | 0.000062 |
| query end | 0.000026 |
| closing tables | 0.000022 |
| freeing items | 0.001095 |
| logging slow query | 0.000042 |
| cleaning up | 0.000022 |
+----------------------+-----------+
17 rows in set, 1 warning (0.06 sec)
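For reference, a profile like the one above can be collected roughly as follows. This is a sketch, assuming session-level profiling is available (it is deprecated in later MySQL versions); the query ID 2 matches the session above and will differ elsewhere.

```sql
SET profiling = 1;             -- enable profiling for the current session
-- ... run the query under test here ...
SHOW PROFILES;                 -- list recent statements with their Query_ID
SHOW PROFILE CPU FOR QUERY 2;  -- per-state durations plus CPU_user and CPU_system
```

Comparing the CPU_user and CPU_system columns against Duration for the Sending data state is one way to check whether the time is CPU-bound rather than spent waiting on I/O.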
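In summary, the usual guideline is to put the WHERE column first in the index so that unneeded rows are filtered out early. This query is a special case, though: the GROUP BY and ORDER BY columns have very few distinct values (3 and 2), so putting them first lets MySQL read the groups in index order and avoid the temporary table and filesort, which outweighed the benefit of the range scan here. There are surely better optimization strategies, but they are outside the scope of this discussion.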
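To compare the two column orders side by side without rebuilding idx_merge each time, both variants can be kept under separate names and selected with an index hint. This is only a sketch: the index names `idx_date_first` and `idx_group_first` are hypothetical, and the query is shortened to a single aggregate for readability.

```sql
-- Hypothetical index names; each holds the same columns as idx_merge in a different order
ALTER TABLE LINEITEM
  ADD INDEX idx_date_first  (L_SHIPDATE, L_RETURNFLAG, L_LINESTATUS,
                             L_EXTENDEDPRICE, L_QUANTITY, L_DISCOUNT, L_TAX),
  ADD INDEX idx_group_first (L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE,
                             L_EXTENDEDPRICE, L_QUANTITY, L_DISCOUNT, L_TAX);

-- WHERE column first: expect a range scan, possibly with temporary table and filesort
EXPLAIN SELECT L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY) AS SUM_QTY
FROM LINEITEM FORCE INDEX (idx_date_first)
WHERE L_SHIPDATE <= DATE '1998-12-01' - INTERVAL '90' DAY
GROUP BY L_RETURNFLAG, L_LINESTATUS
ORDER BY L_RETURNFLAG, L_LINESTATUS;

-- GROUP BY columns first: expect a full index scan that is already in group order
EXPLAIN SELECT L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY) AS SUM_QTY
FROM LINEITEM FORCE INDEX (idx_group_first)
WHERE L_SHIPDATE <= DATE '1998-12-01' - INTERVAL '90' DAY
GROUP BY L_RETURNFLAG, L_LINESTATUS
ORDER BY L_RETURNFLAG, L_LINESTATUS;
```

Two wide indexes on a table of this size cost considerable disk space and write overhead, so the one that loses the comparison should be dropped afterwards.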