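A Brief Analysis of Optimizing TPC-H Query Q1: Giving Up the Filter Condition for Better Performance

    Query performance depends heavily on the hardware; the influence of hardware is left aside here.

    The original statement: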

EXPLAIN EXTENDED SELECT sql_no_cache
L_RETURNFLAG,
L_LINESTATUS,
SUM(L_QUANTITY) AS SUM_QTY,
SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE,
SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS SUM_DISC_PRICE,
SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT) * (1 + L_TAX)) AS SUM_CHARGE,
AVG(L_QUANTITY) AS AVG_QTY,
AVG(L_EXTENDEDPRICE) AS AVG_PRICE,
AVG(L_DISCOUNT) AS AVG_DISC,
COUNT(*) AS COUNT_ORDER
FROM
LINEITEM
WHERE
L_SHIPDATE <= DATE'1998-12-01' - INTERVAL '90' DAY
GROUP BY
L_RETURNFLAG,
L_LINESTATUS
ORDER BY
L_RETURNFLAG,
L_LINESTATUS;

+----+-------------+----------+------+---------------+------+---------+------+---------+----------+----------------------------------------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                                        |
+----+-------------+----------+------+---------------+------+---------+------+---------+----------+----------------------------------------------+
|  1 | SIMPLE      | LINEITEM | ALL  | i_l_shipdate  | NULL | NULL    | NULL | 5959532 |    50.00 | Using where; Using temporary; Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+---------+----------+----------------------------------------------+
1 row in set, 1 warning (0.22 sec)

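    As the plan shows, this is a full table scan (type ALL): the existing index i_l_shipdate is listed as a possible key but is not used.

    Query time: 4 rows in set (48.45 sec)

    Check the number of distinct values in each of the columns involved: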

mysql> select count(distinct L_SHIPDATE),count(distinct L_RETURNFLAG),count(distinct L_LINESTATUS),count(distinct L_QUANTITY),count(distinct L_EXTENDEDPRICE),count(distinct  L_DISCOUNT),count(distinct L_TAX) from LINEITEM\G
*************************** 1. row ***************************
     count(distinct L_SHIPDATE): 2526
   count(distinct L_RETURNFLAG): 3
   count(distinct L_LINESTATUS): 2
     count(distinct L_QUANTITY): 50
count(distinct L_EXTENDEDPRICE): 933900
    count(distinct  L_DISCOUNT): 11
          count(distinct L_TAX): 9
1 row in set (26.26 sec)
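
    To put these counts side by side, the short Python sketch below turns them into rough selectivities. The distinct-value counts are copied from the output above. The table size is an assumption: it uses the rows value from the first EXPLAIN (5959532), which is an optimizer estimate rather than an exact count, so the ratios are approximate.

```python
# Distinct-value counts copied from the query output above.
distinct_counts = {
    "L_SHIPDATE": 2526,
    "L_RETURNFLAG": 3,
    "L_LINESTATUS": 2,
    "L_QUANTITY": 50,
    "L_EXTENDEDPRICE": 933900,
    "L_DISCOUNT": 11,
    "L_TAX": 9,
}

# Assumption: the `rows` estimate from the first EXPLAIN stands in for the table size.
estimated_rows = 5959532

# Selectivity = distinct values / rows; a value near 0 means heavy duplication.
selectivity = {col: n / estimated_rows for col, n in distinct_counts.items()}

# Upper bound on the number of groups the GROUP BY can produce.
max_groups = distinct_counts["L_RETURNFLAG"] * distinct_counts["L_LINESTATUS"]

for col, ratio in sorted(selectivity.items(), key=lambda kv: kv[1]):
    print(f"{col:<16} {distinct_counts[col]:>7} distinct  selectivity {ratio:.7f}")
print("GROUP BY can produce at most", max_groups, "groups")
```

    The two grouping columns can form at most 6 combinations (the query itself returns 4 rows), so each group covers a very large share of the table.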

    Add a covering index:

mysql> alter table lineitem add index idx_merge(`l_shipDATE`,`l_returnflag`,`l_linestatus`,`l_extendedprice`,`l_quantity`,`l_discount`,`l_tax`);

    Check the execution plan of the query:

+----+-------------+----------+-------+------------------------+-----------+---------+------+---------+----------+-----------------------------------------------------------+
| id | select_type | table    | type  | possible_keys          | key       | key_len | ref  | rows    | filtered | Extra                                                     |
+----+-------------+----------+-------+------------------------+-----------+---------+------+---------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | LINEITEM | range | i_l_shipdate,idx_merge | idx_merge | 4       | NULL | 2979766 |   100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+----------+-------+------------------------+-----------+---------+------+---------+----------+-----------------------------------------------------------+
1 row in set, 1 warning (0.01 sec)

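    The query now does a range scan on the covering index (Using index), but it still needs a temporary table and a filesort for the GROUP BY and ORDER BY.

    Query time: 4 rows in set (18.01 sec)

    Change the order of the index columns so that the grouping columns come first: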

mysql> alter table lineitem drop index idx_merge,add index idx_merge(`l_returnflag`,`l_linestatus`,`l_shipDATE`,`l_extendedprice`,`l_quantity`,`l_discount`,`l_tax`);

    Check the execution plan:

+----+-------------+----------+-------+------------------------+-----------+---------+------+---------+----------+--------------------------+
| id | select_type | table    | type  | possible_keys          | key       | key_len | ref  | rows    | filtered | Extra                    |
+----+-------------+----------+-------+------------------------+-----------+---------+------+---------+----------+--------------------------+
|  1 | SIMPLE      | LINEITEM | index | i_l_shipdate,idx_merge | idx_merge | 44      | NULL | 5959532 |    50.00 | Using where; Using index |
+----+-------------+----------+-------+------------------------+-----------+---------+------+---------+----------+--------------------------+
1 row in set, 1 warning (0.10 sec)

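    The temporary table and the filesort are gone.

    However, the estimated row count (5959532) shows that this is a full scan again. With type index, the whole of idx_merge is read rather than a range of it, because L_SHIPDATE is no longer the leading index column.

    Query time now: 4 rows in set (15.07 sec), about 3 s faster than with the previous index.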
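
    The three measurements can be compared directly. The sketch below only does arithmetic on the elapsed times reported in this article; the labels are descriptive names chosen here, and each timing is a single run, so small differences should not be over-interpreted.

```python
# Elapsed times in seconds, copied from the three runs reported in this article.
timings = [
    ("no covering index (full table scan)", 48.45),
    ("idx_merge with l_shipdate first", 18.01),
    ("idx_merge with l_returnflag, l_linestatus first", 15.07),
]

baseline = timings[0][1]

# Speedup of every variant relative to the original full table scan.
speedups = {label: baseline / seconds for label, seconds in timings}

# Seconds saved purely by reordering the index columns.
saved_by_reordering = round(timings[1][1] - timings[2][1], 2)

for label, seconds in timings:
    print(f"{label:<48} {seconds:>6.2f} s  {speedups[label]:.2f}x")
print("saved by reordering the index columns:", saved_by_reordering, "s")
```

    Most of the gain comes from making the index covering; reordering the columns adds a smaller improvement on top of that.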

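    The time spent in each state of the profile shows that practically all of it falls in the Sending data state, in which MySQL reads the rows and computes the aggregates; the main bottleneck is CPU processing time.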

mysql> show profile for query 2;
+----------------------+-----------+
| Status               | Duration  |
+----------------------+-----------+
| starting             |  0.000129 |
| checking permissions |  0.000009 |
| Opening tables       |  0.000018 |
| init                 |  0.000039 |
| System lock          |  0.000015 |
| optimizing           |  0.000011 |
| statistics           |  0.000001 |
| preparing            |  0.000041 |
| Sorting result       |  0.000006 |
| executing            |  0.000005 |
| Sending data         | 15.066530 |
| end                  |  0.000062 |
| query end            |  0.000026 |
| closing tables       |  0.000022 |
| freeing items        |  0.001095 |
| logging slow query   |  0.000042 |
| cleaning up          |  0.000022 |
+----------------------+-----------+
17 rows in set, 1 warning (0.06 sec)
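
    As a quick check on where the time goes, the following Python sketch sums the durations from the profile above and computes the share of the Sending data state. The figures are copied from the table; nothing else is assumed.

```python
# (state, duration in seconds) pairs copied from the SHOW PROFILE output above.
profile = [
    ("starting", 0.000129),
    ("checking permissions", 0.000009),
    ("Opening tables", 0.000018),
    ("init", 0.000039),
    ("System lock", 0.000015),
    ("optimizing", 0.000011),
    ("statistics", 0.000001),
    ("preparing", 0.000041),
    ("Sorting result", 0.000006),
    ("executing", 0.000005),
    ("Sending data", 15.066530),
    ("end", 0.000062),
    ("query end", 0.000026),
    ("closing tables", 0.000022),
    ("freeing items", 0.001095),
    ("logging slow query", 0.000042),
    ("cleaning up", 0.000022),
]

total = sum(duration for _, duration in profile)
sending_data = dict(profile)["Sending data"]
share = sending_data / total

print(f"total profiled time: {total:.6f} s")
print(f"Sending data share:  {share:.4%}")
```

    The per-state durations alone do not separate CPU time from I/O wait. SHOW PROFILE CPU FOR QUERY 2 adds CPU_user and CPU_system columns, which is one way to check the CPU claim directly.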

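    In summary: in theory, the correct index column order puts the column from the WHERE condition first, so that unnecessary rows are filtered out as early as possible. This SQL statement is a special case, though, because the grouping and sorting columns have very few distinct values, and here it paid off to give up index filtering on L_SHIPDATE in exchange for avoiding the temporary table and the filesort. There are certainly better optimization strategies, but they are outside the scope of this discussion.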
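
    One possible reason the date filter matters so little here can be estimated with a few lines of Python. This is a rough sketch under two assumptions that the measurements above do not confirm: that no ship date lies after 1998-12-01, and that rows are spread roughly evenly over the 2526 distinct ship dates reported earlier. The optimizer's own rows estimates in the EXPLAIN output differ from this figure.

```python
from datetime import date, timedelta

# Cutoff used by the WHERE clause: DATE'1998-12-01' - INTERVAL '90' DAY.
upper_bound = date(1998, 12, 1)
cutoff = upper_bound - timedelta(days=90)

# From count(distinct L_SHIPDATE) earlier in the article.
distinct_ship_dates = 2526

# Assumption: no ship date is later than 1998-12-01, so at most the
# calendar days after the cutoff (up to the upper bound) are filtered out.
max_excluded_dates = (upper_bound - cutoff).days

# Assumption: rows are spread roughly evenly over the distinct dates.
share_kept = 1 - max_excluded_dates / distinct_ship_dates

print("cutoff date:", cutoff.isoformat())
print(f"approximate share of rows kept by the filter: {share_kept:.1%}")
```

    If those assumptions hold, the filter keeps the large majority of the rows, so a range scan on L_SHIPDATE saves little work compared with reading the whole index.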
