后台某维度统计语句
SELECT products_id, sku, px_id, sj_id, cat_path, COUNT(*) AS pv, COUNT(DISTINCT ip) AS ip_numbers, SUM(is_bounce) AS bounce_numbers, SUM(remain_time) AS remain_time FROM dm_pv_records_search WHERE 1 AND add_date >= '2014-02-26 10:00:00' AND products_id > 0 GROUP BY products_id,sku, px_id, sj_id, cat_path order by ip_numbers desc limit 0,20;
每次有同事在后台查询这类数据时,都反应数据非常慢,偶尔不能出来数据。
explain
+----+-------------+----------------------+-------+----------------------+----------+---------+------+---------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------------------+-------+----------------------+----------+---------+------+---------+----------------------------------------------+ | 1 | SIMPLE | dm_pv_records_search | range | products_id,add_date | add_date | 8 | NULL | 12831019 | Using where; Using temporary; Using filesort | +----+-------------+----------------------+-------+----------------------+----------+---------+------+---------+----------------------------------------------+
这条语句索引使用是正确的,不过结果集及其大,而且需要对结果集group by操作。
观察该sql语句发现,sku,px_id,sj_id,cat_path四个字段为冗余字段,都可以从其他表连表获得。
而改sql又存在order by .. limit m,n,可知道后期结果集只有20个,更加适合后期连接相关表获取其他必要的字段sku,px_id,sj_id,cat_path。
如此一转换,可以将group by中的sku,px_id,sj_id,cat_path去掉(其中sku和cat_path为字符串),节省后期大结果集中排序的内存。
而group by和distinct优化的一种方式是,尽量减少不必要的字段,可以参考简朝阳写的《mysql性能调优和架构设计》的第8.6小结,或者参考以下连接:http://isky000.com/database/mysql_group_by_implement 。
我们可以尝试将sql改成
SELECT products_id, COUNT(*) AS pv, COUNT(DISTINCT ip) AS ip_numbers, SUM(is_bounce) AS bounce_numbers, SUM(remain_time) AS remain_time FROM dm_pv_records_search WHERE 1 AND add_date >= '2014-02-26 10:00:00' AND products_id > 0 GROUP BY products_id order by ip_numbers desc limit 0,20;
对比两次profiling,前者执行时间是58s,后者执行时间是5s.
前者profiling如下
+----------------------+-----------+----------+------------+--------------+---------------+ | Status | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out | +----------------------+-----------+----------+------------+--------------+---------------+ | starting | 0.000133 | 0.000000 | 0.000000 | 0 | 0 | | checking permissions | 0.000014 | 0.000000 | 0.000000 | 0 | 0 | | Opening tables | 0.000031 | 0.000000 | 0.000000 | 0 | 0 | | System lock | 0.000018 | 0.000000 | 0.000000 | 0 | 0 | | init | 0.000061 | 0.000000 | 0.000000 | 0 | 0 | | optimizing | 0.000019 | 0.000000 | 0.000000 | 0 | 0 | | statistics | 0.000073 | 0.000000 | 0.000000 | 0 | 0 | | preparing | 0.000051 | 0.000000 | 0.000000 | 0 | 0 | | Creating tmp table | 0.000051 | 0.000000 | 0.000000 | 0 | 0 | | Sorting for group | 47.735389 | 4.614299 | 10.773362 | 3632 | 2811456 | | executing | 0.000010 | 0.000000 | 0.000000 | 0 | 0 | | Copying to tmp table | 11.566292 | 0.910861 | 0.586911 | 256 | 4408 | | Sorting result | 0.030459 | 0.025996 | 0.004000 | 0 | 0 | | Sending data | 0.000057 | 0.000000 | 0.000000 | 0 | 0 | | end | 0.000005 | 0.000000 | 0.000000 | 0 | 0 | | removing tmp table | 0.008139 | 0.000000 | 0.008998 | 0 | 0 | | end | 0.000007 | 0.000000 | 0.000000 | 0 | 0 | | query end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 | | closing tables | 0.000011 | 0.000000 | 0.000000 | 0 | 0 | | freeing items | 0.000071 | 0.000000 | 0.000000 | 0 | 0 | | removing tmp table | 0.000006 | 0.000000 | 0.000000 | 0 | 0 | | freeing items | 0.000331 | 0.000000 | 0.000000 | 0 | 0 | | logging slow query | 0.000006 | 0.000000 | 0.000000 | 0 | 0 | | logging slow query | 0.000047 | 0.000000 | 0.000000 | 0 | 8 | | cleaning up | 0.000005 | 0.000000 | 0.000000 | 0 | 0 | +----------------------+-----------+----------+------------+--------------+---------------+
去掉多余字段后的profiling
+--------------------------------+----------+----------+------------+--------------+---------------+ | Status | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out | +--------------------------------+----------+----------+------------+--------------+---------------+ | starting | 0.000027 | 0.000000 | 0.000000 | 0 | 0 | | Waiting for query cache lock | 0.000008 | 0.000000 | 0.000000 | 0 | 0 | | checking query cache for query | 0.000082 | 0.000000 | 0.000000 | 0 | 0 | | checking permissions | 0.000011 | 0.000000 | 0.000000 | 0 | 0 | | Opening tables | 0.000020 | 0.000000 | 0.000000 | 0 | 0 | | System lock | 0.000014 | 0.000000 | 0.000000 | 0 | 0 | | Waiting for query cache lock | 0.000038 | 0.000000 | 0.000000 | 0 | 0 | | init | 0.000040 | 0.000000 | 0.000000 | 0 | 0 | | optimizing | 0.000019 | 0.000000 | 0.000000 | 0 | 0 | | statistics | 0.000061 | 0.000000 | 0.000000 | 0 | 0 | | preparing | 0.000051 | 0.000000 | 0.000000 | 0 | 0 | | Creating tmp table | 0.000042 | 0.000000 | 0.000000 | 0 | 0 | | Sorting for group | 5.275880 | 2.539614 | 6.813964 | 120 | 287368 | | executing | 0.000011 | 0.000000 | 0.000000 | 0 | 0 | | Copying to tmp table | 0.492344 | 0.478927 | 0.021997 | 24 | 264 | | Sorting result | 0.018007 | 0.016997 | 0.001000 | 0 | 0 | | Sending data | 0.000039 | 0.000000 | 0.000000 | 0 | 0 | | end | 0.000005 | 0.000000 | 0.000000 | 0 | 0 | | removing tmp table | 0.001028 | 0.000000 | 0.000999 | 0 | 0 | | end | 0.000007 | 0.000000 | 0.000000 | 0 | 0 | | query end | 0.000004 | 0.000000 | 0.000000 | 0 | 0 | | closing tables | 0.000011 | 0.000000 | 0.000000 | 0 | 0 | | freeing items | 0.000073 | 0.000000 | 0.000000 | 0 | 0 | | removing tmp table | 0.000006 | 0.000000 | 0.000000 | 0 | 0 | | freeing items | 0.000008 | 0.000000 | 0.000000 | 0 | 0 | | Waiting for query cache lock | 0.000003 | 0.000000 | 0.000000 | 0 | 0 | | freeing items | 0.000290 | 0.000000 | 0.000000 | 0 | 0 | | Waiting for query cache lock | 0.000007 | 0.000000 | 0.000000 | 0 | 0 | | freeing items | 0.000003 | 0.000000 | 0.000000 | 0 | 0 | | storing result in query cache | 0.000005 | 0.000000 | 0.000000 | 0 | 0 | | logging slow query | 0.000003 | 0.000000 | 0.000000 | 0 | 0 | | logging slow query | 0.000068 | 0.000000 | 0.000000 | 0 | 8 | | cleaning up | 0.000007 | 0.000000 | 0.000000 | 0 | 0 | +--------------------------------+----------+----------+------------+--------------+---------------+
我们可以看到没有去掉多余字段前 Sorting for group和Copying to tmp table占用了大部分io和cpu,
而去掉后Copying to tmp table占用io和cpu时间相对来说较少,而且 Sorting for group性能提升5倍之多!
注意事项
1. 优化group by和distinct我们在sql层面能做的事情是尽量减少不必要的字段