ORDER BY ... LIMIT 10 slower than ORDER BY ... LIMIT 100

Problem

When running a SQL statement in PostgreSQL, the same query with ORDER BY ... LIMIT 10 is noticeably slower than with ORDER BY ... LIMIT 100.

Execution plan analysis

 SELECT
        *,
        (select cl.ITEM_DESC from tablelzl2 cl where cl.ITEM_NAME = RI.MANUAL_SIGN AND cl.ITEM_NO='manualSign') AS "manualSign"
 FROM
        tablelzl1 RI
 WHERE  RI.column1 = 'AAAA'
        AND RI.column2 = 'applyno20231112'
 ORDER BY
        RI.column3 DESC limit 10
 Limit  (cost=0.43..1522.66 rows=10 width=990)
   ->  Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri  (cost=0.43..158007.45 rows=1038 width=990)
         Filter: (((column1)::text = 'AAAA'::text) AND ((column2)::text = 'applyno20231112'::text))
         SubPlan 1
           ->  Index Scan using uk_tablelzl2_ii on tablelzl2 cl  (cost=0.27..5.29 rows=1 width=18)
                 Index Cond: (((item_no)::text = 'manualSign'::text) AND ((item_name)::text = (ri.manual_sign)::text))

The main table does not use the index on column2; instead it does an Index Scan Backward on the column3 sort-column index. The estimated cost of that index scan is very high, yet the final top-level cost is low, and the actual execution takes about 9 s.
If LIMIT 10 is changed to LIMIT 100, the plan is back to normal:

 SELECT
        *,
        (select cl.ITEM_DESC from tablelzl2 cl where cl.ITEM_NAME = RI.MANUAL_SIGN AND cl.ITEM_NO='manualSign') AS "manualSign"
 FROM
        tablelzl1 RI
 WHERE  RI.column1 = 'AAAA'
        AND RI.column2 = 'applyno20231112'
 ORDER BY
        RI.column3 DESC limit 100
                                                     QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
Limit  (cost=2632.28..3162.78 rows=100 width=990)
  ->  Result  (cost=2632.28..8138.87 rows=1038 width=990)
        ->  Sort  (cost=2632.28..2634.87 rows=1038 width=474)
              Sort Key: ri.column3 DESC
              ->  Index Scan using idx_cri_column2 on tablelzl1 ri  (cost=0.43..2592.61 rows=1038 width=474)
                    Index Cond: ((column2)::text = 'applyno20231112'::text)
                    Filter: ((column1)::text = 'AAAA'::text)
        SubPlan 1
          ->  Index Scan using uk_tablelzl2_ii on tablelzl2 cl  (cost=0.27..5.29 rows=1 width=18)
                Index Cond: (((item_no)::text = 'manualSign'::text) AND ((item_name)::text = (ri.manual_sign)::text))
(10 rows)

The subquery's plan is unchanged; the main table now uses the single-column index on column2, fetches the rows from the heap, sorts them, and applies the LIMIT, which runs very fast.
It is not just the LIMIT: if the original SQL merely uses a different value for column2, the plan is also normal. In other words, only a handful of column2 values make this production SQL misbehave.

Analysis of the two plans:
The subquery is identical before and after, so it can be set aside; the real difference is the index choice. column2 is the filter column and column3 is the sort column, and the two plans each pick the index on one of these two columns.

  • The abnormal LIMIT 10 plan: backward scan of the sort-column index -> heap fetch -> limit. Since no extra sort is needed, the backward index scan can stop as soon as it has found LIMIT rows; scanning the whole sort-column index has a very high estimated cost, but the top-level Limit node ends up with a very low estimate.
  • The normal LIMIT 100 plan: scan of the filter-column index -> heap fetch -> sort on the sort column -> limit. Because of the sort, every qualifying index entry has to be found first; the filter-column index itself is cheap to scan.

So the crux of the problem is that the cost of a partial backward scan of the sort-column index is estimated far too low.
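The low top-level number comes from how PostgreSQL prices a Limit node: it assumes the qualifying rows are spread evenly through the child plan's output and takes a pro-rata slice of the child's cost. A back-of-the-envelope check against the plans above (my own arithmetic, not planner output):

 limit_cost ≈ startup_cost + (child_total_cost - startup_cost) * limit_rows / child_estimated_rows
 -- limit 10 : 0.43 + (158007.45 - 0.43) * 10/1038  ≈ 1522.66   (matches the LIMIT 10 plan)
 -- limit 100: 0.43 + (158007.45 - 0.43) * 100/1038 ≈ 15222.69  (matches the enable_sort=off plan shown later)

Whenever that even-spread assumption is wrong, the Limit cost is wrong with it.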

Actual execution

Take a look at the actual execution with explain (analyze, buffers). First, the LIMIT 10 plan:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..1521.93 rows=10 width=990) (actual time=23.311..8122.516 rows=10 loops=1)
   Buffers: shared hit=861100 read=42985 dirtied=7
   I/O Timings: read=6741.003
   ->  Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri  (cost=0.43..157932.45 rows=1038 width=990) (actual time=23.309..8122.505 rows=10 loops=1)
         Filter: (((column1)::text = 'AAAA'::text) AND ((column2)::text = 'applyno20231112'::text))
         Rows Removed by Filter: 1521796
         Buffers: shared hit=861100 read=42985 dirtied=7
         I/O Timings: read=6741.003
         SubPlan 1
           ->  Index Scan using uk_tablelzl2_ii on tablelzl2 cl  (cost=0.27..5.29 rows=1 width=18) (actual time=0.005..0.005 rows=0 loops=10)
                 Index Cond: (((item_no)::text = 'manualSign'::text) AND ((item_name)::text = (ri.manual_sign)::text))
                 Buffers: shared hit=6
 Planning:
   Buffers: shared hit=121 read=28
   I/O Timings: read=1.476
 Planning Time: 2.314 ms
 Execution Time: 8122.658 ms

And the LIMIT 100 plan:

 Limit  (cost=2632.28..3162.78 rows=100 width=990) (actual time=150.101..150.122 rows=14 loops=1)
   Buffers: shared hit=700 read=274
   I/O Timings: read=146.903
   ->  Result  (cost=2632.28..8138.87 rows=1038 width=990) (actual time=150.100..150.119 rows=14 loops=1)
         Buffers: shared hit=700 read=274
         I/O Timings: read=146.903
         ->  Sort  (cost=2632.28..2634.87 rows=1038 width=474) (actual time=150.072..150.073 rows=14 loops=1)
               Sort Key: ri.column3 DESC
               Sort Method: quicksort  Memory: 30kB
               Buffers: shared hit=694 read=274
               I/O Timings: read=146.903
               ->  Index Scan using idx_cri_column2 on tablelzl1 ri  (cost=0.43..2592.61 rows=1038 width=474) (actual time=0.418..149.973 rows=14 loops=1)
                     Index Cond: ((column2)::text = 'applyno20231112'::text)
                     Filter: ((column1)::text = 'AAAA'::text)
                     Rows Removed by Filter: 1218
                     Buffers: shared hit=691 read=274
                     I/O Timings: read=146.903
         SubPlan 1
           ->  Index Scan using uk_tablelzl2_ii on tablelzl2 cl  (cost=0.27..5.29 rows=1 width=18) (actual time=0.002..0.002 rows=0 loops=14)
                 Index Cond: (((item_no)::text = 'manualSign'::text) AND ((item_name)::text = (ri.manual_sign)::text))
                 Buffers: shared hit=6
 Planning Time: 0.334 ms
 Execution Time: 150.257 ms

The LIMIT 10 plan runs for about 8 s, with shared hit=861100 buffer hits plus read=42985 disk reads (roughly 7 GB of 8 kB pages touched) and 1,521,796 rows removed by the filter.
The LIMIT 100 plan runs in about 0.1 s, with shared hit=694 and read=274, and only 1,218 rows removed by the filter.
The LIMIT 10 plan is clearly abnormal: it reads far too much data before it finds the qualifying rows, and that is why the SQL is so slow.

Statistics analysis

The estimated cost is low, yet in reality a huge number of index rows have to be scanned, so the first suspicion is whether the statistics are accurate.
Table-level statistics:

[postgres@cnsz381785:7169/(rasesql)phmamp][10-30.15:01:26]M=#  select relpages,reltuples::bigint from pg_class where relname='tablelzl1';
 relpages | reltuples 
----------+-----------
    91172 |   2280874   -- roughly matches count(*) on the table

Column-level statistics:

[phmampopr@cnsz381785:7169/(rasesql)phmamp][10-27.17:08:48]M=>  select * from pg_stats where tablename='tablelzl1' and attname='column2';
-[ RECORD 1 ]----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | public
tablename              | tablelzl1
attname                | column2
inherited              | f
null_frac              | 0
avg_width              | 18
n_distinct             | -0.11990886
most_common_vals       | {applyno20231112,DY20190723006650,DY20200102012899,DY20180827000557,DY20190524001304,DY20190529001885,DY20190728002359}
most_common_freqs      | {0.0005,0.00026666667,0.00023333334,0.0002,0.0002,0.0002,0.0002}
histogram_bounds       | {CULZF0000121605605,DSNEW0000126854232,DSNEW0000137652871,DY20160516001057,DY20161104005509,DY20170306002677,DY20170703010428,DY20170928013517,DY20180410007383,DY20180615002936,DY20180
correlation            | 0.3131596
most_common_elems      | [null]
most_common_elem_freqs | [null]
elem_count_histogram   | [null]

This column2 value, applyno20231112, happens to be the top entry in most_common_vals, with an estimated frequency of 0.0005. The estimated row count is 2280874 × 0.0005 ≈ 1140, close to the actual 1232 rows:

[postgres@cnsz381785:7169/(rasesql)phmamp][10-30.15:05:28]M=# select count(*) from tablelzl1 where column2 = 'applyno20231112';
 count 
-------
  1232

So the statistics are accurate; in fact, running ANALYZE to regather them would not solve this problem.

Quantifying the skewed data distribution

With the current statistics, about 1140 rows qualify, so on average the planner expects to scan 2280874 / 1140 ≈ 2000 entries of the sort-column index to find one qualifying row. Finding 10 rows means roughly 20,000 index entries, and 100 rows roughly 200,000.
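As a sanity check, the same numbers can be derived straight from the statistics collected above (a sketch only; the 0.0005 is the most_common_freqs entry for 'applyno20231112'):

 select reltuples::bigint             as total_rows,                 -- 2280874
        (reltuples * 0.0005)::bigint  as estimated_matching_rows,    -- ~1140
        (1   / 0.0005)::bigint        as index_rows_per_match,       -- ~2000
        (10  / 0.0005)::bigint        as rows_scanned_for_limit_10,  -- ~20000
        (100 / 0.0005)::bigint        as rows_scanned_for_limit_100  -- ~200000
 from pg_class
 where relname = 'tablelzl1';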

Disable sorting so that the LIMIT 100 statement is forced onto the sort-column index:

M=# set enable_sort=off;
SET
-- the LIMIT 100 plan
 Limit  (cost=0.43..15222.69 rows=100 width=990)
   ->  Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri  (cost=0.43..158007.45 rows=1038 width=990)
         Filter: (((column1)::text = 'AAAA'::text) AND ((column2)::text = 'applyno20231112'::text))
         SubPlan 1
           ->  Index Scan using uk_tablelzl2_ii on tablelzl2 cl  (cost=0.27..5.29 rows=1 width=18)
                 Index Cond: (((item_no)::text = 'manualSign'::text) AND ((item_name)::text = (ri.manual_sign)::text))

With LIMIT 10 changed to LIMIT 100, the cost of this plan rises from 1522.66 to 15222.69, basically just a plain ×10. Since 15222.69 is now larger than the 3162.78 of the plan that uses the filter-column index, LIMIT 10 and LIMIT 100 end up with different plans and different indexes.

All of the estimates above assume that the qualifying rows are scattered evenly through the sort-column index. In reality the rows might sit right at the end (where a backward scan starts), in which case they are found almost immediately; or they might all sit in the first few leaf pages of the index, in which case nearly the whole index has to be scanned, with a heap fetch for each entry, which is extremely expensive.
So the correlation between the two columns, i.e. where the qualifying rows sit within the sort-column index, determines how efficient that index really is.
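One way to see where the qualifying rows actually sit in the column3 ordering is to count how many index entries a backward scan has to step over before it reaches the first match, i.e. the qualifying row with the largest column3 (a diagnostic sketch against this article's table and column names, ignoring NULLs and ties):

 select count(*) as index_rows_before_first_match
 from   tablelzl1
 where  column3 > (select max(column3)
                   from   tablelzl1
                   where  column1 = 'AAAA'
                     and  column2 = 'applyno20231112');

A large number here means the backward scan is doomed to walk most of the index before producing its first row.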

Look again at how many rows the actual execution had to scan:

   ->  Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri  (cost=0.43..157932.45 rows=1038 width=990) (actual time=23.309..8122.505 rows=10 loops=1)
         Filter: (((column1)::text = 'AAAA'::text) AND ((column2)::text = 'applyno20231112'::text))
         Rows Removed by Filter: 1521796

In reality about 1,521,796 rows were scanned before those 10 rows were found, against an estimate of 20,000: a full 76× off!

Triggering conditions

  • The statement must combine WHERE + ORDER BY + LIMIT
  • Both the sort column and the filter column have indexes
  • The LIMIT is usually not particularly large
  • The data distribution is skewed (a minimal reproduction sketch follows the list)
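A minimal way to stage this kind of skew in PostgreSQL looks roughly like this (a sketch with made-up table and column names; depending on your cost settings you may still need SET enable_sort = off to see the backward-scan plan chosen):

 -- ~1,000,000 rows that never match the filter, plus a small set of 'rare' rows
 -- whose sort-column values all sit at the far end of idx_tlzl_pg_a, so a backward
 -- scan of that index walks almost everything before finding them
 create table tlzl_pg (a int, b text);
 insert into tlzl_pg select g, 'test' from generate_series(1, 1000000) g;
 insert into tlzl_pg select g, 'rare' from generate_series(-1000, -1) g;
 create index idx_tlzl_pg_a on tlzl_pg (a);
 create index idx_tlzl_pg_b on tlzl_pg (b);
 analyze tlzl_pg;
 explain (analyze, buffers)
 select * from tlzl_pg where b = 'rare' order by a desc limit 10;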

Solution

Rewrite the SQL: wrap the ORDER BY column in an expression so that its index can no longer be used for the ordering:

 SELECT
        *,
        (select cl.ITEM_DESC from tablelzl2 cl where cl.ITEM_NAME = RI.MANUAL_SIGN AND cl.ITEM_NO='manualSign') AS "manualSign"
 FROM
        tablelzl1 RI
 WHERE  RI.column1 = 'AAAA'
        AND RI.column2 = 'applyno20231112'
 ORDER BY
        RI.column3 + '0' DESC limit 10
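Another option worth evaluating (not part of the original troubleshooting, so treat it only as a sketch) is a composite index covering both the filter and the sort column, so a single index can satisfy the WHERE clause and return rows already in ORDER BY order:

 -- hypothetical composite index: filter on column2, rows come back already sorted by column3
 create index idx_tablelzl1_col2_col3 on tablelzl1 (column2, column3 desc);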

How Oracle handles it

Differences in how plan cost is estimated

Judging from the plans above, PostgreSQL's cost figures take some getting used to: an upper node's cost can be smaller than the cost of the node underneath it, rather than accumulating step by step the way Oracle's costs do.
Here is a small experiment on both Oracle and PostgreSQL with a table that stores nothing but rows where the column equals 'x', to compare how the two engines compute cost:
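The post does not show how testlzl was populated; a setup along these lines (an assumption inferred from the row count in the plans) reproduces the shape of the experiment:

 -- assumed setup: ~1,048,576 rows, every one of them with col1 = 'x'
 create table testlzl (col1 varchar(2));
 insert into testlzl select 'x' from generate_series(1, 1048576) g;
 analyze testlzl;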

[postgres@cnsz381785:7169/(rasesql)dbmgr][10-31.14:32:19]M=# explain select * from testlzl where col1='x' limit 1;
                              QUERY PLAN                               
-----------------------------------------------------------------------
 Limit  (cost=0.00..0.02 rows=1 width=2)
   ->  Seq Scan on testlzl  (cost=0.00..17747.20 rows=1048576 width=2)
         Filter: ((col1)::text = 'x'::text)
[postgres@cnsz381785:7169/(rasesql)dbmgr][10-31.14:32:30]M=# explain select * from testlzl where col1='xx' limit 1;
                           QUERY PLAN                            
-----------------------------------------------------------------
 Limit  (cost=0.00..17747.20 rows=1 width=2)
   ->  Seq Scan on testlzl  (cost=0.00..17747.20 rows=1 width=2)
         Filter: ((col1)::text = 'xx'::text)

col1='x' can be found immediately, but the LIMIT logic is not pushed down into the seq scan's cost: its total cost stays at 17747.20, the same as scanning the whole table. Although the LIMIT is not folded into the inner node's cost, the row estimate is taken into account!
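The 0.02 comes from the same pro-rata rule seen earlier: the Limit's cost is the Seq Scan's total cost scaled by the fraction of rows it expects to pull (a quick check of the arithmetic):

 col1 = 'x' : 17747.20 * (1 / 1048576) ≈ 0.017, shown as 0.02
 col1 = 'xx': 17747.20 * (1 / 1)       = 17747.20   -- the child's row estimate is 1, so the Limit keeps the full scan cost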

Now let's see what Oracle's execution plan does:

SYS@t8icss1> select * from dbmgr.testlzl where a='x' and rownum<=1;

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2045386539

------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         |     1 |     2 |     2   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY     |         |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| TESTLZL |     1 |     2 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   2 - filter("A"='x')

SYS@t8icss1> select * from dbmgr.testlzl where a='xx' and rownum<=1;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 2045386539

------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         |     1 |     2 |   302   (2)| 00:00:01 |
|*  1 |  COUNT STOPKEY     |         |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| TESTLZL |     1 |     2 |   302   (2)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   2 - filter("A"='xx')

In Oracle's plan, since the a='x' row can be found immediately, the effect of the STOPKEY is factored into the inner operation's cost, which is only 2, even though a full scan of the table would actually cost about 302 (as the a='xx' plan shows).

This is an important difference between Oracle and PostgreSQL in how cost is computed:

  • In Oracle, an outer node's cost is always >= the inner node's cost; in PostgreSQL that is not necessarily true.
  • Oracle's inner-node cost reflects outer operators (such as STOPKEY); PostgreSQL's does not, and the child path simply reports its full cost.

Skewed data distribution in Oracle

Knowing how the skew causes the problem, it is enough to craft a couple of rows and place them at the ends of the sort index:

create table tlzl(a char(100) not null,b char(100) not null);
-- insert bulk data
begin
for i in 1..100000 loop
insert into tlzl values('test','test');
end loop;
end;
/
-- insert the special rows, one at each end of the index order
insert into tlzl values('aaaa','aaaa');
insert into tlzl values('zzzz','zzzz');
-- create the indexes
create index idx_a on tlzl(a);
create index idx_b on tlzl(b);
-- gather statistics
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'SYS',TABNAME=>'TLZL',estimate_percent => 10, degree=>1,METHOD_OPT=>'FOR ALL COLUMNS SIZE AUTO',cascade=>true);
select * from (select /*+ index(tlzl idx_a)*/* from tlzl where b='aaaa' order by a) where rownum<=1;
select * from (select /*+ index(tlzl idx_a)*/* from tlzl where b='zzzz' order by a) where rownum<=1;
SYS@t8icss1> select * from (select /*+ index(tlzl idx_a)*/* from tlzl where b='aaaa' order by a) where rownum<=1; 

Execution Plan
----------------------------------------------------------
Plan hash value: 3674066029

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |   204 |  2210   (1)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |       |       |       |            |          |
|   2 |   VIEW                        |       |     1 |   204 |  2210   (1)| 00:00:01 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| TLZL  |     1 |   202 |  2210   (1)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | IDX_A | 98830 |       |   779   (1)| 00:00:01 |
---------------------------------------------------------------------------------------
SYS@t8icss1> select * from (select /*+ index(tlzl idx_a)*/* from tlzl where b='zzzz' order by a) where rownum<=1; 

Execution Plan
----------------------------------------------------------
Plan hash value: 3674066029

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |   204 |  2210   (1)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |       |       |       |            |          |
|   2 |   VIEW                        |       |     1 |   204 |  2210   (1)| 00:00:01 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| TLZL  |     1 |   202 |  2210   (1)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | IDX_A | 98830 |       |   779   (1)| 00:00:01 |
---------------------------------------------------------------------------------------

Oracle's optimizer behaves the same way: it has no idea where in the index the data actually sits, so a row at the first index entry and a row at the last one get exactly the same estimated cost.
Oracle does, however, offer several ways to address this, such as extended statistics, Automatic Column Group Detection, and fixing the execution plan (plan stability).
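For reference, the extended statistics mentioned above would look roughly like this on the experiment table (a sketch; a column group improves the combined selectivity estimate for (A,B), and 12c's Automatic Column Group Detection can create such groups automatically):

 -- sketch: create a column group on (A,B), then regather statistics so it gets populated
 select dbms_stats.create_extended_stats('SYS', 'TLZL', '(A,B)') from dual;
 exec dbms_stats.gather_table_stats(ownname=>'SYS', tabname=>'TLZL', method_opt=>'FOR ALL COLUMNS SIZE AUTO FOR COLUMNS (A,B) SIZE AUTO');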

References

http://www.postgres.cn/v2/news/viewone/1/717
https://oracle-base.com/articles/12c/automatic-column-group-detection-extended-statistics-12cr1
