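I previously wrote up a case in which an ORDER BY ... LIMIT query picked the wrong index because the data was unevenly distributed. The cause was that the optimizer has no way to determine how the data is actually distributed, so by default it assumes the distribution is uniform.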
bill=# create table tbl (id int, c1 int, c2 int, c3 int, c4 int);
CREATE TABLE
bill=# insert into tbl select generate_series(1,10000000), random()*100, random()*100, random()*100, random()*100;
INSERT 0 10000000
3. Insert another batch of 1 million rows whose c1 and c2 values differ from those of the first 10 million rows.
bill=# insert into tbl select generate_series(10000001,11000000), 200,200,200,200;
INSERT 0 1000000
bill=# create index idx_tbl_1 on tbl(id);
CREATE INDEX
bill=# create index idx_tbl_2 on tbl(c1,c2,c3,c4);
CREATE INDEX
bill=# vacuum analyze tbl;
VACUUM
bill=# explain select * from tbl where c1=200 and c2=200 order by id;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Sort  (cost=72109.20..72344.16 rows=93984 width=20)
   Sort Key: id
   ->  Bitmap Heap Scan on tbl  (cost=1392.77..60811.81 rows=93984 width=20)
         Recheck Cond: ((c1 = 200) AND (c2 = 200))
         ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..1369.28 rows=93984 width=0)
               Index Cond: ((c1 = 200) AND (c2 = 200))
(6 rows)

bill=# begin;
BEGIN
bill=*# explain declare tt cursor for select * from tbl where c1=200 and c2=200 order by id;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using idx_tbl_1 on tbl  (cost=0.43..329277.60 rows=93984 width=20)
   Filter: ((c1 = 200) AND (c2 = 200))
(2 rows)
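For this table the optimizer assumes the data is evenly distributed, but in reality it is not: the rows with c1=200 and c2=200 all sit at the end of the table. For a cursor, the planner assumes that only the first 10% of the rows will be fetched (the default cursor_tuple_fraction is 0.1), so it favors a plan that can start returning rows quickly. Here that assumption backfires. The index scan on id has to walk past the first 10 million non-matching rows before it reaches a single match, so the plan chosen for the cursor is the wrong one.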
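The effect of the 10% assumption can be checked by hand against the two plans shown above. The sketch below assumes that, for a fraction f strictly between 0 and 1, the planner ranks candidate paths by interpolating linearly between startup cost and total cost. That is my reading of how fractional path costs are compared, not something the output above states, and the planner weighs more candidate paths than these two.

```latex
\begin{align*}
\text{cost}(f) &= \text{startup} + f \cdot (\text{total} - \text{startup}) \\
\text{Index Scan on idx\_tbl\_1:}\quad & 0.43 + 0.1 \times (329277.60 - 0.43) \approx 32928.15 \\
\text{Sort over Bitmap Heap Scan:}\quad & 72109.20 + 0.1 \times (72344.16 - 72109.20) \approx 72132.70
\end{align*}
```

At f = 0.1 the index scan on id looks roughly twice as cheap, because nearly all of the sort plan's cost is paid before the first row comes out. When all rows are wanted, the total costs are compared instead (72344.16 versus 329277.60) and the sort plan wins, which matches the plan of the plain query.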
The planner code below is what leads to the wrong plan being chosen:

    /* Determine what fraction of the plan is likely to be scanned */
    if (cursorOptions & CURSOR_OPT_FAST_PLAN)
    {
        /*
         * We have no real idea how many tuples the user will ultimately FETCH
         * from a cursor, but it is often the case that he doesn't want 'em
         * all, or would prefer a fast-start plan anyway so that he can
         * process some of the tuples sooner.  Use a GUC parameter to decide
         * what fraction to optimize for.
         */
        tuple_fraction = cursor_tuple_fraction;

        /*
         * We document cursor_tuple_fraction as simply being a fraction, which
         * means the edge cases 0 and 1 have to be treated specially here.  We
         * convert 1 to 0 ("all the tuples") and 0 to a very small fraction.
         */
        if (tuple_fraction >= 1.0)
            tuple_fraction = 0.0;
        else if (tuple_fraction <= 0.0)
            tuple_fraction = 1e-10;
    }
    else
    {
        /* Default assumption is we need all the tuples */
        tuple_fraction = 0.0;
    }
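Since the code above reads the fraction from the cursor_tuple_fraction setting, one way to test this explanation is to raise that setting for a single transaction and look at the cursor's plan again. This is a minimal sketch that assumes the tbl table and indexes built earlier. The value 1.0 is my choice for illustration and is not a recommendation from the original case.

```sql
-- Assumption: tbl, idx_tbl_1 and idx_tbl_2 exist as created above.
begin;

-- Per the planner code above, a value >= 1.0 is converted to
-- tuple_fraction = 0.0 ("all the tuples"), so the cursor is planned
-- for total cost instead of for a fast start.
-- SET LOCAL limits the change to this transaction.
set local cursor_tuple_fraction = 1.0;

explain declare tt cursor for
  select * from tbl where c1=200 and c2=200 order by id;

rollback;
```

With the fraction treated as "all the tuples", the cursor should be costed the same way as the plain query, so the index scan on idx_tbl_1 would be expected to lose to the plan that uses idx_tbl_2. The trade-off is that cursors which really do fetch only a few rows lose the fast-start preference, so a change like this is probably better scoped to a session or transaction than made globally.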