explain返回行是通过对应表的列的统计信息来评估的,在系统表pg_stats中。
建表,插入测试数据:
postgres=# create table tb15(id integer,name character varying);
CREATE TABLE
postgres=# insert into tb15 select generate_series(1,10000),'john';
INSERT 0 10000
postgres=# analyze tb15;
ANALYZE
例1. where条件完全落在柱状图的边界值。
postgres=# explain(analyze,verbose,buffers)select * from tb15 where id<1000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on public.tb15 (cost=0.00..180.00 rows=1000 width=9) (actual time=0.009..0.821 rows=999 loops=1)
Output: id, name
Filter: (tb15.id < 1000)
Rows Removed by Filter: 9001
Buffers: shared hit=55
Total runtime: 0.901 ms
(6 rows)
postgres=# \x
Expanded display is on
postgres=# select * from pg_stats where tablename ='tb15' and attname='id';
-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | tb15
attname | id
inherited | f
null_frac | 0
avg_width | 4
n_distinct | -1
most_common_vals |
most_common_freqs |
histogram_bounds | {1,100,200,300,400,500,600,700,800,900,1000,1100,1200,1300,1400,1500,1600,1700,1800,1900,2000,2100,2200,2300,2400,2500,2600,2700,2800,2900,3000,3100,3200,3300,3400,3500,3600,3700,3800,3900,4000,4100,4200,4300,4400,4500,4600,4700,4800,4900,5000,5100,5200,5300,5400,5500,5600,5700,5800,5900,6000,6100,6200,6300,6400,6500,6600,6700,6800,6900,7000,7100,7200,7300,7400,7500,7600,7700,7800,7900,8000,8100,8200,8300,8400,8500,8600,8700,8800,8900,9000,9100,9200,9300,9400,9500,9600,9700,9800,9900,10000}
correlation | 1
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |
当where条件不是等于的时候,且n_distinct为-1的时候(表示该字段的值都是唯一的),可以参考pg_stats表中的histogram_bounds字段的统计值,也就是表字段的柱状图。
postgres=# show default_statistics_target ;
default_statistics_target
---------------------------
100
(1 row)
这里提一下也可以修改字段的统计的default_statistics_target值
postgres=# \h alter table
Command: ALTER TABLE
Description: change the definition of a table
Syntax:
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
action [, ... ]
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
RENAME [ COLUMN ] column_name TO new_column_name
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ * ]
RENAME CONSTRAINT constraint_name TO new_constraint_name
ALTER TABLE [ IF EXISTS ] name
RENAME TO new_name
ALTER TABLE [ IF EXISTS ] name
SET SCHEMA new_schema
where action is one of:
ADD [ COLUMN ] column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
DROP [ COLUMN ] [ IF EXISTS ] column_name [ RESTRICT | CASCADE ]
ALTER [ COLUMN ] column_name [ SET DATA ] TYPE data_type [ COLLATE collation ] [ USING expression ]
ALTER [ COLUMN ] column_name SET DEFAULT expression
ALTER [ COLUMN ] column_name DROP DEFAULT
ALTER [ COLUMN ] column_name { SET | DROP } NOT NULL
ALTER [ COLUMN ] column_name SET STATISTICS integer --设置字段的default_statistics_target值
ALTER [ COLUMN ] column_name SET ( attribute_option = value [, ... ] )
ALTER [ COLUMN ] column_name RESET ( attribute_option [, ... ] )
ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
ADD table_constraint [ NOT VALID ]
...
..
.
分析rows=1000是怎么来的?
postgres=# explain(analyze,verbose,buffers)select * from tb15 where id<920;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Seq Scan on public.tb15 (cost=0.00..180.00 rows=920 width=9) (actual time=0.013..2.352 rows=919 loops=1)
Output: id, name
Filter: (tb15.id < 920)
Rows Removed by Filter: 9081
Buffers: shared hit=55
Total runtime: 2.453 ms
(6 rows)
现在条件是id<920,观察上面的柱状图,id<900的记录完整落在柱状图的前面9个区间里(1,100,200,300,400,500,600,700,800,900),id>=900 and id<920的落在了第10个区间,这个时候rows=920的计算方法是:
rows=((9+(920-900)/(1000-900.0))/100)*10000=920.
10000:10000行数据。
postgres=# insert into tb15 select 1,'join' from generate_series(1,500);
INSERT 0 500
postgres=# insert into tb15 select 2,'join' from generate_series(1,300);
INSERT 0 300
postgres=# insert into tb15 select 3,'join' from generate_series(1,200);
INSERT 0 200
postgres=# insert into tb15 select 4,'join' from generate_series(1,100);
INSERT 0 100
postgres=# analyze tb15;
ANALYZE
postgres=# select relname,relpages,reltuples from pg_class where relname='tb15';
relname | relpages | reltuples
---------+----------+-----------
tb15 | 60 | 11100
(1 row)
现在总共11100条记录。
看一个执行计划:postgres=# explain(analyze,verbose,buffers)select * from tb15 where id=1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Seq Scan on public.tb15 (cost=0.00..198.75 rows=501 width=9) (actual time=0.008..0.922 rows=501 loops=1)
Output: id, name
Filter: (tb15.id = 1)
Rows Removed by Filter: 10599
Buffers: shared hit=60
Total runtime: 0.962 ms
(6 rows)
分析rows=501是怎么来的?
postgres=# select * from pg_stats where tablename ='tb15' and attname='id';
-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | tb15
attname | id
inherited | f
null_frac | 0
avg_width | 4
n_distinct | -0.900901
most_common_vals | {1,2,3,4} --高频值
most_common_freqs | {0.0451351,0.0271171,0.0181081,0.0090991} --高频值对应的占比
histogram_bounds | {5,104,204,304,404,504,604,704,804,904,1004,1104,1204,1304,1404,1504,1604,1704,1804,1904,2004,2103,2203,2303,2403,2503,2603,2703,2803,2903,3003,3103,3203,3303,3403,3503,3603,3703,3803,3903,4003,4102,4202,4302,4402,4502,4602,4702,4802,4902,5002,5102,5202,5302,5402,5502,5602,5702,5802,5902,6002,6101,6201,6301,6401,6501,6601,6701,6801,6901,7001,7101,7201,7301,7401,7501,7601,7701,7801,7901,8001,8100,8200,8300,8400,8500,8600,8700,8800,8900,9000,9100,9200,9300,9400,9500,9600,9700,9800,9900,10000}
correlation | 0.464526
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |
可以看到柱状图是从5开始的,柱状图已经把高频值排除在外。
postgres=# explain(analyze,verbose,buffers)select * from tb15 where id<3;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Seq Scan on public.tb15 (cost=0.00..198.75 rows=803 width=9) (actual time=0.010..1.174 rows=802 loops=1)
Output: id, name
Filter: (tb15.id < 3)
Rows Removed by Filter: 10298
Buffers: shared hit=60
Total runtime: 1.233 ms
(6 rows)
rows=11100*(0.0451351+0.0271171)=801.9994200;
postgres=# explain(analyze,verbose,buffers)select * from tb15 where id=1 or id>9900;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Seq Scan on public.tb15 (cost=0.00..226.50 rows=596 width=9) (actual time=0.009..1.673 rows=601 loops=1)
Output: id, name
Filter: ((tb15.id = 1) OR (tb15.id > 9900))
Rows Removed by Filter: 10499
Buffers: shared hit=60
Total runtime: 1.726 ms
(6 rows)
rows = 11100*(0.0451351)+1/100.0*(1-0.0451351-0.0271171-0.0181081-0.0090991)*11100=600.9596166
postgres=# explain select * from tb15 where id=1 and name='john';
QUERY PLAN
--------------------------------------------------------
Seq Scan on tb15 (cost=0.00..226.50 rows=451 width=9)
Filter: ((id = 1) AND ((name)::text = 'john'::text))
(2 rows
postgres=# select * from pg_stats where tablename ='tb15' and attname='id';--id字段的统计信息
-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | tb15
attname | id
inherited | f
null_frac | 0
avg_width | 4
n_distinct | -0.900901
most_common_vals | {1,2,3,4}
most_common_freqs | {0.0451351,0.0271171,0.0181081,0.0090991}
histogram_bounds | {5,104,204,304,404,504,604,704,804,904,1004,1104,1204,1304,1404,1504,1604,1704,1804,1904,2004,2103,2203,2303,2403,2503,2603,2703,2803,2903,3003,3103,3203,3303,3403,3503,3603,3703,3803,3903,4003,4102,4202,4302,4402,4502,4602,4702,4802,4902,5002,5102,5202,5302,5402,5502,5602,5702,5802,5902,6002,6101,6201,6301,6401,6501,6601,6701,6801,6901,7001,7101,7201,7301,7401,7501,7601,7701,7801,7901,8001,8100,8200,8300,8400,8500,8600,8700,8800,8900,9000,9100,9200,9300,9400,9500,9600,9700,9800,9900,10000}
correlation | 0.464526
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |
postgres=# select * from pg_stats where tablename ='tb15' and attname='name'; --name字段的统计信息
-[ RECORD 1 ]----------+---------------------
schemaname | public
tablename | tb15
attname | name
inherited | f
null_frac | 0
avg_width | 5
n_distinct | 2
most_common_vals | {john,join}
most_common_freqs | {0.900901,0.0990991}
histogram_bounds |
correlation | 1
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |
rows = 11100*(0.0451351*0.900901)=451.3510496486100。