1. Data volume:

Table | Row count |
---|---|
f_invoice | 87346130 |
f_invoice_item | 97535867 |
2. Indexes:

Table: f_invoice_item

    CREATE INDEX f_invoice_item_order_item_id_idx ON ins_dw_prd12.f_invoice_item USING btree (order_item_id);
    CREATE INDEX f_invoice_item_invoice_id_idx ON ins_dw_prd12.f_invoice_item USING btree (invoice_id) WITH (fillfactor='100');

Table: f_invoice

    CREATE INDEX idx_f_invoice_gin ON ins_dw_prd12.f_invoice USING gin (source_type, invoice_type, invoice_status, invoice_title, invoice_date, seller_taxer_code, shop_id, create_time);
    CREATE INDEX idx_f_invoice_invoice_date ON ins_dw_prd12.f_invoice USING btree (invoice_date) WITH (fillfactor='100');
    CREATE INDEX idx_f_invoice_seller_taxer_code ON ins_dw_prd12.f_invoice USING btree (seller_taxer_code) WITH (fillfactor='100');
    CREATE INDEX idx_invoice_createtime_btree ON ins_dw_prd12.f_invoice USING btree (create_time) WITH (fillfactor='100');
Query before optimization:

    explain(analyse, timing)
    SELECT count(*)
    FROM (
        SELECT fi.invoice_id
        FROM ins_dw_prd12.f_invoice fi
        WHERE fi.seller_taxer_code IN ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
          AND fi.create_time >= '2020-01-01 00:00:00'
          AND fi.create_time <= '2020-01-31 00:00:00'
    ) AS mm
    INNER JOIN ins_dw_prd12.f_invoice_item fit ON fit.invoice_id = mm.invoice_id
    INNER JOIN ins_dw_prd12.f_invoice m ON m.invoice_id = mm.invoice_id

Execution plan:

    Finalize Aggregate (cost=3083416.86..3083416.87 rows=1 width=8) (actual time=85251.980..85251.980 rows=1 loops=1)
      -> Gather (cost=3083416.44..3083416.85 rows=4 width=8) (actual time=85251.097..85269.008 rows=5 loops=1)
           Workers Planned: 4
           Workers Launched: 4
           -> Partial Aggregate (cost=3082416.44..3082416.45 rows=1 width=8) (actual time=85244.739..85244.739 rows=1 loops=5)
                -> Nested Loop (cost=184106.68..3082211.80 rows=81856 width=0) (actual time=2308.041..85237.967 rows=57076 loops=5)
                     -> Parallel Hash Join (cost=184106.11..2879308.96 rows=81856 width=16) (actual time=2307.992..85029.464 rows=57076 loops=5)
                          Hash Cond: (fit.invoice_id = fi.invoice_id)
                          -> Parallel Seq Scan on f_invoice_item fit (cost=0.00..2631148.52 rows=24401652 width=8) (actual time=0.466..79746.085 rows=19507465 loops=5)
                          -> Parallel Hash (cost=183190.09..183190.09 rows=73282 width=8) (actual time=334.243..334.243 rows=54056 loops=5)
                               Buckets: 524288 Batches: 1 Memory Usage: 14752kB
                               -> Parallel Index Scan using idx_invoice_createtime_btree on f_invoice fi (cost=0.57..183190.09 rows=73282 width=8) (actual time=0.177..314.460 rows=54056 loops=5)
                                    Index Cond: ((create_time >= '2020-01-01 00:00:00'::timestamp without time zone) AND (create_time <= '2020-01-31 00:00:00'::timestamp without time zone))
                                    Filter: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
                                    Rows Removed by Filter: 455651
                     -> Index Only Scan using f_invoice_pkey on f_invoice m (cost=0.57..2.48 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=285380)
                          Index Cond: (invoice_id = fit.invoice_id)
                          Heap Fetches: 285380
    Planning Time: 8.120 ms
    Execution Time: 85269.153 ms
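The most expensive step is the following:
The plan runs a parallel sequential scan on f_invoice_item with loops=5 and rows=19507465 per loop, even though the table holds only about 97 million rows in total. This scan alone accounts for roughly 79.7 of the 85 seconds.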
-> Parallel Seq Scan on f_invoice_item fit (cost=0.00..2631148.52 rows=24401652 width=8) (actual time=0.466..79746.085 rows=19507465 loops=5)
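
To put the `rows` and `loops` figures in context: in a parallel plan, `rows` is reported as an average per loop, and the 5 loops here correspond to the leader plus the 4 launched workers. A quick check, using only numbers from the plan and the row-count table above, shows that the five processes together read the table about once rather than five times over:

```sql
-- rows per loop (from the plan) multiplied by the number of loops,
-- compared with the table's row count from section 1
SELECT 19507465 * 5 AS rows_scanned_by_all_processes,  -- 97537325
       97535867     AS f_invoice_item_row_count;
```

The small difference between the two values comes from the per-loop figure being a rounded average.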
Question: f_invoice_item has the index f_invoice_item_invoice_id_idx on invoice_id, so why does the planner not use it?
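
One way to investigate this, sketched here as a diagnostic and not as a fix, is to discourage sequential scans for a single transaction and compare the planner's cost estimates. If the index-based plan shows a higher estimated cost than the 3083416.87 of the plan above, the planner skipped the index because it believed the index path was more expensive, not because the index was unusable. Plain `EXPLAIN` is used so the query is only planned, not executed.

```sql
BEGIN;

-- Applies to this transaction only; reverted by ROLLBACK
SET LOCAL enable_seqscan = off;

EXPLAIN
SELECT count(*)
FROM (
    SELECT fi.invoice_id
    FROM ins_dw_prd12.f_invoice fi
    WHERE fi.seller_taxer_code IN ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
      AND fi.create_time >= '2020-01-01 00:00:00'
      AND fi.create_time <= '2020-01-31 00:00:00'
) AS mm
INNER JOIN ins_dw_prd12.f_invoice_item fit ON fit.invoice_id = mm.invoice_id
INNER JOIN ins_dw_prd12.f_invoice m ON m.invoice_id = mm.invoice_id;

ROLLBACK;
```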
Query after optimization:

    explain(analyse, timing)
    SELECT count(*)
    FROM (
        SELECT *
        FROM ins_dw_prd12.f_invoice fi
        WHERE fi.seller_taxer_code IN ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
          AND fi.create_time >= '2020-03-01 00:00:00'
          AND fi.create_time <= '2020-03-31 00:00:00'
    ) m
    INNER JOIN (
        SELECT *
        FROM ins_dw_prd12.f_invoice_item
        WHERE invoice_id IN (
            SELECT fi.invoice_id
            FROM ins_dw_prd12.f_invoice fi
            WHERE fi.seller_taxer_code IN ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
              AND fi.create_time >= '2020-03-01 00:00:00'
              AND fi.create_time <= '2020-03-31 00:00:00'
        )
    ) fit ON fit.invoice_id = m.invoice_id

Execution plan:

    Finalize Aggregate (cost=428280.97..428280.98 rows=1 width=8) (actual time=2400.367..2400.367 rows=1 loops=1)
      -> Gather (cost=428280.55..428280.96 rows=4 width=8) (actual time=2399.218..2432.599 rows=5 loops=1)
           Workers Planned: 4
           Workers Launched: 4
           -> Partial Aggregate (cost=427280.55..427280.56 rows=1 width=8) (actual time=2394.585..2394.585 rows=1 loops=5)
                -> Nested Loop (cost=203100.20..427279.71 rows=334 width=0) (actual time=1465.895..2388.019 rows=52988 loops=5)
                     -> Parallel Hash Join (cost=203099.63..405399.83 rows=299 width=16) (actual time=1459.954..1850.252 rows=47458 loops=5)
                          Hash Cond: (fi.invoice_id = fi_1.invoice_id)
                          -> Parallel Index Scan using idx_invoice_createtime_btree on f_invoice fi (cost=0.57..202088.56 rows=80840 width=8) (actual time=0.313..363.616 rows=47458 loops=5)
                               Index Cond: ((create_time >= '2020-03-01 00:00:00'::timestamp without time zone) AND (create_time <= '2020-03-31 00:00:00'::timestamp without time zone))
                               Filter: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
                               Rows Removed by Filter: 601517
                          -> Parallel Hash (cost=202088.56..202088.56 rows=80840 width=8) (actual time=1459.076..1459.076 rows=47458 loops=5)
                               Buckets: 524288 Batches: 1 Memory Usage: 13472kB
                               -> Parallel Index Scan using idx_invoice_createtime_btree on f_invoice fi_1 (cost=0.57..202088.56 rows=80840 width=8) (actual time=1.947..1438.735 rows=47458 loops=5)
                                    Index Cond: ((create_time >= '2020-03-01 00:00:00'::timestamp without time zone) AND (create_time <= '2020-03-31 00:00:00'::timestamp without time zone))
                                    Filter: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
                                    Rows Removed by Filter: 601517
                     -> Index Only Scan using f_invoice_item_invoice_id_idx on f_invoice_item (cost=0.57..70.85 rows=233 width=8) (actual time=0.011..0.011 rows=1 loops=237290)
                          Index Cond: (invoice_id = fi_1.invoice_id)
                          Heap Fetches: 264945
    Planning Time: 0.591 ms
    Execution Time: 2432.666 ms
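
In this plan, both scans of f_invoice use idx_invoice_createtime_btree for the date range and then discard most of what they read (Rows Removed by Filter: 601517 per loop against 47458 rows kept). A composite index that covers both predicates might let the scan skip those rows. The following is an untested sketch: the index name is hypothetical, and whether the planner would choose it on this data has not been verified.

```sql
-- Hypothetical index name; equality/IN column first, range column second.
-- CONCURRENTLY avoids blocking writes on f_invoice while the index builds.
CREATE INDEX CONCURRENTLY idx_f_invoice_seller_createtime
    ON ins_dw_prd12.f_invoice USING btree (seller_taxer_code, create_time);
```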
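Execution time drops from about 85 seconds before optimization to about 2.4 seconds after, a speedup of roughly 35x (85269 ms / 2433 ms). Note that the two runs filter on different months (January vs. March 2020), so the comparison is approximate.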
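
A possible explanation for why the original plan skipped the index, offered as a hypothesis and not a confirmed diagnosis: in the optimized plan, each lookup on f_invoice_item_invoice_id_idx is estimated at rows=233 but actually returns rows=1. That gap suggests the planner underestimates how many distinct invoice_id values f_invoice_item contains, which inflates the estimated cost of index lookups and makes a full scan look cheaper. The sketch below checks the stored statistic and, if it is indeed far too low, overrides it. The value -0.9 is an illustrative assumption based on the row counts above (about 87 million invoices for about 97.5 million items) and should be adjusted to the real ratio.

```sql
-- Inspect the planner's current distinct-value estimate for the join column.
-- A negative n_distinct is a fraction of the row count; a positive one is an absolute count.
SELECT schemaname, tablename, attname, n_distinct
FROM pg_stats
WHERE schemaname = 'ins_dw_prd12'
  AND tablename  = 'f_invoice_item'
  AND attname    = 'invoice_id';

-- Assumption: distinct invoice_id values are about 90% of the row count.
ALTER TABLE ins_dw_prd12.f_invoice_item
    ALTER COLUMN invoice_id SET (n_distinct = -0.9);

-- The override only takes effect once the table is analyzed again.
ANALYZE ins_dw_prd12.f_invoice_item;
```

After the ANALYZE, rerunning the original query with EXPLAIN would show whether the planner now prefers the index without the query having to be rewritten.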