Repost (not original content)
1. Original article: http://bonesmoses.org/2014/05/08/trumping-the-postgresql-query-planner/
Trumping the PostgreSQL Query Planner
With the release of PostgreSQL 8.4, the community gained the ability to use CTE syntax. As such, this is a fairly old feature, yet it’s still misunderstood in a lot of ways. At the same time, the query planner has been advancing incrementally since that time. Most recently, PostgreSQL has gained the ability to perform index-only scans, making it possible to fetch results straight from the index, without confirming rows with the table data.
Unfortunately, this still isn’t enough. There are still quite a few areas where the PostgreSQL query planner is extremely naive, despite the advances we’ve seen recently. For instance, PostgreSQL still can’t do a basic loose index scan natively. It has to be tricked by using CTE syntax.
To demonstrate this further, imagine this relatively common scenario: an order processing system where clients can order products. What happens when we want to find the most recent order for all current customers? Boiled down to its minimum elements, this extremely simplified table will act as our order system.
CREATE TABLE test_order ( client_id INT NOT NULL, order_date TIMESTAMP NOT NULL, filler TEXT NOT NULL );
Now we need data to test with. We can simulate a relatively old order processing system by taking the current date and subtracting 1,000 days. We can also bootstrap with 10,000 clients, and make the assumption that newer clients will be more active. This allows us to represent clients that have left our services as time goes on. So we start with this test data:
INSERT INTO test_order SELECT s1.id, (CURRENT_DATE - INTERVAL '1000 days')::DATE + generate_series(1, s1.id%1000), repeat(' ', 20) FROM generate_series(1, 10000) s1 (id);
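As a sanity check (not part of the original article), the row count this INSERT produces is easy to compute: each client s1.id contributes s1.id % 1000 rows, one per generated day. A minimal Python sketch:

```python
# Each client id contributes (id % 1000) order rows, so clients whose id
# is a multiple of 1000 get no orders at all. Summing over all 10,000
# clients reproduces the INSERT's reported row count.
total_rows = sum(client_id % 1000 for client_id in range(1, 10001))
clients_with_orders = sum(1 for client_id in range(1, 10001)
                          if client_id % 1000 != 0)
print(total_rows, clients_with_orders)  # 4995000 9990
```

The 9,990 clients with at least one order also explain the rows=9990 figures that appear in the query plans later on.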
The generate_series function is very handy for building fake data. We're still not ready to use that data, however. Since we want to find the most recent order for all customers, we need an index that combines the client_id and order_date columns in such a way that a single lookup will provide the value we want for any particular client. This index should do nicely:
CREATE INDEX idx_test_order_client_id_order_date ON test_order (client_id, order_date DESC);
Finally, we analyze to make sure the PostgreSQL engine has the most recent stats for our table. To keep everything easily repeatable, we also set default_statistics_target to a higher value than the default:
SET default_statistics_target TO 500; ANALYZE test_order;
Now we'll start with the most obvious query. Here, we just use the client_id column and look for the max order_date for each:
EXPLAIN ANALYZE SELECT client_id, max(order_date) FROM test_order GROUP BY client_id;
The query plan is fairly straightforward, and will probably include a sequence scan. On the virtual server we're testing with, the total runtime ended up looking like this:
Total runtime: 1117.408 ms
There is some variance, but the end result is just over one second per execution. We ran this query several times to ensure it was properly cached by PostgreSQL. So why didn't the planner use the index we created? Let's assume the planner doesn't know what max does, and treats it like any other function. With that in mind, we can exploit a different type of syntax that should make the index much more usable. So let's try DISTINCT ON with an explicit ORDER BY clause that matches the definition of our index:
EXPLAIN ANALYZE SELECT DISTINCT ON (client_id) client_id, order_date FROM test_order ORDER BY client_id, order_date DESC;
Well, this time our test system used an index-only scan, and produced the results somewhat faster. Our new runtime looks like this:
Total runtime: 923.300 ms
That's roughly 17% faster than the sequence scan. Depending on how much bigger the table is than the index, the gain from reading the index instead can vary significantly. And while the query time improved, it's still pretty bad. For systems with tens or hundreds of millions of orders, the performance of this query will continue to degrade along with the row count. We're also not really using the index effectively.
Reading the index from top to bottom and pulling out the desired results is faster than reading the whole table. But why should we do that? Due to the way we built this index, the root node for each client should always represent the value we’re looking for. So why doesn’t the planner simply perform a shallow index scan along the root nodes? It doesn’t matter what the reason is, because we can force it to do so. This is going to be ugly, but this query will act just as we described:
EXPLAIN ANALYZE
WITH RECURSIVE skip AS
(
  (SELECT client_id, order_date
     FROM test_order
    ORDER BY client_id, order_date DESC
    LIMIT 1)
  UNION ALL
  (SELECT (SELECT min(client_id)
             FROM test_order
            WHERE client_id > skip.client_id
          ) AS client_id,
          (SELECT max(order_date)
             FROM test_order
            WHERE client_id = (
                    SELECT min(client_id)
                      FROM test_order
                     WHERE client_id > skip.client_id
                  )
          ) AS order_date
     FROM skip
    WHERE skip.client_id IS NOT NULL)
)
SELECT *
  FROM skip;
The query plan for this is extremely convoluted, and we’re not even going to try to explain what it’s doing. But the final query execution time is hard to discount:
Total runtime: 181.501 ms
So what happened here? How can the abusive and ugly CTE above outwit the PostgreSQL query planner? We use the same principle as described in the PostgreSQL wiki for loose index scans. We start with the desired maximum order date for a single client_id, then recursively begin adding clients one by one until the index is exhausted. Due to limitations preventing us from using the recursive element in a sub-query, we have to use the SELECT clause to get the next client ID and the associated order date for that client.
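As an illustration only (an analogy, not the planner's actual implementation), the access pattern of a loose index scan can be sketched in Python, with bisect over a sorted list standing in for B-tree descent: one probe per distinct client_id rather than a visit to every index entry.

```python
import bisect

def loose_index_scan(index):
    """Return (client_id, max(order_date)) pairs from a list sorted by
    (client_id, order_date), probing once per distinct client_id."""
    results = []
    pos, n = 0, len(index)
    while pos < n:
        client_id = index[pos][0]
        # Jump past all of this client's entries in a single probe; the
        # entry just before the jump target holds its max order_date.
        nxt = bisect.bisect_right(index, (client_id, float("inf")))
        results.append((client_id, index[nxt - 1][1]))
        pos = nxt
    return results
```

The amount of work scales with the number of distinct clients rather than the total row count, which is exactly the cardinality behaviour the next paragraph describes.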
This technique works universally for performing sparse index scans, and actually improves as cardinality (the number of unique values) decreases. As unlikely as that sounds, since we are only using the root nodes within the index tree, performance increases when there are fewer root nodes to check. This is the exact opposite of how indexes are normally used, so we can see why PostgreSQL doesn't natively integrate this technique. Yet we would like to see it added eventually, so query authors can use the first query example we wrote instead of the excessively unintuitive version that actually produced good performance.
In any case, all PostgreSQL DBAs owe it to themselves and their clusters to learn CTEs. They provide a powerful override for the query planner, and help solve the edge cases it doesn't yet handle.
Tags: CTE, Database, Index Scan, Postgres, Postgres Feed
2. Execution process and results
test=# CREATE TABLE test_order
test-# (
test(# client_id INT NOT NULL,
test(# order_date TIMESTAMP NOT NULL,
test(# filler TEXT NOT NULL
test(# );
CREATE TABLE
test=# INSERT INTO test_order
test-# SELECT s1.id,
test-# (CURRENT_DATE - INTERVAL '1000 days')::DATE
test-# + generate_series(1, s1.id%1000),
test-# repeat(' ', 20)
test-# FROM generate_series(1, 10000) s1 (id);
INSERT 0 4995000
test=# CREATE INDEX idx_test_order_client_id_order_date
test-# ON test_order (client_id, order_date DESC);
CREATE INDEX
test=# select version();
version
-------------------------------------------------------------
PostgreSQL 9.3.4, compiled by Visual C++ build 1700, 64-bit
(1 row)
test=# SET default_statistics_target TO 500;
SET
test=# ANALYZE test_order;
ANALYZE
test=# EXPLAIN ANALYZE
test-# SELECT client_id, max(order_date) FROM test_order GROUP BY client_id;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=116550.00..116646.78 rows=9678 width=12) (actual time=3030.529..3034.324 rows=9990 loops=1)
-> Seq Scan on test_order (cost=0.00..91575.00 rows=4995000 width=12) (actual time=0.031..723.038 rows=4995000 loops=1)
Total runtime: 3034.951 ms
(3 rows)
test=# EXPLAIN ANALYZE
test-# SELECT DISTINCT ON (client_id) client_id, order_date FROM test_order ORDER BY client_id, order_date DESC;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=903415.03..928390.03 rows=9678 width=12) (actual time=8553.686..9803.013 rows=9990 loops=1)
-> Sort (cost=903415.03..915902.53 rows=4995000 width=12) (actual time=8553.684..9394.096 rows=4995000 loops=1)
Sort Key: client_id, order_date
Sort Method: external sort Disk: 126896kB
-> Seq Scan on test_order (cost=0.00..91575.00 rows=4995000 width=12) (actual time=0.296..1344.247 rows=4995000 loops=1)
Total runtime: 9816.962 ms
(6 rows)
test=# EXPLAIN ANALYZE
test-# WITH RECURSIVE skip AS
test-# (
test(# (SELECT client_id, order_date
test(# FROM test_order
test(# ORDER BY client_id, order_date DESC
test(# LIMIT 1)
test(# UNION ALL
test(# (SELECT (SELECT min(client_id)
test(# FROM test_order
test(# WHERE client_id > skip.client_id
test(# ) AS client_id,
test(# (SELECT max(order_date)
test(# FROM test_order
test(# WHERE client_id = (
test(# SELECT min(client_id)
test(# FROM test_order
test(# WHERE client_id > skip.client_id
test(# )
test(# ) AS order_date
test(# FROM skip
test(# WHERE skip.client_id IS NOT NULL)
test(# )
test-# SELECT *
test-# FROM skip;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on skip (cost=581.08..583.10 rows=101 width=12) (actual time=0.092..1580.743 rows=9991 loops=1)
CTE skip
-> Recursive Union (cost=0.43..581.08 rows=101 width=12) (actual time=0.090..1573.902 rows=9991 loops=1)
-> Limit (cost=0.43..1.75 rows=1 width=12) (actual time=0.088..0.088 rows=1 loops=1)
-> Index Only Scan using idx_test_order_client_id_order_date on test_order test_order_3 (cost=0.43..6566323.18 rows=4995000 width=12) (actual time=0.083..0.083 rows=1 loops=1)
Heap Fetches: 1
-> WorkTable Scan on skip skip_1 (cost=0.00..57.73 rows=10 width=4) (actual time=0.156..0.156 rows=1 loops=9991)
Filter: (client_id IS NOT NULL)
Rows Removed by Filter: 0
SubPlan 2
-> Result (cost=1.75..1.76 rows=1 width=0) (actual time=0.119..0.119 rows=1 loops=9990)
InitPlan 1 (returns $2)
-> Limit (cost=0.43..1.75 rows=1 width=4) (actual time=0.117..0.117 rows=1 loops=9990)
-> Index Only Scan using idx_test_order_client_id_order_date on test_order (cost=0.43..2199136.81 rows=1665000 width=4) (actual time=0.117..0.117 rows=1 loops=9990)
Index Cond: ((client_id IS NOT NULL) AND (client_id > skip_1.client_id))
Heap Fetches: 9989
SubPlan 6
-> Result (cost=3.98..3.99 rows=1 width=0) (actual time=0.034..0.034 rows=1 loops=9990)
InitPlan 4 (returns $5)
-> Result (cost=1.75..1.76 rows=1 width=0) (actual time=0.018..0.018 rows=1 loops=9990)
InitPlan 3 (returns $4)
-> Limit (cost=0.43..1.75 rows=1 width=4) (actual time=0.017..0.017 rows=1 loops=9990)
-> Index Only Scan using idx_test_order_client_id_order_date on test_order test_order_1 (cost=0.43..2199136.81 rows=1665000 width=4) (actual time=0.016..0.016 rows=1 loops=9990)
Index Cond: ((client_id IS NOT NULL) AND (client_id > skip_1.client_id))
Heap Fetches: 9989
InitPlan 5 (returns $6)
-> Limit (cost=0.43..2.22 rows=1 width=8) (actual time=0.033..0.033 rows=1 loops=9990)
-> Index Only Scan using idx_test_order_client_id_order_date on test_order test_order_2 (cost=0.43..921.00 rows=516 width=8) (actual time=0.014..0.014 rows=1 loops=9990)
Index Cond: ((client_id = $5) AND (order_date IS NOT NULL))
Heap Fetches: 9989
Total runtime: 1582.306 ms
(31 rows)