Background 1:
We need to find, for a large number of clients, each client's most recent order.
Reference:
http://bonesmoses.org/2014/05/08/trumping-the-postgresql-query-planner/
Create the test table:
CREATE TABLE test_order (
  client_id INT NOT NULL,
  order_date TIMESTAMP NOT NULL,
  filler TEXT NOT NULL
);
Insert test data (client N gets N % 1000 orders, about 5 million rows in total), then create the index:
INSERT INTO test_order
SELECT s1.id,
       (CURRENT_DATE - INTERVAL '1000 days')::DATE + generate_series(1, s1.id%1000),
       repeat(' ', 20)
FROM generate_series(1, 10000) s1 (id);
CREATE INDEX idx_test_order_client_id_order_date ON test_order (client_id, order_date DESC);
Run the plain SQL.
Without using the index:
EXPLAIN ANALYZE
SELECT client_id, max(order_date)
FROM test_order
GROUP BY client_id;
"Execution time: 5741.682 ms"
Using the index:
EXPLAIN ANALYZE
SELECT DISTINCT ON (client_id) client_id, order_date
FROM test_order
ORDER BY client_id, order_date DESC;
"Execution time: 4628.510 ms"
Optimized SQL (a recursive CTE that jumps from one client_id to the next through the index):
EXPLAIN ANALYZE
WITH RECURSIVE skip AS (
  (SELECT client_id, order_date
   FROM test_order
   ORDER BY client_id, order_date DESC
   LIMIT 1)
  UNION ALL
  (SELECT (SELECT min(client_id)
           FROM test_order
           WHERE client_id > skip.client_id) AS client_id,
          (SELECT max(order_date)
           FROM test_order
           WHERE client_id = (SELECT min(client_id)
                              FROM test_order
                              WHERE client_id > skip.client_id)) AS order_date
   FROM skip
   WHERE skip.client_id IS NOT NULL)
)
SELECT * FROM skip;
"Execution time: 865.889 ms"
Query result:
client_id; order_date
1;"2014-03-09 00:00:00"
2;"2014-03-10 00:00:00"
3;"2014-03-11 00:00:00"
4;"2014-03-12 00:00:00"
5;"2014-03-13 00:00:00"
6;"2014-03-14 00:00:00"
7;"2014-03-15 00:00:00"
8;"2014-03-16 00:00:00"
9;"2014-03-17 00:00:00"
10;"2014-03-18 00:00:00"
11;"2014-03-19 00:00:00"
12;"2014-03-20 00:00:00"
13;"2014-03-21 00:00:00"
14;"2014-03-22 00:00:00"
15;"2014-03-23 00:00:00"
16;"2014-03-24 00:00:00"
17;"2014-03-25 00:00:00"
18;"2014-03-26 00:00:00"
19;"2014-03-27 00:00:00"
20;"2014-03-28 00:00:00"
21;"2014-03-29 00:00:00"
22;"2014-03-30 00:00:00"
23;"2014-03-31 00:00:00"
24;"2014-04-01 00:00:00"
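The recursive step above looks up the next client_id twice (once for the id, once for its newest order) and emits one trailing row of NULLs when it runs past the last client. A possible variant, sketched here and not taken from the referenced article, fetches the next client's newest row in a single LATERAL subquery. It assumes PostgreSQL 9.3 or later and the idx_test_order_client_id_order_date index created earlier; it has not been timed against the figures quoted in this note.

```sql
-- Sketch: loose index scan using LATERAL (assumption: PostgreSQL 9.3+).
WITH RECURSIVE skip AS (
  -- Anchor: newest order of the smallest client_id.
  (SELECT client_id, order_date
   FROM test_order
   ORDER BY client_id, order_date DESC
   LIMIT 1)
  UNION ALL
  -- Recursive step: newest order of the next larger client_id.
  -- When no larger client_id exists the subquery returns no row,
  -- so the recursion stops without producing a NULL row.
  SELECT n.client_id, n.order_date
  FROM skip s
  CROSS JOIN LATERAL (
    SELECT t.client_id, t.order_date
    FROM test_order t
    WHERE t.client_id > s.client_id
    ORDER BY t.client_id, t.order_date DESC
    LIMIT 1
  ) n
)
SELECT * FROM skip;
```

Because the ORDER BY matches the index column order, each step can be answered by a single index probe, which is the same idea as the original query.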
Background 2:
Reference:
https://yq.aliyun.com/articles/65202?spm=5176.8091938.0.0.tZZBTS
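Consider this scenario: a small table A stores a few hundred IDs
(for example, the IDs of patrol vehicles, sanitation vehicles, buses, and micro-buses).
A log table B holds records whose IDs come from that small table, but not every ID appears in the log. On a given day, perhaps only a few dozen IDs show up in that day's data
(for example, vehicle trajectory data reported every second, which makes the table very large).
How can we quickly find the IDs that did not appear today?
(Which patrol vehicles never showed up in their district, and were they slacking off? Which sanitation vehicles, buses, or micro-buses never went out?)
select id from A where id not in (select id from B where time between ? and ?);
This query is slow. How can it be optimized?
Of course, you could have the vehicles check in to solve this, but some will always fail to check in, or the system may not have been designed that way. What then?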
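-- Table A
create table a(id int primary key, info text);
-- Table B
create table b(id int primary key, aid int, crt_time timestamp);
create index b_aid on b(aid);
-- Insert 1,000 rows into a
insert into a select generate_series(1,1000), md5(random()::text);
-- Insert 5 million rows into b, using only 500 distinct aid values.
-- aid is derived from id so that it cycles through 1..500. Two generate_series
-- calls in one select list run in lockstep on PostgreSQL 10 and later, which
-- would leave aid NULL after the first 500 rows.
insert into b select id, (id - 1) % 500 + 1, clock_timestamp() from generate_series(1,5000000) as id;

Before optimization:
select * from a where id not in (select aid from b);
Execution time: over 1 minute

After optimization:
select * from a where id not in (
  with recursive skip as (
    ( select min(aid) aid from b where aid is not null )
    union all
    ( select (select min(aid) aid from b where b.aid > s.aid and b.aid is not null)
      from skip s
      where s.aid is not null )
    -- The "where s.aid is not null" above is required; without it the recursion never terminates.
  )
  select aid from skip where aid is not null
);
Execution time: 46 msec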
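For comparison, the same question can also be written as an anti-join with NOT EXISTS. This is a different technique from the recursive skip scan above and is not from the referenced article; it is a sketch that assumes the a and b tables and the b_aid index defined above, and it has not been timed here. One practical difference is that NOT EXISTS is unaffected by NULL values in b.aid, whereas NOT IN returns no rows at all if the subquery yields a NULL.

```sql
-- Sketch: IDs in a that never appear in b, probing b_aid once per row of a.
SELECT a.*
FROM a
WHERE NOT EXISTS (
    SELECT 1
    FROM b
    WHERE b.aid = a.id
);

-- Variant restricted to one day. The timestamps are hypothetical
-- placeholders; an index on (aid, crt_time) is an assumption that
-- would likely help this form and is not defined in the text above.
SELECT a.*
FROM a
WHERE NOT EXISTS (
    SELECT 1
    FROM b
    WHERE b.aid = a.id
      AND b.crt_time >= TIMESTAMP '2016-12-01 00:00:00'
      AND b.crt_time <  TIMESTAMP '2016-12-02 00:00:00'
);
```

With the sample data, where aid covers 1..500, both the recursive version and the first query here should return the rows of a with id 501 through 1000.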
Scenario 3:
Building a tree structure
Reference:
http://blog.databasepatterns.com/2014/02/trees-paths-recursive-cte-postgresql.html
create table subregions (
  id smallint primary key,
  name text not null,
  parent_id smallint null references subregions(id)
);
insert into subregions values
  (1,'World',null),
  (2,'Africa',1),
  (5,'South America',419),
  (9,'Oceania',1),
  (11,'Western Africa',2),
  (13,'Central America',419),
  (14,'Eastern Africa',2),
  (15,'Northern Africa',2),
  (17,'Middle Africa',2),
  (18,'Southern Africa',2),
  (19,'Americas',1),
  (21,'Northern America',19),
  (29,'Caribbean',419),
  (30,'Eastern Asia',142),
  (34,'Southern Asia',142),
  (35,'South-Eastern Asia',142),
  (39,'Southern Europe',150),
  (53,'Australia and New Zealand',9),
  (54,'Melanesia',9),
  (57,'Micronesia',9),
  (61,'Polynesia',9),
  (142,'Asia',1),
  (143,'Central Asia',142),
  (145,'Western Asia',142),
  (150,'Europe',1),
  (151,'Eastern Europe',150),
  (154,'Northern Europe',150),
  (155,'Western Europe',150),
  (419,'Latin America and the Caribbean',19);

And you wanted to make a pretty tree like this:

World
 Africa
  Eastern Africa
  Middle Africa
  Northern Africa
  Southern Africa
  Western Africa
 Americas
  Latin America and the Caribbean
   Caribbean
   Central America
   South America
  Northern America
 Asia
  Central Asia
  Eastern Asia
  South-Eastern Asia
  Southern Asia
  Western Asia
 Europe
  Eastern Europe
  Northern Europe
  Southern Europe
  Western Europe
 Oceania
  Australia and New Zealand
  Melanesia
  Micronesia
  Polynesia

Here's how you'd do it:

with recursive my_expression as (
  --start with the "anchor", i.e. all of the nodes whose parent_id is null:
  select
    id,
    name as path,
    name as tree,
    0 as level
  from subregions
  where parent_id is null
  union all
  --then the recursive part:
  select
    current.id as id,
    previous.path || ' > ' || current.name as path,
    repeat(' ', previous.level + 1) || current.name as tree,
    previous.level + 1 as level
  from subregions current
  join my_expression as previous on current.parent_id = previous.id
)
select tree from my_expression order by path

To show each node's full path, with its ancestors joined by a separator, keep the same CTE and select path instead of tree in the final query:

select path from my_expression order by path

Output:

World
World > Africa
World > Africa > Eastern Africa
World > Africa > Middle Africa
World > Africa > Northern Africa
World > Africa > Southern Africa
World > Africa > Western Africa
World > Americas
World > Americas > Latin America and the Caribbean
World > Americas > Latin America and the Caribbean > Caribbean
World > Americas > Latin America and the Caribbean > Central America
World > Americas > Latin America and the Caribbean > South America
World > Americas > Northern America
World > Asia
World > Asia > Central Asia
World > Asia > Eastern Asia
World > Asia > South-Eastern Asia
World > Asia > Southern Asia
World > Asia > Western Asia
World > Europe
World > Europe > Eastern Europe
World > Europe > Northern Europe
World > Europe > Southern Europe
World > Europe > Western Europe
World > Oceania
World > Oceania > Australia and New Zealand
World > Oceania > Melanesia
World > Oceania > Micronesia
World > Oceania > Polynesia
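The same recursion can start from any node instead of the root, which is useful when only one branch is needed. The following is a minimal sketch along those lines; the CTE name subtree and the choice of 'Americas' as the starting node are illustrative assumptions, not part of the referenced article.

```sql
-- Sketch: list one branch of the subregions tree, starting at 'Americas'.
WITH RECURSIVE subtree AS (
    -- Anchor: the chosen starting node.
    SELECT id, name, 0 AS level
    FROM subregions
    WHERE name = 'Americas'
  UNION ALL
    -- Recursive step: children of any node already in the result.
    SELECT c.id, c.name, p.level + 1
    FROM subregions c
    JOIN subtree p ON c.parent_id = p.id
)
SELECT id, name, level
FROM subtree
ORDER BY level, name;
```

With the sample rows above this should return Americas at level 0, its two children (Latin America and the Caribbean, Northern America) at level 1, and Caribbean, Central America, and South America at level 2.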