突然发现测试环境一条慢sql,就想着分析一下,写写总结。说到优化其实我个人认为是不到不得已还是没有必要的,毕竟除非特别重大的问题,影响了基本操作和体验,平时还是基本的配置也够了,就像《重构:改善代码的既有设计》当你闻到了代码的坏味道才需要重构,毕竟如果一个项目用户量小,并发不高,其实优化跟不优化差距差不了多少,而且有可能改着改着,新的bug就有产生了。但大的项目就不一样了,一点点小小的优化就是指数级别的收益,就像你跑步一样,跑个5公里,10公里,什么都可以不在乎,但是你跑个50公里,衣服鞋子肺活量身体素质,这些都是要考虑的范围。突然让我想起最近边跑马拉松边抽烟那哥们,简直违反生物学,牛顿看了都会直呼惊呆了。
select fo_id, fo.fund_account, fo.order_type, fo.trade_id, fo.order_id, fo.symbol, fo.trade_type, fo.side,
fo.quantity, fo.price, fo.commission, fo.mac_address, fo.trade_time, bo.price book_price, sp.symbol_name, sp.exchange
from ccc fo JOIN aaa sp ON fo.symbol = sp.symbol
JOIN bbb bo ON bo.order_status = '1' and bo.order_id = fo.order_id and bo.cl_order_id = fo.cl_order_id
where 1 = 1 and fo.trade_time >= 1609430400000 and fo.trade_time <= 1702655999999
and fo.broker_account = any('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::varchar[]
)
ORDER BY fo.trade_time DESC, fo.fund_account, fo.symbol LIMIT 30 OFFSET 0
言归正传,当我看到这条sql的时候,我查了一下发现sql执行用了12s,顿时有点惊呆了,一般的sql大概超过2s就应该优化了,好了我们来分析一下吧。
拿到sql我就想看看表数据量多少
select count(*) from aaa;
6945
select count(*) from bbb;
558729
select count(*) from ccc;
798024
从表数据上面来看,50多万的数据还好,不至于这么差,接下来好好分析一下。
第一个点,但从sql上面我就发现一个点不合理,我之前也喜欢用 where 1=1
觉得后面就是一个条件true,直到后来经过跟别人讨论,有一种可能SQL解析会认为1是一个属性名,完了去表里面找这样就跟写SQL背到而驰了,我们理解可能是认为他就是TRUE,但是回到SQL解析上面又差别不大,去掉1=1之后发现运行速度快了3秒,从某种程度来说还是会影响SQL的执行效率,而且从多表拼接的SQL上面确实发现啊了200毫秒,单表可能看不出什么差距。
explain analyze select * from ccc where 1= 1 limit 10;
Limit (cost=0.00..0.35 rows=10 width=289) (actual time=0.017..0.020 rows=10 loops=1)
-> Seq Scan on ccc (cost=0.00..27920.00 rows=798300 width=289) (actual time=0.016..0.018 rows=10 loops=1)
Planning time: 0.150 ms
Execution time: 0.057 ms
explain analyze select * from ccc limit 10;
Limit (cost=0.00..0.35 rows=10 width=289) (actual time=0.020..0.023 rows=10 loops=1)
-> Seq Scan on ccc (cost=0.00..27920.00 rows=798300 width=289) (actual time=0.018..0.019 rows=10 loops=1)
Planning time: 0.173 ms
Execution time: 0.062 ms
这是当天的SQL的执行计划和索引,我发现几个神奇的现象:
create unique index idx_aaa_symbol
on aaa (symbol);
-----
create index idx_bbb_place_time
on bbb (place_time);
create index idx_bbb_broker_account
on bbb (broker_account);
-----
create index idx_ccc_trade_time
on ccc (trade_time);
create index idx_ccc_broker_account
on ccc (broker_account);
------
explain (ANALYZE,BUFFERS)
select fo.fo_id, fo.fund_account, fo.order_type, fo.trade_id, fo.order_id,
fo.symbol, fo.trade_type, fo.side,
fo.quantity, fo.price, fo.commission, fo.mac_address, fo.trade_time
,bo.price book_price
-- ,sp.symbol_name, sp.exchange
from ccc fo
-- JOIN aaa sp ON fo.symbol = sp.symbol
JOIN bbb bo ON bo.order_status = '1'
-- and bo.order_id = fo.order_id
and bo.cl_order_id = fo.cl_order_id
where
fo.trade_time < 1702655999999
and fo.trade_time > 1609430400000
-- fo.trade_time = 1669704603470
-- and fo.fo_id = 1007811
-- and fo.broker_account in ('0014510913T0','0014510911As','0014510911Th','0014510911Ts','0014510913Alpha','100000001Ts','100000001As','100000001Th','0014510910As','0014510910Ts','0014510910Th','0014510921Th','0014510921Ts','0014510219Th','0014510219As',
-- '0014510921Ah','0014510219Ts','0014510921As','16654T0','16654Alpha','0321080000016656Th','0321080000016656Ts','0321080000016656As','0014510920Th','0014510920Ts','0014510920As','0014510910Ah','100000001Ah','0014510911Ah','0311020001234597Ah','0014510219Ah','0321080000016656Ah','0014510920Ah','0014510909T0','0014510909Alpha')
and fo.broker_account = any('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::varchar[]
)
ORDER BY fo.trade_time DESC
, fo.fund_account ASC
, fo.symbol
LIMIT 30 OFFSET 0;
进行了字段排序sort,但是ANY却不匹配索引了,所以进行了全表检索,执行效率非常的慢。
Limit (cost=1483726.37..1483726.45 rows=30 width=89) (actual time=3398.207..3398.213 rows=30 loops=1)
" Buffers: shared hit=2123 read=42246 dirtied=5, temp read=11657 written=11687"
-> Sort (cost=1483726.37..1554119.82 rows=28157379 width=89) (actual time=3398.205..3398.208 rows=30 loops=1)
" Sort Key: fo.trade_time DESC, fo.fund_account, fo.symbol"
Sort Method: top-N heapsort Memory: 37kB
" Buffers: shared hit=2123 read=42246 dirtied=5, temp read=11657 written=11687"
-> Merge Join (cost=226264.61..652113.59 rows=28157379 width=89) (actual time=2051.812..2788.036 rows=1345806 loops=1)
Merge Cond: (fo.cl_order_id = bo.cl_order_id)
" Buffers: shared hit=2123 read=42246 dirtied=5, temp read=11657 written=11687"
-> Sort (cost=134482.56..135779.65 rows=518838 width=89) (actual time=1436.513..1584.291 rows=512740 loops=1)
Sort Key: fo.cl_order_id
Sort Method: external merge Disk: 56464kB
" Buffers: shared hit=1113 read=16319 dirtied=2, temp read=10028 written=10050"
-> Seq Scan on ccc fo (cost=0.00..58628.56 rows=518838 width=89) (actual time=0.069..882.202 rows=512740 loops=1)
" Filter: ((trade_time < '1702655999999'::numeric) AND (trade_time > '1609430400000'::numeric) AND ((broker_account)::text = ANY ('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::text[])))"
Rows Removed by Filter: 188246
Buffers: shared hit=1113 read=16319 dirtied=2
-> Materialize (cost=91406.68..93963.41 rows=511346 width=16) (actual time=613.979..813.987 rows=1521084 loops=1)
" Buffers: shared hit=1010 read=25927 dirtied=3, temp read=1629 written=1637"
-> Sort (cost=91406.68..92685.04 rows=511346 width=16) (actual time=613.975..695.387 rows=510968 loops=1)
Sort Key: bo.cl_order_id
Sort Method: external merge Disk: 13032kB
" Buffers: shared hit=1010 read=25927 dirtied=3, temp read=1629 written=1637"
-> Seq Scan on bbb bo (cost=0.00..34181.50 rows=511346 width=16) (actual time=0.013..361.825 rows=510968 loops=1)
Filter: (order_status = '1'::bpchar)
Rows Removed by Filter: 67455
Buffers: shared hit=1010 read=25927 dirtied=3
Planning time: 0.879 ms
Execution time: 2828.213 ms
没有进行字段排序no sort,但是却走了索引,这可能是ANY的一大特点,,所以从执行结果来看效果非常明显。所以我一直陷入一个怪圈,想着把ANY的多字符匹配给去掉,换成一种直接匹配的方式会好很多,可是多字符匹配哪里那么容易改变,然后就查资料查文档。突然我想到阿里的葵花宝典还是在哪里记得,我看过这么一句话,外键要加索引,但是看了文档却没有发现就不了了之了,之后又被其他的事情耽误了。
Limit (cost=0.42..2866.61 rows=30 width=89) (actual time=27.068..400.315 rows=30 loops=1)
Buffers: shared hit=36576
-> Nested Loop (cost=0.42..2690142553.60 rows=28157379 width=89) (actual time=27.066..400.294 rows=30 loops=1)
Buffers: shared hit=36576
-> Seq Scan on bbb bo (cost=0.00..34181.50 rows=511346 width=16) (actual time=0.012..0.063 rows=13 loops=1)
Filter: (order_status = '1'::bpchar)
Buffers: shared hit=1
-> Index Scan using ccc_order_id_cl_order_id_index on ccc fo (cost=0.42..5260.72 rows=12 width=89) (actual time=25.727..30.779 rows=2 loops=13)
Index Cond: (cl_order_id = bo.cl_order_id)
" Filter: ((trade_time < '1702655999999'::numeric) AND (trade_time > '1609430400000'::numeric) AND ((broker_account)::text = ANY ('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::text[])))"
Buffers: shared hit=36575
Planning time: 0.761 ms
Execution time: 400.389 ms
过几天我再来看的时候发现执行效率恢复了正常,细看发现,只是改动了几个索引,果然验证了那个猜想,外键加索引瞬间快到飞起。这是修改索引之后的执行结果
--- ADD
create index bbb_cl_order_id_order_id_order_status_index
on bbb (cl_order_id, order_id, order_status);
--- UDATE
create index idx_ccc_trade_time_fund_account_symbol
on ccc (trade_time desc, fund_account asc, symbol asc);
--- ADD
create index idx_ccc_cl_order_id_order_id
on ccc (cl_order_id, order_id);
----
explain ANALYZE select fo.fo_id, fo.fund_account, fo.order_type, fo.trade_id, fo.order_id, fo.symbol, fo.trade_type, fo.side,
fo.quantity, fo.price, fo.commission, fo.mac_address, fo.trade_time, bo.price book_price, sp.symbol_name, sp.exchange
from ccc fo JOIN aaa sp ON fo.symbol = sp.symbol
JOIN bbb bo ON bo.order_id = fo.order_id and bo.cl_order_id = fo.cl_order_id
where 1=1 AND bo.order_status = '1' and fo.trade_time >= 1609430400000 and fo.trade_time <= 1702655999999
and fo.broker_account = any('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::varchar[]
)
ORDER BY fo.trade_time DESC
, fo.fund_account, fo.symbol
LIMIT 30 OFFSET 0;
Limit (cost=1001.16..5150.43 rows=30 width=106) (actual time=95.097..119.294 rows=30 loops=1)
-> Gather Merge (cost=1001.16..258255.92 rows=1860 width=106) (actual time=95.094..119.286 rows=30 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=1.13..257041.20 rows=775 width=106) (actual time=4.812..21.327 rows=15 loops=3)
-> Nested Loop (cost=0.85..256808.08 rows=775 width=90) (actual time=4.756..21.130 rows=15 loops=3)
-> Parallel Index Scan using idx_ccc_trade_time_fund_account_symbol on ccc fo (cost=0.42..95331.87 rows=230396 width=90) (actual time=4.706..20.921 rows=9 loops=3)
Index Cond: ((trade_time >= '1609430400000'::numeric) AND (trade_time <= '1702655999999'::numeric))
" Filter: ((broker_account)::text = ANY ('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::text[]))"
Rows Removed by Filter: 11298
-> Index Scan using bbb_cl_order_id_order_id_order_status_index on bbb bo (cost=0.42..0.69 rows=1 width=21) (actual time=0.018..0.020 rows=2 loops=26)
Index Cond: ((cl_order_id = fo.cl_order_id) AND ((order_id)::text = (fo.order_id)::text) AND (order_status = '1'::bpchar))
-> Index Scan using idx_aaa_symbol on aaa sp (cost=0.28..0.30 rows=1 width=23) (actual time=0.011..0.011 rows=1 loops=46)
Index Cond: ((symbol)::text = (fo.symbol)::text)
Planning time: 1.990 ms
Execution time: 119.455 ms
越来越觉得PG数据库,有点像半自动,很多东西需要自己去配置,不像mysql那么智能,放到上面基本没有什么SQL上面的问题。总的来说还是蛮有收获的,外键加索引,查看执行计划看有没有走索引,索引其实也是一把双刃剑,加快了查询的速度,却增加了新增修改的速度,就像达摩克之剑需要小心谨慎使用。
PostgreSQL in 与 = any 的SQL语法异同与性能优化
Postgresql sql-explain
慢sql查询