ClickHouse 的 SQL 优化规则是基于 RBO(Rule Based Optimization),下面是一些优化规则
以下规则中使用了ClickHouse官方测试数据集,数据集使用步骤移步这里。
1.COUNT优化
在调用 count 函数时,如果使用的是 count() 或者 count(*),且没有 where 条件,则会直接使用 system.tables 的 total_rows,例如:
date1002 :) EXPLAIN SELECT count() FROM datasets.hits_v1;
EXPLAIN
SELECT count()
FROM datasets.hits_v1
Query id: 5bc13813-9749-47c2-876a-71816f306358
┌─explain─────────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│ SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│ ReadFromStorage (MergeTree(with Aggregate projection _minmax_count_projection)) │
└─────────────────────────────────────────────────────────────────────────────────────┘
3 rows in set. Elapsed: 0.003 sec.
ReadFromStorage,直接从持久化文件中读取count,这是对count的优化
如果count某一具体列字段,则不会使用此项优化
date1002 :) EXPLAIN SELECT count(CounterID) FROM datasets.hits_v1;
EXPLAIN
SELECT count(CounterID)
FROM datasets.hits_v1
Query id: 4a8a69d4-a8b1-478a-a0fb-fbad12a84fa5
┌─explain───────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│ Aggregating │
│ Expression (Before GROUP BY) │
│ SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│ ReadFromMergeTree │
└───────────────────────────────────────────────────────────────────────────────┘
5 rows in set. Elapsed: 0.002 sec.
执行SQL看区别
执行SELECT count() FROM datasets.hits_v1;
date1002 :) SELECT count() FROM datasets.hits_v1;
SELECT count()
FROM datasets.hits_v1
Query id: 96ee9fc8-29ee-4e45-9917-33b6c9b99574
┌─count()─┐
│ 8873898 │
└─────────
1 rows in set. Elapsed: 0.002 sec.
执行指定某一列的SQL可看到扫描了所有数据
date1002 :) SELECT count(CounterID) FROM datasets.hits_v1;
SELECT count(CounterID)
FROM datasets.hits_v1
Query id: 1a8bb3fa-9864-4e33-a92e-55b89bd8e9e5
┌─count(CounterID)─┐
│ 8873898 │
└──────────────────┘
1 rows in set. Elapsed: 0.008 sec. Processed 8.87 million rows, 35.50 MB (1.09 billion rows/s., 4.34 GB/s.)
2.消除子查询重复字段
下面语句子查询中有两个重复的 id 字段,会被去重:
date1002 :) EXPLAIN SYNTAX
SELECT
a.UserID,
b.VisitID,
a.URL,
b.UserID
FROM datasets.hits_v1 AS a
LEFT JOIN
(
SELECT
UserID,
UserID AS HaHa,
VisitID
FROM datasets.visits_v1
) AS b USING (UserID)
LIMIT 3;
EXPLAIN SYNTAX
SELECT
a.UserID,
b.VisitID,
a.URL,
b.UserID
FROM datasets.hits_v1 AS a
LEFT JOIN
(
SELECT
UserID,
UserID AS HaHa,
VisitID
FROM datasets.visits_v1
) AS b USING (UserID)
LIMIT 3
Query id: e6907eb2-abe9-4a39-95bd-cfdcc4b87939
┌─explain─────────────────────┐
│ SELECT │
│ UserID, │
│ VisitID, │
│ URL, │
│ b.UserID │
│ FROM datasets.hits_v1 AS a │
│ ALL LEFT JOIN │
│ ( │
│ SELECT │
│ UserID, │
│ VisitID │
│ FROM datasets.visits_v1 │
│ ) AS b USING (UserID) │
│ LIMIT 3 │
└─────────────────────────────┘
14 rows in set. Elapsed: 0.002 sec.
3.谓词下推
当 group by 有 having 子句,但是没有 with cube、with rollup 或者 with totals 修饰的时候,having 过滤会下推到 where 提前过滤。例如下面的查询,HAVING name 变成了 WHERE name,在 group by 之前过滤:
date1002 :) EXPLAIN SYNTAX SELECT UserID FROM datasets.hits_v1 GROUP BY UserID HAVING UserID ='8585742290196126178';
EXPLAIN SYNTAX
SELECT UserID
FROM datasets.hits_v1
GROUP BY UserID
HAVING UserID = '8585742290196126178'
Query id: 87cd76aa-947a-4cf3-bd00-b21d2a5951cf
┌─explain──────────────────────────────┐
│ SELECT UserID │
│ FROM datasets.hits_v1 │
│ WHERE UserID = '8585742290196126178' │
│ GROUP BY UserID │
└──────────────────────────────────────┘
4 rows in set. Elapsed: 0.002 sec.
子查询也支持谓词下推:
EXPLAIN SYNTAX
SELECT *
FROM
(
SELECT UserID
FROM datasets.visits_v1
)
WHERE UserID = '8585742290196126178'
Query id: 4db29eee-5dc5-4bf5-84de-564e2f58a4f6
┌─explain──────────────────────────────────┐
│ SELECT UserID │
│ FROM │
│ ( │
│ SELECT UserID │
│ FROM datasets.visits_v1 │
│ WHERE UserID = '8585742290196126178' │
│ ) │
│ WHERE UserID = '8585742290196126178' │
└──────────────────────────────────────────┘
8 rows in set. Elapsed: 0.002 sec.
另外一个例子
EXPLAIN SYNTAX
SELECT *
FROM
(
SELECT *
FROM
(
SELECT UserID
FROM datasets.visits_v1
)
UNION ALL
SELECT *
FROM
(
SELECT UserID
FROM datasets.visits_v1
)
)
WHERE UserID = '8585742290196126178'
Query id: b317cb59-9594-4d4f-8e5e-6ddf2f3bd52f
┌─explain──────────────────────────────────────┐
│ SELECT UserID │
│ FROM │
│ ( │
│ SELECT UserID │
│ FROM │
│ ( │
│ SELECT UserID │
│ FROM datasets.visits_v1 │
│ WHERE UserID = '8585742290196126178' │
│ ) │
│ WHERE UserID = '8585742290196126178' │
│ UNION ALL │
│ SELECT UserID │
│ FROM │
│ ( │
│ SELECT UserID │
│ FROM datasets.visits_v1 │
│ WHERE UserID = '8585742290196126178' │
│ ) │
│ WHERE UserID = '8585742290196126178' │
│ ) │
│ WHERE UserID = '8585742290196126178' │
└──────────────────────────────────────────────┘
22 rows in set. Elapsed: 0.005 sec.
4.聚合计算外推
聚合函数内的计算,会外推,例如:
EXPLAIN SYNTAX
SELECT sum(UserID * 2)
FROM datasets.visits_v1
Query id: a1fa9181-51b0-427f-96cc-dc106b8eade6
┌─explain─────────────────┐
│ SELECT sum(UserID) * 2 │
│ FROM datasets.visits_v1 │
└─────────────────────────┘
2 rows in set. Elapsed: 0.001 sec.
5.聚合函数消除
如果对聚合键,也就是 group by key 使用 min、max、any 聚合函数,则将函数消除,例如:
EXPLAIN SYNTAX
SELECT
sum(UserID * 2),
max(VisitID),
max(UserID)
FROM datasets.visits_v1
GROUP BY UserID
Query id: df74bce8-e31b-4ff5-8b7e-83bcbb3b8849
┌─explain─────────────────┐
│ SELECT │
│ sum(UserID) * 2, │
│ max(VisitID), │
│ UserID │
│ FROM datasets.visits_v1 │
│ GROUP BY UserID │
└─────────────────────────┘
6 rows in set. Elapsed: 0.002 sec.
6.删除重复的order by key
例如下面的语句,重复的聚合键 id 字段会被去重:
EXPLAIN SYNTAX
SELECT *
FROM datasets.visits_v1
ORDER BY
UserID ASC,
UserID ASC,
VisitID ASC,
VisitID ASC
Query id: 5c368e05-f17e-4600-8543-fb87c1035f7a
┌─explain───────────────────────────────────┐
│ SELECT │
│ CounterID, │
│ StartDate, │
│ Sign, │
│ IsNew, │
│ VisitID, │
│ UserID, │
│ StartTime, │
│ Duration, │
│ UTCStartTime, │
│ PageViews, │
│ Hits, │
│ IsBounce, │
│ Referer, │
│ StartURL, │
│ RefererDomain, │
│ StartURLDomain, │
│ EndURL, │
│ LinkURL, │
│ IsDownload, │
│ TraficSourceID, │
│ SearchEngineID, │
│ SearchPhrase, │
│ AdvEngineID, │
│ PlaceID, │
│ RefererCategories, │
│ URLCategories, │
│ URLRegions, │
│ RefererRegions, │
│ IsYandex, │
│ GoalReachesDepth, │
│ GoalReachesURL, │
│ GoalReachesAny, │
│ SocialSourceNetworkID, │
│ SocialSourcePage, │
│ MobilePhoneModel, │
│ ClientEventTime, │
│ RegionID, │
│ ClientIP, │
│ ClientIP6, │
│ RemoteIP, │
│ RemoteIP6, │
│ IPNetworkID, │
│ SilverlightVersion3, │
│ CodeVersion, │
│ ResolutionWidth, │
│ ResolutionHeight, │
│ UserAgentMajor, │
│ UserAgentMinor, │
│ WindowClientWidth, │
│ WindowClientHeight, │
│ SilverlightVersion2, │
│ SilverlightVersion4, │
│ FlashVersion3, │
│ FlashVersion4, │
│ ClientTimeZone, │
│ OS, │
│ UserAgent, │
│ ResolutionDepth, │
│ FlashMajor, │
│ FlashMinor, │
│ NetMajor, │
│ NetMinor, │
│ MobilePhone, │
│ SilverlightVersion1, │
│ Age, │
│ Sex, │
│ Income, │
│ JavaEnable, │
│ CookieEnable, │
│ JavascriptEnable, │
│ IsMobile, │
│ BrowserLanguage, │
│ BrowserCountry, │
│ Interests, │
│ Robotness, │
│ GeneralInterests, │
│ Params, │
│ `Goals.ID`, │
│ `Goals.Serial`, │
│ `Goals.EventTime`, │
│ `Goals.Price`, │
│ `Goals.OrderID`, │
│ `Goals.CurrencyID`, │
│ WatchIDs, │
│ ParamSumPrice, │
│ ParamCurrency, │
│ ParamCurrencyID, │
│ ClickLogID, │
│ ClickEventID, │
│ ClickGoodEvent, │
│ ClickEventTime, │
│ ClickPriorityID, │
│ ClickPhraseID, │
│ ClickPageID, │
│ ClickPlaceID, │
│ ClickTypeID, │
│ ClickResourceID, │
│ ClickCost, │
│ ClickClientIP, │
│ ClickDomainID, │
│ ClickURL, │
│ ClickAttempt, │
│ ClickOrderID, │
│ ClickBannerID, │
│ ClickMarketCategoryID, │
│ ClickMarketPP, │
│ ClickMarketCategoryName, │
│ ClickMarketPPName, │
│ ClickAWAPSCampaignName, │
│ ClickPageName, │
│ ClickTargetType, │
│ ClickTargetPhraseID, │
│ ClickContextType, │
│ ClickSelectType, │
│ ClickOptions, │
│ ClickGroupBannerID, │
│ OpenstatServiceName, │
│ OpenstatCampaignID, │
│ OpenstatAdID, │
│ OpenstatSourceID, │
│ UTMSource, │
│ UTMMedium, │
│ UTMCampaign, │
│ UTMContent, │
│ UTMTerm, │
│ FromTag, │
│ HasGCLID, │
│ FirstVisit, │
│ PredLastVisit, │
│ LastVisit, │
│ TotalVisits, │
│ `TraficSource.ID`, │
│ `TraficSource.SearchEngineID`, │
│ `TraficSource.AdvEngineID`, │
│ `TraficSource.PlaceID`, │
│ `TraficSource.SocialSourceNetworkID`, │
│ `TraficSource.Domain`, │
│ `TraficSource.SearchPhrase`, │
│ `TraficSource.SocialSourcePage`, │
│ Attendance, │
│ CLID, │
│ YCLID, │
│ NormalizedRefererHash, │
│ SearchPhraseHash, │
│ RefererDomainHash, │
│ NormalizedStartURLHash, │
│ StartURLDomainHash, │
│ NormalizedEndURLHash, │
│ TopLevelDomain, │
│ URLScheme, │
│ OpenstatServiceNameHash, │
│ OpenstatCampaignIDHash, │
│ OpenstatAdIDHash, │
│ OpenstatSourceIDHash, │
│ UTMSourceHash, │
│ UTMMediumHash, │
│ UTMCampaignHash, │
│ UTMContentHash, │
│ UTMTermHash, │
│ FromHash, │
│ WebVisorEnabled, │
│ WebVisorActivity, │
│ `ParsedParams.Key1`, │
│ `ParsedParams.Key2`, │
│ `ParsedParams.Key3`, │
│ `ParsedParams.Key4`, │
│ `ParsedParams.Key5`, │
│ `ParsedParams.ValueDouble`, │
│ `Market.Type`, │
│ `Market.GoalID`, │
│ `Market.OrderID`, │
│ `Market.OrderPrice`, │
│ `Market.PP`, │
│ `Market.DirectPlaceID`, │
│ `Market.DirectOrderID`, │
│ `Market.DirectBannerID`, │
│ `Market.GoodID`, │
│ `Market.GoodName`, │
│ `Market.GoodQuantity`, │
│ `Market.GoodPrice`, │
│ IslandID │
│ FROM datasets.visits_v1 │
│ ORDER BY │
│ UserID ASC, │
│ VisitID ASC │
└───────────────────────────────────────────┘
186 rows in set. Elapsed: 0.004 sec.
7.删除重复的limit by key
例如下面的语句,重复声明的 name 字段会被去重:
EXPLAIN SYNTAX
SELECT *
FROM datasets.visits_v1
LIMIT 3 BY
VisitID,
VisitID
LIMIT 10
Query id: cfb47dca-ebf5-48c2-a913-032d2f054cb9
┌─explain───────────────────────────────────┐
│ SELECT │
│ CounterID, │
│ StartDate, │
│ Sign, │
│ IsNew, │
│ VisitID, │
│ UserID, │
│ StartTime, │
│ Duration, │
│ UTCStartTime, │
│ PageViews, │
│ Hits, │
│ IsBounce, │
│ Referer, │
│ StartURL, │
│ RefererDomain, │
│ StartURLDomain, │
│ EndURL, │
│ LinkURL, │
│ IsDownload, │
│ TraficSourceID, │
│ SearchEngineID, │
│ SearchPhrase, │
│ AdvEngineID, │
│ PlaceID, │
│ RefererCategories, │
│ URLCategories, │
│ URLRegions, │
│ RefererRegions, │
│ IsYandex, │
│ GoalReachesDepth, │
│ GoalReachesURL, │
│ GoalReachesAny, │
│ SocialSourceNetworkID, │
│ SocialSourcePage, │
│ MobilePhoneModel, │
│ ClientEventTime, │
│ RegionID, │
│ ClientIP, │
│ ClientIP6, │
│ RemoteIP, │
│ RemoteIP6, │
│ IPNetworkID, │
│ SilverlightVersion3, │
│ CodeVersion, │
│ ResolutionWidth, │
│ ResolutionHeight, │
│ UserAgentMajor, │
│ UserAgentMinor, │
│ WindowClientWidth, │
│ WindowClientHeight, │
│ SilverlightVersion2, │
│ SilverlightVersion4, │
│ FlashVersion3, │
│ FlashVersion4, │
│ ClientTimeZone, │
│ OS, │
│ UserAgent, │
│ ResolutionDepth, │
│ FlashMajor, │
│ FlashMinor, │
│ NetMajor, │
│ NetMinor, │
│ MobilePhone, │
│ SilverlightVersion1, │
│ Age, │
│ Sex, │
│ Income, │
│ JavaEnable, │
│ CookieEnable, │
│ JavascriptEnable, │
│ IsMobile, │
│ BrowserLanguage, │
│ BrowserCountry, │
│ Interests, │
│ Robotness, │
│ GeneralInterests, │
│ Params, │
│ `Goals.ID`, │
│ `Goals.Serial`, │
│ `Goals.EventTime`, │
│ `Goals.Price`, │
│ `Goals.OrderID`, │
│ `Goals.CurrencyID`, │
│ WatchIDs, │
│ ParamSumPrice, │
│ ParamCurrency, │
│ ParamCurrencyID, │
│ ClickLogID, │
│ ClickEventID, │
│ ClickGoodEvent, │
│ ClickEventTime, │
│ ClickPriorityID, │
│ ClickPhraseID, │
│ ClickPageID, │
│ ClickPlaceID, │
│ ClickTypeID, │
│ ClickResourceID, │
│ ClickCost, │
│ ClickClientIP, │
│ ClickDomainID, │
│ ClickURL, │
│ ClickAttempt, │
│ ClickOrderID, │
│ ClickBannerID, │
│ ClickMarketCategoryID, │
│ ClickMarketPP, │
│ ClickMarketCategoryName, │
│ ClickMarketPPName, │
│ ClickAWAPSCampaignName, │
│ ClickPageName, │
│ ClickTargetType, │
│ ClickTargetPhraseID, │
│ ClickContextType, │
│ ClickSelectType, │
│ ClickOptions, │
│ ClickGroupBannerID, │
│ OpenstatServiceName, │
│ OpenstatCampaignID, │
│ OpenstatAdID, │
│ OpenstatSourceID, │
│ UTMSource, │
│ UTMMedium, │
│ UTMCampaign, │
│ UTMContent, │
│ UTMTerm, │
│ FromTag, │
│ HasGCLID, │
│ FirstVisit, │
│ PredLastVisit, │
│ LastVisit, │
│ TotalVisits, │
│ `TraficSource.ID`, │
│ `TraficSource.SearchEngineID`, │
│ `TraficSource.AdvEngineID`, │
│ `TraficSource.PlaceID`, │
│ `TraficSource.SocialSourceNetworkID`, │
│ `TraficSource.Domain`, │
│ `TraficSource.SearchPhrase`, │
│ `TraficSource.SocialSourcePage`, │
│ Attendance, │
│ CLID, │
│ YCLID, │
│ NormalizedRefererHash, │
│ SearchPhraseHash, │
│ RefererDomainHash, │
│ NormalizedStartURLHash, │
│ StartURLDomainHash, │
│ NormalizedEndURLHash, │
│ TopLevelDomain, │
│ URLScheme, │
│ OpenstatServiceNameHash, │
│ OpenstatCampaignIDHash, │
│ OpenstatAdIDHash, │
│ OpenstatSourceIDHash, │
│ UTMSourceHash, │
│ UTMMediumHash, │
│ UTMCampaignHash, │
│ UTMContentHash, │
│ UTMTermHash, │
│ FromHash, │
│ WebVisorEnabled, │
│ WebVisorActivity, │
│ `ParsedParams.Key1`, │
│ `ParsedParams.Key2`, │
│ `ParsedParams.Key3`, │
│ `ParsedParams.Key4`, │
│ `ParsedParams.Key5`, │
│ `ParsedParams.ValueDouble`, │
│ `Market.Type`, │
│ `Market.GoalID`, │
│ `Market.OrderID`, │
│ `Market.OrderPrice`, │
│ `Market.PP`, │
│ `Market.DirectPlaceID`, │
│ `Market.DirectOrderID`, │
│ `Market.DirectBannerID`, │
│ `Market.GoodID`, │
│ `Market.GoodName`, │
│ `Market.GoodQuantity`, │
│ `Market.GoodPrice`, │
│ IslandID │
│ FROM datasets.visits_v1 │
│ LIMIT 3 BY VisitID │
│ LIMIT 10 │
└───────────────────────────────────────────┘
185 rows in set. Elapsed: 0.004 sec.
8.删除重复的USING Key
例如下面的语句,重复的关联键 id 字段会被去重:
EXPLAIN SYNTAX
SELECT
a.UserID,
a.UserID,
b.VisitID,
a.URL,
b.UserID
FROM datasets.hits_v1 AS a
LEFT JOIN datasets.visits_v1 AS b USING (UserID, UserID)
Query id: 83d2ea38-4d51-48fd-8674-618eb1379d33
┌─explain──────────────────────────────────────────────┐
│ SELECT │
│ UserID, │
│ UserID, │
│ VisitID, │
│ URL, │
│ b.UserID │
│ FROM datasets.hits_v1 AS a │
│ ALL LEFT JOIN datasets.visits_v1 AS b USING (UserID) │
└──────────────────────────────────────────────────────┘
8 rows in set. Elapsed: 0.003 sec.
9.标量替换
如果子查询只返回一行数据,在被引用的时候用标量替换,例如下面语句中的total_disk_usage 字段:
EXPLAIN SYNTAX
WITH (
SELECT sum(bytes)
FROM system.parts
WHERE active
) AS total_disk_usage
SELECT
(sum(bytes) / total_disk_usage) * 100 AS table_disk_usage,
table
FROM system.parts
GROUP BY table
ORDER BY table_disk_usage DESC
LIMIT 10
Query id: 499e49b3-62e8-4a26-afd0-e3936d8c43f7
┌─explain─────────────────────────────────────────────────────────────────────────┐
│ WITH identity(_CAST(0, 'Nullable(UInt64)')) AS total_disk_usage │
│ SELECT │
│ (sum(bytes_on_disk AS bytes) / total_disk_usage) * 100 AS table_disk_usage, │
│ table │
│ FROM system.parts │
│ GROUP BY table │
│ ORDER BY table_disk_usage DESC │
│ LIMIT 10 │
└─────────────────────────────────────────────────────────────────────────────────┘
8 rows in set. Elapsed: 0.003 sec.
10.三元运算优化
如果开启了 optimize_if_chain_to_multiif 参数,三元运算符会被替换成 multiIf 函数,例如:
EXPLAIN SYNTAX
SELECT if(number = 1, 'hello', if(number = 2, 'world', 'fengzhen'))
FROM numbers(10)
Query id: 9601b406-349b-4af8-bccf-8ccfd62d8638
┌─explain─────────────────────────────────────────────────────────────┐
│ SELECT if(number = 1, 'hello', if(number = 2, 'world', 'fengzhen')) │
│ FROM numbers(10) │
└─────────────────────────────────────────────────────────────────────┘
2 rows in set. Elapsed: 0.001 sec.
开启multiif:
set optimize_if_chain_to_multiif = 1;
EXPLAIN SYNTAX
SELECT if(number = 1, 'hello', if(number = 2, 'world', 'fengzhen'))
FROM numbers(10)
Query id: 36d98f1c-d0b1-4ecb-952f-1a7a8393d187
┌─explain──────────────────────────────────────────────────────────────┐
│ SELECT multiIf(number = 1, 'hello', number = 2, 'world', 'fengzhen') │
│ FROM numbers(10) │
└──────────────────────────────────────────────────────────────────────┘
2 rows in set. Elapsed: 0.001 sec.