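Replica selection is controlled by the load_balancing parameter in users.xml, which has four options: random, nearest_hostname, in_order, and first_or_random.
random: the default. Selects the replica with the smallest errors_count; if several replicas have the same errors_count, one of them is chosen at random.
nearest_hostname: selects the replica with the smallest errors_count; if several replicas have the same errors_count, their hostnames are compared with the client's hostname position by position, and the replica whose hostname differs in the fewest characters is chosen.
in_order: selects the replica with the smallest errors_count; if several replicas have the same errors_count, they are chosen in the order the replicas are defined in metrika.xml.
first_or_random: selects the replica with the smallest errors_count; if several replicas have the same errors_count, the first replica in the order defined in metrika.xml is chosen, and if that first replica is unavailable, one is chosen at random.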
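The four tie-breaking rules can be illustrated with a small Python model. This is only a sketch of the behaviour described above, not ClickHouse's actual implementation: the function names, the (hostname, errors_count) tuples, and the way "unavailable" is modeled (the first replica having a higher errors_count) are all assumptions made for illustration.

```python
import random


def hostname_distance(a: str, b: str) -> int:
    """Count positions, up to the shorter length, where the two names differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def choose_replica(replicas, policy, client_hostname="", rng=random):
    """Pick one replica of a shard.

    replicas: list of (hostname, errors_count) tuples, in the order the
    replicas are defined in the cluster config (metrika.xml in this article).
    """
    # Every policy first narrows the choice to the replicas with the fewest errors.
    fewest = min(errors for _, errors in replicas)
    candidates = [host for host, errors in replicas if errors == fewest]

    if policy == "random":
        return rng.choice(candidates)
    if policy == "nearest_hostname":
        return min(candidates, key=lambda h: hostname_distance(h, client_hostname))
    if policy == "in_order":
        return candidates[0]
    if policy == "first_or_random":
        # Assumption: "unavailable" is modeled as "not among the best candidates".
        first = replicas[0][0]
        return first if first in candidates else rng.choice(candidates)
    raise ValueError(f"unknown load_balancing policy: {policy}")
```

For example, choose_replica([("clickhouse1", 0), ("clickhouse2", 0)], "nearest_hostname", client_hostname="clickhouse2") picks clickhouse2, because its name differs from the client's in zero positions.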
For example, connect to clickhouse1 and run the following statement:
[root@clickhouse1 ~]# clickhouse-client -h clickhouse1 --port 9000 -u default --password default123 -m -n
ClickHouse client version 21.6.5.37 (official build).
Connecting to clickhouse1:9000 as user default.
Connected to ClickHouse server version 21.6.5 revision 54448.
clickhouse1 :) select * from distribute_test_all;
SELECT *
FROM distribute_test_all
Query id: 7ba77245-e180-4e1a-9ad2-e2b74b3ae6d2
┌─id─┬─name─┐
│ 1 │ yi │
└────┴──────┘
┌─id─┬─name─┐
│ 2 │ er │
└────┴──────┘
2 rows in set. Elapsed: 0.129 sec.
clickhouse1 :)
Suppose clickhouse1 is chosen for shard 01 and clickhouse3 for shard 02.
Then clickhouse1, through the distribute_test_all table, sends select * from distribute_test_local
to clickhouse1 and clickhouse3 as a subquery, and finally clickhouse1 merges the results.
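The scatter-and-merge flow can be pictured with a tiny in-memory model. It is a sketch only: the dictionary below stands in for distribute_test_local on the two chosen replicas, and which row lives on which shard is an assumption, since the output above does not say.

```python
# Hypothetical stand-ins for distribute_test_local on the chosen replicas.
# Which shard holds which row is an assumption made for illustration.
shards = {
    "clickhouse1": [(1, "yi")],  # replica chosen for shard 01
    "clickhouse3": [(2, "er")],  # replica chosen for shard 02
}


def select_all_local(rows):
    """Stands in for: select * from distribute_test_local (run on one shard)."""
    return list(rows)


def select_all_distributed(shards):
    """Stands in for: select * from distribute_test_all (run on the initiator)."""
    merged = []
    for host, rows in shards.items():
        # The initiator sends the local query to each chosen replica ...
        partial = select_all_local(rows)
        # ... and appends each partial result to the final result set.
        merged.extend(partial)
    return merged


rows = select_all_distributed(shards)
```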
clickhouse1 :)
clickhouse1 :) select * from distribute_test_all;
SELECT *
FROM distribute_test_all
Query id: 3c96aaea-9f5b-41c7-ae57-59b7eb7268da
┌─id─┬─name─┐
│ 1 │ 一 │
└────┴──────┘
┌─id─┬─name─┐
│ 1 │ yi │
│ 2 │ er │
└────┴──────┘
3 rows in set. Elapsed: 0.041 sec.
clickhouse1 :)
clickhouse1 :) select * from distribute_test_all where name = '一' and id global in (select id from distribute_test_all where name = 'yi');
SELECT *
FROM distribute_test_all
WHERE (name = '一') AND (id GLOBAL IN
(
SELECT id
FROM distribute_test_all
WHERE name = 'yi'
))
Query id: ad0eea08-aab2-413e-aa63-8bb54b0459fc
┌─id─┬─name─┐
│ 1 │ 一 │
└────┴──────┘
1 rows in set. Elapsed: 0.142 sec.
clickhouse1 :)
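The GLOBAL IN query is executed in the following steps:
1. select id from distribute_test_local where name = 'yi' is issued to the shards to evaluate the subquery.
2. The subquery results are collected into a temporary in-memory table, which is sent to every shard.
3. select * from distribute_test_all where name = '一' and id global in (temporary in-memory table) is issued as the main query.
Conclusion: keep the temporary in-memory table from step 2 as small as possible.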
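The GLOBAL IN flow can be modeled in a few lines of Python to show why the size of the temporary table matters: it is shipped to every shard. This is a sketch under assumptions: the placement of the three rows on the two shards is hypothetical, and the "shipped values" counter is only a rough stand-in for network cost.

```python
# Hypothetical placement of the rows of distribute_test_local on two shards.
shards = {
    "shard01": [(1, "一")],
    "shard02": [(1, "yi"), (2, "er")],
}


def global_in_query(shards):
    # Steps 1 and 2: evaluate the subquery on every shard's local table and
    # gather the ids into one temporary in-memory set on the initiator.
    temp_ids = {id_ for rows in shards.values() for id_, name in rows if name == "yi"}

    # Step 3: the temporary set travels with the main query to every shard,
    # where the local rows are filtered against it.
    result = []
    for rows in shards.values():
        result.extend(row for row in rows if row[1] == "一" and row[0] in temp_ids)

    # Rough cost model: every value in the set is sent to every shard.
    shipped_values = len(temp_ids) * len(shards)
    return result, shipped_values


result, shipped_values = global_in_query(shards)
```

In this model the matching 'yi' row sits on a different shard from the '一' row, so filtering against purely local data would return nothing; the shipped set is what makes the match possible, and its cost grows with both its size and the number of shards.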
CREATE TABLE distribute_score_local ON CLUSTER sharding_ha
(
`id` UInt64,
`score` Float64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/distribute_score/{shard}', '{replica}')
ORDER BY id;
CREATE TABLE distribute_score_all ON CLUSTER sharding_ha
(
`id` UInt64,
`score` Float64
)
ENGINE = Distributed(sharding_ha, default, distribute_score_local, rand());
insert into distribute_score_all(id, score) values(2, 100);
clickhouse1 :)
clickhouse1 :) select * from distribute_test_all a join distribute_score_all b on a.id = b.id;
SELECT *
FROM distribute_test_all AS a
INNER JOIN distribute_score_all AS b ON a.id = b.id
Query id: 6180dcd3-15e8-499f-abdc-e8cb290011fa
┌─id─┬─name─┬─b.id─┬─score─┐
│ 2 │ er │ 2 │ 100 │
└────┴──────┴──────┴───────┘
1 rows in set. Elapsed: 0.330 sec.
clickhouse1 :)
clickhouse1 :) select * from distribute_test_all a global join distribute_score_all b on a.id = b.id;
SELECT *
FROM distribute_test_all AS a
GLOBAL INNER JOIN distribute_score_all AS b ON a.id = b.id
Query id: 122d6215-9261-4c97-aff5-8f28031313bb
┌─id─┬─name─┬─b.id─┬─score─┐
│ 2 │ er │ 2 │ 100 │
└────┴──────┴──────┴───────┘
1 rows in set. Elapsed: 0.095 sec.
clickhouse1 :)
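With an ordinary JOIN, the query is executed as follows:
1. select * from distribute_test_local a join distribute_score_all b on a.id = b.id is issued to each shard.
2. The first shard issues select * from distribute_score_local (distribute_score_all is queried for the 1st time) to build temporary table 1, then runs select * from distribute_test_local a join temporary table 1 b on a.id = b.id.
3. The second shard issues select * from distribute_score_local (distribute_score_all is queried for the 2nd time) to build temporary table 2, then runs select * from distribute_test_local a join temporary table 2 b on a.id = b.id.
With GLOBAL JOIN, the query is executed as follows:
1. select * from distribute_score_local is issued to the shards (distribute_score_all is queried only once) and the results are collected into a temporary table.
2. select * from distribute_test_local a join temporary table b on a.id = b.id is issued to each shard.
Conclusion: keep the right-hand table of the join as small as possible.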
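The difference between the two plans can be made concrete by counting how often the distributed right-hand table is read. The following Python model is a sketch, not ClickHouse internals: the data placement is assumed, and read_score_all is a hypothetical helper standing in for a full query of distribute_score_all.

```python
# Hypothetical placement of the local tables on two shards.
test_local = {"shard01": [(1, "yi")], "shard02": [(2, "er")]}
score_local = {"shard01": [], "shard02": [(2, 100.0)]}
counter = {"score_all_queries": 0}


def read_score_all():
    """Stands in for one full query of distribute_score_all (all shards)."""
    counter["score_all_queries"] += 1
    return [row for rows in score_local.values() for row in rows]


def join_local(left_rows, right_rows):
    """Inner join on id, returning (id, name, b.id, score) tuples."""
    scores_by_id = {}
    for id_, score in right_rows:
        scores_by_id.setdefault(id_, []).append(score)
    return [(id_, name, id_, s) for id_, name in left_rows for s in scores_by_id.get(id_, [])]


def plain_join():
    out = []
    for rows in test_local.values():
        # Each shard queries distribute_score_all on its own.
        out.extend(join_local(rows, read_score_all()))
    return out


def global_join():
    # The initiator queries distribute_score_all once and ships the result.
    temp_table = read_score_all()
    out = []
    for rows in test_local.values():
        out.extend(join_local(rows, temp_table))
    return out
```

Both functions return the same joined rows; with two shards the plain join reads the right-hand table twice and the global join once, and the gap widens as shards are added. In either plan the whole right-hand table is materialized, which is why it should stay small.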
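ClickHouse has no mechanism that redistributes rows with the same key from both sides of a join onto the same server for processing.
So when designing tables ourselves, we should shard the tables by the same key wherever possible. Then executing select * from distribute_test_all a join distribute_score_local b on a.id = b.id
also returns the correct result.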
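Why co-located sharding makes the local join safe can be shown with a small Python model. It is a sketch under assumptions: id % num_shards is a hypothetical sharding rule chosen for illustration (in ClickHouse this would correspond to using an expression on id, rather than rand(), as the sharding key of both Distributed tables).

```python
def shard_by_key(rows, num_shards):
    """Place each row on shard (id % num_shards); id is the first column."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[row[0] % num_shards].append(row)
    return shards


def local_join_everywhere(left_shards, right_shards):
    """Join each shard's left rows only with the right rows on the same shard."""
    out = []
    for left, right in zip(left_shards, right_shards):
        scores = {id_: score for id_, score in right}
        out.extend((id_, name, scores[id_]) for id_, name in left if id_ in scores)
    return out


names = [(1, "yi"), (2, "er")]
scores = [(2, 100.0)]

# Both tables sharded by the same key: matching ids land on the same shard.
colocated = local_join_everywhere(shard_by_key(names, 2), shard_by_key(scores, 2))

# Right table placed without regard to the key (as rand() may do): the row
# for id 2 ends up on the other shard and the local join misses it.
misplaced = local_join_everywhere(shard_by_key(names, 2), [[], [(2, 100.0)]])
```

Here colocated contains the joined row for id 2, while misplaced is empty, which is the incorrect result that sharding by different keys risks.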