声明:只在我实际使用中碰到的一个问题
clickhouse官网文档中并没有找到full join 的信息,但是实际使用的时候是可以用 all full join 关联将两个不同接口的表通过某些字段关联成一张表。
问题:
我用两个子查询查出的结果用 all full join 关联,关联keyword_id和fr字段,但是两张表中除了这两个字段以外还有一个同名的account_id字段,遇到的问题是,第一个子查询得到的结果是44条,第二个子查询得到的结果是5条,能根据keyword_id 和fr关联上的数据只有两条。结果是47条数据,这个是没问题的,而有问题的是其中有3条是第二张表的与第一张没对应上的结果account_id变成空了。
如下:
sql:
SELECT
*
FROM
(
SELECT
fr,
keyword_id,
account_id,
sum(cost) AS ad_cost,
sum(cost_real) AS ad_cost_real,
sum(impression) AS ad_impression,
sum(click) AS ad_click
FROM
marketing.sem_keyword_report
WHERE
1 = 1
AND the_day >= '2018-12-12'
AND the_day <= '2018-12-18'
AND lower(fr) IN ('bd_sem')
AND account_id IN ('18091503')
GROUP BY
account_id,
keyword_id,
fr
) ALL
FULL JOIN (
SELECT
fr,
keyword_id,
account_id,
clue_all,
clue_all_new,
c1_kpi_daily_new_customer_amount,
c1_kpi_new_customer_amount,
c2_kpi_daily_new_customer_amount,
c2_kpi_new_customer_amount,
c2c_c1_create,
c2b_c1_create,
c2c_c1_onsite,
c2b_c1_onsite,
c2c_c1_onsale,
c2b_c1_onsale,
c2c_c2_appoint,
b2c_c2_appoint,
ssss_c2_appoint,
c2c_c2_finish_appoint,
b2c_c2_finish_appoint,
ssss_c2_finish_appoint,
c2c_c2_order,
b2c_c2_order,
weighting_number,
ssss_c2_order
FROM
(
SELECT
fr,
kid AS keyword_id,
sum(clue_all) AS clue_all,
sum(clue_all_new) AS clue_all_new,
sum(
c1_kpi_daily_new_customer_amount
) AS c1_kpi_daily_new_customer_amount,
sum(c1_kpi_new_customer_amount) AS c1_kpi_new_customer_amount,
sum(
c2_kpi_daily_new_customer_amount
) AS c2_kpi_daily_new_customer_amount,
sum(c2_kpi_new_customer_amount) AS c2_kpi_new_customer_amount,
sum(c2c_c1_create) AS c2c_c1_create,
sum(c2b_c1_create) AS c2b_c1_create,
sum(c2c_c1_onsite) AS c2c_c1_onsite,
sum(c2b_c1_onsite) AS c2b_c1_onsite,
sum(c2c_c1_onsale) AS c2c_c1_onsale,
sum(c2b_c1_onsale) AS c2b_c1_onsale,
sum(c2c_c2_appoint) AS c2c_c2_appoint,
sum(b2c_c2_appoint) AS b2c_c2_appoint,
sum(ssss_c2_appoint) AS ssss_c2_appoint,
sum(c2c_c2_finish_appoint) AS c2c_c2_finish_appoint,
sum(b2c_c2_finish_appoint) AS b2c_c2_finish_appoint,
sum(ssss_c2_finish_appoint) AS ssss_c2_finish_appoint,
sum(c2c_c2_order) AS c2c_c2_order,
sum(b2c_c2_order) AS b2c_c2_order,
sum(weighting_number) AS weighting_number,
sum(ssss_c2_order) AS ssss_c2_order
FROM
marketing.market_kid_stat_new_v5
WHERE
1 = 1
AND dims = 'clue'
AND dts >= '2018-12-12'
AND dts <= '2018-12-18'
AND lower(fr) IN ('bd_sem')
GROUP BY
keyword_id,
fr
) ANY
LEFT JOIN (
SELECT
fr,
keyword_id,
account_id
FROM
marketing.sem_keyword_type
) USING keyword_id,
fr
WHERE
1 = 1
AND lower(fr) IN ('bd_sem')
AND account_id IN ('18091503')
) USING keyword_id,fr
继续排查:
将右边的account_id 起别名显示时 account_id as aid
结果如下: aid显示的是第二张表能查到的5条数据对应的account_id ,
将第一张表与第二张表颠倒顺序,结果就是相反的。只有5条数据有accountid而其他42条没有account_id
数据字段是没有问题的,没有的为0,但是keyword_id和fr被补齐了,而account_id 却没有。
看了一下sql 可能是因为fr和keyword_id作为关联条件using了。那么尝试一下把account_id 也加入using
结果account_id被补全了
结果:
结论:
虽然我不是很清楚到底是什么机制导致了上述的问题,也不明白为什么最后在using中加入那个字段,结果就被补全了
但是有一点儿我能明白的是clickhouse使用 full join 还是隐约跟左右表有关,并不是真正实现将左右两张表完全放到根据条件相同完全补全字段。同名字段可能会存在问题。
以上是我遇到的问题以及解决,仅作为自己工作记录
感谢观看