相同的筛选条件,有了group by后数值不一样,如
1、原始sql
SELECT
source,
--platform,
max(source_name) as source_name,
uniqExactMerge(device_id_state) as uv,
toStartOfMonth(toDate(date_str)) as date_str
FROM
mdata_flows_user_realtime_day_all table_alias
WHERE
bd_id != '6'
AND shop_type = '1'
AND type = 'prod'
AND source IN ('1')
AND bd_id IN ('5',
'12')
AND platform IN ('12',
'20')
AND (table_alias.date_str between '2021-05-01' and '2021-05-25'
or table_alias.date_str between '2020-05-01' and '2020-05-25'
or table_alias.date_str between '2021-04-01' and '2021-04-25')
GROUP BY
date_str,
source
--platform
结果:
source_name|source|uv |date_str |
-----------|------|-------|----------|
主站 |1 |1135552|2021-05-01
uv=1135552
2、再加一个group by字段
SELECT
new_id,
--platform,
uniqExactMerge(device_id_state) as uv,
toStartOfMonth(toDate(date_str)) as date_str
FROM
mdata_flows_user_realtime_day_all table_alias
WHERE
bd_id != '6'
AND source = '1'
AND shop_type = '1'
AND type = 'prod'
AND bd_id IN ('5',
'12')
AND platform IN ('12',
'20')
AND (table_alias.date_str between '2021-05-01' and '2021-05-25'
or table_alias.date_str between '2020-05-01' and '2020-05-25'
or table_alias.date_str between '2021-04-01' and '2021-04-25')
GROUP BY
date_str,
new_id
--platform
结果:
new_id|uv |date_str |
------|-------|----------|
0| 207473|2021-05-01|
1| 961056|2021-05-01
uv=207473+961056=1168529
3、继续添加一个字段platform
SELECT
new_id,
platform,
uniqExactMerge(device_id_state) as uv,
toStartOfMonth(toDate(date_str)) as date_str
FROM
mdata_flows_user_realtime_day_all table_alias
WHERE
bd_id != '6'
AND source = '1'
AND shop_type = '1'
AND type = 'prod'
AND bd_id IN ('5',
'12')
AND platform IN ('12',
'20')
AND (table_alias.date_str between '2021-05-01' and '2021-05-25'
or table_alias.date_str between '2020-05-01' and '2020-05-25'
or table_alias.date_str between '2021-04-01' and '2021-04-25')
GROUP BY
date_str,
new_id,
platform
结果:
new_id|platform|uv |date_str |
------|--------|------|----------|
1|20 |193091|2021-05-01|
1|12 |767966|2021-05-01|
0|20 | 36139|2021-05-01|
0|12 |171334|2021-05-01|
uv=767966+171334+36139+193091=1168530
可见,三个完全相同筛选条件的sql,只因为group by字段的不同数据有差异。
具体原因就是去重的顺序影响的,也就是 先汇总再去重 和先group by 再去重的区别。
后者往往会大于等于前者,不同group by分组中可能存在相同的去重字段。