Table: Spending
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| user_id     | int     |
| spend_date  | date    |
| platform    | enum    |
| amount      | int     |
+-------------+---------+
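This table records users' spending history on an online shopping site that has both a desktop ('desktop') and a mobile ('mobile') application.
The primary key of this table is (user_id, spend_date, platform).
The platform column is an ENUM with the values ('desktop', 'mobile').
Write an SQL query to find, for each date, the number of users and the total amount spent by users who used mobile only, users who used desktop only, and users who used both desktop and mobile.
The query result format is shown in the following example: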
Spending table:
+---------+------------+----------+--------+
| user_id | spend_date | platform | amount |
+---------+------------+----------+--------+
| 1       | 2019-07-01 | mobile   | 100    |
| 1       | 2019-07-01 | desktop  | 100    |
| 2       | 2019-07-01 | mobile   | 100    |
| 2       | 2019-07-02 | mobile   | 100    |
| 3       | 2019-07-01 | desktop  | 100    |
| 3       | 2019-07-02 | desktop  | 100    |
+---------+------------+----------+--------+
Result table:
+------------+----------+--------------+-------------+
| spend_date | platform | total_amount | total_users |
+------------+----------+--------------+-------------+
| 2019-07-01 | desktop  | 100          | 1           |
| 2019-07-01 | mobile   | 100          | 1           |
| 2019-07-01 | both     | 200          | 1           |
| 2019-07-02 | desktop  | 100          | 1           |
| 2019-07-02 | mobile   | 100          | 1           |
| 2019-07-02 | both     | 0            | 0           |
+------------+----------+--------------+-------------+
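On 2019-07-01, user 1 purchased on both desktop and mobile, user 2 purchased on mobile only, and user 3 purchased on desktop only.
On 2019-07-02, user 2 purchased on mobile only, user 3 purchased on desktop only, and no user purchased on both desktop and mobile.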
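Method 1 approach:
S1: Aggregate at the (user_id, spend_date) grain. Within each group, the resulting platform column falls into one of three cases: mobile only, desktop only, or both mobile and desktop. Code snippet for this step: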
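select
    user_id,
    spend_date,
    # two rows for one user on one day means both platforms were used;
    # min(platform) keeps the query valid under ONLY_FULL_GROUP_BY
    case when count(platform) = 2 then 'both' else min(platform) end as platform,
    sum(amount) amount,
    count(distinct user_id) users
from spending
group by user_id, spend_date
order by spend_date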
# Result:
{"headers": ["user_id", "spend_date", "platform", "amount", "users"], "values": [
[1, "2019-07-01", "both", 200, 1],
[2, "2019-07-01", "mobile", 100, 1],
[3, "2019-07-01", "desktop", 100, 1],
[2, "2019-07-02", "mobile", 100, 1],
[3, "2019-07-02", "desktop", 100, 1]
]}
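S2: Aggregate again at the coarser (spend_date, platform) grain.
S3: Rows whose amount and users are 0 must also appear in the output, so construct a table that lists all three platform values for every spend_date, right join the aggregated result to it, and then compute the totals. The full SQL follows.
Code: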
select
    t2.spend_date,
    t2.platform,
    coalesce(sum(t1.amount), 0) total_amount,  # coalesce(expr, expr, ...) returns the first non-NULL value
    coalesce(sum(t1.users), 0) total_users
from (
    # S1: one row per (user_id, spend_date), labelled mobile / desktop / both
    select
        user_id,
        spend_date,
        case when count(platform) = 2 then 'both' else min(platform) end as platform,
        sum(amount) amount,
        count(distinct user_id) users
    from spending
    group by user_id, spend_date
) as t1 right join (
    # S3: every spend_date paired with each of the three platform values
    select spend_date, 'mobile' as platform from spending
    union
    select spend_date, 'desktop' as platform from spending
    union
    select spend_date, 'both' as platform from spending
) as t2 on t1.spend_date = t2.spend_date and t1.platform = t2.platform
group by t2.spend_date, t2.platform;
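The Method 1 query can be checked against the sample data with a small harness. What follows is a sketch under stated assumptions, not the author's code: it uses Python's built-in sqlite3 module as a stand-in for MySQL, it writes the right join as an equivalent left join that starts from the constructed table (older SQLite builds do not support RIGHT JOIN), and it uses min(platform) for the single-platform case. The names ROWS, METHOD1_SQL, and run_method1 are hypothetical.

```python
import sqlite3

# Sample rows from the problem statement: (user_id, spend_date, platform, amount)
ROWS = [
    (1, "2019-07-01", "mobile", 100),
    (1, "2019-07-01", "desktop", 100),
    (2, "2019-07-01", "mobile", 100),
    (2, "2019-07-02", "mobile", 100),
    (3, "2019-07-01", "desktop", 100),
    (3, "2019-07-02", "desktop", 100),
]

# Method 1, with the constructed table on the left side of a LEFT JOIN
# (assumption: same result as the RIGHT JOIN form, just with the sides swapped).
METHOD1_SQL = """
select
    t2.spend_date,
    t2.platform,
    coalesce(sum(t1.amount), 0) as total_amount,
    coalesce(sum(t1.users), 0) as total_users
from (
    select spend_date, 'mobile' as platform from spending
    union
    select spend_date, 'desktop' from spending
    union
    select spend_date, 'both' from spending
) as t2
left join (
    select
        user_id,
        spend_date,
        case when count(platform) = 2 then 'both' else min(platform) end as platform,
        sum(amount) as amount,
        count(distinct user_id) as users
    from spending
    group by user_id, spend_date
) as t1 on t1.spend_date = t2.spend_date and t1.platform = t2.platform
group by t2.spend_date, t2.platform
order by t2.spend_date, t2.platform
"""


def run_method1(rows):
    """Load rows into an in-memory table and return the Method 1 result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table spending ("
        "user_id integer, spend_date text, platform text, amount integer, "
        "primary key (user_id, spend_date, platform))"
    )
    conn.executemany("insert into spending values (?, ?, ?, ?)", rows)
    result = conn.execute(METHOD1_SQL).fetchall()
    conn.close()
    return result


method1_result = run_method1(ROWS)
for row in method1_result:
    print(row)
```

The rows are ordered by date and then alphabetically by platform, so the order differs from the expected table in the problem statement, but the (2019-07-02, both) row with zero amount and zero users is present because every date is paired with all three platform values before the join.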
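Method 2 approach:
The idea is the same as in Method 1. The difference is that the constructed table in Method 2 has no spend_date column (it holds only the three platform values), so the two tables are cross joined and an if condition (equivalent to the on condition in Method 1) does the filtering.
# Each user falls under exactly one platform (mobile/desktop/both) per day; amount here was already summed in t1
sum(if(t1.platform = t2.platform, amount, 0))
# Each user falls under exactly one platform (mobile/desktop/both) per day, so each match counts as 1
count(if(t1.platform = t2.platform, 1, null))
Code: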
select
    t1.spend_date,
    t2.platform,
    sum(if(t1.platform = t2.platform, amount, 0)) as total_amount,  # plays the role of the on condition in Method 1
    count(if(t1.platform = t2.platform, 1, null)) as total_users
from (
    select
        user_id,
        spend_date,
        if(count(platform) = 2, 'both', min(platform)) platform,
        sum(amount) amount
    from spending
    group by user_id, spend_date
) as t1, (
    select 'mobile' as platform union
    select 'desktop' as platform union
    select 'both' as platform
) as t2  # the constructed table has only platform, no spend_date
group by t1.spend_date, t2.platform;
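The cross-join variant can be checked the same way. Again this is only a sketch under assumptions: Python's built-in sqlite3 module stands in for MySQL, and because MySQL's if() function is not portable, the conditional aggregates are written with the equivalent case when expressions. The names ROWS, METHOD2_SQL, and run_method2 are hypothetical.

```python
import sqlite3

# Sample rows from the problem statement: (user_id, spend_date, platform, amount)
ROWS = [
    (1, "2019-07-01", "mobile", 100),
    (1, "2019-07-01", "desktop", 100),
    (2, "2019-07-01", "mobile", 100),
    (2, "2019-07-02", "mobile", 100),
    (3, "2019-07-01", "desktop", 100),
    (3, "2019-07-02", "desktop", 100),
]

# Method 2: cross join the per-user rows with the three platform labels,
# then keep only matching pairs inside the aggregates.
METHOD2_SQL = """
select
    t1.spend_date,
    t2.platform,
    sum(case when t1.platform = t2.platform then t1.amount else 0 end) as total_amount,
    count(case when t1.platform = t2.platform then 1 end) as total_users
from (
    select
        user_id,
        spend_date,
        case when count(platform) = 2 then 'both' else min(platform) end as platform,
        sum(amount) as amount
    from spending
    group by user_id, spend_date
) as t1
cross join (
    select 'mobile' as platform union
    select 'desktop' union
    select 'both'
) as t2
group by t1.spend_date, t2.platform
order by t1.spend_date, t2.platform
"""


def run_method2(rows):
    """Load rows into an in-memory table and return the Method 2 result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table spending ("
        "user_id integer, spend_date text, platform text, amount integer, "
        "primary key (user_id, spend_date, platform))"
    )
    conn.executemany("insert into spending values (?, ?, ?, ?)", rows)
    result = conn.execute(METHOD2_SQL).fetchall()
    conn.close()
    return result


method2_result = run_method2(ROWS)
for row in method2_result:
    print(row)
```

Because every per-user row is paired with all three labels, a date only needs at least one purchase for all three output rows to appear; non-matching pairs contribute 0 to the sum and NULL (not counted) to the count.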
Food for thought: under what conditions can data aggregated at grain A be used to compute results at another grain?
……