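I did not take part in Alibaba's autumn campus recruitment, but I saw this question shared in a group chat and worked through it this morning.
Table name: table_t001
The table table_t001 has three columns: user_id (user name), dt (purchase date), and amt (purchase amount). Find the user with the most purchase days and the user with the largest purchase amount. Output the user id, the number of purchase days, and the purchase amount, with a remark stating whether the row is the one with the most purchase days or the one with the largest purchase amount.
-- Temporary table (a view) v_tablet001: purchase days and total amount per user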
CREATE VIEW v_tablet001 AS
SELECT user_id,
COUNT(DISTINCT dt) AS dt_count,
SUM(amt) AS acc_amt
FROM table_t001
GROUP BY user_id;
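Query the temporary table v_tablet001 twice and merge the two result sets with UNION.
-- Query
-- Labels: 购买天数 = purchase days, 购买金额 = purchase amount, 备注 = remark,
-- 最大购买天数 = most purchase days, 最大购买金额 = largest purchase amount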
SELECT user_id, dt_count '购买天数', acc_amt '购买金额', '最大购买天数' AS '备注' FROM v_tablet001
WHERE dt_count = (SELECT MAX(dt_count) FROM v_tablet001)
UNION
SELECT user_id, dt_count '购买天数', acc_amt '购买金额', '最大购买金额' AS '备注' FROM v_tablet001
WHERE acc_amt = (SELECT MAX(acc_amt) FROM v_tablet001);
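At this point we have selected the rows that satisfy the question.
But have you considered this:
what if one user has both the most purchase days and the largest purchase amount?
How can the remark be chosen automatically from the data, instead of being hard-coded in advance for each branch?
Clearly, the code above only satisfies the literal wording of the question and still has a loophole: UNION removes only fully identical rows, so such a user comes back as two rows, one per remark.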
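To make the loophole concrete, here is a minimal sketch that runs the UNION query against a handful of made-up rows. It uses Python's built-in sqlite3 module rather than MySQL; the sample rows and the helper name `union_report` are assumptions for illustration, and the remark strings are English stand-ins for the labels in the query above.

```python
import sqlite3

# Hypothetical sample rows (user_id, dt, amt); not the data from the question.
# u1 has both the most purchase days (3) and the largest total amount (180).
ROWS = [
    ("u1", "2020-01-01", 50),
    ("u1", "2020-01-02", 60),
    ("u1", "2020-01-03", 70),
    ("u2", "2020-01-01", 100),
    ("u3", "2020-01-01", 10),
    ("u3", "2020-01-02", 10),
]

UNION_QUERY = """
SELECT user_id, dt_count, acc_amt, 'most purchase days' AS note
FROM v_tablet001
WHERE dt_count = (SELECT MAX(dt_count) FROM v_tablet001)
UNION
SELECT user_id, dt_count, acc_amt, 'largest purchase amount' AS note
FROM v_tablet001
WHERE acc_amt = (SELECT MAX(acc_amt) FROM v_tablet001)
"""


def union_report(rows):
    """Load rows into an in-memory database and run the UNION query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_t001 (user_id TEXT, dt TEXT, amt INTEGER)")
    conn.executemany("INSERT INTO table_t001 VALUES (?, ?, ?)", rows)
    conn.execute(
        """CREATE VIEW v_tablet001 AS
           SELECT user_id,
                  COUNT(DISTINCT dt) AS dt_count,
                  SUM(amt) AS acc_amt
           FROM table_t001
           GROUP BY user_id"""
    )
    result = sorted(conn.execute(UNION_QUERY).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    # The same user is reported twice, once under each remark.
    for row in union_report(ROWS):
        print(row)
```

With these rows, u1 appears in both halves of the UNION, which is exactly the situation the refined question below is meant to handle.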
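Let's refine the question:
Find the user with the most purchase days and the user with the largest purchase amount. Output the user id, the number of purchase days, and the purchase amount, with a remark stating "most purchase days", "largest purchase amount", or "both most purchase days and largest purchase amount".
I added one new row of data to create the third case.
The implementations below are based on the new data.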
-- Create the temporary table (view) v_tablet001; OR REPLACE avoids an error if it already exists
CREATE OR REPLACE VIEW v_tablet001 AS
SELECT user_id,
COUNT(DISTINCT dt) AS dt_count,
SUM(amt) AS acc_amt
FROM table_t001
GROUP BY user_id;
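Use CASE WHEN to assign the remark case by case.
-- Query: CASE WHEN conditional logic
-- 同时购买天数和购买金额都最多 = both most purchase days and largest purchase amount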
SELECT user_id, dt_count '购买天数', acc_amt '购买金额', note '备注' FROM
(SELECT *,
CASE
WHEN dt_count = (SELECT MAX(dt_count) FROM v_tablet001) AND
acc_amt <> (SELECT MAX(acc_amt) FROM v_tablet001) THEN '最大购买天数'
WHEN acc_amt = (SELECT MAX(acc_amt) FROM v_tablet001) AND
dt_count <> (SELECT MAX(dt_count) FROM v_tablet001) THEN '最大购买金额'
WHEN dt_count = (SELECT MAX(dt_count) FROM v_tablet001 ) AND
acc_amt = (SELECT MAX(acc_amt) FROM v_tablet001) THEN '同时购买天数和购买金额都最多'
END AS note
FROM v_tablet001) AS new_table
WHERE note IS NOT NULL;
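As a quick check of the CASE WHEN version, the following sketch runs an equivalent query on made-up rows. It again assumes Python's sqlite3 module in place of MySQL; the sample rows, the helper name `case_report`, and the English remark strings are illustrative assumptions.

```python
import sqlite3

# Hypothetical rows: u1 tops both measures (3 days, total 180).
ROWS = [
    ("u1", "2020-01-01", 50),
    ("u1", "2020-01-02", 60),
    ("u1", "2020-01-03", 70),
    ("u2", "2020-01-01", 100),
]

CASE_QUERY = """
SELECT user_id, dt_count, acc_amt, note
FROM (
    SELECT v.*,
           CASE
               WHEN dt_count = (SELECT MAX(dt_count) FROM v_tablet001)
                    AND acc_amt <> (SELECT MAX(acc_amt) FROM v_tablet001)
                   THEN 'most purchase days'
               WHEN acc_amt = (SELECT MAX(acc_amt) FROM v_tablet001)
                    AND dt_count <> (SELECT MAX(dt_count) FROM v_tablet001)
                   THEN 'largest purchase amount'
               WHEN dt_count = (SELECT MAX(dt_count) FROM v_tablet001)
                    AND acc_amt = (SELECT MAX(acc_amt) FROM v_tablet001)
                   THEN 'most purchase days and largest purchase amount'
           END AS note
    FROM v_tablet001 AS v
) AS new_table
WHERE note IS NOT NULL
ORDER BY user_id
"""


def case_report(rows):
    """Build the table and view in memory, then run the CASE WHEN query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_t001 (user_id TEXT, dt TEXT, amt INTEGER)")
    conn.executemany("INSERT INTO table_t001 VALUES (?, ?, ?)", rows)
    conn.execute(
        """CREATE VIEW v_tablet001 AS
           SELECT user_id,
                  COUNT(DISTINCT dt) AS dt_count,
                  SUM(amt) AS acc_amt
           FROM table_t001
           GROUP BY user_id"""
    )
    result = conn.execute(CASE_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # One row per qualifying user, with a single combined remark for u1.
    for row in case_report(ROWS):
        print(row)
```

Unlike the UNION version, a user who tops both measures is reported once, under the combined remark.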
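Another approach: use the window function RANK() to rank users by purchase days and by purchase amount, each in descending order, then keep the rows that rank first on either measure.
-- Query: CASE WHEN conditional logic on the two ranks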
SELECT user_id, dt_count '购买天数', acc_amt '购买金额',
CASE
WHEN rank1 = 1 AND rank2 = 1 THEN '同时购买天数和购买金额都最多'
WHEN rank1 = 1 AND rank2 <> 1 THEN '最大购买天数'
WHEN rank2 = 1 AND rank1 <> 1 THEN '最大购买金额'
END AS note
FROM
(SELECT user_id, dt_count, acc_amt,
RANK() OVER (ORDER BY dt_count DESC) AS rank1,
RANK() OVER (ORDER BY acc_amt DESC) AS rank2
FROM v_tablet001) AS t
WHERE rank1 = 1 OR rank2 = 1;
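The sketch below tries the RANK() query on made-up rows that contain ties, so that all three remarks show up in one result. It assumes Python's sqlite3 module backed by SQLite 3.25 or later (the first release with window functions) instead of MySQL; the sample rows, the helper name `rank_report`, and the English remark strings are illustrative assumptions.

```python
import sqlite3

# Hypothetical rows with ties:
#   u1: 3 days, total 180 -> first on both measures
#   u2: 3 days, total 30  -> tied first on days only
#   u3: 1 day,  total 180 -> tied first on amount only
#   u4: 1 day,  total 5   -> first on neither
ROWS = [
    ("u1", "2020-01-01", 50), ("u1", "2020-01-02", 60), ("u1", "2020-01-03", 70),
    ("u2", "2020-01-01", 10), ("u2", "2020-01-02", 10), ("u2", "2020-01-03", 10),
    ("u3", "2020-01-01", 180),
    ("u4", "2020-01-01", 5),
]

RANK_QUERY = """
SELECT user_id, dt_count, acc_amt,
       CASE
           WHEN rank1 = 1 AND rank2 = 1 THEN 'both'
           WHEN rank1 = 1 AND rank2 <> 1 THEN 'most purchase days'
           WHEN rank2 = 1 AND rank1 <> 1 THEN 'largest purchase amount'
       END AS note
FROM (
    SELECT user_id, dt_count, acc_amt,
           RANK() OVER (ORDER BY dt_count DESC) AS rank1,
           RANK() OVER (ORDER BY acc_amt DESC) AS rank2
    FROM v_tablet001
) AS t
WHERE rank1 = 1 OR rank2 = 1
ORDER BY user_id
"""


def rank_report(rows):
    """Build the table and view in memory, then run the RANK() query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_t001 (user_id TEXT, dt TEXT, amt INTEGER)")
    conn.executemany("INSERT INTO table_t001 VALUES (?, ?, ?)", rows)
    conn.execute(
        """CREATE VIEW v_tablet001 AS
           SELECT user_id,
                  COUNT(DISTINCT dt) AS dt_count,
                  SUM(amt) AS acc_amt
           FROM table_t001
           GROUP BY user_id"""
    )
    result = conn.execute(RANK_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rank_report(ROWS):
        print(row)
```

RANK() gives tied users the same rank, so every user tied for first place is kept, which matches the behaviour of comparing against MAX() in the earlier version.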
The complete code, with the temporary table inlined as a subquery:
SELECT user_id, dt_count '购买天数', acc_amt '购买金额',
CASE
WHEN rank1 = 1 AND rank2 = 1 THEN '同时购买天数和购买金额都最多'
WHEN rank1 = 1 AND rank2 <> 1 THEN '最大购买天数'
WHEN rank2 = 1 AND rank1 <> 1 THEN '最大购买金额'
END AS note
FROM
(SELECT user_id, dt_count, acc_amt,
RANK() OVER (ORDER BY dt_count DESC) AS rank1,
RANK() OVER (ORDER BY acc_amt DESC) AS rank2
FROM
(
SELECT user_id,
COUNT(DISTINCT dt) AS dt_count,
SUM(amt) AS acc_amt
FROM table_t001
GROUP BY user_id
) AS t1 -- t1 corresponds to the earlier temporary table
) AS t2
WHERE rank1 = 1 OR rank2 = 1;
Rewriting it with a WITH ... AS clause also works. The query is not recursive, so the RECURSIVE keyword is not needed.
WITH t1 AS
(
SELECT user_id,
COUNT(DISTINCT dt) AS dt_count,
SUM(amt) AS acc_amt
FROM table_t001
GROUP BY user_id
)
SELECT user_id, dt_count '购买天数', acc_amt '购买金额',
CASE
WHEN rank1 = 1 AND rank2 = 1 THEN '同时购买天数和购买金额都最多'
WHEN rank1 = 1 AND rank2 <> 1 THEN '最大购买天数'
WHEN rank2 = 1 AND rank1 <> 1 THEN '最大购买金额'
END AS note
FROM
(SELECT user_id, dt_count, acc_amt,
RANK() OVER (ORDER BY dt_count DESC) AS rank1,
RANK() OVER (ORDER BY acc_amt DESC) AS rank2
FROM t1) AS t2
WHERE rank1 = 1 OR rank2 = 1;
-- Rigorous version: the three conditions are mutually exclusive
SELECT user_id, dt_count '购买天数', acc_amt '购买金额',
CASE
WHEN rank1 = 1 AND rank2 = 1 THEN '同时购买天数和购买金额都最多'
WHEN rank1 = 1 AND rank2 <> 1 THEN '最大购买天数'
WHEN rank2 = 1 AND rank1 <> 1 THEN '最大购买金额'
END AS note
FROM
…… (omitted)
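When splitting the remark into three cases, make sure the cases are mutually exclusive. The case "both most purchase days and largest purchase amount" overlaps with "most purchase days" and with "largest purchase amount". The code below is not rigorous: a CASE expression returns the first branch whose condition is true, so it yields the right remark only because the combined case is listed first, and reordering the branches would mislabel users who top both measures.
-- Case 1 overlaps with cases 2 and 3. Not rigorous: correct only because case 1 comes first.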
SELECT user_id, dt_count '购买天数', acc_amt '购买金额',
CASE
WHEN rank1 = 1 AND rank2 = 1 THEN '同时购买天数和购买金额都最多'
WHEN rank1 = 1 THEN '最大购买天数'
WHEN rank2 = 1 THEN '最大购买金额'
END AS note
FROM
…… (omitted)
SELECT user_id, dt_count '购买天数', acc_amt '购买金额',
COALESCE
(CASE WHEN rank1 = 1 AND rank2 = 1 THEN '同时购买天数和购买金额都最多' ELSE NULL END,
CASE WHEN rank1 = 1 THEN '最大购买天数' ELSE NULL END,
CASE WHEN rank2 = 1 THEN '最大购买金额' ELSE NULL END) AS note
FROM
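…… (omitted)
In case you are not familiar with the COALESCE() function (this is actually my first time using it too):
COALESCE is a function: COALESCE(expression_1, expression_2, …, expression_n) evaluates its arguments in order and returns the first non-NULL value. If every expression is NULL, it returns NULL. COALESCE is useful because most expressions that contain a NULL evaluate to NULL.
Source: https://www.cnblogs.com/baxianhua/p/9100640.html
With this version, the three cases are evaluated independently, because they are written as three separate CASE expressions rather than as branches of a single one. As soon as the first case matches, that is, the user has both the most purchase days and the largest purchase amount, COALESCE stops and returns that value, so the remark is correct. This still relies on the combined case being the first argument.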
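To compare the different ways of writing the remark, the sketch below evaluates each expression for every combination of "ranked first" and "not ranked first", plus a deliberately reordered CASE. It is an illustration run on SQLite through Python's sqlite3 module, not MySQL; the helper name `label`, the bound parameters `:r1` and `:r2` (standing in for rank1 and rank2), and the short remark strings are assumptions.

```python
import sqlite3

# Mutually exclusive conditions: the result does not depend on branch order.
EXCLUSIVE = """
CASE
    WHEN :r1 = 1 AND :r2 = 1 THEN 'both'
    WHEN :r1 = 1 AND :r2 <> 1 THEN 'days'
    WHEN :r2 = 1 AND :r1 <> 1 THEN 'amount'
END"""

# Overlapping conditions, combined case listed first.
ORDERED = """
CASE
    WHEN :r1 = 1 AND :r2 = 1 THEN 'both'
    WHEN :r1 = 1 THEN 'days'
    WHEN :r2 = 1 THEN 'amount'
END"""

# Three separate CASE expressions wrapped in COALESCE.
COALESCED = """
COALESCE(
    CASE WHEN :r1 = 1 AND :r2 = 1 THEN 'both' END,
    CASE WHEN :r1 = 1 THEN 'days' END,
    CASE WHEN :r2 = 1 THEN 'amount' END)"""

# Overlapping conditions with the combined case listed last.
REORDERED = """
CASE
    WHEN :r1 = 1 THEN 'days'
    WHEN :r2 = 1 THEN 'amount'
    WHEN :r1 = 1 AND :r2 = 1 THEN 'both'
END"""


def label(expr, r1, r2):
    """Evaluate one remark expression for the given pair of ranks."""
    conn = sqlite3.connect(":memory:")
    value = conn.execute("SELECT " + expr, {"r1": r1, "r2": r2}).fetchone()[0]
    conn.close()
    return value


if __name__ == "__main__":
    for r1 in (1, 2):
        for r2 in (1, 2):
            print(r1, r2,
                  label(EXCLUSIVE, r1, r2),
                  label(ORDERED, r1, r2),
                  label(COALESCED, r1, r2),
                  label(REORDERED, r1, r2))
```

On SQLite the first three expressions agree for every combination, and only the reordered CASE differs, for a user who ranks first on both measures. The mutually exclusive form is the one that stays correct however the branches are arranged.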
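The version you are reading is far richer and more wide-ranging than my first answer. Thanks to aa and zliang for the ideas they gave me along the way.
If I have still missed anything, comments and corrections are welcome.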