LeetCode 1132. Reported Posts II (Medium)

Problem
Table: Actions

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| post_id       | int     |
| action_date   | date    |
| action        | enum    |
| extra         | varchar |
+---------------+---------+

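This table has no primary key and may contain duplicate rows.
The action column is an ENUM with the possible values ('view', 'like', 'reaction', 'comment', 'report', 'share').
The extra column holds optional information about the action, such as a reason for a report or a type of reaction.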

Table: Removals

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| post_id       | int     |
| remove_date   | date    | 
+---------------+---------+

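The primary key of this table is post_id.
Each row of this table represents a post that was removed, either because it was reported or as the result of an admin review.

Write an SQL query to find the average daily percentage of posts that were removed after being reported as spam, rounded to 2 decimal places.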

The query result format is shown in the following example:

Actions table:
+---------+---------+-------------+--------+--------+
| user_id | post_id | action_date | action | extra  |
+---------+---------+-------------+--------+--------+
| 1       | 1       | 2019-07-01  | view   | null   |
| 1       | 1       | 2019-07-01  | like   | null   |
| 1       | 1       | 2019-07-01  | share  | null   |
| 2       | 2       | 2019-07-04  | view   | null   |
| 2       | 2       | 2019-07-04  | report | spam   |
| 3       | 4       | 2019-07-04  | view   | null   |
| 3       | 4       | 2019-07-04  | report | spam   |
| 4       | 3       | 2019-07-02  | view   | null   |
| 4       | 3       | 2019-07-02  | report | spam   |
| 5       | 2       | 2019-07-03  | view   | null   |
| 5       | 2       | 2019-07-03  | report | racism |
| 5       | 5       | 2019-07-03  | view   | null   |
| 5       | 5       | 2019-07-03  | report | racism |
+---------+---------+-------------+--------+--------+

Removals table:

+---------+-------------+
| post_id | remove_date |
+---------+-------------+
| 2       | 2019-07-20  |
| 3       | 2019-07-18  |
+---------+-------------+

Result table:

+-----------------------+
| average_daily_percent |
+-----------------------+
| 75.00                 |
+-----------------------+

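On 2019-07-04 the spam removal rate is 50%: two posts were reported as spam, but only one of them was removed.
On 2019-07-02 the spam removal rate is 100%: one post was reported as spam and it was removed.
No spam reports were received on the other days, so the average is (50 + 100) / 2 = 75%.
Note that the output is a single average value; the date on which a post was removed does not matter.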
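
As a quick sanity check of the arithmetic above, here is a minimal Python sketch. The two collections are transcribed by hand from the sample tables, and the variable names are illustrative rather than part of the problem.

```python
# Posts reported as spam in the sample Actions table: action_date -> set of post_ids.
spam_reports = {
    "2019-07-04": {2, 4},
    "2019-07-02": {3},
}
# post_ids that appear in the sample Removals table.
removed = {2, 3}

# Per-day percentage of spam-reported posts that were removed.
daily_percent = {
    day: 100 * len(posts & removed) / len(posts)
    for day, posts in spam_reports.items()
}

# Average over the days that received at least one spam report.
average_daily_percent = round(sum(daily_percent.values()) / len(daily_percent), 2)
print(daily_percent)
print(average_daily_percent)
```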

Generating the data

DROP TABLE IF EXISTS Actions;

CREATE TABLE Actions (user_id INT, post_id INT, action_date DATE, ACTION VARCHAR(20), extra VARCHAR(20));
 
INSERT INTO Actions (user_id, post_id, action_date, ACTION, extra) VALUES
(1, 1, '2019-07-01', 'view', NULL),
(1, 1, '2019-07-01', 'like', NULL),
(1, 1, '2019-07-01', 'share', NULL),
(2, 2, '2019-07-04', 'view', NULL),
(2, 2, '2019-07-04', 'report', 'spam'),
(3, 4, '2019-07-04', 'view', NULL),
(3, 4, '2019-07-04', 'report', 'spam'),
(4, 3, '2019-07-02', 'view', NULL),
(4, 3, '2019-07-02', 'report', 'spam'),
(5, 2, '2019-07-03', 'view', NULL),
(5, 2, '2019-07-03', 'report', 'racism'),
(5, 5, '2019-07-03', 'view', NULL),
(5, 5, '2019-07-03', 'report', 'racism');


DROP TABLE IF EXISTS Removals;

CREATE TABLE Removals (
    post_id INT PRIMARY KEY,
    remove_date DATE
);

INSERT INTO Removals VALUES (2, '2019-07-20'), (3, '2019-07-18');

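Analysis
The required result is the average daily percentage of posts that were removed among those reported as spam, rounded to 2 decimal places.

The posts reported as spam are the rows of Actions whose extra is 'spam'; not all of them necessarily appear in Removals.
From these we can compute, for each day, the share of reported posts that were removed.
Finally, averaging those daily shares gives the average daily percentage.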
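
The three steps above can be sketched in plain Python before writing any SQL. This is only an illustration: the function name and the tuple layout are assumptions, and the duplicated report row is added on purpose to mimic the "no primary key" caveat from the problem statement.

```python
from collections import defaultdict


def average_daily_percent(actions, removals):
    """actions: (user_id, post_id, action_date, action, extra) tuples;
    removals: (post_id, remove_date) tuples."""
    removed_ids = {post_id for post_id, _remove_date in removals}

    # Step 1: distinct posts reported as spam, grouped by day.
    spam_by_day = defaultdict(set)
    for _user_id, post_id, action_date, _action, extra in actions:
        if extra == "spam":
            spam_by_day[action_date].add(post_id)

    # Step 2: percentage of those posts that were removed, per day.
    daily = [100 * len(posts & removed_ids) / len(posts)
             for posts in spam_by_day.values()]

    # Step 3: average over the days that had spam reports (None if there were none).
    return round(sum(daily) / len(daily), 2) if daily else None


actions = [
    (2, 2, "2019-07-04", "report", "spam"),
    (2, 2, "2019-07-04", "report", "spam"),  # deliberate duplicate row
    (3, 4, "2019-07-04", "report", "spam"),
    (4, 3, "2019-07-02", "report", "spam"),
    (5, 2, "2019-07-03", "report", "racism"),
]
removals = [(2, "2019-07-20"), (3, "2019-07-18")]
print(average_daily_percent(actions, removals))
```

Using sets per day plays the same role here that COUNT(DISTINCT ...) plays in a SQL query: a post reported twice on the same day is counted once.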

Solution
First, join the two tables:

SELECT *
FROM Actions a
LEFT JOIN Removals b 
ON a.post_id = b.post_id;

Keep only the posts that were reported as spam:

SELECT *
FROM Actions a
LEFT JOIN Removals b 
ON a.post_id = b.post_id
WHERE a.extra = 'spam';

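Group by date. The number of distinct a.post_id values is the number of posts reported as spam that day, and the number of distinct b.post_id values is the number of those posts that were removed. DISTINCT is needed because Actions may contain duplicate rows.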

SELECT a.`action_date`, COUNT(DISTINCT a.`post_id`) AS spamCount, COUNT(DISTINCT b.`post_id`) AS delCount
FROM Actions a
LEFT JOIN Removals b 
ON a.post_id = b.post_id
WHERE a.extra = 'spam'
GROUP BY a.`action_date`;
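
To see why the DISTINCT counts matter, the following sketch runs a variant of the grouped query against an in-memory SQLite database through Python's `sqlite3` module. SQLite stands in for MySQL here, the `rawSpam` and `rawDel` columns are added only for comparison, and the first spam report is inserted twice on purpose to simulate a duplicate row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Actions (user_id INT, post_id INT, action_date DATE, action VARCHAR(20), extra VARCHAR(20));
CREATE TABLE Removals (post_id INT, remove_date DATE);
INSERT INTO Actions VALUES
    (2, 2, '2019-07-04', 'report', 'spam'),
    (2, 2, '2019-07-04', 'report', 'spam'),
    (3, 4, '2019-07-04', 'report', 'spam'),
    (4, 3, '2019-07-02', 'report', 'spam');
INSERT INTO Removals VALUES (2, '2019-07-20'), (3, '2019-07-18');
""")

# Plain COUNT counts joined rows; COUNT(DISTINCT ...) counts posts.
rows = conn.execute("""
    SELECT a.action_date,
           COUNT(a.post_id)          AS rawSpam,
           COUNT(DISTINCT a.post_id) AS spamCount,
           COUNT(b.post_id)          AS rawDel,
           COUNT(DISTINCT b.post_id) AS delCount
    FROM Actions a
    LEFT JOIN Removals b ON a.post_id = b.post_id
    WHERE a.extra = 'spam'
    GROUP BY a.action_date
    ORDER BY a.action_date
""").fetchall()

for row in rows:
    print(row)
```

On 2019-07-04 the plain counts see three reports and two removals, while the distinct counts give the intended two posts and one removal.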

Dividing delCount by spamCount gives the daily share of removed posts:

SELECT a.`action_date`, COUNT(DISTINCT b.`post_id`)/COUNT(DISTINCT a.`post_id`) AS daily_percent
FROM Actions a
LEFT JOIN Removals b 
ON a.post_id = b.post_id
WHERE a.extra = 'spam'
GROUP BY a.`action_date`;

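Wrap this in a subquery, multiply the ratio by 100, and average the daily percentages to obtain the average daily percentage: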

SELECT ROUND(AVG(tmp.daily_percent),2) AS average_daily_percent
FROM (SELECT a.`action_date`, COUNT(DISTINCT b.`post_id`)/COUNT(DISTINCT a.`post_id`)*100 AS daily_percent
FROM Actions a
LEFT JOIN Removals b 
ON a.post_id = b.post_id
WHERE a.extra = 'spam'
GROUP BY a.`action_date`) AS tmp;
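
One portability caveat: the query above relies on MySQL's `/` operator returning a decimal result for integer operands. Engines such as SQLite and PostgreSQL truncate integer division, so the same expression can silently give a different answer there. The sketch below illustrates this with Python's `sqlite3` module; the helper name and the `100.0 * ...` rewrite are suggestions of this example, not part of the original solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Actions (user_id INT, post_id INT, action_date DATE, action VARCHAR(20), extra VARCHAR(20));
CREATE TABLE Removals (post_id INT, remove_date DATE);
INSERT INTO Actions VALUES
    (2, 2, '2019-07-04', 'report', 'spam'),
    (3, 4, '2019-07-04', 'report', 'spam'),
    (4, 3, '2019-07-02', 'report', 'spam'),
    (5, 2, '2019-07-03', 'report', 'racism');
INSERT INTO Removals VALUES (2, '2019-07-20'), (3, '2019-07-18');
""")


def average_daily_percent(ratio_expr):
    # ratio_expr is the per-day percentage expression being compared.
    sql = f"""
        SELECT ROUND(AVG(tmp.daily_percent), 2)
        FROM (SELECT a.action_date, {ratio_expr} AS daily_percent
              FROM Actions a
              LEFT JOIN Removals b ON a.post_id = b.post_id
              WHERE a.extra = 'spam'
              GROUP BY a.action_date) AS tmp
    """
    return conn.execute(sql).fetchone()[0]


# Expression as written for MySQL: in SQLite, 1 / 2 truncates to 0 before the multiplication.
truncated = average_daily_percent(
    "COUNT(DISTINCT b.post_id) / COUNT(DISTINCT a.post_id) * 100")
# Multiplying by 100.0 first forces floating-point division.
portable = average_daily_percent(
    "100.0 * COUNT(DISTINCT b.post_id) / COUNT(DISTINCT a.post_id)")
print(truncated, portable)
```

In SQLite the first form averages 0 and 100, while the second averages 50 and 100 as intended. Putting `100.0 *` first does not change the result in MySQL, so it is a reasonable default if the query may run on more than one engine.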

Another solution

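-- The subquery returns one row per day, so SUM(...) / COUNT(DISTINCT action_date) is equivalent to AVG(...)
SELECT ROUND(SUM(delCount / spamCount * 100) / COUNT(DISTINCT action_date), 2) AS average_daily_percent
FROM (
    -- Per day: distinct posts reported as spam and, of those, distinct posts removed
    SELECT a.action_date, COUNT(DISTINCT a.post_id) AS spamCount, COUNT(DISTINCT b.post_id) AS delCount
    FROM Actions a
        LEFT JOIN Removals b ON a.post_id = b.post_id
    WHERE a.extra = 'spam'
    GROUP BY a.action_date
) a;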
