Problem description
User behavior log table tb_user_log
id | uid | artical_id | in_time | out_time | sign_in
1 | 101 | 9001 | 2021-11-01 10:00:00 | 2021-11-01 10:00:11 | 0
2 | 102 | 9001 | 2021-11-01 10:00:09 | 2021-11-01 10:00:38 | 0
3 | 103 | 9001 | 2021-11-01 10:00:28 | 2021-11-01 10:00:58 | 0
4 | 104 | 9002 | 2021-11-01 11:00:45 | 2021-11-01 11:01:11 | 0
5 | 105 | 9001 | 2021-11-01 10:00:51 | 2021-11-01 10:00:59 | 0
6 | 106 | 9002 | 2021-11-01 11:00:55 | 2021-11-01 11:01:24 | 0
7 | 107 | 9001 | 2021-11-01 10:00:01 | 2021-11-01 10:01:50 | 0
(uid: user ID, artical_id: article ID, in_time: entry time, out_time: exit time, sign_in: whether the user checked in)
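Scenario notes: artical_id is the ID of the article the user is viewing; artical_id = 0 means the user is on a non-article page (for example, a list page or an event page in the app).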
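Question: for each article, compute the maximum number of users viewing it at the same moment. If some users enter and others leave at the same moment, count the increase before the decrease. Sort the result by the maximum count in descending order.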
Sample output (for the sample data above):
artical_id | max_uv
9001 | 3
9002 | 2
Explanation: at 10:00:10, 3 users were viewing article 9001; at 11:01:00, 2 users were viewing article 9002.
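The explanation can be double-checked by brute force: count the users whose viewing interval contains the given moment. This is an illustrative sketch, not part of the original solution; treating both ends of [in_time, out_time] as inclusive is an assumption, and viewers_at is a hypothetical helper name.

```python
from datetime import datetime

# Sample rows from tb_user_log: (uid, artical_id, in_time, out_time)
LOGS = [
    (101, 9001, "2021-11-01 10:00:00", "2021-11-01 10:00:11"),
    (102, 9001, "2021-11-01 10:00:09", "2021-11-01 10:00:38"),
    (103, 9001, "2021-11-01 10:00:28", "2021-11-01 10:00:58"),
    (104, 9002, "2021-11-01 11:00:45", "2021-11-01 11:01:11"),
    (105, 9001, "2021-11-01 10:00:51", "2021-11-01 10:00:59"),
    (106, 9002, "2021-11-01 11:00:55", "2021-11-01 11:01:24"),
    (107, 9001, "2021-11-01 10:00:01", "2021-11-01 10:01:50"),
]


def parse(ts: str) -> datetime:
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def viewers_at(logs, article_id: int, ts: str) -> int:
    """Count users viewing article_id at moment ts (interval ends inclusive)."""
    t = parse(ts)
    return sum(
        1
        for _, art, t_in, t_out in logs
        if art == article_id and parse(t_in) <= t <= parse(t_out)
    )


print(viewers_at(LOGS, 9001, "2021-11-01 10:00:10"))  # 3
print(viewers_at(LOGS, 9002, "2021-11-01 11:01:00"))  # 2
```

Brute force is fine for spot checks but costs one pass per timestamp; the +1/-1 approach below finds the maximum in a single sorted pass.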
Solution approach
1. Tag each entry time as 1 and each exit time as -1, build the two sets separately, and combine them with UNION ALL.
-- put in_time and out_time into one column dt and tag them with num: 1 = entry, -1 = exit
with t1 as (
select artical_id, in_time dt,1 num from tb_user_log
where artical_id != 0
union all
select artical_id, out_time dt, -1 num from tb_user_log
where artical_id != 0
)
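To make the contents of t1 concrete, the CTE can be mirrored in plain Python. This is a sketch under assumptions: to_events is a hypothetical helper, and the uid 999 row with artical_id 0 is made-up data added only to show the artical_id != 0 filter.

```python
# Sample rows from tb_user_log: (uid, artical_id, in_time, out_time).
# The uid 999 row (artical_id 0) is HYPOTHETICAL, added to show the filter.
LOGS = [
    (101, 9001, "2021-11-01 10:00:00", "2021-11-01 10:00:11"),
    (102, 9001, "2021-11-01 10:00:09", "2021-11-01 10:00:38"),
    (103, 9001, "2021-11-01 10:00:28", "2021-11-01 10:00:58"),
    (104, 9002, "2021-11-01 11:00:45", "2021-11-01 11:01:11"),
    (105, 9001, "2021-11-01 10:00:51", "2021-11-01 10:00:59"),
    (106, 9002, "2021-11-01 11:00:55", "2021-11-01 11:01:24"),
    (107, 9001, "2021-11-01 10:00:01", "2021-11-01 10:01:50"),
    (999, 0, "2021-11-01 10:00:05", "2021-11-01 10:00:20"),
]


def to_events(logs):
    """Mirror of CTE t1: one +1 row per entry and one -1 row per exit,
    skipping non-article pages (artical_id = 0)."""
    events = []
    for _, art, t_in, t_out in logs:
        if art == 0:
            continue
        events.append((art, t_in, 1))
        events.append((art, t_out, -1))
    return events


events = to_events(LOGS)
print(len(events))  # 14
```

Every viewing session contributes exactly one +1 and one -1, so the nums in t1 always sum to zero.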
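2. Use the window function sum(num) over(...) to compute a running total per article; after each event it equals the number of users viewing the article at that moment. Within the same dt, ordering by num desc processes entries (1) before exits (-1), as the question requires.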
select artical_id,
sum(num) over(partition by artical_id order by dt asc,num desc) as cnt
from
t1
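The running total and the num desc tie-break can be emulated in Python to see why the ordering matters. This is a sketch: running_counts is a hypothetical helper, and the tie data (one user leaving at 10:05 while another enters at 10:05) is made up for illustration.

```python
from itertools import groupby


def running_counts(events, entries_first=True):
    """Emulate sum(num) over(partition by artical_id order by dt asc, num desc).

    events: (artical_id, dt, num) tuples with num = 1 or -1.
    entries_first=False flips the tie-break to num asc, for comparison.
    Returns (artical_id, dt, num, cnt) tuples in window order.
    """
    sign = -1 if entries_first else 1
    ordered = sorted(events, key=lambda e: (e[0], e[1], sign * e[2]))
    rows = []
    for art, group in groupby(ordered, key=lambda e: e[0]):
        cnt = 0
        for _, dt, num in group:
            cnt += num  # running total = users currently viewing
            rows.append((art, dt, num, cnt))
    return rows


# Hypothetical data: one user leaves at 10:05 while another enters at 10:05.
tie = [(1, "10:00", 1), (1, "10:05", -1), (1, "10:05", 1), (1, "10:09", -1)]
print(max(r[3] for r in running_counts(tie)))                       # 2
print(max(r[3] for r in running_counts(tie, entries_first=False)))  # 1
```

Unlike MySQL's default RANGE frame, which gives rows with the same (dt, num) the same total, this loop gives each tied row its own partial sum; the per-article maximum comes out the same.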
3. Group by artical_id, take the maximum cnt in each group, and sort by max_uv in descending order.
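-- runs together with the WITH t1 AS (...) clause from step 1
select artical_id, max(cnt) max_uv from
(
select artical_id,
sum(num) over(partition by artical_id order by dt asc,num desc) as cnt
from
t1
) t2  -- MySQL requires an alias on a derived table
group by artical_id
order by max_uv desc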
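A quick way to sanity-check the complete query is to run it against the sample rows. The sketch below uses Python's built-in sqlite3 module rather than MySQL, which is an assumption of this example (SQLite 3.25 or newer is needed for window functions); the table definition and the helper name max_uv are illustrative.

```python
import sqlite3

# tb_user_log sample rows: (id, uid, artical_id, in_time, out_time, sign_in)
ROWS = [
    (1, 101, 9001, "2021-11-01 10:00:00", "2021-11-01 10:00:11", 0),
    (2, 102, 9001, "2021-11-01 10:00:09", "2021-11-01 10:00:38", 0),
    (3, 103, 9001, "2021-11-01 10:00:28", "2021-11-01 10:00:58", 0),
    (4, 104, 9002, "2021-11-01 11:00:45", "2021-11-01 11:01:11", 0),
    (5, 105, 9001, "2021-11-01 10:00:51", "2021-11-01 10:00:59", 0),
    (6, 106, 9002, "2021-11-01 11:00:55", "2021-11-01 11:01:24", 0),
    (7, 107, 9001, "2021-11-01 10:00:01", "2021-11-01 10:01:50", 0),
]

QUERY = """
with t1 as (
    select artical_id, in_time as dt, 1 as num from tb_user_log where artical_id != 0
    union all
    select artical_id, out_time as dt, -1 as num from tb_user_log where artical_id != 0
)
select artical_id, max(cnt) as max_uv
from (
    select artical_id,
           sum(num) over (partition by artical_id order by dt asc, num desc) as cnt
    from t1
) t2
group by artical_id
order by max_uv desc
"""


def max_uv(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table tb_user_log (id integer, uid integer, artical_id integer,"
            " in_time text, out_time text, sign_in integer)"
        )
        conn.executemany("insert into tb_user_log values (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(max_uv(ROWS))  # [(9001, 3), (9002, 2)]
```

Storing the timestamps as 'YYYY-MM-DD HH:MM:SS' text works here because that format sorts chronologically as a string.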
Problem description
User ride-request record table tb_get_car_record
id | uid | city | event_time | end_time | order_id
1 | 108 | 北京 | 2021-10-20 08:00:00 | 2021-10-20 08:00:40 | 9008
2 | 118 | 北京 | 2021-10-20 08:00:10 | 2021-10-20 08:00:45 | 9018
3 | 102 | 北京 | 2021-10-20 08:00:30 | 2021-10-20 08:00:50 | 9002
4 | 106 | 北京 | 2021-10-20 08:05:41 | 2021-10-20 08:06:00 | 9006
5 | 103 | 北京 | 2021-10-20 08:05:50 | 2021-10-20 08:07:10 | 9003
6 | 104 | 北京 | 2021-10-20 08:01:01 | 2021-10-20 08:01:20 | 9004
7 | 105 | 北京 | 2021-10-20 08:01:15 | 2021-10-20 08:01:30 | 9019
8 | 101 | 北京 | 2021-10-20 08:28:10 | 2021-10-20 08:30:00 | 9011
(uid: user ID, city: city, event_time: time the ride was requested, end_time: time the request ended, order_id: order number)
Ride order table tb_get_car_order
id | order_id | uid | driver_id | order_time | start_time | finish_time | mileage | fare | grade
1 | 9008 | 108 | 204 | 2021-10-20 08:00:40 | 2021-10-20 08:03:00 | 2021-10-20 08:31:00 | 13.2 | 38 | 4
2 | 9018 | 108 | 214 | 2021-10-20 08:00:45 | 2021-10-20 08:04:50 | 2021-10-20 08:21:00 | 14 | 38 | 5
3 | 9002 | 102 | 202 | 2021-10-20 08:00:50 | 2021-10-20 08:06:00 | 2021-10-20 08:31:00 | 10 | 41.5 | 5
4 | 9006 | 106 | 206 | 2021-10-20 08:06:00 | 2021-10-20 08:09:00 | 2021-10-20 08:31:00 | 8 | 25.5 | 4
5 | 9003 | 103 | 203 | 2021-10-20 08:07:10 | 2021-10-20 08:15:00 | 2021-10-20 08:31:00 | 11 | 41.5 | 4
6 | 9004 | 104 | 204 | 2021-10-20 08:01:20 | 2021-10-20 08:13:00 | 2021-10-20 08:31:00 | 7.5 | 22 | 4
7 | 9019 | 105 | 205 | 2021-10-20 08:01:30 | 2021-10-20 08:11:00 | 2021-10-20 08:51:00 | 10 | 39 | 4
8 | 9011 | 101 | 211 | 2021-10-20 08:30:00 | 2021-10-20 08:31:00 | 2021-10-20 08:54:00 | 10 | 35 | 5
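(order_id: order number, uid: user ID, driver_id: driver ID, order_time: time the driver accepted the order, start_time: boarding time, when billing starts, finish_time: order completion time, mileage: distance driven, fare: fare, grade: rating)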
Scenario notes:
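Question: for each city, compute the maximum number of users waiting for a car at the same time within a single day during October 2021.
Note: "waiting" is the user's state from the moment the ride is requested until the request is cancelled, the wait is cancelled, or the user boards the car.
If at the same moment one user stops waiting and another starts, count the increase before the decrease.
Sort the result by each city's maximum waiting count in ascending order, breaking ties by city in ascending order.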
Solution approach
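Back to the core idea: the maximum number of users waiting at the same time is the same problem as the maximum number of users online at the same time.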
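1. Define the state. Waiting time runs from the ride request until the request is cancelled, the wait is cancelled, or the user boards.
Waiting start: event_time
Waiting end, three cases: request cancelled | wait cancelled | user boards.
These correspond to order_id is null (use end_time), start_time is null (use finish_time), and start_time is not null (use start_time).
Classify them with CASE WHEN and build a temporary table (CTE). Use a LEFT JOIN from the record table to the order table: an inner join drops requests that never got an order (order_id is null), so the first CASE branch could never match.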
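with t1 as(
select tcr.uid, city, event_time,
(case when tcr.order_id is null then end_time  -- request cancelled before any driver accepted
when start_time is null then finish_time       -- wait cancelled after acceptance, before boarding
else start_time end) as e_time                 -- user boarded: waiting ends at start_time
from tb_get_car_record tcr
left join tb_get_car_order tco                 -- keep requests that have no order
on tcr.order_id = tco.order_id
where date(event_time) between '2021-10-01' and '2021-10-31'
)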
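The CASE WHEN branches can be read as a small Python function. wait_end is a hypothetical name; the first two calls use made-up values (the sample data contains no cancelled requests), while the third uses order 9008 from the sample tables.

```python
from typing import Optional


def wait_end(order_id: Optional[int], start_time: Optional[str],
             finish_time: Optional[str], end_time: Optional[str]) -> Optional[str]:
    """Mirror of the CASE WHEN that yields e_time (the end of waiting)."""
    if order_id is None:    # no order was created: the request was cancelled
        return end_time
    if start_time is None:  # order exists but nobody boarded: wait cancelled
        return finish_time
    return start_time       # the user boarded: waiting ends at start_time


# Hypothetical values for the first two branches; the third is order 9008.
print(wait_end(None, None, None, "2021-10-20 08:05:00"))
print(wait_end(9999, None, "2021-10-20 08:07:00", "2021-10-20 08:02:00"))
print(wait_end(9008, "2021-10-20 08:03:00", "2021-10-20 08:31:00", "2021-10-20 08:00:40"))
```

The branch order matters in the same way as in SQL: the start_time check is only reached once a matching order is known to exist.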
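2. With the temporary table in place, the rest works like counting concurrent online users: tag each waiting start as 1 and each waiting end as -1, then combine the two sets with UNION ALL.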
SELECT city,event_time AS dt,1 AS num FROM t1
UNION ALL
SELECT city,e_time AS dt,- 1 AS num FROM t1
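3. Partition by city and date, and order by time ascending and, for equal times, by num descending, so that increases are counted before decreases.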
SELECT city,
sum( num ) over ( PARTITION BY city,date(dt) order by dt, num desc ) total_num
FROM
(
SELECT
city,
event_time AS dt,
1 AS num
FROM
t1 UNION ALL
SELECT
city,
e_time AS dt,
- 1 AS num
FROM
t1
) t2
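The partitioned running sum above can be sketched in plain Python as a sweep over start/end events keyed by (city, day). The sketch assumes the waiting intervals have already been derived from the sample tables; every sample order has a start_time, so e_time = start_time throughout. max_per_city_day is a hypothetical helper.

```python
from collections import defaultdict

# Waiting intervals (city, event_time, e_time) derived from the sample tables.
WAITS = [
    ("北京", "2021-10-20 08:00:00", "2021-10-20 08:03:00"),
    ("北京", "2021-10-20 08:00:10", "2021-10-20 08:04:50"),
    ("北京", "2021-10-20 08:00:30", "2021-10-20 08:06:00"),
    ("北京", "2021-10-20 08:05:41", "2021-10-20 08:09:00"),
    ("北京", "2021-10-20 08:05:50", "2021-10-20 08:15:00"),
    ("北京", "2021-10-20 08:01:01", "2021-10-20 08:13:00"),
    ("北京", "2021-10-20 08:01:15", "2021-10-20 08:11:00"),
    ("北京", "2021-10-20 08:28:10", "2021-10-20 08:31:00"),
]


def max_per_city_day(waits):
    """Sweep over +1/-1 events, partitioned by (city, date(dt))."""
    events = []
    for city, start, end in waits:
        events.append((city, start, 1))   # waiting starts
        events.append((city, end, -1))    # waiting ends
    # Partition key first, then dt ascending, then num descending (+1 before -1).
    events.sort(key=lambda e: (e[0], e[1][:10], e[1], -e[2]))
    running = defaultdict(int)
    best = {}
    for city, dt, num in events:
        key = (city, dt[:10])             # date(dt) = first 10 characters
        running[key] += num
        best[key] = max(best.get(key, running[key]), running[key])
    return best


print(max_per_city_day(WAITS))  # {('北京', '2021-10-20'): 5}
```

Step 4 then only has to take the maximum of these per-day values for each city.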
4. Group by city and take the maximum running total, which is the maximum number of users waiting at the same time.
SELECT
city,
max( total_num ) max_wait_uv
FROM
(
SELECT
city,
sum( num ) over ( PARTITION BY city,date(dt) order by dt, num desc ) total_num
FROM
(
SELECT
city,
event_time AS dt,
1 AS num
FROM
t1 UNION ALL
SELECT
city,
e_time AS dt,
- 1 AS num
FROM
t1
) t2
) t3
GROUP BY
city
ORDER BY
max_wait_uv,
city
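The whole pipeline, including the LEFT JOIN, can be checked end to end with Python's sqlite3 module. Running it on SQLite instead of MySQL is an assumption of this sketch (SQLite 3.25+ is needed for window functions), tb_get_car_order is trimmed to the three columns the query reads, and max_wait plus the uid 120 request with no order are hypothetical, added to show what an inner join would lose.

```python
import sqlite3

# tb_get_car_record sample rows: (id, uid, city, event_time, end_time, order_id)
RECORDS = [
    (1, 108, "北京", "2021-10-20 08:00:00", "2021-10-20 08:00:40", 9008),
    (2, 118, "北京", "2021-10-20 08:00:10", "2021-10-20 08:00:45", 9018),
    (3, 102, "北京", "2021-10-20 08:00:30", "2021-10-20 08:00:50", 9002),
    (4, 106, "北京", "2021-10-20 08:05:41", "2021-10-20 08:06:00", 9006),
    (5, 103, "北京", "2021-10-20 08:05:50", "2021-10-20 08:07:10", 9003),
    (6, 104, "北京", "2021-10-20 08:01:01", "2021-10-20 08:01:20", 9004),
    (7, 105, "北京", "2021-10-20 08:01:15", "2021-10-20 08:01:30", 9019),
    (8, 101, "北京", "2021-10-20 08:28:10", "2021-10-20 08:30:00", 9011),
]
# tb_get_car_order trimmed to (order_id, start_time, finish_time)
ORDERS = [
    (9008, "2021-10-20 08:03:00", "2021-10-20 08:31:00"),
    (9018, "2021-10-20 08:04:50", "2021-10-20 08:21:00"),
    (9002, "2021-10-20 08:06:00", "2021-10-20 08:31:00"),
    (9006, "2021-10-20 08:09:00", "2021-10-20 08:31:00"),
    (9003, "2021-10-20 08:15:00", "2021-10-20 08:31:00"),
    (9004, "2021-10-20 08:13:00", "2021-10-20 08:31:00"),
    (9019, "2021-10-20 08:11:00", "2021-10-20 08:51:00"),
    (9011, "2021-10-20 08:31:00", "2021-10-20 08:54:00"),
]

QUERY = """
with t1 as (
    select tcr.uid, city, event_time,
           case when tcr.order_id is null then end_time
                when start_time is null then finish_time
                else start_time end as e_time
    from tb_get_car_record tcr
    {join} join tb_get_car_order tco on tcr.order_id = tco.order_id
    where date(event_time) between '2021-10-01' and '2021-10-31'
)
select city, max(total_num) as max_wait_uv
from (
    select city,
           sum(num) over (partition by city, date(dt)
                          order by dt, num desc) as total_num
    from (
        select city, event_time as dt, 1 as num from t1
        union all
        select city, e_time as dt, -1 as num from t1
    ) t2
) t3
group by city
order by max_wait_uv, city
"""


def max_wait(records, join="left"):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table tb_get_car_record (id integer, uid integer,"
                     " city text, event_time text, end_time text, order_id integer)")
        conn.execute("create table tb_get_car_order (order_id integer,"
                     " start_time text, finish_time text)")
        conn.executemany("insert into tb_get_car_record values (?, ?, ?, ?, ?, ?)", records)
        conn.executemany("insert into tb_get_car_order values (?, ?, ?)", ORDERS)
        return conn.execute(QUERY.format(join=join)).fetchall()
    finally:
        conn.close()


print(max_wait(RECORDS))  # [('北京', 5)]

# HYPOTHETICAL request that no driver accepted (order_id is NULL):
unaccepted = (9, 120, "北京", "2021-10-20 08:02:00", "2021-10-20 08:04:00", None)
print(max_wait(RECORDS + [unaccepted], join="left"))   # [('北京', 6)]
print(max_wait(RECORDS + [unaccepted], join="inner"))  # [('北京', 5)]
```

With the inner join the unaccepted request silently disappears, which is why the CTE uses a left join.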
Problem description
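A Nowcoder page has launched an introduction to a series of live data-analysis courses. Users can sign up for any one or more of the live courses.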
The course table course_tb is as follows (course_id is the course number, course_name the course name, and course_datetime the class time):
course_id | course_name | course_datetime
1 | Python | 2021-12-1 19:00-21:00
2 | SQL | 2021-12-2 19:00-21:00
3 | R | 2021-12-3 19:00-21:00
The attendance table attend_tb is as follows (user_id is the user number, course_id the course number, in_datetime the time the user entered the live room, and out_datetime the time the user left it):
user_id | course_id | in_datetime | out_datetime
100 | 1 | 2021-12-01 19:00:00 | 2021-12-01 19:28:00
100 | 1 | 2021-12-01 19:30:00 | 2021-12-01 19:53:00
101 | 1 | 2021-12-01 19:00:00 | 2021-12-01 20:55:00
102 | 1 | 2021-12-01 19:00:00 | 2021-12-01 19:05:00
104 | 1 | 2021-12-01 19:00:00 | 2021-12-01 20:59:00
101 | 2 | 2021-12-02 19:05:00 | 2021-12-02 20:58:00
102 | 2 | 2021-12-02 18:55:00 | 2021-12-02 21:00:00
104 | 2 | 2021-12-02 18:57:00 | 2021-12-02 20:56:00
107 | 2 | 2021-12-02 19:10:00 | 2021-12-02 19:18:00
100 | 3 | 2021-12-03 19:01:00 | 2021-12-03 21:00:00
102 | 3 | 2021-12-03 18:58:00 | 2021-12-03 19:05:00
108 | 3 | 2021-12-03 19:01:00 | 2021-12-03 19:56:00
Compute the maximum number of users online at the same time for each course (sorted by course_id). For the data above, the output is:
course_id | course_name | max_num
1 | Python | 4
2 | SQL | 4
3 | R | 3
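The expected output can be cross-checked with a plain-Python sweep: +1 at each in_datetime, -1 at each out_datetime, and a running total per course. This is an illustrative sketch; max_online_by_course is a hypothetical helper, and putting +1 before -1 at equal timestamps follows the tie rule of the earlier problems (this problem does not state one, and the sample data has no such ties within a course).

```python
from collections import defaultdict

# attend_tb sample rows: (user_id, course_id, in_datetime, out_datetime)
ATTEND = [
    (100, 1, "2021-12-01 19:00:00", "2021-12-01 19:28:00"),
    (100, 1, "2021-12-01 19:30:00", "2021-12-01 19:53:00"),
    (101, 1, "2021-12-01 19:00:00", "2021-12-01 20:55:00"),
    (102, 1, "2021-12-01 19:00:00", "2021-12-01 19:05:00"),
    (104, 1, "2021-12-01 19:00:00", "2021-12-01 20:59:00"),
    (101, 2, "2021-12-02 19:05:00", "2021-12-02 20:58:00"),
    (102, 2, "2021-12-02 18:55:00", "2021-12-02 21:00:00"),
    (104, 2, "2021-12-02 18:57:00", "2021-12-02 20:56:00"),
    (107, 2, "2021-12-02 19:10:00", "2021-12-02 19:18:00"),
    (100, 3, "2021-12-03 19:01:00", "2021-12-03 21:00:00"),
    (102, 3, "2021-12-03 18:58:00", "2021-12-03 19:05:00"),
    (108, 3, "2021-12-03 19:01:00", "2021-12-03 19:56:00"),
]


def max_online_by_course(attend):
    """+1 at in_datetime, -1 at out_datetime, running total per course."""
    events = []
    for _, course, t_in, t_out in attend:
        events.append((course, t_in, 1))
        events.append((course, t_out, -1))
    # Per course, order by time; at equal times +1 sorts before -1.
    events.sort(key=lambda e: (e[0], e[1], -e[2]))
    running = defaultdict(int)
    best = {}
    for course, _, num in events:
        running[course] += num
        best[course] = max(best.get(course, 0), running[course])
    return best


print(max_online_by_course(ATTEND))  # {1: 4, 2: 4, 3: 3}
```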
Solution approach
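1. Take in_datetime and out_datetime as the start and end times and tag them 1 and -1 respectively.
2. Apply the window function sum() to num, partitioned by course_id and ordered by dt asc, num desc, to get the running count of users online.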
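3. Join the course table with the attendance result, either as a Cartesian product filtered on course_id or by left-joining the attendance result to the course table, then group by course_id and course_name.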
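Note: grouping by course_id alone raises the error below, meaning MySQL (with ONLY_FULL_GROUP_BY enabled) cannot establish that course_name is functionally dependent on course_id: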
contains nonaggregated column 'c.course_name' which is not functionally dependent on columns in GROUP BY clause;
select
c.course_id,
c.course_name,
max(cnt) max_num
from(
-- running count of online users per course; +1 sorts before -1 at the same dt
select course_id,sum(num) over (partition by course_id order by dt asc,num desc) cnt
from(
select course_id,in_datetime dt,1 num from attend_tb    -- entering the live room
union all
select course_id,out_datetime dt,-1 num from attend_tb  -- leaving the live room
) as a
) as b
inner join course_tb c on c.course_id = b.course_id
group by c.course_id,c.course_name
order by c.course_id
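To check the corrected query end to end, it can be run against the sample tables with Python's sqlite3 module. Using SQLite instead of MySQL is an assumption of this sketch: SQLite 3.25+ is needed for window functions, and SQLite does not enforce ONLY_FULL_GROUP_BY, so it will not reproduce the error above. max_num_per_course is a hypothetical helper.

```python
import sqlite3

COURSES = [
    (1, "Python", "2021-12-1 19:00-21:00"),
    (2, "SQL", "2021-12-2 19:00-21:00"),
    (3, "R", "2021-12-3 19:00-21:00"),
]
# attend_tb sample rows: (user_id, course_id, in_datetime, out_datetime)
ATTEND = [
    (100, 1, "2021-12-01 19:00:00", "2021-12-01 19:28:00"),
    (100, 1, "2021-12-01 19:30:00", "2021-12-01 19:53:00"),
    (101, 1, "2021-12-01 19:00:00", "2021-12-01 20:55:00"),
    (102, 1, "2021-12-01 19:00:00", "2021-12-01 19:05:00"),
    (104, 1, "2021-12-01 19:00:00", "2021-12-01 20:59:00"),
    (101, 2, "2021-12-02 19:05:00", "2021-12-02 20:58:00"),
    (102, 2, "2021-12-02 18:55:00", "2021-12-02 21:00:00"),
    (104, 2, "2021-12-02 18:57:00", "2021-12-02 20:56:00"),
    (107, 2, "2021-12-02 19:10:00", "2021-12-02 19:18:00"),
    (100, 3, "2021-12-03 19:01:00", "2021-12-03 21:00:00"),
    (102, 3, "2021-12-03 18:58:00", "2021-12-03 19:05:00"),
    (108, 3, "2021-12-03 19:01:00", "2021-12-03 19:56:00"),
]

QUERY = """
select c.course_id, c.course_name, max(cnt) as max_num
from (
    select course_id,
           sum(num) over (partition by course_id order by dt asc, num desc) as cnt
    from (
        select course_id, in_datetime as dt, 1 as num from attend_tb
        union all
        select course_id, out_datetime as dt, -1 as num from attend_tb
    ) as a
) as b
inner join course_tb c on c.course_id = b.course_id
group by c.course_id, c.course_name
order by c.course_id
"""


def max_num_per_course():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table course_tb (course_id integer primary key,"
                     " course_name text, course_datetime text)")
        conn.execute("create table attend_tb (user_id integer, course_id integer,"
                     " in_datetime text, out_datetime text)")
        conn.executemany("insert into course_tb values (?, ?, ?)", COURSES)
        conn.executemany("insert into attend_tb values (?, ?, ?, ?)", ATTEND)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(max_num_per_course())  # [(1, 'Python', 4), (2, 'SQL', 4), (3, 'R', 3)]
```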