目标:需要求出每个时间段,有客人在住的房间数量。
如果只考虑一人一房,我们可以借助于同时在线人数统计的思路,入住时间加辅助标记记为1,离店时间加辅助标记记为-1,并按照时间进行顺序排序,求当前累计值,具体SQL如下:
参考文章:SQL同时在线问题分析(求最高在线人数以及最高峰时间段)【累加器思想,流式分析】--HiveSql面试题29_sql统计同时在线时间_莫叫石榴姐的博客-CSDN博客
类似于公交车上车下车场景,统计公交车某一时刻车上人数,上车加1,下车减1,最终某一时刻人数通过加1减1这种辅助标记来进行累加,累加反应的是某一时间节点处,截止当前状态情况。
with data as
(
select 7 user_id, 2004 room_num, '2021-03-05' in_time, '2021-03-07' out_time
union all
select 23 user_id, 2010 room_num, '2021-03-05' in_time, '2021-03-06' out_time
union all
select 7 user_id, 1003 room_num, '2021-03-07' in_time, '2021-03-08' out_time
union all
select 8 user_id, 2014 room_num, '2021-03-07' in_time, '2021-03-08' out_time
union all
select 14 user_id, 3001 room_num, '2021-03-07' in_time, '2021-03-10' out_time
union all
select 18 user_id, 3002 room_num, '2021-03-08' in_time, '2021-03-10' out_time
union all
select 23 user_id, 3020 room_num, '2021-03-08' in_time, '2021-03-09' out_time
union all
select 25 user_id, 2006 room_num, '2021-03-09' in_time, '2021-03-12' out_time
)
select *
from (select time as start_time, lead(time) over (order by time) as end_time, acc_cnt
from (select time, sum(num) over (order by time) as acc_cnt
from (
select in_time as time, 1 as num
from data
union all
select out_time as time, -1 as num
from data
) t
) t
group by time, acc_cnt
) t
where end_time is not null
但是这里面有个问题,实际场景是一人一房间吗?如果有多个人共住一房间,今天退了一个人,明天又退了一个人,后天的时候才退完,虽然在退,但房间还是有人住的,这种情况,是不是也算?如果考虑这种情况则需要对累加的状态进行调整,此时需要考虑每个房间中截止当前时间的人数情况。
第一步:先求出每个房间当前时间的人数累计值,作为状态判断辅助条件
增加一行数据:
union all
select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time
具体SQL如下:
with data as
(
select 7 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-07' out_time
union all
select 23 user_id,2010 room_num,'2021-03-05' in_time,'2021-03-06' out_time
union all
select 7 user_id,1003 room_num,'2021-03-07' in_time,'2021-03-08' out_time
union all
select 8 user_id,2014 room_num,'2021-03-07' in_time,'2021-03-08' out_time
union all
select 14 user_id,3001 room_num,'2021-03-07' in_time,'2021-03-10' out_time
union all
select 18 user_id,3002 room_num,'2021-03-08' in_time,'2021-03-10' out_time
union all
select 23 user_id,3020 room_num,'2021-03-08' in_time,'2021-03-09' out_time
union all
select 25 user_id,2006 room_num,'2021-03-09' in_time,'2021-03-12' out_time
union all
select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time
)
select time
, room_num
, sum(user_cnt) over (partition by room_num ORDER BY TIME) user_cnt
from (
select in_time as time, room_num, count(user_id) user_cnt
from data
group by in_time, room_num
union all
select out_time as time, room_num, -1 * count(user_id) user_cnt
from data
group by out_time, room_num
) t
第二步:基于累计的每个房间人数的状态变量进行判断:如果房间有人就标记1,没有人时候就标记为-1。标记思路实现:
方法1:case when 判断当user_cnt > 0时标记1,否则标记-1
case when user_cnt > 0 then 1 else -1 end num1
方法2:判断每个房间截止当前人数的最小值,如果大于0 标记为1,否则标记为-1
case when min(user_cnt) over (partition by room_num order by time) > 0 then 1 else -1 end
具体SQL如下:
with data as
(
select 7 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-07' out_time
union all
select 23 user_id,2010 room_num,'2021-03-05' in_time,'2021-03-06' out_time
union all
select 7 user_id,1003 room_num,'2021-03-07' in_time,'2021-03-08' out_time
union all
select 8 user_id,2014 room_num,'2021-03-07' in_time,'2021-03-08' out_time
union all
select 14 user_id,3001 room_num,'2021-03-07' in_time,'2021-03-10' out_time
union all
select 18 user_id,3002 room_num,'2021-03-08' in_time,'2021-03-10' out_time
union all
select 23 user_id,3020 room_num,'2021-03-08' in_time,'2021-03-09' out_time
union all
select 25 user_id,2006 room_num,'2021-03-09' in_time,'2021-03-12' out_time
union all
select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time
)
select time
, room_num
, user_cnt
, case when user_cnt > 0 then 1 else -1 end num1
, case when min(user_cnt) over (partition by room_num order by time) > 0 then 1 else -1 end num2
from (select time
, room_num
, sum(user_cnt) over (partition by room_num ORDER BY TIME) user_cnt
from (select in_time as time, room_num, count(user_id) user_cnt
from data
group by in_time, room_num
union all
select out_time as time, room_num, -1 * count(user_id) user_cnt
from data
group by out_time, room_num
) t
) t
第三步:
基于上述结果,累计num值(其实本质就是通过构造辅助条件对辅助标记值进行变换,累加值变了),反应截止当前时间点 有人入住的房间数量,最终SQL如下:
with data as
(
select 7 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-07' out_time
union all
select 23 user_id,2010 room_num,'2021-03-05' in_time,'2021-03-06' out_time
union all
select 7 user_id,1003 room_num,'2021-03-07' in_time,'2021-03-08' out_time
union all
select 8 user_id,2014 room_num,'2021-03-07' in_time,'2021-03-08' out_time
union all
select 14 user_id,3001 room_num,'2021-03-07' in_time,'2021-03-10' out_time
union all
select 18 user_id,3002 room_num,'2021-03-08' in_time,'2021-03-10' out_time
union all
select 23 user_id,3020 room_num,'2021-03-08' in_time,'2021-03-09' out_time
union all
select 25 user_id,2006 room_num,'2021-03-09' in_time,'2021-03-12' out_time
-- union all
-- select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time
)
select *
from
(select time as start_time, lead(time) over (order by time) as end_time, acc_cnt
from (select time
, sum(case when user_cnt > 0 then 1 else -1 end) over (order by time) acc_cnt
from (select time
, room_num
, sum(user_cnt) over (partition by room_num ORDER BY TIME) user_cnt
from (select in_time as time, room_num, count(user_id) user_cnt
from data
group by in_time, room_num
union all
select out_time as time, room_num, -1 * count(user_id) user_cnt
from data
group by out_time, room_num
) t
) t
) t
group by time, acc_cnt
) t
where end_time is not null
最终结果如下:
新增测试数据结果如下:
with data as
(
select 7 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-07' out_time
union all
select 23 user_id,2010 room_num,'2021-03-05' in_time,'2021-03-06' out_time
union all
select 7 user_id,1003 room_num,'2021-03-07' in_time,'2021-03-08' out_time
union all
select 8 user_id,2014 room_num,'2021-03-07' in_time,'2021-03-08' out_time
union all
select 14 user_id,3001 room_num,'2021-03-07' in_time,'2021-03-10' out_time
union all
select 18 user_id,3002 room_num,'2021-03-08' in_time,'2021-03-10' out_time
union all
select 23 user_id,3020 room_num,'2021-03-08' in_time,'2021-03-09' out_time
union all
select 25 user_id,2006 room_num,'2021-03-09' in_time,'2021-03-12' out_time
union all
select 100 user_id,2004 room_num,'2021-03-05' in_time,'2021-03-08' out_time
)
select *
from
(select time as start_time, lead(time) over (order by time) as end_time, acc_cnt
from (select time, sum(case when user_cnt > 0 then 1 else -1 end) over (order by time) acc_cnt
from (select time
, room_num
, sum(user_cnt) over (partition by room_num ORDER BY TIME) user_cnt
from (select in_time as time, room_num, count(user_id) user_cnt
from data
group by in_time, room_num
union all
select out_time as time, room_num, -1 * count(user_id) user_cnt
from data
group by out_time, room_num
) t
) t
) t
group by time, acc_cnt
) t
where end_time is not null
本文总结了一种当前时间点状态统计的思路和方法,对于此类问题主要采用构造辅助计数变量及累加变换思路进行求解。常见的场景有:直播同时在线人数、服务器实时并发数、公家车当前时间段人数、某个仓库的货物积压数量,某一段时间内的同时处于服务过程中的最大订单量等