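Hive SQL Problems from Real Work Scenarios (Self-Devised)

Hive SQL practice problems

In my spare time I put together these SQL problems from result shapes that come up often in day-to-day business work (you can adapt them to your own business scenarios).

If you have a better way to solve any of them, feel free to share it in the comments.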

Question 1

Problem

Source data:
time,t1,t2,t3
2021-07-01 00:01:01,1,4,1
2021-07-01 00:01:03,1,5,1
2021-07-01 00:01:11,1,6,1
2021-07-01 00:01:13,1,7,0
2021-07-01 00:01:23,1,8,0
2021-07-01 00:01:24,1,9,1
2021-07-01 00:02:24,1,10,1

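Desired result (each run of consecutive rows with the same t3 collapsed into one row: first time, sum of t1, first t2, t3, start time, end time; the end time is the start of the next run, or the run's own last time for the final run):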
2021-07-01 00:01:01,3,4,1,2021-07-01 00:01:01,2021-07-01 00:01:13
2021-07-01 00:01:13,2,7,0,2021-07-01 00:01:13,2021-07-01 00:01:24
2021-07-01 00:01:24,2,9,1,2021-07-01 00:01:24,2021-07-01 00:02:24

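Answer

Approach 1:
with tmp1 as (
 -- number the rows inside each t3 value, ordered by time
 select
  `time`,
  t1, t2, t3,
  row_number() over (partition by t3 order by `time`) as rn
 from demo10
), tmp2 as (
 -- t2 increases by 1 per row in this data, so t2 - rn is constant within a run of equal t3
 select
  `time`,
  t1, t2, t3, (t2 - rn) as groupId
 from tmp1
), tmp3 as (
 select
  min(`time`) as `time`,
  max(`time`) as max_time,
  min(t1) as tmpScheme,
  sum(t1) as t1,
  min(t2) as t2,
  min(t3) as t3
 from tmp2 group by t3, groupId
), tmp4 as (
 -- end_time is the next run's start; the last run falls back to its own last time
 select
  `time`, t1, t2, t3,
  `time` as start_time,
  lead(`time`, 1, max_time) over (partition by tmpScheme order by `time`) as end_time
 from tmp3
)
select * from tmp4 order by `time`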
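To see why t2 - row_number() works as a group key in Approach 1, the computation can be traced outside Hive. The following is a minimal Python sketch, not part of the original answer. The helper name group_keys is hypothetical, and the sketch assumes, as the sample data does, that t2 increases by 1 on every row.

```python
from collections import defaultdict

# (time, t1, t2, t3) rows copied from the sample data
rows = [
    ("2021-07-01 00:01:01", 1, 4, 1),
    ("2021-07-01 00:01:03", 1, 5, 1),
    ("2021-07-01 00:01:11", 1, 6, 1),
    ("2021-07-01 00:01:13", 1, 7, 0),
    ("2021-07-01 00:01:23", 1, 8, 0),
    ("2021-07-01 00:01:24", 1, 9, 1),
    ("2021-07-01 00:02:24", 1, 10, 1),
]


def group_keys(rows):
    """Return t2 - row_number() over (partition by t3 order by time) per row."""
    counters = defaultdict(int)  # per-t3 counter, playing the role of row_number()
    keys = []
    # ISO-style timestamps sort correctly as plain strings
    for time, t1, t2, t3 in sorted(rows):
        counters[t3] += 1
        keys.append(t2 - counters[t3])
    return keys


print(group_keys(rows))
```

Rows that belong to the same run end up with the same key, which is what the later group by relies on. The key is only unique per t3 value, so grouping on t3 together with the key is the safer choice.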

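Approach 2:
with tmp1 as (
 -- running sum of t3 per t1, ordered by time
 select
  `time`,
  t1, t2, t3, (t2 - t3) as diffNum,
  sum(t3) over (partition by t1 order by `time`) as sumOver
 from demo10
), tmp2 as (
 -- t3 = 1: t2 - 1 - running sum is constant within a run (t2 grows by 1 per row)
 -- t3 = 0: the running sum does not move inside the run, so -sumOver is the key
 select
  `time`, t1, t2, t3, diffNum, sumOver,
  if(t3 != 0, diffNum - sumOver, -sumOver) as groupId
 from tmp1
), tmp3 as (
 select
  min(`time`) as `time`,
  max(`time`) as max_time,
  min(t1) as stationid,
  sum(t1) as t1,
  min(t2) as t2,
  min(t3) as t3
 from tmp2 group by t3, groupId
), tmp4 as (
 -- endTime is the next run's start; the last run falls back to its own last time
 select
  `time`, t1, t2, t3,
  `time` as startTime,
  lead(`time`, 1, max_time) over (partition by stationid order by `time`) as endTime
 from tmp3
)
select * from tmp4 order by startTime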
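As a cross-check on the desired result, the same "collapse consecutive runs" logic can be written directly in Python. This is only an illustrative sketch under stated assumptions: the function name collapse_runs is hypothetical, the data holds a single t1 value (so the partition by is ignored), and the final run takes its own last timestamp as end time, as the desired result shows.

```python
from itertools import groupby

# (time, t1, t2, t3) rows copied from the sample data
rows = [
    ("2021-07-01 00:01:01", 1, 4, 1),
    ("2021-07-01 00:01:03", 1, 5, 1),
    ("2021-07-01 00:01:11", 1, 6, 1),
    ("2021-07-01 00:01:13", 1, 7, 0),
    ("2021-07-01 00:01:23", 1, 8, 0),
    ("2021-07-01 00:01:24", 1, 9, 1),
    ("2021-07-01 00:02:24", 1, 10, 1),
]


def collapse_runs(rows):
    """Collapse consecutive rows that share the same t3 into one summary row."""
    ordered = sorted(rows, key=lambda r: r[0])
    # groupby only merges adjacent items, which is exactly the "run" semantics
    runs = [list(group) for _, group in groupby(ordered, key=lambda r: r[3])]
    result = []
    for i, run in enumerate(runs):
        start = run[0][0]
        # end time: start of the next run, or this run's last time if it is the final run
        end = runs[i + 1][0][0] if i + 1 < len(runs) else run[-1][0]
        result.append((
            start,
            sum(r[1] for r in run),  # sum(t1)
            min(r[2] for r in run),  # min(t2)
            run[0][3],               # t3 is constant within a run
            start,
            end,
        ))
    return result


for line in collapse_runs(rows):
    print(line)
```

Comparing this output with what a query returns is a quick way to spot an off-by-one in the group key or in the end time of the last run.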

Question 2

Problem

Source data (语文 = Chinese, 数学 = Math, 物理 = Physics):
name,course,score
张三,语文,74
张三,数学,83
张三,物理,93
李四,语文,74
李四,数学,84
李四,物理,94

Desired result
name,chinese,math,physics,total,average
李四,74,84,94,252,84.00
张三,74,83,93,250,83.33

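Answer

Approach 1
select
 `name`,
 sum(if(course = '语文', score, 0)) as chinese,
 sum(if(course = '数学', score, 0)) as math,
 sum(if(course = '物理', score, 0)) as physics,
 sum(score) as total_score,
 cast(avg(score) as decimal(10,2)) as avg_score  -- two decimals, as in the desired result
from questionV2
group by `name`
order by total_score desc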

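Approach 2
-- Caveat: collect_list does not guarantee element order, so the [0]/[1]/[2] lookups
-- assume each student's rows arrive in the order 语文, 数学, 物理.
with tmp1 as (
select
 `name`,
 collect_list(cast(`score` as string))[0] as `chinese`,
 collect_list(cast(`score` as string))[1] as `math`,
 collect_list(cast(`score` as string))[2] as `physics`,
 sum(`score`) as total_score,
 avg(`score`) as avg_score
from questionV2 group by `name`
)
select * from tmp1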

Approach 3
with tmp1 as (
 -- build a course -> score map per student; concat_ws needs string arguments, hence the cast
 select
  `name`,
  str_to_map(concat_ws(',', collect_set(concat_ws(':', course, cast(score as string)))), ',', ':') as `courseList`,
  sum(score) as total_score,
  avg(score) as avg_score
 from
  questionV2
 group by `name`
), tmp2 as (
 select
  `name`,
  courseList['语文'] as `chinese`,
  courseList['数学'] as `math`,
  courseList['物理'] as `physics`,
  total_score, avg_score
 from tmp1
)
select * from tmp2
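The row-to-column pivot behind all three approaches can be checked with a few lines of Python. This is a hedged sketch rather than a Hive replacement: pivot_scores is a hypothetical helper, the average is rounded to two decimals, and the rows are sorted by total score in descending order to match the desired result.

```python
from collections import defaultdict

# (name, course, score) rows copied from the sample data
scores = [
    ("张三", "语文", 74),
    ("张三", "数学", 83),
    ("张三", "物理", 93),
    ("李四", "语文", 74),
    ("李四", "数学", 84),
    ("李四", "物理", 94),
]


def pivot_scores(records, courses=("语文", "数学", "物理")):
    """Turn one row per (name, course) into one row per name with a column per course."""
    by_name = defaultdict(dict)
    for name, course, score in records:
        by_name[name][course] = score
    table = []
    for name, per_course in by_name.items():
        values = [per_course.get(course, 0) for course in courses]
        total = sum(per_course.values())
        average = round(total / len(per_course), 2)
        table.append((name, *values, total, average))
    # highest total first, as in the desired result
    table.sort(key=lambda row: row[-2], reverse=True)
    return table


for row in pivot_scores(scores):
    print(row)
```

Looking courses up by name, as Approach 1 and Approach 3 do, does not depend on row order, whereas the positional lookup in Approach 2 does.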

Question 3

Problem

Source data

stationid hour1 hour2 hour3 hour4 hour5 hour6 hour7 hour8 hour9 hour10 hour11 hour12 hour13 hour14 hour15 hour16 hour17 hour18 hour19 hour20 hour21 hour22 hour23 hour24
G01X01 11.0 12.0 13.0 14.0 15.0 16.0 17.0 18.0 19.0 20.0 21.0 22.0 23.0 24.0 25.0 26.0 27.0 28.0 29.0 30.0 31.0 32.0 33.0 34.0

Desired result

stationid col0
G01X01 11.0
G01X01 12.0
G01X01 13.0
G01X01 14.0
G01X01 15.0
G01X01 16.0
G01X01 17.0
G01X01 18.0
G01X01 19.0
G01X01 20.0
G01X01 21.0
G01X01 22.0
G01X01 23.0
G01X01 24.0
G01X01 25.0
G01X01 26.0
G01X01 27.0
G01X01 28.0
G01X01 29.0
G01X01 30.0
G01X01 31.0
G01X01 32.0
G01X01 33.0
G01X01 34.0

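Answer

Approach 1
-- A UDTF next to other columns needs lateral view.
-- concat_ws expects string arguments; cast the hour columns first if they are numeric.
select
 `stationid`,
 t.col0
from questionV3
lateral view posexplode(split(concat_ws(',',hour1,hour2,hour3,hour4,hour5,hour6,hour7,hour8,hour9,hour10,hour11,
 hour12,hour13,hour14,hour15,hour16,hour17,hour18,hour19,hour20,hour21,hour22,hour23,hour24), ',')) t as pos, col0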


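Approach 2
-- stack(24, ...) turns the 24 values into 24 rows of one column each
select
 `stationid`,
 t.col0
from questionV3
lateral view stack(24,hour1,hour2,hour3,hour4,hour5,hour6,hour7,hour8,hour9,hour10,hour11,hour12,hour13,hour14,
 hour15,hour16,hour17,hour18,hour19,hour20,hour21,hour22,hour23,hour24) t as col0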
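For comparison, the column-to-row unpivot in this question is small enough to express in plain Python. The sketch below is an illustration only: unpivot_hours is a hypothetical name, and the sample row is rebuilt from the source data (hour1 = 11.0 up to hour24 = 34.0).

```python
def unpivot_hours(row, n_hours=24):
    """Turn one wide row (hour1..hourN) into N (stationid, value) rows."""
    return [(row["stationid"], row[f"hour{h}"]) for h in range(1, n_hours + 1)]


# one wide row matching the sample data
wide_row = {"stationid": "G01X01", **{f"hour{h}": 10.0 + h for h in range(1, 25)}}

for stationid, value in unpivot_hours(wide_row):
    print(stationid, value)
```

The list comprehension keeps the hour order explicit, which is the same reason posexplode can be preferable to explode when the position of each value matters.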

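Question 4

Problem

Source data (start a new session whenever num exceeds the previous row's num by more than 5; each session is labelled with the num that opened it, and the first session is labelled 0)

stationid num dt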
A001 5 1623733736
A001 10 1623733737
A001 15 1623733738
A001 20 1623733739
A001 30 1623733740
A001 35 1623733741
A001 40 1623733742
A001 50 1623733743
A001 55 1623733744

Desired result

stationid num dt session_id
A001 5 1623733736 0
A001 10 1623733737 0
A001 15 1623733738 0
A001 20 1623733739 0
A001 30 1623733740 30
A001 35 1623733741 30
A001 40 1623733742 30
A001 50 1623733743 50
A001 55 1623733744 50

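Answer

Approach 1:
with tmp1 as (
 -- previous row's num per station (0 for the first row)
 select
  stationid, num, dt,
  lag(`num`, 1, 0) over (partition by `stationid` order by dt) as lag_num
 from questionV4
), tmp2 as (
 select stationid, num, dt, lag_num, (num - lag_num) as diff from tmp1
), tmp3 as (
 -- a jump of more than 5 opens a new session, labelled with its first num
 select
  stationid, num, dt,
  if(diff > 5, num, 0) as session_start
 from tmp2
), tmp4 as (
 -- the running max carries the latest label forward (num never decreases in this data)
 select
  stationid, num, dt,
  max(session_start) over (partition by `stationid` order by dt) as session_id
 from tmp3
)
select * from tmp4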
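The lag / flag / running-max chain above can be traced row by row in Python to confirm the session labels. This is a minimal sketch under assumptions: assign_sessions is a hypothetical helper, a single station is processed, and num never decreases (otherwise a running max would not carry the right label).

```python
# (stationid, num, dt) rows copied from the sample data
rows = [
    ("A001", 5, 1623733736),
    ("A001", 10, 1623733737),
    ("A001", 15, 1623733738),
    ("A001", 20, 1623733739),
    ("A001", 30, 1623733740),
    ("A001", 35, 1623733741),
    ("A001", 40, 1623733742),
    ("A001", 50, 1623733743),
    ("A001", 55, 1623733744),
]


def assign_sessions(rows, threshold=5):
    """Label each row with the num that opened its session (0 for the first session)."""
    labelled = []
    prev_num = 0    # same default as lag(num, 1, 0)
    session_id = 0
    for stationid, num, dt in sorted(rows, key=lambda r: r[2]):
        # a jump larger than the threshold marks the start of a new session
        marker = num if num - prev_num > threshold else 0
        session_id = max(session_id, marker)  # running max, as in tmp4
        labelled.append((stationid, num, dt, session_id))
        prev_num = num
    return labelled


for row in assign_sessions(rows):
    print(row)
```

If num could decrease, a running count of session starts would be a sturdier session key than the running max.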
