现有一张员工的销售记录表,表样式如下。现在需要统计每个员工在2023年10月份,截止到每天的月累计销售额。注意:存在有的员工在某几天是没有销售记录的。
sale_date emp_id emp_name sale_amount
'2023-10-02' ,'101' ,'张三' , 1000
'2023-10-03' ,'101' ,'张三' , 3000
'2023-10-05' ,'101' ,'张三' , 4000
'2023-10-10' ,'101' ,'张三' , 2000
'2023-10-13' ,'101' ,'张三' , 5000
'2023-10-15' ,'101' ,'张三' , 4000
'2023-10-27' ,'101' ,'张三' , 2000
'2023-10-01' ,'102' ,'李四' , 1111
'2023-10-03' ,'102' ,'李四' , 2222
'2023-10-08' ,'102' ,'李四' , 3333
'2023-10-11' ,'102' ,'李四' , 111
'2023-10-23' ,'102' ,'李四' , 4550
'2023-10-28' ,'102' ,'李四' , 6666
如果是每个员工在每天都有销售记录,那么直接开窗就可以计算出来当月截至到每天的累计销售额。现在由于销售记录有缺失,他虽然在某天没有销售记录,但是他是有当月累计销售额的,且当月累计销售额与前一天的累计销售额一样,这种依然需要统计出来。所以我们首先考虑将每个人每天的销售记录补齐,当天没有销售记录的那么销售额给0,然后就可以开窗计算当月截至到每天的累计销售额了。
with temp as (
select '2023-10-02' as sale_date,'101' as emp_id,'张三' as emp_name, 1000 as sale_amount
union all
select '2023-10-03' as sale_date,'101' as emp_id,'张三' as emp_name, 3000 as sale_amount
union all
select '2023-10-05' as sale_date,'101' as emp_id,'张三' as emp_name, 4000 as sale_amount
union all
select '2023-10-10' as sale_date,'101' as emp_id,'张三' as emp_name, 2000 as sale_amount
union all
select '2023-10-13' as sale_date,'101' as emp_id,'张三' as emp_name, 5000 as sale_amount
union all
select '2023-10-15' as sale_date,'101' as emp_id,'张三' as emp_name, 4000 as sale_amount
union all
select '2023-10-27' as sale_date,'101' as emp_id,'张三' as emp_name, 2000 as sale_amount
union all
select '2023-10-01' as sale_date,'102' as emp_id,'李四' as emp_name, 1111 as sale_amount
union all
select '2023-10-03' as sale_date,'102' as emp_id,'李四' as emp_name, 2222 as sale_amount
union all
select '2023-10-08' as sale_date,'102' as emp_id,'李四' as emp_name, 3333 as sale_amount
union all
select '2023-10-11' as sale_date,'102' as emp_id,'李四' as emp_name, 111 as sale_amount
union all
select '2023-10-23' as sale_date,'102' as emp_id,'李四' as emp_name, 4550 as sale_amount
union all
select '2023-10-28' as sale_date,'102' as emp_id,'李四' as emp_name, 6666 as sale_amount
)
select
dt
,t1.emp_id
,t1.emp_name
,nvl(t2.sale_amount,0) as sale_amount
,sum(if(t2.sale_amount is null ,0,t2.sale_amount))over(partition by t1.emp_id order by t1.dt) as total_sale_amount
from
(select
date_add(t.start_date,tab.pos) as dt
,t.emp_id
,t.emp_name
from
(
select
emp_id
,emp_name
,'2023-10-01' as start_date
,'2023-10-31' as end_date
from temp
group by emp_id,emp_name
)t
lateral view posexplode(split(repeat(',',datediff(end_date,start_date)),',')) tab as pos,val
) t1
left join
(
select
sale_date
,emp_id
,emp_name
,sale_amount
from temp
)t2
on t1.dt=t2.sale_date and t1.emp_id=t2.emp_id and t1.emp_name=t2.emp_name
;
1、repeat函数
在 Hive 中,REPEAT 函数用于将指定字符串重复多次,返回一个新的字符串。它的语法如下:
REPEAT(str, n)
其中,str 表示要重复的字符串,n 表示要重复的次数。
sql1:
SELECT REPEAT('hello', 3) as words;
---结果为
words
hellohellohello
sql2:
SELECT repeat(',',datediff('2023-10-05','2023-10-01')) as symbol;
---结果为:
symbol
,,,,
sql3:
select posexplode(split(repeat(',',datediff('2023-10-05','2023-10-01')),',')) as (pos,val); ----4个逗号,其实是5个位置
---结果为:
pos val
0
1
2
3
4
备注:repeat 多和posexplod或explode等函数一起使用;
2、space函数
在 Hive 中,SPACE 函数用于返回由多个空格组成的字符串。它的语法如下:
SPACE(n)
其中,n 表示空格的数量。
sql1:
SELECT SPACE(5) as symbol;
---结果为
symbol
sql2:
select posexplode(split(space(datediff('2023-10-05','2023-10-01')),'')) as (pos,val); ----4个空格,其实是5个位置
---结果为:
pos val
0
1
2
3
4
备注:space多和posexplod或explode等函数一起使用;