Hive functions: fetching offset rows with lag and lead

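Use cases

In real business data, a customer usually has many log entries, orders, or purchase records. Metrics such as frequency, purchase interval, and loyalty need the time elapsed between two consecutive purchases (or active days). This is where the lag and lead functions come in.

The source data is shown below. The goal is to compute, for each customer, the interval since the previous purchase.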

| customer_id | biz_date |
| --- | --- |
| 200031 | 2021-01-27 |
| 200031 | 2021-06-23 |
| 200031 | 2021-10-18 |
| 200031 | 2021-10-27 |
| 200031 | 2021-11-30 |
| 200031 | 2021-12-25 |
| 200031 | 2021-12-29 |
| 200031 | 2021-12-30 |
| 200031 | 2022-01-19 |
| 200031 | 2022-03-10 |
| 200031 | 2022-04-18 |
| 200031 | 2022-05-27 |
| 200031 | 2022-07-06 |
| 200031 | 2022-09-27 |

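The code is as follows.

lag(column, offset, default): lag returns the value of the column from the row that is `offset` rows before the current row within the window. The offset defaults to 1 if omitted. When no such row exists, the result is null unless you supply your own default as the third argument.

lead works the same way, but looks at rows after the current row.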

select customer_id,biz_date,
    lag(biz_date,1) over (partition by customer_id order by biz_date) as time1,-- date of the previous record
    lead(biz_date,1) over (partition by customer_id order by biz_date) as time2 -- date of the next record
from table_name a;

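For the first record of each customer, lag returns null by default, because there is no previous record. Likewise, lead returns null for the last record, because there is no next record.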

| customer_id | Purchase date | Previous purchase | Next purchase |
| --- | --- | --- | --- |
| 200031 | 2021-01-27 | null | 2021-06-23 |
| 200031 | 2021-06-23 | 2021-01-27 | 2021-10-18 |
| 200031 | 2021-10-18 | 2021-06-23 | 2021-10-27 |
| 200031 | 2021-10-27 | 2021-10-18 | 2021-11-30 |
| 200031 | 2021-11-30 | 2021-10-27 | 2021-12-25 |
| 200031 | 2021-12-25 | 2021-11-30 | 2021-12-29 |
| 200031 | 2021-12-29 | 2021-12-25 | 2021-12-30 |
| 200031 | 2021-12-30 | 2021-12-29 | 2022-01-19 |
| 200031 | 2022-01-19 | 2021-12-30 | 2022-03-10 |
| 200031 | 2022-03-10 | 2022-01-19 | 2022-04-18 |
| 200031 | 2022-04-18 | 2022-03-10 | 2022-05-27 |
| 200031 | 2022-05-27 | 2022-04-18 | 2022-07-06 |
| 200031 | 2022-07-06 | 2022-05-27 | 2022-09-27 |
| 200031 | 2022-09-27 | 2022-07-06 | null |
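To make the offset and default arguments more concrete, here is a minimal sketch that runs on a few inline rows instead of a real table. The CTE name `sample_rows`, the output aliases, and the placeholder default `'1970-01-01'` are all made up for illustration, and `biz_date` is assumed to be stored as a string.

```sql
-- Hypothetical inline data: the first three rows of the sample above
with sample_rows as (
    select '200031' as customer_id, '2021-01-27' as biz_date
    union all
    select '200031', '2021-06-23'
    union all
    select '200031', '2021-10-18'
)
select customer_id,
       biz_date,
       -- two rows back, no default: null when fewer than two earlier rows exist
       lag(biz_date, 2) over (partition by customer_id order by biz_date) as prev2_date,
       -- two rows back, with an illustrative default instead of null
       lag(biz_date, 2, '1970-01-01') over (partition by customer_id order by biz_date) as prev2_or_default
from sample_rows;
```

With an offset of 2, only the third row has a row two positions earlier, so it gets 2021-01-27 in both columns. The first two rows get null in `prev2_date` and the placeholder default in `prev2_or_default`.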

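Next, use the start and end dates of the observation period (2021-01-01 and 2022-12-31) as the default values, and compute the number of days since the previous purchase.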

select customer_id,biz_date,
    lag(biz_date,1,'2021-01-01') over (partition by customer_id order by biz_date) as time1,-- date of the previous record, period start if none
    lead(biz_date,1,'2022-12-31') over (partition by customer_id order by biz_date) as time2,-- date of the next record, period end if none
    datediff(biz_date,lag(biz_date,1,'2021-01-01') over (partition by customer_id order by biz_date)) as days_since_prev -- days since the previous purchase
from table_name a;

| customer_id | Purchase date | Previous purchase | Next purchase | Days since previous purchase |
| --- | --- | --- | --- | --- |
| 200031 | 2021-01-27 | 2021-01-01 | 2021-06-23 | 26 |
| 200031 | 2021-06-23 | 2021-01-27 | 2021-10-18 | 147 |
| 200031 | 2021-10-18 | 2021-06-23 | 2021-10-27 | 117 |
| 200031 | 2021-10-27 | 2021-10-18 | 2021-11-30 | 9 |
| 200031 | 2021-11-30 | 2021-10-27 | 2021-12-25 | 34 |
| 200031 | 2021-12-25 | 2021-11-30 | 2021-12-29 | 25 |
| 200031 | 2021-12-29 | 2021-12-25 | 2021-12-30 | 4 |
| 200031 | 2021-12-30 | 2021-12-29 | 2022-01-19 | 1 |
| 200031 | 2022-01-19 | 2021-12-30 | 2022-03-10 | 20 |
| 200031 | 2022-03-10 | 2022-01-19 | 2022-04-18 | 50 |
| 200031 | 2022-04-18 | 2022-03-10 | 2022-05-27 | 39 |
| 200031 | 2022-05-27 | 2022-04-18 | 2022-07-06 | 39 |
| 200031 | 2022-07-06 | 2022-05-27 | 2022-09-27 | 40 |
| 200031 | 2022-09-27 | 2022-07-06 | 2022-12-31 | 83 |
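The query above writes the same lag window expression twice. One possible variation, sketched here rather than taken from the original, computes lag and lead once in a subquery and derives the intervals in the outer query. The aliases `prev_date`, `next_date`, `days_since_prev`, and `days_until_next` are illustrative names, and `table_name` is the same placeholder table as above.

```sql
select customer_id,
       biz_date,
       prev_date,
       next_date,
       datediff(biz_date, prev_date) as days_since_prev,  -- days since the previous purchase
       datediff(next_date, biz_date) as days_until_next   -- days until the next purchase
from (
    select customer_id,
           biz_date,
           -- previous purchase date, falling back to the period start
           lag(biz_date, 1, '2021-01-01') over (partition by customer_id order by biz_date) as prev_date,
           -- next purchase date, falling back to the period end
           lead(biz_date, 1, '2022-12-31') over (partition by customer_id order by biz_date) as next_date
    from table_name
) t;
```

For the sample customer, `days_since_prev` should match the last column of the table above. The extra `days_until_next` column for the final row would be the distance from 2022-09-27 to the period end, 95 days.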

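That covers the usual usage. If you want to double-check the result, add a sequence-number column ordered by purchase date and verify against it.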
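A minimal sketch of that check, assuming the same placeholder `table_name`: `row_number()` numbers each customer's purchases by date, so the row with sequence number n should show the purchase date of row n - 1 as its previous date. The alias `rn` is an illustrative name.

```sql
select customer_id,
       biz_date,
       -- sequence number of the purchase within each customer, ordered by date
       row_number() over (partition by customer_id order by biz_date) as rn,
       -- previous purchase date; should equal biz_date of the row with rn - 1
       lag(biz_date, 1) over (partition by customer_id order by biz_date) as time1
from table_name;
```

If two purchases share the same `biz_date`, their relative order is not deterministic, so a tie-breaking column in the `order by` may be needed for a stable check.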
