Difference between ROW_NUMBER() and FIRST_VALUE(url) (fetching the first record)

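At work you often run into cases where a single time field, such as an appointment time, can be changed several times, and every change generates a new record. When you only want the first of those records, a window function solves the problem.
Hive functions for fetching the first record:
1. FIRST_VALUE
Returns the first value within the partition after sorting, evaluated up to the current row (drawback: each call returns only a single column).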
-- DISTINCT collapses the per-row window result to one row per wo_id
SELECT DISTINCT wo_id,
       FIRST_VALUE(created_at) OVER (PARTITION BY wo_id ORDER BY id ASC) AS change_appoint_time_fir
FROM ods.ods_mall_swo_operate_log
WHERE dt = '2019-07-31' AND type = 'CH_SW_APPOINT' AND wo_id = '346617364164486041'

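2. ROW_NUMBER()
More broadly applicable, because it can return the entire row. For example, if my first store visit was in Beijing and my second in Shanghai, this function lets me pull the complete record of the first visit.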
SELECT wo_id, created_at AS change_appoint_time_fir
FROM (
    SELECT wo_id, created_at,
           ROW_NUMBER() OVER (PARTITION BY wo_id ORDER BY id) AS rn
    FROM ods.ods_mall_swo_operate_log
    WHERE dt = '2019-07-31' AND type = 'CH_SW_APPOINT'
) ga
WHERE rn = 1 AND wo_id = '346622775823278114'
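The Beijing/Shanghai case can be tried locally with a minimal sketch like the one below. It is only an illustration under several assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for Hive, and the table operate_log, the city column, and the wo_a/wo_b ids are hypothetical sample data, not the real ods table. It shows that ROW_NUMBER() returns the whole first row in one pass, while FIRST_VALUE needs one call per column to rebuild it.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption, needs SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
# Hypothetical table and columns, loosely modeled on the operate log used above.
conn.execute("CREATE TABLE operate_log (id INTEGER, wo_id TEXT, created_at TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO operate_log VALUES (?, ?, ?, ?)",
    [
        (1, "wo_a", "2019-07-31 09:00:00", "Beijing"),
        (2, "wo_a", "2019-07-31 10:30:00", "Shanghai"),
        (3, "wo_b", "2019-07-31 11:00:00", "Shanghai"),
    ],
)

# ROW_NUMBER(): number the rows inside each wo_id, then keep the whole row with rn = 1.
row_number_sql = """
SELECT wo_id, created_at, city
FROM (
    SELECT wo_id, created_at, city,
           ROW_NUMBER() OVER (PARTITION BY wo_id ORDER BY id) AS rn
    FROM operate_log
) ga
WHERE rn = 1
ORDER BY wo_id
"""

# FIRST_VALUE(): one call per column is needed to rebuild the same row.
first_value_sql = """
SELECT DISTINCT wo_id,
       FIRST_VALUE(created_at) OVER (PARTITION BY wo_id ORDER BY id) AS created_at,
       FIRST_VALUE(city)       OVER (PARTITION BY wo_id ORDER BY id) AS city
FROM operate_log
ORDER BY wo_id
"""

first_rows_rn = conn.execute(row_number_sql).fetchall()
first_rows_fv = conn.execute(first_value_sql).fetchall()
print(first_rows_rn)
print(first_rows_fv)
```

Both queries give the same first record per wo_id here. The ROW_NUMBER() form stays the same length however many columns are selected, whereas the FIRST_VALUE form grows by one window expression per column.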

Syntax:

SELECT cookieid,
createtime,
url,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
FIRST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime) AS first1 
FROM test

cookieid  createtime           url   rn  first1
cookie1   2015-04-10 10:00:00  url1  1   url1
cookie1   2015-04-10 10:00:02  url2  2   url1
cookie1   2015-04-10 10:03:04  url3  3   url1
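The result above can be reproduced with a small sketch. As an assumption, it runs the same query through Python's built-in sqlite3 module (SQLite 3.25 or later) rather than Hive, and it loads the three sample rows into a throwaway in-memory table named test. The added ORDER BY only makes the output order deterministic.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption, needs SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (cookieid TEXT, createtime TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        ("cookie1", "2015-04-10 10:00:00", "url1"),
        ("cookie1", "2015-04-10 10:00:02", "url2"),
        ("cookie1", "2015-04-10 10:03:04", "url3"),
    ],
)

# Same window for both functions: rn counts rows, first1 repeats the earliest url.
rows = conn.execute("""
SELECT cookieid, createtime, url,
       ROW_NUMBER() OVER (PARTITION BY cookieid ORDER BY createtime) AS rn,
       FIRST_VALUE(url) OVER (PARTITION BY cookieid ORDER BY createtime) AS first1
FROM test
ORDER BY cookieid, createtime
""").fetchall()

for row in rows:
    print(row)
```

Every row keeps its own url and gets an increasing rn, while first1 stays at the url of the earliest createtime in the partition. Filtering on rn = 1 in an outer query would leave just that earliest row.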
