Hive Window Functions: NTILE, ROW_NUMBER, RANK, DENSE_RANK

This post uses the data from **.

1. The NTILE function

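NTILE(n) splits the ordered rows of each partition into n buckets and returns the bucket number (1 to n) of the current row. If the rows cannot be divided evenly, the leftover rows are handed out one each to the first buckets. Lower-numbered buckets can therefore hold one more row than the others.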
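The bucket-sizing rule above can be sketched in a few lines of Python. This is a minimal simulation written for illustration, not Hive's own code. The function name `ntile` and the row count of 7 are assumptions. Each bucket gets `num_rows // n` rows, and the remainder goes to buckets 1, 2, ... one row at a time.

```python
def ntile(n, num_rows):
    """Return the bucket number (1..n) for each of num_rows ordered rows.

    Every bucket gets num_rows // n rows; the remaining num_rows % n rows
    are given one each to buckets 1, 2, ... in order.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    base, remainder = divmod(num_rows, n)
    buckets = []
    for bucket in range(1, n + 1):
        # Lower-numbered buckets absorb the remainder, one extra row each.
        size = base + (1 if bucket <= remainder else 0)
        buckets.extend([bucket] * size)
    return buckets


if __name__ == "__main__":
    # 7 ordered rows, e.g. 7 days of data for one cookie (illustrative count).
    for n in (2, 3, 4):
        print(n, ntile(n, 7))
```

With 7 rows, NTILE(3) yields bucket sizes 3, 2, 2, so the sketch prints `3 [1, 1, 1, 2, 2, 3, 3]` for n = 3.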

SELECT 
cookieid,
createtime,
pv,
NTILE(2) OVER(PARTITION BY cookieid ORDER BY createtime) AS rn1,  -- 2 buckets per cookie
NTILE(3) OVER(PARTITION BY cookieid ORDER BY createtime) AS rn2,  -- 3 buckets per cookie
NTILE(4) OVER(ORDER BY createtime) AS rn3                         -- no PARTITION BY: 4 buckets over the whole table
FROM cookies 
ORDER BY cookieid,createtime;

Result:

[Figure 1: result of the NTILE query]

A practical use: for each cookie, find the days that fall in the top third by pv.

SELECT temp.cookieid, temp.createtime, temp.pv
FROM (
  SELECT cookieid, createtime, pv,
         NTILE(3) OVER(PARTITION BY cookieid ORDER BY pv DESC) AS rn
  FROM cookies
) temp
WHERE temp.rn = 1;

Result:

[Figure 2: result of the top-1/3 query]
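The query can be mimicked in Python to show what "bucket 1" actually keeps. This is a sketch under stated assumptions. `top_third` and `SAMPLE` are hypothetical names, and the rows are made-up values, not the data used in this post. The steps are: partition by cookie, sort by pv descending, assign NTILE(3) buckets, and keep bucket 1.

```python
from collections import defaultdict


def ntile(n, num_rows):
    """Bucket numbers 1..n for num_rows ordered rows (remainder goes to the first buckets)."""
    base, remainder = divmod(num_rows, n)
    out = []
    for bucket in range(1, n + 1):
        out.extend([bucket] * (base + (1 if bucket <= remainder else 0)))
    return out


def top_third(rows):
    """Keep rows whose NTILE(3) bucket is 1 within each cookie, ordered by pv desc."""
    # PARTITION BY cookieid
    partitions = defaultdict(list)
    for cookieid, createtime, pv in rows:
        partitions[cookieid].append((cookieid, createtime, pv))
    result = []
    for cookieid in sorted(partitions):
        # ORDER BY pv DESC; Python's stable sort keeps input order for ties,
        # whereas Hive does not promise any particular order among ties.
        ordered = sorted(partitions[cookieid], key=lambda r: r[2], reverse=True)
        for row, bucket in zip(ordered, ntile(3, len(ordered))):
            if bucket == 1:  # WHERE temp.rn = 1
                result.append(row)
    return result


# Hypothetical sample rows (cookieid, createtime, pv); not the data used in the post.
SAMPLE = [
    ("cookieA", "day1", 10), ("cookieA", "day2", 40), ("cookieA", "day3", 25),
    ("cookieA", "day4", 5), ("cookieA", "day5", 30), ("cookieA", "day6", 15),
    ("cookieB", "day1", 8), ("cookieB", "day2", 3), ("cookieB", "day3", 12),
    ("cookieB", "day4", 6),
]

if __name__ == "__main__":
    for row in top_third(SAMPLE):
        print(row)
```

cookieA has 6 rows, so bucket 1 holds exactly 2 of them. cookieB has 4 rows, which NTILE(3) splits 2, 1, 1, so bucket 1 keeps half of its days. The "top third" is only approximate when a cookie's row count is not a multiple of 3.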

2. The ROW_NUMBER() function

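ROW_NUMBER() numbers the rows within each partition sequentially, starting at 1, in the order given by ORDER BY. For example, ordering by pv descending gives each day its pv rank within the cookie. Rows with equal pv still receive distinct numbers, and the order among such ties is not guaranteed.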
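The numbering rule can be simulated in Python as an illustration. This is not Hive's implementation. The `row_number` helper and the sample rows are hypothetical. Each partition is sorted, then `enumerate(..., start=1)` assigns consecutive numbers, so ties still get different numbers.

```python
from collections import defaultdict


def row_number(rows, partition_key, order_key, descending=False):
    """Return (row, rn) pairs, numbering rows 1, 2, 3, ... within each partition.

    Ties still receive distinct, consecutive numbers. Here Python's stable sort
    decides which tied row comes first (input order); Hive makes no such promise.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)
    numbered = []
    for key in sorted(partitions):
        ordered = sorted(partitions[key], key=order_key, reverse=descending)
        # enumerate(..., start=1) mirrors ROW_NUMBER() starting at 1
        numbered.extend((row, rn) for rn, row in enumerate(ordered, start=1))
    return numbered


# Hypothetical rows (cookieid, createtime, pv); illustrative values only.
SAMPLE = [
    ("cookieA", "day1", 10), ("cookieA", "day2", 40), ("cookieA", "day3", 40),
    ("cookieB", "day1", 8), ("cookieB", "day2", 12),
]

if __name__ == "__main__":
    for row, rn in row_number(SAMPLE, lambda r: r[0], lambda r: r[2], descending=True):
        print(row, rn)
```

The two cookieA rows with pv 40 get numbers 1 and 2, not a shared rank. That is the difference from RANK and DENSE_RANK.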

SELECT 
cookieid,
createtime,
pv,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn 
FROM cookies;

Result:

[Figure 3: result of the ROW_NUMBER query]

3. The RANK() and DENSE_RANK() functions

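- RANK() ranks each row within its partition. Tied rows share a rank and leave a gap in the ranks that follow (e.g. 1, 2, 2, 4).
- DENSE_RANK() ranks each row within its partition. Tied rows share a rank, but no gap is left (e.g. 1, 2, 2, 3).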
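The difference between the three numbering schemes can be sketched in Python for one partition. This is a minimal illustration, assuming the values are already sorted in descending order. The helper `rank_variants` and the pv values are hypothetical.

```python
def rank_variants(values):
    """Given pv values already sorted in descending order (one partition),
    return the RANK, DENSE_RANK and ROW_NUMBER sequences for them.
    """
    ranks, dense_ranks, row_numbers = [], [], []
    dense = 0
    for i, value in enumerate(values):
        row_numbers.append(i + 1)          # ROW_NUMBER: always 1, 2, 3, ...
        if i > 0 and value == values[i - 1]:
            # Tie with the previous row: RANK and DENSE_RANK repeat their value.
            ranks.append(ranks[-1])
            dense_ranks.append(dense)
        else:
            ranks.append(i + 1)            # RANK jumps to the row position (gap after ties)
            dense += 1
            dense_ranks.append(dense)      # DENSE_RANK just increments (no gap)
    return ranks, dense_ranks, row_numbers


if __name__ == "__main__":
    pv_desc = [9, 6, 6, 4, 2, 2, 1]   # hypothetical pv values for one cookie, sorted desc
    rn1, rn2, rn3 = rank_variants(pv_desc)
    print("RANK      ", rn1)
    print("DENSE_RANK", rn2)
    print("ROW_NUMBER", rn3)
```

For these values, RANK gives `[1, 2, 2, 4, 5, 5, 7]`, DENSE_RANK gives `[1, 2, 2, 3, 4, 4, 5]`, and ROW_NUMBER gives `[1, 2, 3, 4, 5, 6, 7]`.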

SELECT 
cookieid,
createtime,
pv,
RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn1,
DENSE_RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn2,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv DESC) AS rn3 
FROM cookies 
WHERE cookieid = 'cookie1';

Result:

[Figure 4: result of the RANK / DENSE_RANK / ROW_NUMBER query]
