Hive Analytic Functions: NTILE and Ranking Functions

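1. Using NTILE
NTILE divides an ordered data set as evenly as possible into a specified number (num) of buckets and assigns each row its bucket number. If the rows cannot be divided evenly, the lower-numbered buckets receive the extra rows first, so the row counts of any two buckets differ by at most 1.
Syntax: ntile(num) over ([partition_clause] order_by_clause) as your_bucket_num
You can then use the bucket number to select the first or last n-th fraction of the data.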
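
The distribution rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Hive's actual implementation; the function name ntile_labels is hypothetical.

```python
def ntile_labels(row_count, num_buckets):
    """Return the 1-based bucket number for each of row_count ordered rows."""
    base, extra = divmod(row_count, num_buckets)
    labels = []
    for bucket in range(1, num_buckets + 1):
        # The first `extra` buckets each receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        labels.extend([bucket] * size)
    return labels


# 10 ordered rows split into 3 buckets: bucket sizes are 4, 3, 3.
print(ntile_labels(10, 3))  # [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

Because 10 is not divisible by 3, the single leftover row goes to bucket 1, which matches the "lower-numbered buckets first" rule.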

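NTILE still returns every row; it only labels each row with a bucket number. To actually take a given fraction of the data, wrap the query in an outer query that filters on that label.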
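
A minimal sketch of that nested-query pattern follows. It assumes an in-memory SQLite database (version 3.25 or later, which supports window functions) standing in for Hive, and a hypothetical table pay_info with made-up sample rows; none of this comes from the original data.

```python
import sqlite3

# Hypothetical sample data: (uid, amount). Not taken from the original table.
rows = [(1, 50), (2, 40), (3, 30), (4, 20), (5, 10), (6, 5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pay_info (uid INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO pay_info VALUES (?, ?)", rows)

# Innermost query: total payment per uid.
# Middle query: label each uid with a bucket number (1 = highest payers).
# Outer query: keep only bucket 1, i.e. the top third by pay_amount.
query = """
SELECT uid, pay_amount
FROM (
    SELECT uid, pay_amount,
           ntile(3) OVER (ORDER BY pay_amount DESC) AS til
    FROM (SELECT uid, sum(amount) AS pay_amount
          FROM pay_info
          GROUP BY uid) agg
) labeled
WHERE til = 1
ORDER BY pay_amount DESC
"""
top_third = conn.execute(query).fetchall()
print(top_third)
```

With six users and three buckets, each bucket holds two rows, so the outer filter keeps uids 1 and 2. In Hive the same shape applies: the NTILE query becomes a subquery and the outer WHERE filters on til.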
1.1. Bucketing over the whole result set
-- Split users into 100 buckets by total payment (bucket 1 = highest payers).
select uid,sum(amount) pay_amount,ntile(100)over(order by sum(amount) desc) til
from data_chushou_pay_info
where pt_day between '2017-01-01' and '2017-11-14' and state=0
group by uid;

-- Split months into 3 buckets by total monthly payment.
select pt_month,sum(amount) pay_amount,ntile(3)over(order by sum(amount) desc) til
from data_chushou_pay_info
where pt_month between '2017-01' and '2017-11' and state=0
group by pt_month;

1.2. Bucketing within each partition
-- Within each month, split days into 3 buckets by daily payment total.
select pt_month,pt_day,sum(amount) pay_amount,ntile(3)over(partition by pt_month order by sum(amount) desc) til
from data_chushou_pay_info
where pt_month between '2017-09' and '2017-11' and state=0
group by pt_month,pt_day;

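2. Ranking functions
ROW_NUMBER()
- Generates a sequence number for each row within its partition, starting from 1 and following the specified order.
RANK and DENSE_RANK
- RANK() returns the rank of each row within its partition; tied rows share the same rank, and the ranks that follow skip ahead, leaving gaps (for example 1, 2, 2, 4).
- DENSE_RANK() returns the rank of each row within its partition; tied rows share the same rank, and no gaps are left afterwards (for example 1, 2, 2, 3).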
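
To make the difference on ties concrete, here is a small Python sketch that computes all three numbers for a list of values sorted in descending order. It only mirrors the definitions above; the function name rank_all and the sample values are hypothetical.

```python
def rank_all(values):
    """Return (value, row_number, rank, dense_rank) tuples, highest value first."""
    ordered = sorted(values, reverse=True)
    result = []
    rank = 0
    dense = 0
    prev = None
    for row_number, value in enumerate(ordered, start=1):
        if value != prev:
            rank = row_number  # RANK jumps to the current row position
            dense += 1         # DENSE_RANK advances by exactly one
            prev = value
        result.append((value, row_number, rank, dense))
    return result


for row in rank_all([100, 80, 80, 50]):
    print(row)
# (100, 1, 1, 1)
# (80, 2, 2, 2)
# (80, 3, 2, 2)
# (50, 4, 4, 3)
```

The two rows with value 80 get distinct ROW_NUMBER values but share rank 2; the next row is ranked 4 by RANK and 3 by DENSE_RANK.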

2.1. The three rankings within each partition
-- Rank users by total payment within each month, using all three functions.
select pt_month,uid,sum(amount) pay_amount,
ROW_NUMBER()over(partition by pt_month order by sum(amount) desc) rk1,
RANK()over(partition by pt_month order by sum(amount) desc) rk2,
DENSE_RANK()over(partition by pt_month order by sum(amount) desc) rk3
from data_chushou_pay_info
where pt_day between '2017-01-01' and '2017-11-14' and state=0
group by pt_month,uid;

2.2. The three rankings over the whole result set
-- Rank every (pt_month, uid) row by total payment across the whole result set.
select pt_month,uid,sum(amount) pay_amount,
ROW_NUMBER()over(order by sum(amount) desc) rk1,
RANK()over(order by sum(amount) desc) rk2,
DENSE_RANK()over(order by sum(amount) desc) rk3
from data_chushou_pay_info
where pt_day between '2017-01-01' and '2017-11-14' and state=0
group by pt_month,uid;
