Hive分析函数
Example:
Ntile(分片)
使用场景:计算百分之几的用户的结果
- 给了用户和每个用户对应的消费信息表, 计算花费前50%的用户的平均消费;
-- 把用户和消费表,按消费下降顺序平均分成2份
drop table if exists test_by_payment_ntile;
create table test_by_payment_ntile as
select
nick,
payment ,
NTILE(2) OVER(ORDER BY payment desc) AS rn
from test_nick_payment;
-- 分别对每一份计算平均值,就可以得到消费靠前50%和后50%的平均消费
select
'avg_payment' as inf,
t1.avg_payment_up_50 as avg_payment_up_50,
t2.avg_payment_down_50 as avg_payment_down_50
from
(select
avg(payment) as avg_payment_up_50
from test_by_payment_ntile
where rn=1
)t1
join
(select
avg(payment) as avg_payment_down_50
from test_by_payment_ntile
where rn=2
)t2
on (t1.dp_id=t2.dp_id);
Rank,Dense_Rank,Row_Number
使用场景:Top N
- Rank : 相同的排名会留下空缺,1,2,2,4
- Dense_Rank: 相同的排名不会留下空缺,1,2,2,3
- Row_Number:不会重复
Lag,Lead
使用场景:计算用户页面的停留时间
统计窗口内往上(往下)第n行值,当前行不算
-- 组内排序后,向后或向前偏移
-- 如果省略掉第三个参数,默认为NULL,否则补上。
select
dp_id,
mt,
payment,
LAG(mt,2) over(partition by dp_id order by mt) mt_new
from test2;
-- 组内排序后,向后或向前偏移
-- 如果省略掉第三个参数,默认为NULL,否则补上。
select
dp_id,
mt,
payment,
LEAD(mt,2,'1111-11') over(partition by dp_id order by mt) mt_new
from test2;
FIRST_VALUE, LAST_VALUE
使用场景:计算每个部门的最高工资与最低工资
-- FIRST_VALUE 获得组内当前行往前的首个值
-- LAST_VALUE 获得组内当前行往前的最后一个值
-- FIRST_VALUE(DESC) 获得组内全局的最后一个值
select
dp_id,
mt,
payment,
FIRST_VALUE(payment) over(partition by dp_id order by mt) payment_g_first,
LAST_VALUE(payment) over(partition by dp_id order by mt) payment_g_last,
FIRST_VALUE(payment) over(partition by dp_id order by mt desc) payment_g_last_global
from test2
ORDER BY dp_id,mt;
多维度
使用场景:计算每一类圈子的观看量,和每一类圈子下每一个标签视频的观看量
-- grouping sets
select
order_id,
departure_date,
count(*) as cnt
from ord_test
where order_id=410341346
group by order_id,
departure_date
grouping sets (order_id,(order_id,departure_date))
;
---- 等价于以下
group by order_id
union all
group by order_id,departure_date
-- cube
select
order_id,
departure_date,
count(*) as cnt
from ord_test
where order_id=410341346
group by order_id,
departure_date
with cube
;
---- 等价于以下
select count(*) as cnt from ord_test where order_id=410341346
union all
group by order_id
union all
group by departure_date
union all
group by order_id,departure_date
-- rollup
select
order_id,
departure_date,
count(*) as cnt
from ord_test
where order_id=410341346
group by order_id,
departure_date
with rollup
;
---- 等价于以下
select count(*) as cnt from ord_test where order_id=410341346
union all
group by order_id
union all
group by order_id,departure_date
计算比当前小的百分比
使用场景:计算当前组内比你小的人数比例
cume_list: 小于等于当前值的行数/分组内总行数
percent_rank:分组内当前行的RANK值-1/分组内总行数-1
根窗口聚合函数使用相同