Hive学习(二)窗口函数

1、首先使用之前的数据,查找订单信息要求按用户id分组,消费金额进行降序排序。

select * 
FROM 
( select id, order_money, member_id, create_time, row_number() over(partition by member_id ORDER BY order_money desc) as row_num
from dm_order_2018
) t;

结果:


Hive学习(二)窗口函数_第1张图片
image.png

2、窗口命名

select *
FROM
( select id,order_money,member_id,create_time,
              row_number() over w as row_num
      from dm_order_2018
window w as (partition by member_id order by order_money desc)
) t;

查询结果一样

3、窗口滑动示例

select * 
FROM 
( select id, order_money, member_id, create_time, 
            avg(order_money) over w as avg_num,
            sum(order_money) over w as sum_num
    from dm_order_2018
window w as (partition by member_id ORDER BY create_time DESC rows BETWEEN 1 preceding AND 1 following)
) t;

结果:


Hive学习(二)窗口函数_第2张图片
image.png

从结果可以看出,id为A001订单属于边界值,没有前一行,因此平均订单金额为(100.15+100.15)/2=100.15;
第二行id为A001的订 单前后都有订单,所以平均订单金额为(100.15+100.15+88.15/3=96.15;
以此类推就可以得到一个基于滑动窗口的动态 平均订单值。此例中,窗口函数用到了传统的聚合函数avg(),用来计算动态的平均值。

4、序号函数
查询排序小于等于3的订单

select * 
FROM 
( select id, order_money, member_id, create_time, 
            row_number() over(partition by member_id ORDER BY order_money desc) as row_num
    from dm_order_2018
) t where row_num <= 3;

结果


Hive学习(二)窗口函数_第3张图片
image.png

多种排序方式

select * 
FROM 
( select id, order_money, member_id, create_time, 
            row_number() over(partition by member_id ORDER BY order_money desc) as row_num1,
            rank() over(partition by member_id ORDER BY order_money desc) as row_num2,
            dense_rank() over(partition by member_id ORDER BY order_money desc) as row_num3
    from dm_order_2018
) t;

结果


Hive学习(二)窗口函数_第4张图片
image.png

总结:
即rank()会产生序号相同的记录,同时可能产生序号间隙;而dense_rank()也会产生序号相同的记录,但不会产生序号间隙。

5、分布函数

select * 
FROM 
( select id, order_money, member_id, create_time, 
            row_number() over w as row_num,
            percent_rank() over w as per
    from dm_order_2018
    window w as (partition by member_id ORDER BY create_time DESC)
) t;

结果


Hive学习(二)窗口函数_第5张图片
image.png
select * 
FROM 
( select id, order_money, member_id, create_time, 
            row_number() over w as row_num,
            cume_dist() over w as percent
    from dm_order_2018
    window w as (partition by member_id ORDER BY create_time DESC)
) t;

结果


Hive学习(二)窗口函数_第6张图片
image.png

总结:
CUME_DIST 小于等于当前值的行数/分组内总行数。
PERCENT_RANK 分组内当前行的RANK值-1/分组内总行数-1。

你可能感兴趣的:(Hive学习(二)窗口函数)