Hive series, part 2: Hive's three built-in ranking functions (UDFs)

Hive's built-in rank functions

Overview

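1. ROW_NUMBER() => numbers rows in sort order; rows with equal values still receive different numbers
// globally unique numbering
2. DENSE_RANK() => often combined with PARTITION BY x ORDER BY y to pick the top-N distinct values in each group; because tied rows share a rank, filtering on rank = 1 can still return several rows per group
// consecutive ranking, no gaps
3. RANK() => common for scenarios such as ranking exam scores
// ranking with gaps: after k tied rows, the next rank is k higher, so k - 1 ranks are skipped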
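To make the three tie-handling rules concrete, here is a minimal Python sketch that only emulates the semantics (it is not how Hive implements them). The function names are hypothetical, and each function assumes its input is already sorted ascending, as ORDER BY salary would sort it. The sample list holds the salaries that appear in the query outputs below.

```python
def row_number(values):
    """ROW_NUMBER(): 1, 2, 3, ... in sort order; ties still get distinct numbers."""
    return list(range(1, len(values) + 1))


def dense_rank(values):
    """DENSE_RANK(): ties share a rank; the next distinct value gets rank + 1 (no gaps)."""
    ranks, current = [], 0
    for i, v in enumerate(values):
        if i == 0 or v != values[i - 1]:
            current += 1  # a new distinct value moves up by exactly one
        ranks.append(current)
    return ranks


def rank(values):
    """RANK(): ties share a rank; the next distinct value gets its 1-based position (gaps)."""
    ranks = []
    for i, v in enumerate(values):
        if i == 0 or v != values[i - 1]:
            ranks.append(i + 1)  # jump to the row's position, skipping tied slots
        else:
            ranks.append(ranks[-1])  # tie: reuse the previous rank
    return ranks


# Salaries from tmp.test_rank, sorted as ORDER BY salary would sort them.
salaries = sorted([24000.0, 24000.0, 24000.0, 25000.0])
print(row_number(salaries))  # [1, 2, 3, 4]
print(dense_rank(salaries))  # [1, 1, 1, 2]
print(rank(salaries))        # [1, 1, 1, 4]
```

The three printed lists match the _wcol0 columns in the ROW_NUMBER(), DENSE_RANK() and RANK() outputs.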

Examples

Setup

create table tmp.test_rank(id string,salary float);
insert into table tmp.test_rank select "1",24000;
insert into table tmp.test_rank select "2",24000;
insert into table tmp.test_rank select "1",24000;
insert into table tmp.test_rank select "1",25000;

ROW_NUMBER()

select *,row_number() over (order by salary) from tmp.test_rank;
+-----+----------+---------+
| id  |  salary  | _wcol0  |
+-----+----------+---------+
| 1   | 24000.0  | 1       |
| 2   | 24000.0  | 2       |
| 1   | 24000.0  | 3       |
| 1   | 25000.0  | 4       |
+-----+----------+---------+

DENSE_RANK()

select *,dense_rank() over (order by salary) from tmp.test_rank;
+-----+----------+---------+
| id  |  salary  | _wcol0  |
+-----+----------+---------+
| 1   | 24000.0  | 1       |
| 2   | 24000.0  | 1       |
| 1   | 24000.0  | 1       |
| 1   | 25000.0  | 2       |
+-----+----------+---------+

RANK()

select *,rank() over (order by salary) from tmp.test_rank;
+-----+----------+---------+
| id  |  salary  | _wcol0  |
+-----+----------+---------+
| 1   | 24000.0  | 1       |
| 2   | 24000.0  | 1       |
| 1   | 24000.0  | 1       |
| 1   | 25000.0  | 4       |
+-----+----------+---------+
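The overview mentions combining these functions with PARTITION BY. The following minimal Python sketch emulates only the semantics (the helper first_ranked and the (id, salary) tuple layout are hypothetical) and shows why keeping rank 1 per group yields exactly one row with ROW_NUMBER() but every tied row with DENSE_RANK(). It uses the four rows that appear in the query outputs above, partitioned by id and ordered by salary ascending.

```python
from itertools import groupby

# Rows of tmp.test_rank as (id, salary) tuples.
ROWS = [("1", 24000.0), ("2", 24000.0), ("1", 24000.0), ("1", 25000.0)]


def first_ranked(rows, keep_ties):
    """Return the rows ranked 1 within each id, ordered by salary ascending.

    keep_ties=False behaves like filtering
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY salary) = 1
    keep_ties=True behaves like filtering
        DENSE_RANK() OVER (PARTITION BY id ORDER BY salary) = 1
    (RANK() = 1 selects the same rows as DENSE_RANK() = 1.)
    """
    kept = []
    # PARTITION BY id ORDER BY salary: sort by id, then by salary.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        if keep_ties:
            # Every row tied with the group's lowest salary shares rank 1.
            kept.extend(r for r in group if r[1] == group[0][1])
        else:
            # Only the first row gets row number 1, even when others tie with it.
            kept.append(group[0])
    return kept


print(first_ranked(ROWS, keep_ties=False))  # [('1', 24000.0), ('2', 24000.0)]
print(first_ranked(ROWS, keep_ties=True))   # [('1', 24000.0), ('1', 24000.0), ('2', 24000.0)]
```

In Hive itself, which of several tied rows receives row number 1 is not guaranteed unless the ORDER BY makes the order unique, so adding a tie-breaking column to the ORDER BY is a common safeguard.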
