Hive Window Functions

Original table data:

[Screenshot: the source table top_tmp, with columns id, name, and score; the same rows appear in the query output below.]

The query and its results:
hive> select 
    >     id,
    >     name,
    >     score,
    >     row_number() over(partition by name order by score desc) as r1,
    >     dense_rank() over(partition by name order by score desc) as d1,
    >     rank() over(partition by name order by score desc) as r2
    >     from  top_tmp;
Query ID = nwom_20181128095353_09418b0c-77dc-4332-95a4-810c49d01ac7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1536918059760_23105, Tracking URL = http://gxmaster01:8088/proxy/application_1536918059760_23105/
Kill Command = /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop/bin/hadoop job  -kill job_1536918059760_23105
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2018-11-28 11:00:44,837 Stage-1 map = 0%,  reduce = 0%
2018-11-28 11:00:51,086 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 1.24 sec
2018-11-28 11:00:55,253 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.66 sec
2018-11-28 11:01:01,497 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.8 sec
MapReduce Total cumulative CPU time: 5 seconds 800 msec
Ended Job = job_1536918059760_23105
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2  Reduce: 1   Cumulative CPU: 5.8 sec   HDFS Read: 12325 HDFS Write: 403 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 800 msec
OK
2       a       32      1       1       1
2       a       32      2       1       1
2       a       32      3       1       1
2       a       27      4       2       4
2       a       27      5       2       4
2       a       27      6       2       4
2       a       20      7       3       7
3       b       26      1       1       1
3       b       26      2       1       1
3       b       26      3       1       1
3       b       23      4       2       4
3       b       19      5       3       5
3       b       19      6       3       5
3       b       19      7       3       5
3       b       16      8       4       8
4       e       75      1       1       1
4       e       71      2       2       2
4       e       71      3       2       2
4       e       71      4       2       2
4       e       27      5       3       5
4       e       27      6       3       5
4       e       27      7       3       5
4       e       22      8       4       8
5       f       56      1       1       1
5       f       50      2       2       2
5       f       50      3       2       2
5       f       50      4       2       2
5       f       32      5       3       5
5       f       30      6       4       6
5       f       30      7       4       6
5       f       30      8       4       6


From the output we can observe:
row_number() over(partition by ... order by ...)    numbers the rows in each partition sequentially (1, 2, 3, ...), with no ties
dense_rank() over(partition by ... order by ...)    tied values share the same rank, and the next distinct value gets the next consecutive rank (no gaps)
rank() over(partition by ... order by ...)          tied values share the same rank, but the rank after a tie jumps ahead by the number of tied rows (gaps appear)
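The three numbering rules above can be sketched in plain Python (an illustration only, not Hive itself), using the scores of partition name = 'a' from the output, already sorted in descending order:

```python
# Reproduce row_number / dense_rank / rank semantics for one partition.
# Input must already be sorted the way the ORDER BY clause would sort it.
scores = [32, 32, 32, 27, 27, 27, 20]  # name = 'a', ORDER BY score DESC

row_number, dense_rank, rank = [], [], []
prev = None
for i, s in enumerate(scores, start=1):
    row_number.append(i)  # always increments, ignores ties
    if s != prev:
        # new distinct value: rank jumps to the row position,
        # dense_rank just moves to the next consecutive rank
        rank.append(i)
        dense_rank.append(dense_rank[-1] + 1 if dense_rank else 1)
    else:
        # tie: both keep the previous rank
        rank.append(rank[-1])
        dense_rank.append(dense_rank[-1])
    prev = s

print(row_number)  # [1, 2, 3, 4, 5, 6, 7]
print(dense_rank)  # [1, 1, 1, 2, 2, 2, 3]
print(rank)        # [1, 1, 1, 4, 4, 4, 7]
```

The three printed lists match columns r1, d1, and r2 for name = 'a' in the query output above.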
