[HIVE] rank() dense_rank() row_number()的学习

在对数据进行汇总和分析的时候,经常需要用到排名相关的操作,下面是hive中经常用到的3个排名函数:
rank()
dense_rank()
row_number()

函数说明:
rank():在一组数据内按顺序显示排名顺序,值相同的情况下,排序数会重复,下个排序数会根据记录数接着排名。
dense_rank():在一组数据内按顺序显示排名顺序,值相同的情况下,排序数会重复,下个排序数会根据排名数接着排名。
row_number():在一组数据内按顺序显示排名顺序,值相同的情况下,排序数不会重复。

实际操作:
booksales表中有分类(catalog)字段,价格(price)字段,现在想查询一下每个分类(catalog)下价格(price)有高到低的排名情况

排名前数据查询:
hive> select * from booksales;
OK
bookid catalog price amount dt
B00000001 novel 54.0 5 20200718
B00000002 novel 33.0 9 20200718
B00000003 novel 54.0 11 20200718
B00000004 novel 59.0 10 20200718
B00000005 novel 53.0 4 20200718
B00000006 literature 24.0 10 20200718
B00000007 literature 35.0 8 20200718
B00000008 literature 35.0 4 20200718
B00000009 literature 35.0 7 20200718
B00000010 literature 35.0 9 20200718
B00000011 management 84.0 6 20200718
B00000012 management 73.0 7 20200718
B00000013 management 84.0 5 20200718
B00000014 management 52.0 11 20200718
B00000015 management 87.0 11 20200718
B00000016 management 45.0 7 20200718
B00000017 social 39.0 1 20200718
B00000018 social 16.0 9 20200718
B00000019 social 85.0 9 20200718
B00000020 social 92.0 4 20200718

排名后数据查询:
hive> select
> *
> ,rank() over (partition by catalog order by price desc) price_rank
> ,dense_rank() over (partition by catalog order by price desc) price_dense_rank
> ,row_number() over (partition by catalog order by price desc) price_row_number
> from booksales;
OK
bookid catalog price amount dt price_rank price_dense_rank price_row_number
B00000008 literature 35.0 4 20200718 1 1 1
B00000007 literature 35.0 8 20200718 1 1 2
B00000010 literature 35.0 9 20200718 1 1 3
B00000009 literature 35.0 7 20200718 1 1 4
B00000006 literature 24.0 10 20200718 5 2 5
B00000015 management 87.0 11 20200718 1 1 1
B00000013 management 84.0 5 20200718 2 2 2
B00000011 management 84.0 6 20200718 2 2 3
B00000012 management 73.0 7 20200718 4 3 4
B00000014 management 52.0 11 20200718 5 4 5
B00000016 management 45.0 7 20200718 6 5 6
B00000004 novel 59.0 10 20200718 1 1 1
B00000001 novel 54.0 5 20200718 2 2 2
B00000003 novel 54.0 11 20200718 2 2 3
B00000005 novel 53.0 4 20200718 4 3 4
B00000002 novel 33.0 9 20200718 5 4 5
B00000020 social 92.0 4 20200718 1 1 1
B00000019 social 85.0 9 20200718 2 2 2
B00000017 social 39.0 1 20200718 3 3 3
B00000018 social 16.0 9 20200718 4 4 4

你可能感兴趣的:([HIVE] rank() dense_rank() row_number()的学习)