Hive TopN Example

Problem:

Suppose we have the following data:
1,huangxiaoming,45,a-c-d-f
2,huangzitao,36,b-c-d-e
3,huanglei,41,c-d-e
4,liushishi,22,a-d-e
5,liudehua,39,e-f-d
6,liuyifei,35,a-d-e

Field meanings:
id,name,age,favors
(ID, name, age, hobbies)

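Note that the hobbies field of each record holds several values separated by "-". In the queries below, the data is stored in a table named hobby whose hobbies column is also named hobby.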

Requirement:
For each hobby, find the two oldest people (output: hobby, age, name).

Solution steps:

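1) Use split() to turn the hobby string into an array, then explode() with LATERAL VIEW to turn each record into one row per hobby.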

select id,name,age,h from hobby lateral view explode(split(hobby,"-")) hob as h;  
+-----+----------------+------+----+
| id  |      name      | age  | h  |
+-----+----------------+------+----+
| 1   | huangxiaoming  | 45   | a  |
| 1   | huangxiaoming  | 45   | c  |
| 1   | huangxiaoming  | 45   | d  |
| 1   | huangxiaoming  | 45   | f  |
| 2   | huangzitao     | 36   | b  |
| 2   | huangzitao     | 36   | c  |
| 2   | huangzitao     | 36   | d  |
| 2   | huangzitao     | 36   | e  |
| 3   | huanglei       | 41   | c  |
| 3   | huanglei       | 41   | d  |
| 3   | huanglei       | 41   | e  |
| 4   | liushishi      | 22   | a  |
| 4   | liushishi      | 22   | d  |
| 4   | liushishi      | 22   | e  |
| 5   | liudehua       | 39   | e  |
| 5   | liudehua       | 39   | f  |
| 5   | liudehua       | 39   | d  |
| 6   | liuyifei       | 35   | a  |
| 6   | liuyifei       | 35   | d  |
| 6   | liuyifei       | 35   | e  |
+-----+----------------+------+----+
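The explode step can be mimicked locally to check the table above. This is a minimal sketch in plain Python, not Hive, assuming the six sample lines are the whole input; the helper names parse and explode_hobbies are hypothetical.

```python
# Local simulation of: lateral view explode(split(hobby, "-")) hob as h
# Plain Python, not Hive; RAW holds the sample data from this post.
RAW = """1,huangxiaoming,45,a-c-d-f
2,huangzitao,36,b-c-d-e
3,huanglei,41,c-d-e
4,liushishi,22,a-d-e
5,liudehua,39,e-f-d
6,liuyifei,35,a-d-e"""


def parse(raw):
    """Split each CSV line into (id, name, age, hobby_string)."""
    rows = []
    for line in raw.strip().splitlines():
        id_, name, age, hobby = line.split(",")
        rows.append((int(id_), name, int(age), hobby))
    return rows


def explode_hobbies(rows):
    """Emit one (id, name, age, h) row per hobby, like LATERAL VIEW explode."""
    out = []
    for id_, name, age, hobby in rows:
        for h in hobby.split("-"):  # split(hobby, "-") -> array of hobbies
            out.append((id_, name, age, h))
    return out


if __name__ == "__main__":
    for row in explode_hobbies(parse(RAW)):
        print(row)
```

Each input record produces as many output rows as it has hobbies, which is why the 6 records become 20 rows.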

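2) Apply the window function row_number() over(): partition by h (the hobby) and order by age descending, so each row gets a row number within its hobby and the oldest person in each hobby gets 1.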

To make the intermediate result easy to inspect, save it to a table hobby_bak:

-- Explode the hobbies, then number the rows within each hobby, oldest first
create table hobby_bak as
select tt.id id, tt.name name, tt.age age, tt.h h,
       row_number() over(partition by tt.h order by tt.age desc) as rownum
from
(select id, name, age, h from hobby
 lateral view explode(split(hobby, "-")) hob as h
) as tt;

Result:

+---------------+-----------------+----------------+--------------+-------------------+
| hobby_bak.id  | hobby_bak.name  | hobby_bak.age  | hobby_bak.h  | hobby_bak.rownum  |
+---------------+-----------------+----------------+--------------+-------------------+
| 1             | huangxiaoming   | 45             | a            | 1                 |
| 6             | liuyifei        | 35             | a            | 2                 |
| 4             | liushishi       | 22             | a            | 3                 |
| 2             | huangzitao      | 36             | b            | 1                 |
| 1             | huangxiaoming   | 45             | c            | 1                 |
| 3             | huanglei        | 41             | c            | 2                 |
| 2             | huangzitao      | 36             | c            | 3                 |
| 1             | huangxiaoming   | 45             | d            | 1                 |
| 3             | huanglei        | 41             | d            | 2                 |
| 5             | liudehua        | 39             | d            | 3                 |
| 2             | huangzitao      | 36             | d            | 4                 |
| 6             | liuyifei        | 35             | d            | 5                 |
| 4             | liushishi       | 22             | d            | 6                 |
| 3             | huanglei        | 41             | e            | 1                 |
| 5             | liudehua        | 39             | e            | 2                 |
| 2             | huangzitao      | 36             | e            | 3                 |
| 6             | liuyifei        | 35             | e            | 4                 |
| 4             | liushishi       | 22             | e            | 5                 |
| 1             | huangxiaoming   | 45             | f            | 1                 |
| 5             | liudehua        | 39             | f            | 2                 |
+---------------+-----------------+----------------+--------------+-------------------+
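The numbering in step 2 can be reproduced in plain Python as a sanity check. This is a local sketch, not Hive: it assumes that sorting by (h, age descending) and then counting inside each group matches row_number() over(partition by h order by age desc) for this data, which has no age ties within a hobby. The function names are hypothetical.

```python
from itertools import groupby

# Sample data from this post (plain Python simulation, not Hive).
RAW = """1,huangxiaoming,45,a-c-d-f
2,huangzitao,36,b-c-d-e
3,huanglei,41,c-d-e
4,liushishi,22,a-d-e
5,liudehua,39,e-f-d
6,liuyifei,35,a-d-e"""


def exploded(raw):
    """One (id, name, age, h) row per hobby, as in step 1."""
    rows = []
    for line in raw.strip().splitlines():
        id_, name, age, hobby = line.split(",")
        for h in hobby.split("-"):
            rows.append((int(id_), name, int(age), h))
    return rows


def with_row_number(rows):
    """Mimic row_number() over(partition by h order by age desc).

    Sorting by (h, -age) puts each partition together, oldest first;
    enumerate then numbers rows inside each partition starting at 1.
    Python's sort is stable, so age ties would keep input order, whereas
    Hive does not guarantee any particular order among ties.
    """
    ordered = sorted(rows, key=lambda r: (r[3], -r[2]))
    result = []
    for _, part in groupby(ordered, key=lambda r: r[3]):
        for rownum, (id_, name, age, h) in enumerate(part, start=1):
            result.append((id_, name, age, h, rownum))
    return result


if __name__ == "__main__":
    for row in with_row_number(exploded(RAW)):
        print(row)
```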

3) Keep the records whose row number is less than 3, i.e. the two oldest people in each hobby.

select * from hobby_bak where rownum<3;
+---------------+-----------------+----------------+--------------+-------------------+
| hobby_bak.id  | hobby_bak.name  | hobby_bak.age  | hobby_bak.h  | hobby_bak.rownum  |
+---------------+-----------------+----------------+--------------+-------------------+
| 1             | huangxiaoming   | 45             | a            | 1                 |
| 6             | liuyifei        | 35             | a            | 2                 |
| 2             | huangzitao      | 36             | b            | 1                 |
| 1             | huangxiaoming   | 45             | c            | 1                 |
| 3             | huanglei        | 41             | c            | 2                 |
| 1             | huangxiaoming   | 45             | d            | 1                 |
| 3             | huanglei        | 41             | d            | 2                 |
| 3             | huanglei        | 41             | e            | 1                 |
| 5             | liudehua        | 39             | e            | 2                 |
| 1             | huangxiaoming   | 45             | f            | 1                 |
| 5             | liudehua        | 39             | f            | 2                 |
+---------------+-----------------+----------------+--------------+-------------------+
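For comparison, the same top-two result, in the (hobby, age, name) shape the requirement asks for, can be computed locally with a per-group heap instead of a window function. This swaps in Python's heapq.nlargest and is only a sketch for checking the output; top_n_per_hobby, as_rows and the parameter n are hypothetical names.

```python
import heapq
from collections import defaultdict

# Sample data from this post (plain Python, not Hive).
RAW = """1,huangxiaoming,45,a-c-d-f
2,huangzitao,36,b-c-d-e
3,huanglei,41,c-d-e
4,liushishi,22,a-d-e
5,liudehua,39,e-f-d
6,liuyifei,35,a-d-e"""


def top_n_per_hobby(raw, n=2):
    """Return {hobby: [(age, name), ...]} with the n oldest people per hobby.

    Uses heapq.nlargest on (age, name) tuples; age ties would be broken
    by name, which differs from Hive's row_number() behaviour.
    """
    groups = defaultdict(list)
    for line in raw.strip().splitlines():
        _, name, age, hobby = line.split(",")
        for h in hobby.split("-"):
            groups[h].append((int(age), name))
    return {h: heapq.nlargest(n, people) for h, people in sorted(groups.items())}


def as_rows(top):
    """Flatten to (hobby, age, name) rows, the output the requirement asks for."""
    return [(h, age, name) for h, people in top.items() for age, name in people]


if __name__ == "__main__":
    for row in as_rows(top_n_per_hobby(RAW)):
        print(row)
```

The heap keeps at most n candidates per hobby, while the Hive query numbers every row and filters afterwards; both yield the same 11 rows for this data.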
