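1. Ranking within groups in MySQL
The scores table below records the results of one exam for the students in each class:
id is the student number (primary key), class is the class, and Chinese, math and English are the scores for Chinese, mathematics and English:
How do we query the records of the top three students by Chinese score in each class?
Here is the SQL first: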
SELECT
    a.`id`,
    a.class,
    a.`name`,
    a.chinese,
    COUNT(b.`id`) + 1 AS rank_num
FROM scores a
LEFT JOIN scores b
    ON a.class = b.class
    AND a.chinese < b.chinese
GROUP BY
    a.`id`,
    a.class,
    a.`name`,
    a.chinese
HAVING rank_num <= 3
ORDER BY a.class ASC, a.chinese DESC
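The result is shown in the figure below:
As you can see, rank_num ranks the Chinese scores in descending order within each class. The steps are explained one by one below:
1.1 Left-joining the scores table to itself produces the following result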
SELECT
    a.*,
    b.*
FROM scores a
LEFT JOIN scores b
    ON a.class = b.class
    AND a.chinese < b.chinese
WHERE a.class = 215 -- only one class is selected, to keep the explanation simple
ORDER BY a.chinese DESC, a.name
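As you can see, 胡贵波, who has the highest Chinese score, matches no rows from table b because of the a.chinese < b.chinese condition. 马雨韵, in second place, matches only the top-ranked 胡贵波, and 丰雪, in third place, matches the two students ranked above.
Counting the ids from table b for each row of a therefore yields a rank number: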
SELECT
    a.`id`,
    a.class,
    a.`name`,
    a.chinese,
    COUNT(b.`id`) AS rank_num
FROM scores a
LEFT JOIN scores b
    ON a.class = b.class
    AND a.chinese < b.chinese
WHERE a.class = 215 -- only one class is selected, to keep the explanation simple
GROUP BY
    a.`id`,
    a.class,
    a.`name`,
    a.chinese
ORDER BY a.chinese DESC, a.name
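To see how this count behaves when two students have the same score, here is a minimal sketch on made-up data. The table scores_tie_demo, its rows and the alias higher_count are assumptions for illustration only and are not part of the original scores table.

```sql
-- Hypothetical demo table and rows, not the article's data.
CREATE TABLE scores_tie_demo (
    id      INT PRIMARY KEY,
    class   INT,
    name    VARCHAR(20),
    chinese INT
);
INSERT INTO scores_tie_demo (id, class, name, chinese) VALUES
    (1, 215, 'A', 95),
    (2, 215, 'B', 90),
    (3, 215, 'C', 90),
    (4, 215, 'D', 85);

-- For each student, count how many classmates scored strictly higher.
-- COUNT(b.id) ignores the NULL produced when the LEFT JOIN finds no match.
SELECT
    a.id,
    a.name,
    a.chinese,
    COUNT(b.id) AS higher_count
FROM scores_tie_demo a
LEFT JOIN scores_tie_demo b
    ON a.class = b.class
    AND a.chinese < b.chinese
GROUP BY a.id, a.name, a.chinese
ORDER BY a.chinese DESC, a.name;
-- Expected: A -> 0, B -> 1, C -> 1, D -> 3
```

Because the join condition uses a strict <, B and C tie at the same count, and D is counted after both of them, so the count skips from 1 to 3.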
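At this point the count already ranks the Chinese scores, but it starts from 0. By convention there is no 0th place, so add 1 to it, which gives count(b.`id`) + 1 as rank_num.
Removing the single-class filter and keeping only the rows with rank_num <= 3, the final SQL is: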
SELECT
    a.`id`,
    a.class,
    a.`name`,
    a.chinese,
    COUNT(b.`id`) + 1 AS rank_num
FROM scores a
LEFT JOIN scores b
    ON a.class = b.class
    AND a.chinese < b.chinese
GROUP BY
    a.`id`,
    a.class,
    a.`name`,
    a.chinese
HAVING rank_num <= 3
ORDER BY a.class ASC, a.chinese DESC
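For comparison, on MySQL 8.0 or later the same ranking can probably be written with a window function instead of a self-join. The sketch below is an alternative, not the method used above. The table scores_demo and its rows are hypothetical, made up so that the example is self-contained.

```sql
-- Hypothetical sample data; the real scores table has more columns and rows.
CREATE TABLE scores_demo (
    id      INT PRIMARY KEY,
    class   INT,
    name    VARCHAR(20),
    chinese INT
);
INSERT INTO scores_demo (id, class, name, chinese) VALUES
    (1, 215, 'A', 95),
    (2, 215, 'B', 90),
    (3, 215, 'C', 90),
    (4, 215, 'D', 85),
    (5, 216, 'E', 88);

-- RANK() gives tied scores the same rank and leaves a gap after them,
-- which matches count(b.id) + 1 from the self-join.
-- A window function cannot be filtered in WHERE at the same query level,
-- so the ranking is computed in a derived table first.
SELECT id, class, name, chinese, rank_num
FROM (
    SELECT
        id,
        class,
        name,
        chinese,
        RANK() OVER (PARTITION BY class ORDER BY chinese DESC) AS rank_num
    FROM scores_demo
) ranked
WHERE rank_num <= 3
ORDER BY class ASC, chinese DESC;
```

With this sample data, class 215 returns A (rank 1) plus B and C (both rank 2), while D gets rank 4 and is filtered out. Class 216 returns E. If tied students should not cause a gap, DENSE_RANK() could be used instead, and ROW_NUMBER() if exactly three rows per class are required.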
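2. Ranking within groups in Hive
Table a is shown below: id is the person's id and date is the check-in date. The goal is to fetch each user's first two days of records.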
id | date |
---|---|
1 | 2017-01-01 |
1 | 2017-01-02 |
1 | 2017-01-03 |
2 | 2017-01-01 |
2 | 2017-01-02 |
2 | 2017-01-03 |
3 | 2017-01-01 |
3 | 2017-01-02 |
4 | 2017-01-01 |
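In Hive, ranking within groups can be done with row_number() over(partition by column1 order by column2):
SELECT
    id,
    `date`, -- DATE is a reserved keyword in recent Hive versions, so it is quoted
    row_number() OVER (PARTITION BY id ORDER BY `date`) AS rank
FROM a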
The result is as follows:
id | date | rank |
---|---|---|
1 | 2017-01-01 | 1 |
1 | 2017-01-02 | 2 |
1 | 2017-01-03 | 3 |
2 | 2017-01-01 | 1 |
2 | 2017-01-02 | 2 |
2 | 2017-01-03 | 3 |
3 | 2017-01-01 | 1 |
3 | 2017-01-02 | 2 |
4 | 2017-01-01 | 1 |
Filtering this result on rank (here, rank <= 2) gives the desired records.
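The filtering step could look like the sketch below. A window function cannot be referenced in the WHERE clause of the same query level, so the ranking is wrapped in a subquery. The aliases t and rn are names chosen for this sketch (rn is used instead of rank to avoid confusion with the rank() function), and the query assumes table a as shown above.

```sql
-- Keep each user's first two check-in days from table a.
-- `date` is quoted with backticks because DATE is a reserved keyword
-- in recent Hive versions.
SELECT
    t.id,
    t.`date`
FROM (
    SELECT
        id,
        `date`,
        row_number() OVER (PARTITION BY id ORDER BY `date`) AS rn
    FROM a
) t
WHERE t.rn <= 2;
```

For the sample table this should return 2017-01-01 and 2017-01-02 for ids 1, 2 and 3, and only 2017-01-01 for id 4, which has a single record. row_number() always assigns distinct numbers within a partition. If a user could have several records on the same date and they should share a position, rank() or dense_rank() may be the better fit.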