Hive and MySQL: Sorting Within Groups and Taking the Top N Records

Original article: http://chenxiaoqiong.com/articles/hiverownum/

Example requirement

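There is an employee table test(id, name, dept, time), where name is the employee's name, dept is the department and time is the hire date. Task: for each department, find the employee who joined earliest. The table data is shown in the figure below:
[Figure: data in table test]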

Hive implementation

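Syntax: row_number() over (partition BY columnA order by columnB desc)

partition by: splits the rows into groups (partitions) by the given column; row numbering restarts at 1 in each partition, and unlike GROUP BY, no rows are collapsed;

order by: the sort order inside each partition; ascending by default, add desc for descending;

So this expression partitions the rows by columnA and numbers the rows of each partition in descending order of columnB.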
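To see the numbering rule without a Hive cluster, here is a small Python sketch that runs the same window expression on SQLite through the standard sqlite3 module (window functions need SQLite 3.25 or newer). The rows are made-up sample data, since the original data was only shown as an image, and number_rows is a hypothetical helper name.

```python
import sqlite3

# Hypothetical sample rows (id, name, dept, time); the article's real data
# was only shown as an image. ISO dates sort chronologically as text.
ROWS = [
    (1, "Alice", "Sales", "2015-03-01"),
    (2, "Bob", "Sales", "2014-07-15"),
    (3, "Carol", "IT", "2016-01-10"),
    (4, "Dave", "IT", "2013-09-30"),
    (5, "Eve", "IT", "2017-05-20"),
]


def number_rows(rows, descending=False):
    """Number rows per dept by hire time; returns (name, dept, time, row_num)."""
    direction = "DESC" if descending else "ASC"  # fixed keyword, never user input
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test (id INTEGER, name TEXT, dept TEXT, time TEXT)")
        conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
        # row_number() restarts at 1 for every dept partition.
        return conn.execute(
            "SELECT name, dept, time, "
            f"row_number() OVER (PARTITION BY dept ORDER BY time {direction}) AS row_num "
            "FROM test ORDER BY dept, row_num"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in number_rows(ROWS):
        print(row)
```

With descending=True, the latest hire in each department gets row_num 1 instead of the earliest.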


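Applying this syntax (partition by department, sort by hire time in ascending order) gives each row its row_num.
Query:
SELECT *,row_number() over (partition BY dept ORDER BY time ASC) AS row_num FROM test;
Result:
[Figure: rows of test with the computed row_num column]
Finally, keeping only the rows with row_num = 1 gives the desired result (to keep the first n hires of each department, filter on row_num <= n instead).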
Complete query:
SELECT name,dept,time FROM (
SELECT *,row_number() over (partition BY dept ORDER BY time ASC) AS row_num FROM test ) AS t
WHERE row_num<=1;
Final result:
[Figure: earliest-hired employee of each department]
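The same subquery-then-filter pattern generalizes from the earliest hire to the first n hires per department. Below is a sketch in Python on SQLite (3.25+ for window functions), again with hypothetical sample rows and a hypothetical top_n_per_dept helper; n is passed as a bound parameter rather than pasted into the SQL string.

```python
import sqlite3

# Hypothetical sample rows (id, name, dept, time), same shape as table test.
ROWS = [
    (1, "Alice", "Sales", "2015-03-01"),
    (2, "Bob", "Sales", "2014-07-15"),
    (3, "Carol", "IT", "2016-01-10"),
    (4, "Dave", "IT", "2013-09-30"),
    (5, "Eve", "IT", "2017-05-20"),
]


def top_n_per_dept(rows, n):
    """Return the n earliest hires of each dept as (name, dept, time) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test (id INTEGER, name TEXT, dept TEXT, time TEXT)")
        conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
        # Inner query numbers the rows per dept; outer query keeps row_num <= n.
        return conn.execute(
            "SELECT name, dept, time FROM ("
            " SELECT *, row_number() OVER (PARTITION BY dept ORDER BY time ASC) AS row_num"
            " FROM test"
            ") AS t WHERE row_num <= ? ORDER BY dept, row_num",
            (n,),
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(top_n_per_dept(ROWS, 1))  # earliest hire per dept
    print(top_n_per_dept(ROWS, 2))  # two earliest hires per dept
```

The filter has to live in the outer query because a window function's result cannot be referenced in the WHERE clause of the same SELECT.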

MySQL implementation

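I have seen blog posts claiming that MySQL supports the same (partition BY columnA order by columnB desc) syntax, but my test produced:
[Err] 1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(partition BY dept ORDER BY time ASC) AS row_num FROM test' at line 1
This is expected on MySQL versions before 8.0: window functions such as row_number() were only added in MySQL 8.0. Below is how I implemented the requirement on a version without them.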

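  1. Compute row_num
    Query:
    set @rownum=0;
    SELECT *,@rownum:=@rownum+1 as row_num FROM test ORDER BY time;
    Result:
    [Figure: rows of test numbered by hire time]
    Note: the MySQL manual states that the evaluation order of expressions involving user variables is undefined, so the numbering is not strictly guaranteed to follow ORDER BY; on MySQL 8.0 and later, row_number() is the safer choice.
  2. Group by department and keep the record with the smallest row_num
    To keep things readable, we store the result of the query above in a table test_row.
    Complete statements:
    set @rownum=0;
    CREATE TABLE test_row AS SELECT *,@rownum:=@rownum+1 as row_num FROM test ORDER BY time;
    SELECT name,dept,time FROM test_row WHERE row_num in (SELECT min(row_num) FROM test_row GROUP BY dept);
    Result: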
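The two MySQL steps can be mirrored in plain Python to check the logic: one global counter in hire-date order, then the smallest counter value per department. This is a sketch with made-up rows and a hypothetical earliest_per_dept function, not MySQL code; Python's sorted() makes the processing order explicit.

```python
from operator import itemgetter

# Hypothetical sample rows (id, name, dept, time); ISO dates sort as text.
ROWS = [
    (1, "Alice", "Sales", "2015-03-01"),
    (2, "Bob", "Sales", "2014-07-15"),
    (3, "Carol", "IT", "2016-01-10"),
    (4, "Dave", "IT", "2013-09-30"),
    (5, "Eve", "IT", "2017-05-20"),
]


def earliest_per_dept(rows):
    """Mirror the two MySQL steps; returns one (name, dept, time) per dept."""
    # Step 1 (the test_row table): a single global counter in hire-time
    # order, like @rownum:=@rownum+1 ... ORDER BY time.
    test_row = [
        (name, dept, time, row_num)
        for row_num, (_id, name, dept, time) in enumerate(
            sorted(rows, key=itemgetter(3)), start=1
        )
    ]
    # Step 2: SELECT min(row_num) FROM test_row GROUP BY dept.
    min_per_dept = {}
    for _name, dept, _time, row_num in test_row:
        min_per_dept[dept] = min(row_num, min_per_dept.get(dept, row_num))
    wanted = set(min_per_dept.values())
    # Step 3: WHERE row_num IN (...).
    return [(name, dept, time) for name, dept, time, row_num in test_row if row_num in wanted]


if __name__ == "__main__":
    print(earliest_per_dept(ROWS))
```

Because row numbers are unique, exactly one row survives per department, which is also why this approach does not directly extend to n rows.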

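The MySQL approach above returns one row per department. A common pattern for the first n rows per department on MySQL versions without window functions keeps a counter that resets whenever the department changes, often written with two session variables (called @rn and @dept here purely for illustration). The Python sketch below shows only that counter logic, with hypothetical data and a hypothetical top_n_with_counter function; it relies on rows arriving sorted by (dept, time), which sorted() guarantees here.

```python
from operator import itemgetter

# Hypothetical sample rows (id, name, dept, time); ISO dates sort as text.
ROWS = [
    (1, "Alice", "Sales", "2015-03-01"),
    (2, "Bob", "Sales", "2014-07-15"),
    (3, "Carol", "IT", "2016-01-10"),
    (4, "Dave", "IT", "2013-09-30"),
    (5, "Eve", "IT", "2017-05-20"),
]


def top_n_with_counter(rows, n):
    """Keep the first n rows per dept using a counter that resets per dept."""
    result = []
    prev_dept = None  # plays the role of an illustrative @dept variable
    rn = 0            # plays the role of an illustrative @rn variable
    # The counter is only correct if rows are grouped by dept and sorted by time.
    for _id, name, dept, time in sorted(rows, key=itemgetter(2, 3)):
        rn = rn + 1 if dept == prev_dept else 1
        prev_dept = dept
        if rn <= n:
            result.append((name, dept, time, rn))
    return result


if __name__ == "__main__":
    for row in top_n_with_counter(ROWS, 2):
        print(row)
```

The counter produces the same numbers as row_number() over (partition by dept order by time), which is why MySQL 8.0's window function makes this workaround unnecessary.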