Hive分组TopN

People表明细如下:


image.png

需求:按照性别分组,求分组后年龄最大的两个年龄的人员信息

SELECT * FROM
(select id,name,age,sex,
row_number()  OVER(partition by sex order by age desc) r
 FROM safety.dw_hive_people_df) t
 WHERE r<=2
image.png

学生技能表如下


image.png

需求:按照技能分组,求分组后年龄最大的两个学生信息

先将skills炸开

SELECT id,name,age,skill,
row_number() over(partition by skill order by age desc ) r
 FROM safety.dw_hive_stu_skils_df 
LATERAL VIEW explode(split(skills,'-')) tb_view as skill
image.png

如果是SparkSQL可以直接写成

SELECT id,name,age,explode(split(skills,'-')) as skill
 FROM safety.dw_hive_stu_skils_df 

最终统计SQL为:

SELECT id,name,age,skill FROM 
(
SELECT id,name,age,skill,
row_number() over(partition by skill order by age desc ) r
 FROM safety.dw_hive_stu_skils_df 
LATERAL VIEW explode(split(skills,'-')) tb_view as skill
) t
where t.r<=2
image.png

你可能感兴趣的:(Hive分组TopN)