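- The salary table holds basic salary information for each employee: employee number, department number and salary.
- Row 1: employee 10001 is in department 1 with a salary of 60117 yuan;
- Row 2: employee 10002 is in department 2 with a salary of 92102 yuan;
- ...
- Row 10: employee 10010 is in department 1 with a salary of 76884 yuan.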
Question: for each department, find the average salary after removing the highest and the lowest salary, rounded to an integer.
create table if not exists salary
(
    emp_num string comment 'employee number',
    dep_num string comment 'department number',
    salary  string comment 'salary'
) comment 'salary table';
insert overwrite table salary
values ('10001','1','60117'),
('10002','2','92102'),
('10003','2','86074'),
('10004','1','66596'),
('10005','1','66961'),
('10006','2','81046'),
('10007','2','94333'),
('10008','1','75286'),
('10009','2','85994'),
('10010','1','76884');
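Breaking the problem down, there are three key points:
1. Per department (group by department).
2. Remove the highest and lowest salaries (first identify each department's highest and lowest salary, then filter those rows out).
3. On top of the two steps above, compute the average and round it to an integer.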
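Before writing any SQL, the three steps can be checked with a small plain-Python sketch over the ten sample rows. It is only a reference calculation to compare the SQL results against, not part of the Hive solution; `ROWS` and `trimmed_avg_by_dept` are names made up for this illustration, and salaries are held as integers here rather than strings.

```python
from collections import defaultdict

# The ten sample rows from the insert statement: (emp_num, dep_num, salary).
# Salaries are integers here; in the Hive table they are strings.
ROWS = [
    ("10001", "1", 60117), ("10002", "2", 92102), ("10003", "2", 86074),
    ("10004", "1", 66596), ("10005", "1", 66961), ("10006", "2", 81046),
    ("10007", "2", 94333), ("10008", "1", 75286), ("10009", "2", 85994),
    ("10010", "1", 76884),
]


def trimmed_avg_by_dept(rows):
    """Hypothetical helper: average salary per department after dropping
    one lowest and one highest salary, rounded to an integer."""
    # Step 1: group salaries by department.
    groups = defaultdict(list)
    for _emp, dep, sal in rows:
        groups[dep].append(sal)

    result = {}
    for dep, sals in groups.items():
        # Step 2: sort and slice off exactly one minimum and one maximum.
        middle = sorted(sals)[1:-1]
        # Step 3: average what is left and round; departments with two or
        # fewer employees have nothing left and are skipped.
        if middle:
            result[dep] = round(sum(middle) / len(middle))
    return result


print(trimmed_avg_by_dept(ROWS))  # {'1': 69614, '2': 88057}
```

Department 1 keeps 66596, 66961 and 75286 (mean 69614.33...), department 2 keeps 85994, 86074 and 92102 (mean 88056.67...), so the SQL solutions should round to 69614 and 88057.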
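Method 1: find the highest and lowest salary in each department and filter them out, using the ranking function row_number() to tag the rows. Numbering each department once in ascending order (rn1) and once in descending order (rn2) means rn1 = 1 marks the lowest salary and rn2 = 1 the highest: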
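select
    emp_num,
    dep_num,
    salary,
    -- salary is a string column; cast it so the sort is numeric, not lexicographic
    row_number() over (partition by dep_num order by cast(salary as int)) as rn1,
    row_number() over (partition by dep_num order by cast(salary as int) desc) as rn2
from salary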
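To see what the two tags look like, here is a plain-Python sketch that emulates the two row_number() windows on the sample data (the helper name `tag_rn1_rn2` is hypothetical). Within each department, rn1 counts up from the lowest salary and rn2 from the highest; as with row_number(), tied salaries would still get distinct numbers.

```python
from collections import defaultdict

# Sample rows: (emp_num, dep_num, salary), salaries as integers.
ROWS = [
    ("10001", "1", 60117), ("10002", "2", 92102), ("10003", "2", 86074),
    ("10004", "1", 66596), ("10005", "1", 66961), ("10006", "2", 81046),
    ("10007", "2", 94333), ("10008", "1", 75286), ("10009", "2", 85994),
    ("10010", "1", 76884),
]


def tag_rn1_rn2(rows):
    """Emulate
      row_number() over (partition by dep_num order by salary)      as rn1
      row_number() over (partition by dep_num order by salary desc) as rn2
    and return {emp_num: (rn1, rn2)}."""
    partitions = defaultdict(list)
    for emp, dep, sal in rows:
        partitions[dep].append((emp, sal))

    tags = {}
    for members in partitions.values():
        ascending = sorted(members, key=lambda m: m[1])
        descending = sorted(members, key=lambda m: m[1], reverse=True)
        # Position in each ordering, starting at 1 like row_number().
        rn1 = {emp: i for i, (emp, _sal) in enumerate(ascending, start=1)}
        rn2 = {emp: i for i, (emp, _sal) in enumerate(descending, start=1)}
        for emp, _sal in members:
            tags[emp] = (rn1[emp], rn2[emp])
    return tags


for emp, (rn1, rn2) in sorted(tag_rn1_rn2(ROWS).items()):
    print(emp, rn1, rn2)
```

With five distinct salaries per department, rn1 + rn2 is always 6, so the two tags are mirror images of each other.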
Per the requirement, keep only the middle rows, i.e. rows where rn1 > 1 and rn2 > 1 both hold. The resulting SQL is:
select *
from (
    select
        emp_num,
        dep_num,
        salary,
        -- salary is a string column; cast it so the sort is numeric, not lexicographic
        row_number() over (partition by dep_num order by cast(salary as int)) as rn1,
        row_number() over (partition by dep_num order by cast(salary as int) desc) as rn2
    from salary
) tmp1
where rn1 > 1
  and rn2 > 1;
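The filter can be tried without a Hive cluster. This sketch uses Python's built-in sqlite3 module as a stand-in (an assumption: SQLite 3.25 or newer, which added window functions) to load the sample rows and run the same query. salary is stored as TEXT to mirror the Hive string column, so the ORDER BY casts it to INTEGER.

```python
import sqlite3

# Sample rows as strings, matching the Hive table's string columns.
ROWS = [
    ("10001", "1", "60117"), ("10002", "2", "92102"), ("10003", "2", "86074"),
    ("10004", "1", "66596"), ("10005", "1", "66961"), ("10006", "2", "81046"),
    ("10007", "2", "94333"), ("10008", "1", "75286"), ("10009", "2", "85994"),
    ("10010", "1", "76884"),
]

FILTER_SQL = """
SELECT emp_num, dep_num, salary
FROM (
    SELECT emp_num, dep_num, salary,
           row_number() OVER (PARTITION BY dep_num
                              ORDER BY CAST(salary AS INTEGER)) AS rn1,
           row_number() OVER (PARTITION BY dep_num
                              ORDER BY CAST(salary AS INTEGER) DESC) AS rn2
    FROM salary
) AS tmp1
WHERE rn1 > 1 AND rn2 > 1
ORDER BY dep_num, emp_num
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salary (emp_num TEXT, dep_num TEXT, salary TEXT)")
conn.executemany("INSERT INTO salary VALUES (?, ?, ?)", ROWS)
kept = conn.execute(FILTER_SQL).fetchall()
for row in kept:
    print(row)
```

Each department keeps three of its five rows: 10001 and 10010 drop out of department 1, and 10006 and 10007 drop out of department 2.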
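Method 2: to filter out the maximum and minimum, row_number() only needs to sort once. First compute the total number of rows cnt in each department, then number the rows by salary in ascending order. Since rn = 1 is the lowest salary and rn = cnt the highest, the intermediate rows we need satisfy 1 < rn < cnt. The SQL is: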
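select *
from (
    select
        emp_num,
        dep_num,
        salary,
        -- total number of rows in the department
        count(1) over (partition by dep_num) as cnt,
        -- rank salaries in ascending order within the department
        -- (cast because salary is a string column; otherwise the sort is lexicographic)
        row_number() over (partition by dep_num order by cast(salary as int)) as rn
    from salary
) tmp1
where rn > 1
  and rn < cnt;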
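A plain-Python sketch of the same single-sort idea (the helper name `middle_rows` is hypothetical): cnt plays the role of count(1) over the partition, rn the ascending row_number(), and a row survives only when 1 < rn < cnt.

```python
from collections import defaultdict

# Sample rows: (emp_num, dep_num, salary), salaries as integers.
ROWS = [
    ("10001", "1", 60117), ("10002", "2", 92102), ("10003", "2", 86074),
    ("10004", "1", 66596), ("10005", "1", 66961), ("10006", "2", 81046),
    ("10007", "2", 94333), ("10008", "1", 75286), ("10009", "2", 85994),
    ("10010", "1", 76884),
]


def middle_rows(rows):
    """Keep rows whose ascending rank rn satisfies 1 < rn < cnt per department."""
    partitions = defaultdict(list)
    for emp, dep, sal in rows:
        partitions[dep].append((emp, sal))

    kept = []
    for dep, members in partitions.items():
        cnt = len(members)  # count(1) over (partition by dep_num)
        ordered = sorted(members, key=lambda m: m[1])  # the single ascending sort
        for rn, (emp, sal) in enumerate(ordered, start=1):
            if 1 < rn < cnt:  # rn == 1 is the minimum, rn == cnt the maximum
                kept.append((emp, dep, sal))
    return kept


for row in middle_rows(ROWS):
    print(row)
```

Compared with Method 1, the second (descending) sort is replaced by a count over the same partition, which needs no ordering at all.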
Putting it all together, the final SQL for the average is:
-- Method 1:
select
    dep_num,
    round(avg(salary), 0) as avg_salary
from (
    select
        emp_num,
        dep_num,
        salary,
        -- salary is a string column; cast it so the sort is numeric, not lexicographic
        row_number() over (partition by dep_num order by cast(salary as int)) as rn1,
        row_number() over (partition by dep_num order by cast(salary as int) desc) as rn2
    from salary
) tmp1
where rn1 > 1
  and rn2 > 1
group by dep_num;
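The complete Method 1 query can be run end to end with the same assumptions as before: Python's sqlite3 module as a stand-in for Hive (SQLite 3.25 or newer for window functions) and salary stored as TEXT. The explicit CAST inside avg() is an addition for SQLite; Hive converts the string implicitly.

```python
import sqlite3

# Sample rows as strings, matching the Hive table's string columns.
ROWS = [
    ("10001", "1", "60117"), ("10002", "2", "92102"), ("10003", "2", "86074"),
    ("10004", "1", "66596"), ("10005", "1", "66961"), ("10006", "2", "81046"),
    ("10007", "2", "94333"), ("10008", "1", "75286"), ("10009", "2", "85994"),
    ("10010", "1", "76884"),
]

METHOD1_SQL = """
SELECT dep_num, round(avg(CAST(salary AS INTEGER)), 0) AS avg_salary
FROM (
    SELECT emp_num, dep_num, salary,
           row_number() OVER (PARTITION BY dep_num
                              ORDER BY CAST(salary AS INTEGER)) AS rn1,
           row_number() OVER (PARTITION BY dep_num
                              ORDER BY CAST(salary AS INTEGER) DESC) AS rn2
    FROM salary
) AS tmp1
WHERE rn1 > 1 AND rn2 > 1
GROUP BY dep_num
ORDER BY dep_num
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salary (emp_num TEXT, dep_num TEXT, salary TEXT)")
conn.executemany("INSERT INTO salary VALUES (?, ?, ?)", ROWS)
result = conn.execute(METHOD1_SQL).fetchall()
print(result)  # [('1', 69614.0), ('2', 88057.0)]
```

SQLite's round(X, 0) returns a floating-point value, hence the trailing .0; cast the result to an integer type if a true integer column is required.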
-- Method 2:
select
    dep_num,
    round(avg(salary), 0) as avg_salary
from (
    select
        emp_num,
        dep_num,
        salary,
        -- salary is a string column; cast it so the sort is numeric, not lexicographic
        row_number() over (partition by dep_num order by cast(salary as int)) as rn,
        count(1) over (partition by dep_num) as cnt
    from salary
) tmp1
where rn > 1
  and rn < cnt
group by dep_num;
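The same sqlite3 stand-in (assuming SQLite 3.25 or newer, salary stored as TEXT, explicit CASTs added for SQLite) can run the complete Method 2 query, which should produce the same averages as Method 1 on this data.

```python
import sqlite3

# Sample rows as strings, matching the Hive table's string columns.
ROWS = [
    ("10001", "1", "60117"), ("10002", "2", "92102"), ("10003", "2", "86074"),
    ("10004", "1", "66596"), ("10005", "1", "66961"), ("10006", "2", "81046"),
    ("10007", "2", "94333"), ("10008", "1", "75286"), ("10009", "2", "85994"),
    ("10010", "1", "76884"),
]

METHOD2_SQL = """
SELECT dep_num, round(avg(CAST(salary AS INTEGER)), 0) AS avg_salary
FROM (
    SELECT emp_num, dep_num, salary,
           row_number() OVER (PARTITION BY dep_num
                              ORDER BY CAST(salary AS INTEGER)) AS rn,
           count(1) OVER (PARTITION BY dep_num) AS cnt
    FROM salary
) AS tmp1
WHERE rn > 1 AND rn < cnt
GROUP BY dep_num
ORDER BY dep_num
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salary (emp_num TEXT, dep_num TEXT, salary TEXT)")
conn.executemany("INSERT INTO salary VALUES (?, ?, ?)", ROWS)
result = conn.execute(METHOD2_SQL).fetchall()
print(result)  # [('1', 69614.0), ('2', 88057.0)]
```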
Summary of the techniques used in this case:
(1) Using the ranking function row_number()
(2) Ways to exclude the maximum and minimum values
Note: this article is based on "HiveSql Interview Question 12: How to Analyze the Average Salary After Removing the Maximum and Minimum (ByteDance)", CSDN blog: https://blog.csdn.net/godlovedaniel/article/details/112372060?spm=1001.2014.3001.5501