The Employee table holds all employee records. Each employee has an Id, a Name, a Salary, and a DepartmentId.
+----+-------+--------+--------------+
| Id | Name | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1 | Joe | 85000 | 1 |
| 2 | Henry | 80000 | 2 |
| 3 | Sam | 60000 | 2 |
| 4 | Max | 90000 | 1 |
| 5 | Janet | 69000 | 1 |
| 6 | Randy | 85000 | 1 |
| 7 | Will | 70000 | 1 |
+----+-------+--------+--------------+
The Department table holds all departments of the company.
+----+----------+
| Id | Name |
+----+----------+
| 1 | IT |
| 2 | Sales |
+----+----------+
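Write a SQL query that finds every employee who earns one of the top three salaries in their department. For example, given the tables above, the query should return: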
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Max | 90000 |
| IT | Randy | 85000 |
| IT | Joe | 85000 |
| IT | Will | 70000 |
| Sales | Henry | 80000 |
| Sales | Sam | 60000 |
+------------+----------+--------+
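Explanation:
In the IT department, Max earns the highest salary, Randy and Joe both earn the second-highest salary, and Will earns the third-highest. The Sales department has only two employees: Henry earns the highest salary and Sam the second-highest.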
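Approach 1 (dense_rank window function):
Use dense_rank to rank salaries within each department from highest to lowest, then keep only the rows whose dense_rank is <= 3.
Code, version 1:
With large data volumes, try to avoid joining tables on a numeric column such as salary; the performance becomes hard to predict.
# beats 74.59% of submissions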
select
Department
, Employee
, Salary
from (
select
t2.name as department
, t1.name as employee
, t1.salary
, dense_rank() over (partition by departmentid order by salary desc) as rk
from employee as t1
inner join department as t2 on t1.departmentid = t2.id
) as t3
where rk <= 3;
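Code, version 2:
You can also compute dense_rank first and join the dimension table afterwards (in Hive or Spark, use a map join when needed); in distributed computation this tends to perform better.
# faster, beats 90.91% of submissions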
select
t2.name AS Department
, t1.Employee
, t1.Salary
from (
select
DepartmentId,
name as employee,
salary,
dense_rank() over (partition by departmentid order by salary desc) as rnk
from employee
) t1
inner join department as t2 on t1.departmentid = t2.id
where t1.rnk <= 3;
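Both versions rely on the same dense_rank semantics: tied salaries share a rank and the next distinct salary gets the next consecutive rank. The Python sketch below only illustrates that ranking rule on the sample rows from the tables above; the helper name top_three_by_dense_rank and the tuple layout are hypothetical, chosen for this illustration, and it is not a substitute for the SQL.

```python
from itertools import groupby

# Sample rows copied from the Employee table above: (name, salary, department_id).
EMPLOYEES = [
    ("Joe", 85000, 1), ("Henry", 80000, 2), ("Sam", 60000, 2), ("Max", 90000, 1),
    ("Janet", 69000, 1), ("Randy", 85000, 1), ("Will", 70000, 1),
]
DEPARTMENTS = {1: "IT", 2: "Sales"}


def top_three_by_dense_rank(employees, departments):
    """Hypothetical helper: emulate `dense_rank() <= 3` per department."""
    result = []
    # groupby needs its input sorted by the grouping key (the partition).
    by_dept = sorted(employees, key=lambda e: e[2])
    for dept_id, rows in groupby(by_dept, key=lambda e: e[2]):
        rows = list(rows)
        # Distinct salaries in descending order; index + 1 is the dense rank.
        distinct = sorted({salary for _, salary, _ in rows}, reverse=True)
        rank = {salary: i + 1 for i, salary in enumerate(distinct)}
        for name, salary, _ in rows:
            if rank[salary] <= 3:
                result.append((departments[dept_id], name, salary))
    return result


result = top_three_by_dense_rank(EMPLOYEES, DEPARTMENTS)
```

Janet is the only employee dropped here, because 69000 is the fourth distinct salary in IT.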
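Approach 2 (MySQL user-defined variables):
S1: Break the problem down and first work out the salary rank within each department. Sort the employee rows by department ascending and salary descending; each row then falls into one of two cases:
(1) The current row has the same department ID as the previous row: if the salary is also the same, the rank stays unchanged; if the salary differs, the rank increases by 1.
(2) The current row has a different department ID from the previous row: this is the first row of a new department, so the rank is reset to 1.
The code for this step is: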
SELECT
name
, Salary
, DepartmentId
, CASE
WHEN @preDeptId = DepartmentId AND @preSal = Salary THEN @rnk := @rnk
WHEN @preDeptId = DepartmentId AND @preSal != Salary THEN @rnk := @rnk + 1
WHEN @preDeptId != DepartmentId THEN @rnk := 1
ELSE @rnk := 1 # first row: @preDeptId is NULL, so @preDeptId != DepartmentId evaluates to NULL and this branch applies
END AS RNK
, @preDeptId := DepartmentId
, @preSal := Salary
FROM Employee, (SELECT @preDeptId := null, @preSal := null, @rnk := 1) as init
ORDER BY DepartmentId, Salary DESC
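The two cases in S1 amount to a single pass over the sorted rows while remembering the previous department and salary. The sketch below restates that scan in Python on the sample rows from the tables above; rank_by_scan is a hypothetical name used only here. Unlike the SQL, where the result depends on the order in which MySQL evaluates the variable assignments, the order of operations in this loop is explicit.

```python
# Sample rows copied from the Employee table above: (name, salary, department_id).
EMPLOYEES = [
    ("Joe", 85000, 1), ("Henry", 80000, 2), ("Sam", 60000, 2), ("Max", 90000, 1),
    ("Janet", 69000, 1), ("Randy", 85000, 1), ("Will", 70000, 1),
]


def rank_by_scan(employees):
    """Hypothetical helper: one pass over sorted rows, mirroring the @rnk logic."""
    # Equivalent of ORDER BY DepartmentId, Salary DESC.
    ordered = sorted(employees, key=lambda e: (e[2], -e[1]))
    prev_dept, prev_salary, rnk = None, None, 1
    ranked = []
    for name, salary, dept_id in ordered:
        if prev_dept == dept_id and prev_salary == salary:
            pass          # same department, same salary: rank unchanged
        elif prev_dept == dept_id:
            rnk += 1      # same department, different salary: next rank
        else:
            rnk = 1       # first row of a new department (or the very first row)
        ranked.append((name, salary, dept_id, rnk))
        prev_dept, prev_salary = dept_id, salary
    return ranked


ranked = rank_by_scan(EMPLOYEES)
ranks = {name: rnk for name, _, _, rnk in ranked}
```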
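S2: Filter the rows to get the expected result. Join the department table to get the department name, then apply the condition rnk <= 3 to keep the employees with the top three salaries.
Code:
# beats 92.10% of submissions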
SELECT t2.Name Department, t1.Name Employee, t1.Salary
FROM
( # rank computed with user-defined variables: each employee's salary rank within the department
SELECT
name
, Salary
, DepartmentId
, CASE
WHEN @preDeptId = DepartmentId AND @preSal = Salary THEN @rnk := @rnk
WHEN @preDeptId = DepartmentId AND @preSal != Salary THEN @rnk := @rnk + 1
WHEN @preDeptId != DepartmentId THEN @rnk := 1
ELSE @rnk := 1 # first row: @preDeptId is NULL, so @preDeptId != DepartmentId evaluates to NULL and this branch applies
END AS RNK
, @preDeptId := DepartmentId
, @preSal := Salary
FROM Employee, (SELECT @preDeptId := null, @preSal := null, @rnk := 1) as init
ORDER BY DepartmentId, Salary DESC
) as t1
INNER JOIN Department as t2 ON t1.DepartmentId = t2.Id
where t1.RNK <= 3;
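Approach 3 (correlated subquery):
First find the top three salaries within each department. A salary is among the top three when fewer than three distinct salaries in the same department are higher than it: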
SELECT e1.Salary
FROM Employee AS e1
WHERE 3 > (
SELECT count(DISTINCT e2.Salary)
FROM Employee AS e2
WHERE e1.Salary < e2.Salary AND e1.DepartmentId = e2.DepartmentId
);
For example:
Let e1 = e2 = [4,5,6,7,8] (the salaries within one department).
e1.Salary = 4: e2.Salary can be [5,6,7,8], so count(DISTINCT e2.Salary) = 4
e1.Salary = 5: e2.Salary can be [6,7,8], so count(DISTINCT e2.Salary) = 3
e1.Salary = 6: e2.Salary can be [7,8], so count(DISTINCT e2.Salary) = 2
e1.Salary = 7: e2.Salary can be [8], so count(DISTINCT e2.Salary) = 1
e1.Salary = 8: e2.Salary can be [], so count(DISTINCT e2.Salary) = 0
Only the rows with 3 > count(DISTINCT e2.Salary) are kept, so e1.Salary can be [6,7,8], which are the top three salaries in the set.
Then join the Department table with the Employee table to get the employees earning the top three salaries in each department.
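The counting argument in the example above can be checked with a few lines of Python. This is only a sketch of the arithmetic for a single department; higher_distinct_count is a hypothetical helper name introduced for the illustration.

```python
# Salaries of one department, as in the worked example above.
salaries = [4, 5, 6, 7, 8]


def higher_distinct_count(salary, all_salaries):
    """Hypothetical helper: count(DISTINCT e2.Salary) WHERE e1.Salary < e2.Salary."""
    return len({s for s in all_salaries if s > salary})


# Number of distinct higher salaries for each candidate salary.
counts = {s: higher_distinct_count(s, salaries) for s in salaries}

# Keep the salaries with fewer than three distinct higher values.
top_three = [s for s in salaries if 3 > counts[s]]
```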
Code:
SELECT
t2.NAME AS Department
, t1.NAME AS Employee
, t1.Salary AS Salary
FROM
Employee AS t1, Department as t2
WHERE t1.DepartmentId = t2.Id
AND 3 > (
SELECT count(DISTINCT t3.Salary)
FROM Employee AS t3
WHERE t1.Salary < t3.Salary AND t1.DepartmentId = t3.DepartmentId
)
ORDER BY t2.NAME, Salary DESC;
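To try the query above without a MySQL server, one option is to run it against an in-memory SQLite database from Python. This is a sketch under assumptions: the column types in the CREATE TABLE statements are guesses (the original only shows the data), and SQLite stands in for MySQL, which works here because the query uses only a join and a correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema types are assumed; the rows are the sample data from the tables above.
conn.executescript("""
CREATE TABLE Employee (Id INTEGER, Name TEXT, Salary INTEGER, DepartmentId INTEGER);
CREATE TABLE Department (Id INTEGER, Name TEXT);
INSERT INTO Employee VALUES
  (1, 'Joe', 85000, 1), (2, 'Henry', 80000, 2), (3, 'Sam', 60000, 2),
  (4, 'Max', 90000, 1), (5, 'Janet', 69000, 1), (6, 'Randy', 85000, 1),
  (7, 'Will', 70000, 1);
INSERT INTO Department VALUES (1, 'IT'), (2, 'Sales');
""")

QUERY = """
SELECT t2.Name AS Department, t1.Name AS Employee, t1.Salary AS Salary
FROM Employee AS t1, Department AS t2
WHERE t1.DepartmentId = t2.Id
  AND 3 > (
    SELECT count(DISTINCT t3.Salary)
    FROM Employee AS t3
    WHERE t1.Salary < t3.Salary AND t1.DepartmentId = t3.DepartmentId
  )
ORDER BY t2.Name, Salary DESC;
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The relative order of Joe and Randy is not fixed, since they tie on both sort keys.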
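1. 185. Department Top Three Salaries
2. A more concise solution using dense_rank; try to avoid joins (join, in, exists) on numeric columns such as salary
3. The dense_rank() window function: concise, clear, and easy to follow (beats 98%)
4. MySQL user-defined variable solution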