SQL LeetCode Practice Problems

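Contents

  • Second Highest Salary
    • Incorrect
    • Correct
      • ifnull
      • limit offset
  • Nth Highest Salary: passing a parameter to a function
  • N Consecutive Occurrences: lag() over()
    • Incorrect
    • Correct
  • Employees Earning More Than Their Managers: inner joins and Cartesian products
  • Highest per Group
  • Rising Temperature
    • lag
    • cross join
  • Trips and Users!!!
  • Delete Duplicate Emails
  • Game Play Analysis
    • 1: Basic solution
    • 2: Combined query
    • 3: Running total
      • Window function
      • Self-join
    • 4: Retention
    • Game Play Analysis IV
  • Median Employee Salary!!!
    • Method 1: Using the definition of the median
      • SIGN
      • any_value
    • Method 2: mod/ceil
        • mod
    • Method 3: Sort, then find the median
    • Method 4: Variables
  • Find Median Given Frequency of Numbers
    • step1 - Offset cumulative sums
    • step2
    • Variables
  • Employee Bonus
  • Get Highest Answer Rate Question
  • Find Cumulative Salary of an Employee
    • way1
    • way2 - over(desc rows between current row and 2 following)
      • way2.1
      • way2.2
  • Winning Candidate
  • Friendship list: Friendship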

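Second Highest Salary

Write a SQL query to get the second highest salary (Salary) from the Employee table.

Id Salary
1 100
2 200
3 300

For the Employee table above, the query should return 200 as the second highest salary. If there is no second highest salary, the query should return null.

Incorrect

When no second highest salary exists, this query returns an empty result set instead of a single row containing null: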

select salary as SecondHighestSalary
from 
(select salary,
dense_rank() over(order by salary desc) as ranks
from Employee) internb_t
where ranks =2

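Correct

ifnull

Wrapping the query in a scalar subquery makes the outer SELECT return null when the subquery finds no row. The surrounding ifnull(..., null) is then redundant but harmless.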

select ifnull((select distinct salary as SecondHighestSalary
from 
(select salary,
dense_rank() over(order by salary desc) as ranks
from Employee) internb_t
where ranks =2),null) as SecondHighestSalary

limit offset

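LIMIT 4 OFFSET 3 tells MySQL and similar DBMSs to return 4 rows starting from row 3 (counting from row 0). The first number is the number of rows to retrieve; the second is where to start.

MySQL and MariaDB support a shorthand for LIMIT 4 OFFSET 3, namely LIMIT 3,4. In this syntax the value before the comma is the OFFSET and the value after it is the LIMIT.

LIMIT 10000,20 means: scan the 10020 rows that satisfy the condition, discard the first 10000, and return the last 20.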

SELECT
    IFNULL(
      (SELECT DISTINCT Salary
       FROM Employee
       ORDER BY Salary DESC
        LIMIT 1 OFFSET 1),
    NULL) AS SecondHighestSalary

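Nth Highest Salary: passing a parameter to a function

Write a SQL query to get the nth highest salary (Salary) from the Employee table.

+----+--------+
| Id | Salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+
For the Employee table above, when n = 2 the query should return the second highest salary, 200. If there is no nth highest salary, the query should return null.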

CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
  RETURN (
      # Write your MySQL query statement below.
	select ifnull((select distinct salary as SecondHighestSalary
			from 
				(select salary,
				dense_rank() over(order by salary desc) as ranks
				from Employee) internb_t
				where ranks =N),null) 
			  );
END

N Consecutive Occurrences: lag() over()

Write a SQL query to find all numbers that appear at least three times consecutively.

+----+-----+
| Id | Num |
+----+-----+
| 1  | 1   |
| 2  | 1   |
| 3  | 1   |
| 4  | 2   |
| 5  | 1   |
| 6  | 2   |
| 7  | 2   |
+----+-----+

Incorrect

select distinct num from
(select num,
lag(num, 1) OVER(ORDER BY num) as num1,
lag(num, 2) OVER(ORDER BY num) as num2
from Logs) final_t
where num = num1
and num = num2

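Output
{“headers”: [“num”], “values”: [[1], [2]]}
Expected result
{“headers”: [“ConsecutiveNums”], “values”: [[1]]}

Ordering the window by num puts equal values next to each other, so rows that are not adjacent in the log look consecutive. Inspecting the intermediate columns shows this: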

select num,
lag(num, 1) OVER(ORDER BY num) as num1,
lag(num, 2) OVER(ORDER BY num) as num2
from Logs

Output
{“headers”: [“num”, “num1”, “num2”],
“values”:
[[1, null, null], [1, 1, null], [1, 1, 1], [1, 1, 1], [2, 1, 1], [2, 2, 1], [2, 2, 2]]}

Correct

select num,
-- Key point!!! order by Id
lag(num, 1) OVER(ORDER BY Id) as num1,
lag(num, 2) OVER(ORDER BY Id) as num2
from Logs;

select distinct num from
(select num,
lag(num, 1) OVER(ORDER BY Id) as num1,
lag(num, 2) OVER(ORDER BY Id) as num2
from Logs) final_t
where num = num1
and num = num2
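
To see the effect of the ORDER BY column side by side, here is a small sketch. It assumes Python's sqlite3 module (with an SQLite build of 3.25 or later, which added window functions) in place of MySQL; the helper consecutive_nums is a hypothetical name used only for this illustration.

```python
import sqlite3

LOGS = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 1), (6, 2), (7, 2)]


def consecutive_nums(order_column):
    """Run the LAG-based query with the window ordered by Id or by Num."""
    if order_column not in ("Id", "Num"):
        raise ValueError("order_column must be 'Id' or 'Num'")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Logs (Id INTEGER PRIMARY KEY, Num INTEGER)")
    conn.executemany("INSERT INTO Logs VALUES (?, ?)", LOGS)
    # The column name is interpolated only after the whitelist check above.
    query = f"""
        SELECT DISTINCT Num FROM (
            SELECT Num,
                   LAG(Num, 1) OVER (ORDER BY {order_column}) AS num1,
                   LAG(Num, 2) OVER (ORDER BY {order_column}) AS num2
            FROM Logs
        ) AS t
        WHERE Num = num1 AND Num = num2
        ORDER BY Num
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result
```

With the sample rows, ordering by Id should return only 1, whereas ordering by Num also reports 2, matching the incorrect output shown earlier.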

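Employees Earning More Than Their Managers: inner joins and Cartesian products

The Employee table holds all employees, including their managers. Every employee has an Id, and there is also a column holding the Id of the employee's manager.

+----+-------+--------+-----------+
| Id | Name  | Salary | ManagerId |
+----+-------+--------+-----------+
| 1  | Joe   | 70000  | 3         |
| 2  | Henry | 80000  | 4         |
| 3  | Sam   | 60000  | NULL      |
| 4  | Max   | 90000  | NULL      |
+----+-------+--------+-----------+
Given the Employee table, write a SQL query that returns the names of employees who earn more than their managers. In the table above, Joe is the only employee who earns more than his manager.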

select t1.Name as Employee
from Employee t1 left join Employee t2
on t1.ManagerId = t2.Id
where t1.Salary > t2.Salary

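Highest per Group

The Employee table holds all employees. Every employee has an Id, a salary, and a department Id.

+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Henry | 80000  | 2            |
| 3  | Sam   | 60000  | 2            |
| 4  | Max   | 90000  | 1            |
+----+-------+--------+--------------+
The Department table holds all departments of the company.

+----+-------+
| Id | Name  |
+----+-------+
| 1  | IT    |
| 2  | Sales |
+----+-------+
Write a SQL query to find the employees with the highest salary in each department. For the tables above, Max has the highest salary in the IT department and Henry has the highest salary in the Sales department.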

select Department.Name as Department,t1.Name as Employee,Salary
from 
(select Name,Salary,DepartmentId,
max(Salary) over(partition by DepartmentId) as maxsalary
from Employee) t1
inner join 
Department
on t1.DepartmentId = Department.Id
where Salary = maxsalary

Rising Temperature

Given a Weather table, write a SQL query to find the Id of every date whose temperature is higher than that of the previous date (yesterday).

+---------+------------------+------------------+
| Id(INT) | RecordDate(DATE) | Temperature(INT) |
+---------+------------------+------------------+
| 1       | 2015-01-01       | 10               |
| 2       | 2015-01-02       | 25               |
| 3       | 2015-01-03       | 20               |
| 4       | 2015-01-04       | 30               |
+---------+------------------+------------------+

lag

select Id from
(
select Id,RecordDate as date1,Temperature as temp1,
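-- order by RecordDate rather than Id: Ids are not guaranteed to follow date order
lag(RecordDate, 1) OVER(ORDER BY RecordDate) as date2,
lag(Temperature,1) OVER(ORDER BY RecordDate) as temp2
from  Weather 
)intertt
-- After the shift, date1/temp1 are the current day and date2/temp2 the previous row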
where datediff(date1,date2)=1
and temp2<temp1

cross join

A cross join does not need an ON key.

-- Note: return t2's Id, because t2 is the later day
select t2.Id as Id
from Weather t1 cross join Weather t2
where datediff(t2.RecordDate,t1.RecordDate) = 1
and t2.Temperature > t1.Temperature

Trips and Users!!!

The Trips table holds all taxi trips. Each trip has a unique Id, and Client_Id and Driver_Id are foreign keys to Users_Id in the Users table. Status is an ENUM with members ('completed', 'cancelled_by_driver', 'cancelled_by_client').
+----+-----------+-----------+---------+---------------------+------------+
| Id | Client_Id | Driver_Id | City_Id | Status              | Request_at |
+----+-----------+-----------+---------+---------------------+------------+
| 1  | 1         | 10        | 1       | completed           | 2013-10-01 |
| 2  | 2         | 11        | 1       | cancelled_by_driver | 2013-10-01 |
| 3  | 3         | 12        | 6       | completed           | 2013-10-01 |
| 4  | 4         | 13        | 6       | cancelled_by_client | 2013-10-01 |
| 5  | 1         | 10        | 1       | completed           | 2013-10-02 |
| 6  | 2         | 11        | 6       | completed           | 2013-10-02 |
| 7  | 3         | 12        | 6       | completed           | 2013-10-02 |
| 8  | 2         | 12        | 12      | completed           | 2013-10-03 |
| 9  | 3         | 10        | 12      | completed           | 2013-10-03 |
| 10 | 4         | 13        | 12      | cancelled_by_driver | 2013-10-03 |
+----+-----------+-----------+---------+---------------------+------------+

The Users table holds all users. Each user has a unique Users_Id. Banned indicates whether the user is banned, and Role is an ENUM of ('client', 'driver', 'partner').
+----------+--------+--------+
| Users_Id | Banned | Role   |
+----------+--------+--------+
| 1        | No     | client |
| 2        | Yes    | client |
| 3        | No     | client |
| 4        | No     | client |
| 10       | No     | driver |
| 11       | No     | driver |
| 12       | No     | driver |
| 13       | No     | driver |
+----------+--------+--------+

Write a SQL statement to find the cancellation rate of unbanned users between 2013-10-01 and 2013-10-03. For the tables above, your statement should return the following result, with the cancellation rate (Cancellation Rate) rounded to two decimal places.

The cancellation rate is computed as: (number of trips by unbanned users that were cancelled by the driver or the client) / (total number of trips by unbanned users)

+------------+-------------------+
| Day        | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 | 0.33              |
| 2013-10-02 | 0.00              |
| 2013-10-03 | 0.50              |
+------------+-------------------+

A useful building block: IF(t.`Status` = 'completed', 1, NULL) yields NULL for trips that were not completed, so COUNT skips them.
Approach: first filter down to the trips where nobody is banned, then compute the rate within each group.
select request_at as day,
round(sum(case when status!="completed" then 1 else 0 end)/count(status),2) as "cancellation rate"
from
-- the table for all non banned trips (MySQL needs a space after "--")
(
select *
from trips
where client_id in (select users_id from users where banned="no" and role="client")
and driver_id in (select users_id from users where banned="no" and role="driver")
and request_at between "2013-10-01" and "2013-10-03"
) as non_banned_trips
group by day

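An earlier attempt of mine, which is incorrect:

select 
-- count(distinct case when Status='cancelled_by_client' then Client_Id when Status='cancelled_by_driver' then Client_Id else null end) 
-- In a case expression, either "or" or two separate "when ... then" branches work
count(distinct case when Status='cancelled_by_client' or Status='cancelled_by_driver' then Client_Id else null end) / 
count(distinct Client_Id)
from Trips 
where Client_Id in 
(-- unbanned users
-- Problem 1: Users_Id is never tied to both Client_Id and Driver_Id, so nothing guarantees that neither the client nor the driver is banned
-- Problem 2: Banned != 'No' selects the banned users; it should be Banned = 'No'
select Users_Id from Users
where Banned != 'No')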

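Delete Duplicate Emails

Write a SQL query to delete all duplicate emails from the Person table, keeping only the row with the smallest Id for each duplicated email.

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | [email protected] |
| 2  | [email protected] |
| 3  | [email protected] |
+----+------------------+
Id is the primary key of this table.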

DELETE p1 FROM Person p1,
    Person p2
WHERE
    p1.Email = p2.Email AND p1.Id > p2.Id

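Anyone with slow-query tuning experience will recognize that in production, on tables with tens or hundreds of millions of rows, a join is often the most efficient option because it is more likely to use an index.

The official DELETE documentation describes this usage, for example the following DELETE statement: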

DELETE t1 FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.id IS NULL;

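This form of DELETE may look unfamiliar, since it reads much like a SELECT. It involves two tables, t1 and t2. DELETE t1 means rows are deleted from t1, and the WHERE condition decides which ones: every row that satisfies it is deleted.

Here the rows deleted are those in t1 that have no match in t2.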
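
The deduplication rule itself (delete a row when another row has the same Email and a smaller Id) can be tried out with a sketch like the one below. It assumes Python's sqlite3 module instead of MySQL. SQLite has no multi-table DELETE, so a correlated EXISTS subquery is swapped in for the join; the helper name dedupe and the sample addresses are hypothetical.

```python
import sqlite3


def dedupe(rows):
    """Delete duplicate emails, keeping the row with the smallest Id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (Id INTEGER PRIMARY KEY, Email TEXT)")
    conn.executemany("INSERT INTO Person VALUES (?, ?)", rows)
    # Same condition as the join version: p1.Email = p2.Email AND p1.Id > p2.Id,
    # expressed here as "some other row has this Email and a smaller Id".
    conn.execute(
        """
        DELETE FROM Person
        WHERE EXISTS (SELECT 1
                      FROM Person AS p2
                      WHERE p2.Email = Person.Email
                        AND p2.Id < Person.Id)
        """
    )
    remaining = conn.execute("SELECT Id, Email FROM Person ORDER BY Id").fetchall()
    conn.close()
    return remaining
```

For three rows where Ids 1 and 3 share an address, only Ids 1 and 2 should remain.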

Game Play Analysis

Activity table:

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+
The primary key of the table is (player_id, event_date).
This table shows the activity of players on a gaming platform.
Each row records the number of games (possibly 0) that a player opened after logging in on some day with some device, before logging out.

1: Basic solution

Write a SQL query to get each player's first login date.

Example Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

# Write your MySQL query statement below
select player_id,first_login from
(select player_id,event_date as first_login,
-- first login, so sort in ascending order
row_number() over(partition by player_id order by event_date asc ) as rank1
from Activity) test_table
where rank1 = 1;

select player_id, min(event_date) as first_login 
from Activity 
group by player_id 
order by player_id asc

2: Combined query

Write a SQL query that reports the device (device_id) each player first logged in with.

select player_id, device_id
from activity
where (player_id, event_date) in 
(select player_id, min(event_date)
from activity
group by player_id)

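3: Running total

Write a SQL query that reports, for each player and date, how many games the player has played so far, that is, the total number of games the player played up to and including that date.

Window function

select player_id,event_date,
-- equivalent to a running (cumulative) sum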
sum(games_played) over(partition by player_id order by event_date asc ) as games_played_so_far
from Activity

Self-join

select 
    a1.player_id,
    a1.event_date,
    sum(a2.games_played) games_played_so_far
from Activity a1,Activity a2
where a1.player_id=a2.player_id and 
      a1.event_date>=a2.event_date
group by 1,2;
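
As a quick cross-check that the two formulations agree, the sketch below runs both against the sample rows. It assumes Python's sqlite3 module (SQLite 3.25 or later for the window function) rather than MySQL, and the names WINDOW_SQL, SELF_JOIN_SQL and running_totals are hypothetical.

```python
import sqlite3

ACTIVITY = [
    (1, 2, "2016-03-01", 5),
    (1, 2, "2016-05-02", 6),
    (2, 3, "2017-06-25", 1),
    (3, 1, "2016-03-02", 0),
    (3, 4, "2018-07-03", 5),
]

WINDOW_SQL = """
    SELECT player_id, event_date,
           SUM(games_played) OVER (PARTITION BY player_id
                                   ORDER BY event_date) AS games_played_so_far
    FROM Activity
    ORDER BY player_id, event_date
"""

SELF_JOIN_SQL = """
    SELECT a1.player_id, a1.event_date,
           SUM(a2.games_played) AS games_played_so_far
    FROM Activity a1
    JOIN Activity a2
      ON a1.player_id = a2.player_id AND a1.event_date >= a2.event_date
    GROUP BY a1.player_id, a1.event_date
    ORDER BY a1.player_id, a1.event_date
"""


def running_totals(sql):
    """Load the sample rows into a fresh in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Activity (player_id INT, device_id INT, "
                 "event_date TEXT, games_played INT)")
    conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", ACTIVITY)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows
```

Both queries should produce the same rows; the window version reads each row once per partition, while the self-join pairs every row with all earlier rows of the same player.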

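4: Retention

Write a SQL query that reports the fraction of players who logged in again on the day after their first login, rounded to two decimal places. In other words, count the players who logged in on at least two consecutive days starting from their first login date, then divide by the total number of players.

select round(count(distinct (case when datediff(date2,date1)=1 then id else null end))/count(distinct id),2) as fraction
from
(select t1.player_id as id,
t1.event_date as date1,t2.event_date as date2
from 
(select player_id,min(event_date) as event_date
from Activity 
group by player_id) t1 join Activity t2
-- compare each first login only with the same player's activity
on t1.player_id = t2.player_id) ttt;

-- Alternative using sum
select round(sum(case when datediff(a.event_date,b.first_date)=1 then 1 else 0 end)/(select count(distinct(player_id)) from activity),2) as fraction
from activity a,
(select player_id,min(event_date) first_date from activity group by player_id) b
where a.player_id=b.player_id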

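Game Play Analysis IV

Table: Activity

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+
(player_id, event_date) is the primary key of this table.
This table shows the activity of players of some game.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.

Write a SQL query that reports the fraction of players who logged in again on the day after their first login, rounded to two decimal places. In other words, count the players who logged in on at least two consecutive days starting from their first login date, then divide by the total number of players.

The query result format is shown below:

Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-03-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result table:
+-----------+
| fraction  |
+-----------+
| 0.33      |
+-----------+
Only the player with ID 1 logged in again on the day after the first login, so the answer is 1/3 = 0.33.

-- Compute the retention rate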
select round(count(distinct (case when datediff(date2,date1)=1 then id else null end))/count(distinct id),2) as fraction
from
(select t1.player_id as id,
t1.event_date as date1,t2.event_date as date2
from 
-- find each player's first login date
(select player_id,min(event_date) as event_date
from Activity 
group by player_id) t1 left join Activity t2
on t1.player_id = t2.player_id) ttt
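
The retention calculation can be reproduced on the sample data with the sketch below. It assumes Python's sqlite3 module in place of MySQL; because SQLite has no DATEDIFF, the join condition uses DATE(first_date, '+1 day') instead, which is a substitution of mine rather than the query above. The helper name day_one_retention is hypothetical.

```python
import sqlite3

ACTIVITY = [
    (1, 2, "2016-03-01", 5),
    (1, 2, "2016-03-02", 6),
    (2, 3, "2017-06-25", 1),
    (3, 1, "2016-03-02", 0),
    (3, 4, "2018-07-03", 5),
]


def day_one_retention(rows):
    """Fraction of players who log in again the day after their first login."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Activity (player_id INT, device_id INT, "
                 "event_date TEXT, games_played INT)")
    conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", rows)
    # f has one row per player. The LEFT JOIN finds at most one matching row
    # (the login exactly one day later), so COUNT(*) is the number of players
    # and COUNT(a.player_id) is the number of retained players.
    row = conn.execute(
        """
        SELECT ROUND(COUNT(a.player_id) * 1.0 / COUNT(*), 2) AS fraction
        FROM (SELECT player_id, MIN(event_date) AS first_date
              FROM Activity
              GROUP BY player_id) AS f
        LEFT JOIN Activity AS a
          ON a.player_id = f.player_id
         AND a.event_date = DATE(f.first_date, '+1 day')
        """
    ).fetchone()
    conn.close()
    return row[0]
```

On the sample table only player 1 qualifies, so the function should return 0.33.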

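Median Employee Salary!!!

The Employee table holds all employees. It has three columns: employee Id, company name, and salary.

+-----+------------+--------+
| Id  | Company    | Salary |
+-----+------------+--------+
| 1   | A          | 2341   |
| 2   | A          | 341    |
| 3   | A          | 15     |
| 4   | A          | 15314  |
| 5   | A          | 451    |
| 6   | A          | 513    |
| 7   | B          | 15     |
| 8   | B          | 13     |
| 9   | B          | 1154   |
| 10  | B          | 1345   |
| 11  | B          | 1221   |
| 12  | B          | 234    |
| 13  | C          | 2345   |
| 14  | C          | 2645   |
| 15  | C          | 2645   |
| 16  | C          | 2652   |
| 17  | C          | 65     |
+-----+------------+--------+
Write a SQL query to find the median salary of each company. Challenge: can you solve it without using any built-in SQL functions?

Expected result:
+-----+------------+--------+
| Id  | Company    | Salary |
+-----+------------+--------+
| 5   | A          | 451    |
| 6   | A          | 513    |
| 12  | B          | 234    |
| 9   | B          | 1154   |
| 14  | C          | 2645   |
+-----+------------+--------+
That is, one or two rows are returned per company: when a company has an even number of employees, both middle rows are returned.

Idea

For the median of an odd-length array, the number of values greater than it equals the number of values less than it.

Algorithm

Using this definition, let us find the median of [1, 3, 2]. First, 1 is not the median, because the array has three elements and two of them (3, 2) are greater than 1. Nor is 3 the median, because two elements are less than 3. For the last one, 2, the numbers of elements greater than 2 and less than 2 are equal, so 2 is the median of this array.

When the array length is even and all elements are unique, the median is the average of the two middle values after sorting. For each of these two values, the absolute difference between the count of greater values and the count of smaller values is 1, which is exactly the frequency of that value.

In general, whether the length is odd or even and whether or not the elements are unique, the frequency of a median value is always greater than or equal to the absolute difference between the number of values greater than it and the number of values less than it. This rule is the key to the problem and can be used as a filter condition.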

SELECT
    Employee.Id, Employee.Company, Employee.Salary
FROM
    Employee,
    Employee alias
WHERE
    Employee.Company = alias.Company
GROUP BY  Employee.Id,Employee.Company , Employee.Salary

{“headers”: [“Id”, “Company”, “Salary”],
“values”: [[6, “A”, 513], [5, “A”, 451], [4, “A”, 15314], [3, “A”, 15], [2, “A”, 341], [1, “A”, 2341], [12, “B”, 234], [11, “B”, 1221], [10, “B”, 1345], [9, “B”, 1154], [8, “B”, 13], [7, “B”, 15], [17, “C”, 65], [16, “C”, 2652], [15, “C”, 2645], [14, “C”, 2645], [13, “C”, 2345]]}

SELECT
    Employee.Id, Employee.Company, Employee.Salary
FROM
    Employee,
    Employee alias
WHERE
    Employee.Company = alias.Company
-- Employee.Id is not in the GROUP BY this time
GROUP BY  Employee.Company , Employee.Salary

{“headers”: [“Id”, “Company”, “Salary”],
“values”:
[[6, “A”, 513], [5, “A”, 451], [4, “A”, 15314], [3, “A”, 15], [2, “A”, 341], [1, “A”, 2341],
[12, “B”, 234], [11, “B”, 1221], [10, “B”, 1345], [9, “B”, 1154], [8, “B”, 13], [7, “B”, 15],
[17, “C”, 65], [16, “C”, 2652], [15, “C”, 2645], [13, “C”, 2345]]}

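The difference
|14 | C | 2645 |
|15 | C | 2645 |

  • Grouping by Employee.Id, Employee.Company, Employee.Salary keeps both rows
  • Grouping by Employee.Company, Employee.Salary keeps only one of them
  • Since we are looking for the median, two rows with the same salary do not both need to be kept (although this feels a little unrigorous)
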
SELECT
    Employee.Id, Employee.Company, Employee.Salary,
    SUM(SIGN(Employee.Salary - alias.Salary)) as 'sign'
FROM
    Employee,
    Employee alias
WHERE
    Employee.Company = alias.Company
GROUP BY Employee.Company , Employee.Salary

{“headers”: [“Id”, “Company”, “Salary”, “sign”],
“values”:
[[6, “A”, 513, 1], [5, “A”, 451, -1], [4, “A”, 15314, 5], [3, “A”, 15, -5], [2, “A”, 341, -3], [1, “A”, 2341, 3],
[12, “B”, 234, -1], [11, “B”, 1221, 3], [10, “B”, 1345, 5], [9, “B”, 1154, 1], [8, “B”, 13, -5], [7, “B”, 15, -3],
[17, “C”, 65, -4], [16, “C”, 2652, 4], [15, “C”, 2645, 2], [13, “C”, 2345, -2]]}

SELECT
    Employee.Id, Employee.Company, Employee.Salary,
    SUM(CASE
    WHEN Employee.Salary = alias.Salary THEN 1
    ELSE 0
END) as 'sum',
    SUM(SIGN(Employee.Salary - alias.Salary)) as 'sign'
FROM
    Employee,
    Employee alias
WHERE
    Employee.Company = alias.Company
GROUP BY Employee.Company , Employee.Salary

{“headers”: [“Id”, “Company”, “Salary”, “sum”, “sign”],
“values”:
[[6, “A”, 513, 1, 1], [5, “A”, 451, 1, -1], [4, “A”, 15314, 1, 5], [3, “A”, 15, 1, -5], [2, “A”, 341, 1, -3], [1, “A”, 2341, 1, 3],
[12, “B”, 234, 1, -1], [11, “B”, 1221, 1, 3], [10, “B”, 1345, 1, 5], [9, “B”, 1154, 1, 1], [8, “B”, 13, 1, -5], [7, “B”, 15, 1, -3],
[17, “C”, 65, 1, -4], [16, “C”, 2652, 1, 4], [15, “C”, 2645, 4, 2], [13, “C”, 2345, 1, -2]]}

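Method 1: Using the definition of the median

SIGN

SIGN() classifies a number: it returns 0 for zero, -1 for any negative number, and 1 for any positive number.

-- number of rows in the company with the same salary, i.e. the frequency of this salary
SUM(CASE
    WHEN Employee.Salary = alias.Salary THEN 1
    ELSE 0
END) 
-- must be at least |count(smaller salaries) - count(greater salaries)|
>= ABS(SUM(SIGN(Employee.Salary - alias.Salary)))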

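Author: LeetCode
Link: https://leetcode-cn.com/problems/median-employee-salary/solution/yuan-gong-xin-shui-zhong-wei-shu-by-leetcode/
Source: LeetCode (力扣)
Copyright belongs to the author. For commercial reuse, contact the author for permission; for non-commercial reuse, cite the source.

With the search condition above, the MySQL code follows easily.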

any_value

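Tips:

  • MySQL provides any_value() to stop ONLY_FULL_GROUP_BY from rejecting a nonaggregated column
  • any_value() returns the column value from one row of the group; which row is not guaranteed, although in practice it is often the first one encountered
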
SELECT
    Employee.Id, Employee.Company, Employee.Salary
FROM
    Employee,
    Employee alias
WHERE
    Employee.Company = alias.Company
-- keep only one row for people with the same company and the same salary
GROUP BY Employee.Company , Employee.Salary
HAVING SUM(CASE
    WHEN Employee.Salary = alias.Salary THEN 1
    ELSE 0
END) >= ABS(SUM(SIGN(Employee.Salary - alias.Salary)))
ORDER BY Employee.Id

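Note: this code runs on MySQL 5.6. On MySQL 5.7+, where ONLY_FULL_GROUP_BY is enabled by default, Employee.Id in the SELECT list has to be changed to ANY_VALUE(Employee.Id); the same applies to the ORDER BY clause.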
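
The rule behind the HAVING clause (frequency of a value >= |count of greater values - count of smaller values|) can be checked independently of SQL. The following is a minimal sketch in plain Python; the function name median_candidates is hypothetical.

```python
def median_candidates(values):
    """Return the distinct values v whose frequency is at least
    |count(values > v) - count(values < v)|, in ascending order."""
    result = []
    for v in sorted(set(values)):
        frequency = sum(1 for x in values if x == v)
        greater = sum(1 for x in values if x > v)
        smaller = sum(1 for x in values if x < v)
        if frequency >= abs(greater - smaller):
            result.append(v)
    return result
```

For company A's salaries this should select 451 and 513, and for company C, where 2645 appears twice, it should select 2645 only, in line with the expected result table.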

Method 2: mod/ceil

mod: remainder of a division
ceil: round up to the nearest integer

select Id,Company,Salary ,count(*) over(partition by Company ) total,
cast(count(*) over(partition by Company) as decimal)/2 mid,
ceil(cast(count(*) over(partition by Company) as decimal)/2) next,
row_number() over(partition by Company order by Salary) rn
from Employee 

{“headers”: [“Id”, “Company”, “Salary”, “total”, “mid”, “next”, “rn”],
“values”:
[[3, “A”, 15, 6, 3.0000, 3, 1], [2, “A”, 341, 6, 3.0000, 3, 2], [5, “A”, 451, 6, 3.0000, 3, 3], [6, “A”, 513, 6, 3.0000, 3, 4], [1, “A”, 2341, 6, 3.0000, 3, 5], [4, “A”, 15314, 6, 3.0000, 3, 6],

[8, “B”, 13, 6, 3.0000, 3, 1], [7, “B”, 15, 6, 3.0000, 3, 2], [12, “B”, 234, 6, 3.0000, 3, 3], [9, “B”, 1154, 6, 3.0000, 3, 4], [11, “B”, 1221, 6, 3.0000, 3, 5], [10, “B”, 1345, 6, 3.0000, 3, 6],

[17, “C”, 65, 5, 2.5000, 3, 1], [13, “C”, 2345, 5, 2.5000, 3, 2], [14, “C”, 2645, 5, 2.5000, 3, 3], [15, “C”, 2645, 5, 2.5000, 3, 4], [16, “C”, 2652, 5, 2.5000, 3, 5]]}

mod

MOD(N, M) returns the remainder of N divided by M.
e.g. SELECT MOD(29,3); -> 2

select Id,Company,Salary
from (
select Id,Company,Salary ,count(*) over(partition by Company ) total,
cast(count(*) over(partition by Company) as decimal)/2 mid,
ceil(cast(count(*) over(partition by Company) as decimal)/2) next,
row_number() over(partition by Company order by Salary) rn
from Employee 
) x
where (mod(total,2) = 0 and rn in ( mid, mid+1 )) -- even count: pick two rows; mid is an integer
or ( mod(total,2) = 1 and rn = next) -- odd count: pick one row; mid is fractional, so use ceil(mid)
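
The row-number arithmetic in the WHERE clause can be sketched on its own. This is a plain-Python illustration of the same mod/ceil logic, with a hypothetical function name.

```python
import math


def median_row_numbers(total):
    """1-based row numbers (after sorting) that hold the median(s)."""
    if total % 2 == 0:
        mid = total // 2
        return [mid, mid + 1]          # even count: the two middle rows
    return [math.ceil(total / 2)]      # odd count: the single middle row
```

A company with 6 employees yields rows 3 and 4, and a company with 5 employees yields row 3 only, matching the rn values in the intermediate output above.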

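Method 3: Sort, then find the median

This version returns only one median row per company. MySQL's / operator is not integer division, so for an even CN only RN = CN/2 matches, and for an odd CN only RN = (CN+1)/2 matches: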

SELECT Id, Company, Salary FROM
(SELECT Id, Company, Salary, 
COUNT(Salary) OVER (PARTITION BY Company) AS CN,
ROW_NUMBER() OVER (PARTITION BY Company ORDER BY Salary) AS RN 
FROM Employee) T
WHERE RN = (CN+1)/2 OR RN = CN/2

Output:
{“headers”: [“Id”, “Company”, “Salary”], “values”: [[5, “A”, 451], [12, “B”, 234], [14, “C”, 2645]]}

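Method 4: Variables

Idea

If the records were already ranked by salary, the median would be easy to find. MySQL versions before 8.0 have no built-in ranking functions, so the ranking has to be built by hand.

Algorithm

Sort the records by salary and compute the rank with session variables. Because no self-join is needed, this method is more efficient than Method 1.

As originally written, with rank as the column alias, this query raised an error when run. The likely cause is that rank is a reserved word in MySQL 8.0, so the version below uses rnk as the alias instead.

SELECT 
    Id, Company, Salary
FROM
    (SELECT 
        e.Id,
        e.Salary,
        e.Company,
        -- restart the counter whenever the company changes
        IF(@prev = e.Company, @Rank:=@Rank + 1, @Rank:=1) AS rnk,
        @prev:=e.Company
    FROM
        Employee e, (SELECT @Rank:=0, @prev:=0) AS temp
    ORDER BY e.Company , e.Salary , e.Id) Ranking
        INNER JOIN
    (SELECT 
        COUNT(*) AS totalcount, Company AS name
    FROM
        Employee e2
    GROUP BY e2.Company) companycount ON companycount.name = Ranking.Company
WHERE
    Ranking.rnk = FLOOR((totalcount + 1) / 2)
        OR Ranking.rnk = FLOOR((totalcount + 2) / 2)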
Find Median Given Frequency of Numbers

The Numbers table stores numbers and their frequencies.

+----------+-------------+
| Number   | Frequency   |
+----------+-------------+
| 0        | 7           |
| 1        | 1           |
| 2        | 3           |
| 3        | 1           |
+----------+-------------+
In this table the numbers are 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3, so the median is (0 + 0) / 2 = 0.

+--------+
| median |
+--------+
| 0.0000 |
+--------+
Write a query to find the median of all the numbers and name the result median.

step1 - Offset cumulative sums

select
n1.number,
n1.frequency,
-- produces the offset (cumulative) effect
-- and needs no join at all
(select sum(frequency) from numbers n2 where n2.number<=n1.number) as asc_frequency,
(select sum(frequency) from numbers n3 where n3.number>=n1.number) as desc_frequency
from numbers n1

{“headers”:
[“number”, “frequency”, “asc_frequency”, “desc_frequency”],
“values”:
[[0, 7, 7, 12],
[1, 1, 8, 5],
[2, 3, 11, 4],
[3, 1, 12, 1]]}

Incorrect

select 
n1.number,
n1.frequency,
sum(case when n1.number >= n2.number then n1.frequency else 0 end) as asc_frequency,
sum(case when n1.number <= n2.number then n1.frequency else 0 end) as desc_frequency
from numbers n1 cross join numbers n2
on n1.number = n2.number
cross join numbers n3
-- Because the tables are joined on number, no Cartesian product is formed. The frequency on both sides is therefore the same value, so the sum simply equals frequency.
on n1.number = n3.number
group by n1.number,
n1.frequency

{“headers”:
[“number”, “frequency”, “asc_frequency”, “desc_frequency”],
“values”:
[[0, 7, 7, 7],
[1, 1, 1, 1],
[2, 3, 3, 3],
[3, 1, 1, 1]]}

Incorrect
To avoid the problem above, where the join pinned number and frequency to a single row, this version tries a cross join with no join key.

select 
n1.number,
n1.frequency,
sum(case when n1.number >= n2.number then n1.frequency else 0 end) as asc_frequency,
sum(case when n1.number <= n2.number then n1.frequency else 0 end) as desc_frequency
from numbers n1 cross join numbers n2
group by n1.number,
n1.frequency

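Still wrong, and even the order of number is off. The underlying bug in both attempts is that the CASE sums n1.frequency; a cumulative sum needs n2.frequency, the frequency of the row being compared against.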

{“headers”:
[“number”, “frequency”, “asc_frequency”, “desc_frequency”],
“values”:
[[3, 1, 4, 1],
[2, 3, 9, 6],
[1, 1, 2, 3],
[0, 7, 7, 28]]}

step2

select
avg(t.number) as median
from
(select
n1.number,
n1.frequency,
(select sum(frequency) from numbers n2 where n2.number<=n1.number) as asc_frequency,
(select sum(frequency) from numbers n3 where n3.number>=n1.number) as desc_frequency
from numbers n1) t
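-- For an even total the boundary case is =, for an odd total it is >. Requiring both asc_frequency and desc_frequency to reach half the total isolates the middle value(s).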
where t.asc_frequency>= (select sum(frequency) from numbers)/2  -- 12/2 = 6 
and t.desc_frequency>= (select sum(frequency) from numbers)/2
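
To try the two-sided cumulative filter on other inputs, the sketch below wraps the step2 query in a helper. It assumes Python's sqlite3 module instead of MySQL; because SQLite divides integers with truncation, the half-total is computed with / 2.0, which is my adjustment. The name median_from_frequencies is hypothetical.

```python
import sqlite3


def median_from_frequencies(pairs):
    """Median of a multiset given as (number, frequency) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Numbers (Number INTEGER, Frequency INTEGER)")
    conn.executemany("INSERT INTO Numbers VALUES (?, ?)", pairs)
    # A number is a median candidate when the cumulative frequency counted
    # from both ends reaches at least half of the total.
    row = conn.execute(
        """
        SELECT AVG(t.Number) AS median
        FROM (SELECT n1.Number,
                     (SELECT SUM(Frequency) FROM Numbers n2
                      WHERE n2.Number <= n1.Number) AS asc_frequency,
                     (SELECT SUM(Frequency) FROM Numbers n3
                      WHERE n3.Number >= n1.Number) AS desc_frequency
              FROM Numbers n1) AS t
        WHERE t.asc_frequency  >= (SELECT SUM(Frequency) FROM Numbers) / 2.0
          AND t.desc_frequency >= (SELECT SUM(Frequency) FROM Numbers) / 2.0
        """
    ).fetchone()
    conn.close()
    return row[0]
```

The sample table should give 0.0; a table holding just 1 and 2 once each should give 1.5, since both values pass the filter and are averaged.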

Variables

select avg(Number) as median from(
select
Number,
Frequency,
@sum as sum1,
@sum:=Frequency+@sum as sum2
from Numbers,(select @sum:=0) t
order by Number
) t
where if(@sum&1,
sum1<=floor(@sum/2) and sum2>floor(@sum/2),
sum1<=(@sum/2) and sum2>=(@sum/2))

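Author: dodo1cheng-1odob
Link: https://leetcode-cn.com/problems/find-median-given-frequency-of-numbers/solution/mysqlyi-ge-bian-liang-jie-jue-by-dodo1cheng-1odob/
Source: LeetCode (力扣)
Copyright belongs to the author. For commercial reuse, contact the author for permission; for non-commercial reuse, cite the source.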

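Employee Bonus

Select the name and bonus of every employee whose bonus is less than 1000.

Employee table

+-------+--------+------------+--------+
| empId | name   | supervisor | salary |
+-------+--------+------------+--------+
| 1     | John   | 3          | 1000   |
| 2     | Dan    | 3          | 2000   |
| 3     | Brad   | null       | 4000   |
| 4     | Thomas | 3          | 4000   |
+-------+--------+------------+--------+

empId is the primary key of this table.
Bonus table

+-------+-------+
| empId | bonus |
+-------+-------+
| 2     | 500   |
| 4     | 2000  |
+-------+-------+
empId is the primary key of this table.
Example output:

+-------+-------+
| name  | bonus |
+-------+-------+
| John  | null  |
| Dan   | 500   |
| Brad  | null  |
+-------+-------+

Key point!!!
Rows whose bonus is null must be returned as well!!!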

select name, bonus
from Employee left join Bonus
on Employee.EmpId = Bonus.EmpId
where bonus is null or bonus < 1000;

select 
    Employee.name,
    Bonus.bonus
from 
    Employee
left join
    Bonus
on Employee.empId=Bonus.empId
where 
    ifnull(bonus,0)<1000; -- ifnull(bonus,null) would not work: NULL < 1000 is not true

{“headers”: [“name”, “bonus”], “values”: [[“Brad”, null], [“John”, null], [“Dan”, 500]]}
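
The effect of leaving out the null check is easy to demonstrate. The sketch below assumes Python's sqlite3 module in place of MySQL (NULL comparisons behave the same way here); the helper low_bonus is a hypothetical name.

```python
import sqlite3


def low_bonus(include_null):
    """Employees with bonus < 1000, with or without the IS NULL branch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (empId INT, name TEXT, supervisor INT, salary INT)")
    conn.execute("CREATE TABLE Bonus (empId INT, bonus INT)")
    conn.executemany(
        "INSERT INTO Employee VALUES (?, ?, ?, ?)",
        [(1, "John", 3, 1000), (2, "Dan", 3, 2000),
         (3, "Brad", None, 4000), (4, "Thomas", 3, 4000)],
    )
    conn.executemany("INSERT INTO Bonus VALUES (?, ?)", [(2, 500), (4, 2000)])
    # NULL < 1000 evaluates to NULL, which WHERE treats as "not true",
    # so employees without a Bonus row vanish unless IS NULL is added.
    condition = "b.bonus < 1000 OR b.bonus IS NULL" if include_null else "b.bonus < 1000"
    rows = conn.execute(
        f"""
        SELECT e.name, b.bonus
        FROM Employee e
        LEFT JOIN Bonus b ON e.empId = b.empId
        WHERE {condition}
        ORDER BY e.empId
        """
    ).fetchall()
    conn.close()
    return rows
```

With include_null=False only Dan should come back; with include_null=True, John and Brad are returned too, each with a bonus of None.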

Get Highest Answer Rate Question

Get the question with the highest answer rate from the survey_log table, which has these columns: id, action, question_id, answer_id, q_num, timestamp.

id is the user id; action takes one of the values "show", "answer", "skip"; answer_id is not null when action is "answer" and is null when action is "show" or "skip"; q_num is the number of the question within the current session.

Write a SQL query to find the question with the highest answer rate.

Example:

Input:
+------+--------+-------------+-----------+-------+-----------+
| id   | action | question_id | answer_id | q_num | timestamp |
+------+--------+-------------+-----------+-------+-----------+
| 5    | show   | 285         | null      | 1     | 123       |
| 5    | answer | 285         | 124124    | 1     | 124       |
| 5    | show   | 369         | null      | 2     | 125       |
| 5    | skip   | 369         | null      | 2     | 126       |
+------+--------+-------------+-----------+-------+-----------+
Output:
+-------------+
| survey_log  |
+-------------+
| 285         |
+-------------+
Explanation:
Question 285 has an answer rate of 1/1, while question 369 has an answer rate of 0/1, so the output is 285.

Hint: the highest answer rate means the highest ratio of answers to shows for the same question.

Source: LeetCode (力扣)
Link: https://leetcode-cn.com/problems/get-highest-answer-rate-question
Copyright belongs to LeetCode (领扣网络). For commercial reuse, contact the official site for authorization; for non-commercial reuse, cite the source.

# Write your MySQL query statement below
select question_id as survey_log from 
(select question_id, 
(count(case when action='answer' then action else null end) over(partition by question_id))/(count(*) over(partition by question_id)) as answer_ratio
from survey_log) ttt
order by answer_ratio desc
limit 1

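Find Cumulative Salary of an Employee

The Employee table holds salary information for one year.

Write a SQL statement to get, for each employee and each month, the cumulative salary over the most recent three months (excluding the employee's most recent month; compute it even when fewer than three months are available).

Order the result by 'Id' ascending, then by 'Month' descending.

Example:
Input: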

Id Month Salary
1 1 20
2 1 20
1 2 30
2 2 30
3 2 40
1 3 40
3 3 60
1 4 60
3 4 70

Output:

Id Month Salary
1 3 90
1 2 50
1 1 20
2 1 20
3 3 100
3 2 40

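Explanation:

Employee '1' has three months of salary records besides the most recent month (month '4'): 40 in month '3', 30 in month '2', and 20 in month '1'.

So the cumulative salaries over the last 3 months are (40 + 30 + 20) = 90, (30 + 20) = 50, and 20.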

Id Month Salary
1 3 90
1 2 50
1 1 20

Employee '2' has only one month of salary records (month '1') besides the most recent month (month '2').

Id Month Salary
2 1 20

Employee '3' has two months besides the most recent month (month '4'): 60 in month '3' and 40 in month '2'. The cumulative sums per month are therefore:

Id Month Salary
3 3 100
3 2 40

way1

My attempt: it does not handle the most-recent-three-months requirement.

# Write your MySQL query statement below
select id,month,Salary from 
(select id,month,
sum(salary) over(partition by id,month order by month desc) as Salary,
count(*) over(partition by id) as maxs
from Employee
order by id asc,month desc) ttt
where month != maxs

way2 - over(desc rows between current row and 2 following)

way2.1

select id,month,salary,
-- sum the current row and the two rows after it (with Month descending, the two preceding months on record)
sum(salary) over(partition by Id order by Month desc rows between current row and 2 following) as Salary
from Employee
order by id asc,month desc

{“headers”: [“id”, “month”, “salary”, “Salary”],
“values”:
[[1, 4, 60, 130],
[1, 3, 40, 90],
[1, 2, 30, 50],
[1, 1, 20, 20],
[2, 2, 30, 50],
[2, 1, 20, 20],
[3, 4, 70, 170],
[3, 3, 60, 100],
[3, 2, 40, 40]]}

# Write your MySQL query statement below
select id,month,Salary from 
(select id,month,
sum(salary) over(partition by Id order by Month desc rows between current row and 2 following) as Salary,
max(month) over(partition by id) as maxs
from Employee
order by id asc,month desc) ttt
where month != maxs

{“headers”:
[“id”, “month”, “Salary”],
“values”:
[[1, 3, 90],
[1, 2, 50],
[1, 1, 20],
[2, 1, 20],
[3, 3, 100],
[3, 2, 40]]}
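
The frame clause can be exercised on the sample data with the sketch below. It assumes Python's sqlite3 module (SQLite 3.25 or later for window frames) in place of MySQL, and the helper name cumulative_salary is hypothetical. Note that a ROWS frame counts rows rather than calendar months, so it matches the three-month requirement only when an employee's months have no gaps.

```python
import sqlite3

EMPLOYEE = [
    (1, 1, 20), (2, 1, 20), (1, 2, 30), (2, 2, 30), (3, 2, 40),
    (1, 3, 40), (3, 3, 60), (1, 4, 60), (3, 4, 70),
]


def cumulative_salary(rows):
    """Three-row trailing salary sums, excluding each employee's latest month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (Id INT, Month INT, Salary INT)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT Id, Month, total
        FROM (SELECT Id, Month,
                     SUM(Salary) OVER (PARTITION BY Id
                                       ORDER BY Month DESC
                                       ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS total,
                     MAX(Month) OVER (PARTITION BY Id) AS latest
              FROM Employee) AS t
        WHERE Month <> latest
        ORDER BY Id ASC, Month DESC
        """
    ).fetchall()
    conn.close()
    return result
```

Run on the sample input, this should reproduce the six expected rows, from (1, 3, 90) down to (3, 2, 40).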

way2.2

Overly complicated.

select
    Id,
    Month,
    sum(Salary) over(partition by Id order by Month desc rows between current row and 2 following) Salary
from
(
    select 
        Employee.Id,
        Employee.Month,
        Salary
    from
        (
            select 
                Id,
                max(Month) cur_month
            from 
                Employee
            group by
                Id
        ) t1
        join
            Employee
        on
            t1.Id = Employee.Id
    where
        Employee.Month < t1.cur_month
) t2;

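A university has two tables, student and department, which hold the student data and the department data for each major.

Write a query that reports the number of students in each major in the department table (majors with no students must be listed too).

Sort the result by the number of students in descending order. If two or more majors have the same number of students, sort them alphabetically by department name.

The student table looks like this:

Column Name Type
student_id Integer
student_name String
gender Character
dept_id Integer

Here student_id is the student's ID number, student_name is the student's name, gender is the student's gender, and dept_id is the ID of the major the student belongs to.

The department table looks like this:

Column Name Type
dept_id Integer
dept_name String

dept_id is the major's ID and dept_name is the major's name.

Here is a sample input:
student table:

student_id student_name gender dept_id
1 Jack M 1
2 Jane F 1
3 Mark M 2

department table:

dept_id dept_name
1 Engineering
2 Science
3 Law

The sample output is:

dept_name student_number
Engineering 2
Science 1
Law 0

Source: LeetCode (力扣)
Link: https://leetcode-cn.com/problems/count-student-number-in-departments
Copyright belongs to LeetCode (领扣网络). For commercial reuse, contact the official site for authorization; for non-commercial reuse, cite the source.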

# Write your MySQL query statement below
select dept_name,count(student_id) as student_number  -- count(student_id) skips NULLs, so empty departments get 0 without ifnull()
from department left join student
on student.dept_id = department.dept_id
group by department.dept_name
-- the alias must match the name used in ORDER BY
order by student_number desc, department.dept_name

Given the customer table, which holds all customers and the customer who referred them.

+------+------+------------+
| id   | name | referee_id |
+------+------+------------+
| 1    | Will | NULL       |
| 2    | Jane | NULL       |
| 3    | Alex | 2          |
| 4    | Bill | NULL       |
| 5    | Zack | 1          |
| 6    | Mark | 2          |
+------+------+------------+
Write a query that returns the names of the customers whose referee's id is not 2.

For the sample data above, the result is:

+------+
| name |
+------+
| Will |
| Jane |
| Bill |
| Zack |
+------+

# Write your MySQL query statement below
select name
from customer 
where referee_id!=2 or referee_id is null  -- Key point: NULL != 2 evaluates to NULL rather than true, so rows with a NULL referee_id are dropped unless "referee_id is null" is added explicitly

Winning Candidate

Find the candidate with the most votes.

select Name
from (
  select CandidateId as id
  from Vote
  group by CandidateId
  order by count(id) desc  -- ORDER BY can use count() directly; worth remembering
  limit 1
) as Winner join Candidate
on Winner.id = Candidate.id

select name from 
  (select name, count(t2.id) as counts
  from Candidate t1 left join Vote t2
  on t1.id = t2.CandidateId
  group by name
  order by counts desc
  limit 1) tt

-- select name,CandidateId,t2.id
-- from Candidate t1 left join Vote t2
-- on t1.id = t2.CandidateId

Friendship list: Friendship

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user1_id      | int     |
| user2_id      | int     |
+---------------+---------+
The primary key of this table is (user1_id, user2_id).
Each row of this table means that user1_id and user2_id are friends.

Likes list: Likes

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| user_id     | int     |
| page_id     | int     |
+-------------+---------+
The primary key of this table is (user_id, page_id).
Each row of this table means that user_id likes page_id.

Write a SQL query that recommends to the user with user_id = 1 the pages that their friends like. Do not recommend pages the user already likes.

The result must not contain duplicates.

The result format is shown in the following example:

Friendship table:
+----------+----------+
| user1_id | user2_id |
+----------+----------+
| 1        | 2        |
| 1        | 3        |
| 1        | 4        |
| 2        | 3        |
| 2        | 4        |
| 2        | 5        |
| 6        | 1        |
+----------+----------+

Likes table:
+---------+---------+
| user_id | page_id |
+---------+---------+
| 1       | 88      |
| 2       | 23      |
| 3       | 24      |
| 4       | 56      |
| 5       | 11      |
| 6       | 33      |
| 2       | 77      |
| 3       | 77      |
| 6       | 88      |
+---------+---------+

Result table:
+------------------+
| recommended_page |
+------------------+
| 23               |
| 24               |
| 56               |
| 33               |
| 77               |
+------------------+
User 1 is friends with users 2, 3, 4, and 6.
The recommended pages are: page 23 from user 2, page 24 from user 3, page 56 from user 4, and page 33 from user 6.
Page 77 is recommended by both user 2 and user 3.
Page 88 is not recommended because user 1 already likes it.

Source: LeetCode (力扣)
Link: https://leetcode-cn.com/problems/page-recommendations
Copyright belongs to LeetCode (领扣网络). For commercial reuse, contact the official site for authorization; for non-commercial reuse, cite the source.

# Write your MySQL query statement below
with friends_of1 as 
(select user2_id as id from Friendship
where user1_id = 1
union
select user1_id as id from Friendship
where user2_id = 1)


select t1.page_id as recommended_page from
(-- pages liked by the friends of user_id = 1
select page_id from Likes
where user_id in (select id from friends_of1)) t1
left join
(-- pages user_id = 1 already likes, to be excluded
select page_id from Likes where user_id = 1) t2
on t1.page_id = t2.page_id
where t2.page_id is null
group by t1.page_id
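
For experimenting with other users or data, here is a compact sketch of the same recommendation logic. It assumes Python's sqlite3 module in place of MySQL and swaps the LEFT JOIN ... IS NULL anti-join for NOT IN, which is equivalent here because page_id is never NULL. The helper name recommended_pages is hypothetical.

```python
import sqlite3

FRIENDSHIP = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (2, 5), (6, 1)]
LIKES = [(1, 88), (2, 23), (3, 24), (4, 56), (5, 11),
         (6, 33), (2, 77), (3, 77), (6, 88)]


def recommended_pages(user_id):
    """Pages liked by the user's friends that the user does not already like."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Friendship (user1_id INT, user2_id INT)")
    conn.execute("CREATE TABLE Likes (user_id INT, page_id INT)")
    conn.executemany("INSERT INTO Friendship VALUES (?, ?)", FRIENDSHIP)
    conn.executemany("INSERT INTO Likes VALUES (?, ?)", LIKES)
    rows = conn.execute(
        """
        WITH friends AS (
            -- friendship is stored once per pair, so look in both columns
            SELECT user2_id AS id FROM Friendship WHERE user1_id = :uid
            UNION
            SELECT user1_id AS id FROM Friendship WHERE user2_id = :uid
        )
        SELECT DISTINCT page_id
        FROM Likes
        WHERE user_id IN (SELECT id FROM friends)
          AND page_id NOT IN (SELECT page_id FROM Likes WHERE user_id = :uid)
        ORDER BY page_id
        """,
        {"uid": user_id},
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

For user 1 this should return pages 23, 24, 33, 56 and 77, the same set as the result table (sorted here for a stable comparison).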

CASE WHEN can also achieve the effect of the UNION:

select (case when user2_id = 1 then user1_id                                    -- step 1
             when user1_id = 1 then user2_id end) as f_id
from Friendship
where user1_id = 1 or user2_id = 1
