Simply sort by the required columns and output the result.
select country
, gold_medals
, silver_medals
, bronze_medals
from Olympic
order by gold_medals desc, silver_medals desc, bronze_medals desc, country
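Concatenate the name with the first letter of the profession as the problem requires, wrapping that letter in parentheses during the concatenation, then output.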
select person_id
, concat(name, '(', left(profession,1), ')') as name
from Person
order by person_id desc
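Nesting two subqueries inside select is all it takes. The main point is understanding the problem: compute the number of distinct pairs in each of the two tables, divide one by the other, and replace a NULL result (for example when there are no requests at all) with 0.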
select ifnull(round((select count(distinct requester_id,accepter_id) from RequestAccepted)/(select count(distinct sender_id, send_to_id) from FriendRequest),2),0) as accept_rate
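This is one outer query with two nested subqueries that each count the excellent students (score >= 90). Depending on how the two counts compare, output the corresponding value.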
select case when (select sum(score >= 90) from NewYork) > (select sum(score >= 90) from California) then 'New York University'
when (select sum(score >= 90) from NewYork) = (select sum(score >= 90) from California) then 'No Winner'
when (select sum(score >= 90) from NewYork) < (select sum(score >= 90) from California) then 'California University'
end as Winner
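An if expression nested inside select gives the result directly.
(Watch the interaction between NOT IN and NULL.) This problem is not difficult. The approach is to derive each node's type from the original table:
- The root has no parent, so its p_id is NULL.
- An inner node is the parent of some other node, so it appears in other nodes' p_id.
- A leaf is never the parent of another node, so it never appears in any p_id.
Following this logic, classify each node with a case expression.
The genuinely tricky part is how NOT IN interacts with NULL. NOT IN checks that the value on the left differs from every value returned by the subquery. With id on the left, that means evaluating id <> p_id for each returned p_id. If the subquery returns even one NULL, that comparison is NULL (unknown). NOT IN can then only be FALSE or NULL, never TRUE, so no row qualifies. The fix is to remove the NULLs from the subquery first; then the check works as expected.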
select id
, case when p_id is null then 'Root'
when id in (select p_id from Tree) then 'Inner'
when id not in (select p_id from Tree where p_id is not null) then 'Leaf' end as type
from Tree
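The NULL effect can be seen concretely in a small sketch. It uses Python's built-in sqlite3 module instead of MySQL, an assumption made for portability. NOT IN's three-valued logic is standard SQL, so both engines should behave the same here. The five-row tree is hypothetical sample data.

```python
import sqlite3

# Hypothetical tree: 1 is the root (p_id NULL), 2 is inner, 3/4/5 are leaves.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tree (id INTEGER PRIMARY KEY, p_id INTEGER)")
conn.executemany(
    "INSERT INTO Tree VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, 2)],
)

def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# The subquery yields a NULL, so `id NOT IN (...)` is NULL for every id
# that matches no parent, and FALSE for the others: nothing qualifies.
unfiltered = ids("SELECT id FROM Tree WHERE id NOT IN (SELECT p_id FROM Tree)")

# Dropping NULLs from the subquery restores the expected leaf set.
filtered = ids(
    "SELECT id FROM Tree "
    "WHERE id NOT IN (SELECT p_id FROM Tree WHERE p_id IS NOT NULL)"
)

print(unfiltered)  # []
print(filtered)    # [3, 4, 5]
```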
Check that the sum of any two sides is greater than the third side. Every pair of sides must be checked.
select x
, y
, z
, if(x+y>z and x+z>y and y+z>x, 'Yes', 'No') as triangle
from Triangle
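This is a neat problem: derive the new id case by case, then put everything together. The logic works as follows.
- The first when handles an odd id that is also the last row. This matches the rule that the last student is not swapped when the count is odd, so the id stays unchanged.
- The second case is an even id, which is decremented by one. For example, 2 and 4 become 1 and 3, which effectively swaps the seats.
- The third case is an ordinary odd id, which is incremented by one to swap with the following even seat.
That is the whole logic, and it is quite elegant.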
select case when mod(id, 2) = 1 and id = (select count(*) from seat) then id
when mod(id, 2) = 0 then id - 1
else id + 1 end as id
, student
from seat
order by id
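The three CASE branches can be checked outside the database. This Python sketch mirrors the expression as a plain function. The helper `swapped_id` and the sample names are hypothetical, and the final sort plays the role of order by id.

```python
def swapped_id(seat_id: int, total: int) -> int:
    """Mirror the CASE expression: new id for a seat in a table of `total` rows."""
    if seat_id % 2 == 1 and seat_id == total:
        return seat_id          # odd row count: the last student stays put
    if seat_id % 2 == 0:
        return seat_id - 1      # even ids move up one seat
    return seat_id + 1          # remaining odd ids move down one seat

students = ["Abbot", "Doris", "Emerson", "Green", "Jeames"]  # hypothetical ids 1..5
n = len(students)
new_order = sorted((swapped_id(i, n), name) for i, name in enumerate(students, start=1))
print(new_order)
```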
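First take every match from both sides and compare the goals, awarding points according to home/away position and the goals scored. In the outer query, group by team id and sum the points. The team name must also be output, so right join Teams to obtain it. Teams that never played get a NULL sum, so replace NULL with 0 before output.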
with all_points as (
select host_team as team
, case when host_goals > guest_goals then 3
when host_goals = guest_goals then 1
else 0 end as points
from Matches
union all
select guest_team
, case when host_goals > guest_goals then 0
when host_goals = guest_goals then 1
else 3 end as points
from Matches
)
select team_id
, team_name
, coalesce(sum(points),0) as num_points
from all_points right join Teams
on all_points.team = Teams.team_id
group by team_id
order by num_points desc, team_id
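First join the tables to get the country name. Then group by country name and compute the average. Finally compare the average against the thresholds in the problem and output the corresponding weather type.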
select country_name
, case when avg(weather_state) >= 25 then 'Hot'
when avg(weather_state) <= 15 then 'Cold'
else 'Warm' end as weather_type
from Weather inner join Countries
on Weather.country_id = Countries.country_id
where date_format(day,"%Y-%m")='2019-11'
group by country_name
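This problem first requires joining the tables to fetch the values of x and y. One join can only fetch one variable's value, so join Variables twice; after the two joins, both the left and right operand values are available. Then branch on operator: when it is >, compute x > y, and likewise for the other operators. This yields 1 or 0, so wrap it in one more check that outputs true for 1 and false otherwise.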
select left_operand
, operator
, right_operand
, if(case operator when '>' then v1.value > v2.value
when '=' then v1.value = v2.value
when '<' then v1.value < v2.value end,'true','false') as value
from Expressions inner join Variables as v1
on Expressions.left_operand = v1.name
inner join Variables as v2
on Expressions.right_operand = v2.name
Check the conditions from the problem directly. If they hold, return the full salary; otherwise return 0.
select employee_id
, if(mod(employee_id,2)=1 and left(name,1)<>'M', salary, 0) as bonus
from Employees
order by employee_id
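First join Visits with Purchases and compute each member's conversion rate, mapping the rate to a category. Then outer join with Members to find members who never visited. Their category should be Bronze, but it comes out as NULL, so replace NULL with Bronze and output the member id and name.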
select Members.member_id
, name
, ifnull(category, 'Bronze') as category
from(select member_id
, case when avg(Purchases.visit_id is not null) < 0.5 then 'Silver'
when avg(Purchases.visit_id is not null) >= 0.5 and avg(Purchases.visit_id is not null) < 0.8 then 'Gold'
when avg(Purchases.visit_id is not null) >= 0.8 then 'Diamond' end as category
from Visits left join Purchases
on Visits.visit_id = Purchases.visit_id
group by member_id) as t1 right join Members
on t1.member_id = Members.member_id
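Use date_format with the %w specifier to convert each date to the day of the week, where 0 is Sunday and 6 is Saturday. Rows where it is 0 or 6 count toward weekend_cnt; all others count toward working_cnt. Sum both and output.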
select sum(date_format(submit_date,'%w') = 0 or date_format(submit_date,'%w') = 6) as weekend_cnt
, sum(date_format(submit_date,'%w') <> 0 and date_format(submit_date,'%w') <> 6) as working_cnt
from Tasks
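A where filter gives the result. Query the original table directly and keep rows whose referee_id is not 2 or is NULL. A plain referee_id <> 2 alone would drop the NULL rows, because comparing NULL yields NULL.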
select name
from Customer
where referee_id <> 2 or referee_id is null
Either condition suffices, so just combine the two conditions with or after where.
select name
, population
, area
from World
where area >= 3000000
or population >= 25000000
Use mod() to keep odd ids, and require the description to differ from boring (a not-equal comparison). Then sort the result and output.
select id
, movie
, description
, rating
from cinema
where mod(id, 2) = 1 and description <> 'boring'
order by rating desc
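Filter the users and sessions by date, then divide the distinct session count by the distinct user count. Don't be misled by the problem statement: every non-NULL session that appears counts as activity, so no further check is needed. Remember to replace a NULL result with 0 at the end.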
select ifnull(round(count(distinct session_id)/count(distinct user_id),2),0) as average_sessions_per_user
from Activity
where activity_date between '2019-06-28' and '2019-07-27'
Filter the rows where author id equals viewer id. Remember to deduplicate and output sorted by id.
select distinct author_id as id
from Views
where author_id = viewer_id
order by id
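Compute the length of content and output tweet_id when it exceeds 15. Note that length() counts bytes, which equals the character count only for single-byte text; char_length() counts characters.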
select tweet_id
from Tweets
where length(content) > 15
Per the problem, put both conditions after where and output the rows where both are Y.
select product_id
from Products
where low_fats = 'Y'
and recyclable = 'Y'
As the problem asks, select rows with revenue greater than 0 and year 2021.
select customer_id
from Customers
where revenue > 0
and year = 2021
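The difficulty here is that the problem statement is ambiguous. It means a single purchase within the period whose amount is at least the given minimum, not a user's total spending over the period. With that reading, filter with where and output the count of distinct users.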
CREATE FUNCTION getUserIDs(startDate DATE, endDate DATE, minAmount INT) RETURNS INT
BEGIN
RETURN (
select count(distinct user_id)
from Purchases
where time_stamp >= startDate
and time_stamp <= endDate
and amount >= minAmount
);
END
Same as the previous problem, except that no count is needed: output the distinct users.
CREATE PROCEDURE getUserIDs(startDate DATE, endDate DATE, minAmount INT)
BEGIN
select distinct user_id
from(select user_id,amount
from Purchases
where time_stamp>=startDate and time_stamp<=endDate and amount>=minAmount) as t2
order by user_id;
END
inner join: a single inner join solves it, with no real difficulty.
select t1.name as Employee
from Employee as t1 inner join Employee as t2
on t1.managerId = t2.id
where t1.salary > t2.salary
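Self-join on the condition that one day is exactly one day after the other and its temperature is higher. That gives the result directly.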
select w1.id
from Weather as w1 inner join Weather as w2
on w1.Temperature > w2.Temperature
and datediff(w1.recordDate, w2.recordDate) = 1
Self-join so that each pair of different points lands on the same row, compute the distance, then take the minimum.
select round(min(sqrt((t1.x-t2.x)*(t1.x-t2.x) + (t1.y-t2.y)*(t1.y-t2.y))),2) as shortest
from Point2D as t1 join Point2D as t2
on t1.x <> t2.x or t1.y<>t2.y
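This is almost the same as the previous solution. The difference is that the final comparison uses a tuple (row constructor). Per the problem, (x, y) is the table's primary key, and a lookup on the primary key needs no extra trip back to the table rows. Comparing on the same tuple should therefore make the query noticeably faster.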
select round(min(sqrt((t1.x-t2.x)*(t1.x-t2.x) + (t1.y-t2.y)*(t1.y-t2.y))),2) as shortest
from Point2D as t1 join Point2D as t2
on (t1.x, t1.y) <> (t2.x, t2.y)
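The idea is straightforward. Self-join once with different x values on the two sides. The absolute difference of the two x values is the distance, and its minimum is the shortest distance.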
select min(abs(p1.x - p2.x)) shortest
from Point as p1 join Point as p2
on p1.x > p2.x
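Here is a window-function approach. On a line, the distance between two points is just the gap between them, so the shortest distance must occur between two adjacent points. Use a window function to put each point on the same row as its predecessor, subtract, and take the minimum absolute difference.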
select min(abs(x-x1)) as shortest
from(select x
, lag(x,1) over(order by x) as x1
from Point
order by x) as temp
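The claim that, after sorting, the minimum gap always lies between neighbours is easy to sanity-check. This Python sketch (hypothetical helper names) compares two approaches:
- The sort-and-subtract idea, which is what lag(x) over(order by x) computes.
- A brute-force comparison of every pair, which is what the self-join does.

```python
from itertools import combinations

def shortest_adjacent(xs):
    """Sort, then take the minimum gap between neighbours (the LAG idea)."""
    ordered = sorted(xs)
    return min(b - a for a, b in zip(ordered, ordered[1:]))

def shortest_brute_force(xs):
    """Compare every pair of points, like the self-join version."""
    return min(abs(a - b) for a, b in combinations(xs, 2))

points = [-1, 0, 2]  # hypothetical sample
print(shortest_adjacent(points), shortest_brute_force(points))  # 1 1
```

Sorting costs O(n log n), while the pairwise version is O(n^2). This matches the intuition that the window-function query scales better than the self-join.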
Join the tables, pick the required columns, and output.
select product_name
, year
, price
from Sales inner join Product
on Product.product_id = Sales.product_id
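First collect every friendship in both directions into a common table expression, then join it with Likes. Only user 1's friends matter, so filter for them right in the join. Then output the pages those friends like, deduplicated because several friends may like the same page.
Pages that user 1 already likes must be excluded, so use a subquery that lists user 1's pages and negate it. Negation is usually done with <> or not in. Here the subquery can return several rows or none at all, and <> handles neither case: several rows raise an error, and zero rows yield NULL. Use not in instead, which returns the expected result in both cases.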
with all_friend as (
select user1_id as user_id
, user2_id as friend_id
from Friendship
union all
select user2_id as user_id
, user1_id as friend_id
from Friendship
)
select distinct page_id as recommended_page
from all_friend inner join Likes
on all_friend.user_id = 1
and Likes.user_id = all_friend.friend_id
where page_id not in (select page_id from Likes where user_id = 1)
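The idea is the same as above, except that this problem covers every user's friends instead of only user 1's. So retrieve the pages liked by every friend, then exclude each page that user1_id already likes. I do this with a correlated subquery: not exists keeps only pages that match none of the pages liked by the user receiving the recommendation. Finally, group and count how many friends like each page.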
with all_friends as (
select user1_id
, user2_id
from Friendship
union
select user2_id
, user1_id
from Friendship
)
select user1_id as user_id
, page_id as page_id
, count(user2_id) as friends_likes
from all_friends inner join Likes as l2
on all_friends.user2_id = l2.user_id
where not exists (select 'x' from Likes as l1 where l1.user_id = user1_id and l2.page_id = l1.page_id)
group by 1,2
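Self-join on the condition that two points can form a rectangle, i.e. the two diagonal corners differ in both x and y. The sample shows the smaller id first and the larger second, so nest if in the select list to put the smaller id first and the larger second. The area must be positive, so take absolute values, and sort as the problem specifies. Each pair of points matches twice (once in each order), so deduplicate the output.
select distinct if(p1.id < p2.id, p1.id, p2.id) as p1
, if(p1.id < p2.id, p2.id, p1.id) as p2
, abs(p1.x_value - p2.x_value) * abs(p1.y_value - p2.y_value) as area
from Points as p1 inner join Points as p2
on (p1.x_value <> p2.x_value and p1.y_value <> p2.y_value)
order by area desc, p1, p2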
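After the join we have each program's content type and title. Then filter for three things: 1. movies, 2. kid-friendly content, 3. programs aired in June 2020. Output the rows that satisfy all three.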
select distinct title
from TVProgram inner join Content
on TVProgram.content_id = Content.content_id
where content_type = 'Movies'
and Kids_content = 'Y'
and left(program_date,7) = '2020-06'
Join all three tables and require every pair of ids and every pair of names to differ, checking them one pair at a time.
select SchoolA.student_name as member_A
, SchoolB.student_name as member_B
, SchoolC.student_name as member_C
from SchoolA inner join SchoolB
on SchoolA.student_id <> SchoolB.student_id
and SchoolA.student_name <> SchoolB.student_name
inner join SchoolC
on SchoolB.student_id <> SchoolC.student_id
and SchoolA.student_id <> SchoolC.student_id
and SchoolA.student_name <> SchoolC.student_name
and SchoolB.student_name <> SchoolC.student_name
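The first approach is the usual subquery. A subquery counts all users in Users, which serves as the denominator. The numerator is the number of users registered for the current contest in the outer query. Dividing the two gives each contest's registration percentage.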
select contest_id
, round(100*count(*)/(select count(*) from Users), 2) as percentage
from Register
group by contest_id
order by percentage desc, contest_id
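Here is also a join-based approach. Take the Cartesian product of Register and Users and count over it. Because it is a Cartesian product, both sides contain many repeated user_id values. So after grouping by contest, count the distinct ids on each side and divide; that also gives the result.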
select contest_id
, round(100*count(distinct Register.user_id)/count(distinct Users.user_id), 2) as percentage
from Register join Users
group by contest_id
order by percentage desc,contest_id
Find every account (same account_id) that was logged in from different places (different ip) at the same time (overlapping sessions), and output them.
select distinct l1.account_id
from LogInfo as l1 inner join LogInfo as l2
on ((l1.login >= l2.login and l1.logout <= l2.logout)
or (l1.login <= l2.login and l2.login <= l1.logout and l2.logout >= l1.logout) )
and l1.ip_address <> l2.ip_address
and l1.account_id = l2.account_id
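The symmetric self-join applies the two cases in the on clause in both orientations. Together they should be equivalent to the usual closed-interval test l1.login <= l2.logout and l2.login <= l1.logout. A small Python sketch checks this exhaustively over short integer intervals. The function names are hypothetical, and plain integers stand in for timestamps.

```python
from itertools import product

def overlaps_author(a, b):
    """The two cases from the ON clause, with (l1, l2) = (a, b)."""
    (a_in, a_out), (b_in, b_out) = a, b
    return (a_in >= b_in and a_out <= b_out) or (
        a_in <= b_in and b_in <= a_out and b_out >= a_out
    )

def overlaps_standard(a, b):
    """Standard closed-interval overlap test."""
    (a_in, a_out), (b_in, b_out) = a, b
    return a_in <= b_out and b_in <= a_out

# Every interval with integer endpoints 0..4.
intervals = [(s, e) for s in range(5) for e in range(s, 5)]
mismatches = [
    (a, b)
    for a, b in product(intervals, repeat=2)
    # The self-join sees both (a, b) and (b, a), so OR the two orientations.
    if (overlaps_author(a, b) or overlaps_author(b, a)) != overlaps_standard(a, b)
]
print(len(mismatches))  # 0
```

Since no mismatch turns up, the on clause could be shortened to that single pair of comparisons.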
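Self-join on the condition that the same user's two requests are more than 0 and at most 24 hours apart. A difference measured in a coarse unit such as hours or days is truncated to an integer. So compute the difference in the finest unit, seconds, which avoids misclassifying pairs near the boundary. Deduplicate the result and output.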
select distinct c1.user_id
from Confirmations as c1 inner join Confirmations as c2
on timestampdiff(second,c1.time_stamp,c2.time_stamp) <= 24*3600
and timestampdiff(second,c1.time_stamp,c2.time_stamp) > 0
and c1.user_id = c2.user_id
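First, a right join. The right table is preserved, so the left side may be missing: employees with a NULL name have no name record, so keep those. Second, a left join the other way. The left table is preserved, so the right side may be missing: employees with a NULL salary have lost their salary record, so keep those too. Output the union of both sets.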
select Salaries.employee_id
from Employees right join Salaries
on Employees.employee_id = Salaries.employee_id
where name is null
union
select Employees.employee_id
from Employees left join Salaries
on Employees.employee_id = Salaries.employee_id
where salary is null
order by employee_id
Inner join to get each product's price, then group by user_id and sum each user's spending. Sort in the given order.
select user_id
, sum(quantity * price) as spending
from Sales inner join Product
on Product.product_id = Sales.product_id
group by user_id
order by spending desc, user_id
A non-equi self-join (a Cartesian product that excludes equal names) gives the output directly.
select t1.team_name as home_team
, t2.team_name as away_team
from Teams as t1 join Teams as t2
on t1.team_name <> t2.team_name
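Self-join to find the same user's different purchases made within 7 days of each other. The inner join drops unmatched rows, so just deduplicate, sort, and output.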
select distinct p1.user_id
from Purchases as p1 inner join Purchases as p2
on p1.user_id = p2.user_id
and p1.purchase_id <> p2.purchase_id
and datediff(p1.purchase_date, p2.purchase_date) >= 0
and datediff(p1.purchase_date, p2.purchase_date) <= 7
order by user_id
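Extract all drivers into a common table expression and left join it to Rides on driver id = passenger id. Then group by driver and count the rides in which each driver was a passenger.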
with dri as (
select distinct driver_id
from Rides
)
select r1.driver_id
, count(r2.passenger_id) as cnt
from dri as r1 left join Rides as r2
on r1.driver_id = r2.passenger_id
group by 1
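First join Products and Purchases to compute each invoice's total. Then rank invoices by total with a window function to find the highest-value invoice, with ties broken by the smaller invoice_id. Join that invoice back to Purchases and Products to retrieve price and quantity, compute each product's amount on that invoice, and output.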
select Products.product_id
, quantity
, price*quantity as price
from(select invoice_id
from(select invoice_id
, row_number() over(order by sum(quantity * price) desc, invoice_id) as rk
from Purchases inner join Products
on Products.product_id = Purchases.product_id
group by invoice_id) as t1
where rk = 1) as t2 inner join Purchases
on t2.invoice_id = Purchases.invoice_id
inner join Products on Products.product_id = Purchases.product_id
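First build a temporary table that tags every record with a row number. Then join records that share a user id but have different tags and fall within 7 days of each other. Any user that can be joined this way qualifies.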
with Users as (
select *
, row_number() over() as rk
from Users
)
select distinct u1.user_id
from Users as u1 inner join Users as u2
on u1.user_id = u2.user_id
and datediff(u1.created_at,u2.created_at)>=0
and datediff(u1.created_at,u2.created_at)<=7
and u1.rk <> u2.rk
update: put the table name after update and use set on sex to modify the column in the original table.
update Salary set sex = if(sex='f','m','f')
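A special use of delete (deleting rows from one side of a join). Write the join condition as equal email, and require t1.id > t2.id so that t2 holds the smaller id. Put the alias of the table to delete from (t1) right after delete. Every t1 row that matched, i.e. every duplicate with a larger id, is removed, and very little code is needed.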
delete t1 from Person as t1 inner join Person as t2
on t1.email = t2.email and t1.id > t2.id
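left join: one left outer join solves it. The difficulty is that not every PersonId in Person appears in Address. Person rows must still come back, with NULL address columns, and a left outer join handles that directly.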
select firstName
, lastName
, city
, state
from Person left join Address
on Person.personId = Address.personId
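First determine each player's first login date. Second, join to check whether each player logged in again on the next day. We need both the total number of players and the number who logged in the next day, so some matches will be missing and an outer join is required. Join on the same player and a date exactly one day after the first login.
After the join, count(*) gives the total number of players, because each player has exactly one first-login row. Players who did not log in the next day have a NULL next-day date. Counting the non-NULL dates and dividing gives the fraction.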
select round(count(a2.event_date)/count(*),2) as fraction
from Activity as a1 left join Activity as a2
on a1.player_id = a2.player_id and datediff(a2.event_date,a1.event_date)=1
where a1.event_date = (select min(event_date) from Activity as a3 where a3.player_id=a1.player_id)
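This uses the same idea as above but drops one level of per-row looping (the correlated subquery). It is much faster: 448 ms here versus 4488 ms for the version above.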
with temp as (
select player_id
, min(event_date) as event_date
from Activity
group by player_id
)
select round(count(temp.event_date)/count(distinct Activity.player_id),2) as fraction
from Activity left join temp
on Activity.player_id = temp.player_id
and datediff(Activity.event_date, temp.event_date)=1
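Note that bonus and name live in two different tables, so a join is needed. The result can contain NULLs and every Employee row must be kept, so the right side is nullable: a left outer join. Filter the joined result and output it.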
select name
, bonus
from Employee left join Bonus
on Employee.empId = Bonus.empId
where bonus < 1000 or bonus is null
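This is about as good as join problems get. First join the two tables: the two book_id values must match. Next comes the dispatch date, which must fall within the last year, so check it directly in the join condition. This is the crux of the problem: the difference between where and on.
where filters data: rows that fail the condition are removed and never appear in later steps. on in a left join behaves differently. When a left-table row has no right-table row that satisfies the condition, on keeps the row and fills the right-table columns with NULL instead of dropping it. This is very useful, because it keeps columns we need that where would otherwise delete.
Back to the problem. From Orders we need books that were bought, in quantities below 10; from Books we also need books that were never bought. There is a conflict here. Take book 5 in the sample: it was bought, but not within the date range. Filtering Orders first would remove this book, yet Books alone cannot tell us which books appear in Orders only with out-of-range dates. Is there a way to filter without removing the values?
This is exactly what a left join with on does: it turns non-matching data into NULL instead of deleting it. So when an order fails the date condition, that book's quantity becomes NULL. Then where removes books that have been available for less than a month. The remaining data contains only three kinds of books: qualifying books, books with too few copies sold, and books whose sales fell outside the date range (their sales count as 0). Aggregating and summing finishes the job. It is a clever approach that makes good use of both where and on.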
select Books.book_id
, name
from Books left join Orders
on Books.book_id = Orders.book_id and dispatch_date >= '2018-06-23'
where available_from < '2019-05-23'
group by Books.book_id
having sum(ifnull(quantity,0)) < 10
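The where-versus-on behaviour can be reproduced with a tiny example. This sketch uses Python's sqlite3 module rather than MySQL, an assumption; left-join semantics are the same in both. It uses two hypothetical books: one whose only order is too old, and one with a recent order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Books  (book_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Orders (book_id INTEGER, quantity INTEGER, dispatch_date TEXT);
INSERT INTO Books  VALUES (1, 'old seller'), (2, 'recent seller');
INSERT INTO Orders VALUES (1, 5, '2017-01-01'), (2, 3, '2019-01-01');
""")

# Date filter in ON: book 1 survives, with NULL in the Orders columns.
on_rows = conn.execute("""
    SELECT Books.book_id, Orders.quantity
    FROM Books LEFT JOIN Orders
      ON Books.book_id = Orders.book_id AND dispatch_date >= '2018-06-23'
    ORDER BY Books.book_id
""").fetchall()

# Same filter in WHERE: book 1's joined row fails the test and disappears.
where_rows = conn.execute("""
    SELECT Books.book_id, Orders.quantity
    FROM Books LEFT JOIN Orders
      ON Books.book_id = Orders.book_id
    WHERE dispatch_date >= '2018-06-23'
    ORDER BY Books.book_id
""").fetchall()

print(on_rows)     # [(1, None), (2, 3)]
print(where_rows)  # [(2, 3)]
```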
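This is a model of solving a problem with joins: several joins, several transformations. First rank with a window function, partitioned by seller and ordered by date, and take rk = 2, i.e. the second item each seller sold.
We cannot join directly at this level, though. The rk = 2 filter sits in where, so joining here would still lose sellers who never made a second sale. Instead, wrap another query around it and right join Users in the outer query to get every seller and their favorite brand. Then do the final join with Items to get the brand of the second item whose id was passed up from the inner query, and output the result with an if check.
Overall this problem isn't hard, but the logic must be clear so that missing values don't muddle things.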
select seller_id
, if(favorite_brand = item_brand and item_brand is not null,'yes','no') as 2nd_item_fav_brand
from(select user_id as seller_id
, item_id
, favorite_brand
from(select seller_id
, item_id
from(select seller_id
, item_id
, rank() over(partition by seller_id order by order_date) as rk
from Orders) as t1
where rk = 2) as t2 right join Users on user_id = seller_id) as t3 left join Items
on Items.item_id = t3.item_id
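The annoying part of this problem is the joins, so be very clear about what is being asked. First write the columns after select as the problem requires, then work out what each one needs.
- The customer name tells us to join Customers.
- The number of contacts means joining Contacts and counting them. Count the contact column, not all columns: a user with no contacts has a NULL contact field, and count(*) would wrongly count 1 for them.
- The number of contacts whose email belongs to a customer in Customers: use sum() over an expression that checks whether the contact email is among all customer emails.
All of this must be grouped per invoice; don't forget it, since everything builds on it. Joining Customers and Contacts yields several rows per invoice_id, and grouping once combines all rows with the same invoice id for the calculation.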
select invoice_id
, customer_name
, price
, count(contact_name) as contacts_cnt
, ifnull(sum(contact_email in (select email from Customers)),0) as trusted_contacts_cnt
from Invoices left join Contacts
on Invoices.user_id = Contacts.user_id
left join Customers
on Customers.customer_id = Invoices.user_id
group by invoice_id
order by invoice_id
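We need the id and name of every student whose department id does not exist. Left join on the department id. When the Departments side is NULL, that department does not exist in Departments, so output the student's id and name.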
select Students.id
, Students.name
from Students left join Departments
on Students.department_id = Departments.id
where Departments.name is null
A left join retrieves the needed data. Any order is accepted, so output it directly.
select unique_id
, name
from Employees left join EmployeeUNI
on Employees.id = EmployeeUNI.id
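The problem statement isn't very clear, but the match is on both id and year, i.e. on both tables' primary keys. With a left outer join, a key pair that exists in the left table but not in the right finds no match, so npv comes back NULL. Replace it with 0.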
select Queries.id
, Queries.year
, ifnull(npv,0) as npv
from Queries left join NPV
on (Queries.id, Queries.year) = (NPV.id, NPV.year)
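Right join Items to get every category, then group by category. The key detail is the %w specifier of date_format(), which formats a date as the day of the week; with it we can turn the long table into a wide one, one column per weekday. One possible misconception: the right join yields categories with NULL order data, yet nothing special is needed. For those rows date_format() returns NULL, the comparison inside if is therefore not true, and if contributes 0 to the sum.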
select item_category as CATEGORY
, sum(if(date_format(order_date,'%w')=1,quantity,0)) as MONDAY
, sum(if(date_format(order_date,'%w')=2,quantity,0)) as TUESDAY
, sum(if(date_format(order_date,'%w')=3,quantity,0)) as WEDNESDAY
, sum(if(date_format(order_date,'%w')=4,quantity,0)) as THURSDAY
, sum(if(date_format(order_date,'%w')=5,quantity,0)) as FRIDAY
, sum(if(date_format(order_date,'%w')=6,quantity,0)) as SATURDAY
, sum(if(date_format(order_date,'%w')=0,quantity,0)) as SUNDAY
from Orders right join Items
on Orders.item_id = Items.item_id
group by item_category
order by item_category
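Join on seller id and on the sale being in 2020, putting the date filter in on. That way the join leaves NULLs for sellers without 2020 sales, instead of where deleting the failing rows and leaving nothing to detect. Then output the sellers whose Orders side is NULL. (The logic: for rows that fail the condition, the join fills the Orders part with NULL.)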
select seller_name
from Orders right join Seller
on Orders.seller_id = Seller.seller_id
and year(sale_date) = 2020
where Orders.seller_id is null
order by seller_name
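This also uses a left join: visits in the left table with no match in the right table get NULL on the right side. Use that NULL directly. If it is NULL, the visit had no transaction, so output the customer id, group by it, and count how many visits without a transaction each customer had.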
select customer_id
, count(*) as count_no_trans
from Visits left join Transactions
on Visits.visit_id = Transactions.visit_id
where Transactions.visit_id is null
group by customer_id
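Join as the problem describes. A box may contain a chest, but a chest never contains another chest, so no nesting needs handling. Sum each fruit on both sides. Some boxes contain no chest, so replace the chest's NULL counts with 0, then compute the totals and output.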
select sum(Boxes.apple_count + ifnull(Chests.apple_count,0)) as apple_count
, sum(Boxes.orange_count + ifnull(Chests.orange_count,0)) as orange_count
from Boxes left join Chests
on Boxes.chest_id = Chests.chest_id
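First use recursion to generate the 12 months for later use.
Next, active drivers. Extract the driver data, change every join date before 2020 to 2020-01-01, and drop dates after 2020. Then take the month and the driver id; the result is the second common table expression. Right join it with the month table and use a window function for the running total of drivers per month, grouping before counting. Some months have no new drivers, so the count can meet NULLs, and count skips them. The running total gives the number of active drivers in each month.
Then accepted rides. Natural join the two ride tables, again keep only 2020, group by month, and count accepted rides per month. The natural join drops rides that were not accepted, so no NULLs appear here. We still need every month, including months with 0 accepted rides, so right join with the month table again and replace the resulting NULLs with 0.
Finally, inner join the two tables on month and output all values.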
with recursive month as (
select 1 as mon
union
select mon + 1
from month
where mon < 12
), Dr as (
select driver_id
, month(join_date) as mon
from(select driver_id
, if(join_date < '2020-01-01', cast('2020-01-01' as date), join_date) as join_date
from Drivers
where join_date < '2021-1-1') as t1
), all_dr as (
select month.mon as month
, sum(count(driver_id)) over(order by month.mon rows between unbounded preceding and current row) as cnt
from Dr right join month
on Dr.mon = month.mon
group by month.mon
), acc_Rides as (
select month.mon as month
, ifnull(accepted_rides,0) as accepted_rides
from(select month(Rides.requested_at) as mon
, count(*) as accepted_rides
from Rides natural join AcceptedRides
where requested_at >= '2020-01-01' and requested_at < '2021-01-01'
group by 1) as t2 right join month on month.mon = t2.mon
)
select all_dr.month
, cnt as active_drivers
, accepted_rides
from acc_Rides inner join all_dr
on acc_Rides.month = all_dr.month
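The month-generating recursion and the running total can be tried out in isolation. This sketch runs in SQLite through Python's sqlite3 module rather than MySQL. It uses a hypothetical Drivers table whose join months are assumed to be already clamped to 2020, as the Dr CTE does. It counts per month first and applies the window in a separate step, which is equivalent to the sum(count(...)) over(...) form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Drivers (driver_id INTEGER PRIMARY KEY, mon INTEGER);
-- Hypothetical join months, already clamped to 2020.
INSERT INTO Drivers VALUES (10, 1), (11, 1), (12, 3), (13, 9);
""")

rows = conn.execute("""
WITH RECURSIVE month(mon) AS (
    SELECT 1
    UNION ALL
    SELECT mon + 1 FROM month WHERE mon < 12
), per_month AS (
    -- LEFT JOIN keeps months with no new drivers; COUNT skips the NULLs.
    SELECT month.mon, COUNT(Drivers.driver_id) AS joined
    FROM month LEFT JOIN Drivers ON Drivers.mon = month.mon
    GROUP BY month.mon
)
SELECT mon,
       SUM(joined) OVER (ORDER BY mon
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS active
FROM per_month
ORDER BY mon
""").fetchall()

print(rows[:4])  # [(1, 2), (2, 2), (3, 3), (4, 3)]
```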
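This problem builds on the previous one. Note that drivers repeat in the accepted-rides data, so deduplicate them. The intended metric is the number of distinct drivers with accepted rides in a month, divided by that month's number of active drivers.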
with recursive month as (
select 1 as mon
union
select mon + 1
from month
where mon < 12
), Dr as (
select driver_id
, month(join_date) as mon
from(select driver_id
, if(join_date < '2020-01-01', cast('2020-01-01' as date), join_date) as join_date
from Drivers
where join_date < '2021-1-1') as t1
), all_dr as (
select month.mon as month
, sum(count(distinct driver_id)) over(order by month.mon rows between unbounded preceding and current row) as cnt
from Dr right join month
on Dr.mon = month.mon
group by month.mon
), acc_Rides as (
select month.mon as month
, ifnull(accepted_rides,0) as accepted_rides
from(select month(Rides.requested_at) as mon
, count(distinct driver_id) as accepted_rides
from Rides natural join AcceptedRides
where requested_at >= '2020-01-01' and requested_at < '2021-01-01'
group by 1) as t2 right join month on month.mon = t2.mon
)
select all_dr.month
, ifnull(round(100*accepted_rides / cnt, 2),0) as working_percentage
from acc_Rides inner join all_dr
on acc_Rides.month = all_dr.month
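This one differs from the previous two.
- First get all months via recursion.
- Next, filter: join the accepted rides with the ride table to get accepted rides along with their request time, and restrict that time to 2020.
- In the outer query, write a window function with a fixed three-month frame. The frame looks forward (the current row and the next two), because each month's average covers the following two months.
The rows must be collapsed to one per month, which is why the window function wraps an aggregate. A month can have several rides with different distances and durations, and a plain window function would process them row by row and produce duplicates. Months without rides yield NULL, which ifnull handles.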
with recursive month as (
select 1 as mon
union
select mon + 1
from month
where mon < 12
), rids as (
select *
from Rides natural join AcceptedRides
where requested_at between '2020-01-01' and '2020-12-31'
)
select month
, average_ride_distance
, average_ride_duration
from(select mon as month
, round(sum(ifnull(sum(ride_distance),0)) over(order by mon rows between current row and 2 following)/3,2) as average_ride_distance
, round(sum(ifnull(sum(ride_duration),0)) over(order by mon rows between current row and 2 following)/3,2) as
average_ride_duration
from rids right join month on month(requested_at) = mon
group by 1) as temp
where month<=10
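The forward-looking frame (rows between current row and 2 following, divided by 3) can be mirrored in Python. This sketch uses hypothetical monthly totals. Months without rides contribute 0, matching ifnull in the query, and the divisor is always 3, matching the /3 in the SQL.

```python
def rolling_forward_average(monthly, width=3):
    """Average of a month and the following width-1 months (always divided
    by width), keyed by start month 1..len(monthly)-width+1."""
    result = {}
    for start in range(len(monthly) - width + 1):
        window = monthly[start:start + width]
        result[start + 1] = round(sum(window) / width, 2)
    return result

# Hypothetical 2020 totals: index 0 is January; months without rides are 0.
distances = [0] * 12
distances[2] = 63   # March
distances[6] = 73   # July
averages = rolling_forward_average(distances)
print(averages[1], averages[5])  # 21.0 24.33
```

Only start months 1 to 10 have a full three-month window, which corresponds to the month <= 10 filter in the outer query.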
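First write a common table expression that finds every session in which an ad was shown. The conditions are the same customer_id (the same person) and overlapping time, i.e. the ad timestamp falls between the session's start and end. Then, in the outer query, right join Playback on session id. The CTE contains every session with an ad, so the rows where it is NULL are the sessions without ads.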
with cte as(
select session_id
from Playback inner join Ads
on Playback.customer_id = Ads.customer_id
and timestamp >= start_time
and timestamp <= end_time
)
select Playback.session_id
from cte right join Playback
on cte.session_id = Playback.session_id
where cte.session_id is null
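Step one: find every employee who has a manager and earns less than 30000. Then left join back to Employees to find those whose manager is no longer present. Their manager fails to join, so the manager side is NULL. Keep those rows and output the left table's employee_id.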
select temp.employee_id
from(select *
from Employees
where salary < 30000 and manager_id is not null) as temp left join Employees on temp.manager_id = Employees.employee_id
where Employees.employee_id is null
order by employee_id
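Follow the problem step by step. For each school, find the exam scores whose student count fits within the school's capacity and take the minimum of those scores, or -1 if there is none. Output directly.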
select school_id
, ifnull(min(score),-1) as score
from Schools left join Exam
on Schools.capacity >= student_count
group by school_id
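A left join's properties solve this quickly. Join on the same account_id, paying attention to how NULLs and filters interact. A condition in on turns every unmatched right-table row into NULL, but that alone is not enough here: values that are wrong from the start must be removed first.
So first filter the left table down to subscriptions that ended in 2021. Only the left table needs this step; without a full outer join, values that would make the join wrong must be removed beforehand. Then handle NULLs in the join by putting the 2021 condition in on: if stream_date is not in 2021, the right side becomes NULL. Finally, count the rows whose right side is NULL.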
select count(*) as accounts_count
from Subscriptions left join Streams
on Subscriptions.account_id = Streams.account_id
and year(stream_date) = 2021
where year(end_date) = 2021
and Streams.account_id is null
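This one rewards thinking more broadly. First use a window function to count each customer's type 0 orders, then filter again for that customer's type 1 orders. The result is the set of customers with at least one type 0 and one type 1 order. Outer-joining this CTE with Orders gives two cases:
- Non-NULL: members of the CTE. For these, output only their type 0 orders, since their type 1 orders must be dropped.
- NULL: customers not in the CTE. There is no restriction on these, so output all their orders.
This approach avoids not in and exists, and should be relatively efficient.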
with temp as (
select customer_id
from(select customer_id
, order_type
, sum(order_type=0) over(partition by customer_id) as sum_o
from Orders) as t1
where sum_o > 0 and order_type = 1
)
select order_id
, Orders.customer_id
, order_type
from Orders left join temp
on Orders.customer_id = temp.customer_id
where temp.customer_id is not null
and order_type = 0
union
select order_id
, Orders.customer_id
, order_type
from Orders left join temp
on Orders.customer_id = temp.customer_id
where temp.customer_id is null
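First analyze the table: the required output is salary, so the query ultimately targets salary. Rank it directly with a window function. Keep three details in mind:
- Test case 2 expects a NULL output, so wrap the query in an outer select; a scalar subquery yields NULL when no row exists.
- For test case 4, deduplicate so that several rows are not returned, which would be an error.
- Use dense_rank() here; test case 8 checks exactly this.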
select (select distinct salary
from(select id
, salary
, dense_rank() over(order by salary desc) as rk
from Employee) as temp
where rk = 2) as SecondHighestSalary
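A second approach uses counting: a salary is the second highest when exactly one distinct salary is greater than it. This approach is very slow and not recommended.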
select
(select distinct salary
from Employee as t1
where (select count(distinct salary) from Employee as t2 where t2.salary > t1.salary) = 1) as SecondHighestSalary
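This is the same as the previous problem, with the same idea and difficulty; the only new part is the syntax of a function. For reference, below are a window-function version, identical to the previous problem, and a version that uses declarations inside the function body.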
CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
RETURN (
select distinct salary
from(select salary
, dense_rank() over(order by salary desc) as rk
from Employee) as temp
where rk = N
);
END
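This version declares a variable inside the function body (before RETURN) and sets it to N - 1. It finishes with limit, which does not accept an expression such as N - 1 directly.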
CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
declare m INT;
set m = n - 1;
RETURN (
select distinct salary
from Employee
order by salary desc
limit m,1
);
END
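Both functions boil down to one rule: take the distinct salaries in descending order and pick position N, or NULL if no such position exists. Below is a Python sketch of that rule with a hypothetical nth_highest helper. Treating N < 1 as "no answer" is an assumption of the sketch; the SQL versions above do not handle that case.

```python
def nth_highest(salaries, n):
    """Mimic SELECT DISTINCT salary ... ORDER BY salary DESC LIMIT n-1, 1,
    wrapped so that a missing row becomes None (NULL)."""
    if n < 1:
        return None  # assumption: the SQL versions do not handle N < 1
    distinct_desc = sorted(set(salaries), reverse=True)
    offset = n - 1  # same role as `set m = n - 1`
    return distinct_desc[offset] if offset < len(distinct_desc) else None

print(nth_highest([100, 200, 300, 300], 2))  # 200
```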
Note that the ranking must have no gaps in its numbers, so dense_rank() solves it immediately.
select score
, dense_rank() over(order by score desc) as `rank`
from Scores
Rank with a window function and join Department to get each department's top earners (department, name and salary), then output.
select Department
, Employee
, salary
from(select Department.name as Department
, Employee.name as Employee
, salary
, rank() over(partition by departmentId order by salary desc) as rk
from Employee inner join Department on Employee.departmentId = Department.id) as temp
where rk = 1
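Same idea as the previous problem. The difference is that salaries are ranked by level with no gaps in the numbering, so use dense_rank(). Reuse the previous query and keep the top three.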
select Department
, Employee
, salary
from(select Department.name as Department
, Employee.name as Employee
, salary
, dense_rank() over(partition by departmentId order by salary desc) as rk
from Employee inner join Department on Employee.departmentId = Department.id) as temp
where rk <= 3
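A classic window-function problem. Partition by player id and order by date ascending. For each player, sum the games played over the current row and all earlier rows, then output; it is all done in one pass.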
select player_id
, event_date
, sum(games_played) over(partition by player_id order by event_date) as games_played_so_far
from Activity
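For comparison, here is a correlated-subquery version. It is extremely slow (4700 ms, versus 600 ms for the window function) and not recommended at all.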
select player_id
, event_date
, (select sum(games_played) from Activity as a2 where a1.player_id=a2.player_id and a2.event_date<=a1.event_date) as games_played_so_far
from Activity as a1
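First partition by company and number the rows within each company by salary. Use ascending order; in my tests the reverse order failed. Let max be the largest row number. For an even max, the median rows are at max/2 and max/2 + 1; for an odd max, the single median is at floor(max/2) + 1. Check the position and output the salary.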
with temp as (select id
, company
, salary
, row_number() over(partition by company order by salary) as rk
from Employee)
, max_rk as (
select company
, max(rk) as mk
from temp
group by company
)
select id
, company
, salary
from temp
where rk in (case mod((select mk from max_rk where company=temp.company),2)
when 0 then floor((select mk from max_rk where company=temp.company)/2) end
, floor((select mk from max_rk where company=temp.company)/2)+1)
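Here is a neat, concise alternative: open two windows, one for the row number and one for the count, then check rk against cnt.
- When cnt is odd, rk can never equal cnt/2 or cnt/2 + 1, because both are fractional. It can equal cnt/2 + 0.5, which is exactly the single median.
- When cnt is even, rk cannot equal cnt/2 + 0.5, but it can equal the other two.
It runs in the same time as the version above, but it is much more concise.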
with temp as (select id
, company
, salary
, row_number() over(partition by company order by salary) as rk
, count(*) over(partition by company) as cnt
from Employee)
select id
, company
, salary
from temp
where rk in (cnt/2,cnt/2+0.5,cnt/2+1)
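The predicate rk in (cnt/2, cnt/2+0.5, cnt/2+1) can be checked for small counts. This Python sketch uses a hypothetical median_ranks helper. It relies on true division, just as the SQL relies on MySQL's / returning a fractional value rather than truncating.

```python
def median_ranks(cnt):
    """Row numbers rk in 1..cnt with rk IN (cnt/2, cnt/2 + 0.5, cnt/2 + 1)."""
    targets = {cnt / 2, cnt / 2 + 0.5, cnt / 2 + 1}
    return [rk for rk in range(1, cnt + 1) if rk in targets]

for cnt in range(1, 7):
    print(cnt, median_ranks(cnt))
# 1 [1]
# 2 [1, 2]
# 3 [2]
# 4 [2, 3]
# 5 [3]
# 6 [3, 4]
```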
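First group the votes by candidate and rank the candidates by vote count. Wrap this in an outer select and keep rk = 1, the candidate with the most votes. Then join on Candidate.id = candidateId to get the name, and output.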
select name
from(select candidateId
, rank() over(order by count(*) desc) as rk
from Vote
group by candidateId) as temp
inner join Candidate
on Candidate.id = temp.candidateId
where rk = 1
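A classic window-function ranking problem. Since we need to compare questions, group by question and compute the answer rate (answers divided by shows). Then rank with a window function and take the top-ranked question_id.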
select question_id as survey_log
from(select question_id
, row_number() over(order by sum(if(action='answer',1,0))/sum(if(action='show',1,0)) desc, question_id) as rk
from SurveyLog
group by question_id) as temp
where rk = 1
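This one is tedious, and window functions save a lot of work. The idea is simple. Partition by month and compute the company-wide average salary, and partition by month and department for the department average. That gives each department's monthly average, which is then compared in the outer query. The outermost case when just maps each comparison to its label. Still a lot of code!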
select distinct pay_month
, department_id
, case when amount_avg_dept > amount_avg_all then 'higher'
when amount_avg_dept = amount_avg_all then 'same'
else 'lower' end as comparison
from(select date_format(pay_date,'%Y-%m') as pay_month
, department_id
, avg(amount) over(partition by date_format(pay_date,'%Y-%m'), department_id) as amount_avg_dept
, avg(amount) over(partition by date_format(pay_date,'%Y-%m')) as amount_avg_all
from Salary inner join Employee
on Salary.employee_id = Employee.employee_id) as temp
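Split the students by continent, number each group, and stitch the groups together by row number. When numbering, order the names alphabetically. When merging, a missing rk yields NULL. The left joins start from America, so this relies on America having at least as many students as the other continents.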
with America_stu as (
select name as America
, row_number() over(order by name) as rk
from Student
where continent='America'
), Asia_stu as (
select name as Asia
, row_number() over(order by name) as rk
from Student
where continent='Asia'
), Europe_stu as (
select name as Europe
, row_number() over(order by name) as rk
from Student
where continent='Europe'
)
select America
, Asia
, Europe
from America_stu
left join Asia_stu
on America_stu.rk = Asia_stu.rk
left join Europe_stu
on America_stu.rk = Europe_stu.rk
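The same stitching can be sketched in Python with itertools.zip_longest, which pads shorter columns with None just as the left joins on rk produce NULL. The sample students are hypothetical. Unlike the SQL, which starts from America, zip_longest pads to whichever column is longest.

```python
from itertools import zip_longest

def pivot_by_continent(students):
    """students: list of (name, continent). Returns rows (America, Asia, Europe)."""
    columns = {
        c: sorted(name for name, cont in students if cont == c)  # like ORDER BY name
        for c in ("America", "Asia", "Europe")
    }
    # zip_longest pads shorter columns with None, like the LEFT JOINs on rk.
    return list(zip_longest(columns["America"], columns["Asia"], columns["Europe"]))

rows = pivot_by_continent([
    ("Jane", "America"), ("Pascal", "Europe"), ("Xi", "Asia"), ("Jack", "America"),
])
print(rows)  # [('Jack', 'Xi', 'Pascal'), ('Jane', None, None)]
```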
The ratio can be computed simply by calling avg() on a boolean expression, which gives the share in one step.
select round(100*avg(order_date = customer_pref_delivery_date),2) as immediate_percentage
from Delivery
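The approach is simple. Use a window function to pick each customer's first order, then compute the ratio directly with avg(). avg is handy here: the expression inside evaluates to 1 when it holds and 0 otherwise, and avg divides that sum by the number of rows, which is exactly the share. A pleasantly simple trick.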
with temp as (select customer_id
, order_date
, customer_pref_delivery_date
, row_number() over(partition by customer_id order by order_date) as rk
from Delivery)
select round(100*avg(order_date=customer_pref_delivery_date),2) as immediate_percentage
from temp
where rk=1
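The same product can have several sales records in the same year. A plain group by with an aggregate would collapse each product to one row and lose the individual quantity and price values. Use a window function instead: rank each product's sales by year and keep rk = 1, i.e. every sale from each product's first year. No further aggregation of the remaining rows is needed.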
select product_id
, year as first_year
, quantity
, price
from(select product_id
, year
, quantity
, price
, rank() over(partition by product_id order by year) as rk
from Sales) as t1
where rk = 1
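The goal is to find, for each project, the employees with the most experience. First join to get the project id, then rank by years of experience in descending order. Ties at the top must all be output, so use rank() rather than row_number(). After ranking, rk = 1 gives the required rows.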
with temp as (
select Project.project_id
, Employee.employee_id
, rank() over(partition by project_id order by experience_years desc) as rk
from Employee inner join Project
on Employee.employee_id = Project.employee_id
)
select project_id
, employee_id
from temp
where rk = 1
Group by seller, sum the sales, rank by the total, and take rk = 1. Those sellers have the highest total sales.
select seller_id
from(select seller_id
, rank() over(order by sum(price) desc) as rk
from Sales
group by seller_id) as temp
where rk = 1
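Partition by student_id with a window function, order by grade descending, and break ties by the smaller course_id. In the outer query, take rk = 1 to get each student's highest grade and its course, then sort.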
select student_id
, course_id
, grade
from(select student_id
, course_id
, grade
, row_number() over(partition by student_id order by grade desc, course_id) as rk
from Enrollments) as temp
where rk = 1
order by student_id
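First compute the average for each activity. event_type identifies the activity, so partition by it and average occurences. An aggregate with group by would collapse the per-business rows and their occurences values, so use a window function instead. Each row then carries both its own occurences and its event type's average, which makes the comparison easy.
Then group by business id. A row with occurences > average meets the problem's condition of being above the average activity, and a business with at least two such event types is active. Filter that with having.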
select business_id
from(select business_id
, event_type
, occurences
, avg(occurences) over(partition by event_type) as avg_act
from Events) as temp
group by business_id
having sum(occurences > avg_act) >= 2
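First filter the data: the target date is 2019-08-16, so keep only price changes on or before that date. Then rank each product's changes by change_date descending and keep rk = 1 to get the price set by its latest change up to that date. Finally, add back the products that have no change on or before that date (they disappear from the filtered data); their price is the default 10.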
with new_products as (
select product_id
, new_price
from(select product_id
, new_price
, rank() over(partition by product_id order by change_date desc) as rk
from Products
where change_date <= '2019-08-16') as temp
where rk = 1
), all_products as (
select distinct product_id
from Products
)
select all_products.product_id
, ifnull(new_products.new_price,10) as price
from all_products left join new_products
on all_products.product_id = new_products.product_id
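Like the previous problem this one is not hard; the main thing is to read the conditions and not overlook the given boarding order (turn), instead of trying to work backwards from the weights as I first did. Following the statement, use a window function to compute the running total of weight in turn order, keep the rows whose running total is at most 1000 in an outer query, then rank by the running total descending and take the top row. That is the last person whose cumulative weight does not exceed 1000, i.e. the last person who can board the bus.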
select person_name
from(select person_name
, rank() over(order by sum_w desc) as rk
from(select person_name
, sum(weight) over(order by turn) as sum_w
from Queue) as temp
where sum_w <= 1000) as t1
where rk = 1
Partition by team_id and count to get the size of each employee's team, then output it.
select employee_id
, count(*) over(partition by team_id) as team_size
from Employee
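This problem introduces the RANGE frame clause for window functions. RANGE defines the frame by values rather than by row positions, so every row whose date falls within the range is included, and each 7-day running total can be computed directly with a window function. One catch remains: output has to start from the first day that has a full 7 days behind it, i.e. the day 6 days after the first day. My first attempt used dense_rank(): since the statement says there is at least one customer every day, taking the row ranked 7 seemed enough. The last test case failed, though. Looking at its data, the 2nd and 3rd were missing between the 1st and the 7th, so ranking is not reliable. Instead, take the earliest date with min() over the whole table, add 6 days, and filter on that in the outer query. Round the average, and remember DISTINCT, because the same day can have several customer rows.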
select distinct visited_on
, sum_amount as amount
, round(sum_amount/7,2) as average_amount
from(select visited_on
, sum(amount) over(order by visited_on range between interval '6' day preceding and current row) as sum_amount
, min(visited_on) over() as min_v
from Customer) as temp
where visited_on >= date_add(min_v, interval 6 day)
order by visited_on
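The difference between a RANGE frame and a ROWS frame shows up as soon as a date is missing. The sketch below is an illustration under assumptions: it uses Python's sqlite3 module rather than MySQL, and since SQLite's RANGE offsets need a numeric sort key, julianday() turns each date into a day number, so "6 preceding" plays the role of MySQL's interval 6 day. It needs SQLite 3.28 or newer; the data is made up.

```python
import sqlite3

# Hypothetical daily totals with 2019-01-03 missing.
DAYS = ["2019-01-01", "2019-01-02", "2019-01-04", "2019-01-05",
        "2019-01-06", "2019-01-07", "2019-01-08"]


def rolling_sums(days):
    con = sqlite3.connect(":memory:")
    con.execute("create table Customer (visited_on text, amount int)")
    con.executemany("insert into Customer values (?, 10)", [(d,) for d in days])
    rows = con.execute("""
        select visited_on,
               -- value-based frame: rows whose date is within the last 7 calendar days
               sum(amount) over (order by julianday(visited_on)
                                 range between 6 preceding and current row) as by_range,
               -- position-based frame: the current row and the 6 rows before it
               sum(amount) over (order by visited_on
                                 rows between 6 preceding and current row) as by_rows
        from Customer
        order by visited_on""").fetchall()
    con.close()
    return rows


for row in rolling_sums(DAYS):
    print(row)
# last row: ('2019-01-08', 60, 70)
```

On 2019-01-08 the RANGE frame covers 01-02 to 01-08 (six rows, because 01-03 is absent), while the ROWS frame reaches back to 01-01 and overstates the 7-day total.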
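This problem has to be written in two parts. First the reviewer: ties are broken by the lexicographic order of names, so join Users up front, group by name, order by the number of ratings descending and then by name, and take the top row. The two parts are combined with UNION ALL rather than UNION, because a user name can coincide with a movie title and deduplication would drop a row. For the movie with the highest average rating, first filter to February 2020; date_format() works, or, as I did, left(created_at, 7) = '2020-02'. Ties are broken by title, so join Movies, group by title, compute the average rating, rank with a window function, and take the top movie. UNION ALL the two results and output.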
select name as results
from(select name
, row_number() over(order by count(*) desc, name) as rk
from MovieRating inner join Users
on MovieRating.user_id = Users.user_id
group by name) as t1
where rk = 1
union all
select title
from(select title
, row_number() over(order by avg(rating) desc, title) as rk
from MovieRating inner join Movies
on MovieRating.movie_id = Movies.movie_id
where left(created_at, 7) = '2020-02'
group by title) as t2
where rk = 1
The statement asks for each gender's running total of points by day, so use a window function partitioned by gender and ordered by day.
select gender
, day
, sum(score_points) over(partition by gender order by day)as total
from Scores
order by gender, day
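The answer must be neither the most nor the least popular activity. First group by activity to count participants, then rank the counts both ascending (rank 1 is the fewest) and descending (rank 1 is the most), and keep activities whose rank is not 1 in either order.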
select activity
from(select activity
, rank() over(order by count(*)) as rk1
, rank() over(order by count(*) desc) as rk2
from Friends
group by activity) as temp
where rk1 <> 1 and rk2 <> 1
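Select the needed columns, number each user's activities by startDate descending with one window function, and count each username's activities with another. In the outer query, a user with only one activity gets that activity; a user with two or more gets the second most recent one.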
select username
, activity
, startDate
, endDate
from(select username
, activity
, startDate
, endDate
, count(*) over(partition by username) as cnt
, row_number() over(partition by username order by startDate desc) as rk
from UserActivity) as temp
where rk=if(cnt=1,1,2)
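Window functions can be slow in some cases, so here is an alternative that uses fewer of them and may run a bit faster. Users with exactly one activity are handled separately from users with several. For single-activity users, grouping is safe because each group contains only that one row. For users with several activities, row_number() picks the second most recent one; for single-activity users, rk = 2 simply matches nothing (no NULL row is produced), so the two halves never overlap. This version uses only one window function.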
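select username
, max(activity) as activity -- each group here has exactly one row, so max() just returns it
, max(startDate) as startDate
, max(endDate) as endDate
from UserActivity
group by username
having count(*)=1
union all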
select username
, activity
, startDate
, endDate
from(select username
, activity
, startDate
, endDate
, row_number() over(partition by username order by startDate desc) as rk
from UserActivity) as temp
where rk = 2
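First use window functions to find, for each exam, the students with the highest or the lowest score. Students who never took an exam must be excluded, so join Exam with Student to restrict the candidates to students who took at least one exam. Then keep those not in the set of highest or lowest scorers, and output their ids and names.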
with score_s as (
select student_id
from(select student_id
, rank() over(partition by exam_id order by score desc) as rk
from Exam) as temp
where rk = 1
union
select student_id
from(select student_id
, rank() over(partition by exam_id order by score) as rk
from Exam) as temp
where rk = 1
)
select distinct Student.student_id, student_name
from Exam inner join Student
on Exam.student_id = Student.student_id
where Student.student_id not in (select student_id from score_s)
order by Student.student_id
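First compute each company's maximum salary with a window function. In the outer query pick the multiplier from the tax bracket (1 when max_salary is below 1000, 0.76 from 1000 to 10000, 0.51 above 10000, i.e. tax rates of 0%, 24% and 49%), multiply, round, and output.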
select company_id
, employee_id
, employee_name
, round(salary * case when max_salary < 1000 then 1
when max_salary between 1000 and 10000 then 0.76
else 0.51 end) as salary
from(select *
, max(salary) over(partition by company_id) as max_salary
from Salaries) as temp
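Join to get the product names, partition by product, and order by order_date descending; rk = 1 gives each product's most recent orders. A product can have several orders on its latest date, and rank() keeps all of them.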
select product_name
, product_id
, order_id
, order_date
from(select product_name
, Orders.product_id
, order_id
, order_date
, rank() over(partition by Orders.product_id order by order_date desc) as rk
from Orders inner join Products
on Orders.product_id = Products.product_id) as temp
where rk = 1
order by product_name, product_id, order_id
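Same as the previous problem: join to get the needed data, partition by customer, and order by order_date descending so the most recent orders come first. Keeping rk <= 3 returns each customer's three most recent orders, or all of them if the customer has fewer than three.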
select customer_name
, customer_id
, order_id
, order_date
from(select name as customer_name
, Orders.customer_id
, order_id
, order_date
, row_number() over(partition by Orders.customer_id order by order_date desc) as rk
from Orders inner join Customers
on Orders.customer_id = Customers.customer_id) as temp
where rk <= 3
order by customer_name, customer_id, order_date desc
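First group by customer and product and count the rows, i.e. how many times the customer ordered that product. Use this count as the ORDER BY of a window function to rank each customer's products, keep the top-ranked ones (the products each customer ordered most often, ties included), and join Products to get the product names.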
select temp.customer_id
, temp.product_id
, product_name
from(select customer_id
, product_id
, rank() over(partition by customer_id order by count(*) desc) as rk
from Orders
group by customer_id, product_id) as temp inner join Customers on Customers.customer_id = temp.customer_id
inner join Products on Products.product_id = temp.product_id
where rk = 1
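This one can be solved with window functions alone. Use lead() to build a helper column that shifts visit_date up by one row within each user, so each row holds a visit date and the next one; their difference is the gap between consecutive visits. Because lead() shifts by one row, each user's last visit has no next date and gets NULL. Replace that NULL with the given "today" date, so the gap between the last visit and today also counts, as the problem requires. The outer query then groups by user and takes the largest gap.

Note that the gap column cannot be named window, because WINDOW is a reserved keyword.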
select user_id
, max(lag_d) as biggest_window
from(select user_id
, datediff(ifnull(lead(visit_date, 1) over(partition by user_id order by visit_date),'2021-1-1' ), visit_date) as lag_d
from UserVisits) as temp
group by user_id
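The lead()-with-default idea can be tried with a small sketch. It is an assumption-laden stand-in: Python's sqlite3 module instead of MySQL, julianday() subtraction instead of datediff(), the third argument of lead() instead of ifnull(), and made-up visit data.

```python
import sqlite3

# Hypothetical visits: (user_id, visit_date).
VISITS = [(1, "2020-11-28"), (1, "2020-10-20"), (1, "2020-12-03"),
          (2, "2020-10-05"), (2, "2020-12-09"), (3, "2020-11-11")]


def biggest_window(today="2021-01-01"):
    con = sqlite3.connect(":memory:")
    con.execute("create table UserVisits (user_id int, visit_date text)")
    con.executemany("insert into UserVisits values (?, ?)", VISITS)
    rows = con.execute("""
        select user_id, cast(max(gap) as int) as biggest_window
        from (select user_id,
                     -- next visit of the same user, or "today" after the last visit
                     julianday(lead(visit_date, 1, ?) over (partition by user_id
                                                            order by visit_date))
                       - julianday(visit_date) as gap
              from UserVisits) as t
        group by user_id
        order by user_id""", (today,)).fetchall()
    con.close()
    return rows


print(biggest_window())  # [(1, 39), (2, 65), (3, 51)]
```

For user 3 there is only one visit, so the whole window is the 51 days from 2020-11-11 to the given "today".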
Rank with a window function partitioned by calendar date and ordered by amount descending, keep the rows ranked 1, and mind the final sort order.
select transaction_id
from(select transaction_id
, rank() over(partition by date(day) order by amount desc) as rk -- date(), not day(): day() is only the day of the month
from Transactions) as temp
where rk = 1
order by transaction_id
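The statement only keeps teams with at least two people, so first use count() as a window function partitioned by salary, and in the outer query keep rows whose count exceeds 1. Second, the team id is determined by salary, and dense_rank() assigns it without gaps: equal salaries share a rank and no numbers are skipped, so no team id goes missing. Because the outer window is evaluated after the outer WHERE, salaries held by a single employee do not consume a team id.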
select employee_id
, name
, salary
, dense_rank() over(order by salary) as team_id
from(select *
, count(*) over(partition by salary) as cnt
from Employees) as temp
where cnt > 1
order by team_id, employee_id
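The statement is worded vaguely. What it means is: compute each order's average quantity, take the largest of these averages as one shared threshold, and output the orders whose maximum quantity is strictly greater than that threshold.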
select distinct order_id
from(select order_id
, max(avg_q) over() as max_avg_q
, max_q
from(select order_id
, avg(quantity) over(partition by order_id) as avg_q
, max(quantity) over(partition by order_id) as max_q
from OrdersDetails) as temp) as t1
where max_q > max_avg_q
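Following the statement, accumulate per account in ascending day order, with the frame covering the current row and every earlier row. The if() inside sum() checks type: for Deposit the amount is added, otherwise (a withdrawal) it is subtracted from the balance.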
select account_id
, day
, sum(if(type='Deposit',amount,-amount)) over(partition by account_id order by day rows between unbounded preceding and current row) as balance
from Transactions
order by account_id, day
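The running-balance pattern, a conditional sum() with an explicit ROWS frame, can be checked with a short sketch. It runs in Python's sqlite3 module rather than MySQL, so CASE replaces MySQL's if(); the transactions are made-up sample rows.

```python
import sqlite3


def balances():
    con = sqlite3.connect(":memory:")
    con.execute("create table Transactions (account_id int, day text, type text, amount int)")
    con.executemany("insert into Transactions values (?, ?, ?, ?)", [
        (1, "2021-11-07", "Deposit", 2000),
        (1, "2021-11-09", "Withdraw", 1000),
        (1, "2021-11-11", "Deposit", 3000),
        (2, "2021-12-07", "Deposit", 7000),
        (2, "2021-12-12", "Withdraw", 7000),
    ])
    rows = con.execute("""
        select account_id, day,
               -- deposits add, withdrawals subtract; the frame runs from the first row to this one
               sum(case when type = 'Deposit' then amount else -amount end)
                   over (partition by account_id order by day
                         rows between unbounded preceding and current row) as balance
        from Transactions
        order by account_id, day""").fetchall()
    con.close()
    return rows


for row in balances():
    print(row)
```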
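First rank the users within each gender by user_id, then in the outer query sort by rk and then by gender. The second sort key works because in this problem's schema gender is an ENUM('female', 'other', 'male') column, and MySQL sorts an ENUM by the order in which its values are declared, which is exactly the required female, other, male order. (It is not ASCII order: alphabetically the values would sort as female, male, other.) This relies on the schema, so a more conventional approach follows for reference.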
select user_id
, gender
from(select user_id
, gender
, rank() over(partition by gender order by user_id) as rk
from Genders) as temp
order by rk, gender
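The conventional approach is to build a lookup table that maps each gender to a number and sort by that number; the rest is the same as above, with the extra step of building the lookup table.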
with temp as (
select 'female' as sex
, 1 as round
union
select 'other'
, 2
union
select 'male'
, 3
), all_genders as (
select user_id
, gender
, round
, rank() over(partition by gender order by user_id) as rk
from temp inner join Genders
on Genders.gender = temp.sex
)
select user_id
, gender
from all_genders
order by rk, round
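From the time condition, a passenger can board a bus only if they arrive no later than the bus, i.e. passenger arrival_time <= bus arrival_time. Use this as a non-equi join condition, so each passenger is paired with every bus that arrives at or after them. Since a passenger takes the first bus that comes, number each passenger's candidate buses by bus arrival time and keep rk = 1; that is the bus each passenger boards. Then group by bus and count, right join Buses in the outer query so buses with no passengers appear, replace their NULL counts with 0, and sort.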
select Buses.bus_id
, ifnull(passengers_cnt,0) as passengers_cnt
from(select bus_id
, count(*) as passengers_cnt
from(select passenger_id
, bus_id
, row_number() over(partition by passenger_id order by Buses.arrival_time) as rk
from Buses inner join Passengers
on Buses.arrival_time >= Passengers.arrival_time) as temp
where rk = 1
group by bus_id) as t1 right join Buses
on Buses.bus_id = t1.bus_id
order by bus_id
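Build two CTEs, one per column, each numbered in its own required order (first_col ascending, second_col descending), then join them on the row number and output.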
with first as (
select first_col
, row_number() over(order by first_col) as rk
from Data
), second as (
select second_col
, row_number() over(order by second_col desc) as rk
from Data
)
select first_col
, second_col
from first inner join second
on first.rk = second.rk
Partition by city, order by degree descending and then by day ascending, and keep the rows ranked 1.
select city_id
, day
, degree
from(select city_id
, day
, degree
, rank() over(partition by city_id order by degree desc, day) as rk
from Weather) as max_de
where rk = 1
order by city_id
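Use two window functions, one for each student's rank within the department and one for the department size. Compute the percentage as 100 * (rank - 1) / (size - 1), replace the NULL that a one-student department produces (division by zero) with 0, round, and output.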
select student_id
, department_id
, ifnull(round(100*(rk1-1)/(cnt-1),2),0) as percentage
from(select student_id
, department_id
, rank() over(partition by department_id order by mark desc) as rk1
, count(*) over(partition by department_id) as cnt
from Students) as temp
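Write a CTE that gives each vote its weight: a voter who voted for n candidates gives each of them 1/n, and a voter with no candidate gives 0. Then sum the weights per candidate, rank by the total, and output the candidates with the most votes.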
with points as (
select voter
, ifnull(1 / count(candidate) over(partition by voter),0) as pt
, candidate
from Votes
)
select candidate
from(select candidate
, rank() over(order by sum(pt) desc) as rk
from points
group by candidate) as temp
where rk = 1
order by candidate
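Subtracting ranking results requires a type conversion. First rank all teams by their old points (ties broken by name). Then join with the points changes, rank again by the updated points, and subtract the two ranks to get the change. The tricky part is that MySQL's ranking functions such as rank() and row_number() return unsigned integers, so a subtraction that would go negative raises an out-of-range error. Convert both with cast(... as signed) first, then subtract and output.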
with rk_old as (
select team_id
, name
, points
, row_number() over(order by points desc, name ) as rk
from TeamPoints
), rk_new as (
select rk_old.team_id
, rk_old.name
, cast(rk as signed) - cast(row_number() over(order by points + points_change desc, rk_old.name) as signed) as rank_diff
from rk_old inner join PointsChange
on rk_old.team_id = PointsChange.team_id
)
select *
from rk_new
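Another classic rank-and-group problem that dense_rank() handles quickly. Within each num, id and rk both increase by 1 along a run of consecutive ids, so id - rk is constant within the run; grouping by (num, id - rk) and keeping groups of at least 3 finds the numbers that appear three times in a row. The problem itself is easy, but the test cases are nasty and probe deduplication and similar edge cases, so be careful.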
select distinct num as ConsecutiveNums
from(select id
, num
, dense_rank() over(partition by num order by id) as rk
from Logs) as temp
group by num,(id - cast(rk as signed))
having count(*) >= 3
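The difficulty here is handling runs of variable length. First, a CTE drops rows with fewer than 100 people. Then, because id and rk (dense_rank() over id) both increase by 1 along a run of consecutive ids, id - rk is constant within each run, so counting rows per id - rk gives the length of the run each id belongs to. Runs shorter than 3 are dropped and the rest are what we need. I compute that count with count() as a window function: it follows each window without collapsing the rows, takes a single comparison, and needs no extra nested subquery, so rows in runs of length 3 or more can be output directly.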
with temp as (
select id
, visit_date
, people
from Stadium
where people >= 100
)
select id
, visit_date
, people
from(select id
, visit_date
, people
, count(*) over(partition by (id-rk)) as cnk
from(select id
, visit_date
, people
, dense_rank() over(order by id) as rk
from temp) as t1) as t2
where cnk>=3
order by visit_date
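Exactly the same approach as the previous problem: keep only the free seats, apply the same formula (seat number minus rank inside, a count per difference outside), and output the seats in runs of the required length, at least 2. Again, count() is used as a window function to avoid another level of subquery.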
with free_seat as (
select seat_id
from Cinema
where free = 1
)
select seat_id
from(select seat_id
, count(*) over(partition by (seat_id-rk)) as cnt
from(select seat_id
, dense_rank() over(order by seat_id) as rk
from free_seat) as t1) as t2
where cnt>=2
order by seat_id
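This is a standard consecutive-run problem. Handle the two states separately, filtering each to 2019. Add the helper rank column and group each run by date minus rank; each run's start is its earliest date, and adding the run's count minus 1 days gives its end date. Union the two results and sort before output.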
select 'failed' as period_state
, min(fail_date) as start_date
, date_add(min(fail_date),interval count(*)-1 day) as end_date
from(select fail_date
, dense_rank() over(order by fail_date) as rk
from Failed
where fail_date between '2019-01-01' and '2019-12-31') as temp
group by date_sub(fail_date, interval rk day)
union
select 'succeeded'
, min(success_date)
, date_add(min(success_date),interval count(*)-1 day)
from(select success_date
, dense_rank() over(order by success_date) as rk
from Succeeded
where success_date between '2019-01-01' and '2019-12-31') as temp
group by date_sub(success_date, interval rk day)
order by start_date
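A classic consecutive-range problem: subtract dense_rank() from the number. Within a run of consecutive numbers, both the number and dense_rank() increase by 1 per row, so their difference is the same for every row of the run. Grouping by that difference yields one group per range; the start is the smallest number in the group, and the end is the start plus the group's count minus 1.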
select min(log_id) as start_id
, min(log_id)+count(*)-1 as end_id
from(select log_id
, dense_rank() over(order by log_id) as rk
from Logs) as temp
group by (log_id - rk)
order by start_id
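The "number minus rank" trick can be checked in a few lines. This sketch runs the query in Python's sqlite3 module rather than MySQL, uses row_number() (equivalent to dense_rank() here because the ids are distinct), and takes min()/max() per group; the input ids are made up.

```python
import sqlite3


def continuous_ranges(ids):
    con = sqlite3.connect(":memory:")
    con.execute("create table Logs (log_id int primary key)")
    con.executemany("insert into Logs values (?)", [(i,) for i in ids])
    rows = con.execute("""
        select min(log_id) as start_id, max(log_id) as end_id
        from (select log_id,
                     -- constant within each run of consecutive ids
                     log_id - row_number() over (order by log_id) as grp
              from Logs) as t
        group by grp
        order by start_id""").fetchall()
    con.close()
    return rows


print(continuous_ranges([1, 2, 3, 7, 8, 10]))  # [(1, 3), (7, 8), (10, 10)]
```

Here grp is 0 for 1, 2, 3, then 3 for 7, 8, and 4 for 10, so each run falls into its own group.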
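Apply the standard technique for variable-length runs: dense_rank() over login_date per user. Since login_date and rk increase together along consecutive days, login_date minus rk days is the same for every row of a run, and counting each run gives its length; a count of 5 or more means the id logged in on at least 5 consecutive days. However, a user can log in several times on the same day, so 5 rows in a run do not guarantee 5 distinct days. A CTE therefore first deduplicates to one row per user per day. Finally join Accounts to get the names and output.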
with daily_logins as (
select distinct id
, login_date
from Logins
)
select distinct temp.id
, name
from(select id
, login_date
, dense_rank() over(partition by id order by login_date) as rk
from daily_logins) as temp inner join Accounts
on temp.id = Accounts.id
group by temp.id,date_sub(login_date,interval rk day)
having count(*) >= 5
order by temp.id
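Why the deduplicating CTE matters can be shown with a sketch that runs the same grouping with and without it. It uses Python's sqlite3 module instead of MySQL, with julianday() standing in for date_sub(); the login data is hypothetical, and user 1 logs in twice on some days.

```python
import sqlite3

# Hypothetical logins: user 7 has 5 distinct consecutive days, user 1 only 3 (with repeats).
LOGINS = [
    (7, "2020-05-30"), (7, "2020-05-31"), (7, "2020-06-01"),
    (7, "2020-06-02"), (7, "2020-06-02"), (7, "2020-06-03"),
    (1, "2020-05-30"), (1, "2020-05-30"), (1, "2020-05-31"),
    (1, "2020-05-31"), (1, "2020-06-01"),
]


def users_with_streak(logins, min_days=5, dedupe=True):
    # Fixed SQL fragments only; the source is chosen by a boolean, not user input.
    source = "(select distinct id, login_date from Logins)" if dedupe else "Logins"
    con = sqlite3.connect(":memory:")
    con.execute("create table Logins (id int, login_date text)")
    con.executemany("insert into Logins values (?, ?)", logins)
    rows = con.execute(f"""
        select distinct id
        from (select id,
                     -- same value for every day of a consecutive run
                     julianday(login_date)
                       - dense_rank() over (partition by id order by login_date) as grp
              from {source} as l) as t
        group by id, grp
        having count(*) >= ?
        order by id""", (min_days,)).fetchall()
    con.close()
    return [r[0] for r in rows]


print(users_with_streak(LOGINS))                # [7]
print(users_with_streak(LOGINS, dedupe=False))  # [1, 7]: repeated days inflate the count
```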
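First compute each account's income per month (the sum of its Creditor amounts). Then join Accounts and keep the months in which that income exceeds max_income. Finally apply the usual consecutive-run check: dense_rank() over the months increases in step with consecutive months, so group by month minus rank and keep runs with a count of at least 2, meaning at least two consecutive suspicious months. DISTINCT prevents an account with several such runs from being output more than once.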
with temp as (
select account_id
, year(day) * 12 + month(day) as mon -- running month number, so December and the next January are consecutive
, sum(if(type='Creditor', amount, 0)) as sum_amount
from Transactions
group by 1 ,2
), t1 as (
select temp.account_id
, mon
, sum_amount
from temp inner join Accounts
on temp.account_id = Accounts.account_id
where sum_amount > max_income
)
select distinct account_id
from(select account_id
, mon
, dense_rank() over(partition by account_id order by mon) as rk
from t1) as t2
group by account_id, (mon - rk)
having count(*) >= 2
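This problem has two cases: winning gold in at least 3 contests, or winning any medal in at least 3 consecutive contests. For the first case, use an implicit join (a cross join of Users and Contests) to count each user's gold medals. For the second, list every medal winner with the contests they medaled in, then apply the usual consecutive-run approach and keep runs of at least 3. Union the two cases and output.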
with temp as (
select contest_id
, user_id
, dense_rank() over(partition by user_id order by contest_id) as rk
from Users, Contests
where gold_medal = user_id or silver_medal = user_id or bronze_medal = user_id
), t1 as (
select user_id
from Users, Contests
group by user_id
having sum(gold_medal = user_id) >= 3
union
select user_id
from temp
group by user_id, (contest_id - rk)
having count(*) >= 3
)
select name
, mail
from t1 inner join Users
on t1.user_id = Users.user_id
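Note: the local test run for this problem seems broken; the same code failed every local run, so just submit it directly.

The logic is fairly simple. Self-join Listens on three conditions: a different user_id (not the same person), the same day, and the same song. Group by the pair and the day, and count the distinct songs both listened to; DISTINCT matters because the same two users may be matched on the same song more than once in a day. A count of at least 3 guarantees that the pair listened to at least three same songs that day.

Next, collect all friendships. The Friendship table stores each pair only once, so union it with its reversed copy to get both directions. Then exclude recommendations between users who are already friends. One option is NOT IN, checking that the recommended user is not already a friend, but that is less efficient, so a LEFT JOIN is used instead: when a (user, recommended) pair never appears among the friendships, the joined columns are NULL, so filtering on all_friends.user1_id is null yields exactly the non-friend recommendations.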
with temp as (
select l1.user_id as x
, l2.user_id as y
from Listens as l1 inner join Listens as l2
on l1.song_id = l2.song_id
and l1.user_id <> l2.user_id
and l1.day = l2.day
group by l1.user_id, l2.user_id, l2.day
having count(distinct l1.song_id)>=3
), all_friends as(
select user1_id
, user2_id
from Friendship
union
select user2_id
, user1_id
from Friendship
)
select distinct temp.x as user_id
, temp.y as recommended_id
from temp left join all_friends
on temp.x = all_friends.user1_id
and temp.y = all_friends.user2_id
where all_friends.user1_id is null
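The approach is the same as the previous problem, with two differences: the output is restricted to user1_id < user2_id, and since this problem asks for pairs who are already friends, we keep the non-NULL matches (the previous problem kept the NULLs, because it looked for non-friends).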
with temp as (
select l1.user_id as x
, l2.user_id as y
from Listens as l1 inner join Listens as l2
on l1.song_id = l2.song_id
and l1.user_id < l2.user_id
and l1.day = l2.day
group by l1.user_id, l2.user_id, l2.day
having count(distinct l1.song_id)>=3
), all_friends as(
select user1_id
, user2_id
from Friendship
union
select user2_id
, user1_id
from Friendship
)
select distinct temp.x as user1_id
, temp.y as user2_id
from temp left join all_friends
on temp.x = all_friends.user1_id
and temp.y = all_friends.user2_id
where all_friends.user1_id is not null
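This is better described as a common-friends problem: find pairs of users who share friends. First collect all friendships in both directions. Then self-join the friendship list on the friend, so that two different users with the same friend are paired. Group by the pair and keep those with at least 3 common friends; such a pair is not necessarily friends itself. Finally inner join with the friendship list again, which removes non-friend pairs, and output.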
with all_friends as (
select user1_id as user_id
, user2_id as friend_id
from Friendship
union
select user2_id
, user1_id
from Friendship
), good_friends as (
select a1.user_id as user1_id
, a2.user_id as user2_id
, count(*) as common_friend
from all_friends as a1 inner join all_friends as a2
on a1.user_id < a2.user_id
and a1.friend_id = a2.friend_id
group by a1.user_id, a2.user_id
having count(*) >= 3
)
select user1_id
, user2_id
, common_friend
from good_friends inner join all_friends
on all_friends.user_id = good_friends.user1_id
and friend_id = user2_id
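This problem is about followers, a one-directional relationship rather than a mutual one, so no symmetric table is needed; otherwise handle it like the previous problem. Self-join Relations on the same follower_id, group by the user pair to count common followers, rank by that count descending, and take the pairs with the maximum count in the outer query.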
select user1_id
, user2_id
from(select a1.user_id as user1_id
, a2.user_id as user2_id
, rank() over(order by count(*) desc) as rk
from Relations as a1 inner join Relations as a2
on a1.user_id < a2.user_id
and a1.follower_id = a2.follower_id
group by 1, 2) as temp
where rk = 1
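First number each player's matches by match_day. Keeping only the wins turns those row numbers into a consecutive-id problem: apply dense_rank() to the remaining numbers, then group by player and the difference between the two numbers. Each group is one winning streak and count(*) is its length. Finally, an outer query takes each player's longest streak; the right join with all players keeps players without any wins, whose streak is 0.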
with player as (
select distinct player_id
from Matches
)
select player.player_id
, ifnull(max(longest_streak),0) as longest_streak
from(select player_id
, count(*) as longest_streak
from(select player_id
, rk
, dense_rank() over(partition by player_id order by rk) as rk1
from(select player_id
, row_number() over(partition by player_id order by match_day) as rk
, result
from Matches) as temp
where result = 'Win') as t1
group by player_id, (rk-rk1)) as t2 right join player
on player.player_id = t2.player_id
group by player.player_id
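First, per product, keep the years with at least three orders. Then treat it as a standard consecutive-run problem: dense_rank() the years; since the years increase, group by year minus rank, and a count of at least 2 means at least two consecutive years.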
with t1 as (
select distinct product_id
, year
from(select product_id
, year(purchase_date) as year
from Orders
group by product_id, year
having count(*) >= 3) as temp
)
select distinct product_id
from(select product_id
, year
, dense_rank() over(partition by product_id order by year) as rk
from t1) as t2
group by product_id, (year-rk)
having count(*) >= 2
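First group by customer and year, and rank each customer's years by that year's total purchase amount with dense_rank(). The totals must strictly increase over time, and the years increase too, so the usual consecutive-run technique applies with a twist: when both the year and the rank go up by 1 at each step, year - rk stays constant. A count() over (customer, year - rk) therefore gives the length of each stretch in which purchases strictly increase year over year. A year with no purchases counts as 0 and breaks strict growth; it also shows up as a jump in year without a matching jump in rank, so it splits the stretch automatically. In the outer query, group by customer and check whether one stretch covers all of the customer's rows, i.e. whether the customer's whole history is a single strictly increasing run. That condition selects exactly the customers whose purchases strictly increase over time.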
select customer_id
from(select customer_id
from(select customer_id
, count(*) over(partition by customer_id, (year-rk)) as cnt
from(select customer_id
, year(order_date) as year
, dense_rank() over(partition by customer_id order by sum(price)) as rk
from Orders
group by customer_id, year) as temp ) as t1
group by customer_id
having count(*) = max(cnt)) as t2
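Partition by customer and order by date, apply the usual consecutive-run approach, rank the runs by count(*) descending, keep the top-ranked ones, and output customer_id in order.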
select customer_id
from(select customer_id
, rank() over(order by count(*) desc) as rk1
from(select customer_id
, transaction_date
, dense_rank() over(partition by customer_id order by transaction_date) as rk
from Transactions) as temp
group by customer_id, date_sub(transaction_date, interval rk day)) as t1
where rk1 = 1
order by 1
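This is arguably the hardest consecutive-run problem and needs careful reasoning, so let's go step by step. First build a CTE. Its innermost query partitions the table by customer, orders by transaction date ascending, and adds a helper column last that shifts amount down by one row with lag(), using 0 instead of NULL for each customer's first row. On top of that, apply the consecutive-dates technique (date minus row number) to keep only transactions on runs of at least 3 consecutive days, and output all the columns computed so far together with that date difference.

Next, set break points: a flag of 0 when amount is greater than the previous amount (last), and 1 otherwise. The running sum of these flags splits each consecutive-day run into strictly increasing stretches. Finally group by customer, date difference and running flag, take the earliest and latest date of each group, and keep groups with more than two rows, i.e. at least 3 transactions.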
with temp as (
select *
from(select customer_id
, transaction_date
, amount
, last
, date_sub(transaction_date,interval rk1 day) as date
, count(*) over(partition by customer_id, date_sub(transaction_date,interval rk1 day)) as cnt
from(select customer_id
, transaction_date
, amount
, row_number() over(partition by customer_id order by transaction_date) as rk1
, lag(amount,1,0) over(partition by customer_id order by transaction_date) as last
from Transactions) as t1) as t2
where cnt >= 3
)
select customer_id
, min(transaction_date) as consecutive_start
, max(transaction_date) as consecutive_end
from(select customer_id
, date
, transaction_date
, sum(if(amount>last,0,1)) over(partition by customer_id order by transaction_date) as flag
from temp) as t3
group by customer_id, date, flag
having count(*) > 2
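The break-point idea on its own: a running sum of "not increasing" flags labels each strictly increasing stretch. The sketch below assumes the rows are already one per consecutive day (the original handles that with the date-minus-row-number step) and runs in Python's sqlite3 module rather than MySQL; lag() without a default returns NULL, which CASE treats as a break, so the first row starts the first stretch.

```python
import sqlite3


def increasing_runs(amounts, min_len=3):
    con = sqlite3.connect(":memory:")
    con.execute("create table T (day int, amount int)")
    con.executemany("insert into T values (?, ?)", list(enumerate(amounts, start=1)))
    rows = con.execute("""
        select min(day), max(day)
        from (select day,
                     -- a new stretch starts whenever amount does not exceed the previous one
                     sum(case when amount > prev then 0 else 1 end)
                         over (order by day) as run_id
              from (select day, amount,
                           lag(amount) over (order by day) as prev
                    from T) as a) as b
        group by run_id
        having count(*) >= ?
        order by 1""", (min_len,)).fetchall()
    con.close()
    return rows


print(increasing_runs([1, 2, 3, 2, 5, 6, 7, 1]))  # [(1, 3), (4, 7)]
```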
Find the emails in Person that also belong to another row (a different id), and remember to deduplicate.
select distinct email as Email
from Person as t1
where email in (select email from Person as t2 where t1.id <> t2.id)
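Find the customers who do not appear among the customers who placed orders; those are the ones who never order. NOT EXISTS is used because it usually runs faster.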
select name as Customers
from Customers
where not exists (select 'x' from Orders where Orders.customerId = Customers.id)
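A CTE keeps the code a bit clearer. The only difficulty is a small subquery: keep the trips in which neither the client nor the driver is banned, then count per day. Putting the condition inside sum() keeps the code concise.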
with trips_noban as (
select *
from Trips
where client_id not in (select users_id from Users where banned = 'Yes')
and driver_id not in (select users_id from Users where banned = 'Yes')
and request_at >= '2013-10-01' and request_at <= '2013-10-03'
)
select request_at as Day
, ifnull(round(sum(if(status <> 'completed', 1, 0))/count(*) ,2),0) as `Cancellation Rate`
from trips_noban
group by request_at
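Two subqueries check that the current policy's tiv_2015 value also appears in some other policy, and that its location (lat, lon) does not appear in any other policy.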
select round(sum(tiv_2016),2) as tiv_2016
from Insurance as i1
where tiv_2015 in (select tiv_2015 from Insurance as i2 where i1.pid <> i2.pid)
and (lat, lon) not in (select lat, lon from Insurance as i3 where i1.pid <> i3.pid)
An alternative using existence checks:
select round(sum(tiv_2016),2) as tiv_2016
from Insurance as i1
where exists (select 'x' from Insurance as i2 where i1.pid <> i2.pid and i1.tiv_2015 = i2.tiv_2015)
and not exists (select 'x' from Insurance as i3 where i1.pid <> i3.pid and i1.lon = i3.lon and i1.lat = i3.lat)
On small data the difference in speed is not noticeable; existence checks tend to show their advantage on large data.
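Think in reverse. A CTE joins Orders and Company to find the salespeople who have orders related to 'RED'; the WHERE clause of the main query then keeps everyone not in that group, which is the answer. An existence check is used to speed things up.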
with temp as (select sales_id
from Orders inner join Company
on Orders.com_id = Company.com_id
where Company.name = 'RED')
select distinct SalesPerson.name
from SalesPerson left join Orders
on SalesPerson.sales_id = Orders.sales_id
where not exists (select * from temp where temp.sales_id = SalesPerson.sales_id)
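Like other problems of this type, join first. Checking the condition directly after the join is awkward, because each row only tells whether that one sale satisfies it, not whether all sales of the product do. So reverse the check: collect the products with any sale outside the allowed period, and output the products not in that set.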
select distinct Product.product_id
, product_name
from Sales inner join Product
on Sales.product_id = Product.product_id
where Product.product_id not in (select product_id from Sales where sale_date < '2019-01-01' or sale_date > '2019-03-31')
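A note on aggregation and NULLs: if you want the NULL rows produced by joining after an aggregation to survive, the query level that does the join should not have a WHERE clause, because its filter can remove those NULL rows. The safest approach is to do the join in an outer query.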
Group by player_id; each player's minimum event_date is their first login.
select player_id, min(event_date) as first_login
from Activity
group by player_id
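The code is longer here, but the idea is simple. First understand the two parts of the formula: when clicks + views is 0, the CTR is 0; otherwise it is clicks as a share of clicks + views. So we only need to count the rows with action = 'Clicked' and with action = 'Viewed', each with a sum() over a nested if(). count() does not do what one might expect here: count(expr) counts every non-NULL value, so count(action = 'Clicked') would also count the 0s, while sum() adds up only the 1s. Divide the counts, replace the NULL from a zero denominator with 0, and round.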
select ad_id
, ifnull(round(sum(if(action = 'Clicked', 1, 0))*100 /(sum(if(action = 'Clicked', 1, 0)) + sum(if(action = 'Viewed', 1, 0))),2),0) as ctr
from Ads
group by ad_id
order by ctr desc, ad_id
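The count() versus sum() distinction is easy to verify. This sketch runs in Python's sqlite3 module rather than MySQL (both evaluate a comparison to 0 or 1) and uses a made-up Ads table with a single click among four actions.

```python
import sqlite3


def click_counts():
    con = sqlite3.connect(":memory:")
    con.execute("create table Ads (ad_id int, action text)")
    con.executemany("insert into Ads values (?, ?)",
                    [(1, "Clicked"), (1, "Viewed"), (1, "Viewed"), (1, "Ignored")])
    row = con.execute("""
        select count(action = 'Clicked'),                       -- counts every non-NULL 0/1 value
               sum(action = 'Clicked'),                         -- adds up only the 1s
               count(case when action = 'Clicked' then 1 end)   -- non-clicks become NULL and are skipped
        from Ads""").fetchone()
    con.close()
    return row


print(click_counts())  # (4, 1, 1)
```

The last column shows that count() can still work if the non-matching rows are turned into NULL first.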
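The output is about managers, so work from the managerId column. Self-join Employee so each employee row is matched with its manager (t1.managerId = t2.id), group by managerId, keep the managers with at least 5 direct reports, and output the manager's name from the second copy of the table.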
select t2.name
from Employee as t1 inner join Employee as t2
on t1.managerId = t2.id
group by t1.managerId
having count(*)>=5
Left join Student to Department on dept_id, so departments without students are kept, then group by department, count the students, and output the department names.
select dept_name
, count(student_name) as student_number
from Department left join Student
on Department.dept_id = Student.dept_id
group by dept_name
order by student_number desc, dept_name
Group by customer number, rank the customers by their order counts with a window function, and in the outer query output the one ranked first.
select customer_number
from(select customer_number
, row_number() over(order by count(*) desc) as rk
from Orders
group by customer_number) as temp
where rk = 1
Group by class, count the rows with count(), and keep the classes with at least 5 students.
select class
from Courses
group by class
having count(*) >= 5
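First use a CTE to list every user with each of their friends, taking both sides of each accepted request. Then group by user, count, and rank by the count to find the id with the most friends, and output it. (Similar problem: 586.)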
with all_friend as (
select requester_id as user
, accepter_id as friend
from RequestAccepted
union
select accepter_id
, requester_id
from RequestAccepted
)
select id
, num
from(select user as id
, count(*) as num
, row_number() over(order by count(*) desc) as rk
from all_friend
group by user) as t1
where rk = 1
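Don't be misled; this one is simple. The table holds two roles: followee (the user being followed) and follower. Put each column into its own CTE. A second-degree follower follows someone and is also followed by someone, so take each followee that also appears in the follower list, group by that user, and count(*) their followers. (In short: turn each column into a CTE, keep the followees who are also followers, and aggregate.)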
with flee as (
select followee
from Follow
), fler as (
select follower
from Follow
)
select followee as follower
, count(*) as num
from flee
where followee in (select * from fler)
group by followee
order by follower
Group to find the numbers that appear only once, then take max() of them.
select max(num) as num
from(select num
from MyNumbers
group by num
having count(*) = 1) as temp
We need the customers whose number of distinct products bought equals the total number of products in Product, so group by customer and compare the two counts.
select customer_id
from Customer
group by customer_id
having count(distinct product_key) = (select count(*) from Product)
Group by actor and director; a count of at least 3 means they cooperated at least three times.
select actor_id
, director_id
from ActorDirector
group by actor_id, director_id
having count(*) >= 3
Group by product_id and sum the quantity.
select product_id
, sum(quantity) as total_quantity
from Sales
group by product_id
Join on employee_id to get the project_id, then group by project and compute the average experience.
select Project.project_id
, round(avg(experience_years), 2) as average_years
from Employee inner join Project
on Employee.employee_id = Project.employee_id
group by Project.project_id
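Same idea as the previous problem plus a window function: after grouping, rank the projects by their row counts; the top-ranked projects have the most employees.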
select project_id
from(select project_id
, rank() over(order by count(*) desc) as rk
from Project
group by project_id) as temp
where rk = 1
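The core is a CTE holding each player's first day (the install date), which is just the minimum event_date per player. Left join it back to Activity on the same player and an event_date exactly one day after the install date, then aggregate per install date. For day-1 retention, avg() over an expression that returns 1 when such a next-day row exists and 0 otherwise gives the fraction directly, which is neat.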
with a1 as (
select player_id
, min(event_date) as event_date
from Activity
group by player_id
)
select a1.event_date as install_dt
, count(*) as installs
, round(avg(a2.event_date is not null) ,2) as Day1_retention
from a1 left join Activity as a2
on a1.player_id = a2.player_id
and datediff(a2.event_date, a1.event_date) = 1
group by a1.event_date
order by install_dt
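The left join plus avg(... is not null) pattern can be run as a small sketch. It uses Python's sqlite3 module instead of MySQL, with julianday() subtraction in place of datediff(), and hypothetical activity rows.

```python
import sqlite3

# Hypothetical activity: (player_id, event_date).
ACTIVITY = [(1, "2016-03-01"), (1, "2016-03-02"), (2, "2017-06-25"),
            (3, "2016-03-01"), (3, "2018-07-03")]


def day1_retention():
    con = sqlite3.connect(":memory:")
    con.execute("create table Activity (player_id int, event_date text)")
    con.executemany("insert into Activity values (?, ?)", ACTIVITY)
    rows = con.execute("""
        with a1 as (select player_id, min(event_date) as install_dt
                    from Activity
                    group by player_id)
        select a1.install_dt, count(*) as installs,
               -- 1 when the player came back the next day, 0 when the left join found nothing
               round(avg(a2.event_date is not null), 2) as day1_retention
        from a1 left join Activity as a2
          on a1.player_id = a2.player_id
         and julianday(a2.event_date) - julianday(a1.install_dt) = 1
        group by a1.install_dt
        order by a1.install_dt""").fetchall()
    con.close()
    return rows


print(day1_retention())  # [('2016-03-01', 2, 0.5), ('2017-06-25', 1, 0.0)]
```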
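This can be done with either a window function or an aggregate: find each user's first login date, group by that date, and count the users per day. The annoying part is that a user's first activity may be something other than a login, which skews the result unless you filter to logins first. The problem design is not great; the pattern is easy, and the data filtering is the real trap.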
select activity_date as login_date
, count(*) as user_count
from(select user_id
, activity_date
from(select user_id
, activity
, activity_date
, row_number() over(partition by user_id order by activity_date) as rk
from Traffic
where activity = 'login') as t1
where rk = 1 and datediff('2019-06-30', activity_date) <= 90) as t2
group by login_date
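The key is to classify by the extra column. The usual reflex is count(*), but that counts duplicates: the same post counts as one report, so count the distinct post_id instead. Another pitfall, which I fell into, is filtering on extra alone: extra is not exclusive to report actions (love is one exception), so action must also be restricted to 'report' to match the statement. Small problems often hide traps.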
select extra as report_reason
, count(distinct post_id) as report_count
from Actions
where action_date = '2019-07-04'
and extra is not null
and action = 'report'
group by report_reason
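First left join the two tables and keep the spam reports. For each day, the share of reported spam posts that were removed is the fraction of rows whose joined Removals.post_id is not NULL (removed posts find a match, the rest are NULL). An outer query then averages these daily percentages. Note that the same post can be reported as spam by several users, so a CTE first deduplicates the reports so that each post_id is counted only once; after that, nothing goes wrong.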
with dedup_actions as (
select distinct post_id
, action_date
, action
, extra
from Actions
)
select round(avg(avg0)*100,2) as average_daily_percent
from(select action_date, avg(Removals.post_id is not null) as avg0
from dedup_actions left join Removals
on dedup_actions.post_id = Removals.post_id
where extra = 'spam'
group by action_date) as temp
Filter to the date range, group by day, and count the distinct users. A user_id can have several session_ids, so count user_id rather than session_id.
select activity_date as day
, count(distinct user_id) as active_users
from Activity
where activity_date between '2019-06-28' and '2019-07-27'
group by activity_date
having active_users <> 0
Aggregate as the statement says: group by viewer and day and count the articles. Both viewers and articles can repeat, so deduplicate both.
select distinct viewer_id as id
from Views
group by viewer_id,view_date
having count(distinct article_id) >= 2
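Right join the two tables, group by user, and sum the condition (1 when it holds, 0 otherwise), which is handier than the usual count() and avoids errors. The main issue is the GROUP BY column: the alias buyer_id is also a column name in Orders, so grouping by the alias can raise an error; group by the original name user_id instead. Replace the NULL sum for users without orders with 0.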
select user_id as buyer_id
, join_date
, ifnull(sum(year(order_date)=2019),0) as orders_in_2019
from Orders right join Users
on Users.user_id = Orders.buyer_id
group by user_id
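This is a classic long-to-wide pivot: each month becomes its own column. Each id has at most one row per month, but the result shows one row per id across all months, so group by id and sum the revenue conditionally for each month. sum() skips NULLs, so a month with no data stays NULL while the other months get their values.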
select id
, sum(if(month = 'Jan', revenue, null)) as Jan_Revenue
, sum(if(month = 'Feb', revenue, null)) as Feb_Revenue
, sum(if(month = 'Mar', revenue, null)) as Mar_Revenue
, sum(if(month = 'Apr', revenue, null)) as Apr_Revenue
, sum(if(month = 'May', revenue, null)) as May_Revenue
, sum(if(month = 'Jun', revenue, null)) as Jun_Revenue
, sum(if(month = 'Jul', revenue, null)) as Jul_Revenue
, sum(if(month = 'Aug', revenue, null)) as Aug_Revenue
, sum(if(month = 'Sep', revenue, null)) as Sep_Revenue
, sum(if(month = 'Oct', revenue, null)) as Oct_Revenue
, sum(if(month = 'Nov', revenue, null)) as Nov_Revenue
, sum(if(month = 'Dec', revenue, null)) as Dec_Revenue
from Department
group by id
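A cut-down version of the pivot, limited to three months to stay short, can be run with Python's sqlite3 module; CASE without ELSE plays the role of if(..., revenue, null). The month names are fixed constants, not user input, and the revenue rows are sample data.

```python
import sqlite3

MONTHS = ["Jan", "Feb", "Mar"]  # shortened list for the sketch


def pivot(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table Department (id int, revenue int, month text)")
    con.executemany("insert into Department values (?, ?, ?)", rows)
    # One conditional sum per month; rows of other months become NULL and sum() skips them.
    cols = ", ".join(
        f"sum(case when month = '{m}' then revenue end) as {m}_Revenue" for m in MONTHS)
    out = con.execute(f"select id, {cols} from Department group by id order by id").fetchall()
    con.close()
    return out


SAMPLE = [(1, 8000, "Jan"), (2, 9000, "Jan"), (3, 10000, "Feb"),
          (1, 7000, "Feb"), (1, 6000, "Mar")]
print(pivot(SAMPLE))  # [(1, 8000, 7000, 6000), (2, 9000, None, None), (3, None, 10000, None)]
```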
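First format the date into a month as the statement requires, then sum or count the requested fields; the main tool here is sum(). count(*) gives the total number of transactions. Counting the approved transactions is the awkward part: the usual idea is count() with a nested if(), but that is less tidy, so sum() over the comparison does the counting. The remaining totals are sums over the matching rows. The two key points are formatting the month and nesting a condition inside sum() to count or total conditionally, which is also fast.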
select date_format(trans_date, '%Y-%m') as month
, country
, count(*) as trans_count
, sum(state='approved') as approved_count
, sum(amount) as trans_total_amount
, sum(if(state='approved',amount,0)) as approved_total_amount
from Transactions
group by month,country
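At first I misread this as football-style scoring, treating the last two columns as match points, and spent ages chasing errors, which was frustrating; a closer reading revealed the catch. The idea is simple: use CTEs to extract the first and second players of each match with their scores, UNION ALL them into one table, and sum each player's total score. Then join Players in the outer query and rank within each group to get the top scorer (ties go to the smaller player id). The main thing is to read the statement correctly.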
with first_points as (
select first_player
, first_score as points
from Matches
), second_points as (
select second_player
, second_score as points
from Matches
), all_points as (
select first_player as player
, points
from first_points
union all
select second_player
, points
from second_points
)
select group_id
, player_id
from(select group_id
, player as player_id
, rank() over(partition by group_id order by sum(points) desc, player) as rk
from all_points inner join Players
on player = player_id
group by player) as temp
where rk = 1
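Following the problem, group by query_name and compute the averages. For the percentage I use avg(), a trick learned earlier: avg() is sum() divided by count(). The inner expression evaluates to 0 or 1, so the sum counts the rows that satisfy it, while the count covers every row that took part, so avg() yields the ratio directly. Quick and convenient.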
select query_name
, round(avg(rating/position),2) as quality
, round(100*avg(rating < 3),2) as poor_query_percentage
from Queries
group by query_name
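A minimal sketch of the avg-of-a-boolean trick, again using Python's sqlite3 in place of MySQL (an assumption). SQLite also evaluates a comparison to 1 or 0, so avg(rating < 3) is a fraction. One difference is labeled in the code: SQLite's / truncates when both operands are integers, so rating is multiplied by 1.0 first. The sample rows are illustrative.

```python
import sqlite3


def query_quality(rows):
    """Per query_name: quality = avg(rating / position), poor% = 100 * avg(rating < 3).

    A comparison evaluates to 1 or 0, so avg(rating < 3) is the fraction of rows
    with rating below 3. rating * 1.0 forces real division, because SQLite's '/'
    truncates when both operands are integers (MySQL's '/' does not).
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table Queries (query_name text, result text, position integer, rating integer)")
    con.executemany("insert into Queries values (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        select query_name
             , round(avg(rating * 1.0 / position), 2) as quality
             , round(100 * avg(rating < 3), 2) as poor_query_percentage
        from Queries
        group by query_name
        order by query_name
        """
    ).fetchall()
    con.close()
    return result
```

For a group of three rows where exactly one has a rating below 3, the second column comes out as 33.33.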
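First collect all posts (rows whose parent_id is null) into a CTE, then join it with the original table, matching the CTE's id against the original table's parent_id. After joining, count the distinct sub_id values per post id, because there may be duplicate comments, and output.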
with pid as (
select sub_id as p_id
from Submissions
where parent_id is null
)
select p_id as post_id
, count(distinct sub_id) as number_of_comments
from Submissions right join pid
on parent_id = p_id
group by post_id
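First join the tables. A price applies only when the purchase date lies between its start date and end date, which gives one join condition; naturally the product_id must match as well. After the join we cannot simply use avg(): each price is weighted by the number of units sold at it, so the average must be total revenue divided by total units, and the unit count is itself a sum. We therefore compute the two parts separately, sum(price*units) and sum(units), divide, and round.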
select Prices.product_id
, round(sum(price*units)/sum(units),2) as average_price
from UnitsSold inner join Prices
on UnitsSold.product_id = Prices.product_id
and purchase_date >= start_date and purchase_date <= end_date
group by Prices.product_id
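A minimal sketch of the weighted average with a date-range join, using Python's sqlite3 rather than MySQL (an assumption). It relies on dates stored as ISO-8601 text, which compare correctly as strings. The sample rows are illustrative.

```python
import sqlite3


def average_price(prices, units_sold):
    """Weighted average price per product: sum(price * units) / sum(units).

    A sale is matched to the price whose [start_date, end_date] contains the
    purchase date. Dates are ISO-8601 strings, which compare correctly as text.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table Prices (product_id integer, start_date text, end_date text, price integer)")
    con.execute("create table UnitsSold (product_id integer, purchase_date text, units integer)")
    con.executemany("insert into Prices values (?, ?, ?, ?)", prices)
    con.executemany("insert into UnitsSold values (?, ?, ?)", units_sold)
    result = con.execute(
        """
        select p.product_id
             , round(sum(p.price * u.units) * 1.0 / sum(u.units), 2) as average_price
        from UnitsSold as u inner join Prices as p
          on u.product_id = p.product_id
         and u.purchase_date between p.start_date and p.end_date
        group by p.product_id
        order by p.product_id
        """
    ).fetchall()
    con.close()
    return result
```

With 100 units at price 5 and 15 units at price 20, the result is 800 / 115 ≈ 6.96, whereas a plain avg(price) would give 12.5.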
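First cross join Students and Subjects to get every student-subject pair, then join the result with Examinations. The joined table lists every student against every subject; where a student did not take an exam, the Examinations side (the left table in this right join) is null. Then aggregate on the subject name: count(Examinations.subject_name) skips nulls, so a missing exam contributes 0 and the count is exactly the number of exams attended. Output it directly.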
with stu_sub as (
select *
from Students join Subjects
)
select stu_sub.student_id
, student_name
, stu_sub.subject_name
, count(Examinations.subject_name) as attended_exams
from Examinations right join stu_sub
on Examinations.subject_name = stu_sub.subject_name
and stu_sub.student_id = Examinations.student_id
group by student_id, subject_name
order by student_id
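A minimal sketch of the cross join plus left join pattern, written against Python's sqlite3 instead of MySQL (an assumption). It shows why counting a column from the optional side, rather than count(*), yields 0 for students who never took a subject. Sample data is made up.

```python
import sqlite3


def attended_exams(students, subjects, exams):
    """Count exams per (student, subject), including pairs with zero attendance.

    The cross join lists every student-subject pair; the left join attaches
    matching exam rows; count(e.subject_name) skips the NULLs of unmatched pairs.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table Students (student_id integer, student_name text)")
    con.execute("create table Subjects (subject_name text)")
    con.execute("create table Examinations (student_id integer, subject_name text)")
    con.executemany("insert into Students values (?, ?)", students)
    con.executemany("insert into Subjects values (?)", [(s,) for s in subjects])
    con.executemany("insert into Examinations values (?, ?)", exams)
    result = con.execute(
        """
        select s.student_id, s.student_name, sub.subject_name
             , count(e.subject_name) as attended_exams
        from Students as s cross join Subjects as sub
        left join Examinations as e
          on e.student_id = s.student_id and e.subject_name = sub.subject_name
        group by s.student_id, s.student_name, sub.subject_name
        order by s.student_id, sub.subject_name
        """
    ).fetchall()
    con.close()
    return result
```

Using count(*) here would report 1 for every unmatched pair, because the left join still produces one row for it.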
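Filter the data once to restrict it to February 2020, group by product, sum the units, and after grouping keep only products whose total is at least 100. Join the result with Products to get each product name and output it together with the units computed earlier.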
with temp as (
select product_id
, sum(unit) as unit
from Orders
where left(order_date, 7)='2020-02'
group by product_id
having unit >= 100
)
select product_name
, unit
from temp inner join Products
on temp.product_id = Products.product_id
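The problem only asks for each stock's overall capital gain or loss: a buy is a cost and a sell is income, so sum the prices with a negative sign for buys and a positive sign for sells, and output the totals.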
select stock_name
, sum(if(operation = 'Buy', -price,price)) as capital_gain_loss
from Stocks
group by stock_name
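Right join Rides to Users, group by user id, sum the distances, and replace nulls with 0 before outputting. Do not group by name, because different users can share the same name. The id must also come from Users: Rides.user_id is null for users with no rides, so grouping by it would lump all of those users into one group.
select name
, ifnull(sum(distance),0) as travelled_distance
from Rides right join Users
on Rides.user_id = Users.id
group by Users.id
order by travelled_distance desc, name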
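First group by sale date. Within each group, nest an if inside sum() to pick out one fruit's rows and total them, then subtract the two totals to get the difference between the fruits. I first took the absolute value, but the expected result is a signed difference, so I went back and dropped the absolute value.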
select sale_date
, sum(if(fruit='apples',sold_num,0))-sum(if(fruit='oranges',sold_num,0)) as diff
from Sales
group by sale_date
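This problem introduces a new function, group_concat(), a string-concatenation aggregate: within each group it joins the values of the column in parentheses into one string. The default separator is a comma; the separator argument changes it. The function also lets you control the order of the joined values by writing order by right after the aggregated column inside the call.
Applied to this problem it is simple: after grouping, concatenate the distinct products, sorted with order by product because each day's products must be listed in lexicographic order, with the separator set explicitly to ',' (the default anyway). Count the distinct products, and finally sort by sell date.
select sell_date
, count(distinct product) as num_sold
, group_concat(distinct product order by product separator ',') as products
from Activities
group by sell_date
order by sell_date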
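A minimal sketch of per-group concatenation in Python's sqlite3, used here instead of MySQL (an assumption). SQLite's group_concat also defaults to a comma, but older SQLite versions accept neither ORDER BY inside the aggregate nor a separator together with DISTINCT. The sketch therefore removes duplicates in a subquery, and the order of names inside each string is not guaranteed (the test sorts them). Sample rows are illustrative.

```python
import sqlite3


def daily_products(rows):
    """Per sell_date: number of distinct products and their comma-joined names.

    Duplicates are removed in the subquery, so count(*) equals the distinct count.
    The order of names inside each group_concat string is not guaranteed.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table Activities (sell_date text, product text)")
    con.executemany("insert into Activities values (?, ?)", rows)
    result = con.execute(
        """
        select sell_date
             , count(*) as num_sold
             , group_concat(product) as products
        from (select distinct sell_date, product from Activities)
        group by sell_date
        order by sell_date
        """
    ).fetchall()
    con.close()
    return result
```

In MySQL the ordering belongs inside the call, as in the query above; this sketch only demonstrates the grouping and concatenation.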
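First filter the data as the problem requires, keeping all orders from June and July 2020. Then group by customer id and month and aggregate each customer's spending per month, keeping only the customer-month pairs where spending is at least 100 (this check doubles as the filter). The joins also bring in the customer's name. In the outer query, check whether each customer has rows for both months: since the inner query kept only the customers and months with spending of at least 100, it is enough to check that each customer's row count equals 2.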
with temp as (
select Customers.customer_id
, name
, month(order_date) as month
from Orders inner join Customers
on Orders.customer_id = Customers.customer_id
inner join Product
on Product.product_id = Orders.product_id
where order_date between '2020-06-01' and '2020-07-31'
group by Customers.customer_id, month(order_date)
having sum(quantity*price)>=100
)
select customer_id
, name
from temp
group by customer_id
having count(*) = 2
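The problem itself is simple, but the knowledge point is interesting. First the problem: use lower() to convert the product name to lowercase and wrap it in trim() to strip leading and trailing spaces. Reduce the date to year and month, either with a date function or by treating it as a string. Then group by these two values, use count() to get how many units of each product were sold in each month, and sort as required.
When a processed expression is output under the same name as an original column, you cannot group by that name: GROUP BY resolves it to the original column of the same name, not to the processed value. Instead, group by the expression itself, wrap the query in a subquery and group outside it, or group by column position. In this problem the processed product_name has the same name as a column in the original table, so the grouping condition must be trim(lower(product_name)) rather than product_name; alternatively, write 1, 2 to refer to the first and second columns of the select list.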
select trim(lower(product_name)) as product_name
, left(sale_date, 7) as sale_date
, count(*) as total
from Sales
group by 1, 2
order by 1, 2
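First filter the orders with an invoice greater than 20. Next extract the month and group by it, using the shorthand of grouping by column number. Finally count the orders: the order id is the primary key, so there are no duplicates and count(*) is enough. The same customer can appear more than once, so count customers with distinct. Output the result.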
select left(order_date,7) as month
, count(*) as order_count
, count(distinct customer_id) as customer_count
from Orders
where invoice > 20
group by 1
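First write a CTE that computes the volume each product occupies. Then join on product id, group by the first column (the warehouse name), sum units times volume over all products in the warehouse, give it an alias, and output.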
with vol as (
select product_id
, product_name
, Width*Length*Height as volume
from Products
)
select name as warehouse_name
, sum(units*volume) as volume
from Warehouse inner join vol
on Warehouse.product_id = vol.product_id
group by 1
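First join to get each user's name, then group by name and sum the amounts to get each user's balance. Keep the users whose balance is greater than 10000 and output them.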
select name
, sum(amount) as balance
from Transactions inner join Users
on Users.account = Transactions.account
group by name
having balance > 10000
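This problem aggregates along two dimensions: the machine and, within it, the process. Split by machine first. Each activity on a machine is either a start or an end, and a process's processing time is its end timestamp minus its start timestamp. So we can sum over all of a machine's activities with end timestamps counted as positive and start timestamps as negative; the total is the machine's running time across all processes. Each start/end pair belongs to one process, so dividing by the number of distinct processes gives the average time per process. Round it to 3 decimal places.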
select machine_id
, round(sum(if(activity_type='end',timestamp,-timestamp))/count(distinct process_id),3) as processing_time
from Activity
group by machine_id
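A minimal sketch of the signed-sum trick in Python's sqlite3, used in place of MySQL (an assumption). CASE replaces if(), and the sample rows are illustrative.

```python
import sqlite3


def processing_time(rows):
    """Average processing time per machine, rounded to 3 decimals.

    End timestamps count as positive and start timestamps as negative, so the
    sum over a machine equals the sum of (end - start) over its processes;
    dividing by the number of distinct processes gives the average.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table Activity (machine_id integer, process_id integer, "
                "activity_type text, timestamp real)")
    con.executemany("insert into Activity values (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        select machine_id
             , round(sum(case when activity_type = 'end' then timestamp else -timestamp end)
                     / count(distinct process_id), 3) as processing_time
        from Activity
        group by machine_id
        order by machine_id
        """
    ).fetchall()
    con.close()
    return result
```

For machine 0 below, (1.520 - 0.712) + (4.120 - 3.140) = 1.788 over 2 processes gives 0.894.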
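As the problem requires, split the string into its first character and the rest: uppercase the first character, lowercase the rest, and concatenate the two. The first character is taken with left(name, 1) and the rest with right(name, length minus 1); see the code for how the two functions are used.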
select user_id
, concat(upper(left(name,1)),lower(right(name,char_length(name)-1))) as name
from Users
order by user_id
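First right join the two tables to get the names of all products. Products that never appear in an invoice produce rows whose invoice fields are all null. Then group by name and sum each field, taking care to replace nulls with 0.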
select name
, coalesce(sum(rest),0) as rest
, coalesce(sum(paid),0) as paid
, coalesce(sum(canceled),0) as canceled
, coalesce(sum(refunded),0) as refunded
from Invoice right join Product
on Invoice.product_id = Product.product_id
group by name
order by name
Group as the problem requires, then count the distinct values of each field.
select date_id
, make_name
, count(distinct lead_id) as unique_leads
, count(distinct partner_id) as unique_partners
from DailySales
group by date_id, make_name
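First normalize the values so that the two ids in the select list satisfy the condition person1 < person2 from the problem. Then group by the normalized pair and aggregate to get the number of calls and the total call duration. No sorting is needed.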
select if(from_id < to_id, from_id, to_id) as person1
, if(from_id > to_id, from_id, to_id) as person2
, count(*) as call_count
, sum(duration) as total_duration
from Calls
group by 1, 2
Group by user as the problem says and count; mind the required sort order when outputting.
select user_id
, count(*) as followers_count
from Followers
group by user_id
order by user_id
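As the problem requires, use an inner join, which drops every employee whose row cannot be joined. If an employee's reports_to matches an employee_id in the second copy of the table, that employee has someone to report to, i.e. the manager exists, and the row joins normally. After joining, group by the manager's employee id, compute the number of reports and their average age, round the average age, and output.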
select t2.employee_id
, t2.name
, count(*) as reports_count
, round(avg(t1.age)) as average_age
from Employees as t1 inner join Employees as t2
on t1.reports_to = t2.employee_id
group by 1
order by 1
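Since out_time and in_time are both timestamps, the time an employee spends in the office is out_time - in_time. We need each employee's total time within a day, so group by day and employee, sum the durations, give the sum an alias, and output.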
select event_day as day
, emp_id
, sum(out_time - in_time) as total_time
from Employees
group by day, emp_id
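There are two cases. An employee who belongs to only one department gets that department. An employee who belongs to more than one department gets the department they marked as primary, i.e. primary_flag = 'Y'. Select the two cases separately and union them.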
select employee_id
, department_id
from Employee
group by employee_id
having count(*) = 1
union
select employee_id
, department_id
from Employee
where primary_flag = 'Y'
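Here is an interesting alternative: sort each employee's rows by primary_flag. In the table definition primary_flag is an enum of 'Y' and 'N' with internal order 1, 2, so sorting and taking the first row picks the 'Y' row. When there is no 'Y', the employee has only one department, and that department is returned. However, this problem's test data includes employees with several departments but none marked as primary, so this method is not entirely suitable here; it is still a useful idea.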
select employee_id
, department_id
from(select employee_id
, department_id
, row_number() over(partition by employee_id order by primary_flag) as rk
from Employee) as temp
where rk = 1
Filter the logins that occurred in 2020, group by user, and take each user's maximum time stamp, which is the last login.
select user_id
, max(time_stamp) as last_stamp
from Logins
where year(time_stamp) = 2020
group by user_id
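First group by user_id and compute the rate with the avg() shorthand (the expression returns 1 when true and 0 otherwise, and the denominator is the total number of rows). Right join to Signups to include users who never requested a confirmation message; their rate comes out null, so replace it with 0, and round the result.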
select Signups.user_id
, ifnull(round(avg(action='confirmed'), 2),0) as confirmation_rate
from Confirmations right join Signups
on Confirmations.user_id = Signups.user_id
group by 1
As required, compute the like ratio and output the problems whose ratio is at most 0.6.
select problem_id
from Problems
where likes /(likes + dislikes) <= 0.6
order by problem_id
Filter all orders with an amount greater than 500. The same customer may have several qualifying orders, so count distinct customers and output the count.
select count(distinct customer_id) as rich_count
from Store
where amount > 500
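Join the two tables on their shared column and require at least two years of experience. Then group by interview, sum each candidate's interview scores, and output the candidates whose total is greater than 15.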
select candidate_id
from Candidates inner join Rounds
on Candidates.interview_id = Rounds.interview_id
and years_of_exp >= 2
group by Candidates.interview_id
having sum(score) > 15
Group by teacher and count the distinct subjects each teacher teaches.
select teacher_id
, count(distinct subject_id) as cnt
from Teacher
group by teacher_id
Group by bike; each bike's maximum end time is its most recent use. Sort by that time in descending order.
select bike_number
, max(end_time) as end_time
from Bikes
group by bike_number
order by end_time desc
Group by artist, count the occurrences, and sort as the problem requires.
select artist
, count(*) as occurrences
from Spotify
group by artist
order by occurrences desc, artist
As the problem says, group by employee. Salaries only ever increase, so the maximum salary is the latest one; output it.
select emp_id
, firstname
, lastname
, max(salary) as salary
, department_id
from Salary
group by emp_id
order by emp_id
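First join to compute each salesperson's total sales. Some salespeople have no sales records, so in the outer query right join to Salesperson to get every name, and replace a null total with 0.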
select Salesperson.salesperson_id
, name
, ifnull(total,0) as total
from(select salesperson_id
, sum(price) as total
from Customer inner join Sales
on Customer.customer_id = Sales.customer_id
group by salesperson_id) as temp right join Salesperson
on Salesperson.salesperson_id = temp.salesperson_id
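First build a CTE containing every airport's flight counts, both as a departure airport and as an arrival airport. In the outer query, aggregate per airport, rank by total flight count in descending order, and output the airports ranked 1, i.e. those with the most traffic.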
with all_f as (
select departure_airport as airport_id
, flights_count
from Flights
union all
select arrival_airport
, flights_count
from Flights
)
select airport_id
from(select airport_id
, rank() over(order by sum(flights_count) desc) as rk1
from all_f
group by airport_id) as temp
where rk1 = 1
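This problem mainly tests the aggregate function group_concat(), which concatenates the values of the aggregated expression within each group. Its usage is summarized below.
GROUP_CONCAT is a commonly used aggregate function in MySQL that merges multiple rows into one string, joined by a given separator. It can concatenate several values of a column into one string and can be configured through optional clauses.
GROUP_CONCAT takes the expression(s) to concatenate, expr. You can specify one or more expressions to determine what is concatenated, as well as the order used when aggregating. The following optional clauses configure its behavior:
DISTINCT: concatenate only distinct values.
ORDER BY: sort by the given column or expression before concatenating.
ASC or DESC: choose ascending or descending order for ORDER BY.
SEPARATOR: the separator placed between values; the default is a comma (,).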
select concat(equation,'=0') as equation
from(select group_concat(if(factor>0,'+','-'),abs(factor),case power when 0 then ''
when 1 then 'X'
else concat('X','^',power) end order by power desc separator '') as equation
from Terms) as temp
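Right join Logs to Employees so that employees who never appear in Logs are included, along with the hours each employee is required to work. Then group by employee and total each employee's working time. To avoid losing precision, total the time in seconds first, then divide by 60 and round up to minutes; convert the required hours to minutes as well. Compare the two and output the employees whose worked minutes are below the required minutes.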
select employee_id
from(select Employees.employee_id
, ifnull(ceiling(sum(timestampdiff(second, in_time, out_time))/60),0) as tm
, needed_hours*60 as needed_min
from Logs right join Employees
on Logs.employee_id = Employees.employee_id
group by 1) as t1
where tm < needed_min
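Group by user and product to compute how much each user spent on each product. Within each user, rank the products by that amount in descending order and output the top one(s).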
select user_id
, product_id
from(select user_id
, Sales.product_id
, rank() over(partition by user_id order by sum(price*quantity) desc) as rk
from Sales inner join Product
on Sales.product_id = Product.product_id
group by user_id, Sales.product_id) as temp
where rk = 1
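As required, compute the percentage of orders whose order date equals the customer's preferred delivery date. The shorthand ratio trick (avg of a boolean expression) makes this quick; then sort and output.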
select order_date
, round(100*avg(order_date = customer_pref_delivery_date), 2) as immediate_percentage
from Delivery
group by order_date
order by order_date
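This problem is fairly unforgiving about time complexity, so some different thinking is needed to make the query run smoothly.
The first approach below is the conventional one. First, where restricts the current row to one of the required products (S8). Second, not in confirms that the current buyer did not buy the other product: a correlated subquery checks that iPhone is absent from everything the current buyer purchased. This is essentially a nested loop and runs slowly. (When writing it I actually made a triple loop, putting the S8 check into where as a correlated subquery too, just like the iPhone check, and it timed out, so I abandoned that version.) Even the double loop is very slow and its running time is unreasonable, so let's think differently and borrow Teacher Cheng's counting method: decide by counting how many of each product a buyer bought.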
select distinct buyer_id
from Sales as t1 inner join Product
on t1.product_id = Product.product_id
where product_name='S8'
and 'iPhone' not in (select product_name from Sales as t2 inner join Product
on t2.product_id = Product.product_id
where t1.buyer_id=t2.buyer_id)
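The method below works like this: join the tables to get product names, which gives us the two products S8 and iPhone, then aggregate per buyer. In having, require the number of S8 rows to be greater than 0 (the buyer bought an S8) and the number of iPhone rows to be 0. The check reads nicely and ran about 3 times faster.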
select buyer_id
from Product inner join Sales
on Product.product_id = Sales.product_id
group by buyer_id
having sum(product_name = 'S8') > 0
and sum(product_name = 'iPhone') = 0
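A minimal sketch of the counting check in Python's sqlite3, which stands in for MySQL here (an assumption). SQLite also turns product_name = 'S8' into 1 or 0, so the having clause carries over unchanged. The product and sales rows are made up.

```python
import sqlite3


def s8_not_iphone_buyers(products, sales):
    """Buyers with at least one S8 purchase and no iPhone purchase.

    sum(product_name = 'S8') counts a buyer's S8 rows, and likewise for iPhone,
    so one grouped pass replaces the correlated not-in subquery.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table Product (product_id integer, product_name text)")
    con.execute("create table Sales (buyer_id integer, product_id integer)")
    con.executemany("insert into Product values (?, ?)", products)
    con.executemany("insert into Sales values (?, ?)", sales)
    rows = con.execute(
        """
        select s.buyer_id
        from Sales as s inner join Product as p
          on s.product_id = p.product_id
        group by s.buyer_id
        having sum(p.product_name = 'S8') > 0
           and sum(p.product_name = 'iPhone') = 0
        order by s.buyer_id
        """
    ).fetchall()
    con.close()
    return [r[0] for r in rows]
```

Buyer 2 below bought both phones and is excluded; buyer 3 bought only an iPhone and fails the first condition.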
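Use the counting check directly: if the counts for A and B are both greater than 0, the customer definitely bought both products; if the count for C is 0, the customer definitely did not buy it. Group by customer id rather than by name, because different customers can share the same name.
select Customers.customer_id
, customer_name
from Orders inner join Customers
on Orders.customer_id = Customers.customer_id
group by Customers.customer_id
having sum(product_name='A')>0
and sum(product_name='B')>0
and sum(product_name='C')=0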
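Following the problem, first use recursion to decompress the numbers, writing each number out as many times as its frequency, from smallest to largest. To find the median we can simply sort all the numbers: the middle one or two values are the median. Sorting with a window function gives each number an index, and then the median method mentioned earlier applies directly: use a window function to get the total count, then pick the middle position for the odd case and the two middle positions for the even case.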
with recursive temp as (
select num, 1 as fq
from Numbers
union all
select num, fq + 1
from temp
where fq < (select frequency from Numbers where num = temp.num)
)
select round(avg(num),1) as median
from(select num
, row_number() over(order by num) as rk
, count(*) over() as cnt
from temp) as t1
where rk in (cnt/2, cnt/2+1 ,cnt/2+0.5)
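A minimal sketch of the same median technique in Python's sqlite3, used instead of MySQL (an assumption; it needs a SQLite build with window functions, 3.25 or later). Two deliberate deviations are labeled in the code. The recursive step joins Numbers rather than using MySQL's correlated subquery. The division is written cnt / 2.0, because SQLite's integer / would make cnt / 2 pick the wrong rows when cnt is odd.

```python
import sqlite3


def median_from_frequencies(numbers):
    """Median of numbers given as (num, frequency) pairs.

    A recursive CTE expands each num `frequency` times, row_number() indexes
    the sorted values, and the rows at the middle position(s) are averaged.
    cnt / 2.0 keeps the division real (SQLite's integer '/' would truncate).
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table Numbers (num integer, frequency integer)")
    con.executemany("insert into Numbers values (?, ?)", numbers)
    (median,) = con.execute(
        """
        with recursive temp(num, fq) as (
            select num, 1 from Numbers
            union all
            select t.num, t.fq + 1
            from temp as t inner join Numbers as n on n.num = t.num
            where t.fq < n.frequency
        )
        select round(avg(num), 1)
        from (select num
                   , row_number() over (order by num) as rk
                   , count(*) over () as cnt
              from temp)
        where rk in (cnt / 2.0, cnt / 2.0 + 1, cnt / 2.0 + 0.5)
        """
    ).fetchone()
    con.close()
    return median
```

For an odd count such as 5, the candidate positions are 2.5, 3.5 and 3.0, so only row 3 matches; for an even count such as 4 they are 2.0, 3.0 and 2.5, so rows 2 and 3 are averaged.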
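The problem asks for cumulative monthly salary sums. Use recursion to generate all 12 months for every id, then join each (id, month) pair with the original table, matching both month and id. After joining, compute a windowed sum of the salary over each month and the two months before it, with missing months counting as 0. Then drop the months that did not exist in the original table, number each employee's remaining months in descending order in the outer query, and exclude the most recent month before outputting.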
with recursive temp as (
select id, 1 as ids
from Employee
union
select id, ids + 1
from temp
where ids < 12
)
select id
, month
, Salary
from (select id
, month
, Salary
, row_number() over (partition by id order by month desc) as rk
from (select temp.id
, ids as month
, salary as salary0
, sum(ifnull(salary, 0))
over (partition by id order by ids rows between 2 preceding and current row) as Salary
from Employee
right join temp
on Employee.month = temp.ids
and temp.id = Employee.id) as t1
where salary0 is not null
) as t2
where rk <> 1
order by id, month desc
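The conventional approach chains three copies of the table, where the manager in each copy is the employee in the next one. After joining, it is enough to check that some manager_id along the chain is 1 and that employee_id is not 1.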
select t1.employee_id
from Employees as t1 left join Employees as t2
on t1.manager_id = t2.employee_id
left join Employees as t3
on t3.employee_id = t2.manager_id
where t1.employee_id <> 1 and (t2.manager_id = 1 or t3.manager_id = 1 or t1.manager_id = 1)
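The second approach is a bit harder to understand and uses recursion. In the first round we select everyone who reports directly to the CEO (manager_id = 1, excluding the CEO). In each following round we join the CTE with the original table, treating the CTE's employee_id as the manager in the original table, and walk down level by level until no new rows appear; every chain found this way leads back to the CEO (1). This method does not care how many levels there are. Its efficiency is only average, but the code is very short, so it is a nice approach.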
with recursive cte as (
select employee_id
from Employees as t1
where manager_id = 1 and employee_id <> 1
union all
select Employees.employee_id
from Employees inner join cte
on Employees.manager_id = cte.employee_id
)
select *
from cte
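A minimal sketch of the recursive walk in Python's sqlite3, used in place of MySQL (an assumption). One change from the query above is labeled: it uses union instead of union all, so that a cycle reachable from the CEO could not make the recursion loop forever. The employee rows are illustrative.

```python
import sqlite3


def reports_to_ceo(employees):
    """Ids of everyone who reports to employee 1 directly or indirectly.

    Anchor: direct reports of 1 (excluding 1, who is their own manager).
    Recursive step: anyone whose manager is already in the CTE.
    union (not union all) discards repeated rows, which stops cycles.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table Employees (employee_id integer, employee_name text, manager_id integer)")
    con.executemany("insert into Employees values (?, ?, ?)", employees)
    rows = con.execute(
        """
        with recursive cte(employee_id) as (
            select employee_id from Employees
            where manager_id = 1 and employee_id <> 1
            union
            select e.employee_id
            from Employees as e inner join cte on e.manager_id = cte.employee_id
        )
        select employee_id from cte order by employee_id
        """
    ).fetchall()
    con.close()
    return [r[0] for r in rows]
```

Employee 3 below manages itself and employees 8 and 9 report to it, so none of them is reachable from 1.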
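This is undoubtedly one of the hardest and most tedious problems to understand. First, the meaning: a user visits the bank on some day and either makes transactions or does not. For each visit we need the number of transactions made during it (look closely at the Transactions table: it can have several rows for the same user and day), and then, grouping by that transaction count, the number of visits that had it.
Working through that: first join the two tables on user and date (transaction date equal to visit date), producing vic_tic. This table records every transaction and every empty visit (a visit without a transaction). We map each row to 1 for a real transaction and 0 for an empty visit; a row is a transaction exactly when it has a transaction date. Then aggregate vic_tic by user and visit date, where each group is one visit, and sum(tr) gives the number of transactions in that visit, with 0 meaning the visit had none. In the outer query, aggregate again by that number to see how many visits had each transaction count. That covers most of the problem, but some transaction counts are achieved by no visit at all; they should appear with 0 but are missing here. So a recursive CTE generates every count, we join it with the previous result, and nulls are replaced with 0.
This is the longest and most tedious problem so far, but it is a real test of logical thinking and worth studying.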
with recursive t3 as (
select 0 as ids
union
select ids+1
from t3
where ids < 100
)
, vic_tic as (
select Visits.user_id
, visit_date
, if(transaction_date is null ,0, 1) as tr
from Transactions right join Visits
on Transactions.user_id = Visits.user_id
and Transactions.transaction_date = Visits.visit_date
), t2 as (
select sum_tr as transactions_count
, count(*) as visits_count
from(select user_id
, visit_date
, sum(tr) as sum_tr
from vic_tic
group by user_id,visit_date) as t1
group by transactions_count
)
select ids as transactions_count
, ifnull(visits_count,0) as visits_count
from t2 right join t3
on t3.ids = t2.transactions_count
where ids <= (select max(transactions_count) from t2)
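Each row gives the start and end date of a sales period, so use recursion to expand every product's period into one row per sales day. Then group by product and year to get each product's sales in each year. Since we need the number of days in each year, count the rows: because of the grouping, count(*) is the number of sales days in that year, and multiplying by the average daily sales gives the year's total. Note that report_year is type-checked by the judge and only a string passes, so convert it with cast() to char before output.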
with recursive all_date as (
select product_id
, average_daily_sales
, period_start as date
from Sales
union all
select product_id
, average_daily_sales
, date_add(date, interval 1 day)
from all_date
where date < (select period_end from Sales where Sales.product_id = all_date.product_id)
)
select all_date.product_id
, product_name
, cast(year(date) as char) as report_year
, count(*)*average_daily_sales as total_amount
from all_date inner join Product
on all_date.product_id = Product.product_id
group by 1,3
order by product_id, report_year
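The table should contain every id from 1 up to its maximum, but some ids are missing, so it is not contiguous (the maximum itself is never missing). So we use recursion in a subquery to generate every id from 1 up to the largest customer_id in the original table, and left join this complete sequence with the original table on id. Ids missing from the right table come out as null after the join, so we just output the ids whose match is null.
Put more concretely: if the table has 1, 2, 4, 5, then 3 is missing and we need to find it. We build a complete sequence with no gaps and compare against it; any value absent from the original table is output from the complete sequence. The complete sequence is generated recursively, in one of two ways: generate up to 100, the upper bound given by the problem, or stop the recursion at the table's maximum through a subquery. If you generate up to 100, you must add a where in the outer query so that ids do not exceed the table's maximum, because by the problem's definition nothing beyond the maximum counts as missing.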
with recursive temp as (
select 1 as ids
union
select ids + 1
from temp
where ids < (select max(customer_id) from Customers)
)
select ids
from temp left join Customers
on temp.ids = Customers.customer_id
where Customers.customer_id is null
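A minimal sketch of the generate-and-anti-join idea in Python's sqlite3, used instead of MySQL (an assumption). The recursion stops at the table's maximum id, the second of the two generation strategies described above. Sample rows are made up.

```python
import sqlite3


def missing_ids(customers):
    """Ids between 1 and max(customer_id) that do not appear in Customers.

    The recursive CTE generates 1..max; the left join leaves NULLs on the
    Customers side for absent ids, and the where clause keeps exactly those.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table Customers (customer_id integer, customer_name text)")
    con.executemany("insert into Customers values (?, ?)", customers)
    rows = con.execute(
        """
        with recursive seq(ids) as (
            select 1
            union all
            select ids + 1 from seq
            where ids < (select max(customer_id) from Customers)
        )
        select seq.ids
        from seq left join Customers on seq.ids = Customers.customer_id
        where Customers.customer_id is null
        order by seq.ids
        """
    ).fetchall()
    con.close()
    return [r[0] for r in rows]
```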
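First recurse over each task id and its subtask count to list every (task, subtask) pair. Left join the result with Executed, matching on both task and subtask. Because it is a left join, pairs that never appear in Executed, i.e. subtasks that were not executed, come back with nulls, so use where to keep the rows whose right-side task_id is null.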
with recursive temp as (
select task_id
, 1 as subtask_id
from Tasks
union
select task_id
, subtask_id + 1
from temp
where subtask_id < (select subtasks_count from Tasks where task_id = temp.task_id)
)
select temp.task_id
, temp.subtask_id
from temp left join Executed
on temp.task_id = Executed.task_id
and temp.subtask_id = Executed.subtask_id
where Executed.task_id is null
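This problem is indeed somewhat complex. First classify each user's platform per day from the original table. A large case expression with several conditions decides each user's type, using the fast counting method mentioned earlier, i.e. sum() in place of in. At this point we can already aggregate by category to get each day's number of users per platform, but the output has missing rows: some platforms were not used on some days and should appear with 0, yet they do not appear at all. So we recover the missing rows by building a scaffold. List the three platforms from the problem, take all distinct dates from the original table, and form their Cartesian product, which gives a table with every platform for every day. Left join this scaffold with the classified query on matching date and platform. Dates are never missing but platforms can be, so the left join turns the missing combinations into rows with nulls instead of dropping them, and we can continue. Aggregating that table yields the required values. Mind what goes inside count(): the rows are no longer missing, only some cells are null, so count user_id, which skips the nulls introduced by the left join. The total amount needs null handling, because sum() over only nulls returns null. Output after that.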
with all_person as (
select spend_date
, user_id
, case when sum(platform='mobile')>0 and sum(platform='desktop')=0 then 'mobile'
when sum(platform='mobile')=0 and sum(platform='desktop')>0 then 'desktop'
when sum(platform='mobile')>0 and sum(platform='desktop')>0 then 'both'
end as platform
, sum(amount) as amount
from Spending
group by spend_date ,user_id
), all_platform as (
select 'desktop' as platform
union
select 'mobile'
union
select 'both'
), temp as (
select spend_date
, platform
from (select distinct spend_date from Spending) as t1 join all_platform
)
select temp.spend_date
, temp.platform
, ifnull(sum(amount),0) as total_amount
, count(user_id) as total_users
from temp left join all_person
on temp.spend_date = all_person.spend_date
and temp.platform = all_person.platform
group by spend_date, platform
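The conventional approach to this problem is very tedious: it needs a full outer join over all dates in the original tables, and unfortunately MySQL has no full outer join, so doing it that way is awkward and slow. Instead we introduce a new technique: building an auxiliary table. By the problem's definition, we do not care about an order's state before it was charged back; whether it was declined or approved, it still counts as a chargeback. So we build a new table from all the charged-back orders, with every field the same as before except that the state is 'chargeback' and the date is the chargeback date. Union this table with the original table without removing duplicates (union all), and we get a table containing all three states with the operation date of each; call it the complete table.
We then work on the complete table: count and sum the two states we need, using the conditional-sum method described earlier. Group by both month and country, otherwise values will be lost. The problem says not to report rows whose values are all 0, so at the end we filter the grouped results to make sure not every value is 0.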
with full_Transactions as (
select id
, country
, 'chargeback' as state
, amount
, Chargebacks.trans_date
from Chargebacks inner join Transactions
on trans_id = id
union all
select *
from Transactions
)
select date_format(trans_date, '%Y-%m') as month
, country
, sum(state='approved') as approved_count
, sum(if(state='approved',amount,0)) as approved_amount
, sum(state='chargeback') as chargeback_count
, sum(if(state='chargeback',amount,0)) as chargeback_amount
from full_Transactions
group by month, country
having approved_count + approved_amount + chargeback_count + chargeback_amount <> 0
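Build the bin labels directly as auxiliary columns and assign each session by its duration in minutes (seconds divided by 60), then count the sessions in each bin. The bins are half-open: [5-10> includes 5 minutes but excludes 10, so each condition uses >= on the lower bound and < on the upper bound. between is inclusive on both ends and would count a session of exactly 5 minutes in two bins.
select '[0-5>' as bin
, count(*) as total
from Sessions
where duration/60 < 5
union
select '[5-10>' as bin
, count(*) as total
from Sessions
where duration/60 >= 5 and duration/60 < 10
union
select '[10-15>' as bin
, count(*) as total
from Sessions
where duration/60 >= 10 and duration/60 < 15
union
select '15 or more' as bin
, count(*) as total
from Sessions
where duration/60 >= 15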
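A minimal sketch of half-open binning in Python's sqlite3, used in place of MySQL (an assumption). It writes the bounds in seconds (300, 600, 900) rather than dividing by 60, which is equivalent for these thresholds. The durations are made up and include a boundary value of exactly 300 seconds.

```python
import sqlite3


def session_bins(durations):
    """Count sessions per half-open duration bin (durations in seconds).

    Each bin uses >= on its lower bound and < on its upper bound, so a session
    of exactly 300 s (5 min) lands only in '[5-10>'. Each branch is a separate
    count(*), so a bin with no sessions still appears with 0.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table Sessions (session_id integer, duration integer)")
    con.executemany("insert into Sessions values (?, ?)", list(enumerate(durations, start=1)))
    rows = con.execute(
        """
        select '[0-5>' as bin, count(*) as total from Sessions where duration < 300
        union all
        select '[5-10>', count(*) from Sessions where duration >= 300 and duration < 600
        union all
        select '[10-15>', count(*) from Sessions where duration >= 600 and duration < 900
        union all
        select '15 or more', count(*) from Sessions where duration >= 900
        """
    ).fetchall()
    con.close()
    return dict(rows)
```

Because the bins are disjoint, their counts always add up to the number of sessions, which is an easy sanity check.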
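Group by product id. Each product appears at most once per store, so a narrow-to-wide pivot with sum() is enough. Note that the if must use null as its else value, because the problem wants null when a product is not sold in a store.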
select product_id
, sum(if(store='store1', price, null)) as store1
, sum(if(store='store2', price, null)) as store2
, sum(if(store='store3', price, null)) as store3
from Products
group by product_id
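Compute the home and away sides separately. A CTE records each team's points, goals scored, and goals conceded, once from the home side and once from the away side, and the two halves are combined with union all. In the outer query, join with Teams, group by team name, aggregate each statistic, and sort by the required rules.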
with temp as (
select home_team_id as team
, case when home_team_goals > away_team_goals then 3
when home_team_goals = away_team_goals then 1
when home_team_goals < away_team_goals then 0 end as points
, home_team_goals as goal_for
, away_team_goals as goal_against
from Matches
union all
select away_team_id
, case when home_team_goals > away_team_goals then 0
when home_team_goals = away_team_goals then 1
when home_team_goals < away_team_goals then 3 end
, away_team_goals
, home_team_goals
from Matches
)
select team_name
, count(*) as matches_played
, sum(points) as points
, sum(goal_for) as goal_for
, sum(goal_against) as goal_against
, sum(goal_for)-sum(goal_against) as goal_diff
from temp inner join Teams
on team = team_id
group by team_name
order by points desc, goal_diff desc, team_name
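First classify every account into one of the 3 salary categories and count the accounts in each category. Build a CTE listing all three categories and join it with the per-category counts computed earlier, which gives every category together with its count. Categories with no accounts come out as null, so replace nulls with 0, and output.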
with sal_p as (
select account_id
, case when income > 50000 then 'High Salary'
when income between 20000 and 50000 then 'Average Salary'
when income < 20000 then 'Low Salary' end as category
, income
from Accounts
), all_category as (
select 'Low Salary' as category
union
select 'Average Salary'
union
select 'High Salary'
)
select all_category.category
, ifnull(accounts_count,0) as accounts_count
from(select category
, count(*) as accounts_count
from sal_p
group by category) as temp right join all_category
on all_category.category = temp.category
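The second method builds each category label directly and counts the matching rows for each category separately, then unions the results. Do not use group by here: grouping drops categories that have no rows, so a category whose count is 0 would disappear. Counting each category on its own returns 0 instead.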
select "Low Salary" as category
, count(*) as accounts_count
from Accounts
where income < 20000
union
select "Average Salary"
, count(*)
from Accounts
where income between 20000 and 50000
union
select "High Salary"
, count(*)
from Accounts
where income > 50000
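Filter the experiments of each of the three platforms into its own subquery. In each subquery, group by experiment_name to count how many experiments of each type were run. Experiment types that were never run do not show up in the counts, so right join to the list of all types to include them; their counts come out null and must be replaced with 0. Finally union the three results.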
with temp as (
select 'Programming' as experiment_name
union
select 'Sports'
union
select 'Reading'
)
select 'Android' as platform
, temp.experiment_name
, ifnull(num_experiments,0) as num_experiments
from(select platform
, experiment_name
, count(*) as num_experiments
from Experiments
where platform = 'Android'
group by experiment_name) as t1 right join temp
on temp.experiment_name = t1.experiment_name
union
select 'IOS' as platform
, temp.experiment_name
, ifnull(num_experiments,0) as num_experiments
from(select platform
, experiment_name
, count(*) as num_experiments
from Experiments
where platform = 'IOS'
group by experiment_name) as t1 right join temp
on temp.experiment_name = t1.experiment_name
union
select 'Web' as platform
, temp.experiment_name
, ifnull(num_experiments,0) as num_experiments
from(select platform
, experiment_name
, count(*) as num_experiments
from Experiments
where platform = 'Web'
group by experiment_name) as t1 right join temp
on temp.experiment_name = t1.experiment_name
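First join each passenger with their flight. Within each flight, order by booking time and accumulate 1 per row to get the running number of bookings. If that number is at most the flight's capacity, the booking is confirmed; otherwise it is waitlisted. Sort and output the statuses.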
select passenger_id
, if(cp <= capacity, 'Confirmed', 'Waitlist') as Status
from(select passenger_id
, capacity
, sum(1) over(partition by Flights.flight_id order by booking_time) as cp
from Passengers inner join Flights
on Passengers.flight_id = Flights.flight_id ) as t1
order by passenger_id
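Same as the previous problem: reuse the code, but order by passenger instead. After getting each passenger's status, pivot with sum(), and right join to Flights so that flights without bookings are included, replacing nulls with 0.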
with par as (
select passenger_id
, flight_id
, if(cp <= capacity, 'Confirmed', 'Waitlist') as Status
from(select passenger_id
, capacity
, Flights.flight_id
, sum(1) over(partition by Flights.flight_id order by passenger_id) as cp
from Passengers inner join Flights
on Passengers.flight_id = Flights.flight_id ) as t1
order by passenger_id
)
select Flights.flight_id
, ifnull(sum(Status='Confirmed'),0) as booked_cnt
, ifnull(sum(Status='Waitlist'),0) as waitlist_cnt
from par right join Flights
on par.flight_id = Flights.flight_id
group by flight_id
order by 1
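The code for this problem is a little tedious; perhaps it is not written in the most reasonable way, and I may optimize it later.
The idea must be clear: compute every country's call records and total call duration, then compute each country's average duration and the global average duration, and compare them; the countries that come out ahead are the answer. First write a CTE that treats callers and callees as two parts and sums call duration per country for each part. Then another CTE totals each country's call duration. Also compute a count here: we need averages, so this count serves as the divisor later when computing the global average call duration. One more preparation step: compute the total duration and total count over all countries with window functions (windows make the later calculation easy; without them you would keep joining tables, which is tedious). Finally, in the outer query, each country's duration divided by its call count is that country's average, and the windowed totals give the global average (total duration divided by total calls). Comparing the two yields the country codes (the first 3 digits of the phone number), and a join with Country outputs the country names.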
with ap as (
select left(phone_number,3) as country_code
, count(*) as cnt
, sum(duration) as duration
from Calls inner join Person
on Calls.caller_id = id
group by left(phone_number,3)
union all
select left(phone_number,3)
, count(*)
, sum(duration)
from Calls inner join Person
on Calls.callee_id = id
group by left(phone_number,3)
), all_du as (
select country_code
, sum(cnt) as cnt
, sum(duration) as duration
from ap
group by country_code
)
select name as country
from(select country_code
, cnt
, duration
, sum(cnt) over() as sum_cnt
, sum(duration) over() as sum_d
from all_du) as temp inner join Country
on temp.country_code = Country.country_code
where duration/cnt > sum_d/sum_cnt
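First union the payments made and the payments received into a single column: a payment counts as negative income and a receipt as positive income, giving an income table for all users. Aggregate it per user to get each user's net amount, then add that to the user's initial credit; if the total is negative, the user has overdrawn. The main point of this problem is the wide-to-narrow (unpivot) transformation: find what distinguishes the two kinds of data, here payer versus payee, and split on it to get a narrow table to compute on.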
with temp as (
select paid_by as user
, -amount as amount
from Transactions
union all
select paid_to
, amount
from Transactions
), t1 as (
select user
, sum(amount) as amount
from temp
group by 1
)
select Users.user_id
, user_name
, credit + ifnull(amount,0) as credit
, if(credit + ifnull(amount,0) < 0, 'Yes', 'No') as credit_limit_breached
from Users left join t1
on Users.user_id = t1.user
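Break the table down as the problem says: get each product's price in each store by querying the 3 stores separately and unioning the results. When a product is not sold in a store, i.e. that store's price is null, filter the row out.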
select product_id
, 'store1' as store
, store1 as price
from Products
where store1 is not null
union
select product_id
, 'store2' as store
, store2 as price
from Products
where store2 is not null
union
select product_id
, 'store3' as store
, store3 as price
from Products
where store3 is not null
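First make the call records symmetric, so each call appears once from the caller's side and once from the recipient's side. Then use window functions to find, for each user and day, the first call and the last call. Finally count the distinct other parties among those calls per user per day; a count of 1 means that user's first and last calls that day were with the same person. The logic is relatively involved, so take care.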
with temp as (
select caller_id as id1
, recipient_id as id2
, call_time
, date_format(call_time, '%Y-%m-%d') as day
from Calls
union all
select recipient_id
, caller_id
, call_time
, date_format(call_time, '%Y-%m-%d') as day
from Calls
)
select distinct id1 as user_id
from(select id1
, id2
, call_time
, day
, rank() over(partition by day, id1 order by call_time) as rk1
, rank() over(partition by day, id1 order by call_time desc) as rk2
from temp) as t1
where rk1 = 1 or rk2 = 1
group by day, id1
having count(distinct id2)=1
Extract the metals and the nonmetals from the original table separately, then output their Cartesian product.
with t1 as (
select symbol
from Elements
where type = 'Metal'
), t2 as (
select symbol
from Elements
where type = 'Nonmetal'
)
select t1.symbol as Metal
, t2.symbol as Nonmetal
from t1 join t2
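First work out how many seniors can be hired. Sort the seniors by salary from lowest to highest and take a running total, keeping only rows whose running total is at most 70000. The number of remaining rows is the number of seniors hired, and the largest remaining running total is what they cost, which is passed on to determine the remaining budget. Use that remaining budget in the junior calculation: apply the same logic with the leftover budget to get the number of juniors hired, and union it with the senior count. The problem is nested tables within tables, and the hard part is passing the juniors' starting budget into the second calculation. Here I used a scalar subquery against the first CTE, which is relatively slow.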
with t1 as (
select 'Senior' as experience
, count(*) as accepted_candidates
, max(sum_salary) as max_sal
from(select employee_id
, sum(salary) over(order by salary, employee_id) as sum_salary -- employee_id breaks salary ties
from Candidates
where experience = 'Senior') as t2
where sum_salary <= 70000
), t3 as (
select 'Junior' as experience
, count(*) as accepted_candidates
from(select employee_id
, sum(salary) over(order by salary, employee_id) as sum_salary
from Candidates
where experience = 'Junior') as t4
where sum_salary <= (70000-(select ifnull(max_sal,0) from t1))
)
select *
from t3
union
select experience
, accepted_candidates
from t1
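A minimal sketch of the budgeted running total in Python's sqlite3, used instead of MySQL (an assumption; it needs window-function support, SQLite 3.25 or later). Two choices differ from the SQL above and are labeled in the code. First, the leftover budget is passed from Python as a query parameter rather than through a scalar subquery. Second, the window orders by salary and then employee_id with an explicit ROWS frame: with the default RANGE frame and ORDER BY salary alone, two candidates with equal salaries would share the same running total and could both be rejected even though one of them fits.

```python
import sqlite3

# Running total of salaries, cheapest first; employee_id breaks salary ties.
RUNNING_SQL = """
    select count(*), coalesce(max(running), 0)
    from (select sum(salary) over (order by salary, employee_id
                                   rows between unbounded preceding and current row) as running
          from Candidates
          where experience = ?)
    where running <= ?
"""


def hire_counts(candidates, budget=70000):
    """Hire seniors cheapest-first within the budget, then juniors with what is left.

    Returns {"Senior": n, "Junior": m}. The money spent on seniors is the
    largest running total that still fits, and the rest goes to juniors.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table Candidates (employee_id integer, experience text, salary integer)")
    con.executemany("insert into Candidates values (?, ?, ?)", candidates)
    seniors, spent = con.execute(RUNNING_SQL, ("Senior", budget)).fetchone()
    juniors, _ = con.execute(RUNNING_SQL, ("Junior", budget - spent)).fetchone()
    con.close()
    return {"Senior": seniors, "Junior": juniors}
```

For two seniors each earning 40000, this sketch hires one of them, where a RANGE frame on salary alone would give both a running total of 80000 and hire neither.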
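This problem is handled exactly like the previous one, except that the aggregate max() is replaced by the window function max() over(), so every hired senior's row is kept. When computing the juniors, an aggregate max() over max_sal picks a single value (all rows hold the same value anyway). Output every employee_id that satisfies the conditions.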
with t1 as (
select employee_id
, max(sum_salary) over() as max_sal
from(select employee_id
, sum(salary) over(order by salary, employee_id) as sum_salary -- employee_id breaks salary ties
from Candidates
where experience = 'Senior') as t2
where sum_salary <= 70000
), t3 as (
select employee_id
from(select employee_id
, sum(salary) over(order by salary, employee_id) as sum_salary
from Candidates
where experience = 'Junior') as t4
where sum_salary <= (70000-(select ifnull(max(max_sal),0) from t1))
)
select employee_id
from t3
union
select employee_id
from t1
where employee_id is not null
Union all the winners of every tournament, without removing duplicates, into a CTE. Join it with Players and count how many times each player appears.
with temp as (
select Wimbledon as champion
from Championships
union all
select Fr_open
from Championships
union all
select US_open
from Championships
union all
select Au_open
from Championships
)
select player_id
, player_name
, count(*) as grand_slams_count
from Players inner join temp
on Players.player_id = temp.champion
group by player_id
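There is also a faster way: implicitly join the two tables (a cross join) and, on each row, compare every tournament's winner with the player, where each comparison returns 1 on a match and 0 otherwise. Group by player and keep only players who have won at least one tournament, i.e. grand_slams_count > 0, then output.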
select Players.player_id
, player_name
, sum(Wimbledon = Players.player_id) + sum(Fr_open = player_id) + sum(US_open = player_id) + sum(Au_open = player_id) as grand_slams_count
from Championships, Players
group by player_id
having grand_slams_count > 0
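First build the full friendship relation, pairing every user with each friend in both directions. Then, per user, a window function counts that user's friends, and a scalar subquery counts the total number of users. Finally group in the outer query to get one row per user and output the percentage.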
with all_friends as (
select user1 as user_id
, user2 as friend_id
from Friends
union
select user2
, user1
from Friends
)
select user_id as user1
, round(100*cnt1/cnt2,2) as percentage_popularity
from(select user_id
, count(*) over(partition by user_id) as cnt1
, (select count(distinct user_id) from all_friends) as cnt2
from all_friends) as temp
group by user_id
order by user1
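This problem matches with a regular expression. ^ anchors the match at the start of the string and $ at the end. The first character must be an upper- or lowercase letter, matched once with a bracket expression. After it there may or may not be more characters, so put the allowed characters in a bracket expression and let it match any number of times (*), because a single letter followed directly by the domain also qualifies. Note that . must be escaped, otherwise it matches any single character. In a MySQL string literal \\ becomes a single backslash, so '\\.' reaches the regex engine as \. and matches a literal dot.
In MySQL, the regular-expression keywords rlike and regexp are synonyms with identical syntax.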
select user_id
, name
, mail
from Users
where mail rlike '^[a-zA-Z][a-zA-Z0-9_.-]*@leetcode\\.com$'
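For a quick check outside the database, the same character classes can be tested with Python's re module. This sketch uses the prefix character set described above (letters, digits, underscore, period, dash). Note one difference: Python's re is case-sensitive by default, whereas MySQL's REGEXP follows the column collation and is case-insensitive under the usual _ci collations. The two can therefore disagree on inputs such as @LEETCODE.COM.

```python
import re

# The pattern without ^ and $: re.fullmatch already requires the whole string to match.
# A raw string needs only one backslash before the literal dot.
VALID_MAIL = re.compile(r"[a-zA-Z][a-zA-Z0-9_.-]*@leetcode\.com")


def is_valid(mail):
    return VALID_MAIL.fullmatch(mail) is not None


for mail in ["winston@leetcode.com", "a@leetcode.com", ".shapo@leetcode.com", "a/b@leetcode.com"]:
    print(mail, is_valid(mail))
```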
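This problem is also a direct regular-expression match. The disease code may be preceded by a space (when it is not the first code in the list) or followed by other characters, so the query relies on \\b, the word-boundary assertion. It matches at a position between a word character and a non-word character, and also at the start or end of the string when the adjacent character is a word character. \\bDIAB1 therefore only matches DIAB1 at the beginning of a word, and .+ requires at least one more character after it.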
select *
from Patients
where conditions rlike '.*\\bDIAB1.+'
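The word-boundary behaviour is easy to probe with Python's re module. It treats \b the same way for these ASCII inputs, although this is an illustration and not MySQL itself. The leading .* from the query is dropped here because re.search already scans the whole string.

```python
import re

# Same idea as '.*\\bDIAB1.+' in the query (a raw string needs a single backslash).
DIAB1 = re.compile(r"\bDIAB1.+")


def has_type1(conditions):
    return DIAB1.search(conditions) is not None


print(has_type1("DIAB100 MYOP"))   # True: the start of the string is a boundary
print(has_type1("ACNE DIAB100"))   # True: the space before D is a boundary
print(has_type1("ACNE XDIAB100"))  # False: no boundary between X and D
print(has_type1("ACNE +DIAB100"))  # True: '+' is not a word character
print(has_type1("DIAB1"))          # False: .+ needs at least one character after DIAB1
```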
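Match with a regular expression as the problem requires. A word only counts when it has a space on both sides, so occurrences at the very start or end of the content, and longer words such as bullet, must be excluded. Putting whitespace (\\s) on both sides of the word, together with word boundaries (\\b), enforces this. The \\b assertions are redundant next to \\s but harmless. Submitting it produced no errors, so the result can be output directly.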
select 'bull' as word
, count(*) as count
from Files
where content rlike '\\s\\bbull\\b\\s'
union
select 'bear' as word
, count(*) as count
from Files
where content rlike '\\s\\bbear\\b\\s'
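A small Python sketch of the same "whitespace on both sides" rule, run over a few made-up file contents, shows which occurrences count. The function and sample strings are illustrative only.

```python
import re


def count_files(files, word):
    # \s on both sides: the word needs whitespace before and after it;
    # the \b assertions mirror the query but add nothing next to \s.
    pattern = re.compile(r"\s\b" + re.escape(word) + r"\b\s")
    return sum(1 for content in files if pattern.search(content))


files = [
    "a bull market",                    # counts for bull
    "a bull market and a bear market",  # counts for bull and bear
    "bullet points",                    # part of a longer word
    "bull at the start",                # no space before it
    "ends with bull",                   # no space after it
    "a bull. market",                   # followed by '.', not whitespace
]
print(count_files(files, "bull"), count_files(files, "bear"))  # 2 1
```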
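First take the Cartesian product of the two tables (a join with no ON condition). Keep the pairs where the keyword appears as a whole word in one of three positions: in the middle ('% word %'), at the end ('% word') or at the start ('word %'). Group by post and concatenate the distinct topic IDs sorted by topic_id. Finally, right join back to Posts so that posts with no matching topic are kept, and replace their NULL topic with the default 'Ambiguous!'.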
select Posts.post_id
, ifnull(topic,'Ambiguous!') as topic
from(select post_id
, group_concat(distinct topic_id order by topic_id) as topic
from Posts join Keywords
where content like concat('% ',word,' %') or content like concat('% ' ,word) or content like concat(word,' %')
group by post_id) as temp right join Posts
on Posts.post_id = temp.post_id
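The matching logic can be sketched in Python by splitting each post into words. This makes the whole-word condition explicit instead of encoding it as three LIKE patterns. The input shapes are hypothetical. Lowercasing both sides is an assumption that mirrors a case-insensitive match, as MySQL's default collations give for LIKE.

```python
def post_topics(posts, keywords):
    """posts: {post_id: content}; keywords: [(topic_id, word)] (hypothetical shapes)."""
    result = {}
    for post_id, content in posts.items():
        # Whole-word, case-insensitive match: split on whitespace, compare lowercase.
        words = {w.lower() for w in content.split()}
        topics = sorted({tid for tid, word in keywords if word.lower() in words})
        # Distinct, sorted IDs joined by commas, like group_concat(distinct ... order by ...);
        # posts with no match get the default, like ifnull(topic, 'Ambiguous!').
        result[post_id] = ",".join(map(str, topics)) if topics else "Ambiguous!"
    return result


keywords = [(1, "handball"), (1, "football"), (3, "WAR"), (2, "Vaccine")]
posts = {
    1: "We call it soccer They call it football hahaha",
    2: "Americans prefer basketball while Europeans love handball and football",
    3: "stop the war and play handball",
    4: "warning I planted some flowers this morning and then got vaccinated",
}
print(post_topics(posts, keywords))
```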
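Meaning of some date_format() format specifiers:
%W Full weekday name, e.g. Sunday, Monday, …, Saturday
%M Full month name, e.g. January, February, …, December
%m Month as a number with a leading zero, e.g. 00, 01, 02, …, 12
%d Day of the month with a leading zero (days below 10 are padded with 0), e.g. 00, 01, 02, …, 31
%e Day of the month without a leading zero, e.g. 1, 2, …, 31
%Y Four-digit year, e.g. 2000, 2001, …
%y Two-digit year, e.g. 00, 01, …
As the problem asks, simply convert each date into a string in the required format. Pick the specifiers from the list above as needed; here they are %W, %M, %e and %Y.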
select date_format(day, '%W, %M %e, %Y') as day
from Days
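For comparison, here is a Python sketch of the same '%W, %M %e, %Y' layout. Python's strftime uses different letters: %A and %B play the roles of MySQL's %W and %M. Python has no portable no-padding day code, so d.day stands in for %e. The weekday and month names are English under the default "C" locale, which is assumed here.

```python
from datetime import date


def format_day(d):
    # %A / %B: full weekday / month names (MySQL %W / %M);
    # d.day replaces MySQL %e and d.year replaces %Y.
    return f"{d.strftime('%A, %B')} {d.day}, {d.year}"


print(format_day(date(2022, 4, 12)))  # Tuesday, April 12, 2022
```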