All of the problems below are from LeetCode.
Table: Actions
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| post_id | int |
| action_date | date |
| action | enum |
| extra | varchar |
+---------------+---------+
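This table has no primary key, so it may contain duplicate rows.
The action column is an ENUM with the values ('view', 'like', 'reaction', 'comment', 'report', 'share').
The extra column holds optional information (it may be null), such as a reason for a report or a type of reaction.
Write a SQL query that reports, for each report reason, the number of distinct posts (post_id) reported yesterday. Assume today is 2019-07-05.
Example of the query and result format: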
Actions table:
+---------+---------+-------------+--------+--------+
| user_id | post_id | action_date | action | extra |
+---------+---------+-------------+--------+--------+
| 1 | 1 | 2019-07-01 | view | null |
| 1 | 1 | 2019-07-01 | like | null |
| 1 | 1 | 2019-07-01 | share | null |
| 2 | 4 | 2019-07-04 | view | null |
| 2 | 4 | 2019-07-04 | report | spam |
| 3 | 4 | 2019-07-04 | view | null |
| 3 | 4 | 2019-07-04 | report | spam |
| 4 | 3 | 2019-07-02 | view | null |
| 4 | 3 | 2019-07-02 | report | spam |
| 5 | 2 | 2019-07-04 | view | null |
| 5 | 2 | 2019-07-04 | report | racism |
| 5 | 5 | 2019-07-04 | view | null |
| 5 | 5 | 2019-07-04 | report | racism |
+---------+---------+-------------+--------+--------+
Result table:
+---------------+--------------+
| report_reason | report_count |
+---------------+--------------+
| spam | 1 |
| racism | 2 |
+---------------+--------------+
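Note that we only care about report reasons with a non-zero report count.
Keep the 'report' actions dated yesterday, group by reason, and count distinct posts: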
select extra report_reason,count(distinct post_id) report_count
from Actions where action='report'
and action_date=date_sub('2019-07-05',interval 1 day)
group by extra
;
Table: Events
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| business_id | int |
| event_type | varchar |
| occurences | int |
+---------------+---------+
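The primary key of this table is (business_id, event_type).
Each row records how many times an event of a given type occurred for a given business.
Write a SQL query to find all active businesses.
A business is active if it has at least two event types whose occurrences are greater than the average occurrences of that event type across all businesses.
The query result format is as follows: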
Events table:
+-------------+------------+------------+
| business_id | event_type | occurences |
+-------------+------------+------------+
| 1 | reviews | 7 |
| 3 | reviews | 3 |
| 1 | ads | 11 |
| 2 | ads | 7 |
| 3 | ads | 6 |
| 1 | page views | 3 |
| 2 | page views | 12 |
+-------------+------------+------------+
Result table:
+-------------+
| business_id |
+-------------+
| 1 |
+-------------+
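The average occurrences of 'reviews', 'ads' and 'page views' across all businesses are (7+3)/2=5, (11+7+6)/3=8 and (3+12)/2=7.5 respectively.
Business 1 has 7 'reviews' events (more than 5) and 11 'ads' events (more than 8), so it is an active business.
Compute the average per event type with a window function: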
select
business_id,
event_type,
occurences,
avg(occurences) over(partition by event_type) avg_oc
from
Events
;
Add a flag column: 1 if occurences is greater than the average, otherwise 0
select
business_id,
event_type,
if(occurences>avg_oc,1,0) flag
from
(
select
business_id,
event_type,
occurences,
avg(occurences) over(partition by event_type) avg_oc
from
Events
) t
;
Group by business_id; the businesses with sum(flag) >= 2 are the active ones
select
business_id
from (select
business_id,
event_type,
if(occurences>avg_oc,1,0) flag
from
(
select
business_id,
event_type,
occurences,
avg(occurences) over(partition by event_type) avg_oc
from
Events
) t) s group by business_id having sum(flag)>=2
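The three steps above can be checked against the sample rows with a small harness. This is only a sketch under assumptions: it runs on an in-memory SQLite database from Python rather than on MySQL, so MySQL's if() is written as CASE, and the helper name active_businesses is hypothetical.

```python
import sqlite3

# Sample rows from the Events example above.
EVENTS = [
    (1, "reviews", 7), (3, "reviews", 3),
    (1, "ads", 11), (2, "ads", 7), (3, "ads", 6),
    (1, "page views", 3), (2, "page views", 12),
]

# Same structure as the MySQL query: window average, flag, then group and filter.
QUERY = """
SELECT business_id
FROM (
    SELECT business_id,
           CASE WHEN occurences > avg_oc THEN 1 ELSE 0 END AS flag
    FROM (
        SELECT business_id, event_type, occurences,
               AVG(occurences) OVER (PARTITION BY event_type) AS avg_oc
        FROM Events
    ) AS t
) AS s
GROUP BY business_id
HAVING SUM(flag) >= 2
ORDER BY business_id
"""


def active_businesses(rows):
    """Load the rows into SQLite and return the ids of active businesses."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Events (business_id INT, event_type TEXT, occurences INT)"
    )
    con.executemany("INSERT INTO Events VALUES (?, ?, ?)", rows)
    result = [row[0] for row in con.execute(QUERY)]
    con.close()
    return result


print(active_businesses(EVENTS))
```

Only business 1 beats the average for two event types ('reviews' and 'ads'), so it is the only id returned.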
Table: Spending
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| user_id | int |
| spend_date | date |
| platform | enum |
| amount | int |
+-------------+---------+
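This table logs the spending history of users on an online shopping website that has both a desktop ('desktop') and a mobile ('mobile') application.
The primary key of this table is (user_id, spend_date, platform).
The platform column is an ENUM with the values ('desktop', 'mobile').
Write a SQL query to find, for each date, the number of users and the total amount spent by users who used mobile only, desktop only, and both desktop and mobile.
The query result format is shown in the following example: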
Spending table:
+---------+------------+----------+--------+
| user_id | spend_date | platform | amount |
+---------+------------+----------+--------+
| 1 | 2019-07-01 | mobile | 100 |
| 1 | 2019-07-01 | desktop | 100 |
| 2 | 2019-07-01 | mobile | 100 |
| 2 | 2019-07-02 | mobile | 100 |
| 3 | 2019-07-01 | desktop | 100 |
| 3 | 2019-07-02 | desktop | 100 |
+---------+------------+----------+--------+
Result table:
+------------+----------+--------------+-------------+
| spend_date | platform | total_amount | total_users |
+------------+----------+--------------+-------------+
| 2019-07-01 | desktop | 100 | 1 |
| 2019-07-01 | mobile | 100 | 1 |
| 2019-07-01 | both | 200 | 1 |
| 2019-07-02 | desktop | 100 | 1 |
| 2019-07-02 | mobile | 100 | 1 |
| 2019-07-02 | both | 0 | 0 |
+------------+----------+--------------+-------------+
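On 2019-07-01, user 1 purchased using both desktop and mobile, user 2 purchased using mobile only, and user 3 purchased using desktop only.
On 2019-07-02, user 2 purchased using mobile only, user 3 purchased using desktop only, and no one purchased using both platforms.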
-- a: one row per user per day, labeled 'both' when the user used both platforms
-- b: the three platform labels, so every day reports all three (possibly as zeros)
select a.spend_date,b.platform,
sum(if(a.platform=b.platform,a.amount,0)) as total_amount,
count(if(a.platform=b.platform,1,null)) as total_users
from(
select spend_date,user_id,
if(count(distinct platform)=2,'both',min(platform)) as platform,
sum(amount) as amount
from Spending
group by user_id,spend_date
) a,(
select 'desktop' as platform union
select 'mobile' as platform union
select 'both' as platform
) b
group by a.spend_date,b.platform
;
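The cross join against the three platform labels is what makes the zero row for 'both' appear on 2019-07-02. A sketch that replays the idea on the sample rows follows; it assumes SQLite (via Python) instead of MySQL, writes if() as CASE, qualifies every column, and uses a hypothetical helper name spending_by_platform.

```python
import sqlite3

# Sample rows from the Spending example above.
SPENDING = [
    (1, "2019-07-01", "mobile", 100),
    (1, "2019-07-01", "desktop", 100),
    (2, "2019-07-01", "mobile", 100),
    (2, "2019-07-02", "mobile", 100),
    (3, "2019-07-01", "desktop", 100),
    (3, "2019-07-02", "desktop", 100),
]

QUERY = """
SELECT a.spend_date, b.platform,
       SUM(CASE WHEN a.platform = b.platform THEN a.amount ELSE 0 END) AS total_amount,
       COUNT(CASE WHEN a.platform = b.platform THEN 1 END) AS total_users
FROM (
    -- one row per user per day; 'both' when two distinct platforms were used
    SELECT spend_date, user_id,
           CASE WHEN COUNT(DISTINCT platform) = 2 THEN 'both'
                ELSE MIN(platform) END AS platform,
           SUM(amount) AS amount
    FROM Spending
    GROUP BY user_id, spend_date
) AS a
CROSS JOIN (
    SELECT 'desktop' AS platform UNION SELECT 'mobile' UNION SELECT 'both'
) AS b
GROUP BY a.spend_date, b.platform
ORDER BY a.spend_date, b.platform
"""


def spending_by_platform(rows):
    """Return (spend_date, platform, total_amount, total_users) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Spending (user_id INT, spend_date TEXT, platform TEXT, amount INT)"
    )
    con.executemany("INSERT INTO Spending VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in spending_by_platform(SPENDING):
    print(row)
```

The rows come back sorted by date and then alphabetically by platform, so the order differs from the result table above while the values match.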
Table: Actions
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| post_id | int |
| action_date | date |
| action | enum |
| extra | varchar |
+---------------+---------+
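This table has no primary key and may contain duplicate rows.
The action column is an ENUM with the possible values ('view', 'like', 'reaction', 'comment', 'report', 'share').
The extra column holds optional information, such as a reason for a report or a type of reaction.
Table: Removals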
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| post_id | int |
| remove_date | date |
+---------------+---------+
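The primary key of this table is post_id.
Each row of this table represents a post that was removed, either because it was reported or after an admin review.
Write a SQL query to find the average daily percentage of posts that were removed among the posts reported as spam, rounded to 2 decimal places.
The query result format is as follows: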
Actions table:
+---------+---------+-------------+--------+--------+
| user_id | post_id | action_date | action | extra |
+---------+---------+-------------+--------+--------+
| 1 | 1 | 2019-07-01 | view | null |
| 1 | 1 | 2019-07-01 | like | null |
| 1 | 1 | 2019-07-01 | share | null |
| 2 | 2 | 2019-07-04 | view | null |
| 2 | 2 | 2019-07-04 | report | spam |
| 3 | 4 | 2019-07-04 | view | null |
| 3 | 4 | 2019-07-04 | report | spam |
| 4 | 3 | 2019-07-02 | view | null |
| 4 | 3 | 2019-07-02 | report | spam |
| 5 | 2 | 2019-07-03 | view | null |
| 5 | 2 | 2019-07-03 | report | racism |
| 5 | 5 | 2019-07-03 | view | null |
| 5 | 5 | 2019-07-03 | report | racism |
+---------+---------+-------------+--------+--------+
Removals table:
+---------+-------------+
| post_id | remove_date |
+---------+-------------+
| 2 | 2019-07-20 |
| 3 | 2019-07-18 |
+---------+-------------+
Result table:
+-----------------------+
| average_daily_percent |
+-----------------------+
| 75.00 |
+-----------------------+
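The spam removal rate for 2019-07-04 is 50%, because two posts were reported as spam but only one was removed.
The spam removal rate for 2019-07-02 is 100%, because one post was reported as spam and it was removed.
No spam reports were received on the other days, so the average is (50 + 100) / 2 = 75%.
Note that the output is a single average value; the dates of the removals do not matter.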
SELECT ROUND(AVG(proportion) * 100, 2) AS average_daily_percent
FROM (
SELECT actions.action_date, COUNT(DISTINCT removals.post_id)/COUNT(DISTINCT actions.post_id) AS proportion
FROM actions
LEFT JOIN removals
ON actions.post_id = removals.post_id
WHERE actions.action = 'report' AND actions.extra = 'spam'
GROUP BY actions.action_date
) a
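The per-day ratio and the final average can be replayed on the sample data. The sketch below is an assumption-laden check, not the original method: it uses SQLite from Python, multiplies by 1.0 because SQLite truncates integer division (MySQL's / does not), and the helper name average_daily_percent is hypothetical.

```python
import sqlite3

# Sample rows from the Actions and Removals examples above (None stands for null).
ACTIONS = [
    (1, 1, "2019-07-01", "view", None), (1, 1, "2019-07-01", "like", None),
    (1, 1, "2019-07-01", "share", None), (2, 2, "2019-07-04", "view", None),
    (2, 2, "2019-07-04", "report", "spam"), (3, 4, "2019-07-04", "view", None),
    (3, 4, "2019-07-04", "report", "spam"), (4, 3, "2019-07-02", "view", None),
    (4, 3, "2019-07-02", "report", "spam"), (5, 2, "2019-07-03", "view", None),
    (5, 2, "2019-07-03", "report", "racism"), (5, 5, "2019-07-03", "view", None),
    (5, 5, "2019-07-03", "report", "racism"),
]
REMOVALS = [(2, "2019-07-20"), (3, "2019-07-18")]

QUERY = """
SELECT ROUND(AVG(proportion) * 100, 2)
FROM (
    SELECT a.action_date,
           COUNT(DISTINCT r.post_id) * 1.0 / COUNT(DISTINCT a.post_id) AS proportion
    FROM Actions AS a
    LEFT JOIN Removals AS r ON a.post_id = r.post_id
    WHERE a.action = 'report' AND a.extra = 'spam'
    GROUP BY a.action_date
) AS daily
"""


def average_daily_percent(actions, removals):
    """Return the average daily spam removal percentage."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Actions (user_id INT, post_id INT, action_date TEXT,"
        " action TEXT, extra TEXT)"
    )
    con.execute("CREATE TABLE Removals (post_id INT, remove_date TEXT)")
    con.executemany("INSERT INTO Actions VALUES (?, ?, ?, ?, ?)", actions)
    con.executemany("INSERT INTO Removals VALUES (?, ?)", removals)
    value = con.execute(QUERY).fetchone()[0]
    con.close()
    return value


print(average_daily_percent(ACTIONS, REMOVALS))
```

The LEFT JOIN keeps reported posts that were never removed, so they still count in the denominator of each day's ratio.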
Table: Activity
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| session_id | int |
| activity_date | date |
| activity_type | enum |
+---------------+---------+
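This table records user activity on a social media website.
This table has no primary key and may contain duplicate rows.
The activity_type column takes one of four values ('open_session', 'end_session', 'scroll_down', 'send_message').
Each session_id belongs to exactly one user.
Write a SQL query to find the daily active user count for the 30-day period ending 2019-07-27 inclusive. A user is active on a day if they have at least one activity record on that day.
The query result format is shown in the following example: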
Activity table:
+---------+------------+---------------+---------------+
| user_id | session_id | activity_date | activity_type |
+---------+------------+---------------+---------------+
| 1 | 1 | 2019-07-20 | open_session |
| 1 | 1 | 2019-07-20 | scroll_down |
| 1 | 1 | 2019-07-20 | end_session |
| 2 | 4 | 2019-07-20 | open_session |
| 2 | 4 | 2019-07-21 | send_message |
| 2 | 4 | 2019-07-21 | end_session |
| 3 | 2 | 2019-07-21 | open_session |
| 3 | 2 | 2019-07-21 | send_message |
| 3 | 2 | 2019-07-21 | end_session |
| 4 | 3 | 2019-06-25 | open_session |
| 4 | 3 | 2019-06-25 | end_session |
+---------+------------+---------------+---------------+
Result table:
+------------+--------------+
| day | active_users |
+------------+--------------+
| 2019-07-20 | 2 |
| 2019-07-21 | 2 |
+------------+--------------+
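Days with no active users do not need to be shown.
Filter to the 30-day window, group by date, and count distinct user_id values. A 30-day window that ends on 2019-07-27 (inclusive) starts on 2019-06-28, so subtract 29 days rather than 30:
select activity_date day,count(distinct user_id) active_users from Activity
where activity_date between date_sub('2019-07-27',interval 29 day) and '2019-07-27' group by activity_date
;
Note: count distinct user_id values rather than rows, because the table may contain duplicate rows and a user can have several records on the same day.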
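The boundary of an inclusive N-day window is easy to get wrong by one day, so here is a minimal arithmetic check in Python. The helper name window_start is hypothetical and only illustrates the date calculation, not the SQL itself.

```python
from datetime import date, timedelta


def window_start(end_day, days):
    """First day of a window of `days` days that ends on `end_day` (inclusive)."""
    return end_day - timedelta(days=days - 1)


end = date(2019, 7, 27)
start = window_start(end, 30)
print(start)                   # first day of the window
print((end - start).days + 1)  # number of days covered, both ends included
```

Subtracting 30 days instead would give 2019-06-27 and a window of 31 days when both endpoints are included, as BETWEEN does.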
Table: Activity
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| session_id | int |
| activity_date | date |
| activity_type | enum |
+---------------+---------+
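This table records user activity on a social media website.
This table has no primary key and may contain duplicate rows.
The activity_type column takes one of four values ('open_session', 'end_session', 'scroll_down', 'send_message').
Each session_id belongs to exactly one user.
Write a SQL query to find the average number of sessions per user for the 30-day period ending 2019-07-27 inclusive, rounded to 2 decimal places. Only count valid sessions, that is, sessions in which the user performed at least one activity during that period.
The query result format is shown in the following example: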
Activity table:
+---------+------------+---------------+---------------+
| user_id | session_id | activity_date | activity_type |
+---------+------------+---------------+---------------+
| 1 | 1 | 2019-07-20 | open_session |
| 1 | 1 | 2019-07-20 | scroll_down |
| 1 | 1 | 2019-07-20 | end_session |
| 2 | 4 | 2019-07-20 | open_session |
| 2 | 4 | 2019-07-21 | send_message |
| 2 | 4 | 2019-07-21 | end_session |
| 3 | 2 | 2019-07-21 | open_session |
| 3 | 2 | 2019-07-21 | send_message |
| 3 | 2 | 2019-07-21 | end_session |
| 4 | 3 | 2019-06-25 | open_session |
| 4 | 3 | 2019-06-25 | end_session |
+---------+------------+---------------+---------------+
Result table:
+---------------------------+
| average_sessions_per_user |
+---------------------------+
| 1.33 |
+---------------------------+
Users 1 and 2 each had 1 session in the past 30 days, while user 3 had 2 sessions, so the average is (1 + 1 + 2) / 3 = 1.33.
SELECT IFNULL(ROUND(COUNT(DISTINCT session_id) / COUNT(DISTINCT user_id), 2), 0) AS average_sessions_per_user
FROM Activity
WHERE DATEDIFF('2019-07-27', activity_date) < 30
;
Table: Views
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| article_id | int |
| author_id | int |
| viewer_id | int |
| view_date | date |
+---------------+---------+
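This table has no primary key, so it may contain duplicate rows.
Each row of this table indicates that some viewer viewed an article written by some author on some date.
Note that equal author_id and viewer_id values refer to the same person.
Write a SQL query to find all the authors who viewed at least one of their own articles, sorted by id in ascending order.
The query result format is as follows:
Views table: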
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date |
+------------+-----------+-----------+------------+
| 1 | 3 | 5 | 2019-08-01 |
| 1 | 3 | 6 | 2019-08-02 |
| 2 | 7 | 7 | 2019-08-01 |
| 2 | 7 | 6 | 2019-08-02 |
| 4 | 7 | 1 | 2019-07-22 |
| 3 | 4 | 4 | 2019-07-21 |
| 3 | 4 | 4 | 2019-07-21 |
+------------+-----------+-----------+------------+
Result table:
+------+
| id |
+------+
| 4 |
| 7 |
+------+
Keep the rows where viewer_id equals author_id, remove duplicates, and sort:
select distinct author_id as id
from Views
where viewer_id=author_id order by author_id
Table: Views
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| article_id | int |
| author_id | int |
| viewer_id | int |
| view_date | date |
+---------------+---------+
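This table has no primary key, so it may contain duplicate rows.
Each row of this table indicates that some viewer viewed an article written by some author on some date.
Note that equal author_id and viewer_id values refer to the same person.
Write a SQL query to find all the people who viewed at least two articles on the same date, sorted by id in ascending order.
The query result format is as follows: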
Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date |
+------------+-----------+-----------+------------+
| 1 | 3 | 5 | 2019-08-01 |
| 3 | 4 | 5 | 2019-08-01 |
| 1 | 3 | 6 | 2019-08-02 |
| 2 | 7 | 7 | 2019-08-01 |
| 2 | 7 | 6 | 2019-08-02 |
| 4 | 7 | 1 | 2019-07-22 |
| 3 | 4 | 4 | 2019-07-21 |
| 3 | 4 | 4 | 2019-07-21 |
+------------+-----------+-----------+------------+
Result table:
+------+
| id |
+------+
| 5 |
| 6 |
+------+
Group by view_date and viewer_id, and keep the groups with COUNT(DISTINCT article_id) >= 2:
SELECT DISTINCT viewer_id AS id
FROM Views
GROUP BY view_date, viewer_id
HAVING COUNT(DISTINCT article_id) >= 2
ORDER BY viewer_id
;
Table: Users
+----------------+---------+
| Column Name | Type |
+----------------+---------+
| user_id | int |
| join_date | date |
| favorite_brand | varchar |
+----------------+---------+
The primary key of this table is user_id. The table describes the users of an online shopping website where users can buy and sell items.
Table: Orders
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| order_id | int |
| order_date | date |
| item_id | int |
| buyer_id | int |
| seller_id | int |
+---------------+---------+
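The primary key of this table is order_id. item_id is a foreign key to the Items table, and buyer_id and seller_id are foreign keys to the Users table.
Table: Items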
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| item_id | int |
| item_brand | varchar |
+---------------+---------+
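The primary key of this table is item_id.
Write a SQL query to find, for each user, the join date and the number of orders they made as a buyer in 2019.
The query result format is as follows: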
Users table:
+---------+------------+----------------+
| user_id | join_date | favorite_brand |
+---------+------------+----------------+
| 1 | 2018-01-01 | Lenovo |
| 2 | 2018-02-09 | Samsung |
| 3 | 2018-01-19 | LG |
| 4 | 2018-05-21 | HP |
+---------+------------+----------------+
Orders table:
+----------+------------+---------+----------+-----------+
| order_id | order_date | item_id | buyer_id | seller_id |
+----------+------------+---------+----------+-----------+
| 1 | 2019-08-01 | 4 | 1 | 2 |
| 2 | 2018-08-02 | 2 | 1 | 3 |
| 3 | 2019-08-03 | 3 | 2 | 3 |
| 4 | 2018-08-04 | 1 | 4 | 2 |
| 5 | 2018-08-04 | 1 | 3 | 4 |
| 6 | 2019-08-05 | 2 | 2 | 4 |
+----------+------------+---------+----------+-----------+
Items table:
+---------+------------+
| item_id | item_brand |
+---------+------------+
| 1 | Samsung |
| 2 | Lenovo |
| 3 | LG |
| 4 | HP |
+---------+------------+
Result table:
+-----------+------------+----------------+
| buyer_id | join_date | orders_in_2019 |
+-----------+------------+----------------+
| 1 | 2018-01-01 | 1 |
| 2 | 2018-02-09 | 2 |
| 3 | 2018-01-19 | 0 |
| 4 | 2018-05-21 | 0 |
+-----------+------------+----------------+
Method 1:
Filter the Orders table to the orders placed in 2019:
select *
from Orders
where YEAR(order_date)=2019
Left join the Users table to that result and count the orders per user:
select u.user_id buyer_id,u.join_date join_date,count(o.order_id) orders_in_2019
from Users u
left join
(select *
from Orders
where YEAR(order_date)=2019) o
on u.user_id=o.buyer_id
group by u.user_id,u.join_date
;
Method 2:
Filter the Orders table to the orders placed in 2019, group by buyer_id, and count the orders:
select buyer_id,count(order_id) cn
from Orders
where YEAR(order_date)=2019
group by buyer_id
;
Left join the Users table to that result; users with no match get 0:
select user_id buyer_id,join_date,ifnull(cn,0) orders_in_2019
from Users u left join
(select buyer_id,count(order_id) cn
from Orders
where YEAR(order_date)=2019
group by buyer_id) o
on u.user_id=o.buyer_id
YEAR(date) returns the year of a date as an integer.
MONTH(date) returns the month of a date as an integer.
Table: Users
+----------------+---------+
| Column Name | Type |
+----------------+---------+
| user_id | int |
| join_date | date |
| favorite_brand | varchar |
+----------------+---------+
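user_id is the primary key of this table.
This table holds information about the users of an online shopping website where users can sell and buy items.
Table: Orders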
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| order_id | int |
| order_date | date |
| item_id | int |
| buyer_id | int |
| seller_id | int |
+---------------+---------+
order_id is the primary key of this table.
item_id is a foreign key to the Items table.
buyer_id and seller_id are foreign keys to the Users table.
Table: Items
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| item_id | int |
| item_brand | varchar |
+---------------+---------+
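item_id is the primary key of this table.
Write a SQL query to determine, for each user, whether the brand of the second item (by date) they sold is their favorite brand. If a user sold fewer than two items, report the answer for that user as no.
It is guaranteed that no seller sold more than one item on a day.
The query result format is shown in the following example: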
Users table:
+---------+------------+----------------+
| user_id | join_date | favorite_brand |
+---------+------------+----------------+
| 1 | 2019-01-01 | Lenovo |
| 2 | 2019-02-09 | Samsung |
| 3 | 2019-01-19 | LG |
| 4 | 2019-05-21 | HP |
+---------+------------+----------------+
Orders table:
+----------+------------+---------+----------+-----------+
| order_id | order_date | item_id | buyer_id | seller_id |
+----------+------------+---------+----------+-----------+
| 1 | 2019-08-01 | 4 | 1 | 2 |
| 2 | 2019-08-02 | 2 | 1 | 3 |
| 3 | 2019-08-03 | 3 | 2 | 3 |
| 4 | 2019-08-04 | 1 | 4 | 2 |
| 5 | 2019-08-04 | 1 | 3 | 4 |
| 6 | 2019-08-05 | 2 | 2 | 4 |
+----------+------------+---------+----------+-----------+
Items table:
+---------+------------+
| item_id | item_brand |
+---------+------------+
| 1 | Samsung |
| 2 | Lenovo |
| 3 | LG |
| 4 | HP |
+---------+------------+
Result table:
+-----------+--------------------+
| seller_id | 2nd_item_fav_brand |
+-----------+--------------------+
| 1 | no |
| 2 | yes |
| 3 | yes |
| 4 | no |
+-----------+--------------------+
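The answer for the user with id 1 is no because they sold nothing.
The answer for the users with id 2 and 3 is yes because the brand of the second item each of them sold is their favorite brand.
The answer for the user with id 4 is no because the brand of the second item they sold is not their favorite brand.
Left join Orders to Items and rank each seller's orders by date: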
select seller_id,Orders.item_id item_id,item_brand,
rank() over(partition by seller_id order by order_date) rk
from Orders left join Items
on Orders.item_id=Items.item_id
;
From that result, keep the rows ranked 2:
select seller_id,item_id,item_brand from
(select seller_id,Orders.item_id item_id,item_brand,
rank() over(partition by seller_id order by order_date) rk
from Orders left join Items
on Orders.item_id=Items.item_id) t where rk=2
;
Left join the Users table to that result and compare the brands:
select user_id seller_id,
case when item_id is null then 'no'
when favorite_brand=item_brand then 'yes'
else 'no' end as 2nd_item_fav_brand
from Users u left join
(select seller_id,item_id,item_brand from
(select seller_id,Orders.item_id item_id,item_brand,
rank() over(partition by seller_id order by order_date) rk
from Orders left join Items
on Orders.item_id=Items.item_id) t where rk=2) o on u.user_id=o.seller_id
Check the primary and foreign keys carefully so that each join uses the right tables and columns; do not join arbitrarily.
Table: Products
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| product_id | int |
| new_price | int |
| change_date | date |
+---------------+---------+
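The primary key of this table is (product_id, change_date).
Each row of this table records the new price of a product after a change on some date.
Write a SQL query to find the prices of all products on 2019-08-16. Assume the price of every product before any change is 10.
The query result format is shown in the following example: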
Products table:
+------------+-----------+-------------+
| product_id | new_price | change_date |
+------------+-----------+-------------+
| 1 | 20 | 2019-08-14 |
| 2 | 50 | 2019-08-14 |
| 1 | 30 | 2019-08-15 |
| 1 | 35 | 2019-08-16 |
| 2 | 65 | 2019-08-17 |
| 3 | 20 | 2019-08-18 |
+------------+-----------+-------------+
Result table:
+------------+-------+
| product_id | price |
+------------+-------+
| 2 | 50 |
| 1 | 35 |
| 3 | 10 |
+------------+-------+
1. Find all products
select distinct product_id from Products;
2. For each product whose price changed on or before 2019-08-16, find the date of the latest such change
select product_id,max(change_date) dt
from Products where change_date<='2019-08-16'
group by product_id
;
3. Look up the price for each of those (product_id, change_date) pairs
select product_id,new_price
from Products where (product_id,change_date) in
(select product_id,max(change_date) dt
from Products where change_date<='2019-08-16'
group by product_id)
;
4. Left join the list of all products to that result; products with no match get the default price of 10
select u.product_id product_id,
case when new_price is null then 10 else new_price end as price
from
(select distinct product_id from Products) u
left join
(select product_id,new_price
from Products where (product_id,change_date) in
(select product_id,max(change_date) dt
from Products where change_date<='2019-08-16'
group by product_id)) p
on
u.product_id=p.product_id
;
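The four steps can be run end to end on the sample rows. The following is a sketch under assumptions: it uses SQLite from Python rather than MySQL, stores dates as ISO text (which compare correctly as strings), and the helper name prices_on_2019_08_16 is hypothetical.

```python
import sqlite3

# Sample rows from the Products example above.
PRODUCTS = [
    (1, 20, "2019-08-14"), (2, 50, "2019-08-14"), (1, 30, "2019-08-15"),
    (1, 35, "2019-08-16"), (2, 65, "2019-08-17"), (3, 20, "2019-08-18"),
]

QUERY = """
SELECT u.product_id,
       CASE WHEN p.new_price IS NULL THEN 10 ELSE p.new_price END AS price
FROM (SELECT DISTINCT product_id FROM Products) AS u
LEFT JOIN (
    SELECT product_id, new_price
    FROM Products
    WHERE (product_id, change_date) IN (
        -- latest change on or before the target date, per product
        SELECT product_id, MAX(change_date)
        FROM Products
        WHERE change_date <= '2019-08-16'
        GROUP BY product_id
    )
) AS p ON u.product_id = p.product_id
ORDER BY u.product_id
"""


def prices_on_2019_08_16(rows):
    """Return (product_id, price) tuples as of 2019-08-16."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Products (product_id INT, new_price INT, change_date TEXT)"
    )
    con.executemany("INSERT INTO Products VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(prices_on_2019_08_16(PRODUCTS))
```

Product 3 only changes price after the target date, so the left join finds no match and the CASE expression supplies the default of 10.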
Table: Delivery
+-----------------------------+---------+
| Column Name | Type |
+-----------------------------+---------+
| delivery_id | int |
| customer_id | int |
| order_date | date |
| customer_pref_delivery_date | date |
+-----------------------------+---------+
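delivery_id is the primary key of this table.
This table holds customers' food delivery information. A customer places an order on some date and specifies a preferred delivery date (the same as the order date or later).
If the preferred delivery date is the same as the order date, the order is called an immediate order; otherwise it is called a scheduled order.
Write a SQL query to find the percentage of immediate orders, rounded to 2 decimal places.
The query result is as follows:
Delivery table: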
+-------------+-------------+------------+-----------------------------+
| delivery_id | customer_id | order_date | customer_pref_delivery_date |
+-------------+-------------+------------+-----------------------------+
| 1 | 1 | 2019-08-01 | 2019-08-02 |
| 2 | 5 | 2019-08-02 | 2019-08-02 |
| 3 | 1 | 2019-08-11 | 2019-08-11 |
| 4 | 3 | 2019-08-24 | 2019-08-26 |
| 5 | 4 | 2019-08-21 | 2019-08-22 |
| 6 | 2 | 2019-08-11 | 2019-08-13 |
+-------------+-------------+------------+-----------------------------+
Result table:
+----------------------+
| immediate_percentage |
+----------------------+
| 33.33 |
+----------------------+
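Orders 2 and 3 are immediate orders; the others are scheduled orders.
Method 1: compute directly
Count the immediate orders:
select count(*) from Delivery where order_date=customer_pref_delivery_date;
Count all orders:
select count(delivery_id) from Delivery;
Compute the percentage: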
select round((select count(*) from Delivery where order_date=customer_pref_delivery_date)/(select count(delivery_id) from Delivery)*100,2) as immediate_percentage;
Method 2: SUM over a CASE WHEN ... THEN ... ELSE ... END expression
select round(sum(case when order_date=customer_pref_delivery_date then 1 else 0 end)/count(*)*100,2) as immediate_percentage
from Delivery;
Method 3: SUM over the comparison itself (in MySQL a comparison evaluates to 1 or 0)
select round(sum(order_date=customer_pref_delivery_date)/count(*)*100,2) as immediate_percentage
from Delivery;
Table: Delivery
+-----------------------------+---------+
| Column Name | Type |
+-----------------------------+---------+
| delivery_id | int |
| customer_id | int |
| order_date | date |
| customer_pref_delivery_date | date |
+-----------------------------+---------+
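delivery_id is the primary key of this table.
This table holds customers' food delivery information. A customer places an order on some date and specifies a preferred delivery date (the same as the order date or later).
If the preferred delivery date is the same as the order date, the order is called an immediate order; otherwise it is called a scheduled order.
The first order of a customer is the order with the earliest order date that the customer made. It is guaranteed that a customer has exactly one first order.
Write a SQL query to find the percentage of immediate orders among the first orders of all customers, rounded to 2 decimal places.
The query result is as follows:
Delivery table: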
+-------------+-------------+------------+-----------------------------+
| delivery_id | customer_id | order_date | customer_pref_delivery_date |
+-------------+-------------+------------+-----------------------------+
| 1 | 1 | 2019-08-01 | 2019-08-02 |
| 2 | 2 | 2019-08-02 | 2019-08-02 |
| 3 | 1 | 2019-08-11 | 2019-08-12 |
| 4 | 3 | 2019-08-24 | 2019-08-24 |
| 5 | 3 | 2019-08-21 | 2019-08-22 |
| 6 | 2 | 2019-08-11 | 2019-08-13 |
| 7 | 4 | 2019-08-09 | 2019-08-09 |
+-------------+-------------+------------+-----------------------------+
Result table:
+----------------------+
| immediate_percentage |
+----------------------+
| 50.00 |
+----------------------+
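Customer 1's first order is order 1, which is a scheduled order.
Customer 2's first order is order 2, which is an immediate order.
Customer 3's first order is order 5, which is a scheduled order.
Customer 4's first order is order 7, which is an immediate order.
Therefore, half of the customers' first orders are immediate.
Find each customer's first order date: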
select customer_id,min(order_date) dt from Delivery group by customer_id
Fetch the rows for those first orders:
select customer_id,order_date,customer_pref_delivery_date from Delivery
where (customer_id,order_date) in (select customer_id,min(order_date) dt from Delivery group by customer_id)
Compute the percentage:
select round(sum(order_date=customer_pref_delivery_date)/count(*)*100,2) as immediate_percentage from
(select customer_id,order_date,customer_pref_delivery_date from Delivery
where (customer_id,order_date) in (select customer_id,min(order_date) dt from Delivery group by customer_id)) t
Table: Department
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| revenue | int |
| month | varchar |
+---------------+---------+
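(id, month) is the composite primary key of this table.
This table holds each department's revenue for each month.
The month column takes values from ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"].
Write a SQL query to reformat the table so that the result has a department id column and a revenue column for each month.
The query result format is shown in the following example:
Department table: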
+------+---------+-------+
| id | revenue | month |
+------+---------+-------+
| 1 | 8000 | Jan |
| 2 | 9000 | Jan |
| 3 | 10000 | Feb |
| 1 | 7000 | Feb |
| 1 | 6000 | Mar |
+------+---------+-------+
Result table:
+------+-------------+-------------+-------------+-----+-------------+
| id | Jan_Revenue | Feb_Revenue | Mar_Revenue | ... | Dec_Revenue |
+------+-------------+-------------+-------------+-----+-------------+
| 1 | 8000 | 7000 | 6000 | ... | null |
| 2 | 9000 | null | null | ... | null |
| 3 | null | 10000 | null | ... | null |
+------+-------------+-------------+-------------+-----+-------------+
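Note that the result table has 13 columns (1 department id column + 12 monthly revenue columns).
Group by department id and turn each month into its own column with a conditional sum: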
select id,
sum(case `month` when 'Jan' then revenue end) as Jan_Revenue,
sum(case `month` when 'Feb' then revenue end) as Feb_Revenue,
sum(case `month` when 'Mar' then revenue end) as Mar_Revenue,
sum(case `month` when 'Apr' then revenue end) as Apr_Revenue,
sum(case `month` when 'May' then revenue end) as May_Revenue,
sum(case `month` when 'Jun' then revenue end) as Jun_Revenue,
sum(case `month` when 'Jul' then revenue end) as Jul_Revenue,
sum(case `month` when 'Aug' then revenue end) as Aug_Revenue,
sum(case `month` when 'Sep' then revenue end) as Sep_Revenue,
sum(case `month` when 'Oct' then revenue end) as Oct_Revenue,
sum(case `month` when 'Nov' then revenue end) as Nov_Revenue,
sum(case `month` when 'Dec' then revenue end) as Dec_Revenue
from Department
group by id
;
Table: Transactions
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| country | varchar |
| state | enum |
| amount | int |
| trans_date | date |
+---------------+---------+
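id is the primary key of this table.
This table holds information about incoming transactions.
The state column is an ENUM with the values ["approved", "declined"].
Write a SQL query to find, for each month and country, the number of transactions and their total amount, and the number of approved transactions and their total amount.
The query result format is as follows: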
Transactions table:
+------+---------+----------+--------+------------+
| id | country | state | amount | trans_date |
+------+---------+----------+--------+------------+
| 121 | US | approved | 1000 | 2018-12-18 |
| 122 | US | declined | 2000 | 2018-12-19 |
| 123 | US | approved | 2000 | 2019-01-01 |
| 124 | DE | approved | 2000 | 2019-01-07 |
+------+---------+----------+--------+------------+
Result table:
+----------+---------+-------------+----------------+--------------------+-----------------------+
| month | country | trans_count | approved_count | trans_total_amount | approved_total_amount |
+----------+---------+-------------+----------------+--------------------+-----------------------+
| 2018-12 | US | 2 | 1 | 3000 | 1000 |
| 2019-01 | US | 1 | 1 | 2000 | 2000 |
| 2019-01 | DE | 1 | 1 | 2000 | 2000 |
+----------+---------+-------------+----------------+--------------------+-----------------------+
Group by the formatted month and the country:
select date_format(trans_date,'%Y-%m') month,country,
count(*) trans_count,sum(if(state='approved',1,0)) approved_count,
sum(amount) trans_total_amount,sum(if(state='approved',amount,0)) approved_total_amount
from Transactions
group by date_format(trans_date,'%Y-%m'),country
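The total amount has to come from SUM(amount); COUNT(amount) would only count the rows with a non-null amount. A sketch that runs the aggregation on the sample rows follows. It assumes SQLite from Python instead of MySQL, so date_format becomes strftime and if() becomes CASE, and the helper name monthly_summary is hypothetical.

```python
import sqlite3

# Sample rows from the Transactions example above.
TRANSACTIONS = [
    (121, "US", "approved", 1000, "2018-12-18"),
    (122, "US", "declined", 2000, "2018-12-19"),
    (123, "US", "approved", 2000, "2019-01-01"),
    (124, "DE", "approved", 2000, "2019-01-07"),
]

QUERY = """
SELECT strftime('%Y-%m', trans_date) AS month, country,
       COUNT(*) AS trans_count,
       SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
       SUM(amount) AS trans_total_amount,
       SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
FROM Transactions
GROUP BY strftime('%Y-%m', trans_date), country
ORDER BY 1, 2
"""


def monthly_summary(rows):
    """Return one summary tuple per (month, country)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Transactions (id INT, country TEXT, state TEXT,"
        " amount INT, trans_date TEXT)"
    )
    con.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in monthly_summary(TRANSACTIONS):
    print(row)
```

For 2018-12 in the US the total is 1000 + 2000 = 3000, whereas counting the amounts would report 2.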
Table: Players
+-------------+-------+
| Column Name | Type |
+-------------+-------+
| player_id | int |
| group_id | int |
+-------------+-------+
player_id is the primary key of this table.
Each row of this table gives the group of a player.
Table: Matches
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| match_id | int |
| first_player | int |
| second_player | int |
| first_score | int |
| second_score | int |
+---------------+---------+
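match_id is the primary key of this table.
Each row is the record of one match; first_player and second_player are the ids of the players in that match.
first_score and second_score are the scores of first_player and second_player respectively.
You may assume that in each match both players belong to the same group.
The winner of each group is the player with the highest total score within the group. In case of a tie, the player with the lowest player_id wins.
Write a SQL query to find the winner of each group.
The query result format is as follows:
Players table: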
+-----------+------------+
| player_id | group_id |
+-----------+------------+
| 15 | 1 |
| 25 | 1 |
| 30 | 1 |
| 45 | 1 |
| 10 | 2 |
| 35 | 2 |
| 50 | 2 |
| 20 | 3 |
| 40 | 3 |
+-----------+------------+
Matches table:
+------------+--------------+---------------+-------------+--------------+
| match_id | first_player | second_player | first_score | second_score |
+------------+--------------+---------------+-------------+--------------+
| 1 | 15 | 45 | 3 | 0 |
| 2 | 30 | 25 | 1 | 2 |
| 3 | 30 | 15 | 2 | 0 |
| 4 | 40 | 20 | 5 | 2 |
| 5 | 35 | 50 | 1 | 1 |
+------------+--------------+---------------+-------------+--------------+
Result table:
+-----------+------------+
| group_id | player_id |
+-----------+------------+
| 1 | 15 |
| 2 | 35 |
| 3 | 40 |
+-----------+------------+
Collect each player's score from every match:
select first_player pid,first_score sc from Matches union all
select second_player,second_score from Matches
Group by player and sum the scores:
select pid,sum(sc) sc
from (select first_player pid,first_score sc from Matches union all
select second_player,second_score from Matches) t group by pid;
Join that result with the Players table and rank the players within each group:
select group_id,pid,sc,
row_number() over(partition by group_id order by sc desc,pid asc) rk
from (select pid,sum(sc) sc
from (select first_player pid,first_score sc from Matches union all
select second_player,second_score from Matches) t group by pid) m
left join Players p on m.pid=p.player_id
;
Keep the rows ranked first:
select
group_id,
pid player_id
from
(select
group_id,
pid,
sc,
row_number() over(partition by group_id order by sc desc,pid asc) rk
from
(select
pid,
sum(sc) sc
from
(select
first_player pid,
first_score sc
from
Matches
union all
select
second_player,
second_score
from
Matches) t
group by
pid) m
left join
Players p
on
m.pid=p.player_id) t
where
rk=1
;
Table: Queue
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| person_id | int |
| person_name | varchar |
| weight | int |
| turn | int |
+-------------+---------+
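person_id is the primary key of this table.
This table holds information about all the people waiting for an elevator.
The person_id and turn columns contain all the numbers from 1 to n, where n is the number of rows in the table.
The maximum weight the elevator can carry is 1000.
Write a SQL query to find the person_name of the last person who can enter the elevator without exceeding the weight limit. It is guaranteed that the first person in the queue can enter the elevator.
The query result is as follows:
Queue table: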
+-----------+-------------------+--------+------+
| person_id | person_name | weight | turn |
+-----------+-------------------+--------+------+
| 5 | George Washington | 250 | 1 |
| 3 | John Adams | 350 | 2 |
| 6 | Thomas Jefferson | 400 | 3 |
| 2 | Will Johnliams | 200 | 4 |
| 4 | Thomas Jefferson | 175 | 5 |
| 1 | James Elephant | 500 | 6 |
+-----------+-------------------+--------+------+
Result table:
+-------------------+
| person_name |
+-------------------+
| Thomas Jefferson |
+-------------------+
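For simplicity, the Queue table is shown ordered by turn in ascending order.
In the example, George Washington (id 5), John Adams (id 3) and Thomas Jefferson (id 6) can enter the elevator because their weights sum to 250 + 350 + 400 = 1000.
Thomas Jefferson (id 6) is the last person who fits and enters the elevator.
Compute a running total of weight with a window function:
select person_id,person_name,weight,turn,
sum(weight) over(order by turn rows between unbounded preceding and current row) as sum_weight from Queue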
;
Keep the rows whose running total is at most 1000 and take the one with the largest running total:
select person_name
from (select person_id,person_name,weight,turn,
sum(weight) over(order by turn rows between unbounded preceding and current row) as sum_weight from Queue) t where sum_weight<=1000 order by sum_weight desc limit 1
;
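The sort direction decides which person is returned: the last person who fits has the largest running total that is still at most 1000, so the rows must be ordered by the running total in descending order before taking one. A sketch on the sample rows, assuming SQLite from Python instead of MySQL and a hypothetical helper name last_person, which also returns person_id to tell the two Thomas Jeffersons apart:

```python
import sqlite3

# Sample rows from the Queue example above.
QUEUE = [
    (5, "George Washington", 250, 1),
    (3, "John Adams", 350, 2),
    (6, "Thomas Jefferson", 400, 3),
    (2, "Will Johnliams", 200, 4),
    (4, "Thomas Jefferson", 175, 5),
    (1, "James Elephant", 500, 6),
]

QUERY = """
SELECT person_id, person_name
FROM (
    SELECT person_id, person_name,
           SUM(weight) OVER (ORDER BY turn
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS sum_weight
    FROM Queue
) AS t
WHERE sum_weight <= 1000
ORDER BY sum_weight DESC
LIMIT 1
"""


def last_person(rows):
    """Return (person_id, person_name) of the last person who fits."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Queue (person_id INT, person_name TEXT, weight INT, turn INT)"
    )
    con.executemany("INSERT INTO Queue VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result


print(last_person(QUEUE))
```

The running totals are 250, 600, 1000, 1200, 1375 and 1875, so the third person in the queue is the last one within the limit. Sorting in ascending order would return the first person in the queue instead.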
Table: Transactions
+----------------+---------+
| Column Name | Type |
+----------------+---------+
| id | int |
| country | varchar |
| state | enum |
| amount | int |
| trans_date | date |
+----------------+---------+
id is the primary key of this table.
This table holds information about incoming transactions.
The state column is an ENUM with the values [approved, declined].
Table: Chargebacks
+----------------+---------+
| Column Name | Type |
+----------------+---------+
| trans_id | int |
| trans_date | date |
+----------------+---------+
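The Chargebacks table contains basic information about incoming chargebacks for some of the transactions in the Transactions table.
trans_id is a foreign key to the id column of the Transactions table.
Each chargeback corresponds to a transaction made previously, even if that transaction was not approved.
Write a SQL query to find, for each month and country: the number of approved transactions and their total amount, and the number of chargebacks and their total amount.
Note: in your query, show only the given month and country, and ignore rows in which all values are zero.
The query result format is as follows:
Transactions table: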
+-----+---------+----------+--------+------------+
| id | country | state | amount | trans_date |
+-----+---------+----------+--------+------------+
| 101 | US | approved | 1000 | 2019-05-18 |
| 102 | US | declined | 2000 | 2019-05-19 |
| 103 | US | approved | 3000 | 2019-06-10 |
| 104 | US | declined | 4000 | 2019-06-13 |
| 105 | US | approved | 5000 | 2019-06-15 |
+-----+---------+----------+--------+------------+
Chargebacks table:
+----------+------------+
| trans_id | trans_date |
+----------+------------+
| 102 | 2019-05-29 |
| 101 | 2019-06-30 |
| 105 | 2019-09-18 |
+----------+------------+
Result table:
+---------+---------+----------------+-----------------+------------------+-------------------+
| month | country | approved_count | approved_amount | chargeback_count | chargeback_amount |
+---------+---------+----------------+-----------------+------------------+-------------------+
| 2019-05 | US | 1 | 1000 | 1 | 2000 |
| 2019-06 | US | 2 | 8000 | 1 | 1000 |
| 2019-09 | US | 0 | 0 | 1 | 5000 |
+---------+---------+----------------+-----------------+------------------+-------------------+
with base as(
select state,country,date_format(trans_date,'%Y-%m') month,amount
from Transactions where state='approved'
union all
select 'chargeback' state,t.country,date_format(c.trans_date,'%Y-%m') month,t.amount
from Chargebacks c join Transactions t
on c.trans_id=t.id)
select month,country,
sum(if(state='approved',1,0)) approved_count,
sum(if(state='approved',amount,0)) approved_amount,
sum(if(state='chargeback',1,0)) chargeback_count,
sum(if(state='chargeback',amount,0)) chargeback_amount
from base
group by month,country
;
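The query above can be checked against the sample data. The following is a minimal sketch that assumes SQLite (through Python's standard sqlite3 module) rather than MySQL, so strftime stands in for date_format and CASE stands in for if(); the table contents are the sample rows from the problem statement, and the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the problem.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (
    id INTEGER PRIMARY KEY, country TEXT, state TEXT,
    amount INTEGER, trans_date TEXT);
CREATE TABLE Chargebacks (trans_id INTEGER, trans_date TEXT);
INSERT INTO Transactions VALUES
    (101, 'US', 'approved', 1000, '2019-05-18'),
    (102, 'US', 'declined', 2000, '2019-05-19'),
    (103, 'US', 'approved', 3000, '2019-06-10'),
    (104, 'US', 'declined', 4000, '2019-06-13'),
    (105, 'US', 'approved', 5000, '2019-06-15');
INSERT INTO Chargebacks VALUES
    (102, '2019-05-29'),
    (101, '2019-06-30'),
    (105, '2019-09-18');
""")

# Same shape as the MySQL solution: union approved rows with chargeback rows,
# then aggregate conditionally per (month, country).
QUERY = """
WITH base AS (
    SELECT state, country, strftime('%Y-%m', trans_date) AS month, amount
    FROM Transactions
    WHERE state = 'approved'
    UNION ALL
    SELECT 'chargeback', t.country, strftime('%Y-%m', c.trans_date), t.amount
    FROM Chargebacks c
    JOIN Transactions t ON c.trans_id = t.id
)
SELECT month, country,
       SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END)        AS approved_count,
       SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END)   AS approved_amount,
       SUM(CASE WHEN state = 'chargeback' THEN 1 ELSE 0 END)      AS chargeback_count,
       SUM(CASE WHEN state = 'chargeback' THEN amount ELSE 0 END) AS chargeback_amount
FROM base
GROUP BY month, country
ORDER BY month, country
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because base only contains approved transactions and chargebacks, every group has at least one non-zero count, which is how the all-zero rows are excluded without an explicit filter.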
Table: Queries
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| query_name | varchar |
| result | varchar |
| position | int |
| rating | int |
+-------------+---------+
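This table has no primary key and may contain duplicate rows.
This table contains information collected from queries on a database.
The position column has values from 1 to 500.
The rating column has values from 1 to 5. A query with a rating less than 3 is considered a poor query.
Query quality (quality) is defined as:
The average of the ratio between each query result's rating and its position.
Poor query percentage (poor_query_percentage) is defined as:
The percentage of all query results that have a rating less than 3.
Write an SQL query to find each query_name, its quality, and its poor_query_percentage.
Both quality and poor_query_percentage should be rounded to two decimal places.
The query result format is shown below: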
Queries table:
+------------+-------------------+----------+--------+
| query_name | result | position | rating |
+------------+-------------------+----------+--------+
| Dog | Golden Retriever | 1 | 5 |
| Dog | German Shepherd | 2 | 5 |
| Dog | Mule | 200 | 1 |
| Cat | Shirazi | 5 | 2 |
| Cat | Siamese | 3 | 3 |
| Cat | Sphynx | 7 | 4 |
+------------+-------------------+----------+--------+
Result table:
+------------+---------+-----------------------+
| query_name | quality | poor_query_percentage |
+------------+---------+-----------------------+
| Dog | 2.50 | 33.33 |
| Cat | 0.66 | 33.33 |
+------------+---------+-----------------------+
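The quality of the Dog queries is ((5 / 1) + (5 / 2) + (1 / 200)) / 3 = 2.50
The poor query percentage of the Dog queries is (1 / 3) * 100 = 33.33
The quality of the Cat queries is ((2 / 5) + (3 / 3) + (4 / 7)) / 3 = 0.66
The poor query percentage of the Cat queries is (1 / 3) * 100 = 33.33
Method 1:
Group by query_name and compute both values with sum and count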
select
query_name,
round(sum(rating/position)/count(*),2) quality,
round(sum(if(rating<3,1,0))/count(*)*100,2) poor_query_percentage
from Queries
group by query_name
;
Method 2:
Same grouping, using AVG for the quality
SELECT
query_name,
ROUND(AVG(rating/position), 2) quality,
ROUND(SUM(IF(rating < 3, 1, 0)) * 100 / COUNT(*), 2) poor_query_percentage
FROM Queries
GROUP BY query_name
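Both methods can be checked against the sample data. The sketch below assumes SQLite through Python's sqlite3 module instead of MySQL. One difference matters here: in MySQL the / operator on two integers returns a decimal, while SQLite truncates integer division, so the sketch multiplies by 1.0 (and by 100.0) to force floating-point arithmetic. The ORDER BY is added only for a deterministic output order.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the problem.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Queries (query_name TEXT, result TEXT, position INTEGER, rating INTEGER);
INSERT INTO Queries VALUES
    ('Dog', 'Golden Retriever', 1,   5),
    ('Dog', 'German Shepherd',  2,   5),
    ('Dog', 'Mule',             200, 1),
    ('Cat', 'Shirazi',          5,   2),
    ('Cat', 'Siamese',          3,   3),
    ('Cat', 'Sphynx',           7,   4);
""")

# Method 2 adapted to SQLite: 1.0 and 100.0 avoid integer division.
QUERY = """
SELECT query_name,
       ROUND(AVG(rating * 1.0 / position), 2) AS quality,
       ROUND(SUM(CASE WHEN rating < 3 THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2)
           AS poor_query_percentage
FROM Queries
GROUP BY query_name
ORDER BY query_name
"""

rows = conn.execute(QUERY).fetchall()
for name, quality, poor_pct in rows:
    # Format to two decimals so 2.5 is shown as 2.50, as in the Result table.
    print(f"{name}: quality={quality:.2f}, poor_query_percentage={poor_pct:.2f}")
```

AVG(x) and SUM(x)/COUNT(*) agree here because rating and position are never null in the sample; if the ratio could be null, AVG would skip those rows while COUNT(*) would still count them.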
Table: Teams
+---------------+----------+
| Column Name | Type |
+---------------+----------+
| team_id | int |
| team_name | varchar |
+---------------+----------+
The primary key of this table is team_id. Each row represents a single football team.
Table: Matches
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| match_id | int |
| host_team | int |
| guest_team | int |
| host_goals | int |
| guest_goals | int |
+---------------+---------+
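The primary key of this table is match_id. Each row represents a finished match; the host and guest teams are identified by their ids, and their goals are given by host_goals and guest_goals respectively.
The scoring rules are as follows:
A team receives 3 points for a win (it scores more goals than the opponent).
A team receives 1 point for a draw (both teams score the same number of goals).
A team receives 0 points for a loss (it scores fewer goals than the opponent).
Write an SQL query that returns the team_id, team_name and num_points of each team. Order the result by num_points in descending order; if two teams have the same number of points, order them by team_id in ascending order.
The query result format is shown below: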
Teams table:
+-----------+--------------+
| team_id | team_name |
+-----------+--------------+
| 10 | Leetcode FC |
| 20 | NewYork FC |
| 30 | Atlanta FC |
| 40 | Chicago FC |
| 50 | Toronto FC |
+-----------+--------------+
Matches table:
+------------+--------------+---------------+-------------+--------------+
| match_id | host_team | guest_team | host_goals | guest_goals |
+------------+--------------+---------------+-------------+--------------+
| 1 | 10 | 20 | 3 | 0 |
| 2 | 30 | 10 | 2 | 2 |
| 3 | 10 | 50 | 5 | 1 |
| 4 | 20 | 30 | 1 | 0 |
| 5 | 50 | 30 | 1 | 0 |
+------------+--------------+---------------+-------------+--------------+
Result table:
+------------+--------------+---------------+
| team_id | team_name | num_points |
+------------+--------------+---------------+
| 10 | Leetcode FC | 7 |
| 20 | NewYork FC | 3 |
| 50 | Toronto FC | 3 |
| 30 | Atlanta FC | 1 |
| 40 | Chicago FC | 0 |
+------------+--------------+---------------+
-- One row per team per match: host rows first, then guest rows, each with the points earned.
with base as(
select host_team,
case when host_goals>guest_goals then 3
when host_goals=guest_goals then 1
else 0 end as score
from Matches
union all
select guest_team,
case when host_goals<guest_goals then 3
when host_goals=guest_goals then 1
else 0 end as score
from Matches
)
select team_id,team_name,ifnull(score,0) num_points
from Teams t left join(
select host_team,sum(score) score
from base
group by host_team) m
on t.team_id=m.host_team
order by num_points desc,team_id asc
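The result can be checked against the sample data. The sketch below assumes SQLite through Python's sqlite3 module instead of MySQL, and it is a small variation on the query above rather than a copy: it left-joins Teams directly to the per-match rows and aggregates afterwards, using COALESCE so that a team with no matches gets 0 points. The column alias team_id inside base is introduced here for readability.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the problem.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Teams (team_id INTEGER PRIMARY KEY, team_name TEXT);
CREATE TABLE Matches (
    match_id INTEGER PRIMARY KEY, host_team INTEGER, guest_team INTEGER,
    host_goals INTEGER, guest_goals INTEGER);
INSERT INTO Teams VALUES
    (10, 'Leetcode FC'), (20, 'NewYork FC'), (30, 'Atlanta FC'),
    (40, 'Chicago FC'),  (50, 'Toronto FC');
INSERT INTO Matches VALUES
    (1, 10, 20, 3, 0),
    (2, 30, 10, 2, 2),
    (3, 10, 50, 5, 1),
    (4, 20, 30, 1, 0),
    (5, 50, 30, 1, 0);
""")

# base holds one row per team per match with the points that team earned.
QUERY = """
WITH base AS (
    SELECT host_team AS team_id,
           CASE WHEN host_goals > guest_goals THEN 3
                WHEN host_goals = guest_goals THEN 1
                ELSE 0 END AS score
    FROM Matches
    UNION ALL
    SELECT guest_team,
           CASE WHEN guest_goals > host_goals THEN 3
                WHEN guest_goals = host_goals THEN 1
                ELSE 0 END
    FROM Matches
)
SELECT t.team_id, t.team_name, COALESCE(SUM(b.score), 0) AS num_points
FROM Teams t
LEFT JOIN base b ON t.team_id = b.team_id
GROUP BY t.team_id, t.team_name
ORDER BY num_points DESC, t.team_id ASC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN is what keeps Chicago FC (team 40) in the output: it has no rows in base, so SUM returns null and COALESCE turns it into 0.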