Table: Users (user information)
+----------------+---------+
| Column Name | Type |
+----------------+---------+
| user_id | int |
| join_date | date |
| favorite_brand | varchar |
+----------------+---------+
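The primary key of this table is user_id. The table describes the users of a shopping website, where users can both buy and sell items.
Table: Orders (order information)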
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| order_id | int |
| order_date | date |
| item_id | int |
| buyer_id | int |
| seller_id | int |
+---------------+---------+
The primary key of this table is order_id. item_id and (buyer_id, seller_id) are foreign keys referencing the Items and Users tables.
Table: Items (item information)
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| item_id | int |
| item_brand | varchar |
+---------------+---------+
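The primary key of this table is item_id.
1. Write a SQL query that returns, for each user, the join date and the number of orders they placed as a buyer in 2019.
Example result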
+-----------+------------+----------------+
| buyer_id | join_date | orders_in_2019 |
+-----------+------------+----------------+
| 1 | 2018-01-01 | 1 |
| 2 | 2018-02-09 | 2 |
| 3 | 2018-01-19 | 0 |
| 4 | 2018-05-21 | 0 |
+-----------+------------+----------------+
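Approach:
This problem invites the classic mistake with the group by + count combination. On a first attempt it is easy to write the following query: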
select u.user_id,u.join_date,count(order_id) as orders_in_2019
from Orders o left join Users u
on o.buyer_id = u.user_id
where year(order_date) = 2019
group by u.user_id
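The problem is that users with no orders in 2019 should be reported with a count of 0, but the query above does not return them at all; they are simply filtered out.
The reason is that the clauses of a SQL statement are logically evaluated in the following order:
from > where > group by > having > select > distinct > order by > limit
where runs before group by, so every row that belongs to a user without 2019 orders is discarded before the groups are formed. (Because Orders is the left table of the join, a user who never placed any order is missing from the joined rows in the first place.)
The correct answer is below: left join the Users table to the query above, so that users without 2019 orders get null, then turn the null into 0 with ifnull.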
select a.user_id as buyer_id,a.join_date,ifnull(b.orders_in_2019,0) as orders_in_2019
from Users a left join (
select u.user_id,u.join_date,count(order_id) as orders_in_2019
from Orders o left join Users u
on o.buyer_id = u.user_id
where year(order_date) = 2019
group by u.user_id
)b on a.user_id = b.user_id
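As a quick check of the reasoning above, here is a minimal sketch that runs the naive query and one alternative fix against an in-memory SQLite database from Python. Everything in it is an assumption made for illustration: the sample rows are hypothetical (chosen so that the output matches the example result), SQLite stands in for MySQL, and strftime('%Y', ...) replaces year() because SQLite has no year() function. The alternative moves the 2019 condition from where into the on clause of a left join that starts from Users. That is a different technique from the ifnull answer above, but it should return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Users (user_id integer primary key, join_date text, favorite_brand text);
create table Orders (order_id integer primary key, order_date text,
                     item_id integer, buyer_id integer, seller_id integer);
-- hypothetical sample rows, not taken from the problem statement
insert into Users values
    (1, '2018-01-01', 'Lenovo'), (2, '2018-02-09', 'Samsung'),
    (3, '2018-01-19', 'LG'),     (4, '2018-05-21', 'HP');
insert into Orders values
    (1, '2019-08-01', 4, 1, 2), (2, '2018-08-02', 2, 1, 3),
    (3, '2019-08-03', 3, 2, 3), (4, '2018-08-04', 1, 4, 2),
    (5, '2018-08-04', 1, 3, 4), (6, '2019-08-05', 2, 2, 4);
""")

# Naive version: the where clause removes non-2019 rows before grouping,
# so buyers without a 2019 order never reach group by.
naive_sql = """
select u.user_id, u.join_date, count(o.order_id) as orders_in_2019
from Orders o left join Users u on o.buyer_id = u.user_id
where strftime('%Y', o.order_date) = '2019'
group by u.user_id, u.join_date
order by u.user_id
"""

# Alternative fix: start from Users and put the year filter in the on clause,
# so unmatched users survive the join with a null order_id (count gives 0).
fixed_sql = """
select u.user_id as buyer_id, u.join_date, count(o.order_id) as orders_in_2019
from Users u left join Orders o
  on o.buyer_id = u.user_id and strftime('%Y', o.order_date) = '2019'
group by u.user_id, u.join_date
order by u.user_id
"""

naive_rows = conn.execute(naive_sql).fetchall()
fixed_rows = conn.execute(fixed_sql).fetchall()
print("naive:", naive_rows)
print("fixed:", fixed_rows)
```

With these rows the naive query returns only buyers 1 and 2, while the second query also lists buyers 3 and 4 with a count of 0.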
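2. Write a SQL query to determine, for each user, whether the brand of the second item they sold (ordered by date) is their favorite brand. If a user sold fewer than two items, the result for that user is no.
Example: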
Users table:
+---------+------------+----------------+
| user_id | join_date | favorite_brand |
+---------+------------+----------------+
| 1 | 2019-01-01 | Lenovo |
| 2 | 2019-02-09 | Samsung |
| 3 | 2019-01-19 | LG |
| 4 | 2019-05-21 | HP |
+---------+------------+----------------+
Orders table:
+----------+------------+---------+----------+-----------+
| order_id | order_date | item_id | buyer_id | seller_id |
+----------+------------+---------+----------+-----------+
| 1 | 2019-08-01 | 4 | 1 | 2 |
| 2 | 2019-08-02 | 2 | 1 | 3 |
| 3 | 2019-08-03 | 3 | 2 | 3 |
| 4 | 2019-08-04 | 1 | 4 | 2 |
| 5 | 2019-08-04 | 1 | 3 | 4 |
| 6 | 2019-08-05 | 2 | 2 | 4 |
+----------+------------+---------+----------+-----------+
Items table:
+---------+------------+
| item_id | item_brand |
+---------+------------+
| 1 | Samsung |
| 2 | Lenovo |
| 3 | LG |
| 4 | HP |
+---------+------------+
Result table:
+-----------+--------------------+
| seller_id | 2nd_item_fav_brand |
+-----------+--------------------+
| 1 | no |
| 2 | yes |
| 3 | yes |
| 4 | no |
+-----------+--------------------+
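The result for the user with id 1 is no because they sold nothing.
The result for the users with id 2 and 3 is yes because the brand of the second item they sold is their favorite brand.
The result for the user with id 4 is no because the brand of the second item they sold is not their favorite brand.
Approach:
(1) Use a window function to number each seller's sales in date order, and keep the rows with rank_ = 2.
(2) Left join the result of step 1 to the Users table. A user who sold fewer than two items has no seller_id row in step 1, so the join yields null for that user.
(3) Use case when to make the comparison; a null item_brand falls through to the else branch and gives no.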
select user_id as "seller_id",
(case when item_brand=favorite_brand then 'yes' else 'no' end) "2nd_item_fav_brand"
from Users u left join(
select t.*
from
(
select o.seller_id,i.item_brand,
row_number() over(partition by seller_id order by order_date) rank_
from Orders o left join Items i
on o.item_id = i.item_id
) t
where rank_=2
) a on u.user_id = a.seller_id
order by user_id ;
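The three steps above can be checked against the sample tables from the problem statement. The following is a minimal sketch, assuming an in-memory SQLite database (3.25 or newer, for window functions) as a stand-in for MySQL. One detail is an addition that is not in the query above: order_id is used as a tie-breaker inside the window, so that the numbering stays deterministic if a seller has two sales on the same date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Users (user_id integer primary key, join_date text, favorite_brand text);
create table Orders (order_id integer primary key, order_date text,
                     item_id integer, buyer_id integer, seller_id integer);
create table Items (item_id integer primary key, item_brand text);
insert into Users values
    (1, '2019-01-01', 'Lenovo'), (2, '2019-02-09', 'Samsung'),
    (3, '2019-01-19', 'LG'),     (4, '2019-05-21', 'HP');
insert into Orders values
    (1, '2019-08-01', 4, 1, 2), (2, '2019-08-02', 2, 1, 3),
    (3, '2019-08-03', 3, 2, 3), (4, '2019-08-04', 1, 4, 2),
    (5, '2019-08-04', 1, 3, 4), (6, '2019-08-05', 2, 2, 4);
insert into Items values (1, 'Samsung'), (2, 'Lenovo'), (3, 'LG'), (4, 'HP');
""")

second_sale_sql = """
select u.user_id as seller_id,
       case when a.item_brand = u.favorite_brand then 'yes' else 'no' end
           as "2nd_item_fav_brand"
from Users u left join (
    select t.seller_id, t.item_brand
    from (
        select o.seller_id, i.item_brand,
               -- order_id tie-breaker is an assumption added for determinism
               row_number() over (partition by o.seller_id
                                  order by o.order_date, o.order_id) as rank_
        from Orders o left join Items i on o.item_id = i.item_id
    ) t
    where t.rank_ = 2
) a on u.user_id = a.seller_id
order by u.user_id
"""

second_sale_rows = conn.execute(second_sale_sql).fetchall()
for row in second_sale_rows:
    print(row)
```

User 1 has no row in the subquery, so item_brand is null, the comparison is not true, and the else branch returns no.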
1. Restaurant revenue analysis
Table: Customer
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| customer_id | int |
| name | varchar |
| visited_on | date |
| amount | int |
+---------------+---------+
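(customer_id, visited_on) is the primary key of this table.
The table contains customer transaction data for a restaurant.
visited_on is the date on which the customer with customer_id visited the restaurant.
amount is the total that the customer paid on that day.
You are the restaurant owner and want to analyze how revenue is changing (there is at least one customer every day).
Write a SQL query that computes the average customer spending over a 7-day window (the given date plus the 6 days before it).
An example of the query result format is shown below.
The result is ordered by visited_on.
average_amount must be rounded to two decimal places, and dates are formatted as 'YYYY-MM-DD'.
Customer table: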
+-------------+--------------+--------------+-------------+
| customer_id | name | visited_on | amount |
+-------------+--------------+--------------+-------------+
| 1 | Jhon | 2019-01-01 | 100 |
| 2 | Daniel | 2019-01-02 | 110 |
| 3 | Jade | 2019-01-03 | 120 |
| 4 | Khaled | 2019-01-04 | 130 |
| 5 | Winston | 2019-01-05 | 110 |
| 6 | Elvis | 2019-01-06 | 140 |
| 7 | Anna | 2019-01-07 | 150 |
| 8 | Maria | 2019-01-08 | 80 |
| 9 | Jaze | 2019-01-09 | 110 |
| 1 | Jhon | 2019-01-10 | 130 |
| 3 | Jade | 2019-01-10 | 150 |
+-------------+--------------+--------------+-------------+
Result table:
+--------------+--------------+----------------+
| visited_on | amount | average_amount |
+--------------+--------------+----------------+
| 2019-01-07 | 860 | 122.86 |
| 2019-01-08 | 840 | 120 |
| 2019-01-09 | 840 | 120 |
| 2019-01-10 | 1000 | 142.86 |
+--------------+--------------+----------------+
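The first 7-day average, from 2019-01-01 to 2019-01-07, is (100 + 110 + 120 + 130 + 110 + 140 + 150)/7 = 122.86
The second 7-day average, from 2019-01-02 to 2019-01-08, is (110 + 120 + 130 + 110 + 140 + 150 + 80)/7 = 120
The third 7-day average, from 2019-01-03 to 2019-01-09, is (120 + 130 + 110 + 140 + 150 + 80 + 110)/7 = 120
The fourth 7-day average, from 2019-01-04 to 2019-01-10, is (130 + 110 + 140 + 150 + 80 + 110 + 130 + 150)/7 = 142.86
Key point: several customers may pay on the same day. If the raw table is self-joined, that day's rows are multiplied in the join and its total is counted more than once, so it is best to compute each day's revenue first and only then do the self-join.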
select a.visited_on, sum(b.amount) amount, round(avg(b.amount),2) average_amount
from
(select visited_on, sum(amount) amount from Customer group by visited_on) a
left join
(select visited_on, sum(amount) amount from Customer group by visited_on) b
on datediff(a.visited_on,b.visited_on) between 0 and 6
group by a.visited_on
having count(*) = 7
order by a.visited_on
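To see the effect of the key point on the sample Customer table, the sketch below compares a self-join on the raw rows with a self-join on per-day totals. It is only an illustration under stated assumptions: it runs on an in-memory SQLite database rather than MySQL, and because SQLite has no datediff(), the day difference is computed as julianday(a.visited_on) - julianday(b.visited_on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Customer (customer_id integer, name text, visited_on text, amount integer,
                       primary key (customer_id, visited_on));
insert into Customer values
    (1, 'Jhon',    '2019-01-01', 100), (2, 'Daniel', '2019-01-02', 110),
    (3, 'Jade',    '2019-01-03', 120), (4, 'Khaled', '2019-01-04', 130),
    (5, 'Winston', '2019-01-05', 110), (6, 'Elvis',  '2019-01-06', 140),
    (7, 'Anna',    '2019-01-07', 150), (8, 'Maria',  '2019-01-08', 80),
    (9, 'Jaze',    '2019-01-09', 110), (1, 'Jhon',   '2019-01-10', 130),
    (3, 'Jade',    '2019-01-10', 150);
""")

# Self-join on the raw rows: 2019-01-10 has two customers, so its group holds
# 2 x 8 = 16 joined rows and fails the count(*) = 7 test.
raw_sql = """
select a.visited_on, sum(b.amount) as amount, round(avg(b.amount), 2) as average_amount
from Customer a left join Customer b
  on (julianday(a.visited_on) - julianday(b.visited_on)) between 0 and 6
group by a.visited_on
having count(*) = 7
order by a.visited_on
"""

# Self-join on per-day totals: exactly one row per day on each side.
daily_sql = """
select a.visited_on, sum(b.amount) as amount, round(avg(b.amount), 2) as average_amount
from (select visited_on, sum(amount) as amount from Customer group by visited_on) a
left join
     (select visited_on, sum(amount) as amount from Customer group by visited_on) b
  on (julianday(a.visited_on) - julianday(b.visited_on)) between 0 and 6
group by a.visited_on
having count(*) = 7
order by a.visited_on
"""

raw_rows = conn.execute(raw_sql).fetchall()
daily_rows = conn.execute(daily_sql).fetchall()
print("raw self-join  :", raw_rows)
print("daily self-join:", daily_rows)
```

On this data the raw self-join loses the 2019-01-10 row entirely, while the version built on daily totals returns all four windows from the expected result table.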
Table: Enrollments
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| student_id | int |
| course_id | int |
| grade | int |
+---------------+---------+
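(student_id, course_id) is the primary key of this table.
1. Write a SQL query that finds each student's highest grade and the corresponding course. If several courses tie for the highest grade, take the one with the smallest course_id. The result must be ordered by student_id ascending.
Solution 1: window functions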
select a.student_id,a.course_id,a.grade
from
(
select student_id,course_id,grade,
row_number()over(partition by student_id order by course_id) as course_rank
from
(
select student_id,course_id,grade,
dense_rank()over(partition by student_id order by grade desc) as grade_rank
from Enrollments
) t
where t.grade_rank = 1
) a
where a.course_rank = 1
order by a.student_id
Solution 2: in subquery (simpler and quicker)
select student_id,min(course_id) as course_id,grade
from Enrollments
where (student_id, grade) in (select student_id,max(grade) from Enrollments group by student_id)
group by student_id,grade
order by student_id
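Both solutions can be compared on a small data set. The sketch below is an illustration only: the Enrollments rows are hypothetical, and SQLite (3.25 or newer) stands in for MySQL. It also tries a variant that is not one of the two solutions above: a single row_number() ordered by grade desc, course_id asc, which folds the tie-break into one window instead of two nested ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Enrollments (student_id integer, course_id integer, grade integer,
                          primary key (student_id, course_id));
-- hypothetical rows; student 2 has a tie between courses 2 and 3
insert into Enrollments values
    (2, 2, 95), (2, 3, 95), (1, 1, 90), (1, 2, 99),
    (3, 1, 80), (3, 2, 75), (3, 3, 82);
""")

# Variant (an assumption, not from the text): one window does both jobs,
# highest grade first, then smallest course_id among ties.
single_window_sql = """
select student_id, course_id, grade
from (
    select student_id, course_id, grade,
           row_number() over (partition by student_id
                              order by grade desc, course_id asc) as rn
    from Enrollments
) t
where rn = 1
order by student_id
"""

# Solution 2 from the text: keep rows at each student's max grade,
# then take the smallest course_id per student.
in_sql = """
select student_id, min(course_id) as course_id, grade
from Enrollments
where (student_id, grade) in
      (select student_id, max(grade) from Enrollments group by student_id)
group by student_id, grade
order by student_id
"""

single_window_rows = conn.execute(single_window_sql).fetchall()
in_rows = conn.execute(in_sql).fetchall()
print(single_window_rows)
print(in_rows)
```

On these rows both queries pick course 2 for student 2, which is the smaller course_id of the tied pair.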
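2. For each course, find the students with the highest and the lowest grade, and output courseid, studentid, score
(The author was asked this question in an interview.)
Approach:
(1) Query the highest-scoring and the lowest-scoring students separately.
(2) Combine the two results with union.
Background: union and union all
Both union and union all return the combined rows of two result sets, and the selected columns of the two queries must line up.
The difference is that union checks for duplicate rows and removes them, which costs extra work and makes it slower; any ordering that this produces is an implementation detail, so add order by if a particular order is needed.
union all keeps duplicate rows and does no deduplication.
Note that every select inside a union must return the same number of columns, the columns must have similar data types, and the columns must appear in the same order in every select.
If two corresponding columns have different data types, they can be converted explicitly to a common type with convert(value,type).
-- students with the highest grade in each course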
select e.course_id,e.student_id,e.grade as score
from Enrollments e left join
(
select course_id,max(grade) as max_grade
from Enrollments
group by course_id
) t on e.course_id = t.course_id
where e.grade = t.max_grade
Combine with union:
select e.course_id,e.student_id,e.grade as score
from Enrollments e left join
(
select course_id,max(grade) as max_grade
from Enrollments
group by course_id
) t on e.course_id = t.course_id
where e.grade = t.max_grade
union
select e.course_id,e.student_id,e.grade as score
from Enrollments e left join
(
select course_id,min(grade) as min_grade
from Enrollments
group by course_id
) t on e.course_id = t.course_id
where e.grade = t.min_grade
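The choice between union and union all matters for this query whenever a course has only one student, because that student is both the highest and the lowest scorer. The sketch below shows the difference on hypothetical rows, again using an in-memory SQLite database as a stand-in for MySQL; the helper name per_course is made up for this example, and an inner join is used because the where clause discards unmatched rows anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Enrollments (student_id integer, course_id integer, grade integer,
                          primary key (student_id, course_id));
-- hypothetical rows; course 2 has a single student
insert into Enrollments values (1, 1, 90), (2, 1, 70), (3, 2, 85);
""")


def per_course(agg):
    """Build the query for students at the max or min grade of each course."""
    return f"""
    select e.course_id, e.student_id, e.grade as score
    from Enrollments e join
         (select course_id, {agg}(grade) as g from Enrollments group by course_id) t
      on e.course_id = t.course_id
    where e.grade = t.g
    """


highest, lowest = per_course("max"), per_course("min")

# union removes the duplicate row for course 2; union all keeps it.
# sorted() is used because neither form promises an output order.
union_rows = sorted(conn.execute(f"{highest} union {lowest}").fetchall())
union_all_rows = sorted(conn.execute(f"{highest} union all {lowest}").fetchall())
print("union    :", union_rows)
print("union all:", union_all_rows)
```

Here union returns three rows and union all returns four, with student 3 of course 2 listed twice.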