关系型数据库:Relational database
金老师练习网站:http://www.xuesql.cn/lesson/select_queries_introduction
1.数据库范式(normalization)
2.用JOINs进行多表联合查询
3.用INNER JOIN 连接表的语法
SELECT column, another_table_column, …
FROM mytable (主表)
INNER JOIN another_table (要连接的表)
ON mytable.id = another_table.id (想象一下刚才讲的主键连接,两个相同的连成1条)
WHERE condition(s)
ORDER BY column, … ASC/DESC
LIMIT num_limit OFFSET num_offset;
⚠️ 注意:INNER JOIN 可以简写做 JOIN. 两者是相同的意思,但我们还是会继续写作 INNER JOIN 以便和后面的 LEFT JOIN, RIGHT JOIN等相比较.
1.table1 :MOVIES
1.找到所有电影的国内Domestic_sales和国际销售额
SELECT * FROM movies
INNER JOIN Boxoffice
ON Movies.Id = Boxoffice.Movie_id
2.找到所有国际销售额比国内销售大的电影
SELECT * FROM movies
INNER JOIN Boxoffice
ON Movies.Id = Boxoffice.Movie_id
WHERE International_sales > Domestic_sales
⚠️ 注意:最后一句直接用了column名称,前面没有+table.column
3.找出所有电影按市场占有率rating倒序排列
SELECT * FROM movies
INNER JOIN Boxoffice
ON Movies.Id = Boxoffice.Movie_id
ORDER BY Rating
⚠️ 注意:默认升序,无ASC也可
4.每部电影按国际销售额比较,排名最靠前的导演是谁,线上销量多少
SELECT Director,International_sales FROM movies
INNER JOIN Boxoffice
ON Movies.Id = Boxoffice.Movie_id
ORDER BY International_sales DESC
LIMIT 1
用LEFT/RIGHT/FULL JOINs 做多表查询
SELECT column, another_column, …
FROM mytable
INNER/LEFT/RIGHT/FULL JOIN another_table
ON mytable.id = another_table.matching_id
WHERE condition(s)
ORDER BY column, … ASC/DESC
LIMIT num_limit OFFSET num_offset;
⚠️ 注意: A 连接 B, LEFT JOIN保留A的所有行,不管有没有能匹配上B ;
⚠️ 注意: 反过来 RIGHT JOIN则保留所有B里的行;
⚠️ 注意:最后FULL JOIN 不管有没有匹配上,同时保留A和B里的所有行 。
LEFT OUTER JOIN = LEFT JOIN; FULL OUTER JOIN = FULL JOIN
1.找到所有有雇员的办公室(buildings)名字
SELECT Distinct Building FROM employees
LEFT JOIN Buildings
ON Employees.Building = Buildings.Building_name
WHERE Building IS NOT NULL;
⚠️ 注意: WHERE Building IS NOT NULL
左表中含有 null, 直接输出会显示 null
+ where column is not null
2.找到所有办公室和他们的最大容量
SELECT * FROM Buildings
RIGHT JOIN Employees
ON Buildings.Building_name = Employees.Building
⚠️ 注意: 输出空白, 右边按照关键词Building无法匹配左边,一对多
RIGHT ➡️ LEFT
SELECT * FROM Buildings
Left JOIN Employees
ON Buildings.Building_name = Employees.Building
⚠️ 注意: 最后两行连接左表为空
⚠️ 注意:
⚠️ 注意:
3.找到所有办公室里的所有角色(包含没有雇员的),并做唯一输出(DISTINCT)
SELECT Distinct Building_name,Role FROM employees
LEFT JOIN Buildings
ON Employees.Building = Buildings.Building_name;
SELECT DISTINCT Building_name, Role
FROM Buildings
LEFT JOIN Employees
ON Buildings.Building_name = Employees.Building;
4.找到所有有雇员的办公室(buildings)和对应的容量
SELECT Name,Building ,Capacity FROM employees
LEFT JOIN Buildings
ON Employees.Building = Buildings.Building_name
where Building is not null
+ distinct
SELECT distinct Building ,Capacity FROM employees
LEFT JOIN Buildings
ON Employees.Building = Buildings.Building_name
where Building is not null
在查询条件中处理 NULL
SELECT column, another_column, …
FROM mytable
WHERE column IS/IS NOT NULL
AND/OR another_condition
AND/OR …;
2.找到还没有雇员的办公室
SELECT * FROM Buildings
LEFT JOIN Employees
ON Building_name = Building
+ WHERE Role IS NULL
SELECT Building_name FROM Buildings
LEFT JOIN Employees
ON Building_name = Building
WHERE Role IS NULL;
1.包含表达式的例子
SELECT particle_speed / 2.0 AS half_particle_speed (对结果做了一个除2)
FROM physics_data
WHERE ABS(particle_position) * 10.0 >500
(条件要求这个属性绝对值乘以10大于500);
2.AS使用别名
SELECT col_expression AS expr_description, …
FROM mytable;
3.属性列和表取别名的例子
SELECT column AS better_column_name, …
FROM a_long_widgets_table_name AS mywidgets
INNER JOIN widget_sales
ON mywidgets.id = widget_sales.widget_id;
1.列出所有的电影ID,名字和销售总额(以百万美元为单位计算)
SELECT ID ,Title,(Domestic_sales+International_sales)/1000000 AS TOTAL_SALES
FROM movies
JOIN Boxoffice
ON Id=Movie_id
2.列出所有的电影ID,名字和市场指数(Rating的10倍为市场指数)
SELECT ID ,Title,Rating*10 AS Indx FROM movies
JOIN Boxoffice
ON Id=Movie_id
3.列出所有偶数年份的电影,需要电影ID,名字和年份
SELECT Id,Title,Year
FROM Movies
WHERE Year % 2 = 0
⚠️ 注意: WHERE Year%2=0 表示偶数
4.John Lasseter导演的每部电影每分钟值多少钱,告诉我最高的3个电影名和价值就可以
SELECT Title,(Domestic_sales+International_sales)/Length_minutes AS Minute_value
FROM Movies
JOIN Boxoffice
ON Id=Movie_id
WHERE Director='John Lasseter'
ORDER BY Minute_value DESC
LIMIT 3
SELECT Title,Length(Title),(Domestic_sales+International_sales) AS TOTAL_SALES
FROM Movies
JOIN Boxoffice
ON Id=Movie_id
ORDER BY Length(Title) DESC
LIMIT 3
1.全部结果数据做统计
SELECT AGG_FUNC(column_or_expression) AS aggregate_description, …
FROM mytable
WHERE constraint_expression;
Function | Description |
---|---|
COUNT(*) | 统计数据行数 |
COUNT(column) | 统计非NULL的行数 |
SUM(column) | 对column所有行求和 |
2.用分组的方式统计
SELECT AGG_FUNC(column_or_expression) AS aggregate_description, …
FROM mytable
WHERE constraint_expression
GROUP BY column;
1.找出就职年份最高的雇员(列出雇员名字+年份
SELECT Name,Years_employed FROM employees
ORDER BY Years_employed DESC
LIMIT 1
2.按角色(Role)统计一下每个角色的平均就职年份
SELECT ROLE,AVG(Years_employed) FROM employees
GROUP BY ROLE
3.按办公室名字总计一下就职年份总和
SELECT BUILDING,SUM(Years_employed) FROM employees
GROUP BY Building
4.每栋办公室按人数排名,不要统计无办公室的雇员
SELECT BUILDING,COUNT(BUILDING) FROM employees
WHERE Building IS NOT NULL
GROUP BY Building
⚠️ 注意: 先 WHERE 后 GROUP BY
5.就职1,3,5,7年的人分别占总人数的百分比率是多少(给出年份和比率"50%" 记为 50)
SELECT Years_employed,COUNT(Years_employed)/(SELECT COUNT(Name) FROM employees) AS RATIO
FROM employees
WHERE Years_employed IN (1,3,7,5)
GROUP BY Years_employed
SELECT Years_employed,COUNT(Years_employed)*100/(SELECT COUNT(Name) FROM employees) AS RATIO
FROM employees
WHERE Years_employed IN (1,3,7,5)
GROUP BY Years_employed
用HAVING进行筛选
SELECT group_by_column, AGG_FUNC(column_expression) AS aggregate_result_alias, …
FROM mytable
WHERE condition
GROUP BY column
HAVING group_condition;
1.统计一下Artist角色的雇员数量
SELECT COUNT(*) FROM employees
WHERE ROLE='Artist'
2.按角色统计一下每个角色的雇员数量
SELECT ROLE,COUNT(*) FROM employees
GROUP BY ROLE
⚠️ 注意: 有GROUP BY 无 distinct ?
3.算出Engineer角色的就职年份总计
SELECT SUM(Years_employed) FROM employees
WHERE ROLE='Engineer'
4.按角色分组算出每个角色按有办公室和没办公室的统计人数(列出角色,数量,有无办公室,注意一个角色如果部分有办公室,部分没有需分开统计)
SELECT ROLE,COUNT(BUILDING) AS BUILDING,COUNT(*)-COUNT(BUILDING)
AS NO_BUILDING
FROM employees
GROUP BY ROLE
SELECT Role,CASE when Building is NOT NULL
THEN '1' ELSE '0' END
AS Have,COUNT(Name)
FROM Employees
GROUP BY Role,Have;
5.按角色和就职年份统计人数,年份按0-3,3-6,6-9这种阶梯分组,最后按角色+阶梯分组排序
SELECT Role,Years_employed/3 AS Year,
COUNT(Name) FROM Employees
GROUP BY Role,Year
ORDER BY Year ASC;
完整的SELECT查询
SELECT DISTINCT column, AGG_FUNC(column_or_expression), …
FROM mytable
JOIN another_table
ON mytable.column = another_table.column
WHERE constraint_expression
GROUP BY column
HAVING constraint_expression
ORDER BY column ASC/DESC
LIMIT count OFFSET COUNT;
1. FROM 和 JOINs
FROM 或 JOIN会第一个执行,确定一个整体的数据范围. 如果要JOIN不同表,可能会生成一个临时Table来用于 下面的过程。总之第一步可以简单理解为确定一个数据源表(含临时表)
2. WHERE
我们确定了数据来源 WHERE 语句就将在这个数据源中按要求进行数据筛选,并丢弃不符合要求的数据行,所有的筛选col属性 只能来自FROM圈定的表. AS别名还不能在这个阶段使用,因为可能别名是一个还没执行的表达式
3. GROUP BY
如果你用了 GROUP BY 分组,那GROUP BY 将对之前的数据进行分组,统计等,并将是结果集缩小为分组数.这意味着 其他的数据在分组后丢弃.
4. HAVING
如果你用了 GROUP BY 分组, HAVING 会在分组完成后对结果集再次筛选。AS别名也不能在这个阶段使用.
5. SELECT
确定结果之后,SELECT用来对结果col简单筛选或计算,决定输出什么数据.
6. DISTINCT
如果数据行有重复DISTINCT 将负责排重.
7. ORDER BY
在结果集确定的情况下,ORDER BY 对结果做排序。因为SELECT中的表达式已经执行完了。此时可以用AS别名.
8. LIMIT / OFFSET
最后 LIMIT 和 OFFSET 从排序的结果中截取部分数据.
1.统计出每一个导演的电影数量(列出导演名字和数量)
SELECT Director,count() FROM movies
GROUP BY Director
2.统计一下每个导演的销售总额(列出导演名字和销售总额)
--错误做法
SELECT Director,Domestic_sales+International_sales FROM movies
JOIN Boxoffice
ON Id = Movie_id
GROUP BY Director
⚠️ 注意: 没有sum时, group by加和,只累计第一条出现的数据;sum(Domestic_sales+ International_sales), 函数返回数值列的总数值
--正确做法
SELECT Director,sum(Domestic_sales+International_sales) FROM movies
JOIN Boxoffice
ON Id = Movie_id
GROUP BY Director
3.按导演分组计算销售总额,求出平均销售额冠军(统计结果过滤掉只有单部电影的导演,列出导演名,总销量,电影数量,平均销量)
--错误做法 没有sum
SELECT Director,
(Domestic_sales+International_sales) AS TOTAL_SALES,
COUNT(Director),
(Domestic_sales+International_sales)/COUNT(Director) AS AVG_SALES
FROM movies
JOIN Boxoffice
ON Id = Movie_id
GROUP BY Director
HAVING COUNT(Director)>1
ORDER BY AVG_SALES DESC
LIMIT 1
--正确做法
SELECT director,SUM(b.domestic_sales+b.international_sales)
as Sum_sales,
count(title) as Total,
sum(b.domestic_sales+b.international_sales)/count(title)
AS average FROM Movies m
LEFT JOIN Boxoffice b
ON m.Id = b.Movie_id
GROUP BY Director
HAVING COUNT(Title) > 1
ORDER BY Average DESC
LIMIT 1;
⚠️ 注意: GROUP BY Director 前面用sum()
4.找出每部电影和单部电影销售冠军之间的销售差,列出电影名,销售额差额
SELECT TITLE,
(SELECT MAX(Domestic_sales+International_sales) FROM Boxoffice)
-SUM(Domestic_sales+International_sales) AS DIFFERENCE
FROM movies
JOIN Boxoffice
ON Id = Movie_id
GROUP BY Title;
⚠️ 注意: MAX ,SUM 一类和GROUP BY联合使用
⚠️ 注意:SELECT FROM 哪个表要搞清楚
⚠️ 注意:没有SELECT的时候,max没有结果显示