These notes are based on 《SQL进阶教程》 (by MICK [Japan], translated by 吴炎昌).
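One recurring difficulty with databases is that the result of a SQL statement often cannot be returned directly in the format you want. In this section we study two representative format conversions, row/column conversion and the generation of nested sidebars, in order to better understand outer joins, which play an important role in both.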
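There are three kinds of outer join: left outer join, right outer join, and full outer join.
Left and right outer joins do not differ in functionality. Use a left outer join when the master table is written to the left of the operator, and a right outer join when it is written to the right.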
Let's start with a concrete example to see what a full outer join is. First, create two tables:
CREATE TABLE Class_A
(id char(1),
name varchar(30),
PRIMARY KEY(id));
CREATE TABLE Class_B
(id char(1),
name varchar(30),
PRIMARY KEY(id));
INSERT INTO Class_A (id, name) VALUES('1', '田中');
INSERT INTO Class_A (id, name) VALUES('2', '铃木');
INSERT INTO Class_A (id, name) VALUES('3', '伊集院');
INSERT INTO Class_B (id, name) VALUES('1', '田中');
INSERT INTO Class_B (id, name) VALUES('2', '铃木');
INSERT INTO Class_B (id, name) VALUES('4', '西园寺');
mysql> select * from class_a;
+----+-----------+
| id | name |
+----+-----------+
| 1 | 田中 |
| 2 | 铃木 |
| 3 | 伊集院 |
+----+-----------+
mysql> select * from class_b;
+----+-----------+
| id | name |
+----+-----------+
| 1 | 田中 |
| 2 | 铃木 |
| 4 | 西园寺 |
+----+-----------+
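In these two tables, 田中 and 铃木 appear in both, while 伊集院 and 西园寺 each appear in only one. A full outer join retrieves all the information from two such inconsistent tables without omitting anything, so it can be thought of as a join that "treats both tables as the master table".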
mysql> select coalesce(a.id,b.id) as id,a.name as a_name,b.name as b_name
-> from class_a a full outer join class_b
-> on a.id=b.id;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'full outer join class_b
on a.id=b.id' at line 2
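The statement fails because MySQL does not support FULL OUTER JOIN (left and right outer joins are supported). Instead, run a left outer join and a right outer join separately and combine the two results with UNION.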
mysql> select a.id as id,a.name,b.name
-> from class_a a left outer join class_b b
-> on a.id=b.id
-> union select b.id as id ,a.name,b.name
-> from class_a a right outer join class_b b
-> on a.id=b.id;
+----+-----------+-----------+
| id | name | name |
+----+-----------+-----------+
| 1 | 田中 | 田中 |
| 2 | 铃木 | 铃木 |
| 3 | 伊集院 | NULL |
| 4 | NULL | 西园寺 |
+----+-----------+-----------+
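The UNION workaround above can be reproduced outside MySQL. The following is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in database (an assumption of this example, not something the notes use). Because older SQLite versions lack RIGHT JOIN, the right outer join is written as the equivalent left outer join with the tables swapped.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Class_A (id CHAR(1) PRIMARY KEY, name VARCHAR(30));
CREATE TABLE Class_B (id CHAR(1) PRIMARY KEY, name VARCHAR(30));
INSERT INTO Class_A VALUES ('1', '田中'), ('2', '铃木'), ('3', '伊集院');
INSERT INTO Class_B VALUES ('1', '田中'), ('2', '铃木'), ('4', '西园寺');
""")

# Emulated full outer join: (A LEFT JOIN B) UNION (B LEFT JOIN A).
# The second half is equivalent to A RIGHT JOIN B.
# UNION removes the rows that both halves produce (ids 1 and 2).
full_join_rows = conn.execute("""
SELECT a.id AS id, a.name AS a_name, b.name AS b_name
  FROM Class_A a LEFT OUTER JOIN Class_B b ON a.id = b.id
UNION
SELECT b.id AS id, a.name AS a_name, b.name AS b_name
  FROM Class_B b LEFT OUTER JOIN Class_A a ON a.id = b.id
ORDER BY id
""").fetchall()

for row in full_join_rows:
    print(row)
```

Note that UNION also collapses genuinely identical rows. That is harmless here because `id` is a primary key in both tables, but it is worth checking before applying the same pattern to tables that may contain duplicates.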
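Let's change perspective and view table joins as set operations:
伊集院 exists in Class A but not in Class B, so the b_name column is NULL. 西园寺 exists in Class B but not in Class A, so the a_name column is NULL. We can therefore obtain a set difference by checking whether the relevant column is NULL after the join.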
mysql> select a.id as id,a.name as a_name
-> from class_a a left outer join class_b b
-> on a.id=b.id
-> where b.name is null;
mysql> select b.id as id,b.name as b_name
-> from class_a a right outer join class_b b
-> on a.id=b.id
-> where a.name is null;
+----+-----------+
| id | b_name |
+----+-----------+
| 4 | 西园寺 |
+----+-----------+
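The set-difference idea above can be checked with a small script. This is a sketch under the same assumption as before (Python's built-in `sqlite3` as a stand-in for MySQL); the NOT EXISTS query is included only as a cross-check and is not taken from the notes.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Class_A (id CHAR(1) PRIMARY KEY, name VARCHAR(30));
CREATE TABLE Class_B (id CHAR(1) PRIMARY KEY, name VARCHAR(30));
INSERT INTO Class_A VALUES ('1', '田中'), ('2', '铃木'), ('3', '伊集院');
INSERT INTO Class_B VALUES ('1', '田中'), ('2', '铃木'), ('4', '西园寺');
""")

# A - B: rows of Class_A for which the outer join found no partner in Class_B.
a_minus_b = conn.execute("""
SELECT a.id, a.name
  FROM Class_A a LEFT OUTER JOIN Class_B b ON a.id = b.id
 WHERE b.id IS NULL
""").fetchall()

# The same difference written with NOT EXISTS, for comparison.
a_minus_b_not_exists = conn.execute("""
SELECT a.id, a.name
  FROM Class_A a
 WHERE NOT EXISTS (SELECT 1 FROM Class_B b WHERE b.id = a.id)
""").fetchall()

print(a_minus_b)
print(a_minus_b_not_exists)
```

The sketch tests `b.id` rather than `b.name`: checking the join key avoids misclassifying a matched row whose `name` happens to be NULL.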
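What is exclusive OR (XOR)?
If two values a and b differ, a XOR b is 1; if they are the same, the result is 0. Applied to sets, XOR keeps the rows that belong to exactly one of the two tables.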
mysql> select coalesce(a.id,b.id) as id,
-> coalesce(a.name,b.name) as name
-> from class_a a full outer join class_b b
-> on a.id=b.id
-> where a.name is null or b.name is null;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'full outer join class_b b
on a.id=b.id
where a.name is null or b.name is null' at line 3
MySQL does not support full outer joins, so rewrite the query as follows:
mysql> select a.id as id,a.name as name
-> from class_a a left outer join class_b b
-> on a.id=b.id
-> where b.name is null
-> union select b.id as id,b.name as name
-> from class_a a right outer join class_b b
-> on a.id=b.id
-> where a.name is null;
+----+-----------+
| id | name |
+----+-----------+
| 3 | 伊集院 |
| 4 | 西园寺 |
+----+-----------+
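To tie the query above back to the XOR definition, here is a small sketch (again assuming Python's built-in `sqlite3` as a stand-in for MySQL) that computes the same result with two anti-joins and compares it with Python's own set XOR operator `^`.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Class_A (id CHAR(1) PRIMARY KEY, name VARCHAR(30));
CREATE TABLE Class_B (id CHAR(1) PRIMARY KEY, name VARCHAR(30));
INSERT INTO Class_A VALUES ('1', '田中'), ('2', '铃木'), ('3', '伊集院');
INSERT INTO Class_B VALUES ('1', '田中'), ('2', '铃木'), ('4', '西园寺');
""")

# (A - B) UNION ALL (B - A): rows that appear in exactly one table.
# The two halves cannot overlap, so UNION ALL is sufficient.
sym_diff = conn.execute("""
SELECT a.id AS id, a.name AS name
  FROM Class_A a LEFT OUTER JOIN Class_B b ON a.id = b.id
 WHERE b.id IS NULL
UNION ALL
SELECT b.id AS id, b.name AS name
  FROM Class_B b LEFT OUTER JOIN Class_A a ON a.id = b.id
 WHERE a.id IS NULL
ORDER BY id
""").fetchall()

# Cross-check with Python's set XOR on the raw rows.
rows_a = set(conn.execute("SELECT id, name FROM Class_A").fetchall())
rows_b = set(conn.execute("SELECT id, name FROM Class_B").fetchall())
python_xor = rows_a ^ rows_b

print(sym_diff)
```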
Next, converting rows into columns. First, create a Courses table:
CREATE TABLE Courses
(name VARCHAR(32),
course VARCHAR(32),
PRIMARY KEY(name, course));
INSERT INTO Courses VALUES('赤井', 'SQL入门');
INSERT INTO Courses VALUES('赤井', 'UNIX基础');
INSERT INTO Courses VALUES('铃木', 'SQL入门');
INSERT INTO Courses VALUES('工藤', 'SQL入门');
INSERT INTO Courses VALUES('工藤', 'Java中级');
INSERT INTO Courses VALUES('吉田', 'UNIX基础');
INSERT INTO Courses VALUES('渡边', 'SQL入门');
mysql> select * from courses;
+--------+------------+
| name | course |
+--------+------------+
| 吉田 | UNIX基础 |
| 工藤 | Java中级 |
| 工藤 | SQL入门 |
| 渡边 | SQL入门 |
| 赤井 | SQL入门 |
| 赤井 | UNIX基础 |
| 铃木 | SQL入门 |
+--------+------------+
From the table above we want to generate the following cross table:
name | SQL入门 | UNIX基础 | Java中级 |
---|---|---|---|
赤井 | ⚪ | ⚪ | |
工藤 | ⚪ | | ⚪ |
铃木 | ⚪ | | |
吉田 | | ⚪ | |
渡边 | ⚪ | | |
Method 1 (outer joins):
mysql> select c0.name,
-> case when c1.name is not null then '⚪' else null end as 'SQL入门',
-> case when c2.name is not null then '⚪' else null end as 'UNIX基础',
-> case when c3.name is not null then '⚪' else null end as 'Java中级'
-> from (select distinct name from courses) c0
-> left outer join
-> (select name from courses where course='SQL入门') c1
-> on c0.name=c1.name
-> left outer join
-> (select name from courses where course='UNIX基础') c2
-> on c0.name=c2.name
-> left outer join
-> (select name from courses where course='Java中级') c3
-> on c0.name=c3.name;
+--------+-----------+------------+------------+
| name | SQL入门 | UNIX基础 | Java中级 |
+--------+-----------+------------+------------+
| 吉田 | NULL | ⚪ | NULL |
| 工藤 | ⚪ | NULL | ⚪ |
| 渡边 | ⚪ | NULL | NULL |
| 赤井 | ⚪ | ⚪ | NULL |
| 铃木 | ⚪ | NULL | NULL |
+--------+-----------+------------+------------+
Method 2 (scalar subqueries):
mysql> select c0.name,
-> (select 'o'
-> from courses c1
-> where course='SQL入门' and c1.name=c0.name) as 'SQL入门',
-> (select 'o'
-> from courses c2
-> where course='UNIX基础' and c2.name=c0.name) as 'UNIX基础',
-> (select 'o'
-> from courses c3
-> where course='Java中级' and c3.name=c0.name) as 'Java中级'
-> from (select distinct name from courses) c0;
+--------+-----------+------------+------------+
| name | SQL入门 | UNIX基础 | Java中级 |
+--------+-----------+------------+------------+
| 吉田 | NULL | o | NULL |
| 工藤 | o | NULL | o |
| 渡边 | o | NULL | NULL |
| 赤井 | o | o | NULL |
| 铃木 | o | NULL | NULL |
+--------+-----------+------------+------------+
Method 3 (nested CASE expressions):
A CASE expression in the SELECT clause can be written inside an aggregate function as well as outside one.
mysql> select name,
-> case when sum(case when course='SQL入门' then 1 else null end)=1 then 'o' else null end as 'SQL入门',
-> case when sum(case when course='UNIX基础' then 1 else null end)=1 then 'o' else null end as 'UNIX基础',
-> case when sum(case when course='Java中级' then 1 else null end)=1 then 'o' else null end as 'Java中级'
-> from courses group by name;
+--------+-----------+------------+------------+
| name | SQL入门 | UNIX基础 | Java中级 |
+--------+-----------+------------+------------+
| 吉田 | NULL | o | NULL |
| 工藤 | o | NULL | o |
| 渡边 | o | NULL | NULL |
| 赤井 | o | o | NULL |
| 铃木 | o | NULL | NULL |
+--------+-----------+------------+------------+
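As a sanity check that the approaches agree, the sketch below runs Method 1 (outer joins) and Method 3 (nested CASE with aggregation) against the same data and compares the results. It assumes Python's built-in `sqlite3` as a stand-in for MySQL, and the ASCII column aliases `sql_intro`, `unix_basic`, and `java_mid` are hypothetical names chosen for this example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Courses (name VARCHAR(32), course VARCHAR(32),
                      PRIMARY KEY (name, course));
INSERT INTO Courses VALUES
  ('赤井', 'SQL入门'), ('赤井', 'UNIX基础'), ('铃木', 'SQL入门'),
  ('工藤', 'SQL入门'), ('工藤', 'Java中级'), ('吉田', 'UNIX基础'),
  ('渡边', 'SQL入门');
""")

# Method 1: one outer join per target column.
by_outer_join = conn.execute("""
SELECT c0.name,
       CASE WHEN c1.name IS NOT NULL THEN 'o' END AS sql_intro,
       CASE WHEN c2.name IS NOT NULL THEN 'o' END AS unix_basic,
       CASE WHEN c3.name IS NOT NULL THEN 'o' END AS java_mid
  FROM (SELECT DISTINCT name FROM Courses) c0
  LEFT OUTER JOIN (SELECT name FROM Courses WHERE course = 'SQL入门') c1
    ON c0.name = c1.name
  LEFT OUTER JOIN (SELECT name FROM Courses WHERE course = 'UNIX基础') c2
    ON c0.name = c2.name
  LEFT OUTER JOIN (SELECT name FROM Courses WHERE course = 'Java中级') c3
    ON c0.name = c3.name
 ORDER BY c0.name
""").fetchall()

# Method 3: a single scan with CASE nested inside and outside SUM.
by_nested_case = conn.execute("""
SELECT name,
       CASE WHEN SUM(CASE WHEN course = 'SQL入门' THEN 1 END) = 1
            THEN 'o' END AS sql_intro,
       CASE WHEN SUM(CASE WHEN course = 'UNIX基础' THEN 1 END) = 1
            THEN 'o' END AS unix_basic,
       CASE WHEN SUM(CASE WHEN course = 'Java中级' THEN 1 END) = 1
            THEN 'o' END AS java_mid
  FROM Courses
 GROUP BY name
 ORDER BY name
""").fetchall()

pivot = {row[0]: row[1:] for row in by_nested_case}
print(pivot)
```

Method 1 needs one extra join for every new course column, while Method 3 only needs one more CASE expression, which is usually the easier form to extend.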
Now consider the following Personnel table:
CREATE TABLE Personnel
(employee varchar(32),
child_1 varchar(32),
child_2 varchar(32),
child_3 varchar(32),
PRIMARY KEY(employee));
INSERT INTO Personnel VALUES('赤井', '一郎', '二郎', '三郎');
INSERT INTO Personnel VALUES('工藤', '春子', '夏子', NULL);
INSERT INTO Personnel VALUES('铃木', '夏子', NULL, NULL);
INSERT INTO Personnel VALUES('吉田', NULL, NULL, NULL);
mysql> select * from personnel;
+----------+---------+---------+---------+
| employee | child_1 | child_2 | child_3 |
+----------+---------+---------+---------+
| 吉田 | NULL | NULL | NULL |
| 工藤 | 春子 | 夏子 | NULL |
| 赤井 | 一郎 | 二郎 | 三郎 |
| 铃木 | 夏子 | NULL | NULL |
+----------+---------+---------+---------+
To convert column data into row data, use UNION ALL:
mysql> select employee,child_1 as child from personnel
-> union all
-> select employee,child_2 as child from personnel
-> union all
-> select employee,child_3 as child from personnel;
+----------+--------+
| employee | child |
+----------+--------+
| 吉田 | NULL |
| 工藤 | 春子 |
| 赤井 | 一郎 |
| 铃木 | 夏子 |
| 吉田 | NULL |
| 工藤 | 夏子 |
| 赤井 | 二郎 |
| 铃木 | NULL |
| 吉田 | NULL |
| 工藤 | NULL |
| 赤井 | 三郎 |
| 铃木 | NULL |
+----------+--------+
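What we actually want to generate, however, is the following table:
employee | child |
---|---|
赤井 | 一郎 |
赤井 | 二郎 |
赤井 | 三郎 |
工藤 | 春子 |
工藤 | 夏子 |
铃木 | 夏子 |
吉田 | |
In this example we cannot simply exclude every row whose child column is NULL, because that would also remove 吉田, who has no children but must still appear once. Instead, we first create a view that stores the list of children (a children master table):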
mysql> create view children(child)
-> as select child_1 from personnel
-> union
-> select child_2 from personnel
-> union
-> select child_3 from personnel;
mysql> select * from children;
+--------+
| child |
+--------+
| NULL |
| 春子 |
| 一郎 |
| 夏子 |
| 二郎 |
| 三郎 |
+--------+
mysql> select emp.employee,children.child
-> from personnel emp
-> left outer join children
-> on children.child in (emp.child_1,emp.child_2,emp.child_3);
+----------+--------+
| employee | child |
+----------+--------+
| 工藤 | 春子 |
| 赤井 | 一郎 |
| 工藤 | 夏子 |
| 铃木 | 夏子 |
| 赤井 | 二郎 |
| 赤井 | 三郎 |
| 吉田 | NULL |
+----------+--------+
-- With this join, when a name in columns child_1 through child_3 of Personnel exists in the children view, that name is returned; otherwise NULL is returned.
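The view-plus-outer-join approach above can be reproduced with the following sketch, which assumes Python's built-in `sqlite3` as a stand-in for MySQL. The view is created with a column alias instead of a column list, purely to stay compatible with older SQLite versions.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Personnel (employee VARCHAR(32) PRIMARY KEY,
                        child_1 VARCHAR(32),
                        child_2 VARCHAR(32),
                        child_3 VARCHAR(32));
INSERT INTO Personnel VALUES
  ('赤井', '一郎', '二郎', '三郎'),
  ('工藤', '春子', '夏子', NULL),
  ('铃木', '夏子', NULL, NULL),
  ('吉田', NULL, NULL, NULL);

-- Children master: every distinct value found in the three child columns.
CREATE VIEW Children AS
  SELECT child_1 AS child FROM Personnel
  UNION
  SELECT child_2 FROM Personnel
  UNION
  SELECT child_3 FROM Personnel;
""")

# Personnel is the preserved (master) side. An employee with no matching
# child still yields one row, with child = NULL.
employee_children = conn.execute("""
SELECT emp.employee, c.child
  FROM Personnel emp LEFT OUTER JOIN Children c
    ON c.child IN (emp.child_1, emp.child_2, emp.child_3)
""").fetchall()

for row in employee_children:
    print(row)
```

The NULL row in the view never matches anything, because `NULL IN (...)` does not evaluate to true. 吉田 therefore appears exactly once, through the outer join, rather than once per empty column as in the UNION ALL version.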
Using the item master table and the sales history table below as an example, let's take a closer look at joins as multiplication:
CREATE TABLE Items
(item_no INTEGER PRIMARY KEY,
item VARCHAR(32) NOT NULL);
INSERT INTO Items VALUES(10, 'FD');
INSERT INTO Items VALUES(20, 'CD-R');
INSERT INTO Items VALUES(30, 'MO');
INSERT INTO Items VALUES(40, 'DVD');
CREATE TABLE SalesHistory
(sale_date DATE NOT NULL,
item_no INTEGER NOT NULL,
quantity INTEGER NOT NULL,
PRIMARY KEY(sale_date, item_no));
INSERT INTO SalesHistory VALUES('2007-10-01', 10, 4);
INSERT INTO SalesHistory VALUES('2007-10-01', 20, 10);
INSERT INTO SalesHistory VALUES('2007-10-01', 30, 3);
INSERT INTO SalesHistory VALUES('2007-10-03', 10, 32);
INSERT INTO SalesHistory VALUES('2007-10-03', 30, 12);
INSERT INTO SalesHistory VALUES('2007-10-04', 20, 22);
INSERT INTO SalesHistory VALUES('2007-10-04', 30, 7);
mysql> select * from items;
+---------+------+
| item_no | item |
+---------+------+
| 10 | FD |
| 20 | CD-R |
| 30 | MO |
| 40 | DVD |
+---------+------+
mysql> select * from saleshistory;
+------------+---------+----------+
| sale_date | item_no | quantity |
+------------+---------+----------+
| 2007-10-01 | 10 | 4 |
| 2007-10-01 | 20 | 10 |
| 2007-10-01 | 30 | 3 |
| 2007-10-03 | 10 | 32 |
| 2007-10-03 | 30 | 12 |
| 2007-10-04 | 20 | 22 |
| 2007-10-04 | 30 | 7 |
+------------+---------+----------+
The result we want is as follows:
item_no | total_qty |
---|---|
10 | 36 |
20 | 32 |
30 | 22 |
40 | |
Because item 40, which has no sales records, must also appear in the result, an outer join is clearly needed here.
mysql> select i.item_no,sh.total_qty
-> from items i left outer join
-> (select item_no,sum(quantity) as total_qty
-> from saleshistory
-> group by item_no) sh
-> on i.item_no=sh.item_no;
+---------+-----------+
| item_no | total_qty |
+---------+-----------+
| 10 | 36 |
| 20 | 32 |
| 30 | 22 |
| 40 | NULL |
+---------+-----------+
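The following version may perform better: it joins the two tables first and aggregates afterwards, so no intermediate derived table (sh) has to be built. Because Items is the "one" side of a one-to-many relationship on item_no, the join does not duplicate any sales rows, and the sums are unchanged: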
mysql> select i.item_no,sum(sh.quantity) as total_qty
-> from items i left outer join saleshistory sh
-> on i.item_no=sh.item_no
-> group by i.item_no;
+---------+-----------+
| item_no | total_qty |
+---------+-----------+
| 10 | 36 |
| 20 | 32 |
| 30 | 22 |
| 40 | NULL |
+---------+-----------+
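The claim that both queries return the same totals, and that the one-to-many join does not multiply rows, can be checked with the sketch below. It assumes Python's built-in `sqlite3` as a stand-in for MySQL and says nothing about relative performance, which depends on the database and its optimizer.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (item_no INTEGER PRIMARY KEY, item VARCHAR(32) NOT NULL);
INSERT INTO Items VALUES (10, 'FD'), (20, 'CD-R'), (30, 'MO'), (40, 'DVD');

CREATE TABLE SalesHistory (sale_date DATE NOT NULL,
                           item_no INTEGER NOT NULL,
                           quantity INTEGER NOT NULL,
                           PRIMARY KEY (sale_date, item_no));
INSERT INTO SalesHistory VALUES
  ('2007-10-01', 10, 4), ('2007-10-01', 20, 10), ('2007-10-01', 30, 3),
  ('2007-10-03', 10, 32), ('2007-10-03', 30, 12),
  ('2007-10-04', 20, 22), ('2007-10-04', 30, 7);
""")

# Raw one-to-many join: each sales row appears once, plus one
# NULL-extended row for item 40, which has no sales.
joined = conn.execute("""
SELECT i.item_no, sh.sale_date, sh.quantity
  FROM Items i LEFT OUTER JOIN SalesHistory sh ON i.item_no = sh.item_no
""").fetchall()

# Version 1: aggregate in a derived table, then join one-to-one.
totals_aggregate_first = conn.execute("""
SELECT i.item_no, sh.total_qty
  FROM Items i LEFT OUTER JOIN
       (SELECT item_no, SUM(quantity) AS total_qty
          FROM SalesHistory
         GROUP BY item_no) sh
    ON i.item_no = sh.item_no
 ORDER BY i.item_no
""").fetchall()

# Version 2: join one-to-many first, then aggregate.
totals_join_first = conn.execute("""
SELECT i.item_no, SUM(sh.quantity) AS total_qty
  FROM Items i LEFT OUTER JOIN SalesHistory sh ON i.item_no = sh.item_no
 GROUP BY i.item_no
 ORDER BY i.item_no
""").fetchall()

print(len(joined))
print(totals_join_first)
```

The joined result has 8 rows: the 7 sales rows plus one row for item 40. SUM over that item's single NULL quantity yields NULL, which is why both versions report NULL rather than 0 for it.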