Table: Department
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| revenue       | int     |
| month         | varchar |
+---------------+---------+
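(id, month) is the composite primary key of this table.
The table holds each department's revenue for each month.
month takes one of the values ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"].
Write an SQL query to reformat the table so that the new table has a department id column and one revenue column for each month.
The query result format is shown in the following example: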
Department table:
+------+---------+-------+
| id   | revenue | month |
+------+---------+-------+
| 1    | 8000    | Jan   |
| 2    | 9000    | Jan   |
| 3    | 10000   | Feb   |
| 1    | 7000    | Feb   |
| 1    | 6000    | Mar   |
+------+---------+-------+
Result table:
+------+-------------+-------------+-------------+-----+-------------+
| id   | Jan_Revenue | Feb_Revenue | Mar_Revenue | ... | Dec_Revenue |
+------+-------------+-------------+-------------+-----+-------------+
| 1    | 8000        | 7000        | 6000        | ... | null        |
| 2    | 9000        | null        | null        | ... | null        |
| 3    | null        | 10000       | null        | ... | null        |
+------+-------------+-------------+-------------+-----+-------------+
Note that the result table has 13 columns (1 department id column + 12 monthly revenue columns).
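Understanding the problem
This problem is really the familiar conversion from long-format data to wide-format data, much like a pivot table.
It is easy to do in Python or R.
So how do we solve it in SQL?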
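For comparison, here is the same long-to-wide reshape in Python. This is a minimal sketch that uses a plain dictionary rather than pandas; `pivot` is a hypothetical helper name, and `None` plays the role of SQL's NULL.

```python
# Long-format rows from the example: (id, revenue, month)
rows = [(1, 8000, "Jan"), (2, 9000, "Jan"), (3, 10000, "Feb"),
        (1, 7000, "Feb"), (1, 6000, "Mar")]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def pivot(rows):
    """Reshape (id, revenue, month) rows into {id: {"<Mon>_Revenue": value}}."""
    wide = {}
    for dept_id, revenue, month in rows:
        # The first time an id appears, give it all 12 month columns as None (NULL).
        record = wide.setdefault(dept_id, {f"{m}_Revenue": None for m in MONTHS})
        record[f"{month}_Revenue"] = revenue
    return wide


table = pivot(rows)
for dept_id in sorted(table):
    r = table[dept_id]
    print(dept_id, r["Jan_Revenue"], r["Feb_Revenue"], r["Mar_Revenue"], r["Dec_Revenue"])
# 1 8000 7000 6000 None
# 2 9000 None None None
# 3 None 10000 None None
```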
Generating the data
CREATE TABLE Department(
id INT,
revenue INT,
`month` VARCHAR(10),
PRIMARY KEY(id, `month`));
INSERT INTO Department VALUES (1, 8000, 'Jan'), (2, 9000, 'Jan'), (3, 10000, 'Feb'), (1, 7000, 'Feb'), (1, 6000, 'Mar');
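To try the queries without a MySQL server, the same setup can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch, assuming an in-memory SQLite database stands in for MySQL and the table is named Department so the queries can use it unchanged. It also shows the composite key rejecting a second row for the same (id, month), which is what later guarantees at most one value per department per month.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Department (
        id INT,
        revenue INT,
        month VARCHAR(10),
        PRIMARY KEY (id, month)
    )""")
conn.executemany(
    "INSERT INTO Department VALUES (?, ?, ?)",
    [(1, 8000, "Jan"), (2, 9000, "Jan"), (3, 10000, "Feb"),
     (1, 7000, "Feb"), (1, 6000, "Mar")],
)

# The composite key (id, month) allows only one row per department per month.
try:
    conn.execute("INSERT INTO Department VALUES (1, 500, 'Jan')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Department").fetchone()[0]
print(row_count, duplicate_rejected)  # 5 True
```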
My solution
Use IF conditions such as IF(`month`='Jan', revenue, NULL) AS Jan_Revenue to put each month's revenue in its own column:
SELECT id,
IF(`month`='Jan',revenue,NULL) Jan_Revenue,
IF(`month`='Feb',revenue,NULL) Feb_Revenue,
IF(`month`='Mar',revenue,NULL) Mar_Revenue
FROM Department;
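Without grouping, this returns one row per input row (five rows here): each row shows its revenue in its own month's column and NULL in the other two.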
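The ungrouped result can be reproduced in SQLite. This is a minimal sketch, assuming an in-memory SQLite database in place of MySQL; `CASE WHEN ... THEN revenue END` replaces MySQL's IF(), which is not available in every SQLite version. It confirms that each of the five output rows has exactly one non-NULL month column.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (id INT, revenue INT, month VARCHAR(10), "
             "PRIMARY KEY (id, month))")
conn.executemany("INSERT INTO Department VALUES (?, ?, ?)",
                 [(1, 8000, "Jan"), (2, 9000, "Jan"), (3, 10000, "Feb"),
                  (1, 7000, "Feb"), (1, 6000, "Mar")])

# CASE WHEN plays the role of MySQL's IF(`month`='Jan', revenue, NULL).
rows = conn.execute("""
    SELECT id,
           CASE WHEN month = 'Jan' THEN revenue END AS Jan_Revenue,
           CASE WHEN month = 'Feb' THEN revenue END AS Feb_Revenue,
           CASE WHEN month = 'Mar' THEN revenue END AS Mar_Revenue
    FROM Department
""").fetchall()

for row in rows:
    print(row)  # one output row per input row, e.g. (1, 8000, None, None)

# Count the non-NULL month values in each output row.
non_null_counts = [sum(v is not None for v in row[1:]) for row in rows]
```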
Next, group by id:
SELECT id,
IF(`month`='Jan',revenue,NULL) Jan_Revenue,
IF(`month`='Feb',revenue,NULL) Feb_Revenue,
IF(`month`='Mar',revenue,NULL) Mar_Revenue
FROM Department
GROUP BY id;
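This does not give the desired result.
Why not?
For id=1 there is revenue in each of the first three months, but each of those three rows is non-NULL in only one month column.
After grouping, MySQL (with ONLY_FULL_GROUP_BY disabled) takes the non-aggregated columns from a single row of each group, often the first one in practice, so the three monthly values are never combined and the result differs from what we expected. With ONLY_FULL_GROUP_BY enabled, which is the default since MySQL 5.7.5, the query is rejected outright.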
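SQLite behaves much like MySQL with ONLY_FULL_GROUP_BY disabled: it accepts the bare columns and, per its documentation, evaluates all of them on one arbitrarily chosen row of each group. A minimal sketch, assuming an in-memory SQLite database in place of MySQL and CASE WHEN in place of IF(), showing why id=1 can never end up with all three months filled in:

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (id INT, revenue INT, month VARCHAR(10), "
             "PRIMARY KEY (id, month))")
conn.executemany("INSERT INTO Department VALUES (?, ?, ?)",
                 [(1, 8000, "Jan"), (2, 9000, "Jan"), (3, 10000, "Feb"),
                  (1, 7000, "Feb"), (1, 6000, "Mar")])

# GROUP BY without aggregates: SQLite allows these "bare" expressions.
rows = conn.execute("""
    SELECT id,
           CASE WHEN month = 'Jan' THEN revenue END,
           CASE WHEN month = 'Feb' THEN revenue END,
           CASE WHEN month = 'Mar' THEN revenue END
    FROM Department
    GROUP BY id
""").fetchall()
by_id = {row[0]: row[1:] for row in rows}

# All bare expressions in a group are evaluated on the same single row,
# so id=1 shows only one of its three monthly revenues.
print(by_id[1])
```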
What about this?
SELECT id, MAX(tmp.Jan_Revenue) AS Jan_Revenue,
MAX(tmp.Feb_Revenue) AS Feb_Revenue,
MAX(tmp.Mar_Revenue) AS Mar_Revenue
FROM (SELECT id,
IF(`month`='Jan',revenue,NULL) Jan_Revenue,
IF(`month`='Feb',revenue,NULL) Feb_Revenue,
IF(`month`='Mar',revenue,NULL) Mar_Revenue
FROM Department) tmp
GROUP BY id;
This works, but it requires a subquery.
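The subquery version can be checked in SQLite as well. This is a minimal sketch, assuming an in-memory SQLite database in place of MySQL and CASE WHEN in place of IF(). MAX skips NULLs, so each group keeps its single non-NULL value for each month, and a month with no data stays NULL.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (id INT, revenue INT, month VARCHAR(10), "
             "PRIMARY KEY (id, month))")
conn.executemany("INSERT INTO Department VALUES (?, ?, ?)",
                 [(1, 8000, "Jan"), (2, 9000, "Jan"), (3, 10000, "Feb"),
                  (1, 7000, "Feb"), (1, 6000, "Mar")])

rows = conn.execute("""
    SELECT id,
           MAX(tmp.Jan_Revenue) AS Jan_Revenue,
           MAX(tmp.Feb_Revenue) AS Feb_Revenue,
           MAX(tmp.Mar_Revenue) AS Mar_Revenue
    FROM (SELECT id,
                 CASE WHEN month = 'Jan' THEN revenue END AS Jan_Revenue,
                 CASE WHEN month = 'Feb' THEN revenue END AS Feb_Revenue,
                 CASE WHEN month = 'Mar' THEN revenue END AS Mar_Revenue
          FROM Department) tmp
    GROUP BY id
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 8000, 7000, 6000), (2, 9000, None, None), (3, None, 10000, None)]
```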
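Applying the aggregate function directly should also work. Because (id, month) is the primary key, each month column has at most one non-NULL value within an id group, and we only need to pick that value out, so MAX, SUM, or MIN all work.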
SELECT id,
MAX(IF(`month`='Jan',revenue,NULL)) Jan_Revenue,
MAX(IF(`month`='Feb',revenue,NULL)) Feb_Revenue,
MAX(IF(`month`='Mar',revenue,NULL)) Mar_Revenue
FROM Department
GROUP BY id;
The result is the same.
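The claim that MAX, SUM, and MIN are interchangeable here can be tested directly. This is a minimal sketch, assuming an in-memory SQLite database in place of MySQL and CASE WHEN in place of IF(). The equivalence only holds because (id, month) is the primary key; with duplicate (id, month) rows, SUM would add them up while MAX and MIN would each pick one.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (id INT, revenue INT, month VARCHAR(10), "
             "PRIMARY KEY (id, month))")
conn.executemany("INSERT INTO Department VALUES (?, ?, ?)",
                 [(1, 8000, "Jan"), (2, 9000, "Jan"), (3, 10000, "Feb"),
                  (1, 7000, "Feb"), (1, 6000, "Mar")])

results = {}
for agg in ("MAX", "MIN", "SUM"):
    # The aggregate name comes from a fixed tuple, so formatting it into SQL is safe.
    results[agg] = conn.execute(f"""
        SELECT id,
               {agg}(CASE WHEN month = 'Jan' THEN revenue END) AS Jan_Revenue,
               {agg}(CASE WHEN month = 'Feb' THEN revenue END) AS Feb_Revenue,
               {agg}(CASE WHEN month = 'Mar' THEN revenue END) AS Mar_Revenue
        FROM Department
        GROUP BY id
        ORDER BY id
    """).fetchall()

print(results["MAX"] == results["MIN"] == results["SUM"])  # True
```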
Other people's solutions
This one replaces the IF function with a CASE WHEN expression; the idea is the same.
Note that an aggregate function is still required here.
SELECT id,
SUM(CASE `month` WHEN 'Jan' THEN revenue END) Jan_Revenue,
SUM(CASE `month` WHEN 'Feb' THEN revenue END) Feb_Revenue,
SUM(CASE `month` WHEN 'Mar' THEN revenue END) Mar_Revenue,
SUM(CASE `month` WHEN 'Apr' THEN revenue END) Apr_Revenue,
SUM(CASE `month` WHEN 'May' THEN revenue END) May_Revenue,
SUM(CASE `month` WHEN 'Jun' THEN revenue END) Jun_Revenue,
SUM(CASE `month` WHEN 'Jul' THEN revenue END) Jul_Revenue,
SUM(CASE `month` WHEN 'Aug' THEN revenue END) Aug_Revenue,
SUM(CASE `month` WHEN 'Sep' THEN revenue END) Sep_Revenue,
SUM(CASE `month` WHEN 'Oct' THEN revenue END) Oct_Revenue,
SUM(CASE `month` WHEN 'Nov' THEN revenue END) Nov_Revenue,
SUM(CASE `month` WHEN 'Dec' THEN revenue END) Dec_Revenue
FROM Department
GROUP BY id;
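Typing out twelve nearly identical lines is error-prone, so one option is to generate them. This is a sketch, assuming Python with an in-memory SQLite database in place of MySQL; `MONTHS` and `month_columns` are illustrative names, not part of the original solution. It builds the same SUM(CASE ...) query and checks that the result has 13 columns.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (id INT, revenue INT, month VARCHAR(10), "
             "PRIMARY KEY (id, month))")
conn.executemany("INSERT INTO Department VALUES (?, ?, ?)",
                 [(1, 8000, "Jan"), (2, 9000, "Jan"), (3, 10000, "Feb"),
                  (1, 7000, "Feb"), (1, 6000, "Mar")])

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Build the 12 SUM(CASE ...) columns instead of typing them by hand.
month_columns = ",\n       ".join(
    f"SUM(CASE month WHEN '{m}' THEN revenue END) AS {m}_Revenue" for m in MONTHS)
query = f"SELECT id,\n       {month_columns}\nFROM Department\nGROUP BY id\nORDER BY id"

cursor = conn.execute(query)
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(len(header))  # 13
print(rows[0][:4])  # (1, 8000, 7000, 6000)
```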
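In another blog post (on the use of GROUP BY) I read that using GROUP BY on its own, without aggregate functions, only shows the first record of each group (strictly speaking, MySQL may return values from any row of the group).
I had assumed that each group contained just one relevant row, so a bare GROUP BY would simply show that row and no aggregate function would be needed. When I tried it, the answer did not pass. The assumption was wrong: grouping by id puts all three rows for id=1 into one group, and only the (id, month) pairs are unique.
So the lesson I took away: whenever you use GROUP BY, every selected column that is not in the GROUP BY clause should be wrapped in an aggregate function (MAX / MIN / SUM / AVG / COUNT).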
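The mistaken assumption is easy to check by counting rows per group. This is a minimal sketch, assuming an in-memory SQLite database in place of MySQL: grouped by id alone, id=1 has three rows, while every (id, month) group has exactly one.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (id INT, revenue INT, month VARCHAR(10), "
             "PRIMARY KEY (id, month))")
conn.executemany("INSERT INTO Department VALUES (?, ?, ?)",
                 [(1, 8000, "Jan"), (2, 9000, "Jan"), (3, 10000, "Feb"),
                  (1, 7000, "Feb"), (1, 6000, "Mar")])

# Rows per id group: id=1 has three rows, so a bare GROUP BY id must pick one.
per_id = conn.execute(
    "SELECT id, COUNT(*) FROM Department GROUP BY id ORDER BY id").fetchall()
# Rows per (id, month) group: always exactly one, thanks to the primary key.
per_id_month = conn.execute(
    "SELECT id, month, COUNT(*) FROM Department GROUP BY id, month").fetchall()

print(per_id)  # [(1, 3), (2, 1), (3, 1)]
print({count for _, _, count in per_id_month})  # {1}
```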
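Another very straightforward idea is to select the January rows, then the February rows, and so on, joining them on id all the way through December: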
-- DISTINCT is needed because the driving table a has one row per (id, month),
-- so a department with several months would otherwise appear several times.
-- Each LEFT JOIN matches at most one row, since (id, month) is the primary key.
SELECT DISTINCT
    a.id,
    Jan.revenue AS Jan_Revenue,
    Feb.revenue AS Feb_Revenue,
    Mar.revenue AS Mar_Revenue,
    Apr.revenue AS Apr_Revenue,
    May.revenue AS May_Revenue,
    Jun.revenue AS Jun_Revenue,
    Jul.revenue AS Jul_Revenue,
    Aug.revenue AS Aug_Revenue,
    Sep.revenue AS Sep_Revenue,
    Octo.revenue AS Oct_Revenue,
    Nov.revenue AS Nov_Revenue,
    Dece.revenue AS Dec_Revenue
FROM Department a
LEFT JOIN Department Jan  ON a.id = Jan.id  AND Jan.month  = 'Jan'
LEFT JOIN Department Feb  ON a.id = Feb.id  AND Feb.month  = 'Feb'
LEFT JOIN Department Mar  ON a.id = Mar.id  AND Mar.month  = 'Mar'
LEFT JOIN Department Apr  ON a.id = Apr.id  AND Apr.month  = 'Apr'
LEFT JOIN Department May  ON a.id = May.id  AND May.month  = 'May'
LEFT JOIN Department Jun  ON a.id = Jun.id  AND Jun.month  = 'Jun'
LEFT JOIN Department Jul  ON a.id = Jul.id  AND Jul.month  = 'Jul'
LEFT JOIN Department Aug  ON a.id = Aug.id  AND Aug.month  = 'Aug'
LEFT JOIN Department Sep  ON a.id = Sep.id  AND Sep.month  = 'Sep'
LEFT JOIN Department Octo ON a.id = Octo.id AND Octo.month = 'Oct'
LEFT JOIN Department Nov  ON a.id = Nov.id  AND Nov.month  = 'Nov'
LEFT JOIN Department Dece ON a.id = Dece.id AND Dece.month = 'Dec';
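The join approach can be compared against the aggregate approach on the sample data. This is a sketch, assuming an in-memory SQLite database in place of MySQL; the join is generated with a hypothetical `t_<Mon>` alias prefix (the query above uses Octo and Dece, presumably because DEC is a reserved word in MySQL).

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (id INT, revenue INT, month VARCHAR(10), "
             "PRIMARY KEY (id, month))")
conn.executemany("INSERT INTO Department VALUES (?, ?, ?)",
                 [(1, 8000, "Jan"), (2, 9000, "Jan"), (3, 10000, "Feb"),
                  (1, 7000, "Feb"), (1, 6000, "Mar")])

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical alias scheme t_<Mon>; the prefix sidesteps reserved words such as DEC.
select_cols = ",\n       ".join(f"t_{m}.revenue AS {m}_Revenue" for m in MONTHS)
joins = "\n".join(
    f"LEFT JOIN Department t_{m} ON a.id = t_{m}.id AND t_{m}.month = '{m}'"
    for m in MONTHS)
join_sql = (f"SELECT DISTINCT a.id,\n       {select_cols}\n"
            f"FROM Department a\n{joins}\nORDER BY a.id")

# Reference answer: the aggregate pivot (MAX over CASE WHEN) for all 12 months.
case_cols = ", ".join(f"MAX(CASE WHEN month = '{m}' THEN revenue END)" for m in MONTHS)
case_sql = f"SELECT id, {case_cols} FROM Department GROUP BY id ORDER BY id"

join_rows = conn.execute(join_sql).fetchall()
case_rows = conn.execute(case_sql).fetchall()
print(join_rows == case_rows)  # True
```

Both approaches return the same three rows on this data; the aggregate version scans the table once, whereas the join version joins the table to itself twelve times and then relies on DISTINCT to remove duplicates.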