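Summary: MaxCompute introduces new syntax, PIVOT/UNPIVOT. The PIVOT keyword uses aggregation to turn rows that hold one or more specified values into columns; the UNPIVOT keyword turns one or more columns into rows. Both cover row-to-column and column-to-row conversion with shorter queries than the hand-written equivalents, which makes big data developers more productive.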
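MaxCompute (formerly ODPS) is an industry-leading distributed big data processing platform developed in-house by Alibaba Cloud. It is widely used, particularly within Alibaba Group, where it supports the core business of multiple business units (BUs). Besides continuously optimizing performance, MaxCompute also works to improve the user experience and expressiveness of its SQL language, so that MaxCompute developers can be more productive.
Built on the new-generation MaxCompute 2.0 SQL engine, MaxCompute significantly improves the usability of SQL compilation and the expressiveness of the language. This is part of our in-depth MaxCompute series: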
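Part 1 - Making Good Use of the MaxCompute Compiler's Errors and Warnings
Part 2 - New Basic Data Types and Built-in Functions
Part 3 - Complex Types
Part 4 - CTE, VALUES, SEMIJOIN
Part 5 - SELECT TRANSFORM
Part 6 - User Defined Type
Part 7 - Grouping Set, Cube and Rollup
Part 8 - Dynamically Typed Functions
Part 9 - Script Mode and Parameterized Views
Part 10 - IF ELSE Branch Statements
Part 11 - QUALIFY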
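This article introduces the new PIVOT/UNPIVOT syntax in MaxCompute: the PIVOT keyword uses aggregation to turn rows that hold one or more specified values into columns, and the UNPIVOT keyword turns one or more columns into rows.
The PIVOT syntax rotates the specified row values into multiple columns and computes the aggregates for each group of the remaining columns, producing the rotated table. PIVOT is part of the FROM clause.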
SELECT ...
FROM ...
PIVOT (
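aggregate_function [AS alias] [, aggregate_function [AS alias]] ...
FOR (column [, column] ...)
IN (
(value [, value] ...) AS new_column
[, (value [, value] ...) AS new_column]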
...
)
)
[...]
For more detailed usage of the syntax, see the documentation.
PIVOT is equivalent to a combination of GROUP BY, aggregate functions, and FILTER. Take the following example:
SELECT ...
FROM ...
PIVOT (
agg1 AS a, agg2 AS b, ...
FOR (axis1, ..., axisN)
IN (
(v11, ..., v1N) AS label1,
(v21, ..., v2N) AS label2,
...)
)
The syntax above is equivalent to:
SELECT
k1, ..., kN,
agg1 FILTER (WHERE axis1 = v11 AND ... AND axisN = v1N) AS label1_a,
agg2 FILTER (WHERE axis1 = v11 AND ... AND axisN = v1N) AS label1_b,
...,
agg1 FILTER (WHERE axis1 = v21 AND ... AND axisN = v2N) AS label2_a,
agg2 FILTER (WHERE axis1 = v21 AND ... AND axisN = v2N) AS label2_b,
...
FROM xxxxxx
GROUP BY k1, ..., kN
Here the table in FROM is the result produced upstream of PIVOT, and k1, ..., kN is the set of all columns that appear in neither agg1, agg2, ... nor axis1, ..., axisN.
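The equivalence described above can be tried out without a MaxCompute project. The following is a minimal sketch, assuming SQLite (3.30 or later, for the aggregate FILTER clause) as a stand-in engine; it does not run PIVOT itself, only the GROUP BY + aggregate + FILTER form. The function name `pivot_by_hand` and the four-row slice of `shops_table` are chosen for illustration. Note that the inner subquery drops `sales`, so that only `item_name` and `year` remain as the grouping keys k1, ..., kN.

```python
import sqlite3

# A small slice of the article's shops_table (pen, shop1/shop2 only).
ROWS = [
    ("pen", 10, 500, "shop1", 2020),
    ("pen", 11, 500, "shop2", 2020),
    ("pen", 15, 200, "shop1", 2021),
    ("pen", 16, 300, "shop2", 2021),
]


def pivot_by_hand(rows):
    """Run the GROUP BY + FILTER form of a PIVOT on an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shops_table "
        '(item_name TEXT, "count" INTEGER, sales INTEGER, shop_name TEXT, year INTEGER)'
    )
    conn.executemany("INSERT INTO shops_table VALUES (?, ?, ?, ?, ?)", rows)
    # shop_name is the axis column; item_name and year are the leftover
    # columns, so they become the GROUP BY keys.
    sql = """
        SELECT item_name, year,
               SUM("count") FILTER (WHERE shop_name = 'shop1') AS shop1,
               SUM("count") FILTER (WHERE shop_name = 'shop2') AS shop2
        FROM (SELECT item_name, year, "count", shop_name FROM shops_table)
        GROUP BY item_name, year
        ORDER BY item_name, year
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in pivot_by_hand(ROWS):
        print(row)
```

If `sales` were left in the subquery, it would count as one more leftover column and would have to be added to the GROUP BY keys, which changes the shape of the result.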
create table shops_table as select * from (select * from values
('pen', 10, 500, 'shop1', 2020),
('pen', 11, 500, 'shop2', 2020),
('pen', 9, 300, 'shop3', 2020),
('pen', 12, 400,'shop4', 2020),
('pen', 15, 200, 'shop1', 2021),
('pen', 16, 300, 'shop2', 2021),
('pen', 16, 400, 'shop3', 2021),
('pen', 15, 300, 'shop4', 2021),
('ruler', 20, 700, 'shop1', 2020),
('ruler', 19, 900, 'shop2', 2020),
('ruler', 22, 800, 'shop3', 2020),
('ruler', 19, 700, 'shop4', 2020),
('ruler', 25, 300, 'shop1', 2021),
('ruler', 20, 500, 'shop2', 2021),
('ruler', 23, 500, 'shop3', 2021),
('ruler', 26, 600, 'shop4', 2021)
shops(item_name, count, sales, shop_name, year));
select * from shops_table;
-- Result:
+-----------+------------+------------+-----------+------------+
| item_name | count | sales | shop_name | year |
+-----------+------------+------------+-----------+------------+
| pen | 10 | 500 | shop1 | 2020 |
| pen | 11 | 500 | shop2 | 2020 |
| pen | 9 | 300 | shop3 | 2020 |
| pen | 12 | 400 | shop4 | 2020 |
| pen | 15 | 200 | shop1 | 2021 |
| pen | 16 | 300 | shop2 | 2021 |
| pen | 16 | 400 | shop3 | 2021 |
| pen | 15 | 300 | shop4 | 2021 |
| ruler | 20 | 700 | shop1 | 2020 |
| ruler | 19 | 900 | shop2 | 2020 |
| ruler | 22 | 800 | shop3 | 2020 |
| ruler | 19 | 700 | shop4 | 2020 |
| ruler | 25 | 300 | shop1 | 2021 |
| ruler | 20 | 500 | shop2 | 2021 |
| ruler | 23 | 500 | shop3 | 2021 |
| ruler | 26 | 600 | shop4 | 2021 |
+-----------+------------+------------+-----------+------------+
SELECT item_name
,year
,SUM(CASE shop_name WHEN 'shop1' THEN count END) AS shop1
,SUM(CASE shop_name WHEN 'shop2' THEN count END) AS shop2
,SUM(CASE shop_name WHEN 'shop3' THEN count END) AS shop3
,SUM(CASE shop_name WHEN 'shop4' THEN count END) AS shop4
FROM shops_table
GROUP BY item_name
,year
;
-- Result:
+-----------+------------+------------+------------+------------+------------+
| item_name | year | shop1 | shop2 | shop3 | shop4 |
+-----------+------------+------------+------------+------------+------------+
| pen | 2020 | 10 | 11 | 9 | 12 |
| pen | 2021 | 15 | 16 | 16 | 15 |
| ruler | 2020 | 20 | 19 | 22 | 19 |
| ruler | 2021 | 25 | 20 | 23 | 26 |
+-----------+------------+------------+------------+------------+------------+
select * from (select item_name, year,count,shop_name from shops_table)
pivot (sum(count) for shop_name in ('shop1', 'shop2', 'shop3', 'shop4'));
-- Result:
+------------+------------+------------+------------+------------+------------+
| item_name | year | 'shop1' | 'shop2' | 'shop3' | 'shop4' |
+------------+------------+------------+------------+------------+------------+
| pen | 2020 | 10 | 11 | 9 | 12 |
| pen | 2021 | 15 | 16 | 16 | 15 |
| ruler | 2020 | 20 | 19 | 22 | 19 |
| ruler | 2021 | 25 | 20 | 23 | 26 |
+------------+------------+------------+------------+------------+------------+
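You can also give aliases to the aggregate functions and to the new columns. Each output column name is then the column alias and the aggregate alias joined by an underscore: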
select * from (select item_name, count, shop_name, year from shops_table)
pivot (sum(count) as sum_count for shop_name in ('shop1' as shop_name_1, 'shop2' as shop_name_2, 'shop3' as shop_name_3, 'shop4' as shop_name_4));
-- Result:
+------------+------------+-----------------------+-----------------------+-----------------------+-----------------------+
| item_name | year | shop_name_1_sum_count | shop_name_2_sum_count | shop_name_3_sum_count | shop_name_4_sum_count |
+------------+------------+-----------------------+-----------------------+-----------------------+-----------------------+
| pen | 2020 | 10 | 11 | 9 | 12 |
| pen | 2021 | 15 | 16 | 16 | 15 |
| ruler | 2020 | 20 | 19 | 22 | 19 |
| ruler | 2021 | 25 | 20 | 23 | 26 |
+------------+------------+-----------------------+-----------------------+-----------------------+-----------------------+
select * from shops_table
pivot (sum(count) as sum_count, max(sales) as max_sales for shop_name in ('shop1' as shop_name_1, 'shop2' as shop_name_2, 'shop3' as shop_name_3, 'shop4' as shop_name_4));
-- Result:
+-----------+------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
| item_name | year | shop_name_1_sum_count | shop_name_2_sum_count | shop_name_3_sum_count | shop_name_4_sum_count | shop_name_1_max_sales | shop_name_2_max_sales | shop_name_3_max_sales | shop_name_4_max_sales |
+-----------+------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
| pen | 2020 | 10 | 11 | 9 | 12 | 500 | 500 | 300 | 400 |
| pen | 2021 | 15 | 16 | 16 | 15 | 200 | 300 | 400 | 300 |
| ruler | 2020 | 20 | 19 | 22 | 19 | 700 | 900 | 800 | 700 |
| ruler | 2021 | 25 | 20 | 23 | 26 | 300 | 500 | 500 | 600 |
+-----------+------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+
select * from shops_table
pivot (sum(count) as sum_count, max(sales) as max_sales for (shop_name, year) in (('shop1', 2021) as shop1_2021, ('shop1', 2020) as shop1_2020));
-- Result:
+-----------+----------------------+----------------------+----------------------+----------------------+
| item_name | shop1_2021_sum_count | shop1_2020_sum_count | shop1_2021_max_sales | shop1_2020_max_sales |
+-----------+----------------------+----------------------+----------------------+----------------------+
| pen | 15 | 10 | 200 | 500 |
| ruler | 25 | 20 | 300 | 700 |
+-----------+----------------------+----------------------+----------------------+----------------------+
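The naming rule and the multi-column FOR clause can be combined into one small generator for the equivalent query. This is a sketch under stated assumptions: `build_pivot_sql` and `run_pivot` are hypothetical helpers, SQLite (3.30 or later) stands in for MaxCompute, and the column order (all labels for the first aggregate, then all labels for the second) simply follows the result tables shown in this article. Values are inlined with `repr`, which is only acceptable for the trusted literals of an illustration.

```python
import sqlite3

# shop1 and shop2 rows from the article's shops_table.
ROWS = [
    ("pen", 10, 500, "shop1", 2020),
    ("pen", 11, 500, "shop2", 2020),
    ("pen", 15, 200, "shop1", 2021),
    ("pen", 16, 300, "shop2", 2021),
    ("ruler", 20, 700, "shop1", 2020),
    ("ruler", 19, 900, "shop2", 2020),
    ("ruler", 25, 300, "shop1", 2021),
    ("ruler", 20, 500, "shop2", 2021),
]


def build_pivot_sql(table, keys, aggregates, axes, labels):
    """Build the GROUP BY + FILTER query for a PIVOT over one or more axes.

    aggregates: list of (expression, alias) pairs
    axes:       axis column names from the FOR clause
    labels:     list of (values, label) pairs from the IN clause
    """
    columns = list(keys)
    for expr, agg_alias in aggregates:
        for values, label in labels:
            # One equality test per axis column, joined with AND.
            cond = " AND ".join(
                f"{axis} = {value!r}" for axis, value in zip(axes, values)
            )
            # Output name: label and aggregate alias joined by an underscore.
            columns.append(f"{expr} FILTER (WHERE {cond}) AS {label}_{agg_alias}")
    key_list = ", ".join(keys)
    return (
        f"SELECT {', '.join(columns)} FROM {table} "
        f"GROUP BY {key_list} ORDER BY {key_list}"
    )


def run_pivot(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shops_table "
        '(item_name TEXT, "count" INTEGER, sales INTEGER, shop_name TEXT, year INTEGER)'
    )
    conn.executemany("INSERT INTO shops_table VALUES (?, ?, ?, ?, ?)", rows)
    sql = build_pivot_sql(
        "shops_table",
        keys=["item_name"],  # the only column left after aggregates and axes
        aggregates=[('SUM("count")', "sum_count"), ("MAX(sales)", "max_sales")],
        axes=["shop_name", "year"],
        labels=[(("shop1", 2021), "shop1_2021"), (("shop1", 2020), "shop1_2020")],
    )
    cursor = conn.execute(sql)
    names = [column[0] for column in cursor.description]
    result = cursor.fetchall()
    conn.close()
    return names, result


if __name__ == "__main__":
    names, result = run_pivot(ROWS)
    print(names)
    for row in result:
        print(row)
```

Because both `shop_name` and `year` are axis columns here, `item_name` is the only grouping key left, which is why the result has one row per item.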
The UNPIVOT syntax rotates a table by converting columns into rows. UNPIVOT is part of the FROM clause.
SELECT ...
FROM ...
UNPIVOT [EXCLUDE NULLS] (
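new_value_column [, new_value_column] ...
FOR (new_name_column [, new_name_column] ...)
IN (
(column [, column] ...) AS (column_value [, column_value] ...)
[, (column [, column] ...) AS (column_value [, column_value] ...)]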
...
)
)
[...]
For more detailed usage of the syntax, see the documentation.
UNPIVOT is equivalent to a combination of CROSS JOIN and CASE WHEN expressions. Take the following example:
SELECT ...
FROM ...
UNPIVOT [exclude nulls] (
(measure1, ..., measureM)
FOR (axis1, ..., axisN)
IN ((c11, ..., c1M) AS (value11, ..., value1N),
(c21, ..., c2M) AS (value21, ..., value2N), ...))
[...]
The syntax above is equivalent to:
SELECT * FROM
(
SELECT
k1, ..., kN,
CASE
WHEN axis1 = value11 AND ... AND axisN = value1N THEN c11
WHEN axis1 = value21 AND ... AND axisN = value2N THEN c21
...
ELSE null END AS measure1,
...,
CASE
WHEN axis1 = value11 AND ... AND axisN = value1N THEN c1M
WHEN axis1 = value21 AND ... AND axisN = value2N THEN c2M
...
ELSE null END AS measureM,
axis1, ..., axisN
FROM xxxx
CROSS JOIN (VALUES (value11, ..., value1N), (value21, ..., value2N), ... AS generated_table_name(axis1, ..., axisN))
)
[WHERE measure1 is not null OR ... OR measureM is not null]
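To make the CROSS JOIN + CASE WHEN form concrete, here is a minimal sketch that runs it on SQLite as a stand-in for MaxCompute. It does not execute UNPIVOT itself. The function name `unpivot_by_hand`, the two-shop table layout, and the `eraser` row with a NULL value are assumptions made for illustration (the article's own data contains no NULLs); the optional WHERE clause plays the role that the equivalent form assigns to EXCLUDE NULLS.

```python
import sqlite3

ROWS = [
    ("pen", 2020, 100, 200),     # taken from the article's shops table
    ("eraser", 2020, 300, None), # hypothetical row, added to show NULL handling
]


def unpivot_by_hand(rows, exclude_nulls=False):
    """Run the CROSS JOIN + CASE form of an UNPIVOT on an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shops (item_name TEXT, year INTEGER, shop1 INTEGER, shop2 INTEGER)"
    )
    conn.executemany("INSERT INTO shops VALUES (?, ?, ?, ?)", rows)
    # The generated table holds one row per unpivoted column; the CASE
    # expression picks the matching source column for each joined row.
    sql = """
        SELECT * FROM (
            SELECT item_name, year,
                   CASE WHEN shop_name = 'shop1' THEN shop1
                        WHEN shop_name = 'shop2' THEN shop2
                        ELSE NULL END AS "count",
                   shop_name
            FROM shops
            CROSS JOIN (
                SELECT 'shop1' AS shop_name
                UNION ALL
                SELECT 'shop2'
            ) AS axis_values
        )
    """
    if exclude_nulls:
        sql += ' WHERE "count" IS NOT NULL'
    sql += " ORDER BY item_name, year, shop_name"
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(unpivot_by_hand(ROWS))
    print(unpivot_by_hand(ROWS, exclude_nulls=True))
```

Each input row is repeated once per entry in the generated table, so two source columns yield two output rows per input row unless the NULL filter removes some of them.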
create table shops as select * from (select * from values
('pen', 2020, 100, 200, 300, 400),
('pen', 2021, 100, 200, 200, 100),
('ruler', 2020, 300, 400, 300, 200),
('ruler', 2021, 400, 300, 100, 100)
shops(item_name, year, shop1, shop2, shop3, shop4));
SELECT * from shops;
-- Result:
+-----------+------------+------------+------------+------------+------------+
| item_name | year | shop1 | shop2 | shop3 | shop4 |
+-----------+------------+------------+------------+------------+------------+
| pen | 2020 | 100 | 200 | 300 | 400 |
| pen | 2021 | 100 | 200 | 200 | 100 |
| ruler | 2020 | 300 | 400 | 300 | 200 |
| ruler | 2021 | 400 | 300 | 100 | 100 |
+-----------+------------+------------+------------+------------+------------+
select * from(
select item_name,year, 'shop1' as shop_name, shop1 as count from shops
union ALL
select item_name,year, 'shop2' as shop_name, shop2 as count from shops
UNION ALL
select item_name,year, 'shop3' as shop_name, shop3 as count from shops
UNION ALL
select item_name,year, 'shop4' as shop_name, shop4 as count from shops
);
-- Result:
+------------+------------+------------+------------+
| item_name | year | shop_name | count |
+------------+------------+------------+------------+
| pen | 2020 | shop1 | 100 |
| pen | 2021 | shop1 | 100 |
| ruler | 2020 | shop1 | 300 |
| ruler | 2021 | shop1 | 400 |
| pen | 2020 | shop2 | 200 |
| pen | 2021 | shop2 | 200 |
| ruler | 2020 | shop2 | 400 |
| ruler | 2021 | shop2 | 300 |
| pen | 2020 | shop3 | 300 |
| pen | 2021 | shop3 | 200 |
| ruler | 2020 | shop3 | 300 |
| ruler | 2021 | shop3 | 100 |
| pen | 2020 | shop4 | 400 |
| pen | 2021 | shop4 | 100 |
| ruler | 2020 | shop4 | 200 |
| ruler | 2021 | shop4 | 100 |
+------------+------------+------------+------------+
select * from shops
unpivot (count for shop_name in (shop1, shop2, shop3, shop4));
-- Result:
+------------+------------+------------+------------+
| item_name | year | count | shop_name |
+------------+------------+------------+------------+
| pen | 2020 | 100 | shop1 |
| pen | 2020 | 200 | shop2 |
| pen | 2020 | 300 | shop3 |
| pen | 2020 | 400 | shop4 |
| pen | 2021 | 100 | shop1 |
| pen | 2021 | 200 | shop2 |
| pen | 2021 | 200 | shop3 |
| pen | 2021 | 100 | shop4 |
| ruler | 2020 | 300 | shop1 |
| ruler | 2020 | 400 | shop2 |
| ruler | 2020 | 300 | shop3 |
| ruler | 2020 | 200 | shop4 |
| ruler | 2021 | 400 | shop1 |
| ruler | 2021 | 300 | shop2 |
| ruler | 2021 | 100 | shop3 |
| ruler | 2021 | 100 | shop4 |
+------------+------------+------------+------------+
select * from shops
unpivot ((count1, count2) for shop_name in ((shop1, shop2) as 'east_shop', (shop3, shop4) as 'west_shop'));
-- Result:
+------------+------------+------------+------------+------------+
| item_name | year | count1 | count2 | shop_name |
+------------+------------+------------+------------+------------+
| pen | 2020 | 100 | 200 | east_shop |
| pen | 2020 | 300 | 400 | west_shop |
| pen | 2021 | 100 | 200 | east_shop |
| pen | 2021 | 200 | 100 | west_shop |
| ruler | 2020 | 300 | 400 | east_shop |
| ruler | 2020 | 300 | 200 | west_shop |
| ruler | 2021 | 400 | 300 | east_shop |
| ruler | 2021 | 100 | 100 | west_shop |
+------------+------------+------------+------------+------------+
An alias can consist of multiple values, in which case the list of new name columns in the FOR clause must grow to match:
select * from shops
unpivot ((count1, count2) for (shop_name, location) in ((shop1, shop2) as ('east_shop', 'east'), (shop3, shop4) as ('west_shop', 'west')));
-- Result:
+------------+------------+------------+------------+------------+------------+
| item_name | year | count1 | count2 | shop_name | location |
+------------+------------+------------+------------+------------+------------+
| pen | 2020 | 100 | 200 | east_shop | east |
| pen | 2020 | 300 | 400 | west_shop | west |
| pen | 2021 | 100 | 200 | east_shop | east |
| pen | 2021 | 200 | 100 | west_shop | west |
| ruler | 2020 | 300 | 400 | east_shop | east |
| ruler | 2020 | 300 | 200 | west_shop | west |
| ruler | 2021 | 400 | 300 | east_shop | east |
| ruler | 2021 | 100 | 100 | west_shop | west |
+------------+------------+------------+------------+------------+------------+
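The same CROSS JOIN + CASE WHEN reading extends to several value columns and several name columns at once. The sketch below, again assuming SQLite as a stand-in and using a hypothetical function name `unpivot_pairs`, reproduces the two-measure, two-axis case on two rows of the article's `shops` table: one CASE expression per value column, and one AND-joined condition per alias tuple.

```python
import sqlite3

# Two rows from the article's shops table.
ROWS = [
    ("pen", 2020, 100, 200, 300, 400),
    ("ruler", 2021, 400, 300, 100, 100),
]


def unpivot_pairs(rows):
    """Unpivot (shop1, shop2) and (shop3, shop4) into count1/count2 by hand."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shops (item_name TEXT, year INTEGER, "
        "shop1 INTEGER, shop2 INTEGER, shop3 INTEGER, shop4 INTEGER)"
    )
    conn.executemany("INSERT INTO shops VALUES (?, ?, ?, ?, ?, ?)", rows)
    # count1 takes the first column of each group, count2 the second;
    # (shop_name, location) carries the alias tuple of the group.
    sql = """
        SELECT item_name, year,
               CASE WHEN shop_name = 'east_shop' AND location = 'east' THEN shop1
                    WHEN shop_name = 'west_shop' AND location = 'west' THEN shop3
               END AS count1,
               CASE WHEN shop_name = 'east_shop' AND location = 'east' THEN shop2
                    WHEN shop_name = 'west_shop' AND location = 'west' THEN shop4
               END AS count2,
               shop_name, location
        FROM shops
        CROSS JOIN (
            SELECT 'east_shop' AS shop_name, 'east' AS location
            UNION ALL
            SELECT 'west_shop', 'west'
        ) AS axis_values
        ORDER BY item_name, year, shop_name
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in unpivot_pairs(ROWS):
        print(row)
```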
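The PIVOT/UNPIVOT syntax covers row-to-column and column-to-row conversion with shorter queries than the hand-written equivalents, which makes big data developers more productive.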