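Row/column transposition in Hive queries
1. Case 1: Find the IDs of students whose math score is higher than their Chinese score
(1) Requirements analysis
The Hive table score contains the following data: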
hive> select * from score;
1 1 yuwen 43
2 1 shuxue 55
3 2 yuwen 77
4 2 shuxue 88
5 3 yuwen 98
6 3 shuxue 65
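The columns are:
id (int)          ID of this record
sid (int)         student ID
subject (string)  subject (yuwen = Chinese, shuxue = math)
score (int)       score
Requirement:
Find the IDs of students whose math score is higher than their Chinese score.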
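To follow along, the sample table can be recreated roughly as below. This is only a sketch: the column types are taken from the field description above, while the default storage format and the use of INSERT ... VALUES (which requires Hive 0.14 or later) are assumptions, not something the original setup specifies.

```sql
-- Sketch: rebuild the sample score table used in Case 1.
-- Assumes Hive 0.14+ (INSERT ... VALUES) and the default storage format.
CREATE TABLE IF NOT EXISTS score (
  id      INT,     -- ID of this record
  sid     INT,     -- student ID
  subject STRING,  -- 'yuwen' (Chinese) or 'shuxue' (math)
  score   INT      -- score for that subject
);

INSERT INTO TABLE score VALUES
  (1, 1, 'yuwen',  43),
  (2, 1, 'shuxue', 55),
  (3, 2, 'yuwen',  77),
  (4, 2, 'shuxue', 88),
  (5, 3, 'yuwen',  98),
  (6, 3, 'shuxue', 65);
```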
(2) Method 1 (join)
SELECT s1.sid FROM score s1 INNER JOIN score s2
ON s1.sid = s2.sid
AND s1.score > s2.score
AND s1.subject = 'shuxue'
AND s2.subject = 'yuwen';
# Result
1
2
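(3) Method 2 (row-to-column transposition)
Approach: (1) use CASE to spread each subject's score into its own column; (2) collapse the rows of each student into one with max() and GROUP BY (max() ignores NULLs, so the single non-NULL value per column survives); (3) compare the two columns.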
--(1)
CREATE TABLE t1 AS
SELECT sid,
CASE subject WHEN 'yuwen' THEN score END AS yuwen,
CASE subject WHEN 'shuxue' THEN score END AS shuxue
FROM score;
Data in t1:
1 43 NULL
1 NULL 55
2 77 NULL
2 NULL 88
3 98 NULL
3 NULL 65
--(2)
CREATE TABLE t2 AS
SELECT sid, max(yuwen) yuwen, max(shuxue) shuxue
FROM t1
GROUP BY sid;
Data in t2:
1 43 55
2 77 88
3 98 65
--(3)
SELECT sid FROM t2 WHERE shuxue > yuwen;
Result:
1
2
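The three steps above do not need intermediate tables. As a minimal sketch, the CASE expressions can be placed directly inside max() and the pivoted rows filtered in an outer query; the subquery alias pivoted is a name chosen here for illustration.

```sql
-- Sketch: Method 2 as a single query, without the t1/t2 helper tables.
SELECT sid
FROM (
  SELECT sid,
         -- one column per subject; max() skips the NULLs from non-matching rows
         max(CASE WHEN subject = 'yuwen'  THEN score END) AS yuwen,
         max(CASE WHEN subject = 'shuxue' THEN score END) AS shuxue
  FROM score
  GROUP BY sid
) pivoted
WHERE shuxue > yuwen;
```

A student who lacks either subject ends up with NULL in that column; the comparison then evaluates to NULL and the row is filtered out, which matches the behavior of the inner join in Method 1.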
2. Case 2: Row-to-column transposition of a sales table
(1) Requirements
The Hive table sales contains the following data:
hive> select * from sales;
sales.y sales.season sales.sale
1991 1 11
1991 2 12
1991 3 13
1991 4 14
1992 1 21
1992 2 22
1992 3 23
1992 4 24
The columns are:
y       year
season  quarter
sale    sales volume
Requirement: show each year's sales for all four quarters on a single row.
(2) Implementation
SELECT
tmp.y,
max(tmp.season1) season1,
max(tmp.season2) season2,
max(tmp.season3) season3,
max(tmp.season4) season4
FROM (SELECT y,
CASE season WHEN 1 THEN sale END AS season1,
CASE season WHEN 2 THEN sale END AS season2,
CASE season WHEN 3 THEN sale END AS season3,
CASE season WHEN 4 THEN sale END AS season4
FROM sales) tmp
GROUP BY tmp.y;
Result:
tmp.y season1 season2 season3 season4
1991 11 12 13 14
1992 21 22 23 24
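The same pivot can also be written without the inner subquery by aggregating the CASE expressions directly. The sketch below assumes, as in the sample data, at most one row per (y, season) pair; if a quarter could appear more than once, max() would keep only the largest value, and sum() might be the better aggregate.

```sql
-- Sketch: one-level conditional aggregation, equivalent to the query above
-- when each (y, season) pair occurs at most once.
SELECT y,
       max(CASE WHEN season = 1 THEN sale END) AS season1,
       max(CASE WHEN season = 2 THEN sale END) AS season2,
       max(CASE WHEN season = 3 THEN sale END) AS season3,
       max(CASE WHEN season = 4 THEN sale END) AS season4
FROM sales
GROUP BY y;
```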
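3. Case 3: Column-to-row transposition of a student score table
(1) Requirements
Given the following student score table, also named score (its schema differs from the table in Case 1):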
id sname math computer english
1 Jed 34 58 58
2 Tony 45 87 45
3 Tom 76 34 89
Write a SQL statement that converts the table above into the following one:
id sname course score
1 Jed computer 58
1 Jed english 58
1 Jed math 34
2 Tony computer 87
2 Tony english 45
2 Tony math 45
3 Tom computer 34
3 Tom english 89
3 Tom math 76
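(2) Implementation
-- UNION ALL is used instead of UNION: the course literal already makes the rows
-- of the three branches distinct, so de-duplication is unnecessary work, and
-- Hive versions before 1.2.0 support only UNION ALL.
select id, sname, 'math' as course, math as score from score
union all
select id, sname, 'computer' as course, computer as score from score
union all
select id, sname, 'english' as course, english as score from score
order by id, sname, course;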
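An alternative that reads the table only once is to pack the three columns into a map and expand it with LATERAL VIEW explode(). This is a sketch rather than the original method; it assumes math, computer and english all have the same type (map() needs one value type), and the aliases s and t are names chosen here for illustration.

```sql
-- Sketch: column-to-row transposition with a single scan of the table.
-- explode() on a map emits one (key, value) row per entry.
SELECT s.id,
       s.sname,
       t.course,
       t.score
FROM score s
LATERAL VIEW explode(
  map('math', s.math, 'computer', s.computer, 'english', s.english)
) t AS course, score
ORDER BY id, sname, course;
```

Adding another subject then means adding one more key/value pair to map() instead of another union branch.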