References:
1. Hive rows-to-columns and columns-to-rows
https://www.cnblogs.com/blogyuhan/p/9274784.html
2. Hive rows-to-columns and columns-to-rows
https://blog.csdn.net/jiantianming2/article/details/79189672
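Rows to columns: convert multiple rows of data into multiple columns of a single row.
Columns to rows: convert multiple columns of a single row into multiple rows.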
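Requirements
1. We have multiple rows of user study data with the columns uid, subject, and score. Each user is known to study three subjects: English, math, and Chinese.
Convert uid, subject, score into uid, math_score, chinese_score, english_score. A subject with no score for a user is reported as -1.
SQL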
INSERT OVERWRITE TABLE user_subject_score2
SELECT
uid
,MAX(
CASE
WHEN subject = 'math' THEN score
ELSE -1
END
) AS math_score
,MAX(
CASE
WHEN subject = 'chinese' THEN score
ELSE -1
END
) AS chinese_score
,MAX(
CASE
WHEN subject = 'english' THEN score
ELSE -1
END
) AS english_score
FROM user_subject_score1
GROUP BY uid
;
+--------------------------+---------------------------------+------------------------------------+------------------------------------+
| user_subject_score2.uid | user_subject_score2.math_score | user_subject_score2.chinese_score | user_subject_score2.english_score |
+--------------------------+---------------------------------+------------------------------------+------------------------------------+
| 1 | 80.0 | 76.0 | 80.0 |
| 2 | 88.0 | 88.0 | -1.0 |
| 3 | 66.0 | 30.0 | -1.0 |
+--------------------------+---------------------------------+------------------------------------+------------------------------------+
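The source table user_subject_score1 is not shown above. The following is a minimal sketch of a setup that would be consistent with the output. The column types and the sample rows are assumptions inferred from the result table, not taken from the original. The final query is an optional variant that drops the ELSE branch, so a missing subject comes back as NULL rather than the -1 sentinel (MAX ignores NULL values).

```sql
-- Assumed schema: the original article does not show these definitions.
CREATE TABLE IF NOT EXISTS user_subject_score1 (
uid BIGINT
,subject STRING
,score DOUBLE
);
CREATE TABLE IF NOT EXISTS user_subject_score2 (
uid BIGINT
,math_score DOUBLE
,chinese_score DOUBLE
,english_score DOUBLE
);

-- Sample rows inferred from the output above
-- (users 2 and 3 have no English score).
INSERT OVERWRITE TABLE user_subject_score1 VALUES
(1, 'math', 80.0)
,(1, 'chinese', 76.0)
,(1, 'english', 80.0)
,(2, 'math', 88.0)
,(2, 'chinese', 88.0)
,(3, 'math', 66.0)
,(3, 'chinese', 30.0)
;

-- Variant: with no ELSE branch, CASE returns NULL for non-matching rows,
-- so a subject the user never took shows up as NULL instead of -1.
SELECT
uid
,MAX(CASE WHEN subject = 'math' THEN score END) AS math_score
,MAX(CASE WHEN subject = 'chinese' THEN score END) AS chinese_score
,MAX(CASE WHEN subject = 'english' THEN score END) AS english_score
FROM user_subject_score1
GROUP BY uid
;
```

NULL is usually easier to handle downstream than -1, because aggregates such as AVG skip it instead of treating it as a real score.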
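2. Given user IDs and order IDs, produce the list of all orders for each user.
uid, order_id -> uid, order_ids
This can be done with two functions, CONCAT_WS and COLLECT_LIST. Their built-in descriptions are shown below.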
CONCAT_WS
+----------------------------------------------------+
| tab_name |
+----------------------------------------------------+
| concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator. |
+----------------------------------------------------+
COLLECT_LIST
+----------------------------------------------------+
| tab_name |
+----------------------------------------------------+
| collect_list(x) - Returns a list of objects with duplicates |
+----------------------------------------------------+
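As a quick illustration of the CONCAT_WS signature described above, here is a small sketch using literal values only. Note that the function accepts strings or an array of strings, which is why a numeric column such as order_id has to be cast to STRING before it is collected and joined.

```sql
-- Separate string arguments joined with the separator.
SELECT CONCAT_WS(',', 'a', 'b', 'c');          -- a,b,c

-- A single array<string> argument works the same way.
SELECT CONCAT_WS(',', ARRAY('a', 'b', 'c'));   -- a,b,c
```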
Table creation and data loading
use data_warehouse_test;
CREATE TABLE IF NOT EXISTS user_order (
uid BIGINT
,order_id BIGINT
);
CREATE TABLE IF NOT EXISTS user_orders (
uid BIGINT
,order_ids STRING
);
INSERT OVERWRITE TABLE user_order VALUES
(1, 112)
,(1, 123)
,(2, 234)
,(2, 21)
,(3, 821)
;
Analysis SQL
use data_warehouse_test;
INSERT OVERWRITE TABLE user_orders
SELECT
uid
,CONCAT_WS(',', COLLECT_LIST(order_str)) AS order_ids
FROM
(
SELECT uid , CAST(order_id AS STRING) AS order_str
FROM user_order
) tmp
GROUP BY uid
;
Output
+------------------+------------------------+
| user_orders.uid | user_orders.order_ids |
+------------------+------------------------+
| 1 | 112,123 |
| 2 | 234,21 |
| 3 | 821 |
+------------------+------------------------+
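Two variations on the query above may be useful. This is a sketch against the same user_order table, not part of the original article. COLLECT_SET removes duplicate values, whereas COLLECT_LIST keeps them. The order of elements produced by COLLECT_LIST is not guaranteed, so wrapping the array in SORT_ARRAY gives a stable result. Because the elements are strings here, the sort is lexicographic, not numeric.

```sql
-- Deduplicated order list per user.
SELECT
uid
,CONCAT_WS(',', COLLECT_SET(CAST(order_id AS STRING))) AS order_ids
FROM user_order
GROUP BY uid
;

-- Deterministic (lexicographically sorted) order list per user.
SELECT
uid
,CONCAT_WS(',', SORT_ARRAY(COLLECT_LIST(CAST(order_id AS STRING)))) AS order_ids
FROM user_order
GROUP BY uid
;
```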
USE data_warehouse_test;
CREATE TABLE IF NOT EXISTS explode_laterview_org(
day1_num BIGINT
,day2_num BIGINT
,day3_num BIGINT
,day4_num BIGINT
,day5_num BIGINT
,day6_num BIGINT
,day7_num BIGINT
,campaign_name STRING
,campaign_id BIGINT
);
INSERT OVERWRITE TABLE explode_laterview_org VALUES
(40, 20, 10, 4, 4, 2, 1, 'zoo', 2 )
,(100, 80, 53, 40, 7, 6, 5, 'moji', 3)
;
The data in this table needs to be converted into the following format (columns to rows):
+--------------+----------------+-----------+------+
| campaign_id | campaign_name | type | num |
+--------------+----------------+-----------+------+
| 2 | zoo | day1_num | 40 |
| 2 | zoo | day2_num | 20 |
| 2 | zoo | day3_num | 10 |
| 2 | zoo | day4_num | 4 |
| 2 | zoo | day5_num | 4 |
| 2 | zoo | day6_num | 2 |
| 2 | zoo | day7_num | 1 |
| 3 | moji | day1_num | 100 |
| 3 | moji | day2_num | 80 |
| 3 | moji | day3_num | 53 |
| 3 | moji | day4_num | 40 |
| 3 | moji | day5_num | 7 |
| 3 | moji | day6_num | 6 |
| 3 | moji | day7_num | 5 |
+--------------+----------------+-----------+------+
Method 1: UNION ALL
SELECT campaign_id, campaign_name, 'day1_num', day1_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day2_num', day2_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day3_num', day3_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day4_num', day4_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day5_num', day5_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day6_num', day6_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day7_num', day7_num
FROM explode_laterview_org
;
Output
+------------------+--------------------+-----------+---------------+
| _u1.campaign_id | _u1.campaign_name | _u1._c2 | _u1.day1_num |
+------------------+--------------------+-----------+---------------+
| 2 | zoo | day1_num | 40 |
| 2 | zoo | day2_num | 20 |
| 2 | zoo | day3_num | 10 |
| 2 | zoo | day4_num | 4 |
| 2 | zoo | day5_num | 4 |
| 2 | zoo | day6_num | 2 |
| 2 | zoo | day7_num | 1 |
| 3 | moji | day1_num | 100 |
| 3 | moji | day2_num | 80 |
| 3 | moji | day3_num | 53 |
| 3 | moji | day4_num | 40 |
| 3 | moji | day5_num | 7 |
| 3 | moji | day6_num | 6 |
| 3 | moji | day7_num | 5 |
+------------------+--------------------+-----------+---------------+
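Tips: Method 1 (UNION ALL) is convenient when there are only a few metrics. When many metric columns need to be converted into rows, the amount of code grows quickly and becomes hard to maintain.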
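In the UNION ALL output above, the last two columns carry generated names (_c2 and day1_num) rather than the type and num headers of the target format. One way to get the intended names is to alias the columns in the first branch, since the result of a UNION ALL takes its column names from the first SELECT. A shortened sketch with only two branches:

```sql
-- Aliases in the first branch name the columns of the whole result.
SELECT campaign_id, campaign_name, 'day1_num' AS type, day1_num AS num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day2_num', day2_num
FROM explode_laterview_org
;
```

The remaining day3_num to day7_num branches follow the same pattern as the second branch.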
Method 2: LATERAL VIEW with EXPLODE and STR_TO_MAP
SELECT
campaign_id, campaign_name, type, num
FROM explode_laterview_org
LATERAL VIEW
EXPLODE(
STR_TO_MAP(
CONCAT(
'day1_num=',CAST (day1_num AS STRING),
'&day2_num=',CAST (day2_num AS STRING),
'&day3_num=',CAST (day3_num AS STRING),
'&day4_num=',CAST (day4_num AS STRING),
'&day5_num=',CAST (day5_num AS STRING),
'&day6_num=',CAST (day6_num AS STRING),
'&day7_num=',CAST (day7_num AS STRING)
)
,'&', '=')
) lateral_table AS type, num
;
+--------------+----------------+-----------+------+
| campaign_id | campaign_name | type | num |
+--------------+----------------+-----------+------+
| 2 | zoo | day1_num | 40 |
| 2 | zoo | day2_num | 20 |
| 2 | zoo | day3_num | 10 |
| 2 | zoo | day4_num | 4 |
| 2 | zoo | day5_num | 4 |
| 2 | zoo | day6_num | 2 |
| 2 | zoo | day7_num | 1 |
| 3 | moji | day1_num | 100 |
| 3 | moji | day2_num | 80 |
| 3 | moji | day3_num | 53 |
| 3 | moji | day4_num | 40 |
| 3 | moji | day5_num | 7 |
| 3 | moji | day6_num | 6 |
| 3 | moji | day7_num | 5 |
+--------------+----------------+-----------+------+
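To make the intermediate step easier to follow, here is a small sketch of what STR_TO_MAP produces for a literal string, followed by a shortened version of the query that casts the value back to a number. STR_TO_MAP returns a map of strings to strings, so the num column in the result above is a STRING, not a BIGINT. The alias num_str is a name chosen here for illustration only.

```sql
-- '&' separates the pairs, '=' separates key from value.
SELECT STR_TO_MAP('day1_num=40&day2_num=20', '&', '=');
-- yields a map<string,string>: {"day1_num":"40","day2_num":"20"}

-- EXPLODE on a map emits one row per entry with two columns (key, value).
-- Cast the value back to BIGINT if a numeric column is needed.
SELECT
campaign_id, campaign_name, type, CAST(num_str AS BIGINT) AS num
FROM explode_laterview_org
LATERAL VIEW
EXPLODE(
STR_TO_MAP(
CONCAT(
'day1_num=', CAST(day1_num AS STRING),
'&day2_num=', CAST(day2_num AS STRING)
)
,'&', '=')
) lateral_table AS type, num_str
;
```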
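Tips
Method 2 greatly reduces repeated code, but it requires familiarity with LATERAL VIEW, STR_TO_MAP, CONCAT, and related functions. CONCAT is used here to build a string of key-value pairs. For example, when a single SQL statement in a project computes more than ten metrics and each metric value later has to be stored in its own row, Method 2 can make the code many times shorter than Method 1, which greatly improves readability and maintainability. The key names can also be chosen freely.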
If you are not familiar with these functions, see my article:
https://blog.csdn.net/u010003835/article/details/106632597