Hive HQL: Rows to Columns and Columns to Rows

References:

1. HIVE rows to columns and columns to rows

https://www.cnblogs.com/blogyuhan/p/9274784.html

2. HIVE rows to columns and columns to rows

https://blog.csdn.net/jiantianming2/article/details/79189672

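Rows to columns: convert multiple rows of data into multiple columns of a single row.

Columns to rows: convert multiple columns of a single row into multiple rows.

Rows to columns

Method 1: GROUP BY + CASE WHEN + an aggregate function

Requirement

1. We have multiple rows describing what each user studies: uid, subject, score. Every user is known to study three subjects: English, math, and Chinese.

Convert uid, subject, score into uid, math_score, chinese_score, english_score.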
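The source and target tables for this example are not defined in the text. A minimal sketch for reproducing it, assuming score is stored as DOUBLE (the result further down prints 80.0) and that subject holds the literal values 'math', 'chinese' and 'english' used in the query. The sample rows are reconstructed from the result table, not taken from the author's data:

```sql
-- Assumed schema: column types are a guess based on the printed result.
CREATE TABLE IF NOT EXISTS user_subject_score1 (
	uid BIGINT
	,subject STRING
	,score DOUBLE
);

CREATE TABLE IF NOT EXISTS user_subject_score2 (
	uid BIGINT
	,math_score DOUBLE
	,chinese_score DOUBLE
	,english_score DOUBLE
);

-- Rows reconstructed from the result table; users 2 and 3 have no English row.
INSERT OVERWRITE TABLE user_subject_score1 VALUES
(1, 'math', 80.0)
,(1, 'chinese', 76.0)
,(1, 'english', 80.0)
,(2, 'math', 88.0)
,(2, 'chinese', 88.0)
,(3, 'math', 66.0)
,(3, 'chinese', 30.0)
;
```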

SQL

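-- Pivot: one output row per uid, one column per subject.
-- MAX keeps the real score over the -1 placeholder;
-- -1 remains only when the user has no row for that subject.
INSERT OVERWRITE TABLE user_subject_score2
SELECT 
	uid
	,MAX( 
		CASE
			WHEN subject = 'math' THEN score
			ELSE -1
		END
	) AS math_score
	,MAX( 
		CASE
			WHEN subject = 'chinese' THEN score
			ELSE -1
		END
	) AS chinese_score
	,MAX( 
		CASE
			WHEN subject = 'english' THEN score
			ELSE -1
		END
	) AS english_score
FROM user_subject_score1
GROUP BY uid
;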

+--------------------------+---------------------------------+------------------------------------+------------------------------------+
| user_subject_score2.uid  | user_subject_score2.math_score  | user_subject_score2.chinese_score  | user_subject_score2.english_score  |
+--------------------------+---------------------------------+------------------------------------+------------------------------------+
| 1                        | 80.0                            | 76.0                               | 80.0                               |
| 2                        | 88.0                            | 88.0                               | -1.0                               |
| 3                        | 66.0                            | 30.0                               | -1.0                               |
+--------------------------+---------------------------------+------------------------------------+------------------------------------+
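In the result above, -1.0 marks a subject for which the user has no row. If a missing score should show up as NULL rather than a sentinel value, one possible variant (an assumption about the desired behaviour, not the author's query) drops the ELSE branch. CASE then yields NULL for non-matching rows, and MAX ignores NULLs:

```sql
-- Variant: missing subjects become NULL instead of -1.
SELECT
	uid
	,MAX(CASE WHEN subject = 'math' THEN score END) AS math_score
	,MAX(CASE WHEN subject = 'chinese' THEN score END) AS chinese_score
	,MAX(CASE WHEN subject = 'english' THEN score END) AS english_score
FROM user_subject_score1
GROUP BY uid
;
```

This also avoids confusing the placeholder with a genuine score if negative values were ever allowed.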

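Method 2: GROUP BY + CONCAT_WS + COLLECT_LIST

Given user IDs and order IDs, produce the list of all orders for each user.

uid, order_id -> uid, order_ids

First, here is how Hive itself describes these two functions.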

CONCAT_WS 

+----------------------------------------------------+
|                      tab_name                      |
+----------------------------------------------------+
| concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator. |
+----------------------------------------------------+
 

COLLECT_LIST

+----------------------------------------------------+
|                      tab_name                      |
+----------------------------------------------------+
| collect_list(x) - Returns a list of objects with duplicates |
+----------------------------------------------------+
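The two descriptions can be tried out without any table. A small sketch with inline literals (the derived table t and its column v are made up for illustration); COLLECT_SET is shown next to COLLECT_LIST only to contrast the handling of duplicates:

```sql
-- CONCAT_WS accepts plain strings or an array of strings.
SELECT CONCAT_WS(',', 'a', 'b', 'c');        -- a,b,c
SELECT CONCAT_WS(',', ARRAY('x', 'y'));      -- x,y

-- COLLECT_LIST keeps duplicates, COLLECT_SET removes them.
-- The order of elements inside the arrays is not guaranteed.
SELECT
	COLLECT_LIST(v) AS with_duplicates
	,COLLECT_SET(v) AS without_duplicates
FROM
(
	SELECT 'a' AS v
	UNION ALL
	SELECT 'a' AS v
	UNION ALL
	SELECT 'b' AS v
) t
;
```

Because CONCAT_WS only takes strings or arrays of strings, numeric columns have to be cast to STRING before they are collected.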
 

Table definitions and test data

use data_warehouse_test;

CREATE TABLE IF NOT EXISTS user_order (
	uid BIGINT
	,order_id BIGINT
);

CREATE TABLE IF NOT EXISTS user_orders (
	uid BIGINT
	,order_ids STRING
);

INSERT OVERWRITE TABLE user_order VALUES 
(1, 112)
,(1, 123)
,(2, 234)
,(2, 21)
,(3, 821)
;

Query

use data_warehouse_test;

INSERT OVERWRITE TABLE user_orders
SELECT 
	uid
	,CONCAT_WS(',', COLLECT_LIST(order_str)) AS order_ids
FROM
(
	SELECT uid , CAST(order_id AS STRING) AS order_str
	FROM user_order
) tmp
GROUP BY uid
;

Output

+------------------+------------------------+
| user_orders.uid  | user_orders.order_ids  |
+------------------+------------------------+
| 1                | 112,123                |
| 2                | 234,21                 |
| 3                | 821                    |
+------------------+------------------------+
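COLLECT_LIST does not promise any particular element order, so the order inside order_ids can differ between runs. If a stable order matters, one option (a sketch, not part of the original query) is to wrap the list in SORT_ARRAY before joining it:

```sql
SELECT
	uid
	-- SORT_ARRAY sorts the strings lexicographically,
	-- so '21' comes before '234' and '112' before '123'.
	,CONCAT_WS(',', SORT_ARRAY(COLLECT_LIST(CAST(order_id AS STRING)))) AS order_ids
FROM user_order
GROUP BY uid
;
```

The sort is on the string form of the IDs, which is not the same as numeric order when the IDs have different lengths.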
 

Columns to rows

Build the test data

USE data_warehouse_test;
 
CREATE TABLE IF NOT EXISTS explode_laterview_org(
	day1_num BIGINT
	,day2_num BIGINT
	,day3_num BIGINT
	,day4_num BIGINT
	,day5_num BIGINT
	,day6_num BIGINT
	,day7_num BIGINT
	,campaign_name STRING
	,campaign_id BIGINT
);
 
 
INSERT OVERWRITE TABLE explode_laterview_org VALUES 
(40, 20, 10, 4, 4, 2, 1, 'zoo', 2 )
,(100, 80, 53, 40, 7, 6, 5, 'moji', 3)
;

The data in the table needs to be converted into the following format:

+--------------+----------------+-----------+------+
| campaign_id  | campaign_name  |   type    | num  |
+--------------+----------------+-----------+------+
| 2            | zoo            | day1_num  | 40   |
| 2            | zoo            | day2_num  | 20   |
| 2            | zoo            | day3_num  | 10   |
| 2            | zoo            | day4_num  | 4    |
| 2            | zoo            | day5_num  | 4    |
| 2            | zoo            | day6_num  | 2    |
| 2            | zoo            | day7_num  | 1    |
| 3            | moji           | day1_num  | 100  |
| 3            | moji           | day2_num  | 80   |
| 3            | moji           | day3_num  | 53   |
| 3            | moji           | day4_num  | 40   |
| 3            | moji           | day5_num  | 7    |
| 3            | moji           | day6_num  | 6    |
| 3            | moji           | day7_num  | 5    |
+--------------+----------------+-----------+------+

Method 1: UNION ALL

SELECT campaign_id, campaign_name, 'day1_num', day1_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day2_num', day2_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day3_num', day3_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day4_num', day4_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day5_num', day5_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day6_num', day6_num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day7_num', day7_num
FROM explode_laterview_org
;

Output

+------------------+--------------------+-----------+---------------+
| _u1.campaign_id  | _u1.campaign_name  |  _u1._c2  | _u1.day1_num  |
+------------------+--------------------+-----------+---------------+
| 2                | zoo                | day1_num  | 40            |
| 2                | zoo                | day2_num  | 20            |
| 2                | zoo                | day3_num  | 10            |
| 2                | zoo                | day4_num  | 4             |
| 2                | zoo                | day5_num  | 4             |
| 2                | zoo                | day6_num  | 2             |
| 2                | zoo                | day7_num  | 1             |
| 3                | moji               | day1_num  | 100           |
| 3                | moji               | day2_num  | 80            |
| 3                | moji               | day3_num  | 53            |
| 3                | moji               | day4_num  | 40            |
| 3                | moji               | day5_num  | 7             |
| 3                | moji               | day6_num  | 6             |
| 3                | moji               | day7_num  | 5             |
+------------------+--------------------+-----------+---------------+
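The headers in this output (_c2 and day1_num) are inherited from the first SELECT, which does not match the type and num columns of the target format. A sketch of the same query with explicit aliases, shortened to two branches; the remaining days follow the same pattern:

```sql
-- The column names of a UNION ALL result come from the first branch,
-- so the aliases there are the ones that matter.
SELECT campaign_id, campaign_name, 'day1_num' AS type, day1_num AS num
FROM explode_laterview_org
UNION ALL
SELECT campaign_id, campaign_name, 'day2_num' AS type, day2_num AS num
FROM explode_laterview_org
;
```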
 

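Tip: Method 1 is convenient when there are only a few metrics. When many metric columns have to be turned into rows, the query grows long and becomes hard to maintain.

Method 2: LATERAL VIEW and STR_TO_MAP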

SELECT 
    campaign_id, campaign_name, type, num
FROM explode_laterview_org
LATERAL VIEW
    EXPLODE(
            STR_TO_MAP(
                    CONCAT(
                        'day1_num=',CAST (day1_num AS STRING),
                        '&day2_num=',CAST (day2_num AS STRING),
                        '&day3_num=',CAST (day3_num AS STRING),
                        '&day4_num=',CAST (day4_num AS STRING),
                        '&day5_num=',CAST (day5_num AS STRING),
                        '&day6_num=',CAST (day6_num AS STRING),
                        '&day7_num=',CAST (day7_num AS STRING)
                    )
                ,'&', '=')
        ) lateral_table AS type, num
;

+--------------+----------------+-----------+------+
| campaign_id  | campaign_name  |   type    | num  |
+--------------+----------------+-----------+------+
| 2            | zoo            | day1_num  | 40   |
| 2            | zoo            | day2_num  | 20   |
| 2            | zoo            | day3_num  | 10   |
| 2            | zoo            | day4_num  | 4    |
| 2            | zoo            | day5_num  | 4    |
| 2            | zoo            | day6_num  | 2    |
| 2            | zoo            | day7_num  | 1    |
| 3            | moji           | day1_num  | 100  |
| 3            | moji           | day2_num  | 80   |
| 3            | moji           | day3_num  | 53   |
| 3            | moji           | day4_num  | 40   |
| 3            | moji           | day5_num  | 7    |
| 3            | moji           | day6_num  | 6    |
| 3            | moji           | day7_num  | 5    |
+--------------+----------------+-----------+------+
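To see what the intermediate steps produce, the same building blocks can be run on a short literal string. This is an illustrative sketch: the input string and the names kv and t are made up. STR_TO_MAP splits the text into pairs on '&' and into key and value on '=', and EXPLODE then emits one row per map entry:

```sql
-- Step 1: the string becomes a map with two entries; both values are strings.
SELECT STR_TO_MAP('day1_num=40&day2_num=20', '&', '=') AS kv;

-- Step 2: LATERAL VIEW EXPLODE turns each map entry into a (type, num) row.
SELECT type, num
FROM
(
	SELECT STR_TO_MAP('day1_num=40&day2_num=20', '&', '=') AS kv
) t
LATERAL VIEW EXPLODE(kv) lateral_table AS type, num
;
```

Note that num comes out as a STRING here, because STR_TO_MAP works on text; cast it back if a numeric column is needed.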
 

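Tips

Method 2 greatly reduces repeated code, but it requires familiarity with LATERAL VIEW, STR_TO_MAP, CONCAT and related functions. CONCAT is used here to build a string of key-value pairs. For example, if a single SQL statement in a project computes more than ten metrics and each metric value later has to be stored as its own row, Method 2 needs one SELECT instead of one UNION ALL branch per metric, which makes the code far shorter and much easier to read and maintain. The key names can also be chosen freely.

If you are not familiar with these functions, see my article: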
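A related variant, which is not the author's method, builds the map directly with the MAP constructor instead of concatenating and re-parsing a string. It is sketched here on the assumption that keeping num as BIGINT is desirable; it also avoids problems if a value ever contained '&' or '=':

```sql
SELECT
	campaign_id, campaign_name, type, num
FROM explode_laterview_org
LATERAL VIEW
	EXPLODE(
		-- MAP(key1, value1, key2, value2, ...): keys are strings, values stay BIGINT
		MAP(
			'day1_num', day1_num,
			'day2_num', day2_num,
			'day3_num', day3_num,
			'day4_num', day4_num,
			'day5_num', day5_num,
			'day6_num', day6_num,
			'day7_num', day7_num
		)
	) lateral_table AS type, num
;
```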

https://blog.csdn.net/u010003835/article/details/106632597
