Spark PIVOT & UNPIVOT: rows to columns and columns to rows

Test data

name course score
Darren Chinese 71
Darren Math 81
Darren English 91
Jonathan Chinese 72
Jonathan Math 82
Jonathan English 92
Tom Chinese 73

Rows to columns (PIVOT)

Syntax:

SELECT
    xxx
FROM
    table_test
PIVOT(
    aggregate_function(value_column) FOR pivot_column IN (value1, value2, ...)
)

Example:

SELECT 
    * 
FROM row_table
PIVOT(
    MAX(score) FOR course in ('Chinese', 'Math', 'English')
)

Result:

name Chinese Math English
Darren 71 81 91
Jonathan 72 82 92
Tom 73 null null
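
For readers who want to see what the PIVOT clause computes without starting a Spark session, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite has no PIVOT clause, so the query spells out the equivalent conditional aggregation (GROUP BY name plus one MAX(CASE ...) per course value). This only illustrates the semantics under that assumption; it is not how Spark executes the query. The table name row_table follows the example above.

```python
import sqlite3

# Test data from the top of this post.
rows = [
    ("Darren", "Chinese", 71), ("Darren", "Math", 81), ("Darren", "English", 91),
    ("Jonathan", "Chinese", 72), ("Jonathan", "Math", 82), ("Jonathan", "English", 92),
    ("Tom", "Chinese", 73),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE row_table (name TEXT, course TEXT, score INTEGER)")
conn.executemany("INSERT INTO row_table VALUES (?, ?, ?)", rows)

# Stand-in for: PIVOT(MAX(score) FOR course IN ('Chinese', 'Math', 'English'))
# Every column that is neither aggregated nor pivoted (here: name) acts as the grouping key.
# A CASE without ELSE yields NULL, so a missing course stays NULL after MAX.
pivoted = conn.execute("""
    SELECT
        name
      , MAX(CASE WHEN course = 'Chinese' THEN score END) AS Chinese
      , MAX(CASE WHEN course = 'Math'    THEN score END) AS Math
      , MAX(CASE WHEN course = 'English' THEN score END) AS English
    FROM row_table
    GROUP BY name
    ORDER BY name
""").fetchall()

for row in pivoted:
    print(row)
```

The printed tuples match the result table above, with Python's None standing in for null in Tom's Math and English columns.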

Columns to rows (UNPIVOT)

Spark SQL had no UNPIVOT clause before version 3.4, so stack() is used here to turn columns into rows.

Syntax:

SELECT
    STACK
    (
        number_of_rows,
        'label1', column1_name,
        ...,
        'labelN', columnN_name
    ) AS (new_column1_name, new_column2_name)

Example:

SELECT
    name
  , STACK
    (
        3,
        'Chinese', Chinese,
        'Math', Math,
        'English', English
    ) AS (course, score)
FROM col_table

Result:

name course score
Darren Chinese 71
Darren Math 81
Darren English 91
Jonathan Chinese 72
Jonathan Math 82
Jonathan English 92
Tom Chinese 73
Tom Math null
Tom English null
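
To make the behaviour of stack() concrete, the following plain-Python sketch mimics it: the arguments after the row count are split into that many rows of equal width, and a short last row is padded with null. The helper stack and the list col_table are hypothetical stand-ins written for this illustration, not Spark APIs.

```python
import math

def stack(n, *exprs):
    """Split exprs into n rows of equal width, padding the last row with None."""
    width = math.ceil(len(exprs) / n)
    padded = list(exprs) + [None] * (n * width - len(exprs))
    return [tuple(padded[i * width:(i + 1) * width]) for i in range(n)]

# col_table: the pivoted table from the previous section (None stands in for null).
col_table = [
    {"name": "Darren", "Chinese": 71, "Math": 81, "English": 91},
    {"name": "Jonathan", "Chinese": 72, "Math": 82, "English": 92},
    {"name": "Tom", "Chinese": 73, "Math": None, "English": None},
]

unpivoted = []
for r in col_table:
    # Mirrors: STACK(3, 'Chinese', Chinese, 'Math', Math, 'English', English) AS (course, score)
    pairs = stack(3, "Chinese", r["Chinese"], "Math", r["Math"], "English", r["English"])
    for course, score in pairs:
        unpivoted.append((r["name"], course, score))

for row in unpivoted:
    print(row)
```

Each input row produces exactly three output rows, which is why Tom still gets a Math row and an English row even though both scores are null.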

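Note: compared with the original table, this result has two extra rows for Tom whose score is null. Filtering out the null scores gives back a table identical to the original.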

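# Assumes an existing SparkSession named `spark` and a registered table col_table.
spark.sql("""
    SELECT
        name
      , STACK
        (
            3,
            'Chinese', Chinese,
            'Math', Math,
            'English', English
        ) AS (course, score)
    FROM col_table
""").where("score IS NOT NULL")  # drop the rows that only exist because of the pivot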

This returns the same rows as the original table.
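
The round trip can be checked with a small plain-Python sketch. The helpers pivot_max and unpivot_not_null are hypothetical names written for this illustration; they model the MAX pivot and the stack-plus-filter step on the test data. The second comparison shows a caveat of the null filter: if the source table had contained a genuinely null score, that row would be dropped as well, because after pivoting it cannot be told apart from a missing row.

```python
COURSES = ["Chinese", "Math", "English"]

def pivot_max(rows):
    """Map name -> {course: max score}; a cell stays None when no non-null score exists."""
    table = {}
    for name, course, score in rows:
        cells = table.setdefault(name, dict.fromkeys(COURSES))
        if score is not None and (cells[course] is None or score > cells[course]):
            cells[course] = score
    return table

def unpivot_not_null(table):
    """One (name, course, score) row per cell, skipping null scores."""
    return [
        (name, course, cells[course])
        for name, cells in table.items()
        for course in COURSES
        if cells[course] is not None
    ]

original = [
    ("Darren", "Chinese", 71), ("Darren", "Math", 81), ("Darren", "English", 91),
    ("Jonathan", "Chinese", 72), ("Jonathan", "Math", 82), ("Jonathan", "English", 92),
    ("Tom", "Chinese", 73),
]

# Pivot, unpivot, and filter nulls: the original rows come back.
round_trip = unpivot_not_null(pivot_max(original))
print(round_trip == original)

# Same data plus a genuinely null score: the null row does not survive.
with_null = original + [("Tom", "Math", None)]
round_trip_with_null = unpivot_not_null(pivot_max(with_null))
print(round_trip_with_null == original)
```

Both comparisons print True, so the filter restores the original table only when the source has no null scores of its own.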

 

References:

https://queirozf.com/entries/spark-dataframe-examples-pivot-and-unpivot-data

https://sparkbyexamples.com/spark/how-to-pivot-table-and-unpivot-a-spark-dataframe/
