Spark SQL: converting an array column to a string

Source data and table schema

+----------+------------+------------+-------+--------+-----------+
|train_code|station_name|station_code|is_late|late_min|arrive_date|
+----------+------------+------------+-------+--------+-----------+
|K8363     |昆山          |KSH         |1      |1435    |2019-03-19 |
|2149      |兖州          |YZK         |1      |1424    |2019-03-19 |
|K1084     |唐山          |TSP         |0      |0       |2019-03-19 |
|K7755     |唐山北         |FUP         |1      |1415    |2019-03-19 |
|K451      |马兰          |MLR         |0      |0       |2019-03-19 |
|K567      |麻城          |MCN         |0      |0       |2019-03-19 |
|T396      |济宁          |JIK         |1      |4       |2019-03-19 |
|K346      |锦州          |JZD         |0      |0       |2019-03-19 |
|K1126     |衢州          |QEH         |0      |0       |2019-03-19 |
|K1295     |中卫          |ZWJ         |0      |0       |2019-03-19 |
|K125      |唐山          |TSP         |0      |0       |2019-03-19 |
|K1137     |兖州          |YZK         |0      |0       |2019-03-19 |
|K1074     |潢川          |KCN         |0      |0       |2019-03-19 |
|Z180      |呼和浩特东       |NDC         |1      |3       |2019-03-19 |
|K748      |三江县         |SOZ         |1      |7       |2019-03-19 |
|K928      |天津          |TJP         |1      |7       |2019-03-19 |
|K549      |四平          |SPT         |0      |0       |2019-03-19 |
|K96       |鞍山          |AST         |0      |0       |2019-03-19 |
|K1669     |玉山          |YNG         |1      |10      |2019-03-19 |
|K70       |蕲春          |QRN         |0      |0       |2019-03-19 |
+----------+------------+------------+-------+--------+-----------+

root
 |-- train_code: string (nullable = true)
 |-- station_name: string (nullable = true)
 |-- station_code: string (nullable = true)
 |-- is_late: long (nullable = true)
 |-- late_min: long (nullable = true)
 |-- arrive_date: string (nullable = true)

The goal is to collect every recorded delay for each train, grouped by station:

        // Assumes: import static org.apache.spark.sql.functions.*;
        Dataset<Row> structData = tableData.groupBy("train_code", "station_code")
                .agg(collect_set(struct("arrive_date", "late_min")).as("detail_set"));
        structData.printSchema();
        structData.show(false);

The result:

root
 |-- train_code: string (nullable = true)
 |-- station_code: string (nullable = true)
 |-- detail_set: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- arrive_date: string (nullable = true)
 |    |    |-- late_min: long (nullable = true)

+----------+------------+----------------------------------------------------------------+
|train_code|station_code|detail_set                                                      |
+----------+------------+----------------------------------------------------------------+
|0000      |GQD         |[[2019-03-31,0], [2019-03-19,0]]                                |
|0000      |LYT         |[[2019-03-21,0], [2019-03-25,0], [2019-03-19,0]]                |
|1133      |CGV         |[[2019-03-24,0]]                                                |
|1133      |DTV         |[[2019-03-19,0], [2019-03-30,0]]                                |
|1133      |FZC         |[[2019-03-18,0]]                                                |
|1133      |JAC         |[[2019-03-27,0]]                                                |
|1133      |YOV         |[[2019-03-20,0], [2019-03-25,0]]                                |
|1134      |BXP         |[[2019-03-24,0], [2019-03-19,3], [2019-03-18,8]]                |
|1134      |DTV         |[[2019-03-31,0], [2019-03-23,0], [2019-03-30,0]]                |
|1134      |FZC         |[[2019-03-31,0], [2019-03-27,0]]                                |
|1134      |WQC         |[[2019-03-25,0], [2019-03-19,0]]                                |
|1134      |WVC         |[[2019-03-23,5]]                                                |
|1134      |XHP         |[[2019-03-19,7]]                                                |
|1147      |CJY         |[[2019-03-31,12], [2019-03-25,14]]                              |
|1147      |LKF         |[[2019-03-18,1], [2019-03-20,0]]                                |
|1147      |PJH         |[[2019-03-31,6], [2019-03-28,0], [2019-03-25,0], [2019-03-30,0]]|
|1147      |WNY         |[[2019-03-19,23]]                                               |
|1148      |DKH         |[[2019-03-30,2], [2019-03-23,0], [2019-03-21,0]]                |
|1148      |KFF         |[[2019-03-30,0]]                                                |
|1148      |UIH         |[[2019-03-25,7]]                                                |
+----------+------------+----------------------------------------------------------------+

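Next, convert the detail_set field from an array to a string. concat_ws accepts only string or array-of-string arguments, so it cannot join an array of structs directly.
The modified code therefore merges arrive_date and late_min into one "date:minutes" string per row, collects those strings, and then joins the resulting array with commas: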

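        // Step 1: build "arrive_date:late_min" per row and collect the distinct values per group.
        Dataset<Row> lateDetail = tableData.groupBy("train_code", "station_code")
                .agg(collect_set(concat_ws(":", col("arrive_date"), col("late_min"))).as("late_list"));
        // Step 2: join the array<string> into a single comma-separated string.
        Dataset<Row> finalResult = lateDetail.withColumn("date_list_str", concat_ws(",", col("late_list")));
        finalResult.printSchema();
        finalResult.show(false);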
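
To make the two steps concrete without a Spark runtime, here is a minimal plain-Java sketch of what the query computes for each group. It is only a model, not Spark code: the class LateListModel, the nested Arrival type, and both method names are hypothetical, a LinkedHashSet stands in for collect_set, and the duplicated input row is made up to show deduplication.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical plain-Java model of the two concat_ws steps; not Spark code.
final class LateListModel {

    // One input row, reduced to the four columns the query touches.
    static final class Arrival {
        final String trainCode;
        final String stationCode;
        final String arriveDate;
        final long lateMin;

        Arrival(String trainCode, String stationCode, String arriveDate, long lateMin) {
            this.trainCode = trainCode;
            this.stationCode = stationCode;
            this.arriveDate = arriveDate;
            this.lateMin = lateMin;
        }
    }

    // Step 1: per (train_code, station_code) group, collect the distinct
    // "date:minutes" strings, as collect_set(concat_ws(":", ...)) does.
    static Map<String, Set<String>> collectLateList(List<Arrival> rows) {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (Arrival r : rows) {
            String key = r.trainCode + "|" + r.stationCode;
            groups.computeIfAbsent(key, k -> new LinkedHashSet<>())
                  .add(r.arriveDate + ":" + r.lateMin);
        }
        return groups;
    }

    // Step 2: flatten one group's set into a comma-separated string,
    // as concat_ws(",", late_list) does.
    static String toDateListStr(Set<String> lateList) {
        return String.join(",", lateList);
    }

    public static void main(String[] args) {
        List<Arrival> rows = Arrays.asList(
            new Arrival("1134", "BXP", "2019-03-19", 3),
            new Arrival("1134", "BXP", "2019-03-18", 8),
            new Arrival("1134", "BXP", "2019-03-24", 0),
            new Arrival("1134", "BXP", "2019-03-24", 0), // made-up duplicate, dropped by the set
            new Arrival("1134", "WVC", "2019-03-23", 5));
        collectLateList(rows).forEach((key, lateList) ->
            System.out.println(key + " -> " + toDateListStr(lateList)));
    }
}
```

Unlike this model, Spark does not keep the elements in insertion order. The schema and rows that the Spark code itself produced are shown next.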

The output:

root
 |-- train_code: string (nullable = true)
 |-- station_code: string (nullable = true)
 |-- late_list: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- date_list_str: string (nullable = false)

+----------+------------+--------------------------------------------------------+---------------------------------------------------+
|train_code|station_code|late_list                                               |date_list_str                                      |
+----------+------------+--------------------------------------------------------+---------------------------------------------------+
|0000      |GQD         |[2019-03-19:0, 2019-03-31:0]                            |2019-03-19:0,2019-03-31:0                          |
|0000      |LYT         |[2019-03-25:0, 2019-03-19:0, 2019-03-21:0]              |2019-03-25:0,2019-03-19:0,2019-03-21:0             |
|1133      |CGV         |[2019-03-24:0]                                          |2019-03-24:0                                       |
|1133      |DTV         |[2019-03-19:0, 2019-03-30:0]                            |2019-03-19:0,2019-03-30:0                          |
|1133      |FZC         |[2019-03-18:0]                                          |2019-03-18:0                                       |
|1133      |JAC         |[2019-03-27:0]                                          |2019-03-27:0                                       |
|1133      |YOV         |[2019-03-25:0, 2019-03-20:0]                            |2019-03-25:0,2019-03-20:0                          |
|1134      |BXP         |[2019-03-19:3, 2019-03-18:8, 2019-03-24:0]              |2019-03-19:3,2019-03-18:8,2019-03-24:0             |
|1134      |DTV         |[2019-03-30:0, 2019-03-23:0, 2019-03-31:0]              |2019-03-30:0,2019-03-23:0,2019-03-31:0             |
|1134      |FZC         |[2019-03-27:0, 2019-03-31:0]                            |2019-03-27:0,2019-03-31:0                          |
|1134      |WQC         |[2019-03-25:0, 2019-03-19:0]                            |2019-03-25:0,2019-03-19:0                          |
|1134      |WVC         |[2019-03-23:5]                                          |2019-03-23:5                                       |
|1134      |XHP         |[2019-03-19:7]                                          |2019-03-19:7                                       |
|1147      |CJY         |[2019-03-25:14, 2019-03-31:12]                          |2019-03-25:14,2019-03-31:12                        |
|1147      |LKF         |[2019-03-20:0, 2019-03-18:1]                            |2019-03-20:0,2019-03-18:1                          |
|1147      |PJH         |[2019-03-25:0, 2019-03-30:0, 2019-03-31:6, 2019-03-28:0]|2019-03-25:0,2019-03-30:0,2019-03-31:6,2019-03-28:0|
|1147      |WNY         |[2019-03-19:23]                                         |2019-03-19:23                                      |
|1148      |DKH         |[2019-03-21:0, 2019-03-23:0, 2019-03-30:2]              |2019-03-21:0,2019-03-23:0,2019-03-30:2             |
|1148      |KFF         |[2019-03-30:0]                                          |2019-03-30:0                                       |
|1148      |UIH         |[2019-03-25:7]                                          |2019-03-25:7                                       |
+----------+------------+--------------------------------------------------------+---------------------------------------------------+
only showing top 20 rows
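
Note that the dates inside each group come out in no particular order (the 1134 / BXP row lists 03-19, 03-18, 03-24), because collect_set makes no ordering guarantee. If a chronological string is wanted, one option in Spark would be to wrap late_list in sort_array before the final concat_ws. The plain-Java sketch below only models that effect. The class SortedLateList and its method are hypothetical names, and it relies on the assumption that every element starts with a yyyy-MM-dd date, so plain string ordering puts the dates in chronological order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper modelling sort_array followed by concat_ws(","); not Spark code.
final class SortedLateList {

    // Sort the "date:minutes" strings, then join them with commas.
    // ISO dates (yyyy-MM-dd) sort chronologically under plain string ordering.
    static String sortedDateListStr(Collection<String> lateList) {
        List<String> sorted = new ArrayList<>(lateList);
        Collections.sort(sorted);
        return String.join(",", sorted);
    }

    public static void main(String[] args) {
        // Values taken from the 1134 / BXP row of the output above.
        List<String> lateList = Arrays.asList("2019-03-19:3", "2019-03-18:8", "2019-03-24:0");
        System.out.println(sortedDateListStr(lateList));
        // prints 2019-03-18:8,2019-03-19:3,2019-03-24:0
    }
}
```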

Done.
