collect_set, collect_list, and concat_ws (merging multiple rows into one)

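collect_set drops duplicate elements, while collect_list keeps them. Both aggregate the values of each group into an array, and concat_ws joins that array into a single string using the given separator.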

+------+-----------------------------------+------------------------------------+
|gender|concat_ws(,, collect_set(children))|concat_ws(,, collect_list(children))|
+------+-----------------------------------+------------------------------------+
|female|                             no,yes|                    no,yes,no,no,yes|
|  male|                             no,yes|                    no,yes,no,yes,no|
+------+-----------------------------------+------------------------------------+
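
The column headers in the table suggest each group's children values were aggregated with collect_set or collect_list and then joined with concat_ws using "," as the separator. As a rough sketch of those semantics, here is a plain-Python model; it does not call Spark. The helper functions only mimic the Spark functions of the same name, and the interleaving of the sample rows is an assumption chosen to reproduce the collect_list column above.

```python
from collections import defaultdict


def collect_list(values):
    """Model of collect_list: keep every value, duplicates included."""
    return list(values)


def collect_set(values):
    """Model of collect_set: keep one copy of each distinct value.

    First-seen order is used here for readability; Spark itself makes
    no promise about the order of the collected elements.
    """
    return list(dict.fromkeys(values))


def concat_ws(sep, values):
    """Model of concat_ws: join the values into one string with sep."""
    return sep.join(values)


# Hypothetical (gender, children) rows matching the table above.
rows = [
    ("female", "no"), ("male", "no"),
    ("female", "yes"), ("male", "yes"),
    ("female", "no"), ("male", "no"),
    ("female", "no"), ("male", "yes"),
    ("female", "yes"), ("male", "no"),
]

# Equivalent of grouping by gender.
groups = defaultdict(list)
for gender, children in rows:
    groups[gender].append(children)

# One output row per group: (deduplicated string, full string).
result = {
    gender: (
        concat_ws(",", collect_set(values)),
        concat_ws(",", collect_list(values)),
    )
    for gender, values in groups.items()
}

for gender, (distinct, everything) in result.items():
    print(gender, distinct, everything)
```

In Spark the element order inside both collected arrays depends on the order in which rows reach the aggregation, so the exact strings can differ between runs unless the data is explicitly ordered.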
