Spark: concat_ws and collect_set



concat_ws

hive > select product_id, concat_ws('_',collect_set(promotion_id)) as promotion_ids from product_promotion group by product_id;
OK
5112 960024_960025_960026_960027_960028
5113 960043_960044_960045_960046
Time taken: 3.116 seconds
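concat_ws joins the values gathered by collect_set with the given separator ('_' here), so the multiple rows of each product_id are merged into a single row.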
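To make the semantics of the query above concrete, here is a minimal plain-Python sketch of what GROUP BY plus collect_set plus concat_ws computes. It is only an emulation, not Hive or Spark code: the sample rows and the helper named concat_ws are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows of (product_id, promotion_id), including one duplicate.
rows = [
    ("5112", "960024"),
    ("5112", "960025"),
    ("5112", "960024"),  # duplicate, dropped by the set step
    ("5113", "960043"),
    ("5113", "960044"),
]


def concat_ws(sep, values):
    """Join string values with a separator, mimicking concat_ws(sep, array)."""
    return sep.join(values)


# GROUP BY product_id + collect_set(promotion_id): one de-duplicated
# collection per key. A dict keeps first-seen order in this sketch;
# Hive and Spark make no ordering promise for collect_set.
groups = defaultdict(dict)
for product_id, promotion_id in rows:
    groups[product_id][promotion_id] = None

merged = {pid: concat_ws("_", ids) for pid, ids in groups.items()}
print(merged)
```

Because collect_set does not guarantee element order, the order of the IDs in a real Hive or Spark result may differ from run to run unless the array is sorted before joining.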

collect_set


from pyspark.sql import functions as F

F.collect_set("di_ware_no")


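Here collect_set removes duplicate di_ware_no values within each group. Note that di_ware_no should be of string type when the result is passed on to concat_ws: in Hive, concat_ws accepts only string or array<string> arguments, so cast a non-string column to string first.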


