Spark: concat_ws and collect_set

concat_ws

hive > select product_id, concat_ws('_',collect_set(promotion_id)) as promotion_ids from product_promotion group by product_id;
OK
5112 960024_960025_960026_960027_960028
5113 960043_960044_960045_960046
Time taken: 3.116 seconds
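Together with GROUP BY, this merges multiple rows into one: collect_set gathers the promotion_id values of each product_id into an array, and concat_ws joins that array into a single string separated by '_'.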
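
To make the semantics concrete, here is a plain-Python model (not Hive or Spark code) of what GROUP BY product_id plus concat_ws('_', collect_set(promotion_id)) computes. The helper name group_concat_distinct and the sample rows are hypothetical, shaped after the query output above. Hive and Spark make no promise about the order of the elements collect_set returns; the model keeps first-seen order only so its output is deterministic.

```python
from collections import defaultdict


def group_concat_distinct(rows, sep="_"):
    """Hypothetical model of GROUP BY key + concat_ws(sep, collect_set(value)).

    rows: iterable of (key, value) pairs whose values are already strings.
    Duplicate values per key are dropped (the collect_set step) and the
    remaining values are joined with sep (the concat_ws step).
    """
    groups = defaultdict(dict)  # key -> values seen so far; dict keys act as an ordered set
    for key, value in rows:
        groups[key][value] = None
    return {key: sep.join(values) for key, values in groups.items()}


# Hypothetical sample rows shaped like the product_promotion table above.
rows = [
    (5112, "960024"), (5112, "960025"), (5112, "960024"),  # 960024 appears twice
    (5113, "960043"), (5113, "960044"),
]

for product_id, promotion_ids in group_concat_distinct(rows).items():
    print(product_id, promotion_ids)
# 5112 960024_960025
# 5113 960043_960044
```

Each product_id ends up on exactly one output line, and the duplicate 960024 is collected only once.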

collect_set

from pyspark.sql import functions as F

# Aggregate expression: the distinct di_ware_no values of each group, as an array
F.collect_set("di_ware_no")

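Here collect_set removes duplicate di_ware_no values within each group. Note that if the result is then passed to concat_ws, di_ware_no must be of string type (cast it first, e.g. with F.col("di_ware_no").cast("string")), because concat_ws only accepts string or array<string> arguments.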
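
The string requirement is easiest to see in isolation. Below is another plain-Python sketch, assuming hypothetical helpers collect_set and concat_ws that only mimic the SQL functions: collect_set drops duplicates (sorted here purely for a stable result), and concat_ws refuses non-string elements, much as Hive's concat_ws rejects an array<int>. Casting to string first, the analogue of .cast("string") in PySpark, makes the join succeed. The di_ware_no values are made up.

```python
def collect_set(values):
    """Hypothetical model of collect_set: drop duplicates.

    Returns a sorted list only so the example is deterministic; the real
    function makes no ordering promise.
    """
    return sorted(set(values))


def concat_ws(sep, items):
    """Hypothetical model of concat_ws over an array: strings only."""
    for item in items:
        if not isinstance(item, str):
            raise TypeError(f"concat_ws expects strings, got {type(item).__name__}")
    return sep.join(items)


# Hypothetical di_ware_no values stored as integers, with a duplicate.
di_ware_no = [3002, 3001, 3002]

try:
    concat_ws("_", collect_set(di_ware_no))
except TypeError as err:
    print("error:", err)  # error: concat_ws expects strings, got int

# Cast each value to a string before collecting, then the join works.
print(concat_ws("_", collect_set(str(v) for v in di_ware_no)))  # 3001_3002
```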