leafsRDD = labeledPointRDD.repartition(numPartitions)
repartition: Coalesce bag into fewer partitions.
Examples:
>>> b.repartition(5)  # set to have 5 partitions  # doctest: +SKIP
repartition
repartition is coalesce(numPartitions, shuffle = true). Besides changing the number of partitions, repartition also redistributes the data with a HashPartitioner, so it always triggers a shuffle.
coalesce
coalesce lets you control whether a shuffle happens, but when shuffle is false it can only decrease the number of partitions, not increase it.
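
The difference between the two calls can be sketched with a toy model in plain Python. This is not Spark code: partitions are modeled as a list of lists, the function names only mirror the RDD methods, and the round-robin placement on the shuffle path is a simplifying assumption rather than Spark's actual record placement.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def coalesce(partitions: Sequence[List[T]], num_partitions: int,
             shuffle: bool = False) -> List[List[T]]:
    """Toy model of coalesce on a list of partitions (hypothetical helper)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")

    if shuffle:
        # Shuffle path: every record may move, so the partition count
        # can grow or shrink. Round-robin placement is an assumption.
        shuffled: List[List[T]] = [[] for _ in range(num_partitions)]
        position = 0
        for part in partitions:
            for record in part:
                shuffled[position % num_partitions].append(record)
                position += 1
        return shuffled

    # No-shuffle path: whole partitions are merged, never split, so asking
    # for more partitions than already exist changes nothing.
    current = len(partitions)
    if num_partitions >= current:
        return [list(part) for part in partitions]

    merged: List[List[T]] = [[] for _ in range(num_partitions)]
    for index, part in enumerate(partitions):
        # Map contiguous parent partitions onto one target partition.
        merged[index * num_partitions // current].extend(part)
    return merged


def repartition(partitions: Sequence[List[T]],
                num_partitions: int) -> List[List[T]]:
    """repartition is coalesce with the shuffle always enabled."""
    return coalesce(partitions, num_partitions, shuffle=True)


data = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(len(coalesce(data, 2)))     # shrinking works without a shuffle
print(len(coalesce(data, 8)))     # growing is ignored without a shuffle
print(len(repartition(data, 8)))  # growing needs the shuffle path
```

In this model, shrinking from 4 to 2 partitions only merges neighbors, while going from 4 to 8 has no effect unless the shuffle path is taken. That is the usual reason to prefer coalesce when reducing the partition count and repartition when increasing it.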