RDD.repartition

    leafsRDD = labeledPointRDD.repartition(numPartitions)

repartition: Coalesce bag into fewer partitions.

    Examples:
        >>> b.repartition(5)  # set to have 5 partitions  # doctest: +SKIP

Adjusting partitions

repartition
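repartition(numPartitions) is shorthand for coalesce(numPartitions, shuffle = true). It can increase or decrease the number of partitions, and it always triggers a shuffle, which redistributes the records using a HashPartitioner.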

coalesce
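coalesce lets you choose whether a shuffle happens. With shuffle = false it can only reduce the number of partitions, not increase it; asking for more partitions than the RDD already has leaves the count unchanged.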
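
To make the difference concrete, here is a toy model in plain Python. It is not Spark code and not how Spark implements either operation: partitions are modelled as a list of lists, and the function names `coalesce_no_shuffle` and `repartition_with_shuffle` are made up for this sketch. The round-robin redistribution is also a simplification of what a real shuffle does.

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce_no_shuffle(partitions: List[List[T]], num_partitions: int) -> List[List[T]]:
    """Toy model of coalesce(n, shuffle=False): merge neighbouring partitions."""
    current = len(partitions)
    if num_partitions >= current:
        # Without a shuffle the partition count cannot grow, so nothing changes.
        return [list(p) for p in partitions]
    merged: List[List[T]] = [[] for _ in range(num_partitions)]
    for i, part in enumerate(partitions):
        # Whole partitions are appended to a target; records are never split up.
        merged[i * num_partitions // current].extend(part)
    return merged


def repartition_with_shuffle(partitions: List[List[T]], num_partitions: int) -> List[List[T]]:
    """Toy model of coalesce(n, shuffle=True): every record may move."""
    result: List[List[T]] = [[] for _ in range(num_partitions)]
    position = 0
    for part in partitions:
        for record in part:
            # Simplified round-robin placement (an assumption of this sketch).
            result[position % num_partitions].append(record)
            position += 1
    return result


parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(coalesce_no_shuffle(parts, 2))            # fewer partitions, neighbours merged
print(len(coalesce_no_shuffle(parts, 8)))       # still 4: cannot grow without a shuffle
print(len(repartition_with_shuffle(parts, 8)))  # 8: a shuffle can increase the count
```

In the toy model, shrinking without a shuffle only glues existing partitions together, which is why it is cheap but cannot create new partitions. Growing the count requires moving individual records, which is the work a shuffle does.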
