Spark Transformation — the repartition operator

def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]

This function is simply coalesce with its second parameter (shuffle) set to true. The name coalesce suggests merging or combining, so it leans toward merging partitions, while repartition means redistributing the data into a new set of partitions.
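The relationship above can be sketched as follows. This is a minimal, self-contained illustration (the `RepartitionDemo` object, the local SparkSession, and the sample data are assumptions for demonstration, not part of the original transcript); in spark-shell, `sc` would already exist:

```scala
import org.apache.spark.sql.SparkSession

object RepartitionDemo {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("repartition-demo")
      .getOrCreate()
    val sc = spark.sparkContext

    val data = sc.parallelize(1 to 100, 2) // start with 2 partitions

    // repartition(n) is equivalent to coalesce(n, shuffle = true)
    val viaRepartition = data.repartition(4)
    val viaCoalesce    = data.coalesce(4, shuffle = true)
    println(viaRepartition.partitions.length) // 4
    println(viaCoalesce.partitions.length)    // 4

    // Without a shuffle, coalesce can only reduce the partition count;
    // asking for more partitions leaves the RDD unchanged
    val noShuffle = data.coalesce(4)
    println(noShuffle.partitions.length)      // still 2

    spark.stop()
  }
}
```

This is why repartition is the right choice when increasing parallelism: growing the number of partitions always requires a shuffle, which coalesce only performs when its shuffle flag is true.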

scala> var rdd2 = data.repartition(1)
rdd2: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[15] at repartition at <console>:29

scala> rdd2.partitions.size
res8: Int = 1

scala> var rdd2 = data.repartition(4)
rdd2: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[19] at repartition at <console>:29

scala> rdd2.partitions.size
res9: Int = 4
