Analyzing Spark Hash Shuffle with a groupByKey Example

https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ShuffleSuite.scala
The code is the first example in the file linked above.

test("groupByKey without compression") {
  // Disable shuffle compression so the shuffle output is written uncompressed
  val myConf = conf.clone().set("spark.shuffle.compress", "false")
  sc = new SparkContext("local", "test", myConf)
  // 4 records spread over 4 input partitions (4 map tasks)
  val pairs = sc.parallelize(Array((1, 1), (1, 2), (1, 3), (2, 1)), 4)
  // Shuffle into 4 reduce partitions, then bring the grouped result to the driver
  val groups = pairs.groupByKey(4).collect()
  assert(groups.size === 2)
  val valuesFor1 = groups.find(_._1 == 1).get._2
  assert(valuesFor1.toList.sorted === List(1, 2, 3))
  val valuesFor2 = groups.find(_._1 == 2).get._2
  assert(valuesFor2.toList.sorted === List(1))
}
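
To make the routing in this test concrete, here is a minimal plain-Scala sketch of how the four records could end up in the four reduce partitions. It assumes the default HashPartitioner rule (key.hashCode modulo the number of partitions, kept non-negative). The names HashShuffleSketch, partitionFor and simulate are hypothetical and are not part of Spark. The sketch models only the key-to-partition assignment, not Spark's actual shuffle files or block fetching.

```scala
// Plain-Scala sketch with no Spark dependency; all names are hypothetical.
object HashShuffleSketch {
  // Modulo that never returns a negative result (Scala's % can be negative).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Assumed partitioning rule: null keys go to partition 0,
  // every other key goes to hashCode mod numPartitions.
  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0 else nonNegativeMod(key.hashCode, numPartitions)

  // Route each record to its reduce partition, then group values by key
  // inside each partition, which is what groupByKey produces per reducer.
  def simulate(records: Seq[(Int, Int)], numPartitions: Int): Map[Int, Map[Int, Seq[Int]]] =
    records
      .groupBy { case (k, _) => partitionFor(k, numPartitions) }
      .map { case (p, recs) =>
        p -> recs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
      }

  def main(args: Array[String]): Unit = {
    val pairs = Seq((1, 1), (1, 2), (1, 3), (2, 1))
    simulate(pairs, 4).toSeq.sortBy(_._1).foreach { case (p, groups) =>
      println(s"reduce partition $p -> $groups")
    }
  }
}
```

Under this rule, key 1 lands in reduce partition 1 and key 2 in reduce partition 2, while partitions 0 and 3 receive no records. That is why the test sees exactly two groups even though groupByKey(4) creates four reduce partitions. On the map side, the hash-based shuffle (without file consolidation) has each map task write one output per reduce partition, so the 4 map tasks here would produce up to 4 x 4 = 16 shuffle outputs, most of them empty.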

[Figure 1: groupByKey example, Spark Hash Shuffle analysis]
