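Preface
This post benchmarks a hypothesis raised in three earlier posts: (1) "Spark 2.3.0 Test Notes 1: Shuffled Until My Stomach Hurt", (2) "Spark 2.3.0 Test Notes 2: Is This Still Usable?", and (3) "Spark 2.3.1 Test Notes 1: Is the Problem Still There?". The hypothesis is that the SortExec physical operator in 2.3.1 may have a performance regression relative to 2.1.2.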
Test Code
// Assumption: the suite lives in Spark's own test sources, in the same package
// as BenchmarkBase, so that the trait is visible.
package org.apache.spark.sql.execution.benchmark

import org.apache.spark.sql.execution.SortExec
import org.apache.spark.sql.execution.joins.SortMergeJoinExec
import org.apache.spark.sql.functions.col
import org.apache.spark.util.Benchmark

class SortExecBenchmark extends BenchmarkBase {

  // Sort on a single key; -id * 2 makes the input arrive in reverse order.
  test("sort with one") {
    val N = 2 << 23
    runBenchmark("sort with one", N) {
      val df = sparkSession.range(N).selectExpr("-id * 2 as k1").sort("k1")
      assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortExec]).nonEmpty)
      df.count()
    }
  }

  // Sort on two keys; k2 repeats, so ties fall through to k1.
  test("sort with two") {
    val N = 2 << 23
    runBenchmark("sort with two", N) {
      val df = sparkSession.range(N)
        .selectExpr("-id * 2 as k1", "-id % 10000 as k2")
        .sort("k2", "k1")
      assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortExec]).nonEmpty)
      df.count()
    }
  }

  // Sort on three keys.
  test("sort with three") {
    val N = 2 << 23
    runBenchmark("sort with three", N) {
      val df = sparkSession.range(N)
        .selectExpr("-id * 2 as k1", "-id % 100000 as k2", "-id % 10000 as k3")
        .sort("k3", "k2", "k1")
      assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortExec]).nonEmpty)
      df.count()
    }
  }

  // Sort-merge join where both sides arrive in reverse order.
  test("merge join reversed") {
    val N = 2 << 21
    runBenchmark("merge join at the worst", N) {
      val df1 = sparkSession.range(N).selectExpr("-id * 2 as k1")
      val df2 = sparkSession.range(N).selectExpr("-id * 3 as k2")
      val df = df1.join(df2, col("k1") === col("k2"))
      assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortMergeJoinExec]).isDefined)
      df.count()
    }
  }

  // Sort-merge join on non-positive keys scattered by a large multiplier and a modulus.
  test("merge join with duplicates reversed") {
    val N = 2 << 21
    runBenchmark("sort merge join", N) {
      val df1 = sparkSession.range(N)
        .selectExpr(s"-(id * 15485863) % ${N*10} as k1")
      val df2 = sparkSession.range(N)
        .selectExpr(s"-(id * 15485867) % ${N*10} as k2")
      df1.join(df2, col("k1") === col("k2")).count()
    }
  }

  // Runs f with whole-stage codegen off (2 iterations) and on (3 iterations).
  override def runBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
    val benchmark = new Benchmark(name, cardinality)
    benchmark.addCase(s"$name wholestage off", numIters = 2) { iter =>
      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = false)
      f
    }
    benchmark.addCase(s"$name wholestage on", numIters = 3) { iter =>
      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = true)
      f
    }
    benchmark.run()
  }
}
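To make the input shapes concrete, the sketch below mirrors the key expressions from the tests in plain Scala, without Spark. It assumes that Spark SQL's `%` on `bigint` follows the same sign convention as the JVM's `%` on `Long`; the object and method names are hypothetical and are not part of the benchmark suite.

```scala
// Illustrative only: mirrors the selectExpr key expressions in plain Scala.
object KeyShapes {
  // Row count used by the sort tests: 2 << 23 == 2^24.
  val sortN: Int = 2 << 23

  // k1 = -id * 2: strictly decreasing in id, so the input arrives in reverse order.
  def k1(id: Long): Long = -id * 2

  // k = -id % m: cycles through (-m, 0], so the key repeats many times over the range.
  def kMod(id: Long, m: Long): Long = -id % m
}
```

Under that assumption, every sort in the suite sees its leading key in an order opposite to the requested ascending order, which is presumably why the join case is named "at the worst".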
2.1.2 Benchmark records
[info] SortExecBenchmark:
Running benchmark: sort with one
Running case: sort with one wholestage off
Stopped after 2 iterations, 14683 ms
Running case: sort with one wholestage on
Stopped after 3 iterations, 18842 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
sort with one: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
sort with one wholestage off 6538 / 7342 2.6 389.7 1.0X
sort with one wholestage on 6175 / 6281 2.7 368.1 1.1X
[info] - sort with one (54 seconds, 387 milliseconds)
Running benchmark: sort with two
Running case: sort with two wholestage off
Stopped after 2 iterations, 18571 ms
Running case: sort with two wholestage on
Stopped after 3 iterations, 26397 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
sort with two: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
sort with two wholestage off 9196 / 9286 1.8 548.1 1.0X
sort with two wholestage on 8139 / 8799 2.1 485.1 1.1X
[info] - sort with two (1 minute, 4 seconds)
Running benchmark: sort with three
Running case: sort with three wholestage off
Stopped after 2 iterations, 28709 ms
Running case: sort with three wholestage on
Stopped after 3 iterations, 40878 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
sort with three: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
sort with three wholestage off 14038 / 14355 1.2 836.7 1.0X
sort with three wholestage on 13018 / 13626 1.3 775.9 1.1X
[info] - sort with three (1 minute, 37 seconds)
Running benchmark: merge join at the worst
Running case: merge join at the worst wholestage off
Stopped after 2 iterations, 7851 ms
Running case: merge join at the worst wholestage on
Stopped after 3 iterations, 11256 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
merge join at the worst: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
merge join at the worst wholestage off 3870 / 3926 1.1 922.6 1.0X
merge join at the worst wholestage on 3698 / 3752 1.1 881.7 1.0X
[info] - merge join reverted (27 seconds, 471 milliseconds)
Running benchmark: sort merge join
Running case: sort merge join wholestage off
Stopped after 2 iterations, 9358 ms
Running case: sort merge join wholestage on
Stopped after 3 iterations, 13661 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
sort merge join: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
sort merge join wholestage off 4617 / 4679 0.9 1100.7 1.0X
sort merge join wholestage on 4306 / 4554 1.0 1026.7 1.1X
[info] - merge join with duplicates reverted (32 seconds, 826 milliseconds)
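As a reading aid for the tables above, the derived columns can be recomputed from the best time and the row count. The sketch below is an assumption about how the columns relate, checked only against the printed numbers, not a copy of Spark's `Benchmark` code; the object and method names are hypothetical. Because the printed best times are rounded to whole milliseconds, the last digit may differ from the log in some rows.

```scala
// Sketch: recompute the derived columns from a best time (ms) and a row count.
object BenchmarkColumns {
  // Round to one decimal place, matching the precision printed in the log.
  private def round1(x: Double): Double = math.round(x * 10) / 10.0

  // Per Row(ns): best time in nanoseconds divided by the number of rows.
  def perRowNs(bestMs: Double, rows: Long): Double = round1(bestMs * 1e6 / rows)

  // Rate(M/s): millions of rows processed per second.
  def rateMps(bestMs: Double, rows: Long): Double = round1(rows / (bestMs * 1e3))

  // Relative: best time of the first case divided by the best time of this case.
  def relative(firstBestMs: Double, bestMs: Double): Double = round1(firstBestMs / bestMs)
}
```

For the "sort with one wholestage on" row (best 6175 ms, 2 << 23 rows, first case best 6538 ms), these functions give 368.1 ns per row, 2.7 M/s, and 1.1X, which agrees with the log.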
2.3.1 Benchmark records
[info] SortExecBenchmark:
Running benchmark: sort with one
Running case: sort with one wholestage off
Stopped after 2 iterations, 14670 ms
Running case: sort with one wholestage on
Stopped after 3 iterations, 18269 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
sort with one: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
sort with one wholestage off 6936 / 7335 2.4 413.4 1.0X
sort with one wholestage on 6040 / 6090 2.8 360.0 1.1X
[info] - sort with one (54 seconds, 443 milliseconds)
Running benchmark: sort with two
Running case: sort with two wholestage off
Stopped after 2 iterations, 18748 ms
Running case: sort with two wholestage on
Stopped after 3 iterations, 25809 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
sort with two: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
sort with two wholestage off 9195 / 9374 1.8 548.0 1.0X
sort with two wholestage on 8459 / 8603 2.0 504.2 1.1X
[info] - sort with two (1 minute, 4 seconds)
Running benchmark: sort with three
Running case: sort with three wholestage off
Stopped after 2 iterations, 28472 ms
Running case: sort with three wholestage on
Stopped after 3 iterations, 40225 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
sort with three: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
sort with three wholestage off 13708 / 14236 1.2 817.1 1.0X
sort with three wholestage on 13291 / 13408 1.3 792.2 1.0X
[info] - sort with three (1 minute, 36 seconds)
Running benchmark: merge join at the worst
Running case: merge join at the worst wholestage off
Stopped after 2 iterations, 7856 ms
Running case: merge join at the worst wholestage on
Stopped after 3 iterations, 10573 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
merge join at the worst: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
merge join at the worst wholestage off 3810 / 3928 1.1 908.4 1.0X
merge join at the worst wholestage on 3487 / 3525 1.2 831.4 1.1X
[info] - merge join reverted (26 seconds, 664 milliseconds)
Running benchmark: sort merge join
Running case: sort merge join wholestage off
Stopped after 2 iterations, 9118 ms
Running case: sort merge join wholestage on
Stopped after 3 iterations, 13825 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
sort merge join: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
sort merge join wholestage off 4450 / 4559 0.9 1061.0 1.0X
sort merge join wholestage on 4395 / 4608 1.0 1047.9 1.0X
2.1.2 vs 2.3.1
version | case | Best/Avg Time(ms) | Rate(M/s) | Per Row(ns) | Relative |
---|---|---|---|---|---|
2.1.2 | sort with one wholestage on | 6175 / 6281 | 2.7 | 368.1 | 1.1X |
2.3.1 | sort with one wholestage on | 6040 / 6090 | 2.8 | 360.0 | 1.1X |
2.1.2 | sort with two wholestage on | 8139 / 8799 | 2.1 | **485.1** | 1.1X |
2.3.1 | sort with two wholestage on | 8459 / 8603 | 2.0 | 504.2 | 1.1X |
2.1.2 | sort with three wholestage on | 13018 / 13626 | 1.3 | **775.9** | 1.1X |
2.3.1 | sort with three wholestage on | 13291 / 13408 | 1.3 | 792.2 | 1.0X |
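To put the gaps in the table into proportion, the per-row times can be turned into percentage changes. This is a minimal sketch using only the Per Row(ns) values from the table; the object and field names are hypothetical.

```scala
// Sketch: percentage change in Per Row(ns) from 2.1.2 to 2.3.1 (wholestage on).
object VersionDelta {
  // Positive result means 2.3.1 is slower than 2.1.2.
  def pctChange(base: Double, other: Double): Double = (other - base) / base * 100.0

  // (case, 2.1.2 Per Row(ns), 2.3.1 Per Row(ns)) copied from the table.
  val perRowNs: Seq[(String, Double, Double)] = Seq(
    ("sort with one", 368.1, 360.0),
    ("sort with two", 485.1, 504.2),
    ("sort with three", 775.9, 792.2)
  )

  // Percentage change per case.
  val deltas: Seq[(String, Double)] =
    perRowNs.map { case (name, v212, v231) => (name, pctChange(v212, v231)) }
}
```

With these inputs, 2.3.1 comes out about 2% faster on the one-key sort and about 4% and 2% slower on the two- and three-key sorts, which is within the run-to-run spread visible in the Best/Avg columns.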
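Notes
- Benchmark results fluctuate from run to run and may differ with machine performance.
- The data above comes from the third test run. The first run is skewed because sbt compilation takes up memory, so I ran `killall java` to kill all Java processes; the second run then warmed up the JVM, and the third run's results were recorded.
- The cases are fairly simple and the difference between the two versions is not pronounced. Perhaps Spark's AlphaSort-like sorting approach is little affected when the keys are primitive types.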
- With all-integer keys, 2.1.2 has a slight edge over 2.3.1 in the two- and three-key sorts, but the margin is negligible.
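The warm-up idea from the notes (discard early runs, record a later one) can also be approximated inside a single JVM. The sketch below is an in-process analogue, not the procedure actually used for the numbers above, which relied on separate sbt runs; the object and method names are hypothetical.

```scala
// Sketch: run a body a few times unrecorded, then time one more run.
object WarmupTimer {
  // Executes `body` `warmups` times without recording, then returns the
  // elapsed time of one further execution in nanoseconds.
  def timeAfterWarmup(warmups: Int)(body: => Unit): Long = {
    (1 to warmups).foreach(_ => body)
    val start = System.nanoTime()
    body
    System.nanoTime() - start
  }
}
```

Calling `WarmupTimer.timeAfterWarmup(2) { df.count() }` would correspond to recording the third execution, assuming a `df` like the ones in the tests.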
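Conclusion
- No conclusion can be drawn yet. The next step is to add richer test cases and keep trying to reproduce the regression.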