Spark 2.3.1 Test Notes 2: SortExec Performance Test 1

Preface

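This post follows up on a hypothesis raised in three earlier notes: (1) "Spark 2.3.0 Test Notes 1: Shuffled Until My Stomach Hurt", (2) "Spark 2.3.0 Test Notes 2: Is This Even Usable?", and (3) "Spark 2.3.1 Test Notes 1: Is the Problem Still There?". The hypothesis is that the SortExec physical operator in 2.3.1 may have a performance regression relative to 2.1.2. The benchmark below tests it.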

Test Code

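// Package and imports are not shown in the original post; the ones below are
// assumed from the Spark 2.3 source layout, where BenchmarkBase is package-private.
package org.apache.spark.sql.execution.benchmark

import org.apache.spark.sql.execution.SortExec
import org.apache.spark.sql.execution.joins.SortMergeJoinExec
import org.apache.spark.sql.functions.col
import org.apache.spark.util.Benchmark

class SortExecBenchmark extends BenchmarkBase {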

  test("sort with one") {
    val N = 2 << 23 // 2^24 = 16,777,216 rows
    runBenchmark("sort with one", N) {
      val df = sparkSession.range(N).selectExpr(s"-id * 2 as k1").sort("k1")
      assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortExec]).nonEmpty)
      df.count()
    }
  }

  test("sort with two") {
    val N = 2 << 23
    runBenchmark("sort with two", N) {
      val df = sparkSession.range(N)
        .selectExpr(s"-id * 2 as k1", "-id % 10000 as k2")
        .sort("k2", "k1")
      assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortExec]).nonEmpty)
      df.count()
    }
  }

  test("sort with three") {
    val N = 2 << 23
    runBenchmark("sort with three", N) {
      val df = sparkSession.range(N)
        .selectExpr(s"-id * 2 as k1", " -id % 100000 as k2", "-id % 10000 as k3")
        .sort("k3", "k2", "k1")
      assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortExec]).nonEmpty)
      df.count()
    }
  }

  test("merge join reversed") {
    val N = 2 << 21 // 2^22 = 4,194,304 rows
    runBenchmark("merge join at the worst", N) {
      val df1 = sparkSession.range(N).selectExpr(s"-id * 2 as k1")
      val df2 = sparkSession.range(N).selectExpr(s"-id * 3 as k2")
      val df = df1.join(df2, col("k1") === col("k2"))
      assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortMergeJoinExec]).isDefined)
      df.count()
    }
  }

  test("merge join with duplicates reversed") {
    val N = 2 << 21
    runBenchmark("sort merge join", N) {
      val df1 = sparkSession.range(N)
        .selectExpr(s"-(id * 15485863) % ${N*10} as k1")
      val df2 = sparkSession.range(N)
        .selectExpr(s"-(id * 15485867) % ${N*10} as k2")
      df1.join(df2, col("k1") === col("k2")).count()
    }
  }

  // Runs f with whole-stage codegen disabled (2 iterations), then enabled (3 iterations).
  override def runBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
    val benchmark = new Benchmark(name, cardinality)

    benchmark.addCase(s"$name wholestage off", numIters = 2) { iter =>
      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = false)
      f
    }

    benchmark.addCase(s"$name wholestage on", numIters = 3) { iter =>
      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = true)
      f
    }

    benchmark.run()
  }
}
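
To help read the records that follow, here is a minimal sketch of how the Rate, Per Row and Relative columns appear to relate to the best time and the row count N. The object and method names are hypothetical, and the formulas are inferred from the printed numbers rather than taken from the Benchmark class itself. With N = 2 << 23 and a best time of 6538 ms, they give roughly 389.7 ns per row, which agrees with the first 2.1.2 record.

```scala
object BenchmarkColumns {
  // Row count used by the three sort cases: 2 << 23 == 2^24.
  val SortRows: Long = 2L << 23

  /** Per Row (ns): best time converted to nanoseconds, divided by the row count. */
  def perRowNs(bestMs: Double, rows: Long): Double = bestMs * 1e6 / rows

  /** Rate (M/s): millions of rows processed per second at the best time. */
  def rateMillionsPerSec(bestMs: Double, rows: Long): Double = rows / (bestMs * 1e3)

  /** Relative: baseline best time over this case's best time (higher means faster). */
  def relative(baselineBestMs: Double, bestMs: Double): Double = baselineBestMs / bestMs

  def main(args: Array[String]): Unit = {
    // Best times (ms) copied from the 2.1.2 "sort with one" record.
    val offMs = 6538.0
    val onMs = 6175.0
    println(f"rows           = $SortRows%d")
    println(f"per row (off)  = ${perRowNs(offMs, SortRows)}%.1f ns")
    println(f"rate (off)     = ${rateMillionsPerSec(offMs, SortRows)}%.1f M/s")
    println(f"relative (on)  = ${relative(offMs, onMs)}%.1fX")
  }
}
```

Because the logs print best times rounded to whole milliseconds, values recomputed this way can differ from the printed columns in the last digit.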

2.1.2 Benchmark records

[info] SortExecBenchmark:
Running benchmark: sort with one
  Running case: sort with one wholestage off
  Stopped after 2 iterations, 14683 ms
  Running case: sort with one wholestage on
  Stopped after 3 iterations, 18842 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort with one:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort with one wholestage off                  6538 / 7342          2.6         389.7       1.0X
sort with one wholestage on                   6175 / 6281          2.7         368.1       1.1X

[info] - sort with one (54 seconds, 387 milliseconds)
Running benchmark: sort with two
  Running case: sort with two wholestage off
  Stopped after 2 iterations, 18571 ms
  Running case: sort with two wholestage on
  Stopped after 3 iterations, 26397 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort with two:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort with two wholestage off                  9196 / 9286          1.8         548.1       1.0X
sort with two wholestage on                   8139 / 8799          2.1         485.1       1.1X

[info] - sort with two (1 minute, 4 seconds)
Running benchmark: sort with three
  Running case: sort with three wholestage off
  Stopped after 2 iterations, 28709 ms
  Running case: sort with three wholestage on
  Stopped after 3 iterations, 40878 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort with three:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort with three wholestage off              14038 / 14355          1.2         836.7       1.0X
sort with three wholestage on               13018 / 13626          1.3         775.9       1.1X

[info] - sort with three (1 minute, 37 seconds)
Running benchmark: merge join at the worst
  Running case: merge join at the worst wholestage off
  Stopped after 2 iterations, 7851 ms
  Running case: merge join at the worst wholestage on
  Stopped after 3 iterations, 11256 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

merge join at the worst:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
merge join at the worst wholestage off        3870 / 3926          1.1         922.6       1.0X
merge join at the worst wholestage on         3698 / 3752          1.1         881.7       1.0X

[info] - merge join reverted (27 seconds, 471 milliseconds)
Running benchmark: sort merge join
  Running case: sort merge join wholestage off
  Stopped after 2 iterations, 9358 ms
  Running case: sort merge join wholestage on
  Stopped after 3 iterations, 13661 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort merge join:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort merge join wholestage off                4617 / 4679          0.9        1100.7       1.0X
sort merge join wholestage on                 4306 / 4554          1.0        1026.7       1.1X

[info] - merge join with duplicates reverted (32 seconds, 826 milliseconds)

2.3.1 Benchmark records

[info] SortExecBenchmark:
Running benchmark: sort with one
  Running case: sort with one wholestage off
  Stopped after 2 iterations, 14670 ms
  Running case: sort with one wholestage on
  Stopped after 3 iterations, 18269 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort with one:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort with one wholestage off                  6936 / 7335          2.4         413.4       1.0X
sort with one wholestage on                   6040 / 6090          2.8         360.0       1.1X

[info] - sort with one (54 seconds, 443 milliseconds)
Running benchmark: sort with two
  Running case: sort with two wholestage off
  Stopped after 2 iterations, 18748 ms
  Running case: sort with two wholestage on
  Stopped after 3 iterations, 25809 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort with two:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort with two wholestage off                  9195 / 9374          1.8         548.0       1.0X
sort with two wholestage on                   8459 / 8603          2.0         504.2       1.1X

[info] - sort with two (1 minute, 4 seconds)
Running benchmark: sort with three
  Running case: sort with three wholestage off
  Stopped after 2 iterations, 28472 ms
  Running case: sort with three wholestage on
  Stopped after 3 iterations, 40225 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort with three:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort with three wholestage off              13708 / 14236          1.2         817.1       1.0X
sort with three wholestage on               13291 / 13408          1.3         792.2       1.0X

[info] - sort with three (1 minute, 36 seconds)
Running benchmark: merge join at the worst
  Running case: merge join at the worst wholestage off
  Stopped after 2 iterations, 7856 ms
  Running case: merge join at the worst wholestage on
  Stopped after 3 iterations, 10573 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

merge join at the worst:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
merge join at the worst wholestage off        3810 / 3928          1.1         908.4       1.0X
merge join at the worst wholestage on         3487 / 3525          1.2         831.4       1.1X

[info] - merge join reverted (26 seconds, 664 milliseconds)
Running benchmark: sort merge join
  Running case: sort merge join wholestage off
  Stopped after 2 iterations, 9118 ms
  Running case: sort merge join wholestage on
  Stopped after 3 iterations, 13825 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz

sort merge join:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
sort merge join wholestage off                4450 / 4559          0.9        1061.0       1.0X
sort merge join wholestage on                 4395 / 4608          1.0        1047.9       1.0X

2.1.2 vs 2.3.1

version  case                            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
2.1.2    sort with one wholestage on        6175 / 6281          2.7         368.1       1.1X
2.3.1    sort with one wholestage on        6040 / 6090          2.8         360.0       1.1X
2.1.2    sort with two wholestage on        8139 / 8799          2.1         485.1       1.1X
2.3.1    sort with two wholestage on        8459 / 8603          2.0         504.2       1.1X
2.1.2    sort with three wholestage on    13018 / 13626          1.3         775.9       1.1X
2.3.1    sort with three wholestage on    13291 / 13408          1.3         792.2       1.0X
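
To put the gap between the two versions in perspective, the sketch below computes the percentage change in Per Row (ns) from 2.1.2 to 2.3.1 for the three whole-stage-on sort cases. The object and method names are hypothetical; the input numbers are copied from the comparison above. A positive result means 2.3.1 spent more time per row.

```scala
object VersionComparison {
  /** Signed percentage change of `candidate` relative to `baseline`. */
  def percentChange(baseline: Double, candidate: Double): Double =
    (candidate - baseline) / baseline * 100.0

  // (case name, 2.1.2 Per Row ns, 2.3.1 Per Row ns), whole-stage codegen on.
  val perRowNs: Seq[(String, Double, Double)] = Seq(
    ("sort with one", 368.1, 360.0),
    ("sort with two", 485.1, 504.2),
    ("sort with three", 775.9, 792.2)
  )

  def main(args: Array[String]): Unit =
    perRowNs.foreach { case (name, v212, v231) =>
      // Positive: 2.3.1 slower per row; negative: 2.3.1 faster per row.
      println(f"$name%-16s ${percentChange(v212, v231)}%+.1f%%")
    }
}
```

All three changes are within a few percent in either direction, which is small next to the gap between Best and Avg inside a single run.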

Notes

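  1. Benchmark results fluctuate to some degree, and different machines may produce different numbers.
  2. The data above come from the third test run. The first run is affected by the memory that sbt compilation takes up, so I ran killall java to kill all Java processes; the second run then warms up the JVM, and the third run's results are the ones recorded.
  3. The cases are fairly simple and the difference between the two versions is not very pronounced. Perhaps Spark's AlphaSort-like sorting approach has little effect on primitive types.
  4. In this all-integer scenario, 2.1.2 has a slight edge over 2.3.1 on the two-key and three-key sorts (485.1 vs 504.2 ns and 775.9 vs 792.2 ns per row), while the single-key sort is slightly faster on 2.3.1 (360.0 vs 368.1 ns). The gap is negligible either way.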
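
The warm-up idea in note 2 can also be applied inside a single JVM. The following is only a sketch under that assumption, not the procedure used for the numbers above (which relied on separate test runs); `WarmupTiming`, `measure` and `bestAndAvgMs` are hypothetical names.

```scala
object WarmupTiming {
  /**
   * Runs `body` `warmups` times without timing it, then `iters` more times,
   * returning the elapsed nanoseconds of each timed run.
   */
  def measure(warmups: Int, iters: Int)(body: => Unit): Seq[Long] = {
    (0 until warmups).foreach(_ => body) // discarded: lets the JIT compile hot paths
    (0 until iters).map { _ =>
      val start = System.nanoTime()
      body
      System.nanoTime() - start
    }
  }

  /** Best and average time in milliseconds, matching the Best/Avg column layout. */
  def bestAndAvgMs(runs: Seq[Long]): (Double, Double) =
    (runs.min / 1e6, runs.sum.toDouble / runs.size / 1e6)

  def main(args: Array[String]): Unit = {
    // Stand-in workload: sort 1,000,000 descending longs.
    val runs = measure(warmups = 2, iters = 3) {
      Array.tabulate(1000000)(i => -i.toLong * 2).sorted
    }
    val (best, avg) = bestAndAvgMs(runs)
    println(f"best = $best%.1f ms, avg = $avg%.1f ms")
  }
}
```

Discarding the first iterations keeps one-off costs such as class loading and JIT compilation out of the recorded times, but it does not remove run-to-run noise from the machine itself.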

Conclusion

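  1. No conclusion can be drawn yet. The next step is to add more varied test cases and keep testing to try to reproduce the regression.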
