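JMH helps us understand the code we write, but if the benchmark itself is flawed, its numbers are of little use as guidance.
Modern JVMs are increasingly smart: they can optimize our code at compile time, at class-loading time, and later at run time, for example through dead code elimination, constant folding, loop unrolling, and even profile-guided optimization.
Dead code elimination means the JVM removes code that has no effect on its surrounding context, including computations whose results are never used.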
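import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

import static java.lang.Math.PI;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Measurement(iterations = 5)
@Warmup(iterations = 5)
@Threads(5)
@State(Scope.Thread)
public class JMHExample13 {

    @Benchmark
    public void baseLine() {
        // empty method
    }

    @Benchmark
    public void measureLog1() {
        // performs a math operation, but the result never leaves the method
        Math.log(PI);
    }

    @Benchmark
    public void measureLog2() {
        // result comes from a math operation and is used on the next line
        double result = Math.log(PI);
        // a second operation on result; its outcome is neither stored nor
        // returned, and nothing computes with it afterwards
        Math.log(result);
    }

    @Benchmark
    public double measureLog3() {
        // the result is returned, so it cannot be treated as dead code
        return Math.log(PI);
    }

    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(JMHExample13.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opts).run();
    }
}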
Benchmark                 Mode  Cnt  Score    Error  Units
JMHExample13.baseLine     avgt    5  0.001 ±  0.001  us/op
JMHExample13.measureLog1  avgt    5  ≈ 10⁻³          us/op
JMHExample13.measureLog2  avgt    5  ≈ 10⁻³          us/op
JMHExample13.measureLog3  avgt    5  0.003 ±  0.001  us/op
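The scores of measureLog1 and measureLog2 are almost identical to baseLine, so we can be confident that the computations in these two methods were eliminated; code like this is called dead code. measureLog3 is not eliminated: it returns its result, so it really does consume CPU time.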
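import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 5)
@Warmup(iterations = 5)
@State(Scope.Thread)
public class JMHExample14 {

    double x1 = Math.PI;
    double x2 = Math.PI * 2;

    @Benchmark
    public double baseLine() {
        return Math.pow(x1, 2);
    }

    @Benchmark
    public double powButReturn() {
        // the first result is unused, so this call is a dead code candidate
        Math.pow(x1, 2);
        return Math.pow(x2, 2);
    }

    @Benchmark
    public double powThenAdd() {
        // both results contribute to the return value
        return Math.pow(x1, 2) + Math.pow(x2, 2);
    }

    @Benchmark
    public void useBlackHole(Blackhole hole) {
        // no return value; the Blackhole keeps both results alive
        hole.consume(Math.pow(x1, 2));
        hole.consume(Math.pow(x2, 2));
    }

    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(JMHExample14.class.getSimpleName())
                .build();
        new Runner(opts).run();
    }
}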
Benchmark                  Mode  Cnt  Score   Error  Units
JMHExample14.baseLine      avgt   50  2.133 ± 0.125  ns/op
JMHExample14.powButReturn  avgt   50  2.030 ± 0.037  ns/op
JMHExample14.powThenAdd    avgt   50  2.381 ± 0.101  ns/op
JMHExample14.useBlackHole  avgt   50  4.080 ± 0.084  ns/op
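baseLine and powButReturn perform almost identically, because the first pow call in powButReturn produces a result nobody uses and is eliminated. powThenAdd takes slightly more CPU time than those two because it really performs two pow operations. useBlackHole does not combine the two results in any way, but calling Blackhole's consume method costs some CPU time of its own. Even so, if every benchmark method without a return value hands its results to Blackhole.consume, all of them run under the same conditions.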
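To make the idea behind consume concrete, here is a minimal plain-Java sketch of a "sink". It is a deliberately simplified stand-in and not JMH's actual Blackhole implementation; the names SimpleSink and SinkDemo are made up for this illustration. The assumption is that a volatile write to a field of a reachable object is a side effect the JIT generally has to preserve, so the value passed in must actually be computed.

```java
// Simplified stand-in for a "black hole"; NOT how JMH's Blackhole is implemented.
final class SimpleSink {
    // Writing to a volatile field is a visible side effect, which is what
    // keeps the computation feeding consume() from being treated as dead code.
    private volatile double last;

    void consume(double value) {
        last = value;
    }

    double last() {
        return last;
    }
}

final class SinkDemo {
    // Same shape as useBlackHole: neither result is returned directly,
    // both are handed to the sink instead.
    static double powBoth(SimpleSink sink, double x1, double x2) {
        sink.consume(Math.pow(x1, 2));
        sink.consume(Math.pow(x2, 2));
        return sink.last(); // returned only so the demo can be checked
    }

    public static void main(String[] args) {
        SimpleSink sink = new SimpleSink();
        System.out.println(powBoth(sink, Math.PI, Math.PI * 2));
    }
}
```

In a real benchmark the Blackhole parameter injected by JMH should be used; the sketch only shows why handing a value to a sink has a cost and why that cost is the same for every method that uses it.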
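import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

import static java.lang.Math.log;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 10)
@Warmup(iterations = 5)
@State(Scope.Thread)
public class JMHExample15 {

    // final fields: their values are known constants
    private final double x1 = 124.456;
    private final double x2 = 342.456;

    // non-final fields: must be read at run time
    private double y1 = 124.456;
    private double y2 = 342.456;

    @Benchmark
    public double returnDirect() {
        return 42_620.79997;
    }

    @Benchmark
    public double returnCaculate_1() {
        return x1 * x2;
    }

    @Benchmark
    public double returnCaculate_2() {
        return log(y1) * log(y2);
    }

    @Benchmark
    public double returnCaculate_3() {
        return log(x1) * log(x2);
    }

    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(JMHExample15.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opts).run();
    }
}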
Benchmark                      Mode  Cnt   Score   Error  Units
JMHExample15.returnCaculate_1  avgt   10   1.989 ± 0.157  ns/op
JMHExample15.returnCaculate_2  avgt   10  41.732 ± 0.724  ns/op
JMHExample15.returnCaculate_3  avgt   10   1.964 ± 0.072  ns/op
JMHExample15.returnDirect      avgt   10   1.968 ± 0.115  ns/op
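As we can see, returnCaculate_1, returnCaculate_3, and returnDirect score almost the same. This means constant folding took place during compiler optimization: because x1 and x2 are final, these methods need no computation at all at run time and simply return the result. returnCaculate_2 reads the non-final fields y1 and y2, so it gets no such optimization and the logarithms are actually computed.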
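The difference between final and non-final fields can be observed without JMH. The sketch below uses String fields instead of double, purely because folding is then visible through string interning: a constant expression is replaced by a single interned literal at compile time, while a run-time concatenation creates a new object. It only demonstrates the javac part of constant folding (the part that applies to x1 * x2); folding done later by the JIT cannot be observed this way. The class name ConstantFoldingDemo is hypothetical.

```java
final class ConstantFoldingDemo {
    // final + constant initializer: compile-time constants, like x1 and x2
    private final String a = "124";
    private final String b = "456";

    // non-final: ordinary fields read at run time, like y1 and y2
    private String c = "124";
    private String d = "456";

    boolean foldedAtCompileTime() {
        // a + b is a constant expression, so javac emits the literal "124456";
        // both sides refer to the same interned String object.
        return (a + b) == "124456";
    }

    boolean computedAtRunTime() {
        // c + d is concatenated at run time into a fresh String object,
        // so the reference comparison fails even though the text is equal.
        return (c + d) == "124456";
    }

    public static void main(String[] args) {
        ConstantFoldingDemo demo = new ConstantFoldingDemo();
        System.out.println(demo.foldedAtCompileTime());
        System.out.println(demo.computedAtRunTime());
    }
}
```

The reference comparison with == is intentional here; it is what exposes whether the expression was folded.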
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 10)
@Warmup(iterations = 5)
@State(Scope.Thread)
public class JMHExample16 {

    private int x = 1;
    private int y = 2;

    @Benchmark
    public int measure() {
        return (x + y);
    }

    // repeats the same addition the given number of times
    public int loopCompute(int times) {
        int result = 0;
        for (int i = 0; i < times; i++) {
            result += (x + y);
        }
        return result;
    }

    // the default value is 1; written out here for clarity
    @OperationsPerInvocation(1)
    @Benchmark
    public int measureLoop_1() {
        return loopCompute(1);
    }

    @OperationsPerInvocation(10)
    @Benchmark
    public int measureLoop_10() {
        return loopCompute(10);
    }

    @OperationsPerInvocation(100)
    @Benchmark
    public int measureLoop_100() {
        return loopCompute(100);
    }

    @OperationsPerInvocation(1000)
    @Benchmark
    public int measureLoop_1000() {
        return loopCompute(1000);
    }

    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(JMHExample16.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opts).run();
    }
}
Benchmark                      Mode  Cnt  Score   Error  Units
JMHExample16.measure           avgt   10  2.345 ± 0.118  ns/op
JMHExample16.measureLoop_1     avgt   10  2.440 ± 0.407  ns/op
JMHExample16.measureLoop_10    avgt   10  0.667 ± 0.013  ns/op
JMHExample16.measureLoop_100   avgt   10  0.033 ± 0.009  ns/op
JMHExample16.measureLoop_1000  avgt   10  0.043 ± 0.002  ns/op
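measure and measureLoop_1 are nearly equivalent. measureLoop_10 performs the same operation 10 times, and we clearly cannot compare the CPU time of 10 operations with that of a single one; @OperationsPerInvocation(10) therefore tells JMH to count every call of measureLoop_10 as 10 operations. The more iterations the loop has, the more of it the JIT unrolls and folds, which is why the per-operation score falls far below that of measure.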
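The normalization behind @OperationsPerInvocation is plain arithmetic, sketched below. The helper nanosPerOp and its parameters are hypothetical and only illustrate the calculation; they are not part of the JMH API. Read backwards, the table above implies that one call of measureLoop_10 took about 0.667 × 10 ≈ 6.67 ns, and one call of measureLoop_1000 about 0.043 × 1000 = 43 ns.

```java
final class PerOpScore {
    // Hypothetical helper: average time per operation, given the total time
    // measured, the number of benchmark-method calls, and how many logical
    // operations each call is declared to perform.
    static double nanosPerOp(double totalNanos, long invocations, int opsPerInvocation) {
        return totalNanos / ((double) invocations * opsPerInvocation);
    }

    public static void main(String[] args) {
        // 100 calls taking 1000 ns in total:
        System.out.println(nanosPerOp(1000.0, 100, 1));  // one op per call
        System.out.println(nanosPerOp(1000.0, 100, 10)); // ten ops per call
    }
}
```

A per-operation score that keeps shrinking as the loop grows is therefore a sign that the loop body is no longer executed once per iteration, not that the addition itself became faster.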
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Measurement(iterations = 10)
@Warmup(iterations = 5)
@State(Scope.Thread)
public class JMHExample17 {

    interface Inc {
        int inc();
    }

    // Inc1 and Inc2 are identical apart from their class names
    public static class Inc1 implements Inc {
        private int i = 0;

        @Override
        public int inc() {
            return ++i;
        }
    }

    public static class Inc2 implements Inc {
        private int i = 0;

        @Override
        public int inc() {
            return ++i;
        }
    }

    private Inc inc1 = new Inc1();
    private Inc inc2 = new Inc2();

    // shared by all three benchmarks, so they also share its profile
    private int measure(Inc inc) {
        int result = 0;
        for (int i = 0; i < 10; i++) {
            result += inc.inc();
        }
        return result;
    }

    @Benchmark
    public int measure_inc_1() {
        return this.measure(inc1);
    }

    @Benchmark
    public int measure_inc_2() {
        return this.measure(inc2);
    }

    @Benchmark
    public int measure_inc_3() {
        return measure(inc1);
    }

    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(JMHExample17.class.getSimpleName())
                // 0 forks: run every benchmark inside this JVM
                .forks(0)
                .build();
        new Runner(opts).run();
    }
}
Benchmark                   Mode  Cnt  Score   Error  Units
JMHExample17.measure_inc_1  avgt   10  0.002 ± 0.001  us/op
JMHExample17.measure_inc_2  avgt   10  0.009 ± 0.002  us/op
JMHExample17.measure_inc_3  avgt   10  0.012 ± 0.001  us/op
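With forks set to 0, every benchmark method runs in the same JVM process as JMHExample17 itself, so the profile collected in that process can leak into each benchmark. measure_inc_1 and measure_inc_2 are practically identical, yet their scores differ; measure_inc_1 and measure_inc_3 are exactly the same code, and they still report different numbers.
With forks set to 1, the results look like this: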
Benchmark                   Mode  Cnt  Score   Error  Units
JMHExample17.measure_inc_1  avgt   10  0.003 ± 0.001  us/op
JMHExample17.measure_inc_2  avgt   10  0.003 ± 0.001  us/op
JMHExample17.measure_inc_3  avgt   10  0.003 ± 0.001  us/op
Each benchmark now runs in a freshly started process, so the profile of one benchmark can no longer affect another.
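One plausible way to picture the shared profile is as a record of which receiver classes have reached the single inc.inc() call site inside measure. The toy model below tracks exactly that in plain Java. It is only an illustration under that assumption: HotSpot does not store its profiles in a Set, and the names CallSiteProfileDemo and receiverTypes are invented for this sketch. The point is that a call site which has seen one class can usually be optimized more aggressively than one which has seen two, and within a single JVM that history is never reset between benchmarks.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class CallSiteProfileDemo {
    interface Inc {
        int inc();
    }

    static final class Inc1 implements Inc {
        private int i;
        public int inc() { return ++i; }
    }

    static final class Inc2 implements Inc {
        private int i;
        public int inc() { return ++i; }
    }

    // Toy stand-in for the receiver-type profile of the inc.inc() call site.
    private final Set<Class<?>> seenReceivers = new LinkedHashSet<>();

    int measure(Inc inc) {
        int result = 0;
        for (int i = 0; i < 10; i++) {
            seenReceivers.add(inc.getClass()); // what a profiler would record
            result += inc.inc();
        }
        return result;
    }

    // 1 = the site has only seen one class, 2 = it has seen two, and so on.
    int receiverTypes() {
        return seenReceivers.size();
    }

    public static void main(String[] args) {
        CallSiteProfileDemo demo = new CallSiteProfileDemo();
        demo.measure(new Inc1());
        System.out.println(demo.receiverTypes());
        demo.measure(new Inc2());
        System.out.println(demo.receiverTypes());
    }
}
```

In this model, once an Inc2 instance has passed through measure, a later call with an Inc1 instance still sees a call site with two recorded classes, which matches measure_inc_3 behaving differently from measure_inc_1 when forks is 0. With one fork per benchmark, each process starts with an empty record.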
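import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Measurement(iterations = 10)
@Warmup(iterations = 5)
@State(Scope.Group)
public class JMHExample20 {

    // selects which Map implementation is benchmarked
    @Param({"1", "2", "3", "4"})
    private int type;

    private Map<Integer, Integer> map;

    @Setup
    public void setup() {
        switch (type) {
            case 1:
                map = new ConcurrentHashMap<>();
                break;
            case 2:
                map = new ConcurrentSkipListMap<>();
                break;
            case 3:
                map = new Hashtable<>();
                break;
            case 4:
                map = Collections.synchronizedMap(new HashMap<>());
                break;
            default:
                throw new IllegalArgumentException("illegal map type.");
        }
    }

    // 5 writer threads in group "g"
    @Benchmark
    @GroupThreads(5)
    @Group("g")
    public void put() {
        int random = randomIntValue();
        map.put(random, random);
    }

    private int randomIntValue() {
        return (int) Math.ceil(Math.random() * 600000);
    }

    // 5 reader threads in the same group
    @Benchmark
    @GroupThreads(5)
    @Group("g")
    public Integer get() {
        return map.get(randomIntValue());
    }

    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(JMHExample20.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opts).run();
    }
}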
# Run complete. Total time: 00:01:04
Benchmark           (type)  Mode  Cnt  Score   Error  Units
JMHExample20.g           1  avgt   10  2.131 ± 0.046  us/op
JMHExample20.g:get       1  avgt   10  2.150 ± 0.025  us/op
JMHExample20.g:put       1  avgt   10  2.112 ± 0.081  us/op
JMHExample20.g           2  avgt   10  3.441 ± 0.781  us/op
JMHExample20.g:get       2  avgt   10  3.582 ± 0.833  us/op
JMHExample20.g:put       2  avgt   10  3.299 ± 0.731  us/op
JMHExample20.g           3  avgt   10  4.061 ± 0.110  us/op
JMHExample20.g:get       3  avgt   10  4.513 ± 0.126  us/op
JMHExample20.g:put       3  avgt   10  3.610 ± 0.109  us/op
JMHExample20.g           4  avgt   10  4.743 ± 0.505  us/op
JMHExample20.g:get       4  avgt   10  5.687 ± 0.660  us/op
JMHExample20.g:put       4  avgt   10  3.800 ± 0.356  us/op
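In this test, with 5 threads calling put and 5 calling get concurrently, ConcurrentHashMap performs best of the four implementations.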