Advanced JMH Usage


Preface

JMH helps us understand the performance of the code we write, but if the JMH benchmark itself is flawed, its results offer little guidance.

Writing Correct Microbenchmarks

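Modern Java virtual machines are increasingly smart: during early compilation, during class loading, and later at run time they can optimize our code, for example through dead code elimination, constant folding, loop unrolling, and even profile-guided optimization.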

Avoiding DCE (Dead Code Elimination)

Dead code is code that is unrelated to its surrounding context, or whose computed result is never used; the JVM removes such code for us.

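import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

import static java.lang.Math.PI;

@BenchmarkMode(Mode.AverageTime)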
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Measurement(iterations = 5)
@Warmup(iterations = 5)
@Threads(5)
@State(Scope.Thread)
public class JMHExample13 {

    @Benchmark
    public void baseLine() {
        // empty method
    }

    @Benchmark
    public void measureLog1() {
        // performs a math computation, but only locally; the result is never used
        Math.log(PI);
    }

    @Benchmark
    public void measureLog2() {
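        // result comes from a math computation and is used on the next line
        double result = Math.log(PI);
        // result is fed into another computation, but that value is neither stored nor returned, and is never used again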
        Math.log(result);
    }

    @Benchmark
    public double measureLog3() {
        return Math.log(PI);
    }

    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(JMHExample13.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opts).run();
    }
}

Benchmark                 Mode  Cnt   Score    Error  Units
JMHExample13.baseLine     avgt    5   0.001 ±  0.001  us/op
JMHExample13.measureLog1  avgt    5  ≈ 10⁻³           us/op
JMHExample13.measureLog2  avgt    5  ≈ 10⁻³           us/op
JMHExample13.measureLog3  avgt    5   0.003 ±  0.001  us/op

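The scores of measureLog1 and measureLog2 are almost identical to that of baseLine, so we can conclude that the computations in both methods were eliminated; such code is called dead code. measureLog3 is not eliminated: it returns its result, so the computation actually consumes CPU time.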

Using Blackhole to Avoid Dead Code

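import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)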
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 5)
@Warmup(iterations = 5)
@State(Scope.Thread)
public class JMHExample14 {

    double x1 = Math.PI;
    double x2 = Math.PI * 2;

    @Benchmark
    public double baseLine() {
        return Math.pow(x1, 2);
    }

    @Benchmark
    public double powButReturn() {
        Math.pow(x1, 2);
        return Math.pow(x2, 2);
    }

    @Benchmark
    public double powThenAdd() {
        return Math.pow(x1, 2) + Math.pow(x2, 2);
    }

    @Benchmark
    public void useBlackHole(Blackhole hole) {
        hole.consume(Math.pow(x1, 2));
        hole.consume(Math.pow(x2, 2));
    }

    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(JMHExample14.class.getSimpleName())
                .build();
        new Runner(opts).run();
    }
}

Benchmark                  Mode  Cnt  Score   Error  Units
JMHExample14.baseLine      avgt   50  2.133 ± 0.125  ns/op
JMHExample14.powButReturn  avgt   50  2.030 ± 0.037  ns/op
JMHExample14.powThenAdd    avgt   50  2.381 ± 0.101  ns/op
JMHExample14.useBlackHole  avgt   50  4.080 ± 0.084  ns/op

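baseLine and powButReturn perform almost identically, because the first Math.pow call in powButReturn is dead code and gets eliminated. powThenAdd takes slightly more CPU time than these two because it really does perform two pow operations. useBlackHole does not combine the two results in any way, but calling Blackhole's consume method has a cost of its own, so it takes more CPU time. Although Blackhole uses some CPU, having every benchmark method without a return value consume its results through Blackhole ensures that all of them run under the same conditions.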
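To make the idea of "consuming" a value more concrete, here is a minimal plain-Java sketch of a hand-rolled sink. It is not how JMH implements Blackhole; the class name NaiveSink and its methods are hypothetical, and the volatile write is only a rough stand-in for keeping a computed value observable.

```java
public class NaiveSink {
    // Hypothetical stand-in, NOT JMH's Blackhole: a volatile write makes
    // each consumed value observable instead of leaving it unused.
    private volatile double last;
    private long consumed;

    public void consume(double value) {
        last = value;
        consumed++;
    }

    public double last() {
        return last;
    }

    public long consumedCount() {
        return consumed;
    }

    public static void main(String[] args) {
        NaiveSink sink = new NaiveSink();
        double x1 = Math.PI;
        double x2 = Math.PI * 2;
        // Both results are handed to the sink, so neither is left unused.
        sink.consume(Math.pow(x1, 2));
        sink.consume(Math.pow(x2, 2));
        System.out.println(sink.consumedCount() + " values consumed, last = " + sink.last());
    }
}
```

A real JIT compiler may still optimize around such a naive sink (for example when the sink object never escapes), which is one reason to rely on Blackhole inside actual JMH benchmarks.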

Avoiding Constant Folding

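import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

import static java.lang.Math.log;

@BenchmarkMode(Mode.AverageTime)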
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 10)
@Warmup(iterations = 5)
@State(Scope.Thread)
public class JMHExample15 {
    private final double x1 = 124.456;
    private final double x2 = 342.456;

    private double y1 = 124.456;
    private double y2 = 342.456;

    @Benchmark
    public double returnDirect() {
        return 42_620.79997;
    }

    @Benchmark
    public double returnCaculate_1() {
        return x1 * x2;
    }

    @Benchmark
    public double returnCaculate_2() {
        return log(y1) * log(y2);
    }

    @Benchmark
    public double returnCaculate_3() {
        return log(x1) * log(x2);
    }

    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(JMHExample15.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opts).run();
    }
}

Benchmark                      Mode  Cnt   Score   Error  Units
JMHExample15.returnCaculate_1  avgt   10   1.989 ± 0.157  ns/op
JMHExample15.returnCaculate_2  avgt   10  41.732 ± 0.724  ns/op
JMHExample15.returnCaculate_3  avgt   10   1.964 ± 0.072  ns/op
JMHExample15.returnDirect      avgt   10   1.968 ± 0.115  ns/op

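As we can see, the scores of returnCaculate_1, returnCaculate_3, and returnDirect are nearly identical. This means constant folding took place during compiler optimization: these methods do not need to compute anything at run time and simply return the result. returnCaculate_2, which reads non-final fields, is not optimized in this way.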
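The role of the final keyword can be made visible outside JMH. The sketch below covers only the source-compiler (javac) side of constant folding and uses String identity as an observable signal, because the Java language specification requires constant expressions of type String to be interned. The class and method names are made up for illustration, and the JIT-level folding of log(x1) * log(x2) cannot be observed this way.

```java
public class ConstantFoldingDemo {
    // A final field initialized with a literal is a "constant variable".
    static final String A = "JMH";
    // Not final, so the compiler cannot treat it as a constant.
    static String b = "JMH";

    // A + "-fold" is a constant expression: javac folds it into the single
    // literal "JMH-fold", which is interned, so the identity check succeeds.
    public static boolean foldedAtCompileTime() {
        String s = A + "-fold";
        return s == "JMH-fold";
    }

    // b + "-fold" is evaluated at run time and creates a new String object,
    // so the identity check fails even though the contents are equal.
    public static boolean computedAtRunTime() {
        String s = b + "-fold";
        return s == "JMH-fold";
    }

    public static void main(String[] args) {
        System.out.println("final field folded:     " + foldedAtCompileTime());
        System.out.println("non-final field folded: " + computedAtRunTime());
    }
}
```

This mirrors the difference between x1/x2 (final) and y1/y2 (non-final) in the benchmark: benchmark inputs should come from non-final state fields so that the computation cannot be replaced by a precomputed constant.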

Avoiding Loop Unrolling

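import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)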
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 10)
@Warmup(iterations = 5)
@State(Scope.Thread)
public class JMHExample16 {
    private int x = 1;
    private int y = 2;

    @Benchmark
    public int measure() {
        return (x + y);
    }

    public int loopCompute(int times) {
        int result = 0;
        for (int i = 0; i < times; i++) {
            result += (x + y);
        }
        return result;
    }

    @OperationsPerInvocation
    @Benchmark
    public int measureLoop_1() {
        return loopCompute(1);
    }

    @OperationsPerInvocation(10)
    @Benchmark
    public int measureLoop_10() {
        return loopCompute(10);
    }

    @OperationsPerInvocation(100)
    @Benchmark
    public int measureLoop_100() {
        return loopCompute(100);
    }

    @OperationsPerInvocation(1000)
    @Benchmark
    public int measureLoop_1000() {
        return loopCompute(1000);
    }

    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(JMHExample16.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opts).run();
    }
}

Benchmark                      Mode  Cnt  Score   Error  Units
JMHExample16.measure           avgt   10  2.345 ± 0.118  ns/op
JMHExample16.measureLoop_1     avgt   10  2.440 ± 0.407  ns/op
JMHExample16.measureLoop_10    avgt   10  0.667 ± 0.013  ns/op
JMHExample16.measureLoop_100   avgt   10  0.033 ± 0.009  ns/op
JMHExample16.measureLoop_1000  avgt   10  0.043 ± 0.002  ns/op

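measure and measureLoop_1 are practically equivalent. measureLoop_10 performs the same operation 10 times, and we obviously cannot compare the CPU time of 10 operations with that of a single one; that is why @OperationsPerInvocation(10) tells JMH to count each invocation of measureLoop_10 as 10 operations. Even so, the per-operation scores fall far below that of measure as the loop count grows: the more iterations the loop has, the more the JIT unrolls and otherwise optimizes it, so a loop inside a benchmark method does not measure the true cost of a single operation.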
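The arithmetic behind @OperationsPerInvocation is simple: in average-time mode the reported score is the time of one invocation divided by the declared number of operations. The following sketch applies that formula to the scores in the table above; the class and method names are hypothetical and are not part of JMH.

```java
public class PerOperationScore {
    // Reported score: time of one invocation divided by the declared operations.
    public static double perOpNanos(double nanosPerInvocation, int opsPerInvocation) {
        if (opsPerInvocation <= 0) {
            throw new IllegalArgumentException("opsPerInvocation must be positive");
        }
        return nanosPerInvocation / opsPerInvocation;
    }

    // Inverse direction: recover the cost of a whole invocation from a reported score.
    public static double perInvocationNanos(double reportedNanosPerOp, int opsPerInvocation) {
        return reportedNanosPerOp * opsPerInvocation;
    }

    public static void main(String[] args) {
        // Scores of measureLoop_1, _10, _100 and _1000 taken from the result table.
        double[] reported = {2.440, 0.667, 0.033, 0.043};
        int[] ops = {1, 10, 100, 1000};
        for (int i = 0; i < ops.length; i++) {
            System.out.printf("%4d ops: %.3f ns/op reported, about %.2f ns per invocation%n",
                    ops[i], reported[i], perInvocationNanos(reported[i], ops[i]));
        }
    }
}
```

Seen this way, a whole invocation of measureLoop_100 costs roughly as much as one or two calls of measure, which is only plausible if the loop body was heavily optimized.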

Using Fork to Avoid Profile-Guided Optimizations

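import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)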
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Measurement(iterations = 10)
@Warmup(iterations = 5)
@State(Scope.Thread)
public class JMHExample17 {
    interface Inc{
        int inc ();
    }

    public static class Inc1 implements Inc {
        private int i = 0;
        @Override
        public int inc() {
            return ++i;
        }
    }

    public static class Inc2 implements Inc {
        private int i = 0;
        @Override
        public int inc() {
            return ++i;
        }
    }

    private Inc inc1 = new Inc1();
    private Inc inc2 = new Inc2();

    private int measure(Inc inc) {
        int result = 0;
        for (int i = 0; i < 10; i++) {
            result += inc.inc();
        }
        return result;
    }

    @Benchmark
    public int measure_inc_1() {
        return this.measure(inc1);
    }

    @Benchmark
    public int measure_inc_2() {
        return this.measure(inc2);
    }

    @Benchmark
    public int measure_inc_3() {
        return measure(inc1);
    }

    public static void main(String[] args) throws RunnerException {

        Options opts = new OptionsBuilder()
                .include(JMHExample17.class.getSimpleName())
                .forks(0)
                .build();
        new Runner(opts).run();
    }
}

Benchmark                   Mode  Cnt  Score    Error  Units
JMHExample17.measure_inc_1  avgt   10  0.002 ±  0.001  us/op
JMHExample17.measure_inc_2  avgt   10  0.009 ±  0.002  us/op
JMHExample17.measure_inc_3  avgt   10  0.012 ±  0.001  us/op

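With forks set to 0, every benchmark method runs in the same JVM process as JMHExample17 itself, so the profile collected in that process can leak into each benchmark. measure_inc_1 and measure_inc_2 are practically identical, yet their scores differ; measure_inc_1 and measure_inc_3 run exactly the same code and still report different numbers. A likely explanation is that the inc.inc() call site sees only Inc1 while the first benchmark runs, and that its profile becomes mixed once Inc2 has been observed as well.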
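The notion of a shared profile can be illustrated with a toy model. In the sketch below, a plain Set plays the role of the per-call-site type profile; this is a hypothetical stand-in for illustration only, since the real JVM collects and uses such profiles internally, and the class name CallSiteProfileToy is invented.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class CallSiteProfileToy {
    public interface Inc {
        int inc();
    }

    public static class Inc1 implements Inc {
        private int i = 0;
        @Override
        public int inc() {
            return ++i;
        }
    }

    public static class Inc2 implements Inc {
        private int i = 0;
        @Override
        public int inc() {
            return ++i;
        }
    }

    // Toy "type profile" of the single inc.inc() call site below.
    private final Set<Class<?>> receiversSeen = new LinkedHashSet<>();

    public int measure(Inc inc) {
        int result = 0;
        for (int i = 0; i < 10; i++) {
            receiversSeen.add(inc.getClass()); // record which receiver type reached the call site
            result += inc.inc();
        }
        return result;
    }

    public int receiverTypeCount() {
        return receiversSeen.size();
    }

    public static void main(String[] args) {
        CallSiteProfileToy sharedJvm = new CallSiteProfileToy();
        sharedJvm.measure(new Inc1());
        System.out.println("after first benchmark:  " + sharedJvm.receiverTypeCount() + " receiver type(s)");
        sharedJvm.measure(new Inc2());
        System.out.println("after second benchmark: " + sharedJvm.receiverTypeCount() + " receiver type(s)");
    }
}
```

Within one process the recorded set only grows, so later benchmarks run against a profile shaped by earlier ones; a freshly started process begins with an empty profile.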

With forks set to 1, the results look like this:

Benchmark                   Mode  Cnt  Score    Error  Units
JMHExample17.measure_inc_1  avgt   10  0.003 ±  0.001  us/op
JMHExample17.measure_inc_2  avgt   10  0.003 ±  0.001  us/op
JMHExample17.measure_inc_3  avgt   10  0.003 ±  0.001  us/op

With forking enabled, each benchmark runs in a newly started JVM process.

Performance Comparison of Thread-Safe Map Implementations

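import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)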
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Measurement(iterations = 10)
@Warmup(iterations = 5)
@State(Scope.Group)
public class JMHExample20 {
    @Param({"1", "2", "3", "4"})
    private int type;

    private Map<Integer, Integer> map;

    @Setup
    public void setup() {
        switch (type) {
            case 1:
                map = new ConcurrentHashMap<>();
                break;
            case 2:
                map = new ConcurrentSkipListMap<>();
                break;
            case 3:
                map = new Hashtable<>();
                break;
            case 4:
                map = Collections.synchronizedMap(new HashMap<>());
                break;
            default:
                throw new IllegalArgumentException("illegal map type.");
        }
    }

    @Benchmark
    @GroupThreads(5)
    @Group("g")
    public void put() {
        int random = randomIntValue();
        map.put(random, random);
    }

    private int randomIntValue() {
        return (int) Math.ceil(Math.random() * 600000);
    }

    @Benchmark
    @GroupThreads(5)
    @Group("g")
    public Integer get() {
        return map.get(randomIntValue());
    }

    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(JMHExample20.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opts).run();
    }
}

# Run complete. Total time: 00:01:04

Benchmark           (type)  Mode  Cnt  Score   Error  Units
JMHExample20.g           1  avgt   10  2.131 ± 0.046  us/op
JMHExample20.g:get       1  avgt   10  2.150 ± 0.025  us/op
JMHExample20.g:put       1  avgt   10  2.112 ± 0.081  us/op
JMHExample20.g           2  avgt   10  3.441 ± 0.781  us/op
JMHExample20.g:get       2  avgt   10  3.582 ± 0.833  us/op
JMHExample20.g:put       2  avgt   10  3.299 ± 0.731  us/op
JMHExample20.g           3  avgt   10  4.061 ± 0.110  us/op
JMHExample20.g:get       3  avgt   10  4.513 ± 0.126  us/op
JMHExample20.g:put       3  avgt   10  3.610 ± 0.109  us/op
JMHExample20.g           4  avgt   10  4.743 ± 0.505  us/op
JMHExample20.g:get       4  avgt   10  5.687 ± 0.660  us/op
JMHExample20.g:put       4  avgt   10  3.800 ± 0.356  us/op

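In this test, ConcurrentHashMap (type 1) performs best at about 2.1 us/op for the group, followed by ConcurrentSkipListMap (type 2), Hashtable (type 3), and Collections.synchronizedMap (type 4).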
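The numeric type values make the result table hard to read. One possible tweak, sketched below in plain Java, is to select the map by name; JMH's @Param also accepts String fields, so the same names could serve as parameter values. The class ThreadSafeMaps and its methods are hypothetical, and fillConcurrently is only a correctness check, not a benchmark.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class ThreadSafeMaps {
    // Creates one of the four thread-safe maps compared above, selected by name.
    public static Map<Integer, Integer> create(String kind) {
        switch (kind) {
            case "ConcurrentHashMap":
                return new ConcurrentHashMap<>();
            case "ConcurrentSkipListMap":
                return new ConcurrentSkipListMap<>();
            case "Hashtable":
                return new Hashtable<>();
            case "synchronizedMap":
                return Collections.synchronizedMap(new HashMap<>());
            default:
                throw new IllegalArgumentException("illegal map type: " + kind);
        }
    }

    // Lets several threads put disjoint key ranges and returns the final size.
    public static int fillConcurrently(Map<Integer, Integer> map, int threads, int keysPerThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * keysPerThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < keysPerThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        String[] kinds = {"ConcurrentHashMap", "ConcurrentSkipListMap", "Hashtable", "synchronizedMap"};
        for (String kind : kinds) {
            System.out.println(kind + " -> " + fillConcurrently(create(kind), 5, 1000) + " entries");
        }
    }
}
```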
