JMH official website
JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targetting the JVM.
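JMH (Java Microbenchmark Harness) is a framework dedicated to performance testing, released as part of the OpenJDK project.
The official site provides a series of code samples; see the list -> JMH Samples
1. Add the JMH jars: download them from the official site, or declare the dependencies in Maven: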
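<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.21</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.21</version>
    <scope>provided</scope>
</dependency>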
2. The first code sample from the official site:
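import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// Report the average time per operation, in microseconds (matches the us/op output below)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class JMHSample_01_HelloWorld {
    // The code being measured
    @Benchmark
    public void wellHelloThere() {
        // this method was intentionally left blank.
    }
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                // run only the benchmarks whose names match this class
                .include(JMHSample_01_HelloWorld.class.getSimpleName())
                // measure in one forked JVM
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}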
Here wellHelloThere() is the benchmark method, i.e. the code being measured.
Running main() directly fails with the following error:
Exception in thread "main" java.lang.RuntimeException: ERROR: Unable to find the resource: /META-INF/BenchmarkList
at org.openjdk.jmh.runner.AbstractResourceReader.getReaders(AbstractResourceReader.java:98)
at org.openjdk.jmh.runner.BenchmarkList.find(BenchmarkList.java:122)
at org.openjdk.jmh.runner.Runner.internalRun(Runner.java:263)
at org.openjdk.jmh.runner.Runner.run(Runner.java:209)
at com.freedom.chapter03.jmh.JMHSample_01_HelloWorld.main(JMHSample_01_HelloWorld.java:25)
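3. Install and configure the Maven plugin
Before a run, JMH uses Java's annotation processing (APT) to generate the actual benchmark code from the user's annotated classes, including the /META-INF/BenchmarkList resource named in the error above. In Eclipse this requires installing the m2e-apt plugin from the Eclipse Marketplace.
Once the plugin is installed, set the APT mode to automatic configuration.
The benchmark can then be run. If the error above still appears, run the following Maven command from Eclipse and then run the benchmark again: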
mvn clean package
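To confirm that the annotation processor actually ran, one can look for the generated resource on the classpath. The helper below is a hypothetical diagnostic (not part of JMH) that relies only on ClassLoader.getResource; it is meant to be run from the same project as the benchmarks.

```java
import java.net.URL;

// Hypothetical diagnostic helper (not part of JMH): checks whether the
// JMH-generated benchmark list is visible to a class loader.
class BenchmarkListCheck {
    static boolean hasBenchmarkList(ClassLoader loader) {
        // ClassLoader paths have no leading slash, unlike Class.getResource("/META-INF/...")
        URL resource = loader.getResource("META-INF/BenchmarkList");
        return resource != null;
    }

    public static void main(String[] args) {
        ClassLoader loader = BenchmarkListCheck.class.getClassLoader();
        if (hasBenchmarkList(loader)) {
            System.out.println("META-INF/BenchmarkList found: the annotation processor has run");
        } else {
            System.out.println("META-INF/BenchmarkList missing: rebuild with annotation processing enabled");
        }
    }
}
```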
The benchmark run produces the following output:
# JMH version: 1.20
# VM version: JDK 1.8.0_131, VM 25.131-b11
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/jre/bin/java
# VM options: -Dfile.encoding=UTF-8
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.freedom.chapter03.jmh.JMHSample_01_HelloWorld.wellHelloThere
# Run progress: 0.00% complete, ETA 00:00:40
# Fork: 1 of 1
# Warmup Iteration 1: ≈ 10⁻³ us/op
# Warmup Iteration 2: ≈ 10⁻³ us/op
...
# Warmup Iteration 19: ≈ 10⁻³ us/op
# Warmup Iteration 20: ≈ 10⁻³ us/op
Iteration 1: ≈ 10⁻³ us/op
Iteration 2: ≈ 10⁻³ us/op
...
Iteration 19: ≈ 10⁻³ us/op
Iteration 20: ≈ 10⁻³ us/op
Result "com.freedom.chapter03.jmh.JMHSample_01_HelloWorld.wellHelloThere":
≈ 10⁻³ us/op
# Run complete. Total time: 00:00:40
Benchmark Mode Cnt Score Error Units
JMHSample_01_HelloWorld.wellHelloThere avgt 20 ≈ 10⁻³ us/op
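The report first lists the basic settings of the run: the JMH version, the JDK version, the number and length of warmup iterations, the number and length of measurement iterations, the timeout, the thread settings, the benchmark mode, and the benchmark method.
Next come the warmup iterations. Warmup gives the JVM time to fully JIT-compile and optimize the code under test, and these results are not included in the final statistics.
Then come the measurement iterations. Each one shows the average time taken by one operation, i.e. how fast the code under test runs.
Finally comes the overall result: about 10⁻³ us (on the order of 1 ns) per operation, which is close to the cost of calling an empty method.
4. Basic JMH concepts
4.1 Mode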
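/**
 * Benchmark mode.
 */
public enum Mode {
    // Throughput: number of operations completed per unit of time
    Throughput("thrpt", "Throughput, ops/time"),
    // Average time taken per operation
    AverageTime("avgt", "Average time, time/op"),
    // Samples the time of individual operations and reports percentiles
    SampleTime("sample", "Sampling time"),
    // Times a single invocation per iteration; useful for cold-start performance
    SingleShotTime("ss", "Single shot invocation time"),
    // Meta-mode: runs all of the benchmark modes above in turn
    All("all", "All benchmark modes");
    ...
}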
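Throughput and average time are the most commonly used modes in JMH. The examples below come from the second official sample.
4.1.1 Throughput example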
@Benchmark
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public void measureThroughput() throws InterruptedException {
    TimeUnit.MILLISECONDS.sleep(100);
}
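Result: measureThroughput() runs about 9.807 times per second. Each call sleeps for 100 ms, so the ideal would be 10 ops/s; the small shortfall means each call actually takes slightly longer than 100 ms: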
Benchmark Mode Cnt Score Error Units
JMHSample_02_BenchmarkModes.measureThroughput thrpt 20 9.807 ± 0.040 ops/s
4.1.2 Average time example
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void measureAvgTime() throws InterruptedException {
    TimeUnit.MILLISECONDS.sleep(100);
}
Result: measureAvgTime() takes about 102 ms (102056.917 us) per operation:
Benchmark Mode Cnt Score Error Units
JMHSample_02_BenchmarkModes.measureAvgTime avgt 20 102056.917 ± 572.546 us/op
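For a single benchmark thread, these two modes look at the same quantity from opposite sides: throughput is the reciprocal of average time. The hypothetical helper below converts the average time above into ops/s as a quick consistency check.

```java
import java.util.Locale;

// Hypothetical helper: converts an AverageTime score in us/op into the
// equivalent single-thread throughput in ops/s (the two are reciprocals).
class ModeConversion {
    static double opsPerSecondFromMicrosPerOp(double microsPerOp) {
        return 1_000_000.0 / microsPerOp; // 1 s = 1,000,000 us
    }

    public static void main(String[] args) {
        // AverageTime score of measureAvgTime() from the table above
        double avgMicros = 102056.917;
        System.out.println(String.format(Locale.ROOT, "%.3f ops/s", opsPerSecondFromMicrosPerOp(avgMicros)));
    }
}
```

It prints 9.798 ops/s, close to the 9.807 ops/s measured in Throughput mode. The two scores come from separate runs, so they are not expected to agree exactly.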
4.1.3 Sample time example
@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void measureSamples() throws InterruptedException {
    TimeUnit.MILLISECONDS.sleep(100);
}
Result: the average execution time of measureSamples() is 101852.119 us. 50% of the sampled calls completed within 100794.368 us, 95% within 104988.672 us, and all of them within 106692.608 us:
Benchmark Mode Cnt Score Error Units
JMHSample_02_BenchmarkModes.measureSamples sample 200 101852.119 ± 432.919 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.00 sample 100007.936 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.50 sample 100794.368 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.90 sample 104726.528 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.95 sample 104988.672 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.99 sample 106157.834 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.999 sample 106692.608 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.9999 sample 106692.608 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p1.00 sample 106692.608 us/op
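JMH computes these percentiles from its own recorded samples, and its exact algorithm may differ. As a simplified illustration, the sketch below uses the nearest-rank definition of a percentile, which is an assumption on our part. The class name and the sample values in main() are hypothetical.

```java
import java.util.Arrays;

// Simplified illustration of percentiles over sampled times (nearest-rank method);
// JMH's own percentile computation may differ in detail.
class PercentileSketch {
    // Returns the smallest sample such that at least fraction p of all samples are <= it.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("need samples and 0 <= p <= 1");
        }
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // 1-based rank; p == 0 maps to the minimum
        int rank = Math.max(1, (int) Math.ceil(p * sorted.length));
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical sample times in microseconds
        double[] samples = {100_010, 100_800, 100_790, 104_700, 101_200, 100_900, 106_690, 100_400};
        for (double p : new double[] {0.0, 0.5, 0.9, 1.0}) {
            System.out.printf("p%.2f = %.0f us%n", p, percentile(samples, p));
        }
    }
}
```

With the 200 samples in the table (Cnt = 200), nearest rank places p0.999 and above at the single largest sample. That agrees with the three identical rows at the bottom of the table.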
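4.2 Iteration
An iteration is JMH's basic unit of measurement. The benchmark method is called repeatedly for the duration of one iteration, and a score is reported for each iteration. In most modes an iteration lasts a fixed length of time; in the run above it was 1 s.
4.3 Warmup
Because of the JVM's JIT compiler, the same method can run at different speeds before and after it is compiled. JMH therefore runs warmup iterations first and leaves them out of the final result, so that the measured iterations reflect the compiled code.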
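The iteration and warmup structure can be illustrated with a deliberately simplified, stdlib-only harness. This is a hypothetical sketch, not how JMH works internally: JMH also forks JVMs, synchronizes threads, and uses Blackhole to defeat dead-code elimination. The workload, the class name, and the iteration settings are all assumptions.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified illustration of warmup + measurement iterations; not how JMH is implemented.
class MiniHarness {
    // Results are written here so the JIT cannot drop the work as dead code
    // (a crude stand-in for JMH's Blackhole).
    static volatile long sink;

    // Hypothetical workload used only for illustration.
    static long work(long seed) {
        long h = seed;
        for (int i = 0; i < 100; i++) {
            h = h * 31 + i;
        }
        return h;
    }

    // One iteration: call work() repeatedly for the given time and return ops/s.
    static double iteration(long millis) {
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(millis);
        long ops = 0;
        long now;
        do {
            sink = work(ops);
            ops++;
            now = System.nanoTime();
        } while (now < deadline);
        return ops / ((now - start) / 1e9);
    }

    // Warmup results are discarded; only the measurement results are returned.
    static double[] run(int warmupIterations, int measurementIterations, long millisPerIteration) {
        for (int i = 0; i < warmupIterations; i++) {
            iteration(millisPerIteration); // gives the JIT a chance to compile work()
        }
        double[] results = new double[measurementIterations];
        for (int i = 0; i < measurementIterations; i++) {
            results[i] = iteration(millisPerIteration);
        }
        return results;
    }

    public static void main(String[] args) {
        double[] results = run(3, 5, 200);
        for (int i = 0; i < results.length; i++) {
            System.out.printf("Iteration %d: %.0f ops/s%n", i + 1, results[i]);
        }
    }
}
```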
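4.4 State
@State specifies the scope of a state object. Scope.Thread gives each benchmark thread its own instance, Scope.Benchmark uses one instance shared by all threads of the benchmark, and Scope.Group uses one instance shared within each thread group.
The third official sample declares one Thread-scoped and one Benchmark-scoped state class, and accesses each from its own benchmark method: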
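import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class JMHSample_03_States {
    // One instance shared by all benchmark threads
    @State(Scope.Benchmark)
    public static class BenchmarkState {
        volatile double x = Math.PI;
    }
    // A separate instance for each benchmark thread
    @State(Scope.Thread)
    public static class ThreadState {
        volatile double x = Math.PI;
    }
    // JMH injects the matching state object as the method parameter
    @Benchmark
    public void measureUnshared(ThreadState state) {
        state.x++;
    }
    @Benchmark
    public void measureShared(BenchmarkState state) {
        state.x++;
    }
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_03_States.class.getSimpleName())
                .threads(4) // 4 benchmark threads
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}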
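In measureUnshared(), every benchmark thread works on its own copy of the data. In measureShared(), all threads share a single copy. With 4 threads the results differ considerably, and the shared version is roughly 6 times slower: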
Benchmark Mode Cnt Score Error Units
JMHSample_03_States.measureShared thrpt 20 51055114.592 ± 510090.663 ops/s
JMHSample_03_States.measureUnshared thrpt 20 302956301.034 ± 1267510.555 ops/s
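The two scopes can be mimicked with plain threads to show what each benchmark touches. In this hypothetical, stdlib-only sketch, each thread increments a private counter (the role of the Scope.Thread object) and also one shared volatile counter (the role of the Scope.Benchmark object). The private counters always end at the expected value. The shared counter can lose updates, because volatile ++ is a separate read and write. The sketch does not measure speed. The slowdown in the table is usually attributed to all threads writing the same memory location, and therefore the same cache line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stdlib-only sketch mimicking the two JMH scopes with plain threads.
class ScopeSketch {
    // Plays the role of the Scope.Benchmark object: one instance for all threads.
    static final class SharedState {
        volatile long x;
    }

    // Plays the role of the Scope.Thread object: each thread creates its own.
    static final class ThreadState {
        long x;
    }

    static final class Result {
        final long[] perThread; // final value of each thread's private counter
        final long shared;      // final value of the shared counter
        Result(long[] perThread, long shared) {
            this.perThread = perThread;
            this.shared = shared;
        }
    }

    static Result run(int threads, int incrementsPerThread) {
        SharedState shared = new SharedState();
        long[] perThread = new long[threads];
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                ThreadState mine = new ThreadState(); // private: no other thread touches it
                for (int i = 0; i < incrementsPerThread; i++) {
                    mine.x++;
                    shared.x++; // read + write on a shared volatile: not atomic, updates may be lost
                }
                perThread[id] = mine.x;
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join() also makes the worker's writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return new Result(perThread, shared.x);
    }

    public static void main(String[] args) {
        Result r = run(4, 1_000_000);
        System.out.println("per-thread counters: " + java.util.Arrays.toString(r.perThread));
        System.out.println("shared counter: " + r.shared + " (at most " + 4L * 1_000_000 + ")");
    }
}
```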
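4.5 Options/OptionsBuilder configuration
Options are set before a run. Examples include which benchmarks to run (include), the number of benchmark threads (threads), the number of forked JVMs (forks), and the number of warmup iterations (warmupIterations).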
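Here is a sketch of these options in code. It assumes the JMH_Map benchmark class from section 5 is on the classpath. The thread, fork, iteration, and time values are arbitrary examples, not recommendations.

```java
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

class OptionsSketch {
    static Options buildOptions() {
        return new OptionsBuilder()
                .include("JMH_Map")                    // regex matched against benchmark names
                .threads(2)                            // benchmark threads per fork
                .forks(1)                              // number of forked JVMs
                .warmupIterations(5)                   // warmup iterations, results discarded
                .warmupTime(TimeValue.seconds(1))      // length of each warmup iteration
                .measurementIterations(10)             // measured iterations
                .measurementTime(TimeValue.seconds(1)) // length of each measured iteration
                .build();
    }

    public static void main(String[] args) throws RunnerException {
        new Runner(buildOptions()).run();
    }
}
```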
5. JMH performance tests of HashMap, Collections.synchronizedMap(new HashMap()) and ConcurrentHashMap
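import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Throughput in ops/us, as in the result tables below; @Setup requires a @State class
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
public class JMH_Map {
    Map<String, String> hashMap = new HashMap<>();
    Map<String, String> syncHashMap = Collections.synchronizedMap(new HashMap<>());
    Map<String, String> concurrentHashMap = new ConcurrentHashMap<>();
    // Fills each map with 10,000 entries once, before measurement starts
    @Setup
    public void setup() {
        for (int i = 0; i < 10000; i++) {
            hashMap.put(String.valueOf(i), String.valueOf(i));
            syncHashMap.put(String.valueOf(i), String.valueOf(i));
            concurrentHashMap.put(String.valueOf(i), String.valueOf(i));
        }
    }
    // Each benchmark returns its result so the JIT cannot eliminate the call as dead code
    @Benchmark
    public String hashMapGet() {
        return hashMap.get("4");
    }
    @Benchmark
    public String syncHashMapGet() {
        return syncHashMap.get("4");
    }
    @Benchmark
    public String concurrentHashMapGet() {
        return concurrentHashMap.get("4");
    }
    @Benchmark
    public int hashMapSize() {
        return hashMap.size();
    }
    @Benchmark
    public int syncHashMapSize() {
        return syncHashMap.size();
    }
    @Benchmark
    public int concurrentHashMapSize() {
        return concurrentHashMap.size();
    }
}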
Results on JDK 8 with a single thread:
Benchmark Mode Cnt Score Error Units
JMH_Map.concurrentHashMapGet thrpt 20 118.105 ± 0.324 ops/us
JMH_Map.concurrentHashMapSize thrpt 20 877.783 ± 2.223 ops/us
JMH_Map.hashMapGet thrpt 20 161.768 ± 0.361 ops/us
JMH_Map.hashMapSize thrpt 20 1534.111 ± 8.070 ops/us
JMH_Map.syncHashMapGet thrpt 20 39.468 ± 0.202 ops/us
JMH_Map.syncHashMapSize thrpt 20 39.047 ± 0.069 ops/us
Results on JDK 8 with two threads:
Benchmark Mode Cnt Score Error Units
JMH_Map.concurrentHashMapGet thrpt 20 239.697 ± 4.031 ops/us
JMH_Map.concurrentHashMapSize thrpt 20 1700.468 ± 53.626 ops/us
JMH_Map.hashMapGet thrpt 20 300.296 ± 6.823 ops/us
JMH_Map.hashMapSize thrpt 20 2879.248 ± 187.412 ops/us
JMH_Map.syncHashMapGet thrpt 20 16.074 ± 0.343 ops/us
JMH_Map.syncHashMapSize thrpt 20 17.348 ± 0.161 ops/us
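With two threads, the throughput of HashMap and ConcurrentHashMap roughly doubles. Sharing the plain HashMap is safe here only because nothing modifies it after setup.
The synchronized map is the exception: every call now contends for the same lock, so its throughput drops from about 39 to about 16 to 17 ops/us.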
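The drop follows from how the wrapper works. In the JDK, Collections.synchronizedMap returns a wrapper whose methods each run inside synchronized (mutex), with the wrapper itself as the mutex by default. Even a trivial size() therefore takes the lock, which is also why syncHashMapGet and syncHashMapSize have almost the same score. The hypothetical class below is a stripped-down sketch of that idea, not the JDK source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stripped-down sketch of the idea behind Collections.synchronizedMap:
// a single lock guards every operation, reads included.
class TinySyncMap<K, V> {
    private final Map<K, V> delegate = new HashMap<>();
    private final Object mutex = this; // the JDK wrapper also locks on itself by default

    V put(K key, V value) {
        synchronized (mutex) {
            return delegate.put(key, value);
        }
    }

    V get(Object key) {
        synchronized (mutex) {
            return delegate.get(key); // concurrent readers queue up on the same lock
        }
    }

    int size() {
        synchronized (mutex) {
            return delegate.size(); // cheap call, but still pays for the lock
        }
    }
}
```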
6. JMH performance tests of CopyOnWriteArrayList and ConcurrentLinkedQueue
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
public class JMH_List {
CopyOnWriteArrayList
Benchmark Mode Cnt Score Error Units
JMH_List.bigConcurrentListWrite thrpt 20 0.002 ± 0.001 ops/us
JMH_List.bigCopyOnWriteWrite thrpt 20 0.565 ± 0.035 ops/us
JMH_List.concurrentListGet thrpt 20 1499.011 ± 181.468 ops/us
JMH_List.concurrentListSize thrpt 20 162.075 ± 2.613 ops/us
JMH_List.copyOnWriteGet thrpt 20 1743.510 ± 9.192 ops/us
JMH_List.copyOnWriteSize thrpt 20 2349.314 ± 179.757 ops/us
JMH_List.smallConcurrentListWrite thrpt 20 0.002 ± 0.001 ops/us
JMH_List.smallCopyOnWriteWrite thrpt 20 4.831 ± 0.177 ops/us
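The results show that write throughput is far lower than read throughput.
For writes: when the CopyOnWriteArrayList holds 1000 elements, copying the array on every write makes it much slower than a list with only a few elements. Even so, it is still faster than ConcurrentLinkedQueue.
For reads: both classes perform well on read-only get operations. Their implementations differ, however, and ConcurrentLinkedQueue's size() is clearly slower than CopyOnWriteArrayList's. The queue has to traverse its linked nodes (its Javadoc warns that size() is not a constant-time operation), whereas CopyOnWriteArrayList only reads the length of its internal array.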
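The copy-on-write idea behind both the slow writes and the cheap reads fits in a few lines. The class below is hypothetical and heavily simplified, not the JDK implementation, which also supports removal, snapshot iterators and more.

```java
import java.util.Arrays;

// Hypothetical, simplified copy-on-write list: writes copy the whole array under a lock,
// while reads and size() just read the current array without locking.
class TinyCopyOnWriteList<E> {
    private volatile Object[] array = new Object[0];

    synchronized void add(E element) {
        Object[] old = array;
        Object[] copy = Arrays.copyOf(old, old.length + 1); // O(n) copy on every write
        copy[old.length] = element;
        array = copy; // volatile write publishes the new snapshot to readers
    }

    @SuppressWarnings("unchecked")
    E get(int index) {
        return (E) array[index]; // lock-free read of the current snapshot
    }

    int size() {
        return array.length; // O(1), unlike a traversal of linked nodes
    }
}
```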
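Conclusion: in highly concurrent scenarios, CopyOnWriteArrayList outperforms ConcurrentLinkedQueue in the vast majority of cases, even with a few writes, as long as the total number of elements is small.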