ZGC, short for Z Garbage Collector, is a scalable, low-latency, concurrent garbage collector designed to meet the following goals:
-server -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:age*,gc*=info:file=gc-%t.log:time,tid,tags:filecount=3,filesize=20m -Djava.io.tmpdir=/tmp'
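The Z Garbage Collector, or ZGC, is a scalable, low-latency garbage collector designed mainly to meet the following goals:
Pause times stay below 10 ms. That 10 ms figure is actually quite conservative: in the SPECjbb 2015 benchmark with a 128 GB heap, the maximum pause was only 1.68 ms, far below 10 ms, and G1 does not come close by comparison.
G1 reclaims only some Regions at a time, which avoids full-heap scans and improves pause times on large heaps, but its performance on ordinary-sized heaps is unremarkable. ZGC does so much better mainly because of the following features.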
Concurrent
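ZGC has only brief stop-the-world (STW) pauses. Most of the work runs concurrently with application threads, including the most time-consuming phases: concurrent marking and concurrent relocation.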
Region-based
ZGC has no notion of young and old generations, only memory regions called pages. Objects are allocated and reclaimed in units of pages.
Compacting
Every GC cycle compacts pages, so the fragmentation problem seen with the CMS algorithm is avoided.
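The effect of compaction can be pictured with a toy model. The sketch below is not ZGC's algorithm: the class name `PageCompactionSketch`, the 4-slot page size, and the use of `null` to stand for a dead object are all assumptions made for illustration. It only shows how copying live objects into densely packed pages removes the holes that cause fragmentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of compaction; hypothetical names, not ZGC's real data structures. */
final class PageCompactionSketch {
    static final int PAGE_SIZE = 4; // slots per page, tiny on purpose (assumption)

    /** Copies live objects out of every page into densely packed new pages. */
    static List<List<String>> compact(List<List<String>> pages) {
        List<List<String>> result = new ArrayList<>();
        List<String> target = new ArrayList<>();
        for (List<String> page : pages) {
            for (String object : page) {
                if (object == null) {
                    continue; // null stands for a dead object (garbage)
                }
                if (target.size() == PAGE_SIZE) {
                    result.add(target); // current target page is full, start a new one
                    target = new ArrayList<>();
                }
                target.add(object);
            }
        }
        if (!target.isEmpty()) {
            result.add(target);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> pages = List.of(
                Arrays.asList("a", null, "b", null),
                Arrays.asList(null, "c", null, null));
        System.out.println("before=" + pages + " after=" + compact(pages));
    }
}
```

In this model the two sparse source pages can be handed back as free memory once their live objects have been copied out.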
NUMA-aware
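Multi-socket servers today use a NUMA architecture. Take a server with two CPU sockets (24 cores) and 64 GB of memory: the 12 cores on one CPU access the 32 GB of local memory attached to that CPU much faster than the other 32 GB of remote memory.
ZGC supports NUMA by default. When an object is created, ZGC checks which CPU the current thread is running on and prefers memory close to that CPU. This can improve performance noticeably, with a reported 40% gain in the SPECjbb 2005 benchmark.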
Using colored pointers
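This differs from earlier marking schemes: CMS and G1 record mark state for the object itself, whereas ZGC records it in the pointer to the object.
In a 64-bit pointer, the low 42 bits (bits 0-41) hold the object address, and bits 42-45 are used as pointer mark bits.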
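The bit layout just described can be sketched with plain `long` arithmetic. This is a minimal illustration, not HotSpot code: the class `ColoredPointer` and its helper methods are hypothetical, and the four flag names are an assumption about what bits 42-45 mean, since the text only says they are mark bits.

```java
/** Toy model of a colored pointer: 42 address bits plus 4 mark bits. */
final class ColoredPointer {
    static final int ADDRESS_BITS = 42;                         // bits 0-41: object address
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    // Bits 42-45, one flag per color. The flag names are assumptions for this sketch.
    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long FINALIZABLE = 1L << 45;
    static final long COLOR_MASK = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE;

    /** Replaces whatever color the pointer carries with the given one. */
    static long withColor(long pointer, long color) {
        return (pointer & ADDRESS_MASK) | (color & COLOR_MASK);
    }

    /** Strips the color bits and returns the plain address. */
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    /** True if the given color bit is set in the pointer. */
    static boolean hasColor(long pointer, long color) {
        return (pointer & color) != 0;
    }

    public static void main(String[] args) {
        long p = withColor(0x1234_5678L, MARKED0);
        System.out.printf("address=0x%x marked0=%b remapped=%b%n",
                address(p), hasColor(p, MARKED0), hasColor(p, REMAPPED));
    }
}
```

Because the color lives in the pointer rather than in the object, changing the color never touches the object's memory. It also means that 42 address bits cap the addressable heap at 4 TB in this layout.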
Using load barriers
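During marking and relocation, GC threads and application threads run concurrently, so the following can happen: object B, referenced from inside object A, is being marked or relocated. To make sure the application thread gets the correct B, reading B's pointer goes through a load barrier, which guarantees that data is read correctly while GC is in progress.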
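The idea can be simulated in a few lines. The sketch below is a toy model under stated assumptions, not the barrier HotSpot emits: `LoadBarrierSketch`, `fieldOfA`, and the `forwarding` map are hypothetical names, a single `REMAPPED` bit stands in for "this pointer is up to date", and writing the corrected pointer back into the field is an assumption about how repeated slow paths are avoided.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy simulation of a load barrier; hypothetical names, not HotSpot's implementation. */
final class LoadBarrierSketch {
    static final long ADDRESS_MASK = (1L << 42) - 1; // low 42 bits: address
    static final long REMAPPED = 1L << 44;           // assumed "pointer is up to date" bit

    /** Old address -> new address for objects the GC has moved. */
    final Map<Long, Long> forwarding = new HashMap<>();

    /** Stands in for object A's reference field that points to B. */
    long fieldOfA;

    /** Every read of the field goes through this barrier. */
    long loadField() {
        long pointer = fieldOfA;
        if ((pointer & REMAPPED) != 0) {
            return pointer; // fast path: the pointer is already known to be good
        }
        // Slow path: B may have moved, so look up its current address.
        long oldAddress = pointer & ADDRESS_MASK;
        long newAddress = forwarding.getOrDefault(oldAddress, oldAddress);
        long healed = newAddress | REMAPPED;
        fieldOfA = healed; // fix the field so later loads take the fast path
        return healed;
    }

    public static void main(String[] args) {
        LoadBarrierSketch heap = new LoadBarrierSketch();
        heap.fieldOfA = 0x1000L;               // stale pointer to B, REMAPPED bit not set
        heap.forwarding.put(0x1000L, 0x8000L); // the GC moved B from 0x1000 to 0x8000
        long b = heap.loadField();
        System.out.printf("B is now at 0x%x%n", b & ADDRESS_MASK);
    }
}
```

The application thread never sees the stale address: the first load pays for the lookup, and every later load of the same field returns immediately.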
JDK11
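ZGC is currently available only on Linux/x64. Support for other platforms may be added in the future if there is enough demand.
Only 64-bit Linux is supported for now. The author (狼哥) spent a long time trying to run it on a Mac and kept getting the error below!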
$ hg clone https://hg.openjdk.java.net/jdk/jdk
$ cd jdk
$ sh configure
$ make images
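If you are building version 11.0.0, 11.0.1, or 11.0.2, you must add the configure option --with-jvm-features=zgc to enable building ZGC. From 11.0.3 or 12 onward the option can be omitted, because ZGC is built by default.
When the build finishes you have a complete JDK. On Linux, the new JDK can be found in the following directory: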
./build/linux-x86_64-normal-server-release/images/jdk
To verify the build, go into the bin folder and run ./java -version.
With the build done, it is time to try ZGC. The following JVM options are needed to use it.
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx10g -Xlog:gc
Option notes:
Heap Size
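Set with -Xmx10g.
-Xmx is the most important tuning option for ZGC, which takes much of the pain out of JVM parameter tuning. ZGC is a concurrent collector, so a maximum heap size must be set. How large a heap the application needs depends mainly on the following considerations: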
Concurrent GC Threads
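Set with -XX:ConcGCThreads=4.
This is the number of GC threads that run concurrently with the application. If it is not set, the JVM computes a reasonable value from the number of CPU cores at startup. The default is 12.5% of the cores, but it can be adjusted manually to suit the application.
Because GC threads run concurrently with application threads during concurrent marking and concurrent relocation, the two compete for CPU. For latency-sensitive applications this value must not be set too high; otherwise GC threads take too much CPU, which lowers throughput and may increase latency. If it is set too low, however, objects may be allocated faster than garbage is collected, and application threads eventually have to stop and wait for the GC threads to finish collecting and free memory.
In general, if low latency matters to the application, do not set this value too high. Ideally, system CPU utilization should not exceed 70%.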
Parallel GC Threads
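Set with -XX:ParallelGCThreads=20.
Marking and relocating GC roots requires an STW pause, and during that pause ParallelGCThreads GC threads work in parallel.
ParallelGCThreads defaults to 60% of the CPU cores. Why can it be so high?
Because application threads are completely stopped at that point, as many threads as possible should work on this task so that the STW pause is as short as possible.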
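The two percentages quoted above translate into concrete thread counts as follows. This is a rough estimate only: the class `GcThreadEstimate` is hypothetical, and rounding up with a minimum of one thread is an assumption, since the text gives the percentages but not the JVM's exact rounding rule.

```java
/** Rough estimate of default GC thread counts from the percentages quoted in the text. */
final class GcThreadEstimate {
    /** 12.5% of the cores (one eighth), rounded up. The rounding rule is an assumption. */
    static int concurrentThreads(int cores) {
        return Math.max(1, (cores + 7) / 8);
    }

    /** 60% of the cores (three fifths), rounded up. The rounding rule is an assumption. */
    static int parallelThreads(int cores) {
        return Math.max(1, (cores * 3 + 4) / 5);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.printf("cores=%d -XX:ConcGCThreads=%d -XX:ParallelGCThreads=%d%n",
                cores, concurrentThreads(cores), parallelThreads(cores));
    }
}
```

For the 24-core server used as an example earlier, this estimate gives 3 concurrent threads and 15 parallel threads. Treat the numbers as a starting point for tuning and check the values the JVM actually picks.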
Authors | Per Liden, Stefan Karlsson |
Owner | Per Liden |
Type | Feature |
Scope | Implementation |
Status | Closed / Delivered |
Release | 11 |
Component | hotspot / gc |
Discussion | hotspot dash gc dash dev at openjdk dot java dot net |
Effort | L |
Duration | L |
Depends | JEP 312: Thread-Local Handshakes |
 | JEP 304: Garbage Collector Interface |
Reviewed by | Mikael Vidstedt, Stefan Karlsson |
Endorsed by | Mikael Vidstedt |
Created | 2018/02/13 09:58 |
Updated | 2018/11/30 16:31 |
Issue | 8197831 |
The Z Garbage Collector, also known as ZGC, is a scalable low-latency garbage collector.
We have strong ambitions to meet these goals for a large set of relevant workloads. At the same time, we want to acknowledge that we don't see these goals as hard requirements for every conceivable workload.
It is not a goal to provide working implementations for platforms other than Linux/x64. Support for additional platforms can be added later, if there is enough demand.
Garbage collection is one of Java's main strengths. However, when garbage collection pauses become too long they start to affect application response times negatively. By removing or drastically reducing the length of GC pauses, we'll make Java a more attractive platform for an even wider set of applications.
Furthermore, the amount of memory available in modern systems continues to grow. Users and application developers expect the JVM to be equipped to take full advantage of this memory in an efficient manner, and without long GC pause times.
At a glance, ZGC is a concurrent, single-generation, region-based, NUMA-aware, compacting collector. Stop-the-world phases are limited to root scanning, so GC pause times do not increase with the size of the heap or the live set.
A core design principle/choice in ZGC is the use of load barriers in combination with colored object pointers (i.e., colored oops). This is what enables ZGC to do concurrent operations, such as object relocation, while Java application threads are running. From a Java thread's perspective, the act of loading a reference field in a Java object is subject to a load barrier. In addition to an object address, a colored object pointer contains information used by the load barrier to determine if some action needs to be taken before allowing a Java thread to use the pointer. For example, the object might have been relocated, in which case the load barrier will detect the situation and take appropriate action.
Compared to alternative techniques, we believe the colored-pointers scheme offers some very attractive properties. In particular:
It allows us to reclaim and reuse memory during the relocation/compaction phase, before pointers pointing into the reclaimed/reused regions have been fixed. This helps keep the general heap overhead down. It also means that there is no need to implement a separate mark-compact algorithm to handle a full GC.
It allows us to have relatively few and simple GC barriers. This helps keep the runtime overhead down. It also means that it's easier to implement, optimize and maintain the GC barrier code in our interpreter and JIT compilers.
We currently store marking and relocation related information in the colored pointers. However, the versatile nature of this scheme allows us to store any type of information (as long as we can fit it into the pointer) and let the load barrier take any action it wants to based on that information. We believe this will lay the foundation for many future features. To pick one example, in a heterogeneous memory environment, this could be used to track heap access patterns to guide GC relocation decisions to move rarely used objects to cold storage.
Regular performance measurements have been done using SPECjbb® 2015 [1]. Performance is looking good, both from a throughput and latency point of view. Below are typical benchmark scores (in percent, normalized against ZGC's max-jOPS), comparing ZGC and G1, in composite mode using a 128G heap.
(Higher is better)
ZGC
max-jOPS: 100%
critical-jOPS: 76.1%
G1
max-jOPS: 91.2%
critical-jOPS: 54.7%
Below are typical GC pause times from the same benchmark. ZGC manages to stay well below the 10ms goal. Note that exact numbers can vary (both up and down, but not significantly) depending on the exact machine and setup used.
(Lower is better)
ZGC
avg: 1.091ms (+/-0.215ms)
95th percentile: 1.380ms
99th percentile: 1.512ms
99.9th percentile: 1.663ms
99.99th percentile: 1.681ms
max: 1.681ms
G1
avg: 156.806ms (+/-71.126ms)
95th percentile: 316.672ms
99th percentile: 428.095ms
99.9th percentile: 543.846ms
99.99th percentile: 543.846ms
max: 543.846ms
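Summaries like the ones above (average, percentiles, maximum) can be derived from a list of individual pause times. The sketch below shows one common way to do it, the nearest-rank percentile. It is not the method used for the JEP's numbers, which the text does not describe, and the class `PauseStats` and the sample values in `main` are made up for illustration.

```java
import java.util.Arrays;

/** Summarizes GC pause samples; hypothetical helper, not part of any JDK tool. */
final class PauseStats {
    /** Arithmetic mean of the samples, or NaN for an empty array. */
    static double mean(double[] pausesMs) {
        return Arrays.stream(pausesMs).average().orElse(Double.NaN);
    }

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     * Expects a non-empty array and 0 < p <= 100.
     */
    static double percentile(double[] pausesMs, double p) {
        double[] sorted = pausesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] pausesMs = {1.0, 1.2, 0.9, 1.5, 1.1}; // made-up sample pauses in ms
        System.out.printf("avg=%.3fms p95=%.3fms max=%.3fms%n",
                mean(pausesMs), percentile(pausesMs, 95), percentile(pausesMs, 100));
    }
}
```

With few samples the high percentiles collapse onto the maximum, which is one possible reason the 99.99th percentile and the max coincide in the lists above.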
Ad-hoc performance measurements have also been done on various other SPEC® benchmarks and internal workloads. In general, ZGC manages to maintain single-digit millisecond pause times.
[1] SPECjbb® 2015 is a registered trademark of the Standard Performance Evaluation Corporation (spec.org). The actual results are not represented as compliant because the SUT may not meet SPEC's requirements for general availability.
The initial experimental version of ZGC will not have support for class unloading. The ClassUnloading and ClassUnloadingWithConcurrentMark options will be disabled by default. Enabling them will have no effect.
Also, ZGC will initially not have support for JVMCI (i.e. Graal). An error message will be printed if the EnableJVMCI option is enabled.
These limitations will be addressed at a later stage in this project.
By convention, experimental features in the JVM are disabled by default by the build system. ZGC, being an experimental feature, will therefore not be present in a JDK build unless explicitly enabled at compile-time using the configure option --with-jvm-features=zgc.
(ZGC will be present in all Linux/x64 JDK builds produced by Oracle)
Experimental features in the JVM also need to be explicitly unlocked at run-time. To enable/use ZGC, the following JVM options will therefore be needed: -XX:+UnlockExperimentalVMOptions -XX:+UseZGC.
Please see the ZGC Project Wiki for more information on how to setup and tune ZGC.
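After starting a JVM with those options, it can be useful to confirm from inside the application which collector is active. The sketch below uses the standard `java.lang.management` API to list the collector names. The class `ActiveGcCheck` is hypothetical, and the check that ZGC's collector names start with "ZGC" is an assumption that should be verified against the JDK build in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Reports which collectors the running JVM exposes through JMX. */
final class ActiveGcCheck {
    /** Names of the garbage collectors registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    /** Assumption: ZGC's collector beans have names that start with "ZGC". */
    static boolean looksLikeZgc(List<String> names) {
        return names.stream().anyMatch(name -> name.startsWith("ZGC"));
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("collectors=" + names + " zgc=" + looksLikeZgc(names));
    }
}
```

Running it once with and once without -XX:+UseZGC and comparing the printed names is a quick way to see whether the flags took effect.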
An obvious alternative is to add concurrent compaction capabilities to G1. This alternative was extensively prototyped but eventually abandoned. We found it unfeasible to shoehorn this functionality into a code base that was never designed for this purpose and, at the same time, preserve G1's stability and other good properties.
A theoretical alternative would be to improve CMS one way or another. There are however several reasons why basing a low latency collector on the CMS algorithm is neither an attractive nor viable option. Reasons include no support for compaction, the unbound remark phase, a complicated code base, and the fact that it has already been deprecated (JEP 291).
The Shenandoah Project is exploring the use of Brooks pointers to achieve concurrent operations (JEP 189).
Most of our existing functional and stress tests are collector agnostic and can be reused as-is. Additional tests targeting properties and functions specific to ZGC will be added.
Source: http://openjdk.java.net/jeps/333
Source: https://www.jianshu.com/p/6f89fd5842bf Original author: 占小狼
Source: https://www.jianshu.com/p/85fa1691c8b7 Original author: go4it