The G1 Garbage Collector
The Garbage-First (G1) collector is a server-style garbage collector, targeted for multi-processor machines with large memories. It meets garbage collection (GC) pause time goals with a high probability, while achieving high throughput. The G1 garbage collector is fully supported in Oracle JDK 7 update 4 and later releases. The G1 collector is designed for applications that:
Can operate concurrently with application threads, like the CMS collector.
Compact free space without lengthy GC-induced pause times.
Need more predictable GC pause durations.
Do not want to sacrifice a lot of throughput performance.
Do not require a much larger Java heap.
G1 is planned as the long-term replacement for the Concurrent Mark-Sweep Collector (CMS). Comparing G1 with CMS, there are differences that make G1 a better solution. One difference is that G1 is a compacting collector. G1 compacts sufficiently to completely avoid the use of fine-grained free lists for allocation, and instead relies on regions. This considerably simplifies parts of the collector, and mostly eliminates potential fragmentation issues. Also, G1 offers more predictable garbage collection pauses than the CMS collector, and allows users to specify desired pause targets.
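In HotSpot, G1 is selected with `-XX:+UseG1GC` and the pause target is requested with `-XX:MaxGCPauseMillis=<ms>`; the target is a goal, not a guarantee. As a minimal sketch (the class name `GcInfo` is made up for illustration), the standard `java.lang.management` API can confirm which collectors the running JVM actually registered:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the JDK.
final class GcInfo {
    /** Returns the names of the collectors registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Example launch after compiling:
        //   java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 GcInfo
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(bean.getName()
                    + ": collections=" + bean.getCollectionCount()
                    + ", timeMs=" + bean.getCollectionTime());
        }
    }
}
```

When G1 is active, the reported names typically start with "G1"; the exact names can differ between JDK versions.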
The G1 collector takes a different approach to allocating the heap. The pictures that follow review the G1 system step by step.
The heap is one memory area split into many fixed sized regions.
Region size is chosen by the JVM at startup. The JVM generally targets around 2000 regions, each between 1 MB and 32 MB in size.
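The sizing rule can be illustrated with a small calculation. This is a simplified sketch, not HotSpot's actual code: it assumes the target of "around 2000" is exactly 2048 regions, and that the result is rounded down to a power of two before being clamped to the 1 MB to 32 MB range. The class and method names are hypothetical.

```java
// Hypothetical sketch of G1-style region sizing; not the JVM's implementation.
final class G1RegionSizing {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION_BYTES = 1 * MB;
    static final long MAX_REGION_BYTES = 32 * MB;
    // Assumption: "around 2000 regions" is taken as 2048.
    static final long TARGET_REGION_COUNT = 2048;

    /** Region size for a given heap size, clamped to [1 MB, 32 MB]. */
    static long regionSizeBytes(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGION_COUNT, MIN_REGION_BYTES);
        // Assumption: round down to the nearest power of two.
        long powerOfTwo = Long.highestOneBit(raw);
        return Math.min(powerOfTwo, MAX_REGION_BYTES);
    }

    /** Number of whole regions that fit in the heap at that region size. */
    static long regionCount(long heapBytes) {
        return heapBytes / regionSizeBytes(heapBytes);
    }
}
```

Under these assumptions a 4 GB heap gets 2 MB regions (2048 of them), while a 128 GB heap hits the 32 MB ceiling and ends up with 4096 regions. The region size can also be set explicitly with `-XX:G1HeapRegionSize`.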
In reality, these regions are mapped into logical representations of Eden, Survivor, and old generation spaces.
The colors in the picture show which region is associated with which role. Live objects are evacuated (i.e., copied or moved) from one region to another. Regions are designed to be collected in parallel with or without stopping all other application threads.
As shown, regions can be allocated as Eden, survivor, and old generation regions. In addition, there is a fourth type of region known as a Humongous region. These regions are designed to hold objects that are 50% the size of a standard region or larger. Such objects are stored in a set of contiguous regions. Finally, the last type of region is the unused area of the heap.
Note: At the time of this writing, collecting humongous objects has not been optimized. Therefore, you should avoid creating objects of this size.
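To make the humongous threshold concrete, the following sketch applies the rule as stated above: an object is humongous when it is at least half the size of a region. The class and method names are hypothetical, and the exact comparison used by a given JVM version may differ slightly.

```java
// Hypothetical sketch of the humongous-object rule described in the text.
final class HumongousCheck {
    /** True when the object is at least 50% of the region size. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    /** Number of contiguous regions needed to hold the object. */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        // Ceiling division: a partly filled last region still counts.
        return (objectBytes + regionBytes - 1) / regionBytes;
    }
}
```

With 2 MB regions, for example, a 1 MB array already counts as humongous, and a 5 MB array occupies three contiguous regions. One way to avoid such allocations is to keep large buffers below half the region size, or to raise the region size.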
The heap is split into approximately 2000 regions. Minimum size is 1 MB and maximum size is 32 MB. Blue regions hold old generation objects and green regions hold young generation objects.
Live objects are evacuated (i.e., copied or moved) to one or more survivor regions. If the aging threshold is met, some of the objects are promoted to old generation regions.
This is a stop-the-world (STW) pause. Eden size and survivor size are calculated for the next young GC. Accounting information is kept to help calculate the size. Things like the pause time goal are taken into consideration.
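The evacuation step can be modelled in a few lines. This is a toy sketch, not how the JVM represents objects: the `Obj` and `Result` types are hypothetical, liveness is given as a flag rather than discovered by tracing, and the sketch assumes an object is promoted once its age reaches the threshold.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one young GC evacuation pause; all names are hypothetical.
final class YoungGcSketch {
    /** A toy object: only liveness and age (young GCs survived) matter. */
    static final class Obj {
        final boolean live;
        int age;
        Obj(boolean live, int age) { this.live = live; this.age = age; }
    }

    /** Destinations filled by one simulated evacuation. */
    static final class Result {
        final List<Obj> survivor = new ArrayList<>();
        final List<Obj> old = new ArrayList<>();
    }

    /** Copies live objects out of the young regions; dead ones are simply not copied. */
    static Result evacuate(List<Obj> young, int agingThreshold) {
        Result result = new Result();
        for (Obj o : young) {
            if (!o.live) {
                continue; // reclaimed together with its region
            }
            o.age++;
            // Assumption: promotion happens when age reaches the threshold.
            if (o.age >= agingThreshold) {
                result.old.add(o);
            } else {
                result.survivor.add(o);
            }
        }
        return result;
    }
}
```

Because only live objects are copied, the cost of the pause in this model depends on how much survives, not on how much garbage the young regions contain.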
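This approach makes it very easy to resize the Eden and survivor spaces: because each is just a set of regions, it can be made bigger or smaller as needed by assigning more or fewer regions to it.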
Live objects have been evacuated to survivor regions or to old generation regions.
Recently promoted objects are shown in dark blue. Survivor regions are shown in green.
In summary, the following can be said about the young generation in G1:
The heap is a single memory space split into regions.
Young generation memory is composed of a set of non-contiguous regions. This makes it easy to resize when needed.
Young generation garbage collections, or young GCs, are stop-the-world events. All application threads are stopped for the operation.
The young GC is done in parallel using multiple threads.
Live objects are copied to new survivor or old generation regions.
Initial marking of live objects is piggybacked on a young generation garbage collection. In the logs this is noted as GC pause (young) (initial-mark).
If empty regions are found (as denoted by the "X"), they are removed immediately in the Remark phase. Also, "accounting" information that determines liveness is calculated.
Empty regions are removed and reclaimed. Region liveness is now calculated for all regions.
G1 selects the regions with the lowest "liveness", those regions which can be collected the fastest. Then those regions are collected at the same time as a young GC. This is denoted in the logs as [GC pause (mixed)]. So both young and old generations are collected at the same time.
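The "lowest liveness first" selection can be sketched as a sort followed by a greedy pick. This is an illustration under stated assumptions, not G1's actual policy: the real collector works from a pause-time prediction model, whereas this sketch uses a hypothetical budget of bytes to copy as a stand-in for the pause goal. All names are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of choosing old regions for a mixed collection.
final class MixedCollectionSketch {
    static final class Region {
        final int id;
        final long liveBytes; // liveness computed by concurrent marking
        Region(int id, long liveBytes) { this.id = id; this.liveBytes = liveBytes; }
    }

    /** Picks old regions with the least live data first, within a copy budget. */
    static List<Region> chooseOldRegions(List<Region> oldRegions, long copyBudgetBytes) {
        List<Region> sorted = new ArrayList<>(oldRegions);
        sorted.sort(Comparator.comparingLong((Region r) -> r.liveBytes));

        List<Region> chosen = new ArrayList<>();
        long plannedBytes = 0;
        for (Region r : sorted) {
            // Assumption: live bytes to copy approximate the pause cost.
            if (plannedBytes + r.liveBytes > copyBudgetBytes) {
                break;
            }
            chosen.add(r);
            plannedBytes += r.liveBytes;
        }
        return chosen;
    }
}
```

Regions with little live data are cheap to evacuate and free the most space per byte copied, which is where the name "Garbage-First" comes from.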
The regions selected have been collected and compacted into the dark blue region and the dark green region shown in the diagram.
In summary, there are a few key points we can make about the G1 garbage collection on the old generation.
Concurrent Marking Phase
Liveness information is calculated concurrently while the application is running.
This liveness information identifies which regions will be best to reclaim during an evacuation pause.
There is no sweeping phase like in CMS.
Remark Phase
Uses the Snapshot-at-the-Beginning (SATB) algorithm, which is much faster than what was used with CMS.
Completely empty regions are reclaimed.
Copying/Cleanup Phase
Young generation and old generation are reclaimed at the same time.
Old generation regions are selected based on their liveness.