Now that we have reviewed the core concepts behind GC algorithms, let us move to the specific implementations one can find inside the JVM. An important aspect to recognize first is the fact that, for most JVMs out there, two different GC algorithms are needed – one to clean the Young Generation and another to remove garbage from the Old Generation.
You can choose from a variety of such algorithms bundled into the JVM. If you do not specify a Garbage Collection algorithm explicitly, a platform-specific default will be used. In this chapter, the working principles of each of those algorithms are explained.
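To see which collectors a given JVM actually ended up with, the standard `java.lang.management` API can be queried at runtime. The following is a minimal sketch; the class name `ActiveCollectors` is made up for illustration, and the names printed depend on the JVM vendor and version (HotSpot typically reports pairs such as "PS Scavenge" / "PS MarkSweep" for Parallel GC or "ParNew" / "ConcurrentMarkSweep" for CMS).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (hypothetical name): lists the collectors registered by the running JVM.
final class ActiveCollectors {

    // Returns the collector names exactly as the JVM reports them.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // One line per collector: name, number of collections so far, accumulated time.
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    bean.getName(), bean.getCollectionCount(), bean.getCollectionTime());
        }
    }
}
```

Running it once without any GC flags and once with, for example, -XX:+UseSerialGC is a quick way to confirm which default your platform picked.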
For a quick cheat sheet, the following list is a fast way to get yourself up to speed with which algorithm combinations are possible. Note that this holds true for Java 8; for older Java versions the available combinations might differ a bit:
Young | Tenured | JVM options |
---|---|---|
Incremental | Incremental | -Xincgc |
Serial | Serial | -XX:+UseSerialGC |
Parallel Scavenge | Serial | -XX:+UseParallelGC -XX:-UseParallelOldGC |
Parallel New | Serial | N/A |
Serial | Parallel Old | N/A |
Parallel Scavenge | Parallel Old | -XX:+UseParallelGC -XX:+UseParallelOldGC |
Parallel New | Parallel Old | N/A |
Serial | CMS | -XX:-UseParNewGC -XX:+UseConcMarkSweepGC |
Parallel Scavenge | CMS | N/A |
Parallel New | CMS | -XX:+UseParNewGC -XX:+UseConcMarkSweepGC |
G1 | G1 | -XX:+UseG1GC |
If the above looks too complex, do not worry. In reality it all boils down to just four combinations. The rest are either deprecated, not supported or just impractical to apply in the real world. So, in the following chapters we cover the working principles of these combinations: Serial GC for both the Young and Old Generations, Parallel GC for both the Young and Old Generations, Parallel New for the Young Generation together with Concurrent Mark and Sweep (CMS) for the Old Generation, and G1.
The Serial GC combination of garbage collectors uses mark-copy in the Young Generation and mark-sweep-compact in the Old Generation. As the name implies, both of these collectors are single-threaded, incapable of parallelizing the task at hand. Both collectors also trigger stop-the-world pauses, stopping all application threads.
This GC algorithm therefore cannot take advantage of the multiple cores commonly found in modern hardware. Independent of the number of cores available, just one is used by the JVM during garbage collection.
Enabling this collector for both the Young and Old Generation is done via specifying a single parameter in the JVM startup script:
java -XX:+UseSerialGC com.mypackages.MyExecutableClass
This option is recommended and makes sense only for a JVM with a heap of a couple of hundred megabytes running in a single-CPU environment. For the majority of deployments this is a rare combination to find. Most server-side deployments are done on platforms with multiple cores, essentially meaning that by choosing Serial GC you are setting artificial limits on the use of system resources. This results in idle resources which could otherwise be used to reduce latency or increase throughput.
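The guidance above can be turned into a rough self-check. This is only a sketch: the class name `SerialGcCheck` and the 256 MB cut-off are assumptions chosen to stand in for "a couple of hundred megabytes", not values prescribed by the JVM.

```java
// Illustrative check (hypothetical name and threshold) for the Serial GC guidance.
final class SerialGcCheck {

    // Assumed cut-off for "a couple of hundred megabytes"; adjust to taste.
    static final long SMALL_HEAP_BYTES = 256L * 1024 * 1024;

    // True only for a single-CPU environment with a small maximum heap.
    static boolean fitsSerialGcGuidance(int cpus, long maxHeapBytes) {
        return cpus == 1 && maxHeapBytes <= SMALL_HEAP_BYTES;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // availableProcessors() and maxMemory() describe the JVM this code runs in.
        System.out.println("Serial GC fits the guidance: "
                + fitsSerialGcGuidance(rt.availableProcessors(), rt.maxMemory()));
    }
}
```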
Let us now review what garbage collector logs look like when using Serial GC and what useful information one can obtain from them. For this purpose, we have turned on GC logging on the JVM using the following parameters:
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
resulting in output similar to the following:
2015-05-26T14:45:37.987-0200: 151.126: [GC (Allocation Failure) 151.126: [DefNew: 629119K->69888K(629120K), 0.0584157 secs] 1619346K->1273247K(2027264K), 0.0585007 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]
2015-05-26T14:45:59.690-0200: 172.829: [GC (Allocation Failure) 172.829: [DefNew: 629120K->629120K(629120K), 0.0000372 secs]172.829: [Tenured: 1203359K->755802K(1398144K), 0.1855567 secs] 1832479K->755802K(2027264K), [Metaspace: 6741K->6741K(1056768K)], 0.1856954 secs] [Times: user=0.18 sys=0.00, real=0.18 secs]
The short snippet from the GC logs exposes a lot of information about what is taking place inside the JVM. As a matter of fact, in this snippet there were two Garbage Collection events taking place, one of them cleaning the Young Generation and another taking care of the entire heap. Let’s start by analyzing the first collection that is taking place in the Young Generation.
The following snippet contains the information about a GC event cleaning the Young Generation:
2015-05-26T14:45:37.987-0200: 151.126: [GC (Allocation Failure) 151.126: [DefNew: 629119K->69888K(629120K), 0.0584157 secs] 1619346K->1273247K(2027264K), 0.0585007 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]
From the above snippet we can thus understand exactly what was happening with the memory consumption inside JVM during the GC event. Before this collection, heap usage totaled at 1,619,346K. Out of this, the Young Generation consumed 629,119K. From this we can calculate the Old Generation usage being equal to 990,227K.
A more important conclusion is hidden in the next batch of numbers indicating that, after the collection, Young Generation usage decreased by 559,231K but total heap usage decreased only by 346,099K. From this we can again derive that 213,132K of objects were promoted from the Young Generation to the Old Generation.
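The promotion arithmetic above can be reproduced mechanically. The sketch below assumes a minor GC line in the format shown in this chapter, where the first `before->after(capacity)` triple describes the Young Generation and the second describes the whole heap; the class name `MinorGcMath` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper (hypothetical name) that derives promoted bytes from a minor GC log line.
final class MinorGcMath {

    // Matches occupancy triples such as "629119K->69888K(629120K)".
    private static final Pattern USAGE = Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    // Promoted = memory freed in the Young Generation minus memory freed in the whole heap.
    // Assumes the first triple is the Young Generation and the second is the total heap.
    static long promotedKb(String logLine) {
        Matcher m = USAGE.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no Young Generation usage found");
        }
        long youngFreed = Long.parseLong(m.group(1)) - Long.parseLong(m.group(2));
        if (!m.find()) {
            throw new IllegalArgumentException("no total heap usage found");
        }
        long heapFreed = Long.parseLong(m.group(1)) - Long.parseLong(m.group(2));
        return youngFreed - heapFreed;
    }

    public static void main(String[] args) {
        String line = "151.126: [GC (Allocation Failure) 151.126: "
                + "[DefNew: 629119K->69888K(629120K), 0.0584157 secs] "
                + "1619346K->1273247K(2027264K), 0.0585007 secs]";
        System.out.println(promotedKb(line) + "K promoted"); // prints "213132K promoted"
    }
}
```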
This GC event is also illustrated with the following snapshots showing memory usage right before the GC started and right after it finished:
After understanding the first minor GC event, let’s look into the second GC event in the logs:
2015-05-26T14:45:59.690-0200: 172.829: [GC (Allocation Failure) 172.829: [DefNew: 629120K->629120K(629120K), 0.0000372 secs]172.829: [Tenured: 1203359K->755802K(1398144K), 0.1855567 secs] 1832479K->755802K(2027264K), [Metaspace: 6741K->6741K(1056768K)], 0.1856954 secs] [Times: user=0.18 sys=0.00, real=0.18 secs]
The difference with Minor GC is evident – in addition to the Young Generation, during this GC event the Old Generation and Metaspace were also cleaned. The layout of the memory before and after the event would look like the situation in the following picture:
The Parallel GC combination of garbage collectors uses mark-copy in the Young Generation and mark-sweep-compact in the Old Generation. Both the Young and Old collections trigger stop-the-world events, stopping all application threads to perform garbage collection. Both collectors run the marking and copying / compacting phases using multiple threads, hence the name ‘Parallel’. Using this approach, one can considerably reduce collection times.
The number of threads used during garbage collection is configurable via the command line option -XX:ParallelGCThreads=NNN. By default it equals the number of cores in your machine, although on machines with more than eight cores HotSpot uses only a fraction of the additional cores.
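As a rough illustration of the default thread count, the sketch below mirrors the heuristic HotSpot is commonly documented to use: one GC thread per core up to eight cores, then five eighths of each additional core. Treat the formula as an assumption about HotSpot rather than a guarantee for every JVM or version; the class name `DefaultGcThreads` is made up.

```java
// Illustrative estimate (hypothetical name) of the default -XX:ParallelGCThreads value.
final class DefaultGcThreads {

    // Assumed HotSpot heuristic: all cores up to 8, then 5/8 of every core beyond that.
    static int estimate(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " cores -> about " + estimate(cpus) + " GC threads");
    }
}
```

Passing -XX:ParallelGCThreads explicitly overrides whatever default the JVM would pick.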
This collector combination is enabled by specifying any of the following parameter combinations in the JVM startup script, all of which select the same GC algorithms:
java -XX:+UseParallelGC com.mypackages.MyExecutableClass
java -XX:+UseParallelOldGC com.mypackages.MyExecutableClass
java -XX:+UseParallelGC -XX:+UseParallelOldGC com.mypackages.MyExecutableClass
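Parallel Garbage Collector is suitable on multi-core machines in cases where your primary goal is to increase throughput. High throughput is achieved due to the effective usage of system resources: during a collection all cores clean the garbage in parallel, and between collections the collectors do not consume any resources.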
On the other hand, as all phases of the collection have to happen without any interruptions, these collectors are still susceptible to long pauses during which your application threads are stopped and no actual work is being done. So in case latency is your primary goal, you should check the next combinations of garbage collectors.
Let us now review what garbage collector logs look like when using Parallel GC and what useful information one can obtain from them. For this, let’s look again at garbage collector logs that expose one minor and one major GC pause:
2015-05-26T14:27:40.915-0200: 116.115: [GC (Allocation Failure) [PSYoungGen: 2694440K->1305132K(2796544K)] 9556775K->8438926K(11185152K), 0.2406675 secs] [Times: user=1.77 sys=0.01, real=0.24 secs]
2015-05-26T14:27:41.155-0200: 116.356: [Full GC (Ergonomics) [PSYoungGen: 1305132K->0K(2796544K)] [ParOldGen: 7133794K->6597672K(8388608K)] 8438926K->6597672K(11185152K), [Metaspace: 6745K->6745K(1056768K)], 0.9158801 secs] [Times: user=4.49 sys=0.64, real=0.92 secs]
The first of the two events indicates a GC event taking place in the Young Generation:
2015-05-26T14:27:40.915-0200: 116.115: [GC (Allocation Failure) [PSYoungGen: 2694440K->1305132K(2796544K)] 9556775K->8438926K(11185152K), 0.2406675 secs] [Times: user=1.77 sys=0.01, real=0.24 secs]
So, in short, the total heap consumption before the collection was 9,556,775K. Out of this, the Young Generation was 2,694,440K. This means that the used Old Generation was 6,862,335K. After the collection, Young Generation usage decreased by 1,389,308K, but total heap usage decreased only by 1,117,849K. This means that 271,459K was promoted from the Young Generation to the Old Generation.
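The `[Times: ...]` section of the same log line also hints at how well the collection was parallelized: dividing the CPU time (user plus sys) by the wall-clock time (real) gives a rough number of cores kept busy. The sketch below is one way to compute that ratio; the class name `GcParallelism` is hypothetical, and the ratio is only an approximation because the logged values are rounded to two decimals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper (hypothetical name) estimating how many cores a GC pause kept busy.
final class GcParallelism {

    // Matches the timing summary, e.g. "user=1.77 sys=0.01, real=0.24 secs".
    private static final Pattern TIMES =
            Pattern.compile("user=(\\d+\\.\\d+) sys=(\\d+\\.\\d+), real=(\\d+\\.\\d+) secs");

    // Returns (user + sys) / real, or NaN when the rounded real time is zero.
    static double cpuToWallRatio(String logLine) {
        Matcher m = TIMES.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no [Times: ...] section found");
        }
        double user = Double.parseDouble(m.group(1));
        double sys = Double.parseDouble(m.group(2));
        double real = Double.parseDouble(m.group(3));
        return real == 0.0 ? Double.NaN : (user + sys) / real;
    }

    public static void main(String[] args) {
        String parallel = "[Times: user=1.77 sys=0.01, real=0.24 secs]";
        String serial = "[Times: user=0.06 sys=0.00, real=0.06 secs]";
        System.out.printf("Parallel GC: %.1f cores busy%n", cpuToWallRatio(parallel));
        System.out.printf("Serial GC:   %.1f cores busy%n", cpuToWallRatio(serial));
    }
}
```

For the Parallel GC event above the ratio is a little over 7, while for the Serial GC event shown earlier it is 1, which matches the single-threaded nature of that collector.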
After understanding how Parallel GC cleans the Young Generation, we are ready to look at how the whole heap is being cleaned by analyzing the next snippet from the GC logs:
2015-05-26T14:27:41.155-0200: 116.356: [Full GC (Ergonomics) [PSYoungGen: 1305132K->0K(2796544K)] [ParOldGen: 7133794K->6597672K(8388608K)] 8438926K->6597672K(11185152K), [Metaspace: 6745K->6745K(1056768K)], 0.9158801 secs] [Times: user=4.49 sys=0.64, real=0.92 secs]
Again, the difference with Minor GC is evident – in addition to the Young Generation, during this GC event the Old Generation and Metaspace were also cleaned. The layout of the memory before and after the event would look like the situation in the following picture:
The official name of this combination of garbage collectors is “Mostly Concurrent Mark and Sweep Garbage Collector” (CMS). It uses the parallel stop-the-world mark-copy algorithm in the Young Generation and the mostly concurrent mark-sweep algorithm in the Old Generation.
This collector was designed to avoid long pauses while collecting in the Old Generation. It achieves this by two means. Firstly, it does not compact the Old Generation but uses free-lists to manage reclaimed space. Secondly, it does most of the job in the mark-and-sweep phases concurrently with the application. This means that garbage collection does not stop the application threads completely and uses multiple threads to complete the collection. By default, the number of threads used equals ¼ of the number of physical cores of your machine.
This garbage collector can be chosen by specifying the following option on your command line:
java -XX:+UseConcMarkSweepGC com.mypackages.MyExecutableClass
This combination is a good choice on multi-core machines if your primary target is latency. Decreasing the duration of the individual GC pause directly affects the way your application is perceived by end-users, giving them a feel of a more responsive application. As most of the time at least some part of the CPU is occupied by GC and not executing your application’s code, CMS generally provides worse throughput than Parallel GC.
As with previous GC algorithms, let us now see how this algorithm is applied in practice by taking a look at the GC logs that once again expose one minor and one major GC pause:
2015-05-26T16:23:07.219-0200: 64.322: [GC (Allocation Failure) 64.322: [ParNew: 613404K->68068K(613440K), 0.1020465 secs] 10885349K->10880154K(12514816K), 0.1021309 secs] [Times: user=0.78 sys=0.01, real=0.11 secs]
2015-05-26T16:23:07.321-0200: 64.425: [GC (CMS Initial Mark) [1 CMS-initial-mark: 10812086K(11901376K)] 10887844K(12514816K), 0.0001997 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2015-05-26T16:23:07.321-0200: 64.425: [CMS-concurrent-mark-start]
2015-05-26T16:23:07.357-0200: 64.460: [CMS-concurrent-mark: 0.035/0.035 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]
2015-05-26T16:23:07.357-0200: 64.460: [CMS-concurrent-preclean-start]
2015-05-26T16:23:07.373-0200: 64.476: [CMS-concurrent-preclean: 0.016/0.016 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2015-05-26T16:23:07.373-0200: 64.476: [CMS-concurrent-abortable-preclean-start]
2015-05-26T16:23:08.446-0200: 65.550: [CMS-concurrent-abortable-preclean: 0.167/1.074 secs] [Times: user=0.20 sys=0.00, real=1.07 secs]
2015-05-26T16:23:08.447-0200: 65.550: [GC (CMS Final Remark) [YG occupancy: 387920 K (613440 K)]65.550: [Rescan (parallel) , 0.0085125 secs]65.559: [weak refs processing, 0.0000243 secs]65.559: [class unloading, 0.0013120 secs]65.560: [scrub symbol table, 0.0008345 secs]65.561: [scrub string table, 0.0001759 secs][1 CMS-remark: 10812086K(11901376K)] 11200006K(12514816K), 0.0110730 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
2015-05-26T16:23:08.458-0200: 65.561: [CMS-concurrent-sweep-start]
2015-05-26T16:23:08.485-0200: 65.588: [CMS-concurrent-sweep: 0.027/0.027 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
2015-05-26T16:23:08.485-0200: 65.589: [CMS-concurrent-reset-start]
2015-05-26T16:23:08.497-0200: 65.601: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
The first of the GC events in the log denotes a minor GC cleaning the Young space. Let’s analyze how this collector combination behaves in this regard:
2015-05-26T16:23:07.219-0200: 64.322: [GC (Allocation Failure) 64.322: [ParNew: 613404K->68068K(613440K), 0.1020465 secs] 10885349K->10880154K(12514816K), 0.1021309 secs] [Times: user=0.78 sys=0.01, real=0.11 secs]
From the above we can thus see that before the collection the total used heap was 10,885,349K and the used Young Generation share was 613,404K. This means that the Old Generation share was 10,271,945K. After the collection, Young Generation usage decreased by 545,336K but total heap usage decreased only by 5,195K. This means that 540,141K was promoted from the Young Generation to Old.
Now, just as you are becoming accustomed to reading GC logs already, this chapter will introduce a completely different format for the next garbage collection event in the logs. The verbose output that follows consists of all the different phases of the mostly concurrent garbage collection in the Old Generation. We will review them one by one but in this case we will cover the log content in phases instead of the entire event log at once for more concise representation. But to recap, the whole event for the CMS collector looks like the following:
2015-05-26T16:23:07.321-0200: 64.425: [GC (CMS Initial Mark) [1 CMS-initial-mark: 10812086K(11901376K)] 10887844K(12514816K), 0.0001997 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2015-05-26T16:23:07.321-0200: 64.425: [CMS-concurrent-mark-start]
2015-05-26T16:23:07.357-0200: 64.460: [CMS-concurrent-mark: 0.035/0.035 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]
2015-05-26T16:23:07.357-0200: 64.460: [CMS-concurrent-preclean-start]
2015-05-26T16:23:07.373-0200: 64.476: [CMS-concurrent-preclean: 0.016/0.016 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2015-05-26T16:23:07.373-0200: 64.476: [CMS-concurrent-abortable-preclean-start]
2015-05-26T16:23:08.446-0200: 65.550: [CMS-concurrent-abortable-preclean: 0.167/1.074 secs] [Times: user=0.20 sys=0.00, real=1.07 secs]
2015-05-26T16:23:08.447-0200: 65.550: [GC (CMS Final Remark) [YG occupancy: 387920 K (613440 K)]65.550: [Rescan (parallel) , 0.0085125 secs]65.559: [weak refs processing, 0.0000243 secs]65.559: [class unloading, 0.0013120 secs]65.560: [scrub symbol table, 0.0008345 secs]65.561: [scrub string table, 0.0001759 secs][1 CMS-remark: 10812086K(11901376K)] 11200006K(12514816K), 0.0110730 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
2015-05-26T16:23:08.458-0200: 65.561: [CMS-concurrent-sweep-start]
2015-05-26T16:23:08.485-0200: 65.588: [CMS-concurrent-sweep: 0.027/0.027 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
2015-05-26T16:23:08.485-0200: 65.589: [CMS-concurrent-reset-start]
2015-05-26T16:23:08.497-0200: 65.601: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
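To see how little of this cycle actually stops the application, the pauses and the concurrent phases can be summed separately. The sketch below works on trimmed copies of the log lines above. It assumes that stop-the-world phases are the lines starting with `[GC (CMS ...)` and that the second number in a `[CMS-concurrent-...: a/b secs]` entry is the wall-clock duration of the phase; the class name `CmsCycleSummary` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative summary (hypothetical name) of pause time versus concurrent time in one CMS cycle.
final class CmsCycleSummary {

    // Stop-the-world phases: "[GC (CMS Initial Mark) ..., <secs> secs] [Times: ...".
    private static final Pattern PAUSE =
            Pattern.compile("\\[GC \\(CMS [^)]+\\).*, (\\d+\\.\\d+) secs\\] \\[Times");

    // Concurrent phases: "[CMS-concurrent-<phase>: <a>/<b> secs]"; <b> is assumed to be wall-clock time.
    private static final Pattern CONCURRENT =
            Pattern.compile("\\[CMS-concurrent-[a-z-]+: \\d+\\.\\d+/(\\d+\\.\\d+) secs\\]");

    // Trimmed copies of the log lines shown above (date stamps and *-start lines omitted).
    static final List<String> SAMPLE = Arrays.asList(
            "64.425: [GC (CMS Initial Mark) [1 CMS-initial-mark: 10812086K(11901376K)] 10887844K(12514816K), 0.0001997 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]",
            "64.460: [CMS-concurrent-mark: 0.035/0.035 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]",
            "64.476: [CMS-concurrent-preclean: 0.016/0.016 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]",
            "65.550: [CMS-concurrent-abortable-preclean: 0.167/1.074 secs] [Times: user=0.20 sys=0.00, real=1.07 secs]",
            "65.550: [GC (CMS Final Remark) [YG occupancy: 387920 K (613440 K)]65.550: [Rescan (parallel) , 0.0085125 secs][1 CMS-remark: 10812086K(11901376K)] 11200006K(12514816K), 0.0110730 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]",
            "65.588: [CMS-concurrent-sweep: 0.027/0.027 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]",
            "65.601: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]");

    // Adds up the first capture group of every line the pattern matches.
    private static double sum(Pattern pattern, List<String> lines) {
        double total = 0.0;
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1));
            }
        }
        return total;
    }

    static double totalPauseSecs(List<String> lines) {
        return sum(PAUSE, lines);
    }

    static double totalConcurrentSecs(List<String> lines) {
        return sum(CONCURRENT, lines);
    }

    public static void main(String[] args) {
        System.out.printf("stop-the-world: %.7f secs%n", totalPauseSecs(SAMPLE));
        System.out.printf("concurrent:     %.3f secs%n", totalConcurrentSecs(SAMPLE));
    }
}
```

For this cycle the two pauses add up to roughly 11 milliseconds, while the concurrent phases span more than a second of wall-clock time.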
Phase 1: Initial Mark. This is one of the two stop-the-world events during CMS. The goal of this phase is to collect all Garbage Collector roots.
2015-05-26T16:23:07.321-0200: 64.425: [GC (CMS Initial Mark) [1 CMS-initial-mark: 10812086K(11901376K)] 10887844K(12514816K), 0.0001997 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
Phase 2: Concurrent Mark. During this phase the Garbage Collector traverses the Old Generation and marks all live objects, starting from the roots found in the previous phase of “Initial Mark”. The “Concurrent Mark” phase, as its name suggests, runs concurrently with your application and does not stop the application threads.
2015-05-26T16:23:07.321-0200: 64.425: [CMS-concurrent-mark-start]
2015-05-26T16:23:07.357-0200: 64.460: [CMS-concurrent-mark: 0.035/0.035 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]
Phase 3: Concurrent Preclean. This is again a concurrent phase, running in parallel with the application threads, not stopping them. Essentially it is another mark phase which tries to account for references changed during the previous mark phase.
2015-05-26T16:23:07.357-0200: 64.460: [CMS-concurrent-preclean-start]
2015-05-26T16:23:07.373-0200: 64.476: [CMS-concurrent-preclean: 0.016/0.016 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
Bear in mind that in a real-world situation Minor Garbage Collections of the Young Generation can occur at any time during the concurrent collection of the Old Generation. In such a case the major collection records seen here will be interleaved with the Minor GC events covered earlier.
Phase 4: Concurrent Abortable Preclean. Again, a concurrent phase that does not stop the application’s threads. The full reasoning behind the phase is complex to say the least, but suffice it to say that the goal of this phase is to reduce the amount of work to be done in the next stop-the-world phase, the final remark.
2015-05-26T16:23:07.373-0200: 64.476: [CMS-concurrent-abortable-preclean-start]
2015-05-26T16:23:08.446-0200: 65.550: [CMS-concurrent-abortable-preclean: 0.167/1.074 secs] [Times: user=0.20 sys=0.00, real=1.07 secs]
Phase 5: Final Remark. This is the second and last stop-the-world phase during the event. The goal of this stop-the-world phase is to finalize marking all live objects in the Old Generation, including the references that were created or modified during the previous concurrent marking phases.
Usually CMS tries to run the final remark phase when the Young Generation is as empty as possible in order to eliminate the possibility of several stop-the-world phases happening back-to-back.
This event looks a bit more complex than previous phases:
2015-05-26T16:23:08.447-0200: 65.550: [GC (CMS Final Remark) [YG occupancy: 387920 K (613440 K)]65.550: [Rescan (parallel) , 0.0085125 secs]65.559: [weak refs processing, 0.0000243 secs]65.559: [class unloading, 0.0013120 secs]65.560: [scrub symbol table, 0.0008345 secs]65.561: [scrub string table, 0.0001759 secs][1 CMS-remark: 10812086K(11901376K)] 11200006K(12514816K), 0.0110730 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
After the five marking phases, all live objects in the Old Generation are marked, and the garbage collector now reclaims all unused objects by sweeping the Old Generation:
Phase 6: Concurrent Sweep. Performed concurrently with the application, without the need for the stop-the-world pauses. The purpose of the phase is to remove unused objects and to reclaim the space occupied by them for future use.
2015-05-26T16:23:08.458-0200: 65.561: [CMS-concurrent-sweep-start]
2015-05-26T16:23:08.485-0200: 65.588: [CMS-concurrent-sweep: 0.027/0.027 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
Phase 7: Concurrent Reset. Concurrently executed phase, resetting inner data structures of the CMS algorithm and preparing them for the next cycle.
2015-05-26T16:23:08.485-0200: 65.589: [CMS-concurrent-reset-start]
2015-05-26T16:23:08.497-0200: 65.601: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
G1, called ‘Garbage First’ collector, uses a completely different heap layout than previous collectors. The design principles of the collector are also based on different assumptions than previous algorithms.
But when zooming out far enough, we can still say G1 is just a clever modification of the familiar mark-copy algorithm. It aims to replace CMS as a low-pause garbage collector in OpenJDK and as such sees a great deal of care nowadays. Almost every new version of Java 8 brings performance and stability improvements to G1. There are plans to make G1 the default garbage collector for the upcoming release of Java 9.
This garbage collector can be chosen by specifying the following option on your command line:
java -XX:+UseG1GC com.mypackages.MyExecutableClass
If you use the latest versions – Java 8 or 9 – the G1 garbage collector can be a good choice for latency-constrained applications requiring high responsiveness. But you should test it carefully against realistic usage scenarios before using it, as it is still not as mature and well-tested as CMS. In addition, there are limited possibilities and guidelines for tuning the performance of the G1 garbage collector.
The details on the G1-specific behavior will be added to the handbook during Q2 2015.
From: https://plumbr.eu/handbook/garbage-collection-algorithms-implementations