Concurrent mark gives reduced and consistent garbage collection pause times as heap sizes increase.
The GC starts a concurrent marking phase before the heap is full. In the concurrent phase, the GC scans the roots, i.e. stacks, JNI references, class statics, and so on. The stacks are scanned by asking each thread to scan its own stack. These roots are then used to trace live objects concurrently. Tracing is done by a low-priority background thread and by each application thread when it does a heap lock allocation.
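The tracing step described above can be illustrated with a small model. This is a minimal sketch and not the JVM's implementation: it is single-threaded, and the Node and MarkSketch names are hypothetical stand-ins for heap objects and the marker. It shows only how a set of roots is expanded into the set of live objects by following references.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical model of a heap object: a name plus its outgoing references.
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }
}

// Hypothetical marker: traces everything reachable from the roots.
final class MarkSketch {
    static Set<Node> mark(List<Node> roots) {
        // Identity-based set: an object is marked at most once.
        Set<Node> marked =
                Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> work = new ArrayDeque<>();

        // Seed the work stack with the roots.
        for (Node root : roots) {
            if (marked.add(root)) {
                work.push(root);
            }
        }

        // Follow references until no unmarked reachable object remains.
        while (!work.isEmpty()) {
            Node current = work.pop();
            for (Node child : current.refs) {
                if (marked.add(child)) {
                    work.push(child);
                }
            }
        }
        return marked;
    }
}
```

Objects that never appear in the returned set are unreachable from the roots and are therefore candidates for collection.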
While the GC marks live objects concurrently with running application threads, it must record any changes to objects that have already been traced. It does so with a write barrier that runs every time a reference in an object is updated. The write barrier flags that an object reference update has occurred, to force a re-scan of part of the heap. The heap is divided into 512-byte sections, and each section is allocated a one-byte card in the card table. Whenever a reference in an object is updated, the card that corresponds to the start address of the updated object is marked with 0x01. A byte is used instead of a bit to eliminate contention; it allows the cards to be marked using non-atomic operations.
A stop-the-world (STW) collection is started when one of the following occurs:
- An allocation failure
- A System.gc() call
- Concurrent mark completes all the marking that it can do
The GC tries to start the concurrent mark phase so that it completes at the same time as the heap is exhausted. The GC does this by constant tuning of the parameters that govern the concurrent mark time. In the STW phase, the GC re-scans all roots and uses the marked cards to see what else must be retraced, and then sweeps as normal. It is guaranteed that all objects that were unreachable at the start of the concurrent phase are collected. It is not guaranteed that objects that become unreachable during the concurrent phase are collected. Objects which become unreachable during the concurrent phase are referred to as "floating garbage".
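The card marking and the STW re-scan described above can be modeled as follows. This is a sketch under stated assumptions rather than the JVM's code: the CardTableSketch class and its method names are hypothetical, heap addresses are modeled as plain int offsets, and resetting the cards after the re-scan is an assumption that the text does not state. Only the 512-byte section size, the one-byte card, and the 0x01 mark come from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical card table: one byte per 512-byte heap section.
final class CardTableSketch {
    static final int CARD_SHIFT = 9;   // 2^9 = 512 bytes per section
    static final byte CLEAN = 0x00;
    static final byte DIRTY = 0x01;

    private final byte[] cards;

    CardTableSketch(int heapSizeBytes) {
        // Round up so that a partial final section still gets a card.
        cards = new byte[(heapSizeBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT];
    }

    // Write barrier: mark the card that covers the start address of the
    // object whose reference field was updated. A plain byte store is enough;
    // no read-modify-write on shared bits is needed.
    void recordReferenceUpdate(int objectStartOffset) {
        cards[objectStartOffset >>> CARD_SHIFT] = DIRTY;
    }

    // STW phase: collect the indices of marked cards so that the matching
    // heap sections can be retraced. Resetting them here is an assumption.
    List<Integer> drainDirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                dirty.add(i);
                cards[i] = CLEAN;
            }
        }
        return dirty;
    }
}
```

Because each card is a whole byte, two threads that mark neighboring cards never touch the same memory unit, which is why the barrier can avoid atomic operations.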
Reduced and consistent pause times are the benefits of concurrent mark, but they come at a cost. Application threads must do some tracing when they are requesting a heap lock allocation. The processor usage needed varies depending on how much idle CPU time is available for the background thread. Also, the write barrier requires additional processor usage.
The -Xgcpolicy command-line parameter is used to enable and disable concurrent mark:
-Xgcpolicy:<optthruput | optavgpause | gencon | subpool>
The -Xgcpolicy options have these effects:
optthruput
Disables concurrent mark. If you do not have pause time problems (as seen by erratic application response times), you get the best throughput with this option.
The optthruput option is the default setting.
optavgpause
Enables concurrent mark with its default values. If you are having problems with erratic application response times that are caused by normal garbage collections, you can reduce those problems, at the cost of some throughput, by using the optavgpause option.
gencon
Requests the combined use of concurrent and generational GC to help minimize the time that is spent in any garbage collection pause.
subpool
Disables concurrent mark. It uses an improved object allocation algorithm to achieve better performance when allocating objects on the heap. This option might improve performance on SMP systems with 16 or more processors. The subpool option is available only on AIX®, Linux® PPC and zSeries®, z/OS®, and i5/OS®.
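To check from inside an application which policy was requested, one possible approach is to inspect the JVM input arguments through the standard java.lang.management API. The following is a sketch only: the GcPolicyCheck class is hypothetical, it assumes that the JVM reports -X options through RuntimeMXBean.getInputArguments(), and it assumes that the last -Xgcpolicy argument wins if several are given. The mapping from policy to concurrent mark follows the option descriptions above.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper that reports the requested -Xgcpolicy value.
final class GcPolicyCheck {
    static final String PREFIX = "-Xgcpolicy:";

    // Returns the value of the last -Xgcpolicy: argument, or null if absent.
    // "Last one wins" is an assumption, not documented behavior.
    static String policyFrom(List<String> jvmArgs) {
        String policy = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(PREFIX)) {
                policy = arg.substring(PREFIX.length());
            }
        }
        return policy;
    }

    // Per the option descriptions: optavgpause and gencon use concurrent
    // mark, optthruput and subpool disable it.
    static boolean usesConcurrentMark(String policy) {
        return "optavgpause".equals(policy) || "gencon".equals(policy);
    }

    public static void main(String[] args) {
        List<String> jvmArgs =
                ManagementFactory.getRuntimeMXBean().getInputArguments();
        String policy = policyFrom(jvmArgs);
        if (policy == null) {
            System.out.println("No -Xgcpolicy argument; the default policy applies.");
        } else {
            System.out.println("Policy: " + policy
                    + ", concurrent mark: " + usesConcurrentMark(policy));
        }
    }
}
```

For example, the class could be started with java -Xgcpolicy:optavgpause GcPolicyCheck to see the requested policy echoed back.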