Zz JavaOne: Garbage First

Sun’s HotSpot garbage collectors can be divided into two categories: those for the young generation and those for the tenured generation. The majority of allocations are made in the young generation, which is optimized for objects that have a short lifetime relative to the interval between collections. Objects that survive several collections in the young generation are moved to the tenured generation, which is typically larger and collected less often. The young generation collectors are Serial, ParNew and Parallel Scavenge. All three are stop-the-world copying collectors. Serial uses a single GC thread, whilst ParNew and Parallel Scavenge both use multiple threads. The tenured collectors are all based on marking and sweeping: Serial Old and Parallel Old use a mark-sweep-compact algorithm, whereas CMS does not normally compact. Three are currently in use: Serial Old, which again uses a single GC thread; Parallel Old, which uses multiple GC threads; and CMS, a mostly concurrent low-pause collector. Garbage First aims to replace CMS and takes a somewhat different approach, straddling the boundary between the young and tenured generations.

In a presentation at JavaOne this year, Tony Printezis provided some more details, and a follow-up interview is now also available on the JavaOne conference site. In it Printezis summarizes how Garbage First (G1) works:

“The heap is split into fixed-size regions and the separation between the two generations is basically logical. So some regions are considered to be young, some old. All space reclamation in G1 is done through copying. G1 selects a set of regions, picks the surviving objects from those regions and copies them to another set of regions. This is how all space reclamation happens in G1, instead of the combination of copying and in-place de-allocation that CMS does.”
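The copying scheme Printezis describes can be illustrated with a toy model. The sketch below is not HotSpot code: the `Obj` and `Region` classes, the object-count capacity and the `evacuate` method are all hypothetical names invented for illustration, and liveness is assumed to be known in advance.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of region-based copying collection; not HotSpot's data structures.
class RegionEvacuationSketch {

    // Hypothetical heap object: a name plus a precomputed liveness flag.
    static final class Obj {
        final String name;
        final boolean live;
        Obj(String name, boolean live) { this.name = name; this.live = live; }
    }

    // Fixed-capacity region; "young" or "old" is only a label on the region.
    static final class Region {
        final int capacity;   // capacity counted in objects, for simplicity
        final boolean young;
        final List<Obj> objects = new ArrayList<>();
        Region(int capacity, boolean young) { this.capacity = capacity; this.young = young; }
        boolean hasRoom() { return objects.size() < capacity; }
    }

    // Copies the survivors of every region in the collection set into the
    // destination regions, then empties the source regions so they are free.
    // Returns the number of objects copied.
    static int evacuate(List<Region> collectionSet, List<Region> destinations) {
        int copied = 0;
        int d = 0;
        for (Region source : collectionSet) {
            for (Obj o : source.objects) {
                if (!o.live) continue; // garbage is simply left behind
                while (d < destinations.size() && !destinations.get(d).hasRoom()) d++;
                if (d == destinations.size()) {
                    throw new IllegalStateException("no free space to copy into");
                }
                destinations.get(d).objects.add(o);
                copied++;
            }
        }
        // Whole source regions are reclaimed at once; nothing is freed in place.
        for (Region source : collectionSet) source.objects.clear();
        return copied;
    }

    public static void main(String[] args) {
        Region young = new Region(4, true);
        young.objects.add(new Obj("a", true));
        young.objects.add(new Obj("b", false));
        List<Region> collectionSet = new ArrayList<>();
        collectionSet.add(young);
        List<Region> destinations = new ArrayList<>();
        destinations.add(new Region(4, false));
        System.out.println("copied: " + evacuate(collectionSet, destinations));
    }
}
```

Because survivors are packed into the destination regions and the source regions are freed whole, free space stays contiguous, which is the property the quote contrasts with CMS.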

Printezis goes on to describe three main objectives for the new collector:

“The first objective is consistent low pauses over time. In essence, because G1 compacts as it proceeds, it copies objects from one area of the heap to the other. Thus, because of compaction, it will not encounter fragmentation issues that CMS might. There will always be areas of contiguous free space from which to allocate, allowing G1 to have consistent pauses over time. 

The second objective is to avoid, as much as possible, having a full GC. After G1 performs a global marking phase determining the liveness of objects throughout the heap, it will immediately know where in the heap there are regions that are mostly empty. It will tackle those regions first, making a lot of space available. This way, the garbage collector will obtain more breathing space, decreasing the probability of a full GC. This is also why the garbage collector is called Garbage-First.

The final objective is good throughput. For many of our customers, throughput is king. We want G1 to have good throughput to meet our customers' requirements.”
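The idea behind the second objective, tackling the emptiest regions first, can be sketched as a simple sort. This is a simplification under stated assumptions: `RegionStats` and `garbageFirst` are hypothetical names, the per-region live-byte counts are assumed to come from the marking phase, and the real collector's ranking may weigh more than the amount of garbage.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: ranks regions so that those with the most garbage come first.
class GarbageFirstOrderSketch {

    // Hypothetical per-region summary produced by a marking phase.
    static final class RegionStats {
        final int id;
        final int liveBytes;
        final int capacityBytes;
        RegionStats(int id, int liveBytes, int capacityBytes) {
            this.id = id;
            this.liveBytes = liveBytes;
            this.capacityBytes = capacityBytes;
        }
        int garbageBytes() { return capacityBytes - liveBytes; }
    }

    // Returns a new list ordered from most garbage to least garbage.
    static List<RegionStats> garbageFirst(List<RegionStats> regions) {
        List<RegionStats> sorted = new ArrayList<>(regions);
        Comparator<RegionStats> byGarbage = Comparator.comparingInt(RegionStats::garbageBytes);
        sorted.sort(byGarbage.reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<RegionStats> regions = new ArrayList<>();
        regions.add(new RegionStats(0, 900, 1024));
        regions.add(new RegionStats(1, 10, 1024));
        regions.add(new RegionStats(2, 500, 1024));
        for (RegionStats r : garbageFirst(regions)) {
            System.out.println("region " + r.id + " garbage=" + r.garbageBytes());
        }
    }
}
```

Regions near the front of this ordering hold little live data, so they are cheap to copy out of and free a lot of space.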

Sun Research have published a paper (pdf document) which gives more detail on Garbage-First and explains how these objectives, in particular the real-time goal, are achieved. Whilst most real-time collectors work at the highly granular level of individual objects, Garbage First collects at the region level. If a region contains no live objects it is reclaimed immediately. The user can specify a pause-time goal, and G1 estimates how many regions can be collected in that time based on previous collections. The collector therefore has a reasonably accurate model of the cost of collecting each region, and so "the collector can choose a set of regions that can be collected within a given pause time limit (with high probability)." In other words, Garbage-First is not a hard real-time collector: it meets the soft real-time goal with high probability but not absolute certainty. The trade-off is that Garbage-First should yield higher throughput in return for the softer, but still moderately stringent, real-time constraints. This is a good fit for large-scale server applications, which often have large amounts of live heap data and considerable thread-level parallelism. Garbage-First also provides some finer control, allowing a user to cap the fraction of time spent on garbage collection during a period of execution: for example, no more than 20 seconds of garbage collection in the next 120 seconds.
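One way to picture choosing regions within a pause-time limit is a greedy pass over candidates ranked by reclaimed bytes per millisecond of predicted cost. The sketch below is an assumption-laden illustration, not the algorithm from the paper: `Candidate`, `chooseCollectionSet` and the cost figures are hypothetical, and the predicted costs are assumed to be positive values supplied by some model built from earlier collections.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative greedy selection of regions under a soft pause-time goal.
class PauseBudgetSketch {

    // Hypothetical candidate region with a predicted collection cost.
    static final class Candidate {
        final int id;
        final double predictedCostMs;  // assumed > 0, estimated from past pauses
        final int reclaimableBytes;
        Candidate(int id, double predictedCostMs, int reclaimableBytes) {
            this.id = id;
            this.predictedCostMs = predictedCostMs;
            this.reclaimableBytes = reclaimableBytes;
        }
        double efficiency() { return reclaimableBytes / predictedCostMs; }
    }

    // Picks regions in order of efficiency while the predicted total cost
    // stays within the pause goal. The goal is soft: predictions can be wrong.
    static List<Candidate> chooseCollectionSet(List<Candidate> candidates, double pauseGoalMs) {
        List<Candidate> ranked = new ArrayList<>(candidates);
        Comparator<Candidate> byEfficiency = Comparator.comparingDouble(Candidate::efficiency);
        ranked.sort(byEfficiency.reversed());

        List<Candidate> chosen = new ArrayList<>();
        double budgetLeftMs = pauseGoalMs;
        for (Candidate c : ranked) {
            if (c.predictedCostMs <= budgetLeftMs) {
                chosen.add(c);
                budgetLeftMs -= c.predictedCostMs;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = new ArrayList<>();
        candidates.add(new Candidate(1, 5.0, 1000));
        candidates.add(new Candidate(2, 10.0, 500));
        candidates.add(new Candidate(3, 4.0, 100));
        for (Candidate c : chooseCollectionSet(candidates, 10.0)) {
            System.out.println("collect region " + c.id);
        }
    }
}
```

Since the costs are only estimates, a set chosen this way can still overrun the goal occasionally, which matches the "with high probability" wording in the paper.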

Garbage First will be in Java SE 7 and should be committed in the next few weeks. It will also be released as an update to Java 6.

 

