When HotSpot is started with -server -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintHeapAtGC to run a Java program, the GC log contains output like this:
{Heap before GC invocations=0 (full 0):
par new generation total 14784K, used 13175K [0x03ad0000, 0x04ad0000, 0x04ad0000)
eden space 13184K, 99% used [0x03ad0000, 0x047adcf8, 0x047b0000)
from space 1600K, 0% used [0x047b0000, 0x047b0000, 0x04940000)
to space 1600K, 0% used [0x04940000, 0x04940000, 0x04ad0000)
concurrent mark-sweep generation total 49152K, used 0K [0x04ad0000, 0x07ad0000, 0x09ad0000)
concurrent-mark-sweep perm gen total 16384K, used 2068K [0x09ad0000, 0x0aad0000, 0x0dad0000)
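That is a lot of names and a lot of numbers. What does each of them mean? Let's look for the answer in the source code.
When the PrintHeapAtGC flag is set, HotSpot writes a summary of the GC heap to the log both before and after each GC.
Searching the HotSpot source for "PrintHeapAtGC" turns up many hits. The ones of the form "if (PrintHeapAtGC)" are where the flag takes effect. Take genCollectedHeap as an example: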
hotspot/src/share/vm/memory/genCollectedHeap.cpp
void GenCollectedHeap::do_collection(bool full,
                                     bool clear_all_soft_refs,
                                     size_t size,
                                     bool is_tlab,
                                     int max_level) {
  bool prepared_for_verification = false;
  ResourceMark rm;
  DEBUG_ONLY(Thread* my_thread = Thread::current();)
  assert(SafepointSynchronize::is_at_safepoint(), "should be at safepoint");
  assert(my_thread->is_VM_thread() ||
         my_thread->is_ConcurrentGC_thread(),
         "incorrect thread type capability");
  assert(Heap_lock->is_locked(), "the requesting thread should have the Heap_lock");
  guarantee(!is_gc_active(), "collection is not reentrant");
  assert(max_level < n_gens(), "sanity check");
  // ...
  if (PrintHeapAtGC) {
    Universe::print_heap_before_gc();
    if (Verbose) {
      gclog_or_tty->print_cr("GC Cause: %s", GCCause::to_string(gc_cause()));
    }
  }
  // perform GC...
  if (PrintHeapAtGC) {
    Universe::print_heap_after_gc();
  }
  // ...
}
// ...
void GenCollectedHeap::print_on(outputStream* st) const {
  for (int i = 0; i < _n_gens; i++) {
    _gens[i]->print_on(st);
  }
  perm_gen()->print_on(st);
}
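Similarly, the methods that perform collection in the other kinds of GC heap also call Universe::print_heap_before_gc() and Universe::print_heap_after_gc() before and after a collection, respectively: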
PSMarkSweep::invoke_no_policy()
PSScavenge::invoke_no_policy()
PSParallelCompact::pre_compact()
G1CollectedHeap::do_collection()
G1CollectedHeap::do_collection_pause_at_safepoint()
So how are these two heap-printing functions implemented?
hotspot/src/share/vm/memory/universe.hpp
static void print_heap_before_gc() { print_heap_before_gc(gclog_or_tty); }
static void print_heap_after_gc() { print_heap_after_gc(gclog_or_tty); }
hotspot/src/share/vm/memory/universe.cpp
void Universe::print_heap_before_gc(outputStream* st) {
  st->print_cr("{Heap before GC invocations=%u (full %u):",
               heap()->total_collections(),
               heap()->total_full_collections());
  heap()->print_on(st);
}
void Universe::print_heap_after_gc(outputStream* st) {
  st->print_cr("Heap after GC invocations=%u (full %u):",
               heap()->total_collections(),
               heap()->total_full_collections());
  heap()->print_on(st);
  st->print_cr("}");
}
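OK, the overall skeleton is now visible. The number after invocations= is the total number of GCs so far, and the number after full is how many of those were full GCs. From here on, it is up to each GC heap implementation to print its own information.
Note that this example started the JVM with -XX:+UseParNewGC -XX:+UseConcMarkSweepGC. These two flags select the parallel new collector for the young generation and the concurrent mark-sweep collector for the old generation. The heap used by this combination is the GenCollectedHeap mentioned earlier, so the heap()->print_on(st) call in this example resolves to GenCollectedHeap::print_on(), whose code is shown above. Each generation in that heap is represented as an object of the Generation class: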
hotspot/src/share/vm/memory/generation.hpp
// A Generation models a heap area for similarly-aged objects.
// It will contain one ore more spaces holding the actual objects.
//
// The Generation class hierarchy:
//
// Generation                          - abstract base class
// - DefNewGeneration                  - allocation area (copy collected)
//   - ParNewGeneration                - a DefNewGeneration that is collected by
//                                       several threads
// - CardGeneration                    - abstract class adding offset array behavior
//   - OneContigSpaceCardGeneration    - abstract class holding a single
//                                       contiguous space with card marking
//     - TenuredGeneration             - tenured (old object) space (markSweepCompact)
//     - CompactingPermGenGen          - reflective object area (klasses, methods, symbols, ...)
//   - ConcurrentMarkSweepGeneration   - Mostly Concurrent Mark Sweep Generation
//                                       (Detlefs-Printezis refinement of
//                                       Boehm-Demers-Schenker)
//
// The system configurations currently allowed are:
//
// DefNewGeneration + TenuredGeneration + PermGeneration
// DefNewGeneration + ConcurrentMarkSweepGeneration + ConcurrentMarkSweepPermGen
//
// ParNewGeneration + TenuredGeneration + PermGeneration
// ParNewGeneration + ConcurrentMarkSweepGeneration + ConcurrentMarkSweepPermGen
//
// ...
class Generation: public CHeapObj {
  // ...
  // Memory area reserved for generation
  VirtualSpace _virtual_space;
  // ...
 public:
  // The set of possible generation kinds.
  enum Name {
    ASParNew,
    ASConcurrentMarkSweep,
    DefNew,
    ParNew,
    MarkSweepCompact,
    ConcurrentMarkSweep,
    Other
  };
  // ...
  // Space enquiries (results in bytes)
  virtual size_t capacity() const = 0;  // The maximum number of object bytes the
                                        // generation can currently hold.
  virtual size_t used() const = 0;      // The number of used bytes in the gen.
  virtual size_t free() const = 0;      // The number of free bytes in the gen.
  // ...
};
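(An aside: note that the comment above lists which combinations are allowed under this generational GC heap framework.
ParallelGC and G1GC are not among them. Does that mean they are not generational? No. They are generational GC implementations too (G1 is logically generational), but they do not use HotSpot's original framework and instead go through a separate interface. An OpenJDK document describes this:
Collector Styles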
There are two styles in which we've built collectors. At first we had a framework into which we could plug generations, each of which would have its own collector. The framework is general enough for us to have built several collectors on it, and does support a limited amount of mix-and-matching of “framework” generations. The framework has some inefficiencies due to the generality at allows. We've worked around some of the inefficiencies. When we built the high-throughput collector we decided not to use the framework, but instead designed an interface that a collector would support, with the high-throughput collector as an instance of that interface. That means that the “interface” collectors can't be mixed and matched, which implies some duplication of code. But it has the advantage that one can work on an “interface” collector without worrying about breaking any of the other collectors.
)
All right, back to the topic. Here is how Generation::print_on() is implemented:
hotspot/src/share/vm/memory/generation.cpp
void Generation::print_on(outputStream* st) const {
  st->print(" %-20s", name());
  st->print(" total " SIZE_FORMAT "K, used " SIZE_FORMAT "K",
            capacity()/K, used()/K);
  st->print_cr(" [" INTPTR_FORMAT ", " INTPTR_FORMAT ", " INTPTR_FORMAT ")",
               _virtual_space.low_boundary(),
               _virtual_space.high(),
               _virtual_space.high_boundary());
}
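So each of these lines prints:
generation name total <capacity> used <used space> [number 1, number 2, number 3)
where the capacity and the used space are both in KB (the byte counts are divided by K).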
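To make the format concrete, here is a minimal sketch that rebuilds one generation line outside the VM. GenSummary and format_generation are hypothetical names, not HotSpot code. It assumes K is 1024 and that addresses are printed as 8 hex digits, as in the 32-bit log at the top of this article.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the values Generation::print_on() reads.
struct GenSummary {
    const char*   name;           // e.g. "par new generation"
    std::uint64_t capacity;       // bytes
    std::uint64_t used;           // bytes
    std::uint32_t low_boundary;   // 32-bit addresses, as in the log above
    std::uint32_t high;
    std::uint32_t high_boundary;
};

// Builds one summary line using the same layout as the three print calls:
// a left-aligned 20-character name, sizes in K, then the three addresses.
std::string format_generation(const GenSummary& g) {
    assert(g.low_boundary <= g.high && g.high <= g.high_boundary);
    const std::uint64_t K = 1024;  // assumption: sizes are divided by 1024
    char buf[256];
    std::snprintf(buf, sizeof buf,
                  " %-20s total %lluK, used %lluK [0x%08x, 0x%08x, 0x%08x)",
                  g.name,
                  static_cast<unsigned long long>(g.capacity / K),
                  static_cast<unsigned long long>(g.used / K),
                  static_cast<unsigned>(g.low_boundary),
                  static_cast<unsigned>(g.high),
                  static_cast<unsigned>(g.high_boundary));
    return std::string(buf);
}
```

Called with the young generation's numbers from the log (14784K capacity, 13175K used, addresses 0x03ad0000, 0x04ad0000, 0x04ad0000), it yields " par new generation   total 14784K, used 13175K [0x03ad0000, 0x04ad0000, 0x04ad0000)". Names longer than 20 characters, such as "concurrent mark-sweep generation", are simply not padded.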
Hmm, so what are "number 1", "number 2" and "number 3"? They are properties of _virtual_space, which is declared as:
// VirtualSpace is data structure for committing a previously reserved address range in smaller chunks.
class VirtualSpace VALUE_OBJ_CLASS_SPEC {
  // ...
 private:
  // Reserved area
  char* _low_boundary;
  char* _high_boundary;
  // Committed area
  char* _low;
  char* _high;
  // ...
 public:
  // Committed area
  char* low() const { return _low; }
  char* high() const { return _high; }
  // Reserved area
  char* low_boundary() const { return _low_boundary; }
  char* high_boundary() const { return _high_boundary; }
  // ...
};
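The "reserved area" is the whole address range that has been requested from the operating system, whether or not it has been committed yet; the "committed area" is the part of that range that has been requested and actually committed. "Reserved" and "committed" are concepts that come up when memory is requested from the operating system in stages. On Windows, see the VirtualAlloc() documentation on MSDN.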
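The relationship between the two areas can be sketched with a toy model. This models only the bookkeeping and does not reserve or commit any real memory; ToyVirtualSpace and its members are hypothetical names chosen for illustration, not the HotSpot class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model: tracks a reserved range [low_boundary, high_boundary) and the
// committed prefix [low_boundary, high) inside it. No OS calls are made.
class ToyVirtualSpace {
public:
    ToyVirtualSpace(std::uintptr_t base, std::size_t reserved_bytes,
                    std::size_t initially_committed)
        : low_boundary_(base),
          high_boundary_(base + reserved_bytes),
          high_(base + initially_committed) {
        assert(initially_committed <= reserved_bytes);
    }

    // Commits more of the reserved range; refuses to grow past its end.
    bool expand_by(std::size_t bytes) {
        if (bytes > uncommitted_size()) return false;
        high_ += bytes;
        return true;
    }

    std::size_t reserved_size()    const { return high_boundary_ - low_boundary_; }
    std::size_t committed_size()   const { return high_ - low_boundary_; }
    std::size_t uncommitted_size() const { return high_boundary_ - high_; }

private:
    std::uintptr_t low_boundary_;   // start of the reserved range
    std::uintptr_t high_boundary_;  // end of the reserved range (exclusive)
    std::uintptr_t high_;           // end of the committed part (exclusive)
};
```

With the old generation's numbers from the log (base 0x04ad0000, 80MB reserved, 48MB committed), the model has room to grow by exactly 32MB before the committed end reaches the reserved boundary.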
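So the generation lines in the GC log at the top of this article have the format:
generation name total <capacity> used <used space> [low boundary of the reserved virtual space, high end of the committed virtual space, high boundary of the reserved virtual space)
The left square bracket and right parenthesis are standard interval notation for a half-open interval, closed on the left and open on the right.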
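Given that format, a generation line can be split back into its fields. The following is a sketch only: GenLine, parse_generation_line, committed_kb and reserved_kb are hypothetical helpers, and the parser assumes the exact wording of the generation lines shown in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

struct GenLine {
    std::string   name;
    unsigned long total_kb;
    unsigned long used_kb;
    unsigned long low_boundary;   // reserved range starts here
    unsigned long high;           // committed part ends here (exclusive)
    unsigned long high_boundary;  // reserved range ends here (exclusive)
};

// Parses "<name> total <n>K, used <n>K [<hex>, <hex>, <hex>)".
// Returns false for lines that do not have this shape (e.g. the eden line).
bool parse_generation_line(const std::string& line, GenLine& out) {
    const std::size_t total_pos = line.find(" total ");
    const std::size_t first = line.find_first_not_of(' ');
    if (total_pos == std::string::npos || first == std::string::npos ||
        first >= total_pos) {
        return false;
    }
    const std::size_t last = line.find_last_not_of(' ', total_pos);
    GenLine g;
    g.name = line.substr(first, last - first + 1);
    const int n = std::sscanf(line.c_str() + total_pos,
                              " total %luK, used %luK [%lx, %lx, %lx)",
                              &g.total_kb, &g.used_kb,
                              &g.low_boundary, &g.high, &g.high_boundary);
    if (n != 5) return false;
    if (!(g.low_boundary <= g.high && g.high <= g.high_boundary)) return false;
    out = g;
    return true;
}

// Because the intervals are half-open, sizes are plain differences.
unsigned long committed_kb(const GenLine& g) {
    assert(g.low_boundary <= g.high);
    return (g.high - g.low_boundary) / 1024;
}

unsigned long reserved_kb(const GenLine& g) {
    assert(g.low_boundary <= g.high_boundary);
    return (g.high_boundary - g.low_boundary) / 1024;
}
```

For the old generation line in the log, committed_kb gives 49152, which matches the printed total, while reserved_kb gives 81920.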
The eden, from and to lines follow much the same format, so I won't dig into them here...
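The numbers on those three lines can still be sanity-checked. The sketch below assumes, without having traced the source here, that the three addresses on a space line are the start of the space, the end of the allocated part, and the end of the space. SpaceBounds, capacity_kb and used_percent are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed meaning of the three numbers on an eden/from/to line.
struct SpaceBounds {
    std::uint32_t bottom;  // start of the space
    std::uint32_t top;     // end of the allocated part (exclusive)
    std::uint32_t end;     // end of the space (exclusive)
};

std::uint32_t capacity_kb(const SpaceBounds& s) {
    assert(s.bottom <= s.top && s.top <= s.end);
    return (s.end - s.bottom) / 1024;
}

// Percentage used, truncated toward zero.
std::uint32_t used_percent(const SpaceBounds& s) {
    assert(s.bottom <= s.top && s.top <= s.end && s.bottom < s.end);
    const std::uint64_t used = s.top - s.bottom;
    const std::uint64_t capacity = s.end - s.bottom;
    return static_cast<std::uint32_t>(used * 100 / capacity);
}
```

Under that assumption the log is self-consistent: eden comes out at 13184K and 99% used, from and to at 1600K and 0% used each. Eden plus from is 14784K, the total printed for the young generation, which suggests that only one of the two survivor spaces is counted in that total.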
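From this analysis it is clear that in the GC log at the top, the space reserved for the young generation : old generation : permanent generation is 16MB : 80MB : 64MB, a ratio of 1:5:4. The addresses also show that the three are allocated back to back in virtual memory: each generation's upper boundary is the next generation's lower boundary.
The data and code above are based on JDK 1.6.0 update 18 on 32-bit Windows XP SP3.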
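The arithmetic behind these figures is short enough to write down. This is a sketch with hypothetical helper names (ReservedRange, reserved_mb, adjacent, greatest_common_divisor); the addresses used with it are the first and third numbers of each generation line in the log.

```cpp
#include <cassert>
#include <cstdint>

// Reserved address range of one generation: [low_boundary, high_boundary).
struct ReservedRange {
    std::uint32_t low_boundary;
    std::uint32_t high_boundary;
};

std::uint32_t reserved_mb(const ReservedRange& r) {
    assert(r.low_boundary <= r.high_boundary);
    return (r.high_boundary - r.low_boundary) / (1024u * 1024u);
}

// True when 'upper' starts exactly where 'lower' ends.
bool adjacent(const ReservedRange& lower, const ReservedRange& upper) {
    return lower.high_boundary == upper.low_boundary;
}

// Euclid's algorithm, used to reduce the three sizes to a ratio.
std::uint32_t greatest_common_divisor(std::uint32_t a, std::uint32_t b) {
    while (b != 0) {
        const std::uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}
```

For example, the young generation's range 0x03ad0000 to 0x04ad0000 spans 0x01000000 bytes, which is 16MB; the old generation's 0x04ad0000 to 0x09ad0000 spans 80MB; the permanent generation's 0x09ad0000 to 0x0dad0000 spans 64MB. Dividing by their greatest common divisor, 16, gives 1:5:4.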