A Large PermGen Is No Good Either

Just some quick notes.

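Today an application was reported to have overly frequent full GCs. On inspection, everything being run was a CMS GC, not a real stop-the-world full GC. They were indeed very frequent, though, with one triggered every few seconds.
As for heap usage, eden and the two survivor spaces (S0/S1) showed nothing special, the old gen was a little over 10% full, and the perm gen was over 70% full. Judging by occupancy alone, none of these reaches the CMS trigger condition.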
$ jstat -gcutil `pgrep -u admin java` 1s
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT   
 37.21   0.00  99.81  12.87  76.82   1767  196.843  3085 2998.088 3194.931
 37.21   0.00  99.81  12.87  76.82   1767  196.843  3086 2998.088 3194.931
  0.00  47.48   1.06  12.90  76.82   1768  196.959  3086 2999.778 3196.737
  0.00  47.48   1.88  12.90  76.82   1768  196.959  3086 2999.778 3196.737


Several VM flags affect the relationship between GC heap occupancy and the triggering of CMS:
product(uintx, MinHeapFreeRatio, 40,
  "Min percentage of heap free after GC to avoid expansion")
product(intx, CMSTriggerRatio, 80,
  "Percentage of MinHeapFreeRatio in CMS generation that is allocated before a CMS collection cycle commences")
product(intx, CMSTriggerPermRatio, 80,
  "Percentage of MinHeapFreeRatio in the CMS perm generation that is allocated before a CMS collection cycle commences, that also collects the perm generation")
product(intx, CMSInitiatingOccupancyFraction, -1,
  "Percentage CMS generation occupancy to start a CMS collection cycle. A negative value means that CMSTriggerRatio is used")
product(intx, CMSInitiatingPermOccupancyFraction, -1,
  "Percentage CMS perm generation occupancy to start a CMScollection cycle. A negative value means that CMSTriggerPermRatio is used")


In the HotSpot VM, these flags are used like this:
// The field "_initiating_occupancy" represents the occupancy percentage
// at which we trigger a new collection cycle.  Unless explicitly specified
// via CMSInitiating[Perm]OccupancyFraction (argument "io" below), it
// is calculated by:
//
//   Let "f" be MinHeapFreeRatio in
//
//    _intiating_occupancy = 100-f +
//                           f * (CMSTrigger[Perm]Ratio/100)
//   where CMSTrigger[Perm]Ratio is the argument "tr" below.
//
// That is, if we assume the heap is at its desired maximum occupancy at the
// end of a collection, we let CMSTrigger[Perm]Ratio of the (purported) free
// space be allocated before initiating a new collection cycle.
//
void ConcurrentMarkSweepGeneration::init_initiating_occupancy(intx io, intx tr) {
  assert(io <= 100 && tr >= 0 && tr <= 100, "Check the arguments");
  if (io >= 0) {
    _initiating_occupancy = (double)io / 100.0;
  } else {
    _initiating_occupancy = ((100 - MinHeapFreeRatio) +
                             (double)(tr * MinHeapFreeRatio) / 100.0)
                            / 100.0;
  }
}

_cmsGen ->init_initiating_occupancy(CMSInitiatingOccupancyFraction, CMSTriggerRatio);
_permGen->init_initiating_occupancy(CMSInitiatingPermOccupancyFraction, CMSTriggerPermRatio);


In this application, none of these flags was set explicitly, so the defaults are in effect:
$ jinfo -flag MinHeapFreeRatio `pgrep -u admin java`
-XX:MinHeapFreeRatio=40
$ jinfo -flag CMSTriggerPermRatio `pgrep -u admin java`
-XX:CMSTriggerPermRatio=80
$ jinfo -flag CMSInitiatingPermOccupancyFraction `pgrep -u admin java`
-XX:CMSInitiatingPermOccupancyFraction=-1

So the perm gen occupancy at which a CMS GC is triggered is ((100 - 40) + (80 * 40) / 100.0) / 100.0 = 0.92, i.e. 92%.
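
As a quick sanity check on that arithmetic, the sketch below re-implements the formula from init_initiating_occupancy in Python and compares the result with the O and P columns of the jstat sample. The function name initiating_occupancy and its min_heap_free_ratio argument are made up for this illustration; only the formula itself comes from the HotSpot source quoted above.

```python
def initiating_occupancy(io, tr, min_heap_free_ratio=40):
    """Mirror of the formula in init_initiating_occupancy (illustrative only).

    io: CMSInitiating[Perm]OccupancyFraction (negative means "not set")
    tr: CMSTrigger[Perm]Ratio
    Returns the initiating occupancy as a fraction between 0 and 1.
    """
    if io >= 0:
        return io / 100.0
    return ((100 - min_heap_free_ratio) + (tr * min_heap_free_ratio) / 100.0) / 100.0


# Defaults reported by jinfo for this application.
threshold = initiating_occupancy(io=-1, tr=80)
print(f"initiating occupancy: {threshold:.2f}")

# Occupancy percentages taken from the jstat -gcutil sample (columns O and P).
for name, used_percent in (("old gen", 12.87), ("perm gen", 76.82)):
    over = used_percent / 100.0 >= threshold
    print(f"{name}: {used_percent}% used, over threshold: {over}")
```

With the default flags, neither generation is over the threshold, which matches the observation that occupancy alone does not explain the frequent CMS cycles.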

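To watch how the CMS trigger conditions are adjusted dynamically, the -XX:+PrintCMSInitiationStatistics flag is available. A sample of the log it produces is at https://gist.github.com/1050942; the content looks roughly like this: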
CMSCollector shouldConcurrentCollect: 42.910
time_until_cms_gen_full 2.0111715
free=32676856
contiguous_available=44957696
promotion_rate=1.00797e+07
cms_allocation_rate=0
occupancy=0.5003915
initiatingOccupancy=0.9200000
initiatingPermOccupancy=0.9200000


By default, a CMS GC is not triggered by the occupancy of the old gen and perm gen alone; there are other conditions as well.
product(bool, UseCMSInitiatingOccupancyOnly, false, "Only use occupancy as a crierion for starting a CMS collection")

If this flag is true, occupancy is the only thing used to trigger a collection.

Some comments from the source about the CMS GC trigger conditions:
// We should be conservative in starting a collection cycle.  To
// start too eagerly runs the risk of collecting too often in the
// extreme.  To collect too rarely falls back on full collections,
// which works, even if not optimum in terms of concurrent work.
// As a work around for too eagerly collecting, use the flag
// UseCMSInitiatingOccupancyOnly.  This also has the advantage of
// giving the user an easily understandable way of controlling the
// collections.
// We want to start a new collection cycle if any of the following
// conditions hold:
// . our current occupancy exceeds the configured initiating occupancy
//   for this generation, or
// . we recently needed to expand this space and have not, since that
//   expansion, done a collection of this generation, or
// . the underlying space believes that it may be a good idea to initiate
//   a concurrent collection (this may be based on criteria such as the
//   following: the space uses linear allocation and linear allocation is
//   going to fail, or there is believed to be excessive fragmentation in
//   the generation, etc... or ...
// [.(currently done by CMSCollector::shouldConcurrentCollect() only for
//   the case of the old generation, not the perm generation; see CR 6543076):
//   we may be approaching a point at which allocation requests may fail because
//   we will be out of sufficient free space given allocation rate estimates.]
bool ConcurrentMarkSweepGeneration::should_concurrent_collect() const {

// Decide if we want to enable class unloading as part of the
// ensuing concurrent GC cycle. We will collect the perm gen and
// unload classes if it's the case that:
// (1) an explicit gc request has been made and the flag
//     ExplicitGCInvokesConcurrentAndUnloadsClasses is set, OR
// (2) (a) class unloading is enabled at the command line, and
//     (b) (i)   perm gen threshold has been crossed, or
//         (ii)  old gen is getting really full, or
//         (iii) the previous N CMS collections did not collect the
//               perm gen
// NOTE: Provided there is no change in the state of the heap between
// calls to this method, it should have idempotent results. Moreover,
// its results should be monotonically increasing (i.e. going from 0 to 1,
// but not 1 to 0) between successive calls between which the heap was
// not collected. For the implementation below, it must thus rely on
// the property that concurrent_cycles_since_last_unload()
// will not decrease unless a collection cycle happened and that
// _permGen->should_concurrent_collect() and _cmsGen->is_too_full() are
// themselves also monotonic in that sense. See check_monotonicity()
// below.
bool CMSCollector::update_should_unload_classes() {

// Support for concurrent collection policy decisions.
bool CompactibleFreeListSpace::should_concurrent_collect() const {
  // In the future we might want to add in frgamentation stats --
  // including erosion of the "mountain" into this decision as well.
  return !adaptive_freelists() && linearAllocationWouldFail();
}
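
To make the interplay of these conditions easier to follow, here is a simplified Python model of the per-generation decision described in the comment above. It is a sketch, not the HotSpot implementation: the GenerationState class, its field names, and the sample values are all hypothetical, and the placement of the occupancy-only check is an assumption based on the flag's description.

```python
from dataclasses import dataclass


@dataclass
class GenerationState:
    """Hypothetical snapshot of one CMS generation; field names are illustrative."""
    occupancy: float               # fraction of the generation in use, 0..1
    initiating_occupancy: float    # threshold, e.g. 0.92 with default flags
    expanded_since_last_gc: bool   # expanded and not collected since then
    space_wants_collection: bool   # e.g. linear allocation would fail


def should_concurrent_collect(gen, use_occupancy_only=False):
    """Simplified model of the per-generation conditions listed in the comment."""
    if gen.occupancy > gen.initiating_occupancy:
        return True
    if use_occupancy_only:
        # Models UseCMSInitiatingOccupancyOnly: ignore the other heuristics.
        return False
    return gen.expanded_since_last_gc or gen.space_wants_collection


# Made-up input: occupancy below the threshold, but a recent expansion.
# Whether the perm gen of the application really expanded is not known.
perm = GenerationState(occupancy=0.7682, initiating_occupancy=0.92,
                       expanded_since_last_gc=True, space_wants_collection=False)
print(should_concurrent_collect(perm))
print(should_concurrent_collect(perm, use_occupancy_only=True))
```

In this model, a generation well under its occupancy threshold can still request a cycle through one of the other conditions, unless the occupancy-only switch is on.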


=====================================================================

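Looking at the GC log, the pause in the CMS initial mark phase turned out to exceed 1.4s. As I recall, the last time I looked at this application the initial mark pause was still under 1s, so things have gotten worse. In between, MaxPermSize was raised from 256m to 512m. My feeling is that the longer pause has a lot to do with the larger perm gen.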

2011-06-28T11:11:01.417+0800: 432933.547: [GC [1 CMS-initial-mark: 262613K(2097152K)] 2003375K(4019584K), 1.9010460 secs] [Times: user=1.90 sys=0.00, real=1.90 secs] 
2011-06-28T11:11:03.347+0800: 432935.478: [CMS-concurrent-mark-start]
2011-06-28T11:11:04.737+0800: 432936.867: [CMS-concurrent-mark: 1.362/1.390 secs] [Times: user=1.50 sys=0.01, real=1.39 secs] 
2011-06-28T11:11:04.737+0800: 432936.868: [CMS-concurrent-preclean-start]
2011-06-28T11:11:04.752+0800: 432936.883: [CMS-concurrent-preclean: 0.014/0.015 secs] [Times: user=0.00 sys=0.01, real=0.02 secs] 
2011-06-28T11:11:04.752+0800: 432936.883: [CMS-concurrent-abortable-preclean-start]
2011-06-28T11:11:07.783+0800: 432939.913: [GC 432939.913: [ParNew: 1800925K->85199K(1922432K), 0.1672690 secs] 2063538K->348295K(4019584K), 0.1675310 secs] [Times: user=0.40 sys=0.00, real=0.17 secs] 
 CMS: abort preclean due to time 2011-06-28T11:11:09.938+0800: 432942.068: [CMS-concurrent-abortable-preclean: 3.556/5.185 secs] [Times: user=4.15 sys=0.04, real=5.18 secs] 
2011-06-28T11:11:09.944+0800: 432942.074: [GC[YG occupancy: 124255 K (1922432 K)]432942.074: [Rescan (parallel) , 0.1174650 secs]432942.192: [weak refs processing, 0.0000200 secs]432942.192: [class unloading, 0.0868120 secs]432942.279: [scrub symbol & string tables, 0.0169840 secs] [1 CMS-remark: 263096K(2097152K)] 387351K(4019584K), 0.2316920 secs] [Times: user=0.31 sys=0.00, real=0.23 secs] 
2011-06-28T11:11:10.176+0800: 432942.306: [CMS-concurrent-sweep-start]
2011-06-28T11:11:10.688+0800: 432942.818: [CMS-concurrent-sweep: 0.512/0.512 secs] [Times: user=0.54 sys=0.00, real=0.51 secs] 
2011-06-28T11:11:10.688+0800: 432942.818: [CMS-concurrent-reset-start]
2011-06-28T11:11:10.707+0800: 432942.838: [CMS-concurrent-reset: 0.020/0.020 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
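
Only the initial mark and remark phases of this cycle stop the world; the rest runs concurrently. The sketch below pulls those two pauses out of the log with a regular expression and adds them up. The regular expression and the cms_pauses helper are my own assumptions fitted to the two log lines shown here, not a general GC log parser, and the dictionary keeps only one entry per phase, so it is meant for a single cycle.

```python
import re

# The initial-mark and remark lines, copied from the log above.
LOG = (
    "2011-06-28T11:11:01.417+0800: 432933.547: [GC [1 CMS-initial-mark: 262613K(2097152K)] "
    "2003375K(4019584K), 1.9010460 secs] [Times: user=1.90 sys=0.00, real=1.90 secs]\n"
    "2011-06-28T11:11:09.944+0800: 432942.074: [GC[YG occupancy: 124255 K (1922432 K)]"
    "432942.074: [Rescan (parallel) , 0.1174650 secs]432942.192: [weak refs processing, "
    "0.0000200 secs]432942.192: [class unloading, 0.0868120 secs]432942.279: "
    "[scrub symbol & string tables, 0.0169840 secs] [1 CMS-remark: 263096K(2097152K)] "
    "387351K(4019584K), 0.2316920 secs] [Times: user=0.31 sys=0.00, real=0.23 secs]\n"
)

# Matches "[1 CMS-<phase>: usedK(capacityK)] usedK(capacityK), <secs> secs]".
PAUSE = re.compile(
    r"\[1 CMS-(initial-mark|remark): \d+K\(\d+K\)\] \d+K\(\d+K\), ([\d.]+) secs\]"
)


def cms_pauses(log_text):
    """Return {phase: pause in seconds} for the stop-the-world CMS phases found."""
    return {phase: float(secs) for phase, secs in PAUSE.findall(log_text)}


pauses = cms_pauses(LOG)
for phase, secs in pauses.items():
    print(f"{phase}: {secs:.3f}s")
print(f"total stop-the-world time in this cycle: {sum(pauses.values()):.3f}s")
```

The initial mark dominates the pause time in this cycle, which is why it is the phase singled out above.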


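Enlarging the perm gen may actually have turned into poison for this application. It makes heavy use of Groovy scripts, so new classes are generated and loaded fairly often; from a quick look at a stretch of the log, roughly a dozen classes get loaded every few minutes. Class unloading is fast enough to keep the perm gen from running out of memory.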

Well, I'll follow up on this later.
