记一次JVM GC日志分析

  这几天在准备升级JDK版本到1.6,对目前线上JVM(版本是1.5.0_08-b03) GC日志进行了分析,发现一些参数设置不太合理的地方,有待后续通过数据来进行验证。

1.原始GC日志(通过JVM配置GC Print参数获取GC日志)

...

695.775: [GC 695.776: [ParNew: 130944K->0K(131008K), 0.0174100 secs] 432961K->302710K(786368K), 0.0175930 secs]
697.323: [GC [1 CMS-initial-mark: 302710K(655360K)] 348624K(786368K), 0.1140530 secs]
697.438: [CMS-concurrent-mark-start]
699.494: [GC 699.494: [ParNew: 130944K->0K(131008K), 0.0115290 secs] 433654K->303891K(786368K), 0.0116990 secs]
701.381: [CMS-concurrent-mark: 1.204/3.944 secs]
701.381: [CMS-concurrent-preclean-start]
701.382: [CMS-concurrent-preclean: 0.000/0.000 secs]
701.420: [CMS-concurrent-abortable-preclean-start]
701.420: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs]
703.302: [GC 703.302: [ParNew: 130944K->0K(131008K), 0.0161480 secs] 434835K->305889K(786368K), 0.0163490 secs]
705.202: [GC[YG occupancy: 68582 K (131008 K)]705.202: [Rescan (parallel) , 0.1094800 secs]705.312: [weak refs processing, 0.0446420 secs] [1 CMS-remark: 305889K(655360K)] 374472K(786368K), 0.1543650 secs]
705.360: [CMS-concurrent-sweep-start]
705.644: [CMS-concurrent-sweep: 0.284/0.284 secs]
705.644: [CMS-concurrent-reset-start]
705.654: [CMS-concurrent-reset: 0.010/0.010 secs]
706.985: [GC 706.985: [ParNew: 130924K->0K(131008K), 0.0193060 secs] 432106K->302870K(786368K), 0.0195480 secs]
708.540: [GC [1 CMS-initial-mark: 302870K(655360K)] 350092K(786368K), 0.1140590 secs]
708.654: [CMS-concurrent-mark-start]
710.647: [GC 710.647: [ParNew: 130944K->0K(131008K), 0.0081390 secs] 433814K->303960K(786368K), 0.0083430 secs]
712.560: [CMS-concurrent-mark: 1.314/3.906 secs]
712.560: [CMS-concurrent-preclean-start]
712.560: [CMS-concurrent-preclean: 0.000/0.000 secs]
712.615: [CMS-concurrent-abortable-preclean-start]
712.615: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs]
713.567: [GC 713.567: [ParNew: 130944K->0K(131008K), 0.0104890 secs] 434904K->305163K(786368K), 0.0107090 secs]
715.109: [GC[YG occupancy: 68948 K (131008 K)]715.109: [Rescan (parallel) , 0.0884690 secs]715.198: [weak refs processing, 0.0441790 secs] [1 CMS-remark: 305163K(655360K)] 374111K(786368K), 0.1329350 secs]
715.243: [CMS-concurrent-sweep-start]
715.445: [CMS-concurrent-sweep: 0.203/0.203 secs]
715.445: [CMS-concurrent-reset-start]
715.454: [CMS-concurrent-reset: 0.009/0.009 secs]

...

2.日志分析报告

针对原始GC日志,分别用下面两种工具做了分析,其中gcviewer分析还是比较具体的,就是解析日志时会有些异常,但不影响最后的结果。

(1)通过gcviewer分析:

记一次JVM GC日志分析_第1张图片

图1(顶部的黑色线都代表Full GC,也可以理解为Major GC,是根据日志中的CMS GC统计的;底部灰色线代表的是Minor GC)

记一次JVM GC日志分析_第2张图片

图2

记一次JVM GC日志分析_第3张图片

图3

可以看到Full GC非常多,占所有pause时间比达到65.9%,这是有问题的,GC应该尽可能在年轻代完成,而不是到年老代,这个在第3部分参数说明中会提到。

(2)通过GarbageCat分析:

========================================
SUMMARY:
========================================
# GC Events: 172925
GC Event Types: CMS_INITIAL_MARK, CMS_CONCURRENT, CMS_SERIAL_OLD_CONCURRENT_MODE_FAILURE, PAR_NEW, CMS_REMARK
Max Heap Space: 1079084K
Max Heap Occupancy: 716725K
Max Perm Space: 98304K
Max Perm Occupancy: 8993K
Throughput: 97%
Max Pause: 426 ms
Total Pause: 9033244 ms
First Timestamp: 250 ms
Last Timestamp: 342994691 ms

从这里主要看pause time时间,分析报告中还有一部分不能处理的,这些日志现在还没找到最终原因:

========================================
30 UNIDENTIFIED LOG LINE(S):
========================================
39629.723: [Full GC 39629.723: [CMS[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor6720]
40724.189: [Full GC 40724.189: [CMS[Unloading class sun.reflect.GeneratedMethodAccessor1945]
40785.929: [Full GC 40785.929: [CMS (concurrent mode failure)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor10169]
40922.428: [Full GC 40922.428: [CMS (concurrent mode failure)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor10175]
124477.019: [Full GC 124477.019: [CMS[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor33989]
124997.427: [Full GC 124997.427: [CMS[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor34622]
125059.257: [Full GC 125059.257: [CMS (concurrent mode failure)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor34638]
125348.006: [Full GC 125348.006: [CMS[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor34644]
125940.674: [Full GC 125940.674: [CMS[Unloading class sun.reflect.GeneratedMethodAccessor2017]
126047.093: [Full GC 126047.093: [CMS[Unloading class sun.reflect.GeneratedMethodAccessor5743]
128938.724: [Full GC 128938.724: [CMS[Unloading class sun.reflect.GeneratedMethodAccessor5715]
209586.704: [Full GC 209586.704: [CMS[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor58696]
210272.871: [Full GC 210272.872: [CMS (concurrent mode failure)[Unloading class sun.reflect.GeneratedMethodAccessor5749]
211428.317: [Full GC 211428.318: [CMS[Unloading class sun.reflect.GeneratedMethodAccessor7545]
212198.995: [Full GC 212198.995: [CMS (concurrent mode failure)[Unloading class sun.reflect.GeneratedMethodAccessor5806]
212408.355: [Full GC 212408.355: [CMS[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor58795]
212525.304: [Full GC 212525.304: [CMS[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor58807]
212819.763: [Full GC 212819.763: [CMS[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor58817]
212881.623: [Full GC 212881.623: [CMS (concurrent mode failure)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor58821]
214567.777: [Full GC 214567.778: [CMS[Unloading class sun.reflect.GeneratedMethodAccessor1643]
214980.856: [Full GC 214980.856: [CMS[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor58897]
215119.546: [Full GC 215119.546: [CMS (concurrent mode failure)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor58903]
294794.469: [Full GC 294794.469: [CMS[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor81521]
298186.036: [Full GC 298186.036: [CMS (concurrent mode failure)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor81595]
300171.660: [Full GC 300171.660: [CMS (concurrent mode failure)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor81708]
300318.420: [Full GC 300318.420: [CMS (concurrent mode failure)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor81738]
300380.550: [Full GC 300380.550: [CMS (concurrent mode failure)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor81772]
300756.398: [Full GC 300756.398: [CMS (concurrent mode failure)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor81744]
300818.348: [Full GC 300818.348: [CMS (concurrent mode failure)[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor81788]
301812.025: [Full GC 301812.025: [CMS[Unloading class sun.reflect.GeneratedMethodAccessor10606]

 

3.问题根源(当前参数配置及说明,其中红色标注的是设置不妥的地方)

 98%的java对象,在创建之后不久就变成了非活动对象;只有2%的对象,会在长时间一直处于活动状态。major gc需要的时间比minor gc长的多,所以我们要减少major gc次数,提高minor gc的效率,尽量将非活动对象消灭在年轻代。

针对上述分析报告,从JVM当前参数配置中找到了些原因,如下:

-Xms768m -Xmx1280m  jvm堆的最小值和最大值设置,一般设成相同值,避免频繁分配堆空间

-XX:NewSize=128m -XX:MaxNewSize=128m  年轻代最小值和最大值设置(年轻代设定了,年老代也就定了),也可以用参数-XX:NewRatio=4,年老代和年轻代的大小比,这里128m有点小了,官方建议的是heap的3/8,差不多280m

-XX:PermSize=96m -XX:MaxPermSize=128m 持久代最小值和最大值设置

-XX:MaxTenuringThreshold=0  经过多少次minor gc 后进入年老代,设置为0的话直接进入年老代,这是不太合理的,正常应该在年轻代多呆一段时间,真正需要到年老代的才转过去

-XX:SurvivorRatio=20000  年轻代中eden和一块suvivor区的空间比例,这里设置成20000有问题,suvivor区空间几乎为0,一次minor gc后基本都转到年老代了,年轻代没有起到过滤左右

-XX:+UseParNewGC  年轻代采用并行gc策略,JDK5.0以上,JVM会根据系统配置自行设置,所以无需再设置此值。使用多线程收集,提高吞吐量(-XX:ParallelGCThreads 并行收集器的线程数,此值最好配置与处理器数目相等,如果超过当前cpu数,会加大机器负载)

-XX:+UseConcMarkSweepGC  年老代采用并发gc策略,和应用程序并发执行,减少pause time,但是需要更大的堆区,因为并发执行,有碎片(-XX:+UseParallelOldGC 年老代垃圾收集方式为并行收集,这个是JAVA 6出现的参数选项)

 -XX:+CMSPermGenSweepingEnabled  为了避免Perm区满引起的full gc,建议开启CMS回收Perm区选项

 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps  打印gc日志

 -XX:CMSInitiatingOccupancyFraction=1 年老代使用空间比达到这个值时开始cms gc,默认是在年老代占满68%的时候开始进行CMS收集,这里设置成1是不合理的,会导致CMS GC频繁发生,从gc日志里可以看出来,CMS GC和minor GC几乎一样多

 -XX:+CMSIncrementalMode 启动i-CMS模式,增量模式,将cms gc过程分成6个阶段,其中阶段initial Mark和remark时需要pause,这6个阶段在两次minor gc的间隔期执行,具体执行起止时间由下面两个参数决定。拆分成小阶段增量执行时,可以避免应用被中断时间过长,极端情况是如果只有一个cpu,那么得等 全部做完这6个阶段才能释放cpu,如果是多cpu这个模式开启与否应该影响不大

-XX:CMSIncrementalDutyCycleMin=10 默认值10 启动CMS的下线

-XX:CMSIncrementalDutyCycle=30 默认值50 启动CMS的上线

-XX:+UseCMSCompactAtFullCollection  在FULL GC的时候, 对年老代的压缩。CMS是不会移动内存的, 因此这个非常容易产生碎片, 导致内存不够用, 因此, 内存的压缩这个时候就会被启用。 可能会影响性能,但是可以消除碎片,增加这个参数是个好习惯。

-XX:CMSFullGCsBeforeCompaction=0  上面配置开启的情况下,这里设置多少次Full GC后,对年老代进行压缩,这里设置成0不知道什么意思,可以根据线上full gc 的频率确定,频率高,这个值可以大点,比如5,反之频率低,这个值可以小点,比如1

-XX:CMSMarkStackSize=8M

-XX:CMSMarkStackSizeMax=32M

 

原文:http://blog.csdn.net/cpzhong/article/details/6831751

你可能感兴趣的:(日志分析)