Analysis of the ParNew + CMS Garbage Collection Process

1 Normal Case

1.1 Environment

Startup parameters:

-Xmx4096M -XX:MaxPermSize=512M -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
-Xloggc:gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=20M

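The maximum heap is 4G and the maximum permanent generation is 512M. CMS collects the old generation and ParNew collects the young generation. GC logs are written to bin/gc.log.{d}; each file is at most 20M and at most 10 files are kept.

Runtime parameters: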

CommandLine flags: -XX:GCLogFileSize=20971520 -XX:InitialHeapSize=1055244992 -XX:MaxHeapSize=4294967296 -XX:MaxNewSize=872415232 -XX:MaxPermSize=536870912 -XX:MaxTenuringThreshold=6 -XX:NumberOfGCLogFiles=10 -XX:OldPLABSize=16 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation -XX:+UseParNewGC

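The initial heap is 1055244992 bytes, roughly 1G; the maximum heap is 4G; the maximum young generation is 832M; the maximum permanent generation is 512M; and the maximum tenuring threshold for the young generation is 6.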
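To double-check the conversions above, here is a small Python sketch that pulls the byte-valued flags out of the CommandLine flags line and converts them to MiB. The regular expression and the helper name flags_in_mib are my own, chosen only for this illustration.

```python
import re

# Subset of the "CommandLine flags" line shown above.
FLAGS = ("-XX:GCLogFileSize=20971520 -XX:InitialHeapSize=1055244992 "
         "-XX:MaxHeapSize=4294967296 -XX:MaxNewSize=872415232 "
         "-XX:MaxPermSize=536870912 -XX:MaxTenuringThreshold=6")

# Flags whose value is a size in bytes (MaxTenuringThreshold is an age, not a size).
SIZE_FLAGS = {"GCLogFileSize", "InitialHeapSize", "MaxHeapSize",
              "MaxNewSize", "MaxPermSize"}

def flags_in_mib(flags):
    """Return {flag name: size in MiB} for the byte-valued flags in SIZE_FLAGS."""
    result = {}
    for name, value in re.findall(r"-XX:(\w+)=(\d+)", flags):
        if name in SIZE_FLAGS:
            result[name] = int(value) / (1024 * 1024)
    return result

for name, mib in flags_in_mib(FLAGS).items():
    print(f"{name}: {mib:.1f} MiB")
```

InitialHeapSize comes out slightly above 1006 MiB, which is why the text calls it "roughly 1G".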

1.2 GC Log Analysis

// Minor GC
2020-04-14T19:28:29.017+0800: 2.166: [GC2020-04-14T19:28:29.017+0800: 2.166: [ParNew: 274880K->26888K(309184K), 0.0229870 secs] 274880K->26888K(996224K), 0.0231220 secs] [Times: user=0.12 sys=0.03, real=0.02 secs]
2020-04-14T19:28:29.950+0800: 3.098: [GC2020-04-14T19:28:29.950+0800: 3.099: [ParNew: 301768K->23106K(309184K), 0.1368230 secs] 301768K->45676K(996224K), 0.1369380 secs] [Times: user=0.39 sys=0.12, real=0.14 secs]
2020-04-14T19:28:30.611+0800: 3.760: [GC2020-04-14T19:28:30.611+0800: 3.760: [ParNew: 297986K->7433K(309184K), 0.0065780 secs] 320556K->30003K(996224K), 0.0066680 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
// Full GC
2020-04-15T13:51:54.511+0800: 66207.660: [Full GC2020-04-15T13:51:54.511+0800: 66207.660: [CMS: 78706K->52261K(687040K), 0.4142720 secs] 121505K->52261K(996224K), [CMS Perm : 49043K->48806K(49280K)], 0.4146180 secs] [Times: user=0.36 sys=0.03, real=0.42 secs]
// Major GC
2020-04-23T01:07:24.411+0800: 711537.560: [GC [1 CMS-initial-mark: 343521K(687040K)] 344924K(996288K), 0.0041360 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
2020-04-23T01:07:24.416+0800: 711537.564: [CMS-concurrent-mark-start]
2020-04-23T01:07:24.472+0800: 711537.621: [CMS-concurrent-mark: 0.057/0.057 secs] [Times: user=0.18 sys=0.00, real=0.05 secs]
2020-04-23T01:07:24.472+0800: 711537.621: [CMS-concurrent-preclean-start]
2020-04-23T01:07:24.476+0800: 711537.624: [CMS-concurrent-preclean: 0.003/0.003 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
2020-04-23T01:07:24.476+0800: 711537.625: [GC[YG occupancy: 2326 K (309248 K)]2020-04-23T01:07:24.476+0800: 711537.625: [Rescan (parallel) , 0.0032780 secs]2020-04-23T01:07:24.480+0800: 711537.628: [weak refs processing, 0.0029960 secs]2020-04-23T01:07:24.483+0800: 711537.631: [scrub string table, 0.0014220 secs] [1 CMS-remark: 343521K(687040K)] 345848K(996288K), 0.0079910 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]
2020-04-23T01:07:24.485+0800: 711537.633: [CMS-concurrent-sweep-start]
2020-04-23T01:07:24.907+0800: 711538.056: [CMS-concurrent-sweep: 0.422/0.422 secs] [Times: user=0.44 sys=0.01, real=0.43 secs]
2020-04-23T01:07:24.907+0800: 711538.056: [CMS-concurrent-reset-start]
2020-04-23T01:07:24.917+0800: 711538.065: [CMS-concurrent-reset: 0.010/0.010 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]

1.2.1 Minor GC

2020-04-14T19:28:29.017+0800: 2.166: [GC2020-04-14T19:28:29.017+0800: 2.166: [ParNew: 274880K->26888K(309184K), 0.0229870 secs] 274880K->26888K(996224K), 0.0231220 secs] [Times: user=0.12 sys=0.03, real=0.02 secs]

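[ParNew: young generation occupancy before GC->young generation occupancy after GC(young generation capacity), 0.0229870 secs] heap occupancy before GC->heap occupancy after GC(total JVM heap capacity), 0.0231220 secs] [Times: user=CPU time spent in user mode, accumulated across GC threads sys=CPU time spent in kernel mode, real=actual elapsed (wall-clock) time secs]

When Minor GC is triggered: when Eden is full. With the default Eden-to-Survivor ratio of 8:1 (Eden:S0:S1 = 8:1:1), the young generation capacity reported in the log, 309184K, covers Eden plus one Survivor space, so Eden is about 309184*(8/9) ≈ 274830K, which is close to the 274880K occupied just before the first GC. The real time is generally less than the sum of the user and sys times, because most systems today have multiple CPUs and the GC threads run in parallel.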
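As a rough illustration of the field layout described above, the following Python sketch parses the sample ParNew line and derives the Eden estimate. The regular expression and the names parse_parnew and eden_estimate_k are my own and only cover the log format shown in this article; the 8/9 factor assumes the default Eden:Survivor ratio of 8:1.

```python
import re

# Sample line copied from the log above.
LINE = ("2020-04-14T19:28:29.017+0800: 2.166: [GC2020-04-14T19:28:29.017+0800: 2.166: "
        "[ParNew: 274880K->26888K(309184K), 0.0229870 secs] "
        "274880K->26888K(996224K), 0.0231220 secs] "
        "[Times: user=0.12 sys=0.03, real=0.02 secs]")

# Hypothetical pattern; it only matches the ParNew format shown in this article.
PARNEW = re.compile(
    r"\[ParNew: (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\] "
    r"(\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\] "
    r"\[Times: user=([\d.]+) sys=([\d.]+), real=([\d.]+) secs\]"
)

def parse_parnew(line):
    """Return the fields of a ParNew log line as a dict, or None if it does not match."""
    m = PARNEW.search(line)
    if m is None:
        return None
    g = m.groups()
    return {
        "young_before_k": int(g[0]), "young_after_k": int(g[1]),
        "young_capacity_k": int(g[2]),
        "heap_before_k": int(g[4]), "heap_after_k": int(g[5]),
        "heap_capacity_k": int(g[6]),
        "user_s": float(g[8]), "sys_s": float(g[9]), "real_s": float(g[10]),
    }

info = parse_parnew(LINE)
# Reported young capacity = Eden + one Survivor = 9 parts, Eden = 8 parts (assumed 8:1).
eden_estimate_k = info["young_capacity_k"] * 8 // 9
print(eden_estimate_k, info["young_before_k"])
```

In this sample the young generation and the heap hold the same amount before and after the collection, so nothing had been promoted to the old generation yet.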

1.2.2 Major GC

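Major GC is handled by CMS, which has two working modes:

  • background: the normal mode, consisting of the 7 phases described below. Phases 1 and 5 are stop-the-world (STW); the other phases run concurrently with user threads.
  • foreground: CMS switches to this mode when a concurrent mode failure occurs. It behaves like Serial Old and is implemented by skipping the less essential of the 7 phases, such as Precleaning and AbortablePreclean.

When CMS starts a collection: by default it checks every 2s and starts a CMS cycle once old generation occupancy reaches 92%. The trigger point is also adjusted dynamically based on collection history, and it can be set explicitly with CMSInitiatingOccupancyFraction (e.g. 70%) together with UseCMSInitiatingOccupancyOnly.

1. Initial mark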

1. 初始标记

[GC [1 CMS-initial-mark: 343521K(687040K)] 344924K(996288K), 0.0041360 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]

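[GC [1 CMS-initial-mark: old generation occupancy(old generation capacity)] heap occupancy(heap capacity), 0.0041360 secs]

The initial mark phase is stop-the-world rather than concurrent, but the pause is short. It marks the first level of old generation objects reachable directly from the GC roots. CMS triggers a collection at 92% occupancy by default; TODO: why it ran at about 50% here still needs investigation.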
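The "about 50%" figure in the TODO above can be read straight off the initial-mark line. A minimal Python sketch, with a regular expression and function name of my own that only handle the format shown here:

```python
import re

# Initial-mark line copied from the log above (timestamps omitted).
LINE = "[GC [1 CMS-initial-mark: 343521K(687040K)] 344924K(996288K), 0.0041360 secs]"

# Hypothetical pattern for this illustration.
INITIAL_MARK = re.compile(
    r"\[1 CMS-initial-mark: (\d+)K\((\d+)K\)\] (\d+)K\((\d+)K\), ([\d.]+) secs\]"
)

def old_gen_occupancy_percent(line):
    """Return old generation occupancy in percent at initial mark, or None."""
    m = INITIAL_MARK.search(line)
    if m is None:
        return None
    used_k, capacity_k = int(m.group(1)), int(m.group(2))
    return 100.0 * used_k / capacity_k

print(f"{old_gen_occupancy_percent(LINE):.1f}%")
```

Running the same function over every initial-mark line in a log would show whether cycles consistently start near this level, which may help narrow down the open question.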

2. Concurrent mark

[CMS-concurrent-mark-start]
[CMS-concurrent-mark: 0.057/0.057 secs] [Times: user=0.18 sys=0.00, real=0.05 secs]

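This phase marks the objects reachable from the live objects found in phase 1. User threads keep running during this phase, so new objects are created; the card table entries for new or modified objects (?) are also set to the corresponding values during this phase, for use by later phases.

3. Concurrent preclean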

[CMS-concurrent-preclean-start]
[CMS-concurrent-preclean: 0.003/0.003 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]

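Starting from the blocks marked dirty in the card table, mark live objects.

4. Abortable concurrent preclean  -- not triggered

This phase runs repeatedly and does the same job as phase 3: it marks live objects. Both exist to take work off the remark phase and so shorten its STW pause. The entry condition for this phase is (TODO): Eden occupancy greater than 2M (CMSScheduleRemarkEdenSizeThreshold). The exit condition is: Eden occupancy > 50% (CMSScheduleRemarkEdenPenetration), or the phase has lasted > 5s (CMSMaxAbortablePrecleanTime). The CMSScavengeBeforeRemark parameter can also be used to force a Minor GC before remark.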
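The entry and exit conditions just described can be written down as two small predicates. This is only a simplified model of the text above, not HotSpot's actual logic; the function names are hypothetical and the constants are the default values quoted in the paragraph.

```python
# Defaults as quoted in the text above.
EDEN_SIZE_THRESHOLD_BYTES = 2 * 1024 * 1024  # CMSScheduleRemarkEdenSizeThreshold
EDEN_PENETRATION_PERCENT = 50                # CMSScheduleRemarkEdenPenetration
MAX_ABORTABLE_PRECLEAN_MS = 5000             # CMSMaxAbortablePrecleanTime

def should_enter(eden_used_bytes):
    """Entry: Eden occupancy is above the size threshold."""
    return eden_used_bytes > EDEN_SIZE_THRESHOLD_BYTES

def should_exit(eden_used_bytes, eden_capacity_bytes, elapsed_ms):
    """Exit: Eden is more than half full, or the phase has run too long."""
    occupancy_percent = 100.0 * eden_used_bytes / eden_capacity_bytes
    return (occupancy_percent > EDEN_PENETRATION_PERCENT
            or elapsed_ms > MAX_ABORTABLE_PRECLEAN_MS)

# Example: a 256 MiB Eden with 3 MiB used enters the phase and keeps looping.
MIB = 1024 * 1024
print(should_enter(3 * MIB), should_exit(3 * MIB, 256 * MIB, 100))
```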

5. Remark

[GC[YG occupancy: 2326 K (309248 K)]2020-04-23T01:07:24.476+0800: 711537.625: [Rescan (parallel) , 0.0032780 secs]2020-04-23T01:07:24.480+0800: 711537.628: [weak refs processing, 0.0029960 secs]2020-04-23T01:07:24.483+0800: 711537.631: [scrub string table, 0.0014220 secs] [1 CMS-remark: 343521K(687040K)] 345848K(996288K), 0.0079910 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]

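This phase is stop-the-world and scans the whole heap. Starting from the GC roots and the young generation, it marks all live objects.

6. Concurrent sweep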
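The remark line packs several sub-steps into one entry. The Python sketch below splits out the three sub-steps visible in the sample and compares their sum with the total pause. The pattern names are my own and list only the sub-steps that appear in this particular log.

```python
import re

# Remark line copied from the log above.
LINE = ("[GC[YG occupancy: 2326 K (309248 K)]2020-04-23T01:07:24.476+0800: 711537.625: "
        "[Rescan (parallel) , 0.0032780 secs]2020-04-23T01:07:24.480+0800: 711537.628: "
        "[weak refs processing, 0.0029960 secs]2020-04-23T01:07:24.483+0800: 711537.631: "
        "[scrub string table, 0.0014220 secs] [1 CMS-remark: 343521K(687040K)] "
        "345848K(996288K), 0.0079910 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]")

# Hypothetical patterns; only the sub-steps seen in this log are listed.
SUB_STEP = re.compile(
    r"\[(Rescan \(parallel\)|weak refs processing|scrub string table) ?, ([\d.]+) secs\]"
)
TOTAL = re.compile(r"\[1 CMS-remark: \d+K\(\d+K\)\] \d+K\(\d+K\), ([\d.]+) secs\]")

def remark_breakdown(line):
    """Return ([(sub-step name, seconds), ...], total remark pause in seconds)."""
    steps = [(name, float(secs)) for name, secs in SUB_STEP.findall(line)]
    total = float(TOTAL.search(line).group(1))
    return steps, total

steps, total = remark_breakdown(LINE)
for name, secs in steps:
    print(f"{name}: {secs * 1000:.3f} ms")
print(f"total pause: {total * 1000:.3f} ms")
```

In this sample the rescan and weak reference processing account for most of the pause.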

[CMS-concurrent-sweep-start]
[CMS-concurrent-sweep: 0.422/0.422 secs] [Times: user=0.44 sys=0.01, real=0.43 secs]

Reclaims the objects that were not marked as live, running concurrently with user threads.

7. Concurrent reset

[CMS-concurrent-reset-start]
[CMS-concurrent-reset: 0.010/0.010 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]

Resets the CMS internal state in preparation for the next Major GC.
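To see how little of a background cycle is actually spent in STW pauses, the sketch below adds up the two STW phases and the concurrent phases from the sample cycle. The regular expressions and the summarize function are my own, and the sketch treats the second number in each "x/y secs" pair as wall-clock time, which is an assumption on my part.

```python
import re

# One CMS cycle, copied from the log above with timestamps removed.
CYCLE = """\
[GC [1 CMS-initial-mark: 343521K(687040K)] 344924K(996288K), 0.0041360 secs]
[CMS-concurrent-mark-start]
[CMS-concurrent-mark: 0.057/0.057 secs]
[CMS-concurrent-preclean-start]
[CMS-concurrent-preclean: 0.003/0.003 secs]
[1 CMS-remark: 343521K(687040K)] 345848K(996288K), 0.0079910 secs]
[CMS-concurrent-sweep-start]
[CMS-concurrent-sweep: 0.422/0.422 secs]
[CMS-concurrent-reset-start]
[CMS-concurrent-reset: 0.010/0.010 secs]
"""

# Hypothetical patterns for this illustration.
STW = re.compile(r"\[1 CMS-(initial-mark|remark): \d+K\(\d+K\)\] \d+K\(\d+K\), ([\d.]+) secs\]")
CONCURRENT = re.compile(r"\[CMS-concurrent-([a-z-]+): ([\d.]+)/([\d.]+) secs\]")

def summarize(log_text):
    """Return (total STW seconds, total concurrent seconds) for one CMS cycle."""
    stw = sum(float(secs) for _, secs in STW.findall(log_text))
    concurrent = sum(float(wall) for _, _, wall in CONCURRENT.findall(log_text))
    return stw, concurrent

stw_s, concurrent_s = summarize(CYCLE)
print(f"STW: {stw_s * 1000:.1f} ms, concurrent: {concurrent_s * 1000:.1f} ms")
```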

1.2.3 Full GC

2020-04-15T13:51:54.511+0800: 66207.660: [Full GC2020-04-15T13:51:54.511+0800: 66207.660: [CMS: 78706K->52261K(687040K), 0.4142720 secs] 121505K->52261K(996224K), [CMS Perm : 49043K->48806K(49280K)], 0.4146180 secs] [Times: user=0.36 sys=0.03, real=0.42 secs]

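As the log shows, this Full GC was caused by the Perm Gen filling up (49043K used out of a 49280K capacity). In this example the initial Perm Gen size != the maximum size, so to reduce pauses of this kind, set the initial Perm Gen size equal to its maximum (for example -XX:PermSize=512M alongside -XX:MaxPermSize=512M).

When Full GC is triggered:

  • The Perm Gen is full
  • A large object has to go into the old generation but the old generation cannot hold it
  • A promotion failed occurs
  • A concurrent mode failure occurs during a CMS collection
  • System.gc() is called and the DisableExplicitGC parameter is not enabled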
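How full the Perm Gen was at the time of the Full GC can be computed from the "CMS Perm" field. A minimal Python sketch, with a pattern and function name of my own:

```python
import re

# "CMS Perm" fragment copied from the Full GC line above.
LINE = "[CMS Perm : 49043K->48806K(49280K)]"

# Hypothetical pattern for this illustration.
PERM = re.compile(r"\[CMS Perm : (\d+)K->(\d+)K\((\d+)K\)\]")

def perm_occupancy_percent(line):
    """Return (percent used before GC, percent used after GC) of Perm Gen capacity."""
    m = PERM.search(line)
    if m is None:
        return None
    before_k, after_k, capacity_k = (int(x) for x in m.groups())
    return 100.0 * before_k / capacity_k, 100.0 * after_k / capacity_k

before, after = perm_occupancy_percent(LINE)
print(f"before: {before:.1f}%, after: {after:.1f}%")
```

The capacity in parentheses (49280K) is the currently committed Perm Gen size, far below the 512M maximum, which is consistent with the point that the initial size differs from the maximum.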

2 Abnormal Cases

2.1 Environment

Startup parameters:

-Xmx2048M -XX:MaxPermSize=512M -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
-Xloggc:gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=20M

Runtime parameters:

CommandLine flags: -XX:GCLogFileSize=20971520 -XX:InitialHeapSize=1055244992 -XX:MaxHeapSize=2147483648 -XX:MaxNewSize=715784192 -XX:MaxPermSize=536870912 -XX:MaxTenuringThreshold=6 -XX:NumberOfGCLogFiles=10 -XX:OldPLABSize=16 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation -XX:+UseParNewGC

2.2 concurrent mode failure

// Occurred during the concurrent sweep phase
2020-03-27T13:22:11.866+0800: 11486649.782: [GC [1 CMS-initial-mark: 1398143K(1398144K)] 1987073K(2027264K), 0.5966640 secs] [Times: user=0.00 sys=0.59, real=0.60 secs]
2020-03-27T13:22:12.463+0800: 11486650.379: [CMS-concurrent-mark-start]
2020-03-27T13:22:14.721+0800: 11486652.637: [CMS-concurrent-mark: 2.258/2.258 secs] [Times: user=0.00 sys=6.93, real=2.25 secs]
2020-03-27T13:22:14.721+0800: 11486652.637: [CMS-concurrent-preclean-start]
2020-03-27T13:22:14.728+0800: 11486652.644: [CMS-concurrent-preclean: 0.007/0.007 secs] [Times: user=0.00 sys=0.01, real=0.00 secs]
2020-03-27T13:22:14.728+0800: 11486652.644: [CMS-concurrent-abortable-preclean-start]
2020-03-27T13:22:14.728+0800: 11486652.644: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2020-03-27T13:22:14.729+0800: 11486652.645: [GC[YG occupancy: 611935 K (629120 K)]2020-03-27T13:22:14.729+0800: 11486652.645: [Rescan (parallel) , 0.6998930 secs]2020-03-27T13:22:15.429+0800: 11486653.345: [weak refs processing, 0.0000400 secs]2020-03-27T13:22:15.429+0800: 11486653.345: [scrub string table, 0.0009110 secs] [1 CMS-remark: 1398143K(1398144K)] 2010079K(2027264K), 0.7009860 secs] [Times: user=0.00 sys=6.40, real=0.70 secs]
2020-03-27T13:22:15.431+0800: 11486653.347: [CMS-concurrent-sweep-start]
2020-03-27T13:22:15.869+0800: 11486653.785: [GC2020-03-27T13:22:15.869+0800: 11486653.785: [ParNew: 628890K->628890K(629120K), 0.0000370 secs]2020-03-27T13:22:15.869+0800: 11486653.785: [CMS2020-03-27T13:22:16.181+0800: 11486654.097: [CMS-concurrent-sweep: 0.748/0.751 secs] [Times: user=0.00 sys=0.88, real=0.76 secs]
 (concurrent mode failure): 1398143K->1398143K(1398144K), 8.2583410 secs] 2027034K->1817078K(2027264K), [CMS Perm : 44073K->44073K(73644K)], 8.2586550 secs] [Times: user=0.00 sys=8.26, real=8.26 secs]

// Occurred during the concurrent mark phase
2020-03-27T12:47:25.590+0800: 11484563.506: [GC [1 CMS-initial-mark: 1398143K(1398144K)] 2018776K(2027264K), 0.6111070 secs] [Times: user=0.00 sys=0.61, real=0.61 secs]
2020-03-27T12:47:26.202+0800: 11484564.118: [CMS-concurrent-mark-start]
2020-03-27T12:47:26.300+0800: 11484564.216: [GC2020-03-27T12:47:26.300+0800: 11484564.216: [ParNew: 629119K->629119K(629120K), 0.0000380 secs]2020-03-27T12:47:26.300+0800: 11484564.216: [CMS2020-03-27T12:47:28.468+0800: 11484566.384: [CMS-concurrent-mark: 2.263/2.266 secs] [Times: user=0.00 sys=6.83, real=2.27 secs]
 (concurrent mode failure): 1398143K->1398143K(1398144K), 10.3618040 secs] 2027263K->1815187K(2027264K), [CMS Perm : 44073K->44073K(73644K)], 10.3621370 secs] [Times: user=0.00 sys=14.69, real=10.36 secs]

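concurrent mode failure: the goal of CMS is to collect the old generation without stopping all application threads. User threads keep running during the concurrent cycle; if objects then need to go into the old generation (1. new objects 2. objects promoted from the young generation) and the old generation cannot hold them before the cycle completes, this failure is raised. CMS is then replaced by the Serial Old collector: a stop-the-world collection with a long pause that also compacts the space (by default every such failure compacts; the CMSFullGCsBeforeCompaction parameter sets how many non-compacting Full GCs run before a compacting one).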
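When scanning a long GC log, it helps to extract just the failure entries and their pause times. The Python sketch below does this for the second sample above; the regular expression and the name parse_cmf are hypothetical and only match the format shown in this article.

```python
import re

# Failure line copied from the "concurrent mark phase" sample above.
LINE = (" (concurrent mode failure): 1398143K->1398143K(1398144K), 10.3618040 secs] "
        "2027263K->1815187K(2027264K), [CMS Perm : 44073K->44073K(73644K)], "
        "10.3621370 secs] [Times: user=0.00 sys=14.69, real=10.36 secs]")

# Hypothetical pattern for this illustration.
CMF = re.compile(
    r"\(concurrent mode failure\): (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\] "
    r"(\d+)K->(\d+)K\((\d+)K\), .*?real=([\d.]+) secs\]"
)

def parse_cmf(line):
    """Return old/heap occupancy and pause time for a concurrent mode failure, or None."""
    m = CMF.search(line)
    if m is None:
        return None
    g = m.groups()
    return {
        "old_before_k": int(g[0]), "old_after_k": int(g[1]), "old_capacity_k": int(g[2]),
        "heap_before_k": int(g[4]), "heap_after_k": int(g[5]),
        "pause_s": float(g[7]),
    }

cmf = parse_cmf(LINE)
print(cmf["pause_s"], cmf["heap_before_k"] - cmf["heap_after_k"])
```

In this sample the old generation goes from 1398143K to 1398143K, so the pause of more than 10 seconds freed space only outside the old generation.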

2.3 promotion failed

// The log below was copied from the web; I have not run into this myself
106.641: [GC 106.641: [ParNew (promotion failed): 14784K->14784K(14784K), 0.0370328 secs]106.678: [CMS106.715: [CMS-concurrent-mark: 0.065/0.103 secs] [Times: user=0.17 sys=0.00, real=0.11 secs]
(concurrent mode failure): 41568K->27787K(49152K), 0.2128504 secs] 52402K->27787K(63936K), [CMS Perm : 2086K->2086K(12288K)], 0.2499776 secs] [Times: user=0.28 sys=0.00, real=0.25 secs]

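promotion failed: during a Minor GC some objects are moved into the old generation (they reached the promotion age, or the Survivor space cannot hold them). The possible outcomes are:

1. The old generation has room. All is well.

2. The old generation does not have room, promotion failed: (TODO)

  • a. A Full GC is triggered;
  • b. If the old generation is in the concurrent phase of a CMS collection at that moment, a CMS concurrent mode failure is triggered, leading to a CMS foreground GC.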
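Case b above is what the sample log shows: both markers appear in the same GC event. A small Python sketch for spotting them; the helper name failure_markers is my own, and it simply searches for the two parenthesized markers as plain text.

```python
# Log excerpt copied from the sample above (two physical lines).
LOG = (
    "106.641: [GC 106.641: [ParNew (promotion failed): 14784K->14784K(14784K), "
    "0.0370328 secs]106.678: [CMS106.715: [CMS-concurrent-mark: 0.065/0.103 secs] "
    "[Times: user=0.17 sys=0.00, real=0.11 secs]\n"
    "(concurrent mode failure): 41568K->27787K(49152K), 0.2128504 secs] "
    "52402K->27787K(63936K), [CMS Perm : 2086K->2086K(12288K)], 0.2499776 secs] "
    "[Times: user=0.28 sys=0.00, real=0.25 secs]"
)

MARKERS = ("promotion failed", "concurrent mode failure")

def failure_markers(log_text):
    """Return the failure markers that occur in the given log text, in MARKERS order."""
    return [marker for marker in MARKERS if f"({marker})" in log_text]

print(failure_markers(LOG))
```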

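3 Closing Remarks

The JVM runs deep. I went through a lot of material online and still could not resolve some of my questions; the doubtful points in this article are marked with TODO, to be filled in later.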

 
