背景:公司线上服务器,FullGC耗时超过1秒报警。
2019-09-16T11:01:25.287+0800: 9566486.997: [GC (Allocation Failure) 2019-09-16T11:01:25.288+0800: 9566486.997: [ParNew: 1683966K->4564K(1887488K), 0.0238815 secs] 31513
45K->1472607K(3984640K), 0.0245668 secs] [Times: user=0.09 sys=0.00, real=0.02 secs]
2019-09-16T11:01:25.316+0800: 9566487.025: [GC (CMS Initial Mark) [1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs] [Times: user=0.02 sys=0.0
0, real=0.02 secs]
2019-09-16T11:01:25.330+0800: 9566487.039: [CMS-concurrent-mark-start]
2019-09-16T11:01:26.007+0800: 9566487.716: [CMS-concurrent-mark: 0.666/0.677 secs] [Times: user=0.95 sys=0.09, real=0.67 secs]
2019-09-16T11:01:26.007+0800: 9566487.717: [CMS-concurrent-preclean-start]
2019-09-16T11:01:26.023+0800: 9566487.733: [CMS-concurrent-preclean: 0.015/0.016 secs] [Times: user=0.02 sys=0.01, real=0.02 secs]
2019-09-16T11:01:26.023+0800: 9566487.733: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 2019-09-16T11:01:31.115+0800: 9566492.825: [CMS-concurrent-abortable-preclean: 2.679/5.092 secs] [Times: user=3.46 sys=0.32, real=5.09
secs]
2019-09-16T11:01:31.119+0800: 9566492.828: [GC (CMS Final Remark) [YG occupancy: 422286 K (1887488 K)]2019-09-16T11:01:31.119+0800: 9566492.829: [GC (CMS Final Remark)
2019-09-16T11:01:31.120+0800: 9566492.829: [ParNew: 422286K->3266K(1887488K), 0.0261946 secs] 1890329K->1471476K(3984640K), 0.0268209 secs] [Times: user=0.07 sys=0.00,
real=0.02 secs]
2019-09-16T11:01:31.146+0800: 9566492.856: [Rescan (parallel) , 0.0102439 secs]2019-09-16T11:01:31.157+0800: 9566492.866: [weak refs processing, 1.6619564 secs]2019-09-
16T11:01:32.819+0800: 9566494.528: [class unloading, 0.1607796 secs]2019-09-16T11:01:32.979+0800: 9566494.689: [scrub symbol table, 0.0317450 secs]2019-09-16T11:01:33.0
11+0800: 9566494.720: [scrub string table, 0.0030630 secs][1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs] [Times: user=2.01 sys=0.00, real=1.93 s
ecs]
2019-09-16T11:01:33.059+0800: 9566494.768: [CMS-concurrent-sweep-start]
2019-09-16T11:01:34.235+0800: 9566495.945: [CMS-concurrent-sweep: 1.056/1.176 secs] [Times: user=1.26 sys=0.00, real=1.18 secs]
2019-09-16T11:01:34.236+0800: 9566495.945: [CMS-concurrent-reset-start]
2019-09-16T11:01:34.259+0800: 9566495.968: [CMS-concurrent-reset: 0.023/0.023 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2019-09-16T11:03:37.270+0800: 9566618.979: [GC (Allocation Failure) 2019-09-16T11:03:37.270+0800: 9566618.979: [ParNew: 1681090K->4156K(1887488K), 0.0252135 secs] 25335
50K->856782K(3984640K), 0.0259454 secs] [Times: user=0.10 sys=0.00, real=0.02 secs]
CMS收集器,是基于“标记-清除”算法实现的,运作过程分为4个步骤,包括:
初始标记(CMS initial Mark)
并发标记(CMS concurrent Mark)
重新标记(CMS Remark)
并发清除(CMS concurrent sweep)
其中,初始标记、重新标记这两个步骤是“Stop The World”。
报警内容提示:FullGC 耗时时长 1953ms,根据以上日志,你可以计算出1953的数值么?可以先思考一下。
这里想先引申一个问题:jstat -gcutil
$jstat -gcutil 1837 1000
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.capacity substituted NaN
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.used substituted NaN
Warning: Unresolved Symbol: sun.gc.generation.2.space.0.capacity substituted NaN
S0 S1 E O P YGC YGCT FGC FGCT GCT
1.24 0.00 7.22 4.64 � 103180 3024.968 9 6.840 3031.809
1.24 0.00 16.24 4.64 � 103180 3024.968 9 6.840 3031.809
1.24 0.00 24.95 4.64 � 103180 3024.968 9 6.840 3031.809
1.24 0.00 25.43 4.64 � 103180 3024.968 9 6.840 3031.809
1.24 0.00 25.55 4.64 � 103180 3024.968 9 6.840 3031.809
1.24 0.00 25.97 4.64 � 103180 3024.968 9 6.840 3031.809
1.24 0.00 26.37 4.64 � 103180 3024.968 9 6.840 3031.809
根据上图,我们可以看到FGC次数是9次,而我看应用启动之后的FullGC次数只有3次,执行jmap -dump:live,format=b,file=heap-dump.bin
dump文件引发的,与三次CMS。
最后查资料发现一个比较合理的说法,这边的FGC统计的是JVM STW的次数,而一次CMS会在Initial Mark 与Remark 有两次STW。所以最终的FGC = CMS*2 + FullGC-dump。
回归到上面的问题:CMS-FGC时长:GC日志中我们可以看到:
2019-09-16T11:01:25.316+0800: 9566487.025: [GC (CMS Initial Mark) [1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs] [Times: user=0.02 sys=0.0
0, real=0.02 secs]
其中
[1 CMS-initial-mark: 1468043K(2097152K)] 1473371K(3984640K), 0.0139600 secs]
意思就是,CMS Initial Mark 初始标记“Stop The World” 0.0139600 secs
初始化标记我们已经获得时间了,那么CMS remark呢。
2019-09-16T11:01:31.119+0800: 9566492.828: [GC (CMS Final Remark) [YG occupancy: 422286 K (1887488 K)]2019-09-16T11:01:31.119+0800: 9566492.829: [GC (CMS Final Remark)
2019-09-16T11:01:31.120+0800: 9566492.829: [ParNew: 422286K->3266K(1887488K), 0.0261946 secs] 1890329K->1471476K(3984640K), 0.0268209 secs] [Times: user=0.07 sys=0.00,
real=0.02 secs]
2019-09-16T11:01:31.146+0800: 9566492.856: [Rescan (parallel) , 0.0102439 secs]2019-09-16T11:01:31.157+0800: 9566492.866: [weak refs processing, 1.6619564 secs]2019-09-
16T11:01:32.819+0800: 9566494.528: [class unloading, 0.1607796 secs]2019-09-16T11:01:32.979+0800: 9566494.689: [scrub symbol table, 0.0317450 secs]2019-09-16T11:01:33.0
11+0800: 9566494.720: [scrub string table, 0.0030630 secs][1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs] [Times: user=2.01 sys=0.00, real=1.93 s
ecs]
这边我们看到[GC (CMS Final Remark)...
,但是这一行却没有标记耗时,这是为什么呢?
这边就涉及到CMS重新标记的设计了,简单说:
CMS-concurrent-abortable-preclean
。什么是CMS-concurrent-abortable-preclean
?(后期待补充)[GC (CMS Final Remark)...
仅仅只是代表了,我即将要做重新标记了,而真正重新标记是发生在[Rescan (parallel)
这一行中了。[1 CMS-remark: 1468209K(2097152K)] 1471476K(3984640K), 1.9392917 secs]
那么这里我们可以获取到,CMS Remark 重新标记“Stop The World” 1.9392917 secs
。
总计:CMS Initial Mark(0.0139600 secs) + CMS Remark(1.9392917 secs)=1953.2517ms