早上查看低峰期gc情况,发现昨晚上fgc有274次,感觉有些不正常,开始查找原因
[jump@order-binlog-data002 spring-boot]$ sudo jstat -gc 20028 4000 10
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
283328.0 283328.0 1036.4 0.0 2266752.0 589773.4 1700864.0 52111.5 41476.0 40494.4 4672.0 4507.7 5998 157.033 274 37.794 194.827
283328.0 283328.0 1036.4 0.0 2266752.0 1902667.1 1700864.0 52111.5 41476.0 40494.4 4672.0 4507.7 5998 157.033 274 37.794 194.827
283328.0 283328.0 0.0 906.8 2266752.0 1165571.6 1700864.0 52125.8 41476.0 40494.4 4672.0 4507.7 5999 157.037 274 37.794 194.831
283328.0 283328.0 588.3 0.0 2266752.0 419124.4 1700864.0 52222.6 41476.0 40494.4 4672.0 4507.7 6000 157.041 274 37.794 194.834
283328.0 283328.0 588.3 0.0 2266752.0 1858587.9 1700864.0 52222.6 41476.0 40494.4 4672.0 4507.7 6000 157.041 274 37.794 194.834
283328.0 283328.0 0.0 980.7 2266752.0 1015355.0 1700864.0 52311.3 41476.0 40494.4 4672.0 4507.7 6001 157.046 274 37.794 194.839
283328.0 283328.0 1075.6 0.0 2266752.0 376459.3 1700864.0 52369.0 41476.0 40494.4 4672.0 4507.7 6002 157.050 274 37.794 194.843
283328.0 283328.0 1075.6 0.0 2266752.0 1725419.9 1700864.0 52369.0 41476.0 40494.4 4672.0 4507.7 6002 157.050 274 37.794 194.843
283328.0 283328.0 0.0 650.2 2266752.0 993727.1 1700864.0 52471.9 41476.0 40494.4 4672.0 4507.7 6003 157.053 274 37.794 194.847
283328.0 283328.0 1402.2 0.0 2266752.0 203814.7 1700864.0 52553.9 41476.0 40494.4 4672.0 4507.7 6004 157.058 274 37.794 194.851
中午高峰期查看,老年代一直在增长,每4秒增长大概1M
[jump@order-binlog-data002 spring-boot]$ sudo jstat -gc 20028 4000 10
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
283328.0 283328.0 3849.7 0.0 2266752.0 510658.3 1700864.0 80623.6 41476.0 40494.4 4672.0 4507.7 6396 158.987 274 37.794 196.780
283328.0 283328.0 0.0 2788.4 2266752.0 477917.0 1700864.0 80689.9 41476.0 40494.4 4672.0 4507.7 6397 158.991 274 37.794 196.784
283328.0 283328.0 1931.9 0.0 2266752.0 474958.2 1700864.0 80791.0 41476.0 40494.4 4672.0 4507.7 6398 158.995 274 37.794 196.788
283328.0 283328.0 0.0 2145.2 2266752.0 483570.1 1700864.0 80877.9 41476.0 40494.4 4672.0 4507.7 6399 158.999 274 37.794 196.793
283328.0 283328.0 2864.5 0.0 2266752.0 638776.6 1700864.0 80953.9 41476.0 40494.4 4672.0 4507.7 6400 159.004 274 37.794 196.797
283328.0 283328.0 0.0 1856.1 2266752.0 880657.2 1700864.0 81013.6 41476.0 40494.4 4672.0 4507.7 6401 159.009 274 37.794 196.802
283328.0 283328.0 4210.8 0.0 2266752.0 894710.5 1700864.0 81099.8 41476.0 40494.4 4672.0 4507.7 6402 159.014 274 37.794 196.808
283328.0 283328.0 0.0 2900.8 2266752.0 960077.2 1700864.0 81210.0 41476.0 40494.4 4672.0 4507.7 6403 159.018 274 37.794 196.812
283328.0 283328.0 1813.5 0.0 2266752.0 1064101.3 1700864.0 81266.9 41476.0 40494.4 4672.0 4507.7 6404 159.023 274 37.794 196.817
283328.0 283328.0 0.0 1498.7 2266752.0 1126030.4 1700864.0 81323.5 41476.0 40494.4 4672.0 4507.7 6405 159.028 274 37.794 196.822
查看昨晚gc日志发现,age1和age2表示新生代年龄为1和2, 但是有个地方只有age1没有age2,紧接着就开始CMS收集,感觉有些线索。首先简单介绍一些该应用,大多数是对象都是朝生夕死的对象,不太可能进入老年代。开始怀疑是年龄晋升阈值太小,只配置了2。本该在新生代回收的对象,由于没有及时回收,从age1直接到了老年代。
2017-07-12T20:54:06.586+0800: 80.309: [GC (Allocation Failure) 2017-07-12T20:54:06.586+0800: 80.309: [ParNew
Desired survivor size 145063936 bytes, new threshold 2 (max 2)
- age 1: 33586544 bytes, 33586544 total
- age 2: 35816528 bytes, 69403072 total
: 2355435K->78957K(2550080K), 0.0317835 secs] 3532169K->1284299K(4250944K), 0.0319476 secs] [Times: user=0.12 sys=0.00, real=0.03 secs]
2017-07-12T20:54:06.618+0800: 80.341: Total time for which application threads were stopped: 0.0325909 seconds, Stopping threads took: 0.0001534 seconds
2017-07-12T20:54:07.620+0800: 81.342: Total time for which application threads were stopped: 0.0006056 seconds, Stopping threads took: 0.0001820 seconds
2017-07-12T20:54:08.384+0800: 82.107: [GC (Allocation Failure) 2017-07-12T20:54:08.384+0800: 82.107: [ParNew
Desired survivor size 145063936 bytes, new threshold 2 (max 2)
- age 1: 56321632 bytes, 56321632 total
- age 2: 31853384 bytes, 88175016 total
: 2345709K->116591K(2550080K), 0.0398190 secs] 3551051K->1356910K(4250944K), 0.0399864 secs] [Times: user=0.14 sys=0.00, real=0.04 secs]
2017-07-12T20:54:08.424+0800: 82.147: Total time for which application threads were stopped: 0.0406831 seconds, Stopping threads took: 0.0001112 seconds
2017-07-12T20:54:10.085+0800: 83.808: [GC (Allocation Failure) 2017-07-12T20:54:10.085+0800: 83.808: [ParNew
Desired survivor size 145063936 bytes, new threshold 1 (max 2)
- age 1: 190948400 bytes, 190948400 total
- age 2: 54471392 bytes, 245419792 total
: 2383343K->264021K(2550080K), 0.0715712 secs] 3623662K->1535443K(4250944K), 0.0717427 secs] [Times: user=0.27 sys=0.00, real=0.07 secs]
2017-07-12T20:54:10.157+0800: 83.880: Total time for which application threads were stopped: 0.0724583 seconds, Stopping threads took: 0.0001708 seconds
2017-07-12T20:54:11.595+0800: 85.318: [GC (Allocation Failure) 2017-07-12T20:54:11.595+0800: 85.318: [ParNew
Desired survivor size 145063936 bytes, new threshold 1 (max 2)
- age 1: 188703680 bytes, 188703680 total
: 2530773K->250075K(2550080K), 0.1668482 secs] 3802195K->1753036K(4250944K), 0.1670176 secs] [Times: user=0.52 sys=0.04, real=0.17 secs]
2017-07-12T20:54:11.762+0800: 85.485: Total time for which application threads were stopped: 0.1676754 seconds, Stopping threads took: 0.0000755 seconds
2017-07-12T20:54:11.765+0800: 85.488: [GC (CMS Initial Mark) [1 CMS-initial-mark: 1502960K(1700864K)] 1771515K(4250944K), 0.0158729 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
2017-07-12T20:54:11.781+0800: 85.504: Total time for which application threads were stopped: 0.0167987 seconds, Stopping threads took: 0.0002961 seconds
2017-07-12T20:54:11.781+0800: 85.504: [CMS-concurrent-mark-start]
2017-07-12T20:54:12.050+0800: 85.773: [CMS-concurrent-mark: 0.269/0.269 secs] [Times: user=0.70 sys=0.11, real=0.26 secs]
2017-07-12T20:54:12.050+0800: 85.773: [CMS-concurrent-preclean-start]
2017-07-12T20:54:12.055+0800: 85.778: [CMS-concurrent-preclean: 0.005/0.005 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2017-07-12T20:54:12.055+0800: 85.778: [CMS-concurrent-abortable-preclean-start]
2017-07-12T20:54:12.055+0800: 85.778: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2017-07-12T20:54:12.056+0800: 85.778: [GC (CMS Final Remark) [YG occupancy: 631012 K (2550080 K)]2017-07-12T20:54:12.056+0800: 85.778: [Rescan (parallel) , 0.0752694 secs]2017-07-12T20:54:12.131+0800: 85.854: [weak refs processing, 0.0000552 secs]2017-07-12T20:54:12.131+0800: 85.854: [class unloading, 0.0092330 secs]2017-07-12T20:54:12.140+0800: 85.863: [scrub symbol table, 0.0050382 secs]2017-07-12T20:54:12.145+0800: 85.868: [scrub string table, 0.0010191 secs][1 CMS-remark: 1502960K(1700864K)] 2133973K(4250944K), 0.0924674 secs] [Times: user=0.31 sys=0.00, real=0.09 secs]
2017-07-12T20:54:12.148+0800: 85.871: Total time for which application threads were stopped: 0.0930845 seconds, Stopping threads took: 0.0001297 seconds
2017-07-12T20:54:12.148+0800: 85.871: [CMS-concurrent-sweep-start]
2017-07-12T20:54:12.612+0800: 86.335: [CMS-concurrent-sweep: 0.464/0.464 secs] [Times: user=1.21 sys=0.26, real=0.47 secs]
2017-07-12T20:54:12.612+0800: 86.335: [CMS-concurrent-reset-start]
2017-07-12T20:54:12.617+0800: 86.339: [CMS-concurrent-reset: 0.005/0.005 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2017-07-12T20:54:13.595+0800: 87.318: [GC (Allocation Failure) 2017-07-12T20:54:13.595+0800: 87.318: [ParNew
Desired survivor size 145063936 bytes, new threshold 1 (max 2)
- age 1: 158235824 bytes, 158235824 total
: 2516827K->235282K(2550080K), 0.1279312 secs] 3152007K->1049527K(4250944K), 0.1281230 secs] [Times: user=0.43 sys=0.03, real=0.13 secs]
2017-07-12T20:54:13.723+0800: 87.446: Total time for which application threads were stopped: 0.1288954 seconds, Stopping threads took: 0.0001631 seconds
2017-07-12T20:54:15.493+0800: 89.216: [GC (Allocation Failure) 2017-07-12T20:54:15.493+0800: 89.216: [ParNew
Desired survivor size 145063936 bytes, new threshold 2 (max 2)
- age 1: 82668368 bytes, 82668368 total
: 2502034K->203689K(2550080K), 0.1076523 secs] 3316279K->1170879K(4250944K), 0.1078419 secs] [Times: user=0.35 sys=0.02, real=0.10 secs]
2017-07-12T20:54:15.601+0800: 89.324: Total time for which application threads were stopped: 0.1085887 seconds, Stopping threads took: 0.0001270 seconds
2017-07-12T20:54:17.307+0800: 91.029: [GC (Allocation Failure) 2017-07-12T20:54:17.307+0800: 91.029: [ParNew
Desired survivor size 145063936 bytes, new threshold 2 (max 2)
- age 1: 77960664 bytes, 77960664 total
- age 2: 65094536 bytes, 143055200 total
: 2470441K->182761K(2550080K), 0.0338754 secs] 3437631K->1149951K(4250944K), 0.0340508 secs] [Times: user=0.13 sys=0.00, real=0.03 secs]
2017-07-12T20:54:17.341+0800: 91.064: Total time for which application threads were stopped: 0.0348178 seconds, Stopping threads took: 0.0001475 seconds
找到原因后,就开始做调整。首先为了尽可能把对象留到新生代回收,需要提高Survivor的利用率。有个TargetSurvivorRatio参数,默认是50,表示利用率超过50%就将对象送入老年代。还有对象晋升阈值MaxTenuringThreshold,调整为最大。考虑到机器内存为8G,而运维默认设置最大堆为4.4G,还有可利用空间,即将最大堆调整为6G。
-Xmx6000m
-Xms6000m
-Xmn4000m
-XX:TargetSurvivorRatio=90
-XX:MaxTenuringThreshold=15
调整后gc后,第二天早上查看gc情况,发现昨晚上没有一次fgc,而且老年代增长非常缓慢,虽然不能完全避免fgc,但是已经将fgc减少,并且延后。
sudo jstat -gc 31703 5000 10
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
350976.0 350976.0 0.0 1974.1 2808320.0 686899.3 2048000.0 85810.0 41856.0 41050.9 4736.0 4544.5 4017 40.907 0 0.000 40.907
350976.0 350976.0 1834.3 0.0 2808320.0 623932.5 2048000.0 85816.7 41856.0 41050.9 4736.0 4544.5 4018 40.916 0 0.000 40.916
350976.0 350976.0 0.0 2378.6 2808320.0 426285.4 2048000.0 85832.0 41856.0 41050.9 4736.0 4544.5 4019 40.926 0 0.000 40.926
350976.0 350976.0 2783.9 0.0 2808320.0 556281.0 2048000.0 85858.5 41856.0 41050.9 4736.0 4544.5 4020 40.935 0 0.000 40.935
350976.0 350976.0 0.0 2848.8 2808320.0 609204.5 2048000.0 85893.7 41856.0 41050.9 4736.0 4544.5 4021 40.944 0 0.000 40.944
350976.0 350976.0 2076.1 0.0 2808320.0 735427.5 2048000.0 85989.6 41856.0 41050.9 4736.0 4544.5 4022 40.953 0 0.000 40.953
350976.0 350976.0 0.0 2687.3 2808320.0 685107.7 2048000.0 86028.1 41856.0 41050.9 4736.0 4544.5 4023 40.963 0 0.000 40.963
350976.0 350976.0 3439.6 0.0 2808320.0 507868.2 2048000.0 86074.3 41856.0 41050.9 4736.0 4544.5 4024 40.972 0 0.000 40.972
350976.0 350976.0 0.0 2130.9 2808320.0 950108.9 2048000.0 86145.5 41856.0 41050.9 4736.0 4544.5 4025 40.982 0 0.000 40.982
350976.0 350976.0 3430.0 0.0 2808320.0 917396.6 2048000.0 86145.5 41856.0 41050.9 4736.0 4544.5 4026 40.992 0 0.000 40.992