一次JVM调优实战

早上查看低峰期gc情况,发现昨晚上fgc有274次,感觉有些不正常,开始查找原因

[jump@order-binlog-data002 spring-boot]$ sudo jstat -gc 20028 4000 10
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT
283328.0 283328.0 1036.4  0.0   2266752.0 589773.4 1700864.0   52111.5   41476.0 40494.4 4672.0 4507.7   5998  157.033  274    37.794  194.827
283328.0 283328.0 1036.4  0.0   2266752.0 1902667.1 1700864.0   52111.5   41476.0 40494.4 4672.0 4507.7   5998  157.033  274    37.794  194.827
283328.0 283328.0  0.0   906.8  2266752.0 1165571.6 1700864.0   52125.8   41476.0 40494.4 4672.0 4507.7   5999  157.037  274    37.794  194.831
283328.0 283328.0 588.3   0.0   2266752.0 419124.4 1700864.0   52222.6   41476.0 40494.4 4672.0 4507.7   6000  157.041  274    37.794  194.834
283328.0 283328.0 588.3   0.0   2266752.0 1858587.9 1700864.0   52222.6   41476.0 40494.4 4672.0 4507.7   6000  157.041  274    37.794  194.834
283328.0 283328.0  0.0   980.7  2266752.0 1015355.0 1700864.0   52311.3   41476.0 40494.4 4672.0 4507.7   6001  157.046  274    37.794  194.839
283328.0 283328.0 1075.6  0.0   2266752.0 376459.3 1700864.0   52369.0   41476.0 40494.4 4672.0 4507.7   6002  157.050  274    37.794  194.843
283328.0 283328.0 1075.6  0.0   2266752.0 1725419.9 1700864.0   52369.0   41476.0 40494.4 4672.0 4507.7   6002  157.050  274    37.794  194.843
283328.0 283328.0  0.0   650.2  2266752.0 993727.1 1700864.0   52471.9   41476.0 40494.4 4672.0 4507.7   6003  157.053  274    37.794  194.847
283328.0 283328.0 1402.2  0.0   2266752.0 203814.7 1700864.0   52553.9   41476.0 40494.4 4672.0 4507.7   6004  157.058  274    37.794  194.851

中午高峰期查看,老年代一直在增长,每4秒增长大概1M

[jump@order-binlog-data002 spring-boot]$ sudo jstat -gc 20028 4000 10
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT
283328.0 283328.0 3849.7  0.0   2266752.0 510658.3 1700864.0   80623.6   41476.0 40494.4 4672.0 4507.7   6396  158.987  274    37.794  196.780
283328.0 283328.0  0.0   2788.4 2266752.0 477917.0 1700864.0   80689.9   41476.0 40494.4 4672.0 4507.7   6397  158.991  274    37.794  196.784
283328.0 283328.0 1931.9  0.0   2266752.0 474958.2 1700864.0   80791.0   41476.0 40494.4 4672.0 4507.7   6398  158.995  274    37.794  196.788
283328.0 283328.0  0.0   2145.2 2266752.0 483570.1 1700864.0   80877.9   41476.0 40494.4 4672.0 4507.7   6399  158.999  274    37.794  196.793
283328.0 283328.0 2864.5  0.0   2266752.0 638776.6 1700864.0   80953.9   41476.0 40494.4 4672.0 4507.7   6400  159.004  274    37.794  196.797
283328.0 283328.0  0.0   1856.1 2266752.0 880657.2 1700864.0   81013.6   41476.0 40494.4 4672.0 4507.7   6401  159.009  274    37.794  196.802
283328.0 283328.0 4210.8  0.0   2266752.0 894710.5 1700864.0   81099.8   41476.0 40494.4 4672.0 4507.7   6402  159.014  274    37.794  196.808
283328.0 283328.0  0.0   2900.8 2266752.0 960077.2 1700864.0   81210.0   41476.0 40494.4 4672.0 4507.7   6403  159.018  274    37.794  196.812
283328.0 283328.0 1813.5  0.0   2266752.0 1064101.3 1700864.0   81266.9   41476.0 40494.4 4672.0 4507.7   6404  159.023  274    37.794  196.817
283328.0 283328.0  0.0   1498.7 2266752.0 1126030.4 1700864.0   81323.5   41476.0 40494.4 4672.0 4507.7   6405  159.028  274    37.794  196.822

查看昨晚gc日志发现,age1和age2表示新生代年龄为1和2, 但是有个地方只有age1没有age2,紧接着就开始CMS收集,感觉有些线索。首先简单介绍一些该应用,大多数是对象都是朝生夕死的对象,不太可能进入老年代。开始怀疑是年龄晋升阈值太小,只配置了2。本该在新生代回收的对象,由于没有及时回收,从age1直接到了老年代。

2017-07-12T20:54:06.586+0800: 80.309: [GC (Allocation Failure) 2017-07-12T20:54:06.586+0800: 80.309: [ParNew
Desired survivor size 145063936 bytes, new threshold 2 (max 2)
- age   1:   33586544 bytes,   33586544 total
- age   2:   35816528 bytes,   69403072 total
: 2355435K->78957K(2550080K), 0.0317835 secs] 3532169K->1284299K(4250944K), 0.0319476 secs] [Times: user=0.12 sys=0.00, real=0.03 secs]
2017-07-12T20:54:06.618+0800: 80.341: Total time for which application threads were stopped: 0.0325909 seconds, Stopping threads took: 0.0001534 seconds
2017-07-12T20:54:07.620+0800: 81.342: Total time for which application threads were stopped: 0.0006056 seconds, Stopping threads took: 0.0001820 seconds
2017-07-12T20:54:08.384+0800: 82.107: [GC (Allocation Failure) 2017-07-12T20:54:08.384+0800: 82.107: [ParNew
Desired survivor size 145063936 bytes, new threshold 2 (max 2)
- age   1:   56321632 bytes,   56321632 total
- age   2:   31853384 bytes,   88175016 total
: 2345709K->116591K(2550080K), 0.0398190 secs] 3551051K->1356910K(4250944K), 0.0399864 secs] [Times: user=0.14 sys=0.00, real=0.04 secs]
2017-07-12T20:54:08.424+0800: 82.147: Total time for which application threads were stopped: 0.0406831 seconds, Stopping threads took: 0.0001112 seconds
2017-07-12T20:54:10.085+0800: 83.808: [GC (Allocation Failure) 2017-07-12T20:54:10.085+0800: 83.808: [ParNew
Desired survivor size 145063936 bytes, new threshold 1 (max 2)
- age   1:  190948400 bytes,  190948400 total
- age   2:   54471392 bytes,  245419792 total
: 2383343K->264021K(2550080K), 0.0715712 secs] 3623662K->1535443K(4250944K), 0.0717427 secs] [Times: user=0.27 sys=0.00, real=0.07 secs]
2017-07-12T20:54:10.157+0800: 83.880: Total time for which application threads were stopped: 0.0724583 seconds, Stopping threads took: 0.0001708 seconds
2017-07-12T20:54:11.595+0800: 85.318: [GC (Allocation Failure) 2017-07-12T20:54:11.595+0800: 85.318: [ParNew
Desired survivor size 145063936 bytes, new threshold 1 (max 2)
- age   1:  188703680 bytes,  188703680 total
: 2530773K->250075K(2550080K), 0.1668482 secs] 3802195K->1753036K(4250944K), 0.1670176 secs] [Times: user=0.52 sys=0.04, real=0.17 secs]
2017-07-12T20:54:11.762+0800: 85.485: Total time for which application threads were stopped: 0.1676754 seconds, Stopping threads took: 0.0000755 seconds
2017-07-12T20:54:11.765+0800: 85.488: [GC (CMS Initial Mark) [1 CMS-initial-mark: 1502960K(1700864K)] 1771515K(4250944K), 0.0158729 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
2017-07-12T20:54:11.781+0800: 85.504: Total time for which application threads were stopped: 0.0167987 seconds, Stopping threads took: 0.0002961 seconds
2017-07-12T20:54:11.781+0800: 85.504: [CMS-concurrent-mark-start]
2017-07-12T20:54:12.050+0800: 85.773: [CMS-concurrent-mark: 0.269/0.269 secs] [Times: user=0.70 sys=0.11, real=0.26 secs]
2017-07-12T20:54:12.050+0800: 85.773: [CMS-concurrent-preclean-start]
2017-07-12T20:54:12.055+0800: 85.778: [CMS-concurrent-preclean: 0.005/0.005 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2017-07-12T20:54:12.055+0800: 85.778: [CMS-concurrent-abortable-preclean-start]
2017-07-12T20:54:12.055+0800: 85.778: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2017-07-12T20:54:12.056+0800: 85.778: [GC (CMS Final Remark) [YG occupancy: 631012 K (2550080 K)]2017-07-12T20:54:12.056+0800: 85.778: [Rescan (parallel) , 0.0752694 secs]2017-07-12T20:54:12.131+0800: 85.854: [weak refs processing, 0.0000552 secs]2017-07-12T20:54:12.131+0800: 85.854: [class unloading, 0.0092330 secs]2017-07-12T20:54:12.140+0800: 85.863: [scrub symbol table, 0.0050382 secs]2017-07-12T20:54:12.145+0800: 85.868: [scrub string table, 0.0010191 secs][1 CMS-remark: 1502960K(1700864K)] 2133973K(4250944K), 0.0924674 secs] [Times: user=0.31 sys=0.00, real=0.09 secs]
2017-07-12T20:54:12.148+0800: 85.871: Total time for which application threads were stopped: 0.0930845 seconds, Stopping threads took: 0.0001297 seconds
2017-07-12T20:54:12.148+0800: 85.871: [CMS-concurrent-sweep-start]
2017-07-12T20:54:12.612+0800: 86.335: [CMS-concurrent-sweep: 0.464/0.464 secs] [Times: user=1.21 sys=0.26, real=0.47 secs]
2017-07-12T20:54:12.612+0800: 86.335: [CMS-concurrent-reset-start]
2017-07-12T20:54:12.617+0800: 86.339: [CMS-concurrent-reset: 0.005/0.005 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2017-07-12T20:54:13.595+0800: 87.318: [GC (Allocation Failure) 2017-07-12T20:54:13.595+0800: 87.318: [ParNew
Desired survivor size 145063936 bytes, new threshold 1 (max 2)
- age   1:  158235824 bytes,  158235824 total
: 2516827K->235282K(2550080K), 0.1279312 secs] 3152007K->1049527K(4250944K), 0.1281230 secs] [Times: user=0.43 sys=0.03, real=0.13 secs]
2017-07-12T20:54:13.723+0800: 87.446: Total time for which application threads were stopped: 0.1288954 seconds, Stopping threads took: 0.0001631 seconds
2017-07-12T20:54:15.493+0800: 89.216: [GC (Allocation Failure) 2017-07-12T20:54:15.493+0800: 89.216: [ParNew
Desired survivor size 145063936 bytes, new threshold 2 (max 2)
- age   1:   82668368 bytes,   82668368 total
: 2502034K->203689K(2550080K), 0.1076523 secs] 3316279K->1170879K(4250944K), 0.1078419 secs] [Times: user=0.35 sys=0.02, real=0.10 secs]
2017-07-12T20:54:15.601+0800: 89.324: Total time for which application threads were stopped: 0.1085887 seconds, Stopping threads took: 0.0001270 seconds
2017-07-12T20:54:17.307+0800: 91.029: [GC (Allocation Failure) 2017-07-12T20:54:17.307+0800: 91.029: [ParNew
Desired survivor size 145063936 bytes, new threshold 2 (max 2)
- age   1:   77960664 bytes,   77960664 total
- age   2:   65094536 bytes,  143055200 total
: 2470441K->182761K(2550080K), 0.0338754 secs] 3437631K->1149951K(4250944K), 0.0340508 secs] [Times: user=0.13 sys=0.00, real=0.03 secs]
2017-07-12T20:54:17.341+0800: 91.064: Total time for which application threads were stopped: 0.0348178 seconds, Stopping threads took: 0.0001475 seconds

找到原因后,就开始做调整。首先为了尽可能把对象留到新生代回收,需要提高Survivor的利用率。有个TargetSurvivorRatio参数,默认是50,表示利用率超过50%就将对象送入老年代。还有对象晋升阈值MaxTenuringThreshold,调整为最大。考虑到机器内存为8G,而运维默认设置最大堆为4.4G,还有可利用空间,即将最大堆调整为6G。

-Xmx6000m
-Xms6000m
-Xmn4000m
-XX:TargetSurvivorRatio=90
-XX:MaxTenuringThreshold=15

调整后gc后,第二天早上查看gc情况,发现昨晚上没有一次fgc,而且老年代增长非常缓慢,虽然不能完全避免fgc,但是已经将fgc减少,并且延后。

sudo jstat -gc 31703 5000 10
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT
350976.0 350976.0  0.0   1974.1 2808320.0 686899.3 2048000.0   85810.0   41856.0 41050.9 4736.0 4544.5   4017   40.907   0      0.000   40.907
350976.0 350976.0 1834.3  0.0   2808320.0 623932.5 2048000.0   85816.7   41856.0 41050.9 4736.0 4544.5   4018   40.916   0      0.000   40.916
350976.0 350976.0  0.0   2378.6 2808320.0 426285.4 2048000.0   85832.0   41856.0 41050.9 4736.0 4544.5   4019   40.926   0      0.000   40.926
350976.0 350976.0 2783.9  0.0   2808320.0 556281.0 2048000.0   85858.5   41856.0 41050.9 4736.0 4544.5   4020   40.935   0      0.000   40.935
350976.0 350976.0  0.0   2848.8 2808320.0 609204.5 2048000.0   85893.7   41856.0 41050.9 4736.0 4544.5   4021   40.944   0      0.000   40.944
350976.0 350976.0 2076.1  0.0   2808320.0 735427.5 2048000.0   85989.6   41856.0 41050.9 4736.0 4544.5   4022   40.953   0      0.000   40.953
350976.0 350976.0  0.0   2687.3 2808320.0 685107.7 2048000.0   86028.1   41856.0 41050.9 4736.0 4544.5   4023   40.963   0      0.000   40.963
350976.0 350976.0 3439.6  0.0   2808320.0 507868.2 2048000.0   86074.3   41856.0 41050.9 4736.0 4544.5   4024   40.972   0      0.000   40.972
350976.0 350976.0  0.0   2130.9 2808320.0 950108.9 2048000.0   86145.5   41856.0 41050.9 4736.0 4544.5   4025   40.982   0      0.000   40.982
350976.0 350976.0 3430.0  0.0   2808320.0 917396.6 2048000.0   86145.5   41856.0 41050.9 4736.0 4544.5   4026   40.992   0      0.000   40.992

你可能感兴趣的:(JVM)