Notes on a JVM Tuning Session

Problem: old generation collections run too frequently and consume too much time.

Runtime environment:
(1) Tag send rate: 20672/s on average
(2) Run time of 60665.993 seconds with no OOME
(3) JVM options    %JAVA_OPTS% -verbose:gc -Xms1200M -Xmx1200M -Xss256k -XX:PermSize=64m -XX:MaxPermSize=64m -XX:NewSize=750m -XX:SurvivorRatio=3     -XX:+UseParNewGC  -XX:ParallelGCThreads=6 -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=5 -XX:CMSInitiatingOccupancyFraction=14 -XX:+CMSParallelRemarkEnabled -XX:CMSMaxAbortablePrecleanTime=5  -XX:+PrintGCDetails  -XX:+PrintGCTimeStamps -Xloggc:D:/gc.log
     The main parameter being tuned is -XX:CMSInitiatingOccupancyFraction=14, which triggers a concurrent CMS collection once old generation occupancy reaches 14%.
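(4) jstat was used to record heap usage. S0 and S1 are the two survivor spaces, E is Eden occupancy, O is old generation occupancy, P is permanent generation occupancy, YGC is the number of young generation GCs, YGCT is the time spent in young generation GCs, FGC is the number of old generation GCs, FGCT is the time spent in old generation GCs, and GCT is the total GC time:
     The excerpt below shows that GC time accounts for roughly 1/3 of the total run time (20744.220 s / 60665.993 s), and that one CMS cycle takes 0.469 seconds on average (18505.763 / 39436).
     jstat usage: jstat -gcutil 924 2000 -1 > D:/jstat-log.log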

    First use jps to list the Java processes; Bootstrap is the Tomcat process.


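     Here -gcutil prints a summary of GC statistics, with space occupancy shown as percentages;
         924 is the pid;
         2000 is the sampling interval, one sample every two seconds (2000 ms);
         -1 means the number of samples is unlimited; a value of 2 would stop after two samples;
         D:/jstat-log.log is the log file the output is redirected to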

  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
 47.57   0.00  82.77  19.11  54.85  15146 2238.076 39428 18502.751 20740.827
  0.00  32.69  41.89  19.11  54.85  15147 2238.142 39430 18503.019 20741.161
  0.00  32.69  88.88  19.11  54.85  15147 2238.142 39431 18503.019 20741.161
 19.37   0.00  36.85  19.11  54.85  15148 2238.215 39432 18503.988 20742.203
 19.37   0.00  82.03  19.11  54.85  15148 2238.215 39433 18503.988 20742.203
  0.00  23.00  33.39  19.11  54.85  15149 2238.324 39434 18504.830 20743.154
  0.00  23.00  77.48  19.11  54.85  15149 2238.324 39435 18504.830 20743.154
 31.79   0.00  36.53  19.11  54.85  15150 2238.457 39436 18505.763 20744.220 

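     Going through the complete heap usage log (jstat-log.log) shows that the fourth column (O), old generation occupancy, keeps rising slowly, from 12.30% at the start to 19.11%.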
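The two ratios quoted for the jstat excerpt (GC share of run time, and average FGCT per FGC) can be recomputed from any row of the log. The following is only a sketch: the class and method names are made up for illustration, and it assumes the 10-column -gcutil layout shown above (newer JDKs print different columns).

```java
import java.util.Locale;

public class JstatGcOverhead {

    /** Splits one `jstat -gcutil` data row into its 10 numeric columns. */
    static double[] parseRow(String row) {
        String[] parts = row.trim().split("\\s+");
        if (parts.length != 10) {
            throw new IllegalArgumentException("expected 10 columns, got " + parts.length);
        }
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i]);
        }
        return values;
    }

    /** GCT (column 10) divided by the process run time in seconds. */
    static double gcShare(double[] row, double uptimeSeconds) {
        return row[9] / uptimeSeconds;
    }

    /** FGCT (column 9) divided by FGC (column 8). */
    static double avgFgcSeconds(double[] row) {
        return row[8] / row[7];
    }

    public static void main(String[] args) {
        // Last row of the excerpt above; the run time comes from the environment notes.
        double[] last = parseRow(
            " 31.79   0.00  36.53  19.11  54.85  15150 2238.457 39436 18505.763 20744.220");
        double uptimeSeconds = 60665.993;
        System.out.printf(Locale.ROOT, "GC share of run time: %.3f%n", gcShare(last, uptimeSeconds));
        System.out.printf(Locale.ROOT, "average FGCT per FGC: %.3f s%n", avgFgcSeconds(last));
    }
}
```

Feeding every row of jstat-log.log through parseRow would also make it easy to plot how the O column drifts over time.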


Analysis of specific excerpts:
(1) Memory state when CMS is triggered
  0.00  74.37  27.41  13.25  54.73   1173  185.404     2    0.166  185.570
  0.00  74.37  92.32  13.25  54.73   1173  185.404     2    0.166  185.570
 84.84  74.37 100.00  13.81  54.73   1174  185.404     2    0.166  185.570
 84.84  74.37 100.00  14.13  54.73   1174  185.404     2    0.166  185.570         -- CMS triggered
 84.84  74.37 100.00  14.38  54.73   1174  185.404     2    0.166  185.570
 84.84  74.37 100.00  14.69  54.73   1174  185.404     2    0.166  185.570
 84.84  74.37 100.00  15.43  54.73   1174  185.404     2    0.166  185.570
 84.84   0.00  11.00  19.19  54.73   1174  196.874     3    0.166  197.040          -- one CMS cycle ends
 84.84  56.35 100.00  19.19  54.73   1175  196.874     3    0.868  197.742         -- 19.19 > 14, so the next CMS cycle starts
 41.69  56.35 100.00  19.19  54.73   1176  197.183     3    0.868  198.051
 41.69   0.00  70.78  19.19  54.73   1176  197.356     3    0.868  198.224
  0.00  31.00  38.02  19.19  54.73   1177  197.447     3    0.868  198.315
  0.00  31.00  77.39  19.19  54.74   1177  197.447     3    0.868  198.315
 44.37   0.00  42.66  19.19  54.74   1178  197.697     3    0.868  198.565
  0.00  30.15   9.60  19.19  54.74   1179  197.812     3    0.868  198.680
  0.00  30.15  54.19  19.19  54.74   1179  197.812     3    0.868  198.680
 60.02   0.00  18.90  19.19  54.74   1180  198.070     3    0.868  198.939
 60.02   0.00  80.33  19.19  54.74   1180  198.070     3    0.868  198.939
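  0.00  41.43  46.76  15.82  54.74   1181  198.235     4    0.922  199.157         ---- one CMS cycle ends, yet 15.82% is still above 14%, so the next CMS cycle should have started. Why didn't it? (Possibly jstat samples memory usage first while CMS runs concurrently with the sampling thread; by the time jstat reads the CMS count, CMS has already finished and old generation occupancy has dropped to 12.06%, which does not trigger the next CMS cycle. To see the exact memory usage at the moment CMS is triggered, use -XX:+PrintHeapAtGC.)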
  0.00  41.43  83.59  12.06  54.74   1181  198.235     4    0.922  199.157
 54.82   0.00  55.98  12.06  54.74   1182  198.438     4    0.922  199.360
  0.00  34.83  21.68  12.06  54.74   1183  198.516     4    0.922  199.438
  0.00  34.83  59.59  12.06  54.74   1183  198.516     4    0.922  199.438

(2) Short CMS cycles
 0.00  47.35  67.41  13.02  54.76   2171  347.986     4    0.922  348.908
34.71   0.00  21.75  13.02  54.76   2172  348.100     4    0.922  349.022
34.71   0.00  57.84  13.02  54.76   2172  348.100     4    0.922  349.022
69.11   0.00  24.09  14.07  54.76   2174  348.596     5    1.069  349.665
69.11   0.00  92.17  13.49  54.76   2174  348.596     6    1.436  350.032
 0.00  48.73  48.51  13.49  54.76   2175  348.648     6    1.436  350.085
 0.00  48.73  88.45  13.49  54.76   2175  348.648     6    1.436  350.085

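(3) After the 26th CMS collection, old generation occupancy stays at or above 14% and keeps growing slowly, never dropping back below 14%, so from then on old generation CMS collections run very frequently. Early on a CMS cycle is short: the first 26 CMS collections took 6.868 seconds in total, about 0.26 seconds per cycle.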
47.69   0.00  53.94  13.99  54.79   4312  675.534    24    6.727  682.260
 0.00  32.54  18.17  13.99  54.79   4313  675.648    24    6.727  682.375
 0.00  32.54  54.49  13.99  54.79   4313  675.648    24    6.727  682.375
64.39   0.00  21.83  13.99  54.79   4314  675.943    24    6.727  682.669
64.39   0.00  87.91  13.99  54.79   4314  675.943    24    6.727  682.669
 0.00  44.28  44.28  14.00  54.79   4315  676.007    26    6.868  682.875
 0.00  44.28  80.35  14.00  54.79   4315  676.007    26    6.868  682.875
42.05   0.00  50.68  14.00  54.79   4316  676.233    26    6.868  683.101
 0.00  31.35  15.79  14.00  54.79   4317  676.339    26    6.868  683.207
 0.00  31.35  72.39  14.00  54.79   4317  676.339    26    6.868  683.207
29.75   0.00   8.40  14.00  54.79   4318  676.456    26    6.868  683.324
29.75   0.00  89.56  14.00  54.79   4318  676.456    26    6.868  683.324
 0.00  21.60  49.25  14.00  54.79   4319  676.527    26    6.868  683.395
 0.00  21.60  86.30  14.00  54.79   4319  676.527    26    6.868  683.395
42.39   0.00  63.19  14.00  54.79   4320  676.719    26    6.868  683.587
 0.00  29.34  27.14  14.00  54.79   4321  676.808    26    6.868  683.676
 0.00  29.34  63.19  14.00  54.79   4321  676.808    26    6.868  683.676
52.81   0.00  35.18  14.00  54.79   4322  677.050    26    6.868  683.917
52.81  38.92 100.00  14.00  54.79   4323  677.050    26    6.868  683.917
 0.00  38.92  55.79  14.00  54.79   4323  677.229    26    6.868  684.097
 0.00  38.92  92.30  14.00  54.79   4323  677.229    26    6.868  684.097
52.62   0.00  69.99  14.00  54.79   4324  677.442    26    6.868  684.310
 0.00  33.20  30.90  14.00  54.79   4325  677.534    26    6.868  684.401
 0.00  33.20  67.46  14.00  54.79   4325  677.534    26    6.868  684.401
51.85   0.00  37.77  14.00  54.79   4326  677.802    26    6.868  684.670
51.85  37.41 100.00  14.00  54.79   4327  677.802    26    6.868  684.670
 0.00  37.41  44.45  14.00  54.79   4327  677.977    26    6.868  684.845
72.32   0.00  12.53  14.00  54.79   4328  678.284    26    6.868  685.152
72.32   0.00  79.09  14.00  54.79   4328  678.284    26    6.868  685.152
 0.00  49.42  34.68  14.00  54.79   4329  678.366    28    7.124  685.489
 0.00  49.42  75.61  14.00  54.79   4329  678.366    30    7.962  686.328       --  2 CMS cycles (7.962 - 7.124) = 0.838 s
 0.00  49.42  97.37  14.00  54.79   4329  678.366    32    8.687  687.053       --  2 CMS cycles (8.687 - 7.962) = 0.725 s
20.95   0.00  47.42  14.00  54.79   4330  678.452    33    9.855  688.308       --  1 CMS cycle (9.855 - 8.687) = 1.168 s
20.95   0.00  91.41  14.00  54.79   4330  678.452    35   11.022  689.475
 0.00  15.21  42.67  14.00  54.79   4331  678.512    38   11.272  689.784
 0.00  15.21  84.79  14.00  54.79   4331  678.512    39   12.278  690.790
25.53   0.00  44.02  14.00  54.79   4332  678.641    42   12.635  691.276
25.53   0.00  88.22  14.00  54.79   4332  678.641    43   13.729  692.370
 0.00  22.51  43.61  14.00  54.79   4333  678.732    46   14.007  692.739
 0.00  22.51  85.26  14.00  54.79   4333  678.732    47   15.121  693.853
30.96   0.00  31.47  14.00  54.79   4334  678.825    48   15.940  694.765       --  1 CMS cycle (15.940 - 15.121) = 0.819 s
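The per-row annotations above are differences of FGCT between consecutive samples, divided by how far FGC advanced. A small sketch of that arithmetic follows; the class and method names are invented for this example, and it again assumes the 10-column -gcutil layout, with FGC in column 8 and FGCT in column 9.

```java
import java.util.Locale;

public class FgcDelta {

    /** Returns {FGC, FGCT} from one `jstat -gcutil` row (columns 8 and 9). */
    static double[] fgcColumns(String row) {
        String[] parts = row.trim().split("\\s+");
        return new double[] { Double.parseDouble(parts[7]), Double.parseDouble(parts[8]) };
    }

    /**
     * FGCT seconds spent per FGC increment between two samples.
     * Returns NaN when FGC did not advance between the samples.
     */
    static double secondsPerFgc(String earlierRow, String laterRow) {
        double[] earlier = fgcColumns(earlierRow);
        double[] later = fgcColumns(laterRow);
        double events = later[0] - earlier[0];
        return events > 0 ? (later[1] - earlier[1]) / events : Double.NaN;
    }

    public static void main(String[] args) {
        // Two consecutive samples from the excerpt above (FGC goes from 32 to 33).
        String earlier = " 0.00  49.42  97.37  14.00  54.79   4329  678.366    32    8.687  687.053";
        String later   = "20.95   0.00  47.42  14.00  54.79   4330  678.452    33    9.855  688.308";
        System.out.printf(Locale.ROOT, "FGCT per FGC: %.3f s%n", secondsPerFgc(earlier, later));
    }
}
```

Because jstat samples only every two seconds, several FGC increments can land in a single row, which is why the helper divides by the FGC difference rather than assuming one event per sample.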

(4) Old generation occupancy stabilizes at 19.11 for 4050 seconds; during that time 19.11 appears 2023 times and 19.12 appears 22 times.
 36.91   0.00  95.30  19.11  54.85  16164 2385.023 41476 19396.571 21781.594
  0.00  44.63  56.32  19.11  54.85  16165 2385.177 41476 19397.458 21782.635
 26.73  44.63 100.00  19.11  54.85  16166 2385.177 41477 19398.296 21783.473
 26.73   0.00  55.35  19.11  54.85  16166 2385.247 41478 19398.421 21783.668
 26.73   0.00  97.62  19.11  54.85  16166 2385.247 41479 19399.228 21784.476
  0.00  19.76  51.00  19.11  54.85  16167 2385.323 41480 19399.299 21784.622
  0.00  19.76  87.59  19.11  54.85  16167 2385.323 41481 19399.935 21785.258
 42.19   0.00  40.78  19.11  54.85  16168 2385.551 41482 19400.717 21786.267
 42.19   0.00  79.05  19.11  54.85  16168 2385.551 41483 19400.717 21786.267
  0.00  30.74  41.10  19.11  54.85  16169 2385.686 41484 19401.702 21787.388
  0.00  30.74  75.95  19.11  54.85  16169 2385.686 41485 19402.405 21788.091

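The key question is that 19.11 looks like a critical point. What does it mean?

In one CMS cycle, during the interval T1 ~ T2, old generation space of size S is reclaimed,
while in the same interval (T1 ~ T2) the young generation promotes live objects of size S into the old generation.
The old generation is 460800K in total, which can be read from the GC log: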
[GC [1 CMS-initial-mark: 64848K(460800K)] 174392K(1075200K), 0.1478125 secs] [Times: user=0.14 sys=0.00, real=0.14 secs]

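During (T2 - T1), the objects promoted from the young generation to the old generation amount to 460800K * (19.11% - 14%) = 23546.88K, i.e. about 23.55M (taking 1M = 1000K).

T2: CMS-concurrent-reset, the point at which a CMS cycle ends
T1: CMS-initial-mark, the point at which a CMS cycle starts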

67020.573: [GC [1 CMS-initial-mark: 88063K(460800K)] 457313K(1075200K), 0.6802608 secs] [Times: user=0.63 sys=0.00, real=0.67 secs]
67021.254: [CMS-concurrent-mark-start]
67021.834: [CMS-concurrent-mark: 0.580/0.580 secs] [Times: user=2.66 sys=0.38, real=0.58 secs]
67021.834: [CMS-concurrent-preclean-start]
67021.839: [CMS-concurrent-preclean: 0.004/0.005 secs] [Times: user=0.05 sys=0.06, real=0.01 secs]
67021.839: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 67021.908: [CMS-concurrent-abortable-preclean: 0.068/0.069 secs] [Times: user=0.20 sys=0.06, real=0.06 secs]
67021.915: [GC[YG occupancy: 482922 K (614400 K)]67021.915: [Rescan (parallel) , 0.7962089 secs]67022.711: [weak refs processing, 0.0009871 secs] [1 CMS-remark: 88063K(460800K)] 570985K(1075200K), 0.7974718 secs] [Times: user=0.80 sys=0.00, real=0.80 secs]
67022.713: [CMS-concurrent-sweep-start]
67022.760: [GC 67022.760: [ParNew: 494421K->41137K(614400K), 0.1106140 secs] 582484K->129201K(1075200K), 0.1110208 secs] [Times: user=0.56 sys=0.00, real=0.11 secs]
67023.025: [CMS-concurrent-sweep: 0.198/0.312 secs] [Times: user=1.42 sys=0.27, real=0.31 secs]
67023.025: [CMS-concurrent-reset-start]
67023.040: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.03 sys=0.01, real=0.01 secs]
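Reading T1 and T2 off the log by hand gets tedious, so here is a rough sketch of how every (T2 - T1) could be pulled out of gc.log in one pass. The class name and the regular expression are my own and are only matched against the log format shown above; other JVM versions or flags may print these lines differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CmsCycleDurations {

    // Group 1: timestamp. Group 2 is set for CMS-initial-mark (T1),
    // group 3 for the end of CMS-concurrent-reset (T2). The "-start" line
    // of the reset phase is skipped because it has no ':' after the name.
    private static final Pattern EVENT = Pattern.compile(
        "(\\d+\\.\\d+): \\[(?:GC \\[1 (CMS-initial-mark)|(CMS-concurrent-reset):)");

    /** Returns T2 - T1 in seconds for every complete CMS cycle found in the log text. */
    static List<Double> cycleDurations(String gcLog) {
        List<Double> durations = new ArrayList<>();
        double start = Double.NaN;
        Matcher m = EVENT.matcher(gcLog);
        while (m.find()) {
            double timestamp = Double.parseDouble(m.group(1));
            if (m.group(2) != null) {
                start = timestamp;                 // T1: cycle begins
            } else if (!Double.isNaN(start)) {
                durations.add(timestamp - start);  // T2: cycle ends
                start = Double.NaN;
            }
        }
        return durations;
    }

    public static void main(String[] args) {
        String log =
            "67020.573: [GC [1 CMS-initial-mark: 88063K(460800K)] 457313K(1075200K), 0.6802608 secs]\n"
          + "67023.025: [CMS-concurrent-reset-start]\n"
          + "67023.040: [CMS-concurrent-reset: 0.015/0.015 secs]\n";
        for (double seconds : cycleDurations(log)) {
            System.out.printf(java.util.Locale.ROOT, "cycle took %.3f s%n", seconds);
        }
    }
}
```

The matcher scans the whole text rather than line by line, so it also works when several log entries end up on one physical line.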

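With old generation occupancy at 19.11%, five randomly chosen samples of (T2 - T1) are 2.315, 2.656, 1.181, 2.349 and 2.467 seconds; using the complete data would be more accurate.
The average is avg = 2.1936, which gives an interim conclusion: within a CMS cycle of roughly 2.1936 seconds, the young generation promotes about 23.55M of data into the old generation.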
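The arithmetic behind this interim conclusion can be written down in a few lines. This is only a sketch with illustrative names; the promotion rate it prints at the end is my own derived figure (promoted volume divided by the average cycle length) and rests on the same assumption as the text, namely that occupancy is steady at 19.11% with a 14% trigger.

```java
import java.util.Locale;

public class PromotionEstimate {

    /** Old generation capacity times the gap between steady and trigger occupancy. */
    static double promotedKb(double oldGenCapacityKb, double steadyPercent, double triggerPercent) {
        return oldGenCapacityKb * (steadyPercent - triggerPercent) / 100.0;
    }

    /** Arithmetic mean of the sampled cycle lengths. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        double promoted = promotedKb(460800, 19.11, 14.0);
        double avgCycleSeconds = mean(new double[] {2.315, 2.656, 1.181, 2.349, 2.467});
        System.out.printf(Locale.ROOT, "promoted per cycle: %.2f K%n", promoted);
        System.out.printf(Locale.ROOT, "average cycle: %.4f s%n", avgCycleSeconds);
        // Derived figure, not stated in the notes: promoted volume per second.
        System.out.printf(Locale.ROOT, "promotion rate: %.0f K/s%n", promoted / avgCycleSeconds);
    }
}
```

Swapping the five samples for the full list of cycle lengths from gc.log would tighten the estimate.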

Next test plan:
Adjust the value of -XX:CMSInitiatingOccupancyFraction and try 15, 16, 17, 18, 19 and 20, with the goal of lowering the old generation collection frequency.
