[Performance Tuning] Notes on Reducing Young GC Pause Peaks

Service environment

  • CPU: 24 cores
  • Service memory: 16G

Problem recap

Young GC pause time keeps growing until a mixed GC is triggered, after which it returns to normal, as shown below.
[Figure 1]

The detailed GC logs are as follows.

Before MixedGC
2020-01-12T19:37:30.333+0800: 192703.245: [SoftReference, 0 refs, 0.0000815 secs]2020-01-12T19:37:30.333+0800: 192703.245: [WeakReference, 2 refs, 0.0000126 secs]2020-01-12T19:37:30.334+0800: 192703.245: [FinalReference, 28 refs, 0.0000426 secs]2020-01-12T19:37:30.334+0800: 192703.245: [PhantomReference, 0 refs, 58 refs, 0.0000277 secs]2020-01-12T19:37:30.334+0800: 192703.245: [JNI Weak Reference, 0.0000331 secs], 0.2385635 secs]
   [Parallel Time: 232.4 ms, GC Workers: 18]
      [GC Worker Start (ms): Min: 192703012.3, Avg: 192703012.5, Max: 192703012.8, Diff: 0.5]
      [Ext Root Scanning (ms): Min: 155.8, Avg: 159.7, Max: 164.0, Diff: 8.2, Sum: 2874.1]
      [Update RS (ms): Min: 24.9, Avg: 26.0, Max: 26.5, Diff: 1.6, Sum: 467.4]
         [Processed Buffers: Min: 25, Avg: 38.4, Max: 66, Diff: 41, Sum: 691]
      [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.6]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 41.6, Avg: 45.9, Max: 49.6, Diff: 8.0, Sum: 826.5]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.3]
         [Termination Attempts: Min: 1, Avg: 8.3, Max: 13, Diff: 12, Sum: 150]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum: 1.6]
      [GC Worker Total (ms): Min: 231.5, Avg: 231.8, Max: 232.0, Diff: 0.5, Sum: 4171.6]
      [GC Worker End (ms): Min: 192703244.2, Avg: 192703244.3, Max: 192703244.3, Diff: 0.1]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.4 ms]
   [Other: 5.7 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.5 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.2 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 1.1 ms]
   [Eden: 3192.0M(3192.0M)->0.0B(3192.0M) Survivors: 80.0M->80.0M Heap: 10.4G(16.0G)->7494.1M(16.0G)]
Heap after GC invocations=10527 (full 0):
 garbage-first heap   total 16777216K, used 7673982K [0x00000003c0000000, 0x00000003c0804000, 0x00000007c0000000)
  region size 8192K, 10 young (81920K), 10 survivors (81920K)
 Metaspace       used 113051K, capacity 114336K, committed 114816K, reserved 1150976K
  class space    used 13294K, capacity 13594K, committed 13696K, reserved 1048576K
}
 [Times: user=4.15 sys=0.00, real=0.24 secs]


After MixedGC
2020-01-12T19:37:51.074+0800: 192723.985: [SoftReference, 0 refs, 0.0000829 secs]2020-01-12T19:37:51.074+0800: 192723.985: [WeakReference, 3 refs, 0.0000134 secs]2020-01-12T19:37:51.074+0800: 192723.985: [FinalReference, 28 refs, 0.0000514 secs]2020-01-12T19:37:51.074+0800: 192723.985: [PhantomReference, 0 refs, 64 refs, 0.0000221 secs]2020-01-12T19:37:51.074+0800: 192723.985: [JNI Weak Reference, 0.0000299 secs], 0.0240489 secs]
   [Parallel Time: 18.6 ms, GC Workers: 18]
      [GC Worker Start (ms): Min: 192723966.5, Avg: 192723966.7, Max: 192723966.9, Diff: 0.4]
      [Ext Root Scanning (ms): Min: 2.3, Avg: 2.6, Max: 3.0, Diff: 0.7, Sum: 46.1]
      [Update RS (ms): Min: 3.2, Avg: 3.5, Max: 3.9, Diff: 0.7, Sum: 63.3]
         [Processed Buffers: Min: 8, Avg: 20.9, Max: 61, Diff: 53, Sum: 377]
      [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.4]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 11.5, Avg: 11.8, Max: 12.0, Diff: 0.5, Sum: 213.0]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 18]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 2.2]
      [GC Worker Total (ms): Min: 17.9, Avg: 18.1, Max: 18.3, Diff: 0.5, Sum: 326.1]
      [GC Worker End (ms): Min: 192723984.7, Avg: 192723984.8, Max: 192723984.8, Diff: 0.1]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.4 ms]
   [Other: 5.0 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.4 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.2 ms]
      [Humongous Register: 2.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 1.1 ms]
   [Eden: 3200.0M(3200.0M)->0.0B(3192.0M) Survivors: 72.0M->80.0M Heap: 4205.6M(16.0G)->1011.7M(16.0G)]
Heap after GC invocations=10533 (full 0):
 garbage-first heap   total 16777216K, used 1035991K [0x00000003c0000000, 0x00000003c0804000, 0x00000007c0000000)
  region size 8192K, 10 young (81920K), 10 survivors (81920K)
 Metaspace       used 113051K, capacity 114336K, committed 114816K, reserved 1150976K
  class space    used 13294K, capacity 13594K, committed 13696K, reserved 1048576K
}
 [Times: user=0.22 sys=0.10, real=0.02 secs]

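Analysis

  1. Why does a mixed GC bring young GC back to normal?
  2. How can we tune the service so that it stays stable?

Question 1: Why does a mixed GC bring young GC back to normal?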

Root set: includes registers, call stacks and global variables
Remembered set (RSet): records references from old regions into young regions

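A young GC can be described in the following steps (in the logs above, root and remembered-set work is reported as Ext Root Scanning, Update RS and Scan RS, and copying of live objects as Object Copy):

  1. Initial mark: traverse the root set and the remembered set and mark live objects (STW)
  2. Concurrent mark: traverse Eden and Survivor regions, mark live objects and add them to the CSet
  3. Remark: use SATB (snapshot-at-the-beginning) to process objects that changed during step 2, mark live objects and add them to the CSet (STW)
  4. Final cleanup: partly STW, partly concurrent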

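Overall, steps 1 and 3 are where most of the STW time is spent.
The time spent in step 1 depends mainly on two variables: RootSet.size and RememberSet.size.
The time spent in step 3 depends mainly on the size of the SATB queue; because concurrent marking is fast, this time is negligible.
The time spent in step 4 depends mainly on the size of the CSet and on SurvivorsRegion.size.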

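Therefore, comparing the logs before and after the mixed GC (Ext Root Scanning averages 159.7 ms vs 2.6 ms and Update RS 26.0 ms vs 3.5 ms, while the Eden size is essentially unchanged), the main thing that changed is the size of the remembered set.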
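To make the before/after comparison easier to repeat, here is a minimal Java sketch that pulls the `Avg` value out of each per-phase line of a G1 log in the format shown above. The class name `GcPhaseDiff`, the method `phaseAverages` and the regular expression are illustrative assumptions, not part of any JDK tool, and the sketch only handles the JDK 8 style `-XX:+PrintGCDetails` phase lines quoted in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts per-phase averages from G1 phase lines.
class GcPhaseDiff {
    // Matches e.g. "[Ext Root Scanning (ms): Min: 155.8, Avg: 159.7, ..."
    private static final Pattern PHASE =
            Pattern.compile("\\[([A-Za-z ]+) \\(ms\\): Min: [0-9.]+, Avg: ([0-9.]+),");

    /** Returns phase name -> average milliseconds, in order of appearance. */
    static Map<String, Double> phaseAverages(String log) {
        Map<String, Double> result = new LinkedHashMap<>();
        Matcher m = PHASE.matcher(log);
        while (m.find()) {
            result.put(m.group(1), Double.parseDouble(m.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Lines copied from the "Before MixedGC" log above.
        String before = String.join("\n",
                "[Ext Root Scanning (ms): Min: 155.8, Avg: 159.7, Max: 164.0, Diff: 8.2, Sum: 2874.1]",
                "[Update RS (ms): Min: 24.9, Avg: 26.0, Max: 26.5, Diff: 1.6, Sum: 467.4]",
                "[Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.6]",
                "[Object Copy (ms): Min: 41.6, Avg: 45.9, Max: 49.6, Diff: 8.0, Sum: 826.5]");
        // Lines copied from the "After MixedGC" log above.
        String after = String.join("\n",
                "[Ext Root Scanning (ms): Min: 2.3, Avg: 2.6, Max: 3.0, Diff: 0.7, Sum: 46.1]",
                "[Update RS (ms): Min: 3.2, Avg: 3.5, Max: 3.9, Diff: 0.7, Sum: 63.3]",
                "[Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.4]",
                "[Object Copy (ms): Min: 11.5, Avg: 11.8, Max: 12.0, Diff: 0.5, Sum: 213.0]");

        Map<String, Double> b = phaseAverages(before);
        Map<String, Double> a = phaseAverages(after);
        for (Map.Entry<String, Double> e : b.entrySet()) {
            double afterAvg = a.getOrDefault(e.getKey(), Double.NaN);
            System.out.printf("%-18s before=%6.1f ms  after=%6.1f ms%n",
                    e.getKey(), e.getValue(), afterAvg);
        }
    }
}
```

Printing the phases side by side makes it easy to see which ones account for the pause difference between two collections.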

From the point of view of generational collection:

  1. Eden: temporary, short-lived memory
  2. Survivor: medium-lived memory
  3. Old: long-lived and permanent memory

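For this service, the ideal distribution of memory would be:

  1. Eden: temporary variables with a very short lifetime
  2. Survivor: all Event requests tied to a connection, with a medium lifetime
  3. Old: PlayFramework components, service components, configuration and so on, whose lifetime is almost the same as that of the service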

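The heap summary recorded after the first mixed GC once the service went live is taken as the space the old generation really needs:
[Eden: 3200.0M(3200.0M)->0.0B(3200.0M) Survivors: 72.0M->72.0M Heap: 5480.9M(16.0G)->1005.6M(16.0G)]
When the service is about to reach the mixed GC threshold, the heap actually looks like this:
[Eden: 3192.0M(3192.0M)->0.0B(3192.0M) Survivors: 80.0M->80.0M Heap: 10.4G(16.0G)->7494.1M(16.0G)]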

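So while the service is running, generational collection is not actually working well: the heap left after a young GC grows from about 1005.6M to 7494.1M between mixed GCs, which means objects that should have died in the young generation are being promoted to the old generation.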
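The gap between the two heap summary lines can be quantified with a small sketch. It assumes that, right after a young GC, Eden is empty, so "heap after GC minus Survivors after GC" approximates the old generation (including any humongous regions). The class `OldGenEstimate` and its methods are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: estimates old-generation occupancy from a G1 heap summary line.
class OldGenEstimate {
    // Captures the survivor size after GC and the heap size after GC.
    private static final Pattern SUMMARY = Pattern.compile(
            "Survivors: [0-9.]+[BKMG]->([0-9.]+)([BKMG]) "
            + "Heap: [0-9.]+[BKMG]\\([0-9.]+[BKMG]\\)->([0-9.]+)([BKMG])\\(");

    /** Converts a value with a G1 log unit suffix (B, K, M, G) to MiB. */
    static double toMiB(double value, String unit) {
        switch (unit) {
            case "B": return value / (1024.0 * 1024.0);
            case "K": return value / 1024.0;
            case "G": return value * 1024.0;
            default:  return value; // "M"
        }
    }

    /** Approximate old-generation occupancy after the GC, in MiB. */
    static double oldAfterGcMiB(String summaryLine) {
        Matcher m = SUMMARY.matcher(summaryLine);
        if (!m.find()) {
            throw new IllegalArgumentException("not a G1 heap summary line: " + summaryLine);
        }
        double survivors = toMiB(Double.parseDouble(m.group(1)), m.group(2));
        double heap = toMiB(Double.parseDouble(m.group(3)), m.group(4));
        return heap - survivors;
    }

    public static void main(String[] args) {
        String afterFirstMixed =
                "[Eden: 3200.0M(3200.0M)->0.0B(3200.0M) Survivors: 72.0M->72.0M Heap: 5480.9M(16.0G)->1005.6M(16.0G)]";
        String beforeNextMixed =
                "[Eden: 3192.0M(3192.0M)->0.0B(3192.0M) Survivors: 80.0M->80.0M Heap: 10.4G(16.0G)->7494.1M(16.0G)]";
        double low = oldAfterGcMiB(afterFirstMixed);
        double high = oldAfterGcMiB(beforeNextMixed);
        System.out.printf("old after first mixed GC: %.1f MiB%n", low);
        System.out.printf("old before next mixed GC: %.1f MiB%n", high);
        System.out.printf("growth between mixed GCs: %.1f MiB%n", high - low);
    }
}
```

The growth figure is a rough indicator of how much data was promoted and later turned out to be garbage.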

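The current maximum tenuring threshold is -XX:MaxTenuringThreshold=15.
An object that survives longer than T = 15 * (young GC interval) is promoted to the old generation.
Measured young GC interval at peak load: Min = 1s, Avg = 3s
Number of sessions by lifetime, from one day of service data (2020-01-12):

100s = 259
60s = 290
30s = 576
15s = 9463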
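The formula T = 15 * (young GC interval) can be worked through for the two measured intervals. This is a minimal sketch with a hypothetical class name `TenuringWindow`; it gives an upper bound only, because the JVM may promote objects earlier when the survivor space fills up.

```java
// Hypothetical helper: how long an object can stay in the young generation
// before it reaches the tenuring threshold, given a fixed young GC interval.
class TenuringWindow {
    /** Upper bound, in seconds, on the time an object spends in young before promotion. */
    static int maxYoungLifetimeSeconds(int maxTenuringThreshold, int youngGcIntervalSeconds) {
        return maxTenuringThreshold * youngGcIntervalSeconds;
    }

    public static void main(String[] args) {
        int threshold = 15;    // -XX:MaxTenuringThreshold=15 from the post
        int minInterval = 1;   // seconds, measured minimum at peak load
        int avgInterval = 3;   // seconds, measured average at peak load
        System.out.println("T at min interval: "
                + maxYoungLifetimeSeconds(threshold, minInterval) + "s");
        System.out.println("T at avg interval: "
                + maxYoungLifetimeSeconds(threshold, avgInterval) + "s");
    }
}
```

With these inputs T is 15s at the minimum interval and 45s at the average interval, so sessions that live longer than that window are candidates for promotion to the old generation.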

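Question 2: How can we tune the service so that it stays stable?

Based on the analysis of question 1, the problem can be approached from the following angles:

  1. Generational collection: keep the lifetime of business request objects within Eden and Survivor

  2. Mixed GC trigger: trigger the mixed GC as early as possible so that the remembered set does not grow too large

  3. Business-level optimization: reduce how long connections are kept open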

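Concretely, for the first two items:

  1. Generational collection
    Based on the heap usage after the mixed GC, Min(Old.size) is around 1.5G. Following the analysis above, the suggested configuration is
    -XX:G1NewSizePercent=25 -XX:G1MaxNewSizePercent=75 -XX:MaxTenuringThreshold=25
    Note that G1NewSizePercent and G1MaxNewSizePercent are experimental flags and also need -XX:+UnlockExperimentalVMOptions, and that HotSpot stores the object age in 4 bits, so the effective maximum for MaxTenuringThreshold is 15 and newer JDKs reject larger values at startup.

  2. Mixed GC trigger
    Trigger the mixed GC before the remembered set has grown too large
    -XX:InitiatingHeapOccupancyPercent=35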
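To see what these percentages mean in absolute terms on the 16G heap of this service, here is a small sketch. It assumes the percentages are taken against the total heap capacity (as in JDK 8 G1) and that the previous InitiatingHeapOccupancyPercent was the default of 45, which the post does not state explicitly. `G1SizingSketch` is a hypothetical name.

```java
// Hypothetical helper: converts G1 percentage flags into MiB for a given heap size.
class G1SizingSketch {
    /** Returns the given percentage of the heap, in MiB (rounded down). */
    static long percentOfHeapMiB(long heapMiB, int percent) {
        return heapMiB * percent / 100;
    }

    public static void main(String[] args) {
        long heapMiB = 16L * 1024; // 16G heap from the service environment

        // Occupancy at which a concurrent marking cycle (and later mixed GCs) starts.
        System.out.println("IHOP 45 (assumed default): " + percentOfHeapMiB(heapMiB, 45) + " MiB");
        System.out.println("IHOP 35 (suggested):       " + percentOfHeapMiB(heapMiB, 35) + " MiB");

        // Bounds on the young generation size.
        System.out.println("G1NewSizePercent=25:       " + percentOfHeapMiB(heapMiB, 25) + " MiB");
        System.out.println("G1MaxNewSizePercent=75:    " + percentOfHeapMiB(heapMiB, 75) + " MiB");
    }
}
```

Under these assumptions the marking threshold moves from roughly 7372 MiB to roughly 5734 MiB; the 7494.1M heap occupancy seen in the "Before MixedGC" log sits just above the assumed 45% default.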

Results after rollout

[Figure 2]

The overall peak dropped from 80 ms to 30 ms, and the service became more stable.
