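A hands-on JVM tuning case study, driven by troubleshooting a Dubbo timeout in production.
A Dubbo interface in production threw the following exception. Note that the timeout is reported by the calling (client) side: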
org.apache.dubbo.rpc.RpcException: Failfast invoke providers ... RandomLoadBalance select from all providers ... but no luck to perform the invocation. Last error is: Invoke remote method timeout. ... cause: org.apache.dubbo.remoting.TimeoutException: Sending request timeout in client-side by scan timer. start time: 2023-01-17 11:31:24.131, end time: 2023-01-17 11:31:25.670, elapsed: 1539 ms, timeout: 800 ms, ...
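Possible causes:
1. GC.
A JVM garbage collection can stop the world, which suspends the business threads.
2. Network jitter.
Even when both provider and consumer are online, a brief network fluctuation can make a request from the consumer time out.
3. The Dubbo thread pool is full.
The server logs contain no related exceptions, so this is ruled out.
4. Large messages.
By default Dubbo uses a single TCP connection between one consumer and one provider. TCP is reliable and ordered and splits data into segments, so while a large message is being sent in pieces it blocks the other requests on the same connection, and those requests time out.
With the default connection count (one TCP connection per consumer/provider pair), check the logs or the bandwidth graphs for large payloads. If a single message is large, for example over 1M, timeouts become very likely under high QPS. At the micro level, messages on one TCP channel are still sent sequentially, and the transport layer cuts each message into N packets of at most about 1500 bytes (the typical Ethernet MTU). Until that message has been transmitted completely, TCP's reliability and ordering guarantees force all other requests to wait in the queue, so they time out.
5. High concurrent traffic.
Under heavy traffic the TCP connection between consumer and provider cannot keep up, so requests to the provider queue up and time out.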
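To get a feel for cause 4, the sketch below estimates how long one large message occupies a single connection. The 1460-byte payload per segment (a 1500-byte MTU minus typical IPv4 and TCP headers) and the 100 Mbit/s effective bandwidth are assumptions for illustration, not values measured on this system.

```python
import math

def segments_needed(payload_bytes: int, mss_bytes: int = 1460) -> int:
    """Number of TCP segments needed to carry the payload (assumed MSS)."""
    return math.ceil(payload_bytes / mss_bytes)

def wire_time_ms(payload_bytes: int, bandwidth_mbit_per_s: float) -> float:
    """Time the payload occupies the link, ignoring headers, ACKs and retransmits."""
    bits = payload_bytes * 8
    return bits / (bandwidth_mbit_per_s * 1_000_000) * 1000

payload = 1024 * 1024                    # one 1M message, as in the example above
segments = segments_needed(payload)
millis = wire_time_ms(payload, 100)      # assumed 100 Mbit/s effective bandwidth
print(segments, round(millis, 1))
```

Under these assumptions a single 1M message needs several hundred segments and holds the connection for tens of milliseconds, so a handful of such messages queued ahead of a small request is enough to approach the 800 ms timeout.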
When the exception occurred
start time: 2023-01-17 11:31:24.131, end time: 2023-01-17 11:31:25.670, elapsed: 1539 ms, timeout: 800 ms
GC log
2023-01-17T11:31:24.133+0800: 1038821.026: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 1.5310688 secs]
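A young GC was running at that moment and took 1.53 s. Its start and end line up with the start and end of the failed request, so the young GC is almost certainly the cause. But the service uses the G1 collector with a pause target of 200 ms, so why did this GC take so long? Let's dig further.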
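The correlation between the request window and the GC pause can be checked mechanically. This is a minimal sketch: the function name `overlap_ms` is made up for illustration, and it assumes both logs use the same time zone (the GC log is stamped +0800).

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

def overlap_ms(req_start: str, req_end: str, gc_start: str, gc_secs: float) -> float:
    """Milliseconds during which the request window and the GC pause overlap."""
    rs = datetime.strptime(req_start, FMT)
    re_ = datetime.strptime(req_end, FMT)
    gs = datetime.strptime(gc_start, FMT)
    ge = gs + timedelta(seconds=gc_secs)
    overlap = min(re_, ge) - max(rs, gs)
    return max(overlap.total_seconds() * 1000, 0.0)

# Values copied from the Dubbo exception and the GC log header above.
ms = overlap_ms("2023-01-17 11:31:24.131", "2023-01-17 11:31:25.670",
                "2023-01-17 11:31:24.133", 1.5310688)
print(round(ms), "ms of the 1539 ms request were spent inside the GC pause")
```

Nearly the whole request falls inside the pause, which supports the conclusion that the GC, not the network or the provider, consumed the 800 ms budget.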
2023-01-17T11:31:24.133+0800: 1038821.026: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 1.5310688 secs]
[Parallel Time: 609.9 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 1038821026.8, Avg: 1038821026.9, Max: 1038821027.0, Diff: 0.2]
[Ext Root Scanning (ms): Min: 0.7, Avg: 1.0, Max: 2.6, Diff: 1.8, Sum: 13.6]
[Update RS (ms): Min: 12.4, Avg: 13.8, Max: 14.7, Diff: 2.3, Sum: 179.6]
[Processed Buffers: Min: 71, Avg: 87.8, Max: 106, Diff: 35, Sum: 1141]
[Scan RS (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 2.8]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 594.0, Avg: 594.5, Max: 594.7, Diff: 0.7, Sum: 7728.8]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.5]
[Termination Attempts: Min: 1, Avg: 1.1, Max: 2, Diff: 1, Sum: 14]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
[GC Worker Total (ms): Min: 609.6, Avg: 609.7, Max: 609.9, Diff: 0.2, Sum: 7926.6]
[GC Worker End (ms): Min: 1038821636.6, Avg: 1038821636.6, Max: 1038821636.6, Diff: 0.1]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.4 ms]
[Other: 920.7 ms]
[Evacuation Failure: 915.7 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 3.0 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.5 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.5 ms]
[Free CSet: 0.4 ms]
[Eden: 2176.0M(2360.0M)->0.0B(1920.0M) Survivors: 96.0M->0.0B Heap: 3764.9M(4096.0M)->1750.9M(4096.0M)]
[Times: user=2.67 sys=0.00, real=1.53 secs]
Object Copy took 594 ms and Evacuation Failure took 915 ms; almost the entire pause was spent in these two phases.
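The share of each phase can be pulled out of the log with a small parser. This is a sketch that only handles the two line shapes shown in the excerpt (`[Name: X ms ...]` and `[Name (ms): ... Avg: X ...]`); it is not a general G1 log parser.

```python
import re

LOG = """\
[Parallel Time: 609.9 ms, GC Workers: 13]
[Object Copy (ms): Min: 594.0, Avg: 594.5, Max: 594.7, Diff: 0.7, Sum: 7728.8]
[Clear CT: 0.4 ms]
[Other: 920.7 ms]
[Evacuation Failure: 915.7 ms]
"""
PAUSE_MS = 1531.0688  # from the header: 1.5310688 secs

def phase_ms(log: str) -> dict:
    """Map phase name to milliseconds (the Avg value for per-worker phases)."""
    phases = {}
    for name, value in re.findall(r"\[([A-Za-z ]+): ([\d.]+) ms", log):
        phases[name] = float(value)
    for name, avg in re.findall(r"\[([A-Za-z ]+) \(ms\):.*?Avg: ([\d.]+)", log):
        phases[name] = float(avg)
    return phases

phases = phase_ms(LOG)
for name in ("Object Copy", "Evacuation Failure"):
    print(f"{name}: {100 * phases[name] / PAUSE_MS:.1f}% of the pause")
```

Parallel Time, Clear CT and Other add up to about 1531 ms, which matches the reported pause, so no significant time is unaccounted for.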
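to-space exhausted (an object evacuation failure) means there were no free regions left to allocate for the old generation, the survivor space, or both, and the Java heap had already reached its maximum size and could not expand. It only happens while G1 is copying live objects from the source space to the destination space.
Put simply: the old generation or the survivor space ran out of room.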
[Eden: 2176.0M(2360.0M)->0.0B(1920.0M) Survivors: 96.0M->0.0B Heap: 3764.9M(4096.0M)->1750.9M(4096.0M)]
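Old generation before the GC: (3764.9 - 2176.0 - 96) = 1493M in use; (4096 - 2360 - 96) = 1640M maximum size. Keep in mind that G1 also holds back 10% of the heap as a reserve by default (-XX:G1ReservePercent=10). At most (1640 - 1493) = 147M was free.
Old generation after the GC: (1750.9 - 0 - 0) = 1751M in use; (4096 - 1920 - 0) = 2176M maximum size. Usage grew by (1751 - 1493) = 258M.
The young GC ended in to-space exhausted because the old generation did not have enough free memory. This young GC emptied Eden and the survivor space, and because of the Evacuation Failure G1 also retagged part of the young regions directly as old regions.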
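The subtraction above can be automated for any heap summary line. The sketch below derives old generation usage as heap minus Eden minus survivors, so the result includes humongous regions; the function name `old_gen_mb` is made up for illustration.

```python
import re

SIZE = r"([\d.]+)([BKMG])"
UNIT_MB = {"B": 1 / (1024 * 1024), "K": 1 / 1024, "M": 1.0, "G": 1024.0}
PATTERN = re.compile(
    rf"Eden: {SIZE}\({SIZE}\)->{SIZE}\({SIZE}\) "
    rf"Survivors: {SIZE}->{SIZE} "
    rf"Heap: {SIZE}\({SIZE}\)->{SIZE}\({SIZE}\)"
)

def old_gen_mb(line: str):
    """Return (old_before, old_after) in MB, computed as heap - eden - survivors."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a G1 heap summary line")
    # Groups come in (number, unit) pairs: 4 for Eden, 2 for Survivors, 4 for Heap.
    vals = [float(m.group(i)) * UNIT_MB[m.group(i + 1)] for i in range(1, 21, 2)]
    eden_b, _, eden_a, _, surv_b, surv_a, heap_b, _, heap_a, _ = vals
    return heap_b - eden_b - surv_b, heap_a - eden_a - surv_a

LINE = ("[Eden: 2176.0M(2360.0M)->0.0B(1920.0M) Survivors: 96.0M->0.0B "
        "Heap: 3764.9M(4096.0M)->1750.9M(4096.0M)]")
before, after = old_gen_mb(LINE)
print(round(before, 1), round(after, 1), round(after - before, 1))
```

Running it over every summary line of a log gives the old generation trend without doing the arithmetic by hand.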
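to-space exhausted generally occurs in a young GC when the survivor space and the old generation do not have enough room for the surviving objects, and it makes the young GC pause much longer. The collector turns every region that could not be copied into an old region. It is usually followed by a full GC or mixed GC that collects across the heap, but not every time. My guess is that since JDK 8u60 humongous objects are reclaimed at the end of a young collection (after evacuation), which frees memory and brings heap usage back to a workable level.
Evacuation failure. Evacuation literally means clearing out or withdrawing; in G1 it can be read as moving objects to other regions. This is a form of promotion failure (Promotion Failed): the old generation is exhausted before the garbage collector can free enough space, that is, the old generation fills up faster than the collector reclaims it. For G1 this is very expensive.
Promotion failure: during a young GC the survivor space cannot hold the objects, so they have to go to the old generation, but the old generation cannot hold them either. Because of the concurrentMarkSweepThread (in CMS) and the allocation guarantee mechanism, the conditions for this are strict: it takes the remaining old space being filled up within a short time. Another case is promotion failure caused by memory fragmentation: the young GC assumes the old generation has enough space, but at allocation time a promoted large object cannot find a contiguous block.
Space allocation guarantee: if many objects survive a minor GC and the survivor space is too small, the guarantee mechanism moves the overflow to the old generation early. The old generation can only act as guarantor if it still has room for those objects. Since the number of survivors cannot be known in advance, the average size of the objects promoted in previous collections is used as an empirical value and compared with the free space in the old generation.
In GC terminology, parallel means several GC threads working together, and concurrent means GC threads running at the same time as the business threads.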
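Start the mixed collection cycle earlier by lowering -XX:InitiatingHeapOccupancyPercent=N (default 45). An evacuation failure costs far more than running a few extra concurrent marking cycles. Do not set it too low, though, or the result is too many concurrent cycles and mixed collections, which add pauses for the service.
Before JDK 8b12, -XX:InitiatingHeapOccupancyPercent is the ratio of whole-heap usage to total heap capacity.
From JDK 8b12 on (including 8b12 and the major versions 9, 10, 11, ...), -XX:InitiatingHeapOccupancyPercent is the ratio of old generation size to total heap capacity.
After this change, the condition that triggers global concurrent marking focuses on when the old gen will become unable to grow, rather than simply on how much of the whole heap is left. After all, the purpose of global concurrent marking is to let G1 mixed GCs find suitable old gen regions to collect, so marking has to finish before the old gen can no longer grow (at which point it can hardly be collected).
The JDK running this service is 8u161, later than 8b12, so the second definition applies: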
# /data/services/jdk8u161/bin/java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
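To make the InitiatingHeapOccupancyPercent discussion concrete, the sketch below computes the old generation level at which marking would start under the post-8b12 definition (old generation occupancy compared with a percentage of the total heap). It is simplified arithmetic; the JVM performs this check at specific points and with its own accounting.

```python
def ihop_threshold_mb(heap_mb: float, ihop_percent: int) -> float:
    """Old generation occupancy (MB) at which concurrent marking is requested."""
    return heap_mb * ihop_percent / 100

old_before_gc_mb = 1493                            # from the first incident
default_threshold = ihop_threshold_mb(4096, 45)    # -Xmx4096m with the default 45
tuned_threshold = ihop_threshold_mb(5120, 35)      # candidate: -Xmx5120m with 35

print(default_threshold, old_before_gc_mb < default_threshold)
print(tuned_threshold)
```

With the 4096M heap and the default of 45, the old generation at 1493M was still below the threshold when the failing young GC started, which is consistent with the log showing no marking cycle or mixed GC before the incident.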
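Increase the number of GC threads by raising -XX:ConcGCThreads. Do not set it too high, or the GC threads take too much CPU away from the business threads.
[Parallel Time: 609.9 ms, GC Workers: 13] shows 13 GC worker threads, so there is no need to increase it. Note that GC Workers counts the parallel stop-the-world workers (-XX:ParallelGCThreads), not the concurrent marking threads set by -XX:ConcGCThreads.
If evacuation fails because there is not enough space to hold the surviving and newly promoted objects, and to-space exhausted happens frequently, consider raising -XX:G1ReservePercent (default 10%).
In the GC logs, GCs with to-space exhausted are rare.
Increase the heap.
The container has 8G of memory and the maximum heap is -Xmx4096m. Without resizing the container, the maximum heap can reasonably be raised to 5G.
If too many humongous objects (objects larger than half a region) are fragmenting the old generation, increase the region size, for example -XX:G1HeapRegionSize=4M. -XX:G1HeapRegionSize must be a power of two, between 1M and 32M. Note: the code itself should avoid creating humongous objects in the first place!
The GCs before and after this one show no GC triggered by G1 Humongous Allocation.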
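To see what counts as a humongous object here, the sketch below approximates how G1 picks its default region size: about 1/2048 of the average of initial and maximum heap, rounded down to a power of two and clamped to 1M..32M. This reflects my understanding of the JDK 8 heuristic, and it assumes -Xms equals -Xmx; the effective value can be confirmed in the output of -XX:+PrintFlagsFinal.

```python
def default_region_size_mb(xms_mb: int, xmx_mb: int) -> int:
    """Approximate default G1 region size in MB (assumed JDK 8 heuristic)."""
    target = max((xms_mb + xmx_mb) // 2 // 2048, 1)
    size = 1 << (target.bit_length() - 1)    # round down to a power of two
    return min(max(size, 1), 32)

def humongous_threshold_kb(region_mb: int) -> int:
    """Objects larger than half a region are allocated as humongous."""
    return region_mb * 1024 // 2

for heap_mb in (4096, 5120):
    region = default_region_size_mb(heap_mb, heap_mb)
    print(heap_mb, region, humongous_threshold_kb(region))
print(humongous_threshold_kb(4))             # with -XX:G1HeapRegionSize=4M
```

Both heap sizes give 2M regions under this heuristic, which matches the Eden and survivor sizes in the logs being multiples of 2M. Any object above roughly 1M is then humongous, and 4M regions would double that limit.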
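To summarize the options:
1. Start the mixed collection cycle earlier: lower -XX:InitiatingHeapOccupancyPercent=N (default 45).
2. Increase the number of GC threads: raise -XX:ConcGCThreads.
3. If evacuation fails for lack of space for surviving and newly promoted objects and to-space exhausted is frequent, raise -XX:G1ReservePercent (default 10%).
4. Increase the heap.
5. If too many humongous objects are fragmenting the old generation, increase the region size, for example -XX:G1HeapRegionSize=4M.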
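Taken together: around this to-space exhausted / Evacuation Failure GC there were no GCs triggered by humongous allocation, to-space exhausted was not frequent, 13 GC threads are enough, and there was no mixed GC and no full GC. The most likely explanation is an occasional burst of concurrent traffic that left the heap short of memory.
So options 1 and 4 were adopted: lower the threshold to -XX:InitiatingHeapOccupancyPercent=35 and raise the maximum heap to -Xmx5120m. The resulting JVM parameters are: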
-Xms5120m
-Xmx5120m
-XX:MetaspaceSize=512M
-XX:MaxDirectMemorySize=1024M
-Xss256k
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=35
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/tmp/jvm/heapdump.hprof
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:/tmp/jvm/gc-%t.log
-Djava.awt.headless=true
-Djava.net.preferIPv4Stack=true
-Duser.timezone=Asia/Shanghai
-Dfile.encoding=UTF-8
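Before raising the heap inside a fixed 8G container, it is worth checking that the other memory areas still fit. The sketch below adds up the main consumers from the parameters above. The thread count of 500 is a hypothetical figure, and treating 512M as the metaspace ceiling is an assumption: -XX:MetaspaceSize only sets the initial threshold, not a limit.

```python
def jvm_budget_mb(heap_mb, direct_mb, metaspace_mb, threads, stack_kb):
    """Rough upper bound of the main JVM memory areas, in MB."""
    stacks_mb = threads * stack_kb / 1024
    return heap_mb + direct_mb + metaspace_mb + stacks_mb

CONTAINER_MB = 8 * 1024
used = jvm_budget_mb(
    heap_mb=5120,       # -Xmx5120m
    direct_mb=1024,     # -XX:MaxDirectMemorySize=1024M
    metaspace_mb=512,   # assumed ceiling, see the note above
    threads=500,        # hypothetical thread count
    stack_kb=256,       # -Xss256k
)
headroom = CONTAINER_MB - used
print(used, headroom)
```

The remaining headroom has to cover everything not listed here, such as the code cache, G1's own data structures and the operating system, so the real margin is smaller than the number printed.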
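Do not pile on JVM flags. -Xmn, for example, prevents G1 from resizing the young generation automatically. Setting only -Xms, -Xmx and the pause target -XX:MaxGCPauseMillis is enough; remove any extra flags such as -Xmn, -XX:NewSize, -XX:MaxNewSize and -XX:SurvivorRatio.
If the marking cycle does not start early enough to reclaim the old generation, lower -XX:InitiatingHeapOccupancyPercent (default 45%); a lower value starts the marking cycle earlier. Conversely, if marking cycles start early but reclaim little memory, raise the threshold above the default.
If the concurrent marking cycle starts on time but takes a long time to finish, increase the number of concurrent marking threads with -XX:ConcGCThreads. Its default is derived from -XX:ParallelGCThreads (roughly a quarter of it).
If there are many to-space exhausted or to-space overflow GC events, increase -XX:G1ReservePercent. The default is 10% of the Java heap. Note: G1 caps this value at 50%.
The flags -XX:+HeapDumpAfterFullGC and -XX:+HeapDumpOnOutOfMemoryError dump the Java heap when a full GC or an OOM occurs, which helps with analysis afterwards.
With the new parameters in place, the timeout appeared again about four weeks later. Dubbo timeout log: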
Caused by: org.apache.dubbo.remoting.TimeoutException: Sending request timeout in client-side by scan timer. start time: 2023-02-14 11:30:36.886, end time: 2023-02-14 11:30:37.750, client elapsed: 864 ms, server elapsed: 0 ms, timeout: 800 ms
2023-02-14T11:30:36.887+0800: 76802.887: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 0.8614319 secs]
[Parallel Time: 82.8 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 76802887.5, Avg: 76802890.4, Max: 76802897.1, Diff: 9.6]
[Ext Root Scanning (ms): Min: 0.0, Avg: 2.6, Max: 16.0, Diff: 16.0, Sum: 34.4]
[Update RS (ms): Min: 1.8, Avg: 13.6, Max: 16.9, Diff: 15.1, Sum: 176.2]
[Processed Buffers: Min: 33, Avg: 100.5, Max: 156, Diff: 123, Sum: 1306]
[Scan RS (ms): Min: 0.0, Avg: 0.4, Max: 1.5, Diff: 1.5, Sum: 5.3]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.2]
[Object Copy (ms): Min: 61.5, Avg: 62.7, Max: 69.4, Diff: 8.0, Sum: 814.9]
[Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 2.3]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 13]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 1.7]
[GC Worker Total (ms): Min: 72.8, Avg: 79.6, Max: 82.7, Diff: 9.9, Sum: 1035.0]
[GC Worker End (ms): Min: 76802969.9, Avg: 76802970.0, Max: 76802970.2, Diff: 0.3]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.2 ms]
[Other: 777.4 ms]
[Evacuation Failure: 766.7 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 7.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 1.0 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 1.1 ms]
[Free CSet: 0.8 ms]
[Eden: 2662.0M(3048.0M)->0.0B(256.0M) Survivors: 24.0M->0.0B Heap: 4593.4M(5120.0M)->1521.2M(5120.0M)]
[Times: user=1.92 sys=0.00, real=0.86 secs]
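Object Copy took 62 ms and Evacuation Failure took 766 ms. That is an improvement over the first incident, but it is worth analyzing further.
Old generation before the GC: (4593 - 2662 - 24) = 1907M in use; maximum available: (5120 - 3048 - 24) = 2048M.
Old generation after the GC: (1521 - 0 - 0) = 1521M in use; maximum available: (5120 - 256 - 0) = 4864M.
After the young GC the old generation's usage went down, which means some humongous regions in the old generation were reclaimed.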
2023-02-14T11:26:43.656+0800: 76569.655: [GC pause (G1 Humongous Allocation) (young) (initial-mark), 0.1026882 secs]
[Parallel Time: 96.2 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 76569656.4, Avg: 76569657.5, Max: 76569658.8, Diff: 2.4]
[Ext Root Scanning (ms): Min: 2.5, Avg: 5.3, Max: 11.7, Diff: 9.3, Sum: 68.7]
[Update RS (ms): Min: 13.6, Avg: 19.1, Max: 21.0, Diff: 7.5, Sum: 248.2]
[Processed Buffers: Min: 49, Avg: 103.6, Max: 188, Diff: 139, Sum: 1347]
[Scan RS (ms): Min: 0.0, Avg: 5.0, Max: 60.9, Diff: 60.9, Sum: 65.3]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Object Copy (ms): Min: 0.4, Avg: 47.4, Max: 53.4, Diff: 53.0, Sum: 616.7]
[Termination (ms): Min: 0.0, Avg: 9.6, Max: 18.4, Diff: 18.4, Sum: 124.5]
[Termination Attempts: Min: 1, Avg: 1.2, Max: 2, Diff: 1, Sum: 15]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
[GC Worker Total (ms): Min: 84.1, Avg: 86.4, Max: 95.5, Diff: 11.4, Sum: 1123.8]
[GC Worker End (ms): Min: 76569742.9, Avg: 76569744.0, Max: 76569751.9, Diff: 9.0]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.0 ms]
[Other: 5.4 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 2.0 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 1.7 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.4 ms]
[Free CSet: 0.6 ms]
[Eden: 1050.0M(3050.0M)->0.0B(3048.0M) Survivors: 22.0M->24.0M Heap: 2581.4M(5120.0M)->977.9M(5120.0M)]
[Times: user=0.27 sys=0.01, real=0.11 secs]
2023-02-14T11:26:43.759+0800: 76569.759: [GC concurrent-root-region-scan-start]
2023-02-14T11:26:43.770+0800: 76569.769: [GC concurrent-root-region-scan-end, 0.0108536 secs]
2023-02-14T11:26:43.770+0800: 76569.769: [GC concurrent-mark-start]
2023-02-14T11:26:43.992+0800: 76569.992: [GC concurrent-mark-end, 0.2220597 secs]
2023-02-14T11:26:43.994+0800: 76569.993: [GC remark 2023-02-14T11:26:43.994+0800: 76569.993: [Finalize Marking, 0.0518922 secs] 2023-02-14T11:26:44.046+0800: 76570.045: [GC ref-proc, 0.0050567 secs] 2023-02-14T11:26:44.051+0800: 76570.050: [Unloading, 0.0947543 secs], 0.1533979 secs]
[Times: user=0.30 sys=0.00, real=0.16 secs]
2023-02-14T11:26:44.149+0800: 76570.149: [GC cleanup 991M->746M(5120M), 0.0088812 secs]
[Times: user=0.02 sys=0.00, real=0.01 secs]
2023-02-14T11:26:44.158+0800: 76570.158: [GC concurrent-cleanup-start]
2023-02-14T11:26:44.159+0800: 76570.158: [GC concurrent-cleanup-end, 0.0002724 secs]
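The log shows a global concurrent marking cycle triggered by G1 Humongous Allocation. The initial-mark pause reclaimed (2581 - 977) = 1604M of heap, 1048M of it from the young generation and 556M from the old generation; the cleanup phase reclaimed (991 - 746) = 245M. When concurrent marking finished, the old generation was at an estimated 746M. At the next GC it was at 1907M before collection, so about 1100M of humongous regions were allocated in between: humongous objects were being allocated continuously.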
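The figures in this paragraph, plus a rough growth rate, can be reproduced with the arithmetic below. It follows the text in treating the 746M left after cleanup as the old generation baseline, which is an approximation, and it uses the JVM uptime stamps from the log lines to measure elapsed time.

```python
# Figures copied from the log lines above (MB, and JVM uptime in seconds).
heap_before, heap_after = 2581.4, 977.9               # initial-mark pause
eden_before, surv_before, surv_after = 1050.0, 22.0, 24.0
cleanup_before, cleanup_after = 991.0, 746.0          # GC cleanup
old_at_next_gc = 4593.4 - 2662.0 - 24.0               # to-space exhausted GC
t_cleanup, t_next_gc = 76570.149, 76802.887

total_reclaimed = heap_before - heap_after
young_reclaimed = eden_before + surv_before - surv_after
old_reclaimed = total_reclaimed - young_reclaimed
cleanup_reclaimed = cleanup_before - cleanup_after
growth = old_at_next_gc - cleanup_after
rate_mb_per_s = growth / (t_next_gc - t_cleanup)

print(round(total_reclaimed, 1), round(young_reclaimed, 1), round(old_reclaimed, 1))
print(round(cleanup_reclaimed, 1), round(growth, 1), round(rate_mb_per_s, 1))
```

Under that approximation the old generation, humongous regions included, grew by roughly 5M per second for almost four minutes, which fits the picture of a sustained stream of large objects rather than a single spike.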
2023-02-14T11:30:58.930+0800: 76824.929: [GC pause (G1 Evacuation Pause) (mixed), 0.0326652 secs]
[Parallel Time: 16.5 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 76824929.9, Avg: 76824934.8, Max: 76824940.8, Diff: 10.8]
[Ext Root Scanning (ms): Min: 0.0, Avg: 1.3, Max: 3.7, Diff: 3.7, Sum: 16.8]
[Update RS (ms): Min: 1.7, Avg: 6.2, Max: 10.9, Diff: 9.2, Sum: 80.0]
[Processed Buffers: Min: 8, Avg: 24.9, Max: 54, Diff: 46, Sum: 324]
[Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.6]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Object Copy (ms): Min: 0.0, Avg: 1.4, Max: 9.6, Diff: 9.6, Sum: 17.6]
[Termination (ms): Min: 0.0, Avg: 1.0, Max: 1.9, Diff: 1.9, Sum: 12.9]
[Termination Attempts: Min: 1, Avg: 3.1, Max: 7, Diff: 6, Sum: 40]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
[GC Worker Total (ms): Min: 3.9, Avg: 9.9, Max: 14.7, Diff: 10.8, Sum: 129.1]
[GC Worker End (ms): Min: 76824944.6, Avg: 76824944.7, Max: 76824945.5, Diff: 0.8]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 4.0 ms]
[Other: 12.2 ms]
[Choose CSet: 0.1 ms]
[Ref Proc: 9.9 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 1.4 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.3 ms]
[Eden: 256.0M(256.0M)->0.0B(3068.0M) Survivors: 0.0B->4096.0K Heap: 1777.2M(5120.0M)->1246.1M(5120.0M)]
[Times: user=0.10 sys=0.00, real=0.03 secs]
2023-02-14T11:36:01.992+0800: 77127.992: [GC pause (G1 Evacuation Pause) (young), 0.0184167 secs]
[Parallel Time: 12.3 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 77127992.3, Avg: 77127999.2, Max: 77128004.1, Diff: 11.7]
[Ext Root Scanning (ms): Min: 0.0, Avg: 1.1, Max: 3.8, Diff: 3.8, Sum: 14.7]
[Update RS (ms): Min: 0.0, Avg: 2.1, Max: 5.2, Diff: 5.2, Sum: 27.0]
[Processed Buffers: Min: 0, Avg: 22.2, Max: 89, Diff: 89, Sum: 288]
[Scan RS (ms): Min: 0.0, Avg: 0.3, Max: 1.1, Diff: 1.1, Sum: 3.4]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
[Object Copy (ms): Min: 0.0, Avg: 0.4, Max: 1.2, Diff: 1.2, Sum: 5.6]
[Termination (ms): Min: 0.0, Avg: 1.0, Max: 1.6, Diff: 1.6, Sum: 13.3]
[Termination Attempts: Min: 1, Avg: 1.5, Max: 4, Diff: 3, Sum: 20]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.0]
[GC Worker Total (ms): Min: 0.1, Avg: 5.0, Max: 12.0, Diff: 11.9, Sum: 65.1]
[GC Worker End (ms): Min: 77128004.1, Avg: 77128004.2, Max: 77128004.3, Diff: 0.2]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.2 ms]
[Other: 4.9 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 1.6 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 1.0 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 1.7 ms]
[Eden: 3068.0M(3068.0M)->0.0B(3068.0M) Survivors: 4096.0K->4096.0K Heap: 4314.1M(5120.0M)->1244.8M(5120.0M)]
A mixed GC then ran, and after it the normal young GCs resumed. Key lines:
2023-02-14T11:30:58.930+0800: 76824.929: [GC pause (G1 Evacuation Pause) (mixed), 0.0326652 secs]
[Eden: 256.0M(256.0M)->0.0B(3068.0M) Survivors: 0.0B->4096.0K Heap: 1777.2M(5120.0M)->1246.1M(5120.0M)]
2023-02-14T11:36:01.992+0800: 77127.992: [GC pause (G1 Evacuation Pause) (young), 0.0184167 secs]
[Eden: 3068.0M(3068.0M)->0.0B(3068.0M) Survivors: 4096.0K->4096.0K Heap: 4314.1M(5120.0M)->1244.8M(5120.0M)]
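The mixed GC reclaimed the 256M young generation and (1777 - 1246) = 531M of heap in total, and the young generation's capacity was resized from 256M to 3068M. After the mixed GC the old generation stayed at about 1245M with almost no fluctuation, so humongous objects were no longer being allocated: the traffic burst carrying large objects was over.
Putting together the logs of this GC and of the GCs before and after it, the cause this time was memory fragmentation from a burst of humongous-object traffic, which left the heap short of memory.
Looking at the whole day's GC log, Humongous Allocation events are few, and long GCs (to-space exhausted) caused by them are fewer still. The business can tolerate such rare long pauses, so no action is taken for now. If it could not, the heap could be enlarged. If Humongous Allocation were frequent, the region size could be increased, for example -XX:G1HeapRegionSize=4M.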
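Scanning a whole day's log by eye is tedious, so a small script can count pause causes and list the slow ones. This sketch only understands the `[GC pause (...) (young|mixed) ..., N secs]` header format shown in this article (JDK 8 with -XX:+PrintGCDetails), and the 0.2 s threshold simply mirrors -XX:MaxGCPauseMillis=200.

```python
import re
from collections import Counter

PAUSE = re.compile(
    r"\[GC pause \((?P<cause>[^)]+)\) \((?P<kind>young|mixed)\)"
    r"(?P<flags>(?: \([^)]+\))*), (?P<secs>[\d.]+) secs\]"
)

def summarize(lines, slow_secs=0.2):
    """Count pauses per cause, count to-space exhausted events, list slow pauses."""
    causes, exhausted, slow = Counter(), 0, []
    for line in lines:
        m = PAUSE.search(line)
        if m is None:
            continue
        causes[m["cause"]] += 1
        if "to-space exhausted" in m["flags"]:
            exhausted += 1
        if float(m["secs"]) > slow_secs:
            slow.append((line.split(": ")[0], float(m["secs"])))
    return causes, exhausted, slow

# Header lines copied from the logs in this article.
SAMPLE = [
    "2023-02-14T11:26:43.656+0800: 76569.655: [GC pause (G1 Humongous Allocation) (young) (initial-mark), 0.1026882 secs]",
    "2023-02-14T11:30:36.887+0800: 76802.887: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 0.8614319 secs]",
    "2023-02-14T11:30:58.930+0800: 76824.929: [GC pause (G1 Evacuation Pause) (mixed), 0.0326652 secs]",
    "2023-02-14T11:36:01.992+0800: 77127.992: [GC pause (G1 Evacuation Pause) (young), 0.0184167 secs]",
]
causes, exhausted, slow = summarize(SAMPLE)
print(dict(causes), exhausted, slow)
```

To run it on a real file, pass the open file object instead of `SAMPLE`, for example `summarize(open(path))` with `path` pointing at the GC log.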
https://github.com/cncounter/translation/blob/master/tiemao_2020/06_g1_gc_tuning/README.md
https://www.infoq.com/articles/tuning-tips-G1-GC/
https://www.oracle.com/technical-resources/articles/java/g1gc.html
http://dengchengchao.com/?p=1411
2023-01-17T11:04:50.767+0800: 1037227.660: [GC pause (G1 Evacuation Pause) (young), 0.0776798 secs]
[Parallel Time: 74.2 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 1037227660.9, Avg: 1037227660.9, Max: 1037227661.0, Diff: 0.2]
[Ext Root Scanning (ms): Min: 0.8, Avg: 1.0, Max: 2.5, Diff: 1.8, Sum: 13.5]
[Update RS (ms): Min: 0.0, Avg: 1.0, Max: 1.4, Diff: 1.4, Sum: 13.4]
[Processed Buffers: Min: 0, Avg: 23.8, Max: 51, Diff: 51, Sum: 309]
[Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.5]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Object Copy (ms): Min: 13.0, Avg: 13.3, Max: 13.4, Diff: 0.3, Sum: 173.4]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 13]
[GC Worker Other (ms): Min: 0.0, Avg: 40.5, Max: 58.6, Diff: 58.6, Sum: 526.7]
[GC Worker Total (ms): Min: 15.5, Avg: 56.0, Max: 74.1, Diff: 58.6, Sum: 728.5]
[GC Worker End (ms): Min: 1037227676.5, Avg: 1037227717.0, Max: 1037227735.0, Diff: 58.6]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.4 ms]
[Other: 3.0 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 1.3 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.3 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.8 ms]
[Eden: 2358.0M(2358.0M)->0.0B(2358.0M) Survivors: 98.0M->98.0M Heap: 3092.0M(4096.0M)->734.1M(4096.0M)]
[Times: user=0.21 sys=0.00, real=0.08 secs]
2023-01-17T11:15:24.856+0800: 1037861.750: [GC pause (G1 Evacuation Pause) (young), 0.0888294 secs]
[Parallel Time: 85.1 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 1037861750.0, Avg: 1037861750.1, Max: 1037861750.2, Diff: 0.2]
[Ext Root Scanning (ms): Min: 0.7, Avg: 1.0, Max: 2.4, Diff: 1.8, Sum: 12.6]
[Update RS (ms): Min: 0.0, Avg: 1.0, Max: 1.4, Diff: 1.4, Sum: 13.6]
[Processed Buffers: Min: 0, Avg: 23.3, Max: 39, Diff: 39, Sum: 303]
[Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.4]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Object Copy (ms): Min: 13.7, Avg: 13.9, Max: 14.0, Diff: 0.3, Sum: 180.8]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.2, Max: 2, Diff: 1, Sum: 15]
[GC Worker Other (ms): Min: 0.0, Avg: 58.2, Max: 68.9, Diff: 68.9, Sum: 757.0]
[GC Worker Total (ms): Min: 16.1, Avg: 74.3, Max: 85.0, Diff: 68.9, Sum: 965.4]
[GC Worker End (ms): Min: 1037861766.2, Avg: 1037861824.4, Max: 1037861835.0, Diff: 68.9]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.5 ms]
[Other: 3.3 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 2.0 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.2 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.6 ms]
[Eden: 2358.0M(2358.0M)->0.0B(2360.0M) Survivors: 98.0M->96.0M Heap: 3092.1M(4096.0M)->732.1M(4096.0M)]
[Times: user=0.22 sys=0.00, real=0.09 secs]
2023-01-17T11:27:16.401+0800: 1038573.294: [GC pause (G1 Evacuation Pause) (young), 0.0494262 secs]
[Parallel Time: 40.4 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 1038573294.7, Avg: 1038573294.8, Max: 1038573294.9, Diff: 0.2]
[Ext Root Scanning (ms): Min: 0.7, Avg: 1.0, Max: 2.6, Diff: 1.9, Sum: 13.6]
[Update RS (ms): Min: 0.0, Avg: 1.0, Max: 1.4, Diff: 1.4, Sum: 13.2]
[Processed Buffers: Min: 0, Avg: 23.5, Max: 45, Diff: 45, Sum: 305]
[Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.4]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 15.7, Avg: 21.1, Max: 38.0, Diff: 22.3, Sum: 273.8]
[Termination (ms): Min: 0.0, Avg: 16.9, Max: 22.0, Diff: 22.0, Sum: 219.6]
[Termination Attempts: Min: 1, Avg: 2.5, Max: 4, Diff: 3, Sum: 32]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6]
[GC Worker Total (ms): Min: 40.1, Avg: 40.2, Max: 40.2, Diff: 0.1, Sum: 522.2]
[GC Worker End (ms): Min: 1038573334.9, Avg: 1038573335.0, Max: 1038573335.0, Diff: 0.1]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.5 ms]
[Other: 8.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 6.9 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.3 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.7 ms]
[Eden: 2360.0M(2360.0M)->0.0B(2360.0M) Survivors: 96.0M->96.0M Heap: 3092.1M(4096.0M)->733.1M(4096.0M)]
[Times: user=0.24 sys=0.00, real=0.05 secs]
2023-01-17T11:31:24.133+0800: 1038821.026: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 1.5310688 secs]
[Parallel Time: 609.9 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 1038821026.8, Avg: 1038821026.9, Max: 1038821027.0, Diff: 0.2]
[Ext Root Scanning (ms): Min: 0.7, Avg: 1.0, Max: 2.6, Diff: 1.8, Sum: 13.6]
[Update RS (ms): Min: 12.4, Avg: 13.8, Max: 14.7, Diff: 2.3, Sum: 179.6]
[Processed Buffers: Min: 71, Avg: 87.8, Max: 106, Diff: 35, Sum: 1141]
[Scan RS (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 2.8]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 594.0, Avg: 594.5, Max: 594.7, Diff: 0.7, Sum: 7728.8]
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.5]
[Termination Attempts: Min: 1, Avg: 1.1, Max: 2, Diff: 1, Sum: 14]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3]
[GC Worker Total (ms): Min: 609.6, Avg: 609.7, Max: 609.9, Diff: 0.2, Sum: 7926.6]
[GC Worker End (ms): Min: 1038821636.6, Avg: 1038821636.6, Max: 1038821636.6, Diff: 0.1]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.4 ms]
[Other: 920.7 ms]
[Evacuation Failure: 915.7 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 3.0 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.5 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.5 ms]
[Free CSet: 0.4 ms]
[Eden: 2176.0M(2360.0M)->0.0B(1920.0M) Survivors: 96.0M->0.0B Heap: 3764.9M(4096.0M)->1750.9M(4096.0M)]
[Times: user=2.67 sys=0.00, real=1.53 secs]
2023-01-17T11:37:50.433+0800: 1039207.327: [GC pause (G1 Evacuation Pause) (young), 0.1167556 secs]
[Parallel Time: 114.3 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 1039207327.4, Avg: 1039207327.5, Max: 1039207327.6, Diff: 0.2]
[Ext Root Scanning (ms): Min: 0.7, Avg: 1.0, Max: 2.4, Diff: 1.7, Sum: 13.2]
[Update RS (ms): Min: 107.3, Avg: 108.6, Max: 109.0, Diff: 1.7, Sum: 1411.8]
[Processed Buffers: Min: 78, Avg: 114.6, Max: 153, Diff: 75, Sum: 1490]
[Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.4]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 4.2, Avg: 4.3, Max: 4.3, Diff: 0.1, Sum: 55.6]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 2, Avg: 6.2, Max: 12, Diff: 10, Sum: 81]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum: 1.2]
[GC Worker Total (ms): Min: 114.0, Avg: 114.1, Max: 114.2, Diff: 0.2, Sum: 1483.2]
[GC Worker End (ms): Min: 1039207441.5, Avg: 1039207441.6, Max: 1039207441.7, Diff: 0.1]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.3 ms]
[Other: 2.1 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.9 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.3 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.5 ms]
[Eden: 1920.0M(1920.0M)->0.0B(1874.0M) Survivors: 0.0B->28.0M Heap: 3670.9M(4096.0M)->1777.9M(4096.0M)]
[Times: user=0.41 sys=0.00, real=0.12 secs]
2023-01-17T11:42:39.813+0800: 1039496.706: [GC pause (G1 Evacuation Pause) (young), 0.0133004 secs]
[Parallel Time: 11.0 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 1039496706.7, Avg: 1039496706.8, Max: 1039496706.9, Diff: 0.2]
[Ext Root Scanning (ms): Min: 0.8, Avg: 1.0, Max: 2.5, Diff: 1.8, Sum: 13.6]
[Update RS (ms): Min: 0.0, Avg: 0.9, Max: 1.2, Diff: 1.2, Sum: 12.2]
[Processed Buffers: Min: 0, Avg: 22.8, Max: 49, Diff: 49, Sum: 296]
[Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.2]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 8.3, Avg: 8.6, Max: 8.7, Diff: 0.4, Sum: 112.3]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 5.6, Max: 10, Diff: 9, Sum: 73]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum: 1.1]
[GC Worker Total (ms): Min: 10.7, Avg: 10.8, Max: 10.9, Diff: 0.2, Sum: 140.6]
[GC Worker End (ms): Min: 1039496717.5, Avg: 1039496717.6, Max: 1039496717.7, Diff: 0.1]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.4 ms]
[Other: 1.9 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.7 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.3 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.4 ms]
[Eden: 1874.0M(1874.0M)->0.0B(1830.0M) Survivors: 28.0M->50.0M Heap: 3651.9M(4096.0M)->1800.9M(4096.0M)]
[Times: user=0.15 sys=0.00, real=0.01 secs]
2023-01-17T11:48:28.638+0800: 1039845.532: [GC pause (G1 Evacuation Pause) (young), 0.0137210 secs]
[Parallel Time: 11.3 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 1039845532.1, Avg: 1039845532.2, Max: 1039845532.2, Diff: 0.2]
[Ext Root Scanning (ms): Min: 1.0, Avg: 2.1, Max: 2.9, Diff: 1.9, Sum: 27.9]
[Update RS (ms): Min: 0.2, Avg: 0.9, Max: 1.9, Diff: 1.7, Sum: 11.2]
[Processed Buffers: Min: 3, Avg: 22.6, Max: 68, Diff: 65, Sum: 294]
[Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.3]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 7.9, Avg: 7.9, Max: 8.0, Diff: 0.1, Sum: 103.2]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Termination Attempts: Min: 8, Avg: 12.5, Max: 21, Diff: 13, Sum: 162]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.0]
[GC Worker Total (ms): Min: 11.0, Avg: 11.1, Max: 11.3, Diff: 0.3, Sum: 144.7]
[GC Worker End (ms): Min: 1039845543.2, Avg: 1039845543.3, Max: 1039845543.4, Diff: 0.1]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.3 ms]
[Other: 2.1 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.8 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.3 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.5 ms]
[Eden: 1830.0M(1830.0M)->0.0B(1822.0M) Survivors: 50.0M->54.0M Heap: 3630.9M(4096.0M)->1804.9M(4096.0M)]
[Times: user=0.13 sys=0.00, real=0.01 secs]