测试同事反馈测试环境集群中某个服务挂了, 查看日志没有任何异常,业务日志正常打印,然后突然程序挂了。
首先查看业务日志是正常打印的,
然后想到去查看linux日志信息
通过grep -C 50 java messsages
通过linux日志可以看到被系统killed
此时怀疑是否程序jvm配置的资源内存不够导致被linux系统杀掉了 ,
进入程序目录发现生成了一个hs_error_pid28351.log
这个文件的生成应该是jvm退出的时候生成的,通过more hs_error_pid28351.log 查看
fatal error: OutOfMemory encountered: Requested array size exceeds VM limit
此时程序退出了,不知道是哪些对象把jvm内存耗完了。。, 然后又去看了下程序启动 jvm的配置参数
JAVA_OPTS="-Xms256m -Xmx24g -Xmn8g -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:log/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=2 -XX:GCLogFileSize=100M -XX:+CrashOnOutOfMemoryError"
其中 -XX:+CrashOnOutOfMemoryError 当发生oom后程序会立即退出,JVM还会生成文本和二进制崩溃文件
也打印了gc详细信息,去查看了 下gc日志,发现gc到某个时刻, gc日志不打印了, 同时发现上面的日志经过多次 gc后,日志内存还是无法释放,
2022-06-07T15:40:27.089+0800: 64.273: [Full GC (Ergonomics) [PSYoungGen: 88064K->88024K(175104K)] [ParOldGen: 786319K->786319K(786432K)] 874383K->874344K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.2765189 secs] [Times: user=8.19 sys=0.00, real=1.28 secs]
2022-06-07T15:40:28.366+0800: 65.550: [Full GC (Ergonomics) [PSYoungGen: 88064K->88041K(175104K)] [ParOldGen: 786319K->786319K(786432K)] 874383K->874361K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.2386290 secs] [Times: user=8.59 sys=0.00, real=1.24 secs]
2022-06-07T15:40:29.605+0800: 66.789: [Full GC (Ergonomics) [PSYoungGen: 88064K->88044K(175104K)] [ParOldGen: 786319K->786319K(786432K)] 874383K->874364K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.9070903 secs] [Times: user=9.89 sys=0.03, real=1.91 secs]
2022-06-07T15:40:31.513+0800: 68.697: [Full GC (Ergonomics) [PSYoungGen: 88064K->88044K(175104K)] [ParOldGen: 786319K->786319K(786432K)] 874383K->874364K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.2258152 secs] [Times: user=8.14 sys=0.00, real=1.23 secs]
2022-06-07T15:40:32.739+0800: 69.923: [Full GC (Ergonomics) [PSYoungGen: 88064K->88045K(175104K)] [ParOldGen: 786319K->786319K(786432K)] 874383K->874364K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.8916665 secs] [Times: user=9.02 sys=0.02, real=1.88 secs]
2022-06-07T15:40:34.631+0800: 71.815: [Full GC (Ergonomics) [PSYoungGen: 88064K->88050K(175104K)] [ParOldGen: 786319K->786319K(786432K)] 874383K->874369K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.8324609 secs] [Times: user=9.25 sys=0.05, real=1.83 secs]
2022-06-07T15:40:36.464+0800: 73.648: [Full GC (Ergonomics) [PSYoungGen: 88064K->88050K(175104K)] [ParOldGen: 786319K->786318K(786432K)] 874383K->874368K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.8423418 secs] [Times: user=8.45 sys=0.03, real=1.84 secs]
2022-06-07T15:40:38.307+0800: 75.491: [Full GC (Ergonomics) [PSYoungGen: 88064K->88053K(175104K)] [ParOldGen: 786318K->786318K(786432K)] 874382K->874371K(961536K), [Metaspace: 93032K->93032K(1132544K)], 2.3724553 secs] [Times: user=10.83 sys=0.05, real=2.36 secs]
2022-06-07T15:40:40.680+0800: 77.864: [Full GC (Ergonomics) [PSYoungGen: 88064K->88054K(175104K)] [ParOldGen: 786318K->786318K(786432K)] 874382K->874373K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.7755906 secs] [Times: user=9.44 sys=0.00, real=1.77 secs]
2022-06-07T15:40:42.456+0800: 79.640: [Full GC (Ergonomics) [PSYoungGen: 88064K->88057K(175104K)] [ParOldGen: 786318K->786318K(786432K)] 874382K->874376K(961536K), [Metaspace: 93032K->93032K(1132544K)], 3.1007377 secs] [Times: user=11.95 sys=0.06, real=3.09 secs]
2022-06-07T15:40:45.557+0800: 82.741: [Full GC (Ergonomics) [PSYoungGen: 88064K->88059K(175104K)] [ParOldGen: 786318K->786318K(786432K)] 874382K->874377K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.2195819 secs] [Times: user=8.74 sys=0.00, real=1.22 secs]
2022-06-07T15:40:46.777+0800: 83.961: [Full GC (Ergonomics) [PSYoungGen: 88064K->88062K(175104K)] [ParOldGen: 786318K->786318K(786432K)] 874382K->874381K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.2158907 secs] [Times: user=8.30 sys=0.03, real=1.22 secs]
2022-06-07T15:40:47.993+0800: 85.177: [Full GC (Ergonomics) [PSYoungGen: 88064K->88062K(175104K)] [ParOldGen: 786318K->786318K(786432K)] 874382K->874381K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.2364385 secs] [Times: user=8.42 sys=0.00, real=1.24 secs]
2022-06-07T15:40:49.230+0800: 86.414: [Full GC (Ergonomics) [PSYoungGen: 88064K->88063K(175104K)] [ParOldGen: 786318K->786318K(786432K)] 874382K->874381K(961536K), [Metaspace: 93032K->93032K(1132544K)], 2.1153734 secs] [Times: user=8.77 sys=0.03, real=2.11 secs]
2022-06-07T15:40:51.346+0800: 88.530: [Full GC (Ergonomics) [PSYoungGen: 88064K->88055K(175104K)] [ParOldGen: 786318K->786318K(786432K)] 874382K->874374K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.2330226 secs] [Times: user=8.31 sys=0.01, real=1.23 secs]
2022-06-07T15:40:52.580+0800: 89.764: [Full GC (Ergonomics) [PSYoungGen: 88064K->88058K(175104K)] [ParOldGen: 786318K->786318K(786432K)] 874382K->874376K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.2285841 secs] [Times: user=8.25 sys=0.02, real=1.23 secs]
2022-06-07T15:40:53.808+0800: 90.993: [Full GC (Ergonomics) [PSYoungGen: 88064K->88059K(175104K)] [ParOldGen: 786318K->786318K(786432K)] 874382K->874377K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.8795535 secs] [Times: user=9.52 sys=0.01, real=1.87 secs]
2022-06-07T15:40:55.688+0800: 92.872: [Full GC (Ergonomics) [PSYoungGen: 88063K->88056K(175104K)] [ParOldGen: 786318K->786318K(786432K)] 874382K->874374K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.8573714 secs] [Times: user=8.75 sys=0.00, real=1.86 secs]
2022-06-07T15:40:57.546+0800: 94.730: [Full GC (Ergonomics) [PSYoungGen: 88063K->88057K(175104K)] [ParOldGen: 786318K->786318K(786432K)] 874381K->874375K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.1935241 secs] [Times: user=8.06 sys=0.02, real=1.19 secs]
2022-06-07T15:40:58.739+0800: 95.924: [Full GC (Ergonomics) [PSYoungGen: 88064K->88057K(175104K)] [ParOldGen: 786318K->786317K(786432K)] 874382K->874374K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.9287577 secs] [Times: user=9.53 sys=0.02, real=1.93 secs]
2022-06-07T15:41:00.669+0800: 97.853: [Full GC (Ergonomics) [PSYoungGen: 88064K->88060K(175104K)] [ParOldGen: 786317K->786316K(786432K)] 874381K->874377K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.8469753 secs] [Times: user=9.25 sys=0.03, real=1.85 secs]
2022-06-07T15:41:02.517+0800: 99.701: [Full GC (Ergonomics) [PSYoungGen: 88064K->88061K(175104K)] [ParOldGen: 786316K->786316K(786432K)] 874380K->874378K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.2223578 secs] [Times: user=8.34 sys=0.00, real=1.22 secs]
2022-06-07T15:41:03.739+0800: 100.923: [Full GC (Ergonomics) [PSYoungGen: 88064K->88062K(175104K)] [ParOldGen: 786316K->786316K(786432K)] 874380K->874379K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.7926885 secs] [Times: user=9.27 sys=0.08, real=1.79 secs]
2022-06-07T15:41:05.532+0800: 102.716: [Full GC (Ergonomics) [PSYoungGen: 88064K->88063K(175104K)] [ParOldGen: 786316K->786316K(786432K)] 874380K->874379K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.8188174 secs] [Times: user=9.42 sys=0.05, real=1.82 secs]
2022-06-07T15:41:07.351+0800: 104.535: [Full GC (Ergonomics) [PSYoungGen: 88064K->88063K(175104K)] [ParOldGen: 786316K->786316K(786432K)] 874380K->874379K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.1728942 secs] [Times: user=8.23 sys=0.00, real=1.17 secs]
2022-06-07T15:41:08.525+0800: 105.709: [Full GC (Ergonomics) [PSYoungGen: 88064K->88063K(175104K)] [ParOldGen: 786316K->786316K(786432K)] 874380K->874379K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.2216016 secs] [Times: user=8.59 sys=0.00, real=1.22 secs]
2022-06-07T15:41:09.747+0800: 106.931: [Full GC (Ergonomics) [PSYoungGen: 88064K->88063K(175104K)] [ParOldGen: 786316K->786316K(786432K)] 874380K->874379K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.2610382 secs] [Times: user=8.86 sys=0.03, real=1.26 secs]
2022-06-07T15:41:11.008+0800: 108.192: [Full GC (Ergonomics) [PSYoungGen: 88064K->88063K(175104K)] [ParOldGen: 786316K->786316K(786432K)] 874380K->874380K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.2108993 secs] [Times: user=8.45 sys=0.02, real=1.21 secs]
2022-06-07T15:41:12.219+0800: 109.403: [Full GC (Ergonomics) [PSYoungGen: 88064K->88063K(175104K)] [ParOldGen: 786316K->786315K(786432K)] 874380K->874379K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.7839150 secs] [Times: user=8.97 sys=0.02, real=1.78 secs]
2022-06-07T15:41:14.003+0800: 111.187: [Full GC (Ergonomics) [PSYoungGen: 88064K->83455K(175104K)] [ParOldGen: 786315K->786269K(786432K)] 874379K->869725K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.5506682 secs] [Times: user=10.41 sys=0.06, real=1.55 secs]
2022-06-07T15:41:15.564+0800: 112.748: [Full GC (Ergonomics) [PSYoungGen: 88064K->85227K(175104K)] [ParOldGen: 786269K->786266K(786432K)] 874333K->871493K(961536K), [Metaspace: 93032K->93032K(1132544K)], 1.8170896 secs] [Times: user=8.78 sys=0.00, real=1.81 secs]
2022-06-07T15:41:17.385+0800: 114.570: [Full GC (Ergonomics) [PSYoungGen: 88064K->85303K(175104K)] [ParOldGen: 786266K->786266K(786432K)] 874330K->871569K(961536K), [Metaspace: 93036K->93036K(1132544K)], 1.8459706 secs] [Times: user=8.89 sys=0.03, real=1.84 secs]
2022-06-07T15:41:19.239+0800: 116.423: [Full GC (Ergonomics) [PSYoungGen: 88064K->86548K(175104K)] [ParOldGen: 786266K->786266K(786432K)] 874330K->872814K(961536K), [Metaspace: 93036K->93036K(1132544K)], 1.2155478 secs] [Times: user=8.05 sys=0.03, real=1.22 secs]
2022-06-07T15:41:20.457+0800: 117.642: [Full GC (Ergonomics) [PSYoungGen: 88006K->87101K(175104K)] [ParOldGen: 786266K->786266K(786432K)] 874272K->873367K(961536K), [Metaspace: 93036K->93036K(1132544K)], 1.2462476 secs] [Times: user=8.86 sys=0.00, real=1.25 secs]
2022-06-07T15:41:21.708+0800: 118.892: [Full GC (Ergonomics) [PSYoungGen: 88064K->87514K(175104K)] [ParOldGen: 786266K->786266K(786432K)] 874330K->873780K(961536K), [Metaspace: 93036K->93036K(1132544K)], 1.2630408 secs] [Times: user=8.73 sys=0.01, real=1.26 secs]
2022-06-07T15:41:22.971+0800: 120.156: [Full GC (Ergonomics) [PSYoungGen: 88064K->87767K(175104K)] [ParOldGen: 786266K->786266K(786432K)] 874330K->874033K(961536K), [Metaspace: 93036K->93036K(1132544K)], 1.2148857 secs] [Times: user=8.34 sys=0.00, real=1.22 secs]
2022-06-07T15:41:24.188+0800: 121.372: [Full GC (Ergonomics) [PSYoungGen: 88064K->87903K(175104K)] [ParOldGen: 786266K->786266K(786432K)] 874330K->874169K(961536K), [Metaspace: 93036K->93036K(1132544K)], 1.2650010 secs] [Times: user=8.56 sys=0.02, real=1.27 secs]
2022-06-07T15:41:25.470+0800: 122.654: [Full GC (Ergonomics) [PSYoungGen: 88064K->0K(175104K)] [ParOldGen: 786266K->32867K(315904K)] 874330K->32867K(491008K), [Metaspace: 93040K->93040K(1132544K)], 0.1703121 secs] [Times: user=0.33 sys=0.06, real=0.17 secs]
gc前后内存都无法回收。
但是并没看到生成二进蹦溃的文件。 通过/var/log/messages 可以看到
Jun 6 17:12:50 localhost abrt-hook-ccpp: Process 28351 (java) of user 0 killed by SIGABRT - dumping core
Jun 6 17:14:13 localhost systemd-logind: Removed session 664882.
Jun 6 17:14:13 localhost systemd: Started Session 665121 of user root.
Jun 6 17:14:13 localhost systemd-logind: New session 665121 of user root.
Jun 6 17:14:14 localhost abrt-server: Executable '/usr/share/jdk1.8/bin/java' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Jun 6 17:14:14 localhost abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2022-06-06-17:12:50-28351' exited with 1
Jun 6 17:14:14 localhost abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2022-06-06-17:12:50-28351'
可以dump core文件,失败了。abrt-server需要配置 ProcessUnpackaged is set to 'no'
这个先不管了。后续再去看。 我还是先更改下 jvm的启动参数
改为
-XX:HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=log/gpl-dump.hprof
当发送oom的时候, 程序不会退出,但是会自动生成堆栈信息,以便于分析问题原因
告知测试继续测试,再出现问题,会生成一些堆栈信息,
过一段时间后测试又反馈,后台服务无法响应,进程还在,此时在log目录下看到了生成的 gpl-dump.hprof
接着把文件down下来使用mat工具分析
打开文件后
通过概览页发现,821.7M的某类对象占用
接着点击Leak_Suspects 查看系统视图
通过keyword的关键信息基本可以可以确定程序执行这个方法生成的大量对象无法回收
此时进一步查看大量对象的信息都是什么
发现大量的linkedhashmap
通过上一步的 DatasetUtil.loadTaskresultBySql 关键方法找到一下代码,通过数据获取的数据rs太多了,然后每条列记录都用LinkedHashMap保存放到list中,导致内存溢出,最后更新程序限制查询条数解决此问题。