OutOfMemoryError: GC Overhead Limit Exceeded Explained


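Simply put, garbage collection (GC) is the process by which the JVM reclaims objects that are no longer in use and frees their memory. The GC Overhead Limit Exceeded error is a member of the java.lang.OutOfMemoryError family and indicates that the JVM has run out of usable heap memory. Below we look at what causes java.lang.OutOfMemoryError: GC Overhead Limit Exceeded and how to fix it.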

Introduction to the GC Overhead Limit Exceeded Error

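OutOfMemoryError is a subclass of java.lang.VirtualMachineError and is thrown when the JVM runs into a problem with resource usage. More specifically, this particular error is thrown when the JVM spends too much time performing GC while reclaiming very little heap memory. According to Oracle's documentation, by default the JVM throws this error if the Java process spends more than 98% of its time in GC and less than 2% of the heap is recovered each time. In other words, the application has used up nearly all available memory, and the garbage collector has repeatedly spent a long time trying to clean it up and failed.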

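When this happens, users experience a very slow application: operations that normally complete in a few milliseconds take much longer, because the CPUs are almost entirely busy with garbage collection and cannot do other work.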
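To make the "share of time spent in GC" idea concrete, here is a minimal sketch that estimates GC time as a fraction of JVM uptime using the standard java.lang.management beans. It is only an approximation for illustration: HotSpot applies its 98%/2% check internally over recent collections, while this hypothetical GcOverheadEstimate class averages over the whole uptime.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Hypothetical helper: rough share of JVM uptime spent in GC, summed over all collectors.
 * This is NOT the check HotSpot itself performs; it is an illustration of the idea.
 */
public class GcOverheadEstimate {

    /** Returns accumulated GC time divided by JVM uptime, clamped to [0, 1]. */
    public static double gcTimeRatio() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if elapsed time is undefined for this collector
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        // Concurrent collectors can report overlapping times, so clamp to 100%
        return Math.min(1.0, (double) gcMillis / uptimeMillis);
    }

    public static void main(String[] args) {
        System.out.printf("GC time / uptime = %.2f%%%n", gcTimeRatio() * 100);
    }
}
```

Calling gcTimeRatio() periodically from a monitoring thread would show the ratio climbing toward 1 as the application approaches this error.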

Reproducing the Error

The following code reproduces the java.lang.OutOfMemoryError: GC Overhead Limit Exceeded error:

package com.galaxy.concurrency.jvm;

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class OutOfMemoryGCLimitExceed {

    public static void addRandomDataToMap() {
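        Map<Integer, String> dataMap = new HashMap<>(); // never cleared, so every entry stays reachable
        Random r = new Random();
        while (true) { // endless loop: the map keeps growing until the heap is exhausted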
            dataMap.put(r.nextInt(), String.valueOf(r.nextInt()));
        }
    }

    public static void main(String[] args) {
        addRandomDataToMap();
    }
}

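The code is simple: an infinite while loop keeps putting random numbers into a HashMap. Before running the main method, set the JVM options to -Xmx300m -XX:+UseParallelGC (a 300 MB heap with the Parallel collector). Running main then fails with java.lang.OutOfMemoryError: GC Overhead Limit Exceeded: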

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.HashMap.newNode(HashMap.java:1747)
	at java.util.HashMap.putVal(HashMap.java:631)
	at java.util.HashMap.put(HashMap.java:612)
	at com.galaxy.concurrency.jvm.OutOfMemoryGCLimitExceed.addRandomDataToMap(OutOfMemoryGCLimitExceed.java:13)
	at com.galaxy.concurrency.jvm.OutOfMemoryGCLimitExceed.main(OutOfMemoryGCLimitExceed.java:18)

I ran this on macOS with 2 CPU cores, 8 GB of RAM and JDK 1.8. Because test environments differ, you may get java.lang.OutOfMemoryError: Java heap space instead; in that case, adjust -Xmx to reproduce java.lang.OutOfMemoryError: GC Overhead Limit Exceeded. For a better understanding of the different garbage collection algorithms, see Oracle's Java Garbage Collection Basics tutorial.
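Because the outcome depends on which heap size and collector the JVM actually picked up, it can help to print them from inside the program before the loop starts. A minimal sketch (EffectiveGcSettings is a hypothetical helper name); note that Runtime.maxMemory() may report slightly less than the -Xmx value:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that reports the heap limit and collectors the running JVM is using. */
public class EffectiveGcSettings {

    /** Maximum heap the JVM will try to use, in MiB (close to, but possibly below, -Xmx). */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** Names of the active garbage collectors as reported by the platform MXBeans. */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
        System.out.println("collectors: " + collectorNames());
    }
}
```

Under -XX:+UseParallelGC on JDK 8, the reported collector names are typically PS Scavenge and PS MarkSweep.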

Solutions

The ideal solution is to find the underlying problem in the application by inspecting the code for possible memory leaks. Questions to ask:

  • What are the objects in the application that occupy large portions of the heap?
  • In which parts of the source code are these objects being allocated?
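In the reproduction code, the leak is that entries are added to the HashMap and never removed. One possible fix, shown here only as an illustration and not as the author's method, is to bound the map so that old entries become unreachable and can be collected. This sketch assumes LinkedHashMap's removeEldestEntry hook is an acceptable eviction policy; the class name and the limits are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical variant of the leaking demo that keeps at most maxEntries entries alive. */
public class BoundedRandomDataMap {

    /** Builds a map that evicts its oldest entry once it holds more than maxEntries. */
    public static <K, V> Map<K, V> boundedMap(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evicted entries become garbage
            }
        };
    }

    /** Same loop body as addRandomDataToMap, but finite and backed by a bounded map. */
    public static Map<Integer, String> addRandomData(int iterations, int maxEntries) {
        Map<Integer, String> dataMap = boundedMap(maxEntries);
        Random r = new Random();
        for (int i = 0; i < iterations; i++) {
            dataMap.put(r.nextInt(), String.valueOf(r.nextInt()));
        }
        return dataMap;
    }

    public static void main(String[] args) {
        Map<Integer, String> m = addRandomData(1_000_000, 10_000);
        System.out.println("entries retained: " + m.size());
    }
}
```

In a real application, what to bound or release depends on what the heap analysis shows; an eviction policy is just one option.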

We can also use graphical tools such as JVisualVM and JConsole, which help detect performance problems in code, including java.lang.OutOfMemoryError.
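Besides attaching JVisualVM or JConsole, a heap dump can be written programmatically and then opened in one of these tools to answer the two questions above. A minimal sketch using HotSpot's com.sun.management.HotSpotDiagnosticMXBean (HotSpot-specific; the HeapDumper class and file naming are hypothetical):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that writes an .hprof heap dump of the current JVM. */
public class HeapDumper {

    /**
     * Writes a heap dump into dir and returns its path.
     * live = true dumps only reachable objects, which may trigger a full GC first.
     */
    public static Path dumpHeap(Path dir, boolean live) throws IOException {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // The target file must not exist yet, and newer JDKs require the .hprof suffix
        Path file = dir.resolve("heap-" + System.currentTimeMillis() + ".hprof");
        diag.dumpHeap(file.toString(), live);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path dump = dumpHeap(dir, true);
        System.out.println("heap dump written to " + dump + " (" + Files.size(dump) + " bytes)");
    }
}
```

The -XX:+HeapDumpOnOutOfMemoryError option writes a similar dump automatically at the moment the error occurs.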

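The last resort is to increase the heap size through the JVM startup configuration, or to add the -XX:-UseGCOverheadLimit option to turn off the GC overhead limit check. For example, the following JVM options give a Java application 1 GB of heap: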

java -Xmx1024m com.xyz.TheClassName

The following options not only give the application 1 GB of heap but also add -XX:-UseGCOverheadLimit to turn off the GC overhead limit check:

java -Xmx1024m -XX:-UseGCOverheadLimit com.xyz.TheClassName
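When experimenting with flags like these, it is worth confirming that the running JVM actually received them. A minimal sketch that reads a HotSpot flag through HotSpotDiagnosticMXBean.getVMOption (the VmFlagReader name is hypothetical; getVMOption throws IllegalArgumentException for flags the JVM does not know, which the sketch turns into an empty result):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.Optional;

/** Hypothetical helper that reads the current value of a HotSpot -XX flag. */
public class VmFlagReader {

    /** Returns the flag's current value, or empty if this JVM does not have the flag. */
    public static Optional<String> flagValue(String name) {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = diag.getVMOption(name);
            return Optional.of(option.getValue());
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // flag does not exist in this JVM build
        }
    }

    public static void main(String[] args) {
        // Expected to print "false" when started with -XX:-UseGCOverheadLimit
        System.out.println("UseGCOverheadLimit = " + flagValue("UseGCOverheadLimit").orElse("<unknown>"));
        System.out.println("MaxHeapSize = " + flagValue("MaxHeapSize").orElse("<unknown>"));
    }
}
```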

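However, adding -XX:-UseGCOverheadLimit treats the symptom rather than the cause: the JVM eventually throws java.lang.OutOfMemoryError: Java heap space instead. Running the OutOfMemoryGCLimitExceed class above with this option, for example, gives: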

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.lang.Integer.toString(Integer.java:403)
	at java.lang.String.valueOf(String.java:3099)
	at com.galaxy.concurrency.jvm.OutOfMemoryGCLimitExceed.addRandomDataToMap(OutOfMemoryGCLimitExceed.java:16)
	at com.galaxy.concurrency.jvm.OutOfMemoryGCLimitExceed.main(OutOfMemoryGCLimitExceed.java:21)

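In short, if the application code really has a memory leak, none of the methods above will solve the problem; they only postpone the error. The more sensible approach is to thoroughly re-evaluate the application's memory usage.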
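One way to start that re-evaluation is to watch how much heap stays occupied right after each collection. As a rough rule of thumb (an assumption, not a definitive diagnosis), a post-GC baseline that keeps climbing points to a leak, while a stable baseline suggests the heap is simply too small for the workload. A minimal sketch based on MemoryPoolMXBean.getCollectionUsage() (PostGcHeapUsage is a hypothetical name):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that reports heap occupancy measured right after the last GC. */
public class PostGcHeapUsage {

    /** Heap pool name -> bytes still in use after that pool was last collected. */
    public static Map<String, Long> usedAfterLastGc() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache and other non-heap pools
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (afterGc != null) {
                result.put(pool.getName(), afterGc.getUsed());
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            System.gc(); // only a hint to the JVM; used here to get fresh post-GC numbers
            System.out.println(usedAfterLastGc());
            Thread.sleep(1000);
        }
    }
}
```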

Resolving a Production Incident and Lessons Learned

Exception Log

We recently hit this problem in production. Here is the exception log from when java.lang.OutOfMemoryError: GC overhead limit exceeded occurred:

2019-04-03 10:48:21,253 [http-nio-8080-exec-121] ERROR c.u.n.s.c.c.GlobalExceptionHandler 44 - 服务器端异常!
org.springframework.web.util.NestedServletException: Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: GC overhead limit exceeded
  at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1006)
  at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:925)
  at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:974)
  at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:877)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:661)
  at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:851)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
  at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
  at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
  at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
  at org.springframework.web.filter.HttpPutFormContentFilter.doFilterInternal(HttpPutFormContentFilter.java:109)
  at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
  at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:81)
  at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
  at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
  at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
  at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:496)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
  at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:803)
  at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
  at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790)
  at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1468)
  at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
  at org.apache.phoenix.schema.types.PDataType.isBytesComparableWith(PDataType.java:92)
  at org.apache.phoenix.schema.types.PDataType.coerceBytes(PDataType.java:832)
  at org.apache.phoenix.schema.types.PDataType.coerceBytes(PDataType.java:822)
  at org.apache.phoenix.compile.UpsertCompiler$4.execute(UpsertCompiler.java:1011)
  at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:355)
  at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:338)
  at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
  at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:336)
  at org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeUpdate(PhoenixPreparedStatement.java:199)
  at com.ucar.nosql.spacex.yarn.plugins.metric.timeline.PhoenixHBaseAccessor.commitMetrics(PhoenixHBaseAccessor.java:188)
  at com.ucar.nosql.spacex.yarn.plugins.metric.timeline.PhoenixHBaseAccessor.commitMetricsFromCache(PhoenixHBaseAccessor.java:138)
  at com.ucar.nosql.spacex.yarn.plugins.metric.timeline.PhoenixHBaseAccessor.insertMetricRecordsWithMetadata(PhoenixHBaseAccessor.java:348)
  at com.ucar.nosql.spacex.yarn.plugins.metric.timeline.HBaseTimelineMetricsService.putMetrics(HBaseTimelineMetricsService.java:299)
  at com.ucar.nosql.spacex.yarn.modules.monitor.controller.SinkConsumerController.postMetrics(SinkConsumerController.java:71)
  at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:209)
  at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
  at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
  at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:877)
  at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:783)
  at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
  at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:991)
  at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:925)
  at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:974)
  at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:877)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:661)
  at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:851)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)

Troubleshooting

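On Linux, running out of memory is not necessarily caused by the Java service itself; another program may have allocated a lot of memory. When the memory required by all processes exceeds physical memory, the kernel picks a memory-hungry process, such as the Java service, and kills it. This protection mechanism, which Linux uses to keep physical memory overload from crashing the whole system, is called the OOM Killer; for details, see the OOM Killer on Linux section of the references at the end of this article. A process killed this way is terminated by the kernel rather than throwing java.lang.OutOfMemoryError, but co-located processes are still worth ruling out.

In a previous job, I saw an Elasticsearch data store and a Fluentd log collector deployed on the same server, where a memory leak in Fluentd got the Elasticsearch service killed. We resolved it by reviewing the Fluentd code to fix the leak and by deploying Fluentd and Elasticsearch on separate servers.

After confirming that our service was deployed on its own, we turned our attention to the JVM memory configuration. The application has low traffic and the production server has 4 GB of memory. We first used the JDK's command-line tools to check the JVM configuration: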

# Find the process ID of spacex.jar
sudo jps
# Show the JVM flags; replace <pid> with the process ID of spacex.jar
sudo jinfo -flags <pid>

The JVM configuration was:

Attaching to process ID 16022, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.121-b13
Non-default VM flags: -XX:CICompilerCount=2 -XX:InitialHeapSize=62914560 -XX:+ManagementServer -XX:MaxHeapSize=1006632960 -XX:MaxNewSize=335544320 -XX:MinHeapDeltaBytes=524288 -XX:NewSize=20971520 -XX:OldSize=41943040 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC 
Command line:  -Dcom.sun.management.jmxremote -Dcom.sun.management.snmp.port=8044 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=8045 -Djava.rmi.server.hostname=10.212.27.54

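Here, -XX:MaxHeapSize=1006632960 is in bytes, i.e. 960 MB. Checking the startup script showed that JAVA_OPTS was never added to the java command. JAVA_OPTS set the heap to -Xms2048m -Xmx2048m, but because the java command did not reference the variable, the application started with the default memory settings. The -XX:+UseParallelGC flag in the jinfo output, where JAVA_OPTS asks for -XX:+UseConcMarkSweepGC, points the same way. The startup script before and after the fix is shown below; note the difference in the nohup java line: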

(1) Before:

JAVA_OPTS="-server -Xms2048m -Xmx2048m -Xmn512m -Xss256k -XX:PermSize=256m -XX:MaxPermSize=256m -XX:SurvivorRatio=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseCMSCompactAtFullCollection -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPACEXLOG}/"
cd /usr/local/frame/spacex
nohup java -Dcom.sun.management.jmxremote -Dcom.sun.management.snmp.port=8044 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=8045 -Djava.rmi.server.hostname=$LOCAL_IP  -jar ${SPACEXJAR} > ${SPACEXLOG}/stdout.log 2>&1 &

(2) After:

JAVA_OPTS="-server -Xms2048m -Xmx2048m -Xmn512m -Xss256k -XX:PermSize=256m -XX:MaxPermSize=256m -XX:SurvivorRatio=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseCMSCompactAtFullCollection -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPACEXLOG}/"
cd /usr/local/frame/spacex
nohup java ${JAVA_OPTS} -Dcom.sun.management.jmxremote -Dcom.sun.management.snmp.port=8044 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=8045 -Djava.rmi.server.hostname=$LOCAL_IP  -jar ${SPACEXJAR} > ${SPACEXLOG}/stdout.log 2>&1 &
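Why 960 MB? When no -Xmx is given, HotSpot on JDK 8 by default limits the maximum heap to about a quarter of physical memory, which is roughly consistent with a 4 GB server. A small sketch that converts jinfo's byte value and computes that rough estimate for the current host; it deliberately ignores alignment, container limits and other caps, so treat its output as approximate (DefaultHeapEstimate is a hypothetical name):

```java
import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper for reasoning about the default maximum heap size. */
public class DefaultHeapEstimate {

    private static final long MIB = 1024L * 1024L;

    /** Converts a byte count, such as jinfo's MaxHeapSize, to MiB. */
    public static long toMiB(long bytes) {
        return bytes / MIB;
    }

    /**
     * Rough default max heap: a quarter of physical memory (JDK 8 default MaxRAMFraction=4).
     * Assumption: ignores heap alignment, container limits and the MaxRAM cap.
     */
    public static long estimatedDefaultMaxHeapMiB() {
        OperatingSystemMXBean os = ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);
        return toMiB(os.getTotalPhysicalMemorySize() / 4); // deprecated in newer JDKs, still available
    }

    public static void main(String[] args) {
        System.out.println("MaxHeapSize=1006632960 bytes = " + toMiB(1006632960L) + " MiB");
        System.out.println("quarter of physical RAM on this host = " + estimatedDefaultMaxHeapMiB() + " MiB");
    }
}
```

The first line prints 960 MiB, since 1006632960 = 960 × 1024 × 1024.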

Follow-up Actions and Summary

1. After changing the script and starting the service in production, use jps and jinfo to check that the JVM flags actually took effect.

2. Test service startup scripts completely and rigorously, and confirm that every command and every variable actually takes effect.
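The jinfo check in item 1 can be complemented from inside the application, so that a startup script that silently drops JAVA_OPTS is noticed right away. A minimal sketch, assuming it is acceptable to just log a warning; StartupFlagCheck and the expected -Xmx2048m value are illustrative:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical startup guard that checks the JVM options the process was launched with. */
public class StartupFlagCheck {

    /** Returns true if the JVM input arguments contain expectedArg, e.g. "-Xmx2048m". */
    public static boolean launchedWith(String expectedArg) {
        // Only JVM options are listed here, not the main class or application arguments
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        return jvmArgs.contains(expectedArg);
    }

    public static void main(String[] args) {
        String expected = "-Xmx2048m"; // illustrative value taken from the JAVA_OPTS above
        if (!launchedWith(expected)) {
            System.err.println("WARNING: JVM was not started with " + expected
                    + "; actual JVM arguments: "
                    + ManagementFactory.getRuntimeMXBean().getInputArguments());
        }
    }
}
```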

References

OOM

1. The first three sections of this article are adapted from the article OutOfMemoryError: GC Overhead Limit Exceeded

2. Oracle's official overview of OOM exceptions and how to handle them: Understand the OutOfMemoryError Exception

OOM Killer on Linux

1. 理解和配置 Linux 下的 OOM Killer (Understanding and Configuring the OOM Killer on Linux)

2. LinuxMM: OOM_Killer
