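Put simply, garbage collection (GC) is the process by which the JVM reclaims objects that are no longer in use and frees their memory. The GC Overhead Limit Exceeded error belongs to the java.lang.OutOfMemoryError family and indicates that the JVM is running out of memory. This article looks at what causes java.lang.OutOfMemoryError: GC overhead limit exceeded and how to resolve it.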
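Introduction to the GC Overhead Limit Exceeded Error
OutOfMemoryError is a subclass of java.lang.VirtualMachineError and is thrown when the JVM runs into a resource problem. More specifically, this error is thrown when the JVM spends too much time performing GC while reclaiming very little heap memory. According to Oracle's documentation, by default the JVM throws this error if the Java process spends more than 98% of its time in GC and each collection recovers less than 2% of the heap. In other words, the application has nearly exhausted the available memory, and the garbage collector has repeatedly spent a long time trying to free it without success.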
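In this situation users experience a very slow application: operations that normally finish in a few milliseconds take far longer, because the CPU is almost entirely occupied with garbage collection and has little capacity left for other work.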
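To get a feel for how much time a running JVM spends in GC, the standard java.lang.management API can be queried from inside the process. The sketch below is only a rough monitoring aid: it divides cumulative GC time by JVM uptime, which is not the calculation the JVM itself uses to decide when to throw the error. The class name GcOverheadProbe is made up for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcOverheadProbe {

    /** Total milliseconds all collectors report having spent in GC so far. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Cumulative GC time divided by JVM uptime; a rough indicator only. */
    static double gcTimeFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : (double) totalGcMillis() / uptime;
    }

    public static void main(String[] args) {
        System.out.printf("GC time so far: %d ms (%.1f%% of uptime)%n",
                totalGcMillis(), 100.0 * gcTimeFraction());
    }
}
```

A fraction that keeps climbing toward 1.0 while the heap stays nearly full suggests the application is heading for the situation described above.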
Reproducing the Error
The following code reproduces the java.lang.OutOfMemoryError: GC overhead limit exceeded error:
package com.galaxy.concurrency.jvm;

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class OutOfMemoryGCLimitExceed {

    public static void addRandomDataToMap() {
        Map<Integer, String> dataMap = new HashMap<>();
        Random r = new Random();
        while (true) { // never terminates: the map keeps every entry reachable
            dataMap.put(r.nextInt(), String.valueOf(r.nextInt()));
        }
    }

    public static void main(String[] args) {
        addRandomDataToMap();
    }
}
The code is simple: an infinite while loop keeps adding random numbers to a HashMap. Before running main, set the JVM options to -Xmx300m -XX:+UseParallelGC (a 300 MB heap with the ParallelGC collector). Running main then fails with java.lang.OutOfMemoryError: GC overhead limit exceeded:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.HashMap.newNode(HashMap.java:1747)
at java.util.HashMap.putVal(HashMap.java:631)
at java.util.HashMap.put(HashMap.java:612)
at com.galaxy.concurrency.jvm.OutOfMemoryGCLimitExceed.addRandomDataToMap(OutOfMemoryGCLimitExceed.java:13)
at com.galaxy.concurrency.jvm.OutOfMemoryGCLimitExceed.main(OutOfMemoryGCLimitExceed.java:18)
I ran this on macOS with 2 CPU cores, 8 GB of RAM, and JDK 1.8. Because test environments differ, you may get java.lang.OutOfMemoryError: Java heap space instead; in that case, adjust -Xmx until java.lang.OutOfMemoryError: GC overhead limit exceeded appears. For a better understanding of the different garbage collection algorithms, see Oracle's Java Garbage Collection Basics tutorial.
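Before running the reproduction, it can help to confirm that the heap size and collector options were actually picked up. The following is a small sketch using only standard JDK APIs; the class name ReproEnvironmentCheck is hypothetical. On JDK 8 with -XX:+UseParallelGC I would expect the collector names PS Scavenge and PS MarkSweep, and the reported maximum heap may be somewhat below the -Xmx value.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ReproEnvironmentCheck {

    /** Names of the collectors the running JVM registered. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Maximum heap the JVM will attempt to use, in whole MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Max heap:   " + maxHeapMiB() + " MiB");
    }
}
```

Run it with the same options as the reproduction, for example java -Xmx300m -XX:+UseParallelGC ReproEnvironmentCheck.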
Solutions
The ideal solution is to find the underlying problem by examining the code for possible memory leaks. The questions to ask are:
- Which objects in the application occupy large portions of the heap?
- In which parts of the source code are these objects allocated?
Graphical tools such as JVisualVM and JConsole can also help detect performance problems in the code, including the causes of java.lang.OutOfMemoryError.
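Answering the two questions above usually starts from a heap dump, which tools such as JVisualVM can open. As a sketch, a dump can be taken from inside a HotSpot JVM through com.sun.management.HotSpotDiagnosticMXBean. The class name HeapDumper and the output file name are assumptions for illustration, and the API is HotSpot-specific rather than part of the Java SE standard.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;
import java.nio.file.Paths;

final class HeapDumper {

    /**
     * Writes a heap dump to the given path. The file must not exist yet,
     * and recent JDKs require the name to end in ".hprof".
     */
    static Path dumpLiveObjects(Path target) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // true = dump only objects that are still reachable
        bean.dumpHeap(target.toString(), true);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get(System.getProperty("java.io.tmpdir"),
                "heap-" + System.currentTimeMillis() + ".hprof");
        System.out.println("Heap dump written to " + dumpLiveObjects(target));
    }
}
```

The resulting .hprof file can then be loaded into a heap analysis tool to see which classes hold the most memory and which references keep them alive.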
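The last resort is to change the JVM startup configuration, either by increasing the heap size or by adding the -XX:-UseGCOverheadLimit option to disable the GC overhead limit check. For example, the following JVM option gives the Java application 1 GB of heap space: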
java -Xmx1024m com.xyz.TheClassName
The following options give the Java application 1 GB of heap space and also add -XX:-UseGCOverheadLimit to disable the GC overhead limit check:
java -Xmx1024m -XX:-UseGCOverheadLimit com.xyz.TheClassName
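However, adding -XX:-UseGCOverheadLimit treats the symptom rather than the cause: the JVM will eventually throw java.lang.OutOfMemoryError: Java heap space instead. Running the OutOfMemoryGCLimitExceed class above with this option produces: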
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.Integer.toString(Integer.java:403)
at java.lang.String.valueOf(String.java:3099)
at com.galaxy.concurrency.jvm.OutOfMemoryGCLimitExceed.addRandomDataToMap(OutOfMemoryGCLimitExceed.java:16)
at com.galaxy.concurrency.jvm.OutOfMemoryGCLimitExceed.main(OutOfMemoryGCLimitExceed.java:21)
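In short, if the application code has a memory leak, none of the options above fixes it; they only postpone the error. The wiser course is to thoroughly re-evaluate the application's memory usage.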
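What a code-level fix looks like depends entirely on the application. Purely as an illustration, and not something taken from the incident described later, one common pattern is to put an upper bound on a collection that would otherwise grow forever. The sketch below bounds the map from the reproduction using LinkedHashMap.removeEldestEntry; the class name BoundedMapExample and the cap of 100,000 entries are arbitrary choices for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

final class BoundedMapExample {

    /** A map that evicts its oldest entry once it holds more than maxEntries. */
    static <K, V> Map<K, V> boundedMap(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<Integer, String> dataMap = boundedMap(100_000); // hypothetical cap
        Random r = new Random();
        for (int i = 0; i < 1_000_000; i++) {
            dataMap.put(r.nextInt(), String.valueOf(r.nextInt()));
        }
        System.out.println("Entries retained: " + dataMap.size());
    }
}
```

Because evicted entries become unreachable, the collector can reclaim them and the heap no longer fills up, whatever the number of insertions.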
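A Production Incident: Resolution and Lessons Learned
Exception Log
We recently hit this problem in production. Below is the exception log from when java.lang.OutOfMemoryError: GC overhead limit exceeded occurred: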
2019-04-03 10:48:21,253 [http-nio-8080-exec-121] ERROR c.u.n.s.c.c.GlobalExceptionHandler 44 - 服务器端异常!
org.springframework.web.util.NestedServletException: Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1006)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:925)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:974)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:877)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:661)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:851)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.HttpPutFormContentFilter.doFilterInternal(HttpPutFormContentFilter.java:109)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:81)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:496)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:803)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1468)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.phoenix.schema.types.PDataType.isBytesComparableWith(PDataType.java:92)
at org.apache.phoenix.schema.types.PDataType.coerceBytes(PDataType.java:832)
at org.apache.phoenix.schema.types.PDataType.coerceBytes(PDataType.java:822)
at org.apache.phoenix.compile.UpsertCompiler$4.execute(UpsertCompiler.java:1011)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:355)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:338)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:336)
at org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeUpdate(PhoenixPreparedStatement.java:199)
at com.ucar.nosql.spacex.yarn.plugins.metric.timeline.PhoenixHBaseAccessor.commitMetrics(PhoenixHBaseAccessor.java:188)
at com.ucar.nosql.spacex.yarn.plugins.metric.timeline.PhoenixHBaseAccessor.commitMetricsFromCache(PhoenixHBaseAccessor.java:138)
at com.ucar.nosql.spacex.yarn.plugins.metric.timeline.PhoenixHBaseAccessor.insertMetricRecordsWithMetadata(PhoenixHBaseAccessor.java:348)
at com.ucar.nosql.spacex.yarn.plugins.metric.timeline.HBaseTimelineMetricsService.putMetrics(HBaseTimelineMetricsService.java:299)
at com.ucar.nosql.spacex.yarn.modules.monitor.controller.SinkConsumerController.postMetrics(SinkConsumerController.java:71)
at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:209)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:877)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:783)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:991)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:925)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:974)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:877)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:661)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:851)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
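Troubleshooting
An OOM on Linux is not necessarily caused by the Java service itself. Other programs may have allocated so much memory that the total demand exceeds physical memory; if the Java service is also a heavy memory consumer, the Linux kernel may select it and kill it. This is the OOM Killer, a protection mechanism that keeps the system from crashing when physical memory is overcommitted; see the Linux OOM Killer entries in the references at the end of this article for details. In a previous job I saw an Elasticsearch storage service and a Fluentd log collection service deployed on the same server, where a memory leak in Fluentd got the Elasticsearch service killed. We fixed it by reviewing the Fluentd code to remove the leak and by deploying it separately from Elasticsearch.
We confirmed that the service in this incident is deployed on its own, so we turned our attention to the JVM memory configuration. The application handles little traffic and the production server has 4 GB of memory. We first inspected the JVM configuration with the command-line tools that ship with the JDK: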
# Find the process ID of spacex.jar
sudo jps
# Show the JVM flags; replace pid with the process ID of spacex.jar
sudo jinfo -flags pid
The JVM configuration is as follows:
Attaching to process ID 16022, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.121-b13
Non-default VM flags: -XX:CICompilerCount=2 -XX:InitialHeapSize=62914560 -XX:+ManagementServer -XX:MaxHeapSize=1006632960 -XX:MaxNewSize=335544320 -XX:MinHeapDeltaBytes=524288 -XX:NewSize=20971520 -XX:OldSize=41943040 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
Command line: -Dcom.sun.management.jmxremote -Dcom.sun.management.snmp.port=8044 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=8045 -Djava.rmi.server.hostname=10.212.27.54
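The size flags in the jinfo output are byte counts, which are easy to misread. As a quick sanity check, the sketch below converts three of the values printed above to MiB; the class name HeapFlagUnits is made up for this example.

```java
final class HeapFlagUnits {

    private static final long BYTES_PER_MIB = 1024L * 1024L;

    /** Converts a byte count, as printed by jinfo, to whole MiB. */
    static long toMiB(long bytes) {
        return bytes / BYTES_PER_MIB;
    }

    public static void main(String[] args) {
        // Values copied from the jinfo output above.
        System.out.println("InitialHeapSize = " + toMiB(62_914_560L) + " MiB");
        System.out.println("MaxHeapSize     = " + toMiB(1_006_632_960L) + " MiB");
        System.out.println("MaxNewSize      = " + toMiB(335_544_320L) + " MiB");
    }
}
```

This prints 60, 960, and 320 MiB respectively, far below the 2048 MB we intended to configure.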
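In the jinfo output, -XX:MaxHeapSize=1006632960 is a byte count, i.e. 960 MB. Checking the startup script showed that JAVA_OPTS had never been added to the java launch command. JAVA_OPTS sets the heap to -Xms2048m -Xmx2048m, but because the java command did not reference the variable, the application started with the default memory configuration (by default the maximum heap is about one quarter of physical memory, which is consistent with roughly 1 GB on this 4 GB server). The startup script before and after the fix is shown below; note the difference in the nohup java line: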
(1) Before:
JAVA_OPTS="-server -Xms2048m -Xmx2048m -Xmn512m -Xss256k -XX:PermSize=256m -XX:MaxPermSize=256m -XX:SurvivorRatio=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseCMSCompactAtFullCollection -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPACEXLOG}/"
cd /usr/local/frame/spacex
nohup java -Dcom.sun.management.jmxremote -Dcom.sun.management.snmp.port=8044 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=8045 -Djava.rmi.server.hostname=$LOCAL_IP -jar ${SPACEXJAR} > ${SPACEXLOG}/stdout.log 2>&1 &
(2) After:
JAVA_OPTS="-server -Xms2048m -Xmx2048m -Xmn512m -Xss256k -XX:PermSize=256m -XX:MaxPermSize=256m -XX:SurvivorRatio=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseCMSCompactAtFullCollection -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPACEXLOG}/"
cd /usr/local/frame/spacex
nohup java ${JAVA_OPTS} -Dcom.sun.management.jmxremote -Dcom.sun.management.snmp.port=8044 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=8045 -Djava.rmi.server.hostname=$LOCAL_IP -jar ${SPACEXJAR} > ${SPACEXLOG}/stdout.log 2>&1 &
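Follow-up Actions and Summary
1. After modifying the script and starting the service in production, use jps and jinfo to verify that the JVM options have taken effect.
2. Startup scripts must be tested completely and rigorously, carefully confirming that every command and every variable takes effect.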
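In addition to checking with jps and jinfo by hand, the application can report its own settings at startup so that a missing option shows up in the logs. This is a minimal sketch, assuming the check is called early in the application's main method; the class name HeapSettingsCheck and the warning text are hypothetical, and it only looks for -Xmx or -XX:MaxHeapSize.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class HeapSettingsCheck {

    /** True if any JVM argument sets the maximum heap explicitly. */
    static boolean hasExplicitMaxHeap(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xmx") || arg.startsWith("-XX:MaxHeapSize=")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Effective max heap: " + maxHeapMiB + " MiB");
        if (!hasExplicitMaxHeap(jvmArgs)) {
            System.err.println("WARNING: no -Xmx found; the JVM is using its default heap size");
        }
    }
}
```

With the unfixed startup script, the argument list would have contained only the -D system properties, and the warning would have appeared in stdout.log on the first start.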
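References
OOM
1. Sections 1-3 of this article are adapted from the article OutOfMemoryError: GC Overhead Limit Exceeded
2. Oracle's official summary of OOM errors and how to handle them: Understand the OutOfMemoryError Exception
OOM Killer on Linux
1. Understanding and Configuring the OOM Killer on Linux (理解和配置 Linux 下的 OOM Killer)
2. LinuxMM: OOM_Killer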