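Thoughts on java.lang.OutOfMemoryError: Java heap space

Background

When an Excel file has a lot of content, processing it with POI's non-streaming API easily causes an OOM. After the OOM, the system log filled up with errors like this: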

ERROR org.springframework.scheduling.support.TaskUtils$LoggingErrorHandler - Unexpected error occurred in scheduled task
java.lang.IllegalStateException: Connection pool shut down
	at org.apache.http.util.Asserts.check(Asserts.java:34)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:269)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
	at com.ecwid.consul.transport.AbstractHttpTransport.executeRequest(AbstractHttpTransport.java:70)
	at com.ecwid.consul.transport.AbstractHttpTransport.makePutRequest(AbstractHttpTransport.java:49)
	at com.ecwid.consul.v1.ConsulRawClient.makePutRequest(ConsulRawClient.java:163)
	at com.ecwid.consul.v1.agent.AgentConsulClient.agentCheckPass(AgentConsulClient.java:206)
	at com.ecwid.consul.v1.ConsulClient.agentCheckPass(ConsulClient.java:270)
	at com.ecwid.consul.v1.ConsulClient$$FastClassBySpringCGLIB$$2fe581c7.invoke(<generated>)
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:779)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:95)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:692)
	at com.ecwid.consul.v1.ConsulClient$$EnhancerBySpringCGLIB$$2cbeb92d.agentCheckPass(<generated>)
	at com.easemob.kce.registry.ConsulClientAspect.aroundAgentCheckPass(ConsulClientAspect.java:30)
	at sun.reflect.GeneratedMethodAccessor142.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at 
	......

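Some time ago, while looking into an error in an internal company framework, the framework's developer noticed OOM entries in the log and explained the error away on the grounds that the system state is undefined once the JVM has hit an OOM. That sounded plausible at the time, so I did not dig further.
Today I ran into a similar problem, thought it through carefully, and concluded that his explanation does not hold up.

If, as he claimed, the system state is unpredictable after an OOM, then one of two things must be true: either the JVM reclaims memory arbitrarily when it runs short, even memory that is still referenced, or the JVM's internal error handling is so fragile that execution runs wild once memory is exhausted.

  1. If the JVM arbitrarily reclaimed memory that is still in use and left the program in an unpredictable state, there would be no point in letting the program continue. In that case the JVM should simply stop, not throw java.lang.OutOfMemoryError.
  2. If the JVM's internal error handling were fragile and execution ran wild, the application layer should not be able to receive a precise java.lang.OutOfMemoryError at all.

The sensible behavior is this: the lower layer tries to allocate memory, finds there is not enough, aborts the allocation, and throws an error upward. Objects that are still reachable stay intact, since the JVM does not reclaim memory that is in use, and the application layer decides how to handle the error.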
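The behavior described above can be sketched in a few lines. This is a minimal illustration, not the test from this post: the class name OomSurvivalDemo and the method dataSurvivesOom are made up for the example, and the oversized array request is an assumption chosen because HotSpot rejects it immediately (typically with "Requested array size exceeds VM limit", which is still a java.lang.OutOfMemoryError). Other JVMs may report a different message.

```java
import java.util.ArrayList;
import java.util.List;

public class OomSurvivalDemo {

    /** Returns true if data created before a failed allocation is still intact afterwards. */
    static boolean dataSurvivesOom() {
        // Data that stays referenced while the allocation below fails.
        List<String> before = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            before.add("item-" + i);
        }

        boolean caught = false;
        try {
            // Assumption: this request is larger than the JVM can satisfy,
            // so the allocation is aborted and an OutOfMemoryError is thrown.
            long[] huge = new long[Integer.MAX_VALUE];
            huge[0] = 1; // only reached if the allocation unexpectedly succeeds
        } catch (OutOfMemoryError e) {
            // The application layer decides what to do with the error.
            caught = true;
        }

        // The list built before the failure is still complete and readable.
        return caught && before.size() == 1000 && "item-999".equals(before.get(999));
    }

    public static void main(String[] args) {
        System.out.println("data intact after OOM: " + dataSurvivesOom());
    }
}
```

The point is only that the error arrives as an ordinary Throwable at a well-defined place, and that objects reachable before the failed allocation are still usable in the handler.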

Evidence

Test 1: is memory allocated before the failing code still intact?

The test code is as follows:

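    private static void test() throws FileNotFoundException {
        FileInputStream inputStream = new FileInputStream(excel);
        // Fill the heap with objects that stay referenced across the OOM.
        List<String> a = new ArrayList<>();
        Random random = new Random();
        for (int i = 0; i < 2000000; i++) {
            a.add(String.valueOf(random.nextLong()));
        }

        // Loading the whole workbook with the non-streaming API triggers the OOM.
        try (Workbook workbook = WorkbookFactory.create(inputStream);
                ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
        } catch (Throwable e) {
            // Inspect inputStream and a here.
            e.printStackTrace();
        }
    }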

Test result

[Figure 1: screenshot of the test result]

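The result shows that the memory behind the two variables inputStream and a was not damaged. When an OOM occurs, the JVM does not reclaim memory indiscriminately to free up space.

Test 2: is memory allocated by the failing code reclaimed after the error?

The test code is as follows: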

    private static void test() throws FileNotFoundException {
        FileInputStream inputStream = new FileInputStream(excel);
        try (Workbook workbook = WorkbookFactory.create(inputStream);
                ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
  1. Before the try block, check heap usage with jmap -heap
Heap Usage:
PS Young Generation
Eden Space:
   capacity = 67108864 (64.0MB)
   used     = 14830352 (14.143325805664062MB)
   free     = 52278512 (49.85667419433594MB)
   22.098946571350098% used
From Space:
   capacity = 11010048 (10.5MB)
   used     = 0 (0.0MB)
   free     = 11010048 (10.5MB)
   0.0% used
To Space:
   capacity = 11010048 (10.5MB)
   used     = 0 (0.0MB)
   free     = 11010048 (10.5MB)
   0.0% used
PS Old Generation
   capacity = 179306496 (171.0MB)
   used     = 0 (0.0MB)
   free     = 179306496 (171.0MB)
   0.0% used

2936 interned Strings occupying 234272 bytes.
  2. Check again once the exception has been caught
PS Young Generation
Eden Space:
   capacity = 67108864 (64.0MB)
   used     = 1342216 (1.2800369262695312MB)
   free     = 65766648 (62.71996307373047MB)
   2.0000576972961426% used
From Space:
   capacity = 11010048 (10.5MB)
   used     = 0 (0.0MB)
   free     = 11010048 (10.5MB)
   0.0% used
To Space:
   capacity = 11010048 (10.5MB)
   used     = 0 (0.0MB)
   free     = 11010048 (10.5MB)
   0.0% used
PS Old Generation
   capacity = 179306496 (171.0MB)
   used     = 95233752 (90.82198333740234MB)
   free     = 84072744 (80.17801666259766MB)
   53.11227095754523% used

3034 interned Strings occupying 246616 bytes.

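Comparing the two snapshots shows the following:
The smallest variable scope in Java is the code block, not the method. Once execution leaves the block, objects referenced only from inside it become eligible for collection. In the second snapshot the heap is far from full: Eden is almost empty (about 2% used) and the old generation is at about 53%. So a garbage collection must have run in between and reclaimed memory allocated by the failing code. Otherwise heap usage after the OOM would be close to 100%.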
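The same before/after comparison can also be taken from inside the process instead of with jmap. The sketch below uses the standard java.lang.management API. The class name HeapSnapshot and its methods are hypothetical helpers for illustration, and the oversized array in main is only a stand-in for the POI call used in the test above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapSnapshot {

    /** Current heap usage in bytes, as reported by the JVM's memory MXBean. */
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** One-line summary of heap usage, prefixed with a label. */
    static String describe(String label) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return String.format("%s: used=%d MB, committed=%d MB",
                label, heap.getUsed() >> 20, heap.getCommitted() >> 20);
    }

    /** Records heap usage before the work and after it has finished or failed. */
    static String compare(Runnable work) {
        String before = describe("before");
        try {
            work.run();
        } catch (Throwable t) {
            // Swallowed on purpose: only the heap state after the failure matters here.
        }
        return before + System.lineSeparator() + describe("after");
    }

    public static void main(String[] args) {
        // Stand-in for the failing code; replace with the real workload to compare.
        System.out.println(compare(() -> {
            long[] huge = new long[Integer.MAX_VALUE];
            huge[0] = 1;
        }));
    }
}
```

The numbers will not match jmap -heap exactly, because this reports the heap as a whole and not per generation, but it is enough to see whether usage stays near the maximum after the error.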

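Explaining the error from the first section

When an OOM occurs, the thread running the problematic code is not the only one affected. Any other thread that tries to allocate memory around the same time hits the same error. So a plausible explanation is that the open-source component cleaned up its own resources after encountering the error.
Following the log leads to the relevant clues: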

  1. org.apache.http.impl.conn.PoolingHttpClientConnectionManager#requestConnection
    public ConnectionRequest requestConnection(
            final HttpRoute route,
            final Object state) {
        Args.notNull(route, "HTTP route");
        if (this.log.isDebugEnabled()) {
            this.log.debug("Connection request: " + format(route, state) + formatStats(route));
        }
        Asserts.check(!this.isShutDown.get(), "Connection pool shut down");
        final Future<CPoolEntry> future = this.pool.lease(route, state, null);
        return new ConnectionRequest() {
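  2. Find where the isShutDown variable is modified
    [Figure 2: screenshot of the places that modify isShutDown]
  3. Checking every reference, the following two methods are the most likely candidates, because both shut down connManager after catching an Error, which fits the OOM scenario well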
    org.apache.http.impl.execchain.MainClientExec#execute
    @Override
    public CloseableHttpResponse execute(
            final HttpRoute route,
            final HttpRequestWrapper request,
            final HttpClientContext context,
            final HttpExecutionAware execAware) throws IOException, HttpException {
        Args.notNull(route, "HTTP route");
        Args.notNull(request, "HTTP request");
        Args.notNull(context, "HTTP context");
        // irrelevant code omitted
        } catch (final Error error) {
            connManager.shutdown();
            throw error;
        }
    }

org.apache.http.impl.execchain.MinimalClientExec#execute

    public CloseableHttpResponse execute(
            final HttpRoute route,
            final HttpRequestWrapper request,
            final HttpClientContext context,
            final HttpExecutionAware execAware) throws IOException, HttpException {
        Args.notNull(route, "HTTP route");
        Args.notNull(request, "HTTP request");
        Args.notNull(context, "HTTP context");
        // irrelevant code omitted
        } catch (final Error error) {
            connManager.shutdown();
            throw error;
        }
    }
  4. Follow the stack trace in the log to see which function was actually executing
    at org.apache.http.util.Asserts.check(Asserts.java:34)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:269)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176)
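    The code that ran is MainClientExec.execute(MainClientExec.java:176)

Combined with point 3, my hypothesis is this: while a thread was executing MainClientExec.execute, the problem thread (the one parsing Excel with POI) kept consuming heap, so an allocation in the HTTP thread also failed and threw java.lang.OutOfMemoryError. org.apache.http.impl.execchain.MainClientExec#execute caught the Error and shut down connManager, and from then on every run of the scheduled task failed with Connection pool shut down.

The log supports this hypothesis, since the scheduled-task thread also threw java.lang.OutOfMemoryError: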
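The sequence in this hypothesis can be reproduced with a toy model. The classes below are hypothetical stand-ins written for this illustration. They are not HttpClient code and only mirror the two pieces quoted above: the isShutDown check in requestConnection and the catch (Error) block that calls shutdown(). The OutOfMemoryError is thrown by hand to simulate a failed allocation.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownOnErrorDemo {

    /** Toy connection manager: refuses requests once it has been shut down. */
    static class ToyConnectionManager {
        private final AtomicBoolean isShutDown = new AtomicBoolean(false);

        void requestConnection() {
            if (isShutDown.get()) {
                throw new IllegalStateException("Connection pool shut down");
            }
        }

        void shutdown() {
            isShutDown.set(true);
        }
    }

    /** Toy executor: shuts the manager down when any Error escapes a request. */
    static class ToyExecutor {
        private final ToyConnectionManager connManager = new ToyConnectionManager();

        void execute(boolean simulateOom) {
            try {
                connManager.requestConnection();
                if (simulateOom) {
                    // Simulated allocation failure in the middle of a request.
                    throw new OutOfMemoryError("simulated");
                }
            } catch (Error error) {
                connManager.shutdown();
                throw error;
            }
        }
    }

    /** Runs one failing request followed by one normal request. */
    static String run() {
        ToyExecutor executor = new ToyExecutor();
        StringBuilder out = new StringBuilder();
        try {
            executor.execute(true);
        } catch (OutOfMemoryError e) {
            out.append("first: OutOfMemoryError");
        }
        try {
            executor.execute(false);
            out.append("; second: ok");
        } catch (IllegalStateException e) {
            out.append("; second: ").append(e.getMessage());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "first: OutOfMemoryError; second: Connection pool shut down"
        System.out.println(run());
    }
}
```

One OutOfMemoryError during a request is enough to make every later request on the same executor fail with IllegalStateException, even after memory has been freed. That matches the pattern in the log: a single OOM line followed by repeated Connection pool shut down errors.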

2022-03-09 11:22:09,363 ERROR org.springframework.scheduling.support.TaskUtils$LoggingErrorHandler - Unexpected error occurred in scheduled task
java.lang.OutOfMemoryError: GC overhead limit exceeded
2022-03-09 11:22:09,363 ERROR org.springframework.scheduling.support.TaskUtils$LoggingErrorHandler - Unexpected error occurred in scheduled task
java.lang.IllegalStateException: Connection pool shut down

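Conclusion

java.lang.OutOfMemoryError by itself does not leave the JVM in an unknown state. Each application-level framework has its own strategy for handling it: some ignore the error, while others decide that recovery is pointless and clean up their resources (as the httpClient component in the error above does).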
