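When an Excel file has a lot of content, processing it with POI's non-streaming API can easily cause an OOM. After the OOM, the system log showed a large number of errors like this: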
ERROR org.springframework.scheduling.support.TaskUtils$LoggingErrorHandler - Unexpected error occurred in scheduled task
java.lang.IllegalStateException: Connection pool shut down
at org.apache.http.util.Asserts.check(Asserts.java:34)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:269)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
at com.ecwid.consul.transport.AbstractHttpTransport.executeRequest(AbstractHttpTransport.java:70)
at com.ecwid.consul.transport.AbstractHttpTransport.makePutRequest(AbstractHttpTransport.java:49)
at com.ecwid.consul.v1.ConsulRawClient.makePutRequest(ConsulRawClient.java:163)
at com.ecwid.consul.v1.agent.AgentConsulClient.agentCheckPass(AgentConsulClient.java:206)
at com.ecwid.consul.v1.ConsulClient.agentCheckPass(ConsulClient.java:270)
at com.ecwid.consul.v1.ConsulClient$$FastClassBySpringCGLIB$$2fe581c7.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:779)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:95)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:692)
at com.ecwid.consul.v1.ConsulClient$$EnhancerBySpringCGLIB$$2cbeb92d.agentCheckPass(<generated>)
at com.easemob.kce.registry.ConsulClientAspect.aroundAgentCheckPass(ConsulClientAspect.java:30)
at sun.reflect.GeneratedMethodAccessor142.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
......
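Some time ago, while troubleshooting an error in one of our company's internal frameworks, the framework's developer found OOM entries in the log and explained the problem away by saying that once the JVM hits an OOM, the system state is undefined. It sounded plausible at the time, so I did not dig any further.
Today I ran into a similar problem, thought it through carefully, and came to feel that his conclusion does not hold up.
If, as he claimed, the system state were unpredictable after an OOM, that would mean one of two things: either the JVM reclaims memory arbitrarily when it runs short, even memory that is still referenced, or the JVM's internal error handling is so flawed that execution runs off the rails.
The sensible behavior is different: the lower layer tries to allocate memory, finds there is not enough, aborts the allocation, and throws an error upward. The JVM keeps the current memory state intact and does not arbitrarily reclaim memory that is still in use; it is then up to the application layer to decide how to handle the error.
My test code is as follows: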
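    private static void test() throws FileNotFoundException {
        // excel is defined elsewhere and points at a large workbook
        FileInputStream inputStream = new FileInputStream(excel);
        // Pre-fill a large list so that there is live data on the heap before the OOM
        List<String> a = new ArrayList<>();
        Random random = new Random();
        for (int i = 0; i < 2000000; i++) {
            a.add(String.valueOf(random.nextLong()));
        }
        // Non-streaming load: the whole workbook is built in memory
        try (Workbook workbook = WorkbookFactory.create(inputStream);
             ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            // body intentionally empty: the test only needs the workbook to be loaded
        } catch (Throwable e) {
            // inputStream and a are still in scope here and can be inspected
            e.printStackTrace();
        }
    }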
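Judging from the result, the memory behind the two variables inputStream and a was not damaged. So when an OOM occurs, the JVM does not reclaim memory indiscriminately just to free up space.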
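The same observation can be reproduced without POI. The following is only a minimal sketch under my own assumptions: the class name OomSurvival and the method dataSurvivesOom are made up for illustration, and the failing allocation is a single array request that HotSpot rejects with an OutOfMemoryError (the exact error message varies by JVM), not a gradually exhausted heap as in the POI case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original test.
public class OomSurvival {

    /** Returns true if data built before a failed allocation is still intact afterwards. */
    static boolean dataSurvivesOom() {
        // Live data that stays referenced for the whole method.
        List<String> live = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            live.add("item-" + i);
        }
        try {
            // Request an array that the JVM is not expected to be able to provide.
            long[] huge = new long[Integer.MAX_VALUE];
            return huge.length < 0; // not expected to be reached
        } catch (OutOfMemoryError e) {
            // The allocation was refused and reported as an Error;
            // the list that is still referenced has not been touched.
            return live.size() == 100_000
                    && "item-0".equals(live.get(0))
                    && "item-99999".equals(live.get(99_999));
        }
    }

    public static void main(String[] args) {
        System.out.println("data intact after OutOfMemoryError: " + dataSurvivesOom());
    }
}
```

The point of the sketch is only that the error surfaces as an ordinary Throwable at the allocation site, and the code in the catch block can keep working with everything that was reachable before.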
The next test uses the following code:
    private static void test() throws FileNotFoundException {
        FileInputStream inputStream = new FileInputStream(excel);
        try (Workbook workbook = WorkbookFactory.create(inputStream);
             ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
Heap Usage:
PS Young Generation
Eden Space:
capacity = 67108864 (64.0MB)
used = 14830352 (14.143325805664062MB)
free = 52278512 (49.85667419433594MB)
22.098946571350098% used
From Space:
capacity = 11010048 (10.5MB)
used = 0 (0.0MB)
free = 11010048 (10.5MB)
0.0% used
To Space:
capacity = 11010048 (10.5MB)
used = 0 (0.0MB)
free = 11010048 (10.5MB)
0.0% used
PS Old Generation
capacity = 179306496 (171.0MB)
used = 0 (0.0MB)
free = 179306496 (171.0MB)
0.0% used
2936 interned Strings occupying 234272 bytes.
PS Young Generation
Eden Space:
capacity = 67108864 (64.0MB)
used = 1342216 (1.2800369262695312MB)
free = 65766648 (62.71996307373047MB)
2.0000576972961426% used
From Space:
capacity = 11010048 (10.5MB)
used = 0 (0.0MB)
free = 11010048 (10.5MB)
0.0% used
To Space:
capacity = 11010048 (10.5MB)
used = 0 (0.0MB)
free = 11010048 (10.5MB)
0.0% used
PS Old Generation
capacity = 179306496 (171.0MB)
used = 95233752 (90.82198333740234MB)
free = 84072744 (80.17801666259766MB)
53.11227095754523% used
3034 interned Strings occupying 246616 bytes.
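Comparing the two snapshots leads to the following observations:
1. In Java, the smallest scope of a variable is the code block, not the method; once execution leaves the block, objects referenced only from inside it become eligible for collection.
2. Heap usage in the two snapshots does not differ much and is nowhere near full, so a garbage collection must have run in between; otherwise heap usage after the OOM would be close to 100%.
3. At the time of an OOM, the thread running the offending code is not the only one that can fail to allocate memory. Code in any other thread that tries to allocate around that moment runs into the same problem. So it may well be that the open-source component cleaned up its own resources on purpose after hitting the error.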
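To repeat this kind of before/after heap comparison from inside the process, something like the sketch below could be used. It relies on the standard java.lang.management API; the class name HeapSnapshot and the 32 MB scratch array are my own assumptions, and the pool names that get printed depend on the garbage collector in use. System.gc() is only a hint, so whether and when the block-scoped array is actually reclaimed is up to the JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration only.
public class HeapSnapshot {

    /** Returns the used bytes of every heap memory pool (eden, survivor, old generation, ...). */
    static Map<String, Long> usedBytesByPool() {
        Map<String, Long> used = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) { // null if the pool is no longer valid
                    used.put(pool.getName(), usage.getUsed());
                }
            }
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println("before: " + usedBytesByPool());
        {
            // Block-scoped: nothing outside this block keeps a reference to the array.
            byte[] scratch = new byte[32 * 1024 * 1024];
            scratch[0] = 1;
        }
        System.gc(); // a hint only; the JVM may or may not collect here
        System.out.println("after:  " + usedBytesByPool());
    }
}
```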
Following the stack trace in the log leads to the relevant clues:
org.apache.http.impl.conn.PoolingHttpClientConnectionManager#requestConnection
    public ConnectionRequest requestConnection(
            final HttpRoute route,
            final Object state) {
        Args.notNull(route, "HTTP route");
        if (this.log.isDebugEnabled()) {
            this.log.debug("Connection request: " + format(route, state) + formatStats(route));
        }
        Asserts.check(!this.isShutDown.get(), "Connection pool shut down");
        final Future<CPoolEntry> future = this.pool.lease(route, state, null);
        return new ConnectionRequest() {
            // remaining code omitted
org.apache.http.impl.execchain.MainClientExec#execute
    @Override
    public CloseableHttpResponse execute(
            final HttpRoute route,
            final HttpRequestWrapper request,
            final HttpClientContext context,
            final HttpExecutionAware execAware) throws IOException, HttpException {
        Args.notNull(route, "HTTP route");
        Args.notNull(request, "HTTP request");
        Args.notNull(context, "HTTP context");
        // irrelevant code in between omitted
        } catch (final Error error) {
            connManager.shutdown();
            throw error;
        }
    }
org.apache.http.impl.execchain.MinimalClientExec#execute
    public CloseableHttpResponse execute(
            final HttpRoute route,
            final HttpRequestWrapper request,
            final HttpClientContext context,
            final HttpExecutionAware execAware) throws IOException, HttpException {
        Args.notNull(route, "HTTP route");
        Args.notNull(request, "HTTP request");
        Args.notNull(context, "HTTP context");
        // irrelevant code in between omitted
        } catch (final Error error) {
            connManager.shutdown();
            throw error;
        }
    }
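Combining this with the third observation (other threads can also fail to allocate memory around the time of the OOM), a plausible guess is this: while a thread was running MainClientExec.execute, the offending thread (the one parsing the Excel file with POI) kept consuming memory, so this thread ran short of memory as well and threw java.lang.OutOfMemoryError. The execute method caught the Error and shut down connManager (the catch (final Error error) block shown above), which is why the scheduled task kept failing with "Connection pool shut down" from then on.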
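The sequence guessed at here can be simulated in a few lines. This is a sketch under my own assumptions: TinyPool and TinyExec are hypothetical stand-ins that only mimic the pattern "check the shutdown flag before leasing" and "catch Error, shut down, rethrow"; they are not the HttpClient classes, and the OutOfMemoryError is thrown by hand.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical simulation; none of these classes come from HttpClient.
public class PoolShutdownDemo {

    /** Stand-in for a connection manager with a shutdown flag. */
    static class TinyPool {
        private final AtomicBoolean isShutDown = new AtomicBoolean(false);

        String requestConnection() {
            if (isShutDown.get()) {
                throw new IllegalStateException("Connection pool shut down");
            }
            return "connection";
        }

        void shutdown() {
            isShutDown.set(true);
        }
    }

    /** Stand-in for an executor that shuts the pool down when it sees an Error. */
    static class TinyExec {
        private final TinyPool connManager;

        TinyExec(TinyPool connManager) {
            this.connManager = connManager;
        }

        String execute(boolean simulateOom) {
            try {
                String conn = connManager.requestConnection();
                if (simulateOom) {
                    throw new OutOfMemoryError("simulated");
                }
                return "response via " + conn;
            } catch (Error error) {
                // Same shape as the catch block quoted above: clean up, then rethrow.
                connManager.shutdown();
                throw error;
            }
        }
    }

    /** Runs three calls (normal, simulated OOM, normal) and describes each outcome. */
    static String[] run() {
        TinyExec exec = new TinyExec(new TinyPool());
        boolean[] simulateOom = {false, true, false};
        String[] outcomes = new String[simulateOom.length];
        for (int i = 0; i < simulateOom.length; i++) {
            try {
                outcomes[i] = exec.execute(simulateOom[i]);
            } catch (Throwable t) {
                outcomes[i] = t.getClass().getSimpleName() + ": " + t.getMessage();
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        for (String outcome : run()) {
            System.out.println(outcome);
        }
    }
}
```

In this simulation the first call succeeds, the second fails with the simulated OutOfMemoryError, and the third fails with "Connection pool shut down" even though memory is no longer a problem. That matches the pattern in the log, where one OutOfMemoryError is followed by repeated IllegalStateException entries.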
The log supports this guess (a java.lang.OutOfMemoryError was thrown in the scheduled task thread as well):
2022-03-09 11:22:09,363 ERROR org.springframework.scheduling.support.TaskUtils$LoggingErrorHandler - Unexpected error occurred in scheduled task
java.lang.OutOfMemoryError: GC overhead limit exceeded
2022-03-09 11:22:09,363 ERROR org.springframework.scheduling.support.TaskUtils$LoggingErrorHandler - Unexpected error occurred in scheduled task
java.lang.IllegalStateException: Connection pool shut down
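java.lang.OutOfMemoryError by itself does not put the JVM into an unknown state. Frameworks at the application layer each have their own strategy for dealing with java.lang.OutOfMemoryError: some ignore the error, while others consider recovery pointless and clean up their resources (as the httpClient component in the errors above does).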
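For contrast with the "clean up and give up" strategy, here is a sketch of the "ignore and carry on" strategy. It is only loosely modeled on what the repeated "Unexpected error occurred in scheduled task" lines in the log suggest; LogAndContinue is a hypothetical class of my own, not Spring's implementation, and the OutOfMemoryError is again thrown by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a log-and-continue policy.
public class ErrorPolicyDemo {

    /** Wraps a task, records any Throwable it throws, and lets the caller continue. */
    static class LogAndContinue implements Runnable {
        private final Runnable delegate;
        private final List<String> log;

        LogAndContinue(Runnable delegate, List<String> log) {
            this.delegate = delegate;
            this.log = log;
        }

        @Override
        public void run() {
            try {
                delegate.run();
            } catch (Throwable t) {
                // Errors (including OutOfMemoryError) are logged, not rethrown.
                log.add("Unexpected error occurred in scheduled task: " + t);
            }
        }
    }

    /** Runs a task three times; the second run throws a simulated OutOfMemoryError. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        int[] calls = {0};
        Runnable task = () -> {
            int n = ++calls[0];
            if (n == 2) {
                throw new OutOfMemoryError("simulated");
            }
            log.add("run " + n + " ok");
        };
        Runnable wrapped = new LogAndContinue(task, log);
        for (int i = 0; i < 3; i++) {
            wrapped.run();
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

With this policy the third run still executes after the simulated error. Combined with a component that shuts itself down on Error, this is exactly the mix that produces an endless stream of follow-up failures: the scheduler keeps calling, and the client keeps refusing.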