最近我组使用的spring cloud gateway 线上偶发返回500,后台查看报错日志信息,发现有一条下面的异常:
reactor.netty.channel.AbortedException: Connection has been closed BEFORE response
网上狂搜,以我以往的解决思路,欲解决问题,首先复现问题,然后再解决问题,这种最放心,若是实在无法复现问题,那就先解决问题,然后再持续观察
参考GitHub 几位大神的经验,自己也在本地进行模拟,不过github里面的模拟方案大都是直接访问reactor-nettty工具包,并且并发量比较大的情况进行重现。跟我组并发性不到100qps还不能相提并论
首先准备我的祸源,就是产生问题的client服务,实际该问题主要是使用spring5后一个webclient(见另一篇博文)非阻塞的web请求客户端引起的,他在大量的并发访问时就会产生该问题,他底层是使用reactor-nettty(待完善)实现。代码如下:
@RestController
@RequestMapping("/requestPost")
@AllArgsConstructor
public class Controller {
private SampleApiClient sampleApiClient;
@GetMapping
public Mono trigger() {
int count = 10000;
List list = new ArrayList();
for (int i = 0; i < count; i ++) {
list.add(i);
}
// simulate concurrent API calls
return Flux.fromStream(list.stream())
.flatMap(id -> sampleApiClient.postSomethingWithExchange())
.collectList()
.map(results -> "success");
}
}
另外一个调用外部的类:
@Component
@AllArgsConstructor
public class SampleApiClient {
private final static String PATH = "/";
private WebClient sampleWebClient;
public Mono postSomethingWithExchange() {
return sampleWebClient.post().uri(PATH, System.currentTimeMillis())
.exchange();
}
}
配置类:
package com.thsnoopy.report.configuration;
import io.netty.channel.ChannelOption;
import io.netty.handler.timeout.ReadTimeoutHandler;
import io.netty.handler.timeout.WriteTimeoutHandler;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.PropertySource;
import org.springframework.http.client.reactive.ClientHttpConnector;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.util.DefaultUriBuilderFactory;
import reactor.netty.http.client.HttpClient;
import static org.springframework.web.reactive.function.client.ExchangeFilterFunctions.basicAuthentication;
@Configuration
public class WebClientConfiguration {
@Qualifier("sampleWebClient")
@Bean
public WebClient sampleWebClient(WebClient.Builder builder,
@Value("${api.sample.endpoint}") String endpoint,
@Value("${api.sample.connectTimeoutInMilliSecond}") int connectTimeoutInMilliSecond,
@Value("${api.sample.readWriteTimeoutInSecond}") int readWriteTimeoutInSecond) {
return builder
.uriBuilderFactory(uriBuilderFactory(endpoint))
.clientConnector(clientHttpConnector(connectTimeoutInMilliSecond, readWriteTimeoutInSecond))
.build();
}
private DefaultUriBuilderFactory uriBuilderFactory(String baseUri) {
DefaultUriBuilderFactory defaultUriBuilderFactory = new DefaultUriBuilderFactory(baseUri);
return defaultUriBuilderFactory;
}
private ClientHttpConnector clientHttpConnector(int connectTimeoutInMilliSecond, int readWriteTimeoutInSecond) {
HttpClient httpClient =
HttpClient.create()
// .keepAlive(false);
.tcpConfiguration(client ->
client.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, connectTimeoutInMilliSecond)
.doOnConnected(conn -> conn
.addHandlerLast(new ReadTimeoutHandler(readWriteTimeoutInSecond))
.addHandlerLast(new WriteTimeoutHandler(readWriteTimeoutInSecond))
)
);
return new ReactorClientHttpConnector(httpClient);
}
}
yml配置
server:
port: 8080
api:
sample:
endpoint: http://www.baidu.com
connectTimeoutInMilliSecond: 20000
readWriteTimeoutInSecond: 100
logging:
level:
reactor:
netty: debug
然后狂点:http://localhost:8080/requestPost 几次,就打印出相关的日志了:
eactor.netty.channel.AbortedException: Connection has been closed BEFORE response
at reactor.netty.http.HttpOperations.then(HttpOperations.java:131)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.springframework.http.client.reactive.AbstractClientHttpRequest.doCommit(AbstractClientHttpRequest.java:133)
at org.springframework.http.client.reactive.ReactorClientHttpRequest.setComplete(ReactorClientHttpRequest.java:114)
at org.springframework.web.reactive.function.BodyInserters.lambda$static$0(BodyInserters.java:62)
at org.springframework.web.reactive.function.client.DefaultClientRequestBuilder$BodyInserterRequest.writeTo(DefaultClientRequestBuilder.java:249)
at org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.lambda$exchange$1(ExchangeFunctions.java:103)
at org.springframework.http.client.reactive.ReactorClientHttpConnector.lambda$connect$2(ReactorClientHttpConnector.java:110)
at reactor.netty.http.client.HttpClientConnect$HttpClientHandler.requestWithBody(HttpClientConnect.java:587)
at reactor.netty.http.client.HttpClientConnect$HttpIOHandlerObserver.lambda$onStateChange$0(HttpClientConnect.java:440)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:44)
at reactor.netty.http.client.HttpClientConnect$HttpIOHandlerObserver.onStateChange(HttpClientConnect.java:441)
at reactor.netty.ReactorNetty$CompositeConnectionObserver.onStateChange(ReactorNetty.java:470)
at reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.onStateChange(PooledConnectionProvider.java:512)
at reactor.netty.resources.PooledConnectionProvider$PooledConnection.onStateChange(PooledConnectionProvider.java:451)
at reactor.netty.channel.ChannelOperationsHandler.channelActive(ChannelOperationsHandler.java:62)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:225)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:211)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:204)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelActive(CombinedChannelDuplexHandler.java:414)
at io.netty.channel.ChannelInboundHandlerAdapter.channelActive(ChannelInboundHandlerAdapter.java:69)
at io.netty.channel.CombinedChannelDuplexHandler.channelActive(CombinedChannelDuplexHandler.java:213)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:225)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:211)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:204)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1396)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:225)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:211)
at io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:906)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:311)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:341)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:670)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:617)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:534)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:748)
google一下吧,果然是个bug,见一下所述(https://github.com/reactor/reactor-netty/issues/796):
we root caused the issue to two things :
the exception is not very self descriptive, but its a valid one
the root cause of the exception : a connect timeout due to too many concurrent connection open/acquired.
We are going to change the default connection pool for our clients in 0.9, and document how to set those up. What happens right now is there is no limit, so you are opening 80000 connections to a single destination, thats more than ephemeral ports and thats overall a lot of pending events that could be delayed and cause a connect timeout. Usually, connection pools are capped by default, e.g. Golang has 100 (including only "2" by destination), other libraries run 64, 100 or again 2-6 (the RFC recommendation for browsers). We should limit the number of open connections too by default and we benchmarked with ConnectionProvider.fixed("test", 500) without any troubles.
So i'm leaving the issue open until we change that default and document it overall but you should try as a workaround to set your client this way:
HttpClient.create(ConnectionProvider.fixed("test", 500))
如上所示,大概意思是会在reactor-netty 0.9中解决该问题,问题的根源是默认连接池数没有做限制,单个目标要打开80000个连接,就会导致很多等待中的端口和相关事件延迟和超时
并且在最新版本里面加上了如下代码。
那就更新版本吧,修改入戏pom配置
org.springframework.boot
spring-boot-starter-webflux
2.2.2.RELEASE
该版本webflux会使用最新的reactor-netty 0.9.2.RELEASE
然后再次发起测试
接口依然抛出该异常,服务日志依然:reactor.netty.channel.AbortedException: Connection has been closed BEFORE response
为何?
三、spring cloud gateway 测试
既然我组出问题的根源是在spring cloud gateway ,那就先剖析调用代码,顺藤摸瓜,整个网络调用的核心类是在一个
NettyRoutingFilter这里面,先欣赏下代码:
public class NettyRoutingFilter implements GlobalFilter, Ordered {
private static final Log log = LogFactory.getLog(NettyRoutingFilter.class);
private final HttpClient httpClient;
private final ObjectProvider> headersFiltersProvider;
private final HttpClientProperties properties;
private volatile List headersFilters;
public NettyRoutingFilter(HttpClient httpClient, ObjectProvider> headersFiltersProvider, HttpClientProperties properties) {
this.httpClient = httpClient;
this.headersFiltersProvider = headersFiltersProvider;
this.properties = properties;
}
public List getHeadersFilters() {
if (this.headersFilters == null) {
this.headersFilters = (List)this.headersFiltersProvider.getIfAvailable();
}
return this.headersFilters;
}
public int getOrder() {
return 2147483647;
}
public Mono filter(ServerWebExchange exchange, GatewayFilterChain chain) {
URI requestUrl = (URI)exchange.getRequiredAttribute(ServerWebExchangeUtils.GATEWAY_REQUEST_URL_ATTR);
String scheme = requestUrl.getScheme();
if (!ServerWebExchangeUtils.isAlreadyRouted(exchange) && ("http".equals(scheme) || "https".equals(scheme))) {
ServerWebExchangeUtils.setAlreadyRouted(exchange);
ServerHttpRequest request = exchange.getRequest();
HttpMethod method = HttpMethod.valueOf(request.getMethodValue());
String url = requestUrl.toString();
HttpHeaders filtered = HttpHeadersFilter.filterRequest(this.getHeadersFilters(), exchange);
DefaultHttpHeaders httpHeaders = new DefaultHttpHeaders();
filtered.forEach(httpHeaders::set);
String transferEncoding = request.getHeaders().getFirst("Transfer-Encoding");
boolean chunkedTransfer = "chunked".equalsIgnoreCase(transferEncoding);
boolean preserveHost = (Boolean)exchange.getAttributeOrDefault(ServerWebExchangeUtils.PRESERVE_HOST_HEADER_ATTRIBUTE, false);
Flux responseFlux = ((RequestSender)this.httpClient.chunkedTransfer(chunkedTransfer).request(method).uri(url)).send((req, nettyOutbound) -> {
req.headers(httpHeaders);
if (preserveHost) {
String host = request.getHeaders().getFirst("Host");
req.header("Host", host);
}
if (log.isTraceEnabled()) {
nettyOutbound.withConnection((connection) -> {
log.trace("outbound route: " + connection.channel().id().asShortText() + ", inbound: " + exchange.getLogPrefix());
});
}
return nettyOutbound.options(SendOptions::flushOnEach).send(request.getBody().map((dataBuffer) -> {
return ((NettyDataBuffer)dataBuffer).getNativeBuffer();
}));
}).responseConnection((res, connection) -> {
exchange.getAttributes().put(ServerWebExchangeUtils.CLIENT_RESPONSE_ATTR, res);
exchange.getAttributes().put(ServerWebExchangeUtils.CLIENT_RESPONSE_CONN_ATTR, connection);
ServerHttpResponse response = exchange.getResponse();
HttpHeaders headers = new HttpHeaders();
res.responseHeaders().forEach((entry) -> {
headers.add((String)entry.getKey(), (String)entry.getValue());
});
String contentTypeValue = headers.getFirst("Content-Type");
if (StringUtils.hasLength(contentTypeValue)) {
exchange.getAttributes().put("original_response_content_type", contentTypeValue);
}
HttpStatus status = HttpStatus.resolve(res.status().code());
if (status != null) {
response.setStatusCode(status);
} else {
if (!(response instanceof AbstractServerHttpResponse)) {
throw new IllegalStateException("Unable to set status code on response: " + res.status().code() + ", " + response.getClass());
}
((AbstractServerHttpResponse)response).setStatusCodeValue(res.status().code());
}
HttpHeaders filteredResponseHeaders = HttpHeadersFilter.filter(this.getHeadersFilters(), headers, exchange, Type.RESPONSE);
if (!filteredResponseHeaders.containsKey("Transfer-Encoding") && filteredResponseHeaders.containsKey("Content-Length")) {
response.getHeaders().remove("Transfer-Encoding");
}
exchange.getAttributes().put(ServerWebExchangeUtils.CLIENT_RESPONSE_HEADER_NAMES, filteredResponseHeaders.keySet());
response.getHeaders().putAll(filteredResponseHeaders);
return Mono.just(res);
});
if (this.properties.getResponseTimeout() != null) {
responseFlux = responseFlux.timeout(this.properties.getResponseTimeout(), Mono.error(new TimeoutException("Response took longer than timeout: " + this.properties.getResponseTimeout()))).onErrorMap(TimeoutException.class, (th) -> {
return new ResponseStatusException(HttpStatus.GATEWAY_TIMEOUT, th.getMessage(), th);
});
}
return responseFlux.then(chain.filter(exchange));
} else {
return chain.filter(exchange);
}
}
}
这块代码很简单,就是调用reactor-netty的 httpclient 接口,整个是netty 响应式的非阻塞网络方案,参考:https://projectreactor.io/docs/netty/release/reference/index.html#http-client
所以核心代码还是来自于reactor-netty
使用jmeter压测spring cloud gateway 接口
线程配置:
结果如下:
持续测试半个多小时,期间没有出现一笔相关错误
陷入沉思,思索片刻后,决定只有2种方案可做
1、给作者报bug
2、深入阅读reactor-netty,协助作者解决
只能持续关注了
后续:
参考:https://github.com/spring-cloud/spring-cloud-gateway/issues/1148
最近有一则解决方案,产生的根源如下,与上述大同小异:
在 reactor-netty0.9.0版本后 添加了一个连接池的 maxIdleTime参数,如果达到这个时间后,这个channel将会关闭
如果不设置他,netty将不会关闭,一直保持无限制的状态。然后apache的服务将会在2s后关闭这个连接,而导致报:
Connection has been closed BEFORE response
的异常。
该作者提交了代码,在spring cloud gateway HttpClientProperties.java 类中添加了 该时间:
希望在spring cloud gateway正式版 稳定版 更新后
加入连接池、maxIdleTime 再做观察
希望可以解决问题
参考:https://github.com/reactor/reactor-netty/issues/796
https://github.com/reactor/reactor-netty/issues/632