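Troubleshooting a Reactor-Netty Connection Pool Error

Keywords

PooledConnectionProvider

Preface

The csimple project uses reactor-netty. Because reactor-netty is asynchronous, it makes asynchronous interaction with third-party APIs straightforward at very low resource cost, and combined with Dubbo's asynchronous programming model it improves performance.

Jar version

  • reactor-netty version 0.8.8.RELEASE

Configuration code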

@Bean
public WebClient webClient() throws SSLException {
    SslContext sslContext = SslContextBuilder.forClient()
            .trustManager(InsecureTrustManagerFactory.INSTANCE)
            .build();

    TcpClient tcpClient = TcpClient.create() // uses the default ConnectionProvider (a connection pool)
            .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 10000) // connect timeout: 10 seconds
            .doOnConnected(connection -> connection.addHandlerLast(new ReadTimeoutHandler(10)) // read timeout: 10 seconds
                    .addHandlerLast(new WriteTimeoutHandler(10))); // write timeout: 10 seconds

    HttpClient httpClient = HttpClient.from(tcpClient).secure(t -> t.sslContext(sslContext));
    ClientHttpConnector connector = new ReactorClientHttpConnector(httpClient);
    return WebClient.builder()
            .clientConnector(connector)
            .defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
            .build();
}

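Problem

  • reactor-netty supports connection pooling (channel reuse). With a pool, the client keeps connections to the server open and takes one from the pool for the next request, which avoids the cost of another TCP handshake and improves performance through reuse.
  • With pooling enabled, if a channel goes unused for a long time (the exact duration was not measured; roughly half an hour, and it probably depends on the server), the server closes the connection on its own initiative, but WebClient still holds on to it.
  • Once the server has dropped a connection that the client still holds, the next request takes that connection from the pool, and using it produces the following error: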
io.netty.handler.timeout.ReadTimeoutException: null
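
The sequence above can be modelled without any networking. The following is only a toy sketch in plain Java, not reactor-netty code: FakeConnection, StaleConnectionDemo, and the 30-minute server limit are all hypothetical names and values chosen for illustration. It shows why a pool that never checks idle time hands back a connection the server has already dropped.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the failure: the pool reuses a connection the server has already dropped. */
class StaleConnectionDemo {

    /** Hypothetical stand-in for a pooled channel; not a reactor-netty type. */
    static final class FakeConnection {
        long lastUsedMillis;
        FakeConnection(long nowMillis) { this.lastUsedMillis = nowMillis; }
    }

    /** Assumed server-side idle limit (30 minutes, the rough figure observed above). */
    static final long SERVER_IDLE_LIMIT_MILLIS = 30 * 60 * 1000L;

    private final Deque<FakeConnection> idle = new ArrayDeque<>();

    /** Naive acquire: reuse whatever is pooled, without checking how long it sat idle. */
    FakeConnection acquire(long nowMillis) {
        FakeConnection c = idle.poll();
        return c != null ? c : new FakeConnection(nowMillis);
    }

    /** Puts a working connection back into the pool and records when it was last used. */
    void release(FakeConnection c, long nowMillis) {
        c.lastUsedMillis = nowMillis;
        idle.push(c);
    }

    /** Server side of the model: anything idle beyond the limit has been closed. */
    static boolean serverStillHasIt(FakeConnection c, long nowMillis) {
        return nowMillis - c.lastUsedMillis <= SERVER_IDLE_LIMIT_MILLIS;
    }

    /** Returns true if a request sent at nowMillis would reach the server. */
    boolean request(long nowMillis) {
        FakeConnection c = acquire(nowMillis);
        boolean ok = serverStillHasIt(c, nowMillis);
        if (ok) {
            release(c, nowMillis); // only healthy connections go back to the pool
        }
        return ok;
    }

    public static void main(String[] args) {
        StaleConnectionDemo pool = new StaleConnectionDemo();
        System.out.println(pool.request(0L));                        // new connection
        System.out.println(pool.request(60_000L));                   // reused after 1 idle minute
        System.out.println(pool.request(60_000L + 31 * 60 * 1000L)); // reused after 31 idle minutes
    }
}
```

In this model the third request fails because the pool cannot tell that the server side is gone. In the real client that failure surfaces as the ReadTimeoutException shown above, since no response ever arrives on the dead channel.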

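Solutions

1. Upgrade Reactor-Netty to 0.9.0.RELEASE or later and configure a maximum idle time for pooled connections.
ConnectionProvider pool = ConnectionProvider.elastic("WebClient Pool", Duration.ofMinutes(1)); // set the maximum idle time of pooled connections to 1 minute
TcpClient tcpClient = TcpClient.create(pool);
2. Stop using the connection pool and open a new connection for every request.
TcpClient tcpClient = TcpClient.newConnection(); // switch the ConnectionProvider behind TcpClient to NewConnectionProvider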

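Further reading

Version 0.9.0 adds the PooledRefMetadata interface, which represents metadata about a pooled object. It provides an idleTime method that returns how long the pooled object has been idle. The method's Javadoc describes exactly the problem encountered here.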

Design notes:
This can be useful to do active idle eviction (eg. some loadbalancers will terminate a TCP connection unilaterally after x minutes).

The evictionPredicate from the PoolConfig can look at this time even in the release phase, because it MUST be reset to 0L before the application of the recycler function and evictionPredicate itself, so it will always look "fresh".

Eviction can happen when an acquire() encounters an available element that is detected as idle.
It could then either:

  • only remove that element and call the allocator
    OR
  • continuously loop until it finds a valid available element, only calling the allocator when it ends up finding no valid element

Another possibility is to use a reaper thread that actively removes idle resources from the available set (but that would need some more synchronization).
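
The second option in the design notes (loop over available elements, call the allocator only when none is valid) can be sketched in plain Java. This is a simplified, single-threaded illustration under stated assumptions, not the library's implementation: IdleEvictingPool, Slot, and the explicit nowMillis parameter are hypothetical names introduced here so the behaviour can be followed without real timers.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Toy pool that drops idle elements during acquire(), looping until a fresh one is found. */
class IdleEvictingPool<T> {

    /** Pairs a pooled element with the time it was released back to the pool. */
    private static final class Slot<E> {
        final E element;
        final long releasedAtMillis;
        Slot(E element, long releasedAtMillis) {
            this.element = element;
            this.releasedAtMillis = releasedAtMillis;
        }
    }

    private final Deque<Slot<T>> available = new ArrayDeque<>();
    private final Supplier<T> allocator;
    private final long maxIdleMillis;
    private int evicted;

    IdleEvictingPool(Supplier<T> allocator, long maxIdleMillis) {
        this.allocator = allocator;
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Returns a pooled element that is not idle for too long, or allocates a new one. */
    T acquire(long nowMillis) {
        Slot<T> slot;
        while ((slot = available.poll()) != null) {
            if (nowMillis - slot.releasedAtMillis <= maxIdleMillis) {
                return slot.element; // still fresh: reuse it
            }
            evicted++; // idle for too long: discard it and keep looking
        }
        return allocator.get(); // nothing valid left in the pool
    }

    /** Returns an element to the pool; its idle time starts counting from nowMillis. */
    void release(T element, long nowMillis) {
        available.push(new Slot<>(element, nowMillis));
    }

    int evictedCount() {
        return evicted;
    }

    public static void main(String[] args) {
        int[] created = {0};
        IdleEvictingPool<String> pool =
                new IdleEvictingPool<>(() -> "conn-" + (++created[0]), 60_000L);

        String first = pool.acquire(0L);
        pool.release(first, 0L);
        System.out.println(pool.acquire(30_000L));  // idle 30 s, within the limit: reused
        pool.release(first, 30_000L);
        System.out.println(pool.acquire(120_000L)); // idle 90 s, over the limit: evicted, new element
        System.out.println(pool.evictedCount());
    }
}
```

With a 60-second limit, the second acquire reuses the element and the third evicts it and allocates a new one. A real pool would also have to close the evicted connection and handle concurrent callers, which this sketch leaves out.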

References

1. Spring Webclient Read Timeout after being idle for several minutes
