整个问题排查过程中基本把Hikari连接池源码看完了,也发现了Hikari内部的一些实现细节,更多细节参见:Hikari 数据库连接池内部源码实现细节
最近发布项目测试环境后,发现后台经常隔28分钟时间就打印mysql 连接已经被关闭警告。(应用程序每28分钟访问一次数据库)
数据库: mysql 5.7
数据库驱动:mysql-connector-java-8.0.23
连接池:HikariCP-3.4.5
SpringBoot 2.4.3
连接池配置如下:
spring.datasource.username=${name}
spring.datasource.password=${pwd}
spring.datasource.url=jdbc:mysql://${ip}:3306/${dbname}?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=Asia/Shanghai
spring.datasource.type=com.zaxxer.hikari.HikariDataSource
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.hikari.minimum-idle=10
spring.datasource.hikari.maximum-pool-size=20
spring.datasource.hikari.connection-test-query=SELECT 1 FROM DUAL
spring.datasource.hikari.validation-timeout=10000
spring.datasource.hikari.auto-commit=true
警告信息如下:
2021-03-19 13:25:58.486 WARN 21806 --- [scheduling-1] com.zaxxer.hikari.pool.PoolBase :
slave - Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl@2e2ce118 (No operations allowed after connection closed.).
Possibly consider using a shorter maxLifetime value.
虽然这只是一个警告,并不影响程序执行,但还是很好奇想要排查一下具体原因。
warn信息的描述,程序通过已经关闭的连接来访问数据数据库,这是不允许的操作,并且警告中提醒减小maxLifetime
的值来解决这个问题。
看到这个警告之后,笔者首先想到的就是mysql 8小时主动断开问题:如果hikari的maxLifetime
的值超过了mysql默认的wait_timeout
设置的时间28800s (8小时),实际上mysql已经主动关闭了连接。但示例中并没有配置hikari的maxLifetime
选项,采用的是默认。查看hikari的源码发现了如下几个默认配置
com.zaxxer.hikari.HikariConfig
源码中的类变量
private static final long CONNECTION_TIMEOUT = SECONDS.toMillis(30); 30s
private static final long VALIDATION_TIMEOUT = SECONDS.toMillis(5); 5s
private static final long IDLE_TIMEOUT = MINUTES.toMillis(10); 20分钟
private static final long MAX_LIFETIME = MINUTES.toMillis(30); 30分钟
private static final int DEFAULT_POOL_SIZE = 10;
hikari中默认连接存活的最大时间是30分钟,小于mysql默认的wait_timeout
28800s (8小时),按道理不会是服务端主动主动关闭连接导致。
那会不会是mysql的wait_timeout
被修改过??? 通过如下语句查询发现并没有被修改。
SHOW VARIABLES LIKE '%timeout%'
当排除这个常规问题后,笔者尝试搜索了有没有其它可能导致这个警告,最终还是有一个小发现:
搜索到的案例是:虽然mysql服务端通过SHOW VARIABLES LIKE '%timeout%'
查出来的时间28800s,但是实际上,/etc/my.cnf
中配置的超时时间更小,导致mysql服务端主动关闭连接。
但是经过和dba确认,也排除了这个情况。
既然以上两种情况都排除了,那只能深入源码,在测试环境远程断点调试。先搜索下警告打印的位置,只有一处在isConnectionAlive
方法中.
源码如下: com.zaxxer.hikari.pool.PoolBase#isConnectionAlive
boolean isConnectionAlive(final Connection connection)
{
try {
try {
setNetworkTimeout(connection, validationTimeout);
final int validationSeconds = (int) Math.max(1000L, validationTimeout) / 1000;
if (isUseJdbc4Validation) {
return connection.isValid(validationSeconds);
}
try (Statement statement = connection.createStatement()) {
if (isNetworkTimeoutSupported != TRUE) {
setQueryTimeout(statement, validationSeconds);
}
statement.execute(config.getConnectionTestQuery());
}
}
finally {
setNetworkTimeout(connection, networkTimeout);
if (isIsolateInternalQueries && !isAutoCommit) {
connection.rollback();
}
}
return true;
}
catch (Exception e) {
lastConnectionFailure.set(e);
logger.warn("{} - Failed to validate connection {} ({}). Possibly consider using a shorter maxLifetime value.",
poolName, connection, e.getMessage());
return false;
}
}
该方法用于检测connection的有效性,catch到异常后,只是简单打印文章开头所提的异常警告信息,并返回false,表示连接失效。然后会继续从连接池中获取有效的连接。
获取连接的代码如下: com.zaxxer.hikari.pool.HikariPool#getConnection(long hardTimeout)
public Connection getConnection(final long hardTimeout) throws SQLException
{
suspendResumeLock.acquire();
final long startTime = currentTime();
try {
long timeout = hardTimeout;
do {
PoolEntry poolEntry = connectionBag.borrow(timeout, MILLISECONDS);
if (poolEntry == null) {
break; // We timed out... break and throw exception
}
final long now = currentTime();
if (poolEntry.isMarkedEvicted() || (elapsedMillis(poolEntry.lastAccessed, now) > aliveBypassWindowMs && !isConnectionAlive(poolEntry.connection))) {
closeConnection(poolEntry, poolEntry.isMarkedEvicted() ? EVICTED_CONNECTION_MESSAGE : DEAD_CONNECTION_MESSAGE);
timeout = hardTimeout - elapsedMillis(startTime);
}
else {
metricsTracker.recordBorrowStats(poolEntry, startTime);
return poolEntry.createProxyConnection(leakTaskFactory.schedule(poolEntry), now);
}
} while (timeout > 0L);
metricsTracker.recordBorrowTimeoutStats(startTime);
throw createTimeoutException(startTime);
}
catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new SQLException(poolName + " - Interrupted during connection acquisition", e);
}
finally {
suspendResumeLock.release();
}
}
为了打印更多的异常栈信息,笔者修改了isConnectionAlive
方法源码,在catch到异常后,打印整个异常栈信息,最终得到如下日志:
2021-03-19 13:25:58.486 WARN 21806 --- [scheduling-1] com.zaxxer.hikari.pool.PoolBase : slave - Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl@2e2ce118 (No operations allowed after connection closed.). Possibly consider using a shorter maxLifetime value.
2021-03-19 13:25:58.489 ERROR 21806 --- [scheduling-1] com.zaxxer.hikari.pool.PoolBase : poolBase:
java.sql.SQLNonTransientConnectionException: No operations allowed after connection closed.
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:110) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:73) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:73) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.jdbc.ConnectionImpl.setNetworkTimeout(ConnectionImpl.java:2495) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:563) ~[classes!/:na]
at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:173) ~[classes!/:na]
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186) [HikariCP-3.4.5.jar!/:na]
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162) [HikariCP-3.4.5.jar!/:na]
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100) [HikariCP-3.4.5.jar!/:na]
at org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource.getConnection(AbstractRoutingDataSource.java:194) [spring-jdbc-5.3.4.jar!/:5.3.4]
at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:158) [spring-jdbc-5.3.4.jar!/:5.3.4]
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:116) [spring-jdbc-5.3.4.jar!/:5.3.4]
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:79) [spring-jdbc-5.3.4.jar!/:5.3.4]
.........................
Caused by: com.mysql.cj.exceptions.ConnectionIsClosedException: No operations allowed after connection closed.
at sun.reflect.GeneratedConstructorAccessor32.newInstance(Unknown Source) ~[na:na]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_91]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_91]
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.NativeSession.checkClosed(NativeSession.java:1171) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:573) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.jdbc.ConnectionImpl.setNetworkTimeout(ConnectionImpl.java:2491) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
... 44 common frames omitted
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure
The last packet successfully received from the server was 1,758,048 milliseconds ago. The last packet sent successfully to the server was 1,758,049 milliseconds ago.
at sun.reflect.GeneratedConstructorAccessor31.newInstance(Unknown Source) ~[na:na]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_91]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_91]
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:167) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.protocol.a.NativeProtocol.send(NativeProtocol.java:577) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.protocol.a.NativeProtocol.sendCommand(NativeProtocol.java:632) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:948) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.protocol.a.NativeProtocol.sendQueryString(NativeProtocol.java:894) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.NativeSession.execSQL(NativeSession.java:1073) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.jdbc.StatementImpl.executeInternal(StatementImpl.java:722) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.jdbc.StatementImpl.execute(StatementImpl.java:646) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:169) ~[classes!/:na]
... 42 common frames omitted
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.8.0_91]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[na:1.8.0_91]
at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[na:1.8.0_91]
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_91]
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.8.0_91]
at com.mysql.cj.protocol.a.SimplePacketSender.send(SimplePacketSender.java:55) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.protocol.a.TimeTrackingPacketSender.send(TimeTrackingPacketSender.java:50) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
at com.mysql.cj.protocol.a.NativeProtocol.send(NativeProtocol.java:568) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
... 49 common frames omitted
从最后的Caused by: java.net.SocketException: Broken pipe 似乎只能看出在发送数据时socket已经被断开。
于是想到是不是socket的connectionTime
和 socketTimeout
太短,网络较差导致发送sql请求时超时???
又是一通debug之后,深入到mysql-connector-java客户端源码中,最终发现了如下两段重要的代码,用于初始化mysql的连接。
其中成员变量propertySet
则保存了全部的连接参数以及数据连接配置参数。
com.mysql.cj.NativeSession#connect
源码
com.mysql.cj.protocol.StandardSocketFactory#configureSocket
源码
com.mysql.cj.CoreSession#propertySet
中保存的连接配置属性如下:
result = {Properties@10050} size = 167
"useCompression" -> "false"
"maxRows" -> "-1"
"tcpTrafficClass" -> "0"
"allowNanAndInf" -> "false"
"databaseTerm" -> "CATALOG"
"sendFractionalSeconds" -> "true"
"loadBalanceBlocklistTimeout" -> "0"
"tcpKeepAlive" -> "true"
"allowReplicaDownConnections" -> "false"
"maxQuerySizeToLog" -> "2048"
"verifyServerCertificate" -> "false"
"cacheServerConfiguration" -> "false"
"dontCheckOnDuplicateKeyUpdateInSQL" -> "false"
"paranoid" -> "false"
"allowSourceDownConnections" -> "false"
"xdevapi.ssl-truststore-type" -> "JKS"
"fallbackToSystemTrustStore" -> "true"
"enableQueryTimeouts" -> "true"
"blobSendChunkSize" -> "1048576"
"useNanosForElapsedTime" -> "false"
"readOnlyPropagatesToServer" -> "true"
"profilerEventHandler" -> "com.mysql.cj.log.LoggingProfilerEventHandler"
"secondsBeforeRetrySource" -> "30"
"parseInfoCacheFactory" -> "com.mysql.cj.PerConnectionLRUFactory"
"includeThreadNamesAsStatementComment" -> "false"
"xdevapi.compression" -> "PREFERRED"
"socketTimeout" -> "5000"
"socksProxyPort" -> "1080"
"requireSSL" -> "false"
"largeRowSizeThreshold" -> "2048"
"xdevapi.fallback-to-system-keystore" -> "true"
"allowMultiQueries" -> "false"
"rewriteBatchedStatements" -> "false"
"enablePacketDebug" -> "false"
"allowPublicKeyRetrieval" -> "false"
"noDatetimeStringSync" -> "false"
"includeThreadDumpInDeadlockExceptions" -> "false"
"enableEscapeProcessing" -> "true"
"failOverReadOnly" -> "true"
"readFromSourceWhenNoReplicas" -> "false"
"clientCertificateKeyStoreType" -> "JKS"
"overrideSupportsIntegrityEnhancementFacility" -> "false"
"disconnectOnExpiredPasswords" -> "true"
"clobberStreamingResults" -> "false"
"trustCertificateKeyStoreType" -> "JKS"
"jdbcCompliantTruncation" -> "false"
"useLocalTransactionState" -> "false"
"cacheCallableStmts" -> "false"
"rollbackOnPooledClose" -> "true"
"cacheResultSetMetadata" -> "false"
"loadBalanceValidateConnectionOnSwapServer" -> "false"
"selfDestructOnPingMaxOperations" -> "0"
"elideSetAutoCommits" -> "false"
"loadBalanceHostRemovalGracePeriod" -> "15000"
"serverConfigCacheFactory" -> "com.mysql.cj.util.PerVmServerConfigCacheFactory"
"cacheDefaultTimeZone" -> "true"
"allowLoadLocalInfile" -> "false"
"loadBalanceAutoCommitStatementThreshold" -> "0"
"getProceduresReturnsFunctions" -> "true"
"forceConnectionTimeZoneToSession" -> "false"
"useInformationSchema" -> "false"
"reportMetricsIntervalMillis" -> "30000"
"alwaysSendSetIsolation" -> "true"
"tinyInt1isBit" -> "true"
"xdevapi.ssl-keystore-type" -> "JKS"
"xdevapi.auth" -> "PLAIN"
"processEscapeCodesForPrepStmts" -> "true"
"emptyStringsConvertToZero" -> "true"
"sslMode" -> "DISABLED"
"xdevapi.connect-timeout" -> "10000"
"zeroDateTimeBehavior" -> "EXCEPTION"
"sendFractionalSecondsForTime" -> "true"
"maxReconnects" -> "3"
"autoDeserialize" -> "false"
"callableStmtCacheSize" -> "100"
"prepStmtCacheSqlLimit" -> "256"
"resultSetSizeThreshold" -> "100"
"useSSL" -> "false"
"metadataCacheSize" -> "50"
"useHostsInPrivileges" -> "true"
"tcpRcvBuf" -> "0"
"dontTrackOpenResources" -> "false"
"autoGenerateTestcaseScript" -> "false"
"characterEncoding" -> "utf-8"
"xdevapi.fallback-to-system-truststore" -> "true"
"nullDatabaseMeansCurrent" -> "false"
"useLocalSessionState" -> "false"
"detectCustomCollations" -> "false"
"useOldAliasMetadataBehavior" -> "false"
"autoReconnect" -> "false"
"autoReconnectForPools" -> "false"
"traceProtocol" -> "false"
"maxAllowedPacket" -> "16777216"
"populateInsertRowWithDefaultValues" -> "false"
"useUnbufferedInput" -> "true"
"ignoreNonTxTables" -> "false"
"useAffectedRows" -> "false"
"ha.loadBalanceStrategy" -> "random"
"loadBalancePingTimeout" -> "0"
"compensateOnDuplicateKeyUpdateCounts" -> "false"
最后发现在验证connection有效性时,connectionTime
和 socketTimeout
设置的就是数据源中配置的connection-timeout
和validation-timeout
,并通过ping命令检测发现,网络并不是很慢,所以网络请求超时这个可能也排除。
调试过程中发现mysql客户端和服务端之间Socket通信,心跳是通过TCP协议中默认的tcpKeepAlive
参数来维护心跳的。
通过搜索得知,linux中默认是2小时。
会不会服务器tcp_keepalive_time时间被修改过,但因为网络缓慢,心跳未接收到,导致服务器关闭了连接???
可以通过linux的如下命令查看
cat /proc/sys/net/ipv4/tcp_keepalive_time
在笔者的应用服务器上发现这个时间确实被改成了较短的180s,那么mysql所在的机器,很可能也是被改过。
于是找到dba查看该项的配置。同样是180s,但这个时间也不至于让心跳超时啊!
最终原因: 好在dba询问了一下为什么要查这个参数,然后笔者将警告信息以及排查的过程描述了一下,结果dba那边竟然告诉我有定时任务, 删除4分钟内没有请求的连接,而我的程序访问数据库时间间隔是28分钟一次,所以早就在mysql服务端关闭了(捂脸哭的表情)
那如何让这个告警不打印呢?
既然数据库服务端会在4分钟后删除没有请求的连接,我们可以减小Hikari中maxLifetime
的值小于该值,这样达到maxLifetime
时Hikari连接池会通过软驱逐的方式来主动关闭该连接,下次就不会获取到因超时而关闭的连接,也就不会出现警告了。
该参数就是用于控制连接池中一个connection的最大存活时间。
Hikari内部有一个ScheduledExecutorService
对象houseKeepingExecutorService
,每创建一个连接,就会被封装成一个PoolEntry
对象,然后放在定时任务中,定时时间就是设置的max-lifetime
。只要到达这个时间就会采取软驱逐的方式从池中移除。
除此之外,houseKeepingExecutorService
还用于每隔30s来检查一次池中的空闲连接、最大连接情况,并维护池中连接数的平衡。
何谓软驱逐
private volatile boolean evict;
字段来标记为需要关闭。下次有线程来获取到这个连接时,发现evict=true
则调用异步关闭方法。重新获取池中其它的连接。源码见: com.zaxxer.hikari.pool.HikariPool#createPoolEntry
private PoolEntry createPoolEntry()
{
try {
final PoolEntry poolEntry = newPoolEntry();
final long maxLifetime = config.getMaxLifetime();
if (maxLifetime > 0) {
// variance up to 2.5% of the maxlifetime
final long variance = maxLifetime > 10_000 ? ThreadLocalRandom.current().nextLong( maxLifetime / 40 ) : 0;
final long lifetime = maxLifetime - variance;
poolEntry.setFutureEol(houseKeepingExecutorService.schedule(
() -> {
if (softEvictConnection(poolEntry, "(connection has passed maxLifetime)", false /* not owner */)) {
addBagItem(connectionBag.getWaitingThreadCount());
}
},
lifetime, MILLISECONDS));
}
return poolEntry;
}
catch (ConnectionSetupException e) {
if (poolState == POOL_NORMAL) { // we check POOL_NORMAL to avoid a flood of messages if shutdown() is running concurrently
logger.error("{} - Error thrown while acquiring connection from data source", poolName, e.getCause());
lastConnectionFailure.set(e);
}
}
catch (Exception e) {
if (poolState == POOL_NORMAL) { // we check POOL_NORMAL to avoid a flood of messages if shutdown() is running concurrently
logger.debug("{} - Cannot acquire connection from data source", poolName, e);
}
}
return null;
}
整个问题排查过程中基本把Hikari连接池源码看完了,也发现了Hikari内部的一些实现细节,更多细节参见:Hikari 数据库连接池内部源码实现细节
如果Hikari连接池遇到Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl问题注意从下面三个方向排查。
max-lifetime
(默认30分钟)(对应springboot中的spring.datasource.hikari.max-lifetime
)是否大于了MySQL中wait_timeout
和interactive_timeout
(默认8小时)。/etc/my.cnf
配置文件中wait_timeout
和interactive_timeout
两个超时时间是否和通过sql SHOW VARIABLES LIKE '%timeout%'
查出的结果一致。