Environment: Spring Boot 2.1.4.RELEASE with a Jedis connection pool.
Tomcat's maximum thread count is set to 1000:
server:
  port: 9090
  tomcat:
    uri-encoding: utf-8
    max-threads: 1000
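Jedis connection pool configuration:
pool:
  max-active: 300 # maximum number of connections in the pool (a negative value means no limit)
  max-wait: -1 # maximum time to block waiting for a connection (a negative value means no limit)
  max-idle: 100 # maximum number of idle connections in the pool
  min-idle: 20 # minimum number of idle connections in the pool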
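1. Calling the business API from a browser times out, and the service log shows no output for the request.
2. The service node is visible in the Eureka console, and its status is normal.
3. In the service log, an Eureka-related entry shows up once every 5 minutes: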
23:47:36.206 INFO [trap-executor-0] c.n.d.s.r.a.ConfigClusterResolver - Resolving eureka endpoints via configuration
23:52:36.206 INFO [trap-executor-0] c.n.d.s.r.a.ConfigClusterResolver - Resolving eureka endpoints via configuration
23:57:36.206 INFO [trap-executor-0] c.n.d.s.r.a.ConfigClusterResolver - Resolving eureka endpoints via configuration
jps
top -Hp pid
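The output shows Threads: 1017 total
That many threads means the service is not healthy. Normally the thread count is 80-odd, well under 100.
Continuing the investigation: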
jstack pid
A large number of threads turn out to be in the WAITING state:
"http-nio-9084-exec-1059" #18281 daemon prio=5 os_prio=0 tid=0x00007f951cc17000 nid=0x75de waiting on condition [0x00007f947d1cf000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000801d03f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:590)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:425)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:346)
at redis.clients.util.Pool.getResource(Pool.java:49)
at redis.clients.jedis.JedisPool.getResource(JedisPool.java:226)
at redis.clients.jedis.JedisSlotBasedConnectionHandler.getConnectionFromSlot(JedisSlotBasedConnectionHandler.java:70)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:113)
at redis.clients.jedis.JedisClusterCommand.runBinary(JedisClusterCommand.java:58)
at redis.clients.jedis.BinaryJedisCluster.hget(BinaryJedisCluster.java:373)
at org.springframework.data.redis.connection.jedis.JedisClusterHashCommands.hGet(JedisClusterHashCommands.java:95)
What's more, they are all parked waiting on the same object, <0x00000000801d03f8>. I counted 1000 such threads, which is exactly the Tomcat max-threads value configured above.
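Counting by hand is tedious, so here is a minimal sketch of how the tally could be automated. It assumes the dump has been saved as text (for example with jstack pid > dump.txt); the class name ParkedThreadCounter and the sample lines in main are made up for illustration and are not part of the original investigation.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParkedThreadCounter {
    // Matches jstack lines such as:
    //   - parking to wait for <0x00000000801d03f8> (a java.util.concurrent...)
    private static final Pattern PARKING =
            Pattern.compile("parking to wait for\\s+<(0x[0-9a-fA-F]+)>");

    /** Maps each wait address to the number of threads parked on it. */
    public static Map<String, Integer> countByAddress(String threadDump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = PARKING.matcher(threadDump);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample: two threads parked on one address, one on another.
        String sample =
                "- parking to wait for <0x00000000801d03f8> (a ...ConditionObject)\n"
              + "- parking to wait for <0x00000000801d03f8> (a ...ConditionObject)\n"
              + "- parking to wait for <0x00000000deadbeef> (a ...ConditionObject)\n";
        countByAddress(sample).forEach((addr, n) -> System.out.println(addr + " " + n));
    }
}
```

An address with a count equal to max-threads is a strong hint that every request thread is stuck at the same place.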
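My guess is that this is why later requests get no response: Tomcat's request threads have hit the limit, so there is no free thread left to serve them. The process itself has not died, though, because it is still communicating with Eureka normally and the Eureka console still shows the node.
Combined with the stack trace above, it is clear that the threads are blocked in JedisPool.getResource. Why would that happen? Possibly at some point the request threads could no longer obtain a Redis connection from the pool, and because the pool's max-wait is set to -1 they block and wait forever. Blocked threads keep piling up until they reach Tomcat's max-threads, and from then on no further requests can be served.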
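The pile-up can be reproduced in miniature with the JDK alone. The sketch below is an assumption-laden stand-in, not the real setup: a fixed pool of 4 threads plays the role of Tomcat's 1000 request threads, and an empty java.util.concurrent.LinkedBlockingDeque plays the role of the Jedis pool's idle-connection queue. The class and method names (WorkerStarvationDemo, laterRequestStarves) are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class WorkerStarvationDemo {

    /**
     * Parks every worker on an empty queue, then submits one more request that
     * needs no connection at all. Returns true if that request is still not
     * served after probeMillis.
     */
    public static boolean laterRequestStarves(int workers, long probeMillis) {
        LinkedBlockingDeque<String> idleConnections = new LinkedBlockingDeque<>(); // stays empty
        ExecutorService requestThreads = Executors.newFixedThreadPool(workers);
        CountDownLatch allStarted = new CountDownLatch(workers);
        try {
            for (int i = 0; i < workers; i++) {
                requestThreads.submit(() -> {
                    allStarted.countDown();
                    // Like max-wait: -1, wait with no time limit for a connection.
                    return idleConnections.takeFirst();
                });
            }
            allStarted.await(); // every worker thread is now occupied
            Future<String> later = requestThreads.submit(() -> "ok");
            try {
                later.get(probeMillis, TimeUnit.MILLISECONDS);
                return false; // served in time
            } catch (TimeoutException e) {
                return true;  // no free thread picked it up
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            requestThreads.shutdownNow(); // interrupt the parked workers so the JVM can exit
        }
    }

    public static void main(String[] args) {
        System.out.println("later request starved: " + laterRequestStarves(4, 200));
    }
}
```

The later request does no Redis work, yet it cannot run because no thread is free. That matches the symptom above, where the interface times out without writing a single log line.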
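The fix is to give max-wait a positive value (Spring Boot 2.x reads a bare number here as milliseconds). Once a thread has waited longer than that, an exception is thrown instead of blocking forever: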
max-wait: 1000
I'll watch it for a while and see whether this actually helps!
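For reference, here is the source code along the stack trace above.
redis.clients.jedis.JedisPool#getResource:
@Override
public Jedis getResource() {
    Jedis jedis = super.getResource();
    jedis.setDataSource(this);
    return jedis;
}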
redis.clients.util.Pool#getResource:
public T getResource() {
    try {
        return internalPool.borrowObject();
    } catch (Exception e) {
        throw new JedisConnectionException("Could not get a resource from the pool", e);
    }
}
org.apache.commons.pool2.impl.GenericObjectPool#borrowObject:
public T borrowObject(long borrowMaxWaitMillis) throws Exception {
    assertOpen();
    AbandonedConfig ac = this.abandonedConfig;
    if (ac != null && ac.getRemoveAbandonedOnBorrow() &&
            (getNumIdle() < 2) &&
            (getNumActive() > getMaxTotal() - 3) ) {
        removeAbandoned(ac);
    }
    PooledObject<T> p = null;
    // Get local copy of current config so it is consistent for entire
    // method execution
    boolean blockWhenExhausted = getBlockWhenExhausted();
    boolean create;
    long waitTime = 0;
    while (p == null) {
        create = false;
        if (blockWhenExhausted) {
            p = idleObjects.pollFirst();
            if (p == null) {
                create = true;
                p = create();
            }
            if (p == null) {
                if (borrowMaxWaitMillis < 0) {
                    // NOTE: with max-wait: -1 this branch is taken; the threads
                    // in the stack trace above are parked inside takeFirst()
                    p = idleObjects.takeFirst();
                } else {
                    waitTime = System.currentTimeMillis();
                    p = idleObjects.pollFirst(borrowMaxWaitMillis,
                            TimeUnit.MILLISECONDS);
                    waitTime = System.currentTimeMillis() - waitTime;
                }
            }
            if (p == null) {
                throw new NoSuchElementException(
                        "Timeout waiting for idle object");
            }
            if (!p.allocate()) {
                p = null;
            }
        } else {
            p = idleObjects.pollFirst();
            if (p == null) {
                create = true;
                p = create();
            }
            if (p == null) {
                throw new NoSuchElementException("Pool exhausted");
            }
            if (!p.allocate()) {
                p = null;
            }
        }
        if (p != null) {
            try {
                factory.activateObject(p);
            } catch (Exception e) {
                try {
                    destroy(p);
                } catch (Exception e1) {
                    // Ignore - activation failure is more important
                }
                p = null;
                if (create) {
                    NoSuchElementException nsee = new NoSuchElementException(
                            "Unable to activate object");
                    nsee.initCause(e);
                    throw nsee;
                }
            }
            if (p != null && getTestOnBorrow()) {
                boolean validate = false;
                Throwable validationThrowable = null;
                try {
                    validate = factory.validateObject(p);
                } catch (Throwable t) {
                    PoolUtils.checkRethrow(t);
                    validationThrowable = t;
                }
                if (!validate) {
                    try {
                        destroy(p);
                        destroyedByBorrowValidationCount.incrementAndGet();
                    } catch (Exception e) {
                        // Ignore - validation failure is more important
                    }
                    p = null;
                    if (create) {
                        NoSuchElementException nsee = new NoSuchElementException(
                                "Unable to validate object");
                        nsee.initCause(validationThrowable);
                        throw nsee;
                    }
                }
            }
        }
    }
    updateStatsBorrow(p, waitTime);
    return p.getObject();
}
org.apache.commons.pool2.impl.LinkedBlockingDeque#takeFirst:
/**
 * Unlinks the first element in the queue, waiting until there is an element
 * to unlink if the queue is empty.
 *
 * @return the unlinked element
 * @throws InterruptedException if the current thread is interrupted
 */
public E takeFirst() throws InterruptedException {
    lock.lock();
    try {
        E x;
        while ( (x = unlinkFirst()) == null) {
            notEmpty.await();
        }
        return x;
    } finally {
        lock.unlock();
    }
}
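The difference between the two wait branches in borrowObject can be tried out without Jedis or commons-pool2. The following is only a rough sketch under several assumptions: it uses the JDK's java.util.concurrent.LinkedBlockingDeque rather than the commons-pool2 class of the same name, it leaves out object creation, activation and validation, and BorrowWaitSketch, giveBack and borrow are hypothetical names.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

public class BorrowWaitSketch {
    // Stand-in for the pool's queue of idle connections.
    private final LinkedBlockingDeque<String> idleObjects = new LinkedBlockingDeque<>();

    /** Returns a connection to the idle queue. */
    public void giveBack(String connection) {
        idleObjects.addFirst(connection);
    }

    /**
     * Simplified borrow: a negative maxWaitMillis waits with no time limit,
     * otherwise the wait is bounded and ends in an exception.
     */
    public String borrow(long maxWaitMillis) {
        try {
            String p = idleObjects.pollFirst();
            if (p == null) {
                if (maxWaitMillis < 0) {
                    p = idleObjects.takeFirst(); // may never return
                } else {
                    p = idleObjects.pollFirst(maxWaitMillis, TimeUnit.MILLISECONDS);
                }
            }
            if (p == null) {
                throw new NoSuchElementException("Timeout waiting for idle object");
            }
            return p;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag set
            throw new IllegalStateException("Interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        BorrowWaitSketch pool = new BorrowWaitSketch();
        pool.giveBack("conn-1");
        System.out.println(pool.borrow(1000)); // an idle connection is available
        try {
            pool.borrow(1000);                 // queue is empty now, bounded wait
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a bounded wait the request thread fails fast and goes back to Tomcat's pool, so a shortage of connections shows up as errors in the log rather than as a silent hang.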
References:
https://www.codetd.com/article/4661280
https://blog.csdn.net/u012998254/article/details/78305866
https://www.jianshu.com/p/c4a75ca20abe