Spring Boot: JedisPool.getResource leaves many threads WAITING and the service appears hung

Environment:

Spring Boot 2.1.4.RELEASE, Jedis connection pool

Service configuration:

Tomcat's maximum thread count is set to 1000:

server:
  port: 9090
  tomcat:
    uri-encoding: utf-8
    max-threads: 1000

Jedis pool configuration:

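      pool:
        max-active: 300  # maximum number of connections in the pool (a negative value means no limit)
        max-wait: -1  # maximum time to block waiting for a connection (a negative value means no limit)
        max-idle: 100  # maximum number of idle connections in the pool
        min-idle: 20   # minimum number of idle connections in the pool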

Symptoms:

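1. Requests to the business endpoints from a browser time out, and the service log shows no output at all for those endpoints.

2. The service instance is still listed in the Eureka console, with a normal status.

3. The service log shows it communicating with Eureka once every 5 minutes: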

23:47:36.206 INFO  [trap-executor-0] c.n.d.s.r.a.ConfigClusterResolver - Resolving eureka endpoints via configuration
23:52:36.206 INFO  [trap-executor-0] c.n.d.s.r.a.ConfigClusterResolver - Resolving eureka endpoints via configuration
23:57:36.206 INFO  [trap-executor-0] c.n.d.s.r.a.ConfigClusterResolver - Resolving eureka endpoints via configuration

Troubleshooting:

jps

top -Hp pid

It reports Threads:  1017 total
That many threads means something is wrong: normally this service runs around 80-odd threads, well under 100.

Digging further:

jstack pid

It shows a large number of threads in the WAITING state:

"http-nio-9084-exec-1059" #18281 daemon prio=5 os_prio=0 tid=0x00007f951cc17000 nid=0x75de waiting on condition [0x00007f947d1cf000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000801d03f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:590)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:425)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:346)
    at redis.clients.util.Pool.getResource(Pool.java:49)
    at redis.clients.jedis.JedisPool.getResource(JedisPool.java:226)
    at redis.clients.jedis.JedisSlotBasedConnectionHandler.getConnectionFromSlot(JedisSlotBasedConnectionHandler.java:70)
    at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:113)
    at redis.clients.jedis.JedisClusterCommand.runBinary(JedisClusterCommand.java:58)
    at redis.clients.jedis.BinaryJedisCluster.hget(BinaryJedisCluster.java:373)
    at org.springframework.data.redis.connection.jedis.JedisClusterHashCommands.hGet(JedisClusterHashCommands.java:95)

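All of them are parked on the same object, <0x00000000801d03f8> (an AbstractQueuedSynchronizer$ConditionObject). Counting them gives exactly 1000 such threads, the same as the Tomcat max-threads configured above.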
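Counting by hand works, but the same grouping can also be done from inside the JVM. The sketch below uses a hypothetical helper class and only the JDK. It calls ThreadMXBean.dumpAllThreads and groups WAITING threads by ThreadInfo.getLockName(). Note that getLockName() returns className@identityHashCode (hex), not the 0x... address jstack prints. The keys therefore look different from the jstack output, although they identify the same object.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

public class WaitingThreadCounter {

    // Count WAITING threads per lock/condition object, similar to grouping
    // jstack's "parking to wait for <0x...>" lines by address
    static Map<String, Integer> countWaitingByLock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.WAITING) {
                continue;
            }
            // For a parked thread this is the park blocker, e.g. the ConditionObject
            String lock = String.valueOf(info.getLockName());
            counts.merge(lock, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("live threads: " + ManagementFactory.getThreadMXBean().getThreadCount());
        countWaitingByLock().forEach((lock, n) -> System.out.println(n + "\t" + lock));
    }
}
```

In the situation above, this would report a single entry with a count near 1000 for the pool's ConditionObject.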

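My guess is that this is why later requests go unanswered: every Tomcat worker thread is occupied, so none is left to handle new requests. The process itself is not dead, though. It still talks to Eureka normally, and the node still shows up in the Eureka console. That traffic runs on the Eureka client's own threads (such as trap-executor-0 in the log above), not on Tomcat worker threads.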
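To check that "Eureka still sees the node" and "HTTP requests time out" can both be true, here is a small JDK-only simulation. It is a simplification that assumes Tomcat's workers behave like a fixed-size executor. Every worker is stuck on a task that never finishes, so one more task can only queue. Meanwhile a separate scheduled thread, standing in for the Eureka client's own thread, keeps ticking. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SaturatedWorkersDemo {

    static final class Result {
        final int heartbeats;
        final boolean nextRequestDone;

        Result(int heartbeats, boolean nextRequestDone) {
            this.heartbeats = heartbeats;
            this.nextRequestDone = nextRequestDone;
        }
    }

    static Result observe(int workerThreads, long observeMillis) throws InterruptedException {
        // Stand-in for Tomcat's worker pool (max-threads), scaled down
        ExecutorService workers = Executors.newFixedThreadPool(workerThreads);
        // Stand-in for the Eureka client's separate scheduler thread (cf. trap-executor-0)
        ScheduledExecutorService heartbeat = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger beats = new AtomicInteger();
        heartbeat.scheduleAtFixedRate(beats::incrementAndGet, 0, 20, TimeUnit.MILLISECONDS);

        CountDownLatch neverReleased = new CountDownLatch(1);
        // Every worker gets a "request" that blocks forever, like getResource() with max-wait = -1
        for (int i = 0; i < workerThreads; i++) {
            workers.submit(() -> {
                neverReleased.await();
                return null;
            });
        }
        // One more request can only queue: no worker thread is free to run it
        Future<String> next = workers.submit(() -> "ok");

        Thread.sleep(observeMillis);
        Result r = new Result(beats.get(), next.isDone());

        heartbeat.shutdownNow();
        workers.shutdownNow(); // interrupts the blocked workers so the JVM can exit
        return r;
    }

    public static void main(String[] args) throws InterruptedException {
        Result r = observe(2, 300);
        System.out.println("heartbeats: " + r.heartbeats + ", next request done: " + r.nextRequestDone);
    }
}
```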

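The stack trace shows the threads are blocked inside JedisPool.getResource. Why does that happen? In GenericObjectPool.borrowObject (see the source at the end), a thread reaches takeFirst() only when two conditions hold: there is no idle connection, and create() returns null because max-active connections already exist. In other words, the pool is exhausted. Possibly at some point connections were held too long or not returned. Because max-wait is -1, every thread that then calls getResource waits indefinitely. The blocked threads pile up until they reach Tomcat's max-threads, and from then on no further request can be served.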
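The blocking is easy to reproduce with the JDK's java.util.concurrent.LinkedBlockingDeque. It offers the same takeFirst and pollFirst(timeout) methods as the copy of LinkedBlockingDeque that commons-pool2 ships. In this sketch the "pool" is just an idle deque whose two connections have both been borrowed; no create step is modelled. A borrower that calls takeFirst(), as borrowObject does when max-wait is -1, parks in WAITING indefinitely, just like the threads in the dump. Class and method names are hypothetical.

```java
import java.util.concurrent.LinkedBlockingDeque;

public class ExhaustedPoolDemo {

    // Hypothetical stand-in for a pooled Redis connection
    static final class Conn { }

    // Borrow the way borrowObject does when borrowMaxWaitMillis < 0: block until one is returned
    static Thread startUnboundedBorrower(LinkedBlockingDeque<Conn> idle) {
        Thread t = new Thread(() -> {
            try {
                idle.takeFirst();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "borrower");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Poll the thread state until it matches or the timeout expires
    static Thread.State awaitState(Thread t, Thread.State wanted, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (t.getState() != wanted && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        return t.getState();
    }

    public static void main(String[] args) throws InterruptedException {
        // A "pool" with max-active = 2, where both connections are already borrowed
        LinkedBlockingDeque<Conn> idle = new LinkedBlockingDeque<>();
        idle.add(new Conn());
        idle.add(new Conn());
        idle.pollFirst();
        idle.pollFirst(); // neither connection is ever returned

        Thread borrower = startUnboundedBorrower(idle);
        System.out.println("borrower state: " + awaitState(borrower, Thread.State.WAITING, 2000));
        Thread.sleep(1000);
        System.out.println("still waiting after 1 s: " + (borrower.getState() == Thread.State.WAITING));
    }
}
```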

Fix:

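Set max-wait to a positive value. Once a thread has waited that long, borrowObject throws NoSuchElementException("Timeout waiting for idle object"), which Pool.getResource wraps in a JedisConnectionException. The request then fails fast, and its Tomcat thread is released instead of blocking forever: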

max-wait: 1000  # milliseconds
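A reduced model of the wait step in borrowObject shows why a positive max-wait helps. TinyPool is a hypothetical class, not part of Jedis or commons-pool2. It keeps only the idle deque and the takeFirst-versus-pollFirst(timeout) branch, and it throws the same NoSuchElementException message that commons-pool2 uses. Returned objects go to the front of the deque, as with the pool's default LIFO setting.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

public class BoundedBorrowDemo {

    // Hypothetical minimal pool, modelling only the idle-queue wait in borrowObject
    static final class TinyPool<T> {
        private final LinkedBlockingDeque<T> idle = new LinkedBlockingDeque<>();
        private final long maxWaitMillis; // plays the role of max-wait

        TinyPool(long maxWaitMillis) {
            this.maxWaitMillis = maxWaitMillis;
        }

        void returnObject(T obj) {
            idle.addFirst(obj);
        }

        T borrow() throws InterruptedException {
            T obj;
            if (maxWaitMillis < 0) {
                obj = idle.takeFirst(); // max-wait = -1: may wait forever
            } else {
                obj = idle.pollFirst(maxWaitMillis, TimeUnit.MILLISECONDS); // null on timeout
            }
            if (obj == null) {
                throw new NoSuchElementException("Timeout waiting for idle object");
            }
            return obj;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TinyPool<String> pool = new TinyPool<>(1000);
        long start = System.nanoTime();
        try {
            pool.borrow(); // nothing idle: gives up after about 1000 ms
        } catch (NoSuchElementException e) {
            long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println(e.getMessage() + " after ~" + waited + " ms");
        }
        pool.returnObject("conn-1");
        System.out.println("borrowed " + pool.borrow());
    }
}
```

With max-wait = 1000, the worst case for a request is about one second of waiting followed by an exception, not a thread lost for good. The pool is still exhausted, though, so the timeout only limits the damage while the cause of the exhaustion is tracked down.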

I'll watch it for a while and see whether this actually fixes the problem.

Appendix: the relevant source code, from JedisPool.getResource down to LinkedBlockingDeque.takeFirst:

// redis.clients.jedis.JedisPool
@Override
public Jedis getResource() {
  Jedis jedis = super.getResource();
  jedis.setDataSource(this);
  return jedis;
}
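The jedis.setDataSource(this) line is what lets a pooled connection find its way back. As far as I know, in Jedis 2.x, calling close() on a Jedis that came from a pool returns it to that pool instead of closing the socket. A borrower that never calls close() therefore keeps one max-active slot occupied, which is one way a pool can end up exhausted. The sketch below imitates that pattern with hypothetical SimplePool and Conn classes (JDK only, not thread-safe). It shows try-with-resources returning the connection automatically.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SelfReturningResourceDemo {

    // Hypothetical pool that tells each resource where to return itself
    static final class SimplePool {
        final Deque<Conn> idle = new ArrayDeque<>();
        int borrowed = 0;

        Conn getResource() {
            Conn c = idle.isEmpty() ? new Conn() : idle.pop();
            c.dataSource = this; // analogous to jedis.setDataSource(this)
            borrowed++;
            return c;
        }

        void returnResource(Conn c) {
            borrowed--;
            idle.push(c);
        }
    }

    static final class Conn implements AutoCloseable {
        SimplePool dataSource;

        @Override
        public void close() {
            if (dataSource != null) {
                SimplePool pool = dataSource;
                dataSource = null; // guard against returning the same connection twice
                pool.returnResource(this);
            }
        }
    }

    public static void main(String[] args) {
        SimplePool pool = new SimplePool();
        try (Conn c = pool.getResource()) {
            // use the connection here; close() runs automatically at the end of the block
        }
        System.out.println("borrowed after try-with-resources: " + pool.borrowed);
    }
}
```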


// redis.clients.util.Pool (superclass of JedisPool)
public T getResource() {
  try {
    return internalPool.borrowObject();
  } catch (Exception e) {
    throw new JedisConnectionException("Could not get a resource from the pool", e);
  }
}



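// org.apache.commons.pool2.impl.GenericObjectPool
// (borrowObject() delegates here with the configured max wait)
public T borrowObject(long borrowMaxWaitMillis) throws Exception {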
    assertOpen();

    AbandonedConfig ac = this.abandonedConfig;
    if (ac != null && ac.getRemoveAbandonedOnBorrow() &&
            (getNumIdle() < 2) &&
            (getNumActive() > getMaxTotal() - 3) ) {
        removeAbandoned(ac);
    }

    PooledObject<T> p = null;

    // Get local copy of current config so it is consistent for entire
    // method execution
    boolean blockWhenExhausted = getBlockWhenExhausted();

    boolean create;
    long waitTime = 0;

    while (p == null) {
        create = false;
        if (blockWhenExhausted) {
            p = idleObjects.pollFirst();
            if (p == null) {
                create = true;
                p = create(); // returns null once max-active (maxTotal) objects exist
            }
            if (p == null) {
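                if (borrowMaxWaitMillis < 0) {
                    // max-wait = -1: park until another thread returns an object
                    // (this is the LinkedBlockingDeque.takeFirst frame in the stack trace)
                    p = idleObjects.takeFirst();
                } else {
                    // max-wait > 0: wait at most borrowMaxWaitMillis, after which p stays null
                    waitTime = System.currentTimeMillis();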
                    p = idleObjects.pollFirst(borrowMaxWaitMillis,
                            TimeUnit.MILLISECONDS);
                    waitTime = System.currentTimeMillis() - waitTime;
                }
            }
            if (p == null) {
                throw new NoSuchElementException(
                        "Timeout waiting for idle object");
            }
            if (!p.allocate()) {
                p = null;
            }
        } else {
            p = idleObjects.pollFirst();
            if (p == null) {
                create = true;
                p = create();
            }
            if (p == null) {
                throw new NoSuchElementException("Pool exhausted");
            }
            if (!p.allocate()) {
                p = null;
            }
        }

        if (p != null) {
            try {
                factory.activateObject(p);
            } catch (Exception e) {
                try {
                    destroy(p);
                } catch (Exception e1) {
                    // Ignore - activation failure is more important
                }
                p = null;
                if (create) {
                    NoSuchElementException nsee = new NoSuchElementException(
                            "Unable to activate object");
                    nsee.initCause(e);
                    throw nsee;
                }
            }
            if (p != null && getTestOnBorrow()) {
                boolean validate = false;
                Throwable validationThrowable = null;
                try {
                    validate = factory.validateObject(p);
                } catch (Throwable t) {
                    PoolUtils.checkRethrow(t);
                    validationThrowable = t;
                }
                if (!validate) {
                    try {
                        destroy(p);
                        destroyedByBorrowValidationCount.incrementAndGet();
                    } catch (Exception e) {
                        // Ignore - validation failure is more important
                    }
                    p = null;
                    if (create) {
                        NoSuchElementException nsee = new NoSuchElementException(
                                "Unable to validate object");
                        nsee.initCause(validationThrowable);
                        throw nsee;
                    }
                }
            }
        }
    }

    updateStatsBorrow(p, waitTime);

    return p.getObject();
}

// org.apache.commons.pool2.impl.LinkedBlockingDeque
/**
 * Unlinks the first element in the queue, waiting until there is an element
 * to unlink if the queue is empty.
 *
 * @return the unlinked element
 * @throws InterruptedException if the current thread is interrupted
 */
public E takeFirst() throws InterruptedException {
    lock.lock();
    try {
        E x;
        while ( (x = unlinkFirst()) == null) {
            notEmpty.await();
        }
        return x;
    } finally {
        lock.unlock();
    }
}
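takeFirst follows the classic lock-plus-condition pattern. The sketch below is a stripped-down, hypothetical deque built on the JDK's ReentrantLock and Condition, not the commons-pool2 class. All takers wait on the single notEmpty condition, so every parked thread in a dump points at the same ConditionObject. That matches the single <0x00000000801d03f8> address in all 1000 stack traces. A waiter wakes only when something is added. With no timeout and no returns, it never wakes.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class TakeFirstSketch<E> {
    private final ArrayDeque<E> items = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();
    // One condition shared by every waiting taker: the one object all parked threads point at
    private final Condition notEmpty = lock.newCondition();

    public E takeFirst() throws InterruptedException {
        lock.lock();
        try {
            E x;
            while ((x = items.pollFirst()) == null) {
                notEmpty.await(); // releases the lock and parks (state WAITING)
            }
            return x;
        } finally {
            lock.unlock();
        }
    }

    public void addFirst(E e) {
        lock.lock();
        try {
            items.addFirst(e);
            notEmpty.signal(); // wake one waiting taker
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TakeFirstSketch<String> deque = new TakeFirstSketch<>();
        Thread taker = new Thread(() -> {
            try {
                System.out.println("took " + deque.takeFirst());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        taker.start();
        Thread.sleep(200);
        System.out.println("taker state before return: " + taker.getState());
        deque.addFirst("conn-1"); // like returnObject putting a connection back
        taker.join();
    }
}
```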
