Troubleshooting a thread pool that appears to hang under high concurrency

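Problem description

The project uses a thread pool whose main job is to send HTTP requests through HttpClient. The problem showed up occasionally in the NetEase Cloud Music environment and was more severe in the Yanxuan environment, where the concurrency load is somewhat higher. The only workaround was to have Sentinel (哨兵) keep monitoring the thread pool size and restart the service once it exceeded a threshold. Each restart loses a large number of SQL queries that still need to be persisted. In Yanxuan this was especially painful: the service had to be restarted every few days.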

Initial suspicions

  • Thread deadlock
  • No connection timeout set on HttpClient

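Investigating further

Start with the log files

The logs contained no errors, yet the number of tasks in the thread pool kept growing, which is what made the service look hung.

Next, read the project code, work out where a hang could occur, and check for deadlocks

Reading the code ruled out both a deadlock and a missing connection timeout.

Inspect the stack with jstack

Sentinel monitors the thread pool size; when it exceeds the threshold, a jstack dump is exported first and the service is restarted afterwards.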

"receive-thread-pool-79962" #275147 prio=5 os_prio=0 tid=0x0000000033134000 nid=0x60e28 in Object.wait() [0x00007fc3e63d8000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:509)
- locked <0x00000000b95165b0> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:394)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:152)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
at com.netease.impala.util.HttpUtil.httpRequest(HttpUtil.java:55)
at com.netease.impala.util.HttpUtil.getRequest(HttpUtil.java:38)
at com.netease.impala.service.ImpalaService.getThriftInfo(ImpalaService.java:96)
at com.netease.impala.service.ImpalaService.getDetailInfo(ImpalaService.java:64)
at com.netease.impala.service.RecordService.getQueryInfo(RecordService.java:294)
at com.netease.impala.service.RecordService.access$000(RecordService.java:27)
at com.netease.impala.service.RecordService$1.run(RecordService.java:60)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Every pool thread was blocked in MultiThreadedHttpConnectionManager.doGetConnection.
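
Spotting that every thread is parked in the same method is easier when the dump is summarized. The sketch below is one way to do that from inside the JVM, using only Thread.getAllStackTraces: it groups threads by their first non-JDK frame. The class name StackHistogram, its helper methods, and the "demo-pool-" thread names are hypothetical and exist only for illustration; it is not part of the project or of jstack.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class StackHistogram {

    /** Counts live threads whose name starts with namePrefix, keyed by their first non-JDK frame. */
    static Map<String, Integer> firstAppFrames(String namePrefix) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            if (!entry.getKey().getName().startsWith(namePrefix)) {
                continue;
            }
            for (StackTraceElement frame : entry.getValue()) {
                String cls = frame.getClassName();
                if (cls.startsWith("java.") || cls.startsWith("jdk.") || cls.startsWith("sun.")) {
                    continue; // skip Object.wait and other JDK internals
                }
                counts.merge(cls + "." + frame.getMethodName(), 1, Integer::sum);
                break; // only the first application frame is counted
            }
        }
        return counts;
    }

    /** Demo helper: parks the calling thread on the lock until it is interrupted. */
    static void awaitOn(Object lock) {
        synchronized (lock) {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag so the loop ends
                }
            }
        }
    }

    /** Demo helper: starts n daemon threads parked in awaitOn and returns once all are WAITING. */
    static List<Thread> startWaiters(String namePrefix, int n, Object lock) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> awaitOn(lock), namePrefix + i);
            t.setDaemon(true);
            t.start();
            threads.add(t);
        }
        for (Thread t : threads) {
            while (t.getState() != Thread.State.WAITING) {
                Thread.yield();
            }
        }
        return threads;
    }
}
```

For a dump like the one above, calling firstAppFrames with the prefix "receive-thread-pool-" inside the affected service would be expected to show doGetConnection as the dominant entry, since it is the first frame below Object.wait.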

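Reading the source of doGetConnection shows that when no timeout is set for obtaining a connection from the pool, doGetConnection stays in its wait loop until a connection becomes free. Once MultiThreadedHttpConnectionManager manages the connection pool, setting the connection timeout and the socket timeout is not enough; the timeout for obtaining a connection from the pool must be set as well. This explains why the project hung without logging any error: threads could not get a connection, and the wait never timed out. Going back to the project code, it turned out that connections were never manually returned to the pool.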

The call that returns a connection manually is:

  method.releaseConnection();

The doGetConnection source discussed above:

private HttpConnection doGetConnection(HostConfiguration hostConfiguration, 
    long timeout) throws ConnectionPoolTimeoutException {

    HttpConnection connection = null;

    int maxHostConnections = this.params.getMaxConnectionsPerHost(hostConfiguration);
    int maxTotalConnections = this.params.getMaxTotalConnections();
    
    synchronized (connectionPool) {

        // we clone the hostConfiguration
        // so that it cannot be changed once the connection has been retrieved
        hostConfiguration = new HostConfiguration(hostConfiguration);
        HostConnectionPool hostPool = connectionPool.getHostPool(hostConfiguration, true);
        WaitingThread waitingThread = null;

        boolean useTimeout = (timeout > 0);
        long timeToWait = timeout;
        long startWait = 0;
        long endWait = 0;

        while (connection == null) {

            if (shutdown) {
                throw new IllegalStateException("Connection factory has been shutdown.");
            }
            
            // happen to have a free connection with the right specs
            //
            if (hostPool.freeConnections.size() > 0) {
                connection = connectionPool.getFreeConnection(hostConfiguration);

            // have room to make more
            //
            } else if ((hostPool.numConnections < maxHostConnections) 
                && (connectionPool.numConnections < maxTotalConnections)) {

                connection = connectionPool.createConnection(hostConfiguration);

            // have room to add host connection, and there is at least one free
            // connection that can be liberated to make overall room
            //
            } else if ((hostPool.numConnections < maxHostConnections) 
                && (connectionPool.freeConnections.size() > 0)) {

                connectionPool.deleteLeastUsedConnection();
                connection = connectionPool.createConnection(hostConfiguration);

            // otherwise, we have to wait for one of the above conditions to
            // become true
            //
            } else {
                // TODO: keep track of which hostConfigurations have waiting
                // threads, so they avoid being sacrificed before necessary

                try {
                    
                    if (useTimeout && timeToWait <= 0) {
                        throw new ConnectionPoolTimeoutException("Timeout waiting for connection");
                    }
                    
                    if (LOG.isDebugEnabled()) {
                        LOG.debug("Unable to get a connection, waiting..., hostConfig=" + hostConfiguration);
                    }
                    
                    if (waitingThread == null) {
                        waitingThread = new WaitingThread();
                        waitingThread.hostConnectionPool = hostPool;
                        waitingThread.thread = Thread.currentThread();
                    } else {
                        waitingThread.interruptedByConnectionPool = false;
                    }
                                
                    if (useTimeout) {
                        startWait = System.currentTimeMillis();
                    }
                    
                    hostPool.waitingThreads.addLast(waitingThread);
                    connectionPool.waitingThreads.addLast(waitingThread);
                    connectionPool.wait(timeToWait);
                } catch (InterruptedException e) {
                    if (!waitingThread.interruptedByConnectionPool) {
                        LOG.debug("Interrupted while waiting for connection", e);
                        throw new IllegalThreadStateException(
                            "Interrupted while waiting in MultiThreadedHttpConnectionManager");
                    }
                    // Else, do nothing, we were interrupted by the connection pool
                    // and should now have a connection waiting for us, continue
                    // in the loop and let's get it.
                } finally {
                    if (!waitingThread.interruptedByConnectionPool) {
                        // Either we timed out, experienced a "spurious wakeup", or were
                        // interrupted by an external thread.  Regardless we need to 
                        // cleanup for ourselves in the wait queue.
                        hostPool.waitingThreads.remove(waitingThread);
                        connectionPool.waitingThreads.remove(waitingThread);
                    }
                    
                    if (useTimeout) {
                        endWait = System.currentTimeMillis();
                        timeToWait -= (endWait - startWait);
                    }
                }
            }
        }
    }
    return connection;
}
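
The behaviour of the loop above can be reproduced with a much smaller model. The sketch below is a JDK-only simplification, not HttpClient code: BoundedPool and its methods are hypothetical names, there are no per-host limits or waiting-thread queues, and a deadline replaces the elapsed-time subtraction. It keeps the one property that matters here: a timeout of 0 means wait() with no limit, so a caller on an exhausted pool stays parked without any error.

```java
import java.util.concurrent.TimeoutException;

final class BoundedPool {
    private final int max;
    private int leased;

    BoundedPool(int max) {
        this.max = max;
    }

    /** Takes one slot. A timeoutMs of 0 or less means wait with no limit, as in doGetConnection. */
    synchronized void acquire(long timeoutMs) throws TimeoutException, InterruptedException {
        boolean useTimeout = timeoutMs > 0;
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (leased >= max) {
            if (useTimeout) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    throw new TimeoutException("Timeout waiting for connection");
                }
                wait(remainingMs);
            } else {
                wait(); // no timeout: parked until some other thread calls release()
            }
        }
        leased++;
    }

    /** Returns one slot and wakes a single waiter. */
    synchronized void release() {
        if (leased > 0) {
            leased--;
            notify();
        }
    }

    /** Convenience wrapper that reports the result of acquire as a string. */
    static String outcome(BoundedPool pool, long timeoutMs) {
        try {
            pool.acquire(timeoutMs);
            return "acquired";
        } catch (TimeoutException e) {
            return "timeout";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    /** True if a caller using no timeout is still parked after graceMs. */
    static boolean stillWaitingAfter(BoundedPool pool, long graceMs) {
        Thread t = new Thread(() -> {
            try {
                pool.acquire(0);
            } catch (Exception ignored) {
                // interrupted by the cleanup below
            }
        });
        t.setDaemon(true);
        t.start();
        try {
            t.join(graceMs);
            boolean waiting = t.isAlive();
            t.interrupt(); // clean up the parked thread
            t.join();
            return waiting;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a pool of size 1 that is already leased, outcome(pool, 100) reports a timeout, while a caller passing 0 simply stays in WAITING, which matches the jstack output. In commons-httpclient 3.x the corresponding setting is the http.connection-manager.timeout parameter, exposed as HttpClientParams.setConnectionManagerTimeout.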

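Reading the connection-return logic shows that a connection is returned to the pool automatically only when the response body has been fully consumed. The relevant logic is below.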

private InputStream readResponseBody(HttpConnection conn)
    throws HttpException, IOException {

    LOG.trace("enter HttpMethodBase.readResponseBody(HttpConnection)");

    responseBody = null;
    InputStream is = conn.getResponseInputStream();
    if (Wire.CONTENT_WIRE.enabled()) {
        is = new WireLogInputStream(is, Wire.CONTENT_WIRE);
    }
    boolean canHaveBody = canResponseHaveBody(statusLine.getStatusCode());
    InputStream result = null;
    Header transferEncodingHeader = responseHeaders.getFirstHeader("Transfer-Encoding");
    // We use Transfer-Encoding if present and ignore Content-Length.
    // RFC2616, 4.4 item number 3
    if (transferEncodingHeader != null) {

        String transferEncoding = transferEncodingHeader.getValue();
        if (!"chunked".equalsIgnoreCase(transferEncoding) 
            && !"identity".equalsIgnoreCase(transferEncoding)) {
            if (LOG.isWarnEnabled()) {
                LOG.warn("Unsupported transfer encoding: " + transferEncoding);
            }
        }
        HeaderElement[] encodings = transferEncodingHeader.getElements();
        // The chunked encoding must be the last one applied
        // RFC2616, 14.41
        int len = encodings.length;            
        if ((len > 0) && ("chunked".equalsIgnoreCase(encodings[len - 1].getName()))) { 
            // if response body is empty
            if (conn.isResponseAvailable(conn.getParams().getSoTimeout())) {
                result = new ChunkedInputStream(is, this);
            } else {
                if (getParams().isParameterTrue(HttpMethodParams.STRICT_TRANSFER_ENCODING)) {
                    throw new ProtocolException("Chunk-encoded body declared but not sent");
                } else {
                    LOG.warn("Chunk-encoded body missing");
                }
            }
        } else {
            LOG.info("Response content is not chunk-encoded");
            // The connection must be terminated by closing 
            // the socket as per RFC 2616, 3.6
            setConnectionCloseForced(true);
            result = is;  
        }
    } else {
        long expectedLength = getResponseContentLength();
        if (expectedLength == -1) {
            if (canHaveBody && this.effectiveVersion.greaterEquals(HttpVersion.HTTP_1_1)) {
                Header connectionHeader = responseHeaders.getFirstHeader("Connection");
                String connectionDirective = null;
                if (connectionHeader != null) {
                    connectionDirective = connectionHeader.getValue();
                }
                if (!"close".equalsIgnoreCase(connectionDirective)) {
                    LOG.info("Response content length is not known");
                    setConnectionCloseForced(true);
                }
            }
            result = is;            
        } else {
            result = new ContentLengthInputStream(is, expectedLength);
        }
    } 

    // See if the response is supposed to have a response body
    if (!canHaveBody) {
        result = null;
    }
    // if there is a result - ALWAYS wrap it in an observer which will
    // close the underlying stream as soon as it is consumed, and notify
    // the watcher that the stream has been consumed.
    if (result != null) {

        result = new AutoCloseInputStream(
            result,
            new ResponseConsumedWatcher() {
                public void responseConsumed() {
                    responseBodyConsumed();
                }
            }
        );
    }

    return result;
}
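
Because the automatic return depends on the body being read to the end, any failure before that point leaks the connection unless the caller releases it explicitly. The sketch below models the difference with a plain Semaphore; LeaseDemo, Pool, leaky and safe are hypothetical names, and the Supplier stands in for reading the response. With commons-httpclient, the finally block would be the place to call method.releaseConnection().

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class LeaseDemo {

    /** Stand-in for a connection pool: one permit per connection. */
    static final class Pool {
        private final Semaphore permits;

        Pool(int size) {
            permits = new Semaphore(size);
        }

        void lease() {
            if (!permits.tryAcquire()) {
                throw new IllegalStateException("pool exhausted");
            }
        }

        void release() {
            permits.release();
        }

        int available() {
            return permits.availablePermits();
        }
    }

    /** Leaky: the connection goes back only if the body is read without error. */
    static String leaky(Pool pool, Supplier<String> readBody) {
        pool.lease();
        String body = readBody.get(); // an exception here skips release()
        pool.release();
        return body;
    }

    /** Safe: the connection goes back whether or not the read succeeds. */
    static String safe(Pool pool, Supplier<String> readBody) {
        pool.lease();
        try {
            return readBody.get();
        } finally {
            pool.release();
        }
    }

    /** Runs `calls` failing requests and reports how many connections are left in the pool. */
    static int availableAfterFailures(boolean useFinally, int poolSize, int calls) {
        Pool pool = new Pool(poolSize);
        Supplier<String> failingRead = () -> {
            throw new IllegalStateException("read failed");
        };
        for (int i = 0; i < calls; i++) {
            try {
                if (useFinally) {
                    safe(pool, failingRead);
                } else {
                    leaky(pool, failingRead);
                }
            } catch (IllegalStateException expected) {
                // the caller sees the failure either way; only the pool state differs
            }
        }
        return pool.available();
    }
}
```

Two failed requests against a pool of two leave no free connections in the leaky variant and both connections in the safe one. Once the leaky pool is empty, every later caller ends up waiting in the connection manager.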

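Answers found online

While troubleshooting, I also found similar problems reported online. The suggestion there was to increase maxHostConnections and maxTotalConnections.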

Your threads are waiting on synchronized (connectionPool) monitor in
MultiThreadedHttpConnectionManager.doGetConnection which isn't responsible for interruption. According to the documentation of getConnectionWithTimeout increasing number of maxHostConnections and maxTotalConnections can help. It's also possible to specify timeout value in http.connection-manager.timeout which is 0 by default so threads are waiting for connection indefinitely.
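
Putting the advice above together, a configuration fragment for commons-httpclient 3.x might look like the following. It is a sketch, not the project's actual code: the class name PooledHttp and all numeric values are illustrative assumptions and would need tuning against real load. It requires the commons-httpclient 3.x jar on the classpath.

```java
import java.io.IOException;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpConnectionManagerParams;

final class PooledHttp {

    /** Builds a client with all three timeouts and larger pool limits (values are illustrative). */
    static HttpClient newClient() {
        MultiThreadedHttpConnectionManager manager = new MultiThreadedHttpConnectionManager();
        HttpConnectionManagerParams params = manager.getParams();
        params.setConnectionTimeout(3000);          // TCP connect timeout, ms
        params.setSoTimeout(5000);                  // socket read timeout, ms
        params.setDefaultMaxConnectionsPerHost(50); // per-host pool limit
        params.setMaxTotalConnections(200);         // overall pool limit

        HttpClient client = new HttpClient(manager);
        // http.connection-manager.timeout: maximum wait for a pooled connection, ms.
        // The default of 0 means wait indefinitely.
        client.getParams().setConnectionManagerTimeout(2000L);
        return client;
    }

    /** Executes a GET and always returns the connection to the pool. */
    static String get(HttpClient client, String url) throws IOException {
        GetMethod method = new GetMethod(url);
        try {
            client.executeMethod(method);
            return method.getResponseBodyAsString();
        } finally {
            method.releaseConnection();
        }
    }
}
```

With the connection manager timeout set, an exhausted pool surfaces as a ConnectionPoolTimeoutException in the logs instead of a silent wait. Raising the pool limits alone only delays exhaustion if connections are still leaked.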

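Summary

This investigation fixed a long-standing problem in the project at its root and made it possible to deploy the management server stably in the Yanxuan environment. It also gave me a working understanding of jstack, thread pools, connection pools, and the HTTP protocol.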
