前几天一个机房网络抖动,引发了很多对外请求的超时问题,在发生问题排查日志的时候,发现了这么一个现象,httpclient我们的请求超时时间并没有按照我们的设置报超时异常
我们的大概配置如下:
RequestConfig requestConfig = RequestConfig.custom() .setConnectTimeout(1000) .setSocketTimeout(2000) .setConnectionRequestTimeout(1000) .build();
但实际却发现很多请求超时时间都达到了10几秒甚至有的二十几秒,大大超过了我们的预期时间,决定通过跟踪源码一探究竟:
原来http读取网络数据的时候是其实是使用的BufferedReader类,而我们知道java的io类其实都是对基本输入流的装饰,其底层其实是利用的SocketInputStream来读取数据,一路代码跟踪,我们跟踪到了这个方法
int read(byte b[], int off, int length, int timeout) throws IOException { int n = 0; // EOF already encountered if (eof) { return -1; } // connection reset if (impl.isConnectionReset()) { throw new SocketException("Connection reset"); } // bounds check if (length <= 0 || off < 0 || off + length > b.length) { if (length == 0) { return 0; } throw new ArrayIndexOutOfBoundsException(); } boolean gotReset = false; Object traceContext = IoTrace.socketReadBegin(); // acquire file descriptor and do the read FileDescriptor fd = impl.acquireFD(); try { n = socketRead0(fd, b, off, length, timeout); if (n > 0) { return n; } } catch (ConnectionResetException rstExc) { gotReset = true; } finally { impl.releaseFD(); IoTrace.socketReadEnd(traceContext, impl.address, impl.port, timeout, n > 0 ? n : 0); } /* * We receive a "connection reset" but there may be bytes still * buffered on the socket */ if (gotReset) { traceContext = IoTrace.socketReadBegin(); impl.setConnectionResetPending(); impl.acquireFD(); try { n = socketRead0(fd, b, off, length, timeout); if (n > 0) { return n; } } catch (ConnectionResetException rstExc) { } finally { impl.releaseFD(); IoTrace.socketReadEnd(traceContext, impl.address, impl.port, timeout, n > 0 ? n : 0); } } /* * If we get here we are at EOF, the socket has been closed, * or the connection has been reset. */ if (impl.isClosedOrPending()) { throw new SocketException("Socket closed"); } if (impl.isConnectionResetPending()) { impl.setConnectionReset(); } if (impl.isConnectionReset()) { throw new SocketException("Connection reset"); } eof = true; return -1; }
这个方法的核心其实就是 socketRead0(fd, b, off, length, timeout)这个方法的调用,而这个方法是这样的:
private native int socketRead0(FileDescriptor fd, byte b[], int off, int len, int timeout) throws IOException;
这个是native方法,通过下载openjdk1.8源码,我们在openjdk\jdk\src\solaris\native\java\net的目录下找到了相关实现,在SocketInputStream.c文件里,代码如下:
Java_java_net_SocketInputStream_socketRead0(JNIEnv *env, jobject this, jobject fdObj, jbyteArray data, jint off, jint len, jint timeout) { char BUF[MAX_BUFFER_LEN]; char *bufP; jint fd, nread; if (IS_NULL(fdObj)) { /* shouldn't this be a NullPointerException? -br */ JNU_ThrowByName(env, JNU_JAVANETPKG "SocketException", "Socket closed"); return -1; } else { fd = (*env)->GetIntField(env, fdObj, IO_fd_fdID); /* Bug 4086704 - If the Socket associated with this file descriptor * was closed (sysCloseFD), then the file descriptor is set to -1. */ if (fd == -1) { JNU_ThrowByName(env, "java/net/SocketException", "Socket closed"); return -1; } } /* * If the read is greater than our stack allocated buffer then * we allocate from the heap (up to a limit) */ if (len > MAX_BUFFER_LEN) { if (len > MAX_HEAP_BUFFER_LEN) { len = MAX_HEAP_BUFFER_LEN; } bufP = (char *)malloc((size_t)len); if (bufP == NULL) { bufP = BUF; len = MAX_BUFFER_LEN; } } else { bufP = BUF; } if (timeout) { nread = NET_Timeout(fd, timeout); if (nread <= 0) { if (nread == 0) { JNU_ThrowByName(env, JNU_JAVANETPKG "SocketTimeoutException", "Read timed out"); } else if (nread == JVM_IO_ERR) { if (errno == EBADF) { JNU_ThrowByName(env, JNU_JAVANETPKG "SocketException", "Socket closed"); } else if (errno == ENOMEM) { JNU_ThrowOutOfMemoryError(env, "NET_Timeout native heap allocation failed"); } else { NET_ThrowByNameWithLastError(env, JNU_JAVANETPKG "SocketException", "select/poll failed"); } } else if (nread == JVM_IO_INTR) { JNU_ThrowByName(env, JNU_JAVAIOPKG "InterruptedIOException", "Operation interrupted"); } if (bufP != BUF) { free(bufP); } return -1; } } nread = NET_Read(fd, bufP, len); if (nread <= 0) { if (nread < 0) { switch (errno) { case ECONNRESET: case EPIPE: JNU_ThrowByName(env, "sun/net/ConnectionResetException", "Connection reset"); break; case EBADF: JNU_ThrowByName(env, JNU_JAVANETPKG "SocketException", "Socket closed"); break; case EINTR: JNU_ThrowByName(env, JNU_JAVAIOPKG "InterruptedIOException", "Operation interrupted"); break; default: NET_ThrowByNameWithLastError(env, JNU_JAVANETPKG "SocketException", "Read failed"); } } } else { (*env)->SetByteArrayRegion(env, data, off, nread, (jbyte *)bufP); } if (bufP != BUF) { free(bufP); } return nread; }
通过代码我们可以知道,数据的读取是通过NET_Timeout (fd, timeout)来实现的,我们继续跟踪代码,在linux_close.c文件中,发现了NET_Timeout的实现:
int NET_Timeout(int s, long timeout) { long prevtime = 0, newtime; struct timeval t; fdEntry_t *fdEntry = getFdEntry(s); /* * Check that fd hasn't been closed. */ if (fdEntry == NULL) { errno = EBADF; return -1; } /* * Pick up current time as may need to adjust timeout */ if (timeout > 0) { gettimeofday(&t, NULL); prevtime = t.tv_sec * 1000 + t.tv_usec / 1000; } for(;;) { struct pollfd pfd; int rv; threadEntry_t self; /* * Poll the fd. If interrupted by our wakeup signal * errno will be set to EBADF. */ pfd.fd = s; pfd.events = POLLIN | POLLERR; startOp(fdEntry, &self); rv = poll(&pfd, 1, timeout); endOp(fdEntry, &self); /* * If interrupted then adjust timeout. If timeout * has expired return 0 (indicating timeout expired). */ if (rv < 0 && errno == EINTR) { if (timeout > 0) { gettimeofday(&t, NULL); newtime = t.tv_sec * 1000 + t.tv_usec / 1000; timeout -= newtime - prevtime; if (timeout <= 0) { return 0; } prevtime = newtime; } } else { return rv; } } }
代码中的关键点在 poll(&pfd, 1, timeout);poll是linux中的字符设备驱动中的一个函数,作用是把当前的文件指针挂到设备内部定义的等待
这样就很好理解了,其实这个时间是我两次读取数据之间的最长阻塞时间,如果我在网络抖动的情况下,我每次2秒之内返回一部分数据,这样我就一直不会超时了,为了验证我们的理解写了test,代码如下,一个controller,用来接受http请求:
@org.springframework.stereotype.Controller @RequestMapping("/hello") public class Controller { @RequestMapping("/test") public void tets(HttpServletRequest request ,HttpServletResponse response) throws IOException, InterruptedException { System.out.println("I'm coming"); PrintWriter writer = response.getWriter(); while (true){ writer.print("ha ha ha"); writer.flush(); Thread.sleep(2000); System.out.println("I'm ha ha ha"); } } }
这个代码就是每隔2s发送一条数据,循环发送,模拟网络不好的时候,收到的数据断断续续,再来一个test用来发送请求:
@Test public void tetsHttpClientHttp() throws IOException { RequestConfig requestConfig = RequestConfig.custom() .setConnectTimeout(1000) .setSocketTimeout(3000) .setConnectionRequestTimeout(1000) .build(); CloseableHttpClient httpClient = HttpClientBuilder.create().setDefaultRequestConfig(requestConfig).build(); // 创建Get请求 HttpGet httpGet = new HttpGet("http://127.0.0.1:8080/hello/test"); CloseableHttpResponse response =httpClient.execute(httpGet); HttpEntity responseEntity = response.getEntity(); if (responseEntity != null) { System.out.println("响应内容为:" + EntityUtils.toString(responseEntity)); } }
服务端结果如下:
客户端结果如下:
程序并没有如期抛出异常,和我们预想的一样,而当我们修改socketTimeout为1000时,经验证可以抛出java.net.SocketTimeoutException: Read timed out 异常
为此,为了更准确控制时间,我们需要自己实现超时机制:
ExecutorService executor = Executors.newFixedThreadPool(1); Callablecallable = new Callable () { @Override public String call() throws Exception { RequestConfig requestConfig = RequestConfig.custom() .setConnectTimeout(1000) .setSocketTimeout(3000) .setConnectionRequestTimeout(1000) .build(); CloseableHttpClient httpClient = HttpClientBuilder.create().setDefaultRequestConfig(requestConfig).build(); // 创建Get请求 HttpGet httpGet = new HttpGet("http://127.0.0.1:8080/hello/test"); CloseableHttpResponse response =httpClient.execute(httpGet); HttpEntity responseEntity = response.getEntity(); return EntityUtils.toString(responseEntity); } }; Future future = executor.submit(callable); System.out.print(future.get(5,TimeUnit.SECONDS));
这样就可以避免这种情况,在请求线程超时时抛出 java.util.concurrent.TimeoutException避免长时间占住业务线程影响我们的服务,当然这只是个例子,现实我们可能还要考虑线程数,拒绝策略等情况。