What time does %D in the Apache log actually include?
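1. Symptoms:
In an Apache + mod_jk + JBoss setup, the processing time recorded by JBoss and the one in the Apache log always differ a lot. The JBoss time is recorded by the framework from the moment the request enters until it finishes, and for a given URL it is almost always under 100 ms. Yet the %D values in the Apache log are mostly above 200 ms, and some are, bizarrely, several hundred seconds.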
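2. Goal:
Find the gap between JBoss's business processing time and the processing time in the Apache log. If normal business logic needs only 100 ms, but putting Apache in front makes the request take 200 ms, then the architecture has a problem: it more than doubles the time spent.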
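3. Analysis:
My first suspicion was the communication between mod_jk and JBoss, but testing showed that even with jk not loaded, Apache serving its own static resources sometimes also took inexplicably long.
Next I suspected that the loaded modules were consuming the time, but even with every module removed, the symptom persisted.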
Reading the documentation closely turns up just one sentence: "The time taken to serve the request, in microseconds." That says %D is the "server-side processing time".
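The documentation fixes only the unit. As a minimal sketch of what that means when reading logs, the Java snippet below pulls %D out of an access-log line and converts it to milliseconds. It assumes a hypothetical LogFormat that puts %D last, such as "%h %l %u %t \"%r\" %>s %b %D"; the class name, method names and the sample log line are made up for illustration.

```java
// Hypothetical helper: assumes %D is the LAST field of each access-log line,
// e.g. LogFormat "%h %l %u %t \"%r\" %>s %b %D" (an assumed format).
class DurationField {
    // Returns the %D value (microseconds) taken from the end of the line.
    static long microsFromLine(String line) {
        String trimmed = line.trim();
        int lastSpace = trimmed.lastIndexOf(' ');
        return Long.parseLong(trimmed.substring(lastSpace + 1));
    }

    // %D is in microseconds: 1 ms = 1000 us.
    static double toMillis(long micros) {
        return micros / 1000.0;
    }

    public static void main(String[] args) {
        // Made-up sample line in the assumed format
        String line = "127.0.0.1 - - [01/Jan/2011:00:00:00 +0800] "
                + "\"GET /testweb/ThirdServlet?test=1111 HTTP/1.1\" 200 512 100700";
        long d = microsFromLine(line);
        System.out.println(d + " us = " + toMillis(d) + " ms"); // prints: 100700 us = 100.7 ms
    }
}
```

Reading the number as milliseconds instead of microseconds would inflate every value a thousandfold, so it is worth checking the unit before comparing %D with application-side timings.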
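Searching online, the only relevant article I could find (I won't name it) concluded that %D does not include the time spent sending the request.
In that case, server-side processing time should be the span from the moment the whole request has been received and processing starts until just before the response is sent. But reading a static resource cannot possibly take several seconds: even without an in-memory cache, the server just reads the file from disk and writes it out. That is nowhere near the %D values in the log.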
The one master key to a puzzle like this is the source code. So, read the source.
In mod_log_config.c, the handlers behind %T and %D are:
    static const char *log_request_duration(request_rec *r, char *a)
    {
        apr_time_t duration = apr_time_now() - r->request_time;
        return apr_psprintf(r->pool, "%" APR_TIME_T_FMT, apr_time_sec(duration));
    }

    static const char *log_request_duration_microseconds(request_rec *r, char *a)
    {
        return apr_psprintf(r->pool, "%" APR_TIME_T_FMT,
                            (apr_time_now() - r->request_time));
    }
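The end point, apr_time_now(), is not in doubt: it marks the end of server-side processing, which does not include the time taken to send the response content to the client.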
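Since apr_time_t counts microseconds, the two handlers differ only in units: %D prints the raw difference, while %T passes it through apr_time_sec(), which truncates to whole seconds. The Java restatement below only mirrors that arithmetic; it is not Apache code, and the class and method names are hypothetical.

```java
// Java restatement (not Apache code) of the arithmetic in mod_log_config.c.
// All times are in microseconds, like apr_time_t.
class LogDuration {
    static final long USEC_PER_SEC = 1_000_000L; // same value as APR_USEC_PER_SEC

    // %D: apr_time_now() - r->request_time, in microseconds
    static long percentD(long nowMicros, long requestTimeMicros) {
        return nowMicros - requestTimeMicros;
    }

    // %T: apr_time_sec(duration), i.e. whole seconds with the fraction truncated
    static long percentT(long nowMicros, long requestTimeMicros) {
        return (nowMicros - requestTimeMicros) / USEC_PER_SEC;
    }

    public static void main(String[] args) {
        long requestTime = 0L;  // hypothetical: request line read at t = 0
        long now = 1_999_999L;  // hypothetical: log entry written just under 2 s later
        System.out.println("%D = " + percentD(now, requestTime)); // prints: %D = 1999999
        System.out.println("%T = " + percentT(now, requestTime)); // prints: %T = 1
    }
}
```

Both handlers share the same start point, r->request_time, so whatever that field includes shows up in %T and %D alike.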
So when does r->request_time start counting?
In protocol.c we find:
    static int read_request_line(request_rec *r, apr_bucket_brigade *bb)
    {
        const char *ll;
        const char *uri;
        const char *pro;
    #if 0
        conn_rec *conn = r->connection;
    #endif
        int major = 1, minor = 0;   /* Assume HTTP/1.0 if non-"HTTP" protocol */
        char http[5];
        apr_size_t len;
        int num_blank_lines = 0;
        int max_blank_lines = r->server->limit_req_fields;

        if (max_blank_lines <= 0) {
            max_blank_lines = DEFAULT_LIMIT_REQUEST_FIELDS;
        }

        /* Read past empty lines until we get a real request line,
         * a read error, the connection closes (EOF), or we timeout.
         *
         * We skip empty lines because browsers have to tack a CRLF on to the end
         * of POSTs to support old CERN webservers.  But note that we may not
         * have flushed any previous response completely to the client yet.
         * We delay the flush as long as possible so that we can improve
         * performance for clients that are pipelining requests.  If a request
         * is pipelined then we won't block during the (implicit) read() below.
         * If the requests aren't pipelined, then the client is still waiting
         * for the final buffer flush from us, and we will block in the implicit
         * read().  B_SAFEREAD ensures that the BUFF layer flushes if it will
         * have to block during a read.
         */
        do {
            apr_status_t rv;

            /* insure ap_rgetline allocates memory each time thru the loop
             * if there are empty lines
             */
            r->the_request = NULL;
            rv = ap_rgetline(&(r->the_request), (apr_size_t)(r->server->limit_req_line + 2),
                             &len, r, 0, bb);

            if (rv != APR_SUCCESS) {
                r->request_time = apr_time_now();

                /* ap_rgetline returns APR_ENOSPC if it fills up the
                 * buffer before finding the end-of-line.  This is only going to
                 * happen if it exceeds the configured limit for a request-line.
                 */
                if (rv == APR_ENOSPC) {
                    r->status    = HTTP_REQUEST_URI_TOO_LARGE;
                    r->proto_num = HTTP_VERSION(1,0);
                    r->protocol  = apr_pstrdup(r->pool, "HTTP/1.0");
                }
                return 0;
            }
        } while ((len <= 0) && (++num_blank_lines < max_blank_lines));

        /* we've probably got something to do, ignore graceful restart requests */

        r->request_time = apr_time_now();
        ll = r->the_request;
        r->method = ap_getword_white(r->pool, &ll);

    #if 0
    /* XXX If we want to keep track of the Method, the protocol module should do
     * it.  That support isn't in the scoreboard yet.  Hopefully next week
     * sometime.   rbb */
        ap_update_connection_status(AP_CHILD_THREAD_FROM_ID(conn->id), "Method",
                                    r->method);
    #endif

        uri = ap_getword_white(r->pool, &ll);

        /* Provide quick information about the request method as soon as known */

        r->method_number = ap_method_number_of(r->method);
        if (r->method_number == M_GET && r->method[0] == 'H') {
            r->header_only = 1;
        }

        ap_parse_uri(r, uri);

        if (ll[0]) {
            r->assbackwards = 0;
            pro = ll;
            len = strlen(ll);
        } else {
            r->assbackwards = 1;
            pro = "HTTP/0.9";
            len = 8;
        }
        r->protocol = apr_pstrmemdup(r->pool, pro, len);

        /* XXX ap_update_connection_status(conn->id, "Protocol", r->protocol); */

        /* Avoid sscanf in the common case */
        if (len == 8
            && pro[0] == 'H' && pro[1] == 'T' && pro[2] == 'T' && pro[3] == 'P'
            && pro[4] == '/' && apr_isdigit(pro[5]) && pro[6] == '.'
            && apr_isdigit(pro[7])) {
            r->proto_num = HTTP_VERSION(pro[5] - '0', pro[7] - '0');
        }
        else if (3 == sscanf(r->protocol, "%4s/%u.%u", http, &major, &minor)
                 && (strcasecmp("http", http) == 0)
                 && (minor < HTTP_VERSION(1, 0)) ) /* don't allow HTTP/0.1000 */
            r->proto_num = HTTP_VERSION(major, minor);
        else
            r->proto_num = HTTP_VERSION(1, 0);

        return 1;
    }
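The clock starts once the first line of the request has been read. Note that r->request_time is assigned whether or not that first line is read successfully: the error path (for example a request line longer than the configured limit, which yields APR_ENOSPC) sets it too: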
    if (rv != APR_SUCCESS) {
        r->request_time = apr_time_now();
        ......
        return 0;
    }
    ...........
    r->request_time = apr_time_now();
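To make the placement of the timestamp concrete, here is a simplified Java model of the loop above. It is an illustration only, under these assumptions: BufferedReader.readLine() stands in for ap_rgetline(), an injected LongSupplier stands in for apr_time_now(), and the class and field names are hypothetical. As in the C code, blank lines are skipped, and the time is recorded both on a read failure and after the first non-empty line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.function.LongSupplier;

// Simplified model (hypothetical, not Apache code) of where
// read_request_line() assigns r->request_time.
class RequestLineModel {
    long requestTime = -1; // stands in for r->request_time
    String theRequest;     // stands in for r->the_request

    // Returns true once a request line has been read, false on EOF or I/O error.
    boolean readRequestLine(BufferedReader in, LongSupplier clock, int maxBlankLines) {
        int numBlankLines = 0;
        do {
            try {
                theRequest = in.readLine(); // blocks until a whole line has arrived
            } catch (IOException e) {
                theRequest = null;          // treated like rv != APR_SUCCESS
            }
            if (theRequest == null) {
                requestTime = clock.getAsLong(); // stamped even on failure
                return false;
            }
        } while (theRequest.isEmpty() && ++numBlankLines < maxBlankLines);

        // The clock starts only now, after the request line is complete:
        // any delay before this point never shows up in %D.
        requestTime = clock.getAsLong();
        return true;
    }

    public static void main(String[] args) {
        String raw = "\r\n\r\nGET /testweb/ThirdServlet?test=1111 HTTP/1.1\r\nHost: localhost\r\n\r\n";
        RequestLineModel m = new RequestLineModel();
        boolean ok = m.readRequestLine(new BufferedReader(new StringReader(raw)),
                () -> System.nanoTime() / 1_000, 100);
        System.out.println(ok + " " + m.theRequest); // prints: true GET /testweb/ThirdServlet?test=1111 HTTP/1.1
    }
}
```

Injecting the clock keeps the model deterministic in tests; a real server would read the system clock, as apr_time_now() does.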
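So %D does in fact include most of the time spent sending the request. "Most" because network transmission before the first line has been read is not counted; but once the first line, for example "GET /path HTTP/1.1\r\n", has been read, any time the rest of the request spends crossing a poor network is counted in full.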
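The claim reduces to timeline arithmetic: %D = (moment the log entry is written) - (moment the request line was fully read). The Java sketch below uses hypothetical timestamps, chosen to echo the figures measured in the verification section; only the position of a 100 ms stall relative to the request line changes. The class and method names are illustrative.

```java
// Illustrative timeline of one request; every value is a hypothetical
// timestamp in microseconds, not a measurement.
class RequestTimeline {
    final long connected;       // TCP connection accepted
    final long requestLineRead; // "GET ... HTTP/1.1\r\n" fully read: r->request_time
    final long headersRead;     // blank line ending the headers read
    final long logged;          // log entry written: apr_time_now() in the %D handler

    RequestTimeline(long connected, long requestLineRead, long headersRead, long logged) {
        this.connected = connected;
        this.requestLineRead = requestLineRead;
        this.headersRead = headersRead;
        this.logged = logged;
    }

    long percentD()      { return logged - requestLineRead; }      // what %D reports
    long notCounted()    { return requestLineRead - connected; }   // waiting before the clock starts
    long countedUpload() { return headersRead - requestLineRead; } // rest of the request, inside %D

    public static void main(String[] args) {
        // 100 ms stall while the request line is still incomplete
        RequestTimeline stallBefore = new RequestTimeline(0, 100_000, 100_010, 100_340);
        // 100 ms stall after the request line has arrived
        RequestTimeline stallAfter = new RequestTimeline(0, 10, 100_010, 100_710);
        System.out.println("stall before request line: %D = " + stallBefore.percentD() + " us"); // 340
        System.out.println("stall after request line:  %D = " + stallAfter.percentD() + " us");  // 100700
    }
}
```

The same 100 ms stall lands in notCounted() in the first case and in countedUpload() in the second, which is the whole difference between the two %D values.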
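4. Verification:
Write a socket client that talks to Apache directly and measure a normal request, running it on the same machine to keep network latency to a minimum.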
    import java.io.InputStream;
    import java.io.OutputStreamWriter;
    import java.io.PrintWriter;
    import java.net.Socket;

    // Wrapper class name is hypothetical; the body is the original test client.
    public class NormalRequestTest {
        public static void main(String[] args) throws Exception {
            for (int i = 0; i < 100; i++) { // the test repeats the request 100 times
                Socket sc = new Socket("127.0.0.1", 9000);
                PrintWriter out = new PrintWriter(new OutputStreamWriter(sc.getOutputStream()));
                out.print("GET /testweb/ThirdServlet?test=1111 HTTP/1.1\r\n");
                out.print("Accept: image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*\r\n");
                out.print("Accept-Encoding: gzip, deflate\r\n");
                out.print("Accept-Language: zh-CN\r\n");
                out.print("Connection: Close\r\n");
                out.print("Host: 10.16.26.81:9000\r\n");
                out.print("User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; Embedded Web Browser from: http://bsalsa.com/; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E; msn OptimizedIE8;ZHCN)\r\n\r\n");
                out.flush();
                // Read the response until the server closes the connection
                InputStream in = sc.getInputStream();
                byte[] buf = new byte[1024];
                int len;
                while ((len = in.read(buf)) != -1) {
                    System.out.print(new String(buf, 0, len));
                }
                System.out.println();
                in.close();
                out.close();
                sc.close();
                System.out.println(i);
            }
        }
    }
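Running this code 100 times in a loop, %D in the Apache log stays around 340μs and the JBoss time around 300μs, while the client-side average run time is about 550μs. Note that all of these figures are in μs.
Now modify the code so that the request line itself is sent in two parts, with a 100 ms pause in between: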
    import java.io.InputStream;
    import java.io.OutputStreamWriter;
    import java.io.PrintWriter;
    import java.net.Socket;

    // Wrapper class name is hypothetical; the body is the original test client.
    public class SplitRequestLineTest {
        public static void main(String[] args) throws Exception {
            for (int i = 0; i < 100; i++) {
                Socket sc = new Socket("127.0.0.1", 9000);
                PrintWriter out = new PrintWriter(new OutputStreamWriter(sc.getOutputStream()));
                out.print("GET /testweb/Third");
                out.flush();
                Thread.sleep(100); // pause in the middle of the request line
                out.print("Servlet?test=1111 HTTP/1.1\r\n");
                out.print("Accept: image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*\r\n");
                out.print("Accept-Encoding: gzip, deflate\r\n");
                out.print("Accept-Language: zh-CN\r\n");
                out.print("Connection: Close\r\n");
                out.print("Host: 10.16.26.81:9000\r\n");
                out.print("User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; Embedded Web Browser from: http://bsalsa.com/; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E; msn OptimizedIE8;ZHCN)\r\n\r\n");
                out.flush();
                InputStream in = sc.getInputStream();
                byte[] buf = new byte[1024];
                int len;
                while ((len = in.read(buf)) != -1) {
                    System.out.print(new String(buf, 0, len));
                }
                System.out.println();
                in.close();
                out.close();
                sc.close();
                System.out.println(i);
            }
        }
    }
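Alternatively, put the sleep immediately after out is created, before anything is sent. Both variants simulate network delay before the first line (the request line) has been read by the server, and both give results essentially identical to the first, normal case: apart from the usual per-request fluctuation, the average barely changes.
Modify the code again, this time pausing after the complete request line has been sent: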
    import java.io.InputStream;
    import java.io.OutputStreamWriter;
    import java.io.PrintWriter;
    import java.net.Socket;

    // Wrapper class name is hypothetical; the body is the original test client.
    public class DelayAfterRequestLineTest {
        public static void main(String[] args) throws Exception {
            for (int i = 0; i < 100; i++) {
                Socket sc = new Socket("127.0.0.1", 9000);
                PrintWriter out = new PrintWriter(new OutputStreamWriter(sc.getOutputStream()));
                out.print("GET /testweb/ThirdServlet?test=1111 HTTP/1.1\r\n");
                out.flush();
                Thread.sleep(100); // pause after the complete request line
                out.print("Accept: image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*\r\n");
                out.print("Accept-Encoding: gzip, deflate\r\n");
                out.print("Accept-Language: zh-CN\r\n");
                out.print("Connection: Close\r\n");
                out.print("Host: 10.16.26.81:9000\r\n");
                out.print("User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; Embedded Web Browser from: http://bsalsa.com/; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E; msn OptimizedIE8;ZHCN)\r\n\r\n");
                out.flush();
                InputStream in = sc.getInputStream();
                byte[] buf = new byte[1024];
                int len;
                while ((len = in.read(buf)) != -1) {
                    System.out.print(new String(buf, 0, len));
                }
                System.out.println();
                in.close();
                out.close();
                sc.close();
                System.out.println(i);
            }
        }
    }
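Put the sleep anywhere after the first line (the request line) has been sent, and the difference becomes obvious. The server-side 340 and client-side 550 above are in μs, while the sleep here is in ms, so the gap shows up on the server immediately: JBoss's business processing time hardly changes, but Apache's %D jumps to about 100700μs. Beyond the 100000μs of client-side transmission delay, the "extra" time has also roughly doubled (about 700μs instead of 340μs). This is delay amplification: when network data arrives late, the request-handling thread pays more overhead switching between waiting for network data and processing other logic. For the same reason, the client's average run time rises to about 101200μs; after subtracting the 100000μs sleep, the remainder (about 1200μs versus 550μs) has also grown, because the sleep itself adds considerable thread overhead.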
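The contrast can also be reproduced without Apache or JBoss. The Java sketch below is a toy under stated assumptions: a one-shot loopback server (not Apache) that, like read_request_line(), starts its clock only after the request line has been read and stops it after writing the response. The client pauses 100 ms either inside the request line or right after it. Class and method names are hypothetical, the port is picked by the OS, and absolute numbers will differ from the Apache measurements; only the contrast between the two cases matters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Toy reproduction (not Apache): the server starts timing after the request
// line, like r->request_time, and stops after writing the response.
class DelayExperiment {

    private static void send(OutputStream out, String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.ISO_8859_1));
        out.flush();
    }

    // Returns the toy server's "%D" in microseconds for one request whose client
    // pauses delayMs inside the request line (false) or right after it (true).
    static long run(long delayMs, boolean pauseAfterRequestLine) {
        try (ServerSocket server = new ServerSocket(0)) { // port 0: any free port
            int port = server.getLocalPort();
            Thread client = new Thread(() -> {
                try (Socket s = new Socket("127.0.0.1", port)) {
                    OutputStream out = s.getOutputStream();
                    send(out, "GET /testweb/Third");
                    if (!pauseAfterRequestLine) Thread.sleep(delayMs);
                    send(out, "Servlet?test=1111 HTTP/1.1\r\n");
                    if (pauseAfterRequestLine) Thread.sleep(delayMs);
                    send(out, "Host: localhost\r\nConnection: close\r\n\r\n");
                    s.getInputStream().readAllBytes(); // wait for the response and EOF
                } catch (IOException | InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });
            client.start();
            long durationMicros;
            try (Socket conn = server.accept()) {
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(conn.getInputStream(), StandardCharsets.ISO_8859_1));
                in.readLine();                        // blocks until the request line is complete
                long requestTime = System.nanoTime(); // clock starts here
                String line;
                while ((line = in.readLine()) != null && !line.isEmpty()) {
                    // consume headers up to the blank line
                }
                send(conn.getOutputStream(),
                        "HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n");
                durationMicros = (System.nanoTime() - requestTime) / 1_000;
            }
            client.join();
            return durationMicros;
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("pause inside request line: " + run(100, false) + " us");
        System.out.println("pause after request line:  " + run(100, true) + " us");
    }
}
```

On an idle machine the first figure should stay small while the second should be at least roughly 100000 us, mirroring the two Apache experiments; exact values will vary from run to run.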
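5. Conclusion: %D in the Apache log is documented as server-side processing time, but in practice it also includes all client-side transmission time except the network time before the first line of the request arrives. Put more plainly: the clock starts once the first line of the request (the request line) has been read, and the measured span covers the network transmission of the rest of the request as well as the entire server-side processing.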