CSE Java SDK: Locating a Request Timeout Problem

One business service ran into a strange problem: calls to one interface always timed out, with the following error message:

The timeout period of 60000ms has been exceeded while executing POST /rest/csbmessagecenterservice/v1/config/email/test-email-config for host 172.16.1.186, server is rest://172.16.1.186:8443?sslEnabled=true

 

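The detailed log contains the entries below, which show that retry was configured and that the retries failed as well. When the user sent the request directly with Postman, the call did not time out; when the request was sent through the test tool system, it did. At first it was not clear why.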

 

[2019-01-22 06:16:25:638] [ERROR] - [transport-vert.x-eventloop-thread-18] [org.apache.servicecomb.transport.rest.client.http.RestClientInvocation.lambda$invoke$0(RestClientInvocation.java:104)] - Failed to send request to /172.16.1.186:8443.

io.vertx.core.http.impl.HttpClientRequestBase$1: The timeout period of 60000ms has been exceeded while executing POST /rest/csbmessagecenterservice/v1/config/email/test-email-config for host 172.16.1.186

[2019-01-22 06:16:25:638] [ERROR] - [transport-vert.x-eventloop-thread-18] [org.apache.servicecomb.loadbalance.LoadbalanceHandler$4.lambda$null$0(LoadbalanceHandler.java:369)] - service CONSUMER rest CSBMessageCenterService.ApiMessageConfigResource.testEmailConfig, call error, msg is cause:InvocationException,message:InvocationException: code=490;msg=CommonExceptionData [message=Cse Internal Bad Request];cause:,message:The timeout period of 60000ms has been exceeded while executing POST /rest/csbmessagecenterservice/v1/config/email/test-email-config for host 172.16.1.186, server is rest://172.16.1.186:8443?sslEnabled=true

[2019-01-22 06:16:25:639] [ERROR] - [transport-vert.x-eventloop-thread-18] [org.apache.servicecomb.loadbalance.LoadbalanceHandler$3.onExceptionWithServer(LoadbalanceHandler.java:294)] - Invoke server failed. Operation CONSUMER rest CSBMessageCenterService.ApiMessageConfigResource.testEmailConfig; server rest://172.16.1.186:8443?sslEnabled=true; 0-0 msg cause:InvocationException,message:InvocationException: code=490;msg=CommonExceptionData [message=Cse Internal Bad Request];cause:,message:The timeout period of 60000ms has been exceeded while executing POST /rest/csbmessagecenterservice/v1/config/email/test-email-config for host 172.16.1.186

[2019-01-22 06:16:25:639] [ERROR] - [transport-vert.x-eventloop-thread-18] [org.apache.servicecomb.loadbalance.LoadbalanceHandler$3.onExecutionFailed(LoadbalanceHandler.java:322)] - Invoke all server failed. Operation CONSUMER rest CSBMessageCenterService.ApiMessageConfigResource.testEmailConfig, e=cause:InvocationException,message:InvocationException: code=490;msg=CommonExceptionData [message=Cse Internal Bad Request];cause:,message:The timeout period of 60000ms has been exceeded while executing POST /rest/csbmessagecenterservice/v1/config/email/test-email-config for host 172.16.1.186

 

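For timeout problems, these logs usually do not reveal the root cause. We advised the business team to start with the following checks: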

  1. Check the server's access log to see whether the request was received, so that it is clear at which stage the problem occurs.

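  2. Check the access log of the service that printed the timeout log and look at its calls to the other interfaces of the target service, to see whether only one interface times out or all of them do.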

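The logs showed that many interfaces had timeouts, but not on every call; only some requests timed out. For a while no cause could be found. Having no other option, I pulled all the logs myself and went through them again, and finally found that the business developers had missed something in step 1. For certain inputs, the business interface took more than 8 minutes to process a request. The access log entry is printed only after processing has finished, so it appears long after the caller has already timed out, and it was overlooked when the logs were read.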

 

contract:

172.16.3.8 - - - - [22/Jan/2019:03:26:16 +0000] "PUT /rest/csb/csbcontractservice/v1/business-change HTTP/1.1" 400 72 8

172.16.3.8 - - - - [22/Jan/2019:03:33:07 +0000] "PUT /rest/csb/csbcontractservice/v1/business-change HTTP/1.1" 500 62 481105

172.16.3.8 - - - - [22/Jan/2019:03:34:48 +0000] "PUT /rest/csb/csbcontractservice/v1/business-change HTTP/1.1" 500 62 481116

172.16.3.8 - - - - [22/Jan/2019:03:40:17 +0000] "PUT /rest/csb/csbcontractservice/v1/business-change HTTP/1.1" 400 72 8

172.16.3.8 - - - - [22/Jan/2019:03:41:44 +0000] "PUT /rest/csb/csbcontractservice/v1/business-change HTTP/1.1" 400 72 7

172.16.3.8 - - - - [22/Jan/2019:03:42:42 +0000] "PUT /rest/csb/csbcontractservice/v1/business-change HTTP/1.1" 400 72 8
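
Because each entry is written only when processing finishes, the time a request started can be estimated by subtracting the processing time from the logged timestamp. The following is a minimal sketch of that calculation, not part of the CSE SDK. It assumes the line layout seen in the samples above, with the last field being the processing time in milliseconds; the names parse_access_line and slow_requests are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed layout, taken from the sample lines above:
# <ip> - - - - [<finish time>] "<method> <path> <protocol>" <status> <bytes> <millis>
ACCESS_LINE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d+) (?P<size>\d+) (?P<millis>\d+)'
)


def parse_access_line(line):
    """Return a dict for one access log line, or None if it does not match."""
    m = ACCESS_LINE.search(line)
    if m is None:
        return None
    # %b expects English month abbreviations (the default C locale).
    finished = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    millis = int(m.group("millis"))
    return {
        "path": m.group("path"),
        "status": int(m.group("status")),
        "finished": finished,
        # Assumption: the timestamp is the completion time, so the request
        # started roughly `millis` milliseconds earlier.
        "started": finished - timedelta(milliseconds=millis),
        "millis": millis,
    }


def slow_requests(lines, threshold_ms=60000):
    """Return parsed entries whose processing time exceeds threshold_ms."""
    entries = (parse_access_line(line) for line in lines)
    return [e for e in entries if e is not None and e["millis"] > threshold_ms]
```

Searching by the estimated start time rather than the logged timestamp makes it harder to miss an entry that was printed minutes after the caller gave up.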

 

edge:

172.16.3.1 - 2019-01-22 03:25:06,271 "PUT /rest/csb/csbcontractservice/v1/business-change HTTP/1.1" - 490 38 60003 809168370546  -- the business service returned at 22/Jan/2019:03:33:07, more than 8 minutes later

172.16.3.1 - 2019-01-22 03:26:16,297 "PUT /rest/csb/csbcontractservice/v1/business-change HTTP/1.1" - 400 61 13 5c468d580e692bee

172.16.3.1 - 2019-01-22 03:26:47,616 "PUT /rest/csb/csbcontractservice/v1/business-change HTTP/1.1" - 490 38 60002 180593684930 -- the business service returned at 22/Jan/2019:03:34:48, more than 8 minutes later

172.16.3.1 - 2019-01-22 03:40:17,559 "PUT /rest/csb/csbcontractservice/v1/business-change HTTP/1.1" - 400 61 12 5c4690a18978dcfe

172.16.3.1 - 2019-01-22 03:41:44,823 "PUT /rest/csb/csbcontractservice/v1/business-change HTTP/1.1" - 400 61 11 180593684930

172.16.3.1 - 2019-01-22 03:42:42,757 "PUT /rest/csb/csbcontractservice/v1/business-change HTTP/1.1" - 400 61 12 180593684930
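
The manual matching of edge entries against contract entries can be sketched as a small script. This is only an illustration under several assumptions read off the sample lines: the edge log records the request start time (taken here to be UTC), status 490 marks a timeout on the edge, and the contract log records the completion time followed by the processing time in milliseconds. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Edge sample: 2019-01-22 03:25:06,271 "PUT <path> HTTP/1.1" - 490 38 60003 <id>
EDGE_LINE = re.compile(
    r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) '
    r'"\S+ (?P<path>\S+) [^"]*" - (?P<status>\d+) \d+ (?P<millis>\d+)'
)
# Backend sample: [22/Jan/2019:03:33:07 +0000] "PUT <path> HTTP/1.1" 500 62 481105
BACKEND_LINE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "\S+ (?P<path>\S+) [^"]*" '
    r'(?P<status>\d+) \d+ (?P<millis>\d+)'
)


def edge_timeouts(lines, timeout_status=490):
    """Yield (start_time, path) for edge entries that ended with timeout_status."""
    for line in lines:
        m = EDGE_LINE.search(line)
        if m and int(m.group("status")) == timeout_status:
            # Assumption: the edge timestamp is the request start time in UTC.
            started = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            yield started.replace(tzinfo=timezone.utc), m.group("path")


def backend_entries(lines):
    """Yield (estimated_start, path, millis) for backend entries."""
    for line in lines:
        m = BACKEND_LINE.search(line)
        if m:
            finished = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            millis = int(m.group("millis"))
            # The backend logs at completion, so subtract the processing time.
            yield finished - timedelta(milliseconds=millis), m.group("path"), millis


def correlate(edge_lines, backend_lines, tolerance=timedelta(seconds=2)):
    """Pair each edge timeout with backend entries that started at about the same time."""
    backend = list(backend_entries(backend_lines))
    matches = []
    for e_start, e_path in edge_timeouts(edge_lines):
        for b_start, b_path, millis in backend:
            if b_path == e_path and abs(b_start - e_start) <= tolerance:
                matches.append((e_start, millis))
    return matches
```

The 2-second tolerance is an arbitrary choice to absorb the second-level resolution of the backend timestamp; it may need adjusting for other logs.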

 

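To make timeout problems easier to locate, developers are advised to enable the access log on every service so that it is available for analysis. For performance problems, it may also be necessary to enable the metrics log to find the bottleneck.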
