Diagnosing a Coherence Timeout

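The system uses Coherence to cache user sessions and some data-dictionary information. Users were getting timeouts at login, and the application log showed the following: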

<Error> <HTTP> <BEA-101020> <[ServletContext@754103289[app:EAR module:web path:/web spec-version:2.5]] Servlet failed with Exception

com.tangosol.net.RequestTimeoutException: Request timed out
         at com.tangosol.coherence.component.net.extend.message.Request$Status.waitForResponse(Request.CDB:47)
         at com.tangosol.coherence.component.net.extend.Channel.request(Channel.CDB:20)
         at com.tangosol.coherence.component.net.extend.Channel.request(Channel.CDB:1)
         at com.tangosol.coherence.component.net.extend.RemoteNamedCache$BinaryCache.get(RemoteNamedCache.CDB:11)
         at com.oracle.common.collections.ConverterCollections$ConverterMap.get(ConverterCollections.java:1528)
         Truncated. see log file for complete stacktrace

The first time, the following was logged:

 2015-11-11 15:46:17.800/8049446.887 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Proxy:TcpProxyService:TcpAcceptorWorker:297, member=1): Extend*TCP has marked TcpConnection(Id=0x00000150EC0D36640A9616024BE7D1DC90D0CEC2CB1F75E5C2435B40DEB6088E, Open=true, Member(Id=0, Timestamp=2015-11-09 19:43:58.321, Address=10.10.15.15:0, MachineId=0, Location=site:,machine:MANAGER03,process:21081, Role=WeblogicServer), LocalAddress=10.10.15.2:8012, RemoteAddress=10.10.15.15:44272) as suspect: The connection has fallen 44 messages (10009759 bytes) behind; the threshold is 10000 messages or 10000000 bytes.
2015-11-11 15:46:30.171/8049459.258 Oracle Coherence GE 12.1.2.0.0 <Info> (thread=Proxy:TcpProxyService:TcpAcceptorWorker:281, member=1): Extend*TCP has determined that TcpConnection(Id=0x00000150EC0D36640A9616024BE7D1DC90D0CEC2CB1F75E5C2435B40DEB6088E, Open=true, Member(Id=0, Timestamp=2015-11-09 19:43:58.321, Address=10.10.15.15:0, MachineId=0, Location=site:,machine:MANAGER03,process:21081, Role=WeblogicServer), LocalAddress=10.10.15.2:8012, RemoteAddress=10.10.15.15:44272) is no longer a suspect: The connection has reduced its backlog to 120787 bytes; the target was 2000000 bytes.
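
As a reading aid for the "suspect" message above, here is a minimal Java sketch that extracts the backlog figures from such a line and reports which threshold was crossed. The class name `BacklogCheck`, the method `summarize`, and the use of a strict greater-than comparison are assumptions made for illustration; they are not part of Coherence. For the line quoted above, the byte threshold is the one exceeded, with only 44 messages queued, which works out to roughly 227 KB per message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BacklogCheck {
    // Matches the backlog figures in the "as suspect" log line quoted above.
    private static final Pattern SUSPECT = Pattern.compile(
        "fallen (\\d+) messages \\((\\d+) bytes\\) behind; "
        + "the threshold is (\\d+) messages or (\\d+) bytes");

    /** Returns a one-line summary, or null if the line is not a suspect notice. */
    public static String summarize(String logLine) {
        Matcher m = SUSPECT.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        long messages = Long.parseLong(m.group(1));
        long bytes = Long.parseLong(m.group(2));
        long maxMessages = Long.parseLong(m.group(3));
        long maxBytes = Long.parseLong(m.group(4));

        // Assumption: a threshold counts as crossed when the value is strictly greater.
        String trigger = bytes > maxBytes ? "bytes"
                       : messages > maxMessages ? "messages"
                       : "none";
        long avg = messages == 0 ? 0 : bytes / messages; // integer division
        return "trigger=" + trigger + " avgBytesPerMessage=" + avg;
    }

    public static void main(String[] args) {
        String line = "The connection has fallen 44 messages (10009759 bytes) behind; "
                    + "the threshold is 10000 messages or 10000000 bytes.";
        System.out.println(summarize(line)); // trigger=bytes avgBytesPerMessage=227494
    }
}
```

A byte-triggered backlog with few messages suggests large individual responses rather than a flood of small ones, which is worth keeping in mind when deciding what to cache.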

Oracle support said we needed to upgrade to a patch release:
 "Proxy Node Creates Many More Threads Than the Number of Concurrent Requests" 
This bug is fixed in the current patch release of your coherence version: 12.1.2.04 - release notes below. 
http://docs.oracle.com/middleware/1212/coherence/COHRX/technotes.htm#CHDICFBC 
Would recommend that you plan to upgrade to this patch release. 
More info. on the bug can found by searching for this bug number on support.oracle.com. 
Bug 20091524  - Proxy node creates many more threads than the number of concurrent requests 

20275167 Fixed an issue which could cause the proxy service thread count to grow unnecessarily large

The second time, although there were only 100 requests, all 512 proxy threads were in use:
ProxyService thread pool size has reached its maximum of 512 threads.
The fix was:

1. In the proxy's startup script, set: -Dtangosol.coherence.proxy.threads.decrease.interval=5000

2. Change <tcp-delay-enabled>true</tcp-delay-enabled> to false.


Poor Response Time for Requests Sent By an Extend Client via a Coherence Proxy Server Can Be Caused If tcp-delay is Enabled (Doc ID 1670150.1)

Two problems are observed: 
1. Slowness detected in accessing the backend through a proxy server using Coherence Extend Client. 
2. Large amount of idle threads on proxy server, eventually showing the following warning on log: 

2014-04-29 17:51:22,195 [Logger@9265725 12.1.2.0.0] WARN Coherence - 2014-04-29 17:51:22.195/3876.113 Oracle Coherence GE 12.1.2.0.0 (thread=Proxy:TcpProxyService:TcpAcceptor, member=3): ProxyService thread pool size has reached its maximum of N threads.
Where N is the maximum number of threads set by: -Dtangosol.coherence.proxy.threads.max=N
This will cause too many threads on idle state and slow backend response.
The proxy's TCP acceptor has the following configuration:
   <proxy-scheme>
     <scheme-name>example-proxy</scheme-name>
     <service-name>TcpProxyService</service-name>
     <acceptor-config>
       <tcp-acceptor>
         <local-address>
           <address system-property="tangosol.coherence.extend.address">localhost</address>
           <port system-property="tangosol.coherence.extend.port">9099</port>
         </local-address>
         <tcp-delay-enabled>true</tcp-delay-enabled>
       </tcp-acceptor>
     </acceptor-config>
     <autostart system-property="tangosol.coherence.extend.enabled">false</autostart>
   </proxy-scheme>
 
CAUSE
The proxy's TCP acceptor contained a property that changes the behaviour of the TCP/IP layer, which can lead to increased latency. In addition, the growth in the number of threads, which then become idle, is expected in Coherence and needs to be controlled through a Java option that makes thread reclamation more aggressive.
 
The tcp-delay-enabled config property is explained in the Coherence documentation:
<tcp-delay-enabled>
Optional
Indicates whether TCP delay (Nagle's algorithm) is enabled on a TCP/IP socket. Valid values are true and false. TCP delay is disabled by default.
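
To make the relationship between "TCP delay" and the socket-level option concrete, here is a minimal Java sketch using only the standard `java.net` API. Nagle's algorithm being enabled corresponds to `TCP_NODELAY` being off, so `tcp-delay-enabled=false` is the equivalent of `setTcpNoDelay(true)` on a plain socket. The class `NagleDemo` and its method are hypothetical names, and how Coherence applies the setting internally is not shown here; this only illustrates the plain-socket equivalent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class NagleDemo {
    /**
     * Opens a loopback connection, applies the equivalent of the given
     * tcp-delay-enabled value, and returns the resulting TCP_NODELAY flag.
     */
    public static boolean tcpNoDelayFor(boolean tcpDelayEnabled) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; the connect completes via the listen backlog.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            // Nagle's algorithm on means TCP_NODELAY off, and vice versa.
            client.setTcpNoDelay(!tcpDelayEnabled);
            return client.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("tcp-delay-enabled=false -> TCP_NODELAY=" + tcpNoDelayFor(false));
        System.out.println("tcp-delay-enabled=true  -> TCP_NODELAY=" + tcpNoDelayFor(true));
    }
}
```

With Nagle's algorithm on, small writes may be held back and coalesced, which tends to hurt request/response traffic such as cache lookups.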
SOLUTION
On Proxy's cache config make sure that either tcp-delay-enabled is removed or set to false: 
   <proxy-scheme>
     <scheme-name>example-proxy</scheme-name>
     <service-name>TcpProxyService</service-name>
     <acceptor-config>
       <tcp-acceptor>
         <local-address>
           <address system-property="tangosol.coherence.extend.address">localhost</address>
           <port system-property="tangosol.coherence.extend.port">9099</port>
         </local-address>
         <tcp-delay-enabled>false</tcp-delay-enabled>
       </tcp-acceptor>
     </acceptor-config>
     <autostart system-property="tangosol.coherence.extend.enabled">false</autostart>
   </proxy-scheme>
  
On proxy server startup script, set the following:

-Dtangosol.coherence.proxy.threads.decrease.interval=X
where X is in milliseconds, which should be set to a low value, for example 15 seconds which makes X=15000.

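Coherence: when the cache is configured with multiple nodes, the client always selects the first node, so each application node should be configured to connect to a different cache node. The connection also needs to be switched to the TCP method; otherwise the application sometimes fails to connect to the cache.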
