This article is excerpted from the Enmotech Technical Newsletter (云和恩墨技术通讯), October issue.

Symptoms

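The WebSphere connection pool hang showed up as follows: while the application was running, page operations occasionally went unresponsive for long periods, the number of JDBC connections sometimes reached the pool maximum, the number of JDBC connections seen by the database listener spiked, and the system as a whole ran unstably.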

Analysis

1. Check whether the connection pool is leaking

Add the connection pool diagnostic trace specification ConnLeakLogic=all to the running server, then dump the pool contents with wsadmin:

./wsadmin.sh -user wasadmin -password wasadmin
set ds [$AdminControl queryNames "*:name=ds1,process=server1,node=app141Node01,j2eeType=JDBCDataSource,*"]
$AdminControl invoke $ds showPoolContents

The resulting output was:

    MCWrapper id 85b085b  Managed connection WSRdbManagedConnectionImpl@3f613f61  State:STATE_ACTIVE_INUSE Thread Id: 00000024 Thread Name: WebContainer : 0 Handle count 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@432e432e;RUNNING;
    MCWrapper id 67236723  Managed connection WSRdbManagedConnectionImpl@4d134d13  State:STATE_ACTIVE_INUSE Thread Id: 00000024 Thread Name: WebContainer : 1 Handle count 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@432e432e;RUNNING;
    MCWrapper id 37083708  Managed connection WSRdbManagedConnectionImpl@6b056b05  State:STATE_ACTIVE_INUSE Thread Id: 00000024 Thread Name: WebContainer : 2 Handle count 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@432e432e;RUNNING;
    MCWrapper id 4dc44dc4  Managed connection WSRdbManagedConnectionImpl@6350635  State:STATE_ACTIVE_INUSE Thread Id: 00000024 Thread Name: WebContainer : 3 Handle count 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@432e432e;RUNNING;
    MCWrapper id 39d039d0  Managed connection WSRdbManagedConnectionImpl@50875087  State:STATE_ACTIVE_INUSE Thread Id: 00000024 Thread Name: WebContainer : 4 Handle count 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@432e432e;RUNNING;
    MCWrapper id 41e441e4  Managed connection WSRdbManagedConnectionImpl@78997899  State:STATE_ACTIVE_INUSE Thread Id: 00000024 Thread Name: WebContainer : 5 Handle count 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@432e432e;RUNNING;
    MCWrapper id 38a938a9  Managed connection WSRdbManagedConnectionImpl@21392139  State:STATE_ACTIVE_INUSE Thread Id: 00000024 Thread Name: WebContainer : 6 Handle count 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@432e432e;RUNNING;
    MCWrapper id 723b723b  Managed connection WSRdbManagedConnectionImpl@5a7a5a7a  State:STATE_ACTIVE_INUSE Thread Id: 00000024 Thread Name: WebContainer : 7 Handle count 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@432e432e;RUNNING;
    MCWrapper id 7d6d7d6d  Managed connection WSRdbManagedConnectionImpl@31bb31bb  State:STATE_ACTIVE_INUSE Thread Id: 00000024 Thread Name: WebContainer : 8 Handle count 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@432e432e;RUNNING;
    MCWrapper id 2c982c98  Managed connection WSRdbManagedConnectionImpl@13ee13ee  State:STATE_ACTIVE_INUSE Thread Id: 00000024 Thread Name: WebContainer : 9 Handle count 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@432e432e;RUNNING;
  Total number of connection in unshared pool: 10

All 10 connections in the pool were held by 10 threads, all in the RUNNING state.
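When the pool is larger than a handful of connections, a small parser makes this check faster than reading the dump by eye. The following is a minimal Java sketch that counts in-use connections per thread name. It assumes each entry sits on one line in exactly the format shown above (State:STATE_ACTIVE_INUSE ... Thread Name: <name> Handle count ...); other WebSphere versions may format the dump differently, and the class name PoolContents is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: summarize showPoolContents output by counting in-use connections per thread.
 * Assumes one connection entry per line, formatted as in the dump above.
 */
class PoolContents {
    // Captures the thread name between "Thread Name: " and " Handle count".
    private static final Pattern IN_USE = Pattern.compile(
            "State:STATE_ACTIVE_INUSE\\s+Thread Id:\\s*\\S+\\s+Thread Name:\\s*(.+?)\\s+Handle count");

    /** Returns thread name -> number of in-use connections held by that thread. */
    static Map<String, Integer> inUseByThread(String dump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            Matcher m = IN_USE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Reads the saved showPoolContents output from standard input.
        String dump = new String(System.in.readAllBytes());
        Map<String, Integer> counts = inUseByThread(dump);
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        counts.forEach((thread, n) -> System.out.println(thread + ": " + n));
        System.out.println("In-use connections: " + total + ", held by " + counts.size() + " threads");
    }
}
```

With Java 11 or later it can be run directly from source, for example: java PoolContents.java < pool.txt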

2. At this point, collect and analyze a javacore:

set jvm [$AdminControl completeObjectName type=JVM,process=server1,*]
$AdminControl invoke $jvm dumpThreads

The threads were all waiting in the following method:

java.net.Inet4AddressImpl.lookupAllHostAddr

This shows that the DNS lookup performed while obtaining a JDBC connection was what degraded performance.
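The same search can be scripted over the javacore file, which helps when there are hundreds of threads. This Java sketch lists every thread whose stack contains a given method. It assumes the IBM J9 javacore layout, in which each thread begins with a 3XMTHREADINFO line holding the quoted thread name and its frames follow on 4XESTACKTRACE lines; JavacoreScan is a hypothetical name, and the javacore path is passed as the first argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: list threads in a javacore whose stack contains a given method.
 * Assumes J9 layout: "3XMTHREADINFO" thread headers followed by "4XESTACKTRACE" frames.
 */
class JavacoreScan {
    static List<String> threadsInMethod(List<String> lines, String method) {
        List<String> hits = new ArrayList<>();
        String current = null;   // thread whose frames are being read
        boolean matched = false; // avoid reporting the same thread twice
        for (String line : lines) {
            String t = line.trim();
            // "3XMTHREADINFO " (with a space) skips the 3XMTHREADINFO1/2/3 detail lines.
            if (t.startsWith("3XMTHREADINFO ")) {
                int start = t.indexOf('"');
                int end = t.indexOf('"', start + 1);
                current = (start >= 0 && end > start) ? t.substring(start + 1, end) : t;
                matched = false;
            } else if (t.startsWith("4XESTACKTRACE") && current != null && !matched) {
                // Frames may be written as java/net/... or java.net...; normalize to dots.
                if (t.replace('/', '.').contains(method)) {
                    hits.add(current);
                    matched = true;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 never rejects bytes, so odd characters in the file cannot abort the read.
        List<String> lines = Files.readAllLines(Path.of(args[0]), StandardCharsets.ISO_8859_1);
        List<String> hits = threadsInMethod(lines, "java.net.Inet4AddressImpl.lookupAllHostAddr");
        System.out.println(hits.size() + " thread(s) in lookupAllHostAddr:");
        hits.forEach(System.out::println);
    }
}
```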

3. Test DNS performance

nslookup TRFFDB1
....
NAME SERVER:   DNS.XX
ADDRESS:  XX.XX.1.10
looking up FILES
........

Looking up the database host name was unstable: sometimes fast, sometimes slow.
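The instability can also be measured from inside a JVM, through InetAddress, the JDK name-resolution entry point under which lookupAllHostAddr appeared in the javacore. This is a minimal Java sketch assuming Java 11 or later; DnsTiming is a hypothetical name. It sets the networkaddress.cache.ttl security property to 0 so the JVM's own positive cache does not mask slow lookups, although an OS-level cache (such as nscd, if running) may still answer some requests.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.util.Arrays;

/** Sketch: time repeated resolutions of one host name and summarize the spread. */
class DnsTiming {
    /** Resolves host n times; returns the elapsed milliseconds of each attempt. */
    static long[] timeLookups(String host, int n) throws UnknownHostException {
        long[] millis = new long[n];
        for (int i = 0; i < n; i++) {
            long t0 = System.nanoTime();
            InetAddress.getAllByName(host);
            millis[i] = (System.nanoTime() - t0) / 1_000_000;
        }
        return millis;
    }

    /** min / median / max; a resolver that is "sometimes fast, sometimes slow" shows a wide gap. */
    static String summarize(long[] millis) {
        long[] s = millis.clone();
        Arrays.sort(s);
        return "min=" + s[0] + "ms median=" + s[s.length / 2] + "ms max=" + s[s.length - 1] + "ms";
    }

    public static void main(String[] args) throws Exception {
        // Must run before the first InetAddress lookup to disable the JVM's positive cache.
        Security.setProperty("networkaddress.cache.ttl", "0");
        String host = args.length > 0 ? args[0] : "localhost";
        int n = args.length > 1 ? Integer.parseInt(args[1]) : 20;
        System.out.println(host + ": " + summarize(timeLookups(host, n)));
    }
}
```

For example, java DnsTiming.java TRFFDB1 50 resolves the database host 50 times; a large gap between min and max would reproduce the behaviour seen with nslookup.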

4. Checking /etc/hosts on the WebSphere host showed that no IP-to-hostname mapping was configured for the database server.
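When many application servers need checking, step 4 can be automated. This Java sketch, with the hypothetical class name HostsCheck, scans /etc/hosts-format text for a host name, ignoring comments and matching both the canonical name and aliases case-insensitively.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Sketch: find the address mapped to a host name in /etc/hosts-format lines. */
class HostsCheck {
    /** Returns the IP mapped to host, or empty if no line maps it. */
    static Optional<String> findMapping(List<String> lines, String host) {
        for (String raw : lines) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim(); // drop comments
            if (line.isEmpty()) continue;
            String[] fields = line.split("\\s+");
            // fields[0] is the address; the rest are the canonical name and aliases.
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(host)) return Optional.of(fields[0]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        String host = args[0];
        List<String> lines = Files.readAllLines(Path.of("/etc/hosts"));
        System.out.println(findMapping(lines, host)
                .map(ip -> host + " -> " + ip)
                .orElse(host + " is not in /etc/hosts; resolution falls back to other name services such as DNS"));
    }
}
```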

Root cause

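When JDBC requested a database connection, the database server's host name had to be resolved. Two factors together caused the problem: /etc/hosts on the WebSphere host had no IP-to-hostname mapping for the database server, so the name had to be resolved through DNS, and the DNS server itself performed poorly.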

Resolution and recommendations

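  1. On every WebSphere host, add the database server's IP-to-hostname mapping to /etc/hosts.
  2. Increase the initial size of the connection pool to minimize growing and shrinking: acquire enough connections up front so that the pool already holds enough when concurrency is high. At the same time, take into account the maximum number of connections the database allows.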
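The sizing advice in item 2 reduces to simple arithmetic: if every WebSphere server's pool grows to its maximum, the total must still fit within what the database allows. The sketch below assumes the article's "initial value" corresponds to the pool's minimum size and that some sessions are kept in reserve for other clients; PoolSizing and all numbers are hypothetical.

```java
/** Sketch: check that all application-server pools together fit the database's connection limit. */
class PoolSizing {
    /** Connections the application tier can open at peak: every server's pool fully grown. */
    static int peakConnections(int servers, int maxPerPool) {
        return servers * maxPerPool;
    }

    /**
     * True when min <= max and the fully grown pools, plus sessions reserved for
     * other clients (DBAs, batch jobs, monitoring), stay within the database limit.
     */
    static boolean fits(int servers, int minPerPool, int maxPerPool, int dbLimit, int reserved) {
        return minPerPool <= maxPerPool
                && peakConnections(servers, maxPerPool) + reserved <= dbLimit;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 4 servers, pool min 20 / max 50, DB allows 300 sessions, keep 50 spare.
        int servers = 4, min = 20, max = 50, dbLimit = 300, reserved = 50;
        System.out.println("peak connections: " + peakConnections(servers, max));
        System.out.println("fits DB limit:    " + fits(servers, min, max, dbLimit, reserved));
    }
}
```

With these inputs the peak is 4 × 50 = 200 connections, and 200 + 50 reserved = 250 stays under the limit of 300; raising the pool maximum to 70 would give 330 and fail the check.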
