APPARENT DEADLOCK!!! - C3P0连接池DeadLock机制分析

1 问题

近期,刚上线不久的生产系统的数据库连接池 C3P0版本为0.9.5.2)突然报出 APPARENT DEADLOCK!!! 错误。

1.1 错误日志

错误日志如下。

com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@7cf60134 -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!
com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@7cf60134 -- APPARENT DEADLOCK!!! Complete Status: 
    Managed Threads: 3
    Active Threads: 3
    Active Tasks: 
        com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask@3ce7f8aa
            on thread: C3P0PooledConnectionPoolManager[identityToken->1hgepsj9u1w3a4je54sly0|70165722]-HelperThread-#1
        com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask@17f3c7a1
            on thread: C3P0PooledConnectionPoolManager[identityToken->1hgepsj9u1w3a4je54sly0|70165722]-HelperThread-#2
        com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask@65dc473c
            on thread: C3P0PooledConnectionPoolManager[identityToken->1hgepsj9u1w3a4je54sly0|70165722]-HelperThread-#0
    Pending Tasks: 
        com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@744937e8
        com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@77e8b363
        com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@7eafacf0
        com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@79cde1be
        com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@44067f18
        com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@76c7bf50
Pool thread stack traces:
    Thread[C3P0PooledConnectionPoolManager[identityToken->1hgepsj9u1w3a4je54sly0|70165722]-HelperThread-#0,5,main]
        java.net.SocketInputStream.socketRead0(Native Method)
        java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        java.net.SocketInputStream.read(SocketInputStream.java:171)
        java.net.SocketInputStream.read(SocketInputStream.java:141)
        oracle.net.ns.Packet.receive(Unknown Source)
        oracle.net.ns.DataPacket.receive(Unknown Source)
        oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)
        oracle.net.ns.NetInputStream.read(Unknown Source)
        oracle.net.ns.NetInputStream.read(Unknown Source)
        oracle.net.ns.NetInputStream.read(Unknown Source)
        oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1099)
        oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1070)
        oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:478)
        oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:213)
        oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:796)
        oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1031)
        oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:836)
        oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1124)
        oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3285)
        oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3329)
        oracle.jdbc.OracleDatabaseMetaData.getTables(OracleDatabaseMetaData.java:2471)
        com.mchange.v2.c3p0.impl.DefaultConnectionTester$1.activeCheckConnectionNoQuery(DefaultConnectionTester.java:84)
        com.mchange.v2.c3p0.impl.DefaultConnectionTester$3.activeCheckConnectionNoQuery(DefaultConnectionTester.java:182)
        com.mchange.v2.c3p0.impl.DefaultConnectionTester.activeCheckConnection(DefaultConnectionTester.java:275)
        com.mchange.v2.c3p0.AbstractConnectionTester.activeCheckConnection(AbstractConnectionTester.java:79)
        com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:504)
        com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:464)
        com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.refurbishIdleResource(C3P0PooledConnectionPool.java:436)
        com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask.run(BasicResourcePool.java:2211)
        com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)
    Thread[C3P0PooledConnectionPoolManager[identityToken->1hgepsj9u1w3a4je54sly0|70165722]-HelperThread-#1,5,main]
        java.net.SocketInputStream.socketRead0(Native Method)
        java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        java.net.SocketInputStream.read(SocketInputStream.java:171)
        java.net.SocketInputStream.read(SocketInputStream.java:141)
        oracle.net.ns.Packet.receive(Unknown Source)
        oracle.net.ns.DataPacket.receive(Unknown Source)
        oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)
        oracle.net.ns.NetInputStream.read(Unknown Source)
        oracle.net.ns.NetInputStream.read(Unknown Source)
        oracle.net.ns.NetInputStream.read(Unknown Source)
        oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1099)
        oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1070)
        oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:478)
        oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:213)
        oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:796)
        oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1031)
        oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:836)
        oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1124)
        oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3285)
        oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3329)
        oracle.jdbc.OracleDatabaseMetaData.getTables(OracleDatabaseMetaData.java:2471)
        com.mchange.v2.c3p0.impl.DefaultConnectionTester$1.activeCheckConnectionNoQuery(DefaultConnectionTester.java:84)
        com.mchange.v2.c3p0.impl.DefaultConnectionTester$3.activeCheckConnectionNoQuery(DefaultConnectionTester.java:182)
        com.mchange.v2.c3p0.impl.DefaultConnectionTester.activeCheckConnection(DefaultConnectionTester.java:275)
        com.mchange.v2.c3p0.AbstractConnectionTester.activeCheckConnection(AbstractConnectionTester.java:79)
        com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:504)
        com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:464)
        com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.refurbishIdleResource(C3P0PooledConnectionPool.java:436)
        com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask.run(BasicResourcePool.java:2211)
        com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)
    Thread[C3P0PooledConnectionPoolManager[identityToken->1hgepsj9u1w3a4je54sly0|70165722]-HelperThread-#2,5,main]
        java.net.SocketInputStream.socketRead0(Native Method)
        java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        java.net.SocketInputStream.read(SocketInputStream.java:171)
        java.net.SocketInputStream.read(SocketInputStream.java:141)
        oracle.net.ns.Packet.receive(Unknown Source)
        oracle.net.ns.DataPacket.receive(Unknown Source)
        oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)
        oracle.net.ns.NetInputStream.read(Unknown Source)
        oracle.net.ns.NetInputStream.read(Unknown Source)
        oracle.net.ns.NetInputStream.read(Unknown Source)
        oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1099)
        oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1070)
        oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:478)
        oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:213)
        oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:796)
        oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1031)
        oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:836)
        oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1124)
        oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3285)
        oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3329)
        oracle.jdbc.OracleDatabaseMetaData.getTables(OracleDatabaseMetaData.java:2471)
        com.mchange.v2.c3p0.impl.DefaultConnectionTester$1.activeCheckConnectionNoQuery(DefaultConnectionTester.java:84)
        com.mchange.v2.c3p0.impl.DefaultConnectionTester$3.activeCheckConnectionNoQuery(DefaultConnectionTester.java:182)
        com.mchange.v2.c3p0.impl.DefaultConnectionTester.activeCheckConnection(DefaultConnectionTester.java:275)
        com.mchange.v2.c3p0.AbstractConnectionTester.activeCheckConnection(AbstractConnectionTester.java:79)
        com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:504)
        com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:464)
        com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.refurbishIdleResource(C3P0PooledConnectionPool.java:436)
        com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask.run(BasicResourcePool.java:2211)
        com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)

Task com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask@65dc473c (in deadlocked PoolThread) failed to complete in maximum time 60000ms. Trying interrupt().
Task com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask@3ce7f8aa (in deadlocked PoolThread) failed to complete in maximum time 60000ms. Trying interrupt().
Task com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask@17f3c7a1 (in deadlocked PoolThread) failed to complete in maximum time 60000ms. Trying interrupt().

上面的日志报出了 APPARENT DEADLOCK!!! 错误,然后输出了
ThreadPoolAsynchronousRunner 的状态,包括线程数,当前正在处理的任务,队列中等待处理的任务,以及 ThreadPoolAsynchronousRunner 任务线程的堆栈。

1.2 配置

生产上的配置如下。


        
        
        
        
        
        
        
        

基本相当于使用默认配置。

1.3 初步判断

根据日志,C3P0判断其线程池(ThreadPoolAsynchronousRunner)中的任务出现死锁(DEADLOCK),此时线程处于检查空闲数据链接状态是否可用。单从日志字面暂时无法判断问题所在。

查阅了网上的资料,对此问题原因分析较少,绝大部分不甚了了。比较多的文章是尝试通过调整参数来避免问题的出现的帖子。

秉着 “知其所以然” 的态度,再加上一点点的好奇心,我翻阅了一下C3P0的代码,试图从中发现端倪。

2 C3P0的死锁(DeadLock)机制分析

2.1 C3P0类结构

为了后面展示代码的需要,先附上一个我整理的C3P0的类结构图,有助于我们分析问题时确定问题的位置。如果下面的图太小,可以点击这里浏览

c3p0类结构图

我们重点关注右下角的几个类。

2.2 ThreadPoolAsynchronousRunner线程池

C3P0PooledConnectionPoolManager 类在初始化时中创建了一个taskRunner,其类型为 ThreadPoolAsynchronousRunner , 是一个 C3P0 自定实现的线程池。

public final class C3P0PooledConnectionPoolManager{
...
    ThreadPoolAsynchronousRunner taskRunner;
...
    private synchronized void _poolsInit()
    {
    ...
        this.taskRunner = createTaskRunner( 
                num_task_threads, matt, timer, idStr + "-HelperThread" );
    ...
    }
 ...
    private ThreadPoolAsynchronousRunner createTaskRunner( 
                int num_threads, 
                int matt /* maxAdministrativeTaskTime */, 
                Timer timer, 
                String threadLabel )
    {
    ...
        out = new ThreadPoolAsynchronousRunner( 
                      num_threads, true, timer, threadLabel );
        return out;
    ...
    }
...
}

ThreadPoolAsynchronousRunner 线程池被作为成员注入到多个对象之中,最终用于执行 BasicResourcePool 类中定义的 ScatteredAcquireTaskAcquireTaskRemoveTaskDestroyResourceTaskRefurbishCheckinResourceTaskAsyncTestIdleResourceTask 等任务。

我们来看一下 ThreadPoolAsynchronousRunner 的代码。

public final class ThreadPoolAsynchronousRunner implements AsynchronousRunner
{
...
    TimerTask deadlockDetector = new DeadlockDetector();
...
    public ThreadPoolAsynchronousRunner( 
        int num_threads, boolean daemon, 
        Timer sharedTimer, String threadLabel )
    { 
        this( num_threads, 
          daemon, 
          DFLT_MAX_INDIVIDUAL_TASK_TIME, 
          DFLT_DEADLOCK_DETECTOR_INTERVAL, 
          DFLT_INTERRUPT_DELAY_AFTER_APPARENT_DEADLOCK, 
          sharedTimer, 
          false,
          threadLabel ); 
    }
...
    private ThreadPoolAsynchronousRunner( int num_threads, 
                      boolean daemon, 
                      int max_individual_task_time,
                      int deadlock_detector_interval, 
                      int interrupt_delay_after_apparent_deadlock,
                      Timer myTimer,
                      boolean should_cancel_timer,
                      String threadLabel )
    {
        this.num_threads = num_threads;
        this.daemon = daemon;
        this.max_individual_task_time = max_individual_task_time;
        this.deadlock_detector_interval = 
                deadlock_detector_interval;
        this.interrupt_delay_after_apparent_deadlock = 
                interrupt_delay_after_apparent_deadlock;
        this.myTimer = myTimer;
        this.should_cancel_timer = should_cancel_timer;
        this.threadLabel = threadLabel;

        recreateThreadsAndTasks();

        myTimer.schedule( deadlockDetector, 
                deadlock_detector_interval, 
                deadlock_detector_interval );
    }
...

}

可以看到, ThreadPoolAsynchronousRunner 线程池默认创建3个执行线程,并且创建了一个 死锁检测(DeadLockDetector) 的定时任务,默认情况下每10秒执行一次。

2.3 DeadLockDetector内部机制

public final class ThreadPoolAsynchronousRunner implements AsynchronousRunner
{
...
    class DeadlockDetector extends TimerTask
    {
        LinkedList last = null;
        LinkedList current = null;

        public void run()
        {
        ...
            current = (LinkedList) pendingTasks.clone();
        ...
            if ( current.equals( last ) )
            {
                //System.err.println(this + " -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!");
                ...
                recreateThreadsAndTasks();
                run_stray_tasks = true;
            }
                ...
                if (run_stray_tasks)
            {
                AsynchronousRunner ar = new ThreadPerTaskAsynchronousRunner( DFLT_MAX_EMERGENCY_THREADS, max_individual_task_time );
                for ( Iterator ii = current.iterator(); ii.hasNext(); )
                    ar.postRunnable( (Runnable) ii.next() );
                ar.close( false ); //tell the emergency runner to close itself when its tasks are complete
                last = null;
            }
            else
                last = current;

...

可以看到,DeadLockDetector
ThreadPoolAsynchronousRunner 的内部类,是一个默认每10秒执行一次的定时任务。其工作原理是定时检查任务队列(pendingTasks)中的任务,如果前后两次检查发现待执行的任务没有变化,就认为可能产生了死锁,并额外创建线程执行等待任务。

3 问题分析

3.1 线程等待

从代码中可以看出,APPARENT DEADLOCK!!!报错原因主要是ThreadPoolAsynchronousRunner线程池的中的等待任务队列在经过一定时长(默认10秒)前后没有变化。

结合问题日志。我们可以初步判断问题的原因,是由于某些任务(如日志中显示AsyncTestIdleResourceTask)执行过程中失去响应(超过60s没有收到响应),导致后续的任务长时间等待,进而报错。

3.2 具体分析

进而观察执行线程的堆栈,可以发现三个任务都在执行检测连接有效性时,向服务端发送请求数据后,Oracle数据库未返回响应,导致线程长时间等待。

由于此问题无法通过构造场景重现,只能猜测,产生这种原因可能是连接在服务端已经超时,由服务端主动关闭连接,但由于TCP协议断开连接需要完成4次握手,而服务端只完成了前两次握手,这样就导致了客户端可以发送消息但接收不到任何内容,长时间等待直到超时。

3.3 解决方案

根据以上分析,我认为问题是maxIdleTime(连接最大空闲时间)与idleConnectionTestPeriod(空闲检测周期)参数配置与数据库不匹配引起的。

根据DBA的建议,将maxIdleTime调整为30秒,将idleConnectionTestPeriod调整为10秒。

4 后续

调整完后系统稳定运行,至今未再报出同样的错误。

在查看资料的过程中,发现有与我类似的情况,请看这里。问题出在同一块代码,但现象与原因均与我不同。我并不认为这是C3P0的缺陷,而是具体环境上的配置没有相匹配。

我再说一点废话,C3P0中大量使用了synchronized关键字进行加锁,由于单例没有使用double-check,导致正常代码中大量加锁,在高并发下效率很低。因此,如果性能对系统较为重要,还是推荐大家使用HikariCP替换C3P0。

你可能感兴趣的:(APPARENT DEADLOCK!!! - C3P0连接池DeadLock机制分析)