ThreadPoolExecutor是Executor执行框架最重要的一个实现类,提供了线程池管理和任务管理是两个最基本的能力。这篇通过分析ThreadPoolExecutor的源码来看看如何设计和实现一个基于生产者消费者模型的执行器。
生产者消费者模型
生产者消费者模型包含三个角色:生产者,工作队列,消费者。对于ThreadPoolExecutor来说,
1. 生产者是任务的提交者,是外部调用ThreadPoolExecutor的线程
2. 工作队列是一个阻塞队列的接口,具体的实现类可以有很多种。BlockingQueue<Runnable> workQueue;
3. 消费者是封装了线程的Worker类的集合。HashSet<Worker> workers = new HashSet<Worker>();
主要属性
明确了ThreadPoolExecutor的基本执行模型之后,来看下它的几个主要属性:
1. private final AtomicInteger ctl = new AtomicInteger(ctlOf(RUNNING, 0)); 一个32位的原子整形作为线程池的状态控制描述符。低29位作为工作者线程的数量。所以工作者线程最多有2^29 -1个。高3位来保持线程池的状态。ThreadPoolExecutor总共有5种状态:
* RUNNING: 可以接受新任务并执行
* SHUTDOWN: 不再接受新任务,但是仍然执行工作队列中的任务
* STOP: 不再接受新任务,不执行工作队列中的任务,并且中断正在执行的任务
* TIDYING: 所有任务被终止,工作线程的数量为0,会去执行terminated()钩子方法
* TERMINATED: terminated()执行结束
下面是一系列ctl这个变量定义和工具方法
private final AtomicInteger ctl = new AtomicInteger(ctlOf(RUNNING, 0)); private static final int COUNT_BITS = Integer.SIZE - 3; private static final int CAPACITY = (1 << COUNT_BITS) - 1; // runState is stored in the high-order bits private static final int RUNNING = -1 << COUNT_BITS; private static final int SHUTDOWN = 0 << COUNT_BITS; private static final int STOP = 1 << COUNT_BITS; private static final int TIDYING = 2 << COUNT_BITS; private static final int TERMINATED = 3 << COUNT_BITS; // Packing and unpacking ctl private static int runStateOf(int c) { return c & ~CAPACITY; } private static int workerCountOf(int c) { return c & CAPACITY; } private static int ctlOf(int rs, int wc) { return rs | wc; } private static boolean runStateLessThan(int c, int s) { return c < s; } private static boolean runStateAtLeast(int c, int s) { return c >= s; } private static boolean isRunning(int c) { return c < SHUTDOWN; } private boolean compareAndIncrementWorkerCount(int expect) { return ctl.compareAndSet(expect, expect + 1); } private boolean compareAndDecrementWorkerCount(int expect) { return ctl.compareAndSet(expect, expect - 1); } private void decrementWorkerCount() { do {} while (! compareAndDecrementWorkerCount(ctl.get())); }
3. private final ReentrantLock mainLock = new ReentrantLock(); 控制ThreadPoolExecutor的全局可重入锁,所有需要同步的操作都要被这个锁保护
4. private final Condition termination = mainLock.newCondition(); mainLock的条件队列,来进行wait()和notify()等条件操作
5. private final HashSet<Worker> workers = new HashSet<Worker>(); 工作线程集合
6. private volatile ThreadFactory threadFactory; 创建线程的工厂,可以自定义线程创建的逻辑
7. private volatile RejectedExecutionHandler handler; 拒绝执行任务的处理器,可以自定义拒绝的策略
8. private volatile long keepAliveTime; 空闲线程的存活时间。可以根据这个存活时间来判断空闲线程是否等待超时,然后采取相应的线程回收操作
9. private volatile boolean allowCoreThreadTimeOut; 是否允许coreThread线程超时回收
10. private volatile int corePoolSize; 可存活的线程的最小值。如果设置了allowCoreThreadTimeOut, 那么corePoolSize的值可以为0。
11. private volatile int maximumPoolSize; 可存活的线程的最大值
工作线程创建和回收策略
ThreadPoolExecutor通过corePoolSize,maximumPoolSize, allowCoreThreadTimeOut,keepAliveTime等几个参数提供一个灵活的工作线程创建和回收的策略。
创建策略:
1. 当工作线程数量小于corePoolSize时,不管其他线程是否空闲,都创建新的工作线程来处理新加入的任务
2. 当工作线程数量大于corePoolSize,小于maximumPoolSize时,只有当工作队列满了,才会创建新的工作线程来处理新加入的任务。当工作队列有空余时,只把新任务加入队列
3. 把corePoolSize和maximumPoolSize 设置成相同的值时,线程池就是一个固定(fixed)工作线程数的线程。
回收策略:
1. keepAliveTime变量设置了空闲工作线程超时的时间,当工作线程数量超过了corePoolSize后,空闲的工作线程等待超过了keepAliveTime后,会被回收。后面会说怎么确定一个工作线程是否“空闲”。
2. 如果设置了allowCoreThreadTimeOut,那么core Thread也可以被回收,即当core thread也空闲时,也可以被回收,直到工作线程集合为0。
工作队列策略
工作队列BlockingQueue<Runnable> workQueue 是用来存放提交的任务的。它有4个基本的策略,并且根据不同的阻塞队列的实现类可以引入更多的工作队列的策略。
4个基本策略:
1. 当工作线程数量小于corePoolSize时,新提交的任务总是会由新创建的工作线程执行,不入队列
2. 当工作线程数量大于corePoolSize,如果工作队列没满,新提交的任务就入队列
3. 当工作线程数量大于corePoolSize,小于MaximumPoolSize时,如果工作队列满了,新提交的任务就交给新创建的工作线程,不入队列
4. 当工作线程数量大于MaximumPoolSize,并且工作队列满了,那么新提交的任务会被拒绝执行。具体看采用何种拒绝策略
根据不同的阻塞队列的实现类,又有几种额外的策略
1. 采用SynchronousQueue直接将任务传递给空闲的线程执行,不额外存储任务。这种方式需要无限制的MaximumPoolSize,可以创建无限制的工作线程来处理提交的任务。这种方式的好处是任务可以很快被执行,适用于任务到达时间大于任务处理时间的情况。缺点是当任务量很大时,会占用大量线程
2. 采用无边界的工作队列LinkedBlockingQueue。这种情况下,由于工作队列永远不会满,那么工作线程的数量最大就是corePoolSize,因为当工作线程数量达到corePoolSize时,只有工作队列满的时候才会创建新的工作线程。这种方式好处是使用的线程数量是稳定的,当内存足够大时,可以处理足够多的请求。缺点是如果任务直接有依赖,很有可能形成死锁,因为当工作线程被消耗完时,不会创建新的工作现场,只会把任务加入工作队列。并且可能由于内存耗尽引发内存溢出OOM
3. 采用有界的工作队列AraayBlockingQueue。这种情况下对于内存资源是可控的,但是需要合理调节MaximumPoolSize和工作队列的长度,这两个值是相互影响的。当工作队列长度比较小的时,必定会创建更多的线程。而更多的线程会引起上下文切换等额外的消耗。当工作队列大,MaximumPoolSize小的时候,会影响吞吐量,并且会触发拒绝机制。
拒绝执行策略
当Executor处于shutdown状态或者工作线程超过MaximumPoolSize并且工作队列满了之后,新提交的任务将会被拒绝执行。RejectedExecutionHandler接口定义了拒绝执行的策略。具体的策略有
CallerRunsPolicy:由调用者线程来执行被拒绝的任务,属于同步执行
AbortPolicy:中止执行,抛出RejectedExecutionException异常
DiscardPolicy:丢弃任务
DiscardOldestPolicy:丢弃最老的任务
public static class CallerRunsPolicy implements RejectedExecutionHandler { /** * Creates a {@code CallerRunsPolicy}. */ public CallerRunsPolicy() { } /** * Executes task r in the caller's thread, unless the executor * has been shut down, in which case the task is discarded. * * @param r the runnable task requested to be executed * @param e the executor attempting to execute this task */ public void rejectedExecution(Runnable r, ThreadPoolExecutor e) { if (!e.isShutdown()) { r.run(); } } } /** * A handler for rejected tasks that throws a * {@code RejectedExecutionException}. */ public static class AbortPolicy implements RejectedExecutionHandler { /** * Creates an {@code AbortPolicy}. */ public AbortPolicy() { } /** * Always throws RejectedExecutionException. * * @param r the runnable task requested to be executed * @param e the executor attempting to execute this task * @throws RejectedExecutionException always. */ public void rejectedExecution(Runnable r, ThreadPoolExecutor e) { throw new RejectedExecutionException("Task " + r.toString() + " rejected from " + e.toString()); } } /** * A handler for rejected tasks that silently discards the * rejected task. */ public static class DiscardPolicy implements RejectedExecutionHandler { /** * Creates a {@code DiscardPolicy}. */ public DiscardPolicy() { } /** * Does nothing, which has the effect of discarding task r. * * @param r the runnable task requested to be executed * @param e the executor attempting to execute this task */ public void rejectedExecution(Runnable r, ThreadPoolExecutor e) { } } /** * A handler for rejected tasks that discards the oldest unhandled * request and then retries {@code execute}, unless the executor * is shut down, in which case the task is discarded. */ public static class DiscardOldestPolicy implements RejectedExecutionHandler { /** * Creates a {@code DiscardOldestPolicy} for the given executor. */ public DiscardOldestPolicy() { } /** * Obtains and ignores the next task that the executor * would otherwise execute, if one is immediately available, * and then retries execution of task r, unless the executor * is shut down, in which case task r is instead discarded. * * @param r the runnable task requested to be executed * @param e the executor attempting to execute this task */ public void rejectedExecution(Runnable r, ThreadPoolExecutor e) { if (!e.isShutdown()) { e.getQueue().poll(); e.execute(r); } } }
工作线程Worker的设计
工作线程没有直接使用Thread,而是采用了Worker类封装了Thread,目的是更好地进行中断控制。Worker直接继承了AbstractQueuedSynchronizer来进行同步操作,它实现了一个不可重入的互斥结构。当它的state属性为0时表示unlock,state为1时表示lock。任务执行时必须在lock状态的保护下,防止出现同步问题。因此当Worker处于lock状态时,表示它正在运行,当它处于unlock状态时,表示它“空闲”。当它空闲超过keepAliveTime时,就有可能被回收。
Worker还实现了Runnable接口, 执行它的线程是Worker包含的Thread对象,在Worker的构造函数可以看到Thread创建时,把Worker对象传递给了它。
private final class Worker extends AbstractQueuedSynchronizer implements Runnable { /** Thread this worker is running in. Null if factory fails. */ final Thread thread; /** Initial task to run. Possibly null. */ Runnable firstTask; /** Per-thread task counter */ volatile long completedTasks; Worker(Runnable firstTask) { setState(-1); // inhibit interrupts until runWorker this.firstTask = firstTask; // 把Worker对象作为Runnable的实例传递给了新创建Thread对象 this.thread = getThreadFactory().newThread(this); } public void run() { runWorker(this); } // Lock methods // // The value 0 represents the unlocked state. // The value 1 represents the locked state. protected boolean isHeldExclusively() { return getState() != 0; } protected boolean tryAcquire(int unused) { if (compareAndSetState(0, 1)) { setExclusiveOwnerThread(Thread.currentThread()); return true; } return false; } protected boolean tryRelease(int unused) { setExclusiveOwnerThread(null); setState(0); return true; } public void lock() { acquire(1); } public boolean tryLock() { return tryAcquire(1); } public void unlock() { release(1); } public boolean isLocked() { return isHeldExclusively(); } void interruptIfStarted() { Thread t; if (getState() >= 0 && (t = thread) != null && !t.isInterrupted()) { try { t.interrupt(); } catch (SecurityException ignore) { } } } }
1. wt指向当前执行Worker的run方法的线程,也就是指向了Worker包含的工作线程对象
2. task指向Worker包含的firstTask对象,表示当前要执行的任务
3. 当task不为null或者从工作队列中取到了新任务,那么先加锁w.lock表示正在运行任务。在真正开始执行task.run()之前,先判断线程池的状态是否已经STOP,如果是,就中断Worker的线程。
4. 一旦判断当前线程不是STOP并且工作线程没有中断。那么就开始执行task.run()了。Worker的interruptIfStarted方法可以中断这个Worker的线程,从而中断正在执行任务。
5. beforeExecute(wt, task)和afterExecute(wt,task)是两个钩子方法,支持在任务真正开始执行前就行扩展。
final void runWorker(Worker w) { Thread wt = Thread.currentThread(); Runnable task = w.firstTask; w.firstTask = null; w.unlock(); // allow interrupts boolean completedAbruptly = true; try { while (task != null || (task = getTask()) != null) { w.lock(); // If pool is stopping, ensure thread is interrupted; // if not, ensure thread is not interrupted. This // requires a recheck in second case to deal with // shutdownNow race while clearing interrupt if ((runStateAtLeast(ctl.get(), STOP) || (Thread.interrupted() && runStateAtLeast(ctl.get(), STOP))) && !wt.isInterrupted()) wt.interrupt(); try { beforeExecute(wt, task); Throwable thrown = null; try { task.run(); } catch (RuntimeException x) { thrown = x; throw x; } catch (Error x) { thrown = x; throw x; } catch (Throwable x) { thrown = x; throw new Error(x); } finally { afterExecute(task, thrown); } } finally { task = null; w.completedTasks++; w.unlock(); } } completedAbruptly = false; } finally { processWorkerExit(w, completedAbruptly); } }
首先看一下ThreadPoolExecutor的execute方法,这个方式是任务提交的入口。可以看到它的逻辑符合之前说的工作线程创建的基本策略
1. 当工作线程数量小于corePoolSize时,通过addWorker(command,true)来新建工作线程处理新建的任务,不入工作队列
2. 当工作线程数量大于等于corePoolSize时,先入队列,使用的是BlockingQueue的offer方法。当工作线程数量为0时,还会通过addWorker(null, false)添加一个新的工作线程
3. 当工作队列满了并且工作线程数量在corePoolSize和MaximumPoolSize之间,就创建新的工作线程去执行新添加的任务。当工作线程数量超过了MaximumPoolSize,就拒绝任务。
public void execute(Runnable command) { if (command == null) throw new NullPointerException(); int c = ctl.get(); if (workerCountOf(c) < corePoolSize) { if (addWorker(command, true)) return; c = ctl.get(); } if (isRunning(c) && workQueue.offer(command)) { int recheck = ctl.get(); if (! isRunning(recheck) && remove(command)) reject(command); else if (workerCountOf(recheck) == 0) addWorker(null, false); } else if (!addWorker(command, false)) reject(command); }
1. retry这个循环判断线程池的状态和当前工作线程数量的边界。如果允许创建工作现场,首先修改ctl变量表示的工作线程的数量
2. 把工作线程添加到workers集合中的操作要在mainLock这个锁的保护下进行。所有和ThreadPoolExecutor状态相关的操作都要在mainLock锁的保护下进行
3. w = new Worker(firstTask); 创建Worker实例,把firstTask作为它当前的任务。firstTask为null时表示先只创建Worker线程,然后去工作队列中取任务执行
4. 把新创建的Worker实例加入到workers集合,修改相关统计变量。
5. 当加入集合成功后,开始启动这个Worker实例。启动的方法是调用Worker封装的Thread的start()方法。之前说了,这个Thread对应的Runnable是Worker本身,会去调用Worker的run方法,然后调用ThreadPoolExecutor的runWorker方法。在runWorker方法中真正去执行任务。
private boolean addWorker(Runnable firstTask, boolean core) { retry: for (;;) { int c = ctl.get(); int rs = runStateOf(c); // Check if queue empty only if necessary. if (rs >= SHUTDOWN && ! (rs == SHUTDOWN && firstTask == null && ! workQueue.isEmpty())) return false; for (;;) { int wc = workerCountOf(c); if (wc >= CAPACITY || wc >= (core ? corePoolSize : maximumPoolSize)) return false; if (compareAndIncrementWorkerCount(c)) break retry; c = ctl.get(); // Re-read ctl if (runStateOf(c) != rs) continue retry; // else CAS failed due to workerCount change; retry inner loop } } boolean workerStarted = false; boolean workerAdded = false; Worker w = null; try { final ReentrantLock mainLock = this.mainLock; w = new Worker(firstTask); final Thread t = w.thread; if (t != null) { mainLock.lock(); try { // Recheck while holding lock. // Back out on ThreadFactory failure or if // shut down before lock acquired. int c = ctl.get(); int rs = runStateOf(c); if (rs < SHUTDOWN || (rs == SHUTDOWN && firstTask == null)) { if (t.isAlive()) // precheck that t is startable throw new IllegalThreadStateException(); workers.add(w); int s = workers.size(); if (s > largestPoolSize) largestPoolSize = s; workerAdded = true; } } finally { mainLock.unlock(); } if (workerAdded) { t.start(); workerStarted = true; } } } finally { if (! workerStarted) addWorkerFailed(w); } return workerStarted; }
1. 结合runWorker方法看,如果Worker执行task.run()的时候抛出了异常,那么completedAbruptly为true,需要从workers集合中把这个工作线程移除掉。
2. 如果是completedAbruptly为true,并且线程池不是STOP状态,那么就创建一个新的Worker工作线程
3. 如果是completedAbruptly为false,并且线程池不是STOP状态,首先检查是否allowCoreThreadTimeout,如果运行,那么最少线程数可以为0,否则是corePoolSize。如果最少线程数为0,并且工作队列不为空,那么最小值为1。最后检查当前的工作线程数量,如果小于最小值,就创建新的工作线程。
private void processWorkerExit(Worker w, boolean completedAbruptly) { if (completedAbruptly) // If abrupt, then workerCount wasn't adjusted decrementWorkerCount(); final ReentrantLock mainLock = this.mainLock; mainLock.lock(); try { completedTaskCount += w.completedTasks; workers.remove(w); } finally { mainLock.unlock(); } tryTerminate(); int c = ctl.get(); if (runStateLessThan(c, STOP)) { if (!completedAbruptly) { int min = allowCoreThreadTimeOut ? 0 : corePoolSize; if (min == 0 && ! workQueue.isEmpty()) min = 1; if (workerCountOf(c) >= min) return; // replacement not needed } addWorker(null, false); } }
工作线程从工作队列中取任务的代码在getTask方法中
1. timed变量表示是否要计时,当计时超过keepAliveTime后还没取到任务,就返回null。结合runWorker方法可以知道,当getTask返回null时,该Worker线程会被回收,这就是如何回收空闲工作线程的方法。
timed变量当allowCoreThreadTimeout为true或者当工作线程数大于corePoolSize时为true。
2. 如果timed为true,就用BlockingQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS)方法来计时从队头取任务,否则直接用take()方法从队头取任务
private Runnable getTask() { boolean timedOut = false; // Did the last poll() time out? retry: for (;;) { int c = ctl.get(); int rs = runStateOf(c); // Check if queue empty only if necessary. if (rs >= SHUTDOWN && (rs >= STOP || workQueue.isEmpty())) { decrementWorkerCount(); return null; } boolean timed; // Are workers subject to culling? for (;;) { int wc = workerCountOf(c); timed = allowCoreThreadTimeOut || wc > corePoolSize; if (wc <= maximumPoolSize && ! (timedOut && timed)) break; if (compareAndDecrementWorkerCount(c)) return null; c = ctl.get(); // Re-read ctl if (runStateOf(c) != rs) continue retry; // else CAS failed due to workerCount change; retry inner loop } try { Runnable r = timed ? workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) : workQueue.take(); if (r != null) return r; timedOut = true; } catch (InterruptedException retry) { timedOut = false; } } }
线程池有SHUTDOWN, STOP, TIDYING, TERMINATED这几个状态和线程池关闭相关。通常我们把关闭分为优雅的关闭和强制立刻关闭。
所谓优雅的关闭就是调用shutdown()方法,线程池进入SHUTDOWN状态,不在接收新的任务,会把工作队列的任务执行完毕后再结束。
强制立刻关闭就是调用shutdownNow()方法,线程池直接进入STOP状态,会中断正在执行的工作线程,清空工作队列。
1. 在shutdown方法中,先设置线程池状态为SHUTDOWN,然后先去中断空闲的工作线程,再调用onShutdown钩子方法。最后tryTerminate()
2. 在shutdownNow方法中,先设置线程池状态为STOP,然后先中断所有的工作线程,再清空工作队列。最后tryTerminate()。这个方法会把工作队列中的任务返回给调用者处理。
public void shutdown() { final ReentrantLock mainLock = this.mainLock; mainLock.lock(); try { checkShutdownAccess(); advanceRunState(SHUTDOWN); interruptIdleWorkers(); onShutdown(); // hook for ScheduledThreadPoolExecutor } finally { mainLock.unlock(); } tryTerminate(); } public List<Runnable> shutdownNow() { List<Runnable> tasks; final ReentrantLock mainLock = this.mainLock; mainLock.lock(); try { checkShutdownAccess(); advanceRunState(STOP); interruptWorkers(); tasks = drainQueue(); } finally { mainLock.unlock(); } tryTerminate(); return tasks; }
而interruptWorkers方法直接去中断所有的Worker,调用Worker.interruptIfStarted()方法
private void interruptIdleWorkers(boolean onlyOne) { final ReentrantLock mainLock = this.mainLock; mainLock.lock(); try { for (Worker w : workers) { Thread t = w.thread; if (!t.isInterrupted() && w.tryLock()) { try { t.interrupt(); } catch (SecurityException ignore) { } finally { w.unlock(); } } if (onlyOne) break; } } finally { mainLock.unlock(); } } private void interruptWorkers() { final ReentrantLock mainLock = this.mainLock; mainLock.lock(); try { for (Worker w : workers) w.interruptIfStarted(); } finally { mainLock.unlock(); } } void interruptIfStarted() { Thread t; if (getState() >= 0 && (t = thread) != null && !t.isInterrupted()) { try { t.interrupt(); } catch (SecurityException ignore) { } } }
final void tryTerminate() { for (;;) { int c = ctl.get(); if (isRunning(c) || runStateAtLeast(c, TIDYING) || (runStateOf(c) == SHUTDOWN && ! workQueue.isEmpty())) return; if (workerCountOf(c) != 0) { // Eligible to terminate interruptIdleWorkers(ONLY_ONE); return; } final ReentrantLock mainLock = this.mainLock; mainLock.lock(); try { if (ctl.compareAndSet(c, ctlOf(TIDYING, 0))) { try { terminated(); } finally { ctl.set(ctlOf(TERMINATED, 0)); termination.signalAll(); } return; } } finally { mainLock.unlock(); } // else retry on failed CAS } }