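In day-to-day mobile development, handling tasks concurrently is unavoidable. As the mobile internet has taken off, users increasingly expect a polished, high-quality experience, to the point that Google forbids network requests on Android's UI thread.
Fortunately, thanks to Android's strong ecosystem, developers have become fluent with asynchronous tools such as AsyncTask, Executors, and RxJava.
The good times did not last. In recent years Alibaba published its own Java development manual, and much of the industry followed suit: creating raw Threads is discouraged, and even the four thread-pool factory methods provided by Executors are restricted. That is the rule shown in the image below:
Why does Alibaba disallow building thread pools with the factory methods the platform provides? The answer lies in how a thread pool executes tasks.
Let's first walk through that execution logic with a diagram. The four thread-pool factory methods in Executors all end up constructing a ThreadPoolExecutor (or a subclass of it) and run tasks through its execute method, so we analyze the execution logic of ThreadPoolExecutor directly.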
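I. Execution Strategy
As the diagram shows, when execute is called the pool has three options: reject the task, run it directly, or buffer it for later. Which one it picks depends on a set of conditions, so let's look at the source to see what they are.
The official documentation of execute is quite detailed: the submitted task will run at some point in the future, either in a new thread or in an existing pooled thread. If the task cannot be accepted, because the executor has been shut down or because its capacity has been reached, it is handed to the current RejectedExecutionHandler. The submitted task is the Runnable argument passed to execute: command.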
/**
* Executes the given task sometime in the future. The task
* may execute in a new thread or in an existing pooled thread.
*
* If the task cannot be submitted for execution, either because this
* executor has been shutdown or because its capacity has been reached,
* the task is handled by the current {@code RejectedExecutionHandler}.
*
* @param command the task to execute
* @throws RejectedExecutionException at discretion of
* {@code RejectedExecutionHandler}, if the task
* cannot be accepted for execution
* @throws NullPointerException if {@code command} is null
*/
public void execute(Runnable command) {
    if (command == null)
        throw new NullPointerException();
    /*
     * Proceed in 3 steps:
     *
     * 1. If fewer than corePoolSize threads are running, try to
     * start a new thread with the given command as its first
     * task. The call to addWorker atomically checks runState and
     * workerCount, and so prevents false alarms that would add
     * threads when it shouldn't, by returning false.
     *
     * 2. If a task can be successfully queued, then we still need
     * to double-check whether we should have added a thread
     * (because existing ones died since last checking) or that
     * the pool shut down since entry into this method. So we
     * recheck state and if necessary roll back the enqueuing if
     * stopped, or start a new thread if there are none.
     *
     * 3. If we cannot queue task, then we try to add a new
     * thread. If it fails, we know we are shut down or saturated
     * and so reject the task.
     */
    int c = ctl.get();
    if (workerCountOf(c) < corePoolSize) { // code 1
        if (addWorker(command, true))
            return;
        c = ctl.get();
    }
    if (isRunning(c) && workQueue.offer(command)) { // code 2
        int recheck = ctl.get();
        if (! isRunning(recheck) && remove(command)) // code 2.1
            reject(command);
        else if (workerCountOf(recheck) == 0) // code 2.2
            addWorker(null, false);
    }
    else if (!addWorker(command, false)) // code 3
        reject(command);
}
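To make this easier to follow, the comment inside the method breaks the logic into three steps:
Step 1: If fewer than corePoolSize threads are running, try to start a new thread with command as its first task. addWorker checks runState and workerCount atomically and returns false when a thread should not be added. This is code 1.
Step 2: If the task was queued successfully, a second check is still needed: (2.1) the pool may have shut down since we entered the method, and (2.2) a thread that existed at the last check may have died, so we may need to add one after all. Depending on the recheck, if the pool is no longer running the enqueue is rolled back; if there are no worker threads left, a new worker is started so that it can pull the queued task.
This is code 2. The rollback at code 2.1 removes the task from the queue and applies the rejection policy. Code 2.2 really does start a new worker thread through addWorker, with no first task.
Step 3: If the task cannot be queued (typically because the queue is full), try to start a new worker thread through addWorker. If that fails, the pool is either shut down or saturated, so the task is rejected. This is code 3.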
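The three steps can be observed from the outside with a deliberately tiny pool. The sketch below is only an illustration: the sizing (1 core thread, 2 maximum, a queue of 1) and the class name ExecutePathsDemo are assumptions chosen so that four blocking tasks hit each branch in turn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExecutePathsDemo {
    /** Submits four blocking tasks and records which path execute() took for each. */
    static List<String> run() throws InterruptedException {
        // Hypothetical sizing: 1 core thread, 2 max threads, queue capacity 1.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 0L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await();               // keep the worker busy until released
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        List<String> log = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            try {
                pool.execute(blocker);
                log.add("task" + i + ": threads=" + pool.getPoolSize()
                        + " queued=" + pool.getQueue().size());
            } catch (RejectedExecutionException e) {
                log.add("task" + i + ": rejected");   // default AbortPolicy throws
            }
        }
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        run().forEach(System.out::println);
    }
}
```

Task 1 starts a core thread (step 1), task 2 is queued (step 2), task 3 finds the queue full and starts a non-core thread (step 3), and task 4 is rejected because the pool is saturated.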
The overall execution strategy is summarized in the flowchart below:
II. Task Handling Logic
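The steps above keep mentioning addWorker; its boolean return value decides whether the rejection policy runs. Here is what the source comment says:
Translation of the comment: the method checks whether a new worker [execution thread] can be added given the current pool state and the given bound (core or maximum pool size). If so, the worker count is adjusted accordingly and, if possible, a new worker is created and started with firstTask as its first task. The method returns false if the pool is stopped or eligible to shut down, and also if the thread factory fails to create a thread when asked. If thread creation fails, either because the factory returns null or because of an exception (typically an OutOfMemoryError in Thread.start()), the method rolls back cleanly.
firstTask is the command passed to execute. When there are fewer than corePoolSize threads, or when the queue is full and there are fewer than maximumPoolSize threads, addWorker creates a new worker with it as the first task, bypassing the queue.
Workers that start out idle are usually created through prestartCoreThread, or to replace other dying workers.
firstTask can also be null. For example, if execute finds that the worker count is 0 after it has put command on the blocking queue, it starts a worker with no first task. Without at least one worker, nothing would take the queued task off the queue.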
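Both situations in which a worker starts without a first task can be triggered through the public API. The following is a minimal sketch; the class name NullFirstTaskDemo and the pool parameters are assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class NullFirstTaskDemo {
    /** prestartCoreThread() creates an idle worker before any task exists. */
    static int poolSizeAfterPrestart() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        try {
            pool.prestartCoreThread();      // the worker starts with no first task
            return pool.getPoolSize();
        } finally {
            pool.shutdown();
        }
    }

    /** With corePoolSize 0 the task is queued first; a worker with no first task then drains it. */
    static boolean zeroCorePoolStillRuns() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, 1, 1L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch ran = new CountDownLatch(1);
        try {
            pool.execute(ran::countDown);
            return ran.await(5, TimeUnit.SECONDS);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pool size after prestart: " + poolSizeAfterPrestart());
        System.out.println("zero-core pool ran the task: " + zeroCorePoolStillRuns());
    }
}
```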
/**
* Checks if a new worker can be added with respect to current
* pool state and the given bound (either core or maximum). If so,
* the worker count is adjusted accordingly, and, if possible, a
* new worker is created and started, running firstTask as its
* first task. This method returns false if the pool is stopped or
* eligible to shut down. It also returns false if the thread
* factory fails to create a thread when asked. If the thread
* creation fails, either due to the thread factory returning
* null, or due to an exception (typically OutOfMemoryError in
* Thread.start()), we roll back cleanly.
*
* @param firstTask the task the new thread should run first (or
* null if none). Workers are created with an initial first task
* (in method execute()) to bypass queuing when there are fewer
* than corePoolSize threads (in which case we always start one),
* or when the queue is full (in which case we must bypass queue).
* Initially idle threads are usually created via
* prestartCoreThread or to replace other dying workers.
*
* @param core if true use corePoolSize as bound, else
* maximumPoolSize. (A boolean indicator is used here rather than a
* value to ensure reads of fresh values after checking other pool
* state).
* @return true if successful
*/
private boolean addWorker(Runnable firstTask, boolean core) {
    retry: // label: `break retry` exits both loops
    for (;;) {
        int c = ctl.get();
        int rs = runStateOf(c);
        // Check if queue empty only if necessary.
        if (rs >= SHUTDOWN &&
            ! (rs == SHUTDOWN &&
               firstTask == null &&
               ! workQueue.isEmpty()))
            return false;
        for (;;) {
            int wc = workerCountOf(c);
            if (wc >= CAPACITY ||
                wc >= (core ? corePoolSize : maximumPoolSize))
                return false;
            if (compareAndIncrementWorkerCount(c))
                break retry; // exit both loops
            c = ctl.get(); // Re-read ctl
            if (runStateOf(c) != rs)
                continue retry; // run state changed: restart the outer loop
            // else CAS failed due to workerCount change; retry inner loop
        }
    }
    // code 1
    boolean workerStarted = false;
    boolean workerAdded = false;
    Worker w = null;
    try { // code 2
        w = new Worker(firstTask);
        final Thread t = w.thread;
        if (t != null) {
            // code 3
            final ReentrantLock mainLock = this.mainLock;
            mainLock.lock();
            try {
                // Recheck while holding lock.
                // Back out on ThreadFactory failure or if
                // shut down before lock acquired.
                int rs = runStateOf(ctl.get());
                // code 4
                if (rs < SHUTDOWN ||
                    (rs == SHUTDOWN && firstTask == null)) {
                    if (t.isAlive()) // precheck that t is startable
                        throw new IllegalThreadStateException();
                    workers.add(w);
                    int s = workers.size();
                    if (s > largestPoolSize)
                        largestPoolSize = s;
                    workerAdded = true;
                }
            } finally {
                mainLock.unlock();
            }
            if (workerAdded) { // code 5
                t.start();
                workerStarted = true;
            }
        }
    } finally {
        if (! workerStarted)
            addWorkerFailed(w);
    }
    return workerStarted;
}
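Now that we know what addWorker does, we can read the implementation step by step. The method has two main parts:
Step 1: two nested for loops check the pool's environment. The run state is RUNNING [negative], SHUTDOWN [0], or one of the later states [positive]. A new worker may be created only in RUNNING, or in SHUTDOWN when firstTask is null and the blocking queue is not empty, so that already-queued tasks can still be drained.
The inner for loop checks the bound: the worker count must be below CAPACITY and below corePoolSize (when core is true) or maximumPoolSize (when core is false); otherwise the method returns false. If the check passes, compareAndIncrementWorkerCount bumps the worker count with a CAS, retrying if another thread changed ctl in the meantime.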
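To see why RUNNING is negative and how a single CAS can reserve a worker slot, it helps to rebuild the packing of ctl in isolation. The sketch below copies the layout used by the JDK 8 era source shown here (run state in the top 3 bits, worker count in the low 29); those fields are private in ThreadPoolExecutor, so CtlSketch and tryReserveWorker are stand-ins written for illustration, and the state check is simplified compared with the real method.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CtlSketch {
    static final int COUNT_BITS = Integer.SIZE - 3;        // 29 bits hold the worker count
    static final int CAPACITY   = (1 << COUNT_BITS) - 1;
    static final int RUNNING    = -1 << COUNT_BITS;        // negative
    static final int SHUTDOWN   =  0 << COUNT_BITS;        // zero
    static final int STOP       =  1 << COUNT_BITS;        // positive

    static int runStateOf(int c)     { return c & ~CAPACITY; }
    static int workerCountOf(int c)  { return c & CAPACITY; }
    static int ctlOf(int rs, int wc) { return rs | wc; }

    private final AtomicInteger ctl = new AtomicInteger(ctlOf(RUNNING, 0));

    /** Simplified inner loop: reserve one worker slot with CAS, or give up at the bound. */
    boolean tryReserveWorker(int bound) {
        for (;;) {
            int c = ctl.get();
            if (runStateOf(c) >= SHUTDOWN)
                return false;                 // simplification: refuse in any non-RUNNING state
            int wc = workerCountOf(c);
            if (wc >= CAPACITY || wc >= bound)
                return false;
            if (ctl.compareAndSet(c, c + 1))
                return true;                  // the count lives in the low bits, so +1 is enough
            // CAS lost a race with another thread: re-read ctl and try again
        }
    }

    int workerCount() { return workerCountOf(ctl.get()); }

    public static void main(String[] args) {
        CtlSketch s = new CtlSketch();
        System.out.println(s.tryReserveWorker(2) + " " + s.tryReserveWorker(2)
                + " " + s.tryReserveWorker(2) + " count=" + s.workerCount());
    }
}
```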
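Step 2: construct the worker and start it.
At code 1, two local flags, workerAdded and workerStarted, record whether the worker was added to the pool and whether its thread was started.
At code 2, the Worker is constructed with the pending task firstTask. Its constructor looks like this: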
/**
* Creates with given first task and thread from ThreadFactory.
* @param firstTask the first task (null if none)
*/
Worker(Runnable firstTask) {
    setState(-1); // inhibit interrupts until runWorker
    this.firstTask = firstTask;
    this.thread = getThreadFactory().newThread(this);
}
Whether or not there is a first task, the constructor asks the thread factory for a new thread. Here is how that thread field is declared:
/** Thread this worker is running in. Null if factory fails. */
final Thread thread;
The worker runs in this thread; in other words, the tasks are executed on it.
Note: the argument this passed to newThread() shows that Worker is itself a Runnable.
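A custom ThreadFactory makes this visible: the Runnable it receives is the pool's internal worker, not the task handed to execute. The sketch below assumes a hypothetical class WorkerRunnableDemo and a made-up thread-name prefix, "demo-pool-".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class WorkerRunnableDemo {
    /** Returns { factory received the user's task?, name of the thread that ran the task }. */
    static Object[] run() throws InterruptedException {
        AtomicReference<Runnable> seenByFactory = new AtomicReference<>();
        AtomicReference<String> threadName = new AtomicReference<>();
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = r -> {
            seenByFactory.set(r);            // r is the pool's worker wrapper, not our task
            return new Thread(r, "demo-pool-" + counter.incrementAndGet());
        };
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(), factory);
        CountDownLatch done = new CountDownLatch(1);
        Runnable task = () -> {
            threadName.set(Thread.currentThread().getName());
            done.countDown();
        };
        pool.execute(task);
        done.await(5, TimeUnit.SECONDS);
        pool.shutdown();
        return new Object[] { seenByFactory.get() == task, threadName.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        Object[] r = run();
        System.out.println("factory saw user task: " + r[0] + ", ran on: " + r[1]);
    }
}
```

Naming threads in the factory is also a practical habit in its own right, since it makes pool threads easy to spot in stack traces.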
Next,
at code 3, the method takes the pool's ReentrantLock, mainLock, and holds it around the try block that follows, because that block performs an important operation:
workers.add(w);
workers is a HashSet, declared as follows:
/**
* Set containing all worker threads in pool. Accessed only when
* holding mainLock.
*/
// Android-added: @ReachabilitySensitive
@ReachabilitySensitive
private final HashSet<Worker> workers = new HashSet<>();
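It holds every worker in the pool. HashSet is not thread-safe, so concurrent callers must be kept apart: only a thread holding mainLock may modify it. Once the worker has been added and workerAdded is set to true, the worker thread can be started.
At code 5, t.start() starts the worker thread. t is the thread the ThreadFactory created when the Worker was constructed at code 2.
Note: workerStarted exists because starting the thread can fail. If it does, addWorkerFailed performs the cleanup.
The whole flow is fairly simple, as the diagram below shows: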
III. Task Execution Logic
Remember the Thread that was just started? It was given the Worker instance when it was created:
this.thread = getThreadFactory().newThread(this);
Here is what Worker extends and implements:
/**
* Class Worker mainly maintains interrupt control state for
* threads running tasks, along with other minor bookkeeping.
* This class opportunistically extends AbstractQueuedSynchronizer
* to simplify acquiring and releasing a lock surrounding each
* task execution. This protects against interrupts that are
* intended to wake up a worker thread waiting for a task from
* instead interrupting a task being run. We implement a simple
* non-reentrant mutual exclusion lock rather than use
* ReentrantLock because we do not want worker tasks to be able to
* reacquire the lock when they invoke pool control methods like
* setCorePoolSize. Additionally, to suppress interrupts until
* the thread actually starts running tasks, we initialize lock
* state to a negative value, and clear it upon start (in
* runWorker).
*/
private final class Worker
extends AbstractQueuedSynchronizer
implements Runnable
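So Worker implements Runnable, and thread.start() ends up running Worker's run method. Note: the class mainly exists to maintain the interrupt control state of threads running tasks. It also opportunistically extends AbstractQueuedSynchronizer to simplify acquiring and releasing the lock around each task execution.
Here is Worker's run method: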
/** Delegates main run loop to outer runWorker. */
public void run() {
    runWorker(this);
}
It delegates the run loop to the outer runWorker method, which is also documented in some detail:
/**
* Main worker run loop. Repeatedly gets tasks from queue and
* executes them, while coping with a number of issues:
*
* 1. We may start out with an initial task, in which case we
* don't need to get the first one. Otherwise, as long as pool is
* running, we get tasks from getTask. If it returns null then the
* worker exits due to changed pool state or configuration
* parameters. Other exits result from exception throws in
* external code, in which case completedAbruptly holds, which
* usually leads processWorkerExit to replace this thread.
*
* 2. Before running any task, the lock is acquired to prevent
* other pool interrupts while the task is executing, and then we
* ensure that unless pool is stopping, this thread does not have
* its interrupt set.
*
* 3. Each task run is preceded by a call to beforeExecute, which
* might throw an exception, in which case we cause thread to die
* (breaking loop with completedAbruptly true) without processing
* the task.
*
* 4. Assuming beforeExecute completes normally, we run the task,
* gathering any of its thrown exceptions to send to afterExecute.
* We separately handle RuntimeException, Error (both of which the
* specs guarantee that we trap) and arbitrary Throwables.
* Because we cannot rethrow Throwables within Runnable.run, we
* wrap them within Errors on the way out (to the thread's
* UncaughtExceptionHandler). Any thrown exception also
* conservatively causes thread to die.
*
* 5. After task.run completes, we call afterExecute, which may
* also throw an exception, which will also cause thread to
* die. According to JLS Sec 14.20, this exception is the one that
* will be in effect even if task.run throws.
*
* The net effect of the exception mechanics is that afterExecute
* and the thread's UncaughtExceptionHandler have as accurate
* information as we can provide about any problems encountered by
* user code.
*
* @param w the worker
*/
final void runWorker(Worker w) {
    Thread wt = Thread.currentThread();
    Runnable task = w.firstTask;
    w.firstTask = null;
    w.unlock(); // allow interrupts
    boolean completedAbruptly = true;
    try {
        while (task != null || (task = getTask()) != null) {
            w.lock();
            // If pool is stopping, ensure thread is interrupted;
            // if not, ensure thread is not interrupted. This
            // requires a recheck in second case to deal with
            // shutdownNow race while clearing interrupt
            if ((runStateAtLeast(ctl.get(), STOP) ||
                 (Thread.interrupted() &&
                  runStateAtLeast(ctl.get(), STOP))) &&
                !wt.isInterrupted())
                wt.interrupt();
            try {
                beforeExecute(wt, task);
                Throwable thrown = null;
                try {
                    task.run();
                } catch (RuntimeException x) {
                    thrown = x; throw x;
                } catch (Error x) {
                    thrown = x; throw x;
                } catch (Throwable x) {
                    thrown = x; throw new Error(x);
                } finally {
                    afterExecute(task, thrown);
                }
            } finally {
                task = null;
                w.completedTasks++;
                w.unlock();
            }
        }
        completedAbruptly = false;
    } finally {
        processWorkerExit(w, completedAbruptly);
    }
}
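The comment is long-winded, but it boils down to five things: getting a task, running it with task.run, interrupt control, pre-execution handling, and post-execution handling.
1. Getting a task
1.1 firstTask is not null
This is the case where the worker count was below corePoolSize, or the queue was full and the worker count was below maximumPoolSize. The worker runs firstTask directly.
1.2 firstTask is null
The worker has to fetch a waiting task from the blocking queue, which is what getTask does.
The source: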
/**
* Performs blocking or timed wait for a task, depending on
* current configuration settings, or returns null if this worker
* must exit because of any of:
* 1. There are more than maximumPoolSize workers (due to
* a call to setMaximumPoolSize).
* 2. The pool is stopped.
* 3. The pool is shutdown and the queue is empty.
* 4. This worker timed out waiting for a task, and timed-out
* workers are subject to termination (that is,
* {@code allowCoreThreadTimeOut || workerCount > corePoolSize})
* both before and after the timed wait, and if the queue is
* non-empty, this worker is not the last thread in the pool.
*
* @return task, or null if the worker must exit, in which case
* workerCount is decremented
*/
private Runnable getTask() {
    boolean timedOut = false; // Did the last poll() time out?
    for (;;) {
        int c = ctl.get();
        int rs = runStateOf(c);
        // Check if queue empty only if necessary.
        if (rs >= SHUTDOWN && (rs >= STOP || workQueue.isEmpty())) {
            decrementWorkerCount();
            return null;
        }
        int wc = workerCountOf(c);
        // Are workers subject to culling?
        boolean timed = allowCoreThreadTimeOut || wc > corePoolSize;
        if ((wc > maximumPoolSize || (timed && timedOut))
            && (wc > 1 || workQueue.isEmpty())) {
            if (compareAndDecrementWorkerCount(c))
                return null;
            continue;
        }
        try {
            Runnable r = timed ?
                workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
                workQueue.take();
            if (r != null)
                return r;
            timedOut = true;
        } catch (InterruptedException retry) {
            timedOut = false;
        }
    }
}
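An endless loop checks the pool state and the queue. If nothing requires the worker to exit, it takes a waiting task from the queue, blocking with take() or waiting up to keepAliveTime with poll() when idle workers are subject to timeout.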
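The timed branch of getTask is what lets idle workers disappear. A rough way to watch it, assuming a hypothetical KeepAliveDemo class and a 50 ms keep-alive chosen only to keep the demo short:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class KeepAliveDemo {
    /** Returns { largest pool size reached, pool size after the workers have idled out }. */
    static int[] run() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 50L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true);    // makes `timed` true even for core threads
        CountDownLatch done = new CountDownLatch(2);
        pool.execute(done::countDown);
        pool.execute(done::countDown);
        done.await(5, TimeUnit.SECONDS);
        int largest = pool.getLargestPoolSize();
        // Poll rather than sleep a fixed time: a worker exits once poll(keepAliveTime) times out.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (pool.getPoolSize() > 0 && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        int afterIdle = pool.getPoolSize();
        pool.shutdown();
        return new int[] { largest, afterIdle };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = run();
        System.out.println("largest=" + r[0] + " afterIdle=" + r[1]);
    }
}
```

Without allowCoreThreadTimeOut(true), the two core threads would block in take() and stay alive until the pool is shut down.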
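1.3 Interrupt control
A worker thread's interrupt status has to agree with the pool state: if the pool is stopping, the thread must be interrupted; if not, its interrupt flag must be cleared before the task runs.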
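From the task's point of view, this shows up as an interrupt when the pool is stopped with shutdownNow(). A minimal sketch, with InterruptDemo as an assumed class name and a long sleep standing in for real work:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class InterruptDemo {
    /** Returns true if the running task observed an interrupt after shutdownNow(). */
    static boolean taskWasInterrupted() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);
        pool.execute(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000);          // stands in for long-running work
            } catch (InterruptedException e) {
                interrupted.set(true);         // delivered by shutdownNow()
            } finally {
                finished.countDown();
            }
        });
        started.await();
        pool.shutdownNow();                    // moves the pool to STOP and interrupts workers
        finished.await(5, TimeUnit.SECONDS);
        return interrupted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("interrupted: " + taskWasInterrupted());
    }
}
```

A task that ignores interrupts keeps running after shutdownNow(), so cooperative tasks should check the interrupt flag or use interruptible blocking calls.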
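1.4 Pre-execution handling
Before running the task, runWorker calls beforeExecute(wt, task), passing the worker thread and the task about to run. A subclass can hook in here, for example to reinitialize ThreadLocals or write a log entry.
1.5 Post-execution handling
After the task finishes, runWorker calls afterExecute(task, thrown) with the task and any exception it threw. If the task was submitted through submit, its outcome can also be observed through the returned Future.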
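Both hooks are protected methods meant to be overridden. The subclass below is a sketch under assumptions: HookedPool, its events list, and the latch are names invented for the demo, and the hooks simply record what they see.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class HookedPool extends ThreadPoolExecutor {
    final List<String> events = new CopyOnWriteArrayList<>();
    final CountDownLatch afterCalls;

    HookedPool(int expectedTasks) {
        super(1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        afterCalls = new CountDownLatch(expectedTasks);
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        events.add("before");
    }

    @Override
    protected void afterExecute(Runnable r, Throwable thrown) {
        super.afterExecute(r, thrown);
        events.add(thrown == null ? "after:ok" : "after:" + thrown.getMessage());
        afterCalls.countDown();
    }

    /** Runs one normal task and one failing task through execute() and returns the hook log. */
    static List<String> run() throws InterruptedException {
        HookedPool pool = new HookedPool(2);
        pool.execute(() -> { });
        pool.execute(() -> { throw new IllegalStateException("boom"); });
        pool.afterCalls.await(5, TimeUnit.SECONDS);
        pool.shutdown();
        return pool.events;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

The failing task also terminates its worker thread, so the default uncaught-exception handler prints a stack trace to stderr; the pool replaces the worker as described in the runWorker comment.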
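Note that the task-execution steps (interrupt control, beforeExecute, task.run, and afterExecute) all run while the worker holds its AQS-based lock. Remember what Worker extends? It is a subclass of AbstractQueuedSynchronizer and uses AQS's exclusive mode to protect a running task. It does not use ReentrantLock because a reentrant lock would let a worker task reacquire the lock when it calls pool control methods such as setCorePoolSize.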
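Question 1. What is a reentrant lock? ThreadPoolExecutor uses a ReentrantLock when adding to workers; would synchronized work as well?
A reentrant lock can be acquired again by the thread that already holds it. ReentrantLock is built on AQS (its internal Sync class extends AbstractQueuedSynchronizer) and covers what synchronized does, which is itself reentrant, while adding more: interruptible acquisition, timed acquisition, and a non-blocking tryLock. synchronized is less flexible because it is tied to the object's monitor and to block structure. It would be correct here too, though under heavy contention it has historically been slower than ReentrantLock, a gap that modern JVMs have largely closed.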
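Two of those properties are easy to check directly: re-entry by the owner, and a timed tryLock that gives up instead of blocking forever. ReentrantLockDemo and its 50 ms timeout are assumptions made for this sketch.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class ReentrantLockDemo {
    /** The owning thread may lock again; the hold count tracks how many unlocks are owed. */
    static int nestedHoldCount() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        try {
            lock.lock();                        // re-entry by the owner does not block
            try {
                return lock.getHoldCount();     // 2 at this point
            } finally {
                lock.unlock();
            }
        } finally {
            lock.unlock();
        }
    }

    /** Another thread can stop waiting after a timeout while the lock is held elsewhere. */
    static boolean otherThreadAcquires() throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        boolean[] acquired = new boolean[1];
        lock.lock();
        try {
            Thread other = new Thread(() -> {
                try {
                    acquired[0] = lock.tryLock(50, TimeUnit.MILLISECONDS);
                    if (acquired[0]) lock.unlock();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            other.start();
            other.join();                       // the caller still holds the lock while waiting
        } finally {
            lock.unlock();
        }
        return acquired[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("hold count: " + nestedHoldCount());
        System.out.println("other thread acquired: " + otherThreadAcquires());
    }
}
```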
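Question 2. What is AQS, and how do you understand it?
AQS is AbstractQueuedSynchronizer, a framework that provides atomic management of synchronization state plus blocking and waking of threads. Worker, ReentrantLock, and CountDownLatch are all built on it. It keeps the synchronization state in a volatile int field, state, and how a subclass interprets that value decides whether the lock is reentrant. Worker uses only 0 (unlocked) and 1 (locked), apart from the initial -1 that inhibits interrupts until runWorker, so it is not reentrant. Reentrant means a thread can lock the same critical resource repeatedly: each acquire adds 1 to the hold count and each release subtracts 1.
For atomic updates it modifies state with CAS, and internally it queues waiting threads in a doubly linked list to manage acquire and release. It supports both exclusive and shared modes.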
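A non-reentrant lock in the style of Worker takes only a few lines on top of AQS. The class below is a sketch modeled on that idea, not the JDK's Worker itself; NonReentrantMutex and demo() are names made up for illustration.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

class NonReentrantMutex extends AbstractQueuedSynchronizer {
    @Override
    protected boolean isHeldExclusively() {
        return getState() != 0;
    }

    @Override
    protected boolean tryAcquire(int unused) {
        // CAS 0 -> 1; fails if anyone, including the current thread, already holds the lock.
        if (compareAndSetState(0, 1)) {
            setExclusiveOwnerThread(Thread.currentThread());
            return true;
        }
        return false;
    }

    @Override
    protected boolean tryRelease(int unused) {
        setExclusiveOwnerThread(null);
        setState(0);
        return true;
    }

    void lock()        { acquire(1); }
    boolean tryLock()  { return tryAcquire(1); }
    void unlock()      { release(1); }

    /** Returns { second tryLock by the holder, tryLock after unlock }. */
    static boolean[] demo() {
        NonReentrantMutex m = new NonReentrantMutex();
        m.lock();
        boolean second = m.tryLock();   // same thread, still refused: not reentrant
        m.unlock();
        boolean third = m.tryLock();    // free again, so this succeeds
        m.unlock();
        return new boolean[] { second, third };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("re-acquire while held: " + r[0] + ", acquire after unlock: " + r[1]);
    }
}
```

Because tryAcquire never looks at the owner thread, a holder that calls lock() again would block on itself, which is exactly the property the pool relies on to tell a busy worker from an idle one.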
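Question 3. How can you observe the results of tasks executed in a thread pool?
ThreadPoolExecutor extends AbstractExecutorService, which provides submit. submit wraps the task in a FutureTask and returns it as a Future, so the caller can read the result, or the exception, with get(). Overriding afterExecute also works, but for tasks that went through submit the exception is captured inside the Future and the thrown argument is null.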
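A short sketch of reading outcomes through Future, with SubmitDemo as an assumed class name: a successful Callable yields its value from get(), and a failing task surfaces as an ExecutionException whose cause is the original exception.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SubmitDemo {
    /** Returns the successful result and the failure message joined into one string. */
    static String run() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        try {
            Future<Integer> ok = pool.submit(() -> 6 * 7);
            Future<?> bad = pool.submit(
                    (Runnable) () -> { throw new IllegalStateException("boom"); });
            StringBuilder sb = new StringBuilder();
            try {
                sb.append(ok.get());            // blocks until the result is ready
                bad.get();                      // rethrows the failure, wrapped
            } catch (ExecutionException e) {
                sb.append(" / ").append(e.getCause().getMessage());
            }
            return sb.toString();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

Unlike execute, a failure inside a submitted task does not kill the worker thread, because FutureTask catches the exception and stores it.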
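Question 4. Why does Alibaba disallow building thread pools with the factory methods the platform provides?
Take the single-thread executor as an example. Its blocking queue is a LinkedBlockingQueue, which by default accepts up to Integer.MAX_VALUE waiting tasks. If tasks are produced much faster than they complete, the backlog can grow until the process runs out of memory. The limits of the four basic pools are fixed and hard to control, so constructing a ThreadPoolExecutor with explicit parameters is the better choice when the pool's behavior needs to be controlled.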
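One possible shape for such a custom pool is sketched below. The helper name newBoundedPool, its parameters, and the choice of CallerRunsPolicy are assumptions for illustration rather than a recommendation from the manual; the comparison method also relies on Executors.newFixedThreadPool returning a ThreadPoolExecutor, which is true of the OpenJDK implementation but not promised by its declared return type.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPoolFactory {
    /** Hypothetical helper: every limit is spelled out instead of inherited from Executors. */
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity, String namePrefix) {
        AtomicInteger seq = new AtomicInteger();
        ThreadFactory factory = r -> new Thread(r, namePrefix + "-" + seq.incrementAndGet());
        return new ThreadPoolExecutor(
                threads, threads,
                30L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),      // bounded backlog
                factory,
                new ThreadPoolExecutor.CallerRunsPolicy());   // overflow runs on the submitting thread
    }

    /** Remaining queue capacity of Executors.newFixedThreadPool, for comparison. */
    static int fixedPoolQueueCapacity() {
        // Assumption: the cast works on OpenJDK; the API only promises an ExecutorService.
        ThreadPoolExecutor fixed = (ThreadPoolExecutor) Executors.newFixedThreadPool(1);
        try {
            return fixed.getQueue().remainingCapacity();
        } finally {
            fixed.shutdown();
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newBoundedPool(2, 16, "io");
        System.out.println("bounded queue capacity: " + pool.getQueue().remainingCapacity());
        System.out.println("fixed pool queue capacity: " + fixedPoolQueueCapacity());
        pool.shutdown();
    }
}
```

CallerRunsPolicy applies back-pressure by running overflow on the caller, which is a poor fit when the caller is Android's UI thread; a policy that drops or reports the task may suit that case better.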