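Overviews of the fork-join framework are easy to find online; this article instead digs into how fork-join is implemented on the Java platform.
As a lightweight concurrent execution framework, fork-join is built from three roles: the task queue (WorkQueue), the worker thread (ForkJoinWorkerThread), and the task (ForkJoinTask). They normally expose their services through the interface of the executor (ForkJoinPool).
We look at how these roles cooperate to run tasks from three angles: submitting tasks, executing tasks, and joining tasks (join).
Submitting tasks
Once you have instantiated a ForkJoinPool, there are generally three ways to submit a task: execute, submit (returns a future), and invoke (returns the result of the join operation).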
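As a usage sketch of the three entry points, the following submits the same computation through execute, submit, and invoke. SumTask and SubmitDemo are hypothetical names introduced only for this illustration, and the pool size of 4 is arbitrary.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

class SubmitDemo {
    // Hypothetical task: sums the integers in [lo, hi) by recursive splitting.
    static final class SumTask extends RecursiveTask<Long> {
        private final int lo, hi;
        SumTask(int lo, int hi) { this.lo = lo; this.hi = hi; }

        @Override
        protected Long compute() {
            if (hi - lo <= 1_000) {             // small enough: sum directly
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += i;
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(lo, mid);
            left.fork();                        // hand the left half to the pool
            long right = new SumTask(mid, hi).compute();
            return left.join() + right;
        }
    }

    // Returns the results obtained through execute, submit and invoke, in that order.
    static long[] runAll() {
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            SumTask viaExecute = new SumTask(0, 100_000);
            pool.execute(viaExecute);           // returns nothing; join the task itself later
            ForkJoinTask<Long> viaSubmit = pool.submit(new SumTask(0, 100_000));
            long viaInvoke = pool.invoke(new SumTask(0, 100_000)); // blocks for the result
            return new long[] { viaExecute.join(), viaSubmit.join(), viaInvoke };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (long r : runAll()) System.out.println(r);
    }
}
```

All three calls are made from a thread outside the pool, which is the situation the external submission path discussed in this section deals with.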
All three end up calling externalPush:
final void externalPush(ForkJoinTask<?> task) {
WorkQueue q; int m, s, n, am; ForkJoinTask<?>[] a;
int r = ThreadLocalRandom.getProbe();
int ps = plock;
WorkQueue[] ws = workQueues;
if (ps > 0 && ws != null && (m = (ws.length - 1)) >= 0 &&
(q = ws[m & r & SQMASK]) != null && r != 0 &&
U.compareAndSwapInt(q, QLOCK, 0, 1)) { // lock
if ((a = q.array) != null &&
(am = a.length - 1) > (n = (s = q.top) - q.base)) {
int j = ((am & s) << ASHIFT) + ABASE;
U.putOrderedObject(a, j, task);
q.top = s + 1; // push on to deque
q.qlock = 0;
if (n <= 1)
signalWork(ws, q);
return;
}
q.qlock = 0;
}
fullExternalPush(task);
}
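Opening declarations: read the probe r (the thread-private threadLocalRandomProbe), ps (the pool's plock), and ws (the pool's array of task queues).
The if block: the fast path applies when the pool has been initialized and not shut down (ps > 0), the queue array exists, the even-indexed queue selected by the probe is non-null, and the CAS on that queue's qlock succeeds. If the queue's array then has at least two free slots, the task is stored at the position top points to, top is advanced, and the queue lock is released; if the queue previously held one task or fewer, signalWork is called, and the method returns. If there are not two free slots (the lock is released first) or any of the earlier conditions fails, control falls through to the last statement.
Last statement: call fullExternalPush, the slower and more general submission path.
Key points: most submissions only ever run this code; the exceptions are the very first submission and the few cases where the selected WorkQueue has not been created yet. The interesting part is that, because the critical section is extremely short, access to a queue is not serialized with a conventional lock. Mutual exclusion comes from a CAS on the queue's private qlock field, where 1 means locked (1: locked, -1: terminate; else 0); if the CAS fails, the caller moves on to fullExternalPush. On the pool side, ps < 0 means the pool has been shut down. SQMASK (0x7E, i.e. 126) clears the lowest bit of the index, so only even slots are selected, and those are the shared submission queues. Finally, the probe must be non-zero.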
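The even-index mask and the single-attempt CAS lock can be sketched in isolation. This is not the pool's code: SharedSlotSketch, tryPush, and the AtomicInteger standing in for qlock are assumptions made for illustration, whereas the listing uses Unsafe on a plain int field.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedSlotSketch {
    static final int SQMASK = 0x007e;               // value quoted in the text (126)

    // Index used for external submissions: always even, always inside the table.
    // tableLength is assumed to be a power of two.
    static int sharedIndex(int probe, int tableLength) {
        return (tableLength - 1) & probe & SQMASK;
    }

    // Stand-in for WorkQueue.qlock: 0 = free, 1 = locked.
    private final AtomicInteger qlock = new AtomicInteger();
    private int pushed;                             // only touched while qlock is held

    // One attempt only, like the fast path: returns false instead of waiting.
    boolean tryPush() {
        if (!qlock.compareAndSet(0, 1))
            return false;                           // caller would fall back to the slow path
        try {
            pushed++;                               // the short critical section
            return true;
        } finally {
            qlock.set(0);                           // unlock
        }
    }

    int pushed() { return pushed; }
}
```

Because the caller never blocks on a busy queue, a failed CAS costs almost nothing, and the slow path is free to pick another slot.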
fullExternalPush:
private void fullExternalPush(ForkJoinTask<?> task) {
int r;
if ((r = ThreadLocalRandom.getProbe()) == 0) {
ThreadLocalRandom.localInit();
r = ThreadLocalRandom.getProbe();
}
for (;;) {
WorkQueue[] ws; WorkQueue q; int ps, m, k;
boolean move = false;
if ((ps = plock) < 0)
throw new RejectedExecutionException();
else if (ps == 0 || (ws = workQueues) == null ||
(m = ws.length - 1) < 0) { // initialize workQueues
int p = parallelism; // find power of two table size
int n = (p > 1) ? p - 1 : 1; // ensure at least 2 slots
n |= n >>> 1; n |= n >>> 2; n |= n >>> 4;
n |= n >>> 8; n |= n >>> 16; n = (n + 1) << 1;
WorkQueue[] nws = ((ws = workQueues) == null || ws.length == 0 ?
new WorkQueue[n] : null);
if (((ps = plock) & PL_LOCK) != 0 ||
!U.compareAndSwapInt(this, PLOCK, ps, ps += PL_LOCK))
ps = acquirePlock();
if (((ws = workQueues) == null || ws.length == 0) && nws != null)
workQueues = nws;
int nps = (ps & SHUTDOWN) | ((ps + PL_LOCK) & ~SHUTDOWN);
if (!U.compareAndSwapInt(this, PLOCK, ps, nps))
releasePlock(nps);
}
else if ((q = ws[k = r & m & SQMASK]) != null) {
if (q.qlock == 0 && U.compareAndSwapInt(q, QLOCK, 0, 1)) {
ForkJoinTask<?>[] a = q.array;
int s = q.top;
boolean submitted = false;
try { // locked version of push
if ((a != null && a.length > s + 1 - q.base) ||
(a = q.growArray()) != null) { // must presize
int j = (((a.length - 1) & s) << ASHIFT) + ABASE;
U.putOrderedObject(a, j, task);
q.top = s + 1;
submitted = true;
}
} finally {
q.qlock = 0; // unlock
}
if (submitted) {
signalWork(ws, q);
return;
}
}
move = true; // move on failure
}
else if (((ps = plock) & PL_LOCK) == 0) { // create new queue
q = new WorkQueue(this, null, SHARED_QUEUE, r);
q.poolIndex = (short)k;
if (((ps = plock) & PL_LOCK) != 0 ||
!U.compareAndSwapInt(this, PLOCK, ps, ps += PL_LOCK))
ps = acquirePlock();
if ((ws = workQueues) != null && k < ws.length && ws[k] == null)
ws[k] = q;
int nps = (ps & SHUTDOWN) | ((ps + PL_LOCK) & ~SHUTDOWN);
if (!U.compareAndSwapInt(this, PLOCK, ps, nps))
releasePlock(nps);
}
else
move = true; // move if busy
if (move)
r = ThreadLocalRandom.advanceProbe(r);
}
}
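Probe setup: read the probe r; if it is 0, initialize it (ThreadLocalRandom.localInit) and read it again to obtain a non-zero value.
First branch of the loop: move (which records whether the probe should be advanced) starts as false, and plock is read; a negative value means shutdown, so a RejectedExecutionException is thrown. If the queue array has not been built yet, a suitable size n is derived from parallelism, the plock lock is acquired (the second bit, PL_LOCK, goes from 0 to 1), the array is installed, the lock is released (the second bit returns to 0), and the loop runs again.
Second branch: the queue selected by r is non-null. After qlock is acquired, the task is stored in the queue's array, which is grown if necessary. On success the lock is released, signalWork is called, and the method returns. If the queue exists but the lock could not be acquired, move is set to true and the loop runs again.
Third branch: the queue selected by r is null, so a shared queue (SHARED_QUEUE) is created. The pool lock (plock) is then acquired, the new queue is stored at that index if the slot is still null, the lock is released, and the loop runs again.
Final else: plock is held by another thread, so even the chance to create the queue is lost, and move is set to true.
End of the loop body: if move is set, r is advanced to a new value.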
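To make the table sizing concrete, here is the same bit-smearing arithmetic lifted into a standalone method. TableSize and tableSize are hypothetical names; the arithmetic itself is copied from the listing.

```java
class TableSize {
    // Rounds (parallelism - 1) up to the next 2^k - 1, then doubles the power of two.
    static int tableSize(int parallelism) {
        int n = (parallelism > 1) ? parallelism - 1 : 1; // ensure at least 2 slots
        n |= n >>> 1; n |= n >>> 2; n |= n >>> 4;
        n |= n >>> 8; n |= n >>> 16;                     // all bits below the top bit are now set
        return (n + 1) << 1;
    }

    public static void main(String[] args) {
        for (int p : new int[] { 1, 2, 3, 4, 5, 8, 9 })
            System.out.println(p + " -> " + tableSize(p));
    }
}
```

For parallelism 4 this yields 8 slots and for parallelism 5 it yields 16, so the table is exactly twice parallelism only when parallelism is already a power of two.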
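Key points: the work splits into three phases. Phase one builds the queue array; phase two creates the queue instance at the index computed from r; phase three pushes a task onto the resulting non-null queue (which may require allocating its array first). The method always exits by submitting the task, and it calls signalWork just before returning. Whenever the chosen slot is contended, for example because the selected queue is locked by another thread or another thread won the race to create the queue, move is set to true so that a different index is tried, which improves throughput. The array is sized to twice parallelism rounded up to a power of two, so workQueues has 2*parallelism slots when parallelism is itself a power of two; slots whose lowest index bit is 0 hold shared queues and those whose lowest bit is 1 hold worker queues. This layout reduces contention on the task queues and reserves enough queue references ahead of time. The locking scheme is also interesting: spinning plus the built-in monitor. In most cases the lock is obtained by spinning; the monitor is a fallback used only to block. A CAS adds PL_LOCK to plock, setting the second bit to mark it locked, and the first bit is set to show that a thread is waiting. Unlocking clears the second bit again and, when needed, takes the monitor to wake the waiters. Throughout, the SHUTDOWN flag in the 32nd bit must be preserved; see acquirePlock and releasePlock for details. (acquirePlock has a theoretical ABA problem, discussed later.)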
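The plock bit arithmetic can be checked with plain integers. In this sketch the constant values PL_LOCK = 2 and PL_SIGNAL = 1 are assumptions chosen to match the bit positions described in the text (the listing does not show them), and PlockBits is a hypothetical helper class; only the release expression is taken verbatim from the listing.

```java
class PlockBits {
    static final int SHUTDOWN  = 1 << 31; // sign bit, the "32nd bit" in the text
    static final int PL_LOCK   = 2;       // second bit: lock held (assumed value)
    static final int PL_SIGNAL = 1;       // first bit: a thread is waiting (assumed value)

    static boolean isLocked(int ps) { return (ps & PL_LOCK) != 0; }

    // Value the locker installs with CAS when the lock bit is clear.
    static int locked(int ps) { return ps + PL_LOCK; }

    // Value written on release: same expression as nps in the listing.
    // Adding PL_LOCK again clears the lock bit and carries into the higher bits,
    // while the SHUTDOWN bit is copied over untouched.
    static int released(int lockedPs) {
        return (lockedPs & SHUTDOWN) | ((lockedPs + PL_LOCK) & ~SHUTDOWN);
    }

    public static void main(String[] args) {
        int ps = 0;                                   // fresh pool
        ps = released(locked(ps));
        System.out.println(ps);                       // no longer 0 after one lock cycle
    }
}
```

Under these assumptions every lock and unlock cycle advances plock by 4, which is consistent with the fast path treating ps > 0 as "initialized and not shut down".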
final void signalWork(WorkQueue[] ws, WorkQueue q) {
for (;;) {
long c; int e, u, i; WorkQueue w; Thread p;
if ((u = (int)((c = ctl) >>> 32)) >= 0)
break;
if ((e = (int)c) <= 0) {
if ((short)u < 0)
tryAddWorker();
break;
}
if (ws == null || ws.length <= (i = e & SMASK) ||
(w = ws[i]) == null)
break;
long nc = (((long)(w.nextWait & E_MASK)) |
((long)(u + UAC_UNIT)) << 32);
int ne = (e + E_SEQ) & E_MASK;
if (w.eventCount == (e | INT_SIGN) &&
U.compareAndSwapLong(this, CTL, c, nc)) {
w.eventCount = ne;
if ((p = w.parker) != null)
U.unpark(p);
break;
}
if (q != null && q.base >= q.top)
break;
}
}
private void tryAddWorker() {
long c; int u, e;
while ((u = (int)((c = ctl) >>> 32)) < 0 &&
(u & SHORT_SIGN) != 0 && (e = (int)c) >= 0) {
long nc = ((long)(((u + UTC_UNIT) & UTC_MASK) |
((u + UAC_UNIT) & UAC_MASK)) << 32) | (long)e;
if (U.compareAndSwapLong(this, CTL, c, nc)) {
ForkJoinWorkerThreadFactory fac;
Throwable ex = null;
ForkJoinWorkerThread wt = null;
try {
if ((fac = factory) != null &&
(wt = fac.newThread(this)) != null) {
wt.start();
break;
}
} catch (Throwable rex) {
ex = rex;
}
deregisterWorker(wt, ex);
break;
}
}
}
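signalWork and tryAddWorker both decode the 64-bit ctl word, so a rough model of its upper half may help when reading them. The sketch assumes the active count sits in the top 16 bits and the total count in the next 16 (shifts of 48 and 32); those values are not shown in the listings and are an assumption here, as are the names CtlSketch, initialCtl, and addWorker. What the listings do show is that both counts are stored relative to parallelism, so a negative field means "below target".

```java
class CtlSketch {
    static final int AC_SHIFT = 48, TC_SHIFT = 32;   // assumed field positions
    static final long AC_MASK = 0xffffL << AC_SHIFT;
    static final long TC_MASK = 0xffffL << TC_SHIFT;

    // Both counts start at -parallelism.
    static long initialCtl(int parallelism) {
        long np = -parallelism;
        return ((np << AC_SHIFT) & AC_MASK) | ((np << TC_SHIFT) & TC_MASK);
    }

    static int activeCount(long c, int parallelism) {
        return (int) (c >> AC_SHIFT) + parallelism;   // signed shift keeps the sign
    }

    static int totalCount(long c, int parallelism) {
        return (short) (c >>> TC_SHIFT) + parallelism;
    }

    // Mirrors the update in tryAddWorker: bump both counts, keep the low 32 bits.
    static long addWorker(long c) {
        return ((c + (1L << AC_SHIFT)) & AC_MASK)
             | ((c + (1L << TC_SHIFT)) & TC_MASK)
             | (c & 0xffffffffL);
    }
}
```

With parallelism 4, the upper 32 bits stay negative until four workers have been added, which corresponds to the `u >= 0` early exit at the top of signalWork.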
Executing tasks
We start from the run method of ForkJoinWorkerThread:
public void run() {
if (workQueue.array == null) { // only run once
Throwable exception = null;
try {
onStart();
pool.runWorker(workQueue);
} catch (Throwable ex) {
exception = ex;
} finally {
try {
onTermination(exception);
} catch (Throwable ex) {
if (exception == null)
exception = ex;
} finally {
pool.deregisterWorker(this, exception);
}
}
}
}
final void runWorker(WorkQueue w) {
w.growArray(); // allocate queue
for (int r = w.hint; scan(w, r) == 0; ) {
r ^= r << 13; r ^= r >>> 17; r ^= r << 5; // xorshift
}
}
private final int scan(WorkQueue w, int r) {
WorkQueue[] ws; int m;
long c = ctl; // for consistency check
if ((ws = workQueues) != null && (m = ws.length - 1) >= 0 && w != null) {
for (int j = m + m + 1, ec = w.eventCount;;) {
WorkQueue q; int b, e; ForkJoinTask<?>[] a; ForkJoinTask<?> t;
if ((q = ws[(r - j) & m]) != null &&
(b = q.base) - q.top < 0 && (a = q.array) != null) {
long i = (((a.length - 1) & b) << ASHIFT) + ABASE;
if ((t = ((ForkJoinTask<?>)
U.getObjectVolatile(a, i))) != null) {
if (ec < 0)
helpRelease(c, ws, w, q, b);
else if (q.base == b &&
U.compareAndSwapObject(a, i, t, null)) {
U.putOrderedInt(q, QBASE, b + 1);
if ((b + 1) - q.top < 0)
signalWork(ws, q);
w.runTask(t);
}
}
break;
}
else if (--j < 0) {
if ((ec | (e = (int)c)) < 0) // inactive or terminating
return awaitWork(w, c, ec);
else if (ctl == c) { // try to inactivate and enqueue
long nc = (long)ec | ((c - AC_UNIT) & (AC_MASK|TC_MASK));
w.nextWait = e;
w.eventCount = ec | INT_SIGN;
if (!U.compareAndSwapLong(this, CTL, c, nc))
w.eventCount = ec; // back out
}
break;
}
}
}
return 0;
}
private final void helpRelease(long c, WorkQueue[] ws, WorkQueue w,
WorkQueue q, int b) {
WorkQueue v; int e, i; Thread p;
if (w != null && w.eventCount < 0 && (e = (int)c) > 0 &&
ws != null && ws.length > (i = e & SMASK) &&
(v = ws[i]) != null && ctl == c) {
long nc = (((long)(v.nextWait & E_MASK)) |
((long)((int)(c >>> 32) + UAC_UNIT)) << 32);
int ne = (e + E_SEQ) & E_MASK;
if (q != null && q.base == b && w.eventCount < 0 &&
v.eventCount == (e | INT_SIGN) &&
U.compareAndSwapLong(this, CTL, c, nc)) {
v.eventCount = ne;
if ((p = v.parker) != null)
U.unpark(p);
}
}
}
final void runTask(ForkJoinTask<?> task) {
if ((currentSteal = task) != null) {
ForkJoinWorkerThread thread;
task.doExec();
ForkJoinTask<?>[] a = array;
int md = mode;
++nsteals;
currentSteal = null;
if (md != 0)
pollAndExecAll();
else if (a != null) {
int s, m = a.length - 1;
ForkJoinTask<?> t;
while ((s = top - 1) - base >= 0 &&
(t = (ForkJoinTask<?>)U.getAndSetObject
(a, ((m & s) << ASHIFT) + ABASE, null)) != null) {
top = s;
t.doExec();
}
}
if ((thread = owner) != null) // no need to do in finally clause
thread.afterTopLevelExec();
}
}
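scan takes tasks from the base end of another queue, while runTask (when mode is 0) drains the worker's own queue from the top end. The following single-threaded sketch shows only that ordering difference; DequeEnds is a hypothetical class built on ArrayDeque and has none of the lock-free machinery of WorkQueue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DequeEnds {
    private final Deque<String> tasks = new ArrayDeque<>();

    void push(String t) { tasks.addLast(t); }        // owner pushes at top
    String pop()        { return tasks.pollLast(); } // owner takes the newest task (top)
    String steal()      { return tasks.pollFirst(); }// thief takes the oldest task (base)

    public static void main(String[] args) {
        DequeEnds q = new DequeEnds();
        q.push("a"); q.push("b"); q.push("c");
        System.out.println(q.steal()); // oldest
        System.out.println(q.pop());   // newest
    }
}
```

Keeping the owner and the thieves at opposite ends is what lets the two sides mostly avoid touching the same slot.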
private final int awaitWork(WorkQueue w, long c, int ec) {
int stat, ns; long parkTime, deadline;
if ((stat = w.qlock) >= 0 && w.eventCount == ec && ctl == c &&
!Thread.interrupted()) {
int e = (int)c;
int u = (int)(c >>> 32);
int d = (u >> UAC_SHIFT) + parallelism; // active count
if (e < 0 || (d <= 0 && tryTerminate(false, false)))
stat = w.qlock = -1; // pool is terminating
else if ((ns = w.nsteals) != 0) { // collect steals and retry
w.nsteals = 0;
U.getAndAddLong(this, STEALCOUNT, (long)ns);
}
else {
long pc = ((d > 0 || ec != (e | INT_SIGN)) ? 0L :
((long)(w.nextWait & E_MASK)) | // ctl to restore
((long)(u + UAC_UNIT)) << 32);
if (pc != 0L) { // timed wait if last waiter
int dc = -(short)(c >>> TC_SHIFT);
parkTime = (dc < 0 ? FAST_IDLE_TIMEOUT:
(dc + 1) * IDLE_TIMEOUT);
deadline = System.nanoTime() + parkTime - TIMEOUT_SLOP;
}
else
parkTime = deadline = 0L;
if (w.eventCount == ec && ctl == c) {
Thread wt = Thread.currentThread();
U.putObject(wt, PARKBLOCKER, this);
w.parker = wt; // emulate LockSupport.park
if (w.eventCount == ec && ctl == c)
U.park(false, parkTime); // must recheck before park
w.parker = null;
U.putObject(wt, PARKBLOCKER, null);
if (parkTime != 0L && ctl == c &&
deadline - System.nanoTime() <= 0L &&
U.compareAndSwapLong(this, CTL, c, pc))
stat = w.qlock = -1; // shrink pool
}
}
}
return stat;
}
final void deregisterWorker(ForkJoinWorkerThread wt, Throwable ex) {
WorkQueue w = null;
if (wt != null && (w = wt.workQueue) != null) {
int ps;
w.qlock = -1; // ensure set
U.getAndAddLong(this, STEALCOUNT, w.nsteals); // collect steals
if (((ps = plock) & PL_LOCK) != 0 ||
!U.compareAndSwapInt(this, PLOCK, ps, ps += PL_LOCK))
ps = acquirePlock();
int nps = (ps & SHUTDOWN) | ((ps + PL_LOCK) & ~SHUTDOWN);
try {
int idx = w.poolIndex;
WorkQueue[] ws = workQueues;
if (ws != null && idx >= 0 && idx < ws.length && ws[idx] == w)
ws[idx] = null;
} finally {
if (!U.compareAndSwapInt(this, PLOCK, ps, nps))
releasePlock(nps);
}
}
long c; // adjust ctl counts
do {} while (!U.compareAndSwapLong
(this, CTL, c = ctl, (((c - AC_UNIT) & AC_MASK) |
((c - TC_UNIT) & TC_MASK) |
(c & ~(AC_MASK|TC_MASK)))));
if (!tryTerminate(false, false) && w != null && w.array != null) {
w.cancelAll(); // cancel remaining tasks
WorkQueue[] ws; WorkQueue v; Thread p; int u, i, e;
while ((u = (int)((c = ctl) >>> 32)) < 0 && (e = (int)c) >= 0) {
if (e > 0) { // activate or create replacement
if ((ws = workQueues) == null ||
(i = e & SMASK) >= ws.length ||
(v = ws[i]) == null)
break;
long nc = (((long)(v.nextWait & E_MASK)) |
((long)(u + UAC_UNIT) << 32));
if (v.eventCount != (e | INT_SIGN))
break;
if (U.compareAndSwapLong(this, CTL, c, nc)) {
v.eventCount = (e + E_SEQ) & E_MASK;
if ((p = v.parker) != null)
U.unpark(p);
break;
}
}
else {
if ((short)u < 0)
tryAddWorker();
break;
}
}
}
if (ex == null) // help clean refs on way out
ForkJoinTask.helpExpungeStaleExceptions();
else // rethrow
ForkJoinTask.rethrow(ex);
}
final int doExec() {
int s; boolean completed;
if ((s = status) >= 0) {
try {
completed = exec();
} catch (Throwable rex) {
return setExceptionalCompletion(rex);
}
if (completed)
s = setCompletion(NORMAL);
}
return s;
}
public <T> ForkJoinTask<T> submit(Callable<T> task) {
ForkJoinTask<T> job = new ForkJoinTask.AdaptedCallable<T>(task);
externalPush(job);
return job;
}
public final V join() {
int s;
if ((s = doJoin() & DONE_MASK) != NORMAL)
reportException(s);
return getRawResult();
}
/** The run status of this task */
volatile int status; // accessed directly by pool and workers
static final int DONE_MASK = 0xf0000000; // mask out non-completion bits
static final int NORMAL = 0xf0000000; // must be negative
static final int CANCELLED = 0xc0000000; // must be < NORMAL
static final int EXCEPTIONAL = 0x80000000; // must be < CANCELLED
static final int SIGNAL = 0x00010000; // must be >= 1 << 16
static final int SMASK = 0x0000ffff; // short bits for tags
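The constants above can be exercised directly to see how a status word is classified. StatusBits, describe, and hasSignal are hypothetical helpers; the constant values are the ones in the listing.

```java
class StatusBits {
    static final int DONE_MASK   = 0xf0000000;
    static final int NORMAL      = 0xf0000000; // negative
    static final int CANCELLED   = 0xc0000000; // less than NORMAL
    static final int EXCEPTIONAL = 0x80000000; // less than CANCELLED
    static final int SIGNAL      = 0x00010000;

    // Classifies a status word the way join() does with (status & DONE_MASK).
    static String describe(int status) {
        if (status >= 0) return "not done";
        switch (status & DONE_MASK) {
            case NORMAL:      return "normal";
            case CANCELLED:   return "cancelled";
            case EXCEPTIONAL: return "exceptional";
            default:          return "unknown";
        }
    }

    // For a not-yet-done status: true if some thread asked to be notified.
    static boolean hasSignal(int status) { return (status >>> 16) != 0; }
}
```

Because every completion value has the sign bit set, a single `status < 0` test is enough to tell "done" from "not done", and the lower bits remain available for SIGNAL and user tags.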
private int doJoin() {
int s; Thread t; ForkJoinWorkerThread wt; ForkJoinPool.WorkQueue w;
return (s = status) < 0 ? s :
((t = Thread.currentThread()) instanceof ForkJoinWorkerThread) ?
(w = (wt = (ForkJoinWorkerThread)t).workQueue).
tryUnpush(this) && (s = doExec()) < 0 ? s :
wt.pool.awaitJoin(w, this) :
externalAwaitDone();
}
private int externalAwaitDone() {
int s;
ForkJoinPool cp = ForkJoinPool.common;
if ((s = status) >= 0) {
if (cp != null) {
if (this instanceof CountedCompleter)
s = cp.externalHelpComplete((CountedCompleter<?>)this, Integer.MAX_VALUE);
else if (cp.tryExternalUnpush(this))
s = doExec();
}
if (s >= 0 && (s = status) >= 0) {
boolean interrupted = false;
do {
if (U.compareAndSwapInt(this, STATUS, s, s | SIGNAL)) {
synchronized (this) {
if (status >= 0) {
try {
wait();
} catch (InterruptedException ie) {
interrupted = true;
}
}
else
notifyAll();
}
}
} while ((s = status) >= 0);
if (interrupted)
Thread.currentThread().interrupt();
}
}
return s;
}
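The blocking half of externalAwaitDone pairs a SIGNAL bit with the object's monitor. A reduced model of that handshake, assuming an AtomicInteger in place of the Unsafe-updated status field and a hypothetical class name SignalledCompletion, might look like this:

```java
import java.util.concurrent.atomic.AtomicInteger;

class SignalledCompletion {
    static final int NORMAL = 0xf0000000;
    static final int SIGNAL = 0x00010000;
    private final AtomicInteger status = new AtomicInteger();

    // Completer side: set the done bits, and notify only if a waiter announced itself.
    void complete() {
        for (;;) {
            int s = status.get();
            if (s < 0) return;                              // already done
            if (status.compareAndSet(s, s | NORMAL)) {
                if ((s >>> 16) != 0)
                    synchronized (this) { notifyAll(); }
                return;
            }
        }
    }

    // Waiter side: announce via SIGNAL, then re-check under the monitor before waiting.
    int awaitDone() {
        int s;
        boolean interrupted = false;
        while ((s = status.get()) >= 0) {
            if (status.compareAndSet(s, s | SIGNAL)) {
                synchronized (this) {
                    if (status.get() >= 0) {
                        try { wait(); }
                        catch (InterruptedException ie) { interrupted = true; }
                    } else {
                        notifyAll();
                    }
                }
            }
        }
        if (interrupted) Thread.currentThread().interrupt(); // restore the flag
        return s;
    }
}
```

The re-check inside the synchronized block is what closes the window between setting SIGNAL and calling wait(): a completer that finishes in that window has to take the same monitor before it can notify.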
final int awaitJoin(WorkQueue joiner, ForkJoinTask<?> task) {
int s = 0;
if (task != null && (s = task.status) >= 0 && joiner != null) {
ForkJoinTask<?> prevJoin = joiner.currentJoin;
joiner.currentJoin = task;
do {} while (joiner.tryRemoveAndExec(task) && // process local tasks
(s = task.status) >= 0);
if (s >= 0 && (task instanceof CountedCompleter))
s = helpComplete(joiner, (CountedCompleter<?>)task, Integer.MAX_VALUE);
long cc = 0; // for stability checks
while (s >= 0 && (s = task.status) >= 0) {
if ((s = tryHelpStealer(joiner, task)) == 0 &&
(s = task.status) >= 0) {
if (!tryCompensate(cc))
cc = ctl;
else {
if (task.trySetSignal() && (s = task.status) >= 0) {
synchronized (task) {
if (task.status >= 0) {
try { // see ForkJoinTask
task.wait(); // for explanation
} catch (InterruptedException ie) {
}
}
else
task.notifyAll();
}
}
long c; // reactivate
do {} while (!U.compareAndSwapLong
(this, CTL, c = ctl,
((c & ~AC_MASK) |
((c & AC_MASK) + AC_UNIT))));
}
}
}
joiner.currentJoin = prevJoin;
}
return s;
}
final int doExec() {
int s; boolean completed;
if ((s = status) >= 0) {
try {
completed = exec();
} catch (Throwable rex) {
return setExceptionalCompletion(rex);
}
if (completed)
s = setCompletion(NORMAL);
}
return s;
}
private int setExceptionalCompletion(Throwable ex) {
int s = recordExceptionalCompletion(ex);
if ((s & DONE_MASK) == EXCEPTIONAL)
internalPropagateException(ex);
return s;
}
final int recordExceptionalCompletion(Throwable ex) {
int s;
if ((s = status) >= 0) {
int h = System.identityHashCode(this);
final ReentrantLock lock = exceptionTableLock;
lock.lock();
try {
expungeStaleExceptions();
ExceptionNode[] t = exceptionTable;
int i = h & (t.length - 1);
for (ExceptionNode e = t[i]; ; e = e.next) {
if (e == null) {
t[i] = new ExceptionNode(this, ex, t[i]);
break;
}
if (e.get() == this) // already present
break;
}
} finally {
lock.unlock();
}
s = setCompletion(EXCEPTIONAL);
}
return s;
}
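From the caller's side, the exception recorded by recordExceptionalCompletion surfaces again at join(). A small sketch using only the public API; Failing and ExceptionDemo are hypothetical names, and the pool size is arbitrary.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ExceptionDemo {
    // Hypothetical task that always fails.
    static final class Failing extends RecursiveAction {
        @Override
        protected void compute() { throw new IllegalStateException("boom"); }
    }

    // Returns the exception thrown by join(), or null if join() returned normally.
    static RuntimeException joinFailure(Failing task) {
        ForkJoinPool pool = new ForkJoinPool(2);
        try {
            pool.submit(task).join();   // rethrows the recorded failure
            return null;
        } catch (RuntimeException ex) {
            return ex;
        } finally {
            pool.shutdown();
        }
    }
}
```

The exception seen by the joining thread has the same type as the one thrown inside compute(), though it is not necessarily the same object, so the sketch avoids comparing identities or messages.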
private int setCompletion(int completion) {
for (int s;;) {
if ((s = status) < 0)
return s;
if (U.compareAndSwapInt(this, STATUS, s, s | completion)) {
if ((s >>> 16) != 0)
synchronized (this) { notifyAll(); }
return completion;
}
}
}
public boolean cancel(boolean mayInterruptIfRunning) {
return (setCompletion(CANCELLED) & DONE_MASK) == CANCELLED;
}
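The observable effect of cancel can be checked without a pool at all. In this sketch Noop and CancelDemo are hypothetical names; the calls used (cancel, isCancelled, isDone, join, invoke) are public ForkJoinTask methods.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.RecursiveAction;

class CancelDemo {
    static final class Noop extends RecursiveAction {
        @Override
        protected void compute() { }
    }

    // Cancels a task that never ran and reports what the caller can observe.
    static boolean[] cancelUnstarted() {
        Noop task = new Noop();
        boolean cancelled = task.cancel(false);   // sets the CANCELLED completion
        boolean joinThrew = false;
        try {
            task.join();
        } catch (CancellationException ex) {
            joinThrew = true;
        }
        return new boolean[] { cancelled, task.isCancelled(), task.isDone(), joinThrew };
    }

    // A task that already completed normally can no longer be cancelled.
    static boolean cancelAfterCompletion() {
        Noop task = new Noop();
        task.invoke();                            // runs compute() in the calling thread
        return task.cancel(false);
    }
}
```

This matches the listing: setCompletion returns the existing status when the task is already done, so the comparison against CANCELLED fails for a normally completed task.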
public void shutdown() {
checkPermission();
tryTerminate(false, true);
}
private boolean tryTerminate(boolean now, boolean enable) {
int ps;
if (this == common) // cannot shut down
return false;
if ((ps = plock) >= 0) { // enable by setting plock
if (!enable)
return false;
if ((ps & PL_LOCK) != 0 ||
!U.compareAndSwapInt(this, PLOCK, ps, ps += PL_LOCK))
ps = acquirePlock();
int nps = ((ps + PL_LOCK) & ~SHUTDOWN) | SHUTDOWN;
if (!U.compareAndSwapInt(this, PLOCK, ps, nps))
releasePlock(nps);
}
for (long c;;) {
if (((c = ctl) & STOP_BIT) != 0) { // already terminating
if ((short)(c >>> TC_SHIFT) + parallelism <= 0) {
synchronized (this) {
notifyAll(); // signal when 0 workers
}
}
return true;
}
if (!now) { // check if idle & no tasks
WorkQueue[] ws; WorkQueue w;
if ((int)(c >> AC_SHIFT) + parallelism > 0)
return false;
if ((ws = workQueues) != null) {
for (int i = 0; i < ws.length; ++i) {
if ((w = ws[i]) != null &&
(!w.isEmpty() ||
((i & 1) != 0 && w.eventCount >= 0))) {
signalWork(ws, w);
return false;
}
}
}
}
if (U.compareAndSwapLong(this, CTL, c, c | STOP_BIT)) {
for (int pass = 0; pass < 3; ++pass) {
WorkQueue[] ws; WorkQueue w; Thread wt;
if ((ws = workQueues) != null) {
int n = ws.length;
for (int i = 0; i < n; ++i) {
if ((w = ws[i]) != null) {
w.qlock = -1;
if (pass > 0) {
w.cancelAll();
if (pass > 1 && (wt = w.owner) != null) {
if (!wt.isInterrupted()) {
try {
wt.interrupt();
} catch (Throwable ignore) {
}
}
U.unpark(wt);
}
}
}
}
// Wake up workers parked on event queue
int i, e; long cc; Thread p;
while ((e = (int)(cc = ctl) & E_MASK) != 0 &&
(i = e & SMASK) < n && i >= 0 &&
(w = ws[i]) != null) {
long nc = ((long)(w.nextWait & E_MASK) |
((cc + AC_UNIT) & AC_MASK) |
(cc & (TC_MASK|STOP_BIT)));
if (w.eventCount == (e | INT_SIGN) &&
U.compareAndSwapLong(this, CTL, cc, nc)) {
w.eventCount = (e + E_SEQ) & E_MASK;
w.qlock = -1;
if ((p = w.parker) != null)
U.unpark(p);
}
}
}
}
}
}
}
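Two consequences of the shutdown path are visible through the public API: a pool that has been shut down rejects new submissions, and the common pool ignores shutdown requests. ShutdownDemo is a hypothetical class name; the behavior shown is what the listings above imply (the RejectedExecutionException thrown when plock is negative, and the early return for the common pool).

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RejectedExecutionException;

class ShutdownDemo {
    // True if a submission after shutdown() is rejected.
    static boolean rejectedAfterShutdown() {
        ForkJoinPool pool = new ForkJoinPool(2);
        pool.shutdown();
        try {
            pool.execute(() -> { });     // Runnable overload
            return false;
        } catch (RejectedExecutionException expected) {
            return true;
        }
    }

    // True if the common pool is still running after shutdown() was called on it.
    static boolean commonPoolIgnoresShutdown() {
        ForkJoinPool.commonPool().shutdown();
        return !ForkJoinPool.commonPool().isShutdown();
    }
}
```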
Points I do not understand: