public final class Deferred&lt;T&gt; extends Object

A thread-safe implementation of a deferred result for easy asynchronous processing. This implementation is based on Twisted's Python Deferred API (defer.py).

This API is a simple and elegant way of managing asynchronous and dynamic "pipelines" (processing chains) without having to explicitly define a finite state machine.
The tl;dr version

We're all busy and don't always have time to RTFM in details. Please pay special attention to the invariants you must respect. Other than that, here's an executive summary of what Deferred offers:

- A Deferred is like a Future with a dynamic Callback chain associated to it.
- Whenever a callback returns another Deferred, the next callback in the chain doesn't get executed until that other Deferred's result becomes available.
- A Callback that handles errors is called an "errback".
- Deferred is an important building block for writing easy-to-use asynchronous APIs in a thread-safe fashion.

Understanding the concept of Deferred
The idea is that a Deferred represents a result that's not yet available. An asynchronous operation (I/O, RPC, whatever) has been started and will hand its result (be it successful or not) to the Deferred in the future. The key difference between a Deferred and a Future is that a Deferred has a callback chain associated to it, whereas with just a Future you need to get the result manually at some point, which poses problems such as: How do you know when the result is available? What if the result itself depends on another future?
When you start an asynchronous operation, you typically want to be called back when the operation completes. If the operation was successful, you want your callback to use its result to carry on with what you were doing at the time you started the asynchronous operation. If there was an error, you want to trigger some error handling code.
But there's more to a Deferred than a single callback. You can add an arbitrary number of callbacks, which effectively allows you to easily build complex processing pipelines in a really simple and elegant way.
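The result-threading rule at the heart of this (whatever a callback returns becomes the next callback's argument) can be sketched with plain functions. This is a deliberately simplified model, not the library's implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// A simplified model of a callback chain: each callback receives the
// previous callback's return value, and whatever it returns becomes the
// current result handed to the next callback.
public class ChainSketch {
  static Object runChain(Object initial, List<Function<Object, Object>> chain) {
    Object result = initial;
    for (Function<Object, Object> cb : chain) {
      result = cb.apply(result);  // the return value becomes the current result
    }
    return result;
  }

  static String demo() {
    List<Function<Object, Object>> chain = new ArrayList<>();
    chain.add(r -> ((String) r).toUpperCase());  // 1st callback
    chain.add(r -> r + "!");                     // 2nd callback
    return (String) runChain("hello", chain);
  }

  public static void main(String[] args) {
    System.out.println(demo());  // HELLO!
  }
}
```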
Understanding the callback chain
Let's take a typical example. You're writing a client library for others to use your simple remote storage service. When your users call the get method in your library, you want to retrieve some piece of data from your remote service and hand it back to the user, but you want to do so in an asynchronous fashion.
When the user of your client library invokes get, you assemble a request and send it out to the remote server through a socket. Before sending it to the socket, you create a Deferred and store it somewhere, for example in a map, to keep an association between the request and this Deferred. You then return this Deferred to the user; this is how they will access the deferred result as soon as the RPC completes.
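A sketch of what that bookkeeping might look like. PendingDeferred is a minimal stand-in for the real Deferred, and names like `rpcid`, `pending` and `handleResponse` are illustrative assumptions, not part of the library:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal stand-in for com.stumbleupon.async.Deferred, just enough to run.
class PendingDeferred {
  private byte[] result;
  void callback(byte[] r) { result = r; }  // hand the deferred result in
  byte[] join() { return result; }
}

public class ClientSketch {
  // One pending Deferred per in-flight RPC, keyed by a request id.
  private final Map<Integer, PendingDeferred> pending = new ConcurrentHashMap<>();
  private int nextId = 0;

  public PendingDeferred get(String key) {
    final PendingDeferred d = new PendingDeferred();
    final int rpcid = nextId++;
    pending.put(rpcid, d);  // remember which Deferred belongs to this request
    // (the real code would now serialize the request for `key` and write
    // it to the socket without blocking)
    return d;               // the caller adds its callbacks to this Deferred
  }

  // Called once the response for `rpcid` has been read off the socket.
  public void handleResponse(int rpcid, byte[] payload) {
    pending.remove(rpcid).callback(payload);  // trigger the stored Deferred
  }

  static String demo() {
    ClientSketch c = new ClientSketch();
    PendingDeferred d = c.get("key");
    c.handleResponse(0, "value".getBytes());  // pretend the response arrived
    return new String(d.join());
  }

  public static void main(String[] args) {
    System.out.println(demo());  // value
  }
}
```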
Sooner or later, the RPC will complete (successfully or not), and your socket will become readable (or maybe closed, in the event of a failure). Let's assume for now that everything works as expected, and thus the socket is readable, so you read the response from the socket. At this point you extract the result of the remote get call, and you hand it out to the Deferred you created for this request (remember, you had to store it somewhere, so you could give it the deferred result once you have it). The Deferred then stores this result and triggers any callback that may have been added to it. The expectation is that the user of your client library, after calling your get method, will add a Callback to the Deferred you gave them. This way, when the deferred result becomes available, you'll call it with the result in argument.
So far what we've explained is nothing more than a Future with a callback associated to it. But there's more to Deferred than just this. Let's assume now that someone else wants to build a caching layer on top of your client library, to avoid repeatedly getting the same value over and over again through the network. Users who want to use the cache will invoke get on the caching library instead of directly calling your client library.
Let's assume that the caching library already has a result cached for a get call. It will create a Deferred, immediately hand it the cached result, and return this Deferred to the user. The user will add a Callback to it, which will be immediately invoked since the deferred result is already available. So the entire get call completed virtually instantaneously and entirely from the same thread. There was no context switch (no other thread involved, no I/O and whatnot), nothing ever blocked, everything just happened really quickly.
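The cache-hit path can be sketched as follows. CachedDeferred is a minimal stand-in (the real library offers Deferred.fromResult for this case); the point is that the callback runs synchronously, on the caller's thread:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Stand-in (not the real class): a Deferred created with a result that is
// already available runs each added callback immediately, on the calling
// thread. No I/O, no other thread, nothing blocks.
class CachedDeferred<T> {
  private final T result;
  private CachedDeferred(T result) { this.result = result; }
  static <T> CachedDeferred<T> fromResult(T r) { return new CachedDeferred<>(r); }
  <R> CachedDeferred<R> addCallback(Function<T, R> cb) {
    return new CachedDeferred<>(cb.apply(result));  // invoked right away
  }
  T join() { return result; }
}

public class CacheHit {
  static final Map<String, byte[]> cache = new HashMap<>();

  static String demo() {
    cache.put("key", "value".getBytes());
    // Cache hit: the user's callback runs synchronously inside this call.
    return CachedDeferred.fromResult(cache.get("key"))
        .addCallback(bytes -> new String(bytes))
        .join();
  }

  public static void main(String[] args) {
    System.out.println(demo());  // value
  }
}
```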
Now let's assume that the caching library has a cache miss and needs to do a remote get call using the original client library described earlier. The RPC is sent out to the remote server and the client library returns a Deferred to the caching library. This is where things become exciting. The caching library can then add its own callback to the Deferred before returning it to the user. This callback will take the result that came back from the remote server, add it to the cache and return it. As usual, the user then adds their own callback to process the result. So now the Deferred has 2 callbacks associated to it:
1st callback 2nd callback
Deferred: add to cache --> user callback
When the RPC completes, the original client library will de-serialize the result from the wire and hand it out to the Deferred. The first callback will be invoked, which will add the result to the cache of the caching library. Then whatever the first callback returns will be passed on to the second callback. It turns out that the caching callback returns the get response unchanged, so that will be passed on to the user callback.
Now it's very important to understand that the first callback could have returned another arbitrary value, and that's what would have been passed to the second callback. This may sound weird at first, but it's actually the key behind Deferred.
To illustrate why, let's complicate things a bit more. Let's assume the remote service that serves those get requests is a fairly simple and low-level storage service (think memcached), so it only works with byte arrays; it doesn't care what the contents are. So the original client library is only de-serializing the byte array from the network and handing that byte array to the Deferred.
Now you're writing a higher-level library that uses this storage system to store some of your custom objects. So when you get the byte array from the server, you need to further de-serialize it into some kind of object. Users of your higher-level library don't care about what kind of remote storage system you use; the only thing they care about is getting those objects asynchronously. Your higher-level library is built on top of the original low-level library that does the RPC communication.
When the users of the higher-level library call get, you call get on the lower-level library, which issues an RPC call and returns a Deferred to the higher-level library. The higher-level library then adds a first callback to further de-serialize the byte array into an object. Then the user of the higher-level library adds their own callback that does something with that object. So now we have something that looks like this:
1st callback 2nd callback
Deferred: de-serialize to an object --> user callback
When the result comes in from the network, the byte array is de-serialized from the socket. The first callback is invoked and its argument is the initial result, the byte array. So the first callback further de-serializes it into some object that it returns. The second callback is then invoked and its argument is the result of the previous callback, that is, the de-serialized object.
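A sketch of that two-callback chain. The Callback interface below mirrors the shape of the real com.stumbleupon.async.Callback&lt;R, T&gt; (R = return type, T = argument type); MiniDeferred is a simplified stand-in just big enough to run the example:

```java
// Mirrors the shape of com.stumbleupon.async.Callback<R, T>.
interface Callback<R, T> { R call(T arg) throws Exception; }

// Stand-in for Deferred: feeds its result through each added callback.
class MiniDeferred<T> {
  private final T result;
  MiniDeferred(T result) { this.result = result; }
  <R> MiniDeferred<R> addCallback(Callback<R, T> cb) throws Exception {
    return new MiniDeferred<>(cb.call(result));  // result in, new result out
  }
  T join() { return result; }
}

public class Deserialize {
  static String demo() throws Exception {
    // The wire-level result the low-level library would hand out:
    MiniDeferred<byte[]> fromRpc = new MiniDeferred<>("42".getBytes());
    return fromRpc
        .addCallback((byte[] wire) -> Integer.valueOf(new String(wire)))  // 1st: de-serialize
        .addCallback((Integer n) -> "user got " + n)                      // 2nd: user callback
        .join();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());  // user got 42
  }
}
```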
Now back to the caching library, which has nothing to do with the higher-level library. All it does is, given an object that implements some interface with a get method, keep a map of whatever arguments get receives to an Object that was cached for this particular get call. Thanks to the way the callback chain works, it's possible to use the caching library together with the higher-level library transparently. Users who want to use caching simply need to use the caching library together with the higher-level library. Now when they call get on the caching library, and there's a cache miss, here's what happens, step by step:
1. The caching library calls get on the higher-level library.
2. The higher-level library calls get on the lower-level library.
3. The lower-level library creates a Deferred, issues out the RPC call and returns its Deferred.
4. The higher-level library adds its own object de-serialization callback to the Deferred and returns it.
5. The caching library adds its own cache-updating callback to the Deferred and returns it.
6. The user gets the Deferred and adds their own callback to do something with the object retrieved from the data store.
1st callback 2nd callback 3rd callback
Deferred: de-serialize --> add to cache --> user callback
result: (none available)
Once the response comes back, the first callback is invoked: it de-serializes the object and returns it. The current result of the Deferred becomes the de-serialized object. The current state of the Deferred is as follows:
2nd callback 3rd callback
Deferred: add to cache --> user callback
result: de-serialized object
Because there are more callbacks in the chain, the Deferred invokes the next one and gives it the current result (the de-serialized object) in argument. The callback adds that object to its cache and returns it unchanged.
3rd callback
Deferred: user callback
result: de-serialized object
Finally, the user's callback is invoked with the object in argument.
Deferred: (no more callbacks)
result: (whatever the user's callback returned)
If you think this is becoming interesting, read on: you haven't reached the most interesting thing about Deferred yet.

Building dynamic processing pipelines with Deferred
Let's complicate the previous example a little bit more. Let's assume that the remote storage service that serves those get calls is a distributed service that runs on many machines. The data is partitioned over many nodes and moves around as nodes come and go (due to machine failures and whatnot). In order to execute a get call, the low-level client library first needs to know which server is currently serving that piece of data. Let's assume that there's another server, which is part of that distributed service, that maintains an index and keeps track of where each piece of data is. The low-level client library first needs to look up the location of the data using that first server (that's a first RPC), then retrieve it from the storage node (that's another RPC). End users don't care that retrieving data involves a 2-step process; they just want to call get and be called back when the data (a byte array) is available.
This is where what's probably the most useful feature of Deferred comes in. When the user calls get, the low-level library will issue a first RPC to the index server to locate the piece of data requested by the user. When issuing this lookup RPC, a Deferred gets created. The low-level get code adds a first callback to process the lookup response and then returns it to the user.
1st callback 2nd callback
Deferred: index lookup --> user callback
result: (none available)
Eventually, the lookup RPC completes, and the Deferred is given the lookup response. So before triggering the first callback, the Deferred will be in this state:
1st callback 2nd callback
Deferred: index lookup --> user callback
result: lookup response
The first callback runs and now knows where to find the piece of data initially requested. It issues the get request to the right storage node. Doing so creates another Deferred, let's call it (B), which is then returned by the index lookup callback. And this is where the magic happens. Now we're in this state:
(A) 2nd callback | (B)
|
Deferred: user callback | Deferred: (no more callbacks)
result: Deferred (B) | result: (none available)
Because a callback returned a Deferred, we can't invoke the user callback just yet, since the user doesn't want their callback to receive a Deferred: they want it to receive a byte array. The current callback gets paused and stops processing the callback chain. This callback chain needs to be resumed whenever the Deferred of the get call [(B)] completes. In order to achieve that, a callback is added to that other Deferred that will resume the execution of the callback chain.
(A) 2nd callback | (B) 1st callback
|
Deferred: user callback | Deferred: resume (A)
result: Deferred (B) | result: (none available)
Once (A) has added the callback on (B), it can return immediately; there's no need to wait, block a thread or anything like that. So the whole process of receiving the lookup response and sending out the get RPC happened really quickly, without blocking anything.
Now when the get response comes back from the network, the RPC layer de-serializes the byte array, as usual, and hands it to (B):
(A) 2nd callback | (B) 1st callback
|
Deferred: user callback | Deferred: resume (A)
result: Deferred (B) | result: byte array
(B)'s first and only callback is going to set the result of (A) and resume (A)'s callback chain.
(A) 2nd callback | (B) 1st callback
|
Deferred: user callback | Deferred: resume (A)
result: byte array | result: byte array
So now (A) resumes its callback chain and invokes the user's callback with the byte array in argument, which is what they wanted.
(A) | (B) 1st callback
|
Deferred: (no more cb) | Deferred: resume (A)
result: (return value of | result: byte array
the user's cb)
Then (B) moves on to its next callback in the chain, but there are none, so (B) is done too.
(A) | (B)
|
Deferred: (no more cb) | Deferred: (no more cb)
result: (return value of | result: byte array
the user's cb)
The whole process of reading the get response, resuming the initial Deferred and executing the second Deferred happened all in the same thread, sequentially, and without blocking anything (provided that the user's callback didn't block, as it must not).
What we've done is essentially equivalent to dynamically building an implicit finite state machine to handle the life cycle of the get request. This simple API allows you to build arbitrarily complex processing pipelines that make dynamic decisions at each stage of the pipeline as to what to do next.
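The pause-and-resume behavior of the whole walk-through above can be modeled in a few lines. PausableDeferred is a deliberately simplified, single-threaded stand-in with no error handling, not the real implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;

// Models one rule: when a callback returns another Deferred, the chain
// pauses until that other Deferred's result is available, then resumes.
class PausableDeferred {
  private Object result;
  private boolean hasResult;
  private final Deque<Function<Object, Object>> chain = new ArrayDeque<>();

  void addCallback(Function<Object, Object> cb) {
    chain.add(cb);
    if (hasResult) run();  // result already here: run the chain now
  }

  void callback(Object r) {  // hand in a result (initial or resumed)
    result = r;
    hasResult = true;
    run();
  }

  Object result() { return result; }

  private void run() {
    while (!chain.isEmpty()) {
      result = chain.poll().apply(result);
      if (result instanceof PausableDeferred) {  // a callback returned (B):
        final PausableDeferred other = (PausableDeferred) result;
        hasResult = false;                       // pause this chain, and
        other.addCallback(r -> { callback(r); return r; });  // resume it
        return;                                  // once (B) gets its result
      }
    }
  }
}

public class TwoRpcs {
  static Object demo() {
    final PausableDeferred a = new PausableDeferred();  // (A): lookup Deferred
    final PausableDeferred b = new PausableDeferred();  // (B): get Deferred
    a.addCallback(loc -> b);                      // index-lookup callback
                                                  // "issues the get RPC" and returns (B)
    a.addCallback(bytes -> "user saw " + bytes);  // user callback
    a.callback("node-7");   // lookup response arrives; (A) pauses on (B)
    b.callback("payload");  // get response arrives; (A)'s chain resumes
    return a.result();
  }

  public static void main(String[] args) {
    System.out.println(demo());  // user saw payload
  }
}
```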
Handling errors
A Deferred has in fact not one but two callback chains. The first chain is the "normal" processing chain, and the second is the error handling chain. Twisted calls an error handling callback an "errback", so we've kept that term here. When the asynchronous processing completes with an error, the Deferred must be given the Exception that was caught instead of the result (or if no Exception was caught, one must be created and handed to the Deferred). When the current result of a Deferred is an instance of Exception, the next errback is invoked. As with normal callbacks, whatever the errback returns becomes the current result. If the current result is still an instance of Exception, the next errback is invoked. If the current result is no longer an Exception, the next callback is invoked.
When a callback or an errback itself throws an exception, it is caught by the Deferred and becomes the current result, which means that the next errback in the chain will be invoked with that exception in argument. Note that Deferred will only catch Exceptions, not any Throwable or Error.
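The callback/errback selection rule can be sketched like this. ErrDeferred is a simplified single-threaded stand-in; the real API entry points for building such chains are addCallback, addErrback and addCallbacks:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;

// Models the two-chain rule: each link holds a callback and an errback,
// and the one that runs depends on whether the current result is an
// Exception.
class ErrDeferred {
  private static final class Link {
    final Function<Object, Object> cb, eb;
    Link(Function<Object, Object> cb, Function<Object, Object> eb) {
      this.cb = cb;
      this.eb = eb;
    }
  }
  private final Deque<Link> chain = new ArrayDeque<>();

  void addCallbacks(Function<Object, Object> cb, Function<Object, Object> eb) {
    chain.add(new Link(cb, eb));
  }

  Object callback(Object initial) {
    Object result = initial;
    for (Link link : chain) {
      try {  // an Exception result selects the errback, anything else the callback
        result = (result instanceof Exception ? link.eb : link.cb).apply(result);
      } catch (Exception e) {
        result = e;  // a thrown exception becomes the current result
      }
    }
    return result;
  }
}

public class ErrbackDemo {
  static Object demo() {
    final ErrDeferred d = new ErrDeferred();
    d.addCallbacks(r -> { throw new RuntimeException("boom"); },  // callback fails
                   e -> e);
    d.addCallbacks(r -> r,  // skipped: the current result is an Exception
                   e -> "recovered from " + ((Exception) e).getMessage());
    d.addCallbacks(r -> r + "!",  // back on the normal chain
                   e -> e);
    return d.callback("initial result");
  }

  public static void main(String[] args) {
    System.out.println(demo());  // recovered from boom!
  }
}
```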
Contract and Invariants
Read this carefully as this is your warranty.
- A Deferred can receive only one initial result.
- Callbacks are executed either by the thread that hands the Deferred its initial result, or by the thread that adds a callback when the result is already available. This class does not create or manage any thread or executor.
- As soon as a Callback has been executed, the Deferred will lose its reference to it.
- Every method that adds a Callback to a Deferred does so in O(1).
- A Deferred cannot receive itself as an initial or intermediate result, as this would cause an infinite recursion.
- You must not build a cycle of mutually dependent Deferreds, as this would cause an infinite recursion (thankfully, it will quickly fail with a CallbackOverflowError).
- Callbacks and errbacks cannot receive a Deferred in argument. This is because they always receive the result of a previous callback, and when the result becomes a Deferred, we suspend the execution of the callback chain until the result of that other Deferred is available.
- Callbacks cannot receive an Exception in argument. This is because they're always given to the errbacks.
- Using the monitor of a Deferred can lead to a deadlock, so don't use it. In other words, writing synchronized (some_deferred) { ... } (or anything equivalent) voids your warranty.
A simple example:
package test;

import com.stumbleupon.async.Callback;
import com.stumbleupon.async.Deferred;

public class TestDeferred {

  public static void main(String[] args) throws Exception {
    System.out.println("Zhang San asks Li Si for a loan");
    // joinUninterruptibly() blocks until the deferred result is available.
    long l = lend().joinUninterruptibly();
    System.out.println("Zhang San borrowed from Li Si: " + l);
    System.out.println("Zhang San goes off to play cards");
  }

  public static Deferred<Long> lend() {
    System.out.println("Li Si says he'll check how much money he has at home");
    // This callback receives the result of check() and passes it through unchanged.
    class GetCB implements Callback<Long, Long> {
      @Override
      public Long call(final Long arg) throws Exception {
        return arg;
      }
    }
    return check().addCallback(new GetCB());
  }

  public static Deferred<Long> check() {
    System.out.println("Li Si is at home counting his money");
    try {
      Thread.sleep(10 * 1000);  // simulate a slow operation
    } catch (InterruptedException e) {
      e.printStackTrace();
    }
    return Deferred.fromResult(1000L);
  }
}