from:https://engineering.linkedin.com/play/play-framework-async-io-without-thread-pool-and-callback-hell
Under the hood, LinkedIn consists of hundreds of services that can be evolved and scaled independently. That is, all functionality is broken down into separate codebases, deployed on separate hardware, and exposed via well-defined APIs. For example, we may have separate front-end services (e.g. Profile, Skills) that talk to separate back-end services (e.g. profile-backend, skills-backend), which in turn talk to separate data services (e.g.Voldemort or Kafka).
In this architecture, our services spend most of their time calling other services and waiting on I/O. Most server technologies rely on a thread pool to handle I/O. Unfortunately, in high traffic environments, thread pools can be hard to manage: big thread pools have too much overhead, but small thread pools may run out of threads if there is a spike in latency or traffic. Either case can cause the service's latency to increase, which cascades to the other services that depend on it; now their thread pools are inadequate, and the problem spreads further. This is thread pool hell.
Recently, we started using the Play Framework to build our services. Play is built on top ofAkka and Netty, so it’s fully asynchronous: that is, you can use non-blocking I/O when making calls to remote services. This makes it possible to support many concurrent connections without dealing with thread pools. It also makes it easy to make I/O calls in parallel to improve performance and enables the use of real-time, streaming, and server push technologies such as WebSockets.
However, if you’re not careful with asynchronous code, you may now find yourself in a new kind of hell: deeply nested spaghetti code that jumps all over the place, AKA callback hell (JavaScript, I’m looking at you). Fortunately, Play and Scala offer powerful abstractions to make it easy to write clean and performant async code that is easy to reason about (for the record, libraries like async and jQuery's deferred bring some of these features to JavaScript as well).
In this post, I’ll go over some of the basics of asynchronous programming, the tools that are available in Play 2.1 and Scala, and the patterns that make it easy to manage non-blocking code.
Under the hood, LinkedIn consists of hundreds of services that can be evolved and scaled independently. That is, all functionality is broken down into separate codebases, deployed on separate hardware, and exposed via well-defined APIs. For example, we may have separate front-end services (e.g. Profile, Skills) that talk to separate back-end services (e.g. profile-backend, skills-backend), which in turn talk to separate data services (e.g.Voldemort or Kafka).
In this architecture, our services spend most of their time calling other services and waiting on I/O. Most server technologies rely on a thread pool to handle I/O. Unfortunately, in high traffic environments, thread pools can be hard to manage: big thread pools have too much overhead, but small thread pools may run out of threads if there is a spike in latency or traffic. Either case can cause the service's latency to increase, which cascades to the other services that depend on it; now their thread pools are inadequate, and the problem spreads further. This is thread pool hell.
Recently, we started using the Play Framework to build our services. Play is built on top ofAkka and Netty, so it’s fully asynchronous: that is, you can use non-blocking I/O when making calls to remote services. This makes it possible to support many concurrent connections without dealing with thread pools. It also makes it easy to make I/O calls in parallel to improve performance and enables the use of real-time, streaming, and server push technologies such as WebSockets.
However, if you’re not careful with asynchronous code, you may now find yourself in a new kind of hell: deeply nested spaghetti code that jumps all over the place, AKA callback hell (JavaScript, I’m looking at you). Fortunately, Play and Scala offer powerful abstractions to make it easy to write clean and performant async code that is easy to reason about (for the record, libraries like async and jQuery's deferred bring some of these features to JavaScript as well).
In this post, I’ll go over some of the basics of asynchronous programming, the tools that are available in Play 2.1 and Scala, and the patterns that make it easy to manage non-blocking code.
Threaded vs. Evented Primer
We’ll start with a quick discussion of how different web frameworks handle concurrent requests, I/O, and threads. Bear in mind that this is just a high level overview that glosses over some details; as with any discussion of performance, results vary depending on the use case, hardware, OS, and programming language.
There are two general approaches:
- Threaded: Servlets, Ruby on Rails, Django, PHP
- Evented: Play, Node.js, Twisted
Most developers are used to working with Threaded servers: they dedicate one thread to each request, and that thread does all the processing for the request until a response is sent. Any I/O, such as a call to a remote service, is typically synchronous, which means the thread will become blocked and sit around idle until the I/O completes.
Here's an example of some blocking Scala code:
The code stops execution (blocks) at any line of code that is executing I/O and only executes the next line when the I/O is done, at which point the return value contains the data you requested. The code is executed (and read) from top to bottom.
By contrast, Evented servers typically use only a single thread per CPU core. The idea is to ensure that these scarce threads are never blocked: all I/O is asynchronous, so instead of waiting, the thread can process other requests, and only come back to the current one when the response from the I/O call is ready.
Here's an example of some non-blocking JavaScript code:
An I/O request fires, but the code continues on to the next line before the I/O actually completes. You typically provide some mechanism, such as a callback, to tell the framework to "execute this code later", when the data you requested is available. The flow of the code can no longer be read top to bottom, since a callback in the middle of the code may fire well after all the lines below it have already executed.
We’ll start with a quick discussion of how different web frameworks handle concurrent requests, I/O, and threads. Bear in mind that this is just a high level overview that glosses over some details; as with any discussion of performance, results vary depending on the use case, hardware, OS, and programming language.
There are two general approaches:
- Threaded: Servlets, Ruby on Rails, Django, PHP
- Evented: Play, Node.js, Twisted
Most developers are used to working with Threaded servers: they dedicate one thread to each request, and that thread does all the processing for the request until a response is sent. Any I/O, such as a call to a remote service, is typically synchronous, which means the thread will become blocked and sit around idle until the I/O completes.
Here's an example of some blocking Scala code:
The code stops execution (blocks) at any line of code that is executing I/O and only executes the next line when the I/O is done, at which point the return value contains the data you requested. The code is executed (and read) from top to bottom.
By contrast, Evented servers typically use only a single thread per CPU core. The idea is to ensure that these scarce threads are never blocked: all I/O is asynchronous, so instead of waiting, the thread can process other requests, and only come back to the current one when the response from the I/O call is ready.
Here's an example of some non-blocking JavaScript code:
An I/O request fires, but the code continues on to the next line before the I/O actually completes. You typically provide some mechanism, such as a callback, to tell the framework to "execute this code later", when the data you requested is available. The flow of the code can no longer be read top to bottom, since a callback in the middle of the code may fire well after all the lines below it have already executed.
Threaded vs. Evented performance
Threads can have significant memory overhead (e.g. default stack size for a single thread is 1MB on a 64bit JVM) and context switching overhead (e.g. saving register state, loading register state, impact on CPU cache/pipeline, lock contention). Creating threads on the fly tends to be expensive, so most servers use a fixed thread pool.
Therefore, with Threaded servers, the name of the game is "sizing the thread pool". If you don't have enough threads, it’s easy for all of them to become tied up waiting for I/O, preventing any new requests from being processed even though most of your threads are just idly waiting. If you have too many threads, the extra memory usage and context switching overhead become very costly. Ideally, you want the thread pool to have as many threads as the maximum number of concurrent calls your server will handle. Unfortunately, this number changes as traffic and/or downstream latency increases.
In other words, you're trying to configure a statically sized thread pool, but the proper value depends on something very dynamic (latency and traffic). Worse yet, you need to configure this thread pool on hundreds of different services, each of which talk to each other, impact each other's latency and traffic, and therefore, affect the very thread pool values you're trying to pick! This is thread pool hell.
On Evented servers, waiting for I/O is very cheap: idle requests have negligible cost, as they don’t hold up a thread. This is important, because waiting on I/O will typically dwarf all other server operations: for example, check out Latency Numbers Every Programmer Should Know and compare the time for a single roundtrip in a data center (500,000ns) to any operation within the same server (~5ns).
This means that for typical workloads, Evented servers will be able to handle far more concurrent requests than Threaded servers. However, be warned: even a single long calculation or accidental blocking I/O call can bring an Evented server to its knees. Check out Threaded vs. Evented Servers for some simple mathematical models of how threaded and evented servers behave under different load profiles.
Threads can have significant memory overhead (e.g. default stack size for a single thread is 1MB on a 64bit JVM) and context switching overhead (e.g. saving register state, loading register state, impact on CPU cache/pipeline, lock contention). Creating threads on the fly tends to be expensive, so most servers use a fixed thread pool.
Therefore, with Threaded servers, the name of the game is "sizing the thread pool". If you don't have enough threads, it’s easy for all of them to become tied up waiting for I/O, preventing any new requests from being processed even though most of your threads are just idly waiting. If you have too many threads, the extra memory usage and context switching overhead become very costly. Ideally, you want the thread pool to have as many threads as the maximum number of concurrent calls your server will handle. Unfortunately, this number changes as traffic and/or downstream latency increases.
In other words, you're trying to configure a statically sized thread pool, but the proper value depends on something very dynamic (latency and traffic). Worse yet, you need to configure this thread pool on hundreds of different services, each of which talk to each other, impact each other's latency and traffic, and therefore, affect the very thread pool values you're trying to pick! This is thread pool hell.
On Evented servers, waiting for I/O is very cheap: idle requests have negligible cost, as they don’t hold up a thread. This is important, because waiting on I/O will typically dwarf all other server operations: for example, check out Latency Numbers Every Programmer Should Know and compare the time for a single roundtrip in a data center (500,000ns) to any operation within the same server (~5ns).
This means that for typical workloads, Evented servers will be able to handle far more concurrent requests than Threaded servers. However, be warned: even a single long calculation or accidental blocking I/O call can bring an Evented server to its knees. Check out Threaded vs. Evented Servers for some simple mathematical models of how threaded and evented servers behave under different load profiles.
Threaded vs. Evented programming models
As we saw with the Scala and JavaScript examples above, for very simple cases, the Evented (asynchronous) code is generally more complicated than Threaded (synchronous) code. However, in most real-world scenarios, you’ll have to make several I/O calls, and to make them fast, you’ll need to do them in parallel.
In the Threaded world, performing I/O in parallel usually involves setting up a separate thread pool (more thread pool hell!). However, that means your I/O is now executing asynchronously, so, just as in the Evented world, you now need a way to say "execute this code later". At this point, you either switch to an Evented programming style, or you bolt on some solution that feels Threaded, such as Java Futures (with blocking get()
calls) orcontinuations. In either case, complexity goes up and flexibility goes down.
In the Evented world, all I/O runs in parallel by default and "execute this code later" is a core concept that will feel natural and consistent. The code will "scale" better, following the same patterns for both simple and complicated examples. We'll see some examples of Evented code in Play in the next several sections.
As we saw with the Scala and JavaScript examples above, for very simple cases, the Evented (asynchronous) code is generally more complicated than Threaded (synchronous) code. However, in most real-world scenarios, you’ll have to make several I/O calls, and to make them fast, you’ll need to do them in parallel.
In the Threaded world, performing I/O in parallel usually involves setting up a separate thread pool (more thread pool hell!). However, that means your I/O is now executing asynchronously, so, just as in the Evented world, you now need a way to say "execute this code later". At this point, you either switch to an Evented programming style, or you bolt on some solution that feels Threaded, such as Java Futures (with blocking get()
calls) orcontinuations. In either case, complexity goes up and flexibility goes down.
In the Evented world, all I/O runs in parallel by default and "execute this code later" is a core concept that will feel natural and consistent. The code will "scale" better, following the same patterns for both simple and complicated examples. We'll see some examples of Evented code in Play in the next several sections.
Async code in Play
The Play Framework uses an MVC pattern, which means most of the logic for I/O will live in the Controllers
and Models
. They have identical behavior in terms of async code, so for this blog post, we’ll just focus on Controllers
.
A Controller
in Play consists of actions, or methods, that return some sort of Result
:
The example above uses the Ok
method, which is a convenient way to return a Result
that’s a 200 OK
with some text.
Let’s now look at a Controller
with some asynchronous I/O. We’ll use Play’s Web Services (WS
) library to make a non-blocking HTTP call to a remote web service (in this case, example.com):
The WS.get()
method returns a Future[Response]
. This is a Scala Future, which represents a class that will eventually contain the result of an asynchronous operation (in this case, the Response
from example.com). When it’s available, we’d like to use it to build our Play Result
, but how do we go from a Future
to a Result
?
The first step is to transform the Future[Response]
into a Future[Result]
. We can do this using a method called map
(more on map
later):
Next, we need to convert the Future[Result]
into a Result
. This is easy: Play has a subclass of Result
called AsyncResult
, which can be created by passing aFuture[Result]
to the Async
function:
If you run the code above, it'll proxy example.com:
It’s important to understand that this code does not block at any point. You can demonstrate this by adding a few log statements:
Most of the time, this will be the console output:
Under the hood, Play uses a thread pool sized to one thread per CPU core. One of these scarce threads, T1
, executes the proxy
action, running through the code from top to bottom, except the contents of the function passed to the map
method, since that depends on a non-blocking I/O call that has not yet completed. Once T1
returns the AsyncResult
, it moves on to process other requests. Later on, when the response from example.com is finally available, another thread T2
(which may or may not be the same as T1
) executes the function passed to the map
method. At no point were either of the threads blocked waiting on the response from example.com.
The Play Framework uses an MVC pattern, which means most of the logic for I/O will live in the Controllers
and Models
. They have identical behavior in terms of async code, so for this blog post, we’ll just focus on Controllers
.
A Controller
in Play consists of actions, or methods, that return some sort of Result
:
The example above uses the Ok
method, which is a convenient way to return a Result
that’s a 200 OK
with some text.
Let’s now look at a Controller
with some asynchronous I/O. We’ll use Play’s Web Services (WS
) library to make a non-blocking HTTP call to a remote web service (in this case, example.com):
The WS.get()
method returns a Future[Response]
. This is a Scala Future, which represents a class that will eventually contain the result of an asynchronous operation (in this case, the Response
from example.com). When it’s available, we’d like to use it to build our Play Result
, but how do we go from a Future
to a Result
?
The first step is to transform the Future[Response]
into a Future[Result]
. We can do this using a method called map
(more on map
later):
Next, we need to convert the Future[Result]
into a Result
. This is easy: Play has a subclass of Result
called AsyncResult
, which can be created by passing aFuture[Result]
to the Async
function:
If you run the code above, it'll proxy example.com:
It’s important to understand that this code does not block at any point. You can demonstrate this by adding a few log statements:
Most of the time, this will be the console output:
Under the hood, Play uses a thread pool sized to one thread per CPU core. One of these scarce threads, T1
, executes the proxy
action, running through the code from top to bottom, except the contents of the function passed to the map
method, since that depends on a non-blocking I/O call that has not yet completed. Once T1
returns the AsyncResult
, it moves on to process other requests. Later on, when the response from example.com is finally available, another thread T2
(which may or may not be the same as T1
) executes the function passed to the map
method. At no point were either of the threads blocked waiting on the response from example.com.
What’s this "map" thing all about?
Some people will find that using a function called map
to convert a Future[Response]
into a Future[Result]
may seem a little strange. Let’s try to build some intuition around it.
map
is actually a functional programming concept that's easiest to understand when thinking about Lists. If you call map
on a List
, you get back a new List
where each element of the new List
is the result of applying a function to each element of the originalList
. For example, consider this lower
function:
What happens if we map
this function over a List
of Strings?
We get back a new List
of Strings that are all lower case. In other words, we transformed a List[String]
into a new List[String]
.
It’s also possible to use map
to convert a List[X]
into a List of some other type, such as a List[Y]
. For example, consider this strlen
function:
What happens if we map
this function over the same List
of Strings?
We now get back a new List
, where each element represents the length of a String in the original List
. That is, we’ve now transformed a List[String]
into a new List[Int]
.
Let’s look at one more case: a function called explode
that returns a List
of all the characters in the given String:
Look at what happens if we map
our List
of Strings over explode
:
We’ve now transformed a List[String]
into a new List[List[Char]]
. What if we didn’t want to have the nested List-in-a-List? To do that, we can use flatMap
instead of map
:
As the name implies, flatMap
behaves just like map
, except it combines (flattens) the nested Lists
into a single, flat List
.
The map
and flatMap
functions aren’t unique to Lists
. They are just as useful for other types of "containers", such as Array
, Set
, Map
, and so on:
As it turns out, a Future[T]
is also a "container": it’s a bit of a strange one, in that it contains just one element, but the same map
and flatMap
functions can be used to transform a Future[X]
into a Future[Y]
:
One detail that we’ve glossed over is that the "container" class controls when it actually calls the functions passed into map
and flatMap
. A List
may call the function immediately. However, a Future
will only do it when the I/O has completed and data is available.
This is ok, because the return value for Future.map
and Future.flatMap
is anotherFuture
. You can immediately transform, compose, and reason about this returned value because it also will control when it calls any functions passed to its map
, flatMap
, and other methods.
Some people will find that using a function called map
to convert a Future[Response]
into a Future[Result]
may seem a little strange. Let’s try to build some intuition around it.
map
is actually a functional programming concept that's easiest to understand when thinking about Lists. If you call map
on a List
, you get back a new List
where each element of the new List
is the result of applying a function to each element of the originalList
. For example, consider this lower
function:
What happens if we map
this function over a List
of Strings?
We get back a new List
of Strings that are all lower case. In other words, we transformed a List[String]
into a new List[String]
.
It’s also possible to use map
to convert a List[X]
into a List of some other type, such as a List[Y]
. For example, consider this strlen
function:
What happens if we map
this function over the same List
of Strings?
We now get back a new List
, where each element represents the length of a String in the original List
. That is, we’ve now transformed a List[String]
into a new List[Int]
.
Let’s look at one more case: a function called explode
that returns a List
of all the characters in the given String:
Look at what happens if we map
our List
of Strings over explode
:
We’ve now transformed a List[String]
into a new List[List[Char]]
. What if we didn’t want to have the nested List-in-a-List? To do that, we can use flatMap
instead of map
:
As the name implies, flatMap
behaves just like map
, except it combines (flattens) the nested Lists
into a single, flat List
.
The map
and flatMap
functions aren’t unique to Lists
. They are just as useful for other types of "containers", such as Array
, Set
, Map
, and so on:
As it turns out, a Future[T]
is also a "container": it’s a bit of a strange one, in that it contains just one element, but the same map
and flatMap
functions can be used to transform a Future[X]
into a Future[Y]
:
One detail that we’ve glossed over is that the "container" class controls when it actually calls the functions passed into map
and flatMap
. A List
may call the function immediately. However, a Future
will only do it when the I/O has completed and data is available.
This is ok, because the return value for Future.map
and Future.flatMap
is anotherFuture
. You can immediately transform, compose, and reason about this returned value because it also will control when it calls any functions passed to its map
, flatMap
, and other methods.
Composing futures with map and flatMap
map
and flatMap
make it easy to transform and compose Futures
. We saw above how to transform a single Future[Response]
into a Future[Result]
using map
:
What if you wanted to fire off two WS
requests in parallel? Here is an example that calls two search engines and measures how long it took to get a response from each:
What if one WS
request depends on the other? Here is how you could use map
andflatMap
to arrange async I/O sequentially:
map
and flatMap
make it easy to transform and compose Futures
. We saw above how to transform a single Future[Response]
into a Future[Result]
using map
:
What if you wanted to fire off two WS
requests in parallel? Here is an example that calls two search engines and measures how long it took to get a response from each:
What if one WS
request depends on the other? Here is how you could use map
andflatMap
to arrange async I/O sequentially:
Error handling, timeouts, and composition
Since non-blocking calls in Play return a Future
, we can perform certain types of composition that are harder in a callback-only world.
For example, you can easily add error handling to a Future
using the recover
method. This lets you specify a function to handle the error and return a new Future
that will contain a fallback value if an error occurred:
It's also possible to pass the Future
through some timeout-handling code, which will transform it into one that will have the value your requested if the Future
is redeemed within a certain time limit, or a fallback value if the I/O takes too long:
You can also compose a Future
with performance measuring code that will transform it into one that records how long the request took:
Finally, you can wrap all of this "instrumentation" code into a single withInstrumentation
function:
Working with Future
objects instead of callbacks makes composition easier. The "instrumentation" code above is fairly generic and reusable, so you can add it to any app without significantly modifying the code, without changing the signature of every callback function, and so on.
Since non-blocking calls in Play return a Future
, we can perform certain types of composition that are harder in a callback-only world.
For example, you can easily add error handling to a Future
using the recover
method. This lets you specify a function to handle the error and return a new Future
that will contain a fallback value if an error occurred:
It's also possible to pass the Future
through some timeout-handling code, which will transform it into one that will have the value your requested if the Future
is redeemed within a certain time limit, or a fallback value if the I/O takes too long:
You can also compose a Future
with performance measuring code that will transform it into one that records how long the request took:
Finally, you can wrap all of this "instrumentation" code into a single withInstrumentation
function:
Working with Future
objects instead of callbacks makes composition easier. The "instrumentation" code above is fairly generic and reusable, so you can add it to any app without significantly modifying the code, without changing the signature of every callback function, and so on.
Comprehensions
While map
and flatMap
are powerful concepts, using them for complicated cases can lead to deep nested structures. For example, let’s say we wanted to do 3 sequential, non-blocking calls. With map
and flatMap
, it would look something like this:
Yuck, this looks an awful lot like the callback hell we wanted to avoid! Fortunately, there is a better way. Scala supports comprehensions, which are effectively syntactic sugar formap
and flatMap
. The code above can be elegantly rewritten like this:
Much better, right? This pattern scales nicely too, looking much the same even with twice as many sequential steps:
Ok, so how about doing the 3 calls in parallel instead? You can use a comprehension again, but this time, call the WS.get()
method outside of it:
Woohoo, no nesting and callback hell! The code is easy to follow and consistent whether you’re doing parallel or sequential calls. It’s also easy to extend: we can easily combine it with the withInstrumentation
function from the previous section:
And there you have it: fully parallelized, non-blocking I/O with error handling, performance measurement, and SLAs. All in a few lines of code that are easy to read and reason about.
While map
and flatMap
are powerful concepts, using them for complicated cases can lead to deep nested structures. For example, let’s say we wanted to do 3 sequential, non-blocking calls. With map
and flatMap
, it would look something like this:
Yuck, this looks an awful lot like the callback hell we wanted to avoid! Fortunately, there is a better way. Scala supports comprehensions, which are effectively syntactic sugar formap
and flatMap
. The code above can be elegantly rewritten like this:
Much better, right? This pattern scales nicely too, looking much the same even with twice as many sequential steps:
Ok, so how about doing the 3 calls in parallel instead? You can use a comprehension again, but this time, call the WS.get()
method outside of it:
Woohoo, no nesting and callback hell! The code is easy to follow and consistent whether you’re doing parallel or sequential calls. It’s also easy to extend: we can easily combine it with the withInstrumentation
function from the previous section:
And there you have it: fully parallelized, non-blocking I/O with error handling, performance measurement, and SLAs. All in a few lines of code that are easy to read and reason about.
Future[MoreToCome]
This is just a taste of async programming in Play. For more information, check out the documentation for Asynchronous Http Programming and Handling Data Streams Reactively, as well as the comet-clock, comet-live-monitoring, and websocket-chatexample apps.
If you found this interesting, make sure to come back for future posts, where we’ll discuss async programming with Java, real time web-apps, and more.
This is just a taste of async programming in Play. For more information, check out the documentation for Asynchronous Http Programming and Handling Data Streams Reactively, as well as the comet-clock, comet-live-monitoring, and websocket-chatexample apps.
If you found this interesting, make sure to come back for future posts, where we’ll discuss async programming with Java, real time web-apps, and more.