PostRank recently released a new Ruby web server: Goliath. It uses an event loop in the same manner as node.js and nginx to achieve high levels of concurrency, but adds some special sauce that allows traditionally complicated asynchronous code to be written in a synchronous style.
For example, asynchronous code in Ruby typically looks like this (using the eventmachine library):

    require 'eventmachine'
    require 'em-http'

    EM.run {
      # example URL; any site will do
      EM::HttpRequest.new('http://google.com/').get.callback {|http|
        puts http.response
      }
    }

This is neat in that it allows the application to do other things while the HTTP request completes (it is “non-blocking”), but to fetch two sites in succession, you need to nest callbacks:

    # example URL; any site will do
    EM::HttpRequest.new('http://google.com/').get.callback {|http|
      # extract_next_url is a fake method, you get the idea
      url = extract_next_url(http.response)

      EM::HttpRequest.new(url).get.callback {|http2|
        puts http2.response
      }
    }

As you can imagine, this pattern gets messy fast. Goliath allows us to write the above code in the simple synchronous fashion we are familiar with:

    # example URL; any site will do
    http = EM::HttpRequest.new('http://google.com/').get
    # extract_next_url is a fake method, you get the idea
    url = extract_next_url(http.response)
    http2 = EM::HttpRequest.new(url).get

…yet behind the scenes it still executes asynchronously! Other code can be run while the HTTP requests are running.
This blows my mind. How does it work? Let’s find out.
Fibers
From the documentation, Goliath claims to work its magic by “leveraging Ruby fibers introduced in Ruby 1.9+”. This first hint sends us to the Ruby rdocs to find:
Fibers are primitives for implementing light weight cooperative concurrency in Ruby. Basically they are a means of creating code blocks that can be paused and resumed, much like threads. The main difference is that they are never preempted and that the scheduling must be done by the programmer and not the VM.
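To make that description concrete, here is a minimal sketch in plain Ruby, with no Goliath or EventMachine involved. The method name fiber_demo and the log messages are made up for illustration; only Fiber.new, #resume and Fiber.yield come from Ruby itself.

```ruby
require 'fiber' # harmless on newer Rubies, needed for some Fiber methods on older ones

# Records the order in which the main program and the fiber run.
def fiber_demo
  log = []

  fiber = Fiber.new do
    log << "fiber: step 1"
    Fiber.yield            # pause here; control returns to whoever called #resume
    log << "fiber: step 2"
  end

  log << "main: before first resume"
  fiber.resume             # runs the block up to Fiber.yield
  log << "main: between resumes"
  fiber.resume             # continues after Fiber.yield until the block ends
  log << "main: done"

  log
end

puts fiber_demo
```

Nothing runs inside the fiber until the main program explicitly resumes it, and the fiber decides for itself when to hand control back. That is the "scheduling must be done by the programmer" part of the quote.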
Urgh, too many big words. Let’s just dive in and start poking around the Goliath code. The Goliath documentation contains a full example for proxying a site:

    require 'goliath'
    require 'em-synchrony'
    require 'em-synchrony/em-http'

    class HelloWorld < Goliath::API
      def response(env)
        req = EM::HttpRequest.new("http://www.google.com/").get
        resp = req.response

        [200, {}, resp]
      end
    end

    # to play along at home:
    # $ gem install goliath
    # $ gem install em-http-request --pre
    # $ ruby hello_world.rb -sv

We know that for this to occur in an asynchronous manner there must be some funny business going on in that #get call, so let’s try and find that. My spider sense tells me it will be somewhere in em-synchrony/em-http…

    $ gem unpack em-synchrony
    Unpacked gem: '/Users/xavier/Code/tmp/em-synchrony-0.3.0.beta.1'
    $ cd em-synchrony-0.3.0.beta.1
    # I used tab completion on the next line to find the exact path
    $ cat lib/em-synchrony/em-http.rb

That reveals:

    # em-synchrony/lib/em-synchrony/em-http.rb
    begin
      require "em-http"
    rescue LoadError => error
      raise "Missing EM-Synchrony dependency: gem install em-http-request"
    end

    module EventMachine
      module HTTPMethods
        %w[get head post delete put].each do |type|
          class_eval %[
            alias :a#{type} :#{type}
            def #{type}(options = {}, &blk)
              f = Fiber.current

              conn = setup_request(:#{type}, options, &blk)
              conn.callback { f.resume(conn) }
              conn.errback  { f.resume(conn) }

              Fiber.yield
            end
          ]
        end
      end
    end

Jackpot! Fibers! It appears to be monkey-patching the existing em-http library, so before we go too much further let’s find out what normal em-http code looks like without fibers. There is a handy example on the em-http-request wiki:

    EventMachine.run {
      http = EventMachine::HttpRequest.new('http://google.com/').get :query => {'keyname' => 'value'}

      http.errback { p 'Uh oh'; EM.stop }
      http.callback {
        p http.response_header.status
        p http.response_header
        p http.response

        EventMachine.stop
      }
    }

It looks very similar to the code above, which is promising, and when we dig in a bit further it becomes even more so.

    $ gem unpack em-http
    ERROR:  While executing gem ... (Gem::RemoteFetcher::FetchError)
        SocketError: getaddrinfo: nodename nor servname provided, or not known (http://rubygems.org/latest_specs.4.8.gz)
    # Oh noes it doesn't work!
    # Search for em gems
    $ gem list em-

    *** LOCAL GEMS ***

    em-http-request (1.0.0.beta.2, 0.3.0)
    em-socksify (0.1.0)
    em-synchrony (0.3.0.beta.1)

    $ gem unpack em-http-request
    # Ah that is probably it
    $ cd em-http-request-1.0.0.beta.2
    $ ack "get" lib/
    lib/em-http/http_connection.rb
    4:    def get options = {}, &blk; setup_request(:get, options, &blk); end

Note on the last line that get defers straight to setup_request, which is the same call that is made in the fiber example above. Yep, pretty much the same. Now we can head back to the fiber code.

    f = Fiber.current
    conn = setup_request(:#{type}, options, &blk)
    conn.callback { f.resume(conn) }
    conn.errback  { f.resume(conn) }
    Fiber.yield

It appears that rather than doing any work itself when a callback triggers, the method calls resume on the fiber it captured earlier, which presumably picks execution back up at the point where Fiber.yield was called. Checking the documentation for Fiber.yield validates this, and its last sentence also explains how the conn variable ends up being returned from this method:
Yields control back to the context that resumed the fiber, passing along any arguments that were passed to it. The fiber will resume processing at this point when resume is called next. Any arguments passed to the next resume will be the value that this Fiber.yield expression evaluates to.
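Here is a small sketch of that hand-off in plain Ruby, so it can be run without any gems. PENDING, fake_async_fetch, run_pending, sync_fetch and chained_fetch_demo are all hypothetical names invented for this example; the queue of lambdas only stands in for an event loop and is not how EventMachine is implemented.

```ruby
require 'fiber' # harmless on newer Rubies, provides Fiber.current on older ones

# Hypothetical stand-in for an event loop: callbacks queued here are fired
# later by run_pending.
PENDING = []

# Pretends to start a request: returns immediately and "completes" later.
def fake_async_fetch(url, &callback)
  PENDING << -> { callback.call("response from #{url}") }
  nil
end

def run_pending
  PENDING.shift.call until PENDING.empty?
end

# Same shape as the em-synchrony method: register a callback that resumes
# the current fiber, then pause. The argument passed to resume becomes the
# value of Fiber.yield, and so the return value of this method.
def sync_fetch(url)
  f = Fiber.current
  fake_async_fetch(url) { |body| f.resume(body) }
  Fiber.yield
end

def chained_fetch_demo
  results = []
  Fiber.new do
    first  = sync_fetch("a") # reads like blocking code
    second = sync_fetch("b")
    results << first << second
  end.resume
  run_pending                # the "event loop" fires the callbacks
  results
end

p chained_fetch_demo
```

The code inside the fiber never mentions callbacks, yet each sync_fetch only returns once the fake event loop has fired the matching callback.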
Using it
We now have an idea of how Goliath works its magic, though it may still be a fuzzy one. Let’s see if we have it right by trying to write some code that emulates it.
Remember that this fiber trick is simply a way of simplifying callback-littered code, so we should be able to first write a non-fiber-aware method and then clean it up. I like to start with a dirt simple example, so we are going to write a basic Goliath class that blocks for one second then renders some text.

    class Surprise < Goliath::API
      def response(env)
        sleep 1
        [200, {}, "Surprise!"]
      end
    end

Hit that in your web browser and bingo, it waits for a second. Not so fast though, tiger. What happens when we issue multiple simultaneous requests?

    $ ab -n 3 -c 3 127.0.0.1:9000/ | grep "Time taken"
    Time taken for tests:   3.011 seconds

Alas, our webserver was only serving one request at a time. That’s not web scale. The sleep call not only blocks our response, but the entire server. That’s why we moved to evented programming in the first place. Let’s try a classic EventMachine timer instead:

    class Surprise < Goliath::API
      def response(env)
        EventMachine.add_timer 1, proc {
          [200, {}, "Surprise!"]
        }
      end
    end

Of course this does not work, because the #response method needs to appear synchronous. What happens in this case is that #add_timer returns straight away with a value that is not our response array, and Goliath immediately tries to render that, exploding in the process. The timer triggers sometime later, and no code is still around to care. We cannot send the result of our timer proc as the return value for the method.
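The same problem can be reproduced in a few lines of plain Ruby. In this sketch SCHEDULED, fake_add_timer and naive_response are hypothetical names, and the :timer_handle return value is made up; it only illustrates that whatever a timer call hands back, it is not the value the proc will eventually produce.

```ruby
# Hypothetical stand-in for a timer API: it stores the callback and hands
# back a handle straight away. Not EventMachine's real implementation.
SCHEDULED = []

def fake_add_timer(_delay, callback)
  SCHEDULED << callback
  :timer_handle
end

# Same shape as the broken #response method: the last expression is the
# timer call, so its return value is what the caller receives.
def naive_response
  fake_add_timer(1, proc { [200, {}, "Surprise!"] })
end

returned = naive_response                  # what the server would try to render
fired    = SCHEDULED.map { |cb| cb.call }  # the timer going off later

puts "returned to the server: #{returned.inspect}"
puts "produced by the timer:  #{fired.inspect}"
```

The array we actually wanted only exists once the proc runs, and by then the method has long since returned something else.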
We need to combine the synchronous nature of the first example with the asynchronous elements of the second: a beautiful Frankenstein. Hopefully you have caught on that we can use fibers to do the stitching.

    class Surprise < Goliath::API
      def response(env)
        f = Fiber.current

        EventMachine.add_timer 1, proc { f.resume }
        Fiber.yield
        [200, {}, "Surprise!"]
      end
    end

We steal the pattern we saw in em-synchrony/em-http above, grabbing the current fiber and setting up a resume call in the asynchronous callback which resumes execution over at the Fiber.yield. Testing this with ab, we see that this indeed solves our concurrency issue:

    $ ab -n 3 -c 3 127.0.0.1:9000/ | grep "Time taken"
    Time taken for tests:   1.009 seconds

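To see why three requests can overlap like this, here is a toy simulation in plain Ruby. TinyReactor, handle_request and serve_three are hypothetical names, the clock is virtual so the example finishes instantly, and running each request in its own fiber is my assumption about what the server must be doing rather than something read from Goliath's source.

```ruby
require 'fiber' # harmless on newer Rubies, provides Fiber.current on older ones

# Hypothetical miniature reactor with a virtual clock. A stand-in for
# EventMachine, not its real API.
class TinyReactor
  attr_reader :now

  def initialize
    @now = 0
    @seq = 0
    @timers = []
  end

  def add_timer(delay, &blk)
    @seq += 1
    @timers << [@now + delay, @seq, blk]
  end

  # Fire timers in due-time order, moving the virtual clock forward.
  def run
    until @timers.empty?
      timer = @timers.min_by { |due, seq, _blk| [due, seq] }
      @timers.delete(timer)
      @now = timer[0]
      timer[2].call
    end
  end
end

# Same pattern as Surprise#response: schedule a resume, then pause.
def handle_request(reactor, id)
  f = Fiber.current
  reactor.add_timer(1) { f.resume }
  Fiber.yield
  [200, {}, "Surprise! (request #{id}, t=#{reactor.now})"]
end

def serve_three
  reactor = TinyReactor.new
  responses = []
  3.times do |id|
    # Assumption: one fiber per incoming request.
    Fiber.new { responses << handle_request(reactor, id) }.resume
  end
  reactor.run
  [reactor.now, responses]
end

elapsed, responses = serve_three
puts "virtual seconds elapsed: #{elapsed}"
responses.each { |r| puts r.last }
```

All three handlers register their timers and pause before the reactor fires anything, so their one-second waits overlap instead of queueing up behind each other.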
These fiber things are pretty cool.
Wrapping Up
In exploring the Goliath source code and associated libraries we discovered how it pulls off its asynchronous-masquerading-as-synchronous trick, and were able to put that knowledge into practice with a simple example.
To practice your code reading, here are some other research tasks for you to try:
- Find where Goliath calls into the #response method and see if there are any other lurking fiber tricks to be found.
- Investigate one of the other libraries that em-synchrony provides an API for, such as em-mongo.
- Rack-fiber_pool uses fibers in a similar context, check it out and see what it is getting up to.