Untangling Evented Code with Ruby Fibers

By Ilya Grigorik on March 22, 2010

Event-driven programming requires a shift in how you architect a program, and it often directly affects your choice of language, drivers, and even frameworks. Most commonly found in highly interactive applications (GUIs, network servers, etc.), it usually implements the reactor pattern to handle concurrency instead of threads: the “reactor” is a main loop which is responsible for receiving all inbound events (network IO, filesystem, IPC, etc.) and demultiplexing them to the appropriate handlers or callbacks.
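
To make the "main loop plus handlers" idea concrete, here is a minimal sketch of a reactor in plain Ruby. ToyReactor, its on/emit/run methods, and the event names are all made up for illustration; a real reactor such as EventMachine waits on sockets and timers rather than draining an in-memory array.

```ruby
# Toy reactor (hypothetical, for illustration only): a single loop receives
# queued events and demultiplexes each one to the handlers registered for it.
class ToyReactor
  def initialize
    @handlers = Hash.new { |hash, type| hash[type] = [] }
    @events   = []
  end

  # Register a handler (callback) for one type of event.
  def on(type, &handler)
    @handlers[type] << handler
  end

  # Queue an inbound event; a real reactor would get these from the OS.
  def emit(type, payload)
    @events << [type, payload]
  end

  # The main loop: take the next event and dispatch it to its handlers.
  def run
    until @events.empty?
      type, payload = @events.shift
      @handlers[type].each { |handler| handler.call(payload) }
    end
  end
end

REACTOR_LOG = []

reactor = ToyReactor.new
reactor.on(:network)    { |data| REACTOR_LOG << "network: #{data}" }
reactor.on(:filesystem) { |path| REACTOR_LOG << "file changed: #{path}" }

reactor.emit(:network, "GET /")
reactor.emit(:filesystem, "/tmp/app.log")
reactor.run
```

Everything runs on one thread: handlers are invoked one at a time, in the order the events arrived, which is why a slow handler stalls the whole loop.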

Turns out, the reactor pattern performs extremely well under heavy loads (C10K challenge), hence the continuous rise in adoption (Nginx, Beanstalkd, EventMachine, Twisted, Node.js), but it does have its downsides: it requires reactor-aware libraries, it still relies on background processing capabilities for long-running computations, and last but not least, once you have nested several callbacks, it results in much more complicated code. Functional purists will disagree with the last statement – after all, we all love JavaScript, and Node.js is the new hot thing on the block – but what if we could write event-driven code without the added complexity imposed by hundreds of nested callbacks?

Accidental Complexity of Event-Driven Code


Anyone who has written a non-trivial event-driven application will be familiar with the following pattern: you often start reading your code bottom-up and then navigate your way up the callback chain. In addition, since there is no single execution context, each callback requires its own nested exception handling, which adds to the complexity: debugging is non-trivial, to say the least. Of course, this is usually not too bad in the context of a simple demo, but it quickly spirals out of control.


require 'eventmachine'
require 'em-http'

EventMachine.run {
  page = EventMachine::HttpRequest.new('http://google.ca/').get
  page.errback { p "Google is down! terminate?" }
  page.callback {
    about = EventMachine::HttpRequest.new('http://google.ca/search?q=eventmachine').get
    about.callback {
      # callback nesting, ad infinitum
    }
    about.errback {
      # error-handling code
    }
  }
}
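
The point about exception handling can be seen without EventMachine at all. In the sketch below, CALLBACKS is a hypothetical stand-in for the reactor's queue of pending callbacks: a rescue wrapped around the code that registers a callback never sees an exception raised when that callback eventually runs.

```ruby
CALLBACKS = []  # hypothetical stand-in for the reactor's pending callbacks
HANDLED   = []  # records which rescue clause actually caught the error

begin
  # Registering the callback succeeds; nothing is raised at this point.
  CALLBACKS << lambda { raise "request failed" }
rescue => e
  HANDLED << [:outer, e.message]  # never reached
end

# Later, the event loop invokes the callback from a different call stack,
# so the failure has to be handled here (or inside the callback itself).
CALLBACKS.each do |callback|
  begin
    callback.call
  rescue => e
    HANDLED << [:event_loop, e.message]
  end
end
```

Only the rescue around the callback invocation fires, which is why every level of a nested callback chain ends up carrying its own error handling.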


Call me old-fashioned, but I much prefer if-then-else control flow, with top-down execution and code I can actually read without callback gymnastics. And as luck would have it, these turn out not to be inconsistent requirements in Ruby 1.9. With the introduction of Fibers, our applications can do fully cooperative scheduling (worth a re-read to make sense of the rest), which with a little extra work also means that we can abstract much of the complexity of event-driven programming while maintaining all of its benefits!

Fibers & EventMachine: under the hood


Ruby 1.9 Fibers are a means of creating code blocks which can be paused and resumed by our application (think lightweight threads, minus the thread scheduler and with less overhead). Each fiber comes with a small 4KB stack, which makes them cheap to spin up, pause and resume. Best of all, a fiber can yield control and wait until someone else resumes it. I bet you see where we're going: start an async operation, yield the fiber, and then make the callback resume the fiber once the operation is complete. Let's wrap our async em-http client as an example:


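require 'eventmachine'
require 'em-http'
require 'fiber' # provides Fiber.current on Ruby 1.9

def http_get(url)
  f = Fiber.current
  http = EventMachine::HttpRequest.new(url).get

  # resume fiber once http call is done (on success or on error)
  http.callback { f.resume(http) }
  http.errback  { f.resume(http) }

  # pause here; returns the value passed to f.resume above
  return Fiber.yield
end

EventMachine.run do
  Fiber.new{
    page = http_get('http://www.google.com/')
    puts "Fetched page: #{page.response_header.status}"

    # only fetch the second page if the first request succeeded
    if page.response_header.status == 200
      page = http_get('http://www.google.com/search?q=eventmachine')
      puts "Fetched page 2: #{page.response_header.status}"
    end

    EventMachine.stop
  }.resume
end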


First thing to notice is that we are now executing our asynchronous code within a fiber (Fiber.new{}.resume), and our http_get method sets up the call, assigns the callbacks and then immediately yields control as it tries to return from the function. From there, EventMachine takes over, fetches the data in the background, and then calls the callback method, which in turn resumes our fiber, passing it the actual response. A little bit of fiber gymnastics, but it means that our original code with nested callbacks can now be unwound into a regular top-down execution context with if-then-else control flow. Not bad!
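
The same hand-off can be traced without a network or EventMachine. The sketch below is a toy under stated assumptions: PENDING stands in for the reactor's queue, and fake_async_get is a hypothetical helper that plays the role of http_get above.

```ruby
require 'fiber' # provides Fiber.current on Ruby 1.9; harmless on later versions

PENDING = []  # hypothetical stand-in for the reactor: callbacks to fire later
FETCHED = []  # responses in the order the fiber received them

# Pretend to start an async operation. The "response" is delivered later,
# when the toy event loop below gets around to running the callback.
def fake_async_get(url)
  fiber = Fiber.current
  PENDING << lambda { fiber.resume("response for #{url}") }
  Fiber.yield # pause here; evaluates to whatever is passed to fiber.resume
end

Fiber.new {
  # Reads top-down, even though each call pauses and resumes the fiber.
  FETCHED << fake_async_get('/a')
  FETCHED << fake_async_get('/b')
}.resume

# Toy event loop: fire callbacks until none are left.
PENDING.shift.call until PENDING.empty?
```

Each call to fake_async_get hands control back to the loop at Fiber.yield, and the callback's resume is what makes the method appear to return a value.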

EM-Synchrony: Evented Code With Less Pain

Of course, we wouldn't gain much if the net effect of introducing fibers into our event driven code was swapping callbacks for fiber gymnastics. Thankfully, we can do better because much of the underlying implementation can be easily abstracted at the level of the driver. Let's take a look at our new helper library, em-synchrony:

require 'em-synchrony'
require 'em-synchrony/em-http'

EventMachine.synchrony do
  page = EventMachine::HttpRequest.new("http://www.google.com").get

  p "No callbacks! Fetched page: #{page}"
  EventMachine.stop
end

Instead of invoking the default EM.run block, we call EM.synchrony, which in turn wraps our execution into a Ruby fiber behind the scenes. From there, the library also provides ready-made, fiber-aware classes for some of the most common use cases (http: em-http-request, mysql: em-mysqlplus, and memcached: remcached), as well as a fiber-aware connection pool, an iterator for concurrency control, and a multi-request interface. Let's take a look at an example which flexes all of the above:

require 'em-synchrony'
require 'em-synchrony/em-http'
require 'em-synchrony/em-mysqlplus' # assumes the em-mysqlplus driver described above

EM.synchrony do

  # open 4 concurrent MySQL connections
  db = EventMachine::Synchrony::ConnectionPool.new(size: 4) do
    EventMachine::MySQL.new(host: "localhost")
  end

  # perform 4 http requests in parallel, and collect responses
  multi = EventMachine::Synchrony::Multi.new
  multi.add :page1, EventMachine::HttpRequest.new("http://service.com/page1").aget
  multi.add :page2, EventMachine::HttpRequest.new("http://service.com/page2").aget
  multi.add :page3, EventMachine::HttpRequest.new("http://service.com/page3").aget
  multi.add :page4, EventMachine::HttpRequest.new("http://service.com/page4").aget
  data = multi.perform.responses[:callback].values

  # insert fetched HTTP data into a mysql database, using at most 2 connections at a time
  # - note that we're writing async code within the iterator!
  EM::Synchrony::Iterator.new(data, 2).each do |page, iter|
    # aquery returns a deferrable; attach the callback to it, not to the pool
    query = db.aquery("INSERT INTO table (data) VALUES(#{page});")
    query.callback { iter.next } # tell the iterator this item is done
  end

  puts "All done! Stopping event loop."
  EventMachine.stop
end

em-synchrony - Fiber aware EventMachine

Synchrony implements a common pattern: the original asynchronous methods which return a deferrable object are aliased with an "a" prefix (.get becomes .aget, .query becomes .aquery) to indicate that they are asynchronous, and the fiber-aware methods take their place as the defaults. This way, you can still mix sync and async code all in the same context (within an iterator, for example). For more examples of em-synchrony in action, take a look at the specs provided within the library itself.
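
As a rough illustration of that aliasing pattern (not em-synchrony's actual source), the sketch below uses two made-up classes: ToyDeferrable, a bare-bones deferrable, and ToyClient, whose asynchronous get is aliased to aget before a fiber-aware get takes its place. ToyClient::IN_FLIGHT is a hypothetical stand-in for the reactor.

```ruby
require 'fiber' # provides Fiber.current on Ruby 1.9; harmless on later versions

# Hypothetical minimal deferrable: stores one callback, fires it on succeed.
class ToyDeferrable
  def callback(&blk)
    @callback = blk
    self
  end

  def succeed(value)
    @callback.call(value) if @callback
  end
end

# Hypothetical async client: #get returns a deferrable immediately.
class ToyClient
  IN_FLIGHT = [] # stands in for the reactor's queue of pending operations

  def get(path)
    deferrable = ToyDeferrable.new
    IN_FLIGHT << lambda { deferrable.succeed("body of #{path}") }
    deferrable
  end
end

# The aliasing step: keep the async version as #aget and make #get
# fiber-aware, so it pauses the calling fiber instead of taking a callback.
class ToyClient
  alias_method :aget, :get

  def get(path)
    fiber = Fiber.current
    aget(path).callback { |body| fiber.resume(body) }
    Fiber.yield
  end
end

BODIES = []

Fiber.new {
  client = ToyClient.new
  BODIES << client.get('/one')                            # reads synchronously
  client.aget('/two').callback { |body| BODIES << body }  # still asynchronous
}.resume

# Toy event loop: complete in-flight operations until none are left.
ToyClient::IN_FLIGHT.shift.call until ToyClient::IN_FLIGHT.empty?
```

Both styles coexist on the same object, which is what allows async calls such as aget inside an otherwise synchronous-looking block.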

Towards Scalable & Manageable Event-Driven Code

Event-driven programming does not have to be complicated. With a little help from Ruby 1.9, much of the complexity is easily abstracted, which means that we can have all the benefits of event-driven IO without the overhead of a thread scheduler or complicated code. Node.js, Twisted, and other reactor frameworks have a lot going for them, but the combination of EventMachine and Ruby 1.9, to me, is a clear winner.

We should also mention that fibers have been ported to Ruby 1.8, but because they are implemented on top of green threads they incur much larger overhead; in other words, this is your reason to switch to Ruby 1.9! And last but not least, fibers or not, don't forget that Ruby 1.9 still has a GIL, which means that only one CPU core will be used. Until we see MVM (multi-VM) support, the solution is simply to run multiple reactors (one or more for each core).
