Although this is an older article, I still found it quite interesting, so I'm bookmarking it here. Original article: http://developer.yahoo.com/blogs/ydn/posts/2010/07/multicore_http_server_with_nodejs/
NodeJS has been garnering a lot of attention of late. Briefly, NodeJS is a server-side JavaScript runtime implemented using Google's highly performant V8 engine. It provides an (almost) completely non-blocking I/O stack which, when combined with JavaScript closures and anonymous functions, makes it an excellent platform for implementing high-throughput web services. As an example, a simple "Hello, world!" application written for NodeJS performs comparably to an Nginx module written to do the same.
In case you missed it, NodeJS author Ryan Dahl (@ryah) gave an excellent talk at a Bayjax event hosted by Yahoo! a few weeks ago.
Here at Yahoo!, the Mail team is investigating the use of NodeJS for some of our upcoming platform development work. We thought it would be fun to share a bit of what we've been working on.
Of course, I would be remiss if I didn't mention that Mail is hiring! If you're interested, please contact us.
But all is not sunshine and lollipops in NodeJS land. While single-process performance is quite good, eventually one CPU is not going to be enough; the platform provides no ability to scale out to take advantage of the multiple cores commonly present in today's server-class hardware. With current NodeJS builds, the practical limits of a single CPU acting as an HTTP proxy are around 2100 reqs/s for a 2.5GHz Intel Xeon.
While Node is relatively solid, it does still crash occasionally, adversely impacting availability if you're running only a single NodeJS process. Such problems can be particularly common when using a buggy compiled add-on that can suffer from the usual cornucopia of C++ goodies such as segfaults and memory scribbling. When handling requests with multiple processes, one process going down will simply result in incoming requests being directed to the other processes.
There are several ways to use multiple cores in NodeJS, each with their own benefits and drawbacks.
Using a load balancer
Until node-v0.1.98, the best practice for utilizing multiple cores was to start up a separate NodeJS process per core, each running an HTTP server bound to a different port. To route client requests to the various processes, one would front them all with a load balancer configured to know about each of the different ports. This performed just fine, but the complexity of configuring and managing these multiple processes and endpoints left something to be desired.
As a benefit, this architecture allows the load balancer to route requests to different processes based on an affinity policy (for example, by IP, by cookie, and so on).
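For illustration, a front-end for this pre-v0.1.98 model might look like the following nginx sketch. The port numbers, upstream name, and choice of nginx are assumptions for the example: four NodeJS processes bound to ports 8000 through 8003, with per-IP affinity provided by nginx's ip_hash policy.

```nginx
# Hypothetical nginx front-end for four NodeJS processes,
# each listening on its own port.
upstream nodejs_backends {
    ip_hash;                  # affinity policy: route by client IP
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
}

server {
    listen 80;
    location / {
        proxy_pass http://nodejs_backends;
    }
}
```

Every process and port appears explicitly in the configuration, which is exactly the management overhead described above.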
Using the OS kernel
In node-v0.1.98, the Yahoo!-contributed file descriptor passing and re-use patches landed in core and allowed the emerging set of HTTP frameworks such as Connect and multi-node to serve HTTP requests on multiple cores simultaneously with no change in application code or configuration.
Briefly, the approach used by these frameworks is to create a bound and listening socket in a single process (say, bound to port 80). However, rather than accepting connections using this socket, it is passed off to some number of child processes using net.Stream.write() (under the covers this uses sendmsg(2), and the file descriptors are delivered using recvmsg(2)). Each of these processes in turn inserts the received file descriptor into its event loop and accepts incoming connections as they become available. The OS kernel itself is responsible for load balancing connections across processes.
It's important to note that this is effectively an L4 load balancer with no affinity; each request by any given client may be served by any of the workers. Any application state that needs to be available to a request cannot simply be kept in-process in a single NodeJS instance.
Using NodeJS to route requests
In some cases it may be impossible or undesirable to use either of the above two facilities. For example, one's application may require affinity that cannot be configured using a load balancer (e.g., policy decisions based on complex application logic or the SE Linux security context of the incoming connection). In such cases, one can accept a connection in a single process and interrogate it before handing it off to the correct process for handling.
The following example requires node-v0.1.100 or later and node-webworker, a NodeJS implementation of the emerging HTML5 Web Workers standard for parallel execution of JavaScript. You can install node-webworker using npm by executing npm install webworker@stable.
While an in-depth explanation of Web Workers is beyond the scope of this article, for our purposes one can think of a worker as an independent execution context (such as a process) that can pass messages back and forth with the JavaScript environment that spawned it. The node-webworker implementation supports sending around file descriptors using this message passing mechanism.
First, the source of the master process, master.js:
var net = require('net');
var path = require('path');
var sys = require('sys');
var Worker = require('webworker/webworker').Worker;

var NUM_WORKERS = 5;

var workers = [];
var numReqs = 0;

// Spawn the worker processes up front.
for (var i = 0; i < NUM_WORKERS; i++) {
    workers[i] = new Worker(path.join(__dirname, 'worker.js'));
}

net.createServer(function(s) {
    // Stop reading from the socket; the worker should see all data
    // sent by the remote side.
    s.pause();

    // Hash the peer's IPv4 address to pick a worker, giving us
    // per-IP affinity.
    var hv = 0;
    s.remoteAddress.split('.').forEach(function(v) {
        hv += parseInt(v, 10);
    });
    var wid = hv % NUM_WORKERS;

    sys.debug('Request from ' + s.remoteAddress + ' going to worker ' + wid);

    // Ship the request counter and the socket descriptor to the worker.
    workers[wid].postMessage(++numReqs, s.fd);
}).listen(80);
The master does the following:
- Spawns NUM_WORKERS worker processes, each running worker.js.
- Calls net.Stream.pause() on the incoming stream. This prevents the master process from reading any data off of the socket -- the worker should be able to see all data sent by the remote side.
- Hashes the remote IP address to choose a worker, providing per-IP affinity.
- Uses postMessage() to send the (incremented) global request counter and just-received socket descriptor to the assigned worker.
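The routing step can be isolated as a pure function for clarity (pickWorker is a hypothetical name; the logic mirrors the hash in master.js):

```javascript
// Mirrors the master's routing logic: sum the octets of the peer's IPv4
// address and reduce modulo the worker count. A given client IP
// therefore always maps to the same worker.
function pickWorker(remoteAddress, numWorkers) {
    var hv = 0;
    remoteAddress.split('.').forEach(function(v) {
        hv += parseInt(v, 10);
    });
    return hv % numWorkers;
}

// pickWorker('10.0.0.7', 5) === (10 + 0 + 0 + 7) % 5 === 2
```

Note that summing octets is a very weak hash (e.g., 10.0.0.7 and 7.0.0.10 collide); it is fine for a demonstration but a real deployment would want something with better distribution.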
Second, the source of the worker processes, worker.js:
var http = require('http');
var net = require('net');
var sys = require('sys');

// Drop privileges; the master has already bound the privileged port.
process.setuid('nobody');

// Note that we never call listen() on this server; connections arrive
// as file descriptors from the master.
var srv = http.createServer(function(req, resp) {
    resp.writeHead(200, {'Content-Type' : 'text/plain'});
    resp.write(
        'process=' + process.pid +
        '; reqno=' + req.connection.reqNo + '\n'
    );
    resp.end();
});

onmessage = function(msg) {
    // Wrap the received descriptor in a net.Stream and hand it to the
    // HTTP server by emitting a 'connection' event manually.
    var s = new net.Stream(msg.fd);
    s.type = srv.type;
    s.server = srv;
    s.resume();
    s.reqNo = msg.data;
    srv.emit('connection', s);
};
The worker does the following:
- Drops its privileges to the nobody user.
- Creates an HTTP server without invoking any of its listen() variants. We will be processing requests based on the descriptors received from the master.
- Constructs a net.Stream instance with the received TCP connection and injects it into the HTTP processing pipeline by emitting the connection event manually.
- Reports the process ID and the reqNo field which we set up when we received the message from the master.
Finally, to run this example, be sure to launch master.js as the superuser, as we want to bind to a privileged port. Then use curl to make some requests and see which process they're being served by.
% sudo node ./master.js
% curl 'http://localhost:80'
process=13049; reqno=2
Of course, the preceding example is kind of a toy in that hashing based on IP is something that any HTTP load balancer worth its salt can do for you. A more realistic example of why you might want to do this would be dispatching requests to a worker running in the right SE Linux context after interrogating the other end of the connection (say, using node-selinux). Making routing decisions based on the HTTP request itself (path, vhost, etc.) is a bit more complicated but doable as well using similar techniques.
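As a sketch of that L7 variant, one could hash the request's Host header instead of the client IP, pinning each virtual host to a single worker. workerForHost is a hypothetical helper; actually inspecting the header before handing off the socket requires replaying any bytes already consumed, which is omitted here.

```javascript
// Hypothetical helper: pin all requests for a given virtual host to one
// worker by hashing the Host header rather than the client IP.
function workerForHost(host, numWorkers) {
    var h = 0;
    for (var i = 0; i < host.length; i++) {
        h = (h * 31 + host.charCodeAt(i)) >>> 0; // keep it a 32-bit uint
    }
    return h % numWorkers;
}

// All requests for a host hash to the same worker id:
// workerForHost('mail.example.com', 5) === workerForHost('mail.example.com', 5)
```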
Finally, I hope this article has helped to shed some light on the state of multi-core support in NodeJS: several existing HTTP frameworks enable utilization of multiple cores on the box for a wide variety of NodeJS apps; node-webworkers provides a useful abstraction on top of child_process for managing parallelism in NodeJS; and NodeJS itself can serve as an L7 HTTP router.
Sample code
This post was written by Peter Griess (@pgriess), Principal Engineer, Yahoo! Mail