Original: How to Map-Reduce with Mongoose, mongoDB, Express, Node.js
mongoDB has good support for Map-Reduce. To use it through Mongoose, Express, and Node.js, follow the steps below.
This example starts with the following data:
1 www.yahoo.com
2 www.msn.com
3 www.google.com
4 www.yahoo.com
5 www.yahoo.com
6 www.msn.com
We want to turn the data above into the following form:
1 www.yahoo.com, 3
2 www.msn.com, 2
3 www.google.com, 1
This is comparable to the result of a GROUP BY in SQL.
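Before bringing mongoDB into it, the target result can be sketched in plain JavaScript. This is only an in-memory illustration of the grouping; `countByUrl` is a hypothetical helper name, not something from Mongoose or mongoDB.

```javascript
// The six sample urls from the table above (row ids omitted).
var urls = [
  'www.yahoo.com', 'www.msn.com', 'www.google.com',
  'www.yahoo.com', 'www.yahoo.com', 'www.msn.com'
];

// countByUrl is a hypothetical helper used only for illustration.
// It tallies how many times each url appears in the list.
function countByUrl(list) {
  var counts = {};
  list.forEach(function (url) {
    counts[url] = (counts[url] || 0) + 1;
  });
  return counts;
}

var counts = countByUrl(urls);
console.log(counts);
```

The map-reduce described in the rest of this post produces the same per-url counts, but runs inside the database instead of in Node.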
First, we build the model on top of Node.js and Mongoose.
var express = require('express');
var mongoose = require('mongoose');
var app = express(); //assumption: app is an Express application; the original snippet did not show how it was created
mongoose.connect('mongodb://localhost/db'); //this assumes your mongoDB is running on localhost and that the database is named 'db'
var Schema = mongoose.Schema;
var ObjectId = Schema.ObjectId;
var PingSchema = new Schema(
{
  url : String
, active : { //each url has a start and end date for which it's active
    start : Date
  , end : Date
  }
});
mongoose.model('Ping', PingSchema); //tell mongoose about the Ping schema
var Ping = mongoose.model('Ping'); //ask mongoose for the registered Ping model

app.get('/', function(req, res){ //set up an express route
  //...the code we'll be discussing below goes here
});
With the model in place, we turn to map-reduce. There are two steps: (1) run the map-reduce to generate a new collection, and (2) query the newly generated collection.
For background on map-reduce, see this post about how map-reduce works. Run the map-reduce with the following code:
mongoose.connection.db.executeDbCommand(command, function(err, dbres) {
  //If you need to alert users, etc. that the mapreduce has been run, enter code here
});
command is defined as follows:
var command = {
  mapreduce: "pings", //the name of the collection we are map-reducing *note, this is the model Ping we defined above...mongoose lowercases and pluralizes the model name to get the collection name
  query: { 'active.end' : { $gt: new Date() } }, //an example of filtering the input documents on a field other than the one being map-reduced
  map: urlMap.toString(), //a function we'll define next for mapping
  reduce: urlReduce.toString(), //a function we'll define next for reducing
  sort: {url: 1}, //sort the input by url, ascending...sorting on the emit key can make the operation run faster
  out: "pingjar" //the collection that will contain the map-reduce results *note, this must be a different collection than the map-reduce input
};
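The map and reduce entries are passed with toString() because the functions run inside the database server, not in Node, so what gets sent is their source text. A small sketch of what that call produces, using a hypothetical `exampleMap` function purely for illustration:

```javascript
// exampleMap is a hypothetical stand-in for a map function.
// It is never called here, so emit() does not need to exist in Node.
var exampleMap = function () { emit(this.url, 1); };

// toString() on a function returns its source code as a string.
var source = exampleMap.toString();

console.log(typeof source);
console.log(source);
```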
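Next we define the functions urlMap and urlReduce. In the actual file, place these definitions before the command object, since command calls urlMap.toString() and urlReduce.toString() when it is created.
var urlMap = function() { //map function, called once per document with 'this' set to the document
  emit(this.url, 1); //sends the url 'key' and a 'value' of 1 to the reduce function
};

var urlReduce = function(key, values) { //reduce function, receives a key and the array of values emitted for it
  var count = 0;
  for (var index = 0; index < values.length; index++) { //'values' may hold the 1s from map or partial counts from an earlier reduce
    count += values[index]; //add each value to the running total
  }
  return count;
};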
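To see how emit and the reduce function fit together, here is a rough local simulation in plain JavaScript. It is only a sketch: the `emit` function and the `buckets` object below are hypothetical stand-ins for what mongoDB does internally, and the sample documents are the six urls from the start of the post.

```javascript
var buckets = {}; // key -> array of values emitted for that key

// Hypothetical stand-in for the emit() that mongoDB provides to map functions.
function emit(key, value) {
  (buckets[key] = buckets[key] || []).push(value);
}

var urlMap = function () {
  emit(this.url, 1);
};

var urlReduce = function (key, values) {
  var count = 0;
  for (var i = 0; i < values.length; i++) {
    count += values[i];
  }
  return count;
};

var docs = [
  { url: 'www.yahoo.com' }, { url: 'www.msn.com' }, { url: 'www.google.com' },
  { url: 'www.yahoo.com' }, { url: 'www.yahoo.com' }, { url: 'www.msn.com' }
];

// Map phase: call the map function with 'this' bound to each document.
docs.forEach(function (doc) { urlMap.call(doc); });

// Reduce phase: collapse each key's values into a single count.
var results = Object.keys(buckets).map(function (key) {
  return { _id: key, value: urlReduce(key, buckets[key]) };
});

// mongoDB may feed the output of one reduce call back into another,
// so reducing a partial count together with new values must give the same total.
var partial = urlReduce('www.yahoo.com', [1, 1]);
var rereduced = urlReduce('www.yahoo.com', [partial, 1]);

console.log(results, rereduced);
```

Summing the values, rather than counting how many there are, is what keeps the reduce function correct when it is given partial counts.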
If everything runs successfully, a new collection named 'pingjar' is created containing the map-reduce results. Since mongoose has no model for that collection, we read it with the native mongoDB driver calls:
mongoose.connection.db.collection('pingjar', function(err, collection) { //query the new map-reduced collection
  collection.find({}).sort({'value': -1}).limit(10).toArray(function(err, pings) { //only pull in the top 10 results and sort descending by number of pings
    res.render('home', { //tell Express to render the page with the database results pings and a title "PingJar"
      'title': 'PingJar',
      'pings': pings
    });
  });
});
In the result, the 'pings' object contains the following:
{ "_id" : "www.yahoo.com", "value" : 3 }
{ "_id" : "www.msn.com", "value" : 2 }
{ "_id" : "www.google.com", "value" : 1 }
You may wonder why I didn't use the mongoDB group command, since all I need is a count of urls. If you would rather use group, you can run the following code:
var command = {
  'group' : { //mongodb group command
    'ns' : 'pings', //the collection to query
    'cond' : {'active.end' : { $gt: new Date() }}, //active.end must be in the future
    'initial': {'count': 0}, //initialize any count object properties
    '$reduce' : 'function(doc, out){ out.count++ }', //called for each matching document; increments the count of that document's group
    'key' : {'url': 1} //fields to group by
  }
};
mongoose.connection.db.executeDbCommand(command, function(err, dbres){
  var ret = dbres.documents[0].retval; //retval holds the array of grouped results
  for (var i = 0; i < ret.length; i++) {
    console.log(ret[i]);
  }
});
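To make the roles of key, cond, initial and $reduce more concrete, here is a rough in-memory sketch of the same grouping. `groupInMemory` is a hypothetical helper, not a mongoDB or Mongoose API, and as a simplification cond is a predicate function here rather than a query document. The sample documents and the fixed `now` date are made up so the run is repeatable.

```javascript
// groupInMemory is a hypothetical helper for illustration only;
// it is not part of mongoDB or Mongoose.
function groupInMemory(docs, options) {
  var groups = {};
  docs.filter(options.cond).forEach(function (doc) {
    var keyValue = doc[options.key];
    if (!groups[keyValue]) {
      // Start each group from the key field plus a copy of 'initial'.
      var out = {};
      out[options.key] = keyValue;
      Object.keys(options.initial).forEach(function (name) {
        out[name] = options.initial[name];
      });
      groups[keyValue] = out;
    }
    // Same shape as the $reduce function: (current document, group accumulator).
    options.reduce(doc, groups[keyValue]);
  });
  return Object.keys(groups).map(function (k) { return groups[k]; });
}

var now = new Date('2020-01-01T00:00:00Z');    // fixed "current" time (assumption)
var future = new Date('2030-01-01T00:00:00Z');
var past = new Date('2010-01-01T00:00:00Z');

var sampleDocs = [
  { url: 'www.yahoo.com', active: { end: future } },
  { url: 'www.msn.com', active: { end: future } },
  { url: 'www.yahoo.com', active: { end: past } },   // excluded by cond
  { url: 'www.yahoo.com', active: { end: future } },
  { url: 'www.google.com', active: { end: future } }
];

var grouped = groupInMemory(sampleDocs, {
  key: 'url',
  cond: function (doc) { return doc.active.end > now; },
  initial: { count: 0 },
  reduce: function (doc, out) { out.count++; }
});

console.log(grouped);
```

Each element of the result has the key field and the accumulated count, which is the same shape as the entries logged from retval above.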