Map-Reduce and Group By in Mongoose

Original article: How to Map-Reduce with Mongoose, mongoDB, Express, Node.js

mongoDB has good support for Map-Reduce. To use it through Mongoose, Express, and Node.js, follow the steps below.

In this example we start with the following data:

www.yahoo.com
www.msn.com
www.google.com
www.yahoo.com
www.yahoo.com
www.msn.com

We want to turn this data into the following form:

www.yahoo.com,  3
www.msn.com,    2
www.google.com, 1

This is the same result a GROUP BY with a count would give in SQL.
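As a point of reference, the same grouping can be sketched in plain JavaScript on an in-memory array. This only illustrates the intended result and is not how mongoDB computes it; the `urls` array and the `countByUrl` helper are names assumed for this example.

```javascript
// Sample data from the listing above, held in memory for illustration.
const urls = [
  'www.yahoo.com',
  'www.msn.com',
  'www.google.com',
  'www.yahoo.com',
  'www.yahoo.com',
  'www.msn.com'
];

// Hypothetical helper: count occurrences of each url, most frequent first.
function countByUrl(list) {
  const counts = new Map();
  for (const url of list) {
    counts.set(url, (counts.get(url) || 0) + 1);
  }
  return Array.from(counts, function (entry) {
    return { url: entry[0], count: entry[1] };
  }).sort(function (a, b) { return b.count - a.count; });
}

const counted = countByUrl(urls);
console.log(counted);
```

The rest of the article moves this counting into the database, so that the documents never have to be loaded into the Node.js process.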

First, we define the model with Node.js and mongoose.

var express = require('express');
var mongoose = require('mongoose');

var app = express(); // assumption: a recent Express; the original listing relied on an existing `app`
mongoose.connect('mongodb://localhost/db'); // this assumes mongoDB is running on localhost and uses the database 'db'
var Schema = mongoose.Schema;
var ObjectId = Schema.ObjectId;
var PingSchema = new Schema(
  {
    url       : String
  , active    : { // each url has a start and end date for which it's active
      start   : Date
    , end     : Date
    }
  });
mongoose.model('Ping', PingSchema); // register the Ping schema with mongoose
var Ping = mongoose.model('Ping'); // retrieve the compiled Ping model

app.get('/', function(req, res){ // set up an express route
  // ...the code we'll be discussing below goes here
});

With the model in place, we turn to the map-reduce itself. There are two steps: 1) run the map-reduce, which writes its output to a new collection, and 2) query that new collection.

For background on map-reduce, see this post about how map-reduce works. The map-reduce is run with the following code:

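// executeDbCommand comes from older releases of the native driver; later releases provide db.command instead
mongoose.connection.db.executeDbCommand(command, function(err, dbres) {
        // If you need to alert users, etc. that the map-reduce has run, enter code here
});

The command is defined as follows: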

var command = {
        mapreduce: "pings", // the collection to map-reduce. This is the Ping model defined above; mongoose lowercases and pluralizes the model name to get the collection name
        query: { 'active.end' : { $gt: new Date() } }, // an example of filtering the input on a field other than the map-reduce key
        map: urlMap.toString(), // the mapping function, defined next
        reduce: urlReduce.toString(), // the reducing function, defined next
        sort: {url: 1}, // sort the input ascending by url, the emit key, which can make the operation run faster
        out: "pingjar" // the collection that will hold the map-reduce results; it must be a different collection than the input
};
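The `query` option restricts which documents reach the map function. A rough in-memory analogue is sketched below, assuming documents shaped like the Ping schema; `samplePings`, `isActive`, and the dates are made up for illustration.

```javascript
// Hypothetical sample documents shaped like the Ping schema.
const samplePings = [
  { url: 'www.yahoo.com', active: { start: new Date('2020-01-01'), end: new Date('2030-01-01') } },
  { url: 'www.msn.com', active: { start: new Date('2010-01-01'), end: new Date('2011-01-01') } }
];

// In-memory analogue of the filter { 'active.end': { $gt: now } }.
function isActive(doc, now) {
  return doc.active.end > now;
}

const now = new Date('2020-06-01'); // fixed date so the example is repeatable
const activePings = samplePings.filter(function (doc) {
  return isActive(doc, now);
});
console.log(activePings.map(function (doc) { return doc.url; }));
```

Only the documents that pass this filter would be handed to the map function; expired entries never contribute to the counts.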

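Next we define the functions urlMap and urlReduce. Because command calls urlMap.toString() and urlReduce.toString(), these assignments have to run before command is built.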

var urlMap = function() { // map function
     emit(this.url, 1); // emits the url as the key with a value of 1
};

var urlReduce = function(key, values) { // reduce function: receives a key and the array of values emitted for it
     var count = 0;
     for (var i = 0; i < values.length; i++) { // each value is either 1 or a partial count from an earlier reduce call
       count += values[i];
     }
     return count;
};
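To see why the reduce function sums its values rather than counting them, here is a small in-memory simulation. It is a sketch under stated assumptions, not mongoDB's implementation: `emit`, `simulateMapReduce`, and the chunking scheme are stand-ins invented for this example. The chunking imitates the fact that mongoDB may call reduce several times for one key and feed earlier results back in.

```javascript
// Stand-in for the emit() that mongoDB provides to map functions.
let emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

function urlMap() {
  emit(this.url, 1);
}

function urlReduce(key, values) {
  let count = 0;
  for (let i = 0; i < values.length; i++) {
    count += values[i];
  }
  return count;
}

// Hypothetical in-memory stand-in for the server-side map-reduce loop.
function simulateMapReduce(docs, map, reduce, chunkSize) {
  const size = Math.max(2, chunkSize); // a chunk of 1 would never shrink the list
  emitted = [];
  docs.forEach(function (doc) { map.call(doc); });

  // Group the emitted values by key.
  const groups = new Map();
  emitted.forEach(function (pair) {
    if (!groups.has(pair.key)) groups.set(pair.key, []);
    groups.get(pair.key).push(pair.value);
  });

  const results = [];
  groups.forEach(function (values, key) {
    // Reduce in chunks and feed the partial results back in.
    // A key with a single value is never passed to reduce.
    while (values.length > 1) {
      const partials = [];
      for (let i = 0; i < values.length; i += size) {
        const chunk = values.slice(i, i + size);
        partials.push(chunk.length > 1 ? reduce(key, chunk) : chunk[0]);
      }
      values = partials;
    }
    results.push({ _id: key, value: values[0] });
  });
  return results;
}

const docs = [
  { url: 'www.yahoo.com' }, { url: 'www.msn.com' }, { url: 'www.google.com' },
  { url: 'www.yahoo.com' }, { url: 'www.yahoo.com' }, { url: 'www.msn.com' }
];
const simulated = simulateMapReduce(docs, urlMap, urlReduce, 2);
console.log(simulated);
```

A reduce function that returned `values.length` instead of the sum would give wrong totals here as soon as a partial count was fed back in, which is why the output of reduce has to have the same shape as the emitted values.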

If everything runs successfully, a new collection named 'pingjar' is created containing the map-reduce results. Since there is no mongoose model for this collection, we read it through the underlying mongoDB driver:

mongoose.connection.db.collection('pingjar', function(err, collection) { // query the new map-reduced collection
        collection.find({}).sort({'value': -1}).limit(10).toArray(function(err, pings) { // pull only the top 10 results, sorted descending by number of pings
            res.render('home', { // tell Express to render the page with the database results pings and the title "PingJar"
                'title': 'PingJar',
                'pings': pings
            });
        });
    });

In the result, the 'pings' array contains the following:

{ "_id" : "www.yahoo.com", "value" : 3 }
{ "_id" : "www.msn.com", "value" : 2 }
{ "_id" : "www.google.com", "value" : 1 }
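The effect of `sort({'value': -1}).limit(10)` on these documents can be mirrored in memory as follows. This is only an illustration of the ordering; `pingjarRows` and `topPings` are hypothetical names, and the real query is executed by mongoDB, not by this code.

```javascript
// Map-reduce output shaped like the documents in 'pingjar', in arbitrary order.
const pingjarRows = [
  { _id: 'www.google.com', value: 1 },
  { _id: 'www.yahoo.com', value: 3 },
  { _id: 'www.msn.com', value: 2 }
];

// Hypothetical helper mirroring sort({ value: -1 }).limit(n) in memory.
function topPings(rows, n) {
  return rows
    .slice() // copy so the input array is not reordered
    .sort(function (a, b) { return b.value - a.value; })
    .slice(0, n);
}

const topTwo = topPings(pingjarRows, 2);
console.log(topTwo);
```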

Some readers may ask why I did not use the mongoDB group command, since all I need here is a count per url. To use group instead, run the following code:

var command = {
    'group' : { // mongodb group command
       'ns' : 'pings', // the collection to query
       'cond' : {'active.end' : { $gt: new Date() }}, // active.end must be in the future
       'initial': {'count': 0}, // starting value of the aggregation object for each group
       '$reduce' : 'function(doc, out){ out.count++ }', // called once per document; increments the group's count
       'key' : {'url': 1} // fields to group by
    }
};
mongoose.connection.db.executeDbCommand(command, function(err, dbres){
    var ret = dbres.documents[0].retval; // retval holds the array of grouped results
    for (var key in ret)
        console.log(ret[key]);
});
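The roles of `key`, `cond`, `initial`, and `$reduce` are easier to follow in a small in-memory imitation. The sketch below is an assumption-laden stand-in, not the server's implementation: `simulateGroup` is a hypothetical helper, and `cond` is a plain predicate function here, whereas the real command takes a query document.

```javascript
// Hypothetical in-memory stand-in for the group command's semantics.
function simulateGroup(docs, options) {
  const buckets = new Map();
  docs.filter(options.cond).forEach(function (doc) {
    // Build the grouping key from the fields listed in options.key.
    const keyFields = {};
    Object.keys(options.key).forEach(function (field) {
      keyFields[field] = doc[field];
    });
    const id = JSON.stringify(keyFields);
    if (!buckets.has(id)) {
      // Each group starts from its own copy of `initial` plus the key fields.
      buckets.set(id, Object.assign({}, keyFields, options.initial));
    }
    options.reduce(doc, buckets.get(id)); // reduce mutates the group's object
  });
  return Array.from(buckets.values());
}

const cutoff = new Date('2020-06-01'); // fixed so the example is repeatable
const future = new Date('2030-01-01');
const past = new Date('2010-01-01');

const groupDocs = [
  { url: 'www.yahoo.com', active: { end: future } },
  { url: 'www.msn.com', active: { end: future } },
  { url: 'www.yahoo.com', active: { end: future } },
  { url: 'www.google.com', active: { end: past } } // filtered out by cond
];

const grouped = simulateGroup(groupDocs, {
  key: { url: 1 },
  cond: function (doc) { return doc.active.end > cutoff; },
  initial: { count: 0 },
  reduce: function (doc, out) { out.count++; }
});
console.log(grouped);
```

Unlike the map-reduce reduce function, the group reduce function receives one document at a time and updates the group's object in place, which is why it does not return a value.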
