第一次使用mongo开发系统,最近遇到了性能问题,发现索引建的不好,该集合是用来存储采集的信息,字段包括发布日期publishDate(int型)、分值blendedScore(int型)、发布时间publishTime(date型)、相似数量similarNum(int型)、是否垃圾isRubbish(int型)以及信息自身的一些字段(不涉及查询及排序在此不再罗列),数量大概有千万级。
业务描述:
1)可按发布日期倒序+得分倒序+发布时间倒序。
2)可按相似数量倒序+发布时间倒序。
3)可按发布时间倒序。
以上三种情况都可按照发布日期、是否垃圾、关键词等进行搜索。
简化为mongo查询语句如下:
1)db.infos.find{publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}.sort{publishDate:-1,blendedScore:-1,publishTime:-1}
2)db.infos.find{publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}.sort{similarNum:-1,publishTime:-1}
3)db.infos.find{publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}.sort{publishTime:-1}
为了得到最优的结果进行了以下尝试
第一次创建的索引:
db.infos.ensureIndex({isRubbish:1,publishDate:-1,blendedScore:-1,similarNum:-1,publishTime:-1},{name:"SortIdx",background:true});
考虑到节省空间,第一次按上面的方式试图将所有排序字段都放到一个索引中。这种方式的结果是:
1)业务1走索引速度最快。
2)业务2走索引,速度可接受。
3)业务3走索引,但速度慢到无法接受。
这种结果显然无法接受,所以继续修改索引。
第二次创建的索引:
db.infos.ensureIndex({publishDate:-1,blendedScore:-1,publishTime:-1},{name:"ScoreSortIdx",background:true}); //索引1
db.infos.ensureIndex({similarNum:-1,publishTime:-1},{name:"HotSortIdx",background:true}); //索引2
db.infos.ensureIndex({publishTime:-1},{name:"TimeSortIdx",background:true}); //索引3
由于每个业务的排序方式差别较大,而mongodb每次查询只会使用一个索引所以按照上面的方式创建了三个索引。这样创建后每个业务查询速度都很快。由于isRubbish是各个查询基本都会带的条件,所以考虑将isRubbish加入索引中,但是isRubbish字段只有0、1的取值,可能用作索引的意义不大,所以针对业务1的查询测试了一下。测试方法就是在该集合上再创建一个增加了isRubbish的字段的索引,创建语句:
db.infos.ensureIndex({isRubbish:1,publishDate:-1,blendedScore:-1,publishTime:-1},{name:"ScoreSortIdx1",background:true})
使用explain来查看执行情况,分别强制使用ScoreSortIdx和ScoreSortIdx1索引,mongo语句如下:
db.infos.find({publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}).sort({publishDate:-1,blendedScore:-1,publishTime:-1}).skip(10000).limit(1).hint("ScoreSortIdx").explain("executionStats");
db.infos.find({publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}).sort({publishDate:-1,blendedScore:-1,publishTime:-1}).skip(10000).limit(1).hint("ScoreSortIdx1").explain("executionStats")
ScoreSortIdx explain的执行结果:
{
"executionStats": {
"executionSuccess": true,
"nReturned": 1,
"executionTimeMillis": 62,
"totalKeysExamined": 10438,
"totalDocsExamined": 10438,
"executionStages": {
"stage": "LIMIT",
"nReturned": 1,
"executionTimeMillisEstimate": 50,
"works": 10439,
"advanced": 1,
"needTime": 10437,
"needFetch": 0,
"saveState": 81,
"restoreState": 81,
"isEOF": 1,
"invalidates": 0,
"limitAmount": 0,
"inputStage": {
"stage": "SKIP",
"nReturned": 1,
"executionTimeMillisEstimate": 50,
"works": 10438,
"advanced": 1,
"needTime": 10437,
"needFetch": 0,
"saveState": 81,
"restoreState": 81,
"isEOF": 0,
"invalidates": 0,
"skipAmount": 0,
"inputStage": {
"stage": "FETCH",
"filter": { "isRubbish": { "$eq": 0 } },
"nReturned": 10001,
"executionTimeMillisEstimate": 50,
"works": 10438,
"advanced": 10001,
"needTime": 437,
"needFetch": 0,
"saveState": 81,
"restoreState": 81,
"isEOF": 0,
"invalidates": 0,
"docsExamined": 10438,
"alreadyHasObj": 0,
"inputStage": { "stage": "IXSCAN", "nReturned": 10438, "executionTimeMillisEstimate": 30, "works": 10438, "advanced": 10438, "needTime": 0, "needFetch": 0, "saveState": 81, "restoreState": 81, "isEOF": 0, "invalidates": 0, "keyPattern": { "publishDate": -1, "blendedScore": -1, "publishTime": -1 }, "indexName": "ScoreSortIdx", "isMultiKey": false, "direction": "forward", "indexBounds": { "publishDate": [ "[20160318.0, 20160310.0]" ], "blendedScore": [ "[MaxKey, MinKey]" ], "publishTime": [ "[MaxKey, MinKey]" ] }, "keysExamined": 10438, "dupsTested": 0, "dupsDropped": 0, "seenInvalidated": 0, "matchTested": 0 } }
}
}
}
}
ScoreSortIdx1 explain的执行结果:
{
"executionStats": {
"executionSuccess": true,
"nReturned": 1,
"executionTimeMillis": 14,
"totalKeysExamined": 10001,
"totalDocsExamined": 10001,
"executionStages": {
"stage": "LIMIT",
"nReturned": 1,
"executionTimeMillisEstimate": 10,
"works": 10002,
"advanced": 1,
"needTime": 10000,
"needFetch": 0,
"saveState": 78,
"restoreState": 78,
"isEOF": 1,
"invalidates": 0,
"limitAmount": 0,
"inputStage": {
"stage": "SKIP",
"nReturned": 1,
"executionTimeMillisEstimate": 10,
"works": 10001,
"advanced": 1,
"needTime": 10000,
"needFetch": 0,
"saveState": 78,
"restoreState": 78,
"isEOF": 0,
"invalidates": 0,
"skipAmount": 0,
"inputStage": {
"stage": "FETCH",
"nReturned": 10001,
"executionTimeMillisEstimate": 10,
"works": 10001,
"advanced": 10001,
"needTime": 0,
"needFetch": 0,
"saveState": 78,
"restoreState": 78,
"isEOF": 0,
"invalidates": 0,
"docsExamined": 10001,
"alreadyHasObj": 0,
"inputStage": { "stage": "IXSCAN", "nReturned": 10001, "executionTimeMillisEstimate": 10, "works": 10001, "advanced": 10001, "needTime": 0, "needFetch": 0, "saveState": 78, "restoreState": 78, "isEOF": 0, "invalidates": 0, "keyPattern": { "isRubbish": 1, "publishDate": -1, "blendedScore": -1, "publishTime": -1 }, "indexName": "ScoreSortIdx1", "isMultiKey": false, "direction": "forward", "indexBounds": { "isRubbish": [ "[0.0, 0.0]" ], "publishDate": [ "[20160318.0, 20160310.0]" ], "blendedScore": [ "[MaxKey, MinKey]" ], "publishTime": [ "[MaxKey, MinKey]" ] }, "keysExamined": 10001, "dupsTested": 0, "dupsDropped": 0, "seenInvalidated": 0, "matchTested": 0 } }
}
}
}
}
可以看到使用ScoreSortIdx1的查询时间比ScoreSortIdx不只快了一倍,当使用上面的条件进行count时,这种差距会更明显。
所以最终创建的索引:
db.infos.ensureIndex({isRubbish:1,publishDate:-1,blendedScore:-1,publishTime:-1},{name:"ScoreSortIdx",background:true}); //索引1
db.infos.ensureIndex({isRubbish:1,similarNum:-1,publishTime:-1},{name:"HotSortIdx",background:true}); //索引2
db.infos.ensureIndex({isRubbish:1,publishTime:-1},{name:"TimeSortIdx",background:true}); //索引3
发现mongodb的多列索引的第一个字段必须作为排序或查询字段,否则会执行全表扫描。所以在应用中所有的查询都保证必须使用isRubbish作为查询条件,即使不对isRubbish进行过滤,也会将查询条件加上{isRubbish:{$gte:0}}以保证查询都走索引。另外尽量丰富其他查询条件,减少命中的结果集数量。
后面读了一篇大神的文章:http://www.mongoing.com/eshu_explain3 mongo执行计划的详细讲解,得到一个建多列索引的原则:{精确匹配字段,排序字段,范围查询字段} 这样建起来的索引排序会更为高效(已做实际验证,的确是这样)。
由于该业务中存在{isRubbish:{$gte:0}}即范围查询的情况,所以最后将索引改成了下面的方式:
db.infos.ensureIndex({publishDate:-1,blendedScore:-1,publishTime:-1,isRubbish:1},{name:"ScoreSortIdx",background:true}); //索引1
db.infos.ensureIndex({similarNum:-1,publishTime:-1,isRubbish:1},{name:"HotSortIdx",background:true}); //索引2
db.infos.ensureIndex({publishTime:-1,isRubbish:1},{name:"TimeSortIdx",background:true}); //索引3