MongoDB, Part 2: Compound Indexes

Indexes are used to optimize queries, and for certain types of queries an index is indispensable. --- "MongoDB: The Definitive Guide"

Start by creating 1,000,000 test documents:

for(var i=0;i<1000000;i++){
    // Math.random() takes no argument; scale it to get an integer age in [0, 99]
    var post={"user":"user: "+i,"age":Math.floor(Math.random()*100),"createAt":new Date()};
    db.foo.save(post);
}

PS: the code above may take several minutes to finish (how long depends directly on your machine).
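The loop above sends one write per document, which is part of why it is slow. A minimal sketch of building the same kind of documents in batches follows. The helper name `makePosts` and the batch size of 1000 are hypothetical, and the database call is left as a comment on the assumption that your shell provides `db.foo.insertMany` (MongoDB 3.2 and later), so the snippet also runs as plain JavaScript.

```javascript
// Hypothetical helper: build `count` test documents starting at index `start`.
function makePosts(start, count) {
  const posts = [];
  for (let i = start; i < start + count; i++) {
    posts.push({
      user: "user: " + i,
      age: Math.floor(Math.random() * 100), // integer in [0, 99]
      createAt: new Date(),
    });
  }
  return posts;
}

const batch = makePosts(0, 1000);
// In the mongo shell, each batch could then be written in one call (assumption):
// db.foo.insertMany(batch);
console.log(batch.length, batch[105].user);
```

Calling `makePosts(1000, 1000)`, `makePosts(2000, 1000)` and so on would cover the rest of the data set.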

Once that is done, run a query and look at the execution time:

> db.foo.find({"user":"user: 105"}).explain("allPlansExecution")["executionStats"]["executionTimeMillis"]
> 316

With no index at all, looking up a single value takes 316 ms (the number varies from machine to machine, but either way it is not good).

Create an index on the "user" field:

// Create the index (newer shells name this method createIndex)
> db.foo.ensureIndex({"user":1})

Run the query again:

> db.foo.find({"user":"user: 105"}).explain("allPlansExecution")["executionStats"]["executionTimeMillis"]
> 0

Hmm, the query time is 0 ms. (That looks a little exaggerated; read it as "much faster".)

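Sorting

An index is stored in a defined order. With {"user":1} above, the index is ordered by the "user" field ascending (-1 would mean descending). An index only helps a sort when the sort is on the index's keys, so the index above is of no use for the following sorted query: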

> db.foo.find().sort({"age":1,"user":1})

To optimize this sort, add a compound index (an index built on two or more keys is called a compound index):

> db.foo.ensureIndex({"age":1,"user":1})

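Each entry in this index contains an "age" key and a "user" key. Entries are ordered by "age" ascending first, and entries with the same "age" are then ordered by "user" ascending.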
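To make that ordering concrete, here is a toy model in plain JavaScript. It is only an illustration, not how MongoDB stores an index internally: the "index" is an array of entries, and the comparator `byAgeThenUser` (a hypothetical name) mimics the key pattern {"age":1,"user":1}.

```javascript
// Toy model: an "index" is just an array of {age, user} entries kept in sorted order.
const entries = [
  { age: 30, user: "user: 2" },
  { age: 10, user: "user: 7" },
  { age: 10, user: "user: 3" },
  { age: 20, user: "user: 1" },
];

// Comparator mimicking the key pattern {"age": 1, "user": 1}:
// compare by age first, and fall back to user only when the ages are equal.
function byAgeThenUser(a, b) {
  if (a.age !== b.age) return a.age - b.age;
  return a.user < b.user ? -1 : a.user > b.user ? 1 : 0;
}

const index = entries.slice().sort(byAgeThenUser);
console.log(index.map((e) => e.age + " / " + e.user));
// The two age-10 entries come first, ordered by user; then age 20, then age 30.
```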

Now suppose you run the following query:

> db.foo.find({"age":10}).sort({"user":-1})

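The matching entries are already ordered (entries with "age" 10 are ordered by "user" ascending, as described above). Since we want "user" in descending order, MongoDB simply traverses the index in reverse, so the result still comes back quickly: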

> db.foo.find({"age":10}).sort({"user":-1}).explain("allPlansExecution")["executionStats"]["executionTimeMillis"]
> 18

It takes 18 ms to find all users with age 10, sorted in descending order.
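The reverse traversal can be sketched in plain JavaScript. This is a simplified model under stated assumptions: `index` is a small array already ordered like {"age":1,"user":1}, and the hypothetical function `findAgeUserDesc` walks the whole array backwards, whereas a real index would seek straight to the "age" 10 section.

```javascript
// Entries already ordered by age ascending, then user ascending.
const index = [
  { age: 9, user: "user: 4" },
  { age: 10, user: "user: 1" },
  { age: 10, user: "user: 5" },
  { age: 10, user: "user: 8" },
  { age: 11, user: "user: 2" },
];

// Walk the index from the end to the start and keep entries with the given age.
// Because the entries for one age are stored by user ascending, reading them
// backwards yields user descending without any separate sort step.
function findAgeUserDesc(index, age) {
  const out = [];
  for (let i = index.length - 1; i >= 0; i--) {
    if (index[i].age === age) out.push(index[i].user);
  }
  return out;
}

console.log(findAgeUserDesc(index, 10));
// → [ 'user: 8', 'user: 5', 'user: 1' ]
```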

Range queries

For example:

> db.foo.find({"age":{"$gt":10,"$lt":35}}).explain("allPlansExecution")["executionStats"]["executionTimeMillis"]
> 398

Hmm, 398 ms for this query seems a bit off.
Look at the execution details (timings vary slightly from run to run):

 "executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 239459,
                "executionTimeMillis" : 335,
                "totalKeysExamined" : 239459,
                "totalDocsExamined" : 239459,
             ......
                }

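nReturned: the number of documents returned.
totalKeysExamined: the number of index keys examined.
executionTimeMillis: the total execution time in milliseconds.

That looks like as fast as it can get, since the number of keys examined already equals the number of results returned.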

A range query that also needs sorting:

> db.foo.find({"age":{"$gt":10,"$lt":35}}).sort({"user":-1}).explain("executionStats")["executionStats"]["executionTimeMillis"]
> 1229

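More than a second, which definitely looks wrong. The reason: for db.foo.find({"age":{"$gt":10,"$lt":35}}) the index looks documents up by the "age" key, so the results come back ordered by "age" ascending; the trailing sort({"user":-1}) then has to be completed by MongoDB in memory before anything is returned. Run the following command to see the details: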

> db.foo.find({"age":{"$gt":10,"$lt":35}}).sort({"user":-1}).explain("executionStats")["executionStats"]

Result:

  "executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 239459,
                "executionTimeMillis" : 1265,
                "totalKeysExamined" : 1000000,
                "totalDocsExamined" : 1000000,
              ......
}

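A full scan was performed (all 1,000,000 keys and documents were examined) and the result had to be sorted in memory, which is why it takes more than a second.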
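One quick way to compare the two plans is the ratio of keys examined to documents returned. The sketch below uses only the numbers from the two explain outputs shown in this post; the helper `keysPerResult` is a hypothetical name, not a MongoDB API, and it returns Infinity when nothing is returned.

```javascript
// Hypothetical helper: how many index keys were examined per returned document.
// A value close to 1 means the scan touched little beyond what it returned.
function keysPerResult(stats) {
  return stats.totalKeysExamined / stats.nReturned;
}

// Numbers copied from the two executionStats outputs in this post.
const rangeOnly = { nReturned: 239459, totalKeysExamined: 239459 };
const rangeAndSort = { nReturned: 239459, totalKeysExamined: 1000000 };

console.log(keysPerResult(rangeOnly));               // 1
console.log(keysPerResult(rangeAndSort).toFixed(2)); // "4.18"
```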

Now change the index so that the key being sorted on comes first:

> db.foo.ensureIndex({"user":1,"age":1})
> db.foo.find({"age":{$gt:10,$lt:35}}).hint({"user":1,"age":1}).sort({"user":-1}).explain("executionStats")["executionStats"]["executionTimeMillis"]
> 1663

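Hmm, why does it still take 1.6 s? The key point is that in practice a query rarely fetches the entire result set; usually it only needs the first few results. In that case: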

> db.foo.find({"age":{$gt:10,$lt:35}}).hint({"user":1,"age":1}).limit(1500).sort({"user":-1}).explain("executionStats")["executionStats"]["executionTimeMillis"]
> 11

// For comparison, hint the index with the query key ("age") first:
> db.foo.find({"age":{$gt:10,$lt:35}}).hint({"age":1,"user":1}).limit(1500).sort({"user":-1}).explain("executionStats")["executionStats"]["executionTimeMillis"]
> 688

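Clearly, when only part of the result is needed, using the index with the sort key first gives a huge speedup. How to speed up a query that returns the entire result set remains to be investigated.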
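Why the limit makes such a difference can be sketched with a toy model in plain JavaScript. Everything here is an assumption made for illustration: 1000 synthetic documents, a limit of 15, and two hypothetical functions standing in for the two index choices. It counts entries examined rather than measuring time.

```javascript
// 1000 synthetic documents; ages cycle through 0..99 in a scrambled order.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ user: "user: " + String(i).padStart(4, "0"), age: (i * 37) % 100 });
}

const inRange = (d) => d.age > 10 && d.age < 35;
const cmpUser = (a, b) => (a.user < b.user ? -1 : a.user > b.user ? 1 : 0);

// Plan A: index ordered by user (sort key first). Walk it backwards, filter by
// age, and stop as soon as `limit` matches are found.
function sortKeyFirst(docs, limit) {
  const index = docs.slice().sort(cmpUser);
  const out = [];
  let examined = 0;
  for (let i = index.length - 1; i >= 0 && out.length < limit; i--) {
    examined++;
    if (inRange(index[i])) out.push(index[i]);
  }
  return { out, examined };
}

// Plan B: index ordered by age (query key first). Every match in the range must
// be collected and sorted by user before the first `limit` results are known.
function queryKeyFirst(docs, limit) {
  const matches = docs.filter(inRange); // stands in for the index range scan
  const examined = matches.length;
  const out = matches.sort((a, b) => cmpUser(b, a)).slice(0, limit);
  return { out, examined };
}

const a = sortKeyFirst(docs, 15);
const b = queryKeyFirst(docs, 15);
console.log("sort key first examined:", a.examined);
console.log("query key first examined:", b.examined);
```

Both plans return the same 15 documents, but the first stops early while the second has to process every match in the range before it can apply the limit.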

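Sorting in multiple directions

If the application needs results ordered by sort({"age":1,"user":-1}), none of the indexes above is efficient any more.
As explained earlier, ensureIndex({"age":1,"user":1}) orders entries by "age" ascending and, within the same "age", by "user" ascending.
To match sort({"age":1,"user":-1}), add another index: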

db.foo.ensureIndex({"age":1,"user":-1})

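Index direction only matters when sorting on multiple keys in different directions.
For a single-direction sort such as sort({"user":-1}) the index direction is irrelevant, because for a single-key sort MongoDB can simply read the index in the opposite direction.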
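A small plain-JavaScript sketch shows why reversing an index cannot produce a mixed-direction order. It is a toy model, not MongoDB internals; `comparatorFor` is a hypothetical helper that turns a key pattern into a comparator.

```javascript
const cmpStr = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Hypothetical helper: build a comparator from a key pattern like {age: 1, user: -1}.
function comparatorFor(pattern) {
  const keys = Object.keys(pattern);
  return (a, b) => {
    for (const k of keys) {
      const c = typeof a[k] === "number" ? a[k] - b[k] : cmpStr(a[k], b[k]);
      if (c !== 0) return c * pattern[k]; // flip the result for direction -1
    }
    return 0;
  };
}

const entries = [
  { age: 20, user: "b" },
  { age: 10, user: "a" },
  { age: 20, user: "a" },
  { age: 10, user: "b" },
];
const show = (arr) => arr.map((e) => e.age + e.user).join(" ");

const ascAsc = entries.slice().sort(comparatorFor({ age: 1, user: 1 }));
const reversed = ascAsc.slice().reverse();
const ascDesc = entries.slice().sort(comparatorFor({ age: 1, user: -1 }));

console.log(show(ascAsc));   // 10a 10b 20a 20b
console.log(show(reversed)); // 20b 20a 10b 10a  (same as {age: -1, user: -1})
console.log(show(ascDesc));  // 10b 10a 20b 20a  (not reachable by reversing)
```

Reading the {"age":1,"user":1} order backwards flips both keys at once, so it can serve {"age":-1,"user":-1} but never {"age":1,"user":-1}.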
