Database --- MongoDB Slow Query Optimization in Practice

Symptom

The upgrade-path query in the SIO service was slow, taking more than 5 s.

Diagnosis

The endpoint runs two database queries:

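// Query the total count. Service code (Python, mongoengine):
len(list(UpgradePathDashboard.objects.aggregate(*pipeline_all)))
// Equivalent mongo shell query, run with explain: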
db.getCollection('upgrade_path_dashboard').explain('allPlansExecution').aggregate(
    [{'$match': 
        {'$or': [
            {'src_version': {'$regex': '^(6.1.3)', '$options': '-i'}}, 
            {'tar_version': {'$regex': '^(6.1.3)', '$options': '-i'}}]}}]
    );
    
// Query one page of data (mongo shell, run with explain)
db.getCollection('upgrade_path_dashboard').explain('allPlansExecution').aggregate(
    [
    {'$match': {'$or': 
        [
        {'src_version': {'$regex': '^(6.1.3)', '$options': '-i'}}, 
        {'tar_version': {'$regex': '^(6.1.3)', '$options': '-i'}}]}}, 
    {'$sort': {'record_time': -1}}, {'$skip': 0}, {'$limit': 20}
    ]
    );
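Both queries share the same $match and differ only in the stages after it. A minimal sketch that builds the shared $match and the page pipeline once (the helper `build_upgrade_path_pipelines` and its `page`/`page_size` parameters are hypothetical, not from the service code). One deliberate deviation, labeled as an assumption: the version is passed through `re.escape`, so each `.` matches a literal dot (in `^(6.1.3)` a `.` matches any character), and the case-insensitive option is dropped because version strings are digits. According to the MongoDB `$regex` documentation, a case-sensitive regex anchored with `^` and followed by plain characters is a "prefix expression" that can narrow the index scan, while case-insensitive regexes generally cannot use indexes effectively; that is consistent with the `["", {})` (entire string range) bounds in the explain output further down. Verify any such change with explain.

```python
import re


def build_upgrade_path_pipelines(version_prefix, page=1, page_size=20):
    """Return (match_stage, page_pipeline) built from one shared $match.

    Hypothetical helper: the name, parameters and defaults are assumptions.
    """
    # re.escape turns "6.1.3" into "6\.1\.3" so each dot matches a literal dot;
    # the leading "^" anchors the pattern, and omitting "i" keeps it case-sensitive.
    pattern = "^" + re.escape(version_prefix)
    match_stage = {
        "$match": {
            "$or": [
                {"src_version": {"$regex": pattern}},
                {"tar_version": {"$regex": pattern}},
            ]
        }
    }
    # Same shape as the original page query: newest first, then skip/limit.
    page_pipeline = [
        match_stage,
        {"$sort": {"record_time": -1}},
        {"$skip": (page - 1) * page_size},
        {"$limit": page_size},
    ]
    return match_stage, page_pipeline
```

With mongoengine, the page pipeline would then be passed to `UpgradePathDashboard.objects.aggregate(...)` as in the original code.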

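First, inspecting the collection showed that src_version and tar_version, the two fields used in $match, had no indexes, so the first step was to add them.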
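From Python, this step could look like the sketch below. It assumes a pymongo `Collection`, whose `create_index` takes a list of `(field, direction)` pairs and returns the index name; `ensure_version_indexes` and `default_index_name` are hypothetical helpers. MongoDB's default index name joins each field with its direction, which is where the names `src_version_1` and `tar_version_1` in the explain output further down come from.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def default_index_name(keys):
    """Mimic MongoDB's default index name: "field_direction" pairs joined by "_"."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)


def ensure_version_indexes(collection):
    """Create single-field indexes on the two fields used in $match.

    `collection` is assumed to be a pymongo Collection (or any object with a
    compatible create_index method). Returns the index names it reports.
    """
    names = []
    for field in ("src_version", "tar_version"):
        names.append(collection.create_index([(field, ASCENDING)]))
    return names
```

Creating an index that already exists with the same keys and options is a no-op on the server, so a helper like this can safely run at service startup.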

Index basics

Unique indexes

Uniqueness is a property of an index: it guarantees that the indexed values never repeat within the collection. The uniqueness check runs on every insert and update.

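db.containers.createIndex({name: 1},{unique:true, background: true})
db.packages.createIndex({ appId: 1, version: 1 },{unique:true, background: true})
1 means ascending order, -1 means descending order.
In foreground mode, the build blocks all reads and writes on the data until the index is complete.
In background mode, reads and writes are not blocked: a separate background thread builds the index asynchronously while the data stays readable and writable. This mode used to be the recommended one for creating indexes. Starting with MongoDB 4.2 the background option is ignored, and every index build uses an optimized process that holds an exclusive lock only at the beginning and end of the build.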
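To make the uniqueness check concrete, here is a toy in-memory model (not MongoDB code; `UniqueIndex` and its methods are hypothetical) of what a unique compound index on `{appId: 1, version: 1}` enforces: the combination of values must be unique, so the same appId may appear with different versions, and the check runs on both insert and update.

```python
class DuplicateKeyError(Exception):
    """Stand-in for pymongo.errors.DuplicateKeyError (server error E11000)."""


class UniqueIndex:
    """Toy model of a unique (possibly compound) index."""

    def __init__(self, *fields):
        self.fields = fields
        self.entries = {}  # index key tuple -> document _id

    def key(self, doc):
        # A missing field is indexed as null (None here), so only one
        # document may lack it in a non-sparse unique index.
        return tuple(doc.get(f) for f in self.fields)

    def insert(self, doc):
        k = self.key(doc)
        if k in self.entries:
            raise DuplicateKeyError(f"duplicate key {k}")
        self.entries[k] = doc["_id"]

    def update(self, old_doc, new_doc):
        # An update must pass the same check before the old key is replaced.
        old_k, new_k = self.key(old_doc), self.key(new_doc)
        if new_k != old_k and new_k in self.entries:
            raise DuplicateKeyError(f"duplicate key {new_k}")
        del self.entries[old_k]
        self.entries[new_k] = new_doc["_id"]

    def try_insert(self, doc):
        """Return True if inserted, False if the uniqueness check rejected it."""
        try:
            self.insert(doc)
            return True
        except DuplicateKeyError:
            return False
```

For example, after inserting `{"appId": "a", "version": "1.0"}`, inserting `{"appId": "a", "version": "2.0"}` succeeds, but a second `{"appId": "a", "version": "1.0"}` is rejected.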

Compound indexes

A compound index combines several keys into one index; its purpose is to speed up queries that match on multiple keys.
db.flights.createIndex({ flight: 1, price: 1 },{background: true})
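With this index, entries are sorted first by flight number and then, within each flight number, by price in ascending order. A query on a flight plus a price range can therefore scan exactly the matching index entries (4 of them in the original example), and those entries are already in order, so no extra sort is needed.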
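A toy model of why this works, with a sorted Python list standing in for the index B-tree (the flight numbers and prices are made up): because entries are ordered by `(flight, price)`, an equality on flight plus a price range maps to one contiguous slice of the index, found with binary search, and the slice is already in price order.

```python
from bisect import bisect_left, bisect_right

# Made-up documents; _id plays the role of the record location an IXSCAN returns.
docs = [
    {"_id": 1, "flight": "MU5101", "price": 1200},
    {"_id": 2, "flight": "CA1501", "price": 980},
    {"_id": 3, "flight": "MU5101", "price": 860},
    {"_id": 4, "flight": "MU5101", "price": 1530},
    {"_id": 5, "flight": "CA1501", "price": 1100},
    {"_id": 6, "flight": "MU5101", "price": 990},
]

# Index {flight: 1, price: 1}: keys sorted by flight first, then by price.
index = sorted((d["flight"], d["price"], d["_id"]) for d in docs)


def scan(flight, min_price, max_price):
    """Return _ids where flight matches and min_price <= price <= max_price.

    Two binary searches find the start and end of one contiguous key range,
    so only matching entries are touched and the result is already sorted.
    """
    lo = bisect_left(index, (flight, min_price))
    hi = bisect_right(index, (flight, max_price, float("inf")))
    return [entry[2] for entry in index[lo:hi]]
```

For instance, `scan("MU5101", 900, 1500)` touches only the two entries priced 990 and 1200, in that order.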

Indexes on embedded documents

An index on a field of an embedded document is created exactly the same way as a normal index:
db.personInfos.createIndex({"address.city": 1})
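Indexing the embedded document itself (address) is completely different from indexing one of its fields (address.city). An index on the whole embedded document is only used when a query matches the whole document exactly. For example, after db.personInfos.createIndex({"address": 1}), only a full match such as db.personInfos.find({"address": {"province": "xxx", "city": "xxx", "district": "xxx"}}) can use the index (the fields must also be in the same order as in the stored document); db.personInfos.find({"address.city": "xxx"}) will not use it.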
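A toy model of the two key-extraction rules (illustrative only; `index_key`, `build_index` and `seek` are hypothetical helpers, the addresses are made up, and a tuple of `(field, value)` pairs stands in for MongoDB's order-sensitive comparison of embedded documents). An index on `address` stores the whole subdocument as its key, so it has no entry keyed by a city string; an index on `address.city` stores just the city.

```python
def index_key(doc, path):
    """Return the key an index on `path` would store for `doc` (toy model).

    Dotted paths descend into embedded documents. A whole embedded document
    becomes a tuple of (field, value) pairs, so field order matters.
    """
    value = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # a missing field is indexed as null
        value = value[part]
    return tuple(value.items()) if isinstance(value, dict) else value


def build_index(docs, path):
    """Map each index key to the _ids of the documents that have it."""
    index = {}
    for doc in docs:
        index.setdefault(index_key(doc, path), []).append(doc["_id"])
    return index


def seek(index, value):
    """Equality lookup; a subdocument value is converted like the stored keys."""
    key = tuple(value.items()) if isinstance(value, dict) else value
    return index.get(key, [])


people = [
    {"_id": 1, "address": {"province": "GD", "city": "Shenzhen", "district": "Nanshan"}},
    {"_id": 2, "address": {"province": "ZJ", "city": "Hangzhou", "district": "Xihu"}},
]
by_address = build_index(people, "address")    # like createIndex({"address": 1})
by_city = build_index(people, "address.city")  # like createIndex({"address.city": 1})
```

Seeking `by_address` with the city string, or with the same subdocument in a different field order, finds nothing, which mirrors why such queries cannot use the whole-document index.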

Other index types

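TTL (expiring) index: defined on a date field, it makes documents expire after a specified time.
Hashed index: indexes the hash of a field's value. A hashed index only supports exact-match queries on the field, not range queries.
Text index: supports fast text search. For example, a logging platform that needs keyword search over its logs would be very slow using regular expressions; a text index handles this case efficiently.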
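A sketch of how these three index types could be declared from Python, assuming a pymongo `Collection`; the field names (`created_at`, `user_id`, `message`), the helper name and the one-hour TTL are hypothetical. In pymongo the index type takes the place of the direction (`"hashed"` and `"text"` are the values of `pymongo.HASHED` and `pymongo.TEXT`), and `expireAfterSeconds` is passed through as an index option.

```python
HASHED = "hashed"  # same value as pymongo.HASHED
TEXT = "text"      # same value as pymongo.TEXT


def create_special_indexes(collection, ttl_seconds=3600):
    """Create a TTL, a hashed and a text index (field names are illustrative)."""
    return [
        # TTL: the server's background TTL task (runs about every 60 s) deletes
        # documents whose created_at is older than ttl_seconds; the field must
        # hold a date (a datetime in Python).
        collection.create_index([("created_at", 1)], expireAfterSeconds=ttl_seconds),
        # Hashed: supports equality matches on user_id, not range queries.
        collection.create_index([("user_id", HASHED)]),
        # Text: supports $text keyword search on message.
        collection.create_index([("message", TEXT)]),
    ]
```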

Pros and cons of indexes

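Pros: less data scanned, avoiding the cost of full collection scans; less in-memory computation, for example in grouping queries; data constraints, such as uniqueness (unique indexes) and expiry (TTL indexes).
Cons: more storage, since index data takes extra space; more expensive writes, since every insert, delete and update must also maintain the index data; memory dependence, since indexes occupy precious memory.
Although indexes are persisted on disk, they must be cached in memory to be fast. When memory cannot hold the indexes, data is swapped between memory and disk, which severely degrades index performance.
Indexing every field does not make queries faster. Instead, every write has to update a large number of indexes, and a query with several conditions usually uses only one index anyway (the exception is $or, where each clause can use its own index, as the explain output below shows).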
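One rough way to check the memory concern is to compare a collection's index sizes with the WiredTiger cache size. The sketch only does the arithmetic on values already fetched: `stats` is assumed to be the document returned by the collStats command (in pymongo, for example, `db.command("collStats", "upgrade_path_dashboard")`), which reports `totalIndexSize` and per-index `indexSizes` in bytes, and `cache_bytes` is assumed to come from `serverStatus` (`wiredTiger.cache["maximum bytes configured"]`). The helper name is hypothetical, and the cache also holds data, so a large share for indexes alone is only a warning sign, not a hard limit.

```python
def index_memory_report(stats, cache_bytes):
    """Summarize index sizes from a collStats result (sizes in bytes).

    Returns (total_index_bytes, share_of_cache, [(index_name, bytes), ...])
    with the largest index first.
    """
    sizes = stats.get("indexSizes", {})
    total = stats.get("totalIndexSize", sum(sizes.values()))
    largest_first = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return total, total / cache_bytes, largest_first
```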

Analyzing query plans with explain

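explain presents the query plan as a tree of stages. Each stage passes its results up to its parent, so always read the stage tree from the innermost level outward.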

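Many stages can appear in a query plan. The most common ones and what they mean:
COLLSCAN: full collection scan
IXSCAN: index scan
FETCH: retrieve the full documents at the locations found by the preceding scan
SORT: in-memory sort of the results
SORT_KEY_GENERATOR: extract the sort key from each document
LIMIT: limit the number of results (limit)
SKIP: skip results (skip)
IDHACK: lookup by _id
COUNTSCAN: stage reported when count does not use an index
COUNT_SCAN: stage reported when count uses an index
TEXT: stage reported when querying with a text index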
// The winningPlan section of the query plan
"winningPlan": {
    "stage": "FETCH",                                            // 5. Fetch the full documents for the index entries found by the inner stage
    "filter": {                                                  // 6. Then filter the fetched documents on createdAt
        "createdAt": {
            "$gte": ISODate("2019-07-22T12:00:44.000Z")
        }
    },
    "inputStage": {                                               // 1. Each stage passes its results to its parent, so read explain from the inside out
        "stage": "IXSCAN",                                    // 2. IXSCAN: this stage scans an index
        "keyPattern": {
            "load": 1                                     // 3. The index used is {load: 1}
        },
        "indexName": "load_1",
        "isMultiKey": false,
        "multiKeyPaths": {
            "load": []
        },
        "isUnique": false,
        "isSparse": false,
        "isPartial": false,
        "indexVersion": 2,
        "direction": "backward",
        "indexBounds": {
            "load": [
                "[MaxKey, MinKey]"                      // 4. Index bounds: the full key range, read backward
            ]
        }
    }
},
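Reading the tree from the inside out can be automated. A small sketch (the function name `stages_inside_out` is hypothetical) that follows `inputStage` and `inputStages` (the latter appears under an OR stage) in a winningPlan dict and lists the stage names from the innermost stage, where the data originates, to the outermost; for the plan above it yields IXSCAN, then FETCH.

```python
def stages_inside_out(plan):
    """List the stage names of an explain plan, innermost first.

    Follows "inputStage" (one child) and "inputStages" (several children,
    e.g. under an OR stage). Children are always listed before their parent.
    """
    children = []
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    children.extend(plan.get("inputStages", []))
    order = []
    for child in children:
        order.extend(stages_inside_out(child))
    order.append(plan["stage"])
    return order
```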
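Stage combinations you most want to see:
FETCH + IDHACK
FETCH + IXSCAN
LIMIT + (FETCH + IXSCAN)
PROJECTION + IXSCAN
Stage combinations you least want to see:
COLLSCAN (full collection scan)
SORT (sort with no supporting index)
COUNTSCAN (count without an index)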
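The "least wanted" list can be turned into a quick automated check. A sketch, assuming `plan` is the winningPlan part of an explain result already parsed into a dict; `plan_warnings`, `all_stages` and the warning texts are hypothetical. It walks every stage in the tree and reports the unwanted ones.

```python
UNWANTED = {
    "COLLSCAN": "full collection scan",
    "SORT": "in-memory sort (no index supports the sort)",
    "COUNTSCAN": "count without an index",
}


def all_stages(plan):
    """Yield every stage name in the plan tree, parent before children."""
    yield plan["stage"]
    if "inputStage" in plan:
        yield from all_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):
        yield from all_stages(child)


def plan_warnings(plan):
    """Return one warning string per unwanted stage found in the plan."""
    return [f"{s}: {UNWANTED[s]}" for s in all_stages(plan) if s in UNWANTED]
```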

Inefficient queries

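$where and $exists: $where cannot use indexes at all, and $exists generally cannot use them effectively.
$ne and $not: negation and "not equal" can use an index in principle, but very inefficiently, and they often degrade into a full collection scan.
$nin (not in): this operator usually ends up scanning the whole collection as well.
Indexes in aggregation pipelines are also easy to lose. As a rule of thumb, only $match and $sort stages at the very beginning of the pipeline can use indexes; once a $project, $group, $lookup or $unwind stage has run, the stages after it cannot use indexes at all. (The pipeline optimizer can sometimes move a $match ahead of a $project on its own, but it is safer not to rely on that.)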
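The pipeline rule of thumb can be checked mechanically. A rough sketch (`check_pipeline` and `stage_name` are hypothetical and deliberately ignore the optimizer's own stage reordering): it returns the leading `$match`/`$sort` stages, the ones eligible for an index, and lists any `$match` or `$sort` that appears after a `$project`, `$group`, `$lookup` or `$unwind` stage.

```python
BARRIERS = {"$project", "$group", "$lookup", "$unwind"}
INDEXABLE = {"$match", "$sort"}


def stage_name(stage):
    """Each pipeline stage is a one-key dict such as {"$match": {...}}."""
    return next(iter(stage))


def check_pipeline(pipeline):
    """Return (leading index-eligible stage names, late [(position, name)]).

    Toy rule: only $match/$sort at the very start can use an index; once a
    barrier stage has run, later $match/$sort stages work without indexes.
    """
    names = [stage_name(s) for s in pipeline]
    leading = []
    for name in names:
        if name not in INDEXABLE:
            break
        leading.append(name)
    late = []
    seen_barrier = False
    for position, name in enumerate(names):
        if name in BARRIERS:
            seen_barrier = True
        elif seen_barrier and name in INDEXABLE:
            late.append((position, name))
    return leading, late
```

Applied to the page query above ($match, $sort, $skip, $limit), it reports both $match and $sort as index-eligible and no late stages.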

Diagnosing with explain

db.getCollection('upgrade_path_dashboard').explain('allPlansExecution').aggregate(
    [{'$match': 
        {'$or': [
            {'src_version': {'$regex': '^(6.1.3)', '$options': '-i'}}, 
            {'tar_version': {'$regex': '^(6.1.3)', '$options': '-i'}}]}}]
    );

"winningPlan" : {
                        "stage" : "SUBPLAN",
                        "inputStage" : {
                            "stage" : "FETCH",
                            "inputStage" : {
                                "stage" : "OR",
                                "inputStages" : [ 
                                    {
                                        "stage" : "IXSCAN",
                                        "filter" : {
                                            "$or" : [ 
                                                {
                                                    "tar_version" : {
                                                        "$regex" : "^(6.1.3)",
                                                        "$options" : "-i"
                                                    }
                                                }
                                            ]
                                        },
                                        "keyPattern" : {
                                            "tar_version" : 1.0
                                        },
                                        "indexName" : "tar_version_1",
                                        "isMultiKey" : false,
                                        "multiKeyPaths" : {
                                            "tar_version" : []
                                        },
                                        "isUnique" : false,
                                        "isSparse" : false,
                                        "isPartial" : false,
                                        "indexVersion" : 2,
                                        "direction" : "forward",
                                        "indexBounds" : {
                                            "tar_version" : [ 
                                                "[\"\", {})", 
                                                "[/^(6.1.3)/-i, /^(6.1.3)/-i]"
                                            ]
                                        }
                                    }, 
                                    {
                                        "stage" : "IXSCAN",
                                        "filter" : {
                                            "$or" : [ 
                                                {
                                                    "src_version" : {
                                                        "$regex" : "^(6.1.3)",
                                                        "$options" : "-i"
                                                    }
                                                }
                                            ]
                                        },
                                        "keyPattern" : {
                                            "src_version" : 1.0
                                        },
                                        "indexName" : "src_version_1",
                                        "isMultiKey" : false,
                                        "multiKeyPaths" : {
                                            "src_version" : []
                                        },
                                        "isUnique" : false,
                                        "isSparse" : false,
                                        "isPartial" : false,
                                        "indexVersion" : 2,
                                        "direction" : "forward",
                                        "indexBounds" : {
                                            "src_version" : [ 
                                                "[\"\", {})", 
                                                "[/^(6.1.3)/-i, /^(6.1.3)/-i]"
                                            ]
                                        }
                                    }
                                ]
                            }
                        }
                    },
                    "rejectedPlans" : []
                },
                
db.getCollection('upgrade_path_dashboard').explain('allPlansExecution').aggregate(
    [
    {'$match': {'$or': 
        [
        {'src_version': {'$regex': '^(6.1.3)', '$options': '-i'}}, 
        {'tar_version': {'$regex': '^(6.1.3)', '$options': '-i'}}]}}, 
    {'$sort': {'record_time': -1}}, {'$skip': 0}, {'$limit': 20}
    ]
    );

"winningPlan" : {
                        "stage" : "SUBPLAN",
                        "inputStage" : {
                            "stage" : "FETCH",
                            "inputStage" : {
                                "stage" : "OR",
                                "inputStages" : [ 
                                    {
                                        "stage" : "IXSCAN",
                                        "filter" : {
                                            "$or" : [ 
                                                {
                                                    "tar_version" : {
                                                        "$regex" : "^(6.1.3)",
                                                        "$options" : "-i"
                                                    }
                                                }
                                            ]
                                        },
                                        "keyPattern" : {
                                            "tar_version" : 1.0
                                        },
                                        "indexName" : "tar_version_1",
                                        "isMultiKey" : false,
                                        "multiKeyPaths" : {
                                            "tar_version" : []
                                        },
                                        "isUnique" : false,
                                        "isSparse" : false,
                                        "isPartial" : false,
                                        "indexVersion" : 2,
                                        "direction" : "forward",
                                        "indexBounds" : {
                                            "tar_version" : [ 
                                                "[\"\", {})", 
                                                "[/^(6.1.3)/-i, /^(6.1.3)/-i]"
                                            ]
                                        }
                                    }, 
                                    {
                                        "stage" : "IXSCAN",
                                        "filter" : {
                                            "$or" : [ 
                                                {
                                                    "src_version" : {
                                                        "$regex" : "^(6.1.3)",
                                                        "$options" : "-i"
                                                    }
                                                }
                                            ]
                                        },
                                        "keyPattern" : {
                                            "src_version" : 1.0
                                        },
                                        "indexName" : "src_version_1",
                                        "isMultiKey" : false,
                                        "multiKeyPaths" : {
                                            "src_version" : []
                                        },
                                        "isUnique" : false,
                                        "isSparse" : false,
                                        "isPartial" : false,
                                        "indexVersion" : 2,
                                        "direction" : "forward",
                                        "indexBounds" : {
                                            "src_version" : [ 
                                                "[\"\", {})", 
                                                "[/^(6.1.3)/-i, /^(6.1.3)/-i]"
                                            ]
                                        }
                                    }
                                ]
                            }
                        }
                    },

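Both queries use the indexes (an OR stage combining IXSCANs on src_version_1 and tar_version_1) and take 114 ms and 90 ms respectively, which is not slow. That pointed to the list() call: len(list(...)) transfers every matching document from the database and decodes each one into a Python object, only to count them. Appending a {"$count": "total"} stage to the pipeline makes MongoDB return the total directly:
# total = len(list(UpgradePathDashboard.objects.aggregate(pipeline_all)))  # before
result = list(UpgradePathDashboard.objects.aggregate(pipeline_all + [{'$count': 'total'}]))
total = result[0]['total'] if result else 0  # $count returns no document when nothing matches
After the change, the response time dropped from 8 s to 1 s.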
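The same fix with plain pymongo, as a minimal sketch: `count_matching` is a hypothetical helper and `collection` is assumed to be a pymongo `Collection`. The server runs `$match` plus `$count` and sends back a single small document instead of every matching document. Because `$count` produces no document at all when nothing matches, the helper falls back to 0. For a plain filter, pymongo's built-in `Collection.count_documents(filter)` is an alternative.

```python
def count_matching(collection, match):
    """Count documents matching `match` on the server.

    `collection` is assumed to be a pymongo Collection: aggregate() returns
    a cursor, and the $count stage yields at most one document.
    """
    cursor = collection.aggregate([{"$match": match}, {"$count": "total"}])
    first = next(iter(cursor), None)  # None when no document matched
    return first["total"] if first else 0
```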