Aggregating Nearby Documents in Distance Order with MongoDB's 2dsphere Geospatial Index

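This post implements a paginated back-end query endpoint on MongoDB that returns the documents near a given coordinate, together with the computed distance to each one. It works like a map app that looks up places around your current position and sorts them from nearest to farthest.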

The efficient way to implement this is to use MongoDB's built-in 2dsphere geospatial index.

Storing the coordinate field as a GeoJSON object

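To use this index, the coordinate field must be stored as a GeoJSON object, and a 2dsphere index must be created on that field. GeoJSON lists longitude first, then latitude. A Point field looks like this: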

location: {
      type: "Point",
      coordinates: [-73.856077, 40.848447]
}

For storing lines, polygons and other geometry types, see the official documentation: https://docs.mongodb.com/manual/reference/geojson/
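Because GeoJSON puts longitude before latitude, swapped coordinates are an easy mistake to make when writing documents. The following is a minimal sketch of a helper that builds the Point object and checks the value ranges. The name make_geojson_point is hypothetical and not part of MongoDB or pymongo.

```python
def make_geojson_point(longitude, latitude):
    """Build a GeoJSON Point dict; GeoJSON order is [longitude, latitude]."""
    lon, lat = float(longitude), float(latitude)
    # Valid ranges: longitude in [-180, 180], latitude in [-90, 90].
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range: %r" % lon)
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range: %r" % lat)
    return {"type": "Point", "coordinates": [lon, lat]}


# Same point as the example above.
location = make_geojson_point(-73.856077, 40.848447)
```

A range check only catches some swaps: for a point such as (106.704063, 29.867705), passing the values in the wrong order raises ValueError, because 106.7 is not a valid latitude, but a swap where both values lie within ±90 goes unnoticed.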

Using the $geoNear aggregation stage to find and sort nearby documents and return the computed distance

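To query nearby documents through the geospatial index and also get a distance field back, use a MongoDB aggregate query with $geoNear as the first stage of the pipeline. $geoNear is only valid as the first stage.
For example: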

db.places.aggregate([
   {
     $geoNear: {
        near: { type: "Point", coordinates: [ -73.99279, 40.719296]},
        distanceField: "dist.calculated",
        maxDistance: 2,
        query: { category: "Parks" },
        includeLocs: "dist.location",
        spherical: true
     }
   }
])

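The fields mean the following:

near: the point to measure distances from; think of it as "my location" in a map app;
distanceField: the name of the output field that holds the computed distance;
maxDistance: the maximum distance from near; in meters when near is a GeoJSON point, in radians when it is a legacy coordinate pair;
query: a filter on the documents, similar to a $match stage, written in ordinary MongoDB query syntax;
includeLocs: the name of the output field that holds the document's location used for the distance calculation;
spherical: when true, distances are calculated with spherical geometry; a 2dsphere index uses spherical geometry either way.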

For more detail, see the official documentation: https://docs.mongodb.com/manual/reference/operator/aggregation/geoNear/#pipe._S_geoNear
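The field list above can be turned into a small stage builder. This is only a sketch: build_geo_near_stage and its parameter names are hypothetical, and it assumes near is always a GeoJSON point, so maxDistance is given in meters.

```python
def build_geo_near_stage(longitude, latitude, max_distance_m=None, query=None):
    """Return a $geoNear stage as a plain dict, ready to put first in a pipeline."""
    stage = {
        "near": {"type": "Point", "coordinates": [float(longitude), float(latitude)]},
        "distanceField": "dist.calculated",
        "spherical": True,
    }
    # Only add optional fields that were actually requested.
    if max_distance_m is not None:
        stage["maxDistance"] = float(max_distance_m)  # meters for a GeoJSON point
    if query:
        stage["query"] = query
    return {"$geoNear": stage}


# Parks within 2 km of the example point.
pipeline = [
    build_geo_near_stage(-73.99279, 40.719296,
                         max_distance_m=2000,
                         query={"category": "Parks"})
]
```

With pymongo, the resulting list could then be passed to db.places.aggregate(pipeline).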

Project code example

1. Inserting sample documents

First insert a few mock documents into the database:

[
{
    "_id": {
        "$oid": "5f43b73e5a25bb14ec753a23"
    },
    "area_name": "测试区域1",
    "company_name": "科技有限公司",
    "geo_point": {
        "type": "Point",
        "coordinates": [106.704063, 29.867705]
    }
},
{
    "_id": {
        "$oid": "5f43b815c515d315de2ae744"
    },
    "area_name": "测试区域1",
    "company_name": "农业责任公司",
    "geo_point": {
        "type": "Point",
        "coordinates": [106.704064, 29.867708]
    }
},
{
    "_id": {
        "$oid": "5f43b824c515d315de2ae745"
    },
    "area_name": "测试区域2",
    "company_name": "科技贸易有限公司",
    "geo_point": {
        "type": "Point",
        "coordinates": [106.704055, 29.867711]
    }
},
{
    "_id": {
        "$oid": "5f43b834c515d315de2ae746"
    },
    "area_name": "测试区域2",
    "company_name": "金融有限公司",
    "geo_point": {
        "type": "Point",
        "coordinates": [106.704052, 29.867718]
    }
}
]

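2. Creating the indexes

Besides the geospatial index, the endpoint filters on several other fields, so this example also creates indexes on those fields.
Because this is only a demo, the MongoDB connection is set up in the simplest possible way for convenience.

Indexes can be created by hand or in code; here they are created in code: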

import pymongo
from pymongo import MongoClient, IndexModel

uri = "mongodb://%s:%s@%s" % ("admin", "admin", "localhost:27017")
client = MongoClient(uri, connect=False)
db = client["test"]
coll = db["company_detail"]


def create_all_indexes():
    indexes = [
        IndexModel([("geo_point", "2dsphere")], background=True),
        IndexModel([("area_name", pymongo.DESCENDING)], background=True),
        IndexModel([("company_name", pymongo.DESCENDING)], unique=True, background=True)
    ]
    create_result = coll.create_indexes(indexes)
    return create_result

3. Querying nearby companies in MongoDB

The function below runs the geospatial query with pagination. The code is easier to follow than a prose description:


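def company_query(current_geo, area_name=None, name_kw=None, page_no=1, page_size=10):
    """
    Aggregate the nearby companies that match the given filters.
    :param current_geo: list: the user's current position as [longitude, latitude], e.g. [106.704063, 29.867705]
    :param area_name: name of the area the company belongs to
    :param name_kw: keyword for fuzzy matching on the company name (longer than 1 character)
    :param page_no: page number, starting at 1
    :param page_size: page size
    :return: tuple: (total number of matching documents, list of nearby documents for the requested page)
    """
    # Match conditions; an empty dict matches everything
    match = {}
    if area_name:
        match.update({"area_name": area_name})
    if name_kw:
        # name_kw is passed to $regex as-is, so it is treated as a regular expression
        addr_query = {"company_name": {"$regex": name_kw}}
        match.update(addr_query)

    # Geospatial query: returns the distance and sorts by it in ascending order
    near_match = {
        "$geoNear": {
            "near": {"type": "Point", "coordinates": current_geo},
            "distanceField": "distance",
            "includeLocs": "location",
            "query": match,
            "spherical": True
        }
    }

    # Drop fields the caller does not need
    project = {
        "$project": {
            "_id": 0,
            "geo_point": 0
        }
    }
    # Pagination
    skip = {"$skip": (int(page_no) - 1) * int(page_size)}
    limit = {"$limit": int(page_size)}

    # Pipeline 1: total number of matching documents
    total_pipeline = [{"$match": match}, {"$count": "total"}]
    # Pipeline 2: matching nearby companies for the requested page (with distance and coordinates)
    near_pipeline = [near_match, project, skip, limit]

    # Run the two pipelines in turn
    total_match_result = list(coll.aggregate(total_pipeline))
    if not total_match_result:
        # $count emits no document when nothing matches; return an empty page
        # instead of falling through and returning None
        return 0, []
    total_match_count = total_match_result[0].get("total")
    page_result = list(coll.aggregate(near_pipeline))
    return total_match_count, page_result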
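One detail of the filter is worth a note: name_kw goes into $regex unchanged, so a keyword containing characters such as "." or "(" is interpreted as a pattern rather than literal text. The sketch below shows one way to build the same match dict with the keyword escaped. build_match is a hypothetical helper, and it assumes that the output of Python's re.escape is acceptable to MongoDB's regex engine for typical keywords.

```python
import re


def build_match(area_name=None, name_kw=None):
    """Build the filter dict used for both the count and the $geoNear query."""
    match = {}
    if area_name:
        match["area_name"] = area_name
    if name_kw:
        # Escape regex metacharacters so the keyword is matched literally.
        match["company_name"] = {"$regex": re.escape(name_kw)}
    return match


match = build_match(area_name="测试区域1", name_kw="A.B(C)")
```

An unanchored $regex like this performs a substring match and, as far as I know, cannot use the company_name index efficiently, so the area_name filter is what keeps the candidate set small.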

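Because the total count before pagination is needed, two aggregations are run: one returns only the total size of the matching result set, and the other returns the requested page of nearby company documents.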
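As an alternative to two round trips, the count and the page can be computed in one aggregation by following $geoNear with a $facet stage. This is not what the code above does; it is a sketch of a different technique, with hypothetical helper names (build_facet_pipeline, unpack_facet_result), and it assumes MongoDB 4.2 or later. As far as I know, earlier versions cap $geoNear at 100 documents by default, which would also cap the count.

```python
def build_facet_pipeline(current_geo, match=None, page_no=1, page_size=10):
    """One pipeline that yields both the total count and the requested page."""
    skip = (int(page_no) - 1) * int(page_size)
    return [
        {"$geoNear": {
            "near": {"type": "Point", "coordinates": current_geo},
            "distanceField": "distance",
            "includeLocs": "location",
            "query": match or {},
            "spherical": True,
        }},
        {"$project": {"_id": 0, "geo_point": 0}},
        # Each $facet branch sees the same distance-sorted input.
        {"$facet": {
            "total": [{"$count": "total"}],
            "page": [{"$skip": skip}, {"$limit": int(page_size)}],
        }},
    ]


def unpack_facet_result(result):
    """result is list(coll.aggregate(pipeline)); $facet emits a single document."""
    doc = result[0] if result else {"total": [], "page": []}
    total = doc["total"][0]["total"] if doc["total"] else 0
    return total, doc["page"]
```

With pymongo this would be called as unpack_facet_result(list(coll.aggregate(build_facet_pipeline(current_geo, match)))). Whether it is faster than two separate pipelines depends on the data, since $geoNear then has to stream every matching document into the count branch, so it is worth measuring before switching.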

4. Test output

A quick test of the query function, appended below it:

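if __name__ == '__main__':
    # Create the indexes
    create_all_indexes()

    current_geo = [106.704223, 29.867201]
    area_name = "测试区域1"
    name_kw = None
    total_count, page_res = company_query(current_geo, area_name, name_kw)
    print("Total number of matching results: %s" % total_count)
    print("Paginated nearby companies: %s" % page_res)

Output:

Total number of matching results: 4
Paginated nearby companies: [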
{'area_name': '测试区域1', 'company_name': '科技有限公司', 'distance': 58.19188972091601, 'location': {'type': 'Point', 'coordinates': [106.704063, 29.867705]}}, 
{'area_name': '测试区域1', 'company_name': '农业责任公司', 'distance': 58.48852829885492, 'location': {'type': 'Point', 'coordinates': [106.704064, 29.867708]}}
]

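Here distance is the field that was named to hold the computed distance, in meters. The aggregation already sorts the result set from nearest to farthest. I also tested against a large collection and the aggregation stayed fast: with suitable indexes, the endpoint responded in about 100 ms, which is adequate for normal use.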
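The distance values can be sanity-checked without a database. The sketch below applies the haversine formula to the query point and the first sample document. It assumes an Earth radius of 6,378.1 km, the figure the MongoDB documentation uses for radian conversions; the exact formula MongoDB uses internally is not confirmed here, and haversine_m is a hypothetical helper name.

```python
import math

# Assumption: equatorial Earth radius in meters, as quoted in the MongoDB docs.
EARTH_RADIUS_M = 6378100.0


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Query point from the test above, and the first document in the output.
d = haversine_m(106.704223, 29.867201, 106.704063, 29.867705)
print(round(d, 2))  # about 58.2, in line with the distance field above
```

Under this assumption the result agrees with the reported 58.19 m to within a few centimeters, which supports reading the field as meters.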

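Summary

There is a lot to learn about using, querying and aggregating with MongoDB's geospatial indexes. For other related needs, I recommend the official documentation: it is detailed and is a quick way to an accurate answer.
Also pay attention to the MongoDB version you are running.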