mongodb聚合操作

一. 聚合操作

聚合操作处理数据记录并返回计算结果(诸如统计平均值,求和等)。聚合操作组值来自多个文档,可以对分组数据执行各种操作以返回单个结果。聚合操作包含三类:单一作用聚合、聚合管道、MapReduce。

  • 单一作用聚合:提供了对常见聚合过程的简单访问,操作都从单个集合聚合文档。
  • 聚合管道是一个数据聚合的框架,模型基于数据处理流水线的概念。文档进入多级管道,将文档转换为聚合结果。
  • MapReduce操作具有两个阶段:处理每个文档并向每个输入文档发射一个或多个对象的map阶
    段,以及reduce组合map操作的输出阶段。

二. 单一作用聚合

MongoDB提供 db.collection.estimatedDocumentCount(), db.collection.count(), db.collection.distinct()这类单一作用的聚合函数。 所有这些操作都聚合来自单个集合的文档。虽然这些操作提供了对公共聚合过程的简单访问,但它们缺乏聚合管道和map-Reduce的灵活性和功能。

函数 描述
db.collection.estimatedDocumentCount() 忽略查询条件,返回集合或视图中所有文档的计数
db.collection.count() 返回与find()集合或视图的查询匹配的文档计数 ,等同于 db.collection.find(query).count()构造
db.collection.distinct() 在单个集合或视图中查找指定字段的不同值,并在数组中返回结果
db.books.estimatedDocumentCount()	# 检索books集合中所有文档的计数
db.books.count({favCount:{$gt:50}})	# 计算与查询匹配的所有文档
db.books.distinct("type")	# 返回不同type的数组
db.books.distinct("type",{favCount:{$gt:90}})  # 返回收藏数大于90的文档不同type的数组

注意:在分片群集上,如果存在孤立文档或正在进行块迁移,则db.collection.count()没有查询谓词可能导致计数不准确。要避免这些情况,请在分片群集上使用 db.collection.aggregate()方法。

三. 聚合管道

3.1 MongnDB聚合框架

MongoDB 聚合框架(Aggregation Framework)是一个计算框架,它可以:

  • 作用在一个或几个集合上
  • 对集合中的数据进行的一系列运算
  • 将这些数据转化为期望的形式

从效果而言,聚合框架相当于 SQL 查询中的GROUP BYLEFT OUTER JOINAS等。

3.2 管道(Pipeline)和阶段(Stage)

整个聚合运算过程称为管道(Pipeline),它是由多个阶段(Stage)组成的, 每个管道:

  • 接受一系列文档(原始数据)
  • 每个阶段对这些文档进行一系列运算
  • 结果文档输出给下一个阶段
    mongodb聚合操作_第1张图片

3.3 聚合管道语法

# pipeline: 一组数据聚合阶段。除$out、$Merge和$geonear阶段之外,每个阶段都可以在管道中出现多次
pipeline = [$stage1, $stage2, ...$stageN]	

# options: 可选,聚合操作的其他参数。包含:查询计划、是否使用临时文件、 游标、最大操作时间、读写策略、强制索引等等
db.collection.aggregate(pipeline, {options})	

常用的聚合管道操作

阶段 描述 SQL等价运算符
$match 筛选条件 WHERE
$project 投影 AS和列过滤
$lookup 左外连接 LEFT OUTER JOIN
$sort 排序 ORDER BY
$group 分组 GROUP BY
$skip/$limit 分页
$unwind 展开数组
$graphLookup 图搜索
$facet/$bucket 分面搜索

数据准备:

var tags = ["nosql","mongodb","document","developer","popular"];
var types = ["technology","sociality","travel","novel","literature"];
var books=[];
for(var i=0;i<50;i++){
    var typeIdx = Math.floor(Math.random()*types.length);
    var tagIdx = Math.floor(Math.random()*tags.length);
    var tagIdx2 = Math.floor(Math.random()*tags.length);
    var favCount = Math.floor(Math.random()*100);
    var username = "xx00"+Math.floor(Math.random()*10);
    var age = 20 + Math.floor(Math.random()*15);
    var book = {
        title: "book-"+i,
        type: types[typeIdx],
        tag: [tags[tagIdx],tags[tagIdx2]],
        favCount: favCount,
        author: {name:username,age:age}
    };
    books.push(book)
}
db.books.insertMany(books);
  • $project
    投影操作,修改输入文档的结构,可以用来重命名、增加或删除域(即字段), 如将集合中的 title 投影成 name。

    db.books.aggregate([{$project:{name: "$title", booktag: "$tag"}}])	# 给title和tag这两列重命名并输出这两列
    db.books.aggregate([{$project:{name: "$title", booktag: "$tag", _id:0, type:1, author:1}}])		# 对指定字段重命名并剔除不需要的字段
    

    从嵌套文档中剔除字段

    db.books.aggregate([{$project:{name:'title', _id:0, tag:1, 'author.name':1}}])
    或者
    db.books.aggregate([{$project:{name:'title', _id:0, tag:1, author:{name:1}}}])
    
  • $match
    $match用于对文档进行筛选,之后可以在得到的文档子集上做聚合,$match可以使用除了地理空间之外的所有常规查询操作符,在实际应用中尽可能将$match放在管道的前面位置。在这里插入代码片这样有两个好处:一是可以快速将不需要的文档过滤掉,以减少管道的工作量;二是如果再投射和分组之前执行$match,查询可以使用索引。

    db.books.aggregate([{$match:{type:"technology"}}])	
    db.books.find({type: "technology"})
    
  • $count

    # 计数并返回与查询匹配的结果数
    > db.books.aggregate([{$match:{type: "technology"}}, {$count: "type_count"}])			
    { "type_count" : 10 }
    

    $match阶段筛选出 type 匹配 technology 的文档,并传到下一阶段;
    $count阶段返回聚合管道中剩余文档的计数,并将该值分配给 type_count

  • $group
    按指定的表达式对文档进行分组,并将每个不同分组的文档输出到下一个阶段。输出文档包含一个_id字段,该字段按键包含不同的组。输出文档还可以包含计算字段,该字段保存由$group的_id字段分组的一些accumulator表达式的值。$group不会输出具体的文档而只是统计信息。

    { $group: { _id: , : {  :  }, ...} }
    
    - _id字段是必填的,是要分组的字段; 但是,可以指定_id值为null来为整个输入文档计算累计值。
    - _id和表达式可以接受任何有效的表达式。
    

    $group阶段的内存限制为100M。默认情况下,如果stage超过此限制,$group将产生错误。但是,要允许处理大型数据集,请将allowDiskUse选项设置为true以启用$group操作以写入临时文件。

    accumulator操作符:

    名称 描述 类比SQL
    $avg 计算均值 avg
    $first 返回每组第一个文档,如果有排序,按照排序,如果没有按照默认的存储的顺序的第一个文档 limit 0, 1
    $last 返回每组最后一个文档,如果有排序,按照排序,如果没有按照默认的存储的顺序的最后个文档 -
    $max 根据分组,获取集合中所有文档对应值得最大值 max
    $min 根据分组,获取集合中所有文档对应值得最小值 min
    $push 将指定的表达式的值添加到一个数组中 -
    $addToSet 将表达式的值添加到一个集合中(无重复值,无序) -
    $sum 计算总和 sum
    $stdDevPop 返回输入值的总体标准偏差 -
    $stdDevSamp 返回输入值的样本标准偏差 -
    # book的数量,收藏总数和平均值
    > db.books.aggregate([{$group: {_id:"$_id", count:{$sum:1}, pop:{$sum: "$favCount"}, avg:{$avg: "$favCount"}}}])
    { "_id" : ObjectId("64852f979fc464aa2dcc2a78"), "count" : 1, "pop" : 33, "avg" : 33 }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a79"), "count" : 1, "pop" : 22, "avg" : 22 }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a76"), "count" : 1, "pop" : 16, "avg" : 16 }
    
    # 每本book的收藏平均数和收藏总数
    > db.books.aggregate([{$group: {_id:"$title", count:{$sum:1}, pop:{$sum: "$favCount"}, avg:{$avg: "$favCount"}}}])
    { "_id" : "book-49", "count" : 1, "pop" : 22, "avg" : 22 }
    { "_id" : "book-44", "count" : 1, "pop" : 92, "avg" : 92 }
    { "_id" : "book-48", "count" : 1, "pop" : 33, "avg" : 33 }
    
    # 每个作者的book收藏总数
    > db.books.aggregate([{$group: {_id: '$author.name', pop: {$sum: "$favCount"}}}])
    { "_id" : "xx006", "pop" : 22 }
    { "_id" : "xx002", "pop" : 96 }
    { "_id" : "xx007", "pop" : 178 }
    
    # 求每个作者每篇book的收藏数
    > db.books.aggregate([{$group:{_id: {name: "$author.name", title: "$title"}, pop: {$sum: "$favCount"}}}])
    { "_id" : { "name" : "xx001", "title" : "book-48" }, "pop" : 33 }
    { "_id" : { "name" : "xx008", "title" : "book-44" }, "pop" : 92 }
    { "_id" : { "name" : "xx005", "title" : "book-43" }, "pop" : 56 }
    
    # 每个作者的book的type合集
    > db.books.aggregate([{$group: {_id: "$author.name", types: {$addToSet: "$type"}}}])
    { "_id" : "xx006", "types" : [ "technology" ] }
    { "_id" : "xx002", "types" : [ "novel" ] }
    { "_id" : "xx007", "types" : [ "sociality", "literature", "novel" ] }
    { "_id" : "xx005", "types" : [ "literature", "travel", "sociality" ] }
    { "_id" : "xx008", "types" : [ "sociality", "travel", "literature", "technology", "novel" ] }
    
  • $limit
    限制输出指定数量的文档

    > db.books.aggregate([{$limit: 5}])
    { "_id" : ObjectId("64852f979fc464aa2dcc2a48"), "title" : "book-0", "type" : "travel", "tag" : [ "nosql", "document" ], "favCount" : 90, "author" : { "name" : "xx001", "age" : 28 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a49"), "title" : "book-1", "type" : "literature", "tag" : [ "mongodb", "popular" ], "favCount" : 36, "author" : { "name" : "xx003", "age" : 25 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a4a"), "title" : "book-2", "type" : "technology", "tag" : [ "popular", "popular" ], "favCount" : 98, "author" : { "name" : "xx004", "age" : 21 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a4b"), "title" : "book-3", "type" : "literature", "tag" : [ "nosql", "nosql" ], "favCount" : 99, "author" : { "name" : "xx000", "age" : 29 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a4c"), "title" : "book-4", "type" : "technology", "tag" : [ "popular", "nosql" ], "favCount" : 30, "author" : { "name" : "xx009", "age" : 30 } }
    
  • $skip
    跳过指定数量的文档,并将其余文档传递给管道中的下一个阶段

    > db.books.aggregate([{$skip: 5}])
    { "_id" : ObjectId("64852f979fc464aa2dcc2a4d"), "title" : "book-5", "type" : "sociality", "tag" : [ "popular", "document" ], "favCount" : 69, "author" : { "name" : "xx003", "age" : 20 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a4e"), "title" : "book-6", "type" : "travel", "tag" : [ "popular", "developer" ], "favCount" : 14, "author" : { "name" : "xx009", "age" : 25 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a4f"), "title" : "book-7", "type" : "novel", "tag" : [ "developer", "mongodb" ], "favCount" : 20, "author" : { "name" : "xx009", "age" : 21 } }
    ...
    ...
    
  • $sort
    对所有输入文档进行排序,并按排序顺序将它们返回到管道

    # 对所有book,先按照favCount降序排列,再按照作者年龄升序排列
    > db.books.aggregate([{$sort: {favCount:-1, 'author.name':1}}])
    { "_id" : ObjectId("64852f979fc464aa2dcc2a57"), "title" : "book-15", "type" : "novel", "tag" : [ "developer", "nosql" ], "favCount" : 95, "author" : { "name" : "xx008", "age" : 26 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a6e"), "title" : "book-38", "type" : "novel", "tag" : [ "popular", "nosql" ], "favCount" : 92, "author" : { "name" : "xx000", "age" : 29 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a60"), "title" : "book-24", "type" : "novel", "tag" : [ "nosql", "document" ], "favCount" : 92, "author" : { "name" : "xx001", "age" : 28 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a74"), "title" : "book-44", "type" : "travel", "tag" : [ "mongodb", "popular" ], "favCount" : 92, "author" : { "name" : "xx008", "age" : 34 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a48"), "title" : "book-0", "type" : "travel", "tag" : [ "nosql", "document" ], "favCount" : 90, "author" : { "name" : "xx001", "age" : 28 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a5c"), "title" : "book-20", "type" : "sociality", "tag" : [ "developer", "mongodb" ], "favCount" : 90, "author" : { "name" : "xx003", "age" : 25 } }
    
  • unwind
    可以将数组拆分为单独的文档。

    筛选type为travle的,并且favCount>= 90的book
    > db.books.aggregate([{$match: {type:"travel", favCount: {"$gte": 90}}}])
    { "_id" : ObjectId("64852f979fc464aa2dcc2a48"), "title" : "book-0", "type" : "travel", "tag" : [ "nosql", "document" ], "favCount" : 90, "author" : { "name" : "xx001", "age" : 28 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a74"), "title" : "book-44", "type" : "travel", "tag" : [ "mongodb", "popular" ], "favCount" : 92, "author" : { "name" : "xx008", "age" : 34 } }
    
    筛选type为travle的,并且favCount>= 90的book,将tag拆成单独的文档。
    > db.books.aggregate([{$match: {type:"travel", favCount: {"$gte": 90}}}, {$unwind: "$tag"}])
    { "_id" : ObjectId("64852f979fc464aa2dcc2a48"), "title" : "book-0", "type" : "travel", "tag" : "nosql", "favCount" : 90, "author" : { "name" : "xx001", "age" : 28 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a48"), "title" : "book-0", "type" : "travel", "tag" : "document", "favCount" : 90, "author" : { "name" : "xx001", "age" : 28 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a74"), "title" : "book-44", "type" : "travel", "tag" : "mongodb", "favCount" : 92, "author" : { "name" : "xx008", "age" : 34 } }
    { "_id" : ObjectId("64852f979fc464aa2dcc2a74"), "title" : "book-44", "type" : "travel", "tag" : "popular", "favCount" : 92, "author" : { "name" : "xx008", "age" : 34 } }
    
  • $lookup
    主要用来实现多表关联查询, 相当关系型数据库中多表关联查询。每个输入
    待处理的文档,经过$lookup阶段的处理,输出的新文档中会包含一个新生成的数组(可根据需要命名新key)。数组列存放的数据是来自被Join集合的适配文档,如果没有,集合为空(即为[ ])。

    使用语法:

    db.collection.aggregate([{
    	$lookup: {
    		from: "",			# 同一个数据库下等待被Join的集合
    		localField: "",		# 关联表的关联字段
    		foreignField: "",		# 被关联表的关联字段
    		as: ""		# 新的字段名称
    	}
    })
    

    数据准备:

    用户表:
    db.customer.insert({customerCode:1,name:"customer1",phone:"13112345678",address:
    "test1"})
    db.customer.insert({customerCode:2,name:"customer2",phone:"13112345679",address:
    "test2"})
    
    订单表:
    db.order.insert({orderId:1,orderCode:"order001",customerCode:1,price:200})
    db.order.insert({orderId:2,orderCode:"order002",customerCode:2,price:400})
    
    订单详情表:
    db.orderItem.insert({itemId:1,productName:"apples",qutity:2,orderId:1})
    db.orderItem.insert({itemId:2,productName:"oranges",qutity:2,orderId:1})
    db.orderItem.insert({itemId:3,productName:"mangoes",qutity:2,orderId:1})
    db.orderItem.insert({itemId:4,productName:"apples",qutity:2,orderId:2})
    db.orderItem.insert({itemId:5,productName:"oranges",qutity:2,orderId:2})
    db.orderItem.insert({itemId:6,productName:"mangoes",qutity:2,orderId:2})
    
    查询是否插入成功
    > db.customer.find()
    { "_id" : ObjectId("648551efea30a987fba9edae"), "customerCode" : 1, "name" : "customer1", "phone" : "13112345678", "address" : "test1" }
    { "_id" : ObjectId("648551f4ea30a987fba9edaf"), "customerCode" : 2, "name" : "customer2", "phone" : "13112345679", "address" : "test2" }
    > db.order.find()
    { "_id" : ObjectId("648551f9ea30a987fba9edb0"), "orderId" : 1, "orderCode" : "order001", "customerCode" : 1, "price" : 200 }
    { "_id" : ObjectId("648551fbea30a987fba9edb1"), "orderId" : 2, "orderCode" : "order002", "customerCode" : 2, "price" : 400 }
    > db.orderItem.find()
    { "_id" : ObjectId("64855204ea30a987fba9edb2"), "itemId" : 1, "productName" : "apples", "qutity" : 2, "orderId" : 1 }
    { "_id" : ObjectId("64855204ea30a987fba9edb3"), "itemId" : 2, "productName" : "oranges", "qutity" : 2, "orderId" : 1 }
    { "_id" : ObjectId("64855204ea30a987fba9edb4"), "itemId" : 3, "productName" : "mangoes", "qutity" : 2, "orderId" : 1 }
    { "_id" : ObjectId("64855204ea30a987fba9edb5"), "itemId" : 4, "productName" : "apples", "qutity" : 2, "orderId" : 2 }
    { "_id" : ObjectId("64855204ea30a987fba9edb6"), "itemId" : 5, "productName" : "oranges", "qutity" : 2, "orderId" : 2 }
    { "_id" : ObjectId("64855205ea30a987fba9edb7"), "itemId" : 6, "productName" : "mangoes", "qutity" : 2, "orderId" : 2 }
    

    用户表和订单表关联查询:

    > db.customer.aggregate([
    	{$lookup: {
    		from: "order", 
    		localField: "customerCode", 
    		foreignField: "customerCode", 
    		as: "customerOrder"
    	}}
      ])
    查询结果:
    { "_id" : ObjectId("648551efea30a987fba9edae"), "customerCode" : 1, "name" : "customer1", "phone" : "13112345678", "address" : "test1", "customerOrder" : [ { "_id" : ObjectId("648551f9ea30a987fba9edb0"), "orderId" : 1, "orderCode" : "order001", "customerCode" : 1, "price" : 200 } ] }
    { "_id" : ObjectId("648551f4ea30a987fba9edaf"), "customerCode" : 2, "name" : "customer2", "phone" : "13112345679", "address" : "test2", "customerOrder" : [ { "_id" : ObjectId("648551fbea30a987fba9edb1"), "orderId" : 2, "orderCode" : "order002", "customerCode" : 2, "price" : 400 } ] }
    >
    >
    > db.customer.aggregate([{$lookup: {from: "order", localField: "customerCode", foreignField: "customerCode", as: "customerOrder"}}]).pretty()
    {
            "_id" : ObjectId("648551efea30a987fba9edae"),
            "customerCode" : 1,
            "name" : "customer1",
            "phone" : "13112345678",
            "address" : "test1",
            "customerOrder" : [
                    {
                            "_id" : ObjectId("648551f9ea30a987fba9edb0"),
                            "orderId" : 1,
                            "orderCode" : "order001",
                            "customerCode" : 1,
                            "price" : 200
                    }
            ]
    }
    {
            "_id" : ObjectId("648551f4ea30a987fba9edaf"),
            "customerCode" : 2,
            "name" : "customer2",
            "phone" : "13112345679",
            "address" : "test2",
            "customerOrder" : [
                    {
                            "_id" : ObjectId("648551fbea30a987fba9edb1"),
                            "orderId" : 2,
                            "orderCode" : "order002",
                            "customerCode" : 2,
                            "price" : 400
                    }
            ]
    }
    

    用户表,订单表和订单详情表,三表关联查询:

    	
    > db.order.aggregate([
    	{$lookup: {
    		from: "customer", 
    		localField: "customerCode", 
    		foreignField: "customerCode", 
    		as: "customer"
    	}}, 
    	{$lookup: {
    		from: "orderItem", 
    		localField: "orderId", 
    		foreignField: "orderId", 
    		as: "orderItem"
    	}}
      ]).pretty()
    
    查询结果:
    {
            "_id" : ObjectId("648551f9ea30a987fba9edb0"),
            "orderId" : 1,
            "orderCode" : "order001",
            "customerCode" : 1,
            "price" : 200,
            "customer" : [
                    {
                            "_id" : ObjectId("648551efea30a987fba9edae"),
                            "customerCode" : 1,
                            "name" : "customer1",
                            "phone" : "13112345678",
                            "address" : "test1"
                    }
            ],
            "orderItem" : [
                    {
                            "_id" : ObjectId("64855204ea30a987fba9edb2"),
                            "itemId" : 1,
                            "productName" : "apples",
                            "qutity" : 2,
                            "orderId" : 1
                    },
                    {
                            "_id" : ObjectId("64855204ea30a987fba9edb3"),
                            "itemId" : 2,
                            "productName" : "oranges",
                            "qutity" : 2,
                            "orderId" : 1
                    },
                    {
                            "_id" : ObjectId("64855204ea30a987fba9edb4"),
                            "itemId" : 3,
                            "productName" : "mangoes",
                            "qutity" : 2,
                            "orderId" : 1
                    }
            ]
    }
    {
            "_id" : ObjectId("648551fbea30a987fba9edb1"),
            "orderId" : 2,
            "orderCode" : "order002",
            "customerCode" : 2,
            "price" : 400,
            "customer" : [
                    {
                            "_id" : ObjectId("648551f4ea30a987fba9edaf"),
                            "customerCode" : 2,
                            "name" : "customer2",
                            "phone" : "13112345679",
                            "address" : "test2"
                    }
            ],
            "orderItem" : [
                    {
                            "_id" : ObjectId("64855204ea30a987fba9edb5"),
                            "itemId" : 4,
                            "productName" : "apples",
                            "qutity" : 2,
                            "orderId" : 2
                    },
                    {
                            "_id" : ObjectId("64855204ea30a987fba9edb6"),
                            "itemId" : 5,
                            "productName" : "oranges",
                            "qutity" : 2,
                            "orderId" : 2
                    },
                    {
                            "_id" : ObjectId("64855205ea30a987fba9edb7"),
                            "itemId" : 6,
                            "productName" : "mangoes",
                            "qutity" : 2,
                            "orderId" : 2
                    }
            ]
    }
    

你可能感兴趣的:(mongodb,数据库)