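Aggregation operations process data records and return computed results. They combine values from multiple documents and can perform a variety of operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single-purpose aggregation methods.
1. Single-purpose aggregation operations
MongoDB provides db.collection.estimatedDocumentCount(), db.collection.count(), and db.collection.distinct().
All of these operations aggregate documents from a single collection. They give simple access to common aggregation tasks, but they lack the flexibility and capabilities of the aggregation pipeline and map-reduce.
(1) db.collection.estimatedDocumentCount()
Returns the count of all documents in a collection or view. The count comes from collection metadata rather than from a scan of the documents.
Example: retrieve the count of all documents in the orders collection
> db.orders.find() //the orders collection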
{ "_id" : ObjectId("5d3e560d55ad906481cb2ecc"), "cust_id" : "abc123", "status" : "A", "price" : 50, "items" : [ { "sku" : "xxx", "qty" : 25, "price" : 1 }, { "sku" : "yyy", "qty" : 25, "price" : 1 } ] }
{ "_id" : ObjectId("5d3e570e55ad906481cb2ecd"), "cust_id" : "def456", "status" : "A", "price" : 50, "items" : [ { "sku" : "zzz", "qty" : 25, "price" : 1 }, { "sku" : "www", "qty" : 25, "price" : 1 } ] }
{ "_id" : ObjectId("5d3e573955ad906481cb2ece"), "cust_id" : "ghi789", "status" : "B", "price" : 100, "items" : [ { "sku" : "xxx", "qty" : 25, "price" : 1 }, { "sku" : "yyy", "qty" : 25, "price" : 1 } ] }
{ "_id" : ObjectId("5d3e587855ad906481cb2ecf"), "cust_id" : "abc123", "status" : "A", "price" : 70, "items" : [ { "sku" : "xxx", "qty" : 25, "price" : 1 }, { "sku" : "yyy", "qty" : 25, "price" : 1 } ] }
> db.orders.estimatedDocumentCount({})
4
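(2) db.collection.count()
Returns the count of documents that would match a find() query on the collection or view. db.collection.count() does not perform the find() operation; it counts and returns the number of results that match the query.
Note that on a sharded cluster, db.collection.count() without a query predicate can return an inaccurate count if orphaned documents exist or a chunk migration is in progress. To avoid this, use the db.collection.aggregate() method on sharded clusters.
Example: count all documents in the orders collection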
> db.orders.count()
4
count() is equivalent to the db.collection.find(query).count() construct. The operation above is equivalent to:
> db.orders.find().count()
4
Example: count all documents that match a query
> db.orders.count({price : {$gt : 50}})
2
> db.orders.find({price : {$gt : 50}}).count() //the two queries are equivalent
2
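(3) db.collection.distinct()
Finds the distinct values of a specified field across a single collection or view and returns the results in an array.
> db.orders.distinct("cust_id") //returns an array of the distinct cust_id values
[ "abc123", "def456", "ghi789" ]
> db.orders.distinct("items.sku") //returns the distinct values of sku, a field embedded in the items array
[ "xxx", "yyy", "www", "zzz" ]
> db.orders.distinct("price",{status : "A"}) //returns the distinct price values among documents whose status is "A"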
[ 50, 70 ]
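To make the dotted-path behavior of distinct("items.sku") concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB implements distinct(); the helper name distinctValues and the trimmed orders array are assumptions made for illustration.

```javascript
// Trimmed copies of the orders documents shown above (only the fields used here).
const orders = [
  { cust_id: "abc123", items: [{ sku: "xxx" }, { sku: "yyy" }] },
  { cust_id: "def456", items: [{ sku: "zzz" }, { sku: "www" }] },
  { cust_id: "ghi789", items: [{ sku: "xxx" }, { sku: "yyy" }] },
  { cust_id: "abc123", items: [{ sku: "xxx" }, { sku: "yyy" }] },
];

// Hypothetical helper: collect the distinct values found at a dotted path,
// treating every array element along the path as a separate value.
function distinctValues(docs, path) {
  const seen = new Set();
  const walk = (value, keys) => {
    if (Array.isArray(value)) {
      value.forEach((element) => walk(element, keys));
    } else if (keys.length === 0) {
      if (value !== undefined) seen.add(value);
    } else if (value !== null && typeof value === "object") {
      walk(value[keys[0]], keys.slice(1));
    }
  };
  docs.forEach((doc) => walk(doc, path.split(".")));
  return [...seen];
}

const customers = distinctValues(orders, "cust_id");
const skus = distinctValues(orders, "items.sku").sort(); // sorted only to get a stable order
console.log(customers, skus);
```

The sketch returns values in first-seen order, so the order can differ from what the shell prints.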
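2. Aggregation pipeline
db.collection.aggregate() implements an aggregation pipeline modeled on data processing: documents pass through a pipeline made up of multiple stages, each stage transforms them (for example by filtering or grouping), and the final stage outputs the result.
The figure that accompanied this section (not reproduced here) illustrates how aggregate processes documents with a two-stage pipeline.
In that figure:
Stage 1: the $match stage filters documents by the status field and passes those whose status equals A to the next stage;
Stage 2: the $group stage groups the documents by the cust_id field and computes the total amount for each unique cust_id.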
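The two stages can be mimicked in memory with plain JavaScript. This is only a sketch of the data flow, not MongoDB's implementation: the function names are hypothetical, and it assumes the price field of the orders collection is the amount being summed.

```javascript
// Trimmed copies of the orders documents used throughout this article.
const orders = [
  { cust_id: "abc123", status: "A", price: 50 },
  { cust_id: "def456", status: "A", price: 50 },
  { cust_id: "ghi789", status: "B", price: 100 },
  { cust_id: "abc123", status: "A", price: 70 },
];

// Stage 1 (like $match): keep only documents whose status is "A".
const matchStatusA = (docs) => docs.filter((doc) => doc.status === "A");

// Stage 2 (like $group with $sum): total the price per cust_id.
const groupByCustomer = (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.cust_id, (totals.get(doc.cust_id) || 0) + doc.price);
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
};

// A pipeline applies its stages in order, each consuming the previous output.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const result = runPipeline(orders, [matchStatusA, groupByCustomer]);
console.log(result);
```

In the shell, the corresponding pipeline would use a $match stage followed by a $group stage with a $sum accumulator.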
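Common aggregate pipeline stages:
(1) $count
Returns a document containing a count of the documents input to the stage. It is the pipeline counterpart of counting the documents that match a find() query on a collection or view: like db.collection.count(), it does not return the matched documents, only their number.
> db.orders.find() //demo collection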
{ "_id" : ObjectId("5d3e560d55ad906481cb2ecc"), "cust_id" : "abc123", "status" : "A", "price" : 50, "items" : [ { "sku" : "xxx", "qty" : 25, "price" : 1 }, { "sku" : "yyy", "qty" : 25, "price" : 1 } ] }
{ "_id" : ObjectId("5d3e570e55ad906481cb2ecd"), "cust_id" : "def456", "status" : "A", "price" : 50, "items" : [ { "sku" : "zzz", "qty" : 25, "price" : 1 }, { "sku" : "www", "qty" : 25, "price" : 1 } ] }
{ "_id" : ObjectId("5d3e573955ad906481cb2ece"), "cust_id" : "ghi789", "status" : "B", "price" : 100, "items" : [ { "sku" : "xxx", "qty" : 25, "price" : 1 }, { "sku" : "yyy", "qty" : 25, "price" : 1 } ] }
{ "_id" : ObjectId("5d3e587855ad906481cb2ecf"), "cust_id" : "abc123", "status" : "A", "price" : 70, "items" : [ { "sku" : "xxx", "qty" : 25, "price" : 1 }, { "sku" : "yyy", "qty" : 25, "price" : 1 } ] }
> db.orders.aggregate([
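... {$match : {price : {$gt:50}}}, //the $match stage excludes documents whose price is less than or equal to 50 and passes the rest to the next stage
... {$count : "High_price"} //the $count stage returns the count of the documents remaining in the pipeline and assigns it to a field named High_price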
... ])
{ "High_price" : 2 }
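The $count stage is equivalent to a $group + $project sequence. The following pipeline has no $match stage, so it counts all four documents: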
> db.orders.aggregate([
... {$group : {_id : null,MyCount : {$sum : 1}}},
... {$project : {_id : 0}}
... ])
{ "MyCount" : 4 }
(2) $group
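Groups documents by a specified expression and outputs one document per distinct group to the next stage. Each output document contains an _id field holding the group key and can also contain computed fields that hold the values of accumulator expressions evaluated over the group. $group does not output the individual input documents, only the aggregated results.
Accumulator operators:
Name | Description |
---|---|
$avg | Computes the average. |
$first | Returns the value from the first document in each group. If the documents are sorted, this follows the sort order; otherwise it is the first document in stored order, which is not guaranteed. |
$last | Returns the value from the last document in each group. If the documents are sorted, this follows the sort order; otherwise it is the last document in stored order, which is not guaranteed. |
$max | Returns the highest value of the expression across all documents in each group. |
$min | Returns the lowest value of the expression across all documents in each group. |
$push | Appends the value of the specified expression to an array. |
$addToSet | Adds the value of the expression to a set (no duplicate values, unordered). |
$sum | Computes the sum. |
Example 1: the $group stage groups documents by month, day, and year, and computes the total price, the average quantity, and the number of documents in each group:
> db.test.find() //sample collection test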
{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-03-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-03-01T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-03-15T09:00:00Z") }
{ "_id" : 4, "item" : "xyz", "price" : 5, "quantity" : 20, "date" : ISODate("2014-04-04T11:21:39.736Z") }
{ "_id" : 5, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-04-04T21:23:13.331Z") }
> db.test.aggregate(
... [
... {
... $group : {
... _id : { month: { $month: "$date" }, day: { $dayOfMonth: "$date" }, year: { $year: "$date" } },
... totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } },
... averageQuantity: { $avg: "$quantity" },
... count: { $sum: 1 }
... }
... }
... ]
... )
{ "_id" : { "month" : 4, "day" : 4, "year" : 2014 }, "totalPrice" : 200, "averageQuantity" : 15, "count" : 2 }
{ "_id" : { "month" : 3, "day" : 15, "year" : 2014 }, "totalPrice" : 50, "averageQuantity" : 10, "count" : 1 }
{ "_id" : { "month" : 3, "day" : 1, "year" : 2014 }, "totalPrice" : 40, "averageQuantity" : 1.5, "count" : 2 }
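Example 2: group by null. The following aggregation specifies an _id of null for the group, computing the total price, average quantity, and count across all documents in the collection: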
> db.test.aggregate(
... [
... {
... $group : {
... _id : null,
... totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } },
... averageQuantity: { $avg: "$quantity" },
... count: { $sum: 1 }
... }
... }
... ]
... )
{ "_id" : null, "totalPrice" : 290, "averageQuantity" : 8.6, "count" : 5 }
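The arithmetic behind this result can be checked with a small in-memory sketch in plain JavaScript. The function name summarize is hypothetical, and the code only imitates what the $sum and $avg accumulators compute; it is not MongoDB's implementation.

```javascript
// price and quantity copied from the test collection above.
const sales = [
  { _id: 1, price: 10, quantity: 2 },
  { _id: 2, price: 20, quantity: 1 },
  { _id: 3, price: 5, quantity: 10 },
  { _id: 4, price: 5, quantity: 20 },
  { _id: 5, price: 10, quantity: 10 },
];

// Hypothetical stand-in for { $group: { _id: null, ... } }: one pass, one output document.
function summarize(docs) {
  let totalPrice = 0;  // like $sum of price * quantity
  let quantitySum = 0; // running total behind $avg
  let count = 0;       // like $sum: 1
  for (const doc of docs) {
    totalPrice += doc.price * doc.quantity;
    quantitySum += doc.quantity;
    count += 1;
  }
  return { _id: null, totalPrice, averageQuantity: quantitySum / count, count };
}

const summary = summarize(sales);
console.log(summary);
```

Here totalPrice is 20 + 20 + 50 + 100 + 100 = 290 and averageQuantity is 43 / 5 = 8.6, matching the shell output.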
Example 3: retrieve distinct values
> db.test.aggregate([{$group : {_id:"$item"}}])
{ "_id" : "xyz" }
{ "_id" : "jkl" }
{ "_id" : "abc" }
Example 4: pivot data, transforming the books collection into titles grouped by author
> db.books.find()
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
> db.books.aggregate(
... [
... { $group : { _id : "$author", books: { $push: "$title" } } }
... ]
... )
{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }
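Example 5: use the $$ROOT system variable to group the entire documents by author. Each resulting document must not exceed the BSON document size limit.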
> db.books.aggregate(
... [
... { $group : { _id : "$author", books: { $push: "$$ROOT" } } }
... ]
... )
{ "_id" : "Homer", "books" : [ { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }, { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 } ] }
{ "_id" : "Dante", "books" : [ { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }, { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }, { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 } ] }
(3) $match
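Filters documents, passing only those that match the specified conditions to the next pipeline stage.
$match takes a document that specifies the query conditions. $match does not accept raw aggregation expressions (to use one, wrap it in $expr).
Pipeline optimization: $match filters documents so that later stages aggregate over the resulting subset. $match accepts the regular query operators, with a few exceptions such as $where and the geospatial $near and $nearSphere. In practice, place $match as early in the pipeline as possible. This has two benefits: it quickly filters out unneeded documents, reducing the work done by the rest of the pipeline, and if $match runs before projection and grouping, the query can use indexes.
Example 1: a simple equality match with $match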
> db.articles.find()
{ "_id" : ObjectId("5d3ac60cb3a8911f2c1a70db"), "author" : "Dave", "score" : 80, "views" : 100 }
{ "_id" : ObjectId("5d3ac617b3a8911f2c1a70dc"), "author" : "Dave", "score" : 85, "views" : 521 }
{ "_id" : ObjectId("5d3ac62ab3a8911f2c1a70dd"), "author" : "Ahn", "score" : 60, "views" : 1000 }
{ "_id" : ObjectId("5d3ac63bb3a8911f2c1a70de"), "author" : "li", "score" : 55, "views" : 5000 }
{ "_id" : ObjectId("5d3ac64bb3a8911f2c1a70df"), "author" : "annT", "score" : 60, "views" : 50 }
{ "_id" : ObjectId("5d3ac65fb3a8911f2c1a70e0"), "author" : "li", "score" : 94, "views" : 999 }
{ "_id" : ObjectId("5d3ac66fb3a8911f2c1a70e1"), "author" : "Ty", "score" : 95, "views" : 1000 }
> db.articles.aggregate(
... [{$match : {author : "Dave"}}]
... )
{ "_id" : ObjectId("5d3ac60cb3a8911f2c1a70db"), "author" : "Dave", "score" : 80, "views" : 100 }
{ "_id" : ObjectId("5d3ac617b3a8911f2c1a70dc"), "author" : "Dave", "score" : 85, "views" : 521 }
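Example 2: use the $match stage to select the documents to process, then pipe the results to the $group stage to count them ($match selects documents whose score is greater than 70 and less than 90, or whose views is greater than or equal to 1000, and passes them to $group for counting):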
> db.articles.aggregate( [
... { $match: { $or: [ { score: { $gt: 70, $lt: 90 } }, { views: { $gte: 1000 } } ] } },
... { $group: { _id: null, count: { $sum: 1 } } }
... ] )
{ "_id" : null, "count" : 5 }
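To see why the count is 5, the same condition can be written as an ordinary predicate over an in-memory copy of the articles. This is a plain JavaScript sketch for checking the logic, not a replacement for the $match stage; the variable names are assumptions.

```javascript
// author, score and views copied from the articles collection above.
const articles = [
  { author: "Dave", score: 80, views: 100 },
  { author: "Dave", score: 85, views: 521 },
  { author: "Ahn", score: 60, views: 1000 },
  { author: "li", score: 55, views: 5000 },
  { author: "annT", score: 60, views: 50 },
  { author: "li", score: 94, views: 999 },
  { author: "Ty", score: 95, views: 1000 },
];

// Same condition as the $match stage: 70 < score < 90, or views >= 1000.
const matches = (doc) => (doc.score > 70 && doc.score < 90) || doc.views >= 1000;

const selected = articles.filter(matches);
const rejected = articles.filter((doc) => !matches(doc)).map((doc) => doc.author);
console.log(selected.length, rejected);
```

Two documents satisfy the score range and three satisfy the views threshold, with no overlap; only annT (60, 50) and li (94, 999) are filtered out.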
(4) $unwind
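Deconstructs an array field from the input documents to output one document per element. Each output document is the input document with the value of the array field replaced by that element.
Example 1: $unwind outputs one document for each element of the sizes array: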
> db.inventory.find()
{ "_id" : 1, "item" : "ABC1", "sizes" : [ "S", "M", "L" ] }
> db.inventory.aggregate([{$unwind : "$sizes"}])
{ "_id" : 1, "item" : "ABC1", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "L" }
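$unwind also accepts a document with the following fields:
Field | Description |
---|---|
path | Field path to an array field. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes. |
includeArrayIndex | Optional. The name of a new field to hold the array index of the element. The name cannot start with a dollar sign $. |
preserveNullAndEmptyArrays | Optional. If true, $unwind still outputs the document when path is null, missing, or an empty array. If false, $unwind outputs no document in those cases. The default is false. |
Example 2: the following $unwind operations are equivalent and return one document for each element of the sizes field. Documents whose sizes field is missing, null, or an empty array produce no output, and a non-array value such as "M" is treated as a single-element array.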
> db.inventory.find()
{ "_id" : 1, "item" : "ABC", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 2, "item" : "EFG", "sizes" : [ ] }
{ "_id" : 3, "item" : "IJK", "sizes" : "M" }
{ "_id" : 4, "item" : "LMN" }
{ "_id" : 5, "item" : "XYZ", "sizes" : null }
> db.inventory.aggregate( [ { $unwind: "$sizes" } ]
... )
{ "_id" : 1, "item" : "ABC", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC", "sizes" : "L" }
{ "_id" : 3, "item" : "IJK", "sizes" : "M" }
> db.inventory.aggregate( [ { $unwind: { path: "$sizes" } } ] ) //equivalent to the previous operation
{ "_id" : 1, "item" : "ABC", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC", "sizes" : "L" }
{ "_id" : 3, "item" : "IJK", "sizes" : "M" }
Example 3: this $unwind operation uses the includeArrayIndex option to also output the array index of each element.
> db.inventory.aggregate( [ { $unwind: { path: "$sizes", includeArrayIndex: "arrayIndex" } } ] )
{ "_id" : 1, "item" : "ABC", "sizes" : "S", "arrayIndex" : NumberLong(0) }
{ "_id" : 1, "item" : "ABC", "sizes" : "M", "arrayIndex" : NumberLong(1) }
{ "_id" : 1, "item" : "ABC", "sizes" : "L", "arrayIndex" : NumberLong(2) }
{ "_id" : 3, "item" : "IJK", "sizes" : "M", "arrayIndex" : null }
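Example 4: this $unwind operation uses the preserveNullAndEmptyArrays option to include in the output the documents whose sizes field is missing, null, or an empty array.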
> db.inventory.aggregate( [
... { $unwind: { path: "$sizes", preserveNullAndEmptyArrays: true } }
... ] )
{ "_id" : 1, "item" : "ABC", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC", "sizes" : "L" }
{ "_id" : 2, "item" : "EFG" }
{ "_id" : 3, "item" : "IJK", "sizes" : "M" }
{ "_id" : 4, "item" : "LMN" }
{ "_id" : 5, "item" : "XYZ", "sizes" : null }
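The behavior shown in the $unwind examples can be summarized in a short in-memory sketch. This is plain JavaScript written for illustration, not MongoDB's implementation: the unwind function is hypothetical, handles only a top-level field, and sets the index to null for documents that are not expanded from an array.

```javascript
// Hypothetical in-memory version of $unwind for a top-level field.
function unwind(docs, field, { includeArrayIndex, preserveNullAndEmptyArrays = false } = {}) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    const isEmpty = value === undefined || value === null ||
      (Array.isArray(value) && value.length === 0);
    if (isEmpty) {
      if (preserveNullAndEmptyArrays) {
        const copy = { ...doc };
        // An empty array is dropped from the preserved document; null is kept.
        if (Array.isArray(value)) delete copy[field];
        if (includeArrayIndex) copy[includeArrayIndex] = null;
        out.push(copy);
      }
      continue;
    }
    if (!Array.isArray(value)) {
      // A non-array value passes through as a single document.
      const copy = { ...doc };
      if (includeArrayIndex) copy[includeArrayIndex] = null;
      out.push(copy);
      continue;
    }
    value.forEach((element, index) => {
      const copy = { ...doc, [field]: element };
      if (includeArrayIndex) copy[includeArrayIndex] = index;
      out.push(copy);
    });
  }
  return out;
}

const inventory = [
  { _id: 1, item: "ABC", sizes: ["S", "M", "L"] },
  { _id: 2, item: "EFG", sizes: [] },
  { _id: 3, item: "IJK", sizes: "M" },
  { _id: 4, item: "LMN" },
  { _id: 5, item: "XYZ", sizes: null },
];

const plain = unwind(inventory, "sizes");
const indexed = unwind(inventory, "sizes", { includeArrayIndex: "arrayIndex" });
const preserved = unwind(inventory, "sizes", { preserveNullAndEmptyArrays: true });
console.log(plain.length, preserved.length);
```

With the default options the five input documents yield four outputs; with preserveNullAndEmptyArrays set to true they yield seven.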
(5) $project
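Passes documents containing only the requested fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.
A $project specification has the following forms:
Form | Description |
---|---|
<field>: <1 or true> | Specifies the inclusion of a field. |
_id: <0 or false> | Specifies the suppression of the _id field. By default, the _id field is included in the output documents. |
<field>: <expression> | Adds a new field or resets the value of an existing field. Changed in version 3.6: MongoDB 3.6 adds the variable $$REMOVE. If the expression evaluates to $$REMOVE, the field is excluded from the output. |
<field>: <0 or false> | Specifies the exclusion of a field. |
Example 1: include specific fields in the output documents of the $project stage: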
> db.book.find()
{ "_id" : 1, "title" : "abc123", "isbn" : "0001122223334", "author" : { "last" : "zzz", "first" : "aaa" }, "copies" : 5 }
> db.book.aggregate([
... {$project : {title : 1,author : 1}}
... ])
{ "_id" : 1, "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }
> db.book.aggregate([ {$project : {_id : 0,title : 1,author : 1}} ]) //suppress the _id field in the output documents
{ "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }
Example 2: exclude specific fields from the output documents
> db.book.find().pretty()
{
"_id" : 1,
"title" : "abc123",
"isbn" : "0001122223334",
"author" : {
"last" : "zzz",
"first" : "aaa"
},
"copies" : 5,
"lastModified" : "2016-07-28"
}
> db.book.aggregate([
... {$project : {"lastModified" : 0}}
... ]) //exclude the lastModified field
{ "_id" : 1, "title" : "abc123", "isbn" : "0001122223334", "author" : { "last" : "zzz", "first" : "aaa" }, "copies" : 5 }
> db.book.aggregate([ {$project : {"author.first" : 0,"lastModified" : 0}} ]) //exclude a field from an embedded document
{ "_id" : 1, "title" : "abc123", "isbn" : "0001122223334", "author" : { "last" : "zzz" }, "copies" : 5 }
> db.book.aggregate([ {$project : {"author" : {"first" : 0},"lastModified" : 0}} ]) //nest the exclusion specification in a document
{ "_id" : 1, "title" : "abc123", "isbn" : "0001122223334", "author" : { "last" : "zzz" }, "copies" : 5 }
Starting in MongoDB 3.6, you can use the variable $$REMOVE in aggregation expressions to conditionally suppress a field.
Example 3: the $project stage uses the $$REMOVE variable to exclude the author.middle field only if it equals "":
> db.book.find().pretty()
{
"_id" : 1,
"title" : "abc123",
"isbn" : "0001122223334",
"author" : {
"last" : "zzz",
"first" : "aaa"
},
"copies" : 5,
"lastModified" : "2016-07-28"
}
{
"_id" : 2,
"title" : "Baked Goods",
"isbn" : "9999999999999",
"author" : {
"last" : "xyz",
"first" : "abc",
"middle" : ""
},
"copies" : 2,
"lastModified" : "2017-07-21"
}
{
"_id" : 3,
"title" : "Ice Cream Cakes",
"isbn" : "8888888888888",
"author" : {
"last" : "xyz",
"first" : "abc",
"middle" : "mmm"
},
"copies" : 5,
"lastModified" : "2017-07-22"
}
> db.book.aggregate([
... {
... $project: {
... title: 1,
... "author.first": 1,
... "author.last" : 1,
... "author.middle": {
... $cond: {
... if: { $eq: [ "", "$author.middle" ] },
... then: "$$REMOVE",
... else: "$author.middle"
... }
... }
... }
... }
... ] )
{ "_id" : 1, "title" : "abc123", "author" : { "last" : "zzz", "first" : "aaa" } }
{ "_id" : 2, "title" : "Baked Goods", "author" : { "last" : "xyz", "first" : "abc" } }
{ "_id" : 3, "title" : "Ice Cream Cakes", "author" : { "last" : "xyz", "first" : "abc", "middle" : "mmm" } }
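The conditional exclusion can be read as "copy the middle name only when it exists and is not empty". The following plain JavaScript sketch expresses that rule over an in-memory copy of the documents; projectAuthor is a hypothetical helper and does not reflect how $project is implemented.

```javascript
// Trimmed copies of the three book documents above.
const books = [
  { _id: 1, title: "abc123", author: { last: "zzz", first: "aaa" } },
  { _id: 2, title: "Baked Goods", author: { last: "xyz", first: "abc", middle: "" } },
  { _id: 3, title: "Ice Cream Cakes", author: { last: "xyz", first: "abc", middle: "mmm" } },
];

// Hypothetical projection: keep title and author names, omit an empty middle name.
function projectAuthor(doc) {
  const author = { last: doc.author.last, first: doc.author.first };
  // Mirrors the $cond / $$REMOVE logic: add middle only when it is present and not "".
  if (doc.author.middle !== undefined && doc.author.middle !== "") {
    author.middle = doc.author.middle;
  }
  return { _id: doc._id, title: doc.title, author };
}

const projected = books.map(projectAuthor);
console.log(projected);
```

Only the third document keeps author.middle, as in the shell output.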
Projecting a new array field
Example 4: the following aggregation returns the new array field MyArray
> db.collection.find()
{ "_id" : ObjectId("5d3ea6a4a76866b3f79f0754"), "x" : 1, "y" : 1 }
> db.collection.aggregate([{$project : {MyArray : ["$x","$y"]}}])
{ "_id" : ObjectId("5d3ea6a4a76866b3f79f0754"), "MyArray" : [ 1, 1 ] }
//if the array specification references a field that does not exist, null is substituted for it:
> db.collection.aggregate([{$project : {MyArray : ["$x","$y","$testField"]}}])
{ "_id" : ObjectId("5d3ea6a4a76866b3f79f0754"), "MyArray" : [ 1, 1, null ] }
(6) $limit
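Limits the number of documents passed to the next stage in the pipeline.
Example:
> db.article.aggregate(
... [ { $limit : 5 } ]
... )
This operation returns only the first 5 documents passed to it by the pipeline. $limit has no effect on the content of the documents it passes along.
Note: when a $sort immediately precedes a $limit in the pipeline, the $sort operation only maintains the top n results as it progresses, where n is the specified limit, and MongoDB only needs to store n items in memory. This optimization still applies when allowDiskUse is true and the n items exceed the aggregation memory limit.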
(7) $skip
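Skips the specified number of documents that enter the stage and passes the remaining documents to the next stage in the pipeline.
Example:
> db.article.aggregate(
... [ { $skip : 5 } ]
... )
This operation skips the first 5 documents passed to it by the pipeline. $skip has no effect on the content of the documents it passes along.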
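$skip and $limit are often combined for paging, and their order matters. The sketch below imitates both stages with array slicing in plain JavaScript; the skip and limit helpers and the 12 generated documents are assumptions for illustration only.

```javascript
// Hypothetical helpers: each stage takes an array of documents and returns a new array.
const skip = (n) => (docs) => docs.slice(n);
const limit = (n) => (docs) => docs.slice(0, n);

// Twelve placeholder documents with _id 1..12.
const docs = Array.from({ length: 12 }, (_, i) => ({ _id: i + 1 }));

// Page 2 with a page size of 5: skip the first 5 documents, then keep the next 5.
const page2 = limit(5)(skip(5)(docs)).map((doc) => doc._id);

// Reversing the stages keeps 5 documents first and then skips all of them.
const reversed = skip(5)(limit(5)(docs)).map((doc) => doc._id);

console.log(page2, reversed);
```

The first ordering yields the documents with _id 6 through 10; the reversed ordering yields nothing.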
(8) $sort
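Sorts all input documents and returns them to the pipeline in sorted order.
Example: sort by age in descending order, then by posts in ascending order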
> db.users.aggregate(
... [
... { $sort : { age : -1, posts: 1 } }
... ]
... )
{ "_id" : ObjectId("5d3eaa8aa76866b3f79f0757"), "name" : "lala", "age" : 20, "posts" : 600 }
{ "_id" : ObjectId("5d3eaa7aa76866b3f79f0756"), "name" : "hehe", "age" : 19, "posts" : 0 }
{ "_id" : ObjectId("5d3eaa6ea76866b3f79f0755"), "name" : "haha", "age" : 19, "posts" : 710 }
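When comparing values of different BSON types, MongoDB uses a fixed comparison order across types. For example, strings sort higher than numbers, so in the descending sort below the document whose age is a string comes first: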
> db.users.aggregate(
... [
... { $sort : { age : -1, posts: 1 } }
... ]
... )
{ "_id" : ObjectId("5d3eaae7a76866b3f79f0758"), "name" : "hehe", "age" : "asdf", "posts" : 0 } //strings sort higher than numbers
{ "_id" : ObjectId("5d3eaa8aa76866b3f79f0757"), "name" : "lala", "age" : 20, "posts" : 600 }
{ "_id" : ObjectId("5d3eaa7aa76866b3f79f0756"), "name" : "hehe", "age" : 19, "posts" : 0 }
{ "_id" : ObjectId("5d3eaa6ea76866b3f79f0755"), "name" : "haha", "age" : 19, "posts" : 710 }
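A multi-key sort such as { age : -1, posts : 1 } can be pictured as a comparator that checks each key in turn. The plain JavaScript sketch below is an illustration only: sortBy, compareValues, and typeRank are hypothetical names, and typeRank covers just the two types in this example (numbers below strings), not MongoDB's full BSON ordering.

```javascript
// Rank only the two types used in this example; MongoDB's full ordering covers many more.
const typeRank = (value) => (typeof value === "number" ? 1 : 2);

function compareValues(a, b) {
  const rankDiff = typeRank(a) - typeRank(b);
  if (rankDiff !== 0) return rankDiff; // different types: compare by type rank
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

// Build a comparator from a spec such as { age: -1, posts: 1 }.
function sortBy(spec) {
  const keys = Object.entries(spec);
  return (left, right) => {
    for (const [field, direction] of keys) {
      const result = compareValues(left[field], right[field]) * direction;
      if (result !== 0) return result; // earlier keys take precedence
    }
    return 0;
  };
}

// name, age and posts copied from the users collection above.
const users = [
  { name: "haha", age: 19, posts: 710 },
  { name: "hehe", age: 19, posts: 0 },
  { name: "lala", age: 20, posts: 600 },
  { name: "hehe", age: "asdf", posts: 0 },
];

const sorted = [...users].sort(sortBy({ age: -1, posts: 1 })).map((u) => `${u.age}/${u.posts}`);
console.log(sorted);
```

The string age comes first in descending order, followed by 20, then the two documents with age 19 ordered by ascending posts.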
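(9) $sortByCount
New in version 3.4. Groups incoming documents by the value of a specified expression, then computes the number of documents in each distinct group. Each output document contains two fields: an _id field holding the distinct grouping value and a count field holding the number of documents in that group. The documents are sorted by count in descending order.
The $sortByCount stage is equivalent to the following $group + $sort sequence:
{ $group: { _id: <expression>, count: { $sum: 1 } } },
{ $sort: { count: -1 } }
Example: $unwind deconstructs the tags array, and $sortByCount counts the documents associated with each tag and sorts the counts in descending order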
> db.exhibits.find()
{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926, "tags" : [ "painting", "satire", "Expressionism", "caricature" ] }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902, "tags" : [ "woodcut", "Expressionism" ] }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925, "tags" : [ "oil", "Surrealism", "painting" ] }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai", "tags" : [ "woodblock", "ukiyo-e" ] }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931, "tags" : [ "Surrealism", "painting", "oil" ] }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913, "tags" : [ "oil", "painting", "abstract" ] }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893, "tags" : [ "Expressionism", "painting", "oil" ] }
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918, "tags" : [ "abstract", "painting" ] }
> db.exhibits.aggregate([
... {$unwind : "$tags"},
... {$sortByCount : "$tags"}
... ])
{ "_id" : "painting", "count" : 6 }
{ "_id" : "oil", "count" : 4 }
{ "_id" : "Expressionism", "count" : 3 }
{ "_id" : "abstract", "count" : 2 }
{ "_id" : "Surrealism", "count" : 2 }
{ "_id" : "ukiyo-e", "count" : 1 }
{ "_id" : "woodblock", "count" : 1 }
{ "_id" : "woodcut", "count" : 1 }
{ "_id" : "satire", "count" : 1 }
{ "_id" : "caricature", "count" : 1 }
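The combination of $unwind and $sortByCount amounts to counting every tag occurrence and sorting by that count. The plain JavaScript sketch below reproduces the counting in memory; the sortByCount function is a hypothetical helper, and the order among tags with equal counts is not meaningful.

```javascript
// _id and tags copied from the exhibits collection above.
const exhibits = [
  { _id: 1, tags: ["painting", "satire", "Expressionism", "caricature"] },
  { _id: 2, tags: ["woodcut", "Expressionism"] },
  { _id: 3, tags: ["oil", "Surrealism", "painting"] },
  { _id: 4, tags: ["woodblock", "ukiyo-e"] },
  { _id: 5, tags: ["Surrealism", "painting", "oil"] },
  { _id: 6, tags: ["oil", "painting", "abstract"] },
  { _id: 7, tags: ["Expressionism", "painting", "oil"] },
  { _id: 8, tags: ["abstract", "painting"] },
];

// Hypothetical equivalent of $unwind on an array field followed by $sortByCount.
function sortByCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    for (const value of doc[field]) {
      counts.set(value, (counts.get(value) || 0) + 1);
    }
  }
  return [...counts]
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count); // descending by count
}

const tagCounts = sortByCount(exhibits, "tags");
console.log(tagCounts.slice(0, 3));
```

The three most frequent tags are painting (6), oil (4), and Expressionism (3); the collection contains ten distinct tags in total.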