MongoDB's aggregation operations are the counterpart of SQL clauses such as "GROUP BY" and "ORDER BY" in a relational database, and grouping in MongoDB likewise uses a group operation. Before introducing the aggregation stages themselves, we need a new query command: "db.collection.aggregate()". Until now we have queried with "db.collection.find()"; when MongoDB needs to aggregate, for example to compute averages or other statistics, "db.collection.aggregate()" is the method to use.
###Insert a batch of test data: name is the pizza type, size is the pizza size, price is the unit price, quantity is the number sold on the order, and date is the timestamp.
db.orders.insertMany( [
{ _id: 0, name: "Pepperoni", size: "small", price: 19,
quantity: 10, date: ISODate( "2021-03-13T08:14:30Z" ) },
{ _id: 1, name: "Pepperoni", size: "medium", price: 20,
quantity: 20, date : ISODate( "2021-03-13T09:13:24Z" ) },
{ _id: 2, name: "Pepperoni", size: "large", price: 21,
quantity: 30, date : ISODate( "2021-03-17T09:22:12Z" ) },
{ _id: 3, name: "Cheese", size: "small", price: 12,
quantity: 15, date : ISODate( "2021-03-13T11:21:39.736Z" ) },
{ _id: 4, name: "Cheese", size: "medium", price: 13,
quantity: 50, date : ISODate( "2022-01-12T21:23:13.331Z" ) },
{ _id: 5, name: "Cheese", size: "large", price: 14,
quantity: 10, date : ISODate( "2022-01-12T05:08:13Z" ) },
{ _id: 6, name: "Vegan", size: "small", price: 17,
quantity: 10, date : ISODate( "2021-01-13T05:08:13Z" ) },
{ _id: 7, name: "Vegan", size: "medium", price: 18,
quantity: 10, date : ISODate( "2021-01-13T05:10:13Z" ) }
] )
###Query the data: db.orders.aggregate() with no pipeline returns documents directly, just like the find command.
Enterprise test [direct: primary] mongodb_test> db.orders.aggregate()
[
{
_id: 0,
name: 'Pepperoni',
size: 'small',
price: 19,
quantity: 10,
date: ISODate('2021-03-13T08:14:30.000Z')
},
{
_id: 1,
name: 'Pepperoni',
size: 'medium',
price: 20,
quantity: 20,
date: ISODate('2021-03-13T09:13:24.000Z')
},
{
_id: 2,
name: 'Pepperoni',
size: 'large',
price: 21,
quantity: 30,
date: ISODate('2021-03-17T09:22:12.000Z')
},
...
###Use an aggregation to filter documents whose size field equals "medium", group them by name, and compute the total of the quantity column. The filter uses the $match stage, grouping uses the $group stage, and the total uses the $sum accumulator.
mongodb_test> db.orders.aggregate( [
{
$match: { size: "medium" }
},
{
$group: { _id: "$name", totalQuantity: { $sum: "$quantity" } }
}
] )
...
[
{ _id: 'Cheese', totalQuantity: 50 },
{ _id: 'Vegan', totalQuantity: 10 },
{ _id: 'Pepperoni', totalQuantity: 20 }
]
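To make the two stages concrete, here is a plain JavaScript sketch (Node, no MongoDB required) that applies the same filter-then-group logic to the sample documents. This is only an illustration of the logic, not how the server implements it:

```javascript
// Sample orders (same data as above; date fields omitted for brevity).
const orders = [
  { name: "Pepperoni", size: "small",  quantity: 10 },
  { name: "Pepperoni", size: "medium", quantity: 20 },
  { name: "Pepperoni", size: "large",  quantity: 30 },
  { name: "Cheese",    size: "small",  quantity: 15 },
  { name: "Cheese",    size: "medium", quantity: 50 },
  { name: "Cheese",    size: "large",  quantity: 10 },
  { name: "Vegan",     size: "small",  quantity: 10 },
  { name: "Vegan",     size: "medium", quantity: 10 },
];

// Stage 1 ($match): keep only medium pizzas.
const medium = orders.filter(o => o.size === "medium");

// Stage 2 ($group + $sum): accumulate quantity per name.
const totalQuantity = {};
for (const o of medium) {
  totalQuantity[o.name] = (totalQuantity[o.name] ?? 0) + o.quantity;
}

console.log(totalQuantity); // { Pepperoni: 20, Cheese: 50, Vegan: 10 }
```

The sums match the shell output above: Cheese 50, Vegan 10, Pepperoni 20.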
###Query again and you can see that the underlying data has not been modified or changed: the aggregation computes its results in memory and leaves the collection untouched.
Enterprise test [direct: primary] mongodb_test> db.orders.aggregate()
[
{
_id: 0,
name: 'Pepperoni',
size: 'small',
price: 19,
quantity: 10,
date: ISODate('2021-03-13T08:14:30.000Z')
},
...
###Within a given date range, compute each day's total order value across all pizza types and the average order quantity, sorted by total order value from largest to smallest.
Enterprise test [direct: primary] mongodb_test> db.orders.aggregate( [
...
... // Stage 1: Filter pizza order documents by date range
... {
... $match:
... {
... "date": { $gte: new ISODate( "2020-01-30" ), $lt: new ISODate( "2022-01-30" ) }
... }
... },
...
... // Stage 2: Group remaining documents by date and calculate results
... {
... $group:
... {
... _id: { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
... totalOrderValue: { $sum: { $multiply: [ "$price", "$quantity" ] } },
... averageOrderQuantity: { $avg: "$quantity" }
... }
... },
...
... // Stage 3: Sort documents by totalOrderValue in descending order
... {
... $sort: { totalOrderValue: -1 }
... }
...
... ] )
[
{ _id: '2022-01-12', totalOrderValue: 790, averageOrderQuantity: 30 },
{ _id: '2021-03-13', totalOrderValue: 770, averageOrderQuantity: 15 },
{ _id: '2021-03-17', totalOrderValue: 630, averageOrderQuantity: 30 },
  { _id: '2021-01-13', totalOrderValue: 350, averageOrderQuantity: 10 }
]
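The same three-stage computation can be sketched in plain JavaScript. Note that in this sample every document already falls inside the $match date range, so the sketch skips the filter and shows only the grouping and sorting logic:

```javascript
// Same orders as above, reduced to the fields the pipeline touches.
const orders = [
  { price: 19, quantity: 10, date: "2021-03-13" },
  { price: 20, quantity: 20, date: "2021-03-13" },
  { price: 21, quantity: 30, date: "2021-03-17" },
  { price: 12, quantity: 15, date: "2021-03-13" },
  { price: 13, quantity: 50, date: "2022-01-12" },
  { price: 14, quantity: 10, date: "2022-01-12" },
  { price: 17, quantity: 10, date: "2021-01-13" },
  { price: 18, quantity: 10, date: "2021-01-13" },
];

// $group by day: totalOrderValue = sum(price * quantity),
// averageOrderQuantity = avg(quantity).
const byDay = {};
for (const o of orders) {
  const g = byDay[o.date] ?? (byDay[o.date] = { totalOrderValue: 0, quantities: [] });
  g.totalOrderValue += o.price * o.quantity;
  g.quantities.push(o.quantity);
}
const results = Object.entries(byDay).map(([day, g]) => ({
  _id: day,
  totalOrderValue: g.totalOrderValue,
  averageOrderQuantity: g.quantities.reduce((a, b) => a + b, 0) / g.quantities.length,
}));

// $sort by totalOrderValue descending.
results.sort((a, b) => b.totalOrderValue - a.totalOrderValue);
console.log(results[0]); // { _id: '2022-01-12', totalOrderValue: 790, averageOrderQuantity: 30 }
```

Working through the arithmetic for 2022-01-12 confirms the shell output: 13×50 + 14×10 = 790, and the average quantity is (50 + 10) / 2 = 30.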
Test data URL: https://media.mongodb.org/zips.json
### Install the MongoDB Database Tools package, which includes mongoimport.
root # wget https://fastdl.mongodb.org/tools/db/mongodb-database-tools-rhel70-x86_64-100.10.0.rpm
root # yum localinstall -y mongodb-database-tools-rhel70-x86_64-100.10.0.rpm
### Download the test file to the server.
root # wget https://media.mongodb.org/zips.json
### Use mongoimport to load the data into MongoDB.
root # mongoimport --username admin --password admin --host 127.0.0.1 --port 27017 --authenticationDatabase=admin -d mongodb_test --type=json --file=./zips.json
1. Group the zips collection by the state field, summing the pop (zip-area population) column, then find the states with a total population under one million.
db.zips.aggregate( [
{ $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
{ $match: { totalPop: { $lt: 1000000 } } }
] )
[
{ _id: 'ND', totalPop: 638272 },
{ _id: 'AK', totalPop: 544698 },
{ _id: 'MT', totalPop: 798948 },
{ _id: 'VT', totalPop: 562758 },
{ _id: 'SD', totalPop: 695397 },
{ _id: 'WY', totalPop: 453528 },
{ _id: 'DC', totalPop: 606900 },
{ _id: 'DE', totalPop: 666168 }
]
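Stage order matters here: $match runs after $group, so the filter applies to the per-state totals rather than to the raw documents. A plain JavaScript sketch of that logic, using a small hypothetical data set (the real zips collection has tens of thousands of documents):

```javascript
// Hypothetical state/pop fragments standing in for zips documents.
const zips = [
  { state: "WY", pop: 300000 },
  { state: "WY", pop: 153528 },
  { state: "CA", pop: 20000000 },
];

// $group: sum pop per state.
const totals = {};
for (const z of zips) totals[z.state] = (totals[z.state] ?? 0) + z.pop;

// $match AFTER $group: the filter sees the grouped totals.
const small = Object.entries(totals)
  .filter(([, totalPop]) => totalPop < 1000000)
  .map(([state, totalPop]) => ({ _id: state, totalPop }));

console.log(small); // [ { _id: 'WY', totalPop: 453528 } ]
```

California is dropped because its total exceeds the threshold, even though no single raw document would have matched or failed the filter on its own.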
In this test data, different states can contain cities with the same name, and the same city can span multiple zip areas with different populations.
2. Compute the average zip-area population for each city in each state. First group by state, city, and area and sum the population; then group that result a second time by state and city and take the average of the per-area totals.
###First compute the total population of each area within each city; the result set is large.
db.zips.aggregate( [
   { $group: { _id: { state: "$state", city: "$city", loc: "$loc" }, pop: { $sum: "$pop" } } }
] )
###With a second $group stage, group again by state and city.
db.zips.aggregate( [
   { $group: { _id: { state: "$state", city: "$city", loc: "$loc" }, pop: { $sum: "$pop" } } },
   { $group: { _id: { state: "$_id.state", city: "$_id.city" } } }
] )
###Now we can take the average.
db.zips.aggregate( [
   { $group: { _id: { state: "$state", city: "$city", loc: "$loc" }, pop: { $sum: "$pop" } } },
   { $group: { _id: { state: "$_id.state", city: "$_id.city" }, avgcityloc: { $avg: "$pop" } } }
] )
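The two-level grouping is easy to get wrong, so here is the same logic in plain JavaScript on a tiny hypothetical data set: first sum per (state, city, loc), then average those per-area sums per (state, city):

```javascript
// Hypothetical zip-style documents: one city, two areas,
// with one area split across two documents.
const zips = [
  { state: "MA", city: "BOSTON", loc: "A", pop: 1000 },
  { state: "MA", city: "BOSTON", loc: "A", pop: 500 },
  { state: "MA", city: "BOSTON", loc: "B", pop: 3000 },
];

// First $group: sum pop per (state, city, loc).
const areaPops = {};
for (const z of zips) {
  const key = `${z.state}|${z.city}|${z.loc}`;
  areaPops[key] = (areaPops[key] ?? 0) + z.pop;
}

// Second $group: collect the per-area sums per (state, city)...
const cityAreas = {};
for (const [key, pop] of Object.entries(areaPops)) {
  const [state, city] = key.split("|");
  const ckey = `${state}|${city}`;
  if (!cityAreas[ckey]) cityAreas[ckey] = [];
  cityAreas[ckey].push(pop);
}

// ...and $avg them.
const avgcityloc = {};
for (const [key, pops] of Object.entries(cityAreas)) {
  avgcityloc[key] = pops.reduce((a, b) => a + b, 0) / pops.length;
}

console.log(avgcityloc); // { 'MA|BOSTON': 2250 }
```

Area A sums to 1500, area B to 3000, and their average is 2250; averaging the three raw documents directly would instead give 1500, which is the wrong answer to the question being asked.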
Tip: when a result set has too many rows, the mongo shell truncates the output by default. You can raise the limit with DBQuery.shellBatchSize = 100000, which sets the batch size so that up to 100,000 rows are displayed.
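Note that DBQuery.shellBatchSize belongs to the legacy mongo shell; in the newer mongosh shell the corresponding setting is displayBatchSize, set through the config API:

```javascript
// Legacy mongo shell: rows displayed per batch.
DBQuery.shellBatchSize = 100000

// mongosh equivalent:
config.set("displayBatchSize", 100000)
```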
3. Using state, city, and area, return each state's most populous city and least populous city.
###First group by state and city and sum the population, giving the total population of each city in each state.
db.zips.aggregate( [
{ $group:
{
_id: { state: "$state", city: "$city" },
pop: { $sum: "$pop" }
}
}
] )
###Use the $sort stage introduced earlier to order the cities by population from smallest to largest, giving us an ordered data set.
db.zips.aggregate( [
{ $group:
{
_id: { state: "$state", city: "$city" },
pop: { $sum: "$pop" }
}
},
{ $sort: { pop: 1 } }
] )
###Now comes the second aggregation pass over the data. We group by state again, but this time, while grouping, we also need to capture each state's most and least populous city. Because we sorted by population, two accumulators can fish out the largest and smallest city per state: $last and $first, which return the last and the first document in each group. After the $sort, the largest city sits on the last row of each group and the smallest on the first.
db.zips.aggregate( [
{ $group:
{
_id: { state: "$state", city: "$city" },
pop: { $sum: "$pop" }
}
},
{ $sort: { pop: 1 } },
{$group:
{
_id : "$_id.state",
biggestCity: { $last: "$_id.city" },
biggestPop: { $last: "$pop" },
smallestCity: { $first: "$_id.city" },
smallestPop: { $first: "$pop" }
}}
] )
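The sort-then-first/last trick is worth seeing in isolation. Here is a plain JavaScript sketch on a small hypothetical data set standing in for the (state, city) → pop documents produced by the first $group stage:

```javascript
// Hypothetical per-city totals (NOT real census figures).
const cityPops = [
  { state: "NY", city: "ALBANY",     pop: 100000 },
  { state: "NY", city: "NEW YORK",   pop: 7000000 },
  { state: "NY", city: "BUFFALO",    pop: 300000 },
  { state: "VT", city: "BARRE",      pop: 9000 },
  { state: "VT", city: "BURLINGTON", pop: 39000 },
];

// $sort: ascending by pop, so each state's smallest city is seen first.
cityPops.sort((a, b) => a.pop - b.pop);

// $group with $first/$last: the first document seen per state is the
// smallest city; whichever document is seen last becomes the biggest.
const byState = {};
for (const d of cityPops) {
  const g = byState[d.state] ??
    (byState[d.state] = { smallestCity: d.city, smallestPop: d.pop });
  g.biggestCity = d.city;
  g.biggestPop = d.pop;
}

console.log(byState.NY);
// { smallestCity: 'ALBANY', smallestPop: 100000,
//   biggestCity: 'NEW YORK', biggestPop: 7000000 }
```

If the $sort stage were removed, $first and $last would still run, but they would pick whichever documents happened to arrive first and last, which is why the sort is essential here.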
###This already satisfies the requirement, but for nicer output we can add the $project stage, which reshapes the result documents and renames fields.
db.zips.aggregate( [
{ $group:
{
_id: { state: "$state", city: "$city" },
pop: { $sum: "$pop" }
}
},
{ $sort: { pop: 1 } },
{ $group:
{
_id : "$_id.state",
biggestCity: { $last: "$_id.city" },
biggestPop: { $last: "$pop" },
smallestCity: { $first: "$_id.city" },
smallestPop: { $first: "$pop" }
}
},
{ $project:
{ _id: 0,
state: "$_id",
biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
}
}
] )
1. Insert a batch of test data.
db.members.insertMany( [
{
_id: "jane",
joined: ISODate("2011-03-02"),
likes: ["golf", "racquetball"]
},
{
_id: "joe",
joined: ISODate("2012-07-02"),
likes: ["tennis", "golf", "swimming"]
},
{
_id: "ruth",
joined: ISODate("2012-01-14"),
likes: ["golf", "racquetball"]
},
{
_id: "harold",
joined: ISODate("2012-01-21"),
likes: ["handball", "golf", "racquetball"]
},
{
_id: "kate",
joined: ISODate("2012-01-14"),
likes: ["swimming", "tennis"]
}
] )
2. Use the $project stage to suppress the _id field in the output, or to output only the _id field.
###Suppress the _id field.
db.members.aggregate(
[
{ $project: { _id: 0 } }
]
)
###Output only the _id field.
db.members.aggregate(
[
{ $project: { _id: 1 } }
]
)
3. Convert the name field to all uppercase, then sort and output.
db.members.aggregate(
[
{ $project: { name: { $toUpper: "$_id" }, _id: 0 } },
{ $sort: { name: 1 } }
]
)
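The $toUpper-then-$sort sequence can be sketched in plain JavaScript against the member _ids inserted above:

```javascript
// The member _ids from the insertMany above.
const ids = ["jane", "joe", "ruth", "harold", "kate"];

// $project with $toUpper builds a new "name" field from _id;
// $sort: { name: 1 } then orders the projected values.
const names = ids.map(id => id.toUpperCase()).sort();

console.log(names); // [ 'HAROLD', 'JANE', 'JOE', 'KATE', 'RUTH' ]
```

Because the sort runs after the projection, it orders the uppercased values, not the original lowercase _ids.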
4. Project the join month into a new field and output in ascending order of month.
db.members.aggregate( [
{
$project: {
month_joined: { $month: "$joined" },
name: "$_id",
_id: 0
}
},
{ $sort: { month_joined: 1 } }
] )
5. Group by join month, count how many documents fall in each month, then sort by the count in ascending order.
db.members.aggregate( [
{ $project: { month_joined: { $month: "$joined" } } } ,
{ $group: { _id: { month_joined: "$month_joined" } , count: { $sum: 1 } } },
{ $sort: { count: 1 } }
] )
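A plain JavaScript sketch of the $month/$group/$sum logic, run against the joined dates inserted above (JavaScript's getUTCMonth() is 0-based, so we add 1 to match $month):

```javascript
// joined dates from the members collection above.
const joined = ["2011-03-02", "2012-07-02", "2012-01-14", "2012-01-21", "2012-01-14"];

// $project with $month, then $group with { $sum: 1 } to count per month.
const counts = {};
for (const d of joined) {
  const month = new Date(d).getUTCMonth() + 1; // $month is 1-based
  counts[month] = (counts[month] ?? 0) + 1;
}

console.log(counts); // { '1': 3, '3': 1, '7': 1 }
```

Three members joined in January, one in March, and one in July, matching what the pipeline returns.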
###The $unwind stage deconstructs an array field, emitting one document per array element; here each member document is duplicated once for every hobby in its likes array.
Enterprise test [direct: primary] mongodb_test> db.members.aggregate(
... [
... { $unwind: "$likes" }
... ]
... )
[
{
_id: 'jane',
joined: ISODate('2011-03-02T00:00:00.000Z'),
likes: 'golf'
},
{
_id: 'jane',
joined: ISODate('2011-03-02T00:00:00.000Z'),
likes: 'racquetball'
},
{
_id: 'joe',
joined: ISODate('2012-07-02T00:00:00.000Z'),
likes: 'tennis'
},
{
_id: 'joe',
joined: ISODate('2012-07-02T00:00:00.000Z'),
likes: 'golf'
},
{
_id: 'joe',
joined: ISODate('2012-07-02T00:00:00.000Z'),
likes: 'swimming'
},
{
_id: 'ruth',
joined: ISODate('2012-01-14T00:00:00.000Z'),
likes: 'golf'
},
{
_id: 'ruth',
joined: ISODate('2012-01-14T00:00:00.000Z'),
likes: 'racquetball'
},
{
_id: 'harold',
joined: ISODate('2012-01-21T00:00:00.000Z'),
likes: 'handball'
},
{
_id: 'harold',
joined: ISODate('2012-01-21T00:00:00.000Z'),
likes: 'golf'
},
{
_id: 'harold',
joined: ISODate('2012-01-21T00:00:00.000Z'),
likes: 'racquetball'
},
{
_id: 'kate',
joined: ISODate('2012-01-14T00:00:00.000Z'),
likes: 'swimming'
},
{
_id: 'kate',
joined: ISODate('2012-01-14T00:00:00.000Z'),
likes: 'tennis'
}
]
###Combining $unwind with $group, $sort, and $limit returns the five most popular hobbies.
db.members.aggregate(
[
{ $unwind: "$likes" },
{ $group: { _id: "$likes" , count: { $sum: 1 } } },
{ $sort: { count: -1 } },
{ $limit: 5 }
]
)
...
[
{ _id: 'golf', count: 4 },
{ _id: 'racquetball', count: 3 },
{ _id: 'swimming', count: 2 },
{ _id: 'tennis', count: 2 },
{ _id: 'handball', count: 1 }
]
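The full $unwind → $group → $sort → $limit pipeline maps naturally onto array operations; here is a plain JavaScript sketch over the same member documents:

```javascript
// likes arrays from the members collection above.
const members = [
  { _id: "jane",   likes: ["golf", "racquetball"] },
  { _id: "joe",    likes: ["tennis", "golf", "swimming"] },
  { _id: "ruth",   likes: ["golf", "racquetball"] },
  { _id: "harold", likes: ["handball", "golf", "racquetball"] },
  { _id: "kate",   likes: ["swimming", "tennis"] },
];

// $unwind: one document per (member, like) pair.
const unwound = members.flatMap(m =>
  m.likes.map(like => ({ _id: m._id, likes: like })));

// $group + $sum: count documents per hobby,
// then $sort descending and $limit to 5.
const counts = {};
for (const d of unwound) counts[d.likes] = (counts[d.likes] ?? 0) + 1;
const top5 = Object.entries(counts)
  .map(([likes, count]) => ({ _id: likes, count }))
  .sort((a, b) => b.count - a.count)
  .slice(0, 5);

console.log(top5[0]); // { _id: 'golf', count: 4 }
```

As in the shell output, golf appears four times, racquetball three, swimming and tennis twice each, and handball once.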