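Some basic MongoDB concepts used below:
- The aggregate method takes a pipeline, which is an ordered list of stages. The stages run one after another in the order they are defined, and the documents output by each stage become the input of the next stage.
- Comparing one field of a document with another field of the same document needs special operators such as $redact or $expr. An ordinary query filter can only compare a field with a constant value.
- Formatting a timestamp as a date string uses $dateToString. Watch out for the time zone: unless a timezone is given, it formats in UTC.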
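To make the "each stage feeds the next" idea concrete, here is a minimal plain-JavaScript sketch that runs in Node without MongoDB. The helpers `match`, `project` and `runPipeline` are hypothetical stand-ins written for illustration; they only mimic the data flow between $match and $project stages, not their full semantics.

```javascript
// Hypothetical stand-ins for pipeline stages: each one takes an array of
// documents and returns a new array of documents.
const match = (predicate) => (docs) => docs.filter(predicate);
const project = (fn) => (docs) => docs.map(fn);

// Run the stages strictly in the given order: the output of one stage
// is passed as the input of the next.
function runPipeline(docs, stages) {
  return stages.reduce((input, stage) => stage(input), docs);
}

const docs = [
  { subject: "A", x: 1 },
  { subject: "B", x: 5 },
  { subject: "A", x: 7 },
];

const result = runPipeline(docs, [
  match((d) => d.x > 2), // keeps the documents with x = 5 and x = 7
  project((d) => ({ subject: d.subject, double: d.x * 2 })),
]);

// result: [{ subject: "B", double: 10 }, { subject: "A", double: 14 }]
console.log(result);
```

Order matters: swapping the two stages yields an empty result, because the projected documents no longer carry an `x` field for the match step to test.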
Prerequisites:
- MongoDB version: 4.2.1
- Collection: t_wechat_user, whose documents look like this:
{
    "_id": ObjectId("5f5f3ef2b53b633689108dfs"),
    "open_id": "o69xlwMzwkhTIlYoFGEWeHzUtles",
    "app_id": "wxa82301b25sdf0153",
    "subscribe_time": NumberLong(1578880338000),
    "custom_time": NumberLong(1638328107017),
    "subject": "游戏原画",
    "create_time": ISODate("2020-09-14T09:59:14.955Z"),
    "update_time": ISODate("2021-12-01T05:37:35.962Z")
}
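custom_time and subscribe_time hold millisecond timestamps (NumberLong).
Normally subscribe_time comes before custom_time,
but subscribe_time can also be later than custom_time, which is why every query below explicitly requires custom_time > subscribe_time.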
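The two conditions that every requirement below applies can be written as a single predicate. This plain-JavaScript sketch (Node, no database) also shows where the constant 604800000 comes from; the function name `withinSevenDaysAfterSubscribe` and the sample values are hypothetical.

```javascript
// 7 days in milliseconds: 7 days * 24 h * 60 min * 60 s * 1000 ms.
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000; // 604800000

// custom_time must be strictly after subscribe_time (gap > 0),
// and the gap must be strictly less than 7 days.
function withinSevenDaysAfterSubscribe(doc) {
  const gap = doc.custom_time - doc.subscribe_time;
  return gap > 0 && gap < SEVEN_DAYS_MS;
}

// Hypothetical sample documents (millisecond timestamps).
const samples = [
  { subscribe_time: 1638000000000, custom_time: 1638000000000 + 3600000 },       // 1 h later: kept
  { subscribe_time: 1638000000000, custom_time: 1638000000000 + SEVEN_DAYS_MS }, // exactly 7 days: dropped
  { subscribe_time: 1638000000000, custom_time: 1637990000000 },                 // custom_time earlier: dropped
];

console.log(samples.map(withinSevenDaysAfterSubscribe)); // [ true, false, false ]
```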
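Requirement 1:
For documents whose custom_time falls in a given range, group by subject and compute the average gap between custom_time and subscribe_time, counting only documents where custom_time is after subscribe_time by less than 7 days. In SQL it would look roughly like this (the range is written as half-open to match the $gte/$lt used in the MongoDB query):
SELECT subject, COUNT(*) AS myCount, AVG( custom_time - subscribe_time ) AS subTimeAvg FROM t_wechat_user
WHERE
custom_time >= 1635696000000 AND custom_time < 1638288000000
AND custom_time > subscribe_time
AND ( custom_time - subscribe_time ) < 604800000
GROUP BY subject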
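The two range constants are easier to check once decoded. This plain-JavaScript sketch shows that they are midnight on 2021-11-01 and 2021-12-01 in UTC+08:00 (16:00 UTC on the previous day), i.e. a 30-day window; because the MongoDB query uses $gte/$lt, the end instant itself is excluded. The UTC+08:00 reading is an interpretation of the values, not something the requirement states.

```javascript
const start = 1635696000000;
const end = 1638288000000;

// toISOString() always prints the instant in UTC.
console.log(new Date(start).toISOString()); // 2021-10-31T16:00:00.000Z
console.log(new Date(end).toISOString());   // 2021-11-30T16:00:00.000Z

// Shifting by 8 hours gives the calendar date as seen in UTC+08:00.
const EIGHT_HOURS_MS = 8 * 60 * 60 * 1000;
console.log(new Date(start + EIGHT_HOURS_MS).toISOString().slice(0, 10)); // 2021-11-01
console.log(new Date(end + EIGHT_HOURS_MS).toISOString().slice(0, 10));   // 2021-12-01

// Width of the window in days.
console.log((end - start) / (24 * 60 * 60 * 1000)); // 30
```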
The corresponding MongoDB query:
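db.t_wechat_user.aggregate([{
    // 1. keep only documents whose custom_time is in [start, end)
    $match: {
        custom_time: {
            "$gte": 1635696000000,
            "$lt": 1638288000000
        }
    }
}, {
    // 2. field-to-field comparison: custom_time > subscribe_time
    //    and custom_time - 7 days < subscribe_time (gap under 7 days)
    $redact: {
        $cond: {
            if: {
                $and: [
                    { "$gt": ["$custom_time", "$subscribe_time"] },
                    { "$lt": [{ "$subtract": ["$custom_time", 604800000] }, "$subscribe_time"] }
                ]
            },
            then: "$$KEEP",
            else: "$$PRUNE"
        }
    }
}, {
    // 3. compute the gap in milliseconds
    $project: {
        custom_time: 1,
        subject: 1,
        "subTime": {
            "$subtract": ["$custom_time", "$subscribe_time"]
        }
    }
}, {
    // 4. group by subject: document count and average gap
    $group: {
        _id: "$subject",
        myCount: { $sum: 1 },
        subTimeAvg: {
            $avg: "$subTime"
        }
    }
}]);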
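To sanity-check what the four stages compute, here is a plain-JavaScript mirror of the pipeline over a few hypothetical in-memory documents (the subjects `subjectA`/`subjectB` and all timestamps are made up). It is a sketch of the semantics, not how MongoDB executes the query; note that MongoDB does not guarantee the order of $group output, while the `Map` used here keeps insertion order.

```javascript
const START = 1635696000000;
const END = 1638288000000;
const SEVEN_DAYS_MS = 604800000;

// Hypothetical documents: only the fields the pipeline touches.
const docs = [
  { subject: "subjectA", subscribe_time: 1636000000000, custom_time: 1636007200000 }, // gap 2 h: kept
  { subject: "subjectA", subscribe_time: 1636000000000, custom_time: 1636014400000 }, // gap 4 h: kept
  { subject: "subjectB", subscribe_time: 1636000000000, custom_time: 1635990000000 }, // custom_time earlier: pruned
  { subject: "subjectB", subscribe_time: 1600000000000, custom_time: 1636000000000 }, // gap over 7 days: pruned
  { subject: "subjectB", subscribe_time: END - 3600000, custom_time: END },           // custom_time == END: not matched
];

function requirement1(docs) {
  // $match: custom_time in [START, END)
  const matched = docs.filter((d) => d.custom_time >= START && d.custom_time < END);
  // $redact: custom_time > subscribe_time and custom_time - 7 days < subscribe_time
  const kept = matched.filter(
    (d) => d.custom_time > d.subscribe_time && d.custom_time - SEVEN_DAYS_MS < d.subscribe_time
  );
  // $project: add subTime (the gap in ms)
  const projected = kept.map((d) => ({
    subject: d.subject,
    custom_time: d.custom_time,
    subTime: d.custom_time - d.subscribe_time,
  }));
  // $group by subject: count and average of subTime
  const groups = new Map();
  for (const d of projected) {
    const g = groups.get(d.subject) || { _id: d.subject, myCount: 0, sum: 0 };
    g.myCount += 1;
    g.sum += d.subTime;
    groups.set(d.subject, g);
  }
  return [...groups.values()].map(({ _id, myCount, sum }) => ({ _id, myCount, subTimeAvg: sum / myCount }));
}

console.log(requirement1(docs)); // one group: subjectA, myCount 2, subTimeAvg 10800000 (3 h)
```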
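Requirement 2:
For documents whose custom_time falls in a given range, group by day and compute the average gap between custom_time and subscribe_time, again counting only documents where custom_time is after subscribe_time by less than 7 days. In SQL (MySQL) it would look roughly like this; since custom_time is a millisecond timestamp, it is divided by 1000 and converted with FROM_UNIXTIME, whose result depends on the session time zone:
SELECT FROM_UNIXTIME( custom_time / 1000, '%Y-%m-%d' ) AS day, COUNT(*) AS myCount, AVG( custom_time - subscribe_time ) AS subTimeAvg FROM t_wechat_user
WHERE
custom_time >= 1635696000000 AND custom_time < 1638288000000
AND custom_time > subscribe_time
AND ( custom_time - subscribe_time ) < 604800000
GROUP BY FROM_UNIXTIME( custom_time / 1000, '%Y-%m-%d' )
Here the timestamp has to be converted into a formatted date, which requires $dateToString. Two things need attention. First, $dateToString expects a Date, so the millisecond number is turned into one by adding it to ISODate("1970-01-01T00:00:00Z") (on MongoDB 4.0 and later, { $toDate: "$custom_time" } does the same). Second, without a timezone option $dateToString formats in UTC, so "+08:00" is passed to group by UTC+08:00 calendar days.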
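The time zone pitfall can be reproduced without a database. The sketch below formats a millisecond timestamp as `YYYY-MM-DD` at a fixed UTC offset by shifting the instant and reading the UTC calendar fields, which mirrors what $dateToString does with `timezone: "+08:00"`. The helper name `formatDay` is hypothetical.

```javascript
// Format a millisecond timestamp as "YYYY-MM-DD" at a fixed UTC offset:
// shift the instant by the offset, then read the UTC calendar fields.
function formatDay(ms, offsetMinutes) {
  const shifted = new Date(ms + offsetMinutes * 60 * 1000);
  const y = shifted.getUTCFullYear();
  const m = String(shifted.getUTCMonth() + 1).padStart(2, "0");
  const d = String(shifted.getUTCDate()).padStart(2, "0");
  return `${y}-${m}-${d}`;
}

const ts = 1635696000000; // start of the query window
console.log(formatDay(ts, 0));      // 2021-10-31 (UTC, what $dateToString uses by default)
console.log(formatDay(ts, 8 * 60)); // 2021-11-01 (UTC+08:00)
```

The same instant lands on two different days, so leaving out the timezone option would shift documents near midnight into the wrong daily bucket.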
db.t_wechat_user.aggregate([{
    // 1. keep only documents whose custom_time is in [start, end)
    $match: {
        custom_time: {
            "$gte": 1635696000000,
            "$lt": 1638288000000
        }
    }
}, {
    // 2. custom_time after subscribe_time, gap under 7 days
    $redact: {
        $cond: {
            if: {
                $and: [
                    { "$gt": ["$custom_time", "$subscribe_time"] },
                    { "$lt": [{ "$subtract": ["$custom_time", 604800000] }, "$subscribe_time"] }
                ]
            },
            then: "$$KEEP",
            else: "$$PRUNE"
        }
    }
}, {
    // 3. compute the gap in milliseconds
    $project: {
        custom_time: 1,
        subject: 1,
        "subTime": {
            "$subtract": ["$custom_time", "$subscribe_time"]
        }
    }
}, {
    // 4. group by the UTC+08:00 calendar day of custom_time;
    //    adding the millisecond number to the epoch date yields a Date
    $group: {
        _id: {
            $dateToString: {
                format: "%Y-%m-%d",
                date: { $add: [ISODate("1970-01-01T00:00:00Z"), "$custom_time"] },
                timezone: "+08:00"
            }
        },
        myCount: { $sum: 1 },
        subTimeAvg: {
            $avg: "$subTime"
        }
    }
}]);
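The daily grouping can be mirrored in plain JavaScript to see how documents near midnight are bucketed. This sketch assumes the documents have already passed the $match and $redact stages; the sample timestamps are hypothetical, and `dayKey` is an illustrative helper, not a MongoDB function.

```javascript
const OFFSET_MS = 8 * 60 * 60 * 1000; // corresponds to timezone: "+08:00"

// Day key in UTC+08:00, like $dateToString with format "%Y-%m-%d".
const dayKey = (ms) => new Date(ms + OFFSET_MS).toISOString().slice(0, 10);

// Hypothetical documents that already passed the filtering stages.
const kept = [
  { custom_time: 1635696000000, subscribe_time: 1635692400000 }, // 2021-11-01 00:00:00 +08:00, gap 1 h
  { custom_time: 1635782399000, subscribe_time: 1635775199000 }, // 2021-11-01 23:59:59 +08:00, gap 2 h
  { custom_time: 1635782400000, subscribe_time: 1635778800000 }, // 2021-11-02 00:00:00 +08:00, gap 1 h
];

// $group by day: count and average gap.
const byDay = new Map();
for (const d of kept) {
  const key = dayKey(d.custom_time);
  const g = byDay.get(key) || { _id: key, myCount: 0, sum: 0 };
  g.myCount += 1;
  g.sum += d.custom_time - d.subscribe_time;
  byDay.set(key, g);
}
const result = [...byDay.values()].map(({ _id, myCount, sum }) => ({ _id, myCount, subTimeAvg: sum / myCount }));

// 2021-11-01: myCount 2, subTimeAvg 5400000 (1.5 h); 2021-11-02: myCount 1, subTimeAvg 3600000
console.log(result);
```

Grouped in UTC instead, the first document would fall on 2021-10-31 and the other two on 2021-11-01, which is exactly the discrepancy the timezone option avoids.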
Requirement 3:
What if I only want to list the matching documents instead of computing grouped statistics?
SELECT open_id, subject, app_id, subscribe_time, custom_time FROM t_wechat_user
WHERE
custom_time >= 1635696000000 AND custom_time < 1638288000000
AND custom_time > subscribe_time
AND ( custom_time - subscribe_time ) < 604800000
The corresponding MongoDB query:
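db.t_wechat_user.find({
    // same custom_time range as the SQL above
    custom_time: {
        $gte: 1635696000000,
        $lt: 1638288000000
    },
    // field-to-field comparison inside find() also needs $expr
    $expr: {
        $and: [
            { "$gt": ["$custom_time", "$subscribe_time"] },
            { "$lt": [{ "$subtract": ["$custom_time", 604800000] }, "$subscribe_time"] }
        ]
    }
}, {
    // projection: fields to return (_id is included unless set to 0)
    open_id: 1,
    subject: 1,
    app_id: 1,
    subscribe_time: 1,
    custom_time: 1
});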
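For comparison with the SQL, here is a plain-JavaScript mirror of what find(filter, projection) returns: documents are filtered, then trimmed to the listed fields, with _id kept because MongoDB includes it unless the projection sets `_id: 0`. The helper `findWithProjection` and the sample documents are hypothetical.

```javascript
const START = 1635696000000;
const END = 1638288000000;
const SEVEN_DAYS_MS = 604800000;

// The filter: a plain range condition plus the $expr field-to-field part.
const queryFilter = (d) =>
  d.custom_time >= START && d.custom_time < END &&
  d.custom_time > d.subscribe_time && d.custom_time - SEVEN_DAYS_MS < d.subscribe_time;

// Filter, then keep only the projected fields; _id is always included here,
// matching MongoDB's default when _id is not excluded.
function findWithProjection(docs, filter, fields) {
  return docs.filter(filter).map((doc) => {
    const out = { _id: doc._id };
    for (const f of fields) {
      if (f in doc) out[f] = doc[f];
    }
    return out;
  });
}

// Hypothetical documents.
const docs = [
  { _id: 1, open_id: "o1", app_id: "wx1", subject: "s1", subscribe_time: 1636000000000, custom_time: 1636003600000, create_time: "2020-09-14" },
  { _id: 2, open_id: "o2", app_id: "wx1", subject: "s2", subscribe_time: 1636003600000, custom_time: 1636000000000 },
];

const result = findWithProjection(docs, queryFilter, ["open_id", "subject", "app_id", "subscribe_time", "custom_time"]);
console.log(result); // only _id 1, without create_time
```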
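By now you may have noticed that the first two requirements can also be written another way: the $redact stage can be folded into the $match stage by using $expr, which removes a separate stage and keeps all filtering in one place. Shown here for requirement 1 (requirement 2 differs only in the $group _id):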
db.t_wechat_user.aggregate([{
    $match: {
        // ordinary range condition on custom_time
        custom_time: {
            "$gte": 1635696000000,
            "$lt": 1638288000000
        },
        // field-to-field conditions via $expr (replaces the $redact stage)
        $expr: {
            $and: [
                { "$gt": ["$custom_time", "$subscribe_time"] },
                { "$lt": [{ "$subtract": ["$custom_time", "$subscribe_time"] }, 604800000] }
            ]
        }
    }
}, {
    // compute the gap in milliseconds
    $project: {
        custom_time: 1,
        subject: 1,
        "subTime": {
            "$subtract": ["$custom_time", "$subscribe_time"]
        }
    }
}, {
    // group by subject: document count and average gap
    $group: {
        _id: "$subject",
        myCount: {
            $sum: 1
        },
        subTimeAvg: {
            $avg: "$subTime"
        }
    }
}]);
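The merged version also rewrites the 7-day condition: the $redact form tests `custom_time - 604800000 < subscribe_time`, while the $expr form tests `(custom_time - subscribe_time) < 604800000`. These are the same inequality rearranged. This plain-JavaScript sketch checks both forms against each other at gaps around the boundaries; the gap values are chosen for illustration, and millisecond timestamps stay well within the range where JavaScript integer arithmetic is exact.

```javascript
const SEVEN_DAYS_MS = 604800000;

// Condition as written in the $redact version.
const redactForm = (c, s) => c > s && c - SEVEN_DAYS_MS < s;
// Condition as written in the $match + $expr version.
const exprForm = (c, s) => c > s && c - s < SEVEN_DAYS_MS;

// Compare both forms at gaps around 0 and around 7 days.
const base = 1636000000000;
const gaps = [-SEVEN_DAYS_MS, -1, 0, 1, 3600000, SEVEN_DAYS_MS - 1, SEVEN_DAYS_MS, SEVEN_DAYS_MS + 1];
const mismatches = gaps.filter((g) => redactForm(base + g, base) !== exprForm(base + g, base));

console.log(mismatches.length); // 0
```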