最近在做mongodb数据统计查询,需求:统计一段时间内每天的分享次数和分享人数? 查了众多资料,居然未找到好的方案,最终还是自己写函数解决了,现分享出来(若有好的方法请指出):
字段名 | 类型 | 说明 |
---|---|---|
id | string | 记录id |
gmtCreate | date | 创建时间 |
userId | int | 用户id |
url | string | 分享地址 |
2. 那mongodb中如何写呢?我们先看mysql写法:
select
date_format(gmtCreate,'%Y%m%d') day,
count(1) shareCount,
count(distinct userId) shareUserCount
from t_share_log
where gmtCreate between str_to_date(20161001, '%Y%m%d') and str_to_date(20161231, '%Y%m%d');
那么在mongodb中如何表达呢? 几经调试,先通过定义keyf的function格式化group by的日期值,再通过定义reduce:function函数借助userIdMap去重userId(相当于dintinct userId),最后完整的命令如下:
mongodb命令行:
db.runCommand({group:
{
ns:"t_share_log",
cond : { "$and":[{"gmtCreate":{"$gt":new ISODate("2016-10-01T00:00:00.000Z")}}, {"gmtCreate":{"$lt":new ISODate("2016-12-31T23:59:59.999Z")}}]},
$keyf:function(doc){
var myDate = new Date(doc.gmtCreate);
var mm = '0'+(myDate.getMonth()+1);
var dd = '0'+myDate.getDate();
return {day:myDate.getFullYear()+''+mm.substring(mm.length-2)+''+dd.substring(dd.length-2)};
},
initial:{"shareCount" : 0 , "shareUserCount" : 0 , "userIdMap" : {}},
$reduce:function(doc, prev){
if(doc.userId != null){
prev.shareCount ++;
if(prev.userIdMap[doc.userId] == null) {
prev.shareUserCount ++;
prev.userIdMap[doc.userId] = 1;
}
}
},
finalize: function(doc){ delete doc.userIdMap; }
}
});
运行以上语句,得到如下结果(正确无误,是不是很有意思呢):
{
"retval" : [
{
"day" : "20161129",
"shareCount" : 12,
"shareUserCount" : 4
},
{
"day" : "20161130",
"shareCount" : 21,
"shareUserCount" : 9
}
],
"count" : NumberLong(174),
"keys" : NumberLong(8),
"ok" : 1
}
3.最后我们用java来实现,代码如下:
private GroupByResults<Map> queryShareStatistics(String beginTime, String endTime)
{
Criteria criteria = null;
if(!StringUtils.isEmpty(beginTime)) {
try
{
criteria = Criteria.where("gmtCreate").gte(Utils.SDF_FULLTIME_FORMAT.parse(beginTime));
}
catch (ParseException e)
{
logger.error(e);
}
}
if(!StringUtils.isEmpty(endTime)) {
try{
if(criteria == null) {
criteria = Criteria.where("gmtCreate").lte(Utils.SDF_FULLTIME_FORMAT.parse(endTime));
}else {
criteria.lte(Utils.SDF_FULLTIME_FORMAT.parse(endTime));
}
}
catch (ParseException e)
{
logger.error(e);
}
}
if(criteria == null) {
criteria = new Criteria();
}
//按日统计分享次数与人数
GroupBy groupBy = GroupBy.keyFunction("function(doc){"
+ "var myDate = new Date(doc.gmtCreate);"
+ " var mm = '0'+(myDate.getMonth()+1); "
+ " var dd = '0'+myDate.getDate();"
+ " return {day:myDate.getFullYear()+''+mm.substring(mm.length-2)+''+dd.substring(dd.length-2)};"
+ "}");
groupBy.initialDocument("{shareCount:0, shareUserCount:0, userIdMap:{}}")
.reduceFunction("function(doc, prev){ "
+ "if(doc.userId != null){"
+ " prev.shareCount ++;"
+ " if(prev.userIdMap[doc.userId] == null) { "
+ " prev.shareUserCount ++; "
+ " prev.userIdMap[doc.userId] = 1;"
+ " }"
+ "}"
+ "}")
.finalizeFunction("function(prev){delete prev.userIdMap}");
GroupByResults<Map> r = statisticsTemplate.group(criteria, "t_share_log", groupBy, Map.class);
return r;
}
综上所述,先通过定义keyf的function格式化group by的日期值,再通过定义reduce:function函数借助userIdMap去重userId(相当于dintinct userId)。欢迎大家提出更好的办法!