MongoDB statistics: group by and count(distinct userId)

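I recently had to write a statistics query against MongoDB. The requirement: for a given period, report the number of shares and the number of distinct users who shared, per day. After going through a lot of material I could not find a good solution, so in the end I wrote the functions myself. I am sharing the result here (if you know a better way, please point it out):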

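1. Structure of the table t_share_log (I call it a table out of habit, although in MongoDB it is a collection):

Field      Type    Description
id         string  record ID
gmtCreate  date    creation time
userId     int     user ID
url        string  shared URL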

2. How do we write this in MongoDB? Let's first look at the MySQL version:

select 
      date_format(gmtCreate,'%Y%m%d') day,
      count(1) shareCount,
      count(distinct userId) shareUserCount
   from t_share_log
   where gmtCreate between str_to_date(20161001, '%Y%m%d') and str_to_date(20161231, '%Y%m%d')
   group by day;

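So how do we express this in MongoDB? After some experimenting, the approach is: first define a keyf function that formats the date into the group-by key, then define a reduce function that uses userIdMap to deduplicate userId (equivalent to distinct userId). The complete command is as follows: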

MongoDB shell command:
db.runCommand({group:
    {
        ns:"t_share_log",
        cond : { "gmtCreate": { "$gte": ISODate("2016-10-01T00:00:00.000Z"), "$lte": ISODate("2016-12-31T23:59:59.999Z") } },
        $keyf:function(doc){
          var myDate = new Date(doc.gmtCreate);
          var mm = '0'+(myDate.getMonth()+1); 
          var dd = '0'+myDate.getDate();
          return {day:myDate.getFullYear()+''+mm.substring(mm.length-2)+''+dd.substring(dd.length-2)};
        },
        initial:{"shareCount" : 0 , "shareUserCount" : 0 , "userIdMap" : {}},
        $reduce:function(doc, prev){ 
          if(doc.userId != null){  
            prev.shareCount ++;  
            if(prev.userIdMap[doc.userId] == null) {
              prev.shareUserCount ++;
              prev.userIdMap[doc.userId] = 1;
            }
          }
        },
        finalize: function(doc){ delete doc.userIdMap; }
    }
});
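
To make the role of each function clearer, here is a minimal sketch that replays the same keyf / reduce / finalize steps on an in-memory array in plain JavaScript, without a MongoDB server. The helper name simulateGroup and the sample documents are hypothetical and made up for illustration only; the real group command performs these steps on the server and may differ in details.

```javascript
// Hypothetical helper: mimics how the group command applies keyf, reduce and finalize.
function simulateGroup(docs, keyf, initial, reduce, finalize) {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyf(doc);
    const id = JSON.stringify(key);
    if (!groups.has(id)) {
      // Each group starts from its own deep copy of the initial document.
      groups.set(id, Object.assign({}, key, JSON.parse(JSON.stringify(initial))));
    }
    reduce(doc, groups.get(id));
  }
  const result = Array.from(groups.values());
  result.forEach(finalize);
  return result;
}

// Same logic as $keyf above: format the date as yyyyMMdd in the local time zone.
function dayKey(doc) {
  const d = new Date(doc.gmtCreate);
  const mm = '0' + (d.getMonth() + 1);
  const dd = '0' + d.getDate();
  return { day: d.getFullYear() + '' + mm.substring(mm.length - 2) + '' + dd.substring(dd.length - 2) };
}

// Same logic as $reduce above: count every share, count each userId only once per day.
function reduceShare(doc, prev) {
  if (doc.userId != null) {
    prev.shareCount++;
    if (prev.userIdMap[doc.userId] == null) {
      prev.shareUserCount++;
      prev.userIdMap[doc.userId] = 1;
    }
  }
}

// Same logic as finalize above: drop the helper map from the output.
function dropUserIdMap(doc) {
  delete doc.userIdMap;
}

// Made-up sample data (month index 10 is November), built in local time.
const sampleDocs = [
  { gmtCreate: new Date(2016, 10, 29, 9, 0), userId: 1 },
  { gmtCreate: new Date(2016, 10, 29, 10, 0), userId: 1 },
  { gmtCreate: new Date(2016, 10, 29, 11, 0), userId: 2 },
  { gmtCreate: new Date(2016, 10, 30, 8, 0), userId: 1 },
];

const stats = simulateGroup(
  sampleDocs,
  dayKey,
  { shareCount: 0, shareUserCount: 0, userIdMap: {} },
  reduceShare,
  dropUserIdMap
);
console.log(stats);
```

With this sample data, user 1 shares twice on 29 November but is counted only once in shareUserCount for that day, which is exactly the effect of userIdMap.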

Running the command above returns the following result (the numbers are correct; rather neat, isn't it?):

{
    "retval" : [
        {
            "day" : "20161129",
            "shareCount" : 12,
            "shareUserCount" : 4
        },
        {
            "day" : "20161130",
            "shareCount" : 21,
            "shareUserCount" : 9
        }
    ],
    "count" : NumberLong(174),
    "keys" : NumberLong(8),
    "ok" : 1
}

3. Finally, the same query implemented in Java (using Spring Data MongoDB's GroupBy):

private GroupByResults<Map> queryShareStatistics(String beginTime, String endTime)
    {
        Criteria criteria = null;

        if(!StringUtils.isEmpty(beginTime)) {
            try
            {
                criteria = Criteria.where("gmtCreate").gte(Utils.SDF_FULLTIME_FORMAT.parse(beginTime));
            }
            catch (ParseException e)
            {
                logger.error(e);
            }
        }
        if(!StringUtils.isEmpty(endTime)) {
            try{
                if(criteria == null) {
                     criteria = Criteria.where("gmtCreate").lte(Utils.SDF_FULLTIME_FORMAT.parse(endTime));
                }else {
                    criteria.lte(Utils.SDF_FULLTIME_FORMAT.parse(endTime));
                }
            }
            catch (ParseException e)
            {
                logger.error(e);
            }
        }

        if(criteria == null) {
            criteria = new Criteria();
        }

        // Count shares and distinct sharing users per day
        GroupBy groupBy = GroupBy.keyFunction("function(doc){"

                                + "var myDate = new Date(doc.gmtCreate);"
                                + "          var mm = '0'+(myDate.getMonth()+1); "
                                + "          var dd = '0'+myDate.getDate();"
                                + "          return {day:myDate.getFullYear()+''+mm.substring(mm.length-2)+''+dd.substring(dd.length-2)};"
                                + "}");

        groupBy.initialDocument("{shareCount:0, shareUserCount:0, userIdMap:{}}") 
                        .reduceFunction("function(doc, prev){ "
                                                            + "if(doc.userId != null){"
                                                            + "  prev.shareCount ++;"
                                                            + "  if(prev.userIdMap[doc.userId] == null) { "
                                                            + "      prev.shareUserCount ++; "
                                                            + "      prev.userIdMap[doc.userId] = 1;"
                                                            + "  }"
                                                            + "}"
                                                        + "}")
                       .finalizeFunction("function(prev){delete prev.userIdMap}");

        GroupByResults<Map> r = statisticsTemplate.group(criteria, "t_share_log", groupBy, Map.class);

        return r;
    }

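To sum up: first define a keyf function that formats the date into the group-by key, then define a reduce function that uses userIdMap to deduplicate userId (equivalent to distinct userId). Better approaches are welcome!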
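
One possible alternative, offered as a sketch rather than a tested replacement: the group command was deprecated in MongoDB 3.4 and removed in 4.2, and the aggregation pipeline can express the same statistic with $dateToString, $addToSet and $size. The code below only builds such a pipeline as a plain JavaScript array; the function name buildSharePipeline and the field name userIds are hypothetical, and the pipeline has not been run against the data from this post. Note that $dateToString formats dates in UTC unless a timezone is given, whereas the keyf function uses the server's local time zone, so the day boundaries may differ.

```javascript
// Hypothetical helper: builds an aggregation pipeline for the per-day share statistics.
function buildSharePipeline(begin, end) {
  return [
    // Date-range filter, inclusive at both ends like the Java version;
    // documents without a userId are skipped, as in the reduce function.
    { $match: { gmtCreate: { $gte: begin, $lte: end }, userId: { $ne: null } } },
    // Group by the formatted day, count shares and collect the distinct userIds.
    { $group: {
        _id: { $dateToString: { format: '%Y%m%d', date: '$gmtCreate' } },
        shareCount: { $sum: 1 },
        userIds: { $addToSet: '$userId' }
    } },
    // Replace the userId set with its size, i.e. count(distinct userId).
    { $project: { _id: 0, day: '$_id', shareCount: 1, shareUserCount: { $size: '$userIds' } } },
    { $sort: { day: 1 } }
  ];
}

const pipeline = buildSharePipeline(
  new Date('2016-10-01T00:00:00.000Z'),
  new Date('2016-12-31T23:59:59.999Z')
);
console.log(JSON.stringify(pipeline, null, 2));
```

In the mongo shell the array would be passed to db.t_share_log.aggregate(pipeline). Collecting userIds with $addToSet keeps every distinct user of a day in memory for that group, much like userIdMap does, so the same caution applies for days with very many users.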
