MySQL: Grouping by Time Unit

Note: tested on MySQL 8.0.

Test data setup:

drop table if exists trx_log;

create table trx_log(trx_id int,trx_date timestamp,trx_cnt int);

insert into trx_log values (1,'2020-10-28 19:03:07',44);
insert into trx_log values (2,'2020-10-28 19:03:08',18);
insert into trx_log values (3,'2020-10-28 19:03:09',23);
insert into trx_log values (4,'2020-10-28 19:03:10',29);
insert into trx_log values (5,'2020-10-28 19:03:11',27);
insert into trx_log values (6,'2020-10-28 19:03:12',45);
insert into trx_log values (7,'2020-10-28 19:03:13',45);
insert into trx_log values (8,'2020-10-28 19:03:14',32);
insert into trx_log values (9,'2020-10-28 19:03:15',41);
insert into trx_log values (10,'2020-10-28 19:03:16',15);
insert into trx_log values (11,'2020-10-28 19:03:17',24);
insert into trx_log values (12,'2020-10-28 19:03:18',47);
insert into trx_log values (13,'2020-10-28 19:03:19',37);
insert into trx_log values (14,'2020-10-28 19:03:20',48);
insert into trx_log values (15,'2020-10-28 19:03:21',46);
insert into trx_log values (16,'2020-10-28 19:03:22',44);
insert into trx_log values (17,'2020-10-28 19:03:23',36);
insert into trx_log values (18,'2020-10-28 19:03:24',41);
insert into trx_log values (19,'2020-10-28 19:03:25',33);
insert into trx_log values (20,'2020-10-28 19:03:26',19);


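I. Requirement

Compute the sum of a column over fixed time intervals.

For example, given a transaction log, find the total number of transactions in each 5-second window.
The full contents of table trx_log are shown below: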

mysql> select trx_id,
    ->        trx_date,
    ->        trx_cnt
    ->   from trx_log;
+--------+---------------------+---------+
| trx_id | trx_date            | trx_cnt |
+--------+---------------------+---------+
|      1 | 2020-10-28 19:03:07 |      44 |
|      2 | 2020-10-28 19:03:08 |      18 |
|      3 | 2020-10-28 19:03:09 |      23 |
|      4 | 2020-10-28 19:03:10 |      29 |
|      5 | 2020-10-28 19:03:11 |      27 |
|      6 | 2020-10-28 19:03:12 |      45 |
|      7 | 2020-10-28 19:03:13 |      45 |
|      8 | 2020-10-28 19:03:14 |      32 |
|      9 | 2020-10-28 19:03:15 |      41 |
|     10 | 2020-10-28 19:03:16 |      15 |
|     11 | 2020-10-28 19:03:17 |      24 |
|     12 | 2020-10-28 19:03:18 |      47 |
|     13 | 2020-10-28 19:03:19 |      37 |
|     14 | 2020-10-28 19:03:20 |      48 |
|     15 | 2020-10-28 19:03:21 |      46 |
|     16 | 2020-10-28 19:03:22 |      44 |
|     17 | 2020-10-28 19:03:23 |      36 |
|     18 | 2020-10-28 19:03:24 |      41 |
|     19 | 2020-10-28 19:03:25 |      33 |
|     20 | 2020-10-28 19:03:26 |      19 |
+--------+---------------------+---------+

The desired result set is:
+------+---------------------+---------------------+-------+
| grp  | trx_start           | trx_end             | total |
+------+---------------------+---------------------+-------+
|   62 | 2020-10-28 19:03:07 | 2020-10-28 19:03:11 |   141 |
|   63 | 2020-10-28 19:03:12 | 2020-10-28 19:03:16 |   178 |
|   64 | 2020-10-28 19:03:17 | 2020-10-28 19:03:21 |   202 |
|   65 | 2020-10-28 19:03:22 | 2020-10-28 19:03:26 |   173 |
+------+---------------------+---------------------+-------+

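II. Solution

Group the rows into buckets of 5 rows each.
There are many ways to implement this kind of logical grouping; this section uses the trick of dividing trx_id by 5 and rounding up. This works for the sample data because the trx_id values are consecutive and the rows are exactly one second apart, so 5 rows cover 5 seconds.

Once the "groups" exist, the aggregate functions min, max, and sum give the start time, end time, and total transaction count for each group.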

select ceil(trx_id/5.0) as grp,
       min(trx_date)    as trx_start,
       max(trx_date)    as trx_end,
       sum(trx_cnt)     as total
  from trx_log
 group by ceil(trx_id/5.0);

Test run:

mysql> select ceil(trx_id/5.0) as grp,
    ->        min(trx_date)    as trx_start,
    ->        max(trx_date)    as trx_end,
    ->        sum(trx_cnt)     as total
    ->   from trx_log
    ->  group by ceil(trx_id/5.0);
+------+---------------------+---------------------+-------+
| grp  | trx_start           | trx_end             | total |
+------+---------------------+---------------------+-------+
|    1 | 2020-10-28 19:03:07 | 2020-10-28 19:03:11 |   141 |
|    2 | 2020-10-28 19:03:12 | 2020-10-28 19:03:16 |   178 |
|    3 | 2020-10-28 19:03:17 | 2020-10-28 19:03:21 |   202 |
|    4 | 2020-10-28 19:03:22 | 2020-10-28 19:03:26 |   173 |
+------+---------------------+---------------------+-------+
4 rows in set (0.00 sec)
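
To see how the expression assigns buckets, the following minimal sketch lists the intermediate quotient next to the resulting group for each row. It uses only the trx_log table defined above; the column alias ratio is a name chosen here for illustration.

```sql
-- Show the raw quotient and the bucket it rounds up to.
-- ids 1-5 give quotients 0.2 through 1.0, which all round up to 1;
-- ids 6-10 give 1.2 through 2.0, which round up to 2, and so on.
select trx_id,
       trx_id/5.0       as ratio,
       ceil(trx_id/5.0) as grp
  from trx_log
 order by trx_id;
```

Note that this buckets by row position, not by time, so it only corresponds to 5-second windows while there is exactly one row per second and no gaps in trx_id.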

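At this point you may ask: what if the id values are not so evenly distributed?
In that case, convert the time itself to a number, divide it by 5, and round up.
This example extracts the minutes and seconds as a number and then divides by 5; if the data spans a wider time range, the year, month, day, and hour can be included as well.
One second is subtracted first so that the bucket boundaries line up with the sample data: without the shift, 19:03:11 would fall into the next group instead of joining 19:03:07 through 19:03:10.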

-- Assign each row to a time bucket
SELECT trx_id,
       trx_date,
       trx_cnt,
       ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0) grp
from trx_log;

mysql> SELECT trx_id,
    ->        trx_date,
    ->        trx_cnt,
    ->        ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0) grp
    -> from trx_log;
+--------+---------------------+---------+------+
| trx_id | trx_date            | trx_cnt | grp  |
+--------+---------------------+---------+------+
|      1 | 2020-10-28 19:03:07 |      44 |   62 |
|      2 | 2020-10-28 19:03:08 |      18 |   62 |
|      3 | 2020-10-28 19:03:09 |      23 |   62 |
|      4 | 2020-10-28 19:03:10 |      29 |   62 |
|      5 | 2020-10-28 19:03:11 |      27 |   62 |
|      6 | 2020-10-28 19:03:12 |      45 |   63 |
|      7 | 2020-10-28 19:03:13 |      45 |   63 |
|      8 | 2020-10-28 19:03:14 |      32 |   63 |
|      9 | 2020-10-28 19:03:15 |      41 |   63 |
|     10 | 2020-10-28 19:03:16 |      15 |   63 |
|     11 | 2020-10-28 19:03:17 |      24 |   64 |
|     12 | 2020-10-28 19:03:18 |      47 |   64 |
|     13 | 2020-10-28 19:03:19 |      37 |   64 |
|     14 | 2020-10-28 19:03:20 |      48 |   64 |
|     15 | 2020-10-28 19:03:21 |      46 |   64 |
|     16 | 2020-10-28 19:03:22 |      44 |   65 |
|     17 | 2020-10-28 19:03:23 |      36 |   65 |
|     18 | 2020-10-28 19:03:24 |      41 |   65 |
|     19 | 2020-10-28 19:03:25 |      33 |   65 |
|     20 | 2020-10-28 19:03:26 |      19 |   65 |
+--------+---------------------+---------+------+
20 rows in set (0.00 sec)
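
The grp value can be traced step by step for a single timestamp. The sketch below breaks the expression into its parts for the first sample row, using a literal instead of the table; the aliases shifted and mmss are names chosen here for illustration.

```sql
-- Trace the bucket computation for 2020-10-28 19:03:07.
-- shifted: 2020-10-28 19:03:06   (one second subtracted)
-- mmss:    '0306'                (minutes and seconds as a string)
-- grp:     ceil(306 / 5.0) = ceil(61.2) = 62
select TIMESTAMPADD(second,-1,'2020-10-28 19:03:07') as shifted,
       DATE_FORMAT(TIMESTAMPADD(second,-1,'2020-10-28 19:03:07'),'%i%s') as mmss,
       ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,'2020-10-28 19:03:07'),'%i%s')/5.0) as grp;
```

The division relies on MySQL implicitly converting the string '0306' to the number 306, and the result matches the grp shown for trx_id 1 above.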

Once the groups are assigned, the aggregation can be applied directly:

SELECT ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0) grp,
       min(trx_date)    as trx_start,
       max(trx_date)    as trx_end,
       sum(trx_cnt)     as total
from trx_log
group by ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0)
;

Test run:

mysql> SELECT ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0) grp,
    ->        min(trx_date)    as trx_start,
    ->        max(trx_date)    as trx_end,
    ->        sum(trx_cnt)     as total
    -> from trx_log
    -> group by ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0)
    -> ;
+------+---------------------+---------------------+-------+
| grp  | trx_start           | trx_end             | total |
+------+---------------------+---------------------+-------+
|   62 | 2020-10-28 19:03:07 | 2020-10-28 19:03:11 |   141 |
|   63 | 2020-10-28 19:03:12 | 2020-10-28 19:03:16 |   178 |
|   64 | 2020-10-28 19:03:17 | 2020-10-28 19:03:21 |   202 |
|   65 | 2020-10-28 19:03:22 | 2020-10-28 19:03:26 |   173 |
+------+---------------------+---------------------+-------+
4 rows in set (0.00 sec)
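
One limitation worth noting: the '%i%s' value is not continuous across a minute boundary (it jumps from 0359 to 0400), so buckets near the top of each minute do not all span 5 seconds, and rows from different hours share the same value. A possible alternative, which is not the approach used above, is to bucket on the epoch seconds returned by UNIX_TIMESTAMP. The sketch below assumes the windows should start at seconds 2, 7, 12, ... of each minute (hence the offset of 2, picked to match the sample data) and that the session time zone offset is a whole number of minutes. The grp values it produces are large integers rather than 62 through 65.

```sql
-- Alternative sketch: bucket by epoch seconds so that every window stays
-- 5 seconds wide across minute, hour, and day boundaries.
-- The offset 2 is an assumption that aligns window starts with 19:03:07.
select floor((unix_timestamp(trx_date) - 2) / 5) as grp,
       min(trx_date) as trx_start,
       max(trx_date) as trx_end,
       sum(trx_cnt)  as total
  from trx_log
 group by floor((unix_timestamp(trx_date) - 2) / 5)
 order by trx_start;
```

On the sample data this should yield the same four windows and totals as the query above, only with different grp labels.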
