Note: the test database is MySQL 8.0.
Test data setup:
drop table if exists trx_log;
create table trx_log(trx_id int,trx_date timestamp,trx_cnt int);
insert into trx_log values (1,'2020-10-28 19:03:07',44);
insert into trx_log values (2,'2020-10-28 19:03:08',18);
insert into trx_log values (3,'2020-10-28 19:03:09',23);
insert into trx_log values (4,'2020-10-28 19:03:10',29);
insert into trx_log values (5,'2020-10-28 19:03:11',27);
insert into trx_log values (6,'2020-10-28 19:03:12',45);
insert into trx_log values (7,'2020-10-28 19:03:13',45);
insert into trx_log values (8,'2020-10-28 19:03:14',32);
insert into trx_log values (9,'2020-10-28 19:03:15',41);
insert into trx_log values (10,'2020-10-28 19:03:16',15);
insert into trx_log values (11,'2020-10-28 19:03:17',24);
insert into trx_log values (12,'2020-10-28 19:03:18',47);
insert into trx_log values (13,'2020-10-28 19:03:19',37);
insert into trx_log values (14,'2020-10-28 19:03:20',48);
insert into trx_log values (15,'2020-10-28 19:03:21',46);
insert into trx_log values (16,'2020-10-28 19:03:22',44);
insert into trx_log values (17,'2020-10-28 19:03:23',36);
insert into trx_log values (18,'2020-10-28 19:03:24',41);
insert into trx_log values (19,'2020-10-28 19:03:25',33);
insert into trx_log values (20,'2020-10-28 19:03:26',19);
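Sum data over a fixed time interval.
For example, given a transaction log, we want the total number of transactions in each 5-second window.
The full contents of the trx_log table are: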
mysql> select trx_id,
-> trx_date,
-> trx_cnt
-> from trx_log;
+--------+---------------------+---------+
| trx_id | trx_date | trx_cnt |
+--------+---------------------+---------+
| 1 | 2020-10-28 19:03:07 | 44 |
| 2 | 2020-10-28 19:03:08 | 18 |
| 3 | 2020-10-28 19:03:09 | 23 |
| 4 | 2020-10-28 19:03:10 | 29 |
| 5 | 2020-10-28 19:03:11 | 27 |
| 6 | 2020-10-28 19:03:12 | 45 |
| 7 | 2020-10-28 19:03:13 | 45 |
| 8 | 2020-10-28 19:03:14 | 32 |
| 9 | 2020-10-28 19:03:15 | 41 |
| 10 | 2020-10-28 19:03:16 | 15 |
| 11 | 2020-10-28 19:03:17 | 24 |
| 12 | 2020-10-28 19:03:18 | 47 |
| 13 | 2020-10-28 19:03:19 | 37 |
| 14 | 2020-10-28 19:03:20 | 48 |
| 15 | 2020-10-28 19:03:21 | 46 |
| 16 | 2020-10-28 19:03:22 | 44 |
| 17 | 2020-10-28 19:03:23 | 36 |
| 18 | 2020-10-28 19:03:24 | 41 |
| 19 | 2020-10-28 19:03:25 | 33 |
| 20 | 2020-10-28 19:03:26 | 19 |
+--------+---------------------+---------+
The desired result set is:
+------+---------------------+---------------------+-------+
| grp | trx_start | trx_end | total |
+------+---------------------+---------------------+-------+
| 62 | 2020-10-28 19:03:07 | 2020-10-28 19:03:11 | 141 |
| 63 | 2020-10-28 19:03:12 | 2020-10-28 19:03:16 | 178 |
| 64 | 2020-10-28 19:03:17 | 2020-10-28 19:03:21 | 202 |
| 65 | 2020-10-28 19:03:22 | 2020-10-28 19:03:26 | 173 |
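+------+---------------------+---------------------+-------+
Group the rows into buckets of five rows each.
There are many ways to build this kind of logical grouping; this recipe uses the trick of dividing trx_id by 5 and rounding up. It works here because trx_id is consecutive and there is exactly one row per second, so five ids correspond to five seconds.
Once the groups exist, the aggregate functions min, max and sum return the start time, the end time and the total transaction count of each group. With this approach the group labels come out as 1 to 4 rather than 62 to 65; only the label differs, the windows and totals are the same.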
select ceil(trx_id/5.0) as grp,
min(trx_date) as trx_start,
max(trx_date) as trx_end,
sum(trx_cnt) as total
from trx_log
group by ceil(trx_id/5.0);
Test run:
mysql> select ceil(trx_id/5.0) as grp,
-> min(trx_date) as trx_start,
-> max(trx_date) as trx_end,
-> sum(trx_cnt) as total
-> from trx_log
-> group by ceil(trx_id/5.0);
+------+---------------------+---------------------+-------+
| grp | trx_start | trx_end | total |
+------+---------------------+---------------------+-------+
| 1 | 2020-10-28 19:03:07 | 2020-10-28 19:03:11 | 141 |
| 2 | 2020-10-28 19:03:12 | 2020-10-28 19:03:16 | 178 |
| 3 | 2020-10-28 19:03:17 | 2020-10-28 19:03:21 | 202 |
| 4 | 2020-10-28 19:03:22 | 2020-10-28 19:03:26 | 173 |
+------+---------------------+---------------------+-------+
4 rows in set (0.00 sec)
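To see how the division produces the buckets, it can help to look at the group assigned to each row before aggregating. The query below is only an illustrative sketch against the same trx_log table; the ratio column is a name introduced here for illustration and is not part of the original recipe.

```sql
-- Illustration only: show which bucket each row falls into before aggregating.
select trx_id,
       trx_id/5.0       as ratio,  -- fractional position of the id (hypothetical column name)
       ceil(trx_id/5.0) as grp     -- ids 1-5 map to 1, ids 6-10 map to 2, and so on
from trx_log
order by trx_id;
```

Every id from 1 to 5 rounds up to 1, every id from 6 to 10 rounds up to 2, and so on, which is why grouping by ceil(trx_id/5.0) yields buckets of five rows.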
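You may now ask: what if the id values are not this evenly distributed?
In that case, convert the time itself to a number, divide it by 5, and round up.
This example extracts only the minutes and seconds as a number before dividing by 5; if the data spans a wider range, add the year, month, day and hour as well. The timestamp is shifted back by one second first so that the bucket boundaries line up with this data set (seconds 07 to 11, 12 to 16, and so on).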
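As a worked example of that arithmetic, the sketch below evaluates the expression for a single literal timestamp taken from the first row of the test data. The column aliases shifted, mmss and grp are chosen here for illustration only.

```sql
-- Worked example for the first row, 2020-10-28 19:03:07.
select timestampadd(second, -1, '2020-10-28 19:03:07') as shifted,  -- 19:03:06
       date_format(timestampadd(second, -1, '2020-10-28 19:03:07'), '%i%s') as mmss,  -- '0306'
       ceil(date_format(timestampadd(second, -1, '2020-10-28 19:03:07'), '%i%s') / 5.0) as grp;  -- 306 / 5 = 61.2, rounded up to 62
```

The same steps applied to 19:03:11 give 310 / 5 = 62 exactly, so seconds 07 through 11 all land in group 62.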
-- Assign each row to a time bucket
SELECT trx_id,
trx_date,
trx_cnt,
ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0) grp
from trx_log;
mysql> SELECT trx_id,
-> trx_date,
-> trx_cnt,
-> ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0) grp
-> from trx_log;
+--------+---------------------+---------+------+
| trx_id | trx_date | trx_cnt | grp |
+--------+---------------------+---------+------+
| 1 | 2020-10-28 19:03:07 | 44 | 62 |
| 2 | 2020-10-28 19:03:08 | 18 | 62 |
| 3 | 2020-10-28 19:03:09 | 23 | 62 |
| 4 | 2020-10-28 19:03:10 | 29 | 62 |
| 5 | 2020-10-28 19:03:11 | 27 | 62 |
| 6 | 2020-10-28 19:03:12 | 45 | 63 |
| 7 | 2020-10-28 19:03:13 | 45 | 63 |
| 8 | 2020-10-28 19:03:14 | 32 | 63 |
| 9 | 2020-10-28 19:03:15 | 41 | 63 |
| 10 | 2020-10-28 19:03:16 | 15 | 63 |
| 11 | 2020-10-28 19:03:17 | 24 | 64 |
| 12 | 2020-10-28 19:03:18 | 47 | 64 |
| 13 | 2020-10-28 19:03:19 | 37 | 64 |
| 14 | 2020-10-28 19:03:20 | 48 | 64 |
| 15 | 2020-10-28 19:03:21 | 46 | 64 |
| 16 | 2020-10-28 19:03:22 | 44 | 65 |
| 17 | 2020-10-28 19:03:23 | 36 | 65 |
| 18 | 2020-10-28 19:03:24 | 41 | 65 |
| 19 | 2020-10-28 19:03:25 | 33 | 65 |
| 20 | 2020-10-28 19:03:26 | 19 | 65 |
+--------+---------------------+---------+------+
20 rows in set (0.00 sec)
Once the rows are grouped, the aggregation can be applied directly:
SELECT ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0) grp,
min(trx_date) as trx_start,
max(trx_date) as trx_end,
sum(trx_cnt) as total
from trx_log
group by ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0)
;
Test run:
mysql> SELECT ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0) grp,
-> min(trx_date) as trx_start,
-> max(trx_date) as trx_end,
-> sum(trx_cnt) as total
-> from trx_log
-> group by ceil(DATE_FORMAT(TIMESTAMPADD(second,-1,trx_date),'%i%s')/5.0)
-> ;
+------+---------------------+---------------------+-------+
| grp | trx_start | trx_end | total |
+------+---------------------+---------------------+-------+
| 62 | 2020-10-28 19:03:07 | 2020-10-28 19:03:11 | 141 |
| 63 | 2020-10-28 19:03:12 | 2020-10-28 19:03:16 | 178 |
| 64 | 2020-10-28 19:03:17 | 2020-10-28 19:03:21 | 202 |
| 65 | 2020-10-28 19:03:22 | 2020-10-28 19:03:26 | 173 |
+------+---------------------+---------------------+-------+
4 rows in set (0.00 sec)
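One caveat with the minutes-and-seconds number is that it is not linear in elapsed time: 03:59 becomes 359 while 04:00 becomes 400, so a window that straddles a minute boundary can be split unevenly, and rows from different hours or days can receive the same label. A possible alternative, shown here only as a sketch, is to bucket on the number of seconds elapsed since a reference point. The version below assumes the windows should be anchored at the earliest trx_date in the table; that anchor, the derived table b and the alias base_ts are assumptions introduced here, not part of the original recipe.

```sql
-- Sketch: bucket by elapsed seconds since the earliest row.
-- Assumption: windows are anchored at min(trx_date); base_ts and b are illustrative names.
select floor(timestampdiff(second, b.base_ts, t.trx_date) / 5) as grp,
       min(t.trx_date) as trx_start,
       max(t.trx_date) as trx_end,
       sum(t.trx_cnt)  as total
from trx_log t
cross join (select min(trx_date) as base_ts from trx_log) b
group by floor(timestampdiff(second, b.base_ts, t.trx_date) / 5)
order by grp;
```

On the sample data this should label the groups 0 through 3 and produce the same four windows and totals (141, 178, 202, 173). If the windows need to line up with wall-clock boundaries instead, a fixed reference timestamp could replace the min(trx_date) subquery.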