连续聚合是按时间顺序对有序数据时行聚合的操作。它有几个变体:
累积聚合,滑动聚合,年初至今聚合
累积聚合:聚合从序列内第一个元素到当前元素的数据,如为每个员工返回每月开始到现在累积的订单数量和平均订单数。
因为“内幕”没有透露完整数据表,所以只有部分:
mysql> select * from EmpOrder;
+-------+------------+-----+
| empid | ordermonth | qty |
+-------+------------+-----+
| 1 | 1996-07 | 121 |
| 1 | 1996-08 | 121 |
| 1 | 1996-09 | 121 |
| 1 | 1996-10 | 121 |
| 1 | 1996-11 | 121 |
| 1 | 1996-12 | 121 |
| 2 | 1996-07 | 121 |
| 2 | 1996-08 | 121 |
| 2 | 1996-09 | 121 |
| 2 | 1996-10 | 121 |
| 2 | 1996-11 | 121 |
+-------+------------+-----+
11 rows in set
SQL语句可以这样写:
mysql> select a.empid,a.ordermonth,a.qty as thismonth,sum(b.qty) as total1,cast(avg(b.qty) as decimal(5,2)) as avg
from
EmpOrder a
inner join
EmpOrder b
on a.empid = b.empid and b.ordermonth <= a.ordermonth
group by a.empid,a.ordermonth,a.qty
order by a.empid,a.ordermonth;
+-------+------------+-----------+--------+--------+
| empid | ordermonth | thismonth | total1 | avg |
+-------+------------+-----------+--------+--------+
| 1 | 1996-07 | 121 | 121 | 121 |
| 1 | 1996-08 | 247 | 368 | 184 |
| 1 | 1996-09 | 255 | 623 | 207.67 |
| 1 | 1996-10 | 143 | 766 | 191.5 |
| 1 | 1996-11 | 318 | 1084 | 216.8 |
| 1 | 1996-12 | 536 | 1620 | 270 |
| 2 | 1996-07 | 50 | 50 | 50 |
| 2 | 1996-08 | 94 | 144 | 72 |
| 2 | 1996-09 | 137 | 281 | 93.67 |
| 2 | 1996-10 | 248 | 529 | 132.25 |
| 2 | 1996-11 | 237 | 766 | 153.2 |
+-------+------------+-----------+--------+--------+
11 rows in set
刚才的写法是联接查询,再用子查询:
select empid,ordermonth,qty as thismonth,
(select sum(b.qty) from EmpOrder as b WHERE a.empid = b.empid and b.ordermonth <= a.ordermonth) as total,
cast((select avg(b.qty)
from EmpOrder as b
WHERE a.empid = b.empid
and b.ordermonth <= a.ordermonth) as DECIMAL(5,2)) as avg
from EmpOrder a
GROUP BY a.empid,a.ordermonth;
结果是一样的,如果数据量大的话,联接查询的效率是要高于子查询的。因为在要求进行聚合时,子查询可能需要为每个聚合扫描一次数据,而联接方法通常只需扫描一冷色数据就可以
滑动聚合:按顺序对滑动窗口范围内的数据进行聚合的操作,它不再是计算从开始到当前位置的数据,比如要统计最近三个月中员工每月的订单情形。
它与累计聚合实现的差异在于:滑动聚合里的每二个联接表里的b.ordermonth大于前三个月的月份,并且小于当前月份,别的地方不用很大的改变。
SQL语句省着写(因为表的数据不太准确,所以滑动聚合与年初至今聚合的查询结果也有点问题,但SQL语句的写法是没问题的):
........
ON a.empid = b.empid
AND b.ordermonth > date_add(a.ordermonth,interval -3 month) (多了这句)
AND b.ordermonth <= a.ordermonth
......
年初至今滑动:它与滑动聚合唯一的区别体现在开始位置上。年初的下限为该年的第一天,而滑动的下限为前N个月的第一天。
SQL语句同样省着写(因为表的数据不太准确,所以滑动聚合与年初至今聚合的查询结果也有点问题,但SQL语句的写法是没问题的):
.......
ON a.empid = b.empid
AND b.ordermonth >= date_format(a.ordermonth,"Y%-01-01") (这句变了)
AND b.ordermonth <= a.ordermonth
......