在数据仓库中,经常会用历史数据和时间维度做数据分析。而保存历史数据最常见的方案是使用拉链表进行存储。
首先创建测试表:
create table deal_order
(
order_id varchar(20) comment '订单ID',
order_updatetime date comment '订单更新时间',
order_status_cn varchar(50) comment '订单更新状态'
);
插入初始数据:
insert into deal_order values('100001',date_format('2017-05-07 12:23:18','%Y-%m-%d %H:%i:%S'),'订单创建');
insert into deal_order values('100002',date_format('2017-05-07 16:48:05','%Y-%m-%d %H:%i:%S'),'订单创建');
insert into deal_order values('100003',date_format('2017-05-07 20:47:25','%Y-%m-%d %H:%i:%S'),'支付完成');
假设当前时间为2017年5月7日,今天的订单情况如下:
mysql> select * from deal_order;
+----------+------------------+-----------------+
| order_id | order_updatetime | order_status_cn |
+----------+------------------+-----------------+
| 100001 | 2017-05-07 | 订单创建 |
| 100002 | 2017-05-07 | 订单创建 |
| 100003 | 2017-05-07 | 支付完成 |
+----------+------------------+-----------------+
到了2017年5月8日,部分订单的订单状态发生了改变,并且有了新的订单:
update deal_order
set order_updatetime = date_format('2017-05-08 09:12:46', '%Y-%m-%d %H:%i:%S'),order_status_cn = '等待支付'
where order_id = '100001';
update deal_order
set order_updatetime = date_format('2017-05-08 13:25:15', '%Y-%m-%d %H:%i:%S'),order_status_cn = '已发货'
where order_id = '100003';
insert into deal_order values('100004',date_format('2017-05-08 16:11:48','%Y-%m-%d %H:%i:%S'),'等待发货');
insert into deal_order values('100005',date_format('2017-05-08 18:23:31','%Y-%m-%d %H:%i:%S'),'等待发货');
当前状态如下:
mysql> select * from deal_order;
+----------+------------------+-----------------+
| order_id | order_updatetime | order_status_cn |
+----------+------------------+-----------------+
| 100001 | 2017-05-08 | 等待支付 |
| 100002 | 2017-05-07 | 订单创建 |
| 100003 | 2017-05-08 | 已发货 |
| 100004 | 2017-05-08 | 等待发货 |
| 100005 | 2017-05-08 | 等待发货 |
+----------+------------------+-----------------+
5 rows in set
到了2017年5月9日,部分订单再次发生了改变,并且有了新的订单:
update deal_order
set order_updatetime = date_format('2017-05-09 11:28:35', '%Y-%m-%d %H:%i:%S'),order_status_cn = '订单关闭'
where order_id = '100002';
update deal_order
set order_updatetime = date_format('2017-05-09 14:48:48', '%Y-%m-%d %H:%i:%S'),order_status_cn = '交易完成'
where order_id = '100003';
update deal_order
set order_updatetime = date_format('2017-05-09 17:09:34', '%Y-%m-%d %H:%i:%S'),order_status_cn = '支付成功'
where order_id = '100005';
insert into deal_order values('100006',date_format('2017-05-09 19:49:55','%Y-%m-%d %H:%i:%S'),'等待发货');
当前状态如下:
mysql> select * from deal_order;
+----------+------------------+-----------------+
| order_id | order_updatetime | order_status_cn |
+----------+------------------+-----------------+
| 100001 | 2017-05-08 | 等待支付 |
| 100002 | 2017-05-09 | 订单关闭 |
| 100003 | 2017-05-09 | 交易完成 |
| 100004 | 2017-05-08 | 等待发货 |
| 100005 | 2017-05-09 | 支付成功 |
| 100006 | 2017-05-09 | 等待发货 |
+----------+------------------+-----------------+
对于上述表来说,由于没有保存历史数据,因此查询“100001”订单的状态历史就查询不到。如果使用历史拉链表来存储以上信息,则存储内容如下:
create table deal_order_zipper_tb
(
order_id varchar(20) comment '订单ID',
order_updatetime date comment '订单更新时间',
order_status_cn varchar(50) comment '订单当前状态',
order_start_time date comment '该记录生命周期开始时间(包括该时间点)',
order_end_time date comment '该记录生命周期结束时间(不包括该时间点)'
);
历史拉链表中,增加了记录的生命周期开始时间和结束时间,一条记录的生命周期为 [生命周期开始时间,生命周期结束时间)。
将原来的数据导入到历史拉链表,最终结果如下:
insert into deal_order_zipper_tb values('100001',date_format('2017-05-07 12:23:18','%Y-%m-%d %H:%i:%S'),'订单创建',date_format('2017-05-07 12:23:18','%Y-%m-%d %H:%i:%S'),date_format('2017-05-08 09:12:46','%Y-%m-%d %H:%i:%S'));
insert into deal_order_zipper_tb values('100001',date_format('2017-05-07 12:23:18','%Y-%m-%d %H:%i:%S'),'等待支付',date_format('2017-05-08 09:12:46','%Y-%m-%d %H:%i:%S'),date_format('9999-09-09 23:59:59','%Y-%m-%d %H:%i:%S'));
insert into deal_order_zipper_tb values('100002',date_format('2017-05-07 16:48:05','%Y-%m-%d %H:%i:%S'),'订单创建',date_format('2017-05-07 16:48:05','%Y-%m-%d %H:%i:%S'),date_format('2017-05-09 11:28:35','%Y-%m-%d %H:%i:%S'));
insert into deal_order_zipper_tb values('100002',date_format('2017-05-07 16:48:05','%Y-%m-%d %H:%i:%S'),'订单关闭',date_format('2017-05-09 11:28:35','%Y-%m-%d %H:%i:%S'),date_format('9999-09-09 23:59:59','%Y-%m-%d %H:%i:%S'));
insert into deal_order_zipper_tb values('100003',date_format('2017-05-07 20:47:25','%Y-%m-%d %H:%i:%S'),'支付完成',date_format('2017-05-07 20:47:25','%Y-%m-%d %H:%i:%S'),date_format('2017-05-08 13:25:15', '%Y-%m-%d %H:%i:%S'));
insert into deal_order_zipper_tb values('100003',date_format('2017-05-07 20:47:25','%Y-%m-%d %H:%i:%S'),'已发货',date_format('2017-05-08 13:25:15', '%Y-%m-%d %H:%i:%S'),date_format('2017-05-09 14:48:48', '%Y-%m-%d %H:%i:%S'));
insert into deal_order_zipper_tb values('100003',date_format('2017-05-07 20:47:25','%Y-%m-%d %H:%i:%S'),'交易完成',date_format('2017-05-09 14:48:48', '%Y-%m-%d %H:%i:%S'),date_format('9999-09-09 23:59:59','%Y-%m-%d %H:%i:%S'));
insert into deal_order_zipper_tb values('100004',date_format('2017-05-08 16:11:48','%Y-%m-%d %H:%i:%S'),'等待发货',date_format('2017-05-08 16:11:48','%Y-%m-%d %H:%i:%S'),date_format('9999-09-09 23:59:59','%Y-%m-%d %H:%i:%S'));
insert into deal_order_zipper_tb values('100005',date_format('2017-05-08 18:23:31','%Y-%m-%d %H:%i:%S'),'等待发货',date_format('2017-05-08 18:23:31','%Y-%m-%d %H:%i:%S'),date_format('2017-05-09 17:09:34', '%Y-%m-%d %H:%i:%S'));
insert into deal_order_zipper_tb values('100005',date_format('2017-05-08 18:23:31','%Y-%m-%d %H:%i:%S'),'支付成功',date_format('2017-05-09 17:09:34', '%Y-%m-%d %H:%i:%S'),date_format('9999-09-09 23:59:59','%Y-%m-%d %H:%i:%S'));
insert into deal_order_zipper_tb values('100006',date_format('2017-05-09 19:49:55','%Y-%m-%d %H:%i:%S'),'等待发货',date_format('2017-05-09 19:49:55','%Y-%m-%d %H:%i:%S'),date_format('9999-09-09 23:59:59','%Y-%m-%d %H:%i:%S'));
mysql> select * from deal_order_zipper_tb;
+----------+------------------+-----------------+------------------+----------------+
| order_id | order_updatetime | order_status_cn | order_start_time | order_end_time |
+----------+------------------+-----------------+------------------+----------------+
| 100002 | 2017-05-07 | 订单创建 | 2017-05-07 | 2017-05-09 |
| 100001 | 2017-05-07 | 订单创建 | 2017-05-07 | 2017-05-08 |
| 100001 | 2017-05-07 | 等待支付 | 2017-05-08 | 9999-09-09 |
| 100002 | 2017-05-07 | 订单创建 | 2017-05-07 | 2017-05-09 |
| 100002 | 2017-05-07 | 订单关闭 | 2017-05-09 | 9999-09-09 |
| 100003 | 2017-05-07 | 支付完成 | 2017-05-07 | 2017-05-08 |
| 100003 | 2017-05-07 | 已发货 | 2017-05-08 | 2017-05-09 |
| 100003 | 2017-05-07 | 交易完成 | 2017-05-09 | 9999-09-09 |
| 100004 | 2017-05-08 | 等待发货 | 2017-05-08 | 9999-09-09 |
| 100005 | 2017-05-08 | 等待发货 | 2017-05-08 | 2017-05-09 |
| 100005 | 2017-05-08 | 支付成功 | 2017-05-09 | 9999-09-09 |
| 100006 | 2017-05-09 | 等待发货 | 2017-05-09 | 9999-09-09 |
+----------+------------------+-----------------+------------------+----------------+
12 rows in set
如果需要查询2017年5月8日的历史快照,可以直接写出SQL:
mysql> select *
from deal_order_zipper_tb d
where d.order_start_time < date_format('2017-05-09', '%Y-%m-%d')
and d.order_end_time >= date_format('2017-05-09', '%Y-%m-%d');
+----------+------------------+-----------------+------------------+----------------+
| order_id | order_updatetime | order_status_cn | order_start_time | order_end_time |
+----------+------------------+-----------------+------------------+----------------+
| 100002 | 2017-05-07 | 订单创建 | 2017-05-07 | 2017-05-09 |
| 100001 | 2017-05-07 | 等待支付 | 2017-05-08 | 9999-09-09 |
| 100002 | 2017-05-07 | 订单创建 | 2017-05-07 | 2017-05-09 |
| 100003 | 2017-05-07 | 已发货 | 2017-05-08 | 2017-05-09 |
| 100004 | 2017-05-08 | 等待发货 | 2017-05-08 | 9999-09-09 |
| 100005 | 2017-05-08 | 等待发货 | 2017-05-08 | 2017-05-09 |
+----------+------------------+-----------------+------------------+----------------+
6 rows in set
该SQL的查询结果与之前2017-05-08的数据一致:
mysql> select * from deal_order;
+----------+------------------+-----------------+
| order_id | order_updatetime | order_status_cn |
+----------+------------------+-----------------+
| 100001 | 2017-05-08 | 等待支付 |
| 100002 | 2017-05-07 | 订单创建 |
| 100003 | 2017-05-08 | 已发货 |
| 100004 | 2017-05-08 | 等待发货 |
| 100005 | 2017-05-08 | 等待发货 |
+----------+------------------+-----------------+
5 rows in set