本文总结个这段时间研究MySQL水平分区表总结,列举分区表的相关操作和通过实际数据对分区表读写的性能比较.
在网上看了很多文章,都太过于概念,注意集中在介绍分区表的优点,而不注重时间操作,跟大学教授论文似的,唯一由于的一片文章和大家分享一下吧http://fanqiang.chinaunix.net/db/mysql/2006-05-08/4135.shtml.
<!--[if !supportLists]-->1.<!--[endif]-->创建分区表:
CREATE TABLE `表名` (
`EQUIPMENTID`char(17) NOT NULL,
`ATTRIBUTEID`char(4) NOT NULL,
`VALUE`varchar(20) NOT NULL,
`COLLECTTIME`datetime NOT NULL
) ENGINE=InnoDB(适用大部分引擎,可根据需要调整) DEFAULT CHARSET=latin1
PARTITION BY RANGE(to_days(`时间字段名`))
(PARTITION pminVALUES LESS THAN (to_days('2010-01-01')),
PARTITION p201001VALUES LESS THAN (to_days('2010-02-01')) ,
PARTITION p201002VALUES LESS THAN (to_days('2010-03-01')) ,
PARTITION p201003VALUES LESS THAN (to_days('2010-04-01')) ,
PARTITION p201004VALUES LESS THAN (to_days('2010-05-01')) ,
PARTITION p201005VALUES LESS THAN (to_days('2010-06-01')) ,
PARTITION p201006VALUES LESS THAN (to_days('2010-07-01')) ,
PARTITION p201007VALUES LESS THAN (to_days('2010-08-01')) ,
PARTITION p201008VALUES LESS THAN (to_days('2010-09-01')) ,
PARTITION p201009VALUES LESS THAN (to_days('2010-10-01')) ,
PARTITION p201010VALUES LESS THAN (to_days('2010-11-01')),
PARTITION p201011VALUES LESS THAN (to_days('2010-12-01')),
PARTITION p201012VALUES LESS THAN (to_days('2011-01-01')),
PARTITION pmax VALUESLESS THAN MAXVALUE );
<!--[if !supportLists]-->2.<!--[endif]-->为现有表创建分区:
alter table 表名
PARTITION BY RANGE(to_days(`时间字段名`))
(PARTITION pminVALUES LESS THAN (to_days('2010-01-01')),
PARTITION p201001VALUES LESS THAN (to_days('2010-02-01')) ,
PARTITION p201002VALUES LESS THAN (to_days('2010-03-01')) ,
PARTITION p201003VALUES LESS THAN (to_days('2010-04-01')) ,
PARTITION p201004 VALUESLESS THAN (to_days('2010-05-01')) ,
PARTITION p201005VALUES LESS THAN (to_days('2010-06-01')) ,
PARTITION p201006VALUES LESS THAN (to_days('2010-07-01')) ,
PARTITION p201007VALUES LESS THAN (to_days('2010-08-01')) ,
PARTITION p201008VALUES LESS THAN (to_days('2010-09-01')) ,
PARTITION p201009VALUES LESS THAN (to_days('2010-10-01')) ,
PARTITION p201010VALUES LESS THAN (to_days('2010-11-01')),
PARTITION p201011VALUES LESS THAN (to_days('2010-12-01')),
PARTITION p201012VALUES LESS THAN (to_days('2011-01-01')),
PARTITION pmax VALUESLESS THAN MAXVALUE );
<!--[if !supportLists]-->3.<!--[endif]-->删除表中的指定分区(删除分区会导致分区数据丢失,建议先备份):
ALTERTABLE 表名DROP PARTITION p0;
<!--[if !supportLists]-->4.<!--[endif]-->追加表分区
需要先删除MAXVALUE分区后增加分区后再重建MAXVALUE分区,删除前需要先备份MAXVALUE分区数据.
ALTER TABLE 表名 DROPPARTITION pmax;
ALTER TABLE表名
ADD PARTITION (
PARTITION p201201VALUES LESS THAN (to_days('2012-2-1')),
PARTITION pmax VALUESLESS THAN MAXVALUE);
<!--[if !supportLists]-->5.<!--[endif]-->查看标分区信息
SELECT
partition_namepart,
partition_expressionexpr,
partition_descriptiondescr,
table_rows
FROM
INFORMATION_SCHEMA.partitions
WHERE
TABLE_SCHEMA= schema()
ANDTABLE_NAME='表名';
<!--[if !supportLists]-->6.<!--[endif]-->查看查询语句涉及分区信息
explainpartitions
select …from 表名 where …;
<!--[if !supportLists]-->1.<!--[endif]-->测试环境
CPU: Intel 奔腾双核 E5300
硬盘: 西数(320GB/7200/16M 蓝盘)
内存: 南亚易胜 DDR2 800MHz 1GB + 三星 DDR2 800MHz 1GB
操作系统:Windows XP
MySQL版本: 5.1.57(5.1+版本支持分区表)
<!--[if !supportLists]-->2.<!--[endif]-->表信息
表结构:
名 |
类型 |
长度 |
|
EQUIPMENTID |
char |
17 |
主键1 |
ATTRIBUTEID |
char |
4 |
主键2 |
VALUE |
varchar |
20 |
|
COLLECTTIME |
datetime |
|
主键3 |
总记录数:580W
分区信息(红色为主要测试区域):
part |
expr |
descr |
table_rows |
pmin |
to_days(COLLECTTIME) |
734138 |
2686 |
p201001 |
to_days(COLLECTTIME) |
734169 |
2511883 |
p201002 |
to_days(COLLECTTIME) |
734197 |
192497 |
p201003 |
to_days(COLLECTTIME) |
734228 |
811103 |
p201004 |
to_days(COLLECTTIME) |
734258 |
82894 |
p201005 |
to_days(COLLECTTIME) |
734289 |
109297 |
p201006 |
to_days(COLLECTTIME) |
734319 |
555065 |
p201007 |
to_days(COLLECTTIME) |
734350 |
742949 |
p201008 |
to_days(COLLECTTIME) |
734381 |
525900 |
p201009 |
to_days(COLLECTTIME) |
734411 |
89 |
p201010 |
to_days(COLLECTTIME) |
734442 |
71665 |
p201011 |
to_days(COLLECTTIME) |
734472 |
85964 |
p201012 |
to_days(COLLECTTIME) |
734503 |
1612 |
p201101 |
to_days(COLLECTTIME) |
734534 |
176 |
p201102 |
to_days(COLLECTTIME) |
734562 |
253 |
p201103 |
to_days(COLLECTTIME) |
734593 |
44824 |
p201104 |
to_days(COLLECTTIME) |
734623 |
62324 |
p201105 |
to_days(COLLECTTIME) |
734654 |
50658 |
p201106 |
to_days(COLLECTTIME) |
734684 |
0 |
p201107 |
to_days(COLLECTTIME) |
734715 |
0 |
p201108 |
to_days(COLLECTTIME) |
734746 |
0 |
p201109 |
to_days(COLLECTTIME) |
734776 |
0 |
p201110 |
to_days(COLLECTTIME) |
734807 |
0 |
p201111 |
to_days(COLLECTTIME) |
734837 |
0 |
p201112 |
to_days(COLLECTTIME) |
734868 |
0 |
p201201 |
to_days(COLLECTTIME) |
734899 |
0 |
p201202 |
to_days(COLLECTTIME) |
734928 |
0 |
pmax |
to_days(COLLECTTIME) |
MAXVALUE |
921 |
<!--[if !supportLists]-->3.<!--[endif]-->查询效率对比
对比表:无分区表名nopart_data,有分区表名part_data
查询条件:select count(*) from 表名 where COLLECTTIME > 起始时间 and COLLECTTIME < 终止时间
查询耗时按照3次平均值统计
统计表:
开始时间 |
结束时间 |
查询结果 |
无分区耗时 |
有分区耗时 |
涉及分区 |
全部 |
5848859 |
6.26s |
9.58s |
全部 |
|
2010-5-1 |
2010-6-1 |
109086 |
7.04s |
0.48s |
pmin,p201005 |
2010-6-1 |
2010-7-1 |
554695 |
8.34s |
0.38s |
pmin,p201006 |
2010-7-1 |
2010-8-1 |
742565 |
7.57s |
0.43s |
pmin,p201007 |
2010-5-1 |
2010-7-1 |
663781 |
7.07s |
0.51s |
pmin,p201005,p201006 |
2010-6-1 |
2010-8-1 |
1297260 |
6.84s |
1.93s |
pmin,p201006,p201007 |
2010-5-1 |
2010-8-1 |
1406346 |
6.97s |
2.30s |
pmin,p201006,p201007,p201008 |
小结:
<!--[if !supportLists]-->1)分区表查询在查询上有明显优势.但在跨区查询时会有查询时间消耗,因此需要注意分区的疏密程度.
<!--[if !supportLists]-->2)每次查询都会查询pmin(第一个分区),因此需要尽量减少这个分区的数据.
<!--[if !supportLists]-->4.<!--[endif]-->写入数据效率对比
COLLECTTIME |
无分区耗时 |
有分区耗时 |
2010-5-22 15:36 |
0.05s |
0.03s |
2010-6-22 15:36 |
0.02s |
0.05s |
2010-7-22 15:36 |
0.03s |
0.03s |
小结:
1) 分区对单条数据的插入操作无较大影响.
以上是我对MySQL的初体验总结,没啥心得体会,只有一点点成就感,希望和大家分享.
另外分区表尚存在问题:
1,是否可将分区表设置在不同硬盘,innodb可行?
2,是否可根据多条件进行水平分区,类似group by 列1,列2...
3,是否能将分区设置成不同引擎,例如当前使用中的分区为innodb,老的分区使用MyISAM.