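MySQL partitioning notes
The content comes from Alibaba Cloud's Database Kernel Monthly Report, other blogs, and my own hands-on experiments. You can read it with confidence: I have verified even the parts I did not write myself.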
1. Mock data
DELIMITER $$
CREATE PROCEDURE generate_data()
BEGIN
DECLARE i INT DEFAULT 0;
WHILE i < 200000 DO
INSERT INTO `data` (`datetime`,`value`,`channel`) VALUES (
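-- random time within one year (31536000 s = 365 days) after 2014-01-01 01:00:00
FROM_UNIXTIME(UNIX_TIMESTAMP('2014-01-01 01:00:00')+FLOOR(RAND()*31536000)),
-- random value in [0, 100] with 2 decimal places
ROUND(RAND()*100,2),
-- fixed channel
1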
);
SET i = i + 1;
END WHILE;
END$$
DELIMITER ;
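Then run CALL generate_data();
This assumes the `data` table already exists with an auto-increment `id` column plus the `datetime`, `value` and `channel` columns used above.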
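To sanity-check what the procedure's expressions produce, here is a rough Python equivalent (a sketch, not part of MySQL). It assumes naive local time with no DST shifts, whereas MySQL's UNIX_TIMESTAMP()/FROM_UNIXTIME() depend on the session time zone; `generate_rows` is a hypothetical helper name.

```python
import random
from datetime import datetime, timedelta

# Mirrors the procedure: a random second within one year after the start time,
# a value in [0, 100] rounded to 2 decimals, and channel fixed to 1.
START = datetime(2014, 1, 1, 1, 0, 0)
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31536000

def generate_rows(n, seed=None):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        offset = int(rng.random() * SECONDS_PER_YEAR)  # FLOOR(RAND()*31536000)
        rows.append((START + timedelta(seconds=offset),
                     round(rng.random() * 100, 2),     # ROUND(RAND()*100,2)
                     1))                               # channel
    return rows

rows = generate_rows(5, seed=42)
for row in rows:
    print(row)
```

Every generated datetime falls between 2014-01-01 01:00:00 and 2015-01-01 01:00:00 (exclusive).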
2. Partitioning
2.1 RANGE partitioning (each partition holds a range of values)
alter table `data`
PARTITION BY RANGE (id)
(
PARTITION p1 VALUES LESS THAN (50000)
DATA DIRECTORY = '/usr/mysql_data1'
INDEX DIRECTORY = '/usr/mysql_index1',
PARTITION p2 VALUES LESS THAN (100000)
DATA DIRECTORY = '/usr/mysql_data2'
INDEX DIRECTORY = '/usr/mysql_index2',
PARTITION p3 VALUES LESS THAN (120000)
DATA DIRECTORY = '/usr/mysql_data3'
INDEX DIRECTORY = '/usr/mysql_index3',
PARTITION p4 VALUES LESS THAN MAXVALUE
DATA DIRECTORY = '/usr/mysql_data4'
INDEX DIRECTORY = '/usr/mysql_index4'
);
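DATA DIRECTORY: where the partition's data files are stored
INDEX DIRECTORY: where the partition's index files are stored. This only applies to MyISAM; InnoDB keeps indexes in the same tablespace file as the data, so for InnoDB only DATA DIRECTORY matters. The directories must exist and be writable by the MySQL server process.
Also note that every unique key on a partitioned table, including the primary key, must contain all columns used in the partitioning expression; partitioning `data` by id works as long as id is (presumably) its primary key.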
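The routing rule behind VALUES LESS THAN can be modelled in a few lines of Python (a sketch, not MySQL's code): a row goes to the first partition whose bound is strictly greater than its value, and MAXVALUE catches the rest. The distribution at the end assumes the 200,000 rows from section 1 received auto-increment ids 1..200000 with no gaps.

```python
from bisect import bisect_right
from collections import Counter

# Exclusive upper bounds of p1..p3 from the ALTER TABLE above; p4 is MAXVALUE.
BOUNDS = [50000, 100000, 120000]
NAMES = ["p1", "p2", "p3", "p4"]

def range_partition(value):
    # LESS THAN is exclusive, so a value equal to a bound belongs to the
    # next partition; bisect_right returns exactly that index.
    return NAMES[bisect_right(BOUNDS, value)]

dist = Counter(range_partition(i) for i in range(1, 200001))
print(dict(dist))  # {'p1': 49999, 'p2': 50000, 'p3': 20000, 'p4': 80001}
```

The last partition is by far the biggest, which is worth keeping in mind when choosing bounds for a table that keeps growing.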
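2.2 LIST partitioning
# Partition the table directly in CREATE TABLE
create table t_list(
a int(11),
b int(11)
)
partition by list (b) (
partition p0 values in (1,3,5,7,9),
partition p1 values in (2,4,6,8,0)
);
A row whose b is not in any list (for example 10) is rejected with a "Table has no partition for value" error.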
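LIST routing is a plain membership lookup. The Python sketch below models it for t_list (an illustration only; `list_partition` is a hypothetical name) and raises an error for a value that no partition lists, which is what MySQL does on insert.

```python
# Value lists of t_list's partitions, as defined above.
LISTS = {"p0": {1, 3, 5, 7, 9}, "p1": {2, 4, 6, 8, 0}}

def list_partition(b):
    for name, values in LISTS.items():
        if b in values:
            return name
    # MySQL refuses to store such a row anywhere.
    raise ValueError(f"no partition for value {b}")

print(list_partition(3))  # p0
print(list_partition(0))  # p1
```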
2.3 HASH partitioning
CREATE TABLE my_member (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
created DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY HASH(id)
PARTITIONS 4;
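- The PARTITIONS clause (PARTITIONS 4 above) is optional; if it is omitted, the table gets a single partition.
- Writing PARTITIONS without a number is not allowed.
- As with RANGE and LIST partitioning, expr in PARTITION BY HASH (expr) must return an integer.
- Under the hood, HASH partitioning uses MOD: the target partition number is MOD(expr, number of partitions). For example, in my_member above (4 partitions, named p0 to p3), the row with id = 5 is stored in partition MOD(5, 4) = 1, i.e. p1.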
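The MOD rule from the list above, as a small Python sketch (assuming a non-negative integer expr such as an id). The second part counts how many of the ids 0..999 would change partition if my_member went from 4 to 5 HASH partitions, which shows why resizing a plain HASH table moves most of the rows.

```python
def hash_partition(expr_value, num_partitions):
    # Partition number = MOD(expr, number of partitions).
    return expr_value % num_partitions

# my_member has PARTITIONS 4, so its partitions are p0..p3.
for id_ in (1, 4, 5, 10):
    print(id_, "-> p%d" % hash_partition(id_, 4))

# Ids whose partition number differs between 4 and 5 partitions.
moved = sum(1 for i in range(1000) if hash_partition(i, 4) != hash_partition(i, 5))
print(moved)  # 800
```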
2.4 LINEAR HASH partitioning
CREATE TABLE my_members (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY LINEAR HASH( id )
PARTITIONS 4;
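LINEAR HASH is a special kind of HASH partitioning. Where HASH uses MOD, LINEAR HASH uses a linear powers-of-two algorithm: it takes the smallest power of two V that is at least the number of partitions, computes expr & (V - 1), and while the result is not an existing partition number, halves V and masks again.
Note: the advantage is that adding, dropping, merging and splitting partitions is much faster on very large tables (e.g. at the TB scale); the drawback is that rows are more likely to be spread unevenly across partitions than with plain HASH.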
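A Python transcription of the powers-of-two algorithm that the MySQL manual describes for LINEAR HASH (a sketch for illustration; `linear_hash_partition` is a hypothetical name). With a partition count that is not a power of two, such as 6, it also shows the uneven spread mentioned in the note.

```python
from collections import Counter

def linear_hash_partition(value, num_partitions):
    # V = smallest power of two >= number of partitions.
    v = 1
    while v < num_partitions:
        v *= 2
    n = value & (v - 1)           # keep the low bits of the value
    while n >= num_partitions:    # no such partition: halve V and mask again
        v //= 2
        n &= v - 1
    return n

print(linear_hash_partition(2003, 6))  # 3
print(linear_hash_partition(1998, 6))  # 2

# With 6 partitions, partitions 2 and 3 receive twice as many ids as the others.
dist = Counter(linear_hash_partition(i, 6) for i in range(800))
print(sorted(dist.items()))  # [(0, 100), (1, 100), (2, 200), (3, 200), (4, 100), (5, 100)]
```

With 4 partitions, as in my_members, V is exactly 4, so the result equals MOD(id, 4) for non-negative ids; the two schemes only diverge when the partition count is not a power of two.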
2.5 KEY partitioning
CREATE TABLE k1 (
id INT NOT NULL PRIMARY KEY,
name VARCHAR(20)
)
PARTITION BY KEY()
PARTITIONS 2;
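KEY partitioning is very similar to HASH partitioning. The differences:
- KEY accepts a list of one or more columns, whereas HASH takes a single integer expression.
- If the table has a primary key or a unique key, the column list may be left empty and defaults to the primary key (or, if there is none, the unique key); otherwise the columns must be listed explicitly.
- KEY partitions on columns only, not on expressions built from columns.
- The algorithms differ: PARTITION BY HASH (expr) applies MOD to the value returned by expr, while PARTITION BY KEY (column_list) hashes the columns with a function supplied by the MySQL server (MD5() for NDB Cluster, an internal hashing function for other engines such as InnoDB), so the columns need not be integers.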
2.6 HASH subpartitions inside RANGE/LIST partitions
CREATE TABLE users (
uid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(30) NOT NULL DEFAULT '',
email VARCHAR(30) NOT NULL DEFAULT ''
)
PARTITION BY RANGE (uid) SUBPARTITION BY HASH (uid % 4) SUBPARTITIONS 2(
PARTITION p0 VALUES LESS THAN (3000000),
PARTITION p1 VALUES LESS THAN (6000000)
);
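Copied from elsewhere, but verified. LIST partitions can be subpartitioned the same way. Here each of the 2 RANGE partitions is split into 2 HASH subpartitions, giving 4 physical subpartitions in total.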
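Routing in the users table happens in two steps: RANGE on uid picks the partition, then MOD(uid % 4, 2) picks the subpartition. The Python sketch below models this (illustration only; `route` is a hypothetical name). Because there is no MAXVALUE partition, a uid of 6000000 or more has nowhere to go.

```python
from bisect import bisect_right

BOUNDS = [3000000, 6000000]  # exclusive upper bounds of p0 and p1
NAMES = ["p0", "p1"]
SUBPARTITIONS = 2

def route(uid):
    idx = bisect_right(BOUNDS, uid)
    if idx == len(BOUNDS):
        raise ValueError(f"no partition for value {uid}")
    # Subpartition number = MOD(expr, SUBPARTITIONS) with expr = uid % 4.
    return NAMES[idx], (uid % 4) % SUBPARTITIONS

print(route(10))       # ('p0', 0)
print(route(3000001))  # ('p1', 1)
```

Note that (uid % 4) % 2 is the same as uid % 2, so the "% 4" in HASH (uid % 4) adds nothing with 2 subpartitions.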
3. Partitioning on other column types
3.1 Partitioning on a TIMESTAMP column
CREATE TABLE my_range_timestamp (
id INT,
hiredate TIMESTAMP
)
PARTITION BY RANGE ( UNIX_TIMESTAMP(hiredate) ) (
PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-02 00:00:00') ),
PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-03 00:00:00') ),
PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-04 00:00:00') ),
PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-05 00:00:00') ),
PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-06 00:00:00') ),
PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-07 00:00:00') ),
PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-08 00:00:00') ),
PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-09 00:00:00') ),
PARTITION p9 VALUES LESS THAN ( UNIX_TIMESTAMP('2017-12-10 00:00:00') ),
PARTITION p10 VALUES LESS THAN (UNIX_TIMESTAMP('2017-12-11 00:00:00') )
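);
For a TIMESTAMP column, UNIX_TIMESTAMP() is the only function MySQL accepts in a RANGE partitioning expression. There is also no MAXVALUE partition here, so inserting a row with hiredate on or after 2017-12-11 00:00:00 fails until a new partition is added.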
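A Python sketch of how a hiredate value maps to p1..p10 in my_range_timestamp. It assumes the MySQL session time zone is UTC (UNIX_TIMESTAMP() actually uses the session time zone), and `ts_partition` is a hypothetical name; None stands for "no partition, insert rejected".

```python
from bisect import bisect_right
from datetime import datetime, timezone

def unix_ts(s):
    # Assumption: session time zone is UTC.
    dt = datetime.strptime(s, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

# Exclusive upper bounds of p1..p10: midnight of 2017-12-02 .. 2017-12-11.
BOUNDS = [unix_ts(f"2017-12-{d:02d} 00:00:00") for d in range(2, 12)]

def ts_partition(s):
    idx = bisect_right(BOUNDS, unix_ts(s))
    if idx == len(BOUNDS):
        return None  # beyond p10 and no MAXVALUE partition
    return f"p{idx + 1}"

print(ts_partition("2017-12-01 23:59:59"))  # p1
print(ts_partition("2017-12-02 00:00:00"))  # p2
print(ts_partition("2017-12-11 00:00:00"))  # None
```

p1 also receives every timestamp before 2017-12-02, however old, because it has no lower bound.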
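4. Dropping a partition
alter table data drop partition p1;
DROP PARTITION removes the partition together with all rows stored in it. It only works for RANGE and LIST partitioning; HASH and KEY partitions are reduced with COALESCE PARTITION instead, e.g. alter table my_member coalesce partition 2; (4 partitions become 2).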
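Assuming the dropped partition is p1 (< 50000) of the `data` table from 2.1, the rows it held are gone, and new rows with ids in that old range are now routed to the next partition, p2. A Python sketch of the remaining RANGE layout (illustration only, not MySQL code):

```python
from bisect import bisect_right

# data's RANGE layout after dropping p1: p2, p3, then p4 as MAXVALUE.
BOUNDS = [100000, 120000]
NAMES = ["p2", "p3", "p4"]

def range_partition(value):
    return NAMES[bisect_right(BOUNDS, value)]

print(range_partition(10))      # p2: ids that used to go to p1 now land in p2
print(range_partition(100000))  # p3
```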
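5. Reorganizing (merging) partitions
alter table data reorganize partition p2, p3 into (partition p2 values less than (120000));
This merges the original p2 (< 100000) and p3 (< 120000) into a new p2. The partitions being reorganized must be adjacent, and for RANGE partitioning the new partitions must cover exactly the same range as the old ones, so the merged partition's upper bound has to be 120000, the bound of the last partition being merged.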
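The range rule for REORGANIZE PARTITION can be written as a small checker. This is a hypothetical Python model of the constraint, not MySQL's implementation: `lower` is the upper bound of the partition just before the reorganized ones (None if they start at the beginning), and a MAXVALUE bound could be modelled with math.inf. The examples use the `data` table from 2.1, where p1 ends at 50000.

```python
def check_range_reorganize(lower, old_bounds, new_bounds):
    # New partitions must start above the partition that precedes them.
    if lower is not None and new_bounds[0] <= lower:
        return False
    # Bounds must be strictly increasing (no overlapping ranges).
    if any(a >= b for a, b in zip(new_bounds, new_bounds[1:])):
        return False
    # They must end exactly where the reorganized partitions ended.
    return new_bounds[-1] == old_bounds[-1]

print(check_range_reorganize(50000, [100000, 120000], [120000]))  # True: merge p2 + p3
print(check_range_reorganize(50000, [100000, 120000], [110000]))  # False: ids 110000..119999 lost
print(check_range_reorganize(50000, [100000], [70000, 100000]))   # True: split p2 in two
```

The same statement form also splits a partition, as the last call shows: one old partition, two new ones covering the same range.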