Database Partitioning, 2022-07-16

(2022.07.16 Sat)

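Partitioning, in databases generally and in MySQL in particular, is a design technique that improves performance and manageability, simplifies maintenance, and lowers the cost of storing large volumes of data. It allows tables, indexes, and index-organised tables to be subdivided into smaller pieces, so a query only needs to touch the pieces that can satisfy its conditions, which makes it more efficient.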

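Partitioning splits the rows of a table into separate pieces that are stored independently and can be placed in different locations, for example on different disks. Logically, the partitioned table is still a single table with a single name. The rows are distributed across the file system according to a rule specified in advance, called the partitioning function. In MySQL 8.0, only the InnoDB and NDB storage engines support partitioning; other engines such as MyISAM, MERGE, CSV, and FEDERATED do not.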

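Note the difference between partitioning and splitting a table into several tables. A partitioned table is still logically one table whose data is merely stored in different places, whereas each table produced by splitting is independent. Partitioning divides a table by rows, with different rows going to different partitions; table splitting, in the sense used here, divides a table by columns.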

Partitioning can be defined when the table is created (CREATE TABLE), or applied to an existing table with ALTER TABLE.

Partitioning methods

MySQL provides six partitioning methods:

  • RANGE Partitioning
  • LIST Partitioning
  • COLUMNS Partitioning
  • HASH Partitioning
  • KEY Partitioning
  • Subpartitioning

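RANGE partitioning

Rows are assigned to partitions according to the range that a column value (or an expression on it) falls into: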

CREATE TABLE Sales (cust_id INT NOT NULL, 
name VARCHAR(40),   
store_id VARCHAR(20) NOT NULL, 
bill_no INT NOT NULL,
bill_date DATE PRIMARY KEY NOT NULL, 
amount DECIMAL(8,2) NOT NULL)   
PARTITION BY RANGE (year(bill_date)) (   
PARTITION p0 VALUES LESS THAN (2016),   
PARTITION p1 VALUES LESS THAN (2017),   
PARTITION p2 VALUES LESS THAN (2018),   
PARTITION p3 VALUES LESS THAN (2020));  
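
To make the VALUES LESS THAN rule concrete, here is a minimal Python sketch that mimics how a row of the Sales table above would be routed. It is only an illustration of the boundary logic, not MySQL's implementation; the function name range_partition is made up for this example.

```python
from bisect import bisect_right
from datetime import date

# Upper bounds copied from the Sales table. Each bound is exclusive,
# matching VALUES LESS THAN (...).
BOUNDS = [2016, 2017, 2018, 2020]
NAMES = ["p0", "p1", "p2", "p3"]


def range_partition(bill_date: date) -> str:
    """Return the partition that YEAR(bill_date) would fall into (hypothetical helper)."""
    idx = bisect_right(BOUNDS, bill_date.year)
    if idx == len(BOUNDS):
        # No MAXVALUE partition is defined, so years >= 2020 have no home;
        # MySQL rejects such an insert with an error.
        raise ValueError(f"no partition for year {bill_date.year}")
    return NAMES[idx]


print(range_partition(date(2015, 12, 31)))  # year 2015 < 2016
print(range_partition(date(2016, 1, 1)))    # 2016 is not < 2016, so the next partition
print(range_partition(date(2019, 6, 1)))    # p3 covers both 2018 and 2019
```

Because the last bound is 2020 rather than MAXVALUE, a row dated 2020 or later cannot be stored in this table until another partition is added.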

To partition a table that already exists, use ALTER TABLE:

ALTER TABLE Sales PARTITION BY RANGE (year(bill_date))
(
  PARTITION p0 VALUES LESS THAN (2016),
  PARTITION p1 VALUES LESS THAN (2017),
  PARTITION p2 VALUES LESS THAN (2018),
  PARTITION p3 VALUES LESS THAN (2020)
);

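After inserting data, the following query shows how the rows are distributed across the partitions (for InnoDB, TABLE_ROWS is an estimate):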

SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH  
FROM INFORMATION_SCHEMA.PARTITIONS  
WHERE TABLE_SCHEMA = 'myemployeedb' AND TABLE_NAME = 'Sales';  

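To delete the data in a partition, truncate it. This removes all rows in p0 but keeps the partition definition; DROP PARTITION would remove the partition itself as well.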

ALTER TABLE Sales TRUNCATE PARTITION p0;  

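LIST partitioning

Each partition is defined by an explicit list of values for a given column (or expression); a row goes to the partition whose list contains its value. For example, create the following table: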

CREATE TABLE testt (
sid INT NOT NULL,
province INT);

Then assign rows to partitions according to the value of the province column, with each partition holding its own list of values:

ALTER TABLE testt PARTITION BY LIST(province) (
PARTITION northeast VALUES IN (1, 2, 3),
PARTITION north VALUES IN (4, 5, 6, 7),
...
);
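
The lookup that LIST partitioning performs can be sketched in Python as a simple membership test. Only the two partitions spelled out above are modeled, since the remaining ones are elided in the statement; list_partition is a hypothetical helper name, and this is not how MySQL implements the lookup internally.

```python
# Value lists copied from the two partitions shown in the ALTER TABLE statement.
PARTITIONS = {
    "northeast": {1, 2, 3},
    "north": {4, 5, 6, 7},
}


def list_partition(province: int) -> str:
    """Return the partition whose VALUES IN list contains province (hypothetical helper)."""
    for name, values in PARTITIONS.items():
        if province in values:
            return name
    # A value that appears in no list cannot be stored; MySQL reports an error.
    raise ValueError(f"no partition for value {province}")


print(list_partition(2))  # northeast
print(list_partition(6))  # north
```

Unlike RANGE partitioning, there is no catch-all such as MAXVALUE, so every value that may be inserted has to appear in some partition's list.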

COLUMNS partitioning

COLUMNS partitioning allows multiple columns to be used in the partitioning key. All of these columns are taken into account both when placing rows in partitions and when determining which partitions need to be checked for matching rows. There are two types:

  • RANGE COLUMNS partitioning
  • LIST COLUMNS partitioning

Both support non-integer columns for defining the ranges or value lists. The following data types are supported:

  • All integer types: TINYINT, SMALLINT, MEDIUMINT, INT (INTEGER), and BIGINT.
  • String types: CHAR, VARCHAR, BINARY, and VARBINARY.
  • DATE and DATETIME.

RANGE COLUMNS partitioning is similar to RANGE partitioning, but it defines partitions using ranges over one or more columns, and those columns may be of non-integer types.

CREATE TABLE tab_name
PARTITION BY RANGE COLUMNS(colm_list) (
    PARTITION part_name VALUES LESS THAN (val_list)[,
    PARTITION part_name VALUES LESS THAN (val_list)][,
    ...]
)

colm_list: a list of one or more columns.
    colm_name[, colm_name][, ...]

val_list: a list of values supplied for each partition definition; it must contain the same number of values as colm_list has columns.
    val[, val][, ...]
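
With more than one column, RANGE COLUMNS compares the row's column values against each val_list as a tuple rather than column by column. The Python sketch below illustrates this with an invented two-column key (a, b) and invented boundaries; none of these names or values come from the text above, and float("inf") merely stands in for MAXVALUE.

```python
from bisect import bisect_right

INF = float("inf")  # stand-in for MAXVALUE in this sketch

# Hypothetical boundaries for PARTITION BY RANGE COLUMNS(a, b):
#   p0 VALUES LESS THAN (5, 10)
#   p1 VALUES LESS THAN (10, 20)
#   p2 VALUES LESS THAN (MAXVALUE, MAXVALUE)
BOUNDS = [(5, 10), (10, 20), (INF, INF)]
NAMES = ["p0", "p1", "p2"]


def range_columns_partition(a: int, b: int) -> str:
    """Pick the first partition whose boundary tuple is greater than (a, b)."""
    return NAMES[bisect_right(BOUNDS, (a, b))]


print(range_columns_partition(4, 100))  # (4, 100) < (5, 10) because 4 < 5
print(range_columns_partition(5, 9))    # (5, 9)   < (5, 10), decided by the second column
print(range_columns_partition(5, 10))   # equal to the first bound, so not less than it
```

The first call shows the part that often surprises: b = 100 exceeds every listed bound for b, yet the row still lands in the first partition because the first column alone decides the comparison.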

Example (LIST COLUMNS partitioning on a string column)

CREATE TABLE AgentDetail (   
agent_id VARCHAR(10),  
agent_name VARCHAR(40),   
city VARCHAR(10))   
PARTITION BY LIST COLUMNS(agent_id) (   
PARTITION pNewyork VALUES IN('A1', 'A2', 'A3'),   
PARTITION pTexas VALUES IN('B1', 'B2', 'B3'),   
PARTITION pCalifornia VALUES IN ('C1', 'C2', 'C3'));  

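HASH partitioning

The table is divided into a specified number of partitions, and each row is assigned to one of them based on the value of a user-supplied expression. HASH partitioning is used mainly to spread data evenly across partitions. For example: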

CREATE TABLE Stores (   
    cust_name VARCHAR(40),   
    bill_no VARCHAR(20) NOT NULL,   
    store_id INT PRIMARY KEY NOT NULL,   
    bill_date DATE NOT NULL,   
    amount DECIMAL(8,2) NOT NULL  
)  
PARTITION BY HASH(store_id)  
PARTITIONS 4; 
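
For HASH partitioning, MySQL picks the partition by taking the expression's value modulo the number of partitions. A minimal Python sketch of that rule for the Stores table follows; hash_partition is a hypothetical helper, the sketch assumes non-negative store_id values, and the p0..p3 names are the defaults MySQL assigns when no partition names are given.

```python
NUM_PARTITIONS = 4  # from PARTITIONS 4 in the Stores table


def hash_partition(store_id: int) -> str:
    """Return the partition for a store_id under HASH(store_id) (hypothetical helper)."""
    if store_id < 0:
        # Kept out of scope: this sketch only covers non-negative keys.
        raise ValueError("sketch assumes non-negative store_id")
    return f"p{store_id % NUM_PARTITIONS}"


# Consecutive ids cycle through the partitions, which is what spreads
# the rows evenly.
for store_id in range(1, 9):
    print(store_id, hash_partition(store_id))
```

This also explains why HASH partitioning suits evenly distributed integer keys: sequential values are dealt out to the partitions in round-robin fashion.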

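KEY partitioning

Similar to HASH partitioning, except that the hashing function is supplied by MySQL and is applied to one or more key columns. If the table has a primary key, or has no primary key but a NOT NULL unique key, no column needs to be listed inside KEY(): that key is used as the partitioning key.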

CREATE TABLE AgentDetail (   
    agent_id INT NOT NULL PRIMARY KEY,  
    agent_name VARCHAR(40)  
)  
PARTITION BY KEY()  
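PARTITIONS 2;

If the table has no primary key but does have a unique key, the unique key is used instead (this is an alternative definition of the same table):

CREATE TABLE AgentDetail (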
    agent_id INT NOT NULL UNIQUE KEY,  
    agent_name VARCHAR(40)  
)  
PARTITION BY KEY()  
PARTITIONS 2;  

Subpartitioning

Subpartitioning, also known as composite partitioning, further divides each partition of a partitioned table.

CREATE TABLE Person (
    id INT NOT NULL,
    name VARCHAR(40),
    purchased DATE NOT NULL,
    -- every primary or unique key must include the partitioning column,
    -- so purchased is made part of the primary key
    PRIMARY KEY (id, purchased)
)
PARTITION BY RANGE( YEAR(purchased) )
    SUBPARTITION BY HASH( TO_DAYS(purchased) )
    SUBPARTITIONS 2 (
        PARTITION p0 VALUES LESS THAN (2015),
        PARTITION p1 VALUES LESS THAN (2020),
        PARTITION p2 VALUES LESS THAN MAXVALUE
    );
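
The Person table therefore ends up with 3 × 2 = 6 subpartitions. The sketch below shows, under the assumption that the HASH step is a plain modulo as in HASH partitioning, how a purchase date maps to a (partition, subpartition index) pair. The helper names are invented for this example; to_days reproduces MySQL's TO_DAYS() by adding 365 to Python's date.toordinal(), since the two counts start one year apart.

```python
from bisect import bisect_right
from datetime import date

YEAR_BOUNDS = [2015, 2020]   # p0: < 2015, p1: < 2020
NAMES = ["p0", "p1", "p2"]   # p2: VALUES LESS THAN MAXVALUE
SUBPARTITIONS = 2


def to_days(d: date) -> int:
    """Day number as returned by MySQL TO_DAYS() (toordinal() is offset by 365)."""
    return d.toordinal() + 365


def locate(purchased: date) -> tuple[str, int]:
    """Return (partition name, subpartition index) for a purchase date."""
    part = NAMES[bisect_right(YEAR_BOUNDS, purchased.year)]  # RANGE step
    sub = to_days(purchased) % SUBPARTITIONS                 # HASH step
    return part, sub


print(locate(date(2007, 10, 7)))
print(locate(date(2021, 1, 1)))
```

Two consecutive days in the same year always share a partition but alternate between its two subpartitions, because TO_DAYS() increases by one per day.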
