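KADB (KingbaseAnalyticsDataBase) is a distributed relational database developed by Kingbase (北京人大金仓信息技术股份有限公司) on top of the open-source Greenplum database.
Partitioned tables are an important feature of KADB. A good partitioning strategy reduces the amount of data scanned by reading only the partitions needed to satisfy a query. Each partition is a separate physical file on every compute node, or a set of files in the case of column-oriented tables. Just as reading a whole row from a wide column-oriented table takes longer than reading the same row from a heap table, reading all partitions of a partitioned table takes longer than reading the same data from a non-partitioned table. KADB can also use partitioning to build tables with mixed row and column storage, where different partitions use different storage formats, and to compress cold and hot data differently, which improves query speed while using storage space efficiently. In short, using partitioned tables well benefits the database system as a whole.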
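KADB supports several partitioning strategies: range, list, and a combination of the two, with no limit on the number of subpartition levels.
A range partition can have only one partition key and suits partitioning by time or numeric range.
A list partition can have multiple partition keys and suits partitioning by character values.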
CREATE TABLE t10 (
ID INT,
DATE DATE,
amt DECIMAL (10, 2)
) DISTRIBUTED BY (ID) PARTITION BY RANGE (DATE)(
PARTITION Jan21 START (DATE '2021-01-01') INCLUSIVE,
PARTITION Feb21 START (DATE '2021-02-01') INCLUSIVE,
PARTITION Mar21 START (DATE '2021-03-01') INCLUSIVE,
PARTITION Apr21 START (DATE '2021-04-01') INCLUSIVE,
PARTITION May21 START (DATE '2021-05-01') INCLUSIVE,
PARTITION Jun21 START (DATE '2021-06-01') INCLUSIVE,
PARTITION Jul21 START (DATE '2021-07-01') INCLUSIVE,
PARTITION Aug21 START (DATE '2021-08-01') INCLUSIVE,
PARTITION Sep21 START (DATE '2021-09-01') INCLUSIVE,
PARTITION Oct21 START (DATE '2021-10-01') INCLUSIVE,
PARTITION Nov21 START (DATE '2021-11-01') INCLUSIVE,
PARTITION Dec21 START (DATE '2021-12-01') INCLUSIVE END (DATE '2022-01-01') EXCLUSIVE
);
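To check that a query on t10 reads only the partitions it needs, one can ask the planner for the plan of a query that filters on the partition key. This is a minimal sketch; the plan text depends on the optimizer and the KADB version, so no output is shown here.

```sql
-- Filter on the partition key (date) so the planner can skip the
-- partitions that cannot contain matching rows.
EXPLAIN
SELECT sum(amt)
FROM t10
WHERE date >= DATE '2021-03-01'
  AND date <  DATE '2021-04-01';
```

If partition elimination applies, the plan should reference only the March 2021 partition and not all twelve.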
CREATE TABLE t11 (
ID INT,
DATE DATE,
amt DECIMAL (10, 2)
) DISTRIBUTED BY (ID) PARTITION BY RANGE (DATE)
(start (date '2021-01-01') end (date '2022-01-01') every (interval '1 month')
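);
With EVERY, partitions are created automatically at the step defined by INTERVAL. The unit after INTERVAL can be day, month, year, and so on, depending on the interval required. The generated partitions are named with an incrementing number; a name can also be given after the PARTITION keyword to control how the partitions are named.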
CREATE TABLE t12 (
ID INT,
DATE DATE,
amt DECIMAL (10, 2)
) DISTRIBUTED BY (ID) PARTITION BY RANGE (DATE)
(partition test start (date '2021-01-01') end (date '2022-01-01') every (interval '2 month')
);
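To see how the partitions generated by EVERY were named for t11 and t12, the partition catalog can be queried. The sketch below assumes that KADB exposes the Greenplum-style pg_partitions view; that is an assumption and may not hold in every version.

```sql
-- Assumption: the Greenplum-style pg_partitions view exists in this KADB version.
-- List each generated partition with its name and date range.
SELECT tablename,
       partitiontablename,
       partitionname,
       partitionrangestart,
       partitionrangeend
FROM pg_partitions
WHERE tablename IN ('t11', 't12')
ORDER BY tablename, partitionposition;
```

t11 should list twelve one-month partitions and t12 six two-month partitions.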
create table t13 (id int,sex char) distributed by (id) partition by list (sex) (partition p_boy values('b'), partition p_girl values('g'),default partition uknow);
The default partition catches all rows that do not match any of the partitioning rules.
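The routing into the default partition can be observed with a few sample rows. This is a sketch with made-up values; it assumes that selecting tableoid from the parent table reports the child table that stores each row, as it does in Greenplum releases that use inheritance-based partitioning.

```sql
-- Sample rows: 'b' and 'g' match a list partition, 'x' matches none.
INSERT INTO t13 VALUES (1, 'b'), (2, 'g'), (3, 'x');

-- Assumption: tableoid resolves to the child table that holds each row.
SELECT tableoid::regclass AS stored_in, id, sex
FROM t13
ORDER BY id;
```

The row with sex = 'x' should appear in the child table created for the default partition uknow.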
create table t14 (id int,date date,region text) distributed by (id)
partition by range (date)
subpartition by list (region)
subpartition template
(
subpartition prtregionusa values ('usa'),
subpartition prtregionchn values ('china'),
default subpartition prtregionother
)
(start (date '2021-01-01') end (date '2022-01-01') every (interval '1 month'),default partition prtdataother);
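This is a two-level partitioned table: it is first range-partitioned by date, and each date partition is then list-partitioned by region. Each level has its own default partition.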
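-- Add a new range partition for January 2022.
alter table t10 add partition jan22 start (date '2022-01-01') end (date '2022-02-01');
-- Add a default partition for rows outside all defined ranges.
alter table t10 add default partition prtdef;
-- Rename the partition that contains 2021-01-01.
alter table t10 rename partition for ('2021-01-01') to prt202101;
-- Drop the partition that contains 2022-01-01.
alter table t10 drop partition for ('2022-01-01');
-- Drop the default partition.
alter table t10 drop default partition;
-- Remove all rows from the partition that contains 2021-01-01.
alter table t10 truncate partition for('2021-01-01');
-- Split the January 2021 partition at 2021-01-16 into two named partitions.
alter table t10 split partition for ('2021-01-01') at ('2021-01-16') into (partition prt202101a,partition prt202101b);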
create table t15 (id int,sex char) distributed by (id)
partition by list (sex) (partition p_boy values('b') with(appendonly=true,orientation=column),
partition p_girl values('g') with(appendonly=true,orientation=row)
);
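t15 is a list-partitioned table whose partitions use different storage formats: the p_boy partition is column-oriented and the p_girl partition is row-oriented.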
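One way to confirm the storage format of each t15 partition is to look at the catalog. The sketch below assumes that pg_class has the relstorage column found in Greenplum 6 and earlier, where 'h' means heap, 'a' append-only row-oriented, and 'c' append-only column-oriented. Whether the KADB version in use keeps this column is an assumption.

```sql
-- Assumption: pg_class.relstorage exists (as in Greenplum 6 and earlier).
-- 'h' = heap, 'a' = append-only row-oriented, 'c' = append-only column-oriented
SELECT relname, relstorage
FROM pg_class
WHERE relname LIKE 't15%'
ORDER BY relname;
```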
CREATE TABLE t16 (
ID INT,
DATE DATE,
amt DECIMAL (10, 2)
) DISTRIBUTED BY (ID) PARTITION BY RANGE (DATE)(
PARTITION Jan21 START (DATE '2021-01-01') INCLUSIVE with (appendonly=true,compresstype=zlib,compresslevel=9),
PARTITION Feb21 START (DATE '2021-02-01') INCLUSIVE with (appendonly=true,compresstype=zlib,compresslevel=9),
PARTITION Mar21 START (DATE '2021-03-01') INCLUSIVE with (appendonly=true,compresstype=zlib,compresslevel=9),
PARTITION Apr21 START (DATE '2021-04-01') INCLUSIVE with (appendonly=true,compresstype=zlib,compresslevel=9),
PARTITION May21 START (DATE '2021-05-01') INCLUSIVE with (appendonly=true,compresstype=zlib,compresslevel=5),
PARTITION Jun21 START (DATE '2021-06-01') INCLUSIVE with (appendonly=true,compresstype=zlib,compresslevel=5),
PARTITION Jul21 START (DATE '2021-07-01') INCLUSIVE with (appendonly=true,compresstype=zlib,compresslevel=5),
PARTITION Aug21 START (DATE '2021-08-01') INCLUSIVE with (appendonly=true,compresstype=zlib,compresslevel=5),
PARTITION Sep21 START (DATE '2021-09-01') INCLUSIVE with (appendonly=true,compresstype=zlib,compresslevel=1),
PARTITION Oct21 START (DATE '2021-10-01') INCLUSIVE with (appendonly=true,compresstype=zlib,compresslevel=1),
PARTITION Nov21 START (DATE '2021-11-01') INCLUSIVE with (appendonly=true,compresstype=zlib,compresslevel=1),
PARTITION Dec21 START (DATE '2021-12-01') INCLUSIVE END (DATE '2022-01-01') EXCLUSIVE with (appendonly=true,compresstype=zlib,compresslevel=1)
);
restore=# \d+ t16
Table "public.t16"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------------+-----------+---------+--------------+-------------
id | integer | | plain | |
date | date | | plain | |
amt | numeric(10,2) | | main | |
Child tables: t16_1_prt_apr21, RANGE PARTITION ( date >= '2021-04-01'::date AND date < '2021-05-01'::date )
t16_1_prt_aug21, RANGE PARTITION ( date >= '2021-08-01'::date AND date < '2021-09-01'::date )
t16_1_prt_dec21, RANGE PARTITION ( date >= '2021-12-01'::date AND date < '2022-01-01'::date )
t16_1_prt_feb21, RANGE PARTITION ( date >= '2021-02-01'::date AND date < '2021-03-01'::date )
t16_1_prt_jan21, RANGE PARTITION ( date >= '2021-01-01'::date AND date < '2021-02-01'::date )
t16_1_prt_jul21, RANGE PARTITION ( date >= '2021-07-01'::date AND date < '2021-08-01'::date )
t16_1_prt_jun21, RANGE PARTITION ( date >= '2021-06-01'::date AND date < '2021-07-01'::date )
t16_1_prt_mar21, RANGE PARTITION ( date >= '2021-03-01'::date AND date < '2021-04-01'::date )
t16_1_prt_may21, RANGE PARTITION ( date >= '2021-05-01'::date AND date < '2021-06-01'::date )
t16_1_prt_nov21, RANGE PARTITION ( date >= '2021-11-01'::date AND date < '2021-12-01'::date )
t16_1_prt_oct21, RANGE PARTITION ( date >= '2021-10-01'::date AND date < '2021-11-01'::date )
t16_1_prt_sep21, RANGE PARTITION ( date >= '2021-09-01'::date AND date < '2021-10-01'::date )
Distributed by: (id)
Partition by: (date)
restore=# \d+ t16_1_prt_jan21
Append-Only Table "public.t16_1_prt_jan21"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------------+-----------+---------+--------------+-------------
id | integer | | plain | |
date | date | | plain | |
amt | numeric(10,2) | | main | |
Compression Type: zlib
Compression Level: 9
Block Size: 32768
Checksum: t
Check constraints:
"t16_1_prt_jan21_check" CHECK (date >= '2021-01-01'::date AND date < '2021-02-01'::date)
Inherits: t16
Distributed by: (id)
Options: appendonly=true, compresstype=zlib, compresslevel=9
restore=# \d+ t16_1_prt_jun21
Append-Only Table "public.t16_1_prt_jun21"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------------+-----------+---------+--------------+-------------
id | integer | | plain | |
date | date | | plain | |
amt | numeric(10,2) | | main | |
Compression Type: zlib
Compression Level: 5
Block Size: 32768
Checksum: t
Check constraints:
"t16_1_prt_jun21_check" CHECK (date >= '2021-06-01'::date AND date < '2021-07-01'::date)
Inherits: t16
Distributed by: (id)
Options: appendonly=true, compresstype=zlib, compresslevel=5
restore=# \d+ t16_1_prt_oct21
Append-Only Table "public.t16_1_prt_oct21"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------------+-----------+---------+--------------+-------------
id | integer | | plain | |
date | date | | plain | |
amt | numeric(10,2) | | main | |
Compression Type: zlib
Compression Level: 1
Block Size: 32768
Checksum: t
Check constraints:
"t16_1_prt_oct21_check" CHECK (date >= '2021-10-01'::date AND date < '2021-11-01'::date)
Inherits: t16
Distributed by: (id)
Options: appendonly=true, compresstype=zlib, compresslevel=1
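t16 is a table partitioned by time, with a different compression level per month: January through April use the highest level (9), May through August a medium level (5), and September through December the lowest level (1).
Compression does affect query speed, so weigh whether to trade space for time or time for space. Recent months hold hot, frequently accessed data, so a lower compression level reduces CPU load at query time. Older months hold colder, rarely accessed data, so a higher compression level saves considerable storage space but also increases query time. Weigh these trade-offs whenever compression is used.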
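To weigh the space side of this trade-off with real numbers, the compression ratio and on-disk size of individual partitions can be compared once data has been loaded. The sketch below assumes that KADB keeps Greenplum's get_ao_compression_ratio function for append-only tables. The three partitions are simply the ones inspected above.

```sql
-- Assumption: get_ao_compression_ratio is available as in Greenplum.
-- Compare one partition from each compression level (9, 5, 1).
SELECT p.part,
       get_ao_compression_ratio(p.part::regclass)      AS compression_ratio,
       pg_size_pretty(pg_relation_size(p.part::regclass)) AS on_disk_size
FROM (VALUES ('t16_1_prt_jan21'),
             ('t16_1_prt_jun21'),
             ('t16_1_prt_oct21')) AS p(part);
```

The ratios are only meaningful after the partitions contain a representative amount of data.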
Points to note when using partitioned tables