Efficient Partitioned Tables in PostgreSQL (PG): pg_pathman (1)

Partitioning is widely used in PostgreSQL data applications.

 
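Background

In the community edition of PostgreSQL, partitioning support is currently weak: a partitioned table has to be built from inheritance plus triggers or rules. Queries and updates involve constraint checks, and inserts go through triggers or rule rewriting, so partitioned tables perform poorly.

The commercial EDB distribution and the Greenplum data warehouse both offer fairly good partitioning support.

After Greenplum (GP) was open-sourced last year, colleagues on the Alibaba Cloud RDS PostgreSQL team ported Greenplum's partitioning to PostgreSQL 9.4, and it ran nearly a hundred times faster than the inheritance-and-trigger approach. (See my earlier articles: besides the overhead of the trigger itself, the traditional method also pays a search cost that grows with the number of partitions, because it does not use binary search.) Because the port required catalog changes, the feature was never released on the 9.4 version.

Partitioned tables have become one of the features PostgreSQL users are most eagerly waiting for.

postgrespro, the company where community core member 叶涛 works, has developed an extension that provides partitioned tables without touching the catalog, which makes it easy to add partitioning.

This article explains how pg_pathman works and how to use it.
I. How pg_pathman works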

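PostgreSQL's traditional partitioning method uses constraints to tell apart the data stored in each partition (with constraint_exclusion=partition configured). When a SELECT, DELETE, or UPDATE runs, the planner uses the constraints and the query conditions to exclude the partitions that do not need to be scanned.

For COPY or INSERT, triggers or rules route each row into the matching partition.

With the traditional method, both queries and inserts take a considerable performance hit.

pg_pathman differs from traditional inheritance-based partitioning in one respect: partition definitions are stored in a metadata table, that table's contents are cached in memory, and hooks are used to replace the relation, so it is very efficient.

Two partitioning schemes are currently supported, range and hash. Range partitioning finds the matching partition with a binary search, and hash partitioning finds it with a hash lookup.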
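
To make the range lookup concrete, here is a minimal Python sketch of a binary search over cached partition bounds. It is only a model of the idea, not pg_pathman's C implementation; the bounds, the partition names, and the function `find_range_partition` are made up for illustration, and each partition is assumed to cover a half-open range [min, max).

```python
from bisect import bisect_right

# Hypothetical in-memory cache of a RANGE-partitioned table: sorted lower
# bounds plus the exclusive upper bound of the last partition.
lower_bounds = [1, 101, 201, 301]             # partitions abc_1 .. abc_4
names = ["abc_1", "abc_2", "abc_3", "abc_4"]
upper_limit = 401                              # end of abc_4 (exclusive)


def find_range_partition(value):
    """Return the partition whose [min, max) range holds value, or None."""
    if value < lower_bounds[0] or value >= upper_limit:
        return None                            # outside every partition
    # bisect_right finds the first lower bound greater than value in
    # O(log n) steps; the partition just before it holds the value.
    return names[bisect_right(lower_bounds, value) - 1]


print(find_range_partition(301))   # abc_4
print(find_range_partition(100))   # abc_1
print(find_range_partition(401))   # None
```

The cost grows with the logarithm of the partition count, whereas checking constraints one partition at a time grows linearly.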

pg_pathman uses the following hook and custom plan nodes:

1. pg_pathman uses ProcessUtility_hook hook to handle COPY queries for partitioned tables.

2. RuntimeAppend (overrides Append plan node)

3. RuntimeMergeAppend (overrides MergeAppend plan node)

4. PartitionFilter (drop-in replacement for INSERT triggers)

https://wiki.postgresql.org/wiki/CustomScanAPI

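II. pg_pathman features

1. Range and hash partitioning are currently supported.

HASH and RANGE partitioning schemes;

2. Both automatic partition management (partitions are created through function calls, and data is migrated automatically from the parent table to the partitions) and manual partition management (functions attach an existing table as a partition or detach a partition) are supported.

Both automatic and manual partition management;

3. Supported partition key types include int, float, date, and other common types, including user-defined domains.

Support for integer, floating point, date and other types, including domains;

4. Custom scans provide efficient planning for partitioned tables, including JOINs and partition filtering through subqueries.

Effective query planning for partitioned tables (JOINs, subselects etc);

5. The RuntimeAppend and RuntimeMergeAppend custom plan nodes select partitions dynamically at run time.

RuntimeAppend & RuntimeMergeAppend custom plan nodes to pick partitions at runtime;

6. The PartitionFilter custom node routes inserted rows in place, replacing the traditional INSERT trigger or INSERT rule.

PartitionFilter: an efficient drop-in replacement for INSERT triggers;

7. New partitions can be created automatically. Currently this works only for range-partitioned tables.

Automatic partition creation for new INSERTed data (only for RANGE partitioning);

8. COPY FROM/TO reads from or writes to the partitions directly, which improves efficiency.

Improved COPY FROM\TO statement that is able to insert rows directly into partitions;

9. Updating the partition key is supported, but it requires adding a trigger. If the partition key is never updated, adding this trigger is not recommended, because it has some performance cost.

UPDATE triggers generation out of the box (will be replaced with custom nodes too);

10. User-defined callback functions are supported and are invoked automatically when a partition is created.

User-defined callbacks for partition creation event handling;

The callback must have the following form: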

    $part_init_callback$(args JSONB) RETURNS VOID
     
    The argument passed in looks like this:
    /* RANGE-partitioned table abc (for exp: child abc_4) */
    {
        "parent":    "abc",
        "parttype":  "2",
        "partition": "abc_4",
        "range_max": "401",
        "range_min": "301"
    }
     
    /* HASH-partitioned table abc (for exp: child abc_0) */
    {
        "parent":    "abc",
        "parttype":  "1",
        "partition": "abc_0"
    }
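
As an illustration of what such a callback receives, the following Python sketch decodes the two sample payloads. A real callback is a PostgreSQL function taking a JSONB argument; this snippet only models the payload handling, and the helper `describe_partition` is a hypothetical name. The half-open reading of range_min and range_max follows the CHECK constraints that pg_pathman generates for range partitions.

```python
import json

# parttype codes as they appear in the sample payloads: "1" = HASH, "2" = RANGE
PART_TYPES = {"1": "HASH", "2": "RANGE"}


def describe_partition(args_json):
    """Summarize one callback payload as a human-readable string."""
    args = json.loads(args_json)
    kind = PART_TYPES[args["parttype"]]
    text = f'{args["partition"]} is a {kind} partition of {args["parent"]}'
    if kind == "RANGE":
        # range_min / range_max are only present for RANGE partitions
        text += f' covering [{args["range_min"]}, {args["range_max"]})'
    return text


range_args = ('{"parent": "abc", "parttype": "2", "partition": "abc_4", '
              '"range_max": "401", "range_min": "301"}')
hash_args = '{"parent": "abc", "parttype": "1", "partition": "abc_0"}'

print(describe_partition(range_args))  # abc_4 is a RANGE partition of abc covering [301, 401)
print(describe_partition(hash_args))   # abc_0 is a HASH partition of abc
```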

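11. Partitions can be created without blocking, and data is migrated from the parent table to the partitions automatically in the background, also without blocking.

Non-blocking concurrent table partitioning;

12. FDW is supported: the setting pg_pathman.insert_into_fdw=(disabled | postgres | any_fdw) allows inserts into postgres_fdw or into any FDW.

FDW support (foreign partitions);

13. GUC parameters are supported. Note that because pg_pathman relies on hooks, if another extension such as pg_stat_statements uses the same hooks, pg_pathman must be registered before it.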

shared_preload_libraries = 'pg_pathman, pg_stat_statements'

Various GUC toggles and configurable settings.
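III. Why pg_pathman is efficient

Insert optimization: PartitionFilter replaces the relation, taking the place of triggers. The gain in efficiency is very noticeable.

Query optimization: partition definitions are loaded in memory, binary search and hash lookup serve range and hash partitioned tables respectively, and the RuntimeAppend and RuntimeMergeAppend custom plan nodes pick partitions at run time.

This is more efficient than filtering through constraints at query time. Run-time filtering also works with subqueries, whereas the traditional constraint-based method cannot filter partitions for subqueries.
IV. Using pg_pathman

pg_pathman uses the custom scan provider API, so it supports only PostgreSQL 9.5 and later.
IV.1 Installation and configuration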

    $ git clone https://github.com/postgrespro/pg_pathman
    $ export PATH=/home/digoal/pgsql9.6:$PATH
     
    $ cd pg_pathman
    $ make USE_PGXS=1
    $ make USE_PGXS=1 install
     
    $ cd $PGDATA
    $ vi postgresql.conf
    shared_preload_libraries = 'pg_pathman,pg_stat_statements'
     
    $ pg_ctl restart -m fast
     
    $ psql
    postgres=# create extension pg_pathman;
    CREATE EXTENSION
     
    postgres=# \dx
                       List of installed extensions
        Name    | Version |   Schema   |         Description          
    ------------+---------+------------+------------------------------
     pg_pathman | 1.1     | public     | Partitioning tool ver. 1.1

IV.2 Parameters

    pg_pathman.enable --- disable (or enable) pg_pathman completely
    Default: on
     
    pg_pathman.enable_runtimeappend --- toggle RuntimeAppend custom node on\off
    Default: on
     
    pg_pathman.enable_runtimemergeappend --- toggle RuntimeMergeAppend custom node on\off
    Default: on
     
    pg_pathman.enable_partitionfilter --- toggle PartitionFilter custom node on\off
    Default: on
     
    pg_pathman.enable_auto_partition --- toggle automatic partition creation on\off (per session)
    Default: on
     
    pg_pathman.insert_into_fdw --- allow INSERTs into various FDWs (disabled | postgres | any_fdw)
    Default: postgres
     
    pg_pathman.override_copy --- toggle COPY statement hooking on\off
    Default: on

IV.3 Related views and tables

pg_pathman maintains partitioned tables through functions, and it creates several views for inspecting their state.

The definitions of partitioned tables are stored in a table, and this definition data is cached in memory.

1. pathman_config --- main config storage

This table stores a list of partitioned tables.

    CREATE TABLE IF NOT EXISTS pathman_config (
        partrel         REGCLASS NOT NULL PRIMARY KEY,  -- parent table OID
        attname         TEXT NOT NULL,  -- partition column name
        parttype        INTEGER NOT NULL,  -- partition type (1 = hash, 2 = range)
        range_interval  TEXT,  -- interval of a range partition
     
        CHECK (parttype IN (1, 2)) /* check for allowed part types */ );

2. pathman_config_params --- optional parameters

This table stores optional parameters which override standard behavior.

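The settings stored in this table override the standard configuration (that is, the settings in postgresql.conf).

    CREATE TABLE IF NOT EXISTS pathman_config_params (
        partrel        REGCLASS NOT NULL PRIMARY KEY,  -- parent table OID
        enable_parent  BOOLEAN NOT NULL DEFAULT TRUE,  -- whether the planner includes the parent table in query plans
        auto           BOOLEAN NOT NULL DEFAULT TRUE,  -- whether INSERT automatically creates missing partitions
        init_callback  REGPROCEDURE NOT NULL DEFAULT 0);  -- OID of the callback invoked when a partition is created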

3. pathman_concurrent_part_tasks --- currently running partitioning workers

This view lists all currently running concurrent partitioning tasks.

Data migration tasks that are currently running (moving data from the parent table into the partitions).

    -- helper SRF function
    CREATE OR REPLACE FUNCTION show_concurrent_part_tasks()  
    RETURNS TABLE (
        userid     REGROLE,
        pid        INT,
        dbid       OID,
        relid      REGCLASS,
        processed  INT,
        status     TEXT)
    AS 'pg_pathman', 'show_concurrent_part_tasks_internal'
    LANGUAGE C STRICT;
     
    CREATE OR REPLACE VIEW pathman_concurrent_part_tasks
    AS SELECT * FROM show_concurrent_part_tasks();

4. pathman_partition_list --- list of all existing partitions

This view lists all existing partitions, as well as their parents and range boundaries (NULL for HASH partitions).

Lists the partitions that already exist.

    -- helper SRF function
    CREATE OR REPLACE FUNCTION show_partition_list()
    RETURNS TABLE (
        parent     REGCLASS,
        partition  REGCLASS,
        parttype   INT4,
        partattr   TEXT,
        range_min  TEXT,
        range_max  TEXT)
    AS 'pg_pathman', 'show_partition_list_internal'
    LANGUAGE C STRICT;
     
    CREATE OR REPLACE VIEW pathman_partition_list
    AS SELECT * FROM show_partition_list();

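IV.4 Managing partitioned tables

To create a partitioned table you specify the name of the parent table. The parent table must already exist; it may contain data or be empty.

If the parent table contains data, you can choose whether the data is migrated into the partitions as they are created (not recommended for large tables).

If the parent table holds a lot of data, use the non-blocking background migration instead (call partition_table_concurrently() to migrate).

If a callback was set with set_init_callback(relation regclass, callback regproc DEFAULT 0) before the partitions are created, that callback is invoked automatically as each partition is created.

The callback's signature and argument are as follows:

    $part_init_callback$(args JSONB) RETURNS VOID
     
    The argument passed in looks like this: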
    /* RANGE-partitioned table abc (for exp: child abc_4) */
    {
        "parent":    "abc",
        "parttype":  "2",
        "partition": "abc_4",
        "range_max": "401",
        "range_min": "301"
    }
     
    /* HASH-partitioned table abc (for exp: child abc_0) */
    {
        "parent":    "abc",
        "parttype":  "1",
        "partition": "abc_0"
    }

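1. Range partitioning

Four management functions create range partitions.

Specify a start value, an interval, and the number of partitions:

    create_range_partitions(relation       REGCLASS,  -- parent table OID
                            attribute      TEXT,      -- partition column name
                            start_value    ANYELEMENT,  -- start value
                            p_interval     ANYELEMENT,  -- interval; any type, suits partition keys of any type
                            p_count        INTEGER DEFAULT NULL,   --  number of partitions to create
                            partition_data BOOLEAN DEFAULT TRUE)   --  whether to migrate data from the parent table to the partitions immediately; not recommended, prefer the non-blocking migration (call partition_table_concurrently())
     
    create_range_partitions(relation       REGCLASS,  -- parent table OID
                            attribute      TEXT,      -- partition column name
                            start_value    ANYELEMENT,  -- start value
                            p_interval     INTERVAL,    -- interval; interval type, for time-based partitioning
                            p_count        INTEGER DEFAULT NULL,   --  number of partitions to create
                            partition_data BOOLEAN DEFAULT TRUE)   --  whether to migrate data from the parent table to the partitions immediately; not recommended, prefer the non-blocking migration (call partition_table_concurrently())

Specify a start value, an end value, and an interval:

    create_partitions_from_range(relation       REGCLASS,  -- parent table OID
                                 attribute      TEXT,      -- partition column name
                                 start_value    ANYELEMENT,  -- start value
                                 end_value      ANYELEMENT,  -- end value
                                 p_interval     ANYELEMENT,  -- interval; any type, suits partition keys of any type
                                 partition_data BOOLEAN DEFAULT TRUE)   --  whether to migrate data from the parent table to the partitions immediately; not recommended, prefer the non-blocking migration (call partition_table_concurrently())
     
    create_partitions_from_range(relation       REGCLASS,  -- parent table OID
                                 attribute      TEXT,      -- partition column name
                                 start_value    ANYELEMENT,  -- start value
                                 end_value      ANYELEMENT,  -- end value
                                 p_interval     INTERVAL,    -- interval; interval type, for time-based partitioning
                                 partition_data BOOLEAN DEFAULT TRUE)   --  whether to migrate data from the parent table to the partitions immediately; not recommended, prefer the non-blocking migration (call partition_table_concurrently())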
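
The relationship between start_value, p_interval, and p_count can be sketched in Python. This is a model under a stated assumption, not pg_pathman code: partition i is assumed to cover the half-open range from start + (i-1)*interval up to start + i*interval, which matches the sample callback payload where abc_4 spans 301 to 401. The function `range_bounds` is a hypothetical name.

```python
def range_bounds(start_value, interval, count):
    """Return count half-open (min, max) ranges starting at start_value."""
    return [(start_value + i * interval, start_value + (i + 1) * interval)
            for i in range(count)]


bounds = range_bounds(1, 100, 4)
print(bounds)   # [(1, 101), (101, 201), (201, 301), (301, 401)]
```

Because each range's upper bound equals the next range's lower bound, every key between the first minimum and the last maximum falls into exactly one partition.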

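Example

    Create the parent table to be partitioned
    postgres=# create table part_test(id int, info text, crt_time timestamp not null);  -- the partition column must have a NOT NULL constraint
    CREATE TABLE
     
    Insert a batch of test data to simulate a parent table that already holds data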
    postgres=# insert into part_test select id,md5(random()::text),clock_timestamp() + (id||' hour')::interval from generate_series(1,10000) t(id);
    INSERT 0 10000
    postgres=# select * from part_test limit 10;                    
     id |               info               |          crt_time          
    ----+----------------------------------+----------------------------
      1 | 36fe1adedaa5b848caec4941f87d443a | 2016-10-25 10:27:13.206713
      2 | c7d7358e196a9180efb4d0a10269c889 | 2016-10-25 11:27:13.206893
      3 | 005bdb063550579333264b895df5b75e | 2016-10-25 12:27:13.206904
      4 | 6c900a0fc50c6e4da1ae95447c89dd55 | 2016-10-25 13:27:13.20691
      5 | 857214d8999348ed3cb0469b520dc8e5 | 2016-10-25 14:27:13.206916
      6 | 4495875013e96e625afbf2698124ef5b | 2016-10-25 15:27:13.206921
      7 | 82488cf7e44f87d9b879c70a9ed407d4 | 2016-10-25 16:27:13.20693
      8 | a0b92547c8f17f79814dfbb12b8694a0 | 2016-10-25 17:27:13.206936
      9 | 2ca09e0b85042b476fc235e75326b41b | 2016-10-25 18:27:13.206942
     10 | 7eb762e1ef7dca65faf413f236dff93d | 2016-10-25 19:27:13.206947
    (10 rows)
     
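    Note:
    1. The partition column must have a NOT NULL constraint
    2. The partitions must cover all existing rows
     
    Create the partitions, each spanning one month of data
    postgres=# select
    create_range_partitions('part_test'::regclass,             -- parent table OID
                            'crt_time',                        -- partition column name
                            '2016-10-25 00:00:00'::timestamp,  -- start value
                            interval '1 month',                -- interval; interval type, for time-based partitioning
                            24,                                -- number of partitions
                            false) ;                           -- do not migrate data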
    NOTICE:  sequence "part_test_seq" does not exist, skipping
     create_range_partitions
    -------------------------
                          24
    (1 row)
    postgres=# \d+ part_test
                                      Table "public.part_test"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
    Child tables: part_test_1,
                  part_test_10,
                  part_test_11,
                  part_test_12,
                  part_test_13,
                  part_test_14,
                  part_test_15,
                  part_test_16,
                  part_test_17,
                  part_test_18,
                  part_test_19,
                  part_test_2,
                  part_test_20,
                  part_test_21,
                  part_test_22,
                  part_test_23,
                  part_test_24,
                  part_test_3,
                  part_test_4,
                  part_test_5,
                  part_test_6,
                  part_test_7,
                  part_test_8,
                  part_test_9
     
     
    The data was not migrated, so it is still in the parent table
    postgres=# select count(*) from only part_test;
     count
    -------
     10000
    (1 row)
     
     
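    Use the non-blocking migration interface
    partition_table_concurrently(relation   REGCLASS,              -- parent table OID
                                 batch_size INTEGER DEFAULT 1000,  -- number of rows migrated per transaction
                                 sleep_time FLOAT8 DEFAULT 1.0)    -- how long to sleep before retrying when a row lock cannot be acquired; the task exits after 60 retries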
     
     
    postgres=# select partition_table_concurrently('part_test'::regclass,
                                 10000,
                                 1.0);
    NOTICE:  worker started, you can stop it with the following command: select stop_concurrent_part_task('part_test');
     partition_table_concurrently
    ------------------------------
     
    (1 row)
     
     
    After the migration finishes, the parent table is empty and all the data is in the partitions
    postgres=# select count(*) from only part_test;
     count
    -------
         0
    (1 row)
     
     
    Once the migration is complete, disabling the parent table is recommended so that it no longer appears in execution plans
    postgres=# select set_enable_parent('part_test'::regclass, false);
     set_enable_parent
    -------------------
     
    (1 row)
     
    postgres=# explain select * from part_test where crt_time = '2016-10-25 00:00:00'::timestamp;
                                       QUERY PLAN                                    
    ---------------------------------------------------------------------------------
     Append  (cost=0.00..16.18 rows=1 width=45)
       ->  Seq Scan on part_test_1  (cost=0.00..16.18 rows=1 width=45)
             Filter: (crt_time = '2016-10-25 00:00:00'::timestamp without time zone)
    (3 rows)

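Recommendations

1. The partition column must have a NOT NULL constraint
2. The partitions must cover all existing rows
3. Use the non-blocking migration interface
4. Disable the parent table once the data migration is complete

2. Hash partitioning

One management function creates hash partitions.

Specify the partition column and the number of partitions:

    create_hash_partitions(relation         REGCLASS,  -- parent table OID
                           attribute        TEXT,      -- partition column name
                           partitions_count INTEGER,   -- number of partitions to create
                           partition_data   BOOLEAN DEFAULT TRUE)   --  whether to migrate data from the parent table to the partitions immediately; not recommended, prefer the non-blocking migration (call partition_table_concurrently())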
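
The idea behind hash routing can be sketched in Python as follows. This is a simplified model, not the extension's actual code: the hash function (zlib.crc32) is a stand-in chosen for the sketch, the modulo step is an assumption about how a hash is reduced to a partition number, and `hash_partition_index` is a hypothetical name. PostgreSQL itself uses per-type hash functions, such as the timestamp_hash that appears in the CHECK constraint of the example for this section.

```python
import zlib


def hash_partition_index(value, partitions_count):
    """Map a value to a partition number in 0 .. partitions_count - 1."""
    # zlib.crc32 is a stand-in hash for this sketch only.
    digest = zlib.crc32(repr(value).encode("utf-8"))
    # Assumed reduction step: the hash modulo the number of partitions.
    return digest % partitions_count


idx = hash_partition_index("2016-10-25 00:00:00", 128)
print(f"part_test_{idx}")
```

Since the partition number is computed directly from the key, an equality predicate on the partition column identifies a single partition without examining the others.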

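Example

    Create the parent table to be partitioned
    postgres=# create table part_test(id int, info text, crt_time timestamp not null);    -- the partition column must have a NOT NULL constraint
    CREATE TABLE
     
    Insert a batch of test data to simulate a parent table that already holds data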
    postgres=# insert into part_test select id,md5(random()::text),clock_timestamp() + (id||' hour')::interval from generate_series(1,10000) t(id);
    INSERT 0 10000
    postgres=# select * from part_test limit 10;   
     id |               info               |          crt_time          
    ----+----------------------------------+----------------------------
      1 | 29ce4edc70dbfbe78912beb7c4cc95c2 | 2016-10-25 10:47:32.873879
      2 | e0990a6fb5826409667c9eb150fef386 | 2016-10-25 11:47:32.874048
      3 | d25f577a01013925c203910e34470695 | 2016-10-25 12:47:32.874059
      4 | 501419c3f7c218e562b324a1bebfe0ad | 2016-10-25 13:47:32.874065
      5 | 5e5e22bdf110d66a5224a657955ba158 | 2016-10-25 14:47:32.87407
      6 | 55d2d4fd5229a6595e0dd56e13d32be4 | 2016-10-25 15:47:32.874076
      7 | 1dfb9a783af55b123c7a888afe1eb950 | 2016-10-25 16:47:32.874081
      8 | 41eeb0bf395a4ab1e08691125ae74bff | 2016-10-25 17:47:32.874087
      9 | 83783d69cc4f9bb41a3978fe9e13d7fa | 2016-10-25 18:47:32.874092
     10 | affc9406d5b3412ae31f7d7283cda0dd | 2016-10-25 19:47:32.874097
    (10 rows)
     
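    Note:
    1. The partition column must have a NOT NULL constraint
     
    Create 128 partitions
    postgres=# select
    create_hash_partitions('part_test'::regclass,              -- parent table OID
                            'crt_time',                        -- partition column name
                            128,                               -- number of partitions to create
                            false) ;                           -- do not migrate data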
     create_hash_partitions
    ------------------------
                        128
    (1 row)
     
    postgres=# \d+ part_test
                                      Table "public.part_test"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
    Child tables: part_test_0,
                  part_test_1,
                  part_test_10,
                  part_test_100,
                  part_test_101,
                  part_test_102,
                  part_test_103,
                  part_test_104,
                  part_test_105,
                  part_test_106,
                  part_test_107,
                  part_test_108,
                  part_test_109,
                  part_test_11,
                  part_test_110,
                  part_test_111,
                  part_test_112,
                  part_test_113,
                  part_test_114,
                  part_test_115,
                  part_test_116,
                  part_test_117,
                  part_test_118,
                  part_test_119,
                  part_test_12,
                  part_test_120,
                  part_test_121,
                  part_test_122,
                  part_test_123,
                  part_test_124,
                  part_test_125,
                  part_test_126,
                  part_test_127,
                  part_test_13,
                  part_test_14,
                  part_test_15,
                  part_test_16,
                  part_test_17,
                  part_test_18,
                  part_test_19,
                  part_test_2,
                  part_test_20,
                  part_test_21,
                  part_test_22,
                  part_test_23,
                  part_test_24,
                  part_test_25,
                  part_test_26,
                  part_test_27,
                  part_test_28,
                  part_test_29,
                  part_test_3,
                  part_test_30,
                  part_test_31,
                  part_test_32,
                  part_test_33,
                  part_test_34,
                  part_test_35,
                  part_test_36,
                  part_test_37,
                  part_test_38,
                  part_test_39,
                  part_test_4,
                  part_test_40,
                  part_test_41,
                  part_test_42,
                  part_test_43,
                  part_test_44,
                  part_test_45,
                  part_test_46,
                  part_test_47,
                  part_test_48,
                  part_test_49,
                  part_test_5,
                  part_test_50,
                  part_test_51,
                  part_test_52,
                  part_test_53,
                  part_test_54,
                  part_test_55,
                  part_test_56,
                  part_test_57,
                  part_test_58,
                  part_test_59,
                  part_test_6,
                  part_test_60,
                  part_test_61,
                  part_test_62,
                  part_test_63,
                  part_test_64,
                  part_test_65,
                  part_test_66,
                  part_test_67,
                  part_test_68,
                  part_test_69,
                  part_test_7,
                  part_test_70,
                  part_test_71,
                  part_test_72,
                  part_test_73,
                  part_test_74,
                  part_test_75,
                  part_test_76,
                  part_test_77,
                  part_test_78,
                  part_test_79,
                  part_test_8,
                  part_test_80,
                  part_test_81,
                  part_test_82,
                  part_test_83,
                  part_test_84,
                  part_test_85,
                  part_test_86,
                  part_test_87,
                  part_test_88,
                  part_test_89,
                  part_test_9,
                  part_test_90,
                  part_test_91,
                  part_test_92,
                  part_test_93,
                  part_test_94,
                  part_test_95,
                  part_test_96,
                  part_test_97,
                  part_test_98,
                  part_test_99
     
     
    The data was not migrated, so it is still in the parent table
    postgres=# select count(*) from only part_test;
     count
    -------
     10000
    (1 row)
     
     
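    Use the non-blocking migration interface
    partition_table_concurrently(relation   REGCLASS,              -- parent table OID
                                 batch_size INTEGER DEFAULT 1000,  -- number of rows migrated per transaction
                                 sleep_time FLOAT8 DEFAULT 1.0)    -- how long to sleep before retrying when a row lock cannot be acquired; the task exits after 60 retries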
     
     
    postgres=# select partition_table_concurrently('part_test'::regclass,
                                 10000,
                                 1.0);
    NOTICE:  worker started, you can stop it with the following command: select stop_concurrent_part_task('part_test');
     partition_table_concurrently
    ------------------------------
     
    (1 row)
     
     
    After the migration finishes, the parent table is empty and all the data is in the partitions
    postgres=# select count(*) from only part_test;
     count
    -------
         0
    (1 row)
     
     
    Once the migration is complete, disabling the parent table is recommended so that it no longer appears in execution plans
    postgres=# select set_enable_parent('part_test'::regclass, false);
     set_enable_parent
    -------------------
     
    (1 row)
     
    Only a single partition is scanned
    postgres=# explain select * from part_test where crt_time = '2016-10-25 00:00:00'::timestamp;
                                       QUERY PLAN                                    
    ---------------------------------------------------------------------------------
     Append  (cost=0.00..1.91 rows=1 width=45)
       ->  Seq Scan on part_test_122  (cost=0.00..1.91 rows=1 width=45)
             Filter: (crt_time = '2016-10-25 00:00:00'::timestamp without time zone)
    (3 rows)
     
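    The partition's constraint is shown below
    Clearly pg_pathman performs the conversion automatically. With traditional inheritance, a query written as select * from part_test where crt_time = '2016-10-25 00:00:00'::timestamp; cannot filter partitions.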
    postgres=# \d+ part_test_122
                                    Table "public.part_test_122"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
    Check constraints:
        "pathman_part_test_122_3_check" CHECK (get_hash_part_idx(timestamp_hash(crt_time), 128) = 122)
    Inherits: part_test

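Recommendations

1. The partition column must have a NOT NULL constraint
2. Use the non-blocking migration interface
3. Disable the parent table once the data migration is complete
4. pg_pathman does not depend on how the expression is written, so a query such as select * from part_test where crt_time = '2016-10-25 00:00:00'::timestamp; is still routed to a single hash partition.
5. The hash partition column is not limited to int columns; a hash function converts the value automatically.

3. Migrating data to partitions

If the parent table's data was not migrated when the partitions were created, the non-blocking migration interface can move it into the partitions.

The approach is possibly similar to the following (pseudocode):

    with tmp as (delete from parent_table limit xx nowait returning *) insert into partition_table select * from tmp
     
    Or use select array_agg(ctid) from parent_table limit xx for update nowait to mark the rows, then run the delete and the insert.

1. The function interface is:

    partition_table_concurrently(relation   REGCLASS,              -- parent table OID
                                 batch_size INTEGER DEFAULT 1000,  -- number of rows migrated per transaction
                                 sleep_time FLOAT8 DEFAULT 1.0)    -- how long to sleep before retrying when a row lock cannot be acquired; the task exits after 60 retries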
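
The effect of batch_size can be illustrated with a small Python simulation. It only models the batching: each loop iteration stands in for one transaction, and lock handling (sleep_time and the retry limit) is left out. The function `migrate_in_batches` and the `route` callback are hypothetical names for this sketch, not part of pg_pathman.

```python
def migrate_in_batches(parent_rows, route, batch_size=1000):
    """Move rows from parent_rows into per-partition lists, batch by batch.

    route is a hypothetical function mapping a row to a partition name.
    Each loop iteration stands in for one transaction.
    """
    partitions = {}
    batches = 0
    while parent_rows:
        batch = parent_rows[:batch_size]       # rows handled by this "transaction"
        del parent_rows[:batch_size]           # delete them from the parent
        for row in batch:                      # insert each into its partition
            partitions.setdefault(route(row), []).append(row)
        batches += 1
    return partitions, batches


parent = list(range(1, 10001))                 # 10000 rows, as in part_test
parts, batches = migrate_in_batches(parent, lambda r: f"part_{r % 4}", 1000)
print(len(parent), batches, sum(len(v) for v in parts.values()))   # 0 10 10000
```

Smaller batches keep each transaction short, so concurrent sessions wait less on locked rows, at the cost of more transactions overall.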

2. Example

    postgres=# select partition_table_concurrently('part_test'::regclass,
                                 10000,
                                 1.0);
    NOTICE:  worker started, you can stop it with the following command: select stop_concurrent_part_task('part_test');
     partition_table_concurrently
    ------------------------------
     
    (1 row)

3. To stop a migration task, call the following function:

stop_concurrent_part_task(relation REGCLASS)

4. View the background data migration tasks

    postgres=# select * from pathman_concurrent_part_tasks;
     userid | pid | dbid | relid | processed | status
    --------+-----+------+-------+-----------+--------
    (0 rows)

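4. Splitting a range partition

If a partition has grown too large and you want to split it into two, for example, use this method.

Only range-partitioned tables are supported.

    split_range_partition(partition      REGCLASS,            -- partition OID
                          split_value    ANYELEMENT,          -- split value
                          partition_name TEXT DEFAULT NULL)   -- name of the new partition created by the split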
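
What a split does to the bounds and the rows can be modeled in a few lines of Python. This is a sketch under assumptions: ranges are half-open, the split value must lie strictly inside the range, rows are represented by their partition-key values, and `split_range` is a hypothetical name rather than a pg_pathman function.

```python
def split_range(rows, range_min, range_max, split_value):
    """Split [range_min, range_max) at split_value.

    Returns (old_bounds, old_rows, new_bounds, new_rows); rows are plain
    partition-key values in this sketch.
    """
    if not range_min < split_value < range_max:
        raise ValueError("split value must lie strictly inside the range")
    old_rows = [r for r in rows if r < split_value]    # stay in the original partition
    new_rows = [r for r in rows if r >= split_value]   # move to the new partition
    return (range_min, split_value), old_rows, (split_value, range_max), new_rows


old_b, old_r, new_b, new_r = split_range([5, 40, 60, 99], 1, 101, 50)
print(old_b, old_r)   # (1, 50) [5, 40]
print(new_b, new_r)   # (50, 101) [60, 99]
```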

Example

    postgres=# \d+ part_test
                                      Table "public.part_test"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
    Child tables: part_test_1,
                  part_test_10,
                  part_test_11,
                  part_test_12,
                  part_test_13,
                  part_test_14,
                  part_test_15,
                  part_test_16,
                  part_test_17,
                  part_test_18,
                  part_test_19,
                  part_test_2,
                  part_test_20,
                  part_test_21,
                  part_test_22,
                  part_test_23,
                  part_test_24,
                  part_test_3,
                  part_test_4,
                  part_test_5,
                  part_test_6,
                  part_test_7,
                  part_test_8,
                  part_test_9
     
    postgres=# \d+ part_test_1
                                     Table "public.part_test_1"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
    Check constraints:
        "pathman_part_test_1_3_check" CHECK (crt_time >= '2016-10-25 00:00:00'::timestamp without time zone AND crt_time < '2016-11-25 00:00:00'::timestamp without time zone)
    Inherits: part_test

Split

    postgres=# select split_range_partition('part_test_1'::regclass,              -- partition OID
                          '2016-11-10 00:00:00'::timestamp,     -- split value
                          'part_test_1_2');                     -- name of the new partition
                 split_range_partition             
    -----------------------------------------------
     {"2016-10-25 00:00:00","2016-11-25 00:00:00"}
    (1 row)

The two tables after the split:

    postgres=# \d+ part_test_1
                                     Table "public.part_test_1"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
    Check constraints:
        "pathman_part_test_1_3_check" CHECK (crt_time >= '2016-10-25 00:00:00'::timestamp without time zone AND crt_time < '2016-11-10 00:00:00'::timestamp without time zone)
    Inherits: part_test
     
    postgres=# \d+ part_test_1_2
                                    Table "public.part_test_1_2"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
    Check constraints:
        "pathman_part_test_1_2_3_check" CHECK (crt_time >= '2016-11-10 00:00:00'::timestamp without time zone AND crt_time < '2016-11-25 00:00:00'::timestamp without time zone)
    Inherits: part_test

Rows are moved to the new partition automatically:

    postgres=# select count(*) from part_test_1;
     count
    -------
       373
    (1 row)
    postgres=# select count(*) from part_test_1_2;
     count
    -------
       360
    (1 row)

The inheritance hierarchy is now:

    postgres=# \d+ part_test
                                      Table "public.part_test"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
    Child tables: part_test_1,
                  part_test_10,
                  part_test_11,
                  part_test_12,
                  part_test_13,
                  part_test_14,
                  part_test_15,
                  part_test_16,
                  part_test_17,
                  part_test_18,
                  part_test_19,
                  part_test_1_2,    -- the newly added table
                  part_test_2,
                  part_test_20,
                  part_test_21,
                  part_test_22,
                  part_test_23,
                  part_test_24,
                  part_test_3,
                  part_test_4,
                  part_test_5,
                  part_test_6,
                  part_test_7,
                  part_test_8,
                  part_test_9

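5. Merging range partitions

Currently only range partitions are supported.

Call the following interface:

    Specify the two partitions to merge; they must be adjacent
    merge_range_partitions(partition1 REGCLASS, partition2 REGCLASS)    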
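
The adjacency rule can be expressed as a short Python check. This is a sketch, assuming half-open (min, max) ranges where two partitions are adjacent when one ends exactly where the other begins; `merge_ranges` is a hypothetical name, and the error text mirrors the message pg_pathman reports for non-adjacent partitions.

```python
def merge_ranges(first, second):
    """Merge two half-open (min, max) ranges; they must touch end to start."""
    lo, hi = sorted([first, second])           # order by lower bound
    if lo[1] != hi[0]:
        raise ValueError("merge failed, partitions must be adjacent")
    return (lo[0], hi[1])


print(merge_ranges((1, 50), (50, 101)))        # (1, 101)
try:
    merge_ranges((101, 201), (1101, 1201))
except ValueError as err:
    print(err)                                 # merge failed, partitions must be adjacent
```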

Example

    postgres=# select merge_range_partitions('part_test_2'::regclass, 'part_test_12'::regclass) ;
    ERROR:  merge failed, partitions must be adjacent
    CONTEXT:  PL/pgSQL function merge_range_partitions_internal(regclass,regclass,regclass,anyelement) line 27 at RAISE
    SQL statement "SELECT public.merge_range_partitions_internal($1, $2, $3, NULL::timestamp without time zone)"
    PL/pgSQL function merge_range_partitions(regclass,regclass) line 44 at EXECUTE
    -- The two partitions above are not adjacent, so the call fails.
    -- Adjacent partitions can be merged:
    postgres=# select merge_range_partitions('part_test_1'::regclass, 'part_test_1_2'::regclass) ;
     merge_range_partitions
    ------------------------
     
    (1 row)

After the merge, one of the two partition tables is dropped:

    postgres=# \d part_test_1_2
    Did not find any relation named "part_test_1_2".
     
    postgres=# \d part_test_1
                 Table "public.part_test_1"
      Column  |            Type             | Modifiers
    ----------+-----------------------------+-----------
     id       | integer                     |
     info     | text                        |
     crt_time | timestamp without time zone | not null
    Check constraints:
        "pathman_part_test_1_3_check" CHECK (crt_time >= '2016-10-25 00:00:00'::timestamp without time zone AND crt_time < '2016-11-25 00:00:00'::timestamp without time zone)
    Inherits: part_test
     
    postgres=# select count(*) from part_test_1;
     count
    -------
       733
    (1 row)
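
Because merge_range_partitions only accepts adjacent partitions, it can help to list every partition's range before calling it. The query below is a minimal sketch that relies only on standard PostgreSQL catalogs (pg_inherits, pg_class, pg_constraint) and on the fact, visible in the \d output above, that pg_pathman stores each range as a CHECK constraint on the child table. Sorting by the constraint text is an assumption that works here only because every constraint has the same shape and uses ISO-formatted timestamps.

```sql
-- List each child of part_test together with its range CHECK constraint.
SELECT c.oid::regclass               AS partition,
       pg_get_constraintdef(con.oid) AS range_check
FROM pg_inherits i
JOIN pg_class c        ON c.oid = i.inhrelid
JOIN pg_constraint con ON con.conrelid = c.oid
                      AND con.contype = 'c'      -- CHECK constraints only
WHERE i.inhparent = 'part_test'::regclass
ORDER BY range_check;                            -- orders by lower bound for this table
```

Two partitions are adjacent when the upper bound of one equals the lower bound of the next row in this listing.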

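6. Appending a range partition

If the parent table is already partitioned and more partitions are needed later, there are several ways to add them. One is to append a partition at the end of the existing range.

The new partition uses the interval that was specified when the table was first partitioned.

That interval can be looked up for each partitioned table in the pathman_config table: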

    postgres=# select * from pathman_config;
      partrel  | attname  | parttype | range_interval
    -----------+----------+----------+----------------
     part_test | crt_time |        2 | 1 mon
    (1 row)

The append interface also accepts a tablespace:

    append_range_partition(parent         REGCLASS,            -- parent table OID
                           partition_name TEXT DEFAULT NULL,   -- name of the new partition; optional
                           tablespace     TEXT DEFAULT NULL)   -- tablespace for the new partition; optional

Example

    postgres=# select append_range_partition('part_test'::regclass);
     append_range_partition
    ------------------------
     public.part_test_25
    (1 row)
     
    postgres=# \d+ part_test_25
                                    Table "public.part_test_25"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
    Check constraints:
        "pathman_part_test_25_3_check" CHECK (crt_time >= '2018-10-25 00:00:00'::timestamp without time zone AND crt_time < '2018-11-25 00:00:00'::timestamp without time zone)
    Inherits: part_test
     
    postgres=# \d+ part_test_24
                                    Table "public.part_test_24"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
    Check constraints:
        "pathman_part_test_24_3_check" CHECK (crt_time >= '2018-09-25 00:00:00'::timestamp without time zone AND crt_time < '2018-10-25 00:00:00'::timestamp without time zone)
    Inherits: part_test
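
The example above relies on the defaults for both optional arguments. As a sketch of the second parameter shown in the interface, the call below passes an explicit partition name. The name part_test_tail is hypothetical and chosen only for illustration; whether the name must be schema-qualified may depend on the pg_pathman version.

```sql
-- 'part_test_tail' is a hypothetical name used only for this sketch.
-- The third argument (tablespace) is left at its default of NULL.
SELECT append_range_partition('part_test'::regclass,
                              'part_test_tail');
```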

7. Prepending a range partition

Add a partition at the beginning of the existing range.

Interface

    prepend_range_partition(parent         REGCLASS,
                            partition_name TEXT DEFAULT NULL,
                            tablespace     TEXT DEFAULT NULL)

Example

    postgres=# select prepend_range_partition('part_test'::regclass);
     prepend_range_partition
    -------------------------
     public.part_test_26
    (1 row)
     
    postgres=# \d+ part_test_26
                                    Table "public.part_test_26"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
    Check constraints:
        "pathman_part_test_26_3_check" CHECK (crt_time >= '2016-09-25 00:00:00'::timestamp without time zone AND crt_time < '2016-10-25 00:00:00'::timestamp without time zone)
    Inherits: part_test
     
    postgres=# \d+ part_test_1
                                     Table "public.part_test_1"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
    Check constraints:
        "pathman_part_test_1_3_check" CHECK (crt_time >= '2016-10-25 00:00:00'::timestamp without time zone AND crt_time < '2016-11-25 00:00:00'::timestamp without time zone)
    Inherits: part_test

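8. Adding a partition

A partition can also be added by specifying its start and end values explicitly. Creation succeeds as long as the new partition's range does not overlap any existing partition.

In other words, this method does not require the partitions to be contiguous. For example, if the existing partitions cover 2010 through 2015, you can create a partition for 2020 directly, without covering the range from 2015 to 2020.

The interface is as follows:

    add_range_partition(relation       REGCLASS,    -- parent table OID
                        start_value    ANYELEMENT,  -- start value
                        end_value      ANYELEMENT,  -- end value
                        partition_name TEXT DEFAULT NULL,  -- partition name
                        tablespace     TEXT DEFAULT NULL)  -- tablespace in which to create the partition

Example

    postgres=# select add_range_partition('part_test'::regclass,    -- parent table OID
                        '2020-01-01 00:00:00'::timestamp,  -- start value
                        '2020-02-01 00:00:00'::timestamp); -- end value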
     add_range_partition
    ---------------------
     public.part_test_27
    (1 row)
     
    postgres=# \d+ part_test_27
                                    Table "public.part_test_27"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
    Check constraints:
        "pathman_part_test_27_3_check" CHECK (crt_time >= '2020-01-01 00:00:00'::timestamp without time zone AND crt_time < '2020-02-01 00:00:00'::timestamp without time zone)
    Inherits: part_test
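
To confirm that rows are actually routed to the new, non-contiguous partition, one option is to insert a probe row through the parent and read back the system column tableoid, which reports the table that physically holds each row. This is a sketch, not part of the original walkthrough: the id value -1 and the text 'probe' are arbitrary placeholders.

```sql
-- Insert a probe row through the parent table.
INSERT INTO part_test (id, info, crt_time)
VALUES (-1, 'probe', '2020-01-15 00:00:00');

-- tableoid::regclass shows which child table stores the row.
SELECT tableoid::regclass AS stored_in, id, crt_time
FROM part_test
WHERE id = -1;

-- Remove the probe row again.
DELETE FROM part_test WHERE id = -1;
```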

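9. Dropping partitions

1. Dropping a single range partition

The interface is as follows:

    drop_range_partition(partition TEXT,   -- partition name
                         delete_data BOOLEAN DEFAULT TRUE)  -- whether to delete the partition's data; if false, the data is moved to the parent table

    Drop RANGE partition and all of its data if delete_data is true.

Example

    -- Drop the partition and move its data to the parent table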
    postgres=# select drop_range_partition('part_test_1',false);
    NOTICE:  733 rows copied from part_test_1
     drop_range_partition
    ----------------------
     part_test_1
    (1 row)
     
    postgres=# select drop_range_partition('part_test_2',false);
    NOTICE:  720 rows copied from part_test_2
     drop_range_partition
    ----------------------
     part_test_2
    (1 row)
     
    postgres=# select count(*) from part_test;
     count
    -------
     10000
    (1 row)
     
    -- Drop the partition together with its data; nothing is moved to the parent table
    postgres=# select drop_range_partition('part_test_3',true);
     drop_range_partition
    ----------------------
     part_test_3
    (1 row)
     
    postgres=# select count(*) from part_test;
     count
    -------
      9256
    (1 row)
     
    postgres=# select count(*) from only part_test;
     count
    -------
      1453
    (1 row)

2. Dropping all partitions, specifying whether the data is moved to the parent table

The interface is as follows:

    drop_partitions(parent      REGCLASS,
                    delete_data BOOLEAN DEFAULT FALSE)
     
    Drop partitions of the parent table (both foreign and local relations).
    If delete_data is false, the data is copied to the parent table first.
    Default is false.

Example

    postgres=# select drop_partitions('part_test'::regclass, false);  -- drop all partitions and move their data to the parent table
    NOTICE:  function public.part_test_upd_trig_func() does not exist, skipping
    NOTICE:  744 rows copied from part_test_4
    NOTICE:  672 rows copied from part_test_5
    NOTICE:  744 rows copied from part_test_6
    NOTICE:  720 rows copied from part_test_7
    NOTICE:  744 rows copied from part_test_8
    NOTICE:  720 rows copied from part_test_9
    NOTICE:  744 rows copied from part_test_10
    NOTICE:  744 rows copied from part_test_11
    NOTICE:  720 rows copied from part_test_12
    NOTICE:  744 rows copied from part_test_13
    NOTICE:  507 rows copied from part_test_14
    NOTICE:  0 rows copied from part_test_15
    NOTICE:  0 rows copied from part_test_16
    NOTICE:  0 rows copied from part_test_17
    NOTICE:  0 rows copied from part_test_18
    NOTICE:  0 rows copied from part_test_19
    NOTICE:  0 rows copied from part_test_20
    NOTICE:  0 rows copied from part_test_21
    NOTICE:  0 rows copied from part_test_22
    NOTICE:  0 rows copied from part_test_23
    NOTICE:  0 rows copied from part_test_24
    NOTICE:  0 rows copied from part_test_25
    NOTICE:  0 rows copied from part_test_26
    NOTICE:  0 rows copied from part_test_27
     drop_partitions
    -----------------
                  24
    (1 row)
     
    postgres=# select count(*) from part_test;
     count
    -------
      9256
    (1 row)
     
    postgres=# \dt part_test_4
    No matching relations found.

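10. Attaching a partition (adding an existing table to a partitioned table)

Attach an existing table to an existing partitioned parent table.

The existing table must have exactly the same structure as the parent, including dropped columns. (Check that the two tables' pg_attribute entries are consistent.)

If a callback function has been set, it is invoked.

The interface is as follows:

    attach_range_partition(relation    REGCLASS,    -- parent table OID
                           partition   REGCLASS,    -- partition table OID
                           start_value ANYELEMENT,  -- start value
                           end_value   ANYELEMENT)  -- end value

Example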

    postgres=# create table part_test_1 (like part_test including all);
    CREATE TABLE
    postgres=# \d+ part_test
                                      Table "public.part_test"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
     
    postgres=# \d+ part_test_1
                                     Table "public.part_test_1"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
     
    postgres=# select attach_range_partition('part_test'::regclass, 'part_test_1'::regclass, '2019-01-01 00:00:00'::timestamp, '2019-02-01 00:00:00'::timestamp);
     attach_range_partition
    ------------------------
     part_test_1
    (1 row)
     
    -- Attaching the partition automatically creates
    -- the inheritance relationship and the check constraint
    postgres=# \d+ part_test_1
                                     Table "public.part_test_1"
      Column  |            Type             | Modifiers | Storage  | Stats target | Description
    ----------+-----------------------------+-----------+----------+--------------+-------------
     id       | integer                     |           | plain    |              |
     info     | text                        |           | extended |              |
     crt_time | timestamp without time zone | not null  | plain    |              |
    Check constraints:
        "pathman_part_test_1_3_check" CHECK (crt_time >= '2019-01-01 00:00:00'::timestamp without time zone AND crt_time < '2019-02-01 00:00:00'::timestamp without time zone)
    Inherits: part_test
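
Two checks can be run before calling attach_range_partition. The sketch below uses only standard SQL and the pg_attribute catalog. It assumes the table names and the range from the example above. The second check also assumes that attaching fails when existing rows fall outside the new range; this seems likely, since a CHECK constraint is added, but the source does not confirm it.

```sql
-- 1. Compare the column layout of both tables, including dropped columns.
--    Every row returned is a mismatch; an empty result means the layouts agree.
SELECT attnum,
       p.attname      AS parent_col,     c.attname      AS child_col,
       p.atttypid     AS parent_type,    c.atttypid     AS child_type,
       p.attisdropped AS parent_dropped, c.attisdropped AS child_dropped
FROM (SELECT * FROM pg_attribute
      WHERE attrelid = 'part_test'::regclass   AND attnum > 0) p
FULL JOIN
     (SELECT * FROM pg_attribute
      WHERE attrelid = 'part_test_1'::regclass AND attnum > 0) c
USING (attnum)
WHERE p.attname      IS DISTINCT FROM c.attname
   OR p.atttypid     IS DISTINCT FROM c.atttypid
   OR p.attisdropped IS DISTINCT FROM c.attisdropped;

-- 2. Count rows in the candidate table that lie outside the range to attach.
SELECT count(*) AS rows_outside_range
FROM part_test_1
WHERE crt_time <  '2019-01-01 00:00:00'::timestamp
   OR crt_time >= '2019-02-01 00:00:00'::timestamp;
```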

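11. Detaching a partition (turning a partition into an ordinary table)

Detaching removes the partition from the parent table's inheritance hierarchy. The data is kept; only the inheritance relationship and the check constraint are dropped.

The interface is as follows:

    detach_range_partition(partition REGCLASS)  -- the partition to convert into an ordinary table

Example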

    postgres=# select count(*) from part_test;
     count
    -------
      9256
    (1 row)
    postgres=# select count(*) from part_test_2;
     count
    -------
       733
    (1 row)
     
    postgres=# select detach_range_partition('part_test_2');
     detach_range_partition
    ------------------------
     part_test_2
    (1 row)
     
    postgres=# select count(*) from part_test_2;
     count
    -------
       733
    (1 row)
    postgres=# select count(*) from part_test;
     count
    -------
      8523
    (1 row)
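
Besides comparing row counts, the effect of the detach can be checked in the catalogs. This sketch uses only standard PostgreSQL catalogs. If the inheritance relationship and the check constraint were removed as described, both queries should come back empty for part_test_2, unless the table carries other CHECK constraints of its own.

```sql
-- Parents that part_test_2 still inherits from.
SELECT inhparent::regclass AS parent
FROM pg_inherits
WHERE inhrelid = 'part_test_2'::regclass;

-- CHECK constraints that remain on part_test_2.
SELECT conname, pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = 'part_test_2'::regclass
  AND contype = 'c';
```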

The source of the interface function:

    postgres=# \sf detach_range_partition
    CREATE OR REPLACE FUNCTION public.detach_range_partition(partition regclass)
     RETURNS text
     LANGUAGE plpgsql
    AS $function$
    DECLARE
            v_attname               TEXT;
            parent_relid    REGCLASS;
     
    BEGIN
            parent_relid := public.get_parent_of_partition(partition);
     
            /* Acquire lock on parent */
            PERFORM public.lock_partitioned_relation(parent_relid);
     
            v_attname := attname
            FROM public.pathman_config
            WHERE partrel = parent_relid;
     
            IF v_attname IS NULL THEN
                    RAISE EXCEPTION 'table "%" is not partitioned', parent_relid::TEXT;
            END IF;
     
            /* Remove inheritance */
            EXECUTE format('ALTER TABLE %s NO INHERIT %s',
                                       partition::TEXT,
                                       parent_relid::TEXT);
     
            /* Remove check constraint */
            EXECUTE format('ALTER TABLE %s DROP CONSTRAINT %s',
                                       partition::TEXT,
                                       public.build_check_constraint_name(partition, v_attname));
     
            /* Invalidate cache */
            PERFORM public.on_update_partitions(parent_relid);
     
            RETURN partition;
    END
    $function$

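12. Update triggers

An update trigger is needed only if the partitioning column may be updated; otherwise it is unnecessary.

The interface functions are as follows: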

    create_hash_update_trigger(parent REGCLASS)
     
    Creates the trigger on UPDATE for HASH partitions.
    The UPDATE trigger isn't created by default because of the overhead.
    It's useful in cases when the key attribute might change.
     
    create_range_update_trigger(parent REGCLASS)
     
    Same as above, but for a RANGE-partitioned table.

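Example

Before the update trigger is created, an update that moves the partitioning column's value into another partition's range fails with a check constraint violation.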

    postgres=# select * from part_test_3 limit 10;
     id  |               info               |          crt_time          
    -----+----------------------------------+----------------------------
     734 | 52288de52fccf3d47efe897e1320a0fd | 2016-11-25 00:11:34.113856
     735 | 16f4fffda933356192af8d1991c673cf | 2016-11-25 01:11:34.113862
     736 | 08ec10184500ef43a6efde38dc43df33 | 2016-11-25 02:11:34.113867
     737 | e658c7fb7f44ae3145401bf348cfa9dd | 2016-11-25 03:11:34.113872
     738 | 81ff4c5cb3404230341aa95c28f86931 | 2016-11-25 04:11:34.113877
     739 | 931652d6ba49f8155b1486d30fd23bab | 2016-11-25 05:11:34.113883
     740 | c616c01d98016ff0022aa5449d53ca8f | 2016-11-25 06:11:34.113888
     741 | 358e44b68259587233a0f571e8a86a81 | 2016-11-25 07:11:34.113893
     742 | 719bb75e67c23c1f76e4eb81cb22004e | 2016-11-25 08:11:34.113899
     743 | 1fc90c401eec2927fe9bb726651e4936 | 2016-11-25 09:11:34.113904
    (10 rows)
     
    postgres=# update part_test set crt_time='2016-01-25 00:11:34.113856' where id=734;
    ERROR:  new row for relation "part_test_3" violates check constraint "pathman_part_test_3_3_check"
    DETAIL:  Failing row contains (734, 52288de52fccf3d47efe897e1320a0fd, 2016-01-25 00:11:34.113856).

After the update trigger is created, the same update succeeds:

    postgres=# select create_range_update_trigger('part_test'::regclass);
      create_range_update_trigger   
    --------------------------------
     public.part_test_upd_trig_func
    (1 row)
     
    postgres=# update part_test set crt_time='2016-01-25 00:11:34.113856' where id=734;
    UPDATE 0
    postgres=# select * from part_test where id=734;
     id  |               info               |          crt_time          
    -----+----------------------------------+----------------------------
     734 | 52288de52fccf3d47efe897e1320a0fd | 2016-01-25 00:11:34.113856
    (1 row)
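
To see where the row lives after the update, the system column tableoid can be selected alongside the data. This is a small sketch using only standard PostgreSQL features and the id from the example above. The command tag above reads UPDATE 0 even though the row changed. A plausible explanation, which the source does not confirm, is that the trigger relocates the row to another partition and suppresses the original in-place update.

```sql
-- tableoid::regclass reports the child table that now stores the row.
SELECT tableoid::regclass AS stored_in, id, crt_time
FROM part_test
WHERE id = 734;
```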

As a rule, application design should not allow the partitioning column to be changed.
 
