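Expanding Greenplum segments falls into two categories: vertical expansion (adding segments to existing hosts) and horizontal expansion (adding new hosts to the cluster).
The steps for both approaches are described in detail below.
First, let's look at the gpexpand command. Its basic usage is as follows: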
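gpexpand
[-f hosts_file]
| -i input_file [-B batch_size] [-V] [-t segment_tar_dir] [-S]
| {-d hh:mm:ss | -e 'YYYY-MM-DD hh:mm:ss'}
[-a | --analyze] [-n parallel_processes]
| -r | --rollback
| -c | --clean
[-D database_name] [-v | --verbose]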
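Command options:
-B
The number of ssh commands to run concurrently. The default is 16 and the valid range is 1-128.
The default is usually sufficient.
-D
Specifies the database in which to create the expansion schema and tables. If this option is not given, the PGDATABASE environment variable is used. The template1 and template0 databases cannot be used.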
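-i | --input
Specifies the name of the expansion configuration file. Each line of the file has the format:
hostname:address:port:segment data directory (must be a full path):dbid:content ID:role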
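-n
The number of tables to redistribute simultaneously. Valid values are 1-16. Each table being redistributed needs two database connections, so before setting this value make sure that max_connections is large enough to accommodate it.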
-S | --simple_progress
Simple progress mode: only minimal progress information is recorded in gpexpand.expansion_progress.
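-t | --tardir
Specifies the temporary directory on the segment hosts in which to place the tar file.
-v | --verbose
Debug mode: prints verbose output, including the DDL and DML used to expand the database.
-V | --novacuum
Does not vacuum the catalog tables before creating the schema copy.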
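The -n constraint can be checked with simple arithmetic before a redistribution run. The following is a minimal sketch, assuming the two-connections-per-table rule stated above; the function names and the optional headroom argument are hypothetical helpers, not part of gpexpand.

```shell
# Connections consumed when redistributing N tables in parallel
# (two connections per table, per the -n description above).
required_connections() {
  echo $(( $1 * 2 ))
}

# Succeeds when max_connections can accommodate the chosen -n value.
# $1 = -n value, $2 = max_connections, $3 = optional headroom for other sessions
# (the headroom argument is an assumption made for illustration).
fits_max_connections() {
  needed=$(required_connections "$1")
  headroom=${3:-0}
  [ $(( needed + headroom )) -le "$2" ]
}

# Usage sketch. On a live cluster the limit could be read with:
#   psql -At -d postgres -c 'SHOW max_connections'
if fits_max_connections 16 250 10; then
  echo "-n 16 fits"
else
  echo "-n 16 is too high"
fi
```

With -n 16 the run needs 32 connections, so a max_connections of 250 leaves ample room even with 10 connections held back for other sessions.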
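The original cluster has one master node and two data nodes (segment hosts), each running 4 primary segments (mirrors were left out to keep the output readable). The details are as follows: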
postgres=# SELECT * from gp_segment_configuration ;
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+----------+---------+------------------+------------
1 | -1 | p | p | s | u | 5432 | gw_mdw1 | gw_mdw1 | |
2 | 0 | p | p | s | u | 40000 | gw_sdw1 | gw_sdw1 | |
6 | 4 | p | p | s | u | 40000 | gw_sdw2 | gw_sdw2 | |
3 | 1 | p | p | s | u | 40001 | gw_sdw1 | gw_sdw1 | |
7 | 5 | p | p | s | u | 40001 | gw_sdw2 | gw_sdw2 | |
4 | 2 | p | p | s | u | 40002 | gw_sdw1 | gw_sdw1 | |
8 | 6 | p | p | s | u | 40002 | gw_sdw2 | gw_sdw2 | |
5 | 3 | p | p | s | u | 40003 | gw_sdw1 | gw_sdw1 | |
9 | 7 | p | p | s | u | 40003 | gw_sdw2 | gw_sdw2 | |
(9 rows)
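Before expanding, it helps to confirm how many primaries each host carries. The sketch below summarises unaligned psql output (as produced by `psql -At`); it assumes the column order shown above, and `primaries_per_host` is a hypothetical helper name.

```shell
# Count primary segments per host from `psql -At` output of
# gp_segment_configuration (fields are '|'-separated, in the order shown above:
# dbid, content, role, ..., hostname as the 8th field).
# The master (content = -1) is skipped.
primaries_per_host() {
  awk -F'|' '$2 >= 0 && $3 == "p" { n[$8]++ }
             END { for (h in n) print h, n[h] }' | LC_ALL=C sort
}

# Usage sketch on a live cluster:
#   psql -At -d postgres -c 'SELECT * FROM gp_segment_configuration' | primaries_per_host
```

For the cluster above this would report 4 primaries on each of gw_sdw1 and gw_sdw2.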
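Now we expand the cluster by adding one more segment on each host; put simply, this is vertical expansion.
Specify a database; the expansion schema will be created in it.
Next, create a file named seg_hosts that lists the hostnames of all segment hosts.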
[gpadmin@gw_mdw1 ~]$ cat seg_hosts
gw_sdw1
gw_sdw2
Run the command to generate the input file:
[gpadmin@gw_mdw1 ~]$ gpexpand -f seg_hosts -D test
20190327:23:18:01:007122 gpexpand:gw_mdw1:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.1.0 build 6'
20190327:23:18:01:007122 gpexpand:gw_mdw1:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.1.0 build 6) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Jun 11 2014 17:23:40'
20190327:23:18:01:007122 gpexpand:gw_mdw1:gpadmin-[INFO]:-Querying gpexpand schema for current expansion state
System Expansion is used to add segments to an existing GPDB array.
gpexpand did not detect a System Expansion that is in progress.
Before initiating a System Expansion, you need to provision and burn-in
the new hardware. Please be sure to run gpcheckperf/gpcheckos to make
sure the new hardware is working properly.
Please refer to the Admin Guide for more information.
Would you like to initiate a new System Expansion Yy|Nn (default=N):
> y <------ confirm the expansion
How many new primary segments per host do you want to add? (default=0):
> 1 <------ add 1 segment on each host
Enter new primary data directory 1:
> /data/primary <------ directory that will hold the new segments
Generating configuration file...
20190327:23:19:03:007122 gpexpand:gw_mdw1:gpadmin-[INFO]:-Generating input file...
Input configuration files were written to 'gpexpand_inputfile_20190327_231903' and 'None'.
Please review the file and make sure that it is correct then re-run
with: gpexpand -i gpexpand_inputfile_20190327_231903 -D test
20190327:23:19:03:007122 gpexpand:gw_mdw1:gpadmin-[INFO]:-Exiting...
The input file contains:
[gpadmin@gw_mdw1 ~]$ cat gpexpand_inputfile_20190327_231903
gw_sdw1:gw_sdw1:40004:/data/primary/gpseg8:10:8:p
gw_sdw2:gw_sdw2:40004:/data/primary/gpseg9:11:9:p
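Because this file may be edited by hand, a quick sanity check before passing it to gpexpand -i can catch malformed lines. This is a sketch assuming the seven colon-separated fields seen above (hostname, address, port, data directory, dbid, content, role); `validate_expand_line` is a hypothetical helper.

```shell
# Succeeds when a line looks like a well-formed entry of the input file:
# seven ':'-separated fields, numeric port/dbid/content, an absolute
# data directory, and a role of p (primary) or m (mirror).
validate_expand_line() {
  printf '%s\n' "$1" | awk -F: '
    NF != 7                 { exit 1 }
    $3 !~ /^[0-9]+$/        { exit 1 }   # port
    $4 !~ /^\//             { exit 1 }   # data directory must be a full path
    $5 !~ /^[0-9]+$/        { exit 1 }   # dbid
    $6 !~ /^[0-9]+$/        { exit 1 }   # content
    $7 != "p" && $7 != "m"  { exit 1 }   # role
  '
}

# Usage sketch:
#   while IFS= read -r line; do
#     validate_expand_line "$line" || echo "bad line: $line"
#   done < gpexpand_inputfile_20190327_231903
```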
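If the data directory should be different, edit this file by hand so that each new segment is placed where you want it. Then initialize the expansion with the input file: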
[gpadmin@gw_mdw1 ~]$ gpexpand -i gpexpand_inputfile_20190327_231903 -D test
20190327:23:20:19:007227 gpexpand:gw_mdw1:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.1.0 build 6'
20190327:23:20:19:007227 gpexpand:gw_mdw1:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.1.0 build 6) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Jun 11 2014 17:23:40'
... (output omitted)
20190327:23:22:19:007227 gpexpand:gw_mdw1:gpadmin-[INFO]:-************************************************
20190327:23:22:19:007227 gpexpand:gw_mdw1:gpadmin-[INFO]:-Initialization of the system expansion complete.
20190327:23:22:19:007227 gpexpand:gw_mdw1:gpadmin-[INFO]:-To begin table expansion onto the new segments
20190327:23:22:19:007227 gpexpand:gw_mdw1:gpadmin-[INFO]:-rerun gpexpand
20190327:23:22:19:007227 gpexpand:gw_mdw1:gpadmin-[INFO]:-************************************************
20190327:23:22:19:007227 gpexpand:gw_mdw1:gpadmin-[INFO]:-Exiting...
Querying the segment configuration now shows that two segments have been added.
postgres=# SELECT * from gp_segment_configuration ;
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+----------+---------+------------------+------------
1 | -1 | p | p | s | u | 5432 | gw_mdw1 | gw_mdw1 | |
2 | 0 | p | p | s | u | 40000 | gw_sdw1 | gw_sdw1 | |
6 | 4 | p | p | s | u | 40000 | gw_sdw2 | gw_sdw2 | |
3 | 1 | p | p | s | u | 40001 | gw_sdw1 | gw_sdw1 | |
7 | 5 | p | p | s | u | 40001 | gw_sdw2 | gw_sdw2 | |
4 | 2 | p | p | s | u | 40002 | gw_sdw1 | gw_sdw1 | |
8 | 6 | p | p | s | u | 40002 | gw_sdw2 | gw_sdw2 | |
5 | 3 | p | p | s | u | 40003 | gw_sdw1 | gw_sdw1 | |
9 | 7 | p | p | s | u | 40003 | gw_sdw2 | gw_sdw2 | |
10 | 8 | p | p | s | u | 40004 | gw_sdw1 | gw_sdw1 | |
11 | 9 | p | p | s | u | 40004 | gw_sdw2 | gw_sdw2 | |
(11 rows)
Run the redistribution command to redistribute the data:
[gpadmin@gw_mdw1 ~]$ gpexpand -a -d 1:00:00 -D test
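The -d value is a duration in hh:mm:ss, so 1:00:00 limits this redistribution session to one hour. When planning a maintenance window in a script, it can be handy to convert such a value to seconds; a minimal sketch, where `duration_to_seconds` is a hypothetical helper:

```shell
# Convert an hh:mm:ss duration (the format accepted by gpexpand -d)
# into seconds. Fails if the value does not have three ':'-separated parts.
duration_to_seconds() {
  printf '%s\n' "$1" | awk -F: '
    NF == 3 { print $1 * 3600 + $2 * 60 + $3; ok = 1 }
    END     { exit !ok }
  '
}

# Usage sketch:
#   budget=$(duration_to_seconds 1:00:00)
```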
The expansion is not finished at this point. Check the redistribution status in the specified database:
test=# select * from gpexpand.status;
status | updated
------------+----------------------------
SETUP | 2019-03-27 23:21:49.299565
SETUP DONE | 2019-03-27 23:21:53.737479
(2 rows)
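The most recent row of gpexpand.status shows where the expansion stands. The following sketch extracts it from `psql -At` output, assuming the two columns shown above; `latest_expansion_status` is a hypothetical helper name.

```shell
# Print the status with the newest timestamp from '|'-separated
# "status|updated" rows, e.g. the output of:
#   psql -At -d test -c 'SELECT status, updated FROM gpexpand.status'
# Timestamps in this fixed format sort correctly as plain strings.
latest_expansion_status() {
  LC_ALL=C sort -t'|' -k2,2 | tail -n 1 | cut -d'|' -f1
}
```

For the two rows above, the helper prints SETUP DONE.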
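To see how many tables still need to be redistributed, query gpexpand.expansion_progress. Here everything has already been redistributed, because this is a test environment with hardly any data: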
test=# select * from gpexpand.expansion_progress ;
name | value
------------------------------+-------
Estimated Expansion Rate |
Estimated Time to Completion |
(2 rows)
Once redistribution is complete, remove the expansion schema:
[gpadmin@gw_mdw1 ~]$ gpexpand -c -D test
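This time we not only add one segment on each host but also add a data node (that is, a new machine).
We add a new machine, gw_sdw3, to the host list. It is a freshly added node with only the software installed and no data.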
[gpadmin@gw_mdw1 ~]$ cat seg_hosts
gw_sdw1
gw_sdw2
gw_sdw3
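Proceed in the same way, again choosing to add 1 segment per host. If you only want segments on the newly added machine and not on every host, enter 0 when asked how many segments to add.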
[gpadmin@gw_mdw1 ~]$ gpexpand -f seg_hosts -D test
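The generated file is shown below.
Although only one segment per host was requested, gpexpand sees that gw_sdw3 has no segments yet and automatically adds 6 to it in one go, so that it ends up with the same number as the other two hosts.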
[gpadmin@gw_mdw1 ~]$ cat gpexpand_inputfile_20190328_014748
gw_sdw3:gw_sdw3:40000:/data/primary/gpseg10:12:10:p
gw_sdw3:gw_sdw3:40001:/data/primary/gpseg11:13:11:p
gw_sdw3:gw_sdw3:40002:/data/primary/gpseg12:14:12:p
gw_sdw3:gw_sdw3:40003:/data/primary/gpseg13:15:13:p
gw_sdw3:gw_sdw3:40004:/data/primary/gpseg14:16:14:p
gw_sdw1:gw_sdw1:40005:/data/primary/gpseg15:17:15:p
gw_sdw2:gw_sdw2:40005:/data/primary/gpseg16:18:16:p
gw_sdw3:gw_sdw3:40005:/data/primary/gpseg17:19:17:p
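The per-host totals in the generated file can be tallied to confirm this. A small sketch, assuming the colon-separated format shown above; `new_segments_per_host` is a hypothetical helper name.

```shell
# Count how many new segments the input file assigns to each host.
# Reads the file(s) named as arguments, or standard input if none are given.
new_segments_per_host() {
  awk -F: 'NF { n[$1]++ } END { for (h in n) print h, n[h] }' "$@" | LC_ALL=C sort
}

# Usage sketch:
#   new_segments_per_host gpexpand_inputfile_20190328_014748
```

For the file above this yields 1 for gw_sdw1, 1 for gw_sdw2 and 6 for gw_sdw3: five segments to bring the new host level with the existing ones, plus the one requested for every host.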
Run the expansion command, the same as before:
[gpadmin@gw_mdw1 ~]$ gpexpand -i gpexpand_inputfile_20190328_014748 -D test
20190328:23:50:44:001498 gpexpand:gw_mdw1:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.1.0 build 6'
20190328:23:50:44:001498 gpexpand:gw_mdw1:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.1.0 build 6) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Jun 11 2014 17:23:40'
... (output omitted)
20190328:23:54:59:001498 gpexpand:gw_mdw1:gpadmin-[INFO]:-Starting Greenplum Database
20190328:23:55:32:001498 gpexpand:gw_mdw1:gpadmin-[INFO]:-************************************************
20190328:23:55:32:001498 gpexpand:gw_mdw1:gpadmin-[INFO]:-Initialization of the system expansion complete.
20190328:23:55:32:001498 gpexpand:gw_mdw1:gpadmin-[INFO]:-To begin table expansion onto the new segments
20190328:23:55:32:001498 gpexpand:gw_mdw1:gpadmin-[INFO]:-rerun gpexpand
20190328:23:55:32:001498 gpexpand:gw_mdw1:gpadmin-[INFO]:-************************************************
20190328:23:55:32:001498 gpexpand:gw_mdw1:gpadmin-[INFO]:-Exiting...
The segment configuration now shows 8 additional segments: 2 added vertically and 6 added horizontally.
postgres=# SELECT * from gp_segment_configuration ;
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+----------+---------+------------------+------------
1 | -1 | p | p | s | u | 5432 | gw_mdw1 | gw_mdw1 | |
2 | 0 | p | p | s | u | 40000 | gw_sdw1 | gw_sdw1 | |
6 | 4 | p | p | s | u | 40000 | gw_sdw2 | gw_sdw2 | |
3 | 1 | p | p | s | u | 40001 | gw_sdw1 | gw_sdw1 | |
7 | 5 | p | p | s | u | 40001 | gw_sdw2 | gw_sdw2 | |
4 | 2 | p | p | s | u | 40002 | gw_sdw1 | gw_sdw1 | |
8 | 6 | p | p | s | u | 40002 | gw_sdw2 | gw_sdw2 | |
5 | 3 | p | p | s | u | 40003 | gw_sdw1 | gw_sdw1 | |
9 | 7 | p | p | s | u | 40003 | gw_sdw2 | gw_sdw2 | |
10 | 8 | p | p | s | u | 40004 | gw_sdw1 | gw_sdw1 | |
12 | 10 | p | p | s | u | 40000 | gw_sdw3 | gw_sdw3 | |
11 | 9 | p | p | s | u | 40004 | gw_sdw2 | gw_sdw2 | |
13 | 11 | p | p | s | u | 40001 | gw_sdw3 | gw_sdw3 | |
14 | 12 | p | p | s | u | 40002 | gw_sdw3 | gw_sdw3 | |
15 | 13 | p | p | s | u | 40003 | gw_sdw3 | gw_sdw3 | |
16 | 14 | p | p | s | u | 40004 | gw_sdw3 | gw_sdw3 | |
17 | 15 | p | p | s | u | 40005 | gw_sdw1 | gw_sdw1 | |
18 | 16 | p | p | s | u | 40005 | gw_sdw2 | gw_sdw2 | |
19 | 17 | p | p | s | u | 40005 | gw_sdw3 | gw_sdw3 | |
(19 rows)
Run the table redistribution command:
[gpadmin@gw_mdw1 ~]$ gpexpand -a -d 1:00:00 -D test
20190329:00:00:56:002723 gpexpand:gw_mdw1:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.1.0 build 6'
20190329:00:00:56:002723 gpexpand:gw_mdw1:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.1.0 build 6) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Jun 11 2014 17:23:40'
20190329:00:00:56:002723 gpexpand:gw_mdw1:gpadmin-[INFO]:-Querying gpexpand schema for current expansion state
20190329:00:01:03:002723 gpexpand:gw_mdw1:gpadmin-[INFO]:-EXPANSION COMPLETED SUCCESSFULLY
20190329:00:01:03:002723 gpexpand:gw_mdw1:gpadmin-[INFO]:-Exiting...
Check the redistribution status in the specified database:
test=# select * from gpexpand.status;
status | updated
--------------------+----------------------------
SETUP | 2019-03-28 23:54:39.325544
SETUP DONE | 2019-03-28 23:54:48.213527
EXPANSION STARTED | 2019-03-29 00:00:58.467156
EXPANSION COMPLETE | 2019-03-29 00:00:58.979106
(4 rows)
Once redistribution is complete, remove the expansion schema:
[gpadmin@gw_mdw1 ~]$ gpexpand -c -D test
This completes the walkthrough of both expansion methods.