使用开源组件mysql2pgsql进行mysql到GP的数据迁移:
下载 mysql2pgsql
下载地址
配置mysql和gp连接信息配置文件,参考
配置文件
以迁移表potluck为例
此处测试版本为mysql 5.7,mysql当前最新版本为8.0
A.全局数据迁移
mysql -u root -p 登录数据库
进入test1数据库 use test1,并找到要迁移的表信息:
MariaDB [test1]> DESCRIBE potluck;
±------------±------------±-----±----±--------±---------------+
| Field | Type | Null | Key | Default | Extra |
±------------±------------±-----±----±--------±---------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(20) | YES | | NULL | |
| food | varchar(30) | YES | | NULL | |
| confirmed | char(1) | YES | | NULL | |
| signup_date | date | YES | | NULL | |
±------------±------------±-----±----±--------±---------------+
MariaDB [test1]> select * from potluck;
±—±------±---------------±----------±------------+
| id | name | food | confirmed | signup_date |
±—±------±---------------±----------±------------+
| 1 | John | Casserole | Y | 2012-04-11 |
| 2 | Sandy | Key Lime Tarts | N | 2012-04-14 |
| 3 | Tom | BBQ | Y | 2012-04-18 |
| 4 | Tina | Salad | Y | 2012-04-10 |
±—±------±---------------±----------±------------+
4 rows in set (0.00 sec)
[gpadmin@vmcentos7 bin]$ cat my.cfg
–mysql源端的连接信息
[src.mysql]
host = “10.132.17.64”
port = “3306”
user = “root”
password = “changeme”
db = “test1”
encodingdir = “share”
encoding = “utf8”
binlogfile = “mysql-bin.000001”
binlogfile_offset = “4”
serverid = 1
[src.pgsql]
connect_string = “host=10.132.17.65 dbname=gpnrt port=5432 user=gpadmin password=gpadmin”
[local.pgsql]
connect_string = “dbname=test port=5888 user=test password=pgsql”
–Greenplum目标端的连接信息
[desc.pgsql]
connect_string = “host=10.132.17.65 dbname=gpnrt port=5432 user=gpadmin password=gpadmin”
target_schema = “public”
ignore_copy_error_count_each_table = “0”
[binlogloader]
loader_table_list = “loader_table_list.txt”
load_batch = 10
load_batch_gap = 10
[gpadmin@vmcentos7 bin]$ cat loader_table_list.txt
potluck
CREATE TABLE public.potluck (id int4, name text, food text, confirmed text, signup_date timestamp) with (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zlib, COMPRESSLEVEL=1, BLOCKSIZE=1048576, OIDS=false) DISTRIBUTED BY ();
– Number of tables: 1
添加分布键语句并且在greenplum里执行DDL建表
CREATE TABLE public.potluck (id int4, name text, food text, confirmed text, signup_date timestamp) with (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zlib, COMPRESSLEVEL=1, BLOCKSIZE=1048576, OIDS=false) DISTRIBUTED BY (id);
test1
.potluck
thread 0 migrate task 0 table test1.potluck 4 rows complete, time cost 71.990 ms
Number of rows migrated: 4 (number of source tables’ rows: 4)
Data sync time cost 184.033 ms
gpadmin=# select * from potluck;
id | name | food | confirmed | signup_date
----±------±---------------±----------±--------------------
2 | Sandy | Key Lime Tarts | N | 2012-04-14 00:00:00
4 | Tina | Salad | Y | 2012-04-10 00:00:00
1 | John | Casserole | Y | 2012-04-11 00:00:00
3 | Tom | BBQ | Y | 2012-04-18 00:00:00
(4 rows)
B.增量数据迁移
增量数据迁移,需要使用到binlog_loader和binlog_miner两个免费组件。
首先需要在mysql里设置一下参数使用row格式的binlog以备解析:
binlog_format = ROW
在客户端主机(也可以部署在其他主机)上启动一个临时 PG 数据库,用于临时存放从MySQL 拉去到的 binlog 数据。
binlog_miner 从源 MySQL 的一个 binlog 文件开始,拉取和解析 binlog 并存放到临时 PG中。
binlog_loader 从临时 PG 中读取增量数据,并做适当的处理,最终批量写入到目标PostgreSQL 或 Greenplum 中去。
这里特别需要配置的
my.cnf
[src.mysql]
host = “192.168.1.1”
port = “3301”
user = “test”
password = “123456”
db = “test”
encodingdir = “share”
encoding = “utf8”
binlogfile = “mysql-bin.000001” --binlog的名称
binlogfile_offset = “4” --binlog中的记录偏移指针,一般从4开始
临时数据库的信息:
my.cnf
[local.pgsql]
connect_string = “dbname=test port=5432 user=test password=pgsql”
CREATE TABLE public.potluck (id int4 PRIMARY KEY, name text, food text, confirmed text, signup_date timestamp) DISTRIBUTED BY (id);
这样所有在mysql中对指定表的各类数据变化都会被同步到目标DB中,包括对应行的 insert update delete等。
5.这两个同步进程所有进行过的同步操作都会在Postgres中间数据中进行记录,及时拉取和写入进程由于异常出现了重启等问题,该程序仍然可以断点续传持续对mysql和GPDB之间的表进行同步操作。
C. 在GPDB中通过外部表映射直接访问mysql中的数据表
通过GPDB提供的外部表功能和gpfdist的transform功能来直接将mysql中的表映射为GPDB的外表具体步骤如下:
编写脚本通过mysql命令输出对应mysql中表的内容:
[gpadmin@vmcentos7 mysql]$ cat mysql2gp.sh
#/bin/bash
PARA=$1
TBNM=${PARA##*/}
cd $(cd "$(dirname "$0")"; pwd)
rm -f /mysql/${TBNM}.csv
. ~/.bashrc
echo "select * from ${TBNM} into outfile '/mysql/${TBNM}.csv' fields terminated by ',' lines terminated by '\r\n'" > ${TBNM}.sql
mysql -uroot -pchangeme test1 < ${TBNM}.sql
cat /mysql/${TBNM}.csv
rm -f /mysql/${TBNM}.csv
VERTION: 1.0
TRANSFORMATIONS:
mysql:
TYPE: input
CONTENT: data
COMMAND: /bin/bash /mysql/mysql2gp.sh %filename%
在GPDB中创建响应外部表:
create external table ext_mysql_potluck( like potluck)
location( ‘gpfdist://192.168.234.130:8080/potluck#transform=mysql’) FORMAT ‘CSV’ (DELIMITER ‘,’) ENCODING ‘GBK’;
启动gpdfist
gpfdist -c mysql.yaml -p 8080 -l gpfdist.log &
直接通过外表访问mysql中的表。
gpadmin_1=# select * from ext_mysql_potluck;
id | name | food | confirmed | signup_date
----±------±---------------±----------±--------------------
1 | John | Casserole | Y | 2012-04-11 00:00:00
2 | Sandy | Key Lime Tarts | N | 2012-04-14 00:00:00
3 | Tom | BBQ | Y | 2012-04-18 00:00:00
4 | Tina | Salad | Y | 2012-04-10 00:00:00
5 | ryan | Casserole | N | 2012-04-11 00:00:00
6 | Alice | Casserole | N | 2012-04-11 00:00:00
7 | ryan | Casserole | N | 2012-04-11 00:00:00
8 | ryan | Casserole | N | 2012-04-11 00:00:00
9 | ryan | Casserole | N | 2012-04-11 00:00:00
10 | Alice | Casserole | Y | 2012-04-11 00:00:00
11 | ryan | Casserole | N | 2012-04-11 00:00:00
(11 rows)