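When the data volume is large and the mysqldump tool bundled with MySQL can no longer keep up, mydumper can be used for the data migration. Its most prominent feature is multi-threaded parallel backup, which greatly speeds up data export. For how mydumper works, see: MyDumper原理简介 (A Brief Introduction to How MyDumper Works)
Like other backup tools, mydumper by default takes a global read lock with FTWRL (Flush Tables With Read Lock) to keep the backed-up data consistent. Before starting a backup, assess the current database load carefully, and either pick a quiet period or run the backup against a replica. Also note that although mydumper dumps tables in parallel and splits large tables into chunks on export, the chunks of a single table are processed by the same thread rather than by several threads in parallel.
You can download a release package from the mydumper GitHub page, or download tidb-enterprise-tools from the official TiDB website, which also includes mydumper. Here we download from GitHub.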
# download
wget https://github.com/maxbube/mydumper/releases/download/v0.9.5/mydumper-0.9.5-2.el7.x86_64.rpm
# install
sudo rpm -i mydumper-0.9.5-2.el7.x86_64.rpm
# check installation
mydumper --help
myloader --help
SELECT
TABLE_NAME,
(DATA_LENGTH)/(1024*1024) data_m_bytes,
(INDEX_LENGTH)/(1024*1024) index_m_bytes,
TABLE_ROWS
FROM
INFORMATION_SCHEMA.TABLES
WHERE
TABLE_SCHEMA = 'post_boots'
AND TABLE_NAME IN ( 'index_beauty_is_myhouse_in_hour',
'index_beauty_isnt_myhouse_in_hour', 'index_barm_zoom_style_region' )
order by TABLE_NAME;
screen -R daba_backup
mydumper --verbose=3 -h 10.2.2.2 -P 4000 -u $user -p $passwd -t 16 -F 8 -B post_boots \
-T index_beauty_is_myhouse_in_hour,index_beauty_isnt_myhouse_in_hour,index_barm_zoom_style_region \
--skip-tz-utc -o /home/babe/post_boots_tidb_bak > /home/babe/post_boots_tidb_backup_output.txt 2>&1
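Parameter notes:
-t number of threads;
-F size of each exported file chunk, in MB. The official recommendation is 64 MB; TiDB limits transaction size, so if the console reports errors during the export, lower this value as needed;
-B the database to export;
-T the tables to export, separated by ASCII commas;
--skip-tz-utc ignore time-zone differences between MySQL and the machine running the export, and disable automatic conversion;
-o the output directory; mydumper creates it automatically;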
# Check the output for errors
vi /home/babe/post_boots_tidb_backup_output.txt
# Because of TiDB's limits, confirm that no generated file exceeds 300,000 lines
find /home/babe/post_boots_tidb_bak -name "*.sql" -exec wc -l {} \; | awk '$1>300000'
tar -zcvf post_boots_doodv1_tidb.tar.gz /home/babe/post_boots_tidb_bak
myloader --verbose=3 -q 200 -h 10.2.2.2 -P 4000 -u $user -p $passwd -t 16 -B post_boots_imported \
-d /home/babe/post_boots_tidb_bak > /home/babe/post_boots_tidb_import_output.txt 2>&1
vi /home/babe/post_boots_tidb_import_output.txt
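Opening the export and import logs in vi works, but on a large log it can be quicker to filter for suspicious lines first. The following is a minimal sketch; the helper name `log_problems` is hypothetical, and it assumes that problems in the mydumper/myloader output contain words such as CRITICAL, WARNING or ERROR, which may not cover every failure message.

```shell
# Hypothetical helper: print (with line numbers) every log line that looks like a problem.
# Assumption: problem lines contain "critical", "warning" or "error" in any letter case.
# Returns non-zero when nothing matches, so it can be used in an if statement.
log_problems() {
  grep -inE 'critical|warning|error' "$1"
}

# Example (not run here):
#   log_problems /home/babe/post_boots_tidb_backup_output.txt
#   log_problems /home/babe/post_boots_tidb_import_output.txt
```

An empty result only means that none of these keywords appeared, so a quick look at the tail of the log is still worthwhile.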
SELECT
TABLE_NAME,
(DATA_LENGTH)/(1024*1024) data_m_bytes,
(INDEX_LENGTH)/(1024*1024) index_m_bytes,
TABLE_ROWS
FROM
INFORMATION_SCHEMA.TABLES
WHERE
TABLE_SCHEMA = 'post_boots_imported'
AND TABLE_NAME IN ( 'index_beauty_is_myhouse_in_hour',
'index_beauty_isnt_myhouse_in_hour', 'index_barm_zoom_style_region' )
order by TABLE_NAME;
select count(*) from post_boots.index_beauty_is_myhouse_in_hour union all
select count(*) from post_boots_imported.index_beauty_is_myhouse_in_hour;
select count(*) from post_boots.index_beauty_isnt_myhouse_in_hour union all
select count(*) from post_boots_imported.index_beauty_isnt_myhouse_in_hour;
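When more tables are involved, the row-count statements can be generated instead of typed by hand. This is only a sketch: `gen_count_sql` is a hypothetical helper that prints the same `union all` pattern used above for a source schema, a target schema, and any number of table names.

```shell
# Hypothetical helper: gen_count_sql SRC_SCHEMA DST_SCHEMA TABLE...
# Prints one "count in source union all count in target" statement per table.
# The function body runs in a subshell, so its variables do not leak.
gen_count_sql() (
  src=$1
  dst=$2
  shift 2
  for t in "$@"; do
    printf 'select count(*) from %s.%s union all\n' "$src" "$t"
    printf 'select count(*) from %s.%s;\n' "$dst" "$t"
  done
)

# Example (not run here):
#   gen_count_sql post_boots post_boots_imported \
#     index_beauty_is_myhouse_in_hour index_beauty_isnt_myhouse_in_hour \
#     index_barm_zoom_style_region > count_check.sql
```

Each generated statement returns two rows, the source count first and the target count second; the two numbers should be equal.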
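Following the approach in the article "MySQL: How to Quickly Compare Data?" (MySQL 如何快速对比数据?), the md5 value of each row can be used to check whether the row-level data matches. Because TiDB's group_concat implementation does not support order by, the comparison here is approximate: take the last few characters of each row's md5 value, sum them, and round the result.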
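The reason a sum works without order by is that addition does not depend on row order. The sketch below illustrates this locally on plain text rows; it is not the query used in this article. `rows_checksum` is a hypothetical helper, it assumes the coreutils `md5sum` command is available, and it converts the last two hex digits to a number explicitly, which is a simplification for illustration.

```shell
# Hypothetical helper: read rows from stdin, take the last two hex digits of each
# row's MD5, convert them to a number (0-255), and print the sum of those numbers.
# Assumption: md5sum (GNU coreutils) is installed.
rows_checksum() (
  total=0
  while IFS= read -r row; do
    # md5sum prints 32 hex characters first; characters 31-32 are the last two digits
    tail_hex=$(printf '%s' "$row" | md5sum | cut -c31-32)
    total=$((total + 0x$tail_hex))
  done
  echo "$total"
)

# Example (not run here): the two commands print the same number
#   printf 'r1\nr2\nr3\n' | rows_checksum
#   printf 'r3\nr1\nr2\n' | rows_checksum
```

Because only two hex digits per row are kept, two different data sets can still produce the same sum, which is why the result is grouped into small id ranges and treated as an approximate check.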
-- Build the list of IFNULL-wrapped column expressions
SELECT
GROUP_CONCAT('IFNULL(',COLUMN_NAME,','''')')
FROM information_schema.COLUMNS
WHERE TABLE_NAME='index_beauty_is_myhouse_in_hour'
and TABLE_SCHEMA='post_boots';
-- Build the query that compares row-level details
SELECT min(id) AS min_id,
max(id) AS max_id,
(id div 10000) id_gap,
count(1) AS ROW_COUNT,
cast(
round(
sum(
SUBSTR(
MD5(concat(IFNULL(id,''),IFNULL(date_with_hour,''),IFNULL(is_id,''),IFNULL(source_is_id,''),IFNULL(myhouse_id,''),IFNULL(source_myhouse_id,''))
), -2)
), 2)
as char(12))
AS md5_value
FROM post_boots.index_beauty_is_myhouse_in_hour
GROUP BY id_gap
order by min_id asc;
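Run the statement above against both the old and the new table and export the results, for example by running it with the -e option of the mysql command-line client and redirecting the output to a file. Then compare the two files with the diff command: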
diff -i -B -s post_boots.index_beauty.txt post_boots_imported.index_beauty.txt
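The export-and-compare step can be wrapped in two small functions. This is a sketch under stated assumptions rather than the exact commands used for this migration: `export_result` and `results_match` are hypothetical helpers, the variables `host`, `port`, `user` and `passwd` are assumed to be set, and the SQL file is assumed to use unqualified table names so that the same file can be run against both schemas.

```shell
# Hypothetical helper: export_result SCHEMA SQL_FILE OUT_FILE
# Runs the SQL in SQL_FILE against SCHEMA and saves tab-separated rows without headers.
# Assumption: $host, $port, $user and $passwd are set by the caller.
export_result() {
  mysql -h "$host" -P "$port" -u "$user" --password="$passwd" \
        --batch --skip-column-names -D "$1" -e "$(cat "$2")" > "$3"
}

# Hypothetical helper: succeeds only when the two exported files are byte-identical.
results_match() {
  cmp -s "$1" "$2"
}

# Example (not run here):
#   export_result post_boots query.sql post_boots.index_beauty.txt
#   export_result post_boots_imported query.sql post_boots_imported.index_beauty.txt
#   results_match post_boots.index_beauty.txt post_boots_imported.index_beauty.txt && echo same
```

`cmp -s` only answers whether the files differ; when they do, the diff command above shows which id ranges to look at.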
Following the approach in MySQL Compare Two Tables, use the statement below to check whether the data in the target database matches the source database:
SELECT id, date_with_hour, is_id, source_is_id, myhouse_id, source_myhouse_id
FROM (
SELECT id, date_with_hour, is_id, source_is_id, myhouse_id, source_myhouse_id
from post_boots.index_beauty_is_myhouse_in_hour where id >=60000000
UNION ALL
SELECT id, date_with_hour, is_id, source_is_id, myhouse_id, source_myhouse_id
FROM post_boots_imported.index_beauty_is_myhouse_in_hour where id >=60000000
) tbl
GROUP BY id, date_with_hour, is_id, source_is_id, myhouse_id, source_myhouse_id
HAVING count(*) <> 2
ORDER BY `id`;
Note that if the data volume is large, the comparison needs to be run over separate id ranges.
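One way to split the comparison is to generate the id predicates up front and substitute each one into the where clauses of the statement above. A minimal sketch, with a hypothetical helper name `gen_id_ranges` and half-open ranges chosen for illustration:

```shell
# Hypothetical helper: gen_id_ranges START END STEP
# Prints one predicate per line covering the half-open interval [START, END).
# The function body runs in a subshell, so its variables do not leak.
gen_id_ranges() (
  lo=$1
  end=$2
  step=$3
  [ "$step" -gt 0 ] || exit 1   # refuse a step that would never advance
  while [ "$lo" -lt "$end" ]; do
    hi=$((lo + step))
    [ "$hi" -gt "$end" ] && hi=$end
    printf 'id >= %s AND id < %s\n' "$lo" "$hi"
    lo=$hi
  done
)

# Example (not run here): four ranges of 250000 ids each
#   gen_id_ranges 60000000 61000000 250000
```

Half-open ranges avoid comparing a boundary id twice; the range size is a trade-off between the number of queries and the cost of each one.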