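4. Xtrabackup: a physical backup tool
4.1 Introduction
A Percona product. It works with the various MySQL versions, provided the matching PXB release is used.
It is a physical backup tool, roughly equivalent to copying ("cp") the data files.
For MySQL versions before 8.0, use PXB (Percona XtraBackup) 2.4.x.
For MySQL 8.0 and later, use PXB 8.0.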
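The version rule above can be captured in a tiny helper, for example in a provisioning script. This is a sketch: `pxb_series_for` is a hypothetical name, and it assumes the version string starts with the numeric major version, as `select @@version` returns it (e.g. `5.7.28-log`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the PXB release series for a MySQL version string.
# Rule from the text: MySQL before 8.0 -> PXB 2.4.x; MySQL 8.0 and later -> PXB 8.0.
pxb_series_for() {
    local version="$1"
    local major="${version%%.*}"   # text before the first dot, e.g. "5.7.28-log" -> "5"
    if [ "$major" -ge 8 ]; then
        echo "8.0"
    else
        echo "2.4"
    fi
}

# Usage sketch against a running instance:
#   pxb_series_for "$(mysql -N -e 'select @@version')"
```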
4.2 Installing PXB
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
yum -y install perl perl-devel libaio libaio-devel perl-Time-HiRes perl-DBD-MySQL libev
wget https://www.percona.com/downloads/XtraBackup/Percona-XtraBackup-2.4.12/binary/redhat/7/x86_64/percona-xtrabackup-24-2.4.12-1.el7.x86_64.rpm
yum install -y percona-xtrabackup-24-2.4.12-1.el7.x86_64.rpm
[root@db01 app]# innobackupex --version
xtrabackup: recognized server arguments: --server-id=6 --log_bin=/data/3306/binlog/mysql-bin --datadir=/data/3306/data --innodb_data_file_path=ibdata1:100M;ibdata2:100M;ibdata3:100M:autoextend --innodb_undo_tablespaces=3 --innodb_undo_directory=/data/3306/undologs --innodb_log_file_size=100M --innodb_log_files_in_group=3 --innodb_buffer_pool_size=256M --innodb_log_buffer_size=33554432
innobackupex version 2.4.12 Linux (x86_64) (revision id: 170eb8c)
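4.3 How PXB takes a backup
a. InnoDB
ibdata1, *.ibd, ...: copied hot, while the server keeps running. The redo generated during the backup is copied as well.
b. Non-InnoDB data
frm, MYD, MYI, ...: copied while FTWRL (FLUSH TABLES WITH READ LOCK) is held.
c. For InnoDB data, incremental backup is supported natively.
An incremental backup copies only the data pages modified after the checkpoint LSN recorded by the previous backup.
4.4 Usage
4.4.1 Prerequisites
The MySQL instance must be running while the backup is taken. During the backup, PXB automatically reads my.cnf for settings such as datadir and socket, so make sure the socket is configured: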
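To make the incremental rule concrete, here is a conceptual model in shell, not a description of xtrabackup's internals: given page LSNs (in a made-up `page_id page_lsn` input format) and the previous backup's to_lsn, it lists the pages an incremental backup would need to copy. The function name `pages_changed_since` is hypothetical.

```shell
#!/usr/bin/env bash
# Conceptual model only; xtrabackup's real page scanning happens inside the tool.
# stdin: one "page_id page_lsn" pair per line (a made-up format for illustration).
# $1:    the previous backup's to_lsn (its checkpoint LSN).
# Prints the page ids whose LSN is newer than that checkpoint.
pages_changed_since() {
    local base_lsn="$1"
    awk -v base="$base_lsn" '($2 + 0) > (base + 0) { print $1 }'
}
```

With the full backup's to_lsn from section 4.4.4 (250975799), `printf '1 250975700\n2 250975800\n3 250977706\n' | pages_changed_since 250975799` lists pages 2 and 3: only pages changed after the base checkpoint are included.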
vim /etc/my.cnf
[client]
socket=/tmp/mysql.sock
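Because PXB takes datadir and socket from the option files, it can help to check what a given section actually contains. A minimal sketch with a hypothetical `ini_get` helper, assuming a simple my.cnf made of `[section]` headers and `key=value` lines, with no `!include` directives and no normalization of `-` versus `_` in key names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of key $3 in section [$2] of option file $1.
ini_get() {
    local file="$1" section="$2" key="$3"
    awk -v sec="$section" -v key="$key" '
        /^[ \t]*\[/ {                      # a section header such as "[client]"
            name = $0
            gsub(/[ \t]/, "", name)
            gsub(/^\[|\]$/, "", name)
            in_sec = (name == sec)
            next
        }
        in_sec && index($0, "=") {         # "key=value" or "key = value"
            k = substr($0, 1, index($0, "=") - 1)
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }' "$file"
}
```

`ini_get /etc/my.cnf client socket` should print `/tmp/mysql.sock` for the file above. MySQL's own `my_print_defaults client` resolves option files the way the server tools do (including `!include`), so it is the more reliable check in practice.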
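4.4.2 Full backup
[root@db01 backup]# innobackupex --user=root --password=123 --no-timestamp /data/backup/pxb/full
Files in the backup directory:
xtrabackup_binlog_info : binlog file and position (plus GTID set) corresponding to the backup
xtrabackup_checkpoints : backup type and LSN range (from_lsn, to_lsn), the starting point for the next incremental backup
xtrabackup_info : overview of the backup (tool version, command, start/end time, ...)
xtrabackup_logfile : redo generated while the backup was running
4.4.3 Restoring a full backup
a. Prepare the backup
--apply-log : simulates InnoDB crash recovery (CR): redo is applied to roll forward, and undo is used to roll back uncommitted transactions. The data must reach a consistent state before it can be restored.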
[root@db01 undologs]# innobackupex --apply-log /data/backup/pxb/full/
b. Restore the data
[root@db01 full]# pkill mysqld
[root@db01 full]# rm -rf /data/3306/data/* /data/3306/undologs/*
[root@db01 undologs]# innobackupex --copy-back /data/backup/pxb/full/
[root@db01 undologs]# chown -R mysql:mysql /data/*
c. Start the database
[root@db01 undologs]# /etc/init.d/mysqld start
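`--copy-back` will not copy into a data directory that still contains files, which is why the `rm -rf` step comes first. A small pre-flight check, sketched with a hypothetical `dirs_are_empty` function, makes that explicit before the copy-back runs:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check before "innobackupex --copy-back":
# returns 0 only if every directory given as an argument exists and is empty.
dirs_are_empty() {
    local dir
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir" >&2
            return 1
        fi
        if [ -n "$(ls -A "$dir")" ]; then   # -A also catches hidden files
            echo "not empty: $dir" >&2
            return 1
        fi
    done
    return 0
}

# Usage sketch with the paths from this section:
#   dirs_are_empty /data/3306/data /data/3306/undologs &&
#       innobackupex --copy-back /data/backup/pxb/full/
```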
4.4.4 Incremental backup
a. Prepare the test data
mysql> create database pxb charset utf8mb4;
mysql> use pxb
mysql> create table t1 (id int);
mysql> insert into t1 values(1);
mysql> commit;
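b. Simulate the full backup taken on Sunday at 23:00
[root@db01 pxb]# innobackupex --user=root --password=123 --no-timestamp /data/backup/pxb/full
c. Simulate Monday's daytime changes
mysql> use pxb
mysql> insert into t1 values(11),(21);
mysql> commit;
d. Simulate the incremental backup on Monday at 23:00 (based on full)
[root@db01 inc1]# innobackupex --user=root --password=123 --no-timestamp --incremental /data/backup/pxb/inc1 --incremental-basedir=/data/backup/pxb/full/
e. Simulate Tuesday's daytime changes
mysql> use pxb
mysql> insert into t1 values(22),(33);
mysql> commit;
f. Simulate the incremental backup on Tuesday (based on inc1)
innobackupex --user=root --password=123 --no-timestamp --incremental /data/backup/pxb/inc2 --incremental-basedir=/data/backup/pxb/inc1
g. Simulate Wednesday's daytime changes (not covered by any backup)
mysql> use pxb
mysql> insert into t1 values(122),(133);
mysql> commit;
h. Verify the chain: each incremental's from_lsn must equal the to_lsn of the backup it is based on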
[root@db01 pxb]# cat full/xtrabackup_checkpoints
backup_type = full-backuped
from_lsn = 0
to_lsn = 250975799
last_lsn = 250975808
compact = 0
recover_binlog_info = 0
[root@db01 pxb]# cat inc1/xtrabackup_checkpoints
backup_type = incremental
from_lsn = 250975799
to_lsn = 250977706
last_lsn = 250977715
compact = 0
recover_binlog_info = 0
[root@db01 pxb]# cat inc2/xtrabackup_checkpoints
backup_type = incremental
from_lsn = 250977706
to_lsn = 250979620
last_lsn = 250979629
compact = 0
recover_binlog_info = 0
[root@db01 pxb]#
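The check in step h can be automated: walk the backup directories in order and compare each incremental's from_lsn with the previous backup's to_lsn. A sketch, assuming the `key = value` layout of xtrabackup_checkpoints shown above; `lsn_of` and `check_lsn_chain` are hypothetical names.

```shell
#!/usr/bin/env bash
# Print the value of key $2 from "$1/xtrabackup_checkpoints" (lines look like "to_lsn = 250975799").
lsn_of() {
    awk -F' = ' -v k="$2" '$1 == k { print $2 }' "$1/xtrabackup_checkpoints"
}

# Check that the backup directories, given in order (full first), form a contiguous LSN chain.
check_lsn_chain() {
    local prev_dir="" prev_to="" dir from
    for dir in "$@"; do
        if [ -n "$prev_dir" ]; then
            from=$(lsn_of "$dir" from_lsn)
            if [ "$from" != "$prev_to" ]; then
                echo "gap: $dir from_lsn=$from, but $prev_dir to_lsn=$prev_to" >&2
                return 1
            fi
        fi
        prev_dir="$dir"
        prev_to=$(lsn_of "$dir" to_lsn)
    done
    echo "chain OK"
}
```

`check_lsn_chain /data/backup/pxb/full /data/backup/pxb/inc1 /data/backup/pxb/inc2` prints `chain OK` for the three files above, and reports the first gap otherwise (for example, if inc1 were missing from the list).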
4.4.5 Simulating a failure and recovering
a. A failure occurs at 10:00 on Wednesday
[root@db01 pxb]# pkill mysqld
[root@db01 pxb]# rm -rf /data/3306/data/* /data/3306/undologs/*
b. Prepare the backups
1. Prepare the base full backup
[root@db01 pxb]# innobackupex --apply-log --redo-only /data/backup/pxb/full/
--redo-only This option should be used when preparing the base full
backup and when merging all incrementals except the last
one. This forces xtrabackup to skip the "rollback" phase
and do a "redo" only. This is necessary if the backup
will have incremental changes applied to it later. See
the xtrabackup documentation for details.
2. Merge inc1 into full and prepare (still --redo-only, since inc1 is not the last incremental)
innobackupex --apply-log --redo-only --incremental-dir=/data/backup/pxb/inc1/ /data/backup/pxb/full/
3. Merge inc2 into full and prepare (inc2 is the last incremental, so --redo-only is omitted and the rollback phase runs)
innobackupex --apply-log --incremental-dir=/data/backup/pxb/inc2/ /data/backup/pxb/full/
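The ordering rule quoted above (`--redo-only` for the base and for every incremental except the last) can be generated instead of typed by hand. This sketch only prints the commands; `print_prepare_plan` is a hypothetical helper, and the directories must be passed in backup order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the prepare commands for a base backup ($1)
# followed by its incrementals ($2...), in backup order.
print_prepare_plan() {
    local base="$1"; shift
    local n=$# i=0 inc
    if [ "$n" -eq 0 ]; then
        # No incrementals: a plain prepare of the full backup is enough.
        echo "innobackupex --apply-log $base"
        return
    fi
    echo "innobackupex --apply-log --redo-only $base"
    for inc in "$@"; do
        i=$((i + 1))
        if [ "$i" -lt "$n" ]; then
            # Not the last incremental: redo only, skip rollback.
            echo "innobackupex --apply-log --redo-only --incremental-dir=$inc $base"
        else
            # Last incremental: full prepare, including rollback.
            echo "innobackupex --apply-log --incremental-dir=$inc $base"
        fi
    done
}
```

`print_prepare_plan /data/backup/pxb/full/ /data/backup/pxb/inc1/ /data/backup/pxb/inc2/` prints the same three commands as steps 1 to 3 above.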
c. Restore the merged full backup
[root@db01 pxb]# innobackupex --copy-back /data/backup/pxb/full/
[root@db01 pxb]# chown -R mysql:mysql /data/*
d. Extract and replay the binlog written after the last backup (inc2)
[root@db01 inc2]# cat xtrabackup_binlog_info
mysql-bin.000009 1290 4d98ed45-c0e9-11ea-8dd7-000c295bb94f:1-233,
b6c782f4-c4af-11ea-b173-000c295bb94f:1-5
[root@db01 inc2]#
[root@db01 inc2]# mysqlbinlog --skip-gtids --start-position=1290 /data/3306/binlog/mysql-bin.000009 >/tmp/bin.sql
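The start coordinates for the binlog replay come straight from the last backup's xtrabackup_binlog_info: the first field is the binlog file, the second the position. A sketch with a hypothetical `binlog_replay_cmd` helper that builds the command used above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the mysqlbinlog command from xtrabackup_binlog_info ($1)
# and the binlog directory ($2). Only the first line's first two fields (file, position)
# are used; the GTID set that follows is ignored.
binlog_replay_cmd() {
    local info_file="$1" binlog_dir="$2"
    local file pos rest
    read -r file pos rest < "$info_file"
    echo "mysqlbinlog --skip-gtids --start-position=$pos $binlog_dir/$file"
}
```

Note that `--start-position` applies only to the first file named on the command line; if the binlog rotated after the backup, the later files must be listed after it. The resulting /tmp/bin.sql is then loaded into the restarted instance, e.g. with `source /tmp/bin.sql` in the mysql client.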
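# Exercise: given a 500G full backup, how would you recover a 200M table that was deleted by mistake during the day on Wednesday?
=================================================
Chapter 9 Replication
1. Introduction
1.1 What is replication?
Replication: the changes (DML, DDL, etc.) made on one MySQL instance (the master) are recorded in its binlog and continuously shipped to the replicas. Each replica applies the log and reaches a state nearly consistent with the master's data.
1.2 Use cases
a. Backup
b. High availability
c. Read/write splitting
d. Distributed architectures
2. Replication prerequisites (setup)
2.1 At least two database instances, each with its own server_id and server_uuid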
[root@db01 oldguo]# systemctl start mysqld3307
[root@db01 oldguo]# systemctl start mysqld3308
[root@db01 oldguo]# systemctl start mysqld3309
[root@db01 oldguo]# netstat -tulnp
[root@db01 oldguo]# mysql -S /tmp/mysql3307.sock -e "select @@server_id;select @@server_uuid"
+-------------+
| @@server_id |
+-------------+
| 7 |
+-------------+
+--------------------------------------+
| @@server_uuid |
+--------------------------------------+
| d639b892-ba7b-11ea-9d00-000c295bb94f |
+--------------------------------------+
[root@db01 oldguo]# mysql -S /tmp/mysql3308.sock -e "select @@server_id;select @@server_uuid"
+-------------+
| @@server_id |
+-------------+
| 8 |
+-------------+
+--------------------------------------+
| @@server_uuid |
+--------------------------------------+
| d8e965c5-ba7b-11ea-9d1e-000c295bb94f |
+--------------------------------------+
[root@db01 oldguo]# mysql -S /tmp/mysql3309.sock -e "select @@server_id;select @@server_uuid"
+-------------+
| @@server_id |
+-------------+
| 9 |
+-------------+
+--------------------------------------+
| @@server_uuid |
+--------------------------------------+
| dc20d2d8-ba7b-11ea-9f57-000c295bb94f |
+--------------------------------------+
[root@db01 oldguo]#
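Comparing three instances by eye works, but the same check can be scripted. A sketch, assuming the `server_id server_uuid` pairs are collected with `mysql -N` from each socket; `find_duplicate_ids` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "server_id server_uuid" pairs (one instance per line) on stdin
# and report any value that appears more than once. Exit status 1 if a duplicate is found.
find_duplicate_ids() {
    awk '
        { id[$1]++; uuid[$2]++ }
        END {
            bad = 0
            for (k in id)   if (id[k] > 1)   { print "duplicate server_id: " k; bad = 1 }
            for (k in uuid) if (uuid[k] > 1) { print "duplicate server_uuid: " k; bad = 1 }
            exit bad
        }'
}

# Usage sketch with the three sockets above:
#   for port in 3307 3308 3309; do
#       mysql -S /tmp/mysql$port.sock -N -e "select @@server_id, @@server_uuid"
#   done | find_duplicate_ids && echo "all server_id/server_uuid values are unique"
```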
2.2 Make sure the binlog is enabled on the master
[root@db01 oldguo]# mysql -S /tmp/mysql3307.sock -e "select @@log_bin;"
+-----------+
| @@log_bin |
+-----------+
| 1 |
+-----------+
2.3 Create a dedicated replication user on the master
[root@db01 oldguo]# mysql -S /tmp/mysql3307.sock -e "grant replication slave on *.* to repl@'10.0.0.%' identified by '123'"
2.4 "Catch up": back up the master and restore it on the slaves
[root@db01 oldguo]# mysqldump -S /tmp/mysql3307.sock -A --master-data=2 >/tmp/full.sql
[root@db01 oldguo]# mysql -S /tmp/mysql3308.sock -e "source /tmp/full.sql"
[root@db01 oldguo]# mysql -S /tmp/mysql3309.sock -e "source /tmp/full.sql"
2.5 Start replication
a. On each slave (3308 and 3309), run CHANGE MASTER TO (see help change master to for the syntax), using the binlog coordinates recorded in the dump:
[root@db01 data]# grep '^-- CHANGE MASTER' /tmp/full.sql
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=444;
CHANGE MASTER TO
MASTER_HOST='10.0.0.51',
MASTER_USER='repl',
MASTER_PASSWORD='123',
MASTER_PORT=3307,
MASTER_LOG_FILE='mysql-bin.000008',
MASTER_LOG_POS=444,
MASTER_CONNECT_RETRY=10;
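Instead of copying the coordinates by hand, the commented `CHANGE MASTER TO` line that `mysqldump --master-data=2` writes can be turned into the full statement. A sketch: `change_master_from_dump` is hypothetical, it takes the connection values as arguments, and it assumes the password contains no single quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 = dump file, $2 = host, $3 = user, $4 = password, $5 = port.
# Reads the "-- CHANGE MASTER TO ..." comment and prints a complete statement.
change_master_from_dump() {
    local dump="$1" host="$2" user="$3" password="$4" port="$5"
    local line file pos
    line=$(grep -m1 '^-- CHANGE MASTER TO' "$dump") || return 1
    file=$(echo "$line" | sed -n "s/.*MASTER_LOG_FILE='\([^']*\)'.*/\1/p")
    pos=$(echo "$line" | sed -n 's/.*MASTER_LOG_POS=\([0-9]*\).*/\1/p')
    cat <<EOF
CHANGE MASTER TO
MASTER_HOST='$host',
MASTER_USER='$user',
MASTER_PASSWORD='$password',
MASTER_PORT=$port,
MASTER_LOG_FILE='$file',
MASTER_LOG_POS=$pos,
MASTER_CONNECT_RETRY=10;
EOF
}
```

`change_master_from_dump /tmp/full.sql 10.0.0.51 repl 123 3307 | mysql -S /tmp/mysql3308.sock` would issue the statement above on the 3308 instance.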
b. Start the replication threads (on each slave)
start slave;
c. Check the status
[root@db01 data]# mysql -S /tmp/mysql3308.sock -e "show slave status\G"|grep "Running:"
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
[root@db01 data]#
[root@db01 data]# mysql -S /tmp/mysql3309.sock -e "show slave status\G"|grep "Running:"
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
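The `grep "Running:"` check can be wrapped into something a cron job or monitoring agent can act on. A sketch, with `slave_threads_ok` as a hypothetical name, that reads `show slave status\G` output and returns a non-zero status unless both threads report Yes:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "SHOW SLAVE STATUS\G" output on stdin. Returns 0 only when
# Slave_IO_Running and Slave_SQL_Running are both "Yes"; otherwise prints the thread
# states (and any non-empty Last_IO_Error / Last_SQL_Error) and returns 1.
slave_threads_ok() {
    awk '
        {
            key = $0
            sub(/^[ \t]+/, "", key)             # drop the \G alignment padding
            p = index(key, ": ")
            val = p ? substr(key, p + 2) : ""   # everything after the first ": "
            key = p ? substr(key, 1, p - 1) : key
        }
        key == "Slave_IO_Running"  { io = val }
        key == "Slave_SQL_Running" { sql = val }
        (key == "Last_IO_Error" || key == "Last_SQL_Error") && val != "" { print key ": " val }
        END {
            if (io == "Yes" && sql == "Yes") exit 0
            print "IO=" io " SQL=" sql
            exit 1
        }'
}
```

For example: `mysql -S /tmp/mysql3308.sock -e "show slave status\G" | slave_threads_ok || echo "replication broken on 3308"`.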
3. How classic (position-based) replication works
3.1 Files involved
Master:
binlog files: mysql-bin.000001, mysql-bin.000002, ...
mysql> select @@log_bin_basename;
+---------------------------+
| @@log_bin_basename |
+---------------------------+
| /data/3307/logs/mysql-bin |
+---------------------------+
Slave:
a. Relay log files:
File names: db01-relay-bin.000001, ... (path prefix given by relay_log_basename)
mysql> select @@relay_log_basename;
+--------------------------------+
| @@relay_log_basename |
+--------------------------------+
| /data/3307/data/db01-relay-bin |
+--------------------------------+
Purpose: stores the binlog events received from the master.
b. master.info file:
Location: /data/3308/data/master.info (when master_info_repository=FILE)
mysql> select @@master_info_repository;
+--------------------------+
| @@master_info_repository |
+--------------------------+
| FILE |
+--------------------------+
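Purpose: stores the information needed to reach the master: host, port, user, password, the master's server_uuid, and the binlog file/position received so far.
c. relay-log.info file:
Location: /data/3308/data/relay-log.info (when relay_log_info_repository=FILE)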
mysql> select @@relay_log_info_repository;
+-----------------------------+
| @@relay_log_info_repository |
+-----------------------------+
| FILE |
+-----------------------------+
1 row in set (0.00 sec)
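Purpose: records the relay log position the SQL thread has replayed up to.
3.2 Threads involved
Master:
binlog_dump/binlog_dump_gtid
Purpose: watches the binlog and sends new events to the slaves.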
mysql> show processlist;
+----+------+------------+------+-------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+------------+------+-------------+------+---------------------------------------------------------------+------------------+
| 6 | root | localhost | NULL | Sleep | 9041 | | NULL |
| 7 | repl | db01:34702 | NULL | Binlog Dump | 8856 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 8 | repl | db01:34704 | NULL | Binlog Dump | 8844 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 10 | root | localhost | NULL | Query | 0 | starting | show processlist |
+----+------+------------+------+-------------+------+---------------------------------------------------------------+------------------+
4 rows in set (0.00 sec)
Slave:
[root@db01 data]# mysql -S /tmp/mysql3309.sock -e "show slave status\G"|grep "Running:"
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
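a. IO thread:
Purpose: connects to the master, talks to binlog_dump, receives binlog events, and stores them.
b. SQL thread:
Purpose: replays the relay log.
3.3 Step-by-step walkthrough
Slave:
a. The slave runs CHANGE MASTER TO; all the information is saved to master.info.
b. The slave runs START SLAVE, which starts the IO and SQL threads.
c. The IO thread reads master.info and builds an in-memory pointer (MI).
d. The IO thread connects to the master.
f. The IO thread and binlog_dump exchange handshake information, checking server_id, server_uuid, and clock. The slave is now registered with the master.
g. Using the binlog position in the MI pointer, the IO thread asks binlog_dump for newer events.
i. The IO thread receives the new events sent by binlog_dump; the MI pointer is updated and written back to master.info.
j. The IO thread writes the received binlog events into the relay log (relay-bin).
k. The SQL thread reads relay-log.info (the position it last replayed up to), builds an RI pointer, and compares it with the position in the relay log.
l. If there are new relay log events, it replays them, then advances the RI pointer and updates relay-log.info.
Master:
e. The master's connection layer accepts the request, authenticates the user, checks privileges, and creates a binlog_dump thread.
h. The binlog_dump thread keeps watching the binlog; whenever new events appear, it sends them to the slave's IO thread.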
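4. Monitoring replication
4.1 Approaches
a. Make a change on the master and check whether it appears on the slave.
b. Use the status commands √
c. Use third-party monitoring tools
4.2 Monitoring with commands
a. On the master: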
mysql> show processlist;
mysql> show slave hosts;
+-----------+----------------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID |
+-----------+----------------+------+-----------+--------------------------------------+
| 9 | 10.0.0.51:3309 | 3309 | 7 | dc20d2d8-ba7b-11ea-9f57-000c295bb94f |
| 8 | 10.0.0.51:3308 | 3308 | 7 | d8e965c5-ba7b-11ea-9d1e-000c295bb94f |
+-----------+----------------+------+-----------+--------------------------------------+
2 rows in set (0.00 sec)
b. On the slave:
mysql> show slave status \G
# 1. Master information (from master.info)
Master_Host: 10.0.0.51
Master_User: repl
Master_Port: 3307
Connect_Retry: 10
Master_Log_File: mysql-bin.000008
Read_Master_Log_Pos: 444
# 2. Relay log information (from relay-log.info)
Relay_Log_File: db01-relay-bin.000004
Relay_Log_Pos: 320
# 3. Master binlog position the replayed relay log corresponds to
Relay_Master_Log_File: mysql-bin.000008
Exec_Master_Log_Pos: 444
# 4. Thread status
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
# 5. Replication filters
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
# 6. Replication lag (seconds)
Seconds_Behind_Master: 0
# 7. Delayed replica status
SQL_Delay: 0
SQL_Remaining_Delay: NULL
# 8. GTID replication
Retrieved_Gtid_Set:
Executed_Gtid_Set:
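Seconds_Behind_Master is the field most monitoring checks alert on. A sketch of such a check; the function name `lag_within` and the 60-second default are assumptions, not values from this text. MySQL reports NULL when it cannot compute the lag, for example when the SQL thread is not running, so that case gets its own exit status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "SHOW SLAVE STATUS\G" output on stdin; $1 = allowed lag in
# seconds (default 60, an assumed value). Returns 0 if within the limit, 1 if the slave
# is further behind, 2 if the value is missing or NULL.
lag_within() {
    local max="${1:-60}"
    awk -v max="$max" '
        $1 == "Seconds_Behind_Master:" { lag = $2; found = 1 }
        END {
            if (!found || lag == "NULL") { print "lag unknown"; exit 2 }
            if (lag + 0 > max + 0) { print "lag " lag "s exceeds " max "s"; exit 1 }
            print "lag " lag "s"
        }'
}
```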
4.3 Common replication failures and how to troubleshoot them
4.3.1 What to monitor
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
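4.3.2 IO thread failures
# 1. Connecting to the master (state: Connecting)
External causes: network unreachable, firewall
Internal causes:
wrong user or password
wrong port or IP
the master has run out of connections (max_connections) or other resources
Reproducing the failure:
1. Change the repl password on the master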
mysql> alter user repl@'10.0.0.%' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
2. Restart the replication threads on the slave
stop slave;
start slave;
Slave_IO_Running: Connecting
Last_IO_Errno: 1045
Last_IO_Error: error connecting to master 'repl@10.0.0.51:3307' - retry-time: 10 retries: 1
3. General troubleshooting: from the slave host, log in manually with the same credentials
[root@db01 data]# mysql -urepl -p123 -h 10.0.0.51 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'repl'@'db01' (using password: YES)
[root@db01 data]# mysql -urepl1 -p123456 -h 10.0.0.51 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'repl1'@'db01' (using password: YES)
[root@db01 data]# mysql -urepl1 -p123 -h 10.0.0.51 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'repl1'@'db01' (using password: YES)
[root@db01 data]#
[root@db01 data]# mysql -urepl -p123456 -h 10.0.0.52 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on '10.0.0.52' (113)
[root@db01 data]# mysql -urepl -p123456 -h 10.0.0.51 -P3300
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on '10.0.0.51' (111)
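The manual login attempts above map to distinct causes: 1045 is an authentication failure, while 2003 with OS error 113 (no route to host) or 111 (connection refused) points at the network or the port. A sketch that encodes this mapping, using the hypothetical name `diagnose_login_error`; 1040 (too many connections) is added for the "connections exhausted" case listed earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the client error line from a manual login test to a likely cause.
# A troubleshooting aid covering the cases in this section, not an exhaustive list.
diagnose_login_error() {
    case "$1" in
        "")                     echo "login OK" ;;
        *"ERROR 1045"*)         echo "wrong user/password (or host not covered by the grant)" ;;
        *"ERROR 1040"*)         echo "too many connections on the master" ;;
        *"ERROR 2003"*"(113)"*) echo "host unreachable: check IP, network, firewall" ;;
        *"ERROR 2003"*"(111)"*) echo "connection refused: check the port and that mysqld is listening" ;;
        *)                      echo "unrecognized error: $1" ;;
    esac
}

# Usage sketch (captures only the client's stderr, keeps the ERROR line):
#   diagnose_login_error "$(mysql -urepl -p123456 -h 10.0.0.51 -P3307 -e 'select 1' 2>&1 >/dev/null | grep ERROR)"
```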
4. Fix
a. stop slave;
b. Reset the slave and rerun CHANGE MASTER TO with the new password
reset slave all;
CHANGE MASTER TO
MASTER_HOST='10.0.0.51',
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_PORT=3307,
MASTER_LOG_FILE='mysql-bin.000008',
MASTER_LOG_POS=687,
MASTER_CONNECT_RETRY=10;
c. start slave ;
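# 2. Registering the slave with the master (state: No)
Cause: the master and the slave have the same server_id or server_uuid.
Reproducing the failure:
1. Set the master's server_id to the slave's value.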
[root@db01 data]# mysql -S /tmp/mysql3307.sock -e "set global server_id=8"
[root@db01 data]# mysql -S /tmp/mysql3307.sock -e "select @@server_id"
+-------------+
| @@server_id |
+-------------+
| 8 |
+-------------+
[root@db01 data]# mysql -S /tmp/mysql3308.sock -e "select @@server_id"
+-------------+
| @@server_id |
+-------------+
| 8 |
+-------------+
2. Restart the slave threads
stop slave;
start slave;
Slave_IO_Running: No
Last_IO_Errno: 1593
Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).
Fix: restore the master's original server_id, then restart the slave threads:
[root@db01 data]# mysql -S /tmp/mysql3307.sock -e "set global server_id=7"
[root@db01 data]# mysql -S /tmp/mysql3308.sock -e "stop slave ; start slave;"
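# 3. Requesting binlog events (state: No)
Causes:
a. A wrong binlog position was specified during setup.
b. The master's binlog is damaged or missing.
Reproducing the failures:
1. Wrong position specified during setup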
mysql -S /tmp/mysql3308.sock
stop slave;
reset slave all;
CHANGE MASTER TO
MASTER_HOST='10.0.0.51',
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_PORT=3307,
MASTER_LOG_FILE='mysql-bin.000008',
MASTER_LOG_POS=1212,
MASTER_CONNECT_RETRY=10;
start slave;
Error:
Slave_IO_Running: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size'
Fix: point the slave at a valid position
mysql -S /tmp/mysql3308.sock
stop slave;
reset slave all;
CHANGE MASTER TO
MASTER_HOST='10.0.0.51',
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_PORT=3307,
MASTER_LOG_FILE='mysql-bin.000008',
MASTER_LOG_POS=687,
MASTER_CONNECT_RETRY=10;
start slave;
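Both IO-thread cases in this part come down to asking the master for coordinates it cannot serve. Before running CHANGE MASTER TO, the file and position can be checked against `show binary logs`. A sketch with a hypothetical `binlog_pos_valid` helper; it only checks that the file exists and that the position is not past its end, not that the position falls on an event boundary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 = binlog file, $2 = position. stdin: "Log_name File_size" rows,
# e.g. from: mysql -S /tmp/mysql3307.sock -N -e "show binary logs"
binlog_pos_valid() {
    local file="$1" pos="$2"
    awk -v f="$file" -v p="$pos" '
        $1 == f { found = 1; size = $2 }
        END {
            if (!found) { print f " not found on master"; exit 1 }
            if (p + 0 > size + 0) { print "position " p " > file size " size; exit 1 }
            print "ok"
        }'
}
```

With the `show binary logs` output in this section, `mysql-bin.000008` / `1212` is rejected (the file is 687 bytes), while `mysql-bin.000008` / `687` passes.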
2. The master's binlogs are deleted by mistake (simulated here with RESET MASTER on the master)
mysql> show binary logs;
+------------------+-----------+
| Log_name | File_size |
+------------------+-----------+
| mysql-bin.000001 | 177 |
| mysql-bin.000002 | 464 |
| mysql-bin.000003 | 177 |
| mysql-bin.000004 | 177 |
| mysql-bin.000005 | 177 |
| mysql-bin.000006 | 154 |
| mysql-bin.000007 | 1111 |
| mysql-bin.000008 | 687 |
+------------------+-----------+
8 rows in set (0.00 sec)
mysql> reset master;
Query OK, 0 rows affected (0.01 sec)
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'could not find next log; the first event 'mysql-bin.000008' at 687, the last event read from '/data/3307/logs/mysql-bin.000008' at 123, the last byte read from '/data/3307/logs/mysql-bin.000008' at 687.'
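Fix: RESET MASTER restarts the binlog at mysql-bin.000001, so point the slave at the first event of that file. This is only safe if the slave had already received everything written before the reset; otherwise rebuild the slave.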
mysql -S /tmp/mysql3308.sock
stop slave;
reset slave all;
CHANGE MASTER TO
MASTER_HOST='10.0.0.51',
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_PORT=3307,
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=154,
MASTER_CONNECT_RETRY=10;
start slave;
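4.3.3 SQL thread failures
Causes:
a. The relay log is corrupted.
b. An event cannot be replayed, which amounts to an SQL statement failing on the slave. Typical reasons:
1. Differences in configuration, version, parameters, or SQL_MODE
Solution: keep hardware, versions, parameters, and SQL_MODE consistent between master and slaves.
2. Constraint conflicts (primary key, unique key), or objects that unexpectedly exist or are missing
Cause: something wrote directly to the slave, or a crash left the slave's data inconsistent with the master's.
Prevention:
a. Block writes on the slaves: read_only=1 plus super_read_only=1 (not innodb_read_only=1, which would also stop the SQL thread from applying changes), or route writes through a read/write-splitting middleware.
b. Use high-availability setups such as semi-synchronous replication or MGR.
If the problem happens anyway:
Approach:
1. Check master/slave consistency with Percona Toolkit (pt-table-checksum).
2. Resynchronize the differing data based on the checksum results (pt-table-sync).
3. Skip the error
Method 1:
stop slave;
set global sql_slave_skip_counter = 1;
# Moves the replication pointer past the next event group; repeat if the slave stops again.
start slave;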
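Method 2 (not recommended: matching errors are skipped silently, so data drift goes unnoticed):
/etc/my.cnf
[mysqld]
slave-skip-errors = 1032,1062,1007
Common error codes:
1007: object already exists (database exists)
1032: row not found, so the DML cannot be applied
1062: duplicate entry for a primary key or unique key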
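For alerting, the error codes listed above can be translated automatically from Last_SQL_Errno. A sketch; `explain_sql_errno` is a hypothetical name and the messages are short paraphrases, not the server's exact error texts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a Last_SQL_Errno value into a short description
# for the codes discussed in this section.
explain_sql_errno() {
    case "$1" in
        0)    echo "no error" ;;
        1007) echo "object already exists (database exists)" ;;
        1032) echo "row not found, so the DML cannot be applied" ;;
        1062) echo "duplicate entry for a primary key or unique key" ;;
        *)    echo "errno $1: see the MySQL server error reference" ;;
    esac
}

# Usage sketch on the 3308 slave:
#   explain_sql_errno "$(mysql -S /tmp/mysql3308.sock -e 'show slave status\G' |
#       awk '$1 == "Last_SQL_Errno:" { print $2 }')"
```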