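1. Introduction
1.1 What is replication?
Replication: the DML, DDL, and other changes made on one MySQL instance (the master) are recorded in its binlog and streamed continuously to the replicas. Each replica applies those logs and so reaches a state nearly consistent with the master's data.
1.2 Use cases
a. Backup.
b. High availability.
c. Read/write splitting.
d. Distributed architectures.
2. Replication prerequisites (setup procedure)
2.1 Two or more database instances, each with its own server_id and server_uuid
# Start the instances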
[root@db01 oldguo]# systemctl start mysqld3307
[root@db01 oldguo]# systemctl start mysqld3308
[root@db01 oldguo]# systemctl start mysqld3309
[root@db01 oldguo]# netstat -tulnp
[root@db01 oldguo]# mysql -S /tmp/mysql3307.sock -e "select @@server_id;select @@server_uuid"
+-------------+
| @@server_id |
+-------------+
| 7 |
+-------------+
+--------------------------------------+
| @@server_uuid |
+--------------------------------------+
| d639b892-ba7b-11ea-9d00-000c295bb94f |
+--------------------------------------+
[root@db01 oldguo]# mysql -S /tmp/mysql3308.sock -e "select @@server_id;select @@server_uuid"
+-------------+
| @@server_id |
+-------------+
| 8 |
+-------------+
+--------------------------------------+
| @@server_uuid |
+--------------------------------------+
| d8e965c5-ba7b-11ea-9d1e-000c295bb94f |
+--------------------------------------+
[root@db01 oldguo]# mysql -S /tmp/mysql3309.sock -e "select @@server_id;select @@server_uuid"
+-------------+
| @@server_id |
+-------------+
| 9 |
+-------------+
+--------------------------------------+
| @@server_uuid |
+--------------------------------------+
| dc20d2d8-ba7b-11ea-9f57-000c295bb94f |
+--------------------------------------+
[root@db01 oldguo]#
2.2 Binary logging enabled on the master
[root@db01 oldguo]# mysql -S /tmp/mysql3307.sock -e "select @@log_bin;"
+-----------+
| @@log_bin |
+-----------+
| 1 |
+-----------+
2.3 A dedicated replication user on the master
[root@db01 oldguo]# mysql -S /tmp/mysql3307.sock -e "grant replication slave on *.* to repl@'10.0.0.%' identified by '123'"
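2.4 Catching up: back up the master and restore it onto the replicas (master: server_id 7; replicas: server_id 8 and 9)
# Full backup of the master
[root@db01 oldguo]# mysqldump -S /tmp/mysql3307.sock -A --master-data=2 >/tmp/full.sql
# Restore on the replicas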
[root@db01 oldguo]# mysql -S /tmp/mysql3308.sock -e "source /tmp/full.sql"
[root@db01 oldguo]# mysql -S /tmp/mysql3309.sock -e "source /tmp/full.sql"
2.5 Start replication
a. Configure the connection to the master
mysql> help change master to # shows the statement template
CHANGE MASTER TO
MASTER_HOST='master2.example.com',
MASTER_USER='replication',
MASTER_PASSWORD='password',
MASTER_PORT=3306,
MASTER_LOG_FILE='master2-bin.001',
MASTER_LOG_POS=4,
MASTER_CONNECT_RETRY=10;
[root@db01 data]# grep '^-- CHANGE MASTER' /tmp/full.sql
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=21103277;
# Run the following on each replica
[root@db01 /data 15:45:36]# mysql -S /tmp/mysql3308.sock
CHANGE MASTER TO
MASTER_HOST='10.0.0.51',
MASTER_USER='repl',
MASTER_PASSWORD='123',
MASTER_PORT=3307,
MASTER_LOG_FILE='mysql-bin.000002',
MASTER_LOG_POS=21103277,
MASTER_CONNECT_RETRY=10;
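As a rough illustration of the step above, the coordinates that mysqldump --master-data=2 writes into the dump can be extracted and turned into the CHANGE MASTER TO statement by a small script. This is only a sketch: the helper names coordinates_from_dump and build_change_master are made up for this example, and the host, port, user, and password are simply the values used in this walkthrough.

```python
import re


def coordinates_from_dump(dump_text):
    """Return (log_file, log_pos) from the commented CHANGE MASTER line in a dump."""
    match = re.search(
        r"^-- CHANGE MASTER TO MASTER_LOG_FILE='([^']+)', MASTER_LOG_POS=(\d+);",
        dump_text,
        flags=re.MULTILINE,
    )
    if match is None:
        raise ValueError("no CHANGE MASTER TO comment found in the dump")
    return match.group(1), int(match.group(2))


def build_change_master(host, port, user, password, log_file, log_pos, retry=10):
    """Assemble the statement to run on a replica (hypothetical helper)."""
    return (
        "CHANGE MASTER TO\n"
        f"MASTER_HOST='{host}',\n"
        f"MASTER_USER='{user}',\n"
        f"MASTER_PASSWORD='{password}',\n"
        f"MASTER_PORT={port},\n"
        f"MASTER_LOG_FILE='{log_file}',\n"
        f"MASTER_LOG_POS={log_pos},\n"
        f"MASTER_CONNECT_RETRY={retry};"
    )


# The line that grep finds in /tmp/full.sql in this walkthrough.
sample = "-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=21103277;\n"
log_file, log_pos = coordinates_from_dump(sample)
statement = build_change_master("10.0.0.51", 3307, "repl", "123", log_file, log_pos)
print(statement)
```

Reading the coordinates from the dump itself avoids copying a position by hand, which is one of the setup mistakes discussed in the failure section.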
b. Start the replication threads
mysql> start slave;
c. Check the status
[root@db01 data]# mysql -S /tmp/mysql3308.sock -e "show slave status\G"|grep "Running:"
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
[root@db01 data]#
[root@db01 data]# mysql -S /tmp/mysql3309.sock -e "show slave status\G"|grep "Running:"
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
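The grep check above can also be scripted. The following is a minimal sketch, assuming the text of "show slave status\G" has already been captured (for example from the mysql command line shown above); the function names parse_slave_status and threads_running are invented for this illustration.

```python
def parse_slave_status(text):
    """Turn the vertical output of SHOW SLAVE STATUS into a dict of strings."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only, so error messages containing colons stay intact.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def threads_running(status):
    """True only when both the IO thread and the SQL thread report Yes."""
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
    )


sample_output = """\
*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
"""
status = parse_slave_status(sample_output)
print(threads_running(status))
```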
3. How classic (position-based) replication works
3.1 Files involved
Master:
binlog files: mysql-bin.000001
mysql> select @@log_bin_basename;
+---------------------------+
| @@log_bin_basename |
+---------------------------+
| /data/3307/logs/mysql-bin |
+---------------------------+
Replica:
a. Relay log files: store the binlog events received from the master
Location: db01-relay-bin.000001
mysql> select @@relay_log_basename;
+--------------------------------+
| @@relay_log_basename |
+--------------------------------+
| /data/3307/data/db01-relay-bin |
+--------------------------------+
b. master_info file: stores information about the master:
Location: /data/3308/data/master.info
mysql> select @@master_info_repository;
+--------------------------+
| @@master_info_repository |
+--------------------------+
| FILE |
+--------------------------+
Contents: server_id, server_uuid, user, password, host, port, and the binlog position.
c. relay_info file: records the position up to which the relay log has been replayed.
Location: /data/3308/data/relay-log.info
mysql> select @@relay_log_info_repository;
+-----------------------------+
| @@relay_log_info_repository |
+-----------------------------+
| FILE |
+-----------------------------+
1 row in set (0.00 sec)
3.2 Threads involved
Master:
binlog_dump/binlog_dump_gtid
Purpose: watches the binlog and delivers new binlog events to the replicas.
mysql> show processlist;
+----+------+------------+------+-------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+------------+------+-------------+------+---------------------------------------------------------------+------------------+
| 6 | root | localhost | NULL | Sleep | 9041 | | NULL |
| 7 | repl | db01:34702 | NULL | Binlog Dump | 8856 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 8 | repl | db01:34704 | NULL | Binlog Dump | 8844 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 10 | root | localhost | NULL | Query | 0 | starting | show processlist |
+----+------+------------+------+-------------+------+---------------------------------------------------------------+------------------+
4 rows in set (0.00 sec)
Replica:
[root@db01 data]# mysql -S /tmp/mysql3309.sock -e "show slave status\G"|grep "Running:"
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
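a. IO thread:
Purpose: connects to the master, talks to binlog_dump, receives binlog events, and stores them.
b. SQL thread
Purpose: replays the relay log.
1. How it works (the letters a to l give the overall order across both sides):
Replica:
a. CHANGE MASTER TO is executed on the replica; the connection details are saved to master_info
b. START SLAVE is executed on the replica, starting the IO and SQL threads
c. The IO thread reads master_info and creates its pointer (MI)
d. The IO thread connects to the master
f. The IO thread and binlog_dump exchange and verify server_id, server_uuid, and clock
g. Using the binlog position in the latest MI pointer, the IO thread asks binlog_dump for newer log events
i. The IO thread receives the new events sent by binlog_dump; the MI pointer is updated automatically and written back to master_info
j. The IO thread finally writes the received binlog events to the relay log (relay-bin)
k. The SQL thread reads relay-log.info (the position replayed last time), creates the RI pointer, and compares it with the current position in the relay log
l. If new relay log events exist, the SQL thread replays them, then updates the RI pointer and relay-log.info
Master:
e. The master's connection layer receives the request, verifies the user and privileges, and spawns a binlog_dump thread
h. The binlog_dump thread keeps watching the binlog and sends any new events to the replica's IO thread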
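The interplay of the two pointers can be pictured with a toy model. The sketch below is not how MySQL is implemented; it only mimics the flow described above with in-memory lists. The class names Master and Replica, and list indexes standing in for byte positions, are assumptions of this illustration.

```python
class Master:
    def __init__(self):
        self.binlog = []  # list index stands in for the binlog position

    def write(self, event):
        self.binlog.append(event)

    def dump_from(self, position):
        """Stand-in for the binlog_dump thread: return events after `position`."""
        return self.binlog[position:]


class Replica:
    def __init__(self, master):
        self.master = master
        self.mi_pos = 0      # MI pointer: how far the IO thread has read the master binlog
        self.relay_log = []  # events received but not necessarily replayed
        self.ri_pos = 0      # RI pointer: how far the SQL thread has replayed the relay log
        self.applied = []    # events already replayed on the replica

    def io_thread_step(self):
        """Request newer events from the master, store them, advance MI."""
        new_events = self.master.dump_from(self.mi_pos)
        self.relay_log.extend(new_events)
        self.mi_pos += len(new_events)

    def sql_thread_step(self):
        """Replay everything between RI and the end of the relay log, advance RI."""
        while self.ri_pos < len(self.relay_log):
            self.applied.append(self.relay_log[self.ri_pos])
            self.ri_pos += 1


master = Master()
replica = Replica(master)
master.write("INSERT 1")
master.write("INSERT 2")

replica.io_thread_step()   # events are received and stored, but not yet applied
received_only = (len(replica.relay_log), len(replica.applied))
replica.sql_thread_step()  # now the relay log is replayed
print(received_only, replica.applied)
```

Keeping the two steps separate shows why a replica can have received an event (MI advanced) without having applied it yet (RI behind), which is the distinction the lag section relies on.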
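2. Monitoring replication
2.1 Monitoring methods
a. Make a change on the master and check whether it shows up on the replicas
b. Use the built-in commands
c. Use third-party tools
2.2 Monitoring with the built-in commands
On the replica: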
[root@db01 /data 12:49:33]# vim 3308/my.cnf
report_host=10.0.0.51:3308 # lets the master discover this replica's IP address and port
[root@db01 /data 12:52:59]# systemctl restart mysqld3308.service
a. On the master:
mysql> show processlist; # shows the connections from all replicas
mysql> show slave hosts; # lists the replicas registered with this master
+-----------+----------------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID |
+-----------+----------------+------+-----------+--------------------------------------+
| 9 | 10.0.0.51:3309 | 3309 | 7 | e54df4b7-c5a2-11ea-8530-000c29b9d34b |
| 8 | 10.0.0.51:3308 | 3308 | 7 | c0f75dfe-c5a1-11ea-b643-000c29b9d34b |
+-----------+----------------+------+-----------+--------------------------------------+
2 rows in set (0.00 sec)
b. On the replica:
mysql> show slave status \G
# 1. Information about the master (from master_info)
Master_Host: 10.0.0.51 # master address
Master_User: repl # replication user on the master
Master_Port: 3307 # master port
Connect_Retry: 10 # seconds between reconnection attempts
Master_Log_File: mysql-bin.000008
Read_Master_Log_Pos: 444 # these two lines: the master binlog position the replica has requested up to
# 2. Relay log information on the replica (from relay_info)
Relay_Log_File: db01-relay-bin.000004
Relay_Log_Pos: 320 # these two lines: the relay log position replayed so far
# 3. Mapping between relay log and binlog
Relay_Master_Log_File: mysql-bin.000008
Exec_Master_Log_Pos: 444 # the replica's relay-bin.000004 position 320 corresponds to the master's mysql-bin.000008 position 444
# 4. Thread status
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_IO_Errno: 0 # I rarely rely on the error number alone; I start from the Last_IO_Error text to locate the problem
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
# 5. Replication filters
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
# 6. Replication lag in seconds
Seconds_Behind_Master: 0
# 7. Delayed-replica status
SQL_Delay: 0
SQL_Remaining_Delay: NULL
# 8. GTID replication
Retrieved_Gtid_Set:
Executed_Gtid_Set:
3. Common replication failures and how to handle them
3.1 What to monitor
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
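a. IO thread failures
1. Establishing the connection (Connecting state)
External causes: network unreachable, firewall still blocking the port
Internal causes:
Wrong user name or password
Wrong port or IP address
The master has reached its connection limit or run out of resources
Reproducing the failure:
- Change the password of repl on the master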
mysql> alter user repl@'10.0.0.%' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
- Restart the threads on the replica
stop slave;
start slave;
mysql> show slave status \G
Slave_IO_Running: Connecting
Last_IO_Errno: 1045
Last_IO_Error: error connecting to master 'repl@10.0.0.51:3307' - retry-time: 10 retries: 1
- General troubleshooting: try to log in to the master by hand with the replication account
[root@db01 data]# mysql -urepl -p123 -h 10.0.0.51 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'repl'@'db01' (using password: YES)
[root@db01 data]# mysql -urepl1 -p123456 -h 10.0.0.51 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'repl1'@'db01' (using password: YES)
[root@db01 data]# mysql -urepl1 -p123 -h 10.0.0.51 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'repl1'@'db01' (using password: YES)
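1045: wrong user name or password
2003: wrong address or port (113: the address is unreachable; 111: the connection was refused, i.e. the port is wrong)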
[root@db01 data]# mysql -urepl -p123456 -h 10.0.0.52 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on '10.0.0.52' (113)
[root@db01 data]# mysql -urepl -p123456 -h 10.0.0.51 -P3300
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on '10.0.0.51' (111)
- Fix
a. Stop replication on the replica
mysql> stop slave; # stops both the SQL and IO threads; to stop only one, append io_thread or sql_thread
b. Run CHANGE MASTER TO again
mysql> reset slave all;
mysql> CHANGE MASTER TO
MASTER_HOST='10.0.0.51',
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_PORT=3307,
MASTER_LOG_FILE='mysql-bin.000008',
MASTER_LOG_POS=687,
MASTER_CONNECT_RETRY=10;
c. Start the threads
mysql> start slave ;
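2. Registering the replica with the master (No state)
Cause: the master and the replica have the same server_id or server_uuid
Reproducing the failure:
- Set the master's server_id to the same value as the replica's.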
[root@db01 data]# mysql -S /tmp/mysql3307.sock -e "set global server_id=8"
[root@db01 data]# mysql -S /tmp/mysql3307.sock -e "select @@server_id"
+-------------+
| @@server_id |
+-------------+
| 8 |
+-------------+
[root@db01 data]# mysql -S /tmp/mysql3308.sock -e "select @@server_id"
+-------------+
| @@server_id |
+-------------+
| 8 |
+-------------+
- Restart the replica threads
mysql> stop slave;
mysql> start slave;
Slave_IO_Running: No
Last_IO_Errno: 1593
Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).
Fix:
[root@db01 data]# mysql -S /tmp/mysql3307.sock -e "set global server_id=7"
[root@db01 data]# mysql -S /tmp/mysql3308.sock -e "stop slave ; start slave;"
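3. Requesting binary logs (No state)
Causes:
a. The position was entered incorrectly during setup.
b. The master's binlog is damaged.
Reproducing the failure:
- Wrong position entered during setup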
mysql -S /tmp/mysql3308.sock
mysql> stop slave;
mysql> reset slave all;
CHANGE MASTER TO
MASTER_HOST='10.0.0.51',
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_PORT=3307,
MASTER_LOG_FILE='mysql-bin.000008',
MASTER_LOG_POS=1212,
MASTER_CONNECT_RETRY=10;
start slave;
Error message:
mysql> show slave status \G
Slave_IO_Running: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size'
Fix:
mysql -S /tmp/mysql3308.sock
mysql> stop slave;
mysql> reset slave all;
mysql> CHANGE MASTER TO
MASTER_HOST='10.0.0.51',
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_PORT=3307,
MASTER_LOG_FILE='mysql-bin.000008',
MASTER_LOG_POS=687, # corrected position
MASTER_CONNECT_RETRY=10;
mysql> start slave;
- Binlogs on the master deleted by mistake
mysql> show binary logs;
+------------------+-----------+
| Log_name | File_size |
+------------------+-----------+
| mysql-bin.000001 | 177 |
| mysql-bin.000002 | 464 |
| mysql-bin.000003 | 177 |
| mysql-bin.000004 | 177 |
| mysql-bin.000005 | 177 |
| mysql-bin.000006 | 154 |
| mysql-bin.000007 | 1111 |
| mysql-bin.000008 | 687 |
+------------------+-----------+
8 rows in set (0.00 sec)
mysql> reset master;
Query OK, 0 rows affected (0.01 sec)
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'could not find next log; the first event 'mysql-bin.000008' at 687, the last event read from '/data/3307/logs/mysql-bin.000008' at 123, the last byte read from '/data/3307/logs/mysql-bin.000008' at 687.'
Fix:
mysql -S /tmp/mysql3308.sock
stop slave;
reset slave all;
CHANGE MASTER TO
MASTER_HOST='10.0.0.51',
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_PORT=3307,
MASTER_LOG_FILE='mysql-bin.000001', # back to file 000001
MASTER_LOG_POS=154, # 154 is the starting position in every binlog file, so it does not need to be looked up
MASTER_CONNECT_RETRY=10;
start slave;
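The IO thread cases above can be summarised as a lookup from Last_IO_Errno to a likely cause. The following is a minimal sketch, assuming the status fields are available as a dict of strings keyed by the SHOW SLAVE STATUS field names; the hint texts paraphrase this section, and the function name diagnose_io_thread is hypothetical.

```python
# Error numbers and causes taken from the cases walked through in this section.
IO_ERROR_HINTS = {
    1045: "wrong user name or password for the replication account",
    2003: "wrong master address or port, or the network/firewall blocks the connection",
    1593: "master and replica have the same server_id",
    1236: "requested binlog file or position does not exist on the master",
}


def diagnose_io_thread(status):
    """Return a one-line diagnosis from Slave_IO_Running and Last_IO_Errno."""
    state = status.get("Slave_IO_Running") or "unknown"
    errno = int(status.get("Last_IO_Errno") or 0)
    if state == "Yes" and errno == 0:
        return "IO thread healthy"
    hint = IO_ERROR_HINTS.get(errno, "not covered here; read Last_IO_Error")
    return f"IO thread {state} (errno {errno}): {hint}"


print(diagnose_io_thread({"Slave_IO_Running": "Connecting", "Last_IO_Errno": "1045"}))
print(diagnose_io_thread({"Slave_IO_Running": "No", "Last_IO_Errno": "1236"}))
```

The lookup only narrows the search. The full Last_IO_Error text still has to be read, because error 1236 in particular covers both a wrong position and a missing binlog file.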
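b. SQL thread failures
Cause: replaying the relay log fails, which amounts to a SQL statement failing to execute
3. The relay log is damaged
1. The log cannot be replayed, which amounts to a SQL statement failing to execute
1) Differences in configuration, version, parameters, or SQL_MODE
Solution: keep hardware configuration, version, parameters, and SQL_MODE identical on master and replica
2) Constraint conflicts (primary key, unique key) or object existence (the object already exists or is missing)
Root cause: the replica was written to directly, or a crash left its data inconsistent
Prevention:
1> Forbid writes on the replica with read_only=1 and super_read_only=1, or route traffic through read/write-splitting middleware
2> Use a high-availability setup: semi-synchronous replication, MGR, and so on
If the problem occurs anyway:
Approach:
① Verify master/replica consistency with the PT tools (Percona Toolkit)
② Resynchronize the data based on the checksum results
③ Skip the error
Method 1:
stop slave;
set global sql_slave_skip_counter = 1;
# Moves the replication pointer forward by one event; repeat if several events fail.
start slave;
Method 2 (not recommended):
/etc/my.cnf
slave-skip-errors = 1032,1062,1007
Common error codes:
1007: database already exists
1032: row not found, so the DML statement cannot be applied
1062: duplicate entry for a primary key or unique key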
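One cautious way to use Method 1 is to allow the skip only for the error codes listed above and to stop for anything else. The sketch below merely builds the statements shown in Method 1; it does not talk to a server. The function name skip_statements and the allow-list default are assumptions made for this illustration.

```python
# Codes and meanings as listed under "Common error codes".
SKIPPABLE_SQL_ERRORS = {
    1007: "database already exists",
    1032: "row not found",
    1062: "duplicate key",
}


def skip_statements(last_sql_errno, allowed=SKIPPABLE_SQL_ERRORS):
    """Return the statements that skip one event, or raise for unlisted errors."""
    if last_sql_errno not in allowed:
        raise ValueError(f"error {last_sql_errno} is not on the allow-list; investigate instead")
    return [
        "STOP SLAVE;",
        "SET GLOBAL sql_slave_skip_counter = 1;",
        "START SLAVE;",
    ]


for statement in skip_statements(1062):
    print(statement)
```

Skipping hides the symptom only. The consistency check and resynchronization in steps ① and ② are still needed afterwards.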
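5. Replication lag
5.1 What is lag?
A change made on the master is applied on the replica only much later.
5.2 How to monitor it
5.2.1 Monitoring log transfer
Master:
mysql> show master status ;
Replica:
mysql> show slave status \G
Compare the master's Position with the replica's Read_Master_Log_Pos; if the master's value is far ahead of the replica's, the transfer is lagging
Example: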
[root@db01 ~]# mysql -S /tmp/mysql3307.sock -e "show master status;"
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 | 154 | | | |
+------------------+----------+--------------+------------------+-------------------+
[root@db01 ~]# mysql -S /tmp/mysql3308.sock -e "show slave status\G"|grep "Read_Master_Log_Pos"
Read_Master_Log_Pos: 154
[root@db01 ~]#
Warn when the gap exceeds 1,000 bytes; treat a gap above 10,000 bytes as critical.
5.2.2 Whether replay keeps up
[root@db01 ~]# mysql -S /tmp/mysql3307.sock -e "show master status;"
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 | 154 | | | |
+------------------+----------+--------------+------------------+-------------------+
[root@db01 ~]# mysql -S /tmp/mysql3308.sock -e "show slave status\G"|grep " Exec_Master_Log_Pos"
Exec_Master_Log_Pos: 154
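Compare these two values: the master's Position and the replica's Exec_Master_Log_Pos.
Warn when the gap exceeds 10,000 bytes (about 10 KB); treat a gap above 100,000 bytes (about 100 KB) as critical.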
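Both comparisons come down to subtracting two positions and checking the result against a threshold. The sketch below assumes the file names and positions have already been read from show master status and show slave status. Treating a file-name mismatch as critical is an assumption of this example, and the names byte_gap and severity are invented for it.

```python
def byte_gap(master_file, master_pos, replica_file, replica_pos):
    """Gap in bytes, or None when the replica is still on a different binlog file."""
    if master_file != replica_file:
        return None
    return master_pos - replica_pos


def severity(gap, warn, critical):
    """Classify a gap; a file mismatch (None) is treated as critical here."""
    if gap is None or gap > critical:
        return "critical"
    if gap > warn:
        return "warning"
    return "ok"


# Transfer check: master Position vs. Read_Master_Log_Pos (thresholds 1,000 / 10,000).
transfer = severity(byte_gap("mysql-bin.000001", 154, "mysql-bin.000001", 154), 1000, 10000)
# Replay check: master Position vs. Exec_Master_Log_Pos (thresholds 10,000 / 100,000).
replay = severity(byte_gap("mysql-bin.000001", 50154, "mysql-bin.000001", 154), 10000, 100000)
print(transfer, replay)
```

For the replay check, the file name to compare on the replica side is Relay_Master_Log_File, since that is the binlog file Exec_Master_Log_Pos refers to.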
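5.3 Causes of replication lag (a master usually has no more than five replicas)
5.3.1 External factors
Network
Hardware of master and replica (CPU, memory, IO)
Parameter settings
Too many replicas
And so on.
5.3.2 Master side
# The dump thread works serially.
Before 5.6, binlog could only be delivered one transaction at a time.
From 5.6 on, group commit was introduced,
so transactions are no longer committed one at a time.
# Binlog is not flushed to disk promptly.
Store the binlog on a dedicated SSD.
5.3.3 Replica side
# The IO thread writing the relay log to disk
SSD storage is generally recommended.
# The SQL thread
By default there is a single SQL thread, which can only work serially.
The master, by contrast, can run transactions concurrently.
Under high concurrency this causes significant lag.
Large transactions also cause significant lag.
Solutions:
1. 5.6: multi-threaded SQL replay, but only in parallel across different databases.
2. 5.7+: the MTS mechanism replays in parallel according to the group commit logical clock.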
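The logical-clock idea can be pictured with a deliberately simplified model. In a 5.7 binlog each transaction carries a last_committed and a sequence_number value, and transactions sharing the same last_committed were in the same commit group on the master. The sketch below only groups consecutive transactions by that value; the real scheduler is more permissive, and the function name parallel_batches and the sample values are assumptions of this illustration.

```python
from itertools import groupby


def parallel_batches(transactions):
    """Group (last_committed, sequence_number) pairs, given in binlog order.

    Each returned batch lists the sequence numbers that this simplified
    model would hand to worker threads at the same time.
    """
    batches = []
    for _, group in groupby(transactions, key=lambda txn: txn[0]):
        batches.append([sequence_number for _, sequence_number in group])
    return batches


# Hypothetical binlog: one lone transaction, then a group of three, then a group of two.
transactions = [(0, 1), (1, 2), (1, 3), (1, 4), (4, 5), (4, 6)]
batches = parallel_batches(transactions)
print(batches)
```

With a single SQL thread these six transactions are replayed one after another; in this model they need only three rounds.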
Human factors
Workloads with very high concurrency cause significant lag.
Large transactions cause significant lag.
Severe lock contention.