主从复制(上)

  1. 简介
    1.1 什么是复制?
    复制:是将一台MySQL实例(master),发生的DML、DDL等修改操作记录到binlog中,源源不断传输副本库,副本库应用日志,达到一个和主库数据接近一致的状态。
    1.2 应用场景
    a. 备份。
    b. 高可用。
    c. 读写分离
    d. 分布式架构
  2. 主从复制前提(搭建过程)
    2.1 2台以上数据库实例,server_id ,server_uuid
#启动多实例
[root@db01 oldguo]# systemctl start mysqld3307
[root@db01 oldguo]# systemctl start mysqld3308
[root@db01 oldguo]# systemctl start mysqld3309
[root@db01 oldguo]# netstat -tulnp
[root@db01 oldguo]# mysql -S /tmp/mysql3307.sock -e "select @@server_id;select @@server_uuid"
+-------------+
| @@server_id |
+-------------+
|           7 |
+-------------+
+--------------------------------------+
| @@server_uuid                        |
+--------------------------------------+
| d639b892-ba7b-11ea-9d00-000c295bb94f |
+--------------------------------------+
[root@db01 oldguo]# mysql -S /tmp/mysql3308.sock -e "select @@server_id;select @@server_uuid"
+-------------+
| @@server_id |
+-------------+
|           8 |
+-------------+
+--------------------------------------+
| @@server_uuid                        |
+--------------------------------------+
| d8e965c5-ba7b-11ea-9d1e-000c295bb94f |
+--------------------------------------+
[root@db01 oldguo]# mysql -S /tmp/mysql3309.sock -e "select @@server_id;select @@server_uuid"
+-------------+
| @@server_id |
+-------------+
|           9 |
+-------------+
+--------------------------------------+
| @@server_uuid                        |
+--------------------------------------+
| dc20d2d8-ba7b-11ea-9f57-000c295bb94f |
+--------------------------------------+
[root@db01 oldguo]# 

2.2 主库开启binlog

[root@db01 oldguo]# mysql -S /tmp/mysql3307.sock -e "select @@log_bin;"
+-----------+
| @@log_bin |
+-----------+
|         1 |
+-----------+

2.3 主库开启专门的复制用户

[root@db01 oldguo]# mysql -S /tmp/mysql3307.sock -e "grant replication slave on *.* to repl@'10.0.0.%' identified by '123'"

2.4 “补课”: 备份主库恢复至从库(主库7从库89)

#全备主库数据
root@db01 oldguo]# mysqldump -S /tmp/mysql3307.sock -A --master-data=2 >/tmp/full.sql
#在从库上恢复
[root@db01 oldguo]# mysql -S /tmp/mysql3308.sock -e "source /tmp/full.sql"
[root@db01 oldguo]# mysql -S /tmp/mysql3309.sock -e "source /tmp/full.sql"

2.5 开启主从
a.

mysql> help change master to #获取配置信息
CHANGE MASTER TO
  MASTER_HOST='master2.example.com',
  MASTER_USER='replication',
  MASTER_PASSWORD='password',
  MASTER_PORT=3306,
  MASTER_LOG_FILE='master2-bin.001',
  MASTER_LOG_POS=4,
  MASTER_CONNECT_RETRY=10;
[root@db01 data]# grep '\-- \CHANGE' /tmp/full.sql 
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=21103277;
#分别在从库中执行
[root@db01 /data 15:45:36]# mysql -S /tmp/mysql3308.sock 
CHANGE MASTER TO
  MASTER_HOST='10.0.0.51',
  MASTER_USER='repl',
  MASTER_PASSWORD='123',
  MASTER_PORT=3307,
  MASTER_LOG_FILE='mysql-bin.000002',
  MASTER_LOG_POS=21103277,
  MASTER_CONNECT_RETRY=10;

b. 开启复制线程

mysql> start slave;

c. 检测状态

[root@db01 data]# mysql -S /tmp/mysql3308.sock -e "show slave status\G"|grep "Running:"
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[root@db01 data]# 

[root@db01 data]# mysql -S /tmp/mysql3309.sock -e "show slave status\G"|grep "Running:"
             Slave_IO_Running: Yes
             Slave_SQL_Running: Yes

3. 传统主从复制工作原理

3.1 涉及到的文件

主库:

binlog文件: mysql-bin.000001

mysql> select @@log_bin_basename;
+---------------------------+
| @@log_bin_basename        |
+---------------------------+
| /data/3307/logs/mysql-bin |
+---------------------------+

从库:

a. relaylog文件: 存储接收到的binlog日志
存储位置: db01-relay-bin.000001

mysql> select @@relay_log_basename;
+--------------------------------+
| @@relay_log_basename           |
+--------------------------------+
| /data/3307/data/db01-relay-bin |
+--------------------------------+

b. master_info 文件:用来存储主库相关的信息:
存储位置:/data/3308/data/master.info

mysql> select @@master_info_repository;
+--------------------------+
| @@master_info_repository |
+--------------------------+
| FILE                     |
+--------------------------+
server_id,server_uuid,user,password,host,port,binlog位置点。

b. relay_info 文件: 记录relaylog回放到的位置点。
存储位置:/data/3308/data/relay-log.info

mysql> select @@relay_log_info_repository;
+-----------------------------+
| @@relay_log_info_repository |
+-----------------------------+
| FILE                        |
+-----------------------------+
1 row in set (0.00 sec)

3.2 涉及到的线程

主库:

binlog_dump/binlog_dump_gtid
作用: 监控binlog状态,投递binlog给从库。

mysql> show processlist;
+----+------+------------+------+-------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host       | db   | Command     | Time | State                                                         | Info             |
+----+------+------------+------+-------------+------+---------------------------------------------------------------+------------------+
|  6 | root | localhost  | NULL | Sleep       | 9041 |                                                               | NULL             |
|  7 | repl | db01:34702 | NULL | Binlog Dump | 8856 | Master has sent all binlog to slave; waiting for more updates | NULL             |
|  8 | repl | db01:34704 | NULL | Binlog Dump | 8844 | Master has sent all binlog to slave; waiting for more updates | NULL             |
| 10 | root | localhost  | NULL | Query       |    0 | starting                                                      | show processlist |
+----+------+------------+------+-------------+------+---------------------------------------------------------------+------------------+
4 rows in set (0.00 sec)

从库:

[root@db01 data]# mysql -S /tmp/mysql3309.sock -e "show slave status\G"|grep "Running:"
             Slave_IO_Running: Yes
             Slave_SQL_Running: Yes

a. IO线程 :
作用:连接主库、与binlog_dump交互、接收日志、存储日志等。

b. SQL线程
作用:回放relaylog
1.原理:
从库:
a.从库执行 change master to 所用信息被保存到master_info
b.从库执行 start slave,启动IO和SQL线程
c.从库IO线程工作,获取master_info信息,生成指针(MI)
d.从库IO线程,连接主库
f.IO线程和binlog_dump交互,验证server_id、server_uuid、clock
g.IO通过最新MI指针中的binlog位置点,向bing_dump请求更新日志
i.从库IO线程接收主库binlog_dump发送的新的日志,MI指针自动更新,并写入到master_info中
j.IO线程最终坏将接收到的binlog,写入到rely-bin中日志
k.SQL线程,获取relay-log.info信息(上次回放到的位置点),生成一个RI指针,与relay-bin中继日志中的pos进行对比
L.如果有新的中继日志生成,就进行回放,回放完成更新RI指针,并更新relay-log.info

主库:
e.主库连接层,接收请求,验证用户,权限,并生成binlog_dump线程
h.binlog_dump线程一直监控着binglog状态,有新的日志就返回新的binlog给从库IO线程

主从复制原理图

2.主从复制监控
2.1监控方法
a.主库做个修改操作,看看从库有没有做操作
b.通过相关命令监控
c.通过第三方工具监控

2.2通过相关命令监控
从库

[root@db01 /data 12:49:33]# vim 3308/my.cnf 
report_host=10.0.0.51:3308  #让主库能发现自己的IP地址和端口
[root@db01 /data 12:52:59]# systemctl restart mysqld3308.service 

a.主库:

mysql>show processlist;  #查看所有从库的连接
mysql> show slave hosts;  #
+-----------+----------------+------+-----------+--------------------------------------+
| Server_id | Host           | Port | Master_id | Slave_UUID                           |
+-----------+----------------+------+-----------+--------------------------------------+
|         9 | 10.0.0.51:3309 | 3309 |         7 | e54df4b7-c5a2-11ea-8530-000c29b9d34b |
|         8 | 10.0.0.51:3308 | 3308 |         7 | c0f75dfe-c5a1-11ea-b643-000c29b9d34b |
+-----------+----------------+------+-----------+--------------------------------------+
2 rows in set (0.00 sec)

b.从库监控:

mysql> show slave status \G

# 1. 主库相关信息(来自于master_info)
Master_Host:10.0.0.51   #主库地址
Master_User: repl       #主库的复制用户
Master_Port: 3307       #主库的端口号
Connect_Retry: 10       #重连间隔秒数
Master_Log_File: mysql-bin.000008
Read_Master_Log_Pos: 444  #这两条是从库请求到主库的位置点

# 2. 从库relay-log相关信息(relay_info)
Relay_Log_File: db01-relay-bin.000004 
Relay_Log_Pos: 320      #这两条是上次回放到的位置点

#3. relaylog和binlog的对应关系
Relay_Master_Log_File: mysql-bin.000008
Exec_Master_Log_Pos: 444   #从库的bin.00004 320 对应的主库的000008 444位置点

# 4. 线程状态有关的信息    
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_IO_Errno: 0  #我一般不会选择这些,我的解决思路是从last_IO_Error出发找打问题所在
Last_IO_Error: 
Last_SQL_Errno: 0
Last_SQL_Error: 

#5. 过滤复制相关信息
Replicate_Do_DB: 
Replicate_Ignore_DB: 
Replicate_Do_Table: 
Replicate_Ignore_Table: 
Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 

#6. 主从延时的时间
Seconds_Behind_Master: 0

#7. 延时从库状态信息
SQL_Delay: 0
SQL_Remaining_Delay: NULL

# 8. GTID复制相关   
Retrieved_Gtid_Set: 
Executed_Gtid_Set:

3.主从常见故障分析及处理思路
3.1如何监控
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:

a. IO线程故障

1.建立连接(connecting)

外部因素: 网络不通、防火墙未关闭
内部因素: 
        用户、密码错误
        port、IP错误
        主库连接数满了,资源耗尽

故障重现:

  1. 主库修改repl的密码
mysql> alter user repl@'10.0.0.%' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
  1. 从库重启线程
stop slave;
start slave;
mysql > show mysql status \G;
Slave_IO_Running: Connecting
Last_IO_Errno: 1045
Last_IO_Error: error connecting to master '[email protected]:3307' - retry-time: 10  retries: 1
  1. 通用排查方法
[root@db01 data]# mysql -urepl -p123 -h 10.0.0.51 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'repl'@'db01' (using password: YES)

[root@db01 data]# mysql -urepl1 -p123456 -h 10.0.0.51 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'repl1'@'db01' (using password: YES)

[root@db01 data]# mysql -urepl1 -p123 -h 10.0.0.51 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'repl1'@'db01' (using password: YES)

1045  用户名或密码错误
2003  地址或端口错误  113--地址错误 111---端口错误
[root@db01 data]# mysql -urepl -p123456  -h 10.0.0.52 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on '10.0.0.52' (113)

[root@db01 data]# mysql -urepl -p123456  -h 10.0.0.51 -P3300
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on '10.0.0.51' (111)
  1. 解决
a.停从库 
mysql> stop slave; (SQL线程和IO线程都停了要想停一个后面要加参数 io_thread;)
b. 重新 change master to 
mysql>reset slave all;
mysql> CHANGE MASTER TO
  MASTER_HOST='10.0.0.51',
  MASTER_USER='repl',
  MASTER_PASSWORD='123456',
  MASTER_PORT=3307,
  MASTER_LOG_FILE='mysql-bin.000008',
  MASTER_LOG_POS=687,
  MASTER_CONNECT_RETRY=10;

c.  启动线程
mysql> start slave ;

2. 注册从库到主库 (NO状态)

原因: 主从之间的server_id和server_uuid 重复

故障重现:

  1. 修改主库server_id与从库一致。
[root@db01 data]# mysql -S /tmp/mysql3307.sock -e "set global server_id=8"
[root@db01 data]# mysql -S /tmp/mysql3307.sock -e "select @@server_id"
+-------------+
| @@server_id |
+-------------+
|           8 |
+-------------+
[root@db01 data]# mysql -S /tmp/mysql3308.sock -e "select @@server_id"
+-------------+
| @@server_id |
+-------------+
|           8 |
+-------------+
  1. 重启从库线程
mysql> stop slave; 
mysql> start slave;

Slave_IO_Running: No
Last_IO_Errno: 1593
Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).

解决方法:

[root@db01 data]# mysql -S /tmp/mysql3307.sock -e "set global server_id=7"
[root@db01 data]# mysql -S /tmp/mysql3308.sock -e "stop slave ; start slave;"

3. 请求二进制日志(NO 状态)

原因:
a. 搭建时位置点写错了。
b. 主库的日志损坏。

故障重现:

  1. 搭建时位置点写错了?
mysql -S /tmp/mysql3308.sock 
mysql> stop slave;
mysql> reset slave all; 
CHANGE MASTER TO
  MASTER_HOST='10.0.0.51',
  MASTER_USER='repl',
  MASTER_PASSWORD='123456',
  MASTER_PORT=3307,
  MASTER_LOG_FILE='mysql-bin.000008',
  MASTER_LOG_POS=1212,
  MASTER_CONNECT_RETRY=10;
start slave;

报错信息:

mysql> show slave status;
Slave_IO_Running: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size'

处理方法:

mysql -S /tmp/mysql3308.sock 
mysql> stop slave;
mysql> reset slave all; 
mysql> CHANGE MASTER TO
  MASTER_HOST='10.0.0.51',
  MASTER_USER='repl',
  MASTER_PASSWORD='123456',
  MASTER_PORT=3307,
  MASTER_LOG_FILE='mysql-bin.000008',
  MASTER_LOG_POS=687,#修改位置点
  MASTER_CONNECT_RETRY=10;
mysql> start slave;
  1. 主库日志被误删除
mysql> show binary logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.000001 |       177 |
| mysql-bin.000002 |       464 |
| mysql-bin.000003 |       177 |
| mysql-bin.000004 |       177 |
| mysql-bin.000005 |       177 |
| mysql-bin.000006 |       154 |
| mysql-bin.000007 |      1111 |
| mysql-bin.000008 |       687 |
+------------------+-----------+
8 rows in set (0.00 sec)
mysql> reset master;
Query OK, 0 rows affected (0.01 sec)

Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'could not find next log; the first event 'mysql-bin.000008' at 687, the last event read from '/data/3307/logs/mysql-bin.000008' at 123, the last byte read from '/data/3307/logs/mysql-bin.000008' at 687.'

处理方法:

mysql -S /tmp/mysql3308.sock 
stop slave;
reset slave all; 
CHANGE MASTER TO
  MASTER_HOST='10.0.0.51',
  MASTER_USER='repl',
  MASTER_PASSWORD='123456',
  MASTER_PORT=3307,
  MASTER_LOG_FILE='mysql-bin.000001', #000001
  MASTER_LOG_POS=154,#154是起点,每一个日志都有,不用复制
  MASTER_CONNECT_RETRY=10;
start slave;

b SQL线程故障

原因:回放日志出现问题,就相当于执行SQL 语句出现问题

3.relay-log 损坏
1.无法回放日志,就相当于执行SQL语句出现问题
1)配置、版本、参数、SQL_MODE
解决方案:硬件配置一致、版本一致、参数一致、SQL_MODE
2)约束冲突(PK、唯一键)、对象的存在性
原因在于:从库发生写入了,或者经历过宕机导致数据不一致
防范方案:
    1>从库禁止写入 read_only=1  innodb_readd_only=1 或者使用读写分离中间建
    2>高可用结构,半同步,MGR等
如果没有防范出现此类问题:
解决思路:
   ①PT工具校验主从一致性
   ②通过校验信息进行同步数据
   ③跳过错误

方法一:

stop slave; 
set global sql_slave_skip_counter = 1;
#将同步指针向下移动一个,如果多次不同步,可以重复操作。
start slave;

方法二:不推荐

/etc/my.cnf
slave-skip-errors = 1032,1062,1007
常见错误代码:
1007:对象已存在
1032:无法执行DML
1062:主键冲突,或约束冲突
  1. 主从复制延时
    5.1 什么是延时?
    主库做的事,从库好久才做。
    5.2 如何监控?
    5.2.1 传输过程监控
    主库:
    mysql> show master status ;

从库:

    mysql> show slave status \G 

对比的是Read_Master_Log_Pos的值,如果主库的远大于从库的值就是延时了
例子:

[root@db01 ~]# mysql -S /tmp/mysql3307.sock -e "show master status;"
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 |      154 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
[root@db01 ~]# mysql -S /tmp/mysql3308.sock -e "show slave status\G"|grep "Read_Master_Log_Pos"
Read_Master_Log_Pos: 154
[root@db01 ~]# 
1000以上告警,10000以上紧急

5.2.2 回放是否及时

[root@db01 ~]# mysql -S /tmp/mysql3307.sock -e "show master status;"
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 |      154 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
[root@db01 ~]# mysql -S /tmp/mysql3308.sock -e "show slave status\G"|grep " Exec_Master_Log_Pos"
Exec_Master_Log_Pos: 154
对比这两个值
10000字节=100K以上告警,100000以上紧急

5.3 主从延时的原因 (从库一般都不会超过五个)
5.3.1 外部因素

网络 
主从配置(cpu\mem\io)
参数配置
等。
从库太多

5.3.2 主库方面

# dump线程是串行工作的模式。
5.6以前的版本,只能一个一个事务投递binlog。
5.6+版本以后,出现了group commit(按组提交).0
再不是一个一个提交了;
# binlog日志落地不及时。
采用ssd专门存储binlog

5.3.3 从库问题

# IO落地relaylog 
一般建议采用SSD
# SQL线程
默认情况只有一个SQL线程,只能串行工作。
主库可以并发事务。
高并发场景下,会造成较高延时。
出现大事务的时候,都会造成较高延时。

解决方案:

    1. 5.6版本,多SQL线程回放功能。但是只能根据不同库(database)进行回放。
    2. 5.7+版本中,加入MTS机制。能够按照group commit 的逻辑时钟,进行并行回放。

人的问题
高并发场景下,会造成较高延时。
出现大事务的时候,都会造成较高延时。
锁问题严重。

你可能感兴趣的:(主从复制(上))