Pitfalls of configuring ddl_exist_errors in MySQL master-slave replication

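  • 1. Purpose
  • 2. Parameter configuration
  • 3. Simulated scenarios
    • Primary key conflict on the slave
    • Replication breaks when the slave drops a table that does not exist
  • 4. Further tests
  • 5. Conclusions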

Database version: 5.7.22-log MySQL Community
Parameter under test: ddl_exist_errors

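1. Purpose

Skip replication errors such as primary key conflicts, changes to rows that do not exist, or dropping a table that does not exist, so that they do not interrupt master-slave replication.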

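2. Parameter configuration

The slave_skip_errors option accepts four kinds of value: off, all, a list of error codes, and ddl_exist_errors. The default is off.
ddl_exist_errors covers 1007, 1008, 1050, 1051, 1054, 1060, 1061, 1068, 1091 and 1146 (the official documentation lists 1094, but in actual testing the code is 1091).

Error codes

For the official explanations, see the MySQL server error message reference.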

Code  Meaning                                           Server message
1007  Database already exists; create database fails    Can't create database '%s'; database exists
1008  Database does not exist; drop database fails      Can't drop database '%s'; database doesn't exist
1050  Table already exists; create table fails          Table '%s' already exists
1051  Table does not exist; drop table fails            ER_BAD_TABLE_ERROR / Unknown table '%s'
1054  Column does not exist                             Unknown column '%s' in '%s'
1060  Duplicate column name                             Duplicate column name '%s'
1061  Duplicate key name                                Duplicate key name '%s'
1068  More than one primary key defined                 Multiple primary key defined
1091  Cannot drop a column or key                       Can't DROP '%s'; check that column/key exists
1146  Table is missing                                  Table '%s.%s' doesn't exist
1053  Master server went down during replication        Server shutdown in progress
1062  Primary key conflict                              Duplicate entry '%s' for key %d
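
To make the relationship between the four kinds of value concrete, here is a minimal Python sketch of which error codes each setting would skip. It is only an illustrative model of the behaviour described above, not the server's own parsing logic; the names DDL_EXIST_ERRORS and is_skipped are hypothetical, and the code list uses 1091 as observed in the test rather than the documented 1094.

```python
# Codes the article lists for the ddl_exist_errors shorthand
# (assumption: 1091 as observed in testing, where the documentation says 1094).
DDL_EXIST_ERRORS = frozenset(
    {1007, 1008, 1050, 1051, 1054, 1060, 1061, 1068, 1091, 1146}
)


def is_skipped(setting: str, errno: int) -> bool:
    """Return True if `errno` would be skipped under this slave_skip_errors value."""
    value = setting.strip().lower()
    if value in ("", "off"):
        return False                      # default: nothing is skipped
    if value == "all":
        return True                       # every error is skipped
    if value == "ddl_exist_errors":
        return errno in DDL_EXIST_ERRORS  # shorthand for the list above
    # Otherwise treat the value as a comma-separated list of error codes.
    codes = {int(part) for part in value.split(",") if part.strip()}
    return errno in codes


print(is_skipped("ddl_exist_errors", 1051))  # True: dropping a missing table
print(is_skipped("ddl_exist_errors", 1062))  # False: primary key conflict is not in the shorthand
```

The second call shows why the primary key scenario in section 3 sets slave_skip_errors = 1062 explicitly: the shorthand alone does not cover it.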

3. Simulated scenarios

Primary key conflict on the slave

  1. Create the database and table on the master
create database db1;
use db1;
create table t1(id int primary key, name varchar(20));
  2. Insert data on the master
insert into t1 values(1,'name1');
insert into t1 values(2,'name2');
  3. Insert a row directly on the slave
use db1
insert into t1 values(3,'name3');
  4. On the master, insert the same row that was inserted on the slave
use db1
insert into t1 values(3,'name3');
  5. Check show slave status on the slave
    Replication has stopped because of the primary key conflict.
[root@localhost] {19:46:47} (db1) [5]> show slave status \G
*************************** 1. row ***************************
              Slave_IO_State: Waiting for master to send event
                 Master_Host: 172.16.4.227
                 Master_User: repl_user
                 Master_Port: 3339
               Connect_Retry: 60
             Master_Log_File: mysql_bin.000001
         Read_Master_Log_Pos: 1448
              Relay_Log_File: relay.000002
               Relay_Log_Pos: 1345
       Relay_Master_Log_File: mysql_bin.000001
            Slave_IO_Running: Yes
           Slave_SQL_Running: No
             Replicate_Do_DB:
         Replicate_Ignore_DB:
          Replicate_Do_Table:
      Replicate_Ignore_Table:
     Replicate_Wild_Do_Table:
 Replicate_Wild_Ignore_Table:
                  Last_Errno: 1062
                  Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '3e76c54a-99ae-11ea-9402-00163f00ad67:5' at master log mysql_bin.000001, end_log_pos 1417. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                Skip_Counter: 0
         Exec_Master_Log_Pos: 1132
             Relay_Log_Space: 1858
             Until_Condition: None
              Until_Log_File:
               Until_Log_Pos: 0
          Master_SSL_Allowed: No
          Master_SSL_CA_File:
          Master_SSL_CA_Path:
             Master_SSL_Cert:
           Master_SSL_Cipher:
              Master_SSL_Key:
       Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
               Last_IO_Errno: 0
               Last_IO_Error:
              Last_SQL_Errno: 1062
              Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '3e76c54a-99ae-11ea-9402-00163f00ad67:5' at master log mysql_bin.000001, end_log_pos 1417. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
 Replicate_Ignore_Server_Ids:
            Master_Server_Id: 1007566
                 Master_UUID: 3e76c54a-99ae-11ea-9402-00163f00ad67
            Master_Info_File: mysql.slave_master_info
                   SQL_Delay: 0
         SQL_Remaining_Delay: NULL
     Slave_SQL_Running_State:
          Master_Retry_Count: 86400
                 Master_Bind:
     Last_IO_Error_Timestamp:
    Last_SQL_Error_Timestamp: 200519 19:47:57
              Master_SSL_Crl:
          Master_SSL_Crlpath:
          Retrieved_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:1-5
           Executed_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:1-4,
8fcbcabc-99ae-11ea-b785-00163e094944:1
               Auto_Position: 1
        Replicate_Rewrite_DB:
                Channel_Name:
          Master_TLS_Version:
1 row in set (0.00 sec)
  6. Configure the slave to skip the error
    vim /etc/mysql57/my.cnf
slave_skip_errors = 1062
  7. Restart the database
systemctl restart mysql57@3339.service
  8. Check show slave status
    Replication is running normally again
*************************** 1. row ***************************
              Slave_IO_State: Waiting for master to send event
                 Master_Host: 172.16.4.227
                 Master_User: repl_user
                 Master_Port: 3339
               Connect_Retry: 60
             Master_Log_File: mysql_bin.000001
         Read_Master_Log_Pos: 1448
              Relay_Log_File: relay.000004
               Relay_Log_Pos: 730
       Relay_Master_Log_File: mysql_bin.000001
            Slave_IO_Running: Yes
           Slave_SQL_Running: Yes
             Replicate_Do_DB:
         Replicate_Ignore_DB:
          Replicate_Do_Table:
      Replicate_Ignore_Table:
     Replicate_Wild_Do_Table:
 Replicate_Wild_Ignore_Table:
                  Last_Errno: 0
                  Last_Error:
                Skip_Counter: 0
         Exec_Master_Log_Pos: 1448
             Relay_Log_Space: 927
             Until_Condition: None
              Until_Log_File:
               Until_Log_Pos: 0
          Master_SSL_Allowed: No
          Master_SSL_CA_File:
          Master_SSL_CA_Path:
             Master_SSL_Cert:
           Master_SSL_Cipher:
              Master_SSL_Key:
       Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
               Last_IO_Errno: 0
               Last_IO_Error:
              Last_SQL_Errno: 0
              Last_SQL_Error:
 Replicate_Ignore_Server_Ids:
            Master_Server_Id: 1007566
                 Master_UUID: 3e76c54a-99ae-11ea-9402-00163f00ad67
            Master_Info_File: mysql.slave_master_info
                   SQL_Delay: 0
         SQL_Remaining_Delay: NULL
     Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
          Master_Retry_Count: 86400
                 Master_Bind:
     Last_IO_Error_Timestamp:
    Last_SQL_Error_Timestamp:
              Master_SSL_Crl:
          Master_SSL_Crlpath:
          Retrieved_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:5
           Executed_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:1-5,
8fcbcabc-99ae-11ea-b785-00163e094944:1
               Auto_Position: 1
        Replicate_Rewrite_DB:
                Channel_Name:
          Master_TLS_Version:
1 row in set (0.00 sec)

  9. Insert a row on the slave
insert into t1 values(5,'name555');
  10. On the master, insert a row with the same primary key as the slave's row but a different name value
insert into t1 values(5,'name5');
  11. Check the data on the master
[root@localhost] {20:03:36} (db1) [20]> select * from t1 where id=5;
+----+-------+
| id | name  |
+----+-------+
|  5 | name5 |
+----+-------+
1 row in set (0.00 sec)
  12. Check the data on the slave
[root@localhost] {20:03:39} (db1) [16]> select * from t1 where id=5;
+----+---------+
| id | name    |
+----+---------+
|  5 | name555 |
+----+---------+
1 row in set (0.01 sec)
  13. Replication status shows no errors; show slave status shows that the slave has executed the same transactions as the master
Retrieved_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:5-8
Executed_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:1-8,
8fcbcabc-99ae-11ea-b785-00163e094944:1-3
  14. However, the master and slave data are now inconsistent.
  15. Update the inconsistent row on the master
update t1 set name='name123' where id=5;
  16. Check the data on both master and slave: it is consistent again
[root@localhost] {20:10:29} (db1) [24]> select * from t1 where id=5;
+----+---------+
| id | name    |
+----+---------+
|  5 | name123 |
+----+---------+
1 row in set (0.00 sec)
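
The sequence above can be mimicked with a toy model. The Python sketch below treats each table as a plain dict keyed by primary key; DuplicateKey, apply_insert and replicate_insert are hypothetical names, and the model deliberately ignores binlog formats, GTIDs and everything else the real applier does. It only shows why skipping error 1062 leaves the rows different until a later update overwrites them.

```python
class DuplicateKey(Exception):
    """Stand-in for MySQL error 1062 (duplicate entry for a key)."""
    errno = 1062


def apply_insert(table: dict, row_id: int, name: str) -> None:
    """Insert a row, failing like a primary key conflict if the id exists."""
    if row_id in table:
        raise DuplicateKey(row_id)
    table[row_id] = name


def replicate_insert(slave: dict, row_id: int, name: str,
                     skip_errors=frozenset()) -> None:
    """Apply a master insert on the slave, swallowing skipped error codes."""
    try:
        apply_insert(slave, row_id, name)
    except DuplicateKey as exc:
        if exc.errno not in skip_errors:
            raise  # in the real setup the SQL thread would stop here
        # Skipped: the slave silently keeps its own version of the row.


master, slave = {}, {}
apply_insert(slave, 5, "name555")    # step 9: row written on the slave
apply_insert(master, 5, "name5")     # step 10: same key on the master
replicate_insert(slave, 5, "name5", skip_errors={1062})
diverged = master[5] != slave[5]     # rows differ, yet no error was raised

# Step 15: the master updates the row; the change is applied by primary key.
master[5] = "name123"
slave[5] = "name123"
print(diverged, master == slave)
```

The model makes the risk visible: the skipped insert produces no error and no alarm, so the divergence is only repaired if the master happens to rewrite the same row.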

Replication breaks when the slave drops a table that does not exist

  1. Create table t2 on the master
use db1
create table t2 (id int primary key , name varchar(20));
  2. Drop table t2 on the slave
use db1
drop table t2;
  3. Drop t2 on the master
use db1
drop table t2;
  4. On the slave, replication has stopped
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.4.227
                  Master_User: repl_user
                  Master_Port: 3339
                Connect_Retry: 60
              Master_Log_File: mysql_bin.000001
          Read_Master_Log_Pos: 3612
               Relay_Log_File: relay.000006
                Relay_Log_Pos: 1114
        Relay_Master_Log_File: mysql_bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1051
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '3e76c54a-99ae-11ea-9402-00163f00ad67:13' at master log mysql_bin.000001, end_log_pos 3612. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 3434
              Relay_Log_Space: 1489
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1051
               Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '3e76c54a-99ae-11ea-9402-00163f00ad67:13' at master log mysql_bin.000001, end_log_pos 3612. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1007566
                  Master_UUID: 3e76c54a-99ae-11ea-9402-00163f00ad67
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 200519 20:19:54
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:10-13
            Executed_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:1-12,
8fcbcabc-99ae-11ea-b785-00163e094944:1-4
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)
  5. Configure the slave to skip the error and restart the database
slave_skip_errors = 1051
  6. Check again: replication is running normally, but the retrieved and executed GTID sets do not match
Retrieved_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:13
Executed_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:1-12,
8fcbcabc-99ae-11ea-b785-00163e094944:1-4
Auto_Position: 1
  7. Insert a row on the master
insert into t1 values(8,'name8');
  8. Check replication. Unlike the primary key conflict, for the missing-table error the transaction that raised it, 3e76c54a-99ae-11ea-9402-00163f00ad67:13, is left out of Executed_Gtid_Set.
Retrieved_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:13-14
Executed_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:1-12:14,
8fcbcabc-99ae-11ea-b785-00163e094944:1-4
  9. On the master, purge the binlog that contains 3e76c54a-99ae-11ea-9402-00163f00ad67:13
[root@localhost] {20:24:42} (db1) [15]> flush logs;
Query OK, 0 rows affected (0.00 sec)

[root@localhost] {20:28:53} (db1) [16]> show binary logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql_bin.000001 |      3975 |
| mysql_bin.000002 |       194 |
+------------------+-----------+
2 rows in set (0.00 sec)

[root@localhost] {20:28:57} (db1) [17]> purge master logs to 'mysql_bin.000002';
  10. On the slave, remove the slave_skip_errors parameter and restart the database
#slave_skip_errors = 1051
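  11. Check replication: although the error was skipped earlier, it is reported again once slave_skip_errors is removed, because the slave does not record a transaction skipped this way among its executed transactions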
                Last_IO_Error:
               Last_SQL_Errno: 1051
               Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '3e76c54a-99ae-11ea-9402-00163f00ad67:13' at master log mysql_bin.000001, end_log_pos 3612. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1007566
                  Master_UUID: 3e76c54a-99ae-11ea-9402-00163f00ad67
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 200519 20:31:19
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:13
            Executed_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:1-12:14,
8fcbcabc-99ae-11ea-b785-00163e094944:1-4
                Auto_Position: 1
  12. On the slave, execute an empty transaction with the same GTID as the transaction that caused the replication error
stop slave;
set gtid_next='3e76c54a-99ae-11ea-9402-00163f00ad67:13';
begin;commit;
set gtid_next='AUTOMATIC';
start slave ;
  13. Replication is back to normal, and the slave's executed transactions now match the master's
# slave
Retrieved_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:13
Executed_Gtid_Set: 3e76c54a-99ae-11ea-9402-00163f00ad67:1-14,
# master

+------------------+----------+--------------+------------------+-------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                         |
+------------------+----------+--------------+------------------+-------------------------------------------+
| mysql_bin.000002 |      194 |              |                  | 3e76c54a-99ae-11ea-9402-00163f00ad67:1-14 |
+------------------+----------+--------------+------------------+-------------------------------------------+
1 row in set (0.00 sec)
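
The gap that step 8 points out can be computed rather than spotted by eye. The Python sketch below parses the textual GTID set format shown in the outputs above and subtracts one set from another, in the spirit of the server's GTID_SUBTRACT() function. The helper names parse_gtid_set and gtid_subtract are hypothetical, and the parser handles only the uuid:interval[:interval] form that appears in this article.

```python
def parse_gtid_set(text: str) -> dict[str, set[int]]:
    """Parse 'uuid:1-12:14,uuid2:1-4' into {uuid: {transaction numbers}}."""
    result: dict[str, set[int]] = {}
    for entry in text.replace("\n", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        uuid, *intervals = entry.split(":")
        numbers = result.setdefault(uuid, set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A bare number such as '14' is an interval of length one.
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def gtid_subtract(a: str, b: str) -> dict[str, set[int]]:
    """Transactions present in set `a` but absent from set `b`."""
    left, right = parse_gtid_set(a), parse_gtid_set(b)
    diff = {uuid: nums - right.get(uuid, set()) for uuid, nums in left.items()}
    return {uuid: nums for uuid, nums in diff.items() if nums}


# Values taken from the outputs above, before the empty transaction was injected.
MASTER_EXECUTED = "3e76c54a-99ae-11ea-9402-00163f00ad67:1-14"
SLAVE_EXECUTED = ("3e76c54a-99ae-11ea-9402-00163f00ad67:1-12:14,\n"
                  "8fcbcabc-99ae-11ea-b785-00163e094944:1-4")

print(gtid_subtract(MASTER_EXECUTED, SLAVE_EXECUTED))
# {'3e76c54a-99ae-11ea-9402-00163f00ad67': {13}}
```

A non-empty result while slave_skip_errors is active is a hint that some transaction was skipped without being recorded, which is exactly the case that needed the empty transaction in step 12.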

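4. Further tests

I also simulated the other ddl_exist_errors errors listed below. In every case the slave does not record the failing transaction among its executed transactions. If the master purges the binlog that contains such a transaction, the slave reports that the transaction cannot be found after the server restarts.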

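Code  Meaning                                           Server message
1007  Database already exists; create database fails    Can't create database '%s'; database exists
1008  Database does not exist; drop database fails      Can't drop database '%s'; database doesn't exist
1050  Table already exists; create table fails          Table '%s' already exists
1051  Table does not exist; drop table fails            ER_BAD_TABLE_ERROR / Unknown table '%s'
1054  Column does not exist                             Unknown column '%s' in '%s'
1060  Duplicate column name                             Duplicate column name '%s'
1061  Duplicate key name                                Duplicate key name '%s'
1068  More than one primary key defined                 Multiple primary key defined
1091  Cannot drop a column or key                       Can't DROP '%s'; check that column/key exists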

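5. Conclusions

1. Skipping primary key conflict errors can leave the master and slave data inconsistent. If the master later updates the affected row, the update is replicated to the slave and the row becomes consistent again.
2. Errors about a table or database existing or not existing are not recorded among the slave's executed transactions. If the master has purged the relevant binlog, then when the slave server restarts (stop slave / start slave alone does not trigger it) it reports that the transaction cannot be found (but the master has purged binary logs containing GTIDs that the slave requires.). Even with slave-skip-errors=all configured, and even though the error was skipped earlier, replication still breaks.

Other experimental conclusions
3. If the slave's table structure differs from the master's, for example the slave has extra or missing columns, then update, delete and insert statements do not affect replication whether or not they use the primary key id as the condition. No error is raised, but the master and slave data become inconsistent. If the table has no primary key, inserts still succeed, but update and delete operations may fail (for example when column names differ between master and slave).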
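
Conclusion 2 comes down to a subset check. The Python sketch below is a simplified model under stated assumptions: it considers only the master's own UUID, represents each GTID set as a set of transaction numbers, and assumes that purging mysql_bin.000001 removed transactions 1-14. The function name master_can_serve is hypothetical; the real server performs this check internally when a slave connects with auto-positioning.

```python
def master_can_serve(master_purged: set[int], slave_has: set[int]) -> bool:
    """Auto-positioning works only if every purged transaction is already on the slave."""
    return master_purged <= slave_has


purged = set(range(1, 15))               # assumption: purge removed transactions 1-14
slave = set(range(1, 13)) | {14}         # executed set 1-12:14, transaction 13 was skipped
print(master_can_serve(purged, slave))   # False: 13 is purged but still required

slave.add(13)                            # empty transaction run with gtid_next = ...:13
print(master_can_serve(purged, slave))   # True
```

This is why the skipped transaction has to be filled in on the slave (for example with an empty transaction, as in the second scenario) before the master purges its binlogs, rather than relying on slave_skip_errors to hide it.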
