昨天不少同学讨论《小心,前方有雷 —— sql_slave_skip_counter》,有说作者在玩文字游戏,扯了那么多sql_slave_skip_counter=1不还是跳过一个事务嘛。自己看了几遍原文,好像是那么回事,但又没明白slave_exec_mode参数如何影响。只能说一百个读者有一百种见解,甚至随着读者的切入点、知识的变化而改变。
计划用两篇文章写写跳过复制错误相关的三个参数sql_slave_skip_counter、slave_skip_errors、slave_exec_mode
一、基本环境
VMware10.0+CentOS6.9+MySQL5.7.19
ROLE | HOSTNAME | BASEDIR | DATADIR | IP | PORT |
Master | ZST1 | /usr/local/mysql | /data/mysql/mysql3306/data | 192.168.85.132 | 3306 |
Slave | ZST2 | /usr/local/mysql | /data/mysql/mysql3306/data | 192.168.85.133 | 3306 |
基于Row+Position搭建的一主一从异步复制结构:Master->{Slave}
二、sql_slave_skip_counter官方解释
https://dev.mysql.com/doc/refman/5.7/en/set-global-sql-slave-skip-counter.html
This statement skips the next N events from the master. This is useful for recovering from replication stops caused by a statement.SET GLOBAL sql_slave_skip_counter = N
When using this statement, it is important to understand that the binary log is actually organized as a sequence of groups known as event groups. Each event group consists of a sequence of events.
• For transactional tables, an event group corresponds to a transaction.
• For nontransactional tables, an event group corresponds to a single SQL statement.
When you use SET GLOBAL sql_slave_skip_counter to skip events and the result is in the middle of a group, the slave continues to skip events until it reaches the end of the group. Execution then starts with the next event group.
三、测试案例
从官方解释我们知道, sql_slave_skip_counter以event为单位skip,直到skip完第N个event所在的event group才停止。对于事务表,一个event group对应一个事务;对于非事务表,一个event group对应一条SQL语句。一个event group包含多个events。
这里我只针对显式事务模拟insert遇到Duplicate entry(1062错误),知道了问题本质,delete/update中的1032错误类似去分析
3.1、测试数据
主库创建一个事务表和一个非事务表,然后从库往各表写入id=1的记录
# 主库创建测试表 mydba@192.168.85.132,3306 [replcrash]> create table repl_innodb(id int primary key,name1 char(10),name2 char(10)) engine=innodb; mydba@192.168.85.132,3306 [replcrash]> create table repl_myisam(id int primary key,name1 char(10),name2 char(10)) engine=myisam; # 从库往测试表中添加数据,不记入binlog mydba@192.168.85.133,3306 [replcrash]> set sql_log_bin=0; mydba@192.168.85.133,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(1,'s1062-1','s1062-1'); mydba@192.168.85.133,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'s1062-1','s1062-1'); mydba@192.168.85.133,3306 [replcrash]> set sql_log_bin=1;
3.2、transactional tables
主库往事务表中添加数据
# 主库往事务表中添加数据 mydba@192.168.85.132,3306 [replcrash]> begin; mydba@192.168.85.132,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(1,'m1062-1','m1062-1'); mydba@192.168.85.132,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(2,'m1062-2','m1062-2'); mydba@192.168.85.132,3306 [replcrash]> commit; mydba@192.168.85.132,3306 [replcrash]> select * from repl_innodb; +----+---------+---------+ | id | name1 | name2 | +----+---------+---------+ | 1 | m1062-1 | m1062-1 | | 2 | m1062-2 | m1062-2 | +----+---------+---------+
很明显,从库先写入数据占用id=1,主库再写入数据,复制将主库id=1的写入记录传递到从库,造成从库key冲突(1062错误)
我们尝试使用sql_slave_skip_counter跳过错误(实际遇到1062写入key冲突,我们应该根据 Duplicate entry 删除从库对应记录)
# 从库跳过“1个”错误,并启动sql_thread mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=1; mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread; mydba@192.168.85.133,3306 [replcrash]> select * from repl_innodb; +----+---------+---------+ | id | name1 | name2 | +----+---------+---------+ | 1 | s1062-1 | s1062-1 | +----+---------+---------+
从库不仅跳过了id=1的记录,还跳过了id=2的记录
分析:主库上的begin..commit之间对事务表的操作记录为一个事务,对应一个event group。id=1应用于从库遇到Duplicate entry错误,我们使用sql_slave_skip_counter跳过这个event之后,还在此group内,需要继续跳过此group中的后续events。因此在从库不会有id=2的记录~
[root@ZST1 logs]# mysqlbinlog -v --base64-output=decode-rows mysql-bin.000125 --start-position=1869 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 1869 #171201 10:15:11 server id 1323306 end_log_pos 1934 CRC32 0x3a86cd44 Anonymous_GTID last_committed=5 sequence_number=6 rbr_only=yes /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/; # at 1934 #171201 10:14:43 server id 1323306 end_log_pos 2011 CRC32 0x83c239df Query thread_id=4 exec_time=0 error_code=0 SET TIMESTAMP=1512094483/*!*/; SET @@session.pseudo_thread_id=4/*!*/; SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/; SET @@session.sql_mode=1436549152/*!*/; SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/; /*!\C utf8 *//*!*/; SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/; SET @@session.lc_time_names=0/*!*/; SET @@session.collation_database=DEFAULT/*!*/; ==================== repl_innodb表写入id=1、2的记录,在一个事务中Start ==================== BEGIN /*!*/; # at 2011 #171201 10:14:43 server id 1323306 end_log_pos 2076 CRC32 0x0f3612fe Table_map: `replcrash`.`repl_innodb` mapped to number 263 # at 2076 #171201 10:14:43 server id 1323306 end_log_pos 2132 CRC32 0x01de5dbd Write_rows: table id 263 flags: STMT_END_F ### INSERT INTO `replcrash`.`repl_innodb` ### SET ### @1=1 ### @2='m1062-1' ### @3='m1062-1' # at 2132 #171201 10:14:50 server id 1323306 end_log_pos 2197 CRC32 0xf838b054 Table_map: `replcrash`.`repl_innodb` mapped to number 263 # at 2197 #171201 10:14:50 server id 1323306 end_log_pos 2253 CRC32 0xbd9ae02a Write_rows: table id 263 flags: STMT_END_F ### INSERT INTO `replcrash`.`repl_innodb` ### SET ### @1=2 ### @2='m1062-2' ### @3='m1062-2' # at 2253 #171201 10:15:11 server id 1323306 end_log_pos 2284 CRC32 0x0292df6a Xid = 60 COMMIT/*!*/; ==================== repl_innodb表写入id=1、2的记录,在一个事务中End ==================== SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/; DELIMITER ; # End of log file /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/; [root@ZST1 logs]#
3.3、nontransactional tables
主库往非事务表中添加数据
# 主库往非事务表中添加数据 mydba@192.168.85.132,3306 [replcrash]> begin; mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'m1062-1','m1062-1'); mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(2,'m1062-2','m1062-2'); mydba@192.168.85.132,3306 [replcrash]> commit; mydba@192.168.85.132,3306 [replcrash]> select * from repl_myisam; +----+---------+---------+ | id | name1 | name2 | +----+---------+---------+ | 1 | m1062-1 | m1062-1 | | 2 | m1062-2 | m1062-2 | +----+---------+---------+
同理,从库先写入数据占用id=1,主库再写入数据,复制将主库id=1的写入记录传递到从库,造成从库key冲突(1062错误)
我们尝试使用sql_slave_skip_counter跳过错误(实际遇到1062写入key冲突,我们应该根据 Duplicate entry 删除从库对应记录)
# 从库跳过“1个”错误,并启动sql_thread mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=1; mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread; mydba@192.168.85.133,3306 [replcrash]> select * from repl_myisam; +----+---------+---------+ | id | name1 | name2 | +----+---------+---------+ | 1 | s1062-1 | s1062-1 | | 2 | m1062-2 | m1062-2 | +----+---------+---------+
从库跳过了id=1的记录,但复制了id=2的记录
分析:主库上的begin..commit之间对非事务表的操作记录为多个事务,每一条SQL语句对应一个event group。id=1应用于从库遇到Duplicate entry错误,我们使用sql_slave_skip_counter跳过这个event之后,已经到了此group的末尾。SQL thread直接从下一个event group开始,这里就是repl_myisam.id=2的那条语句。因此在从库会有id=2的记录~
实际它在执行第一条insert语句后,从库就报1062错误;前面的transactional tables需要在事务commit后从库才报错
[root@ZST1 logs]# mysqlbinlog -v --base64-output=decode-rows mysql-bin.000125 --start-position=2284 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 2284 #171201 10:30:31 server id 1323306 end_log_pos 2349 CRC32 0x5d208979 Anonymous_GTID last_committed=6 sequence_number=7 rbr_only=yes /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/; # at 2349 #171201 10:30:31 server id 1323306 end_log_pos 2426 CRC32 0xe4ce4da8 Query thread_id=4 exec_time=0 error_code=0 SET TIMESTAMP=1512095431/*!*/; SET @@session.pseudo_thread_id=4/*!*/; SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/; SET @@session.sql_mode=1436549152/*!*/; SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/; /*!\C utf8 *//*!*/; SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/; SET @@session.lc_time_names=0/*!*/; SET @@session.collation_database=DEFAULT/*!*/; ==================== repl_myisam表写入id=1的记录Start ==================== BEGIN /*!*/; # at 2426 #171201 10:30:31 server id 1323306 end_log_pos 2491 CRC32 0x76a45e15 Table_map: `replcrash`.`repl_myisam` mapped to number 261 # at 2491 #171201 10:30:31 server id 1323306 end_log_pos 2547 CRC32 0xd187097a Write_rows: table id 261 flags: STMT_END_F ### INSERT INTO `replcrash`.`repl_myisam` ### SET ### @1=1 ### @2='m1062-1' ### @3='m1062-1' # at 2547 #171201 10:30:31 server id 1323306 end_log_pos 2625 CRC32 0xc8210551 Query thread_id=4 exec_time=0 error_code=0 SET TIMESTAMP=1512095431/*!*/; COMMIT /*!*/; # at 2625 ==================== repl_myisam表写入id=1的记录End ==================== #171201 10:30:44 server id 1323306 end_log_pos 2690 CRC32 0x22b268fd Anonymous_GTID last_committed=7 sequence_number=8 rbr_only=yes /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/; # at 2690 #171201 10:30:44 server id 1323306 end_log_pos 2767 CRC32 0x43061ce5 Query thread_id=4 exec_time=0 error_code=0 SET TIMESTAMP=1512095444/*!*/; ==================== repl_myisam表写入id=2的记录Start ==================== BEGIN /*!*/; # at 2767 #171201 10:30:44 server id 1323306 end_log_pos 2832 CRC32 0xe1c084b9 Table_map: `replcrash`.`repl_myisam` mapped to number 261 # at 2832 #171201 10:30:44 server id 1323306 end_log_pos 2888 CRC32 0x56bacb73 Write_rows: table id 261 flags: STMT_END_F ### INSERT INTO `replcrash`.`repl_myisam` ### SET ### @1=2 ### @2='m1062-2' ### @3='m1062-2' # at 2888 #171201 10:30:44 server id 1323306 end_log_pos 2966 CRC32 0x6527c3b6 Query thread_id=4 exec_time=0 error_code=0 SET TIMESTAMP=1512095444/*!*/; COMMIT /*!*/; ==================== repl_myisam表写入id=2的记录End ==================== SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/; DELIMITER ; # End of log file /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/; [root@ZST1 logs]#
3.4、一个事务中包含事务表和非事务表操作
为了方便,我将表中数据置为初始状态,主库两表为空,从库两表各有id=1的记录
主库往事务表和非事务表中添加数据
# 主库往事务表、非事务表中添加数据 mydba@192.168.85.132,3306 [replcrash]> begin; mydba@192.168.85.132,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(1,'m1062-1','m1062-1'); mydba@192.168.85.132,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(2,'m1062-2','m1062-2'); mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'m1062-1','m1062-1'); mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(2,'m1062-2','m1062-2'); mydba@192.168.85.132,3306 [replcrash]> commit; mydba@192.168.85.132,3306 [replcrash]> select * from repl_innodb; +----+---------+---------+ | id | name1 | name2 | +----+---------+---------+ | 1 | m1062-1 | m1062-1 | | 2 | m1062-2 | m1062-2 | +----+---------+---------+ mydba@192.168.85.132,3306 [replcrash]> select * from repl_myisam; +----+---------+---------+ | id | name1 | name2 | +----+---------+---------+ | 1 | m1062-1 | m1062-1 | | 2 | m1062-2 | m1062-2 | +----+---------+---------+
根据前面的分析,我们知道从库上的repl_innodb、repl_myisam表都存在key冲突(1062错误)
我们尝试使用sql_slave_skip_counter跳过错误(实际遇到1062写入key冲突,我们应该根据 Duplicate entry 删除从库对应记录)
# 从库跳过“1个”错误,并启动sql_thread mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=1; mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread; mydba@192.168.85.133,3306 [replcrash]> select * from repl_innodb; +----+---------+---------+ | id | name1 | name2 | +----+---------+---------+ | 1 | s1062-1 | s1062-1 | +----+---------+---------+ mydba@192.168.85.133,3306 [replcrash]> select * from repl_myisam; +----+---------+---------+ | id | name1 | name2 | +----+---------+---------+ | 1 | s1062-1 | s1062-1 | | 2 | m1062-2 | m1062-2 | +----+---------+---------+
从库repl_innodb表暂时没有操作;从库repl_myisam表跳过id=1的记录,复制了id=2的记录
注意:此时跳过的是replcrash.repl_myisam上的Duplicate entry错误,对于非事务表一条SQL语句对应一个event group,SQL thread直接从下一个event group开始,这里就是repl_myisam.id=2的那条语句。因此从库repl_myisam表会有id=2的记录~
紧接着复制又会报key冲突(1062错误),因为还有repl_innodb.id=1这个key,我们继续跳过
# 从库跳过“1个”错误,并启动sql_thread mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=1; mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread; mydba@192.168.85.133,3306 [replcrash]> select * from repl_innodb; +----+---------+---------+ | id | name1 | name2 | +----+---------+---------+ | 1 | s1062-1 | s1062-1 | +----+---------+---------+ mydba@192.168.85.133,3306 [replcrash]> select * from repl_myisam; +----+---------+---------+ | id | name1 | name2 | +----+---------+---------+ | 1 | s1062-1 | s1062-1 | | 2 | m1062-2 | m1062-2 | +----+---------+---------+
从库repl_innodb表跳过id=1的记录,还跳过了id=2的记录;从库repl_myisam表暂时没有操作
主库上执行的语句明明是先insert repl_innodb,再insert repl_myisam,为什么sql_slave_skip_counter是先跳过repl_myisam表上的错误,再跳过repl_innodb上的错误?
这就要从事务表和非事务表的区别去分析,主库显式的在一个事务中操作事务表+非事务表,实际上所有对事务表的操作是在同一个显式事务中;所有对非事务表的操作,每条SQL语句单独对应一个事务。因此主库上的操作可理解成下面操作:
开启显式事务1,往repl_innodb表写入id=1、2两条记录-->开启事务2,往repl_myisam表写入id=1记录,提交事务2-->开启事务3,往repl_myisam表写入id=2记录,提交事务3-->提交显式事务1
当事务2提交后,从库报repl_myisam上的Duplicate entry错误;我们跳过这个错误,当事务3提交后,从库写入repl_myisam.id=2的记录;当事务1提交后,从库报repl_innodb上的Duplicate entry错误;我们再跳过这个错误,复制就正常了~
我们看下对应的binlog
[root@ZST1 logs]# mysqlbinlog -v --base64-output=decode-rows mysql-bin.000125 --start-position=2966 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 2966 #171201 10:54:02 server id 1323306 end_log_pos 3031 CRC32 0x9a009a72 Anonymous_GTID last_committed=8 sequence_number=9 rbr_only=yes /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/; # at 3031 #171201 10:54:02 server id 1323306 end_log_pos 3108 CRC32 0x9738837a Query thread_id=4 exec_time=0 error_code=0 SET TIMESTAMP=1512096842/*!*/; SET @@session.pseudo_thread_id=4/*!*/; SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/; SET @@session.sql_mode=1436549152/*!*/; SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/; /*!\C utf8 *//*!*/; SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/; SET @@session.lc_time_names=0/*!*/; SET @@session.collation_database=DEFAULT/*!*/; ==================== repl_myisam表写入id=1的记录Start ==================== BEGIN /*!*/; # at 3108 #171201 10:54:02 server id 1323306 end_log_pos 3173 CRC32 0x8c4283c5 Table_map: `replcrash`.`repl_myisam` mapped to number 265 # at 3173 #171201 10:54:02 server id 1323306 end_log_pos 3229 CRC32 0xd8953aae Write_rows: table id 265 flags: STMT_END_F ### INSERT INTO `replcrash`.`repl_myisam` ### SET ### @1=1 ### @2='m1062-1' ### @3='m1062-1' # at 3229 #171201 10:54:02 server id 1323306 end_log_pos 3307 CRC32 0x218fdb23 Query thread_id=4 exec_time=0 error_code=0 SET TIMESTAMP=1512096842/*!*/; COMMIT /*!*/; # at 3307 ==================== repl_myisam表写入id=1的记录End ==================== #171201 10:54:36 server id 1323306 end_log_pos 3372 CRC32 0x0ba119ad Anonymous_GTID last_committed=9 sequence_number=10 rbr_only=yes /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/; # at 3372 #171201 10:54:36 server id 1323306 end_log_pos 3449 CRC32 0x9bcdeee5 Query thread_id=4 exec_time=0 error_code=0 SET TIMESTAMP=1512096876/*!*/; ==================== repl_myisam表写入id=2的记录Start ==================== BEGIN /*!*/; # at 3449 #171201 10:54:36 server id 1323306 end_log_pos 3514 CRC32 0xd52491e6 Table_map: `replcrash`.`repl_myisam` mapped to number 265 # at 3514 #171201 10:54:36 server id 1323306 end_log_pos 3570 CRC32 0x23bcd75d Write_rows: table id 265 flags: STMT_END_F ### INSERT INTO `replcrash`.`repl_myisam` ### SET ### @1=2 ### @2='m1062-2' ### @3='m1062-2' # at 3570 #171201 10:54:36 server id 1323306 end_log_pos 3648 CRC32 0x3ba9a1a1 Query thread_id=4 exec_time=0 error_code=0 SET TIMESTAMP=1512096876/*!*/; COMMIT /*!*/; # at 3648 ==================== repl_myisam表写入id=2的记录End ==================== #171201 10:54:41 server id 1323306 end_log_pos 3713 CRC32 0x122cdb79 Anonymous_GTID last_committed=10 sequence_number=11 rbr_only=yes /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/; # at 3713 #171201 10:53:22 server id 1323306 end_log_pos 3790 CRC32 0x68d45d7b Query thread_id=4 exec_time=0 error_code=0 SET TIMESTAMP=1512096802/*!*/; ==================== repl_innodb表写入id=1、2的记录,在一个事务中Start ==================== BEGIN /*!*/; # at 3790 #171201 10:53:22 server id 1323306 end_log_pos 3855 CRC32 0xf4359a8d Table_map: `replcrash`.`repl_innodb` mapped to number 264 # at 3855 #171201 10:53:22 server id 1323306 end_log_pos 3911 CRC32 0x9975aac8 Write_rows: table id 264 flags: STMT_END_F ### INSERT INTO `replcrash`.`repl_innodb` ### SET ### @1=1 ### @2='m1062-1' ### @3='m1062-1' # at 3911 #171201 10:53:30 server id 1323306 end_log_pos 3976 CRC32 0xc5ac7f71 Table_map: `replcrash`.`repl_innodb` mapped to number 264 # at 3976 #171201 10:53:30 server id 1323306 end_log_pos 4032 CRC32 0x1ad72c78 Write_rows: table id 264 flags: STMT_END_F ### INSERT INTO `replcrash`.`repl_innodb` ### SET ### @1=2 ### @2='m1062-2' ### @3='m1062-2' # at 4032 #171201 10:54:41 server id 1323306 end_log_pos 4063 CRC32 0x4f265b37 Xid = 83 COMMIT/*!*/; ==================== repl_innodb表写入id=1、2的记录,在一个事务中End ==================== SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/; DELIMITER ; # End of log file /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/; [root@ZST1 logs]#
可以看到binlog和我们的分析一致,也从侧面说明binlog是按事务提交顺序写入的(Undo按事务发生顺序写入)
3.5、N应该设多大
# 主库往非事务表中添加数据 mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'m1062-1','m1062-1');
我们习惯性在skip错误的时候,将N设成1。潜移默化地认为1就是跳过一个错误,管它一个event、一个event group、一条SQL语句、一个事务,反正它就是跳过去了。
根据官方解释,这条对非事务表的insert操作语句,对应一个event group,它里面实际有多个events,而不是只有一个event!
使用set global sql_slave_skip_counter=1,跳过一个event,由于它还在event group中,它会继续跳过此group中的后续events!如果只看表象的话,还真以为它在binlog中只有一个event(⊙_⊙)
可以用下面方法验证,它不只对应一个event
# 从库设置sql_slave_skip_counter mydba@192.168.85.133,3306 [replcrash]> stop slave sql_thread; mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=100; mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread; # 主库往非事务表中添加数据(主库空表,从库存在id=1的记录) mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'m1062-1','m1062-1'); # 从库查看Skip_Counter计数 mydba@192.168.85.133,3306 [replcrash]> pager grep Skip_Counter; mydba@192.168.85.133,3306 [replcrash]> show slave status\G Skip_Counter: 95
在从库设置跳过100个events,然后在主库执行这条语句,再到从库查看show slave status\G返回的Skip_Counter列,你会发现它并不是由100变成99,至于一条语句到底对应多少个events,得自行脑补●-●
不要以为在3.4中set global sql_slave_skip_counter=3;就能跳过repl_myisam.id=1、repl_myisam.id=2、repl_innodb.id=1对应的三条SQL语句
四、总结
写了那么多,感觉也在玩文字游戏。 sql_slave_skip_counter以event为单位skip,直到skip完第N个event所在的event group才停止。对于事务表,一个event group对应一个事务;对于非事务表,一个event group对应一条SQL语句。一个event group包含多个events。
delete在从库找不到对应行,sql_slave_skip_counter可能省事,但极有可能跳过其他events,导致主从数据不一致。对于1032、1062错误尽量修补数据,让复制进程在从库应用变更。