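Reference: https://blog.csdn.net/sj349781478/article/details/79492895
After the master fails, MHA automatically promotes the slave with the most up-to-date binary log to take over as master. Replication errors are a common problem during failover. When the database is small, it is enough to take a mysqldump backup and re-import it, but our production databases are 150G-200G, and relying on that method alone takes too long and costs too much. Below are some of the problems I ran into and how I solved them:
Problem 1: duplicate primary key. The slave already has the record, and the same record is then inserted on the master.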
Last_Error: Could not execute Write_rows event on table discuz.pre_common_syscache; Duplicate entry 'historyposts' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000002, end_log_pos 3157
mysql> show slave status\G
Solution: delete the row with the duplicate primary key on the slave
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| discuz |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.00 sec)
mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> use discuz
Database changed
mysql> desc discuz.pre_common_syscache;
+----------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------------------+------+-----+---------+-------+
| cname | varchar(32) | NO | PRI | NULL | |
| ctype | tinyint(3) unsigned | NO | | NULL | |
| dateline | int(10) unsigned | NO | | NULL | |
| data | mediumblob | NO | | NULL | |
+----------+---------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
# Delete the row with the duplicate primary key named in the error message
mysql> delete from discuz.pre_common_syscache where cname='historyposts';
Query OK, 1 row affected (0.10 sec)
mysql> start slave;
Query OK, 0 rows affected (0.05 sec)
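When the same kind of error comes up repeatedly, it can help to pull the table name, error code, and duplicate key value out of Last_Error automatically instead of reading them by eye. The following is a minimal sketch, assuming the message has the same shape as the two Last_Error lines quoted in this post; the function name parse_last_error and the regular expressions are my own and not part of MySQL or MHA.

```python
import re

# Matches the row-event errors quoted in this post (error codes 1062 and 1032).
# The pattern is an assumption based on those two messages only.
LAST_ERROR_RE = re.compile(
    r"Could not execute (?P<event>\w+) event on table "
    r"(?P<schema>\w+)\.(?P<table>\w+); .*?"
    r"Error_code: (?P<code>\d+); .*?"
    r"master log (?P<log_file>[\w.\-]+), end_log_pos (?P<end_pos>\d+)"
)
DUP_KEY_RE = re.compile(r"Duplicate entry '(?P<value>.*?)' for key '(?P<key>.*?)'")


def parse_last_error(last_error):
    """Return a dict describing the failed event, or None if the text does not match."""
    match = LAST_ERROR_RE.search(last_error)
    if match is None:
        return None
    info = match.groupdict()
    info["code"] = int(info["code"])
    info["end_pos"] = int(info["end_pos"])
    # Only duplicate-key errors (1062) carry the offending key value.
    dup = DUP_KEY_RE.search(last_error)
    if dup:
        info["duplicate_value"] = dup.group("value")
        info["duplicate_key"] = dup.group("key")
    return info
```

For the error in Problem 1 this yields the schema discuz, the table pre_common_syscache, and the duplicate value historyposts, which are exactly the pieces needed for the delete statement above.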
Test: a new problem appeared
Problem 2: a record was deleted on the master, but it cannot be found on the slave.
Last_Error: Could not execute Delete_rows event on table discuz.pre_common_statuser; Can't find record in 'pre_common_statuser', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log mysql-bin.000002, end_log_pos 31059
mysql> show slave status\G
mysql> set global sql_slave_skip_counter=1;
ERROR 1858 (HY000): sql_slave_skip_counter can not be set when the server is running with @@GLOBAL.GTID_MODE = ON. Instead, for each transaction that you want to skip, generate an empty transaction with the same GTID as the transaction
Running it produced a new error:
Problem 3: sql_slave_skip_counter cannot be used when the database runs with GTID_MODE = ON
ERROR 1858 (HY000): sql_slave_skip_counter can not be set when the server is running with @@GLOBAL.GTID_MODE = ON. Instead, for each transaction that you want to skip, generate an empty transaction with the same GTID as the transaction
Solution: skip the replication error using the GTID-based method
# On the master, check the executed GTID set
mysql> show master status;
# On the slave, reset master and slave first, then set the GTID position manually
mysql> reset master;
Query OK, 0 rows affected (0.13 sec)
mysql> stop slave;
Query OK, 0 rows affected (0.13 sec)
mysql> reset slave;
Query OK, 0 rows affected (0.20 sec)
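# Note: the GTID sequence number needs to be incremented by 1 so that the failing transaction is skipped.
# The resulting set must not go beyond what the master has executed, otherwise the slave reports more GTIDs than the master (see Problem 4).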
mysql> set global gtid_purged='f721d89b-ca08-11e9-b097-525400bf22fd:1-348';
Query OK, 0 rows affected (0.04 sec)
mysql> start slave;
Query OK, 0 rows affected (0.12 sec)
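The ERROR 1858 message itself suggests a different route from the reset shown above: generate an empty transaction with the same GTID as the transaction to be skipped. The sketch below only builds the list of statements for that approach so they can be reviewed before being pasted into the mysql client on the slave. The helper name empty_transaction_statements is hypothetical, and the sequence number in the usage line is an illustration, not a value taken from this incident.

```python
import re

# A single GTID has the form source_uuid:transaction_number.
GTID_RE = re.compile(
    r"^[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}:\d+$"
)


def empty_transaction_statements(gtid):
    """Return the statements that commit an empty transaction under the given GTID."""
    if not GTID_RE.match(gtid):
        raise ValueError("expected a single GTID like uuid:number, got %r" % gtid)
    return [
        "STOP SLAVE;",
        "SET GTID_NEXT='%s';" % gtid,   # claim the GTID of the failing transaction
        "BEGIN;",
        "COMMIT;",                      # empty transaction: nothing is applied
        "SET GTID_NEXT='AUTOMATIC';",   # hand GTID assignment back to the server
        "START SLAVE;",
    ]


if __name__ == "__main__":
    # Illustrative GTID only; use the one that is actually failing on the slave.
    for statement in empty_transaction_statements(
        "f721d89b-ca08-11e9-b097-525400bf22fd:348"
    ):
        print(statement)
```

Unlike reset master followed by set global gtid_purged, this leaves the slave's own binary logs and GTID history in place and skips exactly one transaction.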
Test: Slave_SQL_Running is now Yes, but a new problem appeared: Slave_IO_Running: No
Problem 4: Slave_IO_Running: No
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replica'
mysql> show slave status\G
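To see exactly which GTIDs the slave has that the master lacks, the Executed_Gtid_Set values from both servers can be compared. MySQL offers the built-in GTID_SUBTRACT() function for this on the server side; the sketch below does the same comparison offline on two strings copied from show master status and show slave status. The function names are my own, and the sets are expanded into plain Python sets, which is fine for a quick check but not meant for sets with many millions of transactions.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            # An interval is either a single number or 'start-end' (inclusive).
            start, _, end = interval.partition("-")
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def gtid_subtract(a, b):
    """Return the GTIDs present in a but not in b as {uuid: sorted list of numbers}."""
    set_a, set_b = parse_gtid_set(a), parse_gtid_set(b)
    extra = {}
    for uuid, numbers in set_a.items():
        missing = sorted(numbers - set_b.get(uuid, set()))
        if missing:
            extra[uuid] = missing
    return extra
```

Calling gtid_subtract(slave_set, master_set) with the two Executed_Gtid_Set strings returns a non-empty result when the slave is ahead of the master, which is the condition error 1236 complains about here.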
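Solution:
# On the master, check the master status
mysql> show master status\G
# After flushing, the master rotates to a new binary log file (the file number increases by 1) and the position restarts in the new file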
mysql> flush logs;
Query OK, 0 rows affected (0.32 sec)
mysql> show master status\G
# On the slave, stop the slave first, then set the binary log file and position again
mysql> stop slave;
Query OK, 0 rows affected (0.04 sec)
mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000006',MASTER_LOG_POS=234;
ERROR 1776 (HY000): Parameters MASTER_LOG_FILE, MASTER_LOG_POS, RELAY_LOG_FILE and RELAY_LOG_POS cannot be set when MASTER_AUTO_POSITION is active.
Oh no! Running this produced yet another error.
Problem 5: the parameters cannot be set
ERROR 1776 (HY000): Parameters MASTER_LOG_FILE, MASTER_LOG_POS, RELAY_LOG_FILE and RELAY_LOG_POS cannot be set when MASTER_AUTO_POSITION is active.
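Solution: turn off MASTER_AUTO_POSITION first, which switches the slave back to file- and position-based replication, then set the binary log file and position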
mysql> change master to master_auto_position=0;
Query OK, 0 rows affected (0.25 sec)
mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000006',MASTER_LOG_POS=234;
Query OK, 0 rows affected (0.22 sec)
mysql> start slave;
Query OK, 0 rows affected (0.04 sec)
Test: the SQL thread and the IO thread are finally both running.
mysql> show slave status\G
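Checking the two thread states by eye after every fix gets tedious, so the final check can be scripted. This is a minimal sketch that parses the vertical \G output captured as text (for example from mysql -e "show slave status\G") and reports whether both threads are running. The function names are hypothetical, and the parsing assumes the usual "Field: value" layout of \G output.

```python
def parse_vertical_output(text):
    """Turn the 'Field: value' lines of \\G output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, since values may contain colons too.
        key, sep, value = line.partition(":")
        if sep and not key.strip().startswith("*"):
            fields[key.strip()] = value.strip()
    return fields


def replication_healthy(fields):
    """True only when both the IO thread and the SQL thread report Yes."""
    return (
        fields.get("Slave_IO_Running") == "Yes"
        and fields.get("Slave_SQL_Running") == "Yes"
    )
```

In the situation from Problem 4, where Slave_SQL_Running is Yes but Slave_IO_Running is No, replication_healthy returns False, so the check does not pass on a half-working slave.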