MySQL replication SQL / IO errors during an MHA failover on a production master-slave setup, and how to fix them

Reference: https://blog.csdn.net/sj349781478/article/details/79492895

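When the master fails, MHA automatically promotes the slave with the most up-to-date binary log to take over as master. Replication errors are a common problem during that failover. For a small database you can simply take a mysqldump backup and re-import it, but our production databases are 150-200 GB each, so relying on that method alone takes too long and costs too much. Below are the problems I ran into and how I resolved them: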

Problem 1: Duplicate primary key. The slave already has the row, and the same row was then inserted on the master.

Last_Error: Could not execute Write_rows event on table discuz.pre_common_syscache; Duplicate entry 'historyposts' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000002, end_log_pos 3157

mysql> show slave status\G

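Solution: delete the row with the duplicate primary key on the slave, so the insert from the master can be replayed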

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| discuz             |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.00 sec)

mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> use discuz
Database changed

mysql> desc discuz.pre_common_syscache;
+----------+---------------------+------+-----+---------+-------+
| Field    | Type                | Null | Key | Default | Extra |
+----------+---------------------+------+-----+---------+-------+
| cname    | varchar(32)         | NO   | PRI | NULL    |       |
| ctype    | tinyint(3) unsigned | NO   |     | NULL    |       |
| dateline | int(10) unsigned    | NO   |     | NULL    |       |
| data     | mediumblob          | NO   |     | NULL    |       |
+----------+---------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

# Delete the row whose primary key is named in the error message
mysql> delete from discuz.pre_common_syscache where cname='historyposts';
Query OK, 1 row affected (0.10 sec)

mysql> start slave;
Query OK, 0 rows affected (0.05 sec)
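
The table and key value used in the delete above come straight from the Last_Error text. As a minimal sketch, assuming the message always has the 1062 format shown in Problem 1, a small Python helper can pull those fields out before you decide what to delete. The function name parse_duplicate_entry is hypothetical and the pattern does not cover other error formats.

```python
import re

# Matches only the "Duplicate entry" (1062) message format quoted in Problem 1.
_DUP_RE = re.compile(
    r"Could not execute (?P<event>\w+) event on table (?P<table>[\w.]+); "
    r"Duplicate entry '(?P<value>.*?)' for key '(?P<key>[^']+)', "
    r"Error_code: (?P<code>\d+)"
)


def parse_duplicate_entry(last_error):
    """Return event, table, value, key and code as a dict, or None if no match."""
    match = _DUP_RE.search(last_error)
    if match is None:
        return None
    info = match.groupdict()
    info["code"] = int(info["code"])
    return info


# The Last_Error text from Problem 1.
sample = (
    "Could not execute Write_rows event on table discuz.pre_common_syscache; "
    "Duplicate entry 'historyposts' for key 'PRIMARY', Error_code: 1062; "
    "handler error HA_ERR_FOUND_DUPP_KEY; the event's master log "
    "mysql-bin.000002, end_log_pos 3157"
)
print(parse_duplicate_entry(sample))
```

The helper only reads the message. Compare the row on the slave with the one on the master before deleting it, because the two copies may differ.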

Test: a new problem showed up

Problem 2: A row was deleted on the master, but the slave cannot find it.

Last_Error: Could not execute Delete_rows event on table discuz.pre_common_statuser; Can't find record in 'pre_common_statuser', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log mysql-bin.000002, end_log_pos 31059

mysql> show slave status\G
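
Before skipping an event it can help to look at what it actually contains. The error text names the master binary log file and the end position of the failing event, and mysqlbinlog can decode row events up to that position. The sketch below only builds the command line. The helper name mysqlbinlog_args is hypothetical, and it assumes the command is run on the master in the directory that holds the binary log.

```python
import re

# Extracts the binlog coordinates from the tail of a Last_Error message.
_POS_RE = re.compile(r"master log (?P<file>[^,\s]+), end_log_pos (?P<pos>\d+)")


def mysqlbinlog_args(last_error):
    """Build a mysqlbinlog command that decodes events up to the failing one."""
    match = _POS_RE.search(last_error)
    if match is None:
        raise ValueError("no binlog coordinates found in the error text")
    return [
        "mysqlbinlog",
        "--base64-output=DECODE-ROWS",  # do not print raw BINLOG statements
        "--verbose",                    # show row events as commented pseudo-SQL
        "--stop-position=" + match.group("pos"),
        match.group("file"),
    ]


# The tail of the Last_Error text from Problem 2.
error_tail = "the event's master log mysql-bin.000002, end_log_pos 31059"
print(" ".join(mysqlbinlog_args(error_tail)))
```

The output covers the file from its start up to the stop position, so the failing event is the last one printed.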

Solution: skip the replication error directly

mysql> set global sql_slave_skip_counter=1;
ERROR 1858 (HY000): sql_slave_skip_counter can not be set when the server is running with @@GLOBAL.GTID_MODE = ON. Instead, for each transaction that you want to skip, generate an empty transaction with the same GTID as the transaction

Running it produced a new error:

Problem 3: sql_slave_skip_counter is not supported when the server runs with GTID_MODE = ON

ERROR 1858 (HY000): sql_slave_skip_counter can not be set when the server is running with @@GLOBAL.GTID_MODE = ON. Instead, for each transaction that you want to skip, generate an empty transaction with the same GTID as the transaction

Solution: skip the replication error the GTID way

# On the master, check the GTID sequence number
mysql> show master status;

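# On the slave, reset master and slave first, then set the GTID position manually
# (RESET MASTER deletes the slave's own binary logs and empties gtid_executed)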
mysql> reset master;
Query OK, 0 rows affected (0.13 sec)

mysql> stop slave; 
Query OK, 0 rows affected (0.13 sec)

mysql> reset slave; 
Query OK, 0 rows affected (0.20 sec)

# Note: the GTID sequence number must be incremented by 1
mysql> set global gtid_purged='f721d89b-ca08-11e9-b097-525400bf22fd:1-348';
Query OK, 0 rows affected (0.04 sec)

mysql>  start slave;
Query OK, 0 rows affected (0.12 sec)
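
The ERROR 1858 message itself suggests a narrower alternative: generate an empty transaction with the same GTID as the transaction to skip, which does not require RESET MASTER on the slave. The Python sketch below only builds the statements for one GTID. The helper name empty_transaction_statements is hypothetical, and the GTID in the example is a placeholder. Take the real value from the failing transaction, typically the first GTID in Retrieved_Gtid_Set that is missing from Executed_Gtid_Set.

```python
def empty_transaction_statements(gtid):
    """Build statements that commit an empty transaction under one GTID."""
    uuid, sep, seq = gtid.partition(":")
    if not uuid or not sep or not seq.isdigit():
        raise ValueError("expected a single GTID of the form 'uuid:N'")
    return [
        "STOP SLAVE;",
        f"SET GTID_NEXT='{uuid}:{seq}';",  # claim the GTID of the failing transaction
        "BEGIN;",
        "COMMIT;",                          # empty transaction marks the GTID as executed
        "SET GTID_NEXT='AUTOMATIC';",       # return to normal GTID assignment
        "START SLAVE;",
    ]


# Placeholder GTID: the UUID is the one used above, the sequence number is assumed.
for statement in empty_transaction_statements(
    "f721d89b-ca08-11e9-b097-525400bf22fd:348"
):
    print(statement)
```

Skipping a transaction this way leaves the slave's data different from the master's for the affected rows, so the tables involved should be checked afterwards.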

Test: Slave_SQL_Running is now Yes, but a new problem appeared: Slave_IO_Running: No

Problem 4: Slave_IO_Running: No

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replica'

mysql> show slave status\G
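
Error 1236 with "Slave has more GTIDs than the master has" means the slave's GTID set contains transactions under the master's SERVER_UUID that the master itself does not have. One way this can happen is setting gtid_purged on the slave beyond the master's executed set. The Python sketch below compares two GTID set strings and lists what only the slave has. The function names are hypothetical, the master value in the example is assumed for illustration, and ranges are expanded into integer sets, which is fine for a demonstration but wasteful for very large ranges.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: set of sequence numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def extra_on_slave(slave_set, master_set):
    """Return {uuid: sorted sequence numbers} present on the slave only."""
    slave = parse_gtid_set(slave_set)
    master = parse_gtid_set(master_set)
    extra = {}
    for uuid, numbers in slave.items():
        diff = numbers - master.get(uuid, set())
        if diff:
            extra[uuid] = sorted(diff)
    return extra


uuid = "f721d89b-ca08-11e9-b097-525400bf22fd"
# The slave value is the gtid_purged set earlier; the master value is an assumption.
print(extra_on_slave(uuid + ":1-348", uuid + ":1-347"))
```

On a live server the built-in function GTID_SUBTRACT(slave_set, master_set) performs the same comparison in SQL.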

Solution: rotate the master's binary log, then point the slave at the new file and position

# On the master, check the master status
mysql> show master status\G

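# After the flush, the master rotates to a new binary log file (the file number increases by 1) and Position restarts in the new file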
mysql> flush logs;
Query OK, 0 rows affected (0.32 sec)

mysql> show master status\G

# On the slave, stop the slave first, then set the binary log file and position again
mysql> stop slave;
Query OK, 0 rows affected (0.04 sec)
mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000006',MASTER_LOG_POS=234;
ERROR 1776 (HY000): Parameters MASTER_LOG_FILE, MASTER_LOG_POS, RELAY_LOG_FILE and RELAY_LOG_POS cannot be set when MASTER_AUTO_POSITION is active.

Oh no! Running it produced yet another error.

Problem 5: MASTER_LOG_FILE and MASTER_LOG_POS cannot be set while MASTER_AUTO_POSITION is active

ERROR 1776 (HY000): Parameters MASTER_LOG_FILE, MASTER_LOG_POS, RELAY_LOG_FILE and RELAY_LOG_POS cannot be set when MASTER_AUTO_POSITION is active.

Solution: turn off auto-positioning first, then set the file and position

mysql> change master to master_auto_position=0;
Query OK, 0 rows affected (0.25 sec)

mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000006',MASTER_LOG_POS=234;
Query OK, 0 rows affected (0.22 sec)

mysql> start slave;
Query OK, 0 rows affected (0.04 sec)

Test: the SQL thread and the IO thread are finally both running.

mysql> show slave status\G
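
The two fields to check in this output are Slave_IO_Running and Slave_SQL_Running, and both must be Yes. As a minimal sketch, the Python helper below parses the vertical (\G) output into a dictionary and checks those two fields. The function names are hypothetical and the sample text is an abbreviated, made-up excerpt, not the actual output from this server.

```python
def parse_vertical_output(text):
    """Parse 'Field: value' lines from \\G-style output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # The row separator line ("*** 1. row ***") has no colon and is skipped.
        if sep and not key.lstrip().startswith("*"):
            fields[key.strip()] = value.strip()
    return fields


def replication_healthy(fields):
    """True only when both replication threads report Yes."""
    return (
        fields.get("Slave_IO_Running") == "Yes"
        and fields.get("Slave_SQL_Running") == "Yes"
    )


# Hypothetical, abbreviated sample of the output.
sample_status = """\
*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                Last_IO_Error:
"""
print(replication_healthy(parse_vertical_output(sample_status)))
```

A Slave_IO_Running value of Connecting or No counts as unhealthy here, which matches the situation in Problem 4.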
