RDS配置多可用区的时候,主库有自动failover的功能,具体生效时间以实例大小和配置有关。线上有一个业务之前一主两从都是用的RDS的实例,但是后来出于成本考虑想要把mysql从RDS迁移至EC2。由于不能停业务,最终决定主库继续放在RDS,在EC2上重做从库。
既然决定实施,那就要测一下主库failover的时候的对从库的影响。
1,申请一个配置有多可用区的测试实例:masterfailovertest.cojn00n8zy2d.us-east-1.rds.amazonaws.com
2,授权复制帐号和测试帐号(sysbench做测试)
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'repl'@'172.31.%' IDENTIFIED BY 'repl'; grant select,create,index,insert,update,delete,drop on *.* to hatest@'172.31.%' identified by '123456'; flush privileges;
3,得到主库上的binlog位置
>show master status\G *************************** 1. row *************************** File: mysql-bin-changelog.000023 Position: 5768524 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: 1 row in set (0.01 sec)
4,在EC2准备一台从库,指定好主从关系
>CHANGE MASTER TO MASTER_HOST='masterfailovertest.cojn00n8zy2d.us-east-1.rds.amazonaws.com', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin-changelog.000023', MASTER_LOG_POS=5768524, MASTER_USER='repl', MASTER_PASSWORD='repl'; Query OK, 0 rows affected, 2 warnings (0.02 sec) >start slave; Query OK, 0 rows affected (0.00 sec) >show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: masterfailovertest.cojn00n8zy2d.us-east-1.rds.amazonaws.com Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin-changelog.000023 Read_Master_Log_Pos: 5768524 Relay_Log_File: relay-log.000002 Relay_Log_Pos: 293 Relay_Master_Log_File: mysql-bin-changelog.000023 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: mysql Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: mysql.% Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 5768524 Relay_Log_Space: 460 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 459205842 Master_UUID: 46da27c8-007c-11e7-9b5b-06051bcfdaae Master_Info_File: /data/dbdata/mysqldata/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 1 row in set (0.00 sec)
5,用sysbench往RDS写入数据的时候,通过重启实例来模拟failover
/usr/local/sysbench/bin/sysbench --mysql-host=masterfailovertest.cojn00n8zy2d.us-east-1.rds.amazonaws.com --mysql-port=3306 --mysql-user=hatest --mysql-password=123456 --mysql-db=failovertest --oltp-tables-count=5 --oltp-table-size=6000000 --max-requests=100000000 --test=/usr/local/sysbench/tests/db/select.lua prepare
6,此时连mysql的时候会报错,管理后台显示该实例正在修复中
$ mysql -uroot -p -hmasterfailovertest.cojn00n8zy2d.us-east-1.rds.amazonaws.com ERROR 2003 (HY000): Can't connect to MySQL server on 'masterfailovertest.cojn00n8zy2d.us-east-1.rds.amazonaws.com' (110)
7,查看slave的复制状态是否正常。
>show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: masterfailovertest.cojn00n8zy2d.us-east-1.rds.amazonaws.com Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin-changelog.000024 Read_Master_Log_Pos: 24121968 Relay_Log_File: relay-log.000004 Relay_Log_Pos: 24122141 Relay_Master_Log_File: mysql-bin-changelog.000024 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: mysql Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: mysql.% Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 24121968 Relay_Log_Space: 24122365 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 459205842 Master_UUID: 46da27c8-007c-11e7-9b5b-06051bcfdaae Master_Info_File: /data/dbdata/mysqldata/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 1 row in set (0.00 sec)
多次执行show slave status\G命令发现复制状态都显示正常,但是位置一直卡在mysql-bin-changelog.000024 24121968这里不动,此时RDS上面的主库一直有用sysbench写数据进来。可以断定虽然复现状态显示正常,但是其实连接已经失效。
8,我们在RDS主库上面查看一下binary log的信息可以发现从库已经执行完了mysql-bin-changelog.000024日志的所有event,从日志偏移量可以看出mysql-bin-changelog.000024是failover之前的日志,mysql-bin-changelog.000025是failover之后的日志。
>show binary logs; +----------------------------+-----------+ | Log_name | File_size | +----------------------------+-----------+ | mysql-bin-changelog.000020 | 134243795 | | mysql-bin-changelog.000021 | 111694820 | | mysql-bin-changelog.000022 | 134243505 | | mysql-bin-changelog.000023 | 134244377 | | mysql-bin-changelog.000024 | 24121968 | | mysql-bin-changelog.000025 | 134244377 | | mysql-bin-changelog.000026 | 134243504 | | mysql-bin-changelog.000027 | 134243503 | | mysql-bin-changelog.000028 | 114316703 | +----------------------------+-----------+ 9 rows in set (0.01 sec)
9,怎样才能让从库继续从RDS主库拉日志呢,挠头想了一下之后给你顿时有了思路,由于EC2的从库连接RDS是用的域名,failover后这个域名已经指向了另外一个ip,而当前复制的连接一直没有释放,在是连的之前对应的ip。于是在从库上面执行stop slave;start slave;再查看复制状态时,发现已经正常,从库已经开始拉failover之后的日志了。
>show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: masterfailovertest.cojn00n8zy2d.us-east-1.rds.amazonaws.com Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin-changelog.000025 Read_Master_Log_Pos: 86524981 Relay_Log_File: relay-log.000006 Relay_Log_Pos: 63976502 Relay_Master_Log_File: mysql-bin-changelog.000025 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: mysql Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: mysql.% Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 63976329 Relay_Log_Space: 86525494 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 191 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 459205842 Master_UUID: 46da27c8-007c-11e7-9b5b-06051bcfdaae Master_Info_File: /data/dbdata/mysqldata/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: update Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 1 row in set (0.00 sec)
测试完毕,由此可见主库放RDS,从库放EC2这样的形式是可行的,但是当检测到主库failover时需要马上在从库上面执行stop slave;start slave;命令,这样从库才不至于让从库落后太久。