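MySQL-MHA is a MySQL failover solution written in Perl by a Japanese MySQL expert. It has two roles, Node and Manager. Node must be installed on every MySQL server, master and slave alike; Manager runs on a separate server.
Environment: CentOS 6.5
Hardware: four virtual machines, as follows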
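manager: 192.168.1.10
db1 (master): 192.168.1.11
db2 (standby master): 192.168.1.12
db3 (slave): 192.168.1.13
Here Manager is the management node, db1 is the master, db2 is the standby master, and db3 is a slave.
(2) Configure the replication user
On the db1 master and the db2 standby (db2 needs the grant too, because it is the standby master):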
mysql> grant replication slave on *.* to 'repl'@'%' identified by '123456';
mysql> flush privileges;
Check the master status on db1 and note the binlog file name and position; they are needed when configuring the slaves.
mysql> show master status\G
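The slave-side configuration that uses these values is not shown here, so the following is only a minimal sketch of how it might look. It assumes the `repl` account created above; `build_change_master` is a hypothetical helper name, and `mysql-bin.000001` / `120` in the usage comment are placeholders for the File and Position values reported by `show master status` on db1.

```shell
#!/bin/sh
# Print the CHANGE MASTER TO statement for a slave.
# Arguments: master host, binlog file, binlog position.
# The account repl/123456 is the one granted in the step above.
build_change_master() {
    master_host=$1
    log_file=$2
    log_pos=$3
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='repl', MASTER_PASSWORD='123456', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$master_host" "$log_file" "$log_pos"
}

# Example, run on db2 and db3 (placeholder file name and position):
#   build_change_master 192.168.1.11 mysql-bin.000001 120 | mysql -uroot -p
#   mysql -uroot -p -e 'START SLAVE; SHOW SLAVE STATUS\G'
```

Printing the statement instead of executing it directly makes it easy to review the file name and position before applying them.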
5. Verify that MySQL replication works
shell> masterha_check_repl --conf=/usr/local/mha/mha.conf
Wed Aug 26 17:40:19 2015 - [info] Checking replication health on 192.168.1.12..
Wed Aug 26 17:40:19 2015 - [info] ok.
Wed Aug 26 17:40:19 2015 - [info] Checking replication health on 192.168.1.13.
Wed Aug 26 17:40:19 2015 - [info] ok.
Wed Aug 26 17:40:19 2015 - [warning] master_ip_failover_script is not defined.
Wed Aug 26 17:40:19 2015 - [warning] shutdown_script is not defined.
Wed Aug 26 17:40:19 2015 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
※ If you run into problems here, check on every node that the paths are correct
[root@localhost]# which apply_diff_relay_logs
Then run the following commands on every node to create symlinks
[root@localhost]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
[root@localhost]# ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
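The commands in this step read /usr/local/mha/mha.conf, whose contents are not shown here. The sketch below writes one plausible version of it, as an illustration only. The host addresses and the `repl` account come from this article, and the `manager_log` path matches the one printed in the failover report later on; the MySQL admin account (`user`/`password`), `ssh_user`, `manager_workdir`, `ping_interval`, and the `candidate_master`/`no_master` choices are assumptions. `write_mha_conf` is a hypothetical helper name.

```shell
#!/bin/sh
# Write a sample MHA configuration to the path given as $1.
# Assumed values: user/password (MySQL admin account), ssh_user,
# manager_workdir, ping_interval, candidate_master, no_master.
write_mha_conf() {
    conf_path=$1
    cat > "$conf_path" <<'EOF'
[server default]
user=root
password=123456
ssh_user=root
repl_user=repl
repl_password=123456
manager_workdir=/var/log/mha/app1
manager_log=/var/log/mha/app1/manager.log
ping_interval=1

[server1]
hostname=192.168.1.11
candidate_master=1

[server2]
hostname=192.168.1.12
candidate_master=1

[server3]
hostname=192.168.1.13
no_master=1
EOF
}

# Example:
#   write_mha_conf /usr/local/mha/mha.conf
```

Marking db2 with `candidate_master=1` and db3 with `no_master=1` reflects the roles described at the top: db2 is the standby master and db3 stays a slave.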
6. Start the MHA manager node
[root@localhost]# nohup masterha_manager --conf=/usr/local/mha/mha.conf > /tmp/mha_manager.log < /dev/null 2>&1 &
7. Check MHA status
[root@localhost]# masterha_check_status --conf=/usr/local/mha/mha.conf
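app1 (pid:26683) is running(0:PING_OK), master:192.168.1.11
The result shows that MHA is running normally and is monitoring the master.
You can also check the status in the manager.log file, located at the path set by the manager_log parameter in the app1 configuration.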
Mon Aug 31 10:39:21 2015 - [info] Slaves settings check done.
Mon Aug 31 10:39:21 2015 - [info]
192.168.1.11(192.168.1.11:3306) (current master)
+--192.168.1.12(192.168.1.12:3306)
+--192.168.1.13(192.168.1.13:3306)
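For unattended checks, the status line printed by masterha_check_status can be tested from a script. This is a minimal sketch that only matches the `is running(0:PING_OK)` text shown in step 7; `mha_status_ok` is a hypothetical helper name, and other status strings are simply treated as "not healthy".

```shell
#!/bin/sh
# Return 0 if a masterha_check_status output line reports a healthy,
# running manager; return 1 for anything else.
mha_status_ok() {
    case $1 in
        *"is running(0:PING_OK)"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   status=$(masterha_check_status --conf=/usr/local/mha/mha.conf)
#   mha_status_ok "$status" || echo "MHA manager is not monitoring" >&2
```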
V. Test MHA
1. Stop the mysqld service on db1 (192.168.1.11)
[root@localhost]# service mysqld stop
2. Check the status in manager.log
......
----- Failover Report -----
app1: MySQL Master failover 192.168.1.11(192.168.1.11:3306) to 192.168.1.12(192.168.1.12:3306) succeeded
Master 192.168.1.11(192.168.1.11:3306) is down!
Check MHA Manager logs at localhost.mhaMaster:/var/log/mha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 192.168.1.12(192.168.1.12:3306) has all relay logs for recovery.
Selected 192.168.1.12(192.168.1.12:3306) as a new master.
192.168.1.12(192.168.1.12:3306): OK: Applying all logs succeeded.
192.168.1.13(192.168.1.13:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.1.13(192.168.1.13:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.1.12(192.168.1.12:3306)
192.168.1.12(192.168.1.12:3306): Resetting slave info succeeded.
Master failover to 192.168.1.12(192.168.1.12:3306) completed successfully.
The log shows that 192.168.1.11 is down, 192.168.1.12 has been promoted to master, and 192.168.1.13 now replicates from 192.168.1.12; the failover completed normally.
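If a script needs to know which host was promoted, it can be read from the report. The sketch below assumes the report line has exactly the form shown above, `Selected <host>(<host>:<port>) as a new master.`; `new_master_from_report` is a hypothetical helper name.

```shell
#!/bin/sh
# Read manager.log text on stdin and print the host selected as the new
# master. If several failover reports are present, the last one wins.
new_master_from_report() {
    sed -n 's/^Selected \([^(]*\)(.*) as a new master\.$/\1/p' | tail -n 1
}

# Example:
#   new_master_from_report < /var/log/mha/app1/manager.log
```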
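※ If an error such as "Last failover was done" appears at the end, delete app1.failover.complete, check that the data on the three machines is consistent, and then repeat the test from step 1.
3. Insert data on 192.168.1.12 and confirm that it also appears on 192.168.1.13. The test is complete.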
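Note: after the master fails and MHA switches to another server, the recovered old master must be redeployed if it is to serve as master again; if it only needs to rejoin the cluster, it can come back as a slave via CHANGE MASTER. After a failover the manager's monitoring process exits on its own and must be restarted by hand or by a script. The app1.failover.complete file must also be deleted, otherwise MHA will not fail over if the new master runs into trouble.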
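The cleanup and restart described in this note could be scripted roughly as follows. This is a sketch under assumptions: the marker file is taken to live in the manager working directory, assumed here to be /var/log/mha/app1 (the directory of the manager.log path in the failover report), and both function names are hypothetical. The start command is the one from step 6.

```shell
#!/bin/sh
# Remove the marker file left behind by the previous failover.
# $1: manager working directory, $2: application name (app1 here).
# Does nothing, and still succeeds, if the marker is absent.
clear_failover_marker() {
    marker="$1/$2.failover.complete"
    if [ -f "$marker" ]; then
        rm -f "$marker" && echo "removed $marker"
    fi
}

# Start the manager in the background, as in step 6.
start_mha_manager() {
    nohup masterha_manager --conf=/usr/local/mha/mha.conf \
        > /tmp/mha_manager.log 2>&1 < /dev/null &
}

# Example (after the old master has rejoined as a slave and mha.conf
# reflects the new topology):
#   clear_failover_marker /var/log/mha/app1 app1 && start_mha_manager
```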
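By default, a MySQL slave deletes its relay logs automatically once the SQL thread has applied them. Under MHA, however, recovering a lagging slave may depend on another slave's relay logs, so automatic purging is disabled and the logs are cleaned up periodically instead. When purging many or large relay logs, watch for the replication delay and resource overhead this can cause. MHA provides the purge_relay_logs script, run from a cron job, for this task.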
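A cron entry for this might be generated as in the sketch below. The install path /usr/local/bin/purge_relay_logs, the MySQL account root/123456, the /tmp work directory, and the log file name are assumptions for illustration; `purge_cron_line` is a hypothetical helper name. Automatic purging itself would be switched off on each slave with `SET GLOBAL relay_log_purge=0;`.

```shell
#!/bin/sh
# Print a crontab line that purges relay logs on one slave once a day.
# $1: minute, $2: hour. Giving each slave a different time is a common
# precaution so that not all slaves purge at the same moment.
purge_cron_line() {
    printf '%s %s * * * /usr/local/bin/purge_relay_logs --user=root --password=123456 --disable_relay_log_purge --workdir=/tmp >> /tmp/purge_relay_logs.log 2>&1\n' "$1" "$2"
}

# Example: print the entries for two slaves, then add each one to the
# crontab of the corresponding host.
#   purge_cron_line 0 4     # db2
#   purge_cron_line 20 4    # db3
```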