Setting Up a Simple MySQL MHA Environment

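MySQL-MHA is a MySQL failover solution written in Perl by a Japanese MySQL expert. It has two roles, Node and Manager. Node must be installed on every MySQL server, master or slave; Manager runs on a separate server.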

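Environment: CentOS 6.5
Hardware: four virtual machines, as follows
manager:192.168.1.10
db1 (master)  :192.168.1.11
db2 (standby) :192.168.1.12
db3 (slave)   :192.168.1.13
manager is the management node, db1 is the master, db2 is the standby master, and db3 is a slave.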


I. Install MySQL
     (steps omitted)

II. Set up master-slave replication
1. Configure my.cnf
     db1 (master):   server-id = 1     log-bin=mysql-bin
     db2 (standby):  server-id = 2     log-bin=mysql-bin     relay_log = mysql-relay-bin
     db3 (slave):    server-id = 3     log-bin=mysql-bin     relay_log = mysql-relay-bin

2. Configure replication
(1) Create the MHA user on all three databases
mysql> grant all privileges on *.* to 'mha_manager'@'%' identified by '123456';

(2) Create the replication user
   On db1 (master) and db2 (standby; db2 needs the grant too, because it is the standby master):
mysql> grant replication slave on *.* to 'repl'@'%' identified by '123456';
mysql> flush privileges;
    Check the master status on db1 and note the binlog file name and position; they are needed when configuring the slaves
mysql> show master status\G

    On db2 (standby) and db3 (slave):
mysql> CHANGE MASTER TO
          -> MASTER_HOST='192.168.1.11',
          -> MASTER_USER='repl',
          -> MASTER_PASSWORD='123456',
          -> MASTER_LOG_FILE='mysql-bin.000010',
          -> MASTER_LOG_POS=528;
    Start replication
mysql> start slave;
    Check the slave status; Slave_IO_Running and Slave_SQL_Running should both be Yes
mysql> show slave status\G
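
The slave status check can also be scripted. The sketch below is one possible way to do it and is not part of MySQL or MHA: replication_ok is a hypothetical helper name. It reads SHOW SLAVE STATUS\G output on standard input and succeeds only when both replication threads report Yes.

```shell
#!/bin/sh
# replication_ok: hypothetical helper (not part of MySQL or MHA).
# Reads the output of SHOW SLAVE STATUS\G on stdin and returns 0 only
# when both the I/O thread and the SQL thread report "Yes".
replication_ok() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Example use on db2 or db3 (assumes the mysql client can log in):
#   mysql -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication OK"
```

Reading from standard input keeps the helper independent of how the mysql client is invoked, so credentials stay out of the function.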

III. Use ssh-keygen to set up passwordless SSH login among the four hosts
1. Run ssh-keygen on db1
shell> ssh-keygen -t rsa -b 2048

2. Send the public key id_rsa.pub to the other three machines
[root@localhost]# scp id_rsa.pub root@192.168.1.10:/root/.ssh/key
[root@localhost]# scp id_rsa.pub root@192.168.1.12:/root/.ssh/key
[root@localhost]# scp id_rsa.pub root@192.168.1.13:/root/.ssh/key

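3. On each of the three receiving machines, find the key file in /root/.ssh and rename it to authorized_keys

4. Repeat steps 1-3 on the other three machines. If an authorized_keys file already exists, append the public key from the key file to it instead of renaming. Make sure every machine's authorized_keys contains the public keys of the other three machines.

5. On each machine, test that ssh to every other machine succeeds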
[root@localhost]# ssh IP
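
Steps 2 to 4 can also be done with ssh-copy-id, which ships with OpenSSH and appends the key to the remote authorized_keys instead of replacing it. The loop below is a sketch that assumes the four IP addresses from this article and root login; push_key and the DRY_RUN variable are names made up for this example. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hosts from this article: manager, db1, db2, db3.
HOSTS="192.168.1.10 192.168.1.11 192.168.1.12 192.168.1.13"

# push_key: hypothetical helper. Pass this machine's own IP as $1 and the
# public key is copied to every other host in HOSTS.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
push_key() {
    self=$1
    for host in $HOSTS; do
        [ "$host" = "$self" ] && continue
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@$host"
        else
            ssh-copy-id -i /root/.ssh/id_rsa.pub "root@$host"
        fi
    done
}

# Example, run on db1 after ssh-keygen:
#   DRY_RUN=0 push_key 192.168.1.11
```

Run it once on each of the four machines with that machine's own address, so that every host ends up trusting the other three.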


IV. Install MHA

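1. Node role: db1 (master), db2 (standby), db3 (slave)
      yum install perl-DBD-MySQL
      You may hit a dependency problem here. For example, the yum repository provides the MySQL 5.1 libs while MySQL 5.6 is installed, and the error persists even after removing the 5.1 libs. Installing MySQL-shared-compat resolves it.
      yum install mha4mysql-node-0.56-0.el6.noarch.rpm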

2. Manager role: the manager node
# yum install perl cpan
# yum install perl-Config-Tiny
# yum install perl-Time-HiRes
# yum install -y rrdtool perl-rrdtool rrdtool-devel perl-Params-Validate

wget http://rpmfind.net/linux/dag/redhat/el6/en/x86_64/dag/RPMS/perl-Log-Dispatch-2.26-1.el6.rf.noarch.rpm
wget ftp://rpmfind.net/linux/dag/redhat/el6/en/x86_64/dag/RPMS/perl-Parallel-ForkManager-0.7.5-2.2.el6.rf.noarch.rpm
wget ftp://rpmfind.net/linux/dag/redhat/el6/en/x86_64/dag/RPMS/perl-Mail-Sender-0.8.16-1.el6.rf.noarch.rpm
wget ftp://rpmfind.net/linux/dag/redhat/el6/en/x86_64/dag/RPMS/perl-Mail-Sendmail-0.79-1.2.el6.rf.noarch.rpm

rpm -ivh perl-Mail-Sender-0.8.16-1.el6.rf.noarch.rpm
rpm -ivh perl-Mail-Sendmail-0.79-1.2.el6.rf.noarch.rpm
yum localinstall perl-Log-Dispatch-2.26-1.el6.rf.noarch.rpm
yum localinstall perl-Parallel-ForkManager-0.7.5-2.2.el6.rf.noarch.rpm
rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm

3. Create the MHA directory and configuration file
shell> mkdir /etc/masterha
shell> vim /etc/masterha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1
#master_binlog_dir is the directory that holds the binlog files on the master
master_binlog_dir=/usr/local/mysql/data
remote_workdir=/var/log/mha/app1
user=mha_manager
password=123456
repl_user=repl
repl_password=123456
ssh_user=root
ping_interval=1

[server1]
hostname=192.168.1.11
#can be promoted to master
candidate_master=1

[server2]
hostname=192.168.1.12
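#can be promoted to master
candidate_master=1
#By default, MHA will not choose a slave as the new master if it is more than 100MB of relay logs behind the master, because recovering such a slave takes a long time.
#With check_repl_delay=0, MHA ignores replication delay when choosing the new master during a failover.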
check_repl_delay=0

[server3]
hostname=192.168.1.13
#never promote to master
no_master=1

4. Verify that SSH trust login works
shell> masterha_check_ssh --conf=/etc/masterha/app1.cnf
Tue Aug 25 15:40:11 2015 - [debug]   ok.
Tue Aug 25 15:40:11 2015 - [info] All SSH connection tests passed successfully.

5. Verify that MySQL replication works
shell> masterha_check_repl --conf=/etc/masterha/app1.cnf
Wed Aug 26 17:40:19 2015 - [info] Checking replication health on 192.168.1.12..
Wed Aug 26 17:40:19 2015 - [info]  ok.
Wed Aug 26 17:40:19 2015 - [info] Checking replication health on 192.168.1.13.
Wed Aug 26 17:40:19 2015 - [info]  ok.
Wed Aug 26 17:40:19 2015 - [warning] master_ip_failover_script is not defined.
Wed Aug 26 17:40:19 2015 - [warning] shutdown_script is not defined.
Wed Aug 26 17:40:19 2015 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

※ If you run into problems here, check on every node that the paths are correct
[root@localhost]# which apply_diff_relay_logs
Then run the following commands on each node to create symlinks
[root@localhost]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
[root@localhost]# ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql


6. Start the MHA manager
[root@localhost]# nohup masterha_manager --conf=/etc/masterha/app1.cnf < /dev/null > /tmp/mha_manager.log 2>&1 &


7. Check the MHA status
[root@localhost]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:26683) is running(0:PING_OK), master:192.168.1.11
     This result shows that MHA is running normally and is monitoring the master.
    You can also check the status in manager.log, at the path set by the manager_log parameter in the app1 configuration file:
Mon Aug 31 10:39:21 2015 - [info] Slaves settings check done.
Mon Aug 31 10:39:21 2015 - [info]
192.168.1.11(192.168.1.11:3306) (current master)
 +--192.168.1.12(192.168.1.12:3306)
 +--192.168.1.13(192.168.1.13:3306)
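
For monitoring scripts it can help to pull the state out of the masterha_check_status line shown in this step. The sketch below only parses text in that format; mha_master_of is a hypothetical helper name, and the parsing assumes the line format stays the same across MHA versions.

```shell
#!/bin/sh
# mha_master_of: hypothetical helper (not shipped with MHA).
# Reads masterha_check_status output on stdin. If the line reports
# PING_OK, prints the current master address and returns 0; otherwise
# prints nothing and returns 1.
mha_master_of() {
    sed -n 's/^.*is running(0:PING_OK), master:\(.*\)$/\1/p' | grep .
}

# Example use on the manager node:
#   masterha_check_status --conf=/etc/masterha/app1.cnf | mha_master_of
```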


V. Test MHA
1. Stop the mysqld service on db1 (192.168.1.11)
[root@localhost]# service mysqld stop

2. Check the status in manager.log
......
----- Failover Report -----

app1: MySQL Master failover 192.168.1.11(192.168.1.11:3306) to 192.168.1.12(192.168.1.12:3306) succeeded

Master 192.168.1.11(192.168.1.11:3306) is down!

Check MHA Manager logs at localhost.mhaMaster:/var/log/mha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 192.168.1.12(192.168.1.12:3306) has all relay logs for recovery.
Selected 192.168.1.12(192.168.1.12:3306) as a new master.
192.168.1.12(192.168.1.12:3306): OK: Applying all logs succeeded.
192.168.1.13(192.168.1.13:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.1.13(192.168.1.13:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.1.12(192.168.1.12:3306)
192.168.1.12(192.168.1.12:3306): Resetting slave info succeeded.
Master failover to 192.168.1.12(192.168.1.12:3306) completed successfully.

     The log shows that 192.168.1.11 is down, 192.168.1.12 has been promoted to master, 192.168.1.13 is now a slave of 192.168.1.12, and the failover completed normally.

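※ If an error such as "Last failover was done" appears at the end, delete app1.failover.complete, check that the data on the three machines is consistent, and then repeat the test from step 1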
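
The cleanup before a re-test can be sketched as follows. This assumes the marker file sits in manager_workdir (/var/log/mha/app1 in the configuration from this article) and is named after the app; mha_clear_failover_flag is a hypothetical helper name, not an MHA command.

```shell
#!/bin/sh
# mha_clear_failover_flag: hypothetical helper (not shipped with MHA).
# Removes <workdir>/<app>.failover.complete if it exists.
# Assumption: workdir is the manager_workdir value from app1.cnf.
mha_clear_failover_flag() {
    workdir=${1:-/var/log/mha/app1}
    app=${2:-app1}
    flag="$workdir/$app.failover.complete"
    if [ -f "$flag" ]; then
        rm -f "$flag" && echo "removed $flag"
    else
        echo "no flag file at $flag"
    fi
}

# Example use on the manager node, before restarting masterha_manager:
#   mha_clear_failover_flag /var/log/mha/app1 app1
```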


3. Insert data on 192.168.1.12 and confirm that it also appears on 192.168.1.13. The test is then complete.


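Note: after the master fails and MHA switches to another server, the old master, once recovered, has to be redeployed if you want it to act as master again. If it only needs to rejoin the cluster, it can come back as a slave by using CHANGE MASTER. In addition, after each failover the monitoring process on the manager node exits automatically and must be started again, either manually or by a script. You also need to delete the app1.failover.complete file, otherwise MHA will not fail over when the new master has a problem.
By default, MySQL replication deletes a slave's relay logs automatically once the SQL thread has executed them. In an MHA setup, however, recovering a lagging slave may depend on another slave's relay logs, so the usual approach is to disable automatic purging and clean up periodically instead. When purging many or large relay logs, watch for the replication delay and resource overhead this can cause. MHA provides the purge_relay_logs script, which can be run from a cron job for this task.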
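
A cron entry for this might look like the output of the sketch below. The schedule, the log path, the /usr/bin install location and the purge_cron_line helper are assumptions for illustration; --user, --password, --disable_relay_log_purge and --workdir are options of MHA's purge_relay_logs script. Using a different minute on each slave avoids purging on all slaves at the same moment.

```shell
#!/bin/sh
# purge_cron_line: hypothetical helper that prints one crontab line.
# $1 = minute of the hour to run at (use a different value on each slave).
# The credentials are the mha_manager account from this article; the
# every-4-hours schedule, /usr/bin path and log path are illustrative.
purge_cron_line() {
    minute=${1:-0}
    echo "$minute */4 * * * /usr/bin/purge_relay_logs --user=mha_manager --password=123456 --disable_relay_log_purge --workdir=/tmp >> /var/log/purge_relay_logs.log 2>&1"
}

# Example: print the line for db2, then paste it in with "crontab -e".
#   purge_cron_line 0
# And for db3, offset by 20 minutes:
#   purge_cron_line 20
```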

