1. Environment
Three servers:
db01  10.0.0.51
db02  10.0.0.52
db03  10.0.0.53
Copy the MHA installation packages to all three servers and extract them.
MHA Setup
Create symlinks for the key programs
On 10.0.0.51/52/53:
ln -s /application/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
ln -s /application/mysql/bin/mysql /usr/bin/mysql
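MHA invokes mysql and mysqlbinlog through the default PATH rather than the installation directory, which is why these symlinks matter. A quick sanity check (paths assume the install location used above):
ls -l /usr/bin/mysqlbinlog /usr/bin/mysql
mysqlbinlog --version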
Configure mutual SSH trust between the nodes
Generate a key pair on db01:
rm -rf /root/.ssh
ssh-keygen    # generate the key pair; press Enter through all prompts
cd /root/.ssh
mv id_rsa.pub authorized_keys
scp -r /root/.ssh 10.0.0.52:/root
scp -r /root/.ssh 10.0.0.53:/root
Alternative method: distribute a key from every node
cat /server/scripts/ssh.sh
#!/bin/bash
rm -rf ~/.ssh/*
# generate a key pair
ssh-keygen -t dsa -f ~/.ssh/id_dsa -P ''
# distribute the public key to every node (requires sshpass: yum install -y sshpass)
for ip in 51 52 53
do
    sshpass -p123456 ssh-copy-id -o StrictHostKeyChecking=no 10.0.0.$ip
done
Verify from each node
db01:
ssh 10.0.0.51 date
ssh 10.0.0.52 date
ssh 10.0.0.53 date
db02:
ssh 10.0.0.51 date
ssh 10.0.0.52 date
ssh 10.0.0.53 date
db03:
ssh 10.0.0.51 date
ssh 10.0.0.52 date
ssh 10.0.0.53 date
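To script this verification instead, a minimal sketch (run on each of the three nodes; it assumes the trust configured above):
# BatchMode=yes makes ssh fail instead of prompting for a password,
# so any node with broken trust shows up immediately
for ip in 51 52 53; do ssh -o BatchMode=yes 10.0.0.$ip date; done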
2. Install the Software
Download MHA
MHA project page: https://code.google.com/archive/p/mysql-master-ha/
GitHub downloads: https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
Install the Node package and its dependencies on all nodes
yum install perl-DBD-MySQL -y
rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
Create the user MHA needs on the master (db01):
grant all privileges on *.* to mha@'10.0.0.%' identified by 'mha';
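The manager and all nodes connect with this account, so it is worth confirming remote access works before continuing. A minimal sketch, assuming the mysql client is on the PATH (run from db03):
# should print the master's hostname if the grant allows remote login
mysql -umha -pmha -h 10.0.0.51 -e "select @@hostname;"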
Install the Manager package (on db03)
Install epel-release first, so that the EPEL repository is enabled before the Perl modules that live in it are requested:
yum install -y epel-release
yum install -y perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
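A quick way to confirm that the packages landed where expected:
# db03 should list both the node and the manager package; db01/db02 only the node package
rpm -qa | grep mha4mysql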
3. Prepare the Configuration File (on db03)
Create the configuration directory:
mkdir -p /etc/mha
Create the log directory:
mkdir -p /var/log/mha/app1
Edit the MHA configuration file:
vim /etc/mha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager
manager_workdir=/var/log/mha/app1
master_binlog_dir=/data/binlog
user=mha
password=mha
ping_interval=2
repl_password=123
repl_user=repl
ssh_user=root
[server1]
hostname=10.0.0.51
port=3306
[server2]
hostname=10.0.0.52
port=3306
[server3]
hostname=10.0.0.53
port=3306
4. Status Checks
SSH trust check
masterha_check_ssh --conf=/etc/mha/app1.cnf
Fri Apr 19 16:39:34 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Apr 19 16:39:34 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Apr 19 16:39:34 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Apr 19 16:39:34 2019 - [info] Starting SSH connection tests..
Fri Apr 19 16:39:34 2019 - [debug] Connecting via SSH from root@10.0.0.51(10.0.0.51:22) to root@10.0.0.52(10.0.0.52:22)..
Fri Apr 19 16:39:34 2019 - [debug]  ok.
Fri Apr 19 16:39:34 2019 - [debug] Connecting via SSH from root@10.0.0.51(10.0.0.51:22) to root@10.0.0.53(10.0.0.53:22)..
Fri Apr 19 16:39:35 2019 - [debug]  ok.
Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from root@10.0.0.52(10.0.0.52:22) to root@10.0.0.51(10.0.0.51:22)..
Fri Apr 19 16:39:35 2019 - [debug]  ok.
Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from root@10.0.0.52(10.0.0.52:22) to root@10.0.0.53(10.0.0.53:22)..
Fri Apr 19 16:39:35 2019 - [debug]  ok.
Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from root@10.0.0.53(10.0.0.53:22) to root@10.0.0.51(10.0.0.51:22)..
Fri Apr 19 16:39:35 2019 - [debug]  ok.
Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from root@10.0.0.53(10.0.0.53:22) to root@10.0.0.52(10.0.0.52:22)..
Fri Apr 19 16:39:36 2019 - [debug]  ok.
Fri Apr 19 16:39:37 2019 - [info] All SSH connection tests passed successfully.
Replication status check
masterha_check_repl --conf=/etc/mha/app1.cnf
Fri Apr 19 16:40:51 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Apr 19 16:40:51 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Apr 19 16:40:51 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Apr 19 16:40:51 2019 - [info] GTID failover mode = 1
Fri Apr 19 16:40:51 2019 - [info] Dead Servers:
Fri Apr 19 16:40:51 2019 - [info] Alive Servers:
Fri Apr 19 16:40:51 2019 - [info]   10.0.0.51(10.0.0.51:3306)
Fri Apr 19 16:40:51 2019 - [info]   10.0.0.52(10.0.0.52:3306)
Fri Apr 19 16:40:51 2019 - [info]   10.0.0.53(10.0.0.53:3306)
Fri Apr 19 16:40:51 2019 - [info] Alive Slaves:
Fri Apr 19 16:40:51 2019 - [info]   10.0.0.52(10.0.0.52:3306)  Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Fri Apr 19 16:40:51 2019 - [info]     GTID ON
Fri Apr 19 16:40:51 2019 - [info]     Replicating from 10.0.0.51(10.0.0.51:3306)
Fri Apr 19 16:40:51 2019 - [info]   10.0.0.53(10.0.0.53:3306)  Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Fri Apr 19 16:40:51 2019 - [info]     GTID ON
Fri Apr 19 16:40:51 2019 - [info]     Replicating from 10.0.0.51(10.0.0.51:3306)
Fri Apr 19 16:40:51 2019 - [info] Current Alive Master: 10.0.0.51(10.0.0.51:3306)
Fri Apr 19 16:40:51 2019 - [info] Checking slave configurations..
Fri Apr 19 16:40:51 2019 - [warning]  read_only=1 is not set on slave 10.0.0.52(10.0.0.52:3306).
Fri Apr 19 16:40:51 2019 - [warning]  read_only=1 is not set on slave 10.0.0.53(10.0.0.53:3306).
Fri Apr 19 16:40:51 2019 - [info] Checking replication filtering settings..
Fri Apr 19 16:40:51 2019 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Apr 19 16:40:51 2019 - [info]  Replication filtering check ok.
Fri Apr 19 16:40:51 2019 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Apr 19 16:40:51 2019 - [info] Checking SSH publickey authentication settings on the current master..
Fri Apr 19 16:40:51 2019 - [info] HealthCheck: SSH to 10.0.0.51 is reachable.
Fri Apr 19 16:40:51 2019 - [info]
10.0.0.51(10.0.0.51:3306) (current master)
 +--10.0.0.52(10.0.0.52:3306)
 +--10.0.0.53(10.0.0.53:3306)
Fri Apr 19 16:40:51 2019 - [info] Checking replication health on 10.0.0.52..
Fri Apr 19 16:40:51 2019 - [info]  ok.
Fri Apr 19 16:40:51 2019 - [info] Checking replication health on 10.0.0.53..
Fri Apr 19 16:40:51 2019 - [info]  ok.
Fri Apr 19 16:40:51 2019 - [warning] master_ip_failover_script is not defined.
Fri Apr 19 16:40:51 2019 - [warning] shutdown_script is not defined.
Fri Apr 19 16:40:51 2019 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
5. Start MHA (on db03):
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
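Here --remove_dead_master_conf removes the dead master's [serverN] block from app1.cnf after a failover, and --ignore_last_failover skips the cooldown MHA normally enforces between consecutive failovers. To sanity-check that the manager actually started (a minimal sketch):
# the manager process should be up, and the log should end in a monitoring state
ps -ef | grep masterha_manager | grep -v grep
tail /var/log/mha/app1/manager.log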
6. Check MHA Status (on db03):
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:4719) is running(0:PING_OK), master:10.0.0.51
7. Failover Test
Stop the master on 10.0.0.51:
/etc/init.d/mysqld stop
Check replication status on the 10.0.0.52 slave:
show slave status\G
The output is empty: 10.0.0.52 is no longer a slave.
Then check the 10.0.0.53 slave:
show slave status\G
Its output shows it now replicates from 10.0.0.52, i.e. 52 has been promoted to the new master.
This confirms that the MHA failover worked.
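The failover can also be confirmed from the manager log on db03; a minimal sketch (the exact log wording varies by MHA version, and note that masterha_manager exits once a failover completes):
grep -i failover /var/log/mha/app1/manager.log | tail -5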
8. Recovery
(1) Repair the failed master (on db01)
/etc/init.d/mysqld start
(2) Rejoin it to replication as a slave of the new master
mysql -e "change master to master_host='10.0.0.52',master_user='repl',master_password='123' ,MASTER_AUTO_POSITION=1;"
mysql -e "start slave;"
[root@db01 ~]# mysql -e "show slave status \G"|grep Running:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
(3) Restore the configuration file
Because the manager was started with --remove_dead_master_conf, the failover deleted the [server1] block from app1.cnf; add it back so the repaired node is managed again:
[root@db03 /etc/mha]# cat app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager
manager_workdir=/var/log/mha/app1
master_binlog_dir=/data/binlog
password=mha
ping_interval=2
repl_password=123
repl_user=repl
ssh_user=root
user=mha
[server1]
hostname=10.0.0.51
port=3306
[server2]
hostname=10.0.0.52
port=3306
[server3]
hostname=10.0.0.53
port=3306
(4) Restart MHA
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
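Re-run the status check from section 6 to confirm the manager is up again; it should now report the new master:
masterha_check_status --conf=/etc/mha/app1.cnf
Expected: running(0:PING_OK) with master:10.0.0.52.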
9. Additional Manager Parameters
Notes:
When the master goes down, who takes over?
- If the logs on all slaves are consistent, MHA by default picks the new master in the order the servers appear in the configuration file.
- If the slaves' logs differ, it picks the slave whose logs are closest to the master's.
- If a node is given priority with candidate_master=1, that node is preferred. However, if its logs lag the master by more than 100MB, it still will not be chosen; combine it with check_repl_delay=0 to disable the lag check and force the candidate to be selected (see the config sketch at the end of this section).
(1) ping_interval=1
The interval, in seconds, between the ping packets the manager sends to monitor the master; after three pings with no response, failover is triggered automatically.
(2) candidate_master=1
Marks the node as a candidate master. When a switch happens, this slave is promoted to master even if it is not the slave with the newest events in the cluster.
(3) check_repl_delay=0
By default, MHA will not choose a slave as the new master if it is more than 100MB of relay logs behind the master, because recovering such a slave would take a long time. Setting check_repl_delay=0 makes MHA ignore replication delay when selecting the new master. This parameter is especially useful on a host with candidate_master=1, since it guarantees that the candidate becomes the new master during the switch.
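Putting the last two parameters together, here is a sketch of how db02 could be pinned as the preferred candidate in /etc/mha/app1.cnf (only the [server2] block from the file above changes; restart masterha_manager after editing):
[server2]
hostname=10.0.0.52
port=3306
candidate_master=1
check_repl_delay=0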