MySQL High-Availability Architecture with MHA (Complete Guide)
1. Prepare 4 machines and complete the following configuration
(1) IP-to-hostname mapping
192.168.100.128 manager.test.com
192.168.100.129 master.test.com
192.168.100.130 slave01.test.com
192.168.100.131 slave02.test.com
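(2) Set the hostnames and configure /etc/hosts resolution
hostnamectl set-hostname manager.test.com    #on 192.168.100.128
hostnamectl set-hostname master.test.com     #on 192.168.100.129
hostnamectl set-hostname slave01.test.com    #on 192.168.100.130
hostnamectl set-hostname slave02.test.com    #on 192.168.100.131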
[root@localhost ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.128 manager.test.com
192.168.100.129 master.test.com
192.168.100.130 slave01.test.com
192.168.100.131 slave02.test.com
scp /etc/hosts root@192.168.100.129:/etc/
scp /etc/hosts root@192.168.100.130:/etc/
scp /etc/hosts root@192.168.100.131:/etc/
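Editing /etc/hosts by hand makes it easy to append the same entries twice. A minimal sketch of an idempotent alternative, assuming bash; `add_cluster_hosts` is a hypothetical helper that takes the file to edit as its first argument (default /etc/hosts), so it can be tried on a scratch copy first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append each cluster IP/hostname pair only if that exact
# line is not already present, so the step can be re-run safely.
add_cluster_hosts() {
  local hosts_file="${1:-/etc/hosts}"
  local entry
  for entry in \
    "192.168.100.128 manager.test.com" \
    "192.168.100.129 master.test.com" \
    "192.168.100.130 slave01.test.com" \
    "192.168.100.131 slave02.test.com"; do
    # -x: whole-line match, -F: fixed string (dots are not regex wildcards)
    grep -qxF "$entry" "$hosts_file" 2>/dev/null || echo "$entry" >> "$hosts_file"
  done
}
```

After running it on the manager, the scp commands above distribute the result unchanged.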
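(3) Set up mutual SSH trust among the 4 machines for password-less login
#run on each of the 4 machines to generate a key pair
ssh-keygen
#on the manager, append the id_rsa.pub of all 4 machines (its own and the other 3) to authorized_keys
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
cat /root/.ssh/authorized_keys
#copy the combined file to the other 3 machines
scp /root/.ssh/authorized_keys root@192.168.100.129:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys root@192.168.100.130:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys root@192.168.100.131:/root/.ssh/authorized_keys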
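Gathering the keys amounts to merging several public-key files into one without duplicates. A sketch, assuming the other machines' id_rsa.pub files have already been copied to the manager under names of your choosing; `merge_authorized_keys` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge_authorized_keys <output file> <pub key file>...
# Combines the existing output file (if any) with the given public keys,
# drops blank lines and duplicates, and restricts the permissions.
merge_authorized_keys() {
  local out="$1"; shift
  # cat reports a missing output file on stderr (silenced) and still reads the rest
  cat "$out" "$@" 2>/dev/null | grep -v '^[[:space:]]*$' | sort -u > "$out.tmp"
  mv "$out.tmp" "$out"
  # sshd (StrictModes) ignores key files writable by group or others
  chmod 600 "$out"
}
```

For example: `merge_authorized_keys /root/.ssh/authorized_keys /tmp/master.pub /tmp/slave01.pub /tmp/slave02.pub` (the /tmp file names are placeholders). Note that `sort -u` reorders the keys, which sshd does not care about.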
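2. Build MySQL replication with 1 master and 2 slaves
(1) Upload mysql-8.0.15-el7-x86_64.tar.gz to all 4 machines
(2) Install MySQL server on master, slave01 and slave02; you can use the install script I wrote below to set it up quickly. On the manager, only the MySQL environment variable needs to be configured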
vim install_mysql.sh
#!/bin/bash
mysql_package=mysql-8.0.15-el7-x86_64.tar.gz
mysql_file=mysql-8.0.15-el7-x86_64
#remove the preinstalled mariadb packages
yum erase -y 'mariadb*'
#create mysql user
useradd -r -s /sbin/nologin mysql
#install mysql server
tar -zxvf $mysql_package
mv $mysql_file /usr/local/
mv /usr/local/$mysql_file /usr/local/mysql
#create data log dir
mkdir -p /usr/local/mysql/{data,log}
#change owner and group
chown -R mysql:mysql /usr/local/mysql
#create configure file to /etc/my.cnf
echo "[mysqld]
port = 3306
mysqlx_port = 33060
mysqlx_socket = /tmp/mysqlx.sock
datadir = /usr/local/mysql/data
socket = /tmp/mysql.sock
pid-file = /tmp/mysqld.pid
log-error = /usr/local/mysql/log/error.log
slow-query-log = 1
slow-query-log-file = /usr/local/mysql/log/slow.log
long_query_time = 1
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect ='SET NAMES utf8mb4'
innodb_buffer_pool_size = 512M
join_buffer_size = 128M
sort_buffer_size = 2M
read_rnd_buffer_size = 2M
log_timestamps = SYSTEM
lower_case_table_names = 1
default-authentication-plugin =mysql_native_password
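#Replication here is GTID-based, so add the following 2 lines
gtid_mode = ON
enforce_gtid_consistency = ON
#On slave01, uncomment the following line
#server_id = 2
#On slave02, uncomment the following line
#server_id = 3
#The master keeps the MySQL 8.0 default, server_id = 1
#Do not purge relay logs automatically; MHA may need them during recovery (required on the slaves)
relay_log_purge = 0" >> /etc/my.cnf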
#initial mysql server
cd /usr/local/mysql && bin/mysqld --initialize --user=mysql --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data
#get tmp password
tmp_password=$(grep 'temporary password' /usr/local/mysql/log/error.log | awk '{print $NF}')
echo $tmp_password
#add var
echo "PATH=\$PATH:\$HOME/bin:/usr/local/mysql/bin">> /etc/profile
source /etc/profile
#copy start file
cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mysql
#start mysql server
/etc/init.d/mysql start
chkconfig mysql on
#end of script
bash install_mysql.sh
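Splitting the log line on ':' ties the password extraction to how many colons the timestamp contains (log_timestamps = SYSTEM adds an offset such as +08:00). A sketch that takes the last field of the matching line instead, assuming the MySQL 8.0 message "A temporary password is generated for root@localhost: ..." and a password without spaces; `get_tmp_password` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the temporary root password from the error log.
# $NF (last whitespace-separated field) is independent of the timestamp format.
get_tmp_password() {
  local log="${1:-/usr/local/mysql/log/error.log}"
  grep 'temporary password is generated for root@localhost' "$log" \
    | tail -n 1 | awk '{print $NF}'
}
```

Usage: `tmp_password=$(get_tmp_password)`; `tail -n 1` keeps only the most recent initialization if the data directory was initialized more than once.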
(3) Basic configuration
#Installation is basically complete at this point; the following configuration is still needed on all 3 MySQL servers
source /etc/profile
mysql -u root -p
#enter the temporary password shown in the log above
#then change the password
mysql> alter user root@localhost identified by 'dddddd';
#then create a user that can connect remotely and give it full administrator privileges
mysql> create user 'root'@'192.168.100.%' identified by 'dddddd';
mysql> grant all on *.* to 'root'@'192.168.100.%';
#on the master, create a replication user
mysql> create user 'rep'@'192.168.100.%' identified by 'dddddd';
mysql> grant replication slave on *.* to 'rep'@'192.168.100.%';
#on slave01 and slave02, configure replication
mysql> change master to master_host='192.168.100.129',master_port=3306,master_user='rep',master_password='dddddd',master_auto_position=1;
mysql> start slave;
#check that replication works
(details omitted)
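A quick replication check is to look at the two thread states reported by SHOW SLAVE STATUS. A sketch, assuming bash and awk; `replication_ok` is a hypothetical helper that reads the `\G` output on stdin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both replication threads report "Yes".
# Example (on slave01/slave02):
#   mysql -uroot -p -e 'SHOW SLAVE STATUS\G' | replication_ok && echo OK
replication_ok() {
  awk -F': ' '
    # the trailing $ keeps Slave_SQL_Running_State from matching
    $1 ~ /^ *Slave_IO_Running$/  { io  = $2 }
    $1 ~ /^ *Slave_SQL_Running$/ { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}
```

Empty input (for example, a server with no replication configured) counts as a failure.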
3. Configure the MySQL environment variable on the manager
tar -zxvf mysql-8.0.15-el7-x86_64.tar.gz
mv mysql-8.0.15-el7-x86_64 /usr/local/mysql
vim /etc/profile
PATH=$PATH:$HOME/bin:/usr/local/mysql/bin
source /etc/profile
4. Test whether the manager can connect to MySQL remotely
mysql -u root -h 192.168.100.129 -pdddddd -e "select version()"
mysql -u root -h 192.168.100.130 -pdddddd -e "select version()"
mysql -u root -h 192.168.100.131 -pdddddd -e "select version()"
If the version number is displayed, the connection works.
5. Install the MHA node package on all 4 nodes
yum install -y perl-DBD-MySQL epel-release
rpm -ivh mha4mysql-node-0.58-0.noarch.rpm
6. Install the manager package on the manager node
yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager -y
rpm -ivh mha4mysql-manager-0.58-0.el7.noarch.rpm
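7. Create symbolic links on all four machines, so that mysqlbinlog and mysql are found even in non-interactive SSH sessions, which do not read /etc/profile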
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql
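Plain `ln -s` fails with "File exists" if this step is repeated. A sketch of an idempotent version, assuming GNU coreutils `ln`; `link_mysql_tools` is a hypothetical helper whose two arguments default to the paths used above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: link_mysql_tools [source bin dir] [target dir]
# -f replaces an existing link, -n does not dereference an existing link to a directory.
link_mysql_tools() {
  local src_dir="${1:-/usr/local/mysql/bin}" dst_dir="${2:-/usr/bin}"
  local tool
  for tool in mysqlbinlog mysql; do
    ln -sfn "$src_dir/$tool" "$dst_dir/$tool"
  done
}
```

Because of `-f`, an existing /usr/bin/mysql is overwritten; that is the intent here, since the mariadb packages were removed in step 2.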
8. Edit the configuration file on the manager
vim /etc/app1.cnf
[server default]
user=root
password=dddddd
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1
repl_user=rep
repl_password=dddddd
master_ip_failover_script=/usr/local/bin/master_ip_failover
#this is the failover script
[server1]
hostname=192.168.100.129
port=3306
master_binlog_dir=/usr/local/mysql/data
[server2]
hostname=192.168.100.130
port=3306
master_binlog_dir=/usr/local/mysql/data
[server3]
hostname=192.168.100.131
#use IP addresses here rather than hostnames, otherwise the failover script configured later may run into problems
port=3306
master_binlog_dir=/usr/local/mysql/data
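The file has one [serverN] section per MySQL node, and in this setup every node shares the same port and binlog directory. A sketch that generates the same file from a list of IPs, assuming those settings really are identical on every node; `gen_app1_cnf` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: gen_app1_cnf <ip>... prints an app1.cnf equivalent to the one above
gen_app1_cnf() {
  # [server default] block, copied verbatim
  cat <<'EOF'
[server default]
user=root
password=dddddd
ssh_user=root
manager_workdir=/var/log/masterha/app1
remote_workdir=/var/log/masterha/app1
repl_user=rep
repl_password=dddddd
master_ip_failover_script=/usr/local/bin/master_ip_failover
EOF
  # one [serverN] section per IP, numbered from 1 in argument order
  local i=1 ip
  for ip in "$@"; do
    printf '[server%d]\nhostname=%s\nport=3306\nmaster_binlog_dir=/usr/local/mysql/data\n' "$i" "$ip"
    i=$((i + 1))
  done
}
```

Usage: `gen_app1_cnf 192.168.100.129 192.168.100.130 192.168.100.131 > /etc/app1.cnf`.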
9. Edit the failover script
vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
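my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port,
    $new_master_user, $new_master_password
);
#Only these 4 lines need to be modified
my $vip = '192.168.100.125/24';    #the VIP, which is bound on the master in step 10
my $key = '2';
#ens32 is the NIC name on these hosts; replace it with your own
my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens32:$key down";
GetOptions(
    'command=s' => \$command,
    'ssh_user=s' => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s' => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s' => \$new_master_host,
    'new_master_ip=s' => \$new_master_ip,
    'new_master_port=i' => \$new_master_port,
    #MHA 0.58 also passes these two on failover ("Unknown option" in the log of step 14)
    'new_master_user=s' => \$new_master_user,
    'new_master_password=s' => \$new_master_password,
);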
exit &main();
sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
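The script only builds two ifconfig commands and runs them over SSH on the old or new master. A shell sketch of the same logic that prints the commands by default, which can help check the NIC name and VIP before a real failover; `vip_action`, `NIC`, `KEY`, `VIP` and the `DRY_RUN` switch are hypothetical names, not MHA options:

```shell
#!/usr/bin/env bash
# Same values as the Perl script above; adjust NIC to your interface name.
VIP='192.168.100.125/24'; KEY='2'; NIC='ens32'
# Hypothetical helper: vip_action start|stop <host>
# With DRY_RUN unset or 1 it only prints the ssh command; DRY_RUN=0 runs it.
vip_action() {
  local action="$1" host="$2" cmd
  case "$action" in
    start) cmd="/sbin/ifconfig $NIC:$KEY $VIP" ;;
    stop)  cmd="/sbin/ifconfig $NIC:$KEY down" ;;
    *) echo "usage: vip_action start|stop host" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "ssh root@$host \"$cmd\""
  else
    ssh "root@$host" "$cmd"
  fi
}
```

For example, `vip_action start 192.168.100.130` prints the command that the "start" branch would run on slave01.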
10. Add the VIP configuration on the master
Configure the IP on the master:
[root@master ~]# ifconfig ens32:2 192.168.100.125/24
[root@master ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:50:56:22:a3:c2 brd ff:ff:ff:ff:ff:ff
inet 192.168.100.129/24 brd 192.168.100.255 scope global noprefixroute dynamic ens32
valid_lft 1152sec preferred_lft 1152sec
inet 192.168.100.125/24 brd 192.168.100.255 scope global secondary ens32:2
valid_lft forever preferred_lft forever
inet6 fe80::9f2c:dfc9:2d00:b69c/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::19ab:a4b0:a76a:ee1f/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::88c4:652b:3291:d3c1/64 scope link noprefixroute
valid_lft forever preferred_lft forever
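Note: this IP exists only in memory, so do not restart the network interface or it will be lost; replace ens32 with your own NIC name.
11. On the manager, test whether the SSH configuration works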
[root@localhost ~]# masterha_check_ssh --conf=/etc/app1.cnf
Wed Apr 24 18:29:35 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Apr 24 18:29:35 2019 - [info] Reading application default configuration from /etc/app1.cnf..
Wed Apr 24 18:29:35 2019 - [info] Reading server configuration from /etc/app1.cnf..
Wed Apr 24 18:29:35 2019 - [info] Starting SSH connection tests..
Wed Apr 24 18:29:36 2019 - [debug]
Wed Apr 24 18:29:35 2019 - [debug] Connecting via SSH from root@slave01.test.com(192.168.100.130:22) to root@master.test.com(192.168.100.129:22)..
Wed Apr 24 18:29:36 2019 - [debug] ok.
Wed Apr 24 18:29:36 2019 - [debug] Connecting via SSH from root@slave01.test.com(192.168.100.130:22) to root@slave02.test.com(192.168.100.131:22)..
Wed Apr 24 18:29:36 2019 - [debug] ok.
Wed Apr 24 18:29:36 2019 - [debug]
Wed Apr 24 18:29:35 2019 - [debug] Connecting via SSH from root@master.test.com(192.168.100.129:22) to root@slave01.test.com(192.168.100.130:22)..
Wed Apr 24 18:29:36 2019 - [debug] ok.
Wed Apr 24 18:29:36 2019 - [debug] Connecting via SSH from root@master.test.com(192.168.100.129:22) to root@slave02.test.com(192.168.100.131:22)..
Wed Apr 24 18:29:36 2019 - [debug] ok.
Wed Apr 24 18:29:37 2019 - [debug]
Wed Apr 24 18:29:36 2019 - [debug] Connecting via SSH from root@slave02.test.com(192.168.100.131:22) to root@master.test.com(192.168.100.129:22)..
Wed Apr 24 18:29:36 2019 - [debug] ok.
Wed Apr 24 18:29:36 2019 - [debug] Connecting via SSH from root@slave02.test.com(192.168.100.131:22) to root@slave01.test.com(192.168.100.130:22)..
Wed Apr 24 18:29:36 2019 - [debug] ok.
Wed Apr 24 18:29:37 2019 - [info] All SSH connection tests passed successfully.
If all tests pass, you're done; if anything fails, check the SSH trust configuration.
12. On the manager, test whether MySQL replication works
[root@manager ~]# masterha_check_repl --conf=/etc/app1.cnf
Mon May 20 14:31:59 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon May 20 14:31:59 2019 - [info] Reading application default configuration from /etc/app1.cnf..
Mon May 20 14:31:59 2019 - [info] Reading server configuration from /etc/app1.cnf..
Mon May 20 14:31:59 2019 - [info] MHA::MasterMonitor version 0.58.
Mon May 20 14:32:01 2019 - [info] GTID failover mode = 1
Mon May 20 14:32:01 2019 - [info] Dead Servers:
Mon May 20 14:32:01 2019 - [info] Alive Servers:
Mon May 20 14:32:01 2019 - [info] master.test.com(192.168.100.129:3306)
Mon May 20 14:32:01 2019 - [info] slave01.test.com(192.168.100.130:3306)
Mon May 20 14:32:01 2019 - [info] slave02.test.com(192.168.100.131:3306)
Mon May 20 14:32:01 2019 - [info] Alive Slaves:
Mon May 20 14:32:01 2019 - [info] slave01.test.com(192.168.100.130:3306) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 14:32:01 2019 - [info] GTID ON
Mon May 20 14:32:01 2019 - [info] Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 14:32:01 2019 - [info] slave02.test.com(192.168.100.131:3306) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 14:32:01 2019 - [info] GTID ON
Mon May 20 14:32:01 2019 - [info] Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 14:32:01 2019 - [info] Current Alive Master: master.test.com(192.168.100.129:3306)
Mon May 20 14:32:01 2019 - [info] Checking slave configurations..
Mon May 20 14:32:01 2019 - [info] read_only=1 is not set on slave slave01.test.com(192.168.100.130:3306).
Mon May 20 14:32:01 2019 - [info] read_only=1 is not set on slave slave02.test.com(192.168.100.131:3306).
Mon May 20 14:32:01 2019 - [info] Checking replication filtering settings..
Mon May 20 14:32:01 2019 - [info] binlog_do_db= , binlog_ignore_db=
Mon May 20 14:32:01 2019 - [info] Replication filtering check ok.
Mon May 20 14:32:01 2019 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon May 20 14:32:01 2019 - [info] Checking SSH publickey authentication settings on the current master..
Mon May 20 14:32:01 2019 - [info] HealthCheck: SSH to master.test.com is reachable.
Mon May 20 14:32:01 2019 - [info]
master.test.com(192.168.100.129:3306) (current master)
+--slave01.test.com(192.168.100.130:3306)
+--slave02.test.com(192.168.100.131:3306)
Mon May 20 14:32:01 2019 - [info] Checking replication health on slave01.test.com..
Mon May 20 14:32:01 2019 - [info] ok.
Mon May 20 14:32:01 2019 - [info] Checking replication health on slave02.test.com..
Mon May 20 14:32:01 2019 - [info] ok.
Mon May 20 14:32:01 2019 - [info] Checking master_ip_failover_script status:
Mon May 20 14:32:01 2019 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=master.test.com --orig_master_ip=192.168.100.129 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig ens32:2 down==/sbin/ifconfig ens32:2 192.168.100.125/24===
Checking the Status of the script.. OK
Mon May 20 14:32:01 2019 - [info] OK.
Mon May 20 14:32:01 2019 - [warning] shutdown_script is not defined.
Mon May 20 14:32:01 2019 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
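Both pre-checks end with a single success line (see the two logs above), which makes them easy to use as a gate in a script before starting the manager. A sketch, assuming the MHA tools are on PATH; `mha_prechecks_ok` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both MHA checks print their success line.
mha_prechecks_ok() {
  local conf="${1:-/etc/app1.cnf}"
  masterha_check_ssh --conf="$conf" 2>&1 \
    | grep -qF 'All SSH connection tests passed successfully.' &&
  masterha_check_repl --conf="$conf" 2>&1 \
    | grep -qF 'MySQL Replication Health is OK.'
}
```

For example: `mha_prechecks_ok /etc/app1.cnf && nohup masterha_manager --conf=/etc/app1.cnf ...`; the replication check is skipped if the SSH check fails.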
13. Start MHA
[root@manager ~]# nohup masterha_manager --conf=/etc/app1.cnf >/var/log/masterha/app1/mha_manager.log < /dev/null &
[1] 25455
[root@manager ~]# nohup: redirecting stderr to stdout
[root@manager ~]# cd /var/log/masterha/app1/
[root@manager app1]# ls
app1.master_status.health mha_manager.log
[root@manager app1]# pwd
/var/log/masterha/app1
[root@manager app1]# tailf app1.master_status.health
25455 0:PING_OK master:master.test.com
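The health file holds a single line; judging only from the sample above, it contains the manager PID, a code:status pair, and the current master. A sketch that reads it, assuming that layout; `mha_health` is a hypothetical helper that succeeds only when the status is PING_OK:

```shell
#!/usr/bin/env bash
# Hypothetical helper: mha_health [health file]
# Prints the parsed fields and returns 0 only for status PING_OK
# (field layout inferred from the sample line "25455 0:PING_OK master:master.test.com").
mha_health() {
  local file="${1:-/var/log/masterha/app1/app1.master_status.health}"
  local pid state master
  read -r pid state master < "$file" || return 2
  echo "pid=$pid status=${state#*:} ${master}"
  [ "${state#*:}" = PING_OK ]
}
```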
14. Test failover and VIP migration
Stop MySQL on the master:
[root@master ~]# /etc/init.d/mysql stop
Shutting down MySQL............ SUCCESS!
[root@manager app1]# tailf mha_manager.log
IN SCRIPT TEST====/sbin/ifconfig ens32:2 down==/sbin/ifconfig ens32:2 192.168.100.125/24===
Checking the Status of the script.. OK
Mon May 20 17:14:31 2019 - [info] OK.
Mon May 20 17:14:31 2019 - [warning] shutdown_script is not defined.
Mon May 20 17:14:31 2019 - [info] Set master ping interval 3 seconds.
Mon May 20 17:14:31 2019 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Mon May 20 17:14:31 2019 - [info] Starting ping health check on 192.168.100.129(192.168.100.129:3306)..
Mon May 20 17:14:31 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Mon May 20 17:14:55 2019 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Mon May 20 17:14:55 2019 - [info] Executing SSH check script: exit 0
Mon May 20 17:14:56 2019 - [info] HealthCheck: SSH to 192.168.100.129 is reachable.
Mon May 20 17:14:58 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.129' (111))
Mon May 20 17:14:58 2019 - [warning] Connection failed 2 time(s)..
Mon May 20 17:15:01 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.129' (111))
Mon May 20 17:15:01 2019 - [warning] Connection failed 3 time(s)..
Mon May 20 17:15:04 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.129' (111))
Mon May 20 17:15:04 2019 - [warning] Connection failed 4 time(s)..
Mon May 20 17:15:04 2019 - [warning] Master is not reachable from health checker!
Mon May 20 17:15:04 2019 - [warning] Master 192.168.100.129(192.168.100.129:3306) is not reachable!
Mon May 20 17:15:04 2019 - [warning] SSH is reachable.
Mon May 20 17:15:04 2019 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Mon May 20 17:15:04 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon May 20 17:15:04 2019 - [info] Reading application default configuration from /etc/app1.cnf..
Mon May 20 17:15:04 2019 - [info] Reading server configuration from /etc/app1.cnf..
Mon May 20 17:15:05 2019 - [info] GTID failover mode = 1
Mon May 20 17:15:05 2019 - [info] Dead Servers:
Mon May 20 17:15:05 2019 - [info] 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:05 2019 - [info] Alive Servers:
Mon May 20 17:15:05 2019 - [info] 192.168.100.130(192.168.100.130:3306)
Mon May 20 17:15:05 2019 - [info] 192.168.100.131(192.168.100.131:3306)
Mon May 20 17:15:05 2019 - [info] Alive Slaves:
Mon May 20 17:15:05 2019 - [info] 192.168.100.130(192.168.100.130:3306) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:05 2019 - [info] GTID ON
Mon May 20 17:15:05 2019 - [info] Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:05 2019 - [info] 192.168.100.131(192.168.100.131:3306) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:05 2019 - [info] GTID ON
Mon May 20 17:15:05 2019 - [info] Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:05 2019 - [info] Checking slave configurations..
Mon May 20 17:15:05 2019 - [info] read_only=1 is not set on slave 192.168.100.130(192.168.100.130:3306).
Mon May 20 17:15:05 2019 - [info] read_only=1 is not set on slave 192.168.100.131(192.168.100.131:3306).
Mon May 20 17:15:05 2019 - [info] Checking replication filtering settings..
Mon May 20 17:15:05 2019 - [info] Replication filtering check ok.
Mon May 20 17:15:05 2019 - [info] Master is down!
Mon May 20 17:15:05 2019 - [info] Terminating monitoring script.
Mon May 20 17:15:05 2019 - [info] Got exit code 20 (Master dead).
Mon May 20 17:15:05 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon May 20 17:15:05 2019 - [info] Reading application default configuration from /etc/app1.cnf..
Mon May 20 17:15:05 2019 - [info] Reading server configuration from /etc/app1.cnf..
Mon May 20 17:15:05 2019 - [info] MHA::MasterFailover version 0.58.
Mon May 20 17:15:05 2019 - [info] Starting master failover.
Mon May 20 17:15:05 2019 - [info]
Mon May 20 17:15:05 2019 - [info] * Phase 1: Configuration Check Phase..
Mon May 20 17:15:05 2019 - [info]
Mon May 20 17:15:07 2019 - [info] GTID failover mode = 1
Mon May 20 17:15:07 2019 - [info] Dead Servers:
Mon May 20 17:15:07 2019 - [info] 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:07 2019 - [info] Checking master reachability via MySQL(double check)...
Mon May 20 17:15:07 2019 - [info] ok.
Mon May 20 17:15:07 2019 - [info] Alive Servers:
Mon May 20 17:15:07 2019 - [info] 192.168.100.130(192.168.100.130:3306)
Mon May 20 17:15:07 2019 - [info] 192.168.100.131(192.168.100.131:3306)
Mon May 20 17:15:07 2019 - [info] Alive Slaves:
Mon May 20 17:15:07 2019 - [info] 192.168.100.130(192.168.100.130:3306) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:07 2019 - [info] GTID ON
Mon May 20 17:15:07 2019 - [info] Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:07 2019 - [info] 192.168.100.131(192.168.100.131:3306) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:07 2019 - [info] GTID ON
Mon May 20 17:15:07 2019 - [info] Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:07 2019 - [info] Starting GTID based failover.
Mon May 20 17:15:07 2019 - [info]
Mon May 20 17:15:07 2019 - [info] ** Phase 1: Configuration Check Phase completed.
Mon May 20 17:15:07 2019 - [info]
Mon May 20 17:15:07 2019 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon May 20 17:15:07 2019 - [info]
Mon May 20 17:15:07 2019 - [info] Forcing shutdown so that applications never connect to the current master..
Mon May 20 17:15:07 2019 - [info] Executing master IP deactivation script:
Mon May 20 17:15:07 2019 - [info] /usr/local/bin/master_ip_failover --orig_master_host=192.168.100.129 --orig_master_ip=192.168.100.129 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====/sbin/ifconfig ens32:2 down==/sbin/ifconfig ens32:2 192.168.100.125/24===
Disabling the VIP on old master: 192.168.100.129
Mon May 20 17:15:07 2019 - [info] done.
Mon May 20 17:15:07 2019 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon May 20 17:15:07 2019 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon May 20 17:15:07 2019 - [info]
Mon May 20 17:15:07 2019 - [info] * Phase 3: Master Recovery Phase..
Mon May 20 17:15:07 2019 - [info]
Mon May 20 17:15:07 2019 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon May 20 17:15:07 2019 - [info]
Mon May 20 17:15:07 2019 - [info] The latest binary log file/position on all slaves is binlog.000010:195
Mon May 20 17:15:07 2019 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon May 20 17:15:07 2019 - [info] 192.168.100.130(192.168.100.130:3306) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:07 2019 - [info] GTID ON
Mon May 20 17:15:07 2019 - [info] Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:07 2019 - [info] 192.168.100.131(192.168.100.131:3306) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:07 2019 - [info] GTID ON
Mon May 20 17:15:07 2019 - [info] Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:07 2019 - [info] The oldest binary log file/position on all slaves is binlog.000010:195
Mon May 20 17:15:07 2019 - [info] Oldest slaves:
Mon May 20 17:15:07 2019 - [info] 192.168.100.130(192.168.100.130:3306) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:07 2019 - [info] GTID ON
Mon May 20 17:15:07 2019 - [info] Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:07 2019 - [info] 192.168.100.131(192.168.100.131:3306) Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:07 2019 - [info] GTID ON
Mon May 20 17:15:07 2019 - [info] Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:07 2019 - [info]
Mon May 20 17:15:07 2019 - [info] * Phase 3.3: Determining New Master Phase..
Mon May 20 17:15:07 2019 - [info]
Mon May 20 17:15:07 2019 - [info] Searching new master from slaves..
Mon May 20 17:15:07 2019 - [info] Candidate masters from the configuration file:
Mon May 20 17:15:07 2019 - [info] Non-candidate masters:
Mon May 20 17:15:07 2019 - [info] New master is 192.168.100.130(192.168.100.130:3306)
Mon May 20 17:15:07 2019 - [info] Starting master failover..
Mon May 20 17:15:07 2019 - [info]
From:
192.168.100.129(192.168.100.129:3306) (current master)
+--192.168.100.130(192.168.100.130:3306)
+--192.168.100.131(192.168.100.131:3306)
To:
192.168.100.130(192.168.100.130:3306) (new master)
+--192.168.100.131(192.168.100.131:3306)
Mon May 20 17:15:07 2019 - [info]
Mon May 20 17:15:07 2019 - [info] * Phase 3.3: New Master Recovery Phase..
Mon May 20 17:15:07 2019 - [info]
Mon May 20 17:15:07 2019 - [info] Waiting all logs to be applied..
Mon May 20 17:15:07 2019 - [info] done.
Mon May 20 17:15:07 2019 - [info] Getting new master's binlog name and position..
Mon May 20 17:15:07 2019 - [info] binlog.000010:195
Mon May 20 17:15:07 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.100.130', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='xxx';
Mon May 20 17:15:07 2019 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: binlog.000010, 195, 5a8c1fd3-7abc-11e9-b4c7-00505622a3c2:1-4
Mon May 20 17:15:07 2019 - [info] Executing master IP activate script:
Mon May 20 17:15:07 2019 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.100.129 --orig_master_ip=192.168.100.129 --orig_master_port=3306 --new_master_host=192.168.100.130 --new_master_ip=192.168.100.130 --new_master_port=3306 --new_master_user='root' --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig ens32:2 down==/sbin/ifconfig ens32:2 192.168.100.125/24===
Enabling the VIP - 192.168.100.125/24 on the new master - 192.168.100.130
Mon May 20 17:15:08 2019 - [info] OK.
Mon May 20 17:15:08 2019 - [info] ** Finished master recovery successfully.
Mon May 20 17:15:08 2019 - [info] * Phase 3: Master Recovery Phase completed.
Mon May 20 17:15:08 2019 - [info]
Mon May 20 17:15:08 2019 - [info] * Phase 4: Slaves Recovery Phase..
Mon May 20 17:15:08 2019 - [info]
Mon May 20 17:15:08 2019 - [info]
Mon May 20 17:15:08 2019 - [info] * Phase 4.1: Starting Slaves in parallel..
Mon May 20 17:15:08 2019 - [info]
Mon May 20 17:15:08 2019 - [info] -- Slave recovery on host 192.168.100.131(192.168.100.131:3306) started, pid: 26532. Check tmp log /var/log/masterha/app1/192.168.100.131_3306_20190520171505.log if it takes time..
Mon May 20 17:15:09 2019 - [info]
Mon May 20 17:15:09 2019 - [info] Log messages from 192.168.100.131 ...
Mon May 20 17:15:09 2019 - [info]
Mon May 20 17:15:08 2019 - [info] Resetting slave 192.168.100.131(192.168.100.131:3306) and starting replication from the new master 192.168.100.130(192.168.100.130:3306)..
Mon May 20 17:15:08 2019 - [info] Executed CHANGE MASTER.
Mon May 20 17:15:08 2019 - [info] Slave started.
Mon May 20 17:15:08 2019 - [info] gtid_wait(5a8c1fd3-7abc-11e9-b4c7-00505622a3c2:1-4) completed on 192.168.100.131(192.168.100.131:3306). Executed 0 events.
Mon May 20 17:15:09 2019 - [info] End of log messages from 192.168.100.131.
Mon May 20 17:15:09 2019 - [info] -- Slave on host 192.168.100.131(192.168.100.131:3306) started.
Mon May 20 17:15:09 2019 - [info] All new slave servers recovered successfully.
Mon May 20 17:15:09 2019 - [info]
Mon May 20 17:15:09 2019 - [info] * Phase 5: New master cleanup phase..
Mon May 20 17:15:09 2019 - [info]
Mon May 20 17:15:09 2019 - [info] Resetting slave info on the new master..
Mon May 20 17:15:09 2019 - [info] 192.168.100.130: Resetting slave info succeeded.
Mon May 20 17:15:09 2019 - [info] Master failover to 192.168.100.130(192.168.100.130:3306) completed successfully.
Mon May 20 17:15:09 2019 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.100.129(192.168.100.129:3306) to 192.168.100.130(192.168.100.130:3306) succeeded
Master 192.168.100.129(192.168.100.129:3306) is down!
Check MHA Manager logs at manager.test.com for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.100.129(192.168.100.129:3306)
Selected 192.168.100.130(192.168.100.130:3306) as a new master.
192.168.100.130(192.168.100.130:3306): OK: Applying all logs succeeded.
192.168.100.130(192.168.100.130:3306): OK: Activated master IP address.
192.168.100.131(192.168.100.131:3306): OK: Slave started, replicating from 192.168.100.130(192.168.100.130:3306)
192.168.100.130(192.168.100.130:3306): Resetting slave info succeeded.
Master failover to 192.168.100.130(192.168.100.130:3306) completed successfully.
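After a failover, the new master can be read from the report line "Master failover to ... completed successfully." shown above. A sketch, assuming the log path used in step 13; `new_master_from_log` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the host part (before "(") of the most recent
# "Master failover to <host>(<ip>:<port>) completed successfully." line.
new_master_from_log() {
  local log="${1:-/var/log/masterha/app1/mha_manager.log}"
  sed -n 's/.*Master failover to \([^(]*\)(.*completed successfully\..*/\1/p' "$log" \
    | tail -n 1
}
```

With the log above it prints 192.168.100.130; the "app1: MySQL Master failover ... succeeded" line does not match because it lacks "failover to".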
15. Check whether the VIP has moved off the old master and whether the MySQL master role has moved to 130
(details omitted)
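One way to check the VIP is to look for it in the `ip a` output of each host. A sketch, assuming the output format shown in step 10; `has_vip` is a hypothetical helper that reads that output on stdin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: has_vip <ip>, reads `ip a` output on stdin.
# Succeeds if some "inet <ip>/<prefix>" line exists; matching "<ip>/" at the
# start of the address keeps 192.168.100.12 from matching 192.168.100.125.
# Example: ssh root@192.168.100.130 'ip a' | has_vip 192.168.100.125 && echo "VIP is on 130"
has_vip() {
  awk -v ip="$1" '
    $1 == "inet" && index($2, ip "/") == 1 { found = 1 }
    END { exit !found }
  '
}
```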
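16. Other topics
(1) Manual MHA failover
Manual failover is for setups where MHA's automatic failover is not enabled for the service: when the master fails, an operator invokes MHA by hand to perform the failover, using the commands below.
First stop the MHA manager so that it cannot fail over automatically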
[root@manager ~]# masterha_stop --conf=/etc/app1.cnf
Then shut down the master database
[root@master ~]# /etc/init.d/mysql stop
Shutting down MySQL............ SUCCESS!
Run the manual failover
[root@manager ~]# masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=192.168.100.129 --dead_master_port=3306 --new_master_host=192.168.100.130 --new_master_port=3306
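(2) Online switchover
To guarantee full data consistency and complete the switch as quickly as possible, an MHA online switchover succeeds only if all of the following conditions are met; otherwise it fails.
1. The IO thread is running on every slave
2. The SQL thread is running on every slave
3. In the SHOW SLAVE STATUS output of every slave, Seconds_Behind_Master is less than or equal to running_updates_limit seconds; if running_updates_limit is not specified for the switch, it defaults to 1 second.
4. On the master, SHOW PROCESSLIST shows no update that has been running longer than running_updates_limit seconds.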
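Condition 3 can be checked on each slave before starting the switch. A sketch, assuming the SHOW SLAVE STATUS\G output on stdin and treating a NULL Seconds_Behind_Master (replication not running) as not ready; `lag_within_limit` is a hypothetical helper whose argument defaults to 1 second, the default stated in condition 3:

```shell
#!/usr/bin/env bash
# Hypothetical helper: lag_within_limit [limit seconds], reads SHOW SLAVE STATUS\G on stdin.
# Succeeds only if Seconds_Behind_Master is present, not NULL, and <= limit.
# Example: mysql -uroot -p -e 'SHOW SLAVE STATUS\G' | lag_within_limit 10 && echo ready
lag_within_limit() {
  local limit="${1:-1}"
  awk -F': ' -v limit="$limit" '
    $1 ~ /^ *Seconds_Behind_Master$/ { lag = $2; seen = 1 }
    END { exit !(seen && lag != "NULL" && lag + 0 <= limit + 0) }
  '
}
```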
First stop the MHA manager so that it cannot fail over automatically
[root@manager ~]# masterha_stop --conf=/etc/app1.cnf
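Next, perform the online switchover (this simulates switching the master online: the original master 192.168.100.129 becomes a slave, and 192.168.100.131 is promoted to the new master)
[root@manager ~]# masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=192.168.100.131 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
--orig_master_is_new_slave: with this option, the original master is turned into a slave of the new master during the switch; without it, the original master is not started as a slave.
--running_updates_limit=10000: if the candidate master lags behind, the MHA switch cannot succeed; this option allows the switch as long as the lag stays within this limit (in seconds). How long the switch actually takes, however, depends on the size of the relay logs that must be applied during recovery.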