MySQL High-Availability Architecture with MHA (Complete Guide)

1. Prepare 4 machines and configure them as follows

(1) IP-to-hostname mapping
192.168.100.128 manager.test.com
192.168.100.129 master.test.com
192.168.100.130 slave01.test.com
192.168.100.131 slave02.test.com

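(2) Configure the hostname and hosts resolution

#Run each hostnamectl command on its corresponding machine; edit /etc/hosts on one machine and copy it to the others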
hostnamectl set-hostname manager.test.com
hostnamectl set-hostname master.test.com
hostnamectl set-hostname slave01.test.com
hostnamectl set-hostname slave02.test.com
[root@localhost ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.128 manager.test.com
192.168.100.129 master.test.com
192.168.100.130 slave01.test.com
192.168.100.131 slave02.test.com
scp /etc/hosts root@192.168.100.129:/etc/
scp /etc/hosts root@192.168.100.130:/etc/
scp /etc/hosts root@192.168.100.131:/etc/
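
After copying /etc/hosts around, it is worth confirming on each machine that all four names are actually present. The following is a minimal sketch of such a check; check_hosts is a hypothetical helper name of my own, not part of MHA or of any system tool.

```shell
# Sketch: report which expected hostnames are missing from a hosts file.
# check_hosts is a hypothetical helper, not a standard command.
check_hosts() {
    local hosts_file=$1; shift
    local missing=0 name
    for name in "$@"; do
        # Look for the name as a whole field (after the IP) on a non-comment line.
        if ! awk -v n="$name" '
                !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }' "$hosts_file"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_hosts /etc/hosts manager.test.com master.test.com slave01.test.com slave02.test.com` prints nothing and returns 0 when every name resolves from the file.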

(3) Set up mutual SSH trust (passwordless login) among the 4 machines

ssh-keygen
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
cat /root/.ssh/authorized_keys


scp /root/.ssh/authorized_keys root@192.168.100.129:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys root@192.168.100.130:/root/.ssh/authorized_keys
scp /root/.ssh/authorized_keys root@192.168.100.131:/root/.ssh/authorized_keys
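
The commands above push one authorized_keys file to the other three machines. Since masterha_check_ssh in step 11 logs in from every MySQL node to every other one, that file presumably has to contain the public key of each machine, which means running ssh-keygen on all four first. A minimal sketch of the merge step, assuming the four id_rsa.pub files have already been collected into one directory (merge_pubkeys and the file names are hypothetical):

```shell
# Sketch: merge several public key files into one authorized_keys file,
# dropping duplicate lines. merge_pubkeys is a hypothetical helper.
merge_pubkeys() {
    local out=$1; shift
    # sort -u concatenates its inputs and removes repeated keys.
    sort -u "$@" > "$out"
    # sshd ignores an authorized_keys file with loose permissions.
    chmod 600 "$out"
}
```

Usage would look like `merge_pubkeys /root/.ssh/authorized_keys /tmp/keys/*.pub`, followed by the scp commands shown above.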

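2. Set up MySQL with 1 master and 2 slaves

(1) Upload mysql-8.0.15-el7-x86_64.tar.gz to all 4 machines

(2) Install mysql-server on master, slave01, and slave02. You can use the install script I wrote below for a quick setup. On the manager, only the MySQL environment variable needs to be configured.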

vim install_mysql.sh
#!/bin/bash

mysql_package=mysql-8.0.15-el7-x86_64.tar.gz
mysql_file=mysql-8.0.15-el7-x86_64

#remove the preinstalled mariadb packages
yum erase -y 'mariadb*'

#create mysql user
useradd -r -s /sbin/nologin mysql 

#install mysql server
tar -zxvf $mysql_package
mv $mysql_file /usr/local/
mv /usr/local/$mysql_file /usr/local/mysql

#create data log dir
mkdir -p /usr/local/mysql/{data,log}

#change owner and group
chown -R mysql:mysql /usr/local/mysql

#append the configuration to /etc/my.cnf
cat >> /etc/my.cnf <<'EOF'
[mysqld]
port                           = 3306
mysqlx_port                    = 33060
mysqlx_socket                  = /tmp/mysqlx.sock
datadir                        = /usr/local/mysql/data
socket                         = /tmp/mysql.sock
pid-file                       = /tmp/mysqld.pid
log-error                      = /usr/local/mysql/log/error.log
slow-query-log                 = 1
slow-query-log-file            = /usr/local/mysql/log/slow.log
long_query_time                = 1
character-set-client-handshake = FALSE
character-set-server           = utf8mb4
collation-server               = utf8mb4_unicode_ci
init_connect                   = 'SET NAMES utf8mb4'
innodb_buffer_pool_size        = 512M
join_buffer_size               = 128M
sort_buffer_size               = 2M
read_rnd_buffer_size           = 2M
log_timestamps                 = SYSTEM
lower_case_table_names         = 1
default-authentication-plugin  = mysql_native_password
#replication here is GTID-based, so add the next two lines
gtid_mode                      = ON
enforce_gtid_consistency       = ON
#on slave01, uncomment the next line
#server_id                      = 2
#on slave02, uncomment the next line
#server_id                      = 3
#do not purge relay logs automatically (needed on the slaves)
relay_log_purge                = 0
EOF

#initial mysql server
cd /usr/local/mysql && bin/mysqld --initialize --user=mysql --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data

#get tmp password
tmp_password=$(grep 'temporary password' /usr/local/mysql/log/error.log | awk '{print $NF}')
echo "$tmp_password"

#add var
echo "PATH=\$PATH:\$HOME/bin:/usr/local/mysql/bin">> /etc/profile
source /etc/profile

#copy start file
cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mysql

#start mysql server
/etc/init.d/mysql start
chkconfig mysql on

#end of script

sh install_mysql.sh
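
The script pulls the temporary root password out of the error log by splitting on fixed separators, which depends on the timestamp format and on the password itself. As an alternative sketch, the helper below (extract_tmp_password is a hypothetical name) strips everything up to the message "A temporary password is generated for root@localhost: " that mysqld --initialize writes, so the remainder of the line is returned as-is.

```shell
# Sketch: print the temporary root password recorded by mysqld --initialize.
# extract_tmp_password is a hypothetical helper; $1 is the error log path.
extract_tmp_password() {
    # Drop everything up to and including the marker text, keep the rest.
    # tail -n 1 picks the most recent entry if the data dir was initialized twice.
    sed -n 's/.*temporary password is generated for root@localhost: //p' "$1" | tail -n 1
}
```

With the paths used in this article the call would be `extract_tmp_password /usr/local/mysql/log/error.log`.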

(3) Basic configuration

#The installation is essentially finished at this point; the remaining steps below must be done on all 3 MySQL servers

source /etc/profile
mysql -u root -p

#Enter the temporary password that was printed from the log a moment ago
#Then change the password

mysql> alter user root@localhost identified by 'dddddd';

#Also create a user that can connect remotely, with full administrator privileges.

mysql> create user 'root'@'192.168.100.%' identified by 'dddddd';
mysql> grant all on *.* to 'root'@'192.168.100.%';

#On the master, create a replication user

mysql> create user 'rep'@'192.168.100.%' identified by 'dddddd';
mysql> grant replication slave on *.* to 'rep'@'192.168.100.%';

#On slave01 and slave02, configure replication

mysql> change master to master_host='192.168.100.129',master_port=3306,master_user='rep',master_password='dddddd',master_auto_position=1;
mysql> start slave;

#Verify that replication works
(steps omitted)
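
One way to do that verification is to look at the two thread states in the output of show slave status\G on each slave. The sketch below reads that output on stdin and succeeds only when both Slave_IO_Running and Slave_SQL_Running are Yes; replication_ok is a hypothetical helper name, and the field names are those of MySQL 8.0.15 as used in this article.

```shell
# Sketch: succeed only if both replication threads report "Yes".
# Reads the text of `show slave status\G` on stdin.
# replication_ok is a hypothetical helper, not a MySQL or MHA command.
replication_ok() {
    awk -F': ' '
        /Slave_IO_Running:/  { io  = $2 }   # the colon keeps *_State lines out
        /Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

On a slave this could be used as `mysql -u root -p -e 'show slave status\G' | replication_ok && echo "replication OK"`. Inserting a row on the master and selecting it on both slaves is a useful second check.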

3. Configure the MySQL environment variable on the manager

tar -zxvf mysql-8.0.15-el7-x86_64.tar.gz
mv mysql-8.0.15-el7-x86_64 /usr/local/mysql
vim /etc/profile
#append the following line
PATH=$PATH:$HOME/bin:/usr/local/mysql/bin
source /etc/profile

4. Test that the manager can connect to MySQL remotely

mysql -u root -h 192.168.100.129 -pdddddd -e "select version()"
mysql -u root -h 192.168.100.130 -pdddddd -e "select version()"
mysql -u root -h 192.168.100.131 -pdddddd -e "select version()"

If each command prints the version number, the connection works.

5. Install the MHA node package on all 4 machines

yum install -y perl-DBD-MySQL epel-release
rpm -ivh mha4mysql-node-0.58-0.noarch.rpm

6. Install the manager package on the manager node

yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager -y
rpm -ivh mha4mysql-manager-0.58-0.el7.noarch.rpm

7. Create the symlinks on all four machines

ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql

8. Edit the configuration file on the manager

vim /etc/app1.cnf

[server default]
 
user=root
password=dddddd

ssh_user=root
# working directory on the manager  
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers  
remote_workdir=/var/log/masterha/app1

repl_user=rep
repl_password=dddddd
master_ip_failover_script=/usr/local/bin/master_ip_failover
#the line above points to the failover script
[server1]  
hostname=192.168.100.129
port=3306
master_binlog_dir=/usr/local/mysql/data  
[server2]
hostname=192.168.100.130
port=3306
master_binlog_dir=/usr/local/mysql/data  
[server3] 
hostname=192.168.100.131
#use the IP for hostname here; otherwise the failover script configured later runs into problems
port=3306
master_binlog_dir=/usr/local/mysql/data
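
The same three addresses appear in the connectivity test of step 4 and again in this file, so it can help to read them back from the file instead of typing them twice. A minimal sketch, assuming the [serverN] layout shown above (list_mha_hosts is a hypothetical helper, not an MHA tool):

```shell
# Sketch: print the hostname= value of every [serverN] section in an MHA config.
# list_mha_hosts is a hypothetical helper; $1 is the config file path.
list_mha_hosts() {
    awk -F'=' '
        /^\[server[0-9]+\]/ { in_server = 1; next }   # [server1], [server2], ...
        /^\[/               { in_server = 0 }         # e.g. [server default]
        in_server && $1 ~ /^[ \t]*hostname[ \t]*$/ {
            gsub(/[ \t]/, "", $2)                     # trim stray whitespace
            print $2
        }
    ' "$1"
}
```

For instance, `for h in $(list_mha_hosts /etc/app1.cnf); do mysql -u root -h "$h" -p -e "select version()"; done` repeats the step 4 test against exactly the servers MHA will manage.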

9. Edit the failover script

vim /usr/local/bin/master_ip_failover

#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);
#only the following 4 lines need to be changed
my $vip = '192.168.100.125/24';#defines the VIP, which is bound on the master later
my $key = '2';
my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens32:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

10. Add the VIP on the master

Bind the IP on the master

[root@master ~]# ifconfig ens32:2 192.168.100.125/24
[root@master ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:50:56:22:a3:c2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.129/24 brd 192.168.100.255 scope global noprefixroute dynamic ens32
       valid_lft 1152sec preferred_lft 1152sec
    inet 192.168.100.125/24 brd 192.168.100.255 scope global secondary ens32:2
       valid_lft forever preferred_lft forever
    inet6 fe80::9f2c:dfc9:2d00:b69c/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::19ab:a4b0:a76a:ee1f/64 scope link tentative noprefixroute dadfailed 
       valid_lft forever preferred_lft forever
    inet6 fe80::88c4:652b:3291:d3c1/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

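Note that this IP exists only in memory, so do not restart the network service. Also replace ens32 with the name of your own network interface.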
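
Because the VIP is not persisted anywhere, it is handy to have a quick check for whether a machine currently holds it. The sketch below reads the one-line-per-address output of `ip -o -4 addr show` on stdin and looks for the VIP in the address column; has_vip is a hypothetical helper name.

```shell
# Sketch: succeed if the given ADDRESS/PREFIX appears in `ip -o -4 addr show`
# output supplied on stdin. has_vip is a hypothetical helper.
has_vip() {
    local vip=$1
    # In the -o (one line) format, field 4 is the address with its prefix length.
    awk -v vip="$vip" '$4 == vip { found = 1 } END { exit !found }'
}
```

On the master this would be `ip -o -4 addr show | has_vip 192.168.100.125/24 && echo "VIP is bound"`; the same pipe can be run over ssh against any node.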

11. Test the SSH configuration from the manager

[root@localhost ~]# masterha_check_ssh --conf=/etc/app1.cnf
Wed Apr 24 18:29:35 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Apr 24 18:29:35 2019 - [info] Reading application default configuration from /etc/app1.cnf..
Wed Apr 24 18:29:35 2019 - [info] Reading server configuration from /etc/app1.cnf..
Wed Apr 24 18:29:35 2019 - [info] Starting SSH connection tests..
Wed Apr 24 18:29:36 2019 - [debug] 
Wed Apr 24 18:29:35 2019 - [debug]  Connecting via SSH from root@slave01.test.com(192.168.100.130:22) to root@master.test.com(192.168.100.129:22)..
Wed Apr 24 18:29:36 2019 - [debug]   ok.
Wed Apr 24 18:29:36 2019 - [debug]  Connecting via SSH from root@slave01.test.com(192.168.100.130:22) to root@slave02.test.com(192.168.100.131:22)..
Wed Apr 24 18:29:36 2019 - [debug]   ok.
Wed Apr 24 18:29:36 2019 - [debug] 
Wed Apr 24 18:29:35 2019 - [debug]  Connecting via SSH from root@master.test.com(192.168.100.129:22) to root@slave01.test.com(192.168.100.130:22)..
Wed Apr 24 18:29:36 2019 - [debug]   ok.
Wed Apr 24 18:29:36 2019 - [debug]  Connecting via SSH from root@master.test.com(192.168.100.129:22) to root@slave02.test.com(192.168.100.131:22)..
Wed Apr 24 18:29:36 2019 - [debug]   ok.
Wed Apr 24 18:29:37 2019 - [debug] 
Wed Apr 24 18:29:36 2019 - [debug]  Connecting via SSH from root@slave02.test.com(192.168.100.131:22) to root@master.test.com(192.168.100.129:22)..
Wed Apr 24 18:29:36 2019 - [debug]   ok.
Wed Apr 24 18:29:36 2019 - [debug]  Connecting via SSH from root@slave02.test.com(192.168.100.131:22) to root@slave01.test.com(192.168.100.130:22)..
Wed Apr 24 18:29:36 2019 - [debug]   ok.
Wed Apr 24 18:29:37 2019 - [info] All SSH connection tests passed successfully.

All tests passing is what we want. If something fails, recheck the SSH trust configuration.

12. Test MySQL replication from the manager

[root@manager ~]# masterha_check_repl --conf=/etc/app1.cnf
Mon May 20 14:31:59 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon May 20 14:31:59 2019 - [info] Reading application default configuration from /etc/app1.cnf..
Mon May 20 14:31:59 2019 - [info] Reading server configuration from /etc/app1.cnf..
Mon May 20 14:31:59 2019 - [info] MHA::MasterMonitor version 0.58.
Mon May 20 14:32:01 2019 - [info] GTID failover mode = 1
Mon May 20 14:32:01 2019 - [info] Dead Servers:
Mon May 20 14:32:01 2019 - [info] Alive Servers:
Mon May 20 14:32:01 2019 - [info]   master.test.com(192.168.100.129:3306)
Mon May 20 14:32:01 2019 - [info]   slave01.test.com(192.168.100.130:3306)
Mon May 20 14:32:01 2019 - [info]   slave02.test.com(192.168.100.131:3306)
Mon May 20 14:32:01 2019 - [info] Alive Slaves:
Mon May 20 14:32:01 2019 - [info]   slave01.test.com(192.168.100.130:3306)  Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 14:32:01 2019 - [info]     GTID ON
Mon May 20 14:32:01 2019 - [info]     Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 14:32:01 2019 - [info]   slave02.test.com(192.168.100.131:3306)  Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 14:32:01 2019 - [info]     GTID ON
Mon May 20 14:32:01 2019 - [info]     Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 14:32:01 2019 - [info] Current Alive Master: master.test.com(192.168.100.129:3306)
Mon May 20 14:32:01 2019 - [info] Checking slave configurations..
Mon May 20 14:32:01 2019 - [info]  read_only=1 is not set on slave slave01.test.com(192.168.100.130:3306).
Mon May 20 14:32:01 2019 - [info]  read_only=1 is not set on slave slave02.test.com(192.168.100.131:3306).
Mon May 20 14:32:01 2019 - [info] Checking replication filtering settings..
Mon May 20 14:32:01 2019 - [info]  binlog_do_db= , binlog_ignore_db= 
Mon May 20 14:32:01 2019 - [info]  Replication filtering check ok.
Mon May 20 14:32:01 2019 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon May 20 14:32:01 2019 - [info] Checking SSH publickey authentication settings on the current master..
Mon May 20 14:32:01 2019 - [info] HealthCheck: SSH to master.test.com is reachable.
Mon May 20 14:32:01 2019 - [info] 
master.test.com(192.168.100.129:3306) (current master)
 +--slave01.test.com(192.168.100.130:3306)
 +--slave02.test.com(192.168.100.131:3306)

Mon May 20 14:32:01 2019 - [info] Checking replication health on slave01.test.com..
Mon May 20 14:32:01 2019 - [info]  ok.
Mon May 20 14:32:01 2019 - [info] Checking replication health on slave02.test.com..
Mon May 20 14:32:01 2019 - [info]  ok.
Mon May 20 14:32:01 2019 - [info] Checking master_ip_failover_script status:
Mon May 20 14:32:01 2019 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=master.test.com --orig_master_ip=192.168.100.129 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig ens32:2 down==/sbin/ifconfig ens32:2 192.168.100.125/24===

Checking the Status of the script.. OK 
Mon May 20 14:32:01 2019 - [info]  OK.
Mon May 20 14:32:01 2019 - [warning] shutdown_script is not defined.
Mon May 20 14:32:01 2019 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

13. Start MHA

[root@manager ~]# nohup masterha_manager --conf=/etc/app1.cnf >/var/log/masterha/app1/mha_manager.log < /dev/null &
[1] 25455
[root@manager ~]# nohup: redirecting stderr to stdout

[root@manager ~]# cd /var/log/masterha/app1/
[root@manager app1]# ls
app1.master_status.health  mha_manager.log
[root@manager app1]# pwd
/var/log/masterha/app1
[root@manager app1]# tailf app1.master_status.health 
25455	0:PING_OK	master:master.test.com
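
The health file holds a single line with the manager PID, a status such as 0:PING_OK, and the monitored master. A small sketch for turning that line into something easier to read or alert on, based only on the layout shown above (mha_health is a hypothetical helper):

```shell
# Sketch: pretty-print an MHA *.master_status.health file.
# Assumes the three whitespace-separated fields seen above:
#   <pid> <code>:<status> master:<host>
# mha_health is a hypothetical helper; $1 is the health file path.
mha_health() {
    awk '{
        split($2, st, ":")          # "0:PING_OK" -> st[1]="0", st[2]="PING_OK"
        sub(/^master:/, "", $3)     # strip the "master:" label
        printf "pid=%s status=%s master=%s\n", $1, st[2], $3
    }' "$1"
}
```

Called as `mha_health /var/log/masterha/app1/app1.master_status.health`. MHA also ships its own `masterha_check_status --conf=/etc/app1.cnf` command for the same purpose.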

14. Test failover and VIP migration

Stop MySQL on the master

[root@master ~]# /etc/init.d/mysql stop
Shutting down MySQL............ SUCCESS! 
[root@manager app1]# tailf mha_manager.log 

IN SCRIPT TEST====/sbin/ifconfig ens32:2 down==/sbin/ifconfig ens32:2 192.168.100.125/24===

Checking the Status of the script.. OK 
Mon May 20 17:14:31 2019 - [info]  OK.
Mon May 20 17:14:31 2019 - [warning] shutdown_script is not defined.
Mon May 20 17:14:31 2019 - [info] Set master ping interval 3 seconds.
Mon May 20 17:14:31 2019 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Mon May 20 17:14:31 2019 - [info] Starting ping health check on 192.168.100.129(192.168.100.129:3306)..
Mon May 20 17:14:31 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Mon May 20 17:14:55 2019 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Mon May 20 17:14:55 2019 - [info] Executing SSH check script: exit 0
Mon May 20 17:14:56 2019 - [info] HealthCheck: SSH to 192.168.100.129 is reachable.
Mon May 20 17:14:58 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.129' (111))
Mon May 20 17:14:58 2019 - [warning] Connection failed 2 time(s)..
Mon May 20 17:15:01 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.129' (111))
Mon May 20 17:15:01 2019 - [warning] Connection failed 3 time(s)..
Mon May 20 17:15:04 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.129' (111))
Mon May 20 17:15:04 2019 - [warning] Connection failed 4 time(s)..
Mon May 20 17:15:04 2019 - [warning] Master is not reachable from health checker!
Mon May 20 17:15:04 2019 - [warning] Master 192.168.100.129(192.168.100.129:3306) is not reachable!
Mon May 20 17:15:04 2019 - [warning] SSH is reachable.
Mon May 20 17:15:04 2019 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Mon May 20 17:15:04 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon May 20 17:15:04 2019 - [info] Reading application default configuration from /etc/app1.cnf..
Mon May 20 17:15:04 2019 - [info] Reading server configuration from /etc/app1.cnf..
Mon May 20 17:15:05 2019 - [info] GTID failover mode = 1
Mon May 20 17:15:05 2019 - [info] Dead Servers:
Mon May 20 17:15:05 2019 - [info]   192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:05 2019 - [info] Alive Servers:
Mon May 20 17:15:05 2019 - [info]   192.168.100.130(192.168.100.130:3306)
Mon May 20 17:15:05 2019 - [info]   192.168.100.131(192.168.100.131:3306)
Mon May 20 17:15:05 2019 - [info] Alive Slaves:
Mon May 20 17:15:05 2019 - [info]   192.168.100.130(192.168.100.130:3306)  Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:05 2019 - [info]     GTID ON
Mon May 20 17:15:05 2019 - [info]     Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:05 2019 - [info]   192.168.100.131(192.168.100.131:3306)  Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:05 2019 - [info]     GTID ON
Mon May 20 17:15:05 2019 - [info]     Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:05 2019 - [info] Checking slave configurations..
Mon May 20 17:15:05 2019 - [info]  read_only=1 is not set on slave 192.168.100.130(192.168.100.130:3306).
Mon May 20 17:15:05 2019 - [info]  read_only=1 is not set on slave 192.168.100.131(192.168.100.131:3306).
Mon May 20 17:15:05 2019 - [info] Checking replication filtering settings..
Mon May 20 17:15:05 2019 - [info]  Replication filtering check ok.
Mon May 20 17:15:05 2019 - [info] Master is down!
Mon May 20 17:15:05 2019 - [info] Terminating monitoring script.
Mon May 20 17:15:05 2019 - [info] Got exit code 20 (Master dead).
Mon May 20 17:15:05 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon May 20 17:15:05 2019 - [info] Reading application default configuration from /etc/app1.cnf..
Mon May 20 17:15:05 2019 - [info] Reading server configuration from /etc/app1.cnf..
Mon May 20 17:15:05 2019 - [info] MHA::MasterFailover version 0.58.
Mon May 20 17:15:05 2019 - [info] Starting master failover.
Mon May 20 17:15:05 2019 - [info] 
Mon May 20 17:15:05 2019 - [info] * Phase 1: Configuration Check Phase..
Mon May 20 17:15:05 2019 - [info] 
Mon May 20 17:15:07 2019 - [info] GTID failover mode = 1
Mon May 20 17:15:07 2019 - [info] Dead Servers:
Mon May 20 17:15:07 2019 - [info]   192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:07 2019 - [info] Checking master reachability via MySQL(double check)...
Mon May 20 17:15:07 2019 - [info]  ok.
Mon May 20 17:15:07 2019 - [info] Alive Servers:
Mon May 20 17:15:07 2019 - [info]   192.168.100.130(192.168.100.130:3306)
Mon May 20 17:15:07 2019 - [info]   192.168.100.131(192.168.100.131:3306)
Mon May 20 17:15:07 2019 - [info] Alive Slaves:
Mon May 20 17:15:07 2019 - [info]   192.168.100.130(192.168.100.130:3306)  Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:07 2019 - [info]     GTID ON
Mon May 20 17:15:07 2019 - [info]     Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:07 2019 - [info]   192.168.100.131(192.168.100.131:3306)  Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:07 2019 - [info]     GTID ON
Mon May 20 17:15:07 2019 - [info]     Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:07 2019 - [info] Starting GTID based failover.
Mon May 20 17:15:07 2019 - [info] 
Mon May 20 17:15:07 2019 - [info] ** Phase 1: Configuration Check Phase completed.
Mon May 20 17:15:07 2019 - [info] 
Mon May 20 17:15:07 2019 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon May 20 17:15:07 2019 - [info] 
Mon May 20 17:15:07 2019 - [info] Forcing shutdown so that applications never connect to the current master..
Mon May 20 17:15:07 2019 - [info] Executing master IP deactivation script:
Mon May 20 17:15:07 2019 - [info]   /usr/local/bin/master_ip_failover --orig_master_host=192.168.100.129 --orig_master_ip=192.168.100.129 --orig_master_port=3306 --command=stopssh --ssh_user=root  


IN SCRIPT TEST====/sbin/ifconfig ens32:2 down==/sbin/ifconfig ens32:2 192.168.100.125/24===

Disabling the VIP on old master: 192.168.100.129 
Mon May 20 17:15:07 2019 - [info]  done.
Mon May 20 17:15:07 2019 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon May 20 17:15:07 2019 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon May 20 17:15:07 2019 - [info] 
Mon May 20 17:15:07 2019 - [info] * Phase 3: Master Recovery Phase..
Mon May 20 17:15:07 2019 - [info] 
Mon May 20 17:15:07 2019 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon May 20 17:15:07 2019 - [info] 
Mon May 20 17:15:07 2019 - [info] The latest binary log file/position on all slaves is binlog.000010:195
Mon May 20 17:15:07 2019 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon May 20 17:15:07 2019 - [info]   192.168.100.130(192.168.100.130:3306)  Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:07 2019 - [info]     GTID ON
Mon May 20 17:15:07 2019 - [info]     Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:07 2019 - [info]   192.168.100.131(192.168.100.131:3306)  Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:07 2019 - [info]     GTID ON
Mon May 20 17:15:07 2019 - [info]     Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:07 2019 - [info] The oldest binary log file/position on all slaves is binlog.000010:195
Mon May 20 17:15:07 2019 - [info] Oldest slaves:
Mon May 20 17:15:07 2019 - [info]   192.168.100.130(192.168.100.130:3306)  Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:07 2019 - [info]     GTID ON
Mon May 20 17:15:07 2019 - [info]     Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:07 2019 - [info]   192.168.100.131(192.168.100.131:3306)  Version=8.0.15 (oldest major version between slaves) log-bin:enabled
Mon May 20 17:15:07 2019 - [info]     GTID ON
Mon May 20 17:15:07 2019 - [info]     Replicating from 192.168.100.129(192.168.100.129:3306)
Mon May 20 17:15:07 2019 - [info] 
Mon May 20 17:15:07 2019 - [info] * Phase 3.3: Determining New Master Phase..
Mon May 20 17:15:07 2019 - [info] 
Mon May 20 17:15:07 2019 - [info] Searching new master from slaves..
Mon May 20 17:15:07 2019 - [info]  Candidate masters from the configuration file:
Mon May 20 17:15:07 2019 - [info]  Non-candidate masters:
Mon May 20 17:15:07 2019 - [info] New master is 192.168.100.130(192.168.100.130:3306)
Mon May 20 17:15:07 2019 - [info] Starting master failover..
Mon May 20 17:15:07 2019 - [info] 
From:
192.168.100.129(192.168.100.129:3306) (current master)
 +--192.168.100.130(192.168.100.130:3306)
 +--192.168.100.131(192.168.100.131:3306)

To:
192.168.100.130(192.168.100.130:3306) (new master)
 +--192.168.100.131(192.168.100.131:3306)
Mon May 20 17:15:07 2019 - [info] 
Mon May 20 17:15:07 2019 - [info] * Phase 3.3: New Master Recovery Phase..
Mon May 20 17:15:07 2019 - [info] 
Mon May 20 17:15:07 2019 - [info]  Waiting all logs to be applied.. 
Mon May 20 17:15:07 2019 - [info]   done.
Mon May 20 17:15:07 2019 - [info] Getting new master's binlog name and position..
Mon May 20 17:15:07 2019 - [info]  binlog.000010:195
Mon May 20 17:15:07 2019 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.100.130', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='rep', MASTER_PASSWORD='xxx';
Mon May 20 17:15:07 2019 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: binlog.000010, 195, 5a8c1fd3-7abc-11e9-b4c7-00505622a3c2:1-4
Mon May 20 17:15:07 2019 - [info] Executing master IP activate script:
Mon May 20 17:15:07 2019 - [info]   /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.100.129 --orig_master_ip=192.168.100.129 --orig_master_port=3306 --new_master_host=192.168.100.130 --new_master_ip=192.168.100.130 --new_master_port=3306 --new_master_user='root'   --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password


IN SCRIPT TEST====/sbin/ifconfig ens32:2 down==/sbin/ifconfig ens32:2 192.168.100.125/24===

Enabling the VIP - 192.168.100.125/24 on the new master - 192.168.100.130 
Mon May 20 17:15:08 2019 - [info]  OK.
Mon May 20 17:15:08 2019 - [info] ** Finished master recovery successfully.
Mon May 20 17:15:08 2019 - [info] * Phase 3: Master Recovery Phase completed.
Mon May 20 17:15:08 2019 - [info] 
Mon May 20 17:15:08 2019 - [info] * Phase 4: Slaves Recovery Phase..
Mon May 20 17:15:08 2019 - [info] 
Mon May 20 17:15:08 2019 - [info] 
Mon May 20 17:15:08 2019 - [info] * Phase 4.1: Starting Slaves in parallel..
Mon May 20 17:15:08 2019 - [info] 
Mon May 20 17:15:08 2019 - [info] -- Slave recovery on host 192.168.100.131(192.168.100.131:3306) started, pid: 26532. Check tmp log /var/log/masterha/app1/192.168.100.131_3306_20190520171505.log if it takes time..
Mon May 20 17:15:09 2019 - [info] 
Mon May 20 17:15:09 2019 - [info] Log messages from 192.168.100.131 ...
Mon May 20 17:15:09 2019 - [info] 
Mon May 20 17:15:08 2019 - [info]  Resetting slave 192.168.100.131(192.168.100.131:3306) and starting replication from the new master 192.168.100.130(192.168.100.130:3306)..
Mon May 20 17:15:08 2019 - [info]  Executed CHANGE MASTER.
Mon May 20 17:15:08 2019 - [info]  Slave started.
Mon May 20 17:15:08 2019 - [info]  gtid_wait(5a8c1fd3-7abc-11e9-b4c7-00505622a3c2:1-4) completed on 192.168.100.131(192.168.100.131:3306). Executed 0 events.
Mon May 20 17:15:09 2019 - [info] End of log messages from 192.168.100.131.
Mon May 20 17:15:09 2019 - [info] -- Slave on host 192.168.100.131(192.168.100.131:3306) started.
Mon May 20 17:15:09 2019 - [info] All new slave servers recovered successfully.
Mon May 20 17:15:09 2019 - [info] 
Mon May 20 17:15:09 2019 - [info] * Phase 5: New master cleanup phase..
Mon May 20 17:15:09 2019 - [info] 
Mon May 20 17:15:09 2019 - [info] Resetting slave info on the new master..
Mon May 20 17:15:09 2019 - [info]  192.168.100.130: Resetting slave info succeeded.
Mon May 20 17:15:09 2019 - [info] Master failover to 192.168.100.130(192.168.100.130:3306) completed successfully.
Mon May 20 17:15:09 2019 - [info] 

----- Failover Report -----

app1: MySQL Master failover 192.168.100.129(192.168.100.129:3306) to 192.168.100.130(192.168.100.130:3306) succeeded

Master 192.168.100.129(192.168.100.129:3306) is down!

Check MHA Manager logs at manager.test.com for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.100.129(192.168.100.129:3306)
Selected 192.168.100.130(192.168.100.130:3306) as a new master.
192.168.100.130(192.168.100.130:3306): OK: Applying all logs succeeded.
192.168.100.130(192.168.100.130:3306): OK: Activated master IP address.
192.168.100.131(192.168.100.131:3306): OK: Slave started, replicating from 192.168.100.130(192.168.100.130:3306)
192.168.100.130(192.168.100.130:3306): Resetting slave info succeeded.
Master failover to 192.168.100.130(192.168.100.130:3306) completed successfully.

15. Check that the VIP has moved and that the MySQL master role has switched to 130
(steps omitted)
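
One way to start that check is to read the new master's address back from the manager log, using the "Master failover to ... completed successfully." line that appears in the output above. A minimal sketch (new_master_from_log is a hypothetical helper):

```shell
# Sketch: print the host that MHA reports as the new master.
# new_master_from_log is a hypothetical helper; $1 is the manager log path.
new_master_from_log() {
    # Capture the text between "Master failover to " and the first "(".
    # tail -n 1 keeps the latest match, since the line appears in the
    # log body and again in the failover report.
    sed -n 's/.*Master failover to \([^(]*\)(.*) completed successfully\..*/\1/p' "$1" | tail -n 1
}
```

After `new_master_from_log /var/log/masterha/app1/mha_manager.log` prints 192.168.100.130, running `ip a` on that host should show 192.168.100.125 on ens32:2, and `show slave status\G` on 192.168.100.131 should list it as Master_Host.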

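16. Other topics

(1) Manual failover with MHA

Manual failover applies when MHA's automatic switchover is not enabled in production: when the master fails, an operator invokes MHA by hand to perform the failover. The commands are as follows:

First stop the MHA process so that no automatic switch is triggered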

[root@manager ~]# masterha_stop --conf=/etc/app1.cnf

Then shut down MySQL on the master

[root@master ~]# /etc/init.d/mysql stop
Shutting down MySQL............ SUCCESS!

Run the manual switch

[root@manager ~]# masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=192.168.100.129 --dead_master_port=3306 --new_master_host=192.168.100.130 --new_master_port=3306

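(2) Online switch

To keep the data fully consistent and to finish the switch as quickly as possible, MHA's online switch succeeds only when all of the following conditions hold; otherwise it fails.
1. The IO thread is running on every slave.
2. The SQL thread is running on every slave.
3. In the show slave status output of every slave, Seconds_Behind_Master is less than or equal to running_updates_limit seconds. If running_updates_limit is not specified for the switch, it defaults to 1 second.
4. On the master, show processlist shows no update that has been running for longer than running_updates_limit seconds.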
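
Conditions 1 to 3 can be checked ahead of time from the show slave status\G output of each slave. The sketch below reads that output on stdin and succeeds only when both threads are running and Seconds_Behind_Master is a number no larger than the limit passed as the first argument (default 1, matching the default described above). slave_ready_for_switch is a hypothetical helper name, and condition 4 on the master is not covered here.

```shell
# Sketch: pre-check one slave against online-switch conditions 1-3.
# Reads `show slave status\G` text on stdin; $1 is the lag limit in seconds.
# slave_ready_for_switch is a hypothetical helper, not an MHA command.
slave_ready_for_switch() {
    local limit=${1:-1}
    awk -F': ' -v limit="$limit" '
        /Slave_IO_Running:/      { io  = $2 }
        /Slave_SQL_Running:/     { sql = $2 }
        /Seconds_Behind_Master:/ { lag = $2 }
        END {
            # A NULL or missing lag value fails the numeric test below.
            ok = (io == "Yes" && sql == "Yes" && lag ~ /^[0-9]+$/ && lag + 0 <= limit + 0)
            exit !ok
        }
    '
}
```

For example, `mysql -u root -h 192.168.100.131 -p -e 'show slave status\G' | slave_ready_for_switch 10000 || echo "not ready"`.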

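First stop the MHA process so that no automatic switch is triggered
[root@manager ~]# masterha_stop --conf=/etc/app1.cnf
Then perform the online switch (this simulates switching the master online: the original master 192.168.100.129 becomes a slave and 192.168.100.131 is promoted to be the new master)

[root@manager ~]# masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=192.168.100.131 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000

--orig_master_is_new_slave: with this option the original master becomes a slave of the new master; without it, the original master is not started as a slave.
--running_updates_limit=10000: the switch fails if the candidate master is lagging; with this option the switch is allowed as long as the lag stays within this many seconds. How long the switch actually takes depends on the size of the relay logs that have to be applied during recovery.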
