MySQL MHA Setup Guide

Introduction to MHA


  • MHA (Master High Availability) is a MySQL failover solution written in Perl by a Japanese MySQL expert to keep databases highly available. It can fail over a failed MySQL master within 0-30 seconds, and during the failover it resolves data consistency issues among the slaves while recovering as much data as possible after the failure. MHA has two roles: node and manager. A working MHA deployment needs at least two database servers in a one-master, multi-slave topology: one acts as the master, one as the master's standby, and any others as plain slaves. This walkthrough uses three machines for a more complete demonstration. Note that once the master goes down, the standby takes over as master; if the original master comes back online, it will not become the master again, because that would break data consistency.
  • MHA monitors the master of the replication topology and automatically fails over as soon as it detects a master failure. Even if some slaves have not received the latest relay log events, MHA identifies the differential relay logs on the most up-to-date slave and applies them to the other slaves, so all slaves end up consistent. Failover typically completes within seconds: 9-12 seconds to detect the master failure, 7-10 seconds to shut down the failed master host to avoid split brain, and a few more seconds to apply the differential relay logs to the new master; the whole process finishes within 0-30 seconds.


Environment

mha-master: 4 cores, 2 GB RAM, Red Hat 6.5, IP: 192.168.86.148, hostname: mha-master, runs MySQL 5.6.36
mha-manager: 4 cores, 2 GB RAM, Red Hat 6.5, IP: 192.168.86.149, hostname: mha-manager, runs MySQL 5.6.36
mha-bak: 4 cores, 2 GB RAM, Red Hat 6.5, IP: 192.168.86.150, hostname: mha-bak, runs MySQL 5.6.36

Setup Steps

I. Build the replication topology (one master, two slaves)

Master: mha-master
Slaves: mha-manager, mha-bak

Note: on mha-master, create a replication user and grant it replication privileges:
mysql> grant replication slave,replication client on *.* to 'repl'@'192.168.86.%' identified by '123456';
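Once replication is configured, it is worth confirming that both replication threads are running on each slave before moving on. A minimal sketch (not part of the original guide; `slave_healthy` and the sample text are illustrative) that parses `SHOW SLAVE STATUS\G` output:

```shell
# Minimal sketch: confirm both replication threads run on a slave by parsing
# `SHOW SLAVE STATUS\G` output. The helper name and sample text are illustrative.
slave_healthy() {
    # $1 = output of SHOW SLAVE STATUS\G; succeeds only if both threads run
    grep -q 'Slave_IO_Running: Yes' <<< "$1" &&
        grep -q 'Slave_SQL_Running: Yes' <<< "$1"
}

sample='             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0'
slave_healthy "$sample" && echo "replication OK"
# On a live slave this would be fed from e.g.:
#   mysql -uroot -p -e 'SHOW SLAVE STATUS\G'
```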

II. Install MHA and configure the base environment

Steps 1-2 must be run on all three machines.
1. On every node, install the Perl module (perl-DBD-MySQL) required by the MHA node package
$ yum install perl-DBD-MySQL -y

2. On every node, install the MHA node (data node) package
$ rpm -ivh mha4mysql-node-0.54-0.el6.noarch.rpm
Preparing...                ########################################### [100%]
   1:mha4mysql-node         ########################################### [100%]
After installation, verify that the following script files were generated under /usr/bin:
$ ll /usr/bin/ | grep -E 'logs|binlog'
-rwxr-xr-x. 1 root root       15977 Dec  1  2012 apply_diff_relay_logs
-rwxr-xr-x. 1 root root        4807 Dec  1  2012 filter_mysqlbinlog
-rwxr-xr-x. 1 root root        7401 Dec  1  2012 purge_relay_logs
-rwxr-xr-x. 1 root root        7263 Dec  1  2012 save_binary_logs

3. On the manager node (mha-manager), install the MHA manager package and its dependencies
$ ls
perl-Config-Tiny-2.12-7.1.el6.noarch.rpm
perl-Email-Date-Format-1.002-5.el6.noarch.rpm
perl-Log-Dispatch-2.27-1.el6.noarch.rpm
perl-Mail-Sender-0.8.16-3.el6.noarch.rpm
perl-Mail-Sendmail-0.79-12.el6.noarch.rpm
perl-MailTools-2.04-4.el6.noarch.rpm
perl-MIME-Lite-3.027-2.el6.noarch.rpm
perl-MIME-Lite-HTML-1.23-2.el6.noarch.rpm
perl-MIME-Types-1.28-2.el6.noarch.rpm
perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
perl-Params-Validate-0.92-3.el6.x86_64.rpm
perl-TimeDate-1.16-13.el6.noarch.rpm
$ yum localinstall perl-* -y
$ rpm -ivh mha4mysql-manager-0.55-0.el6.noarch.rpm 
Preparing...                ########################################### [100%]
   1:mha4mysql-manager      ########################################### [100%]
After installation, the following scripts are generated:
$ ls /usr/bin/ | grep "^masterha"
masterha_check_repl
masterha_check_ssh
masterha_check_status
masterha_conf_host
masterha_manager
masterha_master_monitor
masterha_master_switch
masterha_secondary_check
masterha_stop

4. Configure hostnames (on all three machines)
$ vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.86.148 mha-master
192.168.86.149 mha-manager
192.168.86.150 mha-bak
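As a quick sanity check on the entries above, name-to-IP mappings can be verified programmatically. A minimal sketch (not part of the original guide; `hosts_lookup` and the sample file are illustrative):

```shell
# Minimal sketch: verify that each hostname in an /etc/hosts-style file maps
# to the expected IP. The helper name and sample file are illustrative.
hosts_lookup() {
    # $1 = hosts file, $2 = hostname; prints the first matching IP
    awk -v h="$2" '$0 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) { print $1; exit } }' "$1"
}

# Sample mirroring the entries added above
cat > /tmp/mha_hosts.sample <<'EOF'
127.0.0.1   localhost
192.168.86.148 mha-master
192.168.86.149 mha-manager
192.168.86.150 mha-bak
EOF
hosts_lookup /tmp/mha_hosts.sample mha-bak    # prints 192.168.86.150
```

On the real machines the same function can be pointed at /etc/hosts.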

5. Configure passwordless SSH login
Method 1:
On mha-master:
① Generate an RSA key pair
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
5a:26:2d:d2:63:48:52:d1:0f:54:b0:fc:8c:6e:93:9e root@mha-master
The key's randomart image is:
+--[ RSA 2048]----+
|    o+oo.        |
|   . .o.         |
|  . . oo         |
|   o o =.        |
|    o B S        |
|     + O         |
|      *          |
|     o o         |
|      E          |
+-----------------+
② Append the generated public key to the local authorized_keys file
$ cd /root/.ssh/
$ cat id_rsa.pub >> authorized_keys
③ Fix the permissions of the .ssh directory and the authorized_keys file
chmod 700 /root/.ssh/
chmod 600 /root/.ssh/authorized_keys

④ Copy the public key to mha-manager and mha-bak
$ ssh-copy-id -i id_rsa.pub [email protected]
The authenticity of host '192.168.86.149 (192.168.86.149)' can't be established.
RSA key fingerprint is 22:00:37:a2:be:c4:98:12:72:57:d4:74:bb:07:02:50.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.86.149' (RSA) to the list of known hosts.
[email protected]'s password: 
Now try logging into the machine, with "ssh '[email protected]'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.
$ ssh-copy-id -i id_rsa.pub [email protected]
The authenticity of host '192.168.86.150 (192.168.86.150)' can't be established.
RSA key fingerprint is 22:00:37:a2:be:c4:98:12:72:57:d4:74:bb:07:02:50.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.86.150' (RSA) to the list of known hosts.
[email protected]'s password: 
Now try logging into the machine, with "ssh '[email protected]'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.
Repeat steps ①-④ on mha-manager and mha-bak until the authorized_keys file on each of the three hosts contains the public keys of all three.

Method 2:
Run the following script on mha-master and enter each node's password when prompted:
#!/bin/bash
node=( 192.168.86.148 192.168.86.149 192.168.86.150 )  # IP of each node
username=root              # username to be interconnected
homename=$username         # home dir, e.g. home/zhangyang
if [ "$username" = "root" ]
then
        homename=root
else
        homename=home/$username
fi

for ((i=0; i<${#node[*]}; i++))
do
        ssh $username@${node[i]} 'ssh-keygen -t rsa; chmod 700 ~/.ssh'
done

echo "batch authorized_keys created..."
echo "start scp..."

for ((i=0; i<${#node[*]}; i++))
do
        scp ${node[i]}:/$homename/.ssh/id_rsa.pub /$homename/.ssh/${node[i]}.key
        echo "scp from ${node[i]} finished..."
done

echo "append keys to authorized_keys..."
for ((i=0; i<${#node[*]}; i++))
do
        cat /$homename/.ssh/${node[i]}.key >> /$homename/.ssh/authorized_keys
        echo "append ${node[i]}.key finished..."
done

echo "append all keys finished..."
loop=${#node[*]}
let subloop=loop-1
echo "starting scp of complete authorized_keys to ${node[0]}~${node[subloop]}"
for ((i=0; i<${#node[*]}; i++))
do
        scp /$homename/.ssh/authorized_keys ${node[i]}:/$homename/.ssh/authorized_keys
        echo "scp to ${node[i]} finished..."
done
echo "scp to all nodes finished..."

# delete intermediate files
rm -rf /$homename/.ssh/*.key
echo "all configuration finished..."

6. Verify that the three hosts can log in to each other without a password
On mha-master:
$ ssh mha-manager
$ ssh mha-bak
On mha-manager:
$ ssh mha-master
$ ssh mha-bak
On mha-bak:
$ ssh mha-master
$ ssh mha-manager
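The manual checks above cover six directed connections in total. A minimal sketch (not part of the original guide; `ssh_pairs` is illustrative) that enumerates them so none is forgotten:

```shell
# Minimal sketch: enumerate every directed host pair that must allow
# passwordless SSH. The helper name is illustrative.
nodes=(192.168.86.148 192.168.86.149 192.168.86.150)
ssh_pairs() {
    local from to
    for from in "${nodes[@]}"; do
        for to in "${nodes[@]}"; do
            [ "$from" = "$to" ] && continue
            echo "$from -> $to"
        done
    done
}

ssh_pairs | wc -l    # with 3 nodes: 6 directed connections
# Each pair can then be checked non-interactively, e.g.:
#   ssh -o BatchMode=yes -o ConnectTimeout=5 root@$to true
```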

7. On mha-master, create the monitoring user so that it is available on both slaves as well
mysql> grant all privileges on *.* to 'mha_monitor'@'192.168.86.%' identified by '123456';
Query OK, 0 rows affected, 1 warning (0.21 sec) 

8. On the other two machines, verify that the user can log in
$ mysql -umha_monitor -p123456 -h192.168.86.148

III. Configure MHA

Run the following on mha-manager.
1. Create the MHA working directory
$ mkdir -p /etc/masterha 

2. Edit the configuration file /etc/masterha/app1.cnf with the following content (the two script paths match the VIP scripts created in step 5 below, as confirmed by the masterha_check_repl output later in this guide):
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/data
master_ip_failover_script=/usr/bin/master_ip_failover
master_ip_online_change_script=/usr/bin/master_ip_online_change
remote_workdir=/tmp
ping_interval=1
repl_user=repl
repl_password=123456
ssh_user=root
user=mha_monitor
password=123456

[server1]
hostname=192.168.86.148
port=3306

[server2]
hostname=192.168.86.149
port=3306

[server3]
candidate_master=1
check_repl_delay=0
hostname=192.168.86.150
port=3306

3. Set the relay log purge mode on each slave node (run on both slaves)
mysql> set global relay_log_purge=0;
Query OK, 0 rows affected (0.00 sec)

Note:
  • During a failover, MHA relies on the slaves' relay logs to recover the other slaves, so automatic relay log purging must be turned OFF and relay logs cleaned up manually instead. By default, a slave deletes a relay log once its SQL thread has executed it; in an MHA setup those relay logs may be needed to recover other slaves, which is why the automatic deletion is disabled. Periodic manual purging must also account for replication lag: on an ext3 filesystem, deleting a large file takes a long time and can cause serious replication delay. To avoid this, create a hard link to the relay log first, since deleting a large file through a hard link is fast on Linux (the same hard-link trick is commonly used when dropping large tables in MySQL).
  • The MHA node package ships with the purge_relay_logs tool, which creates hard links for the relay logs, executes SET GLOBAL relay_log_purge=1, waits a few seconds for the SQL thread to switch to a new relay log, then executes SET GLOBAL relay_log_purge=0.
  • The following script wraps purge_relay_logs to clean up the relay logs:
      $ vim /root/purge_relay_log.sh

      #!/bin/bash
      user=root
      passwd=123456
      port=3306
      log_dir='/var/log/masterha/app1/log'
      work_dir='/var/log/masterha/app1'
      purge='/usr/bin/purge_relay_logs'
      if [ ! -d $log_dir ]
      then
          mkdir $log_dir -p
      fi
      $purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port \
          --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1

      Add the script to crontab so it runs periodically:
      $ crontab -l
      0 4 * * * /bin/bash /root/purge_relay_log.sh

4. Bind the VIP (Virtual IP) on the current master, mha-master
$ ifconfig eth0:1 192.168.86.155/24 
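After binding the VIP, it helps to confirm programmatically which host currently holds it. A minimal sketch (not part of the original guide; `has_vip` and the sample text are illustrative):

```shell
# Minimal sketch: check whether the VIP is bound by parsing ifconfig-style
# output. The helper name and sample text are illustrative.
vip=192.168.86.155
has_vip() {
    # $1 = interface listing text; succeeds if the VIP appears in it
    grep -qE "inet (addr:)?$vip" <<< "$1"
}

# Output shape as produced by `ifconfig eth0:1` on RHEL 6
sample='eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:AA:BB:CC
          inet addr:192.168.86.155  Bcast:192.168.86.255  Mask:255.255.255.0'
has_vip "$sample" && echo "VIP bound"
# On a live host: has_vip "$(/sbin/ifconfig eth0:1)"
```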

5. Create the two VIP migration scripts
The failover script is shown below; remember to change the VIP in the script (one place):
$ vim /usr/bin/master_ip_failover 
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '192.168.86.155/24';  # Virtual IP
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $exit_code = 0;

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    #print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "\n\n\n***************************************************************\n";
            print "Disabling the VIP - $vip on old master: $orig_master_host\n";
            print "***************************************************************\n\n\n\n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "\n\n\n***************************************************************\n";
            print "Enabling the VIP - $vip on new master: $new_master_host \n";
            print "***************************************************************\n\n\n\n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enables the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disables the VIP on the old master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
The online-change script is shown below; remember to change the VIP in the script (one place):
$ vim /usr/bin/master_ip_online_change
#!/bin/bash
source /root/.bash_profile
vip=`echo '192.168.86.155/24'` # Virtual IP
key=`echo '1'`

command=`echo "$1" | awk -F = '{print $2}'`
orig_master_host=`echo "$2" | awk -F = '{print $2}'`
new_master_host=`echo "$7" | awk -F = '{print $2}'`
orig_master_ssh_user=`echo "${12}" | awk -F = '{print $2}'`
new_master_ssh_user=`echo "${13}" | awk -F = '{print $2}'`

stop_vip=`echo "ssh root@$orig_master_host /sbin/ifconfig eth0:$key down"`
start_vip=`echo "ssh root@$new_master_host /sbin/ifconfig eth0:$key $vip"`

if [ "$command" = 'stop' ]
then
    echo -e "\n\n\n***************************************************************\n"
    echo -e "Disabling the VIP - $vip on old master: $orig_master_host\n"
    $stop_vip
    if [ $? -eq 0 ]
    then
        echo "Disabled the VIP successfully"
    else
        echo "Disabling the VIP failed"
    fi
    echo -e "***************************************************************\n\n\n\n"
fi
if [ "$command" = 'start' -o "$command" = 'status' ]
then
    echo -e "\n\n\n***************************************************************\n"
    echo -e "Enabling the VIP - $vip on new master: $new_master_host \n"
    $start_vip
    if [ $? -eq 0 ]
    then
        echo "Enabled the VIP successfully"
    else
        echo "Enabling the VIP failed"
    fi
    echo -e "***************************************************************\n\n\n\n"
fi
Make both scripts executable:
$ chmod +x /usr/bin/master_ip_failover /usr/bin/master_ip_online_change
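Before letting MHA invoke these scripts, a quick syntax check catches copy/paste errors. A minimal sketch (not part of the original guide; `check_script` and the stand-in file are illustrative; on the real hosts the arguments would be `perl /usr/bin/master_ip_failover` and `bash /usr/bin/master_ip_online_change`):

```shell
# Minimal sketch: syntax-check a script with its interpreter's check flag,
# demonstrated on a trivial stand-in. Helper name and path are illustrative.
check_script() {
    # $1 = interpreter (perl|bash), $2 = script path; succeeds on clean syntax
    case "$1" in
        perl) perl -c "$2" 2>/dev/null ;;
        bash) bash -n "$2" ;;
    esac
}

printf '#!/bin/bash\necho ok\n' > /tmp/demo.sh
check_script bash /tmp/demo.sh && echo "syntax OK"
```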

IV. Common MHA operations

1. Run the SSH check
$ masterha_check_ssh --conf=/etc/masterha/app1.cnf  
Fri Sep  1 01:31:11 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep  1 01:31:11 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Fri Sep  1 01:31:11 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Fri Sep  1 01:31:11 2017 - [info] Starting SSH connection tests..
Fri Sep  1 01:31:12 2017 - [debug] 
Fri Sep  1 01:31:11 2017 - [debug]  Connecting via SSH from [email protected](192.168.86.148:22) to [email protected](192.168.86.149:22)..
Fri Sep  1 01:31:11 2017 - [debug]   ok.
Fri Sep  1 01:31:11 2017 - [debug]  Connecting via SSH from [email protected](192.168.86.148:22) to [email protected](192.168.86.150:22)..
Fri Sep  1 01:31:11 2017 - [debug]   ok.
Fri Sep  1 01:31:12 2017 - [debug] 
Fri Sep  1 01:31:11 2017 - [debug]  Connecting via SSH from [email protected](192.168.86.149:22) to [email protected](192.168.86.148:22)..
Fri Sep  1 01:31:11 2017 - [debug]   ok.
Fri Sep  1 01:31:11 2017 - [debug]  Connecting via SSH from [email protected](192.168.86.149:22) to [email protected](192.168.86.150:22)..
Fri Sep  1 01:31:12 2017 - [debug]   ok.
Fri Sep  1 01:31:12 2017 - [debug] 
Fri Sep  1 01:31:12 2017 - [debug]  Connecting via SSH from [email protected](192.168.86.150:22) to [email protected](192.168.86.148:22)..
Fri Sep  1 01:31:12 2017 - [debug]   ok.
Fri Sep  1 01:31:12 2017 - [debug]  Connecting via SSH from [email protected](192.168.86.150:22) to [email protected](192.168.86.149:22)..
Fri Sep  1 01:31:12 2017 - [debug]   ok.
Fri Sep  1 01:31:12 2017 - [info] All SSH connection tests passed successfully.
The message "All SSH connection tests passed successfully" indicates that the SSH check passed.

2. Run the replication status check
$ masterha_check_repl --conf=/etc/masterha/app1.cnf 
Fri Sep  1 01:36:29 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep  1 01:36:29 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Fri Sep  1 01:36:29 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Fri Sep  1 01:36:29 2017 - [info] MHA::MasterMonitor version 0.56.
Fri Sep  1 01:36:29 2017 - [info] GTID failover mode = 0
Fri Sep  1 01:36:29 2017 - [info] Dead Servers:
Fri Sep  1 01:36:29 2017 - [info] Alive Servers:
Fri Sep  1 01:36:29 2017 - [info]   192.168.86.148(192.168.86.148:3306)
Fri Sep  1 01:36:29 2017 - [info]   192.168.86.149(192.168.86.149:3306)
Fri Sep  1 01:36:29 2017 - [info]   192.168.86.150(192.168.86.150:3306)
Fri Sep  1 01:36:29 2017 - [info] Alive Slaves:
Fri Sep  1 01:36:29 2017 - [info]   192.168.86.149(192.168.86.149:3306)  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Fri Sep  1 01:36:29 2017 - [info]     Replicating from 192.168.86.148(192.168.86.148:3306)
Fri Sep  1 01:36:29 2017 - [info]   192.168.86.150(192.168.86.150:3306)  Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Fri Sep  1 01:36:29 2017 - [info]     Replicating from 192.168.86.148(192.168.86.148:3306)
Fri Sep  1 01:36:29 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Sep  1 01:36:29 2017 - [info] Current Alive Master: 192.168.86.148(192.168.86.148:3306)
Fri Sep  1 01:36:29 2017 - [info] Checking slave configurations..
Fri Sep  1 01:36:29 2017 - [info]  read_only=1 is not set on slave 192.168.86.150(192.168.86.150:3306).
Fri Sep  1 01:36:29 2017 - [info] Checking replication filtering settings..
Fri Sep  1 01:36:29 2017 - [info]  binlog_do_db= , binlog_ignore_db= 
Fri Sep  1 01:36:29 2017 - [info]  Replication filtering check ok.
Fri Sep  1 01:36:29 2017 - [info] GTID (with auto-pos) is not supported
Fri Sep  1 01:36:29 2017 - [info] Starting SSH connection tests..
Fri Sep  1 01:36:31 2017 - [info] All SSH connection tests passed successfully.
Fri Sep  1 01:36:31 2017 - [info] Checking MHA Node version..
Fri Sep  1 01:36:31 2017 - [info]  Version check ok.
Fri Sep  1 01:36:31 2017 - [info] Checking SSH publickey authentication settings on the current master..
Fri Sep  1 01:36:31 2017 - [info] HealthCheck: SSH to 192.168.86.148 is reachable.
Fri Sep  1 01:36:32 2017 - [info] Master MHA Node version is 0.56.
Fri Sep  1 01:36:32 2017 - [info] Checking recovery script configurations on 192.168.86.148(192.168.86.148:3306)..
Fri Sep  1 01:36:32 2017 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/ --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-server.000007 
Fri Sep  1 01:36:32 2017 - [info]   Connecting to [email protected](192.168.86.148:22).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/, up to mysql-server.000007
Fri Sep  1 01:36:32 2017 - [info] Binlog setting check done.
Fri Sep  1 01:36:32 2017 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Sep  1 01:36:32 2017 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.86.149 --slave_ip=192.168.86.149 --slave_port=3306 --workdir=/tmp --target_version=5.6.20-log --manager_version=0.56 --relay_log_info=/data/relay-log.info  --relay_dir=/data/  --slave_pass=xxx
Fri Sep  1 01:36:32 2017 - [info]   Connecting to [email protected](192.168.86.149:22).. 
  Checking slave recovery environment settings..
    Opening /data/relay-log.info ... ok.
    Relay log found at /data, up to mysqld-relay-bin.000003
    Temporary relay log file is /data/mysqld-relay-bin.000003
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Sep  1 01:36:32 2017 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.86.150 --slave_ip=192.168.86.150 --slave_port=3306 --workdir=/tmp --target_version=5.6.20-log --manager_version=0.56 --relay_log_info=/data/relay-log.info  --relay_dir=/data/  --slave_pass=xxx
Fri Sep  1 01:36:32 2017 - [info]   Connecting to [email protected](192.168.86.150:22).. 
  Checking slave recovery environment settings..
    Opening /data/relay-log.info ... ok.
    Relay log found at /data, up to mysqld-relay-bin.000003
    Temporary relay log file is /data/mysqld-relay-bin.000003
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Sep  1 01:36:33 2017 - [info] Slaves settings check done.
Fri Sep  1 01:36:33 2017 - [info] 
192.168.86.148(192.168.86.148:3306) (current master)
 +--192.168.86.149(192.168.86.149:3306)
 +--192.168.86.150(192.168.86.150:3306)

Fri Sep  1 01:36:33 2017 - [info] Checking replication health on 192.168.86.149..
Fri Sep  1 01:36:33 2017 - [info]  ok.
Fri Sep  1 01:36:33 2017 - [info] Checking replication health on 192.168.86.150..
Fri Sep  1 01:36:33 2017 - [info]  ok.
Fri Sep  1 01:36:33 2017 - [info] Checking master_ip_failover_script status:
Fri Sep  1 01:36:33 2017 - [info]   /usr/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.86.148 --orig_master_ip=192.168.86.148 --orig_master_port=3306 
Checking the Status of the script.. OK 
Fri Sep  1 01:36:33 2017 - [info]  OK.
Fri Sep  1 01:36:33 2017 - [warning] shutdown_script is not defined.
Fri Sep  1 01:36:33 2017 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
The message "MySQL Replication Health is OK" indicates that the replication check passed.

3. Check the manager's status
The check reports one of three states:
① Manager service not running: NOT_RUNNING is shown
$ masterha_check_status --conf=/etc/masterha/app1.cnf 
app1 is stopped(2:NOT_RUNNING).
② Manager service running: the master's IP address is shown
$ masterha_check_status --conf=/etc/masterha/app1.cnf 
app1 (pid:15469) is running(0:PING_OK), master:192.168.86.148
③ Manager service still starting up
$ masterha_check_status --conf=/etc/masterha/app1.cnf 
app1 monitoring program is now on initialization phase(10:INITIALIZING_MONITOR). Wait for a while and try checking again.
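For monitoring, the master's IP can be extracted from the status output. A minimal sketch (not part of the original guide; `current_master` is illustrative, and the sample line mirrors the output above):

```shell
# Minimal sketch: extract the current master's IP from masterha_check_status
# output, e.g. to feed a monitoring system. Helper name is illustrative.
current_master() {
    # $1 = masterha_check_status output line; prints the master IP, if any
    sed -n 's/.*master:\([0-9.]*\).*/\1/p' <<< "$1"
}

status_line='app1 (pid:15469) is running(0:PING_OK), master:192.168.86.148'
current_master "$status_line"    # prints 192.168.86.148
```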
4. Start the MHA manager service
$ nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

5. Stop the MHA manager service
$ masterha_stop --conf=/etc/masterha/app1.cnf

6. Add an entry to the configuration file
$ masterha_conf_host --command=add --conf=/etc/masterha/app1.cnf --hostname=192.168.68.102 --block=server1

7. Perform an online master switchover
$ masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.86.150 --orig_master_is_new_slave --running_updates_limit=10000

8. Repair the original master with the help of the MHA log
On the mha-manager machine, look up the change master information recorded at switchover time; the CHANGE MASTER statement given in the log can be used to repair the original master:
$ cat /var/log/masterha/app1/manager.log | grep -i "All other slaves should start"
Fri Dec 30 18:09:35 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER
TO MASTER_HOST='192.168.86.150', MASTER_PORT=3306, MASTER_LOG_FILE='mha-server.000001', MASTER_LOG_POS=154,
MASTER_USER='repl', MASTER_PASSWORD='xxx';
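The grep above only shows the first line of the statement. A minimal sketch (not part of the original guide; `extract_change_master` is illustrative, and the sample file mirrors the log excerpt above) that pulls out the whole multi-line statement:

```shell
# Minimal sketch: pull the recorded CHANGE MASTER statement out of the
# manager log. Helper name and sample file are illustrative.
extract_change_master() {
    # $1 = manager log; prints the statement from "CHANGE MASTER" to ";"
    sed -n '/All other slaves should start/,/;/p' "$1" |
        sed 's/^.*Statement should be: //'
}

cat > /tmp/manager.log.sample <<'EOF'
Fri Dec 30 18:09:35 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER
TO MASTER_HOST='192.168.86.150', MASTER_PORT=3306, MASTER_LOG_FILE='mha-server.000001', MASTER_LOG_POS=154,
MASTER_USER='repl', MASTER_PASSWORD='xxx';
EOF
extract_change_master /tmp/manager.log.sample
```

On the manager the same function can be pointed at /var/log/masterha/app1/manager.log.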

V. Usage scenarios

1. Automatic failover (requires the MHA manager to be running)

a. Start the manager service
$ nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
$ masterha_check_status --conf=/etc/masterha/app1.cnf 
app1 (pid:15469) is running(0:PING_OK), master:192.168.86.148
b. Simulate a master failure by killing the MySQL process on the master, triggering an automatic failover
On mha-master:
$ pkill -9 mysqld
c. Examine the MHA switchover log on the manager to follow the whole failover process
$ cat /var/log/masterha/app1/manager.log
A complete MHA failover consists of the following steps:
  • 1. Configuration check phase: validate the configuration of the whole cluster
  • 2. Handle the dead master, including removing the VIP and shutting down the host (host shutdown is not implemented in this setup and needs further work)
  • 3. Copy the relay logs that differ between the dead master and the most up-to-date slave, and save them under the MHA manager's working directory
  • 4. Identify the slave with the latest updates
  • 5. Apply the binary log events saved from the master
  • 6. Promote one slave to be the new master
  • 7. Point the remaining slaves at the new master and resume replication
d. Restart the MHA manager service and check the cluster's current master (after an automatic failover, the manager service stops itself)
$ nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
$ masterha_check_status --conf=/etc/masterha/app1.cnf 
app1 (pid:15469) is running(0:PING_OK), master:192.168.86.150

2. Manual failover (MHA manager not running)

When the master fails, the failover can also be performed manually.
a. Run the manual switchover command
$ masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.86.148 --dead_master_port=3306 --new_master_host=192.168.86.150 --new_master_port=3306 --ignore_last_failover
b. Answer the prompts to complete the switchover

3. Repair the failed master (and switch it back online)

Usually, after an automatic failover the original master is out of the topology. Once the original master host has been repaired and its data verified intact, you may want to bring it back as a slave of the new master. The MHA log written at failover time makes this possible.
a. On mha-manager, look up the change master information in the switchover log
The CHANGE MASTER statement given in the log lets the original master rejoin the cluster:
$ cat /var/log/masterha/app1/manager.log | grep -i "All other slaves should start"
Fri Dec 30 18:09:35 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER
TO MASTER_HOST='192.168.86.150', MASTER_PORT=3306, MASTER_LOG_FILE='mha-server.000001', MASTER_LOG_POS=154,
MASTER_USER='repl', MASTER_PASSWORD='xxx';
b. On the repaired mha-master, run the CHANGE MASTER command and start replication
This makes the original master rejoin the cluster as a slave:
mysql> CHANGE MASTER
TO MASTER_HOST='192.168.86.150', MASTER_PORT=3306, MASTER_LOG_FILE='mha-server.000001', MASTER_LOG_POS=154,
MASTER_USER='repl', MASTER_PASSWORD='123456';
mysql> start slave;
c. On mha-manager, switch the master back online
This makes the original master the primary again:
$ masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.86.148 --orig_master_is_new_slave --running_updates_limit=10000
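The rejoin statement used in step b can also be rendered from the values found in the manager log, so it can be reviewed before being run on the repaired master. A minimal sketch (not part of the original guide; `build_change_master` is illustrative, the values are the ones used in this walkthrough):

```shell
# Minimal sketch: render the rejoin CHANGE MASTER statement from values
# recorded in the manager log. Helper name is illustrative.
build_change_master() {
    # $1=new master host, $2=binlog file, $3=binlog pos, $4=repl user, $5=password
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=3306, MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s, MASTER_USER='%s', MASTER_PASSWORD='%s';\n" \
        "$1" "$2" "$3" "$4" "$5"
}

build_change_master 192.168.86.150 mha-server.000001 154 repl 123456
```

The rendered statement could then be applied with the mysql client on the repaired master, followed by START SLAVE.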

