The lab environment consists of four nodes, with roles assigned as follows (all machines run CentOS 7.x):
Hostname | IP address | Role |
---|---|---|
manager | 172.16.90.211 | MHA manager (controller) |
master | 172.16.90.212 | Database master server |
slave1 | 172.16.90.213 | Database slave server |
slave2 | 172.16.90.214 | Database slave server |
To simplify later operations, add the following entries to /etc/hosts on every node (a quick way to append them is sketched after the list):
172.16.90.211 node1
172.16.90.212 node2
172.16.90.213 node3
172.16.90.214 node4
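One quick way to append these entries on every node is a heredoc run as root on each machine (the addresses are the ones from the table above):
# cat >> /etc/hosts <<'EOF'
172.16.90.211 node1
172.16.90.212 node2
172.16.90.213 node3
172.16.90.214 node4
EOF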
Add the following settings on the master:
# vim /etc/my.cnf
[mysqld]
server_id = 1                     # must be unique for every node in the replication cluster
skip_name_resolve                 # disable DNS name resolution
gtid-mode = on                    # enable GTID-based replication
enforce-gtid-consistency = true   # enforce GTID consistency
log-slave-updates = 1             # write replicated updates to this server's binary log
log-bin = mysql-bin               # enable the binary log
relay-log = relay-log             # enable the relay log
[mysql]
prompt = (\\u@\\h) [\d] >\\
no_auto_rehash
Modify the database configuration on the two slaves; do the following on both machines:
# vim /etc/my.cnf
[mysqld]
server_id = 2                     # use 2 on slave1 and 3 on slave2; every node's id must be unique
skip_name_resolve                 # disable DNS name resolution
gtid-mode = on                    # enable GTID-based replication
enforce-gtid-consistency = true   # enforce GTID consistency
log-slave-updates = 1             # write replicated updates to this server's binary log
log-bin = mysql-bin               # enable the binary log
relay-log = relay-log             # enable the relay log
read_only = ON                    # make the slave read-only
relay_log_purge = 0               # do not purge relay logs automatically
[mysql]
prompt = (\\u@\\h) [\d] >\\       # prompt shown after connecting
no_auto_rehash
Afterwards, restart mysqld on all three database nodes:
# systemctl restart mysqld
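After the restart, a quick sanity check confirms that GTID and the binary log are active on each node (a sketch, assuming root client access with a password; adjust credentials to your environment):
# mysql -uroot -p -e "show variables where variable_name in ('server_id','gtid_mode','log_bin','read_only');"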
On the master node:
(root@localhost) [(none)]>create user rep@'%' identified with mysql_native_password by 'MySQL@123';
(root@localhost) [(none)]>grant replication slave,replication client on *.* to rep@'%';
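Before configuring the slaves, the account and its privileges can be double-checked on the master (hedged; same credential assumption as above):
# mysql -uroot -p -e "show grants for rep@'%';"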
On both slave nodes:
(root@localhost) [(none)]>change master to
-> master_user='rep',
-> master_password='MySQL@123',
-> master_host='172.16.90.212',
-> master_auto_position=1;
(root@localhost) [(none)]>start slave;
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [(none)]>show slave status \G
*************************** 1. row ***************************
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
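The same status can also be pulled non-interactively from the shell on either slave, which is handy for scripting (again assuming root client access):
# mysql -uroot -p -e "show slave status\G" | egrep 'Slave_IO_Running|Slave_SQL_Running|Executed_Gtid_Set'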
Download the two rpm packages:
mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Install both packages on node1, the manager node: yum install mha*.rpm
Install mha4mysql-node-0.58-0.el7.centos.noarch.rpm on the database nodes node2, node3 and node4
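If yum reports missing Perl modules while installing the packages, the dependencies MHA 0.58 commonly needs can usually be pulled from the base and EPEL repositories (a hedged list; exact package names may vary):
# yum -y install epel-release
# yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager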
Every node generates a key pair and copies its public key to the manager node (node1); node1's authorized_keys file is then distributed to the other nodes.
[root@all ~]# ssh-keygen -f ~/.ssh/id_rsa -P '' -q
[root@all ~]# ssh-copy-id node1
[root@node1 ~]# cat ~/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDN0CAV4pyxzBTev1ooGEp8DCr9RjQYvXwMga79zpLqdWu3sqWu8s5+QM6ylZ127cU66f5aYM/aNW6fujViI40ILOzD1GNGsWGcFCmGd0Do5hCs/rNf+4+FvanCOWGTkr/px15akaLI3YApX4e7qjPwVO2UEEnEXGB/9cGMzN000Nl8EZMq6S+DWZ4W2uap0K+xyznHy1SadjV4bDAaQtAWks8HYWSSQLWHn20LCc66FMVXeanZOd3wc4KgI2tA/3tfNbSnYUhX6+Yu3D8FAqdBdgD0QfKfROSDcwf3qZA9VQsYn6FZ//IFYx95KI/pDMmUD0Fe7ARlhv/Mc0dw9gGf [email protected]
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC0ujR4l0Y1XsBRtTCKxViXDs7kF7ephdaW0ovMFMLsFq6WsffrW3aiJyTvDIN6+7WSuuCVsVuxT1GLyWzYgnYnAa6bEYRuBK39idGiVCSdTJ/i3wY0XCiFXSmL2UiW4nBQQQUQd/xtu/PAHFNJ1WUT+DNxUz5pwfP6hYTN5xJ0YDqS2nmAkrP/9Elgb75yGOKIvCpyCZPWzGsAclmr6aw2WngAHoBtKgMP3mYOdTMaF/xQLK7W0n7QB0jaY9culCPKfFkxcHEiDJRxyHPz0SIFvGXxoJ4fHXO41p8nBipIXXOM3QMO8P8s6BM1g+MW9uvEgVi4Gk15jXtyYMnZNbL7 root@node1
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCvRsXlBbUuZDzoeBJ4qYkH+avwDjH3qkgUbHjKcxXolvtScShy4rH2xFdCVcwCltb8HeV1P3O1+f5HCHiCzNxQgrDW/pFmCJmtLojVVijZvpYrtRWlRUZtUOnau438q+WgxJED7RJc2J6Sah+3pxVGuK5AcQhX94VVbl45ZvOvqQd9AoYtyJwepp+B6BcscMPr6xuJhtgQPIPBFzwiqyQ7MWITirZqQvlK0rYw30VYuo3znh1PnmH8BHIm1Z0RPueCNZLnzTyu3+ZrUODcEqipWVemhntGMkadABEiBikmlqJpNPdqlg8uzdmwN4zoP4z05sEptfcN4kNg0/ejy9Tf [email protected]
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCyf81Si+QgwEXektvtqxloYI46QtTe3xxRv4cU6hy3AbqJobgk3+d9mrxUkQxtvUA+BfPTQOBAMXYCVJIOSDMMZnkjbHRN3rNsTzoGZ91V8L5rVDwOPjSZC+GZB+8p8EHOtd7ofYcnzp/rBGoBmNo2AC3HAyZp5ZRmxE4/jayVy6XgYVPGSuSitBqWRiKxNLJfbYgpHdXjUhWVA02u9iR9RD+4i7Qy0hYaf3Cv5hEiZwUxnxbmIuVGDuoJAIOEUSK5g9hdcOKxCTuz3mA02bocVKRouIVIK12PQo/t9n8Q5MEQgGVMGQrz0eSxLzLYwJdJuq5pN3K5uYFuslUnDXRJ [email protected]
[root@node1 ~]# scp ~/.ssh/authorized_keys node2:~/.ssh/
root@node2's password:
authorized_keys 100% 1598 3.4MB/s 00:00
[root@node1 ~]# scp ~/.ssh/authorized_keys node3:~/.ssh/
root@node3's password:
authorized_keys 100% 1598 2.9MB/s 00:00
[root@node1 ~]# scp ~/.ssh/authorized_keys node4:~/.ssh/
root@node4's password:
authorized_keys 100% 1598 2.9MB/s 00:00
Verification: verify from every node:
[root@node4 ~]# for i in node{1..4};do ssh $i hostname;done
node1
node2
node3
node4
The Manager node needs a dedicated configuration file for each monitored master/slave cluster, and all master/slave clusters can also share a global configuration. The global configuration file defaults to /etc/masterha_default.cnf and is optional. When only one master/slave cluster is monitored, the per-server defaults can simply be supplied through the application configuration. The path of each application's configuration file is chosen by the user; see the next step for details.
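If several clusters were ever monitored from this manager, shared settings could be moved into that optional global file; a minimal hedged sketch (values are illustrative only and mirror the app1 configuration defined below):
# vim /etc/masterha_default.cnf
[server default]
user=mhaadmin
password=Mha@1234
ssh_user=root
repl_user=rep
repl_password=MySQL@123
ping_interval=1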
On all MySQL nodes, a user with administrative privileges must be authorized for remote access from the other nodes on the local network. Because of replication, the following SQL needs to be (and may only be) run on the master node:
(root@localhost) [(none)]> create user mhaadmin@'%' identified with mysql_native_password by 'Mha@1234';
(root@localhost) [(none)]> grant all on *.* to mhaadmin@'%';
Query OK, 0 rows affected, 1 warning (0.00 sec)
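Because the statements run on the master, GTID replication propagates the new account to both slaves automatically. A quick hedged check from the shell on either slave (assuming the root client credentials used elsewhere in this lab):
[root@node3 ~]# mysql -uroot -p -e "select user,host from mysql.user where user='mhaadmin';"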
Define the MHA management configuration file
The dedicated MHA management user created on the MySQL master in the previous step is synchronized to all three database nodes automatically.
Create the configuration directory:
# mkdir /etc/mha
Create the log directory:
# mkdir -p /var/log/mha/app1
Create the configuration file with the following content (the // annotations are explanatory only and should not appear in the real file):
# vim /etc/mha/app1.cnf
[server default]                            // settings shared by server1, server2 and server3
user=mhaadmin                               // MHA management user
password=Mha@1234                           // MHA management password
manager_workdir=/var/log/mha/app1           // working directory of the MHA manager
manager_log=/var/log/mha/app1/manager.log   // log file of the MHA manager
ssh_user=root                               // user for SSH key-based authentication
repl_user=rep                               // replication user
repl_password=MySQL@123                     // replication password
ping_interval=1                             // master ping interval, in seconds
[server1]                                   // node2
hostname=172.16.90.212                      // address of node2
ssh_port=22                                 // SSH port of node2
candidate_master=1                          // eligible to be promoted to master
[server2]
hostname=172.16.90.213
ssh_port=22
candidate_master=1
[server3]
hostname=172.16.90.214
ssh_port=22
candidate_master=1
1) Check that password-less SSH between the nodes is configured correctly by running the following command on the manager machine:
[root@node1 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Wed Nov 24 11:36:56 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Nov 24 11:36:56 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Wed Nov 24 11:36:56 2021 - [info] Reading server configuration from /etc/mha/app1.cnf..
Wed Nov 24 11:36:56 2021 - [info] Starting SSH connection tests..
Wed Nov 24 11:36:57 2021 - [debug]
Wed Nov 24 11:36:56 2021 - [debug] Connecting via SSH from root@172.16.90.212(172.16.90.212:22) to root@172.16.90.213(172.16.90.213:22)..
Wed Nov 24 11:36:56 2021 - [debug] ok.
Wed Nov 24 11:36:56 2021 - [debug] Connecting via SSH from root@172.16.90.212(172.16.90.212:22) to root@172.16.90.214(172.16.90.214:22)..
Wed Nov 24 11:36:56 2021 - [debug] ok.
Wed Nov 24 11:36:57 2021 - [debug]
Wed Nov 24 11:36:56 2021 - [debug] Connecting via SSH from root@172.16.90.213(172.16.90.213:22) to root@172.16.90.212(172.16.90.212:22)..
Wed Nov 24 11:36:56 2021 - [debug] ok.
Wed Nov 24 11:36:56 2021 - [debug] Connecting via SSH from root@172.16.90.213(172.16.90.213:22) to root@172.16.90.214(172.16.90.214:22)..
Wed Nov 24 11:36:57 2021 - [debug] ok.
Wed Nov 24 11:36:58 2021 - [debug]
Wed Nov 24 11:36:57 2021 - [debug] Connecting via SSH from root@172.16.90.214(172.16.90.214:22) to root@172.16.90.212(172.16.90.212:22)..
Wed Nov 24 11:36:57 2021 - [debug] ok.
Wed Nov 24 11:36:57 2021 - [debug] Connecting via SSH from root@172.16.90.214(172.16.90.214:22) to root@172.16.90.213(172.16.90.213:22)..
Wed Nov 24 11:36:57 2021 - [debug] ok.
Wed Nov 24 11:36:58 2021 - [info] All SSH connection tests passed successfully.
2) Check that the replication configuration of the managed MySQL cluster is OK:
[root@node1 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
...
Wed Nov 24 11:37:55 2021 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
Run the following command on the manager node to start MHA:
[root@node1 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
[1] 11457
After it has started, check the status of the master node:
[root@node1 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:11457) is running(0:PING_OK), master:172.16.90.212
To watch the monitoring log:
[root@node1 ~]# tail -f /var/log/mha/app1/manager.log
172.16.90.212(172.16.90.212:3306) (current master)
+--172.16.90.213(172.16.90.213:3306)
+--172.16.90.214(172.16.90.214:3306)
Wed Nov 24 11:45:30 2021 - [warning] master_ip_failover_script is not defined.
Wed Nov 24 11:45:30 2021 - [warning] shutdown_script is not defined.
Wed Nov 24 11:45:30 2021 - [info] Set master ping interval 1 seconds.
Wed Nov 24 11:45:30 2021 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Wed Nov 24 11:45:30 2021 - [info] Starting ping health check on 172.16.90.212(172.16.90.212:3306)..
Wed Nov 24 11:45:30 2021 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
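A manager started with nohup can be stopped with the companion masterha_stop command on the manager node (the same command the service script below wraps):
[root@node1 ~]# masterha_stop --conf=/etc/mha/app1.cnf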
Starting the manager this way is cumbersome; a service script can be used instead:
[root@node1 ~]# vim /etc/init.d/masterha_managerd
#!/bin/bash
# chkconfig: 35 80 20
# description: MHA management script.

STARTEXEC="/usr/bin/masterha_manager --conf"
STOPEXEC="/usr/bin/masterha_stop --conf"
CONF="/etc/mha/app1.cnf"
PARAMS="--ignore_last_failover"
# number of running masterha_manager processes
process_count=`ps -ef | grep -w masterha_manager | grep -v grep | wc -l`

case "$1" in
start)
    if [ $process_count -gt 0 ]
    then
        echo "masterha_manager exists, process is already running"
    else
        echo "Starting Masterha Manager"
        $STARTEXEC $CONF $PARAMS < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
    fi
    ;;
stop)
    if [ $process_count -eq 0 ]
    then
        echo "Masterha Manager does not exist, process is not running"
    else
        echo "Stopping ..."
        $STOPEXEC $CONF
        # wait until the manager process has actually exited
        while true
        do
            process_count=`ps -ef | grep -w masterha_manager | grep -v grep | wc -l`
            if [ $process_count -gt 0 ]
            then
                sleep 1
            else
                break
            fi
        done
        echo "Master Manager stopped"
    fi
    ;;
*)
    echo "Please use start or stop as first argument"
    ;;
esac
[root@node1 ~]# chmod +x /etc/init.d/masterha_managerd
[root@node1 ~]# chkconfig --add masterha_managerd
[root@node1 ~]# chkconfig masterha_managerd on
Test the service script:
[root@node1 ~]# systemctl start masterha_managerd
[root@node1 ~]# systemctl status masterha_managerd
● masterha_managerd.service - SYSV: MHA management script.
Loaded: loaded (/etc/rc.d/init.d/masterha_managerd; bad; vendor preset: disabled)
Active: active (running) since Wed 2021-11-24 15:22:55 CST; 9s ago
Docs: man:systemd-sysv-generator(8)
Process: 25727 ExecStart=/etc/rc.d/init.d/masterha_managerd start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/masterha_managerd.service
└─25733 perl /usr/bin/masterha_manager --conf /etc/mha/app1.cnf --ignore_last_failover
Nov 24 15:22:54 node1.kongd.com systemd[1]: Starting SYSV: MHA management script....
Nov 24 15:22:55 node1.kongd.com masterha_managerd[25727]: Starting Masterha Manager
Nov 24 15:22:55 node1.kongd.com systemd[1]: Started SYSV: MHA management script..
[root@node1 ~]# ps -ef | grep -w masterha_manager
root 25733 1 0 15:22 ? 00:00:00 perl /usr/bin/masterha_manager --conf /etc/mha/app1.cnf --ignore_last_failover
root 25800 1169 0 15:23 pts/0 00:00:00 grep --color=auto -w masterha_manager
[root@node1 ~]# systemctl stop masterha_managerd
[root@node1 ~]# ps -ef | grep -w masterha_manager
root 25856 1169 0 15:24 pts/0 00:00:00 grep --color=auto -w masterha_manager
The VIP can be managed in two ways: have keepalived manage the floating virtual IP, or move the virtual IP with a script (no keepalived- or heartbeat-style software required).
To reduce the risk of split-brain, the script-based approach is recommended for production rather than keepalived.
1. Write the failover script
[root@node1 ~]# vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '172.16.90.210/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
        "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
[root@node1 ~]# chmod +x /usr/local/bin/master_ip_failover
2. Update the manager configuration file
[root@node1 ~]# vim /etc/mha/app1.cnf
In the [server default] section, add:
master_ip_failover_script=/usr/local/bin/master_ip_failover
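After adding the script it is worth re-running the replication check; masterha_check_repl should also exercise the failover script's status branch, so the "Checking the Status of the script.. OK" line printed by the script above is expected to appear in its output:
[root@node1 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf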
3. On the primary, bring up the first VIP manually
[root@node2 ~]# ifconfig eth0:1 172.16.90.210/24
Note: the first time, the VIP has to be configured manually on the primary.
[root@node2 ~]# ifconfig -a |grep -A 2 "eth0:1"
eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
inet 172.16.90.210 netmask 255.255.255.0 broadcast 172.16.90.255
ether 28:6e:d4:89:3b:3e txqueuelen 1000 (Ethernet)
4. Restart MHA
[root@node1 ~]# systemctl restart masterha_managerd
On the master node, stop the MySQL service to simulate a crash of the primary:
[root@node2 ~]# systemctl stop mysqld.service
Check the log on the manager node:
[root@node1 ~]# tail -1 /var/log/mha/app1/manager.log
Master failover to 172.16.90.213(172.16.90.213:3306) completed successfully.
Check the VIP:
[root@node3 ~]# ifconfig -a |grep -A 2 "eth0:1"
eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
inet 172.16.90.210 netmask 255.255.255.0 broadcast 172.16.90.255
ether 28:6e:d4:89:3b:3e txqueuelen 1000 (Ethernet)
This shows that the manager detected the failure of node 172.16.90.212 and automatically performed the failover, promoting 172.16.90.213 to master. Note that once the failover completes, the manager stops automatically; running masterha_check_status at that point returns an error, as shown below:
[root@node1 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
After the original master fails, a new MySQL node has to be prepared. Restore its data from a backup taken from the master node, then configure it as a slave of the new master. Note that if the replacement is a brand-new machine, its IP address should be set to the original master's IP; otherwise the corresponding IP in /etc/mha/app1.cnf has to be changed. Afterwards start the manager again and re-check its status.
[root@node2 ~]# systemctl start mysqld.service
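If the reinstalled node comes up empty, one hedged way to reload it before pointing replication at the new master is a logical dump from 172.16.90.213 (mysqldump is assumed adequate here; for large data sets a physical backup tool would be preferable, and if the import complains about GTID_PURGED a RESET MASTER on the empty node is usually needed first):
[root@node3 ~]# mysqldump -uroot -p --all-databases --single-transaction --triggers --routines --events > /tmp/full.sql
[root@node3 ~]# scp /tmp/full.sql node2:/tmp/
[root@node2 ~]# mysql -uroot -p < /tmp/full.sql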
(root@localhost) [(none)]> change master to
-> master_user='rep',
-> master_password='MySQL@123',
-> master_host='172.16.90.213',
-> master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.00 sec)
(root@localhost) [(none)]> start slave;
Query OK, 0 rows affected (0.01 sec)
(root@localhost) [(none)]> show slave status \G
...
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Add the [server1] entry for the restored node back to the MHA configuration:
[root@node1 ~]# vim /etc/mha/app1.cnf
[server1]
hostname=172.16.90.212
ssh_port=22
candidate_master=1
[root@node1 ~]# systemctl restart masterha_managerd
[root@node1 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:26445) is running(0:PING_OK), master:172.16.90.213