1. Machine preparation: all machines run CentOS 7
IP | Role | Hostname |
---|---|---|
192.168.213.134 | Master | Master(node) |
192.168.213.133 | Slave | Node(node) |
192.168.213.132 | Slave | Agent1(node) |
192.168.213.137 | Manager | Manager(manager) |
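2. MHA architecture and working principle
How failover works:
1) Save the binary log events (binlog events) from the crashed master;
2) Identify the slave with the most recent updates;
3) Apply the differential relay logs to the other slaves;
4) Apply the binlog events saved from the master;
5) Promote one slave to be the new master;
6) Point the other slaves at the new master and resume replication.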
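Step 2, identifying the slave with the most recent updates, can be illustrated with a small shell sketch. It assumes a hypothetical input format of one line per slave, `host master_log_file read_master_log_pos`, as could be collected from show slave status. MHA's own selection logic is more involved, and `pick_latest_slave` is an illustrative name, not part of MHA.

```bash
#!/usr/bin/env bash
# Sketch: choose the slave that has received the most binlog from the master.
# Input (stdin): one line per slave, "host master_log_file read_master_log_pos".
# The input format and the function name are assumptions for illustration.
pick_latest_slave() {
    # Sort by binlog file name, then numerically by position; the last line wins.
    LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | awk '{print $1}'
}

pick_latest_slave <<'EOF'
192.168.213.133 master-bin.000003 462
192.168.213.132 master-bin.000003 154
EOF
```

The comparison relies on the zero-padded numeric suffix of the binlog file names, so a plain lexical sort on the file name is sufficient here.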
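3. Requirements
Use MHA to monitor the MySQL databases and fail over automatically when a fault occurs, without affecting the business.
4. Implementation approach
1) Install MySQL
2) Configure MySQL master-slave replication
3) Install MHA
4) Configure passwordless SSH authentication
5) Configure MHA for MySQL high availability
6) Simulate a master failure and failover
1. Install and configure MHA
2. Simulate a master failure
3. Promote the standby master to master
4. Re-point slave2 from the old master to the new master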
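1. Install MySQL on all machines (not covered in detail here).
2. Configure MySQL master-slave replication; see MySQL 主从配置详解 - 腾讯云开发者社区-腾讯云 (tencent.com).
3. Install MHA
3.1 Grant privileges on the master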
grant all on *.* to 'mha'@'192.168.213.%' identified by 'Stt@123456';
3.2 Install the dependencies MHA needs (on all machines)
[root@mha_manager ~]# yum install epel-release --nogpgcheck -y
[root@mha_manager ~]# yum install -y perl-DBD-MySQL \
perl-Config-Tiny \
perl-Log-Dispatch \
perl-Parallel-ForkManager \
perl-ExtUtils-CBuilder \
perl-ExtUtils-MakeMaker \
perl-CPAN
3.3 Download the MHA manager and node packages
Link: https://pan.baidu.com/s/1Nvr88OoTH1JliUM-YUdISg?pwd=9qh0
Extraction code: 9qh0
3.4 Install the node component on all machines (the master is used as the example)
[root@master ~]# tar zxvf mha4mysql-node-0.57.tar.gz
[root@master ~]# cd mha4mysql-node-0.57/
[root@master mha4mysql-node-0.57]# perl Makefile.PL && make && make install
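3.5 Install the manager component on the manager machine. Note: install the node component first, then the manager component, and install the manager component only on the manager machine.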
[root@mha_manager ~]# tar zxvf mha4mysql-manager-0.57.tar.gz
[root@mha_manager ~]# cd mha4mysql-manager-0.57/
[root@mha_manager mha4mysql-manager-0.57]# perl Makefile.PL
[root@mha_manager mha4mysql-manager-0.57]# make
[root@mha_manager mha4mysql-manager-0.57]# make install
Configure passwordless SSH from the manager machine to all other machines
[root@MHA-manager ~]# ssh-keygen -t rsa    # press Enter at every prompt
[root@MHA-manager ~]# ssh-copy-id 192.168.213.133
[root@MHA-manager ~]# ssh-copy-id 192.168.213.134
[root@MHA-manager ~]# ssh-copy-id 192.168.213.132
Configure passwordless SSH from the master to slave1, slave2 and the manager
[root@master ~]# ssh-keygen -t rsa    # press Enter at every prompt
[root@master ~]# ssh-copy-id 192.168.213.133
[root@master ~]# ssh-copy-id 192.168.213.132
[root@master ~]# ssh-copy-id 192.168.213.137
Configure passwordless SSH from slave1 to the master, slave2 and the manager
[root@slave1 ~]# ssh-keygen -t rsa    # press Enter at every prompt
[root@slave1 ~]# ssh-copy-id 192.168.213.134
[root@slave1 ~]# ssh-copy-id 192.168.213.132
[root@slave1 ~]# ssh-copy-id 192.168.213.137
Configure passwordless SSH from slave2 to the master, slave1 and the manager
[root@slave2 ~]# ssh-keygen -t rsa    # press Enter at every prompt
[root@slave2 ~]# ssh-copy-id 192.168.213.134
[root@slave2 ~]# ssh-copy-id 192.168.213.133
[root@slave2 ~]# ssh-copy-id 192.168.213.137
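The four blocks above repeat one pattern: every host copies its public key to every other host. A minimal dry-run sketch of that pattern follows; `print_ssh_copy_cmds` is a hypothetical helper, the IPs are the ones from the machine table, and the `root@` prefix is an assumption based on ssh_user=root in the MHA configuration.

```bash
#!/usr/bin/env bash
# Sketch: list the ssh-copy-id commands one host has to run so that it can
# reach every other host in the cluster without a password.
# The function name is illustrative; nothing is executed, only printed.
HOSTS="192.168.213.134 192.168.213.133 192.168.213.132 192.168.213.137"

print_ssh_copy_cmds() {
    local self="$1" host
    for host in $HOSTS; do
        # A host does not need to copy its key to itself.
        [ "$host" = "$self" ] && continue
        echo "ssh-copy-id root@$host"
    done
}

# Dry run for the master; pipe the output to sh to actually execute it.
print_ssh_copy_cmds 192.168.213.134
```

Printing the commands instead of running them keeps the sketch safe to try; ssh-keygen still has to be run once on each host beforehand.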
5.1 On the manager node, copy the sample scripts to /usr/local/bin
[root@manager bin]# cp -ra /root/mha4mysql-manager-0.57/samples/scripts /usr/local/bin
[root@manager ~]# ll /usr/local/bin/scripts/
total 32
-rwxr-xr-x 1 mysql mysql  3648 May 31 2015 master_ip_failover        # manages the VIP during automatic failover
-rwxr-xr-x 1 mysql mysql  9872 May 25 09:07 master_ip_online_change   # manages the VIP during an online switchover
-rwxr-xr-x 1 mysql mysql 11867 May 31 2015 power_manager             # shuts down the host after a failure
-rwxr-xr-x 1 mysql mysql  1360 May 31 2015 send_report               # sends an alert after a failover
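5.2 Edit the master_ip_failover script so that it describes how the VIP is handled when the master goes down. The configuration file in 5.3 references /usr/local/bin/master_ip_failover, so the script copied into /usr/local/bin/scripts/ in step 5.1 must also be placed directly in /usr/local/bin.
[root@MHA-manager ~]# vim /usr/local/bin/master_ip_failover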
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
############################ added section #########################################
my $vip = '192.168.213.200';
my $brdc = '192.168.213.255';
my $ifdev = 'ens33';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
my $exit_code = 0;
#my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
#my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
#################################################################################
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
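The script above drives the VIP with ifconfig aliases and keeps an iproute2 variant in its commented-out lines. As a rough shell sketch of that variant, the helper below only prints the command for start or stop. The name `vip_cmd` is hypothetical, and the values mirror $vip, $brdc, $ifdev and $key from the script.

```bash
#!/usr/bin/env bash
# Sketch: build (not run) the iproute2 commands that add or remove the VIP.
VIP="192.168.213.200"
BRDC="192.168.213.255"
IFDEV="ens33"
KEY="1"

vip_cmd() {
    case "$1" in
        start) echo "ip addr add $VIP/24 brd $BRDC dev $IFDEV label $IFDEV:$KEY" ;;
        stop)  echo "ip addr del $VIP/24 dev $IFDEV label $IFDEV:$KEY" ;;
        *)     echo "usage: vip_cmd start|stop" >&2; return 1 ;;
    esac
}

vip_cmd start
vip_cmd stop
```

Keeping the interface name in one variable avoids the mix of a hard-coded ens33 and $ifdev that the Perl script has.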
5.3 Edit the configuration file app1.cnf
[root@MHA-manager ~]# mkdir /etc/masterha
[root@MHA-manager ~]# cp /root/mha4mysql-manager-0.57/samples/conf/app1.cnf /etc/masterha/
[root@MHA-manager ~]# vim /etc/masterha/app1.cnf
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/var/lib/mysql
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
password=Stt@123456
ping_interval=1
remote_workdir=/tmp
repl_password=Stt@123456
repl_user=rep
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.213.132 -s 192.168.213.133
shutdown_script=""
ssh_user=root
user=mha
[server1]
hostname=192.168.213.134
port=3306
[server2]
candidate_master=1
hostname=192.168.213.133
check_repl_delay=0
port=3306
[server3]
hostname=192.168.213.132
port=3306
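Configuration file explained (an annotated reference; a few values, such as the script paths, remote_workdir and the secondary-check hosts, differ from the file actually used above)
# cat /etc/masterha/app1.cnf
[server default]
manager_workdir=/var/log/masterha/app1 # working directory of the manager
manager_log=/var/log/masterha/app1/manager.log # log file of the manager
master_binlog_dir=/var/lib/mysql # where the master keeps its binlogs so that MHA can find them; here it is the MySQL data directory
master_ip_failover_script=/etc/masterha/scripts/master_ip_failover # script that switches the VIP during automatic failover
master_ip_online_change_script=/etc/masterha/scripts/master_ip_online_change
# script that switches the VIP during a manual (online) switchover
password=Stt@123456 # password of the monitoring user created earlier
user=mha # the monitoring user
ping_interval=1 # interval in seconds between pings to the master (default 3); failover starts automatically after three attempts with no response
remote_workdir=/data # where the remote MySQL hosts save binlogs during a switchover
repl_password=Stt@123456 # password of the replication user
repl_user=rep # name of the replication user
report_script=/usr/local/bin/send_report # script that sends an alert after a switchover
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.213.133 -s 192.168.213.134
# if the connection from the MHA manager to server02 has a problem, the manager will try to log in to server02 via server03
shutdown_script="" # script that shuts down the failed host after a failure (its main purpose is to prevent split-brain; not used here)
ssh_user=root # SSH login user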
[server1]
hostname=192.168.213.134
port=3306
candidate_master=1
[server2]
hostname=192.168.213.133
port=3306
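candidate_master=1 # marks this host as a candidate master: after a failover this slave is promoted to master even if it does not hold the most recent events in the cluster
check_repl_delay=0 # by default MHA does not pick a slave as the new master if it is more than 100 MB of relay logs behind the master, because recovering such a slave takes a long time; with check_repl_delay=0 MHA ignores replication delay when choosing the new master. This is very useful for hosts with candidate_master=1, because that candidate is then certain to become the new master during the switchover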
[server3]
hostname=192.168.213.132
port=3306
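Because failover removes and re-adds [serverN] blocks, it helps to see quickly which hosts a configuration file currently lists. The following is a minimal sketch under the assumption that each block has a plain `hostname=` line as in the files above; `list_mha_servers` is an illustrative helper, not an MHA tool.

```bash
#!/usr/bin/env bash
# Sketch: print "section hostname" for every [serverN] block of an MHA config.
# Assumes "hostname=value" lines without surrounding spaces.
list_mha_servers() {
    awk '
        /^\[server[0-9]+\]/ { section = substr($0, 2, index($0, "]") - 2); next }
        /^\[/               { section = ""; next }
        section != "" && /^hostname=/ {
            sub(/^hostname=/, "")
            print section, $0
        }
    ' "$1"
}

# Usage (path as used in this article):
# list_mha_servers /etc/masterha/app1.cnf
```

The [server default] block is skipped on purpose, since it carries shared settings rather than a host.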
Test passwordless SSH authentication
[root@manager masterha]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Tue Oct 18 18:25:31 2022 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Oct 18 18:25:31 2022 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Oct 18 18:25:31 2022 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Oct 18 18:25:31 2022 - [info] Starting SSH connection tests..
Tue Oct 18 18:25:32 2022 - [debug]
Tue Oct 18 18:25:31 2022 - [debug] Connecting via SSH from root@192.168.213.133(192.168.213.133:22) to root@192.168.213.134(192.168.213.134:22)..
Tue Oct 18 18:25:32 2022 - [debug] ok.
Tue Oct 18 18:25:32 2022 - [debug] Connecting via SSH from root@192.168.213.133(192.168.213.133:22) to root@192.168.213.132(192.168.213.132:22)..
Tue Oct 18 18:25:32 2022 - [debug] ok.
Tue Oct 18 18:25:38 2022 - [debug]
Tue Oct 18 18:25:32 2022 - [debug] Connecting via SSH from root@192.168.213.132(192.168.213.132:22) to root@192.168.213.134(192.168.213.134:22)..
Tue Oct 18 18:25:32 2022 - [debug] ok.
Tue Oct 18 18:25:32 2022 - [debug] Connecting via SSH from root@192.168.213.132(192.168.213.132:22) to root@192.168.213.133(192.168.213.133:22)..
Tue Oct 18 18:25:38 2022 - [debug] ok.
Tue Oct 18 18:25:42 2022 - [debug]
Tue Oct 18 18:25:31 2022 - [debug] Connecting via SSH from root@192.168.213.134(192.168.213.134:22) to root@192.168.213.133(192.168.213.133:22)..
Tue Oct 18 18:25:41 2022 - [debug] ok.
Tue Oct 18 18:25:41 2022 - [debug] Connecting via SSH from root@192.168.213.134(192.168.213.134:22) to root@192.168.213.132(192.168.213.132:22)..
Tue Oct 18 18:25:42 2022 - [debug] ok.
Tue Oct 18 18:25:42 2022 - [info] All SSH connection tests passed successfully.
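If the output ends with "successfully", the test passed.
Next, check the replication status of the cluster; the following error appears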
masterha_check_repl --conf=/etc/masterha/app1.cnf
Tue Oct 18 18:38:55 2022 - [info] Checking master_ip_failover_script status:
Tue Oct 18 18:38:55 2022 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.213.134 --orig_master_ip=192.168.213.134 --orig_master_port=3306
"my" variable $ssh_start_vip masks earlier declaration in same scope at /usr/local/bin/master_ip_failover line 22.
Tue Oct 18 18:38:55 2022 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln229] Failed to get master_ip_failover_script status with return code 255:0.
Tue Oct 18 18:38:55 2022 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/bin/masterha_check_repl line 48.
Tue Oct 18 18:38:55 2022 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Tue Oct 18 18:38:55 2022 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
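Cause: master_ip_failover has no VIP management mode selected. MHA supports managing the VIP either with a script or with keepalived; the error appears because no VIP management mode was chosen and the corresponding configuration was not done. Note also that the message itself reports a duplicate "my $ssh_start_vip" declaration at line 22 of the script, which "use warnings FATAL => 'all'" turns into a fatal error, so only one definition of $ssh_start_vip and $ssh_stop_vip may be left uncommented.
Solution: choose a VIP management mode and edit the master_ip_failover script accordingly.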
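The warning in the log above ("my" variable $ssh_start_vip masks earlier declaration) can be checked for before running masterha_check_repl. The following is a simplified sketch, not a Perl parser: it only looks at unindented lines that start with `my $` and ignores scoping, and `find_duplicate_my` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: report top-level "my $name" declarations that occur more than once
# in a Perl script. Indented (inner-scope) declarations are not considered.
find_duplicate_my() {
    grep -oE '^my \$[A-Za-z_][A-Za-z0-9_]*' "$1" | sort | uniq -d
}

# Usage:
# find_duplicate_my /usr/local/bin/master_ip_failover
```

An empty result means no top-level variable is declared twice; any line printed names a declaration that should be commented out or merged.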
The virtual IP has to be brought up manually on the master
ifconfig ens33:1 192.168.213.200/24
If the command returns without any output, the configuration is correct
Run the cluster status check again; this time it succeeds
[root@manager ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
......
Tue Oct 18 19:26:04 2022 - [info] Checking replication health on 192.168.213.133..
Tue Oct 18 19:26:04 2022 - [info] ok.
Tue Oct 18 19:26:04 2022 - [info] Checking replication health on 192.168.213.132..
Tue Oct 18 19:26:04 2022 - [info] ok.
Tue Oct 18 19:26:04 2022 - [info] Checking master_ip_failover_script status:
Tue Oct 18 19:26:04 2022 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.213.134 --orig_master_ip=192.168.213.134 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.213.200===
Checking the Status of the script.. OK
Tue Oct 18 19:26:04 2022 - [info] OK.
Tue Oct 18 19:26:04 2022 - [warning] shutdown_script is not defined.
Tue Oct 18 19:26:04 2022 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
Start MHA
[root@manager masterha]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
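[1] 24794 # PID of the started process
Parameter explanation:
--remove_dead_master_conf: after a master-slave switchover, the IP of the old master is removed from the configuration file.
--manager_log: where the log is stored.
--ignore_last_failover: by default, if MHA detects consecutive outages less than 8 hours apart, it does not fail over again; this restriction avoids a ping-pong effect. After a switchover MHA writes the file app1.failover.complete into the log directory configured above, and the next switchover is refused while that file exists, unless it is deleted manually after the first switchover. This parameter tells MHA to ignore the file left by the previous switchover; it is set here for convenience.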
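To see whether the 8-hour guard described above would currently apply, one can look for a recent marker file in the manager's working directory. This is a minimal sketch that assumes GNU or BSD find (for -maxdepth and -mmin); `recent_failover_marker` is an illustrative name, and 480 minutes corresponds to the 8 hours mentioned above.

```bash
#!/usr/bin/env bash
# Sketch: succeed if the working directory holds a *.failover.complete file
# modified within the last 8 hours (480 minutes), i.e. the situation in which
# another failover would be refused without --ignore_last_failover.
recent_failover_marker() {
    local workdir="$1"
    [ -n "$(find "$workdir" -maxdepth 1 -name '*.failover.complete' -mmin -480 2>/dev/null)" ]
}

# Usage (directory from manager_workdir in this article):
# recent_failover_marker /var/log/masterha/app1 && echo "failover would be blocked"
```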
Check the status
[root@manager masterha]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:24794) is running(0:PING_OK), master:192.168.213.134
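To stop it, simply run kill -9 on the pid (MHA also provides masterha_stop --conf=/etc/masterha/app1.cnf for a clean stop)
Manually stop the MySQL service on the master node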
[root@master ~]# pkill mysqld
Now check the MHA status on the manager
[root@manager ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:27759) is running(0:PING_OK), master:192.168.213.133
The master has now changed to the 133 machine, that is, slave1
Running ifconfig on slave1 shows that the VIP has moved to this machine
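When scripting around a failover, the current master can be pulled out of the status line shown above. A small sketch, assuming the output format stays as in this article; `current_master` is a hypothetical helper that reads the line on stdin.

```bash
#!/usr/bin/env bash
# Sketch: extract the master address from masterha_check_status output,
# e.g. "app1 (pid:27759) is running(0:PING_OK), master:192.168.213.133".
current_master() {
    sed -n 's/.*master:\([0-9.]*\).*/\1/p'
}

# Usage on the manager:
# masterha_check_status --conf=/etc/masterha/app1.cnf | current_master
echo "app1 (pid:27759) is running(0:PING_OK), master:192.168.213.133" | current_master
```

Nothing is printed when the line contains no master field, which makes the helper easy to use in a conditional.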
7.1 Restore the MySQL service on the original master node
systemctl start mysqld
mysql -uroot -pStt@123456 -e 'set global read_only=1'
mysql -uroot -pStt@123456 -e ' set global relay_log_purge=0'
# log in to the database and run the following
mysql> change master to
-> master_host='192.168.213.133', # IP address of the new master
-> master_user='rep',
-> master_password='Stt@123456',
-> master_log_file='master-bin.000001', # obtained by running show master status\G on the new master
-> master_log_pos=462;
Query OK, 0 rows affected, 2 warnings (0.03 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> start slave;
Query OK, 0 rows affected (0.03 sec)
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.213.133
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000001
Read_Master_Log_Pos: 462
Relay_Log_File: master-relay-bin.000002
Relay_Log_Pos: 321
Relay_Master_Log_File: master-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
......
# replication is working
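The file name and position in the statement above come from show master status on the new master and change with every failover. The sketch below assembles the statement from those values; `build_change_master` is an illustrative helper, and the password is deliberately left out of the output, so master_password has to be added when the statement is run.

```bash
#!/usr/bin/env bash
# Sketch: print the CHANGE MASTER TO statement for re-attaching a recovered
# node. Arguments: new master host, replication user, binlog file, position.
# master_password is intentionally omitted from the printed statement.
build_change_master() {
    local host="$1" user="$2" log_file="$3" log_pos="$4"
    printf "CHANGE MASTER TO master_host='%s', master_user='%s', master_log_file='%s', master_log_pos=%s;\n" \
        "$host" "$user" "$log_file" "$log_pos"
}

build_change_master 192.168.213.133 rep master-bin.000001 462
```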
7.2 Edit the MHA main configuration file on the manager and add the recovered machine back
vim /etc/masterha/app1.cnf
......
[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.213.133
port=3306
[server3]
hostname=192.168.213.132
port=3306
[server1] # re-add the machine that was removed after the failure to the MHA cluster
hostname=192.168.213.134
port=3306
masterha_check_repl --conf=/etc/masterha/app1.cnf # check the cluster status
......
IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.213.200===
Checking the Status of the script.. OK
Wed Oct 19 01:12:33 2022 - [info] OK.
Wed Oct 19 01:12:33 2022 - [warning] shutdown_script is not defined.
Wed Oct 19 01:12:33 2022 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
[root@manager masterha]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 & # start MHA in the background
[root@manager masterha]# masterha_check_status --conf=/etc/masterha/app1.cnf # check the running status
app1 (pid:57420) is running(0:PING_OK), master:192.168.213.133