MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. Developed by Yoshinori Matsunobu at the Japanese company DeNA (he later joined Facebook), it is an excellent piece of software for failover and master promotion in MySQL replication clusters. During a failover, MHA can complete the database switchover automatically within 0-30 seconds, and while doing so it preserves data consistency as far as possible, achieving high availability in the true sense. The failover proceeds as follows:
(1) Save the binary log events (binlog events) from the crashed master
(2) Identify the slave with the most recent updates
(3) Apply the differential relay log to the other slaves
(4) Promote one slave to be the new master
(5) Point the remaining slaves at the new master and resume replication
Combined with semi-synchronous replication, MHA can apply the binary log from the most up-to-date slave to all the other slaves, keeping the data consistent across every node.
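For reference, a minimal sketch of enabling it (this walkthrough itself does not configure semi-synchronous replication; the plugins are MySQL's bundled semisync plugins):
mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'; # on the master
mysql> SET GLOBAL rpl_semi_sync_master_enabled = 1;
mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'; # on each slave
mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;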
MHA Component | Function | Deployment |
---|---|---|
MHA Manager (management node) | Manages one or more master-slave clusters: failover and master promotion. MHA Manager polls the cluster's master at regular intervals; when the master fails, it automatically promotes the slave with the latest data to be the new master and repoints all the other slaves at it | On a dedicated machine, or on one of the slave nodes |
MHA Node (data node) | Runs scripts that can parse and purge logs, speeding up failover. Functions: (1) save the master's binlog data (2) compare the slaves' relay log files (3) purge relay logs on a schedule, without stopping the slave SQL thread | On every MySQL server |
The MHA software consists of two parts, the Manager tool package and the Node tool package.
The Manager package mainly includes the following tools:
masterha_check_ssh # checks MHA's SSH configuration
masterha_check_repl # checks the MySQL replication status
masterha_manager # starts MHA
masterha_check_status # checks the current MHA running status
masterha_master_monitor # monitors whether the master is down
masterha_master_switch # controls failover (automatic or manual)
masterha_conf_host # adds or removes configured server entries
The Node package (these tools are normally triggered by MHA Manager scripts and need no manual operation) mainly includes the following tools:
save_binary_logs # saves and copies the master's binary log
apply_diff_relay_logs # identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog # strips unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs # purges relay logs (without blocking the SQL thread)
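Because MHA needs the relay logs during a failover, relay_log_purge is normally disabled on the slaves and purge_relay_logs is run from cron instead. A minimal sketch (the schedule, workdir and log path are assumptions; replace the credentials with your own):
0 4 * * * /usr/bin/purge_relay_logs --user=root --password='xxx' --disable_relay_log_purge --workdir=/tmp >> /var/log/purge_relay_logs.log 2>&1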
MHA currently supports mainly a one-master, multi-slave architecture. To build MHA, a replication cluster needs at least three database servers, one master and two slaves: one server acting as the master, one as a standby master, and one as a pure slave.
Role | IP Address | Hostname | SERVER_ID | Type |
---|---|---|---|---|
Master / MHA Manager | 192.168.213.126 | mha | 1 | writes / monitors the replication group |
Slave / candidate master | 192.168.213.131 | mha-node1 | 2 | reads |
Slave | 192.168.213.132 | mha-node2 | 3 | reads |
The master serves writes, while the candidate master and the slave serve reads. Once the master goes down, the candidate master is promoted to the new master and the slave is repointed at it.
All nodes run CentOS 7.7:
[root@mysql ~]# cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
Configure the EPEL yum repository on all three nodes and install the dependency packages:
[root@mha ~]# yum list |grep epel-release
epel-release.noarch 7-12 @epel
[root@mha ~]# yum install epel-release.noarch -y
[root@mha ~]# yum install perl-DBD-MySQL ncftp -y
Configure hostnames and name resolution:
[root@mha ~]# tail -n 3 /etc/hosts
192.168.213.126 mha
192.168.213.131 mha-node1
192.168.213.132 mha-node2
Configure time synchronization:
[root@mha ~]# ntpdate cn.pool.ntp.org
[root@mha ~]# hwclock --systohc
Stop the firewall and disable SELinux:
[root@mha ~]# systemctl stop firewalld
[root@mha ~]# systemctl disable firewalld
[root@mha ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
[root@mha ~]# setenforce 0
(1) On the master mha, wipe the database and re-initialize it
[root@mha ~]# systemctl stop mysqld
[root@mha ~]# systemctl disable mysqld.service
[root@mha ~]# cd /var/lib/mysql
[root@mha mysql]# rm -rf *
Edit the configuration file:
[root@mha mysql]# cat /etc/my.cnf
[mysqld]
server-id=1
gtid_mode=ON
enforce-gtid-consistency=true
master_info_repository=TABLE
relay_log_info_repository=TABLE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW
default-authentication-plugin=mysql_native_password
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
The three nodes' configuration files differ only in server-id. Copy the file to the other two nodes and change just the server-id:
[root@mha ~]# scp /etc/my.cnf mha-node1:/etc/
[root@mha ~]# scp /etc/my.cnf mha-node2:/etc/
[root@mha-node1 ~]# vim /etc/my.cnf
server-id=2
[root@mha-node2 ~]# vim /etc/my.cnf
server-id=3
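Once mysqld is running on a node, a quick sanity check (not part of the original steps) confirms it picked up its own server-id:
mysql> SELECT @@server_id;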
Configure the master node mha:
[root@mha ~]# systemctl start mysqld
[root@mha ~]# grep password /var/log/mysqld.log # grep out the temporary password
Run the initial secure setup:
[root@mha ~]# mysql_secure_installation
Enter password for user root: (enter the temporary password found above)
New password:
Re-enter new password:
Change the password for root ? ((Press y|Y for Yes, any other key for No) : N
Answer Y to all of the remaining prompts
All done!
[root@mha ~]# mysql -p
mysql> show master status\G
*************************** 1. row ***************************
File: binlog.000002
Position: 1084
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: ed0154be-5b91-11ea-84a3-0050563abf3f:1-4
mysql> create user 'copy'@'%' identified with mysql_native_password by 'Cloudbu@123';
mysql> grant replication slave on *.* to 'copy'@'%';
mysql> flush privileges;
(2) Configure the slave mha-node1 (wipe and re-initialize its database the same way)
[root@mha-node1 ~]# systemctl start mysqld
[root@mha-node1 ~]# grep password /var/log/mysqld.log
[root@mha-node1 ~]# mysql_secure_installation
[root@mha-node1 ~]# mysql -p
mysql> CHANGE MASTER TO
MASTER_HOST='192.168.213.126',
MASTER_USER='copy',
MASTER_PASSWORD='Cloudbu@123',
MASTER_AUTO_POSITION=1;
mysql> start slave;
mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
(3) Configure the slave mha-node2 and reset its database
The configuration is the same as for mha-node1.
(4) Test: create a database on the master mha
[root@mha ~]# mysql -p
mysql> create database anliu;
Check on the two slaves: master-slave replication is working
[root@mha-node1 ~]# mysql -p
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| anliu |
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
[root@mha-node2 ~]# mysql -p
mysql> show databases;
Run the following on each of the three MySQL nodes (all three nodes, each one including SSH to itself):
[root@mha ~]# ssh-keygen -t rsa
[root@mha ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
[root@mha ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
[root@mha ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
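A quick check that the passwordless logins work from this node (a convenience loop, not in the original steps):
[root@mha ~]# for h in mha mha-node1 mha-node2; do ssh root@$h hostname; done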
(1) Install on all 3 nodes
[root@mha ~]# wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
[root@mha ~]# rpm -ivh epel-release-latest-7.noarch.rpm
[root@mha ~]# yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
[root@mha ~]# wget https://qiniu.wsfnk.com/mha4mysql-node-0.58-0.el7.centos.noarch.rpm
[root@mha ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
(2) On the manager node
[root@mha ~]# wget https://qiniu.wsfnk.com/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
[root@mha ~]# rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
(3) On server01 (the mha node), create MHA's working directory and the related configuration file
[root@mha ~]# mkdir /etc/masterha
[root@mha ~]# vim /etc/masterha/app1.cnf
[server default]
manager_workdir=/etc/masterha/
manager_log=/etc/masterha/app1.log
master_binlog_dir=/var/lib/mysql
#master_ip_failover_script=/usr/local/bin/master_ip_failover
#master_ip_online_change_script=/usr/local/bin/master_ip_online_change
user=root
password=Zhao123@com
ping_interval=1
remote_workdir=/tmp
repl_user=copy
repl_password=Cloudbu@123
#report_script=/usr/local/send_report
#secondary_check_script=/usr/local/bin/masterha_secondary_check -s server03 -s server02
#shutdown_script=""
ssh_user=root
[server01]
hostname=192.168.213.126
port=3306
[server02]
hostname=192.168.213.131
port=3306
candidate_master=1
check_repl_delay=0
[server03]
hostname=192.168.213.132
port=3306
#no_master=1
(4) Check the SSH connectivity from the MHA Manager to all MHA Nodes:
[root@mha ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
(5) Grant privileges to root (granting on mha alone is enough, as the grant replicates to the slaves; note in particular that MySQL 8's grant rules differ from older versions)
[root@mha ~]# mysql -p
mysql> create user 'root'@'%' identified by 'Zhao123@com';
mysql> grant all on *.* to 'root'@'%' with grant option;
mysql> flush privileges;
(6) Check the status of the whole cluster with the masterha_check_repl script
[root@mha ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
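If everything is in order, the check ends with:
MySQL Replication Health is OK.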
(1) Switch the master from mha to mha-node1
[root@mha ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.213.131 --new_master_port=3306 --orig_master_is_new_slave
# answer yes to all prompts
Mon Mar 2 17:01:24 2020 - [info] Switching master to 192.168.213.131(192.168.213.131:3306) completed successfully.
(2) Check on mha-node1: it has become the master
[root@mha-node1 ~]# mysql -p
mysql> show slave status\G
Empty set (0.00 sec)
mysql> show master status;
+---------------+----------+--------------+------------------+-------------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------------------------------------------------------------------------+
| binlog.000002 | 3343 | | | c38d86b0-5b95-11ea-b431-00505632d828:1-3,
ed0154be-5b91-11ea-84a3-0050563abf3f:1-11 |
+---------------+----------+--------------+------------------+-------------------------------------------------------------------------------------+
(3) Check on mha and mha-node2: the slave status shows mha-node1 as the master
[root@mha-node2 ~]# mysql -p
[root@mha ~]# mysql -p
mysql> show slave status\G
(4) Create a test table on the new master mha-node1
mysql> create table anliu.linux(
username varchar(10) not null,
password varchar(10) not null);
mysql> show tables in anliu;
+-----------------+
| Tables_in_anliu |
+-----------------+
| linux |
+-----------------+
(5) Check the table on mha and mha-node2: the data has been replicated
[root@mha ~]# mysql -p
[root@mha-node2 ~]# mysql -p
mysql> show tables in anliu;
+-----------------+
| Tables_in_anliu |
+-----------------+
| linux |
+-----------------+
(1) Start the manager process
[root@mha masterha]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --ignore_last_failover &
[1] 5294
[root@mha masterha]# nohup: ignoring input and appending output to ‘nohup.out’
(2) Simulate a failure
[root@mha-node1 ~]# systemctl stop mysqld
(3) Check on mha-node2: mha has become the master
(4) Check on mha
[root@mha ~]# mysql -p
mysql> show slave status\G
Empty set (0.00 sec)
High availability for MySQL via a VIP: clients only ever need to connect to the configured virtual IP.
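For example, once the VIP (configured below as 192.168.213.222) is up, a client connects to it no matter which node currently holds the master role, e.g. with the root account created earlier:
mysql -h 192.168.213.222 -uroot -p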
(1) Modify the configuration file
[root@mha ~]# cat /etc/masterha/app1.cnf
[server default]
manager_workdir=/etc/masterha/
manager_log=/etc/masterha/app1.log
master_binlog_dir=/var/lib/mysql
master_ip_failover_script=/usr/local/bin/master_ip_failover # add the failover script
master_ip_online_change_script=/usr/local/bin/master_ip_failover
user=root
password=Zhao123@com
ping_interval=1
remote_workdir=/tmp
repl_user=copy
repl_password=Cloudbu@123
#report_script=/usr/local/send_report
#secondary_check_script=/usr/local/bin/masterha_secondary_check -s server03 -s server02
#shutdown_script=""
ssh_user=root
[server01]
hostname=192.168.213.126
port=3306
candidate_master=1
[server02]
hostname=192.168.213.131
port=3306
candidate_master=1
check_repl_delay=0
[server03]
hostname=192.168.213.132
port=3306
no_master=1
(2) Write the script and make it executable
[root@mha ~]# chmod +x /usr/local/bin/master_ip_failover
[root@mha ~]# cat /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.213.222/24';
my $key = '0';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
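# A simple system call that enables the VIP on the new master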
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
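# A simple system call that disables the VIP on the old master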
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
(3) Add the VIP on the current master
[root@mha ~]# ifconfig ens33:0 192.168.213.222
(4) Run the replication check; if it passes, start MHA monitoring
[root@mha ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
MySQL Replication Health is OK.
[root@mha ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log 2>&1 &
[root@mha ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:1908) is running(0:PING_OK), master:192.168.213.126
(5) VIP failover test
The VIP sits on the mha master. Stop the MySQL service there and watch the VIP move and the master/slave roles switch:
[root@mha ~]# systemctl stop mysqld
[root@mha ~]# ifconfig # ens33:0 is gone
[root@mha-node1 ~]# ifconfig ens33:0 # the VIP has moved to the standby master
ens33:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.213.222 netmask 255.255.255.0 broadcast 192.168.213.255
ether 00:50:56:32:d8:28 txqueuelen 1000 (Ethernet)
Check on the standby master:
[root@mha-node1 ~]# mysql -p -e "show processlist"
Enter password:
+----+-----------------+-----------------+------+------------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-----------------+-----------------+------+------------------+------+---------------------------------------------------------------+------------------+
| 4 | event_scheduler | localhost | NULL | Daemon | 8089 | Waiting on empty queue | NULL |
| 56 | copy | mha-node2:55444 | NULL | Binlog Dump GTID | 1288 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 63 | root | localhost | NULL | Query | 0 | starting | show processlist |
+----+-----------------+-----------------+------+------------------+------+---------------------------------------------------------------+------------------+
The virtual IP has moved and the master/slave roles have switched.
(6) Recovery afterwards
After mha is repaired, configure it as a slave of the new master mha-node1:
[root@mha ~]# rm -rf /etc/masterha/app1.failover.complete
[root@mha ~]# systemctl start mysqld
[root@mha ~]# mysql -p
mysql> CHANGE MASTER TO
MASTER_HOST='192.168.213.131',
MASTER_USER='copy',
MASTER_PASSWORD='Cloudbu@123',
MASTER_AUTO_POSITION=1;
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.213.131
Master_User: copy
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000004
Read_Master_Log_Pos: 235
Relay_Log_File: mha-relay-bin.000002
Relay_Log_Pos: 363
Relay_Master_Log_File: binlog.000004
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
1. Check that passwordless SSH works
masterha_check_ssh --conf=/etc/masterha/app1.cnf
2. Check that replication is correctly set up
masterha_check_repl --conf=/etc/masterha/app1.cnf
3. Start MHA
nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log 2>&1 &
4. Check the running status
masterha_check_status --conf=/etc/masterha/app1.cnf
5. Stop MHA
masterha_stop --conf=/etc/masterha/app1.cnf
6. Restart after a failover
rm -rf /etc/masterha/app1.failover.complete
Each failover writes a file app1.failover.complete into the manager working directory; the next switchover will fail while this file exists, so it has to be removed manually.
(1) Install the keepalived software
Install keepalived on the current master and on the standby master:
yum install -y keepalived
(2) Modify the keepalived configuration on the master and the standby master
[root@mha ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
smtp_server 192.168.200.1
smtp_connect_timeout 30
router_id MYSQL_HA
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.213.222
}
}
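The file above goes on the current master mha. The standby master mha-node1 gets the same file except for a lower priority, so that mha holds the VIP by default (splitting the priorities like this is an assumption; the original shows only one file):
[root@mha-node1 ~]# grep priority /etc/keepalived/keepalived.conf
priority 90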
Once keepalived is configured, start it and first test whether keepalived moves the VIP:
[root@mha ~]# service keepalived start
[root@mha ~]# ip a
[root@mha-node1 ~]# service keepalived start
[root@mha ~]# service keepalived stop
[root@mha-node1 ~]# ip a # the VIP has moved over ✓
(3) Set up the failover script and point the MHA configuration at it
[root@mha ~]# vim /etc/masterha/app1.cnf
#failover script location
master_ip_failover_script= /usr/local/bin/master_ip_failover --ssh_user=root
#online change script location
master_ip_online_change_script=/usr/local/bin/master_ip_online_change --ssh_user=root
Edit the failover script:
[root@mha ~]# chmod +x /usr/local/bin/master_ip_failover
[root@mha ~]# cat /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $ssh_start_vip = "service keepalived start";
#my $ssh_start_vip = "systemctl start keepalived.service";
#my $ssh_stop_vip = "systemctl stop keepalived.service";
my $ssh_stop_vip = "service keepalived stop";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
#`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enables the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disables the VIP on the old master
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
Check the replication status:
[root@mha ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
MySQL Replication Health is OK.
(4) Start MHA monitoring
nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log 2>&1 &
Check the running status:
masterha_check_status --conf=/etc/masterha/app1.cnf
1. Testing failed with the environment built this way: systemctl stop mysqld did not trigger MHA's failover script, and keepalived's VIP switchover failed. Building keepalived with a MySQL dual-master architecture (check_mysql.sh) does allow keepalived to switch over. The likely cause of the failure is a logic error in the master_ip_failover script, since the check still reported MySQL Replication Health is OK.
2. keepalived by itself only moves the VIP when the whole machine goes down; our goal is failover when the MySQL instance dies, which needs an auxiliary script to make keepalived's switchover more flexible (a sketch follows).
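A minimal sketch of such an auxiliary script (the name check_mysql.sh follows the dual-master article mentioned above; the exact logic and credentials here are assumptions). keepalived runs it through a vrrp_script block, and it stops keepalived when the local mysqld no longer answers, forcing the VIP over to the peer:
[root@mha ~]# cat /usr/local/bin/check_mysql.sh
#!/bin/bash
# Stop keepalived when the local MySQL instance is dead, so the VIP moves to the peer
if ! mysqladmin -uroot -p'Zhao123@com' ping &> /dev/null; then
    systemctl stop keepalived
fi
Referenced from keepalived.conf with a vrrp_script block (also an assumption), plus track_script { chk_mysql } inside vrrp_instance VI_1:
vrrp_script chk_mysql {
    script "/usr/local/bin/check_mysql.sh"
    interval 2
}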
1. MHA's bundled script alone can move the VIP, with MySQL in a one-master multi-slave architecture.
2. keepalived alone can move the VIP, with MySQL in a dual-master or a one-master multi-slave architecture.
3. So when should you use MHA, and when keepalived?
MHA actually addresses data consistency: its main concern is keeping slave data loss to a minimum when the master dies, whereas keepalived simply provides a highly available VIP.