当master出现故障时,通过对比slave之间I/O线程读取master上binlog的位置,选取最接近的slave做为最新的slave(latest slave)。
其它slave通过与latest slave对比生成差异中继日志,并应用。
在latest slave上应用从master保存的binlog,同时将latest slave提升为master。
最后在其它slave上应用相应的差异中继日志并开始从新的master开始复制.
工具 | 说明 |
---|---|
masterha_check_ssh | 检查MHA的SSH配置 |
masterha_check_repl | 检查MySQL复制 |
masterha_manager | 启动MHA |
masterha_check_status | 检测当前MHA运行状态 |
masterha_master_monitor | 监测master是否宕机 |
masterha_master_switch | 控制故障转移(自动或手动) |
masterha_conf_host | 添加或删除配置的server信息 |
工具 | 说明 |
---|---|
save_binary_logs | 保存和复制master的二进制日志 |
apply_diff_relay_logs | 识别差异的中继日志事件并应用于其它slave |
filter_mysqlbinlog | 去除不必要的ROLLBACK事件(MHA已不再使用这个工具) |
purge_relay_logs | 清除中继日志(不会阻塞SQL线程) |
主机名 | IP地址 | server-id | 功能 | 备注 |
---|---|---|---|---|
mha | 192.168.226.10 | — | 管理节点 | |
master | 192.168.226.20 | 20 | 数据节点 | 写 |
slave1 | 192.168.226.30 | 30 | 数据节点 | 读 |
slave2 | 192.168.226.40 | 40 | 数据节点 | 读 |
[root@server1 ~]# hostnamectl set-hostname mha
[root@server1 ~]# su
[root@server2 ~]# hostnamectl set-hostname master
[root@server2 ~]# su
[root@server3 ~]# hostnamectl set-hostname slave1
[root@server3 ~]# su
[root@server4 ~]# hostnamectl set-hostname slave2
[root@server4 ~]# su
systemctl stop NetworkManager
systemctl disable NetworkManager
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i '/SELINUX=enforcing/cSELINUX=disabled' /etc/selinux/config
cat >> /etc/hosts <.168.226.10 mha
192.168.226.20 master
192.168.226.30 slave1
192.168.226.40 slave2
EOF
yum install -y epel-release.noarch
yum install -y rsync.x86_64
yum install -y lrzsz
rz
#传入mysql软件包(mha节点可以省略)
tar -xzf mysql-5.7.31-linux-glibc2.12-x86_64.tar.gz
#安装依赖库
yum -y install perl-DBD-MySQL \
perl-Config-Tiny \
perl-Time-HiRes \
perl-Mail-Sender \
perl-Mail-Sendmail \
perl-MIME-Base32 \
perl-MIME-Charset \
perl-MIME-EncWords \
perl-Params-Classify \
perl-Params-Validate.x86_64 \
perl-Log-Dispatch \
perl-Parallel-ForkManager \
net-tools
ntpdate cn.ntp.org.cn
[root@master ~]# vim mysql.sh
#!/bin/bash
#Desc='此脚本用于安装MySQL软件'
#Usage=source mysql.sh
yum install -y libaio
rm -rf /usr/local/mysql
mv mysql-5.7.31-linux-glibc2.12-x86_64 /usr/local/mysql
useradd -r -s /sbin/nologin mysql
rm -rf /etc/my.cnf
cd /usr/local/mysql
mkdir mysql-files
chown mysql:mysql mysql-files
chmod 750 mysql-files
bin/mysqld --initialize --user=mysql --basedir=/usr/local/mysql &> /root/password.txt
bin/mysql_ssl_rsa_setup --datadir=/usr/local/mysql/data
cp support-files/mysql.server /etc/init.d/mysqld
service mysqld start
echo 'export PATH=$PATH:/usr/local/mysql/bin' >> /etc/profile
source /etc/profile
[root@master ~]# source mysql.sh
[root@master mysql]# cat /root/password.txt |grep password
2021-09-30T12:46:56.088480Z 1 [Note] A temporary password is generated for root@localhost: z(Avfsdgd1/3
[root@master mysql]# mysql_secure_installation
[root@master mysql]# cd /usr/local/mysql/
[root@master mysql]# vim my.cnf
[mysqld]
basedir=/usr/local/mysql
datadir=/usr/local/mysql/data
socket=/tmp/mysql.sock
port=3306
log-error=/usr/local/mysql/data/mysql.err
log-bin=/usr/local/mysql/data/binlog
server-id=20
character_set_server=utf8mb4
gtid-mode=on
log-slave-updates=1
enforce-gtid-consistency
[root@master mysql]# service mysqld start
[root@slave1 ~]# vim mysql.sh
#!/bin/bash
#Desc='此脚本用于slave上安装MySQL软件'
#Usage=source mysql.sh
yum install -y libaio
rm -rf /usr/local/mysql
mv mysql-5.7.31-linux-glibc2.12-x86_64 /usr/local/mysql
useradd -r -s /sbin/nologin mysql
rm -rf /etc/my.cnf
cd /usr/local/mysql
mkdir mysql-files
chown mysql:mysql mysql-files
chmod 750 mysql-files
#bin/mysqld --initialize --user=mysql --basedir=/usr/local/mysql &> /root/password.txt
#bin/mysql_ssl_rsa_setup --datadir=/usr/local/mysql/data
cp support-files/mysql.server /etc/init.d/mysqld
#service mysqld start
echo 'export PATH=$PATH:/usr/local/mysql/bin' >> /etc/profile
source /etc/profile
[root@slave1 ~]# source mysql.sh
[root@slave1 mysql]# cd /usr/local/mysql/
[root@slave1 mysql]# vim my.cnf
[mysqld]
basedir=/usr/local/mysql
datadir=/usr/local/mysql/data
socket=/tmp/mysql.sock
port=3306
log-error=/usr/local/mysql/data/mysql.err
#log-bin=/usr/local/mysql/data/binlog
relay-log=/usr/local/mysql/data/relaylog
server-id=30
character_set_server=utf8mb4
gtid-mode=on
log-slave-updates=1
enforce-gtid-consistency
skip-slave-start
log-bin=/usr/local/mysql/data/binlog
在master上操作
[root@master mysql]# service mysqld stop
[root@master mysql]# rm -rf /usr/local/mysql/data/auto.cnf
[root@master mysql]# rsync -av /usr/local/mysql/data [email protected]:/usr/local/mysql/
service mysqld start
在master中操作
[root@master mysql]# mysql -uroot -p123
mysql> create user 'slave'@'192.168.226.%' identified by '123';
mysql> grant replication slave on *.* to 'slave'@'192.168.226.%';
mysql> flush privileges;
创建一个mha账号,用于后期MHA监控主从同步状态
mysql> create user 'mha'@'192.168.226.%' identified by '123';
mysql> grant all privileges on *.* to 'mha'@'192.168.226.%';
mysql> flush privileges;
在slave中操作
[root@slave1 mysql]# mysql -uroot -p123
mysql> change master to
master_host='192.168.226.20',
master_user='slave',
master_password='123',
master_port=3306,
master_auto_position=1;
mysql> start slave;
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.226.20
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000003
Read_Master_Log_Pos: 1387
Relay_Log_File: relaylog.000002
Relay_Log_Pos: 1594
Relay_Master_Log_File: binlog.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
rz
#传入提前下载的mha4mysql-node-0.57-0.el7.noarch.rpm
yum install -y mha4mysql-node-0.57-0.el7.noarch.rpm
[root@mha mha]# rz
#传入提前下载的mha4mysql-manager-0.57-0.el7.noarch.rpm
[root@mha mha]# yum install -y mha4mysql-manager-0.57-0.el7.noarch.rpm
创建admin账号
useradd admin
echo 123|passwd --stdin admin
su - admin
[admin@mha ~]$ ssh-keygen
[admin@mha ~]$ for i in 20 30 40;do ssh-copy-id admin@192.168.226.$i;done
[admin@master ~]$ ssh-keygen
[admin@master ~]$ for i in 10 30 40;do ssh-copy-id admin@192.168.226.$i;done
[admin@slave1 ~]$ ssh-keygen
[admin@slave1 ~]$ for i in 10 20 40;do ssh-copy-id admin@192.168.226.$i;done
[admin@slave2 ~]$ ssh-keygen
[admin@slave2 ~]$ for i in 10 20 30;do ssh-copy-id admin@192.168.226.$i;done
上述操作有些麻烦,可直接拷贝.ssh目录(公钥和私钥),就能实现互信的目的
[admin@mha ~]$ ssh-keygen [admin@mha ~]$ cd .ssh/ [admin@mha .ssh]$ mv id_rsa.pub authorized_keys [admin@mha .ssh]$ for i in 10 20 30;do scp -r ../.ssh/ 192.168.226.$i:~/;done
[admin@master ~]$ su - root
[root@master ~]# vim /etc/sudoers.d/admin
User_Alias MYSQL_USERS = admin
Runas_Alias MYSQL_RUNAS = root
Cmnd_Alias MYSQL_CMNDS = /sbin/ifconfig,/sbin/arping
MYSQL_USERS ALL = (MYSQL_RUNAS) NOPASSWD: MYSQL_CMNDS
#选项说明:
User_Alias 表示具有sudo权限的用户列表; Host_Alias表示主机的列表
Runas_Alias 表示用户以什么身份登录
Cmnd_Alias 表示允许执行的命令列表(命令需要使用完整路径)
把该权限分发给从服务器,当故障发生时,从服务器也可以自己设置VIP
[root@master ~]# for i in 30 40;do scp /etc/sudoers.d/admin 192.168.226.$i:/etc/sudoers.d/;done
测试master上的admin是否可以挂载VIP
[root@master ~]# su - admin
[admin@master ~]$ sudo /sbin/ifconfig ens33:1 192.168.226.100 broadcast 192.168.226.255 netmask 255.255.255.0
[admin@master ~]$ ip a
1: lo: ,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: ,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:60:de:7a brd ff:ff:ff:ff:ff:ff
inet 192.168.226.20/24 brd 192.168.226.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 192.168.226.100/24 brd 192.168.226.255 scope global secondary ens33:1
valid_lft forever preferred_lft forever
inet6 fe80::1b78:bfb3:4567:b45c/64 scope link noprefixroute
valid_lft forever preferred_lft forever
创建工作目录
[admin@mha ~]$ su - root
[root@mha ~]# mkdir /etc/mha
[root@mha ~]# mkdir -p /data/mha/masterha/app1
[root@mha ~]# chown -R admin.admin /data/mha
编写配置文件
[root@mha ~]# vim /etc/mha/app1.conf
[server default]
# 设置监控用户和密码
user=mha
password=123
# 设置复制环境中的复制用户和密码
repl_user=slave
repl_password=123
# 设置ssh的登录用户名
ssh_user=admin
# 设置监控主库,发送ping包的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行failover
ping_interval=3
# 设置mha的工作目录
manager_workdir=/data/mha/masterha/app1
# 设置mysql master保存binlog的目录,以便MHA可以找到master的二进制日志
master_binlog_dir=/usr/local/mysql/data
# 设置master的pid文件
master_pid_file=/usr/local/mysql/data/master.pid
# 设置mysql master在发生切换时保存binlog的目录(在mysql master上创建这个目录)
remote_workdir=/data/mysql/mha
# 设置mha日志文件
manager_log=/data/mha/masterha/app1/app1-3306.log
# MHA到master的监控之间出现问题,MHA Manager将会尝试从slave1和slave2登录到master上
secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.226.30 -s 192.168.226.40 --user=admin --port=22 --master_host=192.168.226.20 --master_port=3306
# 设置自动failover时候的切换脚本(故障发生时,自动挂载VIP到SLAVE1或SLAVE2)
master_ip_failover_script="/etc/mha/master_ip_failover.sh 192.168.226.100 1"
# 设置手动切换时候的切换脚本
#master_ip_online_change_script="/etc/mha/master_ip_online_change.sh 192.168.226.100 1"
# 设置故障发生后关闭故障主机脚本
# shutdown_script="/etc/mha/power_manager"
[server1]
hostname=192.168.226.20
port= 3306
candidate_master=1
[server2]
hostname=192.168.226.30
port= 3306
candidate_master=1
[server3]
hostname=192.168.226.40
port= 3306
candidate_master=1
上传脚本
[root@mha mha]# rz
#传入准备好的脚本
[root@mha mha]# ls
app1.conf master_ip_failover.sh
[root@mha mha]# vim master_ip_failover.sh
注意:脚本内容中要修改网卡和连接用户为
my $interface = 'ens33'; 网卡名(38行)
my $ssh_user = "admin"; 用户名(110行)
[root@mha mha]# chmod +x /etc/mha/master_ip_failover.sh
检查ssh互信
[root@mha mha]# su - admin
[admin@mha ~]$ masterha_check_ssh --conf=/etc/mha/app1.conf
Fri Oct 1 00:05:11 2021 - [info] All SSH connection tests passed successfully.
检测MySQL主从集群状态
[admin@mha ~]$ masterha_check_repl --conf=/etc/mha/app1.conf
Fri Oct 1 00:46:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln265] Checking slave sta on 192.168.226.20(192.168.226.20:3306). err=Got error when executing SHOW SLAVE STATUS. Access denied; at least one of) the SUPER, REPLICATION CLIENT privilege(s) for this operation
MySQL Replication Health is NOT OK!
报错: Checking slave sta on 192.168.226.20(192.168.226.20:3306). err=Got error when executing SHOW SLAVE STATUS. Access denied; at least one of) the SUPER, REPLICATION
原因:经排查知,给mha授权时出现了问题
解决:grant all privileges on . to ‘mha’@‘192.168.226.%’;
[admin@mha ~]$ masterha_check_repl --conf=/etc/mha/app1.conf
MySQL Replication Health is OK.
检查mha状态
[admin@mha ~]$ masterha_check_status --conf=/etc/mha/app1.conf
app1 is stopped(2:NOT_RUNNING).
#将mha在后台启动
[admin@mha ~]$ nohup masterha_manager --conf=/etc/mha/app1.conf --remove_dead_master_conf --ignore_last_failover &
[admin@mha ~]$ jobs
[1]+ 运行中 nohup masterha_manager --conf=/etc/mha/app1.conf --remove_dead_master_conf --ignore_last_failover &
[admin@mha ~]$ masterha_check_status --conf=/etc/mha/app1.conf
app1 (pid:12758) is running(0:PING_OK), master:192.168.226.20
在master上安装测试工具
[admin@master ~]$ su - root
[root@master ~]# yum -y install sysbench
插入测试数据
[root@master ~]# mysql -uroot -p123
mysql> create database test charset utf8mb4;
mysql> grant all on *.* to 'mha'@'localhost' identified by '123';
mysql> flush privileges;
mysql> quit
[root@master ~]# sysbench /usr/share/sysbench/oltp_read_only.lua \
--mysql-host=192.168.226.20 --mysql-port=3306 --mysql-user=mha \
--mysql-password=123 --mysql-socket=/tmp/mysql.sock \
--mysql-db=test --db-driver=mysql --tables=1 \
--table-size=100000 --report-interval=10 --threads=128 --time=120 prepare
mysql> use test;
mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 100000 |
+----------+
在mha上监控日志
[root@mha ~]# tail -f /data/mha/masterha/app1/app1-3306.log
模拟故障
[root@master ~]# service mysqld stop
mha上的监控显示
#####VIP漂移
Enabling the VIP - 192.168.226.100 on the new master - 192.168.226.30
我们信任您已经从系统管理员那里了解了日常注意事项。
总结起来无外乎这三点:
#1) 尊重别人的隐私。
#2) 输入前要先考虑(后果和风险)。
#3) 权力越大,责任越大。
sudo: 没有终端存在,且未指定 askpass 程序
=======sudo /sbin/ifconfig ens33:1 192.168.226.100 broadcast netmask && sudo /sbin/arping -f -q -c 5 -w 5 -I ens33 -s 192.168.226.100 -U =================
#####故障切换
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.226.20(192.168.226.20:3306)
Selected 192.168.226.30(192.168.226.30:3306) as a new master.
192.168.226.30(192.168.226.30:3306): OK: Applying all logs succeeded.
192.168.226.30(192.168.226.30:3306): OK: Activated master IP address.
192.168.226.40(192.168.226.40:3306): OK: Slave started, replicating from 192.168.226.30(192.168.226.30:3306)
192.168.226.30(192.168.226.30:3306): Resetting slave info succeeded.
Master failover to 192.168.226.30(192.168.226.30:3306) completed successfully.
#自动选出192.168.226.30作为主服务器
#查看192.168.226.40(slave2)Master_Host: 192.168.226.30
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.226.30
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000002
Read_Master_Log_Pos: 19092701
Relay_Log_File: relaylog.000002
Relay_Log_Pos: 405
Relay_Master_Log_File: binlog.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes