Building a MySQL 8 MHA High-Availability Cluster

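Contents

  • Preface:
        • Environment:
        • Machine layout
  • I. Standalone MySQL 8 installation
        • 1. Install:
        • 2. Edit the configuration file:
        • 3. Create the log file and directories:
        • 4. Configure the environment variable
        • 5. Create the startup script:
        • 6. Start the service:
        • 7. Log in to MySQL with the initial password and change it:
        • 8. Notes
  • II. MySQL 8 master-slave replication
        • 1. Run on every master and slave, since any machine may become the master:
        • 2. Edit my.cnf on every master and slave:
        • 3. Restart all three machines, running on each:
        • 4. In the mysql client on slave01, run:
        • 5. Configure SSH key login between all machines:
        • 6. Problems encountered
  • III. MySQL 8 high-availability cluster based on MHA
        • 1. Install the MHA software
        • 2. Install the MHA manager (monitor), only needed on the 77 machine:
        • 3. Configure the manager node on the manager machine:
        • 4. Write the /data1/mysql_mha/master_ip_failover script referenced in the configuration file:
        • 5. Make the script executable:
        • 6. Create the MHA working directory on all other nodes:
        • 7. On the master, create the mha user used to access the database nodes:
        • 8. Run the checks (passwordless SSH and replication) on the manager:
        • 9. If the checks report no errors, start MHA on the manager:
        • 10. Check for the process with ps:
        • 11. On first start, set the VIP on the master machine:
        • 12. Configure email alerts
        • 13. Configure WeCom (enterprise WeChat) alerts
        • 14. Steps to start MHA again after a failover
  • IV. Troubleshooting
        • 1. Problem when running masterha_check_ssh --conf=/etc/mha/mysql_mha.cnf
        • 2. Problem when running masterha_check_repl --conf=/etc/mha/mysql_mha.cnf
        • 3. MySQL: how to remove replication
        • 4. Slave_SQL_Running: NO when re-establishing replication
        • 5. Slave_IO_Running: NO when re-establishing replication
        • 6. The following error appears when MHA fails over after the master goes down
        • 7. MHA stops automatically after a failover
        • 8. Bringing a new node online: notes on recovering after a failover
        • 9. Error when the master goes down after MHA is started a second time
        • 10. VIP configuration problem
        • 11. Fixing VIP drift when NIC names differ
        • 12. masterha_check_repl --conf=/etc/mha/mysql_mha.cnf reports an error after two nodes switch roles several times
        • 13. Start MySQL at boot on CentOS 7
        • 14. The mysql_mha.failover.complete file
        • 15. If MHA was started with --remove_dead_master_conf, the old master's entry is removed from the configuration
        • 16. The following error appears when running masterha_check_repl --conf=/etc/mha/mysql_mha.cnf
        • 17. The following error appears when running masterha_check_repl --conf=/etc/mha/mysql_mha.cnf
        • 18. Error when starting MySQL on 10.6.119.241
        • 19. When yum is unusable, copy the /etc/yum.repos.d directory from a similar machine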

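Preface:

Most articles online just copy from one another and are published without ever being tested. What follows is the setup process I actually went through.

Environment:

OS: CentOS 7
MySQL version: MySQL 8
MHA version: 0.58

Machine layout

Node IP         Role             Machine type
10.8.40.77      master           physical machine
10.8.40.68      standby master   physical machine
10.6.119.241    slave            VM (log in as sysadm, then switch to root)
10.6.110.170    MHA manager      VM (log in as sysadm, then switch to root)
10.8.40.79      VIP              virtual IP

Reference blog: https://blog.csdn.net/qq_37369726/article/details/104462513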

I. Standalone MySQL 8 installation

1. Install:

mkdir /data1
cd /data1
mkdir mysql8
cd /data1/mysql8
# Download URL. MySQL 8 seems to ship as tar.xz rather than tar.gz
wget http://mirrors.163.com/mysql/Downloads/MySQL-8.0/mysql-8.0.23-linux-glibc2.12-x86_64.tar.xz

# Extract
xz -d mysql-8.0.23-linux-glibc2.12-x86_64.tar.xz
tar -xvf mysql-8.0.23-linux-glibc2.12-x86_64.tar

mv mysql-8.0.23-linux-glibc2.12-x86_64 /usr/local/mysql8/
cd /usr/local/mysql8/

# Create the directory and the user (-p because /data1/mysql8 already exists from the download step)
mkdir -p /data1/mysql8

# Group
groupadd mysql
# User (in the mysql group)
useradd -g mysql mysql
# Give the mysql user ownership of the data directory
chown  -R  mysql:mysql /data1/mysql8/
# Initialize. MySQL 8 generates a random password; be sure to write it down
/usr/local/mysql8/bin/mysqld --initialize --user=mysql --basedir=/usr/local/mysql8/ --datadir=/data1/mysql8/data

# Note: the --datadir directory must be empty, otherwise initialization fails with:
2021-02-04T06:14:21.041598Z 0 [ERROR] [MY-010457] [Server] --initialize specified but the data directory has files in it. Aborting.
2021-02-04T06:14:21.041614Z 0 [ERROR] [MY-013236] [Server] The designated data directory /data1/mysql8/ is unusable. You can remove all files that the server added to it.
2021-02-04T06:14:21.041758Z 0 [ERROR] [MY-010119] [Server] Aborting

# A temporary password is generated; note it down
A temporary password is generated for root@localhost: *Pj>af)-P0mA
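
Since the temporary password is easy to lose in the initialization output, it can be pulled out of a saved log instead of copied by hand. The following is only a minimal sketch: the function name extract_temp_password is hypothetical, and it assumes the log line has exactly the wording shown above.

```shell
# Print the temporary root password found in "mysqld --initialize" output.
# Reads log text on stdin; returns 1 (and prints nothing) if no such line exists.
extract_temp_password() {
    local pw
    # Strip everything up to and including the fixed marker text; keep the last match.
    pw=$(sed -n 's/.*A temporary password is generated for root@localhost: //p' | tail -n 1)
    [ -n "$pw" ] || return 1
    printf '%s\n' "$pw"
}
```

For example, if the initialization output was saved to a file (the file name here is an assumption), `extract_temp_password < init.log` prints only the password.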

2. Edit the configuration file:

# Location: /etc/my.cnf
vim /etc/my.cnf
# Contents
[mysqld]
datadir=/data1/mysql8/data
socket=/var/lib/mysql/mysql.sock
port=3307
user=mysql
default_authentication_plugin=mysql_native_password
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
[mysqld_safe]
log-error=/data1/mysql8/log/mariadb/mysql8.log
pid-file=/data1/mysql8/run/mariadb/mysql8.pid

[client]
default-character-set=utf8
socket=/var/lib/mysql/mysql.sock

[mysql]
default-character-set=utf8
socket=/var/lib/mysql/mysql.sock

#
# include all files from the config directory
#
!includedir /etc/my.cnf.d

3. Create the log file and directories:

# This line is not needed: cp /etc/my.cnf /usr/local/mysql8/
mkdir /var/lib/mysql
chown mysql:mysql -R /var/lib/mysql
mkdir -p /data1/mysql8/log/mariadb/
touch /data1/mysql8/log/mariadb/mysql8.log
chown mysql:mysql -R /data1/mysql8/log/mariadb/

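4. Configure the environment variable

# Single quotes keep $PATH literal, so it is expanded at login rather than when this line is written
echo 'export PATH=$PATH:/usr/local/mysql8/bin'  >>  /etc/profile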
source /etc/profile
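
Running the echo above more than once appends the same line again each time. A small guard avoids that. This is just a sketch; the helper name add_path_once is made up for illustration.

```shell
# Append a PATH export line to a profile file only if it is not there yet.
# $1 = profile file, $2 = directory to add to PATH
add_path_once() {
    local profile dir line
    profile=$1
    dir=$2
    line="export PATH=\$PATH:$dir"
    # grep -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
}
```

Usage would be `add_path_once /etc/profile /usr/local/mysql8/bin`, followed by `source /etc/profile` as before.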

5. Create the startup script:

cp /usr/local/mysql8/support-files/mysql.server /etc/init.d/mysqld8
vim /etc/init.d/mysqld8
# Change
basedir=/usr/local/mysql8
datadir=/data1/mysql8/data
# The lines below do not need to be uncommented or changed
#conf=$basedir/my.cnf
# (they would append the my.cnf path after $bindir/mysqld_safe)
#$bindir/mysqld_safe --defaults-file="$conf" --datadir="$datadir"

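6. Start the service:

# There are two ways to start
# 1
/etc/init.d/mysqld8 start
# 2: start with an explicit configuration file (this had problems; use the first way)
 /usr/local/mysql8/bin/mysqld_safe  --defaults-file=/usr/local/mysql8/my.cnf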

7. Log in to MySQL with the initial password and change it:

mysql -uroot -p"lfsjj9)-"
# Change the password
ALTER USER 'root'@'localhost'  IDENTIFIED WITH mysql_native_password BY '123456';
FLUSH PRIVILEGES;
# The user table lives in the mysql database
use mysql;
select host, user, authentication_string, plugin from user;
update user set host='%',plugin='mysql_native_password' where user='root';
FLUSH PRIVILEGES;

8. Notes

# Start mysql
/etc/init.d/mysqld8 start
# Restart mysql
/etc/init.d/mysqld8 restart
# Check mysql status
/etc/rc.d/init.d/mysqld8 status
ps aux |grep mysql
# Problem 1: the password no longer works
	1. Add skip-grant-tables=1 to my.cnf so that login skips the privilege check
	2. update user set authentication_string=MD5('root') where user='root' and Host = 'localhost';
	or: update user set authentication_string=SHA1('root') where user='root' and Host = 'localhost';
	FLUSH PRIVILEGES;
	select host, user, authentication_string, plugin from user;
	Note: an MD5() or SHA1() digest is not a valid mysql_native_password hash, so these updates alone do not give a usable password; the sequence below is the one to rely on.

	If ALTER USER fails when run directly, run the statements in this order
 	update user set host='%',plugin='mysql_native_password',authentication_string='' where user='root';
  	FLUSH PRIVILEGES;
  	# host was set to '%' above, so the account is now 'root'@'%'
  	ALTER USER 'root'@'%'  IDENTIFIED WITH mysql_native_password BY '123456';

# Problem 2: Starting MySQL.2021-02-05T01:43:59.542284Z mysqld_safe Directory '/var/lib/mysql' for UNIX socket file don't exists.
	Solution: mkdir /var/lib/mysql
	
# Problem 3: Starting MySQL... ERROR! The server quit without updating PID file (/data1/mysql8/data/CT-DevOps-DB.pid).
	Solution: check the error log: Could not create unix socket lock file /var/lib/mysql/mysql.sock.lock.
 	This is caused by insufficient permissions; fix the ownership with chown -R mysql:mysql /var/lib/mysql/
 	grant all privileges on *.* to 'root'@'%' ;
 	
# Problem 4: Navicat cannot connect
	Solution: open the listening port: /sbin/iptables -I INPUT -p tcp --dport 3307 -j ACCEPT
	iptables-save > /etc/sysconfig/iptables    (iptables-save dumps the current rules; the redirect writes them to the file so they persist)

=======================================================================================

II. MySQL 8 master-slave replication

1. Run on every master and slave, since any machine may become the master:

create user 'repl'@'%' identified with mysql_native_password by '123456';
grant replication slave on *.* to 'repl'@'%';
flush privileges;

2. Edit my.cnf on every master and slave:

[mysqld]
# server_id must be different on every machine; it can be derived from the IP
server_id=160   
log_bin=mysql_bin 
relay_log=relay_bin
log_slave_updates=on
# Enable GTID mode
gtid_mode=ON   
enforce_gtid_consistency=ON
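
One way to keep server_id unique without tracking it by hand is to derive it from the machine's IP. The sketch below uses the last two octets; that choice and the function name server_id_from_ip are assumptions for illustration, not something MySQL requires.

```shell
# Derive a numeric server_id from the last two octets of an IPv4 address.
# $1 = dotted IPv4 address, e.g. 10.8.40.77
server_id_from_ip() {
    local ip o3 o4
    ip=$1
    o4=${ip##*.}    # last octet
    ip=${ip%.*}     # drop the last octet
    o3=${ip##*.}    # third octet
    echo $(( o3 * 256 + o4 ))
}
```

With the addresses used in this article, `server_id_from_ip 10.8.40.77` gives 10317 and `server_id_from_ip 10.6.119.241` gives 30705, so the nodes cannot collide as long as their last two octets differ.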

3. Restart all three machines, running on each:

/etc/init.d/mysqld8 restart

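4. In the mysql client on slave01, run:

stop slave;
# Option A: GTID auto-positioning (gtid_mode=ON was enabled above)
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456',master_auto_position=1;
# Option B: binlog file and position. Turn auto-positioning off first, then give the File and Position reported by "show master status" on the master
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456',master_auto_position=0;
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456', master_log_file='mysql_bin.000020',master_log_pos=236;
start slave;
# Check whether replication is working
show slave status\G
# Two "Yes" values mean success
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

On slave2, repeat exactly the same steps.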
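
The "two Yes values" check can also be scripted, which is handy when it has to be repeated on several slaves. This is a minimal sketch that only inspects text in the format shown above; the function name replication_ok is hypothetical.

```shell
# Read the text of "show slave status\G" on stdin and succeed only if both
# replication threads report "Yes".
replication_ok() {
    local status io sql
    status=$(cat)
    # Field separator is ": "; the anchored patterns skip Slave_SQL_Running_State.
    io=$(printf '%s\n' "$status"  | awk -F': *' '/^[ \t]*Slave_IO_Running:/  {print $2}')
    sql=$(printf '%s\n' "$status" | awk -F': *' '/^[ \t]*Slave_SQL_Running:/ {print $2}')
    [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]
}
```

A possible use, assuming the mysql client can log in non-interactively: `mysql -uroot -p -e 'show slave status\G' | replication_ok && echo "replication healthy"`.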

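5. Configure SSH key login between all machines:

# Example on one of the masters
# Generate the key pair
ssh-keygen

# Set up passwordless login
cd /root/.ssh/
ssh-copy-id -i id_rsa [email protected]
# The 241 machine is special: you must log in as sysadm first and then switch to root, so work from /home/sysadm/.ssh instead
ssh-copy-id -i id_rsa [email protected]
ssh-copy-id -i id_rsa [email protected]

Do the same on the other machines.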

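6. Problems encountered

Problem 1: ERROR 1777 (HY000): CHANGE MASTER TO MASTER_AUTO_POSITION = 1 cannot be executed because @@GLOBAL.GTID_MODE = OFF

Solution:
1. On every master and slave instance, run:
		set global   ENFORCE_GTID_CONSISTENCY   = WARN;
		## This is a warning-only mode: statements that violate GTID consistency are logged as warnings, so check whether any are still being run
2. If there are no warnings, run on every instance:
	set global ENFORCE_GTID_CONSISTENCY = ON;

3. On every instance, run:
	set global GTID_MODE = OFF_PERMISSIVE;

4. On every instance, run:
	set global GTID_MODE = ON_PERMISSIVE;

5. On every master and slave instance, check whether any transactions are still unfinished:
	SHOW STATUS LIKE 'ONGOING_ANONYMOUS_TRANSACTION_COUNT';
	# The value must be 0

6. Check the slave's binlog position; if the binlog has not been fully applied yet, wait
	show slave status\G

7. On every instance, run:
	set global GTID_MODE = ON;

8. On the slaves, run:
	stop slave;
	CHANGE MASTER TO MASTER_AUTO_POSITION = 1;
	start slave;

The cluster is now running in GTID mode.
Finally, remember to update my.cnf:
	gtid_mode = on
	enforce_gtid_consistency = on

Problem 2: [root@CT-DevOps-DB sysadm]# mysql -u root -p bash: mysql: command not found…

Solution:
# Add a symlink
ln -s  /usr/local/mysql8/bin/mysql  /usr/bin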

=======================================================================================

III. MySQL 8 high-availability cluster based on MHA

1. Install the MHA software

# Download URL:
	https://github.com/yoshinorim/mha4mysql-node/releases/tag/v0.58
# Note: MHA node must be installed on every node
# 1. Install the dependencies first:
	yum -y install epel-release
	yum -y install perl-DBD-MySQL perl-DBI ncftp
# 2. Install MHA node:
	rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm

2. Install the MHA manager (monitor), only needed on the 77 machine:

# Download URL
	https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
# After downloading, install the dependencies first
yum -y install epel-release
yum -y install perl-Config-Tiny perl-Time-HiRes perl-Parallel-ForkManager perl-Log-Dispatch perl-DBD-MySQL ncftp
# If some dependencies cannot be found, reinstall the epel package
yum -y remove epel-release
yum -y install epel-release

# Install the downloaded manager package
 rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
######################################################################################
# Offline installation of epel-release
Download the latest epel-release rpm from
http://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/e/
Install epel-release rpm:
# rpm -Uvh epel-release*rpm
	
# Offline installation of perl-Config-Tiny perl-Time-HiRes perl-Parallel-ForkManager perl-Log-Dispatch perl-DBD-MySQL ncftp
Download the rpm files for these packages from
http://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/p/

3. Configure the manager node on the manager machine:

# Create the directories
mkdir /etc/mha
mkdir /data1/mysql_mha
# Write the configuration file
vim /etc/mha/mysql_mha.cnf
# Add the following

Contents of mysql_mha.cnf:

[server default]
# Account and password MHA uses to access the databases
user=mha
password=123456
port=3307
# MHA working directory
manager_workdir=/data1/mysql_mha
# Path of the manager log
manager_log=/data1/mysql_mha/manager.log
# Directory where the master stores its binlog files (with log_bin=mysql_bin the default is /var/lib/mysql)
master_binlog_dir=/data1/mysql8/data
# MHA working directory on the remote nodes
remote_workdir=/data1/mysql_mha
# MySQL user and password used for replication
repl_user=repl
repl_password=123456
# Interval between health checks
ping_interval=1
# Script that moves the virtual IP to the new master after a failover
master_ip_failover_script=/data1/mysql_mha/master_ip_failover
# Nodes used to double-check the master's state. Do not list the master's own IP here, otherwise failover cannot happen when the master loses network or power
secondary_check_script=/usr/bin/masterha_secondary_check -s 10.8.40.68 -s 10.6.119.241
# Sends the email alert when a failover happens
report_script=/data1/mysql_mha/send_mail
[server1]
hostname=10.8.40.77
port=3307
ssh_user=root
candidate_master=1
#check_repl_delay=0
[server2]
hostname=10.8.40.68
port=3307
ssh_user=root
candidate_master=1
#check_repl_delay=0

# Note: when 241 and the MHA manager are deployed on the same machine, comment out the block below, otherwise the checks fail
[server3]
hostname=10.6.119.241
ssh_user=sysadm
port=3307
no_master=1
ignore_fail=1

4. Write the /data1/mysql_mha/master_ip_failover script referenced in the configuration file:

Contents of the master_ip_failover script:

#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command, $orig_master_host, $orig_master_ip,$ssh_user,
    $orig_master_port, $new_master_host, $new_master_ip,$new_master_port,
    $orig_master_ssh_port,$new_master_ssh_port,$new_master_user,$new_master_password
);

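# The virtual IP defined here must be in the same subnet as your cluster, otherwise it will not work
my $vip = '10.8.40.79/24';
my $key = '1';
# Change the NIC name "eno1" to match your machines
# If NIC names differ between machines, either adapt the script or rename the NICs to a common name
# In my case I renamed the NICs to a common name
my $ssh_start_vip = "sudo /sbin/ifconfig eno1:$key $vip";
my $ssh_stop_vip = "sudo /sbin/ifconfig eno1:$key down";
# arping needs the interface that carries the VIP and the bare address without the /24 suffix
my $vip_addr = (split m{/}, $vip)[0];
my $ssh_Bcast_arp= "sudo /sbin/arping -I eno1 -c 3 -A $vip_addr";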

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'orig_master_ssh_port=i' => \$orig_master_ssh_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
    'new_master_ssh_port=i' => \$new_master_ssh_port,
    'new_master_user=s' => \$new_master_user,
    'new_master_password=s' => \$new_master_password

);

exit &main();

sub main {
    $ssh_user = defined $ssh_user ? $ssh_user : 'root';
    print "\n\nIN SCRIPT TEST====$ssh_user|$ssh_stop_vip==$ssh_user|$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            &start_arp();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub start_arp() {
    `ssh $ssh_user\@$new_master_host \" $ssh_Bcast_arp \"`;
}
sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --ssh_user=user --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
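
To see what the script will actually run on the remote hosts before trusting it with a failover, the three VIP commands can be printed for inspection. This is a dry-run sketch only: the function name vip_commands is made up, and it assumes arping expects a bare address while ifconfig accepts the CIDR form.

```shell
# Print the commands a failover script would run for a VIP, without running them.
# $1 = interface (e.g. eno1), $2 = alias key (e.g. 1), $3 = VIP in CIDR form
vip_commands() {
    local dev key cidr addr
    dev=$1
    key=$2
    cidr=$3
    addr=${cidr%/*}    # bare address without the prefix length
    printf 'start: /sbin/ifconfig %s:%s %s\n' "$dev" "$key" "$cidr"
    printf 'stop:  /sbin/ifconfig %s:%s down\n' "$dev" "$key"
    printf 'arp:   /sbin/arping -I %s -c 3 -A %s\n' "$dev" "$addr"
}
```

Calling `vip_commands eno1 1 10.8.40.79/24` lists the start, stop and gratuitous-ARP commands, which makes a mismatched interface name or address easy to spot.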


5. Make the script executable:

chmod a+x /data1/mysql_mha/master_ip_failover

6. Create the MHA working directory on all other nodes:

mkdir /data1/mysql_mha

7. On the master, create the mha user used to access the database nodes:

mysql -uroot -p'123456' 
create user 'mha'@'%' identified with mysql_native_password by '123456';
grant all privileges on *.* to 'mha'@'%';
flush privileges;

8. Run the checks (passwordless SSH and replication) on the manager:

masterha_check_ssh --conf=/etc/mha/mysql_mha.cnf
masterha_check_repl --conf=/etc/mha/mysql_mha.cnf

9. If the checks report no errors, start MHA on the manager:

nohup masterha_manager --conf=/etc/mha/mysql_mha.cnf &> /data1/mysql_mha/manager.log &
# Stop
masterha_stop --conf=/etc/mha/mysql_mha.cnf
# Check the manager and master status
masterha_check_status --conf=/etc/mha/mysql_mha.cnf

10. Check for the process with ps:

ps aux |grep masterha_manager
root      10819  0.2  4.5 299648 22032 pts/0    S    19:23   0:00 perl /usr/bin/masterha_manager --conf=/etc/mha/mysql_mha.cnf

11. On first start, set the VIP on the master machine:

ifconfig eno1:1 10.8.40.79/24

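12. Configure email alerts

The mail scripts found online all need an SMTP authorization code. Our company does not have one, so I wrote my own mail script.

# Add the following to /etc/mha/mysql_mha.cnf
report_script=/data1/mysql_mha/send_mail

Contents of the send_mail script:

#!/bin/bash
# Log file of this script
LOGFILE="/data1/mysql_mha/email.log"
:>"$LOGFILE"
exec 1>"$LOGFILE"
exec 2>&1
SMTP_server='smtp.123.com'
username='[email protected]'
password='111*'
from_email_address='[email protected]'
to_email_address='***@***,***@***'

message_subject_utf8="MHA cluster master failover alert"

# Build the HTML body. The markup is a minimal reconstruction (<p> and <table>); adjust as needed.
HTML_PATH=html_path
echo "<p>" > $HTML_PATH
echo "The MHA cluster master has failed and a failover is in progress. Please investigate promptly!" >> $HTML_PATH
echo "</p>" >> $HTML_PATH
echo "<p>MHA cluster information:</p>" >> $HTML_PATH
echo "<table cellspacing=\"0\" width=\"700\">
<tr><th>Node</th><th>Role</th><th>Purpose</th></tr>
<tr><td>10.6.110.170</td><td>MHA manager</td><td>MHA monitoring node</td></tr>
<tr><td>10.8.40.77</td><td>master/master.bak</td><td>master or standby master</td></tr>
<tr><td>10.8.40.68</td><td>master/master.bak</td><td>master or standby master</td></tr>
<tr><td>10.6.119.241</td><td>slave</td><td>slave</td></tr>
<tr><td>10.8.40.79</td><td>VIP</td><td>virtual IP</td></tr>
</table>" >> $HTML_PATH
echo "<br>" >> $HTML_PATH
echo "<p>Detailed error log: 10.6.110.170:/data1/mysql_mha/manager.log</p>" >> $HTML_PATH
message_body_utf8=$(cat $HTML_PATH)
#message_body_utf8="The MySQL MHA cluster master has failed and a failover is in progress. Please investigate promptly!"
# Convert the subject to GB2312 so that a subject containing Chinese is not garbled in the received mail
# (the original message text was Chinese, hence these conversions)
message_subject_gb2312=`iconv -t GB2312 -f UTF-8 << EOF
$message_subject_utf8
EOF`
[ $? -eq 0 ] && message_subject="$message_subject_gb2312" || message_subject="$message_subject_utf8"
# Convert the body to GB2312 so that the received mail body is not garbled
message_body_gb2312=`iconv -t GB2312 -f UTF-8 << EOF
$message_body_utf8
EOF`
[ $? -eq 0 ] && message_body="$message_body_gb2312" || message_body="$message_body_utf8"
# Send the mail
sendEmail='/usr/bin/sendEmail'
set -x
$sendEmail -s "$SMTP_server" -xu "$username" -xp "$password" -f "$from_email_address" -t "$to_email_address" -u "$message_subject" -m "$message_body" -o message-content-type=html -o message-charset=gb2312
# The WeCom notification is triggered from here as well
sh /data1/mysql_mha/send_wechat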

13. Configure WeCom (enterprise WeChat) alerts

This calls the WeCom interface that our company already has in place.

/data1/mysql_mha/send_wechat

#!/bin/bash

curl -X POST -H 'Content-Type: application/json'  --data '{"content": "The MHA cluster master has failed and a failover is in progress. Please investigate promptly! Detailed error log: 10.6.110.170:/data1/mysql_mha/manager.log","users": "xxx"}'  http://ip:port/wechat/api/sendText > /data1/mysql_mha/wechat.log

14. Steps to start MHA again after a failover

#1.
On the 10.6.110.170 node, delete the file /data1/mysql_mha/mysql_mha.failover.complete

#2. Rejoin the failed node to the MySQL cluster
# master_host must be the current master; take the file and position from "show master status" on that master
stop slave;
change master to master_host='10.8.40.68', master_port=3307, master_user='repl', master_password='123456',master_auto_position=0;
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456', master_log_file='mysql_bin.000002',master_log_pos=156;
start slave;
show slave status\G
	
#3. In /etc/mha/mysql_mha.cnf, edit the line below: remove the current master and add the new slave
secondary_check_script=/usr/bin/masterha_secondary_check -s 10.8.40.68 -s 10.6.119.241
	
#4. Check SSH connectivity between the nodes
masterha_check_ssh --conf=/etc/mha/mysql_mha.cnf
	
#5. Check replication status
masterha_check_repl --conf=/etc/mha/mysql_mha.cnf
	
#6. Start MHA
nohup masterha_manager --conf=/etc/mha/mysql_mha.cnf &> /data1/mysql_mha/manager.log &
	
#7. Check MHA status
masterha_check_status --conf=/etc/mha/mysql_mha.cnf
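
Step #3 above is easy to get wrong after a few role switches, because the list must contain every node except the current master. The sketch below rebuilds that line from the hostname entries in the configuration file; the function name secondary_check_line is hypothetical, and it assumes each node is declared with a plain hostname=... line.

```shell
# Build the secondary_check_script line from an MHA config file, skipping
# the current master. $1 = config file, $2 = IP of the current master
secondary_check_line() {
    local conf master hosts h line
    conf=$1
    master=$2
    # Collect the values of all uncommented "hostname=" lines.
    hosts=$(sed -n 's/^[[:space:]]*hostname[[:space:]]*=[[:space:]]*//p' "$conf")
    line="secondary_check_script=/usr/bin/masterha_secondary_check"
    for h in $hosts; do
        [ "$h" = "$master" ] && continue
        line="$line -s $h"
    done
    printf '%s\n' "$line"
}
```

For instance, `secondary_check_line /etc/mha/mysql_mha.cnf 10.8.40.68` prints the line to paste into the configuration when 10.8.40.68 is the current master.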

=======================================================================================

IV. Troubleshooting

1. Problem when running masterha_check_ssh --conf=/etc/mha/mysql_mha.cnf

Fri Feb 19 14:41:24 2021 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63]
Fri Feb 19 14:41:23 2021 - [debug]  Connecting via SSH from [email protected](10.8.40.77:22) to [email protected](10.6.119.241:22)..
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Fri Feb 19 14:41:24 2021 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from [email protected](10.8.40.77:22) to [email protected](10.6.119.241:22) failed!
Fri Feb 19 14:41:25 2021 - [debug]

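Solution: the MHA manager and the slave were on the same machine, so comment out the last entry in /etc/mha/mysql_mha.cnf, that is, the [server3] block for the machine shared with the manager

	#[server3]
	#hostname=10.6.119.241
	# This node does not take part in master election
	#no_master=1

Problem 1 appeared again: what the sysadm user saw when running with --conf=/etc/mha/mysql_mha.cnf differed from what root saw.
Solution: switching from root to sysadm inside the same session left passwordless SSH broken. Opening a new session and setting up ssh-keygen again fixed it (this one cost me an afternoon).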

2. Problem when running masterha_check_repl --conf=/etc/mha/mysql_mha.cnf

Fri Feb 19 14:47:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln188] There is no alive server. We can't do failover
Fri Feb 19 14:47:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Fri Feb 19 14:47:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Fri Feb 19 14:47:07 2021 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!

Solution: edit the configuration file

[server1]
hostname=10.8.40.77
port=3307
# This node may take part in master election
candidate_master=1
[server2]
hostname=10.8.40.68
port=3307
candidate_master=1
check_repl_delay=0

3. MySQL: how to remove replication

stop slave;
reset slave all;

4. Slave_SQL_Running: NO when re-establishing replication

Solution: on the master, run
show master status;
	+------------------+----------+--------------+------------------+--------------------------------------------------------------------------------------+
	| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                    |
	+------------------+----------+--------------+------------------+--------------------------------------------------------------------------------------+
	| mysql_bin.000003 |     2216 |              |                  | 7d3cc0a4-6955-11eb-8bbd-0cc47a9cd887:1-9,
	
Note the File and Position values

On the slave, run (use the File and Position values just noted)
stop slave;
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456',master_auto_position=0;
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456',master_log_file='mysql_bin.000009',master_log_pos=156;
start slave;
show slave status\G;
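
Copying File and Position by hand is where typos creep into the statement. The sketch below reads the table printed by show master status and prints a matching change master statement. The function name change_master_from_status is hypothetical, and it assumes the usual pipe-delimited table layout shown above; remember that master_auto_position must already be 0 before this form is accepted.

```shell
# Read the table printed by "show master status" on stdin and print a
# CHANGE MASTER statement. $1 = host, $2 = port, $3 = user, $4 = password
change_master_from_status() {
    local host port user pass
    host=$1
    port=$2
    user=$3
    pass=$4
    awk -F'|' -v h="$host" -v p="$port" -v u="$user" -v pw="$pass" '
        # Data row: field 2 is the binlog file, field 3 the numeric position.
        $3 ~ /^ *[0-9]+ *$/ {
            file = $2; pos = $3
            gsub(/ /, "", file); gsub(/ /, "", pos)
            # \047 is a single quote
            printf "change master to master_host=\047%s\047, master_port=%s, master_user=\047%s\047, master_password=\047%s\047, master_log_file=\047%s\047, master_log_pos=%s;\n", h, p, u, pw, file, pos
            exit
        }'
}
```

A possible use: save the master's table output to a file (the name is an assumption) and run `change_master_from_status 10.8.40.77 3307 repl 123456 < master_status.txt`, then paste the printed statement on the slave.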

5. Slave_IO_Running: NO when re-establishing replication

Solution: look up the specific error in the error log, located in /data1/mysql8/log/mariadb/

6. The following error appears when MHA fails over after the master goes down

ATTENTION: You have logged onto a secured device. ONLY Authorized users can access.
	*Copyright(c) UIH All rights Reserved*
	Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
	Monitoring server 10.6.119.241 is NOT reachable!
	Mon Feb 22 14:31:33 2021 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
	Mon Feb 22 14:31:33 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
	Mon Feb 22 14:31:33 2021 - [warning] Connection failed 2 time(s)..
	Mon Feb 22 14:31:34 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
	Mon Feb 22 14:31:34 2021 - [warning] Connection failed 3 time(s)..
	Mon Feb 22 14:31:35 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
	Mon Feb 22 14:31:35 2021 - [warning] Connection failed 4 time(s)..
	Mon Feb 22 14:31:35 2021 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
	Mon Feb 22 14:31:36 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
	Mon Feb 22 14:31:36 2021 - [warning] Connection failed 1 time(s)..

Solution: At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
This message means that at least one monitoring node is unreachable, namely the 10.6.119.241 node shown above.
77 and 68 are physical machines rather than VMs, so they behave differently from 241:
77 and 68 are accessed directly as root, whereas on 241 you must first log in as sysadm and only then switch to root.
The error is triggered by the setting secondary_check_script=/usr/bin/masterha_secondary_check -s 10.8.40.77 -s 10.8.40.68 -s 10.6.119.241
So masterha_secondary_check cannot be used as shipped; the script has to be modified
so that it logs in as sysadm when the target is the 241 server

  foreach my $monitoring_server (@monitoring_servers) {
  ############### add this block to the script ###########################
	  my $slave_ip = '10.6.119.241';
	  if ( $monitoring_server eq "$slave_ip" ){
		$ssh_user = "sysadm";
	  }
  ############### add this block to the script ###########################
	  my $ssh_user_host = $ssh_user . '@' . $monitoring_server;
	  my $command =
	"ssh $MHA::ManagerConst::SSH_OPT_CHECK -p $ssh_port $ssh_user_host \"perl -e "
		. "\\\"use IO::Socket::INET; my \\\\\\\$sock = IO::Socket::INET->new"
		. "(PeerAddr => \\\\\\\"$master_host\\\\\\\", PeerPort=> $master_port, "
		. "Proto =>'tcp', Timeout => $timeout); if(\\\\\\\$sock) { close(\\\\\\\$sock); "
		. "exit 3; } exit 0;\\\" \"";
	  print "=====$command";
	  my $ret = system($command);

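7. MHA stops automatically after a failover

Note: before restarting MHA, always delete the mysql_mha.failover.complete file first
a. After a failover completes, the manager stops automatically. Running masterha_check_status at this point reports the following:
[root@myql-mha ~]# masterha_check_status -conf=/etc/mha_master/mha.cnf
mha is stopped(2:NOT_RUNNING).
	
b. Provide a new slave to repair the replication cluster
	After the original master fails, a new MySQL node has to be prepared. Restore it from a backup taken from the master, then configure it as a slave of the new master.
	Note: if the node being added is a brand-new machine, give it the IP address of the original master; otherwise the corresponding IP in mha.cnf must be changed as well. Then start the manager again
	and check its status once more.
	Here we reuse the master that was just shut down as the newly added machine and restore the database on it:
	The former slave1 has become the new master, so we take a full backup of it and send the backup to the machine being added:
[root@mysql-slave1 ~]# mkdir /backup
[root@mysql-slave1 ~]# mysqldump --all-databases > /backup/mysql-backup-`date +%F-%T`-all.sql
[root@mysql-slave1 ~]# scp /backup/mysql-backup-2017-11-23-09\:57\:09-all.sql root@node2:~
Then restore the data on node2:

[root@mysql-master ~]# mysql < mysql-backup-2017-11-23-09\:57\:09-all.sql
Next, configure replication. As usual, look up the current master's binary log file and position, then set: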

mysql> change master to master_host='172.16.14.202',master_port=3306,
		-> master_user='slave',master_password='slave',
		-> master_log_file='mysql-bin.000005',master_log_pos=154;
	Query OK, 0 rows affected, 2 warnings (0.02 sec)

mysql> start slave;
	Query OK, 0 rows affected (0.00 sec)

mysql> show slave status \G;
	*************************** 1. row ***************************
				   Slave_IO_State: Waiting for master to send event
					  Master_Host: 172.16.14.202
					  Master_User: slave
					  Master_Port: 3306
					Connect_Retry: 60
				  Master_Log_File: mysql-bin.000005
			  Read_Master_Log_Pos: 358
				   Relay_Log_File: localhost-relay-bin.000002
					Relay_Log_Pos: 524
			Relay_Master_Log_File: mysql-bin.000005
				 Slave_IO_Running: Yes
				 Slave_SQL_Running: Yes
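	 As the output shows, replication is now configured.

c. Edit the configuration file /etc/mha/mysql_mha.cnf
secondary_check_script=/usr/bin/masterha_secondary_check  -s 10.8.40.68  -s 10.6.119.241
Change it to list every node except the master
d. Once the new node is in place, run the check again
 masterha_check_repl --conf=/etc/mha/mysql_mha.cnf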

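8. Bringing a new node online: notes on recovering after a failover

  1. In production, after the master goes down, be sure to take a backup on a slave, use that backup to bring the old master back manually as a slave, and specify the log file and position from which replication should start

  2. After every automatic failover, the replication health check keeps failing and the manager will not start until the old master has been repaired manually, unless you change the configuration file

  3. After the old master has been repaired and brought back as a slave, run the check command again

    [root@mysql-mha ~]# masterha_check_status -conf=/etc/mha_master/mha.cnf
    mha (pid:9561) is running(0:PING_OK), master:192.168.37.133
    
  4. Start the manager again and recovery is complete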

    [root@mysql-mha ~]# masterha_manager --conf=/etc/mha_master/mha.cnf
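
The two masterha_check_status outputs quoted in this article ("is running(0:PING_OK), master:..." and "is stopped(2:NOT_RUNNING).") are easy to tell apart in a script, for example in a cron check. This is a sketch that relies only on those two formats; the function name mha_master_from_status is hypothetical.

```shell
# Interpret one line of masterha_check_status output read from stdin.
# Prints the master address when the manager is running; returns 1 otherwise.
mha_master_from_status() {
    local line
    line=$(cat)
    case $line in
        *"is running(0:PING_OK)"*master:*)
            # Everything after the last "master:" is the master address.
            printf '%s\n' "${line##*master:}"
            ;;
        *)
            return 1
            ;;
    esac
}
```

A possible use: `masterha_check_status --conf=/etc/mha/mysql_mha.cnf | mha_master_from_status || echo "MHA manager is not running"`.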
    

9. Error when the master goes down after MHA is started a second time

[error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln310] Last failover was done at 2021/02/22 14:31:44. Current time is too early to do failover again. If you want to do failover, manually remove /data1/mysql_mha/mysql_mha.failover.complete and run this script again.
Mon Feb 22 15:41:52 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_manager line 65.

Solution: delete /data1/mysql_mha/mysql_mha.failover.complete first
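
Before restarting the manager it can help to check whether a recent marker file would block the next failover. The sketch below assumes MHA judges "too early" by the marker file's modification time and uses 480 minutes (8 hours) as the default window; both are assumptions to verify against your MHA version, and the function name failover_marker_blocks is made up. It relies on find -mmin, which GNU find on CentOS provides.

```shell
# Succeed if the marker file exists and was modified within the last
# $2 minutes (default 480, i.e. 8 hours).
failover_marker_blocks() {
    local marker window
    marker=$1
    window=${2:-480}
    [ -f "$marker" ] || return 1
    # find prints the path only when the file is newer than $window minutes.
    [ -n "$(find "$marker" -mmin "-$window" 2>/dev/null)" ]
}
```

For example: `failover_marker_blocks /data1/mysql_mha/mysql_mha.failover.complete && echo "remove the marker before restarting MHA"`.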

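10. VIP configuration problem

Solution: choose a VIP in the same subnet as the servers, for example 10.8.40.79
10.8.40.80: pingable, but Navicat could not connect (this IP was already in use by someone else)
10.8.40.79: works
ip a shows the following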

ip a
inet 10.8.40.79/24 brd 10.8.40.255 scope global secondary eno1:1

11. Fixing VIP drift when NIC names differ

Solution

Rename the NIC:
1) vi /etc/default/grub
Add net.ifnames=0 biosdevname=0, so that the line ends up as
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb net.ifnames=0 biosdevname=0 quiet"
2) Regenerate the GRUB configuration so the kernel parameters take effect
grub2-mkconfig -o /boot/grub2/grub.cfg
3) In /etc/sysconfig/network-scripts, edit the file for the NIC
HWADDR=08:00:27:9f:1d:c5 (the MAC address of the NIC to be renamed eno1)
DEVICE=eno1
NAME=eno1
4) Add a custom rule to /etc/udev/rules.d/70-persistent-net.rules; create the file if it does not exist
Add: SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="<MAC address>", NAME="eno1"
5) Restart with reboot

12. masterha_check_repl --conf=/etc/mha/mysql_mha.cnf reports an error after two nodes switch roles several times

Mon Feb 22 16:17:21 2021 - [warning] SQL Thread is stopped(no error) on 10.8.40.77(10.8.40.77:3307)
Mon Feb 22 16:17:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln781] Multi-master configuration is detected, but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below:
Master 10.8.40.68(10.8.40.68:3307), replicating from 10.8.40.77(10.8.40.77:3307)
Master 10.8.40.77(10.8.40.77:3307), replicating from 10.8.40.68(10.8.40.68:3307)

Mon Feb 22 16:17:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Mon Feb 22 16:17:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Mon Feb 22 16:17:21 2021 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

Solution: on the slave, run set global read_only=1;

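13. Start MySQL at boot on CentOS 7

vi /etc/rc.local
/etc/init.d/mysqld8 start
# On CentOS 7 rc.local only runs at boot if it is executable: chmod +x /etc/rc.d/rc.local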

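14. The mysql_mha.failover.complete file

If two failovers happen too close together (within 8 hours), this file must be deleted first

15. If MHA was started with --remove_dead_master_conf, the old master's entry is removed from the configuration

16. The following error appears when running masterha_check_repl --conf=/etc/mha/mysql_mha.cnf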

Thu Feb 25 09:00:28 2021 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.6.119.241 --slave_ip=10.6.119.241 --slave_port=3307 --workdir=/data1/mysql_mha --target_version=8.0.23 --manager_version=0.58 --relay_dir=/data1/mysql8/data --current_relay_log=relay_bin.000012  --slave_pass=xxx
Thu Feb 25 09:00:28 2021 - [info]   Connecting to [email protected](10.6.119.241:22)..
Welcome to UIH
ATTENTION: You have logged onto a secured device. ONLY Authorized users can access.
*Copyright(c) UIH All rights Reserved*
  Checking slave recovery environment settings..
readdir() attempted on invalid dirhandle $dir at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 271.
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln208] Slaves settings check failed!
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln416] Slave configuration failed.
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/bin/masterha_check_repl line 48.
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu Feb 25 09:00:29 2021 - [info] Got exit code 1 (Not master dead).

Solution: give the sysadm user on 10.6.119.241 access to /data1/mysql8
chown -R sysadm:sysadm /data1/mysql8/data

17. The following error appears when running masterha_check_repl --conf=/etc/mha/mysql_mha.cnf

Thu Feb 25 09:06:57 2021 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.6.119.241 --slave_ip=10.6.119.241 --slave_port=3307 --workdir=/data1/mysql_mha --target_version=8.0.23 --manager_version=0.58 --relay_dir=/data1/mysql8/data --current_relay_log=relay_bin.000012  --slave_pass=xxx
Thu Feb 25 09:06:57 2021 - [info]   Connecting to [email protected](10.6.119.241:22)..
Welcome to UIH
ATTENTION: You have logged onto a secured device. ONLY Authorized users can access.
*Copyright(c) UIH All rights Reserved*
  Checking slave recovery environment settings..
    Relay log found at /data1/mysql8/data, up to relay_bin.000012
    Temporary relay log file is /data1/mysql8/data/relay_bin.000012
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1030 (HY000) at line 1: Got error 168 - 'Unknown (generic) error from engine' from storage engine
mysql command failed with rc 1:0!
 at /usr/bin/apply_diff_relay_logs line 404.
        main::check() called at /usr/bin/apply_diff_relay_logs line 536
        eval {...} called at /usr/bin/apply_diff_relay_logs line 514
        main::main() called at /usr/bin/apply_diff_relay_logs line 121
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln208] Slaves settings check failed!
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln416] Slave configuration failed.
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/bin/masterha_check_repl line 48.
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu Feb 25 09:06:58 2021 - [info] Got exit code 1 (Not master dead).

Solution: add the following settings to the MySQL configuration file my.cnf

default-storage-engine=innodb
innodb_force_recovery=0
max_allowed_packet=1024M

18. Error when starting MySQL on 10.6.119.241

 Starting MySQL.... ERROR! The server quit without updating PID file (/data1/mysql8/data/CT-DevOps-DB.pid).

Solution: check the error log first
2021-02-25T01:18:56.929627Z 0 [ERROR] [MY-010274] [Server] Could not open unix socket lock file /var/lib/mysql/mysql.sock.lock.
This is a permissions problem again.
Give the sysadm user access to /var/lib/mysql/
chown -R sysadm:sysadm /var/lib/mysql

19. When yum is unusable, copy the /etc/yum.repos.d directory from another machine with a similar environment and yum will work again
