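Most articles online are copied from one another and published without ever being tested. What follows is the setup process I actually carried out.
Operating system: CentOS 7
MySQL version: MySQL 8
MHA version: 0.58
Node IP | Role | Machine type |
---|---|---|
10.8.40.77 | Master | Physical machine |
10.8.40.68 | Standby master | Physical machine |
10.6.119.241 | Slave | Virtual machine (log in as sysadm, then switch to root) |
10.6.110.170 | MHA manager | Virtual machine (log in as sysadm, then switch to root) |
10.8.40.79 | VIP | Virtual IP |
Reference blog: https://blog.csdn.net/qq_37369726/article/details/104462513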
mkdir /data1
cd /data1
mkdir mysql8
cd /data1/mysql8
# Download URL; MySQL 8 appears to ship as tar.xz instead of tar.gz
wget http://mirrors.163.com/mysql/Downloads/MySQL-8.0/mysql-8.0.23-linux-glibc2.12-x86_64.tar.xz
# Extract
xz -d mysql-8.0.23-linux-glibc2.12-x86_64.tar.xz
tar -xvf mysql-8.0.23-linux-glibc2.12-x86_64.tar
mv mysql-8.0.23-linux-glibc2.12-x86_64 /usr/local/mysql8/
cd /usr/local/mysql8/
# Create the data directory and the user
mkdir -p /data1/mysql8
# Group
groupadd mysql
# User
useradd -g mysql mysql
# Give the mysql user ownership of the data directory
chown -R mysql:mysql /data1/mysql8/
# Initialize. MySQL 8 generates a random root password; be sure to write it down
/usr/local/mysql8/bin/mysqld --initialize --user=mysql --basedir=/usr/local/mysql8/ --datadir=/data1/mysql8/data
# Note: the --datadir directory must be empty, otherwise initialization fails with:
2021-02-04T06:14:21.041598Z 0 [ERROR] [MY-010457] [Server] --initialize specified but the data directory has files in it. Aborting.
2021-02-04T06:14:21.041614Z 0 [ERROR] [MY-013236] [Server] The designated data directory /data1/mysql8/ is unusable. You can remove all files that the server added to it.
2021-02-04T06:14:21.041758Z 0 [ERROR] [MY-010119] [Server] Aborting
# A temporary password is generated; note it down
A temporary password is generated for root@localhost: *Pj>af)-P0mA
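Since the temporary password is easy to lose in the initialization output, it can be pulled out mechanically. The following is a minimal sketch, assuming the output of `mysqld --initialize` was captured to a file; the function name `extract_temp_password` and the file name `init.log` are hypothetical and not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: read mysqld --initialize output on stdin and print the
# temporary root password it announces.
extract_temp_password() {
    # Keep only the text after "root@localhost: " on the matching line;
    # if the line appears more than once, use the last occurrence.
    sed -n 's/^.*A temporary password is generated for root@localhost: //p' | tail -n 1
}
```

Usage, under the same assumption: `extract_temp_password < init.log`.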
# Location: /etc/my.cnf
vim /etc/my.cnf
# Contents
[mysqld]
datadir=/data1/mysql8/data
socket=/var/lib/mysql/mysql.sock
port=3307
user=mysql
default_authentication_plugin=mysql_native_password
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
[mysqld_safe]
log-error=/data1/mysql8/log/mariadb/mysql8.log
pid-file=/data1/mysql8/run/mariadb/mysql8.pid
[client]
default-character-set=utf8
socket=/var/lib/mysql/mysql.sock
[mysql]
default-character-set=utf8
socket=/var/lib/mysql/mysql.sock
#
# include all files from the config directory
#
!includedir /etc/my.cnf.d
# This line is not needed: cp /etc/my.cnf /usr/local/mysql8/
mkdir /var/lib/mysql
chown mysql:mysql -R /var/lib/mysql
mkdir -p /data1/mysql8/log/mariadb/
touch /data1/mysql8/log/mariadb/mysql8.log
chown mysql:mysql -R /data1/mysql8/log/mariadb/
echo 'export PATH=$PATH:/usr/local/mysql8/bin' >> /etc/profile
source /etc/profile
cp /usr/local/mysql8/support-files/mysql.server /etc/init.d/mysqld8
vim /etc/init.d/mysqld8
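# Change these two settings
basedir=/usr/local/mysql8
datadir=/data1/mysql8/data
# The two settings below do not need to be uncommented or changed
#conf=$basedir/my.cnf
# Add the my.cnf path after $bindir/mysqld_safe
#$bindir/mysqld_safe --defaults-file="$conf" --datadir="$datadir"
# There are two ways to start MySQL
# Option 1
/etc/init.d/mysqld8 start
# Option 2: start with an explicit config file (this did not work for me; use option 1)
/usr/local/mysql8/bin/mysqld_safe --defaults-file=/usr/local/mysql8/my.cnf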
mysql -uroot -p"lfsjj9)-"
# Change the password
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY '123456';
FLUSH PRIVILEGES;
select host, user, authentication_string, plugin from mysql.user;
update mysql.user set host='%',plugin='mysql_native_password' where user='root';
# Start MySQL
/etc/init.d/mysqld8 start
# Restart MySQL
/etc/init.d/mysqld8 restart
# Check whether MySQL is running
/etc/rc.d/init.d/mysqld8 status
ps aux |grep mysql
# Problem 1: the password no longer works
1. Add skip-grant-tables=1 to my.cnf and restart, so that login skips privilege checks.
2. Do not write MD5('root') or SHA1('root') into authentication_string: neither is a valid mysql_native_password hash, so the account would stay unusable. Clear the field instead and then set the password with ALTER USER:
update mysql.user set host='%',plugin='mysql_native_password',authentication_string='' where user='root';
FLUSH PRIVILEGES;
ALTER USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY '123456';
select host, user, authentication_string, plugin from mysql.user;
3. Remove skip-grant-tables=1 from my.cnf and restart.
# Problem 2: Starting MySQL.2021-02-05T01:43:59.542284Z mysqld_safe Directory '/var/lib/mysql' for UNIX socket file don't exists.
Fix: mkdir /var/lib/mysql
# Problem 3: Starting MySQL... ERROR! The server quit without updating PID file (/data1/mysql8/data/CT-DevOps-DB.pid).
Fix: check the error log, which shows: Could not create unix socket lock file /var/lib/mysql/mysql.sock.lock.
This is caused by insufficient permissions. Set the ownership: chown -R mysql:mysql /var/lib/mysql/
grant all privileges on *.* to 'root'@'%' ;
# Problem 4: Navicat cannot connect
Fix: open the port MySQL listens on: /sbin/iptables -I INPUT -p tcp --dport 3307 -j ACCEPT
iptables-save > /etc/sysconfig/iptables
# iptables-save dumps the current rules; the redirect writes them to this file (overwriting it, not appending)
=======================================================================================
create user 'repl'@'%' identified with mysql_native_password by '123456';
grant replication slave on *.* to 'repl'@'%';
flush privileges;
[mysqld]
# server_id must be different on every node; it can be derived from the node's IP
server_id=160
log_bin=mysql_bin
relay_log=relay_bin
log_slave_updates=on
# Enable GTID mode
gtid_mode=ON
enforce_gtid_consistency=ON
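The comment about deriving server_id from the IP can be made concrete. The sketch below shows one possible mapping, chosen here purely for illustration: it combines the last two octets so that nodes in different subnets with the same last octet do not collide. The function name `server_id_from_ip` is hypothetical, and the original configuration uses a hand-picked value (160) instead.

```shell
#!/bin/sh
# Hypothetical helper: derive a server_id from an IPv4 address by combining
# its third and fourth octets (third * 256 + fourth).
server_id_from_ip() {
    ip="$1"
    o4="${ip##*.}"      # fourth octet
    rest="${ip%.*}"     # first three octets
    o3="${rest##*.}"    # third octet
    echo $(( $o3 * 256 + $o4 ))
}
```

For example, `server_id_from_ip 10.8.40.77` prints 10317 and `server_id_from_ip 10.6.119.241` prints 30705; any scheme works as long as every node ends up with a different value.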
/etc/init.d/mysqld8 restart
stop slave;
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456',master_auto_position=1;
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456',master_auto_position=0;
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456', master_log_file='mysql_bin.000020',master_log_pos=236;
start slave;
# Check whether replication is working
show slave status\G
# Two "Yes" values mean it succeeded
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
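The two-Yes check can also be scripted. This is a minimal sketch, assuming the output of `show slave status\G` is piped in on stdin; the function name `replication_ok` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both replication threads report "Yes"
# in SHOW SLAVE STATUS\G output read from stdin.
replication_ok() {
    status=$(cat)
    echo "$status" | grep -Eq '^[[:space:]]*Slave_IO_Running:[[:space:]]*Yes[[:space:]]*$' &&
    echo "$status" | grep -Eq '^[[:space:]]*Slave_SQL_Running:[[:space:]]*Yes[[:space:]]*$'
}
```

A possible invocation on a slave is `mysql -uroot -p -e 'show slave status\G' | replication_ok && echo "replication OK"`.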
On slave2, repeat exactly the same steps.
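# Example on one of the masters
# Generate a key pair
ssh-keygen
# Set up passwordless login
cd /root/.ssh/
ssh-copy-id -i id_rsa [email protected]
# 241 is a special case: you must log in as sysadm first and then switch to root, so work from /home/sysadm/.ssh instead
ssh-copy-id -i id_rsa [email protected]
ssh-copy-id -i id_rsa [email protected]
Do the same on the other machines.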
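Problem 1: ERROR 1777 (HY000): CHANGE MASTER TO MASTER_AUTO_POSITION = 1 cannot be executed because @@GLOBAL.GTID_MODE = OFF
Fix:
1. On every master and slave instance, run:
set global ENFORCE_GTID_CONSISTENCY = WARN;
## Note: this is a warning mode. Statements that violate GTID consistency are still executed but logged as warnings, so check the error log before continuing.
2. If there are no warnings, run on every instance:
set global ENFORCE_GTID_CONSISTENCY = ON;
3. On every instance, run:
set global GTID_MODE = OFF_PERMISSIVE;
4. On every instance, run:
set global GTID_MODE = ON_PERMISSIVE;
5. On every master and slave instance, check whether any anonymous transactions are still pending:
SHOW STATUS LIKE 'ONGOING_ANONYMOUS_TRANSACTION_COUNT';
# The value must be 0
6. Check the slave's binlog position; if it has not finished applying the binlog, wait
show slave status\G;
7. On every instance, run:
set global GTID_MODE = ON;
8. On the slave, run:
stop slave;
CHANGE MASTER TO MASTER_AUTO_POSITION = 1;
start slave;
GTID mode is now enabled.
Finally, remember to update my.cnf:
gtid_mode = on
enforce_gtid_consistency = on
Problem 2: [root@CT-DevOps-DB sysadm]# mysql -u root -p bash: mysql: command not found…
Fix:
# Add a symlink
ln -s /usr/local/mysql8/bin/mysql /usr/bin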
=======================================================================================
# Download URL:
https://github.com/yoshinorim/mha4mysql-node/releases/tag/v0.58
# Note: the MHA node package must be installed on every node
# 1. Install the dependencies first:
yum -y install epel-release
yum -y install perl-DBD-MySQL perl-DBI ncftp
# 2. Install the MHA node package:
rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
# Download URL for the MHA manager package
https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
# After downloading, install the dependencies first
yum -y install epel-release
yum -y install perl-Config-Tiny perl-Time-HiRes perl-Parallel-ForkManager perl-Log-Dispatch perl-DBD-MySQL ncftp
# If some dependencies cannot be found, reinstall the epel package
yum -y remove epel-release
yum -y install epel-release
# Then install the downloaded manager package
rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
######################################################################################
# Offline installation of epel-release
Download the latest epel-release rpm from
http://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/e/
Install the epel-release rpm:
rpm -Uvh epel-release*rpm
# Offline installation of perl-Config-Tiny perl-Time-HiRes perl-Parallel-ForkManager perl-Log-Dispatch perl-DBD-MySQL ncftp
Download the corresponding rpm files from
http://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/p/
# Create the directories
mkdir /etc/mha
mkdir /data1/mysql_mha
# Write the configuration file
vim /etc/mha/mysql_mha.cnf
# Add the following
Contents of mysql_mha.cnf:
[server default]
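# Account and password MHA uses to access the databases
user=mha
password=123456
port=3307
# MHA working directory
manager_workdir=/data1/mysql_mha
# Manager log path
manager_log=/data1/mysql_mha/manager.log
# Directory where the master stores its binlog files (log_bin=mysql_bin); the default is /var/lib/mysql
master_binlog_dir=/data1/mysql8/data
# MHA working directory on the remote nodes
remote_workdir=/data1/mysql_mha
# MySQL user and password used for replication
repl_user=repl
repl_password=123456
# Interval between health checks, in seconds
ping_interval=1
# Script that moves the virtual IP to the new master after a failover
master_ip_failover_script=/data1/mysql_mha/master_ip_failover
# Nodes used to double-check the master's state. Do not list the master's own IP here, otherwise failover cannot happen when the master loses its network or power
secondary_check_script=/usr/bin/masterha_secondary_check -s 10.8.40.68 -s 10.6.119.241
# Script that sends an email notification when a failover happens
report_script=/data1/mysql_mha/send_mail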
[server1]
hostname=10.8.40.77
port=3307
ssh_user=root
candidate_master=1
#check_repl_delay=0
[server2]
hostname=10.8.40.68
port=3307
ssh_user=root
candidate_master=1
#check_repl_delay=0
# Note: when 241 and the MHA manager run on the same machine, comment out the section below, otherwise the checks fail
[server3]
hostname=10.6.119.241
ssh_user=sysadm
port=3307
no_master=1
ignore_fail=1
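The rule in the configuration above, that secondary_check_script must list every node except the current master, is easy to get wrong after a failover. The following sketch generates the line from a node list; the function name `build_secondary_check` is hypothetical, and only the `-s` option already used in the configuration is emitted.

```shell
#!/bin/sh
# Hypothetical helper: print a secondary_check_script line that lists every
# given node except the current master.
# Usage: build_secondary_check <current_master_ip> <node_ip>...
build_secondary_check() {
    master="$1"
    shift
    cmd="/usr/bin/masterha_secondary_check"
    for node in "$@"; do
        # Skip the master itself; checking through it defeats the purpose.
        [ "$node" = "$master" ] && continue
        cmd="$cmd -s $node"
    done
    echo "secondary_check_script=$cmd"
}
```

With 10.8.40.77 as master, `build_secondary_check 10.8.40.77 10.8.40.77 10.8.40.68 10.6.119.241` prints the same line as in the configuration above; after a failover to 10.8.40.68, passing that address as the first argument yields the updated line.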
Contents of the master_ip_failover script:
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $orig_master_host, $orig_master_ip,$ssh_user,
$orig_master_port, $new_master_host, $new_master_ip,$new_master_port,
$orig_master_ssh_port,$new_master_ssh_port,$new_master_user,$new_master_password
);
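# The virtual IP defined here must be in the same subnet as your cluster, otherwise it will not work
my $vip = '10.8.40.79/24';
my $key = '1';
# Change the interface name "eno1" to match the interface on your machines
# If the interface names differ between machines, either adapt the script or rename the interfaces so they match
# In my case I renamed the interfaces to a common name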
my $ssh_start_vip = "sudo /sbin/ifconfig eno1:$key $vip";
my $ssh_stop_vip = "sudo /sbin/ifconfig eno1:$key down";
my $ssh_Bcast_arp= "sudo /sbin/arping -I bond0 -c 3 -A $vip";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'orig_master_ssh_port=i' => \$orig_master_ssh_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_ssh_port=i' => \$new_master_ssh_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password
);
exit &main();
sub main {
$ssh_user = defined $ssh_user ? $ssh_user : 'root';
print "\n\nIN SCRIPT TEST====$ssh_user|$ssh_stop_vip==$ssh_user|$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
&start_arp();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub start_arp() {
`ssh $ssh_user\@$new_master_host \" $ssh_Bcast_arp \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --ssh_user=user --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
chmod a+x /data1/mysql_mha/master_ip_failover
mkdir /data1/mysql_mha
mysql -uroot -p'123456'
create user 'mha'@'%' identified with mysql_native_password by '123456';
grant all privileges on *.* to 'mha'@'%';
flush privileges;
masterha_check_ssh --conf=/etc/mha/mysql_mha.cnf
masterha_check_repl --conf=/etc/mha/mysql_mha.cnf
nohup masterha_manager --conf=/etc/mha/mysql_mha.cnf &> /data1/mysql_mha/manager.log &
# Stop the manager
masterha_stop --conf=/etc/mha/mysql_mha.cnf
# Check the manager status
masterha_check_status --conf=/etc/mha/mysql_mha.cnf
ps aux |grep masterha_manager
root 10819 0.2 4.5 299648 22032 pts/0 S 19:23 0:00 perl /usr/bin/masterha_manager --conf=/etc/mha/mysql_mha.cnf
ifconfig eno1:1 10.8.40.79/24
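The mail scripts available online all require an SMTP authorization code. Our company does not have one, so I wrote my own mail script.
# Add the following to /etc/mha/mysql_mha.cnf
report_script=/data1/mysql_mha/send_mail
Contents of the send_mail script:
#!/bin/bash
# Log file for this script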
LOGFILE="/data1/mysql_mha/email.log"
:>"$LOGFILE"
exec 1>"$LOGFILE"
exec 2>&1
SMTP_server='smtp.123.com'
username='[email protected]'
password='111*'
from_email_address='[email protected]'
to_email_address='***@***,***@***'
message_subject_utf8="MHA集群主库故障转移提醒"
HTML_PATH=html_path
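echo "<html><body>" >$HTML_PATH
echo "<p>MHA集群主节点发生故障,进行节点故障转移,请及时解决查看!!!</p>" >>$HTML_PATH
echo "<p>以下为MHA集群的相关信息:</p>" >>$HTML_PATH
echo "<table cellspacing=\"0\" width=\"700\"><tr><th>节点</th><th>角色</th><th>作用</th></tr><tr><td>10.6.110.170</td><td>MHA manager</td><td>MHA监控节点</td></tr><tr><td>10.8.40.77</td><td>master/master.bak</td><td>主库或者主备</td></tr><tr><td>10.8.40.68</td><td>master/master.bak</td><td>主库或者主备</td></tr><tr><td>10.6.119.241</td><td>slave</td><td>从库</td></tr><tr><td>10.8.40.79</td><td>VIP</td><td>虚拟ip</td></tr></table>" >>$HTML_PATH
echo "<p>详细错误日志路径为:10.6.110.170:/data1/mysql_mha/manager.log</p>" >>$HTML_PATH
echo "</body></html>" >>$HTML_PATH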
message_body_utf8=$(cat $HTML_PATH)
#message_body_utf8="mysql的MHA集群主节点发生故障,进行节点故障转移,请及时解决查看!!!"
# Convert the mail subject to GB2312 so that a subject containing Chinese is not displayed as garbled text.
message_subject_gb2312=`iconv -t GB2312 -f UTF-8 << EOF
$message_subject_utf8
EOF`
[ $? -eq 0 ] && message_subject="$message_subject_gb2312" || message_subject="$message_subject_utf8"
# Convert the mail body to GB2312 so that the received body is not garbled
message_body_gb2312=`iconv -t GB2312 -f UTF-8 << EOF
$message_body_utf8
EOF`
[ $? -eq 0 ] && message_body="$message_body_gb2312" || message_body="$message_body_utf8"
# Send the mail
sendEmail='/usr/bin/sendEmail'
set -x
$sendEmail -s "$SMTP_server" -xu "$username" -xp "$password" -f "$from_email_address" -t "$to_email_address" -u "$message_subject" -m "$message_body" -o message-content-type=html -o message-charset=gb2312
# A WeChat Work (企业微信) notification is sent as well
sh /data1/mysql_mha/send_wechat
This calls the WeChat Work API that our company has already set up.
/data1/mysql_mha/send_wechat
#!/bin/bash
curl -X POST -H 'Content-Type: application/json' --data '{"content": "MHA集群主节点发生故障,进行节点故障转移,请及时解决查看!!!详细错误日志路径为:10.6.110.170:/data1/mysql_mha/manager.log","users": "xxx"}' http://ip:port/wechat/api/sendText > /data1/mysql_mha/wechat.log
# 1.
On node 10.6.110.170, delete the file /data1/mysql_mha/mysql_mha.failover.complete
# 2. Rejoin the failed node to the MySQL cluster
stop slave;
change master to master_host='10.8.40.68', master_port=3307, master_user='repl', master_password='123456',master_auto_position=0;
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456', master_log_file='mysql_bin.000002',master_log_pos=156;
start slave;
show slave status\G;
# 3. In /etc/mha/mysql_mha.cnf, update
secondary_check_script=/usr/bin/masterha_secondary_check -s 10.8.40.68 -s 10.6.119.241
# so that the current master is removed and the new slave is added
# 4. Check SSH connectivity between the nodes
masterha_check_ssh --conf=/etc/mha/mysql_mha.cnf
# 5. Check replication health for MHA
masterha_check_repl --conf=/etc/mha/mysql_mha.cnf
# 6. Start MHA
nohup masterha_manager --conf=/etc/mha/mysql_mha.cnf &> /data1/mysql_mha/manager.log &
# 7. Check the MHA status
masterha_check_status --conf=/etc/mha/mysql_mha.cnf
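The manager-side steps above (1 and 4 to 7) always run in the same order, so they can be printed as a checklist before being executed. The sketch below is a dry run only: it echoes the commands and does not run them. The function name `mha_restart_plan` is hypothetical, and it assumes the marker file is named after the configuration file's base name, as it is in this setup (mysql_mha.cnf and mysql_mha.failover.complete). Steps 2 and 3 still have to be done by hand.

```shell
#!/bin/sh
# Hypothetical helper: print, without executing, the manager-side commands
# for restarting MHA after a failover.
# Usage: mha_restart_plan <conf_file> <manager_workdir>
mha_restart_plan() {
    conf="$1"
    workdir="$2"
    app=$(basename "$conf" .cnf)   # assumption: marker file uses this name
    echo "rm -f $workdir/$app.failover.complete"
    echo "masterha_check_ssh --conf=$conf"
    echo "masterha_check_repl --conf=$conf"
    echo "nohup masterha_manager --conf=$conf &> $workdir/manager.log &"
    echo "masterha_check_status --conf=$conf"
}
```

Running `mha_restart_plan /etc/mha/mysql_mha.cnf /data1/mysql_mha` prints the five commands for this cluster so they can be reviewed and then run one at a time.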
=======================================================================================
Fri Feb 19 14:41:24 2021 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63]
Fri Feb 19 14:41:23 2021 - [debug] Connecting via SSH from [email protected](10.8.40.77:22) to [email protected](10.6.119.241:22)..
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Fri Feb 19 14:41:24 2021 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from [email protected](10.8.40.77:22) to [email protected](10.6.119.241:22) failed!
Fri Feb 19 14:41:25 2021 - [debug]
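Fix: this happens because the MHA manager and a slave run on the same machine. Comment out the last section of /etc/mha/mysql_mha.cnf, that is, the [server3] section for the node that shares a machine with the manager:
#[server3]
#hostname=10.6.119.241
# This node does not take part in master election
#no_master=1
Problem 1 appeared again: this time, the contents of --conf=/etc/mha/mysql_mha.cnf seen as the sysadm user differed from what root saw.
Fix: switching from root to sysadm inside the same session invalidated the passwordless SSH setup. Open a new session and run ssh-keygen again (this one cost me an afternoon).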
Fri Feb 19 14:47:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln188] There is no alive server. We can't do failover
Fri Feb 19 14:47:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Fri Feb 19 14:47:07 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Fri Feb 19 14:47:07 2021 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
Fix: edit the configuration file
[server1]
hostname=10.8.40.77
port=3307
# This node may take part in master election
candidate_master=1
[server2]
hostname=10.8.40.68
port=3307
candidate_master=1
check_repl_delay=0
stop slave;
reset slave all;
Fix: on the master, run
show master status;
+------------------+----------+--------------+------------------+--------------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+--------------------------------------------------------------------------------------+
| mysql_bin.000003 | 2216 | | | 7d3cc0a4-6955-11eb-8bbd-0cc47a9cd887:1-9,
Note the File and Position values,
then run on the slave:
stop slave;
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456',master_auto_position=0;
change master to master_host='10.8.40.77', master_port=3307, master_user='repl', master_password='123456',master_log_file='mysql_bin.000009',master_log_pos=156;
start slave;
show slave status\G;
Fix: check the error log for the specific error message; the logs are in /data1/mysql8/log/mariadb/
ATTENTION: You have logged onto a secured device. ONLY Authorized users can access.
*Copyright(c) UIH All rights Reserved*
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Monitoring server 10.6.119.241 is NOT reachable!
Mon Feb 22 14:31:33 2021 - [warning] At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
Mon Feb 22 14:31:33 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
Mon Feb 22 14:31:33 2021 - [warning] Connection failed 2 time(s)..
Mon Feb 22 14:31:34 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
Mon Feb 22 14:31:34 2021 - [warning] Connection failed 3 time(s)..
Mon Feb 22 14:31:35 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
Mon Feb 22 14:31:35 2021 - [warning] Connection failed 4 time(s)..
Mon Feb 22 14:31:35 2021 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Mon Feb 22 14:31:36 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.8.40.77' (111))
Mon Feb 22 14:31:36 2021 - [warning] Connection failed 1 time(s)..
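Fix: At least one of monitoring servers is not reachable from this script. This is likely a network problem. Failover should not happen.
This message means that at least one monitoring node is unreachable, namely 10.6.119.241 above.
77 and 68 are physical machines rather than virtual machines, so they behave differently:
77 and 68 are logged into directly as root, whereas on 241 you must log in as sysadm first and only then switch to root.
The problem is triggered by the setting secondary_check_script=/usr/bin/masterha_secondary_check -s 10.8.40.77 -s 10.8.40.68 -s 10.6.119.241
So masterha_secondary_check cannot be used as is; the script has to be modified
to switch the login account to sysadm when the target is server 241: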
foreach my $monitoring_server (@monitoring_servers) {
############### Add this block to the script ###########################
my $slave_ip = '10.6.119.241';
if ( $monitoring_server eq "$slave_ip" ){
$ssh_user = "sysadm";
}
############### End of the added block ###########################
my $ssh_user_host = $ssh_user . '@' . $monitoring_server;
my $command =
"ssh $MHA::ManagerConst::SSH_OPT_CHECK -p $ssh_port $ssh_user_host \"perl -e "
. "\\\"use IO::Socket::INET; my \\\\\\\$sock = IO::Socket::INET->new"
. "(PeerAddr => \\\\\\\"$master_host\\\\\\\", PeerPort=> $master_port, "
. "Proto =>'tcp', Timeout => $timeout); if(\\\\\\\$sock) { close(\\\\\\\$sock); "
. "exit 3; } exit 0;\\\" \"";
print "=====$command";
my $ret = system($command);
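Note: before restarting MHA, always delete the mysql_mha.failover.complete file first.
a. Once a failover completes, the manager stops automatically. Running masterha_check_status at that point reports an error like this:
[root@myql-mha ~]# masterha_check_status -conf=/etc/mha_master/mha.cnf
mha is stopped(2:NOT_RUNNING).
b. Provide a new slave to repair the replication cluster
After the original master fails, a new MySQL node has to be prepared. Restore it from a backup of the current master and then configure it as a slave of the new master.
Note: if the node being added is a brand-new machine, give it the IP address of the original master; otherwise the corresponding IP in mha.cnf must be changed as well. Then start the manager again
and check its status once more.
Here we reuse the master that was just shut down as the machine being added, and restore the database on it:
The former slave1 is now the new master, so we take a full backup of it and send the backup to the machine being added: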
[root@mysql-slave1 ~]# mkdir /backup
[root@mysql-slave1 ~]# mysqldump --all-databases > /backup/mysql-backup-`date +%F-%T`-all.sql
[root@mysql-slave1 ~]# scp /backup/mysql-backup-2017-11-23-09\:57\:09-all.sql root@node2:~
Then restore the data on node2:
[root@mysql-master ~]# mysql < mysql-backup-2017-11-23-09\:57\:09-all.sql
Next, configure replication. As usual, look up the current master's binary log file and position, then set it up as follows:
mysql> change master to master_host='172.16.14.202',master_port=3306,
-> master_user='slave',master_password='slave',
-> master_log_file='mysql-bin.000005',master_log_pos=154;
Query OK, 0 rows affected, 2 warnings (0.02 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status \G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.14.202
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 358
Relay_Log_File: localhost-relay-bin.000002
Relay_Log_Pos: 524
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
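As you can see, replication is now configured.
c. Edit the configuration file /etc/mha/mysql_mha.cnf
secondary_check_script=/usr/bin/masterha_secondary_check -s 10.8.40.68 -s 10.6.119.241
Change it to list every node except the current master.
d. Once the new node is in place, run the check again
masterha_check_repl --conf=/etc/mha/mysql_mha.cnf
In production, when the master goes down, always take a backup on a slave, use that backup to bring the old master back manually as a slave, and specify the binary log file and position from which it should start replicating.
After every automatic failover, the replication health check keeps failing and the manager will not start until the old master is repaired manually, unless you change the configuration file.
After manually repairing the old master and rejoining it as a slave, run the check command again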
[root@mysql-mha ~]# masterha_check_status -conf=/etc/mha_master/mha.cnf
mha (pid:9561) is running(0:PING_OK), master:192.168.37.133
Once it is running again, the recovery is complete
[root@mysql-mha ~]# masterha_manager --conf=/etc/mha_master/mha.cnf
[error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln310] Last failover was done at 2021/02/22 14:31:44. Current time is too early to do failover again. If you want to do failover, manually remove /data1/mysql_mha/mysql_mha.failover.complete and run this script again.
Mon Feb 22 15:41:52 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_manager line 65.
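Fix: delete /data1/mysql_mha/mysql_mha.failover.complete first
Fix: the VIP must be an address in the same subnet as the servers, for example 10.8.40.79
10.8.40.80 responds to ping, but Navicat cannot connect through it (this IP is already in use by someone else)
10.8.40.79 works
Running ip a shows the following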
ip a
inet 10.8.40.79/24 brd 10.8.40.255 scope global secondary eno1:1
Fix
Rename the network interface:
1) vi /etc/default/grub
Add net.ifnames=0 biosdevname=0, so that the line ends up as
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb net.ifnames=0 biosdevname=0 quiet"
2) Regenerate the GRUB configuration to apply the kernel parameters
grub2-mkconfig -o /boot/grub2/grub.cfg
3) In /etc/sysconfig/network-scripts, edit the file for the interface
HWADDR=08:00:27:9f:1d:c5 (the MAC address of the interface being renamed to eno1)
DEVICE=eno1
NAME=eno1
4) Add a custom rule to /etc/udev/rules.d/70-persistent-net.rules; create the file if it does not exist
Add: SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="<MAC address>", NAME="eno1"
5) Reboot with the reboot command
Mon Feb 22 16:17:21 2021 - [warning] SQL Thread is stopped(no error) on 10.8.40.77(10.8.40.77:3307)
Mon Feb 22 16:17:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln781] Multi-master configuration is detected, but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below:
Master 10.8.40.68(10.8.40.68:3307), replicating from 10.8.40.77(10.8.40.77:3307)
Master 10.8.40.77(10.8.40.77:3307), replicating from 10.8.40.68(10.8.40.68:3307)
Mon Feb 22 16:17:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Mon Feb 22 16:17:21 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Mon Feb 22 16:17:21 2021 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
Fix: on the slave, run set global read_only=1;
vi /etc/rc.local
/etc/init.d/mysqld8 start
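If two failovers happen too close together (within 8 hours), the mysql_mha.failover.complete file must be deleted before MHA will fail over again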
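The 8-hour rule can be checked before starting the manager. This is a minimal sketch of the arithmetic only, assuming a window of 480 minutes to match the 8 hours mentioned above; the function name `failover_blocked` is hypothetical, and the exact comparison MHA performs internally is not reproduced here.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the previous failover is still inside the
# window in which MHA refuses to fail over again.
# Usage: failover_blocked <last_failover_epoch> <now_epoch> [window_minutes]
failover_blocked() {
    last="$1"
    now="$2"
    window_min="${3:-480}"   # assumption: 480 minutes = 8 hours
    [ $(( $now - $last )) -lt $(( $window_min * 60 )) ]
}
```

On the manager node this could be called as `failover_blocked "$(stat -c %Y /data1/mysql_mha/mysql_mha.failover.complete)" "$(date +%s)" && echo "still inside the window"`, using the marker file's modification time as the time of the last failover.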
Thu Feb 25 09:00:28 2021 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.6.119.241 --slave_ip=10.6.119.241 --slave_port=3307 --workdir=/data1/mysql_mha --target_version=8.0.23 --manager_version=0.58 --relay_dir=/data1/mysql8/data --current_relay_log=relay_bin.000012 --slave_pass=xxx
Thu Feb 25 09:00:28 2021 - [info] Connecting to [email protected](10.6.119.241:22)..
Welcome to UIH
ATTENTION: You have logged onto a secured device. ONLY Authorized users can access.
*Copyright(c) UIH All rights Reserved*
Checking slave recovery environment settings..
readdir() attempted on invalid dirhandle $dir at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 271.
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln208] Slaves settings check failed!
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln416] Slave configuration failed.
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/bin/masterha_check_repl line 48.
Thu Feb 25 09:00:29 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu Feb 25 09:00:29 2021 - [info] Got exit code 1 (Not master dead).
Fix: on 10.6.119.241, give the sysadm user ownership of /data1/mysql8/data
chown -R sysadm:sysadm /data1/mysql8/data
Thu Feb 25 09:06:57 2021 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.6.119.241 --slave_ip=10.6.119.241 --slave_port=3307 --workdir=/data1/mysql_mha --target_version=8.0.23 --manager_version=0.58 --relay_dir=/data1/mysql8/data --current_relay_log=relay_bin.000012 --slave_pass=xxx
Thu Feb 25 09:06:57 2021 - [info] Connecting to [email protected](10.6.119.241:22)..
Welcome to UIH
ATTENTION: You have logged onto a secured device. ONLY Authorized users can access.
*Copyright(c) UIH All rights Reserved*
Checking slave recovery environment settings..
Relay log found at /data1/mysql8/data, up to relay_bin.000012
Temporary relay log file is /data1/mysql8/data/relay_bin.000012
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1030 (HY000) at line 1: Got error 168 - 'Unknown (generic) error from engine' from storage engine
mysql command failed with rc 1:0!
at /usr/bin/apply_diff_relay_logs line 404.
main::check() called at /usr/bin/apply_diff_relay_logs line 536
eval {...} called at /usr/bin/apply_diff_relay_logs line 514
main::main() called at /usr/bin/apply_diff_relay_logs line 121
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln208] Slaves settings check failed!
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln416] Slave configuration failed.
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/bin/masterha_check_repl line 48.
Thu Feb 25 09:06:58 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu Feb 25 09:06:58 2021 - [info] Got exit code 1 (Not master dead).
Fix: add the following settings to the MySQL configuration file my.cnf
default-storage-engine=innodb
innodb_force_recovery=0
max_allowed_packet=1024M
Starting MySQL.... ERROR! The server quit without updating PID file (/data1/mysql8/data/CT-DevOps-DB.pid).
Fix: check the error log first
2021-02-25T01:18:56.929627Z 0 [ERROR] [MY-010274] [Server] Could not open unix socket lock file /var/lib/mysql/mysql.sock.lock.
This is a permissions problem again.
Give the sysadm user ownership of /var/lib/mysql/
chown -R sysadm:sysadm /var/lib/mysql