MySQL InnoDB Cluster + Keepalived High Availability

I. High-availability cluster overview

1. Server and IP address plan

Host                        IP address
rdc-manager1                192.168.2.109
rdc-manager2                192.168.2.110
sql-1                       192.168.2.112
sql-2                       192.168.2.113
sql-3                       192.168.2.114
MySQL-Router VIP            192.168.2.100

2. Software installation plan

Host                        Software
rdc-manager1                Keepalived, MySQL-shell, MySQL-Router, MySQL-client
rdc-manager2                Keepalived, MySQL-shell, MySQL-Router, MySQL-client
sql-1                       MySQL server, MySQL-shell
sql-2                       MySQL server, MySQL-shell
sql-3                       MySQL server, MySQL-shell

3. Operating system

Required operating system: CentOS Linux release 7.4.1708 (Core)

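4. Special notes

(1) Text highlighted in red in the original marks points that need special attention, including which machines a command or package applies to.

(2) Make sure you understand exactly which address the keepalived VIP refers to.

(3) The configuration is long, so follow every step carefully. Everything in this document has been built and verified in a test environment, so most problems come from operating mistakes; please study the document closely.

(4) Common fault

rpm -ivh gssproxy-0.7.0-17.el7.x86_64.rpm 
error: Failed dependencies:
selinux-policy < 3.13.1-166.el7.noarch conflicts with gssproxy-0.7.0-17.el7.x86_64

If this message still appears after installing the selinux-policy dependency package provided with this document, the system usually has both an older and a newer version of the package installed. Check whether multiple versions are present (for example with rpm -qa | grep selinux-policy).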

II. Pre-configuration preparation

1. Add name resolution for every host to the hosts file (on every server in the cluster)

vi /etc/hosts
192.168.2.109 rdc-manager1
192.168.2.110 rdc-manager2
192.168.2.112 sql-1
192.168.2.113 sql-2
192.168.2.114 sql-3
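
The same host entries can also be added with a small script, so that running it again does not create duplicate lines. This is only a minimal sketch: the function names add_host_entry and add_cluster_hosts and the HOSTS_FILE variable are illustrative and not part of the original procedure.

```shell
#!/usr/bin/env bash
# Sketch: append the cluster's host entries to a hosts file without duplicating them.
# HOSTS_FILE, add_host_entry and add_cluster_hosts are illustrative names (assumptions).
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

add_host_entry() {
    local ip="$1" name="$2"
    # Skip the entry if this exact "ip name" line is already present.
    if ! grep -qxF "$ip $name" "$HOSTS_FILE" 2>/dev/null; then
        echo "$ip $name" >> "$HOSTS_FILE"
    fi
}

add_cluster_hosts() {
    add_host_entry 192.168.2.109 rdc-manager1
    add_host_entry 192.168.2.110 rdc-manager2
    add_host_entry 192.168.2.112 sql-1
    add_host_entry 192.168.2.113 sql-2
    add_host_entry 192.168.2.114 sql-3
}
```

Calling add_cluster_hosts as root on each server would then leave exactly one line per host in /etc/hosts.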

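2. Disable SELinux and the firewall (on every server in the cluster)

(1) Disable SELinux

setenforce 0

Edit /etc/selinux/config so that the setting persists across reboots:

vim /etc/selinux/config
SELINUX=disabled

(2) Stop and disable the firewall

systemctl stop firewalld
systemctl disable firewalld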
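
Instead of editing /etc/selinux/config by hand, the SELINUX line can be rewritten with sed. The following is a sketch under the assumption that GNU sed is available (as on CentOS 7); the names SELINUX_CONFIG and disable_selinux_in_config are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: set SELINUX=disabled in an SELinux config file.
# SELINUX_CONFIG and disable_selinux_in_config are illustrative names (assumptions).
SELINUX_CONFIG="${SELINUX_CONFIG:-/etc/selinux/config}"

disable_selinux_in_config() {
    # Rewrite only the line that starts with "SELINUX="; other lines,
    # such as SELINUXTYPE=, are left untouched.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$SELINUX_CONFIG"
}
```

The change in the file only takes effect after a reboot, which is why setenforce 0 is still run for the current session.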

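3. Raise the system's maximum number of open files (on every server in the cluster)

(1) Edit /etc/security/limits.conf and add the entries shown in the figure:

[Figure 1: entries added to /etc/security/limits.conf]

(2) Log out and log back in

Then run ulimit -n to confirm the new limit.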
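
The exact entries appear only in the figure, so they cannot be quoted here. As a rough sketch of what such entries usually look like, the script below adds soft and hard nofile limits. The value 65535 and the "*" (all users) domain are assumptions, not values taken from the original, and the names LIMITS_FILE, NOFILE_LIMIT and set_nofile_limit are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: add open-file limits to a limits.conf-style file.
# ASSUMPTIONS: the limit value (65535) and the "*" domain are placeholders;
# the original document shows its real values only in a figure.
LIMITS_FILE="${LIMITS_FILE:-/etc/security/limits.conf}"
NOFILE_LIMIT="${NOFILE_LIMIT:-65535}"

set_nofile_limit() {
    local kind
    for kind in soft hard; do
        # Add "* soft|hard nofile N" only if no such line exists yet.
        if ! grep -Eq "^\*[[:space:]]+$kind[[:space:]]+nofile" "$LIMITS_FILE" 2>/dev/null; then
            echo "* $kind nofile $NOFILE_LIMIT" >> "$LIMITS_FILE"
        fi
    done
}
```

After logging in again, ulimit -n should report the configured value.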

III. Configure the InnoDB Cluster

1. Install mysql and mysql-shell (on every mysql node: sql-1, sql-2, sql-3)

(1) Add the mysql group and the mysql user

groupadd mysql
useradd -r -g mysql -s /bin/false mysql

(2) Unpack and install mysql

tar -zvxf mysql-5.7.19-linux-glibc2.12-x86_64.tar.gz
mv  mysql-5.7.19-linux-glibc2.12-x86_64 /usr/local/mysql

(3) Set the mysql environment variables

Edit /etc/profile.d/mysql.sh and add the following:

MYSQL=/usr/local/mysql/bin
export PATH=$PATH:$MYSQL

Load the mysql environment variables:

source /etc/profile.d/mysql.sh

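(4) Initialize mysql

Back up the existing mysql configuration file:

cp /etc/my.cnf{,.backup}

Empty the configuration file:

cat /dev/null >/etc/my.cnf

Create the data directory:

mkdir -pv /usr/local/mysql/data
chown mysql:mysql /usr/local/mysql/data

Initialize the database:

mysqld --initialize --user=mysql --basedir=/usr/local/mysql/ --datadir=/usr/local/mysql/data

(Note: this step prints the temporary initial password for the mysql root account. Write it down!)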
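
Because the temporary password is easy to miss in the initialization output, it can help to save that output and extract the password from it. The sketch below assumes the output was redirected to a file; the original runs the command without redirection, so the log path and the function name extract_temp_password are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pull the temporary root password out of saved "mysqld --initialize" output.
# ASSUMPTION: the output was saved to a file, e.g. with
#   mysqld --initialize ... 2>&1 | tee /tmp/mysql-init.log
# (/tmp/mysql-init.log is a hypothetical path, not from the original).
extract_temp_password() {
    local log_file="$1"
    # The password is everything after "root@localhost: " on the matching line;
    # tail keeps the most recent match if the file holds several runs.
    sed -n 's/.*A temporary password is generated for root@localhost: //p' "$log_file" | tail -n 1
}
```

For example, extract_temp_password /tmp/mysql-init.log would print only the password, which is then used as the initial password when changing it in step (6).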

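(5) Enable mysql at boot and start it

cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mysql
chkconfig --add mysql
chkconfig --level 345 mysql on
service mysql start

(6) Change the initial mysql password

mysqladmin -uroot -p'initial password' password '123456'

(Replace 'initial password' with the temporary password generated during initialization.)

(7) Install mysql-shell

rpm -ivh mysql-shell-8.0.11-1.el7.x86_64.rpm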

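2. Configure the InnoDB Cluster

(1) Check and configure the instance (on every mysql node)

mysqlsh

Check the instance:

dba.checkInstanceConfiguration('root@localhost:3306')

As shown in the figure:

[Figure 2: output of the instance check]

Configure the instance:

dba.configureLocalInstance('root@localhost:3306');

At the first prompt enter 1; at the second enter 192.168.2.% (adjust this to the actual network at the customer site); at the third give the location of the mysql configuration file, /etc/my.cnf; at the fourth enter y; at the fifth enter y, as shown in the figure:

[Figure 3: interactive prompts of configureLocalInstance]

Exit mysqlsh by entering \q

Restart the mysql database:

service mysql restart

Log in to mysqlsh again and verify the instance configuration:

mysqlsh

Check the instance:

dba.checkInstanceConfiguration('root@localhost:3306')

Output like that in the figure means the check passed:

[Figure 4: instance check passing]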

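(2) Create the cluster (first confirm that every SQL node has completed the instance configuration above and passed the check)

On any one of the mysql instance nodes, run:

mysqlsh
shell.connect('root@sql-1:3306')
var cluster = dba.createCluster('yspCluster');

If creation succeeds, the output contains a line similar to "Cluster successfully created."

Add the other two nodes to the cluster:

cluster.addInstance('root@sql-2:3306');
cluster.addInstance('root@sql-3:3306');

The whole process is shown in the figure:

[Figure 5: creating the cluster and adding instances]

Check the configuration of the cluster and of each node to confirm that the cluster was set up correctly:

cluster.status();

Output like the following means it succeeded:

[Figure 6: cluster.status() output]

(3) Persist the configuration to the configuration file (run on every SQL node)

mysqlsh
dba.configureLocalInstance('root@localhost:3306');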

IV. Configure MySQL-Router

1. Install MySQL-Router and mysqlsh (on all rdc-manager hosts: rdc-manager1 and rdc-manager2)

rpm -ivh mysql-shell-8.0.11-1.el7.x86_64.rpm
rpm -ivh mysql-router-8.0.11-1.el7.x86_64.rpm

2. Configure MySQL-Router (on all rdc-manager hosts)

(1) Bootstrap mysqlrouter

mysqlrouter --bootstrap root@sql-1:3306 --user=mysqlrouter

As shown in the figure:

[Figure 7: mysqlrouter bootstrap output]

(2) Enable MySQL-Router at boot and start it

Edit /etc/rc.local and append the following:

nohup mysqlrouter &

Make the file executable:

chmod +x /etc/rc.local

V. Test MySQL-Router (on the rdc-manager1 host)

1. Install the mysql client on rdc-manager1

(1) Add the mysql group and the mysql user

groupadd mysql
useradd -r -g mysql -s /bin/false mysql

(2) Unpack and install mysql

tar -zvxf mysql-5.7.19-linux-glibc2.12-x86_64.tar.gz
mv  mysql-5.7.19-linux-glibc2.12-x86_64 /usr/local/mysql

(3) Set the mysql environment variables

Edit /etc/profile.d/mysql.sh and add the following:

MYSQL=/usr/local/mysql/bin
export PATH=$PATH:$MYSQL

Load the mysql environment variables:

source /etc/profile.d/mysql.sh

2. From rdc-manager1, connect to the router on each of the two rdc-manager nodes

mysql -uroot -p -h 192.168.2.109 -P 6446
mysql -uroot -p -h 192.168.2.110 -P 6446

As shown in the figure:

[Figure 8: logging in through both routers]

A successful login means the configuration works.
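
Beyond a successful login, it can be useful to see which MySQL server actually answered behind each router. The sketch below asks the server for its host name through a given router; the function name router_backend is illustrative, and the expectation that port 6446 leads to the current primary is an assumption about the default bootstrap configuration.

```shell
#!/usr/bin/env bash
# Sketch: report which backend MySQL server answers behind a router.
# router_backend is an illustrative name (assumption).
router_backend() {
    local host="$1" port="${2:-6446}"
    # -N -B print only the bare value; @@hostname is the host name of the
    # MySQL server the router forwarded the connection to. -p prompts for
    # the root password, as in the manual test above.
    mysql -uroot -p -h "$host" -P "$port" -N -B -e 'SELECT @@hostname'
}
```

Running router_backend 192.168.2.109 and router_backend 192.168.2.110 should normally print the same host name if both routers forward port 6446 to the same primary.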

VI. Configure Keepalived

(1) Install the dependency packages (adjust to the actual environment) (on all rdc-manager nodes)

rpm -ivh perl-*
rpm -ivh net-snmp-*

Install keepalived:

rpm -ivh keepalived-1.3.5-1.el7.x86_64.rpm

(2) Configure keepalived (rdc-manager1)

Change the contents of /etc/keepalived/keepalived.conf as follows:

[Figure 9: keepalived.conf on rdc-manager1]

(3) Configure keepalived (rdc-manager2)

Change the contents of /etc/keepalived/keepalived.conf as follows:

[Figure 10: keepalived.conf on rdc-manager2]
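
The two configuration files are shown only in the figures. As a hedged sketch of what such a pair typically looks like, the script below writes a keepalived.conf for one node. Only the VIP (192.168.2.100), the router port (6446) and the check script location come from this document; the interface name eth0, virtual_router_id 51, the priorities, the auth_pass value, the check interval and weight, and the function name write_keepalived_conf are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch: generate a keepalived.conf for one MySQL-Router node.
# ASSUMPTIONS: interface name, virtual_router_id, priorities, auth_pass,
# interval and weight are placeholders, not the original author's values.
write_keepalived_conf() {
    local state="$1" priority="$2" out="${3:-/etc/keepalived/keepalived.conf}"
    local iface="${IFACE:-eth0}"   # assumed NIC name; check with "ip addr"
    cat > "$out" <<EOF
vrrp_script chk_mysqlrouter {
    script "/etc/keepalived/check_port.sh 6446"
    interval 2
    weight -20
}

vrrp_instance VI_1 {
    state $state
    interface $iface
    virtual_router_id 51
    priority $priority
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_script {
        chk_mysqlrouter
    }
    virtual_ipaddress {
        192.168.2.100
    }
}
EOF
}
```

Under these assumptions, rdc-manager1 would use write_keepalived_conf MASTER 100 and rdc-manager2 write_keepalived_conf BACKUP 90. With a weight of -20, a failed port check drops the master's effective priority to 80, below the backup's 90, so the VIP moves to the node whose router is still listening.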

(4) Add the check script to the /etc/keepalived directory (on all rdc-manager nodes)

Place check_port.sh in /etc/keepalived and make it executable:

chmod +x check_port.sh

The script contents are as follows:

#!/usr/bin/env bash
#
##############################################################
# Author sam                                                 #
##############################################################
# Check_Port - Keepalived port detection script              #
#                                                            #
# Date: 2018-05-09                                           #
#                                                            #
# System: CentOS 7                                           #
#                                                            #
##############################################################
# Development environment CentOS 7                           #
##############################################################
# Parameters for the script
#
CHECK_PORT=$1              # Port to check
#
# Matches when the port appears as a whole word in the ss output read from stdin
#
CHECK_PORT_COMMANDS="grep -w $CHECK_PORT"
#
################    Main Script ##############################
#
if [ "$CHECK_PORT" != "" ];then
#
# Check the port; if it is not listening, wait 2 seconds and retry once,
# then exit 1 so that keepalived treats the check as failed
#
    if ! ss -tpnl|$CHECK_PORT_COMMANDS &>/dev/null;then
        sleep 2
        ss -tpnl|$CHECK_PORT_COMMANDS &>/dev/null || exit 1
    fi
fi

Make sure the script's file name matches the name used in keepalived.conf.

(5) Enable keepalived at boot (on all rdc-manager nodes)

systemctl enable keepalived

(6) Start Keepalived

systemctl start keepalived

(7) From rdc-manager1, connect to the MySQL-Router VIP to verify availability

mysql -uroot -h 192.168.2.100 -P 6446 -p

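VII. Common problems

1. If a node executed write operations before joining the cluster, joining fails with an error like the following:

The server is not configured properly to be an active member of the group. Please see more details on error log.. Query: START group_replication (RuntimeError)

The error log on the node contains something like:

[ERROR] Plugin group_replication reported: 'This member has more executed transactions than those present in the group. Local transactions: 605da5eb-347d-11e7-b68b-bef8d5ac5be4:1,

cf2fe6ca-3460-11e7-aab3-bef8d5ac5be4:1-7 > Group transactions: 8399a91c-3483-11e7-b68b-bef8d5ac5be4:1-5,

cf2fe6ca-3460-11e7-aab3-bef8d5ac5be4:1-15'

Fix: log in to the node and run reset master. Note that reset master deletes the node's binary logs and clears its GTID execution history, so use it only on a node whose local transactions can be discarded.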

2. After a node is restarted, it has to be rejoined to the cluster manually

[Figure 11]

Rejoin the node to the cluster (for example with cluster.rejoinInstance, as used in item 4 below):

[Figure 12]

3. All nodes in the cluster were restarted and are all offline, so fetching the cluster directly fails

mysql-js> var cluster=dba.getCluster('yspCluster')

Dba.getCluster: This function is not available through a session to a standalone instance (RuntimeError)

Run rebootClusterFromCompleteOutage to restore the cluster:

mysql-js> dba.rebootClusterFromCompleteOutage('yspCluster')
Reconfiguring the cluster 'mycluster' from complete outage...
The instance 'sql-2:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y|N]: y
The instance 'sql-3:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y|N]: y
The cluster was successfully rebooted.

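4. Split-brain (loss of quorum) scenario

When some nodes in the cluster become UNREACHABLE and the remaining members no longer form a majority, the cluster cannot reach decisions. The status below shows this situation: only one node is still active, and it can serve queries but not writes; any write operation hangs.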

mysql-js> cluster.status()
{
    "clusterName": "mycluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "sql-1:3306",
        "status": "NO_QUORUM",
        "statusText": "Cluster has no quorum as visible from 'sql-1:3306' and cannot process write transactions. 2 members are not active",
        "topology": {
            "sql-1:3306": {
                "address": "sql-1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "sql-2:3306": {
                "address": "sql-2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "UNREACHABLE"
            },
            "sql-3:3306": {
                "address": "sql-3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            }
        }
    }
}

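To repair this state, run forceQuorumUsingPartitionOf and specify the currently active node (if several nodes are active, choose the primary node). The active node can then serve reads and writes again, after which the other nodes are rejoined to the cluster.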

mysql-js> cluster.forceQuorumUsingPartitionOf('root@sql-1:3306')
Restoring replicaset 'default' from loss of quorum, by using the partition composed of [sql-1:3306]
Please provide the password for 'root@sql-1:3306':
Restoring the InnoDB cluster ...
The InnoDB cluster was successfully restored using the partition from the instance 'root@sql-1:3306'.
WARNING: To avoid a split-brain scenario, ensure that all other members of the replicaset are removed or joined back to the group that was restored.
mysql-js> cluster.status()
{
    "clusterName": "mycluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "sql-1:3306",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 2 members are not active",
        "topology": {
            "sql-1:3306": {
                "address": "sql-1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "sql-2:3306": {
                "address": "sql-2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            },
            "sql-3:3306": {
                "address": "sql-3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            }
        }
    }
}
mysql-js> cluster.rejoinInstance('root@sql-2:3306')
mysql-js> cluster.rejoinInstance('root@sql-3:3306')

5. Common database problems

(1) To write data on a secondary (for example, to run SQL statements during maintenance), super_read_only must be changed from ON to OFF

mysql> show global variables like '%read_only%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| innodb_read_only | OFF   |
| read_only        | ON    |
| super_read_only  | ON    |
| tx_read_only     | OFF   |
+------------------+-------+
mysql> set global read_only=0;

Setting read_only to OFF also forces super_read_only to OFF.


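Node states

ONLINE - The node is healthy.

OFFLINE - The instance is running but has not joined any cluster.

RECOVERING - The instance has joined the cluster and is synchronizing data.

ERROR - An error occurred while synchronizing data.

UNREACHABLE - Communication with the other nodes is broken; the cause may be a network problem or a node crash.

MISSING - The node has been added to the cluster, but group replication is not running on it.

Cluster states

OK - All nodes are online and there are redundant nodes.

OK_PARTIAL - Some nodes are unavailable, but there are still redundant nodes.

OK_NO_TOLERANCE - Enough nodes are online, but there is no redundancy. For example, in a two-node cluster, if one node fails the cluster becomes unavailable.

NO_QUORUM - Some nodes are online, but fewer than the quorum; in this state the cluster cannot accept writes, only reads.

UNKNOWN - The instance is not in the ONLINE or RECOVERING state; try connecting to another instance to check the status.

UNAVAILABLE - All nodes in the group are offline but the instances are running; the instances may have just restarted and not yet rejoined the cluster.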
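
The difference between OK, OK_NO_TOLERANCE and NO_QUORUM follows from simple majority arithmetic: a group of n members needs more than half of them online to keep a quorum. A small sketch of that calculation (the function names are illustrative):

```shell
#!/usr/bin/env bash
# Sketch: majority size and number of tolerated failures for an n-member group.
quorum_size()       { echo $(( $1 / 2 + 1 )); }      # smallest majority of n
failure_tolerance() { echo $(( ($1 - 1) / 2 )); }    # members that may fail

for n in 2 3 5; do
    echo "members=$n quorum=$(quorum_size "$n") tolerated_failures=$(failure_tolerance "$n")"
done
```

This is why the three-node cluster in this document survives the loss of one node, while a two-node cluster tolerates none.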

The end!

