proxysql + mysqlshell + innodbcluster 动态扩展mgr测试 (使用独立ip)

mysql shell 安装

参考我写的另一篇文章: https://blog.csdn.net/eaglejiawo1120/article/details/84301142

测试环境

主机ip mgr端口 是否可读写
192.168.1.103 3210(mysql启动端口),32100(mgr管理端口)
192.168.1.104 3210(mysql启动端口),32100(mgr管理端口)
192.168.1.104 3220(mysql启动端口),32200(mgr管理端口)

 

 

 

 

mysqlshell 安装位置 192.168.1.103
proxysql 安装位置 192.168.1.103,192.168.1.104

 

 

 

注意:本测试中涉及到mysql用户登录密码的部分都为空(只是为了方便测试)。

 

cluster状态值含义

mysqlshell查看cluster的状态时,可以看到如下几种值:

  • ONLINE: The instance is online and participating in the cluster.

  • OFFLINE: The instance has lost connection to the other instances.

  • RECOVERING: The instance is attempting to synchronize with the cluster by retrieving transactions it needs before it can become an ONLINE member.

  • UNREACHABLE: The instance has lost communication with the cluster.

  • ERROR: The instance has encountered an error during the recovery phase or while applying a transaction.

Important

Once an instance enters ERROR state, the super_read_only option is set to ON. To leave the ERROR state you must manually configure the instance with super_read_only=OFF.

  • (MISSING): The state of an instance which is part of the configured cluster, but is currently unavailable.

Note

The MISSING state is specific to InnoDB cluster, it is not a state generated by Group Replication. MySQL Shell uses this state to indicate instances that are registered in the metadata, but cannot be found in the live cluster view.

 

 

mgr 状态含义

Field

Description

Group Synchronized

ONLINE

The member is ready to serve as a fully functional group member, meaning that the client can connect and start executing transactions.

Yes

RECOVERING

The member is in the process of becoming an active member of the group and is currently going through the recovery process, receiving state information from a donor.

No

OFFLINE

The plugin is loaded but the member does not belong to any group.

No

ERROR

The state of the member. Whenever there is an error on the recovery phase or while applying changes, the server enters this state.

No

UNREACHABLE

Whenever the local failure detector suspects that a given server is not reachable, because maybe it has crashed or was disconnected involuntarily, it shows that server's state as 'unreachable'.

No

Important

Once an instance enters ERROR state, the super_read_only option is set to ON. To leave the ERROR state you must manually configure the instance withsuper_read_only=OFF.

 

创建一个初始化的写实例。

创建步骤如下:

假设mysql的安装目录是:/usr/local/mysql5.7.24

步骤如下:

cd /usr/local/mysql5.7.24  && mkdir -p mgr_data/s3210 && cd mgr_data/s3210

/usr/local/mysql5.7.24/mysqld --no-defaults --initialize-insecure --basedir=/usr/local/mysql5.7.24  --datadir=/home/mysql_test/mysql_5.7/mgr_data/s3210 --explicit_defaults_for_timestamp

chown -R mysql:mysql /home/mysql_test/mysql_5.7/mgr_data/s3210

创建如下s3210.cnf

[mysqld]
# server configuration
datadir=/home/mysql_test/mysql_5.7/mgr_data/s3210
basedir=/usr/local/mysql5.7.24

port=3210
socket=/home/mysql_test/mysql_5.7/mgr_data/s3210/s3210.sock
server_id=3210
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW

transaction_write_set_extraction=XXHASH64
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "192.168.1.103:32100"
loose-group_replication_bootstrap_group= off

report_host=192.168.1.103
relay-log=relay-bin

将改配置复制到: /home/mysql_test/mysql_5.7/mgr_data/s3210目录下。

启动实例: /usr/local/mysql5.7.24/bin/mysqld_safe --defaults-file=/home/mysql_test/mysql_5.7/mgr_data/s3210/s3210.cnf &

 

测试写实例并添加到cluster

先按照官网(参考1)的地址配置权限。

如下:

CREATE USER mgr_admin@'%' identified by '';
GRANT ALL PRIVILEGES ON mysql_innodb_cluster_metadata.* TO mgr_admin@'%' WITH GRANT OPTION;
GRANT RELOAD, SHUTDOWN, PROCESS, FILE, SUPER, REPLICATION SLAVE, REPLICATION CLIENT, \
CREATE USER ON *.* TO mgr_admin@'%' WITH GRANT OPTION;
GRANT SELECT ON *.* TO mgr_admin@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

这样配置完成后:

 MySQL  JS > dba.checkInstanceConfiguration('[email protected]:3210')
Please provide the password for '[email protected]:3210':
Validating instance...

Dba.checkInstanceConfiguration: Session account 'mgr_admin'@'%' (used to authenticate 'mgr_admin'@'192.168.1.103') does not have all the required privileges to execute this operation. For more information, see the online documentation. (RuntimeError)

从官网上没有找到对应的解决思路,索性先将权限放开进行测试:

#创建用户和初始化cluster时关闭sql_bin,以免主从切换sql冲突
set sql_log_bin=0;
CREATE USER mgr_admin@'%' identified by '';
GRANT ALL PRIVILEGES ON *.* TO mgr_admin@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

在进行测试:

 MySQL  JS > dba.checkInstanceConfiguration('[email protected]:3210')
Please provide the password for '[email protected]:3210':
Validating instance...

The instance '192.168.1.103:3210' is valid for Cluster usage
{
    "status": "ok"
}
好了,验证通过。


 MySQL 192.168.1.103:3210  JS > dba.createCluster('mgr_test');
A new InnoDB cluster will be created on instance '[email protected]:3210'.

Creating InnoDB cluster 'cluster_test' on '[email protected]:3210'...
Adding Seed Instance...

Cluster successfully created. Use Cluster.addInstance() to add MySQL instances.
At least 3 instances are needed for the cluster to be able to withstand up to
one server failure.


创建cluster也成功,此时需要将sql_log_bin设置为1.

 

配置两个读实例

两个读实例的配置如下:

s3210.cnf

[mysqld]
# server configuration
datadir=/home/mysql_test/mysql_5.7/mgr_data/s3210
basedir=/usr/local/mysql5.7.24

port=3210
socket=/home/mysql_test/mysql_5.7/mgr_data/s3210/s3210.sock
server_id=3220
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW

transaction_write_set_extraction=XXHASH64
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "192.168.1.104:32100"
loose-group_replication_bootstrap_group= off

report_host=192.168.1.104
relay-log=relay-bin

s3220.cnf

[mysqld]
# server configuration
datadir=/home/mysql_test/mysql_5.7/mgr_data/s3220
basedir=/usr/local/mysql5.7.24

port=3220
socket=/home/mysql_test/mysql_5.7/mgr_data/s3220/s3220.sock
server_id=3230
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW

transaction_write_set_extraction=XXHASH64
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "192.168.1.104:32200"
loose-group_replication_bootstrap_group= off

report_host=192.168.1.104
relay-log=relay-bin

  分别启动两个读实例.并且添加mgr_admin用户并且赋予权限。同写实例的权限配置。

 

写实例中添加两个读实例到cluster

在192.168.1.103上执行如下脚本:

mysqlsh

 MySQL  192.168.1.103:3210  JS > dba.checkInstanceConfiguration('[email protected]:3210')
Please provide the password for '[email protected]:3210':
Validating instance...

The instance '192.168.1.104:3210' is valid for Cluster usage
{
    "status": "ok"
}

 MySQL  192.168.1.103:3210  JS > dba.checkInstanceConfiguration('[email protected]:3220')
Please provide the password for '[email protected]:3220':
Validating instance...

The instance '192.168.1.104:3220' is valid for Cluster usage
{
    "status": "ok"
}

 MySQL  192.168.1.103:3210  JS > var cluster=dba.getCluster()

 MySQL  192.168.1.103:3210  JS > cluster.addInstance("[email protected]:3210")
A new instance will be added to the InnoDB cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.

Please provide the password for '[email protected]:3210':
Adding instance to the cluster ...

Cluster.addInstance: ERROR:
Group Replication join failed.
ERROR: Error joining instance to cluster: '192.168.1.104:3210' - Query failed. MySQL Error (3092): ClassicSession.query: The server is not configured properly to be an active member of the group. Please see more details on error log.. Query: START group_replication (RuntimeError)

如果碰到以上错误执行: reset master 即可。

 cluster.addInstance("[email protected]:3210")

 cluster.addInstance("[email protected]:3210")

 cluster.status()

MySQL 192.168.1.103:3210  JS > cluster.status();
{
    "clusterName": "mgr_test",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "192.168.1.103:3210",
        "ssl": "DISABLED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 2 members are not active",
        "topology": {
            "192.168.1.104:3210": {
                "address": "192.168.1.104:3210",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            },
            "192.168.1.104:3220": {
                "address": "192.168.1.104:3220",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "RECOVERING"
            },
            "192.168.1.103:3210": {
                "address": "192.168.1.103:3210",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://[email protected]:3210"
}

可以看到两个读实例还是没有能加入到集群中。通过日志查看:有sql执行冲突。

冲突位置:

2018-11-26T09:41:16.568507Z 100 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000001' position 150.

所以当前写实例192.168.1.103上的binlog.000001,position为150后的事件gtid.

而后在两个读实例中跳过该事件即可。

具体操作如下:

stop group_replication;

SET GTID_NEXT='f65d4372-f145-11e8-94e9-005056a17264:1';

begin;

commit;

SET GTID_NEXT='AUTOMATIC';

start group_replication;

而后在读实例中mysqlshell中查看两个读实例的当前状态。如果还有错则继续查询引起错误的事件,解决即可。

本例中冲突只是因为创建用户的时候在读写实例都执行了。只要将该冲突解决即可。

一般出现如上错误是因为创建读实例时,添加mgr_admin以及cluster加入读实例的语句没有使用sql_log_bin=0屏蔽导致的。

 

测试写实例停掉再启动


测试中将192.168.1.103中的3210实例停掉后,192.168.1.104的3210实例变为新的写实例。且此时访问192.168.1.104的3210实例,查看当前cluster的状态时显示为missing.

192.168.1.103的3210实例上通过mysqlsh, 执行

cluster=dba.getCluster();

cluster.rejoinInstance('[email protected]:3210')

在192.168.1.103中的3210实例中执行:

stop group_replication;

set global read_only=false;

set sql_log_bin = 0;
drop database mysql_innodb_cluster_metadata;

set sql_log_bin = 1;

reset master;

start group_replication;

再次查看状态,ok。

 

测试读实例停掉再启动

同写实例恢复

 

测试所有实例都挂掉的情况

192.168.1.103的3210实例上通过mysqlsh, 执行

dba.rebootClusterFromCompleteOutage('mgr_test')

然后根据提示恢复即可。

通过

cluster=dba.getCluster();

cluster.getStatus()来确认是否恢复成功。

 

no_quorum的情况

执行:

cluster=dba.getCluster();

cluster.getStatus()

结果如下:

{
    "clusterName": "mgr_test",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "192.168.1.103:3210",
        "ssl": "DISABLED",
        "status": "NO_QUORUM",
        "statusText": "Cluster has no quorum as visible from '192.168.1.103:3210' and cannot process write transactions. 2 members are not active",
        "topology": {
            "192.168.1.104:3210": {
                "address": "192.168.1.104:3210",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            },
            "192.168.1.104:3220": {
                "address": "192.168.1.104:3220",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "UNREACHABLE"
            },
            "192.168.1.103:3210": {
                "address": "192.168.1.103:3210",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://[email protected]:3210"

使用online的主机进行恢复cluster.

cluster.forceQuorumUsingPartitionOf("192.168.1.103:3210")

对于192.168.1.104:3220,192.168.1.104:3210。执行cluster.rejoinInstance即可。

 

 

proxysql 监控服务器的可用性

关于 proxysql配置监控的过程不再赘述,参考: 

proxysql安装及集群搭建: https://blog.csdn.net/eaglejiawo1120/article/details/84580419

 

参考

1. https://dev.mysql.com/doc/refman/5.7/en/mysql-innodb-cluster-production-deployment.html

2.https://lefred.be/content/mysql-innodb-cluster-mysql-shell-and-the-adminapi/

3.binlog查看: https://blog.csdn.net/u010433704/article/details/54962680

4.mgr 集群quorum:https://dev.mysql.com/doc/refman/8.0/en/group-replication-fault-tolerance.html

5. cluster status含义:https://dev.mysql.com/doc/refman/8.0/en/mysql-innodb-cluster-working-with-cluster.html

6. mgr状态含义:https://dev.mysql.com/doc/refman/8.0/en/group-replication-server-states.html

 

 

你可能感兴趣的:(数据库,mysql,复制方案)