参考我写的另一篇文章: https://blog.csdn.net/eaglejiawo1120/article/details/84301142
主机ip | mgr端口 | 是否可读写 |
192.168.1.103 | 3210(mysql启动端口),32100(mgr管理端口) | 写 |
192.168.1.104 | 3210(mysql启动端口),32100(mgr管理端口) | 读 |
192.168.1.104 | 3220(mysql启动端口),32200(mgr管理端口) | 读 |
mysqlshell 安装位置 | 192.168.1.103 |
proxysql 安装位置 | 192.168.1.103,192.168.1.104 |
注意:本测试中涉及到mysql用户登录密码的部分都为空(只是为了方便测试)。
mysqlshell查看cluster的状态时,可以看到如下几种值:
ONLINE
: The instance is online and participating in the cluster.
OFFLINE
: The instance has lost connection to the other instances.
RECOVERING
: The instance is attempting to synchronize with the cluster by retrieving transactions it needs before it can become an ONLINE
member.
UNREACHABLE
: The instance has lost communication with the cluster.
ERROR
: The instance has encountered an error during the recovery phase or while applying a transaction.
Important
Once an instance enters ERROR
state, the super_read_only
option is set to ON
. To leave the ERROR
state you must manually configure the instance with super_read_only=OFF
.
(MISSING)
: The state of an instance which is part of the configured cluster, but is currently unavailable.Note
The MISSING
state is specific to InnoDB cluster, it is not a state generated by Group Replication. MySQL Shell uses this state to indicate instances that are registered in the metadata, but cannot be found in the live cluster view.
Field |
Description |
Group Synchronized |
---|---|---|
ONLINE |
The member is ready to serve as a fully functional group member, meaning that the client can connect and start executing transactions. |
Yes |
RECOVERING |
The member is in the process of becoming an active member of the group and is currently going through the recovery process, receiving state information from a donor. |
No |
OFFLINE |
The plugin is loaded but the member does not belong to any group. |
No |
ERROR |
The state of the member. Whenever there is an error on the recovery phase or while applying changes, the server enters this state. |
No |
UNREACHABLE |
Whenever the local failure detector suspects that a given server is not reachable, because maybe it has crashed or was disconnected involuntarily, it shows that server's state as 'unreachable'. |
No |
Important
Once an instance enters ERROR
state, the super_read_only
option is set to ON
. To leave the ERROR
state you must manually configure the instance withsuper_read_only=OFF
.
创建步骤如下:
假设mysql的安装目录是:/usr/local/mysql5.7.24
步骤如下:
cd /usr/local/mysql5.7.24 && mkdir -p mgr_data/s3210 && cd mgr_data/s3210
/usr/local/mysql5.7.24/mysqld --no-defaults --initialize-insecure --basedir=/usr/local/mysql5.7.24 --datadir=/home/mysql_test/mysql_5.7/mgr_data/s3210 --explicit_defaults_for_timestamp
chown -R mysql:mysql /home/mysql_test/mysql_5.7/mgr_data/s3210
创建如下s3210.cnf
[mysqld]
# server configuration
datadir=/home/mysql_test/mysql_5.7/mgr_data/s3210
basedir=/usr/local/mysql5.7.24
port=3210
socket=/home/mysql_test/mysql_5.7/mgr_data/s3210/s3210.sock
server_id=3210
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW
transaction_write_set_extraction=XXHASH64
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "192.168.1.103:32100"
loose-group_replication_bootstrap_group= off
report_host=192.168.1.103
relay-log=relay-bin
将改配置复制到: /home/mysql_test/mysql_5.7/mgr_data/s3210目录下。
启动实例: /usr/local/mysql5.7.24/bin/mysqld_safe --defaults-file=/home/mysql_test/mysql_5.7/mgr_data/s3210/s3210.cnf &
先按照官网(参考1)的地址配置权限。
如下:
CREATE USER mgr_admin@'%' identified by '';
GRANT ALL PRIVILEGES ON mysql_innodb_cluster_metadata.* TO mgr_admin@'%' WITH GRANT OPTION;
GRANT RELOAD, SHUTDOWN, PROCESS, FILE, SUPER, REPLICATION SLAVE, REPLICATION CLIENT, \
CREATE USER ON *.* TO mgr_admin@'%' WITH GRANT OPTION;
GRANT SELECT ON *.* TO mgr_admin@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
这样配置完成后:
MySQL JS > dba.checkInstanceConfiguration('[email protected]:3210')
Please provide the password for '[email protected]:3210':
Validating instance...
Dba.checkInstanceConfiguration: Session account 'mgr_admin'@'%' (used to authenticate 'mgr_admin'@'192.168.1.103') does not have all the required privileges to execute this operation. For more information, see the online documentation. (RuntimeError)
从官网上没有找到对应的解决思路,索性先将权限放开进行测试:
#创建用户和初始化cluster时关闭sql_bin,以免主从切换sql冲突
set sql_log_bin=0;
CREATE USER mgr_admin@'%' identified by '';
GRANT ALL PRIVILEGES ON *.* TO mgr_admin@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
在进行测试:
MySQL JS > dba.checkInstanceConfiguration('[email protected]:3210')
Please provide the password for '[email protected]:3210':
Validating instance...
The instance '192.168.1.103:3210' is valid for Cluster usage
{
"status": "ok"
}
好了,验证通过。
MySQL 192.168.1.103:3210 JS > dba.createCluster('mgr_test');
A new InnoDB cluster will be created on instance '[email protected]:3210'.
Creating InnoDB cluster 'cluster_test' on '[email protected]:3210'...
Adding Seed Instance...
Cluster successfully created. Use Cluster.addInstance() to add MySQL instances.
At least 3 instances are needed for the cluster to be able to withstand up to
one server failure.
创建cluster也成功,此时需要将sql_log_bin设置为1.
两个读实例的配置如下:
s3210.cnf
[mysqld]
# server configuration
datadir=/home/mysql_test/mysql_5.7/mgr_data/s3210
basedir=/usr/local/mysql5.7.24
port=3210
socket=/home/mysql_test/mysql_5.7/mgr_data/s3210/s3210.sock
server_id=3220
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW
transaction_write_set_extraction=XXHASH64
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "192.168.1.104:32100"
loose-group_replication_bootstrap_group= off
report_host=192.168.1.104
relay-log=relay-bin
s3220.cnf
[mysqld]
# server configuration
datadir=/home/mysql_test/mysql_5.7/mgr_data/s3220
basedir=/usr/local/mysql5.7.24
port=3220
socket=/home/mysql_test/mysql_5.7/mgr_data/s3220/s3220.sock
server_id=3230
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW
transaction_write_set_extraction=XXHASH64
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "192.168.1.104:32200"
loose-group_replication_bootstrap_group= off
report_host=192.168.1.104
relay-log=relay-bin
分别启动两个读实例.并且添加mgr_admin用户并且赋予权限。同写实例的权限配置。
在192.168.1.103上执行如下脚本:
mysqlsh
MySQL 192.168.1.103:3210 JS > dba.checkInstanceConfiguration('[email protected]:3210')
Please provide the password for '[email protected]:3210':
Validating instance...
The instance '192.168.1.104:3210' is valid for Cluster usage
{
"status": "ok"
}
MySQL 192.168.1.103:3210 JS > dba.checkInstanceConfiguration('[email protected]:3220')
Please provide the password for '[email protected]:3220':
Validating instance...
The instance '192.168.1.104:3220' is valid for Cluster usage
{
"status": "ok"
}
MySQL 192.168.1.103:3210 JS > var cluster=dba.getCluster()
MySQL 192.168.1.103:3210 JS > cluster.addInstance("[email protected]:3210")
A new instance will be added to the InnoDB cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.
Please provide the password for '[email protected]:3210':
Adding instance to the cluster ...
Cluster.addInstance: ERROR:
Group Replication join failed.
ERROR: Error joining instance to cluster: '192.168.1.104:3210' - Query failed. MySQL Error (3092): ClassicSession.query: The server is not configured properly to be an active member of the group. Please see more details on error log.. Query: START group_replication (RuntimeError)
如果碰到以上错误执行: reset master 即可。
cluster.addInstance("[email protected]:3210")
cluster.addInstance("[email protected]:3210")
cluster.status()
MySQL 192.168.1.103:3210 JS > cluster.status();
{
"clusterName": "mgr_test",
"defaultReplicaSet": {
"name": "default",
"primary": "192.168.1.103:3210",
"ssl": "DISABLED",
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures. 2 members are not active",
"topology": {
"192.168.1.104:3210": {
"address": "192.168.1.104:3210",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "(MISSING)"
},
"192.168.1.104:3220": {
"address": "192.168.1.104:3220",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "RECOVERING"
},
"192.168.1.103:3210": {
"address": "192.168.1.103:3210",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
}
}
},
"groupInformationSourceMember": "mysql://[email protected]:3210"
}
可以看到两个读实例还是没有能加入到集群中。通过日志查看:有sql执行冲突。
冲突位置:
2018-11-26T09:41:16.568507Z 100 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000001' position 150.
所以当前写实例192.168.1.103上的binlog.000001,position为150后的事件gtid.
而后在两个读实例中跳过该事件即可。
具体操作如下:
stop group_replication;
SET GTID_NEXT='f65d4372-f145-11e8-94e9-005056a17264:1';
begin;
commit;
SET GTID_NEXT='AUTOMATIC';
start group_replication;
而后在读实例中mysqlshell中查看两个读实例的当前状态。如果还有错则继续查询引起错误的事件,解决即可。
本例中冲突只是因为创建用户的时候在读写实例都执行了。只要将该冲突解决即可。
一般出现如上错误是因为创建读实例时,添加mgr_admin以及cluster加入读实例的语句没有使用sql_log_bin=0屏蔽导致的。
测试中将192.168.1.103中的3210实例停掉后,192.168.1.104的3210实例变为新的写实例。且此时访问192.168.1.104的3210实例,查看当前cluster的状态时显示为missing.
192.168.1.103的3210实例上通过mysqlsh, 执行
cluster=dba.getCluster();
cluster.rejoinInstance('[email protected]:3210')
在192.168.1.103中的3210实例中执行:
stop group_replication;
set global read_only=false;
set sql_log_bin = 0;
drop database mysql_innodb_cluster_metadata;
set sql_log_bin = 1;
reset master;
start group_replication;
再次查看状态,ok。
同写实例恢复
192.168.1.103的3210实例上通过mysqlsh, 执行
dba.rebootClusterFromCompleteOutage('mgr_test')
然后根据提示恢复即可。
通过
cluster=dba.getCluster();
cluster.getStatus()来确认是否恢复成功。
执行:
cluster=dba.getCluster();
cluster.getStatus()
结果如下:
{
"clusterName": "mgr_test",
"defaultReplicaSet": {
"name": "default",
"primary": "192.168.1.103:3210",
"ssl": "DISABLED",
"status": "NO_QUORUM",
"statusText": "Cluster has no quorum as visible from '192.168.1.103:3210' and cannot process write transactions. 2 members are not active",
"topology": {
"192.168.1.104:3210": {
"address": "192.168.1.104:3210",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "(MISSING)"
},
"192.168.1.104:3220": {
"address": "192.168.1.104:3220",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "UNREACHABLE"
},
"192.168.1.103:3210": {
"address": "192.168.1.103:3210",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
}
}
},
"groupInformationSourceMember": "mysql://[email protected]:3210"
使用online的主机进行恢复cluster.
cluster.forceQuorumUsingPartitionOf("192.168.1.103:3210")
对于192.168.1.104:3220,192.168.1.104:3210。执行cluster.rejoinInstance即可。
关于 proxysql配置监控的过程不再赘述,参考:
proxysql安装及集群搭建: https://blog.csdn.net/eaglejiawo1120/article/details/84580419
参考
1. https://dev.mysql.com/doc/refman/5.7/en/mysql-innodb-cluster-production-deployment.html
2.https://lefred.be/content/mysql-innodb-cluster-mysql-shell-and-the-adminapi/
3.binlog查看: https://blog.csdn.net/u010433704/article/details/54962680
4.mgr 集群quorum:https://dev.mysql.com/doc/refman/8.0/en/group-replication-fault-tolerance.html
5. cluster status含义:https://dev.mysql.com/doc/refman/8.0/en/mysql-innodb-cluster-working-with-cluster.html
6. mgr状态含义:https://dev.mysql.com/doc/refman/8.0/en/group-replication-server-states.html