Official MMM configuration guide: http://mysql-mmm.org/mmm2:guide
Environment used in this article:
Master:  CentOS 6.7 x64  192.168.0.201  mysql-5.6.29
Master:  CentOS 6.7 x64  192.168.0.202  mysql-5.6.29
Slave:   CentOS 6.7 x64  192.168.0.203  mysql-5.6.29
Slave:   CentOS 6.7 x64  192.168.0.204  mysql-5.6.29
Monitor: CentOS 6.7 x64  192.168.0.205
MMM server role mapping (host, IP, MMM host name, server-id):
node1 (master 1)  192.168.0.201  db1  1
node2 (master 2)  192.168.0.202  db2  2
node3 (slave 1)   192.168.0.203  db3  11
node4 (slave 2)   192.168.0.204  db4  12
node5 (monitor)   192.168.0.205  mon  -
Once configuration is complete, the MySQL cluster is accessed through the following VIPs:
192.168.0.211 writer
192.168.0.212 reader
192.168.0.213 reader
192.168.0.214 reader
192.168.0.215 reader
I. Preparing the dual-master and slave environments
1. node1 (master 1) /etc/my.cnf
[mysqld]
server-id = 1
binlog-format = ROW
log-bin = master-bin
log-bin-index = master-bin.index
log-slave-updates = true
auto_increment_offset = 1
auto_increment_increment = 2
mysql> show master status;
+-------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-------------------+----------+--------------+------------------+-------------------+
| master-bin.000001 | 120 | | | |
+-------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
mysql>
2. node2 (master 2) /etc/my.cnf
[mysqld]
server-id = 2
binlog-format = ROW
log-bin = master-bin
log-bin-index = master-bin.index
log-slave-updates = true
auto_increment_offset = 2
auto_increment_increment = 2
mysql> show master status;
+-------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-------------------+----------+--------------+------------------+-------------------+
| master-bin.000001 | 120 | | | |
+-------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
mysql>
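Note that the two masters use the same auto_increment_increment = 2 but different auto_increment_offset values (1 on node1, 2 on node2), so auto-increment keys generated on node1 are always odd and those generated on node2 always even, which prevents primary-key collisions if both masters take writes. A quick way to verify the settings on each master (an optional sanity check, not part of the original steps):
# mysql -uroot -p -e "show variables like 'auto_increment%';"
On node1 this should report auto_increment_increment = 2 and auto_increment_offset = 1; on node2 the offset should be 2.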
3. Create the replication account on both masters (node1 and node2)
grant replication slave on *.* to 'repl'@'%' identified by '123456';
flush privileges;
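Before continuing, it is worth confirming from the other nodes that the repl account can actually log in to both masters (a quick connectivity check added here; adjust the password if yours differs):
# mysql -urepl -p123456 -h192.168.0.201 -e "select @@hostname;"
# mysql -urepl -p123456 -h192.168.0.202 -e "select @@hostname;"
Each command should print the hostname of the master it connected to; an access-denied error means the grant or the firewall still needs attention.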
4. node3 (slave) /etc/my.cnf
[mysqld]
binlog-format = ROW
log-bin = mysql-bin
relay-log = slave-relay-bin
relay-log-index = slave-relay-bin.index
log-slave-updates = true
server-id = 11
skip-name-resolve
5. node4 (slave) /etc/my.cnf
[mysqld]
binlog-format = ROW
log-bin = mysql-bin
relay-log = slave-relay-bin
relay-log-index = slave-relay-bin.index
log-slave-updates = true
server-id = 12
skip-name-resolve
6. Configure replication on node1 (as a slave of node2)
(1) Configure replication. These settings are executed manually and are written to the master.info file.
mysql >
CHANGE MASTER TO
MASTER_HOST='192.168.0.202',
MASTER_PORT=3306,
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_LOG_FILE='master-bin.000001',
MASTER_LOG_POS=120;
mysql> start slave;
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.0.202
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000001
Read_Master_Log_Pos: 407
Relay_Log_File: node1-relay-bin.000002
Relay_Log_Pos: 571
Relay_Master_Log_File: master-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
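The two Yes values for Slave_IO_Running and Slave_SQL_Running are the important part of this output. A one-liner such as the following (an optional helper, not part of the original steps) makes it easy to check replication health on any node without reading the full output:
# mysql -uroot -p -e "show slave status\G" | egrep 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Last_.*Error'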
7. Configure replication on node2 (as a slave of node1)
(1) Configure replication. These settings are executed manually and are written to the master.info file.
mysql >
CHANGE MASTER TO
MASTER_HOST='192.168.0.201',
MASTER_PORT=3306,
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_LOG_FILE='master-bin.000001',
MASTER_LOG_POS=120;
mysql> start slave;
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.0.201
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000001
Read_Master_Log_Pos: 694
Relay_Log_File: node2-relay-bin.000002
Relay_Log_Pos: 571
Relay_Master_Log_File: master-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
8. Test master-master replication
(1) Create a database on node1. (The user table and row shown below are created on node2 in step (2) and replicate back to node1, which confirms replication in both directions.)
mysql> create database mydb;
Query OK, 1 row affected (0.00 sec)
mysql> use mydb;
Database changed
mysql> show tables;
+----------------+
| Tables_in_mydb |
+----------------+
| user |
+----------------+
1 row in set (0.01 sec)
mysql> select * from user;
+----+----------+----------+
| id | username | password |
+----+----------+----------+
| 1 | koumm | 123456 |
+----+----------+----------+
1 row in set (0.01 sec)
mysql>
(2) On node2:
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mydb |
| mysql |
| performance_schema |
+--------------------+
4 rows in set (0.00 sec)
mysql> use mydb;
Database changed
mysql> CREATE TABLE `user` (
-> `id` varchar(20) NOT NULL,
-> `username` varchar(20) NOT NULL,
-> `password` char(32) NOT NULL,
-> PRIMARY KEY (`id`)
-> ) ;
Query OK, 0 rows affected (0.14 sec)
mysql> INSERT INTO user VALUES ('1', 'koumm', '123456');
Query OK, 1 row affected (0.00 sec)
mysql>
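To see the effect of the staggered auto_increment settings, you can also write to a table that actually uses an auto-increment key; the table t_inc below is a hypothetical example added for illustration, not part of the original test:
# on node1
# mysql -uroot -p -e "create table mydb.t_inc (id int auto_increment primary key, src varchar(10));"
# mysql -uroot -p -e "insert into mydb.t_inc (src) values ('node1'),('node1');"
# on node2
# mysql -uroot -p -e "insert into mydb.t_inc (src) values ('node2'),('node2');"
# mysql -uroot -p -e "select * from mydb.t_inc order by id;"
Rows inserted on node1 get odd ids and rows inserted on node2 get even ids, so concurrent writes on the two masters never collide on the primary key.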
II. Configure node3 and node4 to replicate from node1 (master)
1. Lock the tables on the node1 master and dump the database
(1) Lock the tables on the master
mysql> flush tables with read lock;
Query OK, 0 rows affected (0.00 sec)
mysql> show master status;
+-------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-------------------+----------+--------------+------------------+-------------------+
| master-bin.000001 | 1226 | | | |
+-------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
(2) Back up the master
[root@master ~]# mysqldump -uroot -p -B mydb > mydb.sql
Note: the -B option includes the CREATE DATABASE statement in the dump.
(3) Unlock the tables on the master
mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)
mysql>
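As an alternative to locking the tables by hand, mysqldump can record the binlog coordinates for you; for InnoDB tables something like the following (an optional variant of the step above) produces a consistent dump without holding a global read lock for long:
[root@master ~]# mysqldump -uroot -p --single-transaction --master-data=2 -B mydb > mydb.sql
--master-data=2 writes the matching CHANGE MASTER TO statement into the dump as a comment, so the log file and position can be read from the dump itself.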
2. Import the database on node3 and node4
# mysql -uroot -padmin < mydb.sql
3. Configure replication on node3 and node4
(1) Configure replication. These settings are executed manually and are written to the master.info file.
mysql >
CHANGE MASTER TO
MASTER_HOST='192.168.0.201',
MASTER_PORT=3306,
MASTER_USER='repl',
MASTER_PASSWORD='123456',
MASTER_LOG_FILE='master-bin.000001',
MASTER_LOG_POS=1226;
mysql> start slave;
Query OK, 0 rows affected (0.04 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.0.201
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000005
Read_Master_Log_Pos: 730
Relay_Log_File: slave-relay-bin.000002
Relay_Log_Pos: 284
Relay_Master_Log_File: master-bin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
You can run a simple test to confirm that data is replicating.
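For instance (a quick check using the user table created earlier), insert a row on node1 and confirm it shows up on node3 and node4:
# on node1
# mysql -uroot -p -e "insert into mydb.user values ('2','test','abc123');"
# on node3 and node4
# mysql -uroot -p -e "select * from mydb.user;"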
III. Configuring MMM
1. Install the MMM packages
(1) On node1, node2, node3, node4 and node5
# wget http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
# rpm -ivh epel-release-6-8.noarch.rpm
(2) On node5 (the monitor node)
# yum install mysql-mmm mysql-mmm-agent mysql-mmm-tools mysql-mmm-monitor
(3) On node1, node2, node3 and node4 (the MySQL nodes)
# yum install mysql-mmm-agent
2. Create the following two users on all MySQL nodes. This can be done on the master only; the grants replicate to the other nodes automatically.
mysql> grant replication client on *.* to 'mmm_monitor'@'192.168.0.%' identified by '123456';
mysql> grant super, replication client, process on *.* to 'mmm_agent'@'192.168.0.%' identified by '123456';
mysql> flush privileges;
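Because the monitor and the agents will connect to every MySQL node with these accounts, a quick round of connection tests (an optional check added here, run from any host that has the mysql client installed) can save debugging later:
# for ip in 192.168.0.201 192.168.0.202 192.168.0.203 192.168.0.204; do
>   mysql -ummm_monitor -p123456 -h$ip -e "select @@hostname;"
> done
Each iteration should print the hostname of the node it connected to; an access-denied error usually means the grant has not replicated to that node yet.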
3. Create the following file on all hosts (node1, node2, node3, node4, node5)
[root@node1 ~]# vi /etc/mysql-mmm/mmm_common.conf
active_master_role      writer

<host default>
    cluster_interface       eth0
    pid_path                /var/run/mysql-mmm/mmm_agentd.pid
    bin_path                /usr/libexec/mysql-mmm/
    replication_user        repl
    replication_password    123456
    agent_user              mmm_agent
    agent_password          123456
</host>

<host db1>
    ip      192.168.0.201
    mode    master
    peer    db2
</host>

<host db2>
    ip      192.168.0.202
    mode    master
    peer    db1
</host>

<host db3>
    ip      192.168.0.203
    mode    slave
</host>

<host db4>
    ip      192.168.0.204
    mode    slave
</host>

<role writer>
    hosts   db1, db2
    ips     192.168.0.211
    mode    exclusive
</role>

<role reader>
    hosts   db1, db2, db3, db4
    ips     192.168.0.212, 192.168.0.213, 192.168.0.214, 192.168.0.215
    mode    balanced
</role>
Copy this configuration file to all the other nodes:
[root@node1 ~]# scp /etc/mysql-mmm/mmm_common.conf node2:/etc/mysql-mmm/
[root@node1 ~]# scp /etc/mysql-mmm/mmm_common.conf node3:/etc/mysql-mmm/
[root@node1 ~]# scp /etc/mysql-mmm/mmm_common.conf node4:/etc/mysql-mmm/
[root@node1 ~]# scp /etc/mysql-mmm/mmm_common.conf node5:/etc/mysql-mmm/
4. Configure the agent nodes; every MySQL master and slave server runs an agent
(1) Configure mmm_agent.conf on the MySQL nodes node1, node2, node3 and node4
# vi /etc/mysql-mmm/mmm_agent.conf
include mmm_common.conf
this db1    # on each of the four MySQL nodes set the matching name: db1, db2, db3, db4
(2) The agent is enabled by default, so this file normally needs no change
# cat /etc/default/mysql-mmm-agent
# mysql-mmm-agent defaults
ENABLED=1
(3) Start the service on the MySQL nodes
# chkconfig mysql-mmm-agent on
# /etc/init.d/mysql-mmm-agent start
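If the agent started correctly it should be listening for connections from the monitor (mysql-mmm agents listen on TCP port 9989 by default); a quick look at the listening port and the agent log (an optional check) confirms it:
# netstat -tlnp | grep 9989
# tail /var/log/mysql-mmm/mmm_agentd.log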
5. Configure the MMM monitor node (node5)
(1) Configure the monitor
[root@node5 ~]# vi /etc/mysql-mmm/mmm_mon.conf
include mmm_common.conf

<monitor>
    ip                  127.0.0.1
    pid_path            /var/run/mysql-mmm/mmm_mond.pid
    bin_path            /usr/libexec/mysql-mmm
    status_path         /var/lib/mysql-mmm/mmm_mond.status
    ping_ips            192.168.0.201,192.168.0.202,192.168.0.203,192.168.0.204
    auto_set_online     30

    # The kill_host_bin does not exist by default, though the monitor will
    # throw a warning about it missing. See the section 5.10 "Kill Host
    # Functionality" in the PDF documentation.
    #
    # kill_host_bin     /usr/libexec/mysql-mmm/monitor/kill_host
    #
</monitor>

<host default>
    monitor_user        mmm_monitor
    monitor_password    123456
</host>

debug 0
(2) Start the service on the monitor node
# chkconfig mysql-mmm-monitor on
# /etc/init.d/mysql-mmm-monitor start
6. Check the status
(1) Check the status on the monitor server
[root@node5 ~]# mmm_control show
db1(192.168.0.201) master/ONLINE. Roles: reader(192.168.0.215), writer(192.168.0.211)
db2(192.168.0.202) master/ONLINE. Roles: reader(192.168.0.213)
db3(192.168.0.203) slave/ONLINE. Roles: reader(192.168.0.212)
db4(192.168.0.204) slave/ONLINE. Roles: reader(192.168.0.214)
[root@node5 ~]# mmm_control checks all
db4 ping [last change: 2016/04/27 16:45:49] OK
db4 mysql [last change: 2016/04/27 16:45:49] OK
db4 rep_threads [last change: 2016/04/27 16:45:49] OK
db4 rep_backlog [last change: 2016/04/27 16:45:49] OK: Backlog is null
db2 ping [last change: 2016/04/27 16:45:49] OK
db2 mysql [last change: 2016/04/27 16:45:49] OK
db2 rep_threads [last change: 2016/04/27 16:45:49] OK
db2 rep_backlog [last change: 2016/04/27 16:45:49] OK: Backlog is null
db3 ping [last change: 2016/04/27 16:45:49] OK
db3 mysql [last change: 2016/04/27 16:45:49] OK
db3 rep_threads [last change: 2016/04/27 16:45:49] OK
db3 rep_backlog [last change: 2016/04/27 16:45:49] OK: Backlog is null
db1 ping [last change: 2016/04/27 16:45:49] OK
db1 mysql [last change: 2016/04/27 16:45:49] OK
db1 rep_threads [last change: 2016/04/27 16:45:49] OK
db1 rep_backlog [last change: 2016/04/27 16:45:49] OK: Backlog is null
[root@node5 ~]#
(2) The VIP addresses brought up on the servers
[root@node1 ~]# ip a
1: lo:
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0:
link/ether 00:0c:29:ff:33:6a brd ff:ff:ff:ff:ff:ff
inet 192.168.0.201/24 brd 192.168.0.255 scope global eth0
inet 192.168.0.215/32 scope global eth0
inet 192.168.0.211/32 scope global eth0
inet6 fe80::20c:29ff:feff:336a/64 scope link
valid_lft forever preferred_lft forever
[root@node1 ~]#
[root@node2 ~]# ip a
1: lo:
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0:
link/ether 00:0c:29:8c:60:58 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.202/24 brd 192.168.0.255 scope global eth0
inet 192.168.0.213/32 scope global eth0
inet6 fe80::20c:29ff:fe8c:6058/64 scope link
valid_lft forever preferred_lft forever
[root@node2 ~]#
All reads and writes go through the VIP addresses; when a node fails, its VIP floats to another node, which then continues to serve the traffic.
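In other words, applications never connect to a physical server address: writes go to the writer VIP and reads are spread over the reader VIPs. For example (appuser/apppass are placeholder application credentials, not created in this article):
# mysql -uappuser -papppass -h192.168.0.211 mydb -e "insert into user values ('3','viptest','xyz');"
# mysql -uappuser -papppass -h192.168.0.212 mydb -e "select * from user;"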
IV. MMM high-availability testing
First check the status of the whole cluster; everything is normal.
[root@node5 ~]# mmm_control show
db1(192.168.0.201) master/ONLINE. Roles: reader(192.168.0.212), writer(192.168.0.211)
db2(192.168.0.202) master/ONLINE. Roles: reader(192.168.0.213)
db3(192.168.0.203) slave/ONLINE. Roles: reader(192.168.0.215)
db4(192.168.0.204) slave/ONLINE. Roles: reader(192.168.0.214)
[root@node5 ~]#
1. Simulate a node2 failure by stopping its MySQL service manually, and watch the monitor log as node2 goes down and later comes back up:
[root@node5 ~]# tail -f /var/log/mysql-mmm/mmm_mond.log
2016/04/29 09:58:33 FATAL State of host 'db2' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2016/04/29 10:00:54 FATAL State of host 'db2' changed from HARD_OFFLINE to AWAITING_RECOVERY
2016/04/29 10:01:24 FATAL State of host 'db2' changed from AWAITING_RECOVERY to ONLINE because of auto_set_online(30 seconds). It was in state AWAITING_RECOVERY for 30
seconds
Check the latest cluster status again:
[root@node5 ~]# mmm_control show
db1(192.168.0.201) master/ONLINE. Roles: reader(192.168.0.212), writer(192.168.0.211)
db2(192.168.0.202) master/HARD_OFFLINE. Roles:    # the log shows db2 has changed from ONLINE to HARD_OFFLINE
db3(192.168.0.203) slave/ONLINE. Roles: reader(192.168.0.213), reader(192.168.0.215)
db4(192.168.0.204) slave/ONLINE. Roles: reader(192.168.0.214)
[root@node5 ~]# mmm_control checks all
db4 ping [last change: 2016/04/29 09:01:08] OK
db4 mysql [last change: 2016/04/29 09:01:08] OK
db4 rep_threads [last change: 2016/04/29 09:01:08] OK
db4 rep_backlog [last change: 2016/04/29 09:01:08] OK: Backlog is null
db2 ping [last change: 2016/04/29 09:01:08] OK
db2 mysql [last change: 2016/04/29 09:58:33] ERROR: Connect error (host = 192.168.0.202:3306, user = mmm_monitor)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
(db2 still answers ping, so only the MySQL service itself is down.)
db2 rep_threads [last change: 2016/04/29 09:01:08] OK
db2 rep_backlog [last change: 2016/04/29 09:01:08] OK: Backlog is null
db3 ping [last change: 2016/04/29 09:01:08] OK
db3 mysql [last change: 2016/04/29 09:01:08] OK
db3 rep_threads [last change: 2016/04/29 09:01:08] OK
db3 rep_backlog [last change: 2016/04/29 09:01:08] OK: Backlog is null
db1 ping [last change: 2016/04/29 09:01:08] OK
db1 mysql [last change: 2016/04/29 09:56:33] OK
db1 rep_threads [last change: 2016/04/29 09:01:08] OK
db1 rep_backlog [last change: 2016/04/29 09:01:08] OK: Backlog is null
[root@node5 ~]#
Start node2 again; it moves from HARD_OFFLINE to AWAITING_RECOVERY and then back to ONLINE, and db2 takes over read requests again.
[root@node5 ~]# mmm_control show
db1(192.168.0.201) master/ONLINE. Roles: reader(192.168.0.212), writer(192.168.0.211)
db2(192.168.0.202) master/ONLINE. Roles: reader(192.168.0.213)
db3(192.168.0.203) slave/ONLINE. Roles: reader(192.168.0.215)
db4(192.168.0.204) slave/ONLINE. Roles: reader(192.168.0.214)
[root@node5 ~]#
2. Simulate a failure of the node1 master and check the cluster status.
[root@node5 ~]# mmm_control show
db1(192.168.0.201) master/HARD_OFFLINE. Roles:
db2(192.168.0.202) master/ONLINE. Roles: reader(192.168.0.213), writer(192.168.0.211)
db3(192.168.0.203) slave/ONLINE. Roles: reader(192.168.0.212), reader(192.168.0.215)
db4(192.168.0.204) slave/ONLINE. Roles: reader(192.168.0.214)
Start the node1 master again:
[root@node5 ~]# mmm_control show
db1(192.168.0.201) master/ONLINE. Roles: reader(192.168.0.212)
db2(192.168.0.202) master/ONLINE. Roles: reader(192.168.0.213), writer(192.168.0.211)
db3(192.168.0.203) slave/ONLINE. Roles: reader(192.168.0.215)
db4(192.168.0.204) slave/ONLINE. Roles: reader(192.168.0.214)
As you can see, when the old master comes back up it automatically takes only a reader role; it does not take the writer role back until the current master fails again.
[root@node5 ~]# tail -f /var/log/mysql-mmm/mmm_mond.log
2016/04/29 10:03:25 FATAL State of host 'db1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2016/04/29 10:06:20 FATAL State of host 'db1' changed from HARD_OFFLINE to AWAITING_RECOVERY
2016/04/29 10:06:51 FATAL State of host 'db1' changed from AWAITING_RECOVERY to ONLINE because of auto_set_online(30 seconds). It was in state AWAITING_RECOVERY for 31
seconds
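If you prefer node1 to hold the writer role again once it has recovered, the role can be moved back manually on the monitor (an optional operation, not part of the original test):
[root@node5 ~]# mmm_control move_role writer db1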
3. Test summary:
(1) When the standby master node2 fails, the cluster keeps working; only node2's reader role is removed.
(2) When the master node1 fails, the standby master node2 takes over the writer role, and node3 and node4 automatically change master to node2 and continue replicating from it.
(3) Remaining problems:
If node1 fails while node2's replication is still lagging behind node1, node2 becomes the writable master anyway, and data consistency cannot be guaranteed.
If node2, node3 and node4 all lag behind node1 when node1 fails, node3 and node4 wait until they have caught up with db1's data before pointing to the new master node2 and resuming replication; consistency still cannot be fully guaranteed at that point.
(4) If you adopt the MMM high-availability architecture, give the master and the standby master identical hardware, and enable semi-synchronous replication to further improve safety, or use MariaDB multi-threaded slave replication to improve replication performance.
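A minimal sketch of enabling semi-synchronous replication on MySQL 5.6 follows (these are the standard plugin and variable names shipped with 5.6; to make the change permanent the same settings also belong in my.cnf). In this topology both masters act as master and slave of each other, so they load both plugins; node3 and node4 only need the slave side:
# on node1 and node2
mysql> install plugin rpl_semi_sync_master soname 'semisync_master.so';
mysql> install plugin rpl_semi_sync_slave soname 'semisync_slave.so';
mysql> set global rpl_semi_sync_master_enabled = 1;
mysql> set global rpl_semi_sync_slave_enabled = 1;
# on node3 and node4
mysql> install plugin rpl_semi_sync_slave soname 'semisync_slave.so';
mysql> set global rpl_semi_sync_slave_enabled = 1;
# restart the IO thread so the slave-side setting takes effect
mysql> stop slave io_thread; start slave io_thread;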
Note: MMM (Master-Master Replication Manager for MySQL) manages master-master replication. The MMM suite offers good stability, high availability and scalability: when the active master fails, the standby master takes over immediately, and the slaves automatically switch over to the new master and keep replicating, all without manual intervention. On the other hand, MMM requires multiple nodes and multiple IPs, so it has real server requirements, and under very read/write-heavy workloads it is not particularly stable, with possible replication lag and failed switchovers. MMM is therefore not a good fit for environments that demand strong data safety while carrying heavy read and write traffic.