Official MMM configuration guide: http://mysql-mmm.org/mmm2:guide

Environment used in this article:
Master:  CentOS 6.7 x64  192.168.0.201  mysql-5.6.29
Master:  CentOS 6.7 x64  192.168.0.202  mysql-5.6.29
Slave:   CentOS 6.7 x64  192.168.0.203  mysql-5.6.29
Slave:   CentOS 6.7 x64  192.168.0.204  mysql-5.6.29
Monitor: CentOS 6.7 x64  192.168.0.205

MMM server role mapping (host / IP / MMM name / server-id):
node1 (master 1)   192.168.0.201   db1    1
node2 (master 2)   192.168.0.202   db2    2
node3 (slave 1)    192.168.0.203   db3    11
node4 (slave 2)    192.168.0.204   db4    12
node5 (monitor)    192.168.0.205   mon    -

Once configuration is complete, access the MySQL cluster through the following VIPs.
192.168.0.211  writer 
192.168.0.212  reader 
192.168.0.213  reader    
192.168.0.214  reader    
192.168.0.215  reader
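
Applications should connect through these VIPs rather than the physical server addresses, so a failover stays transparent to clients. A quick hypothetical check that the VIPs answer (app_user is a placeholder account, not created in this article):

# SELECT @@hostname shows which physical node answered through the VIP:
mysql -uapp_user -p -h192.168.0.211 -e "SELECT @@hostname"   # writer VIP
mysql -uapp_user -p -h192.168.0.212 -e "SELECT @@hostname"   # reader VIP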

 

I. Preparing the dual-master and slave environments

1. node1 (master 1) /etc/my.cnf

[mysqld]    
server-id              = 1    
binlog-format          = ROW    
log-bin                = master-bin    
log-bin-index          = master-bin.index    
log-slave-updates      = true    
auto_increment_offset  = 1    
auto_increment_increment = 2
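
The last two settings keep auto-increment values from colliding between the two masters: with auto_increment_increment = 2, node1 (offset 1) only ever generates odd ids and node2 (offset 2) only even ones. A small illustration (table t is hypothetical):

mysql> CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY, v CHAR(1));
mysql> INSERT INTO t (v) VALUES ('a'), ('b');
-- On node1 this yields odd ids (1, 3, ...); the same INSERT run on node2
-- yields only even ids, so concurrent writes on the two masters can never
-- collide on the primary key.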


mysql> show master status;    
+-------------------+----------+--------------+------------------+-------------------+    
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |    
+-------------------+----------+--------------+------------------+-------------------+    
| master-bin.000001 |      120 |              |                  |                   |    
+-------------------+----------+--------------+------------------+-------------------+    
1 row in set (0.00 sec)

mysql>

 

2. node2 (master 2) /etc/my.cnf

[mysqld]    
server-id              = 2    
binlog-format          = ROW    
log-bin                = master-bin    
log-bin-index          = master-bin.index    
log-slave-updates      = true    
auto_increment_offset  = 2    
auto_increment_increment = 2


mysql> show master status;    
+-------------------+----------+--------------+------------------+-------------------+    
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |    
+-------------------+----------+--------------+------------------+-------------------+    
| master-bin.000001 |      120 |              |                  |                   |    
+-------------------+----------+--------------+------------------+-------------------+    
1 row in set (0.00 sec)

mysql>

 

3. Create the replication account on both masters (node1 and node2)

grant replication slave on *.* to 'repl'@'%' identified by '123456';    
flush privileges;
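
Before wiring up replication it is worth confirming the account actually works across the network. A sanity check from any other node (a sketch, using the password granted above):

# Should print the REPLICATION SLAVE grant rather than an access-denied error:
mysql -urepl -p123456 -h192.168.0.201 -e "SHOW GRANTS FOR CURRENT_USER()"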

 

4. node3 (slave) /etc/my.cnf

[mysqld]    
binlog-format                = ROW    
log-bin                      = mysql-bin    
relay-log                    = slave-relay-bin    
relay-log-index              = slave-relay-bin.index    
log-slave-updates            = true    
server-id                    = 11    
skip-name-resolve

 

5. node4 (slave) /etc/my.cnf

[mysqld]    
binlog-format                = ROW    
log-bin                      = mysql-bin    
relay-log                    = slave-relay-bin    
relay-log-index              = slave-relay-bin.index    
log-slave-updates            = true    
server-id                    = 12    
skip-name-resolve

 

6. Configure node1 to replicate from node2

(1) Run CHANGE MASTER TO by hand to set up replication; the settings are persisted in the master.info file.

mysql>

CHANGE MASTER TO    
MASTER_HOST='192.168.0.202',    
MASTER_PORT=3306,    
MASTER_USER='repl',    
MASTER_PASSWORD='123456',    
MASTER_LOG_FILE='master-bin.000001',    
MASTER_LOG_POS=120;

mysql> start slave;

mysql> show slave status\G
*************************** 1. row ***************************    
               Slave_IO_State: Waiting for master to send event    
                  Master_Host: 192.168.0.202    
                  Master_User: repl    
                  Master_Port: 3306    
                Connect_Retry: 60    
              Master_Log_File: master-bin.000001    
          Read_Master_Log_Pos: 407    
               Relay_Log_File: node1-relay-bin.000002    
                Relay_Log_Pos: 571    
        Relay_Master_Log_File: master-bin.000001    
             Slave_IO_Running: Yes    
            Slave_SQL_Running: Yes    
              Replicate_Do_DB:

 

7. Configure node2 to replicate from node1

(1) Run CHANGE MASTER TO by hand to set up replication; the settings are persisted in the master.info file.

mysql>

CHANGE MASTER TO    
MASTER_HOST='192.168.0.201',    
MASTER_PORT=3306,    
MASTER_USER='repl',    
MASTER_PASSWORD='123456',    
MASTER_LOG_FILE='master-bin.000001',    
MASTER_LOG_POS=120;

mysql> start slave;    
mysql> show slave status\G
*************************** 1. row ***************************    
               Slave_IO_State: Waiting for master to send event    
                  Master_Host: 192.168.0.201    
                  Master_User: repl    
                  Master_Port: 3306    
                Connect_Retry: 60    
              Master_Log_File: master-bin.000001    
          Read_Master_Log_Pos: 694    
               Relay_Log_File: node2-relay-bin.000002    
                Relay_Log_Pos: 571    
        Relay_Master_Log_File: master-bin.000001    
             Slave_IO_Running: Yes    
            Slave_SQL_Running: Yes    
              Replicate_Do_DB:

 

8. Test master-master replication

(1) Create the database on node1. (The `user` table queried below is created on node2 in step (2) and replicates back to node1.)

mysql> create database mydb;    
Query OK, 1 row affected (0.00 sec)

mysql> use mydb;    
Database changed    
mysql> show tables;    
+----------------+    
| Tables_in_mydb |    
+----------------+    
| user           |    
+----------------+    
1 row in set (0.01 sec)

mysql> select * from user;    
+----+----------+----------+    
| id | username | password |    
+----+----------+----------+    
| 1  | koumm    | 123456   |    
+----+----------+----------+    
1 row in set (0.01 sec)

mysql>


(2) On node2:
mysql> show databases;    
+--------------------+    
| Database           |    
+--------------------+    
| information_schema |    
| mydb               |    
| mysql              |    
| performance_schema |    
+--------------------+    
4 rows in set (0.00 sec)

mysql> use mydb;    
Database changed    
mysql> CREATE TABLE `user` (    
    ->   `id` varchar(20) NOT NULL,    
    ->   `username` varchar(20) NOT NULL,    
    ->   `password` char(32) NOT NULL,    
    ->   PRIMARY KEY (`id`)    
    -> ) ;    
Query OK, 0 rows affected (0.14 sec)

mysql> INSERT INTO user  VALUES ('1', 'koumm', '123456');    
Query OK, 1 row affected (0.00 sec)

mysql>

 

II. Configure node3 and node4 to replicate from node1 (the master)

1. Lock tables on node1 and dump the database

(1) Lock the tables on the master

mysql> flush tables with read lock;    
Query OK, 0 rows affected (0.00 sec)


mysql> show master status;    
+-------------------+----------+--------------+------------------+-------------------+    
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |    
+-------------------+----------+--------------+------------------+-------------------+    
| master-bin.000001 |     1226 |              |                  |                   |    
+-------------------+----------+--------------+------------------+-------------------+    
1 row in set (0.00 sec)


(2) Back up the master

[root@master ~]# mysqldump -uroot -p -B mydb > mydb.sql    
Note: the -B option adds the CREATE DATABASE and USE statements to the dump.
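
To see exactly what -B added, grep the dump; thanks to these lines the slaves need no pre-created database (exact output depends on your server's charset defaults):

[root@master ~]# grep -E '^(CREATE DATABASE|USE)' mydb.sql
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `mydb` /*!40100 DEFAULT CHARACTER SET latin1 */;
USE `mydb`;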


(3) Unlock the tables on the master

mysql> unlock tables;    
Query OK, 0 rows affected (0.00 sec)    
mysql>

 

2. Import the database on node3 and node4

# mysql -uroot -padmin < mydb.sql

 

3. Configure replication on node3 and node4

(1) Run CHANGE MASTER TO by hand to set up replication; the settings are persisted in the master.info file.

mysql>

CHANGE MASTER TO    
MASTER_HOST='192.168.0.201',    
MASTER_PORT=3306,    
MASTER_USER='repl',    
MASTER_PASSWORD='123456',    
MASTER_LOG_FILE='master-bin.000001',    
MASTER_LOG_POS=1226;


mysql> start slave;    
Query OK, 0 rows affected (0.04 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.201
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 1226
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 284
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:


You can run a quick test to confirm the slaves are in sync.
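
For example, insert a row on the master and read it back on a slave (the values here are hypothetical):

-- On node1:
mysql> INSERT INTO mydb.user VALUES ('2', 'koumm2', 'abc123');
-- On node3 and node4 the row should appear almost immediately:
mysql> SELECT * FROM mydb.user WHERE id = '2';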

 

III. Configuring MMM

1. Install the MMM packages

(1) On all nodes (node1, node2, node3, node4, node5):
# wget http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm    
# rpm -ivh epel-release-6-8.noarch.rpm

(2) On node5 (the monitor node):
# yum install mysql-mmm mysql-mmm-agent mysql-mmm-tools mysql-mmm-monitor

(3) On node1-node4 (the MySQL nodes):
# yum install mysql-mmm-agent

 

2. Create the following two users on all MySQL nodes. Running the grants on the master is enough; they replicate to the other nodes automatically.

mysql> grant replication client on *.* to 'mmm_monitor'@'192.168.0.%' identified by '123456';    
mysql> grant super, replication client, process on *.* to 'mmm_agent'@'192.168.0.%' identified by '123456';    
mysql> flush privileges;
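
mmm_monitor needs REPLICATION CLIENT for its health checks; mmm_agent needs SUPER, REPLICATION CLIENT and PROCESS so it can toggle read_only and repoint slaves during a failover. A quick check that both accounts reached every node through replication; it should list both:

mysql> SELECT user, host FROM mysql.user WHERE user LIKE 'mmm_%';
+-------------+-------------+
| user        | host        |
+-------------+-------------+
| mmm_agent   | 192.168.0.% |
| mmm_monitor | 192.168.0.% |
+-------------+-------------+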

 

3. Create the following file on all hosts (node1, node2, node3, node4, node5)

[root@node1 ~]# vi /etc/mysql-mmm/mmm_common.conf
active_master_role      writer

<host default>
    cluster_interface       eth0
    pid_path                /var/run/mysql-mmm/mmm_agentd.pid
    bin_path                /usr/libexec/mysql-mmm/
    replication_user        repl
    replication_password    123456
    agent_user              mmm_agent
    agent_password          123456
</host>

<host db1>
    ip      192.168.0.201
    mode    master
    peer    db2
</host>

<host db2>
    ip      192.168.0.202
    mode    master
    peer    db1
</host>

<host db3>
    ip      192.168.0.203
    mode    slave
</host>

<host db4>
    ip      192.168.0.204
    mode    slave
</host>

<role writer>
    hosts   db1, db2
    ips     192.168.0.211
    mode    exclusive
</role>

<role reader>
    hosts   db1, db2, db3, db4
    ips     192.168.0.212,192.168.0.213,192.168.0.214,192.168.0.215
    mode    balanced
</role>

Copy this configuration file to all the other nodes:
[root@node1 ~]# scp /etc/mysql-mmm/mmm_common.conf node2:/etc/mysql-mmm/    
[root@node1 ~]# scp /etc/mysql-mmm/mmm_common.conf node3:/etc/mysql-mmm/    
[root@node1 ~]# scp /etc/mysql-mmm/mmm_common.conf node4:/etc/mysql-mmm/    
[root@node1 ~]# scp /etc/mysql-mmm/mmm_common.conf node5:/etc/mysql-mmm/

 

4. Configure the agent nodes; every MySQL master and slave acts as an agent

(1) Configure mmm_agent.conf on the MySQL nodes (node1, node2, node3, node4)

# vi /etc/mysql-mmm/mmm_agent.conf    
include mmm_common.conf    
this db1   # on each of the four MySQL nodes set the matching name: db1, db2, db3, db4


(2) Enabled by default; nothing needs to change here
# cat /etc/default/mysql-mmm-agent    
# mysql-mmm-agent defaults    
ENABLED=1

(3) Start the agent service on the MySQL nodes

# chkconfig mysql-mmm-agent on    
# /etc/init.d/mysql-mmm-agent start
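
If the agent came up cleanly it listens on TCP port 9989, MMM's default agent port, which the monitor connects to. A quick check on each node (a sketch):

# The monitor polls each agent over TCP port 9989:
netstat -tlnp | grep 9989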

 

5. Configure the MMM monitor node (node5)

(1) Configure the monitor
[root@node5 ~]# vi /etc/mysql-mmm/mmm_mon.conf    
include mmm_common.conf

<monitor>
    ip                  127.0.0.1
    pid_path            /var/run/mysql-mmm/mmm_mond.pid
    bin_path            /usr/libexec/mysql-mmm
    status_path         /var/lib/mysql-mmm/mmm_mond.status
    ping_ips            192.168.0.201,192.168.0.202,192.168.0.203,192.168.0.204
    auto_set_online     30

    # The kill_host_bin does not exist by default, though the monitor will
    # throw a warning about it missing.  See the section 5.10 "Kill Host
    # Functionality" in the PDF documentation.
    #
    # kill_host_bin     /usr/libexec/mysql-mmm/monitor/kill_host
    #
</monitor>

<host default>
    monitor_user        mmm_monitor
    monitor_password    123456
</host>

debug 0


(2) Start the monitor service on node5

# chkconfig mysql-mmm-monitor on    
# /etc/init.d/mysql-mmm-monitor start
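
Once running, the monitor probes every host and logs each state change; tailing its log is the quickest way to confirm it sees all four agents (the same log file is used in the failover tests below):

[root@node5 ~]# tail -f /var/log/mysql-mmm/mmm_mond.log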

 

6. Check the cluster status

(1) Check the status on the management (monitor) server

[root@node5 ~]# mmm_control show
  db1(192.168.0.201) master/ONLINE. Roles: reader(192.168.0.215), writer(192.168.0.211)    
  db2(192.168.0.202) master/ONLINE. Roles: reader(192.168.0.213)    
  db3(192.168.0.203) slave/ONLINE. Roles: reader(192.168.0.212)    
  db4(192.168.0.204) slave/ONLINE. Roles: reader(192.168.0.214)

[root@node5 ~]# mmm_control checks all
db4  ping         [last change: 2016/04/27 16:45:49]  OK    
db4  mysql        [last change: 2016/04/27 16:45:49]  OK    
db4  rep_threads  [last change: 2016/04/27 16:45:49]  OK    
db4  rep_backlog  [last change: 2016/04/27 16:45:49]  OK: Backlog is null    
db2  ping         [last change: 2016/04/27 16:45:49]  OK    
db2  mysql        [last change: 2016/04/27 16:45:49]  OK    
db2  rep_threads  [last change: 2016/04/27 16:45:49]  OK    
db2  rep_backlog  [last change: 2016/04/27 16:45:49]  OK: Backlog is null    
db3  ping         [last change: 2016/04/27 16:45:49]  OK    
db3  mysql        [last change: 2016/04/27 16:45:49]  OK    
db3  rep_threads  [last change: 2016/04/27 16:45:49]  OK    
db3  rep_backlog  [last change: 2016/04/27 16:45:49]  OK: Backlog is null    
db1  ping         [last change: 2016/04/27 16:45:49]  OK    
db1  mysql        [last change: 2016/04/27 16:45:49]  OK    
db1  rep_threads  [last change: 2016/04/27 16:45:49]  OK    
db1  rep_backlog  [last change: 2016/04/27 16:45:49]  OK: Backlog is null    
[root@node5 ~]#


(2) The VIP addresses brought up on each server

[root@node1 ~]# ip a    
1: lo: mtu 65536 qdisc noqueue state UNKNOWN    
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00    
    inet 127.0.0.1/8 scope host lo    
    inet6 ::1/128 scope host    
       valid_lft forever preferred_lft forever    
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000    
    link/ether 00:0c:29:ff:33:6a brd ff:ff:ff:ff:ff:ff    
    inet 192.168.0.201/24 brd 192.168.0.255 scope global eth0    
    inet 192.168.0.215/32 scope global eth0    
    inet 192.168.0.211/32 scope global eth0    
    inet6 fe80::20c:29ff:feff:336a/64 scope link    
       valid_lft forever preferred_lft forever    
[root@node1 ~]#

[root@node2 ~]# ip a    
1: lo: mtu 65536 qdisc noqueue state UNKNOWN    
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00    
    inet 127.0.0.1/8 scope host lo    
    inet6 ::1/128 scope host    
       valid_lft forever preferred_lft forever    
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000    
    link/ether 00:0c:29:8c:60:58 brd ff:ff:ff:ff:ff:ff    
    inet 192.168.0.202/24 brd 192.168.0.255 scope global eth0    
    inet 192.168.0.213/32 scope global eth0    
    inet6 fe80::20c:29ff:fe8c:6058/64 scope link    
       valid_lft forever preferred_lft forever    
[root@node2 ~]#

All reads and writes use the VIP addresses; when a node fails, its VIPs float to another node, which takes over serving requests.


IV. Testing MMM high availability

First check the status of the whole cluster; everything is healthy:
[root@node5 ~]# mmm_control show     
  db1(192.168.0.201) master/ONLINE. Roles: reader(192.168.0.212), writer(192.168.0.211)
  db2(192.168.0.202) master/ONLINE. Roles: reader(192.168.0.213)
  db3(192.168.0.203) slave/ONLINE. Roles: reader(192.168.0.215)
  db4(192.168.0.204) slave/ONLINE. Roles: reader(192.168.0.214)
[root@node5 ~]#

1. Simulate a node2 outage by stopping MySQL manually, then watch the monitor log as node2 goes down and later comes back:

[root@node5 ~]# tail -f /var/log/mysql-mmm/mmm_mond.log
2016/04/29 09:58:33 FATAL State of host 'db2' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2016/04/29 10:00:54 FATAL State of host 'db2' changed from HARD_OFFLINE to AWAITING_RECOVERY
2016/04/29 10:01:24 FATAL State of host 'db2' changed from AWAITING_RECOVERY to ONLINE because of auto_set_online(30 seconds). It was in state AWAITING_RECOVERY for 30 seconds

Check the latest cluster status again:

[root@node5 ~]# mmm_control show
  db1(192.168.0.201) master/ONLINE. Roles: reader(192.168.0.212), writer(192.168.0.211)
  db2(192.168.0.202) master/HARD_OFFLINE. Roles:   # the log shows db2's state changed from ONLINE to HARD_OFFLINE
  db3(192.168.0.203) slave/ONLINE. Roles: reader(192.168.0.213), reader(192.168.0.215)
  db4(192.168.0.204) slave/ONLINE. Roles: reader(192.168.0.214)

[root@node5 ~]# mmm_control checks all
db4  ping         [last change: 2016/04/29 09:01:08]  OK
db4  mysql        [last change: 2016/04/29 09:01:08]  OK
db4  rep_threads  [last change: 2016/04/29 09:01:08]  OK
db4  rep_backlog  [last change: 2016/04/29 09:01:08]  OK: Backlog is null
db2  ping         [last change: 2016/04/29 09:01:08]  OK
db2  mysql        [last change: 2016/04/29 09:58:33]  ERROR: Connect error (host = 192.168.0.202:3306, user = mmm_monitor)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111   # ping still succeeds, so only the MySQL service died
db2  rep_threads  [last change: 2016/04/29 09:01:08]  OK
db2  rep_backlog  [last change: 2016/04/29 09:01:08]  OK: Backlog is null
db3  ping         [last change: 2016/04/29 09:01:08]  OK
db3  mysql        [last change: 2016/04/29 09:01:08]  OK
db3  rep_threads  [last change: 2016/04/29 09:01:08]  OK
db3  rep_backlog  [last change: 2016/04/29 09:01:08]  OK: Backlog is null
db1  ping         [last change: 2016/04/29 09:01:08]  OK
db1  mysql        [last change: 2016/04/29 09:56:33]  OK
db1  rep_threads  [last change: 2016/04/29 09:01:08]  OK
db1  rep_backlog  [last change: 2016/04/29 09:01:08]  OK: Backlog is null

[root@node5 ~]#

Start node2 again: it moves from HARD_OFFLINE to AWAITING_RECOVERY and then back ONLINE, at which point db2 takes reader traffic again.

[root@node5 ~]# mmm_control show     
  db1(192.168.0.201) master/ONLINE. Roles: reader(192.168.0.212), writer(192.168.0.211)
  db2(192.168.0.202) master/ONLINE. Roles: reader(192.168.0.213)
  db3(192.168.0.203) slave/ONLINE. Roles: reader(192.168.0.215)
  db4(192.168.0.204) slave/ONLINE. Roles: reader(192.168.0.214)
[root@node5 ~]#


2. Simulate an outage of the node1 master and check the cluster status.

[root@node5 ~]# mmm_control show
  db1(192.168.0.201) master/HARD_OFFLINE. Roles:
  db2(192.168.0.202) master/ONLINE. Roles: reader(192.168.0.213), writer(192.168.0.211)
  db3(192.168.0.203) slave/ONLINE. Roles: reader(192.168.0.212), reader(192.168.0.215)
  db4(192.168.0.204) slave/ONLINE. Roles: reader(192.168.0.214)

Start the node1 master again:

[root@node5 ~]# mmm_control show
  db1(192.168.0.201) master/ONLINE. Roles: reader(192.168.0.212)
  db2(192.168.0.202) master/ONLINE. Roles: reader(192.168.0.213), writer(192.168.0.211)
  db3(192.168.0.203) slave/ONLINE. Roles: reader(192.168.0.215)
  db4(192.168.0.204) slave/ONLINE. Roles: reader(192.168.0.214)

Notice that after restarting, the old master automatically takes only a reader role; it does not take the writer role back until the current master fails in turn.

[root@node5 ~]# tail -f /var/log/mysql-mmm/mmm_mond.log
2016/04/29 10:03:25 FATAL State of host 'db1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2016/04/29 10:06:20 FATAL State of host 'db1' changed from HARD_OFFLINE to AWAITING_RECOVERY
2016/04/29 10:06:51 FATAL State of host 'db1' changed from AWAITING_RECOVERY to ONLINE because of auto_set_online(30 seconds). It was in state AWAITING_RECOVERY for 31 seconds
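
If you want db1 to become the writer again without waiting for db2 to fail, mmm_control can move the role manually (run on the monitor node; a sketch):

[root@node5 ~]# mmm_control move_role writer db1
[root@node5 ~]# mmm_control show    # writer(192.168.0.211) should now be on db1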

3. Test summary:
(1) An outage of node2, the standby master, does not affect the cluster; its reader role is simply removed.
(2) When node1, the active master, fails, node2 takes over the writer role, and node3 and node4 automatically CHANGE MASTER to node2 and replicate from it.
(3) The caveats:
If node1 fails while node2's replication is still lagging behind it, node2 becomes writable anyway, and data consistency cannot be guaranteed.
If node2, node3 and node4 all lag behind node1 when it fails, node3 and node4 first wait until they have applied everything received from db1 and only then repoint to the new master node2; again, consistency is not guaranteed.
(4) If you adopt the MMM architecture, give the master and the standby master identical hardware, and enable semi-synchronous replication for extra safety, or use MariaDB's multi-threaded slave replication to improve replication performance; see the sketch below.
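
A minimal sketch of enabling MySQL 5.6's semi-synchronous replication, using the semisync plugins shipped with the server (persist the settings in my.cnf afterwards; in this dual-master layout each master loads both plugins, since it is also a slave):

-- On the masters:
mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
mysql> SET GLOBAL rpl_semi_sync_master_enabled = 1;
-- On every replica:
mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;
mysql> STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;   -- reconnect so the slave setting takes effect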

   
Note: MMM (Master-Master Replication Manager for MySQL) is a manager for master-master replication with good stability, high availability, and scalability. When the active master fails, the standby master takes over immediately, and the slaves automatically switch their replication to the new master with no manual intervention. On the other hand, MMM requires multiple nodes and multiple IP addresses, so it places real demands on hardware, and under very read- and write-heavy workloads it is not especially stable: replication lag and failed switchovers can occur. MMM is therefore a poor fit for environments that demand strong data safety while being busy with both reads and writes.