Galera is synchronous multi-master cluster software for MySQL (MariaDB and Percona are also supported).
yum install gcc gcc-c++
Make sure boost-devel is installed, version 1.41 or later
yum install boost-devel
Install scons, check-devel and openssl-devel
yum install scons check-devel openssl-devel
Add the following to the configuration file
[mysqld]
wsrep_node_name = node1
wsrep_provider = /usr/local/mysql/lib/plugin/libgalera_smm.so
wsrep_cluster_address="gcomm://"
wsrep_sst_method = xtrabackup
#wsrep_sst_auth=root:
If binary logging is not enabled, the configuration file also needs the following:
binlog_format=ROW
log-bin=mysql-bin
server-id=101
log-slave-updates=1
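"gcomm://" is a special address that is used only when a Galera cluster is bootstrapped for the first time. If the first node is shut down after the cluster is running, then before starting it again you must change "gcomm://" to the cluster address of another node, for example: wsrep_cluster_address="gcomm://192.168.17.12"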
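The address rule above can be sketched in Python. This is a minimal illustration, not part of Galera: the helper name build_cluster_address is hypothetical, and it assumes peers are given as IPv4 addresses or host names and that the default port 4567 is in use.

```python
def build_cluster_address(peers, port=4567):
    """Return a wsrep_cluster_address value for the given peer hosts.

    An empty peer list yields the bare "gcomm://" address, which is only
    appropriate when bootstrapping a new cluster.
    """
    members = []
    for host in peers:
        # Append the default port only when the entry does not carry one.
        # Assumption: IPv4 addresses or host names (no IPv6 literals).
        members.append(host if ":" in host else f"{host}:{port}")
    return "gcomm://" + ",".join(members)


if __name__ == "__main__":
    # Bootstrap address for the very first start.
    print(build_cluster_address([]))
    # Address for a node that rejoins an existing cluster.
    print(build_cluster_address(["192.168.17.12", "192.168.17.13"]))
```

The returned string can be written to the configuration file or passed as --wsrep_cluster_address on the command line.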
First start
/usr/local/mysql/bin/mysqld_safe --wsrep_cluster_address=gcomm:// >/dev/null &
or
service mysql start --wsrep_cluster_address=gcomm://
netstat -plantu | grep mysqld
tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN 3656/mysqld
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 3656/mysqld
Port 4567 is the default port used by wsrep; 3306 is the regular MySQL client port.
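Whether these ports are reachable from another machine can be checked with a short Python sketch that uses only the standard library. The function name port_is_open and the two-second timeout are illustrative choices, and the address in the usage part is only an example from this setup.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Example node address; replace it with one of your own nodes.
    for port in (3306, 4567):
        state = "open" if port_is_open("192.168.17.12", port) else "closed"
        print(f"port {port}: {state}")
```

A closed 4567 on a running node usually points at a firewall rule or at mysqld having started without the wsrep provider.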
- Shutdown
/usr/local/mysql/bin/mysqladmin -uroot -p shutdown
/usr/local/mysql/bin/mysqld_safe --wsrep_cluster_address="gcomm://192.168.17.12:4567,192.168.17.13:4567" >/dev/null &
/usr/local/mysql/bin/mysqld_safe --wsrep_cluster_address="gcomm://192.168.17.12:4567,192.168.17.13:4567" >/dev/null &
SHOW GLOBAL VARIABLES LIKE 'version';
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_provider_version';
+------------------------+------------+
| Variable_name | Value |
+------------------------+------------+
| wsrep_provider_version | 2.4(rXXXX) |
+------------------------+------------+
SHOW VARIABLES LIKE 'wsrep%' \G
show status like 'wsrep%';
+----------------------------+----------------------------------------------------------+
| Variable_name | Value |
+----------------------------+----------------------------------------------------------+
| wsrep_local_state_uuid | 80cdd13d-8cf2-11e2-0800-e0817023b754 |
| wsrep_protocol_version | 4 |
| wsrep_last_committed | 3 |
| wsrep_replicated | 3 |
| wsrep_replicated_bytes | 522 |
| wsrep_received | 6 |
| wsrep_received_bytes | 1134 |
| wsrep_local_commits | 1 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 1.000000 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 1.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 5 |
| wsrep_causal_reads | 0 |
| wsrep_incoming_addresses | 192.168.7.11:3306,192.168.17.12:3306,192.168.17.13:3306 |
| wsrep_cluster_conf_id | 13 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 80cdd13d-8cf2-11e2-0800-e0817023b754 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_index | 0 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy |
| wsrep_provider_version | 2.4(rXXXX) |
| wsrep_ready | ON |
+----------------------------+----------------------------------------------------------+
40 rows in set (0.00 sec)
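For scripted monitoring, the table printed by the mysql client can be turned into a dictionary. The following Python sketch is only one way to do it; parse_status_table is a hypothetical helper, and it assumes the two-column layout shown above with no "|" characters inside values.

```python
def parse_status_table(text):
    """Parse two-column mysql client table output into a dict of strings."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip border lines ("+----+") and anything that is not a table row.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header row and rows that are not name/value pairs.
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue
        status[cells[0]] = cells[1]
    return status


if __name__ == "__main__":
    sample = """
    +----------------------+---------+
    | Variable_name        | Value   |
    +----------------------+---------+
    | wsrep_cluster_size   | 3       |
    | wsrep_cluster_status | Primary |
    +----------------------+---------+
    """
    print(parse_status_table(sample))
```

All values are kept as strings; numeric variables such as wsrep_cluster_size need to be converted by the caller.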
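Status variables to monitor
1. Cluster integrity checks:
wsrep_cluster_state_uuid: must be identical on every node in the cluster. A node with a different value is not connected to the cluster.
wsrep_cluster_conf_id: normally identical on every node. A different value means the node is temporarily "partitioned"; the values should converge again once network connectivity between the nodes is restored.
wsrep_cluster_size: if this value matches the expected number of nodes, all cluster nodes are connected.
wsrep_cluster_status: the status of the cluster component. Any value other than "Primary" indicates a "partition" or "split-brain" condition.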
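The four integrity checks can be combined in one function. The Python sketch below is an illustration under stated assumptions: the status values have already been collected from each node as dictionaries of strings (for example from SHOW STATUS LIKE 'wsrep%'), and the name check_cluster_integrity and the message texts are made up for this example.

```python
def check_cluster_integrity(nodes, expected_size):
    """Return a list of problems found in the per-node wsrep status.

    nodes maps a node name to that node's status dictionary.
    """
    problems = []
    # These two values must be the same on every node.
    for key in ("wsrep_cluster_state_uuid", "wsrep_cluster_conf_id"):
        values = {status.get(key) for status in nodes.values()}
        if len(values) > 1:
            problems.append(f"{key} differs across nodes")
    for name, status in nodes.items():
        # Every node should see the expected number of members.
        if int(status.get("wsrep_cluster_size", 0)) != expected_size:
            problems.append(f"{name}: unexpected wsrep_cluster_size")
        # A non-Primary component means a partition or split-brain.
        if status.get("wsrep_cluster_status") != "Primary":
            problems.append(f"{name}: wsrep_cluster_status is not Primary")
    return problems


if __name__ == "__main__":
    healthy = {
        "wsrep_cluster_state_uuid": "80cdd13d-8cf2-11e2-0800-e0817023b754",
        "wsrep_cluster_conf_id": "13",
        "wsrep_cluster_size": "3",
        "wsrep_cluster_status": "Primary",
    }
    nodes = {"node1": healthy, "node2": dict(healthy), "node3": dict(healthy)}
    print(check_cluster_integrity(nodes, expected_size=3))
```

An empty list means all four checks passed on every node.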
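2. Node status checks:
wsrep_ready: ON means the node can accept SQL load. If it is OFF, check wsrep_connected.
wsrep_connected: if this value is OFF and wsrep_ready is also OFF, the node is not connected to the cluster. (A wrong wsrep_cluster_address or wsrep_cluster_name setting is a likely cause; check the error log for the specific error.)
wsrep_local_state_comment: if wsrep_connected is ON but wsrep_ready is OFF, this variable shows the reason.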
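The decision order described for a single node can be written down as a small Python sketch. The function name diagnose_node and the returned message texts are hypothetical; the input is assumed to be one node's status values as a dictionary of strings.

```python
def diagnose_node(status):
    """Return a short diagnosis string for one node's wsrep status."""
    ready = status.get("wsrep_ready", "").upper() == "ON"
    connected = status.get("wsrep_connected", "").upper() == "ON"
    if ready:
        return "ok: node accepts SQL load"
    if not connected:
        # Neither ready nor connected: the node never joined the cluster.
        return ("not connected: check wsrep_cluster_address, "
                "wsrep_cluster_name and the error log")
    # Connected but not ready: the state comment explains why.
    comment = status.get("wsrep_local_state_comment", "unknown")
    return f"connected but not ready: {comment}"


if __name__ == "__main__":
    print(diagnose_node({"wsrep_ready": "ON", "wsrep_connected": "ON"}))
    print(diagnose_node({"wsrep_ready": "OFF", "wsrep_connected": "OFF"}))
```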
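3. Replication health checks:
wsrep_flow_control_paused: the fraction of time during which replication was paused by flow control, that is, how much slave lag is slowing the cluster down. The value ranges from 0 to 1; the closer to 0 the better, and 1 means replication is completely stopped. Tuning wsrep_slave_threads can improve it.
wsrep_cert_deps_distance: how many transactions can be applied in parallel. wsrep_slave_threads should not be set much higher than this value.
wsrep_flow_control_sent: the number of times this node has asked the cluster to pause replication.
wsrep_local_recv_queue_avg: the average length of the slave transaction queue. A value that stays above 0 is an early sign of a slave bottleneck.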
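These replication checks can be turned into simple threshold tests. The Python sketch below is illustrative only: the function name replication_warnings is hypothetical, and the limits (0.1 for the paused fraction, 0.5 for the queue average, a factor of 2 for the thread count) are assumptions chosen for the example, not values recommended by Galera.

```python
def replication_warnings(status, slave_threads,
                         paused_limit=0.1, queue_limit=0.5,
                         threads_factor=2.0):
    """Return warnings derived from one node's replication status values.

    All limits are illustrative assumptions and should be tuned per cluster.
    """
    warnings = []
    # Fraction of time replication was paused by flow control (0 to 1).
    if float(status["wsrep_flow_control_paused"]) > paused_limit:
        warnings.append("flow control is pausing replication too often")
    # A growing receive queue hints at a slow applier on this node.
    if float(status["wsrep_local_recv_queue_avg"]) > queue_limit:
        warnings.append("receive queue is backing up")
    # Threads far beyond the parallelism potential bring no benefit.
    distance = float(status["wsrep_cert_deps_distance"])
    if slave_threads > threads_factor * distance:
        warnings.append("wsrep_slave_threads is much higher than "
                        "wsrep_cert_deps_distance")
    return warnings


if __name__ == "__main__":
    # Values taken from the sample SHOW STATUS output in this document.
    sample = {
        "wsrep_flow_control_paused": "0.000000",
        "wsrep_local_recv_queue_avg": "0.000000",
        "wsrep_cert_deps_distance": "1.000000",
    }
    print(replication_warnings(sample, slave_threads=1))
```

With the sample values and one slave thread the function reports no warnings.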