The previous article set up MySQL master-slave replication with MHA for high availability. In this one we add MaxScale for read/write splitting, plus HA for MaxScale itself. MaxScale HA can be built with keepalived or Heartbeat, but the officially recommended stack is corosync + pacemaker. Anyone familiar with HA tooling knows this combination is more powerful and more flexible to configure: it allows different resource groups to have different primary servers, corosync takes care of synchronizing the configuration between nodes, supports multi-node clusters, and lets you group resources, manage them per group, designate a primary service, and start and stop them automatically. Corosync does add some complexity, so the configuration takes a bit of patience. In general, then, corosync is chosen for heartbeat detection and paired with pacemaker's resource manager to build the highly available system.
#Initialization
ntpdate 120.25.108.11
/root/init_system_centos7.sh
#Configure the hosts file (maxscale61, maxscale62)
cat >> /etc/hosts << EOF
192.168.5.61 maxscale61.blufly.com
192.168.5.62 maxscale62.blufly.com
192.168.5.51 db51.blufly.com
192.168.5.52 db52.blufly.com
192.168.5.53 db53.blufly.com
EOF
#Set up SSH trust between the two nodes
[root@maxscale61 ~]# ssh-keygen -t rsa
[root@maxscale61 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub -p 65535 [email protected]
[root@maxscale62 ~]# ssh-keygen -t rsa
[root@maxscale62 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub -p 65535 [email protected]
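#A quick sanity check (hedged sketch, not in the original; assumes sshd listens on port 65535 as used above): each node should reach the other without a password
ssh -p 65535 192.168.5.62 hostname    # run on maxscale61, should print maxscale62.blufly.com
ssh -p 65535 192.168.5.61 hostname    # run on maxscale62, should print maxscale61.blufly.com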
#####---------------- 1. Install MaxScale -----------------#####
#Create the monitoring/routing account on the MySQL master (after the earlier failover, db52 is now the master)
CREATE USER maxscale@'%' IDENTIFIED BY "balala369";
GRANT replication slave, replication client ON *.* TO maxscale@'%';
GRANT SELECT ON mysql.* TO maxscale@'%';
GRANT ALL ON maxscale_schema.* TO maxscale@'%';
GRANT SHOW DATABASES ON *.* TO maxscale@'%';
flush privileges;
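#A minimal verification sketch (not from the original article): confirm the grants, and that the maxscale account can log in to a backend over the network; run from any host with the mysql client, e.g. db51
mysql -uroot -p -e "SHOW GRANTS FOR 'maxscale'@'%';"
mysql -umaxscale -pbalala369 -h192.168.5.52 -P9106 -e "SELECT user,host FROM mysql.user LIMIT 3;"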
#Install MaxScale (maxscale61, maxscale62)
[root@maxscale61 opt]# yum -y install libcurl libaio openssl
[root@maxscale61 opt]# cd /opt
[root@maxscale61 opt]# wget https://downloads.mariadb.com/MaxScale/latest/centos/7server/x86_64/maxscale-2.2.13-1.centos.7.x86_64.rpm
[root@maxscale61 opt]# yum -y localinstall maxscale-2.2.13-1.centos.7.x86_64.rpm
[root@maxscale61 opt]# maxkeys
[root@maxscale61 opt]# maxpasswd balala369
47794130FFBA029760829CD50C10ABAC
chown -R maxscale:maxscale /var/lib/maxscale/
#MaxScale configuration file (maxscale61, maxscale62)
cat /etc/maxscale.cnf
[maxscale]
# Number of worker threads; default is 1, "auto" matches the number of CPU cores
threads=auto
# Millisecond precision in log timestamps
ms_timestamp=1
# Also write logs to syslog
syslog=1
# Write logs to MaxScale's own log file
maxlog=1
# Do not write logs to shared memory; can be enabled in debug mode for speed
log_to_shm=0
# Log warnings
log_warning=1
# Log notices
log_notice=1
# Log info messages
log_info=1
# Debug logging disabled
log_debug=0
# Augment log messages with extra context
log_augmentation=1

[server1]
type=server
address=192.168.5.51
port=9106
protocol=MariaDBBackend
serv_weight=3

[server2]
type=server
address=192.168.5.52
port=9106
protocol=MariaDBBackend
serv_weight=1

[server3]
type=server
address=192.168.5.53
port=9106
protocol=MariaDBBackend
serv_weight=3

[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
user=maxscale
passwd=47794130FFBA029760829CD50C10ABAC
monitor_interval=2000
detect_stale_master=true

[Read-Only-Service]
type=service
router=readconnroute
servers=server1,server2,server3
user=maxscale
passwd=47794130FFBA029760829CD50C10ABAC
router_options=slave
enable_root_user=1
weightby=serv_weight

[Read-Write-Service]
type=service
router=readwritesplit
servers=server1,server2,server3
user=maxscale
passwd=47794130FFBA029760829CD50C10ABAC
enable_root_user=1

[MaxAdmin-Service]
type=service
router=cli

[Read-Only-Listener]
type=listener
service=Read-Only-Service
protocol=MariaDBClient
port=4008

[Read-Write-Listener]
type=listener
service=Read-Write-Service
protocol=MariaDBClient
port=4006

[MaxAdmin-Listener]
type=listener
service=MaxAdmin-Service
protocol=maxscaled
socket=default
#Set up a systemd unit so maxscale can be managed with systemctl
vi /usr/lib/systemd/system/maxscale.service
[Unit]
Description=MariaDB MaxScale Database Proxy
After=network.target

[Service]
Type=forking
Restart=on-abort
# PIDFile=/var/run/maxscale/maxscale.pid
ExecStartPre=/usr/bin/install -d /var/run/maxscale -o maxscale -g maxscale
ExecStart=/usr/bin/maxscale --user=maxscale -f /etc/maxscale.cnf
TimeoutStartSec=120
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
#Test starting and stopping maxscale
systemctl start maxscale.service
systemctl status maxscale.service
systemctl stop maxscale.service
systemctl status maxscale.service
#Enable start on boot
systemctl enable maxscale.service
#Start maxscale
[root@maxscale61 opt]# maxscale --user=maxscale -f /etc/maxscale.cnf
[root@maxscale61 opt]# netstat -tnlup|grep maxscale
tcp        0      0 127.0.0.1:8989          0.0.0.0:*               LISTEN      31708/maxscale
tcp6       0      0 :::4008                 :::*                    LISTEN      31708/maxscale
tcp6       0      0 :::4006                 :::*                    LISTEN      31708/maxscale
#Log in to the MaxScale admin interface and check the backend connection status
[root@maxscale61 ~]# maxadmin -S /tmp/maxadmin.sock
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1            | 192.168.5.51    |  9106 |           0 | Slave, Running
server2            | 192.168.5.52    |  9106 |           0 | Master, Running
server3            | 192.168.5.53    |  9106 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------
MaxScale>
MaxScale> list services
Services.
--------------------------+-------------------+--------+----------------+-------------------
Service Name              | Router Module     | #Users | Total Sessions | Backend databases
--------------------------+-------------------+--------+----------------+-------------------
Read-Only-Service         | readconnroute     |      1 |              1 | server1, server2, server3
Read-Write-Service        | readwritesplit    |      1 |              1 | server1, server2, server3
MaxAdmin-Service          | cli               |      2 |              2 |
--------------------------+-------------------+--------+----------------+-------------------
###Verify the MaxScale monitor module by stopping the database service on db51
[root@db51 ~]# /etc/init.d/mysqld stop
Stopping mysqld (via systemctl):                           [  OK  ]
[root@maxscale61 opt]# maxadmin -S /tmp/maxadmin.sock
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1            | 192.168.5.51    |  9106 |           0 | Down
server2            | 192.168.5.52    |  9106 |           0 | Master, Running
server3            | 192.168.5.53    |  9106 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------
#Start the database service on db51 again
[root@db51 ~]# /etc/init.d/mysqld start
Starting mysqld (via systemctl):                           [  OK  ]
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1            | 192.168.5.51    |  9106 |           0 | Slave, Running
server2            | 192.168.5.52    |  9106 |           0 | Master, Running
server3            | 192.168.5.53    |  9106 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------
###Verify read/write splitting (run from db51; maxscale61 has no MySQL installed, so it has no mysql client)
[root@db51 ~]# mysql -ublufly -p852741 -h192.168.5.61 -P4006
#Note: the login here uses an ordinary MySQL application user, not the maxscale account
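#Hedged sketch (the blufly account and its grants are not shown in the original; adjust to your own application user): create the account on the current master (db52) if it does not exist yet, and let it replicate to the slaves
mysql -uroot -p -e "CREATE USER 'blufly'@'%' IDENTIFIED BY '852741'; GRANT ALL PRIVILEGES ON test.* TO 'blufly'@'%'; FLUSH PRIVILEGES;"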
MySQL [(none)]> select @@hostname;
+-----------------+
| @@hostname      |
+-----------------+
| db51.blufly.com |
+-----------------+
1 row in set (0.001 sec)
MySQL [mysql]> use test;
Database changed
#Create a table
MySQL [test]> CREATE TABLE bf_staff(
    -> staff_id INT NOT NULL AUTO_INCREMENT,
    -> staff_name VARCHAR(40) NOT NULL,
    -> staff_title VARCHAR(100) NOT NULL,
    -> entry_date DATE,
    -> PRIMARY KEY ( staff_id )
    -> )ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.167 sec)
MySQL [test]> show tables;
+----------------+
| Tables_in_test |
+----------------+
| bf_staff       |
+----------------+
1 row in set (0.001 sec)
#Insert data
MySQL [test]> insert into bf_staff (staff_name,staff_title,entry_date) values('张森','软件工程师','1988-10-11'),('王梅','人事专员','1993-3-20');
Query OK, 2 rows affected (0.012 sec)
Records: 2  Duplicates: 0  Warnings: 0
MySQL [test]> select * from bf_staff;
+----------+------------+-----------------+------------+
| staff_id | staff_name | staff_title     | entry_date |
+----------+------------+-----------------+------------+
|        1 | 张森       | 软件工程师      | 1988-10-11 |
|        2 | 王梅       | 人事专员        | 1993-03-20 |
+----------+------------+-----------------+------------+
2 rows in set (0.001 sec)
MySQL [test]> insert into bf_staff (staff_name,staff_title,entry_date) values('李自在','产品经理','1979-11-19'),('王衡','测试工程师','1995-6-2');
#On maxscale61, watch the read/write splitting in the MaxScale log
[root@maxscale61 ~]# cat /var/log/maxscale/maxscale.log
#SELECT queries are routed to db51
2018-09-12 16:51:46.262   info   : (5) [readwritesplit] (log_transaction_status): > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 16, type: QUERY_TYPE_SHOW_TABLES, stmt: show tables
2018-09-12 16:51:46.262   info   : (5) [readwritesplit] (handle_got_target): Route query to slave [192.168.5.51]:9106 <
2018-09-12 16:51:46.262   info   : (5) [readwritesplit] (clientReply): Reply complete, last reply from server1
2018-09-12 16:51:58.842   info   : (5) [readwritesplit] (log_transaction_status): > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 27, type: QUERY_TYPE_READ, stmt: select * from bf_staff
2018-09-12 16:51:58.842   info   : (5) [readwritesplit] (handle_got_target): Route query to slave [192.168.5.51]:9106 <
2018-09-12 16:51:58.843   info   : (5) [readwritesplit] (clientReply): Reply complete, last reply from server1
#INSERTs are routed to db52 (the master)
2018-09-12 16:59:52.066   info   : (5) [readwritesplit] (log_transaction_status): > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 149, type: QUERY_TYPE_WRITE, stmt: insert into bf_staff (staff_name,staff_title,entry_date) values('李自在','产品经理','1979-11-19'),('王衡','测试工程师','1995-6-2')
2018-09-12 16:59:52.066   info   : (5) [readwritesplit] (handle_got_target): Route query to master [192.168.5.52]:9106 <
2018-09-12 16:59:52.071   info   : (5) [readwritesplit] (clientReply): Reply complete, last reply from server2
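#A hedged extra check, not part of the original session: with the configuration above, the Read-Only-Listener on port 4008 should always hand the query to a slave, so @@hostname should never come back as the master
mysql -ublufly -p852741 -h192.168.5.61 -P4008 -e "select @@hostname;"
mysql -ublufly -p852741 -h192.168.5.61 -P4006 -e "select @@hostname;"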
##------- MaxScale caveats --------##
#Full list of limitations and known issues: https://mariadb.com/kb/en/mariadb-enterprise/mariadb-maxscale/limitations-and-known-issues-within-maxscale/
#The key points to be aware of:
1) The compressed client/server protocol is not supported when creating connections.
2) The routers do not dynamically recognize that the master role has migrated to another node.
3) LONGBLOB columns are not supported.
4) In the following cases statements are routed to the master (to keep transactions consistent):
   explicitly opened transactions;
   prepared statements;
   statements that call stored procedures or user-defined functions;
   multi-statement batches such as: INSERT INTO ... ; SELECT LAST_INSERT_ID();
5) Some statements are by default sent to all backend servers, but this can be controlled with
use_sql_variables_in=[master|all] (default: all)
When set to master, these statements are executed only on the master; autocommit changes and prepared statements, however, are still sent to all backend servers. (A hedged configuration sketch follows after this list.)
The statements in question are:
COM_INIT_DB (USE <db> creates this)
COM_CHANGE_USER
COM_STMT_CLOSE
COM_STMT_SEND_LONG_DATA
COM_STMT_RESET
COM_STMT_PREPARE
COM_QUIT (no response, session is closed)
COM_REFRESH
COM_DEBUG
COM_PING
SQLCOM_CHANGE_DB (USE ... statements)
SQLCOM_DEALLOCATE_PREPARE
SQLCOM_PREPARE
SQLCOM_SET_OPTION
SELECT ..INTO variable|OUTFILE|DUMPFILE
SET autocommit=1|0
6) MaxScale does not support authentication based on hostname matching; only IP-address host resolution is supported, so use the appropriate form when adding users.
7) Cross-database queries are not supported; such a query is resolved against the first database specified.
8) Changing session variables via SELECT statements is not supported.
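#Hedged example for point 5) above: pin session-variable statements to the master by adding use_sql_variables_in=master to the [Read-Write-Service] section of /etc/maxscale.cnf; the sed line below is only an illustration, editing the file by hand works just as well
sed -i '/^\[Read-Write-Service\]/a use_sql_variables_in=master' /etc/maxscale.cnf
systemctl restart maxscale.service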
#####------------ 2. Install and configure pacemaker + corosync --------------#####
#The official recommendation for MaxScale high availability is pacemaker + corosync
yum install pcs pacemaker corosync fence-agents-all -y
#Start the pcsd service and enable it at boot (maxscale61, maxscale62)
systemctl start pcsd.service
systemctl enable pcsd.service
#Set a password for hacluster. This user is created during package installation and is used locally to start the pcs processes, so it needs a password, and the password must be the same on every node (maxscale61, maxscale62)
echo "balala369" | passwd --stdin hacluster
#Authenticate the cluster nodes to each other
[root@maxscale61 ~]# pcs cluster auth 192.168.5.61 192.168.5.62
Username: hacluster
Password:
192.168.5.62: Authorized
192.168.5.61: Authorized
#Create the maxscalecluster cluster
[root@maxscale61 ~]# pcs cluster setup --name maxscalecluster 192.168.5.61 192.168.5.62
Destroying cluster on nodes: 192.168.5.61, 192.168.5.62...
192.168.5.62: Stopping Cluster (pacemaker)...
192.168.5.61: Stopping Cluster (pacemaker)...
192.168.5.62: Successfully destroyed cluster
192.168.5.61: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to '192.168.5.61', '192.168.5.62'
192.168.5.61: successful distribution of the file 'pacemaker_remote authkey'
192.168.5.62: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
192.168.5.61: Succeeded
192.168.5.62: Succeeded

Synchronizing pcsd certificates on nodes 192.168.5.61, 192.168.5.62...
192.168.5.62: Success
192.168.5.61: Success
Restarting pcsd on the nodes in order to reload the certificates...
192.168.5.62: Success
192.168.5.61: Success
#Check the generated corosync configuration file
cat /etc/corosync/corosync.conf
#Enable the cluster to start at boot
[root@maxscale61 ~]# pcs cluster enable --all
192.168.5.61: Cluster Enabled
192.168.5.62: Cluster Enabled
#Check the cluster status
[root@maxscale61 ~]# pcs cluster status
Error: cluster is not currently running on this node

#Behind the scenes, "pcs cluster start" triggers the following commands on each cluster node
[root@maxscale61 ~]# systemctl start corosync.service
[root@maxscale61 ~]# systemctl start pacemaker.service
[root@maxscale61 ~]# systemctl enable corosync
[root@maxscale61 ~]# systemctl enable pacemaker
[root@maxscale62 ~]# systemctl start corosync.service
[root@maxscale62 ~]# systemctl start pacemaker.service
[root@maxscale62 ~]# systemctl enable corosync
[root@maxscale62 ~]# systemctl enable pacemaker
[root@maxscale61 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: maxscale61.blufly.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
 Last updated: Tue Sep 18 16:05:30 2018
 Last change: Tue Sep 18 15:47:57 2018 by hacluster via crmd on maxscale61.blufly.com
 2 nodes configured
 0 resources configured

PCSD Status:
  maxscale62.blufly.com (192.168.5.62): Online
  maxscale61.blufly.com (192.168.5.61): Online
#Check the ring status on each node
[root@maxscale61 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.5.61
        status  = ring 0 active with no faults
[root@maxscale62 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
        id      = 192.168.5.62
        status  = ring 0 active with no faults
#Check the pacemaker processes
[root@maxscale61 ~]# ps axf |grep pacemaker
17859 pts/0    S+     0:00  |       \_ grep --color=auto pacemaker
17699 ?        Ss     0:00 /usr/sbin/pacemakerd -f
17700 ?        Ss     0:00  \_ /usr/libexec/pacemaker/cib
17701 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
17702 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd
17703 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
17704 ?        Ss     0:02  \_ /usr/libexec/pacemaker/pengine
17705 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd
#Check the cluster membership
[root@maxscale61 ~]# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.5.61)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.5.62)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
#Disable STONITH (no fencing devices in this setup)
pcs property set stonith-enabled=false
#Ignore loss of quorum (needed for a two-node cluster)
pcs property set no-quorum-policy=ignore
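#Quick check (hedged, not in the original): confirm both properties took effect
pcs property list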
#Verify that the configuration is valid
crm_verify -L -V
#Add the cluster resources with crm
[root@maxscale61 ~]# crm
-bash: crm: command not found
[root@maxscale61 ~]# rpm -qa pacemaker
pacemaker-1.1.18-11.el7_5.3.x86_64
#Since pacemaker 1.1.8, crm has been split out into a separate project called crmsh. Installing pacemaker therefore no longer provides the crm command; to manage cluster resources we also need to install crmsh, which pulls in several dependencies such as pssh.
[root@maxscale61 ~]# wget -O /etc/yum.repos.d/network:ha-clustering:Stable.repo http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/network:ha-clustering:Stable.repo
[root@maxscale61 ~]# yum -y install crmsh
[root@maxscale62 ~]# wget -O /etc/yum.repos.d/network:ha-clustering:Stable.repo http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/network:ha-clustering:Stable.repo
[root@maxscale62 ~]# yum -y install crmsh
#If the yum install fails, download the rpm packages and install them manually (maxscale61, maxscale62)
cd /opt
wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/noarch/crmsh-3.0.0-6.2.noarch.rpm
wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/noarch/crmsh-scripts-3.0.0-6.2.noarch.rpm
wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/noarch/pssh-2.3.1-7.3.noarch.rpm
wget http://mirror.yandex.ru/opensuse/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/noarch/python-parallax-1.0.1-29.1.noarch.rpm
wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/noarch/python-pssh-2.3.1-7.3.noarch.rpm
yum -y install crmsh-3.0.0-6.2.noarch.rpm crmsh-scripts-3.0.0-6.2.noarch.rpm pssh-2.3.1-7.3.noarch.rpm python-parallax-1.0.1-29.1.noarch.rpm python-pssh-2.3.1-7.3.noarch.rpm
#Configure the VIP and the monitored service (only on maxscale61)
crm
crm(live)# status
#List the services that the systemd resource class can manage; maxscale is among them
crm(live)ra# list systemd
crm(live)# configure
crm(live)configure# primitive maxscalevip ocf:IPaddr params ip=192.168.5.60 op monitor timeout=30s interval=60s
#192.168.5.60 is the floating IP, the resource is named maxscalevip, and the cluster monitors it every 60 seconds (30-second timeout)
#Configure the monitored service (maxscale.service)
crm(live)configure# primitive maxscaleserver systemd:maxscale op monitor timeout=30s interval=60s
#Put the VIP (maxscalevip) and the monitored service (maxscaleserver) into the same group
crm(live)configure# group maxscalegroup maxscalevip maxscaleserver
#Verify the configuration and commit the changes
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
#Check the resource status
crm(live)# status
Stack: corosync
Current DC: maxscale61.blufly.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Tue Sep 18 16:50:13 2018
Last change: Tue Sep 18 16:48:12 2018 by root via cibadmin on maxscale61.blufly.com

2 nodes configured
2 resources configured

Online: [ maxscale61.blufly.com maxscale62.blufly.com ]

Full list of resources:

 Resource Group: maxscalegroup
     maxscalevip        (ocf::heartbeat:IPaddr):        Started maxscale61.blufly.com
     maxscaleserver     (systemd:maxscale):             Started maxscale61.blufly.com

crm(live)# quit
#Check the started resources
[root@maxscale61 opt]# ip addr | grep 192.168.5.60
    inet 192.168.5.60/24 brd 192.168.5.255 scope global secondary eno16777984
[root@maxscale61 opt]# ps -ef | grep maxscale
maxscale 22159     1  0 16:48 ?        00:00:01 /usr/bin/maxscale
root     22529 13940  0 16:51 pts/0    00:00:00 grep --color=auto maxscale
#Failover test
#Stop the maxscale service on maxscale61
[root@maxscale61 opt]# systemctl stop maxscale.service
#When failing over with systemctl stop maxscale.service, the VIP does not move right away: the cluster first tries to restart the maxscale service on the local node (maxscale61: 192.168.5.61), and only after repeated failed attempts does it move the VIP and the service to the other node.
#This behaviour deserves praise; it is more sensible than MHA's. On a heavily loaded database you should not fail over the instant a service dies: first try to restart it on the original server, and only move if that fails. If the crash was caused by load, the restarted service also needs time to load hot data back into memory. (A hedged sketch of tuning this behaviour follows below.)
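#Hedged sketch, not part of the original configuration: how many local restart attempts Pacemaker makes before moving a resource is governed by migration-threshold, and resource-stickiness discourages unnecessary fail-back; the values below are purely illustrative
crm configure rsc_defaults resource-stickiness=100 migration-threshold=3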
#Now simulate a full outage of maxscale61 and check whether the maxscale service and the VIP fail over to maxscale62 (192.168.5.62).
[root@maxscale61 opt]# shutdown -h now
#After maxscale61 is powered off, the VIP switches to maxscale62 immediately. Watching ping, no packets are lost, so the switchover is effectively seamless. (A simple way to observe this is sketched below.)
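#Hedged sketch for watching the cutover from another machine (e.g. db51): keep a ping and a periodic query running against the VIP while the node is shut down
ping 192.168.5.60
while true; do mysql -ublufly -p852741 -h192.168.5.60 -P4006 -N -e "select @@hostname;"; sleep 1; done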
#Bring maxscale61 back up, then shut down maxscale62 and watch the VIP and maxscale service switch back.
[root@maxscale62 opt]# shutdown -h now
#Check the status of all cluster components
pcs status
#Kernel parameter tuning
#Enable IP forwarding
net.ipv4.ip_forward = 1
#Allow processes to bind to IP addresses not assigned to the local machine
net.ipv4.ip_nonlocal_bind = 1
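#A minimal way to persist and apply the two parameters on both MaxScale nodes (hedged sketch; the original does not show where they were set)
cat >> /etc/sysctl.conf << EOF
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
EOF
sysctl -p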
With this, a complete MySQL high-availability setup is in place!