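MaxScale high availability is built on keepalived, with the service exposed through a VIP.
MaxScale official site: https://mariadb.com/downloads/mariadb-tx/maxscale
MaxScale documentation (easier to browse than the official site): https://github.com/mariadb-corporation/MaxScale/tree/2.2/Documentation
MySQL master-slave replication in GTID mode makes it possible to repair the replication state quickly after the master fails.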
NAME | VERSION | IP | PORT | COMMENT |
maxscale | 2.2.6 | 172.16.10.114 | 4306,6603 | 4306 is the read/write split port, 6603 is the admin port |
master | 5.6.39 | 172.16.10.114 | 3308 | GTID replication |
slave | 5.6.39 | 172.16.10.114 | 3309 | GTID replication |
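MySQL GTID configuration
To enable GTID, add the following parameters to the configuration file (on MySQL 5.6 binary logging, log_bin, must be enabled as well):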
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON
Replication setup command (replace host, port, user and password with real values):
CHANGE MASTER TO MASTER_HOST='host', MASTER_PORT=port, MASTER_USER='user', MASTER_PASSWORD='password', MASTER_AUTO_POSITION=1;
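Note: when a mysqldump backup is restored into a GTID-enabled instance, the GTID information recorded in the dump can conflict with the target instance. There are two ways around this:
Method 1: add --set-gtid-purged=OFF when taking the backup, so that no GTID information is dumped.
Method 2: before restoring, log in to the instance and run reset master, then restore the data.
To skip a transaction that breaks replication on the slave, commit an empty transaction under its GTID:
stop slave;
set gtid_next='xxxxxxxx'; #GTID to skip
begin;
commit; #the empty transaction takes up this GTID
set gtid_next='AUTOMATIC';
change master to master_auto_position=1;
start slave;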
MaxScale installation and configuration
wget -c https://downloads.mariadb.com/MaxScale/2.2.6/rhel/6/x86_64/maxscale-2.2.6-1.rhel.6.x86_64.rpm
yum install maxscale-2.2.6-1.rhel.6.x86_64.rpm
cat /etc/maxscale.cnf
[maxscale]
threads=8 #number of worker threads, default 1
auth_connect_timeout=3600
auth_read_timeout=3600
auth_write_timeout=3600
[server1] #backend server definition
type=server
address=172.16.10.114
port=3308
protocol=MySQLBackend
server_weight=1
[server2]
type=server
address=172.16.10.114
port=3309
protocol=MySQLBackend
server_weight=1
#[server3]
#type=server
#address=172.16.10.114
#port=3310
#protocol=MySQLBackend
#server_weight=1
[readwritesplit] #read/write split service
type=service
router=readwritesplit
servers=server1,server2
user=connect
passwd=connect
weightby=server_weight
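max_slave_replication_lag=10 #maximum allowed replication lag; reads are no longer routed to a slave whose lag exceeds this value
[Read Service] #read service; despite the name it can also run DML and DDL, depending on the user's grants, so it is best thought of as a connection service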
type=service
router=readconnroute
router_options=master
servers=server1,server2
user=connect
passwd=connect
weightby=server_weight
[MySQL Monitor] #monitor configuration
type=monitor
module=mariadbmon
servers=server1,server2
user=monitor
passwd=monitor
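auto_failover=true #fail over automatically when the master goes down
auto_rejoin=true #rejoin a failed instance to the cluster automatically once it recovers
detect_standalone_master=true #detect a standalone master, i.e. whether the last remaining instance in the cluster may become master
allow_cluster_recovery=false #whether automatic cluster recovery is allowed
#failcount=3 #how many times the other slaves are checked for liveness before the last remaining instance becomes master, default 5
#monitor_interval=10000 #probe interval in milliseconds, default 2000
detect_stale_master=true #whether the master may keep serving when it is the only instance left or replication has failed everywhere
#detect_stale_slave=false
script=/tmp/reset_slave.sh #script to run when one of the events below occurs
events=master_down #events that trigger the script above
[Splitter-Service] #listener port for the read/write split service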
type=listener
service=readwritesplit
protocol=MySQLClient
port=4306
[Read Listener] #listener port for the read service
type=listener
service=Read Service
protocol=MySQLClient
port=4307
[MaxAdmin Service] #admin service
type=service
router=cli
[MaxAdmin Listener] #admin service port
type=listener
service=MaxAdmin Service
protocol=maxscaled
port=6603
The configuration above uses two database users: one that the services connect with and one for monitoring. Their grants are as follows:
connect
CREATE USER 'connect'@'172.16.10.114' IDENTIFIED BY 'connect';
GRANT SELECT ON mysql.user TO 'connect'@'172.16.10.114';
GRANT SELECT ON mysql.db TO 'connect'@'172.16.10.114';
GRANT SELECT ON mysql.tables_priv TO 'connect'@'172.16.10.114';
GRANT SHOW DATABASES ON *.* TO 'connect'@'172.16.10.114';
monitor
CREATE USER 'monitor'@'172.16.10.114' IDENTIFIED BY 'monitor';
GRANT RELOAD, SUPER, REPLICATION CLIENT ON *.* to 'monitor'@'172.16.10.114';
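Note: application accounts need grants for both the MaxScale host and the application host.
Here the MaxScale IP is 172.16.10.114 and the application IP is 172.16.10.238; for access to the database test, the grants are: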
grant select,update,delete,insert on test.* to test_rw@'172.16.10.114' identified by 'test_rw';
grant select,update,delete,insert on test.* to test_rw@'172.16.10.238' identified by 'test_rw';
grant select on test.* to test_r@'172.16.10.114' identified by 'test_r';
grant select on test.* to test_r@'172.16.10.238' identified by 'test_r';
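Since every application account needs the same grant repeated for each client host, the statements can be generated rather than typed by hand. The following is only a sketch: the function name gen_grants and its argument order are made up for illustration, and it only prints SQL, it does not connect to MySQL.

```shell
#!/bin/bash
# Print one GRANT statement per client host for the same account.
# Usage: gen_grants USER PASSWORD PRIVILEGES DB HOST...
# (hypothetical helper, not part of MaxScale or MySQL)
gen_grants() {
    local user="$1"
    local password="$2"
    local privs="$3"
    local db="$4"
    shift 4
    local host
    for host in "$@"; do
        echo "grant ${privs} on ${db}.* to ${user}@'${host}' identified by '${password}';"
    done
}

# Same accounts and hosts as in the grants above.
gen_grants test_rw test_rw "select,update,delete,insert" test 172.16.10.114 172.16.10.238
gen_grants test_r test_r "select" test 172.16.10.114 172.16.10.238
```

The output can be reviewed first and then piped into the mysql client on the master.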
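After a master/slave switchover, the connection address in this script must be changed to point at the new slave (in this setup both instances share one IP, so it is the port that changes):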
cat /tmp/reset_slave.sh
mysql -h127.0.0.1 -P3309 -uthunder -pthunder -Nse 'stop slave;reset slave all;'
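One way to avoid editing the script after every switchover is to take the target address as arguments. The sketch below is an assumption about how that could look, not the script used above: build_reset_cmd is a hypothetical name, and the block only prints the command (a dry run) instead of executing it.

```shell
#!/bin/bash
# Print the mysql command that clears the replication settings on a server.
# Host and port are arguments, so nothing has to be edited after a switchover.
# (hypothetical helper; credentials are the ones from the script above)
build_reset_cmd() {
    local host="$1"
    local port="$2"
    printf "mysql -h%s -P%s -uthunder -pthunder -Nse 'stop slave;reset slave all;'\n" "$host" "$port"
}

# Dry run: show the command for the given target, defaulting to 127.0.0.1:3309.
build_reset_cmd "${1:-127.0.0.1}" "${2:-3309}"
```

How the host and port reach the script when the monitor invokes it is left open here; that part depends on the monitor configuration.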
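Start MaxScale
/etc/init.d/maxscale start
Manage MaxScale
Use the maxadmin command; the default username and password are admin/mariadb
maxadmin -h127.0.0.1 -P6603 -uadmin -p
MaxScale can also be managed with the maxctrl command, which goes through the REST API. The commands are broadly the same; maxadmin is interactive, while maxctrl is non-interactive.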
MaxScale> help
Available commands:
add:
add user - Add an administrative account for using maxadmin over the network
add readonly-user - Add a read-only account for using maxadmin over the network
add server - Add a new server to a service
remove:
remove user - Remove account for using maxadmin over the network
remove server - Remove a server from a service or a monitor
create:
create server - Create a new server
create listener - Create a new listener for a service
create monitor - Create a new monitor
destroy:
destroy server - Destroy a server
destroy listener - Destroy a listener
destroy monitor - Destroy a monitor
alter:
alter server - Alter server parameters
alter monitor - Alter monitor parameters
alter service - Alter service parameters
alter maxscale - Alter maxscale parameters
set:
set server - Set the status of a server
set pollsleep - Set poll sleep period
set nbpolls - Set non-blocking polls
set log_throttling - Set the log throttling configuration
clear:
clear server - Clear server status
disable:
disable log-priority - Disable a logging priority
disable sessionlog-priority - [Deprecated] Disable a logging priority for a particular session
disable root - Disable root access
disable syslog - Disable syslog logging
disable maxlog - Disable MaxScale logging
disable account - Disable Linux user
enable:
enable log-priority - Enable a logging priority
enable sessionlog-priority - [Deprecated] Enable a logging priority for a session
enable root - Enable root user access to a service
enable syslog - Enable syslog logging
enable maxlog - Enable MaxScale logging
enable account - Activate a Linux user account for administrative MaxAdmin use
enable readonly-account - Activate a Linux user account for read-only MaxAdmin use
flush:
flush log - Flush the content of a log file and reopen it
flush logs - Flush the content of a log file and reopen it
list:
list clients - List all the client connections to MaxScale
list dcbs - List all active connections within MaxScale
list filters - List all filters
list listeners - List all listeners
list modules - List all currently loaded modules
list monitors - List all monitors
list services - List all services
list servers - List all servers
list sessions - List all the active sessions within MaxScale
list threads - List the status of the polling threads in MaxScale
list commands - List registered commands
reload:
reload config - [Deprecated] Reload the configuration
reload dbusers - Reload the database users for a service
restart:
restart monitor - Restart a monitor
restart service - Restart a service
restart listener - Restart a listener
shutdown:
shutdown maxscale - Initiate a controlled shutdown of MaxScale
shutdown monitor - Stop a monitor
shutdown service - Stop a service
shutdown listener - Stop a listener
show:
show dcbs - Show all DCBs
show dbusers - [deprecated] Show user statistics
show authenticators - Show authenticator diagnostics for a service
show epoll - Show the polling system statistics
show eventstats - Show event queue statistics
show filter - Show filter details
show filters - Show all filters
show log_throttling - Show the current log throttling setting (count, window (ms), suppression (ms))
show modules - Show all currently loaded modules
show monitor - Show monitor details
show monitors - Show all monitors
show persistent - Show the persistent connection pool of a server
show server - Show server details
show servers - Show all servers
show serversjson - Show all servers in JSON
show services - Show all configured services in MaxScale
show service - Show a single service in MaxScale
show session - Show session details
show sessions - Show all active sessions in MaxScale
show tasks - Show all active housekeeper tasks in MaxScale
show threads - Show the status of the worker threads in MaxScale
show users - Show enabled Linux accounts
show version - Show the MaxScale version number
sync:
sync logs - Flush log files to disk
call:
call command - Call module command
ping:
ping workers - Ping Workers
Type `help COMMAND` to see details of each command.
Where commands require names as arguments and these names contain
whitespace either the \ character may be used to escape the whitespace
or the name may be enclosed in double quotes ".
Check the current state of the MySQL instances
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server | Address | Port | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1 | 172.16.10.114 | 3308 | 0 | Master, Running
server2 | 172.16.10.114 | 3309 | 0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------
[root@Hexindai-BJ-DFGC-114 ~]# maxctrl list servers
┌─────────┬───────────────┬──────┬─────────────┬─────────────────┬──────┐
│ Server  │ Address       │ Port │ Connections │ State           │ GTID │
├─────────┼───────────────┼──────┼─────────────┼─────────────────┼──────┤
│ server1 │ 172.16.10.114 │ 3308 │ 0           │ Master, Running │      │
├─────────┼───────────────┼──────┼─────────────┼─────────────────┼──────┤
│ server2 │ 172.16.10.114 │ 3309 │ 0           │ Slave, Running  │      │
└─────────┴───────────────┴──────┴─────────────┴─────────────────┴──────┘
State after a replication failure:
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server | Address | Port | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1 | 172.16.10.114 | 3308 | 0 | Master, Running
server2 | 172.16.10.114 | 3309 | 0 | Running
-------------------+-----------------+-------+-------------+--------------------
Resolving replication problems
Recovery steps after the slave goes down:
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server | Address | Port | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1 | 172.16.10.114 | 3308 | 0 | Master, Running
server2 | 172.16.10.114 | 3309 | 0 | Maintenance, Down
-------------------+-----------------+-------+-------------+--------------------
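Add the slave back into the read/write split cluster:
Start the slave instance.
Start replication: start slave;
Once the slave has caught up with the master, run the following in MaxScale: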
MaxScale> clear server server2 maintenance
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server | Address | Port | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1 | 172.16.10.114 | 3308 | 0 | Master, Running
server2 | 172.16.10.114 | 3309 | 0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------
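The "slave has caught up" condition in the slave recovery steps can be checked from the Seconds_Behind_Master field of show slave status\G before clearing maintenance. This is a minimal sketch under that assumption; seconds_behind and caught_up are hypothetical helper names, and the sample line stands in for real mysql output.

```shell
#!/bin/bash
# Read `show slave status\G` text on stdin and print Seconds_Behind_Master.
seconds_behind() {
    awk -F': ' '/Seconds_Behind_Master/ {print $2}'
}

# Succeed when the lag is a number no larger than the threshold (default 0).
# A value of NULL (replication not running) counts as not caught up.
caught_up() {
    local lag="$1"
    local max_lag="${2:-0}"
    case "$lag" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$lag" -le "$max_lag" ]
}

# Sample line in the format printed by the mysql client.
sample='        Seconds_Behind_Master: 0'
lag=$(printf '%s\n' "$sample" | seconds_behind)
if caught_up "$lag"; then
    echo "slave caught up"
else
    echo "still lagging"
fi
```

In practice the sample would be replaced by the output of mysql -e 'show slave status\G' against the slave on port 3309.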
Recovery steps after the master goes down:
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server | Address | Port | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1 | 172.16.10.114 | 3308 | 0 | Down
server2 | 172.16.10.114 | 3309 | 0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server | Address | Port | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1 | 172.16.10.114 | 3308 | 0 | Maintenance, Down
server2 | 172.16.10.114 | 3309 | 0 | Master, Running
-------------------+-----------------+-------+-------------+--------------------
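The failover takes about 10 s, the product of the two parameters below at their defaults (2000 ms * 5); it never takes longer than monitor_interval*(failcount+1).
#failcount=3 #how many times the other slaves are checked for liveness before the last remaining instance becomes master, default 5
#monitor_interval=10000 #probe interval in milliseconds, default 2000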
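The timing can be worked out directly from the two parameters. The snippet below is only an illustration of that arithmetic, with a made-up function name max_failover_ms; the input values are the defaults quoted in the comments above.

```shell
#!/bin/bash
# Upper bound on failover time in milliseconds:
# monitor_interval * (failcount + 1)
max_failover_ms() {
    local monitor_interval_ms="$1"
    local failcount="$2"
    echo $(( monitor_interval_ms * (failcount + 1) ))
}

# Defaults: monitor_interval=2000 ms, failcount=5.
echo "typical: $(( 2000 * 5 )) ms"
echo "upper bound: $(max_failover_ms 2000 5) ms"
```

With the defaults this gives 10000 ms for the typical case and 12000 ms for the upper bound. With the commented-out values (failcount=3, monitor_interval=10000) the upper bound would be 40000 ms.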
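At this point the former slave has taken over from the old master and serves both reads and writes. To restore the master-slave topology, the old master becomes the new slave: start the instance and set up replication: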
change master to master_host='172.16.10.114', master_port=3309, master_user='repl', master_password='repl4slave', master_auto_position=1;
Check the replication status:
show slave status;
Add the new slave to the read/write split cluster:
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server | Address | Port | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1 | 172.16.10.114 | 3308 | 0 | Maintenance, Down
server2 | 172.16.10.114 | 3309 | 0 | Master, Running
-------------------+-----------------+-------+-------------+--------------------
MaxScale> clear server server1 maintenance
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server | Address | Port | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1 | 172.16.10.114 | 3308 | 0 | Slave, Running
server2 | 172.16.10.114 | 3309 | 0 | Master, Running
-------------------+-----------------+-------+-------------+--------------------
If high availability for MaxScale itself is not a concern, the rest of this document can be skipped.
INSTALL
#yum install keepalived
CONFIG ON MASTER
#cat /etc/keepalived/keepalived.conf
vrrp_script chk_myscript {
script "/opt/soft/is_maxscale_running.sh"
interval 2 # check every 2 seconds
fall 2 # require 2 failures for KO
rise 2 # require 2 successes for OK
}
vrrp_instance VI_1 {
state MASTER
interface em1
virtual_router_id 51
priority 150
advert_int 1
authentication {
auth_type PASS
auth_pass mypass
}
virtual_ipaddress {
192.168.1.13 #VIP
}
track_script {
chk_myscript
}
notify "/opt/soft/notify_script.sh"
}
CONFIG ON STANDBY
#cat /etc/keepalived/keepalived.conf
vrrp_script chk_myscript {
script "/opt/soft/is_maxscale_running.sh"
interval 2 # check every 2 seconds
fall 2 # require 2 failures for KO
rise 2 # require 2 successes for OK
}
vrrp_instance VI_1 {
state MASTER
interface em1
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass mypass
}
virtual_ipaddress {
192.168.1.13
}
track_script {
chk_myscript
}
notify "/opt/soft/notify_script.sh"
}
#cat is_maxscale_running.sh
#!/bin/bash
fileName="maxadmin_output.txt"
rm -f $fileName
timeout 2s maxadmin -h127.0.0.1 -uadmin -pmariadb list servers > $fileName
to_result=$?
if [ $to_result -ge 1 ]
then
    echo Timed out or error, timeout returned $to_result
    exit 3
else
    echo MaxAdmin success, rval is $to_result
    echo Checking maxadmin output sanity
    grep1=$(grep server1 $fileName)
    grep2=$(grep server2 $fileName)
    if [ "$grep1" ] && [ "$grep2" ]
    then
        echo All is fine
        exit 0
    else
        echo Something is wrong
        exit 3
    fi
fi
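The sanity check in the script above greps for each server name separately. A small variation, shown here only as a sketch, wraps that check in a function that takes the expected names as arguments; all_servers_listed is a hypothetical name, and the sample text mimics the list servers output shown earlier.

```shell
#!/bin/bash
# Succeed only if every expected server name appears as a whole word
# in the given text. Usage: all_servers_listed "OUTPUT" NAME...
all_servers_listed() {
    local output="$1"
    shift
    local name
    for name in "$@"; do
        printf '%s\n' "$output" | grep -q -w -- "$name" || return 1
    done
    return 0
}

# Sample rows in the format printed by `list servers`.
sample='server1 | 172.16.10.114 | 3308 | 0 | Master, Running
server2 | 172.16.10.114 | 3309 | 0 | Slave, Running'
if all_servers_listed "$sample" server1 server2; then
    echo All is fine
else
    echo Something is wrong
fi
```

Matching whole words keeps a name such as server10 from satisfying a check for server1.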
#cat notify_script.sh
#!/bin/bash
TYPE=$1
NAME=$2
STATE=$3
OUTFILE=./state.txt
touch $OUTFILE
case $STATE in
    "MASTER") echo "Setting this MaxScale node to active mode" > $OUTFILE
        maxctrl alter maxscale passive false
        exit 0
        ;;
    "BACKUP") echo "Setting this MaxScale node to passive mode" > $OUTFILE
        maxctrl alter maxscale passive true
        exit 0
        ;;
    "FAULT") echo "MaxScale failed the status check." > $OUTFILE
        maxctrl alter maxscale passive true
        exit 0
        ;;
    *) echo "Unknown state" > $OUTFILE
        exit 1
        ;;
esac
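The notify script boils down to a mapping from the keepalived state to the value of MaxScale's passive parameter. The sketch below isolates that mapping so it can be checked without keepalived or maxctrl installed; passive_for_state is a hypothetical helper name, and the loop only prints the commands instead of running them.

```shell
#!/bin/bash
# Map a keepalived state to the value for MaxScale's `passive` parameter.
# Unknown states produce no output and a non-zero exit status.
passive_for_state() {
    case "$1" in
        MASTER)       echo false ;;
        BACKUP|FAULT) echo true ;;
        *)            return 1 ;;
    esac
}

# Dry run: show which command each state would lead to.
for state in MASTER BACKUP FAULT; do
    echo "$state -> maxctrl alter maxscale passive $(passive_for_state "$state")"
done
```

Only the node that holds the VIP (state MASTER) ends up active; both BACKUP and FAULT put the node into passive mode.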
Start keepalived
/etc/init.d/keepalived start
Stop keepalived and MaxScale in turn and watch the VIP move between the nodes.