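Environment:
Three hosts: two form a master/slave pair and one is the management host.
M host (master): 192.168.1.34
S host (slave): 192.168.1.35
MG host (manager): 192.168.1.32
Outline of the steps:
Prerequisites:
a. Download the MHA packages mha4mysql-manager-0.53-0.el6.noarch and mha4mysql-node-0.53-0.el6.noarch.
Download page: http://code.google.com/p/mysql-master-ha/downloads/list
b. Set up SSH trust among the three hosts so that each can reach the others over ssh, because the manager node has to ssh to the target hosts during its operations.
c. Edit the hosts file on every host and keep the entries identical.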
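Prerequisite b can be sketched roughly as follows. The script only prints the commands so they can be reviewed first; the root user and the host list are taken from the environment above, while the key path ~/.ssh/id_rsa is an assumption. It would have to be repeated on each of the three hosts.

```shell
#!/bin/sh
# Sketch: print the commands that set up passwordless SSH from this host
# to every node. Review the output, then run the lines by hand.
HOSTS="192.168.1.32 192.168.1.34 192.168.1.35"

ssh_trust_commands() {
    # Generate a key pair only if none exists yet (key path is an assumption).
    echo "[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"
    # Copy the public key to every node, including this one.
    for host in $HOSTS; do
        echo "ssh-copy-id root@$host"
    done
}

ssh_trust_commands
```

Afterwards, ssh root@192.168.1.34 hostname (and likewise for the other hosts) should return without asking for a password.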
Configuration steps:
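1. Install the node package on both the M and S hosts: rpm -ivh mha4mysql-node-0.53-0.el6.noarch.rpm
2. Install both the node and the manager package on the MG host: rpm -ivh mha4mysql-node-0.53-0.el6.noarch.rpm mha4mysql-manager-0.53-0.el6.noarch.rpm
The manager package depends on other packages, so configuring the EPEL repository is recommended.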
3. Set up master/slave replication. The details are omitted here; plenty of examples are available online.
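For reference, a minimal sketch of step 3. It only prints the SQL so that it can be pasted into a mysql session. The repl account and the master address come from the configuration in this post; the '192.168.1.%' host pattern and the sample binlog coordinates are assumptions, and the real coordinates have to be read from SHOW MASTER STATUS.

```shell
#!/bin/sh
# Sketch: print the SQL for a basic master/slave setup (MySQL 5.5 syntax).
MASTER_HOST=192.168.1.34
REPL_USER=repl
REPL_PASS=repl

master_sql() {
    # Run on the master: create the replication account, show coordinates.
    cat <<EOF
GRANT REPLICATION SLAVE ON *.* TO '$REPL_USER'@'192.168.1.%' IDENTIFIED BY '$REPL_PASS';
FLUSH PRIVILEGES;
SHOW MASTER STATUS;
EOF
}

slave_sql() {
    # Run on the slave; $1 = binlog file, $2 = position from SHOW MASTER STATUS.
    cat <<EOF
CHANGE MASTER TO MASTER_HOST='$MASTER_HOST', MASTER_PORT=3306,
  MASTER_USER='$REPL_USER', MASTER_PASSWORD='$REPL_PASS',
  MASTER_LOG_FILE='$1', MASTER_LOG_POS=$2;
START SLAVE;
EOF
}

master_sql
slave_sql mysql-bin.000001 1113   # placeholder coordinates
```

Because both servers are marked candidate_master=1 later on, it probably makes sense to create the repl account on the slave as well.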
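4. Edit the configuration files. If you manage several clusters, put the shared settings in masterha_default.cnf; otherwise a single appl.cnf is enough.
The contents look roughly like this: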
[server default]
# create this directory on all three nodes
manager_workdir=/mha/appl
manager_log=/mha/appl/manager.log
remote_workdir=/mha/appl
[server1]
hostname=192.168.1.34
# binlog directory
master_binlog_dir=/mysql/mysql1/data
candidate_master=1
[server2]
hostname=192.168.1.35
master_binlog_dir=/mysql/mysql1/data
candidate_master=1
The defaults file follows:
/etc/masterha_default.cnf
[server default]
user=root
password=mysql
ssh_user=root
repl_user=repl
repl_password=repl
ping_interval=3
ping_type=select
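The working directory named in appl.cnf has to exist on all three nodes. A small sketch that prints the commands for creating it over ssh; the host list and the path are the ones used above, and running as root follows ssh_user in the defaults file.

```shell
#!/bin/sh
# Sketch: print the commands that create the MHA working directory
# on every node. Review the output, then run the lines by hand.
HOSTS="192.168.1.32 192.168.1.34 192.168.1.35"
WORKDIR=/mha/appl

workdir_commands() {
    for host in $HOSTS; do
        echo "ssh root@$host mkdir -p $WORKDIR"
    done
}

workdir_commands
```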
5. Test the configuration file
/usr/bin/masterha_check_repl --conf=/etc/appl.cnf
[root@skfweb2 /mha/appl]# masterha_check_repl --conf=/etc/appl.cnf
Fri Aug 22 21:25:10 2014 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Fri Aug 22 21:25:10 2014 - [info] Reading application default configurations from /etc/appl.cnf..
Fri Aug 22 21:25:10 2014 - [info] Reading server configurations from /etc/appl.cnf..
Fri Aug 22 21:25:10 2014 - [info] MHA::MasterMonitor version 0.53.
Fri Aug 22 21:25:10 2014 - [info] Dead Servers:
Fri Aug 22 21:25:10 2014 - [info] Alive Servers:
Fri Aug 22 21:25:10 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 22 21:25:10 2014 - [info] Current Alive Master: xxxxx.34(xxxxx:3306)
Fri Aug 22 21:25:10 2014 - [info] Checking slave configurations..
Fri Aug 22 21:25:10 2014 - [info] read_only=1 is not set on slave xxxxx.35(xxxxx.35:3306).
Fri Aug 22 21:25:10 2014 - [warning] relay_log_purge=0 is not set on slave xxxxxx.35(xxxxx.35:3306).
Fri Aug 22 21:25:10 2014 - [info] Checking replication filtering settings..
Fri Aug 22 21:25:10 2014 - [info] binlog_do_db= , binlog_ignore_db=
Fri Aug 22 21:25:10 2014 - [info] Replication filtering check ok.
Fri Aug 22 21:25:10 2014 - [info] Starting SSH connection tests..
Fri Aug 22 21:25:11 2014 - [info] All SSH connection tests passed successfully.
Fri Aug 22 21:25:11 2014 - [info] Checking MHA Node version..
Fri Aug 22 21:25:12 2014 - [info] Version check ok.
Fri Aug 22 21:25:12 2014 - [info] Checking SSH publickey authentication settings on the current master..
Fri Aug 22 21:25:12 2014 - [info] HealthCheck: SSH to 192.168.1.34 is reachable.
Fri Aug 22 21:25:12 2014 - [info] Master MHA Node version is 0.53.
Fri Aug 22 21:25:12 2014 - [info] Checking recovery script configurations on the current master..
Fri Aug 22 21:25:12 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/mysql1/data --output_file=/mha/appl/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000001
Fri Aug 22 21:25:12 2014 - [info] Connecting to root@192.168.1.34(192.168.1.34)..
Creating /mha/appl if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /mysql/mysql1/data, up to mysql-bin.000001
Fri Aug 22 21:25:12 2014 - [info] Master setting check done.
Fri Aug 22 21:25:12 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Aug 22 21:25:12 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=192.168.1.35 --slave_ip=192.168.1.35 --slave_port=3306 --workdir=/mha/appl --target_version=5.5.15-log --manager_version=0.53 --relay_log_info=/mysql/mysql1/data/relay-log.info --relay_dir=/mysql/mysql1/data/ --slave_pass=xxx
Fri Aug 22 21:25:12 2014 - [info] Connecting to root@192.168.1.35(192.168.35:22)..
Checking slave recovery environment settings..
Opening /mysql/mysql1/data/relay-log.info ... ok.
Relay log found at /mysql/mysql1, up to mysql_3306-relay-bin.000002
Temporary relay log file is /mysql/mysql1/mysql_3306-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Aug 22 21:25:12 2014 - [info] Slaves settings check done.
Fri Aug 22 21:25:12 2014 - [info]
192.168.1.34 (current master)
+--192.168.1.35
Fri Aug 22 21:25:12 2014 - [info] Checking replication health on 192.168.1.35..
Fri Aug 22 21:25:12 2014 - [info] ok.
Fri Aug 22 21:25:12 2014 - [warning] master_ip_failover_script is not defined.
Fri Aug 22 21:25:12 2014 - [warning] shutdown_script is not defined.
Fri Aug 22 21:25:12 2014 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
The output above shows that the configuration is healthy.
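The check still reports that read_only=1 and relay_log_purge=0 are not set on the slave. A minimal sketch of setting both at runtime; the script prints the mysql command instead of running it, and logging in as root with a password prompt is an assumption based on the defaults file.

```shell
#!/bin/sh
# Sketch: print the command that applies the two settings the check
# complained about on the slave.
SLAVE_HOST=192.168.1.35

slave_settings_sql() {
    # Both variables are dynamic, so no restart is needed; add them to
    # my.cnf as well if they should survive one.
    echo "SET GLOBAL read_only=1; SET GLOBAL relay_log_purge=0;"
}

# Print the command for review before running it by hand.
echo "mysql -uroot -p -h $SLAVE_HOST -e \"$(slave_settings_sql)\""
```

With relay_log_purge=0 the relay logs are no longer removed automatically, so they need to be cleaned up some other way; as far as I know the node package ships a purge_relay_logs script for that purpose.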
6. Start MHA
/usr/bin/masterha_manager --conf=/etc/appl.cnf &
Progress can be followed in the log file under /mha/appl/.
Check the status with /usr/bin/masterha_check_status --conf=/etc/appl.cnf
Stop MHA: /usr/bin/masterha_stop --conf=/etc/appl.cnf
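The three commands can be wrapped in a small helper. This is only a sketch: mha_ctl and the manager.out file are hypothetical names, and nohup with redirected stdin is used so that the manager should keep running after the terminal is closed.

```shell
#!/bin/sh
# Hypothetical helper wrapping the three MHA commands used above.
CONF=${CONF:-/etc/appl.cnf}
OUT=${OUT:-/mha/appl/manager.out}   # stdout/stderr of the manager (assumed path)

mha_ctl() {
    case "$1" in
        start)
            # nohup plus redirected stdin keeps the manager alive after logout.
            nohup masterha_manager --conf="$CONF" < /dev/null >> "$OUT" 2>&1 &
            ;;
        status) masterha_check_status --conf="$CONF" ;;
        stop)   masterha_stop --conf="$CONF" ;;
        *)      echo "usage: mha_ctl start|status|stop" >&2; return 2 ;;
    esac
}

# Dispatch only when an argument is given, e.g.: sh mha_ctl.sh status
[ $# -eq 0 ] || mha_ctl "$@"
```

Saved as mha_ctl.sh, it would be called as sh mha_ctl.sh start, sh mha_ctl.sh status or sh mha_ctl.sh stop.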
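A simple test follows:
Simulate a master/slave switch, which is useful when the master needs an upgrade and traffic is moved to the slave temporarily.
Do not start the manager process; replication must be healthy, with the slave applying logs normally.
There are two ways to change the master, failover and switch; the switch variant is shown below.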
masterha_master_switch --master_state=alive --conf=/etc/appl.cnf
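Both variants can also be written with the target hosts spelled out. The sketch below only prints the command lines. The extra flags (--new_master_host, --orig_master_is_new_slave, --dead_master_host) are taken from the MHA documentation rather than from this test; per that documentation, --orig_master_is_new_slave makes the original master start replicating from the new master as part of the switch.

```shell
#!/bin/sh
# Sketch: print the switch and failover command lines for review.
CONF=/etc/appl.cnf
OLD_MASTER=192.168.1.34
NEW_MASTER=192.168.1.35

switch_command() {
    # Online switch while the current master is still alive.
    echo "masterha_master_switch --master_state=alive --conf=$CONF" \
         "--new_master_host=$NEW_MASTER --orig_master_is_new_slave"
}

failover_command() {
    # Manual failover after the master has died.
    echo "masterha_master_switch --master_state=dead --conf=$CONF" \
         "--dead_master_host=$OLD_MASTER"
}

switch_command
failover_command
```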
Watch the slave's log while the master role moves from 192.168.1.34 to 192.168.1.35:
140822 17:26:05 [Note] Error reading relay log event: slave SQL thread was killed
140822 17:26:05 [Note] Slave I/O thread killed while connecting to master
140822 17:26:05 [Note] Slave I/O thread exiting, read up to log 'FIRST', position 4
140822 17:31:54 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='192.168.1.35', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='192.168.1.34', master_port='3306', master_log_file='mysql-bin.000001', master_log_pos='1113'.
140822 17:31:59 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000001' at position 1113, relay log '/mysql/mysql1/mysql_3306-relay-bin.000001' position: 4
140822 17:31:59 [Note] Slave I/O thread: connected to master 'repl@192.168.1.34:3306',replication started in log 'mysql-bin.000001' at position 1113
140822 20:33:07 [Note] Error reading relay log event: slave SQL thread was killed
140822 20:33:07 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
140822 20:33:07 [Note] Slave I/O thread killed while reading event
140822 20:33:07 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000001', position 365049
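Then, on the original master (192.168.1.34), point replication at the new master. The host must be the new master, 192.168.1.35, and the binlog file and position must be the new master's coordinates (as printed by masterha_master_switch or by SHOW MASTER STATUS on 192.168.1.35), not the values from the slave log above. MASTER_PORT and MASTER_LOG_POS are numeric and must not be quoted:
CHANGE MASTER TO MASTER_HOST='192.168.1.35', MASTER_PORT=3306, MASTER_LOG_FILE='<binlog file>', MASTER_LOG_POS=<position>, MASTER_USER='repl', MASTER_PASSWORD='repl';
After that, START SLAVE is all that is needed.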
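To confirm that the original master is replicating again, the two thread states in SHOW SLAVE STATUS can be checked. A small sketch; replication_ok is a hypothetical helper that reads the \G output on stdin.

```shell
#!/bin/sh
# Reads `SHOW SLAVE STATUS\G` output on stdin and succeeds only when both
# replication threads report Yes.
replication_ok() {
    awk -F': ' '
        /Slave_IO_Running:/  { io  = $2 }
        /Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

Usage, assuming root login on the original master: mysql -uroot -p -h 192.168.1.34 -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication healthy"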
This concludes the experiment. A VIP managed by Keepalived or heartbeat will be added in a follow-up.