http://download.csdn.net/detail/blog_liuliang/9695450
Server  | Hostname  | IP           | ServerID | MySQL version
Master1 | tomcat1   | 172.18.3.180 | 1        | 5.5.32
Slave   | upfun     | 172.18.3.107 | 2        | 5.5.32
Slave   | localhost | 172.18.3.140 | 3        | 5.5.32
Monitor | node2     | 172.18.3.185 | 4        | 5.5.32

VIP          | Role   | Description
172.18.3.183 | Writer | IP configured in the application
I: Installation
1. Configure the my.cnf file
172.18.3.180:
log-bin=mysql-bin              # enable the binlog; replication will not work without it
server-id=1                    # the server-id must be unique
log_slave_updates = 1
auto-increment-increment = 2
auto-increment-offset = 1
172.18.3.140:
log-bin=mysql-bin              # enable the binlog; replication will not work without it
server-id=2                    # the server-id must be unique
log_slave_updates = 1
auto-increment-increment = 2
auto-increment-offset = 2
172.18.3.107:
log-bin=mysql-bin              # enable the binlog; replication will not work without it
server-id=3                    # the server-id must be unique
log_slave_updates = 1
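For reference, a minimal sketch of how these directives sit in a complete my.cnf (the [mysqld] section header and the restart command are assumptions about a typical install; adjust to your own layout):

[mysqld]
log-bin=mysql-bin
server-id=1
log_slave_updates = 1
auto-increment-increment = 2
auto-increment-offset = 1

# restart MySQL after editing (the init script name may differ on your system):
/etc/init.d/mysql restart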
2. Configure master-slave replication
Log in to the master1 database and run the following commands:
MySQL> grant replication slave on *.* to 'epel'@'%' identified by '123456';
MySQL> flush privileges;
MySQL> show master status;    (this yields output like the following)
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |      107 |              |                  |
+------------------+----------+--------------+------------------+
mysql-bin.000001 is the file we will start replicating from, and 107 is the starting position.
Log in to the master2 database and run the following commands:
MySQL> change master to master_host='172.18.3.180', master_user='epel',
master_password='123456', master_log_file='mysql-bin.000001', master_log_pos=107;
MySQL> start slave;
MySQL> show slave status\G    check the replication status (Slave_IO_Running: Yes and Slave_SQL_Running: Yes must both be Yes)
The above completes the slave's replication from master1; configure replication for the other servers in the same way.
If the master database already contains data, lock its tables first so that no further writes occur:
mysql> FLUSH TABLES WITH READ LOCK ;
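With the tables locked, the existing data can be copied to each slave before replication is started; a minimal sketch (credentials and the dump path are placeholders):

mysql> SHOW MASTER STATUS;     -- note File and Position while the lock is held
# in a second shell on the master, dump all databases (illustrative path)
mysqldump -uroot -p --all-databases > /tmp/master_full.sql
mysql> UNLOCK TABLES;
# load the dump on each slave, then run CHANGE MASTER TO with the noted file/position
mysql -uroot -p < /tmp/master_full.sql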
At this point, MySQL master-slave replication is set up.
3. Install MHA
Download and unpack the MHA packages, then install them with rpm -ivh *.rpm.
The MHA software consists of two parts, the Manager package and the Node package, described below.
The Manager package mainly includes the following tools:
masterha_check_ssh        check MHA's SSH configuration
masterha_check_repl       check MySQL replication status
masterha_manager          start MHA
masterha_check_status     check the current MHA running state
masterha_master_monitor   check whether the master is down
masterha_master_switch    control failover (automatic or manual)
masterha_conf_host        add or remove configured server entries
The Node package (these tools are normally triggered by MHA Manager scripts and need no manual operation) mainly includes the following tools:
save_binary_logs          save and copy the master's binary logs
apply_diff_relay_logs     identify differential relay log events and apply them to the other slaves
purge_relay_logs          purge relay logs (does not block the SQL thread)
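Because relay_log_purge is disabled on the slaves later in this setup (see the warnings at the end), relay logs are normally purged on a schedule with purge_relay_logs; a hedged example cron entry (credentials and workdir are placeholders reusing the values from app1.cnf):

# /etc/cron.d/purge_relay_logs on each slave (illustrative values)
0 4 * * * root /usr/bin/purge_relay_logs --user=root --password=xinwei --disable_relay_log_purge --workdir=/tmp >> /tmp/purge_relay_logs.log 2>&1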
4. Configure passwordless SSH login
[root@node2 ~]# ssh-keygen -t rsa
[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
[root@node2 .ssh]# ll
authorized_keys
id_rsa
id_rsa.pub
known_hosts
Following the steps above, configure passwordless SSH login on all four server nodes.
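A quick way to confirm key-based login works from the monitor (each command should print the remote hostname without prompting for a password; IPs as in the table above):

[root@node2 ~]# for h in 172.18.3.180 172.18.3.107 172.18.3.140; do ssh root@$h hostname; done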
5. Configure the monitor
Create two directories on the monitor server:
mkdir /etc/masterha
mkdir -p /masterha/app1
Create app1.cnf:
[root@node2 .ssh]# vi /etc/masterha/app1.cnf
[server default]
#manager dir
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
remote_workdir=/masterha/app1
#mysql manager user
user=root
password=xinwei
#node server user
ssh_user=root
#replication_user
repl_user=epel
repl_password=123456
#checking master every second
ping_interval=1
#promote script
#shutdown_script=""
master_ip_failover_script="/usr/local/bin/master_ip_failover"
master_ip_online_change_script="/usr/local/bin/master_ip_online_change"
report_script="/usr/local/bin/send_report"
[server1]
hostname=172.18.3.107
master_binlog_dir="/usr/local/mysql/data"
ssh_port=22
candidate_master=1
[server2]
hostname=172.18.3.180
master_binlog_dir="/usr/local/mysql/data"
ssh_port=22
candidate_master=1
[server3]
hostname=172.18.3.140
master_binlog_dir="/usr/local/mysql/data"
ssh_port=22
candidate_master=1
6. VIP setup
The VIP is managed through a script, here placed at /usr/local/bin/master_ip_failover (as referenced in app1.cnf); a sketch of such a script is given after the command below. When the VIP is managed by a script, it must first be bound manually on the current master. Alternatively, the VIP can be managed with third-party software such as keepalived.
[root@tomcat1 ~]# /sbin/ifconfig br0:2 172.18.3.183/24
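A minimal shell sketch of such a failover script is shown below. It is an assumption-based illustration, not the exact script used here; the argument names follow MHA's standard master_ip_failover callback interface, and the VIP/device values match this setup:

#!/bin/bash
# Hypothetical sketch of a master_ip_failover script for this environment.
# MHA Manager calls it with --command=status|stop|stopssh|start plus the
# old/new master hosts, passed as --key=value arguments.
vip="172.18.3.183/24"
dev="br0:2"

for arg in "$@"; do
  case "$arg" in
    --command=*)          command="${arg#*=}" ;;
    --orig_master_host=*) orig_master_host="${arg#*=}" ;;
    --new_master_host=*)  new_master_host="${arg#*=}" ;;
  esac
done

case "$command" in
  stop|stopssh)
    # failover step 1: take the VIP down on the failed master (best effort)
    ssh root@"$orig_master_host" "/sbin/ifconfig $dev down"
    ;;
  start)
    # failover step 2: bring the VIP up on the newly promoted master
    ssh root@"$new_master_host" "/sbin/ifconfig $dev $vip"
    ;;
  status)
    # used by the health/replication checks; report success
    exit 0
    ;;
esac
exit 0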
II: Usage
1. Check SSH
[root@node2 .ssh]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
If the last line of the output looks like the following, the check passed:
Thu Nov 24 13:04:29 2016 - [info] All SSH connection tests passed successfully.
2. Check replication
[root@node2 .ssh]# masterha_check_repl --conf=/etc/masterha/app1.cnf
If the last line of the output looks like the following, the check passed:
MySQL Replication Health is OK.
3. Start the monitor
Only start the monitor after both checks above have passed.
[root@node2 .ssh]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log < /dev/null 2>&1 &
Additional options can also be passed here, for example one that removes the failed host's entry from app1.cnf after each failover; a sketch follows.
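For example, a sketch using two standard masterha_manager options (--remove_dead_master_conf removes the dead master's section from app1.cnf after a failover, --ignore_last_failover skips the check on the previous failover marker file; verify them against your MHA version):

[root@node2 .ssh]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover > /tmp/mha_manager.log < /dev/null 2>&1 &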
4. Check the monitor status
[root@node2 .ssh]# masterha_check_status --conf=/etc/masterha/app1.cnf
If the output looks like the following, the monitor is running:
app1 (pid:9001) is running(0:PING_OK), master:172.18.3.180
5. View the monitor log
[root@node2 .ssh]# cd /masterha/app1/
[root@node2 app1]# tail -f manager.log
After every failover, a marker file is automatically created under /masterha/app1/; it must be deleted before the next failover can take place.
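By default the marker file is named after the application (here that would be app1.failover.complete, an assumption based on MHA's default naming); removing it looks like:

[root@node2 app1]# rm -f /masterha/app1/app1.failover.complete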
III: Failover test
Watch the running manager:
[root@node2 app1]# tail -f manager.log
IN SCRIPT TEST====/sbin/ifconfig br0:2 down==/sbin/ifconfig br0:2 172.18.3.183/24===
Checking the Status of the script.. OK
Thu Nov 24 15:56:14 2016 - [info] OK.
Thu Nov 24 15:56:14 2016 - [warning] shutdown_script is not defined.
Thu Nov 24 15:56:14 2016 - [info] Set master ping interval 1 seconds.
Thu Nov 24 15:56:14 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Thu Nov 24 15:56:14 2016 - [info] Starting ping health check on 172.18.3.180(172.18.3.180:3306)..
Thu Nov 24 15:56:14 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Shut down the MySQL database on 172.18.3.180, then check the log:
[root@node2 app1]# tail -f manager.log
----- Failover Report -----
app1: MySQL Master failover 172.18.3.180 to 172.18.3.107 succeeded
Master 172.18.3.180 is down!
Check MHA Manager logs at node2:/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 172.18.3.180.
The latest slave 172.18.3.107(172.18.3.107:3306) has all relay logs for recovery.
Selected 172.18.3.107 as a new master.
172.18.3.107: OK: Applying all logs succeeded.
172.18.3.107: OK: Activated master IP address.
172.18.3.140: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.18.3.140: OK: Applying all logs succeeded. Slave started, replicating from 172.18.3.107.
172.18.3.107: Resetting slave info succeeded.
Master failover to 172.18.3.107(172.18.3.107:3306) completed successfully.
Thu Nov 24 16:03:41 2016 - [info] Sending mail..
[root@upfun ~]# ip addr show br0
3: br0: mtu 1500 qdisc noqueue state UP
link/ether 90:b1:1c:66:0e:e2 brd ff:ff:ff:ff:ff:ff
inet 172.18.3.107/24 brd 172.18.3.255 scope global br0
valid_lft forever preferred_lft forever
inet 172.18.3.183/24 brd 172.18.3.255 scope global secondary br0:2
valid_lft forever preferred_lft forever
inet6 fe80::92b1:1cff:fe66:ee2/64 scope link
valid_lft forever preferred_lft forever
[root@tomcat1 ~]# ip addr show br0
4: br0: mtu 1500 qdisc noqueue state UP
link/ether 32:8b:5c:3f:e4:30 brd ff:ff:ff:ff:ff:ff
inet 172.18.3.180/24 brd 172.18.3.255 scope global br0
valid_lft forever preferred_lft forever
inet6 fe80::5427:8fff:fe4d:dec0/64 scope link
valid_lft forever preferred_lft forever
The VIP 172.18.3.183 has floated to the new master.
Once the failed master is repaired, it can rejoin the MHA cluster simply by being set up as a slave of the new master; a sketch follows.
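A hedged sketch of rejoining the repaired 172.18.3.180 as a slave of the new master 172.18.3.107 (restore any missing data first; the binlog file and position below are placeholders and must come from the new master's SHOW MASTER STATUS):

MySQL> change master to master_host='172.18.3.107', master_user='epel',
master_password='123456', master_log_file='mysql-bin.000001', master_log_pos=107;
MySQL> start slave;
MySQL> show slave status\G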
IV: Errors
1. Error 1
Running masterha_check_repl --conf=/etc/masterha/app1.cnf reports:
Testing mysql connection and privileges..sh: mysql: command not found
mysql command failed with rc 127:0!
at /usr/bin/apply_diff_relay_logs line 375
Solution: ln -s /usr/local/mysql/bin/mysql /usr/bin
2. Error 2
Running masterha_check_repl --conf=/etc/masterha/app1.cnf reports:
Can't exec "mysqlbinlog": No such file or directory at /usr/local/perl5/MHA/BinlogManager.pm line 99.
Solution: run which mysqlbinlog on the node; for example, the result here is:
[localhost~]$ which mysqlbinlog
/usr/local/mysql/bin/mysqlbinlog
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
3. Error 3
Running masterha_check_ssh --conf=/etc/masterha/app1.cnf reports:
Connecting via SSH root@192.168.17.200 ...
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password)
[error] [/usr/local/share/perl5/MHA/SSHcheck.pm,ln163]
Solution: this is usually a public-key problem; delete the entries for the affected IPs from /root/.ssh/known_hosts and redistribute the keys.
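For example (ssh-keygen -R removes a host's entry from known_hosts; re-run ssh-copy-id afterwards; the IP is illustrative):

[root@node2 ~]# ssh-keygen -R 172.18.3.180
[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]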
4. Error 4
Running masterha_check_repl --conf=/etc/masterha/app1.cnf reports:
Sun Nov 20 20:10:59 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000001
Sun Nov 20 20:10:59 2016 - [info] Connecting to [email protected](172.18.3.180)..
Failed to save binary log: Binlog not found from /data/mysqllog/3306! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again.
at /usr/bin/save_binary_logs line 117.
eval {...} called at /usr/bin/save_binary_logs line 66
main::main() called at /usr/bin/save_binary_logs line 62
Sun Nov 20 20:10:59 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln154] Master setting check failed!
Sun Nov 20 20:10:59 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln367] Master configuration failed.
Sun Nov 20 20:10:59 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. at /usr/bin/masterha_check_repl line 48.
Sun Nov 20 20:10:59 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Sun Nov 20 20:10:59 2016 - [info] Got exit code 1 (Not master dead).
Solution:
The master_binlog_dir entries in /etc/masterha/app1.cnf must point to the directory where MySQL actually writes its binary logs, as the error message states.
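For example, matching the binlog directory used elsewhere in this setup:

# in /etc/masterha/app1.cnf, under the affected [serverN] section
master_binlog_dir="/usr/local/mysql/data"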
5. Error 5
Running an online master switch (masterha_master_switch --conf=/etc/masterha/app1.cnf) reports:
Mon Nov 21 11:11:40 2016 - [info] MHA::MasterRotate version 0.55.
Mon Nov 21 11:11:40 2016 - [info] Starting online master switch..
Mon Nov 21 11:11:40 2016 - [info]
Mon Nov 21 11:11:40 2016 - [info] * Phase 1: Configuration Check Phase..
Mon Nov 21 11:11:40 2016 - [info]
Mon Nov 21 11:11:40 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Nov 21 11:11:40 2016 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Mon Nov 21 11:11:40 2016 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Mon Nov 21 11:11:40 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln604] There are 2 non-slave servers! MHA manages at most one non-slave server. Check configurations.
Mon Nov 21 11:11:40 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/share/perl5/vendor_perl/MHA/MasterRotate.pm line 85.
Solution:
Manually reconfigure the repaired old master as a slave of the new master (see the CHANGE MASTER TO sketch in the failover test section above).
6. Warning 1
Set relay_log_purge=0 on each slave, otherwise the warning below is reported. The parameter can be changed on the fly with mysql -e 'set global relay_log_purge=0', because any slave may be promoted to master at any time.
[warning] relay_log_purge=0 is not set on slave (172.18.3.140:3306).
7. Warning 2
Set read_only=1 on each slave, otherwise the message below is reported. The parameter can be changed on the fly with mysql -e 'set global read_only=1', because any slave may be promoted to master at any time.
[info] read_only=1 is not set on slave (172.18.3.107:3306).