MySQL High Availability Solution: MHA

A Brief Introduction to MHA

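MHA (Master High Availability) provides high availability for the master node and is one of the more mature MySQL high-availability solutions available today. A manager node monitors the state of the master and the slaves. When the master fails, the manager automatically promotes the slave whose data is closest to the master's to be the new master, which gives you automatic failover.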
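To make "the slave whose data is closest to the master" concrete, here is a minimal Python sketch of that comparison. It assumes each slave's progress is described by the master binlog file and position it has read (Master_Log_File and Read_Master_Log_Pos in SHOW SLAVE STATUS). The SlaveState type, the helper names and the positions in the demo are hypothetical, and real MHA also takes candidate_master, no_master and other settings into account.

```python
# Simplified sketch of "promote the slave closest to the master".
# Assumption: each slave reports the master binlog file and position it has
# read (Master_Log_File / Read_Master_Log_Pos in SHOW SLAVE STATUS).
# Real MHA also honours candidate_master, no_master and other settings.
from typing import NamedTuple


class SlaveState(NamedTuple):
    host: str
    master_log_file: str      # e.g. "Master-log.000006"
    read_master_log_pos: int  # byte offset read from that file


def binlog_key(state: SlaveState) -> tuple[int, int]:
    """Map a binlog file name and position to a sortable (sequence, pos) pair."""
    sequence = int(state.master_log_file.rsplit(".", 1)[1])
    return sequence, state.read_master_log_pos


def latest_slave(slaves: list[SlaveState]) -> SlaveState:
    """Return the slave that has read the furthest into the master's binlog."""
    if not slaves:
        raise ValueError("no alive slave to promote")
    return max(slaves, key=binlog_key)


if __name__ == "__main__":
    # Hypothetical positions, for illustration only.
    alive = [
        SlaveState("node2", "Master-log.000006", 1520),
        SlaveState("node3", "Master-log.000006", 980),
    ]
    print(latest_slave(alive).host)  # node2
```

The file sequence number is converted to an integer instead of being compared as a string, so the ordering does not depend on the zero-padding of the file suffix.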

MHA is written in Perl. Building it requires a few extra packages, but the build process itself is very simple.

Lab Topology

[Figure 1: Lab topology (MHA.gif)]

Key Points and Background

MHA consists of two main components. The Manager component acts as a supervisor.

The Node component is installed on the database nodes, one of which acts as the master.

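When the master fails, MHA has to switch to a new master automatically, which requires administrator (root) privileges on the hosts. The nodes therefore have to trust each other through SSH key authentication.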
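A small Python sketch of which key-based logins this implies, based on the pairs that masterha_check_ssh tests in the output further down, plus the manager's own connections to each database node. It assumes a single SSH user (root, matching ssh_user in the manager configuration) and that every source host already has a key pair created with ssh-keygen. The function names are hypothetical, and the script only prints ssh-copy-id commands instead of running them.

```python
# Sketch: enumerate the key-based SSH logins an MHA deployment needs.
# Assumptions: one SSH user (root, as ssh_user in app1.cnf) on all hosts, and
# each source host already has a key pair created with ssh-keygen.
from itertools import permutations


def ssh_trust_pairs(manager: str, db_nodes: list[str]) -> list[tuple[str, str]]:
    """Return the (source, target) pairs that must log in without a password."""
    pairs = [(manager, node) for node in db_nodes]  # manager -> every DB node
    pairs += list(permutations(db_nodes, 2))        # DB node -> every other DB node
    return pairs


def ssh_copy_id_commands(pairs: list[tuple[str, str]],
                         user: str = "root") -> dict[str, list[str]]:
    """Group the ssh-copy-id commands by the host they should be run on."""
    commands: dict[str, list[str]] = {}
    for source, target in pairs:
        commands.setdefault(source, []).append(f"ssh-copy-id {user}@{target}")
    return commands


if __name__ == "__main__":
    pairs = ssh_trust_pairs("node4", ["node1", "node2", "node3"])
    for host, cmds in ssh_copy_id_commands(pairs).items():
        print(f"# run on {host}")
        for cmd in cmds:
            print(cmd)
```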

Most of MHA's configuration is done on the manager.

Hostname   IP address      Role
node1      192.168.2.201   Master, node component installed
node2      192.168.2.202   Slave, node component installed
node3      192.168.2.203   Slave, node component installed
node4      192.168.2.204   Manager component installed

This article uses CentOS 7.1 with MariaDB 5.5.50.

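For the detailed semi-synchronous replication setup, see my previous article. To keep this one short, it focuses on installing, configuring and using the MHA components.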

Because the database is MariaDB 5.5.50, I chose to build mha4mysql 0.56, the version hosted on Google Code.

Note: SELinux and iptables are disabled in this setup.

Building and Installing with Perl

Download locations for the latest MHA:

mha4mysql-manager

mha4mysql-node

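An aside

The code used to be hosted on Google Code, and so were some of its introduction pages.

With the rise of GitHub, however, many projects moved there, and many of the RPM packages on Google Code are no longer available.

Since RPMs of unknown origin should not be installed in a production environment, I chose to build and install from source with Perl.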

(1) Install the build dependencies on every node

yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles ncftp perl-Params-Validate perl-CPAN perl-Test-Mock-LWP.noarch perl-LWP-Authen-Negotiate.noarch perl-devel perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker

(2) Install the manager component on node4

a. Run perl Makefile.PL to check the build environment; it serves roughly the same purpose as ./configure.

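The three database nodes node1 to node3, which run semi-synchronous replication, get the node component instead of the manager, but they go through the same two steps.

These steps rarely fail, and the node component needs no extra configuration, so I won't demonstrate them again. Note that the manager depends on the node component (MHA::NodeConst in the dependency check below), so mha4mysql-node has to be installed on node4 first as well.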

[root@node4 mha4mysql-manager-0.56]# perl Makefile.PL

*** Module::AutoInstall version 1.03

*** Checking for Perl dependencies...

[Core Features]

- DBI ...loaded. (1.627)

- DBD::mysql ...loaded. (4.023)

- Time::HiRes ...loaded. (1.9725)

- Config::Tiny ...loaded. (2.14)

- Log::Dispatch ...loaded. (2.41)

- Parallel::ForkManager ...loaded. (1.05)

- MHA::NodeConst ...loaded. (0.56)

*** Module::AutoInstall configuration finished.

Writing Makefile for mha4mysql::manager

b. Build and install with make && make install

[root@node4 mha4mysql-manager-0.56]# make&&make install

Skip blib/lib/MHA/ManagerUtil.pm (unchanged)

Skip blib/lib/MHA/Config.pm (unchanged)

Skip blib/lib/MHA/HealthCheck.pm (unchanged)

Skip blib/lib/MHA/ManagerConst.pm (unchanged)

Skip blib/lib/MHA/ServerManager.pm (unchanged)

Skip blib/lib/MHA/ManagerAdmin.pm (unchanged)

Skip blib/lib/MHA/FileStatus.pm (unchanged)

Skip blib/lib/MHA/ManagerAdminWrapper.pm (unchanged)

Skip blib/lib/MHA/MasterFailover.pm (unchanged)

Skip blib/lib/MHA/MasterRotate.pm (unchanged)

Skip blib/lib/MHA/MasterMonitor.pm (unchanged)

Skip blib/lib/MHA/SSHCheck.pm (unchanged)

Skip blib/lib/MHA/Server.pm (unchanged)

Skip blib/lib/MHA/DBHelper.pm (unchanged)

cp bin/masterha_stop blib/script/masterha_stop

/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_stop

cp bin/masterha_conf_host blib/script/masterha_conf_host

/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_conf_host

cp bin/masterha_check_repl blib/script/masterha_check_repl

/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_repl

cp bin/masterha_check_status blib/script/masterha_check_status

/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_status

cp bin/masterha_master_monitor blib/script/masterha_master_monitor

/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_master_monitor

cp bin/masterha_check_ssh blib/script/masterha_check_ssh

/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_ssh

cp bin/masterha_master_switch blib/script/masterha_master_switch

/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_master_switch

cp bin/masterha_secondary_check blib/script/masterha_secondary_check

/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_secondary_check

cp bin/masterha_manager blib/script/masterha_manager

/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_manager

Manifying blib/man1/masterha_stop.1

Manifying blib/man1/masterha_conf_host.1

Manifying blib/man1/masterha_check_repl.1

Manifying blib/man1/masterha_check_status.1

Manifying blib/man1/masterha_master_monitor.1

Manifying blib/man1/masterha_check_ssh.1

Manifying blib/man1/masterha_master_switch.1

Manifying blib/man1/masterha_secondary_check.1

Manifying blib/man1/masterha_manager.1

Appending installation info to /usr/lib64/perl5/perllocal.pod

Database Node Configuration

MariaDB configuration file of node1, the master in the semi-synchronous replication setup

[mysqld]

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

# Disabling symbolic-links is recommended to prevent assorted security risks

symbolic-links=0

# Settings user and group are ignored when systemd is used.

# If you need to run mysqld under a different user or group,

# customize your systemd unit file for mariadb according to the

# instructions in http://fedoraproject.org/wiki/Systemd

innodb_file_per_table = 1

skip_name_resolve = 1

log_bin = Master-log

log_bin_index = 1

server_id = 1

relay_log=relay-log

relay_log_purge=0

#skip-grant-tables

#skip-networking

[mysqld_safe]

log-error=/var/log/mariadb/mariadb.log

pid-file=/var/run/mariadb/mariadb.pid

#

# include all files from the config directory

#

!includedir /etc/my.cnf.d

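Note the following:

Both the master and the slaves must enable the binary log (log_bin, here log_bin = Master-log) and the relay log (relay_log=relay-log). A slave needs its own binary log because it may be promoted to master.

Automatic relay log purging is also disabled (relay_log_purge=0), because MHA needs the slaves' relay logs to bring the other slaves up to date during a failover. Old relay logs are cleaned up on the MHA side instead; the node package ships a purge_relay_logs tool for this.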
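These requirements can be checked mechanically. The Python sketch below is a hypothetical helper, not part of MHA. It parses an option file with configparser, which only approximates the MySQL option-file format (dashed spellings such as log-bin and include directives are not interpreted). It reports a missing log_bin, relay_log or server_id, a relay_log_purge other than 0, and, for slaves, a missing read_only = 1 like the one in the slave configuration below.

```python
# Sketch: check a MariaDB option file for the settings MHA relies on.
# Assumptions: configparser only approximates the MySQL option-file format;
# dashed spellings such as log-bin and include directives are not handled,
# and only the [mysqld] section is checked.
import configparser


def check_mysqld_options(text: str, is_slave: bool) -> list[str]:
    """Return a list of problems found in the [mysqld] section."""
    # allow_no_value accepts bare lines such as "!includedir /etc/my.cnf.d";
    # strict=False lets a repeated option (e.g. two log_bin lines) keep the last value.
    parser = configparser.ConfigParser(allow_no_value=True, strict=False,
                                       interpolation=None)
    parser.read_string(text)
    mysqld = parser["mysqld"]
    problems = []
    for option in ("log_bin", "relay_log", "server_id"):
        if option not in mysqld:
            problems.append(f"{option} is not set")
    if mysqld.get("relay_log_purge") != "0":
        problems.append("relay_log_purge should be 0")
    if is_slave and mysqld.get("read_only") != "1":
        problems.append("read_only should be 1 on a slave")
    return problems
```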

MariaDB configuration file of node2 and node3, the slaves in the semi-synchronous replication setup

[mysqld]

datadir=/var/lib/mysql/

socket=/var/lib/mysql/mysql.sock


# Disabling symbolic-links is recommended to prevent assorted security risks

symbolic-links=0

# Settings user and group are ignored when systemd is used.

# If you need to run mysqld under a different user or group,

# customize your systemd unit file for mariadb according to the

# instructions in http://fedoraproject.org/wiki/Systemd

skip_name_resolve=true

innodb_file_per_table=true

# server_id must be unique on every node, e.g. use 3 on node3

server_id = 2

log_bin=bin_log

relay_log=relay-log

read_only = 1

relay_log_purge=0

[mysqld_safe]

log-error=/var/log/mariadb/mariadb.log

pid-file=/var/run/mariadb/mariadb.pid

#

# include all files from the config directory

#

!includedir /etc/my.cnf.d

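Compared with the master, the slaves have one extra setting: read_only=1.

If a slave is promoted to master, MHA automatically turns read_only off on it. This happens at runtime (the global read_only variable is set to 0); MHA does not edit my.cnf, so remove read_only=1 from the new master's configuration file yourself, or the server will be read-only again after a restart.

MHA also repoints the remaining slaves to the new master, which you can verify with show slave status\G.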
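Conceptually, promoting node2 comes down to a handful of SQL statements. The Python sketch below only builds them as strings, under simplifying assumptions: the new master's binlog coordinates are already known and every slave has already caught up, whereas real MHA first applies any missing relay-log events before repointing the slaves. The function name is hypothetical; the coordinates bin_log.000002 / 245 come from the show slave status output later in this article. The SQL is assembled with plain string formatting, so it is only suitable for trusted values.

```python
# Simplified sketch of the statements a failover conceptually issues once a
# new master has been chosen. Assumptions: the new master's binlog file and
# position are known and every slave has caught up; real MHA first applies
# missing relay-log events before repointing the slaves.
def promotion_statements(new_master: str, other_slaves: list[str], log_file: str,
                         log_pos: int, repl_user: str, repl_password: str,
                         port: int = 3306) -> dict[str, list[str]]:
    """Return {host: [SQL statements]} for promoting new_master."""
    # The promoted slave stops replicating and becomes writable.
    plan = {new_master: ["STOP SLAVE", "SET GLOBAL read_only = 0"]}
    # Every other slave is pointed at the new master (trusted values only).
    change_master = (
        f"CHANGE MASTER TO MASTER_HOST='{new_master}', MASTER_PORT={port}, "
        f"MASTER_USER='{repl_user}', MASTER_PASSWORD='{repl_password}', "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos}"
    )
    for slave in other_slaves:
        plan[slave] = ["STOP SLAVE", change_master, "START SLAVE"]
    return plan


if __name__ == "__main__":
    # Values from this article's lab: node2 becomes master, node3 follows it.
    plan = promotion_statements("192.168.2.202", ["192.168.2.203"],
                                "bin_log.000002", 245, "repuser", "repuser")
    for host, statements in plan.items():
        for sql in statements:
            print(f"{host}: {sql};")
```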

Manager Node Configuration

(1) Copy the default file as a template, then empty the default file

cp /etc/masterha/masterha_default.cnf /etc/masterha/app1.cnf

> /etc/masterha/masterha_default.cnf

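(2) Configure /etc/masterha/app1.cnf, which is passed to the manager process when it starts.

A single MHA manager node can monitor several MHA clusters by running one process per cluster, hence the naming scheme app1, app2, and so on.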

[server default]

#manager_workdir=/var/log/masterha/app1

#manager_log=/var/log/masterha/app1/manager.log

user=root

password=123456789

manager_workdir=/data/masterha/app1

manager_log=/data/masterha/app1/manager.log

remote_workdir=/data/masterha/app1

ssh_user=root

repl_user=repuser

repl_password=repuser

ping_interval=1

[server1]

hostname=node1

candidate_master=1

[server2]

hostname=node2

candidate_master=1

[server3]

hostname=node3

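user and password are the database administrator's account and password.

repl_user and repl_password are the account and password with replication privileges.

ssh_user=root is the SSH account; because key authentication is used, no password is needed.

hostname=node1 works because the manager can reach that server by the name node1; an IP address can be used here as well.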
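If you want to script checks against this file, Python's configparser can read the simple layout above. This is a hypothetical helper for illustration only (MHA parses the file with its own Perl code). It merges [server default] into each [serverN] section and lists the hosts marked candidate_master=1. The password line is left out of the embedded sample.

```python
# Sketch: load an MHA application config such as /etc/masterha/app1.cnf.
# Assumption: configparser reads this simple key=value layout well enough;
# MHA itself uses its own Perl parser, so treat this as a checking aid only.
import configparser

APP1_CNF = """
[server default]
user=root
manager_workdir=/data/masterha/app1
ssh_user=root
repl_user=repuser
ping_interval=1

[server1]
hostname=node1
candidate_master=1

[server2]
hostname=node2
candidate_master=1

[server3]
hostname=node3
"""


def load_servers(text: str) -> dict[str, dict[str, str]]:
    """Return {section: settings}, with [server default] merged into each server."""
    parser = configparser.ConfigParser(interpolation=None)  # passwords may contain '%'
    parser.read_string(text)
    defaults = dict(parser["server default"]) if parser.has_section("server default") else {}
    return {
        name: {**defaults, **dict(parser[name])}
        for name in parser.sections()
        if name != "server default"
    }


def candidate_masters(servers: dict[str, dict[str, str]]) -> list[str]:
    """Hostnames marked candidate_master=1, in file order."""
    return [cfg["hostname"] for cfg in servers.values()
            if cfg.get("candidate_master") == "1"]


if __name__ == "__main__":
    print(candidate_masters(load_servers(APP1_CNF)))  # ['node1', 'node2']
```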

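(3) Create the working directory set by manager_workdir in the configuration file (-p also creates the missing parent /data/masterha)

mkdir -p /data/masterha/app1/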

Testing the Environment with MHA's Tools

(1) Test whether the SSH connections work

[root@node4 mha4mysql-manager-0.56]# masterha_check_ssh --conf=/etc/masterha/app1.cnf

Thu Nov 10 22:59:03 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Thu Nov 10 22:59:03 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..

Thu Nov 10 22:59:03 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..

Thu Nov 10 22:59:03 2016 - [info] Starting SSH connection tests..

Thu Nov 10 22:59:04 2016 - [debug]

Thu Nov 10 22:59:03 2016 - [debug] Connecting via SSH from root@node1(192.168.2.201:22) to root@node2(192.168.2.202:22)..

Thu Nov 10 22:59:03 2016 - [debug] ok.

Thu Nov 10 22:59:03 2016 - [debug] Connecting via SSH from root@node1(192.168.2.201:22) to root@node3(192.168.2.203:22)..

Thu Nov 10 22:59:03 2016 - [debug] ok.

Thu Nov 10 22:59:04 2016 - [debug]

Thu Nov 10 22:59:03 2016 - [debug] Connecting via SSH from root@node2(192.168.2.202:22) to root@node1(192.168.2.201:22)..

Thu Nov 10 22:59:04 2016 - [debug] ok.

Thu Nov 10 22:59:04 2016 - [debug] Connecting via SSH from root@node2(192.168.2.202:22) to root@node3(192.168.2.203:22)..

Thu Nov 10 22:59:04 2016 - [debug] ok.

Thu Nov 10 22:59:05 2016 - [debug]

Thu Nov 10 22:59:04 2016 - [debug] Connecting via SSH from root@node3(192.168.2.203:22) to root@node1(192.168.2.201:22)..

Thu Nov 10 22:59:04 2016 - [debug] ok.

Thu Nov 10 22:59:04 2016 - [debug] Connecting via SSH from root@node3(192.168.2.203:22) to root@node2(192.168.2.202:22)..

Thu Nov 10 22:59:05 2016 - [debug] ok.

Thu Nov 10 22:59:05 2016 - [info] All SSH connection tests passed successfully.

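There is a lot of output, but the last line alone tells you whether SSH works.

Note that the command points at the app1 configuration file created above.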
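When this check runs from a script, the result can be summarised instead of read by eye. The Python sketch below assumes the output has the format shown above (captured, for example, with 2>&1 as in the nohup command used later). It reports whether the final line is the success message and which source/target pairs were tested. The names are hypothetical.

```python
# Sketch: summarise masterha_check_ssh output instead of reading it all.
# Assumption: the log format matches the output shown in this article.
import re

PAIR_RE = re.compile(r"Connecting via SSH from (\S+?)\(.*?\) to (\S+?)\(")
SUCCESS = "All SSH connection tests passed successfully."


def summarize_ssh_check(output: str) -> tuple[bool, list[tuple[str, ...]]]:
    """Return (passed, [(source, target), ...]) from masterha_check_ssh output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # The run passed only if the last non-empty line is the success message.
    passed = bool(lines) and lines[-1].endswith(SUCCESS)
    pairs = [m.groups() for m in map(PAIR_RE.search, lines) if m]
    return passed, pairs
```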

(2) Test whether replication works

[root@node4 mha4mysql-manager-0.56]# masterha_check_repl --conf=/etc/masterha/app1.cnf

Thu Nov 10 23:07:35 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Thu Nov 10 23:07:35 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..

Thu Nov 10 23:07:35 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..

Thu Nov 10 23:07:35 2016 - [info] MHA::MasterMonitor version 0.56.

Thu Nov 10 23:07:35 2016 - [info] GTID failover mode = 0

Thu Nov 10 23:07:35 2016 - [info] Dead Servers:

Thu Nov 10 23:07:35 2016 - [info] Alive Servers:

Thu Nov 10 23:07:35 2016 - [info] node1(192.168.2.201:3306)

Thu Nov 10 23:07:35 2016 - [info] node2(192.168.2.202:3306)

Thu Nov 10 23:07:35 2016 - [info] node3(192.168.2.203:3306)

Thu Nov 10 23:07:35 2016 - [info] Alive Slaves:

Thu Nov 10 23:07:35 2016 - [info] node2(192.168.2.202:3306) Version=5.5.50-MariaDB (oldest major version between slaves) log-bin:enabled

Thu Nov 10 23:07:35 2016 - [info] Replicating from 192.168.2.201(192.168.2.201:3306)

Thu Nov 10 23:07:35 2016 - [info] Primary candidate for the new Master (candidate_master is set)

Thu Nov 10 23:07:35 2016 - [info] node3(192.168.2.203:3306) Version=5.5.50-MariaDB (oldest major version between slaves) log-bin:enabled

Thu Nov 10 23:07:35 2016 - [info] Replicating from 192.168.2.201(192.168.2.201:3306)

Thu Nov 10 23:07:35 2016 - [info] Current Alive Master: node1(192.168.2.201:3306)

Thu Nov 10 23:07:35 2016 - [info] Checking slave configurations..

Thu Nov 10 23:07:35 2016 - [warning] relay_log_purge=0 is not set on slave node3(192.168.2.203:3306).

Thu Nov 10 23:07:35 2016 - [info] Checking replication filtering settings..

Thu Nov 10 23:07:35 2016 - [info] binlog_do_db= , binlog_ignore_db=

Thu Nov 10 23:07:35 2016 - [info] Replication filtering check ok.

Thu Nov 10 23:07:35 2016 - [info] GTID (with auto-pos) is not supported

Thu Nov 10 23:07:35 2016 - [info] Starting SSH connection tests..

Thu Nov 10 23:07:37 2016 - [info] All SSH connection tests passed successfully.

Thu Nov 10 23:07:37 2016 - [info] Checking MHA Node version..

Thu Nov 10 23:07:37 2016 - [info] Version check ok.

Thu Nov 10 23:07:37 2016 - [info] Checking SSH publickey authentication settings on the current master..

Thu Nov 10 23:07:37 2016 - [info] HealthCheck: SSH to node1 is reachable.

Thu Nov 10 23:07:37 2016 - [info] Master MHA Node version is 0.56.

Thu Nov 10 23:07:37 2016 - [info] Checking recovery script configurations on node1(192.168.2.201:3306)..

Thu Nov 10 23:07:37 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=Master-log.000006

Thu Nov 10 23:07:37 2016 - [info] Connecting to root@192.168.2.201(node1:22)..

Creating /data/masterha/app1 if not exists.. ok.

Checking output directory is accessible or not..

ok.

Binlog found at /var/lib/mysql, up to Master-log.000006

Thu Nov 10 23:07:38 2016 - [info] Binlog setting check done.

Thu Nov 10 23:07:38 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..

Thu Nov 10 23:07:38 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=node2 --slave_ip=192.168.2.202 --slave_port=3306 --workdir=/data/masterha/app1 --target_version=5.5.50-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx

Thu Nov 10 23:07:38 2016 - [info] Connecting to root@192.168.2.202(node2:22)..

Checking slave recovery environment settings..

Opening /var/lib/mysql/relay-log.info ... ok.

Relay log found at /var/lib/mysql, up to relay-log.000004

Temporary relay log file is /var/lib/mysql/relay-log.000004

Testing mysql connection and privileges.. done.

Testing mysqlbinlog output.. done.

Cleaning up test file(s).. done.

Thu Nov 10 23:07:38 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=node3 --slave_ip=192.168.2.203 --slave_port=3306 --workdir=/data/masterha/app1 --target_version=5.5.50-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx

Thu Nov 10 23:07:38 2016 - [info] Connecting to root@192.168.2.203(node3:22)..

Checking slave recovery environment settings..

Opening /var/lib/mysql/relay-log.info ... ok.

Relay log found at /var/lib/mysql, up to relay-log.000002

Temporary relay log file is /var/lib/mysql/relay-log.000002

Testing mysql connection and privileges.. done.

Testing mysqlbinlog output.. done.

Cleaning up test file(s).. done.

Thu Nov 10 23:07:38 2016 - [info] Slaves settings check done.

Thu Nov 10 23:07:38 2016 - [info]

node1(192.168.2.201:3306) (current master)

+--node2(192.168.2.202:3306)

+--node3(192.168.2.203:3306)

Thu Nov 10 23:07:38 2016 - [info] Checking replication health on node2..

Thu Nov 10 23:07:38 2016 - [info] ok.

Thu Nov 10 23:07:38 2016 - [info] Checking replication health on node3..

Thu Nov 10 23:07:38 2016 - [info] ok.

Thu Nov 10 23:07:38 2016 - [warning] master_ip_failover_script is not defined.

Thu Nov 10 23:07:38 2016 - [warning] shutdown_script is not defined.

Thu Nov 10 23:07:38 2016 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
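The final line says replication is healthy, but the [warning] lines deserve a look. Here relay_log_purge is reported as not set on node3, and no master_ip_failover_script is defined, which means MHA itself will not move a virtual IP or otherwise redirect clients on failover. A minimal Python sketch, assuming the log format shown above, that separates the verdict from the warnings (names are hypothetical):

```python
# Sketch: pull the verdict and the [warning] lines out of masterha_check_repl
# output. Assumption: the log format matches the output shown in this article.
OK_LINE = "MySQL Replication Health is OK."
MARKER = "[warning] "


def summarize_repl_check(output: str) -> tuple[bool, list[str]]:
    """Return (healthy, warnings) from masterha_check_repl output."""
    warnings = []
    for line in output.splitlines():
        pos = line.find(MARKER)
        if pos != -1:
            # Keep only the message text after the "[warning] " marker.
            warnings.append(line[pos + len(MARKER):].strip())
    healthy = any(line.strip() == OK_LINE for line in output.splitlines())
    return healthy, warnings
```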

(3) Now for the most exciting moment: start the service!

[root@node4 mha4mysql-manager-0.56]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/masterha/app1/manager.log 2>&1 &

[1] 8463

(4) Check whether masterha is running normally, and which node is the master

[root@node4 mha4mysql-manager-0.56]# masterha_check_status --conf=/etc/masterha/app1.cnf

app1 (pid:8463) is running(0:PING_OK), master:node1
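The status line has a regular shape, so monitoring scripts can parse it. A Python sketch, assuming the two forms seen in this article (running with a pid and a master, and stopped). The regular expression and function name are hypothetical and may not cover every status MHA can print.

```python
# Sketch: parse the one-line result of masterha_check_status.
# Assumption: the line looks like the two examples in this article.
import re
from typing import Optional

STATUS_RE = re.compile(
    r"^(?P<app>\S+)(?: \(pid:(?P<pid>\d+)\))? is (?P<state>\w+)"
    r"\((?P<code>\d+):(?P<label>\w+)\)(?:, master:(?P<master>\S+))?"
)


def parse_status(line: str) -> Optional[dict]:
    """Return app, pid, state, code, label and master, or None if unrecognised."""
    m = STATUS_RE.match(line.strip())
    if m is None:
        return None
    info = m.groupdict()
    info["code"] = int(info["code"])
    info["pid"] = int(info["pid"]) if info["pid"] else None
    return info
```

The numeric code (0 for PING_OK, 2 for NOT_RUNNING in this article) is convenient for an alerting rule.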

Simulating a Master Failure

(1) Stop MariaDB on the master node, node1

systemctl stop mariadb.service

(2) Check the status from the manager node

[root@node4 mha4mysql-manager-0.56]# masterha_check_status --conf=/etc/masterha/app1.cnf

app1 is stopped(2:NOT_RUNNING).

[1]+ Done nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/masterha/app1/manager.log 2>&1

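As you can see, the MHA program masterha_manager has exited: it stops by design once a failover is complete.

Also note that a file named app1.failover.complete is created in the working directory /data/masterha/app1/.

Before starting the manager again, delete this file; otherwise the start may fail.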
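A minimal Python sketch of that clean-up step, assuming the marker is always named <app>.failover.complete inside manager_workdir, as app1.failover.complete is here. The helper name is hypothetical, and the demo uses a temporary directory instead of /data/masterha/app1.

```python
# Sketch: remove the failover marker before restarting masterha_manager.
# Assumption: the marker is named "<app>.failover.complete" inside
# manager_workdir, as with app1.failover.complete in this article.
import tempfile
from pathlib import Path


def clear_failover_flag(workdir: str, app: str = "app1") -> bool:
    """Delete <workdir>/<app>.failover.complete; return True if it existed."""
    flag = Path(workdir) / f"{app}.failover.complete"
    if flag.exists():
        flag.unlink()
        return True
    return False


if __name__ == "__main__":
    # Demonstration in a throwaway directory instead of /data/masterha/app1.
    with tempfile.TemporaryDirectory() as workdir:
        (Path(workdir) / "app1.failover.complete").touch()
        print(clear_failover_flag(workdir))  # True
        print(clear_failover_flag(workdir))  # False
```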

(3) On node3, check the slave status: node3 now replicates from the new master, node2 (192.168.2.202).

MariaDB [(none)]> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.2.202

Master_User: repuser

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: bin_log.000002

Read_Master_Log_Pos: 245

Relay_Log_File: relay-log.000002

Relay_Log_Pos: 527

Relay_Master_Log_File: bin_log.000002

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 245

Relay_Log_Space: 815

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 2
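To verify this from a script instead of by eye, the vertical \G output can be turned into a dictionary. A Python sketch, assuming the text was captured from something like mysql -e 'SHOW SLAVE STATUS\G'; the helper names are hypothetical. It checks that both replication threads are running and that Master_Host is the expected new master.

```python
# Sketch: turn SHOW SLAVE STATUS\G output into a dict and check it.
# Assumption: the text is in the vertical (\G) format shown in this article.


def parse_vertical(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines, skipping the '*** 1. row ***' banner."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("*"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def follows_master(fields: dict[str, str], master_host: str) -> bool:
    """True if both replication threads run and the slave points at master_host."""
    return (fields.get("Slave_IO_Running") == "Yes"
            and fields.get("Slave_SQL_Running") == "Yes"
            and fields.get("Master_Host") == master_host)
```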

(4) The read-only setting node2 had as a slave has also been turned off automatically.

MariaDB [(none)]> show variables like '%read_only%';

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| read_only | OFF |

+---------------+-------+

1 row in set (0.00 sec)

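(5) Rebuilding after the failure

As described above, when the original master failed, masterha_manager used the state of the binary logs and relay logs to elect a new master, switched it from read-only to read-write, and then exited.

So what comes next?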

a. Delete the failover.complete file in the working directory, e.g. /data/masterha/app1/app1.failover.complete.

b. Rebuild the original master, node1: empty its database, take a full backup of node2 and restore it on node1, then configure node1 as a slave of the new master, node2.

c. Run the masterha_check tools again to verify the environment, then restart MHA's main program, masterha_manager.
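For step b, one common approach (an assumption here, since the article does not name its backup tool) is to back up node2 with mysqldump --all-databases --single-transaction --master-data=2. This records node2's binlog coordinates as a commented CHANGE MASTER TO line near the top of the dump. The Python sketch below extracts those coordinates and builds the statement that points node1 at node2. The helper names are hypothetical, the host and replication account come from this article's lab, and the sample coordinates in the test are illustrative only.

```python
# Sketch: read the binlog coordinates recorded in a full backup of node2 and
# build the CHANGE MASTER TO statement for node1.
# Assumptions: the backup was taken with mysqldump --master-data=2, which
# writes a commented "-- CHANGE MASTER TO ..." line near the top of the dump;
# host and account defaults follow this article's lab.
import re

COORD_RE = re.compile(
    r"CHANGE MASTER TO MASTER_LOG_FILE='(?P<file>[^']+)',\s*MASTER_LOG_POS=(?P<pos>\d+)"
)


def coordinates_from_dump(header: str) -> tuple[str, int]:
    """Return (log_file, log_pos) from the dump header."""
    m = COORD_RE.search(header)
    if m is None:
        raise ValueError("no CHANGE MASTER line found; was --master-data used?")
    return m.group("file"), int(m.group("pos"))


def rejoin_statement(header: str, master_host: str = "192.168.2.202",
                     user: str = "repuser", password: str = "repuser") -> str:
    """CHANGE MASTER TO statement pointing the old master at the new one."""
    log_file, log_pos = coordinates_from_dump(header)
    return (f"CHANGE MASTER TO MASTER_HOST='{master_host}', MASTER_USER='{user}', "
            f"MASTER_PASSWORD='{password}', MASTER_LOG_FILE='{log_file}', "
            f"MASTER_LOG_POS={log_pos}")
```

After running the generated statement on node1, START SLAVE begins replication from node2.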
