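HA (high-availability cluster), also called two-node hot standby, keeps critical services running without interruption. For example, with two machines A and B, A normally serves requests while B stands by idle; as soon as A or its service goes down, service switches automatically to B. Open-source software for high availability includes heartbeat and keepalived, and keepalived also provides load balancing. heartbeat is widely used open-source cluster software, so it is well worth knowing how to configure it.
Note: the following describes installing heartbeat with yum and configuring it. The EPEL repository is required; if it is not enabled yet, run: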
# yum install -y epel-release
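1. Test environment:
Two CentOS 6.0 64-bit virtual machines (master: eth1: 192.168.220.11; slave: eth1: 192.168.220.22). A virtual IP is configured on the master host; this is the address that heartbeat moves between the two nodes. The heartbeat packets themselves travel over eth1, because each VM has only one NIC. In a real deployment the heartbeat link should use additional NICs or a serial cable, otherwise the single link is a safety risk.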
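2. Preparation:
【1】Set the hostname: (the names make the machines easy to tell apart and can be chosen freely, but they must match the node entries in ha.cf, which heartbeat compares against the output of uname -n)
master host:
# vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=master
# hostname master;bash
slave host:
# vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=slave
# hostname slave;bash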
【2】Edit /etc/hosts (same configuration on both hosts)
# vim /etc/hosts
192.168.220.11 master
192.168.220.22 slave
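The same hosts entries can be added from a script so that the step is repeatable on both machines. This is a minimal sketch, not part of the original procedure; the function name add_host_entry is made up for illustration, and it only checks whether the name is already mapped, not whether the existing IP is correct.

```shell
# add_host_entry FILE IP NAME: append "IP NAME" to FILE unless NAME is
# already present as a hostname on a non-comment line.
add_host_entry() {
    file=$1 ip=$2 name=$3
    # Fields 2..NF of each line are hostnames; field 1 is the address.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

On each host this could be called as add_host_entry /etc/hosts 192.168.220.11 master, then again for 192.168.220.22 slave; running it a second time changes nothing.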
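【3】Disable the firewall and SELinux
# iptables -F
# getenforce //if this prints Disabled, nothing more is needed; if it prints Enforcing, make the change below:
# vim /etc/selinux/config
SELINUX=enforcing --> SELINUX=disabled
The change in /etc/selinux/config takes effect after a reboot; setenforce 0 switches the running system to permissive mode immediately.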
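The edit to the SELinux config file can also be done non-interactively. A minimal sketch, assuming GNU sed (for the -i option); the function name set_selinux_disabled is hypothetical.

```shell
# set_selinux_disabled FILE: rewrite the SELINUX= line in FILE to SELINUX=disabled.
# SELINUXTYPE= and comment lines are untouched because the pattern
# is anchored on the exact prefix "SELINUX=".
set_selinux_disabled() {
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}
```

Typical use would be set_selinux_disabled /etc/selinux/config, run once on each host.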
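【4】Set up the virtual IP
# cd /etc/sysconfig/network-scripts
# cp ifcfg-eth1 ifcfg-eth1:0
# vim ifcfg-eth1:0 //a minimal configuration; most parameters are not needed:
DEVICE=eth1:0 //change to eth1:0
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
IPADDR=192.168.220.33 //change to 33
NETMASK=255.255.255.0
# /etc/init.d/network restart
# ifconfig //if the configuration is correct, the eth1:0 virtual interface is listed
(The // annotations here and in the later listings are explanations only and must not be typed into the files.)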
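3. Installing and configuring heartbeat:
【1】Install with yum:
# yum install -y heartbeat* libnet nginx //heartbeat depends on libnet; nginx is the service used in this test and can also be installed with yum
【2】Configuration on the master host:
# cd /usr/share/doc/heartbeat-3.0.4/ //mind the version; it may not be 3.0.4
# cp authkeys ha.cf haresources /etc/ha.d/ //copy the 3 core configuration files
# cd /etc/ha.d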
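(1) Edit authkeys
# vim authkeys //the last 4 lines are:
#auth 1
#1 crc //weakest
#2 sha1 HI! //strongest
#3 md5 Hello! //in between
Uncomment the auth line and change its value to 3, then uncomment the last line, so that the middle option (md5) is selected. The active lines are then auth 3 and 3 md5 Hello!.
# chmod 600 authkeys //set the mode to 600, otherwise heartbeat will not start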
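Instead of editing the sample file by hand, the two active lines can be written directly, with a random key rather than the sample string Hello!. This is a sketch under assumptions: the function name write_authkeys is made up, and the key is 16 random bytes from /dev/urandom printed as hex. It keeps the md5 method chosen above.

```shell
# write_authkeys FILE [KEY]: write "auth 3" plus a "3 md5 KEY" line to FILE
# and force mode 600. Without KEY, a random 32-hex-digit key is generated.
write_authkeys() {
    file=$1
    key=${2:-$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')}
    # The subshell keeps the restrictive umask from leaking into the caller.
    ( umask 077; printf 'auth 3\n3 md5 %s\n' "$key" > "$file" )
    chmod 600 "$file"
}
```

For example, write_authkeys /etc/ha.d/authkeys on master; the file then has to be copied unchanged to slave, because both nodes must share the same key.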
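(2) Edit haresources
# vim haresources //everything is commented out by default, so append one line at the end:
master 192.168.220.33/24/eth1:0 nginx //the IP is the virtual interface's IP, i.e. the address heartbeat moves between the nodes; 24 is the netmask prefix length; nginx is the name of the service under test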
(3) Edit ha.cf
# > ha.cf //empty the file
# vim !$ //edit it and add the following:
debugfile /var/log/ha-debug //debug log path
logfile /var/log/ha-log //runtime log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 60
udpport 694
ucast eth1 192.168.220.22 //the slave's NIC IP
auto_failback on
node master
node slave
ping 192.168.220.2 //arbitration address, usually the router, or another reliable IP that is always up
respawn hacluster /usr/lib64/heartbeat/ipfail
//Note: on 32-bit Linux the path uses lib rather than lib64. A path that does not match the system produces an error such as:
########## ERROR: Client child command [/usr/lib/heartbeat/ipfail] is not executable ##############
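Since only the peer address and the library directory differ between machines, the file can be generated from a small template. A minimal sketch using exactly the values listed above; the function name render_ha_cf is hypothetical, and the rule that only x86_64 maps to /usr/lib64 is an assumption that should be checked against where ipfail is actually installed.

```shell
# render_ha_cf PEER_IP [LIBDIR]: print an ha.cf for this test setup to stdout.
# PEER_IP is the other node's eth1 address; LIBDIR defaults by architecture.
render_ha_cf() {
    peer_ip=$1
    libdir=${2:-}
    if [ -z "$libdir" ]; then
        case $(uname -m) in
            x86_64) libdir=/usr/lib64 ;;  # assumption: 64-bit x86 only
            *)      libdir=/usr/lib ;;
        esac
    fi
    cat <<EOF
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 60
udpport 694
ucast eth1 $peer_ip
auto_failback on
node master
node slave
ping 192.168.220.2
respawn hacluster $libdir/heartbeat/ipfail
EOF
}
```

On master this would be used as render_ha_cf 192.168.220.22 > /etc/ha.d/ha.cf, and on slave with 192.168.220.11 as the argument.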
(4) Copy the configuration files to the slave host:
# scp authkeys ha.cf haresources slave:/etc/ha.d/
【3】Configuration on the slave host: only ha.cf needs to be changed:
ucast eth1 192.168.220.22 --> ucast eth1 192.168.220.11 //change the IP to the master's IP address
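The one-line change on slave can be applied with sed rather than an editor. A minimal sketch, assuming GNU sed and that the ucast line uses eth1 as in this setup; the function name set_ucast_peer is made up for illustration.

```shell
# set_ucast_peer FILE PEER_IP: point the "ucast eth1 ..." line in FILE at PEER_IP.
# All other lines are left unchanged.
set_ucast_peer() {
    file=$1 peer_ip=$2
    sed -i "s/^ucast[[:space:]][[:space:]]*eth1[[:space:]].*/ucast eth1 $peer_ip/" "$file"
}
```

On slave: set_ucast_peer /etc/ha.d/ha.cf 192.168.220.11.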
【4】Start heartbeat (master first, then slave)
(1) master host
# /etc/init.d/heartbeat start
Starting High-Availability services: INFO: Running OK
CRITICAL: Resource 192.168.220.33/24/eth1:0 is active, and should not be!
CRITICAL: Non-idle resources can affect data integrity!
info: If you don't know what this means, then get help!
info: Read the docs and/or source to /usr/share/heartbeat/ResourceManager for more details.
CRITICAL: Resource 192.168.220.33/24/eth1:0 is active, and should not be!
CRITICAL: Non-idle resources can affect data integrity!
info: If you don't know what this means, then get help!
info: Read the docs and/or the source to /usr/share/heartbeat/ResourceManager for more details.
CRITICAL: Non-idle resources will affect resource takeback!
CRITICAL: Non-idle resources may affect data integrity!
Done.
The CRITICAL messages appear because 192.168.220.33 was already brought up on eth1:0 in the preparation step, so heartbeat finds the resource active before it has started it.
heartbeat brings up nginx automatically, although the first start is slow. After a while (a little over 10 seconds), check whether nginx has been started:
# ps aux |grep nginx
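Rather than waiting and re-running ps by hand, the check can be polled. A minimal sketch; wait_until is a hypothetical helper, and the usage line assumes pgrep (from procps) is installed.

```shell
# wait_until TIMEOUT CMD...: run CMD once per second until it succeeds
# (returns 0) or until TIMEOUT seconds have elapsed (returns 1).
wait_until() {
    timeout=$1; shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For example, wait_until 60 pgrep -x nginx && echo "nginx is up" waits up to a minute for an nginx process to appear; 60 is an arbitrary choice that matches the initdead value in ha.cf.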
(2) Edit nginx's index.html to make it easy to see which machine is serving:
# > /usr/share/doc/nginx/html/index.html //empty it
# echo "masterMMMMMMMMMMMM" > !$
If nginx is running, open 192.168.220.33 (the virtual interface's IP) in a browser; the response should be: masterMMMMMMMMMMMM
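(3) slave host:
Normally nginx is not started here, because the master has not gone down, so ps aux |grep nginx shows no nginx process.
Edit nginx's index.html:
# > /usr/share/doc/nginx/html/index.html
# echo "slaveSSSSSSSSSSSSSS" > !$
Failure detection in this setup relies on ping: ipfail monitors the ping node set in ha.cf (192.168.220.2). So we block ping on master; once heartbeat detects that ping fails, it hands the nginx service over to slave:
# iptables -A INPUT -p icmp -j DROP //ping uses the ICMP protocol; dropping ICMP makes ping fail
You can now follow how heartbeat handles this with tail -f /var/log/ha-log:
master's ha-log: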
Jan 11 22:47:32 master heartbeat: [2574]: WARN: node 192.168.220.2: is dead //ping to the router 192.168.220.2 failed
Jan 11 22:47:32 master ipfail: [2601]: info: Status update: Node 192.168.220.2 now has status dead
Jan 11 22:47:32 master heartbeat: [2574]: info: Link 192.168.220.2:192.168.220.2 dead. //the router appears to be down
harc(default)[2929]: 2016/01/11_22:47:32 info: Running /etc/ha.d//rc.d/status status
Jan 11 22:47:33 master ipfail: [2601]: info: NS: We are dead. :<
Jan 11 22:47:33 master ipfail: [2601]: info: Link Status update: Link 192.168.220.2/192.168.220.2 now has status dead
Jan 11 22:47:34 master ipfail: [2601]: info: We are dead. :< //so it is actually us that is down
Jan 11 22:47:34 master ipfail: [2601]: info: Asking other side for ping node count.
Jan 11 22:47:37 master ipfail: [2601]: info: Giving up because we were told that we have less ping nodes.
Jan 11 22:47:37 master ipfail: [2601]: info: Delayed giveup in 4 seconds.
Jan 11 22:47:41 master ipfail: [2601]: info: giveup() called (timeout worked)
Jan 11 22:47:42 master heartbeat: [2574]: info: master wants to go standby [all]
Jan 11 22:47:42 master heartbeat: [2574]: info: standby: slave can take our all resources //slave is able to take over the service
Jan 11 22:47:42 master heartbeat: [2956]: info: give up all HA resources (standby). //give up our work
ResourceManager(default)[2969]: 2016/01/11_22:47:42 info: Releasing resource group: master 192.168.220.33/24/eth1:0 nginx
ResourceManager(default)[2969]: 2016/01/11_22:47:42 info: Running /etc/init.d/nginx stop //stop the nginx service
ResourceManager(default)[2969]: 2016/01/11_22:47:42 info: Running /etc/ha.d/resource.d/IPaddr 192.168.220.33/24/eth1:0 stop
IPaddr(IPaddr_192.168.220.33)[3057]: 2016/01/11_22:47:42 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.220.33)[3031]: 2016/01/11_22:47:42 INFO: Success
Jan 11 22:47:42 master heartbeat: [2956]: info: all HA resource release completed (standby).
Jan 11 22:47:42 master heartbeat: [2574]: info: Local standby process completed [all].
Jan 11 22:47:43 master heartbeat: [2574]: WARN: 1 lost packet(s) for [slave] [459:461]
Jan 11 22:47:43 master heartbeat: [2574]: info: remote resource transition completed. //remote resource transfer complete
Jan 11 22:47:43 master heartbeat: [2574]: info: No pkts missing from slave! //no data lost
Jan 11 22:47:43 master heartbeat: [2574]: info: Other node completed standby takeover of all resources. //slave has fully taken over our work
slave's ha-log:
Jan 12 11:48:17 slave ipfail: [115215]: info: Telling other node that we have more visible ping nodes. //tell master that we can still ping
Jan 12 11:48:22 slave heartbeat: [115188]: info: master wants to go standby [all] //master wants us to take over
Jan 12 11:48:22 slave heartbeat: [115188]: info: standby: acquire [all] resources from master //accept the resources from master
Jan 12 11:48:22 slave heartbeat: [115841]: info: acquire all HA resources (standby).
ResourceManager(default)[115854]: 2016/01/12_11:48:22 info: Acquiring resource group: master 192.168.220.33/24/eth1:0 nginx
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.220.33)[115882]: 2016/01/12_11:48:22 INFO: Resource is stopped
ResourceManager(default)[115854]: 2016/01/12_11:48:22 info: Running /etc/ha.d/resource.d/IPaddr 192.168.220.33/24/eth1:0 start //bring up the virtual IP interface
IPaddr(IPaddr_192.168.220.33)[116015]: 2016/01/12_11:48:22 INFO: Adding inet address 192.168.220.33/24 with broadcast address 192.168.220.255 to device eth1 (with label eth1:0) //the virtual interface now sits on our NIC
IPaddr(IPaddr_192.168.220.33)[116015]: 2016/01/12_11:48:22 INFO: Bringing device eth1 up
IPaddr(IPaddr_192.168.220.33)[116015]: 2016/01/12_11:48:22 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.220.33 eth1 192.168.220.33 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.220.33)[115989]: 2016/01/12_11:48:22 INFO: Success //interface configured
ResourceManager(default)[115854]: 2016/01/12_11:48:22 info: Running /etc/init.d/nginx start //start the nginx service
Jan 12 11:48:23 slave heartbeat: [115841]: info: all HA resource acquisition completed (standby). //all HA resources taken over
Jan 12 11:48:23 slave heartbeat: [115188]: info: Standby resource acquisition done [all]. //resource takeover done
Jan 12 11:48:24 slave heartbeat: [115188]: info: remote resource transition completed. //remote resource transfer complete, all done!
These logs show how heartbeat handles the failure. Entering the virtual IP 192.168.220.33 in the browser now returns:
slaveSSSSSSSSSSSSSS
At this point nginx on master has been stopped and nginx on slave has taken over, so the service keeps running.
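To watch the switch-over as it happens, the virtual IP can be polled from a third machine instead of refreshing the browser. A rough sketch: watch_vip and the FETCH override are hypothetical names, and curl is assumed to be installed on the machine doing the polling.

```shell
# watch_vip URL COUNT: fetch URL once per second, COUNT times, printing the
# time and the response body. FETCH overrides the fetch command
# (default: curl with a 2-second limit).
watch_vip() {
    url=$1 count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        body=$(${FETCH:-curl -s --max-time 2} "$url") || body="(no response)"
        printf '%s %s\n' "$(date +%T)" "$body"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then sleep 1; fi
    done
}
```

Running watch_vip http://192.168.220.33/ 60 while applying the iptables rule on master should show the body change from the master string to the slave string, possibly with a few "(no response)" lines in between.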
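If, instead of adding the firewall rule, you had stopped the heartbeat service on master, the result would be the same: slave takes over the nginx service. What happens, then, if the iptables rule just added is removed, or the heartbeat service is started again?
# iptables -D INPUT -p icmp -j DROP
# service heartbeat start
The result is that slave stops nginx automatically and nginx on master starts again and takes back the web service, because ha.cf sets auto_failback on. Try it yourself; refreshing the browser shows the result clearly.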