Heartbeat 2.1.3 two-node hot standby

Hardware environment
CentOS1
eth0 Host-only 192.168.91.129/24 server201 public IP address
eth1 Custom(VMnet2) 10.0.0.201/8 HA01      heartbeat address

CentOS2
eth0 Host-only 192.168.91.131/24 server202 public IP address
eth1 Custom(VMnet2) 10.0.0.202/8 HA02      heartbeat address

eth0:0 192.168.91.130 cluster IP address for the public service (Heartbeat creates this alias as eth0:0, as the logs below show)




/etc/hosts on server201
192.168.91.129 server201 HA01
10.0.0.201 HA01
10.0.0.202 HA02
192.168.91.131 server202

/etc/hosts on server202
192.168.91.131 server202 HA02
10.0.0.201 HA01
10.0.0.202 HA02
192.168.91.129 server201

setup
vi /etc/sysconfig/network           
vi /etc/sysconfig/network-scripts/ifcfg-eth0    
vi /etc/sysconfig/network-scripts/ifcfg-eth1
hostname
uname -n
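Heartbeat identifies nodes by the name that uname -n prints, so it helps to confirm what each name in the hosts file maps to before going further. The following is a minimal sketch, not part of the original setup: lookup_host is a hypothetical helper, and it reads a scratch copy of the server201 entries rather than the real /etc/hosts.

```shell
#!/bin/bash
# lookup_host FILE NAME: print the first IP that FILE maps NAME to.
# Hypothetical helper for illustration; it only reads the given file.
lookup_host() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }   # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# Scratch copy of the server201 entries shown above (not the real /etc/hosts).
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
192.168.91.129 server201 HA01
10.0.0.201 HA01
10.0.0.202 HA02
192.168.91.131 server202
EOF

for name in server201 server202 HA02; do
    echo "$name -> $(lookup_host "$hosts_file" "$name")"
done

# The node name Heartbeat will compare against the node lines in ha.cf.
echo "local node name: $(uname -n)"
rm -f "$hosts_file"
```

On a real node, pass /etc/hosts as the first argument instead of the scratch file.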

The following configuration is identical on both machines
yum install perl-* php-snmp PyXML ipvsadm
rpm -ivh libnet-1.1.2.1-2.2.el5.rf.i386.rpm
rpm -ivh heartbeat-ldirectord-2.1.3-3.el5.centos.i386.rpm # note the installation order
rpm -ivh heartbeat-pils-2.1.3-3.el5.centos.i386.rpm
rpm -ivh heartbeat-stonith-2.1.3-3.el5.centos.i386.rpm
rpm -ivh heartbeat-2.1.3-3.el5.centos.i386.rpm
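   ldirectord: monitors the real servers. When a real server fails it is removed from the virtual server list, and it is added back when it recovers. "Real server" is the LVS term; Heartbeat calls it a node.
   Heartbeat: provides the basic HA functions: heartbeat detection, resource takeover, monitoring of system services in the cluster, and moving ownership of the shared IP address between nodes.
Resources can include: disk partitions, file systems, IP addresses, application services, and NFS file systems.
  STONITH (Shoot The Other Node In The Head): a STONITH device can automatically power off a node in response to a software command.
  Watchdog: can be implemented as a hardware circuit or as a software timer, and automatically restarts the system when it fails.
  LVS (Linux Virtual Server): its architecture is similar to iptables. Code in the kernel inspects incoming requests in real time and redirects packets when they arrive at a port. All of this work has to be done in the kernel, and the kernel code that handles these requests is called ipvs. ipvs only provides the framework; you still have to define manually which service requests it should handle.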

Basic concepts
http://lyp0909.blog.51cto.com/508999/546865

cp /usr/share/doc/heartbeat-2.1.3/ha.cf /etc/ha.d/       
cp /usr/share/doc/heartbeat-2.1.3/authkeys /etc/ha.d/       
cp /usr/share/doc/heartbeat-2.1.3/haresources /etc/ha.d/
touch /etc/ha.d/resource.d/test # test service: this script stands in for the real service. Running it writes a line to /var/log/messages such as: root: /etc/ha.d/resource.d/test called with start
vim /etc/ha.d/resource.d/test
#!/bin/bash
logger $0 called with $1
case "$1" in
start)
#start commands go here
;;
stop)
#stop commands go here
;;
status)
#status commands go here
;;
esac
chmod 755 /etc/ha.d/resource.d/test
/etc/ha.d/resource.d/test start
tail /var/log/messages
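vi /etc/ha.d/authkeys # authentication method
auth 1
1 crc # weakest option: a checksum only, with no real authentication
chmod 600 /etc/ha.d/authkeys # the service will not start unless this file has mode 600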
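The crc method is fine on an isolated heartbeat link, but on a shared network a keyed method such as sha1 is the usual choice. The sketch below is an assumption-laden example rather than the author's setup: write_authkeys is a hypothetical helper, the key is random, and the demo writes to a scratch directory instead of /etc/ha.d. The same file has to be copied to both nodes.

```shell
#!/bin/bash
# write_authkeys FILE: write a sha1 authkeys file with a random key.
# Hypothetical helper for illustration only.
write_authkeys() {
    local file=$1 key
    # 16 random bytes from the kernel, printed as 32 hex characters
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    # create the file with a restrictive umask so it is never world-readable
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$file" )
    chmod 600 "$file"
}

# Demo in a scratch directory; on a real node the target is /etc/ha.d/authkeys.
demo_dir=$(mktemp -d)
write_authkeys "$demo_dir/authkeys"
ls -l "$demo_dir/authkeys"
head -n 1 "$demo_dir/authkeys"
rm -rf "$demo_dir"
```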
vi /etc/ha.d/haresources
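server201 IPaddr::192.168.91.130/24/eth0 test  # declares the HA resources: the preferred node name, the HA IP, and test as the service that heartbeat starts and stops automatically
Script arguments are separated by ::
At startup heartbeat runs each script as <scriptname> start. With a resource line that lists realserver, IPaddr and ldirectord, for example, the realserver script runs first, then IPaddr, and ldirectord last. On shutdown it runs <scriptname> stop in the reverse order: ldirectord first, then IPaddr, and realserver last.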
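The ordering rule can be sketched for the haresources line used here. This is only an illustration: plan_resources is a hypothetical helper that prints the invocations rather than running anything, and it assumes resources on a line start left to right and stop right to left, with :: separating a script from its arguments.

```shell
#!/bin/bash
# plan_resources ACTION LINE: print the resource script calls for one
# haresources line in the order they would run. Hypothetical helper; it
# only prints and never executes the scripts.
plan_resources() {
    local action=$1 line=$2
    local -a fields cmds
    local res script args i
    read -r -a fields <<< "$line"
    # fields[0] is the preferred node name; the rest are resources
    for res in "${fields[@]:1}"; do
        script=${res%%::*}
        args=""
        if [[ $res == *::* ]]; then
            args=${res#*::}
            args=${args//::/ }      # each :: separates one argument
        fi
        cmds+=("$script${args:+ $args} $action")
    done
    if [[ $action == stop ]]; then
        # stop in the reverse of the start order
        for (( i = ${#cmds[@]} - 1; i >= 0; i-- )); do echo "${cmds[i]}"; done
    else
        for (( i = 0; i < ${#cmds[@]}; i++ )); do echo "${cmds[i]}"; done
    fi
}

line='server201 IPaddr::192.168.91.130/24/eth0 test'
echo "start order:"; plan_resources start "$line"
echo "stop order:";  plan_resources stop "$line"
```

For this line the sketch lists IPaddr before test on start and test before IPaddr on stop.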
vi /etc/ha.d/ha.cf
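logfile /var/log/ha-log # HA log file
logfacility     local0
keepalive 500ms # heartbeat interval
deadtime 10 # declare the peer dead after 10 seconds without a heartbeat
warntime 5 # log a warning after 5 seconds without a heartbeat
initdead 60 # grace period in seconds at startup, allowing time for the network to come up
udpport 694
bcast   eth1 # broadcast heartbeat UDP packets on eth1; with a single NIC you can use ucast eth0 192.168.91.131 instead, but the IP must be changed to the peer's address on the other machine
auto_failback on # the primary takes the resources back after it recovers
node server201 # declare a node; the name must match the output of uname -n
node server202 # declare a node
hopfudge 1
ping 192.168.91.1
ping_group group1 192.168.91.129 192.168.91.131
respawn hacluster /usr/lib/heartbeat/ipfail # used together with the ping directives to detect and handle network failures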
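The timing values above relate to each other by simple arithmetic: with a heartbeat every 500 ms, warntime and deadtime correspond to a number of consecutive missed heartbeats. A small sketch follows; to_ms is a hypothetical helper that assumes a bare number means seconds and an ms suffix means milliseconds.

```shell
#!/bin/bash
# to_ms VALUE: convert an ha.cf time value to milliseconds.
# Hypothetical helper; handles whole numbers with or without an "ms" suffix.
to_ms() {
    case $1 in
        *ms) echo "${1%ms}" ;;          # already milliseconds
        *)   echo $(( $1 * 1000 )) ;;   # bare number: seconds
    esac
}

keepalive=$(to_ms 500ms)
warntime=$(to_ms 5)
deadtime=$(to_ms 10)

echo "missed heartbeats before a warning: $(( warntime / keepalive ))"
echo "missed heartbeats before the node is declared dead: $(( deadtime / keepalive ))"
```

With the values from this ha.cf that is 10 missed heartbeats before the warning and 20 before the peer is declared dead.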
vi /etc/sysconfig/iptables
-A RH-Firewall-1-INPUT -p udp -m udp --dport 694 -d 10.0.0.201 -j ACCEPT
service iptables stop
chkconfig --levels 345 heartbeat on
service heartbeat start
tcpdump -i eth1 -n -p udp port 694
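Test method 1: service heartbeat stop
With this method no ping packets are lost at all. Stop the HA service on the primary node first; the standby node starts the test service immediately, as its log shows: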
[root@server202 ha.d]# cat /var/log/messages |grep "/etc/ha.d/resource.d/test"
Dec 27 14:30:19 server202 logger: /etc/ha.d/resource.d/test called with status
Dec 27 14:30:19 server202 ResourceManager[19903]: info: Running /etc/ha.d/resource.d/test  start
Dec 27 14:30:19 server202 logger: /etc/ha.d/resource.d/test called with start
Dec 27 14:30:28 server202 ResourceManager[20199]: info: Running /etc/ha.d/resource.d/test  stop
Dec 27 14:30:28 server202 logger: /etc/ha.d/resource.d/test called with stop
Dec 27 14:31:59 server202 logger: /etc/ha.d/resource.d/test called with status
Dec 27 14:31:59 server202 ResourceManager[20573]: info: Running /etc/ha.d/resource.d/test  start
Dec 27 14:31:59 server202 logger: /etc/ha.d/resource.d/test called with start
Dec 27 14:33:19 server202 ResourceManager[20939]: info: Running /etc/ha.d/resource.d/test  stop
Dec 27 14:33:19 server202 logger: /etc/ha.d/resource.d/test called with stop
[root@server201 resource.d]# cat /var/log/messages |grep "/etc/ha.d/resource.d/test"
Dec 27 14:30:17 server201 ResourceManager[12162]: info: Running /etc/ha.d/resource.d/test  stop
Dec 27 14:30:17 server201 logger: /etc/ha.d/resource.d/test called with stop
Dec 27 14:31:15 server201 logger: /etc/ha.d/resource.d/test called with status
Dec 27 14:31:15 server201 ResourceManager[21426]: info: Running /etc/ha.d/resource.d/test  start
Dec 27 14:31:15 server201 logger: /etc/ha.d/resource.d/test called with start
Dec 27 14:31:57 server201 ResourceManager[28720]: info: Running /etc/ha.d/resource.d/test  stop
Dec 27 14:31:58 server201 logger: /etc/ha.d/resource.d/test called with stop
Dec 27 14:33:20 server201 logger: /etc/ha.d/resource.d/test called with status
Dec 27 14:33:20 server201 ResourceManager[9107]: info: Running /etc/ha.d/resource.d/test  start
Dec 27 14:33:20 server201 logger: /etc/ha.d/resource.d/test called with start

Test method 2: ping 192.168.91.130 continuously from the local machine, then cut the power on the primary, server201. The standby logs the following:
heartbeat[5512]: 2011/12/27_12:59:03 WARN: node server201: is dead
heartbeat[5512]: 2011/12/27_12:59:03 WARN: No STONITH device configured.
heartbeat[5512]: 2011/12/27_12:59:03 WARN: Shared disks are not protected.
heartbeat[5512]: 2011/12/27_12:59:03 info: Resources being acquired from server201.
heartbeat[5512]: 2011/12/27_12:59:03 info: Link server201:eth1 dead.
heartbeat[5641]: 2011/12/27_12:59:03 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys server202] to acquire.
harc[5640]:     2011/12/27_12:59:03 info: Running /etc/ha.d/rc.d/status status
mach_down[5669]:        2011/12/27_12:59:03 info: Taking over resource group IPaddr::192.168.91.130/24/eth0
ResourceManager[5695]:  2011/12/27_12:59:03 info: Acquiring resource group: server201 IPaddr::192.168.91.130/24/eth0
IPaddr[5722]:   2011/12/27_12:59:03 INFO:  Resource is stopped
ResourceManager[5695]:  2011/12/27_12:59:03 info: Running /etc/ha.d/resource.d/IPaddr 192.168.91.130/24/eth0 start
IPaddr[5820]:   2011/12/27_12:59:04 INFO: Using calculated netmask for 192.168.91.130: 255.255.255.0
IPaddr[5820]:   2011/12/27_12:59:04 INFO: eval ifconfig eth0:0 192.168.91.130 netmask 255.255.255.0 broadcast 192.168.91.255
IPaddr[5791]:   2011/12/27_12:59:04 INFO:  Success
mach_down[5669]:        2011/12/27_12:59:04 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[5669]:        2011/12/27_12:59:04 info: mach_down takeover complete for node server201.
heartbeat[5512]: 2011/12/27_12:59:04 info: mach_down takeover complete.
The standby brings up the eth0:0 interface immediately:
[root@server202 ha.d]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:97:95:79
          inet addr:192.168.91.131  Bcast:192.168.91.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe97:9579/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1412 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1236 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:133123 (130.0 KiB)  TX bytes:251446 (245.5 KiB)
          Interrupt:67 Base address:0x2024

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:97:95:79
          inet addr:192.168.91.130  Bcast:192.168.91.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:67 Base address:0x2024

eth1      Link encap:Ethernet  HWaddr 00:0C:29:97:95:83
          inet addr:10.0.0.202  Bcast:10.255.255.255  Mask:255.0.0.0
          inet6 addr: fe80::20c:29ff:fe97:9583/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1070 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1227 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:235518 (229.9 KiB)  TX bytes:268545 (262.2 KiB)
          Interrupt:67 Base address:0x20a4

Power the primary back on and wait for the HA service to start; the standby then logs the following:

heartbeat[5512]: 2011/12/27_13:02:54 info: Heartbeat restart on node server201
heartbeat[5512]: 2011/12/27_13:02:54 info: Link server201:eth1 up.
heartbeat[5512]: 2011/12/27_13:02:54 info: Status update for node server201: status init
heartbeat[5512]: 2011/12/27_13:02:54 info: Status update for node server201: status up
heartbeat[5512]: 2011/12/27_13:02:54 info: all clients are now paused
harc[6042]:     2011/12/27_13:02:54 info: Running /etc/ha.d/rc.d/status status
harc[6058]:     2011/12/27_13:02:54 info: Running /etc/ha.d/rc.d/status status
heartbeat[5512]: 2011/12/27_13:02:55 info: Status update for node server201: status active
harc[6074]:     2011/12/27_13:02:55 info: Running /etc/ha.d/rc.d/status status
heartbeat[5512]: 2011/12/27_13:02:55 info: remote resource transition completed.
heartbeat[5512]: 2011/12/27_13:02:55 info: server202 wants to go standby [foreign]
heartbeat[5512]: 2011/12/27_13:02:56 info: all clients are now resumed
heartbeat[5512]: 2011/12/27_13:02:56 info: standby: server201 can take our foreign resources
heartbeat[6090]: 2011/12/27_13:02:56 info: give up foreign HA resources (standby).
ResourceManager[6103]:  2011/12/27_13:02:56 info: Releasing resource group: server201 IPaddr::192.168.91.130/24/eth0
ResourceManager[6103]:  2011/12/27_13:02:56 info: Running /etc/ha.d/resource.d/IPaddr 192.168.91.130/24/eth0 stop
IPaddr[6170]:   2011/12/27_13:02:56 INFO: ifconfig eth0:0 down
IPaddr[6141]:   2011/12/27_13:02:56 INFO:  Success
heartbeat[6090]: 2011/12/27_13:02:56 info: foreign HA resource release completed (standby).
heartbeat[5512]: 2011/12/27_13:02:56 info: Local standby process completed [foreign].
heartbeat[5512]: 2011/12/27_13:02:58 WARN: 1 lost packet(s) for [server201] [14:16]
heartbeat[5512]: 2011/12/27_13:02:58 info: remote resource transition completed.
heartbeat[5512]: 2011/12/27_13:02:58 info: No pkts missing from server201!
heartbeat[5512]: 2011/12/27_13:02:58 info: Other node completed standby takeover of foreign resources.


Power off the standby directly; HA01 sees that HA02 is down:
heartbeat[3825]: 2011/12/27_13:05:38 WARN: node server202: is dead
heartbeat[3825]: 2011/12/27_13:05:38 WARN: No STONITH device configured.
heartbeat[3825]: 2011/12/27_13:05:38 WARN: Shared disks are not protected.
heartbeat[3825]: 2011/12/27_13:05:38 info: Resources being acquired from server202.
heartbeat[3825]: 2011/12/27_13:05:38 info: Link server202:eth1 dead.
harc[4234]:     2011/12/27_13:05:38 info: Running /etc/ha.d/rc.d/status status
mach_down[4254]:        2011/12/27_13:05:38 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[4254]:        2011/12/27_13:05:38 info: mach_down takeover complete for node server202.
heartbeat[3825]: 2011/12/27_13:05:38 info: mach_down takeover complete.
IPaddr[4320]:   2011/12/27_13:05:38 INFO:  Running OK
heartbeat[4235]: 2011/12/27_13:05:38 info: Local Resource acquisition completed.
Power the standby back on and it rejoins the cluster:
heartbeat[3825]: 2011/12/27_13:07:33 info: Heartbeat restart on node server202
heartbeat[3825]: 2011/12/27_13:07:33 info: Link server202:eth1 up.
heartbeat[3825]: 2011/12/27_13:07:33 info: Status update for node server202: status init
heartbeat[3825]: 2011/12/27_13:07:33 info: Status update for node server202: status up
harc[4632]:     2011/12/27_13:07:33 info: Running /etc/ha.d/rc.d/status status
harc[4648]:     2011/12/27_13:07:33 info: Running /etc/ha.d/rc.d/status status
heartbeat[3825]: 2011/12/27_13:07:35 info: Status update for node server202: status active
harc[4664]:     2011/12/27_13:07:35 info: Running /etc/ha.d/rc.d/status status
heartbeat[3825]: 2011/12/27_13:07:36 info: remote resource transition completed.
heartbeat[3825]: 2011/12/27_13:07:36 info: server201 wants to go standby [foreign]
heartbeat[3825]: 2011/12/27_13:07:36 info: standby: server202 can take our foreign resources
heartbeat[4680]: 2011/12/27_13:07:36 info: give up foreign HA resources (standby).
heartbeat[4680]: 2011/12/27_13:07:36 info: foreign HA resource release completed (standby).
heartbeat[3825]: 2011/12/27_13:07:36 info: Local standby process completed [foreign].
heartbeat[3825]: 2011/12/27_13:07:37 WARN: 1 lost packet(s) for [server202] [15:17]
heartbeat[3825]: 2011/12/27_13:07:37 info: remote resource transition completed.
heartbeat[3825]: 2011/12/27_13:07:37 info: No pkts missing from server202!
heartbeat[3825]: 2011/12/27_13:07:37 info: Other node completed standby takeover of foreign resources.
Check the heartbeat broadcasts:
[root@server201 ~]# tcpdump -i eth1 -n -p udp port 694
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
13:22:10.412795 IP 10.0.0.201.48443 > 10.255.255.255.ha-cluster: UDP, length 178
13:22:10.902298 IP 10.0.0.202.47537 > 10.255.255.255.ha-cluster: UDP, length 179
13:22:11.204098 IP 10.0.0.201.48443 > 10.255.255.255.ha-cluster: UDP, length 180
13:22:11.204158 IP 10.0.0.201.48443 > 10.255.255.255.ha-cluster: UDP, length 179
13:22:11.204202 IP 10.0.0.201.48443 > 10.255.255.255.ha-cluster: UDP, length 180
13:22:11.698955 IP 10.0.0.202.47537 > 10.255.255.255.ha-cluster: UDP, length 179
13:22:11.985902 IP 10.0.0.201.48443 > 10.255.255.255.ha-cluster: UDP, length 179
13:22:12.483038 IP 10.0.0.202.47537 > 10.255.255.255.ha-cluster: UDP, length 179
13:22:12.772028 IP 10.0.0.201.48443 > 10.255.255.255.ha-cluster: UDP, length 179
Take down the heartbeat interface on HA01 with ifdown eth1
HA01:
heartbeat[3850]: 2011/12/27_13:23:24 ERROR: glib: Unable to send bcast [-1] packet(len=179): No such device
heartbeat[3850]: 2011/12/27_13:23:24 ERROR: write_child: write failure on bcast eth1.: No such device
heartbeat[3850]: 2011/12/27_13:23:25 ERROR: glib: Unable to send bcast [-1] packet(len=179): No such device
heartbeat[3850]: 2011/12/27_13:23:25 ERROR: write_child: write failure on bcast eth1.: No such device
heartbeat[3850]: 2011/12/27_13:23:26 ERROR: glib: Unable to send bcast [-1] packet(len=179): No such device
heartbeat[3850]: 2011/12/27_13:23:26 ERROR: write_child: write failure on bcast eth1.: No such device
heartbeat[3850]: 2011/12/27_13:23:27 ERROR: glib: Unable to send bcast [-1] packet(len=179): No such device
heartbeat[3850]: 2011/12/27_13:23:27 ERROR: write_child: write failure on bcast eth1.: No such device
heartbeat[3850]: 2011/12/27_13:23:27 ERROR: glib: Unable to send bcast [-1] packet(len=179): No such device
heartbeat[3850]: 2011/12/27_13:23:27 ERROR: write_child: write failure on bcast eth1.: No such device
heartbeat[3850]: 2011/12/27_13:23:28 ERROR: glib: Unable to send bcast [-1] packet(len=179): No such device
heartbeat[3850]: 2011/12/27_13:23:28 ERROR: write_child: write failure on bcast eth1.: No such device
heartbeat[3850]: 2011/12/27_13:23:29 ERROR: glib: Unable to send bcast [-1] packet(len=179): No such device
heartbeat[3850]: 2011/12/27_13:23:29 ERROR: write_child: write failure on bcast eth1.: No such device
heartbeat[3850]: 2011/12/27_13:23:30 ERROR: glib: Unable to send bcast [-1] packet(len=179): No such device
heartbeat[3850]: 2011/12/27_13:23:30 ERROR: write_child: write failure on bcast eth1.: No such device
heartbeat[3850]: 2011/12/27_13:23:31 ERROR: glib: Unable to send bcast [-1] packet(len=179): No such device
heartbeat[3850]: 2011/12/27_13:23:31 ERROR: write_child: write failure on bcast eth1.: No such device
heartbeat[3850]: 2011/12/27_13:23:31 ERROR: glib: Unable to send bcast [-1] packet(len=179): No such device
heartbeat[3850]: 2011/12/27_13:23:31 ERROR: write_child: write failure on bcast eth1.: No such device
heartbeat[3850]: 2011/12/27_13:23:31 ERROR: glib: Unable to send bcast [-1] packet(len=179): No such device
heartbeat[3850]: 2011/12/27_13:23:31 ERROR: write_child: write failure on bcast eth1.: No such device
heartbeat[3850]: 2011/12/27_13:23:31 WARN: Temporarily Suppressing write error messages
heartbeat[3850]: 2011/12/27_13:23:31 WARN: Is a cable unplugged on bcast eth1?
heartbeat[3830]: 2011/12/27_13:23:39 WARN: node server202: is dead
heartbeat[3830]: 2011/12/27_13:23:39 WARN: No STONITH device configured.
heartbeat[3830]: 2011/12/27_13:23:39 WARN: Shared disks are not protected.
heartbeat[3830]: 2011/12/27_13:23:39 info: Resources being acquired from server202.
heartbeat[3830]: 2011/12/27_13:23:39 info: Link server202:eth1 dead.
harc[4643]:     2011/12/27_13:23:39 info: Running /etc/ha.d/rc.d/status status
mach_down[4673]:        2011/12/27_13:23:39 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[4673]:        2011/12/27_13:23:39 info: mach_down takeover complete for node server202.
heartbeat[3830]: 2011/12/27_13:23:39 info: mach_down takeover complete.
IPaddr[4729]:   2011/12/27_13:23:40 INFO:  Running OK
heartbeat[4644]: 2011/12/27_13:23:40 info: Local Resource acquisition completed.
heartbeat[3830]: 2011/12/27_13:24:04 CRIT: Cluster node server202 returning after partition.
heartbeat[3830]: 2011/12/27_13:24:04 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
heartbeat[3830]: 2011/12/27_13:24:04 WARN: Deadtime value may be too small.
heartbeat[3830]: 2011/12/27_13:24:04 info: See FAQ for information on tuning deadtime.
heartbeat[3830]: 2011/12/27_13:24:04 info: URL: http://linux-ha.org/FAQ#heavy_load
heartbeat[3830]: 2011/12/27_13:24:04 info: Link server202:eth1 up.
heartbeat[3830]: 2011/12/27_13:24:04 WARN: Late heartbeat: Node server202: interval 25550 ms
heartbeat[3830]: 2011/12/27_13:24:04 info: Status update for node server202: status active
harc[4882]:     2011/12/27_13:24:04 info: Running /etc/ha.d/rc.d/status status
heartbeat[3830]: 2011/12/27_13:24:07 WARN: Shutdown delayed until current resource activity finishes.
heartbeat[3830]: 2011/12/27_13:24:08 info: Heartbeat shutdown in progress. (3830)
heartbeat[3830]: 2011/12/27_13:24:08 info: Received shutdown notice from 'server202'.
heartbeat[3830]: 2011/12/27_13:24:08 info: Resource takeover cancelled - shutdown in progress.
heartbeat[4899]: 2011/12/27_13:24:08 info: Giving up all HA resources.
ResourceManager[4912]:  2011/12/27_13:24:08 info: Releasing resource group: server201 IPaddr::192.168.91.130/24/eth0
ResourceManager[4912]:  2011/12/27_13:24:08 info: Running /etc/ha.d/resource.d/IPaddr 192.168.91.130/24/eth0 stop
IPaddr[4979]:   2011/12/27_13:24:08 INFO: ifconfig eth0:0 down
IPaddr[4950]:   2011/12/27_13:24:08 INFO:  Success
heartbeat[4899]: 2011/12/27_13:24:08 info: All HA resources relinquished.
heartbeat[3830]: 2011/12/27_13:24:11 info: killing HBWRITE process 3850 with signal 15
heartbeat[3830]: 2011/12/27_13:24:11 info: killing HBREAD process 3851 with signal 15
heartbeat[3830]: 2011/12/27_13:24:11 info: killing HBFIFO process 3849 with signal 15
heartbeat[3830]: 2011/12/27_13:24:11 info: Core process 3851 exited. 3 remaining
heartbeat[3830]: 2011/12/27_13:24:11 info: Core process 3850 exited. 2 remaining
heartbeat[3830]: 2011/12/27_13:24:11 info: Core process 3849 exited. 1 remaining
heartbeat[3830]: 2011/12/27_13:24:11 info: server201 Heartbeat shutdown complete.
heartbeat[3830]: 2011/12/27_13:24:11 info: Heartbeat restart triggered.
heartbeat[3830]: 2011/12/27_13:24:11 info: Restarting heartbeat.
heartbeat[3830]: 2011/12/27_13:24:11 info: Performing heartbeat restart exec.
heartbeat[3830]: 2011/12/27_13:24:22 info: Version 2 support: false
heartbeat[3830]: 2011/12/27_13:24:22 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[3830]: 2011/12/27_13:24:22 info: **************************
heartbeat[3830]: 2011/12/27_13:24:22 info: Configuration validated. Starting heartbeat 2.1.3
heartbeat[5022]: 2011/12/27_13:24:22 info: heartbeat: version 2.1.3
heartbeat[5022]: 2011/12/27_13:24:22 info: Heartbeat generation: 1324959288
heartbeat[5022]: 2011/12/27_13:24:22 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat[5022]: 2011/12/27_13:24:22 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
heartbeat[5022]: 2011/12/27_13:24:22 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[5022]: 2011/12/27_13:24:22 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[5022]: 2011/12/27_13:24:22 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[5022]: 2011/12/27_13:24:22 info: Local status now set to: 'up'
heartbeat[5022]: 2011/12/27_13:24:24 info: Link server202:eth1 up.
heartbeat[5022]: 2011/12/27_13:24:24 info: Status update for node server202: status up
heartbeat[5022]: 2011/12/27_13:24:24 info: Link server201:eth1 up.
heartbeat[5022]: 2011/12/27_13:24:24 info: Comm_now_up(): updating status to active
heartbeat[5022]: 2011/12/27_13:24:24 info: Local status now set to: 'active'
harc[5028]:     2011/12/27_13:24:24 info: Running /etc/ha.d/rc.d/status status
heartbeat[5022]: 2011/12/27_13:24:24 info: Status update for node server202: status active
harc[5047]:     2011/12/27_13:24:24 info: Running /etc/ha.d/rc.d/status status
heartbeat[5022]: 2011/12/27_13:24:40 info: local resource transition completed.
heartbeat[5022]: 2011/12/27_13:24:40 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr[5099]:   2011/12/27_13:24:41 INFO:  Resource is stopped
heartbeat[5063]: 2011/12/27_13:24:41 info: Local Resource acquisition completed.
heartbeat[5022]: 2011/12/27_13:24:41 info: remote resource transition completed.
harc[5150]:     2011/12/27_13:24:41 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[5150]:  2011/12/27_13:24:41 received ip-request-resp IPaddr::192.168.91.130/24/eth0 OK yes
ResourceManager[5171]:  2011/12/27_13:24:41 info: Acquiring resource group: server201 IPaddr::192.168.91.130/24/eth0
IPaddr[5198]:   2011/12/27_13:24:41 INFO:  Resource is stopped
ResourceManager[5171]:  2011/12/27_13:24:41 info: Running /etc/ha.d/resource.d/IPaddr 192.168.91.130/24/eth0 start
IPaddr[5296]:   2011/12/27_13:24:41 INFO: Using calculated netmask for 192.168.91.130: 255.255.255.0
IPaddr[5296]:   2011/12/27_13:24:41 INFO: eval ifconfig eth0:0 192.168.91.130 netmask 255.255.255.0 broadcast 192.168.91.255
IPaddr[5267]:   2011/12/27_13:24:41 INFO:  Success
HA02:
heartbeat[3825]: 2011/12/27_13:23:40 WARN: node server201: is dead
heartbeat[3825]: 2011/12/27_13:23:40 WARN: No STONITH device configured.
heartbeat[3825]: 2011/12/27_13:23:40 WARN: Shared disks are not protected.
heartbeat[3825]: 2011/12/27_13:23:40 info: Resources being acquired from server201.
heartbeat[3825]: 2011/12/27_13:23:40 info: Link server201:eth1 dead.
harc[5066]:     2011/12/27_13:23:40 info: Running /etc/ha.d/rc.d/status status
heartbeat[5067]: 2011/12/27_13:23:40 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys server202] to acquire.
mach_down[5095]:        2011/12/27_13:23:40 info: Taking over resource group IPaddr::192.168.91.130/24/eth0
ResourceManager[5121]:  2011/12/27_13:23:40 info: Acquiring resource group: server201 IPaddr::192.168.91.130/24/eth0
IPaddr[5148]:   2011/12/27_13:23:41 INFO:  Resource is stopped
ResourceManager[5121]:  2011/12/27_13:23:41 info: Running /etc/ha.d/resource.d/IPaddr 192.168.91.130/24/eth0 start
IPaddr[5246]:   2011/12/27_13:23:41 INFO: Using calculated netmask for 192.168.91.130: 255.255.255.0
IPaddr[5246]:   2011/12/27_13:23:41 INFO: eval ifconfig eth0:0 192.168.91.130 netmask 255.255.255.0 broadcast 192.168.91.255
IPaddr[5217]:   2011/12/27_13:23:41 INFO:  Success
mach_down[5095]:        2011/12/27_13:23:41 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
heartbeat[3825]: 2011/12/27_13:23:41 info: mach_down takeover complete.
mach_down[5095]:        2011/12/27_13:23:41 info: mach_down takeover complete for node server201.
heartbeat[3825]: 2011/12/27_13:24:04 CRIT: Cluster node server201 returning after partition.
heartbeat[3825]: 2011/12/27_13:24:04 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
heartbeat[3825]: 2011/12/27_13:24:04 WARN: Deadtime value may be too small.
heartbeat[3825]: 2011/12/27_13:24:04 info: See FAQ for information on tuning deadtime.
heartbeat[3825]: 2011/12/27_13:24:04 info: URL: http://linux-ha.org/FAQ#heavy_load
heartbeat[3825]: 2011/12/27_13:24:04 info: Link server201:eth1 up.
heartbeat[3825]: 2011/12/27_13:24:04 WARN: Late heartbeat: Node server201: interval 25440 ms
heartbeat[3825]: 2011/12/27_13:24:04 info: Status update for node server201: status active
harc[5354]:     2011/12/27_13:24:04 info: Running /etc/ha.d/rc.d/status status
heartbeat[3825]: 2011/12/27_13:24:07 info: Heartbeat shutdown in progress. (3825)
heartbeat[5370]: 2011/12/27_13:24:07 info: Giving up all HA resources.
ResourceManager[5383]:  2011/12/27_13:24:07 info: Releasing resource group: server201 IPaddr::192.168.91.130/24/eth0
ResourceManager[5383]:  2011/12/27_13:24:07 info: Running /etc/ha.d/resource.d/IPaddr 192.168.91.130/24/eth0 stop
IPaddr[5450]:   2011/12/27_13:24:07 INFO: ifconfig eth0:0 down
IPaddr[5421]:   2011/12/27_13:24:07 INFO:  Success
heartbeat[5370]: 2011/12/27_13:24:07 info: All HA resources relinquished.
heartbeat[3825]: 2011/12/27_13:24:08 info: Received shutdown notice from 'server201'.
heartbeat[3825]: 2011/12/27_13:24:08 info: Resource takeover cancelled - shutdown in progress.
heartbeat[3825]: 2011/12/27_13:24:10 info: killing HBFIFO process 3844 with signal 15
heartbeat[3825]: 2011/12/27_13:24:10 info: killing HBWRITE process 3845 with signal 15
heartbeat[3825]: 2011/12/27_13:24:10 info: killing HBREAD process 3846 with signal 15
heartbeat[3825]: 2011/12/27_13:24:10 info: Core process 3846 exited. 3 remaining
heartbeat[3825]: 2011/12/27_13:24:10 info: Core process 3845 exited. 2 remaining
heartbeat[3825]: 2011/12/27_13:24:10 info: Core process 3844 exited. 1 remaining
heartbeat[3825]: 2011/12/27_13:24:10 info: server202 Heartbeat shutdown complete.
heartbeat[3825]: 2011/12/27_13:24:10 info: Heartbeat restart triggered.
heartbeat[3825]: 2011/12/27_13:24:10 info: Restarting heartbeat.
heartbeat[3825]: 2011/12/27_13:24:10 info: Performing heartbeat restart exec.
heartbeat[3825]: 2011/12/27_13:24:21 info: Version 2 support: false
heartbeat[3825]: 2011/12/27_13:24:21 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[3825]: 2011/12/27_13:24:21 info: **************************
heartbeat[3825]: 2011/12/27_13:24:21 info: Configuration validated. Starting heartbeat 2.1.3
heartbeat[5493]: 2011/12/27_13:24:21 info: heartbeat: version 2.1.3
heartbeat[5493]: 2011/12/27_13:24:21 info: Heartbeat generation: 1324959301
heartbeat[5493]: 2011/12/27_13:24:21 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat[5493]: 2011/12/27_13:24:21 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
heartbeat[5493]: 2011/12/27_13:24:21 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[5493]: 2011/12/27_13:24:21 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[5493]: 2011/12/27_13:24:21 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[5493]: 2011/12/27_13:24:22 info: Local status now set to: 'up'
heartbeat[5493]: 2011/12/27_13:24:24 info: Link server202:eth1 up.
heartbeat[5493]: 2011/12/27_13:24:24 info: Link server201:eth1 up.
heartbeat[5493]: 2011/12/27_13:24:24 info: Status update for node server201: status up
harc[5499]:     2011/12/27_13:24:24 info: Running /etc/ha.d/rc.d/status status
heartbeat[5493]: 2011/12/27_13:24:24 info: Comm_now_up(): updating status to active
heartbeat[5493]: 2011/12/27_13:24:24 info: Local status now set to: 'active'
heartbeat[5493]: 2011/12/27_13:24:24 info: Status update for node server201: status active
harc[5516]:     2011/12/27_13:24:24 info: Running /etc/ha.d/rc.d/status status
heartbeat[5493]: 2011/12/27_13:24:40 info: remote resource transition completed.
heartbeat[5493]: 2011/12/27_13:24:40 info: remote resource transition completed.
heartbeat[5493]: 2011/12/27_13:24:40 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat[5547]: 2011/12/27_13:24:41 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys server202] to acquire.
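This test lost 6 ping packets in total, compared with only 3 when the primary was powered off and 0-1 when the primary came back. The logs show that both machines brought up the cluster IP address while the heartbeat link was down (a split-brain condition). In testing, a client that first pinged the standby's IP was then served by the standby, and a client that first pinged the primary's IP was served by the primary, which makes the service inconsistent. Clients that used the domain name were not affected and were always served by the primary.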
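Packet-loss counts like the ones above can be read from the summary line that ping prints when it exits. A minimal sketch, assuming the iputils summary format ("N packets transmitted, M received, ..."); lost_packets is a hypothetical helper, and the sample line fed to it is made up for the demo, not a measurement from these tests.

```shell
#!/bin/bash
# lost_packets: read ping output on stdin and print transmitted minus received.
# Hypothetical helper; exits non-zero if no summary line is found.
lost_packets() {
    awk '/packets transmitted/ { print $1 - $4; found = 1 }
         END { if (!found) exit 1 }'
}

# Demo on an illustrative summary line (not real test data).
echo '10 packets transmitted, 7 received, 30% packet loss, time 9012ms' | lost_packets

# During a failover test the real output would be piped in, for example:
#   ping -c 60 192.168.91.130 | lost_packets
```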

RHEL5.4 Heartbeat installation (Part 1: installation)
http://tlinle.blog.51cto.com/251944/394195

[Repost] Squid two-node hot standby with lvs+keeplived
http://blog.sina.com.cn/s/blog_6e2592ea0100n9w2.html
