9) Start and monitor the service
Start the heartbeat service on both HA1 and HA2:
/etc/init.d/heartbeat start
Monitoring the service:
First, look at messages on HA1:
# cat /var/log/messages
Sep 19 15:56:37 HA1 heartbeat: [26814]: info: Version 2 support: false
Sep 19 15:56:37 HA1 heartbeat: [26814]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Sep 19 15:56:37 HA1 heartbeat: [26814]: info: **************************
Sep 19 15:56:37 HA1 heartbeat: [26814]: info: Configuration validated. Starting heartbeat 3.0.2
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: heartbeat: version 3.0.2
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: Heartbeat generation: 1284708296
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: Local status now set to: 'up'
Sep 19 15:56:41 HA1 heartbeat: [26815]: info: Link ha1:eth1 up.
Sep 19 15:56:47 HA1 heartbeat: [26815]: info: Link ha2:eth1 up.
Sep 19 15:56:47 HA1 heartbeat: [26815]: info: Status update for node ha2: status up
Sep 19 15:56:47 HA1 harc[26822]: info: Running /usr/etc/ha.d//rc.d/status status
Sep 19 15:56:47 HA1 heartbeat: [26815]: info: Comm_now_up(): updating status to active
Sep 19 15:56:47 HA1 heartbeat: [26815]: info: Local status now set to: 'active'
Sep 19 15:56:48 HA1 heartbeat: [26815]: info: Status update for node ha2: status active
Sep 19 15:56:48 HA1 harc[26842]: info: Running /usr/etc/ha.d//rc.d/status status
Sep 19 15:57:04 HA1 heartbeat: [26815]: info: remote resource transition completed.
Sep 19 15:57:04 HA1 heartbeat: [26815]: info: remote resource transition completed.
Sep 19 15:57:04 HA1 heartbeat: [26815]: info: Initial resource acquisition complete (T_RESOURCES(us))
Sep 19 15:57:04 HA1 IPaddr[26898]: INFO: Resource is stopped
Sep 19 15:57:04 HA1 heartbeat: [26862]: info: Local Resource acquisition completed.
Sep 19 15:57:04 HA1 harc[26941]: info: Running /usr/etc/ha.d//rc.d/ip-request-resp ip-request-resp
Sep 19 15:57:04 HA1 ip-request-resp[26941]: received ip-request-resp IPaddr::172.16.6.66/21/eth0 OK yes
Sep 19 15:57:04 HA1 ResourceManager[26964]: info: Acquiring resource group: ha1 IPaddr::172.16.6.66/21/eth0 test
Sep 19 15:57:05 HA1 IPaddr[26992]: INFO: Resource is stopped
Sep 19 15:57:05 HA1 ResourceManager[26964]: info: Running /etc/ha.d/resource.d/IPaddr 172.16.6.66/21/eth0 start
Sep 19 15:57:05 HA1 IPaddr[27077]: INFO: Using calculated netmask for 172.16.6.66: 255.255.248.0
Sep 19 15:57:05 HA1 IPaddr[27077]: INFO: eval ifconfig eth0:0 172.16.6.66 netmask 255.255.248.0 broadcast 172.16.7.255
Sep 19 15:57:05 HA1 IPaddr[27051]: INFO: Success
Sep 19 15:57:05 HA1 logger: /etc/ha.d/resource.d/test called with status
Sep 19 15:57:05 HA1 ResourceManager[26964]: info: Running /etc/ha.d/resource.d/test start
Sep 19 15:57:05 HA1 logger: /etc/ha.d/resource.d/test called with start
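We can see that both HA1 and HA2 are up (the link to ha2 is up and the local status is 'active'), our test script has run, and the virtual IP 172.16.6.66 has been brought up on eth0:0.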
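To check these milestones without reading the whole log, a small grep-based helper can look for the key lines. This is a minimal sketch: the function name check_startup is hypothetical, and the search strings are copied from the HA1 log above, so they assume this setup's names (peer node ha2, heartbeat link eth1, a resource script called test).

```sh
#!/bin/sh
# check_startup [LOGFILE]: hypothetical helper that reports whether the
# startup milestones seen in the HA1 log are present in LOGFILE.
# The strings assume this tutorial's setup (peer ha2, link eth1, script "test").
check_startup() {
  log=${1:-/var/log/messages}
  rc=0
  for pattern in \
      "Link ha2:eth1 up" \
      "Local status now set to: 'active'" \
      "INFO: Success" \
      "resource.d/test called with start"
  do
    # -F: match the pattern as a fixed string, not a regular expression
    if grep -qF "$pattern" "$log"; then
      echo "found:   $pattern"
    else
      echo "missing: $pattern"
      rc=1
    fi
  done
  return $rc
}
```

Running check_startup /var/log/messages on HA1 prints one found/missing line per milestone and returns non-zero if any of them is missing.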
Next, take a look at ha-log:
[root@HA1 ~]# cat /var/log/ha-log
Sep 19 15:56:37 HA1 heartbeat: [26814]: info: Version 2 support: false
Sep 19 15:56:37 HA1 heartbeat: [26814]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Sep 19 15:56:37 HA1 heartbeat: [26814]: info: **************************
Sep 19 15:56:37 HA1 heartbeat: [26814]: info: Configuration validated. Starting heartbeat 3.0.2
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: heartbeat: version 3.0.2
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: Heartbeat generation: 1284708296
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Sep 19 15:56:37 HA1 heartbeat: [26815]: info: Local status now set to: 'up'
Sep 19 15:56:41 HA1 heartbeat: [26815]: info: Link ha1:eth1 up.
Sep 19 15:56:47 HA1 heartbeat: [26815]: info: Link ha2:eth1 up.
Sep 19 15:56:47 HA1 heartbeat: [26815]: info: Status update for node ha2: status up
harc[26822]: 2010/09/19_15:56:47 info: Running /usr/etc/ha.d//rc.d/status status
Sep 19 15:56:47 HA1 heartbeat: [26815]: info: Comm_now_up(): updating status to active
Sep 19 15:56:47 HA1 heartbeat: [26815]: info: Local status now set to: 'active'
Sep 19 15:56:48 HA1 heartbeat: [26815]: info: Status update for node ha2: status active
harc[26842]: 2010/09/19_15:56:48 info: Running /usr/etc/ha.d//rc.d/status status
Sep 19 15:57:04 HA1 heartbeat: [26815]: info: remote resource transition completed.
Sep 19 15:57:04 HA1 heartbeat: [26815]: info: remote resource transition completed.
Sep 19 15:57:04 HA1 heartbeat: [26815]: info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr[26898]: 2010/09/19_15:57:04 INFO: Resource is stopped
Sep 19 15:57:04 HA1 heartbeat: [26862]: info: Local Resource acquisition completed.
harc[26941]: 2010/09/19_15:57:04 info: Running /usr/etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp[26941]: 2010/09/19_15:57:04 received ip-request-resp IPaddr::172.16.6.66/21/eth0 OK yes
ResourceManager[26964]: 2010/09/19_15:57:04 info: Acquiring resource group: ha1 IPaddr::172.16.6.66/21/eth0 test
IPaddr[26992]: 2010/09/19_15:57:05 INFO: Resource is stopped
ResourceManager[26964]: 2010/09/19_15:57:05 info: Running /etc/ha.d/resource.d/IPaddr 172.16.6.66/21/eth0 start
IPaddr[27077]: 2010/09/19_15:57:05 INFO: Using calculated netmask for 172.16.6.66: 255.255.248.0
IPaddr[27077]: 2010/09/19_15:57:05 INFO: eval ifconfig eth0:0 172.16.6.66 netmask 255.255.248.0 broadcast 172.16.7.255
IPaddr[27051]: 2010/09/19_15:57:05 INFO: Success
ResourceManager[26964]: 2010/09/19_15:57:05 info: Running /etc/ha.d/resource.d/test start
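The content is almost the same as in messages. The differences are that lines from harc, ip-request-resp, ResourceManager and IPaddr carry their own 2010/09/19_HH:MM:SS timestamp instead of the syslog prefix, and the lines our test script writes through logger appear only in messages.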
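One way to see the difference is to count log lines per program in each file. Below is a minimal awk sketch (the function name count_by_program is hypothetical) that understands both line formats shown above: the syslog format, where the program is the fifth field, and ha-log's own format, where it is the first field.

```sh
#!/bin/sh
# count_by_program [FILE...]: hypothetical helper that counts lines per
# program name; reads stdin when no file is given.
count_by_program() {
  awk 'NF == 0 { next }                        # ignore blank lines
  {
    # syslog: "Sep 19 15:57:05 HA1 IPaddr[27051]: ..." -> program in field 5
    # ha-log: "IPaddr[27051]: 2010/09/19_15:57:05 ..." -> program in field 1
    prog = ($1 ~ /^[A-Z][a-z][a-z]$/) ? $5 : $1
    sub(/\[.*/, "", prog)                      # drop "[pid]" and anything after
    sub(/:$/, "", prog)                        # drop a trailing colon
    n[prog]++
  }
  END { for (p in n) print p, n[p] }' "$@" | LC_ALL=C sort
}
```

Comparing count_by_program /var/log/ha-log with the heartbeat-related part of /var/log/messages should show logger only in the latter.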
The log on HA2:
Sep 19 23:57:24 HA2 heartbeat: [14041]: info: Version 2 support: false
Sep 19 23:57:24 HA2 heartbeat: [14041]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Sep 19 23:57:24 HA2 heartbeat: [14041]: info: **************************
Sep 19 23:57:24 HA2 heartbeat: [14041]: info: Configuration validated. Starting heartbeat 3.0.2
Sep 19 23:57:24 HA2 heartbeat: [14042]: info: heartbeat: version 3.0.2
Sep 19 23:57:24 HA2 heartbeat: [14042]: info: Heartbeat generation: 1284893027
Sep 19 23:57:24 HA2 heartbeat: [14042]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
Sep 19 23:57:24 HA2 heartbeat: [14042]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
Sep 19 23:57:24 HA2 heartbeat: [14042]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 19 23:57:24 HA2 heartbeat: [14042]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 19 23:57:24 HA2 heartbeat: [14042]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Sep 19 23:57:24 HA2 heartbeat: [14042]: info: Local status now set to: 'up'
Sep 19 23:57:26 HA2 heartbeat: [14042]: info: Link ha1:eth1 up.
Sep 19 23:57:26 HA2 heartbeat: [14042]: info: Status update for node ha1: status up
Sep 19 23:57:26 HA2 heartbeat: [14042]: info: Link ha2:eth1 up.
Sep 19 23:57:26 HA2 harc[14049]: info: Running /usr/etc/ha.d//rc.d/status status
Sep 19 23:57:26 HA2 heartbeat: [14042]: info: Comm_now_up(): updating status to active
Sep 19 23:57:26 HA2 heartbeat: [14042]: info: Local status now set to: 'active'
Sep 19 23:57:26 HA2 heartbeat: [14042]: info: Status update for node ha1: status active
Sep 19 23:57:26 HA2 harc[14067]: info: Running /usr/etc/ha.d//rc.d/status status
Sep 19 23:57:42 HA2 heartbeat: [14042]: info: local resource transition completed.
Sep 19 23:57:42 HA2 heartbeat: [14042]: info: Initial resource acquisition complete (T_RESOURCES(us))
Sep 19 23:57:42 HA2 heartbeat: [14086]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys ha2] to acquire.
Sep 19 23:57:43 HA2 heartbeat: [14042]: info: remote resource transition completed.
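Note the line info: No local resources [/usr/share/heartbeat/ResourceManager listkeys ha2] to acquire.
It shows that HA2 has no local resources of its own, so it acts as the backup server and stays idle, only listening for heartbeats from the primary until the primary fails.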
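The message comes from ResourceManager listkeys ha2, which presumably looks up the resource groups that haresources assigns to ha2. Judging from the HA1 log (Acquiring resource group: ha1 IPaddr::172.16.6.66/21/eth0 test), the haresources line here appears to be `ha1 IPaddr::172.16.6.66/21/eth0 test`, so ha2 owns nothing. A rough awk approximation of that lookup follows; it assumes a group belongs to the node named in its first field, the function name is hypothetical, and it ignores details the real ResourceManager handles (such as line continuations).

```sh
#!/bin/sh
# list_local_groups NODE [HARESOURCES]: hypothetical approximation of
# "ResourceManager listkeys NODE". Prints the resources of every line whose
# first field is NODE; prints nothing if the node owns no resource group.
list_local_groups() {
  awk -v node="$1" '
    /^[ \t]*#/ { next }          # skip comment lines
    NF == 0    { next }          # skip blank lines
    $1 == node {
      $1 = ""                    # drop the owner node name
      sub(/^ +/, "")             # and the separator left in front
      print
    }' "${2:-/etc/ha.d/haresources}"
}
```

With that haresources line, list_local_groups ha1 prints the IPaddr/test group and list_local_groups ha2 prints nothing, which matches the "No local resources" message on HA2.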
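The heartbeat traffic itself can be watched with tcpdump on the heartbeat interface:
# tcpdump -i eth1 -n -p udp port 694
It shows the heartbeat broadcasts from both nodes (10.0.0.1 and 10.0.0.2) arriving on eth1; ha-cluster is the service name of UDP port 694: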
23:06:49.576155 IP 10.0.0.1.40661 > 10.0.0.255.ha-cluster: UDP, length 174
23:06:49.734999 IP 10.0.0.2.50487 > 10.0.0.255.ha-cluster: UDP, length 174
23:06:50.324281 IP 10.0.0.1.40661 > 10.0.0.255.ha-cluster: UDP, length 167
23:06:50.324283 IP 10.0.0.1.40661 > 10.0.0.255.ha-cluster: UDP, length 174
23:06:50.486151 IP 10.0.0.2.50487 > 10.0.0.255.ha-cluster: UDP, length 174
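To see at a glance which nodes are sending heartbeats, the capture can be summarized per sender. A minimal awk sketch (the function name hb_senders is hypothetical), assuming tcpdump's default one-line UDP output as above; it accepts the destination port either as ha-cluster or, when tcpdump is run with -nn, as 694.

```sh
#!/bin/sh
# hb_senders [FILE]: hypothetical helper that counts heartbeat packets per
# source IP in tcpdump text output; reads stdin when no file is given.
hb_senders() {
  awk '$2 == "IP" && $5 ~ /\.(ha-cluster|694):$/ {
         src = $3
         sub(/\.[0-9]+$/, "", src)    # strip the source port
         count[src]++
       }
       END { for (s in count) print s, count[s] }' "$@" | LC_ALL=C sort
}
```

For example, tcpdump -l -i eth1 -n -p udp port 694 -c 20 | hb_senders prints one line per node; a node missing from the summary is not sending heartbeats.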
10) Simulate a failure
Now pull the power cord on the primary server to simulate a crash, and from another machine keep pinging 172.16.6.66.
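To estimate how long the virtual IP was unreachable during the failover, the ping output can be saved (for example ping 172.16.6.66 | tee ping.log) and scanned for missing icmp_seq numbers. A minimal awk sketch (the function name ping_gaps is hypothetical), assuming the Linux iputils reply format `64 bytes from ...: icmp_seq=N ...`; with ping's default one-second interval, each lost packet corresponds to roughly one second of outage.

```sh
#!/bin/sh
# ping_gaps [FILE]: hypothetical helper that reports runs of missing replies
# in ping output; reads stdin when no file is given.
ping_gaps() {
  awk '/bytes from/ && match($0, /icmp_seq=[0-9]+/) {
         # RSTART/RLENGTH locate "icmp_seq=N"; skip the 9 chars of "icmp_seq="
         seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
         if (seen && seq > prev + 1)
           printf "lost %d-%d (%d packets)\n", prev + 1, seq - 1, seq - prev - 1
         prev = seq
         seen = 1
       }' "$@"
}
```

Running ping_gaps ping.log after the test prints one line per gap; only successful replies are counted, so "Destination Host Unreachable" lines show up as part of a gap.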
In HA2's ha-log we see the following:
Sep 20 00:03:47 HA2 heartbeat: [14042]: WARN: node ha1: is dead
Sep 20 00:03:47 HA2 heartbeat: [14042]: WARN: No STONITH device configured.
Sep 20 00:03:47 HA2 heartbeat: [14042]: WARN: Shared disks are not protected.
Sep 20 00:03:47 HA2 heartbeat: [14042]: info: Resources being acquired from ha1.
Sep 20 00:03:47 HA2 heartbeat: [14042]: info: Link ha1:eth1 dead.
harc[14105]: 2010/09/20_00:03:48 info: Running /usr/etc/ha.d//rc.d/status status
Sep 20 00:03:48 HA2 heartbeat: [14106]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys ha2] to acquire.
mach_down[14135]: 2010/09/20_00:03:48 info: Taking over resource group IPaddr::172.16.6.66/21/eth0
ResourceManager[14162]: 2010/09/20_00:03:48 info: Acquiring resource group: ha1 IPaddr::172.16.6.66/21/eth0 test
IPaddr[14190]: 2010/09/20_00:03:48 INFO: Resource is stopped
ResourceManager[14162]: 2010/09/20_00:03:48 info: Running /etc/ha.d/resource.d/IPaddr 172.16.6.66/21/eth0 start
IPaddr[14275]: 2010/09/20_00:03:48 INFO: Using calculated netmask for 172.16.6.66: 255.255.248.0
IPaddr[14275]: 2010/09/20_00:03:48 INFO: eval ifconfig eth0:0 172.16.6.66 netmask 255.255.248.0 broadcast 172.16.7.255
IPaddr[14249]: 2010/09/20_00:03:48 INFO: Success
ResourceManager[14162]: 2010/09/20_00:03:48 info: Running /etc/ha.d/resource.d/test start
mach_down[14135]: 2010/09/20_00:03:48 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[14135]: 2010/09/20_00:03:49 info: mach_down takeover complete for node ha1.
Sep 20 00:03:49 HA2 heartbeat: [14042]: info: mach_down takeover complete.
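Here we see
node ha1: is dead
which means ha1 has gone down.
HA2 then takes over ha1's resource group: the resources are first checked with the status argument and then started with start (IPaddr first, then our test script), which completes the failover.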
The same can be seen in messages:
logger: /etc/ha.d/resource.d/test called with start
which shows that our test script is now running on HA2.
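The logger lines suggest the test script does little more than log the argument heartbeat passes to it (start, stop or status). A hypothetical sketch of such a heartbeat v1 resource script follows; the marker file and the running/stopped status output are assumptions. To our understanding, heartbeat v1 decides whether a resource is running from words such as running or OK in the status output rather than from the exit code, but check the ResourceManager script of your version.

```sh
#!/bin/sh
# Hypothetical sketch of a heartbeat v1 resource script like resource.d/test.
# Marker file standing in for "the service is running" (hypothetical path).
STATE_FILE=${STATE_FILE:-/var/run/ha-test.running}

test_resource() {
  # Same message shape as "/etc/ha.d/resource.d/test called with start";
  # ignore logger failures so the script still works without syslog.
  logger "$0 called with $1" 2>/dev/null || true
  case "$1" in
    start)  touch "$STATE_FILE" ;;
    stop)   rm -f "$STATE_FILE" ;;
    status)
      if [ -f "$STATE_FILE" ]; then echo running; else echo stopped; fi ;;
    *)
      echo "Usage: $0 {start|stop|status}" >&2
      return 1 ;;
  esac
}

# Dispatch only when called with an argument, as heartbeat does.
if [ $# -gt 0 ]; then
  test_resource "$1"
fi
```

Saved as /etc/ha.d/resource.d/test and made executable on both nodes, it would log one "called with ..." line per invocation, like the lines in messages.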
Checking the IP addresses again (for example with ifconfig eth0:0):
HA2 now has eth0:0 configured as 172.16.6.66.
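The netmask and broadcast address IPaddr chose for eth0:0 follow from the /21 in IPaddr::172.16.6.66/21/eth0: 21 network bits give 255.255.248.0, and setting the remaining 11 host bits of 172.16.6.66 gives 172.16.7.255. A small shell sketch of that arithmetic (the function names are hypothetical; it assumes 64-bit shell arithmetic, as in bash and dash):

```sh
#!/bin/sh
# Hypothetical helpers reproducing IPaddr's "calculated netmask" arithmetic.

# ip_to_int A.B.C.D -> 32-bit integer
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# int_to_ip N -> dotted quad
int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# prefix_to_netmask BITS -> dotted netmask (all network bits set)
prefix_to_netmask() {
  int_to_ip $(( 0xFFFFFFFF ^ ((1 << (32 - $1)) - 1) ))
}

# broadcast_addr IP BITS -> IP with all host bits set
broadcast_addr() {
  host_bits=$(( (1 << (32 - $2)) - 1 ))
  int_to_ip $(( $(ip_to_int "$1") | host_bits ))
}
```

prefix_to_netmask 21 prints 255.255.248.0 and broadcast_addr 172.16.6.66 21 prints 172.16.7.255, matching the ifconfig line in the log.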
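Once the failover is complete, the backup server goes back to monitoring the primary's heartbeat; with auto_failback on, the services are moved back to the primary as soon as it comes back up.
The test succeeded.
If you do not want the primary to take its resources back automatically after it recovers, set the following in ha.cf:
auto_failback off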