rhel6-heartbeat


This article is also published on my Baidu Space blog, together with my other heartbeat posts: http://hi.baidu.com/wangziyin/item/ffbc773008f3d89eb711dba6

1. Environment:

desk34.example.com 192.168.122.34

desk33.example.com 192.168.122.33

Required packages: heartbeat-3.0.4-1.el6.x86_64.rpm heartbeat-libs-3.0.4-1.el6.x86_64.rpm heartbeat-devel-3.0.4-1.el6.x86_64.rpm ldirectord-3.9.2-1.2.x86_64.rpm

[root@desk34 ~]# yum localinstall heartbeat-3.0.4-1.el6.x86_64.rpm heartbeat-libs-3.0.4-1.el6.x86_64.rpm heartbeat-devel-3.0.4-1.el6.x86_64.rpm ldirectord-3.9.2-1.2.x86_64.rpm -y

Main configuration directory: /etc/ha.d/

[[email protected]]# less README.config

[[email protected]]# rpm -q heartbeat -d

/usr/share/doc/heartbeat-3.0.4/AUTHORS

/usr/share/doc/heartbeat-3.0.4/COPYING

/usr/share/doc/heartbeat-3.0.4/COPYING.LGPL

/usr/share/doc/heartbeat-3.0.4/ChangeLog

/usr/share/doc/heartbeat-3.0.4/README

/usr/share/doc/heartbeat-3.0.4/apphbd.cf

/usr/share/doc/heartbeat-3.0.4/authkeys

/usr/share/doc/heartbeat-3.0.4/ha.cf

/usr/share/doc/heartbeat-3.0.4/haresources

[[email protected]]# cd /usr/share/doc/heartbeat-3.0.4/

[[email protected]]# cp authkeys ha.cf haresources /etc/ha.d/


[[email protected]]# vim authkeys # heartbeat authentication file

auth 3
#1 crc
#2 sha1 HI!
3 md5 Hello!

[[email protected]]# chmod 600 authkeys # the authentication file's permissions must be 600
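
The sample above uses md5 with the fixed string Hello! as the shared key. As a rough sketch only, the key could instead be generated randomly and the file written with mode 600 in one step. The helper name make_authkeys and its path argument are assumptions made for illustration and are not part of heartbeat; sha1 is the method listed as number 2 in the sample file.

```shell
#!/bin/bash
# Sketch: write an authkeys file with a random sha1 key and mode 600.
# make_authkeys is a hypothetical helper name; $1 is the file to create.
make_authkeys() {
    local target="$1"
    local key
    # 16 random bytes rendered as 32 hex characters
    key=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # Use a restrictive umask so the file is never world-readable
    ( umask 077
      printf 'auth 1\n1 sha1 %s\n' "$key" > "$target" )
    # heartbeat refuses to start unless the mode is 600
    chmod 600 "$target"
}

# Example (run as root on one node, then copy the file to the other node):
# make_authkeys /etc/ha.d/authkeys
```

Both nodes must end up with the identical file, so it would be generated once and then copied to the other node.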


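[[email protected]]# vim ha.cf # main configuration file

[[email protected]]# grep ^# ha.cf -v | grep " "

debugfile /var/log/ha-debug # debug log file
logfile /var/log/ha-log # runtime log file
logfacility local0 # syslog facility
keepalive 2 # heartbeat interval; the default unit is seconds (2 seconds here), and any other unit must be written out, e.g. 200ms
deadtime 10 # node death threshold: if the standby node has still received no heartbeat after 10 seconds, it considers the primary node dead
warntime 10 # time after which a late-heartbeat warning is issued
initdead 120 # after the daemon first starts, wait 120 seconds before starting the resources on the primary server
udpport 694 # UDP port used for heartbeat messages
bcast eth0 # send heartbeats by UDP broadcast

#baud 19200 # serial port baud rate, used together with serial
#serial /dev/ttyS0 # send heartbeats over a serial port
#ucast eth0 10.0.0.3 # send heartbeats by UDP unicast over eth0
#mcast eth0 225.0.0.1 694 1 0 # send heartbeats by UDP multicast
#ucast eth0 192.168.1.2 # UDP unicast; the IP is the standby node's IP (on the standby node, set it to the primary node's IP)
auto_failback on # whether resources switch back automatically once the primary node recovers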


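#stonith baytech /etc/ha.d/conf/stonith.baytech ## stonith protects data integrity in shared-storage environments

watchdog /dev/watchdog # the watchdog can reboot the machine 1 minute after a failure, which helps a server that has really stopped sending heartbeats to resume them. To use this feature, edit /etc/modprobe.conf (on RHEL 6, a file under /etc/modprobe.d/ serves the same purpose) and add the following line:

options softdog nowayout=0

When the "softdog" kernel module is loaded at boot with this option, it creates the actual device file /dev/watchdog.

Alternatively, add: modprobe softdog to the rc.local file so that the module is loaded automatically at boot.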

node desk34.example.com # primary node; must be listed first
node desk33.example.com # standby node
ping 192.168.122.1 # the gateway of this network, used to verify network connectivity
respawn hacluster /usr/lib64/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster


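## By default heartbeat monitors no service other than itself, and it does not monitor the state of the network either. So when the network is interrupted, no switchover between the Load Balancer and the Backup takes place. The ipfail plugin, with 'ping nodes' configured, solves this problem, but a cluster node must not be used as a ping node.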
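
The timing directives in ha.cf (keepalive, warntime, deadtime, initdead) only make sense relative to each other. The following is a minimal sketch for reading them back from the file and comparing them. The helper names cf_value and check_timings are made up for this example, and the ordering keepalive < warntime < deadtime < initdead is a rule of thumb assumed here, not something heartbeat enforces. Only plain values in seconds are handled.

```shell
#!/bin/bash
# Sketch: read timing directives from an ha.cf-style file and compare them.
# cf_value and check_timings are hypothetical helper names.

# Print the first value of directive $1 in file $2.
# Commented-out lines such as "#baud 19200" do not match, because their
# first field starts with "#".
cf_value() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Succeed only if keepalive < warntime < deadtime < initdead.
# Assumes all four values are whole seconds (no "ms" suffix).
check_timings() {
    local cf="$1" k w d i
    k=$(cf_value keepalive "$cf")
    w=$(cf_value warntime "$cf")
    d=$(cf_value deadtime "$cf")
    i=$(cf_value initdead "$cf")
    [ "$k" -lt "$w" ] && [ "$w" -lt "$d" ] && [ "$d" -lt "$i" ]
}

# Example:
# check_timings /etc/ha.d/ha.cf || echo "timing values look inconsistent"
```

With warntime and deadtime both set to 10, as in the configuration above, this check fails: the warning would fire at the same moment the node is declared dead, so it gives no advance notice.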


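[[email protected]]# vim haresources # heartbeat resource file

desk34.example.com IPaddr::192.168.122.122/24/eth0 httpd

# This file defines the start scripts of the software that makes up the cluster; these scripts must be located in /etc/init.d or /etc/ha.d/resource.d. IPaddr brings up the virtual IP 192.168.122.122, and httpd is the name of the service to start.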
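
To make the structure of the resource entry easier to see, here is a small sketch that splits IPaddr::192.168.122.122/24/eth0 into its parts: the script name before the :: and the address, prefix length and interface after it. split_resource is a hypothetical helper written for this post and is not part of heartbeat.

```shell
#!/bin/bash
# Sketch: split an haresources entry such as IPaddr::192.168.122.122/24/eth0.
# split_resource is a hypothetical helper, not part of heartbeat.
split_resource() {
    local entry="$1"
    local script="${entry%%::*}"   # text before the first "::"
    local args="${entry#*::}"      # text after the first "::"
    local ip prefix iface
    # The argument has the form address/prefix/interface
    IFS=/ read -r ip prefix iface <<< "$args"
    printf 'script=%s ip=%s prefix=%s iface=%s\n' "$script" "$ip" "$prefix" "$iface"
}

split_resource 'IPaddr::192.168.122.122/24/eth0'
# prints: script=IPaddr ip=192.168.122.122 prefix=24 iface=eth0
```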


On the desk33 node, install the same packages as on desk34 and apply the same configuration:

[[email protected]]# scp authkeys haresources ha.cf desk33:/etc/ha.d/

[[email protected]]# modprobe softdog
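
Since ha.cf refers to /dev/watchdog, it may be worth confirming on each node that the softdog module is actually loaded before starting heartbeat. The sketch below looks the module up in /proc/modules. module_loaded is a hypothetical helper name, and the optional second argument exists only so that the function can be tried against a sample file.

```shell
#!/bin/bash
# Sketch: check whether a kernel module is loaded.
# module_loaded NAME [FILE] succeeds if NAME is the first field of some line
# in FILE (default: /proc/modules). Hypothetical helper, not part of heartbeat.
module_loaded() {
    local name="$1" list="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$list"
}

# Example (as root): load softdog only if it is missing
# module_loaded softdog || modprobe softdog
```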


Starting heartbeat

## Note: the primary node must be started first:

[[email protected]]# tail -f /var/log/ha-log

Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: glib: ping heartbeat started.
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: notice: Using watchdog device: /dev/watchdog
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: Local status now set to: 'up'
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: Link 192.168.122.1:192.168.122.1 up.
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: Status update for node 192.168.122.1: status ping
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: Link desk34.example.com:eth0 up.

The above is the primary node's startup log; it keeps waiting for the standby node to start:

Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: Link desk34.example.com:eth0 up.
Sep 12 03:16:45 desk34.example.com heartbeat: [3217]: info: Link desk33.example.com:eth0 up.
Sep 12 03:16:45 desk34.example.com heartbeat: [3217]: info: Status update for node desk33.example.com: status up
.........................................................
IPaddr(IPaddr_192.168.122.122)[3483]: 2013/09/12_03:16:56 INFO: eval ifconfig eth0:0 192.168.122.122 netmask 255.255.255.0 broadcast 192.168.122.255
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.122.122)[3457]: 2013/09/12_03:16:56 INFO: Success
ResourceManager(default)[3370]: 2013/09/12_03:16:56 info: Running /etc/init.d/httpd start # the service starts
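
While waiting for the standby node, the lines that matter in ha-log are the "Link ... up." messages. As a small sketch, they can be pulled out of the log like this. links_up is a hypothetical helper name, and the pattern assumes the message format "info: Link NAME up." seen in the log above.

```shell
#!/bin/bash
# Sketch: list every link that heartbeat has reported as up in a log file.
# links_up is a hypothetical helper; it assumes lines ending in
# "info: Link NAME up." as in the ha-log excerpt.
links_up() {
    sed -n 's/.*info: Link \([^ ]*\) up\.$/\1/p' "$1" | sort -u
}

# Example:
# links_up /var/log/ha-log
```

Once desk33.example.com:eth0 appears in the output, the standby node has joined and resource startup follows shortly after.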


Tests:

1. The virtual IP 192.168.122.122 appears on the primary node

[[email protected]]# ifconfig


eth0:0 Link encap:Ethernet HWaddr 52:54:00:4E:C6:2F

inet addr:192.168.122.122 Bcast:192.168.122.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1


http://192.168.122.122

2. Stopping heartbeat

Stop heartbeat on desk34:

[[email protected]]# /etc/init.d/heartbeat stop

Stopping High-Availability services: Done.

The virtual IP floats over to desk33:


[[email protected]]# ifconfig

eth0:0 Link encap:Ethernet HWaddr 52:54:00:D0:FE:21

inet addr:192.168.122.122 Bcast:192.168.122.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Once heartbeat is started again on the primary node, the resources switch back to the primary node.


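3. Stopping a resource:

Stop the httpd service on desk34: heartbeat does not switch over to the standby node. This shows that heartbeat itself performs no health checks on its resources.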
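
Because heartbeat does not check the resource itself, one possible workaround is a small external watcher that stops heartbeat on the local node once httpd has failed several checks in a row, which triggers the same failover as in test 2. This is only a sketch. The function name watch_resource, the variables CHECK_CMD and STOP_CMD, and their default commands are assumptions chosen for this setup.

```shell
#!/bin/bash
# Sketch: stop heartbeat locally when a health check fails repeatedly.
# CHECK_CMD and STOP_CMD are assumptions for this setup; override as needed.
CHECK_CMD=${CHECK_CMD:-"/etc/init.d/httpd status"}
STOP_CMD=${STOP_CMD:-"/etc/init.d/heartbeat stop"}

# watch_resource MAX_FAILS INTERVAL
# Poll CHECK_CMD every INTERVAL seconds. After MAX_FAILS consecutive
# failures, run STOP_CMD once and return.
watch_resource() {
    local max_fails="$1" interval="$2" fails=0
    while :; do
        if $CHECK_CMD >/dev/null 2>&1; then
            fails=0                      # a successful check resets the counter
        else
            fails=$((fails + 1))
            if [ "$fails" -ge "$max_fails" ]; then
                $STOP_CMD                # hand the resources to the standby node
                return 0
            fi
        fi
        sleep "$interval"
    done
}

# Example (as root on the active node): 3 failed checks, 5 seconds apart
# watch_resource 3 5
```

Requiring several consecutive failures avoids a failover on a single slow or missed check.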


4. Simulating a network outage

Take down desk34's network interface:

[[email protected]]# ifconfig eth0 down

The resources switch over to desk33.


5. Kernel crash on the primary node:

[[email protected]]# echo c > /proc/sysrq-trigger

The resources are taken over by desk33.


School of Computer Science, Xi'an Shiyou University

Wang Ziyin

[email protected]