rhel6-heartbeat
This article is also available on my Baidu Space blog, along with my other heartbeat posts: http://hi.baidu.com/wangziyin/item/ffbc773008f3d89eb711dba6
1. Environment:
desk34.example.com 192.168.122.34
desk33.example.com 192.168.122.33
Required packages: heartbeat-3.0.4-1.el6.x86_64.rpm heartbeat-libs-3.0.4-1.el6.x86_64.rpm
heartbeat-devel-3.0.4-1.el6.x86_64.rpm ldirectord-3.9.2-1.2.x86_64.rpm
[root@desk34 ~]# yum localinstall heartbeat-3.0.4-1.el6.x86_64.rpm heartbeat-libs-3.0.4-1.el6.x86_64.rpm heartbeat-devel-3.0.4-1.el6.x86_64.rpm ldirectord-3.9.2-1.2.x86_64.rpm -y
Main configuration directory: /etc/ha.d/
[root@desk34 ha.d]# less README.config
[root@desk34 ha.d]# rpm -q heartbeat -d
/usr/share/doc/heartbeat-3.0.4/AUTHORS
/usr/share/doc/heartbeat-3.0.4/COPYING
/usr/share/doc/heartbeat-3.0.4/COPYING.LGPL
/usr/share/doc/heartbeat-3.0.4/ChangeLog
/usr/share/doc/heartbeat-3.0.4/README
/usr/share/doc/heartbeat-3.0.4/apphbd.cf
/usr/share/doc/heartbeat-3.0.4/authkeys
/usr/share/doc/heartbeat-3.0.4/ha.cf
/usr/share/doc/heartbeat-3.0.4/haresources
[root@desk34 ha.d]# cd /usr/share/doc/heartbeat-3.0.4/
[root@desk34 heartbeat-3.0.4]# cp authkeys ha.cf haresources /etc/ha.d/
[root@desk34 ha.d]# vim authkeys #heartbeat authentication file
auth 3
#1 crc
#2 sha1 HI!
3 md5 Hello!
[root@desk34 ha.d]# chmod 600 authkeys #the authentication file must have mode 600
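The sample above uses the md5 method with the literal key "Hello!". As a minimal sketch, a helper like the following could write an authkeys file with a random key and the required mode 600. The function name write_authkeys and the choice of the sha1 method are assumptions made for illustration; they are not part of the original setup.

```shell
# write_authkeys FILE: hypothetical helper, not shipped with heartbeat.
# Writes a one-key authkeys file using the sha1 method and a random key,
# then restricts it to mode 600 as heartbeat requires.
write_authkeys() {
    file=$1
    # 16 random bytes rendered as 32 hexadecimal characters
    key=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # create the file inside a subshell so the caller's umask is untouched
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$file" )
    chmod 600 "$file"
}
```

It could be called as write_authkeys /etc/ha.d/authkeys on one node; the resulting file then has to be copied to the other node unchanged, because both nodes must share the same key.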
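[root@desk34 ha.d]# vim ha.cf #main configuration file
[root@desk34 ha.d]# grep ^# ha.cf -v | grep " "
debugfile /var/log/ha-debug #debug log file
logfile /var/log/ha-log #runtime log file
logfacility local0 #syslog facility used for logging
keepalive 2 #heartbeat interval; the default unit is seconds (here 2 seconds), other units must be written out, e.g. 200ms
deadtime 10 #node death threshold: if the standby node has received no heartbeat after 10 seconds, it considers the primary node dead
warntime 10 #time after which a late-heartbeat warning is issued
initdead 120 #when the daemon first starts, wait 120 seconds before declaring a node dead and starting resources
udpport 694 #UDP port used for heartbeat traffic
bcast eth0 #send heartbeats by UDP broadcast on eth0
#baud 19200 #serial port baud rate, used together with serial
#serial /dev/ttyS0 #send heartbeats over a serial line
#ucast eth0 10.0.0.3 #send heartbeats by UDP unicast on eth0
#mcast eth0 225.0.0.1 694 1 0 #send heartbeats by UDP multicast
#ucast eth0 192.168.1.2 #UDP unicast; the IP is the standby node's IP, and on the standby node it is set to the primary node's IP
auto_failback on #whether resources move back automatically once the primary node recovers
#stonith baytech /etc/ha.d/conf/stonith.baytech ##stonith protects data integrity in shared-storage environments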
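watchdog /dev/watchdog #the watchdog reboots the machine 1 minute after a failure, which helps a server that has really stopped
sending heartbeats to resume them. To use this feature, add the following line to /etc/modprobe.conf:
options softdog nowayout=0
Once the "softdog" kernel module is loaded, it creates the actual device file
/dev/watchdog
To load the module automatically at boot, "modprobe softdog" can be added to rc.local.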
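node desk34.example.com #primary node; must be listed first
node desk33.example.com #standby node
ping 192.168.122.1 #gateway of this network, used to verify connectivity
respawn hacluster /usr/lib64/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
##By default heartbeat monitors no service other than itself, and it does not monitor the network either. When the network goes down, no switchover between the Load Balancer and the Backup takes place. The ipfail plugin together with ping nodes solves this, but a cluster node must not be used as a ping node.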
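The listing above mixes directives with trailing comments, which makes it awkward to check a single value. A minimal sketch of a helper that prints only the arguments of one directive is shown here; the name hacf_get is hypothetical, and the parsing assumes one directive per line with "#" starting a comment, as in the file above.

```shell
# hacf_get DIRECTIVE FILE: hypothetical helper for reading ha.cf.
# Prints the arguments of each uncommented DIRECTIVE line,
# with any trailing "#..." comment removed.
hacf_get() {
    awk -v d="$1" '
        /^[ \t]*#/ { next }                  # skip lines that are only a comment
        $1 == d {
            sub(/[ \t]*#.*$/, "")            # drop a trailing comment
            sub(/^[ \t]*[^ \t]+[ \t]+/, "")  # drop the directive name itself
            print
        }' "$2"
}
```

For example, hacf_get node /etc/ha.d/ha.cf would list the two node names, and hacf_get deadtime /etc/ha.d/ha.cf would print 10 for the configuration shown here.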
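[root@desk34 ha.d]# vim haresources #heartbeat resource file
desk34.example.com IPaddr::192.168.122.122/24/eth0 httpd #This file lists the start scripts of the software the cluster needs; the scripts must be located in /etc/init.d or /etc/ha.d/resource.d. IPaddr brings up the virtual IP 192.168.122.122, and httpd is the name of the service to start.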
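A haresources line consists of the preferred node followed by the resources, and arguments to a resource script are attached with "::". The following sketch only illustrates that layout by splitting such a line; list_resources is a hypothetical name and the function is not how heartbeat itself parses the file.

```shell
# list_resources LINE: hypothetical helper.
# Prints the node name, then one "script arguments" line per resource,
# replacing each "::" separator with a space.
list_resources() {
    set -- $1                 # split the line on whitespace
    echo "node: $1"
    shift
    for res in "$@"; do
        echo "resource: $(printf '%s' "$res" | sed 's/::/ /g')"
    done
}
```

Called with the line above (without its comment), it prints the node desk34.example.com, the IPaddr resource with its argument 192.168.122.122/24/eth0, and the httpd resource.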
On desk33, install the same packages as on desk34 and apply the same configuration:
[root@desk34 ha.d]# scp authkeys haresources ha.cf desk33:/etc/ha.d/
[root@desk33 ha.d]# modprobe softdog #needed on every node whose ha.cf sets watchdog
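Since the module has to be loaded again after every reboot, the rc.local approach mentioned earlier can be scripted. A minimal sketch, assuming a hypothetical helper named ensure_line that appends a line only when it is not already present:

```shell
# ensure_line LINE FILE: hypothetical helper.
# Appends LINE to FILE unless an identical line is already there,
# so running it repeatedly does not create duplicates.
ensure_line() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}
```

As root on each node it could be used as ensure_line 'modprobe softdog' /etc/rc.local.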
Starting heartbeat (/etc/init.d/heartbeat start on each node)
##Note: the primary node must be started first:
[root@desk34 ha.d]# tail -f /var/log/ha-log
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: glib: ping heartbeat started.
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: notice: Using watchdog device: /dev/watchdog
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: Local status now set to: 'up'
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: Link 192.168.122.1:192.168.122.1 up.
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: Status update for node 192.168.122.1: status ping
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: Link desk34.example.com:eth0 up.
This is the primary node's startup log; it keeps waiting for the standby node to start:
Sep 12 03:15:13 desk34.example.com heartbeat: [3217]: info: Link desk34.example.com:eth0 up.
Sep 12 03:16:45 desk34.example.com heartbeat: [3217]: info: Link desk33.example.com:eth0 up.
Sep 12 03:16:45 desk34.example.com heartbeat: [3217]: info: Status update for node desk33.example.com: status up
.........................................................
IPaddr(IPaddr_192.168.122.122)[3483]: 2013/09/12_03:16:56 INFO: eval ifconfig eth0:0 192.168.122.122 netmask 255.255.255.0 broadcast 192.168.122.255
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.122.122)[3457]: 2013/09/12_03:16:56 INFO: Success
ResourceManager(default)[3370]: 2013/09/12_03:16:56 info: Running /etc/init.d/httpd start #the service is started
Testing:
1. The virtual IP 192.168.122.122 appears on the primary node
[root@desk34 ha.d]# ifconfig
eth0:0 Link encap:Ethernet HWaddr 52:54:00:4E:C6:2F
inet addr:192.168.122.122 Bcast:192.168.122.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
http://192.168.122.122
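To see quickly which node currently holds the virtual IP, the ifconfig output can be filtered. This is a minimal sketch that assumes the "inet addr:" output format shown above; has_vip is a hypothetical name.

```shell
# has_vip ADDRESS: hypothetical helper.
# Reads ifconfig output on stdin and succeeds if ADDRESS is configured.
# The trailing space in the pattern avoids matching a longer address
# that merely starts with ADDRESS.
has_vip() {
    grep -qF -- "inet addr:$1 "
}
```

On either node it could be run as ifconfig | has_vip 192.168.122.122 && echo "virtual IP is on this node".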
2. Stopping heartbeat
Stop heartbeat on desk34:
[root@desk34 ha.d]# /etc/init.d/heartbeat stop
Stopping High-Availability services: Done.
The virtual IP floats over to desk33:
[root@desk33 ha.d]# ifconfig
eth0:0 Link encap:Ethernet HWaddr 52:54:00:D0:FE:21
inet addr:192.168.122.122 Bcast:192.168.122.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
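Once heartbeat is started again on the primary node, the resources fail back to it.
3. Stopping a resource:
Stop the httpd service on desk34: heartbeat does not switch to the standby node, which shows that heartbeat itself performs no health checks on resources.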
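One common workaround for the missing health check is a small external script that tests the service and stops heartbeat when the test fails, so that the standby node takes over. The following is only a sketch of that idea, not something heartbeat provides; failover_if_down is a hypothetical name, and both the check command and the action are supplied by the caller.

```shell
# failover_if_down CHECK_CMD ACTION_CMD: hypothetical helper.
# Runs CHECK_CMD; if it fails, prints "unhealthy" and runs ACTION_CMD,
# otherwise prints "healthy" and does nothing else.
failover_if_down() {
    check_cmd=$1
    action_cmd=$2
    if sh -c "$check_cmd" >/dev/null 2>&1; then
        echo "healthy"
    else
        echo "unhealthy"
        sh -c "$action_cmd"
    fi
}
```

Run periodically on the active node (for example from cron), a call might look like failover_if_down "curl -fs -o /dev/null http://127.0.0.1/" "/etc/init.d/heartbeat stop". The curl check is an assumption; any command that exits non-zero when httpd is down would do.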
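4. Simulating a network outage
Take down the network interface on desk34:
[root@desk34 ha.d]# ifconfig eth0 down
The resources switch to desk33.
5. Kernel crash on the primary node:
[root@desk34 ha.d]# echo c > /proc/sysrq-trigger
The resources are taken over by desk33.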
School of Computer Science, Xi'an Shiyou University
Wang Ziyin (王兹银)