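Heartbeat has had three major versions so far: v1, v2 and v3.
v1 is quite old; v2 is the one most commonly used.
Heartbeat v2 provides not only the messaging layer but also cluster resource management (CRM). It offers two resource managers: haresources (compatible with the v1 haresources) and crm.
With v3 the code was split into three projects: heartbeat, pacemaker and cluster-glue.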
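Heartbeat's core functions are simply heartbeat detection and resource takeover.
Heartbeat detection is done in the heartbeat messaging layer, and resource takeover is also handled by heartbeat itself. What the administrator has to do is tell heartbeat which resources exist and of what type, that is, define the resources.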
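An ordinary high-availability cluster for a web service has three kinds of resources:
vip: the single address that clients access
httpd: the web service, which must be installed on every HA node
shared storage: ensures that users see the same content whichever node serves them
The resources also depend on each other in a certain order; for example, if the shared storage is not mounted first, httpd cannot start serving the site.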
With the approach clear, prepare the test environment:
ha nodes: 192.168.1.106, 192.168.1.107 (messaging layer)
vip: 192.168.1.22
shared storage: 192.168.1.170 (NFS server)
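1. Set up the NFS server to provide shared storage
Note: in a lab environment you can perfectly well use local storage to test the behaviour.
Set up the NFS share that holds the web page index.html: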
[root@one ~]# mkdir -pv /web/htmldoc
mkdir: created directory `/web
[root@one ~]# echo >> /web/htmldoc/index.html
[root@one ~]# echo >> /etc/exports
[root@one ~]# service nfs restart
Shutting down NFS daemon: [FAILED]
Shutting down NFS mountd: [FAILED]
Shutting down NFS quotas: [FAILED]
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS mountd: [ OK ]
Stopping RPC idmapd: [ OK ]
Starting RPC idmapd: [ OK ]
Starting NFS daemon: [ OK ]
[root@one ~]# showmount -e 192.168.1.170
Export list 192.168.1.170:
/web/htmldoc 192.168.1.0/255.255.255.0
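The two echo commands in the transcript appear without the text they wrote. The following is only a sketch of what such a step might look like. The export path and network are taken from the showmount output; the page text and the export options (ro,sync) are assumptions, and make_export is a hypothetical helper name. It writes under a scratch directory so it can be tried without touching a real server.

```shell
#!/bin/sh
# Sketch only. $1 is a root prefix: pass "" on the real NFS server (as root),
# or a scratch directory to try the steps without touching the system.
make_export() {
    root=$1
    export_dir=/web/htmldoc                 # from the showmount output
    export_net=192.168.1.0/255.255.255.0    # from the showmount output
    mkdir -p "$root$export_dir" "$root/etc"
    # The page text is an illustrative placeholder.
    echo "<h1>nfs shared page</h1>" > "$root$export_dir/index.html"
    # The export options (ro,sync) are an assumption.
    echo "$export_dir $export_net(ro,sync)" >> "$root/etc/exports"
}

# Demonstration against a scratch directory.
scratch=$(mktemp -d)
make_export "$scratch"
cat "$scratch/etc/exports"
```

On the real server the NFS service still has to be restarted (or the export table reloaded) afterwards, as in the transcript.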
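2. Configure mutual trust between the HA nodes (passwordless access) and time synchronization
Since this is a cluster, the nodes need a way to manage each other: tasks are run on the peer over ssh.
For time synchronization, ntp can be used.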
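A rough sketch of this step, assuming key-based root login between the nodes and a one-off ntpdate sync. The peer name node2 matches the node used later in this post; the time source pool.ntp.org and the helper names run and setup_trust_and_time are assumptions. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on a real node to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

PEER=${PEER:-node2}                      # the other cluster node
NTP_SERVER=${NTP_SERVER:-pool.ntp.org}   # assumed time source, not from the original

setup_trust_and_time() {
    # Key pair without a passphrase so tasks can run on the peer unattended.
    run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    # Install the public key on the peer.
    run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@$PEER"
    # One-off clock sync on this node and on the peer.
    run ntpdate "$NTP_SERVER"
    run ssh "root@$PEER" ntpdate "$NTP_SERVER"
}

setup_trust_and_time
```

The same steps would be repeated from the other node so that trust works in both directions.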
3. Configure host name resolution for the nodes
Either DNS or /etc/hosts works. /etc/hosts is recommended, because a DNS failure would otherwise break the cluster.
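As an illustration, the entries for the two nodes of this lab could be added as below. The name-to-address mapping (node1 = 192.168.1.106, node2 = 192.168.1.107) follows the ifconfig output shown later in this post; the helper names ha_hosts_entries and add_ha_hosts are hypothetical.

```shell
#!/bin/sh
# Print the /etc/hosts lines for the two HA nodes of this lab.
# The host names must match `uname -n` on each node.
ha_hosts_entries() {
    printf '%s\t%s\n' 192.168.1.106 node1 192.168.1.107 node2
}

# Append only the entries that are missing, so re-running is harmless.
# $1 = hosts file (pass /etc/hosts on a real node).
add_ha_hosts() {
    hosts_file=$1
    ha_hosts_entries | while IFS= read -r line; do
        grep -qxF "$line" "$hosts_file" 2>/dev/null || echo "$line" >> "$hosts_file"
    done
}

ha_hosts_entries
```

Running add_ha_hosts /etc/hosts on both nodes keeps the two files consistent.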
4. Install the httpd service
5. Install heartbeat
The packages can be downloaded from EPEL:
http://dl.fedoraproject.org/pub/epel/5/x86_64/repoview/letter_h.group.html
heartbeat - Heartbeat subsystem for High-Availability Linux
heartbeat-devel - Heartbeat development package
heartbeat-gui - Provides a gui interface to manage heartbeat clusters
heartbeat-ldirectord - Monitor daemon for maintaining high availability resources (for highly available ipvs: generates the ipvs rules automatically and health-checks the back-end real servers)
heartbeat-pils - Provides a general plugin and interface loading library
heartbeat-stonith - Provides an interface to Shoot The Other Node In The Head
heartbeat-devel is not needed and heartbeat-ldirectord is not used here; install the remaining packages.
[root@node1 ~]# rpm -ivh heartbeat-2.1.4-11.el5.x86_64.rpm heartbeat-gui-2.1.4-11.el5.x86_64.rpm heartbeat-pils-2.1.4-11.el5.x86_64.rpm heartbeat-stonith-2.1.4-11.el5.x86_64.rpm
warning: heartbeat-2.1.4-11.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID 217521f6
error: Failed dependencies:
        libltdl.so.3()(64bit) needed by heartbeat-2.1.4-11.el5.x86_64
        libnet.so.1()(64bit) needed by heartbeat-2.1.4-11.el5.x86_64
        libltdl.so.3()(64bit) needed by heartbeat-gui-2.1.4-11.el5.x86_64
        libltdl.so.3()(64bit) needed by heartbeat-pils-2.1.4-11.el5.x86_64
        libltdl.so.3()(64bit) needed by heartbeat-stonith-2.1.4-11.el5.x86_64
        libopenhpi.so.2()(64bit) needed by heartbeat-stonith-2.1.4-11.el5.x86_64
The system is missing dependencies, so install with yum instead. libnet is not in the regular yum repositories and has to be downloaded from EPEL separately.
Install with yum:
[root@node1 ~]# yum --nogpgcheck localinstall heartbeat-2.1.4-11.el5.x86_64.rpm heartbeat-gui-2.1.4-11.el5.x86_64.rpm heartbeat-pils-2.1.4-11.el5.x86_64.rpm heartbeat-stonith-2.1.4-11.el5.x86_64.rpm libnet-1.1.6-7.el5.x86_64.rpm
[root@node1 ~]# ssh node2 yum --nogpgcheck localinstall heartbeat-2.1.4-11.el5.x86_64.rpm heartbeat-gui-2.1.4-11.el5.x86_64.rpm heartbeat-pils-2.1.4-11.el5.x86_64.rpm heartbeat-stonith-2.1.4-11.el5.x86_64.rpm  libnet-1.1.6-7.el5.x86_64.rpm
6. Configure heartbeat
Heartbeat's configuration directory is /etc/ha.d/. The rpm does not install any configuration files there, but they can be copied from the sample files:
[root@node1 ~]# ll /etc/ha.d/
total 24
-rwxr-xr-x 1 root root  745 Mar 21  2010 harc
drwxr-xr-x 2 root root 4096 Jun 10 05:48 rc.d
-rw-r--r-- 1 root root  692 Mar 21  2010 README.config
drwxr-xr-x 2 root root 4096 Jun 10 05:48 resource.d
-rw-r--r-- 1 root root 7864 Mar 21  2010 shellfuncs
[root@node1 ~]# cp -p /usr/share/doc/heartbeat-2.1.4/
apphbd.cf COPYING.LGPL GettingStarted.txt hb_report.html README startstop
authkeys DirectoryMap.txt ha.cf hb_report.txt Requirements.html
AUTHORS faqntips.html HardwareGuide.html heartbeat_api.html Requirements.txt
ChangeLog faqntips.txt HardwareGuide.txt heartbeat_api.txt rsync.html
COPYING GettingStarted.html haresources logd.cf rsync.txt
[root@node1 ~]# cp -p /usr/share/doc/heartbeat-2.1.4/{authkeys,ha.cf,haresources} /etc/ha.d/
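The three main configuration files in detail:
authkeys: the key file; its permissions must be 600
The authkeys file sets the authentication method heartbeat uses. Three methods are available: crc, md5 and sha1. Their security increases in that order, and so does the amount of system resources they consume. If the cluster runs on a secure network, crc is enough; if every HA node has powerful hardware, sha1 is recommended because it is the most secure; md5 is the compromise between network security and resource usage.
[root@node1 ~]# cat /etc/ha.d/authkeys
auth 3
#1 crc
#2 sha1 HI!
3 md5 Hello!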
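The sample above keeps the default key string. A minimal sketch of generating an authkeys file with a random sha1 key and mode 600 might look like this; make_authkeys is a hypothetical helper, and the demonstration writes to a scratch file rather than /etc/ha.d/authkeys.

```shell
#!/bin/sh
# Write an authkeys file that uses sha1 with a random key.
# $1 = target path (/etc/ha.d/authkeys on a real node).
# The body runs in a subshell so the umask change does not leak out.
make_authkeys() (
    target=$1
    # 20 random bytes printed as a 40-character hex string, used as the shared key.
    key=$(od -An -N20 -tx1 /dev/urandom | tr -d ' \n')
    umask 077                 # create the file without group/other access
    {
        echo "auth 1"
        echo "1 sha1 $key"
    } > "$target"
    chmod 600 "$target"       # the mode this post requires for authkeys
)

# Demonstration against a scratch file.
demo=$(mktemp)
make_authkeys "$demo"
ls -l "$demo"
```

The resulting file has to be identical on both nodes, so it would be generated once and then copied to the peer.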
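ha.cf: the main heartbeat configuration file
ha.cf defines the cluster itself, such as the cluster nodes and the heartbeat monitoring.
[root@node1 ~]# cat /etc/ha.d/ha.cf
#debugfile /var/log/ha-debug      file for heartbeat's debug messages
#logfile /var/log/ha-log      location of the heartbeat log
# If neither of the two log files above is defined, heartbeat sends its messages through logfacility local0, which ends up in /var/log/messages. If none of the three parameters is defined, ha-debug and ha-log are created under /var/log/ by default.
logfacility local0
#keepalive 2      interval between heartbeat packets. The default unit is seconds; for milliseconds append ms, e.g. 1500ms means 1.5s
#deadtime 30      how long a node waits without heartbeats before it declares the peer dead
#warntime 10      heartbeat delay threshold of 10 seconds. If the standby node receives no heartbeat from the primary node within 10 seconds, a warning is written to the log, but no failover happens yet.
#initdead 120      on some systems the network needs a while to work properly after a boot or reboot; this option covers that interval. It should be at least twice deadtime.
#udpport 694      UDP port used for bcast/ucast communication; 694 is the default
#baud 19200      speed of the serial line when a serial port carries the heartbeat
#serial /dev/ttyS0 # Linux      serial device used for the heartbeat
#serial /dev/cuaa0 # FreeBSD      serial device used for the heartbeat
#serial /dev/cuad0 # FreeBSD 6.x      serial device used for the heartbeat
#serial /dev/cua/a # Solaris      serial device used for the heartbeat
#bcast eth0 # Linux      interface used for broadcast heartbeats
#bcast eth1 eth2 # Linux
#bcast le0 # Solaris
#bcast le1 le2 # Solaris
#mcast eth0 225.0.0.1 694 1 0      multicast: interface, group address and related settings
#ucast eth0 192.168.1.2      unicast: interface and peer address
auto_failback on      whether resources fail back to the primary node once it recovers
#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword      defines stonith. Its main purpose is to remove a faulty node from the cluster so that its resources are released and two nodes never fight over the same resource, which protects the safety and integrity of the shared data.
#stonith_host ken3 rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
#watchdog /dev/watchdog      commonly called the watchdog: if the node produces no heartbeat for one minute, it reboots itself. The feature needs a kernel module that creates the actual device file; if the system lacks this module, it has to be enabled and the kernel recompiled.
# After compiling, load the module, then verify the two device numbers (they should be 10 and 130), and finally create the device file. The feature can then be used.
#node ken3      a node in the cluster. Note: the node name must match the output of uname -n
# (in this lab: node1 192.168.1.106, node2 192.168.1.107)
#ping 10.10.10.254      ping and the ping_group directive below create pseudo cluster members. They must be used together with the ipfail directive further down. Their purpose is to monitor the physical link: if a cluster node cannot reach these pseudo members, it loses the right to take over resources or services and releases the resources it holds.
#ping_group group1 10.10.10.254 10.10.10.253
#hbaping fc-card-name
#
#
# Processes started and stopped with heartbeat. Restarted unless
# they exit with rc=100
#
#respawn userid /path/name/to/run
#respawn hacluster /usr/lib/heartbeat/ipfail      optional. Lists processes that are started and stopped together with heartbeat, usually plug-ins integrated with heartbeat; they are restarted automatically after a failure.
# The most commonly used one is ipfail, which detects and handles network failures. It relies on the ping nodes defined by the ping directive to test connectivity. hacluster is the user that the ipfail process runs as.
#apiauth client-name gid=gidlist uid=uidlist      access rights for the processes you configured to start
#apiauth ipfail gid=haclient uid=hacluster      access rights for the processes you configured to start
#hopfudge 1
#deadping 30
#hbgenmethod time
#realtime off
#debug 1
#apiauth ipfail uid=hacluster
#apiauth ccm uid=hacluster
#apiauth cms uid=hacluster
#apiauth ping gid=haclient uid=alanr,root
#apiauth gid=haclient
#msgfmt classic/netstring
# use_logd yes/no
#conn_logd_time 60
#compression bz2
#compression_threshold 2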
Change it as follows:
[root@node2 ~]# cat /etc/ha.d/ha.cf | grep -v ^# | grep -v ^$
logfacility local0
keepalive 1
udpport 694
bcast eth0 # Linux
auto_failback on
node node1
node node2
ping 192.168.1.253
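The trimmed file sets only keepalive and leaves the other timers unset. If the timers are set explicitly, a small sanity check such as the one below may help. Only the rule that initdead should be at least twice deadtime comes from the annotated file above; the other two orderings (keepalive below warntime, warntime below deadtime) are assumptions of this sketch, check_timing is a hypothetical helper, and values are taken as whole seconds.

```shell
#!/bin/sh
# Check integer-second timing values intended for ha.cf.
# Usage: check_timing KEEPALIVE WARNTIME DEADTIME INITDEAD
# Prints "ok" and returns 0 when the values are consistent; otherwise
# prints the first problem found and returns 1.
check_timing() {
    keepalive=$1 warntime=$2 deadtime=$3 initdead=$4
    if [ "$keepalive" -ge "$warntime" ]; then
        echo "keepalive must be smaller than warntime"
        return 1
    fi
    if [ "$warntime" -ge "$deadtime" ]; then
        echo "warntime must be smaller than deadtime"
        return 1
    fi
    if [ "$initdead" -lt $((deadtime * 2)) ]; then
        echo "initdead must be at least twice deadtime"
        return 1
    fi
    echo "ok"
}

# The commented sample values from ha.cf: keepalive 2, warntime 10, deadtime 30, initdead 120.
check_timing 2 10 30 120
```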
haresources
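This file defines the resources or services of the cluster (resource definition + resource agent).
Format: nodename resource1::arg1::arg2::argN resource2::arg1::arg2::argN resourceN
nodename is the name of one node in the cluster (the primary node).
resource: every resource is a shell script. The scripts are searched for in /etc/init.d/ (LSB-based; no parameters can be passed, only stop|start|restart|status),
/etc/ha.d/resource.d and /usr/lib64/heartbeat (OCF-based, the Open Cluster Framework; many parameters can be passed).
For example, by default the cluster vip is configured as an alias on the interface that is in the same subnet as the vip; this relies on /usr/lib64/heartbeat/findif.
The resources are defined as follows: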
node1 IPaddr::192.168.1.22/24/eth0 Filesystem::192.168.1.170:/web/htmldoc::/var/www/html::nfs httpd
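To make the format concrete, the sketch below expands such a line into the script calls it stands for. Resources are started left to right and stopped right to left; the release log later in this post shows the same reverse order (httpd, then Filesystem, then IPaddr). expand_resources is a hypothetical helper and prints script names without their directories.

```shell
#!/bin/sh
# Expand one haresources line into the calls it stands for.
# Usage: expand_resources start|stop "node res1::arg1::arg2 res2 ..."
expand_resources() {
    action=$1
    set -- $2        # split the line on whitespace
    shift            # drop the node name
    list=""
    for res in "$@"; do
        # "Name::a::b" becomes "Name a b <action>"
        cmd="$(echo "$res" | sed 's/::/ /g') $action"
        if [ -z "$list" ]; then
            list=$cmd
        elif [ "$action" = stop ]; then
            list=$(printf '%s\n%s' "$cmd" "$list")   # prepend: stop in reverse order
        else
            list=$(printf '%s\n%s' "$list" "$cmd")   # append: start in listed order
        fi
    done
    printf '%s\n' "$list"
}

LINE='node1 IPaddr::192.168.1.22/24/eth0 Filesystem::192.168.1.170:/web/htmldoc::/var/www/html::nfs httpd'
expand_resources start "$LINE"
expand_resources stop "$LINE"
```

The order matters here: the vip and the NFS mount have to be in place before httpd starts, and httpd has to stop before the mount is released.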
Start heartbeat
[root@node1 ~]# service heartbeat start
Starting High-Availability services:
2014/06/10_07:58:08 INFO: Resource stopped
[ OK ]
[root@node1 ~]# ssh node2 service heartbeat start
Starting High-Availability services:
2014/06/10_07:58:11 INFO: Resource stopped
[ OK ]
Check the resources
[root@node1 ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:0C:29:29:C5:5D
          inet addr:192.168.1.106  Bcast:255.255.255.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe29:c55d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:23218 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23367 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5099502 (4.8 MiB)  TX bytes:6811677 (6.4 MiB)
eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:29:C5:5D
          inet addr:192.168.1.22  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1674 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1674 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4341233 (4.1 MiB)  TX bytes:4341233 (4.1 MiB)
sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
[root@node1 ~]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   16G  2.6G   13G  18% /
/dev/sda1                         99M   13M   82M  14% /boot
tmpfs                            501M     0  501M   0% /dev/shm
/dev/scd0                        3.5G  3.5G     0 100% /mnt
192.168.1.170:/web/htmldoc        16G  2.1G   13G  14% /var/www/html
[root@node1 ~]# netstat -tnpl | grep :80
tcp        0      0 :::80        :::*        LISTEN      29856/httpd
Access test:
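One way to run this check from a client machine is to poll the vip repeatedly, which also makes an interruption during a later failover visible. This is only a sketch: watch_vip and the PROBE variable are hypothetical names, and curl is assumed to be installed on the client.

```shell
#!/bin/sh
# Poll a URL a fixed number of times and report each result.
# PROBE is the command used for one request; curl is the default.
# Usage: watch_vip URL COUNT [INTERVAL_SECONDS]
watch_vip() {
    url=$1 count=$2 interval=${3:-1}
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        # -f: fail on HTTP errors, -s: quiet, -m 2: give up after 2 seconds
        if ${PROBE:-curl -fs -m 2 -o /dev/null} "$url"; then
            echo "$i up"
        else
            echo "$i DOWN"
            failures=$((failures + 1))
        fi
        i=$((i + 1))
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
    echo "failures: $failures"
}

# Example (needs the cluster to be reachable):
# watch_vip http://192.168.1.22/ 30
```

Leaving it running in one terminal while the active node is put into standby shows how many requests, if any, are lost during the takeover.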
To test the HA function, heartbeat ships a test script:
[root@node1 ~]# /usr/lib64/heartbeat/hb_standby
2014/06/10_08:00:13 Going standby [all
Check the log:
[root@node1 ~]# tail -f /var/log/messages
Jun 10 07:58:49 node1 heartbeat: [29288]: info: remote resource transition completed.
Jun 10 07:58:49 node1 heartbeat: [29288]: info: node1 wants to go standby [foreign]
Jun 10 07:58:50 node1 heartbeat: [29288]: info: standby: node2 can take our foreign resources
Jun 10 07:58:50 node1 heartbeat: [29903]: info: give up foreign HA resources (standby).
Jun 10 07:58:50 node1 heartbeat: [29903]: info: foreign HA resource release completed (standby).
Jun 10 07:58:50 node1 heartbeat: [29288]: info: Local standby process completed [foreign].
Jun 10 07:58:50 node1 heartbeat: [29288]: WARN: 1 lost packet(s) [node2] [76:78]
Jun 10 07:58:50 node1 heartbeat: [29288]: info: remote resource transition completed.
Jun 10 07:58:50 node1 heartbeat: [29288]: info: No pkts missing from node2!
Jun 10 07:58:50 node1 heartbeat: [29288]: info: Other node completed standby takeover of foreign resources.
Jun 10 08:00:13 node1 heartbeat: [29288]: info: node1 wants to go standby [all]
Jun 10 08:00:14 node1 heartbeat: [29288]: info: standby: node2 can take our all resources
Jun 10 08:00:14 node1 heartbeat: [29938]: info: give up all HA resources (standby).
Jun 10 08:00:14 node1 ResourceManager[29951]: info: Releasing resource group: node1 IPaddr::192.168.1.22/24/eth0 Filesystem::192.168.1.170:/web/htmldoc::/var/www/html::nfs httpd
Jun 10 08:00:14 node1 ResourceManager[29951]: info: Running /etc/init.d/httpd stop
Jun 10 08:00:14 node1 ResourceManager[29951]: info: Running /etc/ha.d/resource.d/Filesystem 192.168.1.170:/web/htmldoc /var/www/html nfs stop
Jun 10 08:00:14 node1 Filesystem[30025]: INFO: Running stop 192.168.1.170:/web/htmldoc on /var/www/html
Jun 10 08:00:14 node1 Filesystem[30025]: INFO: Trying to unmount /var/www/html
Jun 10 08:00:14 node1 Filesystem[30025]: INFO: unmounted /var/www/html successfully
Jun 10 08:00:14 node1 Filesystem[30014]: INFO: Success
Jun 10 08:00:14 node1 ResourceManager[29951]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.22/24/eth0 stop
Jun 10 08:00:14 node1 IPaddr[30143]: INFO: ifconfig eth0:0 down
Jun 10 08:00:14 node1 avahi-daemon[3313]: Withdrawing address record 192.168.1.22 on eth0.
Jun 10 08:00:14 node1 IPaddr[30114]: INFO: Success
Jun 10 08:00:14 node1 heartbeat: [29938]: info: all HA resource release completed (standby).
Jun 10 08:00:14 node1 heartbeat: [29288]: info: Local standby process completed [all].
Jun 10 08:00:15 node1 heartbeat: [29288]: WARN: 1 lost packet(s) [node2] [164:166]
Jun 10 08:00:15 node1 heartbeat: [29288]: info: remote resource transition completed.
Jun 10 08:00:15 node1 heartbeat: [29288]: info: No pkts missing from node2!
Jun 10 08:00:15 node1 heartbeat: [29288]: info: Other node completed standby takeover of all resources.
Access the site again
Check the resources on node2
[root@node2 ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:0C:29:3F:4F:0A
          inet addr:192.168.1.107  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe3f:4f0a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11302 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8589 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4326627 (4.1 MiB)  TX bytes:1420870 (1.3 MiB)
eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:3F:4F:0A
          inet addr:192.168.1.22  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:6126 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6126 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:10490665 (10.0 MiB)  TX bytes:10490665 (10.0 MiB)
sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
[root@node2 ~]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/scd0 on /mnt type iso9660 (ro)
192.168.1.170:/web/htmldoc on /var/www/html type nfs (rw,addr=192.168.1.170)
[root@node2 ~]# netstat -tnpl | grep :80
tcp        0      0 :::80