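I. Preparation
Download the three source packages (the -O option saves each archive under a descriptive file name):
Heartbeat 3.0.6:
# wget -O heartbeat-3.0.6.tar.bz2 http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/958e11be8686.tar.bz2
Cluster Glue 1.0.12:
# wget -O cluster-glue-1.0.12.tar.bz2 http://hg.linux-ha.org/glue/archive/0a7add1d9996.tar.bz2
Resource Agents 3.9.6:
# wget -O resource-agents-3.9.6.tar.gz https://github.com/ClusterLabs/resource-agents/archive/v3.9.6.tar.gz
Install the build dependencies, create the haclient group and hacluster user, and install httpd (used later as the test service):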
# yum install gcc gcc-c++ autoconf automake libtool glib2-devel libxml2-devel bzip2 bzip2-devel e2fsprogs-devel libxslt-devel libtool-ltdl-devel asciidoc
# groupadd haclient
# useradd -g haclient hacluster
# yum install httpd
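Several of the build errors described later come from a missing tool, so it can help to check for the tools before starting. The following is a minimal sketch, not part of the original procedure; missing_tools is a hypothetical helper name, and the tool list is derived from the yum line above (a2x is the command provided by asciidoc).

```shell
#!/bin/sh
# missing_tools NAME...
# Print each named command that is NOT found on PATH, one per line.
# (Hypothetical helper for illustration.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool names assumed from the packages installed above.
missing_tools gcc g++ autoconf automake libtool bzip2 a2x
```

An empty result means every listed tool was found; any name printed should be installed before running autogen.sh.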
II. Building Cluster Glue
# tar -jxvf cluster-glue-1.0.12.tar.bz2
# cd Reusable-Cluster-Components-glue--0a7add1d9996/
# ./autogen.sh
# ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1' ## Note: on 32-bit systems use LIBS='/lib/libuuid.so.1'
# make
# make install
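The configure line above needs a different libuuid path on 32-bit and 64-bit systems. A small sketch of selecting it from the machine architecture follows; libuuid_path is a hypothetical helper, and the /lib versus /lib64 layout is an assumption that matches the RHEL/CentOS-style system used in this article.

```shell
#!/bin/sh
# libuuid_path ARCH
# Map an architecture string (as printed by `uname -m`) to the libuuid
# path passed in LIBS=. Assumption: 64-bit libraries live in /lib64 and
# 32-bit libraries in /lib.
libuuid_path() {
    case "$1" in
        x86_64|aarch64|ppc64*|s390x) echo "/lib64/libuuid.so.1" ;;
        *)                            echo "/lib/libuuid.so.1" ;;
    esac
}

LIBUUID=$(libuuid_path "$(uname -m)")
echo "configure with: LIBS='$LIBUUID'"
```

The printed value can then be used in place of the hard-coded LIBS argument for Cluster Glue, Resource Agents, and Heartbeat alike.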
Build error 1:
Making all in libltdl
gmake[1]: Entering directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/libltdl'
gmake[1]: *** No rule to make target `all'.  Stop.
gmake[1]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/libltdl'
make: *** [all-recursive] Error 1
Fix:
# yum install libtool-ltdl-devel
Build error 2:
collect2: error: ld returned 1 exit status
gmake[2]: *** [ipctest] Error 1
gmake[2]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/lib/clplumbing'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/lib'
make: *** [all-recursive] Error 1
Fix (pass libuuid explicitly through LIBS when running configure):
# ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
Note: on a 32-bit system, change LIBS to LIBS='/lib/libuuid.so.1'.
Build error 3:
gmake[2]: a2x: Command not found
gmake[2]: *** [hb_report.8] Error 127
gmake[2]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/doc'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/doc'
make: *** [all-recursive] Error 1
Fix:
# yum install asciidoc
III. Building Resource Agents
# tar -zxvf resource-agents-3.9.6.tar.gz
# cd resource-agents-3.9.6
# ./autogen.sh
# ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
# make
# make install
IV. Building Heartbeat
# tar -jxvf heartbeat-3.0.6.tar.bz2
# cd Heartbeat-3-0-958e11be8686/
# ./bootstrap
# export CFLAGS="$CFLAGS -I/usr/local/heartbeat/include -L/usr/local/heartbeat/lib"
# ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
# make
# make install
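After installation, copy the sample configuration files, enable the service at boot, and set up the OCF and plugin paths: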
# cp doc/{ha.cf,haresources,authkeys} /usr/local/heartbeat/etc/ha.d/
# chkconfig --add heartbeat
# chkconfig heartbeat on
# chmod 600 /usr/local/heartbeat/etc/ha.d/authkeys
# mkdir -pv /usr/local/heartbeat/usr/lib/ocf/lib/heartbeat/
# cp /usr/lib/ocf/lib/heartbeat/ocf-* /usr/local/heartbeat/usr/lib/ocf/lib/heartbeat/
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/RAExec/* /usr/local/heartbeat/lib/heartbeat/plugins/RAExec/
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/* /usr/local/heartbeat/lib/heartbeat/plugins/
Build error 1:
checking heartbeat/glue_config.h usability... no
checking heartbeat/glue_config.h presence... no
checking for heartbeat/glue_config.h... no
configure: error: in `/root/Heartbeat-3-0-958e11be8686':
configure: error: Core development headers were not found
See `config.log' for more details
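Fix (point the compiler at the Cluster Glue headers and libraries installed under /usr/local/heartbeat, then re-run configure):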
# export CFLAGS="$CFLAGS -I/usr/local/heartbeat/include -L/usr/local/heartbeat/lib"
Build error 2:
In file included from ../include/lha_internal.h:41:0,
from uuid_parse.c:25:
/usr/local/heartbeat/include/heartbeat/glue_config.h:105:0: error: "HA_HBCONF_DIR" redefined [-Werror]
#define HA_HBCONF_DIR "/usr/local/heartbeat/etc/ha.d/"
^
In file included from ../include/lha_internal.h:38:0,
from uuid_parse.c:25:
../include/config.h:390:0: note: this is the location of the previous definition
#define HA_HBCONF_DIR "/usr/local/heartbeat/etc/ha.d"
^
uuid_parse.c:36:26: fatal error: replace_uuid.h: No such file or directory
#include <replace_uuid.h>
^
cc1: all warnings being treated as errors
compilation terminated.
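gmake[1]: *** [uuid_parse.lo] Error 1
gmake[1]: Leaving directory `/root/Heartbeat-3-0-958e11be8686/replace'
make: *** [all-recursive] Error 1
Fix (re-run configure with the options below; --enable-fatal-warnings=no keeps warnings such as the HA_HBCONF_DIR redefinition from being treated as errors):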
# ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
V. Configuring Heartbeat
Heartbeat configuration mainly involves three files: ha.cf, haresources, and authkeys. ha.cf is the main configuration file, haresources lists the services that Heartbeat manages, and authkeys specifies the authentication method Heartbeat uses.
1. Configuring ha.cf (main configuration file)
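# cat /usr/local/heartbeat/etc/ha.d/ha.cf | grep '^[^#]'
debugfile /var/log/ha-debug ## file for heartbeat debug messages
logfile /var/log/ha-log ## file for heartbeat log messages
logfacility local0 ## syslog facility used for heartbeat logging
keepalive 2 ## interval between heartbeats: 2 seconds
deadtime 30 ## if the standby node receives no heartbeat from the primary node for 30 seconds, it takes over the primary's resources
warntime 10 ## if the standby node receives no heartbeat for 10 seconds, a warning is written to the log but no failover happens
initdead 120 ## grace period after boot or reboot during which missing heartbeats are ignored; at least twice deadtime
udpport 694 ## UDP port used for broadcast/unicast communication
bcast eno16777736 # Linux ## send heartbeats by broadcast on interface eno16777736
#mcast eth0 225.0.0.1 694 1 0 ## UDP multicast heartbeat on eth0, typically used with more than one standby node. bcast, ucast, and mcast (broadcast, unicast, multicast) are alternative heartbeat transports; choose one
#ucast eno16777736 192.168.10.133 ## UDP unicast heartbeat on eno16777736; the address is the peer node's IP
auto_failback on ## whether resources move back automatically once the primary node recovers
#watchdog /dev/watchdog ## optional: lets Heartbeat monitor the system's running state
node node1 ## primary node name; must match the output of uname -n
node node2 ## standby node name
ping 192.168.10.1 ## ping node (the gateway), used only to test network connectivity
respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail ## optional: process started and stopped together with heartbeat
#apiauth ipfail gid=haclient uid=hacluster ## user and group allowed to run ipfail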
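Notes:
① watchdog /dev/watchdog: optional; lets Heartbeat monitor the system's running state. This feature needs the "softdog" kernel module, which provides the actual device file. If the kernel does not include the module, enable it and rebuild the kernel. Then load the module with "insmod softdog", run "grep misc /proc/devices" and "cat /proc/misc |grep watchdog" to look up the device's major and minor numbers (10 and 130 in the next command), and finally create the device file with "mknod /dev/watchdog c 10 130".
② respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail: optional; names a process that is started and stopped together with heartbeat. Such processes are usually plugins integrated with heartbeat and are restarted automatically if they fail. The ipfail process detects and handles network failures and relies on the ping directive, which names the ping node used to test connectivity. hacluster is the user that runs ipfail.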
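The timing directives in ha.cf depend on each other, so a quick consistency check can catch mistakes before heartbeat is started. The sketch below is illustrative only: check_hacf_timing is a hypothetical helper, the initdead rule comes from the comment in the listing above, and the other two rules (keepalive below warntime, warntime below deadtime) are assumptions about a sensible ordering. It also assumes plain integer values in seconds.

```shell
#!/bin/sh
# check_hacf_timing FILE
# Read keepalive/warntime/deadtime/initdead from an ha.cf-style file and
# print one line per violated rule. Prints nothing if all rules hold.
# Assumption: values are plain integers in seconds (no "ms" suffix).
check_hacf_timing() {
    awk '
        /^[ \t]*#/ { next }    # skip comment lines
        $1 == "keepalive" || $1 == "warntime" ||
        $1 == "deadtime"  || $1 == "initdead" { v[$1] = $2 + 0 }
        END {
            if (v["initdead"] < 2 * v["deadtime"])
                print "initdead should be at least 2 x deadtime"
            if (v["warntime"] >= v["deadtime"])
                print "warntime should be smaller than deadtime"
            if (v["keepalive"] >= v["warntime"])
                print "keepalive should be smaller than warntime"
        }
    ' "$1"
}
```

It could be run as check_hacf_timing /usr/local/heartbeat/etc/ha.d/ha.cf; with the values shown above (2, 10, 30, 120) it prints nothing.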
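2. Configuring haresources (resource file)
The haresources file specifies the primary node of the two-node system, the cluster IP address, netmask, broadcast address, and the services started as cluster resources. Each line can contain one or more resource script names; resources are separated by spaces and a script's arguments by two colons (::). The haresources file must be exactly the same on the primary and standby nodes.
General format:
node-name network <resource-group>
node-name is the host name of the primary node and must match a node name given in ha.cf. network sets the cluster IP address, netmask, and network device. resource-group lists the services managed by Heartbeat, that is, the services Heartbeat starts and stops.
Note: the IP address given here is the address on which the cluster serves clients.
To have Heartbeat manage a service, wrap the service in a script that accepts start/stop and place it in /etc/init.d/ or /etc/ha.d/resource.d/ (with the --prefix used in this article, /usr/local/heartbeat/etc/ha.d/resource.d/). Heartbeat looks up the script by name in these directories and runs it to start or stop the service.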
# cat /usr/local/heartbeat/etc/ha.d/haresources |grep -v "#"
node1 IPaddr::192.168.10.222/24/eno16777736
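node1 is the primary node of the HA cluster, and IPaddr is a script shipped with heartbeat. Heartbeat first runs /usr/local/heartbeat/etc/ha.d/resource.d/IPaddr 192.168.10.222/24/eno16777736 start, which brings up the address 192.168.10.222 with netmask 255.255.255.0 on interface eno16777736. This is the address on which the cluster provides its service.
Note: a detailed Chinese-language explanation of haresources is available here:
http://blog.chinaunix.net/uid-20788470-id-1841644.html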
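To make the "::" convention concrete, the sketch below turns one haresources resource entry into the script-plus-arguments form that Heartbeat logs when it runs a resource. It only illustrates the naming convention described above and is not Heartbeat's own parser; resource_to_command is a hypothetical helper name.

```shell
#!/bin/sh
# resource_to_command ENTRY ACTION
# Turn one haresources resource entry such as
#   IPaddr::192.168.10.222/24/eno16777736
# into "script arg... action". The "::" separator splits the script name
# from its arguments and the arguments from each other.
resource_to_command() {
    entry=$1
    action=$2
    printf '%s %s\n' "$(printf '%s' "$entry" | sed 's/::/ /g')" "$action"
}

resource_to_command "IPaddr::192.168.10.222/24/eno16777736" start
```

A resource without arguments, such as a plain init script name, passes through unchanged and only gains the start or stop action.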
3. Configuring authkeys (heartbeat authentication key file)
# grep -v "#" /usr/local/heartbeat/etc/ha.d/authkeys
auth 2
2 sha1 HI!
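Note: the number after auth is a key index and can be chosen freely, but the second line must begin with the same index, followed by the authentication method (crc, md5, or sha1 are supported) and finally a key of your own choosing.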
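The key "HI!" above is only a sample value. A minimal sketch of generating an authkeys file with a random key follows; write_authkeys is a hypothetical helper, and the choice of key index 1, sha1, and a 32-character hexadecimal key are assumptions made for illustration.

```shell
#!/bin/sh
# write_authkeys FILE
# Write a two-line authkeys file using key index 1 and sha1, with a
# random key of 32 hex characters read from /dev/urandom, and restrict
# the file to mode 600 (matching the chmod step used earlier).
write_authkeys() {
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    (umask 077 && printf 'auth 1\n1 sha1 %s\n' "$key" > "$1")
    chmod 600 "$1"
}
```

The same file must then be copied to the other node, since both nodes need an identical key.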
VI. Setting up SSH trust between the two nodes (optional) and copying files to the standby node
HA-01 (192.168.10.132):
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
ssh-copy-id -i .ssh/id_rsa.pub [email protected]
HA-02 (192.168.10.133):
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
ssh-copy-id -i .ssh/id_rsa.pub [email protected]
Copy the configuration files to the standby node:
# scp /usr/local/heartbeat/etc/ha.d/* [email protected]:/usr/local/heartbeat/etc/ha.d/
VII. Testing
# setenforce 0 ## put SELinux into permissive mode
# systemctl stop firewalld ## stop the firewall (needed on both nodes)
# systemctl start httpd
# /etc/init.d/heartbeat start ## start heartbeat
Check the log:
# tail /var/log/ha-log
Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Illegal directive [ucast] in /usr/local/heartbeat/etc/ha.d//ha.cf
Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Illegal directive [ping] in /usr/local/heartbeat/etc/ha.d//ha.cf
Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Client child command [/usr/lib/heartbeat/ipfail] is not executable
Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Heartbeat not started: configuration error.
Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Configuration error, heartbeat not started.
Fixes:
Correct the ipfail path in ha.cf:
respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail
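Create symlinks for the plugins (the "Illegal directive" errors typically mean that Heartbeat could not load its communication plugins from the lib directory):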
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/RAExec/* /usr/local/heartbeat/lib/heartbeat/plugins/RAExec/
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/* /usr/local/heartbeat/lib/heartbeat/plugins/
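The ipfail error in the log above comes from a respawn path that does not point to an executable file. A pre-start check along these lines could catch it earlier; this is a sketch only, and check_respawn is a hypothetical helper that assumes the "respawn USER COMMAND" layout shown in ha.cf.

```shell
#!/bin/sh
# check_respawn FILE
# For every "respawn USER COMMAND" line in an ha.cf-style file, report
# the COMMAND if it is not an executable file. Commented-out lines are
# ignored because their first field is not exactly "respawn".
check_respawn() {
    awk '$1 == "respawn" { print $3 }' "$1" |
    while read -r cmd; do
        [ -x "$cmd" ] || printf 'not executable: %s\n' "$cmd"
    done
}
```

Running check_respawn /usr/local/heartbeat/etc/ha.d/ha.cf should print nothing once the ipfail path has been corrected.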
Check the log again:
# tail /var/log/ha-log
Oct 26 13:11:46 node1 heartbeat: [9744]: info: remote resource transition completed.
Oct 26 13:11:46 node1 heartbeat: [9744]: info: node1 wants to go standby [foreign]
Oct 26 13:11:46 node1 heartbeat: [9744]: info: standby: node2 can take our foreign resources
Oct 26 13:11:46 node1 heartbeat: [11892]: info: give up foreign HA resources (standby).
Oct 26 13:11:46 node1 heartbeat: [11892]: info: foreign HA resource release completed (standby).
Oct 26 13:11:46 node1 heartbeat: [9744]: info: Local standby process completed [foreign].
Oct 26 13:11:47 node1 heartbeat: [9744]: WARN: 1 lost packet(s) for [node2] [11:13]
Oct 26 13:11:47 node1 heartbeat: [9744]: info: remote resource transition completed.
Oct 26 13:11:47 node1 heartbeat: [9744]: info: No pkts missing from node2!
Oct 26 13:11:47 node1 heartbeat: [9744]: info: Other node completed standby takeover of foreign resources.
Fix:
# vi /usr/local/heartbeat/etc/ha.d/haresources
node1 IPaddr::192.168.10.222/24/eno16777736
Note: the resource entry in haresources must include the IPaddr:: prefix.
Problem:
# tail /var/log/ha-log
Oct 26 17:01:55 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (425 messages in queue)
Oct 26 17:01:56 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (426 messages in queue)
Oct 26 17:01:57 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (427 messages in queue)
Oct 26 17:01:57 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (428 messages in queue)
Oct 26 17:01:58 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (429 messages in queue)
Oct 26 17:01:59 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (430 messages in queue)
Oct 26 17:01:59 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (431 messages in queue)
Oct 26 17:02:00 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (432 messages in queue)
Oct 26 17:02:01 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (433 messages in queue)
Oct 26 17:02:01 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (434 messages in queue)
Fix: the firewall on node2 was still running; stopping it with systemctl stop firewalld resolved the problem.
Problem:
# tail /var/log/ha-log
IPaddr(IPaddr_192.168.10.222)[6854]:2015/10/26_17:20:58 ERROR: Setup problem: couldn't find command: ifconfig
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.10.222)[6828]:2015/10/26_17:20:58 ERROR: Program is not installed
Fix: run yum install net-tools to make the ifconfig command available.
Restart heartbeat and check the log again:
# systemctl restart heartbeat
# tail /var/log/ha-log
Oct 26 19:25:36 node1 heartbeat: [1783]: info: Configuration validated. Starting heartbeat 3.0.6
Oct 26 19:25:37 node1 heartbeat: [1783]: info: heartbeat: version 3.0.6
Oct 26 19:25:37 node1 heartbeat: [1783]: info: Heartbeat generation: 1445827146
Oct 26 19:25:37 node1 heartbeat: [1783]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eno16777736
Oct 26 19:25:37 node1 heartbeat: [1783]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eno16777736 - Status: 1
Oct 26 19:25:37 node1 heartbeat: [1783]: info: glib: ping heartbeat started.
Oct 26 19:25:37 node1 heartbeat: [1783]: info: Local status now set to: 'up'
Oct 26 19:25:37 node1 heartbeat: [1783]: info: Link 192.168.10.1:192.168.10.1 up.
Oct 26 19:25:37 node1 heartbeat: [1783]: info: Status update for node 192.168.10.1: status ping
Oct 26 19:25:37 node1 heartbeat: [1783]: info: Link node1:eno16777736 up.
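Use ifconfig to confirm that the virtual IP 192.168.10.222 is bound on node1.
Open http://localhost in a browser to confirm that the web service responds.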
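The address check can also be scripted rather than read off the ifconfig output by eye. The sketch below is one possible way to do it, using the output format of ip -o -4 addr show instead of ifconfig; has_ipv4 is a hypothetical helper that reads that output on standard input.

```shell
#!/bin/sh
# has_ipv4 ADDRESS
# Read `ip -o -4 addr show` output on stdin and succeed (exit 0) if
# ADDRESS is assigned to some interface. Dots are escaped so they match
# literally; the "inet " prefix and trailing "/" prevent partial matches.
has_ipv4() {
    pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
    grep -Eq "inet ${pattern}/"
}
```

For example, ip -o -4 addr show | has_ipv4 192.168.10.222 && echo "virtual IP is bound here" can be run on each node to see which one currently holds the address.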
Stop heartbeat on node1 and check whether the virtual IP fails over to node2.
node1:
# systemctl stop heartbeat
node2:
# tail /var/log/ha-log
mach_down(default)[1937]:2015/10/26_20:03:58 info: Taking over resource group IPaddr::192.168.10.222/24/eno16777736
ResourceManager(default)[1964]:2015/10/26_20:03:58 info: Acquiring resource group: node1 IPaddr::192.168.10.222/24/eno16777736
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.10.222)[1992]:2015/10/26_20:03:58 INFO: Resource is stopped
ResourceManager(default)[1964]:2015/10/26_20:03:58 info: Running /usr/local/heartbeat/etc/ha.d//resource.d/IPaddr 192.168.10.222/24/eno16777736 start
IPaddr(IPaddr_192.168.10.222)[2083]:2015/10/26_20:03:58 INFO: Using calculated netmask for 192.168.10.222: 255.255.255.0
IPaddr(IPaddr_192.168.10.222)[2083]:2015/10/26_20:03:58 INFO: eval ifconfig eno16777736:0 192.168.10.222 netmask 255.255.255.0 broadcast 192.168.10.255
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.10.222)[2057]:2015/10/26_20:03:58 INFO: Success
mach_down(default)[1937]:2015/10/26_20:03:58 info: /usr/local/heartbeat/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[1937]:2015/10/26_20:03:58 info: mach_down takeover complete for node node1.
Oct 26 20:03:58 node2 heartbeat: [1711]: info: mach_down takeover complete.
mach_down(default)[1937]:2015/10/26_20:03:58 info: mach_down takeover complete for node node1.
Oct 26 20:03:58 node2 heartbeat: [1711]: info: mach_down takeover complete.
Oct 26 20:04:29 node2 heartbeat: [1711]: WARN: node node1: is dead
Oct 26 20:04:29 node2 heartbeat: [1711]: info: Dead node node1 gave up resources.
Oct 26 20:04:29 node2 heartbeat: [1711]: info: Link node1:eno16777736 dead.
Oct 26 20:04:29 node2 ipfail: [1737]: info: Status update: Node node1 now has status dead
Oct 26 20:04:29 node2 ipfail: [1737]: info: NS: We are still alive!
Oct 26 20:04:29 node2 ipfail: [1737]: info: Link Status update: Link node1/eno16777736 now has status dead
Oct 26 20:04:30 node2 ipfail: [1737]: info: Asking other side for ping node count.
Oct 26 20:04:30 node2 ipfail: [1737]: info: Checking remote count of ping nodes.
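When repeating the failover test, the takeover message in the log above can be checked for automatically. This is a minimal sketch keyed to the exact message text shown in this log, which is an assumption about what other Heartbeat versions print; takeover_complete is a hypothetical helper.

```shell
#!/bin/sh
# takeover_complete LOGFILE NODE
# Succeed (exit 0) if LOGFILE contains the mach_down message reporting
# that the resources of NODE have been taken over.
takeover_complete() {
    grep -Fq "mach_down takeover complete for node $2." "$1"
}
```

On node2 this could be used as takeover_complete /var/log/ha-log node1 && echo "node2 has taken over".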
Use ifconfig to check whether the IP has moved to node2:
The IP has moved to node2; open http://localhost in a browser to confirm.
All done!
Appendix: the Heartbeat project site:
http://www.linux-ha.org/wiki/Main_Page