1. Lab environment:
Node1: 192.168.1.17 (RHEL 5.8, 32-bit, web server)
Node2: 192.168.1.18 (RHEL 5.8, 32-bit, web server)
NFS: 192.168.1.19 (RHEL 5.8, 32-bit, NFS server)
VIP: 192.168.1.20 (webip)
2. Preparation
<1> Configure the hostnames
Node names are resolved through /etc/hosts, and each node name must match the output of uname -n on that node.
Node1:
# hostname node1.ikki.com
# vim /etc/sysconfig/network
HOSTNAME=node1.ikki.com
Node2:
# hostname node2.ikki.com
# vim /etc/sysconfig/network
HOSTNAME=node2.ikki.com
<2> Configure key-based SSH between the nodes
Node1:
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
Node2:
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
<3> Configure the nodes to reach each other by hostname
Node1 & Node2:
# vim /etc/hosts
192.168.1.17 node1.ikki.com node1
192.168.1.18 node2.ikki.com node2
<4> Configure time synchronization on the nodes
Node1 & Node2:
# crontab -e
*/5 * * * * /sbin/ntpdate 202.120.2.101 &> /dev/null
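The hostname and /etc/hosts requirements above are easy to get wrong when commands are copied between nodes, so a quick check can help. The following is only a sketch for a POSIX shell; the helper names check_hostname and hosts_lookup are hypothetical and are not part of corosync, pacemaker, or RHEL.

```shell
# Hypothetical helpers for illustration only.

# check_hostname FILE NAME
# Succeeds when FILE (in /etc/sysconfig/network format) sets HOSTNAME to NAME.
check_hostname() {
    file=$1
    expected=$2
    # Use the last HOSTNAME= assignment, since a later line overrides an earlier one.
    configured=$(sed -n 's/^HOSTNAME=//p' "$file" | tail -n 1)
    [ "$configured" = "$expected" ]
}

# hosts_lookup FILE NAME
# Prints the address that FILE (in /etc/hosts format) maps NAME to; fails if none.
hosts_lookup() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                 # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # stop at a trailing comment
                if ($i == name) { print $1; found = 1; exit }
            }
        }
        END { exit !found }
    ' "$1"
}

# Typical use on a node:
#   check_hostname /etc/sysconfig/network "$(uname -n)" || echo "hostname mismatch"
#   hosts_lookup /etc/hosts node2.ikki.com
```

Both helpers take the file as an argument, so they can be tried against a copy of the file before the real one is edited.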
3. Install corosync and pacemaker (on every node)
<1> Dependency RPM packages:
libibverbs, librdmacm, lm_sensors, libtool-ltdl, openhpi-libs, openib, perl-TimeDate, libnes
<2> Download the packages to a dedicated local directory (for example /root/cluster):
# cd /root/cluster
# ls
cluster-glue-1.0.6-1.6.el5.i386.rpm
cluster-glue-libs-1.0.6-1.6.el5.i386.rpm
corosync-1.2.7-1.1.el5.i386.rpm
corosynclib-1.2.7-1.1.el5.i386.rpm
heartbeat-3.0.3-2.3.el5.i386.rpm
heartbeat-libs-3.0.3-2.3.el5.i386.rpm
libesmtp-1.0.4-5.el5.i386.rpm
pacemaker-1.1.5-1.1.el5.i386.rpm
pacemaker-libs-1.1.5-1.1.el5.i386.rpm
resource-agents-1.0.4-1.1.el5.i386.rpm
<3> Install the local packages together with their dependencies:
# cd /root/cluster
# yum -y --nogpgcheck localinstall *.rpm
4. Configure corosync
Node1:
# cd /etc/corosync
# cp corosync.conf.example corosync.conf
# vim corosync.conf
Add the following:
service {
    ver: 0
    name: pacemaker
    # use_mgmtd: yes
}
aisexec {
    user: root
    group: root
}
Change the following settings:
bindnetaddr: 192.168.1.0    # network address of the network the NIC is on
secauth: on                 # enable authentication
to_syslog: no               # disable syslog logging (a separate logfile is used instead)
threads: 2                  # number of threads
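bindnetaddr takes the network address, not the node's own IP, which is a common source of mistakes. As a rough illustration, the network address is the bitwise AND of the interface address and its netmask. The function name network_address below is hypothetical, and the netmask 255.255.255.0 is the one this lab network uses.

```shell
# network_address IP NETMASK
# Prints the network address (IP AND NETMASK, octet by octet).
# Hypothetical helper for illustration; expects dotted-quad IPv4 arguments.
network_address() {
    ip=$1
    mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both dotted quads into eight positional parameters.
    set -- $ip $mask
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

network_address 192.168.1.17 255.255.255.0   # prints 192.168.1.0
```

Node1 (192.168.1.17) and Node2 (192.168.1.18) yield the same result, which is why the same corosync.conf can be copied to both nodes unchanged.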
Generate the authentication key file used for communication between the nodes:
# corosync-keygen
Copy corosync.conf and authkey to Node2:
# scp -p corosync.conf authkey node2:/etc/corosync/
Create the directory for the corosync log file on both nodes:
# mkdir /var/log/cluster
# ssh node2 'mkdir /var/log/cluster'
5. Start the service and check it
Node1:
# /etc/init.d/corosync start
Check that the corosync engine started correctly:
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Sep 16 18:59:29 corosync [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Sep 16 18:59:29 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Sep 16 19:28:26 corosync [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:170.
Sep 16 19:54:14 corosync [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Sep 16 19:54:14 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Check that the initial membership notifications were sent:
# grep TOTEM /var/log/cluster/corosync.log
Sep 16 18:59:29 corosync [TOTEM ] Initializing transport (UDP/IP).
Sep 16 18:59:29 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Sep 16 18:59:29 corosync [TOTEM ] The network interface [192.168.1.17] is now up.
Sep 16 18:59:29 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors occurred during startup (unpack_resources errors are filtered out here; they concern the missing STONITH configuration, which is handled in step 6):
# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Check that pacemaker started correctly:
# grep pcmk_startup /var/log/cluster/corosync.log
Sep 16 18:59:29 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Sep 16 18:59:29 corosync [pcmk ] Logging: Initialized pcmk_startup
Sep 16 18:59:29 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Sep 16 18:59:29 corosync [pcmk ] info: pcmk_startup: Service: 9
Sep 16 18:59:29 corosync [pcmk ] info: pcmk_startup: Local hostname: node1.ikki.com
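The four log checks above can be bundled so they are easy to repeat on each node. This is only a sketch: the function name check_corosync_log and its messages are hypothetical, and it looks for the same strings as the grep commands in this step.

```shell
# check_corosync_log LOGFILE
# Repeats the log checks from this step and reports the first one that fails.
# Hypothetical helper for illustration only.
check_corosync_log() {
    log=$1
    grep -q "Corosync Cluster Engine" "$log" || { echo "engine start not logged"; return 1; }
    grep -q "TOTEM" "$log"                   || { echo "no TOTEM messages"; return 1; }
    grep -q "pcmk_startup" "$log"            || { echo "pacemaker startup not logged"; return 1; }
    # Same filter as the manual check: unpack_resources errors are ignored.
    if grep "ERROR:" "$log" | grep -v unpack_resources | grep -q .; then
        echo "errors found in log"
        return 1
    fi
    echo "all checks passed"
}

# Typical use:
#   check_corosync_log /var/log/cluster/corosync.log
```

The log accumulates across restarts, so a pass only shows that the expected messages appear somewhere in the file; the timestamps still need a look to confirm they belong to the latest start.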
If all of the checks above pass, start corosync on Node2. Start it remotely from Node1; do not start it directly on Node2.
# ssh node2 -- /etc/init.d/corosync start
Check the startup status of the cluster nodes:
# crm status
============
Last updated: Tue Sep 17 23:39:11 2013
Stack: openais
Current DC: node1.ikki.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ node1.ikki.com node2.ikki.com ]
Check the processes started by corosync:
# ps auxf
root 13200 0.6 0.7 86880 3952 ? Ssl 12:29 4:06 corosync
root 13208 0.0 0.4 11724 2104 ? S 12:29 0:00 \_ /usr/lib/heartbeat/stonithd
101 13209 0.0 0.7 12872 3820 ? S 12:29 0:01 \_ /usr/lib/heartbeat/cib
root 13210 0.0 0.4 6572 2156 ? S 12:29 0:00 \_ /usr/lib/heartbeat/lrmd
101 13211 0.0 0.3 12060 2040 ? S 12:29 0:00 \_ /usr/lib/heartbeat/attrd
101 13212 0.0 0.5 8836 2900 ? S 12:29 0:00 \_ /usr/lib/heartbeat/pengine
101 13213 0.0 0.6 12280 3112 ? S 12:29 0:02 \_ /usr/lib/heartbeat/crmd
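To confirm membership without reading the whole status screen, the Online line of crm status can be extracted. A minimal sketch, assuming the output format shown in this step; the function name online_nodes is hypothetical.

```shell
# online_nodes
# Reads `crm status` output on stdin and prints the online node names, one per line.
# Hypothetical helper; it relies on the "Online: [ ... ]" line format of this crm version.
online_nodes() {
    sed -n 's/^ *Online: \[ \(.*\) \].*$/\1/p' | tr ' ' '\n'
}

# Typical use:
#   crm status | online_nodes
```

With both nodes up, the output for this cluster would be the two node names, node1.ikki.com and node2.ikki.com.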
6. Disable STONITH for the cluster
Pacemaker enables STONITH by default, but this lab environment has no STONITH device, so STONITH has to be disabled:
# crm configure property stonith-enabled=false
View the current configuration:
# crm configure show
node node1.ikki.com
node node2.ikki.com
property $id="cib-bootstrap-options" \
    dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
7. Add an IP address resource (webip) to the cluster
Node1:
# crm configure primitive webip ocf:heartbeat:IPaddr params ip=192.168.1.20
Check that the resource has started:
# crm status
============
Last updated: Tue Sep 17 23:48:10 2013
Stack: openais
Current DC: node1.ikki.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.ikki.com node2.ikki.com ]
webip (ocf::heartbeat:IPaddr): Started node1.ikki.com
Check that webip is actually configured on the interface:
# ifconfig
eth0:0 Link encap:Ethernet HWaddr 08:00:27:F1:60:13
    inet addr:192.168.1.20 Bcast:192.168.1.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
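The same check can be scripted, for example to see which node currently holds the VIP. This sketch assumes the older ifconfig output format shown above ("inet addr:ADDRESS"); the function name has_ip is hypothetical.

```shell
# has_ip ADDRESS
# Reads ifconfig output on stdin; succeeds if ADDRESS is configured on some interface.
# Hypothetical helper; the trailing space keeps 192.168.1.20 from matching 192.168.1.200.
has_ip() {
    grep -qF "inet addr:$1 "
}

# Typical use:
#   ifconfig | has_ip 192.168.1.20 && echo "VIP is on this node"
```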
8. Configure the cluster to ignore loss of quorum
Node2:
Stop the corosync service on Node1:
# ssh node1 -- /etc/init.d/corosync stop
Check the cluster status:
# crm status
============
Last updated: Tue Sep 17 23:49:41 2013
Stack: openais
Current DC: node2.ikki.com - partition WITHOUT quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node2.ikki.com ]
OFFLINE: [ node1.ikki.com ]
In a two-node cluster a single surviving node can never hold quorum: with Node1 offline, Node2 has only 1 of the 2 expected votes, so the webip resource is not moved to Node2. The cluster therefore has to be told to ignore the loss of quorum:
# crm configure property no-quorum-policy=ignore
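The arithmetic behind the "partition WITHOUT quorum" status can be sketched as follows, assuming quorum means strictly more than half of the expected votes. The function name has_quorum is hypothetical and only illustrates the rule; it does not query the cluster.

```shell
# has_quorum EXPECTED PRESENT
# Succeeds when PRESENT votes are a strict majority of EXPECTED votes.
# Hypothetical helper for illustration only.
has_quorum() {
    [ $(( $2 * 2 )) -gt "$1" ]
}

# Two expected votes, one node left: 1 is not more than half of 2.
if has_quorum 2 1; then echo "quorum"; else echo "no quorum"; fi   # prints "no quorum"
```

With three expected votes, two remaining nodes would still form a majority, which is why this setting is mainly a concern for two-node clusters.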
Check the cluster status again:
# crm status
============
Last updated: Tue Sep 17 23:51:27 2013
Stack: openais
Current DC: node2.ikki.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.ikki.com node2.ikki.com ]
webip (ocf::heartbeat:IPaddr): Started node2.ikki.com
Start the corosync service on Node1:
# ssh node1 -- /etc/init.d/corosync start
Set a default stickiness value for resources:
# crm configure rsc_defaults resource-stickiness=100
9. Configure an active/passive high-availability web cluster
<1> Install the httpd service on every node and provide a test page
<2> Add the web service resource (httpd) to the cluster:
# crm configure primitive httpd lsb:httpd
Check that the resource has started:
# crm status
============
Last updated: Tue Sep 17 23:54:36 2013
Stack: openais
Current DC: node2.ikki.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node1.ikki.com node2.ikki.com ]
webip (ocf::heartbeat:IPaddr): Started node1.ikki.com
httpd (lsb:httpd): Started node2.ikki.com
<3> Configure a colocation constraint. The status above shows webip and httpd running on different nodes, so httpd is bound to the node that holds webip:
# crm configure colocation httpd_with_webip INFINITY: httpd webip
<4> Configure the resource order (start order: webip, then httpd):
# crm configure order httpd_after_ip mandatory: webip httpd
<5> Configure a location constraint so that httpd prefers Node1:
# crm configure location prefer_node1 httpd rule 200: #uname eq node1.ikki.com
10. Set up the NFS server
NFS:
# mkdir -p /web/htdocs
# vim /etc/exports
/web/htdocs 192.168.1.0/24(ro)
# exportfs -rav
11. Add the NFS-backed webstore resource to the cluster and configure its constraints
Node1:
<1> Add the webstore resource
# crm configure primitive webstore ocf:heartbeat:Filesystem params device=192.168.1.19:/web/htdocs directory=/var/www/html fstype=nfs op start timeout=60 op stop timeout=60
<2> Set the colocation constraint
# crm configure colocation httpd_with_webstore inf: httpd webstore
<3> Set the order constraint
# crm configure order webstore_before_httpd mandatory: webstore httpd
<4> Adjust the order constraints (using the interactive crm shell)
# crm configure
crm(live)configure# edit
In the editor, delete the previously defined constraint: order httpd_after_ip inf: webip httpd
crm(live)configure# order webstore_after_ip inf: webip webstore
crm(live)configure# verify
crm(live)configure# commit
12. Cluster configuration overview and resource status
<1> View the cluster configuration
# crm configure show
node node1.ikki.com \
    attributes standby="off"
node node2.ikki.com
primitive httpd lsb:httpd \
    meta target-role="Started"
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.20" \
    meta target-role="Started"
primitive webstore ocf:heartbeat:Filesystem \
    params device="192.168.1.19:/web/htdocs" directory="/var/www/html" fstype="nfs" \
    op start interval="0" timeout="60" \
    op stop interval="0" timeout="60" \
    meta target-role="Started"
location prefer_node1 httpd \
    rule $id="prefer_node1-rule" 200: #uname eq node1.ikki.com
colocation httpd_with_webip inf: httpd webip
colocation httpd_with_webstore inf: httpd webstore
order webstore_after_ip inf: webip webstore
order webstore_before_httpd inf: webstore httpd
property $id="cib-bootstrap-options" \
    dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1379355508"
<2> View the resource status
# crm status
============
Last updated: Tue Sep 17 23:58:35 2013
Stack: openais
Current DC: node2.ikki.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
3 Resources configured.
============
Online: [ node1.ikki.com node2.ikki.com ]
webip (ocf::heartbeat:IPaddr): Started node1.ikki.com
httpd (lsb:httpd): Started node1.ikki.com
webstore (ocf::heartbeat:Filesystem): Started node1.ikki.com
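Since the colocation constraints are meant to keep webip, webstore, and httpd together, one way to confirm the result is to list the distinct nodes that host started resources. A minimal sketch, assuming the "NAME (TYPE): Started NODE" line format shown in the status output; the function name resource_nodes is hypothetical.

```shell
# resource_nodes
# Reads `crm status` output on stdin and prints each distinct node that hosts
# a started resource, one per line.
# Hypothetical helper for illustration only.
resource_nodes() {
    awk 'NF >= 2 && $(NF-1) == "Started" { print $NF }' | sort -u
}

# Typical use: a single line of output means all resources share one node.
#   crm status | resource_nodes
```

For the status shown in this step the only node listed would be node1.ikki.com, whereas the status in step 9 (before the constraints were added) would list both nodes.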