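Building a highly available web cluster with Corosync + Pacemaker
Corosync lets you define how cluster messages are transmitted, including the transport and protocol, through a simple configuration file. It is relatively new software, released in 2008, but it is not entirely new: the OpenAIS project, which dates back to 2002, grew too large and was split into two subprojects, and the one that implements HA heartbeat messaging is Corosync. About 60% of its code comes from OpenAIS. Corosync can provide complete HA functionality on its own, but more numerous and more complex features still require OpenAIS. Corosync is the direction of future development, and new projects generally adopt it. For management, hb_gui provides good graphical HA administration; another graphical option is the luci + ricci suite from RHCS.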
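Node 1: IP address 172.16.23.11, hostname node1.wl.com, primary server
Node 2: IP address 172.16.23.12, hostname node2.wl.com, standby server
Add the following entries to /etc/hosts on both nodes:
172.16.23.11 node1.wl.com node1 //alias
172.16.23.12 node2.wl.com node2 //alias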
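The host entries above can be added in a repeatable way. The following is a minimal sketch, not the author's method: the `add_host` function and the `HOSTS_FILE` variable are hypothetical names introduced here, and the script writes to a scratch file by default so it can be tried safely before pointing it at /etc/hosts.

```shell
#!/bin/sh
# HOSTS_FILE is an assumption for safe experimentation; set it to /etc/hosts on a real node.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

# add_host <ip> <fqdn> <alias>: append the entry only if the FQDN is not already listed.
add_host() {
    ip="$1"; fqdn="$2"; alias="$3"
    # -F treats the name as a fixed string (dots are literal), -w matches it as a whole word.
    if ! grep -qwF "$fqdn" "$HOSTS_FILE" 2>/dev/null; then
        printf '%s %s %s\n' "$ip" "$fqdn" "$alias" >> "$HOSTS_FILE"
    fi
}

add_host 172.16.23.11 node1.wl.com node1
add_host 172.16.23.12 node2.wl.com node2
add_host 172.16.23.11 node1.wl.com node1   # repeated call leaves the file unchanged

cat "$HOSTS_FILE"
```

Running the same script on both nodes keeps the two hosts files identical, which matters because the cluster refers to the nodes by name.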
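date 0416185012.33 //set the current time; format: month, day, hour, minute, year, then .seconds (MMDDhhmmYY.ss)
hwclock -w //write the system time to the hardware clock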
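The argument to `date` above packs the time as MMDDhhmmYY.ss, so 0416185012.33 means April 16, 18:50:33, 2012. As a small sketch, the same string can be produced from the current clock with a format specifier, which avoids typing it by hand. Pushing it to the other node over ssh is shown only as a comment, on the assumption that both nodes should agree on the time.

```shell
#!/bin/sh
# Build the MMDDhhmmYY.ss string that `date` accepts when setting the clock.
stamp=$(date +%m%d%H%M%y.%S)
echo "$stamp"

# Assumed usage (not run here): copy this node's time to node2, then save it to the hardware clock.
#   ssh node2 "date $stamp && hwclock -w"
```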
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
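# yum -y --nogpgcheck localinstall *.rpm //local yum install, which resolves dependencies automatically
Install the httpd service: yum install httpd
# vim /corosync.conf //the configuration file contents are as follows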
totem {
mcastaddr: 226.194.1.23 //multicast mode; any address in the 224-239 range may be used
mcastport: 5405
}
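logging { //logging subsystem settings
to_logfile: yes
to_syslog: no //there are two log destinations; this one is turned off so that log entries are easier to find
logfile: /var/log/cluster/corosync.log
debug: off //can be switched on temporarily for troubleshooting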
timestamp: on
logger_subsys {
}
group: root
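The listing above shows only part of the file. For orientation, here is a sketch of what a complete corosync 1.x configuration of this kind typically looks like, written to a scratch file through a heredoc. Several values are assumptions rather than the author's settings: `bindnetaddr` assumes the nodes sit on a 172.16.0.0/16 network, `secauth: on` is inferred from the authkey file copied in the next step, and the `service` block is the usual way to load Pacemaker as a corosync plugin. Comments are kept on their own lines starting with `#`.

```shell
#!/bin/sh
# Write a sample corosync.conf to a scratch file (copy to /etc/corosync/ after review).
conf=$(mktemp)
cat > "$conf" <<'EOF'
compatibility: whitetank
totem {
    version: 2
    secauth: on
    threads: 0
    interface {
        ringnumber: 0
        # Assumption: network address of the cluster interface, not a host address.
        bindnetaddr: 172.16.0.0
        mcastaddr: 226.194.1.23
        mcastport: 5405
    }
}
logging {
    to_logfile: yes
    to_syslog: no
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
}
# Load Pacemaker as a corosync plugin.
service {
    ver: 0
    name: pacemaker
}
aisexec {
    user: root
    group: root
}
EOF

# Cheap sanity check: every opening brace should have a closing one.
opens=$(grep -c '{' "$conf")
closes=$(grep -c '}' "$conf")
[ "$opens" -eq "$closes" ] && echo "braces balanced: $opens blocks"
```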
Create the log directory: mkdir /var/log/cluster
# scp -p corosync.conf authkey node2:/etc/corosync/ //copy the files to the other host
# /etc/init.d/corosync start
service corosync start 5560
Check whether the corosync engine started normally (log file: /var/log/cluster/corosync.log):
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Successfully read main configuration file
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Successfully read main configuration file
Check whether the initial membership notifications were sent normally:
# grep TOTEM /var/log/cluster/corosync.log
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transport (UDP/IP).
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] The network interface [192.168.0.5] is now up.
Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] A processor joined or left the membership and a new membership was
Check whether any errors occurred during startup:
# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Check whether pacemaker started normally:
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: CRM: Initialized
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] Logging: Initialized pcmk_startup
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Service: 9
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Local hostname: node1.a.org
If all of the commands above ran without problems, start corosync on node2 with the following command:
# ssh node2 '/etc/init.d/corosync' start
chkconfig httpd off
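Note: start node2 from node1 using the command above; do not start it directly on node2.
Use the following command to check the startup status of the cluster nodes: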
# crm status
Last updated: Tue Jun 14 19:07:06 2011
Stack: openais
Current DC: node1.a.org - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
0 Resources configured.
Online: [ node1.a.org node2.a.org ]
The output above shows that both nodes have started normally and that the cluster is in a normal working state.
# crm_verify -L
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
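These errors appear because STONITH is enabled by default but no STONITH resource has been defined. For now we can disable STONITH with the following command: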
# crm configure property stonith-enabled=false
Use the following command to view the current configuration:
# crm configure show
node node1.a.org
node node2.a.org
property $id="cib-bootstrap-options" \
cluster-infrastructure="openais" \
This shows that STONITH has been disabled.
List the resource agent classes that the cluster supports:
# crm ra classes
ocf / heartbeat pacemaker
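Note: the cluster (Pacemaker running on corosync) supports heartbeat, LSB, and OCF resource agents; LSB and OCF are the two classes most commonly used today. The stonith class is used specifically for configuring STONITH devices.
To list the resource agents in a given class:
# crm ra list lsb | ocf heartbeat | ocf pacemaker | stonith //usage: crm ra list <class>
Example: view the help for a resource agent: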
# crm ra info ocf:heartbeat:IPaddr
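# crm configure primitive WebIP ocf:heartbeat:IPaddr params ip=172.16.23.1 //the IP address used to reach the website
primitive: a basic (primitive) resource type
The output of the following command shows that this resource has been started on node1.a.org: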
# crm status
You can also run ifconfig on node1 and see that this address is active on an alias of eth0:
# ifconfig
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Next, from node2, stop the corosync service on node1 with the following command:
# ssh node1 "/etc/init.d/corosync" stop
Check the cluster status:
# crm status
Last updated: Tue Jun 14 19:37:23 2011
Stack: openais
Current DC: node2.a.org - partition WITHOUT quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
Online: [ node2.a.org ]
OFFLINE: [ node1.a.org ]
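The output above shows that node1.a.org is offline, yet the WebIP resource did not start on node2.a.org. This is because the cluster is now in the "WITHOUT quorum" state: quorum has been lost, so the cluster no longer meets the conditions for running resources. For a cluster with only two nodes this behavior is unreasonable, so we can tell the cluster to ignore the loss of quorum with the following command: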
# crm configure property no-quorum-policy=ignore
After a moment, the cluster starts the resource on node2, the node that is still running, as shown below:
# crm status
Last updated: Tue Jun 14 19:43:42 2011
Stack: openais
Current DC: node2.a.org - partition WITHOUT quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
Online: [ node2.a.org ]
OFFLINE: [ node1.a.org ]
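The "WITHOUT quorum" state in the listings above follows from majority arithmetic: a partition has quorum only when it holds more than half of the expected votes. With 2 expected votes the threshold is 2, so a single surviving node can never have quorum, which is why no-quorum-policy=ignore is needed in a two-node cluster. A small sketch of the calculation, with hypothetical function names:

```shell
#!/bin/sh
# quorum_threshold <expected_votes>: smallest vote count that is more than half.
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum <live_votes> <expected_votes>: succeeds when the live votes reach the threshold.
has_quorum() {
    [ "$1" -ge "$(quorum_threshold "$2")" ]
}

# Two-node cluster with one node down, as in the status output above.
if has_quorum 1 2; then
    echo "partition with quorum"
else
    echo "partition WITHOUT quorum"
fi
```

With three nodes the threshold is still 2, so a three-node cluster keeps quorum after losing one node.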
With the verification complete, start node1.a.org normally again:
# ssh node1 -- /etc/init.d/corosync start
# crm configure rsc_defaults resource-stickiness=100
# chkconfig httpd off //httpd must not start automatically at boot; the cluster will manage it
Here we use the lsb class:
Create the WebSite resource:
# crm configure primitive WebSite lsb:httpd
View the definition generated in the configuration:
primitive WebIP ocf:heartbeat:IPaddr \
primitive WebSite lsb:httpd
property $id="cib-bootstrap-options" \
cluster-infrastructure="openais" \
Check the resource status:
# crm status
Online: [ node1.a.org node2.a.org ]
WebSite (lsb:httpd): Started node2.a.org
Therefore, the problem that WebIP and WebSite may end up running on different nodes can be solved with the following command:
# crm configure colocation website-with-ip INFINITY: WebSite WebIP
Next, we also need to make sure that WebIP is started before WebSite on a node, which can be done with the following command:
# crm configure order httpd-after-ip mandatory: WebIP WebSite
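To confirm that the colocation constraint took effect, the resource lines of `crm status` can be compared. The sketch below parses status text from standard input; `resource_node` is a hypothetical helper, and the sample text is illustrative only (the WebSite line matches the one shown earlier, while the WebIP line is an assumed example of a split placement before the constraint is applied).

```shell
#!/bin/sh
# resource_node <resource>: read `crm status` text on stdin and print the node
# on which the named resource is reported as Started.
resource_node() {
    awk -v r="$1" '$1 == r && $3 == "Started" { print $4 }'
}

# Illustrative sample; on a live cluster use: status=$(crm status)
status='WebIP (ocf::heartbeat:IPaddr): Started node1.a.org
WebSite (lsb:httpd): Started node2.a.org'

ip_node=$(printf '%s\n' "$status" | resource_node WebIP)
web_node=$(printf '%s\n' "$status" | resource_node WebSite)

if [ "$ip_node" = "$web_node" ]; then
    echo "WebIP and WebSite are together on $ip_node"
else
    echo "split: WebIP on $ip_node, WebSite on $web_node"
fi
```

After the colocation and order constraints are in place, both resources should report the same node.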