Chapter 8. Conversion to Active/Active
Contents
8.1. Requirements
8.2. Installing a Cluster Filesystem - GFS2
8.3. Integrating Pacemaker-GFS2
8.3.1. Adding the DLM Service
8.3.2. Adding the GFS2 Service
8.4. Creating a GFS2 Filesystem
8.4.1. Preparation
8.4.2. Creating and Populating a GFS2 Partition
8.5. Reconfiguring the Cluster for GFS2
8.6. Reconfiguring Pacemaker for Active/Active
8.6.1. Testing Recovery
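8.1. Requirements
The primary requirement for an Active/Active cluster is that the data is available, and kept in sync, on both machines at the same time. Pacemaker does not care how this is achieved. You could use a SAN, but because DRBD supports multiple Primaries (dual-primary mode), we can use DRBD instead.
The only catch is that we need a cluster-aware filesystem, and ext4, which we used earlier, is not one. Either OCFS2 or GFS2 would work. On Fedora 13 we will use GFS2.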
8.2. Installing a Cluster Filesystem - GFS2
First, install the GFS2 packages on each node.
[root@pcmk-1 ~]# yum install -y gfs2-utils gfs-pcmk
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package gfs-pcmk.x86_64 0:3.0.5-2.fc12 set to be updated
--> Processing Dependency: libSaCkpt.so.3(OPENAIS_CKPT_B.01.01)(64bit) for package: gfs-
pcmk-3.0.5-2.fc12.x86_64
--> Processing Dependency: dlm-pcmk for package: gfs-pcmk-3.0.5-2.fc12.x86_64
--> Processing Dependency: libccs.so.3()(64bit) for package: gfs-pcmk-3.0.5-2.fc12.x86_64
--> Processing Dependency: libdlmcontrol.so.3()(64bit) for package: gfs-pcmk-3.0.5-2.fc12.x86_64
--> Processing Dependency: liblogthread.so.3()(64bit) for package: gfs-pcmk-3.0.5-2.fc12.x86_64
--> Processing Dependency: libSaCkpt.so.3()(64bit) for package: gfs-pcmk-3.0.5-2.fc12.x86_64
---> Package gfs2-utils.x86_64 0:3.0.5-2.fc12 set to be updated
--> Running transaction check
---> Package clusterlib.x86_64 0:3.0.5-2.fc12 set to be updated
---> Package dlm-pcmk.x86_64 0:3.0.5-2.fc12 set to be updated
---> Package openaislib.x86_64 0:1.1.0-1.fc12 set to be updated
--> Finished Dependency Resolution
Dependencies Resolved
===========================================================================================
 Package                Arch               Version                  Repository        Size
===========================================================================================
Installing:
 gfs-pcmk               x86_64             3.0.5-2.fc12             custom           101 k
 gfs2-utils             x86_64             3.0.5-2.fc12             custom           208 k
Installing for dependencies:
 clusterlib             x86_64             3.0.5-2.fc12             custom            65 k
 dlm-pcmk               x86_64             3.0.5-2.fc12             custom            93 k
 openaislib             x86_64             1.1.0-1.fc12             fedora            76 k
Transaction Summary
===========================================================================================
Install       5 Package(s)
Upgrade       0 Package(s)
Total download size: 541 k
Downloading Packages:
(1/5): clusterlib-3.0.5-2.fc12.x86_64.rpm                                |  65 kB     00:00
(2/5): dlm-pcmk-3.0.5-2.fc12.x86_64.rpm                                  |  93 kB     00:00
(3/5): gfs-pcmk-3.0.5-2.fc12.x86_64.rpm                                  | 101 kB     00:00
(4/5): gfs2-utils-3.0.5-2.fc12.x86_64.rpm                                | 208 kB     00:00
(5/5): openaislib-1.1.0-1.fc12.x86_64.rpm                                |  76 kB     00:00
-------------------------------------------------------------------------------------------
Total                                                           992 kB/s | 541 kB     00:00
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : clusterlib-3.0.5-2.fc12.x86_64                                     1/5
  Installing     : openaislib-1.1.0-1.fc12.x86_64                                     2/5
  Installing     : dlm-pcmk-3.0.5-2.fc12.x86_64                                       3/5
  Installing     : gfs-pcmk-3.0.5-2.fc12.x86_64                                       4/5
  Installing     : gfs2-utils-3.0.5-2.fc12.x86_64                                     5/5
Installed:
  gfs-pcmk.x86_64 0:3.0.5-2.fc12                    gfs2-utils.x86_64 0:3.0.5-2.fc12
Dependency Installed:
  clusterlib.x86_64 0:3.0.5-2.fc12   dlm-pcmk.x86_64 0:3.0.5-2.fc12
  openaislib.x86_64 0:1.1.0-1.fc12
Complete!
[root@pcmk-1 x86_64]#
Warning
If this step fails, your version or distribution probably does not ship the "Pacemaker" versions of dlm_controld and/or gfs_controld. These files are normally called dlm_controld.pcmk and gfs_controld.pcmk and live in the /usr/sbin directory.
If you cannot find an installation source for these files, you will need to install a package called cman and reconfigure Corosync to use it, as outlined in Appendix C, Using CMAN for Cluster Membership and Quorum.
When using CMAN, you can skip Section 8.3, "Integrating Pacemaker-GFS2", where dlm-clone and gfs-clone are created, and proceed directly to Section 8.4, "Creating a GFS2 Filesystem".
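Before going further, it can help to confirm whether the Pacemaker-specific daemons named in the warning are actually present. The sketch below uses a hypothetical helper, `check_pcmk_daemons`, that only tests for executable files. It assumes the /usr/sbin location mentioned above and accepts a different directory as its first argument.

```shell
#!/bin/sh
# Report which Pacemaker-specific control daemons are present or missing.
# Hypothetical helper; the default directory /usr/sbin comes from the warning.
# Returns 0 only if both daemons exist and are executable.
check_pcmk_daemons() {
    dir="${1:-/usr/sbin}"
    missing=0
    for daemon in dlm_controld.pcmk gfs_controld.pcmk; do
        if [ -x "$dir/$daemon" ]; then
            echo "found: $dir/$daemon"
        else
            echo "missing: $dir/$daemon"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, run `check_pcmk_daemons || echo "see Appendix C"` on each node. A non-zero exit status means at least one daemon is missing and the CMAN route applies.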
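8.3. Integrating Pacemaker-GFS2
GFS2 requires two services to be running. The first is the user-space interface to the kernel's distributed lock manager (DLM). The DLM coordinates which node(s) may work on a given file, and it integrates with Pacemaker to obtain the node membership list[1] and fencing capabilities.
[1] The list of nodes the cluster considers to be available.
The second service is GFS2's own control daemon, which also integrates with Pacemaker to obtain the node membership list.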
8.3.1. Adding the DLM Service
The DLM control daemon needs to run on all available cluster nodes, so we use the shell's interactive mode to create a cloned resource.
[root@pcmk-1 ~]# crm
crm(live)# cib new stack-glue
INFO: stack-glue shadow CIB created
crm(stack-glue)# configure primitive dlm ocf:pacemaker:controld op monitor interval=120s
crm(stack-glue)# configure clone dlm-clone dlm meta interleave=true
crm(stack-glue)# configure show xml
crm(stack-glue)# configure show
node pcmk-1
node pcmk-2
primitive WebData ocf:linbit:drbd \
        params drbd_resource="wwwdata" \
        op monitor interval="60s"
primitive WebFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="ext4"
primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="1min"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.122.101" cidr_netmask="32" \
        op monitor interval="30s"
primitive dlm ocf:pacemaker:controld \
        op monitor interval="120s"
ms WebDataClone WebData \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone dlm-clone dlm \
        meta interleave="true"
location prefer-pcmk-1 WebSite 50: pcmk-1
colocation WebSite-with-WebFS inf: WebSite WebFS
colocation fs_on_drbd inf: WebFS WebDataClone:Master
colocation website-with-ip inf: WebSite ClusterIP
order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
order WebSite-after-WebFS inf: WebFS WebSite
order apache-after-ip inf: ClusterIP WebSite
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
Note
TODO: Explain the meaning of the interleave option
Review the configuration for errors, then commit it and exit the shell to see how the cluster responds.
crm(stack-glue)# cib commit stack-glue
INFO: commited 'stack-glue' shadow CIB to the cluster
crm(stack-glue)# quit
bye
[root@pcmk-1 ~]# crm_mon
============
Last updated: Thu Sep  3 20:49:54 2009
Stack: openais
Current DC: pcmk-2 - partition with quorum
Version: 1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f
2 Nodes configured, 2 expected votes
5 Resources configured.
============
Online: [ pcmk-1 pcmk-2 ]
WebSite (ocf::heartbeat:apache):        Started pcmk-2
Master/Slave Set: WebDataClone
        Masters: [ pcmk-1 ]
        Slaves: [ pcmk-2 ]
ClusterIP        (ocf::heartbeat:IPaddr):        Started pcmk-2
Clone Set: dlm-clone
        Started: [ pcmk-2 pcmk-1 ]
WebFS   (ocf::heartbeat:Filesystem):    Started pcmk-2
8.3.2. Adding the GFS2 Service
Once the DLM is active, we can add the GFS2 control daemon.
Use the crm shell to create the gfs-control cluster resource:
[root@pcmk-1 ~]# crm
crm(live)# cib new gfs-glue --force
INFO: gfs-glue shadow CIB created
crm(gfs-glue)# configure primitive gfs-control ocf:pacemaker:controld params daemon=gfs_controld.pcmk args="-g 0" op monitor interval=120s
crm(gfs-glue)# configure clone gfs-clone gfs-control meta interleave=true
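Now make sure that Pacemaker only starts the gfs-control service on nodes where the dlm service is already running, and only after dlm has started: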
crm(gfs-glue)# configure colocation gfs-with-dlm INFINITY: gfs-clone dlm-clone
crm(gfs-glue)# configure order start-gfs-after-dlm mandatory: dlm-clone gfs-clone
Review the configuration for errors, then commit it and exit the shell to see how the cluster responds.
crm(gfs-glue)# configure show
node pcmk-1
node pcmk-2
primitive WebData ocf:linbit:drbd \
        params drbd_resource="wwwdata" \
        op monitor interval="60s"
primitive WebFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="ext4"
primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="1min"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.122.101" cidr_netmask="32" \
        op monitor interval="30s"
primitive dlm ocf:pacemaker:controld \
        op monitor interval="120s"
primitive gfs-control ocf:pacemaker:controld \
        params daemon="gfs_controld.pcmk" args="-g 0" \
        op monitor interval="120s"
ms WebDataClone WebData \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone dlm-clone dlm \
        meta interleave="true"
clone gfs-clone gfs-control \
        meta interleave="true"
location prefer-pcmk-1 WebSite 50: pcmk-1
colocation WebSite-with-WebFS inf: WebSite WebFS
colocation fs_on_drbd inf: WebFS WebDataClone:Master
colocation gfs-with-dlm inf: gfs-clone dlm-clone
colocation website-with-ip inf: WebSite ClusterIP
order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
order WebSite-after-WebFS inf: WebFS WebSite
order apache-after-ip inf: ClusterIP WebSite
order start-gfs-after-dlm inf: dlm-clone gfs-clone
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
crm(gfs-glue)# cib commit gfs-glue
INFO: commited 'gfs-glue' shadow CIB to the cluster
crm(gfs-glue)# quit
bye
[root@pcmk-1 ~]# crm_mon
============
Last updated: Thu Sep  3 20:49:54 2009
Stack: openais
Current DC: pcmk-2 - partition with quorum
Version: 1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f
2 Nodes configured, 2 expected votes
6 Resources configured.
============
Online: [ pcmk-1 pcmk-2 ]
WebSite (ocf::heartbeat:apache):        Started pcmk-2
Master/Slave Set: WebDataClone
        Masters: [ pcmk-1 ]
        Slaves: [ pcmk-2 ]
ClusterIP        (ocf::heartbeat:IPaddr):        Started pcmk-2
Clone Set: dlm-clone
        Started: [ pcmk-2 pcmk-1 ]
Clone Set: gfs-clone
        Started: [ pcmk-2 pcmk-1 ]
WebFS   (ocf::heartbeat:Filesystem):    Started pcmk-1
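You can also script this check instead of reading crm_mon by eye. The following is a minimal sketch: the function name `clone_started_on` is ours, and it reads the one-shot output of `crm_mon -1`. It succeeds only if every listed node appears on the clone's Started: line. It assumes the output layout shown above, and other Pacemaker versions format crm_mon differently.

```shell
#!/bin/sh
# Succeed if every NODE appears on the "Started:" line that follows
# "Clone Set: CLONE" in crm_mon-style text read on stdin.
# Usage: clone_started_on CLONE NODE...
# Assumption: the crm_mon layout shown above; other versions may differ.
clone_started_on() {
    clone=$1; shift
    started=$(awk -v target="Clone Set: $clone" '
        { line = $0; sub(/^[ \t]+/, "", line); sub(/[ \t]+$/, "", line) }
        found && line ~ /Set:/ { exit }          # next set reached: give up
        line == target { found = 1; next }       # header of the wanted clone
        found && line ~ /^Started:/ {            # print text between [ and ]
            s = index(line, "["); e = index(line, "]")
            print substr(line, s + 1, e - s - 1); exit
        }
    ')
    [ -n "$started" ] || return 1
    for node in "$@"; do
        case " $started " in
            *" $node "*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

For example: `crm_mon -1 | clone_started_on gfs-clone pcmk-1 pcmk-2 && echo "gfs-control is running on both nodes"`.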
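8.4. Creating a GFS2 Filesystem
8.4.1. Preparation
Before we do anything to the existing partition, we need to make sure it is unmounted. We do this by telling the cluster to stop the WebFS resource. This ensures that the resources depending on WebFS (in our case, Apache) are also stopped, and in the correct order.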
[root@pcmk-1 ~]# crm_resource --resource WebFS --set-parameter target-role --meta --parameter-value Stopped
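After WebFS is stopped, /var/www/html should be unmounted on both nodes. The sketch below confirms this before you touch the device. It assumes the standard Linux /proc/mounts format (device, mount point, type, ...), and `is_mounted` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if something is mounted on DIR according to a mounts table
# (default /proc/mounts, whose second field is the mount point).
# Assumption: the mount point contains no spaces (they appear as \040).
is_mounted() {
    dir=$1
    table=${2:-/proc/mounts}
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$table"
}
```

For example, on each node: `is_mounted /var/www/html && echo "still mounted on $(uname -n)"`.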
[root@pcmk-1 ~]# crm_mon
============
Last updated: Thu Sep  3 15:18:06 2009
Stack: openais
Current DC: pcmk-1 - partition with quorum
Version: 1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f
2 Nodes configured, 2 expected votes
6 Resources configured.
============
Online: [ pcmk-1 pcmk-2 ]
Master/Slave Set: WebDataClone
        Masters: [ pcmk-1 ]
        Slaves: [ pcmk-2 ]
ClusterIP        (ocf::heartbeat:IPaddr):        Started pcmk-1
Clone Set: dlm-clone
        Started: [ pcmk-2 pcmk-1 ]
Clone Set: gfs-clone
        Started: [ pcmk-2 pcmk-1 ]
Note
Both Apache and WebFS have been stopped.
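8.4.2. Creating and Populating a GFS2 Partition
Now that the cluster stack and integration pieces are running smoothly, we can create a GFS2 partition.
Warning
This will erase everything previously stored on the DRBD device. Make sure you have a copy of any important data.
GFS2 needs several additional parameters when the partition is created.
First we use the -p option to specify that we want to use the kernel's DLM. Next we use -j to reserve enough space for two journals, one for each node that accesses the filesystem.
Lastly, we use -t to specify the lock table name. The format of this field is clustername:fsname. For the fsname we just need something unique and descriptive (we use web). For the clustername we use the default, pcmk.
To use a different cluster name, find the configuration section that contains name: pacemaker and add the following option on every node:
clustername: myname
Run the following command on one node only, the one where DRBD is currently Primary (pcmk-1). DRBD replicates the new filesystem to the other node.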
[root@pcmk-1 ~]# mkfs.gfs2 -p lock_dlm -j 2 -t pcmk:web /dev/drbd1
This will destroy any data on /dev/drbd1.
It appears to contain: data
Are you sure you want to proceed? [y/n] y
Device:                    /dev/drbd1
Blocksize:                 4096
Device Size                1.00 GB (131072 blocks)
Filesystem Size:           1.00 GB (131070 blocks)
Journals:                  2
Resource Groups:           2
Locking Protocol:          "lock_dlm"
Lock Table:                "pcmk:web"
UUID:                      6B776F46-177B-BAF8-2C2B-292C0E078613
[root@pcmk-1 ~]#
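The three options work together. The journal count follows the number of nodes that will mount the filesystem, and the lock table combines the cluster name with the filesystem name. The hypothetical helper below only prints the resulting command so you can review it before running it. It is a sketch and not part of gfs2-utils.

```shell
#!/bin/sh
# Print (do not run) the mkfs.gfs2 command for a DLM-locked GFS2 filesystem.
# Hypothetical helper: gfs2_mkfs_cmd CLUSTER FSNAME NODES DEVICE
#   -p lock_dlm        use the kernel DLM for locking
#   -j NODES           one journal per node that will mount the filesystem
#   -t CLUSTER:FSNAME  lock table name; CLUSTER must match the cluster name
gfs2_mkfs_cmd() {
    cluster=$1 fsname=$2 nodes=$3 device=$4
    case $nodes in
        ''|*[!0-9]*) echo "node count must be a number" >&2; return 1 ;;
    esac
    echo "mkfs.gfs2 -p lock_dlm -j $nodes -t $cluster:$fsname $device"
}
```

`gfs2_mkfs_cmd pcmk web 2 /dev/drbd1` prints the command used above.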
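Next, populate the new filesystem with data. This time we create a home page that differs from the previous one, so the two are easy to tell apart.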
[root@pcmk-1 ~]# mount /dev/drbd1 /mnt/
[root@pcmk-1 ~]# cat <<-END >/mnt/index.html
<html>
<body>My Test Site - GFS2</body>
</html>
END
[root@pcmk-1 ~]# umount /dev/drbd1
[root@pcmk-1 ~]# drbdadm verify wwwdata
[root@pcmk-1 ~]#
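drbdadm verify only starts an online verification run. The check runs in the background, and its result appears in /proc/drbd. In DRBD 8.x the resource also needs a verify-alg setting in its net section for verify to work, and that configuration is not shown in this chapter. The sketch below extracts the out-of-sync counter for one device once the run has finished, assuming the DRBD 8.x /proc/drbd layout. A value of 0 means no differences were found.

```shell
#!/bin/sh
# Print the "oos:" (out-of-sync) counter for one DRBD minor device from
# /proc/drbd-style text on stdin.
# Assumption: DRBD 8.x layout, where a device line starts with "N:" and the
# following statistics line contains "oos:<number>".
drbd_oos() {
    awk -v dev="$1:" '
        $1 == dev { found = 1; next }
        found && match($0, /oos:[0-9]+/) {
            print substr($0, RSTART + 4, RLENGTH - 4); exit
        }
        found && $1 ~ /^[0-9]+:$/ { exit }   # reached the next device
    '
}
```

For /dev/drbd1 this would be `drbd_oos 1 < /proc/drbd`.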
8.5. Reconfiguring the Cluster for GFS2
[root@pcmk-1 ~]# crm
crm(live)# cib new GFS2
INFO: GFS2 shadow CIB created
crm(GFS2)# configure delete WebFS
crm(GFS2)# configure primitive WebFS ocf:heartbeat:Filesystem params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2"
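Now that we have recreated the resource, we also need to recreate the constraints that referenced it. The shell automatically removed every constraint involving WebFS when we deleted the resource. We also add two new constraints so that WebFS only starts on nodes where gfs-control is running, and only after gfs-control has started.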
crm(GFS2)# configure colocation WebSite-with-WebFS inf: WebSite WebFS
crm(GFS2)# configure colocation fs_on_drbd inf: WebFS WebDataClone:Master
crm(GFS2)# configure order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
crm(GFS2)# configure order WebSite-after-WebFS inf: WebFS WebSite
crm(GFS2)# configure colocation WebFS-with-gfs-control INFINITY: WebFS gfs-clone
crm(GFS2)# configure order start-WebFS-after-gfs-control mandatory: gfs-clone WebFS
crm(GFS2)# configure show
node pcmk-1
node pcmk-2
primitive WebData ocf:linbit:drbd \
        params drbd_resource="wwwdata" \
        op monitor interval="60s"
primitive WebFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2"
primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="1min"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.122.101" cidr_netmask="32" \
        op monitor interval="30s"
primitive dlm ocf:pacemaker:controld \
        op monitor interval="120s"
primitive gfs-control ocf:pacemaker:controld \
        params daemon="gfs_controld.pcmk" args="-g 0" \
        op monitor interval="120s"
ms WebDataClone WebData \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone dlm-clone dlm \
        meta interleave="true"
clone gfs-clone gfs-control \
        meta interleave="true"
colocation WebFS-with-gfs-control inf: WebFS gfs-clone
colocation WebSite-with-WebFS inf: WebSite WebFS
colocation fs_on_drbd inf: WebFS WebDataClone:Master
colocation gfs-with-dlm inf: gfs-clone dlm-clone
colocation website-with-ip inf: WebSite ClusterIP
order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
order WebSite-after-WebFS inf: WebFS WebSite
order apache-after-ip inf: ClusterIP WebSite
order start-WebFS-after-gfs-control inf: gfs-clone WebFS
order start-gfs-after-dlm inf: dlm-clone gfs-clone
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
Review the configuration for errors, then commit it and exit the shell to see how the cluster responds.
crm(GFS2)# cib commit GFS2
INFO: commited 'GFS2' shadow CIB to the cluster
crm(GFS2)# quit
bye
[root@pcmk-1 ~]# crm_mon
============
Last updated: Thu Sep  3 20:49:54 2009
Stack: openais
Current DC: pcmk-2 - partition with quorum
Version: 1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f
2 Nodes configured, 2 expected votes
6 Resources configured.
============
Online: [ pcmk-1 pcmk-2 ]
WebSite (ocf::heartbeat:apache):        Started pcmk-2
Master/Slave Set: WebDataClone
        Masters: [ pcmk-1 ]
        Slaves: [ pcmk-2 ]
ClusterIP        (ocf::heartbeat:IPaddr):        Started pcmk-2
Clone Set: dlm-clone
        Started: [ pcmk-2 pcmk-1 ]
Clone Set: gfs-clone
        Started: [ pcmk-2 pcmk-1 ]
WebFS   (ocf::heartbeat:Filesystem):    Started pcmk-1
8.6. Reconfiguring Pacemaker for Active/Active
Almost everything is in place. Recent versions of DRBD support Primary/Primary mode, and the filesystem we are using is cluster-aware. All we need to do is reconfigure the cluster to take advantage of this.
This involves a number of changes, so we use interactive mode again:
[root@pcmk-1 ~]# crm
crm(live)# cib new active
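Making the services active on both nodes is pointless if clients cannot reach them, so we clone the IP address first. Cloned IPaddr2 resources use an iptables rule to make sure that each request is processed by only one of the two nodes. The additional meta options tell the cluster how many clone instances we want (one "request bucket" per node). They also allow the remaining node to hold both buckets if the other node fails. Otherwise, requests that belong to the failed node's bucket would be dropped.
crm(active)# configure clone WebIP ClusterIP \
        meta globally-unique="true" clone-max="2" clone-node-max="2"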
Now we must tell the cluster how to decide which node processes which request. We do this by setting the clusterip_hash parameter.
Open the ClusterIP resource for editing:
crm(active)# configure edit ClusterIP
Add the following to the params line:
clusterip_hash="sourceip"
The complete definition then looks like this:
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.122.101" cidr_netmask="32" clusterip_hash="sourceip" \
        op monitor interval="30s"
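A toy model in shell makes the "request bucket" idea more concrete. It is illustrative only. The kernel's CLUSTERIP target, which IPaddr2 relies on, uses its own hash function, and the octet sum below is a stand-in. The model does show the behaviour configured above: with clusterip_hash="sourceip" a given client address always maps to the same bucket, and when a node fails the survivor takes over its bucket. The function names and the bucket-to-node layout are assumptions.

```shell
#!/bin/sh
# Toy model of request buckets for a cloned IPaddr2 resource
# (globally-unique="true", clone-max="2", clusterip_hash="sourceip").
# Assumption: the octet sum is a stand-in; CLUSTERIP uses its own hash.
CLONE_MAX=2

# Map a source IPv4 address to a bucket number in 1..CLONE_MAX.
bucket_for_ip() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 + $2 + $3 + $4) % CLONE_MAX + 1 ))
}

# Print the node serving a bucket. Hypothetical layout: bucket N normally
# lives on pcmk-N. If that node is down (second argument), the survivor
# holds both buckets, which is what clone-node-max="2" allows.
bucket_owner() {
    owner="pcmk-$1"
    if [ "$owner" = "${2:-}" ]; then
        case $owner in
            pcmk-1) owner=pcmk-2 ;;
            *) owner=pcmk-1 ;;
        esac
    fi
    echo "$owner"
}
```

Under this model, every request from 192.168.122.10 lands in bucket 1. The client keeps reaching the same node while that node is up, and reaches the other node once the first one fails.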
Here is the complete session and the resulting configuration:
[root@pcmk-1 ~]# crm
crm(live)# cib new active
INFO: active shadow CIB created
crm(active)# configure clone WebIP ClusterIP \
        meta globally-unique="true" clone-max="2" clone-node-max="2"
crm(active)# configure show
node pcmk-1
node pcmk-2
primitive WebData ocf:linbit:drbd \
        params drbd_resource="wwwdata" \
        op monitor interval="60s"
primitive WebFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2"
primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="1min"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.122.101" cidr_netmask="32" clusterip_hash="sourceip" \
        op monitor interval="30s"
primitive dlm ocf:pacemaker:controld \
        op monitor interval="120s"
primitive gfs-control ocf:pacemaker:controld \
        params daemon="gfs_controld.pcmk" args="-g 0" \
        op monitor interval="120s"
ms WebDataClone WebData \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone WebIP ClusterIP \
        meta globally-unique="true" clone-max="2" clone-node-max="2"
clone dlm-clone dlm \
        meta interleave="true"
clone gfs-clone gfs-control \
        meta interleave="true"
colocation WebFS-with-gfs-control inf: WebFS gfs-clone
colocation WebSite-with-WebFS inf: WebSite WebFS
colocation fs_on_drbd inf: WebFS WebDataClone:Master
colocation gfs-with-dlm inf: gfs-clone dlm-clone
colocation website-with-ip inf: WebSite WebIP
order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
order WebSite-after-WebFS inf: WebFS WebSite
order apache-after-ip inf: WebIP WebSite
order start-WebFS-after-gfs-control inf: gfs-clone WebFS
order start-gfs-after-dlm inf: dlm-clone gfs-clone
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
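Notice that every constraint that referenced ClusterIP now references WebIP. This is another benefit of using the crm shell.
Next we convert the filesystem and Apache resources into clones. Again, the crm shell automatically updates the related constraints.
crm(active)# configure clone WebFSClone WebFS
crm(active)# configure clone WebSiteClone WebSite
Finally, tell the cluster that it may now promote both nodes to Primary (in other words, Master).
crm(active)# configure edit WebDataClone
Change master-max to 2.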
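Raising master-max only lets Pacemaker ask for two Primaries. DRBD itself must also permit dual-primary operation. In DRBD 8.x that is the allow-two-primaries option in the resource's net section, which is not shown in this chapter. The minimal sketch below checks a configuration for that option, for example the output of `drbdadm dump wwwdata`. The helper name is ours, and the check is a plain text match.

```shell
#!/bin/sh
# Succeed if the DRBD configuration read on stdin enables dual-primary mode.
# Assumption: DRBD 8.x syntax, where the option appears as
#     net { allow-two-primaries; }
# inside the resource definition. Comment lines are ignored.
drbd_allows_two_primaries() {
    grep -v '^[[:space:]]*#' | grep -q 'allow-two-primaries'
}
```

For example: `drbdadm dump wwwdata | drbd_allows_two_primaries || echo "add allow-two-primaries to the net section of wwwdata on both nodes"`.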
crm(active)# configure show
node pcmk-1
node pcmk-2
primitive WebData ocf:linbit:drbd \
        params drbd_resource="wwwdata" \
        op monitor interval="60s"
primitive WebFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2"
primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="1min"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.122.101" cidr_netmask="32" clusterip_hash="sourceip" \
        op monitor interval="30s"
primitive dlm ocf:pacemaker:controld \
        op monitor interval="120s"
primitive gfs-control ocf:pacemaker:controld \
        params daemon="gfs_controld.pcmk" args="-g 0" \
        op monitor interval="120s"
ms WebDataClone WebData \
        meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone WebFSClone WebFS
clone WebIP ClusterIP \
        meta globally-unique="true" clone-max="2" clone-node-max="2"
clone WebSiteClone WebSite
clone dlm-clone dlm \
        meta interleave="true"
clone gfs-clone gfs-control \
        meta interleave="true"
colocation WebFS-with-gfs-control inf: WebFSClone gfs-clone
colocation WebSite-with-WebFS inf: WebSiteClone WebFSClone
colocation fs_on_drbd inf: WebFSClone WebDataClone:Master
colocation gfs-with-dlm inf: gfs-clone dlm-clone
colocation website-with-ip inf: WebSiteClone WebIP
order WebFS-after-WebData inf: WebDataClone:promote WebFSClone:start
order WebSite-after-WebFS inf: WebFSClone WebSiteClone
order apache-after-ip inf: WebIP WebSiteClone
order start-WebFS-after-gfs-control inf: gfs-clone WebFSClone
order start-gfs-after-dlm inf: dlm-clone gfs-clone
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
Review the configuration for errors, then commit it and exit the shell to see how the cluster responds.
crm(active)# cib commit active
INFO: commited 'active' shadow CIB to the cluster
crm(active)# quit
bye
[root@pcmk-1 ~]# crm_mon
============
Last updated: Thu Sep  3 21:37:27 2009
Stack: openais
Current DC: pcmk-2 - partition with quorum
Version: 1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f
2 Nodes configured, 2 expected votes
6 Resources configured.
============
Online: [ pcmk-1 pcmk-2 ]
Master/Slave Set: WebDataClone
        Masters: [ pcmk-1 pcmk-2 ]
Clone Set: dlm-clone
        Started: [ pcmk-2 pcmk-1 ]
Clone Set: gfs-clone
        Started: [ pcmk-2 pcmk-1 ]
Clone Set: WebIP
        Started: [ pcmk-1 pcmk-2 ]
Clone Set: WebFSClone
        Started: [ pcmk-1 pcmk-2 ]
Clone Set: WebSiteClone
        Started: [ pcmk-1 pcmk-2 ]
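The key line is Masters: [ pcmk-1 pcmk-2 ]. The sketch below extracts the master list for a given set from `crm_mon -1` output, so a script can check the dual-primary state. It assumes the layout shown above, and the helper name `masters_of` is ours.

```shell
#!/bin/sh
# Print the nodes on the "Masters:" line of the named Master/Slave set,
# one per line, from crm_mon-style text on stdin.
# Assumption: the crm_mon layout shown above; other versions may differ.
masters_of() {
    awk -v target="Master/Slave Set: $1" '
        { line = $0; sub(/^[ \t]+/, "", line); sub(/[ \t]+$/, "", line) }
        found && line ~ /Set:/ { exit }          # next set reached: stop
        line == target { found = 1; next }
        found && line ~ /^Masters:/ {
            s = index(line, "["); e = index(line, "]")
            n = split(substr(line, s + 1, e - s - 1), nodes, " ")
            for (i = 1; i <= n; i++) print nodes[i]
            exit
        }
    '
}
```

Here, `crm_mon -1 | masters_of WebDataClone` should print both pcmk-1 and pcmk-2. Before master-max was raised, it would have printed only pcmk-1.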
8.6.1. Testing Recovery
Note
TODO: Put one node into standby to demonstrate failover
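Until that section is written, the test it describes can be sketched as follows. The sketch assumes the crm shell's `node standby` and `node online` subcommands. It uses an arbitrary 30-second pause, which you can override with WAIT. Setting RUN=echo prints the commands instead of running them. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch of a recovery test: put one node into standby, give the cluster
# time to react, show its state, then bring the node back online.
# RUN=echo prints the commands instead of executing them.
# Assumption: the 30-second default pause is arbitrary.
standby_test() {
    node=${1:-pcmk-1}
    ${RUN:-} crm node standby "$node"
    ${RUN:-} sleep "${WAIT:-30}"
    ${RUN:-} crm_mon -1
    ${RUN:-} crm node online "$node"
}
```

While the node is in standby, crm_mon should show every clone running only on the other node. Both WebIP instances, and therefore both request buckets, should be held by the survivor.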
Chapter 9. Configure STONITH
Contents
9.1. Why You Need STONITH
9.2. Which STONITH Device Should You Use?
9.3. Configuring STONITH
9.3.1. Example
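9.1. Why You Need STONITH
STONITH is an acronym for Shoot-The-Other-Node-In-The-Head. It protects your data from being corrupted by misbehaving nodes or by concurrent access.
A node that is unresponsive may still be accessing your data. The only way to be 100% sure that your data is safe is to use STONITH to make certain the node is truly offline before another node is allowed to access the data.
STONITH also has a role to play when a clustered service cannot be stopped. In that case, the cluster uses STONITH to force the whole node offline, which makes it safe to start the service elsewhere.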
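9.2. Which STONITH Device Should You Use?
It is crucial that the STONITH device allows the cluster to tell the difference between a node failure and a network failure.
A common mistake is to choose a remote power switch that shares power with the node it controls (such as many on-board IPMI controllers). If the node loses power, so does the device, and the cluster cannot tell whether the node is really offline or whether the network is simply unreachable.
Likewise, any device that relies on the target machine being up (such as the SSH-based "devices" used for testing) is unsuitable.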
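9.3. Configuring STONITH
1. Find the correct STONITH driver: stonith -L
2. Since every device is different, the parameters needed to configure it vary. To list the parameters a device requires, run: stonith -t {type} -n
Hopefully the developers chose sensible names. If not, you can get more information by running the following on an active cluster node:
lrmadmin -M stonith {type} pacemaker
The output should be XML-formatted text containing more detailed parameter descriptions.
3. Create a file called stonith.xml containing a primitive resource of class stonith, with the chosen {type} and the parameters it requires.
4. If the device can fence more than one node and accepts connections from multiple nodes, create a clone from this primitive resource.
5. Upload it into the CIB using cibadmin: cibadmin -C -o resources --xml-file stonith.xml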
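Steps 3 and 5 can be sketched as a small generator for stonith.xml. The helper name and the id suffixes are our own conventions, and the 60-second monitor interval mirrors the crm example in the next section. Values are assumed not to contain XML special characters. The resulting element is an ordinary CIB primitive with class="stonith".

```shell
#!/bin/sh
# Write a stonith primitive in CIB XML to stdout, suitable for
#   cibadmin -C -o resources --xml-file stonith.xml
# Hypothetical helper: make_stonith_xml ID TYPE name=value ...
# Assumptions: values contain no XML special characters (&, <, >, ");
# the "-params"/"-monitor" id suffixes are our own naming convention.
make_stonith_xml() {
    id=$1; agent=$2; shift 2
    printf '<primitive id="%s" class="stonith" type="%s">\n' "$id" "$agent"
    printf '  <instance_attributes id="%s-params">\n' "$id"
    for pair in "$@"; do
        name=${pair%%=*}     # text before the first "="
        value=${pair#*=}     # text after the first "="
        printf '    <nvpair id="%s-%s" name="%s" value="%s"/>\n' \
            "$id" "$name" "$name" "$value"
    done
    printf '  </instance_attributes>\n'
    printf '  <operations>\n'
    printf '    <op id="%s-monitor" name="monitor" interval="60s"/>\n' "$id"
    printf '  </operations>\n'
    printf '</primitive>\n'
}
```

For the example below, this would be `make_stonith_xml rsa-fencing external/ibmrsa hostname="pcmk-1 pcmk-2" ipaddr=192.168.122.31 userid=mgmt passwd=abc123 type=ibm > stonith.xml`, followed by the cibadmin command from step 5.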
9.3.1. Example
Assume we have an IBM BladeCenter containing our two nodes, with its management interface at 192.168.122.31. We choose external/ibmrsa as the driver and check which parameters it needs:
[root@pcmk-1 ~]# stonith -t external/ibmrsa -n
hostname  ipaddr  userid  passwd  type
Assuming we know the username and password for the management interface, we create a STONITH resource:
[root@pcmk-1 ~]# crm
crm(live)# cib new stonith
INFO: stonith shadow CIB created
crm(stonith)# configure primitive rsa-fencing stonith::external/ibmrsa \
        params hostname="pcmk-1 pcmk-2" ipaddr=192.168.122.31 userid=mgmt passwd=abc123 type=ibm \
        op monitor interval="60s"
crm(stonith)# configure clone Fencing rsa-fencing
Finally, re-enable STONITH, which we disabled earlier:
crm(stonith)# configure property stonith-enabled="true"
crm(stonith)# configure show
node pcmk-1
node pcmk-2
primitive WebData ocf:linbit:drbd \
        params drbd_resource="wwwdata" \
        op monitor interval="60s"
primitive WebFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2"
primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="1min"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.122.101" cidr_netmask="32" clusterip_hash="sourceip" \
        op monitor interval="30s"
primitive dlm ocf:pacemaker:controld \
        op monitor interval="120s"
primitive gfs-control ocf:pacemaker:controld \
        params daemon="gfs_controld.pcmk" args="-g 0" \
        op monitor interval="120s"
primitive rsa-fencing stonith::external/ibmrsa \
        params hostname="pcmk-1 pcmk-2" ipaddr=192.168.122.31 userid=mgmt passwd=abc123 type=ibm \
        op monitor interval="60s"
ms WebDataClone WebData \
        meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone Fencing rsa-fencing
clone WebFSClone WebFS
clone WebIP ClusterIP \
        meta globally-unique="true" clone-max="2" clone-node-max="2"
clone WebSiteClone WebSite
clone dlm-clone dlm \
        meta interleave="true"
clone gfs-clone gfs-control \
        meta interleave="true"
colocation WebFS-with-gfs-control inf: WebFSClone gfs-clone
colocation WebSite-with-WebFS inf: WebSiteClone WebFSClone
colocation fs_on_drbd inf: WebFSClone WebDataClone:Master
colocation gfs-with-dlm inf: gfs-clone dlm-clone
colocation website-with-ip inf: WebSiteClone WebIP
order WebFS-after-WebData inf: WebDataClone:promote WebFSClone:start
order WebSite-after-WebFS inf: WebFSClone WebSiteClone
order apache-after-ip inf: WebIP WebSiteClone
order start-WebFS-after-gfs-control inf: gfs-clone WebFSClone
order start-gfs-after-dlm inf: dlm-clone gfs-clone
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="true" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
Appendix A. Configuration Recap
Contents
A.1. Final Cluster Configuration
A.2. Node List
A.3. Cluster Options
A.4. Resources
A.4.1. Default Options
A.4.2. Fencing
A.4.3. Service Address
A.4.4. Distributed Lock Control
A.4.5. GFS Control Daemon
A.4.6. DRBD - Shared Storage
A.4.7. Cluster Filesystem
A.4.8. Apache
A.1. Final Cluster Configuration
[root@pcmk-1 ~]# crm configure show
node pcmk-1
node pcmk-2
primitive WebData ocf:linbit:drbd \
        params drbd_resource="wwwdata" \
        op monitor interval="60s"
primitive WebFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2"
primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="1min"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.122.101" cidr_netmask="32" clusterip_hash="sourceip" \
        op monitor interval="30s"
primitive dlm ocf:pacemaker:controld \
        op monitor interval="120s"
primitive gfs-control ocf:pacemaker:controld \
        params daemon="gfs_controld.pcmk" args="-g 0" \
        op monitor interval="120s"
primitive rsa-fencing stonith::external/ibmrsa \
        params hostname="pcmk-1 pcmk-2" ipaddr=192.168.122.31 userid=mgmt passwd=abc123 type=ibm \
        op monitor interval="60s"
ms WebDataClone WebData \
        meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone Fencing rsa-fencing
clone WebFSClone WebFS
clone WebIP ClusterIP \
        meta globally-unique="true" clone-max="2" clone-node-max="2"
clone WebSiteClone WebSite
clone dlm-clone dlm \
        meta interleave="true"
clone gfs-clone gfs-control \
        meta interleave="true"
colocation WebFS-with-gfs-control inf: WebFSClone gfs-clone
colocation WebSite-with-WebFS inf: WebSiteClone WebFSClone
colocation fs_on_drbd inf: WebFSClone WebDataClone:Master
colocation gfs-with-dlm inf: gfs-clone dlm-clone
colocation website-with-ip inf: WebSiteClone WebIP
order WebFS-after-WebData inf: WebDataClone:promote WebFSClone:start
order WebSite-after-WebFS inf: WebFSClone WebSiteClone
order apache-after-ip inf: WebIP WebSiteClone
order start-WebFS-after-gfs-control inf: gfs-clone WebFSClone
order start-gfs-after-dlm inf: dlm-clone gfs-clone
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="true" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
A.2. Node List
The list of cluster nodes is populated automatically by the cluster.
node pcmk-1
node pcmk-2
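A.3. Cluster Options
This is where the cluster automatically stores some information about itself:
1. dc-version - the version of Pacemaker used on the DC (including the source-code hash)
2. cluster-infrastructure - the cluster infrastructure being used (heartbeat or openais/corosync)
3. expected-quorum-votes - the maximum number of nodes expected to be part of the cluster
It is also where the administrator sets options that control how the cluster operates:
1. stonith-enabled=true - make use of STONITH
2. no-quorum-policy=ignore - ignore loss of quorum and continue to host resources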
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="true" \
        no-quorum-policy="ignore"
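A.4. Resources
A.4.1. Default Options
Here we set cluster options that apply to every resource:
1. resource-stickiness - how strongly resources prefer to stay on the node where they are currently running
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"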
A.4.2. Fencing
Note
TODO: Add text here
primitive rsa-fencing stonith::external/ibmrsa \
        params hostname="pcmk-1 pcmk-2" ipaddr=192.168.122.31 userid=mgmt passwd=abc123 type=ibm \
        op monitor interval="60s"
clone Fencing rsa-fencing
A.4.3.?服务地址
用户需要一个不变的地址来访问集群所提供的服务。此外,我们clone了地址,以便在两个节点上都使
用这个IP。一个iptables规则(resource agent的一部分)是用来确保每个请求只能由两个节点中的某
一个处理。这些额外的集群选项告诉我们想要两个clone(每个节点一个“请求桶”)实例,如果一个
节点失效,那么剩下的节点处理这两个请求桶。
primitive ClusterIP ocf:heartbeat:IPaddr2 \
? ? ? ? params ip=”192.168.122.101” cidr_netmask=”32” clusterip_hash=”sourceip” \
? ? ? ? op monitor interval="30s"
clone WebIP ClusterIP ?
? ? ? ? meta globally-unique=”true” clone-max=”2” clone-node-max=”2”
Note
TODO: The RA should check for globally-unique=true when cloned
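The "request bucket" idea can be sketched in a few lines: each client address is hashed into one of clone-max buckets, each node answers only for its own buckets, and a surviving node takes over the buckets of a failed one. This is a simplified model, not the iptables CLUSTERIP implementation. `bucket_for` and `owner_of` are hypothetical names, and `cksum` stands in for the hash the kernel actually uses.

```shell
# bucket_for SOURCE_IP BUCKETS: map a client address to a bucket in 1..BUCKETS.
# cksum is only a stand-in hash (assumption); CLUSTERIP uses its own hash.
bucket_for() {
    crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo $(( crc % $2 + 1 ))
}

# owner_of BUCKET NODE...: print the node that answers for BUCKET.
# With both nodes listed, bucket N belongs to the Nth node; when only the
# surviving node is listed, it answers for every bucket.
owner_of() {
    bucket=$1
    shift
    idx=$(( (bucket - 1) % $# + 1 ))
    eval "echo \"\${$idx}\""
}
```

For example, `owner_of "$(bucket_for 192.168.122.50 2)" pcmk-1 pcmk-2` names the node that would answer that client while both nodes are up, and dropping pcmk-2 from the argument list shows pcmk-1 taking over its bucket.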
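A.4.4. Distributed Lock Manager
Cluster filesystems like GFS2 require a lock manager. This service starts the daemon that provides user-space applications (such as the GFS2 daemon) with access to the in-kernel lock manager. Since we need it to be available on all nodes in the cluster, we have it cloned.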
primitive dlm ocf:pacemaker:controld \
        op monitor interval="120s"
clone dlm-clone dlm \
        meta interleave="true"
Note
TODO: Confirm interleave is no longer needed
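A.4.5. GFS Control Daemon
GFS2 also needs a user-space to kernel bridge that runs on every node. So here we have another clone; however, this time we must also specify that it can only run on machines that are also running the DLM (colocation constraint) and that it can only be started after the DLM is running (order constraint). Additionally, the gfs-control clone should only care about the DLM instance it is paired with, so we also need to set the interleave option.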
primitive gfs-control ocf:pacemaker:controld \
        params daemon="gfs_controld.pcmk" args="-g 0" \
        op monitor interval="120s"
clone gfs-clone gfs-control \
        meta interleave="true"
colocation gfs-with-dlm inf: gfs-clone dlm-clone
order start-gfs-after-dlm inf: dlm-clone gfs-clone
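The effect of interleave on the order constraint above can be modelled as a simple rule: with interleave=true, the gfs-control instance on a node only waits for the DLM instance on that same node; without it, it waits until every DLM instance has started. The function below is a hypothetical simplification of that rule, not Pacemaker's scheduler.

```shell
# can_start_gfs NODE INTERLEAVE DLM_NODES ALL_NODES
# Simplified model (assumption) of the clone "interleave" meta option.
# DLM_NODES and ALL_NODES are space-separated node lists.
#   INTERLEAVE=true  -> wait only for the DLM instance on NODE itself
#   INTERLEAVE=false -> wait until DLM has started on every node in ALL_NODES
can_start_gfs() {
    node=$1 interleave=$2 running=" $3 " all=$4
    if [ "$interleave" = true ]; then
        case "$running" in
            *" $node "*) echo yes ;;
            *) echo no ;;
        esac
        return 0
    fi
    for n in $all; do
        case "$running" in
            *" $n "*) ;;
            *) echo no; return 0 ;;
        esac
    done
    echo yes
}
```

For example, `can_start_gfs pcmk-1 true "pcmk-1" "pcmk-1 pcmk-2"` prints `yes`, whereas the same call with `false` prints `no` because the DLM is not yet running on pcmk-2.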
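A.4.6. DRBD - Shared Storage
Here we define the DRBD service and specify which DRBD resource (from drbd.conf) it should manage. We make it a master/slave resource and, in order to have an active/active setup, allow both instances to be promoted to master by specifying master-max=2. We also set the notify option so that the cluster will tell the DRBD agent when its peer changes state.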
primitive WebData ocf:linbit:drbd \
        params drbd_resource="wwwdata" \
        op monitor interval="60s"
ms WebDataClone WebData \
        meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
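Once both instances have been promoted, each node should report itself and its peer as Primary. Assuming the DRBD 8.3-style output of `drbdadm role`, which prints the local and peer roles as `local/peer` (for example `Primary/Secondary`), a check could look like the sketch below. `both_primary` is a hypothetical helper, and newer DRBD releases may print only the local role.

```shell
# both_primary ROLE: succeed if ROLE, in the "local/peer" form printed by
# `drbdadm role` on DRBD 8.3 (assumption), shows both sides as Primary.
both_primary() {
    # Reject output that does not contain a peer role at all.
    case "$1" in
        */*) ;;
        *) return 1 ;;
    esac
    local_role=${1%%/*}   # text before the first "/"
    peer_role=${1#*/}     # text after the first "/"
    [ "$local_role" = Primary ] && [ "$peer_role" = Primary ]
}
```

On each node, `both_primary "$(drbdadm role wwwdata)" && echo "wwwdata is dual-primary"` would then confirm the active/active state.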
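A.4.7. Cluster Filesystem
The cluster filesystem ensures that files are read and written correctly. We need to specify the block device (provided by DRBD), where we want it mounted, and that we are using GFS2. Again it is a clone, because it is intended to be active on both nodes. The additional constraints ensure that it can only be started on nodes with active gfs-control and drbd instances.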
primitive WebFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2"
clone WebFSClone WebFS
colocation WebFS-with-gfs-control inf: WebFSClone gfs-clone
colocation fs_on_drbd inf: WebFSClone WebDataClone:Master
order WebFS-after-WebData inf: WebDataClone:promote WebFSClone:start
order start-WebFS-after-gfs-control inf: gfs-clone WebFSClone
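To confirm on each node that the clone really mounted GFS2 where Apache expects it, the kernel's mount table can be checked. A minimal sketch, assuming the /proc/mounts line format (device, mount point, type, options, dump, pass); `is_gfs2_mounted` is a hypothetical helper and only looks at the mount point and filesystem type, not the device column.

```shell
# is_gfs2_mounted DIR: succeed if the mount table on stdin (/proc/mounts
# format: device mountpoint fstype options dump pass) has DIR mounted as gfs2.
is_gfs2_mounted() {
    awk -v dir="$1" '$2 == dir && $3 == "gfs2" { found = 1 } END { exit !found }'
}
```

Usage on a node: `is_gfs2_mounted /var/www/html < /proc/mounts && echo "GFS2 is mounted"`.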
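A.4.8. Apache
Lastly we have the actual service, Apache. We need only tell the cluster where to find its main configuration file and restrict it to running on nodes that have the required filesystem mounted and the IP address active.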
primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="1min"
clone WebSiteClone WebSite
colocation WebSite-with-WebFS inf: WebSiteClone WebFSClone
colocation website-with-ip inf: WebSiteClone WebIP
order apache-after-ip inf: WebIP WebSiteClone
order WebSite-after-WebFS inf: WebFSClone WebSiteClone
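Taken together, the order constraints form a dependency graph, and any start sequence that respects them is a topological sort of that graph. The sketch below feeds the constraints to the standard tsort(1) utility. `start_order` is a hypothetical helper: it drops the `:promote`/`:start` suffixes and ignores colocation, scores and per-node clone instances, so it only illustrates the ordering and is not how Pacemaker's policy engine computes a transition.

```shell
# start_order: print a start sequence that satisfies every "order" constraint.
# Reads crm "order NAME SCORE: FIRST THEN" lines on stdin, strips the action
# suffixes (":promote", ":start") and lets tsort(1) do the topological sort.
start_order() {
    awk '$1 == "order" {
             first = $4; then_ = $5
             sub(/:.*/, "", first); sub(/:.*/, "", then_)
             print first, then_
         }' | tsort
}
```

On a configuration like the one in this appendix, `crm configure show | start_order` should list dlm-clone before gfs-clone, gfs-clone and WebDataClone before WebFSClone, and both WebFSClone and WebIP before WebSiteClone.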
Appendix B. Sample Corosync Configuration
Example B.1. Sample corosync.conf for a two-node cluster
# Please read the corosync.conf.5 manual page
compatibility: whitetank
totem {
        version: 2
        # How long before declaring a token lost (ms)
        token: 5000
        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 10
        # How long to wait for join messages in the membership protocol (ms)
        join: 1000
        # How long to wait for consensus to be achieved before starting a new
        # round of membership configuration (ms)
        consensus: 6000
        # Turn off the virtual synchrony filter
        vsftype: none
        # Number of messages that may be sent by one processor on receipt of the token
        max_messages: 20
        # Stagger sending the node join messages by 1..send_join ms
        send_join: 45
        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes
        # Disable encryption
        secauth: off
        # How many threads to use for encryption/decryption
        threads: 0
        # Optionally assign a fixed node id (integer)
        # nodeid: 1234
        interface {
                ringnumber: 0
                # The following values need to be set based on your environment
                bindnetaddr: 192.168.122.0
                mcastaddr: 226.94.1.1
                mcastport: 4000
        }
}
logging {
        debug: off
        fileline: off
        to_syslog: yes
        to_stderr: off
        syslog_facility: daemon
        timestamp: on
}
amf {
        mode: disabled
}
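Two of the values above can be derived or checked mechanically. bindnetaddr is the network address of the interface Corosync should use, and the corosync.conf(5) man page requires consensus to be at least 1.2 times token, which is why a consensus of 6000 is paired with a token of 5000. The helpers below are a hypothetical sketch of both calculations (IPv4 only).

```shell
# network_addr IP PREFIXLEN: print the network address of IP/PREFIXLEN,
# suitable as a bindnetaddr value (IPv4 only).
network_addr() {
    # Split the dotted quad into four octets; the prefix length becomes $5.
    oldifs=$IFS
    IFS=.
    set -- $1 "$2"
    IFS=$oldifs
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    mask=$(( (0xFFFFFFFF << (32 - $5)) & 0xFFFFFFFF ))
    net=$(( addr & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

# consensus_ok TOKEN_MS CONSENSUS_MS: succeed if consensus >= 1.2 * token.
consensus_ok() {
    # Compare 10 * consensus with 12 * token to stay in integer arithmetic.
    [ $(( $2 * 10 )) -ge $(( $1 * 12 )) ]
}
```

For example, `network_addr 192.168.122.101 24` prints `192.168.122.0`, the bindnetaddr used in the example, and `consensus_ok 5000 6000` succeeds.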
Appendix C. Using CMAN for Cluster Membership and Quorum
Table of Contents
C.1. Background
C.2. Adding CMAN Support
C.2.1. Adding CMAN Support - cluster.conf
C.2.2. Adding CMAN Support - corosync.conf
C.1. Background
CMAN v3[1] is a Corosync plugin that monitors the names and number of active cluster nodes in order to deliver membership and quorum information to clients (such as the Pacemaker daemons).
In a traditional Corosync-Pacemaker cluster, a Pacemaker plugin is loaded to provide membership and quorum information. The motivation for using CMAN for this instead is to ensure that all elements of the cluster stack make decisions based on the same membership and quorum data.[2]
CMAN has been around longer than Pacemaker and is part of the Red Hat cluster stack, so
it is available and supported by many distributions and other pieces of software (such as
OCFS2 and GFS2). For this reason it makes sense to support it.
C.2. Adding CMAN Support
Warning
Be sure to disable the Pacemaker plugin before continuing with this section. In most
cases, this can be achieved by removing /etc/corosync/service.d/pcmk and stopping
Corosync.
C.2.1. Adding CMAN Support - cluster.conf
The preferred approach for enabling CMAN is to configure cluster.conf and use the /etc/init.d/cman script to start Corosync. It is far easier to maintain, and it automatically starts the pieces needed for using GFS2.
You can find some documentation on Installing CMAN and Creating a Basic Cluster Configuration File[3] at the Red Hat website. However, please ignore the parts about Fencing, Failover Domains, or HA Services and anything to do with rgmanager and fenced. All of these continue to be handled by Pacemaker in the normal manner.
[1] http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Cluster_Suite_Overview/index.html#s2-clumembership-overview-CSO
[2] A failure to do this can lead to what is called internal split-brain, a situation where different parts of the stack disagree about whether some nodes are alive or dead, which quickly leads to unnecessary downtime and/or data corruption.
[3] http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/s1-creating-cluster-cli-CA.html
Example C.1. Sample cluster.conf for a two-node cluster
<?xml version="1.0"?>
<cluster config_version="1" name="beekhof">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="pcmk-1" nodeid="1">
      <fence/>
    </clusternode>
    <clusternode name="pcmk-2" nodeid="2">
      <fence/>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices/>
  <rm/>
</cluster>
C.2.2. Adding CMAN Support - corosync.conf
The alternative is to add the necessary cman configuration elements to corosync.conf. We recommend that you place these directives in /etc/corosync/service.d/cman, as they will differ between machines.
If you choose this approach, you continue to start and stop Corosync with its init script, as previously described in this document.
Example C.2. Sample corosync.conf extensions for a two-node cluster
[root@pcmk-1 ~]# cat <<-END >>/etc/corosync/service.d/cman
cluster {
    name: beekhof
    clusternodes {
        clusternode {
            votes: 1
            nodeid: 1
            name: pcmk-1
        }
        clusternode {
            votes: 1
            nodeid: 2
            name: pcmk-2
        }
    }
    cman {
        expected_votes: 2
        cluster_id: 123
        nodename: `uname -n`
        two_node: 1
        max_queued: 10
    }
}
service {
    name: corosync_cman
    ver: 0
}
quorum {
    provider: quorum_cman
}
END
Warning
Verify that nodename was set appropriately on each host.
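Because the here-document delimiter END is unquoted, the shell replaces `uname -n` with the local host name while the file is being written; had the delimiter been quoted, the backticks would have ended up in the file literally. One way to check the result on each host is sketched below; `nodename_matches` is a hypothetical helper that compares the `nodename:` line with an expected name.

```shell
# nodename_matches FILE NAME: succeed if FILE contains a "nodename: NAME" line.
nodename_matches() {
    awk -v want="$2" '$1 == "nodename:" && $2 == want { found = 1 } END { exit !found }' "$1"
}
```

Typical use on each host: `nodename_matches /etc/corosync/service.d/cman "$(uname -n)" && echo ok`.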
Appendix D. Further Reading
Project Website
http://www.clusterlabs.org
Cluster Commands
A comprehensive guide to cluster commands has been written by Novell and can be found at:
http://www.novell.com/documentation/sles11/book_sleha/index.html?page=/documentation/sles11/book_sleha/data/book_sleha.html
Corosync
http://www.corosync.org
Appendix E. Revision History
Revision 1    Mon May 17 2010    Andrew Beekhof
Import from Pages.app