I. Installing the Cluster Software
1. Service components
- Configuration Information
ccsd – Cluster Configuration System
- High-Availability Management
aisexec - OpenAIS cluster manager: communications, encryption, quorum, membership
rgmanager - Cluster resource group manager
- Shared Storage Related
fenced - I/O Fencing
DLM - Distributed Locking Manager
dlm_controld - Manages DLM groups
lock_dlmd - Manages interaction between DLM and GFS
clvmd - Clustered Logical Volume Manager
- Deployment
luci - Conga project
system-config-cluster
2. Architecture
- RHEL4 CMAN/DLM Architecture
- RHEL5 CMAN/DLM/OpenAIS Architecture
3. Installation
[root@node1 ~]# yum install cman rgmanager -y
[root@node1 ~]# yum install -y cluster-cim lvm2-cluster system-config-cluster
[root@node1 ~]# yum install ricci -y
[root@node1 ~]# yum install luci
[root@node1 ~]# service ricci start
Starting oddjobd: [OK]
generating SSL certificates... done
Starting ricci: [OK]
[root@node1 ~]# service cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... failed
/usr/sbin/cman_tool: ccsd is not running [FAILED]
[root@node1 ~]# service rgmanager start
[root@node1 ~]# service qdiskd start
[root@node1 ~]# luci_admin init
Initializing the luci server
Creating the 'admin' user
Enter password:
Confirm password:
Please wait...
The admin password has been successfully set.
Generating SSL certificates...
The luci server has been successfully initialized
You must restart the luci server for changes to take effect.
Run "service luci restart" to do so
[root@node1 ~]# service luci restart
Shutting down luci: [OK]
Starting luci: Generating https SSL certificates... done [OK]
Point your web browser to https://node1:8084 to access luci
[root@node1 ~]# chkconfig ricci on
[root@node1 ~]# chkconfig cman on
[root@node1 ~]# chkconfig rgmanager on
[root@node1 ~]# chkconfig qdiskd on
[root@node1 ~]# chkconfig luci on
II. Creating the Quorum Disk
[root@node1 ~]# mkqdisk -l cluster8_qdisk -c /dev/sdc
mkqdisk v0.6.0
Writing new quorum disk label 'cluster8_qdisk' to /dev/sdc.
WARNING: About to destroy all data on /dev/sdc; proceed [N/y] ? y
Initializing status block for node 1...
Initializing status block for node 2...
Initializing status block for node 3...
Initializing status block for node 4...
Initializing status block for node 5...
Initializing status block for node 6...
Initializing status block for node 7...
Initializing status block for node 8...
Initializing status block for node 9...
Initializing status block for node 10...
Initializing status block for node 11...
Initializing status block for node 12...
Initializing status block for node 13...
Initializing status block for node 14...
Initializing status block for node 15...
Initializing status block for node 16...
[root@node1 ~]# mkqdisk -L
mkqdisk v0.6.0
/dev/disk/by-id/scsi-1IET_00010001:
/dev/disk/by-path/ip-10.160.100.40:3260-iscsi-iqn.20120116.target:dskpub-lun-1:
/dev/sdc:
Magic: eb7a62c2
Label: cluster8_qdisk
Created: Mon Jan 16 13:45:21 2012
Host: node1
Kernel Sector Size: 512
Recorded Sector Size: 512
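[root@node1 ~]# service qdiskd start # this service must be started on every node
# Notes on the quorum disk
- A quorum disk of about 10 MB is enough. A label must be set with -l because the device name /dev/sdc can change between boots; udev naming cannot be used here, so the disk is referred to by its label. Running mkqdisk on any one node is sufficient.
mkqdisk -L lists the quorum disks
mkqdisk -f <label> shows information about the quorum disk with that label
- You can configure how often a node checks in with the quorum disk. The health check uses ping: for example, ping once per second, and if 50 pings in a row succeed, the peer is considered alive.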
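Because the device name can change, it can be handy to look up which device paths currently carry a given label. The following is a minimal sketch that filters `mkqdisk -L` style output as shown above; the helper name `qdisk_devices_for_label` is hypothetical and not part of the cluster tools, and the parsing assumes each disk record lists its device paths (ending in a colon) before its fields.

```shell
# Print the device paths that carry the given quorum-disk label.
# Reads `mkqdisk -L` style output on stdin.
# qdisk_devices_for_label is a hypothetical helper, not a cluster tool.
qdisk_devices_for_label() {
    awk -v want="$1" '
        /^\/.*:$/ {                      # a device line such as "/dev/sdc:"
            if (in_fields) {             # a new disk record begins
                ndev = 0
                in_fields = 0
            }
            sub(/:$/, "")                # drop the trailing colon
            dev[++ndev] = $0
            next
        }
        { in_fields = 1 }                # any other line belongs to the fields
        $1 == "Label:" && $2 == want {
            for (i = 1; i <= ndev; i++) print dev[i]
        }
    '
}
```

On a node this would be used as `mkqdisk -L | qdisk_devices_for_label cluster8_qdisk`.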
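III. Configuring and Managing the Cluster with luci
1. Create the cluster (add the nodes to cluster8)
2. Add a quorum disk to the cluster
- Parameter descriptions:
interval 2: run one check cycle every 2 seconds
votes 1: the quorum disk contributes 1 vote
TKO 10: 10 cycles; a node that misses this many cycles is declared dead
Minimum Score: the minimum heuristic score a node needs to be considered alive (each successful heuristic here scores one point)
Device: /dev/sda3
Label: cluster8_qdisk
- For example, with ten checks each worth one point, the score must be above 5 for the cluster to be considered healthy.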
Path to program                  interval  score
ping -c1 -t1 192.168.0.254       2         1
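To make the interaction of these parameters concrete, here is a small arithmetic sketch. It assumes the usual qdisk behaviour: a node is declared dead after missing tko cycles of interval seconds each, and when min_score is omitted or 0 the default is floor((n + 1) / 2), where n is the sum of all heuristic scores. The function names are illustrative only.

```shell
# Seconds of missed updates before a node is declared dead: interval * tko.
# (Assumed relationship; function names are hypothetical.)
qdisk_timeout() {
    echo $(( $1 * $2 ))
}

# Assumed default minimum score when min_score is omitted or 0:
# floor((n + 1) / 2), with n the sum of all heuristic scores.
qdisk_default_min_score() {
    echo $(( ($1 + 1) / 2 ))
}

qdisk_timeout 2 10           # interval=2, tko=10
qdisk_default_min_score 1    # a single heuristic worth 1 point
```

With the values above, a node would be evicted after roughly 20 seconds without updates, so any fencing or failover timing should be planned around that window.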
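Running cman_tool status on a two-node cluster shows that Expected votes becomes 3, Total votes is also 3, and Quorum is 2. In other words, the quorum calculation includes the votes assigned to the quorum disk above. Note that /etc/cluster/cluster.conf must be edited to set expected_votes="3" two_node="0": expected_votes has to be changed by hand, and two_node is the switch for two-node mode. A value of 1 marks a two-node cluster; 0 means the cluster is treated as having more than two voting members. Afterwards, restart the rgmanager, qdiskd, and cman services on every node.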
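The numbers reported by cman_tool status can be checked by hand. This sketch assumes quorum is a strict majority of the expected votes, floor(expected / 2) + 1, which matches the 3 and 2 quoted above; `quorum_for` is an illustrative helper name.

```shell
# Quorum as a strict majority of the expected votes: floor(expected / 2) + 1.
# quorum_for is a hypothetical helper used only for this illustration.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

# Two nodes with 1 vote each plus a quorum disk with 1 vote.
expected=$(( 1 + 1 + 1 ))
echo "expected_votes=$expected quorum=$(quorum_for "$expected")"
```

With three expected votes, one node plus the quorum disk still reaches the quorum of 2, which is why the quorum disk lets a two-node cluster survive the loss of one node.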
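3. Add a fence device to each node (the steps are similar on every node)
4. Create a failover domain
# priority: a smaller number means a higher priority; it determines on which node the service prefers to start.
Prioritized: enables priorities. Per-node priorities can only be set once this is enabled for the domain.
Restrict failover…: strictly limits failover to the domain's members; a service assigned to the domain can run only on nodes that belong to it.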
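As an illustration of how an ordered domain ranks its members, the sketch below picks the preferred node from a list of name:priority pairs, using the priorities that appear in this document's cluster.conf (1, 10, and 5). It only models the "lowest number wins" rule among the nodes passed in; `preferred_node` is a hypothetical helper, not an rgmanager command.

```shell
# Pick the preferred node from "name:priority" pairs; the lowest number wins.
# preferred_node is a hypothetical helper that models the ordering rule only.
preferred_node() {
    printf '%s\n' "$@" | sort -t: -k2,2n | head -n 1 | cut -d: -f1
}

preferred_node node1hb:1 node3hb:10 node2hb:5   # all three nodes online
preferred_node node3hb:10 node2hb:5             # node1hb is down
```

Passing only the nodes that are currently online shows where the service would be expected to relocate when the top-priority node fails.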
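5. Add shared resources
A shared GFS file system is shown here; other shared resources are created the same way.
6. Add the shared resources to a service
7. View and edit the configuration file (/etc/cluster/cluster.conf)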
[root@node1 ~]# vim /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="cluster8" config_version="9" name="cluster8">
<!-- Each time cluster.conf is modified, config_version must be incremented by 1 -->
<clusternodes>
<clusternode name="node1hb" nodeid="1" votes="5">
<fence/>
</clusternode>
<clusternode name="node3hb" nodeid="2" votes="5">
<fence/>
</clusternode>
<clusternode name="node2hb" nodeid="3" votes="5">
<fence/>
</clusternode>
</clusternodes>
<cman expected_votes="13"/>
<fencedevices/>
<rm>
<failoverdomains>
<failoverdomain name="httpd_fail" nofailback="0" ordered="1" restricted="1">
<failoverdomainnode name="node1hb" priority="1"/>
<failoverdomainnode name="node3hb" priority="10"/>
<failoverdomainnode name="node2hb" priority="5"/>
</failoverdomain>
</failoverdomains>
<resources>
<clusterfs device="/dev/vg01/lv01" force_unmount="0" fsid="34218" fstype="gfs" mountpoint="/var/www/html" name="httpd_files" self_fence="0"/>
<ip address="192.168.32.21" monitor_link="0"/>
<script file="/etc/init.d/httpd" name="httpd"/>
</resources>
<service autostart="1" domain="httpd_fail" exclusive="1" name="httpd_srv" recovery="relocate">
<clusterfs fstype="gfs" ref="httpd_files"/>
<ip ref="192.168.32.21"/>
<script ref="httpd"/>
</service>
</rm>
<quorumd interval="5" label="cluster8_qdisk" min_score="1" tko="10" votes="10">
<heuristic interval="5" program="ping -c1 -t1 192.168.32.254" score="1"/>
</quorumd>
</cluster>
[root@node1 ~]# ccs_tool update /etc/cluster/cluster.conf
Config file updated from version 9 to 10
Update complete.
# ccs_tool update propagates the new configuration version to the cluster
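Since config_version must go up by one on every change, the increment can be scripted rather than done by hand. The following is a minimal sketch that rewrites the attribute on whatever text it is given; `bump_config_version` is a hypothetical helper, and it assumes the attribute is written as config_version="N" with double quotes, as in the file above.

```shell
# Print cluster.conf text from stdin with config_version incremented by one.
# bump_config_version is a hypothetical helper, not part of the cluster tools.
bump_config_version() {
    awk '{
        if (match($0, /config_version="[0-9]+"/)) {
            num = substr($0, RSTART, RLENGTH)
            gsub(/[^0-9]/, "", num)          # keep only the digits
            $0 = substr($0, 1, RSTART - 1) \
                 "config_version=\"" (num + 1) "\"" \
                 substr($0, RSTART + RLENGTH)
        }
        print
    }'
}
```

A cautious way to use it is to write the result to a scratch file first, for example `bump_config_version < /etc/cluster/cluster.conf > /tmp/cluster.conf.new`, review the file, and only then copy it into place and run ccs_tool update.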
IV. Starting the Service and Testing
1. Starting and stopping the service
[root@node1 ~]# clusvcadm -R httpd_srv
Local machine trying to restart service:httpd_srv...Success
# Restart the httpd_srv service (-R restarts it; clusvcadm -e httpd_srv enables and starts a stopped service)
[root@node1 ~]# clusvcadm -s httpd_srv
Local machine stopping service:httpd_srv...Success
# Stop the httpd_srv service
2. Checking cluster status
[root@node1 ~]# clustat -l
Cluster Status for cluster8 @ Mon Jan 16 15:45:19 2012
Member Status: Quorate
Member Name                              ID   Status
------ ----                              ---- ------
node1hb                                     1 Online, Local, rgmanager
node3hb                                     2 Online, rgmanager
node2hb                                     3 Online, rgmanager
/dev/disk/by-id/scsi-1IET_00010001          0 Online, Quorum Disk
Service Information
------- -----------
Service Name : service:httpd_srv
Current State : started (112)
Flags : none (0)
Owner : node1hb
Last Owner : node1hb
Last Transition : Mon Jan 16 15:44:43 2012
# clustat -i 3: refresh the cluster status display every three seconds
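For scripted checks, the long listing can be reduced to one line per service. This sketch assumes the `clustat -l` layout shown above, where Service Name, Current State, and Owner appear in that order as "Key : value" lines; `service_owner` is a hypothetical helper name.

```shell
# Print "service owner state" for each service in `clustat -l` style output.
# service_owner is a hypothetical helper; it relies on the field order above.
service_owner() {
    awk '
        /Service Name/   { sub(/^[^:]*: */, ""); name = $0 }
        /Current State/  { sub(/^[^:]*: */, ""); state = $1 }
        /^[ \t]*Owner/   { sub(/^[^:]*: */, ""); print name, $0, state }
    '
}
```

It would be used as `clustat -l | service_owner`, which makes it easy to confirm after a failover test that the service has moved to the expected node.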
[root@node1 ~]# ccs_test connect
Connect successful.
Connection descriptor = 36960
[root@node1 ~]# ccs_tool lsnode
Cluster name: cluster8, config_version: 9
Nodename                        Votes Nodeid Fencetype
node1hb                            5    1
node3hb                            5    2
node2hb                            5    3
[root@node1 ~]# cman_tool services
type             level name       id       state
fence            0     default    00010003 none
[1 2 3]
dlm              1     clvmd      00010001 none
[1 2 3]
dlm              1     rgmanager  00020003 none
[1 2 3]
dlm              1     gfslv01    00030001 none
[1]
gfs              2     gfslv01    00020001 none
[1]