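RHCS stands for Red Hat Cluster Suite. It is an inexpensive collection of cluster tools that provides high availability, high reliability, load balancing, and shared storage. It combines the three major cluster architectures in one system and gives web applications, database applications, and similar workloads a safe, stable runtime environment.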
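Features: more than 100 machines can serve as nodes
Services do not need to be cluster-aware
Shared storage is useful but not required
A network power switch is needed to serve multiple machines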
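Red Hat Cluster Suite processes:
This is the base suite of an RHCS cluster. It provides the basic cluster functions that let the nodes work together as one cluster, and it comprises the distributed cluster manager (CMAN), membership management, the lock manager (DLM), configuration file management (CCS), and fence devices (FENCE).
Cluster configuration system: ccsd (cluster configuration system)
High-availability management: aisexec: inter-node cluster communication and membership management
rgmanager: cluster resource management
Shared storage: dlm: distributed lock manager
Deployment: configure the cluster with luci or with system-config-cluster
Configuration file: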
/etc/cluster/cluster.conf
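As a rough illustration of what this file contains, the sketch below prints a minimal cluster.conf skeleton. The function name `emit_cluster_conf` is hypothetical, the cluster name and node addresses are taken from the test output later in this article, and a file generated by luci will contain more than this (fence devices, failover domains, resources, and for two-node clusters usually a cman two_node setting).

```shell
#!/bin/sh
# Print a minimal cluster.conf skeleton (illustrative only, not a complete
# production configuration).
# Usage: emit_cluster_conf CLUSTER_NAME NODE [NODE...]
emit_cluster_conf() {
    cluster_name=$1
    shift
    echo '<?xml version="1.0"?>'
    echo "<cluster name=\"$cluster_name\" config_version=\"1\">"
    echo '  <clusternodes>'
    nodeid=1
    for node in "$@"; do
        # Each node gets a unique nodeid and one vote.
        echo "    <clusternode name=\"$node\" nodeid=\"$nodeid\" votes=\"1\">"
        echo '      <fence/>'
        echo '    </clusternode>'
        nodeid=$((nodeid + 1))
    done
    echo '  </clusternodes>'
    echo '  <fencedevices/>'
    echo '  <rm>'
    echo '    <failoverdomains/>'
    echo '    <resources/>'
    echo '  </rm>'
    echo '</cluster>'
}

# Example: skeleton for the two nodes used in this article.
emit_cluster_conf cluster1 192.168.68.128 192.168.68.129
```

The output is written to standard output so it can be reviewed before anything is copied to /etc/cluster/cluster.conf.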
1. Configure the CentOS repository
If the machine has Internet access, we can install packages with yum from the CentOS repositories.
#vi /etc/yum.conf
[base]
name=Red Hat Enterprise Linux $releasever -Base
baseurl=http://ftp.twaren.net/Linux/CentOS/5/os/$basearch/
gpgcheck=1
[update]
name=Red Hat Enterprise Linux $releasever -Updates
baseurl=http://ftp.twaren.net/Linux/CentOS/5/updates/$basearch/
gpgcheck=1
[extras]
name=Red Hat Enterprise Linux $releasever -Extras
baseurl=http://ftp.twaren.net/Linux/CentOS/5/extras/$basearch/
gpgcheck=1
[addons]
name=Red Hat Enterprise Linux $releasever -Addons
baseurl=http://ftp.twaren.net/Linux/CentOS/5/addons/$basearch/
gpgcheck=1
yum install scsi-target*
When installing packages with yum on RHEL 5, the following warning appears:
Is this ok [y/N]: y
Downloading Packages:
warning: rpmts_HdrFromFdno: Header V3 DSA signature: NOKEY, key ID e8562897
Solution:
rpm --import http://centos.ustc.edu.cn/centos/RPM-GPG-KEY-CentOS-5
For an older release, use:
rpm --import http://centos.ustc.edu.cn/centos/RPM-GPG-KEY-centos4
If the machine has no network access, configure a local yum repository from the installation disc.
#vi /etc/yum.conf
[Base]
name=RHEL5 ISO Base
baseurl=file:///media/cdrom/Server
enabled=1
gpgcheck=0
[Cluster]
name=RHEL5 ISO Cluster
baseurl=file:///media/cdrom/Cluster
enabled=1
gpgcheck=0
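The two stanzas above differ only in the repository id and the directory on the disc, so they can be generated. This is a minimal sketch; the helper name `add_local_repo` is hypothetical, and it assumes the disc is already mounted at /media/cdrom (for example with `mount /dev/cdrom /media/cdrom`, where the device name is an assumption).

```shell
#!/bin/sh
# Append a file:// repository stanza to a yum configuration file.
# Usage: add_local_repo CONF_FILE REPO_ID MOUNT_POINT SUBDIR
add_local_repo() {
    conf=$1; id=$2; mnt=$3; subdir=$4
    cat >>"$conf" <<EOF
[$id]
name=RHEL5 ISO $id
baseurl=file://$mnt/$subdir
enabled=1
gpgcheck=0
EOF
}

# Example (run as root on a real system):
#   add_local_repo /etc/yum.conf Base    /media/cdrom Server
#   add_local_repo /etc/yum.conf Cluster /media/cdrom Cluster
```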
2.2 Deploy the cluster software
yum install ricci
chkconfig ricci on
service ricci restart
yum install luci    # install luci, the management-node software
[root@localhost iscsi]# luci_admin init    # set the luci admin password
Initializing the luci server
Creating the 'admin' user
Enter password:
Confirm password:
Please wait...
The admin password has been successfully set.
Generating SSL certificates...
The luci server has been successfully initialized
You must restart the luci server for changes to take effect.
Run "service luci restart" to do so
#chkconfig luci on
#service luci restart
[root@localhost iscsi]# service luci restart
Shutting down luci: [ OK ]
Starting luci: Generating https SSL certificates... done
[ OK ]
Point your web browser to https://ipap.128:8084 to access luci
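2.3 Configure the cluster suite
https://<management-ip>:8084
Cluster -> Create a New Cluster: enter the node host names and passwords.
Click Submit. luci then installs and deploys the software on the nodes. When this succeeds, the new cluster appears in the cluster list, where it can be started, stopped, and deleted.
If cluster creation reports an error, check that the hosts file is correct and that the firewall and SELinux are disabled.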
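The hosts file check can be scripted. The sketch below lists node names that have no entry in a hosts file; the function name `missing_hosts` and the node names in the example are hypothetical.

```shell
#!/bin/sh
# Print every NAME that does not appear as a host name in HOSTS_FILE.
# Usage: missing_hosts HOSTS_FILE NAME [NAME...]
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments, then compare the name against every field after
        # the IP address. awk exits 0 only if the name was found.
        if ! awk -v n="$name" '
            { sub(/#.*/, ""); for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file"; then
            echo "$name"
        fi
    done
}

# Example: missing_hosts /etc/hosts node1 node2
```

On RHEL 5 the firewall and SELinux state are usually inspected with `service iptables status` and `getenforce`. Run the check on every node, since each node must be able to resolve all the others.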
#chkconfig rgmanager on    # optional component: enables resource management
#service rgmanager restart
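Switch to the cluster's node view, where several function panels are shown.
The fence here is virtual; configure each node's own fence under the cluster.
Add a failover domain: set the priorities. A higher-priority node is preferred; the priorities are usually set equal so that the service does not move around unnecessarily.
Add a sharable fence device: select a fence card, which provides the switching mechanism.
2.4 Configure cluster services
Configure an IP: Cluster -> add a cluster IP address
Add a resource: enter the resource name; the path and the disk must refer to the same disk.
Configure a service: Cluster -> configure a service
Choose a service name, a failover domain, and a recovery policy; relocate is chosen here.
Click "Add a resource to this service", select the resource just created, and when configuration is complete start the service on the service page.
2.5 Cluster test
ping 192.168.68.130    # the virtual service IP
#clustat -i 1    # show cluster status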
Cluster Status for cluster1 @ Mon Dec 16 15:35:51 2013
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
192.168.68.128 1 Online, Local, rgmanager
192.168.68.129 2 Online, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:httpd 192.168.68.128 started
#ip addr list    # show IP address status
[root@localhost rc.d]# ip addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: peth0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
inet6 fe80::fcff:ffff:feff:ffff/64 scope link
valid_lft forever preferred_lft forever
The service IP can be changed through the graphical interface.
#clustat -i 1
Cluster Status for cluster1 @ Mon Dec 16 15:39:16 2013
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
192.168.68.128 1 Online, Local, rgmanager
192.168.68.129 2 Online, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:httpd 192.168.68.129 started
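In the two clustat listings above, the owner of service:httpd moves from 192.168.68.128 to 192.168.68.129. A small filter makes that check scriptable. This is a sketch; the function name `service_owner` is hypothetical, and it assumes the column layout shown above (a stopped service may show its last owner in parentheses).

```shell
#!/bin/sh
# Print the owner column for one service from clustat output on stdin.
# Usage: clustat | service_owner SERVICE_NAME
service_owner() {
    awk -v svc="service:$1" '$1 == svc { print $2 }'
}

# Example with a line copied from the output above:
printf ' service:httpd 192.168.68.129 started\n' | service_owner httpd
```

Comparing the value before and after a failover test confirms that the service really moved to another node.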
system-config-cluster is an earlier RHCS configuration tool. First, on the management node, install it:
yum install system-config-cluster
yum install cman
yum install rgmanager
Then copy /etc/cluster/cluster.conf to all nodes
chkconfig cman on
chkconfig rgmanager on
service cman start
service rgmanager start
Then run system-config-cluster to start configuring.
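RHCS can use shared storage over iSCSI, which increases disk capacity and working efficiency.
udev is a device-naming policy mechanism that makes disk identification on Linux simpler.
iSCSI runs over TCP/IP and can attach shared storage.
yum install scsi-target*
chkconfig tgtd on
service tgtd start
[root@localhost yum.repos.d]# fdisk /dev/hdb    # partition the disk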
The number of cylinders for this disk is set to 8322.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): n
Command action
l logical (5 or over)
p primary partition (1-4)
l
First cylinder (6403-8322, default 6403):
Using default value 6403
Last cylinder or +size or +sizeM or +sizeK (6403-8322, default 8322): +500MB
Command (m for help): p
Disk /dev/hdb: 4294 MB, 4294967296 bytes
16 heads, 63 sectors/track, 8322 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Device Boot Start End Blocks Id System
/dev/hdb1 1 8322 4194256+ 5 Extended
/dev/hdb5 1 195 98217 8e Linux LVM
/dev/hdb6 196 390 98248+ 8e Linux LVM
/dev/hdb7 391 585 98248+ 8e Linux LVM
/dev/hdb8 586 2524 977224+ 8e Linux LVM
/dev/hdb9 2525 4463 977224+ 8e Linux LVM
/dev/hdb10 4464 6402 977224+ 8e Linux LVM
/dev/hdb11 6403 7372 488848+ 83 Linux
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
[root@localhost yum.repos.d]# partprobe /dev/hdb
[root@ipap ~]# mkfs.ext3 /dev/hdb11    # format the partition
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
122400 inodes, 488848 blocks
24442 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67633152
60 block groups
8192 blocks per group, 8192 fragments per group
2040 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 28 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
[root@ipap ~]# mkfs.gfs2 -p lock_dlm -t cluster1:my-gfs2 -j 4 /dev/hdb11
This will destroy any data on /dev/hdb11.
It appears to contain a ext3 filesystem.
Are you sure you want to proceed? [y/n] y
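The mkfs.gfs2 options used above deserve a short note: `-p lock_dlm` selects DLM locking, `-t` takes a lock table name of the form clustername:fsname (the cluster name has to match the one in cluster.conf), and `-j` sets the number of journals, one for each node that will mount the file system. The sketch below only assembles the command line so it can be checked before it is run; the function name `gfs2_mkfs_cmd` is hypothetical.

```shell
#!/bin/sh
# Print the mkfs.gfs2 command for a DLM-locked GFS2 file system.
# Usage: gfs2_mkfs_cmd CLUSTER_NAME FS_NAME JOURNALS DEVICE
gfs2_mkfs_cmd() {
    echo "mkfs.gfs2 -p lock_dlm -t $1:$2 -j $3 $4"
}

# Example: reproduces the command used above.
gfs2_mkfs_cmd cluster1 my-gfs2 4 /dev/hdb11
```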
3.2 Create the iSCSI target
Create the target
#tgtadm --lld iscsi --op new --mode target --tid 1 -T ipap.2013-12.disk1
Assign a logical unit
#tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/hdb11
Grant access
tgtadm --lld iscsi --op bind --mode target --tid 1 -I 192.168.68.129
Write the commands to the boot script
history |tail -n 4 >>/etc/rc.d/rc.local
vi /etc/rc.d/rc.local    # edit the boot script (remove the history entry numbers)
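Appending `history | tail -n 4` copies the history entry numbers, and whatever the last four commands happened to be, into rc.local, which is why the file has to be edited by hand afterwards. An alternative sketch is to print the three tgtadm commands explicitly. The function name `emit_tgt_setup` is hypothetical; the tgtadm options are exactly the ones used above.

```shell
#!/bin/sh
# Print the tgtadm commands that recreate one target with one LUN and one
# allowed initiator address.
# Usage: emit_tgt_setup TID TARGET_NAME DEVICE INITIATOR_IP
emit_tgt_setup() {
    tid=$1; name=$2; dev=$3; initiator=$4
    echo "tgtadm --lld iscsi --op new --mode target --tid $tid -T $name"
    echo "tgtadm --lld iscsi --op new --mode logicalunit --tid $tid --lun 1 -b $dev"
    echo "tgtadm --lld iscsi --op bind --mode target --tid $tid -I $initiator"
}

# Example (append to the boot script as root):
#   emit_tgt_setup 1 ipap.2013-12.disk1 /dev/hdb11 192.168.68.129 >>/etc/rc.d/rc.local
```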
Show the configuration
[root@localhost yum.repos.d]# tgtadm --lld iscsi --mode target --op show
Target 1: ipap.2013-12.disk1
System information:
Driver: iscsi
State: ready
I_T nexus information:
LUN information:
LUN: 0
Type: controller
SCSI ID: IET 00010000
SCSI SN: beaf10
Size: 0 MB, Block size: 1
Online: Yes
Removable media: No
Readonly: No
Backing store type: null
Backing store path: None
Backing store flags:
LUN: 1
Type: disk
SCSI ID: IET 00010001
SCSI SN: beaf11
Size: 501 MB, Block size: 512
Online: Yes
Removable media: No
Readonly: No
Backing store type: rdwr
Backing store path: /dev/hdb11
Backing store flags:
Account information:
ACL information:
192.168.68.129
Authentication mode
The /etc/iscsi/iscsid.conf configuration file
#node.session.auth.authmethod = CHAP
#node.session.auth.username = username
#node.session.auth.password = password
[root@localhost iscsi]# chkconfig iscsi on
[root@localhost iscsi]# service iscsi restart
Stopping iSCSI daemon: iscsiadm: can not connect to iSCSI daemon (111)!
iscsiadm: initiator reported error (20 - could not connect to iscsid)
iscsiadm: Could not stop iscsid. Trying sending iscsid SIGTERM or SIGKILL signals manually
iscsid is stopped [ OK ]
Turning off network shutdown. Starting iSCSI daemon: [ OK ]
[ OK ]
Setting up iSCSI targets: iscsiadm: No records found!
[ OK ]
rpm -ivh iscsi-initiator-utils-6.2.0.871-0.10.el5.x86_64.rpm
Discover the target
[root@localhost ~]# iscsiadm -m discovery -t sendtargets -p 192.168.68.128:3260
192.168.68.128:3260,1 ipap.2013-12.disk1
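Each line of sendtargets discovery output has the form `PORTAL,TPGT TARGET_NAME`. As a sketch, the filter below turns such lines into the matching login commands; the function name `discovery_to_login` is hypothetical, and it only prints the commands rather than running them.

```shell
#!/bin/sh
# Convert sendtargets discovery output (stdin) into iscsiadm login commands.
# Input line format: PORTAL,TPGT TARGET_NAME
discovery_to_login() {
    while read -r portal target; do
        # Skip blank or malformed lines.
        [ -n "$target" ] || continue
        # ${portal%%,*} drops the ",TPGT" suffix from the portal address.
        echo "iscsiadm -m node -T $target -p ${portal%%,*} -l"
    done
}

# Example with the discovery result from this article:
echo "192.168.68.128:3260,1 ipap.2013-12.disk1" | discovery_to_login
```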
Log in to the target
[root@localhost ~]# iscsiadm -m node -T ipap.2013-12.disk1 -p 192.168.68.128:3260 -l
Logging in to [iface: default, target: ipap.2013-12.disk1, portal: 192.168.68.128,3260]
Login to [iface: default, target: ipap.2013-12.disk1, portal: 192.168.68.128,3260]: successful
[root@localhost ~]# fdisk -l
Disk /dev/hda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/hda1 * 1 13 104391 83 Linux
/dev/hda2 14 1044 8281507+ 8e Linux LVM
Disk /dev/sda: 500 MB, 500580864 bytes
16 heads, 60 sectors/track, 1018 cylinders
Units = cylinders of 960 * 512 = 491520 bytes
Disk /dev/sda doesn't contain a valid partition table
Log out and remove the target
[root@localhost ~]# iscsiadm -m node -T ipap.2013-12.disk1 -p 192.168.68.128:3260 -u
[root@localhost ~]# iscsiadm -m node -o delete -T ipap.2013-12.disk1 -p 192.168.68.128:3260
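3.4 Using shared media with RHCS
In a multi-node RHCS cluster, when one node fails, the cluster's services and resources can move automatically to other nodes, but this failover is conditional. For example, in a four-node cluster, once two nodes fail the whole cluster hangs and the cluster services stop. If the GFS cluster file system is configured, the failure of even a single node causes the GFS file system mounted on all nodes to hang, and the shared storage becomes unusable. That is unacceptable for a high-availability cluster, and a quorum disk is the way to solve it.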
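The four-node example follows from the usual majority rule: a cluster stays quorate only while the live votes are more than half of the expected votes. The sketch below works through that arithmetic, assuming one vote per node; the function names are hypothetical.

```shell
#!/bin/sh
# Votes needed for quorum: more than half of the expected votes.
# Usage: quorum_votes EXPECTED_VOTES
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds if LIVE_VOTES still reaches quorum for EXPECTED_VOTES.
# Usage: has_quorum EXPECTED_VOTES LIVE_VOTES
has_quorum() {
    [ "$2" -ge "$(quorum_votes "$1")" ]
}

# Four nodes with one vote each: quorum is 3, so two survivors are not enough.
echo "quorum for 4 votes: $(quorum_votes 4)"
if has_quorum 4 2; then echo "2 of 4: quorate"; else echo "2 of 4: not quorate"; fi
```

If a quorum disk contributes 3 votes (an assumed value), the expected votes become 7 and the quorum 4, so a single surviving node plus the disk (1 + 3 = 4) keeps the cluster quorate.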
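mkqdisk is a cluster quorum disk utility. It can create a qdisk shared disk and can also display the shared disk's status information. mkqdisk can only create voting space for 16 nodes. Heuristics are an extension option: they let third-party programs help determine a node's state, commonly by pinging a gateway or router, or by running a script. If a heuristic fails, qdiskd considers the node failed and tries to reboot it so that the node returns to a normal state.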
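As a sketch of how a ping heuristic might look in cluster.conf, the function below prints a quorumd element. The function name `emit_quorumd`, the label, the gateway address 192.168.68.1, and the timing values are all assumptions for illustration, and the vote count should be chosen for the actual cluster.

```shell
#!/bin/sh
# Print a <quorumd> element with one ping heuristic for cluster.conf.
# Usage: emit_quorumd LABEL VOTES GATEWAY_IP
emit_quorumd() {
    label=$1; votes=$2; gateway=$3
    echo "<quorumd interval=\"1\" tko=\"10\" votes=\"$votes\" label=\"$label\">"
    # The heuristic passes while a single ping to the gateway succeeds.
    echo "  <heuristic program=\"ping -c1 -w1 $gateway\" score=\"1\" interval=\"2\" tko=\"3\"/>"
    echo '</quorumd>'
}

# The quorum disk itself is created and listed with mkqdisk, for example
# (the device name /dev/sdX is a placeholder):
#   mkqdisk -c /dev/sdX -l myqdisk
#   mkqdisk -L
emit_quorumd myqdisk 3 192.168.68.1
```

The label passed to `mkqdisk -l` has to match the label attribute in the quorumd element so that qdiskd can find the disk.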
To be continued...