Two IBM x3650 M3 servers and one DS3400 Fibre Channel storage array.
Operating system: CentOS 5.9 x64, installed with the GUI, development packages, development libraries, and legacy software development tools.
To avoid interference from the environment, SELinux and the firewall are disabled.
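The SELinux and firewall changes can be made with the usual CentOS 5 commands; a minimal sketch:
# setenforce 0                                                    # turn SELinux off for the running system
# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config    # keep it disabled after reboot
# service iptables stop
# chkconfig iptables off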
Note: the IBM servers use IPMI over LAN as the internal fence device. The dedicated IMM2 port, or the port labeled SYSTEM MGMT, must be connected to a switch and placed on the same subnet as a local IP address.
Hostname: node1
IPMI address: 10.10.10.85/24
eth1: 192.168.233.83/24
eth1:0: 10.10.10.87/24
Hostname: node2
IPMI address: 10.10.10.86/24
eth1: 192.168.233.84/24
eth1:0: 10.10.10.88/24
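The eth1:0 alias addresses on the IPMI subnet can be set up with an interface alias file; a sketch for node2, assuming the values listed above (use 10.10.10.87 on node1):
# cat > /etc/sysconfig/network-scripts/ifcfg-eth1:0 <<EOF
DEVICE=eth1:0
BOOTPROTO=static
IPADDR=10.10.10.88
NETMASK=255.255.255.0
ONBOOT=yes
EOF
# ifup eth1:0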
# cat /etc/hosts
192.168.233.83 node1
192.168.233.84 node2
192.168.233.90 vip
10.10.10.85 node1_ipmi
10.10.10.86 node2_ipmi
# mount /dev/cdrom /mnt
# mount -o loop centos59.iso /mnt
Note: the local installation media is used as the yum repository; either of the two mount commands above works.
# vi /etc/yum.repos.d/centos59.repo
[centos59]
name=Centos59
baseurl=file:///mnt/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-5
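After creating the repo file, refresh the yum cache and confirm the repository is visible:
# yum clean all
# yum repolist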
[root@node1 ~]# dmidecode |grep -C 6 IPMI |tail -n 7
IPMI Device Information
Interface Type: KCS (Keyboard Control Style)
Specification Version: 2.0
I2C Slave Address: 0x10
NV Storage Device: Not Present
Base Address: 0x0000000000000CA2 (I/O)
Register Spacing: Successive Byte Boundaries
# yum install OpenIPMI OpenIPMI-devel OpenIPMI-tools OpenIPMI-libs
# service ipmi start
# chkconfig ipmi on
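To confirm the IPMI driver and local BMC are reachable, the following standard checks can be used:
# lsmod | grep ipmi      # ipmi_si, ipmi_devintf and ipmi_msghandler should be loaded
# ipmitool mc info       # queries the local BMC over the KCS interface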
(3) Perform the basic IPMI network configuration. The following commands set the IP address, netmask, gateway, and LAN access respectively.
IPMI interface IP configuration on node1 and node2
The IPMI address is configured as follows (the example uses node2's address; use 10.10.10.85 on node1):
ipmitool lan set 1 ipaddr 10.10.10.86
ipmitool lan set 1 netmask 255.255.255.0
#ipmitool lan set 1 defgw ipaddr 10.10.10.254
ipmitool lan set 1 access on
ipmitool lan print 1
# Show the current user list
ipmitool user list 1
ipmitool user set password 2 passwd
Note: 2 is the user ID (UID). This setup uses the IBM default username and password, USERID/PASSW0RD.
Check the local node's power status
[root@node1 ~]# ipmitool -H 10.10.10.85 -U USERID -P PASSW0RD power status
Chassis Power is on
Check node2's power status
[root@node1 ~]# ipmitool -H 10.10.10.86 -U USERID -P PASSW0RD power status
Chassis Power is on
Remotely reboot the node2 server
[root@node1 ~]# ipmitool -H 10.10.10.86 -U USERID -P PASSW0RD power reset
Chassis Power Control: Reset
Note: a power status query normally returns "Chassis Power is on"; a power reset returns "Chassis Power Control: Reset".
Other test commands include power on, power off, and reset; the following commands can be used for remote management.
ipmitool -H 10.10.10.86 -U USERID -P PASSW0RD power on
ipmitool -H 10.10.10.86 -U USERID -P PASSW0RD power off
ipmitool -H 10.10.10.86 -U USERID -P PASSW0RD power reset
node1 and node2 are each attached to a 1 TB disk for the cluster file system and a 100 MB quorum disk.
Check the WWN on node1; the HBA slot location (and therefore the host number) varies from machine to machine.
cat /sys/class/fc_host/host5/port_name
0x10000000c9a55a09
Check the WWN on node2
cat /sys/class/fc_host/host5/port_name
0x10000000c9a56308
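Because the host number varies per machine, a small loop can list the WWN of every FC host so the right one can be picked:
# for f in /sys/class/fc_host/host*/port_name; do echo "$f: $(cat $f)"; done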
This article uses in-band management to connect to the IBM DS3400 storage, so the entire configuration can be done remotely.
# tar zxvf SM10.70_Linux_Single-10.70.x5.25.tgz
# cd Linux10p70_single
# cd Linux/
# ls
SMagent-LINUX-10.02.A5.08-1.i386.rpm SMesm-LINUX-10.70.G5.07-1.noarch.rpm SMutil-LINUX-10.00.A5.16-1.i386.rpm
SMclient-LINUX-10.70.G5.25-1.noarch.rpm SMruntime-LINUX-10.70.A5.00-1.i586.rpm
#
# rpm -ivh SMclient-LINUX-10.70.G5.25-1.noarch.rpm SMruntime-LINUX-10.70.A5.00-1.i586.rpm SMesm-LINUX-10.70.G5.07-1.noarch.rpm
Preparing...                ########################################### [100%]
   1:SMesm                  ########################################### [ 33%]
   2:SMruntime              ########################################### [ 67%]
   3:SMclient               ########################################### [100%]
SMmonitor started.
#
# cd /opt/IBM_DS/agent
#
4) Start the storage client and connect to the storage array for configuration and management (it is a GUI; VNC can be used for remote access). Run the following in a terminal window.
# cd /opt/IBM_DS/client
# ./SMclient
# rdac-LINUX-09.03.0C05.0652-source.tar
# tar zxvf rdac-LINUX-09.03.0C05.0331-source.tar.gz
# chmod -R +x /tmp/linuxrdac-09.03.0C05.0331
# cd linuxrdac-09.03.0C05.0652
# make clean
# make
# make install
# reboot
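After the reboot, the RDAC multipath driver can be checked roughly as follows (a sketch assuming the linuxrdac/mpp driver layout; module and proc names may differ between versions):
# lsmod | egrep 'mppUpper|mppVhba'    # RDAC kernel modules
# ls /proc/mpp/                       # one entry per discovered storage array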
192.168.233.83 node1 (management node)
192.168.233.84 node2
Install ricci, rgmanager, gfs, and cman
(1) Install the RHCS packages on node1 (the management node). luci is the management-side package and is installed only on the management node.
yum install luci ricci cman cman-devel gfs2-utils rgmanager system-config-cluster -y
chkconfig luci on
chkconfig ricci on
chkconfig rgmanager on
chkconfig cman on
service rgmanager start
service ricci start
service cman start
[root@node1 cluster]# service cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... failed
/usr/sbin/cman_tool: ccsd is not running
[FAILED]
yum install ricci cman cman-devel gfs2-utils rgmanager system-config-cluster -y
chkconfig ricci on
chkconfig rgmanager on
chkconfig cman on
service rgmanager start
service ricci start
service cman start
[root@node2 cluster]# service cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... failed
/usr/sbin/cman_tool: ccsd is not running
[FAILED]
***************************************
Starting cluster:
Loading modules... failed
FATAL: Module lock_dlm not found. [FAILED]
This is because the node has not yet joined a cluster, so the configuration file /etc/cluster/cluster.conf has not been generated yet.
***************************************
A configuration failure example: a manually specified multicast address prevented the cluster nodes from communicating with each other.
Aug 20 14:55:17 node1 ccsd[9977]: Cluster is not quorate. Refusing connection.
Aug 20 14:55:17 node1 ccsd[9977]: Error while processing connect: Connection refused
Aug 20 14:55:17 node1 ccsd[9977]: Cluster is not quorate. Refusing connection.
Aug 20 14:55:17 node1 ccsd[9977]: Error while processing connect: Connection refused
***************************************
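When a multicast address is specified by hand it appears in /etc/cluster/cluster.conf roughly as in the fragment below (the address shown is only an illustration); removing the manual setting and letting cman choose its default multicast address avoids this kind of failure:
<cman>
    <multicast addr="239.192.x.x"/>
</cman>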
Note: perform this on the management node.
# luci_admin init
Initializing the luci server
Creating the 'admin' user
Enter password:
Confirm password:
Please wait...
The admin password has been successfully set.
Generating SSL certificates...
The luci server has been successfully initialized
# service luci start
https://192.168.233.83:8084
admin/111111
Log in to the management interface, click cluster -> Create a New Cluster, and fill in the following:
Cluster Name: rhcs
Select the following options and submit; the cluster has to go through the install, reboot, config, and join stages before creation succeeds.
Use locally installed packages.
Enable shared storage support
check if node passwords are identical
Notes:
(1) This step generates the cluster configuration file /etc/cluster/cluster.conf.
(2) The configuration file can also be created by hand.
SSH to node1 and node2 and start the cman service again, as shown below.
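On each node:
# service cman start
# clustat        # both nodes should show up as Online once cman has started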
Note:
For RHCS to provide full cluster functionality, fencing must be working.
It is only because a fence device is available that the RHCS HA functionality can be fully tested.
(1) Log in to the management interface and click cluster -> Cluster List.
(2) Select node1 and node2 in turn and perform the following:
(3) Choose "Add a fence device to this level" and select IPMI LAN.
Failover Domains -> Add
Name: rhcs_failover
Check Prioritized.
Set No Failback according to your own needs.
Check both nodes and set their priorities.
Click Submit.
On node1, create the LVM volume on the shared disk (a sketch of the commands follows):
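A sketch of the clustered LVM setup on the shared 1 TB disk, assuming /dev/sdb1 is the shared partition and using the vg/cluster names that appear later in this article:
# lvmconf --enable-cluster            # locking_type=3 for clustered LVM
# service clvmd start && chkconfig clvmd on
# pvcreate /dev/sdb1
# vgcreate vg /dev/sdb1
# lvcreate -l 100%FREE -n cluster vg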
On node2: the LVM configuration created on node1 is visible directly on node2.
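For example, the new volume should be listed by the standard LVM commands on node2:
# lvscan
# vgs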
On node1, create the GFS2 file system; this only needs to be done on one node (the command is sketched after the notes below).
Notes:
In rhcs:gfs2, rhcs is the cluster name and gfs2 is a user-defined file system (lock table) name, something like a label.
-j specifies the number of hosts that will mount this file system (the number of journals).
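Putting the lock table name and journal count together, the file system would be created on one node roughly as follows (a sketch based on the names used in this article):
# mkfs.gfs2 -p lock_dlm -t rhcs:gfs2 -j 2 /dev/vg/cluster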
Create the GFS mount point on node1 and node2
# mkdir /cluster
# mount.gfs2 /dev/vg/cluster /cluster
# vi /etc/fstab
/dev/mapper/vg-cluster  /cluster  gfs2  defaults  0 0
[root@node1 ~]# mkdir /cluster
[root@node1 ~]# mount.gfs2 /dev/vg/cluster /cluster
[root@node1 ~]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  107G  9.2G   93G  10% /
/dev/sda1                         99M   17M   77M  18% /boot
tmpfs                             12G     0   12G   0% /dev/shm
/dev/mapper/vg-cluster           1.0T  389M  1.0T   1% /cluster
[root@node1 ~]#
[root@node2 ~]# mount.gfs2 /dev/vg/cluster /cluster
[root@node2 ~]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  107G  9.2G   93G  10% /
/dev/sda1                         99M   17M   77M  18% /boot
tmpfs                             12G     0   12G   0% /dev/shm
/dev/mapper/vg-cluster           1.0T  389M  1.0T   1% /cluster
[root@node2 ~]#
Note: the quorum (voting) disk is a shared disk and does not need to be large; this example uses a 100 MB partition, /dev/sdc1.
[root@node1 ~]# fdisk -l

Disk /dev/sda: 145.9 GB, 145999527936 bytes
255 heads, 63 sectors/track, 17750 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       17750   142472452+  8e  Linux LVM

Disk /dev/sdb: 1099.5 GB, 1099511627776 bytes
255 heads, 63 sectors/track, 133674 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      133674  1073736373+  8e  Linux LVM

Disk /dev/sdc: 104 MB, 104857600 bytes
4 heads, 50 sectors/track, 1024 cylinders
Units = cylinders of 200 * 512 = 102400 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1        1024      102375   83  Linux
[root@node1 ~]#
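The quorum partition is then initialized with mkqdisk; the label qdisk matches the mkqdisk -L output shown below:
# mkqdisk -c /dev/sdc1 -l qdisk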
[root@node1 ~]# mkqdisk -L
mkqdisk v0.6.0
/dev/disk/by-id/scsi-3600a0b80007573a5000006f153e7e43c-part1:
/dev/sdc1:
Magic: eb7a62c2
Label: qdisk
Created: Tue Aug 19 22:09:48 2014
Host: node1
Kernel Sector Size: 512
Recorded Sector Size: 512
[root@node1 ~]#
[root@node2 cluster]# mkqdisk -L
mkqdisk v0.6.0
/dev/disk/by-id/scsi-3600a0b80007573a5000006f153e7e43c-part1:
/dev/sdc1:
Magic: eb7a62c2
Label: qdisk
Created: Tue Aug 19 22:09:48 2014
Host: node1
Kernel Sector Size: 512
Recorded Sector Size: 512
[root@node2 cluster]#
(3) Configure the quorum disk (qdisk). The heuristic IP is usually the gateway address; it must be reachable by ping, so make sure the gateway does not block ping.
# In the management interface, go to cluster -> Cluster List and click Cluster Name: rhcs.
# Select "Quorum Partition" and choose "Use a Quorum Partition".
interval : 2
votes : 2
TKO : 10
Minimum Score: 1
Device : /dev/sdc1
Path to program : ping -c3 -t2 192.168.233.254
Interval : 3
Score : 2
# Click Apply.
chkconfig qdiskd on
service qdiskd start
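Once qdiskd is running on both nodes, quorum and the quorum disk can be checked with the standard cluster tools:
# clustat            # the quorum disk should appear as an online member
# cman_tool status   # shows node votes, expected votes and quorum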
<?xml version="1.0"?>
<cluster alias="rhcs" config_version="12" name="rhcs">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="node2" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="node2_ipmi"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="node1" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="node1_ipmi"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <fencedevices>
        <fencedevice agent="fence_ipmilan" auth="PASSWORD" ipaddr="10.10.10.85" login="USERID" name="node1_ipmi" passwd="PASSW0RD"/>
        <fencedevice agent="fence_ipmilan" auth="PASSWORD" ipaddr="10.10.10.86" login="USERID" name="node2_ipmi" passwd="PASSW0RD"/>
    </fencedevices>
    <rm>
        <failoverdomains>
            <failoverdomain name="rhcs_failover" nofailback="0" ordered="1" restricted="0">
                <failoverdomainnode name="node2" priority="2"/>
                <failoverdomainnode name="node1" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources/>
    </rm>
    <quorumd device="/dev/sdc1" interval="2" min_score="1" tko="10" votes="2">
        <heuristic interval="3" program="ping -c3 -t2 192.168.233.254" score="2"/>
    </quorumd>
    <cman expected_votes="4"/>
</cluster>
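If cluster.conf is later edited by hand, bump config_version and push the new version to the other node; on CentOS 5 this can be done with:
# ccs_tool update /etc/cluster/cluster.conf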