Corosync+Pacemaker+Ldirectord+Lvs+Httpd
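I. Hardware environment
Four virtual machines on the same network segment
Operating system: CentOS 6.3
Script to disable unnecessary system services: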
#!/bin/bash
# Keep only the essential services enabled at boot, disable the rest, then reboot.
services=`chkconfig --list|cut -f1|cut -d" " -f1`
for ser in $services
do
    if [ "$ser" == "network" ] || [ "$ser" == "rsyslog" ] || [ "$ser" == "sshd" ] || [ "$ser" == "crond" ] || [ "$ser" == "atd" ]; then
        chkconfig "$ser" on
    else
        chkconfig "$ser" off
    fi
done
reboot
II. IP address plan
master  172.30.82.45
slave   172.30.82.58
node1   172.30.82.3
node2   172.30.82.11
VIP     172.30.82.61
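The address plan can be turned into /etc/hosts entries with a small helper. This is only a sketch: the function names print_cluster_hosts and lookup_ip are made up for illustration, and leaving the VIP out of the hosts file is an assumption (it floats between master and slave), not something the setup requires.

```shell
#!/bin/bash
# Print /etc/hosts lines for the four hosts in the address plan.
# Assumption: the VIP is left out because it moves between the directors.
print_cluster_hosts() {
    cat <<'EOF'
172.30.82.45   master
172.30.82.58   slave
172.30.82.3    node1
172.30.82.11   node2
EOF
}

# Print the IP recorded for a given hostname (prints nothing if unknown).
lookup_ip() {
    print_cluster_hosts | awk -v h="$1" '$2 == h { print $1 }'
}

print_cluster_hosts
```

On each node, something like `print_cluster_hosts >> /etc/hosts` (run as root) would append the entries.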
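III. Notes:
1. Synchronize the time across all nodes
ntpdate 172.30.82.254 &>/dev/null
2. Make the nodes reachable from each other by hostname by editing /etc/hosts on every node
3. The output of uname -n must match the hostname
4. Make sure ldirectord is not started at boot, since the cluster will manage it as a resource
chkconfig ldirectord off
5. Turn off SELinux enforcement
setenforce 0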
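Two of the notes above can be checked or made persistent with a short script. The sketch below is an illustration under assumptions: names_match and set_selinux_permissive are hypothetical helper names, and the config file path is passed as a parameter (on CentOS the usual file is /etc/selinux/config). The setenforce command only changes the running mode until the next reboot, which is why the second helper edits the config file.

```shell
#!/bin/bash
# Return 0 when the two names are non-empty and identical.
# Both values are passed in so the check itself touches nothing.
names_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Rewrite the SELINUX= line of the given config file to "permissive".
# The SELINUXTYPE= line is left alone because the pattern requires "SELINUX=".
set_selinux_permissive() {
    local cfg="$1" tmp
    tmp=$(mktemp) || return 1
    sed 's/^SELINUX=.*/SELINUX=permissive/' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    local rc=$?
    rm -f "$tmp"
    return $rc
}

if names_match "$(uname -n)" "$(hostname)"; then
    echo "hostname check: ok"
else
    echo "hostname check: mismatch"
fi
```

To persist the SELinux setting one might run `set_selinux_permissive /etc/selinux/config` as root on each node.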
IV. Downloading and installing the software
Starting with pacemaker 1.1.8, crm became an independent project called crmsh. In other words, installing pacemaker no longer provides the crm command; to manage cluster resources, crmsh has to be installed separately.
pssh-2.3.1-4.1.x86_64.rpm, crmsh-2.1-1.6.x86_64.rpm, python-pssh-2.3.1-4.1.x86_64.rpm download location:
http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/x86_64/
libdnet download location:
http://dl.fedoraproject.org/pub/epel/6/x86_64/repoview/letter_l.group.html
ldirectord-3.9.6-0rc1.1.1.x86_64.rpm download location:
http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/x86_64/
Install the packages:
yum install -y corosync pacemaker libesmtp
yum install -y python-dateutil python-lxml redhat-rpm-config cluster-glue cluster-glue-libs resource-agents
yum --nogpgcheck localinstall pssh-2.3.1-4.1.x86_64.rpm crmsh-2.1-1.6.x86_64.rpm python-pssh-2.3.1-4.1.x86_64.rpm ldirectord-3.9.6-0rc1.1.1.x86_64.rpm
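V. Configuring high availability on the director nodes
1. Copy the configuration files
cd /etc/corosync
cp corosync.conf.example corosync.conf
cp /usr/share/doc/ldirectord-3.9.6/ldirectord.cf /etc/ha.d/
2. Generate the authentication key (corosync-keygen writes /etc/corosync/authkey)
corosync-keygen
3. Edit corosync.conf
totem {
    version: 2
    # whether to enable key-based authentication
    secauth: off
    # number of threads used for authenticating cluster messages
    threads: 0
    interface {
        # ring number of this interface, used to tell redundant rings apart
        ringnumber: 0
        # network the cluster runs on
        bindnetaddr: 172.30.82.0
        # multicast address used for cluster announcements
        mcastaddr: 239.238.16.1
        # service port
        mcastport: 5405
        ttl: 1
    }
}
logging {
    # whether to print file names and line numbers in the log
    fileline: off
    # whether to log to standard error (the terminal)
    to_stderr: no
    # log to a file
    to_logfile: yes
    logfile: /var/log/corosync.log
    # whether to log to syslog
    to_syslog: no
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
service {
    # start pacemaker when corosync starts
    ver: 0
    name: pacemaker
}
4. Edit the ldirectord configuration file ldirectord.cf (the lines belonging to a virtual service must be indented under its virtual= line)
checktimeout=3                      # health check timeout
checkinterval=1                     # interval between health checks
autoreload=yes                      # reload the configuration automatically when the file changes
logfile="/var/log/ldirectord.log"   # log file path
#logfile="local0"                   # alternative: log to syslog facility local0
quiescent=no                        # remove a failed real server from the LVS table; re-add it automatically on recovery
virtual=172.30.82.61:80             # listen on port 80 of the VIP
    real=172.30.82.3:80 gate        # real server IP and port; gate = direct routing
    real=172.30.82.11:80 gate
    fallback=127.0.0.1:80 gate      # if all real servers are down, fall back to the loopback address
    service=http                    # the service is HTTP
    request=".text.html"            # page under the real servers' web root, requested to decide whether a real server is alive
    receive="OK"                    # content expected in the response
    scheduler=rr                    # scheduling algorithm
    protocol=tcp                    # protocol
    checktype=negotiate             # check type
    checkport=80                    # port to check
5. Copy the configuration files to the standby node:
scp -p /etc/corosync/authkey /etc/corosync/corosync.conf slave:/etc/corosync/
scp -p /etc/ha.d/ldirectord.cf slave:/etc/ha.d/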
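The bindnetaddr value in corosync.conf is the network address of the cluster interface, which is the node IP AND-ed with its netmask. A minimal sketch of that calculation follows; network_addr is a hypothetical helper name, and the 255.255.255.0 netmask is an assumption based on the /24 prefix used for the VIP later in this article.

```shell
#!/bin/bash
# Compute the network address for an IPv4 address and a dotted netmask
# by AND-ing the two octet by octet.
network_addr() {
    # Split both arguments on dots inside a subshell so the IFS change stays local.
    (
        IFS=.
        set -- $1 $2
        echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
    )
}

# master's address with the assumed /24 netmask
network_addr 172.30.82.45 255.255.255.0
```

For master and slave this yields the same network, which matches the bindnetaddr used in the configuration above.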
VI. Real server configuration script for the DR model:
#!/bin/bash
# Binds the VIP to lo:0 and suppresses ARP replies for it on a real server.
VIP=172.30.82.61
host=`/bin/hostname`
case "$1" in
start)
    # Start LVS-DR real server on this machine.
    /sbin/ifconfig lo down
    /sbin/ifconfig lo up
    echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
    echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
    echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
    echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce
    /sbin/ifconfig lo:0 $VIP netmask 255.255.255.255 up
    /sbin/route add -host $VIP dev lo:0
    ;;
stop)
    # Stop LVS-DR real server loopback device(s).
    /sbin/ifconfig lo:0 down
    echo "0" >/proc/sys/net/ipv4/conf/lo/arp_ignore
    echo "0" >/proc/sys/net/ipv4/conf/lo/arp_announce
    echo "0" >/proc/sys/net/ipv4/conf/all/arp_ignore
    echo "0" >/proc/sys/net/ipv4/conf/all/arp_announce
    ;;
status)
    # Status of LVS-DR real server.
    islothere=`/sbin/ifconfig lo:0 | grep $VIP`
    isrothere=`netstat -rn | grep "lo" | grep $VIP`
    if [ ! "$islothere" -o ! "$isrothere" ];then
        # Either the route or the lo:0 device
        # not found.
        echo "LVS-DR real server is stopped."
    else
        echo "LVS-DR real server is running."
    fi
    ;;
*)
    # Invalid entry.
    echo "$0: Usage: $0 {start|status|stop}"
    exit 1
    ;;
esac
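The /proc/sys paths written by the start branch correspond one-to-one to sysctl keys, so the same ARP settings could also be kept in /etc/sysctl.conf to survive a reboot. The sketch below prints such a fragment; proc_to_sysctl_key and print_arp_sysctl are hypothetical helper names, and the simple slash-to-dot mapping is only assumed to hold for the interface names used here (lo and all).

```shell
#!/bin/bash
# Convert a /proc/sys path into the matching sysctl key,
# e.g. /proc/sys/net/ipv4/ip_forward -> net.ipv4.ip_forward
proc_to_sysctl_key() {
    echo "$1" | sed -e 's|^/proc/sys/||' -e 's|/|.|g'
}

# Print the four ARP settings from the start branch in sysctl.conf syntax.
print_arp_sysctl() {
    for iface in lo all; do
        printf '%s = 1\n' "$(proc_to_sysctl_key "/proc/sys/net/ipv4/conf/$iface/arp_ignore")"
        printf '%s = 2\n' "$(proc_to_sysctl_key "/proc/sys/net/ipv4/conf/$iface/arp_announce")"
    done
}

print_arp_sysctl
```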
VII. Installing httpd on the real servers and adding test pages
1. node1
yum install -y httpd
echo "Welcome to realserver 1" >/var/www/html/index.html
echo "OK" >/var/www/html/.text.html
service httpd start
2. node2
yum install -y httpd
echo "Welcome to realserver 2" >/var/www/html/index.html
echo "OK" >/var/www/html/.text.html
service httpd start
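The .text.html page exists so that ldirectord can request it and look for the expected string in the response. The following sketch imitates that check by hand so a real server can be tested before the cluster is started. It is an approximation, not ldirectord's own code: response_ok and check_real are hypothetical helper names, and the 3-second curl timeout simply mirrors checktimeout.

```shell
#!/bin/bash
# Return 0 when the response body contains the expected pattern.
response_ok() {
    printf '%s\n' "$1" | grep -q -- "$2"
}

# Hypothetical helper: fetch the check page from a real server (host:port)
# and test it against the expected string "OK".
check_real() {
    body=$(curl -s --max-time 3 "http://$1/.text.html") || return 1
    response_ok "$body" "OK"
}
```

For example, `check_real 172.30.82.3:80 && echo alive` could be run from a director once httpd is up on node1.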
VIII. Starting, configuring and testing the high-availability cluster
1. Run on master
service corosync start
ssh slave 'service corosync start'
Note: start corosync on slave from master with the ssh command above; do not start it directly on slave.
Check that the corosync engine started correctly:
[root@master corosync]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/corosync.log
May 19 23:11:05 corosync [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055.
May 19 23:11:46 corosync [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
May 19 23:11:46 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'
Check that the initial membership notifications were sent:
[root@master corosync]# grep TOTEM /var/log/corosync.log
May 19 19:59:44 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
May 19 19:59:44 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
May 19 19:59:44 corosync [TOTEM ] The network interface [172.30.82.45] is now up.
May 19 19:59:44 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors occurred during startup:
[root@master corosync]# grep ERROR: /var/log/corosync.log
Check that pacemaker started correctly:
[root@master corosync]# grep pcmk_startup /var/log/corosync.log
May 19 23:11:46 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
May 19 23:11:46 corosync [pcmk ] Logging: Initialized pcmk_startup
May 19 23:11:46 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
May 19 23:11:46 corosync [pcmk ] info: pcmk_startup: Service: 9
May 19 23:11:46 corosync [pcmk ] info: pcmk_startup: Local hostname: master
Check the status of the cluster nodes with the following command:
[root@master corosync]# crm status
Last updated: Wed May 20 00:10:38 2015
Last change: Tue May 19 22:49:50 2015
Stack: classic openais (with plugin)
Current DC: slave - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ master slave ]
2. Configure the cluster resources: two primitive resources and one group
a. Configure the VIP
crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=172.30.82.61 nic=eth0 cidr_netmask=24
b. Configure the ldirectord service resource
crm(live)configure# primitive ldir lsb:ldirectord
c. Configure the group. A group keeps its member resources on the same node; by default the cluster spreads resources across the nodes.
crm(live)configure# group lvsserver vip ldir
d. Instead of a group, resource stickiness and constraints can also express where resources should run. The following are examples only:
Order constraint: the order in which resources start
crm(live)configure# order vip_before_ldir Mandatory: vip ldir
Colocation constraint: which resources run together
crm(live)configure# colocation ldir_with_vip inf: ldir vip
Location constraint: which node a resource prefers
crm(live)configure# location vip_on_master vip rule 100: #uname eq master
e. Other settings
Disable STONITH
crm(live)configure# property stonith-enabled=false
Set the no-quorum policy to ignore. With only two nodes, losing one node also loses quorum, so this is the only setting that lets the surviving node keep running the resources.
crm(live)configure# property no-quorum-policy=ignore
The architecture of corosync, how it works, and the configuration commands are left for the reader to study; this article focuses on building and testing the environment.
Show the cluster configuration (CIB):
crm(live)configure# show
node master
node slave
primitive ldir lsb:ldirectord
primitive vip IPaddr \
    params ip=172.30.82.61 nic=eth0 cidr_netmask=24
group lvsserver vip ldir
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false \
    no-quorum-policy=ignore
Verify the configuration syntax:
crm(live)configure# verify
If no errors are reported, commit the configuration:
crm(live)configure# commit
3. Test the cluster service by accessing 172.30.82.61 from a client
a. Run on master:
[root@master corosync]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.30.82.61:80 rr
  -> 172.30.82.3:80               Route   1      0          13
  -> 172.30.82.11:80              Route   1      0          14
b. Run on slave:
[root@slave ha.d]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
This shows that the cluster resources are running only on master.
4. Resource failover test
a. Run on master
service corosync stop
[root@master log]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
b. Run on slave
[root@slave log]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.30.82.61:80 rr
  -> 172.30.82.3:80               Route   1      1          17
  -> 172.30.82.11:80              Route   1      0          18
This shows that the cluster resources moved to slave successfully.
c. Back-end failure detection. Run on node1:
service httpd stop
Check the cluster service on master:
[root@master log]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.30.82.61:80 rr
  -> 172.30.82.11:80              Route   1      0          0
Restore the service on node1:
service httpd start
[root@master log]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.30.82.61:80 rr
  -> 172.30.82.11:80              Route   1      0
  -> 172.30.82.3:80               Route   1
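The ipvsadm checks in these tests come down to counting the real servers listed under the virtual service: two on the active director, none on the standby, one while a real server is down. A small sketch that does this counting from `ipvsadm -Ln` output is shown below; count_reals is a hypothetical helper name, and the parsing assumes the output layout shown in this article.

```shell
#!/bin/bash
# Count real-server lines ("-> ip:port ...") listed under one virtual
# service in `ipvsadm -Ln` output read from standard input.
count_reals() {
    awk -v vs="$1" '
        # A TCP/UDP line opens a virtual service; remember whether it is ours.
        $1 == "TCP" || $1 == "UDP" { in_vs = ($2 == vs); next }
        # Real-server lines start with "->" followed by a numeric address.
        in_vs && $1 == "->" && $2 ~ /^[0-9]/ { n++ }
        END { print n + 0 }
    '
}
```

On a director this could be used as `ipvsadm -Ln | count_reals 172.30.82.61:80`.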