Corosync+Pacemaker+NFS+httpd High-Availability Web Cluster (Configured with the pcs Resource Management Tool)
Stack: pcs (Corosync+Pacemaker) + NFS + httpd
Cluster node 1: 192.168.88.132 cen7.field.com
Cluster node 2: 192.168.88.133 node2.field.com
Cluster node 3: 192.168.88.134 node1.field.com
VIP: 192.168.88.188  Resource agent: ocf:heartbeat:IPaddr
NFS server: node1.field.com  Resource agent: ocf:heartbeat:Filesystem
Web servers: cen7.field.com, node2.field.com, node1.field.com  Resource agent: systemd:httpd
Cluster prerequisites:
(1) Time is synchronized across all nodes;
(2) Nodes can reach one another by the hostnames they currently use;
(3) Decide whether a quorum device is needed.
For the prerequisite setup, see 《使用资源管理工具pcs配置Corosync+pacemaker》.
For configuring the NFS server, see 《Corosync+pacemaker+nfs+httpd高可用web集群》(使用资源管理工具crmsh配置).
For installing and configuring httpd on each node, see the same crmsh article.
Those steps are not repeated here; a quick verification sketch follows.
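As a sanity check, the first two prerequisites can be spot-checked with ansible ad-hoc commands (a sketch, assuming the same hacluster inventory group used throughout this article):
[root@cen7 ~]# ansible hacluster -m shell -a 'date; hostname'
# the dates should agree across nodes; hostnames must match the names used below
[root@cen7 ~]# ansible hacluster -m shell -a 'getent hosts cen7.field.com node1.field.com node2.field.com'
# every node must resolve all three cluster hostnames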
I. Remove the old crmsh configuration and stop corosync and pacemaker
1. In crm interactive mode, delete the previously defined resources
[root@cen7 corosync]# crm configure
crm(live)configure# edit
ERROR: Cannot delete running resources: webip, webserver, webstore
Edit or discard changes (yes to edit, no to discard) (y/n)? n
This error means running resources cannot be deleted; stop them before deleting.
# resource: the resource management subcommand; running resources are stopped and started here
crm(live)configure# cd
crm(live)# cd resource
crm(live)resource# help
# Usage for any command can be viewed with "help [command]"
crm(live)resource# help stop
Usage:
stop
# Stop the previously defined resources: webip, webserver, webstore
crm(live)resource# stop webip webserver webstore
# Delete the previously defined resources with delete
crm(live)resource# cd ../configure
crm(live)configure# delete webip webserver webstore
INFO: hanging order:webserver_after_webstore deleted
INFO: constraint colocation:webserver_with_webstore_and_webip updated
INFO: hanging order:webstore_after_webip deleted
INFO: hanging colocation:webserver_with_webstore_and_webip deleted
INFO: hanging location:webservice_prefer_cen7 deleted
crm(live)configure# verify
ERROR: Warnings found during check: config may not be valid
Note: leftover, now-meaningless configuration entries cause the ERROR above. Use show to inspect the configuration, then edit to remove the invalid entries directly (:wq saves and exits).
crm(live)configure# show
node 1: cen7.field.com \
attributes standby=off
node 2: node2.field.com
node 3: node1.field.com
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.18-11.el7_5.3-2b07d5c5a9 \
cluster-infrastructure=corosync \
stonith-enabled=false \
default-resource-stickiness=50
crm(live)configure# commit
2. Confirm the pcs status
[root@cen7 corosync]# pcs status
Cluster name:
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug 2 10:46:11 2018
Last change: Thu Aug 2 10:45:10 2018 by root via cibadmin on cen7.field.com
3 nodes configured
0 resources configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/enabled
pcsd: inactive/disabled
3. Check the configuration files and back up the corosync.conf used under crmsh management
[root@cen7 corosync]# ls
authkey corosync.conf corosync.conf.example corosync.conf.example.udpu corosync.xml.example uidgid.d
[root@cen7 corosync]# cp corosync.conf corosync.conf.bak080215
Stop the pacemaker and corosync services:
[root@cen7 corosync]# ansible hacluster -m service -a 'name=pacemaker state=stopped'
[root@cen7 corosync]# ansible hacluster -m service -a 'name=corosync state=stopped'
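To confirm both services are really down on every node (a sketch; || true keeps ansible from treating the non-zero exit code of an inactive unit as a failure):
[root@cen7 corosync]# ansible hacluster -m shell -a 'systemctl is-active corosync pacemaker || true'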
II. Install and configure the pcs cluster management tool
1. Install pcs
[root@cen7 corosync]# ansible hacluster -m yum -a 'name=pcs state=latest'
2. Configure pcs
1) Set the password for hacluster, the default user created by the pcs installation; use the same password on all nodes.
[root@cen7 corosync]# ansible hacluster -m shell -a 'echo hacluster |passwd --stdin hacluster'
192.168.88.133 | SUCCESS | rc=0 >>
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
192.168.88.134 | SUCCESS | rc=0 >>
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
192.168.88.132 | SUCCESS | rc=0 >>
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
2) Authenticate the cluster nodes to one another
Note: authentication must use hacluster, the default user created when pcs is installed; other users will not pass.
[root@cen7 corosync]# pcs cluster auth cen7.field.com node1.field.com node2.field.com -u hacluster
Password:
Error: Unable to communicate with node1.field.com
Error: Unable to communicate with cen7.field.com
Error: Unable to communicate with node2.field.com
Authentication failed here. Work through the usual suspects one by one: iptables, SELinux, firewalld (CentOS 7 enables firewalld by default).
In this case all nodes failed because the pcsd service was not running.
[root@cen7 corosync]# firewall-cmd --state
not running
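firewalld is confirmed off, so the remaining suspects can be checked the same way (a sketch):
[root@cen7 corosync]# ansible hacluster -m shell -a 'getenforce'
# SELinux mode; Enforcing alone rarely blocks pcs authentication
[root@cen7 corosync]# ansible hacluster -m shell -a 'systemctl is-active pcsd || true'
# pcsd is inactive here (see the Daemon Status output above) -- the actual cause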
[root@cen7 corosync]# ansible hacluster -m service -a 'name=pcsd state=started enabled=yes'
[root@cen7 corosync]# pcs cluster auth cen7.field.com node1.field.com node2.field.com -u hacluster
Password:
node1.field.com: Authorized
cen7.field.com: Authorized
node2.field.com: Authorized
3) Create the cluster
[root@cen7 corosync]# pcs cluster setup --name=mycluster cen7.field.com node1.field.com node2.field.com
Error: cen7.field.com: node is already in a cluster --> the nodes already belong to a cluster; use --force to destroy it and create the new one
Error: node1.field.com: node is already in a cluster
Error: node2.field.com: node is already in a cluster
Error: nodes availability check failed, use --force to override. WARNING: This will destroy existing cluster on the nodes.
[root@cen7 corosync]# pcs cluster setup --name=mycluster cen7.field.com node1.field.com node2.field.com --force
Destroying cluster on nodes: cen7.field.com, node1.field.com, node2.field.com...
node1.field.com: Stopping Cluster (pacemaker)...
node2.field.com: Stopping Cluster (pacemaker)...
cen7.field.com: Stopping Cluster (pacemaker)...
node2.field.com: Successfully destroyed cluster
node1.field.com: Successfully destroyed cluster
cen7.field.com: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'cen7.field.com', 'node1.field.com', 'node2.field.com'
node1.field.com: successful distribution of the file 'pacemaker_remote authkey'
node2.field.com: successful distribution of the file 'pacemaker_remote authkey'
cen7.field.com: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
cen7.field.com: Succeeded
node1.field.com: Succeeded
node2.field.com: Succeeded
Synchronizing pcsd certificates on nodes cen7.field.com, node1.field.com, node2.field.com...
node1.field.com: Success
cen7.field.com: Success
node2.field.com: Success
Restarting pcsd on the nodes in order to reload the certificates...
node1.field.com: Success
cen7.field.com: Success
node2.field.com: Success
# The output above shows the cluster was set up successfully
4) Check pcs status: no cluster is currently running on this node
[root@cen7 corosync]# pcs status
Error: cluster is not currently running on this node
III. Configure Corosync
1. Edit the Corosync configuration file
[root@cen7 corosync]# ls
corosync.conf corosync.conf.bak080215 corosync.conf.example corosync.conf.example.udpu corosync.xml.example uidgid.d
[root@cen7 corosync]# vim corosync.conf
[root@cen7 corosync]# grep -v '^[[:space:]]*#' corosync.conf
totem {
version: 2
cluster_name: mycluster
secauth: off
transport: udpu
}
nodelist {
node {
ring0_addr: cen7.field.com
nodeid: 1
}
node {
ring0_addr: node1.field.com
nodeid: 2
}
node {
ring0_addr: node2.field.com
nodeid: 3
}
}
quorum {
provider: corosync_votequorum
}
logging {
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: yes
}
Configure the remaining nodes
All nodes share the same Corosync configuration and authentication key, so copying corosync.conf and authkey to node2 and node1 completes their configuration:
[root@cen7 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/
# scp -p preserves the files' original permissions and timestamps
authkey 100% 128 75.3KB/s 00:00
corosync.conf 100% 3031 1.7MB/s 00:00
[root@cen7 corosync]# scp -p authkey corosync.conf node1:/etc/corosync/
authkey 100% 128 49.4KB/s 00:00
corosync.conf 100% 3131 1.1MB/s 00:00
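Equivalently, ansible's copy module can push both files to the whole group at once (a sketch; copying onto cen7 itself is a harmless overwrite with identical content):
[root@cen7 corosync]# ansible hacluster -m copy -a 'src=/etc/corosync/corosync.conf dest=/etc/corosync/ mode=0644'
[root@cen7 corosync]# ansible hacluster -m copy -a 'src=/etc/corosync/authkey dest=/etc/corosync/ mode=0400'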
2. Start all cluster nodes and confirm their status
[root@cen7 corosync]# pcs cluster start --all
cen7.field.com: Starting Cluster...
node1.field.com: Starting Cluster...
node2.field.com: Starting Cluster...
1) View cluster node ring status with "corosync-cfgtool -s"
All three nodes (192.168.88.132/133/134) started successfully; each reports: ring 0 active with no faults
[root@cen7 corosync]# ansible hacluster -m shell -a 'corosync-cfgtool -s'
192.168.88.133 | SUCCESS | rc=0 >>
Printing ring status.
Local node ID 3
RING ID 0
id = 192.168.88.133
status = ring 0 active with no faults
192.168.88.134 | SUCCESS | rc=0 >>
Printing ring status.
Local node ID 2
RING ID 0
id = 192.168.88.134
status = ring 0 active with no faults
192.168.88.132 | SUCCESS | rc=0 >>
Printing ring status.
Local node ID 1
RING ID 0
id = 192.168.88.132
status = ring 0 active with no faults
2) View cluster membership with "corosync-cmapctl"
[root@cen7 corosync]# corosync-cmapctl |grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.88.132)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.88.134)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
runtime.totem.pg.mrp.srp.members.3.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(192.168.88.133)
runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.3.status (str) = joined
3) Check cluster node status
[root@cen7 corosync]# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug 2 11:04:41 2018
Last change: Thu Aug 2 11:03:03 2018 by hacluster via crmd on node1.field.com
3 nodes configured
0 resources configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
4) Validate the configuration with "crm_verify -L -V"
[root@cen7 corosync]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
These errors appear because STONITH is enabled by default. Setting the cluster property below disables STONITH and clears them:
[root@cen7 corosync]# pcs property set stonith-enabled=false
[root@cen7 corosync]# crm_verify -L -V
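crm_verify now exits silently, so the configuration is valid. The property itself can also be checked by name (a sketch):
[root@cen7 corosync]# pcs property list stonith-enabled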
IV. Configure the Corosync+Pacemaker+NFS+httpd high-availability cluster with pcs
1. Define cluster resources and constraints
1) Define the VIP resource:
Common pcs resource commands:
pcs resource create <resource id> <standard:provider:type> [resource options]
pcs resource delete <resource id>
op: attaches operation settings to a resource
monitor: defines monitoring options: interval (time between checks) and timeout (monitor timeout)
The following command configures the virtual IP 192.168.88.188 with a 20s monitor interval and a 10s timeout:
[root@cen7 corosync]# pcs resource create webip ocf:heartbeat:IPaddr ip="192.168.88.188" op monitor interval=20s timeout=10s
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug 2 11:15:43 2018
Last change: Thu Aug 2 11:15:38 2018 by root via cibadmin on cen7.field.com
3 nodes configured
1 resource configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
Full list of resources:
webip (ocf::heartbeat:IPaddr): Starting cen7.field.com
# webip was created successfully and is starting on the cen7 node
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
# Demonstrate pcs resource delete by removing webip, then recreate it identically:
[root@cen7 corosync]# pcs resource delete webip
[root@cen7 corosync]# pcs resource create webip ocf:heartbeat:IPaddr ip="192.168.88.188" op monitor interval=20s timeout=10s
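The recreated resource's definition can be inspected before moving on (pcs resource show <id> in the pcs 0.9.x shipped with CentOS 7; newer pcs releases use pcs resource config):
[root@cen7 corosync]# pcs resource show webip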
2) Define the NFS filesystem resource (the Filesystem OCF resource agent)
A Filesystem resource must specify device (the device or NFS export), directory (the mount point), and fstype (the filesystem type).
Common operation options for a Filesystem resource:
start: interval and timeout for the start operation
stop: interval and timeout for the stop operation
monitor: interval (time between checks) and timeout (monitor timeout)
The following command creates the NFS filesystem resource, mounting the export "192.168.88.134:/www/hadocs" on /var/www/html/ with a 60s start timeout, a 60s stop timeout, a 20s monitor interval, and a 40s monitor timeout:
[root@cen7 corosync]# pcs resource create webstore ocf:heartbeat:Filesystem device="192.168.88.134:/www/hadocs" directory="/var/www/html/" \
fstype="nfs" op start timeout=60s op stop timeout=60s op monitor interval=20s timeout=40s
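If the Filesystem resource ever fails to start, testing the export manually on a node can isolate NFS problems from cluster problems (a sketch; /mnt is just a scratch mount point):
[root@cen7 corosync]# showmount -e 192.168.88.134
[root@cen7 corosync]# mount -t nfs 192.168.88.134:/www/hadocs /mnt && ls /mnt && umount /mnt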
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug 2 11:20:37 2018
Last change: Thu Aug 2 11:20:35 2018 by root via cibadmin on cen7.field.com
3 nodes configured
2 resources configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
Full list of resources:
webip (ocf::heartbeat:IPaddr): Started cen7.field.com
webstore (ocf::heartbeat:Filesystem): Started node1.field.com
# webstore was created successfully and started on the node1 node
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
3) Define the httpd resource
The following command defines the httpd resource with a 30s monitor interval and a 20s timeout:
[root@cen7 corosync]# pcs resource create webserver systemd:httpd op monitor interval=30s timeout=20s
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug 2 11:22:43 2018
Last change: Thu Aug 2 11:22:42 2018 by root via cibadmin on cen7.field.com
3 nodes configured
3 resources configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
Full list of resources:
webip (ocf::heartbeat:IPaddr): Started cen7.field.com
webstore (ocf::heartbeat:Filesystem): Started node1.field.com
webserver (systemd:httpd): Starting node2.field.com
# webserver was created successfully and is starting on the node2 node
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
4) Define a resource group: resources in a group always run on the same node (more on group semantics below)
The following command creates the webservice group containing webip, webstore, and webserver:
[root@cen7 corosync]# pcs resource group add webservice webip webstore webserver
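A group behaves as one ordered unit: members start in the listed order (webip, then webstore, then webserver) and stop in reverse, always on the same node. Membership can be confirmed with the pcs 0.9 syntax shipped with CentOS 7:
[root@cen7 corosync]# pcs resource show webservice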
2. Verify the cluster configuration and test its high availability
1) Check the cluster status:
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug 2 11:24:01 2018
Last change: Thu Aug 2 11:23:51 2018 by root via cibadmin on cen7.field.com
3 nodes configured
3 resources configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
Full list of resources:
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started cen7.field.com
webstore (ocf::heartbeat:Filesystem): Started cen7.field.com
webserver (systemd:httpd): Started cen7.field.com
# After the group is defined, all three resources run on the cen7 node
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
2) Manually put the current node into standby to verify resource failover
Command syntax:
pcs cluster standby <node>
pcs cluster unstandby <node>
[root@cen7 corosync]# pcs cluster standby cen7.field.com
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug 2 11:25:13 2018
Last change: Thu Aug 2 11:25:06 2018 by root via cibadmin on cen7.field.com
3 nodes configured
3 resources configured
Node cen7.field.com: standby
Online: [ node1.field.com node2.field.com ]
Full list of resources:
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node1.field.com
webstore (ocf::heartbeat:Filesystem): Started node1.field.com
webserver (systemd:httpd): Starting node1.field.com
# The resources failed over to the node1 node successfully
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
3) Confirm the VIP failed over: node1 now holds the VIP 192.168.88.188
[root@cen7 corosync]# ansible 192.168.88.134 -m shell -a 'ip addr list |grep ens'
192.168.88.134 | SUCCESS | rc=0 >>
2: ens34:
inet 192.168.88.134/24 brd 192.168.88.255 scope global noprefixroute ens34
inet 192.168.88.188/24 brd 192.168.88.255 scope global secondary ens34
4) Confirm the website is reachable: every node can access the page successfully
[root@cen7 corosync]# ansible hacluster -m shell -a 'curl 192.168.88.188 warn=False'
192.168.88.133 | SUCCESS | rc=0 >>
hacluster page on NFS Service
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 39 100 39 0 0 19345 0 --:--:-- --:--:-- --:--:-- 39000
192.168.88.134 | SUCCESS | rc=0 >>
hacluster page on NFS Service
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 39 100 39 0 0 35583 0 --:--:-- --:--:-- --:--:-- 39000
192.168.88.132 | SUCCESS | rc=0 >>
hacluster page on NFS Service
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 39 100 39 0 0 24405 0 --:--:-- --:--:-- --:--:-- 39000
5) Put node1 into standby as well to verify failover again
[root@cen7 corosync]# pcs cluster standby node1.field.com
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug 2 11:40:14 2018
Last change: Thu Aug 2 11:40:02 2018 by root via cibadmin on cen7.field.com
3 nodes configured
3 resources configured
Node cen7.field.com: standby
Node node1.field.com: standby
Online: [ node2.field.com ]
Full list of resources:
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node2.field.com
webstore (ocf::heartbeat:Filesystem): Started node2.field.com
webserver (systemd:httpd): Starting node2.field.com
# The resources failed over to the node2 node
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
6) Bring the nodes back online
[root@cen7 corosync]# pcs cluster unstandby node1.field.com
[root@cen7 corosync]# pcs cluster unstandby cen7.field.com
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug 2 11:40:39 2018
Last change: Thu Aug 2 11:40:36 2018 by root via cibadmin on cen7.field.com
3 nodes configured
3 resources configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
Full list of resources:
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node2.field.com
webstore (ocf::heartbeat:Filesystem): Started node2.field.com
webserver (systemd:httpd): Started node2.field.com
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
3. Define resource constraints
Common constraint commands (a sketch of ungrouped equivalents follows this list):
pcs constraint: view resource constraints (pcs constraint --full also shows constraint IDs)
pcs constraint colocation: colocation constraints; which resources must run together on the same node
pcs constraint order: ordering constraints; the order in which resources start
pcs constraint location: location constraints; which nodes a resource prefers (location scores compete with resource stickiness to decide placement)
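The webservice group already provides implicit colocation and ordering among its members, so only a location constraint is defined below. Had the resources not been grouped, rough equivalents would look like this (a sketch, not applied in this walkthrough):
[root@cen7 corosync]# pcs constraint colocation add webstore with webip INFINITY
[root@cen7 corosync]# pcs constraint colocation add webserver with webstore INFINITY
[root@cen7 corosync]# pcs constraint order start webip then start webstore
[root@cen7 corosync]# pcs constraint order start webstore then start webserver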
1) Specify which nodes a resource prefers: define a location score for a single node
The following command gives the webservice group a preference score of 100 on node cen7:
[root@cen7 corosync]# pcs constraint location add webservice_prefer_cen7 webservice cen7.field.com 100
Check the cluster status:
[root@cen7 corosync]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.field.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug 2 11:42:57 2018
Last change: Thu Aug 2 11:42:45 2018 by root via cibadmin on cen7.field.com
3 nodes configured
3 resources configured
Online: [ cen7.field.com node1.field.com node2.field.com ]
Full list of resources:
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started cen7.field.com
webstore (ocf::heartbeat:Filesystem): Started cen7.field.com
webserver (systemd:httpd): Starting cen7.field.com
# With the location preference defined, all resources moved to the cen7 node
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
2) Set the cluster-wide default resource stickiness
pcs property list: view cluster properties
[root@cen7 corosync]# pcs property list --all | grep default
default-action-timeout: (null)
default-resource-stickiness: (null)
is-managed-default: (null)
placement-strategy: default
The following command sets the default resource stickiness to 0:
[root@cen7 corosync]# pcs property set default-resource-stickiness=0
[root@cen7 corosync]# pcs property list --all | grep default-resource-stickiness
default-resource-stickiness: 0
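With stickiness set to 0, the location score of 100 for cen7 is the only preference left, so pacemaker migrates the group back to cen7, as the next commands show. The per-node allocation scores can be inspected with pacemaker's crm_simulate (a sketch; -s shows scores, -L reads the live cluster state):
[root@cen7 corosync]# crm_simulate -sL | grep -i web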
[root@cen7 corosync]# ip addr list| grep ens
2: ens32:
inet 192.168.88.132/24 brd 192.168.88.255 scope global noprefixroute ens32
inet 192.168.88.188/24 brd 192.168.88.255 scope global secondary ens32
[root@cen7 corosync]# curl 192.168.88.188