When high availability comes up, the tools that first come to mind are usually the fairly simple Keepalived, the older heartbeat, or perhaps Corosync + Pacemaker. So how do they differ?
From v3 onward, Heartbeat was split into several sub-projects: Heartbeat, cluster-glue, Resource Agents, and Pacemaker.
Heartbeat: now only maintains membership information for the cluster nodes and the communication between them.
Cluster-glue: acts as a middle layer connecting heartbeat to the CRM (Pacemaker); it consists mainly of two parts, the LRM and STONITH.
Resource Agents: a collection of scripts used to start, stop, and monitor services; the LRM invokes these scripts to start, stop, and monitor resources.
Pacemaker: the resource manager split out of the original Heartbeat, the control center that manages the whole HA cluster; clients configure, manage, and monitor the cluster through it. Pacemaker does not provide the underlying heartbeat messaging itself: to communicate with peer nodes it relies on an underlying messaging layer (the newly split-out heartbeat, or corosync) to deliver its messages.
Pacemaker is a cluster resource manager. It achieves maximum availability for your cluster services (i.e., it detects and recovers from node- and resource-level failures) by making use of the messaging and membership capabilities provided by your preferred cluster infrastructure (Corosync or Heartbeat).
Pacemaker's main internal components:
CIB (Cluster Information Base): an XML representation of the cluster's configuration and the current state of all resources, kept in sync across the nodes.
CRMd (Cluster Resource Management daemon): the cluster's decision-making daemon; one CRMd instance is elected as the DC.
DC (Designated Co-ordinator): the elected master CRMd. It decides which actions the cluster should take and passes the instructions, via the cluster messaging infrastructure, either to the LRMd (Local Resource Management daemon) on each node or to the CRMd peers on other nodes (which in turn relay them to their local LRMd).
The nodes report the result of every operation back to the DC; based on the difference between the expected and the actual result, it either executes the next pending command or cancels the operation and has the PEngine recompute the ideal cluster state.
PEngine (PE, the Policy Engine): computes the ideal state of the cluster and the sequence of actions needed to reach it.
STONITHd: the fencing daemon, which powers off or resets misbehaving nodes to protect shared data.
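On a running node these components can be observed directly; a quick sketch (assuming the cluster built later in this article is already up):
crm_mon -1                # one-shot cluster status, rendered from the CIB
cibadmin --query | head   # dump the beginning of the raw XML CIB
ps -e | grep -E 'pacemakerd|crmd|lrmd|pengine|stonithd'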
N-to-N architecture: with Corosync + Pacemaker, any node can take over the resources of any failed node, so no dedicated standby machine is required.
In the HA stack, corosync is the messaging layer: it detects whether the nodes can still communicate with one another, while pacemaker manages the cluster resources on top of it. In practice corosync and pacemaker are usually administered through a single front-end tool, either the older crmsh or the newer pcs.
The benefit of managing the cluster with crmsh or pcs is that we never have to face the configuration files directly: the cluster nodes are managed from the command line, which avoids the mistakes that come with hand-editing configuration. It also lowers the learning curve: instead of learning the configuration syntax of both corosync and pacemaker, you only need to learn how to use crmsh or pcs.
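For example, a handful of pcs commands (the tool used throughout this article) cover most day-to-day tasks; a quick sampler, assuming a running cluster:
pcs status               # overall cluster, node and resource status
pcs cluster start --all  # start corosync + pacemaker on every node
pcs resource show        # list the configured resources
pcs property list        # show cluster-wide properties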
OS version:
[root@node0 corosync]# cat /etc/redhat-release
CentOS Linux release 7.8.2003 (Core)
IP addresses:
node0 192.168.0.70
node1 192.168.0.71
node2 192.168.0.72
【ALL】Disable the firewall and SELinux on every node:
systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl status firewalld.service
setenforce 0
sed -i '/^SELINUX=/c\SELINUX=disabled' /etc/selinux/config
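An optional quick check that both are really off:
firewall-cmd --state   # should report "not running"
getenforce             # reports "Permissive" until the next reboot, then "Disabled"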
# On node0: generate a key pair and push it to the other two nodes
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
# On node1:
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node0
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
# On node2:
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node0
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
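Equivalently, a small loop run on each node (after ssh-keygen) does the same; a convenience sketch, not part of the original steps:
for h in node0 node1 node2; do
    [ "$h" = "$(hostname -s)" ] && continue   # skip the local node
    ssh-copy-id -i ~/.ssh/id_rsa.pub root@$h
done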
【ALL】Run the installation on every node
yum -y install corosync pacemaker pcs resource-agents
# Optionally, also pull in the fence agents: yum install corosync pacemaker pcs resource-agents fence-agents-all
【ALL】Start the pcsd service and enable it at boot
systemctl start pcsd.service
systemctl enable pcsd.service
【ALL】Set the hacluster password
The installation creates a hacluster user, which pcs uses to authenticate against the local pcsd daemon, so we need to set its password; use the same password on every node.
echo hacluster | passwd --stdin hacluster
【ONE】Inspect the pacemaker package contents
[root@node0 ~]# rpm -ql pacemaker
/etc/sysconfig/pacemaker
/usr/lib/ocf/resource.d/.isolation
/usr/lib/ocf/resource.d/.isolation/docker-wrapper
/usr/lib/ocf/resource.d/pacemaker/controld
/usr/lib/ocf/resource.d/pacemaker/remote
/usr/lib/systemd/system/pacemaker.service
/usr/libexec/pacemaker/attrd
/usr/libexec/pacemaker/cib
/usr/libexec/pacemaker/cibmon
/usr/libexec/pacemaker/crmd
/usr/libexec/pacemaker/lrmd
/usr/libexec/pacemaker/lrmd_internal_ctl
/usr/libexec/pacemaker/pengine
/usr/libexec/pacemaker/stonith-test
/usr/libexec/pacemaker/stonithd
/usr/sbin/crm_attribute
/usr/sbin/crm_master
/usr/sbin/crm_node
/usr/sbin/pacemakerd
/usr/sbin/stonith_admin
/usr/share/doc/pacemaker-1.1.21
/usr/share/doc/pacemaker-1.1.21/COPYING
/usr/share/doc/pacemaker-1.1.21/ChangeLog
/usr/share/licenses/pacemaker-1.1.21
/usr/share/licenses/pacemaker-1.1.21/GPLv2
/usr/share/man/man7/crmd.7.gz
/usr/share/man/man7/ocf_pacemaker_controld.7.gz
/usr/share/man/man7/ocf_pacemaker_remote.7.gz
/usr/share/man/man7/pengine.7.gz
/usr/share/man/man7/stonithd.7.gz
/usr/share/man/man8/crm_attribute.8.gz
/usr/share/man/man8/crm_master.8.gz
/usr/share/man/man8/crm_node.8.gz
/usr/share/man/man8/pacemakerd.8.gz
/usr/share/man/man8/stonith_admin.8.gz
/usr/share/pacemaker/alerts
/usr/share/pacemaker/alerts/alert_file.sh.sample
/usr/share/pacemaker/alerts/alert_smtp.sh.sample
/usr/share/pacemaker/alerts/alert_snmp.sh.sample
/var/lib/pacemaker/cib -- CIB data files
/var/lib/pacemaker/pengine -- PEngine data files
【ONE】Inspect the corosync package contents
[root@node0 ~]# rpm -ql corosync
/etc/corosync -- configuration directory
/etc/corosync/corosync.conf.example -- sample configuration file
/etc/corosync/corosync.conf.example.udpu
/etc/corosync/corosync.xml.example
/etc/corosync/uidgid.d
/etc/dbus-1/system.d/corosync-signals.conf
/etc/logrotate.d/corosync -- log rotation configuration
/etc/sysconfig/corosync
/etc/sysconfig/corosync-notifyd
/usr/bin/corosync-blackbox
/usr/bin/corosync-xmlproc
/usr/lib/systemd/system/corosync-notifyd.service
/usr/lib/systemd/system/corosync.service
/usr/sbin/corosync -- main binary
/usr/sbin/corosync-cfgtool
/usr/sbin/corosync-cmapctl
/usr/sbin/corosync-cpgtool
/usr/sbin/corosync-keygen
/usr/sbin/corosync-notifyd
/usr/sbin/corosync-quorumtool
/usr/share/corosync
/usr/share/corosync/corosync
/usr/share/corosync/corosync-notifyd
/usr/share/corosync/xml2conf.xsl
/usr/share/doc/corosync-2.4.5
/usr/share/doc/corosync-2.4.5/LICENSE
/usr/share/doc/corosync-2.4.5/SECURITY
/usr/share/man/man5/corosync.conf.5.gz
/usr/share/man/man5/corosync.xml.5.gz
/usr/share/man/man5/votequorum.5.gz
/usr/share/man/man8/cmap_keys.8.gz
/usr/share/man/man8/corosync-blackbox.8.gz
/usr/share/man/man8/corosync-cfgtool.8.gz
/usr/share/man/man8/corosync-cmapctl.8.gz
/usr/share/man/man8/corosync-cpgtool.8.gz
/usr/share/man/man8/corosync-keygen.8.gz
/usr/share/man/man8/corosync-notifyd.8.gz
/usr/share/man/man8/corosync-quorumtool.8.gz
/usr/share/man/man8/corosync-xmlproc.8.gz
/usr/share/man/man8/corosync.8.gz
/usr/share/man/man8/corosync_overview.8.gz
/usr/share/snmp/mibs/COROSYNC-MIB.txt
/var/lib/corosync -- runtime state directory
/var/log/cluster -- log directory
【ONE】Authenticate the nodes against pcsd (run on any one node; node0 in this example)
[root@node0 corosync]# pcs cluster auth node0 node1 node2 -u hacluster -p hacluster --force
node1: Authorized
node0: Authorized
node2: Authorized
【ONE】Generate the corosync configuration (again, any node will do)
[root@node2 corosync]#
[root@node2 corosync]# pcs cluster setup --name cluster_test01 node0 node1 node2
Destroying cluster on nodes: node0, node1, node2...
node2: Stopping Cluster (pacemaker)...
node0: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (pacemaker)...
node1: Successfully destroyed cluster
node0: Successfully destroyed cluster
node2: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'node0', 'node1', 'node2'
node0: successful distribution of the file 'pacemaker_remote authkey'
node2: successful distribution of the file 'pacemaker_remote authkey'
node1: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node0: Succeeded
node1: Succeeded
node2: Succeeded
Synchronizing pcsd certificates on nodes node0, node1, node2...
node1: Success
node0: Success
node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
node1: Success
node0: Success
node2: Success
[root@node0 corosync]# ll /etc/corosync/corosync.conf
-rw-r--r--. 1 root root 435 Jul 19 23:39 /etc/corosync/corosync.conf
[root@node0 corosync]#
[root@node1 corosync]# ll /etc/corosync/corosync.conf
-rw-r--r--. 1 root root 435 Jul 19 23:39 /etc/corosync/corosync.conf
[root@node2 corosync]# ll /etc/corosync/corosync.conf
-rw-r--r--. 1 root root 435 Jul 19 23:39 /etc/corosync/corosync.conf
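The generated file is identical on every node. For reference, a pcs-generated corosync.conf on CentOS 7 usually looks roughly like the following (a representative sketch, not a dump from this cluster):
totem {
    version: 2
    cluster_name: cluster_test01
    secauth: off
    transport: udpu
}

nodelist {
    node {
        ring0_addr: node0
        nodeid: 1
    }
    node {
        ring0_addr: node1
        nodeid: 2
    }
    node {
        ring0_addr: node2
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}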
Start a single node of the cluster
[root@node0 corosync]# pcs cluster start node1
node1: Starting Cluster (corosync)...
node1: Starting Cluster (pacemaker)...
[root@node0 corosync]#
[root@node1 corosync]# ps -ef |grep coro
root 10691 1 8 23:42 ? 00:00:00 corosync
root 10716 9461 0 23:42 pts/1 00:00:00 grep --color=auto coro
[root@node1 corosync]# ps -ef |grep pace
root 10706 1 1 23:42 ? 00:00:00 /usr/sbin/pacemakerd -f
haclust+ 10707 10706 1 23:42 ? 00:00:00 /usr/libexec/pacemaker/cib
root 10708 10706 0 23:42 ? 00:00:00 /usr/libexec/pacemaker/stonithd
root 10709 10706 0 23:42 ? 00:00:00 /usr/libexec/pacemaker/lrmd
haclust+ 10710 10706 0 23:42 ? 00:00:00 /usr/libexec/pacemaker/attrd
haclust+ 10711 10706 0 23:42 ? 00:00:00 /usr/libexec/pacemaker/pengine
haclust+ 10712 10706 0 23:42 ? 00:00:00 /usr/libexec/pacemaker/crmd
root 10718 9461 0 23:42 pts/1 00:00:00 grep --color=auto pace
[root@node1 corosync]#
[root@node1 corosync]#
[root@node1 corosync]#
Check the node status
(pcs cluster status fails on node0, where the cluster isn't running yet; on node1 it works, but with only one of the three nodes started the partition has no quorum)
[root@node0 corosync]# pcs cluster status
Error: cluster is not currently running on this node
[root@node1 corosync]# pcs cluster status
Cluster Status:
Stack: corosync
Current DC: node1 (version 1.1.21-4.el7-f14e36fd43) - partition WITHOUT quorum
Last updated: Sun Jul 19 23:46:47 2020
Last change: Sun Jul 19 23:43:06 2020 by hacluster via crmd on node1
3 nodes configured
0 resources configured
PCSD Status:
node2: Online
node1: Online
node0: Online
Start all nodes
[root@node0 corosync]# pcs cluster start --all
node1: Starting Cluster (corosync)...
node0: Starting Cluster (corosync)...
node2: Starting Cluster (corosync)...
node0: Starting Cluster (pacemaker)...
node1: Starting Cluster (pacemaker)...
node2: Starting Cluster (pacemaker)...
[root@node0 corosync]# pcs status
Cluster name: cluster_test01
WARNINGS:
No stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sun Jul 19 23:55:09 2020
Last change: Sun Jul 19 23:47:47 2020 by hacluster via crmd on node1
3 nodes configured
0 resources configured
Online: [ node0 node2 ]
OFFLINE: [ node1 ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
Resolve the warning
WARNINGS:
No stonith devices and stonith-enabled is not false
[root@node0 corosync]# pcs property set stonith-enabled=false
[root@node0 corosync]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Sun Jul 19 23:59:13 2020
Last change: Sun Jul 19 23:57:13 2020 by root via cibadmin on node0
3 nodes configured
0 resources configured
Online: [ node0 node1 node2 ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Check the corosync status
[root@node0 corosync]# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 node0 (local)
2 1 node1
3 1 node2
Check the pacemaker processes
[root@node0 corosync]# ps axf |grep pacemaker
5003 pts/2 S+ 0:00 \_ grep --color=auto pacemaker
4792 ? Ss 0:00 /usr/sbin/pacemakerd -f
4793 ? Ss 0:00 \_ /usr/libexec/pacemaker/cib
4794 ? Ss 0:00 \_ /usr/libexec/pacemaker/stonithd
4795 ? Ss 0:00 \_ /usr/libexec/pacemaker/lrmd
4796 ? Ss 0:00 \_ /usr/libexec/pacemaker/attrd
4797 ? Ss 0:00 \_ /usr/libexec/pacemaker/pengine
4798 ? Ss 0:00 \_ /usr/libexec/pacemaker/crmd
Validate the configuration with crm_verify
(re-enabling STONITH first shows the errors crm_verify reports when no STONITH devices have been defined; disabling it again makes the configuration valid)
[root@node0 corosync]# pcs property set stonith-enabled=true
[root@node0 corosync]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
[root@node0 corosync]# pcs property set stonith-enabled=false
[root@node0 corosync]#
[root@node0 corosync]# crm_verify -L -V
Create the VIP (a floating IP address managed by the ocf:heartbeat:IPaddr2 agent)
[root@node0 corosync]# pcs resource create VIP ocf:heartbeat:IPaddr2 ip=192.168.0.75 cidr_netmask=32 op monitor interval=30s
[root@node0 corosync]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 00:22:48 2020
Last change: Mon Jul 20 00:22:38 2020 by root via cibadmin on node0
3 nodes configured
1 resource configured
Online: [ node0 node1 node2 ]
Full list of resources:
VIP (ocf::heartbeat:IPaddr2): Started node0
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@node0 corosync]#
[root@node0 corosync]#
[root@node0 corosync]# ip ad list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:37:1b:18 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.70/24 brd 192.168.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 192.168.0.75/32 brd 192.168.0.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::4427:bd05:1cf9:1f4f/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::19de:291a:ae81:cfd7/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::4f88:fe38:1a5e:4b05/64 scope lin
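An optional sanity check from any machine on the same segment:
ping -c 2 192.168.0.75
pcs resource show VIP   # print the resource's configured parameters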
List the resource standards the cluster supports
[root@node0 /]# pcs resource standards
lsb
ocf
service
systemd
List the available OCF resource providers
[root@node0 /]# pcs resource providers
heartbeat
openstack
pacemaker
List the agents shipped under a given standard, e.g. under ocf:heartbeat
[root@node0 /]# pcs resource agents ocf:heartbeat
aliyun-vpc-move-ip
apache
aws-vpc-move-ip
awseip
awsvip
azure-lb
clvm
conntrackd
CTDB
db2
Delay
dhcpd
docker
Dummy
ethmonitor
exportfs
Filesystem
galera
garbd
iface-vlan
IPaddr
IPaddr2
IPsrcaddr
iSCSILogicalUnit
iSCSITarget
LVM
LVM-activate
lvmlockd
MailTo
mysql
nagios
named
nfsnotify
nfsserver
nginx
NodeUtilization
oraasm
oracle
oralsnr
pgsql
portblock
postfix
rabbitmq-cluster
redis
Route
rsyncd
SendArp
slapd
Squid
sybaseASE
symlink
tomcat
vdo-vol
VirtualDomain
Xinetd
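To see which parameters a particular agent accepts, for example the IPaddr2 agent used for the VIP above:
pcs resource describe ocf:heartbeat:IPaddr2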
Put a node into standby and bring it back
[root@node0 /]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 02:07:41 2020
Last change: Mon Jul 20 00:22:38 2020 by root via cibadmin on node0
3 nodes configured
1 resource configured
Online: [ node0 node1 node2 ]
Full list of resources:
VIP (ocf::heartbeat:IPaddr2): Started node0
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@node0 /]# pcs cluster standby node2
[root@node0 /]#
[root@node0 /]#
[root@node0 /]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 02:07:53 2020
Last change: Mon Jul 20 02:07:50 2020 by root via cibadmin on node0
3 nodes configured
1 resource configured
Node node2: standby
Online: [ node0 node1 ]
Full list of resources:
VIP (ocf::heartbeat:IPaddr2): Started node0
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@node0 /]# pcs cluster unstandby node2
[root@node0 /]#
[root@node0 /]#
[root@node0 /]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 02:08:04 2020
Last change: Mon Jul 20 02:08:02 2020 by root via cibadmin on node0
3 nodes configured
1 resource configured
Online: [ node0 node1 node2 ]
Full list of resources:
VIP (ocf::heartbeat:IPaddr2): Started node0
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@node0 /]#
Restart a resource
[root@node0 /]# pcs resource restart VIP
VIP successfully restarted
Clean up the cluster's resource failure records
[root@node0 /]# pcs resource cleanup
Cleaned up all resources on all nodes
When quorum cannot be reached, choose to ignore it (fine for a lab; risky for production clusters with shared data)
[root@node0 /]# pcs property set no-quorum-policy=ignore
[root@node0 /]# pcs property list
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: cluster_test01
dc-version: 1.1.21-4.el7-f14e36fd43
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false
[root@node0 /]# pcs property show
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: cluster_test01
dc-version: 1.1.21-4.el7-f14e36fd43
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false
Enable the cluster services to start at boot
[root@node0 /]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 02:19:22 2020
Last change: Mon Jul 20 02:17:39 2020 by root via cibadmin on node0
3 nodes configured
1 resource configured
Online: [ node0 node1 node2 ]
Full list of resources:
VIP (ocf::heartbeat:IPaddr2): Started node0
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@node0 /]#
[root@node0 /]# pcs cluster enable --all
node0: Cluster Enabled
node1: Cluster Enabled
node2: Cluster Enabled
[root@node0 /]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 02:22:06 2020
Last change: Mon Jul 20 02:17:39 2020 by root via cibadmin on node0
3 nodes configured
1 resource configured
Online: [ node0 node1 node2 ]
Full list of resources:
VIP (ocf::heartbeat:IPaddr2): Started node0
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@node0 /]#
Highly available httpd on CentOS 7 with Corosync + Pacemaker + pcs
【ALL】Install httpd on every node
yum -y install httpd
service httpd start
[root@node0 /]# service httpd start
Redirecting to /bin/systemctl start httpd.service
[root@node0 /]#
[root@node0 /]# service httpd status
Redirecting to /bin/systemctl status httpd.service
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2020-07-20 02:27:21 CST; 1min 16s ago
Docs: man:httpd(8)
man:apachectl(8)
Main PID: 19333 (httpd)
Status: "Total requests: 10; Current requests/sec: 0; Current traffic: 0 B/sec"
CGroup: /system.slice/httpd.service
├─19333 /usr/sbin/httpd -DFOREGROUND
├─19334 /usr/sbin/httpd -DFOREGROUND
├─19335 /usr/sbin/httpd -DFOREGROUND
├─19337 /usr/sbin/httpd -DFOREGROUND
├─19338 /usr/sbin/httpd -DFOREGROUND
├─19387 /usr/sbin/httpd -DFOREGROUND
├─19388 /usr/sbin/httpd -DFOREGROUND
├─19389 /usr/sbin/httpd -DFOREGROUND
├─19390 /usr/sbin/httpd -DFOREGROUND
├─19391 /usr/sbin/httpd -DFOREGROUND
└─19392 /usr/sbin/httpd -DFOREGROUND
Jul 20 02:27:21 node0 systemd[1]: Starting The Apache HTTP Server...
Jul 20 02:27:21 node0 httpd[19333]: AH00558: httpd: Could not reliably determine the server's fully qualif...ssage
Jul 20 02:27:21 node0 systemd[1]: Started The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.
[root@node0 /]#
Configure the server-status page that the apache resource agent will use for health checks:
vim /etc/httpd/conf.d/status.conf
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from all
</Location>
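Before handing httpd over to the cluster, you can sanity-check the handler (an optional step, not in the original run):
systemctl restart httpd
curl -s http://localhost/server-status | head -n 5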
systemctl stop httpd     # the cluster will manage httpd from now on
systemctl status httpd
Note: this time the resource is created on node1
[root@node1 corosync]# pcs resource create WebSite ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=30s
[root@node1 corosync]#
[root@node1 corosync]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 04:33:32 2020
Last change: Mon Jul 20 04:33:25 2020 by root via cibadmin on node1
3 nodes configured
2 resources configured
Online: [ node0 node1 node2 ]
Full list of resources:
VIP (ocf::heartbeat:IPaddr2): Started node0
WebSite (ocf::heartbeat:apache): Started node1
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@node1 corosync]#
This created an httpd cluster resource, WebSite, currently running on node1, with the status page http://localhost/server-status checked every 30s. But there is a new problem: the virtual IP sits on node0 while the httpd resource sits on node1, so clients cannot reach the site. And if the VIP is not up on any node, WebSite must not run either. The colocation constraint added below ties the two together (see also the ordering sketch that follows it).
[root@node1 corosync]# pcs resource op defaults timeout=120s
Warning: Defaults do not apply to resources which override them with their own defined values
[root@node1 corosync]# pcs resource op defaults
timeout=120s
[root@node1 corosync]# pcs constraint colocation add WebSite with VIP INFINITY
[root@node1 corosync]#
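The colocation constraint keeps WebSite on whichever node holds the VIP. A natural companion, not part of the original run, is an ordering constraint so the address is configured before Apache starts:
pcs constraint order VIP then WebSite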
Highly available services on CentOS 7 with Corosync + Pacemaker + pcs + HAProxy
1. Delete the existing WebSite resource 【ONE】
[root@node1 corosync]# pcs resource delete WebSite
Attempting to stop: WebSite... Stopped
[root@node1 corosync]#
2. Install the haproxy service 【ALL】
yum -y install haproxy
3. Make httpd listen only on each node's own NIC 【ALL】
Change "Listen 80" to "Listen <node IP>:80" on every node:
# On node0 (192.168.0.70):
grep -w 80 /etc/httpd/conf/httpd.conf
sed -i "/Listen[[:blank:]]80/c\ Listen 192.168.0.70:80" /etc/httpd/conf/httpd.conf
systemctl restart httpd
grep -w 80 /etc/httpd/conf/httpd.conf
systemctl status httpd
# On node1 (192.168.0.71):
grep -w 80 /etc/httpd/conf/httpd.conf
sed -i "/Listen[[:blank:]]80/c\ Listen 192.168.0.71:80" /etc/httpd/conf/httpd.conf
systemctl restart httpd
grep -w 80 /etc/httpd/conf/httpd.conf
systemctl status httpd
# On node2 (192.168.0.72):
grep -w 80 /etc/httpd/conf/httpd.conf
sed -i "/Listen[[:blank:]]80/c\ Listen 192.168.0.72:80" /etc/httpd/conf/httpd.conf
systemctl restart httpd
grep -w 80 /etc/httpd/conf/httpd.conf
systemctl status httpd
4. Configure haproxy 【ALL】
vim /etc/haproxy/haproxy.cfg
Append the following (note that the listener binds the cluster VIP):
#---------------------------------------------------------------------
# listen httpd server
#---------------------------------------------------------------------
listen httpd_cluster
bind 192.168.0.75:80   # the cluster VIP
balance roundrobin
option tcpka
option httpchk
option tcplog
server node0 node0:80 check port 80 inter 2000 rise 2 fall 5
server node1 node1:80 check port 80 inter 2000 rise 2 fall 5
server node2 node2:80 check port 80 inter 2000 rise 2 fall 5
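An optional syntax check of the file before the cluster takes over:
haproxy -c -f /etc/haproxy/haproxy.cfg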
5. Create the haproxy resource
[root@node0 /]# pcs resource create haproxy systemd:haproxy op monitor interval="5s"
[root@node0 /]#
[root@node0 /]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 04:57:31 2020
Last change: Mon Jul 20 04:57:27 2020 by root via cibadmin on node0
3 nodes configured
2 resources configured
Online: [ node0 node1 node2 ]
Full list of resources:
VIP (ocf::heartbeat:IPaddr2): Started node0
haproxy (systemd:haproxy): FAILED node1
Failed Resource Actions:
* haproxy_start_0 on node1 'unknown error' (1): call=37, status=complete, exitreason='',
last-rc-change='Mon Jul 20 04:57:28 2020', queued=0ms, exec=2276ms
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@node0 /]#
[root@node0 /]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 04:58:39 2020
Last change: Mon Jul 20 04:57:27 2020 by root via cibadmin on node0
3 nodes configured
2 resources configured
Online: [ node0 node1 node2 ]
Full list of resources:
VIP (ocf::heartbeat:IPaddr2): Started node0
haproxy (systemd:haproxy): Stopped
Failed Resource Actions:
* haproxy_start_0 on node2 'unknown error' (1): call=27, status=complete, exitreason='',
last-rc-change='Mon Jul 20 04:57:33 2020', queued=0ms, exec=2242ms
* haproxy_start_0 on node0 'unknown error' (1): call=51, status=complete, exitreason='',
last-rc-change='Mon Jul 20 04:57:37 2020', queued=0ms, exec=2252ms
* haproxy_start_0 on node1 'unknown error' (1): call=37, status=complete, exitreason='',
last-rc-change='Mon Jul 20 04:57:28 2020', queued=0ms, exec=2276ms
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@node0 /]#
The resource was created and started, but errors were recorded along the way: haproxy binds the virtual IP, and on the nodes that did not hold the VIP at that moment the bind, and therefore the start, failed.
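One common way to avoid this class of start failure (an addition, not part of the original run) is to let the kernel bind sockets to addresses that are not currently local, so haproxy can start on any node:
sysctl -w net.ipv4.ip_nonlocal_bind=1
echo 'net.ipv4.ip_nonlocal_bind = 1' >> /etc/sysctl.conf   # persist across reboots
Alternatively, a colocation constraint in the style used earlier (pcs constraint colocation add haproxy with VIP INFINITY) keeps haproxy on the node that owns the VIP.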
Clear the cluster errors
[root@node0 ~]# pcs resource cleanup
Cleaned up all resources on all nodes
[root@node0 ~]#
Restart the haproxy resource; after the cleanup it runs on node0, where the VIP lives:
[root@node0 ~]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 05:46:40 2020
Last change: Mon Jul 20 05:46:34 2020 by root via crm_resource on node0
3 nodes configured
2 resources configured
Online: [ node0 node1 node2 ]
Full list of resources:
VIP (ocf::heartbeat:IPaddr2): Started node0
haproxy (systemd:haproxy): Started node0
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Stop node0 to simulate a failure of that node
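The stop command itself is not shown in the transcript; it was presumably something like:
pcs cluster stop node0
Watched from node1, the VIP and haproxy then migrate: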
[root@node1 ~]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 05:47:28 2020
Last change: Mon Jul 20 05:46:34 2020 by root via crm_resource on node0
3 nodes configured
2 resources configured
Online: [ node0 node1 node2 ]
Full list of resources:
VIP (ocf::heartbeat:IPaddr2): Started node0
haproxy (systemd:haproxy): Stopping node0
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@node1 ~]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node2 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 05:47:35 2020
Last change: Mon Jul 20 05:46:34 2020 by root via crm_resource on node0
3 nodes configured
2 resources configured
Online: [ node1 node2 ]
OFFLINE: [ node0 ]
Full list of resources:
VIP (ocf::heartbeat:IPaddr2): Started node1
haproxy (systemd:haproxy): Started node1
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Access the web service
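With the VIP now on node1, requests to it are load-balanced round-robin across the three backends; for example (a hypothetical check):
curl http://192.168.0.75/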