Corosync + Pacemaker + pcs for High Availability

High Availability Overview

When it comes to high availability, most people think of the relatively simple Keepalived, or the even older heartbeat; some will also have used Corosync + Pacemaker. So what is the difference between them?

From version 3 on, Heartbeat was split into several sub-projects: Heartbeat, cluster-glue, Resource Agents, and Pacemaker.

Heartbeat: responsible only for maintaining membership information about the cluster nodes and for the communication between them.

Cluster-glue: acts as a middle layer connecting heartbeat with the CRM (Pacemaker); it consists mainly of two parts, the LRM and STONITH.

Resource Agents: a collection of scripts used to start, stop, and monitor services; they are invoked by the LRM to start, stop, and monitor the various resources.

Pacemaker: the resource manager split out of the original Heartbeat, acting as the control center that manages the entire HA cluster; clients configure, manage, and monitor the whole cluster through Pacemaker. It does not provide the underlying heartbeat messaging itself: to communicate with peer nodes it relies on an underlying layer (the newly split-out heartbeat, or corosync) to deliver its messages.

Pacemaker Overview

What is Pacemaker?

Pacemaker is a cluster resource manager. It achieves maximum availability for your cluster services by detecting and recovering from node- and resource-level failures, using the messaging and membership capabilities provided by your preferred cluster infrastructure (Corosync or Heartbeat).

Architecture

(architecture diagram not available)

Internal Components

(component diagram not available)

  • CIB (Cluster Information Base)

    • The CIB uses XML to represent the cluster's configuration and the current state of every resource in the cluster
    • The contents of the CIB are automatically kept in sync across the entire cluster
    • The PEngine uses it to compute the ideal state of the cluster and how to achieve it
    • The resulting list of instructions is fed to the DC (Designated Co-ordinator)
  • CRMd (Cluster Resource Management daemon)

    • Pacemaker centralizes all cluster decision-making by electing one of the CRMd instances to act as master
    • Should the elected CRMd process, or the node it is on, fail, a new one is quickly established
  • DC (Designated Co-ordinator)

    • The DC carries out the PEngine's instructions in the required order
    • It passes them, via the cluster messaging infrastructure, to the LRMd (Local Resource Management daemon) on other nodes or to the CRMd peers there, which in turn relay them to their local LRMd
    • The peer nodes report the results of all operations back to the DC; based on the expected and actual results it then either executes the next pending command, or aborts processing and asks the PEngine to recompute the ideal cluster state from the unexpected outcome
  • PEngine (PE, the Policy Engine)

  • STONITHd

    • In some situations it may be necessary to power off a node in order to protect shared data integrity or to fully recover a resource. For this, Pacemaker introduces STONITHd.
    • STONITH is an acronym for Shoot-The-Other-Node-In-The-Head and is usually implemented with a remote power switch
  • N-to-N architecture

    (diagram not available)
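In practice the STONITHd component described above is driven by a fence agent registered through pcs. A hedged sketch, assuming IPMI-capable server hardware — the device name, management IP, and credentials below are placeholders, not values from this setup:

```shell
# Register an IPMI fence device for node0; every value here is a placeholder.
pcs stonith create fence-node0 fence_ipmilan \
    pcmk_host_list="node0" ipaddr="192.168.0.170" \
    login="admin" passwd="secret" \
    op monitor interval=60s

# List the fence devices known to the cluster (pcs 0.9 syntax)
pcs stonith show
```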

Update:

  • In an HA stack corosync sits at the messaging layer, detecting whether the nodes can still communicate with each other, while pacemaker manages the cluster resources. When using corosync and pacemaker together, we normally manage both through one unified tool, such as the older crmsh or the newer pcs.

  • The advantage of managing the cluster with crmsh or pcs is that we do not have to work with the configuration files directly; instead we administer the cluster nodes from the command line, which reduces the errors caused by hand-editing configuration files. Another advantage is the lower learning curve: rather than learning the configuration directives of corosync and pacemaker themselves, we only need to learn how to use crmsh or pcs.
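As an illustration of that point, routine pcs administration never touches corosync.conf or the CIB XML directly; a sketch of the typical commands (pcs 0.9 syntax, as used later in this article):

```shell
pcs status            # overall cluster and resource state
pcs resource show     # the configured resources
pcs config            # a readable dump of the whole cluster configuration
pcs cluster cib       # the raw CIB XML, should you ever need it
```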

Environment Setup

Environment:

  • OS version:

    [root@node0 corosync]# cat /etc/redhat-release
    CentOS Linux release 7.8.2003 (Core)
    
  • IP addresses:

    node0 192.168.0.70
    node1 192.168.0.71
    node2 192.168.0.72
    

Permanently stop and disable the firewall, and disable SELinux

[ALL] — run on all nodes

systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl status firewalld.service

setenforce 0 
sed -i '/^SELINUX=/c\SELINUX=disabled' /etc/selinux/config

Configure SSH Mutual Trust

  • node0
ssh-keygen  -t   rsa
ssh-copy-id -i  ~/.ssh/id_rsa.pub  root@node1
ssh-copy-id -i  ~/.ssh/id_rsa.pub  root@node2
  • node1
ssh-keygen  -t   rsa
ssh-copy-id -i  ~/.ssh/id_rsa.pub  root@node0
ssh-copy-id -i  ~/.ssh/id_rsa.pub  root@node2
  • node2
ssh-keygen  -t   rsa
ssh-copy-id -i  ~/.ssh/id_rsa.pub  root@node0
ssh-copy-id -i  ~/.ssh/id_rsa.pub  root@node1
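With the keys distributed, it is worth confirming that every node really can reach its peers without a password prompt before moving on; a minimal check, run on each node in turn:

```shell
# BatchMode makes ssh fail instead of prompting if trust is not in place.
for h in node0 node1 node2; do
    ssh -o BatchMode=yes root@$h hostname
done
```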

Install corosync, pacemaker, and pcs

  • [ALL] Run the installation on every node

    yum -y install corosync pacemaker pcs resource-agents
    # alternatively, install the fence agents as well:
    # yum install corosync pacemaker pcs resource-agents fence-agents-all
    
    
  • [ALL] Start the pcsd service and enable it at boot

    systemctl start pcsd.service
    systemctl enable pcsd.service
    
  • [ALL] Set the password for the hacluster user

    The hacluster user is created during installation and is used for local pcs authentication, so we need to set a password for it; use the same password on every node.

    echo hacluster | passwd --stdin hacluster
    
  • [ONE] Inspect the files installed by pacemaker

    [root@node0 ~]# rpm -ql pacemaker
    /etc/sysconfig/pacemaker
    /usr/lib/ocf/resource.d/.isolation
    /usr/lib/ocf/resource.d/.isolation/docker-wrapper
    /usr/lib/ocf/resource.d/pacemaker/controld
    /usr/lib/ocf/resource.d/pacemaker/remote
    /usr/lib/systemd/system/pacemaker.service
    /usr/libexec/pacemaker/attrd
    /usr/libexec/pacemaker/cib
    /usr/libexec/pacemaker/cibmon
    /usr/libexec/pacemaker/crmd
    /usr/libexec/pacemaker/lrmd
    /usr/libexec/pacemaker/lrmd_internal_ctl
    /usr/libexec/pacemaker/pengine
    /usr/libexec/pacemaker/stonith-test
    /usr/libexec/pacemaker/stonithd
    /usr/sbin/crm_attribute
    /usr/sbin/crm_master
    /usr/sbin/crm_node
    /usr/sbin/pacemakerd
    /usr/sbin/stonith_admin
    /usr/share/doc/pacemaker-1.1.21
    /usr/share/doc/pacemaker-1.1.21/COPYING
    /usr/share/doc/pacemaker-1.1.21/ChangeLog
    /usr/share/licenses/pacemaker-1.1.21
    /usr/share/licenses/pacemaker-1.1.21/GPLv2
    /usr/share/man/man7/crmd.7.gz
    /usr/share/man/man7/ocf_pacemaker_controld.7.gz
    /usr/share/man/man7/ocf_pacemaker_remote.7.gz
    /usr/share/man/man7/pengine.7.gz
    /usr/share/man/man7/stonithd.7.gz
    /usr/share/man/man8/crm_attribute.8.gz
    /usr/share/man/man8/crm_master.8.gz
    /usr/share/man/man8/crm_node.8.gz
    /usr/share/man/man8/pacemakerd.8.gz
    /usr/share/man/man8/stonith_admin.8.gz
    /usr/share/pacemaker/alerts
    /usr/share/pacemaker/alerts/alert_file.sh.sample
    /usr/share/pacemaker/alerts/alert_smtp.sh.sample
    /usr/share/pacemaker/alerts/alert_snmp.sh.sample
    /var/lib/pacemaker/cib  -- CIB data directory
    /var/lib/pacemaker/pengine -- PEngine data directory
    
  • [ONE] Inspect the files installed by corosync

    [root@node0 ~]# rpm -ql corosync
    /etc/corosync  -- configuration directory
    /etc/corosync/corosync.conf.example -- configuration template
    /etc/corosync/corosync.conf.example.udpu
    /etc/corosync/corosync.xml.example
    /etc/corosync/uidgid.d
    /etc/dbus-1/system.d/corosync-signals.conf
    /etc/logrotate.d/corosync   -- log rotation configuration
    /etc/sysconfig/corosync
    /etc/sysconfig/corosync-notifyd
    /usr/bin/corosync-blackbox
    /usr/bin/corosync-xmlproc
    /usr/lib/systemd/system/corosync-notifyd.service
    /usr/lib/systemd/system/corosync.service
    /usr/sbin/corosync   -- main binary
    /usr/sbin/corosync-cfgtool
    /usr/sbin/corosync-cmapctl
    /usr/sbin/corosync-cpgtool
    /usr/sbin/corosync-keygen
    /usr/sbin/corosync-notifyd
    /usr/sbin/corosync-quorumtool
    /usr/share/corosync
    /usr/share/corosync/corosync
    /usr/share/corosync/corosync-notifyd
    /usr/share/corosync/xml2conf.xsl
    /usr/share/doc/corosync-2.4.5
    /usr/share/doc/corosync-2.4.5/LICENSE
    /usr/share/doc/corosync-2.4.5/SECURITY
    /usr/share/man/man5/corosync.conf.5.gz
    /usr/share/man/man5/corosync.xml.5.gz
    /usr/share/man/man5/votequorum.5.gz
    /usr/share/man/man8/cmap_keys.8.gz
    /usr/share/man/man8/corosync-blackbox.8.gz
    /usr/share/man/man8/corosync-cfgtool.8.gz
    /usr/share/man/man8/corosync-cmapctl.8.gz
    /usr/share/man/man8/corosync-cpgtool.8.gz
    /usr/share/man/man8/corosync-keygen.8.gz
    /usr/share/man/man8/corosync-notifyd.8.gz
    /usr/share/man/man8/corosync-quorumtool.8.gz
    /usr/share/man/man8/corosync-xmlproc.8.gz
    /usr/share/man/man8/corosync.8.gz
    /usr/share/man/man8/corosync_overview.8.gz
    /usr/share/snmp/mibs/COROSYNC-MIB.txt
    /var/lib/corosync -- data directory
    /var/log/cluster -- log directory
    
    
  • [ONE] Authenticate the cluster nodes (run on any one node)

    In this example the command is run on node0

    [root@node0 corosync]# pcs  cluster auth node0 node1 node2 -u hacluster -p hacluster --force
    node1: Authorized
    node0: Authorized
    node2: Authorized
    
  • [ONE] Generate the corosync configuration file (can be run on any node)

    [root@node2 corosync]#
    [root@node2 corosync]# pcs cluster setup --name cluster_test01 node0 node1 node2
    Destroying cluster on nodes: node0, node1, node2...
    node2: Stopping Cluster (pacemaker)...
    node0: Stopping Cluster (pacemaker)...
    node1: Stopping Cluster (pacemaker)...
    node1: Successfully destroyed cluster
    node0: Successfully destroyed cluster
    node2: Successfully destroyed cluster
    
    Sending 'pacemaker_remote authkey' to 'node0', 'node1', 'node2'
    node0: successful distribution of the file 'pacemaker_remote authkey'
    node2: successful distribution of the file 'pacemaker_remote authkey'
    node1: successful distribution of the file 'pacemaker_remote authkey'
    Sending cluster config files to the nodes...
    node0: Succeeded
    node1: Succeeded
    node2: Succeeded
    
    Synchronizing pcsd certificates on nodes node0, node1, node2...
    node1: Success
    node0: Success
    node2: Success
    Restarting pcsd on the nodes in order to reload the certificates...
    node1: Success
    node0: Success
    node2: Success
    
    [root@node0 corosync]#  ll /etc/corosync/corosync.conf
    -rw-r--r--. 1 root root 435 Jul 19 23:39 /etc/corosync/corosync.conf
    [root@node0 corosync]#
    [root@node1 corosync]# ll /etc/corosync/corosync.conf
    -rw-r--r--. 1 root root 435 Jul 19 23:39 /etc/corosync/corosync.conf
    [root@node2 corosync]#  ll /etc/corosync/corosync.conf
    -rw-r--r--. 1 root root 435 Jul 19 23:39 /etc/corosync/corosync.conf
    
    
  • Start the cluster nodes

    • Start node1 only
    [root@node0 corosync]# pcs cluster start node1
    node1: Starting Cluster (corosync)...
    node1: Starting Cluster (pacemaker)...
    [root@node0 corosync]#
    [root@node1 corosync]# ps -ef |grep coro
    root      10691      1  8 23:42 ?        00:00:00 corosync
    root      10716   9461  0 23:42 pts/1    00:00:00 grep --color=auto coro
    [root@node1 corosync]# ps -ef |grep pace
    root      10706      1  1 23:42 ?        00:00:00 /usr/sbin/pacemakerd -f
    haclust+  10707  10706  1 23:42 ?        00:00:00 /usr/libexec/pacemaker/cib
    root      10708  10706  0 23:42 ?        00:00:00 /usr/libexec/pacemaker/stonithd
    root      10709  10706  0 23:42 ?        00:00:00 /usr/libexec/pacemaker/lrmd
    haclust+  10710  10706  0 23:42 ?        00:00:00 /usr/libexec/pacemaker/attrd
    haclust+  10711  10706  0 23:42 ?        00:00:00 /usr/libexec/pacemaker/pengine
    haclust+  10712  10706  0 23:42 ?        00:00:00 /usr/libexec/pacemaker/crmd
    root      10718   9461  0 23:42 pts/1    00:00:00 grep --color=auto pace
    [root@node1 corosync]#
    [root@node1 corosync]#
    [root@node1 corosync]#
    
    
    • Check the node status

      [root@node0 corosync]# pcs cluster status
      Error: cluster is not currently running on this node
      
      [root@node1 corosync]# pcs cluster status
      Cluster Status:
       Stack: corosync
       Current DC: node1 (version 1.1.21-4.el7-f14e36fd43) - partition WITHOUT quorum
       Last updated: Sun Jul 19 23:46:47 2020
       Last change: Sun Jul 19 23:43:06 2020 by hacluster via crmd on node1
       3 nodes configured
       0 resources configured
      
      PCSD Status:
      
      
        node2: Online
        node1: Online
        node0: Online
      
      
    • Start all nodes

      [root@node0 corosync]# pcs cluster start --all
      node1: Starting Cluster (corosync)...
      node0: Starting Cluster (corosync)...
      node2: Starting Cluster (corosync)...
      node0: Starting Cluster (pacemaker)...
      node1: Starting Cluster (pacemaker)...
      node2: Starting Cluster (pacemaker)...
      
      [root@node0 corosync]# pcs status
      Cluster name: cluster_test01
      
      WARNINGS:
      No stonith devices and stonith-enabled is not false
      
      Stack: corosync
      Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
      Last updated: Sun Jul 19 23:55:09 2020
      Last change: Sun Jul 19 23:47:47 2020 by hacluster via crmd on node1
      
      3 nodes configured
      0 resources configured
      
      Online: [ node0 node2 ]
      OFFLINE: [ node1 ]
      
      No resources
      
      
      Daemon Status:
        corosync: active/disabled
        pacemaker: active/disabled
      
      
  • Fix the STONITH warning

    Since no fence devices exist in this test environment, disable STONITH (not recommended in production, especially with shared data):

    WARNINGS:
    No stonith devices and stonith-enabled is not false
    [root@node0 corosync]# pcs property set stonith-enabled=false
    [root@node0 corosync]# pcs status
    Cluster name: cluster_test01
    Stack: corosync
    Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
    Last updated: Sun Jul 19 23:59:13 2020
    Last change: Sun Jul 19 23:57:13 2020 by root via cibadmin on node0
    
    3 nodes configured
    0 resources configured
    
    Online: [ node0 node1 node2 ]
    
    No resources
    
    
    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled
    
    
  • Check the corosync status

    [root@node0 corosync]# pcs status corosync
    
    Membership information
    ----------------------
        Nodeid      Votes Name
             1          1 node0 (local)
             2          1 node1
             3          1 node2
    
    
  • Check the pacemaker processes

    [root@node0 corosync]# ps axf |grep pacemaker
      5003 pts/2    S+     0:00          \_ grep --color=auto pacemaker
      4792 ?        Ss     0:00 /usr/sbin/pacemakerd -f
      4793 ?        Ss     0:00  \_ /usr/libexec/pacemaker/cib
      4794 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
      4795 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd
      4796 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
      4797 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pengine
      4798 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd
    
    
  • Verify the configuration with crm_verify

    [root@node0 corosync]# pcs property set stonith-enabled=true
    [root@node0 corosync]# crm_verify -L -V
       error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
       error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
       error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
    Errors found during check: config not valid
    [root@node0 corosync]# pcs property set stonith-enabled=false
    [root@node0 corosync]#
    [root@node0 corosync]# crm_verify -L -V
    
    
  • Create the VIP resource

    [root@node0 corosync]# pcs resource create VIP ocf:heartbeat:IPaddr2 ip=192.168.0.75 cidr_netmask=32 op monitor interval=30s
    [root@node0 corosync]# pcs status
    Cluster name: cluster_test01
    Stack: corosync
    Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
    Last updated: Mon Jul 20 00:22:48 2020
    Last change: Mon Jul 20 00:22:38 2020 by root via cibadmin on node0
    
    3 nodes configured
    1 resource configured
    
    Online: [ node0 node1 node2 ]
    
    Full list of resources:
    
     VIP    (ocf::heartbeat:IPaddr2):       Started node0
    
    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled
    [root@node0 corosync]#
    [root@node0 corosync]#
    [root@node0 corosync]# ip ad list
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 00:0c:29:37:1b:18 brd ff:ff:ff:ff:ff:ff
        inet 192.168.0.70/24 brd 192.168.0.255 scope global noprefixroute eth0
           valid_lft forever preferred_lft forever
        inet 192.168.0.75/32 brd 192.168.0.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::4427:bd05:1cf9:1f4f/64 scope link tentative noprefixroute dadfailed
           valid_lft forever preferred_lft forever
        inet6 fe80::19de:291a:ae81:cfd7/64 scope link tentative noprefixroute dadfailed
           valid_lft forever preferred_lft forever
        inet6 fe80::4f88:fe38:1a5e:4b05/64 scope lin
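The output above shows the VIP 192.168.0.75/32 bound to eth0 on node0. It can also be verified from any other host on the 192.168.0.0/24 network:

```shell
ping -c 2 192.168.0.75    # answered by whichever node currently holds the VIP
```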
    

    Resource types supported by pacemaker

  • View the supported resource standards

    [root@node0 /]# pcs resource standards
    lsb
    ocf
    service
    systemd
    
    
  • View the available OCF resource providers

    [root@node0 /]# pcs resource providers
    heartbeat
    openstack
    pacemaker
    
    
  • View the agents available under a particular standard, e.g. under ocf:heartbeat

    [root@node0 /]# pcs resource agents ocf:heartbeat
    aliyun-vpc-move-ip
    apache
    aws-vpc-move-ip
    awseip
    awsvip
    azure-lb
    clvm
    conntrackd
    CTDB
    db2
    Delay
    dhcpd
    docker
    Dummy
    ethmonitor
    exportfs
    Filesystem
    galera
    garbd
    iface-vlan
    IPaddr
    IPaddr2
    IPsrcaddr
    iSCSILogicalUnit
    iSCSITarget
    LVM
    LVM-activate
    lvmlockd
    MailTo
    mysql
    nagios
    named
    nfsnotify
    nfsserver
    nginx
    NodeUtilization
    oraasm
    oracle
    oralsnr
    pgsql
    portblock
    postfix
    rabbitmq-cluster
    redis
    Route
    rsyncd
    SendArp
    slapd
    Squid
    sybaseASE
    symlink
    tomcat
    vdo-vol
    VirtualDomain
    Xinetd
    
    
  • Put a node into standby and back

    [root@node0 /]# pcs status
    Cluster name: cluster_test01
    Stack: corosync
    Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
    Last updated: Mon Jul 20 02:07:41 2020
    Last change: Mon Jul 20 00:22:38 2020 by root via cibadmin on node0
    
    3 nodes configured
    1 resource configured
    
    Online: [ node0 node1 node2 ]
    
    Full list of resources:
    
     VIP    (ocf::heartbeat:IPaddr2):       Started node0
    
    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled
    [root@node0 /]# pcs cluster standby node2
    [root@node0 /]#
    [root@node0 /]#
    [root@node0 /]# pcs status
    Cluster name: cluster_test01
    Stack: corosync
    Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
    Last updated: Mon Jul 20 02:07:53 2020
    Last change: Mon Jul 20 02:07:50 2020 by root via cibadmin on node0
    
    3 nodes configured
    1 resource configured
    
    Node node2: standby
    Online: [ node0 node1 ]
    
    Full list of resources:
    
     VIP    (ocf::heartbeat:IPaddr2):       Started node0
    
    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled
    [root@node0 /]# pcs cluster unstandby node2
    [root@node0 /]#
    [root@node0 /]#
    [root@node0 /]# pcs status
    Cluster name: cluster_test01
    Stack: corosync
    Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
    Last updated: Mon Jul 20 02:08:04 2020
    Last change: Mon Jul 20 02:08:02 2020 by root via cibadmin on node0
    
    3 nodes configured
    1 resource configured
    
    Online: [ node0 node1 node2 ]
    
    Full list of resources:
    
     VIP    (ocf::heartbeat:IPaddr2):       Started node0
    
    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled
    [root@node0 /]#
    
    
  • Restart a resource

    [root@node0 /]# pcs resource restart  VIP
    VIP successfully restarted
    
  • Clear the resource failure history

    [root@node0 /]# pcs resource cleanup
    Cleaned up all resources on all nodes
    
  • Ignore loss of quorum (keep resources running when the cluster cannot achieve quorum)

    [root@node0 /]# pcs property set no-quorum-policy=ignore
    [root@node0 /]# pcs  property list
    Cluster Properties:
     cluster-infrastructure: corosync
     cluster-name: cluster_test01
     dc-version: 1.1.21-4.el7-f14e36fd43
     have-watchdog: false
     no-quorum-policy: ignore
     stonith-enabled: false
    [root@node0 /]# pcs  property show
    Cluster Properties:
     cluster-infrastructure: corosync
     cluster-name: cluster_test01
     dc-version: 1.1.21-4.el7-f14e36fd43
     have-watchdog: false
     no-quorum-policy: ignore
     stonith-enabled: false
    
    
  • Enable the cluster services at boot

    • Before
    [root@node0 /]# pcs status
    Cluster name: cluster_test01
    Stack: corosync
    Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
    Last updated: Mon Jul 20 02:19:22 2020
    Last change: Mon Jul 20 02:17:39 2020 by root via cibadmin on node0
    
    3 nodes configured
    1 resource configured
    
    Online: [ node0 node1 node2 ]
    
    Full list of resources:
    
     VIP    (ocf::heartbeat:IPaddr2):       Started node0
    
    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled
    
    
    • Enable
    [root@node0 /]#
    [root@node0 /]# pcs cluster enable --all
    node0: Cluster Enabled
    node1: Cluster Enabled
    node2: Cluster Enabled
    
    
    • After
    [root@node0 /]# pcs status
    Cluster name: cluster_test01
    Stack: corosync
    Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
    Last updated: Mon Jul 20 02:22:06 2020
    Last change: Mon Jul 20 02:17:39 2020 by root via cibadmin on node0
    
    3 nodes configured
    1 resource configured
    
    Online: [ node0 node1 node2 ]
    
    Full list of resources:
    
     VIP    (ocf::heartbeat:IPaddr2):       Started node0
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    [root@node0 /]#
    
    

Example 1:

Highly available httpd on CentOS 7 with Corosync + Pacemaker + pcs

Install and start httpd

[ALL] Install on every node

yum -y install httpd
service httpd start

[root@node0 /]# service httpd start
Redirecting to /bin/systemctl start httpd.service
[root@node0 /]#
[root@node0 /]# service httpd status
Redirecting to /bin/systemctl status httpd.service
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-07-20 02:27:21 CST; 1min 16s ago
     Docs: man:httpd(8)
           man:apachectl(8)
 Main PID: 19333 (httpd)
   Status: "Total requests: 10; Current requests/sec: 0; Current traffic:   0 B/sec"
   CGroup: /system.slice/httpd.service
           ├─19333 /usr/sbin/httpd -DFOREGROUND
           ├─19334 /usr/sbin/httpd -DFOREGROUND
           ├─19335 /usr/sbin/httpd -DFOREGROUND
           ├─19337 /usr/sbin/httpd -DFOREGROUND
           ├─19338 /usr/sbin/httpd -DFOREGROUND
           ├─19387 /usr/sbin/httpd -DFOREGROUND
           ├─19388 /usr/sbin/httpd -DFOREGROUND
           ├─19389 /usr/sbin/httpd -DFOREGROUND
           ├─19390 /usr/sbin/httpd -DFOREGROUND
           ├─19391 /usr/sbin/httpd -DFOREGROUND
           └─19392 /usr/sbin/httpd -DFOREGROUND

Jul 20 02:27:21 node0 systemd[1]: Starting The Apache HTTP Server...
Jul 20 02:27:21 node0 httpd[19333]: AH00558: httpd: Could not reliably determine the server's fully qualif...ssage
Jul 20 02:27:21 node0 systemd[1]: Started The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.
[root@node0 /]#

Verify the HTTP service works

(browser screenshot not available)

[ALL] Enable the Apache server-status monitoring page

vim /etc/httpd/conf.d/status.conf
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from all
</Location>
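After restarting httpd, the status URL that the apache resource agent will later monitor can be checked locally (a quick sanity test):

```shell
systemctl restart httpd
curl -s http://localhost/server-status | head -n 5   # should print the Apache status page
```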

[ALL] Stop the httpd service. The cluster starts httpd itself when the resource is added; if the service is left running, adding the resource will fail with an error.

 systemctl stop httpd
 systemctl status httpd

Add the WebSite resource

Note: this time the command is run on node1

[root@node1 corosync]# pcs resource create WebSite ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=30s
[root@node1 corosync]#
[root@node1 corosync]# pcs status
Cluster name: cluster_test01
Stack: corosync
Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Jul 20 04:33:32 2020
Last change: Mon Jul 20 04:33:25 2020 by root via cibadmin on node1

3 nodes configured
2 resources configured

Online: [ node0 node1 node2 ]

Full list of resources:

 VIP    (ocf::heartbeat:IPaddr2):       Started node0
 WebSite        (ocf::heartbeat:apache):        Started node1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@node1 corosync]#

This created an httpd cluster resource named WebSite, currently running on node1, with the status page http://localhost/server-status checked every 30s. But there is now a new problem: the virtual IP is on node0 while the httpd resource is on node1, so clients cannot reach the service; and if the VIP is not present on any node, WebSite must not run at all.
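Besides keeping the two resources on the same node, it is usually also desirable to start them in a fixed order, so the address exists before Apache binds to it; a hedged sketch of an ordering constraint (not part of the original setup):

```shell
# Start VIP before WebSite (and stop WebSite before VIP)
pcs constraint order VIP then WebSite

# Review all configured constraints (pcs 0.9 syntax)
pcs constraint show
```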

Set the default resource operation timeout

[root@node1 corosync]# pcs resource op defaults timeout=120s
Warning: Defaults do not apply to resources which override them with their own defined values
[root@node1 corosync]# pcs resource op defaults
timeout=120s
[root@node1 corosync]#

Colocate the WebSite resource with the VIP so they always stay on the same node

[root@node1 corosync]#  pcs constraint colocation add WebSite with VIP INFINITY
[root@node1 corosync]#

(screenshot not available)

Test access from a browser

(browser screenshot not available)

Example 2

High availability for business services on CentOS 7 with Corosync + Pacemaker + pcs + HAProxy

  • 1. Delete the existing WebSite resource [ONE]

    [root@node1 corosync]# pcs resource delete WebSite
    Attempting to stop: WebSite... Stopped
    [root@node1 corosync]#
    
  • 2. Install haproxy [ALL]

    yum  -y  install haproxy
    
  • 3. Configure httpd to listen only on each node's own NIC address on port 80 [ALL]

    Change "Listen 80" to "Listen <NIC IP>:80"

    • node0
    grep -w 80 /etc/httpd/conf/httpd.conf
    sed -i  "/Listen[[:blank:]]80/c\ Listen 192.168.0.70:80" /etc/httpd/conf/httpd.conf
    systemctl restart httpd
    grep -w 80 /etc/httpd/conf/httpd.conf
    systemctl status httpd
    
    • node1
    grep -w 80 /etc/httpd/conf/httpd.conf
    sed -i  "/Listen[[:blank:]]80/c\ Listen 192.168.0.71:80" /etc/httpd/conf/httpd.conf
    systemctl restart httpd
    grep -w 80 /etc/httpd/conf/httpd.conf
    systemctl status httpd
    
    • node2
    grep -w 80 /etc/httpd/conf/httpd.conf
    sed -i  "/Listen[[:blank:]]80/c\ Listen 192.168.0.72:80" /etc/httpd/conf/httpd.conf
    systemctl restart httpd
    grep -w 80 /etc/httpd/conf/httpd.conf
    systemctl status httpd
    
  • 4. Configure haproxy [ALL]

    vim /etc/haproxy/haproxy.cfg
    append the following:
    
    #---------------------------------------------------------------------
    # listen httpd server
    #---------------------------------------------------------------------
    listen httpd_cluster
        bind 192.168.0.75:80    # bind the cluster VIP
        balance  roundrobin
        option  tcpka
        option  httpchk
        option  tcplog
        server node0 node0:80 check port 80 inter 2000 rise 2 fall 5
        server node1 node1:80 check port 80 inter 2000 rise 2 fall 5
        server node2 node2:80 check port 80 inter 2000 rise 2 fall 5
    
  • 5. Create the haproxy resource

    [root@node0 /]# pcs resource create haproxy systemd:haproxy op monitor interval="5s"
    [root@node0 /]#
    [root@node0 /]# pcs status
    Cluster name: cluster_test01
    Stack: corosync
    Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
    Last updated: Mon Jul 20 04:57:31 2020
    Last change: Mon Jul 20 04:57:27 2020 by root via cibadmin on node0
    
    3 nodes configured
    2 resources configured
    
    Online: [ node0 node1 node2 ]
    
    Full list of resources:
    
     VIP    (ocf::heartbeat:IPaddr2):       Started node0
     haproxy        (systemd:haproxy):      FAILED node1
    
    Failed Resource Actions:
    * haproxy_start_0 on node1 'unknown error' (1): call=37, status=complete, exitreason='',
        last-rc-change='Mon Jul 20 04:57:28 2020', queued=0ms, exec=2276ms
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    [root@node0 /]#
    [root@node0 /]# pcs status
    Cluster name: cluster_test01
    Stack: corosync
    Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
    Last updated: Mon Jul 20 04:58:39 2020
    Last change: Mon Jul 20 04:57:27 2020 by root via cibadmin on node0
    
    3 nodes configured
    2 resources configured
    
    Online: [ node0 node1 node2 ]
    
    Full list of resources:
    
     VIP    (ocf::heartbeat:IPaddr2):       Started node0
     haproxy        (systemd:haproxy):      Stopped
    
    Failed Resource Actions:
    * haproxy_start_0 on node2 'unknown error' (1): call=27, status=complete, exitreason='',
        last-rc-change='Mon Jul 20 04:57:33 2020', queued=0ms, exec=2242ms
    * haproxy_start_0 on node0 'unknown error' (1): call=51, status=complete, exitreason='',
        last-rc-change='Mon Jul 20 04:57:37 2020', queued=0ms, exec=2252ms
    * haproxy_start_0 on node1 'unknown error' (1): call=37, status=complete, exitreason='',
        last-rc-change='Mon Jul 20 04:57:28 2020', queued=0ms, exec=2276ms
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    [root@node0 /]#
    
    

    The resource was created and started, but with errors: the haproxy configuration binds the virtual IP, and on the nodes that do not currently hold the VIP that address does not exist, so haproxy fails to start there.
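As with the WebSite resource earlier, the usual fix is to tie haproxy to the VIP so it is only ever started on the node that actually holds the address it binds; a sketch:

```shell
pcs constraint colocation add haproxy with VIP INFINITY

# An alternative approach is to let every node bind non-local addresses:
#   echo 'net.ipv4.ip_nonlocal_bind = 1' >> /etc/sysctl.conf && sysctl -p
```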

  • Clear the cluster errors

    [root@node0 ~]# pcs resource cleanup
    Cleaned up all resources on all nodes
    [root@node0 ~]# 
    
    
  • Verify the haproxy resource is running again

    [root@node0 ~]# pcs status
    Cluster name: cluster_test01
    Stack: corosync
    Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
    Last updated: Mon Jul 20 05:46:40 2020
    Last change: Mon Jul 20 05:46:34 2020 by root via crm_resource on node0
    
    3 nodes configured
    2 resources configured
    
    Online: [ node0 node1 node2 ]
    
    Full list of resources:
    
     VIP	(ocf::heartbeat:IPaddr2):	Started node0
     haproxy	(systemd:haproxy):	Started node0
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    
    
  • Stop node0 to simulate a node failure
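The stop command itself is not shown in the captures below; presumably something like:

```shell
pcs cluster stop node0    # stop corosync and pacemaker on node0 only
```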

    [root@node0 ~]# pcs status
    Cluster name: cluster_test01
    Stack: corosync
    Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
    Last updated: Mon Jul 20 05:46:40 2020
    Last change: Mon Jul 20 05:46:34 2020 by root via crm_resource on node0
    
    3 nodes configured
    2 resources configured
    
    Online: [ node0 node1 node2 ]
    
    Full list of resources:
    
     VIP	(ocf::heartbeat:IPaddr2):	Started node0
     haproxy	(systemd:haproxy):	Started node0
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    
    
    
    [root@node1 ~]# pcs status
    Cluster name: cluster_test01
    Stack: corosync
    Current DC: node0 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
    Last updated: Mon Jul 20 05:47:28 2020
    Last change: Mon Jul 20 05:46:34 2020 by root via crm_resource on node0
    
    3 nodes configured
    2 resources configured
    
    Online: [ node0 node1 node2 ]
    
    Full list of resources:
    
     VIP	(ocf::heartbeat:IPaddr2):	Started node0
     haproxy	(systemd:haproxy):	Stopping node0
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    [root@node1 ~]# pcs status
    Cluster name: cluster_test01
    Stack: corosync
    Current DC: node2 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
    Last updated: Mon Jul 20 05:47:35 2020
    Last change: Mon Jul 20 05:46:34 2020 by root via crm_resource on node0
    
    3 nodes configured
    2 resources configured
    
    Online: [ node1 node2 ]
    OFFLINE: [ node0 ]
    
    Full list of resources:
    
     VIP	(ocf::heartbeat:IPaddr2):	Started node1
     haproxy	(systemd:haproxy):	Started node1
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    
    
  • Access the web service

    (screenshot not available)

(screenshot not available)
