High Availability: corosync + pacemaker

The evolution of server architecture

1. C/S vs. B/S architecture

C/S architecture (higher development cost and difficulty)
The client talks to the server directly:
the client sends a request to the server,
and the server responds after receiving it.

Test:
Environment:

the physical host acts as the client
server1 acts as the server

1. Install the httpd service on the physical host and on server1, and enable it at boot

yum install httpd -y
systemctl start httpd
systemctl enable httpd

2. Write content into server1's default document root, then check from the client that it is visible

[root@server1 ~]# mkdir /var/www/html -p
[root@server1 ~]# echo www.westos.org-vm1 > /var/www/html/index.html
[root@foundation4 ~]# curl 172.25.4.1
www.westos.org-vm1                    #the content of server1's default document root is visible

B/S architecture (more common at internet companies; easy to use, low development cost)
The client's browser sends an HTTP request to the web server; the web server queries the database server;
the database server returns the result to the web server, which sends the HTTP response back to the client.

Dynamic languages need a connection to the database.

2. High-availability (HA) architecture
Its purpose is to eliminate single points of failure.
With server1 and server2 in active/standby, requests hit only one node (here server2) at a time.

A high-availability cluster means that when a node or server fails, another node automatically and immediately takes over: the resources on the failed node are moved to a surviving node, which then continues to provide the service. In other words, when a single node fails, resources and services are switched over automatically so the service stays online, and the whole process is transparent to the client.
If a system can always provide service, we say its availability is 100%.
If, out of every 100 units of running time, the system is unavailable for 1 unit, its availability is 99%.
Many companies aim for four nines, i.e. 99.99%, which means a yearly downtime of about 52.6 minutes (at three nines, 99.9%, yearly downtime is already 8.76 hours).
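The arithmetic behind these figures is just hours-per-year times the unavailable fraction. A quick sketch (assuming a non-leap year: 8760 hours, 525600 minutes):

```shell
#!/bin/sh
# Yearly downtime for a few availability targets.
# downtime = hours_per_year * (1 - availability)
for a in 0.99 0.999 0.9999; do
    awk -v a="$a" 'BEGIN {
        printf "%s -> %.2f hours (%.1f minutes) of downtime per year\n",
               a, 8760 * (1 - a), 525600 * (1 - a)
    }'
done
```

For 0.9999 this prints 0.88 hours (52.6 minutes); 8.76 hours corresponds to 0.999, i.e. three nines.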

Scaling
From the hardware side:

vertical scaling: give vm1 more memory
horizontal scaling: add a vm2 and round-robin between the two; scale out

From the software side:
high availability

Operating systems:
RHEL > OEL > CentOS
Red Hat: open source; what you pay for is the support. Deployed on core nodes; stable.
OEL: Oracle Enterprise Linux, from Oracle, a top global vendor.
CentOS: less stable; completely free and community-maintained, so when something breaks nobody is obliged to help you.

DNS only does name resolution; it has no idea when one of the machines goes down.
We need another machine to take over when one fails.

corosync + pacemaker for high availability

Hot standby (active/standby):

Hot standby is a general concept that can be applied to all kinds of devices: layer-3 switches, routers, firewalls, servers, and so on. Deploying just one device leaves a single point of failure, so two are deployed, one active and one standby; when one fails, the other automatically takes over, keeping the service uninterrupted. That is hot standby.

Environment:
client: 172.25.4.250
server1: 172.25.4.1
server2: 172.25.4.2
Tools:
corosync: heartbeat / cluster messaging
pacemaker: cluster resource manager

Creating the cluster

Bring up server2, install httpd and enable it at boot

yum install httpd -y
systemctl start httpd
systemctl enable httpd

Edit the default document root

[root@server2 ~]# mkdir /var/www/html -p
[root@server2 ~]# echo www.westos.org-vm2 > /var/www/html/index.html

Add the HighAvailability and ResilientStorage add-on repositories to the yum configuration (on both server1 and server2)

[kiosk@foundation4 Desktop]$ cd /var/www/html/iso/
[kiosk@foundation4 iso]$ ls
addons            GPL       media.repo               RPM-GPG-KEY-redhat-release
EFI               images    Packages                 TRANS.TBL
EULA              isolinux  repodata
extra_files.json  LiveOS    RPM-GPG-KEY-redhat-beta
[kiosk@foundation4 iso]$ cd addons/
[kiosk@foundation4 addons]$ ls
HighAvailability  ResilientStorage
[root@server1 ~]# vim /etc/yum.repos.d/dvd.repo
[dvd]
name=rhel7.6
baseurl=http://172.25.4.250/iso
gpgcheck=0

[HighAvailability]           #high availability add-on
name=HighAvailability
baseurl=http://172.25.4.250/iso/addons/HighAvailability
gpgcheck=0

[ResilientStorage]          #resilient storage add-on
name=ResilientStorage
baseurl=http://172.25.4.250/iso/addons/ResilientStorage
gpgcheck=0
[root@server1 ~]# yum repolist
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
HighAvailability                                         | 4.3 kB     00:00     
ResilientStorage                                         | 4.3 kB     00:00     
dvd                                                      | 4.3 kB     00:00     
(1/4): HighAvailability/group_gz                           | 3.5 kB   00:00     
(2/4): HighAvailability/primary_db                         |  33 kB   00:00     
(3/4): ResilientStorage/group_gz                           | 5.1 kB   00:00     
(4/4): ResilientStorage/primary_db                         |  39 kB   00:00     
repo id                              repo name                            status
HighAvailability                     HighAvailability                        51
ResilientStorage                     ResilientStorage                        56
dvd                                  rhel7.6                              5,152
repolist: 5,259

It is recommended to give the two servers simple hostnames, e.g. server1.
Set up passwordless SSH (in both directions).
Turn off firewalld and SELinux.

[root@server1 ~]# ssh-keygen
[root@server1 ~]# ssh-copy-id server2

Install pacemaker, corosync and pcs (the pcs package provides the pcsd daemon) on server1 and server2

yum install pacemaker corosync pcs -y
systemctl start pcsd.service
systemctl enable pcsd.service

Set a password for the hacluster user on server1 and server2 (it must be identical on both)

[root@server1 ~]# echo westos | passwd --stdin hacluster
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
[root@server1 ~]# yum install -y bash-*              #pulls in bash-completion so pcs subcommands tab-complete

pcs: the cluster management tool

[root@server1 ~]# pcs cluster auth server1 server2         #authenticate to server1 and server2
Username: hacluster
Password: 
server1: Authorized
server2: Authorized
[root@server1 ~]# pcs cluster setup --name mycluster server1 server2    #create a cluster named mycluster from server1 and server2

The platform is now set up!

[root@server1 ~]# pcs cluster start --all            #start the corosync+pacemaker stack on both nodes, server1 and server2
server1: Starting Cluster (corosync)...
server2: Starting Cluster (corosync)...
server2: Starting Cluster (pacemaker)...
server1: Starting Cluster (pacemaker)...
[root@server1 ~]# pcs status                       #check the cluster status
Cluster name: mycluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Stack: corosync
Current DC: server1 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Tue Aug  4 16:47:51 2020
Last change: Tue Aug  4 15:54:43 2020 by hacluster via crmd on server1

2 nodes configured
0 resources configured

Online: [ server1 server2 ]

No resources


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled    #corosync and pacemaker are not yet enabled at boot
  pcsd: active/enabled

Adding resources to the cluster

[root@server1 ~]# pcs property set stonith-enabled=false     #disable fencing (stonith)
[root@server1 ~]# crm_verify -LV        #validate the cluster configuration; it only passes once stonith is disabled
[root@server1 ~]# pcs resource create vip ocf:heartbeat:IPaddr2 ip=172.25.4.100 op monitor interval=30s
                                                      #add a resource named vip with IP 172.25.4.100, checked every 30s
                                                      #ocf is the resource class, heartbeat the provider within it
                                                      #op: operation   monitor: the monitor action   interval: how often it runs
[root@server1 ~]# systemctl enable --now corosync.service
[root@server1 ~]# systemctl enable --now pacemaker.service
[root@server1 ~]# pcs status             #check the cluster status again
Cluster name: mycluster
Stack: corosync
Current DC: server1 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Tue Aug  4 17:00:09 2020
Last change: Tue Aug  4 16:57:38 2020 by root via cibadmin on server1

2 nodes configured
1 resource configured

Online: [ server1 server2 ]

Full list of resources:

 vip	(ocf::heartbeat:IPaddr2):	Started server1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

测试:

[root@server1 ~]# ip addr
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:5a:95:fe brd ff:ff:ff:ff:ff:ff
    inet 172.25.4.1/24 brd 172.25.4.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 172.25.4.100/24 brd 172.25.4.255 scope global secondary eth0        #the VIP is on server1
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe5a:95fe/64 scope link 
       valid_lft forever preferred_lft forever

[root@server2 ~]# ip addr
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:42:b0:86 brd ff:ff:ff:ff:ff:ff
    inet 172.25.4.2/24 brd 172.25.4.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe42:b086/64 scope link 
       valid_lft forever preferred_lft forever

Configure name resolution on the client

[root@foundation4 ~]# vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

172.25.4.100    www.westos.com
172.25.4.1      server1   controller
172.25.4.2      server2

Browser test: visit www.westos.com
It shows: www.westos.org-vm1

Service failure:
The vip resource was created with: pcs resource create vip ocf:heartbeat:IPaddr2 ip=172.25.4.100 op monitor interval=30s
If the cluster stack is stopped on one server, the VIP moves to the other one.
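One way to watch the failover from the client's point of view is to poll the VIP in a loop while stopping the cluster on the active node (a sketch; the VIP 172.25.4.100 and the page contents are from the setup above):

```shell
#!/bin/sh
# Poll the VIP once per second. During a failover you should see at most
# a short run of "no response" lines before the surviving node answers.
VIP=172.25.4.100
while true; do
    printf '%s ' "$(date +%T)"
    curl -s --connect-timeout 1 "http://$VIP/" || echo "no response"
    sleep 1
done
```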

[root@server1 ~]# pcs cluster stop server1          #stop the cluster on server1
server1: Stopping Cluster (pacemaker)...
server1: Stopping Cluster (corosync)...
[root@server2 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: server2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Tue Aug  4 17:11:51 2020
Last change: Tue Aug  4 16:57:37 2020 by root via cibadmin on server1

2 nodes configured
1 resource configured

Online: [ server2 ]
OFFLINE: [ server1 ]             #server1 is offline

Full list of resources:

 vip	(ocf::heartbeat:IPaddr2):	Started server2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@server2 ~]# ip addr                         #the VIP has moved to server2
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:42:b0:86 brd ff:ff:ff:ff:ff:ff
    inet 172.25.4.2/24 brd 172.25.4.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 172.25.4.100/24 brd 172.25.4.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe42:b086/64 scope link 
       valid_lft forever preferred_lft forever
[root@server1 ~]# pcs cluster start server1             #start the cluster service on server1 again
server1: Starting Cluster (corosync)...
server1: Starting Cluster (pacemaker)...
[root@server1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: server2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Tue Aug  4 17:13:50 2020
Last change: Tue Aug  4 16:57:37 2020 by root via cibadmin on server1

2 nodes configured
1 resource configured

Online: [ server1 server2 ]

Full list of resources:

 vip	(ocf::heartbeat:IPaddr2):	Started server2              #but server1 does not take the resource back

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Active/standby wastes resources: of the two servers, only one is actually in use.
When both nodes have identical resources, failing back should be avoided; it only makes sense to switch back if server1 has better hardware than server2,
because every switch costs something.
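Whether a recovered node pulls resources back depends on constraints and on resource stickiness; setting an explicit default stickiness makes the stay-put behavior above deliberate. A hedged sketch, the value 100 is only an example:

```shell
# A positive stickiness means a running resource prefers to stay put,
# so a node that comes back online does not trigger a needless failback.
pcs resource defaults resource-stickiness=100
pcs resource defaults          # show the recorded defaults
```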

Network failure:
(simulating a broken NIC)

[root@server2 ~]# ip addr del 172.25.4.100/24 dev eth0             #delete the VIP from server2 by hand; the monitor detects this and restores it
[root@server2 ~]# ip addr
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:42:b0:86 brd ff:ff:ff:ff:ff:ff
    inet 172.25.4.2/24 brd 172.25.4.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe42:b086/64 scope link 
       valid_lft forever preferred_lft forever
[root@server2 ~]# ip addr        #30 seconds later (one monitor interval)
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:42:b0:86 brd ff:ff:ff:ff:ff:ff
    inet 172.25.4.2/24 brd 172.25.4.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 172.25.4.100/24 brd 172.25.4.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe42:b086/64 scope link 
       valid_lft forever preferred_lft forever
[root@server2 ~]# pcs resource create apache systemd:httpd op monitor interval=1min   #add an apache resource driving the systemd httpd unit, monitored every minute
[root@server1 ~]# systemctl status httpd.service 
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
[root@server2 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: server2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Tue Aug  4 17:34:08 2020
Last change: Tue Aug  4 17:33:51 2020 by root via cibadmin on server2

2 nodes configured
2 resources configured

Online: [ server1 server2 ]

Full list of resources:

 vip	(ocf::heartbeat:IPaddr2):	Started server2          #the VIP is on server2
 apache	(systemd:httpd):	Started server1                  #httpd is on server1, so clients cannot reach the site

Failed Actions:
* vip_monitor_30000 on server2 'not running' (7): call=7, status=complete, exitreason='',
    last-rc-change='Tue Aug  4 17:17:04 2020', queued=0ms, exec=0ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Browser test: no response, because the VIP and httpd are on different nodes.

Solution:

[root@server2 ~]# pcs resource group add webgroup vip apache      #add the resources as a group; group members stay on one node and start in the listed order
[root@server2 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: server2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Tue Aug  4 17:36:53 2020
Last change: Tue Aug  4 17:36:50 2020 by root via cibadmin on server2

2 nodes configured
2 resources configured

Online: [ server1 server2 ]

Full list of resources:

 Resource Group: webgroup
     vip	(ocf::heartbeat:IPaddr2):	Started server2
     apache	(systemd:httpd):	Starting server2

Failed Actions:
* vip_monitor_30000 on server2 'not running' (7): call=7, status=complete, exitreason='',
    last-rc-change='Tue Aug  4 17:17:04 2020', queued=0ms, exec=0ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
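A resource group is the simplest fix; the same placement and ordering can also be expressed with explicit constraints, which is handy when the resources should not live in one group (a sketch, not run in the session above):

```shell
# Keep apache on whichever node holds vip, and start vip before apache.
pcs constraint colocation add apache with vip INFINITY
pcs constraint order vip then apache
pcs constraint show            # list the configured constraints
```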
