High Availability in Practice: Load Balancing and Keepalived + VIP

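In the early days of the internet, computers were far less widespread, business logic was simple, and concurrency was low, so a monolithic application was usually enough to carry the load. As internet usage took off and concurrency grew, single-instance services came under heavy pressure. The common responses are to **scale up the server** (disk, memory, CPU) or to **deploy a cluster**. A single machine cannot be given resources without limit, and the return on each added resource falls off sharply. A cluster, in turn, needs a gateway in front of it to route traffic, and that gateway layer still has to cope with high concurrency and with being a single point of failure. Using the `Nginx` reverse proxy as the example, this article covers the **load balancing algorithms** used at the gateway layer (the traffic gateway, not an application gateway such as `springcloud gateway`) and a high-availability setup based on **Keepalived+VIP**.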
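## 1. What Is Load Balancing
Load balancing means distributing load (units of work) evenly across multiple operating units. Put simply, the load balancer acts as the single entry point for traffic and dispatches requests to the application instances deployed on multiple machines behind it.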
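## 2. Types of Load Balancing
By hardware or software:
- **Hardware load balancing**: built on `ASIC`s, with high performance but high cost; F5 is a common example.
- **Software load balancing**: for example the reverse proxy `Nginx`; inexpensive and well suited to small and medium-sized companies.

By network layer:
- **Layer 4 load balancing**: forwards at the transport layer and keeps the client on the same TCP connection; high performance. Example: `LVS`.
- **Layer 7 load balancing**: works on application-layer protocols, so it offers richer features but lower performance than layer 4. Example: `Nginx`.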
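## 3. Load Balancing Algorithms
Commonly used load balancing algorithms include:
-   **Round Robin**: each client request is handed to the backend servers in turn, cycling continuously. This suits clusters whose servers have roughly the same hardware and software configuration.
-   **Weighted Round Robin**: similar to round robin, but each server is assigned a weight according to its processing capacity, so requests are spread across the backend servers in proportion. For example, if servers A, B, and C are given weights 1, 2, and 2, they receive 20%, 40%, and 40% of the requests respectively. This suits clusters with uneven server configurations.
-   **Random**: client requests are assigned to backend servers at random. In theory, with a large enough number of requests the distribution becomes reasonably even.
-   **Consistent Hashing**: builds a hash ring and uses some data from the request as the key (a MAC or IP address, or something from a higher layer such as parameters in the HTTP message) to compute which node the request lands on. To spread requests evenly and to keep them resolvable to a node after another node goes down, each physical node is mapped to multiple virtual nodes on the ring.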
- ...
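
To make the weighted variant concrete, here is a minimal Bash sketch of a "smooth" weighted round-robin picker using the weights 1, 2, 2 from the example above. The function name `pick_server` and the variable names are illustrative assumptions, not part of any load balancer's API.

```bash
#!/usr/bin/env bash
# Smooth weighted round-robin sketch (illustrative only).
servers=(A B C)
weights=(1 2 2)
current=(0 0 0)   # running score per server

# Sets the global variable "picked" to the chosen server name.
pick_server() {
  local total=0 best=0 i
  for i in "${!servers[@]}"; do
    # Each round, every server gains its own weight.
    current[i]=$(( current[i] + weights[i] ))
    total=$(( total + weights[i] ))
    if (( current[i] > current[best] )); then
      best=$i
    fi
  done
  # The winner pays back the sum of all weights.
  current[best]=$(( current[best] - total ))
  picked=${servers[best]}
}

sequence=()
for _ in 1 2 3 4 5; do
  pick_server
  sequence+=("$picked")
done
echo "${sequence[*]}"   # prints: B C A B C
```

Over one full cycle of five picks, A is chosen once and B and C twice each, which matches the 20% / 40% / 40% split, and the picks for the heavier servers are interleaved rather than sent back to back.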
## 4. Keepalived + VIP + DNS Round-Robin Setup
The deployment environment is as follows:

| VIP | Internal IP | Hostname | Nginx port |
| --- | --- | --- | --- |
| 192.168.16.11 | 192.168.16.16 | keepalive-nginx-1 | 8031 |
| 192.168.16.11 | 192.168.16.17 | keepalive-nginx-2 | 8031 |

Reference architecture:

[Figure 1: reference architecture diagram]

**1. Install nginx**
- Add `nginx` to the `yum` repositories
```
rpm -Uvh http://nginx.org/packages/centos/7/noarch/RPMS/nginx-release-centos-7-0.el7.ngx.noarch.rpm
```
- Install `nginx`
`yum -y install nginx`
- Verify the installation
```
[root@localhost ~]# nginx -v
nginx version: nginx/1.20.2
```
- Configure the Nginx port

```
vi /etc/nginx/conf.d/default.conf

# 192.168.16.16
server {
    listen       8031; # change the default port to 8031
    server_name  localhost;
    ...
}

# 192.168.16.17
server {
    listen       8031; # change the default port to 8031
    server_name  localhost;
    ...
}
```

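- Start Nginx and enable it at boot
`systemctl start nginx && systemctl enable nginx`

If you get a permission error, disable SELinux:

Edit `/etc/selinux/config` with `vi` and change `SELINUX=enforcing` to `SELINUX=disabled` (the change takes effect after a reboot).
- Check the Nginx status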
```
[root@localhost ~]# systemctl status nginx
● nginx.service - nginx - high performance web server
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2022-04-27 15:54:19 CST; 1min 40s ago
     Docs: http://nginx.org/en/docs/
 Main PID: 1050 (nginx)
   CGroup: /system.slice/nginx.service
           ├─1050 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
           ├─1051 nginx: worker process
           ├─1052 nginx: worker process
           ├─1053 nginx: worker process
           └─1054 nginx: worker process

Apr 27 15:54:19 localhost.localdomain systemd[1]: Starting nginx - high performance web server...
Apr 27 15:54:19 localhost.localdomain systemd[1]: Started nginx - high performance web server.
```
- Verify the page

```
C:\Users\86189>curl http://192.168.16.11:8031/

Welcome to nginx!

Welcome to nginx!

If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.

For online documentation and support please refer to
nginx.org.

Commercial support is available at
nginx.com.

Thank you for using nginx.
```

The `nginx` welcome page is returned successfully.

**2. Install keepalived**
- Download Keepalived
`wget https://www.keepalived.org/software/keepalived-2.1.5.tar.gz`
- Install Keepalived
```
# Install build dependencies
$ yum -y install gcc-c++
$ yum -y install openssl-devel
# Build and install keepalived
$ tar -xvzf keepalived-2.1.5.tar.gz
$ cd keepalived-2.1.5
$ ./configure --prefix=/usr/local/keepalived
$ make && make install
```
- Set up the Keepalived configuration files
```
# Create the /etc/keepalived directory
$ mkdir /etc/keepalived 
$ cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf
$ cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/keepalived
```
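- Update the EnvironmentFile setting
Edit the unit file with `vi /lib/systemd/system/keepalived.service`, then run `systemctl daemon-reload` so that systemd picks up the change.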
```
[Unit]
Description=LVS and VRRP High Availability Monitor
After=network-online.target syslog.target
Wants=network-online.target

[Service]
Type=forking
PIDFile=/run/keepalived.pid
KillMode=process
# Changed to point at /etc/sysconfig/keepalived
EnvironmentFile=-/etc/sysconfig/keepalived
ExecStart=/usr/local/keepalived/sbin/keepalived $KEEPALIVED_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
```

- Configure keepalived

On both machines, run `vi /etc/keepalived/keepalived.conf`.

Use `ip addr` to look up the network interface details:

```
# 192.168.16.16
[root@localhost app]# ip addr
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:32:f8:bd brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.16/24 brd 192.168.16.255 scope global noprefixroute ens3
       valid_lft forever preferred_lft forever
    inet 192.168.16.11/32 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f219:afff:106:5f5f/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
       
# 192.168.16.17
[root@localhost app]# ip addr
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:d0:71:85 brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.17/24 brd 192.168.16.255 scope global noprefixroute ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::d296:dcd5:28ce:e88a/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
```


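Master 192.168.16.16 configuration:
```
# Host 192.168.16.16
# Global definitions: global configuration options
global_defs {
  # Addresses that keepalived emails when a state transition occurs
  # Sending mail from keepalived_notify.sh is recommended instead
  notification_email {
    [email protected]
  }
  notification_email_from [email protected] # sender address for notification emails
  smtp_server 192.168.200.1 # SMTP server used to send email
  smtp_connect_timeout 30 # SMTP connection timeout, in seconds
  router_id nginx-16-1 # machine identifier, usually the hostname
  vrrp_skip_check_adv_addr # skip the address check when an advert comes from the same router as the previous one; by default the check is not skipped
  vrrp_garp_interval 0 # delay in seconds between gratuitous ARP messages sent on an interface; default 0
  vrrp_gna_interval 0 # delay in seconds between unsolicited NA messages sent on an interface; default 0
}
# Health-check script
vrrp_script checkhaproxy
{
  script "/etc/keepalived/check_nginx.sh" # path of the check script
  interval 5 # check interval, in seconds
  weight 0 # priority adjustment applied from the script result; 0 leaves the instance priority unchanged
}
# VRRP instance
vrrp_instance VI_1 {
  state BACKUP  # initial state: backup
  interface ens3 # interface the VIP is bound to, e.g. ens3
  virtual_router_id 51  # VRID of the cluster; nodes backing each other up must use the same value
  nopreempt               # non-preemptive mode; only valid on nodes whose state is BACKUP
  priority 100 # priority, range 1~254; higher wins, and the highest becomes master
  advert_int 1 # VRRP advertisement interval in seconds; must match on both nodes; default 1
  # Authentication; must be identical on both nodes
  authentication {
    auth_type PASS # authentication type, either PASS or AH
    auth_pass 1111 # authentication password
  }
  unicast_src_ip 192.168.16.16  # internal IP address of this node
  unicast_peer {
    192.168.16.17             # IP address of the peer node
  }
  # VIP: added when the node becomes master, removed when it becomes backup
  virtual_ipaddress {
    192.168.16.11 # the highly available virtual IP; on a Tencent Cloud CVM, use the HAVIP address requested in the console
  }
  # Check script to run
  track_script {
    checkhaproxy
  }
  notify_master "/etc/keepalived/keepalived_notify.sh MASTER" # script run on transition to master
  notify_backup "/etc/keepalived/keepalived_notify.sh BACKUP" # script run on transition to backup
  notify_fault "/etc/keepalived/keepalived_notify.sh FAULT" # script run on transition to fault
  notify_stop "/etc/keepalived/keepalived_notify.sh STOP" # script run when keepalived stops
  garp_master_delay 1    # delay in seconds before refreshing ARP caches after becoming master
  garp_master_refresh 5   # interval in seconds at which the master resends gratuitous ARP
  # Tracked interfaces: if any of them fails, the instance enters the FAULT state
  track_interface {
    ens3
  }
}
```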
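Backup 192.168.16.17 configuration:
```
# Host 192.168.16.17
# Global definitions: global configuration options
global_defs {
  # Addresses that keepalived emails when a state transition occurs
  # Sending mail from keepalived_notify.sh is recommended instead
  notification_email {
    [email protected]
  }
  notification_email_from [email protected] # sender address for notification emails
  smtp_server 192.168.200.1 # SMTP server used to send email
  smtp_connect_timeout 30 # SMTP connection timeout, in seconds
  router_id nginx-17-2 # machine identifier, usually the hostname
  vrrp_skip_check_adv_addr # skip the address check when an advert comes from the same router as the previous one; by default the check is not skipped
  vrrp_garp_interval 0 # delay in seconds between gratuitous ARP messages sent on an interface; default 0
  vrrp_gna_interval 0 # delay in seconds between unsolicited NA messages sent on an interface; default 0
}
# Health-check script
vrrp_script checkhaproxy
{
  script "/etc/keepalived/check_nginx.sh" # path of the check script
  interval 5 # check interval, in seconds
  weight 0 # priority adjustment applied from the script result; 0 leaves the instance priority unchanged
}
# VRRP instance
vrrp_instance VI_1 {
  state BACKUP  # initial state: backup
  interface ens3 # interface the VIP is bound to, e.g. ens3
  virtual_router_id 51  # VRID of the cluster; nodes backing each other up must use the same value
  nopreempt               # non-preemptive mode; only valid on nodes whose state is BACKUP
  priority 50 # priority, range 1~254; higher wins, and the highest becomes master
  advert_int 1 # VRRP advertisement interval in seconds; must match on both nodes; default 1
  # Authentication; must be identical on both nodes
  authentication {
    auth_type PASS # authentication type, either PASS or AH
    auth_pass 1111 # authentication password
  }
  unicast_src_ip 192.168.16.17  # internal IP address of this node
  unicast_peer {
    192.168.16.16             # IP address of the peer node
  }
  # VIP: added when the node becomes master, removed when it becomes backup
  virtual_ipaddress {
    192.168.16.11 # the highly available virtual IP; on a Tencent Cloud CVM, use the HAVIP address requested in the console
  }
  # Check script to run
  track_script {
    checkhaproxy
  }
  notify_master "/etc/keepalived/keepalived_notify.sh MASTER" # script run on transition to master
  notify_backup "/etc/keepalived/keepalived_notify.sh BACKUP" # script run on transition to backup
  notify_fault "/etc/keepalived/keepalived_notify.sh FAULT" # script run on transition to fault
  notify_stop "/etc/keepalived/keepalived_notify.sh STOP" # script run when keepalived stops
  garp_master_delay 1    # delay in seconds before refreshing ARP caches after becoming master
  garp_master_refresh 5   # interval in seconds at which the master resends gratuitous ARP
  # Tracked interfaces: if any of them fails, the instance enters the FAULT state
  track_interface {
    ens3
  }
}
```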

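Define the check script and make it executable (`chmod +x /etc/keepalived/check_nginx.sh`):
`vi /etc/keepalived/check_nginx.sh`

```
#!/usr/bin/env bash

# Stop keepalived when the nginx PID file is gone, so the VIP fails over to the peer.
NGINXPID="/run/nginx.pid"
if [ ! -f "$NGINXPID" ]; then
   killall keepalived
fi

```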
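
The script above only tests whether the PID file exists, so a stale file left behind by a crashed nginx would still pass. As a hedged alternative, the sketch below also checks that the recorded process is alive. The helper name `nginx_is_alive` is an assumption for illustration and is not part of the original setup.

```bash
#!/usr/bin/env bash
# Hypothetical stricter variant of the check (a sketch, not the original script).
# Succeeds only if the PID file exists AND the process it names is running.
nginx_is_alive() {
  local pid_file=$1 pid
  [ -f "$pid_file" ] || return 1
  pid=$(cat "$pid_file")
  [ -n "$pid" ] || return 1
  # Signal 0 sends nothing; it only tests that the process exists.
  kill -0 "$pid" 2>/dev/null
}

# Offline demo: a temporary PID file holding this shell's own PID.
tmp_pid=$(mktemp)
echo "$$" > "$tmp_pid"
nginx_is_alive "$tmp_pid" && echo "alive"
rm -f "$tmp_pid"
nginx_is_alive "$tmp_pid" || echo "not running"
```

In `check_nginx.sh` this could be used as `nginx_is_alive /run/nginx.pid || killall keepalived`, keeping the same failover behavior as the original script.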
Define the notification script (it also needs to be executable):

```
#!/usr/bin/env bash
# Use of this source code is governed by a MIT style
# license that can be found in the LICENSE file.

# /etc/keepalived/keepalived_notify.sh
log_file=/var/log/keepalived.log

iam::keepalived::mail() {
  # Email logic can be added here to alert promptly when keepalived changes state
  :
}

iam::keepalived::log() {
  echo "[$(date '+%Y-%m-%d %T')] $1" >> "${log_file}"
}

[ ! -d /var/keepalived/ ] && mkdir -p /var/keepalived/

case "$1" in
  "MASTER" )
    iam::keepalived::log "notify_master"
    ;;
  "BACKUP" )
    iam::keepalived::log "notify_backup"
    ;;
  "FAULT" )
    iam::keepalived::log "notify_fault"
    ;;
  "STOP" )
    iam::keepalived::log "notify_stop"
    ;;
  *)
    iam::keepalived::log "keepalived_notify.sh: state error!"
    ;;
esac

```
- Start Keepalived and enable it at boot

```
$ systemctl start keepalived
$ systemctl enable keepalived
```
- Check the Keepalived status

```
systemctl status keepalived
* keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-04-28 11:11:52 CST; 3s ago
  Process: 236527 ExecStart=/usr/local/keepalived/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 236528 (keepalived)
    Tasks: 3
   CGroup: /system.slice/keepalived.service
           |-236528 /usr/local/keepalived/sbin/keepalived -D
           |-236529 /usr/local/keepalived/sbin/keepalived -D
           `-236530 /usr/local/keepalived/sbin/keepalived -D

Apr 28 11:11:52 localhost.localdomain Keepalived_vrrp[236530]: (VI_1) Entering ...
Apr 28 11:11:52 localhost.localdomain Keepalived_vrrp[236530]: VRRP sockpool: [...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Gained...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Gained...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Gained...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Activa...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Activa...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Activa...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Activa...
Apr 28 11:11:52 localhost.localdomain Keepalived_healthcheckers[236529]: Activa...
Hint: Some lines were ellipsized, use -l to show in full.
```
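If the output shows `Active: active (running)`, the service is up.

- Configuration file walkthrough

The configuration file falls into roughly the following four parts.
1.  global_defs: global definitions, i.e. global configuration options.
2.  vrrp_script checkhaproxy: the check script configuration.
3.  vrrp_instance VI_1: the VRRP instance configuration.
4.  virtual_server: the LVS configuration. It is not needed unless LVS+Keepalived is used.
- Verify the virtual IP
Restart the keepalived service with `systemctl restart keepalived`, then run `ip addr` on both machines. You should see: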

```
# 192.168.16.16
2: ens3: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:82:f8:bd brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.16/24 brd 192.168.16.255 scope global noprefixroute ens3
       valid_lft forever preferred_lft forever
    inet 192.168.16.11/32 scope global ens3
       valid_lft forever preferred_lft forever

```
The master now has an additional virtual IP, `192.168.16.11`. If anything goes wrong, check the log with `tail -f /var/log/messages`.
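
Checking for the VIP by eye works, but it is easy to script. The following is a small sketch that reads `ip addr` output and reports whether the VIP is present. The helper name `holds_vip` is a hypothetical name chosen for this example, and the sample text is abbreviated from the output shown above.

```bash
#!/usr/bin/env bash
# Sketch: report whether a node currently holds the VIP.
# holds_vip is a hypothetical helper; it reads `ip addr` output on stdin.
holds_vip() {
  local vip=$1
  # Fixed-string match; the trailing "/" keeps 192.168.16.11
  # from matching an address such as 192.168.16.110.
  grep -qF "inet ${vip}/"
}

# Offline demo using a shortened copy of the output above.
sample='2: ens3: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.16.16/24 brd 192.168.16.255 scope global noprefixroute ens3
    inet 192.168.16.11/32 scope global ens3'

if printf '%s\n' "$sample" | holds_vip 192.168.16.11; then
  echo "VIP present"
else
  echo "VIP absent"
fi
```

On a real node the same check would be `ip -4 addr show dev ens3 | holds_vip 192.168.16.11`, assuming the interface is `ens3` as in this setup.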

**3. Deployment in practice**

- Deploy the test application on both servers
`nohup java -jar -Dserver.port=8083 -Xms1024m -Xmx1024m springboot-web-demo-1.0-SNAPSHOT.jar &`

- Create the test server configuration
`vi /etc/nginx/conf.d/cqbdri.conf`

```
# 192.168.16.16
server {
    listen       8033;
    server_name  192.168.16.16;
    root         /usr/share/nginx/html;
    location / {
      proxy_set_header X-Forwarded-Host $http_host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_pass  http://test;
      client_max_body_size 5m;
    }

    error_page 404 /404.html;
        location = /40x.html {
    }

    error_page 500 502 503 504 /50x.html;
        location = /50x.html {
    }
}
```
```
# 192.168.16.17
server {
    listen       8033;
    server_name  192.168.16.17;
    root         /usr/share/nginx/html;
    location / {
      proxy_set_header X-Forwarded-Host $http_host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_pass  http://test;
      client_max_body_size 5m;
    }

    error_page 404 /404.html;
        location = /40x.html {
    }

    error_page 500 502 503 504 /50x.html;
        location = /50x.html {
    }
}
```
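- Test the service
Set the hostname on each server: `hostnamectl set-hostname nginx1` on the first and `hostnamectl set-hostname nginx2` on the second.

From a Windows command prompt, run: `curl http://192.168.16.11:8033/hello?name=winson`

Response: `Hello winson! I'm Edge controller!`

The nginx access log is at `/var/log/nginx/access.log`.


- Configure nginx load balancing
`vi /etc/nginx/nginx.conf`
Add the following inside the `http` block: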

```
# 192.168.16.16
 upstream test {
       server 127.0.0.1:8083 weight=2;
       server 192.168.16.17:8083 weight=1;
   }
   
 # 192.168.16.17
 upstream test {
       server 127.0.0.1:8083 weight=1;
       server 192.168.16.16:8083 weight=2;
   }
```
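- Load balancing test
From a Windows command prompt, run: `curl http://192.168.16.11:8033/hello?name=winson`

Responses cycle according to the weights:
twice `Hello winson! I'm Edge controller! nginx-1`

once `Hello winson! I'm Edge controller! nginx-2`
- Keepalived failover test
1. Stop the nginx process on the master
`systemctl stop nginx`

2. Check the nginx PID
Run `cat /run/nginx.pid`, which returns: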

```
# master
[root@localhost keepalived]# cat /run/nginx.pid
cat: /run/nginx.pid: No such file or directory

# backup
[root@localhost keepalived]# cat /run/nginx.pid
8873
```

3. Check the IP failover
Run `ip addr`:

```
# 192.168.16.16
2: ens3: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:01:82:f8:bd brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.16/24 brd 192.168.16.255 scope global noprefixroute ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f269:aeff:106:5f5f/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
       
# 192.168.16.17
2: ens3: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:d0:71:85 brd ff:ff:ff:ff:ff:ff
    inet 192.168.16.17/24 brd 192.168.16.255 scope global noprefixroute ens3
       valid_lft forever preferred_lft forever
    inet 192.168.16.11/32 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::da96:dcd5:28ca:e88a/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
```
The virtual IP `192.168.16.11` has moved to machine `192.168.16.17`.

4. Test through the VIP after failover
From a Windows command prompt, run: `curl http://192.168.16.11:8033/hello?name=winson`

Responses cycle according to the weights:

twice `Hello winson! I'm Edge controller! nginx-1`

once `Hello winson! I'm Edge controller! nginx-2`

The nginx access log is at `/var/log/nginx/access.log`.
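
Counting responses by hand gets tedious once more than a few requests are sent. The sketch below tallies how often each distinct response line appears. The helper name `tally_responses` is a hypothetical name, and the demo feeds it canned responses rather than calling the service.

```bash
#!/usr/bin/env bash
# Sketch: count how often each distinct response line appears.
# tally_responses is a hypothetical helper that reads responses on stdin.
tally_responses() {
  # Skip empty lines, count identical lines, then sort by count (highest first).
  awk 'NF { count[$0]++ } END { for (line in count) print count[line], line }' | sort -rn
}

# Offline demo: six canned responses in the 2:1 pattern described above.
printf '%s\n' \
  "Hello winson! I'm Edge controller! nginx-1" \
  "Hello winson! I'm Edge controller! nginx-1" \
  "Hello winson! I'm Edge controller! nginx-2" \
  "Hello winson! I'm Edge controller! nginx-1" \
  "Hello winson! I'm Edge controller! nginx-1" \
  "Hello winson! I'm Edge controller! nginx-2" | tally_responses
```

Against the live setup, the input could come from a loop such as `for i in 1 2 3 4 5 6; do curl -s "http://192.168.16.11:8033/hello?name=winson"; echo; done | tally_responses`. The extra `echo` is there in case the response body does not end with a newline.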


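## 5. Summary
This completes the high-availability load balancing setup based on keepalived+VIP. Some problems remain, though:
1. With only one VIP, the backup node always sits idle. How can utilization be improved?
Configure two virtual IPs and have the upstream layer rotate between them with **smart DNS** or **HTTPDNS**, so that both nodes carry traffic.

Reference architecture:

[Figure 2: reference architecture diagram]

2. How is high availability maintained if the switch that the keepalived master and backup are connected to fails?
Use switches in **stacking mode** with the servers connected to two different switches. Alternatively, keep a cold standby in a different rack and switch over manually.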
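
As a rough illustration of the two-VIP idea in item 1, the fragment below sketches what the `vrrp_instance` section on node `192.168.16.16` might look like. The second VIP `192.168.16.12` and the VRID `52` are assumed values invented for this example; the other node would mirror the file with the two priorities swapped and the unicast addresses reversed, and the script and notify settings from the earlier configuration are left out for brevity.

```
# Sketch for node 192.168.16.16 only; 192.168.16.12 and VRID 52 are assumed values.
vrrp_instance VI_1 {
  state BACKUP
  interface ens3
  virtual_router_id 51
  priority 100            # this node is preferred for the first VIP
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 1111
  }
  unicast_src_ip 192.168.16.16
  unicast_peer {
    192.168.16.17
  }
  virtual_ipaddress {
    192.168.16.11
  }
}

vrrp_instance VI_2 {
  state BACKUP
  interface ens3
  virtual_router_id 52    # must differ from the VRID of VI_1
  priority 50             # the peer uses 100 here, so it is preferred for the second VIP
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 1111
  }
  unicast_src_ip 192.168.16.16
  unicast_peer {
    192.168.16.17
  }
  virtual_ipaddress {
    192.168.16.12         # hypothetical second VIP
  }
}
```

With DNS returning both VIPs, each node serves one of them in normal operation and takes over the other when its peer fails.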


## 6. References
- [Load balancing](https://icyfenix.cn/architect-perspective/general-architecture/diversion-system/load-balancing.html)
- [Keepalived+VIP](https://www.itlaoqi.com/chapter.html?sid=129&cid=2837)
