Nginx Study Notes: High-Availability Nginx Architecture with keepalived + nginx

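As the externally exposed entry point, Nginx must be highly available to keep services reachable. With a single Nginx instance, one crash makes every service routed through Nginx unreachable, so we need to guarantee HA (high availability) for the Nginx service itself.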

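How does keepalived+lvs+nginx keep Nginx highly available?

keepalived is a lightweight high-availability solution for clusters. I won't introduce it in detail here, since plenty of material is available online. This note focuses on how it keeps nginx highly available.

A single machine cannot be highly available, so we need a master/backup pair or a cluster. Nginx does not provide this by itself; keepalived is one way to solve the problem. With keepalived we can build a master/backup architecture: when the master fails, failover takes place and the backup is elected as the new master. Combined with the health checks keepalived provides, this keeps Nginx highly available.

I drew the architecture diagram below based on my understanding; the analysis that follows refers to it.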


[Figure 1: high-availability Nginx architecture with keepalived + nginx]

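  1. An external request comes in first. The client accesses the externally exposed virtual IP configured in keepalived's vrrp section and reaches server1, where keepalived-service-master runs. At this point keepalived-service-backup is on standby and does not serve external traffic.
  2. Following the routing configuration in keepalived-service-master, keepalived routes the request to the service that actually handles it. In the HA Nginx architecture, keepalived routes the request to the Nginx service, and that Nginx service must run on the same server as the keepalived service (the reason is analyzed later). One keepalived service effectively "monitors" one Nginx service. In the figure, all requests from keepalived-master-service go to Nginx service1 (multiple Nginx services can also be configured here) and never to Nginx service2.
  3. After receiving the request, Nginx service1 dispatches it according to its load-balancing configuration.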

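That is how keepalived handles a normal request. At this point the keepalived and nginx services on server2 handle no requests and act purely as standby. Next, let's look at how high availability is maintained when Nginx service1 becomes unavailable.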

[Figure 2: request flow after Nginx service1 fails]
As the figure above shows, if Nginx service1 goes down, the standby Nginx service has to take over, and this is where keepalived comes in.

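  1. In keepalived, configure a script that runs periodically to check whether the Nginx service is available. In the figure above, if Nginx service1 is unavailable, all requests need to move to Nginx service2. When the script detects that Nginx service1 is unavailable, we automatically shut down the keepalived service master on server1.

    Note: the shell script has to check the Nginx service and shut down the keepalived service automatically, so the two must run on the same server.

  2. Once the keepalived service master on server1 is shut down, the keepalived service backup on server2 is automatically elected as the new master, and all requests to the virtual IP are forwarded to the keepalived service on server2. Likewise, the keepalived service on server2 only forwards requests to Nginx service2 on the same machine. This completes the failover and keeps the Nginx service available.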
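
The election described above can be illustrated with a small simulation. This is only a simplified sketch: elect_master is a hypothetical helper, not part of keepalived, and it assumes the live node with the highest priority always wins (real VRRP also involves advertisement timers, preemption, and tie-breaking). The priorities 100 and 50 match the configuration files later in this note.

```shell
#!/bin/bash
# elect_master is a hypothetical helper, not part of keepalived.
# stdin:  one node per line as "name priority alive", where alive is 1 or 0.
# stdout: the name of the live node with the highest priority.
elect_master() {
    awk 'BEGIN { best = -1 }
         $3 == 1 && $2 > best { best = $2; name = $1 }
         END { if (best >= 0) print name }'
}

# Normal operation: both keepalived services are alive.
printf 'server1 100 1\nserver2 50 1\n' | elect_master

# keepalived on server1 has been stopped by the check script.
printf 'server1 100 0\nserver2 50 1\n' | elect_master
```

The first call prints server1 and the second prints server2, which mirrors the switch from the master to the backup.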

Installing and configuring keepalived

Installation steps

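  1. Download the tarball from the official website.
  2. Extract the tarball.
  3. Create an installation directory; I use /usr/local/lib/keepalived.
  4. Run apt-get install libssl-dev to install the OpenSSL dependency on Ubuntu; other systems are similar.
  5. Before installing, change into the extracted directory and run the configure step: ./configure --prefix=/usr/local/lib/keepalived --sysconfdir=/etc
  6. With the install path configured, build and install with make && make install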
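
The steps above can be collected into one script. This is a sketch under assumptions: the source directory /usr/local/download/keepalived-2.0.11 is taken from the cp command used later in this note, and the run and install_keepalived helpers are hypothetical names. It prints the commands by default instead of executing them, so it is safe to try first.

```shell
#!/bin/bash
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 (as root) to really run them.
DRY_RUN="${DRY_RUN:-1}"
PREFIX=/usr/local/lib/keepalived
SRC_DIR=/usr/local/download/keepalived-2.0.11   # assumed extraction directory

# run: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# install_keepalived: steps 3 to 6 of the list above, stopping at the first failure.
install_keepalived() {
    run mkdir -p "$PREFIX" &&
    run apt-get install -y libssl-dev &&
    run cd "$SRC_DIR" &&
    run ./configure --prefix="$PREFIX" --sysconfdir=/etc &&
    run make &&
    run make install
}

install_keepalived
```

Chaining the steps with && means a failed configure or make stops the script instead of installing a half-built tree.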

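After these steps keepalived is installed. For convenience, we can also make the keepalived commands available system-wide.

  1. Create a symlink for the keepalived binary. Use an absolute target path, because a relative target would be resolved relative to /sbin and leave a dangling link: ln -s /usr/local/lib/keepalived/sbin/keepalived /sbin/keepalived
  2. Copy the init.d script from the extracted source directory into the system: cp /usr/local/download/keepalived-2.0.11/keepalived/etc/init.d/keepalived /etc/init.d
  3. On Ubuntu, list the system services with sysv-rc-conf --list and check that keepalived is present.
  4. Run sysv-rc-conf keepalived on to enable the keepalived service.

After that, keepalived can be controlled with commands such as service keepalived [start|stop|restart].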

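Configuration steps

Basic configuration

Here the configuration file lives under /etc/keepalived/. We need to configure one vrrp virtual router with two nodes, a master and a backup. The full configuration reference is documented on the official website. Run vim /etc/keepalived/keepalived.conf to edit the file.

After trimming the original configuration file, the Master configuration looks like this.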

! Configuration File for keepalived

global_defs {
# String identifying the machine (doesn't have to be hostname).
# (default: local host name)
   router_id LVS_DEVEL_15
}
# vrrp (Virtual Router Redundancy Protocol) definition
vrrp_instance VI_1 {
# Initial state, MASTER|BACKUP
# As soon as the other machine(s) come up,
# an election will be held and the machine
# with the highest priority will become MASTER.
# So the entry here doesn't matter a whole lot.
# In practice the master is chosen by priority, so this setting matters little
    state MASTER
# interface for inside_network, bound by vrrp
    interface eth0
# arbitrary unique number from 0 to 255
# used to differentiate multiple instances of vrrpd
# running on the same NIC (and hence same socket).
    virtual_router_id 51
# the priority decides which node becomes master
    priority 100
# VRRP Advert interval in seconds (e.g. 0.92) (use default); the interval at which master and backup check each other
    advert_int 1

    authentication {
        auth_type PASS
 # should be the same on all machines.
        auth_pass 1111
    }
 # externally exposed virtual IPs; more than one can be configured
    virtual_ipaddress {
        192.168.0.16
    }
}

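# map the virtual IP to the real server IPs
virtual_server 192.168.0.16 80 {
# health check interval
    delay_loop 6
# load-balancing algorithm (rr = round robin); multiple real_server entries can be configured here
    lb_algo rr
    lb_kind NAT
 # session persistence timeout
    persistence_timeout 50
    protocol TCP
# route to the nginx service that actually dispatches requests; the master forwards to nginx on 15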
    real_server 192.168.0.15 80 {
        weight 1
        TCP_CHECK {
             connect_timeout 3  # connection timeout
             delay_before_retry 3 # delay between retries
             connect_port 80   # port to check
        }
    }
}

The Backup configuration file is as follows.

! Configuration File for keepalived

global_defs {
# String identifying the machine (doesn't have to be hostname).
# (default: local host name)
   router_id LVS_DEVEL_15
}
# vrrp (Virtual Router Redundancy Protocol) definition
vrrp_instance VI_1 {
# Initial state, MASTER|BACKUP
# As soon as the other machine(s) come up,
# an election will be held and the machine
# with the highest priority will become MASTER.
# So the entry here doesn't matter a whole lot.
# In practice the master is chosen by priority, so this setting matters little
    state BACKUP
# interface for inside_network, bound by vrrp
    interface eth0
# arbitrary unique number from 0 to 255
# used to differentiate multiple instances of vrrpd
# running on the same NIC (and hence same socket).
    virtual_router_id 51
# the priority decides which node becomes master
    priority 50
# VRRP Advert interval in seconds (e.g. 0.92) (use default); the interval at which master and backup check each other
    advert_int 1

    authentication {
        auth_type PASS
 # should be the same on all machines.
        auth_pass 1111
    }
 # externally exposed virtual IPs; more than one can be configured
    virtual_ipaddress {
        192.168.0.16
    }
}

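# map the virtual IP to the real server IPs
virtual_server 192.168.0.16 80 {
# health check interval
    delay_loop 6
# load-balancing algorithm (rr = round robin); multiple real_server entries can be configured here
    lb_algo rr
    lb_kind NAT
 # session persistence timeout
    persistence_timeout 50
    protocol TCP
# route to the nginx service that actually dispatches requests; the backup forwards to nginx on 13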
    real_server 192.168.0.13 80 {
        weight 1
        TCP_CHECK {
             connect_timeout 3  # connection timeout
             delay_before_retry 3 # delay between retries
             connect_port 80   # port to check
        }
    }
}

The configuration above defines one vrrp instance, VI_1, with a Master (192.168.0.15) and a backup (192.168.0.13). The externally exposed virtual IP is 192.168.0.16.

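Clients access 192.168.0.16:80, and every request is routed to the Master for forwarding. When the keepalived service on the Master goes down, the keepalived service on the backup is promoted to Master and keeps serving requests. The configuration so far has no periodic check of the Nginx service and does not automatically shut down the keepalived master when Nginx fails, so an Nginx failure does not yet trigger failover. Normal operation works, though.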

  • On both 13 and 15, run service keepalived start to start the keepalived service.

  • Run ip addr on each machine.

    IP information on 13 (backup):

    1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default 
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host 
           valid_lft forever preferred_lft forever
    2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 08:00:27:97:9b:fd brd ff:ff:ff:ff:ff:ff
        inet 192.168.0.13/24 brd 192.168.0.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::a00:27ff:fe97:9bfd/64 scope link 
           valid_lft forever preferred_lft forever
    
    

    IP information on 15 (Master):

    1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default 
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host 
           valid_lft forever preferred_lft forever
    2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 08:00:27:42:db:66 brd ff:ff:ff:ff:ff:ff
        inet 192.168.0.15/24 brd 192.168.0.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet 192.168.0.16/32 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::a00:27ff:fe42:db66/64 scope link 
           valid_lft forever preferred_lft forever
    

    The Master's network interface clearly has the virtual IP 192.168.0.16 attached.

  • Run service keepalived stop on 15 (master), then run ip addr on 13 (backup). The IP information is as follows:

    1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default 
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host 
           valid_lft forever preferred_lft forever
    2: eth0:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 08:00:27:97:9b:fd brd ff:ff:ff:ff:ff:ff
        inet 192.168.0.13/24 brd 192.168.0.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet 192.168.0.16/32 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::a00:27ff:fe97:9bfd/64 scope link 
           valid_lft forever preferred_lft forever
    
    

    After the backup becomes Master, its interface also carries the virtual IP 192.168.0.16, which means the backup is now serving external traffic.
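
Reading the ip addr output by eye works, but the same check can be scripted. A minimal sketch follows; has_vip is a hypothetical helper name, and the sample lines are copied from the Master output above.

```shell
#!/bin/bash
# has_vip is a hypothetical helper name.
# $1:    the virtual IP to look for
# stdin: output of `ip addr`
# Succeeds when the address is attached to an interface.
has_vip() {
    # Fixed-string match; the trailing "/" keeps 192.168.0.16 from matching 192.168.0.160.
    grep -qF "inet $1/"
}

# Sample lines taken from the Master output above.
sample='    inet 192.168.0.15/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.16/32 scope global eth0'

if printf '%s\n' "$sample" | has_vip 192.168.0.16; then
    echo "this node holds the virtual IP"
else
    echo "this node is on standby"
fi
```

On a real node the input would come from the command itself, for example ip addr show eth0 | has_vip 192.168.0.16.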

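Configuration pitfalls

  1. The vrrp_strict option needs to be commented out; otherwise you may be unable to ping the externally exposed virtual IP, 192.168.0.16.
  2. If the virtual IP still cannot be pinged after commenting out vrrp_strict, run iptables --list and check whether the virtual IP already appears in the iptables rules. If it does, that may also be why the virtual IP cannot be pinged.
  3. Beyond that, the firewall may be the cause; the blunt fix is to turn it off.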
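
The first pitfall is easy to check for before restarting keepalived. The sketch below assumes the configuration path used in this note; vrrp_strict_enabled is a hypothetical helper name, and it only looks for an uncommented vrrp_strict line.

```shell
#!/bin/bash
# vrrp_strict_enabled is a hypothetical helper name.
# $1: path to a keepalived configuration file.
# Succeeds when an active (uncommented) vrrp_strict line is present.
vrrp_strict_enabled() {
    grep -qE '^[[:space:]]*vrrp_strict([[:space:]]|$)' "$1"
}

conf="${1:-/etc/keepalived/keepalived.conf}"
if [ -r "$conf" ] && vrrp_strict_enabled "$conf"; then
    echo "vrrp_strict is active in $conf"
else
    echo "vrrp_strict is not active in $conf (or the file is not readable)"
fi
```

Lines that start with # or ! are treated as comments and do not match, because the pattern only allows whitespace before the keyword.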

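Periodic check script: automatic failover on failure

The basic configuration sets up forwarding and a standby node but no health-check script, so failover is not automatic: we still have to run service keepalived stop by hand to shut down the keepalived that fronts the failed Nginx service. Next we configure a periodic check of the Nginx service that decides whether to shut down keepalived automatically, which gives us automatic failover.

Change the configuration on the initial Master host as follows. The additions are marked with [newadd].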

! Configuration File for keepalived

global_defs {
# String identifying the machine (doesn't have to be hostname).
# (default: local host name)
   router_id LVS_DEVEL_15

# Enable script execution [newadd]
# Don't run scripts configured to be run as root if any part of the path
# is writable by a non-root user.
   enable_script_security
}
# define the script that vrrp runs [newadd]
vrrp_script nginx_check_for_keepalived {
   script "/usr/local/lib/keepalived/script/nginx-check-for-keepalived.sh"
# run every 2 seconds
   interval 2
   user root

}


# vrrp (Virtual Router Redundancy Protocol) definition
vrrp_instance VI_1 {
# Initial state, MASTER|BACKUP
# As soon as the other machine(s) come up,
# an election will be held and the machine
# with the highest priority will become MASTER.
# So the entry here doesn't matter a whole lot.
# In practice the master is chosen by priority, so this setting matters little
    state MASTER
# interface for inside_network, bound by vrrp
    interface eth0
# arbitrary unique number from 0 to 255
# used to differentiate multiple instances of vrrpd
# running on the same NIC (and hence same socket).
    virtual_router_id 51
# the priority decides which node becomes master
    priority 100
# VRRP Advert interval in seconds (e.g. 0.92) (use default); the interval at which master and backup check each other
    advert_int 1

    authentication {
        auth_type PASS
 # should be the same on all machines.
        auth_pass 1111
    }
 # externally exposed virtual IPs; more than one can be configured
    virtual_ipaddress {
        192.168.0.16
    }
# attach the check script [newadd]
    track_script {
        nginx_check_for_keepalived
    }

}

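# map the virtual IP to the real server IPs
virtual_server 192.168.0.16 80 {
# health check interval
    delay_loop 6
# load-balancing algorithm (rr = round robin); multiple real_server entries can be configured here
    lb_algo rr
    lb_kind NAT
 # session persistence timeout
    persistence_timeout 50
    protocol TCP
# route to the nginx service that actually dispatches requests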
    real_server 192.168.0.15 80 {
        weight 1
        TCP_CHECK {
             connect_timeout 3  # connection timeout
             delay_before_retry 3 # delay between retries
             connect_port 80   # port to check
        }
    }
}

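After changing the configuration, we need to create the script at the configured path: a shell script named nginx-check-for-keepalived.sh under /usr/local/lib/keepalived/script/. Roughly, the script checks whether nginx is alive; if not, it stops the local keepalived service so that the keepalived on the backup takes over. Below is a simple script. It may not be rigorous and is for reference only (I'm not that fluent in shell).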

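#! /bin/bash
# Count nginx processes by exact name, so this script and grep are never counted.
c=$(pgrep -x -c nginx)
if [ "$c" -eq 0 ]; then
        echo "nginx service is dead"
        # Stop the local keepalived so the backup takes over the virtual ip.
        service keepalived stop
else
        echo "nginx service is healthy"
fi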
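  1. Once the above is configured, restart the keepalived service on the master host.
  2. Stop the nginx service.
  3. Check whether the keepalived service also shuts down automatically.
  4. After the original master's keepalived service shuts down, requests are forwarded to the backup automatically, and the failover succeeds.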
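
A gentler variant of the check is worth sketching. As far as I understand keepalived, a script listed under track_script is judged by its exit status, so the check can simply report whether nginx is running instead of stopping keepalived itself. This is an alternative to the approach in this note, not what the configuration above was tested with, and process_running is a hypothetical helper name.

```shell
#!/bin/bash
# process_running is a hypothetical helper name.
# Returns 0 when a process with exactly this name exists, non-zero otherwise.
process_running() {
    pgrep -x "$1" > /dev/null
}

# Example: report the status of nginx without touching keepalived.
if process_running nginx; then
    echo "nginx service is healthy"
else
    echo "nginx service is dead"
fi
```

A check script whose last command is process_running nginx would exit non-zero while nginx is down. The keepalived daemon keeps running in that case, so the node could return to service once nginx is back, without anyone restarting keepalived by hand.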

Pitfalls encountered

After configuration, the script never ran successfully. Running cat /var/log/syslog |grep keepalived to view the keepalived log showed the following errors.

Jan  9 22:32:55 zhoujy-VirtualBox Keepalived_vrrp[4368]: Error exec-ing command '/usr/local/lib/keepalived/script/nginx-check-for-keepalived.sh', error 8: Exec format error
Jan  9 22:32:57 zhoujy-VirtualBox Keepalived_vrrp[4369]: Error exec-ing command '/usr/local/lib/keepalived/script/nginx-check-for-keepalived.sh', error 8: Exec format error
Jan  9 22:32:59 zhoujy-VirtualBox Keepalived_vrrp[4370]: Error exec-ing command 

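It turned out the first line of the script was wrong. Running the script by hand from a shell worked, but keepalived failed when executing it.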

#! /bin/bash

had been written as

# !/bin/bash

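With the space between # and !, the first line is an ordinary comment rather than a shebang. A shell falls back to interpreting such a file itself, but keepalived executes the file directly, which fails with Exec format error. Correcting the first line fixed it.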
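
This mistake is hard to spot by eye, so a quick check before handing a script to keepalived can help. A minimal sketch follows; has_shebang is a hypothetical helper name, and it only tests whether the first two bytes of the file are exactly #!.

```shell
#!/bin/bash
# has_shebang is a hypothetical helper name.
# Succeeds when the first two bytes of the file are exactly "#!".
has_shebang() {
    [ "$(head -c 2 "$1")" = "#!" ]
}

tmp=$(mktemp)

# The broken first line from this note: "# !" is just a comment.
printf '# !/bin/bash\necho hi\n' > "$tmp"
if has_shebang "$tmp"; then echo "shebang ok"; else echo "first line is only a comment"; fi

# The corrected first line.
printf '#! /bin/bash\necho hi\n' > "$tmp"
if has_shebang "$tmp"; then echo "shebang ok"; else echo "first line is only a comment"; fi

rm -f "$tmp"
```

The first check reports that the line is only a comment and the second reports a valid shebang.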

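Summary

In short, using keepalived to keep Nginx highly available relies on a master/backup architecture in which keepalived switches over to the backup automatically on failure. Typically one keepalived service plus one Nginx service together form one node (the master), and the backup node is built the same way. keepalived effectively monitors Nginx, and keepalived's own failover election indirectly provides failover for Nginx.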
