Troubleshooting common failures in keepalived + nginx high-availability load balancing

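When nginx + keepalived is used to provide high-availability load balancing for a website, two problems come up often: split-brain, and a master server that has failed but does not release its resources.
1. For the first problem, run a monitoring script on the backup server that sends an alert as soon as the VIP address appears on it: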

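#!/bin/bash
# Alert when the VIP is bound on this (backup) server.
# The trailing "/" keeps 192.168.22.3 from also matching e.g. 192.168.22.30.
if ip a s eth0 | grep -qF "192.168.22.3/"
then
    echo "keepalived failure: the VIP is present on the backup server" | mail -s "keepalived alert" [email protected]
fi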

Add the script to cron so that it runs every 2 minutes:

*/2 * * * * /usr/bin/sh /server/scripts/monitoring_keep.sh

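Note: the VIP can appear on the backup server either because of split-brain or because the master has failed and the VIP has failed over normally. In both cases something is wrong with the cluster, so an alert should be sent.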
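The two cases in the note can be told apart, at least roughly, by checking whether the master still answers on the network while the backup holds the VIP. The following is a minimal sketch under stated assumptions, not part of the original setup: MASTER_IP is a hypothetical real address for lb01, and the function names are made up for illustration. A master that answers ping may still have a broken keepalived, so the result is only a hint.

```shell
#!/bin/bash
# Sketch: classify why the VIP is bound on the backup server.
# MASTER_IP is a hypothetical address for the master's real IP; adjust it.
MASTER_IP="192.168.22.1"
VIP="192.168.22.3"

# classify_vip_state HAS_VIP MASTER_ALIVE   (each argument is "yes" or "no")
classify_vip_state() {
    local has_vip=$1 master_alive=$2
    if [ "$has_vip" != "yes" ]; then
        echo "normal"                 # backup does not hold the VIP
    elif [ "$master_alive" = "yes" ]; then
        echo "possible split-brain"   # both nodes may hold the VIP
    else
        echo "failover"               # master is down, VIP moved over
    fi
}

# Probe helpers: each prints "yes" or "no".
vip_present() {
    if ip a s eth0 2>/dev/null | grep -qF "${VIP}/"; then echo yes; else echo no; fi
}
master_reachable() {
    if ping -c 2 -W 1 "$MASTER_IP" >/dev/null 2>&1; then echo yes; else echo no; fi
}

# Typical use on the backup server:
# classify_vip_state "$(vip_present)" "$(master_reachable)"
```

The classification is kept separate from the probes so that the decision logic can be checked without touching the network.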

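2. For the second problem, the usual cause is that nginx has failed while keepalived keeps running normally: the master server still holds the VIP but can no longer serve requests. The fix is to monitor nginx on the master continuously and release the VIP as soon as nginx is found to be down.
1) Write a script that monitors the nginx service: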

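vim /server/scripts/monitoring_web.sh
#!/bin/bash
# Count the lines of ps output that mention nginx.
# The grep process itself is normally counted too, so the result is 1 when nginx is down.
num=$(ps -ef|grep -c nginx)
if [ "$num" -lt 2 ]
then
  exit 1    # nginx is not running: report failure to keepalived
else
  exit 0    # nginx is running
fi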
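Counting grep matches depends on the grep process showing up in its own result. As an alternative, a check can ask for the process by exact name with pgrep. This is only a sketch: it assumes pgrep (from procps) is installed, and check_process is a hypothetical helper name, not something the setup above requires.

```shell
#!/bin/bash
# Sketch: look for a process by exact name instead of counting grep matches.
# check_process is a hypothetical helper name.
check_process() {
    # pgrep -x matches the process name exactly, so the check itself
    # is never counted. Exit status is 0 if a match exists, non-zero otherwise.
    pgrep -x "$1" >/dev/null
}

# Used as the last command of a check script, its exit status becomes
# the script's exit status, which is what keepalived evaluates:
# check_process nginx
```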

2) Make the script executable:

chmod +x /server/scripts/monitoring_web.sh

3) lb01 and lb02 act as master and backup for each other. The keepalived configuration files are as follows.
lb01 configuration file:

[root@lb01 ~]# vim /etc/keepalived/keepalived.conf 
! Configuration File for keepalived
global_defs {
   router_id lb01
}
vrrp_script check_web {
    script "/server/scripts/monitoring_web.sh"
    interval 2
    weight -60
}
vrrp_instance one {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.22.3/24 dev eth0 label eth0:3
    }
    track_script {
        check_web
    }
}
vrrp_instance two {
    state BACKUP
    interface eth0
    virtual_router_id 52
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.22.4/24 dev eth0 label eth0:4
    }
}

lb02 configuration file:

[root@lb02 scripts]# vim /etc/keepalived/keepalived.conf 
! Configuration File for keepalived
global_defs {
   router_id lb02
}
vrrp_script monitoring_web {
    script "/server/scripts/monitoring_web.sh"
    interval 2
    weight -60
}
vrrp_instance one {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.22.3/24 dev eth0 label eth0:3
    }
}
vrrp_instance two {
    state MASTER
    interface eth0
    virtual_router_id 52
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.22.4/24 dev eth0 label eth0:4
    }
    track_script {
        monitoring_web
    }
}
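With both files in place, each node should normally hold the VIP of the instance it is MASTER for: 192.168.22.3 on lb01 and 192.168.22.4 on lb02. A quick way to see which VIPs a node currently holds is to filter the one-line output of ip. The helper below is an illustrative sketch; held_vips is a hypothetical name, and it reads `ip -o -4 addr show` output on standard input.

```shell
#!/bin/bash
# Sketch: print which of the two VIPs appear in `ip -o -4 addr show` output.
# held_vips is a hypothetical helper; it reads that output on stdin.
held_vips() {
    # In the one-line format, field 4 is ADDRESS/PREFIX, e.g. 192.168.22.3/24.
    # Strip the prefix length, then keep only exact matches for the two VIPs.
    awk '{ split($4, a, "/"); print a[1] }' |
        grep -Fx -e "192.168.22.3" -e "192.168.22.4"
}

# Typical use on either node:
# ip -o -4 addr show dev eth0 | held_vips
```

If one node prints both addresses, the other node has given up its VIP. If both nodes print the same address, that points to split-brain.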

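A brief note on the weight setting in the configuration file:
When weight is positive:
① if the script succeeds (exit status 0), the priority becomes priority + weight;
② if the script fails (non-zero exit status), the priority is unchanged.
When weight is negative:
① if the script succeeds, the priority is unchanged;
② if the script fails, the priority becomes priority + weight, that is, it drops by the absolute value of weight.
In the configuration above, weight is set to -60 for the master instance. When nginx fails, ps -ef|grep -c nginx returns a value below 2 and the script exits with 1 (exit 1). keepalived treats the check as failed and lowers the priority by 60, from 150 to 90, which is below the backup server's 100, so the master releases the VIP. Likewise, once nginx recovers, ps -ef|grep -c nginx returns at least 2 and the script exits with 0 (exit 0). keepalived treats the check as successful, the priority goes back to 150, which is higher than the backup server's, and the master takes the VIP back.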
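The weight rules above reduce to a small piece of arithmetic, sketched here as a shell function. effective_priority is a hypothetical name used only for illustration, and the sketch deliberately ignores keepalived details such as rise/fall counters.

```shell
#!/bin/bash
# Sketch of the weight rules described above.
# effective_priority BASE WEIGHT STATUS
#   BASE   - configured priority
#   WEIGHT - vrrp_script weight (positive or negative)
#   STATUS - exit status of the check script (0 = success)
effective_priority() {
    local base=$1 weight=$2 status=$3
    if [ "$weight" -gt 0 ] && [ "$status" -eq 0 ]; then
        echo $((base + weight))   # positive weight: raised on success
    elif [ "$weight" -lt 0 ] && [ "$status" -ne 0 ]; then
        echo $((base + weight))   # negative weight: lowered on failure
    else
        echo "$base"              # otherwise unchanged
    fi
}

effective_priority 150 -60 1   # nginx down on the master: prints 90
effective_priority 150 -60 0   # nginx healthy: prints 150
```

With the values from the configuration, 90 is below the backup's 100, which is why weight has to exceed the 50-point gap between the two priorities for the failover to happen.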
