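In the previous article we briefly covered installing keepalived and analyzed its startup. This time we look at keepalived failover, failure recovery, and how the vrrp_script module monitors cluster resources. The overall architecture is the same as last time, so it is not described again here.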


1. keepalived failover process analysis

First, stop the httpd service on the keepalived master node, then watch how keepalived performs the failover.

[root@centos01 keepalived]# /etc/init.d/httpd stop
Stopping httpd:                                            [  OK  ]


Watching the log on the backup node, we can see that the VIP has floated over to it:


[root@centos02 keepalived]# tail -f /var/log/messages|grep -v "PYTHON"
Jul 28 14:07:03 centos02 Keepalived_vrrp[4364]: VRRP_Instance(HA_1) Transition to MASTER STATE
Jul 28 14:07:05 centos02 Keepalived_vrrp[4364]: VRRP_Instance(HA_1) Entering MASTER STATE
Jul 28 14:07:05 centos02 Keepalived_vrrp[4364]: VRRP_Instance(HA_1) setting protocol VIPs.
Jul 28 14:07:05 centos02 Keepalived_vrrp[4364]: VRRP_Instance(HA_1) Sending gratuitous ARPs on eth0 for 172.16.80.100
Jul 28 14:07:05 centos02 Keepalived_healthcheckers[4363]: Netlink reflector reports IP 172.16.80.100 added
Jul 28 14:07:10 centos02 Keepalived_vrrp[4364]: VRRP_Instance(HA_1) Sending gratuitous ARPs on eth0 for 172.16.80.100
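
When a log is busy, it can help to pull only the state changes out of it. The following is a minimal sketch, not part of keepalived itself: the regular expression and the function name `vrrp_transitions` are assumptions made for illustration, written against the message format shown in the log above.

```python
import re

# Matches keepalived VRRP state messages such as
# "VRRP_Instance(HA_1) Entering MASTER STATE" in syslog-style lines.
TRANSITION = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s.*"
    r"VRRP_Instance\((?P<inst>[^)]+)\)\s"
    r"(?:Entering|Transition to)\s(?P<state>\w+)\sSTATE"
)


def vrrp_transitions(lines):
    """Return (timestamp, host, instance, state) for each state-change line."""
    found = []
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            found.append((m["ts"], m["host"], m["inst"], m["state"]))
    return found


# Three lines copied from the backup node's log above.
sample_log = [
    "Jul 28 14:07:03 centos02 Keepalived_vrrp[4364]: VRRP_Instance(HA_1) Transition to MASTER STATE",
    "Jul 28 14:07:05 centos02 Keepalived_vrrp[4364]: VRRP_Instance(HA_1) Entering MASTER STATE",
    "Jul 28 14:07:05 centos02 Keepalived_vrrp[4364]: VRRP_Instance(HA_1) setting protocol VIPs.",
]

if __name__ == "__main__":
    for ts, host, inst, state in vrrp_transitions(sample_log):
        print(ts, host, inst, state)
```

Lines that do not describe a state change, such as the gratuitous ARP messages, are simply skipped.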


At the same time, capture packets on the master node so that we can briefly analyze them:

[root@centos01 keepalived]# tcpdump -i eth0 -n -vvv -s0 -w httpd.cap


[Screenshot: first captured VRRP packet]



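The source IP is the master node's address, 172.16.80.116, and the destination is the multicast address 224.0.0.18. When the httpd service on the master node is stopped, the priority advertised by the master immediately drops to 0. The capture also shows that the authentication data is sent in plaintext: the password is 1111. Now let's look at the next packet.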

[Screenshot: second captured VRRP packet]

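In this packet the source IP has changed to the backup node's address, 172.16.80.117, while the destination is still 224.0.0.18.

The priority of 85 also tells us that this is the priority configured on the backup node.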
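
The fields discussed above (priority, authentication type, plaintext password) sit at fixed offsets in a VRRPv2 advertisement, so they can be decoded by hand. The sketch below assumes the VRRPv2 layout from RFC 3768 with the simple-password authentication field, and takes the bytes that follow the IP header. The sample bytes are constructed for illustration and are not taken from the capture: the VRID 51, the advertisement interval of 1, and the zeroed checksum are hypothetical values. In VRRP, a priority of 0 is how a master signals that it is giving up the role.

```python
import struct
from ipaddress import IPv4Address


def parse_vrrpv2(payload: bytes) -> dict:
    """Decode a VRRPv2 advertisement (the bytes after the IP header)."""
    ver_type, vrid, priority, count, auth_type, adver_int, checksum = struct.unpack(
        "!BBBBBBH", payload[:8]
    )
    addrs_end = 8 + 4 * count
    # Each virtual IP address occupies 4 bytes after the fixed header.
    vips = [str(IPv4Address(payload[i:i + 4])) for i in range(8, addrs_end, 4)]
    # With simple-password authentication, 8 bytes of NUL-padded cleartext follow.
    auth_data = payload[addrs_end:addrs_end + 8].rstrip(b"\x00")
    return {
        "version": ver_type >> 4,
        "type": ver_type & 0x0F,      # 1 = ADVERTISEMENT
        "vrid": vrid,
        "priority": priority,         # 0 = master is releasing the role
        "auth_type": auth_type,       # 1 = simple text password
        "adver_int": adver_int,
        "checksum": checksum,
        "vips": vips,
        "auth_data": auth_data.decode("ascii", errors="replace"),
    }


# Hypothetical packet: VRID 51, priority 0, one VIP, password "1111".
sample = (
    bytes([0x21, 51, 0, 1, 1, 1, 0, 0])
    + IPv4Address("172.16.80.100").packed
    + b"1111\x00\x00\x00\x00"
)

if __name__ == "__main__":
    print(parse_vrrpv2(sample))
```

Because the password travels as readable bytes in every advertisement, anyone who can capture traffic on the segment can recover it, which is what the capture in this article shows.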


Check where the VIP has moved

On the master node, the VIP is no longer present:

[root@centos01 keepalived]# ip addr
1: lo: mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:4c:62:c9 brd ff:ff:ff:ff:ff:ff
    inet 172.16.80.116/24 brd 172.16.80.255 scope global eth0
    inet6 fe80::20c:29ff:fe4c:62c9/64 scope link 
       valid_lft forever preferred_lft forever


On the backup node, the VIP has floated over and is now bound here:

[root@centos02 keepalived]# ip addr
1: lo: mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:45:fe:30 brd ff:ff:ff:ff:ff:ff
    inet 172.16.80.117/24 brd 172.16.80.255 scope global eth0
    inet 172.16.80.100/32 scope global eth0
    inet6 fe80::20c:29ff:fe45:fe30/64 scope link 
       valid_lft forever preferred_lft forever
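
Reading `ip addr` output by eye works for two nodes, but the same check is easy to script. The following is a rough sketch under the assumption that the output has the shape shown above; the helper names `addresses_on` and `holds_vip` are made up for this example and do not come from keepalived or iproute2.

```python
import re


def addresses_on(ip_addr_output: str, interface: str) -> list:
    """List the IPv4 addresses (with prefix length) bound to `interface`."""
    addrs = []
    current = None
    for line in ip_addr_output.splitlines():
        # Interface header lines look like "2: eth0: ...".
        header = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if header:
            current = header.group(1)
            continue
        # IPv4 lines start with "inet "; "inet6" lines do not match.
        inet = re.match(r"^\s+inet\s+(\S+)", line)
        if inet and current == interface:
            addrs.append(inet.group(1))
    return addrs


def holds_vip(ip_addr_output: str, interface: str, vip: str) -> bool:
    """True if `vip` is one of the IPv4 addresses on `interface`."""
    return any(a.split("/")[0] == vip for a in addresses_on(ip_addr_output, interface))


# Abbreviated copy of the backup node's output after the failover.
backup_output = """\
1: lo: mtu 65536 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 172.16.80.117/24 brd 172.16.80.255 scope global eth0
    inet 172.16.80.100/32 scope global eth0
    inet6 fe80::20c:29ff:fe45:fe30/64 scope link
"""

if __name__ == "__main__":
    print(holds_vip(backup_output, "eth0", "172.16.80.100"))
```

On a live system the text would come from running `ip addr` on each node; here it is passed in as a string so the parsing can be checked on its own.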


2. Failure recovery (failback) analysis

Start httpd on the master node and watch the logs.


On the master node:



[root@centos01 keepalived]# tail -f /var/log/messages|grep -v "PYTHON"
Jul 28 14:39:38 centos01 Keepalived_vrrp[65334]: VRRP_Script(check_httpd) succeeded
Jul 28 14:39:38 centos01 Keepalived_vrrp[65334]: VRRP_Instance(HA_1) prio is higher than received advert
Jul 28 14:39:38 centos01 Keepalived_vrrp[65334]: VRRP_Instance(HA_1) Transition to MASTER STATE
Jul 28 14:39:38 centos01 Keepalived_vrrp[65334]: VRRP_Instance(HA_1) Received lower prio advert, forcing new election
Jul 28 14:39:40 centos01 Keepalived_vrrp[65334]: VRRP_Instance(HA_1) Entering MASTER STATE
Jul 28 14:39:40 centos01 Keepalived_vrrp[65334]: VRRP_Instance(HA_1) setting protocol VIPs.
Jul 28 14:39:40 centos01 Keepalived_healthcheckers[65333]: Netlink reflector reports IP 172.16.80.100 added
Jul 28 14:39:40 centos01 Keepalived_vrrp[65334]: VRRP_Instance(HA_1) Sending gratuitous ARPs on eth0 for 172.16.80.100
Jul 28 14:39:46 centos01 Keepalived_vrrp[65334]: VRRP_Instance(HA_1) Sending gratuitous ARPs on eth0 for 172.16.80.100


The VIP has floated back to the master node.


Backup node log:


[root@centos02 keepalived]# tail -f /var/log/messages|grep -v "PYTHON"
Jul 28 14:39:38 centos02 Keepalived_vrrp[4364]: VRRP_Instance(HA_1) Received higher prio advert
Jul 28 14:39:38 centos02 Keepalived_vrrp[4364]: VRRP_Instance(HA_1) Entering BACKUP STATE
Jul 28 14:39:38 centos02 Keepalived_vrrp[4364]: VRRP_Instance(HA_1) removing protocol VIPs.
Jul 28 14:39:38 centos02 Keepalived_healthcheckers[4363]: Netlink reflector reports IP 172.16.80.100 removed


The VIP has been removed from the backup node.


Now let's check the actual state of the VIP.

Master node:

[Screenshot: VIP address state on the master node]

Backup node:

[Screenshot: VIP address state on the backup node]


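Viewed as a whole, keepalived's operation and switchover process looks reasonable, but in practice it is not ideal. In a high-load, high-concurrency business system that values stability, every master/backup switchover has a significant impact, so switchovers should be avoided unless absolutely necessary. In other words, when the master node fails, the service must switch to the backup node; but once the master recovers, we do not want to switch back to it, and a switch should happen only when the backup node itself fails. This is the non-preemption feature, which is implemented with keepalived's nopreempt option.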
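
To make the difference concrete, here is a toy model of the decision described above. It is not keepalived's election code, only a simplified sketch of who ends up holding the VIP with and without preemption. The function names and the master priority of 100 are assumptions for illustration (the article only shows the backup's priority, 85). In keepalived itself, `nopreempt` is set inside the `vrrp_instance` block, and the documentation requires the initial `state` to be BACKUP for it to take effect.

```python
def vip_holder(current, nodes, preempt):
    """Decide which node should hold the VIP.

    nodes maps a node name to (priority, healthy).
    """
    healthy = {name: prio for name, (prio, ok) in nodes.items() if ok}
    if not healthy:
        return None
    best = max(healthy, key=healthy.get)
    if current not in healthy:
        # The current holder failed, so the best healthy node takes over.
        return best
    if preempt and healthy[best] > healthy[current]:
        # Preemption: a healthier, higher-priority node takes the VIP back.
        return best
    # Without preemption the current holder keeps the VIP while it is healthy.
    return current


def replay(events, preempt):
    """Apply (node, healthy) events in order and record the VIP holder after each."""
    nodes = {"centos01": (100, True), "centos02": (85, True)}  # 100 is hypothetical
    holder = "centos01"
    history = []
    for name, ok in events:
        prio, _ = nodes[name]
        nodes[name] = (prio, ok)
        holder = vip_holder(holder, nodes, preempt)
        history.append(holder)
    return history


# Master fails, master recovers, then the backup fails.
events = [("centos01", False), ("centos01", True), ("centos02", False)]

if __name__ == "__main__":
    print("preempt:  ", replay(events, preempt=True))
    print("nopreempt:", replay(events, preempt=False))
```

With preemption the VIP moves twice in the first two events, as in the logs in this article; without it the VIP stays on centos02 after centos01 recovers and moves back only when centos02 fails.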


The vrrp_script module involves quite a lot of material, so we will leave it for the next article.