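I. MAD
MAD stands for Multi-Active Detection. An IRF link failure can split one IRF into several new IRFs. These IRFs share the same IP addresses and other Layer 3 configuration, which causes address conflicts and spreads the fault through the network. To keep the system available, a mechanism is needed that detects when multiple IRFs exist in the network at the same time after a split and then handles the situation so that the split affects services as little as possible. MAD is that detection and handling mechanism. It provides the following functions: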
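(1) Split detection
MAD uses ARP (Address Resolution Protocol), ND (Neighbor Discovery Protocol), LACP (Link Aggregation Control Protocol), or BFD (Bidirectional Forwarding Detection) to detect whether multiple IRFs exist in the network.
(2) Conflict handling
After an IRF splits, the split detection mechanism lets each IRF discover that other IRFs in the Active state (the normal working state) exist in the network.
· With BFD MAD, ARP MAD, or ND MAD, conflict handling keeps the IRF whose Master has the smaller member ID in the Active state, and that IRF continues to work normally. The other IRFs move to the Recovery state.
· With LACP MAD, conflict handling first compares the number of member devices in the two IRFs. The IRF with more members stays Active and continues to work; the IRF with fewer members moves to the Recovery state. If the member counts are equal, the IRF whose Master has the smaller member ID stays Active and the other IRFs move to the Recovery state.
An IRF that moves to the Recovery state shuts down every physical port on all of its member devices except the reserved ports. The ports that are shut down are usually service interfaces, so the IRF can no longer forward service traffic. By default, only the IRF physical ports are reserved ports. Other ports can be added as reserved ports with the mad exclude interface command.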
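The two conflict-handling rules above can be sketched in a few lines of Python. This is only an illustration of the decision logic as described in this article, not H3C code: the `IrfFragment` class and the `elect_active` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IrfFragment:
    """One IRF left over after a split (hypothetical model, not an H3C API)."""
    master_member_id: int  # member ID of this fragment's Master
    member_count: int      # number of member devices in this fragment


def elect_active(fragments, method):
    """Return the fragment that stays Active; every other one goes to Recovery.

    method is one of "lacp", "bfd", "arp", "nd".
    """
    if method not in {"lacp", "bfd", "arp", "nd"}:
        raise ValueError(f"unknown MAD method: {method}")
    if method == "lacp":
        # LACP MAD: more members wins; a tie falls back to the smaller Master member ID.
        return min(fragments, key=lambda f: (-f.member_count, f.master_member_id))
    # BFD/ARP/ND MAD: the smaller Master member ID wins outright.
    return min(fragments, key=lambda f: f.master_member_id)


a = IrfFragment(master_member_id=1, member_count=1)
b = IrfFragment(master_member_id=2, member_count=3)
print(elect_active([a, b], "bfd") == a)   # True: smaller Master member ID
print(elect_active([a, b], "lacp") == b)  # True: more member devices
```

The example shows why the two families of methods should not be mixed: for the same split, BFD MAD keeps fragment `a` while LACP MAD keeps fragment `b`.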
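(3) MAD failure recovery
An IRF link failure splits the IRF, and the split causes the multi-active conflict. Repairing the failed IRF link so that the conflicting IRFs merge back into one IRF therefore clears the MAD failure. If the Active IRF develops another fault before the MAD failure is cleared, the Recovery-state IRF can first be brought back into service from the command line so that it takes over from the original IRF and service impact stays small. The MAD failure can then be repaired afterwards.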
II. BFD
BFD stands for Bidirectional Forwarding Detection. If MAD is the mechanism, BFD is one of the means of detection it can use.
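(1) How BFD MAD detection works
BFD MAD detection is implemented with the BFD protocol. For it to work, BFD MAD detection must be enabled on a Layer 3 interface, and MAD IP addresses must be configured on that interface. A MAD IP address differs from an ordinary IP address in that it is bound to a member device: one must be configured for every member device in the IRF, and all of the MAD IP addresses must belong to the same subnet.
· While the IRF runs normally, only the MAD IP address configured for the Master takes effect. The MAD IP addresses configured for the Slaves do not take effect, so the BFD session stays down. (Use the display bfd session command to check the session state: a Session State of Up means the session is active, and Down means it is down.)
· When the IRF splits into several IRFs, the MAD IP address of the Master in each IRF takes effect, the BFD session comes up, and the multi-active conflict is detected.
· Once the conflict is detected, the IRF whose Master has the smaller member ID stays in the Active state and continues to work normally. Each of the other IRFs reports a MAD conflict event to its IRF module, which moves that IRF to the MAD Recovery state.
Figure 6 BFD MAD interaction flow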
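The rule that only a Master's MAD IP address takes effect explains why the BFD session is down before a split and up after it. The following Python sketch models that under a simplified, hypothetical data layout (the dictionaries and function names are not H3C structures). The two addresses are the session endpoints that appear in the BFD log later in this article.

```python
def active_mad_ips(fragments):
    """Return the MAD IPs that are in effect: one per IRF, the Master's.

    fragments: list of dicts like {"master": 1, "mad_ips": {1: "192.168.1.1"}}
    (a hypothetical data model used only for this illustration).
    """
    return [f["mad_ips"][f["master"]] for f in fragments]


def bfd_session_up(fragments):
    """A BFD session needs two live endpoints, so it comes up only after a split."""
    return len(active_mad_ips(fragments)) >= 2


# Before the split: one IRF, member 1 is Master, member 2 is a Slave.
merged = [{"master": 1, "mad_ips": {1: "192.168.1.1", 2: "192.168.1.2"}}]
# After the split: each member is the Master of its own IRF.
split = [
    {"master": 1, "mad_ips": {1: "192.168.1.1"}},
    {"master": 2, "mad_ips": {2: "192.168.1.2"}},
]
print(bfd_session_up(merged))  # False
print(bfd_session_up(split))   # True
```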
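(2) BFD MAD networking requirements
In the intermediate-device topology shown in Figure 7, every member device must connect to the intermediate device, and these BFD links are dedicated to MAD detection. The interfaces on these links must belong to the same VLAN, and in that VLAN interface view each member device is given a different IP address from the same subnet.
On an interface used for BFD MAD detection, configure the MAD IP addresses with the mad ip address command and do not configure any other IP address (including ordinary addresses set with the ip address command and VRRP virtual IP addresses), so that MAD detection is not affected.
Figure 7 BFD MAD detection network diagram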
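The two address requirements (one MAD IP per member, all in the same subnet but different from each other) are easy to check before applying a configuration. Below is a minimal sketch with Python's standard `ipaddress` module. The `check_mad_ips` helper is hypothetical, and the /24 prefix length in the example is an assumption, since the transcripts in this article do not show the mask of the MAD IP addresses.

```python
import ipaddress


def check_mad_ips(mad_ips):
    """Check planned MAD IPs, given as {member_id: "address/prefix"}.

    Returns a list of problem descriptions; an empty list means the plan
    satisfies the same-subnet and distinct-address requirements.
    """
    problems = []
    interfaces = [ipaddress.ip_interface(addr) for addr in mad_ips.values()]
    if len({iface.network for iface in interfaces}) > 1:
        problems.append("MAD IP addresses are not all in the same subnet")
    addresses = [iface.ip for iface in interfaces]
    if len(set(addresses)) != len(addresses):
        problems.append("two members share the same MAD IP address")
    return problems


# Addresses taken from the BFD log in this article; the /24 mask is assumed.
print(check_mad_ips({1: "192.168.1.1/24", 2: "192.168.1.2/24"}))  # []
print(check_mad_ips({1: "192.168.1.1/24", 2: "192.168.2.2/24"}))
```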
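Differences among the four MAD detection methods supported by IRF:
IRF supports four MAD detection methods: LACP MAD, BFD MAD, ARP MAD, and ND MAD. Each has its own characteristics, and the choice depends on the existing network. Because LACP MAD handles conflicts by a different rule than BFD MAD, ARP MAD, and ND MAD, do not configure LACP MAD together with any of the other three. BFD MAD, ARP MAD, and ND MAD work independently without interfering with each other and can be configured at the same time.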
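MAD detection method | Advantages | Limitations
LACP MAD | Fast detection. Works over the existing aggregation network without extra interfaces; the aggregate links carry both ordinary service packets and MAD detection packets (extended LACP packets). | An H3C device must be used as the intermediate device, and every member device must connect to it.
BFD MAD | Relatively fast detection. Flexible networking, with no requirements on other devices. | When the stack has more than two devices, an intermediate device is required and every member device must connect to it. These BFD links are dedicated to MAD detection.
ARP MAD | For non-aggregated IPv4 networks, used together with MSTP. Needs no extra ports. In networks with an intermediate device, there are no requirements on that device. | Detection is slower than LACP MAD and BFD MAD.
ND MAD | For non-aggregated IPv6 networks, used together with MSTP. Needs no extra ports. In networks with an intermediate device, there are no requirements on that device. | Detection is slower than LACP MAD and BFD MAD.
Table 1 Comparison of MAD detection mechanisms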
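The coexistence rule stated before the table (LACP MAD must not be combined with the other three, while BFD MAD, ARP MAD, and ND MAD may be combined freely) can be expressed as a small check. This is a sketch of the rule as this article states it; the function name and the lowercase method labels are hypothetical.

```python
SMALLER_ID_WINS = {"bfd", "arp", "nd"}  # these three share one conflict-handling rule
ALL_METHODS = SMALLER_ID_WINS | {"lacp"}


def mad_combination_ok(methods):
    """Return True if the given MAD methods may be configured together."""
    methods = set(methods)
    unknown = methods - ALL_METHODS
    if unknown:
        raise ValueError(f"unknown MAD methods: {sorted(unknown)}")
    # LACP MAD compares member counts first, so it must not be mixed
    # with the methods that only compare Master member IDs.
    return not ("lacp" in methods and methods & SMALLER_ID_WINS)


print(mad_combination_ok(["bfd", "arp"]))   # True
print(mad_combination_ok(["lacp", "bfd"]))  # False
```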
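III. Verifying MAD, BFD, and IRF splits (important)
(1) After the stack splits, the devices do not reboot automatically. A device reboots only when it joins an existing stack.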
[H3C]interface FortyGigE 1/0/53
[H3C-FortyGigE1/0/53]SHUT
[H3C-FortyGigE1/0/53]%May 21 16:49:23:411 2020 H3C STM/3/STM_LINK_DOWN: IRF port 1 went down.
%May 21 16:49:23:413 2020 H3C DEV/3/BOARD_REMOVED: Board was removed from slot 2, type is H3C S5820V2-54Q.
%May 21 16:49:23:415 2020 H3C IFNET/3/PHY_UPDOWN: Physical state on the interface FortyGigE1/0/53 changed to down.
%May 21 16:49:23:428 2020 H3C LAGG/6/LAGG_INACTIVE_PHYSTATE: Member port GE2/0/2 of aggregation group BAGG10 changed to the inactive state, because the physical state of the port is down.
%May 21 16:49:23:428 2020 H3C LAGG/6/LAGG_INACTIVE_PHYSTATE: Member port GE2/0/3 of aggregation group BAGG1 changed to the inactive state, because the physical state of the port is down.
%May 21 16:49:23:446 2020 H3C IFNET/5/LINK_UPDOWN: Line protocol state on the interface FortyGigE1/0/53 changed to down.
%May 21 16:49:23:464 2020 H3C IFNET/3/IF_WARN: The jumboframe of the aggregate interface Bridge-Aggregation1 is not supported on the member port GigabitEthernet1/0/3
%May 21 16:49:23:505 2020 H3C BFD/5/BFD_CHANGE_FSM: Sess[192.168.1.1/192.168.1.2, LD/RD:129/129, Interface:Vlan4000, SessType:Ctrl, LinkType:INET], Ver:1, Sta: DOWN->INIT, Diag: 0 (No Diagnostic)
%May 21 16:49:23:798 2020 H3C SHELL/5/SHELL_LOGOUT: Console logged out from con1.
%May 21 16:49:23:879 2020 H3C IFNET/3/IF_WARN: The jumboframe of the aggregate interface Bridge-Aggregation10 is not supported on the member port GigabitEthernet1/0/2
%May 21 16:49:24:407 2020 H3C BFD/5/BFD_MAD_INTERFACE_CHANGE_STATE: BFD MAD function enabled on Vlan-interface4000 changed to the normal state.
%May 21 16:49:24:707 2020 H3C BFD/5/BFD_CHANGE_FSM: Sess[192.168.1.1/192.168.1.2, LD/RD:129/129, Interface:Vlan4000, SessType:Ctrl, LinkType:INET], Ver:1, Sta: INIT->UP, Diag: 0 (No Diagnostic)
%May 21 16:49:24:709 2020 H3C IFNET/3/PHY_UPDOWN: Physical state on the interface GigabitEthernet1/0/1 changed to down.
%May 21 16:49:24:710 2020 H3C IFNET/5/LINK_UPDOWN: Line protocol state on the interface GigabitEthernet1/0/1 changed to down.
%May 21 16:49:24:710 2020 H3C IFNET/3/PHY_UPDOWN: Physical state on the interface Vlan-interface4000 changed to down.
%May 21 16:49:24:710 2020 H3C IFNET/5/LINK_UPDOWN: Line protocol state on the interface Vlan-interface4000 changed to down.
Check the state of both devices:
Device A:
dis irf
MemberID Role Priority CPU-Mac Description
*+1 Master 32 764d-ea56-0104 ---
--------------------------------------------------
* indicates the device is the master.
+ indicates the device through which the user logs in.
The bridge MAC of the IRF is: 764d-ea56-0100
Auto upgrade : yes
Mac persistent : 6 min
Domain ID : 0
Device B:
dis irf
MemberID Role Priority CPU-Mac Description
*+2 Master 1 764e-0a91-0204 ---
--------------------------------------------------
* indicates the device is the master.
+ indicates the device through which the user logs in.
The bridge MAC of the IRF is: 764d-ea56-0100
Auto upgrade : yes
Mac persistent : 6 min
Domain ID : 0
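Comparing the two outputs by eye works for two devices. With more devices, a script can do the same comparison. The sketch below parses the member line and the bridge MAC from display irf text shaped like the output above. The regular expressions are written against this article's sample output only and are an assumption; other software versions may format the output differently.

```python
import re

# One member row, for example: "*+1 Master 32 764d-ea56-0104 ---"
MEMBER_RE = re.compile(
    r"^[ \t]*(?P<flags>[*+]*)[ \t]*(?P<member_id>\d+)[ \t]+(?P<role>\S+)[ \t]+"
    r"(?P<priority>\d+)[ \t]+(?P<cpu_mac>[0-9a-f]{4}(?:-[0-9a-f]{4}){2})",
    re.MULTILINE,
)
BRIDGE_RE = re.compile(r"bridge MAC of the IRF is:\s*(\S+)")


def parse_display_irf(text):
    """Extract the member rows and the bridge MAC from display irf output."""
    members = [
        {
            "member_id": int(m["member_id"]),
            "role": m["role"],
            "priority": int(m["priority"]),
            "is_master": "*" in m["flags"],
        }
        for m in MEMBER_RE.finditer(text)
    ]
    bridge = BRIDGE_RE.search(text)
    return {"members": members, "bridge_mac": bridge.group(1) if bridge else None}


def looks_split(first, second):
    """True if two devices each report their own Master under one bridge MAC."""
    masters_1 = {m["member_id"] for m in first["members"] if m["is_master"]}
    masters_2 = {m["member_id"] for m in second["members"] if m["is_master"]}
    same_bridge = first["bridge_mac"] == second["bridge_mac"]
    return bool(masters_1) and bool(masters_2) and masters_1 != masters_2 and same_bridge


DEVICE_A = """MemberID Role Priority CPU-Mac Description
*+1 Master 32 764d-ea56-0104 ---
The bridge MAC of the IRF is: 764d-ea56-0100
"""
DEVICE_B = """MemberID Role Priority CPU-Mac Description
*+2 Master 1 764e-0a91-0204 ---
The bridge MAC of the IRF is: 764d-ea56-0100
"""
print(looks_split(parse_display_irf(DEVICE_A), parse_display_irf(DEVICE_B)))  # True
```

Both devices report themselves as Master and still advertise the same bridge MAC, which is exactly the multi-active situation that MAD exists to resolve.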
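(2) The MAD detection link ensures that, when the stack has split and two IRFs are Active at the same time, only the IRF whose Master has the smaller member ID keeps its service ports up. The other IRF enters the Recovery state.
Device A: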
dis int brief
Brief information on interfaces in route mode:
Link: ADM - administratively down; Stby - standby
Protocol: (s) - spoofing
Interface Link Protocol Primary IP Description
InLoop0 UP UP(s) --
MGE0/0/0 DOWN DOWN --
NULL0 UP UP(s) --
REG0 UP -- --
Vlan1 UP UP 10.1.1.2
Vlan4000 DOWN DOWN 192.168.1.1
Device B:
dis ip int brief
*down: administratively down
(s): spoofing (l): loopback
Interface Physical Protocol IP Address Description
MGE0/0/0 down down -- --
Vlan1 down down 10.1.1.2 --
Vlan4000 down down 192.168.1.2 --
dis int vlan 1
Vlan-interface1
Current state: MAD ShutDown
Line protocol state: DOWN
Description: Vlan-interface1 Interface
Bandwidth: 100000 kbps
Maximum transmission unit: 1500
Internet address: 10.1.1.2/24 (primary)
IP packet frame type: Ethernet II, hardware address: 764d-ea56-0102
IPv6 packet frame type: Ethernet II, hardware address: 764d-ea56-0102
Last clearing of counters: Never
Last 300 seconds input rate: 1 bytes/sec, 8 bits/sec, 0 packets/sec
Last 300 seconds output rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec
Input: 5 packets, 320 bytes, 0 drops
Output: 3 packets, 138 bytes, 0 drops
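(3) The mad restore command is for devices that have both MAD detection and IRF stacking configured and whose stack link has gone down. If the Active master then runs into an unexpected problem while carrying services, mad restore lets the standby device in the Recovery state take over the services. Before running the command, confirm that MAD detection is no longer in effect (in the transcript below, the MAD detection port GigabitEthernet2/0/1 is shut down first). Otherwise the interfaces come up and then go down again after the command is entered.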
[H3C]interface GigabitEthernet 2/0/1
[H3C-GigabitEthernet2/0/1]dis th
#
interface GigabitEthernet2/0/1
port link-mode bridge
port access vlan 4000
combo enable fiber
#
return
[H3C-GigabitEthernet2/0/1]shut
[H3C-GigabitEthernet2/0/1]qu
[H3C]mad restore
This command will restore the device from multi-active conflict state. Continue? [Y/N]:y
Restoring from multi-active conflict state, please wait...
%May 21 16:56:33:633 2020 H3C IFNET/3/PHY_UPDOWN: Physical state on the interface GigabitEthernet2/0/2 changed to up.
%May 21 16:56:33:634 2020 H3C IFNET/3/PHY_UPDOWN: Physical state on the interface GigabitEthernet2/0/3 changed to up.
[H3C]%May 21 16:56:33:638 2020 H3C LAGG/6/LAGG_ACTIVE: Member port GE2/0/2 of aggregation group BAGG10 changed to the active state.
%May 21 16:56:33:640 2020 H3C IFNET/5/LINK_UPDOWN: Line protocol state on the interface GigabitEthernet2/0/2 changed to up.
%May 21 16:56:33:641 2020 H3C IFNET/3/PHY_UPDOWN: Physical state on the interface Bridge-Aggregation10 changed to up.
%May 21 16:56:33:641 2020 H3C IFNET/5/LINK_UPDOWN: Line protocol state on the interface Bridge-Aggregation10 changed to up.
%May 21 16:56:33:642 2020 H3C LAGG/6/LAGG_ACTIVE: Member port GE2/0/3 of aggregation group BAGG1 changed to the active state.
%May 21 16:56:33:645 2020 H3C IFNET/5/LINK_UPDOWN: Line protocol state on the interface GigabitEthernet2/0/3 changed to up.
%May 21 16:56:33:646 2020 H3C IFNET/3/PHY_UPDOWN: Physical state on the interface Bridge-Aggregation1 changed to up.
%May 21 16:56:33:646 2020 H3C IFNET/5/LINK_UPDOWN: Line protocol state on the interface Bridge-Aggregation1 changed to up.
%May 21 16:56:33:660 2020 H3C IFNET/3/PHY_UPDOWN: Physical state on the interface Vlan-interface1 changed to up.
%May 21 16:56:33:660 2020 H3C IFNET/5/LINK_UPDOWN: Line protocol state on the interface Vlan-interface1 changed to up.
[H3C]
[H3C]dis ip int brief
*down: administratively down
(s): spoofing (l): loopback
Interface Physical Protocol IP Address Description
MGE0/0/0 down down -- --
Vlan1 up up 10.1.1.2 --
Vlan4000 down down 192.168.1.2 --
This article draws on the following sources:
skytwen, H3C IRF MAD检测原理及相关问题验证, https://www.cnblogs.com/sky5hat/p/10481939.html
IRF MAD应用模型及技术分析, https://www.h3c.com/cn/d_201510/922083_30005_0.htm
S7500E虚拟化技术配置指导, http://www.h3c.com/cn/d_201708/1018599_30005_0.htm