6 Cisco VxLAN Control Plane with BGP EVPN: MAC-IP Learning and Host Route Advertisement

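I. Overview

  • This post mainly describes the control-plane operations behind inter-VNI connectivity (same tenant, different VNIs) in BGP EVPN VxLAN; it also covers how hosts within the same VNI communicate;
  • It also describes the data-plane forwarding process;
  • The topology and configuration are based entirely on the two previous posts, "4 Cisco VxLAN Lab with BGP EVPN & Distributed Anycast Gateway" and "5 Cisco VxLAN Control Plane with BGP EVPN: MAC Learning";
  • This post adds the ARP suppression configuration. In addition, unlike the earlier posts, the VRF name changes from "Tenant-A" to "ta".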

II. Topology

(Topology diagram: image.png)

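III. Control-Plane Operation

3.1 MAC-IP Learning Process

  • This section explains in detail how the local VTEP switch learns the IP address of a locally attached host from the Gratuitous ARP message that the end host generates, and how the Host Mobility Manager (HMM) component installs that information into the L2RIB of the relevant VNI (the L2RIB database that holds MAC-IP address information is also called the IP VRF);
  • It shows how BGP EVPN Route Type 2 (MAC/MAC-IP advertisement route) is used to export the route from the L2RIB to the BGP Loc-RIB and then advertise it to remote VTEP switches through the BGP Adj-RIB-Out;
  • It shows how the route information finally reaches the L2RIB of the remote VTEP.

3.1.1 ARP Learning on the Local VTEP

  • After PC1 boots, it sends a Gratuitous ARP (GARP) to verify that its IP address is unique. VTEP switch Leaf-1 receives the GARP on interface E1/3 and installs the MAC-IP binding into the ARP table, taking the MAC address from PC1's source MAC and the IP address from the PC1 IP field in the GARP payload;
  • The output below shows the ARP table of VRF ta. In NX-OS, the default aging time of a locally learned ARP entry is 1500 seconds, 300 seconds shorter than the MAC address aging timer. When the ARP aging timer expires, the switch checks whether the host is still present by sending it an ARP request. If the host replies, the switch resets the aging timer. If the host does not reply, the entry is removed from the ARP table, but it is kept in the BGP EVPN table for a further 1800 seconds (the MAC aging timer) before the withdraw message is sent. The MAC aging timer should be longer than the ARP aging timer because the ARP refresh process also refreshes the MAC table, which avoids unnecessary flooding.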
Leaf-1# sh ip arp vrf ta

Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table for context ta
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
172.16.1.1      00:02:00  0050.7966.6806  Vlan10 
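
The timer relationship described above can be sketched in a few lines of Python. This is only an illustrative model of the behaviour as the text describes it, not NX-OS code; the function name `on_arp_timer_expiry` and the returned strings are hypothetical.

```python
# Default timers quoted in the text above (seconds).
ARP_AGING_S = 1500
MAC_AGING_S = 1800


def on_arp_timer_expiry(host_replied: bool) -> str:
    """Model what the switch does when the ARP aging timer runs out.

    The switch probes the host with an ARP request; the outcome depends
    on whether the host answers.
    """
    if host_replied:
        return "reset ARP aging timer"
    # No reply: the ARP entry is removed, and the BGP EVPN route is kept
    # for one MAC aging period before the withdraw is sent.
    return f"delete ARP entry, withdraw EVPN route after {MAC_AGING_S}s"


# The ARP timer must fire first so that the refresh also updates the MAC table.
assert ARP_AGING_S < MAC_AGING_S
print(MAC_AGING_S - ARP_AGING_S)          # gap between the two timers
print(on_arp_timer_expiry(host_replied=False))
```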

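3.1.2 MAC-IP on the Local VTEP

  • The Host Mobility Manager (HMM) component learns the MAC-IP information as a local route;
  • HMM installs the information into the local host database and forwards the MAC-IP information to the L2RIB;
  • The local host database contains the IP address (/32), MAC address, SVI and local interface. The L2RIB holds the same information, except for the SVI;
  • The output below shows part of the MAC-IP learning process on Leaf-1;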
Leaf-1# show system internal l2rib event-history mac-ip
L2RIB MAC-IP Object Event Logs:
[10/12/20 14:25:31.870 CST 1 29704] Rcvd MAC-IP ROUTE BASE msg: obj_type: 13 oper_type: 1 oper_sbtype: 0 producer: 12
[10/12/20 14:25:31.870 CST 2 29704] Rcvd MAC-IP ROUTE msg: (10, 0050.7966.6806, 172.16.1.1), l2 vni 0, l3 vni 13960, 
[10/12/20 14:25:31.870 CST 3 29704] Rcvd MAC-IP ROUTE msg: flags , admin_dist 7, seq 0, soo 0, peerid 0, 
[10/12/20 14:25:31.870 CST 4 29704] Rcvd MAC-IP ROUTE msg: res 0, esi (F), ifindex 0, nh_count 0, pc-ifindex 0
[10/12/20 14:25:31.871 CST 5 29704] (10,0050.7966.6806,172.16.1.1):MAC-IP entry created
[10/12/20 14:25:31.871 CST 6 29704] (10,0050.7966.6806,172.16.1.1,12):MAC-IP route created with flags 0, l3 vni 13960, seq 0
[10/12/20 14:25:31.871 CST 7 29704] (10,0050.7966.6806,172.16.1.1,12): admin dist 7, soo 0, peerid 0, peer ifindex 0
[10/12/20 14:25:31.871 CST 8 29704] (10,0050.7966.6806,172.16.1.1,12): esi (F), pc-ifindex 0
[10/12/20 14:25:31.875 CST 9 29704] (10,0050.7966.6806,172.16.1.1,12):Encoding MAC-IP best route (ADD, client id 5), esi: (F)
  • The output below shows the MAC-IP binding for PC1 in the local host database of VRF ta on Leaf-1;
Leaf-1# show fabric forwarding ip local-host-db vrf ta
HMM host IPv4 routing table information for VRF ta
Status: *-valid, x-deleted, D-Duplicate, DF-Duplicate and frozen, 
        c-cleaned in 00:01:49

    Host                 MAC Address        SVI        Flags      Physical Interface
*   172.16.1.1/32        0050.7966.6806     Vlan10     0x420201   Ethernet1/3
  • The output below shows that the MAC-IP information for PC1 in the IP VRF of the L2RIB was produced by the HMM component;
Leaf-1# show l2route mac-ip topology 10 detail
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link 
(Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (Ro):Re-Originated 
Topology    Mac Address    Prod   Flags         Seq No     Host IP         Next-Hops      
----------- -------------- ------ ---------- --------------- ---------------
10          0050.7966.6806 HMM    --            0          172.16.1.1     Local          
            Sent To: BGP
            L3-Info: 13960
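
As a rough illustration of the relationship between the local host database and the L2RIB, the sketch below models an HMM entry and derives the L2RIB view from it by dropping the SVI, as the text describes. The class and function names (`LocalHostEntry`, `to_l2rib`) are hypothetical; the values are taken from the outputs above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LocalHostEntry:
    """One row of the HMM local host database (illustrative model)."""
    ip: str          # host route, always /32
    mac: str
    svi: str
    interface: str


def to_l2rib(entry: LocalHostEntry, topology: int, l3vni: int) -> dict:
    """Build the L2RIB view: same fields as the host DB, minus the SVI."""
    fields = asdict(entry)
    fields.pop("svi")
    fields.update(topology=topology, producer="HMM", l3vni=l3vni)
    return fields


pc1 = LocalHostEntry("172.16.1.1/32", "0050.7966.6806", "Vlan10", "Ethernet1/3")
l2rib_route = to_l2rib(pc1, topology=10, l3vni=13960)
print(l2rib_route)
```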

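3.1.3 BGP Route Export on the Local VTEP

  • VTEP switch Leaf-1 installs the MAC-IP route from the L2RIB into the BGP Loc-RIB;
  • The MAC-IP information is advertised as a separate BGP EVPN Route Type 2 update (MAC-only and MAC-IP each use their own NLRI update). The NLRI carried by the two kinds of update differs as follows: besides the host MAC address, the MAC-IP advertisement also carries the host IP address, the IP address length, and the MPLS Label 2 field, which identifies the L3VNI used in VRF ta;
  • The MAC-IP update also carries two additional extended communities: RT 65234:13960 and Router MAC 5000.0003.0007;
  • The output below shows the internal process by which VTEP switch Leaf-1 receives the MAC-IP route and installs it into the RIB and the BGP Loc-RIB. The prefix length shown (/144) covers RD (8×8 bits) + MAC address (6×8 bits) + IP address (4×8 bits) = 18 octets, i.e. 144 bits;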
Leaf-1# show bgp internal event-history events | in 6806
BRIB:
2020 Oct 12 17:36:36.317231: (default) BRIB: [L2VPN EVPN] Installing prefix 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/144 (local) via 3.3.3.3 label 10010 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
RIB:
2020 Oct 12 17:36:36.319783: (default) RIB: [L2VPN EVPN] add prefix 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1] (flags 0x1) : OK, total 1
EVENT:
2020 Oct 12 17:36:36.316899: EVT: Received from L2RIB MAC-IP route: Add ESI 0000.0000.0000.0000.0000 topo 10010 mac 0050.7966.6806 ip 172.16.1.1 L3 VNI 13960 flags 00000000 soo 0 seq 0, reorig :0
  • The output below shows the BGP Loc-RIB entry for PC1's MAC-IP NLRI;
Leaf-1# sh bgp l2vpn evpn 172.16.1.1
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 3.3.3.3:32777    (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 969
Paths: (1 available, best #1)
Flags: (0x000102) (high32 00000000) on xmit-list, is not in l2rib/evpn

  Advertised path-id 1
  Path type: local, path is valid, is best path
  AS-Path: NONE, path locally originated
    3.3.3.3 (metric 0) from 0.0.0.0 (3.3.3.3)
      Origin IGP, MED not set, localpref 100, weight 32768
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007

  Path-id 1 advertised to peers:
    1.1.1.1            2.2.2.2  
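  • The prefix fields in the output above are explained in the following table;
Prefix field | Description | Notes
2 | BGP EVPN Route Type 2 | MAC/MAC-IP route advertisement
0 | Ethernet Segment Identifier (ESI) | All zeros = single-homed site
0 | Ethernet Tag ID | Must be 0 for EVPN routes
48 | MAC address length | /
0050.7966.6806 | MAC address | /
32 | IP address length | /
172.16.1.1 | IP address | /
/272 | Length of the MAC-IP VRF NLRI (in bits) | RD (8×8 bits) + MAC address (6×8 bits) + L2VNI ID (3×8 bits) + L3VNI ID (3×8 bits) + IP address (4×8 bits) + ESI (10×8 bits) = 34×8 bits, i.e. 272 bits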
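
The two lengths quoted in this section (/144 in the BRIB event log and /272 in the BGP table) can be checked with a small Python sketch. It also splits the bracketed prefix string into the fields listed in the table. The helper names `nlri_bits` and `parse_type2` are hypothetical, and the regular expression only covers the IPv4 Route Type 2 format shown in the output above.

```python
import re

# Field sizes in octets, as listed in the text and table above.
FIELD_OCTETS = {"rd": 8, "mac": 6, "l2vni": 3, "l3vni": 3, "ip": 4, "esi": 10}


def nlri_bits(fields) -> int:
    """Sum the sizes of the named fields and convert octets to bits."""
    return 8 * sum(FIELD_OCTETS[name] for name in fields)


TYPE2_RE = re.compile(
    r"\[(\d+)\]:\[(\d+)\]:\[(\d+)\]:\[(\d+)\]:\[([0-9a-f.]+)\]:\[(\d+)\]:\[([\d.]+)\]/(\d+)"
)


def parse_type2(prefix: str) -> dict:
    """Split an NX-OS style Route Type 2 prefix string into named fields."""
    match = TYPE2_RE.fullmatch(prefix)
    if match is None:
        raise ValueError(f"not a Route Type 2 prefix: {prefix!r}")
    route_type, esi, eth_tag, mac_len, mac, ip_len, ip, bits = match.groups()
    return {
        "route_type": int(route_type),
        "esi": int(esi),
        "eth_tag": int(eth_tag),
        "mac_len": int(mac_len),
        "mac": mac,
        "ip_len": int(ip_len),
        "ip": ip,
        "nlri_bits": int(bits),
    }


print(nlri_bits(["rd", "mac", "ip"]))   # length shown in the BRIB event log
print(nlri_bits(FIELD_OCTETS))          # length shown in the BGP table
print(parse_type2("[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272"))
```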
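  • In the output above, the L2VNI and L3VNI appear in the "Received label" field; the route also carries four BGP extended communities;
BGP extended community | Description | Notes
RT:65234:10010 | Export/import policy (L2VNI) | VNI 10010 maps to VLAN 10
RT:65234:13960 | Export/import policy (L3VNI) | VNI 13960 maps to VLAN 3960
ENCAP:8 | Sets the data-plane encapsulation type to VxLAN | /
Router MAC:5000.0003.0007 | Source address of the inner MAC header for routed packets | Needed because VxLAN is a MAC-in-UDP encapsulation and the payload crossing an L3 boundary does not carry the source host's MAC address, so the RMAC is used instead.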
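
For readers who post-process this output, the following Python sketch pulls the extended communities out of an `Extcommunity:` line. It is a minimal example that assumes the exact text layout shown above; `parse_extcommunity` and `ENCAP_NAMES` are hypothetical names, and only the ENCAP value explained in the table is mapped.

```python
import re

# Only the encapsulation value explained in the table above is mapped here.
ENCAP_NAMES = {8: "VXLAN"}


def parse_extcommunity(line: str) -> dict:
    """Extract route targets, encapsulation and router MAC from one line."""
    result = {"route_targets": [], "encap": None, "router_mac": None}
    for key, value in re.findall(r"(RT|ENCAP|Router MAC):(\S+)", line):
        if key == "RT":
            result["route_targets"].append(value)
        elif key == "ENCAP":
            result["encap"] = ENCAP_NAMES.get(int(value), value)
        else:  # "Router MAC"
            result["router_mac"] = value
    return result


line = "Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007"
communities = parse_extcommunity(line)
print(communities)
```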

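3.1.4 BGP Route Import on the Remote VTEP

  • VTEP switch Leaf-2 receives the BGP EVPN MAC route advertisement and installs it, unmodified, into the BGP Adj-RIB-In database;
  • Leaf-2 imports the route from the BGP Adj-RIB-In into the BGP Loc-RIB and, after best-path selection, installs it into the L2RIB;
  • When the remote VTEP switch Leaf-2 moves the route from the BGP Adj-RIB-In into the BGP Loc-RIB, it rewrites the RD to 4.4.4.4:32777, derived from its own BGP RID and the VLAN ID (here 32777, consistent with 32767 + VLAN ID 10). The process is the same as for MAC-only route import and relies on the same RT 65234:10010;
  • The output below shows the internal import process: Leaf-2 installs the received MAC-IP route into the BGP Adj-RIB-In under RD 3.3.3.3:32777, imports it under RD 4.4.4.4:32777, installs it into the BGP Loc-RIB, and finally downloads it to the L2RIB. Note that the output also includes the L3RIB installation process;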
Leaf-2# show bgp internal event-history events | i 6806
2020 Oct 12 21:52:48.495013: (default) RIB: [L2VPN EVPN]: Send to L2RIB 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112
2020 Oct 12 21:52:48.494399: (default) RIB: [L2VPN EVPN] For 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, added 0 next hops, suppress 0
2020 Oct 12 21:52:48.494371: (default) RIB: [L2VPN EVPN] Add/delete 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x210, in_rib: yes
2020 Oct 12 21:52:48.493006: (default) BRIB: [L2VPN EVPN] Marking imported path for dest 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 as deleted, path ibgp
2020 Oct 12 21:52:48.492893: EVT: [L2VPN EVPN] Deleting imported path [2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]
2020 Oct 12 21:52:48.492506: (default) RIB: [L2VPN EVPN] Add/delete 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x200, evi_ctx invalid, in_rib: no
2020 Oct 12 21:52:48.491786: (default) BRIB: [L2VPN EVPN] Marking path for dest 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 from peer 2.2.2.2 as deleted, pflags = 0x40000011, reeval=0
2020 Oct 12 21:52:48.474282: (default) RIB: [L2VPN EVPN] Suppressing 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 download to L2RIB
2020 Oct 12 21:52:48.474255: (default) RIB: [L2VPN EVPN] For 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, added 1 next hops, suppress 1
2020 Oct 12 21:52:48.474189: (default) RIB: [L2VPN EVPN] Adding 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 via 3.3.3.3 to NH list (flags2: 0x0)
2020 Oct 12 21:52:48.473909: (default) RIB: [L2VPN EVPN] Add/delete 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x210, in_rib: yes
2020 Oct 12 21:52:48.473593: (default) IMP: [L2VPN EVPN] Import of 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 (EVI: 0) to RD 4.4.4.4:65534 (0) inhibited, no Type2 for EAD-ES import
2020 Oct 12 21:52:48.472917: (default) IMP: [L2VPN EVPN] Importing prefix 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 to  RD 4.4.4.4:32777
2020 Oct 12 21:52:48.466435: (default) RIB: [L2VPN EVPN] Add/delete 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x200, evi_ctx invalid, in_rib: no
2020 Oct 12 21:52:48.465106: (default) BRIB: [L2VPN EVPN] Marking path for dest 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 from peer 1.1.1.1 as deleted, pflags = 0x40000011, reeval=0
2020 Oct 12 21:47:48.453800: (default) RIB: [L2VPN EVPN]: Send to L2RIB 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112
2020 Oct 12 21:47:48.451605: (default) RIB: [L2VPN EVPN] For 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, added 1 next hops, suppress 0
2020 Oct 12 21:47:48.451584: (default) RIB: [L2VPN EVPN] Adding 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 via 3.3.3.3 to NH list (flags2: 0x0)
2020 Oct 12 21:47:48.451553: (default) RIB: [L2VPN EVPN] Add/delete 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x200, in_rib: no
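  • The output below shows part of the BGP RIB (BRIB) on Leaf-2 (Adj-RIB-In and Loc-RIB). The first part shows the original, unmodified NLRI received from Spine-1, as installed in the Adj-RIB-In. The middle part shows the same NLRI installed in the BGP Loc-RIB with the rewritten RD; it is imported on the basis of RT 65234:10010. The last part shows the same NLRI again, this time installed under RD 4.4.4.4:3; it is used for inter-VNI (L3VNI) forwarding and is imported into the corresponding L3VNI Loc-RIB on the basis of RT 65234:13960, which is auto-generated from the VRF context configuration.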
Leaf-2# show bgp l2vpn evpn 172.16.1.1 vrf ta
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 3.3.3.3:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 801
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW

  Path type: internal, path is valid, not best reason: Neighbor Address
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 2.2.2.2 (2.2.2.2)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 2.2.2.2 

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported to 3 destination(s)
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 1.1.1.1 

  Path-id 1 not advertised to any peer

Route Distinguisher: 4.4.4.4:32777    (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 824
Paths: (1 available, best #1)
Flags: (0x000212) (high32 00000000) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path, in rib
             Imported from 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272 
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 1.1.1.1 

  Path-id 1 not advertised to any peer

Route Distinguisher: 4.4.4.4:3    (L3VNI 13960)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 799
Paths: (1 available, best #1)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported from 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272 
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 1.1.1.1 

  Path-id 1 not advertised to any peer
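
The RD rewrite and RT-based import seen in this output can be approximated with the Python sketch below. It is a simplified model, not the NX-OS import logic: the offset 32767 is an assumption chosen because it reproduces 32777 for VLAN 10, the value 3 in RD 4.4.4.4:3 is treated as the VRF ID (as section 3.3.3 states), and all function names are hypothetical.

```python
L2_RD_OFFSET = 32767  # assumption: reproduces 32777 for VLAN 10 in the output


def l2vni_rd(router_id: str, vlan_id: int) -> str:
    """RD used for the L2VNI (MAC VRF) on a given VTEP."""
    return f"{router_id}:{L2_RD_OFFSET + vlan_id}"


def l3vni_rd(router_id: str, vrf_id: int) -> str:
    """RD used for the L3VNI (IP VRF) on a given VTEP."""
    return f"{router_id}:{vrf_id}"


def auto_rt(asn: int, vni: int) -> str:
    """Route target in ASN:VNI form, as seen in the Extcommunity lines."""
    return f"{asn}:{vni}"


def import_destinations(route_rts, local_imports: dict) -> list:
    """Return the local RDs whose import RT appears on the received route."""
    return [rd for rt, rd in local_imports.items() if rt in route_rts]


# Leaf-2: RID 4.4.4.4, VLAN 10 -> L2VNI 10010, VRF ta (ID 3) -> L3VNI 13960.
leaf2_imports = {
    auto_rt(65234, 10010): l2vni_rd("4.4.4.4", 10),
    auto_rt(65234, 13960): l3vni_rd("4.4.4.4", 3),
}
received_rts = ["65234:10010", "65234:13960"]
print(import_destinations(received_rts, leaf2_imports))
```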

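3.1.5 IP VRF on the Remote VTEP

  • The remote VTEP Leaf-2 verifies that the next-hop IP address found in the NLRI is reachable, and the MAC-IP route is then installed into the L2RIB. The local topology ID is 10 (based on VLAN 10), the producer of the route is BGP rather than HMM (HMM is the producer only for locally learned hosts), and the next hop points to the source IP address bound to the NVE1 interface of VTEP switch Leaf-1;
  • At this stage both VTEP switches hold PC1's MAC-IP information in their L2RIB and BGP tables, but only the local VTEP switch Leaf-1 has the MAC-IP binding in its ARP table;
  • The output below shows part of the MAC-IP learning process on Leaf-2;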
Leaf-2# sh system internal l2rib event-history mac-ip
L2RIB MAC-IP Object Event Logs:
[10/12/20 14:25:33.711 CST 1 29679] Rcvd MAC-IP ROUTE BASE msg: obj_type: 13 oper_type: 1 oper_sbtype: 0 producer: 5
[10/12/20 14:25:33.711 CST 2 29679] Rcvd MAC-IP ROUTE msg: (10, 0050.7966.6806, 172.16.1.1), l2 vni 0, l3 vni 0, 
[10/12/20 14:25:33.711 CST 3 29679] Rcvd MAC-IP ROUTE msg: flags , admin_dist 0, seq 0, soo 0, peerid 0, 
[10/12/20 14:25:33.711 CST 4 29679] Rcvd MAC-IP ROUTE msg: res 0, esi (F), ifindex 0, nh_count 1, pc-ifindex 0
[10/12/20 14:25:33.711 CST 5 29679] NH: 3.3.3.3
[10/12/20 14:25:33.713 CST 6 29679] (10,0050.7966.6806,172.16.1.1):MAC-IP entry created
[10/12/20 14:25:33.713 CST 7 29679] (10,0050.7966.6806,172.16.1.1,5):MAC-IP route created with flags 0, l3 vni 0, seq 0
[10/12/20 14:25:33.713 CST 8 29679] (10,0050.7966.6806,172.16.1.1,5): admin dist 20, soo 0, peerid 0, peer ifindex 0
[10/12/20 14:25:33.714 CST 9 29679] (10,0050.7966.6806,172.16.1.1,5): esi (F), pc-ifindex 0
[10/12/20 14:25:45.795 CST a 29679] Rcvd MAC-IP ROUTE BASE msg: obj_type: 13 oper_type: 1 oper_sbtype: 0 producer: 12
[10/12/20 14:25:45.795 CST b 29679] Rcvd MAC-IP ROUTE msg: (10, 0050.7966.6808, 172.16.1.3), l2 vni 0, l3 vni 13960, 
[10/12/20 14:25:45.795 CST c 29679] Rcvd MAC-IP ROUTE msg: flags , admin_dist 7, seq 0, soo 0, peerid 0, 
[10/12/20 14:25:45.795 CST d 29679] Rcvd MAC-IP ROUTE msg: res 0, esi (F), ifindex 0, nh_count 0, pc-ifindex 0
[10/12/20 14:25:45.795 CST e 29679] (10,0050.7966.6808,172.16.1.3):MAC-IP entry created
[10/12/20 14:25:45.795 CST f 29679] (10,0050.7966.6808,172.16.1.3,12):MAC-IP route created with flags 0, l3 vni 13960, seq 0
[10/12/20 14:25:45.795 CST 10 29679] (10,0050.7966.6808,172.16.1.3,12): admin dist 7, soo 0, peerid 0, peer ifindex 0
[10/12/20 14:25:45.795 CST 11 29679] (10,0050.7966.6808,172.16.1.3,12): esi (F), pc-ifindex 0
[10/12/20 14:25:45.800 CST 12 29679] (10,0050.7966.6808,172.16.1.3,12):Encoding MAC-IP best route (ADD, client id 5), es
  • The output below shows that the MAC-IP information in the L2RIB was produced by BGP;
Leaf-2# show l2route mac-ip topology 10 detail
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link 
(Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (Ro):Re-Originated 
Topology    Mac Address    Prod   Flags         Seq No     Host IP         Next-Hops      
----------- -------------- ------ ---------- --------------- ---------------
10          0050.7966.6806 BGP    --            0          172.16.1.1     3.3.3.3        
            Sent To: ARP
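  • After the steps above, both VTEP switches hold PC1's MAC-IP information.

3.2 ARP Suppression

  • Section 3.1 explained how MAC-IP address information is propagated in a BGP EVPN VxLAN fabric. This section describes how the ARP suppression mechanism on the VTEP switches uses the MAC-IP bindings to reduce unnecessary Layer 2 BUM (broadcast, unknown unicast, multicast) traffic in the VxLAN fabric.

3.2.1 Configuring the Leaf Switches: Enabling ARP Suppression

Leaf-1 configuration: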

interface nve1
  member vni 10010
    suppress-arp
  member vni 10020
    suppress-arp

Leaf-2 configuration:

interface nve1
  member vni 10010
    suppress-arp
  member vni 10020
    suppress-arp

Leaf-3 configuration:

interface nve1
  member vni 10010
    suppress-arp
  member vni 10020
    suppress-arp
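
Because the three leaf switches receive identical configuration, the snippet can be generated rather than typed three times. The Python sketch below is one possible way to do that; it emits only the commands shown above, and the helper name `nve_suppress_arp` is hypothetical.

```python
def nve_suppress_arp(vnis, interface: str = "nve1") -> str:
    """Render the NVE configuration that enables ARP suppression per VNI."""
    lines = [f"interface {interface}"]
    for vni in vnis:
        lines.append(f"  member vni {vni}")
        lines.append("    suppress-arp")
    return "\n".join(lines)


config = nve_suppress_arp([10010, 10020])
for leaf in ("Leaf-1", "Leaf-2", "Leaf-3"):
    print(f"{leaf} configuration:\n{config}\n")
```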

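3.2.2 Viewing the ARP Suppression Cache

  • Start from the moment PC1 boots: once PC1 is powered on, it sends GARP/ARP messages into the network, and Leaf-1 installs the MAC-IP binding into the ARP table of VRF ta. The output below shows the ARP table on Leaf-1;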
Leaf-1# show  ip arp vrf ta
Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table for context ta
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
172.16.1.1      00:01:03  0050.7966.6806  Vlan10  
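  • When VNI-based ARP suppression is enabled on the local VTEP switch, the MAC-IP binding is also copied from the ARP table into the local ARP suppression cache. The output below shows the ARP suppression cache on Leaf-1;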
Leaf-1# show ip arp suppression-cache detail
Flags: + - Adjacencies synced via CFSoE
       L - Local Adjacency
       R - Remote Adjacency
       L2 - Learnt over L2 interface
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry

Ip Address      Age      Mac Address    Vlan Physical-ifindex    Flags    Remote Vtep Addrs

172.16.1.1      00:03:55 0050.7966.6806   10 Ethernet1/3         L
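  • With ARP suppression enabled on the remote VTEP switch (Leaf-2), the ARP suppression cache is populated from the L2RIB. The output below shows the ARP suppression cache entry for PC1 on Leaf-2;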
Leaf-2# show ip arp suppression-cache detail
Flags: + - Adjacencies synced via CFSoE
       L - Local Adjacency
       R - Remote Adjacency
       L2 - Learnt over L2 interface
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry

Ip Address      Age      Mac Address    Vlan Physical-ifindex    Flags    Remote Vtep Addrs

172.16.1.1      05:01:11 0050.7966.6806   10 (null)              R        3.3.3.3 

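3.2.3 Comparison of Suppression Scenarios

  1. No suppression: every ARP request received locally is sent to the multicast group associated with the VNI. Every VTEP switch that has joined that group receives the ARP request and forwards it out of the ports in the broadcast domain identified by the VNI ID in the VxLAN header;
  2. ARP suppression: when an ARP request arrives, the local VTEP switch checks whether the requested MAC-IP binding is stored in its local ARP suppression cache. On a hit, the switch sends the ARP reply directly to the requester and does not flood the request into the network. On a miss, the ARP request is flooded into the network (it is advisable to enable ARP suppression only after intra-VNI reachability tests have passed);
  3. ARP and unknown-unicast suppression: on a cache hit it behaves exactly like ARP suppression. On a miss, however, the ARP request is dropped, so this feature requires that there are no silent hosts in the VxLAN fabric.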
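
The three scenarios reduce to a small decision function. The Python sketch below models them as described in the list above; it is a conceptual model only, and the names `Mode` and `handle_arp_request` as well as the returned action strings are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    NONE = "no suppression"
    ARP = "ARP suppression"
    ARP_AND_UNKNOWN_UNICAST = "ARP and unknown-unicast suppression"


def handle_arp_request(mode: Mode, cache: dict, target_ip: str) -> str:
    """Return the action the local VTEP takes for an incoming ARP request.

    cache maps IP address -> MAC address (the ARP suppression cache).
    """
    if mode is Mode.NONE:
        return "flood"                      # always sent to the multicast group
    mac = cache.get(target_ip)
    if mac is not None:
        return f"reply {mac}"               # answered locally, nothing flooded
    # Cache miss: plain ARP suppression floods, the stricter mode drops.
    return "flood" if mode is Mode.ARP else "drop"


suppression_cache = {"172.16.1.1": "0050.7966.6806"}
for mode in Mode:
    print(mode.value, "->",
          handle_arp_request(mode, suppression_cache, "172.16.1.1"),
          "/",
          handle_arp_request(mode, suppression_cache, "172.16.1.99"))
```

The last line of the function shows why silent hosts are a problem in the third mode: a host that never sent traffic has no cache entry, so requests for it are dropped instead of flooded.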

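3.3 Host Route Advertisement: Inter-VNI Routing (L3VNI)

The previous post and the first half of this one explained how the MAC and MAC-IP information of end hosts is propagated in the VxLAN fabric, how it is used for intra-VNI switching and MAC address resolution, and how ARP suppression reduces BUM traffic. This section explains how host routes are imported into the L3RIB and how that information is used for inter-VNI routing.

3.3.1 Host Routes in the Local VTEP RIB

  • Section 3.1 described how the local VTEP switch installs the MAC-IP binding into the ARP table and how the HMM (Host Mobility Manager) component installs it into the L2RIB. In addition, HMM also installs the MAC-IP information from the ARP table into the L3RIB;
  • The output below shows the RIB of VRF ta on the local VTEP switch Leaf-1. The route was learned from VLAN 10 and installed into the RIB by HMM;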
Leaf-1# show  ip route  172.16.1.1 vrf ta
IP Route Table for VRF "ta"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%' in via output denotes VRF 

172.16.1.1/32, ubest/mbest: 1/0, attached
    *via 172.16.1.1, Vlan10, [190/0], 1d05h, hmm

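3.3.2 Host Routes in the BGP Process on the Local VTEP

  • Section 3.1 also described how the MAC-IP information is sent from the L2RIB to the Loc-RIB, from the Loc-RIB to the Adj-RIB-Out, and then advertised to the remote VTEP switches as a BGP EVPN Route Type 2;
  • The output below shows the BGP Loc-RIB entry for PC1's IP address;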
Leaf-1# show bgp l2vpn evpn 172.16.1.1
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 3.3.3.3:32777    (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 969
Paths: (1 available, best #1)
Flags: (0x000102) (high32 00000000) on xmit-list, is not in l2rib/evpn

  Advertised path-id 1
  Path type: local, path is valid, is best path
  AS-Path: NONE, path locally originated
    3.3.3.3 (metric 0) from 0.0.0.0 (3.3.3.3)
      Origin IGP, MED not set, localpref 100, weight 32768
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007

  Path-id 1 advertised to peers:
    1.1.1.1            2.2.2.2  

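3.3.3 Host Routes in the BGP Process on the Remote VTEP

  • Section 3.1 did not explain how the MAC-IP route finally reaches the L3RIB of the remote VTEP switch;
  • The BGP EVPN Route Type 2 update for PC1's MAC-IP NLRI also carries RT 65234:13960 (L3VNI);
  • The received NLRI passes through the BGP import policy engine (import based on RT 65234:13960), and the L3VNI entry ends up in the Loc-RIB;
  • During import policy processing, the original RD 3.3.3.3:32777 is rewritten to the VRF ta specific RD 4.4.4.4:3 (3 = the VRF ID of VRF ta). The RD distinguishes overlapping IP addresses that belong to different VRFs;
  • The output below shows the BGP table on Leaf-2, where all the details described above are visible (the original entry, the entry with the rewritten RD, the L3VNI entry, and so on);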
Leaf-2# show bgp l2vpn evpn 172.16.1.1 
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 3.3.3.3:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 801
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW

  Path type: internal, path is valid, not best reason: Neighbor Address
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 2.2.2.2 (2.2.2.2)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 2.2.2.2 

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported to 3 destination(s)
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 1.1.1.1 

  Path-id 1 not advertised to any peer

Route Distinguisher: 4.4.4.4:32777    (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 824
Paths: (1 available, best #1)
Flags: (0x000212) (high32 00000000) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path, in rib
             Imported from 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272 
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 1.1.1.1 

  Path-id 1 not advertised to any peer

Route Distinguisher: 4.4.4.4:3    (L3VNI 13960)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 799
Paths: (1 available, best #1)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported from 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272 
  AS-Path: NONE, path sourced internal to AS
    3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 13960
      Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
      Originator: 3.3.3.3 Cluster list: 1.1.1.1 

  Path-id 1 not advertised to any peer

  • The output below shows the VRF information on Leaf-2, including the VRF ID;
Leaf-2# show vrf
VRF-Name                           VRF-ID State   Reason                        
default                                 1 Up      --                            
management                              2 Up      --                            
ta                                      3 Up      --   

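3.3.4 Installing the Host Route into the Remote VTEP RIB

  • The route is installed from the BGP Loc-RIB into the L3RIB. The RIB entry includes the next-hop address, tunnel ID, encapsulation type (VxLAN), segment ID and route source (BGP);
  • At this stage both the local VTEP switch Leaf-1 and the remote VTEP switch Leaf-2 can route traffic from hosts in a different L2VNI (inter-VNI traffic) to PC1, which belongs to L2VNI 10010;
  • The output below shows the route entry for 172.16.1.1/32 in the VRF ta RIB on Leaf-2;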
Leaf-2# show ip route 172.16.1.1 vrf ta 
IP Route Table for VRF "ta"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%' in via output denotes VRF 

172.16.1.1/32, ubest/mbest: 1/0
    *via 3.3.3.3%default, [200/0], 1d02h, bgp-65234, internal, tag 65234 (evpn) 
segid: 13960 tunnelid: 0x3030303 encap: VXLAN
  • The output below shows the BGP recursive next-hop (RNH) database, in which 3.3.3.3 is the next hop used to reach 172.16.1.1;
Leaf-2# show nve internal bgp rnh database vni 13960
--------------------------------------------
Total peer-vni msgs recvd from bgp: 23
Peer add requests: 14
Peer update requests: 0
Peer delete requests: 9
Peer add/update requests: 14
Peer add ignored (peer exists): 0
Peer update ignored (invalid opc): 0
Peer delete ignored (invalid opc): 0
Peer add/update ignored (malloc error): 0
Peer add/update ignored (vni not cp): 0
Peer delete ignored (vni not cp): 0
--------------------------------------------
Showing BGP RNH Database, size : 5 vni 13960 

Flag codes: 0 - ISSU Done/ISSU N/A        1 - ADD_ISSU_PENDING         
            2 - DEL_ISSU_PENDING          3 - UPD_ISSU_PENDING
        

VNI    Peer-IP            Peer-MAC            Tunnel-ID  Encap     (A/S)  FlagsPT   
13960  3.3.3.3            5000.0003.0007      0x3030303  vxlan     (1/0)    0  FAB
13960  5.5.5.5            5000.0005.0007      0x5050505  vxlan     (1/0)    0  FAB
  • The output below shows the complete routing table of VRF ta on Leaf-2;
Leaf-2# show  ip route vrf ta
IP Route Table for VRF "ta"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%' in via output denotes VRF 

172.16.1.0/24, ubest/mbest: 1/0, attached
    *via 172.16.1.254, Vlan10, [0/0], 1d06h, direct
172.16.1.1/32, ubest/mbest: 1/0
    *via 3.3.3.3%default, [200/0], 1d02h, bgp-65234, internal, tag 65234 (evpn) 
segid: 13960 tunnelid: 0x3030303 encap: VXLAN
 
172.16.1.3/32, ubest/mbest: 1/0, attached
    *via 172.16.1.3, Vlan10, [190/0], 1d06h, hmm
172.16.1.5/32, ubest/mbest: 1/0
    *via 5.5.5.5%default, [200/0], 1d02h, bgp-65234, internal, tag 65234 (evpn) 
segid: 13960 tunnelid: 0x5050505 encap: VXLAN
 
172.16.1.254/32, ubest/mbest: 1/0, attached
    *via 172.16.1.254, Vlan10, [0/0], 1d06h, local
172.16.2.0/24, ubest/mbest: 1/0, attached
    *via 172.16.2.254, Vlan20, [0/0], 1d06h, direct
172.16.2.2/32, ubest/mbest: 1/0, attached
    *via 172.16.2.2, Vlan20, [190/0], 1d06h, hmm
172.16.2.4/32, ubest/mbest: 1/0
    *via 5.5.5.5%default, [200/0], 1d02h, bgp-65234, internal, tag 65234 (evpn) 
segid: 13960 tunnelid: 0x5050505 encap: VXLAN
 
172.16.2.254/32, ubest/mbest: 1/0, attached
    *via 172.16.2.254, Vlan20, [0/0], 1d06h, local
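
One detail worth noticing in the outputs above is that each tunnel ID (0x3030303, 0x5050505) matches the hexadecimal form of the remote VTEP's IPv4 address. The Python sketch below reproduces that mapping with the standard `ipaddress` module. It is offered as an observation about this lab's output, not as a documented NX-OS guarantee, and `tunnel_id` is a hypothetical helper name.

```python
import ipaddress


def tunnel_id(vtep_ip: str) -> str:
    """Render an IPv4 address as the hex value seen in the tunnelid field."""
    return hex(int(ipaddress.IPv4Address(vtep_ip)))


for vtep in ("3.3.3.3", "5.5.5.5"):
    print(vtep, "->", tunnel_id(vtep))
```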

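IV. Data-Plane Operation

4.1 ARP Suppression Process

  • When PC1 boots, the GARP received from host PC1 is VxLAN-encapsulated and flooded to multicast group 239.0.0.1, even though ARP suppression is enabled under the NVE1 interface of VTEP Leaf-1;
  • This is because VTEP Leaf-1 has no IP/MAC information for host PC1 in either its ARP table or its ARP suppression cache;
  • The debug output from VTEP Leaf-1 below reflects the ARP handling described above: Leaf-1 receives the GARP from host PC1, has no cache entry for 172.16.1.1 and therefore has to flood the frame; Leaf-1 then updates its ARP suppression cache and L2RIB;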
Leaf-1# terminal monitor
Leaf-1# debug ip arp cache
Leaf-1# debug ip arp event
Leaf-1# debug ip arp suppression-event
Leaf-1# 
Leaf-1# 2020 Oct 13 20:47:51.940670 arp: arp_process_receive_packet_msg: VINCI: Anycast Proxy mode  
2020 Oct 13 20:47:51.940988 arp: arp_process_packet_in_l3_mode: GARP:  Vlan: 10, Dest-ip: 172.16.1.1, Mac-Addr: 0050.7966.6806, ifindex: 0x0   
2020 Oct 13 20:47:51.941107 arp: arp_cache_resolve_l3_addr: arp_cache_resolve_l3_addr 
2020 Oct 13 20:47:51.941173 arp: arp_cache_resolve_l3_addr: mac: 0050.7966.6806, phy-ifindex:0x1a000400, is_local:TRUE 
2020 Oct 13 20:47:51.941283 arp: arp_process_receive_packet_msg: GARP count on the interface Vlan10 is 1 
2020 Oct 13 20:47:51.941696 arp: arp_process_receive_packet_msg: NO GARP storm on interface Vlan10 
2020 Oct 13 20:47:51.941771 arp: arp_process_receive_packet_msg: Existing entry found for source 172.16.1.1 on Vlan10 
2020 Oct 13 20:47:51.941839 arp: arp_add_adj: arp_add_adj: Updating MAC on interface Vlan10, phy-interface Ethernet1/3, flags:0x1 
2020 Oct 13 20:47:51.941927 arp: arp_adj_update_state_get_action_on_add: Successful action on add Previous State:0x10, Current State:0x10 Received event:Data Plane Add, entry: 172.16.1.1, 0050.7966.6806, Vlan10, action to be taken send_to_am:FALSE, arp_aging:TRUE 
2020 Oct 13 20:47:51.942079 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Create request for sw-bd: 10, mac: 0050.7966.6806 ip: 172.16.1.1, uuid: 268, vlan_mode: 2, ifindex: 0x901000a, phyifindex 0x1a000400 
2020 Oct 13 20:47:51.942191 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Post L2FM lookup MAC binding : for sw-bd: 10, mac: 0050.7966.6806 ip: 172.16.1.1, uuid: 268, vlan_mode: 2, ifindex: 0x901000a, phyifindex 0x1a000400 
2020 Oct 13 20:47:51.942251 arp: arp_cache_create_cache_node: create node for uuid:268, sw-bd:10, ip:172.16.1.1, mac:0050.7966.6806, mode:2, flags:0x10 is_timer: 0 
2020 Oct 13 20:47:51.942396 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Entry with same ip/vlan exists 
2020 Oct 13 20:47:51.942472 arp: arp_add_adj: Entry added for 172.16.1.1, 0050.7966.6806, state 2 on interface Vlan10, physical interface Ethernet1/3, ismct 0. flags:0x10, Rearp (interval: 0, count: 0), TTL: 1500 seconds update_shm:TRUE 
2020 Oct 13 20:47:51.942541 arp: arp_add_adj: Adj info: iod: 139, phy-iod: 9, ip: 172.16.1.1, mac: 0050.7966.6806, type: 0, sync: FALSE, suppress-mode: L2/L3 ARP Suppression flags:0x10 
2020 Oct 13 20:47:51.942595 arp: arp_process_receive_packet_msg: VINCI: enhanced_proxy: 0, traditional_proxy: 1, adj_added: 0 
2020 Oct 13 20:47:51.943681 arp: arp_cache_create_cache_node: create node for uuid:268, sw-bd:10, ip:172.16.1.1, mac:0050.7966.6806, mode:2, flags:0x10 is_timer: 0 
2020 Oct 13 20:47:51.944623 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Entry with same ip/vlan exists 
2020 Oct 13 20:47:51.944702 arp: arp_add_adj: Entry added for 172.16.1.1, 0050.7966.6806, state 2 on interface Vlan10, physical interface Ethernet1/3, ismct 0. flags:0x10, Rearp (interval: 0, count: 0), TTL: 1500 seconds update_shm:TRUE 
2020 Oct 13 20:47:51.945113 arp: arp_add_adj: Adj info: iod: 139, phy-iod: 9, ip: 172.16.1.1, mac: 0050.7966.6806, type: 0, sync: FALSE, suppress-mode: L2/L3 ARP Suppression flags:0x10 
2020 Oct 13 20:47:51.945239 arp: arp_process_receive_packet_msg: Received ARP request on Vlan10 (Ethernet1/3) 
2020 Oct 13 20:47:51.945375 arp: arp_process_receive_packet_msg: Gratuitous ARP request received on Vlan10 (Ethernet1/3).Proxy or Anycast Gateway enabled on Vlan10.Dropping the packet 
  • The output below shows the ARP debug output for PC1 on Leaf-2;
Leaf-2# terminal monitor
Leaf-2# debug ip arp cache
Leaf-2# debug ip arp event
Leaf-2# debug ip arp suppression-event
Leaf-2# 
2020 Oct 13 20:55:25.960139 arp: arp_l2rib_msg_cb: arp_l2rib_msg_cb: (Type: Route) Len: 184 Seq: 0, del: 0 (Prod: 5) , peer-id = 0 
2020 Oct 13 20:55:25.960255 arp: arp_l2rib_msg_cb: MAC address: 0050.7966.6806 Remote Host IP: 172.16.1.1 
2020 Oct 13 20:55:25.960564 arp: arp_l2rib_msg_cb: Host IP 172.16.1.1, Remote vtep addr count = 1 
2020 Oct 13 20:55:25.960647 arp: arp_l2rib_msg_cb: RNHs : 3.3.3.3 
2020 Oct 13 20:55:25.960752 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Create request for sw-bd: 10, mac: 0050.7966.6806 ip: 172.16.1.1, uuid: 1290, vlan_mode: 2, ifindex: 0x0, phyifindex 0x0 
2020 Oct 13 20:55:25.960893 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Failed to get phy_iod for ifindex 0x0 : Reason no such pss key 
2020 Oct 13 20:55:25.960964 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Post L2FM lookup MAC binding : for sw-bd: 10, mac: 0050.7966.6806 ip: 172.16.1.1, uuid: 1290, vlan_mode: 2, ifindex: 0x0, phyifindex 0x0 
2020 Oct 13 20:55:25.961034 arp: arp_cache_create_cache_node: create node for uuid:1290, sw-bd:10, ip:172.16.1.1, mac:0050.7966.6806, mode:2, flags:0x0 is_timer: 0 
2020 Oct 13 20:55:25.961282 arp: arp_cache_create_cache_node: Host IP 172.16.1.1, Remote vtep addr count = 1 
2020 Oct 13 20:55:25.961349 arp: arp_cache_create_cache_node: RNHs : 3.3.3.3 
2020 Oct 13 20:55:25.961622 arp: arp_cache_create_cache_node: New entry: create node 0x6c13ea74 0x6c13ee1c, uuid: 1290, sw-bd: 10, ip:172.16.1.1, mac: 0050.7966.6806, is_local: FALSE, num-macs: 1 
  • The ARP suppression cache on Leaf-1 is shown below:
Leaf-1# show ip arp suppression-cache detail 
Flags: + - Adjacencies synced via CFSoE
       L - Local Adjacency
       R - Remote Adjacency
       L2 - Learnt over L2 interface
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry

Ip Address      Age      Mac Address    Vlan Physical-ifindex    Flags    Remote Vtep Addrs

172.16.1.1      00:03:44 0050.7966.6806   10 Ethernet1/3         L
  • The ARP suppression cache on Leaf-2 is shown below:
Leaf-2# show ip arp suppression-cache detail 
Flags: + - Adjacencies synced via CFSoE
       L - Local Adjacency
       R - Remote Adjacency
       L2 - Learnt over L2 interface
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry

Ip Address      Age      Mac Address    Vlan Physical-ifindex    Flags    Remote Vtep Addrs

172.16.1.1      00:03:01 0050.7966.6806   10 (null)              R        3.3.3.3

4.2 Verifying ARP Suppression

  • Ping PC1 (172.16.1.1) from PC3 (172.16.1.3):
PC3> ping 172.16.1.1
84 bytes from 172.16.1.1 icmp_seq=1 ttl=64 time=58.651 ms
84 bytes from 172.16.1.1 icmp_seq=2 ttl=64 time=52.082 ms
84 bytes from 172.16.1.1 icmp_seq=3 ttl=64 time=54.362 ms
84 bytes from 172.16.1.1 icmp_seq=4 ttl=64 time=67.275 ms
84 bytes from 172.16.1.1 icmp_seq=5 ttl=64 time=50.352 ms
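  • This time the local VTEP, Leaf-2, answers the ARP request itself, because the MAC-IP binding for PC1 is already stored in its ARP suppression cache. That entry exists because a host sends a GARP message when it first joins the network, to make sure that the IP address assigned to it is unique.
  • When that first GARP arrives, neither the ARP table nor the ARP suppression cache holds an entry for the requested IP-MAC binding, so the message is flooded to the other VTEP leaf switches. Once these tables have been updated, later ARP requests between the hosts no longer need to be flooded.
  • The debug output below shows Leaf-2 sending the ARP reply: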
Leaf-2# 2020 Oct 13 21:02:00.100412 arp: arp_process_receive_packet_msg: VINCI: Anycast Proxy mode  
2020 Oct 13 21:02:00.100797 arp: arp_cache_resolve_l3_addr: arp_cache_resolve_l3_addr 
2020 Oct 13 21:02:00.101111 arp: arp_cache_resolve_l3_addr: mac: 0050.7966.6806, phy-ifindex:0x0, is_local:FALSE 
2020 Oct 13 21:02:00.101405 arp: arp_process_packet_in_l3_mode: ARP request: iod: 139, Vlan: 10, Dest-ip: 172.16.1.1, Mac-Addr: 0050.7966.6806, ifindex: 0x0, is_local: FALSE 
2020 Oct 13 21:02:00.101802 arp: arp_send_response_internal: ARP response from 172.16.1.1 to 172.16.1.3 on Vlan10, phy iod Ethernet1/4, vlan 10, svi_flag: 1 
2020 Oct 13 21:02:00.101867 arp: arp_send_response_internal: arp_send_response_internal: VINCI: is_flood: 0, iod: 139 phyiod: 10 
2020 Oct 13 21:02:00.101953 arp: arp_send_packet: Packet for 0050.7966.6808/172.16.1.3, iod 139(Vlan10), phy_iod 10(Ethernet1/4), phy_is_mct 0, flood_bd 0, flood port 1, skip_unnumbered_flood 0 
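
The reply-or-flood decision described in this section can be sketched as a small lookup table. This is only an illustrative model, not the NX-OS implementation: the class and method names (`ArpSuppressionCache`, `learn`, `handle_request`) are hypothetical, and the cache is assumed to be keyed by VLAN and IP address. The addresses are the ones shown in the outputs above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SuppressionEntry:
    mac: str
    is_local: bool                     # L flag (local) versus R flag (remote)
    remote_vtep: Optional[str] = None  # set only for remote entries


class ArpSuppressionCache:
    """Toy model of an ARP suppression cache (hypothetical, not NX-OS code)."""

    def __init__(self) -> None:
        # Assumption: entries are keyed by (vlan, ip).
        self._entries: Dict[Tuple[int, str], SuppressionEntry] = {}

    def learn(self, vlan: int, ip: str, mac: str,
              remote_vtep: Optional[str] = None) -> None:
        """Install a binding; it is remote when a VTEP address is given."""
        self._entries[(vlan, ip)] = SuppressionEntry(
            mac=mac, is_local=remote_vtep is None, remote_vtep=remote_vtep)

    def handle_request(self, vlan: int, target_ip: str) -> Tuple[str, Optional[str]]:
        """Return ("reply", mac) on a cache hit, otherwise ("flood", None)."""
        entry = self._entries.get((vlan, target_ip))
        if entry is None:
            return ("flood", None)
        return ("reply", entry.mac)


cache = ArpSuppressionCache()
# Before the binding is known, the request has to be flooded.
print(cache.handle_request(10, "172.16.1.1"))  # ('flood', None)
# After the remote binding for PC1 is installed, the leaf replies locally.
cache.learn(10, "172.16.1.1", "0050.7966.6806", remote_vtep="3.3.3.3")
print(cache.handle_request(10, "172.16.1.1"))  # ('reply', '0050.7966.6806')
```

The second call corresponds to the Leaf-2 behaviour in the debug output: the request from PC3 is answered from the cache with `is_local: FALSE`, and nothing is flooded.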

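4.3 Communication Between Hosts in the Same VRF but Different VNIs

  • Communication between hosts in the same VNI was covered in the previous article and is not repeated here.
  • This section uses a ping from PC1 (172.16.1.1) to PC2 (172.16.2.2) as the example.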

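4.3.1 Intra-VNI Switching on Leaf-1

  • Because the destination IP address is in a different subnet, PC1 sends the ICMP request to its default gateway, Leaf-1, using the Anycast Gateway MAC (AGM) 1234.1234.1234 as the destination MAC address. See the figure below.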


    image.png
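
The host-side decision in 4.3.1 (on-link destination versus default gateway) can be sketched as follows. The /24 prefix length is an assumption, since the subnet mask is not stated here, and `next_hop_mac_target` is a hypothetical helper name. Only the AGM value and the IP addresses come from the text.

```python
import ipaddress

ANYCAST_GATEWAY_MAC = "1234.1234.1234"  # AGM from the text


def next_hop_mac_target(src_interface: str, dst_ip: str) -> str:
    """Say which MAC address the host will use as the destination MAC.

    src_interface is the host address in CIDR form, e.g. "172.16.1.1/24"
    (the /24 is an assumption made for this sketch).
    """
    iface = ipaddress.ip_interface(src_interface)
    if ipaddress.ip_address(dst_ip) in iface.network:
        # Same subnet: the host resolves the destination's own MAC via ARP.
        return "destination host MAC (resolved via ARP)"
    # Different subnet: the frame is addressed to the default gateway.
    return ANYCAST_GATEWAY_MAC


print(next_hop_mac_target("172.16.1.1/24", "172.16.1.3"))
print(next_hop_mac_target("172.16.1.1/24", "172.16.2.2"))  # 1234.1234.1234
```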

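4.3.2 Routing the Packet from L2VNI 10010 to L3VNI 13960 on Leaf-1

  • The local VTEP switch Leaf-1 receives the frame. The route to the destination IP address 172.16.2.2 (host PC2) was learned through BGP and installed in the RIB with the next-hop IP address 4.4.4.4 (Leaf-2), together with the information needed for data-plane encapsulation, such as the L3VNI and the encapsulation type.
  • Leaf-1 performs a recursive route lookup for the next-hop address, encapsulates the original packet with a VxLAN header carrying VNI 13960, and routes it to Leaf-2 through Spine-1 or Spine-2 (the outer destination MAC address belongs to the spine used as the underlay next hop).
  • Because VxLAN is a MAC-in-UDP encapsulation, the inner frame must carry a source and a destination MAC address. The inner source MAC address is taken from the SVI used for inter-VNI routing (SVI VLAN 3960). The inner destination MAC address is the router MAC (RMAC) of the remote VTEP, received as a BGP extended community in the BGP update.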
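
To make the "VxLAN header carrying VNI 13960" step concrete, here is a minimal sketch of the 8-byte VxLAN header layout defined in RFC 7348: an 8-bit flags field with the I bit (0x08) set, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. The function names are hypothetical, and the sketch covers only the VxLAN header itself, not the outer Ethernet, IP and UDP headers.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid (RFC 7348)


def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VxLAN header for the given 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits).
    # Word 2: VNI (24 bits) + reserved (8 bits).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VxLAN header."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set, VNI is not valid")
    return word2 >> 8


header = build_vxlan_header(13960)   # the L3VNI used between Leaf-1 and Leaf-2
print(header.hex())                  # 0800000000368800
print(parse_vni(header))             # 13960
```

In a packet capture, the VNI therefore appears as the three bytes `00 36 88` (0x3688 = 13960) starting at offset 4 of the VxLAN header.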

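4.3.3 Routing the Packet from L3VNI 13960 to L2VNI 10020 on Leaf-2

  • When the VTEP switch Leaf-2 receives the VxLAN-encapsulated packet, it removes the VxLAN header. Because VNI 13960 is associated with VRF ta, the routing decision is made against the RIB of VRF ta.

  • Leaf-2 routes the original ICMP request into VLAN 20 and forwards it out of interface E1/3.

  • The process above is the symmetric Integrated Routing and Bridging (IRB) model. The packet is first switched to the local VTEP, then routed across the VxLAN fabric using the common L3VNI in the VxLAN header. The receiving VTEP switch removes the VxLAN encapsulation and makes a routing decision based on the destination IP address of the original IP packet. After that routing decision, the packet is forwarded to its destination (bridge-route-route-bridge). Return traffic follows the same model.

  • Symmetric IRB gives more design flexibility because, unlike asymmetric IRB, it does not require every VNI to be configured on every VTEP switch. Asymmetric IRB follows a "bridge-route-bridge" model in which no common L3VNI is used for inter-VNI routing. For example, if asymmetric IRB were used in this VxLAN fabric, host PC1 would still send the packet to its default gateway (the "bridge" part), just as with symmetric IRB. The local VTEP switch Leaf-1 makes the routing decision, but instead of the common L3VNI it places VNI 10020, the VNI associated with VLAN 20, in the VxLAN header (the "route" part). The receiving VTEP switch Leaf-2 removes the VxLAN header and, based on VNI 10020, forwards the packet into VLAN 20, where it finally reaches host PC2.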

  • Ping from PC1 to PC2 while capturing packets between the spine and the leaf. The capture is shown below:


    image.png
  • The above shows how a host's IP address is propagated across the VxLAN fabric and installed in the L3RIB.
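
The difference between the two IRB models in 4.3.3 comes down to which VNI the ingress VTEP writes into the VxLAN header. The sketch below assumes the VLAN-to-VNI mapping used in this lab (VLAN 10 to VNI 10010, VLAN 20 to VNI 10020, L3VNI 13960). The function name `vni_on_wire` is hypothetical and the sketch ignores everything except VNI selection.

```python
L3VNI = 13960                           # common L3VNI of VRF ta
VLAN_TO_L2VNI = {10: 10010, 20: 10020}  # assumed lab mapping


def vni_on_wire(model: str, src_vlan: int, dst_vlan: int) -> int:
    """Return the VNI the ingress VTEP places in the VxLAN header."""
    if src_vlan == dst_vlan:
        # Intra-VNI traffic is bridged in the L2VNI in both models.
        return VLAN_TO_L2VNI[src_vlan]
    if model == "symmetric":
        # bridge-route-route-bridge: routed traffic uses the shared L3VNI.
        return L3VNI
    if model == "asymmetric":
        # bridge-route-bridge: the ingress VTEP routes straight into the
        # destination L2VNI, so it must have that VNI configured locally.
        return VLAN_TO_L2VNI[dst_vlan]
    raise ValueError(f"unknown IRB model: {model}")


print(vni_on_wire("symmetric", 10, 20))   # 13960
print(vni_on_wire("asymmetric", 10, 20))  # 10020
```

The lookup `VLAN_TO_L2VNI[dst_vlan]` in the asymmetric branch is the reason every VNI has to exist on every VTEP in that model, which is the flexibility argument made above for symmetric IRB.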

V. Summary

image.png

VI. References

With thanks to Toni Pasanen:
https://nwktimes.blogspot.com/2018/05/vxlan-part-vii-vxlan-bgp-evpn-control.html
