The Difference Between MPLS MTU and Interface MTU
2007-1-20 11:53:00
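This was a real fault in an ISP's MAN. An OSR7609 and a GSR12416 were running MPLS/VPN between them over a GE circuit. The symptoms were as follows:
The OSR7609 connects downstream to a HUAWEI 8825 that aggregates residential and other broadband access. Users behind it could not reach SINA, VIP.163 and several other sites, while all other sites worked normally. When a laptop was connected directly to the OSR7609, the problem went away. A packet capture later showed that the MTU seen when accessing SINA, VIP.163 and those few other sites was not the standard 1500; changing the MTU on the port cleared the fault.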

Leaving the MTU unchanged can cause the following two kinds of failure:
1. Pings with large packets fail
2. Some sites cannot be reached

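However, once the MTU has been changed, if the IGP is OSPF, mismatched MTUs on the two ends of the link can leave the OSPF adjacency stuck, typically in the EXSTART/EXCHANGE state.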

The configuration at the time was:
interface GigabitEthernet15/3/0
description "connect to OSR-7609"
mtu 1538
ip address
no ip directed-broadcast
ip router isis isp
no negotiation auto
tag-switching ip
isis circuit-type level-2-only
isis metric 30 level-2

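From the analysis we all did afterwards, this situation arises on Ethernet ports, and specifically when the VRFs are built on dot1q with interface VLANs (INT VLAN). I later looked through CCO and found a tag-switching mtu command, which has since been replaced by the mpls mtu command. CCO describes the mpls mtu command as follows: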

The MTU in bytes includes the label stack in the value. For example, to transport an IPv4 packet of 1500 bytes from the edge through an MPLS core, you need an MPLS MTU of at least 1504 bytes. This value accounts for the single 4-byte label and avoids fragmentation. Use the following calculation to determine the MTU:

It then gives the following formula:

MPLS MTU = edge MTU + (label stack * 4 bytes)

From this we can conclude that on an Ethernet interface:

MPLS MTU = actual MTU + (number of labels in the stack × 4 bytes)
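
As a quick check of the formula, here is a minimal Python sketch. The function name `required_mpls_mtu` and the constant `LABEL_BYTES` are illustrative names of my own, not anything from CCO; the only facts used are the 4-byte label size and the formula quoted above.

```python
LABEL_BYTES = 4  # size of one MPLS label stack entry, as stated in the CCO text


def required_mpls_mtu(edge_mtu: int, labels: int) -> int:
    """Return edge MTU + (label stack depth * 4 bytes)."""
    if edge_mtu <= 0:
        raise ValueError("edge_mtu must be positive")
    if labels < 0:
        raise ValueError("labels must not be negative")
    return edge_mtu + labels * LABEL_BYTES


if __name__ == "__main__":
    # One label (plain MPLS) and two labels (MPLS/VPN) on a 1500-byte edge MTU.
    for depth in (1, 2):
        print(depth, "label(s):", required_mpls_mtu(1500, depth))
```

With a 1500-byte edge MTU this gives 1504 for a single label and 1508 for a two-label stack, matching the figures used later in this post.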

The following points need attention when using MPLS MTU:

Setting the MPLS MTU to a high number can lead to packets being dropped on some devices, because the labeled packet is larger than the interface physical MTU.

• ATM interfaces cannot accommodate packets that exceed the Segmentation and Reassembly (SAR) buffer size, because labels are added to the packet. The bytes argument refers to the number of bytes in the packet before the addition of any labels. If each label is 4 bytes, the maximum value of bytes on an ATM interface is the physical MTU minus 4*x bytes, where x is the number of labels expected in the received packet.

• If a labeled IPv4 packet exceeds the MPLS MTU size for the interface, Cisco IOS software fragments the packet. If a labeled non-IPv4 packet exceeds the MPLS MTU size, the packet is dropped.

• All devices on a physical medium must have the same MPLS MTU value in order for MPLS to interoperate.

• The MTU for labeled packets for an interface is determined as follows:

  - If the mpls mtu bytes command has been used to configure an MPLS MTU, the MTU for labeled packets is the bytes value.

  - Otherwise, the MTU for labeled packets is the default MTU for the interface.

• Because labeling a packet makes it larger due to the label stack, you may want the MPLS MTU to be larger than the interface MTU or IP MTU in order to prevent the fragmentation of labeled packets, which would not be fragmented if they were unlabeled.

• Changing the interface MTU value (using the mtu interface configuration command) can affect the MPLS MTU of the interface. If the MPLS MTU value is the same as the interface MTU value (this is the default), and you change the interface MTU value, the MPLS MTU value will automatically be set to this new MTU as well. However, the reverse is not true; changing the MPLS MTU value has no effect on the interface MTU.

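In other words, the MTU for a standard MPLS frame should be 1504, and the MTU for an MPLS/VPN frame should be 1508. In the real network, however, 1508 did not seem to solve the problem well, and in the virtual lab we built the problem described above did not appear at all. Returning to the fault above, one can suppose that it was cleared because raising the interface MTU also changed the MPLS MTU.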
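
The rules quoted from CCO above (which MTU applies to labeled packets, and what happens when a labeled packet exceeds it) can be sketched in a few lines of Python. This is only a model of the quoted text, not of IOS itself: the function names are hypothetical, and the Don't Fragment bit is deliberately ignored because the quoted passage does not mention it.

```python
from typing import Optional


def labeled_packet_mtu(interface_mtu: int, mpls_mtu: Optional[int] = None) -> int:
    """MTU applied to labeled packets.

    If an MPLS MTU has been configured, that value is used;
    otherwise the interface MTU applies (the default).
    """
    return mpls_mtu if mpls_mtu is not None else interface_mtu


def handle_labeled_packet(size_with_labels: int, is_ipv4: bool,
                          interface_mtu: int,
                          mpls_mtu: Optional[int] = None) -> str:
    """Return 'forward', 'fragment' or 'drop' according to the quoted rules."""
    limit = labeled_packet_mtu(interface_mtu, mpls_mtu)
    if size_with_labels <= limit:
        return "forward"
    # Oversized labeled IPv4 is fragmented; oversized non-IPv4 is dropped.
    return "fragment" if is_ipv4 else "drop"


if __name__ == "__main__":
    # A 1500-byte IPv4 packet carrying two labels is 1508 bytes on the wire.
    print(handle_labeled_packet(1508, True, interface_mtu=1500))
    print(handle_labeled_packet(1508, False, interface_mtu=1500))
    print(handle_labeled_packet(1508, True, interface_mtu=1500, mpls_mtu=1508))
```

Note that the third call only says the MPLS MTU check passes; whether the physical interface can actually carry the 1508-byte frame is a separate question, which is exactly the caveat in the first quoted sentence.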

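However, Cisco's documentation does not recommend changing the MPLS MTU. The passage below states explicitly that Cisco recommends an MPLS MTU less than or equal to the interface MTU. In other words, in the fault above, if a packet is small enough that its size plus the label bytes stays within 1500 on the Ethernet port, nothing breaks; if the packet is already 1500 bytes, the problem described above appears.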

If you have configuration files with MPLS MTU values that are larger than the interface MTU values and you upgrade to Cisco IOS Release 12.2(27)SBC, 12.2(28)SB, 12.2(33)SRA, or later, the software does not change the MPLS MTU value. When you reboot the router, the software accepts the values that are set for the MPLS MTU and the interface MTU. However, Cisco recommends you set the MPLS MTU values equal to or lower than the interface MTU values.

The following is what CCO says about MPLS MTU and interface MTU:

Guidelines for Setting MPLS MTU and Interface MTU Values
When configuring the network to use MPLS, set the core-facing interface MTU values greater than the edge-facing interface MTU values, using one of the following methods:

• Set the interface MTU values on the core-facing interfaces to a higher value than the interface MTU values on the customer-facing interfaces to accommodate any packet labels, such as MPLS labels, that an interface might encounter. Make sure that the interface MTUs on the remote end interfaces have the same interface MTU values. The interface MTU values on both ends of the link must match.

• Set the interface MTU values on the customer-facing interfaces to a lower value than the interface MTU on the core-facing interfaces to accommodate any packet labels, such as MPLS labels, that an interface might encounter. When you set the interface MTU on the edge interfaces, ensure that the interface MTUs on the remote end interfaces have the same values. The interface MTU values on both ends of the link must match.

Changing the interface MTU can also modify the IP MTU, Connectionless Network Service (CLNS) MTU, and other MTU values, because they depend on the value of the interface MTU. The Open Shortest Path First (OSPF) routing protocol requires that the IP MTU values match on both ends of the link. Similarly, the Intermediate System-to-Intermediate System (IS-IS) routing protocol requires that the CLNS MTU values match on both ends of the link. If the values on both ends of the link do not match, IS-IS or OSPF cannot complete its initialization.

If the configuration of the adjacent router does not include the mpls mtu and mtu commands, add these commands to the router.
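
The guidelines above amount to two checks per link: the interface MTUs at both ends must match, and (per the Cisco recommendation quoted earlier) any configured MPLS MTU should not exceed the interface MTU. A minimal Python sketch of such a check follows. The `LinkEnd` type and `check_link` function are hypothetical helpers, and the value 1500 for the OSR7609 side is an assumption for illustration only, since the post gives just the GSR side (mtu 1538).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkEnd:
    name: str                       # router/interface label, free text
    interface_mtu: int              # value of the "mtu" interface command
    mpls_mtu: Optional[int] = None  # value of "mpls mtu", if configured


def check_link(a: LinkEnd, b: LinkEnd) -> List[str]:
    """Return a list of human-readable problems found on one link."""
    problems = []
    if a.interface_mtu != b.interface_mtu:
        problems.append(
            f"interface MTU mismatch: {a.name}={a.interface_mtu}, "
            f"{b.name}={b.interface_mtu} (IGP adjacency may not come up)"
        )
    for end in (a, b):
        if end.mpls_mtu is not None and end.mpls_mtu > end.interface_mtu:
            problems.append(
                f"{end.name}: mpls mtu {end.mpls_mtu} exceeds "
                f"interface mtu {end.interface_mtu}"
            )
    return problems


if __name__ == "__main__":
    gsr = LinkEnd("GSR12416 Gi15/3/0", interface_mtu=1538)
    osr = LinkEnd("OSR7609 (assumed)", interface_mtu=1500)
    for line in check_link(gsr, osr):
        print(line)
```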

Combining this passage with the earlier ones, we can draw several conclusions:

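1. The MTU on core-facing router interfaces should be larger than the MTU on customer-facing interfaces, so that every label applied to customer traffic can be accommodated.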

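2. Do not try to solve the problem by changing the MPLS MTU alone. The documentation above makes clear that the MPLS MTU should not be larger than the interface MTU. The best approach is to increase the interface MTU, so that core devices do not run into trouble when they have to carry deeper label stacks.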

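3. Changing the interface MTU changes many other MTU values. To keep the IGP and other protocols running normally, the interface MTU must be identical on both ends of the link.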

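To sum up: can the problem I described be solved by increasing the MPLS MTU? Judging from the material above, that does not look viable. The documentation is rather vague about the relationship between MPLS MTU and interface MTU, and this needs further study. As for the fault I described, I think the cause may be as follows.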

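For a packet whose actual size is below 1500, no fault occurs as long as its size with labels is also within the MTU of the Ethernet port. But if the actual size is 1500 or more, the size with labels exceeds the interface MTU and the fault described above appears. (Setting the MPLS MTU to a high number can lead to packets being dropped on some devices, because the labeled packet is larger than the interface physical MTU.) Changing the interface MTU can change the MPLS MTU, but changing the MPLS MTU has no effect on the interface MTU.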

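Cisco's documentation seems somewhat self-contradictory here. My thinking is not clear at the moment, and the analysis above may well be flawed. Could someone who understands this help me sort it out? I will read everything through again carefully tomorrow to see whether I missed something.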


MPLS MTU Is Too Small in the MPLS VPN Backbone
If large packets are sent across the MPLS backbone with the Don't Fragment (DF) bit set in the IP packet header, and LSR interfaces and Ethernet switches are not configured to support large labeled packets, the packets will be dropped.

NOTE

In this scenario, end hosts will usually quietly reduce the size of the packets they send unless Path MTU Discovery is broken, which usually happens because of a misconfigured router or firewall. Path MTU Discovery is a mechanism that allows hosts to dynamically discover the Maximum Transmission Unit (MTU) of a path across a network. See RFC 1191 as well as the following URL for more information:

http://www.cisco.com/en/US/tech/tk870/tk472/tk473/technologies_tech_note09186a008011a218.shtml#pmtud_fail



In an MPLS network, link MTU sizes must take the label stack into account. In a simple MPLS VPN network without MPLS TE, a label stack depth of two is used (TDP/LDP signaled IGP label + VPN label). If MPLS traffic engineering (TE) is being used between P routers in an MPLS VPN backbone, a label stack depth of three is used (RSVP signaled TE label + TDP/LDP signaled IGP label + VPN label). And if you are using Fast Reroute with MPLS TE, that is four labels. Each label is 4 bytes, so the total size of the label stack is number of labels multiplied by 4 bytes.
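
The label counts in the paragraph above can be tabulated with a small Python sketch. The function names and boolean parameters are illustrative choices of mine; the depths themselves (two, three and four labels) and the 4-byte label size come straight from the text.

```python
LABEL_BYTES = 4  # each label is 4 bytes


def label_stack_depth(vpn: bool = True, te: bool = False, frr: bool = False) -> int:
    """Label stack depth for the scenarios described in the text.

    One IGP label (TDP/LDP) is always assumed; a VPN label, a TE label
    and a Fast Reroute label are added when those features are in use.
    """
    if frr and not te:
        raise ValueError("Fast Reroute is described here only together with MPLS TE")
    return 1 + int(vpn) + int(te) + int(frr)


def label_overhead(depth: int) -> int:
    """Total label stack size in bytes."""
    return depth * LABEL_BYTES


if __name__ == "__main__":
    for name, kwargs in [
        ("MPLS VPN", {}),
        ("MPLS VPN + TE", {"te": True}),
        ("MPLS VPN + TE + FRR", {"te": True, "frr": True}),
    ]:
        depth = label_stack_depth(**kwargs)
        print(f"{name}: {depth} labels, {label_overhead(depth)} bytes")
```

For a 1500-byte customer packet, these overheads mean 1508, 1512 and 1516 bytes respectively on the backbone links.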

In this scenario, large packets are being dropped in the MPLS VPN backbone. Figure 6-41 illustrates the customer VPN and MPLS backbone topology used in this scenario.


Figure 6-41. Customer and MPLS VPN Backbone Topology


Path MTU across the MPLS VPN backbone is verified using the extended ping vrf vrf_name command, as shown in Example 6-140.

Example 6-140. Extended ping vrf Command Output
HongKong_PE#ping vrf mjlnet_VPN
Protocol [ip]:
Target IP address: 172.16.4.1
Repeat count [5]: 1
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface: 172.16.8.1
Type of service [0]:
Set DF bit in IP header? [no]: y
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]: y
Sweep min size [36]: 1450
Sweep max size [18024]: 1500
Sweep interval [1]:
Type escape sequence to abort.
Sending 51, [1450..1500]-byte ICMP Echos to 172.16.4.1, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!M.M.
Success rate is 92 percent (47/51), round-trip min/avg/max = 12/13/16 ms
HongKong_PE#



Highlighted lines 1 and 3 show the destination and source IP addresses used with the extended ping. In this case, the source is the VRF mjlnet_VPN interface on HongKong_PE (172.16.8.1), and the destination is VRF mjlnet_VPN interface on Chengdu_PE (172.16.4.1).

Repeat count is set to 1 packet in highlighted line 2. This is the repeat count per packet size, which is set in highlighted lines 5 to 7.

In highlighted line 4, the Don't Fragment (DF) bit is set. In highlighted lines 5 to 7, a ping sweep of packet sizes 1450 to 1500 is entered. Highlighted line 8 shows that ping is successful for most packet sizes, but that as the packet size nears 1500 bytes, the pings fail.

Note that the "M" character here indicates reception of an ICMP destination unreachable message (ICMP message type 3) from a router in the path across the network. This ICMP unreachable message carries code 4, which indicates that fragmentation is required on the (ping) packet, but that the Don't Fragment bit is set.
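
To read the sweep result more precisely, each result character can be mapped back to the packet size it belongs to. The sketch below assumes one result character per size, in ascending order (which is what a repeat count of 1 gives), and rebuilds the result string from the counts in Example 6-140 (47 successes followed by `M.M.`). The helper names are hypothetical; `!` is a reply, `M` is the "fragmentation needed" unreachable described above, and `.` is a timeout.

```python
from typing import List, Optional, Tuple


def map_sweep(result: str, sweep_min: int, interval: int = 1) -> List[Tuple[int, str]]:
    """Pair every result character with the datagram size it was sent with."""
    return [(sweep_min + i * interval, ch) for i, ch in enumerate(result)]


def largest_passing_size(result: str, sweep_min: int, interval: int = 1) -> Optional[int]:
    """Largest datagram size that received a reply ('!'), or None."""
    passed = [size for size, ch in map_sweep(result, sweep_min, interval) if ch == "!"]
    return max(passed) if passed else None


if __name__ == "__main__":
    # 47 replies, then M . M . for the last four sizes (51 probes in total).
    result = "!" * 47 + "M.M."
    print("largest passing size:", largest_passing_size(result, 1450))
    failed = [size for size, ch in map_sweep(result, 1450) if ch != "!"]
    print("failed sizes:", failed)
```

Under those assumptions the largest size that passes is 1496 bytes and the sizes 1497 to 1500 fail. A limit of 1496 is what one would expect if a single 4-byte label had to fit within a 1500-byte MTU somewhere on the path.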

The MPLS MTU size for backbone LSRs is examined using the show mpls forwarding-table prefix detail command.

When the MPLS MTU size is examined on Chengdu_P, it is revealed that it is too small (see Example 6-141).

Example 6-141. Verifying the MPLS MTU Size Using the show mpls forwarding-table Command
Chengdu_P#show mpls forwarding-table 10.1.1.1 detail
Local  Outgoing    Prefix            Bytes tag  Outgoing   Next Hop
tag    tag or VC   or Tunnel Id      switched   interface
18     Pop tag     10.1.1.1/32       1544       Fa1/0      10.20.10.1
        MAC/Encaps=14/14, MTU=1500, Tag Stack{}
        00049BD60C1C00D06354701C8847
        No output feature configured
    Per-packet load-sharing
Chengdu_P#



The IP address (BGP update source) of the egress PE router (Chengdu_PE) is specified in highlighted line 1. This address corresponds to the next-hop of all mjlnet_VPN site 1 routes.

Highlighted line 2 shows that the outgoing interface for this prefix is interface Fast Ethernet 1/0.

Highlighted line 3 shows that the maximum packet size that can be label switched out of interface Fast Ethernet 1/0 without being fragmented is 1500 bytes. 1500 bytes is clearly not a sufficient maximum packet size if a two-label stack (IGP + VPN) is included (1500 + 8 = 1508). Note, however, that in this case, Chengdu_P is the penultimate hop router, so it will pop the IGP label, but it is still a very good idea to accommodate a minimum of two labels here.
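
The two detail lines of Example 6-141 can also be picked apart programmatically. The following Python sketch (hypothetical helper names, with the two strings copied from the example output) extracts the MTU, decodes the 14-byte Ethernet encapsulation, and works out the largest IP packet that fits for a given label count. EtherType 0x8847 is the standard value for MPLS unicast.

```python
import re
from typing import Tuple

DETAIL_LINE = "MAC/Encaps=14/14, MTU=1500, Tag Stack{}"
ENCAPS_HEX = "00049BD60C1C00D06354701C8847"
LABEL_BYTES = 4


def parse_mtu(detail_line: str) -> int:
    """Extract the MTU value from the detail line of the forwarding entry."""
    match = re.search(r"MTU=(\d+)", detail_line)
    if match is None:
        raise ValueError("no MTU field found")
    return int(match.group(1))


def decode_ethernet_header(hex_string: str) -> Tuple[str, str, int]:
    """Split a 14-byte Ethernet header into (dst MAC, src MAC, EtherType)."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 14:
        raise ValueError("expected a 14-byte Ethernet header")
    return raw[0:6].hex(), raw[6:12].hex(), int.from_bytes(raw[12:14], "big")


def max_ip_packet(mtu: int, labels: int) -> int:
    """Largest IP packet that still fits once the labels are added."""
    return mtu - labels * LABEL_BYTES


if __name__ == "__main__":
    mtu = parse_mtu(DETAIL_LINE)
    dst, src, ethertype = decode_ethernet_header(ENCAPS_HEX)
    print("MTU:", mtu)
    print("dst:", dst, "src:", src, "ethertype:", hex(ethertype))
    for depth in (1, 2):
        print(depth, "label(s): max IP packet", max_ip_packet(mtu, depth))
```

With MTU=1500 the largest IP packet is 1496 bytes with one label and 1492 bytes with two, which is why a full 1500-byte packet with DF set cannot cross this hop until the MTU is raised.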

Chengdu_P's interface Fast Ethernet 1/0 is then configured to support large labeled packets, using the mpls mtu command as shown in Example 6-142.

Example 6-142. Configuration of the mpls mtu Command on Interface fastethernet 1/0 on Chengdu_P
Chengdu_P#conf t
Enter configuration commands, one per line. End with CNTL/Z.
Chengdu_P(config)#interface fastethernet 1/0
Chengdu_P(config-if)#mpls mtu 1508
Chengdu_P(config-if)#end
Chengdu_P#



The highlighted line indicates that interface fastethernet 1/0 is configured to support a label stack depth of two (1500 + [2 * 4]=1508). In this scenario, Cisco 6500 switches are being used in the POPs (in Chengdu and HongKong), so they are configured for jumbo frame support.

To enable support for jumbo frames on the Cisco 6500 series switch Ethernet ports, use the set port jumbo mod/port enable command, as shown in Example 6-143.

Example 6-143. Configuration of Jumbo Frame Support on the Cisco 6500
Chengdu_POP1> (enable) set port jumbo 3/1 enable
Jumbo frames enabled on port 3/1.



By enabling support for jumbo frames, the MTU is increased to 9216 bytes for most line cards.

After the MPLS MTU on all the applicable LSRs is reconfigured and jumbo frame support on the Cisco 6500 switches is enabled, extended ping is again used to verify that 1500-byte packets can be carried across the backbone without fragmentation.

Example 6-144 shows the output of the extended ping vrf vrf_name command after support for large labeled packets has been enabled in the MPLS VPN backbone.

Example 6-144. 1500-Byte Packets Can Now Be Carried Across the MPLS VPN Backbone
HongKong_PE#ping vrf mjlnet_VPN
Protocol [ip]:
Target IP address: 172.16.4.1
Repeat count [5]:
Datagram size [100]: 1500
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface: 172.16.8.1
Type of service [0]:
Set DF bit in IP header? [no]: y
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to 172.16.4.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/14/16 ms
HongKong_PE#



Highlighted lines 1 and 3 show the destination and source addresses of the ping packets. These are again the VRF mjlnet_VPN interface on Chengdu_PE and the VRF mjlnet_VPN interface on HongKong_PE, respectively.

In highlighted line 2, the packet size is 1500 bytes, and in highlighted line 4, the Don't Fragment (DF) bit is set.

In highlighted line 5, a success rate of 100 percent is shown.

It is also worth noting that if you are using IOS 12.0(27)S or above in your network, you can use the trace mpls command (part of the MPLS Embedded Management feature) to verify the MTU that can be supported (without fragmentation) over an LSP in the MPLS backbone. This command can display the maximum receive unit (MRU, the maximum labelled packet size) at each hop across the MPLS backbone.