Both of these solutions are aimed at solving a few problems:
- Virtualization requires Top of Rack switches to handle much larger MAC address tables
- The limit of 4094 VLANs is not enough for a multi-tenant environment
- The use of STP is inefficient and reduces the number of usable links
- Virtualized workloads need to be portable to efficiently use all resources: compute, storage, and network
Now that a single physical server can hold tens or possibly a hundred VMs, the number of MAC addresses coming from a single physical box can be enormous. Top of Rack switches were never meant to hold that many MAC addresses in memory, and as a result they have to expire older MAC address entries too quickly. The switch then has to query for MAC address locations far more often than it should, leading to network congestion and wasted processing.
Moving a VM within a datacenter is mostly trivial and commonplace at this point, but moving a VM to a different datacenter or the cloud usually requires a bit more forethought. The subnet can't be stretched to the other location, so the VM needs to be reconfigured for the subnet it encounters on the other side. In a multi-tenant architecture, the problem becomes multiple tenants wanting to use the same subnet ranges. VLANs can ameliorate this to a certain degree; however, the limit of 4094 VLANs comes up surprisingly fast.
The inefficiency of STP is not really a new issue, and technologies like TRILL have been proposed as alternatives. VXLAN and NVGRE provide another way of improving efficiency.
So now that we know the problems and the offerings from Microsoft and VMware, the question becomes how do they work and how are they different? The answer is that they are almost identical - almost being the watchword in this case.
Each technology takes the packets from the VM and wraps them in additional packet layers that carry information about which virtual network the VM is on; it's essentially a tunneling method. VMware calls the virtual network tag the VXLAN Network Identifier (VNI) and Microsoft calls it the Tenant Network ID (TNI). The encapsulated packet is then routed across a traditional layer 2 or layer 3 infrastructure to an endpoint, called either a VXLAN Tunnel End Point (VTEP) or an NVGRE Endpoint. The endpoint reads the network ID, peels off the exterior layers, and delivers the original payload to the destination VM.
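To make the flow concrete, here is a minimal Python sketch of what a tunnel endpoint conceptually does. The function names and the simplified byte layout are my own illustration, not code from either vendor; real endpoints do this inside the hypervisor's virtual switch.

```python
# Illustrative sketch of encap/decap at a tunnel endpoint. The simplified
# "outer headers + 3-byte network ID" layout is an assumption for clarity,
# not the actual VXLAN or NVGRE wire format (shown later in the table).

def encapsulate(inner_frame: bytes, network_id: int, outer_headers: bytes) -> bytes:
    """Wrap the VM's original frame in outer headers plus the virtual
    network ID (VNI for VXLAN, TNI for NVGRE)."""
    net_id_field = network_id.to_bytes(3, "big")  # 24-bit network identifier
    return outer_headers + net_id_field + inner_frame

def decapsulate(packet: bytes, outer_len: int) -> tuple[int, bytes]:
    """At the far endpoint, peel off the outer layers, read the network
    ID, and recover the original frame for delivery to the VM."""
    net_id = int.from_bytes(packet[outer_len:outer_len + 3], "big")
    inner_frame = packet[outer_len + 3:]
    return net_id, inner_frame
```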
The packet structures for the two protocols differ significantly:
VMware is using UDP as the transport protocol and then adding an additional VXLAN header which contains the VNI information. They also add a frame check sequence at the end for checksum purposes. Microsoft, on the other hand, makes use of the GRE protocol rather than traditional TCP or UDP and has no error checking on the outer packet.
| Bits | VXLAN | NVGRE |
|---|---|---|
| 160 | Outer Ethernet | Outer Ethernet |
| 160 | Outer IP | Outer IP |
| 64 | UDP Header | GRE Header |
| 64 | VXLAN Header | None |
| 160 | Inner Ethernet | Inner Ethernet |
| Variable | Inner Payload | Inner Payload |
| 32 | Frame Check Sequence | None |
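As a sketch of those two 64-bit rows, here is how each header could be packed in Python, going by my reading of the drafts: VXLAN carries a flags byte (the I bit marks a valid VNI), reserved bits, and a 24-bit VNI; NVGRE sets the GRE Key bit, uses protocol type 0x6558 (Transparent Ethernet Bridging), and puts the 24-bit network ID plus an 8-bit FlowID in the key field. The helper names are mine.

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte (64-bit) VXLAN header: I flag, reserved bits,
    24-bit VNI, and a final reserved byte."""
    assert 0 <= vni < 2**24, "VNI is a 24-bit field"
    flags = 0x08000000              # I flag set; remaining bits reserved
    return struct.pack("!II", flags, vni << 8)

def nvgre_gre_header(tni: int, flow_id: int = 0) -> bytes:
    """Pack the 8-byte GRE header as NVGRE uses it: Key bit set, no
    checksum (matching the 'no error checking' point above), and the
    key field carrying the 24-bit network ID plus an 8-bit FlowID."""
    assert 0 <= tni < 2**24 and 0 <= flow_id < 2**8
    flags_and_version = 0x2000      # only the K (Key Present) bit set
    protocol_type = 0x6558          # Transparent Ethernet Bridging
    return struct.pack("!HHI", flags_and_version, protocol_type,
                       (tni << 8) | flow_id)
```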
Since VMware decided to use UDP as the transport protocol, any layer 4 aware device will have no trouble interpreting the packet. GRE, on the other hand, lives between layers 3 and 4; the packet may encounter a network device that does not know how to interpret the GRE header, leading to inconsistent performance or possibly dropped packets. That being said, most modern switches and routers support GRE, so it's not much of a consideration. The larger issue to me is the expected packet size. VXLAN packets are going to be 96 bits larger than NVGRE packets, and they are subject to UDP checksum processing. If you don't have jumbo frames enabled and the inner payload is relatively sizable, the encapsulated packet can exceed the physical MTU and end up fragmented or dropped. The bigger size and additional processing make me wonder if VXLAN will be inherently less efficient.
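Some back-of-the-envelope arithmetic shows why the jumbo frame question matters. The byte counts follow the table above; the arithmetic is my own, not a figure from either draft.

```python
# MTU budget for the inner frame on a standard 1500-byte Ethernet MTU
# (no jumbo frames). The 4-byte outer FCS adds to the wire size but does
# not count against the IP MTU, so it is left out of the budget below.

PHYSICAL_MTU = 1500          # bytes available after the outer Ethernet header
VXLAN_OVERHEAD = 20 + 8 + 8  # outer IP + UDP header + VXLAN header
NVGRE_OVERHEAD = 20 + 8      # outer IP + GRE header

for name, overhead in [("VXLAN", VXLAN_OVERHEAD), ("NVGRE", NVGRE_OVERHEAD)]:
    inner_budget = PHYSICAL_MTU - overhead
    print(f"{name}: {overhead} bytes of encap overhead, "
          f"{inner_budget} bytes left for the inner Ethernet frame")
# Any inner frame larger than its budget must be fragmented, or is
# dropped outright if the outer IP header has the DF bit set.
```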
Beyond the packet structure, the two standards are almost identical in terms of practical use. Both use multicast on the physical network to distribute broadcast and multicast messages on the virtual network. This requires some IP multicasting magic on the physical side to associate particular multicast addresses with virtual/client network identifiers (see the sketch below). The RFC from VMware seems to outline a solution for this tracking, while the RFC from Microsoft mostly leaves it up to the control plane to manage. In Windows Server 2012 R2, Microsoft has added the Windows Server Gateway piece, which appears to handle some of the control plane functions missing from the RFC. I would not be at all surprised if other vendors develop similar solutions based on the NVGRE spec. The RFC is co-written by people from Intel, Dell, HP, Arista, Broadcom, and Emulex. That's a lot of big names who would be happy to develop their own solutions. The VXLAN RFC is coauthored by people from Cisco, Arista, Broadcom, Citrix, and Red Hat. Again, no slouches in that department. I was a little surprised that Cisco wasn't on the RFC for NVGRE, since they authored the GRE standard.
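To illustrate the kind of bookkeeping involved, here is a hypothetical sketch of mapping network IDs to physical multicast groups. The derivation scheme and address range are made up for illustration; neither draft prescribes a particular mapping.

```python
# Sketch of the control-plane bookkeeping: each virtual network ID maps
# to a physical multicast group so broadcast/multicast traffic from that
# virtual network reaches every endpoint hosting one of its VMs. This
# mapping scheme is a made-up example, not from either draft.

MULTICAST_BASE = "239.1"  # an administratively scoped range, chosen arbitrarily

def group_for_network(net_id: int) -> str:
    """Derive a multicast group by spreading the low 16 bits of the
    24-bit network ID across the last two octets. Collisions are
    possible past 65536 networks, so a real deployment needs a real
    allocator tracked by the control plane."""
    return f"{MULTICAST_BASE}.{(net_id >> 8) & 0xFF}.{net_id & 0xFF}"

# An endpoint joins the group for each network it hosts, so it receives
# flooded traffic (e.g., ARP broadcasts) only for those networks.
hosted_networks = [5001, 5002]
subscriptions = {vni: group_for_network(vni) for vni in hosted_networks}
print(subscriptions)  # {5001: '239.1.19.137', 5002: '239.1.19.138'}
```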
In the end, both of these standards are viable for SDN applications, and it remains to be seen how they perform under real-life conditions. Based on the RFCs and documentation from VMware and Microsoft, there is no clear winner here, making it less of a factor when picking a network virtualization vendor.
References:
NVGRE RFC: http://tools.ietf.org/html/draft-sridharan-virtualization-nvgre-02
VXLAN RFC: http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-04
Windows Server Gateway: http://technet.microsoft.com/library/dn313101.aspx
via: http://anexinetisg.blogspot.jp/2013/07/vxlan-and-nvgre.html