By Mark Seminatore, Development Manager
XNA Developer Connection (XDC)
Many common networking techniques that apply to game development for Windows also apply equally well when creating multiplayer games for Xbox LIVE and Games for Windows - LIVE. However, when it comes to networking, LIVE provides both advantages and unique challenges. This white paper summarizes best practices for networking to consider when creating LIVE games.
Many Windows-based multiplayer games follow a client-server model, where a computer running Windows acts as the game client, and a dedicated game server authoritatively manages the game simulation and communicates state updates to each client. Although custom game servers are supported for LIVE games, a number of factors make a client-to-client model appealing. These factors include the following:
Engineering, certifying, and supporting custom game servers in Microsoft-approved data centers is complex and costly. A client-to-client topology avoids significant development issues and ongoing support costs.
Peer-to-peer games must deal with Network Address Translator (NAT) traversal, a technically difficult problem. The LIVE architecture and the Secure Network Library (SNL) automatically handle NAT traversal. This means that most LIVE clients can communicate with other LIVE clients, even if one or both consoles are behind NATs.
The LIVE Matchmaking system can help ensure that players join games with other players who are logically close to them on the network. This implies better connections than could be obtained with servers in a remote data center.
The built-in security of the LIVE network layer means that games don't need to rely on an external server-based authority.
Note that the client-to-client model does not imply that you can't have a client-server style topology. For example, one LIVE client could act as the server—either as a dedicated server, or a server that also functions as a client.
All network packets require some level of overhead, typically in the form of packet headers, which allow routers and receivers to interpret data. Under Windows, the minimum overhead for a UDP packet is 28 bytes. However, the secure network protocol has a minimum of 44 bytes of overhead per UDP packet. This means that if a LIVE title calls UDP send with a single byte of data, 98 percent (44 out of 45 bytes sent) of the network bandwidth is devoted to packet overhead, while only 2 percent is devoted to actual game traffic. You can minimize this penalty by sending larger payloads less frequently. One way to do this is to coalesce smaller sends into a larger packet before calling send. Don't send multiple small payloads if one larger payload can work just as well.
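The coalescing advice above can be sketched as follows. This is a minimal illustration, not a LIVE API: `PacketBuilder`, its one-byte type and length tags, and the two `wireBytes` helpers are hypothetical names introduced here, and the byte counts ignore encryption padding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: collects small tagged messages into one payload
// so that the game can pass them to a single send call.
class PacketBuilder {
public:
    explicit PacketBuilder(std::size_t maxPayload) : maxPayload_(maxPayload) {}

    // Appends [type:1][length:1][data]. Returns false if the message does
    // not fit; the caller should flush, send, and then retry.
    bool add(std::uint8_t type, const void* data, std::uint8_t length) {
        if (buffer_.size() + 2u + length > maxPayload_) return false;
        buffer_.push_back(type);
        buffer_.push_back(length);
        const std::uint8_t* bytes = static_cast<const std::uint8_t*>(data);
        buffer_.insert(buffer_.end(), bytes, bytes + length);
        return true;
    }

    // Returns the coalesced payload and resets the builder.
    std::vector<std::uint8_t> flush() {
        std::vector<std::uint8_t> out;
        out.swap(buffer_);
        return out;
    }

    std::size_t size() const { return buffer_.size(); }

private:
    std::size_t maxPayload_;
    std::vector<std::uint8_t> buffer_;
};

// Bytes on the wire for n separate sends, using the 44-byte minimum
// per-packet overhead quoted above.
inline std::size_t wireBytesSeparate(std::size_t n, std::size_t payloadEach) {
    return n * (44u + payloadEach);
}

// Bytes on the wire for the same n messages in one packet; each message
// carries the 2-byte tag that PacketBuilder adds.
inline std::size_t wireBytesCoalesced(std::size_t n, std::size_t payloadEach) {
    return 44u + n * (payloadEach + 2u);
}
```

Under these assumptions, four 8-byte messages cost 208 bytes when sent separately and 84 bytes when coalesced.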
Developing for broadband has fantastic advantages compared to narrowband, but broadband does have its limits. Speeds quoted by broadband providers are theoretical maximums. Don't assume that 256 Kbps upstream bandwidth is guaranteed. Real available bandwidth is dependent upon a number of factors; actual performance between any two LIVE clients will be slower than what broadband providers advertise.
To ensure that your game can be played across all broadband connections, avoid sending too much data. Be aware of how many packets are sent over a given time interval. Also be aware of how much game data is carried by a typical packet.
Ensure that the information that is sent is absolutely necessary. Examine game payloads, and then evaluate whether these payloads really need to be sent, or if the receiver can calculate, extrapolate, interpolate, or predict the data. For example, don't send data for objects that are out of view, occluded, inactive, recently destroyed, about to be removed, or that otherwise don't affect the client in the current or near future game context. Don't assume that all game objects are important to all clients.
Prioritize your game data based on context. For example, objects that are far away from the viewpoint may not require orientation or animation state information to be sent. Don't assume that all game object attributes must be up to date all the time.
If you determine that data does need to be sent, compress the data in any way you can. Examples of different ways to compress data include the following:
If you're sending C/C++ structures, be sure these structures are single-byte aligned. (By default, structures are 8-byte aligned for speed, which introduces unused packing bytes into the structure.) Use the directive #pragma pack to pack structures selectively. The following code provides an example:
#pragma pack(push)   // save current alignment setting
#pragma pack(1)      // specify single-byte alignment
struct NetMsg
{
    BYTE  msgType;
    DWORD dwData;
};                   // sizeof(NetMsg) is now 5 instead of 8
#pragma pack(pop)    // return to original alignment setting
Prioritize data based on connection speed. On slow/small connections, send critical data only. Send additional data on fast/large connections.
Prioritize data based on gameplay relevance. Send data for distant objects less frequently than data for near objects. Send data for unimportant objects less often than data for important objects.
Define levels of detail for data. For distant objects, only update critical state variables. Position may be more important than orientation or animation state.
Consider other forms of data compression, especially if you're sending large chunks of data.
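As one example of the "other forms of data compression" mentioned above, a value with a known range can be quantized to fewer bits before it is sent. Quantization is a technique chosen here for illustration, not one the text above names, and the function names and the 16-bit width are assumptions.

```cpp
#include <cmath>
#include <cstdint>

// Maps a value in [lo, hi] to a 16-bit integer. Out-of-range input is clamped.
// A 4-byte float becomes 2 bytes on the wire, at the cost of some precision.
inline std::uint16_t quantize16(float value, float lo, float hi) {
    if (value < lo) value = lo;
    if (value > hi) value = hi;
    const float t = (value - lo) / (hi - lo);  // normalized to 0..1
    return static_cast<std::uint16_t>(std::lround(t * 65535.0f));
}

// Reverses quantize16 on the receiving side.
inline float dequantize16(std::uint16_t q, float lo, float hi) {
    return lo + (static_cast<float>(q) / 65535.0f) * (hi - lo);
}
```

For a coordinate in the range 0 to 1000, the step size is about 0.015 units, which may be acceptable for distant objects but too coarse for others; the range and bit width are trade-offs to tune per attribute.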
Many online players choose the least expensive broadband package that is available. Your game should be playable on these connections. Therefore, avoid sending packets whenever possible. Don't tie the packet send rate to the frame rate of your game. For example, if your game runs at 60 frames per second, it does not necessarily mean that it should send packets 60 times per second.
Fast action client-server games on Windows-based computers typically receive packets from game servers at a rate of 10 to 20 packets per second. Clients typically send packets to the server at a lower rate. By queuing data and sending it at regular intervals, rather than sporadically, you reduce bandwidth. Keep in mind that not all games require packets to be sent on a regular basis. If you can get away with sending only three packets during the first second, ten packets during the next second, and one packet during the third second, then by all means do so.
Preference: Limit sending rates to 10-20 packets per second.
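One way to decouple the send rate from the frame rate is a small time accumulator that the game loop ticks every frame. The sketch below is illustrative only; `SendScheduler` is a hypothetical class, and it uses integer microseconds to avoid floating-point drift.

```cpp
#include <cstdint>

// Hypothetical fixed-rate send scheduler driven by frame time.
class SendScheduler {
public:
    explicit SendScheduler(unsigned sendsPerSecond)
        : intervalUs_(1000000 / sendsPerSecond), accumulatedUs_(0) {}

    // Call once per frame with the frame duration in microseconds.
    // Returns true on frames where a packet should be sent.
    bool tick(std::int64_t frameUs) {
        accumulatedUs_ += frameUs;
        if (accumulatedUs_ < intervalUs_) return false;
        accumulatedUs_ -= intervalUs_;
        // After a long stall, discard the backlog instead of sending a burst.
        if (accumulatedUs_ >= intervalUs_) accumulatedUs_ = 0;
        return true;
    }

private:
    std::int64_t intervalUs_;
    std::int64_t accumulatedUs_;
};
```

With a 15-per-second scheduler ticked by 60 frames of 16,667 microseconds each, `tick` returns true on 15 of the 60 frames.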
Regrettably, the speed of light is not fast enough to make multiplayer game development easy. A game player in Los Angeles with a theoretically ideal connection to a player in New York always has at least a 28-millisecond roundtrip latency. The same player has a 58-millisecond latency with a player in Tokyo or London. (Light travels at 186.3 miles per millisecond in a complete vacuum. Roundtrip latency induced by the speed of light in milliseconds equals miles × 2 / 186.3.) Additional latency is contributed by the following factors:
Fiber optic cable and copper wiring. The speed of light in fiber and copper is as low as 60 percent of the speed of light in a vacuum.
Modems, routers, and network stacks. DSL and cable modems have latencies in the neighborhood of 10 milliseconds. Routers have varying latencies, ranging from 1-5 milliseconds. However, a congested router that has a massive queue can exhibit latencies up to 50 milliseconds. The more routers that a packet traverses, the more latency is induced.
Games themselves. Games that limit packet frequency (a preferred practice) introduce latency. If packets are sent and received ten times per second, then an average additional roundtrip latency of 100 milliseconds, and a maximum roundtrip latency of 200 milliseconds, is incurred. Frame rate also introduces latency. Even if a game that displays 60 frames per second receives a new object position, the game doesn't display the new position until the next frame, which can be up to 16.7 milliseconds later.
Because lag exists even for two players who are next door to each other, it's essential that you account for lag in your game, and handle it appropriately. In a turn-based game, lag can be hidden by the turn. In an action game, lag must be compensated for in other ways. These methods include using approximation, dead reckoning, cubic splines, and loose synchronization.
Dozens of good resources exist that describe methods for lag compensation. Consult the Internet, simulation textbooks, and game programming literature. (Several suggestions appear in Resources at the end of this white paper.) Be aware that some of the best solutions involve game design trade-offs, and not technical solutions.
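As a concrete starting point, the simplest form of dead reckoning extrapolates the last known position along the last known velocity. This is a minimal sketch under assumed names (`Vec2`, `ObjectState`, `extrapolate`); the resources listed at the end of this paper cover smoothing and correction, which this sketch omits.

```cpp
struct Vec2 {
    float x;
    float y;
};

// Last state received from the network for one remote object.
struct ObjectState {
    Vec2 position;     // position at the time the sender sampled it
    Vec2 velocity;     // units per second
    double timestamp;  // sender sample time, in seconds
};

// Predicts where the object is at time `now`. Extrapolation is capped at
// maxExtrapolation seconds so that a long silence does not send the object
// arbitrarily far along a stale heading.
inline Vec2 extrapolate(const ObjectState& s, double now, double maxExtrapolation) {
    double dt = now - s.timestamp;
    if (dt < 0.0) dt = 0.0;
    if (dt > maxExtrapolation) dt = maxExtrapolation;
    Vec2 p;
    p.x = s.position.x + s.velocity.x * static_cast<float>(dt);
    p.y = s.position.y + s.velocity.y * static_cast<float>(dt);
    return p;
}
```

The receiver calls `extrapolate` every frame and replaces the stored state whenever a newer update arrives.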
VDP is the LIVE Voice and Data Protocol, which is a custom protocol that allows games to send both voice data and game data in the same packet. If your game uses voice, you must use VDP. Not only can the VDP protocol send both voice and game data in the same packet, it can also be used to send voice or game data separately. The game data portion of the payload is automatically encrypted. The voice data portion is not encrypted.
The VDP protocol is used with the socket type SOCK_DGRAM. To use this protocol, create a socket as follows:
SOCKET s = socket(AF_INET, SOCK_DGRAM, IPPROTO_VDP);
This socket will behave like a UDP socket. However, messages that are sent on a VDP socket must be in the following format:
[cbGameData][GameData][VoiceData]
If the data sent on a VDP socket includes only game data (so that message payload size equals cbGameData), the network libraries can optimize cbGameData for transmission by eliminating it. This makes VDP as efficient as UDP.
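The message layout above could be assembled as follows. This sketch makes two assumptions that the text does not state: that cbGameData is a 2-byte field, and that it is written most significant byte first. Check the platform documentation for the actual width and byte order. `buildVdpPayload` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds [cbGameData][GameData][VoiceData] for a VDP send.
// ASSUMPTION: cbGameData is 2 bytes, big-endian. Returns an empty vector
// if the game data does not fit in the assumed 2-byte length field.
inline std::vector<std::uint8_t> buildVdpPayload(
    const std::vector<std::uint8_t>& gameData,
    const std::vector<std::uint8_t>& voiceData) {
    std::vector<std::uint8_t> out;
    if (gameData.size() > 0xFFFFu) return out;
    out.reserve(2 + gameData.size() + voiceData.size());
    const std::uint16_t cb = static_cast<std::uint16_t>(gameData.size());
    out.push_back(static_cast<std::uint8_t>(cb >> 8));    // high byte (assumed order)
    out.push_back(static_cast<std::uint8_t>(cb & 0xFF));  // low byte
    out.insert(out.end(), gameData.begin(), gameData.end());
    out.insert(out.end(), voiceData.begin(), voiceData.end());
    return out;
}
```

Either part may be empty, which corresponds to the data-only and voice-only cases described above.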
Because of packet overhead, avoid using one socket for voice data and another socket for game data. Whenever possible, use a single VDP socket. You can send game data if there is only game data to send. You can send voice data if there is only voice data to send. And you can send both types of data in a single packet if you need to send both. VDP is the best general purpose socket protocol on LIVE.
If you're not sending voice data, you can use any socket type—VDP, UDP or TCP. VDP and UDP exhibit the best performance characteristics in terms of network traffic and CPU usage. But because both VDP and UDP are connectionless protocols, using these protocols implies that games need to handle both dropped and out-of-order packets. TCP has guaranteed in-order delivery, but it requires additional bandwidth because of larger headers and TCP handshaking.
In addition, if TCP needs to resend data because of dropped packets, latency can skyrocket. There are some cases in which TCP may be the best option—for synchronizing state, player handshaking, connection management, and exchanging critical game state, for example. However, typical game data is best sent on VDP or UDP sockets. Avoid using TCP for normal gameplay data.
Don't send voice traffic if the receiver is not going to use it. For example, muted players should not send voice data to players who mute them. Similarly, it might not make sense to hear from a player who is invisible to the receiver.
Consider interesting game design techniques for limiting voice, such as microphone pick-ups; walkie-talkies; voice time limits; speaker podiums; player proximity; player visibility; prioritization (friends first and proximity second, for example); and voice channels (four players per channel, for example). Keep in mind that in typical conversation, a person can distinguish only three or four simultaneous voice streams.
Aggregating voice data through a single server can be expensive in terms of both CPU performance and bandwidth. Send voice data from client to client, even if you use a client-server model. This may limit the ability to piggy-back voice and data on a single VDP packet, but the trade-off is worthwhile.
To minimize packet overhead, send and receive data on a single port if possible. Rather than using port numbers, use identifiers in the payload to indicate different types of data. Sending a merged payload on a single port (rather than on two ports) saves a minimum of 44 bytes of network traffic per packet.
The LIVE Secure Network Library (SNL) has a special optimization for port 1000. To minimize packet overhead, you should send and receive all data on port 1000 for both source and destination. This can save up to four bytes per packet (two 2-byte port numbers). If you can't send all data on port 1000, we recommend that you send and receive UDP/VDP on port numbers in the 1001-1255 range, and TCP on port numbers in the 1001-1024 range (as the system can make use of ports 1025-1255 for TCP traffic). This can save two bytes per packet (two 1-byte port numbers are sent). If the source port and the destination port are in different port categories, then both ports must use the larger of the two port description sizes. Take advantage of these optimizations, and don't underestimate the cumulative cost of extra packet overhead.
Source port | Destination port | Port overhead (bytes) | Protocol identifiers |
1000 | 1000 | 0 | TCP0, UDP0, VDP0, VDPVO0, VDPDO0 |
1001-1255 | 1001-1255 | 2 | TCP1, UDP1, VDP1, VDPVO1, VDPDO1 |
1000 | 1001-1255 | 2 | TCP1, UDP1, VDP1, VDPVO1, VDPDO1 |
1001-1255 | 1000 | 2 | TCP1, UDP1, VDP1, VDPVO1, VDPDO1 |
1-65535 | 1-65535 | 4 | TCP2, UDP2, VDP2, VDPVO2, VDPDO2 |
The LIVE Secure Network Layer pads payloads for encryption purposes. To minimize packet overhead, send payloads whose sizes are evenly divisible by eight. This can save up to 7 bytes per packet. In particular, avoid payloads that are 1-2 bytes larger than a size that is divisible by eight. For example, a payload of 33 bytes requires 7 bytes of padding, whereas a payload of 31 bytes requires only 1 byte of padding. If you can't avoid sending data that puts you slightly over the padding boundary, use the additional space to your advantage by sending data that would otherwise be sent in the next packet.
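The port and padding rules above reduce to two small calculations that can be useful in debug displays or unit tests. The function names are hypothetical, and the port function does not model the narrower 1001-1024 range recommended for TCP.

```cpp
// Bytes spent describing ports in each packet, following the rules above:
// 0 when both ports are 1000, 2 when both fall in 1000-1255, otherwise 4.
inline unsigned portOverheadBytes(unsigned srcPort, unsigned dstPort) {
    if (srcPort == 1000u && dstPort == 1000u) return 0u;
    const bool srcCompact = srcPort >= 1000u && srcPort <= 1255u;
    const bool dstCompact = dstPort >= 1000u && dstPort <= 1255u;
    // Mixed categories use the larger of the two description sizes.
    return (srcCompact && dstCompact) ? 2u : 4u;
}

// Padding added to round a payload up to the next multiple of eight bytes.
// A nonzero result is also the space available for extra data at no cost.
inline unsigned paddingBytes(unsigned payloadSize) {
    return (8u - payloadSize % 8u) % 8u;
}
```

For example, `paddingBytes(33)` is 7 and `paddingBytes(31)` is 1, matching the figures in the text.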
The appropriate choice of ports, as well as an understanding of padding rules, can have a significant effect on overall packet overhead.
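Protocol | Standard overhead (bytes) | LIVE minimum overhead (bytes) | LIVE maximum overhead (bytes) |
TCP | 40 | 56 | 60 |
UDP | 28 | 44 | 51 |
VDP, data only | – | 44 | 51 |
VDP, voice only | – | 44 | 51 |
VDP, data and voice | – | 46 | 52 |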
The first time that two online LIVE clients communicate with each other, the two endpoints enter into a network handshaking period called key exchange. If either client is behind a NAT, the handshaking may require a few seconds to complete, or up to ten seconds if packets are dropped. During this time, although the game can send data to the other endpoint, the data is simply queued on the sender until handshaking is complete. If too much data is queued up, the sender's connection is likely to drop packets. In this case, the receiver either doesn't get all the data it expected, or it gets the data later than expected, or both.
The best way to handle the initial communication is for the sender to wait for a response before it begins sending a full message stream.
In Windows-based games, it's not uncommon to time out an initial connection after just a few seconds. Beyond this period of time, the likelihood of a good connection is relatively small. With LIVE, the key exchange handshake may require a number of seconds to punch through NATs. If the initial connection requires a couple of seconds, the connection is not necessarily bad. It is more likely that the delay indicates that NAT traversal is being established, and the resulting connection will be just fine.
Wait at least three seconds for a response, and up to ten seconds, before giving up on the connection. While a connection is being established, calls to XNetGetConnectStatus return XNET_CONNECT_STATUS_PENDING. Consider using the QoS functions, discussed below, to measure connectivity.
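The waiting policy above can be expressed as a small decision function that the game polls while a connection is pending. The enums below are hypothetical stand-ins for the values that XNetGetConnectStatus reports; the sketch does not call any LIVE function.

```cpp
// Hypothetical stand-ins for the connection states reported by the platform.
enum ConnectStatus { StatusPending, StatusConnected, StatusLost };

enum ConnectDecision { KeepWaiting, Proceed, GiveUp };

// Decides what to do with a pending connection after elapsedMs milliseconds.
// The timeout is never allowed below 3 seconds, and defaults to 10 seconds,
// following the recommendation above.
inline ConnectDecision evaluateConnect(ConnectStatus status,
                                       unsigned elapsedMs,
                                       unsigned timeoutMs = 10000u) {
    if (timeoutMs < 3000u) timeoutMs = 3000u;
    if (status == StatusConnected) return Proceed;
    if (status == StatusLost) return GiveUp;
    return elapsedMs < timeoutMs ? KeepWaiting : GiveUp;
}
```

A game would call this once per frame or per network tick and show a "connecting" state to the player while the result is `KeepWaiting`.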
If the session host connection is lost, the gameplay experience can suffer. Fortunately, LIVE can migrate the session host, which allows gameplay to continue for one or more subsets of the original session.
The high-level steps involved in host migration are as follows:
An ordered list of migration hosts can be pre-determined at matchmaking time and shared with all session participants. To avoid warnings about slot count mismatch, do not remove the old host prior to migration. Remove the old host after migration.
Arbitrated sessions can also be migrated. If migration occurs before XSessionStart is called, the game should call XSessionArbitrationRegister again after migration is completed. After XSessionStart is called, the session can still be migrated, but re-registration is not possible and statistics are written to the original session.
If you are developing exclusively for the Xbox 360 system, take advantage of the XRNM and QNet networking libraries. XRNM is fast and efficient, and it provides a complete networking library with a reliable VDP/UDP messaging layer. It also supports a flexible set of options, including packet sequencing, ordered delivery, and data aggregation.
QNet is targeted at LIVE Arcade titles. QNet allows Xbox 360 developers to quickly and easily add LIVE or system link functionality to games. QNet is built on top of XRNM, and it provides a higher-level abstraction of LIVE and networking functionality.
XNetQoS, a quality of service (QoS) family of functions, computes ping times between two or more LIVE clients, and also between LIVE clients and LIVE servers. The QoS functions allow games to probe multiple connections simultaneously, with very little development effort. Using other techniques for computing ping times is discouraged, because other methods are less accurate, and also because they can't take NAT traversal of LIVE connections into account. For example, when either the source or the destination LIVE client is behind a NAT, computing ping times by measuring the time that is required to first establish a connection produces extremely inaccurate results. Always use the QoS functions to determine ping times.
The bandwidth estimates that are provided by the QoS functions are not completely accurate for a number of reasons. The reported bandwidth may be overstated or understated, or it may be accurate. Avoid setting data send rates based on bandwidth, and never display bandwidth estimates. Bandwidth measurements may be appropriate for making comparisons. Use ping times instead of bandwidth estimates for hosting or player count decisions.
Rather than designing a game to handle a fixed maximum number of players, scale the game limitations based on the measured quality of service. This recommendation applies not only to player maximums, but to other data as well. For example, a game could send additional data to clients that have high-quality broadband connections (voice and other non-critical data, for example), while sending only normal game data to clients that have lower-quality broadband connections.
Although XNetQosLookup provides useful data, it may not be practical to use during gameplay. Consider developing a set of title-specific quality metrics that take into account factors that are important to gameplay. This could be as simple as a roundtrip time counter that is added to certain packets. More sophisticated metrics could track per-connection packet loss, queue lengths, bandwidth usage, latency that includes game loop processing, and so on. XrnmQueryInfo provides access to numerous metrics that can be queried and tuned to improve network performance. Use quality of service metrics periodically during the game to scale connections over time.
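A simple roundtrip time counter of the kind mentioned above might look like the sketch below. The sender stamps selected packets with its local send time, the receiver echoes the stamp back, and the sender feeds the difference into a tracker. The exponential smoothing and its 1/8 gain are choices made for this illustration, and `RttTracker` is a hypothetical class.

```cpp
// Roundtrip time for one echoed timestamp, in milliseconds.
// Both values come from the same local clock on the original sender.
inline double rttFromEcho(double nowMs, double echoedSendMs) {
    return nowMs - echoedSendMs;
}

// Per-connection smoothed roundtrip time.
class RttTracker {
public:
    RttTracker() : smoothedMs_(0.0), hasSample_(false) {}

    void addSample(double rttMs) {
        if (!hasSample_) {
            smoothedMs_ = rttMs;  // first sample seeds the average
            hasSample_ = true;
        } else {
            // Move 1/8 of the way toward the new sample to damp jitter.
            smoothedMs_ += 0.125 * (rttMs - smoothedMs_);
        }
    }

    bool hasSample() const { return hasSample_; }
    double smoothedMs() const { return smoothedMs_; }

private:
    double smoothedMs_;
    bool hasSample_;
};
```

Because the measurement includes the time each side spends in its game loop, it reflects the latency that the player experiences, which is the kind of title-specific metric described above.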
VDP and UDP sockets do not guarantee packet delivery, nor do they guarantee that packets are delivered in the same order that they are sent. Under normal Internet conditions, some packets are permanently lost, while other packets arrive late—sometimes really late. Write your network layer so that it can handle both dropped and late packets. This does not necessarily mean that you need to write your own guaranteed delivery protocol. If you really need guaranteed delivery, consider using XRNM or QNet, or perhaps even TCP.
Be aware that if a packet is dropped, it's likely that the next few packets will also be dropped. Plan accordingly. The following techniques can help the receiver recover from dropped packets.
Send redundant information, so that if one or more packets are dropped, the receiver still eventually gets the data. Yes, this goes against the recommendation to avoid sending too much data, but as with all things involved in networking, you must carefully weigh the tradeoffs. Send redundant data if the packet size is otherwise very small, or contains extra bytes of padding.
Design your game so that if a few packets are dropped, it doesn't matter. For instance, rather than just a single blow, require multiple sword blows to kill the goblin. This way, it's OK if an individual sword attack message is dropped—a player just needs to slash a couple more times.
Include sequence numbers or timestamps in packets, so that the receiver can detect out-of-order and dropped packets.
Be cautious when sending delta values by using VDP/UDP. Although deltas can reduce packet size, they are meaningless if the context is lost because of dropped packets. Periodically synchronize the absolute state of a value to guarantee consistency.
Avoid sending either too much data or too many packets at once. Transmissions that are sent in bunches are likely to be queued up on modems or routers. These devices typically have limited buffer space. Consequently, a bunched transmission is much more likely to be dropped altogether.
At some point, your game just needs to give up if it hasn't heard from another LIVE client in a long time. Plan for this condition and deal with it appropriately.
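The sequence-number technique listed above can be sketched as follows. This version assumes a 16-bit sequence number that wraps around and treats any packet older than the newest one seen as stale; `SequenceTracker` and the result names are hypothetical.

```cpp
#include <cstdint>

enum SeqResult { SeqInOrder, SeqGap, SeqStaleOrDuplicate };

// Tracks the newest sequence number seen on one connection.
class SequenceTracker {
public:
    SequenceTracker() : hasLast_(false), last_(0), missing_(0) {}

    SeqResult onReceive(std::uint16_t seq) {
        if (!hasLast_) {
            hasLast_ = true;
            last_ = seq;
            return SeqInOrder;
        }
        // Distance from the last packet, modulo 65536. Values in the upper
        // half of the range mean the packet is older than the last one.
        const std::uint16_t diff = static_cast<std::uint16_t>(seq - last_);
        if (diff == 0u || diff >= 0x8000u) return SeqStaleOrDuplicate;
        last_ = seq;
        if (diff == 1u) return SeqInOrder;
        missing_ += diff - 1u;  // skipped packets: dropped, or merely late
        return SeqGap;
    }

    // Number of sequence numbers skipped so far.
    unsigned missingCount() const { return missing_; }

private:
    bool hasLast_;
    std::uint16_t last_;
    unsigned missing_;
};
```

A skipped packet may still arrive later, in which case this tracker reports it as stale; whether to use or discard such a packet depends on the kind of data it carries.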
Keeping networking separate from gameplay is one of the first steps for avoiding latency. Don't tie network communication to frame rate or controller input; you never want the game to freeze simply because a packet hasn't arrived. Don't stall gameplay because the game must wait for information from other LIVE clients. And don't stall gameplay by using blocking sockets. Instead, use non-blocking sockets or a separate network thread.
The Internet is not a reliable medium. It is exactly the opposite—guaranteed to be unreliable and unpredictable. Although all LIVE networking and online functions return error codes to indicate failure conditions, ensure that your game detects and gracefully handles these failures. You can use the following methods:
Display an informational message and return the player to a safe state if connectivity is lost.
Detect and manage socket error conditions.
All LIVE packets are automatically encrypted and authenticated by using cryptographically secure algorithms. Don't attempt to add additional levels of security by encrypting, hashing, or calculating checksums for game payloads; you'll simply reduce the performance of your game.
If you are still concerned about hackers, that's good. The LIVE team is concerned, too. Measures have been taken to detect and ignore pirated online games and modified Xbox 360 consoles. Microsoft has a team of engineers that is dedicated to researching and resolving security issues, and we will continue to improve the resiliency of the hardware and software.
Although LIVE network security is robust, no form of security can promise 100 percent effectiveness. One way that hackers can attack a multiplayer game is to block packets, or to selectively block them. For instance, if an attacker can guess that certain size packets that are sent on a specific port indicate a particular type of data, your game could be vulnerable. Therefore, consider the following security recommendations.
Send and receive all data on a single port (port 1000) by using VDP.
Consider methods for randomly adjusting packet size.
Add code to check for impossible or improbable situations (for example, too much health, too many lives, too many headshots, movement too fast) and log or report them to users.
Practice defense in depth. Validate all incoming network messages to avoid access to invalid memory, buffer overruns, and so forth. Never trust the network.
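The validation and sanity-check recommendations above can be combined in the code that parses each incoming message. The message layout, constants, and names below are invented for this illustration and are not part of any LIVE protocol.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical message: [type:1][player:1][health:2, high byte first].
struct HealthUpdate {
    std::uint8_t player;
    std::uint16_t health;
};

const std::uint8_t kMsgHealthUpdate = 7;  // hypothetical message type
const std::uint8_t kMaxPlayers = 16;      // hypothetical session limit
const std::uint16_t kMaxHealth = 100;     // hypothetical game rule

// Returns true and fills `out` only if every field passes validation.
// Nothing is read from `data` until the size has been checked.
inline bool parseHealthUpdate(const std::uint8_t* data, std::size_t size,
                              HealthUpdate& out) {
    if (data == 0 || size != 4u) return false;     // wrong length
    if (data[0] != kMsgHealthUpdate) return false; // unexpected type
    if (data[1] >= kMaxPlayers) return false;      // index would overrun arrays
    const std::uint16_t health =
        static_cast<std::uint16_t>((data[2] << 8) | data[3]);
    if (health > kMaxHealth) return false;         // impossible value
    out.player = data[1];
    out.health = health;
    return true;
}
```

A rejected message is a good candidate for logging, since repeated failures from one client may indicate tampering.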
Familiarize yourself with all of the LIVE-related Technical Certification Requirements (TCRs). The latest TCRs are available on Xbox 360 Central and Games for Windows - LIVE Central. Many of the TCRs will shape the choices you make regarding game design. Pay particular attention to the TCRs that address game bandwidth, latency tolerance, and usage of LIVE functionality.
NetMon is a very useful tool that can be used to monitor network traffic during gameplay. The latest, version 3.x, is fully compatible with Vista, and is available for download at http://connect.microsoft.com/. Use NetMon throughout development to spot-check your network performance.
To prepare your systems for monitoring, put your development computer and development kits on a hub, not on a switch. Switches optimize network traffic, which prevents NetMon from seeing all traffic.
You also need to copy xsp.npl to your NetMon installation. You can find xsp.npl in the following locations:
For Xbox 360, in %XEDK%\redist\netmon\xsp.npl
For Games for Windows - LIVE, in %GFWLSDK_DIR%\Tools\Netmon\xsp.npl
Copy xsp.npl to the NPL subdirectory of your NetMon installation, and then add the following line to sparser.npl:
include "xsp.npl"
During testing, filter out all network traffic from computers other than the development kits that are monitored, and do the following:
Monitor the frequency of packets sent.
Examine the size of packets.
Watch for unexpected packets during certain portions of the game.
Watch for packets on ports other than 1000.
Look for packets that use protocols other than VDP.
Note: UDP must be used for discovery, since VDP cannot be used for broadcasts.
Use timestamps to synchronize captured data with game logs. If your game has deterministic replay features, compare captures over time, and save and checkpoint captures for future reference and comparison.
If you are developing for Xbox 360, you will find that NetGrove is an indispensable tool. NetGrove analyzes NetMon capture files and provides an in-depth view of your game's network traffic. In addition to providing a range of useful statistics on network usage, NetGrove also generates warnings for common networking issues.
NetGrove provides warnings about many potential issues, including the sending of too many packets, use of ports other than port 1000, consumption of too much bandwidth, excessive padding, and the use of protocols other than VDP. Make sure that NetGrove is part of your title testing and optimization plan.
Test your game on something other than a LAN. Take your development hardware home and test gameplay on the Internet. Consider using custom tools and code that simulate the scenarios that your game will encounter when two players face off across a continent. For instance, add code to automatically drop or delay packets in the game. Include a real-time network condition display alongside the frame rate and memory use displays.
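Code that drops or delays packets, as suggested above, can sit between the game and the real send and receive calls in debug builds. The sketch below is a hypothetical in-memory version with a fixed delay and a percentage loss rate; it uses a small linear congruential generator so that test runs are repeatable for a given seed.

```cpp
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical debug shim that simulates a lossy, delayed link.
class LossyLink {
public:
    LossyLink(unsigned lossPercent, unsigned delayMs, std::uint32_t seed)
        : lossPercent_(lossPercent), delayMs_(delayMs), state_(seed) {}

    // Queues the payload for delivery at nowMs + delay.
    // Returns false if the simulated network dropped the packet.
    bool send(const std::vector<std::uint8_t>& payload, unsigned nowMs) {
        if (nextRandom() % 100u < lossPercent_) return false;
        queue_.push_back(std::make_pair(nowMs + delayMs_, payload));
        return true;
    }

    // Retrieves the next packet whose delivery time has arrived, if any.
    bool receive(unsigned nowMs, std::vector<std::uint8_t>& out) {
        if (queue_.empty() || queue_.front().first > nowMs) return false;
        out = queue_.front().second;
        queue_.pop_front();
        return true;
    }

private:
    // Linear congruential generator; the high bits are the better ones.
    std::uint32_t nextRandom() {
        state_ = state_ * 1664525u + 1013904223u;
        return state_ >> 16;
    }

    unsigned lossPercent_;
    unsigned delayMs_;
    std::uint32_t state_;
    std::deque<std::pair<unsigned, std::vector<std::uint8_t> > > queue_;
};
```

Exposing the loss rate and delay in a debug menu makes it easy to see how gameplay degrades as conditions worsen.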
Design your LIVE game so that it works well under the following real world Internet conditions:
If you design your title to handle these types of Internet conditions smoothly, your game stands a much better chance of success. If your game exceeds these guidelines (for example, if your game can handle 50 percent packet loss), that's great. You should also cope with cases where Internet conditions go beyond the tolerance of your game. For example, if your game is unplayable at latencies above 250 milliseconds, it should detect these conditions and inform the player. Better yet, if your game detects sessions that have latencies exceeding 250 milliseconds, your game should not even allow the player to see or join these sessions.
There are many networking programming resources available. The following list provides some useful references. Note that inclusion does not imply endorsement by Microsoft.
Abrash, Michael. Quake's Game Engine: The Big Picture
Aronson, Jesse. Dead Reckoning: Latency Hiding for Networked Games
Aronson, Jesse. Using Groupings for Networked Gaming
Barron, Todd. Multiplayer Game Programming. Portland, OR: Premier Press, 2001.
Bernier, Yahn. Latency Compensating Methods in Client/Server In-game Protocol Design and Optimization
Bernier, Yahn. Half-Life and Team Fortress Networking: Closing the Loop on Scalable Network Gaming Backend Services
Bernier, Yahn. Leveling the Playing Field: Implementing Lag Compensation to Improve the Online Multiplayer Experience
Bettner, Paul and Mark Terrano. GDC 2001: 1500 Archers on a 28.8: Network Programming in Age of Empires and Beyond
Caldwell, Nick. Defeating Lag with Cubic Splines
Fitch, Crosbie. Cyberspace in the 21st Century: Part Five, Scalability with a Big 'S'
Frohnmayer, Mark and Tim Gift. The TRIBES Engine Networking Model
Haag, Chris. Targeting: A Variation of Dead Reckoning
Howland, Geoff. What is Lag?
Lambright, Rick. Distributing Object State for Networked Games Using Object Views
Lincroft, Peter. The Internet Sucks: What I Learned Coding X-Wing vs. TIE Fighter
Ng, Yu-Shen. Designing Fast-Action Games for the Internet
Ng, Yu-Shen. Internet Game Design
O'Brien, Larry. Multiplayer Math
Royer, Dan. Network Game Programming
Simpson, Jake. Networking for Games 101
Sweeney, Tim. Unreal Networking Architecture
Treglia, Dante, ed. Game Programming Gems III. Boston: Charles River Media, 2002.
Kirmse, Andrew, ed. "Network and Multiplayer." In Treglia, Dante, ed. Game Programming Gems III. Boston: Charles River Media, 2002.