Massively Multiplayer Middleware
大型多人游戏中间件

by Michi Henning | February 24, 2004

Chinese translation: andywang, 金庆

Topic: Game Development

MICHI HENNING, ZeroC

Building scalable middleware for ultra-massive online games teaches a lesson we all can use: Big project, simple design.

为超大型在线游戏构建可扩展的中间件给我们所有人的教训: 宏大的项目,简单的设计。

Wish is a multiplayer, online, fantasy role-playing game being developed by Mutable Realms [1]. It differs from similar online games in that it allows tens of thousands of players to participate in a single game world (instead of the few hundred players supported by other games). Allowing such a large number of players requires distributing the processing load over a number of machines and raises the problem of choosing an appropriate distribution technology.

 

Wish游戏是Mutable Realms[1]开发的幻想类多人在线角色扮演游戏。 不同于类似的在线游戏,它允许数万的玩家同时参与到一个游戏世界中 (而其他游戏仅支持几百人)。 允许如此之多的玩家就要求在多台机器上进行分布式处理, 由此产生了一个问题:选择合适的分布式技术。

DISTRIBUTION REQUIREMENTS

分布式需求

Mutable Realms approached ZeroC for the distribution requirements of Wish. ZeroC decided to develop a completely new middleware instead of using existing technology, such as CORBA (Common Object Request Broker Architecture) [2]. To understand the motivation for this choice, we need to examine a few of the requirements placed on middleware by games on the scale of Wish and other large-scale distributed applications.

 

Mutable Realms带着Wish游戏的分布式需求找到ZeroC。 ZeroC决定开发一个全新的中间件,而不是使用现有的技术, 比如CORBA(Common Object Request Broker Architecture,公共对象请求代理体系)[2]。 为了理解这个选择的动机, 我们需要考察像Wish这种规模的游戏和其他大规模分布式应用程序对中间件的一些要求。

 

Multi-Platform Support. The dominant platform for the online games market is Microsoft Windows, so the middleware has to support Windows. For the server side, Mutable Realms had early on decided to use Linux machines: The low cost of the platform, together with its reliability and rich tool support, made this an obvious choice. The middleware, therefore, had to support both Windows and Linux, with possible later support for Mac OS X and other Unix variants.

 

多平台支持。 在线游戏市场的主要平台是Microsoft Windows,所以中间件必须支持Windows。 在服务器这边,Mutable Realms很早就决定使用Linux: 费用低,并且可靠性高,以及工具软件丰富,这是明摆着的选择。 因此,中间件必须要同时支持Windows和Linux, 也许以后还要支持Mac OS X和其他Unix变种。

 

Multi-Language Support. Client and server software is written in Java, as well as a combination of C++ and assembly language for performance-critical functions. At ZeroC we used Java because some of our development staff had little prior C++ experience. Java also offers advantages in terms of defect count and development time; in particular, garbage collection eliminates the memory management errors that often plague C++ development. For administration of the game via the Web, we wanted to use the PHP hypertext processor. As a result, the game middleware had to support C++, Java, and PHP.

 

多语言支持。 客户端和服务器软件是用Java编写的,同时, 对于性能关键的函数还混合使用了C++和汇编语言。 在ZeroC,我们使用Java,因为我们的部分开发人员先前没什么C++经验。 Java同时在缺陷数和开发时间上有优势; 特别是垃圾收集消除了内存管理错误,而这是C++开发的死疾。 为了通过Web管理游戏,我们想用PHP。 所以,这个游戏中间件必须支持C++、Java和PHP。

 

Transport and Protocol Support. As we developed the initial distribution architecture for the game, it became clear that we were faced with certain requirements in terms of the underlying transports and protocols:

 

传输和协议支持。 当我们为游戏开发最初的分布式构架时, 我们明显面临了对底层传输和协议的一些需求:

 

•  Players connect to ISPs via telephone lines, as well as broadband links. While broadband is becoming increasingly popular, we had decided early on that the game had to be playable over an ordinary modem. This meant that communications between clients and server had to be possible via low-bandwidth and high-latency links.

 

•  玩家通过电话线或者宽带连接ISP上网。 虽然宽带越来越普及,但是我们早已决定游戏必须能通过普通modem玩。 这意味着客户端服务器之间通过低带宽高延时链路进行通讯必须是可能的。

 

•  Much of the game is event driven. For example, as a player moves around, other players in the same area need to be informed of the changes in the game world around them. These changes can be distributed as simple events such as, “Player A moves to new coordinates.”

 

•  游戏大部分是事件驱动的。 比如,当玩家四处溜达时, 这些变化需要通知同一区域内的其他玩家, 使他们知晓其周围的游戏世界中的变化。 这些变化可以作为简单的事件分发, 比如“玩家A移动到新坐标”。

 

Ideally, events are distributed via “datagrams.” If the occasional state update is lost, little harm is done: A lost event causes a particular observer’s view of the game world to lag behind momentarily, but that view becomes up-to-date again within a very short time, when another event is successfully delivered.

 

理想的,事件是通过数据报(datagram)分发的。 即使偶尔有状态更新丢失,也没什么危害: 一个事件丢失会造成某个观察者对于游戏世界的视图暂时性滞后, 但很快,当另一个事件成功传送后,该视图又会变成最新状态。

 

•  Events in the game often have more than one destination. For example, if a player moves within the field of vision of five other players, the same positional update must be sent to all five observing players. We wanted to be able to use broadcast or multicast to support such scenarios.

 

•  游戏中的事件经常有多个目标。 比如,玩家在其他5个玩家的视野内移动, 位置更新必须发送给所有这5个看到他的玩家。 我们想用广播或多播来支持这样的场景。

 

•  Communications between clients and game servers must be secure. For an online subscription-based game, this is necessary for revenue collection, as well as to prevent cheating. (For example, it must be impossible for a player to acquire a powerful artifact by manipulating the client-side software.)

 

•  客户端和游戏服务器之间的通信必须是安全的。 对基于订购的在线游戏,这是必须的,既为了赚钱,也为了防做弊 (例如,玩家无法通过修改客户端软件来得到超级装备)。

 

•  Clients connect to the game from LANs that are behind firewalls and use NAT (network address translation). The communications protocol for the game has to be designed in a way that accommodates NAT without requiring knowledge of application-specific information in order to translate addresses.

 

•  客户端能在局域网内部通过防火墙和NAT(network address translation, 网络地址转换)连入游戏。 游戏的通信协议必须按以下方式设计:它能够穿越NAT, 但是地址转换无需应用程序特有的相关信息。
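The datagram requirement in the list above can be illustrated with a small sketch. It assumes each sender stamps its updates with a per-player sequence number, which the article does not describe; the names `MoveEvent` and `ObserverView` are hypothetical. The point is only that a lost or late update does no lasting harm, because the next update that arrives brings the observer's view up to date again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoveEvent:
    """One positional update; all field names are hypothetical."""
    player: str
    seq: int          # per-player sequence number assigned by the sender
    x: float
    y: float


class ObserverView:
    """An observer's local picture of where other players are."""

    def __init__(self):
        self.positions = {}   # player -> (x, y)
        self.last_seq = {}    # player -> newest sequence number applied

    def apply(self, event: MoveEvent) -> bool:
        """Apply an event unless a newer one has already been seen."""
        if event.seq <= self.last_seq.get(event.player, -1):
            return False      # stale or duplicated datagram: ignore it
        self.last_seq[event.player] = event.seq
        self.positions[event.player] = (event.x, event.y)
        return True


view = ObserverView()
sent = [MoveEvent("A", n, float(n), 0.0) for n in range(5)]
# Simulate an unreliable link: event 2 is lost and event 1 arrives late.
received = [sent[0], sent[3], sent[1], sent[4]]
for ev in received:
    view.apply(ev)
```

After the loop the view holds the position carried by the newest event, even though one update never arrived and another arrived out of order.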

 

Versioning Support. We wanted to be able to update the game world while the game was being played—for example, to add new items or quests. These updates have to be possible without requiring every deployed client to be upgraded immediately—that is, client software at an older revision level has to continue to work with the updated game servers (albeit without providing access to newly added features). This means that the type system has to be flexible enough to allow updates, such as adding a field to a structure or changing the signature of a method, without breaking deployed clients.

 

版本控制支持。 我们需要能够在游戏运行中更新游戏,比如,增加新的物品或任务。 这些更新不要求所有已部署的客户端都马上升级, 即,旧版本的客户端软件必须能够连接已升级的服务器继续运行 (虽然不提供新增功能)。 这意味着类型系统必须要足够灵活以允许更新, 比如给结构添加新域,或者更改方法的签名, 而不会使已部署的客户端无法使用。

 

Ease of Use. Although a few of the Wish game developers are distributed computing experts, the majority have little or no experience. This means that the middleware has to be easy for nonexperts to use, with simple, threadsafe and exception-safe APIs (application programming interfaces).

 

易用性。 虽然有些Wish游戏的开发者是分布式计算的专家, 但是大部分经验不足甚至没有经验。 这意味着中间件必须让非专家也能容易使用, 具有简单的、线程安全的,和异常安全的API (application programming interface,应用程序接口)。

 

Persistence. Much of the game requires state, such as the inventory for each player, to be stored in a database. We wanted to provide developers with a way to store and retrieve persistent state for application objects without having to concern themselves with the actual database and without having to design database schemas. Particularly during development, as the game evolves, it is prohibitively time consuming to repeatedly redesign schemas to accommodate changes. In addition, as we improve the game while it is deployed, we must add new features to a database and remove older features from it. We wanted an automatic way to migrate an existing, populated database to a new database schema without losing any of the information in the old database that was still valid.

 

持久化。 很多游戏都需要将状态保存数据库,如玩家的所有物品。 我们需要给开发者提供一种方法, 使之可以保存和获取应用对象的持久化状态, 但不必关心具体的数据库,也不必设计数据库模式(database schema)。 特别是开发过程中,当游戏不断演化, 为了适应变化而反复地修改表结构, 所耗的时间会让人望而却步。 而且,在游戏发布后,当我们改进游戏时, 必须添加新的属性到数据库,以及删除旧的属性。 我们需要一种自动的方式, 将现存的已填充的数据库迁移到新的数据库模式, 而不丢失旧库中任何仍然有效的信息。

 

Threading. Much of the server-side processing is I/O-bound: Database and network access forces servers to wait for I/O completion. Other tasks, such as pathfinding, are compute-bound and can best be supported using parallel algorithms. This means that the middleware has to be inherently threaded and offer developers sufficient control over threading strategies to implement parallel algorithms while preventing problems such as thread starvation and deadlock. Given the idiosyncrasies of threading on different operating systems, we also wanted a platform-neutral threading model with a portable API.

 

多线程。 大多数服务端处理都是I/O密集型: 数据库和网络访问迫使服务器等待I/O完成。 其他任务,比如寻路,是计算密集型, 最好使用并行算法。 这意味着中间件必须内在地多线程化, 并且为开发者提供足够的对线程策略的控制, 以实现并行算法,同时避免如线程饥饿和死锁这样的问题。 考虑到不同操作系统上的线程特性, 我们需要平台无关的线程模型,并具有可移植的API。

 

Scalability. Clearly, the most serious challenges for the middleware are in the area of scalability: For an online game, it is impossible to predict realistic bounds on things such as the total number of subscribers or the number of concurrent players. This means that we need an architecture that can be scaled by federating servers (that is, adding more servers) as demands on the software increase.

 

可扩展性。 显然,对中间件最重要的挑战是在可扩展性方面: 对于在线游戏,不可能预测到如订阅总数或同时在线玩家数这类数据的真实范围。 这意味着我们需要这样的架构,当对软件的需求增加时, 它可以联盟(federate)服务器来进行扩展(即添加更多服务器)。

 

We also need fault-tolerance: For example, upgrading a server to a newer version of the game software has to be possible without kicking off every player currently using that server. The middleware has to be capable of automatically using a replica server while the original server is being upgraded.

 

我们也需要容错性: 比如,将服务器升级到游戏软件的新版本必须是可能的, 而不应该踢掉当前正在使用该服务器的每个玩家。 中间件必须能够在原服务器正在升级时, 自动使用一个副本服务器(replica server)。

 

Other scalability issues relate to resource management. For example, we did not want to be subject to hardwired limits, such as a maximum number of open connections or instantiated objects. This means that, wherever possible, the middleware has to provide automated resource management functions that are not subject to arbitrary limits and are easy to use. Simultaneously, these functions have to provide enough control for developers to tune resource management to their needs. Wherever possible, we wanted to be able to change resource management strategies without requiring recompilation.

 

其他可扩展性问题涉及资源管理。 例如,我们不想受到硬限制,如最大打开连接数,或最大实例化对象数。 这意味着,只要有可能, 中间件必须提供不受任何限制的自动资源管理功能, 并且使用方便。 同时,这些功能必须给开发者提供足够的控制, 以按照他们的需要调整资源管理。 只要有可能,我们希望能够改变资源管理策略,而不要求重新编译。

 

A common scalability problem for distributed multiplayer games relates to managing distributed sets of objects. The game might allow players to form guilds, subject to certain rules: For example, a player may not be a member of more than one guild, or a guild may have at most one level-5 mage (magician). In computing terms, implementing such behavior boils down to performing membership tests on sets of distributed objects. Efficient implementation of such set operations requires an object model that does not incur the cost of a remote message for each test. In other words, the object identities of objects must be visible at all times and must have a total order.

 

分布式多人游戏共同的一个可扩展性问题是, 管理分布式对象集合。 游戏可能允许玩家们组成公会(guild),但要符合一定条件: 比如,玩家不可以加入多个公会,或者公会最多有1个5级魔法师。 用计算机术语说,实现这种行为归结为对分布式对象集执行成员资格测试。 有效地实现这种集合运算需要这样的对象模型, 它不用每次测试都发送远程消息。 也就是说,对象的标识对象要全程可见,并且具有全序关系。
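As a minimal sketch of why visible, totally ordered identities matter: if an identity is an ordinary value that can be hashed and compared, guild rules reduce to local set operations with no remote message per test. The `Identity` and `Guild` types below are hypothetical and are not taken from the Ice API.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Identity:
    """Hypothetical object identity: comparable and hashable locally."""
    category: str
    name: str


class Guild:
    def __init__(self, name):
        self.name = name
        self.members = set()          # set of Identity values


def can_join(player: Identity, guilds) -> bool:
    """Rule from the text: a player may belong to at most one guild."""
    return not any(player in g.members for g in guilds)


alice = Identity("player", "alice")
bob = Identity("player", "bob")
red, blue = Guild("red"), Guild("blue")
red.members.add(alice)

roster = sorted([bob, alice])   # the total order gives a canonical listing
```

Every membership test here touches only local data; the objects that the identities denote could live on any server.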

 

In classical RPC (remote procedure call) systems, object implementations reside in servers, and clients send remote messages to objects: All object behavior is on the server, with clients only invoking behavior, but not implementing it. Although this approach is attractive because it naturally extends the notion of a local procedure call to distributed scenarios, it causes significant problems:

 

在经典RPC(remote procedure call,远程过程调用)系统中, 对象的实现位于服务器,客户端发送远程消息到对象: 所有对象行为都在服务器, 而客户端只是调用行为,而不是实现它。 尽管这个方法很有吸引力, 因为它将本地过程调用的概念自然地扩展到了分布式环境中, 但它造成了以下值得注意的问题:

 

•  Sending a remote message is orders of magnitude slower than sending a local message. One obvious way to reduce network traffic is to create “fat” RPCs: as much data as possible is sent with each call to better amortize the cost of going on the wire. The downside of fat RPCs is that performance considerations interfere with object modeling: While the problem domain may call for fine-grained interfaces with many operations that exchange only a small amount of state, good performance requires coarse-grained interfaces. It is difficult to reconcile this design tension and find a suitable trade-off.

 

•  发送远程消息比发送本地消息慢好几个数量级。 减少网络流量的一种明显的方法是创建“胖(fat)”RPC: 每次调用都发送尽可能多的数据,以便更好地分摊传输的成本。 胖RPC的缺点是, 性能方面的考虑干扰了对象建模: 虽然问题域可能需要细粒度的接口, 以及许多仅仅交换少量状态的操作, 但为了良好的性能,却要求粗粒度的接口。 人们很难调和这种设计的紧张局势并找到适合的折衷之法。

 

•  Many objects have behavior and can be traded among players. Yet, to meet the processing requirements of the game, we have many servers (possibly in different continents) that implement object behavior. If behavior stays put in the server, yet players can trade objects, before long, players end up with a potion whose server is in the United States and a scroll whose server is in Europe, with the potion and scroll carried in a bag that resides in Australia. In other words, a pure client–server model does not permit client-side behavior and object migration, and, therefore, destroys locality of reference.

 

•  很多对象有行为, 并且可以在玩家之间交易。 但是,为了满足游戏处理的要求, 我们有很多服务器(可能位于不同的大陆), 它们实现了对象的行为。 如果行为停留在服务器不动,而玩家可以交易对象, 那么,不久以后,有个玩家会有一瓶药在美国的服务器上, 一个卷轴在欧洲的服务器上, 而放这瓶药和卷轴的袋子却在澳大利亚的服务器上。 换句话说,纯粹的客户端-服务器模型不允许客户端的行为和对象迁移, 并且因此破坏了访问局部性(locality of reference)。

 

We wanted an object model that supports both client- and server-side behavior so we could migrate objects and improve locality of reference.

 

我们需要一个同时支持服务器端和客户端行为的对象模型, 这样我们就能迁移对象并改善访问局部性。

DESIGNING A NEW MIDDLEWARE

设计新的中间件

Looking at our requirements, we quickly realized that existing middleware would be unsuitable. The cross-platform and multi-language requirements suggested CORBA; however, a few of us had previously built a commercial object request broker and knew from this experience that CORBA could not satisfy our functionality and scalability requirements. Consequently, we decided to develop our own middleware, dubbed Ice (short for Internet Communications Engine) [3].

 

看到需求,我们马上意识到现有的中间件没有合适的。 多平台和多语言的需求使人想到CORBA; 但是,我们几个以前曾经建立了一个商业性的对象请求代理, 并且从这次经历知道,CORBA不能满足我们的功能和可扩展性需求。 因此,我们决定开发自己的中间件, 称为Ice(互联网通信引擎,Internet Communications Engine的简称)[3]。

 

The overriding focus in the design of Ice was on simplicity: We knew from bitter experience that every feature is paid for in increased code and memory size, more complex APIs, steeper learning curve, and reduced performance. We made every effort to find the simplest possible abstractions (without passing the “complexity buck” to the developer), and we admitted features only after we were certain that we absolutely had to have them.

 

Ice中设计的首要重点是简单性: 我们从过去的痛苦经验得知, 每个功能的代价就是增加的代码和内存占用、 更复杂的API、陡峭的学习曲线,以及降低的性能。 我们尽了一切努力寻找最简单的抽象(不是把复杂性推卸给开发者), 并且我们只接受我们确信绝对必要的功能。

 

Object Model. Ice restricts its object model to a bare minimum: Built-in data types are limited to signed integers, floating-point numbers, Booleans, Unicode strings, and 8-bit uninterpreted (binary) bytes. User-defined types include constants, enumerations, structures, sequences, dictionaries, and exceptions with inheritance. Remote objects are modeled as interfaces with multiple inheritance that contain operations with input and output parameters and a return value. Interfaces are passed by reference—that is, passing an interface passes an invocation handle via which an object can be invoked remotely.

 

对象模型。 Ice将其对象模型限制到了最小化。 内置数据类型仅限于有符号整数、浮点数、布尔型、Unicode字符串 和8位无解释(二进制)字节。 用户定义的类型包括常量、枚举、结构、序列、字典,以及带继承的异常。 远程对象建模为带多重继承的接口, 而接口包含具有输入输出参数和返回值的各种操作。 接口通过引用传递,也就是说,传递接口会传递一个调用句柄, 通过该句柄,就可以远程调用一个对象。

 

To support client-side behavior and object migration, we added classes: operation invocations on a class execute in the client’s address space (instead of the server’s, as is the case for interfaces). In addition, classes can have state (whereas interfaces, at the object-modeling level, are always stateless). Classes are passed by value—that is, passing a class instance passes the state of the class instead of a handle to a remote object.

 

为了支持客户端行为和对象迁移,我们添加了类: 类上调用的操作执行于客户端的地址空间 (而不是就接口而言的服务器端)。 此外,类可以有状态(而接口,在对象建模的层次,总是无状态的)。 类是按值传递的,也就是说,传递类实例会将该类的状态传递到远程对象, 而不是传递句柄。

 

We did not attempt to pass behavior: This would require a virtual execution environment for objects but would be in conflict with our performance and multi-language requirements. Instead, we implemented identical behavior for a class at all its possible host locations (clients and servers): Rather than shipping code around, we provide the code wherever it is needed and ship only the state. To migrate an object, a process passes a class instance to another process and then destroys its copy of the instance; semantically, the effect is the same as migrating both state and behavior.

 

我们没有试图传递行为: 这将要求为对象建立一个虚拟执行环境, 这与我们的性能和多语言需求相冲突。 相反,我们实现了类在其所有可能的主机位置(客户端和服务器)上 具有同一行为: 我们不是到处分发代码,而是在有需要的地方提供代码, 并且仅仅分发状态。 为了迁移对象,进程传递类实例到另一进程, 然后消毁自己的实例拷贝; 在语义上,其效果与同时迁移状态和行为是相同的。
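The migration scheme described above, in which code is present at every host and only state is shipped, might look roughly like this. JSON stands in for the real wire encoding, and `Potion` and `Host` are hypothetical names used purely for illustration.

```python
import json


class Potion:
    """Behavior lives in code present at every host; only state travels."""

    def __init__(self, owner, doses):
        self.owner = owner
        self.doses = doses

    def drink(self):
        if self.doses == 0:
            raise ValueError("potion is empty")
        self.doses -= 1

    def marshal(self) -> bytes:
        return json.dumps({"owner": self.owner, "doses": self.doses}).encode()

    @classmethod
    def unmarshal(cls, data: bytes) -> "Potion":
        return cls(**json.loads(data.decode()))


class Host:
    def __init__(self, name):
        self.name = name
        self.objects = {}

    def migrate(self, key, target: "Host"):
        """Send the state, then destroy the local copy."""
        wire = self.objects[key].marshal()
        target.objects[key] = Potion.unmarshal(wire)
        del self.objects[key]


server, client = Host("server"), Host("client")
server.objects["potion-1"] = Potion("alice", 3)
server.migrate("potion-1", client)
client.objects["potion-1"].drink()   # runs in the receiving process
```

Because both sides run the same `Potion` code, the effect is the same as if state and behavior had moved together.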

 

Architecturally, implementing object migration in this way is a two-edged sword because it requires all host locations to implement identical (as opposed to merely similar) behavior. This has ramifications for versioning: If we change the behavior of a class at one host location, we must change the behavior of that class at all other locations (or suffer inconsistent behavior). Multiple languages also require attention. For example, if a class instance passes from a C++ server to a Java client, we must provide C++ and Java implementations with identical behavior. (Obviously, this requires more effort than implementing the behavior just once in a single language and single server.)

 

在架构上,以这种方式实现对象迁移是一把双刃剑, 因为它要求所有的主机位置实现同一行为(而非仅仅类似)。 这对版本控制有复杂的影响: 如果我们在一个主机位置改变类的行为, 我们必须改变那个类在所有其他位置的行为 (否则就会有不一致的行为)。 多语言也要注意。 比如,如果类实例从C++服务器传递到Java客户端, 我们必须提供具有同一行为的C++和Java实现。 (显然,相比用单一语言,在单一服务器上, 仅仅实现一次行为,这需要更多的努力。)

 

For environments such as Wish, where we control both client and server deployment, this is acceptable; for applications that provide only servers and rely on other parties to provide clients, this can be problematic because ensuring identical behavior of third-party class implementations is difficult.

 

对于像Wish这样的环境,我们控制了客户端和服务器的部署,这是可以接受的; 但对于只提供服务器,而依靠其他各方提供客户端的情况,这就有问题, 因为很难确保第3方类的实现具有同一行为。

 

Protocol Design. To meet our performance goals, we broke with established wisdom for RPC protocols in two ways:

 

协议设计。 为了满足性能目标,我们摒弃了RPC协议中的2个既定观念:

 

•  Data is not tagged with its type on the wire and is encoded as compactly as possible: The encoding uses no padding (everything is byte-aligned) and applies a number of simple techniques to save bandwidth. For example, positive integers less than 255 require a single byte instead of four bytes, and strings are not NUL terminated. This encoding is more compact (sometimes by a factor of two or more, depending on the type of data) than CORBA’s CDR (common data representation) encoding.

 

•  数据传输时没有类型标记,并且尽可能的以紧凑格式编码: 编码没有填充(结构都以1字节对齐), 并且使用了一些简单的技术节省带宽。 例如,小于255的正整数需要1个字节而不是4字节, 字符串不以NUL结尾。 这样的编码比CORBA的CDR (common data representation,通用数据表示) 编码更紧凑(有时可达到2倍以上,视数据类型而定)。

 

•  Data is always marshaled in little-endian byte order. We rejected a receiver-makes-it-right approach (as used by CORBA) because experiments showed no measurable performance gain.

 

•  数据总是以little-endian字节序组编(marshal)。 我们不使用receiver-makes-it-right(接收者让它正确)方法(它用于CORBA), 因为实验表明该方法没有可观的性能提升。
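The two encoding choices above can be sketched in a few lines. This assumes a scheme in which counts below 255 occupy one byte and the byte value 255 acts as an escape marker followed by a 4-byte little-endian integer; the function names are hypothetical and the exact layout should be checked against the Ice protocol documentation.

```python
import struct


def encode_size(n: int) -> bytes:
    """Counts below 255 take one byte; larger ones take a marker plus 4 bytes."""
    if n < 0:
        raise ValueError("size must be non-negative")
    if n < 255:
        return bytes([n])
    return b"\xff" + struct.pack("<i", n)     # "<" = little-endian


def decode_size(buf: bytes, pos: int = 0):
    """Return (size, next_position)."""
    if buf[pos] < 255:
        return buf[pos], pos + 1
    (n,) = struct.unpack_from("<i", buf, pos + 1)
    return n, pos + 5


def encode_string(s: str) -> bytes:
    """Length-prefixed UTF-8 with no NUL terminator and no padding."""
    raw = s.encode("utf-8")
    return encode_size(len(raw)) + raw


def decode_string(buf: bytes, pos: int = 0):
    n, pos = decode_size(buf, pos)
    return buf[pos:pos + n].decode("utf-8"), pos + n
```

With this layout the four-character string "Wish" occupies five bytes on the wire: one length byte and four bytes of text.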

 

The protocol supports compression for better performance over low-speed links. (Interestingly, for high-speed links, compression is best disabled: It takes more time to compress data than to send it uncompressed.)

 

协议支持压缩,以在低速链路上有更好的性能。 (有趣的是,对高速链路,最好禁用压缩:压缩数据太费时,还不如不压缩传输。)

 

The protocol encodes request data as a byte count followed by the payload as a blob. This allows the receiver of a message to forward it to a number of downstream receivers without the need to unmarshal and remarshal the message. Avoiding this cost was important so we could build efficient message switches for event distribution.

 

协议将请求数据编码为一个字节计数加一个blob净荷。 这允许消息接收者向下游接收者转发消息, 而无需解编(unmarshal)和重新组编(remarshal)。 避免这种消耗很重要,这样我们才能为事件分发建立高效的消息分派。
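A rough sketch of why a byte count followed by an opaque payload helps a message switch: the switch only needs the length to find the end of the payload, so it can fan the same bytes out without decoding them. The 4-byte little-endian count and the `Switch` class are assumptions made for illustration.

```python
import struct


def frame(payload: bytes) -> bytes:
    """Prefix an already-marshaled payload with its byte count."""
    return struct.pack("<I", len(payload)) + payload


def read_frame(buf: bytes, pos: int = 0):
    """Return (payload, next_position) without interpreting the payload."""
    (n,) = struct.unpack_from("<I", buf, pos)
    start = pos + 4
    return buf[start:start + n], start + n


class Switch:
    """Hypothetical message switch that fans one event out to many receivers."""

    def __init__(self):
        self.subscribers = []     # each subscriber is a list acting as an inbox

    def forward(self, wire: bytes):
        payload, _ = read_frame(wire)
        out = frame(payload)      # re-frame the same bytes; nothing is decoded
        for inbox in self.subscribers:
            inbox.append(out)


switch = Switch()
inboxes = [[], [], []]
switch.subscribers.extend(inboxes)
event = frame(b"\x01\x02opaque-event-bytes")
switch.forward(event)
```

The switch never needs the type definitions of the events it carries, which keeps it independent of changes to the game's object model.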

 

The protocol supports TCP/IP and UDP (user datagram protocol). For secure communications, we use SSL (secure sockets layer): It is freely available and has been extensively scrutinized for flaws by the security community.

 

协议支持TCP/IP和UDP(user datagram protocol,用户数据报协议)。 为了安全通信,我们使用SSL(secure sockets layer,安全套接字层): 它是免费的,并且安全社区已经对它进行了广泛的检验。

 

The protocol is bidirectional, so a server can make a callback over a connection that was previously established by a client. This is important for communication through firewalls, which usually permit outgoing connections, but not incoming ones. The protocol also works across NAT boundaries.

 

协议是双向的,所以服务器可以在客户端先前建立的连接上进行回调。 这对穿越防火墙的通信很重要,防火墙通常允许向外的连接,而不能被连入。 该协议同时还能穿越NAT边界。

 

Classes make the protocol more complex because they are polymorphic: If a process sends a derived instance to a receiver that understands only a base type of that instance, the Ice runtime slices the instance to the most-derived base type that is known to the receiver. Slicing requires the receiver to unmarshal data whose type is unknown. Further, classes can be self-referential and form arbitrary graphs of nodes: Given a starting node, the Ice runtime marshals all reachable nodes so graphs require the sender to perform cycle detection.

 

类使得协议更复杂,因为类是多态的: 如果进程发送了一个派生类的实例给接收者, 而接收者只知道该实例的基类的话, Ice运行库(runtime)会将该实例剪切(slice)为接收者所知道的最近基类。 剪切要求接收者解编未知类型的数据。 而且,类可以是自引用的(self-referential), 并形成任意的节点图: 给定一个起始节点,Ice运行库会组编所有可达的节点, 因此节点图要求发送者执行环检测。
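Marshaling all nodes reachable from a starting node, with cycle detection, can be sketched as a traversal that records each node once in an identity table. This illustrates the general technique rather than Ice's actual algorithm; `Node`, `marshal_graph`, and `unmarshal_graph` are hypothetical names.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.links = []       # references to other Node instances


def marshal_graph(root):
    """Flatten every node reachable from root into a list of records.

    Each record is (name, [indices of linked nodes]). The `index_of` table
    is keyed by object identity, so a node met twice (a shared node or a
    cycle) is written once and later referred to by its index.
    """
    index_of = {}
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in index_of:
            continue          # already scheduled: this is how cycles end
        index_of[id(node)] = len(order)
        order.append(node)
        stack.extend(node.links)
    return [(n.name, [index_of[id(m)] for m in n.links]) for n in order]


def unmarshal_graph(records):
    """Rebuild the graph; record 0 is always the starting node."""
    nodes = [Node(name) for name, _ in records]
    for node, (_, links) in zip(nodes, records):
        node.links = [nodes[i] for i in links]
    return nodes[0]


a, b, c = Node("a"), Node("b"), Node("c")
a.links = [b, c]
b.links = [c]
c.links = [a]                 # closes a cycle a -> b -> c -> a
copy = unmarshal_graph(marshal_graph(a))
```

The rebuilt graph preserves identity relationships: the node reached through `b` and the node reached directly from `a` are the same object in the copy, and the cycle back to the root is intact.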

 

The implementation of slicing and class graphs is surprisingly complex. To support unmarshaling, the protocol sends classes as individually encapsulated slices, each tagged with their type. On average (compared with structures), this requires 10 to 15 percent extra bandwidth. To preserve the identity relationships of nodes and to detect cycles, the marshaling code creates additional data structures. On average, this incurs a performance penalty of 5 to 10 percent. Finally, for C++, we had to write a garbage collector to avoid memory leaks in the presence of cyclic class graphs, which was nontrivial. Without slicing and class graphs, the protocol implementation would have been simpler and (for classes) slightly faster.

 

剪切和类图的实现异常复杂。 为了支持解编,协议按单独封装的剪切片发送类,每个都标记其类型。 平均而言(对比结构),这需要10%-15%的额外带宽。 为了保存节点关系标识,也为了检测环, 组编代码会生成附加的数据结构。 平均而言,这会产生5%-10%的性能损耗。 最后,对于C++,我们必须编写垃圾收集器, 以避免环形类图表示中的内存泄漏, 这也不容易。 如果没有剪切和类图,协议实现将更简单,(对类来说)也略微更快些。
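To make the slicing idea concrete, here is a simplified sketch in which an instance travels as a sequence of slices, most-derived first, each tagged with a type identifier and a byte count. A receiver skips slices whose type it does not know until it reaches one it does. The layout and the type names `::MagicSword` and `::Item` are assumptions for illustration, not the real Ice encoding.

```python
import struct

KNOWN_TYPES = {"::Item"}          # what this (older) receiver understands


def marshal_instance(slices):
    """slices: [(type_id, data)] ordered from most-derived to base."""
    out = bytearray([len(slices)])
    for type_id, data in slices:
        tid = type_id.encode("ascii")
        out += bytes([len(tid)]) + tid
        out += struct.pack("<I", len(data)) + data   # size lets us skip it
    return bytes(out)


def unmarshal_instance(buf, known=KNOWN_TYPES):
    """Return (type_id, data) of the most-derived slice the receiver knows."""
    count, pos = buf[0], 1
    for _ in range(count):
        n = buf[pos]
        type_id = buf[pos + 1:pos + 1 + n].decode("ascii")
        pos += 1 + n
        (size,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        if type_id in known:
            return type_id, buf[pos:pos + size]
        pos += size               # unknown type: skip its bytes unread
    raise ValueError("no known base type in instance")


wire = marshal_instance([
    ("::MagicSword", b"fire-damage=7"),   # derived slice, unknown to receiver
    ("::Item", b"weight=3"),              # base slice, known to receiver
])
```

The per-slice type tag and byte count are the extra bandwidth the text mentions: they are what allow a receiver to step over data it cannot interpret. A real receiver that knows the derived type would go on to read the base slices as well; this sketch returns only the first slice it recognizes.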

 

Versioning. The object model supports multiple interfaces: Instead of having a single most-derived interface, an object can provide any number of interfaces. Given a handle to an object, clients can request a specific interface at runtime using a safe downcast. Multiple interfaces permit versioning of objects without breaking on-the-wire compatibility: To create a newer version, we add new interfaces to existing objects. Already-deployed clients continue to work with the old interfaces, whereas new clients can use the new interfaces.

 

版本控制。 对象模型支持多接口: 对象可以提供任意多的接口,而不是单一的最终接口。 给定一个对象的句柄,客户端可以在运行时利用安全的向下转型,来请求一个特定接口。 多接口允许对象的版本控制,而不会破坏已上线对象的兼容性: 要创建一个新的版本,我们会在现有对象上添加新的接口。 已经部署的客户端使用旧接口继续运行,而新客户端可以使用新接口。
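Versioning by adding interfaces can be pictured as follows. The sketch assumes an object exposes its interfaces by name and that a client asks for one at runtime, receiving nothing if the object does not offer it; `GameObject`, `checked_cast`, and the inventory interfaces are hypothetical names, not the Ice API.

```python
class InventoryV1:
    def items(self):
        return ["sword"]


class InventoryV2(InventoryV1):
    def items_with_weight(self):
        return [("sword", 3)]


class GameObject:
    """An object that offers several interfaces under hypothetical names."""

    def __init__(self):
        self._interfaces = {}

    def add_interface(self, name, impl):
        self._interfaces[name] = impl

    def checked_cast(self, name):
        """Return the requested interface, or None if it is not offered."""
        return self._interfaces.get(name)


old_server_object = GameObject()
old_server_object.add_interface("InventoryV1", InventoryV1())

new_server_object = GameObject()
new_server_object.add_interface("InventoryV1", InventoryV1())
new_server_object.add_interface("InventoryV2", InventoryV2())


def list_items(obj):
    """A newer client prefers V2 but still works against V1-only objects."""
    v2 = obj.checked_cast("InventoryV2")
    if v2 is not None:
        return [name for name, _ in v2.items_with_weight()]
    return obj.checked_cast("InventoryV1").items()
```

An already-deployed client that only knows `InventoryV1` keeps working against the upgraded object, because the old interface is still there alongside the new one.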

 

Used naively, multiple interfaces can lead to a versioning mess that forces clients to continuously choose the correct version. To avoid these problems, we designed the game such that clients access it via a small number of bootstrap objects for which they choose an interface version. Thereafter, clients acquire handles to other objects via their chosen interfaces on bootstrap objects, so the desired version is known implicitly to the bootstrap object. The Ice protocol provides a mechanism for implicit propagation of contextual information such as versioning, so we need not pollute all our object interfaces by adding an extra version parameter.

 

如果盲目使用,多接口可以导致版本混乱, 使得客户端要不断地选择正确版本。 为避免这些问题,我们这样设计游戏: 让客户端通过几个引导对象访问接口, 客户端只需为引导对象选择一个接口版本。 因此,客户端通过其所选的引导对象接口,来获得其他对象的句柄, 引导对象完全知道需要的版本。 对于比如版本这样的上下文信息, Ice协议提供了一个隐式传播机制, 我们不必为所有对象的接口都添加额外的版本参数。

 

Multiple interfaces reduced development time of the game because, apart from versioning, they allowed us to use loose coupling at the type level between clients and servers. Instead of modifying the definition of an existing interface, we could add new features by adding new interfaces. This reduced the number of dependencies across the system and shielded developers from each others’ changes and the associated compilation avalanches that often ensue.

 

多接口缩短了游戏开发时间,因为,除了版本之外, 多接口还允许我们在客户端和服务器之间的类型级别中使用松耦合。 不修改现有接口的定义,我们就可以通过添加新接口来增加新功能。 这减少了整个系统的依赖数量, 并且屏蔽了开发者相互间的变更, 以及由此带来的相关的编译雪崩(compilation avalanche)。

On the downside, multiple interfaces incur a loss of static type safety because interfaces are selected only at runtime, which makes the system more vulnerable to latent bugs that can escape testing. When used judiciously, however, multiple interfaces are useful in combating the often excessively tight coupling of traditional RPC approaches.

 

多接口的缺点是,它招致了静态类型安全性的损失, 因为接口只在运行时被选择, 这使得系统更容易受到潜伏错误的伤害, 即那些逃过测试的错误。 但是,如果被明智地使用, 相比传统RPC方法中往往过紧的耦合, 多接口更为有用。

 

Ease of Use. Ease of use is an overriding design goal. On the one hand, this means that we keep the runtime APIs as simple and small as possible. For example, 29 lines of specification are sufficient to define the API to the Ice object adapter. Despite this, the object adapter is fully functional and supports flexible object implementations, such as separate servant per object, one-to-many mappings of servants to objects, default servants, servant locators, and evictors. By spending a lot of time on the design, we not only kept the APIs small, but also reaped performance gains as a result of smaller code and working set sizes.

 

易用性。 易用性是设计的首要目标。 一方面,这意味着我们要保持运行库API尽可能的简单,尽可能的小。 比如,29行的说明足以定义Ice对象适配器(object adapter)的API。 尽管如此,对象适配器是功能完善的,并且支持灵活的对象实现, 比如每个对象有单独的servant(服务者)、 servant与对象是一对多关系、 默认servant、servant locator(服务者定位器), 和evictor(逐出器)。 通过长时间的设计,我们不仅使API很小, 而且由于更小的代码和工作集而获得了性能提升。

 

On the other hand, we want language mappings that are simple and intuitive. Limiting ourselves to a small object model paid off here—fewer types mean less generated code and smaller APIs.

 

另一方面,我们想要简单直观的语言映射。 我们把自己限制为小型对象模型,在这里得到了补偿: 更少的类型意味着更少的生成代码和更小的API。

 

The C++ mapping is particularly important: From CORBA, we knew that a poorly designed mapping increases development time and defect count, and we wanted something safer. We settled on a mapping that is small (documented in 40 pages) and provides a high level of convenience and safety. In particular, the mapping is integrated with the C++ standard template library, is fully threadsafe, and requires no memory management. Developers never need to deallocate anything, and exceptions cannot cause memory leaks.

 

C++映射尤其重要: 从CORBA我们知道, 设计不良的映射会增加开发时间和缺陷数, 而我们想要更安全的东西。 我们最终决定的映射很小(文档只有40页), 并且提供了高水平的便利性和安全性。 尤其是,映射集成了C++标准库,是完全线程安全的, 并且不需要内存管理。 开发者不用释放任何东西,并且异常也不会引起内存泄漏。

 

One issue we repeatedly encounter for language mappings is namespace collision. Each language has its own set of keywords, library namespaces, and so on. If the (language-independent) object model uses a name that is reserved in a particular target language, we must map around the resulting collision. Such collisions can be surprisingly subtle, and they confirmed yet again that API design (especially generic API design, such as for a language mapping) is difficult and time consuming. The choice of the trade-off between ease of use and functionality also can be contentious (such as our choice to disallow underscores in object-model identifiers to create a collision-free namespace).

 

语言映射中,我们多次遇到的一个的问题是,名字空间冲突。 每个语言都有它们自己的一套关键字、库名字空间,等等, 如果(语言无关的)对象模型使用了一个特定目标语言中保留的名字, 我们必须重新映射来绕过产生的冲突。 这种冲突会出奇的微妙,并且反复地出现,以致于API设计费时又费力 (特别是通用API设计,比如语言映射)。 在易用性和功能性之间权衡的选择也可能是有争议的 (比如我们选择不允许对象模型的标识符用下划线来产生不冲突的名字空间)。

 

Persistence. To provide object persistence, we extended the object model to permit the definition of persistence attributes for objects. To the developer, making an object persistent consists of defining those attributes that should be stored in the database. A compiler processes these definitions and generates a runtime library that implements associative containers for each type of object.

 

持久化。 为了提供对象持久化,我们扩展了对象模型, 允许为对象定义持久化属性。 对开发者来说,让对象持久化就是定义那些应该存储在数据库中的属性。 编译器处理这些定义,并生成运行时库, 为每种类型的对象实现关联容器。

 

Developers access persistent objects by looking them up in a container by their keys—if an object is not yet in memory, it is transparently loaded from the database. To update objects, developers simply assign to their state attributes. Objects are automatically written to the database by the Ice runtime. (Various policies can be used to control under what circumstances a physical database update takes place.)

 

开发者访问持久对象时,要在容器中用其键值查找它们, 如果对象尚未在内存中,它会透明地从数据库加载。 当更新对象时,开发者只要对它们的状态属性赋值。 Ice运行库会自动地把对象写入数据库 (可以使用各种策略来控制在何种情况下执行物理数据库更新)。
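The access pattern described above, lookup by key with transparent loading and automatic write-back, can be approximated with a dictionary-like container. This is a sketch under simplifying assumptions: a plain dict of pickled bytes stands in for the database, the write policy is immediate write-through, and updates replace the whole record rather than individual attributes. `PersistentMap` is a hypothetical name.

```python
import pickle


class PersistentMap:
    """Dictionary-like container that loads on demand and writes on update.

    `database` is a plain dict of bytes standing in for a real store.
    """

    def __init__(self, database):
        self._db = database
        self._cache = {}
        self.loads = 0                      # counts simulated database reads

    def __getitem__(self, key):
        if key not in self._cache:          # not in memory yet: load it
            self._cache[key] = pickle.loads(self._db[key])
            self.loads += 1
        return self._cache[key]

    def __setitem__(self, key, value):
        self._cache[key] = value
        self._db[key] = pickle.dumps(value)  # write-through policy


database = {"alice": pickle.dumps({"gold": 10, "items": ["potion"]})}
inventories = PersistentMap(database)

record = inventories["alice"]     # first access reads from the database
record = inventories["alice"]     # second access is served from memory
inventories["alice"] = {"gold": 25, "items": ["potion", "scroll"]}
```

Other write policies, such as batching updates or flushing on eviction, would change only `__setitem__`; the code that uses the container stays the same.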

 

This model makes database access completely transparent. For circumstances in which greater control is required, a small API allows developers to establish transaction boundaries and preserve database integrity.

 

这个模型使得数据库访问完全透明。 对于需要更多控制的情况来说, 小型的API允许开发者建立事务的边界并维护数据库的完整性。

 

To allow us to change the game without continuously having to migrate databases to new schemas, we developed a database transformation tool. For simple feature additions, we supply the tool with the old and new object definitions—the tool automatically generates a new database schema and migrates the contents of the old database to conform to the new schema. For more complex changes, such as changing the name of a structure field or changing the key type of a dictionary, the tool creates a default transformation script in XML that a developer can modify to implement the desired migration action.

 

为了允许我们改变游戏,但不必反复地迁移数据库到新的模式, 我们开发了一个数据库转换工具。 对于简单的功能添加,我们只要向工具提供新旧对象的定义, 工具会自动地生成一个新的数据库模式, 并迁移旧库的内容以符合新的模式。 对更复杂的变更,比如改变结构的字段名, 或者改变字典键值的类型, 工具会创建一个默认的XML转换脚本, 开发者可以修改它以实现所需的迁移动作。
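In spirit, the transformation step compares the old and new definitions, carries over every field that is still valid, fills in defaults for new fields, and drops removed ones; renames need a hint from the developer. The sketch below models definitions as dicts of default values and is not the actual tool or its XML script format; `migrate` and its parameters are hypothetical.

```python
def migrate(records, old_fields, new_fields, renames=None):
    """Convert records from the old definition to the new one.

    old_fields / new_fields map field name -> default value.
    renames maps old field name -> new field name (the part a developer
    would supply by editing the generated script).
    """
    renames = renames or {}
    migrated = []
    for rec in records:
        out = dict(new_fields)                  # start from new defaults
        for name in old_fields:
            target = renames.get(name, name)
            if target in new_fields and name in rec:
                out[target] = rec[name]         # carry over still-valid data
        migrated.append(out)
    return migrated


old_def = {"name": "", "hp": 0, "legacy_flag": False}
new_def = {"name": "", "health": 0, "mana": 50}
old_db = [{"name": "alice", "hp": 80, "legacy_flag": True}]

new_db = migrate(old_db, old_def, new_def, renames={"hp": "health"})
```

Without the `renames` hint the tool could not know that `hp` and `health` are the same field, which is why such changes need a script that a developer can edit.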

 

This tool has been useful, although we keep thinking of new features that could be incorporated. As always, the difficulty is in knowing when to stop: The temptation to build better tools can easily detract from the overall project goals. (“Inside every big program is a little program struggling to get out.”)

 

这个工具很有用,尽管我们不断地想到还可以纳入的新功能。 就像通常一样,困难在于知道何时停止: 建立更好的工具,这个诱惑很容易降低整个项目的目标。 (“Inside every big program is a little program struggling to get out.”, “在每个大程序中,都有一个小程序挣扎着要出来。”)

 

Threading. We built a portable threading API that provides developers with platform-independent threading and locking primitives. For remote call dispatch, we decided to support only a leader/followers threading model [4]. In some situations, in which a blocking or reactive model would be better suited, this decision cost us a little in performance, but it gained us a simpler runtime and APIs and reduced the potential for deadlock in nested RPCs.

 

多线程。 我们建立了一个可移植的线程API, 给开发者提供了平台无关的线程和锁原语。 对于远程调用派发,我们决定仅支持leader/follower(领导者/跟随者)线程模型[4]。 在阻塞或者反应模式更合适的情况下, 这个决定使我们损失了一点性能, 但是它让我们得到了更简单的运行库和API, 并且减少了嵌套RPC调用中死锁的可能性。
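A much-reduced sketch of the leader/followers idea: exactly one thread at a time (the leader) waits on the event source; once it has a request it gives up leadership so a follower can take over, and then processes the request itself. A queue stands in for the network here, and the structure is an illustration of the pattern, not Ice's dispatch code.

```python
import queue
import threading

requests = queue.Queue()          # stands in for the network event source
leader_lock = threading.Lock()    # whoever holds it is the current leader
results = []
results_lock = threading.Lock()
STOP = object()                   # sentinel that tells a thread to exit


def worker():
    while True:
        with leader_lock:         # become leader; followers block here
            request = requests.get()   # only the leader waits for events
        # Leadership is released before processing, so a follower can take
        # over as leader while this thread handles the request.
        if request is STOP:
            return
        with results_lock:
            results.append(request * request)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for n in range(10):
    requests.put(n)
for _ in threads:
    requests.put(STOP)            # one sentinel per thread
for t in threads:
    t.join()
```

The request is processed by the same thread that received it, so no hand-off between threads is needed once a request has been read.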

 

Scalability. Ice permits redundant implementations of objects in different servers. The runtime automatically binds to one of an object’s replicas and, if a replica becomes unavailable, fails over to another replica. The binding information for replicas is kept in configuration and is dynamically acquired at runtime, so adding a redundant server requires only a configuration update, not changes in source code. This allows us to take down a game server for a software upgrade without having to kick all players using that server out of the game. The same mechanism also provides fault tolerance in case of hardware failure.

 

扩展性。 Ice允许在不同的服务器上冗余的对象实现。 运行库会自动绑定到一个对象副本(replica), 并且,如果副本不可用时, 会自动切换(fail over)到另一个副本。 副本的绑定信息保存在配置中,并在运行时动态获得, 所以添加冗余服务器只需更新配置,不用修改源码。 这允许我们卸下游戏服务器做软件升级, 而不必把所有使用这个服务器的玩家都踢出游戏。 同样的机制还提供了硬件故障时的容错能力。
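Failover across replicas can be sketched as a proxy that holds a list of replicas and moves on to the next one when a call fails. In Ice the replica list comes from configuration and binding happens inside the runtime; the `Replica` and `ReplicatedProxy` classes below are hypothetical stand-ins for that machinery.

```python
class Replica:
    def __init__(self, name, available=True):
        self.name = name
        self.available = available

    def invoke(self, request):
        if not self.available:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class ReplicatedProxy:
    """Tries each configured replica in turn until one answers."""

    def __init__(self, replicas):
        self.replicas = list(replicas)   # would come from configuration

    def invoke(self, request):
        last_error = None
        for replica in self.replicas:
            try:
                return replica.invoke(request)
            except ConnectionError as err:
                last_error = err         # fail over to the next replica
        raise ConnectionError("no replica available") from last_error


primary = Replica("game-1")
backup = Replica("game-2")
proxy = ReplicatedProxy([primary, backup])

before = proxy.invoke("move")
primary.available = False            # take the primary down for an upgrade
during = proxy.invoke("move")
```

The caller uses the same `proxy.invoke` call before and during the upgrade; only the replica that answers changes.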

 

To support federating logical functions across a number of servers and to share load, we built an implementation repository that delivers binding information to clients at runtime. A randomizing algorithm distributes load across any number of servers that form a logical service.

 

为了支持在多台服务器上联盟(federate)逻辑函数, 以分担负载, 我们建立了一个实现库(implementation repository), 它会在运行时传送绑定信息到客户端。 一个随机算法会分配负载到任意数量的服务器,让它们组成一个逻辑服务。
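Randomized load distribution needs very little machinery, which is part of its appeal. The sketch assumes a lookup service that maps a logical service name to a list of endpoints and picks one at random for each new binding; `Locator` and the endpoint strings are hypothetical.

```python
import random
from collections import Counter


class Locator:
    """Hypothetical stand-in for a repository of binding information."""

    def __init__(self, services, seed=None):
        self.services = services              # logical name -> endpoints
        self._rng = random.Random(seed)

    def resolve(self, logical_name):
        """Pick one endpoint at random for each new binding."""
        return self._rng.choice(self.services[logical_name])


locator = Locator({"pathfinding": ["host-a:10000", "host-b:10000",
                                   "host-c:10000"]}, seed=1)
load = Counter(locator.resolve("pathfinding") for _ in range(3000))
```

Over many bindings the counts come out roughly even, but nothing in this scheme reacts to how busy each server actually is; that is the load-feedback trade-off the article goes on to discuss.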

 

We made a number of trade-offs for replication and load sharing. For example, not all game components can be upgraded without server shutdown, and a load feedback mechanism would provide better load sharing than simple randomization. Given our requirements, these limitations are acceptable, but, for applications with more stringent requirements, this might not be the case. The skill is in deciding when not to build something as much as when to build it—infrastructure makes no sense if the cost of developing it exceeds the savings during its use.

 

我们为复制和负载共享作出了若干权衡。 比如,不是所有的游戏组件都能在线升级, 以及负载反馈机制将比简单的随机化提供更优的负载共享。 鉴于我们的需求,这些限制是可以接受的, 但是,对于需求更严格的应用,可能并非如此。 技巧在于决定何时不要建立某个东西,其重要性等同于决定何时要建立它: 如果基础设施的开发成本超过使用它所节省的成本,这就毫无意义了。

SIMPLE IS BETTER

简单就好

Our experiences with Ice during game development have been very positive. Despite running a distributed system that involves dozens of servers and thousands of clients, the middleware has not been a performance bottleneck.

 

在游戏开发中,我们对Ice的感觉是非常肯定的。 虽然运行的是包含几十台服务器和数千客户端的分布式系统, 但中间件并不是性能瓶颈。

 

Our focus on simplicity during design paid off many times during development. When it comes to middleware, simpler is better: A well-chosen and small feature set contributes to timely development, as well as to meeting performance goals.

 

在设计中对简单性的专注,让我们在开发中赢得了数倍的回报。 对于中间件,简单就好: 一个精心挑选并且小型化的功能集,有利于按时开发,以及达到性能目标。

 

Finally, designing and implementing middleware is difficult and costly, even with many years of experience. If you are looking for middleware, chances are that you will be better off buying it than building it.

 

最后,设计和实现中间件是困难和昂贵的,即使对于有多年经验的老手。 如果你正在寻找中间件,很有可能你最好是购买一个而不是构建一个。

REFERENCES

参考

1. Mutable Realms (Wish home page): see http://www.mutablerealms.com .

2. Henning, M., and S. Vinoski. Advanced CORBA Programming with C++. Addison-Wesley, Reading, MA, 1999.

3. ZeroC. Distributed Programming with Ice: see http://www.zeroc.com/Ice-Manual.pdf .

4. Schmidt, D. C., O’Ryan, C., Pyarali, I., Kircher, M., and Buschmann, F. Leader/Followers: A design pattern for efficient multithreaded event demultiplexing and dispatching. Proceedings of the 7th Pattern Languages of Programs Conference (PLoP 2000); http://deuce.doc.wustl.edu/doc/pspdfs/lf.pdf .

 

MICHI HENNING is chief scientist of ZeroC. From 1995 to 2002, he worked on CORBA as a member of the Object Management Group’s Architecture Board and as an ORB implementer, consultant, and trainer. With Steve Vinoski, he wrote Advanced CORBA Programming with C++ (Addison-Wesley, 1999), the definitive text in the field. Since joining ZeroC, he has worked on the design and implementation of Ice and in 2003 coauthored “Distributed Programming with Ice” for ZeroC. He holds an honors degree in computer science from the University of Queensland, Australia.

 

Originally published in Queue vol. 1, no. 10