https://developers.google.com/protocol-buffers/docs/overview
Developer Guide
Welcome to the developer documentation for protocol buffers: a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.
This documentation is aimed at Java, C++, or Python developers who want to use protocol buffers in their applications. This overview introduces protocol buffers and tells you what you need to do to get started. You can then go on to follow the tutorials or delve deeper into protocol buffer encoding. API reference documentation is also provided for all three languages, as well as language and style guides for writing .proto files.
What are protocol buffers?
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data: think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams, using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.
How do they work?
You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Here's a very basic example of a .proto file that defines a message containing information about a person:
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}
As you can see, the message format is simple: each message type has one or more uniquely numbered fields, and each field has a name and a value type. Value types can be numbers (integer or floating-point), booleans, strings, raw bytes, or even (as in the example above) other protocol buffer message types, allowing you to structure your data hierarchically. You can specify optional fields, required fields, and repeated fields. You can find more information about writing .proto files in the Protocol Buffer Language Guide.
Once you've defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes. These provide simple accessors for each field (like name() and set_name()) as well as methods to serialize/parse the whole structure to/from raw bytes. For instance, if your chosen language is C++, running the compiler on the above example will generate a class called Person. You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages. You might then write some code like this:
Person person;
person.set_name("John Doe");
person.set_id(1234);
person.set_email("jdoe@example.com");
fstream output("myfile", ios::out | ios::binary);
person.SerializeToOstream(&output);
Then, later on, you could read your message back in:
fstream input("myfile", ios::in | ios::binary);
Person person;
person.ParseFromIstream(&input);
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;
You can add new fields to your message formats without breaking backwards-compatibility; old binaries simply ignore the new field when parsing. So if you have a communications protocol that uses protocol buffers as its data format, you can extend your protocol without having to worry about breaking existing code.
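The behaviour described above, where an old binary ignores fields it does not know, can be illustrated with a small hand-written reader. This is only a sketch under stated assumptions, not the generated protocol buffer code: `OldPerson`, `ReadVarint`, and `ParseOldPerson` are hypothetical names, only the varint and length-delimited wire types are handled, and unknown fields are merely counted and skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical view of Person as seen by an "old" binary that only
// knows about fields 1 (name) and 2 (id).
struct OldPerson {
  std::string name;        // field 1
  uint64_t id = 0;         // field 2
  int skipped_fields = 0;  // fields this binary did not recognise
};

// Reads a base-128 varint starting at `pos` and advances `pos` past it.
// Returns false if the input is truncated or longer than 10 bytes.
bool ReadVarint(const std::string& in, size_t& pos, uint64_t& value) {
  value = 0;
  for (int shift = 0; shift < 70; shift += 7) {
    if (pos >= in.size()) return false;
    const uint8_t byte = static_cast<uint8_t>(in[pos++]);
    value |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;
  }
  return false;
}

// Parses the fields it knows and steps over everything else.
bool ParseOldPerson(const std::string& in, OldPerson& person) {
  size_t pos = 0;
  while (pos < in.size()) {
    uint64_t tag = 0;
    if (!ReadVarint(in, pos, tag)) return false;
    const uint64_t field = tag >> 3;     // upper bits: field number
    const uint64_t wire_type = tag & 7;  // lower 3 bits: wire type
    if (wire_type == 0) {                // varint payload
      uint64_t v = 0;
      if (!ReadVarint(in, pos, v)) return false;
      if (field == 2) person.id = v; else ++person.skipped_fields;
    } else if (wire_type == 2) {         // length-delimited payload
      uint64_t len = 0;
      if (!ReadVarint(in, pos, len) || len > in.size() - pos) return false;
      if (field == 1) person.name = in.substr(pos, len);
      else ++person.skipped_fields;
      pos += len;
    } else {
      return false;  // other wire types are left out of this sketch
    }
    assert(pos <= in.size());  // the reader never runs past the buffer
  }
  return true;
}
```

Because every field carries its own number and wire type, the reader can tell how many bytes to step over without knowing what the field means. That is what lets a newer writer add fields without breaking an older reader.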
You'll find a complete reference for using generated protocol buffer code in the API Reference section, and you can find out more about how protocol buffer messages are encoded in Protocol Buffer Encoding.
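As a small taste of the encoding, here is a hedged sketch of how an integer field such as `required int32 id = 2` could be written by hand using base-128 varints. The helper names (`AppendVarint`, `ReadVarint`, `EncodeIdField`) are made up for illustration and are not part of the generated API; negative values are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `value` as a base-128 varint: 7 payload bits per byte,
// least-significant group first, high bit set on all bytes but the last.
void AppendVarint(uint64_t value, std::vector<uint8_t>& out) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Reads a varint starting at `pos` and advances `pos` past it.
// Returns false if the input is truncated or longer than 10 bytes.
bool ReadVarint(const std::vector<uint8_t>& in, size_t& pos, uint64_t& value) {
  value = 0;
  for (int shift = 0; shift < 70; shift += 7) {
    if (pos >= in.size()) return false;
    const uint8_t byte = in[pos++];
    value |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;
  }
  return false;
}

// Encodes field number 2 with a varint payload: a tag
// (field number << 3 | wire type 0) followed by the value itself.
std::vector<uint8_t> EncodeIdField(int32_t id) {
  assert(id >= 0);  // negative int32 values are not covered by this sketch
  std::vector<uint8_t> out;
  AppendVarint((2u << 3) | 0u, out);
  AppendVarint(static_cast<uint64_t>(id), out);
  return out;
}
```

With these helpers, `EncodeIdField(1234)` yields the three bytes `10 D2 09`: one tag byte and a two-byte varint.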
Why not just use XML?
Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:
· are simpler
· are 3 to 10 times smaller
· are 20 to 100 times faster
· are less ambiguous
· generate data access classes that are easier to use programmatically
For example, let's say you want to model a person with a name and an email. In XML, you need to do:
<person>
  <name>John Doe</name>
  <email>jdoe@example.com</email>
</person>
while the corresponding protocol buffer message (in protocol buffer text format) is:
# Textual representation of a protocol buffer.
# This is *not* the binary format used on the wire.
person {
  name: "John Doe"
  email: "jdoe@example.com"
}
When this message is encoded to the protocol buffer binary format (the text format above is just a convenient human-readable representation for debugging and editing), it would probably be 28 bytes long and take around 100-200 nanoseconds to parse. The XML version is at least 69 bytes if you remove whitespace, and would take around 5,000-10,000 nanoseconds to parse.
Also, manipulating a protocol buffer is much easier:
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;
Whereas with XML you would have to do something like:
cout << "Name: "
     << person.getElementsByTagName("name")->item(0)->innerText()
     << endl;
cout << "E-mail: "
     << person.getElementsByTagName("email")->item(0)->innerText()
     << endl;
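The 28-byte figure quoted earlier can be checked by hand. The sketch below assumes the example person has the 8-character name "John Doe" and a 16-character address such as jdoe@example.com (an assumed placeholder), and that each string field costs one tag byte plus one length byte. `AppendStringField` and `EncodePerson` are hypothetical helpers written for this illustration, not generated protocol buffer code.

```cpp
#include <cassert>
#include <string>

// Appends one length-delimited string field: tag byte, length byte, payload.
// Assumes field_number < 16 and value.size() < 128, so that the tag and the
// length each fit in a single varint byte.
void AppendStringField(int field_number, const std::string& value,
                       std::string& out) {
  assert(field_number > 0 && field_number < 16);
  assert(value.size() < 128);
  out.push_back(static_cast<char>((field_number << 3) | 2));  // wire type 2
  out.push_back(static_cast<char>(value.size()));
  out += value;
}

// Encodes only the two fields used in the XML comparison.
std::string EncodePerson(const std::string& name, const std::string& email) {
  std::string out;
  AppendStringField(1, name, out);   // string name = 1
  AppendStringField(3, email, out);  // string email = 3
  return out;
}
```

Under those assumptions the total is (1 + 1 + 8) + (1 + 1 + 16) = 28 bytes, so `EncodePerson("John Doe", "jdoe@example.com").size()` comes out at 28.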
However, protocol buffers are not always a better solution than XML. For instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also, to some extent, self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).
Sounds like the solution for me! How do I get started?
Download the package. It contains the complete source code for the Java, Python, and C++ protocol buffer compilers, as well as the classes you need for I/O and testing. To build and install your compiler, follow the instructions in the README.
Once you're all set, try following the tutorial for your chosen language. It will step you through creating a simple application that uses protocol buffers.
A bit of history
Protocol buffers were initially developed at Google to deal with an index server request/response protocol. Prior to protocol buffers, there was a format for requests and responses that used hand marshalling/unmarshalling of requests and responses, and that supported a number of versions of the protocol. This resulted in some very ugly code, like:
if (version == 3) {
  ...
} else if (version > 4) {
  if (version == 5) {
    ...
  }
  ...
}
Explicitly formatted protocols also complicated the rollout of new protocol versions, because developers had to make sure that all servers between the originator of the request and the actual server handling the request understood the new protocol before they could flip a switch to start using the new protocol.
Protocol buffers were designed to solve many of these problems:
· New fields could be easily introduced, and intermediate servers that didn't need to inspect the data could simply parse it and pass through the data without needing to know about all the fields.
· Formats were more self-describing, and could be dealt with from a variety of languages (C++, Java, etc.)
However, users still needed to hand-write their own parsing code.
As the system evolved, it acquired a number of other features and uses:
· Automatically-generated serialization and deserialization code avoided the need for hand parsing.
· In addition to being used for short-lived RPC (Remote Procedure Call) requests, people started to use protocol buffers as a handy self-describing format for storing data persistently (for example, in Bigtable).
· Server RPC interfaces started to be declared as part of protocol files, with the protocol compiler generating stub classes that users could override with actual implementations of the server's interface.
Protocol buffers are now Google's lingua franca for data. At time of writing, there are 48,162 different message types defined in the Google code tree across 12,183 .proto files. They're used both in RPC systems and for persistent storage of data in a variety of storage systems.