Developer Guide
Welcome to the developer documentation for protocol buffers – a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.
This documentation is aimed at Java, C++, or Python developers who want to use protocol buffers in their applications. This overview introduces protocol buffers and tells you what you need to do to get started – you can then go on to follow the tutorials or delve deeper into protocol buffer encoding. API reference documentation is also provided for all three languages, as well as language and style guides for writing .proto files.
What are protocol buffers?
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.
How do they work?
You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Here's a very basic example of a .proto file that defines a message containing information about a person:
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}
As you can see, the message format is simple – each message type has one or more uniquely numbered fields, and each field has a name and a value type, where value types can be numbers (integer or floating-point), booleans, strings, raw bytes, or even (as in the example above) other protocol buffer message types, allowing you to structure your data hierarchically. You can specify optional fields, required fields, and repeated fields. You can find more information about writing .proto files in the Protocol Buffer Language Guide.
Once you've defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes. These provide simple accessors for each field (like name() and set_name()) as well as methods to serialize/parse the whole structure to/from raw bytes – so, for instance, if your chosen language is C++, running the compiler on the above example will generate a class called Person. You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages. You might then write some code like this:
Person person;
person.set_name("John Doe");
person.set_id(1234);
person.set_email("jdoe@example.com");
fstream output("myfile", ios::out | ios::binary);
person.SerializeToOstream(&output);
Then, later on, you could read your message back in:
fstream input("myfile", ios::in | ios::binary);
Person person;
person.ParseFromIstream(&input);
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;
You can add new fields to your message formats without breaking backwards-compatibility; old binaries simply ignore the new field when parsing. So if you have a communications protocol that uses protocol buffers as its data format, you can extend your protocol without having to worry about breaking existing code.
You'll find a complete reference for using generated protocol buffer code in the API Reference section, and you can find out more about how protocol buffer messages are encoded in Protocol Buffer Encoding.
Why not just use XML?
Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

- are simpler
- are 3 to 10 times smaller
- are 20 to 100 times faster
- are less ambiguous
- generate data access classes that are easier to use programmatically

For example, let's say you want to model a person with a name and an email. In XML, you need to do:

<person>
  <name>John Doe</name>
  <email>jdoe@example.com</email>
</person>
while the corresponding protocol buffer message (in protocol buffer text format) is:
# Textual representation of a protocol buffer.
# This is *not* the binary format used on the wire.
person {
  name: "John Doe"
  email: "jdoe@example.com"
}
When this message is encoded to the protocol buffer binary format (the text format above is just a convenient human-readable representation for debugging and editing), it would probably be 28 bytes long and take around 100-200 nanoseconds to parse. The XML version is at least 69 bytes if you remove whitespace, and would take around 5,000-10,000 nanoseconds to parse.
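The 28-byte figure is easy to check by hand-encoding the two string fields in the wire format. This is a sketch, assuming the name "John Doe" and a 16-character email address such as jdoe@example.com; field numbers 1 and 3 match the Person message above:

```python
# Sketch: hand-encode Person{name, email} in the protobuf wire format.
# Each string field is: one tag byte ((field_number << 3) | wire_type 2),
# a varint length, then the UTF-8 bytes.

def encode_string_field(field_number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    tag = (field_number << 3) | 2       # wire type 2 = length-delimited
    assert len(data) < 128              # single-byte varint length suffices here
    return bytes([tag, len(data)]) + data

# name = 1, email = 3, as declared in the Person message above.
encoded = encode_string_field(1, "John Doe") + encode_string_field(3, "jdoe@example.com")
print(len(encoded))  # 28 bytes: 2 + 8 for the name, 2 + 16 for the email
```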
Also, manipulating a protocol buffer is much easier:
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;
Whereas with XML you would have to do something like:
cout << "Name: "
<< person.getElementsByTagName("name")->item(0)->innerText()
<< endl;
cout << "E-mail: "
<< person.getElementsByTagName("email")->item(0)->innerText()
<< endl;
However, protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).
Sounds like the solution for me! How do I get started?
Download the package – this contains the complete source code for the Java, Python, and C++ protocol buffer compilers, as well as the classes you need for I/O and testing. To build and install your compiler, follow the instructions in the README.
Once you're all set, try following the tutorial for your chosen language – this will step you through creating a simple application that uses protocol buffers.
Introducing proto3
Our most recent version 3 release introduces a new language version – Protocol Buffers language version 3 (aka proto3) – as well as some new features in our existing language version (aka proto2). Proto3 simplifies the protocol buffer language, both for ease of use and to make it available in a wider range of programming languages: our current release lets you generate protocol buffer code in Java, C++, Python, Java Lite, Ruby, JavaScript, Objective-C, and C#. In addition you can generate proto3 code for Go using the latest Go protoc plugin, available from the golang/protobuf GitHub repository. More languages are in the pipeline.
Note that the two language version APIs are not completely compatible. To avoid inconvenience to existing users, we will continue to support the previous language version in new protocol buffers releases.
You can see the major differences from the current default version in the release notes and learn about proto3 syntax in the Proto3 Language Guide. Full documentation for proto3 is coming soon!
(If the names proto2 and proto3 seem a little confusing, it's because when we originally open-sourced protocol buffers it was actually Google's second version of the language – also known as proto2. This is also why our open source version number started from v2.0.0).
A bit of history
Protocol buffers were initially developed at Google to deal with an index server request/response protocol. Prior to protocol buffers, there was a format for requests and responses that used hand marshalling/unmarshalling of requests and responses, and that supported a number of versions of the protocol. This resulted in some very ugly code, like:
if (version == 3) {
...
} else if (version > 4) {
if (version == 5) {
...
}
...
}
Explicitly formatted protocols also complicated the rollout of new protocol versions, because developers had to make sure that all servers between the originator of the request and the actual server handling the request understood the new protocol before they could flip a switch to start using the new protocol.
Protocol buffers were designed to solve many of these problems: new fields could be easily introduced, and intermediate servers that didn't need to inspect the data could simply parse it and pass through the data without needing to know about all the fields.
However, users still needed to hand-write their own parsing code.
As the system evolved, it acquired a number of other features and uses:

- Automatic generation of serialization and deserialization code avoided the need for hand parsing.
- In addition to being used for short-lived RPC requests, people started to use protocol buffers as a handy self-describing format for storing data persistently (for example, in Bigtable).
- Server RPC interfaces started to be declared as part of protocol files, with the protocol compiler generating stub classes that users could override with actual implementations of the server's interface.
Protocol buffers are now Google's lingua franca for data – at time of writing, there are 306,747 different message types defined in the Google code tree across 348,952 .proto files. They're used both in RPC systems and for persistent storage of data in a variety of storage systems.
Language Guide (proto3)

(Source: https://developers.google.com/protocol-buffers/docs/proto3)
This guide describes how to use the protocol buffer language to structure your protocol buffer data, including .proto file syntax and how to generate data access classes from your .proto files. It covers the proto3 version of the protocol buffers language: for information on the proto2 syntax, see the Proto2 Language Guide.
This is a reference guide – for a step by step example that uses many of the features described in this document, see the tutorial for your chosen language (currently proto2 only; more proto3 documentation is coming soon).
First let's look at a very simple example. Let's say you want to define a search request message format, where each search request has a query string, the particular page of results you are interested in, and a number of results per page. Here's the .proto file you use to define the message type.
syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}
- The first line of the file specifies that you're using proto3 syntax: if you don't do this the protocol buffer compiler will assume you are using proto2. This must be the first non-empty, non-comment line of the file.
- The SearchRequest message definition specifies three fields (name/value pairs), one for each piece of data that you want to include in this type of message. Each field has a name and a type.

In the above example, all the fields are scalar types: two integers (page_number and result_per_page) and a string (query). However, you can also specify composite types for your fields, including enumerations and other message types.
As you can see, each field in the message definition has a unique number. These field numbers are used to identify your fields in the message binary format, and should not be changed once your message type is in use. Note that field numbers in the range 1 through 15 take one byte to encode, including the field number and the field's type (you can find out more about this in Protocol Buffer Encoding). Field numbers in the range 16 through 2047 take two bytes. So you should reserve the numbers 1 through 15 for very frequently occurring message elements. Remember to leave some room for frequently occurring elements that might be added in the future.
The smallest field number you can specify is 1, and the largest is 2^29 - 1, or 536,870,911. You also cannot use the numbers 19000 through 19999 (FieldDescriptor::kFirstReservedNumber through FieldDescriptor::kLastReservedNumber), as they are reserved for the Protocol Buffers implementation – the protocol buffer compiler will complain if you use one of these reserved numbers in your .proto. Similarly, you cannot use any previously reserved field numbers.
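The one-byte/two-byte boundary comes from the tag encoding: a field's tag is the varint of (field_number << 3) | wire_type, so the field number shares its first byte with the 3-bit wire type. A minimal sketch (the wire-type value 0 here is just an example):

```python
# Sketch: compute how many bytes a field's tag takes on the wire.

def varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def tag_size(field_number: int, wire_type: int = 0) -> int:
    return len(varint((field_number << 3) | wire_type))

print(tag_size(15))    # 1  (15 << 3 = 120 fits in 7 bits)
print(tag_size(16))    # 2  (16 << 3 = 128 needs a second varint byte)
print(tag_size(2047))  # 2
print(tag_size(2048))  # 3
```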
Message fields can be one of the following:

- singular: a well-formed message can have zero or one of this field (but not more than one). This is the default field rule for proto3 syntax.
- repeated: this field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values will be preserved.

In proto3, repeated fields of scalar numeric types use packed encoding by default. You can find out more about packed encoding in Protocol Buffer Encoding.
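The size win from packed encoding can be sketched in a few lines. Field number 4 and the sample values below are illustrative, not from the document:

```python
# Sketch: packed vs. unpacked encoding of a repeated int32 field.
# Packed (proto3 default): one length-delimited record holding all values.
# Unpacked: one tag byte per value.

def varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

FIELD = 4                      # hypothetical repeated int32 field number
values = [3, 270, 86942]

payload = b"".join(varint(v) for v in values)
packed = bytes([(FIELD << 3) | 2]) + varint(len(payload)) + payload   # wire type 2
unpacked = b"".join(bytes([(FIELD << 3) | 0]) + varint(v) for v in values)

print(len(packed), len(unpacked))  # packed saves one tag byte per extra element
```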
Multiple message types can be defined in a single .proto file. This is useful if you are defining multiple related messages – so, for example, if you wanted to define the reply message format that corresponds to your SearchRequest message type, you could add it to the same .proto:
message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

message SearchResponse {
 ...
}
To add comments to your .proto files, use C/C++-style // and /* ... */ syntax.
/* SearchRequest represents a search query, with pagination options to
 * indicate which results to include in the response. */

message SearchRequest {
  string query = 1;
  int32 page_number = 2;  // Which page number do we want?
  int32 result_per_page = 3;  // Number of results to return per page.
}
If you update a message type by entirely removing a field, or commenting it out, future users can reuse the field number when making their own updates to the type. This can cause severe issues if they later load old versions of the same .proto, including data corruption, privacy bugs, and so on. One way to make sure this doesn't happen is to specify that the field numbers (and/or names, which can also cause issues for JSON serialization) of your deleted fields are reserved. The protocol buffer compiler will complain if any future users try to use these field identifiers.
message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}
Note that you can't mix field names and field numbers in the same reserved statement.
What's Generated From Your .proto?

When you run the protocol buffer compiler on a .proto, the compiler generates the code in your chosen language you'll need to work with the message types you've described in the file, including getting and setting field values, serializing your messages to an output stream, and parsing your messages from an input stream.
- For C++, the compiler generates a .h and .cc file from each .proto, with a class for each message type described in your file.
- For Java, the compiler generates a .java file with a class for each message type, as well as special Builder classes for creating message class instances.
- Python is a little different – the Python compiler generates a module with a static descriptor of each message type in your .proto, which is then used with a metaclass to create the necessary Python data access class at runtime.
- For Go, the compiler generates a .pb.go file with a type for each message type in your file.
- For Ruby, the compiler generates a .rb file with a Ruby module containing your message types.
- For Objective-C, the compiler generates a pbobjc.h and pbobjc.m file from each .proto, with a class for each message type described in your file.
- For C#, the compiler generates a .cs file from each .proto, with a class for each message type described in your file.
- For Dart, the compiler generates a .pb.dart file with a class for each message type in your file.

You can find out more about using the APIs for each language by following the tutorial for your chosen language (proto3 versions coming soon). For even more API details, see the relevant API reference (proto3 versions also coming soon).
A scalar message field can have one of the following types – the table shows the type specified in the .proto file, and the corresponding type in the automatically generated class:
| .proto Type | Notes | C++ Type | Java Type | Python Type[2] | Go Type | Ruby Type | C# Type | PHP Type | Dart Type |
|---|---|---|---|---|---|---|---|---|---|
| double | | double | double | float | float64 | Float | double | float | double |
| float | | float | float | float | float32 | Float | float | float | double |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long[3] | int64 | Bignum | long | integer/string[5] | Int64 |
| uint32 | Uses variable-length encoding. | uint32 | int[1] | int/long[3] | uint32 | Fixnum or Bignum (as required) | uint | integer | int |
| uint64 | Uses variable-length encoding. | uint64 | long[1] | int/long[3] | uint64 | Bignum | ulong | integer/string[5] | Int64 |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long[3] | int64 | Bignum | long | integer/string[5] | Int64 |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int[1] | int/long[3] | uint32 | Fixnum or Bignum (as required) | uint | integer | int |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long[1] | int/long[3] | uint64 | Bignum | ulong | integer/string[5] | Int64 |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int |
| sfixed64 | Always eight bytes. | int64 | long | int/long[3] | int64 | Bignum | long | integer/string[5] | Int64 |
| bool | | bool | boolean | bool | bool | TrueClass/FalseClass | bool | boolean | bool |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32. | string | String | str/unicode[4] | string | String (UTF-8) | string | string | String |
| bytes | May contain any arbitrary sequence of bytes no longer than 2^32. | string | ByteString | str | []byte | String (ASCII-8BIT) | ByteString | string | List&lt;int&gt; |
You can find out more about how these types are encoded when you serialize your message in Protocol Buffer Encoding.
[1] In Java, unsigned 32-bit and 64-bit integers are represented using their signed counterparts, with the top bit simply being stored in the sign bit.
[2] In all cases, setting values to a field will perform type checking to make sure it is valid.
[3] 64-bit or unsigned 32-bit integers are always represented as long when decoded, but can be an int if an int is given when setting the field. In all cases, the value must fit in the type represented when set. See [2].
[4] Python strings are represented as unicode on decode but can be str if an ASCII string is given (this is subject to change).
[5] Integer is used on 64-bit machines and string is used on 32-bit machines.
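The sint32/sint64 notes in the table can be made concrete with a small sketch of ZigZag encoding. This mirrors the documented wire format; the helper names are mine:

```python
# Sketch: why sint32 encodes negative numbers more compactly than int32.
# int32 varint-encodes the value sign-extended to 64 bits, so -1 costs
# 10 bytes; sint32 first ZigZag-maps it to a small unsigned number.

def varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def int32_size(v: int) -> int:
    return len(varint(v & 0xFFFFFFFFFFFFFFFF))  # sign-extend to 64 bits

def zigzag32(v: int) -> int:
    # Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
    return ((v << 1) ^ (v >> 31)) & 0xFFFFFFFF

print(int32_size(-1))              # 10 bytes as int32
print(len(varint(zigzag32(-1))))   # 1 byte as sint32 (ZigZag maps -1 to 1)
```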
When a message is parsed, if the encoded message does not contain a particular singular element, the corresponding field in the parsed object is set to the default value for that field. These defaults are type-specific:

- For strings, the default value is the empty string.
- For bytes, the default value is empty bytes.
- For bools, the default value is false.
- For numeric types, the default value is zero.
- For enums, the default value is the first defined enum value, which must be 0.
- For message fields, the field is not set. Its exact value is language-dependent.

The default value for repeated fields is empty (generally an empty list in the appropriate language).
Note that for scalar message fields, once a message is parsed there's no way of telling whether a field was explicitly set to the default value (for example whether a boolean was set to false
) or just not set at all: you should bear this in mind when defining your message types. For example, don't have a boolean that switches on some behaviour when set to false
if you don't want that behaviour to also happen by default. Also note that if a scalar message field is set to its default, the value will not be serialized on the wire.
See the generated code guide for your chosen language for more details about how defaults work in generated code.
When you're defining a message type, you might want one of its fields to only have one of a pre-defined list of values. For example, let's say you want to add a corpus field for each SearchRequest, where the corpus can be UNIVERSAL, WEB, IMAGES, LOCAL, NEWS, PRODUCTS or VIDEO. You can do this very simply by adding an enum to your message definition with a constant for each possible value.
In the following example we've added an enum called Corpus with all the possible values, and a field of type Corpus:
message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
  enum Corpus {
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
  }
  Corpus corpus = 4;
}
As you can see, the Corpus enum's first constant maps to zero: every enum definition must contain a constant that maps to zero as its first element. This is because:

- There must be a zero value, so that we can use 0 as a numeric default value.
- The zero value needs to be the first element, for compatibility with the proto2 semantics where the first enum value is always the default.
You can define aliases by assigning the same value to different enum constants. To do this you need to set the allow_alias option to true, otherwise the protocol compiler will generate an error message when aliases are found.
message MyMessage1 {
  enum EnumAllowingAlias {
    option allow_alias = true;
    UNKNOWN = 0;
    STARTED = 1;
    RUNNING = 1;
  }
}

message MyMessage2 {
  enum EnumNotAllowingAlias {
    UNKNOWN = 0;
    STARTED = 1;
    // RUNNING = 1;  // Uncommenting this line will cause a compile error inside Google and a warning message outside.
  }
}
Enumerator constants must be in the range of a 32-bit integer. Since enum values use varint encoding on the wire, negative values are inefficient and thus not recommended. You can define enums within a message definition, as in the above example, or outside – these enums can be reused in any message definition in your .proto file. You can also use an enum type declared in one message as the type of a field in a different message, using the syntax _MessageType_._EnumType_.
When you run the protocol buffer compiler on a .proto that uses an enum, the generated code will have a corresponding enum for Java or C++, or a special EnumDescriptor class for Python that's used to create a set of symbolic constants with integer values in the runtime-generated class.
Caution: the generated code may be subject to language-specific limitations on the number of enumerators (low thousands for one language). Please review the limitations for the languages you plan to use.
During deserialization, unrecognized enum values will be preserved in the message, though how this is represented when the message is deserialized is language-dependent. In languages that support open enum types with values outside the range of specified symbols, such as C++ and Go, the unknown enum value is simply stored as its underlying integer representation. In languages with closed enum types such as Java, a case in the enum is used to represent an unrecognized value, and the underlying integer can be accessed with special accessors. In either case, if the message is serialized the unrecognized value will still be serialized with the message.
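The open-enum behavior described above can be modeled in a few lines. This is an illustrative sketch, not the protobuf API; the Corpus values come from the earlier example:

```python
# Sketch: modeling "open" enum parsing. An unrecognized number is kept
# as its raw integer, so re-serializing the message preserves it.

CORPUS = {0: "UNIVERSAL", 1: "WEB", 2: "IMAGES", 3: "LOCAL",
          4: "NEWS", 5: "PRODUCTS", 6: "VIDEO"}

def parse_corpus(wire_value: int):
    # Known values resolve to a symbol; unknown ones are preserved as-is.
    return CORPUS.get(wire_value, wire_value)

def serialize_corpus(value):
    # A preserved unknown integer round-trips unchanged.
    if isinstance(value, int):
        return value
    return {name: num for num, name in CORPUS.items()}[value]

print(parse_corpus(2))                     # "IMAGES"
print(parse_corpus(42))                    # 42  (unknown, preserved)
print(serialize_corpus(parse_corpus(42)))  # 42  (survives a round trip)
```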
For more information about how to work with message enums in your applications, see the generated code guide for your chosen language.
If you update an enum type by entirely removing an enum entry, or commenting it out, future users can reuse the numeric value when making their own updates to the type. This can cause severe issues if they later load old versions of the same .proto, including data corruption, privacy bugs, and so on. One way to make sure this doesn't happen is to specify that the numeric values (and/or names, which can also cause issues for JSON serialization) of your deleted entries are reserved. The protocol buffer compiler will complain if any future users try to use these identifiers. You can specify that your reserved numeric value range goes up to the maximum possible value using the max keyword.
enum Foo {
  reserved 2, 15, 9 to 11, 40 to max;
  reserved "FOO", "BAR";
}
Note that you can't mix field names and numeric values in the same reserved statement.
You can use other message types as field types. For example, let's say you wanted to include Result messages in each SearchResponse message – to do this, you can define a Result message type in the same .proto and then specify a field of type Result in SearchResponse:
message SearchResponse {
  repeated Result results = 1;
}

message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}
In the above example, the Result message type is defined in the same file as SearchResponse – what if the message type you want to use as a field type is already defined in another .proto file?

You can use definitions from other .proto files by importing them. To import another .proto's definitions, you add an import statement to the top of your file:
import "myproject/other_protos.proto";
By default you can only use definitions from directly imported .proto files. However, sometimes you may need to move a .proto file to a new location. Instead of moving the .proto file directly and updating all the call sites in a single change, now you can put a dummy .proto file in the old location to forward all the imports to the new location using the import public notion. import public dependencies can be transitively relied upon by anyone importing the proto containing the import public statement. For example:
// new.proto
// All definitions are moved here

// old.proto
// This is the proto that all clients are importing.
import public "new.proto";
import "other.proto";

// client.proto
import "old.proto";
// You use definitions from old.proto and new.proto, but not other.proto
The protocol compiler searches for imported files in a set of directories specified on the protocol compiler command line using the -I/--proto_path flag. If no flag was given, it looks in the directory in which the compiler was invoked. In general you should set the --proto_path flag to the root of your project and use fully qualified names for all imports.
It's possible to import proto2 message types and use them in your proto3 messages, and vice versa. However, proto2 enums cannot be used directly in proto3 syntax (it's okay if an imported proto2 message uses them).
You can define and use message types inside other message types, as in the following example – here the Result message is defined inside the SearchResponse message:
message SearchResponse {
  message Result {
    string url = 1;
    string title = 2;
    repeated string snippets = 3;
  }
  repeated Result results = 1;
}
If you want to reuse this message type outside its parent message type, you refer to it as _Parent_._Type_:
message SomeOtherMessage {
  SearchResponse.Result result = 1;
}
You can nest messages as deeply as you like:
message Outer {       // Level 0
  message MiddleAA {  // Level 1
    message Inner {   // Level 2
      int64 ival = 1;
      bool booly = 2;
    }
  }
  message MiddleBB {  // Level 1
    message Inner {   // Level 2
      int32 ival = 1;
      bool booly = 2;
    }
  }
}
If an existing message type no longer meets all your needs – for example, you'd like the message format to have an extra field – but you'd still like to use code created with the old format, don't worry! It's very simple to update message types without breaking any of your existing code. Just remember the following rules:

- Don't change the field numbers for any existing fields.
- If you remove a field, make sure the field number is not used again in your updated message type. You may want to rename the field instead, or make the field number reserved, so that future users of your .proto can't accidentally reuse the number.
- int32, uint32, int64, uint64, and bool are all compatible – this means you can change a field from one of these types to another without breaking forwards- or backwards-compatibility. If a number is parsed from the wire which doesn't fit in the corresponding type, you will get the same effect as if you had cast the number to that type in C++ (e.g. if a 64-bit number is read as an int32, it will be truncated to 32 bits).
- sint32 and sint64 are compatible with each other but are not compatible with the other integer types.
- string and bytes are compatible as long as the bytes are valid UTF-8.
- Embedded messages are compatible with bytes if the bytes contain an encoded version of the message.
- fixed32 is compatible with sfixed32, and fixed64 with sfixed64.
- For string, bytes, and message fields, optional is compatible with repeated. Given serialized data of a repeated field as input, clients that expect this field to be optional will take the last input value if it's a primitive type field or merge all input elements if it's a message type field. Note that this is not generally safe for numeric types, including bools and enums. Repeated fields of numeric types can be serialized in the packed format, which will not be parsed correctly when an optional field is expected.
- enum is compatible with int32, uint32, int64, and uint64 in terms of wire format (note that values will be truncated if they don't fit). However be aware that client code may treat them differently when the message is deserialized: for example, unrecognized proto3 enum types will be preserved in the message, but how this is represented when the message is deserialized is language-dependent. Int fields always just preserve their value.
- Changing a single value into a member of a new oneof is safe and binary compatible. Moving multiple fields into a new oneof may be safe if you are sure that no code sets more than one at a time. Moving any fields into an existing oneof is not safe.

Unknown fields are well-formed protocol buffer serialized data representing fields that the parser does not recognize. For example, when an old binary parses data sent by a new binary with new fields, those new fields become unknown fields in the old binary.
Originally, proto3 messages always discarded unknown fields during parsing, but in version 3.5 we reintroduced the preservation of unknown fields to match the proto2 behavior. In versions 3.5 and later, unknown fields are retained during parsing and included in the serialized output.
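A minimal sketch of how a parser can retain unknown fields follows. It is illustrative only, handling just varint and length-delimited wire types; real implementations are more involved:

```python
# Sketch: a parser splits a message into (field_number, wire_type, raw_bytes)
# records; fields it doesn't recognize are kept verbatim so they can be
# re-emitted on serialization.

def varint_decode(buf: bytes, i: int):
    shift = n = 0
    while True:
        b = buf[i]; i += 1
        n |= (b & 0x7F) << shift
        if not b & 0x80:
            return n, i
        shift += 7

def split_fields(buf: bytes):
    i, fields = 0, []
    while i < len(buf):
        start = i
        tag, i = varint_decode(buf, i)
        wire_type = tag & 7
        if wire_type == 0:                    # varint value
            _, i = varint_decode(buf, i)
        elif wire_type == 2:                  # length-delimited value
            length, i = varint_decode(buf, i)
            i += length
        else:
            raise ValueError("wire type not handled in this sketch")
        fields.append((tag >> 3, wire_type, buf[start:i]))
    return fields

# A message with known field 1 (varint 42) and an "unknown" field 99 (varint 7).
msg = b"\x08\x2a" + b"\x98\x06\x07"
known = {1}
records = split_fields(msg)
unknown = b"".join(raw for num, _, raw in records if num not in known)
print(unknown)  # the bytes for field 99 survive re-serialization
```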
The Any message type lets you use messages as embedded types without having their .proto definition. An Any contains an arbitrary serialized message as bytes, along with a URL that acts as a globally unique identifier for and resolves to that message's type. To use the Any type, you need to import google/protobuf/any.proto.
import "google/protobuf/any.proto";

message ErrorStatus {
  string message = 1;
  repeated google.protobuf.Any details = 2;
}
The default type URL for a given message type is type.googleapis.com/_packagename_._messagename_.
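Building that default type URL is a one-liner. This is a sketch; the package and message names below are illustrative:

```python
# Sketch: compose the default Any type URL from a message's
# package and name.

def default_type_url(package: str, message_name: str) -> str:
    return f"type.googleapis.com/{package}.{message_name}"

print(default_type_url("google.profile", "Person"))
# type.googleapis.com/google.profile.Person
```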
Different language implementations will support runtime library helpers to pack and unpack Any values in a typesafe manner – for example, in Java, the Any type will have special pack() and unpack() accessors, while in C++ there are PackFrom() and UnpackTo() methods:
// Storing an arbitrary message type in Any.
NetworkErrorDetails details = ...;
ErrorStatus status;
status.add_details()->PackFrom(details);

// Reading an arbitrary message from Any.
ErrorStatus status = ...;
for (const Any& detail : status.details()) {
  if (detail.Is<NetworkErrorDetails>()) {
    NetworkErrorDetails network_error;
    detail.UnpackTo(&network_error);
    ... processing network_error ...
  }
}
Currently the runtime libraries for working with Any types are under development.
If you are already familiar with proto2 syntax, the Any type replaces extensions.
If you have a message with many fields and where at most one field will be set at the same time, you can enforce this behavior and save memory by using the oneof feature.
Oneof fields are like regular fields except all the fields in a oneof share memory, and at most one field can be set at the same time. Setting any member of the oneof automatically clears all the other members. You can check which value in a oneof is set (if any) using a special case() or WhichOneof() method, depending on your chosen language.
To define a oneof in your .proto you use the oneof keyword followed by your oneof name, in this case test_oneof:
message SampleMessage {
  oneof test_oneof {
    string name = 4;
    SubMessage sub_message = 9;
  }
}
You then add your oneof fields to the oneof definition. You can add fields of any type, except map fields and repeated fields.
In your generated code, oneof fields have the same getters and setters as regular fields. You also get a special method for checking which value (if any) in the oneof is set. You can find out more about the oneof API for your chosen language in the relevant API reference.
SampleMessage message;
message.set_name("name");
CHECK(message.has_name());
message.mutable_sub_message();  // Will clear name field.
CHECK(!message.has_name());
- A oneof cannot be repeated.
- If you're using C++, make sure your code doesn't cause memory crashes. The following sample code will crash because sub_message was already deleted by calling the set_name() method.

SampleMessage message;
SubMessage* sub_message = message.mutable_sub_message();
message.set_name("name");  // Will delete sub_message
sub_message->set_...       // Crashes here

- Again in C++, if you Swap() two messages with oneofs, each message will end up with the other's oneof case: in the example below, msg1 will have a sub_message and msg2 will have a name.

SampleMessage msg1;
msg1.set_name("name");
SampleMessage msg2;
msg2.mutable_sub_message();
msg1.swap(&msg2);
CHECK(msg1.has_sub_message());
CHECK(msg2.has_name());
Be careful when adding or removing oneof fields. If checking the value of a oneof returns None/NOT_SET, it could mean that the oneof has not been set or it has been set to a field in a different version of the oneof. There is no way to tell the difference, since there's no way to know if an unknown field on the wire is a member of the oneof.
Tag Reuse Issues

Move fields into or out of a oneof: You may lose some of your information (some fields will be cleared) after the message is serialized and parsed.

Delete a oneof field and add it back: This may clear your currently set oneof field after the message is serialized and parsed.

Split or merge oneof: This has similar issues to moving regular fields.
Maps

If you want to create an associative map as part of your data definition, protocol buffers provides a handy shortcut syntax:

map<key_type, value_type> map_field = N;

...where the key_type can be any integral or string type (so, any scalar type except for floating point types and bytes). Note that enum is not a valid key_type. The value_type can be any type except another map.

So, for example, if you wanted to create a map of projects where each Project message is associated with a string key, you could define it like this:

map<string, Project> projects = 3;
Map fields cannot be repeated. When generating text format for a .proto, maps are sorted by key. Numeric keys are sorted numerically.

The generated map API is currently available for all proto3 supported languages. You can find out more about the map API for your chosen language in the relevant API reference.
The map syntax is equivalent to the following on the wire, so protocol buffers implementations that do not support maps can still handle your data:
message MapFieldEntry {
  key_type key = 1;
  value_type value = 2;
}

repeated MapFieldEntry map_field = N;
Any protocol buffers implementation that supports maps must both produce and accept data that can be accepted by the above definition.
Packages

You can add an optional package specifier to a .proto file to prevent name clashes between protocol message types.

package foo.bar;

message Open { ... }

You can then use the package specifier when defining fields of your message type:

message Foo {
  ...
  foo.bar.Open open = 1;
  ...
}
The way a package specifier affects the generated code depends on your chosen language:

- In C++, the generated classes are wrapped inside a C++ namespace. For example, Open would be in the namespace foo::bar.
- In Java, the package is used as the Java package, unless you explicitly provide an option java_package in your .proto file.
- In Python, the package directive is ignored, since Python modules are organized according to their location in the file system.
- In Go, the package is used as the Go package name, unless you explicitly provide an option go_package in your .proto file.
- In Ruby, the generated classes are wrapped inside nested Ruby namespaces, converted to the required Ruby capitalization style (first letter capitalized; if the first character is not a letter, PB_ is prepended). For example, Open would be in the namespace Foo::Bar.
- In C#, the package is used as the namespace after converting to PascalCase, unless you explicitly provide an option csharp_namespace in your .proto file. For example, Open would be in the namespace Foo.Bar.

Type name resolution in the protocol buffer language works like C++: first the innermost scope is searched, then the next-innermost, and so on, with each package considered to be "inner" to its parent package. A leading '.' (for example, .foo.bar.Baz) means to start from the outermost scope instead.
The protocol buffer compiler resolves all type names by parsing the imported .proto files. The code generator for each language knows how to refer to each type in that language, even if it has different scoping rules.
Defining Services

If you want to use your message types with an RPC (Remote Procedure Call) system, you can define an RPC service interface in a .proto file and the protocol buffer compiler will generate service interface code and stubs in your chosen language. So, for example, if you want to define an RPC service with a method that takes your SearchRequest and returns a SearchResponse, you can define it in your .proto file as follows:

service SearchService {
  rpc Search (SearchRequest) returns (SearchResponse);
}
The most straightforward RPC system to use with protocol buffers is gRPC: a language- and platform-neutral open source RPC system developed at Google. gRPC works particularly well with protocol buffers and lets you generate the relevant RPC code directly from your .proto files using a special protocol buffer compiler plugin.
If you don't want to use gRPC, it's also possible to use protocol buffers with your own RPC implementation. You can find out more about this in the Proto2 Language Guide.
There are also a number of ongoing third-party projects to develop RPC implementations for Protocol Buffers. For a list of links to projects we know about, see the third-party add-ons wiki page.
JSON Mapping

Proto3 supports a canonical encoding in JSON, making it easier to share data between systems. The encoding is described on a type-by-type basis in the table below.
If a value is missing in the JSON-encoded data or if its value is null, it will be interpreted as the appropriate default value when parsed into a protocol buffer. If a field has the default value in the protocol buffer, it will be omitted in the JSON-encoded data by default to save space. An implementation may provide options to emit fields with default values in the JSON-encoded output.
| proto3 | JSON | JSON example | Notes |
|---|---|---|---|
| message | object | `{"fooBar": v, "g": null, …}` | Generates JSON objects. Message field names are mapped to lowerCamelCase and become JSON object keys. If the `json_name` field option is specified, the specified value will be used as the key instead. Parsers accept both the lowerCamelCase name (or the one specified by the `json_name` option) and the original proto field name. `null` is an accepted value for all field types and treated as the default value of the corresponding field type. |
| enum | string | `"FOO_BAR"` | The name of the enum value as specified in proto is used. Parsers accept both enum names and integer values. |
| map<K,V> | object | `{"k": v, …}` | All keys are converted to strings. |
| repeated V | array | `[v, …]` | `null` is accepted as the empty list `[]`. |
| bool | true, false | `true, false` | |
| string | string | `"Hello World!"` | |
| bytes | base64 string | `"YWJjMTIzIT8kKiYoKSctPUB+"` | JSON value will be the data encoded as a string using standard base64 encoding with paddings. Either standard or URL-safe base64 encoding with/without paddings are accepted. |
| int32, fixed32, uint32 | number | `1, -10, 0` | JSON value will be a decimal number. Either numbers or strings are accepted. |
| int64, fixed64, uint64 | string | `"1", "-10"` | JSON value will be a decimal string. Either numbers or strings are accepted. |
| float, double | number | `1.1, -10.0, 0, "NaN", "Infinity"` | JSON value will be a number or one of the special string values "NaN", "Infinity", and "-Infinity". Either numbers or strings are accepted. Exponent notation is also accepted. |
| Any | object | `{"@type": "url", "f": v, … }` | If the Any contains a value that has a special JSON mapping, it will be converted as follows: `{"@type": xxx, "value": yyy}`. Otherwise, the value will be converted into a JSON object, and the `"@type"` field will be inserted to indicate the actual data type. |
| Timestamp | string | `"1972-01-01T10:00:20.021Z"` | Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. |
| Duration | string | `"1.000340012s", "1s"` | Generated output always contains 0, 3, 6, or 9 fractional digits, depending on required precision, followed by the suffix "s". Accepted are any fractional digits (also none) as long as they fit into nano-seconds precision and the suffix "s" is required. |
| Struct | object | `{ … }` | Any JSON object. See `struct.proto`. |
| Wrapper types | various types | `2, "2", "foo", true, "true", null, 0, …` | Wrappers use the same representation in JSON as the wrapped primitive type, except that `null` is allowed and preserved during data conversion and transfer. |
| FieldMask | string | `"f.fooBar,h"` | See `field_mask.proto`. |
| ListValue | array | `[foo, bar, …]` | |
| Value | value | Any JSON value | |
| NullValue | null | JSON null | |
| Empty | object | `{}` | An empty JSON object |
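The lowerCamelCase field-name mapping used for JSON object keys (see the message row in the table above) can be sketched in a few lines of Python. This is an illustrative helper only; real implementations live inside the protobuf JSON libraries:

```python
def json_name(proto_field_name: str) -> str:
    """Map a proto field name like 'foo_bar' to its default JSON key
    'fooBar': drop each underscore and capitalize the letter after it."""
    first, *rest = proto_field_name.split('_')
    return first + ''.join(part.capitalize() for part in rest)

assert json_name('foo_bar') == 'fooBar'
assert json_name('song_name') == 'songName'
assert json_name('id') == 'id'  # names without underscores are unchanged
```

Note that parsers accept both forms on input, so this mapping only needs to be applied when printing JSON.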
A proto3 JSON implementation may provide the following options:

- Emit fields with default values: Fields with default values are omitted by default in proto3 JSON output. An implementation may provide an option to override this behavior and output fields with their default values.
- Ignore unknown fields: The proto3 JSON parser should reject unknown fields by default, but may provide an option to ignore unknown fields when parsing.
- Use the proto field name instead of the lowerCamelCase name: By default, the proto3 JSON printer converts the field name to lowerCamelCase and uses that as the JSON name. An implementation may provide an option to use the proto field name as the JSON name instead.
- Emit enum values as integers instead of strings: The name of an enum value is used by default in JSON output. An option may be provided to use the numeric value of the enum value instead.
Options

Individual declarations in a .proto file can be annotated with a number of options. Options do not change the overall meaning of a declaration, but may affect the way it is handled in a particular context. The complete list of available options is defined in google/protobuf/descriptor.proto.
Some options are file-level options, meaning they should be written at the top-level scope, not inside any message, enum, or service definition. Some options are message-level options, meaning they should be written inside message definitions. Some options are field-level options, meaning they should be written inside field definitions. Options can also be written on enum types, enum values, oneof fields, service types, and service methods; however, no useful options currently exist for any of these.
Here are a few of the most commonly used options:
- java_package (file option): The package you want to use for your generated Java classes. If no explicit java_package option is given in the .proto file, then by default the proto package (specified using the "package" keyword in the .proto file) will be used. However, proto packages generally do not make good Java packages, since proto packages are not expected to start with reverse domain names. If not generating Java code, this option has no effect.

  option java_package = "com.example.foo";
- java_multiple_files (file option): Causes top-level messages, enums, and services to be defined at the package level, rather than inside an outer class named after the .proto file.

  option java_multiple_files = true;
- java_outer_classname (file option): The class name for the outermost Java class (and hence the file name) you want to generate. If no explicit java_outer_classname is specified in the .proto file, the class name will be constructed by converting the .proto file name to camel-case (so foo_bar.proto becomes FooBar.java). If not generating Java code, this option has no effect.

  option java_outer_classname = "Ponycopter";
- optimize_for (file option): Can be set to SPEED, CODE_SIZE, or LITE_RUNTIME. This affects the C++ and Java code generators (and possibly third-party generators) in the following ways:

  - SPEED (default): The protocol buffer compiler will generate code for serializing, parsing, and performing other common operations on your message types. This code is highly optimized.
  - CODE_SIZE: The protocol buffer compiler will generate minimal classes and will rely on shared, reflection-based code to implement serialization, parsing, and various other operations. The generated code will thus be much smaller than with SPEED, but operations will be slower. Classes will still implement exactly the same public API as they do in SPEED mode. This mode is most useful in apps that contain a very large number of .proto files and do not need all of them to be blindingly fast.
  - LITE_RUNTIME: The protocol buffer compiler will generate classes that depend only on the "lite" runtime library (libprotobuf-lite instead of libprotobuf). The lite runtime is much smaller than the full library (around an order of magnitude smaller) but omits certain features like descriptors and reflection. This is particularly useful for apps running on constrained platforms like mobile phones. The compiler will still generate fast implementations of all methods as it does in SPEED mode. Generated classes will only implement the MessageLite interface in each language, which provides only a subset of the methods of the full Message interface.

  option optimize_for = CODE_SIZE;
- cc_enable_arenas (file option): Enables arena allocation for C++ generated code.

- objc_class_prefix (file option): Sets the Objective-C class prefix which is prepended to all Objective-C generated classes and enums from this .proto. There is no default. You should use prefixes that are between 3-5 uppercase characters, as recommended by Apple. Note that all 2 letter prefixes are reserved by Apple.

- deprecated (field option): If set to true, indicates that the field is deprecated and should not be used by new code. In most languages this has no actual effect. In Java, this becomes a @Deprecated annotation. In the future, other language-specific code generators may generate deprecation annotations on the field's accessors, which will in turn cause a warning to be emitted when compiling code which attempts to use the field. If the field is not used by anyone and you want to prevent new users from using it, consider replacing the field declaration with a reserved statement.

  int32 old_field = 6 [deprecated = true];
Protocol Buffers also allows you to define and use your own options. This is an advanced feature which most people don't need. If you do think you need to create your own options, see the Proto2 Language Guide for details. Note that creating custom options uses extensions, which are permitted only for custom options in proto3.
Generating Your Classes

To generate the Java, Python, C++, Go, Ruby, Objective-C, or C# code you need to work with the message types defined in a .proto file, you need to run the protocol buffer compiler protoc on the .proto file. If you haven't installed the compiler, download the package and follow the instructions in the README. For Go, you also need to install a special code generator plugin for the compiler: you can find this and installation instructions in the golang/protobuf repository on GitHub.
The Protocol Compiler is invoked as follows:
protoc --proto_path=IMPORT_PATH --cpp_out=DST_DIR --java_out=DST_DIR --python_out=DST_DIR --go_out=DST_DIR --ruby_out=DST_DIR --objc_out=DST_DIR --csharp_out=DST_DIR path/to/file.proto
- IMPORT_PATH specifies a directory in which to look for .proto files when resolving import directives. If omitted, the current directory is used. Multiple import directories can be specified by passing the --proto_path option multiple times; they will be searched in order. -I=IMPORT_PATH can be used as a short form of --proto_path.
- --cpp_out generates C++ code in DST_DIR. See the C++ generated code reference for more.
- --java_out generates Java code in DST_DIR. See the Java generated code reference for more.
- --python_out generates Python code in DST_DIR. See the Python generated code reference for more.
- --go_out generates Go code in DST_DIR. See the Go generated code reference for more.
- --ruby_out generates Ruby code in DST_DIR. Ruby generated code reference is coming soon!
- --objc_out generates Objective-C code in DST_DIR. See the Objective-C generated code reference for more.
- --csharp_out generates C# code in DST_DIR. See the C# generated code reference for more.
- --php_out generates PHP code in DST_DIR. See the PHP generated code reference for more.

As an extra convenience, if the DST_DIR ends in .zip or .jar, the compiler will write the output to a single ZIP-format archive file with the given name. .jar outputs will also be given a manifest file as required by the Java JAR specification. Note that if the output archive already exists, it will be overwritten; the compiler is not smart enough to add files to an existing archive.

You must provide one or more .proto files as input. Multiple .proto files can be specified at once. Although the files are named relative to the current directory, each file must reside in one of the IMPORT_PATHs so that the compiler can determine its canonical name.

Style Guide
This document provides a style guide for .proto files. By following these conventions, you'll make your protocol buffer message definitions and their corresponding classes consistent and easy to read.
Note that protocol buffer style has evolved over time, so it is likely that you will see .proto files written in different conventions or styles. Please respect the existing style when you modify these files. Consistency is key. However, it is best to adopt the current best style when you are creating a new .proto file.
Standard file formatting
File structure
Files should be named lower_snake_case.proto
All files should be ordered in the following manner:

1. License header (if applicable)
2. File overview
3. Syntax
4. Package
5. Imports (sorted)
6. File options
7. Everything else
Packages
Package name should be in lowercase, and should correspond to the directory hierarchy. e.g., if a file is in my/package/, then the package name should be my.package.
Message and field names
Use CamelCase (with an initial capital) for message names – for example, SongServerRequest. Use underscore_separated_names for field names (including oneof field and extension names) – for example, song_name.
message SongServerRequest {
required string song_name = 1;
}
Using this naming convention for field names gives you accessors like the following:
C++:
const string& song_name() { ... }
void set_song_name(const string& x) { ... }
Java:
public String getSongName() { ... }
public Builder setSongName(String v) { ... }
If your field name contains a number, the number should appear after the letter instead of after the underscore. e.g., use song_name1 instead of song_name_1.
Repeated fields
Use pluralized names for repeated fields.
repeated string keys = 1;
...
repeated MyMessage accounts = 17;
Enums
Use CamelCase (with an initial capital) for enum type names and CAPITALS_WITH_UNDERSCORES for value names:
enum Foo {
FOO_UNSPECIFIED = 0;
FOO_FIRST_VALUE = 1;
FOO_SECOND_VALUE = 2;
}
Each enum value should end with a semicolon, not a comma. Prefer prefixing enum values instead of surrounding them in an enclosing message. The zero value enum should have the suffix UNSPECIFIED.
Services
If your .proto defines an RPC service, you should use CamelCase (with an initial capital) for both the service name and any RPC method names:
service FooService {
rpc GetSomething(FooRequest) returns (FooResponse);
}
Things to avoid

- Required fields (only for proto2)
- Groups (only for proto2)

Encoding

This document describes the binary wire format for protocol buffer messages. You don't need to understand this to use protocol buffers in your applications, but it can be very useful to know how different protocol buffer formats affect the size of your encoded messages.
Let's say you have the following very simple message definition:
message Test1 {
  optional int32 a = 1;
}
In an application, you create a Test1 message and set a to 150. You then serialize the message to an output stream. If you were able to examine the encoded message, you'd see three bytes:
08 96 01
So far, so small and numeric – but what does it mean? Read on...
To understand your simple protocol buffer encoding, you first need to understand varints. Varints are a method of serializing integers using one or more bytes. Smaller numbers take a smaller number of bytes.
Each byte in a varint, except the last byte, has the most significant bit (msb) set – this indicates that there are further bytes to come. The lower 7 bits of each byte are used to store the two's complement representation of the number in groups of 7 bits, least significant group first.
So, for example, here is the number 1 – it's a single byte, so the msb is not set:
0000 0001
And here is 300 – this is a bit more complicated:
1010 1100 0000 0010
How do you figure out that this is 300? First you drop the msb from each byte, as this is just there to tell us whether we've reached the end of the number (as you can see, it's set in the first byte as there is more than one byte in the varint):
1010 1100 0000 0010 → 010 1100 000 0010
You reverse the two groups of 7 bits because, as you remember, varints store numbers with the least significant group first. Then you concatenate them to get your final value:
000 0010 010 1100 → 000 0010 ++ 010 1100 → 100101100 → 256 + 32 + 8 + 4 = 300
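The drop-the-msb, reverse-the-groups, concatenate procedure above can be sketched as a small Python helper. This is purely illustrative and not part of any protobuf library:

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a varint: 7 bits per byte,
    least significant group first, msb set on every byte but the last."""
    out = bytearray()
    while True:
        bits = n & 0x7F
        n >>= 7
        if n:
            out.append(bits | 0x80)  # more bytes follow
        else:
            out.append(bits)
            return bytes(out)

def decode_varint(data: bytes) -> int:
    """Decode a varint back into an integer."""
    result = 0
    for shift, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * shift)
        if not byte & 0x80:  # msb clear: last byte
            break
    return result

assert encode_varint(1) == b'\x01'            # single byte, msb not set
assert encode_varint(300) == b'\xac\x02'      # 1010 1100 0000 0010
assert decode_varint(b'\xac\x02') == 300
```

The two-byte encoding of 300 matches the worked example above: 0xAC 0x02 is exactly 1010 1100 0000 0010.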
As you know, a protocol buffer message is a series of key-value pairs. The binary version of a message just uses the field's number as the key – the name and declared type for each field can only be determined on the decoding end by referencing the message type's definition (i.e. the .proto file).

When a message is encoded, the keys and values are concatenated into a byte stream. When the message is being decoded, the parser needs to be able to skip fields that it doesn't recognize. This way, new fields can be added to a message without breaking old programs that do not know about them. To this end, the "key" for each pair in a wire-format message is actually two values – the field number from your .proto file, plus a wire type that provides just enough information to find the length of the following value. In most language implementations this key is referred to as a tag.
The available wire types are as follows:
| Type | Meaning | Used For |
|---|---|---|
| 0 | Varint | int32, int64, uint32, uint64, sint32, sint64, bool, enum |
| 1 | 64-bit | fixed64, sfixed64, double |
| 2 | Length-delimited | string, bytes, embedded messages, packed repeated fields |
| 3 | Start group | groups (deprecated) |
| 4 | End group | groups (deprecated) |
| 5 | 32-bit | fixed32, sfixed32, float |
Each key in the streamed message is a varint with the value (field_number << 3) | wire_type
– in other words, the last three bits of the number store the wire type.
Now let's look at our simple example again. You now know that the first number in the stream is always a varint key, and here it's 08, or (dropping the msb):
000 1000
You take the last three bits to get the wire type (0) and then right-shift by three to get the field number (1). So you now know that the field number is 1 and the following value is a varint. Using your varint-decoding knowledge from the previous section, you can see that the next two bytes store the value 150.
96 01 = 1001 0110 0000 0001 → 000 0001 ++ 001 0110 (drop the msb and reverse the groups of 7 bits) → 10010110 → 128 + 16 + 4 + 2 = 150
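Putting the pieces together, the whole three-byte stream can be decoded with a couple of hypothetical helpers (illustrative Python, not a protobuf API):

```python
def decode_key(key_byte: int):
    """Split a one-byte varint key into (field_number, wire_type):
    the wire type lives in the low three bits."""
    return key_byte >> 3, key_byte & 0x07

def decode_varint(data: bytes) -> int:
    """Decode a varint: 7 bits per byte, least significant group first."""
    result = 0
    for shift, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * shift)
        if not byte & 0x80:
            break
    return result

encoded = bytes([0x08, 0x96, 0x01])          # Test1 with a = 150
field_number, wire_type = decode_key(encoded[0])
assert (field_number, wire_type) == (1, 0)   # field 1, varint
assert decode_varint(encoded[1:]) == 150     # 96 01 decodes to 150
```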
As you saw in the previous section, all the protocol buffer types associated with wire type 0 are encoded as varints. However, there is an important difference between the signed int types (sint32 and sint64) and the "standard" int types (int32 and int64) when it comes to encoding negative numbers. If you use int32 or int64 as the type for a negative number, the resulting varint is always ten bytes long – it is, effectively, treated like a very large unsigned integer. If you use one of the signed types, the resulting varint uses ZigZag encoding, which is much more efficient.
ZigZag encoding maps signed integers to unsigned integers so that numbers with a small absolute value (for instance, -1) have a small varint encoded value too. It does this in a way that "zig-zags" back and forth through the positive and negative integers, so that -1 is encoded as 1, 1 is encoded as 2, -2 is encoded as 3, and so on, as you can see in the following table:
| Signed Original | Encoded As |
|---|---|
| 0 | 0 |
| -1 | 1 |
| 1 | 2 |
| -2 | 3 |
| 2147483647 | 4294967294 |
| -2147483648 | 4294967295 |
In other words, each value n is encoded using (n << 1) ^ (n >> 31) for sint32s, or (n << 1) ^ (n >> 63) for the 64-bit version.
Note that the second shift – the (n >> 31) part – is an arithmetic shift. So, in other words, the result of the shift is either a number that is all zero bits (if n is positive) or all one bits (if n is negative).
When the sint32 or sint64 is parsed, its value is decoded back to the original, signed version.
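The ZigZag mapping and its inverse can be sketched in Python, and the table values above fall out directly. This is an illustrative helper, not part of any protobuf library; note that Python's >> is already an arithmetic shift, so the formula matches the definition above:

```python
def zigzag32(n: int) -> int:
    """ZigZag-encode a signed 32-bit integer via (n << 1) ^ (n >> 31)."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF

def unzigzag32(z: int) -> int:
    """Invert the mapping: even codes are non-negative, odd codes negative."""
    return (z >> 1) ^ -(z & 1)

assert zigzag32(0) == 0
assert zigzag32(-1) == 1
assert zigzag32(1) == 2
assert zigzag32(-2) == 3
assert zigzag32(2147483647) == 4294967294
assert zigzag32(-2147483648) == 4294967295
assert unzigzag32(4294967295) == -2147483648
```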
Non-varint numeric types are simple – double and fixed64 have wire type 1, which tells the parser to expect a fixed 64-bit lump of data; similarly float and fixed32 have wire type 5, which tells it to expect 32 bits. In both cases the values are stored in little-endian byte order.
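Python's standard struct module can illustrate this little-endian fixed-width layout (a demonstration only, not protobuf code; '<' selects little-endian byte order):

```python
import struct

# fixed32/float occupy exactly 4 bytes (wire type 5);
# fixed64/double occupy exactly 8 bytes (wire type 1).
assert struct.pack('<I', 1) == b'\x01\x00\x00\x00'           # fixed32: LSB first
assert struct.pack('<Q', 1) == b'\x01' + b'\x00' * 7          # fixed64
assert struct.unpack('<d', struct.pack('<d', 1.5))[0] == 1.5  # double round-trip
```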
A wire type of 2 (length-delimited) means that the value is a varint encoded length followed by the specified number of bytes of data.
message Test2 {
  optional string b = 2;
}
Setting the value of b to "testing" gives you:
12 07 74 65 73 74 69 6e 67
The final seven bytes are the UTF-8 encoding of "testing". The key here is 0x12 → 0001 0010 → 00010 010 → field_number = 2, wire_type = 2. The length varint in the value is 7, and lo and behold, we find seven bytes following it – our string.
Here's a message definition with an embedded message of our example type, Test1:
message Test3 {
  optional Test1 c = 3;
}
And here's the encoded version, again with the Test1's a field set to 150:
1a 03 08 96 01
As you can see, the last three bytes are exactly the same as our first example (08 96 01), and they're preceded by the number 3 – embedded messages are treated in exactly the same way as strings (wire type = 2).
If a proto2 message definition has repeated elements (without the [packed=true] option), the encoded message has zero or more key-value pairs with the same field number. These repeated values do not have to appear consecutively; they may be interleaved with other fields. The order of the elements with respect to each other is preserved when parsing, though the ordering with respect to other fields is lost. In proto3, repeated fields use packed encoding, which you can read about below.
For any non-repeated fields in proto3, or optional fields in proto2, the encoded message may or may not have a key-value pair with that field number.
Normally, an encoded message would never have more than one instance of a non-repeated field. However, parsers are expected to handle the case in which they do. For numeric types and strings, if the same field appears multiple times, the parser accepts the last value it sees. For embedded message fields, the parser merges multiple instances of the same field, as if with the Message::MergeFrom method – that is, all singular scalar fields in the latter instance replace those in the former, singular embedded messages are merged, and repeated fields are concatenated. The effect of these rules is that parsing the concatenation of two encoded messages produces exactly the same result as if you had parsed the two messages separately and merged the resulting objects. That is, this:
MyMessage message;
message.ParseFromString(str1 + str2);
is equivalent to this:
MyMessage message, message2;
message.ParseFromString(str1);
message2.ParseFromString(str2);
message.MergeFrom(message2);
This property is occasionally useful, as it allows you to merge two messages even if you do not know their types.
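As a rough illustration of the merge rules only, here is a toy version using plain Python dicts rather than real protocol buffer messages, so the details intentionally differ from the actual generated API:

```python
def merge(former: dict, latter: dict) -> dict:
    """Toy model of protobuf merge semantics: scalars from the latter
    message replace the former, embedded messages (modeled as dicts)
    are merged recursively, repeated fields (lists) are concatenated."""
    result = dict(former)
    for key, value in latter.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)      # embedded message
        elif isinstance(value, list) and isinstance(result.get(key), list):
            result[key] = result[key] + value            # repeated field
        else:
            result[key] = value                          # singular scalar
    return result

msg1 = {'name': 'a', 'sub': {'x': 1}, 'items': [1]}
msg2 = {'name': 'b', 'sub': {'y': 2}, 'items': [2]}
assert merge(msg1, msg2) == {'name': 'b', 'sub': {'x': 1, 'y': 2}, 'items': [1, 2]}
```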
Version 2.1.0 introduced packed repeated fields, which in proto2 are declared like repeated fields but with the special [packed=true] option. In proto3, repeated fields of scalar numeric types are packed by default. These function like repeated fields, but are encoded differently. A packed repeated field containing zero elements does not appear in the encoded message. Otherwise, all of the elements of the field are packed into a single key-value pair with wire type 2 (length-delimited). Each element is encoded the same way it would be normally, except without a key preceding it.
For example, imagine you have the message type:
message Test4 {
  repeated int32 d = 4 [packed=true];
}
Now let's say you construct a Test4, providing the values 3, 270, and 86942 for the repeated field d. Then, the encoded form would be:

22        // key (field number 4, wire type 2)
06        // payload size (6 bytes)
03        // first element (varint 3)
8E 02     // second element (varint 270)
9E A7 05  // third element (varint 86942)
Only repeated fields of primitive numeric types (types which use the varint, 32-bit, or 64-bit wire types) can be declared "packed".
Note that although there's usually no reason to encode more than one key-value pair for a packed repeated field, parsers must be prepared to accept multiple key-value pairs. In this case, the payloads should be concatenated. Each pair must contain a whole number of elements.
Protocol buffer parsers must be able to parse repeated fields that were compiled as packed as if they were not packed, and vice versa. This permits adding [packed=true] to existing fields in a forward- and backward-compatible way.
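Under the rules above, the Test4 example can be reproduced with a small illustrative encoder (plain Python, not a protobuf API):

```python
def encode_varint(n: int) -> bytes:
    """Standard varint encoding: 7 bits per byte, msb flags continuation."""
    out = bytearray()
    while True:
        bits = n & 0x7F
        n >>= 7
        if n:
            out.append(bits | 0x80)
        else:
            out.append(bits)
            return bytes(out)

def encode_packed(field_number: int, values: list) -> bytes:
    """Pack varint elements into one length-delimited (wire type 2) field:
    key, payload length, then the concatenated elements with no per-element keys."""
    payload = b''.join(encode_varint(v) for v in values)
    key = encode_varint((field_number << 3) | 2)
    return key + encode_varint(len(payload)) + payload

encoded = encode_packed(4, [3, 270, 86942])
assert encoded.hex() == '2206038e029ea705'  # matches the worked example
```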
Field numbers may be used in any order in a .proto file. The order chosen has no effect on how the messages are serialized.
When a message is serialized, there is no guaranteed order for how its known or unknown fields should be written. Serialization order is an implementation detail and the details of any particular implementation may change in the future. Therefore, protocol buffer parsers must be able to parse fields in any order.
While serialization is not guaranteed to be stable across binaries or versions, the following checks will pass for the same message instance foo within the same binary:

foo.SerializeAsString() == foo.SerializeAsString()
Hash(foo.SerializeAsString()) == Hash(foo.SerializeAsString())
CRC(foo.SerializeAsString()) == CRC(foo.SerializeAsString())
FingerPrint(foo.SerializeAsString()) == FingerPrint(foo.SerializeAsString())

However, two logically equivalent messages foo and bar may serialize to different byte outputs in, among others, the following scenarios:

- bar is serialized by an old server that treats some fields as unknown.
- bar is serialized by a server that is implemented in a different programming language and serializes fields in different order.
- bar has a field that serializes in a non-deterministic manner.
- bar has a field that stores a serialized byte output of a protocol buffer message which is serialized differently.
- bar is serialized by a new server that serializes fields in different order due to an implementation change.
- foo and bar are concatenations of the same individual messages, but in a different order.

Techniques

This page describes some commonly-used design patterns for dealing with Protocol Buffers. You can also send design and usage questions to the Protocol Buffers discussion group.
Streaming Multiple Messages

If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer. (If you want to avoid copying bytes to a separate buffer, check out the CodedInputStream class (in both C++ and Java) which can be told to limit reads to a certain number of bytes.)
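A minimal sketch of this length-prefixing technique in Python, using a fixed 4-byte little-endian length prefix rather than a varint (both are common choices; this is illustrative framing code, not a protobuf API):

```python
import io
import struct

def write_delimited(stream, payload: bytes) -> None:
    """Write a 4-byte little-endian length prefix, then the message bytes."""
    stream.write(struct.pack('<I', len(payload)))
    stream.write(payload)

def read_delimited(stream):
    """Yield each framed message until the stream is exhausted."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return
        (size,) = struct.unpack('<I', header)
        yield stream.read(size)

buf = io.BytesIO()
for msg in (b'\x08\x96\x01', b'\x08\x01'):   # two pre-serialized messages
    write_delimited(buf, msg)
buf.seek(0)
assert list(read_delimited(buf)) == [b'\x08\x96\x01', b'\x08\x01']
```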
Large Data Sets

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.
That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.
Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do while other times you may want something more like a database. Each solution should be developed as a separate library, so that only those who need it need to pay the costs.
Self-describing Messages

Protocol Buffers do not contain descriptions of their own types. Thus, given only a raw message without the corresponding .proto file defining its type, it is difficult to extract any useful data.
However, note that the contents of a .proto file can itself be represented using protocol buffers. The file src/google/protobuf/descriptor.proto in the source code package defines the message types involved. protoc can output a FileDescriptorSet – which represents a set of .proto files – using the --descriptor_set_out option. With this, you could define a self-describing protocol message like so:
syntax = "proto3";

import "google/protobuf/any.proto";
import "google/protobuf/descriptor.proto";

message SelfDescribingMessage {
  // Set of FileDescriptorProtos which describe the type and its dependencies.
  google.protobuf.FileDescriptorSet descriptor_set = 1;

  // The message and its type, encoded as an Any message.
  google.protobuf.Any message = 2;
}
By using classes like DynamicMessage (available in C++ and Java), you can then write tools which can manipulate SelfDescribingMessages.
All that said, the reason that this functionality is not included in the Protocol Buffer library is because we have never had a use for it inside Google.
This technique requires support for dynamic messages using descriptors. Please check that your platforms support this feature before using self-describing messages.
Many open source projects seek to add useful functionality on top of Protocol Buffers. For a list of links to projects we know about, see the third-party add-ons wiki page.
Third-Party Add-ons for Protocol Buffers

This page lists code related to protocol buffers which is developed and maintained by third parties. You may find this code useful, but please note that these projects are not affiliated with or endorsed by Google (unless explicitly marked); try them at your own risk. Also note that many projects here are in an early stage of development and not production-ready.

If you have a project that should be listed here, please send us a pull request to update this page.

Programming Languages

We are aware of these projects implementing protocol buffers for other programming languages:

RPC Implementations

gRPC (http://www.grpc.io/) is Google's RPC implementation for protocol buffers. There are also other third-party RPC implementations. Some of these actually use protocol buffer service definitions (defined using the service keyword in .proto files), while others just use protocol buffer message objects.

Inactive:

Other Utilities

There are miscellaneous other things you may find useful as a protocol buffer developer.
This tutorial provides a basic C++ programmer's introduction to working with protocol buffers. By walking through creating a simple example application, it shows you how to:

- Define message formats in a .proto file.
- Use the protocol buffer compiler.
- Use the C++ protocol buffer API to write and read messages.

This isn't a comprehensive guide to using protocol buffers in C++. For more detailed reference information, see the Protocol Buffer Language Guide, the C++ API Reference, the C++ Generated Code Guide, and the Encoding Reference.
The example we're going to use is a very simple "address book" application that can read and write people's contact details to and from a file. Each person in the address book has a name, an ID, an email address, and a contact phone number.
How do you serialize and retrieve structured data like this? There are a few ways to solve this problem:

- The raw in-memory data structures can be sent/saved in binary form. This is a fragile approach, as the receiving/reading code must be compiled with exactly the same memory layout, endianness, and so on.
- You can invent an ad-hoc way to encode the data items into a single string – such as encoding 4 ints as "12:3:-23:67". This is simple and flexible, but it requires writing one-off encoding and parsing code.
- Serialize the data to XML. This can be attractive since XML is (sort of) human readable and there are binding libraries for lots of languages. However, XML is notoriously space intensive, and encoding/decoding it can impose a huge performance penalty on applications.
Protocol buffers are the flexible, efficient, automated solution to solve exactly this problem. With protocol buffers, you write a .proto description of the data structure you wish to store. From that, the protocol buffer compiler creates a class that implements automatic encoding and parsing of the protocol buffer data with an efficient binary format. The generated class provides getters and setters for the fields that make up a protocol buffer and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports the idea of extending the format over time in such a way that the code can still read data encoded with the old format.
The example code is included in the source code package, under the "examples" directory. Download it here.
To create your address book application, you'll need to start with a .proto file. The definitions in a .proto file are simple: you add a message for each data structure you want to serialize, then specify a name and a type for each field in the message. Here is the .proto file that defines your messages, addressbook.proto.
syntax = "proto2";
package tutorial;
message Person {
required string name = 1;
required int32 id = 2;
optional string email = 3;
enum PhoneType {
MOBILE = 0;
HOME = 1;
WORK = 2;
}
message PhoneNumber {
required string number = 1;
optional PhoneType type = 2 [default = HOME];
}
repeated PhoneNumber phones = 4;
}
message AddressBook {
repeated Person people = 1;
}
As you can see, the syntax is similar to C++ or Java. Let's go through each part of the file and see what it does.
The .proto file starts with a package declaration, which helps to prevent naming conflicts between different projects. In C++, your generated classes will be placed in a namespace matching the package name.
Next, you have your message definitions. A message is just an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, and string. You can also add further structure to your messages by using other message types as field types – in the above example the Person message contains PhoneNumber messages, while the AddressBook message contains Person messages. You can even define message types nested inside other messages – as you can see, the PhoneNumber type is defined inside Person. You can also define enum types if you want one of your fields to have one of a predefined list of values – here you want to specify that a phone number can be one of MOBILE, HOME, or WORK.
The " = 1", " = 2" markers on each element identify the unique "tag" that field uses in the binary encoding. Tag numbers 1-15 require one less byte to encode than higher numbers, so as an optimization you can decide to use those tags for the commonly used or repeated elements, leaving tags 16 and higher for less-commonly used optional elements. Each element in a repeated field requires re-encoding the tag number, so repeated fields are particularly good candidates for this optimization.
Each field must be annotated with one of the following modifiers:
- required: a value for the field must be provided, otherwise the message will be considered "uninitialized". If libprotobuf is compiled in debug mode, serializing an uninitialized message will cause an assertion failure. In optimized builds, the check is skipped and the message will be written anyway. However, parsing an uninitialized message will always fail (by returning false from the parse method). Other than this, a required field behaves exactly like an optional field.
- optional: the field may or may not be set. If an optional field value isn't set, a default value is used. For simple types, you can specify your own default value, as we've done for the phone number type in the example. Otherwise, a system default is used: zero for numeric types, the empty string for strings, false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set. Calling the accessor to get the value of an optional (or required) field which has not been explicitly set always returns that field's default value.
- repeated: the field may be repeated any number of times (including zero). The order of the repeated values will be preserved in the protocol buffer. Think of repeated fields as dynamically sized arrays.

Required Is Forever

You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal.
You'll find a complete guide to writing .proto files – including all the possible field types – in the Protocol Buffer Language Guide. Don't go looking for facilities similar to class inheritance, though – protocol buffers don't do that.
Now that you have a .proto, the next thing you need to do is generate the classes you'll need to read and write AddressBook (and hence Person and PhoneNumber) messages. To do this, you need to run the protocol buffer compiler protoc on your .proto, specifying the source directory (where your application's source code lives; the current directory is used if you don't provide a value), the destination directory (where you want the generated code to go; often the same as $SRC_DIR), and the path to your .proto. In this case, you would invoke:

protoc -I=$SRC_DIR --cpp_out=$DST_DIR $SRC_DIR/addressbook.proto
Because you want C++ classes, you use the --cpp_out option – similar options are provided for other supported languages.
This generates the following files in your specified destination directory:
- addressbook.pb.h, the header which declares your generated classes.
- addressbook.pb.cc, which contains the implementation of your classes.

Let's look at some of the generated code and see what classes and functions the compiler has created for you. If you look in addressbook.pb.h, you can see that you have a class for each message you specified in addressbook.proto. Looking closer at the Person class, you can see that the compiler has generated accessors for each field. For example, for the name, id, email, and phones fields, you have these methods:
// name
inline bool has_name() const;
inline void clear_name();
inline const ::std::string& name() const;
inline void set_name(const ::std::string& value);
inline void set_name(const char* value);
inline ::std::string* mutable_name();

// id
inline bool has_id() const;
inline void clear_id();
inline int32_t id() const;
inline void set_id(int32_t value);

// email
inline bool has_email() const;
inline void clear_email();
inline const ::std::string& email() const;
inline void set_email(const ::std::string& value);
inline void set_email(const char* value);
inline ::std::string* mutable_email();

// phones
inline int phones_size() const;
inline void clear_phones();
inline const ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >& phones() const;
inline ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >* mutable_phones();
inline const ::tutorial::Person_PhoneNumber& phones(int index) const;
inline ::tutorial::Person_PhoneNumber* mutable_phones(int index);
inline ::tutorial::Person_PhoneNumber* add_phones();
As you can see, the getters have exactly the same name as the field in lowercase, and the setter methods begin with set_. There are also has_ methods for each singular (required or optional) field which return true if that field has been set. Finally, each field has a clear_ method that un-sets the field back to its empty state.
While the numeric id field just has the basic accessor set described above, the name and email fields have a couple of extra methods because they're strings – a mutable_ getter that lets you get a direct pointer to the string, and an extra setter. Note that you can call mutable_email() even if email is not already set; it will be initialized to an empty string automatically. If you had a singular message field in this example, it would also have a mutable_ method but not a set_ method.
Repeated fields also have some special methods – if you look at the methods for the repeated phones field, you'll see that you can:

- check the repeated field's _size (in other words, how many phone numbers are associated with this Person).
- get a specified phone number using its index.
- update an existing phone number at the specified index.
- add another phone number to the message which you can then edit (repeated scalar types have an add_ that just lets you pass in the new value).

For more information on exactly what members the protocol compiler generates for any particular field definition, see the C++ generated code reference.
The generated code includes a PhoneType enum that corresponds to your .proto enum. You can refer to this type as Person::PhoneType and its values as Person::MOBILE, Person::HOME, and Person::WORK (the implementation details are a little more complicated, but you don't need to understand them to use the enum).
The compiler has also generated a nested class for you called Person::PhoneNumber. If you look at the code, you can see that the "real" class is actually called Person_PhoneNumber, but a typedef defined inside Person allows you to treat it as if it were a nested class. The only case where this makes a difference is if you want to forward-declare the class in another file – you cannot forward-declare nested types in C++, but you can forward-declare Person_PhoneNumber.
Each message class also contains a number of other methods that let you check or manipulate the entire message, including:
- bool IsInitialized() const: checks if all the required fields have been set.
- string DebugString() const: returns a human-readable representation of the message, particularly useful for debugging.
- void CopyFrom(const Person& from): overwrites the message with the given message's values.
- void Clear(): clears all the elements back to the empty state.

These and the I/O methods described in the following section implement the Message interface shared by all C++ protocol buffer classes. For more info, see the complete API documentation for Message.
Finally, each protocol buffer class has methods for writing and reading messages of your chosen type using the protocol buffer binary format. These include:
- bool SerializeToString(string* output) const: serializes the message and stores the bytes in the given string. Note that the bytes are binary, not text; we only use the string class as a convenient container.
- bool ParseFromString(const string& data): parses a message from the given string.
- bool SerializeToOstream(ostream* output) const: writes the message to the given C++ ostream.
- bool ParseFromIstream(istream* input): parses a message from the given C++ istream.

These are just a couple of the options provided for parsing and serialization. Again, see the Message API reference for a complete list.
Protocol Buffers and O-O Design

Protocol buffer classes are basically dumb data holders (like structs in C); they don't make good first class citizens in an object model. If you want to add richer behaviour to a generated class, the best way to do this is to wrap the generated protocol buffer class in an application-specific class. Wrapping protocol buffers is also a good idea if you don't have control over the design of the .proto file (if, say, you're reusing one from another project). In that case, you can use the wrapper class to craft an interface better suited to the unique environment of your application: hiding some data and methods, exposing convenience functions, etc. You should never add behaviour to the generated classes by inheriting from them. This will break internal mechanisms and is not good object-oriented practice anyway.
Now let's try using your protocol buffer classes. The first thing you want your address book application to be able to do is write personal details to your address book file. To do this, you need to create and populate instances of your protocol buffer classes and then write them to an output stream.
Here is a program which reads an AddressBook from a file, adds one new Person to it based on user input, and writes the new AddressBook back out to the file again.
#include <iostream>
#include <fstream>
#include <string>
#include "addressbook.pb.h"
using namespace std;

// This function fills in a Person message based on user input.
void PromptForAddress(tutorial::Person* person) {
  cout << "Enter person ID number: ";
  int id;
  cin >> id;
  person->set_id(id);
  cin.ignore(256, '\n');

  cout << "Enter name: ";
  getline(cin, *person->mutable_name());

  cout << "Enter email address (blank for none): ";
  string email;
  getline(cin, email);
  if (!email.empty()) {
    person->set_email(email);
  }

  while (true) {
    cout << "Enter a phone number (or leave blank to finish): ";
    string number;
    getline(cin, number);
    if (number.empty()) {
      break;
    }

    tutorial::Person::PhoneNumber* phone_number = person->add_phones();
    phone_number->set_number(number);

    cout << "Is this a mobile, home, or work phone? ";
    string type;
    getline(cin, type);
    if (type == "mobile") {
      phone_number->set_type(tutorial::Person::MOBILE);
    } else if (type == "home") {
      phone_number->set_type(tutorial::Person::HOME);
    } else if (type == "work") {
      phone_number->set_type(tutorial::Person::WORK);
    } else {
      cout << "Unknown phone type. Using default." << endl;
    }
  }
}

// Main function: Reads the entire address book from a file,
// adds one person based on user input, then writes it back out to the same
// file.
int main(int argc, char* argv[]) {
  // Verify that the version of the library that we linked against is
  // compatible with the version of the headers we compiled against.
  GOOGLE_PROTOBUF_VERIFY_VERSION;

  if (argc != 2) {
    cerr << "Usage: " << argv[0] << " ADDRESS_BOOK_FILE" << endl;
    return -1;
  }

  tutorial::AddressBook address_book;

  {
    // Read the existing address book.
    fstream input(argv[1], ios::in | ios::binary);
    if (!input) {
      cout << argv[1] << ": File not found. Creating a new file." << endl;
    } else if (!address_book.ParseFromIstream(&input)) {
      cerr << "Failed to parse address book." << endl;
      return -1;
    }
  }

  // Add an address.
  PromptForAddress(address_book.add_people());

  {
    // Write the new address book back to disk.
    fstream output(argv[1], ios::out | ios::trunc | ios::binary);
    if (!address_book.SerializeToOstream(&output)) {
      cerr << "Failed to write address book." << endl;
      return -1;
    }
  }

  // Optional: Delete all global objects allocated by libprotobuf.
  google::protobuf::ShutdownProtobufLibrary();

  return 0;
}
Notice the GOOGLE_PROTOBUF_VERIFY_VERSION macro. It is good practice – though not strictly necessary – to execute this macro before using the C++ Protocol Buffer library. It verifies that you have not accidentally linked against a version of the library which is incompatible with the version of the headers you compiled with. If a version mismatch is detected, the program will abort. Note that every .pb.cc file automatically invokes this macro on startup.
Also notice the call to ShutdownProtobufLibrary() at the end of the program. All this does is delete any global objects that were allocated by the Protocol Buffer library. This is unnecessary for most programs, since the process is just going to exit anyway and the OS will take care of reclaiming all of its memory. However, if you use a memory leak checker that requires that every last object be freed, or if you are writing a library which may be loaded and unloaded multiple times by a single process, then you may want to force Protocol Buffers to clean up everything.
Of course, an address book wouldn't be much use if you couldn't get any information out of it! This example reads the file created by the above example and prints all the information in it.
#include <iostream>
#include <fstream>
#include <string>
#include "addressbook.pb.h"
using namespace std;

// Iterates through all people in the AddressBook and prints info about them.
void ListPeople(const tutorial::AddressBook& address_book) {
  for (int i = 0; i < address_book.people_size(); i++) {
    const tutorial::Person& person = address_book.people(i);

    cout << "Person ID: " << person.id() << endl;
    cout << "  Name: " << person.name() << endl;
    if (person.has_email()) {
      cout << "  E-mail address: " << person.email() << endl;
    }

    for (int j = 0; j < person.phones_size(); j++) {
      const tutorial::Person::PhoneNumber& phone_number = person.phones(j);

      switch (phone_number.type()) {
        case tutorial::Person::MOBILE:
          cout << "  Mobile phone #: ";
          break;
        case tutorial::Person::HOME:
          cout << "  Home phone #: ";
          break;
        case tutorial::Person::WORK:
          cout << "  Work phone #: ";
          break;
      }
      cout << phone_number.number() << endl;
    }
  }
}

// Main function: Reads the entire address book from a file and prints all
// the information inside.
int main(int argc, char* argv[]) {
  // Verify that the version of the library that we linked against is
  // compatible with the version of the headers we compiled against.
  GOOGLE_PROTOBUF_VERIFY_VERSION;

  if (argc != 2) {
    cerr << "Usage: " << argv[0] << " ADDRESS_BOOK_FILE" << endl;
    return -1;
  }

  tutorial::AddressBook address_book;

  {
    // Read the existing address book.
    fstream input(argv[1], ios::in | ios::binary);
    if (!address_book.ParseFromIstream(&input)) {
      cerr << "Failed to parse address book." << endl;
      return -1;
    }
  }

  ListPeople(address_book);

  // Optional: Delete all global objects allocated by libprotobuf.
  google::protobuf::ShutdownProtobufLibrary();

  return 0;
}
Sooner or later after you release the code that uses your protocol buffer, you will undoubtedly want to "improve" the protocol buffer's definition. If you want your new buffers to be backwards-compatible, and your old buffers to be forward-compatible – and you almost certainly do want this – then there are some rules you need to follow. In the new version of the protocol buffer:

- you must not change the tag numbers of any existing fields.
- you must not add or delete any required fields.
- you may delete optional or repeated fields.
- you may add new optional or repeated fields, but you must use fresh tag numbers (that is, tag numbers that were never used in this protocol buffer, not even by deleted fields).
(There are some exceptions to these rules, but they are rarely used.)
If you follow these rules, old code will happily read new messages and simply ignore any new fields. To the old code, optional fields that were deleted will simply have their default value, and deleted repeated fields will be empty. New code will also transparently read old messages. However, keep in mind that new optional fields will not be present in old messages, so you will need to either check explicitly whether they're set with has_, or provide a reasonable default value in your .proto file with [default = value] after the tag number. If the default value is not specified for an optional element, a type-specific default value is used instead: for strings, the default value is the empty string. For booleans, the default value is false. For numeric types, the default value is zero. Note also that if you added a new repeated field, your new code will not be able to tell whether it was left empty (by new code) or never set at all (by old code) since there is no has_ flag for it.
The C++ Protocol Buffers library is extremely heavily optimized. However, proper usage can improve performance even more. Here are some tips for squeezing every last drop of speed out of the library:

- Reuse message objects when possible. Messages try to keep around any memory they allocate for reuse, even when they are cleared. Thus, if you are handling many messages with the same type and similar structure in succession, it is a good idea to reuse the same message object each time to take load off the memory allocator. However, objects can become bloated over time, especially if your messages vary in "shape" or if you occasionally construct a message that is much larger than usual. You should monitor the sizes of your message objects by calling the SpaceUsed method and consider deleting them once they get too big.
- Your system's memory allocator may not be well-optimized for allocating lots of small objects from multiple threads. Try using Google's tcmalloc instead.
Protocol buffers have uses that go beyond simple accessors and serialization. Be sure to explore the C++ API reference to see what else you can do with them.
One key feature provided by protocol message classes is reflection. You can iterate over the fields of a message and manipulate their values without writing your code against any specific message type. One very useful way to use reflection is for converting protocol messages to and from other encodings, such as XML or JSON. A more advanced use of reflection might be to find differences between two messages of the same type, or to develop a sort of "regular expressions for protocol messages" in which you can write expressions that match certain message contents. If you use your imagination, it's possible to apply Protocol Buffers to a much wider range of problems than you might initially expect!
Reflection is provided by the Message::Reflection interface.
Protocol Buffer Basics: C#
This tutorial provides a basic C# programmer's introduction to working with protocol buffers, using the proto3 version of the protocol buffers language. By walking through creating a simple example application, it shows you how to:

- Define message formats in a .proto file.
- Use the protocol buffer compiler.
- Use the C# protocol buffer API to write and read messages.
This isn't a comprehensive guide to using protocol buffers in C#. For more detailed reference information, see the Protocol Buffer Language Guide, the C# API Reference, the C# Generated Code Guide, and the Encoding Reference.
Why use protocol buffers?
The example we're going to use is a very simple "address book" application that can read and write people's contact details to and from a file. Each person in the address book has a name, an ID, an email address, and a contact phone number.
How do you serialize and retrieve structured data like this? There are a few ways to solve this problem:

- The raw in-memory data structures can be sent/saved in binary form. This is a fragile approach, as the receiving/reading code must be compiled with exactly the same memory layout, endianness, and so on.
- You can invent an ad-hoc way to encode the data items into a single string – such as encoding 4 ints as "12:3:-23:67". This is simple and flexible, but it requires writing one-off encoding and parsing code.
- Serialize the data to XML. This can be attractive since XML is (sort of) human readable and there are binding libraries for lots of languages. However, XML is notoriously space intensive, and encoding/decoding it can impose a huge performance penalty on applications.
Protocol buffers are the flexible, efficient, automated solution to solve exactly this problem. With protocol buffers, you write a .proto description of the data structure you wish to store. From that, the protocol buffer compiler creates a class that implements automatic encoding and parsing of the protocol buffer data with an efficient binary format. The generated class provides getters and setters for the fields that make up a protocol buffer and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports the idea of extending the format over time in such a way that the code can still read data encoded with the old format.
Where to find the example code
Our example is a command-line application for managing an address book data file, encoded using protocol buffers. The command AddressBook (see: Program.cs) can add a new entry to the data file or parse the data file and print the data to the console.
You can find the complete example in the examples directory and csharp/src/AddressBook directory of the GitHub repository.
Defining your protocol format
To create your address book application, you'll need to start with a .proto file. The definitions in a .proto file are simple: you add a message for each data structure you want to serialize, then specify a name and a type for each field in the message. In our example, the .proto file that defines the messages is addressbook.proto.
The .proto file starts with a package declaration, which helps to prevent naming conflicts between different projects.
syntax = "proto3";
package tutorial;
import "google/protobuf/timestamp.proto";
In C#, your generated classes will be placed in a namespace matching the package name if csharp_namespace is not specified. In our example, the csharp_namespace option has been specified to override the default, so the generated code uses a namespace of Google.Protobuf.Examples.AddressBook instead of Tutorial.
option csharp_namespace = "Google.Protobuf.Examples.AddressBook";
Next, you have your message definitions. A message is just an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, and string. You can also add further structure to your messages by using other message types as field types.
message Person {
string name = 1;
int32 id = 2; // Unique ID number for this person.
string email = 3;
enum PhoneType {
MOBILE = 0;
HOME = 1;
WORK = 2;
}
message PhoneNumber {
string number = 1;
PhoneType type = 2;
}
repeated PhoneNumber phones = 4;
google.protobuf.Timestamp last_updated = 5;
}
// Our address book file is just one of these.
message AddressBook {
repeated Person people = 1;
}
In the above example, the Person message contains PhoneNumber messages, while the AddressBook message contains Person messages. You can even define message types nested inside other messages – as you can see, the PhoneNumber type is defined inside Person. You can also define enum types if you want one of your fields to have one of a predefined list of values – here you want to specify that a phone number can be one of MOBILE, HOME, or WORK.
The " = 1", " = 2" markers on each element identify the unique "tag" that field uses in the binary encoding. Tag numbers 1-15 require one less byte to encode than higher numbers, so as an optimization you can decide to use those tags for the commonly used or repeated elements, leaving tags 16 and higher for less-commonly used optional elements. Each element in a repeated field requires re-encoding the tag number, so repeated fields are particularly good candidates for this optimization.
If a field value isn't set, a default value is used: zero for numeric types, the empty string for strings, false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set. Calling the accessor to get the value of a field which has not been explicitly set always returns that field's default value.
If a field is repeated, the field may be repeated any number of times (including zero). The order of the repeated values will be preserved in the protocol buffer. Think of repeated fields as dynamically sized arrays.
You'll find a complete guide to writing .proto files – including all the possible field types – in the Protocol Buffer Language Guide. Don't go looking for facilities similar to class inheritance, though – protocol buffers don't do that.
Compiling your protocol buffers
Now that you have a .proto, the next thing you need to do is generate the classes you'll need to read and write AddressBook (and hence Person and PhoneNumber) messages. To do this, you need to run the protocol buffer compiler protoc on your .proto:
protoc -I=$SRC_DIR --csharp_out=$DST_DIR $SRC_DIR/addressbook.proto
Because you want C# code, you use the --csharp_out option – similar options are provided for other supported languages.
This generates Addressbook.cs in your specified destination directory. To compile this code, you'll need a project with a reference to the Google.Protobuf assembly.
The addressbook classes
Generating Addressbook.cs gives you five useful types:

- A static Addressbook class that contains metadata about the protocol buffer messages.
- An AddressBook class with a read-only People property.
- A Person class with properties for Name, Id, Email and Phones.
- A PhoneNumber class, nested in a static Person.Types class.
- A PhoneType enum, also nested in Person.Types.
You can read more about the details of exactly what's generated in the C# Generated Code guide, but for the most part you can treat these as perfectly ordinary C# types. One point to highlight is that any properties corresponding to repeated fields are read-only. You can add items to the collection or remove items from it, but you can't replace it with an entirely separate collection. The collection type for repeated fields is always RepeatedField<T>.
Here's an example of how you might create an instance of Person:
Person john = new Person
{
Id = 1234,
Name = "John Doe",
Email = "[email protected]",
Phones = { new Person.Types.PhoneNumber { Number = "555-4321", Type = Person.Types.PhoneType.Home } }
};
Note that with C# 6, you can use using static to remove the Person.Types ugliness:
// Add this to the other using directives
using static Google.Protobuf.Examples.AddressBook.Person.Types;
...
// The earlier Phones assignment can now be simplified to:
Phones = { new PhoneNumber { Number = "555-4321", Type = PhoneType.Home } }
Parsing and serialization
The whole purpose of using protocol buffers is to serialize your data so that it can be parsed elsewhere. Every generated class has a WriteTo(CodedOutputStream) method, where CodedOutputStream is a class in the protocol buffer runtime library. However, usually you'll use one of the extension methods to write to a regular System.IO.Stream or convert the message to a byte array or ByteString. These extension methods are in the Google.Protobuf.MessageExtensions class, so when you want to serialize you'll usually want a using directive for the Google.Protobuf namespace. For example:
using Google.Protobuf;
...
Person john = ...; // Code as before
using (var output = File.Create("john.dat"))
{
john.WriteTo(output);
}
Parsing is also simple. Each generated class has a static Parser property which returns a MessageParser<T> for that class, with methods to parse from streams, byte arrays and ByteStrings:
Person john;
using (var input = File.OpenRead("john.dat"))
{
john = Person.Parser.ParseFrom(input);
}
A full example program to maintain an addressbook (adding new entries and listing existing ones) using these messages is available in the GitHub repository.
Extending a Protocol Buffer
Sooner or later after you release the code that uses your protocol buffer, you will undoubtedly want to "improve" the protocol buffer's definition. If you want your new buffers to be backwards-compatible, and your old buffers to be forward-compatible – and you almost certainly do want this – then there are some rules you need to follow. In the new version of the protocol buffer:

- you must not change the field numbers of any existing fields.
- you may delete fields.
- you may add new fields, but you must use fresh field numbers (that is, field numbers that were never used in this protocol buffer, not even by deleted fields).
(There are some exceptions to these rules, but they are rarely used.)
If you follow these rules, old code will happily read new messages and simply ignore any new fields. To the old code, singular fields that were deleted will simply have their default value, and deleted repeated fields will be empty. New code will also transparently read old messages.
However, keep in mind that new fields will not be present in old messages, so you will need to do something reasonable with the default value. A type-specific default value is used: for strings, the default value is the empty string. For booleans, the default value is false. For numeric types, the default value is zero.
Reflection
Message descriptors (the information in the .proto file) and instances of messages can be examined programmatically using the reflection API. This can be useful when writing generic code such as a different text format or a smart diff tool. Each generated class has a static Descriptor property, and the descriptor for any instance can be retrieved using the IMessage.Descriptor property. As a quick example of how these can be used, here is a short method to print the top-level fields of any message.
public void PrintMessage(IMessage message)
{
var descriptor = message.Descriptor;
foreach (var field in descriptor.Fields.InDeclarationOrder())
{
Console.WriteLine(
"Field {0} ({1}): {2}",
field.FieldNumber,
field.Name,
field.Accessor.GetValue(message));
}
}