Encoding

1.   Varints are a method of serializing integers using one or more bytes. Smaller numbers take a smaller number of bytes. Each byte in a varint, except the last byte, has the most significant bit (msb) set – this indicates that there are further bytes to come. The lower 7 bits of each byte are used to store the two's complement representation of the number in groups of 7 bits, least significant group first. 1 --> 00000001 , 300 --> 10101100 00000010.

 

2.   The binary version of a message just uses the field's number as the key – the name and declared type for each field can only be determined on the decoding end by referencing the message type's definition. When a message is encoded, the keys and values are concatenated into a byte stream.

 

3.   When the message is being decoded, the parser needs to be able to skip fields that it doesn't recognize. The "key" for each pair in a wire-format message is actually two values – the field number from your .proto file, plus a wire type that provides just enough information to find the length of the following value:

Type

Meaning

Used For

0

Varint

int32, int64, uint32, uint64, sint32, sint64, bool, enum

1

64-bit

fixed64, sfixed64, double

2

Length-delimited

string, bytes, embedded messages, packed repeated fields

3

Start group

groups (deprecated)

4

End group

groups (deprecated)

5

32-bit

fixed32, sfixed32, float

Each key in the streamed message is a varint with the value (field_number << 3) | wire_type – in other words, the last three bits of the number store the wire type.

 

4.   There is an important difference between the signed int types (sint32 and sint64 ) and the "standard" int types (int32 and int64 ) when it comes to encoding negative numbers. If you use int32 or int64 as the type for a negative number, the resulting varint is always ten bytes long – it is, effectively, treated like a very large unsigned integer. If you use one of the signed types, the resulting varint uses ZigZag encoding, which is much more efficient.


5.   ZigZag encoding maps signed integers to unsigned integers so that numbers with a small absolute value (for instance, -1) have a small varint encoded value too. It does this in a way that "zig-zags" back and forth through the positive and negative integers, so that -1 is encoded as 1, 1 is encoded as 2, -2 is encoded as 3, and so on.

 

6.   Non-varint numeric types are stored in little-endian byte order.

 

7.   A wire type of 2 (length-delimited) means that the value is a varint encoded length.  The tag number and wire type are followed by the specified number of bytes of data.

 

8.   If your message definition has repeated elements (without the [packed=true] option), the encoded message has zero or more key-value pairs with the same tag number. These repeated values do not have to appear consecutively; they may be interleaved with other fields. The order of the elements with respect to each other is preserved when parsing, though the ordering with respect to other fields is lost.

 

9.   Normally, an encoded message would never have more than one instance of an optional or required field. However, parsers are expected to handle the case in which they do. For numeric types and strings, if the same value appears multiple times, the parser accepts the last value it sees. For embedded message fields, the parser merges multiple instances of the same field, as if with the Message.MergeFrom method – that is, all singular scalar fields in the latter instance replace those in the former, singular embedded messages are merged, and repeated fields are concatenated. The effect of these rules is that parsing the concatenation of two encoded messages produces exactly the same result as if you had parsed the two messages separately and merged the resulting objects:

MyMessage message;

message.ParseFromString(str1 + str2);
 

is equivalent to this:

MyMessage message, message2;

message.ParseFromString(str1);

message2.ParseFromString(str2);

message.MergeFrom(message2); 
 

 

 

10.   A packed repeated field containing zero elements does not appear in the encoded message. Otherwise, all of the elements of the field are packed into a single key-value pair with wire type 2 (length-delimited). Each element is encoded the same way it would be normally, except without a tag preceding it. Only repeated fields of primitive numeric types (types which use the varint, 32-bit, or 64-bit wire types) can be declared "packed".

 

11.  W hile you can use field numbers in any order in a .proto , when a message is serialized its known fields should be written sequentially by field number. This allows parsing code to use optimizations that rely on field numbers being in sequence. However, protocol buffer parsers must be able to parse fields in any order, as not all messages are created by simply serializing an object – for instance, it's sometimes useful to merge two messages by simply concatenating them.

 

12.  If a message has unknown fields , the current Java implementations write them in arbitrary order after the sequentially-ordered known fields.

你可能感兴趣的:(encoding)