Using protobuf in Python

Using Protocol Buffers in Python

1. Overview

Official site: https://developers.google.com/protocol-buffers/docs/overview

This is a Google site, so you may need a proxy to reach it from some networks.

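2. Installation

I am using Python 3.6 on Windows.

Download link: https://github.com/google/protobuf/releases/

Download two packages: protobuf-python-3.x.x.zip and protoc-3.x.x-win32.zip.
protobuf-python-3.x.x is the installation package for the protobuf Python library.
protoc-3.x.x-win32 contains the win32 build of protoc, the protobuf compiler, which is used to compile *.proto files.

For the installation steps, see https://www.jianshu.com/p/0c563b2c0fdb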

3. Protobuf in detail

Official tutorial: https://developers.google.com/protocol-buffers/docs/pythontutorial

The tutorial shows you how to:

1) Define message formats in a .proto file.

2) Use the protocol buffer compiler.

3) Use the Python protocol buffer API to write and read messages.

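3.1 Proto examples

The Basics: Python page under Tutorials on the official site apparently has not been updated yet and still shows proto2, whereas the examples downloaded from GitHub (on 2018-02-08) already use proto3. The Language Guide (proto3) can be used as a reference, though.
Here is the proto2 example: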

syntax = "proto2";

package tutorial;

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}

Here is the proto3 example:

// See README.txt for information and build instructions.
//
// Note: START and END tags are used in comments to define sections used in
// tutorials.  They are not part of the syntax for Protocol Buffers.
//
// To get an in-depth walkthrough of this file and the related examples, see:
// https://developers.google.com/protocol-buffers/docs/tutorials

// [START declaration]
syntax = "proto3";
package tutorial;

import "google/protobuf/timestamp.proto";
// [END declaration]

// [START java_declaration]
option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";
// [END java_declaration]

// [START csharp_declaration]
option csharp_namespace = "Google.Protobuf.Examples.AddressBook";
// [END csharp_declaration]

// [START messages]
message Person {
  string name = 1;
  int32 id = 2;  // Unique ID number for this person.
  string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 4;

  google.protobuf.Timestamp last_updated = 5;
}

// Our address book file is just one of these.
message AddressBook {
  repeated Person people = 1;
}
// [END messages]

3.2 Walking through the example

Taken from https://developers.google.com/protocol-buffers/docs/pythontutorial

My own notes on it started out as machine translation with only light proofreading, so treat the wording as a reference rather than a polished text.

As you can see, the syntax is similar to C++ or Java. Let's go through each part of the file and see what it does.

The .proto file starts with a package declaration, which helps to prevent naming conflicts between different projects. In Python, packages are normally determined by directory structure, so the package you define in your .proto file will have no effect on the generated code. However, you should still declare one to avoid name collisions in the Protocol Buffers name space as well as in non-Python languages.

Next, you have your message definitions. A message is just an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, and string. You can also add further structure to your messages by using other message types as field types – in the above example the Person message contains PhoneNumber messages, while the AddressBook message contains Person messages. You can even define message types nested inside other messages – as you can see, the PhoneNumber type is defined inside Person. You can also define enum types if you want one of your fields to have one of a predefined list of values – here you want to specify that a phone number can be one of MOBILE, HOME, or WORK.
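For readers who think in Python, the shape of the schema can be pictured with ordinary classes. The sketch below is only an analogy built on dataclasses (which need Python 3.7 or later) and enum. It is not what protoc generates, and it flattens PhoneType and PhoneNumber to module level instead of nesting them inside Person.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class PhoneType(IntEnum):
    """Mirrors the PhoneType enum values from the .proto file."""
    MOBILE = 0
    HOME = 1
    WORK = 2


@dataclass
class PhoneNumber:
    number: str = ""
    type: PhoneType = PhoneType.HOME  # the proto2 example declares [default = HOME]


@dataclass
class Person:
    name: str = ""
    id: int = 0
    email: str = ""
    phones: List[PhoneNumber] = field(default_factory=list)  # "repeated" field


@dataclass
class AddressBook:
    people: List[Person] = field(default_factory=list)  # "repeated" field


# Build a small address book: one person with one mobile number.
book = AddressBook()
alice = Person(name="Alice", id=1)
alice.phones.append(PhoneNumber(number="555-0100", type=PhoneType.MOBILE))
book.people.append(alice)
print(book)
```

The point of the analogy is the containment: an AddressBook holds Person entries, each Person holds PhoneNumber entries, and each PhoneNumber carries one value from a fixed enum.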

The " = 1", " = 2" markers on each element identify the unique "tag" that field uses in the binary encoding. Tag numbers 1-15 require one less byte to encode than higher numbers, so as an optimization you can decide to use those tags for the commonly used or repeated elements, leaving tags 16 and higher for less-commonly used optional elements. Each element in a repeated field requires re-encoding the tag number, so repeated fields are particularly good candidates for this optimization.
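The one-byte saving can be checked with a few lines of Python. The sketch below assumes the documented wire format, in which each field's key is the varint encoding of (field_number << 3) | wire_type. The helper names encode_varint and encode_key are made up for illustration and are not part of the protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


WIRE_TYPE_LEN = 2  # length-delimited: strings, bytes, embedded messages


def encode_key(field_number: int, wire_type: int) -> bytes:
    """Encode the key that precedes every field value on the wire."""
    return encode_varint((field_number << 3) | wire_type)


# Print how many bytes the key takes for a few field numbers.
for number in (1, 15, 16, 2047, 2048):
    print(number, len(encode_key(number, WIRE_TYPE_LEN)))
```

Because three bits of the first byte are taken by the wire type and one by the continuation flag, only four bits remain for the field number, which is why 15 is the largest number that fits in a single-byte key.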

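Note: I did not see this in the proto2 material, but the proto3 guide states that the smallest field number is 1 and the largest is 2^29 - 1, or 536,870,911. The numbers 19000 through 19999 (FieldDescriptor::kFirstReservedNumber through FieldDescriptor::kLastReservedNumber) cannot be used, though numbers that large are rarely needed anyway.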
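These limits are easy to turn into a quick check. The function below is a hypothetical helper, not something the protobuf package provides. It takes 2^29 - 1 (which equals 536,870,911) as the upper bound and treats 19000 through 19999 as reserved.

```python
MIN_FIELD_NUMBER = 1
MAX_FIELD_NUMBER = 2**29 - 1           # 536,870,911
RESERVED_NUMBERS = range(19000, 20000)  # 19000 through 19999 inclusive


def is_usable_field_number(number: int) -> bool:
    """Return True if `number` may be used as a field number in a .proto file."""
    if not MIN_FIELD_NUMBER <= number <= MAX_FIELD_NUMBER:
        return False
    return number not in RESERVED_NUMBERS


for candidate in (0, 1, 18999, 19000, 19999, 20000, MAX_FIELD_NUMBER + 1):
    print(candidate, is_usable_field_number(candidate))
```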

Each field must be annotated with one of the following modifiers:

  • required: a value for the field must be provided, otherwise the message will be considered "uninitialized". Serializing an uninitialized message will raise an exception. Parsing an uninitialized message will fail. Other than this, a required field behaves exactly like an optional field.
  • optional: the field may or may not be set. If an optional field value isn't set, a default value is used. For simple types, you can specify your own default value, as we've done for the phone number type in the example. Otherwise, a system default is used: zero for numeric types, the empty string for strings, false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set. Calling the accessor to get the value of an optional (or required) field which has not been explicitly set always returns that field's default value.
  • repeated: the field may be repeated any number of times (including zero). The order of the repeated values will be preserved in the protocol buffer. Think of repeated fields as dynamically sized arrays.

Note that this requirement comes from proto2; in proto3, fields can be declared without any of the modifiers above.

Required Is Forever: You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal.

You'll find a complete guide to writing .proto files – including all the possible field types – in the Protocol Buffer Language Guide. Don't go looking for facilities similar to class inheritance, though – protocol buffers don't do that.

3.3 Compiling your protocol buffers

Now that you have a .proto, the next thing you need to do is generate the classes you'll need to read and write AddressBook (and hence Person and PhoneNumber) messages. To do this, you need to run the protocol buffer compiler protoc on your .proto:

  1. If you haven't installed the compiler, download the package and follow the instructions in the README.
  2. Now run the compiler, specifying the source directory (where your application's source code lives – the current directory is used if you don't provide a value), the destination directory (where you want the generated code to go; often the same as $SRC_DIR), and the path to your .proto. In this case, you...:

    protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/addressbook.proto

Because you want Python classes, you use the --python_out option – similar options are provided for other supported languages.

This generates addressbook_pb2.py in your specified destination directory.
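If you would rather drive the compiler from a Python script than type the command each time, something like the following could work. It is a minimal sketch: build_protoc_command and compile_proto are hypothetical helper names, and it assumes protoc is on PATH unless you pass the full path to the executable. Only the -I and --python_out options shown above are used.

```python
import subprocess
from pathlib import Path


def build_protoc_command(src_dir, dst_dir, proto_file, protoc="protoc"):
    """Build the argument list for the protoc invocation shown above."""
    src = Path(src_dir)
    return [
        protoc,
        f"-I={src}",                      # source directory
        f"--python_out={Path(dst_dir)}",  # where the *_pb2.py file is written
        str(src / proto_file),            # the .proto file to compile
    ]


def compile_proto(src_dir, dst_dir, proto_file, protoc="protoc"):
    """Run protoc; raises subprocess.CalledProcessError if compilation fails."""
    command = build_protoc_command(src_dir, dst_dir, proto_file, protoc)
    subprocess.run(command, check=True)


# Only print the command here; call compile_proto(...) to actually run it.
print(build_protoc_command(".", ".", "addressbook.proto"))
```

Passing the arguments as a list avoids shell quoting problems when a directory name contains spaces, which is common on Windows.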

3.4 Protocol Buffer API

For this part, see the following blog post, which has already translated the main content:
http://blog.csdn.net/a464057216/article/details/54932719

Unlike when you generate Java and C++ protocol buffer code, the Python protocol buffer compiler doesn't generate your data access code for you directly. Instead (as you'll see if you look at addressbook_pb2.py) it generates special descriptors for all your messages, enums, and fields, and some mysteriously empty classes, one for each message type.

The key points below are copied from the referenced blog's translation.

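Standard message methods

Each Message class has methods for inspecting or manipulating the whole message, for example:
• IsInitialized(): checks whether all required fields have been set.
• __str__(): returns a human-readable form of the message. It is triggered by str(message) or print(message) and is useful for debugging.
• Clear(): clears all fields back to their unset state.
• MergeFrom(other_msg): merges the contents of other_msg into the current message. Singular fields are overwritten with the values from other_msg, repeated fields are appended to the corresponding field of the current message, and singular sub-messages and groups are merged recursively.
• CopyFrom(other_msg): calls Clear() on this message, then MergeFrom(other_msg).
• MergeFromString(serialized): parses serialized protobuf binary data and merges it into this message, following the same rules as MergeFrom.
• ListFields(): returns the non-empty fields as a list of (google.protobuf.descriptor.FieldDescriptor, value) tuples. A singular field is non-empty if HasField returns True for it; a repeated field is non-empty if it contains at least one element.
• ClearField(field_name): clears a field; raises ValueError if the field name does not exist.
• ByteSize(): returns the size of the serialized message in bytes.
• WhichOneof(oneof_group): returns the name of the field that is set within the oneof group, or None if nothing is set; raises ValueError if the given oneof group name does not exist. The referenced blog illustrates this with a test.proto example.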

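Serialization and parsing

Each Message class has serialization and parsing methods:
• SerializeToString(): serializes the message and returns the result as a byte string (bytes in Python 3, str in Python 2). The result is only a container for binary data, not text. Raises message.EncodeError if the message is not initialized.
• SerializePartialToString(): serializes the message and returns a byte string, but does not check whether the message is initialized.
• ParseFromString(data): parses the given binary data into the message object.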
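To get a feel for what these methods read and write, here is a hand-rolled sketch of the wire format for a message with a string field numbered 1 and an integer field numbered 2 (as in Person's name and id). It uses only the standard library and is not the protobuf implementation. It handles just the varint and length-delimited wire types and non-negative integers, and unlike the real library it never skips default values. The function names are invented for this example.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def decode_varint(data, pos):
    """Decode a varint starting at `pos`; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def serialize_person(name, person_id):
    """Encode field 1 (string) and field 2 (varint) the way the wire format lays them out."""
    payload = name.encode("utf-8")
    name_part = encode_varint((1 << 3) | 2) + encode_varint(len(payload)) + payload
    id_part = encode_varint((2 << 3) | 0) + encode_varint(person_id)
    return name_part + id_part


def parse_person(data):
    """Decode the bytes back into a {field_number: raw value} dictionary."""
    fields, pos = {}, 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            fields[field_number], pos = decode_varint(data, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = decode_varint(data, pos)
            fields[field_number] = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
    return fields


data = serialize_person("Alice", 150)
print(data)
print(parse_person(data))
```

Note that the encoded bytes carry field numbers, not field names, which is why both sides need the same .proto definition to interpret the data.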

If you want to add functionality on top of a generated protobuf class, wrap it. Never subclass a generated class to add new behavior.
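A wrapper might look roughly like this. To keep the sketch runnable without generated code, FakeAdmin is a hypothetical stand-in for a protoc-generated class, with field names U and P borrowed from the Admin message used later in this post. AdminAccount and its methods are likewise invented for illustration.

```python
class FakeAdmin:
    """Hypothetical stand-in for a protoc-generated message class."""

    def __init__(self):
        self.U = ""  # username
        self.P = ""  # password


class AdminAccount:
    """Adds behavior by holding a message object, not by inheriting from its class."""

    def __init__(self, message):
        self._message = message

    @property
    def message(self):
        """The wrapped message, e.g. for passing to serialization code."""
        return self._message

    def set_credentials(self, username, password):
        """Validate the values before storing them in the wrapped message."""
        if not username or not password:
            raise ValueError("username and password must be non-empty")
        self._message.U = username
        self._message.P = password

    def summary(self):
        return f"admin user {self._message.U!r}"


account = AdminAccount(FakeAdmin())
account.set_credentials("admin", "secret")
print(account.summary())
```

With real generated code you would pass an instance of the generated class to the wrapper instead of FakeAdmin; the wrapper itself would not need to change.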

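4. Data type mapping

The data types in .proto files do not match Python's types one to one. I have not copied the table here; see the scalar value types section of the language guide: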

https://developers.google.com/protocol-buffers/docs/proto3
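As a quick reference, the scalar types can be summarized in a small lookup table. This is a sketch of the mapping for Python 3 only (Python 2 differs for large integers, strings and bytes). The dictionary and the python_type_for helper are my own names, not part of the protobuf package.

```python
# proto3 scalar type -> Python 3 type used for field values
PROTO3_SCALAR_TO_PYTHON = {
    "double": float,
    "float": float,
    "int32": int,
    "int64": int,
    "uint32": int,
    "uint64": int,
    "sint32": int,
    "sint64": int,
    "fixed32": int,
    "fixed64": int,
    "sfixed32": int,
    "sfixed64": int,
    "bool": bool,
    "string": str,
    "bytes": bytes,
}


def python_type_for(proto_type: str) -> type:
    """Look up the Python type for a proto3 scalar type name."""
    try:
        return PROTO3_SCALAR_TO_PYTHON[proto_type]
    except KeyError:
        raise ValueError(f"{proto_type!r} is not a proto3 scalar type") from None


for name in ("int32", "string", "bytes"):
    print(name, "->", python_type_for(name).__name__)
```

All the integer variants collapse to int in Python; the distinction between them only affects how values are encoded on the wire and which range is accepted.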

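5. Hands-on test

Create a text file by hand and name it test3.proto. PyCharm is quite convenient for this: it has a proto plugin with code completion. I have not tried other IDEs. Notepad++ also works, though it feels a little less comfortable.

The contents are fairly simple. proto3 allows C/C++-style comments, that is, // or /* ... */.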

// [START declaration]
syntax = "proto3";
package USV2Sever;
// [END declaration]

// [START messages]
message Admin {
    string Cmd = 1;  // which command to execute
    string U = 2;    // administrator's username
    string P = 3;    // administrator's password

    message User {
        string N = 1;     // new user's username
        string P = 2;     // new user's password
        int32 Level = 3;  // new user's level
    }
    repeated User newUs = 4;  // list of new users
}
// [END messages]

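Then compile. The compilation section confused me and I was stuck for a while: I assumed the installation had added protoc to the PATH environment variable automatically, but it had not, so protoc would not run. What finally worked was this:

Take the protoc.exe program extracted from the protoc-3.x.x-win32 package downloaded earlier, switch to its directory on the command line (or call it by its absolute path), and run the example command. This generates a Python file named test3_pb2.py. The name made me wonder whether this was still proto2, but the _pb2 suffix is used for proto2 and proto3 files alike, so it does not indicate the syntax version.
In the command, protoc is the program; -I= gives the directory containing the .proto file (the source directory), where "." means the current directory; --python_out= gives the output directory for the generated Python code (the destination directory); and the last argument is the .proto file for protoc to compile.
Create a new Python project and import the generated xxx_pb2.py.

PyCharm offers no code completion for admin, so the fields have to be written by hand against the definitions in the .proto file.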

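# Module generated by protoc; adjust the import to match your own package and file name.
import protobufTest.testCreate_pb2 as userCreate

admin = userCreate.Admin()  # corresponds to message Admin {}
admin.Cmd = "UserCreate"
admin.U = "admin"
admin.P = "admin"

new_user1 = admin.newUs.add()  # add() appends a new element to the repeated field
new_user1.N = "ostartj"
new_user1.P = "ostar123"
new_user1.Level = 1

new_user2 = admin.newUs.add()
new_user2.N = "ostartj2"
new_user2.P = "ostar123"
new_user2.Level = 1

print(admin)  # prints the human-readable text form of the message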

6. References

1. https://www.jianshu.com/p/0c563b2c0fdb

2. https://developers.google.com/protocol-buffers/docs/overview

3. http://blog.csdn.net/a464057216/article/details/54932719
