https://github.com/protocolbuffers/protobuf
protocol buffers – a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the “old” format.
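From the Google protobuf documentation.
Simply put, protobuf data is faster (to parse), simpler (to use, though I don't agree), and smaller (in size).
Like other Google tools, protobuf is short on documentation, ecosystem, and the test data you need, so my project's recent switch from JSON to protobuf has been agonizing.
My local environment: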
ProductName: Mac OS X
ProductVersion: 10.14.6
python 3.7.3
Install manual
Step 1: brew install protobuf
Step 2: brew upgrade protobuf
Step 3:
PROTOC_ZIP=protoc-3.7.1-osx-x86_64.zip
curl -OL https://github.com/google/protobuf/releases/download/v3.7.1/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
(Copy and paste each step one at a time.)
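After installing, it can help to confirm which protoc binary is actually picked up from PATH, since Homebrew and the manual unzip can both put one there. The following is a minimal sketch using only the Python standard library; the helper name `tool_version` is my own and not part of protobuf.

```python
import shutil
import subprocess


def tool_version(tool="protoc"):
    """Return the output of `<tool> --version`, or None if the tool is not on PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    # protoc prints something like "libprotoc 3.7.1" for --version
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout.strip()


print(tool_version() or "protoc not found on PATH")
```

If this reports an older version than the one just unzipped, another protoc earlier on PATH is probably shadowing it.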
manual
Below is a test proto file I wrote:
syntax = "proto2";
option java_outer_classname = "OpenRtb";
package com.google.openrtb;
message Person {
required string name = 1;
required int32 id = 2;
}
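I use proto2 and the OpenRTB protocol, which is why the file has the package and java_outer_classname lines. If you don't need them, you can write it like this: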
syntax = "proto2";
message Person {
required string name = 1;
required int32 id = 2;
}
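The file is named person.proto.
For Python, a proto file is compiled into *_pb2.py (for example, person.proto is compiled into person_pb2.py), and Python code then needs to import the *_pb2 module.
Read on for how to use it.
Compile command:
protoc --proto_path=<directory containing the proto files, preferably an absolute path> --python_out=<output directory for the pb2.py files, preferably an absolute path> <path to the .proto file, preferably an absolute path>
# So my compile command is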
protoc --proto_path=/USER/XXXX/protobuf --python_out=./ /USER/XXXX/protobuf/person.proto
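The same command can be assembled from Python, which is handy when several proto files have to be compiled from a build script. This is only a sketch: `protoc_command` and `pb2_filename` are hypothetical helper names of mine, and the naming rule assumes a plain file name such as person.proto sitting directly in the proto_path directory.

```python
from pathlib import Path


def pb2_filename(proto_file):
    """Name of the Python module protoc generates for a simple .proto file name."""
    return Path(proto_file).stem + "_pb2.py"


def protoc_command(proto_dir, out_dir, proto_file):
    """Build the protoc argument list used in this article."""
    return [
        "protoc",
        f"--proto_path={proto_dir}",
        f"--python_out={out_dir}",
        str(proto_file),
    ]


cmd = protoc_command(
    "/USER/XXXX/protobuf", "./", "/USER/XXXX/protobuf/person.proto"
)
print(" ".join(cmd))
print(pb2_filename("/USER/XXXX/protobuf/person.proto"))
# To actually run the compiler: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.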
The content of person.proto is:
syntax = "proto2";
option java_outer_classname = "OpenRtb";
package com.google.openrtb;
message Person {
required string name = 1;
required int32 id = 2;
}
The output is:
person_pb2.py
# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: person.proto
import sys
_b=sys.version_info[0]<3 and (lambda x:x) or (lambda x:x.encode('latin1'))
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database
# @@protoc_insertion_point(imports)
_sym_db = _symbol_database.Default()
DESCRIPTOR = _descriptor.FileDescriptor(
name='person.proto',
package='com.google.openrtb',
syntax='proto2',
serialized_options=_b('B\007OpenRtb'),
serialized_pb=_b('\n\x0cperson.proto\x12\x12\x63om.google.openrtb\"\"\n\x06Person\x12\x0c\n\x04name\x18\x01 \x02(\t\x12\n\n\x02id\x18\x02 \x02(\x05\x42\tB\x07OpenRtb')
)
_PERSON = _descriptor.Descriptor(
name='Person',
full_name='com.google.openrtb.Person',
filename=None,
file=DESCRIPTOR,
containing_type=None,
fields=[
_descriptor.FieldDescriptor(
name='name', full_name='com.google.openrtb.Person.name', index=0,
number=1, type=9, cpp_type=9, label=2,
has_default_value=False, default_value=_b("").decode('utf-8'),
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR),
_descriptor.FieldDescriptor(
name='id', full_name='com.google.openrtb.Person.id', index=1,
number=2, type=5, cpp_type=1, label=2,
has_default_value=False, default_value=0,
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR),
],
extensions=[
],
nested_types=[],
enum_types=[
],
serialized_options=None,
is_extendable=False,
syntax='proto2',
extension_ranges=[],
oneofs=[
],
serialized_start=36,
serialized_end=70,
)
DESCRIPTOR.message_types_by_name['Person'] = _PERSON
_sym_db.RegisterFileDescriptor(DESCRIPTOR)
Person = _reflection.GeneratedProtocolMessageType('Person', (_message.Message,), dict(
DESCRIPTOR = _PERSON,
__module__ = 'person_pb2'
# @@protoc_insertion_point(class_scope:com.google.openrtb.Person)
))
_sym_db.RegisterMessage(Person)
DESCRIPTOR._options = None
# @@protoc_insertion_point(module_scope)
Test code
from google.protobuf import json_format
from ppydsp.protobuf import person_pb2
# Convert the data to protobuf format
person = person_pb2.Person()
person.id = 123
person.name = "abc"
p = person.SerializeToString()
print(p)
# b'\n\x03abc\x10{'
# Note: if you use the sanic framework, return the bytes from a handler like this:
#   return response.raw(p)
# raw() is meant for byte-stream data such as protobuf; its content-type defaults to application/octet-stream
# Convert the protobuf data to JSON
persons = person_pb2.Person()
persons.ParseFromString(p)
p = json_format.MessageToJson(persons)
print(p)
# {
#   "name": "abc",
#   "id": 123
# }
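The serialized bytes b'\n\x03abc\x10{' from the test above look cryptic, but they follow the protobuf wire format: each field is a key (field number shifted left by 3, OR-ed with the wire type) followed by its value. The sketch below rebuilds those bytes by hand with only the standard library. The function names are mine, and it only covers what this Person message needs (a string field and a non-negative int32).

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_key(field_number, wire_type):
    """Field key: (field_number << 3) | wire_type, itself a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_person(name, person_id):
    """Hand-encode Person{name=1 (string), id=2 (int32)}."""
    raw_name = name.encode("utf-8")
    return (
        encode_key(1, 2) + encode_varint(len(raw_name)) + raw_name  # wire type 2: length-delimited
        + encode_key(2, 0) + encode_varint(person_id)               # wire type 0: varint
    )


print(encode_person("abc", 123))
# b'\n\x03abc\x10{'
```

Here `\n` is the key byte 0x0A for field 1, `\x03` is the length of "abc", `\x10` is the key for field 2, and `{` is simply the byte 0x7B, which is 123.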
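As you can see, the test assigns the attributes of person (name and id) one at a time. Real messages often have many fields (the OpenRTB protocol has several hundred), so this can be optimized.
Some links in this article may require a VPN to access, but I have copied the important information into the article, so it should not matter much.
Test code: https://github.com/SchopenhauerZhang/py_protobuf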
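One way to avoid the one-by-one assignment is to keep the values in a dict and copy them over in a loop. The sketch below shows the idea with a hypothetical helper, `fill_fields`, and uses a stand-in object so that it runs without the generated person_pb2 module. It only handles scalar fields. For real messages the library already offers this: generated classes accept keyword arguments, as in person_pb2.Person(name="abc", id=123), and json_format.ParseDict(values, message) also handles nested messages.

```python
from types import SimpleNamespace


def fill_fields(message, values):
    """Copy each key in `values` onto the attribute of the same name."""
    for field_name, value in values.items():
        if not hasattr(message, field_name):
            raise AttributeError(f"unknown field: {field_name}")
        setattr(message, field_name, value)
    return message


# Stand-in for person_pb2.Person() so the sketch is self-contained
person = SimpleNamespace(name="", id=0)
fill_fields(person, {"name": "abc", "id": 123})
print(person.name, person.id)
```

Rejecting unknown keys up front makes typos in field names fail loudly instead of being silently ignored.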
https://www.ibm.com/developerworks/cn/linux/l-cn-gpb/index.html
Performance comparison of protobuf with JSON and XML:
1、 https://www.infoq.cn/article/json-is-5-times-faster-than-protobuf
2、 https://tech.meituan.com/2015/02/26/serialization-vs-deserialization.html(https://code.google.com/archive/p/thrift-protobuf-compare/wikis/Benchmarking.wiki)
Introduction to using SerializeToString:
http://cn.voidcc.com/question/p-crpmraiz-va.html
json_format manual:
https://developers.google.com/protocol-buffers/docs/reference/python/