The MergeFrom Trap in Protocol Buffers

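Protocol Buffers (pb) is pleasant to use. It is not as fast as a hand-rolled raw binary format, but in most cases it still beats JSON by a wide margin. Forward and backward compatibility of fields in particular is probably why most people choose pb.
And that is exactly where I finally fell into a trap.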

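While tidying up the test programs I noticed that the official Python tutorial never mentions the MergeFrom interface at all, so it looks like this trap was set specifically for us C++ folks.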

The example below illustrates the problem.

# mergefrom_trap.proto

package mergefrom_trap;

message Person
{
    required string name = 1;
    required int32  age = 2;
//    optional int32  new1 = 3;
//    optional int32  new2 = 4[default = 0];
}

message AddressBook
{
    required string host_name = 1;
    repeated Person person = 2; 
//    optional int32  new3 = 3;
//    optional int32  new4 = 4[default = 0];
}
1. First, with the new1-new4 fields still commented out of the proto, build the C++ binary:
[carl pb_mergefrom_trap]$ protoc -I=./ --cpp_out=./ mergefrom_trap.proto
[carl pb_mergefrom_trap]$ g++ read_new_write_old.cpp mergefrom_trap.pb.cc -lprotobuf -o read_new_write_old.bin
2. Then add the new1-new4 fields (uncomment them) and generate mergefrom_trap_pb2.py:
[carl pb_mergefrom_trap]$ protoc -I=./ --python_out=./ mergefrom_trap.proto
3. Use write_new.py to create an address_book object and write it to a file. The output is:
[carl pb_mergefrom_trap]$ python write_new.py addressbook
address_book: host_name: "carl"
person {
  name: "person1"
  age: 30
  new1: 1
  new2: 2
}
new3: 3
new4: 4
4. Run the C++ binary built in step 1 against that file:
[carl pb_mergefrom_trap]$ ./read_new_write_old.bin addressbook
book: host_name: "carl"
person {
  name: "person1"
  age: 30
  3: 1
  4: 2
}
3: 3
4: 4

book: host_name: "carl_copyfrom_update"
person {
  name: "person1"
  age: 100
  3: 1
  4: 2
}
3: 3
4: 4

book: host_name: "carl_copyfrom_update"
person {
  name: "person1"
  age: 100
  3: 1
  4: 2
}
3: 3
4: 4

book: host_name: "carl_copyfrom_update"
person {
  name: "person1"
  age: 100
  3: 1
  4: 2
}
3: 3
4: 4

book: host_name: "carl_copyfrom_update"
person {
  name: "person1"
  age: 100
  3: 1
  4: 2
}
3: 3
4: 4

book: host_name: "carl_copyfrom_update"
person {
  name: "person1"
  age: 100
  3: 1
  4: 2
}
3: 3
4: 4

a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
book: host_name: "carl_copyfrom_update"
person {
  name: "person1"
  age: 100
  3: 1
  4: 2
}
3: 3
4: 4

book: host_name: "carl_update_mergefrom"
person {
  name: "person1"
  age: 100
  3: 1
  4: 2
  3: 1
  4: 2
}
3: 3
4: 4

book: host_name: "carl_update_mergefrom"
person {
  name: "person1"
  age: 100
  3: 1
  4: 2
  3: 1
  4: 2
}
3: 3
4: 4

book: host_name: "carl_update_mergefrom"
person {
  name: "person1"
  age: 100
  3: 1
  4: 2
  3: 1
  4: 2
  3: 1
  4: 2
  3: 1
  4: 2
}
3: 3
4: 4

book: host_name: "carl_update_mergefrom"
person {
  name: "person1"
  age: 100
  3: 1
  4: 2
  3: 1
  4: 2
  3: 1
  4: 2
  3: 1
  4: 2
}
3: 3
4: 4

book: host_name: "carl_update_mergefrom"
person {
  name: "person1"
  age: 100
  3: 1
  4: 2
  3: 1
  4: 2
  3: 1
  4: 2
  3: 1
  4: 2
  3: 1
  4: 2
  3: 1
  4: 2
  3: 1
  4: 2
  3: 1
  4: 2
}
3: 3
4: 4
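
The `3: 1` and `4: 2` lines in the output above are unknown fields: the old binary has no names for field numbers 3 and 4, so it keeps them as raw (number, value) pairs and prints them that way. As a rough illustration of the wire format only (this is not the protobuf library's code, and the helper names are made up for the example), the sketch below hand-decodes the four bytes that the varint fields new1 = 1 and new2 = 2 would occupy inside a Person.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: last byte of this varint
            return result, pos
        shift += 7


def decode_varint_fields(buf):
    """Return [(field_number, value), ...] for a buffer holding only varint fields."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles wire type 0 (varint)")
        value, pos = read_varint(buf, pos)
        fields.append((field_number, value))
    return fields


# new1 = 1 -> tag (3 << 3) | 0 = 0x18, value 0x01
# new2 = 2 -> tag (4 << 3) | 0 = 0x20, value 0x02
payload = bytes([0x18, 0x01, 0x20, 0x02])
for number, value in decode_varint_fields(payload):
    print(f"{number}: {value}")
```

Nothing in those bytes says "new1" or "new2"; a reader without the newer proto sees only the numbers, which is why the old binary can carry the fields along but cannot tell that two entries with the same number are meant to be one field.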

Summary

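As the output above shows, when a C++ program built against the old proto calls MergeFrom, the two unknown fields are not written over the copies that are already there; they are appended as new entries.
Call it several times in a row and the count grows exponentially: in the run above, the unknown fields inside Person go from 2 entries to 4, then 8, then 16. The serialized output grows exponentially as well, and the process suffers for it:
Parse and Serialize calls block for tens of seconds or even minutes, and network traffic balloons. For a production server this is close to fatal.
A painful lesson.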
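
To make the growth pattern concrete, here is a minimal toy model, written in plain Python with no protobuf dependency. It assumes two things: that a merge overwrites known singular fields but appends unknown-field entries, and that the test program repeatedly takes a copy, updates a known field, and merges the copy back. The second point is a guess, since the source of read_new_write_old.cpp is not shown here. `ToyMessage` and `unknown_counts_after_merges` are hypothetical names for illustration, not protobuf APIs.

```python
class ToyMessage:
    """Toy stand-in for a parsed message: known fields plus an unknown-field list."""

    def __init__(self, known=None, unknown=None):
        self.known = dict(known or {})
        self.unknown = list(unknown or [])

    def merge_from(self, other):
        # Known singular fields: the incoming value overwrites ours.
        self.known.update(other.known)
        # Unknown fields: entries are appended, never matched up by number.
        self.unknown.extend(other.unknown)

    def copy_from(self, other):
        # Replace everything, so nothing accumulates.
        self.known = dict(other.known)
        self.unknown = list(other.unknown)


def unknown_counts_after_merges(rounds):
    """Count Person's unknown entries after each copy/update/merge round."""
    person = ToyMessage({"name": "person1", "age": 30}, [(3, 1), (4, 2)])
    counts = [len(person.unknown)]
    for _ in range(rounds):
        snapshot = ToyMessage()
        snapshot.copy_from(person)      # take a copy of the current state
        snapshot.known["age"] = 100     # update one known field
        person.merge_from(snapshot)     # merge the update back
        counts.append(len(person.unknown))
    return counts


print(unknown_counts_after_merges(3))  # -> [2, 4, 8, 16]
```

The counts match the 2, 4, 8, 16 progression in the output above, while the copy step never adds entries, which is consistent with the "carl_copyfrom_update" blocks staying at two unknown fields.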

The files above are hosted on GitHub:
github/demo/tree/master/testcode/pb_mergefrom_trap
