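(Why does the text look fine in the visual editor but come out in uneven sizes after publishing, especially where code and prose are mixed?)
This post mainly traces the code behind "serialization: writing", with a few side notes along the way. The example used is the bundled addressbook.
For now I mainly care about two things: how fields are managed and how the protocol is managed.
My approach to reading the code: since I don't know much about the overall requirements behind it, I first work out what each module does, then reason from its behavior, and finally tie all the modules together.
Note that, as a reader, I don't know much about the environment, requirements, or history that produced this code, so I will gloss over some details and have questions of my own. I will not compare or judge the approaches taken; the goal is to present the code as it is.
I won't draw a flowchart; I'm too lazy. This isn't anything formal anyway, just a running log.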
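1. Field management: ordinary fields
Every member variable gets the following kinds of accessors, and some of the set accessors may have several overloads. Taking

message Person {
  required string name = 1;
}

as an example, the generated accessors are: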
inline bool has_name() const;
inline void clear_name();
inline const ::std::string& name() const;
inline void set_name(const ::std::string& value);
inline void set_name(const char* value);
inline void set_name(const char* value, size_t size);
plus three accessors for the presence flag (has-bit):
inline bool Person::has_name() const {
  return (_has_bits_[0] & 0x00000001u) != 0;
}
inline void Person::set_has_name() {
  _has_bits_[0] |= 0x00000001u;
}
inline void Person::clear_has_name() {
  _has_bits_[0] &= ~0x00000001u;
}
The flag storage is declared as follows:
::google::protobuf::uint32 _has_bits_[(4 + 31) / 32];
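set_name() and clear_name() each call the corresponding has-bit accessor (set_has_name() and clear_has_name() respectively).
Because a value is bound to the tag in "xxx = tag", a tag must never be reused for a different field once it has been assigned; otherwise backward and forward compatibility breaks.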
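To make the has-bit bookkeeping concrete, here is a minimal hand-written sketch. PersonSketch is a hypothetical stand-in, not what protoc actually generates; it only reuses the mask and array size quoted above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the generated Person class (not protoc output).
class PersonSketch {
 public:
  bool has_name() const { return (has_bits_[0] & 0x00000001u) != 0; }

  const std::string& name() const { return name_; }

  void set_name(const std::string& value) {
    set_has_name();  // mark the field as present
    name_ = value;
  }

  void clear_name() {
    name_.clear();
    clear_has_name();  // mark the field as absent again
  }

 private:
  void set_has_name() { has_bits_[0] |= 0x00000001u; }
  void clear_has_name() { has_bits_[0] &= ~0x00000001u; }

  std::string name_;
  // One bit per field; (4 + 31) / 32 == 1 word covers up to 32 fields.
  std::uint32_t has_bits_[(4 + 31) / 32] = {0};
};
```

The point of the separate bit is that presence does not depend on the value: calling set_name("") in this sketch still leaves has_name() returning true.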
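Why the tag matters so much becomes clearer from the wire format: the field name never appears in the serialized bytes, only the field number packed together with a wire type. A small sketch of that packing follows; the helper names MakeKey, FieldNumberOf and WireTypeOf are made up for illustration, while the (field_number << 3) | wire_type layout is the documented protobuf encoding.

```cpp
#include <cassert>
#include <cstdint>

// Wire types from the protobuf encoding (only the two used here).
enum WireType : std::uint32_t {
  kVarint = 0,
  kLengthDelimited = 2,
};

// Key that precedes every field on the wire.
inline std::uint32_t MakeKey(std::uint32_t field_number, WireType type) {
  return (field_number << 3) | type;
}

// A reader recovers the field identity from the key alone.
inline std::uint32_t FieldNumberOf(std::uint32_t key) { return key >> 3; }
inline std::uint32_t WireTypeOf(std::uint32_t key) { return key & 0x7u; }
```

For "required string name = 1" the key is 0x0A. If tag 1 were later reassigned to another field, old data carrying 0x0A would be decoded as that new field.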
bool SerializeToFileDescriptor(int file_descriptor) const;
bool SerializePartialToFileDescriptor(int file_descriptor) const;
bool SerializeToOstream(ostream* output) const;
bool SerializePartialToOstream(ostream* output) const;
bool Message::SerializeToOstream(ostream* output) const {
  {
    io::OstreamOutputStream zero_copy_output(output);
    if (!SerializeToZeroCopyStream(&zero_copy_output)) return false;
  }
  return output->good();
}

bool MessageLite::SerializeToZeroCopyStream(
    io::ZeroCopyOutputStream* output) const {
  io::CodedOutputStream encoder(output);
  return SerializeToCodedStream(&encoder);
}

bool MessageLite::SerializeToCodedStream(io::CodedOutputStream* output) const {
  GOOGLE_DCHECK(IsInitialized()) << InitializationErrorMessage("serialize", *this);
  return SerializePartialToCodedStream(output);
}

Some of these functions come in a Partial variant, but all of them end up in SerializePartialToCodedStream, so the call hierarchy of the whole class roughly converges as follows:
bool MessageLite::SerializePartialToCodedStream(
    io::CodedOutputStream* output) const {
  const int size = ByteSize();  // Force size to be cached.
  uint8* buffer = output->GetDirectBufferForNBytesAndAdvance(size);
  if (buffer != NULL) {
    uint8* end = SerializeWithCachedSizesToArray(buffer);
    if (end - buffer != size) {
      ByteSizeConsistencyError(size, ByteSize(), end - buffer);
    }
    return true;
  } else {
    int original_byte_count = output->ByteCount();
    SerializeWithCachedSizes(output);
    if (output->HadError()) {
      return false;
    }
    int final_byte_count = output->ByteCount();
    if (final_byte_count - original_byte_count != size) {
      ByteSizeConsistencyError(size, ByteSize(),
                               final_byte_count - original_byte_count);
    }
    return true;
  }
}

1) There are two ways of writing: SerializeWithCachedSizesToArray, used when the stream can hand out "size" contiguous bytes, and SerializeWithCachedSizes, used otherwise.
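The choice between the two paths can be mimicked without protobuf at all. The following is only a sketch under my own assumptions: MiniCodedStream and SerializePayload are invented names, the "sink" is just a std::string, and block_size is assumed to be greater than zero. It tries to get a direct pointer for the whole payload and falls back to piecewise copying when the current block is too small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical miniature of CodedOutputStream: one current block plus a
// string "sink" that receives the block whenever it is flushed.
class MiniCodedStream {
 public:
  explicit MiniCodedStream(int block_size) : block_(block_size), used_(0) {}

  // Returns a pointer into the current block if `size` bytes fit, otherwise
  // nullptr (same contract as GetDirectBufferForNBytesAndAdvance).
  std::uint8_t* GetDirectBufferForNBytesAndAdvance(int size) {
    if (static_cast<int>(block_.size()) - used_ < size) return nullptr;
    std::uint8_t* result = block_.data() + used_;
    used_ += size;
    return result;
  }

  // Copies piece by piece, flushing the block every time it fills up.
  void WriteRaw(const void* data, int size) {
    const std::uint8_t* src = static_cast<const std::uint8_t*>(data);
    while (size > 0) {
      if (used_ == static_cast<int>(block_.size())) Flush();
      const int n = std::min(size, static_cast<int>(block_.size()) - used_);
      std::memcpy(block_.data() + used_, src, n);
      used_ += n;
      src += n;
      size -= n;
    }
  }

  void Flush() {
    sink_.append(reinterpret_cast<const char*>(block_.data()), used_);
    used_ = 0;
  }

  const std::string& sink() const { return sink_; }

 private:
  std::vector<std::uint8_t> block_;
  int used_;
  std::string sink_;
};

// Returns true when the fast (direct array) path was taken.
bool SerializePayload(const std::string& payload, MiniCodedStream* out) {
  const int size = static_cast<int>(payload.size());
  std::uint8_t* buffer = out->GetDirectBufferForNBytesAndAdvance(size);
  if (buffer != nullptr) {
    std::memcpy(buffer, payload.data(), size);  // "ToArray" style write
    return true;
  }
  out->WriteRaw(payload.data(), size);  // stream style write
  return false;
}
```

With an 8-byte block, a 3-byte payload takes the fast path, while a following 10-byte payload no longer fits and goes through WriteRaw; the bytes reaching the sink are the same either way.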
class LIBPROTOBUF_EXPORT CodedOutputStream {
 public:
  // Create an CodedOutputStream that writes to the given ZeroCopyOutputStream.
  explicit CodedOutputStream(ZeroCopyOutputStream* output);

  // Skips a number of bytes, leaving the bytes unmodified in the underlying
  // buffer. Returns false if an underlying write error occurs. This is
  // mainly useful with GetDirectBufferPointer().
  bool Skip(int count);

  // Sets *data to point directly at the unwritten part of the
  // CodedOutputStream's underlying buffer, and *size to the size of that
  // buffer, but does not advance the stream's current position. This will
  // always either produce a non-empty buffer or return false. If the caller
  // writes any data to this buffer, it should then call Skip() to skip over
  // the consumed bytes. This may be useful for implementing external fast
  // serialization routines for types of data not covered by the
  // CodedOutputStream interface.
  bool GetDirectBufferPointer(void** data, int* size);

  // If there are at least "size" bytes available in the current buffer,
  // returns a pointer directly into the buffer and advances over these bytes.
  // The caller may then write directly into this buffer (e.g. using the
  // *ToArray static methods) rather than go through CodedOutputStream. If
  // there are not enough bytes available, returns NULL. The return pointer is
  // invalidated as soon as any other non-const method of CodedOutputStream
  // is called.
  inline uint8* GetDirectBufferForNBytesAndAdvance(int size);

  // Write raw bytes, copying them from the given buffer.
  void WriteRaw(const void* buffer, int size);
  // Like WriteRaw() but writing directly to the target array.
  // This is _not_ inlined, as the compiler often optimizes memcpy into inline
  // copy loops. Since this gets called by every field with string or bytes
  // type, inlining may lead to a significant amount of code bloat, with only a
  // minor performance gain.
  static uint8* WriteRawToArray(const void* buffer, int size, uint8* target);

  // Equivalent to WriteRaw(str.data(), str.size()).
  void WriteString(const string& str);
  // Like WriteString() but writing directly to the target array.
  static uint8* WriteStringToArray(const string& str, uint8* target);

  // Write a 32-bit little-endian integer.
  void WriteLittleEndian32(uint32 value);

  // Returns the total number of bytes written since this object was created.
  inline int ByteCount() const;

  // Returns true if there was an underlying I/O error since this object was
  // created.
  bool HadError() const { return had_error_; }

 private:
  GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(CodedOutputStream);

  ZeroCopyOutputStream* output_;
  uint8* buffer_;
  int buffer_size_;
  int total_bytes_;  // Sum of sizes of all buffers seen so far.
  bool had_error_;   // Whether an error occurred during output.

  // Advance the buffer by a given number of bytes.
  void Advance(int amount);

  // Called when the buffer runs out to request more data. Implies an
  // Advance(buffer_size_).
  bool Refresh();
};
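This class does the following:
1) It holds a ZeroCopyOutputStream.
2) It holds a uint8* buffer_; all the write functions work against this pointer, which is the model the class is built around.
3) uint8* buffer_ and the ZeroCopyOutputStream are connected through Refresh().
4) Refresh() makes that connection by calling ZeroCopyOutputStream::Next on output_ to fetch a new buffer_, so Next has to be a virtual function.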
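The four points above can be sketched as follows. This is a simplified model, not the protobuf source: BlockSource, VectorBlockSource and MiniEncoder are names I made up, only Next/Refresh/WriteRaw/ByteCount are modeled, and the block size is assumed to be greater than zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for ZeroCopyOutputStream: the stream owns the memory
// and lends out one block at a time through a virtual Next().
class BlockSource {
 public:
  virtual ~BlockSource() {}
  virtual bool Next(void** data, int* size) = 0;
};

// Keeps every lent block in memory so the result can be inspected afterwards.
class VectorBlockSource : public BlockSource {
 public:
  explicit VectorBlockSource(int block_size) : block_size_(block_size) {}

  bool Next(void** data, int* size) override {
    blocks_.push_back(std::vector<std::uint8_t>(block_size_));
    *data = blocks_.back().data();
    *size = block_size_;
    return true;
  }

  int blocks_handed_out() const { return static_cast<int>(blocks_.size()); }

  // Concatenates all blocks and keeps the first `total_bytes` bytes.
  std::string Contents(int total_bytes) const {
    std::string all;
    for (const auto& block : blocks_) all.append(block.begin(), block.end());
    all.resize(total_bytes);
    return all;
  }

 private:
  int block_size_;
  std::vector<std::vector<std::uint8_t>> blocks_;
};

// Hypothetical miniature of CodedOutputStream.
class MiniEncoder {
 public:
  explicit MiniEncoder(BlockSource* output)
      : output_(output), buffer_(nullptr), buffer_size_(0),
        total_bytes_(0), had_error_(false) {}

  void WriteRaw(const void* data, int size) {
    const std::uint8_t* src = static_cast<const std::uint8_t*>(data);
    while (size > 0) {
      if (buffer_size_ == 0 && !Refresh()) return;  // out of room: ask for more
      const int n = std::min(size, buffer_size_);
      std::memcpy(buffer_, src, n);  // write straight into the lent block
      buffer_ += n;
      buffer_size_ -= n;
      src += n;
      size -= n;
    }
  }

  // Everything handed out so far minus the still-unused tail.
  int ByteCount() const { return total_bytes_ - buffer_size_; }
  bool HadError() const { return had_error_; }

 private:
  // Swaps the exhausted buffer_ for a fresh block obtained through Next().
  bool Refresh() {
    void* data = nullptr;
    int size = 0;
    if (!output_->Next(&data, &size)) {
      had_error_ = true;
      buffer_ = nullptr;
      buffer_size_ = 0;
      return false;
    }
    buffer_ = static_cast<std::uint8_t*>(data);
    buffer_size_ = size;
    total_bytes_ += size;
    return true;
  }

  BlockSource* output_;
  std::uint8_t* buffer_;
  int buffer_size_;
  int total_bytes_;
  bool had_error_;
};
```

Writing 11 bytes through a source with 4-byte blocks triggers three Refresh() calls, and the encoder never allocates memory of its own; it only moves a pointer through blocks that belong to the stream.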
The XXXOutputStream classes are structured as follows. Taking OstreamOutputStream as the example, the simplified source is:
class LIBPROTOBUF_EXPORT OstreamOutputStream : public ZeroCopyOutputStream {
 public:
  // Creates a stream that writes to the given C++ ostream.
  // If a block_size is given, it specifies the size of the buffers
  // that should be returned by Next(). Otherwise, a reasonable default
  // is used.
  explicit OstreamOutputStream(ostream* stream, int block_size = -1);
  ~OstreamOutputStream();

  // implements ZeroCopyOutputStream ---------------------------------
  bool Next(void** data, int* size);
  void BackUp(int count);
  int64 ByteCount() const;

 private:
  class LIBPROTOBUF_EXPORT CopyingOstreamOutputStream : public CopyingOutputStream {
   public:
    CopyingOstreamOutputStream(ostream* output);
    ~CopyingOstreamOutputStream();

    // implements CopyingOutputStream --------------------------------
    bool Write(const void* buffer, int size);

   private:
    // The stream.
    ostream* output_;

    GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(CopyingOstreamOutputStream);
  };

  CopyingOstreamOutputStream copying_output_;
  CopyingOutputStreamAdaptor impl_;

  GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(OstreamOutputStream);
};
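1) OstreamOutputStream itself inherits from ZeroCopyOutputStream.
2) It has a nested class, CopyingOstreamOutputStream, which inherits from CopyingOutputStream.
3) It has two member variables, copying_output_ and impl_.
Let's first look at how OstreamOutputStream interacts with copying_output_ and impl_.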
// implements ZeroCopyOutputStream ---------------------------------
bool Next(void** data, int* size);
void BackUp(int count);
int64 ByteCount() const;

bool OstreamOutputStream::Next(void** data, int* size) {
  return impl_.Next(data, size);
}

void OstreamOutputStream::BackUp(int count) {
  impl_.BackUp(count);
}

int64 OstreamOutputStream::ByteCount() const {
  return impl_.ByteCount();
}
copying_output_, on the other hand, is only used to construct impl_:
OstreamOutputStream::OstreamOutputStream(ostream* output, int block_size)
  : copying_output_(output),
    impl_(&copying_output_, block_size) {
}
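As you can see, OstreamOutputStream and impl_ (a CopyingOutputStreamAdaptor) both inherit from ZeroCopyOutputStream, but the actual implementation lives in impl_, with copying_output_ behind it doing the real output; OstreamOutputStream only provides the interface.
Next, let's trace into CopyingOutputStreamAdaptor.
1) It maintains a scoped_array (the buffer_ member).
2) It does a lot of work around buffer_, mainly tracking fields, positions, writes and so on.
3) buffer_ and copying_stream_ interact mainly through the virtual function Write, for example: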
if (copying_stream_->Write(buffer_.get(), buffer_used_)) {
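4) buffer_ is one contiguous block of memory whose size is passed in from outside.
At this point we have gone through what the main modules do, so let's tie them together.
A user-defined message inherits from google::protobuf::Message. When you want to serialize a message to some medium, you write: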
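Here is how I picture the adaptor, as a simplified sketch rather than the real class. CopyingSink, OstreamSink and SinkAdaptor are hypothetical names standing in for CopyingOutputStream, CopyingOstreamOutputStream and CopyingOutputStreamAdaptor; std::unique_ptr replaces scoped_array, and block_size is assumed to be greater than zero.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// The only thing a new kind of sink has to implement.
class CopyingSink {
 public:
  virtual ~CopyingSink() {}
  virtual bool Write(const void* buffer, int size) = 0;
};

// Sink that copies the bytes into a std::ostream.
class OstreamSink : public CopyingSink {
 public:
  explicit OstreamSink(std::ostream* output) : output_(output) {}
  bool Write(const void* buffer, int size) override {
    output_->write(static_cast<const char*>(buffer), size);
    return output_->good();
  }

 private:
  std::ostream* output_;
};

// Owns one contiguous buffer, lends it out via Next(), and pushes it to the
// sink through the virtual Write() whenever it is full.
class SinkAdaptor {
 public:
  SinkAdaptor(CopyingSink* sink, int block_size)
      : sink_(sink), buffer_(new std::uint8_t[block_size]),
        buffer_size_(block_size), buffer_used_(0), position_(0) {}
  ~SinkAdaptor() { Flush(); }

  bool Next(void** data, int* size) {
    if (buffer_used_ == buffer_size_ && !Flush()) return false;
    *data = buffer_.get() + buffer_used_;
    *size = buffer_size_ - buffer_used_;
    buffer_used_ = buffer_size_;  // the whole remainder now counts as used
    return true;
  }

  // The caller gives back the last `count` bytes it did not fill.
  void BackUp(int count) { buffer_used_ -= count; }

  long long ByteCount() const { return position_ + buffer_used_; }

  bool Flush() {
    if (buffer_used_ == 0) return true;
    if (!sink_->Write(buffer_.get(), buffer_used_)) return false;
    position_ += buffer_used_;
    buffer_used_ = 0;
    return true;
  }

 private:
  CopyingSink* sink_;
  std::unique_ptr<std::uint8_t[]> buffer_;
  int buffer_size_;
  int buffer_used_;
  long long position_;  // bytes already pushed to the sink
};
```

In this sketch all buffer management sits in SinkAdaptor, so supporting another medium would only mean writing another small CopyingSink subclass with a single Write() method.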
std::fstream output(filename.c_str(), ios::out | ios::trunc | ios::binary);
addressbook.SerializeToOstream(&output);
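In SerializeToXXX, XXX can also be a user-defined output target.
First the I/O stream is wrapped. The wrapper can be FileOutputStream or OstreamOutputStream (the latter is used here); both derive from an interface class called ZeroCopyOutputStream and have to implement the following three functions: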
bool Next(void** data, int* size);
void BackUp(int count);
int64 ByteCount() const;
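To make overriding these three functions convenient and uniform, the user is only asked to override the part that actually exports the data. That is what the CopyingOutputStream class abstracts: it has a single function, bool Write(const void* buffer, int size);, which pushes the contents of the buffer out to the third-party destination.
Next, BackUp and ByteCount can then be implemented once and reused, which is what CopyingOutputStreamAdaptor is for.
It inherits from ZeroCopyOutputStream mainly to be bound to the Next, BackUp and ByteCount interface. The Next, BackUp and ByteCount of the owning class OstreamOutputStream simply forward to the CopyingOutputStreamAdaptor.
(At first I was a bit confused by OstreamOutputStream, CopyingOutputStream and CopyingOutputStreamAdaptor; once the relationships are sorted out, the layering turns out to be quite clear.)
CopyingOutputStreamAdaptor is the one that maintains the scoped_array buffer.
OK, now that the OstreamOutputStream is set up and can supply buffer space, on to CodedOutputStream.
CodedOutputStream serves two kinds of callers. One goes through ZeroCopyOutputStream* output_, which is the wrapped OstreamOutputStream from above; the other uses the static functions (the *ToArray family), which third parties can call directly.
CodedOutputStream holds a uint8* buffer_ pointer whose value is taken directly from ZeroCopyOutputStream* output_ (through Next), with no intermediate copy; that is also why it is called ZeroCopyOutputStream.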
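Finally MessageLite::SerializePartialToCodedStream is called, which decides whether to call the virtual function SerializeWithCachedSizesToArray or SerializeWithCachedSizes. (The default implementation of the former still ends up calling SerializeWithCachedSizes.)
The virtual function SerializeWithCachedSizesToArray takes a uint8* buffer and writes the tag numbers and values of the message into it in order, laid out as tag | length | value (the length part is present for length-delimited fields such as strings).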
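As an illustration of the tag | length | value layout, here is a hand-rolled sketch for a single string field. The two helpers are my own and only imitate the style of the *ToArray functions (take a target pointer, return the advanced pointer); the varint and key encoding follow the documented protobuf wire format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Writes `value` as a base-128 varint at `target`; returns the new end.
inline std::uint8_t* WriteVarint32ToArray(std::uint32_t value,
                                          std::uint8_t* target) {
  while (value >= 0x80) {
    *target++ = static_cast<std::uint8_t>(value | 0x80);  // low 7 bits + "more"
    value >>= 7;
  }
  *target++ = static_cast<std::uint8_t>(value);
  return target;
}

// Writes one string field as key | length | bytes; returns the new end.
// The caller must have reserved enough room in `target`.
inline std::uint8_t* WriteStringFieldToArray(std::uint32_t field_number,
                                             const std::string& value,
                                             std::uint8_t* target) {
  const std::uint32_t kLengthDelimited = 2;  // wire type for strings
  target = WriteVarint32ToArray((field_number << 3) | kLengthDelimited, target);
  target = WriteVarint32ToArray(static_cast<std::uint32_t>(value.size()), target);
  std::memcpy(target, value.data(), value.size());
  return target + value.size();
}
```

Encoding name = "Bob" with field number 1 yields the five bytes 0A 03 42 6F 62: the key, the length, then the raw characters.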
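4. Serialization: reading
The code is structured the same way as the writing side; the main thing to look at is the final MergePartialFromCodedStream function.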