Before getting into the group varint algorithm, it helps to understand varint first.
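What is varint? Google's official documentation at http://code.google.com/apis/protocolbuffers/docs/encoding.html explains it clearly. Varint is a method of serializing an integer using one or more bytes, and "Smaller numbers take a smaller number of bytes." A conventional integer is represented with 32 bits and takes 4 bytes to store. If its value is below 256, a single byte would in principle be enough to hold it, which saves 3 bytes. Google's varint serializes integers based on this idea. Because one bit of every byte is reserved as a flag, a single varint byte actually holds only the values 0 to 127: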
Each byte in a varint, except the last byte, has the most significant bit (msb) set – this indicates that there are further bytes to come. The lower 7 bits of each byte are used to store the two's complement representation of the number in groups of 7 bits, least significant group first.
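The encoding rule quoted above can be sketched in C as follows. This is a minimal illustration, not the protobuf library's own code. It writes an unsigned 32-bit value 7 bits at a time, least significant group first, and sets the msb on every byte except the last. The function name `varint_encode_u32` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write `value` as a varint into `buf`, which must have
 * room for up to 5 bytes (ceil(32 / 7)). Returns the number of bytes written. */
static size_t varint_encode_u32(uint32_t value, uint8_t *buf)
{
    size_t n = 0;
    assert(buf != NULL);
    while (value >= 0x80) {
        /* Low 7 bits of the remaining value, msb set: more bytes follow. */
        buf[n++] = (uint8_t)((value & 0x7F) | 0x80);
        value >>= 7;
    }
    /* Final byte: msb clear, so a decoder knows the int ends here. */
    buf[n++] = (uint8_t)value;
    return n;
}
```

For example, 300 encodes as the two bytes AC 02, matching the worked example in the protobuf encoding documentation. Any value below 128 takes a single byte, and 128 already needs two.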
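Every byte of a varint except the last carries a flag in its most significant bit. The flag indicates whether more bytes follow to represent the same int. This matters because the encoded data is one continuous byte array.

Consider a byte stream like the one in the official documentation's example:

- The first byte has msb 0, meaning "one byte is enough to represent me (this int)".
- The second byte also has msb 0, with the same meaning.
- The third byte has msb 1, meaning "more bytes that belong to me follow". So we look at the next byte.
- That fourth byte has msb 0, so no further bytes belong to this int.

That is what "indicates that there are further bytes to come" means. The drawback of this scheme is that every decode must examine the bytes one at a time to determine which bytes make up each int. For how to encode and decode varint, see the implementation by my senior colleague Yewang (野王): http://www.searchtb.com/2011/05/google-group-varint-%E6%97%A0%E6%8D%9F%E5%8E%8B%E7%BC%A9%E8%A7%A3%E5%8E%8B%E7%AE%97%E6%B3%95%E7%9A%84%E9%AB%98%E6%95%88%E5%AE%9E%E7%8E%B0.html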
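The byte-by-byte scan described above can be sketched in C. This is an illustrative decoder, not Yewang's implementation, and the function names are hypothetical. Note the data-dependent branch on every byte's msb, which is the per-byte cost that group varint sets out to remove.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one varint starting at p (input ends at `end`).
 * Stores the value in *out and returns a pointer just past its last byte, or
 * NULL if the input stops mid-varint or the encoding is longer than 5 bytes. */
static const uint8_t *varint_decode_u32(const uint8_t *p, const uint8_t *end,
                                        uint32_t *out)
{
    uint32_t value = 0;
    for (int shift = 0; shift <= 28 && p < end; shift += 7) {
        uint8_t byte = *p++;
        /* Low 7 bits carry data, least significant group first. */
        value |= (uint32_t)(byte & 0x7F) << shift;
        /* msb clear: this byte is the last one of the current int. */
        if ((byte & 0x80) == 0) {
            *out = value;
            return p;
        }
    }
    return NULL;
}

/* Hypothetical helper: decode every varint in buf[0..len) into out (room for
 * max_out values). Returns how many ints were decoded, or -1 if malformed. */
static int varint_decode_all(const uint8_t *buf, size_t len,
                             uint32_t *out, int max_out)
{
    const uint8_t *p = buf;
    const uint8_t *end = buf + len;
    int n = 0;
    assert(buf != NULL && out != NULL);
    while (p < end && n < max_out) {
        /* Each call tests the msb of every byte it reads to find the int's end. */
        p = varint_decode_u32(p, end, &out[n]);
        if (p == NULL)
            return -1;
        n++;
    }
    return n;
}
```

Given the bytes 01 02 AC 02, `varint_decode_all` yields three ints: 1, 2 and 300. This is the layout just described: two one-byte ints, then one int spread over two bytes.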
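Group varint is an optimization of varint. Instead of a continuation flag in every byte, it uses 2 bits per int to record how many bytes that int occupies, for example:

- 00 means 1 byte.
- 01 means 2 bytes.
- 10 means 3 bytes.
- 11 means 4 bytes.

The 2-bit length codes of 4 ints, 8 bits in total, are pulled out and stored together in one byte, which serves as the prefix of the group. To decode a group, we only need to take its prefix byte and use it as an index into a precomputed 256-entry table. The table entry gives the byte offsets of the 4 ints. Taking Yewang's code as an example: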
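/**
 * Index table for group varint.
 * The first 4 columns give the byte offset of each of the 4 compressed ints
 * from the group's prefix (index) byte.
 * The 5th column gives the byte offset of the next prefix byte from the current one.
 */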
static const int GROUP_VARINT_IDX_ARR[256][5] =
{
/* 00 00 00 00 */ {1, 2, 3, 4, 5},
/* 00 00 00 01 */ {1, 2, 3, 4, 6},
/* 00 00 00 10 */ {1, 2, 3, 4, 7},
/* 00 00 00 11 */ {1, 2, 3, 4, 8},
/* 00 00 01 00 */ {1, 2, 3, 5, 6},
/* 00 00 01 01 */ {1, 2, 3, 5, 7},
/* 00 00 01 10 */ {1, 2, 3, 5, 8},
/* 00 00 01 11 */ {1, 2, 3, 5, 9},
/* 00 00 10 00 */ {1, 2, 3, 6, 7},
/* 00 00 10 01 */ {1, 2, 3, 6, 8},
/* 00 00 10 10 */ {1, 2, 3, 6, 9},
/* 00 00 10 11 */ {1, 2, 3, 6, 10},
/* 00 00 11 00 */ {1, 2, 3, 7, 8},
/* 00 00 11 01 */ {1, 2, 3, 7, 9},
/* 00 00 11 10 */ {1, 2, 3, 7, 10},
/* 00 00 11 11 */ {1, 2, 3, 7, 11},
/* 00 01 00 00 */ {1, 2, 4, 5, 6},
/* 00 01 00 01 */ {1, 2, 4, 5, 7},
/* 00 01 00 10 */ {1, 2, 4, 5, 8},
/* 00 01 00 11 */ {1, 2, 4, 5, 9},
/* 00 01 01 00 */ {1, 2, 4, 6, 7},
/* 00 01 01 01 */ {1, 2, 4, 6, 8},
/* 00 01 01 10 */ {1, 2, 4, 6, 9},
/* 00 01 01 11 */ {1, 2, 4, 6, 10},
/* 00 01 10 00 */ {1, 2, 4, 7, 8},
/* 00 01 10 01 */ {1, 2, 4, 7, 9},
/* 00 01 10 10 */ {1, 2, 4, 7, 10},
/* 00 01 10 11 */ {1, 2, 4, 7, 11},
/* 00 01 11 00 */ {1, 2, 4, 8, 9},
/* 00 01 11 01 */ {1, 2, 4, 8, 10},
/* 00 01 11 10 */ {1, 2, 4, 8, 11},
/* 00 01 11 11 */ {1, 2, 4, 8, 12},
/* 00 10 00 00 */ {1, 2, 5, 6, 7},
/* 00 10 00 01 */ {1, 2, 5, 6, 8},
/* 00 10 00 10 */ {1, 2, 5, 6, 9},
/* 00 10 00 11 */ {1, 2, 5, 6, 10},
/* 00 10 01 00 */ {1, 2, 5, 7, 8},
/* 00 10 01 01 */ {1, 2, 5, 7, 9},
/* 00 10 01 10 */ {1, 2, 5, 7, 10},
/* 00 10 01 11 */ {1, 2, 5, 7, 11},
/* 00 10 10 00 */ {1, 2, 5, 8, 9},
/* 00 10 10 01 */ {1, 2, 5, 8, 10},
/* 00 10 10 10 */ {1, 2, 5, 8, 11},
/* 00 10 10 11 */ {1, 2, 5, 8, 12},
/* 00 10 11 00 */ {1, 2, 5, 9, 10},
/* 00 10 11 01 */ {1, 2, 5, 9, 11},
/* 00 10 11 10 */ {1, 2, 5, 9, 12},
/* 00 10 11 11 */ {1, 2, 5, 9, 13},
/* 00 11 00 00 */ {1, 2, 6, 7, 8},
/* 00 11 00 01 */ {1, 2, 6, 7, 9},
/* 00 11 00 10 */ {1, 2, 6, 7, 10},
/* 00 11 00 11 */ {1, 2, 6, 7, 11},
/* 00 11 01 00 */ {1, 2, 6, 8, 9},
/* 00 11 01 01 */ {1, 2, 6, 8, 10},
/* 00 11 01 10 */ {1, 2, 6, 8, 11},
/* 00 11 01 11 */ {1, 2, 6, 8, 12},
/* 00 11 10 00 */ {1, 2, 6, 9, 10},
/* 00 11 10 01 */ {1, 2, 6, 9, 11},
/* 00 11 10 10 */ {1, 2, 6, 9, 12},
/* 00 11 10 11 */ {1, 2, 6, 9, 13},
/* 00 11 11 00 */ {1, 2, 6, 10, 11},
/* 00 11 11 01 */ {1, 2, 6, 10, 12},
/* 00 11 11 10 */ {1, 2, 6, 10, 13},
/* 00 11 11 11 */ {1, 2, 6, 10, 14},
/* 01 00 00 00 */ {1, 3, 4, 5, 6},
/* 01 00 00 01 */ {1, 3, 4, 5, 7},
/* 01 00 00 10 */ {1, 3, 4, 5, 8},
/* 01 00 00 11 */ {1, 3, 4, 5, 9},
/* 01 00 01 00 */ {1, 3, 4, 6, 7},
/* 01 00 01 01 */ {1, 3, 4, 6, 8},
/* 01 00 01 10 */ {1, 3, 4, 6, 9},
/* 01 00 01 11 */ {1, 3, 4, 6, 10},
/* 01 00 10 00 */ {1, 3, 4, 7, 8},
/* 01 00 10 01 */ {1, 3, 4, 7, 9},
/* 01 00 10 10 */ {1, 3, 4, 7, 10},
/* 01 00 10 11 */ {1, 3, 4, 7, 11},
/* 01 00 11 00 */ {1, 3, 4, 8, 9},
/* 01 00 11 01 */ {1, 3, 4, 8, 10},
/* 01 00 11 10 */ {1, 3, 4, 8, 11},
/* 01 00 11 11 */ {1, 3, 4, 8, 12},
/* 01 01 00 00 */ {1, 3, 5, 6, 7},
/* 01 01 00 01 */ {1, 3, 5, 6, 8},
/* 01 01 00 10 */ {1, 3, 5, 6, 9},
/* 01 01 00 11 */ {1, 3, 5, 6, 10},
/* 01 01 01 00 */ {1, 3, 5, 7, 8},
/* 01 01 01 01 */ {1, 3, 5, 7, 9},
/* 01 01 01 10 */ {1, 3, 5, 7, 10},
/* 01 01 01 11 */ {1, 3, 5, 7, 11},
/* 01 01 10 00 */ {1, 3, 5, 8, 9},
/* 01 01 10 01 */ {1, 3, 5, 8, 10},
/* 01 01 10 10 */ {1, 3, 5, 8, 11},
/* 01 01 10 11 */ {1, 3, 5, 8, 12},
/* 01 01 11 00 */ {1, 3, 5, 9, 10},
/* 01 01 11 01 */ {1, 3, 5, 9, 11},
/* 01 01 11 10 */ {1, 3, 5, 9, 12},
/* 01 01 11 11 */ {1, 3, 5, 9, 13},
/* 01 10 00 00 */ {1, 3, 6, 7, 8},
/* 01 10 00 01 */ {1, 3, 6, 7, 9},
/* 01 10 00 10 */ {1, 3, 6, 7, 10},
/* 01 10 00 11 */ {1, 3, 6, 7, 11},
/* 01 10 01 00 */ {1, 3, 6, 8, 9},
/* 01 10 01 01 */ {1, 3, 6, 8, 10},
/* 01 10 01 10 */ {1, 3, 6, 8, 11},
/* 01 10 01 11 */ {1, 3, 6, 8, 12},
/* 01 10 10 00 */ {1, 3, 6, 9, 10},
/* 01 10 10 01 */ {1, 3, 6, 9, 11},
/* 01 10 10 10 */ {1, 3, 6, 9, 12},
/* 01 10 10 11 */ {1, 3, 6, 9, 13},
/* 01 10 11 00 */ {1, 3, 6, 10, 11},
/* 01 10 11 01 */ {1, 3, 6, 10, 12},
/* 01 10 11 10 */ {1, 3, 6, 10, 13},
/* 01 10 11 11 */ {1, 3, 6, 10, 14},
/* 01 11 00 00 */ {1, 3, 7, 8, 9},
/* 01 11 00 01 */ {1, 3, 7, 8, 10},
/* 01 11 00 10 */ {1, 3, 7, 8, 11},
/* 01 11 00 11 */ {1, 3, 7, 8, 12},
/* 01 11 01 00 */ {1, 3, 7, 9, 10},
/* 01 11 01 01 */ {1, 3, 7, 9, 11},
/* 01 11 01 10 */ {1, 3, 7, 9, 12},
/* 01 11 01 11 */ {1, 3, 7, 9, 13},
/* 01 11 10 00 */ {1, 3, 7, 10, 11},
/* 01 11 10 01 */ {1, 3, 7, 10, 12},
/* 01 11 10 10 */ {1, 3, 7, 10, 13},
/* 01 11 10 11 */ {1, 3, 7, 10, 14},
/* 01 11 11 00 */ {1, 3, 7, 11, 12},
/* 01 11 11 01 */ {1, 3, 7, 11, 13},
/* 01 11 11 10 */ {1, 3, 7, 11, 14},
/* 01 11 11 11 */ {1, 3, 7, 11, 15},
/* 10 00 00 00 */ {1, 4, 5, 6, 7},
/* 10 00 00 01 */ {1, 4, 5, 6, 8},
/* 10 00 00 10 */ {1, 4, 5, 6, 9},
/* 10 00 00 11 */ {1, 4, 5, 6, 10},
/* 10 00 01 00 */ {1, 4, 5, 7, 8},
/* 10 00 01 01 */ {1, 4, 5, 7, 9},
/* 10 00 01 10 */ {1, 4, 5, 7, 10},
/* 10 00 01 11 */ {1, 4, 5, 7, 11},
/* 10 00 10 00 */ {1, 4, 5, 8, 9},
/* 10 00 10 01 */ {1, 4, 5, 8, 10},
/* 10 00 10 10 */ {1, 4, 5, 8, 11},
/* 10 00 10 11 */ {1, 4, 5, 8, 12},
/* 10 00 11 00 */ {1, 4, 5, 9, 10},
/* 10 00 11 01 */ {1, 4, 5, 9, 11},
/* 10 00 11 10 */ {1, 4, 5, 9, 12},
/* 10 00 11 11 */ {1, 4, 5, 9, 13},
/* 10 01 00 00 */ {1, 4, 6, 7, 8},
/* 10 01 00 01 */ {1, 4, 6, 7, 9},
/* 10 01 00 10 */ {1, 4, 6, 7, 10},
/* 10 01 00 11 */ {1, 4, 6, 7, 11},
/* 10 01 01 00 */ {1, 4, 6, 8, 9},
/* 10 01 01 01 */ {1, 4, 6, 8, 10},
/* 10 01 01 10 */ {1, 4, 6, 8, 11},
/* 10 01 01 11 */ {1, 4, 6, 8, 12},
/* 10 01 10 00 */ {1, 4, 6, 9, 10},
/* 10 01 10 01 */ {1, 4, 6, 9, 11},
/* 10 01 10 10 */ {1, 4, 6, 9, 12},
/* 10 01 10 11 */ {1, 4, 6, 9, 13},
/* 10 01 11 00 */ {1, 4, 6, 10, 11},
/* 10 01 11 01 */ {1, 4, 6, 10, 12},
/* 10 01 11 10 */ {1, 4, 6, 10, 13},
/* 10 01 11 11 */ {1, 4, 6, 10, 14},
/* 10 10 00 00 */ {1, 4, 7, 8, 9},
/* 10 10 00 01 */ {1, 4, 7, 8, 10},
/* 10 10 00 10 */ {1, 4, 7, 8, 11},
/* 10 10 00 11 */ {1, 4, 7, 8, 12},
/* 10 10 01 00 */ {1, 4, 7, 9, 10},
/* 10 10 01 01 */ {1, 4, 7, 9, 11},
/* 10 10 01 10 */ {1, 4, 7, 9, 12},
/* 10 10 01 11 */ {1, 4, 7, 9, 13},
/* 10 10 10 00 */ {1, 4, 7, 10, 11},
/* 10 10 10 01 */ {1, 4, 7, 10, 12},
/* 10 10 10 10 */ {1, 4, 7, 10, 13},
/* 10 10 10 11 */ {1, 4, 7, 10, 14},
/* 10 10 11 00 */ {1, 4, 7, 11, 12},
/* 10 10 11 01 */ {1, 4, 7, 11, 13},
/* 10 10 11 10 */ {1, 4, 7, 11, 14},
/* 10 10 11 11 */ {1, 4, 7, 11, 15},
/* 10 11 00 00 */ {1, 4, 8, 9, 10},
/* 10 11 00 01 */ {1, 4, 8, 9, 11},
/* 10 11 00 10 */ {1, 4, 8, 9, 12},
/* 10 11 00 11 */ {1, 4, 8, 9, 13},
/* 10 11 01 00 */ {1, 4, 8, 10, 11},
/* 10 11 01 01 */ {1, 4, 8, 10, 12},
/* 10 11 01 10 */ {1, 4, 8, 10, 13},
/* 10 11 01 11 */ {1, 4, 8, 10, 14},
/* 10 11 10 00 */ {1, 4, 8, 11, 12},
/* 10 11 10 01 */ {1, 4, 8, 11, 13},
/* 10 11 10 10 */ {1, 4, 8, 11, 14},
/* 10 11 10 11 */ {1, 4, 8, 11, 15},
/* 10 11 11 00 */ {1, 4, 8, 12, 13},
/* 10 11 11 01 */ {1, 4, 8, 12, 14},
/* 10 11 11 10 */ {1, 4, 8, 12, 15},
/* 10 11 11 11 */ {1, 4, 8, 12, 16},
/* 11 00 00 00 */ {1, 5, 6, 7, 8},
/* 11 00 00 01 */ {1, 5, 6, 7, 9},
/* 11 00 00 10 */ {1, 5, 6, 7, 10},
/* 11 00 00 11 */ {1, 5, 6, 7, 11},
/* 11 00 01 00 */ {1, 5, 6, 8, 9},
/* 11 00 01 01 */ {1, 5, 6, 8, 10},
/* 11 00 01 10 */ {1, 5, 6, 8, 11},
/* 11 00 01 11 */ {1, 5, 6, 8, 12},
/* 11 00 10 00 */ {1, 5, 6, 9, 10},
/* 11 00 10 01 */ {1, 5, 6, 9, 11},
/* 11 00 10 10 */ {1, 5, 6, 9, 12},
/* 11 00 10 11 */ {1, 5, 6, 9, 13},
/* 11 00 11 00 */ {1, 5, 6, 10, 11},
/* 11 00 11 01 */ {1, 5, 6, 10, 12},
/* 11 00 11 10 */ {1, 5, 6, 10, 13},
/* 11 00 11 11 */ {1, 5, 6, 10, 14},
/* 11 01 00 00 */ {1, 5, 7, 8, 9},
/* 11 01 00 01 */ {1, 5, 7, 8, 10},
/* 11 01 00 10 */ {1, 5, 7, 8, 11},
/* 11 01 00 11 */ {1, 5, 7, 8, 12},
/* 11 01 01 00 */ {1, 5, 7, 9, 10},
/* 11 01 01 01 */ {1, 5, 7, 9, 11},
/* 11 01 01 10 */ {1, 5, 7, 9, 12},
/* 11 01 01 11 */ {1, 5, 7, 9, 13},
/* 11 01 10 00 */ {1, 5, 7, 10, 11},
/* 11 01 10 01 */ {1, 5, 7, 10, 12},
/* 11 01 10 10 */ {1, 5, 7, 10, 13},
/* 11 01 10 11 */ {1, 5, 7, 10, 14},
/* 11 01 11 00 */ {1, 5, 7, 11, 12},
/* 11 01 11 01 */ {1, 5, 7, 11, 13},
/* 11 01 11 10 */ {1, 5, 7, 11, 14},
/* 11 01 11 11 */ {1, 5, 7, 11, 15},
/* 11 10 00 00 */ {1, 5, 8, 9, 10},
/* 11 10 00 01 */ {1, 5, 8, 9, 11},
/* 11 10 00 10 */ {1, 5, 8, 9, 12},
/* 11 10 00 11 */ {1, 5, 8, 9, 13},
/* 11 10 01 00 */ {1, 5, 8, 10, 11},
/* 11 10 01 01 */ {1, 5, 8, 10, 12},
/* 11 10 01 10 */ {1, 5, 8, 10, 13},
/* 11 10 01 11 */ {1, 5, 8, 10, 14},
/* 11 10 10 00 */ {1, 5, 8, 11, 12},
/* 11 10 10 01 */ {1, 5, 8, 11, 13},
/* 11 10 10 10 */ {1, 5, 8, 11, 14},
/* 11 10 10 11 */ {1, 5, 8, 11, 15},
/* 11 10 11 00 */ {1, 5, 8, 12, 13},
/* 11 10 11 01 */ {1, 5, 8, 12, 14},
/* 11 10 11 10 */ {1, 5, 8, 12, 15},
/* 11 10 11 11 */ {1, 5, 8, 12, 16},
/* 11 11 00 00 */ {1, 5, 9, 10, 11},
/* 11 11 00 01 */ {1, 5, 9, 10, 12},
/* 11 11 00 10 */ {1, 5, 9, 10, 13},
/* 11 11 00 11 */ {1, 5, 9, 10, 14},
/* 11 11 01 00 */ {1, 5, 9, 11, 12},
/* 11 11 01 01 */ {1, 5, 9, 11, 13},
/* 11 11 01 10 */ {1, 5, 9, 11, 14},
/* 11 11 01 11 */ {1, 5, 9, 11, 15},
/* 11 11 10 00 */ {1, 5, 9, 12, 13},
/* 11 11 10 01 */ {1, 5, 9, 12, 14},
/* 11 11 10 10 */ {1, 5, 9, 12, 15},
/* 11 11 10 11 */ {1, 5, 9, 12, 16},
/* 11 11 11 00 */ {1, 5, 9, 13, 14},
/* 11 11 11 01 */ {1, 5, 9, 13, 15},
/* 11 11 11 10 */ {1, 5, 9, 13, 16},
/* 11 11 11 11 */ {1, 5, 9, 13, 17}
};
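Take the first row, /* 00 00 00 00 */ {1, 2, 3, 4, 5}. Here 00 00 00 00 means all 4 ints in the group are stored in one byte each, since 00 stands for one byte. Their offsets from the prefix byte are therefore 1, 2, 3 and 4, and the next prefix byte sits at offset 5.

With a single lookup, we have located which bytes of these 4 consecutive group-varint-encoded ints belong to which compressed int. What remains is to decode each located unit. For that step, Yewang's implementation linked above can again serve as a reference.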
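The following is a minimal C sketch of encoding and decoding one group of 4. Rather than hard-coding the 256 rows, `build_index_table` computes them, since each row is just a running sum of the four lengths. The result matches GROUP_VARINT_IDX_ARR.

Two details are assumptions that the table itself does not pin down:

- The first int's 2-bit code sits in the top two bits of the prefix byte, which is consistent with how the table's row comments are written.
- Each value's bytes are stored little-endian.

All names here (`idx`, `build_index_table`, `group_varint_encode`, `group_varint_decode`) are hypothetical, not Yewang's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical runtime equivalent of GROUP_VARINT_IDX_ARR: row `tag` holds the
 * offsets (from the prefix byte) of the 4 values and of the next prefix byte. */
static int idx[256][5];

static void build_index_table(void)
{
    for (int tag = 0; tag < 256; tag++) {
        int off = 1;                       /* values start right after the prefix */
        for (int i = 0; i < 4; i++) {
            idx[tag][i] = off;
            /* 2-bit code of value i, first value in the top bits: 00 -> 1 byte ... 11 -> 4 bytes */
            off += ((tag >> (6 - 2 * i)) & 3) + 1;
        }
        idx[tag][4] = off;                 /* where the next group's prefix byte sits */
    }
}

/* Smallest number of bytes (1..4) that can hold v. */
static int byte_len(uint32_t v)
{
    return v < (1u << 8) ? 1 : v < (1u << 16) ? 2 : v < (1u << 24) ? 3 : 4;
}

/* Encode 4 values as one group into out (needs up to 17 bytes).
 * Returns the number of bytes written. */
static size_t group_varint_encode(const uint32_t v[4], uint8_t *out)
{
    uint8_t tag = 0;
    size_t off = 1;                        /* out[0] is reserved for the prefix */
    for (int i = 0; i < 4; i++) {
        int len = byte_len(v[i]);
        tag |= (uint8_t)((len - 1) << (6 - 2 * i));
        for (int b = 0; b < len; b++)      /* assumed little-endian byte order */
            out[off++] = (uint8_t)(v[i] >> (8 * b));
    }
    out[0] = tag;
    return off;
}

/* Decode the group starting at p into v[4]; returns a pointer to the next prefix. */
static const uint8_t *group_varint_decode(const uint8_t *p, uint32_t v[4])
{
    assert(idx[0][4] == 5);                /* build_index_table() must have run */
    const int *row = idx[p[0]];            /* one lookup yields all 5 offsets */
    for (int i = 0; i < 4; i++) {
        int len = row[i + 1] - row[i];
        uint32_t x = 0;
        for (int b = 0; b < len; b++)
            x |= (uint32_t)p[row[i] + b] << (8 * b);
        v[i] = x;
    }
    return p + row[4];
}
```

As a worked example, encoding {1, 300, 70000, 0xFFFFFFFF} needs 1, 2, 3 and 4 bytes. The prefix is therefore 00 01 10 11 (0x1B), and its row is {1, 2, 4, 7, 11}, the same as the /* 00 01 10 11 */ row in GROUP_VARINT_IDX_ARR. The whole group takes 11 bytes.

Decoding the group costs one table lookup instead of one msb test per byte. Faster implementations commonly replace the inner byte loop with a single unaligned 32-bit load plus a mask. That approach requires the buffer to be padded so the load never reads past its end.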