HPACK和twitter hpack源码解析

HPACK是用于压缩HTTP/2中header信息的压缩算法。

引言

在HTTP/1.x中，header信息以字符串的方式进行传输，随着大量并发的网络请求，冗余的header字段会造成没必要的带宽浪费，从而增加网络时延。

HTTP/2的对这个问题进行了优化，它对header信息进行压缩编码，提高了的带宽利用率，其中的压缩编码规范就是HPACK。

HPACK中将 header 字段列表视为 name-value 对的有序集合，其中可以包括重复的对。名称和值被认为是八位字节的不透明序列，并且 header 字段的顺序在压缩和解压缩后保持不变。

Static Table Definition

Index	Header Name	Header Value
1	:authority
2	:method	GET
3	:method	POST
4	:path	/
5	:path	/index.html
6	:scheme	http
7	:scheme	https
8	:status	200
9	:status	204
10	:status	206
11	:status	304
12	:status	400
13	:status	404
14	:status	500
15	accept-charset
16	accept-encoding	gzip, deflate
17	accept-language
18	accept-ranges
19	accept
20	access-control-allow-origin
21	age
22	allow
23	authorization
24	cache-control
25	content-disposition
26	content-encoding
27	content-language
28	content-length
29	content-location
30	content-range
31	content-type
32	cookie
33	date
34	etag
35	expect
36	expires
37	from
38	host
39	if-match
40	if-modified-since
41	if-none-match
42	if-range
43	if-unmodified-since
44	last-modified
45	link
46	location
47	max-forwards
48	proxy-authenticate
49	proxy-authorization
50	range
51	referer
52	refresh
53	retry-after
54	server
55	set-cookie
56	strict-transport-security
57	transfer-encoding
58	user-agent
59	vary
60	via
61	www-authenticate

从Static Table可以看出压缩的核心之一就是把http协议中常用的字符串用数字来代替，以此来减少header中的字节数。

编号的范围是1-61，很容易想到用一个8bit的byte来表示就可以了，但是实际上像1这种数字我们用1bit也可以表示，由此引入了霍夫曼编码，编码的作用就是使用频次高的数字用更少的字节来表示，这样可以使总体的字节数更少。

Dynamic Table

Dynamic Table是可以看做是一个先进先出的队列，新加入的entry的index最小，最早的entry的index最大。

Dynamic Table初始是空的。当每个header块被解压缩时，将添加条目。Dynamic Table有容量限制，当容量不足以添加新的条目时会将按照最老的条目先移除的顺序移除条目，直到容量足够添加新的条目为止，如果新添加条目比Dynamic Table最大容量还要大，那么清空Dynamic Table。

Dynamic Table是针对一个连接的，因此HTTP/2提倡使用尽可能少的连接头，这样Dynamic Table会更全，头部压缩效果更好。

Index Address Space

Static table和Dynamic Table组成了一个连续的地址空间，如下所示。

<----------  Index Address Space ---------->
<-- Static  Table -->  <-- Dynamic Table -->
+---+-----------+---+  +---+-----------+---+
| 1 |    ...    | s |  |s+1|    ...    |s+k|
+---+-----------+---+  +---+-----------+---+
                       ^                   |
                       |                   V
                Insertion Point      Dropping Point


            Index Address Space

Header Filed Representation

编码的header字段可以表示为index或者literal，index为static table或者dynamic table中的引用；header字段name可以用literal形式表示，也可以作为static table或者dynamic table中的引用。

3中不同的literal表示定义：
1. 在dynamic table开头添加header字段作为新条目的literal
2. 不将header字段添加到dynamic table
3. 不将header字段添加到dynamic table，并且规定header字段时钟使用literal presentation。

header字段name或者value可以直接或者使用Huffman对8位字节序列进行编码。

Primitive Type Representations

HPACK使用两种编码类型：无符号可变长整数和8位的字节串

1. Integer representation

整数用于表示name索引，header字段索引或字符串长度。整数表示可以在8位字节内的任何位置开始。为了优化处理，整数表示总是在8位字节的末尾结束。

整数表示分为两个部分：填充8位字节的前缀和可选的8位字节列表，如果整数值不适合该前缀，则使用这些可选的8位字节。前缀的位数（N）是整数表示的参数。

如果整数小于2^N-1，则将其编码在N位前缀中

   0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | ? | ? | ? |       Value       |
   +---+---+---+-------------------+
   
   Integer Value Encoded within the Prefix (Shown for N = 5)

否则，将前缀的所有位设置为1，并使用多个连续的8位字节列表对减少了2^N-1的值进行编码。每个8位字节列表的最高位除了最后一个是0，其它全部设置为1，剩余的7位用来表示减少后的整数。

  0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | ? | ? | ? | 1   1   1   1   1 |
   +---+---+---+-------------------+
   | 1 |    Value-(2^N-1) LSB      |
   +---+---------------------------+
                  ...
   +---+---------------------------+
   | 0 |    Value-(2^N-1) MSB      |
   +---+---------------------------+


    Integer Value Encoded after the Prefix (Shown for N = 5)

表示整数I的伪代码如下：

 if I < 2^N - 1, encode I on N bits
   else
       encode (2^N - 1) on N bits
       I = I - (2^N - 1)
       while I >= 128
            encode (I % 128 + 128) on 8 bits
            I = I / 128
       encode I on 8 bits

For example

   1. i = 10,N = 5;
   i < 31(2^5-1)


   0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | X | X | X | 0 | 1 | 0 | 1 | 0 |   10 stored on 5 bits
   +---+---+---+---+---+---+---+---+
   
   2. i = 1337,N = 5;
   i > 31(2^5-1)
   i = 1337-(2^5-1) = 1306 > 31
   encode i%128 + 128 = 154
   i = 1306/128 = 10
   
      0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | X | X | X | 1 | 1 | 1 | 1 | 1 |  Prefix = 31, I = 1306
   | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |  1306>=128, encode(154), I=1306/128
   | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |  10<128, encode(10), done
   +---+---+---+---+---+---+---+---+

从下一个N bits中解码I的伪代码：

if I < 2^N - 1, return I
   else
       M = 0
       repeat
           B = next octet
           I = I + (B & 127) * 2^M
           M = M + 7
       while B & 128 == 128
       return I

2. String Literal Representation

Header字段的name和value可以使用字符串字面量。一个字符串可以使用8位字节或使用Huffman编码将字符串编码成8位字节序列。

String Literal编码按照Appendix B中所定义Huffman编码进行编码。

由于Huffman编码的数据并不总是在8位字节的边界结束，因此在其后插入EOS(end of string)符号对应的最高有效位。

解码时，编码数据末尾的不完整数据被当做填充和丢弃。大于7 bits长度的填充被当做解码错误，与EOS 符号的代码的最高有效位不对应的填充必须被视为解码错误。包含EOS符号的霍夫曼编码的字符串字面必须被视为解码错误。

  0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | H |    String Length (7+)     |
   +---+---------------------------+
   |  String Data (Length octets)  |
   +-------------------------------+
   
   String Literal Representation

如上图，H是String Data是否使用Huffman编码的标志，如果H等于0则使用原始字符串，如果H等于1，则使用Huffman编码。

Binary Format

1. 索引Header字段表示

索引header字段表示可表示static Table或者dynamic table中的条目。

   0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 1 |        Index (7+)         |
   +---+---------------------------+
   
        Indexed Header Field

如果index为0，则解码错误。

2. 字面量header字段表示

2.1 带有增加索引的字面量header字段

  0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 |      Index (6+)       |
   +---+---+-----------------------+
   | H |     Value Length (7+)     |
   +---+---------------------------+
   | Value String (Length octets)  |
   +-------------------------------+
   
   Literal Header Field with Incremental Indexing -- Indexed
                                   Name

前两个bit为01为标识，通过index从index address space中获取到name，会将解码的结果加入到解码header列表并插入到dynamic table中。

   0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 1 |           0           |
   +---+---+-----------------------+
   | H |     Name Length (7+)      |
   +---+---------------------------+
   |  Name String (Length octets)  |
   +---+---------------------------+
   | H |     Value Length (7+)     |
   +---+---------------------------+
   | Value String (Length octets)  |
   +-------------------------------+
   
   Literal Header Field with Incremental Indexing -- New Name

如果index值为0，那么接下来的字节表示的是name的String Literal Representation。

2.2 没有索引的字面header字段

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 0 | 0 |  Index (4+)   |
   +---+---+-----------------------+
   | H |     Value Length (7+)     |
   +---+---------------------------+
   | Value String (Length octets)  |
   +-------------------------------+
   
   Literal Header Field without Indexing -- Indexed Name

   0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 0 | 0 |       0       |
   +---+---+-----------------------+
   | H |     Name Length (7+)      |
   +---+---------------------------+
   |  Name String (Length octets)  |
   +---+---------------------------+
   | H |     Value Length (7+)     |
   +---+---------------------------+
   | Value String (Length octets)  |
   +-------------------------------+
   
   Literal Header Field without Indexing -- New Name

前4个bit0000为标识。

同带有增加索引的字面量header字段的区别就是不会加入到dynamic table中。

2.3 从不索引的字面header字段

  0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 0 | 1 |  Index (4+)   |
   +---+---+-----------------------+
   | H |     Value Length (7+)     |
   +---+---------------------------+
   | Value String (Length octets)  |
   +-------------------------------+
   Literal Header Field Never Indexed -- Indexed Name

  0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 0 | 1 |       0       |
   +---+---+-----------------------+
   | H |     Name Length (7+)      |
   +---+---------------------------+
   |  Name String (Length octets)  |
   +---+---------------------------+
   | H |     Value Length (7+)     |
   +---+---------------------------+
   | Value String (Length octets)  |
   +-------------------------------+


    Literal Header Field Never Indexed -- New Name

前4bit0001为标识，编码格式同没有索引的字面header字段相同。

dynmic table update

设置dynamic table max size。

  0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 0 | 0 | 1 |   Max size (5+)   |
   +---+---------------------------+
    Maximum Dynamic Table Size Change

前3bit001为标识。

twitter hpack

twitter hpack是twitter团队用java实现的hpack

其所有的类如下：

com/twitter/hpack/
 - Decoder
 - DynamicTable
 - Encoder
 - HeaderField
 - HeaderListener
 - HpackUtil
 - Huffman
 - HuffmanDecoder
 - HuffmanEncoder
 - StaticTable

使用流程如下:

try {
  int maxHeaderSize = 4096;
  int maxHeaderTableSize = 4096;
  byte[] name = "name".getBytes();
  byte[] value = "value".getBytes();
  boolean sensitive = false;


  ByteArrayOutputStream out = new ByteArrayOutputStream();


  // encode header list into header block
  Encoder encoder = new Encoder(maxHeaderTableSize);
  encoder.encodeHeader(out, name, value, sensitive);


  ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());


  HeaderListener listener = new HeaderListener() {
    @Override
    public void addHeader(byte[] name, byte[] value, boolean sensitive) {
      // handle header field
    }
  };


  // decode header list from header block
  Decoder decoder = new Decoder(maxHeaderSize, maxHeaderTableSize);
  decoder.decode(in, listener);
  decoder.endHeaderBlock();
} catch (IOException e) {
  // handle exception
}

encodeHeader源码解析

 /**
   * Encode the header field into the header block.
   */
  public void encodeHeader(OutputStream out, byte[] name, byte[] value, boolean sensitive) throws IOException {


    // If the header value is sensitive then it must never be indexed
    if (sensitive) { //判断是否是敏感字符，敏感字符就使用 从不索引的字面header字段
      int nameIndex = getNameIndex(name);//从Static table或者dynamic table中获取name对应的索引
      encodeLiteral(out, name, value, IndexType.NEVER, nameIndex);
      return;
    }


    // If the peer will only use the static table
    if (capacity == 0) {//只使用Static table
      int staticTableIndex = StaticTable.getIndex(name, value);
      if (staticTableIndex == -1) {//由于static table有些value预定义时为空字符，增加自定义value时会找不到对应的条目，所以要将value也添加到压缩数据中
        int nameIndex = StaticTable.getIndex(name);
        encodeLiteral(out, name, value, IndexType.NONE, nameIndex);//由于只使用static table，所以这里选择 没有索引的字面header字段
      } else {//static table中可以找到对应的name-value键值对，直接写入索引就可以
        encodeInteger(out, 0x80, 7, staticTableIndex);
      }
      return;
    }


    int headerSize = HeaderField.sizeOf(name, value);


    // If the headerSize is greater than the max table size then it must be encoded literally
    if (headerSize > capacity) {
      int nameIndex = getNameIndex(name);
      encodeLiteral(out, name, value, IndexType.NONE, nameIndex);
      return;
    }


    HeaderEntry headerField = getEntry(name, value);//从dynamic table找对应的name-value
    if (headerField != null) {//找到了就将在dynamic table的index+static table的长度写入压缩pack
      int index = getIndex(headerField.index) + StaticTable.length;
      // Section 6.1. Indexed Header Field Representation
      encodeInteger(out, 0x80, 7, index);
    } else {//dynamic table中没有找到再从static table中查找
      int staticTableIndex = StaticTable.getIndex(name, value);
      if (staticTableIndex != -1) {
        // Section 6.1. Indexed Header Field Representation
        encodeInteger(out, 0x80, 7, staticTableIndex);
      } else {
        int nameIndex = getNameIndex(name);
        if (useIndexing) {//使用 带有增加索引的字面量header字段
          ensureCapacity(headerSize); //确定dynamic table是否有足够的空间存入该header filed，不够会溢出最老的header
        }
        IndexType indexType = useIndexing ? IndexType.INCREMENTAL : IndexType.NONE;
        encodeLiteral(out, name, value, indexType, nameIndex);
        if (useIndexing) {//添加到dynamic table中
          add(name, value);
        }
      }
    }
  }
  
  /**
   * Encode literal header field according to Section 6.2.
   */
  private void encodeLiteral(OutputStream out, byte[] name, byte[] value, IndexType indexType, int nameIndex)
      throws IOException {
    int mask;
    int prefixBits;
    switch(indexType) {
    case INCREMENTAL://带有增加索引的字面量header字段
      mask = 0x40; //01 xxxxxx
      prefixBits = 6;
      break;
    case NONE://没有索引的字面header字段
      mask = 0x00; // 0000 xxxx
      prefixBits = 4;
      break;
    case NEVER://从不索引的字面header字段
      mask = 0x10; // 0001 xxxx
      prefixBits = 4;
      break;
    default:
      throw new IllegalStateException("should not reach here");
    }
    encodeInteger(out, mask, prefixBits, nameIndex == -1 ? 0 : nameIndex);//编码整数,nameIndex为整数
    if (nameIndex == -1) {//如果nameIndex == -1，即索引表中没有name，那么就使用字面量的方式将那么增加到压缩数据中
      encodeStringLiteral(out, name);
    }
    encodeStringLiteral(out, value);//编码value
  }


  private int getNameIndex(byte[] name) {
    int index = StaticTable.getIndex(name);//从Static table中获取索引
    if (index == -1) {
      index = getIndex(name); //从dynamic table中获取索引
      if (index >= 0) {
        index += StaticTable.length;//dynamic中的索引加上static table的长度
      }
    }
    return index;
  }
  
   /**
   * Encode integer according to Section 5.1.
   */
  private static void encodeInteger(OutputStream out, int mask, int n, int i) throws IOException {
    if (n < 0 || n > 8) {
      throw new IllegalArgumentException("N: " + n);
    }
    int nbits = 0xFF >>> (8 - n); //等价于 nbits = 2^n-1
    if (i < nbits) {
      out.write(mask | i);
    } else {
      out.write(mask | nbits);
      int length = i - nbits;
      while (true) {
        if ((length & ~0x7F) == 0) { //等价于 length>=128
          out.write(length);
          return;
        } else {
          out.write((length & 0x7F) | 0x80); //等价于 (I % 128 + 128)
          length >>>= 7;
        }
      }
    }
  }

decode源码解析

由于decode就是encode的逆过程，看懂了encode，看decode会轻松一些，因此以下只是部分代码解析

public void decode(InputStream in, HeaderListener headerListener) throws IOException {
    while (in.available() > 0) {
      switch(state) {
      case READ_HEADER_REPRESENTATION:
        byte b = (byte) in.read();
        if (maxDynamicTableSizeChangeRequired && (b & 0xE0) != 0x20) {//dynamic stable size update的前3bit为001，将0xE0和0x20转成2进制很容看出
          // Encoder MUST signal maximum dynamic table size change
          throw MAX_DYNAMIC_TABLE_SIZE_CHANGE_REQUIRED;
        }
        if (b < 0) {//b < 0 说明最高位为1，说明是Indexed Header Filed
          // Indexed Header Field
          index = b & 0x7F;//获取低7位的值
          if (index == 0) {
            throw ILLEGAL_INDEX_VALUE;
          } else if (index == 0x7F) {//低7位全是1，说明index > 2^7 - 1，需要从后续的流中继续解码
            state = State.READ_INDEXED_HEADER;
          } else {//index < 2^7 - 1，直接从index address space(static table + dynamic table)中查询出来
            indexHeader(index, headerListener);
          }
        } else if ((b & 0x40) == 0x40) {// 01 xxxxxx,带有增加索引的字面量header字段
          // Literal Header Field with Incremental Indexing
          indexType = IndexType.INCREMENTAL;
          index = b & 0x3F;//获取低6位的值
          if (index == 0) {//有new name
            state = State.READ_LITERAL_HEADER_NAME_LENGTH_PREFIX;
          } else if (index == 0x3F) { //低6位全是1，说明index > 2^6 - 1，需要从后续的流中继续解码
            state = State.READ_INDEXED_HEADER_NAME;
          } else {//index < 2^6 - 1，直接从index address space(static table + dynamic table)中查询出来
            // Index was stored as the prefix
            readName(index);
            state = State.READ_LITERAL_HEADER_VALUE_LENGTH_PREFIX;//接着获取value字符的长度
          }
        } else if ((b & 0x20) == 0x20) {//001 00000，更新dynamic table size
          // Dynamic Table Size Update
          index = b & 0x1F;
          if (index == 0x1F) {
            state = State.READ_MAX_DYNAMIC_TABLE_SIZE;
          } else {
            setDynamicTableSize(index);
            state = State.READ_HEADER_REPRESENTATION;
          }
        } else {//由于 从不索引的字面header字段和没有索引的字面header字段的编码规则一样，所以放到一起
          // Literal Header Field without Indexing / never Indexed
          indexType = ((b & 0x10) == 0x10) ? IndexType.NEVER : IndexType.NONE;
          index = b & 0x0F;
          if (index == 0) {
            state = State.READ_LITERAL_HEADER_NAME_LENGTH_PREFIX;
          } else if (index == 0x0F) {
            state = State.READ_INDEXED_HEADER_NAME;
          } else {
            // Index was stored as the prefix
            readName(index);
            state = State.READ_LITERAL_HEADER_VALUE_LENGTH_PREFIX;
          }
        }
        break;


      default:
        throw new IllegalStateException("should not reach here");
      }
    }
  }