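Okio Source Code Analysis (Part 1): The Data Read Path

Introduction

Okio is a Java I/O library developed by Square, and it is also a component that OkHttp uses internally. Okio builds on java.io and java.nio and offers several advantages:

Built-in timeout support
No need to distinguish manually between byte streams and character streams, which makes it easy to use
Easy to test

This article first introduces Okio's basic usage and then walks through the data read path in the source code.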

Basic Usage

Okio is simple to use. The following example reads from an InputStream and writes to an OutputStream:

// Read: readUtf8() consumes the whole stream and decodes it as UTF-8
InputStream inputStream = ...
BufferedSource bufferedSource = Okio.buffer(Okio.source(inputStream));
String content = bufferedSource.readUtf8();

// Write: close() flushes buffered bytes to the underlying stream
OutputStream outputStream = ...
BufferedSink bufferedSink = Okio.buffer(Okio.sink(outputStream));
bufferedSink.writeString("test", Charset.defaultCharset());
bufferedSink.close();

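Okio wraps an InputStream with Okio.source and an OutputStream with Okio.sink. Passing the result to Okio.buffer yields a BufferedSource or a BufferedSink respectively, and these two types provide a large set of methods for reading and writing data. Some of the methods declared by BufferedSource are: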

int readInt() throws IOException;
long readLong() throws IOException;
byte readByte() throws IOException;
ByteString readByteString() throws IOException;
String readUtf8() throws IOException;
String readString(Charset charset) throws IOException;

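These include methods for reading raw bytes as well as methods for reading text. BufferedSink provides the corresponding write methods.

Basic Architecture

Okio has four core interfaces: Source, Sink, BufferedSource, and BufferedSink. Source supplies a stream of bytes and Sink receives one, so they correspond to InputStream and OutputStream. BufferedSource and BufferedSink additionally hold buffered data so that reads and writes are efficient. The inheritance relationships among these interfaces are shown below: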

[Figure: inheritance relationships among Source, Sink, BufferedSource, and BufferedSink]

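As the diagram shows, Source and Sink provide the basic read and write methods, while BufferedSource and BufferedSink add many more methods for working with data. All four are interfaces; the classes that actually implement the buffered variants are RealBufferedSource and RealBufferedSink.

There is also the Buffer class, which implements both BufferedSource and BufferedSink. RealBufferedSource and RealBufferedSink each contain a Buffer object, and the actual data operations are delegated to that Buffer.

Because reading and writing work similarly, the rest of this article analyzes the code along the read path.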

Okio.source

Okio.source has several overloads for wrapping an input stream. They all end up calling the following method:

private static Source source(final InputStream in, final Timeout timeout) {
    if (in == null) throw new IllegalArgumentException("in == null");
    if (timeout == null) throw new IllegalArgumentException("timeout == null");

    return new Source() {
      @Override public long read(Buffer sink, long byteCount) throws IOException {
        if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0: " + byteCount);
        if (byteCount == 0) return 0;
        try {
          timeout.throwIfReached();
          Segment tail = sink.writableSegment(1);
          int maxToCopy = (int) Math.min(byteCount, Segment.SIZE - tail.limit);
          int bytesRead = in.read(tail.data, tail.limit, maxToCopy);
          if (bytesRead == -1) return -1;
          tail.limit += bytesRead;
          sink.size += bytesRead;
          return bytesRead;
        } catch (AssertionError e) {
          if (isAndroidGetsocknameError(e)) throw new IOException(e);
          throw e;
        }
      }

      @Override public void close() throws IOException {
        in.close();
      }

      @Override public Timeout timeout() {
        return timeout;
      }

      @Override public String toString() {
        return "source(" + in + ")";
      }
    };
  }

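As the code shows, Okio.source takes two parameters, an InputStream and a Timeout, rejects null for either, and returns an anonymous implementation of Source. The read method is the part of interest here: it validates byteCount, checks the timeout, and then reads data from in into sink, which is a Buffer. This code involves both Buffer and Segment, so let's look at those two classes first.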
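To make the bounded read concrete, here is a minimal sketch that uses only java.io. ChunkSketch and readFrom are hypothetical names invented for this illustration; they stand in for a Segment and for the read method above and are not part of Okio. The point is the maxToCopy calculation: a single call never reads more than the caller asked for, and never more than the free space left in the current chunk.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for a segment: a fixed-size array plus a write cursor. */
final class ChunkSketch {
  static final int SIZE = 8192;
  final byte[] data = new byte[SIZE];
  int limit; // Next writable index in data.

  /**
   * Reads at most byteCount bytes, but never more than the free space in this chunk.
   * Returns the number of bytes read, or -1 at end of stream.
   */
  long readFrom(InputStream in, long byteCount) throws IOException {
    if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0: " + byteCount);
    if (byteCount == 0) return 0;
    int maxToCopy = (int) Math.min(byteCount, SIZE - limit);
    int bytesRead = in.read(data, limit, maxToCopy);
    if (bytesRead == -1) return -1;
    limit += bytesRead;
    return bytesRead;
  }
}
```

Unlike the real code, this sketch does not obtain a fresh chunk when the current one is full; in Okio that job belongs to writableSegment, which is covered later in this article.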

Segment

In Okio, each Segment represents a chunk of data, and multiple Segments are linked into a circular doubly linked list. Here are Segment's fields and constructors:

final class Segment {
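  // Size of every segment's byte array, in bytes
  static final int SIZE = 8192;
  // Minimum number of bytes for which a segment is shared rather than copied
  static final int SHARE_MINIMUM = 1024;
  // The bytes actually stored
  final byte[] data;
  // Next position to read from
  int pos;
  // Next position to write to
  int limit;
  // Whether the byte array is shared with other segments
  boolean shared;
  // Whether this segment owns the byte array and may append to it
  boolean owner;
  // Next node in the linked list
  Segment next;
  // Previous node in the linked list
  Segment prev;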

  Segment() {
    this.data = new byte[SIZE];
    this.owner = true;
    this.shared = false;
  }

  Segment(Segment shareFrom) {
    this(shareFrom.data, shareFrom.pos, shareFrom.limit);
    shareFrom.shared = true;
  }

  Segment(byte[] data, int pos, int limit) {
    this.data = data;
    this.pos = pos;
    this.limit = limit;
    this.owner = false;
    this.shared = true;
  }
  ...
}

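The comments describe each field: a Segment stores its data in a byte array and keeps a few fields that mark the read and write positions. Since a Segment is also a node in a linked list, here are its insertion and removal methods: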

// Inserts a Segment after the current one and returns the inserted Segment
public Segment push(Segment segment) {
  segment.prev = this;
  segment.next = next;
  next.prev = segment;
  next = segment;
  return segment;
}

// Removes the current Segment from the list and returns its successor,
// or null if it was the only node in the list
public @Nullable Segment pop() {
  Segment result = next != this ? next : null;
  prev.next = next;
  next.prev = prev;
  next = null;
  prev = null;
  return result;
}

Insertion and removal are the standard operations on a doubly linked list.
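The pointer updates are easier to follow on a tiny ring. The sketch below repeats the push and pop logic on a hypothetical RingNode class that carries a label instead of bytes. One assumption differs from the source: here a new node links to itself in its constructor, whereas in Okio the Buffer sets up those self-links. The describe helper is also invented for the illustration.

```java
/** Hypothetical node type for illustration only; not Okio's Segment. */
final class RingNode {
  final String label;
  RingNode next;
  RingNode prev;

  RingNode(String label) {
    this.label = label;
    // A single node forms a ring by pointing at itself (assumption of this sketch).
    this.next = this;
    this.prev = this;
  }

  /** Inserts node directly after this one and returns the inserted node. */
  RingNode push(RingNode node) {
    node.prev = this;
    node.next = next;
    next.prev = node;
    next = node;
    return node;
  }

  /** Unlinks this node and returns its successor, or null if it was the only node. */
  RingNode pop() {
    RingNode result = next != this ? next : null;
    prev.next = next;
    next.prev = prev;
    next = null;
    prev = null;
    return result;
  }

  /** Walks the ring once, starting at head, and joins the labels with "->". */
  static String describe(RingNode head) {
    if (head == null) return "";
    StringBuilder sb = new StringBuilder(head.label);
    for (RingNode n = head.next; n != head; n = n.next) {
      sb.append("->").append(n.label);
    }
    return sb.toString();
  }
}
```

Because the list is circular, head.prev is always the tail, so appending at the end takes the form head.prev.push(node) without walking the list.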

Buffer

Next, let's see how Buffer uses Segment. Buffer has two important fields:

@Nullable Segment head;
long size;

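head is the head node of the Segment linked list held by this Buffer, and size records the number of bytes currently in the Buffer.

In the read method of the anonymous Source created by Okio.source above, reading data into the Buffer starts with a call to writableSegment, which returns a Segment that can be written to: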

  Segment writableSegment(int minimumCapacity) {
    if (minimumCapacity < 1 || minimumCapacity > Segment.SIZE) throw new IllegalArgumentException();

    if (head == null) {
      head = SegmentPool.take(); // Acquire a first segment.
      return head.next = head.prev = head;
    }

    Segment tail = head.prev;
    if (tail.limit + minimumCapacity > Segment.SIZE || !tail.owner) {
      tail = tail.push(SegmentPool.take()); // Append a new empty segment to fill up.
    }
    return tail;
  }

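writableSegment first checks whether the Buffer has any Segment yet. If not, it takes one from SegmentPool and links it to itself to form a circular list. Otherwise it looks at the tail Segment (head.prev) and checks whether it has enough free space and owns its byte array; if either check fails, it takes a new Segment from SegmentPool and appends it to the end. Finally it returns the tail Segment for writing.

SegmentPool holds discarded Segments for reuse. It has two methods: take obtains a Segment from the pool, and recycle returns one to it.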
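The take and recycle idea can be sketched in a few lines. ChunkPoolSketch below is a hypothetical, single-threaded pool of byte arrays; it is not Okio's SegmentPool, and the MAX_POOLED cap and the allocations counter are assumptions added only to make the behavior observable.

```java
import java.util.ArrayDeque;

/** Hypothetical single-threaded recycle pool for illustration; not Okio's SegmentPool. */
final class ChunkPoolSketch {
  static final int CHUNK_SIZE = 8192;
  /** Illustrative cap on pooled chunks; an assumption of this sketch. */
  static final int MAX_POOLED = 8;

  private final ArrayDeque<byte[]> free = new ArrayDeque<>();
  int allocations; // Counts how many arrays were actually allocated.

  /** Returns a pooled chunk if one is available, otherwise allocates a new one. */
  byte[] take() {
    byte[] chunk = free.pollFirst();
    if (chunk != null) return chunk;
    allocations++;
    return new byte[CHUNK_SIZE];
  }

  /** Returns a chunk to the pool, or drops it if the pool is already full. */
  void recycle(byte[] chunk) {
    if (chunk.length != CHUNK_SIZE) throw new IllegalArgumentException("unexpected chunk size");
    if (free.size() >= MAX_POOLED) return; // Pool is full: leave the chunk to the GC.
    free.addFirst(chunk);
  }
}
```

A take that follows a recycle hands back the same array, so a steady read loop can keep reusing a small set of arrays instead of allocating a new 8 KiB array for every chunk.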

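The Okio.buffer(Okio.source(in)) call shown earlier ultimately returns a RealBufferedSource. This class holds a Buffer and a Source, and the two cooperate to perform the actual read. Here is its readString method, together with the helpers it relies on: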

@Override public String readString(long byteCount, Charset charset) throws IOException {
  require(byteCount);
  if (charset == null) throw new IllegalArgumentException("charset == null");
  return buffer.readString(byteCount, charset);
}

@Override public void require(long byteCount) throws IOException {
  if (!request(byteCount)) throw new EOFException();
}

// Reads from source into buffer until at least byteCount bytes are buffered;
// returns false if the source is exhausted first
@Override public boolean request(long byteCount) throws IOException {
  if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0: " + byteCount);
  if (closed) throw new IllegalStateException("closed");
  while (buffer.size < byteCount) {
    if (source.read(buffer, Segment.SIZE) == -1) return false;
  }
  return true;
}

RealBufferedSource first reads data from the Source into the Buffer and then calls buffer.readString to produce the final string. Here is Buffer.readString:

@Override public String readString(long byteCount, Charset charset) throws EOFException {
    checkOffsetAndCount(size, 0, byteCount);
    if (charset == null) throw new IllegalArgumentException("charset == null");
    if (byteCount > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("byteCount > Integer.MAX_VALUE: " + byteCount);
    }
    if (byteCount == 0) return "";

    Segment s = head;
    if (s.pos + byteCount > s.limit) {
      // If the string spans multiple Segments, delegate to readByteArray
      return new String(readByteArray(byteCount), charset);
    }
    // Decode the bytes into a String
    String result = new String(s.data, s.pos, (int) byteCount, charset);
    s.pos += byteCount;
    size -= byteCount;

    // If pos == limit, the Segment is fully consumed, so recycle it
    if (s.pos == s.limit) {
      head = s.pop();
      SegmentPool.recycle(s);
    }

    return result;
}

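This code reads data out of the Buffer's Segment list. If the String spans multiple Segments, it calls readByteArray, which gathers the bytes in a loop, and the byte sequence is then converted into a String. If the Segment's pos equals its limit, all of its data has been consumed, so it can be recycled into SegmentPool.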
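The request-then-decode flow can be reproduced with nothing but java.io. In the sketch below, RequestSketch is a hypothetical class and ByteArrayOutputStream stands in for Okio's Buffer, so it copies bytes where Okio would only adjust segment positions. It is meant to show the control flow (fill until enough bytes are buffered, fail with EOFException if the source runs dry, then decode), not the efficient implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

/** Hypothetical illustration only: ByteArrayOutputStream stands in for Okio's Buffer. */
final class RequestSketch {
  private final InputStream source;
  private final ByteArrayOutputStream buffered = new ByteArrayOutputStream();
  private final byte[] scratch = new byte[8192];

  RequestSketch(InputStream source) {
    this.source = source;
  }

  /** Pulls from the source until byteCount bytes are buffered; returns false on early EOF. */
  boolean request(long byteCount) throws IOException {
    if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0: " + byteCount);
    while (buffered.size() < byteCount) {
      int n = source.read(scratch, 0, scratch.length);
      if (n == -1) return false;
      buffered.write(scratch, 0, n);
    }
    return true;
  }

  /** Decodes exactly byteCount buffered bytes and keeps the remainder for later reads. */
  String readString(int byteCount, Charset charset) throws IOException {
    if (!request(byteCount)) throw new EOFException();
    byte[] all = buffered.toByteArray();
    String result = new String(all, 0, byteCount, charset);
    buffered.reset();
    // Re-buffer the unread tail; this copy is the kind of work a segment list avoids.
    buffered.write(all, byteCount, all.length - byteCount);
    return result;
  }
}
```

Note that request may buffer more bytes than were asked for, just as the real loop reads up to Segment.SIZE bytes per call; the surplus simply stays buffered for the next read.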

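When reading, Okio treats every input stream as a sequence of bytes; the bytes are read into the Buffer and converted only when they are used, as with the String conversion above. Many other types are supported. For example, here is readInt: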

@Override public int readInt() {
    if (size < 4) throw new IllegalStateException("size < 4: " + size);

    Segment segment = head;
    int pos = segment.pos;
    int limit = segment.limit;

    // If the int is split across multiple segments, delegate to readByte().
    if (limit - pos < 4) {
      return (readByte() & 0xff) << 24
          |  (readByte() & 0xff) << 16
          |  (readByte() & 0xff) <<  8
          |  (readByte() & 0xff);
    }

    byte[] data = segment.data;
    int i = (data[pos++] & 0xff) << 24
        |   (data[pos++] & 0xff) << 16
        |   (data[pos++] & 0xff) <<  8
        |   (data[pos++] & 0xff);
    size -= 4;

    if (pos == limit) {
      head = segment.pop();
      SegmentPool.recycle(segment);
    } else {
      segment.pos = pos;
    }

    return i;
}
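The shift-and-mask expression in readInt assembles four bytes in big-endian order. The standalone sketch below isolates that arithmetic; BigEndianSketch and both method names are hypothetical. The second method deliberately omits the & 0xff mask to show why it is needed: Java bytes are signed, so an unmasked byte with its high bit set is sign-extended and overwrites the higher bits of the result.

```java
/** Hypothetical helper that isolates the big-endian arithmetic used by readInt. */
final class BigEndianSketch {
  /** Combines data[pos..pos+3] into an int, most significant byte first. */
  static int readInt(byte[] data, int pos) {
    return (data[pos] & 0xff) << 24
        | (data[pos + 1] & 0xff) << 16
        | (data[pos + 2] & 0xff) << 8
        | (data[pos + 3] & 0xff);
  }

  /** Same expression without the masks; wrong whenever a lower byte has its high bit set. */
  static int readIntUnmasked(byte[] data, int pos) {
    return data[pos] << 24
        | data[pos + 1] << 16
        | data[pos + 2] << 8
        | data[pos + 3];
  }
}
```

For the bytes 00 00 00 80, the masked version yields 128, while the unmasked version yields -128 because the last byte is widened to 0xffffff80 before the OR.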

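Storing data in a linked list of Segments has another benefit: data can be passed between Buffers without copying the underlying bytes. Moving data hands whole Segments from one Buffer to the other, and copying lets the new Segments share the original byte arrays, as copyTo(Buffer out, long offset, long byteCount) shows: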

public Buffer copyTo(Buffer out, long offset, long byteCount) {
    if (out == null) throw new IllegalArgumentException("out == null");
    checkOffsetAndCount(size, offset, byteCount);
    if (byteCount == 0) return this;

    out.size += byteCount;

    // Skip segments that we aren't copying from.
    Segment s = head;
    for (; offset >= (s.limit - s.pos); s = s.next) {
      offset -= (s.limit - s.pos);
    }

    // Copy one segment at a time.
    for (; byteCount > 0; s = s.next) {
      Segment copy = new Segment(s);
      copy.pos += offset;
      copy.limit = Math.min(copy.pos + (int) byteCount, copy.limit);
      if (out.head == null) {
        out.head = copy.next = copy.prev = copy;
      } else {
        out.head.prev.push(copy);
      }
      byteCount -= copy.limit - copy.pos;
      offset = 0;
    }

    return this;
}

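No byte data is copied here; the method only creates Segments that share the existing byte arrays and links them into the target list.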
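Sharing works because a Segment is only a window (pos and limit) onto a byte array. The sketch below shows the same idea with a hypothetical SharedViewSketch class: slicing produces a second window onto the same array, so no bytes move. The class, its slice method, and its utf8 helper are invented for this illustration and are not Okio APIs.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical read-only window over a byte array; stands in for a shared segment. */
final class SharedViewSketch {
  final byte[] data;
  final int pos;   // First readable index.
  final int limit; // One past the last readable index.

  SharedViewSketch(byte[] data, int pos, int limit) {
    this.data = data;
    this.pos = pos;
    this.limit = limit;
  }

  /** Returns a narrower window over the same array; no bytes are copied. */
  SharedViewSketch slice(int offset, int byteCount) {
    if (offset < 0 || byteCount < 0 || offset + byteCount > limit - pos) {
      throw new IndexOutOfBoundsException("offset=" + offset + " byteCount=" + byteCount);
    }
    return new SharedViewSketch(data, pos + offset, pos + offset + byteCount);
  }

  /** Decodes the bytes in this window as UTF-8. */
  String utf8() {
    return new String(data, pos, limit - pos, StandardCharsets.UTF_8);
  }
}
```

The cost of sharing is that a write through one window would be visible through the other, which is why the Segment constructor used by copyTo sets owner to false and writableSegment refuses to append to a Segment that does not own its array.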

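Summary

That is essentially how Okio reads data; writing works in a similar way. Okio uses Source and Sink to represent input and output streams. Buffer stores byte data in a linked list of Segments, shares Segments to avoid copying data, and recycles discarded Segments through SegmentPool to reduce garbage-collection pressure, which makes Okio an efficient I/O library. Another advantage of Okio is its timeout mechanism, which is covered in the next article: Okio Source Code Analysis (Part 2): The Timeout Mechanism.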