Okio Read/Write Stream Source Code Explained (Part 4: GzipSource Decompression)

First, let's find the program's entry point and see how the GzipSource class is used:

private static void zipDecompression() {
    String filePath = "D:/1.txt";
    try {
        Source source = Okio.source(new File(filePath));
        GzipSource responseBody = new GzipSource(source);
        BufferedSource gunzippedSource = Okio.buffer(responseBody);
        System.out.println(gunzippedSource.readUtf8());
    } catch (IOException e) {
        e.printStackTrace();
    }
}
This reads back the gzip data written in the previous article. The read flow roughly mirrors the write flow: compressed bytes are first pulled into GzipSource's internal buffer, decompressed, and then handed over to the BufferedSource to hold. Let's start with BufferedSource's read method:

 @Override public String readUtf8() throws IOException {
    buffer.writeAll(source);
    return buffer.readUtf8();
  }
Next, look at the writeAll method, which is where the parsing branches off from. buffer.readUtf8() simply converts the bytes held in the segment list into a String; that was covered in detail in Part 1.


 public long writeAll(Source source) throws IOException {
    if (source == null) throw new IllegalArgumentException("source == null");
    long totalBytesRead = 0;
    for (long readCount; (readCount = source.read(this, Segment.SIZE)) != -1; ) {
      totalBytesRead += readCount;
    }
    return totalBytesRead;
  }


This method reads all of the file's data into the segment list. The call to source.read is what enters GzipSource's parsing logic, because source here is the GzipSource instance.
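The writeAll loop is the standard exhaust-the-source pattern: keep pulling fixed-size chunks until the source reports -1. A JDK-only analogue (the class and method names below are mine, for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadAllDemo {
    // Mirror Buffer.writeAll: keep reading fixed-size chunks until the
    // source reports -1, accumulating the total byte count.
    static long readAll(InputStream source, ByteArrayOutputStream sink) throws IOException {
        final int SEGMENT_SIZE = 8192; // same as Okio's Segment.SIZE
        byte[] chunk = new byte[SEGMENT_SIZE];
        long totalBytesRead = 0;
        int readCount;
        while ((readCount = source.read(chunk)) != -1) {
            sink.write(chunk, 0, readCount);
            totalBytesRead += readCount;
        }
        return totalBytesRead;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long n = readAll(new ByteArrayInputStream("abcdef".getBytes()), sink);
        System.out.println(n); // prints 6
    }
}
```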

@Override public long read(Buffer sink, long byteCount) throws IOException {
    if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0: " + byteCount);
    if (byteCount == 0) return 0;

    // If we haven't consumed the header, we must consume it before anything else.
    if (section == SECTION_HEADER) {
      // Validate the gzip header first.
      consumeHeader();
      section = SECTION_BODY;
    }

    // Attempt to read at least a byte of the body. If we do, we're done.
    if (section == SECTION_BODY) {
      long offset = sink.size;
      // Decompress the body data.
      long result = inflaterSource.read(sink, byteCount);
      if (result != -1) {
        updateCrc(sink, offset, result);
        return result;
      }
      section = SECTION_TRAILER;
    }

    // The body is exhausted; time to read the trailer. We always consume the
    // trailer before returning a -1 exhausted result; that way if you read to
    // the end of a GzipSource you guarantee that the CRC has been checked.
    if (section == SECTION_TRAILER) {
      consumeTrailer();
      section = SECTION_DONE;

      // Gzip streams self-terminate: they return -1 before their underlying
      // source returns -1. Here we attempt to force the underlying stream to
      // return -1 which may trigger it to release its resources. If it doesn't
      // return -1, then our Gzip data finished prematurely!
      if (!source.exhausted()) {
        throw new IOException("gzip finished without exhausting source");
      }
    }

    return -1;
  }

The first time through, section == SECTION_HEADER, so the header information is validated first. What is this header? The previous article wrote a fixed header during compression, so let's see how it is checked:

/**
 * Parse the header of the compressed stream first.
 * @throws IOException
 */
  private void consumeHeader() throws IOException {
    // Read the 10-byte header. We peek at the flags byte first so we know if we
    // need to CRC the entire header. Then we read the magic ID1ID2 sequence.
    // We can skip everything else in the first 10 bytes.
    // +---+---+---+---+---+---+---+---+---+---+
    // |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
    // +---+---+---+---+---+---+---+---+---+---+
    // Make sure ten bytes can be read into the buffer; if the file doesn't
    // have that many bytes, an EOFException is thrown.
    source.require(10);
    // For reference, GzipSink wrote the fixed header in this order:
    //   Buffer buffer = this.sink.buffer();
    //   buffer.writeShort(0x1f8b); // Two-byte Gzip ID.
    //   buffer.writeByte(0x08); // 8 == Deflate compression method.
    //   buffer.writeByte(0x00); // No flags.
    //   buffer.writeInt(0x00); // No modification time.
    //   buffer.writeByte(0x00); // No extra flags.
    //   buffer.writeByte(0x00); // No OS.
    // The byte at index 3 is the FLG byte, 0x00 for the default header.
    byte flags = source.buffer().getByte(3);
    // (0x00 >> FHCRC) & 1 == 0, so fhcrc is false for the default header.
    boolean fhcrc = ((flags >> FHCRC) & 1) == 1;
    if (fhcrc) updateCrc(source.buffer(), 0, 10);
    // Read two bytes: id1id2 should be 0x1f8b.
    short id1id2 = source.readShort();
    // Verify the gzip magic number.
    checkEqual("ID1ID2", (short) 0x1f8b, id1id2);
    // The remaining eight header bytes (CM, FLG, MTIME, XFL, OS) were all
    // written as fixed or zero values, so they can simply be skipped.
    source.skip(8);

    // Skip optional extra fields.
    // +---+---+=================================+
    // | XLEN  |...XLEN bytes of "extra field"...| (more-->)
    // +---+---+=================================+
    if (((flags >> FEXTRA) & 1) == 1) {
      source.require(2);
      if (fhcrc) updateCrc(source.buffer(), 0, 2);
      int xlen = source.buffer().readShortLe();
      source.require(xlen);
      if (fhcrc) updateCrc(source.buffer(), 0, xlen);
      source.skip(xlen);
    }

    // Skip an optional 0-terminated name.
    // +=========================================+
    // |...original file name, zero-terminated...| (more-->)
    // +=========================================+
    if (((flags >> FNAME) & 1) == 1) {
      long index = source.indexOf((byte) 0);
      if (index == -1) throw new EOFException();
      if (fhcrc) updateCrc(source.buffer(), 0, index + 1);
      source.skip(index + 1);
    }

    // Skip an optional 0-terminated comment.
    // +===================================+
    // |...file comment, zero-terminated...| (more-->)
    // +===================================+
    if (((flags >> FCOMMENT) & 1) == 1) {
      long index = source.indexOf((byte) 0);
      if (index == -1) throw new EOFException();
      if (fhcrc) updateCrc(source.buffer(), 0, index + 1);
      source.skip(index + 1);
    }

    // Confirm the optional header CRC.
    // +---+---+
    // | CRC16 |
    // +---+---+
    if (fhcrc) {
      checkEqual("FHCRC", source.readShortLe(), (short) crc.getValue());
      crc.reset();
    }
  }
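The fixed ten-byte header that consumeHeader checks can be observed by compressing with the JDK's GZIPOutputStream, which emits the same layout (a sketch using java.util.zip rather than Okio; the class name is mine):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipHeaderDemo {
    // Compress data and return the raw gzip bytes, 10-byte header included.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            gos.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = gzip("hello".getBytes());
        // ID1 ID2 = 0x1f 0x8b (magic), CM = 8 (deflate), FLG = 0 (no optional fields).
        System.out.printf("ID1=%02x ID2=%02x CM=%02x FLG=%02x%n",
                gz[0] & 0xff, gz[1] & 0xff, gz[2] & 0xff, gz[3] & 0xff);
        // prints: ID1=1f ID2=8b CM=08 FLG=00
    }
}
```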
The core of consumeHeader is validating the header: the first two bytes identify the data as gzip, and each remaining field is checked against, or skipped as, its expected default. Because every optional field was written as 0x00 (i.e., absent), none of the optional-field branches execute; if you extend the writer with custom header fields, you can add matching checks here. After the two magic bytes are read and the following eight bytes skipped, everything that remains is the body data and the trailer. Next, the decompression begins:

if (section == SECTION_BODY) {
      long offset = sink.size;
      // Decompress the body data.
      long result = inflaterSource.read(sink, byteCount);
      if (result != -1) {
        updateCrc(sink, offset, result);
        return result;
      }
      section = SECTION_TRAILER;
    }
InflaterSource is the class that actually performs the decompression:

public long read(Buffer sink, long byteCount) throws IOException {
    if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0: " + byteCount);
    if (closed) throw new IllegalStateException("closed");
    if (byteCount == 0) return 0;
    // Keep looping until at least one byte is inflated or the stream ends.
    while (true) {
      // Feed the inflater more compressed input if it needs any.
      boolean sourceExhausted = refill();

      // Decompress the inflater's compressed data into the sink.
      try {
        Segment tail = sink.writableSegment(1);
        // Inflate directly into the sink's tail segment.
        int bytesInflated = inflater.inflate(tail.data, tail.limit, Segment.SIZE - tail.limit);
        // bytesInflated is the number of decompressed bytes produced.
        if (bytesInflated > 0) {
          tail.limit += bytesInflated;
          sink.size += bytesInflated;
          return bytesInflated;
        }
        // Either the stream is finished or the inflater needs a preset dictionary.
        if (inflater.finished() || inflater.needsDictionary()) {
          // Skip past the input bytes the inflater has already consumed.
          releaseInflatedBytes();
          if (tail.pos == tail.limit) {
            // We allocated a tail segment, but didn't end up needing it. Recycle!
            sink.head = tail.pop();
            SegmentPool.recycle(tail);
          }
          return -1;
        }
        // The source is exhausted but the stream isn't finished: truncated data.
        if (sourceExhausted) throw new EOFException("source exhausted prematurely");
      } catch (DataFormatException e) {
        throw new IOException(e);
      }
    }
  }
During decompression, refill() keeps feeding the inflater compressed input as long as the file still has data; then inflater.inflate(tail.data, tail.limit, Segment.SIZE - tail.limit) writes the decompressed bytes into the tail segment of the destination buffer's list.
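The same inflate loop can be sketched with java.util.zip directly; nowrap = true matches the raw deflate body inside a gzip stream (the class and helper names below are mine):

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class InflateLoopDemo {
    // Decompress raw deflate data with a loop shaped like InflaterSource.read:
    // inflate into a fixed-size buffer until finished() reports true.
    static byte[] inflate(byte[] deflated) throws DataFormatException {
        Inflater inflater = new Inflater(true); // nowrap: raw deflate, like a gzip body
        // The Inflater docs require an extra "dummy" byte of input when nowrap is true.
        inflater.setInput(Arrays.copyOf(deflated, deflated.length + 1));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] segment = new byte[8192]; // stand-in for one Okio Segment
        while (!inflater.finished()) {
            int n = inflater.inflate(segment, 0, segment.length);
            out.write(segment, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        // Produce a raw deflate body, the same format GzipSink writes after its header.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput("okio gzip body".getBytes());
        deflater.finish();
        byte[] buf = new byte[256];
        int len = deflater.deflate(buf);
        System.out.println(new String(inflate(Arrays.copyOf(buf, len)))); // prints "okio gzip body"
    }
}
```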

Now look at refill() to see how the input is supplied:

/**
 * Feeds the inflater more compressed data if it needs any.
 * @return true if the underlying source is exhausted.
 * @throws IOException
 */
  public boolean refill() throws IOException {
    // Nothing to do if the inflater still has unconsumed input.
    if (!inflater.needsInput()) return false;

    // Skip past the input bytes the inflater has already consumed.
    releaseInflatedBytes();
    if (inflater.getRemaining() != 0) throw new IllegalStateException("?"); // TODO: possible?

    // We've reached the end of the file: report the source as exhausted.
    if (source.exhausted()) return true;

    // Assign buffer bytes to the inflater.
    Segment head = source.buffer().head;
    // Record how many bytes the inflater is now holding.
    bufferBytesHeldByInflater = head.limit - head.pos;
    // Hand the head segment's bytes to the inflater as its input.
    inflater.setInput(head.data, head.pos, bufferBytesHeldByInflater);
    return false;
  }

Finally, let's see what releaseInflatedBytes does:
  private void releaseInflatedBytes() throws IOException {
    if (bufferBytesHeldByInflater == 0) return;
    // inflater.getRemaining() is how many input bytes are still unconsumed.
    int toRelease = bufferBytesHeldByInflater - inflater.getRemaining();
    bufferBytesHeldByInflater -= toRelease;
    // Skip the bytes the inflater has already consumed.
    source.skip(toRelease);
  }

It computes how many of the bytes handed to the inflater have actually been consumed, updates the bookkeeping, and skips that many bytes in the source buffer, keeping the buffer's position in sync with the inflater.
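The toRelease arithmetic can be checked with a small sketch: feed the inflater a deflate body plus eight extra bytes standing in for the gzip trailer; once inflation finishes, getRemaining() reports exactly the unconsumed trailer (the class and names below are mine):

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class RemainingDemo {
    // Returns how many input bytes the inflater left unconsumed after finishing.
    static int remainingAfterInflate(byte[] body, int trailerBytes) throws DataFormatException {
        // Append dummy trailer bytes after the deflate body, like a gzip stream.
        byte[] input = Arrays.copyOf(body, body.length + trailerBytes);
        Inflater inflater = new Inflater(true);
        inflater.setInput(input);
        byte[] out = new byte[8192];
        while (!inflater.finished()) {
            inflater.inflate(out);
        }
        // bufferBytesHeldByInflater - getRemaining() == bytes actually consumed,
        // which is exactly what releaseInflatedBytes() skips in the source buffer.
        return inflater.getRemaining();
    }

    public static void main(String[] args) throws DataFormatException {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput("trailer demo".getBytes());
        deflater.finish();
        byte[] buf = new byte[256];
        int len = deflater.deflate(buf);
        // Pretend 8 trailer bytes (CRC32 + ISIZE) follow the deflate body.
        System.out.println(remainingAfterInflate(Arrays.copyOf(buf, len), 8)); // prints 8
    }
}
```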

These few methods are enough to see how decompression is implemented:

1. Read the fixed header written during compression and validate it; if it passes, move on to the next step.

2. Skip the remaining header bytes, then loop, feeding the compressed data into the Inflater; each pass decompresses at most one Segment's worth of data.

3. The decompressed data ends up in the segment list held by the BufferedSource implementation.

4. The eight trailer bytes were written uncompressed, so the inflater stops before them; when decompression ends, the source buffer's position sits right at the start of the trailer.
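The whole flow above can be exercised end to end with the JDK's gzip streams, which produce and consume the same format as GzipSink/GzipSource (a sketch; the class name is mine):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTripDemo {
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            gos.write(data);
        }
        return bos.toByteArray();
    }

    static byte[] gunzip(byte[] gz) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] segment = new byte[8192];
            int n;
            while ((n = gis.read(segment)) != -1) {
                bos.write(segment, 0, n); // header checked, body inflated, trailer verified
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "hello gzip";
        System.out.println(new String(gunzip(gzip(text.getBytes())))); // prints "hello gzip"
    }
}
```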

Once decompression is finished, this method runs last:

 private void consumeTrailer() throws IOException {
    // Read the eight-byte trailer. Confirm the body's CRC and size.
    // +---+---+---+---+---+---+---+---+
    // |     CRC32     |     ISIZE     |
    // +---+---+---+---+---+---+---+---+
    checkEqual("CRC", source.readIntLe(), (int) crc.getValue());
    checkEqual("ISIZE", source.readIntLe(), (int) inflater.getBytesWritten());
  }
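Both trailer fields live in the last eight bytes of the stream, little-endian, matching the readIntLe() calls above. A sketch that recomputes them with java.util.zip (the class and names are mine):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;

public class GzipTrailerDemo {
    // Read the CRC32 and ISIZE fields from the last 8 bytes of a gzip stream
    // (both little-endian, matching source.readIntLe() in consumeTrailer).
    static int[] trailer(byte[] gz) {
        ByteBuffer bb = ByteBuffer.wrap(gz, gz.length - 8, 8).order(ByteOrder.LITTLE_ENDIAN);
        return new int[] { bb.getInt(), bb.getInt() };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "check the trailer".getBytes();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            gos.write(data);
        }
        byte[] gz = bos.toByteArray();

        CRC32 crc = new CRC32();
        crc.update(data);
        int[] t = trailer(gz);
        System.out.println(t[0] == (int) crc.getValue()); // CRC matches: prints true
        System.out.println(t[1] == data.length);          // ISIZE matches: prints true
    }
}
```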
consumeTrailer validates the trailer: the CRC32 of the decompressed body and the decompressed size (ISIZE). At this point the file's entire content has been read into the segment list, and converting those bytes to a String finishes the job. That completes the GzipSource source-code walkthrough.





