As before, start from the program's entry point and see how the GzipSource class is used:
import java.io.File;
import java.io.IOException;
import okio.BufferedSource;
import okio.GzipSource;
import okio.Okio;

private static void zipDecompression() {
String filePath = "D:/1.txt";
// Raw file source -> GzipSource (inflates) -> BufferedSource (adds readUtf8()).
// try-with-resources closes the whole chain, including the underlying file.
try (BufferedSource gunzippedSource = Okio.buffer(new GzipSource(Okio.source(new File(filePath))))) {
System.out.println(gunzippedSource.readUtf8());
} catch (IOException e) {
e.printStackTrace();
}
}
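This reads back the gzip data written in the previous article. The read path mirrors the write path: the compressed data is first read into the segment list of the buffer inside GzipSource, then inflated and handed over to the BufferedSource's buffer. Let's start with BufferedSource's read method: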
@Override public String readUtf8() throws IOException {
buffer.writeAll(source);
return buffer.readUtf8();
}
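Next, look at the writeAll method; the parsing is dispatched from here. buffer.readUtf8(), on the other hand, simply converts the data in the segment list into a string, which the first article covered in detail.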
public long writeAll(Source source) throws IOException {
if (source == null) throw new IllegalArgumentException("source == null");
long totalBytesRead = 0;
for (long readCount; (readCount = source.read(this, Segment.SIZE)) != -1; ) {
totalBytesRead += readCount;
}
return totalBytesRead;
}
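writeAll's contract is simple: keep pulling up to Segment.SIZE bytes from the source until read returns -1. As a rough java.io analogue (not Okio's API; DrainSketch and CHUNK are hypothetical names, and 8192 is assumed to stand in for Segment.SIZE), the same drain loop over an InputStream looks like this. Plugging in a java.util.zip.GZIPInputStream as the source plays roughly the role GzipSource plays here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical java.io analogue of Buffer.writeAll: drain a source until it reports -1. */
final class DrainSketch {
    static final int CHUNK = 8192; // assumed stand-in for Okio's Segment.SIZE

    static long writeAll(InputStream source, ByteArrayOutputStream sink) throws IOException {
        if (source == null) throw new IllegalArgumentException("source == null");
        byte[] chunk = new byte[CHUNK];
        long totalBytesRead = 0;
        // Ask for at most CHUNK bytes per call; -1 means the source is exhausted.
        for (int readCount; (readCount = source.read(chunk, 0, CHUNK)) != -1; ) {
            sink.write(chunk, 0, readCount);
            totalBytesRead += readCount;
        }
        return totalBytesRead;
    }
}
```

Like writeAll, the loop never decides how much data there is up front; the source alone signals the end by returning -1.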
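writeAll keeps calling source.read(this, Segment.SIZE) until it returns -1. Here source is the GzipSource, so each call lands in GzipSource.read:
@Override public long read(Buffer sink, long byteCount) throws IOException {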
if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0: " + byteCount);
if (byteCount == 0) return 0;
// If we haven't consumed the header, we must consume it before anything else.
if (section == SECTION_HEADER) {
// Validate the gzip header
consumeHeader();
section = SECTION_BODY;
}
// Attempt to read at least a byte of the body. If we do, we're done.
if (section == SECTION_BODY) {
long offset = sink.size;
// Start inflating the body
long result = inflaterSource.read(sink, byteCount);
if (result != -1) {
updateCrc(sink, offset, result);
return result;
}
section = SECTION_TRAILER;
}
// The body is exhausted; time to read the trailer. We always consume the
// trailer before returning a -1 exhausted result; that way if you read to
// the end of a GzipSource you guarantee that the CRC has been checked.
if (section == SECTION_TRAILER) {
consumeTrailer();
section = SECTION_DONE;
// Gzip streams self-terminate: they return -1 before their underlying
// source returns -1. Here we attempt to force the underlying stream to
// return -1 which may trigger it to release its resources. If it doesn't
// return -1, then our Gzip data finished prematurely!
if (!source.exhausted()) {
throw new IOException("gzip finished without exhausting source");
}
}
return -1;
}
/**
* Parses the gzip header before anything else is read.
* @throws IOException if the header is truncated or invalid
*/
private void consumeHeader() throws IOException {
// Read the 10-byte header. We peek at the flags byte first so we know if we
// need to CRC the entire header. Then we read the magic ID1ID2 sequence.
// We can skip everything else in the first 10 bytes.
// +---+---+---+---+---+---+---+---+---+---+
// |ID1|ID2|CM |FLG| MTIME |XFL|OS | (more-->)
// +---+---+---+---+---+---+---+---+---+---+
// Make sure 10 bytes can be buffered; if the file has fewer, this throws an EOFException.
source.require(10);
// Peek at the FLG byte (index 3, i.e. the fourth header byte) without consuming it.
/*
* For reference, GzipSink writes this fixed header:
* buffer.writeShort(0x1f8b); // Two-byte Gzip ID.
* buffer.writeByte(0x08); // 8 == Deflate compression method.
* buffer.writeByte(0x00); // No flags.
* buffer.writeInt(0x00); // No modification time.
* buffer.writeByte(0x00); // No extra flags.
* buffer.writeByte(0x00); // No OS.
*/
// So the fourth byte (FLG) is 0x00.
byte flags = source.buffer().getByte(3);
// FHCRC is bit 1: (0x00 >> 1) & 1 == 0, so fhcrc is false here.
boolean fhcrc = ((flags >> FHCRC) & 1) == 1;
if (fhcrc) updateCrc(source.buffer(), 0, 10);
// Read two bytes: ID1ID2 must be 0x1f8b.
short id1id2 = source.readShort();
// Verify the gzip magic number.
checkEqual("ID1ID2", (short) 0x1f8b, id1id2);
// Skip the other 8 bytes of the fixed header: CM, FLG, MTIME (4 bytes), XFL, OS.
// GzipSink writes 0x08 for CM and zeros for the rest, so no optional sections
// (FEXTRA, FNAME, FCOMMENT, FHCRC) follow and the branches below don't run.
source.skip(8);
// Skip optional extra fields.
// +---+---+=================================+
// | XLEN |...XLEN bytes of "extra field"...| (more-->)
// +---+---+=================================+
if (((flags >> FEXTRA) & 1) == 1) {
source.require(2);
if (fhcrc) updateCrc(source.buffer(), 0, 2);
int xlen = source.buffer().readShortLe();
source.require(xlen);
if (fhcrc) updateCrc(source.buffer(), 0, xlen);
source.skip(xlen);
}
// Skip an optional 0-terminated name.
// +=========================================+
// |...original file name, zero-terminated...| (more-->)
// +=========================================+
if (((flags >> FNAME) & 1) == 1) {
long index = source.indexOf((byte) 0);
if (index == -1) throw new EOFException();
if (fhcrc) updateCrc(source.buffer(), 0, index + 1);
source.skip(index + 1);
}
// Skip an optional 0-terminated comment.
// +===================================+
// |...file comment, zero-terminated...| (more-->)
// +===================================+
if (((flags >> FCOMMENT) & 1) == 1) {
long index = source.indexOf((byte) 0);
if (index == -1) throw new EOFException();
if (fhcrc) updateCrc(source.buffer(), 0, index + 1);
source.skip(index + 1);
}
// Confirm the optional header CRC.
// +---+---+
// | CRC16 |
// +---+---+
if (fhcrc) {
checkEqual("FHCRC", source.readShortLe(), (short) crc.getValue());
crc.reset();
}
}
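The core of this method is validating the header. The first two bytes (ID1ID2 = 0x1f8b) mark the data as gzip-compressed, and they are checked against that expected value. Apart from CM (0x08, which is skipped rather than checked), every field GzipSink writes is 0x00, i.e. unset, so none of the optional-field checks that follow are executed; if you want to extend the format, this is where you would add checks for your own header fields. Once the two magic bytes have been read and the next 8 bytes skipped, what remains is the compressed body followed by the trailer. Next comes decompression: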
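To make the header layout concrete, here is a sketch of the same parsing over a plain byte array using only the JDK. GzipHeaderSketch and its methods are hypothetical names, not Okio code; the flag bit positions (FHCRC = 1, FEXTRA = 2, FNAME = 3, FCOMMENT = 4) follow RFC 1952, and the method returns the offset where the deflate body starts.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.CRC32;

/** Hypothetical helper mirroring consumeHeader: returns the offset of the deflate body. */
final class GzipHeaderSketch {
    // Bit positions inside the FLG byte, per RFC 1952.
    static final int FHCRC = 1, FEXTRA = 2, FNAME = 3, FCOMMENT = 4;

    static int bodyOffset(byte[] gz) throws IOException {
        require(gz, 10); // the fixed part of the header is 10 bytes
        int id1id2 = ((gz[0] & 0xff) << 8) | (gz[1] & 0xff);
        if (id1id2 != 0x1f8b) throw new IOException("bad ID1ID2: " + Integer.toHexString(id1id2));
        int flags = gz[3] & 0xff;
        int pos = 10; // skip ID1 ID2 CM FLG MTIME(4) XFL OS
        if (((flags >> FEXTRA) & 1) == 1) {
            require(gz, pos + 2);
            int xlen = (gz[pos] & 0xff) | ((gz[pos + 1] & 0xff) << 8); // little-endian length
            pos += 2 + xlen;
            require(gz, pos);
        }
        if (((flags >> FNAME) & 1) == 1) pos = skipZeroTerminated(gz, pos);
        if (((flags >> FCOMMENT) & 1) == 1) pos = skipZeroTerminated(gz, pos);
        if (((flags >> FHCRC) & 1) == 1) {
            require(gz, pos + 2);
            // The header CRC16 is the low 16 bits of the CRC32 of all header bytes before it.
            CRC32 crc = new CRC32();
            crc.update(gz, 0, pos);
            int expected = (gz[pos] & 0xff) | ((gz[pos + 1] & 0xff) << 8);
            if (expected != ((int) crc.getValue() & 0xffff)) throw new IOException("FHCRC mismatch");
            pos += 2;
        }
        return pos;
    }

    static void require(byte[] gz, int n) throws EOFException {
        if (gz.length < n) throw new EOFException("need " + n + " bytes");
    }

    static int skipZeroTerminated(byte[] gz, int pos) throws EOFException {
        for (int i = pos; i < gz.length; i++) {
            if (gz[i] == 0) return i + 1; // position just past the terminating zero
        }
        throw new EOFException("unterminated string");
    }
}
```

For data written with FLG == 0, as GzipSink does, this simply returns 10.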
if (section == SECTION_BODY) {
long offset = sink.size;
// Start inflating the body
long result = inflaterSource.read(sink, byteCount);
if (result != -1) {
updateCrc(sink, offset, result);
return result;
}
section = SECTION_TRAILER;
}
InflaterSource is the class that actually performs the decompression; here is its read method:
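public long read(Buffer sink, long byteCount) throws IOException {
if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0: " + byteCount);
if (closed) throw new IllegalStateException("closed");
if (byteCount == 0) return 0;
// Loop until at least one byte is inflated or the stream ends.
while (true) {
// Feed compressed input to the inflater if it needs more.
boolean sourceExhausted = refill();
// Decompress the inflater's compressed data into the sink.
try {
Segment tail = sink.writableSegment(1);
// Inflate straight into the tail segment of the sink, i.e. the caller's buffer, not GzipSource's own.
int bytesInflated = inflater.inflate(tail.data, tail.limit, Segment.SIZE - tail.limit);
// bytesInflated is the number of decompressed bytes produced by this call.
if (bytesInflated > 0) {
tail.limit += bytesInflated;
sink.size += bytesInflated;
return bytesInflated;
}
// The deflate stream has ended, or it needs a preset dictionary (treated here as the end of the stream).
if (inflater.finished() || inflater.needsDictionary()) {
// Skip the compressed bytes the inflater has consumed from the source buffer.
releaseInflatedBytes();
// Recycle the tail segment if nothing was written to it.
if (tail.pos == tail.limit) {
// We allocated a tail segment, but didn't end up needing it. Recycle!
sink.head = tail.pop();
SegmentPool.recycle(tail);
}
return -1;
}
// The source ran out while the inflater is neither finished nor producing output: the data is truncated.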
if (sourceExhausted) throw new EOFException("source exhausted prematurely");
} catch (DataFormatException e) {
throw new IOException(e);
}
}
}
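During decompression, each pass of the loop first calls refill() to keep the inflater supplied with compressed data while the file still has some, then calls inflater.inflate(tail.data, tail.limit, Segment.SIZE - tail.limit) to write the decompressed bytes into the tail segment of the sink's segment list.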
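The refill/inflate cycle can be tried in isolation with java.util.zip.Inflater alone. This is a hedged analogue, not Okio's InflaterSource: InflateLoopSketch, inflateRaw and sliceSize are hypothetical names, the input is handed over in small slices to imitate segments, and unlike Okio's read() it keeps looping and collects all output instead of returning after the first non-empty inflate. Inflater(true) is used because a gzip body is raw deflate data without a zlib wrapper.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Hypothetical sketch of the refill/inflate loop using only java.util.zip. */
final class InflateLoopSketch {
    static final int CHUNK = 8192; // assumed to mirror Okio's Segment.SIZE

    static byte[] inflateRaw(byte[] compressed, int sliceSize) throws EOFException, DataFormatException {
        Inflater inflater = new Inflater(true); // nowrap: raw deflate, as in a gzip body
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[CHUNK];
        int next = 0; // index of the next input byte not yet handed to the inflater
        try {
            while (true) {
                // "refill": only hand over more input when the inflater asks for it.
                boolean exhausted = false;
                if (inflater.needsInput()) {
                    if (next == compressed.length) {
                        exhausted = true;
                    } else {
                        int n = Math.min(sliceSize, compressed.length - next);
                        inflater.setInput(compressed, next, n);
                        next += n;
                    }
                }
                // Inflate at most one chunk's worth of output per call.
                int inflated = inflater.inflate(chunk, 0, chunk.length);
                if (inflated > 0) {
                    out.write(chunk, 0, inflated);
                    continue;
                }
                if (inflater.finished() || inflater.needsDictionary()) return out.toByteArray();
                // No output, not finished, and no input left: the data was cut short.
                if (exhausted) throw new EOFException("source exhausted prematurely");
            }
        } finally {
            inflater.end(); // release the native zlib state
        }
    }
}
```

As in Okio, running out of input is only an error when the inflater has not reached the end of the deflate stream.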
Here is how refill() supplies that data:
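/**
* Feeds compressed data to the inflater, but only if it needs input.
* @return true if the inflater needed input but the source was exhausted
* @throws IOException
*/
public boolean refill() throws IOException {
// Nothing to do if the inflater still has input.
if (!inflater.needsInput()) return false;
// Skip the bytes the inflater has already consumed from the source buffer.
releaseInflatedBytes();
if (inflater.getRemaining() != 0) throw new IllegalStateException("?"); // TODO: possible?
// Return true once the end of the file has been reached.
if (source.exhausted()) return true;
// Assign buffer bytes to the inflater.
Segment head = source.buffer().head;
// Remember how many bytes are handed to the inflater.
bufferBytesHeldByInflater = head.limit - head.pos;
// Give the head segment's readable bytes to the inflater.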
inflater.setInput(head.data, head.pos, bufferBytesHeldByInflater);
return false;
}
Next, let's see what releaseInflatedBytes does:
private void releaseInflatedBytes() throws IOException {
if (bufferBytesHeldByInflater == 0) return;
// inflater.getRemaining(): how many of the input bytes the inflater has not consumed yet
int toRelease = bufferBytesHeldByInflater - inflater.getRemaining();
bufferBytesHeldByInflater -= toRelease;
// Skip the consumed bytes in the source buffer
source.skip(toRelease);
}
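It computes how many of the bytes handed to the inflater are still unconsumed (getRemaining()), records that count, and makes the source buffer skip the bytes the inflater has already consumed.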
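The arithmetic toRelease = bufferBytesHeldByInflater - inflater.getRemaining() can be checked with a plain java.util.zip.Inflater. In this hypothetical ReleaseSketch, the whole gzip body plus its 8-byte trailer is handed over at once; once the deflate stream ends, getRemaining() should report exactly the uncompressed trailer bytes, which is why the skip in releaseInflatedBytes stops right at the trailer.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Hypothetical sketch of the bookkeeping behind releaseInflatedBytes. */
final class ReleaseSketch {
    /** Hands {@code held} bytes to the inflater, inflates to the end, returns how many were consumed. */
    static int consumedBytes(byte[] input, int held) throws DataFormatException {
        Inflater inflater = new Inflater(true); // raw deflate, like a gzip body
        try {
            inflater.setInput(input, 0, held);
            byte[] out = new byte[8192];
            while (!inflater.finished()) {
                if (inflater.inflate(out) == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("input ended before the deflate stream did");
                }
            }
            // Same arithmetic as releaseInflatedBytes: toRelease = held - remaining.
            return held - inflater.getRemaining();
        } finally {
            inflater.end();
        }
    }
}
```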
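Together, these methods show how decompression is implemented:
1. Read the header written by default and validate it; if validation passes, move on to step 2.
2. Skip the remaining header bytes, then repeatedly feed compressed data into the Inflater; each inflate call writes at most one Segment's worth of output (up to Segment.SIZE bytes).
3. The inflated data ends up in the segment list of the BufferedSource implementation that wraps the GzipSource.
4. The trailer bytes are stored uncompressed after the deflate data, so they are never inflated: the Inflater reports finished() when the deflate stream ends, and releaseInflatedBytes leaves the position of GzipSource's buffer at the first trailer byte.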
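The four steps can be condensed into one in-memory sketch built only on the JDK (java.util.zip.Inflater and CRC32). MiniGunzip and readIntLe are hypothetical names; the sketch works on a byte array instead of Okio buffers and assumes FLG == 0, as in the header GzipSink writes, so it rejects rather than skips optional header fields.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Hypothetical minimal gzip decoder following steps 1-4; assumes FLG == 0. */
final class MiniGunzip {
    static byte[] gunzip(byte[] gz) throws IOException {
        // Step 1: validate the fixed 10-byte header.
        if (gz.length < 10) throw new EOFException("header");
        if ((gz[0] & 0xff) != 0x1f || (gz[1] & 0xff) != 0x8b) throw new IOException("not gzip");
        if (gz[3] != 0) throw new IOException("sketch only handles FLG == 0");
        // Step 2: skip the header and inflate the raw deflate body.
        Inflater inflater = new Inflater(true);
        CRC32 crc = new CRC32();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int trailerStart;
        try {
            inflater.setInput(gz, 10, gz.length - 10);
            byte[] chunk = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && inflater.needsInput()) throw new EOFException("source exhausted prematurely");
                // Step 3: inflated bytes go to the output and into the running CRC (like updateCrc).
                crc.update(chunk, 0, n);
                out.write(chunk, 0, n);
            }
            // Step 4: whatever the inflater did not consume starts with the trailer.
            trailerStart = gz.length - inflater.getRemaining();
        } catch (DataFormatException e) {
            throw new IOException(e);
        } finally {
            inflater.end();
        }
        // The trailer holds CRC32 and ISIZE, both little-endian.
        if (gz.length - trailerStart < 8) throw new EOFException("trailer");
        if (readIntLe(gz, trailerStart) != (int) crc.getValue()) throw new IOException("CRC mismatch");
        if (readIntLe(gz, trailerStart + 4) != out.size()) throw new IOException("ISIZE mismatch");
        if (gz.length != trailerStart + 8) throw new IOException("gzip finished without exhausting source");
        return out.toByteArray();
    }

    static int readIntLe(byte[] b, int i) {
        return (b[i] & 0xff) | (b[i + 1] & 0xff) << 8 | (b[i + 2] & 0xff) << 16 | (b[i + 3] & 0xff) << 24;
    }
}
```

The final length check plays the part of GzipSource's `!source.exhausted()` test: bytes after the trailer mean the gzip data ended prematurely relative to its source.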
Once inflation is finished, read() finally runs consumeTrailer():
private void consumeTrailer() throws IOException {
// Read the eight-byte trailer. Confirm the body's CRC and size.
// +---+---+---+---+---+---+---+---+
// | CRC32 | ISIZE |
// +---+---+---+---+---+---+---+---+
checkEqual("CRC", source.readIntLe(), (int) crc.getValue());
checkEqual("ISIZE", source.readIntLe(), (int) inflater.getBytesWritten());
}
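This method validates the trailer: the CRC32 of the decompressed data and its length modulo 2^32 (ISIZE) must both match. At this point the file's entire content has been read into the segment list, and all that is left is converting those bytes to a String. That completes the walkthrough of GzipSource.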