Google Archive Patch Source Code Analysis

If this article is too long for you, you can jump straight to my summary of the conclusions:

Google Archive Patch is a delta algorithm that is strictly based on the ZIP file format. The core delta-generation algorithm is still BsDiff and the core synthesis algorithm is still BsPatch; the difference is that the contents of the old and new ZIP files are decompressed into two "delta-friendly" files, and the delta is generated between those. When applying the patch, the contents of the old ZIP file are decompressed into its delta-friendly form, the synthesis algorithm is applied to produce the delta-friendly form of the new file, and that is then restored to a ZIP file using the offset and length of each ZipEntry recorded in the patch, together with the compression level, strategy and nowrap flags. The reason for using delta-friendly files is that if a file is uncompressed, its changes can be described very simply: for example, if the string "abc" becomes "abcd", we can describe the change directly as adding the character "d". But once the data has been compressed, the change can no longer be described that easily. Therefore the compressed contents must be converted back into uncompressed form, producing the delta-friendly file.
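To make this concrete, here is a small self-contained sketch (my own example, not part of the library) that compresses two almost identical inputs with java.util.zip.Deflater. The uncompressed inputs differ by a single appended byte, but the compressed outputs are no longer equal and can diverge over much of their length, which is exactly why diffing is done on uncompressed, delta-friendly data:

import java.util.Arrays;
import java.util.zip.Deflater;

public class DeflateObscuresSmallChanges {

  // Deflate-compress the input with default settings (nowrap=true) and return the compressed bytes.
  static byte[] deflate(byte[] input) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    deflater.setInput(input);
    deflater.finish();
    byte[] buf = new byte[input.length + 64];
    int n = deflater.deflate(buf);
    deflater.end();
    return Arrays.copyOf(buf, n);
  }

  public static void main(String[] args) {
    // Two almost identical inputs: the second just appends one character.
    byte[] oldBytes = "abcabcabcabcabcabcabcabcabcabc".getBytes();
    byte[] newBytes = "abcabcabcabcabcabcabcabcabcabcd".getBytes();

    byte[] oldCompressed = deflate(oldBytes);
    byte[] newCompressed = deflate(newBytes);

    // Uncompressed, the change is one appended byte; compressed, the outputs differ.
    System.out.println("uncompressed sizes: " + oldBytes.length + " vs " + newBytes.length);
    System.out.println("compressed sizes:   " + oldCompressed.length + " vs " + newCompressed.length);
    System.out.println("compressed outputs equal? " + Arrays.equals(oldCompressed, newCompressed));
  }
}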

Slow generation: Google Archive Patch takes longer to generate a patch because, once the ZIP contents are decompressed, the delta-friendly files are larger, so BsDiff takes longer. For example, if the decompressed content is twice as large, the time spent roughly doubles compared with diffing the original file as a whole.

Slow application: applying the patch also takes longer, partly because the delta-friendly files are larger, but that is not the root cause; BsPatch itself is extremely fast, and even doubling its time would be negligible. The real cost lies in regenerating the ZIP file, where the output stream has to be handled piece by piece: some ranges must be recompressed and others merely copied, and most of the time goes into the recompression.

Small output: the reason was explained above; because the delta is generated from the delta-friendly files, the changes between the two files become easy to describe.

Project repository

Google Archive Patch

Three modules matter: shared, generator, and applier. shared is the common module used by the other two, generator is the delta-generation module, and applier is the delta-application module. generator contains a Java implementation of the bsdiff algorithm, and applier contains a Java implementation of the bspatch algorithm.

The ZIP file format

Google Archive Patch is a delta algorithm that is strictly based on the ZIP file format, so it is worth reviewing the format first. The articles I found online all introduce the structure in forward order, which I find harder to follow; it is easier to understand when read back to front, so that is the order used here.

A ZIP file generally consists of three sections:

[Figure: zip-format.png — the three sections of a ZIP file]

Let us go through the three sections one by one, starting with the last.

End of Central Directory
Offset Bytes Description Notes
0 4 End of Central Directory SIGNATURE = 0x06054b50 record signature, fixed value 0x06054b50
4 2 disk number for this archive ignored
6 2 disk number for the central directory ignored
8 2 num entries in the central directory on this disk ignored
10 2 num entries in the central directory overall total number of central directory entries
12 4 the length of the central directory size of the central directory
16 4 the file offset of the central directory offset of the central directory
20 2 the length of the zip file comment comment length
22 n from here to the EOF is the zip file comment comment content

This section consists of a single record with the structure shown in the table. Its purpose is to locate the Central Directory.

Central Directory

The End of Central Directory record points to the Central Directory; its structure is as follows.

Offset Bytes Description Notes
0 4 Central Directory SIGNATURE = 0x02014b50 record signature, fixed value 0x02014b50
4 2 the version-made-by ignored
6 2 the version-needed-to-extract ignored
8 2 the general-purpose flags, read for language encoding general-purpose bit flags
10 2 the compression method compression method
12 2 the MSDOS last modified file time last modified time
14 2 the MSDOS last modified file date last modified date
16 4 the CRC32 of the uncompressed data CRC-32 checksum
20 4 the compressed size compressed size
24 4 the uncompressed size uncompressed size
28 2 the length of the file name file name length
30 2 the length of the extras extra field length
32 2 the length of the comment file comment length
34 2 the disk number ignored
36 2 the internal file attributes ignored
38 4 the external file attributes ignored
42 4 the offset of the local section entry, where the data is offset of the local entry
46 i the file name file name
46+i j the extras extra field
46+i+j k the comment file comment

This section consists of n records with the structure shown in the table. Its purpose is to locate the actual data of each entry in the ZIP file.

Contents of ZIP entries

The Central Directory points to the Local Entry section; finally, let us look at its structure.

Offset Bytes Description Notes
0 4 Local Entry SIGNATURE = 0x04034b50 record signature, fixed value 0x04034b50
4 2 the version-needed-to-extract ignored
6 2 the general-purpose flags general-purpose bit flags
8 2 the compression method compression method
10 2 the MSDOS last modified file time last modified time
12 2 the MSDOS last modified file date last modified date
14 4 the CRC32 of the uncompressed data CRC-32 checksum
18 4 the compressed size compressed size
22 4 the uncompressed size uncompressed size
26 2 the length of the file name file name length
28 2 the length of the extras extra field length
30 i the file name file name
30+i j the extras extra field
30+i+j k file data location of the actual (compressed) data

This section consists of n records with the structure shown in the table.
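As a side note (my own example, not from the library), the JDK's java.util.zip.ZipFile reads exactly these central-directory fields, so you can cross-check the tables above against any real archive; the path below is just a placeholder:

import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ListZipEntries {
  public static void main(String[] args) throws Exception {
    // Placeholder path; point it at any real ZIP or APK.
    try (ZipFile zip = new ZipFile("old.zip")) {
      Enumeration<? extends ZipEntry> entries = zip.entries();
      while (entries.hasMoreElements()) {
        ZipEntry e = entries.nextElement();
        // These values come straight from the central directory records described above.
        System.out.printf("%s method=%d crc=%08x compressed=%d uncompressed=%d%n",
            e.getName(), e.getMethod(), e.getCrc(), e.getCompressedSize(), e.getSize());
      }
    }
  }
}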

How Google Archive Patch parses ZIP files

Google Archive Patch implements its own minimal structure for parsing ZIP files. The parsing is done mainly by com.google.archivepatcher.generator.MinimalZipParser, and the parsed data is carried by MinimalCentralDirectoryMetadata, MinimalZipArchive and MinimalZipEntry. The final output of parsing is a list of MinimalZipEntry objects sorted by offset.

 
       
 
       
private static List<MinimalZipEntry> listEntriesInternal(RandomAccessFileInputStream in)
    throws IOException {
  // Step 1: Locate the end-of-central-directory record header.
  long offsetOfEocd = MinimalZipParser.locateStartOfEocd(in, 32768);
  if (offsetOfEocd == -1) {
    // Archive is weird, abort.
    throw new ZipException("EOCD record not found in last 32k of archive, giving up");
  }

  // Step 2: Parse the end-of-central-directory data to locate the central directory itself
  in.setRange(offsetOfEocd, in.length() - offsetOfEocd);
  MinimalCentralDirectoryMetadata centralDirectoryMetadata = MinimalZipParser.parseEocd(in);

  // Step 3: Extract a list of all central directory entries (contiguous data stream)
  in.setRange(
      centralDirectoryMetadata.getOffsetOfCentralDirectory(),
      centralDirectoryMetadata.getLengthOfCentralDirectory());
  List<MinimalZipEntry> minimalZipEntries =
      new ArrayList<MinimalZipEntry>(centralDirectoryMetadata.getNumEntriesInCentralDirectory());
  for (int x = 0; x < centralDirectoryMetadata.getNumEntriesInCentralDirectory(); x++) {
    minimalZipEntries.add(MinimalZipParser.parseCentralDirectoryEntry(in));
  }

  // Step 4: Sort the entries in file order, not central directory order.
  Collections.sort(minimalZipEntries, LOCAL_ENTRY_OFFSET_COMAPRATOR);

  // Step 5: Seek out each local entry and calculate the offset of the compressed data within
  for (int x = 0; x < minimalZipEntries.size(); x++) {
    MinimalZipEntry entry = minimalZipEntries.get(x);
    long offsetOfNextEntry;
    if (x < minimalZipEntries.size() - 1) {
      // Don't allow reading past the start of the next entry, for sanity.
      offsetOfNextEntry = minimalZipEntries.get(x + 1).getFileOffsetOfLocalEntry();
    } else {
      // Last entry. Don't allow reading into the central directory, for sanity.
      offsetOfNextEntry = centralDirectoryMetadata.getOffsetOfCentralDirectory();
    }
    long rangeLength = offsetOfNextEntry - entry.getFileOffsetOfLocalEntry();
    in.setRange(entry.getFileOffsetOfLocalEntry(), rangeLength);
    long relativeDataOffset = MinimalZipParser.parseLocalEntryAndGetCompressedDataOffset(in);
    entry.setFileOffsetOfCompressedData(entry.getFileOffsetOfLocalEntry() + relativeDataOffset);
  }

  // Done!
  return minimalZipEntries;
}

The code above does the following:

  • Locate the starting offset of the End of Central Directory record
  • Find the Central Directory section
  • Parse the Central Directory section
  • Sort the entries by offset, in ascending order
  • Parse each local entry to find the offset of its actual data

Locating the start of the End of Central Directory is simple: scan the bytes for the signature 0x06054b50. The implementation scans the last 32 KB of the ZIP file as a byte array, returns the offset if the signature is found, and throws an exception otherwise. One question remains: what if the record is not within the last 32 KB? I could not find any statement guaranteeing that the End of Central Directory must be in the last 32 KB, and Android's Multidex implementation scans the last 64 KB instead; for now, assume the scan will find it. The implementation is as follows:

 
       
 
       
public static long locateStartOfEocd(RandomAccessFileInputStream in, int searchBufferLength)
    throws IOException {
  final int maxBufferSize = (int) Math.min(searchBufferLength, in.length());
  final byte[] buffer = new byte[maxBufferSize]; // 32k
  final long rangeStart = in.length() - buffer.length;
  in.setRange(rangeStart, buffer.length);
  readOrDie(in, buffer, 0, buffer.length); // read to buffer
  int offset = locateStartOfEocd(buffer); // locate
  if (offset == -1) {
    return -1;
  }
  return rangeStart + offset;
}

public static int locateStartOfEocd(byte[] buffer) {
  int last4Bytes = 0; // This is the 32 bits of data from the file
  for (int offset = buffer.length - 1; offset >= 0; offset--) {
    last4Bytes <<= 8;
    last4Bytes |= buffer[offset];
    if (last4Bytes == EOCD_SIGNATURE) { // 0x06054b50
      return offset;
    }
  }
  return -1;
}

Having found the starting offset of the End of Central Directory, the next step is to parse that section and return a MinimalCentralDirectoryMetadata structure. The parsing code is as follows:

 
       
 
       
public static MinimalCentralDirectoryMetadata parseEocd(InputStream in)
    throws IOException, ZipException {
  if (((int) read32BitUnsigned(in)) != EOCD_SIGNATURE) { // 0x06054b50
    throw new ZipException("Bad eocd header");
  }
  // *** 4 bytes encode EOCD_SIGNATURE, ignore (already found and verified).
  // 2 bytes encode disk number for this archive, ignore.
  // 2 bytes encode disk number for the central directory, ignore.
  // 2 bytes encode num entries in the central directory on this disk, ignore.
  // *** 2 bytes encode num entries in the central directory overall [READ THIS]
  // *** 4 bytes encode the length of the central directory [READ THIS]
  // *** 4 bytes encode the file offset of the central directory [READ THIS]
  // 2 bytes encode the length of the zip file comment, ignore.
  // Everything else from here to the EOF is the zip file comment, or junk. Ignore.
  skipOrDie(in, 2 + 2 + 2);
  int numEntriesInCentralDirectory = read16BitUnsigned(in); // number
  if (numEntriesInCentralDirectory == 0xffff) {
    // If 0xffff, this is a zip64 archive and this code doesn't handle that.
    throw new ZipException("No support for zip64");
  }
  long lengthOfCentralDirectory = read32BitUnsigned(in); // length
  long offsetOfCentralDirectory = read32BitUnsigned(in); // offset
  return new MinimalCentralDirectoryMetadata(
      numEntriesInCentralDirectory, offsetOfCentralDirectory, lengthOfCentralDirectory);
}

As the code shows, only three important values are parsed out:

  • the number of Central Directory entries, n
  • the starting offset of the Central Directory, offset
  • the total length of the Central Directory, length

The input is then restricted to the range [offset, offset+length] (the underlying implementation is a RandomAccessFile), and a for loop runs n times, parsing the Central Directory entries one by one. The code for parsing a single Central Directory entry is as follows:

 
       
 
       
public static MinimalZipEntry parseCentralDirectoryEntry(InputStream in) throws IOException {
  // *** 4 bytes encode the CENTRAL_DIRECTORY_ENTRY_SIGNATURE, verify for sanity
  // 2 bytes encode the version-made-by, ignore
  // 2 bytes encode the version-needed-to-extract, ignore
  // *** 2 bytes encode the general-purpose flags, read for language encoding. [READ THIS]
  // *** 2 bytes encode the compression method, [READ THIS]
  // 2 bytes encode the MSDOS last modified file time, ignore
  // 2 bytes encode the MSDOS last modified file date, ignore
  // *** 4 bytes encode the CRC32 of the uncompressed data [READ THIS]
  // *** 4 bytes encode the compressed size [READ THIS]
  // *** 4 bytes encode the uncompressed size [READ THIS]
  // *** 2 bytes encode the length of the file name [READ THIS]
  // *** 2 bytes encode the length of the extras, needed to skip the bytes later [READ THIS]
  // *** 2 bytes encode the length of the comment, needed to skip the bytes later [READ THIS]
  // 2 bytes encode the disk number, ignore
  // 2 bytes encode the internal file attributes, ignore
  // 4 bytes encode the external file attributes, ignore
  // *** 4 bytes encode the offset of the local section entry, where the data is [READ THIS]
  // n bytes encode the file name
  // n bytes encode the extras
  // n bytes encode the comment
  if (((int) read32BitUnsigned(in)) != CENTRAL_DIRECTORY_ENTRY_SIGNATURE) {
    throw new ZipException("Bad central directory header");
  }
  skipOrDie(in, 2 + 2); // Skip version stuff
  int generalPurposeFlags = read16BitUnsigned(in);
  int compressionMethod = read16BitUnsigned(in);
  skipOrDie(in, 2 + 2); // Skip MSDOS junk
  long crc32OfUncompressedData = read32BitUnsigned(in);
  long compressedSize = read32BitUnsigned(in);
  long uncompressedSize = read32BitUnsigned(in);
  int fileNameLength = read16BitUnsigned(in);
  int extrasLength = read16BitUnsigned(in);
  int commentLength = read16BitUnsigned(in);
  skipOrDie(in, 2 + 2 + 4); // Skip the disk number and file attributes
  long fileOffsetOfLocalEntry = read32BitUnsigned(in);
  byte[] fileNameBuffer = new byte[fileNameLength];
  readOrDie(in, fileNameBuffer, 0, fileNameBuffer.length);
  skipOrDie(in, extrasLength + commentLength);
  // General purpose flag bit 11 is an important hint for the character set used for file names.
  boolean generalPurposeFlagBit11 = (generalPurposeFlags & (0x1 << 10)) != 0;
  return new MinimalZipEntry(
      compressionMethod,
      crc32OfUncompressedData,
      compressedSize,
      uncompressedSize,
      fileNameBuffer,
      generalPurposeFlagBit11,
      fileOffsetOfLocalEntry);
}

It mainly parses out the following data:

  • compression method
  • CRC-32 checksum
  • uncompressed size
  • compressed size
  • file name
  • general-purpose flag bit
  • offset of the local entry

The result is a list of n MinimalZipEntry structures. After sorting by offset in ascending order, the list is traversed again and each local entry is parsed to find the offset of its actual data. The parsing code is as follows:

 
       
 
       
public static long parseLocalEntryAndGetCompressedDataOffset(InputStream in) throws IOException {
  // *** 4 bytes encode the LOCAL_ENTRY_SIGNATURE, verify for sanity
  // 2 bytes encode the version-needed-to-extract, ignore
  // 2 bytes encode the general-purpose flags, ignore
  // 2 bytes encode the compression method, ignore (redundant with central directory)
  // 2 bytes encode the MSDOS last modified file time, ignore
  // 2 bytes encode the MSDOS last modified file date, ignore
  // 4 bytes encode the CRC32 of the uncompressed data, ignore (redundant with central directory)
  // 4 bytes encode the compressed size, ignore (redundant with central directory)
  // 4 bytes encode the uncompressed size, ignore (redundant with central directory)
  // *** 2 bytes encode the length of the file name, needed to skip the bytes later [READ THIS]
  // *** 2 bytes encode the length of the extras, needed to skip the bytes later [READ THIS]
  // The rest is the data, which is the main attraction here.
  if (((int) read32BitUnsigned(in)) != LOCAL_ENTRY_SIGNATURE) {
    throw new ZipException("Bad local entry header");
  }
  int junkLength = 2 + 2 + 2 + 2 + 2 + 4 + 4 + 4;
  skipOrDie(in, junkLength); // Skip everything up to the length of the file name
  final int fileNameLength = read16BitUnsigned(in);
  final int extrasLength = read16BitUnsigned(in);
  // The file name is already known and will match the central directory, so no need to read it.
  // The extra field length can be different here versus in the central directory and is used for
  // things like zipaligning APKs. This single value is the critical part as it dictates where the
  // actual DATA for the entry begins.
  return 4 + junkLength + 2 + 2 + fileNameLength + extrasLength;
}

Quite simple: it skips all the bytes of the local entry that precede the actual data and returns that offset.

At this point, parsing of the ZIP file is complete.
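As a usage sketch (my own, using only methods that appear in the snippets above), the whole pipeline is exposed through MinimalZipArchive.listEntries, so the parsed entries can be dumped like this; the input path is a placeholder:

import com.google.archivepatcher.generator.MinimalZipArchive;
import com.google.archivepatcher.generator.MinimalZipEntry;
import java.io.File;
import java.util.List;

public class DumpEntries {
  public static void main(String[] args) throws Exception {
    // Placeholder path; any ZIP/APK works.
    List<MinimalZipEntry> entries = MinimalZipArchive.listEntries(new File("old.apk"));
    for (MinimalZipEntry entry : entries) {
      // Entries are sorted by the absolute offset of their local entry within the archive.
      // Note: the real file-name charset depends on general purpose flag bit 11; UTF-8 is assumed here.
      System.out.println(new String(entry.getFileNameBytes(), "UTF-8")
          + " localEntryOffset=" + entry.getFileOffsetOfLocalEntry()
          + " dataOffset=" + entry.getFileOffsetOfCompressedData()
          + " compressedSize=" + entry.getCompressedSize());
    }
  }
}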

Generating the delta file

The implementation lives mainly in FileByFileV1DeltaGenerator; the code is as follows:

 
       
 
       
@Override
public void generateDelta(File oldFile, File newFile, OutputStream patchOut)
    throws IOException, InterruptedException {
  try (TempFileHolder deltaFriendlyOldFile = new TempFileHolder();
      TempFileHolder deltaFriendlyNewFile = new TempFileHolder();
      TempFileHolder deltaFile = new TempFileHolder();
      FileOutputStream deltaFileOut = new FileOutputStream(deltaFile.file);
      BufferedOutputStream bufferedDeltaOut = new BufferedOutputStream(deltaFileOut)) {
    PreDiffExecutor.Builder builder =
        new PreDiffExecutor.Builder()
            .readingOriginalFiles(oldFile, newFile)
            .writingDeltaFriendlyFiles(deltaFriendlyOldFile.file, deltaFriendlyNewFile.file);
    for (RecommendationModifier modifier : recommendationModifiers) {
      builder.withRecommendationModifier(modifier);
    }
    PreDiffExecutor executor = builder.build();
    PreDiffPlan preDiffPlan = executor.prepareForDiffing();
    DeltaGenerator deltaGenerator = getDeltaGenerator();
    deltaGenerator.generateDelta(
        deltaFriendlyOldFile.file, deltaFriendlyNewFile.file, bufferedDeltaOut);
    bufferedDeltaOut.close();
    PatchWriter patchWriter =
        new PatchWriter(
            preDiffPlan,
            deltaFriendlyOldFile.file.length(),
            deltaFriendlyNewFile.file.length(),
            deltaFile.file);
    patchWriter.writeV1Patch(patchOut);
  }
}

protected DeltaGenerator getDeltaGenerator() {
  return new BsDiffDeltaGenerator();
}

It does the following:

  • Creates three temporary files, holding the delta-friendly form of the old file, the delta-friendly form of the new file, and the delta file; all three are deleted automatically when the JVM exits.
  • Calls PreDiffExecutor's prepareForDiffing to produce a PreDiffPlan object; this function does a great deal of complex work and is discussed in detail below.
  • Applies the BsDiff delta algorithm to generate the delta file.
  • Writes the patch file; the patch format is described in detail later.

Now let us look at PreDiffExecutor's prepareForDiffing:

 
       
 
       
public PreDiffPlan prepareForDiffing() throws IOException {
  PreDiffPlan preDiffPlan = generatePreDiffPlan();
  List<TypedRange<JreDeflateParameters>> deltaFriendlyNewFileRecompressionPlan = null;
  if (deltaFriendlyOldFile != null) {
    // Builder.writingDeltaFriendlyFiles() ensures old and new are non-null when called, so a
    // check on either is sufficient.
    deltaFriendlyNewFileRecompressionPlan =
        Collections.unmodifiableList(generateDeltaFriendlyFiles(preDiffPlan));
  }
  return new PreDiffPlan(
      preDiffPlan.getQualifiedRecommendations(),
      preDiffPlan.getOldFileUncompressionPlan(),
      preDiffPlan.getNewFileUncompressionPlan(),
      deltaFriendlyNewFileRecompressionPlan);
}

It does the following:

  • Calls generatePreDiffPlan to produce a PreDiffPlan object; this function is described below.
  • Based on the returned PreDiffPlan, calls generateDeltaFriendlyFiles to generate the delta-friendly files; also described below.
  • Creates a new PreDiffPlan from the relevant pieces: the list of recommendations, the list of old-file entries to uncompress, the list of new-file entries to uncompress, and the recompression plan for the new file's delta-friendly form.

Now let us look at generatePreDiffPlan:

 
       
 
       
private PreDiffPlan generatePreDiffPlan() throws IOException {
  Map<ByteArrayHolder, MinimalZipEntry> originalOldArchiveZipEntriesByPath =
      new HashMap<ByteArrayHolder, MinimalZipEntry>();
  Map<ByteArrayHolder, MinimalZipEntry> originalNewArchiveZipEntriesByPath =
      new HashMap<ByteArrayHolder, MinimalZipEntry>();
  Map<ByteArrayHolder, JreDeflateParameters> originalNewArchiveJreDeflateParametersByPath =
      new HashMap<ByteArrayHolder, JreDeflateParameters>();

  for (MinimalZipEntry zipEntry : MinimalZipArchive.listEntries(originalOldFile)) {
    ByteArrayHolder key = new ByteArrayHolder(zipEntry.getFileNameBytes());
    originalOldArchiveZipEntriesByPath.put(key, zipEntry);
  }

  DefaultDeflateCompressionDiviner diviner = new DefaultDeflateCompressionDiviner();
  for (DivinationResult divinationResult : diviner.divineDeflateParameters(originalNewFile)) {
    ByteArrayHolder key =
        new ByteArrayHolder(divinationResult.minimalZipEntry.getFileNameBytes());
    originalNewArchiveZipEntriesByPath.put(key, divinationResult.minimalZipEntry);
    originalNewArchiveJreDeflateParametersByPath.put(key, divinationResult.divinedParameters);
  }

  PreDiffPlanner preDiffPlanner =
      new PreDiffPlanner(
          originalOldFile,
          originalOldArchiveZipEntriesByPath,
          originalNewFile,
          originalNewArchiveZipEntriesByPath,
          originalNewArchiveJreDeflateParametersByPath,
          recommendationModifiers.toArray(new RecommendationModifier[] {}));
  return preDiffPlanner.generatePreDiffPlan();
}

public List<DivinationResult> divineDeflateParameters(File archiveFile) throws IOException {
  List<DivinationResult> results = new ArrayList<>();
  for (MinimalZipEntry minimalZipEntry : MinimalZipArchive.listEntries(archiveFile)) {
    JreDeflateParameters divinedParameters = null;
    if (minimalZipEntry.isDeflateCompressed()) {
      // TODO(pasc): Reuse streams to avoid churning file descriptors
      MultiViewInputStreamFactory isFactory =
          new RandomAccessFileInputStreamFactory(
              archiveFile,
              minimalZipEntry.getFileOffsetOfCompressedData(),
              minimalZipEntry.getCompressedSize());

      // Keep small entries in memory to avoid unnecessary file I/O.
      if (minimalZipEntry.getCompressedSize() < (100 * 1024)) {
        try (InputStream is = isFactory.newStream()) {
          byte[] compressedBytes = new byte[(int) minimalZipEntry.getCompressedSize()];
          is.read(compressedBytes);
          divinedParameters =
              divineDeflateParameters(new ByteArrayInputStreamFactory(compressedBytes));
        } catch (Exception ignore) {
          divinedParameters = null;
        }
      } else {
        divinedParameters = divineDeflateParameters(isFactory);
      }
    }
    results.add(new DivinationResult(minimalZipEntry, divinedParameters));
  }
  return results;
}

public JreDeflateParameters divineDeflateParameters(
    MultiViewInputStreamFactory compressedDataInputStreamFactory) throws IOException {
  byte[] copyBuffer = new byte[32 * 1024];
  // Iterate over all relevant combinations of nowrap, strategy and level.
  for (boolean nowrap : new boolean[] {true, false}) {
    Inflater inflater = new Inflater(nowrap);
    Deflater deflater = new Deflater(0, nowrap);

    strategy_loop:
    for (int strategy : new int[] {0, 1, 2}) {
      deflater.setStrategy(strategy);
      for (int level : LEVELS_BY_STRATEGY.get(strategy)) {
        deflater.setLevel(level);
        inflater.reset();
        deflater.reset();
        try {
          if (matches(inflater, deflater, compressedDataInputStreamFactory, copyBuffer)) {
            end(inflater, deflater);
            return JreDeflateParameters.of(level, strategy, nowrap);
          }
        } catch (ZipException e) {
          // Parse error in input. The only possibilities are corruption or the wrong nowrap.
          // Skip all remaining levels and strategies.
          break strategy_loop;
        }
      }
    }
    end(inflater, deflater);
  }
  return null;
}

generatePreDiffPlan builds three map objects:

  • The first map holds data about the old file. The key is a ByteArrayHolder wrapping the byte array of the ZipEntry's file name, and the value is the MinimalZipEntry.
  • The second map holds data about the new file, with the same key and value types.
  • The third map holds the divined compression parameters of each ZipEntry in the new file, namely the compression level, the strategy, and the nowrap flag. The key is again a ByteArrayHolder of the entry's file name bytes, and the value is a JreDeflateParameters.

The first two maps come from the ZIP parsing functions described earlier, which return a List of MinimalZipEntry; the key comes from MinimalZipEntry.getFileNameBytes() and the value is the entry itself.

The third map is obtained with more effort: the parameters have to be divined, and the divination is brute force. Three nested loops decompress the compressed data and then recompress it with every combination of level, strategy and nowrap; if the recompressed bytes are equal to the compressed bytes parsed out of the ZIP, the corresponding level, strategy and nowrap values have been found. These three values are carried by a JreDeflateParameters object.
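To illustrate the idea outside the library (my own sketch; the real DefaultDeflateCompressionDiviner additionally probes both wrap modes and all strategies, as the code above shows), the following recovers the level of a deflate stream whose strategy and nowrap are assumed known, simply by recompressing and comparing bytes:

import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DivineLevelSketch {

  // Deflate-compress data with the given level, default strategy, nowrap=true.
  static byte[] deflate(byte[] data, int level) {
    Deflater deflater = new Deflater(level, true);
    deflater.setInput(data);
    deflater.finish();
    byte[] buf = new byte[data.length + 64];
    int n = deflater.deflate(buf);
    deflater.end();
    return Arrays.copyOf(buf, n);
  }

  // Inflate a nowrap deflate stream back into plain bytes.
  static byte[] inflate(byte[] compressed, int uncompressedSize) throws Exception {
    Inflater inflater = new Inflater(true);
    inflater.setInput(compressed);
    byte[] out = new byte[uncompressedSize];
    inflater.inflate(out);
    inflater.end();
    return out;
  }

  public static void main(String[] args) throws Exception {
    byte[] original = "some payload some payload some payload".getBytes();
    byte[] mystery = deflate(original, 7); // pretend the level is unknown

    // Brute force: decompress once, then recompress at every level and compare the bytes.
    // Any level that reproduces the exact compressed bytes is good enough for recompression.
    byte[] uncompressed = inflate(mystery, original.length);
    for (int level = 1; level <= 9; level++) {
      if (Arrays.equals(deflate(uncompressed, level), mystery)) {
        System.out.println("divined level = " + level);
        break;
      }
    }
  }
}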

These three maps are used to construct a PreDiffPlanner, and its generatePreDiffPlan method returns the PreDiffPlan; the code is as follows:

 
       
 
       
PreDiffPlan generatePreDiffPlan() throws IOException {
  List<QualifiedRecommendation> recommendations = getDefaultRecommendations();
  for (RecommendationModifier modifier : recommendationModifiers) {
    // Allow changing the recommendations base on arbitrary criteria.
    recommendations = modifier.getModifiedRecommendations(oldFile, newFile, recommendations);
  }

  // Process recommendations to extract ranges for decompression & recompression
  Set<TypedRange<Void>> oldFilePlan = new HashSet<>();
  Set<TypedRange<JreDeflateParameters>> newFilePlan = new HashSet<>();
  for (QualifiedRecommendation recommendation : recommendations) {
    if (recommendation.getRecommendation().uncompressOldEntry) {
      long offset = recommendation.getOldEntry().getFileOffsetOfCompressedData();
      long length = recommendation.getOldEntry().getCompressedSize();
      TypedRange<Void> range = new TypedRange<Void>(offset, length, null);
      oldFilePlan.add(range);
    }
    if (recommendation.getRecommendation().uncompressNewEntry) {
      long offset = recommendation.getNewEntry().getFileOffsetOfCompressedData();
      long length = recommendation.getNewEntry().getCompressedSize();
      JreDeflateParameters newJreDeflateParameters =
          newArchiveJreDeflateParametersByPath.get(
              new ByteArrayHolder(recommendation.getNewEntry().getFileNameBytes()));
      TypedRange<JreDeflateParameters> range =
          new TypedRange<JreDeflateParameters>(offset, length, newJreDeflateParameters);
      newFilePlan.add(range);
    }
  }

  List<TypedRange<Void>> oldFilePlanList = new ArrayList<>(oldFilePlan);
  Collections.sort(oldFilePlanList);
  List<TypedRange<JreDeflateParameters>> newFilePlanList = new ArrayList<>(newFilePlan);
  Collections.sort(newFilePlanList);
  return new PreDiffPlan(
      Collections.unmodifiableList(recommendations),
      Collections.unmodifiableList(oldFilePlanList),
      Collections.unmodifiableList(newFilePlanList));
}

This function mainly produces two List objects:

  • For the old file, the offset and length of the compressed data of every ZipEntry that should be uncompressed, carried by TypedRange with a Void type parameter; all such ranges form one List.
  • For the new file, the offset and length of the compressed data of every ZipEntry that should be uncompressed, carried by TypedRange with a JreDeflateParameters type parameter whose value comes from the third map built in the previous step; all such ranges form another List.

Both lists are sorted by offset in ascending order.

Where do the "entries recommended for uncompression" come from? They come from the following function:

 
       
 
       
private List<QualifiedRecommendation> getDefaultRecommendations() throws IOException {
  List<QualifiedRecommendation> recommendations = new ArrayList<>();

  // This will be used to find files that have been renamed, but not modified. This is relatively
  // cheap to construct as it just requires indexing all entries by the uncompressed CRC32, and
  // the CRC32 is already available in the ZIP headers.
  SimilarityFinder trivialRenameFinder =
      new Crc32SimilarityFinder(oldFile, oldArchiveZipEntriesByPath.values());

  // Iterate over every pair of entries and get a recommendation for what to do.
  for (Map.Entry<ByteArrayHolder, MinimalZipEntry> newEntry :
      newArchiveZipEntriesByPath.entrySet()) {
    ByteArrayHolder newEntryPath = newEntry.getKey();
    MinimalZipEntry oldZipEntry = oldArchiveZipEntriesByPath.get(newEntryPath);
    if (oldZipEntry == null) {
      // The path is only present in the new archive, not in the old archive. Try to find a
      // similar file in the old archive that can serve as a diff base for the new file.
      List<MinimalZipEntry> identicalEntriesInOldArchive =
          trivialRenameFinder.findSimilarFiles(newFile, newEntry.getValue());
      if (!identicalEntriesInOldArchive.isEmpty()) {
        // An identical file exists in the old archive at a different path. Use it for the
        // recommendation and carry on with the normal logic.
        // All entries in the returned list are identical, so just pick the first one.
        // NB, in principle it would be optimal to select the file that required the least work
        // to apply the patch - in practice, it is unlikely that an archive will contain multiple
        // copies of the same file that are compressed differently, so don't bother with that
        // degenerate case.
        oldZipEntry = identicalEntriesInOldArchive.get(0);
      }
    }

    // If the attempt to find a suitable diff base for the new entry has failed, oldZipEntry is
    // null (nothing to do in that case). Otherwise, there is an old entry that is relevant, so
    // get a recommendation for what to do.
    if (oldZipEntry != null) {
      recommendations.add(getRecommendation(oldZipEntry, newEntry.getValue()));
    }
  }
  return recommendations;
}

This function does the following:

  • Creates a similarity finder, which internally uses a Map keyed by CRC-32 whose value is a List of the old file's MinimalZipEntry objects (a List because several entries can share the same CRC-32).
  • Iterates over the new file's MinimalZipEntry objects and checks whether an entry with the same name exists in the old file. If not, the similarity finder is used to look for an old entry with the same CRC-32; if one is found, the first element of the returned list is used. If none is found, the entry has no diff base in the old archive (it is effectively a new file), so nothing is done for it.
  • Calls getRecommendation with the old and new entries, which returns a QualifiedRecommendation that is added to the list; this object holds the old and new entries together with whether each should be uncompressed, and why.
  • Returns the list of QualifiedRecommendation objects.

How is a QualifiedRecommendation decided? It is returned by getRecommendation, whose code is as follows:

 
       
 
       
private QualifiedRecommendation getRecommendation(MinimalZipEntry oldEntry, MinimalZipEntry newEntry)
    throws IOException {
  // Reject anything that is unsuitable for uncompressed diffing.
  // Reason singled out in order to monitor unsupported versions of zlib.
  if (unsuitableDeflate(newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEITHER,
        RecommendationReason.DEFLATE_UNSUITABLE);
  }

  // Reject anything that is unsuitable for uncompressed diffing.
  if (unsuitable(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEITHER,
        RecommendationReason.UNSUITABLE);
  }

  // If both entries are already uncompressed there is nothing to do.
  if (bothEntriesUncompressed(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEITHER,
        RecommendationReason.BOTH_ENTRIES_UNCOMPRESSED);
  }

  // The following are now true:
  // 1. At least one of the entries is compressed.
  // 1. The old entry is either uncompressed, or is compressed with deflate.
  // 2. The new entry is either uncompressed, or is reproducibly compressed with deflate.
  if (uncompressedChangedToCompressed(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEW,
        RecommendationReason.UNCOMPRESSED_CHANGED_TO_COMPRESSED);
  }

  if (compressedChangedToUncompressed(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_OLD,
        RecommendationReason.COMPRESSED_CHANGED_TO_UNCOMPRESSED);
  }

  // At this point, both entries must be compressed with deflate.
  if (compressedBytesChanged(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_BOTH,
        RecommendationReason.COMPRESSED_BYTES_CHANGED);
  }

  // If the compressed bytes have not changed, there is no need to do anything.
  return new QualifiedRecommendation(
      oldEntry,
      newEntry,
      Recommendation.UNCOMPRESS_NEITHER,
      RecommendationReason.COMPRESSED_BYTES_IDENTICAL);
}

There are seven cases:

  • The new entry is compressed but its JreDeflateParameters cannot be divined, i.e. the compression level, strategy and nowrap flag cannot be recovered. Without those three parameters the data cannot be recompressed, so the recommendation is to uncompress neither entry; the reason is that no suitable deflate parameters were found to reproduce the compressed data.
  • The old or the new entry is compressed with an unsupported compression method; the recommendation is to uncompress neither, because an unsupported algorithm was used.
  • If neither the old nor the new entry is compressed, uncompress neither; the reason is that both entries are already uncompressed.
  • If the old entry is uncompressed and the new entry is compressed, uncompress the new entry; the reason is that an uncompressed file became compressed.
  • If the old entry is compressed and the new entry is uncompressed, uncompress the old entry; the reason is that a compressed file became uncompressed.
  • If both entries are compressed and the compressed bytes changed, uncompress both; the reason is that the compressed bytes changed.
  • If the compressed bytes did not change, uncompress neither; the reason is that the compressed bytes are identical.

With all of this in place, let us look at how the delta-friendly files are generated:

 
       
 
       
private List<TypedRange<JreDeflateParameters>> generateDeltaFriendlyFiles(PreDiffPlan preDiffPlan)
    throws IOException {
  try (FileOutputStream out = new FileOutputStream(deltaFriendlyOldFile);
      BufferedOutputStream bufferedOut = new BufferedOutputStream(out)) {
    DeltaFriendlyFile.generateDeltaFriendlyFile(
        preDiffPlan.getOldFileUncompressionPlan(), originalOldFile, bufferedOut);
  }
  try (FileOutputStream out = new FileOutputStream(deltaFriendlyNewFile);
      BufferedOutputStream bufferedOut = new BufferedOutputStream(out)) {
    return DeltaFriendlyFile.generateDeltaFriendlyFile(
        preDiffPlan.getNewFileUncompressionPlan(), originalNewFile, bufferedOut);
  }
}

public static <T> List<TypedRange<T>> generateDeltaFriendlyFile(
    List<TypedRange<T>> rangesToUncompress, File file, OutputStream deltaFriendlyOut)
    throws IOException {
  return generateDeltaFriendlyFile(
      rangesToUncompress, file, deltaFriendlyOut, true, DEFAULT_COPY_BUFFER_SIZE);
}

public static <T> List<TypedRange<T>> generateDeltaFriendlyFile(
    List<TypedRange<T>> rangesToUncompress,
    File file,
    OutputStream deltaFriendlyOut,
    boolean generateInverse,
    int copyBufferSize)
    throws IOException {
  List<TypedRange<T>> inverseRanges = null;
  if (generateInverse) {
    inverseRanges = new ArrayList<TypedRange<T>>(rangesToUncompress.size());
  }
  long lastReadOffset = 0;
  RandomAccessFileInputStream oldFileRafis = null;
  PartiallyUncompressingPipe filteredOut =
      new PartiallyUncompressingPipe(deltaFriendlyOut, copyBufferSize);
  try {
    oldFileRafis = new RandomAccessFileInputStream(file);
    for (TypedRange<T> rangeToUncompress : rangesToUncompress) {
      long gap = rangeToUncompress.getOffset() - lastReadOffset;
      if (gap > 0) {
        // Copy bytes up to the range start point
        oldFileRafis.setRange(lastReadOffset, gap);
        filteredOut.pipe(oldFileRafis, PartiallyUncompressingPipe.Mode.COPY);
      }

      // Now uncompress the range.
      oldFileRafis.setRange(rangeToUncompress.getOffset(), rangeToUncompress.getLength());
      long inverseRangeStart = filteredOut.getNumBytesWritten();
      // TODO(andrewhayden): Support nowrap=false here? Never encountered in practice.
      // This would involve catching the ZipException, checking if numBytesWritten is still zero,
      // resetting the stream and trying again.
      filteredOut.pipe(oldFileRafis, PartiallyUncompressingPipe.Mode.UNCOMPRESS_NOWRAP);
      lastReadOffset = rangeToUncompress.getOffset() + rangeToUncompress.getLength();

      if (generateInverse) {
        long inverseRangeEnd = filteredOut.getNumBytesWritten();
        long inverseRangeLength = inverseRangeEnd - inverseRangeStart;
        TypedRange<T> inverseRange =
            new TypedRange<T>(
                inverseRangeStart, inverseRangeLength, rangeToUncompress.getMetadata());
        inverseRanges.add(inverseRange);
      }
    }
    // Finish the final bytes of the file
    long bytesLeft = oldFileRafis.length() - lastReadOffset;
    if (bytesLeft > 0) {
      oldFileRafis.setRange(lastReadOffset, bytesLeft);
      filteredOut.pipe(oldFileRafis, PartiallyUncompressingPipe.Mode.COPY);
    }
  } finally {
    try {
      oldFileRafis.close();
    } catch (Exception ignored) {
      // Nothing
    }
    try {
      filteredOut.close();
    } catch (Exception ignored) {
      // Nothing
    }
  }
  return inverseRanges;
}

This function is quite clever, and also rather involved. The process is:

  • Iterate over the list of ranges to uncompress. For each range, subtract the last read position lastReadOffset from the range's offset to get a gap; the gap bytes are copied straight through in COPY mode.
  • The input is then limited to [offset, offset+length]. The total number of bytes written so far is recorded as inverseRangeStart, the compressed range is uncompressed, and lastReadOffset is set to offset + length.
  • If generateInverse is true (it always is here, because the caller passes true), the total number of bytes written is read again as inverseRangeEnd; inverseRangeEnd minus inverseRangeStart is the uncompressed size, and a TypedRange built from these values is added to the list.
  • After all ranges have been processed, any bytes remaining between the current read position and the end of the file are copied through.
  • The List of TypedRange objects is returned.

The process is fairly abstract, so here is a picture of the whole decompression pass.

[Figure: friendly_generator.png — generating the delta-friendly file]

The figure shows a ZIP file: the green gap regions are descriptive metadata, the red regions are the actual compressed data, and the blue region is the data remaining at the end of the file. Gaps are copied verbatim, compressed regions are uncompressed, and for each one the function records the offset and length of the uncompressed data in the output; after all ranges are processed, the remaining trailing data is copied as well.

Note in particular that the returned TypedRange list describes offsets and lengths in the new file after decompression. This data is critical: rebuilding the ZIP file depends entirely on it.
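A small worked example with made-up numbers: suppose the only compressed entry of the new archive has its data at offset 1,000 with a compressed size of 400 bytes, and that data inflates to 900 bytes. While writing the delta-friendly file, the first 1,000 bytes are copied through, so the inverse range starts at offset 1,000 with length 900 and carries the divined JreDeflateParameters. The recompression instruction stored in the patch is therefore (offset=1,000, length=900, level/strategy/nowrap), which tells the applier exactly which slice of the new delta-friendly file must be re-deflated to reproduce the original entry bytes.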

Once the delta-friendly forms of the old and new files exist, the rest is simple: BsDiff generates the delta, and the delta is written into the patch file. The patch file format is as follows:

Offset Bytes Description Notes
0 8 Versioned Identifier header marker, fixed UTF-8 string "GFbFv1_0"
8 4 Flags (currently unused, but reserved) flags, reserved
12 8 Delta-friendly old archive size size of the old file's delta-friendly form, 64-bit unsigned
20 4 Num old archive uncompression ops number of old-file entries to uncompress, 32-bit unsigned
24 i Old archive uncompression op 1…n offset and length of each old-file entry to uncompress, n in total
24+i 4 Num new archive recompression ops number of new-file entries to recompress, 32-bit unsigned
24+i+4 j New archive recompression op 1…n offset and length of each new-file entry to recompress, n in total
24+i+4+j 4 Num delta descriptor records number of delta descriptor records, 32-bit unsigned
24+i+4+j+4 k Delta descriptor record 1…n delta descriptor records, n in total
24+i+4+j+4+k l Delta 1…n the delta data itself

The structure of an Old Archive Uncompression Op is:

Bytes Description Notes
8 Offset of first byte to uncompress offset to start uncompressing from, 64-bit unsigned
8 Number of bytes to uncompress number of bytes to uncompress, 64-bit unsigned

The structure of a New Archive Recompression Op is:

Bytes Description Notes
8 Offset of first byte to compress offset to start compressing from, 64-bit unsigned
8 Number of bytes to compress number of bytes to compress, 64-bit unsigned
4 Compression settings compression settings: level, strategy, nowrap

The structure of Compression Settings is:

Bytes Description Notes
1 Compatibility window ID compatibility window; currently always 0, the default window
1 Deflate level compression level, range [1,9]
1 Deflate strategy strategy, range [0,2]
1 Wrap mode 0 = wrap, 1 = nowrap

The Compatibility Window: the default compatibility window has ID 0 and uses the following configuration:

  • compression with the deflate algorithm (zlib)
  • a 32768-byte buffer
  • the verified compression levels 1-9
  • the verified strategies 0-2
  • the verified wrap modes, wrap and nowrap

The default compatibility window covers Android 4.0 and later.

How is this compatibility window established? There is a class called DefaultDeflateCompatibilityWindow whose getIncompatibleValues method returns the list of incompatible JreDeflateParameters (the carrier of level, strategy and nowrap). Internally it enumerates every combination of these three parameters, compresses a fixed piece of content with each, and compares the hex encoding of the result against built-in expected data; a match means the combination is compatible, a mismatch means it is not.

One caveat: officially levels 1-9, strategies 0-2 and both wrap modes are all supposed to be compatible, but when I tested on a PC a handful of combinations (roughly four) turned out to be incompatible. I have not tested on Android, so I do not know whether the same issue exists there.
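A quick way to check the runtime you care about is a sketch like the following (only the class name and getIncompatibleValues are taken from the description above; the package and the exact element type of the returned collection are assumptions on my part):

import com.google.archivepatcher.shared.DefaultDeflateCompatibilityWindow;
import com.google.archivepatcher.shared.JreDeflateParameters;

public class CompatibilityCheck {
  public static void main(String[] args) {
    DefaultDeflateCompatibilityWindow window = new DefaultDeflateCompatibilityWindow();
    // Each combination printed here produces different bytes on this JVM than the baseline
    // zlib assumed by the patch format, so entries using it could not be recompressed correctly.
    for (JreDeflateParameters parameters : window.getIncompatibleValues()) {
      System.out.println("incompatible: " + parameters);
    }
  }
}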

The Delta Descriptor Record describes a delta. The current V1 patch format supports only BsDiff, so there is exactly one such record. Its structure is:

Bytes Description Notes
1 Delta format ID enum id of the delta algorithm; bsdiff is 0
8 Old delta-friendly region start offset in the old delta-friendly file where the delta applies
8 Old delta-friendly region length length of that region in the old delta-friendly file
8 New delta-friendly region start offset in the new delta-friendly file where the delta applies
8 New delta-friendly region length length of that region in the new delta-friendly file
8 Delta length length of the generated delta

The patch file is written by writeV1Patch, whose code is as follows:

 
       
 
       
public void writeV1Patch(OutputStream out) throws IOException {
  // Use DataOutputStream for ease of writing. This is deliberately left open, as closing it would
  // close the output stream that was passed in and that is not part of the method's documented
  // behavior.
  @SuppressWarnings("resource")
  DataOutputStream dataOut = new DataOutputStream(out);

  dataOut.write(PatchConstants.IDENTIFIER.getBytes("US-ASCII")); // GFbFv1_0
  dataOut.writeInt(0); // Flags (reserved)
  dataOut.writeLong(deltaFriendlyOldFileSize);

  // Write out all the delta-friendly old file uncompression instructions
  dataOut.writeInt(plan.getOldFileUncompressionPlan().size());
  for (TypedRange<Void> range : plan.getOldFileUncompressionPlan()) {
    dataOut.writeLong(range.getOffset());
    dataOut.writeLong(range.getLength());
  }

  // Write out all the delta-friendly new file recompression instructions
  dataOut.writeInt(plan.getDeltaFriendlyNewFileRecompressionPlan().size());
  for (TypedRange<JreDeflateParameters> range : plan.getDeltaFriendlyNewFileRecompressionPlan()) {
    dataOut.writeLong(range.getOffset());
    dataOut.writeLong(range.getLength());
    // Write the deflate information
    dataOut.write(PatchConstants.CompatibilityWindowId.DEFAULT_DEFLATE.patchValue);
    dataOut.write(range.getMetadata().level);
    dataOut.write(range.getMetadata().strategy);
    dataOut.write(range.getMetadata().nowrap ? 1 : 0);
  }

  // Now the delta section
  // First write the number of deltas present in the patch. In v1, there is always exactly one
  // delta, and it is for the entire input; in future versions there may be multiple deltas, of
  // arbitrary types.
  dataOut.writeInt(1);

  // In v1 the delta format is always bsdiff, so write it unconditionally.
  dataOut.write(PatchConstants.DeltaFormat.BSDIFF.patchValue);

  // Write the working ranges. In v1 these are always the entire contents of the delta-friendly
  // old file and the delta-friendly new file. These are for forward compatibility with future
  // versions that may allow deltas of arbitrary formats to be mapped to arbitrary ranges.
  dataOut.writeLong(0); // i.e., start of the working range in the delta-friendly old file
  dataOut.writeLong(deltaFriendlyOldFileSize); // i.e., length of the working range in old
  dataOut.writeLong(0); // i.e., start of the working range in the delta-friendly new file
  dataOut.writeLong(deltaFriendlyNewFileSize); // i.e., length of the working range in new

  // Finally, the length of the delta and the delta itself.
  dataOut.writeLong(deltaFile.length());
  try (FileInputStream deltaFileIn = new FileInputStream(deltaFile);
      BufferedInputStream deltaIn = new BufferedInputStream(deltaFileIn)) {
    byte[] buffer = new byte[32768];
    int numRead = 0;
    while ((numRead = deltaIn.read(buffer)) >= 0) {
      dataOut.write(buffer, 0, numRead);
    }
  }
  dataOut.flush();
}

The main steps are:

  • Write the header, "GFbFv1_0".
  • Write the reserved flags, value 0.
  • Write the size of the old file's delta-friendly form.
  • Write the number of old-file entries to uncompress.
  • Write, in turn, the offset and length of each of the n old-file entries to uncompress.
  • Write the number of new-file entries to recompress.
  • Write, in turn, the offset and length of each of the n new-file entries to recompress, followed by the compatibility window data (window id, level, strategy, nowrap).
  • Write the number of delta descriptors; only bsdiff is used, so the value is 1.
  • Write the delta format id, then the offset and length of the working range in the old delta-friendly file and in the new delta-friendly file.
  • Write the length of the delta.
  • Write the bsdiff-generated delta itself.

Synthesizing the new file

Patch application goes through com.google.archivepatcher.applier.FileByFileV1DeltaApplier's applyDelta, which eventually calls applyDeltaInternal:

 
       
 
       
private void applyDeltaInternal(
    File oldBlob, File deltaFriendlyOldBlob, InputStream deltaIn, OutputStream newBlobOut)
    throws IOException {
  // First, read the patch plan from the patch stream.
  PatchReader patchReader = new PatchReader();
  PatchApplyPlan plan = patchReader.readPatchApplyPlan(deltaIn);
  writeDeltaFriendlyOldBlob(plan, oldBlob, deltaFriendlyOldBlob);

  // Apply the delta. In v1 there is always exactly one delta descriptor, it is bsdiff, and it
  // takes up the rest of the patch stream - so there is no need to examine the list of
  // DeltaDescriptors in the patch at all.
  long deltaLength = plan.getDeltaDescriptors().get(0).getDeltaLength();
  DeltaApplier deltaApplier = getDeltaApplier();

  // Don't close this stream, as it is just a limiting wrapper.
  @SuppressWarnings("resource")
  LimitedInputStream limitedDeltaIn = new LimitedInputStream(deltaIn, deltaLength);

  // Don't close this stream, as it would close the underlying OutputStream (that we don't own).
  @SuppressWarnings("resource")
  PartiallyCompressingOutputStream recompressingNewBlobOut =
      new PartiallyCompressingOutputStream(
          plan.getDeltaFriendlyNewFileRecompressionPlan(),
          newBlobOut,
          DEFAULT_COPY_BUFFER_SIZE);

  deltaApplier.applyDelta(deltaFriendlyOldBlob, limitedDeltaIn, recompressingNewBlobOut);
  recompressingNewBlobOut.flush();
}

It does the following:

  • Parses the patch file into a PatchApplyPlan object.
  • Generates the delta-friendly form of the old file.
  • Applies the patch algorithm to produce the new file's delta-friendly form; at the same time, the new ZIP file is reassembled as the data is written through the output stream.

For the first step, the parsing code is as follows:

 
       
 
       
public PatchApplyPlan readPatchApplyPlan(InputStream in) throws IOException {
  // Use DataOutputStream for ease of writing. This is deliberately left open, as closing it would
  // close the output stream that was passed in and that is not part of the method's documented
  // behavior.
  @SuppressWarnings("resource")
  DataInputStream dataIn = new DataInputStream(in);

  // Read header and flags.
  byte[] expectedIdentifier = PatchConstants.IDENTIFIER.getBytes("US-ASCII");
  byte[] actualIdentifier = new byte[expectedIdentifier.length];
  dataIn.readFully(actualIdentifier);
  if (!Arrays.equals(expectedIdentifier, actualIdentifier)) {
    throw new PatchFormatException("Bad identifier");
  }
  dataIn.skip(4); // Flags (ignored in v1)
  long deltaFriendlyOldFileSize =
      checkNonNegative(dataIn.readLong(), "delta-friendly old file size");

  // Read old file uncompression instructions.
  int numOldFileUncompressionInstructions =
      (int) checkNonNegative(dataIn.readInt(), "old file uncompression instruction count");
  List<TypedRange<Void>> oldFileUncompressionPlan =
      new ArrayList<TypedRange<Void>>(numOldFileUncompressionInstructions);
  long lastReadOffset = -1;
  for (int x = 0; x < numOldFileUncompressionInstructions; x++) {
    long offset = checkNonNegative(dataIn.readLong(), "old file uncompression range offset");
    long length = checkNonNegative(dataIn.readLong(), "old file uncompression range length");
    if (offset < lastReadOffset) {
      throw new PatchFormatException("old file uncompression ranges out of order or overlapping");
    }
    TypedRange<Void> range = new TypedRange<Void>(offset, length, null);
    oldFileUncompressionPlan.add(range);
    lastReadOffset = offset + length; // To check that the next range starts after the current one
  }

  // Read new file recompression instructions
  int numDeltaFriendlyNewFileRecompressionInstructions = dataIn.readInt();
  checkNonNegative(
      numDeltaFriendlyNewFileRecompressionInstructions,
      "delta-friendly new file recompression instruction count");
  List<TypedRange<JreDeflateParameters>> deltaFriendlyNewFileRecompressionPlan =
      new ArrayList<TypedRange<JreDeflateParameters>>(
          numDeltaFriendlyNewFileRecompressionInstructions);
  lastReadOffset = -1;
  for (int x = 0; x < numDeltaFriendlyNewFileRecompressionInstructions; x++) {
    long offset =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file recompression range offset");
    long length =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file recompression range length");
    if (offset < lastReadOffset) {
      throw new PatchFormatException(
          "delta-friendly new file recompression ranges out of order or overlapping");
    }
    lastReadOffset = offset + length; // To check that the next range starts after the current one

    // Read the JreDeflateParameters
    // Note that v1 only supports the default deflate compatibility window.
    checkRange(
        dataIn.readByte(),
        PatchConstants.CompatibilityWindowId.DEFAULT_DEFLATE.patchValue,
        PatchConstants.CompatibilityWindowId.DEFAULT_DEFLATE.patchValue,
        "compatibility window id");
    int level = (int) checkRange(dataIn.readUnsignedByte(), 1, 9, "recompression level");
    int strategy = (int) checkRange(dataIn.readUnsignedByte(), 0, 2, "recompression strategy");
    int nowrapInt = (int) checkRange(dataIn.readUnsignedByte(), 0, 1, "recompression nowrap");
    TypedRange<JreDeflateParameters> range =
        new TypedRange<JreDeflateParameters>(
            offset,
            length,
            JreDeflateParameters.of(level, strategy, nowrapInt == 0 ? false : true));
    deltaFriendlyNewFileRecompressionPlan.add(range);
  }

  // Read the delta metadata, but stop before the first byte of the actual delta.
  // V1 has exactly one delta and it must be bsdiff.
  int numDeltaRecords = (int) checkRange(dataIn.readInt(), 1, 1, "num delta records");
  List<DeltaDescriptor> deltaDescriptors = new ArrayList<DeltaDescriptor>(numDeltaRecords);
  for (int x = 0; x < numDeltaRecords; x++) {
    byte deltaFormatByte =
        (byte)
            checkRange(
                dataIn.readByte(),
                PatchConstants.DeltaFormat.BSDIFF.patchValue,
                PatchConstants.DeltaFormat.BSDIFF.patchValue,
                "delta format");
    long deltaFriendlyOldFileWorkRangeOffset =
        checkNonNegative(dataIn.readLong(), "delta-friendly old file work range offset");
    long deltaFriendlyOldFileWorkRangeLength =
        checkNonNegative(dataIn.readLong(), "delta-friendly old file work range length");
    long deltaFriendlyNewFileWorkRangeOffset =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file work range offset");
    long deltaFriendlyNewFileWorkRangeLength =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file work range length");
    long deltaLength = checkNonNegative(dataIn.readLong(), "delta length");
    DeltaDescriptor descriptor =
        new DeltaDescriptor(
            PatchConstants.DeltaFormat.fromPatchValue(deltaFormatByte),
            new TypedRange<Void>(
                deltaFriendlyOldFileWorkRangeOffset, deltaFriendlyOldFileWorkRangeLength, null),
            new TypedRange<Void>(
                deltaFriendlyNewFileWorkRangeOffset, deltaFriendlyNewFileWorkRangeLength, null),
            deltaLength);
    deltaDescriptors.add(descriptor);
  }

  return new PatchApplyPlan(
      Collections.unmodifiableList(oldFileUncompressionPlan),
      deltaFriendlyOldFileSize,
      Collections.unmodifiableList(deltaFriendlyNewFileRecompressionPlan),
      Collections.unmodifiableList(deltaDescriptors));
}

The steps are:

  • Read and verify the file header.
  • Skip the 4-byte flags.
  • Read the size of the old file's delta-friendly form, checking that it is non-negative.
  • Read the number of old-file entries to uncompress, checking non-negative.
  • Read the n old-file (offset, length) pairs, checking non-negative.
  • Read the number of new-file entries to recompress, checking non-negative.
  • Read the n new-file (offset, length) pairs together with the level, strategy and nowrap values, checking that each is within its allowed range.
  • Read the number of delta descriptors.
  • Read the n delta descriptors: the delta format id, the working range (offset, length) in the old delta-friendly file, the working range in the new delta-friendly file, and the length of the delta.
  • Return the PatchApplyPlan object.

Next, the list of old-file TypedRange objects to uncompress is taken from the returned PatchApplyPlan and DeltaFriendlyFile.generateDeltaFriendlyFile produces the delta-friendly old file. This is the same process used when generating the patch, so it is not described again. The code is:

 
       
 
       
private void writeDeltaFriendlyOldBlob(
    PatchApplyPlan plan, File oldBlob, File deltaFriendlyOldBlob) throws IOException {
  RandomAccessFileOutputStream deltaFriendlyOldFileOut = null;
  try {
    deltaFriendlyOldFileOut =
        new RandomAccessFileOutputStream(
            deltaFriendlyOldBlob, plan.getDeltaFriendlyOldFileSize());
    DeltaFriendlyFile.generateDeltaFriendlyFile(
        plan.getOldFileUncompressionPlan(),
        oldBlob,
        deltaFriendlyOldFileOut,
        false,
        DEFAULT_COPY_BUFFER_SIZE);
  } finally {
    try {
      deltaFriendlyOldFileOut.close();
    } catch (Exception ignored) {
      // Nothing
    }
  }
}

Next comes synthesizing the new file. BsPatch performs the synthesis and writes into an OutputStream which has been wrapped (decorator pattern); the stream ultimately written to is a PartiallyCompressingOutputStream, constructed from the list of TypedRange objects describing which ranges of the new delta-friendly file must be recompressed. In the end, the work of reassembling the ZIP file lands in PartiallyCompressingOutputStream's writeChunk function, whose code is as follows:

 
       
 
       
private int writeChunk(byte[] buffer, int offset, int length) throws IOException {
  if (bytesTillCompressionStarts() == 0 && !currentlyCompressing()) {
    // Compression will begin immediately.
    JreDeflateParameters parameters = nextCompressedRange.getMetadata();
    if (deflater == null) {
      deflater = new Deflater(parameters.level, parameters.nowrap);
    } else if (lastDeflateParameters.nowrap != parameters.nowrap) {
      // Last deflater must be destroyed because nowrap settings do not match.
      deflater.end();
      deflater = new Deflater(parameters.level, parameters.nowrap);
    }
    // Deflater will already have been reset at the end of this method, no need to do it again.
    // Just set up the right parameters.
    deflater.setLevel(parameters.level);
    deflater.setStrategy(parameters.strategy);
    deflaterOut = new DeflaterOutputStream(normalOut, deflater, compressionBufferSize);
  }

  int numBytesToWrite;
  OutputStream writeTarget;
  if (currentlyCompressing()) {
    // Don't write past the end of the compressed range.
    numBytesToWrite = (int) Math.min(length, bytesTillCompressionEnds());
    writeTarget = deflaterOut;
  } else {
    writeTarget = normalOut;
    if (nextCompressedRange == null) {
      // All compression ranges have been consumed.
      numBytesToWrite = length;
    } else {
      // Don't write past the point where the next compressed range begins.
      numBytesToWrite = (int) Math.min(length, bytesTillCompressionStarts());
    }
  }

  writeTarget.write(buffer, offset, numBytesToWrite);
  numBytesWritten += numBytesToWrite;

  if (currentlyCompressing() && bytesTillCompressionEnds() == 0) {
    // Compression range complete. Finish the output and set up for the next run.
    deflaterOut.finish();
    deflaterOut.flush();
    deflaterOut = null;
    deflater.reset();
    lastDeflateParameters = nextCompressedRange.getMetadata();
    if (rangeIterator.hasNext()) {
      // More compression ranges await in the future.
      nextCompressedRange = rangeIterator.next();
    } else {
      // All compression ranges have been consumed.
      nextCompressedRange = null;
      deflater.end();
      deflater = null;
    }
  }
  return numBytesToWrite;
}

private boolean currentlyCompressing() {
  return deflaterOut != null;
}

private long bytesTillCompressionStarts() {
  if (nextCompressedRange == null) {
    // All compression ranges have been consumed
    return -1L;
  }
  return nextCompressedRange.getOffset() - numBytesWritten;
}

private long bytesTillCompressionEnds() {
  if (nextCompressedRange == null) {
    // All compression ranges have been consumed
    return -1L;
  }
  return (nextCompressedRange.getOffset() + nextCompressedRange.getLength()) - numBytesWritten;
}

This function is the heart of the synthesis and is designed very cleverly; I recommend stepping through it with a debugger to understand it properly. Briefly, the process is:

  • In the constructor of PartiallyCompressingOutputStream, the first entry of compressionRanges is fetched.
  • If the distance from the bytes written so far to the start of the next compression range is 0 and the stream is not currently compressing, the compression settings (level, strategy, nowrap) are read and applied, and the output stream is wrapped in a deflater stream.
  • If currently compressing, the number of bytes to write is the smaller of the requested length and the bytes remaining until the compression range ends, and the target is the deflater stream, i.e. the data is compressed rather than copied.
  • If not currently compressing and there is no further compression range, the requested length is written directly; if there is a further compression range, the number of bytes to write is the smaller of the requested length and the distance to the start of that range, and the target is the plain stream, i.e. the data is copied rather than compressed.
  • If the stream is compressing and all the data of the current compression range has been written, finish and flush the deflater stream, reset the compression state, and advance to the next compression range.
  • Repeat until all data has been written.

The process is fairly involved, so again a picture helps:

[Figure: recompress.png — recompressing the delta-friendly new file]

The synthesized delta-friendly data of the new file is laid out as shown above.

When a green gap region is encountered, the bytes are copied verbatim to the output stream; when a red, already-uncompressed region is encountered, it is recompressed with the corresponding level, strategy and nowrap parameters; finally the remaining blue data is written to the output.

With that, the whole synthesis is complete.

Why does this reconstruct the ZIP file correctly? The analysis above makes it clear: Google Archive Patch records, for every region of the new file that must be recompressed, the parameters needed to recompress it. Those regions are compressed with those parameters and written at their true positions in the new file, while all other data of the ZIP file is simply copied. The two operations together reproduce the new ZIP file exactly, and for an APK we do not even need to care about the signature.

A truly elegant process; worth admiring!

Compressing and decompressing the patch file

Google Archive Patch does not compress the patch file itself. You are expected to compress it yourself to keep the patch small, and the client must decompress it after receiving it. This keeps the choice of patch compression completely free and easy to extend.
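For example, a minimal sketch (my own, not part of the library) that gzips the generated patch for transport and ungzips it on the client before handing it to the applier:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class PatchTransport {

  // Server side: gzip the raw patch produced by FileByFileV1DeltaGenerator.
  static void compressPatch(String rawPatch, String gzPatch) throws IOException {
    try (FileInputStream in = new FileInputStream(rawPatch);
        GZIPOutputStream out = new GZIPOutputStream(new FileOutputStream(gzPatch))) {
      byte[] buffer = new byte[32 * 1024];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
    }
  }

  // Client side: ungzip the downloaded patch before passing it to FileByFileV1DeltaApplier.
  static void decompressPatch(String gzPatch, String rawPatch) throws IOException {
    try (GZIPInputStream in = new GZIPInputStream(new FileInputStream(gzPatch));
        FileOutputStream out = new FileOutputStream(rawPatch)) {
      byte[] buffer = new byte[32 * 1024];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
    }
  }
}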

A general framework for delta generation and application

Over the past few days I put together a simple general framework for delta generation and application; see CorePatch on GitHub. It currently supports bsdiff, Google Archive Patch, and full synthesis (plain file copy).

Optimizations

Optimizing delta generation

Delta generation uses bsdiff, but after the base files are decompressed they become much larger, so generating the delta takes much longer. There is no way to optimize this within the current design; the only real option is to use a better delta algorithm instead of BsDiff.

Optimizing application

Synthesis uses BsPatch, which is extremely fast, so it needs no optimization. What would be worth optimizing is rebuilding the new ZIP file, i.e. the writeChunk function above, and its only real cost is the compression, which accounts for roughly 80-90% of the total time; so in practice there is not much room for optimization there either.

Summary

The core of Google Archive Patch is to generate delta-friendly files, apply the delta algorithm, and record the offsets and lengths of the regions of the new delta-friendly file that must be recompressed. When applying the patch, the regions that must be recompressed are compressed with the parameters stored in the patch, while everything else in the ZIP file is simply copied; together these reconstruct the new file perfectly. The advantage is that the patch is smaller than a file-level bsdiff patch; the disadvantage is that both generation and application take longer.

The fundamental requirement of the algorithm is that compressing the same data with the same compression level, strategy and nowrap parameters must produce identical bytes. If that premise does not hold, the algorithm is meaningless.

Source: http://fucknmb.com/2017/10/05/Google-Archive-Patch-%E6%BA%90%E7%A0%81%E8%A7%A3%E6%9E%90/
