Integrity-protected contents
To protect the APK's contents, an APK consists of the following 4 sections: (1) the contents of the ZIP entries (from offset 0 up to the start of the APK Signing Block), (2) the APK Signing Block, (3) the ZIP Central Directory, and (4) the ZIP End of Central Directory.
The integrity of sections 1, 3, and 4 is protected by one or more digests of their contents. These digests are stored in the signed data block, which in turn is protected by one or more signatures.
The digests of sections 1, 3, and 4 are computed as follows, similar to a two-level Merkle tree. Each section is split into consecutive 1 MB (2^20 bytes) chunks; the last chunk of each section may be shorter. The digest of each chunk is computed over the concatenation of the byte 0xa5, the chunk's length in bytes (uint32, little-endian), and the chunk's contents. The top-level digest is computed over the concatenation of the byte 0x5a, the number of chunks (uint32, little-endian), and the chunk digests in the order the chunks appear in the APK. Computing the digest in chunks makes it possible to speed it up with parallel processing.
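As a concrete illustration, here is a minimal standalone sketch of that scheme for sections that are already in memory; the class and method names (ChunkedDigestSketch, computeChunkedDigest) are hypothetical and only transcribe the rules above, they are not the framework implementation:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of AOSP: sections would be the ZIP entries before
// the signing block, the Central Directory, and the End of Central Directory.
class ChunkedDigestSketch {
    static byte[] computeChunkedDigest(String jcaAlgorithm, byte[][] sections) throws Exception {
        final int CHUNK = 1 << 20; // 1 MB = 2^20 bytes
        List<byte[]> chunkDigests = new ArrayList<>();
        for (byte[] section : sections) {
            for (int off = 0; off < section.length; off += CHUNK) {
                int len = Math.min(CHUNK, section.length - off);
                MessageDigest md = MessageDigest.getInstance(jcaAlgorithm);
                // Per-chunk prefix: byte 0xa5 followed by the chunk length (uint32, little-endian).
                md.update(ByteBuffer.allocate(5).order(ByteOrder.LITTLE_ENDIAN)
                        .put((byte) 0xa5).putInt(len).array());
                md.update(section, off, len);
                chunkDigests.add(md.digest());
            }
        }
        // Top-level digest: byte 0x5a, the chunk count (uint32, little-endian),
        // then every chunk digest in the order the chunks appear in the APK.
        MessageDigest top = MessageDigest.getInstance(jcaAlgorithm);
        top.update(ByteBuffer.allocate(5).order(ByteOrder.LITTLE_ENDIAN)
                .put((byte) 0x5a).putInt(chunkDigests.size()).array());
        for (byte[] d : chunkDigests) {
            top.update(d);
        }
        return top.digest();
    }
}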
Protecting section 4 (ZIP End of Central Directory) is more complicated because that section contains the offset of the ZIP Central Directory. The offset changes whenever the size of the APK Signing Block changes, for example when a new signature is added. Therefore, when computing a digest over the ZIP End of Central Directory, the field containing the Central Directory offset must be treated as containing the offset of the APK Signing Block instead.
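A minimal sketch of that substitution, assuming a ByteBuffer that holds just the EoCD record and relying on the fact that the Central Directory offset field sits 16 bytes into the record (in the framework this write is performed by ZipUtils.setZipEocdCentralDirectoryOffset, as we will see below):
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Simplified stand-in, not the AOSP ZipUtils implementation.
class EocdOffsetPatchSketch {
    static final int EOCD_CENTRAL_DIR_OFFSET_FIELD_OFFSET = 16;

    // 'eocd' contains just the EoCD record, starting at the EoCD signature.
    static void substituteCentralDirOffset(ByteBuffer eocd, long apkSigningBlockOffset) {
        eocd.order(ByteOrder.LITTLE_ENDIAN);
        // For digest purposes the field is treated as pointing at the APK Signing Block.
        eocd.putInt(EOCD_CENTRAL_DIR_OFFSET_FIELD_OFFSET, (int) apkSigningBlockOffset);
    }
}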
The implementation lives in ApkSigningBlockUtils, in verifyIntegrity(contentDigests, apk, signatureInfo):
static void verifyIntegrity(
Map<Integer, byte[]> expectedDigests,
RandomAccessFile apk,
SignatureInfo signatureInfo) throws SecurityException {
if (expectedDigests.isEmpty()) {
throw new SecurityException("No digests provided");
}
boolean neverVerified = true;
Map<Integer, byte[]> expected1MbChunkDigests = new ArrayMap<>();
if (expectedDigests.containsKey(CONTENT_DIGEST_CHUNKED_SHA256)) {
expected1MbChunkDigests.put(CONTENT_DIGEST_CHUNKED_SHA256,
expectedDigests.get(CONTENT_DIGEST_CHUNKED_SHA256));
}
if (expectedDigests.containsKey(CONTENT_DIGEST_CHUNKED_SHA512)) {
expected1MbChunkDigests.put(CONTENT_DIGEST_CHUNKED_SHA512,
expectedDigests.get(CONTENT_DIGEST_CHUNKED_SHA512));
}
if (!expected1MbChunkDigests.isEmpty()) {
try {
verifyIntegrityFor1MbChunkBasedAlgorithm(expected1MbChunkDigests, apk.getFD(),
signatureInfo);
neverVerified = false;
} catch (IOException e) {
throw new SecurityException("Cannot get FD", e);
}
}
if (expectedDigests.containsKey(CONTENT_DIGEST_VERITY_CHUNKED_SHA256)) {
verifyIntegrityForVerityBasedAlgorithm(
expectedDigests.get(CONTENT_DIGEST_VERITY_CHUNKED_SHA256), apk, signatureInfo);
neverVerified = false;
}
if (neverVerified) {
throw new SecurityException("No known digest exists for integrity check");
}
}
The expectedDigests parameter holds the digest value corresponding to the strongest algorithm in the signer sequence of the v2 block. It is returned by the signature verification step; see the earlier article 《Android APK文件的签名V2查找、验证》 on locating and verifying the v2 signature.
If expectedDigests contains CONTENT_DIGEST_CHUNKED_SHA256 and/or CONTENT_DIGEST_CHUNKED_SHA512, those algorithms and their digest values are copied into expected1MbChunkDigests, and the file's integrity is then verified against them.
If expectedDigests also contains CONTENT_DIGEST_VERITY_CHUNKED_SHA256, a Merkle tree is built, its root is digested, and the result is compared with the digest taken from the v2 block; if they are equal the file is considered intact, otherwise a SecurityException is thrown.
The 1 MB chunk based verification is implemented in verifyIntegrityFor1MbChunkBasedAlgorithm():
private static void verifyIntegrityFor1MbChunkBasedAlgorithm(
Map<Integer, byte[]> expectedDigests,
FileDescriptor apkFileDescriptor,
SignatureInfo signatureInfo) throws SecurityException {
int[] digestAlgorithms = new int[expectedDigests.size()];
int digestAlgorithmCount = 0;
for (int digestAlgorithm : expectedDigests.keySet()) {
digestAlgorithms[digestAlgorithmCount] = digestAlgorithm;
digestAlgorithmCount++;
}
byte[][] actualDigests;
try {
actualDigests = computeContentDigestsPer1MbChunk(digestAlgorithms, apkFileDescriptor,
signatureInfo);
} catch (DigestException e) {
throw new SecurityException("Failed to compute digest(s) of contents", e);
}
for (int i = 0; i < digestAlgorithms.length; i++) {
int digestAlgorithm = digestAlgorithms[i];
byte[] expectedDigest = expectedDigests.get(digestAlgorithm);
byte[] actualDigest = actualDigests[i];
if (!MessageDigest.isEqual(expectedDigest, actualDigest)) {
throw new SecurityException(
getContentDigestAlgorithmJcaDigestAlgorithm(digestAlgorithm)
+ " digest of contents did not verify");
}
}
}
As you can see, digestAlgorithms holds the digest algorithms, and actualDigests holds the digest computed for each algorithm by computeContentDigestsPer1MbChunk(). Each computed digest is then compared with the one obtained from the v2 block, and a mismatch raises a SecurityException.
So the key here is understanding computeContentDigestsPer1MbChunk(), which implements exactly the algorithm described in the documentation quoted above.
Let's look at computeContentDigestsPer1MbChunk():
public static byte[][] computeContentDigestsPer1MbChunk(int[] digestAlgorithms,
FileDescriptor apkFileDescriptor, SignatureInfo signatureInfo) throws DigestException {
// We need to verify the integrity of the following three sections of the file:
// 1. Everything up to the start of the APK Signing Block.
// 2. ZIP Central Directory.
// 3. ZIP End of Central Directory (EoCD).
// Each of these sections is represented as a separate DataSource instance below.
// To handle large APKs, these sections are read in 1 MB chunks using memory-mapped I/O to
// avoid wasting physical memory. In most APK verification scenarios, the contents of the
// APK are already there in the OS's page cache and thus mmap does not use additional
// physical memory.
DataSource beforeApkSigningBlock =
DataSource.create(apkFileDescriptor, 0, signatureInfo.apkSigningBlockOffset);
DataSource centralDir =
DataSource.create(
apkFileDescriptor, signatureInfo.centralDirOffset,
signatureInfo.eocdOffset - signatureInfo.centralDirOffset);
// For the purposes of integrity verification, ZIP End of Central Directory's field Start of
// Central Directory must be considered to point to the offset of the APK Signing Block.
ByteBuffer eocdBuf = signatureInfo.eocd.duplicate();
eocdBuf.order(ByteOrder.LITTLE_ENDIAN);
ZipUtils.setZipEocdCentralDirectoryOffset(eocdBuf, signatureInfo.apkSigningBlockOffset);
DataSource eocd = new ByteBufferDataSource(eocdBuf);
return computeContentDigestsPer1MbChunk(digestAlgorithms,
new DataSource[]{beforeApkSigningBlock, centralDir, eocd});
}
Three parts of the file take part in verification: the ZIP entry data before the signing block, the Central Directory, and the End of Central Directory.
The signatureInfo parameter carries the offsets of these regions, so three data sources are created: beforeApkSigningBlock, centralDir, and eocd. Note that the Central Directory offset stored in the End of Central Directory currently accounts for the inserted signing block, so it is rewritten back to the value it had before the signing block was added, namely the offset of the APK Signing Block; this is done by ZipUtils.setZipEocdCentralDirectoryOffset(eocdBuf, signatureInfo.apkSigningBlockOffset).
Each of the three regions is wrapped in a DataSource. DataSource exposes feedIntoDataDigester(DataDigester md, long offset, int size): the first parameter is the DataDigester that produces the digest, and since each region is processed in 1 MB chunks, the last two parameters select which slice of the DataSource is fed into it.
The concrete DataSource type depends on whether the file lives on the incremental file system: it is either a ReadFileDataSource or a MemoryMappedFileDataSource. MemoryMappedFileDataSource reads through memory mapping, while ReadFileDataSource uses the pread system call to read at a file offset; the latter is slower than memory mapping but safer.
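To make the DataSource / DataDigester contract concrete, here is a purely illustrative in-memory data source (not an AOSP class); the real ReadFileDataSource and MemoryMappedFileDataSource serve a file region instead of a byte array:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.security.DigestException;

// Illustrative only: a trivial in-memory "data source" showing the contract that the
// real classes fulfil by reading the region via pread() or mmap().
class ByteArrayDataSourceSketch {
    interface DataDigester {                       // mirrors the framework's DataDigester
        void consume(ByteBuffer buffer) throws DigestException;
    }

    private final byte[] mData;
    ByteArrayDataSourceSketch(byte[] data) { mData = data; }

    long size() { return mData.length; }

    // Feed bytes [offset, offset + size) of this source into the digester.
    void feedIntoDataDigester(DataDigester md, long offset, int size)
            throws IOException, DigestException {
        md.consume(ByteBuffer.wrap(mData, (int) offset, size));
    }
}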
Next comes the computeContentDigestsPer1MbChunk() overload that does the real work. The code is fairly long, so we will go through it in pieces; here is the first part:
private static byte[][] computeContentDigestsPer1MbChunk(
int[] digestAlgorithms,
DataSource[] contents) throws DigestException {
// For each digest algorithm the result is computed as follows:
// 1. Each segment of contents is split into consecutive chunks of 1 MB in size.
// The final chunk will be shorter iff the length of segment is not a multiple of 1 MB.
// No chunks are produced for empty (zero length) segments.
// 2. The digest of each chunk is computed over the concatenation of byte 0xa5, the chunk's
// length in bytes (uint32 little-endian) and the chunk's contents.
// 3. The output digest is computed over the concatenation of the byte 0x5a, the number of
// chunks (uint32 little-endian) and the concatenation of digests of chunks of all
// segments in-order.
long totalChunkCountLong = 0;
for (DataSource input : contents) {
totalChunkCountLong += getChunkCount(input.size());
}
if (totalChunkCountLong >= Integer.MAX_VALUE / 1024) {
throw new DigestException("Too many chunks: " + totalChunkCountLong);
}
int totalChunkCount = (int) totalChunkCountLong;
byte[][] digestsOfChunks = new byte[digestAlgorithms.length][];
for (int i = 0; i < digestAlgorithms.length; i++) {
int digestAlgorithm = digestAlgorithms[i];
int digestOutputSizeBytes = getContentDigestAlgorithmOutputSizeBytes(digestAlgorithm);
byte[] concatenationOfChunkCountAndChunkDigests =
new byte[5 + totalChunkCount * digestOutputSizeBytes];
concatenationOfChunkCountAndChunkDigests[0] = 0x5a;
setUnsignedInt32LittleEndian(
totalChunkCount,
concatenationOfChunkCountAndChunkDigests,
1);
digestsOfChunks[i] = concatenationOfChunkCountAndChunkDigests;
}
getChunkCount(input.size()) returns the number of 1 MB chunks in a data source. As we saw, there are three data sources, and totalChunkCountLong is the total number of 1 MB chunks across all of them; the last chunk of a source may be shorter than 1 MB but still counts as a chunk.
Next, a first-level digest buffer is laid out for each algorithm.
Inside the loop, getContentDigestAlgorithmOutputSizeBytes(digestAlgorithm) gives the digest length for the algorithm. From that length and the number of chunks we get the size of the concatenated chunk digests; because the concatenation starts with the byte 0x5a followed by 4 bytes holding the chunk count, 5 is added to the length. The first five bytes are then filled in, and the buffer for each algorithm is stored in digestsOfChunks.
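A worked example with made-up numbers may help: for SHA-256 and a total of three chunks, the buffer laid out above is 5 + 3 * 32 = 101 bytes:
// Worked example (hypothetical numbers): the first-level buffer for SHA-256 and
// three chunks in total, as laid out by the loop above.
class ChunkDigestBufferExample {
    public static void main(String[] args) {
        int totalChunkCount = 3;
        int digestOutputSizeBytes = 32;                        // SHA-256
        byte[] buf = new byte[5 + totalChunkCount * digestOutputSizeBytes]; // 101 bytes
        buf[0] = 0x5a;                                         // top-level prefix byte
        buf[1] = 0x03;                                         // chunk count, uint32 little-endian
        // buf[2..4] stay 0; chunk #k's 32-byte digest is later written at offset 5 + k * 32,
        // and the final digest is computed over this whole 101-byte array.
        System.out.println("buffer length = " + buf.length);
    }
}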
To see the digest lengths, here is getContentDigestAlgorithmOutputSizeBytes(digestAlgorithm):
private static int getContentDigestAlgorithmOutputSizeBytes(int digestAlgorithm) {
switch (digestAlgorithm) {
case CONTENT_DIGEST_CHUNKED_SHA256:
case CONTENT_DIGEST_VERITY_CHUNKED_SHA256:
return 256 / 8;
case CONTENT_DIGEST_CHUNKED_SHA512:
return 512 / 8;
default:
throw new IllegalArgumentException(
"Unknown content digest algorthm: " + digestAlgorithm);
}
}
So for SHA-256 the digest is 32 bytes, and for SHA-512 it is 64 bytes.
Here is the second part of the computeContentDigestsPer1MbChunk() overload:
byte[] chunkContentPrefix = new byte[5];
chunkContentPrefix[0] = (byte) 0xa5;
int chunkIndex = 0;
MessageDigest[] mds = new MessageDigest[digestAlgorithms.length];
for (int i = 0; i < digestAlgorithms.length; i++) {
String jcaAlgorithmName =
getContentDigestAlgorithmJcaDigestAlgorithm(digestAlgorithms[i]);
try {
mds[i] = MessageDigest.getInstance(jcaAlgorithmName);
} catch (NoSuchAlgorithmException e) {
throw new RuntimeException(jcaAlgorithmName + " digest not supported", e);
}
}
// TODO: Compute digests of chunks in parallel when beneficial. This requires some research
// into how to parallelize (if at all) based on the capabilities of the hardware on which
// this code is running and based on the size of input.
DataDigester digester = new MultipleDigestDataDigester(mds);
int dataSourceIndex = 0;
for (DataSource input : contents) {
long inputOffset = 0;
long inputRemaining = input.size();
while (inputRemaining > 0) {
int chunkSize = (int) Math.min(inputRemaining, CHUNK_SIZE_BYTES);
setUnsignedInt32LittleEndian(chunkSize, chunkContentPrefix, 1);
for (int i = 0; i < mds.length; i++) {
mds[i].update(chunkContentPrefix);
}
try {
input.feedIntoDataDigester(digester, inputOffset, chunkSize);
} catch (IOException e) {
throw new DigestException(
"Failed to digest chunk #" + chunkIndex + " of section #"
+ dataSourceIndex,
e);
}
for (int i = 0; i < digestAlgorithms.length; i++) {
int digestAlgorithm = digestAlgorithms[i];
byte[] concatenationOfChunkCountAndChunkDigests = digestsOfChunks[i];
int expectedDigestSizeBytes =
getContentDigestAlgorithmOutputSizeBytes(digestAlgorithm);
MessageDigest md = mds[i];
int actualDigestSizeBytes =
md.digest(
concatenationOfChunkCountAndChunkDigests,
5 + chunkIndex * expectedDigestSizeBytes,
expectedDigestSizeBytes);
if (actualDigestSizeBytes != expectedDigestSizeBytes) {
throw new RuntimeException(
"Unexpected output size of " + md.getAlgorithm() + " digest: "
+ actualDigestSizeBytes);
}
}
inputOffset += chunkSize;
inputRemaining -= chunkSize;
chunkIndex++;
}
dataSourceIndex++;
}
This part computes the actual chunk digests.
Because each chunk is digested with a fixed prefix of the byte 0xa5 followed by the 4-byte chunk length, a 5-byte prefix chunkContentPrefix is allocated and its first byte is set to 0xa5.
Then, for each algorithm, the corresponding JCA algorithm name is looked up and a MessageDigest instance is created from it.
The loop over the data sources then determines the chunk size chunkSize: inputRemaining is the amount of data left in the source, and the last chunk may be shorter than 1 MB, so Math.min(inputRemaining, CHUNK_SIZE_BYTES) is used. The chunk size is written into bytes 1-4 of chunkContentPrefix.
The chunk prefix is fed into each MessageDigest, and then feedIntoDataDigester(digester, inputOffset, chunkSize) feeds the chunk's contents into the digests.
Next, for each algorithm, MessageDigest.digest() writes the chunk's digest into digestsOfChunks[i] at the offset reserved for this chunk.
Finally the offsets are advanced and the next chunk is digested, until all three data sources have been processed.
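As a concrete illustration of the per-chunk prefix, this is what the 5 bytes fed into the digests look like for a full 1 MB chunk (a standalone example, not framework code):
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Worked example: the 5-byte prefix for a full 1 MB chunk.
class ChunkPrefixExample {
    public static void main(String[] args) {
        int chunkSize = 1 << 20;                      // 1,048,576 bytes
        byte[] prefix = ByteBuffer.allocate(5).order(ByteOrder.LITTLE_ENDIAN)
                .put((byte) 0xa5).putInt(chunkSize).array();
        // prefix == { a5, 00, 00, 10, 00 } -- 0xa5 followed by 0x00100000 in little-endian.
        for (byte b : prefix) System.out.printf("%02x ", b);
    }
}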
Now let's see how a data source feeds its data into the digest. This happens in input.feedIntoDataDigester(digester, inputOffset, chunkSize); as mentioned above, input is either a ReadFileDataSource or a MemoryMappedFileDataSource. Let's take MemoryMappedFileDataSource as the example:
@Override
public void feedIntoDataDigester(DataDigester md, long offset, int size)
throws IOException, DigestException {
// IMPLEMENTATION NOTE: After a lot of experimentation, the implementation of this
// method was settled on a straightforward mmap with prefaulting.
//
// This method is not using FileChannel.map API because that API does not offer a way
// to "prefault" the resulting memory pages. Without prefaulting, performance is about
// 10% slower on small to medium APKs, but is significantly worse for APKs in 500+ MB
// range. FileChannel.load (which currently uses madvise) doesn't help. Finally,
// invoking madvise (MADV_SEQUENTIAL) after mmap with prefaulting wastes quite a bit of
// time, which is not compensated for by faster reads.
// We mmap the smallest region of the file containing the requested data. mmap requires
// that the start offset in the file must be a multiple of memory page size. We thus may
// need to mmap from an offset less than the requested offset.
long filePosition = mFilePosition + offset;
long mmapFilePosition =
(filePosition / MEMORY_PAGE_SIZE_BYTES) * MEMORY_PAGE_SIZE_BYTES;
int dataStartOffsetInMmapRegion = (int) (filePosition - mmapFilePosition);
long mmapRegionSize = size + dataStartOffsetInMmapRegion;
long mmapPtr = 0;
try {
mmapPtr = Os.mmap(
0, // let the OS choose the start address of the region in memory
mmapRegionSize,
OsConstants.PROT_READ,
OsConstants.MAP_SHARED | OsConstants.MAP_POPULATE, // "prefault" all pages
mFd,
mmapFilePosition);
ByteBuffer buf = new DirectByteBuffer(
size,
mmapPtr + dataStartOffsetInMmapRegion,
mFd, // not really needed, but just in case
null, // no need to clean up -- it's taken care of by the finally block
true // read only buffer
);
md.consume(buf);
} catch (ErrnoException e) {
throw new IOException("Failed to mmap " + mmapRegionSize + " bytes", e);
} finally {
if (mmapPtr != 0) {
try {
Os.munmap(mmapPtr, mmapRegionSize);
} catch (ErrnoException ignored) { }
}
}
}
As you can see, Os.mmap() maps the file region into memory and returns the address mmapPtr, which is wrapped in a DirectByteBuffer, and md.consume(buf) feeds the data into the digest. Because mmap requires a page-aligned file offset, the mapping position is rounded down to a page boundary, and the distance between that boundary and the requested position is added back when the buffer is created.
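To make the alignment arithmetic concrete, here is a small worked example with made-up numbers that repeats the same computation:
// Worked example (hypothetical numbers): mapping 1 MB starting at file position 5,000
// with a 4,096-byte page size, as computed above.
class MmapAlignmentExample {
    public static void main(String[] args) {
        long MEMORY_PAGE_SIZE_BYTES = 4096;
        long filePosition = 5000;                     // mFilePosition + offset
        int size = 1 << 20;                           // requested chunk size

        long mmapFilePosition =
                (filePosition / MEMORY_PAGE_SIZE_BYTES) * MEMORY_PAGE_SIZE_BYTES;  // 4096
        int dataStartOffsetInMmapRegion = (int) (filePosition - mmapFilePosition); // 904
        long mmapRegionSize = size + dataStartOffsetInMmapRegion;                  // 1,049,480

        // The mapping starts at the page boundary 4096; the requested data begins 904 bytes
        // into the mapped region, which is where the DirectByteBuffer points.
        System.out.println(mmapFilePosition + " " + dataStartOffsetInMmapRegion + " "
                + mmapRegionSize);
    }
}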
Here md is actually a MultipleDigestDataDigester, which wraps several MessageDigest objects. Its consume() method looks like this:
private static class MultipleDigestDataDigester implements DataDigester {
private final MessageDigest[] mMds;
MultipleDigestDataDigester(MessageDigest[] mds) {
mMds = mds;
}
@Override
public void consume(ByteBuffer buffer) {
buffer = buffer.slice();
for (MessageDigest md : mMds) {
buffer.position(0);
md.update(buffer);
}
}
}
As shown, consume() simply updates every wrapped MessageDigest with the same data.
Finally, here is the last part of the computeContentDigestsPer1MbChunk() overload:
byte[][] result = new byte[digestAlgorithms.length][];
for (int i = 0; i < digestAlgorithms.length; i++) {
int digestAlgorithm = digestAlgorithms[i];
byte[] input = digestsOfChunks[i];
String jcaAlgorithmName = getContentDigestAlgorithmJcaDigestAlgorithm(digestAlgorithm);
MessageDigest md;
try {
md = MessageDigest.getInstance(jcaAlgorithmName);
} catch (NoSuchAlgorithmException e) {
throw new RuntimeException(jcaAlgorithmName + " digest not supported", e);
}
byte[] output = md.digest(input);
result[i] = output;
}
return result;
}
At this point digestsOfChunks holds, for each algorithm, the concatenated chunk digests of all data sources, and each of these buffers now has to be digested once more.
So a fresh MessageDigest is created per algorithm, digest(input) produces the top-level digest, and the results are stored in result in the same order as the algorithms and returned.
Verification for the verity-based algorithm is implemented in verifyIntegrityForVerityBasedAlgorithm():
private static void verifyIntegrityForVerityBasedAlgorithm(
byte[] expectedDigest,
RandomAccessFile apk,
SignatureInfo signatureInfo) throws SecurityException {
try {
byte[] expectedRootHash = parseVerityDigestAndVerifySourceLength(expectedDigest,
apk.length(), signatureInfo);
VerityBuilder.VerityResult verity = VerityBuilder.generateApkVerityTree(apk,
signatureInfo, new ByteBufferFactory() {
@Override
public ByteBuffer create(int capacity) {
return ByteBuffer.allocate(capacity);
}
});
if (!Arrays.equals(expectedRootHash, verity.rootHash)) {
throw new SecurityException("APK verity digest of contents did not verify");
}
} catch (DigestException | IOException | NoSuchAlgorithmException e) {
throw new SecurityException("Error during verification", e);
}
}
The expectedDigest parameter is the digest value obtained from the v2 block; parseVerityDigestAndVerifySourceLength() extracts the expected root hash from it and checks the recorded source length.
VerityBuilder.generateApkVerityTree() builds the Merkle tree and digests its root; the resulting digest is placed in the rootHash field of VerityBuilder.VerityResult.
If the two digests are not equal, verification fails.
/**
* Return the verity digest only if the length of digest content looks correct.
* When verity digest is generated, the last incomplete 4k chunk is padded with 0s before
* hashing. This means two almost identical APKs with different number of 0 at the end will have
* the same verity digest. To avoid this problem, the length of the source content (excluding
* Signing Block) is appended to the verity digest, and the digest is returned only if the
* length is consistent to the current APK.
*/
static byte[] parseVerityDigestAndVerifySourceLength(
byte[] data, long fileSize, SignatureInfo signatureInfo) throws SecurityException {
// FORMAT:
// OFFSET DATA TYPE DESCRIPTION
// * @+0 bytes uint8[32] Merkle tree root hash of SHA-256
// * @+32 bytes int64 Length of source data
int kRootHashSize = 32;
int kSourceLengthSize = 8;
if (data.length != kRootHashSize + kSourceLengthSize) {
throw new SecurityException("Verity digest size is wrong: " + data.length);
}
ByteBuffer buffer = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
buffer.position(kRootHashSize);
long expectedSourceLength = buffer.getLong();
long signingBlockSize = signatureInfo.centralDirOffset
- signatureInfo.apkSigningBlockOffset;
if (expectedSourceLength != fileSize - signingBlockSize) {
throw new SecurityException("APK content size did not verify");
}
return Arrays.copyOfRange(data, 0, kRootHashSize);
}
The value obtained from the v2 block is 40 bytes long: the first 32 bytes are the root hash and the last 8 bytes are the length of the content that was digested.
The digested content excludes the APK Signing Block, so the signing block's size has to be subtracted from the file size before the lengths are compared.
With that, the method above is straightforward: if the lengths do not match, a SecurityException is thrown; otherwise the first 32 bytes are returned.
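As an illustration of that 40-byte layout, this is how such a value could be assembled; the sizes are hypothetical and the root hash is left zeroed:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Worked example (hypothetical numbers): the 40-byte verity digest value carried by the
// v2 block, per the FORMAT comment above.
class VerityDigestLayoutExample {
    public static void main(String[] args) {
        byte[] rootHash = new byte[32];               // Merkle tree root hash (SHA-256)
        long apkLength = 10_000_000;                  // hypothetical APK size
        long signingBlockSize = 4_096;                // hypothetical signing block size
        long sourceLength = apkLength - signingBlockSize;  // 9,995,904 -- the checked length

        byte[] data = ByteBuffer.allocate(40).order(ByteOrder.LITTLE_ENDIAN)
                .put(rootHash)                        // bytes  0..31: root hash
                .putLong(sourceLength)                // bytes 32..39: length of source data
                .array();
        // parseVerityDigestAndVerifySourceLength() accepts this value only if
        // sourceLength == fileSize - signingBlockSize for the APK being verified.
        System.out.println(data.length);
    }
}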
The tree itself is built by VerityBuilder.generateApkVerityTree(), whose real work is done in generateVerityTreeInternal():
@NonNull
private static VerityResult generateVerityTreeInternal(@NonNull RandomAccessFile apk,
@NonNull ByteBufferFactory bufferFactory, @Nullable SignatureInfo signatureInfo)
throws IOException, SecurityException, NoSuchAlgorithmException, DigestException {
long signingBlockSize =
signatureInfo.centralDirOffset - signatureInfo.apkSigningBlockOffset;
long dataSize = apk.length() - signingBlockSize;
int[] levelOffset = calculateVerityLevelOffset(dataSize);
int merkleTreeSize = levelOffset[levelOffset.length - 1];
ByteBuffer output = bufferFactory.create(
merkleTreeSize
+ CHUNK_SIZE_BYTES); // maximum size of apk-verity metadata
output.order(ByteOrder.LITTLE_ENDIAN);
ByteBuffer tree = slice(output, 0, merkleTreeSize);
byte[] apkRootHash = generateVerityTreeInternal(apk, signatureInfo, DEFAULT_SALT,
levelOffset, tree);
return new VerityResult(output, merkleTreeSize, apkRootHash);
}
First the length dataSize of the content that feeds the Merkle tree is computed; it is the APK length minus the size of the signing block.
Then calculateVerityLevelOffset(dataSize) computes the offset of each tree level within the tree buffer, i.e. the cumulative size of all levels above it; the last element of the array is therefore merkleTreeSize, the size of the whole tree.
The Merkle tree is represented by the ByteBuffer output. The generateVerityTreeInternal() overload is then called to fill in the node data in output and to compute the digest of the root.
Finally the tree data, the tree size, and the root digest are wrapped in a VerityResult and returned.
private static int[] calculateVerityLevelOffset(long fileSize) {
ArrayList<Long> levelSize = new ArrayList<>();
while (true) {
long levelDigestSize = divideRoundup(fileSize, CHUNK_SIZE_BYTES) * DIGEST_SIZE_BYTES;
long chunksSize = CHUNK_SIZE_BYTES * divideRoundup(levelDigestSize, CHUNK_SIZE_BYTES);
levelSize.add(chunksSize);
if (levelDigestSize <= CHUNK_SIZE_BYTES) {
break;
}
fileSize = levelDigestSize;
}
// Reverse and convert to summed area table.
int[] levelOffset = new int[levelSize.size() + 1];
levelOffset[0] = 0;
for (int i = 0; i < levelSize.size(); i++) {
// We don't support verity tree if it is larger then Integer.MAX_VALUE.
levelOffset[i + 1] = levelOffset[i]
+ Math.toIntExact(levelSize.get(levelSize.size() - i - 1));
}
return levelOffset;
}
To understand this code, you need to know how the Merkle tree is constructed.
The APK, with the signing block removed, is split into 4096-byte blocks, the last block being zero-padded if it is short. Each block is hashed to a 32-byte digest; these digests form the lowest (leaf) level of the tree. What goes into levelSize is the total size of these digests rounded up to a whole number of 4096-byte blocks.
The level above is built by splitting the previous level's digest data into 4096-byte blocks and hashing each block to a 32-byte digest; the space those digests occupy, again rounded up to whole 4096-byte blocks, is appended to levelSize.
This continues until the digest data fits into a single 4096-byte block, at which point the loop ends, so the last entry added to levelSize is exactly one block, 4096 bytes.
The levelOffset array then stores, for each level, the combined size of all levels above it (it is built from levelSize in reverse order), so its last element is the size of all nodes, i.e. the size of the Merkle tree.
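To make this concrete, here is a worked example that repeats the same round-up arithmetic for a hypothetical 1 GiB of source data (divideRoundup is re-implemented locally for the sketch):
import java.util.ArrayList;

// Worked example: level sizes and offsets for dataSize = 2^30 bytes (1 GiB).
class VerityLevelOffsetExample {
    static long divideRoundup(long dividend, long divisor) {
        return (dividend + divisor - 1) / divisor;
    }

    public static void main(String[] args) {
        final long CHUNK = 4096, DIGEST = 32;
        long fileSize = 1L << 30;                         // 1 GiB of source data
        ArrayList<Long> levelSize = new ArrayList<>();
        while (true) {
            long levelDigestSize = divideRoundup(fileSize, CHUNK) * DIGEST;
            levelSize.add(CHUNK * divideRoundup(levelDigestSize, CHUNK));
            if (levelDigestSize <= CHUNK) break;
            fileSize = levelDigestSize;
        }
        // levelSize   == [8,388,608, 65,536, 4,096]  (leaf level first)
        // levelOffset == [0, 4,096, 69,632, 8,458,240], so the whole tree is about 8.1 MB:
        //   [0, 4096)          top level (digested once more to produce the root hash)
        //   [4096, 69632)      middle level
        //   [69632, 8458240)   leaf level, filled by generateApkVerityDigestAtLeafLevel()
        System.out.println(levelSize);
    }
}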
Building the tree itself is done by the generateVerityTreeInternal() overload below:
@NonNull
private static byte[] generateVerityTreeInternal(@NonNull RandomAccessFile apk,
@Nullable SignatureInfo signatureInfo, @Nullable byte[] salt,
@NonNull int[] levelOffset, @NonNull ByteBuffer output)
throws IOException, NoSuchAlgorithmException, DigestException {
// 1. Digest the apk to generate the leaf level hashes.
assertSigningBlockAlignedAndHasFullPages(signatureInfo);
generateApkVerityDigestAtLeafLevel(apk, signatureInfo, salt, slice(output,
levelOffset[levelOffset.length - 2], levelOffset[levelOffset.length - 1]));
// 2. Digest the lower level hashes bottom up.
for (int level = levelOffset.length - 3; level >= 0; level--) {
ByteBuffer inputBuffer = slice(output, levelOffset[level + 1], levelOffset[level + 2]);
ByteBuffer outputBuffer = slice(output, levelOffset[level], levelOffset[level + 1]);
DataSource source = new ByteBufferDataSource(inputBuffer);
BufferedDigester digester = new BufferedDigester(salt, outputBuffer);
consumeByChunk(digester, source, CHUNK_SIZE_BYTES);
digester.assertEmptyBuffer();
digester.fillUpLastOutputChunk();
}
// 3. Digest the first block (i.e. first level) to generate the root hash.
byte[] rootHash = new byte[DIGEST_SIZE_BYTES];
BufferedDigester digester = new BufferedDigester(salt, ByteBuffer.wrap(rootHash));
digester.consume(slice(output, 0, CHUNK_SIZE_BYTES));
digester.assertEmptyBuffer();
return rootHash;
}
First, generateApkVerityDigestAtLeafLevel() generates the leaf nodes. The output buffer holds all the nodes of the tree, and the leaf level occupies the region from levelOffset[levelOffset.length - 2] to levelOffset[levelOffset.length - 1]; the meaning of levelOffset was explained above.
The loop then works bottom-up: the digest data of the level just produced is split into 4096-byte blocks, and digesting those blocks yields the nodes of the level above, until every level in output has been filled in and the Merkle tree is complete.
Finally the first block, i.e. the top level, is digested to produce the root hash, which is returned.
The leaf level is produced by generateApkVerityDigestAtLeafLevel(). The method begins by creating a BufferedDigester over the leaf region of output and digesting everything from the start of the file up to the APK Signing Block (step 1, not shown in the snippet); it then continues as follows:
// 2. Skip APK Signing Block and continue digesting, until the Central Directory offset
// field in EoCD is reached.
long eocdCdOffsetFieldPosition =
signatureInfo.eocdOffset + ZIP_EOCD_CENTRAL_DIR_OFFSET_FIELD_OFFSET;
consumeByChunk(digester,
DataSource.create(apk.getFD(), signatureInfo.centralDirOffset,
eocdCdOffsetFieldPosition - signatureInfo.centralDirOffset),
MMAP_REGION_SIZE_BYTES);
// 3. Consume offset of Signing Block as an alternative EoCD.
ByteBuffer alternativeCentralDirOffset = ByteBuffer.allocate(
ZIP_EOCD_CENTRAL_DIR_OFFSET_FIELD_SIZE).order(ByteOrder.LITTLE_ENDIAN);
alternativeCentralDirOffset.putInt(Math.toIntExact(signatureInfo.apkSigningBlockOffset));
alternativeCentralDirOffset.flip();
digester.consume(alternativeCentralDirOffset);
// 4. Read from end of the Central Directory offset field in EoCD to the end of the file.
long offsetAfterEocdCdOffsetField =
eocdCdOffsetFieldPosition + ZIP_EOCD_CENTRAL_DIR_OFFSET_FIELD_SIZE;
consumeByChunk(digester,
DataSource.create(apk.getFD(), offsetAfterEocdCdOffsetField,
apk.length() - offsetAfterEocdCdOffsetField),
MMAP_REGION_SIZE_BYTES);
// 5. Pad 0s up to the nearest 4096-byte block before hashing.
int lastIncompleteChunkSize = (int) (apk.length() % CHUNK_SIZE_BYTES);
if (lastIncompleteChunkSize != 0) {
digester.consume(ByteBuffer.allocate(CHUNK_SIZE_BYTES - lastIncompleteChunkSize));
}
digester.assertEmptyBuffer();
// 6. Fill up the rest of buffer with 0s.
digester.fillUpLastOutputChunk();
}
As mentioned, the method starts by creating a BufferedDigester, which is initialized with a salt and a ByteBuffer. It is a buffering digest helper: whenever it has accumulated a full 4096-byte block of input, it produces one digest and writes it at the current position of the ByteBuffer, effectively emitting one tree node. As shown above, the buffer passed in here starts at levelOffset[levelOffset.length - 2], the beginning of the leaf level.
Because the APK Signing Block does not take part in the digest, it has to be skipped; that is why the second consumeByChunk() call uses a data source that starts at signatureInfo.centralDirOffset, the beginning of the Central Directory.
Within the End of Central Directory, note that the field at offset ZIP_EOCD_CENTRAL_DIR_OFFSET_FIELD_OFFSET holds the Central Directory offset, and that stored value reflects the inserted signing block; the digest, however, must be computed over the offset as it was before the signing block was inserted, and that substitution is exactly what step 3 in the code performs.
After consumeByChunk() has processed the ZIP entry data, the Central Directory, and the End of Central Directory, the last block is zero-padded up to 4096 bytes if it is not full, using apk.length() % CHUNK_SIZE_BYTES to find its size. At first glance it looks as if the signing block size should be subtracted from apk.length() here, but it makes no difference: generateVerityTreeInternal() has already called assertSigningBlockAlignedAndHasFullPages(), which, as the name suggests, requires the signing block to start on a 4096-byte boundary and to span whole 4096-byte pages, so removing it does not change the length modulo 4096.
Finally, digester.fillUpLastOutputChunk() zero-fills whatever remains of the leaf level's last 4096-byte output block.
All of the node data is written into output through consumeByChunk(), so it is worth reading its code:
private static void consumeByChunk(DataDigester digester, DataSource source, int chunkSize)
throws IOException, DigestException {
long inputRemaining = source.size();
long inputOffset = 0;
while (inputRemaining > 0) {
int size = (int) Math.min(inputRemaining, chunkSize);
source.feedIntoDataDigester(digester, inputOffset, size);
inputOffset += size;
inputRemaining -= size;
}
}
The code is simple: it walks through the data source in chunks of the given size and calls the source's feedIntoDataDigester() method, which in turn calls the digester's consume() method. Here the digester is a BufferedDigester, so let's look at its implementation:
@Override
public void consume(ByteBuffer buffer) throws DigestException {
int offset = buffer.position();
int remaining = buffer.remaining();
while (remaining > 0) {
int allowance = (int) Math.min(remaining, BUFFER_SIZE - mBytesDigestedSinceReset);
// Optimization: set the buffer limit to avoid allocating a new ByteBuffer object.
buffer.limit(buffer.position() + allowance);
mMd.update(buffer);
offset += allowance;
remaining -= allowance;
mBytesDigestedSinceReset += allowance;
if (mBytesDigestedSinceReset == BUFFER_SIZE) {
mMd.digest(mDigestBuffer, 0, mDigestBuffer.length);
mOutput.put(mDigestBuffer);
// After digest, MessageDigest resets automatically, so no need to reset again.
if (mSalt != null) {
mMd.update(mSalt);
}
mBytesDigestedSinceReset = 0;
}
}
}
BUFFER_SIZE is 4096, and mBytesDigestedSinceReset tracks how many bytes have been fed into mMd since the last digest. No digest is produced until BUFFER_SIZE bytes have accumulated; at that point mMd emits a digest, it is appended to mOutput, mBytesDigestedSinceReset goes back to 0, and the process repeats for the next 4096 bytes of input. mOutput here is the ByteBuffer that stores the Merkle tree nodes, so every full 4096-byte input block contributes one 32-byte node (128 input blocks fill one 4096-byte output block).
Once generateApkVerityDigestAtLeafLevel() has finished, all of the leaf nodes have been filled in.
We now know how APK integrity is verified.
One path covers CONTENT_DIGEST_CHUNKED_SHA256 (JCA "SHA-256") and CONTENT_DIGEST_CHUNKED_SHA512 (JCA "SHA-512"): the data to be verified is split into 1 MB chunks; each chunk is digested over the concatenation of the byte 0xa5, the chunk length as a little-endian uint32, and the chunk contents; and the top-level digest is computed over the concatenation of the byte 0x5a, the chunk count as a little-endian uint32, and the chunk digests in the order the chunks appear in the APK. That digest is compared with the one from the v2 block, and verification passes if they match.
The other path covers CONTENT_DIGEST_VERITY_CHUNKED_SHA256 (JCA "SHA-256"): a Merkle tree with 4096-byte blocks is built, the digest of its root is computed, and it is compared with the digest from the v2 block; verification passes if they match.