Android APK File Integrity Verification

The official description of APK file integrity

Integrity-protected contents
  To protect the APK contents, an APK consists of the following four sections:

  1. The contents of the ZIP entries (from offset 0 up to the start of the APK Signing Block)
  2. The APK Signing Block
  3. The ZIP Central Directory
  4. The ZIP End of Central Directory

APK sections after signing
  APK Signature Scheme v2 protects the integrity of sections 1, 3, and 4, as well as the signed data blocks of the APK Signature Scheme v2 Block contained in section 2.

  The integrity of sections 1, 3, and 4 is protected by one or more digests of their contents. Those digests are stored in signed data blocks, which in turn are protected by one or more signatures.

  The digest of sections 1, 3, and 4 is computed as follows, similar to a two-level Merkle tree. Each section is split into consecutive 1 MB (2^20 bytes) chunks; the last chunk of each section may be shorter. The digest of each chunk is computed over the concatenation of the byte 0xa5, the chunk's length in bytes (uint32, little-endian), and the chunk's contents. The top-level digest is computed over the concatenation of the byte 0x5a, the number of chunks (uint32, little-endian), and the digests of the chunks in the order they appear in the APK. The digest is computed in a chunked fashion so that the computation can be sped up through parallel processing.

APK digest

  Protecting section 4 (the ZIP End of Central Directory) is more complicated because it contains the offset of the ZIP Central Directory. That offset changes whenever the size of the APK Signing Block changes, for example when a new signature is added. Therefore, when computing the digest over the ZIP End of Central Directory, the field containing the offset of the ZIP Central Directory must be treated as containing the offset of the APK Signing Block instead.
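  Before diving into the platform code, here is a minimal, self-contained sketch of the scheme just described, assuming the three protected sections are already available as byte arrays and that the EoCD's Central Directory offset field has already been rewritten as described above (the class and method names are made up for illustration):

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.ArrayList;
    import java.util.List;

    // Minimal sketch of the v2 chunked digest, assuming in-memory sections.
    final class ChunkedDigestSketch {
        private static final int CHUNK_SIZE = 1024 * 1024; // 1 MB (2^20 bytes)

        static byte[] computeChunkedSha256(byte[][] sections) throws NoSuchAlgorithmException {
            List<byte[]> chunkDigests = new ArrayList<>();
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] section : sections) {
                for (int off = 0; off < section.length; off += CHUNK_SIZE) {
                    int len = Math.min(CHUNK_SIZE, section.length - off);
                    md.update((byte) 0xa5);              // per-chunk prefix
                    md.update(uint32Le(len));            // chunk length, uint32 little-endian
                    md.update(section, off, len);        // chunk contents
                    chunkDigests.add(md.digest());       // digest() also resets md
                }
            }
            md.update((byte) 0x5a);                      // top-level prefix
            md.update(uint32Le(chunkDigests.size()));    // number of chunks, uint32 little-endian
            for (byte[] d : chunkDigests) {
                md.update(d);                            // chunk digests, in APK order
            }
            return md.digest();
        }

        private static byte[] uint32Le(int value) {
            return new byte[] {
                    (byte) value, (byte) (value >>> 8),
                    (byte) (value >>> 16), (byte) (value >>> 24)
            };
        }
    }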

Implementation in code

  The implementation lives in verifyIntegrity(contentDigests, apk, signatureInfo) in the ApkSigningBlockUtils class:

    static void verifyIntegrity(
            Map<Integer, byte[]> expectedDigests,
            RandomAccessFile apk,
            SignatureInfo signatureInfo) throws SecurityException {
        if (expectedDigests.isEmpty()) {
            throw new SecurityException("No digests provided");
        }

        boolean neverVerified = true;

        Map<Integer, byte[]> expected1MbChunkDigests = new ArrayMap<>();
        if (expectedDigests.containsKey(CONTENT_DIGEST_CHUNKED_SHA256)) {
            expected1MbChunkDigests.put(CONTENT_DIGEST_CHUNKED_SHA256,
                    expectedDigests.get(CONTENT_DIGEST_CHUNKED_SHA256));
        }
        if (expectedDigests.containsKey(CONTENT_DIGEST_CHUNKED_SHA512)) {
            expected1MbChunkDigests.put(CONTENT_DIGEST_CHUNKED_SHA512,
                    expectedDigests.get(CONTENT_DIGEST_CHUNKED_SHA512));
        }
        if (!expected1MbChunkDigests.isEmpty()) {
            try {
                verifyIntegrityFor1MbChunkBasedAlgorithm(expected1MbChunkDigests, apk.getFD(),
                        signatureInfo);
                neverVerified = false;
            } catch (IOException e) {
                throw new SecurityException("Cannot get FD", e);
            }
        }

        if (expectedDigests.containsKey(CONTENT_DIGEST_VERITY_CHUNKED_SHA256)) {
            verifyIntegrityForVerityBasedAlgorithm(
                    expectedDigests.get(CONTENT_DIGEST_VERITY_CHUNKED_SHA256), apk, signatureInfo);
            neverVerified = false;
        }

        if (neverVerified) {
            throw new SecurityException("No known digest exists for integrity check");
        }
    }

  The expectedDigests parameter maps each content-digest algorithm to the digest value taken from the signer sequences of the v2 block, using the strongest supported signature algorithm of each signer. It is returned by the signature-verification step; see the earlier article Android APK文件的签名V2查找、验证 for how it is obtained.
  If expectedDigests contains CONTENT_DIGEST_CHUNKED_SHA256, CONTENT_DIGEST_CHUNKED_SHA512, or both, those algorithms and their digest values are copied into expected1MbChunkDigests, which is then used to verify the file's integrity.
  If expectedDigests also contains CONTENT_DIGEST_VERITY_CHUNKED_SHA256, a Merkle tree is built and its root is digested; the result is compared against the digest taken from the v2 block. If they match, the file is considered intact; otherwise a SecurityException is thrown.

Verifying file integrity over 1 MB chunks

  This is implemented in verifyIntegrityFor1MbChunkBasedAlgorithm():

    private static void verifyIntegrityFor1MbChunkBasedAlgorithm(
            Map<Integer, byte[]> expectedDigests,
            FileDescriptor apkFileDescriptor,
            SignatureInfo signatureInfo) throws SecurityException {
        int[] digestAlgorithms = new int[expectedDigests.size()];
        int digestAlgorithmCount = 0;
        for (int digestAlgorithm : expectedDigests.keySet()) {
            digestAlgorithms[digestAlgorithmCount] = digestAlgorithm;
            digestAlgorithmCount++;
        }
        byte[][] actualDigests;
        try {
            actualDigests = computeContentDigestsPer1MbChunk(digestAlgorithms, apkFileDescriptor,
                    signatureInfo);
        } catch (DigestException e) {
            throw new SecurityException("Failed to compute digest(s) of contents", e);
        }
        for (int i = 0; i < digestAlgorithms.length; i++) {
            int digestAlgorithm = digestAlgorithms[i];
            byte[] expectedDigest = expectedDigests.get(digestAlgorithm);
            byte[] actualDigest = actualDigests[i];
            if (!MessageDigest.isEqual(expectedDigest, actualDigest)) {
                throw new SecurityException(
                        getContentDigestAlgorithmJcaDigestAlgorithm(digestAlgorithm)
                                + " digest of contents did not verify");
            }
        }
    }

  As you can see, the digestAlgorithms array holds the digest algorithms, and actualDigests holds the digest computed for each of them by computeContentDigestsPer1MbChunk(). Each computed digest is then compared with the one obtained earlier from the v2 block; if any pair differs, a SecurityException is thrown.
  So the key here is understanding computeContentDigestsPer1MbChunk(), which implements exactly the algorithm described in the official documentation above.

computeContentDigestsPer1MbChunk()

Let's take a look at computeContentDigestsPer1MbChunk():

    public static byte[][] computeContentDigestsPer1MbChunk(int[] digestAlgorithms,
            FileDescriptor apkFileDescriptor, SignatureInfo signatureInfo) throws DigestException {
        // We need to verify the integrity of the following three sections of the file:
        // 1. Everything up to the start of the APK Signing Block.
        // 2. ZIP Central Directory.
        // 3. ZIP End of Central Directory (EoCD).
        // Each of these sections is represented as a separate DataSource instance below.

        // To handle large APKs, these sections are read in 1 MB chunks using memory-mapped I/O to
        // avoid wasting physical memory. In most APK verification scenarios, the contents of the
        // APK are already there in the OS's page cache and thus mmap does not use additional
        // physical memory.

        DataSource beforeApkSigningBlock =
                DataSource.create(apkFileDescriptor, 0, signatureInfo.apkSigningBlockOffset);
        DataSource centralDir =
                DataSource.create(
                        apkFileDescriptor, signatureInfo.centralDirOffset,
                        signatureInfo.eocdOffset - signatureInfo.centralDirOffset);

        // For the purposes of integrity verification, ZIP End of Central Directory's field Start of
        // Central Directory must be considered to point to the offset of the APK Signing Block.
        ByteBuffer eocdBuf = signatureInfo.eocd.duplicate();
        eocdBuf.order(ByteOrder.LITTLE_ENDIAN);
        ZipUtils.setZipEocdCentralDirectoryOffset(eocdBuf, signatureInfo.apkSigningBlockOffset);
        DataSource eocd = new ByteBufferDataSource(eocdBuf);

        return computeContentDigestsPer1MbChunk(digestAlgorithms,
                new DataSource[]{beforeApkSigningBlock, centralDir, eocd});
    }

  Three sections take part in the verification: the ZIP entry contents before the signing block, the Central Directory, and the End of Central Directory.
  The signatureInfo parameter holds the offsets of these sections, so three data sources are created: beforeApkSigningBlock, centralDir, and eocd. Note that the Central Directory offset stored in the End of Central Directory currently reflects the inserted signing block, so it is restored to the value it would have without the signing block, i.e. the offset of the APK Signing Block. That is what ZipUtils.setZipEocdCentralDirectoryOffset(eocdBuf, signatureInfo.apkSigningBlockOffset) does.
  Each of the three sections is wrapped in a DataSource. DataSource exposes a feedIntoDataDigester(DataDigester md, long offset, int size) method: the first parameter is the DataDigester that produces the digest, and because each section is processed in 1 MB chunks, the other two parameters select which part of the DataSource's data is fed into it.
  The concrete DataSource is either a ReadFileDataSource or a MemoryMappedFileDataSource, depending on whether the file resides on the incremental file system. MemoryMappedFileDataSource reads data through memory mapping, while ReadFileDataSource reads from a file offset with the pread system call; the latter is slower than memory mapping but safer.
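  To make the DataSource contract concrete, here is a simplified sketch of a DataSource-like class backed by a plain FileChannel read instead of mmap or pread. The class name SimpleFileDataSource is made up for illustration; it only shows how feedIntoDataDigester() hands a window of a file section to a DataDigester.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.security.DigestException;

    // Simplified sketch, not the platform implementation: reads the requested window
    // of this section with FileChannel and feeds it to the DataDigester.
    final class SimpleFileDataSource {
        private final FileChannel mChannel;
        private final long mFilePosition;   // start of this section within the file
        private final long mSize;           // size of this section

        SimpleFileDataSource(FileChannel channel, long position, long size) {
            mChannel = channel;
            mFilePosition = position;
            mSize = size;
        }

        long size() {
            return mSize;
        }

        void feedIntoDataDigester(DataDigester digester, long offset, int size)
                throws IOException, DigestException {
            ByteBuffer buf = ByteBuffer.allocate(size);
            long start = mFilePosition + offset;
            while (buf.hasRemaining()) {
                int read = mChannel.read(buf, start + buf.position());
                if (read < 0) {
                    throw new IOException("Unexpected end of file");
                }
            }
            buf.flip();
            digester.consume(buf);           // the digester updates its MessageDigest(s)
        }
    }
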
  Now let's walk through the overload of computeContentDigestsPer1MbChunk(). The code is a bit long, so we'll take it in pieces, starting with the first part:

    private static byte[][] computeContentDigestsPer1MbChunk(
            int[] digestAlgorithms,
            DataSource[] contents) throws DigestException {
        // For each digest algorithm the result is computed as follows:
        // 1. Each segment of contents is split into consecutive chunks of 1 MB in size.
        //    The final chunk will be shorter iff the length of segment is not a multiple of 1 MB.
        //    No chunks are produced for empty (zero length) segments.
        // 2. The digest of each chunk is computed over the concatenation of byte 0xa5, the chunk's
        //    length in bytes (uint32 little-endian) and the chunk's contents.
        // 3. The output digest is computed over the concatenation of the byte 0x5a, the number of
        //    chunks (uint32 little-endian) and the concatenation of digests of chunks of all
        //    segments in-order.

        long totalChunkCountLong = 0;
        for (DataSource input : contents) {
            totalChunkCountLong += getChunkCount(input.size());
        }
        if (totalChunkCountLong >= Integer.MAX_VALUE / 1024) {
            throw new DigestException("Too many chunks: " + totalChunkCountLong);
        }
        int totalChunkCount = (int) totalChunkCountLong;

        byte[][] digestsOfChunks = new byte[digestAlgorithms.length][];
        for (int i = 0; i < digestAlgorithms.length; i++) {
            int digestAlgorithm = digestAlgorithms[i];
            int digestOutputSizeBytes = getContentDigestAlgorithmOutputSizeBytes(digestAlgorithm);
            byte[] concatenationOfChunkCountAndChunkDigests =
                    new byte[5 + totalChunkCount * digestOutputSizeBytes];
            concatenationOfChunkCountAndChunkDigests[0] = 0x5a;
            setUnsignedInt32LittleEndian(
                    totalChunkCount,
                    concatenationOfChunkCountAndChunkDigests,
                    1);
            digestsOfChunks[i] = concatenationOfChunkCountAndChunkDigests;
        }

  getChunkCount(input.size()) returns the number of 1 MB chunks in a data source. As noted above, there are three data sources, so totalChunkCountLong is the total number of 1 MB chunks across all of them; a trailing chunk shorter than 1 MB still counts as one chunk.
  Next, the first-level digest buffer is prepared for each algorithm.
  Inside the loop, getContentDigestAlgorithmOutputSizeBytes(digestAlgorithm) returns the digest length for the algorithm. Multiplying it by the total chunk count gives the space needed for all concatenated chunk digests; because the concatenation starts with the byte 0x5a followed by a 4-byte chunk count, 5 extra bytes are added, and those first five bytes are filled in immediately. The buffer for each algorithm is stored in the digestsOfChunks array.
  To see where the digest lengths come from, look at getContentDigestAlgorithmOutputSizeBytes(digestAlgorithm):

    private static int getContentDigestAlgorithmOutputSizeBytes(int digestAlgorithm) {
        switch (digestAlgorithm) {
            case CONTENT_DIGEST_CHUNKED_SHA256:
            case CONTENT_DIGEST_VERITY_CHUNKED_SHA256:
                return 256 / 8;
            case CONTENT_DIGEST_CHUNKED_SHA512:
                return 512 / 8;
            default:
                throw new IllegalArgumentException(
                        "Unknown content digest algorthm: " + digestAlgorithm);
        }
    }

  So for SHA-256 the digest is 32 bytes long, and for SHA-512 it is 64 bytes.
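  As a quick worked example of the buffer layout (the numbers are hypothetical): suppose the three sections split into 30 one-megabyte chunks in total and the algorithm is CONTENT_DIGEST_CHUNKED_SHA256.

    // Hypothetical numbers, only to illustrate the layout of
    // concatenationOfChunkCountAndChunkDigests for CONTENT_DIGEST_CHUNKED_SHA256.
    int totalChunkCount = 30;                // e.g. ~29 MB of entries + CD + EoCD
    int digestOutputSizeBytes = 256 / 8;     // 32 bytes per SHA-256 chunk digest
    int bufferSize = 5 + totalChunkCount * digestOutputSizeBytes;  // 5 + 30 * 32 = 965 bytes
    // buffer[0]      = 0x5a                 (top-level prefix)
    // buffer[1..4]   = 30 as uint32 LE      (number of chunks)
    // buffer[5..36]  = digest of chunk #0
    // buffer[37..68] = digest of chunk #1
    // ...            chunk #k starts at offset 5 + k * 32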

  Now for the second part of the computeContentDigestsPer1MbChunk() overload:

        byte[] chunkContentPrefix = new byte[5];
        chunkContentPrefix[0] = (byte) 0xa5;
        int chunkIndex = 0;
        MessageDigest[] mds = new MessageDigest[digestAlgorithms.length];
        for (int i = 0; i < digestAlgorithms.length; i++) {
            String jcaAlgorithmName =
                    getContentDigestAlgorithmJcaDigestAlgorithm(digestAlgorithms[i]);
            try {
                mds[i] = MessageDigest.getInstance(jcaAlgorithmName);
            } catch (NoSuchAlgorithmException e) {
                throw new RuntimeException(jcaAlgorithmName + " digest not supported", e);
            }
        }
        // TODO: Compute digests of chunks in parallel when beneficial. This requires some research
        // into how to parallelize (if at all) based on the capabilities of the hardware on which
        // this code is running and based on the size of input.
        DataDigester digester = new MultipleDigestDataDigester(mds);
        int dataSourceIndex = 0;
        for (DataSource input : contents) {
            long inputOffset = 0;
            long inputRemaining = input.size();
            while (inputRemaining > 0) {
                int chunkSize = (int) Math.min(inputRemaining, CHUNK_SIZE_BYTES);
                setUnsignedInt32LittleEndian(chunkSize, chunkContentPrefix, 1);
                for (int i = 0; i < mds.length; i++) {
                    mds[i].update(chunkContentPrefix);
                }
                try {
                    input.feedIntoDataDigester(digester, inputOffset, chunkSize);
                } catch (IOException e) {
                    throw new DigestException(
                            "Failed to digest chunk #" + chunkIndex + " of section #"
                                    + dataSourceIndex,
                            e);
                }
                for (int i = 0; i < digestAlgorithms.length; i++) {
                    int digestAlgorithm = digestAlgorithms[i];
                    byte[] concatenationOfChunkCountAndChunkDigests = digestsOfChunks[i];
                    int expectedDigestSizeBytes =
                            getContentDigestAlgorithmOutputSizeBytes(digestAlgorithm);
                    MessageDigest md = mds[i];
                    int actualDigestSizeBytes =
                            md.digest(
                                    concatenationOfChunkCountAndChunkDigests,
                                    5 + chunkIndex * expectedDigestSizeBytes,
                                    expectedDigestSizeBytes);
                    if (actualDigestSizeBytes != expectedDigestSizeBytes) {
                        throw new RuntimeException(
                                "Unexpected output size of " + md.getAlgorithm() + " digest: "
                                        + actualDigestSizeBytes);
                    }
                }
                inputOffset += chunkSize;
                inputRemaining -= chunkSize;
                chunkIndex++;
            }
            dataSourceIndex++;
        }

  This is where the digests are actually computed.
  When a chunk is digested, the byte 0xa5 and the chunk's 4-byte length must be prepended, so a 5-byte prefix chunkContentPrefix is created with its first byte set to 0xa5.
  A loop then resolves each algorithm to its JCA algorithm name and creates the corresponding MessageDigest instance.
  The code then iterates over the data sources. For each chunk it determines chunkSize: since the remaining data inputRemaining may be less than 1 MB, Math.min(inputRemaining, CHUNK_SIZE_BYTES) takes the smaller of the two. The chunk size is then written into bytes 1-4 of chunkContentPrefix.
  The prefix is fed into every MessageDigest, and then feedIntoDataDigester(digester, inputOffset, chunkSize) feeds the chunk's contents into them.
  Next, for each algorithm, MessageDigest.digest() writes the chunk's digest into digestsOfChunks[i] at the offset reserved for the current chunk.
  The loop variables are then advanced and the next chunk is processed, until all three data sources have been consumed.
  Let's look at how a data source feeds its data into the digests. This is done by input.feedIntoDataDigester(digester, inputOffset, chunkSize); as mentioned earlier, input may be a ReadFileDataSource or a MemoryMappedFileDataSource. Taking MemoryMappedFileDataSource as the example, here is its code:

    @Override
    public void feedIntoDataDigester(DataDigester md, long offset, int size)
            throws IOException, DigestException {
        // IMPLEMENTATION NOTE: After a lot of experimentation, the implementation of this
        // method was settled on a straightforward mmap with prefaulting.
        //
        // This method is not using FileChannel.map API because that API does not offset a way
        // to "prefault" the resulting memory pages. Without prefaulting, performance is about
        // 10% slower on small to medium APKs, but is significantly worse for APKs in 500+ MB
        // range. FileChannel.load (which currently uses madvise) doesn't help. Finally,
        // invoking madvise (MADV_SEQUENTIAL) after mmap with prefaulting wastes quite a bit of
        // time, which is not compensated for by faster reads.

        // We mmap the smallest region of the file containing the requested data. mmap requires
        // that the start offset in the file must be a multiple of memory page size. We thus may
        // need to mmap from an offset less than the requested offset.
        long filePosition = mFilePosition + offset;
        long mmapFilePosition =
                (filePosition / MEMORY_PAGE_SIZE_BYTES) * MEMORY_PAGE_SIZE_BYTES;
        int dataStartOffsetInMmapRegion = (int) (filePosition - mmapFilePosition);
        long mmapRegionSize = size + dataStartOffsetInMmapRegion;
        long mmapPtr = 0;
        try {
            mmapPtr = Os.mmap(
                    0, // let the OS choose the start address of the region in memory
                    mmapRegionSize,
                    OsConstants.PROT_READ,
                    OsConstants.MAP_SHARED | OsConstants.MAP_POPULATE, // "prefault" all pages
                    mFd,
                    mmapFilePosition);
            ByteBuffer buf = new DirectByteBuffer(
                    size,
                    mmapPtr + dataStartOffsetInMmapRegion,
                    mFd,  // not really needed, but just in case
                    null, // no need to clean up -- it's taken care of by the finally block
                    true  // read only buffer
                    );
            md.consume(buf);
        } catch (ErrnoException e) {
            throw new IOException("Failed to mmap " + mmapRegionSize + " bytes", e);
        } finally {
            if (mmapPtr != 0) {
                try {
                    Os.munmap(mmapPtr, mmapRegionSize);
                } catch (ErrnoException ignored) { }
            }
        }
    }

  As you can see, Os.mmap() maps the file region into memory and returns the address mmapPtr, which is then wrapped in a DirectByteBuffer; md.consume(buf) reads the data into the digests. Because mmap requires the file offset to be page-aligned, the code first computes a page-aligned mapping position and the offset of the actual data within the mapped region, and that offset is added back when the buffer is created.
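  The page-alignment arithmetic is easier to follow with concrete numbers; the values below are hypothetical and assume a 4096-byte page size:

    // Hypothetical example of the alignment math in feedIntoDataDigester(),
    // assuming MEMORY_PAGE_SIZE_BYTES == 4096.
    long MEMORY_PAGE_SIZE_BYTES = 4096;
    long mFilePosition = 10_000;   // where this DataSource's section starts in the file
    long offset = 3_000;           // offset of the requested chunk within the section
    int size = 1024 * 1024;        // a full 1 MB chunk

    long filePosition = mFilePosition + offset;                                  // 13,000
    long mmapFilePosition =
            (filePosition / MEMORY_PAGE_SIZE_BYTES) * MEMORY_PAGE_SIZE_BYTES;    // 12,288
    int dataStartOffsetInMmapRegion = (int) (filePosition - mmapFilePosition);   // 712
    long mmapRegionSize = size + dataStartOffsetInMmapRegion;                    // 1,049,288 bytes
    // The mapping starts 712 bytes before the requested data, so the DirectByteBuffer
    // is created at mmapPtr + 712 and exposes exactly `size` bytes.
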
  The md here is actually a MultipleDigestDataDigester, which wraps several MessageDigest objects. Here is its consume() method:

    private static class MultipleDigestDataDigester implements DataDigester {
        private final MessageDigest[] mMds;

        MultipleDigestDataDigester(MessageDigest[] mds) {
            mMds = mds;
        }

        @Override
        public void consume(ByteBuffer buffer) {
            buffer = buffer.slice();
            for (MessageDigest md : mMds) {
                buffer.position(0);
                md.update(buffer);
            }
        }
    }

  As you can see, it simply feeds the same data into every wrapped MessageDigest via update().

  Finally, the last part of the computeContentDigestsPer1MbChunk() overload:

        byte[][] result = new byte[digestAlgorithms.length][];
        for (int i = 0; i < digestAlgorithms.length; i++) {
            int digestAlgorithm = digestAlgorithms[i];
            byte[] input = digestsOfChunks[i];
            String jcaAlgorithmName = getContentDigestAlgorithmJcaDigestAlgorithm(digestAlgorithm);
            MessageDigest md;
            try {
                md = MessageDigest.getInstance(jcaAlgorithmName);
            } catch (NoSuchAlgorithmException e) {
                throw new RuntimeException(jcaAlgorithmName + " digest not supported", e);
            }
            byte[] output = md.digest(input);
            result[i] = output;
        }
        return result;
    }

  At this point the chunk digests of all data sources, grouped by algorithm, are stored in digestsOfChunks. Each of these buffers now has to be digested one more time to produce the final digest.
  So for each algorithm a new MessageDigest is created and digest(input) produces the top-level digest, which is stored in result in the same order as the algorithms and returned.

Comparing the digest of the Merkle tree root

  This is implemented in verifyIntegrityForVerityBasedAlgorithm():

    private static void verifyIntegrityForVerityBasedAlgorithm(
            byte[] expectedDigest,
            RandomAccessFile apk,
            SignatureInfo signatureInfo) throws SecurityException {
        try {
            byte[] expectedRootHash = parseVerityDigestAndVerifySourceLength(expectedDigest,
                    apk.length(), signatureInfo);
            VerityBuilder.VerityResult verity = VerityBuilder.generateApkVerityTree(apk,
                    signatureInfo, new ByteBufferFactory() {
                        @Override
                        public ByteBuffer create(int capacity) {
                            return ByteBuffer.allocate(capacity);
                        }
                    });
            if (!Arrays.equals(expectedRootHash, verity.rootHash)) {
                throw new SecurityException("APK verity digest of contents did not verify");
            }
        } catch (DigestException | IOException | NoSuchAlgorithmException e) {
            throw new SecurityException("Error during verification", e);
        }
    }

  The expectedDigest parameter is the digest value taken from the v2 block; parseVerityDigestAndVerifySourceLength() extracts the expected root hash from it.
  VerityBuilder.generateApkVerityTree() builds the Merkle tree and digests its root; the resulting digest is stored in the rootHash member of VerityBuilder.VerityResult.
  If the two digests differ, verification fails.

Extracting the CONTENT_DIGEST_VERITY_CHUNKED_SHA256 digest from the v2 block

    /**
     * Return the verity digest only if the length of digest content looks correct.
     * When verity digest is generated, the last incomplete 4k chunk is padded with 0s before
     * hashing. This means two almost identical APKs with different number of 0 at the end will have
     * the same verity digest. To avoid this problem, the length of the source content (excluding
     * Signing Block) is appended to the verity digest, and the digest is returned only if the
     * length is consistent to the current APK.
     */
    static byte[] parseVerityDigestAndVerifySourceLength(
            byte[] data, long fileSize, SignatureInfo signatureInfo) throws SecurityException {
        // FORMAT:
        // OFFSET       DATA TYPE  DESCRIPTION
        // * @+0  bytes uint8[32]  Merkle tree root hash of SHA-256
        // * @+32 bytes int64      Length of source data
        int kRootHashSize = 32;
        int kSourceLengthSize = 8;

        if (data.length != kRootHashSize + kSourceLengthSize) {
            throw new SecurityException("Verity digest size is wrong: " + data.length);
        }
        ByteBuffer buffer = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        buffer.position(kRootHashSize);
        long expectedSourceLength = buffer.getLong();

        long signingBlockSize = signatureInfo.centralDirOffset
                - signatureInfo.apkSigningBlockOffset;
        if (expectedSourceLength != fileSize - signingBlockSize) {
            throw new SecurityException("APK content size did not verify");
        }

        return Arrays.copyOfRange(data, 0, kRootHashSize);
    }

  The value taken from the v2 block is 40 bytes long: the first 32 bytes are the digest and the last 8 bytes are the length of the content the digest was computed over.
  The content that produced this digest does not include the APK Signing Block, so the length check subtracts the signing block's size from the file size.
  The method above is then straightforward: if the lengths do not match, it throws an exception; otherwise it returns the first 32 bytes.
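  As an illustration of that 40-byte layout, here is a small sketch (made-up class name, not platform code) that encodes and decodes the same structure:

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.util.Arrays;

    // Sketch of the 40-byte verity digest value: 32-byte root hash + int64 source length (LE).
    final class VerityDigestLayout {
        static byte[] encode(byte[] rootHash32, long sourceLength) {
            ByteBuffer buf = ByteBuffer.allocate(32 + 8).order(ByteOrder.LITTLE_ENDIAN);
            buf.put(rootHash32);          // @+0:  Merkle tree root hash of SHA-256
            buf.putLong(sourceLength);    // @+32: length of source data (APK minus signing block)
            return buf.array();
        }

        static byte[] decodeRootHash(byte[] data) {
            return Arrays.copyOfRange(data, 0, 32);
        }

        static long decodeSourceLength(byte[] data) {
            return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(32);
        }
    }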

Building the Merkle tree and obtaining the root digest

  This is implemented in VerityBuilder.generateApkVerityTree(), which ultimately delegates to generateVerityTreeInternal():

    @NonNull
    private static VerityResult generateVerityTreeInternal(@NonNull RandomAccessFile apk,
            @NonNull ByteBufferFactory bufferFactory, @Nullable SignatureInfo signatureInfo)
            throws IOException, SecurityException, NoSuchAlgorithmException, DigestException {
        long signingBlockSize =
                signatureInfo.centralDirOffset - signatureInfo.apkSigningBlockOffset;
        long dataSize = apk.length() - signingBlockSize;
        int[] levelOffset = calculateVerityLevelOffset(dataSize);
        int merkleTreeSize = levelOffset[levelOffset.length - 1];

        ByteBuffer output = bufferFactory.create(
                merkleTreeSize
                + CHUNK_SIZE_BYTES);  // maximum size of apk-verity metadata
        output.order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer tree = slice(output, 0, merkleTreeSize);
        byte[] apkRootHash = generateVerityTreeInternal(apk, signatureInfo, DEFAULT_SALT,
                levelOffset, tree);
        return new VerityResult(output, merkleTreeSize, apkRootHash);
    }

  First, dataSize, the length of the content that goes into the Merkle tree, is obtained by subtracting the APK Signing Block's size from the file length.
  calculateVerityLevelOffset(dataSize) then returns, for each level, the cumulative size of the tree from the root level down to that level, so the last element of the array, merkleTreeSize, is the size of the whole tree.
  The Merkle tree is represented by the ByteBuffer output. The generateVerityTreeInternal() overload is then called to fill in the tree data and compute the root digest.
  Finally the tree data, the tree size, and the root digest are wrapped in a VerityResult and returned.

Computing the cumulative tree size from the root level down to each level
    private static int[] calculateVerityLevelOffset(long fileSize) {
        ArrayList<Long> levelSize = new ArrayList<>();
        while (true) {
            long levelDigestSize = divideRoundup(fileSize, CHUNK_SIZE_BYTES) * DIGEST_SIZE_BYTES;
            long chunksSize = CHUNK_SIZE_BYTES * divideRoundup(levelDigestSize, CHUNK_SIZE_BYTES);
            levelSize.add(chunksSize);
            if (levelDigestSize <= CHUNK_SIZE_BYTES) {
                break;
            }
            fileSize = levelDigestSize;
        }

        // Reverse and convert to summed area table.
        int[] levelOffset = new int[levelSize.size() + 1];
        levelOffset[0] = 0;
        for (int i = 0; i < levelSize.size(); i++) {
            // We don't support verity tree if it is larger then Integer.MAX_VALUE.
            levelOffset[i + 1] = levelOffset[i]
                    + Math.toIntExact(levelSize.get(levelSize.size() - i - 1));
        }
        return levelOffset;
    }

  To understand this code you need to know how the Merkle tree is constructed.
  The APK, minus its signing block, is split into 4096-byte blocks, with the last block zero-padded if it is short. Each block is hashed into a 32-byte digest; these digests form the bottom level of the Merkle tree. What is appended to levelSize is the total space those digests occupy, rounded up to whole 4096-byte blocks.
  For the next level up, the digest data of the level below is again split into 4096-byte blocks and each block is hashed into a 32-byte digest; the space those digests occupy, again rounded up to whole 4096-byte blocks, is appended to levelSize.
  This continues until a level's digest data fits within 4096 bytes, at which point the loop ends, so the last entry added to levelSize is exactly one block of 4096 bytes.
  The levelOffset array then stores, for each level from the top down, the cumulative size of the levels above it; its last element is therefore the total size of all nodes, i.e. the size of the Merkle tree.
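  A worked example with hypothetical numbers makes this concrete. Suppose the digested content (the APK minus its signing block) is exactly 100 MB:

    // Hypothetical worked example of calculateVerityLevelOffset() for 100 MB of content,
    // with CHUNK_SIZE_BYTES = 4096 and DIGEST_SIZE_BYTES = 32.
    long dataSize = 100L * 1024 * 1024;                   // 104,857,600 bytes
    long leafDigests   = (dataSize   + 4095) / 4096 * 32; // 25,600 * 32 = 819,200 B -> 200 pages
    long middleDigests = (819_200L   + 4095) / 4096 * 32; //    200 * 32 =   6,400 B -> 2 pages
    long rootDigests   = (  6_400L   + 4095) / 4096 * 32; //      2 * 32 =      64 B -> 1 page, stop
    // levelSize  (bottom-up)         = [819200, 8192, 4096]
    // levelOffset (top-down, summed) = [0, 4096, 12288, 831488]
    //   output[0, 4096)       root level (a single page)
    //   output[4096, 12288)   intermediate level
    //   output[12288, 831488) leaf level
    // merkleTreeSize = levelOffset[3] = 831,488 bytes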

Building the Merkle tree and computing the root digest

  This is implemented by the generateVerityTreeInternal() overload:

    @NonNull
    private static byte[] generateVerityTreeInternal(@NonNull RandomAccessFile apk,
            @Nullable SignatureInfo signatureInfo, @Nullable byte[] salt,
            @NonNull int[] levelOffset, @NonNull ByteBuffer output)
            throws IOException, NoSuchAlgorithmException, DigestException {
        // 1. Digest the apk to generate the leaf level hashes.
        assertSigningBlockAlignedAndHasFullPages(signatureInfo);
        generateApkVerityDigestAtLeafLevel(apk, signatureInfo, salt, slice(output,
                    levelOffset[levelOffset.length - 2], levelOffset[levelOffset.length - 1]));

        // 2. Digest the lower level hashes bottom up.
        for (int level = levelOffset.length - 3; level >= 0; level--) {
            ByteBuffer inputBuffer = slice(output, levelOffset[level + 1], levelOffset[level + 2]);
            ByteBuffer outputBuffer = slice(output, levelOffset[level], levelOffset[level + 1]);

            DataSource source = new ByteBufferDataSource(inputBuffer);
            BufferedDigester digester = new BufferedDigester(salt, outputBuffer);
            consumeByChunk(digester, source, CHUNK_SIZE_BYTES);
            digester.assertEmptyBuffer();
            digester.fillUpLastOutputChunk();
        }

        // 3. Digest the first block (i.e. first level) to generate the root hash.
        byte[] rootHash = new byte[DIGEST_SIZE_BYTES];
        BufferedDigester digester = new BufferedDigester(salt, ByteBuffer.wrap(rootHash));
        digester.consume(slice(output, 0, CHUNK_SIZE_BYTES));
        digester.assertEmptyBuffer();
        return rootHash;
    }

  First, generateApkVerityDigestAtLeafLevel() is called to generate the leaf nodes. output holds all of the tree's nodes, and the leaves occupy the region from levelOffset[levelOffset.length - 2] to levelOffset[levelOffset.length - 1]; the meaning of levelOffset was explained above.
  Then, level by level, the digest data of the level below is grouped into 4096-byte blocks and hashed to produce the digests of the level above, filling in the remaining nodes of output. Once the loop finishes, the Merkle tree is complete.
  Finally the first block (the root level) is digested to produce rootHash, which is returned.

Building the leaf level

  This is done by generateApkVerityDigestAtLeafLevel(). The excerpt below is truncated and starts at step 2; the omitted beginning creates the BufferedDigester described below and digests the ZIP entries from the start of the file up to the APK Signing Block.

        // 2. Skip APK Signing Block and continue digesting, until the Central Directory offset
        // field in EoCD is reached.
        long eocdCdOffsetFieldPosition =
                signatureInfo.eocdOffset + ZIP_EOCD_CENTRAL_DIR_OFFSET_FIELD_OFFSET;
        consumeByChunk(digester,
                DataSource.create(apk.getFD(), signatureInfo.centralDirOffset,
                    eocdCdOffsetFieldPosition - signatureInfo.centralDirOffset),
                MMAP_REGION_SIZE_BYTES);

        // 3. Consume offset of Signing Block as an alternative EoCD.
        ByteBuffer alternativeCentralDirOffset = ByteBuffer.allocate(
                ZIP_EOCD_CENTRAL_DIR_OFFSET_FIELD_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        alternativeCentralDirOffset.putInt(Math.toIntExact(signatureInfo.apkSigningBlockOffset));
        alternativeCentralDirOffset.flip();
        digester.consume(alternativeCentralDirOffset);

        // 4. Read from end of the Central Directory offset field in EoCD to the end of the file.
        long offsetAfterEocdCdOffsetField =
                eocdCdOffsetFieldPosition + ZIP_EOCD_CENTRAL_DIR_OFFSET_FIELD_SIZE;
        consumeByChunk(digester,
                DataSource.create(apk.getFD(), offsetAfterEocdCdOffsetField,
                    apk.length() - offsetAfterEocdCdOffsetField),
                MMAP_REGION_SIZE_BYTES);

        // 5. Pad 0s up to the nearest 4096-byte block before hashing.
        int lastIncompleteChunkSize = (int) (apk.length() % CHUNK_SIZE_BYTES);
        if (lastIncompleteChunkSize != 0) {
            digester.consume(ByteBuffer.allocate(CHUNK_SIZE_BYTES - lastIncompleteChunkSize));
        }
        digester.assertEmptyBuffer();

        // 6. Fill up the rest of buffer with 0s.
        digester.fillUpLastOutputChunk();
    }

  First a BufferedDigester object is created. It is initialized with a salt and a ByteBuffer and acts as a buffering digest helper: once it has accumulated a full 4096-byte block, it computes a digest and writes it into the corresponding position of the ByteBuffer, effectively producing one tree node. As we saw above, the ByteBuffer passed in starts at levelOffset[levelOffset.length - 2] of output, which is where the leaf nodes begin.
  Because the APK Signing Block does not take part in the digest, it has to be skipped. That is why the second consumeByChunk() call uses signatureInfo.centralDirOffset, the start of the Central Directory, as the start of its data source.
  Within the End of Central Directory there is another subtlety: the field at offset ZIP_EOCD_CENTRAL_DIR_OFFSET_FIELD_OFFSET holds the Central Directory's offset, but that value reflects the file after the signing block was inserted, whereas the digest must be computed over the offset as it was before the signing block was added. Step 3 in the code handles this by digesting the APK Signing Block's offset in place of that field.
  After consumeByChunk() has processed the ZIP entries, the Central Directory, and the End of Central Directory, the last block is zero-padded up to 4096 bytes if it is not full; the amount of padding is derived from apk.length() % CHUNK_SIZE_BYTES. At first glance it looks odd that the signing block's size is not subtracted from apk.length() here, since the signing block is excluded from the digest; it works because generateVerityTreeInternal() first calls assertSigningBlockAlignedAndHasFullPages(signatureInfo), which ensures the signing block starts on a page boundary and spans whole 4096-byte pages, so skipping it does not change the remainder modulo 4096.
  Finally, digester.fillUpLastOutputChunk() pads the last 4096-byte block of the leaf-level output with zeros if it is not completely filled.
  All of the node data is written into output through consumeByChunk(), so it is worth reading its code:

    private static void consumeByChunk(DataDigester digester, DataSource source, int chunkSize)
            throws IOException, DigestException {
        long inputRemaining = source.size();
        long inputOffset = 0;
        while (inputRemaining > 0) {
            int size = (int) Math.min(inputRemaining, chunkSize);
            source.feedIntoDataDigester(digester, inputOffset, size);
            inputOffset += size;
            inputRemaining -= size;
        }
    }

  The code is simple: it walks through the data source one block at a time and calls the source's feedIntoDataDigester() method, which in turn calls the digester's consume() method. Here the digester is a BufferedDigester, so let's look at its implementation:

        @Override
        public void consume(ByteBuffer buffer) throws DigestException {
            int offset = buffer.position();
            int remaining = buffer.remaining();
            while (remaining > 0) {
                int allowance = (int) Math.min(remaining, BUFFER_SIZE - mBytesDigestedSinceReset);
                // Optimization: set the buffer limit to avoid allocating a new ByteBuffer object.
                buffer.limit(buffer.position() + allowance);
                mMd.update(buffer);
                offset += allowance;
                remaining -= allowance;
                mBytesDigestedSinceReset += allowance;

                if (mBytesDigestedSinceReset == BUFFER_SIZE) {
                    mMd.digest(mDigestBuffer, 0, mDigestBuffer.length);
                    mOutput.put(mDigestBuffer);
                    // After digest, MessageDigest resets automatically, so no need to reset again.
                    if (mSalt != null) {
                        mMd.update(mSalt);
                    }
                    mBytesDigestedSinceReset = 0;
                }
            }
        }

  BUFFER_SIZE is 4096, and mBytesDigestedSinceReset tracks how many bytes have been fed into mMd since the last digest; no digest is computed until the accumulated data reaches BUFFER_SIZE. Once it does, mMd computes the digest, the result is written into mOutput, the salt (if any) is fed in again, and mBytesDigestedSinceReset is reset to start counting toward the next block. The mOutput member of BufferedDigester is the ByteBuffer that stores the Merkle tree's nodes.
  Once generateApkVerityDigestAtLeafLevel() has finished, all of the leaf nodes have been filled in.
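  To see the buffering behaviour in isolation, here is a simplified, self-contained sketch of the same idea (not the platform class): it emits one SHA-256 digest per full 4096-byte block and carries partial data across consume() calls.

    import java.nio.ByteBuffer;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // Simplified sketch of the BufferedDigester idea: one 32-byte digest per full
    // 4096-byte block of input, with partial data carried between consume() calls.
    final class BlockDigesterSketch {
        private static final int BLOCK = 4096;
        private final MessageDigest md = MessageDigest.getInstance("SHA-256");
        private int pending = 0;               // bytes fed since the last emitted digest

        BlockDigesterSketch() throws NoSuchAlgorithmException {}

        void consume(ByteBuffer buffer, ByteBuffer output) {
            while (buffer.hasRemaining()) {
                int allowance = Math.min(buffer.remaining(), BLOCK - pending);
                ByteBuffer piece = buffer.slice();
                piece.limit(allowance);
                md.update(piece);              // digest exactly `allowance` bytes
                buffer.position(buffer.position() + allowance);
                pending += allowance;
                if (pending == BLOCK) {        // a full block -> emit one tree node
                    output.put(md.digest());   // digest() also resets md
                    pending = 0;
                }
            }
        }
    }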

Summary

  We now know how APK integrity is verified.
  One scheme covers CONTENT_DIGEST_CHUNKED_SHA256 (digest algorithm "SHA-256") and CONTENT_DIGEST_CHUNKED_SHA512 (digest algorithm "SHA-512"). The data to be digested is split into 1 MB chunks; each chunk's digest is computed over the concatenation of the byte 0xa5, the chunk's length in bytes (uint32, little-endian), and the chunk's contents, and the top-level digest is computed over the concatenation of the byte 0x5a, the number of chunks (uint32, little-endian), and the chunk digests in the order the chunks appear in the APK. That top-level digest is compared with the one in the v2 block, and verification passes if they match.
  The other scheme covers CONTENT_DIGEST_VERITY_CHUNKED_SHA256 (digest algorithm "SHA-256"). It builds a Merkle tree with a 4096-byte block size, digests the tree root, and compares the result with the digest in the v2 block; verification passes if they match.
