ProGuard Source Code Analysis, Part 2: Parsing Class Bytecode

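The previous part covered how ProGuard parses its arguments and how it reads and stores its configuration. This part looks at how ProGuard reads class files, parses the class bytecode, and stores the parsed result in memory.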

Data structures

Before diving in, here are a few data structures we will need:

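  • Clazz: an interface; its implementations are ProgramClass and LibraryClass
  • ProgramClass: implements Clazz; ProGuard uses it to represent application (program) classes
  • LibraryClass: implements Clazz; ProGuard uses it to represent classes from the library jars the application depends on
    Their relationship: ProgramClass and LibraryClass both implement Clazz.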

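ProGuard also keeps two class pools, programClassPool and libraryClassPool, both of type ClassPool. programClassPool holds the Clazz instances of all program classes, and libraryClassPool holds the Clazz instances of all library classes. ClassPool itself is simple: it wraps a TreeMap, keyed by class name, that stores the Clazz instances. The code is as follows: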

/**
 * This is a set of representations of classes. They can be enumerated or
 * retrieved by name. They can also be accessed by means of class visitors.
 *
 * @author Eric Lafortune
 */
public class ClassPool
{
    // We're using a sorted tree map instead of a hash map to store the classes,
    // in order to make the processing more deterministic.
    private final Map classes = new TreeMap();

    /**
     * Adds the given Clazz to the class pool.
     */
    public void addClass(Clazz clazz)
    {
        classes.put(clazz.getName(), clazz);
    }

    /**
     * Removes the given Clazz from the class pool.
     */
    public void removeClass(Clazz clazz)
    {
        removeClass(clazz.getName());
    }

    /**
     * Returns a Clazz from the class pool based on its name. Returns
     * null if the class with the given name is not in the class
     * pool.
     */
    public Clazz getClass(String className)
    {
        return (Clazz)classes.get(className);
    }
    // A batch of code omitted here...
}
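To make the comment about determinism concrete, here is a minimal sketch of the same idea. SortedPoolSketch and its methods are hypothetical stand-ins, not ProGuard classes; the only point is that a TreeMap returns its keys in sorted order no matter how they were inserted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for ClassPool; it stores plain strings instead of Clazz objects.
final class SortedPoolSketch {
    // A TreeMap keeps its keys sorted, so iteration order is independent of insertion order.
    private final Map<String, String> classes = new TreeMap<>();

    void addClass(String className, String payload) {
        classes.put(className, payload);
    }

    // Returns null when no class with the given name is in the pool.
    String lookup(String className) {
        return classes.get(className);
    }

    List<String> classNames() {
        return new ArrayList<>(classes.keySet());
    }

    public static void main(String[] args) {
        SortedPoolSketch pool = new SortedPoolSketch();
        pool.addClass("com/example/Zebra", "z");
        pool.addClass("com/example/Apple", "a");
        // Names come back sorted, whatever order the jars were read in.
        System.out.println(pool.classNames());
    }
}
```

With a HashMap the iteration order could differ between runs or JVM versions, which would make later processing steps harder to reproduce.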

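Reading Java classes

The previous part covered argument parsing. Once parsing finishes, a ProGuard object is created and its execute method is called to run ProGuard's core pipeline. The execute method looks roughly like this: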

/**
 * Performs all subsequent ProGuard operations.
 */
public void execute() throws IOException
{
    System.out.println(VERSION);

    GPL.check();

    if (configuration.printConfiguration != null)
    {
        printConfiguration();
    }

    new ConfigurationChecker(configuration).check();

    if (configuration.programJars != null     &&
        configuration.programJars.hasOutput() &&
        new UpToDateChecker(configuration).check())
    {
        return;
    }

    readInput();

    if (configuration.shrink    ||
        configuration.optimize  ||
        configuration.obfuscate ||
        configuration.preverify)
    {
        clearPreverification();
    }

    if (configuration.printSeeds != null ||
        configuration.shrink    ||
        configuration.optimize  ||
        configuration.obfuscate ||
        configuration.preverify)
    {
        initialize();
    }

    if (configuration.targetClassVersion != 0)
    {
        target();
    }

    if (configuration.printSeeds != null)
    {
        printSeeds();
    }

    if (configuration.shrink)
    {
        shrink();
    }

    if (configuration.preverify)
    {
        inlineSubroutines();
    }

    if (configuration.optimize)
    {
        for (int optimizationPass = 0;
                optimizationPass < configuration.optimizationPasses;
                optimizationPass++)
        {
            if (!optimize())
            {
                // Stop optimizing if the code doesn't improve any further.
                break;
            }

            // Shrink again, if we may.
            if (configuration.shrink)
            {
                // Don't print any usage this time around.
                configuration.printUsage       = null;
                configuration.whyAreYouKeeping = null;

                shrink();
            }
        }
    } else if (configuration.optimizeNoSideEffects) {
        new Optimizer(configuration).executeNoSideEffects(programClassPool, libraryClassPool);
    }

    if (configuration.optimize)
    {
        linearizeLineNumbers();
    }

    if (configuration.obfuscate)
    {
        obfuscate();
    }

    if (configuration.optimize)
    {
        trimLineNumbers();
    }

    if (configuration.preverify)
    {
        preverify();
    }

    if (configuration.shrink    ||
        configuration.optimize  ||
        configuration.obfuscate ||
        configuration.preverify)
    {
        sortClassElements();
    }

    if (configuration.programJars.hasOutput())
    {
        writeOutput();
    }

    if (configuration.dump != null)
    {
        dump();
    }
}

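As you can see, shrinking, optimization, obfuscation, and the other features are all driven from execute. Next we look at how classes are read and loaded.

Configuration holds the programJars and libraryJars parsed earlier, which correspond to the application jars and the library jars. readInput uses these paths to read and unpack the jar files. Program class bytecode ends up in ProgramClass objects, and library class files end up in LibraryClass objects.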

private void readInput() throws IOException
{
    if (configuration.verbose)
    {
        System.out.println("Reading input...");
    }
    // Fill the program class pool and the library class pool.
    new InputReader(configuration).execute(programClassPool, libraryClassPool);
}

readInput is very simple: it instantiates an InputReader and calls its execute method to read the jar files.

/**
 * Fills the given program class pool and library class pool by reading
 * class files, based on the current configuration.
 */
public void execute(ClassPool programClassPool,
                    ClassPool libraryClassPool) throws IOException
{

    // Unrelated code omitted...
    readInput("Reading program ",
                configuration.programJars,
                new ClassFilter(
                new ClassReader(false,
                                configuration.skipNonPublicLibraryClasses,
                                configuration.skipNonPublicLibraryClassMembers,
                                warningPrinter,
                new ClassPresenceFilter(programClassPool, duplicateClassPrinter,
                new ClassPoolFiller(programClassPool)))));
    // Unrelated code omitted...
}
/**
 * Reads all input entries from the given section of the given class path.
 */
public void readInput(String          messagePrefix,
                        ClassPath       classPath,
                        int             fromIndex,
                        int             toIndex,
                        DataEntryReader reader) throws IOException
{
    for (int index = fromIndex; index < toIndex; index++)
    {
        ClassPathEntry entry = classPath.get(index);
        if (!entry.isOutput())
        {
            readInput(messagePrefix, entry, reader);
        }
    }
}

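ClassPath maintains a list of ClassPathEntry objects internally; each ClassPathEntry essentially describes one jar file. The execute method calls a three-argument overload of readInput that is not shown here; the five-argument version above is where the iteration happens. It walks the entries, skips the ones marked as output, and processes the rest one by one.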

/**
 * Reads the given input class path entry.
 */
private void readInput(String          messagePrefix,
                        ClassPathEntry  classPathEntry,
                        DataEntryReader dataEntryReader) throws IOException
{
    // Some code omitted...
    // Create a reader that can unwrap jars, wars, ears, and zips.
    DataEntryReader reader =
        DataEntryReaderFactory.createDataEntryReader(messagePrefix,
                                                        classPathEntry,
                                                        dataEntryReader);

    // Create the data entry pump.
    DirectoryPump directoryPump =
        new DirectoryPump(classPathEntry.getFile());

    // Pump the data entries into the reader.
    directoryPump.pumpDataEntries(reader);    
}

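First a DataEntryReader is created. This one is not the reader that actually parses class bytecode, though; it only applies the unpacking appropriate to the file type, such as .apk, .jar, or .aar. On Android, the DataEntryReader returned here is a JarReader.

Next a DirectoryPump is created and its pumpDataEntries method is called. It is implemented as follows: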

public void pumpDataEntries(DataEntryReader dataEntryReader)
throws IOException
{
    if (!directory.exists())
    {
        throw new IOException("No such file or directory");
    }

    readFiles(directory, dataEntryReader);
}


/**
 * Reads the given subdirectory recursively, applying the given DataEntryReader
 * to all files that are encountered.
 */
private void readFiles(File file, DataEntryReader dataEntryReader)
throws IOException
{
    // Pass the file data entry to the reader.
    dataEntryReader.read(new FileDataEntry(directory, file));

    if (file.isDirectory())
    {
        // Recurse into the subdirectory.
        File[] listedFiles = file.listFiles();

        for (int index = 0; index < listedFiles.length; index++)
        {
            File listedFile = listedFiles[index];
            try
            {
                readFiles(listedFile, dataEntryReader);
            }
            catch (IOException e)
            {
                throw (IOException)new IOException("Can't read ["+listedFile.getName()+"] ("+e.getMessage()+")").initCause(e);
            }
        }
    }
}
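The traversal above can be tried out in isolation. The sketch below is an assumption-laden miniature: DirectoryWalkSketch, FileVisitor, and sampleTree are hypothetical names, and the visitor receives plain File objects rather than FileDataEntry instances. Like readFiles, it visits the root first and then recurses into every child; unlike the quoted code, it guards against listFiles returning null.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class DirectoryWalkSketch {
    // Hypothetical, simplified counterpart of DataEntryReader.
    interface FileVisitor {
        void visit(File root, File file);
    }

    // Visits the file itself, then recurses into its children if it is a directory.
    static void walk(File root, File file, FileVisitor visitor) {
        visitor.visit(root, file);
        File[] children = file.listFiles(); // null for plain files or on I/O errors
        if (children != null) {
            for (File child : children) {
                walk(root, child, visitor);
            }
        }
    }

    // Collects the path of every visited file relative to the root, sorted for stable output.
    static List<String> relativeNames(File root) {
        List<String> names = new ArrayList<>();
        walk(root, root, (r, f) -> names.add(
            r.toPath().relativize(f.toPath()).toString().replace(File.separatorChar, '/')));
        Collections.sort(names);
        return names;
    }

    // Builds a small temporary tree: A.class and sub/B.class.
    static File sampleTree() {
        try {
            File root = Files.createTempDirectory("pump-sketch").toFile();
            File sub = new File(root, "sub");
            sub.mkdir();
            new File(root, "A.class").createNewFile();
            new File(sub, "B.class").createNewFile();
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(relativeNames(sampleTree()));
    }
}
```

The root itself shows up as an empty relative name, which mirrors the fact that readFiles passes the top-level file or directory to the reader before descending.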

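As you can see, pumpDataEntries recursively walks the files and hands each one to the DataEntryReader created earlier. Since JarReader implements that interface, we go straight to JarReader's read method: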

public void read(DataEntry dataEntry) throws IOException
{
    ZipInputStream zipInputStream = new ZipInputStream(dataEntry.getInputStream());
    // Get all entries from the input jar.
    while (true)
    {
        // Can we get another entry?
        ZipEntry zipEntry = zipInputStream.getNextEntry();
        if (zipEntry == null)
        {
            break;
        }

        // Delegate the actual reading to the data entry reader.
        dataEntryReader.read(new ZipDataEntry(dataEntry,
                                                zipEntry,
                                                zipInputStream));
    }
    // Some code omitted...
}
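The unwrap-and-delegate step can be sketched with nothing but java.util.zip. Everything named here (DelegatingReaderSketch, EntryReader, ZipUnwrapper, ClassNameFilter) is hypothetical and much simpler than ProGuard's DataEntryReader hierarchy; it only illustrates how one reader opens the archive and passes each entry to the next reader in the chain.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class DelegatingReaderSketch {
    // Hypothetical, simplified counterpart of DataEntryReader.
    interface EntryReader {
        void read(String name, InputStream in) throws IOException;
    }

    // Unwraps a zip/jar stream and hands every entry to the next reader.
    static final class ZipUnwrapper implements EntryReader {
        private final EntryReader delegate;
        ZipUnwrapper(EntryReader delegate) { this.delegate = delegate; }

        @Override
        public void read(String name, InputStream in) throws IOException {
            ZipInputStream zip = new ZipInputStream(in);
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                delegate.read(entry.getName(), zip);
            }
        }
    }

    // Passes on only entries whose name ends with ".class".
    static final class ClassNameFilter implements EntryReader {
        private final EntryReader delegate;
        ClassNameFilter(EntryReader delegate) { this.delegate = delegate; }

        @Override
        public void read(String name, InputStream in) throws IOException {
            if (name.endsWith(".class")) {
                delegate.read(name, in);
            }
        }
    }

    // Builds an in-memory zip whose entries carry their own name as content.
    static byte[] zipOf(String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String n : names) {
                zip.putNextEntry(new ZipEntry(n));
                zip.write(n.getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Runs the chain unwrapper -> filter -> collector and returns the names that got through.
    static List<String> classEntries(byte[] jarBytes) {
        List<String> seen = new ArrayList<>();
        EntryReader chain = new ZipUnwrapper(new ClassNameFilter((name, in) -> seen.add(name)));
        try {
            chain.read("in-memory.jar", new ByteArrayInputStream(jarBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(classEntries(zipOf("a/A.class", "META-INF/MANIFEST.MF", "b/B.class")));
    }
}
```

Each reader knows only about the next one in the chain, so new behavior (another archive format, another filter) can be added by wrapping rather than by editing existing readers.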

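As its name suggests, JarReader is only responsible for unpacking the jar file; the extracted class files are handed to dataEntryReader for processing. The ProGuard source uses this kind of delegation heavily, passing work down layer by layer. Here, dataEntryReader is essentially the ClassReader created in InputReader's execute method (wrapped in a ClassFilter). Its read method is shown below: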

public void read(DataEntry dataEntry) throws IOException
{

    // Get the input stream.
    InputStream inputStream = dataEntry.getInputStream();

    // Wrap it into a data input stream.
    DataInputStream dataInputStream = new DataInputStream(inputStream);

    // Create a Clazz representation.
    Clazz clazz;
    if (isLibrary)
    {
        clazz = new LibraryClass();
        clazz.accept(new LibraryClassReader(dataInputStream, skipNonPublicLibraryClasses, skipNonPublicLibraryClassMembers));
    }
    else
    {
        clazz = new ProgramClass();
        clazz.accept(new ProgramClassReader(dataInputStream));
    }
    // Some code omitted...
}

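Here we finally see the class file being read. As mentioned earlier, program class files are read into ProgramClass objects and library class files into LibraryClass objects.

Interestingly, ClassReader itself does not parse the class bytecode. The real parsing is done by ProgramClassReader and LibraryClassReader. The rest of this part follows the program class path only, so we look at ProgramClassReader.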
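The call clazz.accept(new ProgramClassReader(...)) is a double dispatch: the Clazz object decides which visit method of the visitor gets called. The sketch below is a stripped-down illustration of that mechanism. The nested types reuse ProGuard's names for readability, but they are hypothetical miniatures, and NamingReader is invented for the example.

```java
final class VisitorDispatchSketch {
    interface ClassVisitor {
        void visitProgramClass(ProgramClass programClass);
        void visitLibraryClass(LibraryClass libraryClass);
    }

    interface Clazz {
        void accept(ClassVisitor visitor);
        String getName();
    }

    static final class ProgramClass implements Clazz {
        String name;
        // The concrete class picks the matching visit method.
        @Override public void accept(ClassVisitor visitor) { visitor.visitProgramClass(this); }
        @Override public String getName() { return name; }
    }

    static final class LibraryClass implements Clazz {
        String name;
        @Override public void accept(ClassVisitor visitor) { visitor.visitLibraryClass(this); }
        @Override public String getName() { return name; }
    }

    // A "reader" visitor: it fills in the object it visits, like ProgramClassReader does.
    static final class NamingReader implements ClassVisitor {
        private final String rawName;
        NamingReader(String rawName) { this.rawName = rawName; }

        @Override public void visitProgramClass(ProgramClass c) { c.name = "program:" + rawName; }
        @Override public void visitLibraryClass(LibraryClass c) { c.name = "library:" + rawName; }
    }

    // Starts with an empty object, lets the visitor populate it, and returns the result.
    static String describe(Clazz emptyClazz, String rawName) {
        emptyClazz.accept(new NamingReader(rawName));
        return emptyClazz.getName();
    }

    public static void main(String[] args) {
        System.out.println(describe(new ProgramClass(), "Foo"));
        System.out.println(describe(new LibraryClass(), "Bar"));
    }
}
```

This is why ClassReader can create an empty ProgramClass or LibraryClass first and leave all parsing to the visitor it passes in.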

ProgramClassReader

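ProgramClassReader implements the ClassVisitor interface. Note that ProGuard relies heavily on this xxxVisitor design for reading and writing class bytecode: classes are handled through ClassVisitor, class members through MemberVisitor, the constant pool through its own set of visitors, and so on. ProgramClassReader implements all of these interfaces. We start with visitProgramClass, the entry point for reading a class: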

public void visitProgramClass(ProgramClass programClass)
{
    // Magic number: four bytes, 0xCAFEBABE.
    // Read and check the magic number.
    programClass.u4magic = dataInput.readInt();

    ClassUtil.checkMagicNumber(programClass.u4magic);

    // After the magic number come the version numbers: minor first, then major, two bytes each.
    // Read and check the version numbers.
    int u2minorVersion = dataInput.readUnsignedShort();
    int u2majorVersion = dataInput.readUnsignedShort();

    programClass.u4version = ClassUtil.internalClassVersion(u2majorVersion,
                                                            u2minorVersion);

    ClassUtil.checkVersionNumbers(programClass.u4version);

    // After the version numbers comes the constant pool count, two bytes.
    // Read the constant pool. Note that the first entry is not used.
    programClass.u2constantPoolCount = dataInput.readUnsignedShort();

    // Allocate the constant pool, then read and fill in each entry.
    programClass.constantPool = new Constant[programClass.u2constantPoolCount];
    for (int index = 1; index < programClass.u2constantPoolCount; index++)
    {
        Constant constant = createConstant();
        constant.accept(programClass, this);
        programClass.constantPool[index] = constant;

        // Long constants and double constants take up two entries in the
        // constant pool.
        int tag = constant.getTag();
        if (tag == ClassConstants.CONSTANT_Long ||
            tag == ClassConstants.CONSTANT_Double)
        {
            programClass.constantPool[++index] = null;
        }
    }

    // After the constant pool come the access flags, two bytes.
    // Read the general class information.
    programClass.u2accessFlags = dataInput.readUnsignedShort();
    // After the access flags comes the this-class index, two bytes.
    programClass.u2thisClass   = dataInput.readUnsignedShort();
    // After the this-class index comes the superclass index, two bytes.
    programClass.u2superClass  = dataInput.readUnsignedShort();

    // After the superclass index comes the interface count, two bytes.
    // Read the interfaces.
    programClass.u2interfacesCount = dataInput.readUnsignedShort();

    // Allocate the interface index array, then read each entry.
    programClass.u2interfaces = new int[programClass.u2interfacesCount];
    for (int index = 0; index < programClass.u2interfacesCount; index++)
    {
        programClass.u2interfaces[index] = dataInput.readUnsignedShort();
    }

    // After the interface indices comes the field count, two bytes.
    // Read the fields.
    programClass.u2fieldsCount = dataInput.readUnsignedShort();

    // Allocate the field array, then read each field.
    programClass.fields = new ProgramField[programClass.u2fieldsCount];
    for (int index = 0; index < programClass.u2fieldsCount; index++)
    {
        ProgramField programField = new ProgramField();
        this.visitProgramField(programClass, programField);
        programClass.fields[index] = programField;
    }

    // After the fields comes the method count, two bytes.
    // Read the methods.
    programClass.u2methodsCount = dataInput.readUnsignedShort();

    // Allocate the method array, then read each method.
    programClass.methods = new ProgramMethod[programClass.u2methodsCount];
    for (int index = 0; index < programClass.u2methodsCount; index++)
    {
        ProgramMethod programMethod = new ProgramMethod();
        this.visitProgramMethod(programClass, programMethod);
        programClass.methods[index] = programMethod;
    }

    // After the methods comes the attribute count, two bytes.
    // Read the class attributes.
    programClass.u2attributesCount = dataInput.readUnsignedShort();

    // Allocate the attribute array, then read each attribute.
    programClass.attributes = new Attribute[programClass.u2attributesCount];
    for (int index = 0; index < programClass.u2attributesCount; index++)
    {
        Attribute attribute = createAttribute(programClass);
        attribute.accept(programClass, this);
        programClass.attributes[index] = attribute;
    }
}

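Every step of visitProgramClass is annotated in the comments above. In short, it reads the class file field by field, in the order the class file format defines, into a ProgramClass object. If you are familiar with the class file format, the logic is easy to follow.

Back in ClassReader: once the class bytecode has been read into the ProgramClass object, the next step is to add it to the ClassPool: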
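The first few reads of visitProgramClass are easy to reproduce with plain JDK classes. The following is a minimal sketch, not ProGuard code: ClassHeaderSketch and readHeader are hypothetical names, and the ten header bytes are hand-written rather than taken from a real class file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ClassHeaderSketch {
    static final int MAGIC = 0xCAFEBABE;

    // Reads the fixed-size start of a class file and returns
    // { minorVersion, majorVersion, constantPoolCount }.
    static int[] readHeader(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            // u4 magic
            int magic = in.readInt();
            if (magic != MAGIC) {
                throw new IllegalArgumentException(
                    "Not a class file, magic = 0x" + Integer.toHexString(magic));
            }
            // u2 minor_version comes before u2 major_version.
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            // u2 constant_pool_count; valid entry indices run from 1 to count - 1.
            int constantPoolCount = in.readUnsignedShort();
            return new int[] { minor, major, constantPoolCount };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
            0x00, 0x00,                                         // minor version 0
            0x00, 0x34,                                         // major version 52 (Java 8)
            0x00, 0x10                                          // constant_pool_count 16
        };
        int[] h = readHeader(header);
        System.out.println("minor=" + h[0] + " major=" + h[1] + " constantPoolCount=" + h[2]);
    }
}
```

DataInputStream reads big-endian values, which matches the byte order of the class file format, so no manual byte swapping is needed.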

public void read(DataEntry dataEntry) throws IOException
{
    // A lot of code omitted here...
    clazz = new ProgramClass();
    clazz.accept(new ProgramClassReader(dataInputStream));
    // The classVisitor here ends in the ClassPoolFiller (wrapped by ClassPresenceFilter).
    clazz.accept(classVisitor);
}

classVisitor is the ClassPresenceFilter that wraps a ClassPoolFiller; both are created in InputReader, as shown below:

new ClassReader(false,
                configuration.skipNonPublicLibraryClasses,
                configuration.skipNonPublicLibraryClassMembers,
                warningPrinter,
                new ClassPresenceFilter(programClassPool, duplicateClassPrinter,
                new ClassPoolFiller(programClassPool)))

public class ClassPoolFiller extends SimplifiedVisitor implements ClassVisitor
{
    private final ClassPool classPool;


    /**
     * Creates a new ClassPoolFiller.
     */
    public ClassPoolFiller(ClassPool classPool)
    {
        this.classPool = classPool;
    }


    // Implementations for ClassVisitor.

    public void visitAnyClass(Clazz clazz)
    {
        classPool.addClass(clazz);
    }
}

The code is simple: ClassPoolFiller holds programClassPool, and once a ProgramClass has been initialized it is added to programClassPool. The whole flow can be summarized as:


  • DirectoryPump walks the directory tree
  • JarReader unpacks the jar
  • ClassReader does the I/O and reads the class byte stream
  • ProgramClassReader turns the byte stream into a ProgramClass object
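The last hand-off, from the parsed class into the pool, goes through the ClassPresenceFilter and ClassPoolFiller pair quoted earlier. The sketch below models that pair under one assumption inferred from the constructor's argument order: a class already present in the pool is routed to the first visitor (the duplicate printer), and a missing one to the second (the filler). All names in the sketch are hypothetical, and it works on class names instead of Clazz objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PoolFillingSketch {
    // Hypothetical, simplified counterpart of ClassVisitor that only sees class names.
    interface NameVisitor {
        void visit(String className);
    }

    // Routes each class to one of two visitors, depending on whether the pool already has it.
    static final class PresenceFilter implements NameVisitor {
        private final Map<String, String> pool;
        private final NameVisitor presentVisitor;
        private final NameVisitor missingVisitor;

        PresenceFilter(Map<String, String> pool, NameVisitor presentVisitor, NameVisitor missingVisitor) {
            this.pool = pool;
            this.presentVisitor = presentVisitor;
            this.missingVisitor = missingVisitor;
        }

        @Override
        public void visit(String className) {
            NameVisitor target = pool.containsKey(className) ? presentVisitor : missingVisitor;
            target.visit(className);
        }
    }

    // Feeds the names through the chain and returns the ones reported as duplicates.
    static List<String> fill(Map<String, String> pool, String... classNames) {
        List<String> duplicates = new ArrayList<>();
        NameVisitor chain = new PresenceFilter(
            pool,
            duplicates::add,               // stands in for the duplicate class printer
            name -> pool.put(name, name)); // stands in for ClassPoolFiller
        for (String name : classNames) {
            chain.visit(name);
        }
        return duplicates;
    }

    public static void main(String[] args) {
        Map<String, String> pool = new TreeMap<>();
        List<String> duplicates = fill(pool, "a/A", "b/B", "a/A");
        System.out.println("pool=" + pool.keySet() + " duplicates=" + duplicates);
    }
}
```

Under that assumption, the first definition of a class wins and later copies are only reported, never added.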

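Summary

This part walked through the source to show how ProGuard reads jars into the ClassPool. Once the class bytecode has been read and is being managed in the pools, ProGuard can start shrinking and obfuscating it. The next part continues with how ProGuard shrinks code.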
