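The previous part covered how ProGuard parses its arguments and how it obtains and stores its configuration. This part continues with how ProGuard reads class files, parses the class bytecode, and stores the result in memory.
Data structures
Before diving in, let's introduce a few data structures.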
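- Clazz: an interface, implemented by ProgramClass and LibraryClass
- ProgramClass: implements the Clazz interface; ProGuard uses it to describe classes of the application itself
- LibraryClass: implements the Clazz interface; ProGuard uses it to describe classes from third-party libraries
Their relationship is simple: ProgramClass and LibraryClass both implement Clazz.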
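To make the relationship concrete, here is a minimal sketch of the three types. The names SimpleClazz, SimpleProgramClass and SimpleLibraryClass are stand-ins invented for this example, and the constructors that take a name are an assumption as well; the real ProGuard types carry far more state.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for ProGuard's Clazz interface (hypothetical name).
interface SimpleClazz {
    String getName();
}

// Stand-in for ProgramClass: a class that belongs to the application itself.
final class SimpleProgramClass implements SimpleClazz {
    private final String name;
    SimpleProgramClass(String name) { this.name = name; }
    @Override public String getName() { return name; }
}

// Stand-in for LibraryClass: a class that comes from a third-party library.
final class SimpleLibraryClass implements SimpleClazz {
    private final String name;
    SimpleLibraryClass(String name) { this.name = name; }
    @Override public String getName() { return name; }
}

final class ClazzHierarchySketch {
    // Works on the interface, so callers need not know the concrete kind.
    static String describe(SimpleClazz clazz) {
        String kind = (clazz instanceof SimpleProgramClass) ? "program" : "library";
        return kind + " class " + clazz.getName();
    }

    public static void main(String[] args) {
        List<SimpleClazz> classes = Arrays.asList(
            new SimpleProgramClass("com/example/App"),
            new SimpleLibraryClass("java/lang/Object"));
        for (SimpleClazz clazz : classes) {
            System.out.println(describe(clazz));
        }
    }
}
```

Code that only needs the class name can work against the interface, which is why both pools described next can share a single ClassPool type.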
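ProGuard also keeps two class pools, programClassPool and libraryClassPool, both of type ClassPool. programClassPool holds the Clazz instances of all application classes, and libraryClassPool holds the Clazz instances of all third-party library classes. ClassPool itself is simple: it is essentially a TreeMap that stores the Clazz instances. The code looks like this: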
/**
* This is a set of representations of classes. They can be enumerated or
* retrieved by name. They can also be accessed by means of class visitors.
*
* @author Eric Lafortune
*/
public class ClassPool
{
// We're using a sorted tree map instead of a hash map to store the classes,
// in order to make the processing more deterministic.
private final Map classes = new TreeMap();
/**
* Adds the given Clazz to the class pool.
*/
public void addClass(Clazz clazz)
{
classes.put(clazz.getName(), clazz);
}
/**
* Removes the given Clazz from the class pool.
*/
public void removeClass(Clazz clazz)
{
removeClass(clazz.getName());
}
/**
* Returns a Clazz from the class pool based on its name. Returns null if
* the class with the given name is not in the class pool.
*/
public Clazz getClass(String className)
{
return (Clazz)classes.get(className);
}
// A batch of code is omitted here...
}
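The comment in ClassPool says a sorted tree map is used to make processing more deterministic. The sketch below illustrates that effect with a hypothetical SimpleClassPool that stores plain strings instead of Clazz instances: whatever order the classes are added in, they come back sorted by name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, stripped-down pool; it is not ProGuard's ClassPool API.
final class SimpleClassPool {
    // TreeMap keeps keys sorted, so iteration order is independent of insertion order.
    private final Map<String, String> classes = new TreeMap<>();

    void addClass(String internalName, String description) {
        classes.put(internalName, description);
    }

    // Returns null if no class with the given name is in the pool.
    String get(String internalName) {
        return classes.get(internalName);
    }

    // Returns the class names in the pool's iteration order.
    List<String> classNames() {
        return new ArrayList<>(classes.keySet());
    }

    public static void main(String[] args) {
        SimpleClassPool pool = new SimpleClassPool();
        pool.addClass("com/example/Zebra", "added first");
        pool.addClass("com/example/Apple", "added second");
        // The names come back sorted, not in insertion order.
        System.out.println(pool.classNames());
    }
}
```

With a hash map the iteration order could differ between runs or JVM versions, which would make later processing steps harder to reproduce.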
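Reading Java classes
The previous part walked through argument parsing. Once parsing is done, a ProGuard object is created and its execute method is called to run ProGuard's core process. The execute method looks roughly like this: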
/**
* Performs all subsequent ProGuard operations.
*/
public void execute() throws IOException
{
System.out.println(VERSION);
GPL.check();
if (configuration.printConfiguration != null)
{
printConfiguration();
}
new ConfigurationChecker(configuration).check();
if (configuration.programJars != null &&
configuration.programJars.hasOutput() &&
new UpToDateChecker(configuration).check())
{
return;
}
readInput();
if (configuration.shrink ||
configuration.optimize ||
configuration.obfuscate ||
configuration.preverify)
{
clearPreverification();
}
if (configuration.printSeeds != null ||
configuration.shrink ||
configuration.optimize ||
configuration.obfuscate ||
configuration.preverify)
{
initialize();
}
if (configuration.targetClassVersion != 0)
{
target();
}
if (configuration.printSeeds != null)
{
printSeeds();
}
if (configuration.shrink)
{
shrink();
}
if (configuration.preverify)
{
inlineSubroutines();
}
if (configuration.optimize)
{
for (int optimizationPass = 0;
optimizationPass < configuration.optimizationPasses;
optimizationPass++)
{
if (!optimize())
{
// Stop optimizing if the code doesn't improve any further.
break;
}
// Shrink again, if we may.
if (configuration.shrink)
{
// Don't print any usage this time around.
configuration.printUsage = null;
configuration.whyAreYouKeeping = null;
shrink();
}
}
} else if (configuration.optimizeNoSideEffects) {
new Optimizer(configuration).executeNoSideEffects(programClassPool, libraryClassPool);
}
if (configuration.optimize)
{
linearizeLineNumbers();
}
if (configuration.obfuscate)
{
obfuscate();
}
if (configuration.optimize)
{
trimLineNumbers();
}
if (configuration.preverify)
{
preverify();
}
if (configuration.shrink ||
configuration.optimize ||
configuration.obfuscate ||
configuration.preverify)
{
sortClassElements();
}
if (configuration.programJars.hasOutput())
{
writeOutput();
}
if (configuration.dump != null)
{
dump();
}
}
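As you can see, shrinking, optimization, obfuscation and the other features are all driven from the execute method. Next, let's look at how classes are read and loaded.
Configuration holds the programJars and libraryJars parsed earlier, which correspond to the application jars and the third-party library jars. Based on these paths, readInput reads and unpacks the jar files. Class bytecode from the application ends up in ProgramClass objects, while class files from third-party libraries end up in LibraryClass objects.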
private void readInput() throws IOException
{
if (configuration.verbose)
{
System.out.println("Reading input...");
}
// Fill the program class pool and the library class pool.
new InputReader(configuration).execute(programClassPool, libraryClassPool);
}
readInput is very simple: it instantiates an InputReader and reads the jar files through its execute method:
/**
* Fills the given program class pool and library class pool by reading
* class files, based on the current configuration.
*/
public void execute(ClassPool programClassPool,
ClassPool libraryClassPool) throws IOException
{
// Unrelated code omitted...
readInput("Reading program ",
configuration.programJars,
new ClassFilter(
new ClassReader(false,
configuration.skipNonPublicLibraryClasses,
configuration.skipNonPublicLibraryClassMembers,
warningPrinter,
new ClassPresenceFilter(programClassPool, duplicateClassPrinter,
new ClassPoolFiller(programClassPool)))));
// Unrelated code omitted...
}
/**
* Reads all input entries from the given section of the given class path.
*/
public void readInput(String messagePrefix,
ClassPath classPath,
int fromIndex,
int toIndex,
DataEntryReader reader) throws IOException
{
for (int index = fromIndex; index < toIndex; index++)
{
ClassPathEntry entry = classPath.get(index);
if (!entry.isOutput())
{
readInput(messagePrefix, entry, reader);
}
}
}
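ClassPath internally maintains a list of ClassPathEntry objects, each of which essentially describes one jar file. readInput iterates over these entries, skips the ones marked as output, and processes the rest one by one.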
/**
* Reads the given input class path entry.
*/
private void readInput(String messagePrefix,
ClassPathEntry classPathEntry,
DataEntryReader dataEntryReader) throws IOException
{
// Some code omitted...
// Create a reader that can unwrap jars, wars, ears, and zips.
DataEntryReader reader =
DataEntryReaderFactory.createDataEntryReader(messagePrefix,
classPathEntry,
dataEntryReader);
// Create the data entry pump.
DirectoryPump directoryPump =
new DirectoryPump(classPathEntry.getFile());
// Pump the data entries into the reader.
directoryPump.pumpDataEntries(reader);
}
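First a DataEntryReader is created, but this DataEntryReader is not the reader that actually reads the class bytecode. It is only responsible for applying the appropriate unpacking step for each file type, such as .apk, .jar, .aar and so on. For Android builds, the DataEntryReader returned here is a JarReader.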
Next, a DirectoryPump is created and its pumpDataEntries method is called. Internally it looks like this:
public void pumpDataEntries(DataEntryReader dataEntryReader)
throws IOException
{
if (!directory.exists())
{
throw new IOException("No such file or directory");
}
readFiles(directory, dataEntryReader);
}
/**
* Reads the given subdirectory recursively, applying the given DataEntryReader
* to all files that are encountered.
*/
private void readFiles(File file, DataEntryReader dataEntryReader)
throws IOException
{
// Pass the file data entry to the reader.
dataEntryReader.read(new FileDataEntry(directory, file));
if (file.isDirectory())
{
// Recurse into the subdirectory.
File[] listedFiles = file.listFiles();
for (int index = 0; index < listedFiles.length; index++)
{
File listedFile = listedFiles[index];
try
{
readFiles(listedFile, dataEntryReader);
}
catch (IOException e)
{
throw (IOException)new IOException("Can't read ["+listedFile.getName()+"] ("+e.getMessage()+")").initCause(e);
}
}
}
}
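As you can see, pumpDataEntries recursively walks the files and hands each one to the DataEntryReader created earlier. Since JarReader implements this interface, let's go straight to JarReader's read method: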
public void read(DataEntry dataEntry) throws IOException
{
ZipInputStream zipInputStream = new ZipInputStream(dataEntry.getInputStream());
// Get all entries from the input jar.
while (true)
{
// Can we get another entry?
ZipEntry zipEntry = zipInputStream.getNextEntry();
if (zipEntry == null)
{
break;
}
// Delegate the actual reading to the data entry reader.
dataEntryReader.read(new ZipDataEntry(dataEntry,
zipEntry,
zipInputStream));
}
// Some code omitted...
}
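As its name suggests, JarReader is only responsible for unpacking the jar file. The unpacked class files are handed to dataEntryReader for processing. ProGuard's source code makes heavy use of this kind of delegation, passing work along layer by layer. The dataEntryReader here ultimately leads to the ClassReader created in InputReader's execute method (wrapped in a ClassFilter), whose read method is shown next.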
public void read(DataEntry dataEntry) throws IOException
{
// Get the input stream.
InputStream inputStream = dataEntry.getInputStream();
// Wrap it into a data input stream.
DataInputStream dataInputStream = new DataInputStream(inputStream);
// Create a Clazz representation.
Clazz clazz;
if (isLibrary)
{
clazz = new LibraryClass();
clazz.accept(new LibraryClassReader(dataInputStream, skipNonPublicLibraryClasses, skipNonPublicLibraryClassMembers));
}
else
{
clazz = new ProgramClass();
clazz.accept(new ProgramClassReader(dataInputStream));
}
// Some code omitted...
}
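The delegation chain up to this point, an unpacking reader that hands every entry to an inner reader, can be sketched with the JDK's zip classes alone. Everything named here (EntryReader, UnzippingReader, ClassNameFilter) is hypothetical and much simpler than ProGuard's DataEntryReader family, and the assumption that the filter step selects entries ending in ".class" is ours.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical, simplified counterpart of a data entry reader.
interface EntryReader {
    void read(String name, InputStream content) throws IOException;
}

// Unpacks a zip/jar stream and hands each entry to the wrapped reader.
final class UnzippingReader implements EntryReader {
    private final EntryReader delegate;
    UnzippingReader(EntryReader delegate) { this.delegate = delegate; }

    @Override
    public void read(String name, InputStream content) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(content)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // The zip stream is now positioned at this entry's data.
                delegate.read(entry.getName(), zip);
            }
        }
    }
}

// Forwards only entries whose name ends in ".class" (assumed filter rule).
final class ClassNameFilter implements EntryReader {
    private final EntryReader delegate;
    ClassNameFilter(EntryReader delegate) { this.delegate = delegate; }

    @Override
    public void read(String name, InputStream content) throws IOException {
        if (name.endsWith(".class")) {
            delegate.read(name, content);
        }
    }
}

final class DelegationSketch {
    // Builds a tiny in-memory jar so the example needs no files on disk.
    static byte[] buildJar() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"a/Foo.class", "README.txt", "a/Bar.class"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(name.getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Runs the chain: unzip, then filter, then collect the entry names.
    static List<String> readClassNames(byte[] jar) {
        List<String> seen = new ArrayList<>();
        EntryReader collector = (name, content) -> seen.add(name);
        EntryReader chain = new UnzippingReader(new ClassNameFilter(collector));
        try {
            chain.read("in-memory.jar", new ByteArrayInputStream(jar));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(readClassNames(buildJar()));
    }
}
```

Each layer knows only the reader it wraps, so a new step such as another archive format or an extra filter can be added by wrapping one more reader around the chain.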
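In ClassReader's read method we finally see the class file itself being read. As mentioned earlier, application class files are read into ProgramClass and third-party library class files into LibraryClass.
Interestingly, ClassReader itself does not parse the class bytecode. The actual parsing is done by ProgramClassReader and LibraryClassReader. We only analyze ProgramClassReader here.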
ProgramClassReader
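ProgramClassReader implements the ClassVisitor interface. It is worth noting that ProGuard relies heavily on this xxxVisitor design when reading and writing class bytecode: classes are handled through the ClassVisitor interface, class members through MemberVisitor, and the constant pool has its own set of visitors as well. ProgramClassReader implements all of these interfaces. Let's start with visitProgramClass, the entry point for reading a class: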
public void visitProgramClass(ProgramClass programClass)
{
// Magic number: four bytes, 0xCAFEBABE.
// Read and check the magic number.
programClass.u4magic = dataInput.readInt();
ClassUtil.checkMagicNumber(programClass.u4magic);
// The magic number is followed by the version: minor and major version, two bytes each.
// Read and check the version numbers.
int u2minorVersion = dataInput.readUnsignedShort();
int u2majorVersion = dataInput.readUnsignedShort();
programClass.u4version = ClassUtil.internalClassVersion(u2majorVersion,
u2minorVersion);
ClassUtil.checkVersionNumbers(programClass.u4version);
// After the version comes the constant pool count, stored in two bytes.
// Read the constant pool. Note that the first entry is not used.
programClass.u2constantPoolCount = dataInput.readUnsignedShort();
// Create the constant pool, then iterate to fill in each entry.
programClass.constantPool = new Constant[programClass.u2constantPoolCount];
for (int index = 1; index < programClass.u2constantPoolCount; index++)
{
Constant constant = createConstant();
constant.accept(programClass, this);
programClass.constantPool[index] = constant;
// Long constants and double constants take up two entries in the
// constant pool.
int tag = constant.getTag();
if (tag == ClassConstants.CONSTANT_Long ||
tag == ClassConstants.CONSTANT_Double)
{
programClass.constantPool[++index] = null;
}
}
// After the constant pool come the access flags, two bytes.
// Read the general class information.
programClass.u2accessFlags = dataInput.readUnsignedShort();
// After the access flags comes the this-class index, two bytes.
programClass.u2thisClass = dataInput.readUnsignedShort();
// After the this-class index comes the superclass index, two bytes.
programClass.u2superClass = dataInput.readUnsignedShort();
// After the superclass index comes the interface count, two bytes.
// Read the interfaces.
programClass.u2interfacesCount = dataInput.readUnsignedShort();
// Create the interface index array, then fill in each entry.
programClass.u2interfaces = new int[programClass.u2interfacesCount];
for (int index = 0; index < programClass.u2interfacesCount; index++)
{
programClass.u2interfaces[index] = dataInput.readUnsignedShort();
}
// After the interface indices comes the field count, two bytes.
// Read the fields.
programClass.u2fieldsCount = dataInput.readUnsignedShort();
// Create the field array, then fill in each entry.
programClass.fields = new ProgramField[programClass.u2fieldsCount];
for (int index = 0; index < programClass.u2fieldsCount; index++)
{
ProgramField programField = new ProgramField();
this.visitProgramField(programClass, programField);
programClass.fields[index] = programField;
}
// After the fields comes the method count, two bytes.
// Read the methods.
programClass.u2methodsCount = dataInput.readUnsignedShort();
// Create the method array, then fill in each entry.
programClass.methods = new ProgramMethod[programClass.u2methodsCount];
for (int index = 0; index < programClass.u2methodsCount; index++)
{
ProgramMethod programMethod = new ProgramMethod();
this.visitProgramMethod(programClass, programMethod);
programClass.methods[index] = programMethod;
}
// After the methods comes the attribute count, two bytes.
// Read the class attributes.
programClass.u2attributesCount = dataInput.readUnsignedShort();
// Create the attribute array, then fill in each entry.
programClass.attributes = new Attribute[programClass.u2attributesCount];
for (int index = 0; index < programClass.u2attributesCount; index++)
{
Attribute attribute = createAttribute(programClass);
attribute.accept(programClass, this);
programClass.attributes[index] = attribute;
}
}
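I have annotated every step of visitProgramClass with comments. In short, it reads the class file field by field, in the order defined by the class file format, into a ProgramClass object. If you are familiar with the class file format, this logic is not hard to follow.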
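For readers less familiar with the format, the first few steps can be tried out in isolation. The sketch below is our own simplification, not ProGuard code: it parses only the magic number, the version and a constant pool limited to Utf8 and Long entries, and it runs against a hand-built byte array rather than a real class file. It does show the same detail as the loop above: a Long entry occupies two slots, so the index is advanced once more and the second slot stays null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ClassHeaderSketch {
    static final int MAGIC = 0xCAFEBABE;
    static final int TAG_UTF8 = 1; // CONSTANT_Utf8
    static final int TAG_LONG = 5; // CONSTANT_Long

    // Parsed result: version numbers plus a constant pool of plain Java objects.
    static final class Header {
        int minorVersion;
        int majorVersion;
        Object[] constantPool;
    }

    static Header parse(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != MAGIC) {
                throw new IllegalArgumentException("Not a class file");
            }
            Header header = new Header();
            header.minorVersion = in.readUnsignedShort();
            header.majorVersion = in.readUnsignedShort();
            int count = in.readUnsignedShort();
            header.constantPool = new Object[count]; // slot 0 is never used
            for (int index = 1; index < count; index++) {
                int tag = in.readUnsignedByte();
                if (tag == TAG_UTF8) {
                    // u2 length followed by modified UTF-8 bytes, as readUTF expects.
                    header.constantPool[index] = in.readUTF();
                } else if (tag == TAG_LONG) {
                    header.constantPool[index] = in.readLong();
                    index++; // a long takes two slots; the second one stays null
                } else {
                    throw new IllegalArgumentException("Tag not handled in this sketch: " + tag);
                }
            }
            return header;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a truncated, hand-made "class file" that stops after the constant pool.
    static byte[] sample() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(MAGIC);
            out.writeShort(0);       // minor version
            out.writeShort(52);      // major version 52 corresponds to Java 8
            out.writeShort(4);       // constant pool count: usable slots are 1..3
            out.writeByte(TAG_LONG);
            out.writeLong(42L);      // occupies slots 1 and 2
            out.writeByte(TAG_UTF8);
            out.writeUTF("Foo");     // slot 3
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Header header = parse(sample());
        System.out.println("major=" + header.majorVersion + ", slot3=" + header.constantPool[3]);
    }
}
```

A real reader would of course handle every constant tag and continue with the access flags, interfaces, fields, methods and attributes, as visitProgramClass does.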
Back in ClassReader: once the class bytecode has been read into a ProgramClass object, the next step is to add it to the ClassPool:
public void read(DataEntry dataEntry) throws IOException
{
// A lot of code omitted here...
clazz = new ProgramClass();
clazz.accept(new ProgramClassReader(dataInputStream));
// The classVisitor here ends up at a ClassPoolFiller.
clazz.accept(classVisitor);
}
classVisitor leads to the ClassPoolFiller object created in InputReader, where it is wrapped in a ClassPresenceFilter. The code is as follows:
new ClassReader(false,
configuration.skipNonPublicLibraryClasses,
configuration.skipNonPublicLibraryClassMembers,
warningPrinter,
new ClassPresenceFilter(programClassPool, duplicateClassPrinter,
new ClassPoolFiller(programClassPool)))
public class ClassPoolFiller extends SimplifiedVisitor implements ClassVisitor
{
private final ClassPool classPool;
/**
* Creates a new ClassPoolFiller.
*/
public ClassPoolFiller(ClassPool classPool)
{
this.classPool = classPool;
}
// Implementations for ClassVisitor.
public void visitAnyClass(Clazz clazz)
{
classPool.addClass(clazz);
}
}
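The code is fairly simple: ClassPoolFiller holds programClassPool, and once a ProgramClass has been fully read it is added to programClassPool. The whole flow can be summarized as:
- DirectoryPump walks the directory tree
- JarReader unpacks the jar
- ClassReader performs the I/O and reads the class byte stream
- ProgramClassReader turns the byte stream into a ProgramClass object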
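The last hop, from a freshly read class into the pool, goes through visitors. Below is a minimal sketch of that step with hypothetical types (SketchClass, SketchVisitor, PresenceFilter, PoolFiller). It assumes, as the argument order in the InputReader snippet suggests, that the presence filter sends classes already in the pool to its first visitor and missing ones to its second.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-method visitor; ProGuard's ClassVisitor has more methods.
interface SketchVisitor {
    void visit(SketchClass clazz);
}

// Hypothetical class representation that only knows its name.
final class SketchClass {
    final String name;
    SketchClass(String name) { this.name = name; }

    // The class hands itself to the visitor.
    void accept(SketchVisitor visitor) { visitor.visit(this); }
}

// Adds every visited class to the pool (the role played by ClassPoolFiller).
final class PoolFiller implements SketchVisitor {
    private final Map<String, SketchClass> pool;
    PoolFiller(Map<String, SketchClass> pool) { this.pool = pool; }

    @Override
    public void visit(SketchClass clazz) { pool.put(clazz.name, clazz); }
}

// Routes a class to one of two visitors depending on whether the pool has it (assumed behavior).
final class PresenceFilter implements SketchVisitor {
    private final Map<String, SketchClass> pool;
    private final SketchVisitor present;
    private final SketchVisitor missing;

    PresenceFilter(Map<String, SketchClass> pool, SketchVisitor present, SketchVisitor missing) {
        this.pool = pool;
        this.present = present;
        this.missing = missing;
    }

    @Override
    public void visit(SketchClass clazz) {
        SketchVisitor target = pool.containsKey(clazz.name) ? present : missing;
        target.visit(clazz);
    }
}

final class PoolFillingSketch {
    // Feeds the names through the visitor chain and returns the duplicates it reported.
    static List<String> duplicatesAfterReading(Map<String, SketchClass> pool, String... names) {
        List<String> duplicates = new ArrayList<>();
        SketchVisitor duplicateRecorder = clazz -> duplicates.add(clazz.name);
        SketchVisitor visitor = new PresenceFilter(pool, duplicateRecorder, new PoolFiller(pool));
        for (String name : names) {
            new SketchClass(name).accept(visitor);
        }
        return duplicates;
    }

    public static void main(String[] args) {
        Map<String, SketchClass> pool = new TreeMap<>();
        List<String> duplicates = duplicatesAfterReading(pool, "a/B", "a/A", "a/B");
        System.out.println("pool=" + pool.keySet() + ", duplicates=" + duplicates);
    }
}
```

Because the reading code only ever calls accept on the class, what happens afterwards (filling a pool, reporting duplicates, or both) is decided entirely by which visitors are chained together.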
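Summary
This part analyzed, from the source code's point of view, how ProGuard reads jars into the ClassPool. Once the class bytecode has been read and is managed in the pools, ProGuard can start shrinking and obfuscating it. The next part continues with how ProGuard shrinks code.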