Android Security Attack: Object Serialization OOM

Preface


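        Recently our project used ObjectInputStream/ObjectOutputStream to serialize and deserialize objects, and we ran into OutOfMemoryError (OOM) crashes. While fixing them I briefly studied how object serialization and deserialization via the Serializable interface work, and this post is a short record of what I found. Along the way I discovered a security risk in persistently storing serialized data: the data can be maliciously modified so that the app hits an OOM every time it is loaded.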

Usage Scenario


1 Data Handling Scheme


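        Persisting: while the app runs, it first serializes an object into bytes with ObjectOutputStream.writeObject. It then encrypts the serialized bytes and finally stores the encrypted data in a file under the app's data directory.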



        Reading and parsing: the data is first read from the file and decrypted with the matching algorithm. ObjectInputStream.readObject then parses the byte stream back into the object.
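The persist and read paths above can be sketched roughly as follows. The post does not say which encryption algorithm is used, so this sketch assumes AES in GCM mode through the standard javax.crypto API. The class and method names (PersistPipeline, toStoredBytes, restoreOrNull) are hypothetical. File I/O is omitted: toStoredBytes returns what would be written to the file. Key storage is out of scope, and a freshly generated key stands in for a real one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class PersistPipeline {
    private static final int IV_LEN = 12;     // 96-bit nonce, the usual choice for GCM
    private static final int TAG_BITS = 128;  // GCM authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Placeholder key for the sketch; a real app would load it from secure storage. */
    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Persist path: writeObject, then encrypt. Layout of the result: IV || ciphertext-with-tag. */
    static byte[] toStoredBytes(Serializable obj, SecretKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(obj);
            }
            byte[] iv = new byte[IV_LEN];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(bytes.toByteArray());
            return ByteBuffer.allocate(IV_LEN + ciphertext.length).put(iv).put(ciphertext).array();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Read path: decrypt, then readObject. Returns null if decryption or parsing fails. */
    static Object restoreOrNull(byte[] stored, SecretKey key) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_LEN));
            // doFinal checks the tag before returning plaintext; modified bytes throw
            // AEADBadTagException here, so they never reach readObject().
            byte[] plain = cipher.doFinal(stored, IV_LEN, stored.length - IV_LEN);
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(plain))) {
                return ois.readObject();
            }
        } catch (Exception e) {
            return null;
        }
    }
}
```

A design note on this choice: GCM authenticates the ciphertext. A corrupted or modified file therefore fails in doFinal instead of being handed to readObject. With an unauthenticated mode, damaged ciphertext can still decrypt into arbitrary bytes that reach the deserializer.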



2 Problems Encountered

        While using the scheme above, we hit the following two OOM crashes:

(1) OOM 1
java.lang.OutOfMemoryError: Failed to allocate a 942137073 byte allocation with 4194240 free bytes and 487MB until OOM
	at java.io.ObjectInputStream.readBlockDataLong(ObjectInputStream.java:569)
	at java.io.ObjectInputStream.readContent(ObjectInputStream.java:699)
	at java.io.ObjectInputStream.discardData(ObjectInputStream.java:636)
	at java.io.ObjectInputStream.readNewClassDesc(ObjectInputStream.java:1662)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:657)
	at java.io.ObjectInputStream.readNewObject(ObjectInputStream.java:1782)
	at java.io.ObjectInputStream.readNonPrimitiveContent(ObjectInputStream.java:761)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:1983)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:1940)
(2) OOM 2
java.lang.OutOfMemoryError: Failed to allocate a 789137073 byte allocation with 2317152 free bytes and 456MB until OOM
	at java.io.DataInputStream.decodeUTF
	at java.io.DataInputStream.decodeUTF
	at java.io.ObjectInputStream.readContent(ObjectInputStream.java:699)
	at java.io.ObjectInputStream.discardData(ObjectInputStream.java:636)
	at java.io.ObjectInputStream.readNewClassDesc(ObjectInputStream.java:1662)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:657)
	at java.io.ObjectInputStream.readNewObject(ObjectInputStream.java:1782)
	at java.io.ObjectInputStream.readNonPrimitiveContent(ObjectInputStream.java:761)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:1983)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:1940)
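        Roughly, the stack traces say the following. While deserializing an object, ObjectInputStream.readObject needed a single allocation of about 942 MB (OOM 1) or 789 MB (OOM 2), and that allocation caused the OOM. The maximum Java heap an app can use is governed by system properties. dalvik.vm.heapsize is the upper bound, although apps that do not declare android:largeHeap are normally capped at the smaller dalvik.vm.heapgrowthlimit. The values differ between vendors and devices. My test device is configured as follows: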

[Figure 1: heap-related system properties of the test device]
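An app can also query its own heap ceiling at runtime with Runtime.getRuntime().maxMemory(), instead of reading system properties. On Android this generally reflects the per-app heap limit. The small helper below uses illustrative names that are not part of the original post. It compares that ceiling with the 942137073-byte request from OOM 1. That request is about 898 MiB, so it can never fit in a 256 MB heap, no matter how much memory happens to be free.

```java
public class HeapLimitDemo {
    /** Converts bytes to whole mebibytes, rounded down. */
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    /** True if one allocation of requestedBytes is larger than the entire heap limit. */
    static boolean exceedsHeap(long requestedBytes, long maxHeapBytes) {
        return requestedBytes > maxHeapBytes;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory(); // heap ceiling this VM will try to respect
        long requested = 942_137_073L;               // size from the OOM 1 message
        System.out.println("heap limit: " + toMiB(max) + " MiB");
        System.out.println("requested:  " + toMiB(requested) + " MiB"); // 898 MiB
        System.out.println("can never fit: " + exceedsHeap(requested, max));
    }
}
```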

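        On this device heapsize is set to 256M, so each app's VM can allocate at most 256 MB. Whenever the VM needs more than that, an OutOfMemoryError is thrown. A side note: many developers use Exception to catch "all" exceptions, but that does not catch OutOfMemoryError. Look at the inheritance hierarchy: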

[Figure 2: inheritance hierarchy of OutOfMemoryError]
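        As the hierarchy shows, OutOfMemoryError extends Error (through VirtualMachineError), which is a separate branch from Exception. To catch everything, Errors included, you must catch Throwable.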
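The difference is easy to check. The sketch below throws an OutOfMemoryError directly, as a stand-in for a failed new byte[hugeSize]; the class and method names are made up for illustration. It shows that catch (Exception) lets the error through while catch (Throwable) stops it.

```java
public class CatchDemo {
    /** Stand-in for a failed huge allocation such as new byte[942137073]. */
    static void failLikeOom() {
        throw new OutOfMemoryError("simulated allocation failure");
    }

    /** Returns true only if catch (Exception) intercepted the error. */
    static boolean caughtByException() {
        try {
            try {
                failLikeOom();
            } catch (Exception e) {
                return true;  // not reached: OutOfMemoryError is not an Exception
            }
        } catch (OutOfMemoryError e) {
            return false;     // the error went straight past catch (Exception)
        }
        return false;
    }

    /** Returns true if catch (Throwable) intercepted the error. */
    static boolean caughtByThrowable() {
        try {
            failLikeOom();
        } catch (Throwable t) {
            return t instanceof OutOfMemoryError;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("catch (Exception) caught it: " + caughtByException());  // false
        System.out.println("catch (Throwable) caught it: " + caughtByThrowable()); // true
    }
}
```

Catching an OOM this way is mainly reasonable when it comes from one oversized allocation that simply fails, as in this article. After a genuine low-memory condition, the process state is much less predictable.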

3 Analysis


3.1 Stack Trace Analysis


        Both OOMs actually have the same cause, so below I focus on OOM 1, the one raised in ObjectInputStream.readBlockDataLong. Here is that function:
    /**
     * Reads and returns an array of raw bytes with primitive data. The array
     * will have up to 255 bytes. The primitive data will be in the format
     * described by {@code DataOutputStream}.
     *
     * @return The primitive data read, as raw bytes
     *
     * @throws IOException
     *             If an IO exception happened when reading the primitive data.
     */
    private byte[] readBlockData() throws IOException {
        byte[] result = new byte[input.readByte() & 0xff];
        input.readFully(result);
        return result;
    }

    /**
     * Reads and returns an array of raw bytes with primitive data. The array
     * will have more than 255 bytes. The primitive data will be in the format
     * described by {@code DataOutputStream}.
     *
     * @return The primitive data read, as raw bytes
     *
     * @throws IOException
     *             If an IO exception happened when reading the primitive data.
     */
    private byte[] readBlockDataLong() throws IOException {
        byte[] result = new byte[input.readInt()];
        input.readFully(result);
        return result;
    }
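        Two functions are shown above, readBlockData and readBlockDataLong. Judging by their names they do similar work, and readBlockDataLong looks like the variant for larger amounts of data. The comments confirm it. readBlockData reads a data block of at most 255 bytes, whose length is a single unsigned byte. readBlockDataLong reads a block of more than 255 bytes, whose length is a 4-byte int read straight from the stream and used as the array size without any check.
Moving further up the stack, we reach ObjectInputStream.readContent: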
    /**
     * Reads the content of the receiver based on the previously read token
     * {@code tc}.
     *
     * @param tc
     *            The token code for the next item in the stream
     * @return the object read from the stream
     *
     * @throws IOException
     *             If an IO exception happened when reading the class
     *             descriptor.
     * @throws ClassNotFoundException
     *             If the class corresponding to the object being read could not
     *             be found.
     */
    private Object readContent(byte tc) throws ClassNotFoundException,
            IOException {
        switch (tc) {
            case TC_BLOCKDATA:
                return readBlockData();
            case TC_BLOCKDATALONG:
                return readBlockDataLong();

            case TC_CLASSDESC:
                return readNewClassDesc(false);

            case TC_OBJECT:
                return readNewObject(false);

            case TC_LONGSTRING:
                return readNewLongString(false);

            case TC_EXCEPTION:
                Exception exc = readException();
                throw new WriteAbortedException("Read an exception", exc);
            case TC_RESET:
                resetState();
                return null;
            default:
                throw corruptStream(tc);
        }
    }
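        This function reads the data following tc (think of it as a token) in a different format depending on the value of tc. That suggests serialization with ObjectInputStream/ObjectOutputStream follows a specific format, or standard. A quick search turned up the specification of the Serializable stream format: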

                                                                Grammar for the Stream Format
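Some of these tokens can be seen directly in a real stream. The sketch below serializes a small object with the standard ObjectOutputStream. It then checks the first bytes against the grammar: stream magic 0xACED, version 5, then TC_OBJECT and TC_CLASSDESC. The constants come from java.io.ObjectStreamConstants. The sketch targets a desktop JVM, and StreamTokensDemo and its nested Employee are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class StreamTokensDemo {
    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        String mName = "test";
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Employee());
        // stream: magic version contents; contents starts with newObject: TC_OBJECT classDesc ...
        int magic = ((data[0] & 0xff) << 8) | (data[1] & 0xff);
        int version = ((data[2] & 0xff) << 8) | (data[3] & 0xff);
        System.out.println(magic == (ObjectStreamConstants.STREAM_MAGIC & 0xffff));
        System.out.println(version == ObjectStreamConstants.STREAM_VERSION);
        System.out.println(data[4] == ObjectStreamConstants.TC_OBJECT);
        System.out.println(data[5] == ObjectStreamConstants.TC_CLASSDESC);
    }
}
```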
        
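        The specification defines the order in which each part is written during serialization, together with the corresponding tc values. This post focuses on the problem rather than on the format itself, so interested readers can study the specification on their own. With it, the cause of the OOM above can be roughly located. The data being deserialized contains a TC_BLOCKDATALONG token, so deserialization ends up in readBlockDataLong. Going one level further up the stack, let's look at ObjectInputStream.readNewClassDesc and ObjectInputStream.discardData: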
    /**
     * Reads a new class descriptor from the receiver. It is assumed the class
     * descriptor has not been read yet (not a cyclic reference). Return the
     * class descriptor read.
     *
     * @param unshared
     *            read the object unshared
     * @return The {@code ObjectStreamClass} read from the stream.
     *
     * @throws IOException
     *             If an IO exception happened when reading the class
     *             descriptor.
     * @throws ClassNotFoundException
     *             If a class for one of the objects could not be found
     */
    private ObjectStreamClass readNewClassDesc(boolean unshared)
            throws ClassNotFoundException, IOException {
        // ... (descriptor handle bookkeeping elided in this excerpt) ...
        ObjectStreamClass newClassDesc = readClassDescriptor();
        registerObjectRead(newClassDesc, descriptorHandle, unshared);
        descriptorHandle = oldHandle;
        primitiveData = emptyStream;

        //load class...

        // Consume unread class annotation data and TC_ENDBLOCKDATA
        discardData();
        checkedSetSuperClassDesc(newClassDesc, readClassDesc());
        return newClassDesc;
    }

    /**
     * Reads and discards block data and objects until TC_ENDBLOCKDATA is found.
     *
     * @throws IOException
     *             If an IO exception happened when reading the optional class
     *             annotation.
     * @throws ClassNotFoundException
     *             If the class corresponding to the class descriptor could not
     *             be found.
     */
    private void discardData() throws ClassNotFoundException, IOException {
        primitiveData = emptyStream;
        boolean resolve = mustResolve;
        mustResolve = false;
        do {
            byte tc = nextTC();
            if (tc == TC_ENDBLOCKDATA) {
                mustResolve = resolve;
                return; // End of annotation
            }
            readContent(tc);
        } while (true);
    }
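        From the comment on ObjectInputStream.readNewClassDesc and the related code, its main job is to read a class descriptor from the serialized data and load the corresponding class through a class loader. It then calls discardData. As the comment at that call says, discardData reads and discards data that is not needed, presumably class annotation data, until it reads TC_ENDBLOCKDATA. The token involved in our crash is TC_BLOCKDATALONG; here is its definition: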
    /**
     * Tag to mark a long block of data. The long following this tag
     * indicates the size of the block.
     */
    public static final byte TC_BLOCKDATALONG = (byte) 0x7A;
This tc means that the following data block is a large one. The int right after the tc (4 bytes) gives the length of that block.
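         To sum up, the cause is as follows. ObjectInputStream.readObject reads the class data and loads the class through the class loader. ObjectInputStream.discardData then tries to consume the unneeded annotation data that precedes TC_ENDBLOCKDATA. There it meets a TC_BLOCKDATALONG token, reads the following 4-byte length and calls readBlockDataLong. readBlockDataLong tries to create a byte array of that size and runs out of memory.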
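The failing path can be modelled in a few lines. The sketch below is not the platform code. It is a simplified, hypothetical model of the discardData()/readContent() loop that understands only TC_BLOCKDATA, TC_BLOCKDATALONG and TC_ENDBLOCKDATA, and it records each declared block length instead of allocating it. It is fed the five bytes used in 3.3 (0x7a 0x7a 0x7a 0x67 0x67). The output shows that the length 0x7a7a6767, i.e. 2054842215 bytes, is taken from the stream before anything checks whether that much data actually follows.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class DiscardDataModel {
    static final byte TC_BLOCKDATA = 0x77;
    static final byte TC_ENDBLOCKDATA = 0x78;
    static final byte TC_BLOCKDATALONG = 0x7A;

    /**
     * Walks tokens like discardData()/readContent(), but only records each declared
     * block length; the quoted readBlockData*() would call new byte[length] at that point.
     */
    static List<Long> declaredLengths(byte[] annotation) {
        List<Long> lengths = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(annotation))) {
            while (true) {
                byte tc = in.readByte();
                if (tc == TC_ENDBLOCKDATA) {
                    return lengths;                       // end of the class annotation
                }
                long length;
                if (tc == TC_BLOCKDATA) {
                    length = in.readUnsignedByte();       // 1-byte length, at most 255
                } else if (tc == TC_BLOCKDATALONG) {
                    length = in.readInt();                // 4-byte length, taken as-is
                } else {
                    throw new IllegalArgumentException("token not covered by this model: " + tc);
                }
                lengths.add(length);
                if (length < 0 || length > in.available()) {
                    return lengths;                       // declared payload is not in the stream
                }
                in.skipBytes((int) length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] annotation = {0x77, 2, 9, 9, 0x7a, 0x7a, 0x7a, 0x67, 0x67, 0x78, 0x70};
        System.out.println(declaredLengths(annotation)); // [2, 2054842215]
    }
}
```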

3.2 Analysis of the Abnormal TC_BLOCKDATALONG Data


        To see when TC_BLOCKDATALONG is written under normal circumstances, we have to look for clues on the serialization side, in ObjectOutputStream. Searching ObjectOutputStream.java for TC_BLOCKDATALONG shows that it is used only in drain:
    /**
     * Writes buffered data to the target stream. This is similar to {@code
     * flush} but the flush is not propagated to the target stream.
     *
     * @throws IOException
     *             if an error occurs while writing to the target stream.
     */
    protected void drain() throws IOException {
        if (primitiveTypes == null || primitiveTypesBuffer == null) {
            return;
        }

        // If we got here we have a Stream previously created
        int offset = 0;
        byte[] written = primitiveTypesBuffer.toByteArray();
        // Normalize the primitive data
        while (offset < written.length) {
            int toWrite = written.length - offset > 1024 ? 1024
                    : written.length - offset;
            if (toWrite < 256) {
                output.writeByte(TC_BLOCKDATA);
                output.writeByte((byte) toWrite);
            } else {
                output.writeByte(TC_BLOCKDATALONG);
                output.writeInt(toWrite);
            }

            // write primitive types we had and the marker of end-of-buffer
            output.write(written, offset, toWrite);
            offset += toWrite;
        }

        // and now we're clean to a state where we can write an object
        primitiveTypes = null;
        primitiveTypesBuffer = null;
    }
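        This function shows that the TC_BLOCKDATALONG marker and the int length field after it are written to the output stream together. It also shows that the length never exceeds 1024: when there is more data, it is split into several TC_BLOCKDATALONG blocks of at most 1024 bytes each. In other words, in a stream produced by this ObjectOutputStream, the length field after TC_BLOCKDATALONG can never exceed 1024. We can therefore conclude that the data deserialized when the OOM occurred was itself corrupt, most likely damaged while being stored or decrypted.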
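The chunking rule in drain() can be restated as a small, hypothetical helper that lists the block headers drain() would emit for n buffered bytes. It also gives a reader-side sanity check. In streams written by this implementation, a TC_BLOCKDATALONG length always lies between 256 and 1024, so a value such as 0x7a7a6767 cannot have come from it. That bound is an assumption tied to this particular writer; the stream grammar itself only says the length is an int.

```java
import java.util.ArrayList;
import java.util.List;

public class DrainChunksDemo {
    static final int TC_BLOCKDATA = 0x77;
    static final int TC_BLOCKDATALONG = 0x7A;
    static final int MAX_CHUNK = 1024; // chunk size used by the drain() shown above

    /** {tc, length} for each block header drain() would emit for n buffered bytes. */
    static List<int[]> headersFor(int n) {
        List<int[]> headers = new ArrayList<>();
        int offset = 0;
        while (offset < n) {
            int toWrite = Math.min(n - offset, MAX_CHUNK);
            int tc = toWrite < 256 ? TC_BLOCKDATA : TC_BLOCKDATALONG;
            headers.add(new int[] {tc, toWrite});
            offset += toWrite;
        }
        return headers;
    }

    /** Reader-side check: could this TC_BLOCKDATALONG length have come from drain()? */
    static boolean fromThisWriter(int longBlockLength) {
        return longBlockLength >= 256 && longBlockLength <= MAX_CHUNK;
    }

    public static void main(String[] args) {
        for (int[] h : headersFor(3000)) {
            System.out.printf("tc=0x%02x length=%d%n", h[0], h[1]); // 1024, 1024, 952
        }
        System.out.println(fromThisWriter(0x7a7a6767)); // false
    }
}
```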

3.3 Reproducing the Crash

       
        The analysis above shows that corrupt input to deserialization causes the OOM. Following that idea, let's look directly at ObjectOutputStream.writeClassDesc:
    /**
     * Write a class descriptor {@code classDesc} (an
     * {@code ObjectStreamClass}) to the stream.
     *
     * @param classDesc
     *            The class descriptor (an {@code ObjectStreamClass}) to
     *            be dumped
     * @param unshared
     *            Write the object unshared
     * @return the handle assigned to the class descriptor
     *
     * @throws IOException
     *             If an IO exception happened when writing the class
     *             descriptor.
     */
    private int writeClassDesc(ObjectStreamClass classDesc, boolean unshared) throws IOException {
        if (classDesc == null) {
            writeNull();
            return -1;
        }
        // ... (elided in this excerpt: handle bookkeeping and the declaration of classToWrite) ...

        output.writeByte(TC_CLASSDESC);

        writeClassDescriptor(classDesc);

        annotateClass(classToWrite);
        drain(); // flush primitive types in the annotation
        output.writeByte(TC_ENDBLOCKDATA);
        writeClassDesc(classDesc.getSuperclass(), unshared);

        return handle;
    }
    /**
     * Writes optional information for class {@code aClass} to the output
     * stream. This optional data can be read when deserializing the class
     * descriptor (ObjectStreamClass) for this class from an input stream. By
     * default, no extra data is saved.
     *
     * @param aClass
     *            the class to annotate.
     * @throws IOException
     *             if an error occurs while writing to the target stream.
     * @see ObjectInputStream#resolveClass(ObjectStreamClass)
     */
    protected void annotateClass(Class aClass) throws IOException {
        // By default no extra info is saved. Subclasses can override
    }
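        In this function, writeClassDescriptor writes the class descriptor to output. annotateClass is called next, and then TC_ENDBLOCKDATA is written to terminate the class annotation. On the reading side, ObjectInputStream.readNewClassDesc calls discardData after reading the class descriptor, and discardData consumes whatever tokens appear before that TC_ENDBLOCKDATA.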
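For comparison, here is the legitimate way to write class annotation data: override annotateClass and use the public write() method. The bytes go through the block-data buffer, so the stream frames them as TC_BLOCKDATA plus a length, followed by TC_ENDBLOCKDATA. A plain ObjectInputStream skips them when reading, much as discardData does in the code quoted above. This is a minimal sketch for a desktop JVM; AnnotateClassDemo and AnnotatingOutputStream are made-up names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class AnnotateClassDemo {
    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        String mName;
        Employee(String name) { mName = name; }
    }

    /** Writes three bytes of class annotation through the public write() API. */
    static class AnnotatingOutputStream extends ObjectOutputStream {
        AnnotatingOutputStream(OutputStream out) throws IOException {
            super(out);
        }

        @Override
        protected void annotateClass(Class<?> cl) throws IOException {
            // Buffered as block data, so the stream frames it as TC_BLOCKDATA + length.
            write(new byte[] {1, 2, 3});
        }
    }

    static byte[] serializeAnnotated(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new AnnotatingOutputStream(bytes)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** A plain ObjectInputStream skips the annotation block while reading the descriptor. */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = serializeAnnotated(new Employee("test"));
        // TC_BLOCKDATA (0x77), length 3, payload 1 2 3, then TC_ENDBLOCKDATA (0x78)
        System.out.println(indexOf(data, new byte[] {0x77, 3, 1, 2, 3, 0x78}) >= 0);
        System.out.println(((Employee) deserialize(data)).mName);
    }
}
```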
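        Following this idea, we can subclass ObjectOutputStream and, in annotateClass, write (TC_BLOCKDATALONG, length) directly to the underlying stream. When the declared length is large enough, the OOM reproduces every time. The code is as follows: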
import android.util.Log;

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.lang.reflect.Field;

public class AnObjectOutputStream extends ObjectOutputStream {

    private static final String TAG = "AnObjectOutputStream";

    /**
     * TC_BLOCKDATALONG (0x7a) followed by the 4-byte length 0x7a7a6767.
     * Reproduces the java.io.ObjectInputStream.readBlockDataLong stack (default).
     */
    private static final byte[] DISCARD_BYTES_LONG_DATA = new byte[] {
            0x7a, 0x7a, 0x7a, 0x67, 0x67
    };

    /**
     * TC_LONGSTRING (0x7c); the bytes after it are read as its 8-byte length.
     * Reproduces the stack java.io.DataInputStream.decodeUTF
     *                         java.io.DataInputStream.decodeUTF
     *                         java.io.ObjectInputStream.readNewLongString
     */
    private static final byte[] DISCARD_BYTES_LONG_STRING = new byte[] {
            0x7c, 0x7a, 0x7a, 0x67, 0x67
    };

    private DataOutputStream mInnerOutput;

    private boolean mStackBlockData = true;

    public AnObjectOutputStream(OutputStream out) throws IOException {
        super(out);
    }

    /**
     * Calling setStackBlockData(false) reproduces the stack
     *     java.io.DataInputStream.decodeUTF
     *     java.io.DataInputStream.decodeUTF
     *     java.io.ObjectInputStream.readNewLongString
     */
    public void setStackBlockData(boolean blockData) {
        mStackBlockData = blockData;
    }

    @Override
    protected void annotateClass(Class aClass) throws IOException {
        Log.i(TAG, "annotateClass aClass:" + aClass);

        installOutputStream();
        if (mInnerOutput == null) {
            return;
        }

        // Write straight to the underlying stream. Going through write() would put the
        // bytes in the primitive-data buffer, and drain() would wrap them in a valid header.
        if (mStackBlockData) {
            mInnerOutput.write(DISCARD_BYTES_LONG_DATA);
        } else {
            mInnerOutput.write(DISCARD_BYTES_LONG_STRING);
        }

        Log.i(TAG, "annotateClass write success");
    }

    private void installOutputStream() {
        if (mInnerOutput != null) {
            return;
        }
        try {
            // "output" is the private DataOutputStream field of the ObjectOutputStream
            // quoted above. Looking it up on ObjectOutputStream.class (rather than
            // getClass().getSuperclass()) keeps working if this class is subclassed again.
            Field field = ObjectOutputStream.class.getDeclaredField("output");
            field.setAccessible(true);
            mInnerOutput = (DataOutputStream) field.get(this);
        } catch (Exception e) {
            Log.i(TAG, "installOutputStream failed", e);
        }
    }
}
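        Because the output field of ObjectOutputStream is private, reflection is needed. Note that this relies on the internals of the implementation quoted above. Newer, OpenJDK-based Android releases use different private fields, so the trick may not work there. As expected, replacing the regular ObjectOutputStream with AnObjectOutputStream makes the OOM reproduce on every run. The complete calling code: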
import com.example.testpopupwindow.stream.AnObjectOutputStream;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;


public class SerializeThread extends Thread {

    private static final String TAG = "SerializeThread";


    private Employee mEmployee;

    public void run() {

        mEmployee = Employee.create("test");
        Object obj = null;
        try {
            byte[] serializeRes =  serialize();
            obj = unserialize(serializeRes);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private byte[] serialize() throws IOException {

        ByteArrayOutputStream arrOs = new ByteArrayOutputStream();
        ObjectOutputStream oos = new AnObjectOutputStream(arrOs);
        oos.writeObject(mEmployee);

        oos.flush();

        byte[] outArr = arrOs.toByteArray();

        oos.close();


        return outArr;
    }

    private Object unserialize(byte[] serializedata) throws IOException {
        // OutOfMemoryError is an Error, not an Exception, so it is not caught here
        // and the forged stream crashes the thread.
        try (ObjectInputStream objectInputStream =
                     new ObjectInputStream(new ByteArrayInputStream(serializedata))) {
            return objectInputStream.readObject();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * test error....
     */
    public static class Employee implements Serializable {
        String mName;

        /**
         * test error....
         */
        private Employee(String name) {
            mName = name;
        }

        public String toString() {
            return "Employee mName:" + mName;
        }

        public static Employee create(String name) {
            return  new  Employee(name);
        }
    }

}
         Simply calling new SerializeThread().start() produces the following OOM stack trace:

[Figure 3: OOM stack trace from the reproduction]

3.4 Security Issue

        
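        This OOM reveals a security problem with Serializable serialization via ObjectInputStream/ObjectOutputStream. Suppose serialized data produced by the default ObjectOutputStream is stored locally, and an attacker writes bytes like those above at the right position. The app will then crash every time it deserializes the modified data. Suppose the Employee above, once serialized, produces a file with the following hex content: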

[Figure 4: hex dump of the serialized Employee]

The injection code is as follows:
    private byte[] mDiscardBytes = new byte[] {
            0x7a, 0x7a, 0x7a, 0x67, 0x67
    };

    private byte[] modifyBlockDataSize(byte[] content) {
        for (int i=0; i
After this processing, the serialized data looks like this:

[Figure 5: hex dump after the injection]

        The circled part is the inserted data. After this insertion, deserializing the data makes the app crash with an OOM every time.
        Why check for 0x78? See ObjectOutputStream.writeClassDesc and ObjectInputStream.readNewClassDesc. After reading a class descriptor, readNewClassDesc calls discardData to read the class annotation, which is terminated by TC_ENDBLOCKDATA (0x78). discardData reads any TC_BLOCKDATALONG or TC_LONGSTRING it meets along the way. So it is enough to insert a TC_BLOCKDATALONG or TC_LONGSTRING tc plus a length right before that 0x78.
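The body of modifyBlockDataSize above is cut off. The following is therefore not the author's code but an independent, hypothetical sketch of the same idea. Scanning for the first 0x78 byte is fragile, because 0x78 is the ASCII letter 'x' and can occur inside a class name, such as one under com.example. Instead, the sketch walks the start of the stream according to the grammar (class name, serialVersionUID, flags, field descriptors). That finds the TC_ENDBLOCKDATA ending the first class annotation, and the same five bytes are inserted in front of it. The sketch assumes the stream begins with one object whose class descriptor is a non-proxy TC_CLASSDESC with an empty annotation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class InjectDemo {
    static final byte TC_OBJECT = 0x73, TC_CLASSDESC = 0x72, TC_STRING = 0x74,
            TC_REFERENCE = 0x71, TC_ENDBLOCKDATA = 0x78, TC_BLOCKDATALONG = 0x7A;

    // TC_BLOCKDATALONG followed by the forged 4-byte length 0x7a7a6767 (2054842215)
    static final byte[] FORGED = {TC_BLOCKDATALONG, 0x7a, 0x7a, 0x67, 0x67};

    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        String mName = "test";
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static int u16(byte[] s, int p) {
        return ((s[p] & 0xff) << 8) | (s[p + 1] & 0xff);
    }

    /** Skips a 2-byte length followed by that many (modified UTF-8) bytes. */
    static int skipUtf(byte[] s, int p) {
        return p + 2 + u16(s, p);
    }

    /** Offset of the TC_ENDBLOCKDATA that closes the first class descriptor's annotation. */
    static int endOfFirstClassAnnotation(byte[] s) {
        int p = 4;                                     // skip magic (0xACED) and version
        if (s[p++] != TC_OBJECT || s[p++] != TC_CLASSDESC) {
            throw new IllegalArgumentException("expected TC_OBJECT TC_CLASSDESC");
        }
        p = skipUtf(s, p);                             // class name
        p += 8 + 1;                                    // serialVersionUID + flags
        int fieldCount = u16(s, p);
        p += 2;
        for (int i = 0; i < fieldCount; i++) {
            char typeCode = (char) s[p++];
            p = skipUtf(s, p);                         // field name
            if (typeCode == 'L' || typeCode == '[') {  // object field: type string follows
                byte tc = s[p++];
                if (tc == TC_STRING) {
                    p = skipUtf(s, p);
                } else if (tc == TC_REFERENCE) {
                    p += 4;
                } else {
                    throw new IllegalArgumentException("unsupported type-string tc " + tc);
                }
            }
        }
        if (s[p] != TC_ENDBLOCKDATA) {
            throw new IllegalArgumentException("class annotation is not empty");
        }
        return p;
    }

    /** Copy of the stream with FORGED inserted right before that TC_ENDBLOCKDATA. */
    static byte[] inject(byte[] s) {
        int at = endOfFirstClassAnnotation(s);
        byte[] out = new byte[s.length + FORGED.length];
        System.arraycopy(s, 0, out, 0, at);
        System.arraycopy(FORGED, 0, out, at, FORGED.length);
        System.arraycopy(s, at, out, at + FORGED.length, s.length - at);
        return out;
    }

    public static void main(String[] args) {
        byte[] original = serialize(new Employee());
        byte[] forged = inject(original);
        System.out.println("TC_ENDBLOCKDATA at offset " + endOfFirstClassAnnotation(original)
                + ", " + original.length + " -> " + forged.length + " bytes");
    }
}
```

What happens when the result is deserialized depends on the ObjectInputStream implementation. The reader quoted in this post allocates the declared length at once. Other implementations may read block data in bounded chunks and fail with an exception instead.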

3.5 Summary


(1) An OOM during object deserialization with ObjectInputStream/ObjectOutputStream is usually caused by corrupt data being deserialized;
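(2) ObjectInputStream/ObjectOutputStream carries a real security risk. At the very least, encrypt the serialized data. An authenticated scheme (for example AES-GCM, or an HMAC over the data) also rejects corrupted or tampered data before it reaches readObject;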
(3) When deserializing with ObjectInputStream, catch Throwable so that Errors, including OutOfMemoryError, are caught as well and the app can keep running.

