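Java provides the Serializable interface for object serialization. Serialization makes data easier to transmit and store, but it should not be overused, and you should especially avoid depending on it heavily. Otherwise compatibility problems (binary compatibility, semantic compatibility, and so on) crop up again and again.
UID stands for Unique Identifier. Every serializable class has a UID associated with it, known as its stream unique identifier. If you do not declare it explicitly in a static final long field named serialVersionUID (conventionally private), the system computes one at runtime by running a fairly complex hashing procedure over the structure of the class.
Most IDEs can generate the UID for you. In the IntelliJ family, you can raise the inspection for a missing serialVersionUID to error severity so that the problem is caught early.
Declaring the UID explicitly is essential if the serialize-deserialize round trip is to keep working as the class evolves. Take the following simple class as an example: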
public class Person implements Serializable {
    public Person() {
    }
    private String name;
    private Gender gender;
    private int age;
    private boolean alive;
}
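Suppose the current version is written to a file with writeObject, and some time later someone adds a new field to Person, for example:
private Gender aaa;
Because Person declares no serialVersionUID, the computed value changes along with the class, and reading the previously saved file with readObject fails with an error like this: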
java.io.InvalidClassException: models.Person; local class incompatible: stream classdesc serialVersionUID = 2990078061752767256, local class serialVersionUID = 5301378833195569126
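The difference between a declared and a computed UID can be inspected directly. The sketch below is only an illustration: PinnedPerson and UnpinnedPerson are hypothetical classes, not the Person class above. It uses ObjectStreamClass.lookup to print the UID the runtime associates with each one.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SuidDemo {
    // Hypothetical class with an explicit UID: the value stays the same
    // no matter which fields are added or removed later.
    static class PinnedPerson implements Serializable {
        private static final long serialVersionUID = 1L;
        private String name;
        private int age;
    }

    // Hypothetical class without an explicit UID: the runtime derives one
    // from the class structure, so editing the class changes the value.
    static class UnpinnedPerson implements Serializable {
        private String name;
        private int age;
    }

    // Returns the UID that serialization would write for the given class.
    static long suidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("pinned   = " + suidOf(PinnedPerson.class)); // prints "pinned   = 1"
        System.out.println("unpinned = " + suidOf(UnpinnedPerson.class));
    }
}
```

With the UID pinned, adding a field is a compatible change: an old stream can still be read, and the new field simply receives its default value.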
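Saving and loading with serialization is so easy that people come to rely on the convenience and overuse it, for instance inside custom file formats. The file then depends heavily on the serialized classes, which makes it hard to stay compatible with files written by earlier versions. In addition, the serialized form is a binary format that cannot be read directly, so when something goes wrong it is hard to pinpoint the source.
Effective Java devotes a whole discussion to the Serializable interface, so I will not go into detail here. In my view, Serializable is meant for persisting objects, which requires certain conventions to stay fixed over the long term; it is not suited to scenarios where the data model keeps changing.
Serializable creates a large number of temporary objects during serialization, which leads to frequent GC, so use it with caution wherever performance has to be weighed.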
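Android provides the Parcelable interface as a faster alternative to Serializable for passing objects between components and processes. Its byte format is not guaranteed to stay stable across platform versions, however, so wherever data has to be stored permanently you still need Serializable or another stable format.
One more case deserves attention: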
public abstract class IGroup {
    public int getGroupID() {
        return groupID;
    }
    public void setGroupID(int groupID) {
        this.groupID = groupID;
    }
    private int groupID;
}
public class PersonGroup extends IGroup implements Serializable {
    private static final long serialVersionUID = -2581096266403253738L;
    public PersonGroup(String groupTitle, int aim, long startTime) {
        this.groupTitle = groupTitle;
        this.aim = aim;
        this.startTime = startTime;
    }
    private String groupTitle;
    private int aim;
    private long startTime;
}
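In the code above, IGroup is a non-serializable class designed for inheritance, and PersonGroup is a serializable class that extends it. When a PersonGroup is serialized, the groupID field declared in IGroup is not written. On deserialization the IGroup part is initialized by IGroup's no-argument constructor, so groupID comes back as 0.
If the parent class does not implement Serializable, it must have an accessible no-argument constructor and must provide explicit entry points for setting its fields, so that the subclass can save and restore them itself.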
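One way for the subclass to carry the parent's state itself is sketched below. BaseGroup, LossyGroup, and SavedGroup are hypothetical stand-ins for the classes above, and the in-memory roundTrip helper is an assumption made only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class GroupDemo {
    // Hypothetical stand-in for IGroup: not serializable, so it needs an
    // accessible no-arg constructor and accessors for its state.
    abstract static class BaseGroup {
        private int groupID;
        protected BaseGroup() {}
        public int getGroupID() { return groupID; }
        public void setGroupID(int groupID) { this.groupID = groupID; }
    }

    // Relies on the default serialized form only: groupID is dropped.
    static class LossyGroup extends BaseGroup implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Writes and restores the parent's field by hand.
    static class SavedGroup extends BaseGroup implements Serializable {
        private static final long serialVersionUID = 1L;

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeInt(getGroupID());
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            setGroupID(in.readInt());
        }
    }

    // Serializes to a byte array and reads the object straight back.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LossyGroup lossy = new LossyGroup();
        lossy.setGroupID(42);
        SavedGroup saved = new SavedGroup();
        saved.setGroupID(42);
        System.out.println("lossy = " + roundTrip(lossy).getGroupID()); // prints "lossy = 0"
        System.out.println("saved = " + roundTrip(saved).getGroupID()); // prints "saved = 42"
    }
}
```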
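There are many supposedly fundamental recommendations about how to use Serializable, but in practice they can all be set aside, because the concrete use case determines the design, and a few sentences of distilled experience should not be treated as gospel. Those recommendations exist to stop careless developers from damaging the integrity and structure of existing code, and to let people who do not yet understand the details get started. In the end, though, we need to know why a rule exists rather than just repeat it.
A brief word on the transient keyword, which excludes fields you do not want serialized. It helps keep sensitive data out of the byte stream, although it is not a complete defense against attacks on its own. Wherever Serializable is used, any field that holds runtime state and does not need to be serialized should be marked transient; do not skip this for the sake of brevity.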
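The effect of transient is easy to observe with an in-memory round trip. In this minimal sketch, Account and its fields are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    // Hypothetical class: the password must never reach the byte stream.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String user;
        private transient String password; // skipped by default serialization

        Account(String user, String password) {
            this.user = user;
            this.password = password;
        }

        String user() { return user; }
        String password() { return password; }
    }

    // Serializes to a byte array and reads the object straight back.
    static Account roundTrip(Account account) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(account);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account copy = roundTrip(new Account("alice", "s3cret"));
        System.out.println("user = " + copy.user());         // prints "user = alice"
        System.out.println("password = " + copy.password()); // prints "password = null"
    }
}
```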
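Effective Java also covers custom serialized forms, making the following main points:
1. Do not accept the default serialized form without first considering carefully whether it is appropriate;
2. The default serialized form is likely to be appropriate if an object's physical representation is identical to its logical content;
3. Even if you decide that the default serialized form is appropriate, you often still have to provide a readObject method to ensure invariants and security;
4. When an object's physical representation differs substantially from its logical data content, using the default serialized form has four disadvantages:
- It permanently ties the exported API to the current internal representation;
- It can consume excessive space;
- It can consume excessive time;
- It can cause stack overflows.
The book uses the following code as an example: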
// Awful candidate for default serialized form
public final class StringList implements Serializable {
    private int size = 0;
    private Entry head = null;
    private static class Entry implements Serializable {
        String data;
        Entry next;
        Entry previous;
    }
    ... // Remainder omitted
}
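With the default serialized form, serializing this class walks the entire object graph recursively, following every Entry link. If the length of the list is unbounded, the recursion depth is unbounded too, and it can eventually overflow the stack. The default form also records many unnecessary details, such as every next and previous link, which wastes space.
When a class that implements Serializable is deserialized with readObject, none of its own constructors is called (only the no-argument constructor of its nearest non-serializable superclass runs). So if a newly deserialized object needs additional initialization, you cannot rely on the initialization done in the existing constructors. This matters especially for final fields that are assigned in a constructor rather than at their declaration: if such a field is not restored from the stream, a readObject method cannot assign it with ordinary code.
This behavior actually makes the purpose of serialization clearer: developers should not lump every concern together, because serialization has its own dedicated role.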
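The skipped constructor can be made visible with a counter. In the sketch below, Session and RebuiltSession are hypothetical classes: the first loses the state its constructor built, while the second rebuilds it in readObject. The cache field is deliberately non-final so that readObject can assign it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class ConstructorDemo {
    static int constructorCalls = 0;

    // Hypothetical class whose constructor builds a transient cache.
    static class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String id;
        private transient List<String> cache;

        Session(String id) {
            constructorCalls++;
            this.id = id;
            this.cache = new ArrayList<>();
        }

        List<String> cache() { return cache; }
    }

    // Same idea, but readObject repeats the constructor's initialization.
    static class RebuiltSession implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String id;
        private transient List<String> cache;

        RebuiltSession(String id) {
            this.id = id;
            this.cache = new ArrayList<>();
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            cache = new ArrayList<>();
        }

        List<String> cache() { return cache; }
    }

    // Serializes to a byte array and reads the object straight back.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Session copy = roundTrip(new Session("a"));
        System.out.println("constructor calls = " + constructorCalls); // prints "constructor calls = 1"
        System.out.println("cache = " + copy.cache());                 // prints "cache = null"
        System.out.println("rebuilt = " + roundTrip(new RebuiltSession("b")).cache()); // prints "rebuilt = []"
    }
}
```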
The book gives a custom serialization implementation:
// StringList with a reasonable custom serialized form
public final class StringList implements Serializable {
    private transient int size = 0;
    private transient Entry head = null;
    private static class Entry {
        String data;
        Entry next;
        Entry previous;
    }
    // Appends the specified string to the list
    public final void add(String s) {...}
    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        s.writeInt(size);
        // Write out all elements in the proper order
        for (Entry e = head; e != null; e = e.next)
            s.writeObject(e.data);
    }
    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        int numElements = s.readInt();
        // Read in all elements and insert them in list
        for (int i = 0; i < numElements; i++)
            add((String) s.readObject());
    }
    ... // Remainder omitted
}
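The book discusses writing readObject defensively so that outside attacks cannot corrupt a serialized object; I will not go into it here. As for the attack techniques themselves, they involve manipulating the byte stream; see the Java Object Serialization Specification.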
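As a rough illustration of the idea, and not the book's own example, the sketch below validates an invariant inside readObject and rejects a hand-forged stream. Age is a hypothetical class. The forgery assumes that, for this particular class, the last four bytes of the stream hold the single int field; that assumption is made here only to keep the demonstration short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class DefensiveDemo {
    // Hypothetical class with one invariant: years is never negative.
    static class Age implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int years;

        Age(int years) {
            if (years < 0) throw new IllegalArgumentException("negative age");
            this.years = years;
        }

        int years() { return years; }

        // Re-checks the invariant, because the constructor is bypassed.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            if (years < 0) throw new InvalidObjectException("negative age: " + years);
        }
    }

    static byte[] serialize(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumption: the trailing four bytes are the int field of Age.
    // Overwriting them with 0xFF turns the stored value into -1.
    static byte[] forgeNegative(byte[] valid) {
        byte[] forged = valid.clone();
        for (int i = forged.length - 4; i < forged.length; i++) {
            forged[i] = (byte) 0xFF;
        }
        return forged;
    }

    // True if deserialization is refused with InvalidObjectException.
    static boolean isRejected(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return false;
        } catch (InvalidObjectException e) {
            return true;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] valid = serialize(new Age(30));
        System.out.println("valid rejected  = " + isRejected(valid));
        System.out.println("forged rejected = " + isRejected(forgeNegative(valid)));
    }
}
```

Without the check in readObject, the forged stream would yield an Age object that the constructor could never have produced.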
Consider the following singleton test class:
public class SingletonTest implements Serializable {
    private static final long serialVersionUID = -3764549935511906697L;
    public static final SingletonTest INSTANCE = new SingletonTest();
    private SingletonTest() {
        System.out.println("[SingletonTest.SingletonTest()] " + this.toString());
    }
    public void print() {
        System.out.println("[SingletonTest.print()] " + this.toString());
    }
}
// Test entry point; Tester.SpaceSerial is a helper class that is not shown here
public static void main(String[] args) {
    System.out.println("Hello World! Current Encoding = " + System.getProperty("file.encoding"));
    SingletonTest writeSingletonTest = SingletonTest.INSTANCE;
    Tester.SpaceSerial.saveSingletonTest(writeSingletonTest);
    SingletonTest readSingletonTest1 = Tester.SpaceSerial.loadSingletonTest();
    System.out.println("LoadSingletonTest1 => " + (readSingletonTest1 == null ? null : readSingletonTest1.toString()));
    SingletonTest readSingletonTest2 = Tester.SpaceSerial.loadSingletonTest();
    System.out.println("LoadSingletonTest2 => " + (readSingletonTest2 == null ? null : readSingletonTest2.toString()));
}
The program prints:
Hello World! Current Encoding = UTF-8
[SingletonTest.SingletonTest()] models.SingletonTest@74a14482
LoadSingletonTest1 => models.SingletonTest@7ba4f24f
LoadSingletonTest2 => models.SingletonTest@3b9a45b3
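As the output shows, deserialization does nothing special for singletons (the mechanism has no reliable way to recognize that a class is meant to be one), so each load produced a new, distinct instance, which breaks the singleton guarantee. To fix this, implement the following method in the serializable class to keep the instance unique:
private Object readResolve() {
    System.out.println("[SingletonTest.readResolve()] " + this.toString());
    return INSTANCE;
}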
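The effect of readResolve can be checked with an identity comparison after an in-memory round trip. In this minimal sketch, Registry and LeakyRegistry are hypothetical singletons that differ only in whether they define readResolve.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SingletonDemo {
    // Hypothetical singleton that replaces every deserialized copy.
    static final class Registry implements Serializable {
        private static final long serialVersionUID = 1L;
        static final Registry INSTANCE = new Registry();
        private Registry() {}

        private Object readResolve() {
            return INSTANCE; // discard the freshly deserialized object
        }
    }

    // Same singleton without readResolve: each load yields a new object.
    static final class LeakyRegistry implements Serializable {
        private static final long serialVersionUID = 1L;
        static final LeakyRegistry INSTANCE = new LeakyRegistry();
        private LeakyRegistry() {}
    }

    // Serializes to a byte array and reads the object straight back.
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Registry.INSTANCE) == Registry.INSTANCE);           // prints "true"
        System.out.println(roundTrip(LeakyRegistry.INSTANCE) == LeakyRegistry.INSTANCE); // prints "false"
    }
}
```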
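Serialization and deserialization follow their own internal pipeline; to fully understand why certain problems occur, you need to understand how it is designed.
When you hit a problem, consulting the official documentation is sometimes faster than digging for the answer in a mass of code.
Section 3.7 of the Java Object Serialization Specification describes the purpose of the readResolve method: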
3.7 The readResolve Method
For Serializable and Externalizable classes, the readResolve method allows a class to replace/resolve the object read from the stream before it is returned to the caller. By implementing the readResolve method, a class can directly control the types and instances of its own instances being deserialized. The method is defined as follows:
ANY-ACCESS-MODIFIER Object readResolve()
throws ObjectStreamException;
The readResolve method is called when ObjectInputStream has read an object from the stream and is preparing to return it to the caller. ObjectInputStream checks whether the class of the object defines the readResolve method. If the method is defined, the readResolve method is called to allow the object in the stream to designate the object to be returned. The object returned should be of a type that is compatible with all uses. If it is not compatible, a ClassCastException will be thrown when the type mismatch is discovered.
For example, a Symbol class could be created for which only a single instance of each symbol binding existed within a virtual machine. The readResolve method would be implemented to determine if that symbol was already defined and substitute the preexisting equivalent Symbol object to maintain the identity constraint. In this way the uniqueness of Symbol objects can be maintained across serialization.
Note - The readResolve method is not invoked on the object until the object is fully constructed, so any references to this object in its object graph will not be updated to the new object nominated by readResolve. However, during the serialization of an object with the writeReplace method, all references to the original object in the replacement object's object graph are replaced with references to the replacement object. Therefore in cases where an object being serialized nominates a replacement object whose object graph has a reference to the original object, deserialization will result in an incorrect graph of objects. Furthermore, if the reference types of the object being read (nominated by writeReplace) and the original object are not compatible, the construction of the object graph will raise a ClassCastException.
readResolve is only applied to classes that are not enum types, as the following passage from the specification confirms:
Process potential substitutions by the class of the object and/or by a subclass of ObjectInputStream:
a. If the class of the object is not an enum type and defines the appropriate readResolve method, the method is called to allow the object to replace itself.
b. Then if previously enabled by enableResolveObject, the resolveObject method is called to allow subclasses of the stream to examine and replace the object. If the previous step did replace the original object, the resolveObject method is called with the replacement object.
If a replacement took place, the table of known objects is updated so the replacement object is associated with the handle. The replacement object is then returned from readObject.
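However, the very existence of readResolve also opens the door to attacks from outside. As mentioned above, some attack techniques exploit exactly this kind of weakness.
Effective Java says that if you write a serializable instance-controlled class as an enum, you get an absolute guarantee that there can be no instances besides the declared constants. The JVM makes this guarantee.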
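The enum guarantee can be observed the same way. In this small sketch, Settings is a hypothetical single-element enum: no readResolve method is needed, and the deserialized value is the very same constant.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class EnumSingletonDemo {
    // Hypothetical enum singleton; enums are serializable by default.
    enum Settings {
        INSTANCE;

        String describe() { return "settings singleton"; }
    }

    // Serializes to a byte array and reads the object straight back.
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(Settings.INSTANCE);
        System.out.println(copy == Settings.INSTANCE); // prints "true"
    }
}
```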
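The book also discusses the accessibility of readResolve and summarizes the usual rules, which are borrowed here:
1. If you place a readResolve method on a final class, it should be private;
2. If you place a readResolve method on a non-final class, you must consider its accessibility carefully:
- If it is private, it does not apply to any subclass;
- If it is package-private, it applies only to subclasses in the same package;
- If it is protected or public, it applies to all subclasses that do not override it;
- If a readResolve method is protected or public and a subclass does not override it, deserializing a serialized subclass instance produces a superclass instance, which is likely to cause a ClassCastException.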
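The last rule can be reproduced in a few lines. Shape and Circle below are hypothetical classes: Circle inherits a protected readResolve that always returns a Shape, so a deserialized Circle comes back as the wrong type and the cast fails.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ResolveInheritanceDemo {
    // Hypothetical superclass with an inheritable readResolve.
    static class Shape implements Serializable {
        private static final long serialVersionUID = 1L;
        static final Shape DEFAULT = new Shape();

        protected Object readResolve() {
            return DEFAULT; // always a plain Shape, even for subclasses
        }
    }

    // Does not override readResolve, so the inherited one applies.
    static class Circle extends Shape {
        private static final long serialVersionUID = 1L;
    }

    // Serializes to a byte array and reads the object straight back.
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if casting the deserialized Circle back to Circle fails.
    static boolean castFails() {
        Object back = roundTrip(new Circle());
        try {
            Circle circle = (Circle) back;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("cast fails = " + castFails()); // prints "cast fails = true"
    }
}
```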
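The book's summary:
Use enum types to enforce instance-control invariants wherever possible;
If that is not possible and you need a class that is both serializable and instance-controlled, you must provide a readResolve method and ensure that all of the class's instance fields are either primitive or transient.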