Suppose your code uses a class from a third-party jar, but that class is not serializable and you cannot change its source. How do you serialize it? Let's look at how the Apache Flink 1.14.0 source code solves this problem.
org.apache.flink.connectors.hive.JobConfWrapper is exactly such a wrapper: it exists because org.apache.hadoop.mapred.JobConf is not serializable.
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.flink.api.java.hadoop.common.HadoopInputFormatCommonBase;
import org.apache.flink.core.memory.DataInputDeserializer;
import org.apache.flink.core.memory.DataOutputSerializer;

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class JobConfWrapper implements Serializable {

    private static final long serialVersionUID = 1L;

    private transient JobConf jobConf;

    public JobConfWrapper(JobConf jobConf) {
        this.jobConf = jobConf;
    }

    public JobConf conf() {
        return jobConf;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();

        // we write the jobConf through a separate serializer to avoid cryptic exceptions when it
        // corrupts the serialization stream
        final DataOutputSerializer ser = new DataOutputSerializer(256);
        jobConf.write(ser);
        out.writeInt(ser.length());
        out.write(ser.getSharedBuffer(), 0, ser.length());
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();

        final byte[] data = new byte[in.readInt()];
        in.readFully(data);
        final DataInputDeserializer deser = new DataInputDeserializer(data);
        this.jobConf = new JobConf();
        try {
            jobConf.readFields(deser);
        } catch (IOException e) {
            throw new IOException(
                    "Could not deserialize JobConf, the serialized and de-serialized don't match.",
                    e);
        }

        Credentials currentUserCreds =
                HadoopInputFormatCommonBase.getCredentialsFromUGI(
                        UserGroupInformation.getCurrentUser());
        if (currentUserCreds != null) {
            jobConf.getCredentials().addAll(currentUserCreds);
        }
    }
}
Now let's look at what JobConfWrapper actually does:
- JobConfWrapper is serializable because it implements the Serializable interface.
- Its only field is a JobConf. Since JobConf is not serializable, the field is declared transient so that default serialization skips it.
- It implements writeObject and readObject, which customize how the JobConf is serialized and deserialized; everything else is handled by out.defaultWriteObject() and in.defaultReadObject().
Viewed from another angle, the problem can be summarized as: when a class (JobConfWrapper) is serializable but one of its fields (JobConf) is not, you can serialize and deserialize that field manually inside writeObject and readObject.
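To see why the pattern is needed at all, consider what happens without it. The sketch below is illustrative (PlainConf and NaiveWrapper are hypothetical stand-ins, not classes from Flink): a non-transient field whose class does not implement Serializable makes default serialization fail, because Java serialization walks the entire object graph.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a third-party class that is not serializable.
class PlainConf {
}

// Naive wrapper: the field is neither transient nor handled in writeObject.
class NaiveWrapper implements Serializable {
    private static final long serialVersionUID = 1L;
    private final PlainConf conf = new PlainConf();
}

public class FailureDemo {

    // Returns true if serializing NaiveWrapper throws NotSerializableException.
    static boolean serializationFails() {
        try (ObjectOutputStream os = new ObjectOutputStream(new ByteArrayOutputStream())) {
            os.writeObject(new NaiveWrapper());
            return false;
        } catch (NotSerializableException e) {
            return true; // default serialization rejects PlainConf
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("throws NotSerializableException: " + serializationFails());
    }
}
```

Note that marking the field transient alone would make serialization succeed, but the field would come back null after deserialization; the writeObject/readObject pair is what actually restores its state.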
A worked example
Collar is not serializable. It has two fields: an int and a String.
public class Collar {

    private int size;
    private String colour;

    public Collar(int size, String colour) {
        this.size = size;
        this.colour = colour;
    }

    public int getSize() {
        return size;
    }

    public String getColour() {
        return colour;
    }

    @Override
    public String toString() {
        return "Collar{" +
                "size=" + size +
                ", colour='" + colour + '\'' +
                '}';
    }
}
Dog is serializable. It has a String field name and a Collar field collar. Note that the order of the write calls in writeObject must match the order of the read calls in readObject.
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class Dog implements Serializable {

    private static final long serialVersionUID = 1L;

    private String name;
    private transient Collar collar;

    public Dog(String name, Collar collar) {
        this.name = name;
        this.collar = collar;
    }

    private void writeObject(ObjectOutputStream os) throws IOException {
        os.defaultWriteObject();
        os.writeInt(collar.getSize());
        os.writeUTF(collar.getColour());
    }

    private void readObject(ObjectInputStream is) throws IOException, ClassNotFoundException {
        is.defaultReadObject();
        int collarSize = is.readInt();
        String collarColour = is.readUTF();
        this.collar = new Collar(collarSize, collarColour);
    }

    @Override
    public String toString() {
        return "Dog{" +
                "name='" + name + '\'' +
                ", collar=" + collar +
                '}';
    }
}
A test class:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class SerializeDog {

    public static void main(String[] args) {
        Collar collar = new Collar(10, "Red");
        Dog dog = new Dog("Chase", collar);
        System.out.println("Original object: " + dog);

        try {
            FileOutputStream fs = new FileOutputStream("testSer.ser");
            ObjectOutputStream os = new ObjectOutputStream(fs);
            os.writeObject(dog);
            os.close();
        } catch (Exception e) {
            e.printStackTrace();
        }

        Dog newDog = null;
        try {
            FileInputStream fis = new FileInputStream("testSer.ser");
            ObjectInputStream ois = new ObjectInputStream(fis);
            newDog = (Dog) ois.readObject();
            ois.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("New object: " + newDog);
    }
}
Output:
Original object: Dog{name='Chase', collar=Collar{size=10, colour='Red'}}
New object: Dog{name='Chase', collar=Collar{size=10, colour='Red'}}
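The round trip above only works because the reads mirror the writes. A hedged sketch of what goes wrong otherwise (this class is illustrative, not from the original post): swapping the read order does not necessarily throw an exception, because the stream stores no field names for custom data; the same bytes are simply reinterpreted and the object comes back silently corrupted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class OrderMismatchDemo implements Serializable {
    private static final long serialVersionUID = 1L;

    transient int size;
    transient String colour;

    OrderMismatchDemo(int size, String colour) {
        this.size = size;
        this.colour = colour;
    }

    private void writeObject(ObjectOutputStream os) throws IOException {
        os.defaultWriteObject();
        os.writeInt(size);   // written first
        os.writeUTF(colour); // written second
    }

    private void readObject(ObjectInputStream is) throws IOException, ClassNotFoundException {
        is.defaultReadObject();
        // WRONG order: the int's bytes are consumed as a UTF length/payload,
        // and the int is then read from whatever bytes happen to follow.
        this.colour = is.readUTF();
        this.size = is.readInt();
    }

    // Serialize one instance to memory and read it back; checked exceptions
    // are wrapped so the helper is easy to call.
    static OrderMismatchDemo roundTrip(int size, String colour) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream os = new ObjectOutputStream(buf)) {
                os.writeObject(new OrderMismatchDemo(size, colour));
            }
            try (ObjectInputStream is =
                    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return (OrderMismatchDemo) is.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        OrderMismatchDemo d = roundTrip(10, "Red");
        // The round trip completes without an exception, but the values
        // are no longer 10 / "Red".
        System.out.println("size=" + d.size + ", colour='" + d.colour + "'");
    }
}
```

This is why the matching-order rule matters: a mismatch is a data-corruption bug rather than an immediate failure, which makes it much harder to spot.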
Summary
The underlying mechanics of writeObject and readObject are covered in many articles and books, so they are not repeated here. The goal of this post is to help you recognize the scenario in which these two methods apply and use them correctly.