A Brief Look at Spark Kryo Serialization

Original article. If you repost it, please credit the source: http://www.cnblogs.com/tovin/p/3833985.html

 

While developing with Spark recently, I found that caching data consumes a lot of memory when the data set is large. To reduce memory consumption, I tried out Kryo serialization.

The code consists of three classes: KryoTest, MyRegistrator, and Qualify.

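By default, Spark uses Java's built-in serialization mechanism. To use Kryo serialization instead, you only need to add the highlighted part of the KryoTest class (the conf.set calls for spark.serializer and spark.kryo.registrator), which specifies the serializer class Spark should use.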

You also need to add the MyRegistrator class, which registers the classes that should be serialized with Kryo.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.storage.StorageLevel;

public class KryoTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        conf.setMaster("local");
        conf.setAppName("KryoTest");
        // Switch from the default Java serializer to Kryo.
        conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        // Name of the class that registers the types Kryo should serialize.
        conf.set("spark.kryo.registrator", "MyRegistrator");

        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> rdd = sc.textFile("/home/hdpusr/qualifying.txt");
        // Parse each comma-separated line into a Qualify object.
        JavaRDD<Qualify> map = rdd.map(new Function<String, Qualify>() {
            @Override
            public Qualify call(String v1) throws Exception {
                String[] s = v1.split(",");
                Qualify q = new Qualify();
                q.setA(Integer.parseInt(s[0]));
                q.setB(Long.parseLong(s[1]));
                q.setC(s[2]);
                return q;
            }
        });
        // Cache in serialized form, so the configured serializer determines the stored size.
        map.persist(StorageLevel.MEMORY_AND_DISK_SER());
        System.out.println(map.count());
    }
}

import org.apache.spark.serializer.KryoRegistrator;

import com.esotericsoftware.kryo.Kryo;

public class MyRegistrator implements KryoRegistrator {
    // Register every class that should be serialized with Kryo.
    @Override
    public void registerClasses(Kryo kryo) {
        kryo.register(Qualify.class);
    }
}

import java.io.Serializable;

// Simple record type holding the three fields parsed from each input line.
public class Qualify implements Serializable {
    private int a;
    private long b;
    private String c;

    public int getA() {
        return a;
    }

    public void setA(int a) {
        this.a = a;
    }

    public long getB() {
        return b;
    }

    public void setB(long b) {
        this.b = b;
    }

    public String getC() {
        return c;
    }

    public void setC(String c) {
        this.c = c;
    }
}
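
To see why registering a class (as MyRegistrator does for Qualify) is worthwhile, here is a toy sketch of the idea. It is not Kryo's actual implementation and uses only the JDK: the RegistrationSketch class, its methods, and the name com.example.Qualify are all made up for illustration. The point is that a registered class can be tagged with a small integer ID, while an unregistered class has to be tagged with its full class name every time.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Toy illustration of class registration; NOT how Kryo is actually implemented. */
public class RegistrationSketch {
    // Hypothetical registry: class name -> small integer ID.
    private final Map<String, Integer> ids = new HashMap<>();

    /** Assigns the next free ID to the given class name. */
    public void register(String className) {
        ids.put(className, ids.size());
    }

    /** Returns how many bytes are needed to tag one record with its class. */
    public int tagSize(String className) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            Integer id = ids.get(className);
            if (id != null) {
                out.writeByte(id);        // registered: a single-byte ID (enough for this toy)
            } else {
                out.writeUTF(className);  // unregistered: 2-byte length plus the full name
            }
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RegistrationSketch registry = new RegistrationSketch();
        String name = "com.example.Qualify"; // hypothetical fully qualified class name
        int before = registry.tagSize(name);
        registry.register(name);
        int after = registry.tagSize(name);
        System.out.println("unregistered tag: " + before + " bytes");
        System.out.println("registered tag:   " + after + " byte(s)");
    }
}
```

With millions of cached records, that per-record difference in the class tag adds up, which is the reason for listing Qualify in the registrator.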

 

Now let's compare the results of Java serialization and Kryo serialization.

Java serialization: [screenshot not available]

Kryo serialization: [screenshot not available]

The actual run shows that Kryo serialization saves a good deal of memory. When memory is tight, Kryo serialization is the recommended approach.
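
For a rough feel of where the extra bytes in Java serialization come from, the following JDK-only sketch may help. It does not use Kryo or Spark. It writes one object with ObjectOutputStream and then writes only its field values with DataOutputStream, which stands in for a compact binary format. The SerializedSizeSketch and Row names and the sample values are assumptions made for this example; Row simply mirrors the three fields of Qualify.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializedSizeSketch {
    /** Stand-in for the article's Qualify class (same three field types). */
    static class Row implements Serializable {
        private static final long serialVersionUID = 1L;
        final int a;
        final long b;
        final String c;

        Row(int a, long b, String c) {
            this.a = a;
            this.b = b;
            this.c = c;
        }
    }

    /** Size of one row written with Java's built-in serialization. */
    static int javaSize(Row row) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(row); // includes stream header and class descriptor
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Size of the same row when only the field values are written. */
    static int fieldsOnlySize(Row row) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(row.a);   // 4 bytes
                out.writeLong(row.b);  // 8 bytes
                out.writeUTF(row.c);   // 2-byte length plus the encoded characters
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Row row = new Row(1, 2L, "2005-09-06"); // made-up sample values
        System.out.println("Java serialization: " + javaSize(row) + " bytes");
        System.out.println("fields only:        " + fieldsOnlySize(row) + " bytes");
    }
}
```

This measures a single object in its own stream. When many objects share one stream, the class descriptor is written less often, so the per-record gap is smaller than this sketch suggests, and the numbers here are not a substitute for measuring the cached RDD size in a real run.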

 

 
