Hive SerDe Overview

1. Overview

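    SerDe is short for Serializer/Deserializer. When processes communicate remotely, they can send each other data of any type, but whatever the type, the data travels over the network as a binary sequence. The sender must convert an object into a byte sequence before it can be transmitted, which is called object serialization; the receiver must restore the byte sequence to an object, which is called object deserialization. In Hive, deserialization turns a key/value pair into the column values of a Hive table row. Because the SerDe interprets the data when it is read, Hive can load data into a table without first transforming it, which saves a great deal of time when processing massive data sets.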
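The round trip described above can be illustrated with a minimal sketch in plain Java, using only the standard library. The PageView class and its fields are hypothetical and exist only for this illustration; Hive's own SerDe classes work on Writable objects rather than on a type like this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record type, used only to illustrate the round trip.
final class PageView {
    final long time;
    final int userId;
    final String url;

    PageView(long time, int userId, String url) {
        this.time = time;
        this.userId = userId;
        this.url = url;
    }

    // Serialization: turn the object into a byte sequence.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeLong(time);
            out.writeInt(userId);
            out.writeUTF(url);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Deserialization: rebuild the object from the byte sequence,
    // reading the fields in the same order they were written.
    static PageView fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            long time = in.readLong();
            int userId = in.readInt();
            String url = in.readUTF();
            return new PageView(time, userId, url);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling PageView.fromBytes(view.toBytes()) yields a new object with the same field values as view. The reader must know the field order and types in advance, which is the role a SerDe plays for a Hive table.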

2. Using a SerDe

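   When creating a table, users can specify either a custom SerDe or one of the SerDes that ship with Hive. The SerDe defines the table's columns and determines which data goes into each column.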

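To create a table with a specific SerDe, use the ROW FORMAT row_format clause of the CREATE TABLE statement (ROW FORMAT SERDE followed by the quoted SerDe class name).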

Write the deserializer class TestDeserializer, which implements three methods of the Deserializer interface:

a) Initialization: initialize(Configuration conf, Properties tbl).

b) Deserialization, which takes a Writable and returns an Object: deserialize(Writable blob).

c) Getting the ObjectInspector for the Object returned by deserialize(Writable blob): getObjectInspector().
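The core of step (b) is splitting one line of text into fields and converting each field to a column value. The following is a minimal sketch in plain Java, without Hive dependencies, so it is not a Deserializer implementation. It assumes each line holds three tab-separated fields (time, userid, and a URL, in that order) and that the host and path columns are taken from the URL; the LineParser class and its parse method are hypothetical names introduced for this illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows only the parsing logic, not the Hive interface.
final class LineParser {

    // Returns [time, userid, host, path], or null when the line is malformed.
    static List<Object> parse(String line) {
        if (line == null) {
            return null;
        }
        // Assumption: fields are separated by a tab character.
        String[] field = line.split("\t");
        if (field.length != 3) {
            return null;
        }
        try {
            URL url = new URL(field[2]);
            List<Object> row = new ArrayList<>();
            row.add(Long.valueOf(field[0]));    // time (assumed first field)
            row.add(Integer.valueOf(field[1])); // userid (assumed second field)
            row.add(url.getHost());             // host, derived from the URL
            row.add(url.getPath());             // path, derived from the URL
            return row;
        } catch (MalformedURLException | NumberFormatException e) {
            // Treat unparseable input as a bad row rather than failing.
            return null;
        }
    }
}
```

Returning null for a malformed line mirrors the length check in the example in section 3; a production SerDe might instead throw SerDeException or count bad rows.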

3. Example

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.Deserializer;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.ObjectInspectorOptions;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class TestDeserializer implements Deserializer {
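  // Column names and their matching ObjectInspectors, kept in the same order.
  private static List<String> FieldNames = new ArrayList<String>();
  private static List<ObjectInspector> FieldNamesObjectInspectors = new ArrayList<ObjectInspector>();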
  static {
    FieldNames.add("time");
    FieldNamesObjectInspectors.add(ObjectInspectorFactory
          .getReflectionObjectInspector(Long.class,
               ObjectInspectorOptions.JAVA));
    FieldNames.add("userid");
    FieldNamesObjectInspectors.add(ObjectInspectorFactory
          .getReflectionObjectInspector(Integer.class,
               ObjectInspectorOptions.JAVA));
    FieldNames.add("host");
    FieldNamesObjectInspectors.add(ObjectInspectorFactory
          .getReflectionObjectInspector(String.class,
               ObjectInspectorOptions.JAVA));

    FieldNames.add("path");
    FieldNamesObjectInspectors.add(ObjectInspectorFactory
          .getReflectionObjectInspector(String.class,
               ObjectInspectorOptions.JAVA));

  }

  @Override
  public Object deserialize(Writable blob) {
    try {
      if (blob instanceof Text) {
        String line = ((Text) blob).toString();
        if (line == null)
          return null;
        // Fields are separated by a tab character.
        String[] field = line.split("\t");
        if (field.length != 3) {
          return null;
        }
        List<Object> result = new ArrayList<Object>();
        URL url = new URL(field[2]);
        Long time = Long.valueOf(
