Hive SerDe2 and JsonSerDe Source Code Analysis (In Depth)

Please credit the source when reposting.

Hive's data lives on HDFS. When Hive executes a statement, the SQL is compiled into a MapReduce job that reads and writes that data on HDFS. Since files are transferred and stored as binary (a bit stream of 0s and 1s), reading and writing records requires serialization and deserialization, SerDe for short (Serializer and Deserializer).

Typically, when we have a JSON-formatted file on HDFS and want to map it to a Hive table, we use a statement like the following, together with a SerDe implementation that knows how to read and write that format:

CREATE TABLE tb_name (
	field_1 type_1 COMMENT 'comment',
	field_2 type_2,
	.....
	field_n type_n
)
PARTITIONED BY (part_field type)
CLUSTERED BY (field_4) INTO 200 BUCKETS
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 'hdfs_path';
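For example, given a file on HDFS holding one JSON record per line (the record and path below are hypothetical, for illustration):

{"id":1,"name":"alice","scores":[95.5,88.0]}

a matching table could be declared like this (the hcatalog JsonSerDe ships in hive-hcatalog-core, which may need to be added to the classpath):

CREATE TABLE json_demo (
	id INT,
	name STRING,
	scores ARRAY<DOUBLE>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/data/json_demo';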

Hive ships with a number of built-in SerDes:
1.Avro
2.ORC
3.RegEx
4.Thrift
5.Parquet
6.CSV
7.JsonSerDe

But sometimes we need to define our own SerDe to match our specific data format.

Hive's deserialization path:

HDFS File->InputFileFormat-><key,value>->Deserializer->Row Object

That is, the InputFileFormat reads the file on HDFS into key/value pairs, and the Deserializer turns each value into a row object of the table.

Hive's serialization path:

Row Object->Serializer-><key,value>->OutputFileFormat->HDFS File

That is, each row object is serialized into a key/value pair, which the OutputFileFormat writes out to an HDFS file.

Hive's SerDe methods

I am using Hive 3.1.1 here; in this release the old serde classes are deprecated in favor of serde2.

First, to implement serialization and deserialization we extend the AbstractSerDe class:

public abstract class AbstractSerDe implements Deserializer, Serializer

AbstractSerDe implements both the Serializer and Deserializer interfaces, so a single class handles both directions.

public void initialize(Configuration configuration, Properties tableProperties,
                         Properties partitionProperties) throws SerDeException {};

It has an initialize method, which receives the table (and partition) properties so the SerDe can set up the schema of the table we created.

The serialize method

  public abstract Writable serialize(Object obj, ObjectInspector objInspector)
      throws SerDeException;

serialize takes two parameters, obj and objInspector. Following the serialization path above, what arrives here is a row object: obj holds the actual row data, while objInspector describes the row's structure, i.e. the field names and the type of each field.

The deserialize method

 public abstract Object deserialize(Writable blob) throws SerDeException;

deserialize receives a Writable parameter. Anyone who has written MapReduce programs will recognize this as the type Hadoop uses for reading and writing HDFS data; here blob is one record read from the HDFS file.
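Putting these pieces together, the skeleton of a custom SerDe looks roughly like the sketch below. This is a minimal illustration, not production code: it treats every line as a single string column, and the class name PassThroughSerDe and the column name "line" are made up for this example.

import java.util.Arrays;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.AbstractSerDe;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.SerDeStats;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class PassThroughSerDe extends AbstractSerDe {

	private ObjectInspector inspector;

	@Override
	public void initialize(Configuration conf, Properties tbl) throws SerDeException {
		// Expose a single string column named "line"; a real SerDe would read
		// serdeConstants.LIST_COLUMNS / LIST_COLUMN_TYPES here, as JsonSerDe does below.
		inspector = ObjectInspectorFactory.getStandardStructObjectInspector(
			Arrays.asList("line"),
			Arrays.<ObjectInspector>asList(PrimitiveObjectInspectorFactory.javaStringObjectInspector));
	}

	@Override
	public Object deserialize(Writable blob) throws SerDeException {
		// Writable -> row object: wrap the raw line in a one-field row.
		return Arrays.asList(blob.toString());
	}

	@Override
	public Writable serialize(Object obj, ObjectInspector objInspector) throws SerDeException {
		// Row object -> Writable: pull the single field back out through the inspector.
		StructObjectInspector soi = (StructObjectInspector) objInspector;
		StructField field = soi.getAllStructFieldRefs().get(0);
		Object value = soi.getStructFieldData(obj, field);
		return new Text(value == null ? "" : value.toString());
	}

	@Override
	public ObjectInspector getObjectInspector() {
		return inspector;
	}

	@Override
	public Class<? extends Writable> getSerializedClass() {
		return Text.class;
	}

	@Override
	public SerDeStats getSerDeStats() {
		return null; // no statistics support in this sketch
	}
}

JsonSerDe follows exactly this shape, just with a real schema and a JSON parser.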

Next, let's walk through the official source code.

JsonSerDe Source Code Analysis

public class JsonSerDe extends AbstractSerDe{

	private static final Logger LOG = LoggerFactory.getLogger(JsonSerDe.class);
	
	List<String> columnNames;
	private StructTypeInfo schema;
	
	 private JsonFactory jsonFactory = null;
	 private TimestampParser tsParser;
	 private StandardStructObjectInspector cachedObjectInspector;
	
	/**
	 * Initialize the table we created: record the column names and column types
	 * and build the ObjectInspector for the row schema.
	 * @param conf		Hadoop configuration
	 * @param tabl		Hive table properties
	 * @see org.apache.hadoop.hive.serde2.AbstractSerDe#initialize(org.apache.hadoop.conf.Configuration, java.util.Properties, java.util.Properties)
	 */
	@Override
	public void initialize(Configuration conf, Properties tabl) throws SerDeException{
		
		List<TypeInfo> columnTypes;
		StructTypeInfo rowTypeInfo;
		
//		Read the column names and the data type of every column from the table properties
		String columnName = tabl.getProperty(serdeConstants.LIST_COLUMNS);
		String columnNameType = tabl.getProperty(serdeConstants.LIST_COLUMN_TYPES);
//		Delimiter for the column-name list: use the table's configured delimiter if present, otherwise a comma
		final String columnNameDelimiter = tabl.containsKey(serdeConstants.COLUMN_NAME_DELIMITER) ? tabl
			      .getProperty(serdeConstants.COLUMN_NAME_DELIMITER) : String.valueOf(SerDeUtils.COMMA);
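//		e.g. for the json_demo table above, tabl would carry roughly:
//		  columns       = "id,name,scores"
//		  columns.types = "int:string:array<double>"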
			  
		LOG.debug("tabl:{}",tabl.entrySet());
			      
		if(columnName.isEmpty()) {
			columnNames = Collections.emptyList();
		}else {
			columnNames = Arrays.asList(columnName.split(columnNameDelimiter));
		}
		
		if(columnNameType.isEmpty()) {
			columnTypes = Collections.emptyList();
		}else {
			columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnNameType);
		}
		
		LOG.debug("column:{},{}",columnName,columnNames);
		LOG.debug("columnTypes:{},{}",columnNameType,columnTypes);
		
//		Column names and types must correspond one to one; otherwise abort
		assert(columnNames.size() == columnTypes.size());
		
/*
 * What getStructTypeInfo does:
 * it takes the column names (List<String> names) and the column types (List<TypeInfo> typeInfos)
 * as parameters, builds a two-element signature list holding both,
 * and uses that signature as the key into the cache cachedStructTypeInfo.
 * If no StructTypeInfo has been cached yet for this (names, typeInfos) pair,
 * a new one is created and inserted (putIfAbsent keeps whichever instance won the race).
 * The returned TypeInfo is the struct type of the whole row,
 * i.e. the table's column names paired with their types.
 *
 *  static ConcurrentHashMap<ArrayList<List<?>>, TypeInfo> cachedStructTypeInfo =
 *    new ConcurrentHashMap<ArrayList<List<?>>, TypeInfo>();
 *
 *  public static TypeInfo getStructTypeInfo(List<String> names, List<TypeInfo> typeInfos) {
 *    ArrayList<List<?>> signature = new ArrayList<List<?>>(2);
 *    signature.add(names);
 *    signature.add(typeInfos);
 *    TypeInfo result = cachedStructTypeInfo.get(signature);
 *    if (result == null) {
 *      result = new StructTypeInfo(names, typeInfos);
 *      TypeInfo prev = cachedStructTypeInfo.putIfAbsent(signature, result);
 *      if (prev != null) {
 *        result = prev;
 *      }
 *    }
 *    return result;
 *  }
 */
		 rowTypeInfo = (StructTypeInfo) TypeInfoFactory.getStructTypeInfo(columnNames, columnTypes);
		    schema = rowTypeInfo;
		    LOG.debug("schema : {}", schema);
//		   Build the standard Java ObjectInspector used to access rows of this struct type
		    cachedObjectInspector = (StandardStructObjectInspector) TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo(rowTypeInfo);
		    
		    jsonFactory = new JsonFactory();
/*		    serdeConstants.TIMESTAMP_FORMATS is the optional "timestamp.formats" table property
 * 			HiveStringUtils.splitAndUnEscape splits it into a String[] of format patterns
 * 			TimestampParser then uses those patterns to parse timestamp strings in the data
*/
		    tsParser = new TimestampParser(
		      HiveStringUtils.splitAndUnEscape(tabl.getProperty(serdeConstants.TIMESTAMP_FORMATS)));
	}
	
	/**
	 * Deserialization:
	 * HDFS file -> InputFileFormat -> <key,value> -> Deserializer -> row object
	 * 
	 * initialize has already set up the table schema; here we take one record
	 * from HDFS and turn it into a row matching the table's columns and types.
	 * @param blob	one JSON record read from HDFS
	 * @return r	the deserialized row as a List of column values
	 * @see org.apache.hadoop.hive.serde2.AbstractSerDe#deserialize(org.apache.hadoop.io.Writable)
	 */
	@Override
	public Object deserialize(Writable blob) throws SerDeException{
		
		 Text t = (Text) blob;
		    JsonParser p;
//		    Create a list r of length columnNames.size(), pre-filled with nulls
		    List<Object> r = new ArrayList<>(Collections.nCopies(columnNames.size(), null));
		    try {
//		     Parse the JSON record read from HDFS
		      p = jsonFactory.createJsonParser(new ByteArrayInputStream((t.getBytes())));
//		      Every record must be a JSON object; fail if the opening "{" is missing
		      if (p.nextToken() != JsonToken.START_OBJECT) {
		        throw new IOException("Start token not found where expected");
		      }
		      JsonToken token;
		      while (((token = p.nextToken()) != JsonToken.END_OBJECT) && (token != null)) {
		        
		        populateRecord(r, token, p, schema);
		      }
		    } catch (JsonParseException e) {
		      LOG.warn("Error [{}] parsing json text [{}].", e, t);
		      throw new SerDeException(e);
		    } catch (IOException e) {
		      LOG.warn("Error [{}] parsing json text [{}].", e, t);
		      throw new SerDeException(e);
		    }

		    return r;
	}
	
	/**
	 * Match one JSON field to its table column and store the extracted value in the row.
	 * @param r	the row being filled
	 * @param token	the current token, expected to be a field name
	 * @param p	the JSON parser
	 * @param s	the struct type of the row
	 * @throws IOException
	 */
	private void populateRecord(List<Object> r, JsonToken token, JsonParser p, StructTypeInfo s) throws IOException {
		if (token != JsonToken.FIELD_NAME) {
	      throw new IOException("Field name expected");
	    }
//		Lower-case the field name (Hive column names are case-insensitive)
	    String fieldName = p.getText().toLowerCase();
//		Find the position of this JSON field among the table's columns
	    int fpos = s.getAllStructFieldNames().indexOf(fieldName);
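//	    e.g. with schema (id, name, scores), fieldName "name" gives fpos = 1,
//	    while a field that appears in the JSON but not in the schema gives -1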
	    if (fpos == -1) {
	      fpos = getPositionFromHiveInternalColumnName(fieldName);
	      LOG.debug("NPE finding position for field [{}] in schema [{}],"
	        + " attempting to check if it is an internal column name like _col0", fieldName, s);
//	      If fpos is still -1, the JSON field matches no table column, so skip its value
	      if (fpos == -1) {
	        skipValue(p);
	        return; 
	      }
	      /*
	       * The field name matched an internal column name (_colN); verify that the
	       * name and the decoded position agree (ignoring case), otherwise fail
	       * */
	      if (!fieldName.equalsIgnoreCase(getHiveInternalColumnName(fpos))) {
	        LOG.error("Hive internal column name {} and position "
	          + "encoding {} for the column name are at odds", fieldName, fpos);
	        throw new IOException("Hive internal column name (" + fieldName
	          + ") and position encoding (" + fpos
	          + ") for the column name are at odds");
	      }
	    }
	    Object currField = extractCurrentField(p, s.getStructFieldTypeInfo(fieldName), false);
	    r.set(fpos, currField);
	  }

	/*
	 * Build the Hive internal column name (e.g. _col10) for a column position
	 */
	  public String getHiveInternalColumnName(int fpos) {
	    return HiveConf.getColumnInternalName(fpos);
	  }

	  /**
	 * If the field name is a Hive internal column name (_col0, _col1, ...),
	 * return the position it encodes, e.g. _col3 -> 3; otherwise return -1.
	 * @param internalName
	 * @return the encoded position, or -1 if the name does not match _col([0-9]+)
	 */
	public int getPositionFromHiveInternalColumnName(String internalName) {
	    
	    Pattern internalPattern = Pattern.compile("_col([0-9]+)");
	    Matcher m = internalPattern.matcher(internalName);
	    if (!m.matches()) {
	      return -1;
	    } else {
	      return Integer.parseInt(m.group(1));
	    }
	  }

	  /**
	   * Skip the value of a JSON field that matched no table column.
	   *
	   * @throws IOException
	   * @throws JsonParseException
	   */
	  private void skipValue(JsonParser p) throws JsonParseException, IOException {
	    JsonToken valueToken = p.nextToken();

	   /*
	    * If the value is itself an array or object, i.e. it opens with [ or {
	    * */
	    if ((valueToken == JsonToken.START_ARRAY) || (valueToken == JsonToken.START_OBJECT)) {
	     /*
			skip all of its children until the matching ] or } has been consumed
	      */
	      p.skipChildren();
	    }
	  }

	/**
	 * Read the current JSON value according to the expected Hive type and
	 * advance the parser past it.
	 * @param p	the JSON parser
	 * @param fieldTypeInfo	the Hive type the value should be converted to
	 * @param isTokenCurrent	whether the parser is already positioned on the value token
	 * @return the extracted value as a Java object, or null
	 * @throws IOException
	 */
	@SuppressWarnings("incomplete-switch")
	private Object extractCurrentField(JsonParser p, TypeInfo fieldTypeInfo,
	    boolean isTokenCurrent) throws IOException {
	    Object val = null;
	    JsonToken valueToken;
	    if (isTokenCurrent) {
	      valueToken = p.getCurrentToken();
	    } else {
	      valueToken = p.nextToken();
	    }
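//	    Dispatch on the expected Hive type: primitive values are read directly
//	    from the parser, while LIST, MAP and STRUCT recurse into their element types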

	    switch (fieldTypeInfo.getCategory()) {
	      case PRIMITIVE:
	        PrimitiveObjectInspector.PrimitiveCategory primitiveCategory = PrimitiveObjectInspector.PrimitiveCategory.UNKNOWN;
	        if (fieldTypeInfo instanceof PrimitiveTypeInfo) {
	          primitiveCategory = ((PrimitiveTypeInfo) fieldTypeInfo).getPrimitiveCategory();
	        }
	        switch (primitiveCategory) {
	          case INT:
	            val = (valueToken == JsonToken.VALUE_NULL) ? null : p.getIntValue();
	            break;
	          case BYTE:
	            val = (valueToken == JsonToken.VALUE_NULL) ? null : p.getByteValue();
	            break;
	          case SHORT:
	            val = (valueToken == JsonToken.VALUE_NULL) ? null : p.getShortValue();
	            break;
	          case LONG:
	            val = (valueToken == JsonToken.VALUE_NULL) ? null : p.getLongValue();
	            break;
	          case BOOLEAN:
	            String bval = (valueToken == JsonToken.VALUE_NULL) ? null : p.getText();
	            if (bval != null) {
	              val = Boolean.valueOf(bval);
	            } else {
	              val = null;
	            }
	            break;
	          case FLOAT:
	            val = (valueToken == JsonToken.VALUE_NULL) ? null : p.getFloatValue();
	            break;
	          case DOUBLE:
	            val = (valueToken == JsonToken.VALUE_NULL) ? null : p.getDoubleValue();
	            break;
	          case STRING:
	            val = (valueToken == JsonToken.VALUE_NULL) ? null : p.getText();
	            break;
	          case BINARY:
	            String b = (valueToken == JsonToken.VALUE_NULL) ? null : p.getText();
	            if (b != null) {
	              try {
	                String t = Text.decode(b.getBytes(), 0, b.getBytes().length);
	                return t.getBytes();
	              } catch (CharacterCodingException e) {
	                LOG.warn("Error generating json binary type from object.", e);
	                return null;
	              }
	            } else {
	              val = null;
	            }
	            break;
	          case DATE:
	            val = (valueToken == JsonToken.VALUE_NULL) ? null : Date.valueOf(p.getText());
	            break;
	          case TIMESTAMP:
	            val = (valueToken == JsonToken.VALUE_NULL) ? null : tsParser.parseTimestamp(p.getText());
	            break;
	          case DECIMAL:
	            val = (valueToken == JsonToken.VALUE_NULL) ? null : HiveDecimal.create(p.getText());
	            break;
	          case VARCHAR:
	            int vLen = ((BaseCharTypeInfo) fieldTypeInfo).getLength();
	            val = (valueToken == JsonToken.VALUE_NULL) ? null : new HiveVarchar(p.getText(), vLen);
	            break;
	          case CHAR:
	            int cLen = ((BaseCharTypeInfo) fieldTypeInfo).getLength();
	            val = (valueToken == JsonToken.VALUE_NULL) ? null : new HiveChar(p.getText(), cLen);
	            break;
	        }
	        break;
	      case LIST:
	        if (valueToken == JsonToken.VALUE_NULL) {
	          val = null;
	          break;
	        }
	        if (valueToken != JsonToken.START_ARRAY) {
	          throw new IOException("Start of Array expected");
	        }
	        List<Object> arr = new ArrayList<Object>();
	        while ((valueToken = p.nextToken()) != JsonToken.END_ARRAY) {
	          arr.add(extractCurrentField(p, ((ListTypeInfo)fieldTypeInfo).getListElementTypeInfo(), true));
	        }
	        val = arr;
	        break;
	      case MAP:
	        if (valueToken == JsonToken.VALUE_NULL) {
	          val = null;
	          break;
	        }
	        if (valueToken != JsonToken.START_OBJECT) {
	          throw new IOException("Start of Object expected");
	        }
	        Map<Object, Object> map = new LinkedHashMap<Object, Object>();
	        while ((valueToken = p.nextToken()) != JsonToken.END_OBJECT) {
	          Object k = getObjectOfCorrespondingPrimitiveType(p.getCurrentName(),
	            (PrimitiveTypeInfo) ((MapTypeInfo)fieldTypeInfo).getMapKeyTypeInfo());
	          Object v = extractCurrentField(p, ((MapTypeInfo) fieldTypeInfo).getMapValueTypeInfo(), false);
	          map.put(k, v);
	        }
	        val = map;
	        break;
	      case STRUCT:
	        if (valueToken == JsonToken.VALUE_NULL) {
	          val = null;
	          break;
	        }
	        if (valueToken != JsonToken.START_OBJECT) {
	          throw new IOException("Start of Object expected");
	        }
	        ArrayList<TypeInfo> subSchema = ((StructTypeInfo)fieldTypeInfo).getAllStructFieldTypeInfos();
	        int sz = subSchema.size();
	        List<Object> struct = new ArrayList<Object>(Collections.nCopies(sz, null));
	        while ((valueToken = p.nextToken()) != JsonToken.END_OBJECT) {
	          populateRecord(struct, valueToken, p, ((StructTypeInfo) fieldTypeInfo));
	        }
	        val = struct;
	        break;
	      default:
	        LOG.error("Unknown type found: " + fieldTypeInfo);
	        return null;
	    }
	    return val;
	  }

	  /**
	   * Convert a JSON map key (always a string in JSON) to the Hive primitive
	   * type declared for the map's keys.
	 * @param s	the raw key string
	 * @param mapKeyType	the Hive primitive type of the map's keys
	 * @return the key converted to the corresponding Java object
	 * @throws IOException
	 */
	@SuppressWarnings("incomplete-switch")
	private Object getObjectOfCorrespondingPrimitiveType(String s, PrimitiveTypeInfo mapKeyType)
	    throws IOException {
	    switch (mapKeyType.getPrimitiveCategory()) {
	      case INT:
	        return Integer.valueOf(s);
	      case BYTE:
	        return Byte.valueOf(s);
	      case SHORT:
	        return Short.valueOf(s);
	      case LONG:
	        return Long.valueOf(s);
	      case BOOLEAN:
	        return (s.equalsIgnoreCase("true"));
	      case FLOAT:
	        return Float.valueOf(s);
	      case DOUBLE:
	        return Double.valueOf(s);
	      case STRING:
	        return s;
	      case BINARY:
	        try {
	          String t = Text.decode(s.getBytes(), 0, s.getBytes().length);
	          return t.getBytes();
	        } catch (CharacterCodingException e) {
	          LOG.warn("Error generating json binary type from object.", e);
	          return null;
	        }
	      case DATE:
	        return Date.valueOf(s);
	      case TIMESTAMP:
	        return Timestamp.valueOf(s);
	      case DECIMAL:
	        return HiveDecimal.create(s);
	      case VARCHAR:
	        return new HiveVarchar(s, ((BaseCharTypeInfo) mapKeyType).getLength());
	      case CHAR:
	        return new HiveChar(s, ((BaseCharTypeInfo) mapKeyType).getLength());
	    }
	    throw new IOException("Could not convert from string to map type " + mapKeyType.getTypeName());
	  }

	 
	  /**
	   * Serialization:
	   * row object -> Serializer -> <key,value> -> OutputFileFormat -> HDFS file
	   * 
	   * @param obj	the row data
	   * @param objInspector	the row's structure: field names and field types
	   * @return a Text holding the row rendered as a JSON object
	   * @see org.apache.hadoop.hive.serde2.AbstractSerDe#serialize(java.lang.Object, org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector)
	   * @throws SerDeException
	   */
	@Override
	  public Writable serialize(Object obj, ObjectInspector objInspector)
	    throws SerDeException {
	    StringBuilder sb = new StringBuilder();
	    try {

	      StructObjectInspector soi = (StructObjectInspector) objInspector;
//	      Get all of the row's fields
	      List<? extends StructField> structFields = soi.getAllStructFieldRefs();
	      assert (columnNames.size() == structFields.size());
	      /*
	      Append the fields to a StringBuilder, rendering the row object
	      as a JSON-formatted string:
	      {
	        "field_1":value_1,
	        "field_2":value_2,
	        ......,
	        "field_n":value_n
	      }
	      The core of this is buildJSONString, which renders a value of any
	      supported Hive type into its JSON representation.
	      */
	      if (obj == null) {
	        sb.append("null");
	      } else {
	        sb.append(SerDeUtils.LBRACE);
	        for (int i = 0; i < structFields.size(); i++) {
	          if (i > 0) {
	            sb.append(SerDeUtils.COMMA);
	          }
	          appendWithQuotes(sb, columnNames.get(i));
	          sb.append(SerDeUtils.COLON);
	          buildJSONString(sb, soi.getStructFieldData(obj, structFields.get(i)),
	            structFields.get(i).getFieldObjectInspector());
	        }
	        sb.append(SerDeUtils.RBRACE);
	      }

	    } catch (IOException e) {
	      LOG.warn("Error generating json text from object.", e);
	      throw new SerDeException(e);
	    }
	    return new Text(sb.toString());
	  }

	  private static StringBuilder appendWithQuotes(StringBuilder sb, String value) {
	    return sb == null ? null : sb.append(SerDeUtils.QUOTE).append(value).append(SerDeUtils.QUOTE);
	  }

	  private static void buildJSONString(StringBuilder sb, Object o, ObjectInspector oi) throws IOException {

	    switch (oi.getCategory()) {
	      case PRIMITIVE: {
	        PrimitiveObjectInspector poi = (PrimitiveObjectInspector) oi;
	        if (o == null) {
	          sb.append("null");
	        } else {
	          switch (poi.getPrimitiveCategory()) {
	            case BOOLEAN: {
	              boolean b = ((BooleanObjectInspector) poi).get(o);
	              sb.append(b ? "true" : "false");
	              break;
	            }
	            case BYTE: {
	              sb.append(((ByteObjectInspector) poi).get(o));
	              break;
	            }
	            case SHORT: {
	              sb.append(((ShortObjectInspector) poi).get(o));
	              break;
	            }
	            case INT: {
	              sb.append(((IntObjectInspector) poi).get(o));
	              break;
	            }
	            case LONG: {
	              sb.append(((LongObjectInspector) poi).get(o));
	              break;
	            }
	            case FLOAT: {
	              sb.append(((FloatObjectInspector) poi).get(o));
	              break;
	            }
	            case DOUBLE: {
	              sb.append(((DoubleObjectInspector) poi).get(o));
	              break;
	            }
	            case STRING: {
	              String s =
	                SerDeUtils.escapeString(((StringObjectInspector) poi).getPrimitiveJavaObject(o));
	              appendWithQuotes(sb, s);
	              break;
	            }
	            case BINARY:
	              byte[] b = ((BinaryObjectInspector) oi).getPrimitiveJavaObject(o);
	              Text txt = new Text();
	              txt.set(b, 0, b.length);
	              appendWithQuotes(sb, SerDeUtils.escapeString(txt.toString()));
	              break;
	            case DATE:
	              Date d = ((DateObjectInspector) poi).getPrimitiveJavaObject(o);
	              appendWithQuotes(sb, d.toString());
	              break;
	            case TIMESTAMP: {
	              Timestamp t = ((TimestampObjectInspector) poi).getPrimitiveJavaObject(o);
	              appendWithQuotes(sb, t.toString());
	              break;
	            }
	            case DECIMAL:
	              sb.append(((HiveDecimalObjectInspector) poi).getPrimitiveJavaObject(o));
	              break;
	            case VARCHAR: {
	              String s = SerDeUtils.escapeString(
	                ((HiveVarcharObjectInspector) poi).getPrimitiveJavaObject(o).toString());
	              appendWithQuotes(sb, s);
	              break;
	            }
	            case CHAR: {
	              String s = SerDeUtils.escapeString(
	                ((HiveCharObjectInspector) poi).getPrimitiveJavaObject(o).toString());
	              appendWithQuotes(sb, s);
	              break;
	            }
	            default:
	              throw new RuntimeException("Unknown primitive type: " + poi.getPrimitiveCategory());
	          }
	        }
	        break;
	      }
	      case LIST: {
	        ListObjectInspector loi = (ListObjectInspector) oi;
	        ObjectInspector listElementObjectInspector = loi
	          .getListElementObjectInspector();
	        List<?> olist = loi.getList(o);
	        if (olist == null) {
	          sb.append("null");
	        } else {
	          sb.append(SerDeUtils.LBRACKET);
	          for (int i = 0; i < olist.size(); i++) {
	            if (i > 0) {
	              sb.append(SerDeUtils.COMMA);
	            }
	            buildJSONString(sb, olist.get(i), listElementObjectInspector);
	          }
	          sb.append(SerDeUtils.RBRACKET);
	        }
	        break;
	      }
	      case MAP: {
	        MapObjectInspector moi = (MapObjectInspector) oi;
	        ObjectInspector mapKeyObjectInspector = moi.getMapKeyObjectInspector();
	        ObjectInspector mapValueObjectInspector = moi
	          .getMapValueObjectInspector();
	        Map<?, ?> omap = moi.getMap(o);
	        if (omap == null) {
	          sb.append("null");
	        } else {
	          sb.append(SerDeUtils.LBRACE);
	          boolean first = true;
	          for (Object entry : omap.entrySet()) {
	            if (first) {
	              first = false;
	            } else {
	              sb.append(SerDeUtils.COMMA);
	            }
	            Map.Entry<?, ?> e = (Map.Entry<?, ?>) entry;
	            StringBuilder keyBuilder = new StringBuilder();
	            buildJSONString(keyBuilder, e.getKey(), mapKeyObjectInspector);
	            String keyString = keyBuilder.toString().trim();
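	            // JSON object keys must be strings: if the rendered key is not already
	            // quoted (e.g. a numeric map key), wrap it in quotes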
	            if ((!keyString.isEmpty()) && (keyString.charAt(0) != SerDeUtils.QUOTE)) {
	              appendWithQuotes(sb, keyString);
	            } else {
	              sb.append(keyString);
	            }
	            sb.append(SerDeUtils.COLON);
	            buildJSONString(sb, e.getValue(), mapValueObjectInspector);
	          }
	          sb.append(SerDeUtils.RBRACE);
	        }
	        break;
	      }
	      case STRUCT: {
	        StructObjectInspector soi = (StructObjectInspector) oi;
	        List<? extends StructField> structFields = soi.getAllStructFieldRefs();
	        if (o == null) {
	          sb.append("null");
	        } else {
	          sb.append(SerDeUtils.LBRACE);
	          for (int i = 0; i < structFields.size(); i++) {
	            if (i > 0) {
	              sb.append(SerDeUtils.COMMA);
	            }
	            appendWithQuotes(sb, structFields.get(i).getFieldName());
	            sb.append(SerDeUtils.COLON);
	            buildJSONString(sb, soi.getStructFieldData(o, structFields.get(i)),
	              structFields.get(i).getFieldObjectInspector());
	          }
	          sb.append(SerDeUtils.RBRACE);
	        }
	        break;
	      }
	      case UNION: {
	        UnionObjectInspector uoi = (UnionObjectInspector) oi;
	        if (o == null) {
	          sb.append("null");
	        } else {
	          sb.append(SerDeUtils.LBRACE);
	          sb.append(uoi.getTag(o));
	          sb.append(SerDeUtils.COLON);
	          buildJSONString(sb, uoi.getField(o),
	            uoi.getObjectInspectors().get(uoi.getTag(o)));
	          sb.append(SerDeUtils.RBRACE);
	        }
	        break;
	      }
	      default:
	        throw new RuntimeException("Unknown type in ObjectInspector!");
	    }
	  }


//	 Return the cached ObjectInspector describing the row schema (field names and types)
	  @Override
	  public ObjectInspector getObjectInspector() throws SerDeException {
	    return cachedObjectInspector;
	  }

	  @Override
	  public Class<? extends Writable> getSerializedClass() {
	    return Text.class;
	  }

	  @Override
	  public SerDeStats getSerDeStats() {
	    // no support for statistics yet
	    return null;
	  }
}
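
Finally, a quick way to see deserialize and serialize working together is to drive the SerDe by hand. The harness below is a sketch under assumptions: it runs JsonSerDe standalone with hand-built table properties ("columns" and "columns.types" are the keys behind serdeConstants.LIST_COLUMNS and LIST_COLUMN_TYPES), whereas in a real deployment Hive initializes the SerDe itself from the table metadata; the printed output is what I would expect, not verified here.

import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.io.Text;
import org.apache.hive.hcatalog.data.JsonSerDe;

public class JsonSerDeDemo {
	public static void main(String[] args) throws Exception {
		// Schema: id int, name string, scores array<double>
		Properties props = new Properties();
		props.setProperty("columns", "id,name,scores");
		props.setProperty("columns.types", "int:string:array<double>");

		JsonSerDe serde = new JsonSerDe();
		serde.initialize(new Configuration(), props);

		// Deserialize: one JSON line -> row object (a List matching the schema)
		Object row = serde.deserialize(
			new Text("{\"id\":1,\"name\":\"alice\",\"scores\":[95.5,88.0]}"));
		System.out.println(row); // expected: [1, alice, [95.5, 88.0]]

		// Serialize: row object + its ObjectInspector -> JSON text again
		ObjectInspector oi = serde.getObjectInspector();
		Text json = (Text) serde.serialize(row, oi);
		System.out.println(json); // expected: {"id":1,"name":"alice","scores":[95.5,88.0]}
	}
}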
