Multi-character delimiters when creating Hive tables

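A Hive table is normally created to match the delimiters used in the underlying data. There are three ways to specify those delimiters.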

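1. Default delimiters

\n  line delimiter
^A  field delimiter, \001 in octal
^B  delimiter between elements of an ARRAY or STRUCT, and between key-value pairs of a MAP, \002 in octal
^C  delimiter between a key and its value inside a MAP, \003 in octal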

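These defaults apply when a table is created without a ROW FORMAT clause, and they can also be written out explicitly at table creation: ^A is \001, ^B is \002, and ^C is \003. Pick the delimiters that match the format of your data. To type these control characters in vim (insert mode), press Ctrl+V followed by Ctrl+A for \001, Ctrl+V followed by Ctrl+B for \002, and so on.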
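As a minimal sketch, the defaults listed above can be spelled out explicitly in a DDL statement. The table name demo_defaults and its columns are hypothetical and chosen only for illustration; the octal escapes are the ones given above.

```sql
-- Hypothetical table that writes out the delimiters Hive would otherwise use by default:
--   \001 (^A) between fields
--   \002 (^B) between array elements and between map entries
--   \003 (^C) between a map key and its value
CREATE TABLE demo_defaults (
  id    STRING,
  tags  ARRAY<STRING>,
  props MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\002'
  MAP KEYS TERMINATED BY '\003'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```

Omitting the whole ROW FORMAT clause should give the same layout, so the explicit form is mainly useful as documentation.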

2. Specifying a single character as the delimiter

create external table an_dimension_area(
area_id string,
county_name string,
city_id string,
city_name string,
province_id string,
province_name string
)
row format delimited
fields terminated by ','
STORED AS TEXTFILE;

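The statement above uses "," as the field delimiter. Fields can just as well be separated by \t, with rows separated by \n; choose the field delimiter that suits your data. Note that Hive accepts only '\n' for LINES TERMINATED BY.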

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'

3. Using a multi-character string as the delimiter

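     Our data is delimited by @@@. Neither approach above can handle it, because FIELDS TERMINATED BY accepts only a single-character delimiter.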

  ① Using MultiDelimitSerDe

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ("field.delim"="@@@") STORED AS TEXTFILE;
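For context, the clause might sit in a complete statement like the sketch below. The table demo_multi_delim, its two columns, and the sample line are hypothetical. Depending on the Hive version, the hive-contrib jar may first have to be added to the session; the jar path shown in the comment is only a placeholder.

```sql
-- Hypothetical two-column table for lines such as: 1001@@@Beijing
-- If the class is not found, the hive-contrib jar may need to be registered first, e.g.
--   ADD JAR /path/to/hive-contrib.jar;
CREATE EXTERNAL TABLE demo_multi_delim (
  city_id   STRING,
  city_name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim" = "@@@")
STORED AS TEXTFILE;
```

Note that no LINES TERMINATED BY clause follows ROW FORMAT SERDE: that clause belongs to the ROW FORMAT DELIMITED form only.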

② Using RegexSerDe

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ("input.regex" = "^(.*)\\@\\@\\@(.*)$") STORED AS TEXTFILE;
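A sketch of a full statement using this SerDe is shown below. The table demo_regex and its columns are hypothetical. The regex needs one capture group per column, and the columns are declared as STRING because the contrib RegexSerDe works with string columns.

```sql
-- Hypothetical two-column table for lines such as: 1001@@@Beijing
-- Capture group 1 maps to city_id, capture group 2 maps to city_name.
CREATE EXTERNAL TABLE demo_regex (
  city_id   STRING,
  city_name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "^(.*)\\@\\@\\@(.*)$")
STORED AS TEXTFILE;
```

Because (.*) is greedy, a line containing more than one @@@ would put everything up to the last @@@ into the first column. For data with more fields, add one group per field and consider non-greedy groups such as (.*?).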

 

Reference: https://blog.csdn.net/u013150378/article/details/90766209
