Hive Multi-Delimiter Support Example

Problem Description

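How can a data file whose fields are separated by a multi-character delimiter be loaded into a Hive table? Sample data is shown below; the field delimiter is "@#$".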

test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
test3@#$test3name@#$test4value
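To see why a single-character delimiter is not enough, here is a small Python sketch (not part of the original walkthrough) that splits one sample line in several ways. Splitting on "@" alone leaves "#$" attached to the following fields, and treating "@#$" as a regular expression fails because "$" is an end-of-string anchor. Only a split on the full delimiter taken literally yields three clean fields.

```python
import re

line = "test1@#$test1name@#$test2value"

# Splitting on a single character leaves the rest of the delimiter in the data.
single_char = line.split("@")

# As a regex, "@#$" means "@#" at the end of the string, so nothing matches here.
naive_regex = re.split("@#$", line)

# Treating the whole delimiter as a literal string gives the intended fields.
literal = line.split("@#$")
escaped_regex = re.split(re.escape("@#$"), line)

print(single_char)    # ['test1', '#$test1name', '#$test2value']
print(naive_regex)    # ['test1@#$test1name@#$test2value']
print(literal)        # ['test1', 'test1name', 'test2value']
print(escaped_regex)  # ['test1', 'test1name', 'test2value']
```

The same trap applies to any regex-based splitting, including Hive's built-in split(str, pat), whose pattern argument is a regular expression.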

Hive Multi-Delimiter Support

Hive 0.14 and later support multi-character field delimiters through MultiDelimitSerDe; see https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
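Conceptually, the SerDe splits each text line on the literal field.delim string and maps the pieces to the table columns in order. The Python function below is a simplified model of that idea, not Hive's implementation. The name deserialize_row is hypothetical, the second sample row is made up, and the handling of short and long rows (missing trailing columns become NULL, extra fields are dropped) is an assumption based on how Hive's default text SerDe behaves.

```python
from typing import List, Optional


def deserialize_row(line: str, columns: List[str], field_delim: str) -> List[Optional[str]]:
    """Hypothetical model of a multi-delimiter SerDe reading one text line."""
    # Split on the delimiter as a literal string, never as a regex.
    fields = line.rstrip("\n").split(field_delim)
    # Assumption: fields beyond the declared columns are ignored ...
    fields = fields[:len(columns)]
    # ... and missing trailing columns are NULL (None here).
    return fields + [None] * (len(columns) - len(fields))


columns = ["s1", "s2", "s3"]
print(deserialize_row("test3@#$test3name@#$test4value", columns, "@#$"))
# ['test3', 'test3name', 'test4value']
print(deserialize_row("test4@#$only-two", columns, "@#$"))
# ['test4', 'only-two', None]
```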

Steps

1. Prepare the multi-delimiter data file and upload it to the corresponding HDFS directory

[root@server03 data]# more multi_delimiter_test.dat
test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
test3@#$test3name@#$test4value
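If the file still has to be produced, or you want to catch malformed lines before querying, a small Python sketch can write the sample rows and check that every line splits into exactly three fields. The file name matches the listing above and the rows are copied from the sample data; the helper names write_data_file and bad_lines are hypothetical, and the hdfs dfs commands in the final comment are just one common way to upload the file to the table's LOCATION.

```python
from pathlib import Path
from typing import List, Tuple

DELIM = "@#$"
EXPECTED_FIELDS = 3  # s1, s2, s3 in the table definition

rows = [
    ("test1", "test1name", "test2value"),
    ("test2", "test2name", "test2value"),
    ("test3", "test3name", "test4value"),
]


def write_data_file(path: Path) -> None:
    # One record per line, fields joined by the literal multi-character delimiter.
    path.write_text("".join(DELIM.join(r) + "\n" for r in rows), encoding="utf-8")


def bad_lines(path: Path) -> List[Tuple[int, str]]:
    # Return (line number, text) for every line that does not have exactly 3 fields.
    problems = []
    for no, text in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
        if len(text.split(DELIM)) != EXPECTED_FIELDS:
            problems.append((no, text))
    return problems


if __name__ == "__main__":
    data = Path("multi_delimiter_test.dat")
    write_data_file(data)
    print(bad_lines(data))  # an empty list means every line has three fields
    # Then upload, for example:
    #   hdfs dfs -mkdir -p /fayson/multi_delimiter_test
    #   hdfs dfs -put multi_delimiter_test.dat /fayson/multi_delimiter_test/
```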
2. Create a table over the multi-delimiter file
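-- MultiDelimitSerDe ships in the hive-contrib jar; depending on the
-- installation it may need to be added to the classpath (e.g. ADD JAR) first.
create external table multi_delimiter_test(
s1 string,
s2 string,
s3 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="@#$")
STORED AS TEXTFILE
LOCATION '/fayson/multi_delimiter_test';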

3. Test

2: jdbc:hive2://localhost:10000/default>  select * from multi_delimiter_test;
+--------------------------+--------------------------+--------------------------+--+
| multi_delimiter_test.s1  | multi_delimiter_test.s2  | multi_delimiter_test.s3  |
+--------------------------+--------------------------+--------------------------+--+
| test1                    | test1name                | test2value               |
| test2                    | test2name                | test2value               |
| test3                    | test3name                | test4value               |
+--------------------------+--------------------------+--------------------------+--+
