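Problem description
How do you load a data file whose fields are separated by a multi-character delimiter into a Hive table? The sample data below uses "@#$" as the field delimiter: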
test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
test3@#$test3name@#$test4value
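Multi-character delimiter support in Hive
Hive 0.14 and later support multi-character field delimiters through MultiDelimitSerDe. See https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
Steps
1. Prepare the multi-delimiter file and upload it to the corresponding HDFS directory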
[root@server03 data]# more multi_delimiter_test.dat
test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
test3@#$test3name@#$test4value
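The upload itself is not shown above. A minimal sketch of one way to do it, assuming an HDFS client is on the PATH and that the target directory is the table location used in step 2 (/fayson/multi_delimiter_test):

```shell
# Create the sample file locally. The quoted 'EOF' keeps the shell from
# expanding "$t..." inside the delimiter "@#$".
cat > multi_delimiter_test.dat <<'EOF'
test1@#$test1name@#$test2value
test2@#$test2name@#$test2value
test3@#$test3name@#$test4value
EOF

# Upload to HDFS only if the hdfs command is available on this machine.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p /fayson/multi_delimiter_test
  hdfs dfs -put -f multi_delimiter_test.dat /fayson/multi_delimiter_test/
fi
```

The -f flag overwrites an existing copy of the file, which is convenient when repeating the test.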
2. Create a table for the multi-delimiter file
create external table multi_delimiter_test(
  s1 string,
  s2 string,
  s3 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="@#$")
stored as textfile
location '/fayson/multi_delimiter_test';
3. Test
2: jdbc:hive2://localhost:10000/default> select * from multi_delimiter_test;
+--------------------------+--------------------------+--------------------------+--+
| multi_delimiter_test.s1  | multi_delimiter_test.s2  | multi_delimiter_test.s3  |
+--------------------------+--------------------------+--------------------------+--+
| test1                    | test1name                | test2value               |
| test2                    | test2name                | test2value               |
| test3                    | test3name                | test4value               |
+--------------------------+--------------------------+--------------------------+--+
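To cross-check the query result without Hive, the same three-way split can be reproduced locally. This is only a sketch: split_fields is a hypothetical helper name, and awk stands in for the SerDe here rather than reproducing its implementation.

```shell
# Split each input line on the literal three-character delimiter "@#$".
# awk treats a multi-character -F value as a regular expression, so the
# "$" is written as "[$]" to match it literally.
split_fields() {
  awk -F '@#[$]' '{ printf "%s | %s | %s\n", $1, $2, $3 }' "$@"
}

printf '%s\n' \
  'test1@#$test1name@#$test2value' \
  'test2@#$test2name@#$test2value' \
  'test3@#$test3name@#$test4value' | split_fields
```

Each output line should contain the same three values as the corresponding row returned by the select statement.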