Processing CSV Files with Hive

Several third-party SerDes are available for handling CSV data in Hive. This article uses CSVSerde from the csv-serde project.

1. Add the third-party SerDe

First, add the third-party SerDe JAR to the Hive classpath with the following command:

hive> add jar /home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/lib/csv-serde-1.1.2.jar;
Added /home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/lib/csv-serde-1.1.2.jar to class path
Added resource: /home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/lib/csv-serde-1.1.2.jar

The JAR is available for download as csv-serde-1.1.2.jar. The steps below use a sample CSV file to illustrate the process.
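
ADD JAR registers the JAR only for the current Hive session, so it has to be repeated in each new session unless the JAR is configured permanently. As a quick check that the registration above took effect, the session's resources can be listed. This is a minimal sketch for the Hive CLI; the output format may vary between versions.

```sql
-- List the JAR resources registered in the current session;
-- the csv-serde JAR added above should appear in the result.
LIST JARS;
```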

2. Format of the sample CSV log file

1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!air, moon roof, loaded",4799.00
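
The fields are comma-separated and represent, in order: year, make, model, description, and price. A field that contains a comma or a double quote is wrapped in double quotes, and an embedded double quote is written as two consecutive double quotes.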

3. Create the Hive table

hive> CREATE TABLE serde_csv(year STRING,company STRING,type STRING,description STRING,value STRING)
    > ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
    > STORED AS TEXTFILE ;
OK
Time taken: 0.072 seconds
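
The table above relies on the SerDe's default separator (comma) and quote character (double quote). For files that use other characters, the csv-serde project documents the SerDe properties separatorChar, quoteChar, and escapeChar. The following is a sketch only: the table name serde_tsv and the tab-separated, single-quoted input are hypothetical and do not come from the example in this article.

```sql
-- Hypothetical table for a tab-separated file whose fields are quoted with single quotes.
-- serde_tsv is an illustrative name; the columns mirror serde_csv.
CREATE TABLE serde_tsv(year STRING, company STRING, type STRING, description STRING, value STRING)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "\t",   -- field separator
  "quoteChar"     = "'",    -- character that wraps quoted fields
  "escapeChar"    = "\\"    -- character that escapes the next character
)
STORED AS TEXTFILE;
```

When none of these properties is set, the defaults apply, which is what the serde_csv table uses.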

4. Load the data

hive> LOAD DATA LOCAL INPATH "/home/hadoopUser/data/csv_serde.txt" INTO TABLE serde_csv;
Copying data from file:/home/hadoopUser/data/csv_serde.txt
Copying file: file:/home/hadoopUser/data/csv_serde.txt
Loading data to table hive.serde_csv
Table hive.serde_csv stats: [numFiles=1, numRows=0, totalSize=259, rawDataSize=0]
OK
Time taken: 0.389 seconds

5. View the imported CSV data in Hive

hive> select * from serde_csv;
OK
1997    Ford    E350    ac, abs, moon   3000.00
1999    Chevy   Venture "Extended Edition"              4900.00
1999    Chevy   Venture "Extended Edition, Very Large"          5000.00
1996    Jeep    Grand Cherokee  MUST SELL!air, moon roof, loaded        4799.00
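
Every column in serde_csv is declared as STRING, so numeric filtering or sorting needs an explicit cast. A minimal sketch, assuming the table and data loaded above (the alias price is illustrative):

```sql
-- Cast the string columns to numeric types before comparing or sorting.
SELECT company, type, CAST(value AS DOUBLE) AS price
FROM serde_csv
WHERE CAST(year AS INT) >= 1997
ORDER BY price DESC;
```

Without the casts, the comparison and the sort would operate on the string values rather than on numbers.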


Reference: http://ogrodnek.github.io/csv-serde/
