Importing JSON-formatted data into Hive (Hive partitioned tables)

When you create an external partitioned table in Hive, how do you load external data that is in JSON format?

The JSON records themselves do not need to contain the partition column; the partitioning only needs to be reflected in the HDFS directory structure.
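For example, a directory layout like the following is enough for Hive to associate the file with a partition (the partition value 2023010100 is illustrative; the source leaves the actual value blank):

```
/tmp/new/
└── datehour=2023010100/
    └── abc.json
```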

This all follows this guide: http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/

hive> ADD JAR /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar;
Added /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar to class path
Added resource: /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar

In /tmp/new I have a file abc.json 
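For the JSON SerDe, each line of abc.json should be a single self-contained JSON object whose keys match the table's column names. A minimal illustrative record (all values invented for this sketch) might look like:

```json
{"id": 1, "created_at": "Mon Jan 01 00:00:00 +0000 2023", "source": "web", "favorited": false, "text": "hello hive", "user": {"screen_name": "alice", "name": "Alice", "friends_count": 10, "followers_count": 20, "statuses_count": 5, "verified": false, "utc_offset": 0, "time_zone": "UTC"}, "in_reply_to_screen_name": null}
```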

The CREATE EXTERNAL TABLE statement runs successfully, but at first the table returns no data:

 

hive> 
    > CREATE EXTERNAL TABLE tweets (
    >   id BIGINT,
    >   created_at STRING,
    >   source STRING,
    >   favorited BOOLEAN,
    >   retweeted_status STRUCT<
    >     text:STRING,
    >     user:STRUCT<screen_name:STRING,name:STRING>,
    >     retweet_count:INT>,
    >   entities STRUCT<
    >     urls:ARRAY<STRUCT<expanded_url:STRING>>,
    >     user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    >     hashtags:ARRAY<STRUCT<text:STRING>>>,
    >   text STRING,
    >   user STRUCT<
    >     screen_name:STRING,
    >     name:STRING,
    >     friends_count:INT,
    >     followers_count:INT,
    >     statuses_count:INT,
    >     verified:BOOLEAN,
    >     utc_offset:INT,
    >     time_zone:STRING>,
    >   in_reply_to_screen_name STRING
    > )
    > PARTITIONED BY (datehour INT)
    > ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
    > LOCATION '/tmp/new';
OK
Time taken: 0.142 seconds

Note that partitions must be added to an external partitioned table manually.

The concrete steps:

1.) Run the CREATE TABLE statement.
2.) In the directory /tmp/new/ create a subdirectory datehour=, and then put your .json file inside it. In other words, you only need to create a directory on HDFS that encodes the partition information, put the data into the corresponding directory, and then run ADD PARTITION.
3.) Run the ALTER TABLE statement that adds this partition to the metadata:
    alter table tweets add partition(datehour=);
4.) Now run the SELECT statement.
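Putting steps 2–4 together, the Hive session would look roughly like the following sketch (the partition value 2023010100 is illustrative; the source leaves the actual value blank):

```sql
-- Step 2 is done outside Hive: create /tmp/new/datehour=2023010100/
-- on HDFS and put abc.json inside it.

-- Step 3: register the directory as a partition in the metastore.
ALTER TABLE tweets ADD PARTITION (datehour=2023010100);

-- An explicit LOCATION can also be given if the directory
-- is not directly under the table's location:
-- ALTER TABLE tweets ADD PARTITION (datehour=2023010100)
--   LOCATION '/tmp/new/datehour=2023010100';

-- Step 4: the data in that directory is now visible.
SELECT id, text FROM tweets WHERE datehour = 2023010100;
```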

Reference: https://www.cnblogs.com/amazement/p/4885903.html
