Creating Partitioned Tables in Hive

1. Creating a partitioned table

hive (default)> create table order_partition(
              > order_number string,
              > event_time string
              > )
              > PARTITIONED BY(event_month string)
              > row format delimited fields terminated by '\t';
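As a quick sanity check, you can describe the table: the partition column should appear both in the column list and under a separate `# Partition Information` section, confirming it is a partition column rather than a regular data column.

```sql
hive (default)> desc order_partition;
```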

The only difference from a regular table is the extra PARTITIONED BY clause. If creating the partitioned table throws a garbled-character (mojibake) exception, the fix is on the MySQL side. Changing the database's default character set does not affect tables that already exist, so you also need to convert those tables' encoding:

alter database ruoze_hive3 character set latin1;
(the metastore database created when Hive starts)
use hive3;
alter table PARTITIONS convert to character set latin1;
alter table PARTITION_KEYS convert to character set latin1;

2. Loading data into the partitioned table
Location and contents of order_created.txt:

[root@hadoop001 data]# pwd
/opt/data
[root@hadoop001 data]# cat order_created.txt 
10703007267488  2014-05-01 06:01:12.334+01
10101043505096  2014-05-01 07:28:12.342+01
10103043509747  2014-05-01 07:50:12.33+01
10103043501575  2014-05-01 09:27:12.33+01
10104043514061  2014-05-01 09:03:12.324+01

Load the contents of order_created.txt into the partitioned table:

load data local inpath '/opt/data/order_created.txt' overwrite into table order_partition PARTITION(event_month='201405');

Check the result:

hive (default)> select * from order_partition;
OK
order_partition.order_number    order_partition.event_time      order_partition.event_month
10703007267488  2014-05-01 06:01:12.334+01      201405
10101043505096  2014-05-01 07:28:12.342+01      201405
10103043509747  2014-05-01 07:50:12.33+01       201405
10103043501575  2014-05-01 09:27:12.33+01       201405
10104043514061  2014-05-01 09:03:12.324+01      201405
Time taken: 0.656 seconds, Fetched: 5 row(s)

Notice that the output has one more column than the original order_created.txt: event_month, the partition column. Partition columns are called pseudo-columns; they are the fields listed after PARTITIONED BY, not fields stored inside the data files.
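Because the partition column maps to a directory name rather than to data inside the files, filtering on it lets Hive read only the matching directories (partition pruning). For example, this query scans only the event_month=201405 directory:

```sql
hive (default)> select * from order_partition where event_month='201405';
```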

3. Inspecting the partition metadata in MySQL and the data on HDFS
Metadata:

mysql> select * from TBLS \G
*************************** 4. row ***************************
            TBL_ID: 16
       CREATE_TIME: 1507409258
             DB_ID: 1
  LAST_ACCESS_TIME: 0
             OWNER: root
         RETENTION: 0
             SD_ID: 16
          TBL_NAME: order_partition
          TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
4 rows in set (0.00 sec)
mysql> select * from PARTITIONS;
+---------+-------------+------------------+--------------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME          | SD_ID | TBL_ID |
+---------+-------------+------------------+--------------------+-------+--------+
|       1 |  1507409971 |                0 | event_month=201405 |    21 |     16 |
+---------+-------------+------------------+--------------------+-------+--------+
1 row in set (0.00 sec)

mysql> select * from  PARTITION_KEYS;
+--------+--------------+-------------+-----------+-------------+
| TBL_ID | PKEY_COMMENT | PKEY_NAME   | PKEY_TYPE | INTEGER_IDX |
+--------+--------------+-------------+-----------+-------------+
|     16 | NULL         | event_month | string    |           0 |
+--------+--------------+-------------+-----------+-------------+
1 row in set (0.00 sec)

Data on HDFS:

[root@hadoop001 data]# hadoop fs -ls /user/hive/warehouse/order_partition/event_month=201405
Found 1 items
-rwxr-xr-x   1 root supergroup        211 2017-10-08 04:59 /user/hive/warehouse/order_partition/event_month=201405/order_created.txt
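You can also ask Hive which partitions the metastore has registered; it is this list, not the directory layout on HDFS alone, that determines what a query can see:

```sql
hive (default)> show partitions order_partition;
event_month=201405
```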

4. Loading data from HDFS into the partitioned table
Create an event_month=201406 directory:

[root@hadoop001 data]# hadoop fs -mkdir /user/hive/warehouse/order_partition/event_month=201406

Put order_created.txt into the event_month=201406 directory:

[root@hadoop001 data]# hadoop fs -put order_created.txt /user/hive/warehouse/order_partition/event_month=201406/

Check whether the new data shows up in the order_partition table:

hive (default)> select * from order_partition;
OK
order_partition.order_number    order_partition.event_time      order_partition.event_month
10703007267488  2014-05-01 06:01:12.334+01      201405
10101043505096  2014-05-01 07:28:12.342+01      201405
10103043509747  2014-05-01 07:50:12.33+01       201405
10103043501575  2014-05-01 09:27:12.33+01       201405
10104043514061  2014-05-01 09:03:12.324+01      201405
Time taken: 0.178 seconds, Fetched: 5 row(s)

Strange: why didn't the data show up? Because the files were written to HDFS behind Hive's back, the metastore has no record of the event_month=201406 partition. Per the official documentation, the partition can be registered so the data becomes queryable:

hive (default)> ALTER TABLE order_partition ADD IF NOT EXISTS PARTITION(event_month='201406');

Query again, and the data is now visible:

hive (default)>  select * from order_partition;
OK
order_partition.order_number    order_partition.event_time      order_partition.event_month
10703007267488  2014-05-01 06:01:12.334+01      201405
10101043505096  2014-05-01 07:28:12.342+01      201405
10103043509747  2014-05-01 07:50:12.33+01       201405
10103043501575  2014-05-01 09:27:12.33+01       201405
10104043514061  2014-05-01 09:03:12.324+01      201405
10703007267488  2014-05-01 06:01:12.334+01      201406
10101043505096  2014-05-01 07:28:12.342+01      201406
10103043509747  2014-05-01 07:50:12.33+01       201406
10103043501575  2014-05-01 09:27:12.33+01       201406
10104043514061  2014-05-01 09:03:12.324+01      201406
Time taken: 0.182 seconds, Fetched: 10 row(s)
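Instead of adding each missing partition by hand, Hive can also scan the table's directory on HDFS and register every unregistered partition in one statement:

```sql
hive (default)> MSCK REPAIR TABLE order_partition;
```

This is convenient when many partition directories have been created directly with hadoop fs commands.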

Alternatively, put order_created.txt on HDFS and load it from there (note there is no LOCAL keyword this time):

hive (default)> load data inpath 'hdfs://hadoop001:8020/data/order_created.txt' overwrite into table order_partition PARTITION(event_month='201407');

This data is loaded as well:

hive (default)> select * from order_partition;
OK
order_partition.order_number    order_partition.event_time      order_partition.event_month
10703007267488  2014-05-01 06:01:12.334+01      201405
10101043505096  2014-05-01 07:28:12.342+01      201405
10103043509747  2014-05-01 07:50:12.33+01       201405
10103043501575  2014-05-01 09:27:12.33+01       201405
10104043514061  2014-05-01 09:03:12.324+01      201405
10703007267488  2014-05-01 06:01:12.334+01      201406
10101043505096  2014-05-01 07:28:12.342+01      201406
10103043509747  2014-05-01 07:50:12.33+01       201406
10103043501575  2014-05-01 09:27:12.33+01       201406
10104043514061  2014-05-01 09:03:12.324+01      201406
10703007267488  2014-05-01 06:01:12.334+01      201407
10101043505096  2014-05-01 07:28:12.342+01      201407
10103043509747  2014-05-01 07:50:12.33+01       201407
10103043501575  2014-05-01 09:27:12.33+01       201407
10104043514061  2014-05-01 09:03:12.324+01      201407
Time taken: 0.153 seconds, Fetched: 15 row(s)
