1. Creating a Partitioned Table
hive (default)> create table order_partition(
> order_number string,
> event_time string
> )
> PARTITIONED BY(event_month string)
> row format delimited fields terminated by '\t';
The only difference from a regular table is the extra PARTITIONED BY clause. Note that the partition column must not also appear in the column list between the parentheses.
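A quick way to confirm that event_month was registered as a partition column rather than a regular column is to describe the table (standard HiveQL; the exact output layout depends on your Hive version):
hive (default)> desc order_partition;
For a partitioned table, event_month should appear again under the "# Partition Information" section of the output.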
If creating the partitioned table throws a garbled-character (encoding) exception, the fix is to change the MySQL character set. Altering the database does not convert tables that already exist, so the encoding of the existing metastore tables has to be converted separately:
alter database ruoze_hive3 character set latin1;
(ruoze_hive3 is the metastore database that was created when Hive started up)
use ruoze_hive3;
alter table PARTITIONS convert to character set latin1;
alter table PARTITION_KEYS convert to character set latin1;
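If you are unsure whether the conversion is needed, you can first inspect the current character sets with standard MySQL commands (the table name below assumes the default Hive metastore schema):
mysql> show variables like 'character_set%';
mysql> show create table PARTITIONS\G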
2. Loading Data into the Partitioned Table
Location and contents of order_created.txt:
[root@hadoop001 data]# pwd
/opt/data
[root@hadoop001 data]# cat order_created.txt
10703007267488 2014-05-01 06:01:12.334+01
10101043505096 2014-05-01 07:28:12.342+01
10103043509747 2014-05-01 07:50:12.33+01
10103043501575 2014-05-01 09:27:12.33+01
10104043514061 2014-05-01 09:03:12.324+01
Load the contents of order_created.txt into the partitioned table:
load data local inpath '/opt/data/order_created.txt' overwrite into table order_partition PARTITION(event_month='201405');
Check the result:
hive (default)> select * from order_partition;
OK
order_partition.order_number order_partition.event_time order_partition.event_month
10703007267488 2014-05-01 06:01:12.334+01 201405
10101043505096 2014-05-01 07:28:12.342+01 201405
10103043509747 2014-05-01 07:50:12.33+01 201405
10103043501575 2014-05-01 09:27:12.33+01 201405
10104043514061 2014-05-01 09:03:12.324+01 201405
Time taken: 0.656 seconds, Fetched: 5 row(s)
Notice that, compared with the original contents of order_created.txt, each row has an extra column, event_month: the partition column. Partition columns are called pseudo-columns; they are the fields declared after PARTITIONED BY, and their values are stored in the partition's directory name rather than in the data files themselves.
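Because the partition value lives in the directory name, filtering on the pseudo-column lets Hive read only the matching directory (partition pruning). For example, this query touches nothing outside /user/hive/warehouse/order_partition/event_month=201405:
hive (default)> select * from order_partition where event_month='201405';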
3. The Partitioned Table's Metadata in MySQL and Its Data on HDFS
Metadata:
mysql> select * from TBLS \G
*************************** 4. row ***************************
TBL_ID: 16
CREATE_TIME: 1507409258
DB_ID: 1
LAST_ACCESS_TIME: 0
OWNER: root
RETENTION: 0
SD_ID: 16
TBL_NAME: order_partition
TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
4 rows in set (0.00 sec)
mysql> select * from PARTITIONS;
+---------+-------------+------------------+--------------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+--------------------+-------+--------+
| 1 | 1507409971 | 0 | event_month=201405 | 21 | 16 |
+---------+-------------+------------------+--------------------+-------+--------+
1 row in set (0.00 sec)
mysql> select * from PARTITION_KEYS;
+--------+--------------+-------------+-----------+-------------+
| TBL_ID | PKEY_COMMENT | PKEY_NAME | PKEY_TYPE | INTEGER_IDX |
+--------+--------------+-------------+-----------+-------------+
| 16 | NULL | event_month | string | 0 |
+--------+--------------+-------------+-----------+-------------+
1 row in set (0.00 sec)
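The PARTITIONS row points at the table through TBL_ID and at its storage descriptor through SD_ID; joining to the SDS table reveals where each partition lives on HDFS. A quick exploratory query against the standard metastore schema (the TBL_ID value comes from the TBLS output above):
mysql> select p.PART_NAME, s.LOCATION from PARTITIONS p join SDS s on p.SD_ID = s.SD_ID where p.TBL_ID = 16;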
Data on HDFS:
[root@hadoop001 data]# hadoop fs -ls /user/hive/warehouse/order_partition/event_month=201405
Found 1 items
-rwxr-xr-x 1 root supergroup 211 2017-10-08 04:59 /user/hive/warehouse/order_partition/event_month=201405/order_created.txt
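The same partition can also be listed from the Hive side (standard HiveQL), which at this point should print only event_month=201405:
hive (default)> show partitions order_partition;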
4. Loading Data into the Partitioned Table from HDFS
Create an event_month=201406 directory:
[root@hadoop001 data]# hadoop fs -mkdir /user/hive/warehouse/order_partition/event_month=201406
Put order_created.txt into the event_month=201406 directory:
[root@hadoop001 data]# hadoop fs -put order_created.txt /user/hive/warehouse/order_partition/event_month=201406/
Query the order_partition table to see whether the new data is visible:
hive (default)> select * from order_partition;
OK
order_partition.order_number order_partition.event_time order_partition.event_month
10703007267488 2014-05-01 06:01:12.334+01 201405
10101043505096 2014-05-01 07:28:12.342+01 201405
10103043509747 2014-05-01 07:50:12.33+01 201405
10103043501575 2014-05-01 09:27:12.33+01 201405
10104043514061 2014-05-01 09:03:12.324+01 201405
Time taken: 0.178 seconds, Fetched: 5 row(s)
Strangely, the new data does not show up. Why? Because the directory and file were written straight to HDFS, the metastore has no record of the new partition, and Hive only queries partitions it knows about. Checking the official documentation shows that the partition can be registered like this:
hive (default)> ALTER TABLE order_partition ADD IF NOT EXISTS PARTITION(event_month='201406');
Query again, and the data is now there:
hive (default)> select * from order_partition;
OK
order_partition.order_number order_partition.event_time order_partition.event_month
10703007267488 2014-05-01 06:01:12.334+01 201405
10101043505096 2014-05-01 07:28:12.342+01 201405
10103043509747 2014-05-01 07:50:12.33+01 201405
10103043501575 2014-05-01 09:27:12.33+01 201405
10104043514061 2014-05-01 09:03:12.324+01 201405
10703007267488 2014-05-01 06:01:12.334+01 201406
10101043505096 2014-05-01 07:28:12.342+01 201406
10103043509747 2014-05-01 07:50:12.33+01 201406
10103043501575 2014-05-01 09:27:12.33+01 201406
10104043514061 2014-05-01 09:03:12.324+01 201406
Time taken: 0.182 seconds, Fetched: 10 row(s)
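An alternative to adding partitions one at a time is MSCK REPAIR TABLE, a standard Hive command that scans the table's directory on HDFS and registers any partition directories the metastore does not yet know about (note that it can be slow on tables with very many partitions):
hive (default)> MSCK REPAIR TABLE order_partition;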
Of course, you can also put order_created.txt on HDFS first and load it from there (LOAD DATA INPATH, without the LOCAL keyword):
hive (default)> load data inpath 'hdfs://hadoop001:8020/data/order_created.txt' overwrite into table order_partition PARTITION(event_month='201407');
The data is loaded this way as well:
hive (default)> select * from order_partition;
OK
order_partition.order_number order_partition.event_time order_partition.event_month
10703007267488 2014-05-01 06:01:12.334+01 201405
10101043505096 2014-05-01 07:28:12.342+01 201405
10103043509747 2014-05-01 07:50:12.33+01 201405
10103043501575 2014-05-01 09:27:12.33+01 201405
10104043514061 2014-05-01 09:03:12.324+01 201405
10703007267488 2014-05-01 06:01:12.334+01 201406
10101043505096 2014-05-01 07:28:12.342+01 201406
10103043509747 2014-05-01 07:50:12.33+01 201406
10103043501575 2014-05-01 09:27:12.33+01 201406
10104043514061 2014-05-01 09:03:12.324+01 201406
10703007267488 2014-05-01 06:01:12.334+01 201407
10101043505096 2014-05-01 07:28:12.342+01 201407
10103043509747 2014-05-01 07:50:12.33+01 201407
10103043501575 2014-05-01 09:27:12.33+01 201407
10104043514061 2014-05-01 09:03:12.324+01 201407
Time taken: 0.153 seconds, Fetched: 15 row(s)
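One caveat worth remembering: LOAD DATA LOCAL INPATH copies the file from the local filesystem, but LOAD DATA INPATH moves the source file on HDFS into the table's partition directory. Listing the source directory afterwards (assuming the /data path used above) should show that order_created.txt is no longer there:
[root@hadoop001 data]# hadoop fs -ls hdfs://hadoop001:8020/data/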