让分区表和数据产生关联的三种方式

目录

总结:

方式一:上传数据后添加分区alter add:

方式二:上传数据后修复msck:

方式三:创建文件夹后load数据到分区:

总结:


总结:

让分区表和数据产生关联的方式有三种:

①先在HDFS上创建分区的目录,并上传数据到该目录,最后在使用alter table add partition

②先在HDFS上创建分区的目录,并上传数据到该目录,最后使用msck repair table table_name

③先在HDFS上创建分区的目录,在使用 load data local 路径 into table table_name partition(字段名=字段值) 

将数据上传到分区目录上,让分区表和数据产生关联的三种方式:数据准备:创建一个没有分区的普通表stu_par
create table stu_par(id int,name string)
row format delimited
fields terminated by '\t';

方式一:上传数据后添加分区alter add:

> !sh hadoop fs -mkdir -p /user/hive/warehouse/db_hive.db/stu_par/month=09
> !sh hadoop fs -put /opt/module/hive/stu.txt /user/hive/warehouse/db_hive.db/stu_par/month=09
重复操作省略


> !sh hadoop fs -ls /user/hive/warehouse/db_hive.db/stu_par/
Found 4 items
drwxr-xr-x   - isea supergroup      0 2018-12-01 05:50 /user/hive/warehouse/db_hive.db/stu_par/month=09
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=10
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=11
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:30 /user/hive/warehouse/db_hive.db/stu_par/month=12

0: jdbc:hive2://hadoop108:10000> select * from stu_par where month  = '09';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
+-------------+---------------+----------------+--+

此时的无法查询到数据

0: jdbc:hive2://hadoop108:10000> alter table stu_par add partition(month = '09');

0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '09';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
| 1001        | zhangfei      | 09             |
| 1002        | liubei        | 09             |
| 1003        | guanyu        | 09             |
| 1004        | zhaoyun       | 09             |
| 1005        | caocao        | 09             |
| 1006        | zhouyu        | 09             |
+-------------+---------------+----------------+--+

方式二:上传数据后修复msck:

1,在HDFS上创建08分区,并上传数据
> !sh hadoop fs -mkdir -p /user/hive/warehouse/db_hive.db/stu_par/month=08
0: jdbc:hive2://hadoop108:10000> !sh hadoop fs -ls /user/hive/warehouse/db_hive.db/stu_par/
Found 5 items
drwxr-xr-x   - isea supergroup      0 2018-12-01 06:06 /user/hive/warehouse/db_hive.db/stu_par/month=08
drwxr-xr-x   - isea supergroup      0 2018-12-01 05:54 /user/hive/warehouse/db_hive.db/stu_par/month=09
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=10
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=11
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:30 /user/hive/warehouse/db_hive.db/stu_par/month=12
> !sh hadoop fs -put /opt/module/hive/stu.txt /user/hive/warehouse/db_hive.db/stu_par/month=08

此时,在HDFS上有数据,但是该表中并没有对应该数据的,所以还是查询不到数据
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '08';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
+-------------+---------------+----------------+--+
No rows selected (0.091 seconds)

此时,我使用msck 修复一下这个分区表,这个修复命令会自动调用上面修改表的命令完成元数据的引入
0: jdbc:hive2://hadoop108:10000> msck repair table stu_par;
OK
No rows affected (0.215 seconds)
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '08';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
| 1001        | zhangfei      | 08             |
| 1002        | liubei        | 08             |
| 1003        | guanyu        | 08             |
| 1004        | zhaoyun       | 08             |
| 1005        | caocao        | 08             |
| 1006        | zhouyu        | 08             |
+-------------+---------------+----------------+--+

方式三:创建文件夹后load数据到分区:

1,现在HDFS创建08分区,并上传数据
> !sh hadoop fs -mkdir -p /user/hive/warehouse/db_hive.db/stu_par/month=07

2,此时该表中,没有对应的元数据信息,也没有对应的数据,所以该表还是空的
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '07';
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
+-------------+---------------+----------------+--+

此时,我们在上传数据的同时,为该表创建07分区的元数据
> load data local inpath '/opt/module/hive/stu.txt' into table stu_par partition(month = '07');
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '07';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
| 1001        | zhangfei      | 07             |
| 1002        | liubei        | 07             |
| 1003        | guanyu        | 07             |
| 1004        | zhaoyun       | 07             |
| 1005        | caocao        | 07             |
| 1006        | zhouyu        | 07             |
+-------------+---------------+----------------+--+
既把数据上传到指定的分区对应的文件夹下,又能写入元数据。

总结:

让分区表和数据产生关联的方式有三种:

①先在HDFS上创建分区的目录,并上传数据到该目录,最后在使用alter table add partition

②先在HDFS上创建分区的目录,并上传数据到该目录,最后使用msck repair table table_name

③先在HDFS上创建分区的目录,在使用 load data local 路径 into table table_name partition(字段名=字段值) 

你可能感兴趣的:(bigData,Hive,Hive)