- Table (internal, or managed, table)
- External Table
- Partition (partitioned table)
- Bucket Table (bucketed table)
External table demonstration:
Creating the table:
create EXTERNAL table t_external (year string,month int,num int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/usr/extends';
After data has been loaded with the load command, dropping the external table deletes only its metadata; the data in its directory is not deleted:
hive> drop table t_external;
hive> select * from t_external;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 't_external'
The table can no longer be queried, but the corresponding data files have not been deleted.
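Because only the metadata is gone, the table can be brought back by declaring it again over the same directory. A minimal sketch, assuming the files loaded earlier are still under /usr/extends and that the statements are run from the Hive CLI (where the dfs command is available):

```sql
-- From the Hive CLI: list the directory that backed the dropped table.
dfs -ls /usr/extends;

-- Re-create the table definition over the same directory ...
CREATE EXTERNAL TABLE t_external (year string, month int, num int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/usr/extends';

-- ... and the previously loaded rows should be queryable again.
SELECT * FROM t_external;
```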
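In summary, the difference between external and internal tables is this: dropping an internal table deletes both its metadata and its data, whereas dropping an external table deletes only the metadata and leaves the data files in place.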
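One way to check which kind a given table is: DESCRIBE FORMATTED prints a "Table Type" row. The sketch below uses t_managed, a hypothetical internal table that does not appear in the original walkthrough and is introduced only for comparison.

```sql
-- t_managed is a hypothetical internal table, created without the EXTERNAL keyword.
CREATE TABLE t_managed (year string, month int, num int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

-- The "Table Type" row reads MANAGED_TABLE here;
-- for a table created with EXTERNAL it reads EXTERNAL_TABLE.
DESCRIBE FORMATTED t_managed;

-- Dropping an internal table removes the metadata and its warehouse directory.
DROP TABLE t_managed;
```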
Creating a partitioned table:
create table t_partition(ts bigint,line string)
partitioned by (dt string,country string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
Viewing the table structure:
hive> desc t_partition;
OK
ts bigint
line string
dt string
country string
# Partition Information
# col_name data_type comment
dt string
country string
Loading data into the table:
load data local inpath '/home/centosm/test/file3.txt'
into table t_partition
partition(dt='2017-04-01',country='US');
... (load several more files into different partitions in the same way)
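As an alternative to one load statement per partition, Hive can derive the partition values from a query (dynamic partitioning). This is only a sketch and is not how the results in this post were produced: t_staging is a hypothetical unpartitioned table assumed to hold the columns (ts bigint, line string, dt string, country string).

```sql
-- Allow Hive to derive partition values from the query result.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- t_staging is hypothetical. The partition columns must come last
-- in the SELECT list, in the order they were declared (dt, country).
INSERT INTO TABLE t_partition PARTITION (dt, country)
SELECT ts, line, dt, country FROM t_staging;
```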
Once the data is loaded, querying the table returns:
hive> select * from t_partition;
OK
1111111 hello file1 2017-03-01 GB
2222 hello file2 2017-03-01 GB
123122222222 helloFile3 2017-04-01 CH
123122222222 helloFile3 2017-04-01 US
Time taken: 0.158 seconds, Fetched: 4 row(s)
hive> select * from t_partition where country='US';
OK
123122222222 helloFile3 2017-04-01 US
hive> select * from t_partition where country='US' and dt='2017-04-01' ;
OK
123122222222 helloFile3 2017-04-01 US
Time taken: 0.216 seconds, Fetched: 1 row(s)
hive> select * from t_partition where country='US' or dt='2017-03-01' ;
OK
1111111 hello file1 2017-03-01 GB
2222 hello file2 2017-03-01 GB
123122222222 helloFile3 2017-04-01 US
Time taken: 0.13 seconds, Fetched: 3 row(s)
Listing the table's partitions:
hive> show partitions t_partition;
OK
dt=2017-03-01/country=GB
dt=2017-04-01/country=CH
dt=2017-04-01/country=US
Time taken: 0.074 seconds, Fetched: 3 row(s)
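Partitions can also be added and removed directly with ALTER TABLE. A minimal sketch against the t_partition table; the partition values dt='2017-05-01' and country='FR' are made up for illustration and do not come from the data above.

```sql
-- Register an empty partition (hypothetical values).
ALTER TABLE t_partition ADD IF NOT EXISTS PARTITION (dt='2017-05-01', country='FR');

-- The new partition now appears in the partition list.
SHOW PARTITIONS t_partition;

-- Remove it again. For an internal table this also deletes the partition's directory.
ALTER TABLE t_partition DROP IF EXISTS PARTITION (dt='2017-05-01', country='FR');
```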
The directory where Hive stores the data looks like this:
hadoop fs -ls /user/hive/warehouse/t_partition/dt=2017-03-01/country=GB
Found 2 items
-rwxr-xr-x 1 centosm supergroup 20 2017-03-25 01:11 /user/hive/warehouse/t_partition/dt=2017-03-01/country=GB/file.txt
-rwxr-xr-x 1 centosm supergroup 17 2017-03-25 01:14 /user/hive/warehouse/t_partition/dt=2017-03-01/country=GB/file2.txt
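From the above, after data is loaded into a partition, its files end up in a nested directory under the table directory, one level per partition column:
/user/hive/warehouse/t_partition/dt=<value>/country=<value>/<data file>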
Creating a bucketed table:
create table bucketed_users (id INT,name string) clustered by (id) into 4 buckets;
Alternatively, to also keep the rows inside each bucket sorted by id:
create table bucketed_users (id int,name string) clustered by (id) sorted by (id asc) into 4 buckets;
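A bucketed table is normally filled with an INSERT ... SELECT so that Hive can distribute the rows across the bucket files; a plain load only moves files into place and does not bucket them. The sketch below assumes a hypothetical unbucketed source table named users with the same two columns.

```sql
-- Needed on Hive releases before 2.0 so that inserts honor the bucket definition.
-- On Hive 2.0 and later bucketing is always enforced and this setting
-- no longer exists, so omit the line there.
SET hive.enforce.bucketing=true;

-- users is a hypothetical, unbucketed source table (id int, name string).
INSERT OVERWRITE TABLE bucketed_users
SELECT id, name FROM users;

-- Sample the data by reading only the first of the 4 buckets.
SELECT * FROM bucketed_users TABLESAMPLE (BUCKET 1 OUT OF 4 ON id);
```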