(hive分区主要是为了提高检索效率,内部表和外部表都可以创建分区)
静态分区操作数据
1,小明1,lol-book-move,beijing:shangxuetang-shanghai:pudong
2,小明2,lol-book-move,beijing:shangxuetang-shanghai:pudong
3,小明3,lol-book-move,beijing:shangxuetang-shanghai:pudong
4,小明4,lol-book,beijing:shangxuetang-shanghai:pudong
5,小明5,lol-book-move,beijing:shangxuetang-shanghai:pudong
6,小明6,lol-book-move,beijing:shangxuetang-shanghai:pudong
7,小明7,lol-book-move,beijing:shangxuetang-shanghai:pudong
Hive 在创建表时添加分区partition
在表定义时指定对应的partition字段
a、不分区建表
create table psn(
id int,
name string,
likes array,
address map
)
ROW FORMAT delimited
fields terminated by ',' //hdfs存放的数据字段之间的分隔符(记得是存储在hdfs里的,不是select* from 查询显示的分割)
COLLECTION ITEMS TERMINATED BY '-' //hdfs存放的数据集合之间的分割符
map keys terminated by ':' ; //hdfs存放的数据map之间的分割符
location '/root/data/' //hdfs集群存放的数据
b、单分区建表语句:
--单分区建表
create table psn5(
id int,
name string,
likes array,
addres map
)
partitioned by (age int)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
--添加数据
load data local inpath '/root/data/data' into table psn5 partition (age =10);
c、 双分区建表语句:
--多分区建表
create table psn6(
id int,
name string,
likes array,
addres map
)
partitioned by (age int,sex string)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
--添加数据
load data local inpath '/root/data/data' into table psn6 partition (age=10,sex='body');
--增加分区
alter table psn6 add partition (age =30,sex='girl');
--多分区查询
select * from psn6 where age=30 and sex='girl';
(表已创建,在此基础上添加分区):
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...;
partition_spec:
: (partition_column = partition_col_value, partition_column = partition_col_value, ...)
例:
ALTER TABLE day_table ADD PARTITION (dt='2008-08-08', hour='08')
Hive删除分区语法:
ALTER TABLE table_name DROP partition_spec, partition_spec,...
partition_spec:
: (partition_column = partition_col_value, partition_column = partition_col_value, ...)
用户可以用 ALTER TABLE DROP PARTITION 来删除分区。
内部表中、对应分区的元数据和数据将被一并删除。
例:
ALTER TABLE day_hour_table DROP PARTITION (dt='2008-08-08', hour='09');
动态分区操作数据
1,小明1,12,man,book-move,beijing:shangxuetang-shanghai:pudong
2,小明2,13,man,book-move,beijing:shangxuetang-shanghai:pudong
3,小明3,12,boy,book-move,beijing:shangxuetang-shanghai:pudong
4,小明4,12,man,book,beijing:shangxuetang-shanghai:pudong
5,小明5,13,boy,book-move,beijing:shangxuetang-shanghai:pudong
6,小明6,12,boy,book-move,beijing:shangxuetang-shanghai:pudong
7,小明7,13,man,book-move,beijing:shangxuetang-shanghai:pudong
普通建表
create table psn7(
id int,
name string ,
age int,
sex string,
likes array,
addres map
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
建分区表
create table psn_part(
id int,
name string ,
likes array,
addres map
)
partitioned by (age int,sex string)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
插入数据
使用模式
from
insert
select
动态分区语句
from psn7
insert overwrite table psn_part partition (age,sex)
select id,name,likes,addres,age,sex
在插入数据的时候,会报错如下:
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec
.dynamic.partition.mode=nonstrict
解决方案:
set hive.exec.dynamic.partition.mode=nonstrict;
参考文档链接 https://hugh-wangp.iteye.com/blog/1612268
https://www.cnblogs.com/luogankun/p/4111145.html