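Partition types:
1. Static partitioning: the partition value is specified explicitly when the data is loaded.
2. Dynamic partitioning: the partition values are not known in advance, so partitions are created from the values found in the data (data cannot be loaded with load; it has to be inserted from a query).
3. Mixed partitioning: some partition columns are static and the others are dynamic.
Notes:
1. Hive partitions on columns that are not part of the table's data files. A partition column is a pseudo-column, but it can still be used to filter queries.
2. Using Chinese characters in partition columns is not recommended.
3. Dynamic partitioning is not particularly recommended. It uses MapReduce to query the data, and too many partitions will create performance bottlenecks on the NameNode and YARN. Estimate the number of partitions as accurately as possible before using dynamic partitioning.
4. Any partition attribute can also be changed by manually editing the metadata and the data on HDFS. (Treat this as a last resort.)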
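As a rough illustration of note 4, the sketch below creates a partition directory by hand and then asks Hive to register it. It assumes the t_part1 table from the static partitioning section and a hypothetical warehouse path under gp1707.db; adjust the path to wherever the table actually lives.

```sql
-- Hypothetical path: assumes t_part1 is a managed table in the gp1707 database.
dfs -mkdir -p /user/hive/warehouse/gp1707.db/t_part1/dt=2018-07-05;
-- Scan the table directory and register partition directories missing from the metastore.
msck repair table t_part1;
-- The hand-made partition should now be listed.
show partitions t_part1;
```

Without the msck repair step (or an explicit alter table ... add partition), the new directory stays invisible to queries because the metastore does not know about it.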
1. Static partitioning
Create a table with a single-level partition:
create table if not exists t_part1(
uid int,
uname string,
age int
)
partitioned by (dt string)
row format delimited fields terminated by ',';
Load data:
load data local inpath '/usr/local/hivedata/users.txt' into table t_part1 partition(dt='2018-07-04');
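To make the pseudo-column idea concrete, here is a small sketch that reads the partition column back. It assumes the load above succeeded; dt is not stored in users.txt, and Hive derives it from the partition directory name.

```sql
-- dt comes from the directory dt=2018-07-04, not from the data file.
select uid, uname, age, dt from t_part1 where dt = '2018-07-04';
-- The partition column is listed separately under "# Partition Information".
describe t_part1;
```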
Create a table with a two-level partition:
create table if not exists t_part2(
uid int,
uname string,
age int
)
partitioned by (year string,month string)
row format delimited fields terminated by ',';
Load data:
load data local inpath '/usr/local/hivedata/users.txt' into table t_part2 partition(year='2018',month='07');
Create a table with a three-level partition:
create table if not exists t_part3(
uid int,
uname string,
age int
)
partitioned by (year string,month string,days string)
row format delimited fields terminated by ',';
Load data:
load data local inpath '/usr/local/hivedata/users.txt' into table t_part3 partition(year='2018',month='07',days='04');
Query:
select * from t_part3;
select * from t_part3 where year = '2018';
select * from t_part3 where year = '2018' and month = '07';
select * from t_part3 where month = '07' and days='04';
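One way to check which partitions a filtered query would actually read is explain dependency. The sketch below is only an inspection aid; the exact shape of its output varies by Hive version.

```sql
-- Lists the input tables and partitions for the query without running it.
explain dependency select * from t_part3 where month = '07' and days = '04';
```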
Show partitions:
show partitions t_part2;
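When a table has many partitions, the listing can be narrowed with a partial partition spec. A short sketch, assuming the year='2018', month='07' partition loaded earlier exists:

```sql
-- Only partitions under year=2018.
show partitions t_part2 partition(year='2018');
-- Location and other details for a single partition.
describe formatted t_part2 partition(year='2018',month='07');
```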
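Modify partitions:
1. Rename a partition: Hive supports this through alter table ... partition(...) rename to partition(...). Renaming the directory on HDFS directly is a brute-force alternative that leaves the metastore out of sync.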
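For reference, Hive's DDL includes a rename to partition clause that changes a partition's name through the metastore. The sketch below is a standalone illustration: the target value month='08' is made up, and running it changes a partition name that other examples in these notes rely on.

```sql
-- Rename the partition year=2018/month=07 to year=2018/month=08 (illustrative target value).
alter table t_part2 partition(year='2018',month='07') rename to partition(year='2018',month='08');
-- Confirm the new name.
show partitions t_part2;
```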
2. Add partitions:
alter table t_part2 add partition(year='2018',month='06');
alter table t_part2 add if not exists partition(year='2018',month='06') partition(year='2017',month='12') partition(year='2017',month='11');
Add partitions and point them at existing data:
alter table t_part2 add partition(year='2017',month='12') location '/user/hive/warehouse/gp1707.db/t_user_info';
alter table t_part2 add partition(year='2017',month='08') location '/user/hive/warehouse/gp1707.db/t_user_info' partition(year='2017',month='09') location '/user/hive/warehouse/gp1707.db/t_user_info';
3. Change a partition's storage location:
alter table t_part2 partition(year='2017',month='08') set location 'hdfs://hadoop01:9000/user/hive/warehouse/gp1707.db/t_userinfo';
4. Drop partitions:
alter table t_part2 drop partition(year='2017',month='06');
alter table t_part2 drop partition(year='2017',month='09'),partition(year='2018',month='07');
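Notes:
1. When changing a partition's location, give the full absolute path.
2. Adding and dropping partitions in bulk use different syntax: separate the partitions with a space when adding, and with a comma when dropping.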
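The two separators from note 2 can be seen side by side in the sketch below. The if not exists and if exists guards are an addition here, included so the statements can be rerun safely; the partition values are only examples.

```sql
-- Bulk add: partition specs separated by spaces.
alter table t_part2 add if not exists partition(year='2017',month='11') partition(year='2017',month='12');
-- Bulk drop: partition specs separated by commas.
alter table t_part2 drop if exists partition(year='2017',month='11'), partition(year='2017',month='12');
show partitions t_part2;
```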
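2. Dynamic partitioning
Dynamic partitioning properties:
set hive.exec.dynamic.partition=true; (default: true)
set hive.exec.dynamic.partition.mode=nonstrict; (strict or nonstrict; strict mode requires at least one static partition column)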
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=100;
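Before relying on these properties, it can help to check what the current session actually uses. In the Hive CLI, set with a property name and no value prints the current setting. hive.exec.max.dynamic.partitions caps the total number of partitions one statement may create, and the pernode variant caps how many each mapper or reducer may create.

```sql
-- Print the current values for this session.
set hive.exec.dynamic.partition.mode;
set hive.exec.max.dynamic.partitions;
set hive.exec.max.dynamic.partitions.pernode;
-- Allow inserts in which every partition column is dynamic.
set hive.exec.dynamic.partition.mode=nonstrict;
```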
Create a dynamically partitioned table:
create table t_dypart1(
uid int,
uname string,
uage int
)
partitioned by (dt string)
row format delimited fields terminated by ',';
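Load data (the load statement cannot be used; the dynamic partition column must come last in the select list):
insert into table t_dypart1 partition(dt) select uid,uname,age,dt from t_part1;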
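A quick way to confirm which partitions the insert created is sketched below, assuming the insert above finished successfully.

```sql
-- One partition per distinct dt value found in t_part1.
show partitions t_dypart1;
-- Row counts per partition.
select dt, count(*) as cnt from t_dypart1 group by dt;
```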
Mixed partitioning:
create table t_dypart3(
uid int,
uname string,
uage int
)
partitioned by (year string,month string,days string)
row format delimited fields terminated by ',';
insert into t_dypart3 partition(year='2018',month,days)
select uid,uname,age,month,days from t_part3;
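In the mixed insert, the static column (year) is fixed in the partition clause, and the dynamic columns (month, days) are taken by position from the end of the select list. The sketch below is a variant under one assumption: only the 2018 rows of t_part3 belong in the year='2018' partition, so a where clause is added. Without it, rows from any year in t_part3 would be written under year=2018.

```sql
-- Overwrite variant restricted to the source rows that belong to 2018.
insert overwrite table t_dypart3 partition(year='2018',month,days)
select uid,uname,age,month,days from t_part3 where year = '2018';
-- Check the resulting year=2018/month=.../days=... partitions.
show partitions t_dypart3;
```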