Hive的动态分区

一、简介

当使用静态分区时,在向分区表中插入数据时,我们需要指定具体分区列的值。此外,hive还支持动态提供分区值(即在插入数据时,不指定具体的分区列值,而是仅仅指定分区字段)。动态分区在默认情况下是禁用的(在hive2.3.4版本中默认是开启的,在hive-default.xml.template文件中进行配置),所以需要将hive.exec.dynamic.partition设为true。默认情况下,用户必须至少指定一个静态分区列,这是为了避免意外覆盖分区。要禁用此限制,可以设置分区模式为非严格模式(即将hive.exec.dynamic.partition.mode设为nonstrict,默认值为strict)。可以选择在命令行终端方式设置:

>SET hive.exec.dynamic.partition=true;

>SET hive.exec.dynamic.partition.mode=nonstrict;

或者在hive-site.xml中设置:


    hive.exec.dynamic.partition
    true
    Whether or not to allow dynamic partitions in DML/DDL.


    hive.exec.dynamic.partition.mode
    nonstrict
    
      In strict mode, the user must specify at least one static partition
      in case the user accidentally overwrites all partitions.
      In nonstrict mode all partitions are allowed to be dynamic.
    

二、例子

2.1准备数据

          001,tom,23,2019-03-16
          002,jack,12,2019-03-13
          003,robin,14,2018-08-13
          004,justin,34,2018-10-12
          005,jarry,24,2017-11-11
          006,jasper,24,2017-12-12

2.2创建表

(1)创建普通表,用于load数据

create table dynamic_tbl_temp(id int,name string,age int,start_date date)
row format delimited
fields terminated by ','

(2)创建分区表

create table dynamic_tbl(id int,name string,age int,start_date date)
partitioned by (year string,month string)
row format delimited
fields terminated by ','

2.3加载数据

(1)向普通表加载数据

load data local inpath '/opt/modules/data/dynamic.txt' into table dynamic_tbl_temp;

(2)向分区表中插入数据

insert into dynamic_tbl
partition(year,month)
select 
   id,
   name,
   age,
   start_date,
   year(start_date),
   month(start_date)
from
  dynamic_tbl_temp;

2.4查询数据

(1)查询year=2018的数据

select * from dynamic_tbl where year = 2018;

--结果
4       justin  34      2018-10-12      2018    10
3       robin   14      2018-08-13      2018    8

(2)查询year=2017,month=11的数据

select * from dynamic_tbl where year = 2017 and month = 11;

--结果:

5       jarry   24      2017-11-11      2017    11

 

你可能感兴趣的:(Hive)