Hive分区表导入数据

Hive分区表导入数据(动态分区插入,静态分区插入、动静态分区插入)

在hive上建立一个简单的分区表:

CREATE TABLE `school_student_info`(
`name` string, 
`age` int,
`sex` string
  )
PARTITIONED BY  (`grade` string,
`teacher` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
WITH SERDEPROPERTIES ( 
  'field.delim'=',', 
  'serialization.format'=',', 
  'serialization.null.format'='');

Hive参数说明:

  • set hive.exec.dynamic.partition=true; 开启动态分区,默认为false

  • set hive.exec.dynamic.partition.mode=nonstrict;动态分区的模式。Strict:至少包含一个静态分区,nonstrict没有要求。

  • set hive.mapred.mode=nonstrict; hive等值连接

  • set hive.exec.max.dynamic.partitions=5 允许的最大动态分区个数。默认为1000

  • set hive.exec.max.dynamic.partitions.pernode=5 一个mapreducejob所允许的最大的动态分区的个数。默认是100

1、 静态插入数据

Insert overwrite table school_student_info partition(grade=’Three’,teacher=’Li’)
Select name,age,sex,grade,teacher from table_old where grade=’Three’ and teacher=’Li’

要保证select中的字段个数和位置与插入的分区表的字段个数和位置对应。否则汇报如下错误:
FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different ‘symbol’: Table insclause-0 has 4 columns, but query has 5 columns.
2、 动静态混合分区插入

Insert overwrite table school_student_info partition(grade=’Three’,teacher)
Select name,age,sex,teacher from table_old where grade=’Three’ and teacher=’Li’

3、 动态分区插入

Insert overwrite table school_student_info partition(grade,teacher)
Select name,age,sex,grade,teacher from table_old where grade=’Three’ and teacher=’Li’

grade和teacher根据select返回记录的相对位置进行取值,

你可能感兴趣的:(spark,hive,mapreduce)