ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" ;
备注:静态分区表中不能含有分区字段,否则会报错:FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns
导入数据
INSERT OVERWRITE TABLE emp_dept_partition PARTITION (deptno=10) IF NOT EXISTS select empno, ename, job, mgr, hiredate, sal, comm from emp where deptno=20;
hive> select * from emp_dept_partition;
OK
7369 SMITH CLERK 7902 1980/12/17 800.0 NULL 10
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 10
7788 SCOTT ANALYST 7566 1987/4/19 3000.0 NULL 10
7876 ADAMS CLERK 7788 1987/5/23 1100.0 NULL 10
7902 FORD ANALYST 7566 1981/12/3 3000.0 NULL 10
Time taken: 0.083 seconds, Fetched: 5 row(s)
mysql> select a.TBL_ID, b.PART_ID from partition_keys a, partitions b where a.TBL_ID=(select TBL_ID from tbls where TBL_NAME='emp_dept_partition') and a.TBL_ID = b.TBL_ID;
+--------+---------+
| TBL_ID | PART_ID |
+--------+---------+
| 15 | 6 |
+--------+---------+
1 row in set (0.00 sec)
mysql> update partitions set PART_NAME='deptno=20' where PART_ID=6;
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1 Changed: 0 Warnings: 0
mysql> update partition_key_vals set PART_KEY_VAL='20' where PART_ID=6;
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1 Changed: 0 Warnings: 0
mysql> commit;
Query OK, 0 rows affected (0.00 sec)
hive> select * from emp_dept_partition;
OK
7369 SMITH CLERK 7902 1980/12/17 800.0 NULL 20
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 20
7788 SCOTT ANALYST 7566 1987/4/19 3000.0 NULL 20
7876 ADAMS CLERK 7788 1987/5/23 1100.0 NULL 20
7902 FORD ANALYST 7566 1981/12/3 3000.0 NULL 20
Time taken: 0.093 seconds, Fetched: 5 row(s)
证明了分区字段数据的确是当作元数据被保留起来了。
2、动态分区
INSERT OVERWRITE TABLE emp_dept_partition PARTITION (deptno) select empno, ename, job, mgr, hiredate, sal, comm,deptno from emp;
分区字段必须为最后一个字段
第一次执行会报错:
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
根据提示输入命令即可解决:
set hive.exec.dynamic.partition.mode=nonstrict
检查元数据
mysql> select a.TBL_ID, b.PART_ID ,b.PART_NAME from partition_keys a, partitions b where a.TBL_ID=(select TBL_ID from tbls where TBL_NAME='emp_dept_partition') and a.TBL_ID = b.TBL_ID;
+--------+---------+-----------+
| TBL_ID | PART_ID | PART_NAME |
+--------+---------+-----------+
| 15 | 6 | deptno=20 |
| 15 | 7 | deptno=30 |
| 15 | 8 | deptno=10 |
+--------+---------+-----------+
3 rows in set (0.00 sec)
4、总结
使用动态分区可以非常智能的加载表,而在动静结合使用时需要注意静态分区值必须在动态分区值的前面。所以建议使用动态分区,且insert进行分区的使用建议使用OVERWRITE,避免数据重复。
【来自@若泽大数据】