Things to watch out for when creating a table with create table ... as

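At work, Hive development sometimes requires backing up a table. The usual choice is create table as select (CTAS for short), which is simple and convenient. Be aware of its pitfalls, though: a table created with CTAS does not necessarily preserve the structure of the source table.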

1. Create a partitioned table
CREATE TABLE T_DEDUCT_SIGN_D(
  id bigint COMMENT '主键ID',
  sign_no string COMMENT '签约协议号',
  bp_no string COMMENT '商户号'
  )COMMENT '代扣签约表'
PARTITIONED BY (
  statis_date string COMMENT '时间分区')
STORED AS RCFILE;

2. Load data into the partitioned table
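The original post does not show how the data was loaded. As a minimal sketch, assuming Hive 0.14 or later (which supports INSERT ... VALUES), a couple of rows could be written into one partition as follows. The row values and the partition value '20180101' are made up for illustration only.

```sql
-- Hypothetical sample rows, not taken from the original post.
-- Static partition insert: the partition value is given in the PARTITION clause,
-- so each VALUES row lists only the three regular columns.
INSERT INTO TABLE T_DEDUCT_SIGN_D PARTITION (statis_date='20180101')
VALUES (1, 'S0001', 'BP001'),
       (2, 'S0002', 'BP002');
```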
3. Create a table with create table as ...
 create table t_T_DEDUCT_SIGN_D_copy 
 as 
 select * from T_DEDUCT_SIGN_D where statis_date is not null;
4. Compare the structure of the source table and the target table

Structure of the source table T_DEDUCT_SIGN_D: partitioned table + RCFILE storage format
hive (fdm_sor)> show create table T_DEDUCT_SIGN_D;
CREATE TABLE `T_DEDUCT_SIGN_D`(
  `id` bigint COMMENT '主键ID', 
  `sign_no` string COMMENT '签约协议号', 
  `bp_no` string COMMENT '商户号')
COMMENT '代扣签约表'
PARTITIONED BY ( 
  `statis_date` string COMMENT '时间分区')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.RCFileInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
LOCATION
  'hdfs://SuningHadoop2/user/finance/hive/warehouse/fdm_sor.db/t_deduct_sign_d'
TBLPROPERTIES (
  'transient_lastDdlTime'='1546417837')

Structure of the target table t_T_DEDUCT_SIGN_D_copy: it has become a non-partitioned table + TEXTFILE storage format + one extra column
hive (fdm_sor)> show create table  t_T_DEDUCT_SIGN_D_copy;
OK
CREATE TABLE `t_T_DEDUCT_SIGN_D_copy`(
  `id` bigint, 
  `sign_no` string, 
  `bp_no` string, 
  `statis_date` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://SuningHadoop2/user/finance/hive/warehouse/fdm_sor.db/t_t_deduct_sign_d_copy'
TBLPROPERTIES (
  'transient_lastDdlTime'='1546418132')

As the demonstration above shows, keep the following in mind when creating a table with CTAS in Hive:

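    1. In the Hive version used here, a table created with CTAS is always a non-partitioned table, whether or not the source table is partitioned. When copying a partitioned table with create table .. as, watch out for the loss of partitioning. A non-partitioned table cannot be turned into a partitioned one afterwards by adding partitions, so to keep partitioning, create the partitioned target table first and then insert the data into it. If the source table is not partitioned, this problem does not arise.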

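    2. If the source table is partitioned and the new table is created with create table as select *, the new table gains extra columns: one ordinary column for each partition column of the source table, with the same name. Selecting specific columns instead of * avoids this.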

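    3. If the storage format of the source table is not TEXTFILE, the table created with CTAS falls back to the default format (controlled by hive.default.fileformat, TEXTFILE by default). Here the source table is RCFILE while the new table is TEXTFILE. The storage format, the SerDe, and even the column names can be specified in the create table ... as statement itself. For details, see the blog post: hive使用create as创建表指定存储格式等属性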

    4. A table created with CTAS cannot be an external table.

    5. A table created with CTAS cannot be a bucketed table.
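
Points 1 to 3 can be worked around in two ways, sketched below under some assumptions: the target table names are hypothetical, and the second option relies on CREATE TABLE ... LIKE plus a dynamic-partition insert rather than on CTAS. This is one possible approach, not necessarily the one used in the referenced blog post.

```sql
-- Option 1: stay with CTAS, but name the storage format and the columns explicitly.
-- The copy is still non-partitioned; the partition column is simply left out.
-- t_deduct_sign_d_copy_rc is a hypothetical table name.
CREATE TABLE t_deduct_sign_d_copy_rc
STORED AS RCFILE
AS
SELECT id, sign_no, bp_no
FROM T_DEDUCT_SIGN_D
WHERE statis_date IS NOT NULL;

-- Option 2: keep the partitioning as well.
-- CREATE TABLE ... LIKE copies the table definition (columns, partition columns,
-- storage format) but no data. t_deduct_sign_d_bak is a hypothetical table name.
CREATE TABLE t_deduct_sign_d_bak LIKE T_DEDUCT_SIGN_D;

-- Let Hive derive every partition value from the query output.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- With a dynamic partition insert, the partition column comes last in the SELECT list.
INSERT OVERWRITE TABLE t_deduct_sign_d_bak PARTITION (statis_date)
SELECT id, sign_no, bp_no, statis_date
FROM T_DEDUCT_SIGN_D
WHERE statis_date IS NOT NULL;
```

Running show create table on either copy is a quick way to confirm that the storage format, and in the second case the partition column, match the source table.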

To verify point 4, first create an external table:
CREATE  external  TABLE T_DEDUCT_SIGN_D_external(
  id bigint COMMENT '主键ID',
  sign_no string COMMENT '签约协议号',
  bp_no string COMMENT '商户号'
  )COMMENT '代扣签约表'
PARTITIONED BY (
  statis_date string COMMENT '时间分区')
STORED AS RCFILE;

1. CTAS with the external keyword fails: CTAS cannot create an external table
hive (fdm_sor)> create external table T_T_DEDUCT_SIGN_D_external_COPY 
              > AS 
              > SELECT * FROM T_DEDUCT_SIGN_D_external
              > WHERE STATIS_DATE ='20180101';
FAILED: SemanticException [Error 10070]: CREATE-TABLE-AS-SELECT cannot create external table

2. CTAS without external succeeds, but the result is a managed (internal) table by default, not an external table
create table T_T_DEDUCT_SIGN_D_external_COPY 
AS
SELECT * FROM T_DEDUCT_SIGN_D_external
WHERE STATIS_DATE ='20180101';

hive (fdm_sor)> describe formatted T_T_DEDUCT_SIGN_D_external_COPY;
OK
# col_name            	data_type           	comment             
	 	 
id                  	bigint              	                    
sign_no             	string              	                    
bp_no               	string              	                    
statis_date         	string              	                    
	 	 
# Detailed Table Information	 	 
Database:           	fdm_sor             	 
Owner:              	finance             	 
CreateTime:         	Wed Jan 02 18:55:23 CST 2019	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://SuningHadoop2/user/finance/hive/warehouse/fdm_sor.db/t_t_deduct_sign_d_external_copy	 
Table Type:         	MANAGED_TABLE       	 
Table Parameters:	 	 
	transient_lastDdlTime	1546426523          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	serialization.format	1                   
Time taken: 0.079 seconds, Fetched: 29 row(s)
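
If the copy itself needs to be an external table, one possible workaround is to avoid CTAS: copy the definition with CREATE EXTERNAL TABLE ... LIKE and then insert the data. The sketch below assumes that inserting into external tables is allowed on the cluster. The table name and the LOCATION path are hypothetical.

```sql
-- CREATE TABLE ... LIKE accepts the EXTERNAL keyword, unlike CTAS.
-- Table name and LOCATION are hypothetical placeholders.
CREATE EXTERNAL TABLE t_deduct_sign_d_external_bak
LIKE T_DEDUCT_SIGN_D_external
LOCATION '/tmp/hypothetical/t_deduct_sign_d_external_bak';

-- Static partition insert: the partition column is fixed in the PARTITION clause
-- and therefore left out of the SELECT list.
INSERT OVERWRITE TABLE t_deduct_sign_d_external_bak PARTITION (statis_date='20180101')
SELECT id, sign_no, bp_no
FROM T_DEDUCT_SIGN_D_external
WHERE statis_date = '20180101';
```

After this, describe formatted on the new table should report Table Type: EXTERNAL_TABLE instead of MANAGED_TABLE.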

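Oracle and MySQL have their own caveats when a table is created with create table as.

The official Oracle documentation describes the following issues with creating a table using AS: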

Oracle Database automatically defines on columns in the new table any NOT NULL constraints that were explicitly created on the corresponding columns of the selected table if the subquery selects the column rather than an expression containing the column. If any rows violate the constraint, then the database does not create the table and returns an error.

       1. Explicit NOT NULL constraints are carried over to the new table automatically.

NOT NULL constraints that were implicitly created by Oracle Database on columns of the selected table (for example, for primary keys) are not carried over to the new table.

       2. Implicit NOT NULL constraints, such as those created for a primary key, are not carried over to the new table.

In addition, primary keys, unique keys, foreign keys, check constraints, partitioning criteria, indexes, and column default values are not carried over to the new table.

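       3. Most importantly, primary keys, unique keys, foreign keys, check constraints, partitioning, indexes, and column default values are not carried over to the new table.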

If the selected table is partitioned, then you can choose whether the new table will be partitioned the same way, partitioned differently, or not partitioned. Partitioning is not carried over to the new table. Specify any desired partitioning as part of the CREATE TABLE statement before the AS subquery clause.
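
As an illustration of the Oracle behaviour quoted above, the sketch below specifies partitioning before the AS subquery and then re-creates a primary key and an index by hand, since neither is carried over. The orders table, its columns, and all object names are hypothetical and only serve the example.

```sql
-- Hypothetical source table: orders(order_id, customer_id, order_date).
-- Partitioning is declared before the AS subquery, as the documentation requires.
CREATE TABLE orders_copy
PARTITION BY RANGE (order_date) (
  PARTITION p2018 VALUES LESS THAN (DATE '2019-01-01'),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
)
AS
SELECT order_id, customer_id, order_date
FROM orders;

-- Primary keys and indexes are not copied by CTAS, so add them back explicitly.
ALTER TABLE orders_copy ADD CONSTRAINT orders_copy_pk PRIMARY KEY (order_id);
CREATE INDEX orders_copy_cust_idx ON orders_copy (customer_id);
```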
