The Hive CREATE TABLE syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...) ON ([(col_value, col_value, ...), ...|col_value, col_value, ...]) [STORED AS DIRECTORIES] (Note: Only available starting with Hive 0.10.0)]
  [
    [ROW FORMAT row_format] [STORED AS file_format]
    | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] (Note: Only available starting with Hive 0.6.0)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)] (Note: Only available starting with Hive 0.6.0)
  [AS select_statement] (Note: Only available starting with Hive 0.5.0, and not supported when creating external tables.)
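To make the grammar above more concrete, here is a minimal sketch that combines several of the optional clauses (column comments, a table comment, PARTITIONED BY, CLUSTERED BY ... SORTED BY ... INTO ... BUCKETS, ROW FORMAT and STORED AS). The table name page_view, its columns, and the bucket count are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE IF NOT EXISTS page_view (
  view_time INT    COMMENT 'time of the page view',
  user_id   BIGINT COMMENT 'id of the viewing user',
  page_url  STRING COMMENT 'URL of the page'
)
COMMENT 'page view log'
PARTITIONED BY (dt STRING COMMENT 'date partition')
CLUSTERED BY (user_id) SORTED BY (view_time) INTO 32 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS SEQUENCEFILE;
```

Note that the partition column dt is declared only in PARTITIONED BY and must not be repeated in the regular column list.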
To copy the definition of an existing table or view (the definition only, not its data), the statement is:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name LIKE existing_table_or_view_name [LOCATION hdfs_path]
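As a small sketch of the LIKE form, the statements below copy the definition of the hive.book table used as an example elsewhere in this article. The target names hive.book_copy and hive.book_titles are hypothetical. The second statement is shown only for contrast: CREATE TABLE ... AS SELECT copies data as well, and is not supported for external tables.

```sql
-- Copies only the table definition of hive.book; no rows are copied.
-- hive.book_copy is a hypothetical name.
CREATE TABLE IF NOT EXISTS hive.book_copy LIKE hive.book;

-- For contrast: CTAS creates a managed table and fills it with the query result.
-- hive.book_titles is a hypothetical name.
CREATE TABLE hive.book_titles AS
SELECT isbn, title, author
FROM hive.book;
```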
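Whether a table is external or internal (managed) is determined by whether the CREATE statement contains the EXTERNAL keyword.
The differences are:
1. The data of an internal table is stored under Hive's hive.metastore.warehouse.dir; if a LOCATION was specified when the database was created, the data of that database's internal tables is stored under that directory instead. An external table only records where its data is located and does not modify that location in any way.
2. When a table is dropped, an external table loses only its metadata and the actual HDFS data is left untouched, whereas an internal table has both its metadata and its actual data deleted.
3. When the underlying data source changes, the data seen through the external table changes with it; if the external HDFS data is deleted, the table's data is gone as well.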
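The differences listed above can be observed with a short experiment. This is a sketch only: the table names demo_managed and demo_external and the path hdfs:///tmp/demo_external/ are hypothetical, and the exact drop behaviour for managed tables may also depend on cluster settings such as HDFS trash.

```sql
-- Hypothetical table names and path, for illustration only.
CREATE TABLE IF NOT EXISTS demo_managed (id STRING);

CREATE EXTERNAL TABLE IF NOT EXISTS demo_external (id STRING)
LOCATION 'hdfs:///tmp/demo_external/';

-- "Table Type" in the output reads MANAGED_TABLE or EXTERNAL_TABLE,
-- and "Location" shows where the table's data lives.
DESCRIBE FORMATTED demo_managed;
DESCRIBE FORMATTED demo_external;

-- Managed table: the metadata and the data directory are both removed.
DROP TABLE demo_managed;

-- External table: only the metadata is removed;
-- the files under hdfs:///tmp/demo_external/ stay in place.
DROP TABLE demo_external;
```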
Example:
CREATE EXTERNAL TABLE IF NOT EXISTS hive.dual (
  id STRING COMMENT 'id'
)
STORED AS RCFILE
LOCATION 'hdfs:///hive/dual/';
An external table over existing data:
-- '\073' is the octal escape for ';', used here as the field separator.
CREATE EXTERNAL TABLE IF NOT EXISTS hive.book (
  ISBN      STRING COMMENT 'ISBN',
  title     STRING COMMENT 'title',
  author    STRING COMMENT 'Author',
  year      STRING COMMENT 'Year-Of-Publication',
  publisher STRING COMMENT 'Publisher',
  img_s     STRING COMMENT 'Image-URL-S',
  img_m     STRING COMMENT 'Image-URL-M',
  img_l     STRING COMMENT 'Image-URL-L'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073'
STORED AS TEXTFILE
LOCATION 'hdfs:///hive/book/';
Other grammar elements:
file_format:
: SEQUENCEFILE
| TEXTFILE
| RCFILE (Note: Only available starting with Hive 0.6.0)
| ORC (Note: Only available starting with Hive 0.11.0)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
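A brief sketch of two of the file_format alternatives above: a named format (ORC) and the explicit INPUTFORMAT/OUTPUTFORMAT form. The table names demo_orc and demo_text are hypothetical. The two classes in the second statement are the ones Hive uses for TEXTFILE, so that table should behave like one declared with STORED AS TEXTFILE.

```sql
-- Hypothetical table names, for illustration only.

-- Named file format (ORC requires Hive 0.11.0 or later).
CREATE TABLE IF NOT EXISTS demo_orc (id STRING)
STORED AS ORC;

-- Explicit input/output format classes, spelling out what TEXTFILE uses.
CREATE TABLE IF NOT EXISTS demo_text (id STRING)
STORED AS
  INPUTFORMAT  'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```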
row_format:
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
    [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
    [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
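The two row_format branches can be sketched as follows. The table names, columns, and delimiter choices are hypothetical. The second statement assumes Hive 0.14 or later, where org.apache.hadoop.hive.serde2.OpenCSVSerde is available; that SerDe reads every column as STRING.

```sql
-- Hypothetical table names and delimiters, for illustration only.

-- DELIMITED: a line such as  1,a|b|c,k1:v1|k2:v2
-- yields id = '1', tags = ['a','b','c'], attrs = {'k1':'v1','k2':'v2'}.
CREATE TABLE IF NOT EXISTS demo_complex (
  id    STRING,
  tags  ARRAY<STRING>,
  attrs MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

-- SERDE: parse quoted CSV with a SerDe class and its properties.
CREATE TABLE IF NOT EXISTS demo_csv (
  id   STRING,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE;
```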