Hive: Internal Tables and External Tables

Hive's CREATE TABLE syntax is as follows:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...) ON ([(col_value, col_value, ...), ...|col_value, col_value, ...]) [STORED AS DIRECTORIES] (Note: Only available starting with Hive 0.10.0)]
  [
   [ROW FORMAT row_format] [STORED AS file_format]
   | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  (Note: Only available starting with Hive 0.6.0)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]  (Note: Only available starting with Hive 0.6.0)
  [AS select_statement]  (Note: Only available starting with Hive 0.5.0, and not supported when creating external tables.)

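To create a new table by copying the definition of an existing table or view (the definition only, not its data), the syntax is: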

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path]
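For example, the statement below creates an empty external table with the same columns and storage format as the hive.book table defined further down, pointing at a new directory. The table name hive.book_copy and its path are hypothetical.

```sql
-- Copies only the definition of hive.book; no data is copied.
-- hive.book_copy and hdfs:///hive/book_copy/ are hypothetical.
CREATE EXTERNAL TABLE IF NOT EXISTS hive.book_copy
  LIKE hive.book
  LOCATION 'hdfs:///hive/book_copy/';
```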

Whether a table is external or internal is determined only by whether the CREATE statement contains the EXTERNAL keyword.
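To illustrate, here are two otherwise identical definitions. The table names and the path are hypothetical. After creating them, `DESCRIBE FORMATTED` shows which kind each table is in its "Table Type" field.

```sql
-- Internal (managed) table: no EXTERNAL keyword
CREATE TABLE IF NOT EXISTS hive.t_managed (id STRING);

-- External table: same columns, plus the EXTERNAL keyword and an explicit (hypothetical) location
CREATE EXTERNAL TABLE IF NOT EXISTS hive.t_external (id STRING)
LOCATION 'hdfs:///tmp/t_external/';

-- "Table Type:" reads MANAGED_TABLE for the first and EXTERNAL_TABLE for the second
DESCRIBE FORMATTED hive.t_managed;
DESCRIBE FORMATTED hive.t_external;
```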

The differences are:

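1. An internal table's data is stored under Hive's warehouse directory, hive.metastore.warehouse.dir. If the database was created with a LOCATION, its internal tables store their data under that directory instead. An external table only records where its data lives and makes no changes to that location.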

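2. Dropping an external table deletes only its metadata and leaves the actual HDFS data intact. Dropping an internal table deletes both the metadata and the data.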

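3. An external table reads its data in place, so changes to the source data show up in the table. If the underlying HDFS data is deleted, the table's data is gone as well.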
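Points 1 and 2 can be checked from the Hive CLI with the sketch below. The database demo_db, the table names and the HDFS paths are all hypothetical. The `dfs` command assumes you are in the Hive CLI.

```sql
-- Print the warehouse root used for internal (managed) tables
SET hive.metastore.warehouse.dir;

-- Hypothetical database with its own LOCATION: its internal tables are stored under it
CREATE DATABASE IF NOT EXISTS demo_db LOCATION 'hdfs:///data/demo_db';

-- Internal table: data goes to hdfs:///data/demo_db/t_internal
CREATE TABLE IF NOT EXISTS demo_db.t_internal (id STRING);

-- External table: Hive only records this (hypothetical) path
CREATE EXTERNAL TABLE IF NOT EXISTS demo_db.t_external (id STRING)
LOCATION 'hdfs:///data/raw/t_external/';

-- The internal table's directory is deleted (moved to the HDFS trash if trash is enabled);
-- the external table's directory and files are left where they are
DROP TABLE demo_db.t_internal;
DROP TABLE demo_db.t_external;

-- Still lists whatever files were in the external location
dfs -ls hdfs:///data/raw/t_external/;
```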

Examples:

CREATE EXTERNAL TABLE IF NOT EXISTS hive.dual (
id STRING COMMENT 'id'
)
STORED AS RCFile 
LOCATION 'hdfs:///hive/dual/';

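An external table over existing delimited text data. '\073' is the octal escape for the semicolon ';', so fields are separated by semicolons. Writing it in octal is a common way to avoid a bare ';' inside the statement: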

CREATE EXTERNAL TABLE IF NOT EXISTS hive.book (
ISBN STRING COMMENT 'ISBN',
title STRING COMMENT 'title',
author STRING COMMENT 'Author',
year STRING COMMENT 'Year-Of-Publication',
publisher STRING COMMENT 'Publisher',
img_s STRING COMMENT 'Image-URL-S',
img_m STRING COMMENT 'Image-URL-M',
img_l STRING COMMENT 'Image-URL-L'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073' 
STORED AS TEXTFILE
LOCATION 'hdfs:///hive/book/';
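hive.book only points at hdfs:///hive/book/, so rows can be added or removed with ordinary HDFS file operations. This is point 3 in practice. Below is a sketch from the Hive CLI, assuming a hypothetical local file /tmp/books.txt in the semicolon-separated layout above.

```sql
-- Copy a (hypothetical) local file into the table's location; no LOAD or reload is needed
dfs -put /tmp/books.txt hdfs:///hive/book/;
SELECT title, author, year FROM hive.book LIMIT 5;

-- Deleting the file removes the rows, while the table definition stays in the metastore
dfs -rm hdfs:///hive/book/books.txt;
SELECT * FROM hive.book LIMIT 5;   -- no rows once the directory is empty
```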

Other syntax details:

file_format:
  : SEQUENCEFILE
  | TEXTFILE
  | RCFILE     (Note: Only available starting with Hive 0.6.0)
  | ORC        (Note: Only available starting with Hive 0.11.0)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
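The sketch below illustrates the last two alternatives; the table names are hypothetical. The first statement writes TEXTFILE out as the explicit input/output format pair that `DESCRIBE FORMATTED` reports for a text table. The second copies the data into an internal ORC table with CREATE TABLE ... AS SELECT. Per the note on AS select_statement above, that form cannot create an external table.

```sql
-- Equivalent to STORED AS TEXTFILE, with the format classes named explicitly
CREATE EXTERNAL TABLE IF NOT EXISTS hive.book_txt (isbn STRING, title STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs:///hive/book/';

-- Internal ORC copy of the external text table (ORC requires Hive 0.11.0+)
CREATE TABLE hive.book_orc
STORED AS ORC
AS SELECT * FROM hive.book;
```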

row_format:
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   (Note: Only available starting with Hive 0.13)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
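Here are two sketches of row_format, with hypothetical table names and paths. The first spells out every DELIMITED sub-clause for a table with collection columns. The second uses the SERDE form with OpenCSVSerde, which ships with Hive 0.14 and later and reads every column as STRING.

```sql
-- DELIMITED with all sub-clauses (NULL DEFINED AS needs Hive 0.13+)
CREATE TABLE IF NOT EXISTS hive.user_tags (
  user_id STRING,
  tags    ARRAY<STRING>,
  attrs   MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t' ESCAPED BY '\\'
  COLLECTION ITEMS TERMINATED BY ','   -- separates ARRAY elements and MAP entries
  MAP KEYS TERMINATED BY ':'           -- separates a MAP key from its value
  LINES TERMINATED BY '\n'             -- only '\n' is supported here
  NULL DEFINED AS ''                   -- empty fields are read as NULL
STORED AS TEXTFILE;

-- SERDE form: a comma-separated CSV file with double-quoted fields (hypothetical path)
CREATE EXTERNAL TABLE IF NOT EXISTS hive.book_csv (isbn STRING, title STRING, author STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = ',', 'quoteChar' = '"')
STORED AS TEXTFILE
LOCATION 'hdfs:///hive/book_csv/';
```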
