Hive官方手册学习(三)Hive数据操作语言

一、文件数据插入表(Loading files into tables)

只是复制/移动,不做任何变换

1.1 语法(Syntax)

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

1.2 概述(Synopsis)

  • filepath:文件路径(文件或者文件夹)
    • 相对路径: project/data1
    • 绝对路径:/user/hive/project/data1
    • URI: hdfs://namenode:9000/user/hive/project/data1
  • target:目标表
    • 非分区表
    • 分区表的话必须指定某一个分区
  • 有LOCAL:本地路径模式(文件或文件夹)
    • 相对路径,相对于当前工作目录
    • 绝对路径
    • URI: file:///user/hive/project/data1
    • 有LOCAL是copy,没LOCAL是move
  • 无LOCAL:
    • 全URI:这个好说
    • hdfs还是了local参考Namenode下配置文件fs.default.name的URI
    • 相对路径:相对于/user/
    • 从以上两部组成的地址中move数据
  • OVERWRITE:覆盖写
    • 有OVERWRITE覆盖写,没有OVERWRITE追加写

1.3 注意(Notes)

  • filepath不允许包含子目录
  • 没有LOCAL时,数据和表应该位于同一个文件系统

二、查询结果插入表(Inserting data into Hive Tables from queries)

2.1 语法(Syntax)

标准语法:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;

Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;

2.2 概述(Synopsis)

  • INSERT OVERWRITE:覆盖表或分区
  • INSERT INTO:追加表或分区
  • IF NOT EXISTS:避免错误
  • OutputFormat:按照表的定义
  • PARTITION:分区表必须指定分区
  • Multiple insert:一次扫描多表插入

2.3 动态分区(Dynamic Partition)

FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
       SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.cnt
  • 插入时不指定分区的具体值,而是指定一个列
  • 默认不支持动态分区的插入操作,需要hive.exec.dynamic.partition

三、查询数据到文件(Writing data into the filesystem from queries)

3.1 语法(Syntax)

标准语法:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
  SELECT ... FROM ...

Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...


row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)

3.2 概述(Synopsis)

  • DIRECTORY:可以是URI,不是的参考Namenode下配置文件fs.default.name的URI
  • LOCAL:使用本地文件系统
  • row_format:默认^A分隔,也可也可自己指定

四、插入值到表(Inserting values into tables from SQL)

4.1 语法(Syntax)

标准语法:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]

Where values_row is:
( value [, value ...] )
where a value is either null or any valid SQL literal

4.2 概述(Synopsis)

  • 不支持部分列插入
  • 不支持复杂类型插入
  • 分区表:静态用时不用插入分区列,动态用时需要插入分区列

4.3 举例

CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
  CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;

INSERT INTO TABLE students
  VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);


CREATE TABLE pageviews (userid VARCHAR(64), link STRING, came_from STRING)
  PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;

INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')
  VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);

INSERT INTO TABLE pageviews PARTITION (datestamp)
  VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');

五、升级(Update)

5.1 语法(Syntax)

UPDATE tablename SET column = value [, column = value ...] [WHERE expression]

5.2 概述(Synopsis)

  • 分区表不支持升级
  • 分桶表不支持升级
  • 只删除where语句匹配的行

六、删除(DELETE)

6.1 语法(Syntax)

DELETE FROM tablename [WHERE expression]

6.2 概述(Synopsis)

  • 只删除where语句匹配的行

你可能感兴趣的:(Hive)