There are two ways to add data to a Hive table (that I know of so far): LOAD, and the traditional INSERT.
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
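If LOCAL is specified, filepath is a path on the local file system rather than on HDFS, so this form is suited to importing local data into a table; the file is copied into the table's location. Without LOCAL, a relative filepath is resolved against the user's home directory under /user/, and the source file system must be the same as the file system the table lives on.
-- Copy a.txt from the local file system into table student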
LOAD DATA LOCAL INPATH 'a.txt' OVERWRITE INTO TABLE student;
-- Load hdfs:///zhaopy/test/hive/b.txt into table student; afterwards b.txt is moved out of /zhaopy/test/hive/ into the table's directory.
LOAD DATA INPATH '/zhaopy/test/hive/b.txt' OVERWRITE INTO TABLE student;
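The optional parts of the LOAD grammar can be made concrete with a small Python sketch. It only assembles a LOAD DATA statement string and never contacts Hive. The function name build_load_statement and its parameters are hypothetical. Partition values are always emitted as quoted strings, and any path or value containing a single quote is rejected instead of being escaped.

```python
from typing import Optional, Sequence, Tuple


def build_load_statement(
    filepath: str,
    table: str,
    local: bool = False,
    overwrite: bool = False,
    partition: Optional[Sequence[Tuple[str, str]]] = None,
) -> str:
    """Assemble LOAD DATA [LOCAL] INPATH '...' [OVERWRITE] INTO TABLE t [PARTITION (...)].

    Hypothetical helper: it only builds text and never talks to Hive.
    """
    values = [filepath] + [val for _, val in (partition or [])]
    if any("'" in v for v in values):
        # Keep the sketch simple: no escaping, just refuse single quotes.
        raise ValueError("single quotes are not supported in this sketch")

    parts = ["LOAD DATA"]
    if local:
        parts.append("LOCAL")  # filepath is on the local file system
    parts.append(f"INPATH '{filepath}'")
    if overwrite:
        parts.append("OVERWRITE")  # replace the existing table/partition contents
    parts.append(f"INTO TABLE {table}")
    if partition:
        spec = ", ".join(f"{col}='{val}'" for col, val in partition)
        parts.append(f"PARTITION ({spec})")
    return " ".join(parts) + ";"


# Reproduces the two statements from the text.
print(build_load_statement("a.txt", "student", local=True, overwrite=True))
print(build_load_statement("/zhaopy/test/hive/b.txt", "student", overwrite=True))
```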
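With INSERT, every statement (for example, one per record) launches a MapReduce job, so inserting row by row performs poorly; LOAD is recommended.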
Standard syntax (query-based insert):
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
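The difference between the two standard forms can be modelled in a few lines of Python. This is an in-memory sketch, not Hive code. A table is assumed to be a dict from a partition key (a tuple of partition values, or () for an unpartitioned table) to a list of rows, and the function name insert_select is hypothetical. The model captures three behaviours: OVERWRITE replaces only the target partition, INTO appends, and IF NOT EXISTS skips an overwrite when the partition already exists.

```python
from typing import Dict, List, Tuple

Row = Tuple
# Hypothetical model: partition key (tuple of partition values) -> rows stored there.
Table = Dict[Tuple, List[Row]]


def insert_select(
    table: Table,
    partition: Tuple,
    selected: List[Row],
    overwrite: bool,
    if_not_exists: bool = False,
) -> None:
    """Mimic INSERT INTO / INSERT OVERWRITE TABLE t PARTITION (...) select_statement."""
    if overwrite:
        if if_not_exists and partition in table:
            return  # IF NOT EXISTS: leave an existing partition untouched
        table[partition] = list(selected)  # only this partition is replaced
    else:
        table.setdefault(partition, []).extend(selected)  # INTO appends


t: Table = {("2020",): [(1,)], ("2021",): [(2,)]}
insert_select(t, ("2020",), [(3,)], overwrite=False)  # 2020 -> [(1,), (3,)]
insert_select(t, ("2020",), [(4,)], overwrite=True)   # 2020 -> [(4,)]
insert_select(t, ("2021",), [(5,)], overwrite=True, if_not_exists=True)  # skipped
print(t)  # {('2020',): [(4,)], ('2021',): [(2,)]}
```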
Hive extension (multiple inserts, query-based):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;
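The multiple-insert form reads from_statement once and offers every row to each INSERT branch. Below is a minimal Python model of that idea. The names are hypothetical, and each branch is assumed to be a (target, WHERE predicate, SELECT projection) triple:

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple
# (target table name, WHERE predicate, SELECT projection): hypothetical shape.
Branch = Tuple[str, Callable[[Row], bool], Callable[[Row], Row]]


def multi_insert(source: Iterable[Row], branches: List[Branch]) -> Dict[str, List[Row]]:
    """Model FROM src INSERT ... SELECT ... INSERT ... SELECT ...: one scan, many targets."""
    out: Dict[str, List[Row]] = {name: [] for name, _, _ in branches}
    for row in source:  # the source is iterated exactly once
        for name, where, select in branches:
            if where(row):
                out[name].append(select(row))
    return out


student = [(10, "zhao"), (12, None)]
result = multi_insert(
    iter(student),  # an iterator can only be consumed once, which this model respects
    [
        ("named", lambda r: r[1] is not None, lambda r: r),
        ("ids", lambda r: True, lambda r: (r[0],)),
    ],
)
print(result)  # {'named': [(10, 'zhao')], 'ids': [(10,), (12,)]}
```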
Hive extension (dynamic partition inserts, query-based):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
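In a dynamic partition insert, partition columns given without a value are filled in from the query. As I understand it, Hive takes them from the last columns of the SELECT list, in the same order as in the PARTITION clause. If every partition column is dynamic, hive.exec.dynamic.partition.mode usually has to be set to nonstrict. The following Python sketch shows that routing only for the all-dynamic case, and the function name is hypothetical:

```python
from typing import Dict, List, Tuple


def route_dynamic_partitions(rows: List[Tuple], num_dynamic: int) -> Dict[Tuple, List[Tuple]]:
    """Group SELECT output by its trailing num_dynamic columns (the partition values).

    The partition values are removed from the stored rows, mirroring the fact that
    Hive keeps partition columns in the directory name rather than in the data files.
    """
    if num_dynamic < 1:
        raise ValueError("need at least one dynamic partition column")
    partitions: Dict[Tuple, List[Tuple]] = {}
    for row in rows:
        key = tuple(row[-num_dynamic:])   # e.g. ('2020',) for PARTITION (dt)
        data = tuple(row[:-num_dynamic])  # the regular columns
        partitions.setdefault(key, []).append(data)
    return partitions


selected = [(1, "a", "2020"), (2, "b", "2021"), (3, "c", "2020")]
print(route_dynamic_partitions(selected, 1))
# {('2020',): [(1, 'a'), (3, 'c')], ('2021',): [(2, 'b')]}
```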
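Notes:
A table whose table property is "immutable"="true" is immutable. INSERT INTO a non-empty immutable table throws an exception. Inserting into an empty immutable table is allowed, and INSERT OVERWRITE is not affected by this property (overwriting is allowed).
Example: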
hive> select * from student;
OK
10 zhao
12 NULL
Time taken: 0.125 seconds, Fetched: 2 row(s)
hive> select * from teacher;
OK
Time taken: 0.108 seconds
hive> INSERT OVERWRITE TABLE teacher select id,null from student where student.name='zhao';
--------------------------------- (lots of output omitted) ---------------------------------
hive> select * from teacher;
OK
10 NULL
Time taken: 0.125 seconds, Fetched: 1 row(s)
As shown, INSERT combined with SELECT ... FROM ... WHERE writes rows in the chosen shape (here id plus a null column) into the target table.
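The "immutable"="true" rule from the notes can be summarised as a small decision function. This is a Python model of the documented behaviour, not Hive's implementation. ImmutableTableError and insert_rows are hypothetical names.

```python
from typing import List, Tuple

Row = Tuple


class ImmutableTableError(Exception):
    """Hypothetical stand-in for the exception Hive raises."""


def insert_rows(existing: List[Row], new_rows: List[Row], overwrite: bool, immutable: bool) -> List[Row]:
    """Return the table contents after INSERT INTO / INSERT OVERWRITE."""
    if overwrite:
        return list(new_rows)  # OVERWRITE is not affected by the immutable property
    if immutable and existing:
        raise ImmutableTableError("INSERT INTO a non-empty immutable table")
    return list(existing) + list(new_rows)  # INTO appends (mutable table, or empty one)


t = insert_rows([], [(10, None)], overwrite=False, immutable=True)  # empty: allowed
print(t)  # [(10, None)]
try:
    insert_rows(t, [(12, None)], overwrite=False, immutable=True)  # non-empty: rejected
except ImmutableTableError as e:
    print("rejected:", e)
print(insert_rows(t, [(12, None)], overwrite=True, immutable=True))  # [(12, None)]
```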
Writing query results to a directory:
Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
SELECT ... FROM ...
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
hive> select * from student;
OK
10 zhao
12 NULL
Time taken: 0.134 seconds, Fetched: 2 row(s)
hive> INSERT OVERWRITE local directory '/hadoop/asiainfo/zhaopy/hivetest' select * from student;
------------------------ (lots of output omitted) ------------------
[ochadoop@server7 hivetest]$ ls
000000_0.snappy 000001_0.snappy
[ochadoop@server7 hivetest]$ cat 000000_0.snappy
10zhao
[ochadoop@server7 hivetest]$ cat 000001_0.snappy
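12\N
No ROW FORMAT was specified, so the output uses Hive's default text layout. Columns are separated by the invisible control character \001 (Ctrl-A), which is why 10 and zhao appear glued together, and NULL is written as \N.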
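To read such exported lines back, split on \001 and map \N to NULL. Below is a small Python reader. The function name is hypothetical, and it works on already-decoded text lines. The .snappy suffix suggests output compression was enabled, so real files may need decompressing first. ESCAPED BY handling is ignored. The field_delim and null_marker parameters stand in for FIELDS TERMINATED BY and NULL DEFINED AS when those are set.

```python
from typing import List, Optional


def parse_hive_text_line(
    line: str,
    field_delim: str = "\x01",  # default field delimiter (Ctrl-A)
    null_marker: str = "\\N",   # default representation of NULL
) -> List[Optional[str]]:
    """Split one line of Hive's default text output into column strings (None for NULL)."""
    fields = line.rstrip("\n").split(field_delim)
    return [None if f == null_marker else f for f in fields]


print(parse_hive_text_line("10\x01zhao\n"))  # ['10', 'zhao']
print(parse_hive_text_line("12\x01\\N\n"))   # ['12', None]
```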
Inserting rows from literal values (INSERT ... VALUES, available since Hive 0.14):
Standard Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
Where values_row is:
( value [, value ...] )
where a value is either null or any valid SQL literal
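Notes: a value must be supplied for every column of the table; for a column you do not want to assign, use null instead.
Example:
hive> select * from student;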
OK
10 zhao
12 NULL
Time taken: 0.137 seconds, Fetched: 2 row(s)
hive> insert into table student values (50, null);
---------------- (lots of output omitted) ----------------
hive> select * from student;
OK
10 zhao
12 NULL
50 NULL
Time taken: 0.126 seconds, Fetched: 3 row(s)
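The VALUES rules above (one value per column, null for unassigned columns, SQL literals otherwise) can be sketched as a Python statement builder. The function names are hypothetical, and only None, bool, int and str values are handled. Strings are single-quoted with backslash escaping.

```python
from typing import Sequence, Union

Value = Union[None, bool, int, str]


def sql_literal(v: Value) -> str:
    """Render a Python value as a HiveQL literal for a VALUES row."""
    if v is None:
        return "null"
    if isinstance(v, bool):  # test bool before int: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return str(v)
    if isinstance(v, str):
        # Backslash-escape backslashes and single quotes inside the quoted literal.
        return "'" + v.replace("\\", "\\\\").replace("'", "\\'") + "'"
    raise TypeError(f"unsupported value type: {type(v).__name__}")


def build_insert_values(table: str, num_columns: int, rows: Sequence[Sequence[Value]]) -> str:
    """Build INSERT INTO TABLE ... VALUES (...), (...); every row must cover every column."""
    if not rows:
        raise ValueError("at least one row is required")
    for row in rows:
        if len(row) != num_columns:
            # This sketch requires a value per column; pass None (null) for unassigned ones.
            raise ValueError(f"expected {num_columns} values, got {len(row)}")
    body = ", ".join("(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows)
    return f"INSERT INTO TABLE {table} VALUES {body};"


print(build_insert_values("student", 2, [(50, None)]))
# INSERT INTO TABLE student VALUES (50, null);
```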
Deleting rows (DELETE):
Standard Syntax:
DELETE FROM tablename [WHERE expression]
Updating rows (UPDATE):
Standard Syntax:
UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
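Notes and examples:
Both statements are used the same way as in ordinary SQL. In Hive, however, DELETE and UPDATE only work on transactional (ACID) tables.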
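The text says these statements behave as in ordinary SQL, so the standard semantics can be tried without a Hive cluster using Python's built-in sqlite3 module. This is SQLite, not Hive, and no ACID table setup is involved. The table mirrors student from the earlier examples, and the name 'li' is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(10, "zhao"), (12, None), (50, None)])

# UPDATE tablename SET column = value [WHERE expression]
conn.execute("UPDATE student SET name = 'li' WHERE id = 12")

# DELETE FROM tablename [WHERE expression]
conn.execute("DELETE FROM student WHERE name IS NULL")

rows = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()
print(rows)  # [(10, 'zhao'), (12, 'li')]
conn.close()
```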
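A mixed operation (MERGE) combines the statements above into a single statement, using conditions to decide whether each row is inserted, updated, or deleted.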
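Standard Syntax:
MERGE INTO <target table> AS T USING <source expression/table> AS S
ON <boolean expression1>
WHEN MATCHED [AND <boolean expression2>] THEN UPDATE SET <set clause list>
WHEN MATCHED [AND <boolean expression3>] THEN DELETE
WHEN NOT MATCHED [AND <boolean expression4>] THEN INSERT VALUES <value list>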
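Notes: in the example below, rows of merge_data.merge_source (S) are matched against merge_data.transactions (T) on ID and tran_date. A matched row whose source TranValue differs from the target and is not NULL is updated. A matched row whose source TranValue is NULL is deleted. A source row with no match is inserted.
Example: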
MERGE INTO merge_data.transactions AS T
USING merge_data.merge_source AS S
ON T.ID = S.ID and T.tran_date = S.tran_date
WHEN MATCHED AND (T.TranValue != S.TranValue AND S.TranValue IS NOT NULL) THEN UPDATE SET TranValue = S.TranValue, last_update_user = 'merge_update'
WHEN MATCHED AND S.TranValue IS NULL THEN DELETE
WHEN NOT MATCHED THEN INSERT VALUES (S.ID, S.TranValue, 'merge_insert', S.tran_date);
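The effect of the MERGE example can be replayed on in-memory data. This is a Python model of the three WHEN clauses, not Hive code. It assumes (ID, tran_date) identifies at most one row on each side, which the dict keys enforce. The function name and the sample rows (including the "init" user) are hypothetical. One detail mirrors SQL's three-valued logic: T.TranValue != S.TranValue is not true when T.TranValue is NULL, so such a row is left unchanged.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[int, str]                    # (ID, tran_date), the ON condition
TargetRow = Tuple[Optional[float], str]  # (TranValue, last_update_user)


def merge(target: Dict[Key, TargetRow], source: Dict[Key, Optional[float]]) -> Dict[Key, TargetRow]:
    """Apply the example's WHEN clauses, checked in order for each source row."""
    result = dict(target)
    for key, s_val in source.items():
        if key in target:  # WHEN MATCHED ...
            t_val, _ = target[key]
            # SQL's != yields NULL (not true) if either side is NULL, so require both non-null.
            if s_val is not None and t_val is not None and t_val != s_val:
                result[key] = (s_val, "merge_update")  # ... THEN UPDATE SET ...
            elif s_val is None:
                del result[key]  # ... THEN DELETE
            # otherwise the matched row is left unchanged
        else:  # WHEN NOT MATCHED THEN INSERT
            result[key] = (s_val, "merge_insert")
    return result


target = {
    (1, "2020-01-01"): (100.0, "init"),
    (2, "2020-01-01"): (200.0, "init"),
    (3, "2020-01-01"): (None, "init"),
    (4, "2020-01-01"): (50.0, "init"),
}
source = {
    (1, "2020-01-01"): 150.0,  # value changed -> update
    (2, "2020-01-01"): None,   # NULL in source -> delete
    (3, "2020-01-01"): 10.0,   # target NULL -> != is not true -> unchanged
    (5, "2020-01-01"): 70.0,   # no match -> insert
}
merged = merge(target, source)
print(merged)
```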