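There are multiple ways to modify data in Hive:
- LOAD
- INSERT
  - into Hive tables from queries
  - into directories from queries
  - into Hive tables from SQL (INSERT...VALUES)
- UPDATE
- DELETE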
EXPORT and IMPORT commands are also available (as of Hive 0.8).
Hive does not do any transformation while loading data into tables. Load operations are currently pure copy/move operations that move datafiles into locations corresponding to Hive tables.
Syntax:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
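filepath can be:
- a relative path, such as project/data1
- an absolute path, such as /user/hive/project/data1
- a full URI with scheme and (optionally) an authority, such as hdfs://namenode:9000/user/hive/project/data1
If the keyword LOCAL is specified, the load command looks for filepath in the local file system and copies the files it addresses to the target file system. A full URI can be used for local files as well, for example file:///user/hive/project/data1.
If LOCAL is not specified and filepath has no scheme or authority, Hive uses the scheme and authority from the Hadoop configuration variable fs.default.name that specifies the Namenode URI. If the path is not absolute, Hive interprets it relative to /user/<username>. Hive then moves the files addressed by filepath into the table (or partition).
If the OVERWRITE keyword is used, the contents of the target table (or partition) are deleted and replaced by the files referred to by filepath; otherwise the files are added to the table.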
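A minimal sketch of both LOAD variants follows. The table page_view_raw and the file paths are hypothetical, chosen only for illustration; because the table is partitioned, each LOAD names a value for its partition column.

```sql
-- Hypothetical partitioned target table, used only for this sketch.
CREATE TABLE page_view_raw (viewTime INT, userid BIGINT, page_url STRING)
  PARTITIONED BY (dt STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- LOCAL: copy a file from the local file system into one partition,
-- replacing whatever that partition held before (OVERWRITE).
LOAD DATA LOCAL INPATH '/tmp/pv_2008-06-08.tsv'
  OVERWRITE INTO TABLE page_view_raw PARTITION (dt='2008-06-08');

-- No LOCAL: move the files under the relative path project/data1
-- (resolved against /user/<username>) into another partition, keeping existing files.
LOAD DATA INPATH 'project/data1'
  INTO TABLE page_view_raw PARTITION (dt='2008-06-09');
```

Because the second statement moves rather than copies, the files are no longer at project/data1 once it completes.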
Query results can be inserted into tables by using the INSERT clause.
Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;
Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
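As an illustration of the multiple-insert extension, the sketch below scans a source table once and writes two targets. The tables pv_stage, pv_us, and pv_other and their columns are hypothetical; IF NOT EXISTS applies only to the static partition of the first target.

```sql
-- Hypothetical tables: pv_stage(userid STRING, page_url STRING, country STRING),
-- pv_us(userid STRING, page_url STRING) partitioned by dt,
-- and unpartitioned pv_other(userid STRING, page_url STRING, country STRING).
FROM pv_stage s
INSERT OVERWRITE TABLE pv_us PARTITION (dt='2008-06-08') IF NOT EXISTS
  SELECT s.userid, s.page_url WHERE s.country = 'US'
INSERT INTO TABLE pv_other
  SELECT s.userid, s.page_url, s.country WHERE s.country <> 'US';
```

If partition dt='2008-06-08' of pv_us already exists, the first insert leaves it untouched, while the second insert always appends to pv_other.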
INSERT OVERWRITE will overwrite any existing data in the table or partition unless IF NOT EXISTS is provided for a partition (as of Hive 0.9.0). INSERT INTO will append to the table or partition, keeping the existing data intact.

In dynamic partition inserts, users can give partial partition specifications, which means just specifying the list of partition column names in the PARTITION clause; the column values are optional. If a partition column value is given, it is called a static partition; otherwise it is a dynamic partition. Each dynamic partition column has a corresponding input column from the SELECT statement, so the value of that input column determines which partition each row is written to (and therefore which partitions are created). The dynamic partition columns must be specified last among the columns in the SELECT statement, and in the same order in which they appear in the PARTITION() clause.
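Dynamic partition inserts are disabled by default prior to Hive 0.9.0 and enabled by default in Hive 0.9.0 and later. These are the relevant configuration properties for dynamic partition inserts:
Configuration property | Default | Note |
---|---|---|
hive.exec.dynamic.partition | false prior to Hive 0.9.0; true in Hive 0.9.0 and later | Needs to be set to true to enable dynamic partition inserts |
hive.exec.dynamic.partition.mode | strict | In strict mode, the user must specify at least one static partition, in case the user accidentally overwrites all partitions; in nonstrict mode all partitions are allowed to be dynamic |
hive.exec.max.dynamic.partitions.pernode | 100 | Maximum number of dynamic partitions allowed to be created in each mapper/reducer node |
hive.exec.max.dynamic.partitions | 1000 | Maximum number of dynamic partitions allowed to be created in total |
hive.exec.max.created.files | 100000 | Maximum number of HDFS files created by all mappers/reducers in a MapReduce job |
hive.error.on.empty.partition | false | Whether to throw an exception if dynamic partition insert generates empty results |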
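A session can adjust these properties with SET before running a dynamic partition insert. The limit values below are illustrative only, not recommendations.

```sql
-- Turn on dynamic partition inserts for this session.
SET hive.exec.dynamic.partition=true;
-- Allow inserts in which every partition column is dynamic.
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Raise the partition limits for a job expected to create many partitions (illustrative values).
SET hive.exec.max.dynamic.partitions=5000;
SET hive.exec.max.dynamic.partitions.pernode=500;
```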
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
  SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.cnt;
Here the country partition will be dynamically created by the last column from the SELECT clause (i.e. pvs.cnt). Note that the column name is not used: the mapping is purely by position. In nonstrict mode the dt partition could also be dynamically created.
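For instance, in nonstrict mode both partition columns of the page_view insert above can be dynamic. This sketch assumes page_view_stg also has columns named dt and country, which is a hypothetical extension of that example; they are selected last, in the same order as in PARTITION().

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Both dt and country are dynamic: each row is routed by its last two SELECT columns.
-- pvs.dt and pvs.country are assumed (hypothetical) columns of page_view_stg.
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt, country)
  SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip,
         pvs.dt, pvs.country;
```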
Query results can be inserted into filesystem directories by using a slight variation of the syntax above:
Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
  SELECT ... FROM ...
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
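A minimal sketch of writing query results to a directory follows. The output path is hypothetical; since INSERT OVERWRITE replaces the directory's existing contents, it should point at a location reserved for this output. The ROW FORMAT clause requires Hive 0.11.0 or later.

```sql
-- Write comma-separated rows to a directory on the local file system (LOCAL).
-- '/tmp/page_view_export' is a hypothetical path; its current contents are replaced.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/page_view_export'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  SELECT pvs.userid, pvs.page_url
  FROM page_view_stg pvs;
```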
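The directory can be a full URI. If scheme or authority are not specified, Hive will use the scheme and authority from the Hadoop configuration variable fs.default.name that specifies the Namenode URI. If the LOCAL keyword is used, Hive writes the data to the directory on the local file system.

The INSERT...VALUES statement can be used to insert data into tables directly from SQL.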
Standard Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
Where values_row is:
( value [, value ...] )
where a value is either null or any valid SQL literal.
Hive does not support literals for complex types (array, map, struct, union), so the user cannot insert data into complex-datatype columns using the INSERT INTO...VALUES clause.
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
  CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;
INSERT INTO TABLE students
  VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
-- `from` is a reserved word, so the column name is quoted with backticks.
CREATE TABLE pageviews (userid VARCHAR(64), link STRING, `from` STRING)
  PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;
INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')
  VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);
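-- Fully dynamic partition insert: the last value in each row supplies datestamp.
-- With no static partition column, this requires hive.exec.dynamic.partition.mode=nonstrict.
INSERT INTO TABLE pageviews PARTITION (datestamp)
  VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');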
Standard Syntax:
UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
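The sketch below shows an UPDATE against a transactional table. It assumes the cluster is already configured for Hive ACID transactions (for example, hive.txn.manager set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager); UPDATE is rejected on tables that do not support ACID. The table name students_acid is hypothetical.

```sql
-- Hypothetical ACID table: bucketed ORC with the 'transactional' table property.
CREATE TABLE students_acid (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
  CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC
  TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE students_acid
  VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);

-- Change one column for the matching rows.
-- age is the bucketing column, and bucketing or partition columns cannot be SET.
UPDATE students_acid SET gpa = 3.10 WHERE name = 'fred flintstone';
```

Omitting the WHERE clause would update every row in the table.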
Standard Syntax:
DELETE FROM tablename [WHERE expression]
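A minimal DELETE sketch, assuming a cluster configured for Hive ACID transactions and a hypothetical transactional table students_acid(name, age, gpa) stored as ORC with TBLPROPERTIES ('transactional'='true').

```sql
-- students_acid is a hypothetical ACID table; DELETE fails on non-transactional tables.
-- Remove only the rows that match the predicate.
DELETE FROM students_acid WHERE gpa < 2.00;
```

Leaving out the WHERE clause deletes every row in the table.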