官方参考文档 LanguageManual DDL
创建/删除/更改/使用数据库
在hive sql中database
关键词和 schema
关键词可以互换,意思是一样的
创建数据库
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
小牛试刀:
hive> create database if not exists test;
OK
Time taken: 0.131 seconds
hive> create schema if not exists test_schema;
OK
Time taken: 0.07 seconds
hive> show databases;
OK
default
test
test_schema
Time taken: 0.255 seconds, Fetched: 3 row(s)
hive> create database if not exists stefan comment 'just for test';
OK
Time taken: 0.047 seconds
hive> desc database stefan;
OK
stefan just for test hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db hadoop USER
Time taken: 0.068 seconds, Fetched: 1 row(s)
删除数据库
删除数据库时,默认行为是RESTRICT,这种情况下如果数据库不为空,则删除动作失败。如果需要删除库以及库里的表可以使用CASCADE关键字。
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
小牛试刀:
hive> drop database test;
OK
Time taken: 0.242 seconds
hive> drop database if exists test;
OK
Time taken: 0.011 seconds
hive> drop database if exists test_schema cascade;
OK
Time taken: 1.518 seconds
hive> show databases;
OK
default
prop_database
stefan
修改数据库
需要注意下述命令的生效版本号。
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...); -- (Note: SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role; -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)
alter database ... set location 语句不会将数据库当前目录的内容移动到新指定的位置。它不会更改当前数据库下已经存在的任何表/分区关联的位置,仅改变在该数据库下新建表的默认父目录。
小牛试刀:
添加属性:
hive> show create database stefan;
OK
CREATE DATABASE `stefan`
COMMENT
'just for test'
LOCATION
'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db'
Time taken: 0.222 seconds, Fetched: 5 row(s)
hive> alter database stefan set dbproperties ('owner'='stefan', 'create_time'='20190403');
OK
Time taken: 0.07 seconds
hive> show create database stefan;
OK
CREATE DATABASE `stefan`
COMMENT
'just for test'
LOCATION
'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db'
WITH DBPROPERTIES (
'create_time'='20190403',
'owner'='stefan')
Time taken: 0.196 seconds, Fetched: 8 row(s)
修改owner:
hive> alter schema stefan set owner user stefan_test;
OK
Time taken: 0.084 seconds
hive> desc database stefan;
OK
stefan just for test hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db stefan_test USER
Time taken: 0.145 seconds, Fetched: 1 row(s)
切换数据库
USE database_name;
USE DEFAULT;
hive> use stefan;
OK
Time taken: 0.019 seconds
查看库
show databases;
hive> show databases;
OK
default
Time taken: 5.987 seconds, Fetched: 1 row(s)
查看当前库
show current_database();
hive> select current_database();
OK
default
Time taken: 0.852 seconds, Fetched: 1 row(s)
查看库的属性信息
desc database extended default;
hive> desc database extended default;
OK
default Default Hive database hdfs://jms-master-01:9000/user/hive/warehouse public ROLE
Time taken: 0.361 seconds, Fetched: 1 row(s)
查看建库语句
show create database default;
hive> show create database default;
OK
CREATE DATABASE `default`
COMMENT
'Default Hive database'
LOCATION
'hdfs://jms-master-01:9000/user/hive/warehouse'
Time taken: 0.525 seconds, Fetched: 5 row(s)
建库时携带属性
create database if not exists prop_database comment 'test properties' with dbproperties ('prop1'='aaa', 'prop2'='bbb');
hive> create database if not exists prop_database comment 'test properties database' with dbproperties('prop1'='aaa', 'prop2'='bbb');
OK
Time taken: 0.121 seconds
hive> show create database prop_database;
OK
CREATE DATABASE `prop_database`
COMMENT
'test properties database'
LOCATION
'hdfs://jms-master-01:9000/user/hive/warehouse/prop_database.db'
WITH DBPROPERTIES (
'prop1'='aaa',
'prop2'='bbb')
Time taken: 0.181 seconds, Fetched: 8 row(s)
创建/删除/清空表
建表
根据指定表明创建表。当表明存在时抛出异常,可以使用 IF NOT EXISTS 跳过该异常。
*表和列的注释comment格式是字符串,使用单引号。
*表名和列名不区分大小写。
*使用external创建的表是外部表,不使用默认创建的表是托管表(我习惯称之为内部表)。查看一个表是托管表还是外部表,可以通过describe extended table_name;
查看其中的tableType属性值。
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type -- (Note: Available in Hive 0.7.0 and later)
primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
| STRING
| BINARY -- (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
| DECIMAL -- (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
| DATE -- (Note: Available in Hive 0.12.0 and later)
| VARCHAR -- (Note: Available in Hive 0.12.0 and later)
| CHAR -- (Note: Available in Hive 0.13.0 and later)
array_type
: ARRAY < data_type >
map_type
: MAP < primitive_type, data_type >
struct_type
: STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type
: UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
file_format:
: SEQUENCEFILE
| TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
| RCFILE -- (Note: Available in Hive 0.6.0 and later)
| ORC -- (Note: Available in Hive 0.11.0 and later)
| PARQUET -- (Note: Available in Hive 0.13.0 and later)
| AVRO -- (Note: Available in Hive 0.14.0 and later)
| JSONFILE -- (Note: Available in Hive 4.0.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
constraint_specification:
: [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
[, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE
创建托管表:
hive> create table manage_table (id int, name string);
OK
Time taken: 0.674 seconds
创建外部表:
hive> create external table external_table (id int, name string) location '/user/hadoop/tmp/hive/default.db/external_table';
OK
Time taken: 0.449 seconds
查看表是托管表还是外部表:
可以看出托管表的tableType是MANAGED_TABLE;外部表的tableType是EXTERNAL_TABLE。
hive> describe extended manage_table;
OK
id int
name string
Detailed Table Information Table(tableName:manage_table, dbName:default, owner:hadoop, createTime:1554341982, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://jms-master-01:9000/user/hive/warehouse/manage_table, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=0, transient_lastDdlTime=1554341982}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false)
Time taken: 0.329 seconds, Fetched: 4 row(s)
hive> describe extended external_table;
OK
id int
name string
Detailed Table Information Table(tableName:external_table, dbName:default, owner:hadoop, createTime:1554342083, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://jms-master-01:9000/user/hadoop/tmp/hive/default.db/external_table, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1554342083}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE, rewriteEnabled:false)
Time taken: 0.143 seconds, Fetched: 4 row(s)
创建分区表
hive> create table partition_table (id int, name string) partitioned by (dt string);
OK
Time taken: 0.393 seconds
hive> show create table partition_table;
OK
CREATE TABLE `partition_table`(
`id` int,
`name` string)
PARTITIONED BY (
`dt` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db/partition_table'
TBLPROPERTIES (
'transient_lastDdlTime'='1554344052')
Time taken: 0.275 seconds, Fetched: 15 row(s)
create table as select(CTAS)
CTAS语法限制:
- 目标表不能是外部表
- 目标表不能是分桶表
CTAS语法可以实现表的格式转换。
hive> select * from partition_table;
OK
1 郭靖 20190402 china
2 黄蓉 20190402 china
3 杨康 20190402 china
4 穆念慈 20190402 china
5 东邪 20190402 china
6 西毒 20190402 china
7 黄老邪 20190402 china
8 杨铁心 20190402 china
Time taken: 1.499 seconds, Fetched: 8 row(s)
hive> create table new_key_value_store row format serde "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe" stored as RCFile as select (id * 10) new_key, concat(id, name) key_value_pair from partition_table sort by new_key, key_value_pair;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20190404103533_902fb48c-2471-4e2d-bc79-5cba46b8c710
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
Starting Job = job_1552651623473_0007, Tracking URL = http://jms-master-01:8088/proxy/application_1552651623473_0007/
Kill Command = /home/hadoop/tools/hadoop-2.7.7/bin/hadoop job -kill job_1552651623473_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-04-04 10:35:45,795 Stage-1 map = 0%, reduce = 0%
2019-04-04 10:35:50,537 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.39 sec
2019-04-04 10:35:56,978 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.27 sec
MapReduce Total cumulative CPU time: 5 seconds 270 msec
Ended Job = job_1552651623473_0007
Moving data to directory hdfs://jms-master-01:9000/user/hive/warehouse/new_key_value_store
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.27 sec HDFS Read: 8487 HDFS Write: 248 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 270 msec
OK
Time taken: 24.897 seconds
hive> select * from new_key_value_store;
OK
10 1郭靖
20 2黄蓉
30 3杨康
40 4穆念慈
50 5东邪
60 6西毒
70 7黄老邪
80 8杨铁心
Time taken: 0.21 seconds, Fetched: 8 row(s)
hive> show create table new_key_value_store;
OK
CREATE TABLE `new_key_value_store`(
`new_key` int,
`key_value_pair` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
LOCATION
'hdfs://jms-master-01:9000/user/hive/warehouse/new_key_value_store'
TBLPROPERTIES (
'transient_lastDdlTime'='1554345358')
Time taken: 0.299 seconds, Fetched: 13 row(s)
克隆表(create table ... like)
create table ... like 会创建一个和源表结构完全一致的空表。
hive> create table like_new_key_value_store like new_key_value_store;
OK
Time taken: 0.149 seconds
hive> show create table like_new_key_value_store;
OK
CREATE TABLE `like_new_key_value_store`(
`new_key` int,
`key_value_pair` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
LOCATION
'hdfs://jms-master-01:9000/user/hive/warehouse/like_new_key_value_store'
TBLPROPERTIES (
'transient_lastDdlTime'='1554345937')
Time taken: 1.801 seconds, Fetched: 13 row(s)
hive> select * from like_new_key_value_store;
OK
Time taken: 0.252 seconds
分桶排序表
创建一个分桶排序表
hive> create table bucketed_table (id int, name string) partitioned by (dt string, country string) clustered by (id) sorted by (name) into 4 buckets row format delimited fields terminated by '\001' collection items terminated by '\002' map keys terminated by '\003' stored as sequencefile;
OK
Time taken: 0.241 seconds
hive> show create table bucketed_table;
OK
CREATE TABLE `bucketed_table`(
`id` int,
`name` string)
PARTITIONED BY (
`dt` string,
`country` string)
CLUSTERED BY (
id)
SORTED BY (
name ASC)
INTO 4 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'colelction.delim'='',
'field.delim'='',
'mapkey.delim'='',
'serialization.format'='')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
'hdfs://jms-master-01:9000/user/hive/warehouse/stefan.db/bucketed_table'
TBLPROPERTIES (
'transient_lastDdlTime'='1554346570')
Time taken: 0.268 seconds, Fetched: 26 row(s)
向多个分区多动态插入,需要设置参数
set hive.exec.dynamici.partition=true;
#开启动态分区,默认是false
set hive.exec.dynamic.partition.mode=nonstrict;
#开启允许所有分区都是动态的,否则必须要有静态分区才能使用。
hive> insert into bucketed_table partition(dt, country) select id, name, dt, country from default.partition_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20190404110332_1e7c3bf3-e5e6-4ff1-af30-2a5bcf0278e6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 4
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
Starting Job = job_1552651623473_0008, Tracking URL = http://jms-master-01:8088/proxy/application_1552651623473_0008/
Kill Command = /home/hadoop/tools/hadoop-2.7.7/bin/hadoop job -kill job_1552651623473_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
2019-04-04 11:03:38,849 Stage-1 map = 0%, reduce = 0%
2019-04-04 11:03:43,157 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec
2019-04-04 11:03:48,331 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 4.05 sec
2019-04-04 11:03:50,399 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 9.22 sec
2019-04-04 11:03:51,435 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 12.12 sec
MapReduce Total cumulative CPU time: 12 seconds 120 msec
Ended Job = job_1552651623473_0008
Loading data to table stefan.bucketed_table partition (dt=null, country=null)
Loaded : 1/1 partitions.
Time taken to load dynamic partitions: 0.243 seconds
Time taken for adding to write entity : 0.0 seconds
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 4 Cumulative CPU: 12.12 sec HDFS Read: 21718 HDFS Write: 937 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 120 msec
OK
Time taken: 20.806 seconds
hive> select * from bucketed_table;
OK
8 杨铁心 20190402 china
4 穆念慈 20190402 china
5 东邪 20190402 china
1 郭靖 20190402 china
6 西毒 20190402 china
2 黄蓉 20190402 china
3 杨康 20190402 china
7 黄老邪 20190402 china
Time taken: 0.225 seconds, Fetched: 8 row(s)
数据倾斜表(Skewed Tables)
暂时没接触过也没研究过。
临时表
临时表只在当前会话可见。数据存储在用户的临时目录总,会话结束时删除。
- 临时表不支持分区,不支持创建索引。
- 如果临时表表明和永久表名冲突,则在当前会话中会屏蔽永久表;只有drop或rename临时表表名才能使用永久表。
- 从hive1.1+版本起,临时表可以存储在内存或者SSD中,通过参数hive.exec.temporary.table.storage配置,可选值有memory,ssd,default。
create temporary table ...
小牛试刀
hive> create temporary table temp_table(id int, name string);
OK
Time taken: 6.14 seconds
hive> insert into temp_table select id, name from partition_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20190408171623_2c787f42-e5e5-460d-8a21-90a5a5c9afc3
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1552651623473_0010, Tracking URL = http://jms-master-01:8088/proxy/application_1552651623473_0010/
Kill Command = /home/hadoop/tools/hadoop-2.7.7/bin/hadoop job -kill job_1552651623473_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-08 17:16:32,625 Stage-1 map = 0%, reduce = 0%
2019-04-08 17:16:38,994 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.1 sec
MapReduce Total cumulative CPU time: 2 seconds 100 msec
Ended Job = job_1552651623473_0010
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://jms-master-01:9000/tmp/hive-hadoop/hadoop/8dbf32dc-51a3-4caa-b5be-9d682ccdb29c/_tmp_space.db/09e10ccb-5d5f-45ab-b07a-afe5590b2352/.hive-staging_hive_2019-04-08_17-16-23_750_5995661068890136721-1/-ext-10000
Loading data to table default.temp_table
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.1 sec HDFS Read: 4605 HDFS Write: 155 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 100 msec
OK
Time taken: 16.73 seconds
hive> select * from temp_table;
OK
1 郭靖
2 黄蓉
3 杨康
4 穆念慈
5 东邪
6 西毒
7 黄老邪
8 杨铁心
Time taken: 0.161 seconds, Fetched: 8 row(s)
临时表插入数据不支持insert overwrite语法。
hive> insert overwrite temp_table select id, name from partition_table;
NoViableAltException(24@[])
at org.apache.hadoop.hive.ql.parse.HiveParser.destination(HiveParser.java:38762)
at org.apache.hadoop.hive.ql.parse.HiveParser.insertClause(HiveParser.java:38531)
at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36478)
at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822)
at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:35710)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2284)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1333)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:208)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
FAILED: ParseException line 1:17 cannot recognize input near 'temp_table' 'select' 'id' in destination specification
退出当前会话,临时表会被删除。
hive> quit;
[hadoop@jms-master-01 ~]$ hive
which: no hbase in (/usr/share/svensudo/bin:/home/hadoop/tools/scala-2.12.8/bin:/home/hadoop/tools/spark-2.4.0-bin-hadoop2.7/bin:/home/hadoop/tools/java/jdk1.8.0_191/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/go/bin/:/alc/tool/bin:/alc/tool/bin:/home/hadoop/tools/hadoop-2.7.7/bin:/home/hadoop/tools/apache-hive-2.3.4-bin/bin:/home/hadoop/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/tools/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/tools/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/home/hadoop/tools/apache-hive-2.3.4-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> select * from temp_table;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'temp_table'
事务表(Transactional Tables)
从Hive1.4.0+版本开始支持事务表。Hive提供了支持ACID语义的事务表,可以完成增删改操作。
关于事务表,展开又是一个专题了。可以参考官方事务表文档
主键约束
从Hive2.1.0版本开始,支持约束。Hive支持未经验证的主键和外键约束,注意是未经验证的,所以主键的正确性需要上游数据来保证。
删除表
drop table [if exists] table_name [purge];
通常情况下,删除托管表,hive会删除元数据信息和数据文件,删除文件相当于执行了hadoop fs -rm
,也就是说数据文件会被移动到hdfs系统下是trash回收站中,在误操作的情况下,在一定时效期间还可以找回数据。如果指定了purge选项,则不会移动到回收站下,而是直接永久删除。所以drop动作一定要谨慎。
Truncate表
truncate table table_name [partition partition_spec]
这里partition_spec指(partition_column=partition_col_value,partition_column=partition_col_value, ...)
相信如果对关系型数据库sql语言比较熟悉的话,以上操作都可以理解。
Alter table/partition/column
Alter table
重命名表
alter table table_name rename to new_table_name;
hive> show tables;
OK
first_table
Time taken: 5.51 seconds, Fetched: 9 row(s)
hive> alter table first_table rename to second_table;
OK
Time taken: 0.406 seconds
hive> show tables;
OK
second_table
Time taken: 0.05 seconds, Fetched: 9 row(s)
修改表的属性
alter table table_name set tblproperties table_properties;
table_properties:
(property_name = property_value, property_name = property_value, ...)