HiveQL syntax is largely the same as MySQL's.
CREATE DATABASE [IF NOT EXISTS] database_name [COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
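Clauses:
[IF NOT EXISTS]: create the database only if it does not already exist, instead of failing with an error
[COMMENT]: comment on the database
[LOCATION]: HDFS path where the database is stored; defaults to ${hive.metastore.warehouse.dir}/database_name.db
[WITH DBPROPERTIES]: custom key-value properties for the database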
# Create a database
create database db_hive;
# Create a database with a comment
create database db_hive comment 'hive database for testing';
# Create a database at a specified HDFS path
create database db_hive location '/db_hive';
# Create a database with key-value properties
create database db_hive with dbproperties('create_user' = 'tom', 'create_date' = '2023-12-05');
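SHOW DATABASES [LIKE 'identifier_with_wildcards'];
Clauses:
[LIKE]: filter database names by a pattern; * matches any sequence of characters and | separates alternatives
# List all databases
show databases;
# List databases whose names start with db
show databases like 'db*';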
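The Python model below makes the LIKE pattern rules concrete. It is a sketch, not Hive's implementation. It assumes that `*` matches any run of characters, that `|` separates alternatives, that every other character is literal, and that matching is case-insensitive over the whole name. The function name `hive_like_matches` is hypothetical.

```python
import re


def hive_like_matches(pattern: str, name: str) -> bool:
    """Return True if `name` matches a SHOW ... LIKE pattern (sketch, not Hive's code).

    Assumed rules: '*' matches any run of characters, '|' separates alternatives,
    every other character is literal, and the whole name must match, ignoring case.
    """
    for alternative in pattern.split("|"):
        # Escape the literal pieces and join them with '.*' wherever a '*' appeared
        regex = ".*".join(re.escape(piece) for piece in alternative.split("*"))
        if re.fullmatch(regex, name, flags=re.IGNORECASE):
            return True
    return False


databases = ["default", "db_hive", "db_test", "sales"]
print([d for d in databases if hive_like_matches("db*", d)])        # ['db_hive', 'db_test']
print([d for d in databases if hive_like_matches("db*|sales", d)])  # ['db_hive', 'db_test', 'sales']
```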
DESCRIBE DATABASE [EXTENDED] db_name;
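Clauses:
[EXTENDED]: also show more detailed information, such as the database properties
# Describe a database
desc database db_hive;
# Describe a database in more detail
desc database extended db_hive;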
# Modify dbproperties
ALTER DATABASE db_name SET DBPROPERTIES (property_name=property_value, ...);
# Modify the location (only affects where new tables are created; existing data is not moved)
ALTER DATABASE db_name SET LOCATION hdfs_path;
# Change the owner user
ALTER DATABASE db_name SET OWNER USER user_name;
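DROP DATABASE [IF EXISTS] db_name [RESTRICT|CASCADE];
Clauses:
[IF EXISTS]: do not raise an error if the database does not exist
[RESTRICT]: strict mode; if the database is not empty, the drop fails. This is the default.
[CASCADE]: cascade mode; if the database is not empty, its tables are dropped along with it
# Drop an empty database
drop database db_hive;
# Drop a non-empty database
drop database db_hive cascade;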
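The difference between RESTRICT and CASCADE can be modeled in a few lines of Python. The `catalog` dictionary and the `drop_database` function are hypothetical stand-ins for the metastore, and the error messages are not Hive's.

```python
class DatabaseNotEmptyError(Exception):
    """Raised when RESTRICT mode refuses to drop a non-empty database."""


def drop_database(catalog: dict[str, set[str]], db_name: str,
                  cascade: bool = False, if_exists: bool = False) -> None:
    """Toy model of DROP DATABASE [IF EXISTS] db_name [RESTRICT|CASCADE].

    `catalog` is a hypothetical stand-in for the metastore: database name -> table names.
    """
    if db_name not in catalog:
        if if_exists:
            return  # IF EXISTS: silently do nothing
        raise KeyError(f"database {db_name!r} does not exist")
    if catalog[db_name] and not cascade:
        # RESTRICT (the default) refuses to drop a database that still has tables
        raise DatabaseNotEmptyError(f"database {db_name!r} is not empty")
    # CASCADE, or an empty database: the tables go away together with the database
    del catalog[db_name]


catalog = {"db_hive": {"stu"}, "db_empty": set()}
drop_database(catalog, "db_empty")               # empty: RESTRICT succeeds
try:
    drop_database(catalog, "db_hive")            # non-empty: RESTRICT fails
except DatabaseNotEmptyError as err:
    print(err)                                   # database 'db_hive' is not empty
drop_database(catalog, "db_hive", cascade=True)  # CASCADE drops it with its tables
print(catalog)                                   # {}
```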
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
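Clauses:
TEMPORARY: temporary table, visible only in the current session and dropped automatically when the session ends
EXTERNAL: external table; Hive manages only its metadata, so dropping it leaves the data on HDFS. Without EXTERNAL, a managed (internal) table is created, and dropping it deletes both the metadata and the data.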
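Primitive data types:
Integer: tinyint, smallint, int, bigint
Floating point: float, double; exact fixed-precision decimal: decimal
String: varchar (a maximum length in [1, 65535] must be specified), string (no maximum length needed)
Boolean: boolean
Timestamp: timestamp
Binary: binary
Complex data types:
array: array type, e.g. array<string>
map: key-value type, e.g. map<string,int>
struct: object type with named fields, e.g. struct<street:string,city:string>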
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
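In a traditional relational database, a partition roughly plays the role of an index on the partition column, but Hive organizes partitions differently. Each partition of a Hive partitioned table is a subdirectory of the table's directory, named col_name=value, and all of that partition's data is stored in that subdirectory. A query that filters on the partition column therefore only needs to read the matching directories.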
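The Python sketch below shows how partition directories are laid out. It assumes the common default warehouse path `/user/hive/warehouse` and a hypothetical table `log` partitioned by `dt` (and, in the second call, also by `hour`). Real Hive additionally escapes special characters in partition values, which this sketch skips.

```python
from pathlib import PurePosixPath


def partition_dir(table_dir: str, partition_spec: list[tuple[str, str]]) -> str:
    """Directory of one partition: one 'col=value' level per partition column, in PARTITIONED BY order.

    Sketch only: real Hive also escapes special characters (such as '/') in partition values.
    """
    path = PurePosixPath(table_dir)
    for col, value in partition_spec:
        path = path / f"{col}={value}"
    return str(path)


table_dir = "/user/hive/warehouse/db_hive.db/log"  # hypothetical table 'log'
print(partition_dir(table_dir, [("dt", "2023-12-05")]))
# /user/hive/warehouse/db_hive.db/log/dt=2023-12-05
print(partition_dir(table_dir, [("dt", "2023-12-05"), ("hour", "10")]))
# /user/hive/warehouse/db_hive.db/log/dt=2023-12-05/hour=10
```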
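CLUSTERED BY ... SORTED BY ... INTO ... BUCKETS
Hive hashes the CLUSTERED BY column(s) and takes the hash value modulo num_buckets to choose a bucket. Rows that land in the same bucket are written to that bucket's file, and SORTED BY orders the rows within each bucket.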
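The Python sketch below models bucket assignment for an INT column. It assumes Hive's original scheme (bucketing_version 1): an int hashes to its own value, the sign bit is masked off, and the result is taken modulo the bucket count. Hive 3.x defaults to a Murmur3-based hash, so actual bucket numbers there will differ. Sorting inside each bucket stands in for SORTED BY (col ASC).

```python
def bucket_id(value: int, num_buckets: int) -> int:
    """Bucket for an INT value, assuming the original Hive scheme:
    Java int hash (the value itself), sign bit masked off, modulo the bucket count."""
    return (value & 0x7FFFFFFF) % num_buckets


def assign_buckets(values: list[int], num_buckets: int) -> dict[int, list[int]]:
    """Group values by bucket; sorting within each bucket mimics SORTED BY (col ASC)."""
    buckets: dict[int, list[int]] = {}
    for v in values:
        buckets.setdefault(bucket_id(v, num_buckets), []).append(v)
    return {b: sorted(buckets[b]) for b in sorted(buckets)}


print(assign_buckets([1006, 1001, 1002, 1003, 1004, 1005], 4))
# {0: [1004], 1: [1001, 1005], 2: [1002, 1006], 3: [1003]}
```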
ROW FORMAT DELIMITED
[FIELDS TERMINATED BY char]
[COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char]
[LINES TERMINATED BY char]
[NULL DEFINED AS char]
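FIELDS TERMINATED BY: column (field) delimiter
COLLECTION ITEMS TERMINATED BY: delimiter between the elements of a map, struct, or array
MAP KEYS TERMINATED BY: delimiter between a key and its value in a map
LINES TERMINATED BY: line delimiter (currently only '\n' is supported)
NULL DEFINED AS: placeholder text stored for NULL values; the default is \N
If these clauses are omitted, the defaults are \001 (^A) for fields, \002 (^B) for collection items, \003 (^C) for map keys, and \n for lines.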
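The Python sketch below shows how the delimiters combine for complex types. It parses one text line of a hypothetical table `(name string, friends array<string>, children map<string,int>, address struct<street:string,city:string>)` declared with FIELDS TERMINATED BY ',', COLLECTION ITEMS TERMINATED BY '_' and MAP KEYS TERMINATED BY ':'. It imitates Hive's reading logic only loosely and assumes each line has exactly four fields.

```python
NULL_MARKER = "\\N"  # default NULL text in delimited files: a backslash followed by N


def parse_row(line: str, field_sep: str = ",", item_sep: str = "_", key_sep: str = ":") -> dict:
    """Parse one line of a hypothetical table
    (name string, friends array<string>, children map<string,int>, address struct<street:string,city:string>).
    Assumes the line has exactly four fields."""
    name, friends, children, address = (
        None if f == NULL_MARKER else f for f in line.split(field_sep)
    )
    return {
        "name": name,
        # array: elements separated by the COLLECTION ITEMS delimiter
        "friends": None if friends is None else friends.split(item_sep),
        # map: entries separated by COLLECTION ITEMS, key and value separated by MAP KEYS
        "children": None if children is None else {
            k: int(v) for k, v in (entry.split(key_sep, 1) for entry in children.split(item_sep))
        },
        # struct: field values separated by COLLECTION ITEMS, matched to field names by position
        "address": None if address is None else dict(zip(["street", "city"], address.split(item_sep))),
    }


row = parse_row("songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing")
print(row["friends"])   # ['bingbing', 'lili']
print(row["children"])  # {'xiao song': 18, 'xiaoxiao song': 19}
print(row["address"])   # {'street': 'hui long guan', 'city': 'beijing'}
```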
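ROW FORMAT SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
SERDE: instead of delimiters, use a SerDe (serializer/deserializer) class to read and write rows, e.g. org.apache.hive.hcatalog.data.JsonSerDe for JSON data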
[STORED AS file_format]
Common file formats include TEXTFILE (the default), SEQUENCEFILE, ORC, and PARQUET.
[LOCATION hdfs_path]
Specifies the HDFS path of the table. If no path is given, it defaults to:
${hive.metastore.warehouse.dir}/db_name.db/table_name
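The Python sketch below computes the default paths. It assumes `hive.metastore.warehouse.dir` is `/user/hive/warehouse` (a common default) and ignores newer settings that place external tables elsewhere. The function names are hypothetical. The built-in `default` database uses the warehouse directory itself, so its tables sit directly under it. A database created with LOCATION passes that path down to its tables.

```python
from typing import Optional

WAREHOUSE_DIR = "/user/hive/warehouse"  # assumed value of hive.metastore.warehouse.dir


def default_db_location(db_name: str, warehouse_dir: str = WAREHOUSE_DIR) -> str:
    """Default directory of a database created without LOCATION."""
    db_name = db_name.lower()  # Hive stores database names in lowercase
    if db_name == "default":
        return warehouse_dir  # the built-in 'default' database is the warehouse directory itself
    return f"{warehouse_dir}/{db_name}.db"


def default_table_location(db_name: str, table_name: str, db_location: Optional[str] = None,
                           warehouse_dir: str = WAREHOUSE_DIR) -> str:
    """Default directory of a table created without LOCATION: a subdirectory of its database's directory."""
    parent = db_location or default_db_location(db_name, warehouse_dir)
    return f"{parent.rstrip('/')}/{table_name.lower()}"


print(default_table_location("db_hive", "stu"))                         # /user/hive/warehouse/db_hive.db/stu
print(default_table_location("default", "stu"))                         # /user/hive/warehouse/stu
print(default_table_location("db_hive", "stu", db_location="/db_hive"))  # /db_hive/stu
```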
[TBLPROPERTIES (property_name=property_value, ...)]
Sets custom key-value parameters for the table
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] table_name
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
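[AS select_statement]
CTAS (Create Table As Select): creates a table and fills it with the result of select_statement; the column definitions come from the query.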
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[LIKE exist_table_name]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
CREATE TABLE ... LIKE: copies the structure of an existing table, without its data.
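# List the tables in the current database
show tables;
# Show a table's columns
desc table_name;
# Show detailed table information
desc extended table_name;
# Show detailed table information in a formatted, readable layout
desc formatted table_name;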
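ALTER TABLE table_name RENAME TO new_table_name;
For a managed table stored in its default location, renaming also moves its HDFS directory; for an external table, only the metadata changes.
# Add columns
ALTER TABLE table_name ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
# Change a column's name, type, comment, or position
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name];
Adding or changing columns only modifies the metadata; the data files on HDFS are not rewritten.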
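ADD COLUMNS and CHANGE only edit the metadata, so existing data files are simply re-read with the new column list. The Python sketch below models that for a TEXTFILE table. It assumes that fields are matched to columns by position, that missing trailing fields read as NULL, and that extra fields are ignored. `read_text_rows` and the sample data are hypothetical.

```python
def read_text_rows(lines: list[str], columns: list[str], field_sep: str = ",") -> list[dict]:
    """Apply the table's current column list (metadata) to unchanged delimited text.

    Assumed behavior of a TEXTFILE table: fields are matched to columns by position,
    missing trailing fields read as NULL (None), and extra fields are ignored.
    """
    rows = []
    for line in lines:
        fields = line.split(field_sep)
        rows.append({col: fields[i] if i < len(fields) else None for i, col in enumerate(columns)})
    return rows


data = ["1,tom", "2,jerry"]  # hypothetical file contents; ALTER TABLE never rewrites them
print(read_text_rows(data, ["id", "name"]))
# [{'id': '1', 'name': 'tom'}, {'id': '2', 'name': 'jerry'}]
print(read_text_rows(data, ["id", "name", "age"]))  # after ADD COLUMNS (age int)
# [{'id': '1', 'name': 'tom', 'age': None}, {'id': '2', 'name': 'jerry', 'age': None}]
```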
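# Drop a table
DROP TABLE [IF EXISTS] table_name;
# Empty a table: delete all rows but keep the table definition (managed tables only)
TRUNCATE [TABLE] table_name;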
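The toy Python model below summarizes how DROP TABLE and TRUNCATE affect managed and external tables differently. `Table` and `ToyMetastore` are hypothetical classes, not a Hive API, and a list of file names stands in for the table's HDFS directory.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    external: bool
    files: list[str] = field(default_factory=list)  # stands in for the files in the table's HDFS directory


class ToyMetastore:
    """Hypothetical model of DROP TABLE / TRUNCATE TABLE semantics, not a Hive API."""

    def __init__(self) -> None:
        self.tables: dict[str, Table] = {}
        self.hdfs_leftovers: dict[str, list[str]] = {}  # data still on HDFS after a drop

    def drop(self, name: str, if_exists: bool = False) -> None:
        if name not in self.tables:
            if if_exists:
                return
            raise KeyError(f"table {name!r} does not exist")
        table = self.tables.pop(name)  # the metadata is always removed
        if table.external:
            self.hdfs_leftovers[name] = table.files  # external: the data stays on HDFS
        # managed: the data is deleted together with the metadata

    def truncate(self, name: str) -> None:
        table = self.tables[name]
        if table.external:
            raise ValueError("TRUNCATE is only allowed on managed tables")
        table.files.clear()  # all rows deleted, table definition kept


ms = ToyMetastore()
ms.tables["stu"] = Table(external=False, files=["000000_0"])
ms.tables["log"] = Table(external=True, files=["log.txt"])
ms.truncate("stu")
print(ms.tables["stu"].files)  # []
ms.drop("log")
print(ms.hdfs_leftovers)       # {'log': ['log.txt']}
```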
尚硅谷 (Atguigu) Hive video tutorial