Three ways to run Hive:
1. The Hive CLI
2. Run a statement from the terminal: hive -e '<HiveQL statement>'
3. Run a script from the terminal: hive -f <script file>
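As a sketch of option 3, a script file might look like this (the file name and queries are hypothetical examples, not from the original notes):

```sql
-- example.hql  (run from the terminal with: hive -f example.hql)
show databases;
select * from customer limit 5;
```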
To connect to Hive over JDBC, the hiveserver2 service must be started in the terminal first:
nohup hive --service hiveserver2 &
netstat -ntpl | grep 10000  # check that the service is listening (10000 is hiveserver2's default port)
1. Create an internal (managed) table:
create table customer(
customerID int,
firstName string,
lastName string,
birthday timestamp)
row format delimited fields terminated by ',';
Import data from the local Linux filesystem into the table:
load data local inpath '/opt/custorm' into table customer;
2. Create an external table:
create external table salaries(
gender string,
age int,
salary double,
zip int)
row format delimited fields terminated by ',';
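For contrast, a sketch of what makes an external table different (the table name and HDFS path here are hypothetical examples): an external table can point at an existing HDFS directory via LOCATION, and dropping it removes only the metadata, not the data files.

```sql
create external table salaries_ext(
  gender string,
  age int,
  salary double,
  zip int)
row format delimited fields terminated by ','
location '/user/root/salaries';

drop table salaries_ext;  -- the files under /user/root/salaries are kept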
3. Create a static partition table:
create table employees(
id int,
name string)
partitioned by (dept string)
row format delimited fields terminated by ',';
Load data from HDFS into a static partition (the target partition must be specified explicitly; the dept value below is just an example):
load data inpath '/user/root/employees.txt' overwrite into table employees partition (dept='dev');
4. Create a dynamic partition table:
Parameter settings (entered directly in the CLI):
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
The DDL for a dynamic partition table is the same as for a static one; only the two settings above are extra.
create table student_dynamic(
id int,
name string,
classes string)
partitioned by (class string)
row format delimited fields terminated by '\t';
Import data into the dynamic partition table:
insert overwrite table student_dynamic partition (class) select *, classes from student;
This statement automatically creates a partition for each distinct value of classes in the source data.
5. Create a table from a query (CTAS):
create table stucopy as select id, name, score from student;
Ways to import data into a Hive table:
1. Import from the local filesystem
2. Import from HDFS (the file is moved into the table's directory, not copied)
3. Insert the result of a query on another table
4. Multi-table insert
from emp
insert into table empcopy1
  select id, name limit 5
insert into table empcopy2
  select name, salary where salary > 10000.0;
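Methods 2 and 3 above can be sketched as follows (the file path and the target table customer_copy are hypothetical examples; the statements assume the customer table defined earlier):

```sql
-- 2. Import from HDFS: note the file is moved into the table directory, not copied
load data inpath '/user/root/customer.txt' into table customer;

-- 3. Insert the result of a query into an existing table
insert into table customer_copy select * from customer;
```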
Export data to the local filesystem:
insert overwrite local directory '/opt/customer_exp'
row format delimited fields terminated by ','
select * from employees limit 5;
Export data to HDFS:
insert overwrite directory '/user/root/customer_exp'
row format delimited fields terminated by ','
select * from employees limit 5;
Hive does not allow dropping a database that still contains tables; it must be dropped with cascade:
drop database if exists test cascade;