Hive系列第六章
基本上都是针对表的操作了。
DDL: Data Definition Language 数据定义语言
DML: Data Manipulation Language 数据操纵语言
DQL :Data Query Language 数据查询语言
语法结构:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
解释:
1、LOAD DATA:加载数据
2、LOCAL:从本地加载数据到 hive 表;不加local表示从 HDFS 移动(剪切)数据到 hive 表。注意:在分区表中,当使用的是hive3.x版本,那么如果不指定具体的分区,也会从hdfs上面复制一份到默认的分区__HIVE_DEFAULT_PARTITION__中间。这是3.x版本新特性。
3、INPATH: 加载数据的路径
4、'filepath' : 具体的待上传的数据的路径
5、OVERWRITE: 覆盖表中已有数据,否则表示追加
6、INTO TABLE: 加载到哪张表
7、tablename: 具体的表
8、PARTITION: 上传到指定分区
// 查看当前正在使用的库:
select current_database();
// 创建表
create table student(id int, name string, sex string, age int, department string) row format delimited fields terminated by ",";
// 查看表
show tables;
// 往表中加载数据:
load data local inpath "/home/data/student.txt" into table student;
// 查询数据
select id,name,sex,age,department from student;
1、插入一条数据:
INSERT INTO TABLE tablename VALUES(a,b,c)
示例:
insert into table student (id, name, sex, age, department) values (101,"张三2","M",222,"IT");
2、利用查询语句将结果导入新表:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
示例:
student2表需要先创建
create table student2(id int, name string, sex string, age int, department string) row format delimited fields terminated by ",";
hive (mydb)> insert into table student2 select id,sex,name,age,department from student where department = 'CS';
hive (mydb)> select * from student2;
OK
student2.id student2.name student2.sex student2.age student2.department
95013 男 李四 21 CS
95014 女 李五 19 CS
95012 女 李十 20 CS
95010 男 王一 19 CS
95006 男 王三 23 CS
95008 女 王五 18 CS
95001 男 王九 20 CS
Time taken: 0.097 seconds, Fetched: 7 row(s)
insert into 插入多次会多次在后面追加数据。
INSERT OVERWRITE 插入数据会覆盖原来的表里面的所有的数据,不仅仅是重复的那几行,是所有都覆盖。
3、多重插入
FROM from_statement
INSERT OVERWRITE TABLE table_name1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
INSERT OVERWRITE TABLE table_name2 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement2] ...
示例(上面2和3一起):
创建分区表:
create table student_ptn (id int, name string, sex string, age int) partitioned by (department string) row format delimited fields terminated by ",";
单重插入:
insert into table student_ptn partition (department = 'IS') select id,sex,name,age from student where department = 'IS';
insert into table student_ptn partition (department = 'CS') select id,sex,name,age from student where department = 'CS';
insert into table student_ptn partition (department = 'MA') select id,sex,name,age from student where department = 'MA';
多重插入:
from student
insert into table student_ptn partition (department = 'IS') select id,sex,name,age where department = 'IS'
insert into table student_ptn partition (department = 'CS') select id,sex,name,age where department = 'CS'
insert into table student_ptn partition (department = 'MA') select id,sex,name,age where department = 'MA'
语法结构:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement
示例:
insert overwrite local directory "/home/data/cs_student" select * from student where department = 'CS';
insert overwrite directory "/home/data/cs_student5" select * from student where department = 'CS';
[root@hadoop10 ~]# hive -e 'select * from mydb.student;' > /home/data/student2.txt;
which: no hbase in (.:.:.:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/software/jdk/bin:/software/zk/bin:/software/hadoop/bin:/software/hadoop/sbin:/software/hive/bin:/root/bin)
Hive Session ID = 36ee5fc2-cdfc-4057-86ee-f4dad8fbddbc
Logging initialized using configuration in jar:file:/software/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = fd035b16-f727-490f-beed-c4fb58122c0f
OK
Time taken: 2.273 seconds, Fetched: 23 row(s)
[root@hadoop10 ~]# cd /home/data/
[root@hadoop10 data]# cat student2.txt
hive (mydb)> dfs -get /user/hive/warehouse/mydb.db/student/student.txt /home/data/student3.txt;
hive (mydb)>
声明:
文章中代码及相关语句为自己根据相应理解编写,文章中出现的相关图片为自己实践中的截图和相关技术对应的图片,若有相关异议,请联系删除。感谢。转载请注明出处,感谢。
微博地址: http://weibo.com/luoyepiaoxue2014 点击打开链接