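I. Loading data into a table (Load)

1. Syntax

hive> load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];

(1) load data: loads data into a table

(2) local: loads the data from the local file system into the Hive table; without it, the data is loaded from HDFS

(3) inpath: the path of the data to load

(4) overwrite: overwrites the data already in the table; without it, the data is appended

(5) into table: names the table to load into

(6) student: the specific table

(7) partition: loads the data into the specified partition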
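
The partition option is the only one not exercised in the examples that follow. A minimal sketch, assuming a hypothetical partitioned table student_par (not part of the original walkthrough) and the same sample file, might look like this:

```sql
-- student_par is a hypothetical table used only for this illustration.
create table if not exists student_par(id string, name string)
partitioned by (month string)
row format delimited fields terminated by '\t';

-- Load the local file into the month='201709' partition,
-- replacing whatever that partition already holds.
load data local inpath '/opt/module/datas/student.txt'
overwrite into table student_par partition (month='201709');

-- The partition column shows up as an extra column in the result.
select * from student_par where month='201709';
```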

2. Hands-on examples

(0) Create a table

hive (default)> create table student(id string, name string) row format delimited fields terminated by '\t';

(1) Load a local file into Hive

hive (default)> load data local inpath '/opt/module/datas/student.txt' into table default.student;

(2) Load an HDFS file into Hive

Upload the file to HDFS

hive (default)> dfs -put /opt/module/datas/student.txt /user/atguigu/hive;

Load the data from HDFS

hive (default)> load data inpath '/user/atguigu/hive/student.txt' into table default.student;

(3) Load data and overwrite the data already in the table

Upload the file to HDFS again (the previous load from HDFS moved the file into the table's directory)

hive (default)> dfs -put /opt/module/datas/student.txt /user/atguigu/hive;

Load the data, overwriting the data already in the table

hive (default)> load data inpath '/user/atguigu/hive/student.txt' overwrite into table default.student;
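
A load from HDFS moves the source file into the table's directory, whereas a local load copies it. One way to observe this, sketched under the assumption that the paths and table above are the ones in use:

```sql
-- After the non-local load, student.txt should no longer be listed here.
dfs -ls /user/atguigu/hive;

-- The rows are now served from the table itself.
select count(*) from default.student;
```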

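Inserting data into a table via a query (Insert)

1. Create a partitioned table (if the student table from the Load examples still exists, drop it first, otherwise this statement fails because the table already exists)

hive (default)> create table student(id int, name string) partitioned by (month string) row format delimited fields terminated by '\t';

2. Basic insert

hive (default)> insert into table student partition(month='201709') values(1,'wangwu');

3. Basic-mode insert (from the result of a single-table query)

hive (default)> insert overwrite table student partition(month='201708')
 select id, name from student where month='201709';

4. Multi-insert mode (nominally from the results of multi-table queries; caveat: this example only queries a single table and writes the result to different partitions)

hive (default)> from student
 insert overwrite table student partition(month='201707')
 select id, name where month='201709'
 insert overwrite table student partition(month='201706')
 select id, name where month='201709';

A genuine multi-table query inserted into another table
Union all the rows of tables s3 and s4 into a derived table aliased new_table, then insert them into s5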

from (select * from s3 union all select * from s4)  new_table
insert into table s5
select * ;

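Creating a table and loading data in a query (As Select)

Create a table from a query result (the rows returned by the query are written into the newly created table)

create table if not exists student3
as select id, name from student;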
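
A table created this way takes its column names and types from the select list. If the storage format matters, it can be stated in the same statement. A minimal sketch, where student3_tsv is a hypothetical table name chosen for illustration:

```sql
-- student3_tsv is a hypothetical name; the format clauses come before "as select".
create table if not exists student3_tsv
row format delimited fields terminated by '\t'
stored as textfile
as select id, name from student;

-- Inspect the schema and storage properties of the new table.
desc formatted student3_tsv;
```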

Specifying the data path with Location when creating a table

1. Create the table and specify its location on HDFS

hive (default)> create table if not exists student5(
 id int, name string
 )
 row format delimited fields terminated by '\t'
 location '/user/hive/warehouse/student5';

2. Upload the data to HDFS

hive (default)> dfs -put /opt/module/datas/student.txt /user/hive/warehouse/student5;

3. Query the data

hive (default)> select * from student5;

Importing data into a specified Hive table (Import)

Note: the data must first be exported with export before it can be imported.

hive (default)> import table student2 partition(month='201709') from
 '/user/hive/warehouse/export/student';
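
The export and import steps belong together, so the round trip can be sketched end to end. This is an illustration under assumptions: the export directory is empty or does not exist yet, student is the partitioned table created earlier, and student_copy is a hypothetical target table that does not exist before the import.

```sql
-- Step 1: write the table's data and metadata to an HDFS directory.
export table default.student to '/user/hive/warehouse/export/student';

-- Step 2: import from that directory into a new table.
-- student_copy is hypothetical; import creates it from the exported metadata.
import table student_copy from '/user/hive/warehouse/export/student';

-- Check which partitions arrived.
show partitions student_copy;
```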

II. Exporting data

Insert export

1. Export the query result to the local file system

hive (default)> insert overwrite local directory '/opt/module/datas/export/student'
 select * from student;

2. Export the query result to the local file system with formatting

hive (default)> insert overwrite local directory '/opt/module/datas/export/student1'
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from student;

3. Export the query result to HDFS (without local)

hive (default)> insert overwrite directory '/user/atguigu/student2'
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
 select * from student;

Exporting to the local file system with a Hadoop command

hive (default)> dfs -get /user/hive/warehouse/student/month=201709/000000_0 /opt/module/datas/export/student3.txt;

Exporting with the Hive shell

Basic syntax: (hive -f/-e statement or script > file)

[atguigu@hadoop102 hive]$ bin/hive -e 'select * from default.student;' > /opt/module/datas/export/student4.txt;

Export to HDFS

hive (default)> export table default.student to
 '/user/hive/warehouse/export/student';

Sqoop export

To be covered later.

Clearing the data in a table (Truncate)

Note: Truncate can only delete the data of managed tables; it cannot delete the data in external tables

hive (default)> truncate table student;
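
Truncate also accepts a partition spec, which clears one partition instead of the whole table. A short sketch, assuming student is the partitioned table created in the Insert section:

```sql
-- Remove only the rows of one partition; other partitions are left untouched.
truncate table student partition (month='201709');

-- This should now return 0 for that partition.
select count(*) from student where month='201709';
```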

III. Query statement syntax:

[WITH CommonTableExpression (, CommonTableExpression)*]    (Note: Only available starting with Hive 0.13.0)
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
  FROM table_reference
  [WHERE where_condition]
  [GROUP BY col_list]
  [ORDER BY col_list]
  [CLUSTER BY col_list
    | [DISTRIBUTE BY col_list] [SORT BY col_list]
  ]
 [LIMIT number]
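
To make the grammar concrete, here is one query that uses several of these clauses at once. It is only a sketch: high_paid and cnt are names introduced for illustration, and emp is assumed to be the employee table used in the examples that follow.

```sql
-- high_paid is a hypothetical CTE name (WITH requires Hive 0.13.0 or later).
with high_paid as (
  select empno, ename, deptno, sal
  from emp
  where sal > 1200
)
select deptno, count(*) as cnt
from high_paid
group by deptno
order by cnt desc
limit 2;
```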

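Basic queries (Select…From)

Full-table and specific-column queries

1. Full-table query

hive (default)> select * from emp;

2. Query specific columns

hive (default)> select empno, ename from emp;

Notes:

(1) SQL is case-insensitive.

(2) SQL can be written on one line or across several lines.

(3) Keywords cannot be abbreviated or split across lines.

(4) Each clause is generally written on its own line.

(5) Use indentation to make statements easier to read.

Column aliases

1. An alias renames a column.

2. It makes calculations easier.

3. The alias follows the column name directly; the keyword 'AS' may also be placed between the column name and the alias.

4. Hands-on example

Query employee names and departments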

hive (default)> select ename AS name, deptno dn from emp;

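Differences between having and where

select deptno,sum(sal) from emp where sal>1200 group by deptno having sum(sal)>8500 order by deptno;

#The WHERE clause comes before the GROUP BY clause; SQL evaluates WHERE before grouping.
#The HAVING clause comes after the GROUP BY clause; SQL evaluates HAVING after grouping.

(1) where acts on the columns of the table and selects the rows to query; having acts on the columns of the query result and filters the grouped data.
(2) Aggregate functions cannot be used after where, but they can be used after having.
(3) having is only used in group by aggregation statements.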
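
The order of evaluation becomes easier to see when the same query is rewritten without having. In this sketch the aggregation happens in a subquery and the group-level filter becomes an ordinary where on its result; total_sal and t are aliases introduced for illustration.

```sql
select deptno, total_sal
from (
  select deptno, sum(sal) as total_sal
  from emp
  where sal > 1200        -- row-level filter, applied before grouping
  group by deptno
) t
where total_sal > 8500    -- group-level filter, plays the role of having
order by deptno;
```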

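IV. Other commonly used query functions

Assigning a value to NULL fields

Function description
NVL: assigns a value to data that is NULL. Its format is NVL(string1, replace_with). If string1 is NULL, NVL returns the value of replace_with; otherwise it returns the value of string1. If both arguments are NULL, it returns NULL.
Data preparation: use the employee table
Query: if an employee's comm is NULL, use -1 instead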

hive (default)> select nvl(comm,-1) from emp;

OK
_c0
20.0
300.0
500.0
-1.0
1400.0
-1.0
-1.0
-1.0
-1.0
0.0
-1.0
-1.0
-1.0
-1.0

Query: if an employee's comm is NULL, use the manager's id (mgr) instead

hive (default)> select nvl(comm,mgr) from emp;
OK
_c0
20.0
300.0
500.0
7839.0
1400.0
7839.0
7839.0
7566.0
NULL
0.0
7788.0
7698.0
7566.0
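
The NULL in this output is the case described above where both arguments are NULL. One way to supply a final fallback is Hive's built-in coalesce, which returns its first non-NULL argument. A minimal sketch; the -1 and the alias comm_or_mgr are arbitrary choices made for illustration:

```sql
-- Falls back from comm to mgr, and to -1 when both are NULL.
select ename, coalesce(comm, mgr, -1) as comm_or_mgr from emp;
```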

CASE WHEN

Data preparation
name dept_id sex
悟空 A 男
大海 A 男
宋宋 B 男
凤姐 A 女
婷姐 B 女
婷婷 B 女

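Requirement
Count how many men and women there are in each department (in the data, 男 is male and 女 is female). The result is as follows: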

A     2       1
B     1       2

Create a local file emp_sex.txt and enter the data

[atguigu@hadoop102 datas]$ vi emp_sex.txt
悟空  A   男
大海  A   男
宋宋  B   男
凤姐  A   女
婷姐  B   女
婷婷  B   女

Create the Hive table and load the data

create table emp_sex(
name string, 
dept_id string, 
sex string) 
row format delimited fields terminated by "\t";
load data local inpath '/opt/module/datas/emp_sex.txt' into table emp_sex;

Query the data according to the requirement

select 
  dept_id,
  sum(case sex when '男' then 1 else 0 end) male_count,
  sum(case sex when '女' then 1 else 0 end) female_count
from 
  emp_sex
group by
  dept_id;
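
For a two-way condition like this one, Hive's if(condition, value_if_true, value_if_false) function is a more compact alternative to case when. The following sketch should produce the same counts on the emp_sex table above:

```sql
select
  dept_id,
  sum(if(sex = '男', 1, 0)) as male_count,    -- 1 for each male row, 0 otherwise
  sum(if(sex = '女', 1, 0)) as female_count   -- 1 for each female row, 0 otherwise
from emp_sex
group by dept_id;
```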

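V. Sorting

Global sort (Order By)

Order By: global sort, performed by a single Reducer

1. Sorting with the ORDER BY clause

ASC (ascend): ascending (default)

DESC (descend): descending

2. The ORDER BY clause goes at the end of the SELECT statement

3. Hands-on examples

(1) Query employee information sorted by salary in ascending order

hive (default)> select * from emp order by sal;

(2) Query employee information sorted by salary in descending order

hive (default)> select * from emp order by sal desc;

Sorting by alias

Sort by twice the employee's salary

hive (default)> select ename, sal*2 twosal from emp order by twosal;

Sorting by multiple columns

Sort by department and salary in ascending order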

hive (default)> select ename, deptno, sal from emp order by deptno, sal ;
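
Each sort column can carry its own direction. As a small illustration on the same emp table:

```sql
-- Departments ascending; within each department, highest salary first.
select ename, deptno, sal from emp order by deptno asc, sal desc;
```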

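Sorting within each reducer (Sort By)

Sort By: sorts within each Reducer; the overall result set is not globally sorted.

1. Set the number of reducers

hive (default)> set mapreduce.job.reduces=3;

2. Check the configured number of reducers

hive (default)> set mapreduce.job.reduces;

3. View employee information sorted by department number in descending order

hive (default)> select * from emp sort by deptno desc;

4. Write the query result to files (sorted by department number in descending order)

hive (default)> insert overwrite local directory '/opt/module/datas/sortby-result'
 select * from emp sort by deptno desc;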

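Partitioned sorting (Distribute By)

Distribute By: similar to the partitioner in MR; it partitions the rows and is used together with sort by.

Note: Hive requires the DISTRIBUTE BY clause to be written before the SORT BY clause.

When testing distribute by, be sure to allocate multiple reducers; otherwise its effect cannot be seen.

Hands-on example:

  (1) Partition by department number first, then sort by employee number in descending order.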

hive (default)> set mapreduce.job.reduces=3;

hive (default)> insert overwrite local directory '/opt/module/datas/distribute-result' select * from emp distribute by deptno sort by empno desc;

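Cluster By

When the distribute by and sort by columns are the same, cluster by can be used instead.

Besides the function of distribute by, cluster by also provides the function of sort by. However, the sort can only be ascending; ASC or DESC cannot be specified.

1) The following two statements are equivalent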

hive (default)> select * from emp cluster by deptno;

hive (default)> select * from emp distribute by deptno sort by deptno;
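
Because cluster by cannot take a direction, a descending sort within each partition has to be written with the two separate clauses. A minimal sketch on the same table:

```sql
-- Same distribution as "cluster by deptno", but sorted descending within each reducer.
select * from emp distribute by deptno sort by deptno desc;
```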

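Note: partitioning by department number does not mean each partition holds exactly one fixed value; departments 20 and 30, for example, may be assigned to the same partition.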
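
This is easiest to observe when there are fewer reducers than departments. The sketch below assumes emp contains more than two distinct deptno values; the output directory cluster-result is a hypothetical path chosen for illustration.

```sql
-- With only two reducers, at least two departments must share one reducer.
set mapreduce.job.reduces=2;

-- cluster-result is a hypothetical output directory.
insert overwrite local directory '/opt/module/datas/cluster-result'
select * from emp cluster by deptno;
```

Each output file is sorted by deptno, but a single file may contain rows from more than one department.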

Hive tutorial (2) -- data import/export, queries, and sorting