目录
数据准备:
分组实操练习:
Join连接实操练习:
多表的连接实操练习:
1,创建emp表和dept表,并向其中导入如下的数据:
create database db_select;
use db_select;
create table if not exists dept(
deptno int,
dname string,
loc int
)
row format delimited
fields terminated by '\t';
create table if not exists emp(
empno int,
ename string,
job string,
mgr int,
hiredata string,
sal double,
comm double,
deptno int
)
row format delimited
fields delimited by '\t';
0: jdbc:hive2://hadoop108:10000> show tables;
+-----------+--+
| tab_name |
+-----------+--+
| dept |
| emp |
| location |
+-----------+--+
[isea@hadoop108 datas]$ vim dept.txt
10 ACCOUNTING 1700
20 RESEARCH 1800
30 SALES 1900
40 OPERATIONS 1700
[isea@hadoop108 datas]$ vim emp.txt
7369 SMITH CLERK 7902 1980-12-17 800.00 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30
7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30
7566 JONES MANAGER 7839 1981-4-2 2975.00 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30
7782 CLARK MANAGER 7839 1981-6-9 2450.00 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20
7839 KING PRESIDENT 1981-11-17 5000.00 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30
7876 ADAMS CLERK 7788 1987-5-23 1100.00 20
7900 JAMES CLERK 7698 1981-12-3 950.00 30
7902 FORD ANALYST 7566 1981-12-3 3000.00 20
7934 MILLER CLERK 7782 1982-1-23 1300.00 10
[isea@hadoop108 datas]$ vim location.txt
1700 Beijing
1800 London
1900 Tokyo
load data local inpath '/opt/module/datas/dept.txt' into table dept;
load data local inpath '/opt/module/datas/emp.txt' into table emp;
load data local inpath '/opt/module/datas/location.txt' into table location;
1,计算每个部门的平均工资:
select deptno,avg(sal) from emp group by deptno;
+---------+---------------------+--+
| deptno | _c1 |
+---------+---------------------+--+
| 10 | 2916.6666666666665 |
| 20 | 2175.0 |
| 30 | 1566.6666666666667 |
+---------+---------------------+--+
2,计算emp每个部门的最高薪水:
select deptno,max(sal) from emp group by deptno;
+---------+---------+--+
| deptno | _c1 |
+---------+---------+--+
| 10 | 5000.0 |
| 20 | 3000.0 |
| 30 | 2850.0 |
+---------+---------+--+
3,求每个部门的平均薪水大于2000的部门:
select deptno,avg(sal) AVG from emp group by deptno having AVG > 2000;
+---------+---------------------+--+
| deptno | avg |
+---------+---------------------+--+
| 10 | 2916.6666666666665 |
| 20 | 2175.0 |
+---------+---------------------+--+
4,根据员工表和部门表中的部门编号相等,查询员工编号、员工名称和部门名称;
select empno,ename,dname from emp left join dept on emp.deptno = dept.deptno;
+--------+---------+-------------+--+
| empno | ename | dname |
+--------+---------+-------------+--+
| 7369 | SMITH | RESEARCH |
| 7499 | ALLEN | SALES |
| 7521 | WARD | SALES |
| 7566 | JONES | RESEARCH |
| 7654 | MARTIN | SALES |
| 7698 | BLAKE | SALES |
| 7782 | CLARK | ACCOUNTING |
| 7788 | SCOTT | RESEARCH |
| 7839 | KING | ACCOUNTING |
| 7844 | TURNER | SALES |
| 7876 | ADAMS | RESEARCH |
| 7900 | JAMES | SALES |
| 7902 | FORD | RESEARCH |
| 7934 | MILLER | ACCOUNTING |
+--------+---------+-------------+--+
合并员工表和部门表:这里使用到了内连接
select e.ename,e.empno,e.sal,e.deptno,d.dname from emp e join dept d on e.deptno = d.deptno;
+----------+----------+---------+-----------+-------------+--+
| e.ename | e.empno | e.sal | e.deptno | d.dname |
+----------+----------+---------+-----------+-------------+--+
| SMITH | 7369 | 800.0 | 20 | RESEARCH |
| ALLEN | 7499 | 1600.0 | 30 | SALES |
| WARD | 7521 | 1250.0 | 30 | SALES |
| JONES | 7566 | 2975.0 | 20 | RESEARCH |
| MARTIN | 7654 | 1250.0 | 30 | SALES |
| BLAKE | 7698 | 2850.0 | 30 | SALES |
| CLARK | 7782 | 2450.0 | 10 | ACCOUNTING |
| SCOTT | 7788 | 3000.0 | 20 | RESEARCH |
| KING | 7839 | 5000.0 | 10 | ACCOUNTING |
| TURNER | 7844 | 1500.0 | 30 | SALES |
| ADAMS | 7876 | 1100.0 | 20 | RESEARCH |
| JAMES | 7900 | 950.0 | 30 | SALES |
| FORD | 7902 | 3000.0 | 20 | RESEARCH |
| MILLER | 7934 | 1300.0 | 10 | ACCOUNTING |
+----------+----------+---------+-----------+-------------+--+
这里如果没一个员工刚来,暂时没有分配部门,将不会显示在查询结果中
左外连接:左表的所有记录都会被保留下来,没有匹配到的,以null显示:
select e.empno,e.ename,e.sal,d.dname from emp e left join dept d where e.deptno = d.deptno;
+----------+----------+---------+-------------+--+
| e.empno | e.ename | e.sal | d.dname |
+----------+----------+---------+-------------+--+
| 7369 | SMITH | 800.0 | RESEARCH |
| 7499 | ALLEN | 1600.0 | SALES |
| 7521 | WARD | 1250.0 | SALES |
| 7566 | JONES | 2975.0 | RESEARCH |
| 7654 | MARTIN | 1250.0 | SALES |
| 7698 | BLAKE | 2850.0 | SALES |
| 7782 | CLARK | 2450.0 | ACCOUNTING |
| 7788 | SCOTT | 3000.0 | RESEARCH |
| 7839 | KING | 5000.0 | ACCOUNTING |
| 7844 | TURNER | 1500.0 | SALES |
| 7876 | ADAMS | 1100.0 | RESEARCH |
| 7900 | JAMES | 950.0 | SALES |
| 7902 | FORD | 3000.0 | RESEARCH |
| 7934 | MILLER | 1300.0 | ACCOUNTING |
+----------+----------+---------+-------------+--+
这里如果没一个员工刚来,暂时没有分配部门,也会显示在查询结果中
右外连接:右表的所有记录都会被保留下来,没有匹配到的,以null显示:
select e.empno,e.ename,e.sal,d.dname from emp e right join dept d where e.deptno = d.deptno;
+----------+----------+---------+-------------+--+
| e.empno | e.ename | e.sal | d.dname |
+----------+----------+---------+-------------+--+
| 7782 | CLARK | 2450.0 | ACCOUNTING |
| 7839 | KING | 5000.0 | ACCOUNTING |
| 7934 | MILLER | 1300.0 | ACCOUNTING |
| 7369 | SMITH | 800.0 | RESEARCH |
| 7566 | JONES | 2975.0 | RESEARCH |
| 7788 | SCOTT | 3000.0 | RESEARCH |
| 7876 | ADAMS | 1100.0 | RESEARCH |
| 7902 | FORD | 3000.0 | RESEARCH |
| 7499 | ALLEN | 1600.0 | SALES |
| 7521 | WARD | 1250.0 | SALES |
| 7654 | MARTIN | 1250.0 | SALES |
| 7698 | BLAKE | 2850.0 | SALES |
| 7844 | TURNER | 1500.0 | SALES |
| 7900 | JAMES | 950.0 | SALES |
+----------+----------+---------+-------------+--+
这里如果有一个刚陈成立的部门,暂时没有员工被调到该部门,那么这个部门仍然会显示在这个部门中
满外连接:左右表中的,没有匹配到的,都会以null显示
select e.empno,e.ename,e.sal,d.dname from emp e full join dept d where e.deptno = d.deptno;
+----------+----------+---------+-------------+--+
| e.empno | e.ename | e.sal | d.dname |
+----------+----------+---------+-------------+--+
| 7934 | MILLER | 1300.0 | ACCOUNTING |
| 7839 | KING | 5000.0 | ACCOUNTING |
| 7782 | CLARK | 2450.0 | ACCOUNTING |
| 7876 | ADAMS | 1100.0 | RESEARCH |
| 7788 | SCOTT | 3000.0 | RESEARCH |
| 7369 | SMITH | 800.0 | RESEARCH |
| 7566 | JONES | 2975.0 | RESEARCH |
| 7902 | FORD | 3000.0 | RESEARCH |
| 7844 | TURNER | 1500.0 | SALES |
| 7499 | ALLEN | 1600.0 | SALES |
| 7698 | BLAKE | 2850.0 | SALES |
| 7654 | MARTIN | 1250.0 | SALES |
| 7521 | WARD | 1250.0 | SALES |
| 7900 | JAMES | 950.0 | SALES |
+----------+----------+---------+-------------+--+
没有部门的员工和没有员工的部门都会显示在这张表中
1,多表的连接查询:
select e.ename,d.dname,l.loc_name
from emp e left join dept d on e.deptno = d.deptno
left join location l on
d.loc = l.loc;
+----------+-------------+-------------+--+
| e.ename | d.dname | l.loc_name |
+----------+-------------+-------------+--+
| SMITH | RESEARCH | London |
| ALLEN | SALES | Tokyo |
| WARD | SALES | Tokyo |
| JONES | RESEARCH | London |
| MARTIN | SALES | Tokyo |
| BLAKE | SALES | Tokyo |
| CLARK | ACCOUNTING | Beijing |
| SCOTT | RESEARCH | London |
| KING | ACCOUNTING | Beijing |
| TURNER | SALES | Tokyo |
| ADAMS | RESEARCH | London |
| JAMES | SALES | Tokyo |
| FORD | RESEARCH | London |
| MILLER | ACCOUNTING | Beijing |
+----------+-------------+-------------+--+
大多数情况下,Hive会对每对JOIN连接对象启动一个MapReduce任务。本例中会首先启动一个MapReduce job对
表e和表d进行连接操作,然后会再启动一个MapReduce job将第一个MapReduce job的输出和表l;进行连接操作