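1) Configure header information for queries in hive-site.xml (print column headers in results and show the current database in the CLI prompt)
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>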
2) Basic queries
-》Full-table query
select * from empt;
-》Select specific columns
select empt.empno,empt.ename from empt;
-》Column aliases
select ename name,empno from empt;
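-》Arithmetic operators
Operator Description
+ Addition
- Subtraction
* Multiplication
/ Division
% Remainder (modulo)
& Bitwise AND
| Bitwise OR
^ Bitwise XOR
~ Bitwise NOT (integer types only: bigint, int, smallint, tinyint)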
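As a rough illustration of the remainder and bitwise rows in the table above, the sketch below evaluates the same symbols in Java. It assumes plain int operands, for which Java's operators give the same results; the class and method names are hypothetical. Division is left out on purpose, because integer division in Java does not behave like / in Hive.

```java
// Illustration only: Java integer operators that share symbols with the table above.
final class OperatorSketch {
    static int mod(int a, int b) { return a % b; } // remainder
    static int and(int a, int b) { return a & b; } // bitwise AND
    static int or(int a, int b)  { return a | b; } // bitwise OR
    static int xor(int a, int b) { return a ^ b; } // bitwise XOR
    static int not(int a)        { return ~a; }    // bitwise NOT

    public static void main(String[] args) {
        // 6 is binary 110 and 3 is binary 011
        System.out.println(mod(7, 3)); // 1
        System.out.println(and(6, 3)); // 2
        System.out.println(or(6, 3));  // 7
        System.out.println(xor(6, 3)); // 5
        System.out.println(not(6));    // -7
    }
}
```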
-》Functions
(1) Count rows: count
select count(*) from empt;
(2) Maximum: max
select max(empt.sal) sal_max from empt;
(3) Minimum: min
select min(empt.sal) sal_min from empt;
(4) Sum: sum
select sum(empt.sal) sal_sum from empt;
(5) Average: avg
select avg(empt.sal) sal_avg from empt;
(6) First two rows: limit
select * from empt limit 2;
-》where clause
(1) Employees whose salary is greater than 1700
select * from empt where empt.sal > 1700;
(2) Employees whose salary is less than 1800
select * from empt where empt.sal < 1800;
(3) Employees whose salary is between 1500 and 1800 (both ends included)
select * from empt where empt.sal between 1500 and 1800;
(4) Employees who have a bonus
select * from empt where empt.comm is not null;
(5) Employees who have no bonus
select * from empt where empt.comm is null;
(6) Employees whose salary is 1700 or 1900
select * from empt where empt.sal in(1700,1900);
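-》Like
Use the like operator to select similar values; the pattern can contain letters and digits.
(1) Find employees whose salary has 6 as its second character
select * from empt where empt.sal like '_6%';
_ matches exactly one character
% matches zero or more characters
(2) Find employees whose salary contains a 7
select * from empt where empt.sal like '%7%';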
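To make the two wildcards concrete, here is a minimal sketch that translates a like pattern into a Java regular expression and matches it against the whole value. It is not how Hive implements like; the class and method names are hypothetical, escape characters are ignored, and the salary is assumed to be compared as a string.

```java
import java.util.regex.Pattern;

// Sketch of like semantics: _ is one character, % is zero or more characters.
final class LikeSketch {
    // Translate a like pattern to a regex; every other character is matched literally.
    static String toRegex(String likePattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            if (c == '_') {
                sb.append('.');
            } else if (c == '%') {
                sb.append(".*");
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return sb.toString();
    }

    // like has to match the entire value, so matches() is used rather than find().
    static boolean like(String value, String likePattern) {
        return Pattern.compile(toRegex(likePattern), Pattern.DOTALL).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("1600", "_6%")); // true: second character is 6
        System.out.println(like("1700", "_6%")); // false
        System.out.println(like("1700", "%7%")); // true: contains a 7
    }
}
```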
-》rlike (takes a regular expression)
select * from empt where empt.sal rlike '[7]';
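The rlike pattern is a Java regular expression, and a row qualifies when any substring of the value matches it. The sketch below mimics that with Matcher.find(); the class and method names are hypothetical and the salary is assumed to be compared as a string.

```java
import java.util.regex.Pattern;

// Sketch of rlike semantics: true when the regex matches anywhere in the value.
final class RlikeSketch {
    static boolean rlike(String value, String regex) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(rlike("1700", "[7]")); // true: a 7 occurs somewhere
        System.out.println(rlike("1600", "[7]")); // false
    }
}
```

Unlike like, the pattern does not have to cover the whole value, so '[7]' here selects the same rows as like '%7%'.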
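-》Grouping
(1) Group by: compute the average salary of each department in the empt table
select avg(empt.sal) avg_sal,deptno from empt group by deptno;
(2) Compute the highest salary in each department of empt
select max(empt.sal) max_sal,deptno from empt group by deptno;
(3) Find the departments whose average salary is greater than 1700
select deptno,avg(sal) avg_sal from empt group by deptno having avg_sal>1700;
Note: having is only used with group by aggregate queries; where cannot filter on an aggregate result and would cause a parse error.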
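The difference between grouping and having can be sketched outside Hive: rows are first collapsed into one average per deptno, and only then is the filter applied to those averages. The sample rows and all names below are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: group by deptno, average sal, then apply a having-style filter on the averages.
final class GroupBySketch {
    // Hypothetical sample rows: {deptno, sal}
    static final int[][] SAMPLE = {
        {10, 1500}, {10, 1700}, {20, 1800}, {20, 2000}, {30, 1700}
    };

    static Map<Integer, Double> avgSalByDept(int[][] rows) {
        Map<Integer, long[]> acc = new TreeMap<>(); // deptno -> {sum, count}
        for (int[] r : rows) {
            long[] a = acc.computeIfAbsent(r[0], k -> new long[2]);
            a[0] += r[1];
            a[1]++;
        }
        Map<Integer, Double> avg = new TreeMap<>();
        for (Map.Entry<Integer, long[]> e : acc.entrySet()) {
            avg.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return avg;
    }

    // The having step sees one value per group, never the individual rows.
    static Map<Integer, Double> having(Map<Integer, Double> groups, double min) {
        Map<Integer, Double> kept = new TreeMap<>();
        for (Map.Entry<Integer, Double> e : groups.entrySet()) {
            if (e.getValue() > min) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(avgSalByDept(SAMPLE));               // {10=1600.0, 20=1900.0, 30=1700.0}
        System.out.println(having(avgSalByDept(SAMPLE), 1700)); // {20=1900.0}
    }
}
```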
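-》Join operations
(1) Equi-join
Match the employee table and the department table on equal department numbers, and query the employee number, employee name and department name
select e.empno,e.ename,d.dept from empt e join dept d on e.deptno=d.deptno;
(2) Left outer join: left join
Every row of the left table is kept; where the right table has no matching row, its columns are filled with null
select e.empno,e.ename,d.dept from empt e left join dept d on e.deptno=d.deptno;
(3) Right outer join: right join
select e.empno,e.ename,d.dept from dept d right join empt e on e.deptno= d.deptno;
(4) Multi-table join
Query the employee name, department name and employee location
select e.ename,d.dept,l.loc_name from empt e join dept d on e.deptno=d.deptno join location l on d.loc = l.loc_no;
(5) Cartesian product
To avoid accidental Cartesian products, switch to strict mode
set hive.mapred.mode; (shows the current value)
set hive.mapred.mode=strict;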
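The null-filling rule of the left outer join in (2) can be sketched with two in-memory tables. The data and names are hypothetical, and the sketch assumes each deptno appears at most once in the department table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: inner join drops unmatched employees, left join keeps them with a null dept.
final class LeftJoinSketch {
    // emps rows are {empno, deptno}; depts maps deptno -> dept name
    static List<String> leftJoin(int[][] emps, Map<Integer, String> depts) {
        List<String> out = new ArrayList<>();
        for (int[] e : emps) {
            String dept = depts.get(e[1]); // null when no department matches
            out.add(e[0] + "," + dept);
        }
        return out;
    }

    static List<String> innerJoin(int[][] emps, Map<Integer, String> depts) {
        List<String> out = new ArrayList<>();
        for (int[] e : emps) {
            String dept = depts.get(e[1]);
            if (dept != null) { // only matched rows survive
                out.add(e[0] + "," + dept);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] emps = {{7369, 20}, {7499, 30}, {7521, 50}};
        Map<Integer, String> depts = new HashMap<>();
        depts.put(20, "RESEARCH");
        depts.put(30, "SALES");
        System.out.println(innerJoin(emps, depts)); // [7369,RESEARCH, 7499,SALES]
        System.out.println(leftJoin(emps, depts));  // [7369,RESEARCH, 7499,SALES, 7521,null]
    }
}
```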
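-》Sorting
(1) Global sort with order by: list employees in ascending order of salary
select * from empt order by sal asc; ascending is the default, so asc can be omitted
select * from empt order by sal desc; descending
(2) Query employee number and salary, sorted by twice the salary
select empt.empno,empt.sal*2 two2sal from empt order by two2sal;
(3) Sort within partitions: distribute by with sort by
select * from empt distribute by deptno sort by empno desc;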
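The following sketch shows roughly what distribute by deptno sort by empno desc does: rows with the same deptno are routed to the same reducer, and each reducer sorts only its own rows, so the result is not globally ordered. The modulo routing is an assumption standing in for Hive's actual partitioner, and the data and names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: route rows to reducers by deptno, then sort each reducer's rows by empno descending.
final class DistributeSortSketch {
    // rows are {empno, deptno}
    static List<List<int[]>> distributeAndSort(int[][] rows, int reducers) {
        List<List<int[]>> parts = new ArrayList<>();
        for (int i = 0; i < reducers; i++) {
            parts.add(new ArrayList<>());
        }
        for (int[] r : rows) {
            // Assumed routing rule: hash of the distribute-by column modulo the reducer count
            int target = Math.floorMod(Integer.hashCode(r[1]), reducers);
            parts.get(target).add(r);
        }
        for (List<int[]> p : parts) {
            p.sort((a, b) -> Integer.compare(b[0], a[0])); // empno descending, per reducer
        }
        return parts;
    }

    public static void main(String[] args) {
        int[][] rows = {{7369, 20}, {7499, 30}, {7521, 30}, {7566, 20}, {7782, 10}};
        List<List<int[]>> parts = distributeAndSort(rows, 3);
        for (int i = 0; i < parts.size(); i++) {
            StringBuilder sb = new StringBuilder("reducer " + i + ":");
            for (int[] r : parts.get(i)) {
                sb.append(' ').append(r[0]);
            }
            System.out.println(sb);
        }
    }
}
```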
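-》Bucketing
A partitioned table divides the storage path of the data
Bucketing divides the data files themselves
(1) Create a bucketed table
create table emp_buck(id int,name string)
clustered by(id) into 4 buckets
row format delimited fields terminated by '\t';
(2) Set properties
set mapreduce.job.reduces; the default is -1, so a fixed reducer count does not decide how many files are produced
set hive.enforce.bucketing=true;
truncate table emp; clears the data in the table
(3) Load data
insert into table emp_buck select * from emp_b; Note: partitioning splits data into directories; bucketing splits it into files
Sampling test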
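As a rough picture of how rows land in the 4 bucket files of emp_buck, the sketch below assigns a bucket from the id column. It assumes the classic scheme in which an int hashes to its own value and the bucket is the non-negative hash modulo the bucket count; newer Hive versions may use a different hash function, so treat the exact numbers as illustrative. The names are hypothetical.

```java
// Sketch: bucket number = (hash of the clustered-by column, made non-negative) mod bucket count.
final class BucketSketch {
    static int bucketFor(int id, int numBuckets) {
        int hash = Integer.hashCode(id); // for an int this is the value itself
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        for (int id : new int[] {1, 4, 6, 7, 1001}) {
            System.out.println("id " + id + " -> bucket " + bucketFor(id, 4));
        }
    }
}
```

Under this assumption every id with the same remainder ends up in the same file, which is what makes sampling a single bucket meaningful.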
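-》User-defined functions
So far we have used Hive's built-in functions: sum/avg/max/min…
Three kinds of user-defined function:
UDF: one row in, one value out (User-Defined Function)
UDAF: many rows in, one value out (like count, max, min)
UDTF: one row in, many rows out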
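The three shapes can be pictured as plain Java methods with different input and output arities. This sketch deliberately uses no Hive classes; the method names are hypothetical and only show which side of each function is "one" and which is "many".

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the three function shapes, without any Hive dependency.
final class FunctionShapes {
    // UDF shape: one value in, one value out
    static String lower(String s) {
        return s == null ? null : s.toLowerCase();
    }

    // UDAF shape: many rows in, one value out
    static int max(List<Integer> values) {
        int m = Integer.MIN_VALUE;
        for (int v : values) {
            m = Math.max(m, v);
        }
        return m;
    }

    // UDTF shape: one value in, many rows out
    static List<String> explode(String csv) {
        return Arrays.asList(csv.split(","));
    }

    public static void main(String[] args) {
        System.out.println(lower("SMITH"));                     // smith
        System.out.println(max(Arrays.asList(1500, 1900, 1700))); // 1900
        System.out.println(explode("a,b,c"));                   // [a, b, c]
    }
}
```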
(1) Add the Hive dependency jars from hive/lib to the project
(2) Upload the jar to the server
alt+p
(3) Add it to Hive
add jar /root/lower.jar;
(4) Register the function
create temporary function my_lower as "com.terry.com.Lower";
(5) Use it
select ename,my_lower(ename) lowername from empt;
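package com.terry.com;

import org.apache.hadoop.hive.ql.exec.UDF;

public class Lower extends UDF {
    // Convert upper case to lower case. Hive looks this method up by the name "evaluate".
    public String evaluate(final String s) {
        if (s == null) {
            return null;
        }
        return s.toLowerCase();
    }
}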