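This post introduces basic Hive queries and related functions. Hive's query syntax and function usage are largely the same as MySQL's, but the underlying execution logic is completely different.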
select * from score;
select sum(sscore) from score; -- 319
select avg(sscore) from score; -- 79.75
select * from score limit 3;
select * from score limit 2,3; -- start at offset 2 (0-based) and return 3 rows
select * from score where sscore not in(80,90); -- score is neither 80 nor 90
select * from score where not sscore in(80,90); -- same condition: score is neither 80 nor 90
select * from score where name like '李%'; -- surname is 李
select * from score where name like '%安'; -- name ends with 安
select * from score where name like '%安%'; -- name contains 安
select * from score where name like '_安%'; -- second character of the name is 安
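-- After grouping, each group collapses to a single row, so the select list may contain only the grouping columns and aggregate functions
select id, sum(sscore) from score group by id;
-- Filtering after grouping uses having, not where
select id, sum(sscore) as total_score from score group by id having total_score > 450;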
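To make the where/having distinction concrete, here is a minimal sketch against the same score table. The threshold 60 is a hypothetical value chosen only for illustration, and the aggregate is repeated in the having clause rather than referenced by its alias, which is the more portable form.

```sql
-- where filters individual rows BEFORE they are grouped;
-- having filters whole groups AFTER the aggregate has been computed.
select id, sum(sscore) as total_score
from score
where sscore >= 60            -- hypothetical row-level filter: drop scores below 60
group by id
having sum(sscore) > 450;     -- group-level filter on the aggregated total
```

Moving the sscore condition into having would not work the same way, because by then the individual rows no longer exist, only one row per id.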
select * from teacher inner join course c on teacher.tid = c.tid; -- inner join
select * from teacher join course c on teacher.tid = c.tid; -- join without a keyword is an inner join
select * from teacher , course where teacher.tid = course.tid; -- implicit inner join
select * from teacher left join course c on teacher.tid = c.tid; -- left outer join
select * from teacher right join course c on teacher.tid = c.tid; -- right outer join
select * from teacher full join course c on teacher.tid = c.tid; -- full outer join
select * from score order by sscore; -- ascending order
select * from score order by sscore desc; -- descending order
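1) Set the number of reducers
set mapreduce.job.reduces=3;
2) Query the scores, sorted in descending order within each reducer
select * from score sort by sscore desc;
3) Write the query result to files (sorted in descending order within each reducer)
insert overwrite local directory '/export/data/exporthive/sort' select * from score sort by sscore desc;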
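The difference between order by and sort by is easiest to see side by side. The following is a sketch using the same score table and reducer setting. How rows are spread across reducers without a distribute by clause is left to Hive, so the exact contents of each output file are not predictable.

```sql
set mapreduce.job.reduces=3;

-- order by: one global ordering over the whole result.
-- Hive sends all rows through a single reducer to guarantee this.
select * from score order by sscore desc;

-- sort by: each reducer sorts only the rows it received.
-- Each of the (up to) 3 outputs is sorted, but their concatenation need not be.
select * from score sort by sscore desc;
```

With a single reducer the two queries return the same ordering; the difference only appears once more than one reducer is in use.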
1) Set the number of reducers; rows will be routed to reducers according to sid
set mapreduce.job.reduces=7;
2) Partition the data with distribute by
insert overwrite local directory '/export/data/exporthive/distribute'
select * from score distribute by sid sort by sscore;
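When the distribute by and sort by columns are the same, cluster by is equivalent to distribute by + sort by:
cluster by id => distribute by id sort by id
The sort only matters when the number of reducers is smaller than the number of distinct id values, so that a single reducer receives several ids.
For example, with 100 distinct ids spread over 100 reducers, a reducer may hold only one id value, which leaves nothing to sort.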
set mapreduce.job.reduces=2;
insert overwrite local directory '/export/data/exporthive/cluster_by'
select * from score cluster by sid;
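The equivalence can be written out directly. This is a sketch on the same score table; the third query illustrates the case cluster by cannot express, since cluster by always sorts in ascending order.

```sql
set mapreduce.job.reduces=2;

-- These two queries distribute rows by sid and sort each reducer's rows by sid ascending.
select * from score cluster by sid;
select * from score distribute by sid sort by sid;

-- cluster by takes no asc/desc modifier, so a descending sort
-- has to be spelled out with distribute by + sort by.
select * from score distribute by sid sort by sid desc;
```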
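Basic Hive queries are almost identical to MySQL's. Because Hive runs on MapReduce underneath, it adds distribute by and sort by, which MySQL lacks: the former partitions rows across reducers by a table column, and the latter sorts the rows within each partition. The rest is largely the same.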