Sorting Methods in Hive

Notes on the common ways to sort query results in Hive.

order by
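Ordinary global sort: order by sorts the entire result set on the given columns, ascending (asc, the default) or descending (desc). Because the ordering is global, the final sort runs on a single reducer, which can be slow on large tables.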
select * from emp order by sal;
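As a small sketch of the direction keywords, assuming the emp table also has a deptno column (it is used in a later query in these notes), the sort direction can be set per column. The limit value of 10 is arbitrary and only for illustration; note that when Hive runs in strict mode, order by without a limit is rejected.

```sql
-- deptno ascending (the default), then sal descending within each deptno
select * from emp order by deptno asc, sal desc;

-- Top 10 salaries; the limit also satisfies strict mode
select * from emp order by sal desc limit 10;
```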
sort by
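Sorts the output of each reducer separately, so each reducer's output is ordered but the combined result is not globally ordered. To see the effect, set several reducers and check whether each reducer's output is sorted.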
set mapreduce.job.reduces=3;
insert overwrite local directory '/opt/moduels/emp_02' row format delimited fields terminated by '\t' select * from emp sort by sal;
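To check the result, the output directory can be listed from inside the Hive session. This is a minimal sketch that assumes the Hive CLI (not Beeline), where a line starting with ! runs a local shell command. With three reducers there is typically one output file per reducer, and each file should be sorted by sal on its own.

```sql
-- Assumes the Hive CLI: a leading ! executes a local shell command.
-- Expect one file per reducer; open each to confirm it is sorted by sal.
!ls /opt/moduels/emp_02;
```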
distribute by
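distribute by controls how rows are partitioned among reducers: rows with the same value in the distribute by column are sent to the same reducer. It does not sort anything by itself. Here the rows are distributed by deptno and then sorted by sal within each reducer.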
insert overwrite local directory '/opt/moduels/emp_03' row format delimited fields terminated by '\t' select * from emp distribute by deptno sort by sal;
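One reducer may receive more than one deptno value, in which case sorting by sal alone interleaves departments in that reducer's output. A variation, sketched here under the same emp table assumptions, sorts by deptno first so that each department stays contiguous, with salaries descending inside it:

```sql
set mapreduce.job.reduces=3;

-- Rows with the same deptno go to the same reducer;
-- within each reducer, group by deptno and list the highest sal first.
select * from emp distribute by deptno sort by deptno, sal desc;
```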
cluster by
cluster by is shorthand for distribute by plus sort by on the same column(s). It only sorts in ascending order; asc or desc cannot be specified.
insert overwrite local directory '/opt/moduels/emp_04' row format delimited fields terminated by '\t' select * from emp cluster by sal;
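The shorthand can be spelled out to make the equivalence visible. In this sketch, which uses deptno only as an example column, the first two queries are expected to distribute and order rows the same way, and the third shows the long form needed when descending order is wanted:

```sql
-- Shorthand form
select * from emp cluster by deptno;

-- Equivalent long form
select * from emp distribute by deptno sort by deptno;

-- cluster by does not accept desc, so write it out explicitly
select * from emp distribute by deptno sort by deptno desc;
```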
