Common advanced queries in Hive include GROUP BY, ORDER BY, JOIN, DISTRIBUTE BY, SORT BY, CLUSTER BY and UNION ALL. This post looks at ORDER BY, which sorts the query result by one or more columns. The syntax is:
select col1, col2, ... from tableName where condition order by col1 [asc|desc], col2 [asc|desc], ...
Notes:
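(1) ORDER BY can sort on several columns, each with its own direction. The default direction is ascending (asc); strings are compared lexicographically and numeric columns numerically.
(2) ORDER BY produces a global (total) ordering of the whole result set.
(3) ORDER BY requires a reduce phase and always runs with exactly one reducer. This cannot be changed through configuration, because with several reducers each one would sort only its own share of the rows, so the combined output would not be globally ordered.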
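To see why a single reducer is needed, here is a toy Python simulation (an illustration, not Hive's code). It assumes a hypothetical hash-style partitioner (value modulo the number of reducers) and shows that each reducer's output is sorted, while the concatenated output of several reducers is not globally sorted.

```python
"""Toy model: why ORDER BY needs one reducer to get a global order.

Simplified MapReduce partitioning, not Hive's implementation: rows are
spread across reducers by a hypothetical partitioner (value % reducers),
and each reducer sorts only the rows it receives.
"""
from typing import List


def run_reducers(values: List[int], num_reducers: int) -> List[List[int]]:
    # Partition rows across reducers with a simple modulo "hash".
    buckets: List[List[int]] = [[] for _ in range(num_reducers)]
    for v in values:
        buckets[v % num_reducers].append(v)
    # Each reducer sorts only its own bucket.
    return [sorted(b) for b in buckets]


def concatenated_output(values: List[int], num_reducers: int) -> List[int]:
    # The job's result is the reducer output files read one after another.
    out: List[int] = []
    for part in run_reducers(values, num_reducers):
        out.extend(part)
    return out


if __name__ == "__main__":
    data = [5, 3, 8, 1, 6, 2]
    print(concatenated_output(data, 1))  # one reducer: globally sorted
    print(concatenated_output(data, 2))  # two reducers: sorted per part only
```

With one reducer the output is `[1, 2, 3, 5, 6, 8]`; with two it is `[2, 6, 8, 1, 3, 5]`, where each half is sorted but the whole is not.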
The behavior of ORDER BY depends on the following setting:
set hive.mapred.mode=nonstrict; (the default)
set hive.mapred.mode=strict;
Note: in strict mode, a query that uses ORDER BY must also include a LIMIT clause. Because ORDER BY can only run a single reducer, sorting a large result set would take a very long time.
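The strict-mode rule can be modeled in a few lines. `Query` and `check_order_by` are hypothetical names for this sketch; Hive performs the real check during semantic analysis of the parsed query, not on a flag like this.

```python
"""Toy model of the strict-mode ORDER BY / LIMIT rule (hypothetical helper)."""
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    has_order_by: bool
    limit: Optional[int] = None  # None means the query has no LIMIT clause


def check_order_by(query: Query, mode: str = "nonstrict") -> None:
    # Strict mode rejects ORDER BY without LIMIT, because the single
    # reducer would have to sort and emit the entire result set.
    if mode == "strict" and query.has_order_by and query.limit is None:
        raise ValueError(
            "In strict mode, if ORDER BY is specified, LIMIT must also be specified."
        )


if __name__ == "__main__":
    check_order_by(Query(has_order_by=True), mode="nonstrict")  # accepted
    check_order_by(Query(has_order_by=True, limit=3), mode="strict")  # accepted
    try:
        check_order_by(Query(has_order_by=True), mode="strict")
    except ValueError as err:
        print("FAILED:", err)
```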
The following example shows how ORDER BY works in practice.
The database contains an employees table with the following data:
hive> select * from employees;
OK
lavimer 15000.0 ["li","lu","wang"] {"k1":1.0,"k2":2.0,"k3":3.0} {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24 love
liao 18000.0 ["liu","li","huang"] {"k4":2.0,"k5":3.0,"k6":6.0} {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24 love
zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24 love
Now sort by the second column (salary) in descending order:
hive> select * from employees order by salary desc;
// MapReduce execution output
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 2.62 sec   HDFS Read: 415 HDFS Write: 245 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 620 msec
OK
zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24 love
liao 18000.0 ["liu","li","huang"] {"k4":2.0,"k5":3.0,"k6":6.0} {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24 love
lavimer 15000.0 ["li","lu","wang"] {"k1":1.0,"k2":2.0,"k3":3.0} {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24 love
Time taken: 20.484 seconds
hive>
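The ordering itself is easy to reproduce outside Hive. This sketch keeps only the name and salary values from the sample rows (the tuple layout is an assumption for illustration) and sorts them the way `order by salary desc, name` would, with name as an ascending tie-breaker. Since the salaries here are unique, the result matches the Hive output above.

```python
"""Reproduce ORDER BY salary DESC on the sample rows (name, salary only)."""
from typing import List, Tuple

Row = Tuple[str, float]  # (name, salary): assumed layout for this sketch

employees: List[Row] = [
    ("lavimer", 15000.0),
    ("liao", 18000.0),
    ("zhang", 19000.0),
]


def order_by_salary_desc(rows: List[Row]) -> List[Row]:
    # Negating the numeric salary sorts it descending; the name, as a
    # second key, breaks ties in ascending (lexicographic) order.
    return sorted(rows, key=lambda r: (-r[1], r[0]))


for name, salary in order_by_salary_desc(employees):
    print(name, salary)
```

This prints zhang, liao, lavimer in that order, the same sequence Hive returned.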
Checking the current mode shows it is nonstrict, the default:
hive> set hive.mapred.mode;
hive.mapred.mode=nonstrict
hive>
hive> set hive.mapred.mode=strict;
hive> select * from employees order by salary desc;
FAILED: Error in semantic analysis: 1:33 In strict mode, if ORDER BY is specified, LIMIT must also be specified. Error encountered near token 'salary'
hive>
Note: in strict mode, the query must include a LIMIT clause.
hive> select * from employees order by salary desc limit 3;
FAILED: Error in semantic analysis: No partition predicate found for Alias "employees" Table "employees"
Note: strict mode also restricts queries on partitioned tables. The fix is to restrict the query to a specific partition.
hive> show partitions employees;
OK
date_time=2015-01-24/type=love
Time taken: 0.096 seconds
hive> select * from employees where partition(date_time='2015-01-24',type='love') order by salary desc limit 3;
FAILED: Parse Error: line 1:30 cannot recognize input near 'partition' '(' 'date_time' in expression specification
The partition(...) syntax is not valid in a WHERE clause; instead, filter on the partition columns as if they were ordinary columns:
hive> select * from employees where date_time='2015-01-24' and type='love' order by salary desc limit 3;
// MapReduce execution output
Total MapReduce CPU Time Spent: 3 seconds 510 msec
OK
zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24 love
liao 18000.0 ["liu","li","huang"] {"k4":2.0,"k5":3.0,"k6":6.0} {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24 love
lavimer 15000.0 ["li","lu","wang"] {"k1":1.0,"k2":2.0,"k3":3.0} {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24 love
Time taken: 19.861 seconds
hive>
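The strict-mode partition rule can be modeled the same way. `check_partition_filter` is a hypothetical helper, and its simplified rule (the WHERE clause must filter on at least one partition column) is an assumption about strict mode's behavior, not a copy of Hive's implementation.

```python
"""Toy model of strict mode's partition-predicate rule (hypothetical helper)."""
from typing import Iterable, Set


def check_partition_filter(
    table: str,
    partition_columns: Set[str],
    filtered_columns: Iterable[str],
    mode: str = "nonstrict",
) -> None:
    # Nothing to check in nonstrict mode or for an unpartitioned table.
    if mode != "strict" or not partition_columns:
        return
    # Simplified rule: at least one WHERE filter must hit a partition column.
    if partition_columns.isdisjoint(filtered_columns):
        raise ValueError(
            f'No partition predicate found for Alias "{table}" Table "{table}"'
        )
```

With the employees table's partition columns `{"date_time", "type"}`, filtering on `date_time` and `type` passes in strict mode, while a query with no partition filter raises the same message as the FAILED line above.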