Hive SQL usage summary

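To make Hive run a job with multiple reducers: set mapred.reduce.tasks = 2; (on newer Hadoop versions the same setting is named mapreduce.job.reduces)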

(1) Differences between order by / distribute by / sort by / cluster by

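order by #global sort; the whole result set is totally ordered
sort by #local sort; rows are sorted within each individual reducer only
distribute by #partitioning, not sorting; rows with the same key are sent to the same reducer
cluster by = distribute by + sort by on the same columns #partition, then sort within each reducer
cluster by id,name sorts in ascending order by default, and ASC/DESC cannot be specified
group by #plain grouping, usually combined with aggregates such as AVG()/COUNT()/MAX()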
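
As a rough sketch of how these clauses are written in practice, assuming the table_test table used in the window-function query in this post has the columns id, name and dt (the column list is an assumption, not something the post states in full):

```sql
-- Assumption: table_test(id, name, dt).
SET mapred.reduce.tasks = 2;

-- Global sort: a single, totally ordered result.
SELECT id, name, dt FROM table_test ORDER BY dt DESC;

-- Local sort: each reducer's output is sorted by dt,
-- but the combined result is not globally ordered.
SELECT id, name, dt FROM table_test SORT BY dt DESC;

-- Rows with the same id go to the same reducer,
-- then each reducer sorts its rows by dt.
SELECT id, name, dt FROM table_test DISTRIBUTE BY id SORT BY dt DESC;

-- Shorthand for DISTRIBUTE BY id SORT BY id (ascending only).
SELECT id, name, dt FROM table_test CLUSTER BY id;
```

With two reducers, the SORT BY and DISTRIBUTE BY queries typically produce two sorted runs rather than one ordered list, which is the main practical difference from ORDER BY.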

(2) Window functions
Numbering functions: row_number() / rank() / dense_rank()
Distribution functions: percent_rank() / cume_dist()
Offset functions: lag() / lead()
First/last functions: first_value() / last_value()
Other functions: nth_value() / ntile()
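
The offset, first/last and ntile functions from the list above could be used along these lines. This is only a sketch: it assumes table_test has the columns id, name and dt, and the output aliases (prev_dt, next_dt, and so on) are made up for illustration.

```sql
-- Assumption: table_test(id, name, dt); all aliases are hypothetical.
SELECT id, name, dt,
       -- dt of the previous / next row within the same id (NULL at the edges)
       LAG(dt, 1)  OVER (PARTITION BY id ORDER BY dt) AS prev_dt,
       LEAD(dt, 1) OVER (PARTITION BY id ORDER BY dt) AS next_dt,
       -- name on the earliest row of each id
       FIRST_VALUE(name) OVER (PARTITION BY id ORDER BY dt) AS first_name,
       -- name on the latest row of each id; the explicit frame is needed because
       -- the default frame ends at the current row
       LAST_VALUE(name) OVER (PARTITION BY id ORDER BY dt
                              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_name,
       -- split all rows into 4 roughly equal buckets by dt
       NTILE(4) OVER (ORDER BY dt) AS quartile
FROM table_test;
```

Without the explicit ROWS BETWEEN clause, LAST_VALUE() would usually just return the current row's value, which is a common source of confusion.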

SELECT id, name,
       RANK()       OVER (PARTITION BY id ORDER BY dt DESC) AS rn1,
       DENSE_RANK() OVER (PARTITION BY id ORDER BY dt DESC) AS rn2,
       ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS rn3
FROM table_test;
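
The three numbering functions only differ when there are ties in the ORDER BY column. The comments below walk through a small made-up case, and the query after them sketches one common use, keeping the latest row per id. The sample dt values and the alias rn are assumptions for illustration.

```sql
-- Hypothetical data: one id with dt values
--   2024-01-03, 2024-01-02, 2024-01-02, 2024-01-01
-- Ordered by dt DESC, the functions return:
--   RANK():       1, 2, 2, 4   (ties share a rank, next rank is skipped)
--   DENSE_RANK(): 1, 2, 2, 3   (ties share a rank, no gap afterwards)
--   ROW_NUMBER(): 1, 2, 3, 4   (always unique; order between tied rows is arbitrary)

-- Keep only the most recent row for each id.
SELECT id, name, dt
FROM (
    SELECT id, name, dt,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS rn
    FROM table_test
) t
WHERE rn = 1;
```

Using RANK() instead of ROW_NUMBER() in the filter would keep every row tied for the latest dt rather than exactly one per id.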
