Hive: selecting the middle 60% of sorted data

The sorting step itself is omitted here.

Method 1: using the percentile function

select
    id
from (
    select
        id
        ,percentile(id, 0.2) over () as id2
        ,percentile(id, 0.8) over () as id8
    from (
        select 1 as id union all
        select 2 as id union all
        select 3 as id union all
        select 4 as id union all
        select 5 as id union all
        select 6 as id union all
        select 7 as id union all
        select 8 as id union all
        select 9 as id union all
        select 10 as id union all
        select 11 as id union all
        select 12 as id
    ) as a
) as a
where id between id2 and id8
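To check what this query keeps without a Hive cluster, the sketch below reproduces the arithmetic in Python. It assumes that percentile interpolates linearly between the two nearest ranks, which is my understanding of Hive's exact percentile for integer columns. The helper name `percentile_linear` is made up for this illustration.

```python
import math

def percentile_linear(sorted_vals, p):
    """Percentile with linear interpolation between neighbouring ranks (assumed Hive behaviour)."""
    pos = p * (len(sorted_vals) - 1)      # fractional 0-based rank
    lo, hi = math.floor(pos), math.ceil(pos)
    if lo == hi:
        return float(sorted_vals[lo])
    # Weight the two neighbours by their distance from the fractional rank.
    return (hi - pos) * sorted_vals[lo] + (pos - lo) * sorted_vals[hi]

ids = list(range(1, 13))                  # same sample data as the query: 1..12
id2 = percentile_linear(ids, 0.2)         # about 3.2
id8 = percentile_linear(ids, 0.8)         # about 9.8
middle = [i for i in ids if id2 <= i <= id8]
print(middle)                             # [4, 5, 6, 7, 8, 9]
```

Under that assumption the filter keeps 6 of the 12 sample rows, which is half rather than exactly 60%. On a dataset this small the boundaries fall between rows, so the fraction is only approximate.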

Method 2: using the NTILE bucketing function

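NTILE(n) splits the rows of each partition, in order, into n slices and returns the number of the slice the current row belongs to. In other words, it divides an ordered dataset into n buckets and assigns each row its bucket number (also called the slice or tile number). It is useful for cutting data into roughly equal slices and labelling every row with the sequence number of its slice.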

NTILE does not support a ROWS BETWEEN window frame; for example, NTILE(2) OVER(PARTITION BY dept_no ORDER BY salary ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) is not allowed.

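If the rows do not divide evenly into n slices, the leftover rows go to the lowest-numbered slices, one extra row each, starting with the first.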
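The uneven case is easier to see with numbers. The following Python sketch computes the bucket sizes that the standard NTILE definition produces; `ntile_sizes` is a hypothetical helper written for this illustration, not a Hive function.

```python
def ntile_sizes(n_rows, n_buckets):
    """Bucket sizes under standard NTILE semantics: leftovers go to the first buckets."""
    base, extra = divmod(n_rows, n_buckets)
    # The first `extra` buckets each receive one additional row.
    return [base + 1 if b < extra else base for b in range(n_buckets)]

sizes = ntile_sizes(12, 5)
print(sizes)   # [3, 3, 2, 2, 2]
```

With 12 rows and 5 buckets, each bucket gets 2 rows and the 2 leftover rows land in buckets 1 and 2.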

select
    id
from (
    select
        id
        ,ntile(5) over (order by id asc) as bkt
    from (
        select 1 as id union all
        select 2 as id union all
        select 3 as id union all
        select 4 as id union all
        select 5 as id union all
        select 6 as id union all
        select 7 as id union all
        select 8 as id union all
        select 9 as id union all
        select 10 as id union all
        select 11 as id union all
        select 12 as id
    ) as a
) as a
where bkt in (2,3,4)
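As a rough cross-check of the query above, this Python sketch assigns bucket numbers the way standard NTILE does and applies the same filter. The function `ntile_assign` is an illustrative stand-in, not part of Hive.

```python
def ntile_assign(n_rows, n_buckets):
    """1-based bucket number for each row position, in sorted order."""
    base, extra = divmod(n_rows, n_buckets)
    buckets = []
    for b in range(1, n_buckets + 1):
        size = base + (1 if b <= extra else 0)   # earlier buckets absorb leftovers
        buckets.extend([b] * size)
    return buckets

ids = list(range(1, 13))                 # same sample data as the query: 1..12
bkt = ntile_assign(len(ids), 5)          # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5]
middle = [i for i, b in zip(ids, bkt) if b in (2, 3, 4)]
print(middle)                            # [4, 5, 6, 7, 8, 9, 10]
```

On this sample the filter keeps 7 of 12 rows, because bucket 2 holds one of the leftover rows. The result gets closer to 60% as the row count grows or when it is a multiple of 5.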
