Hive SQL DISTINCT

1. Optimizing DISTINCT

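Before optimization, all of the data is sent to a single reducer, because the global count(distinct) has to be computed in one place.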

select count(distinct id)
from
(select id from tablea
union all
select id from tableb) ta

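After optimization, the GROUP BY first spreads the data across multiple reducers (each one deduplicates its own share of the ids), and the outer count then combines the results.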

select 
count(*)
from 
(select id 
from 
(select id from tablea
union all
select id from tableb
) ta group by id) tb
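To see why the rewrite helps, here is a small Python simulation of the two plans. It is a simplified model, not Hive's actual execution engine. Rows are hash-partitioned by id across a hypothetical `num_reducers`, each reducer deduplicates only its own partition, and the small per-reducer counts are summed at the end. Because equal ids always hash to the same reducer, no id is counted twice, so the sum equals the single-reducer distinct count while the deduplication work is split up. The function names and sample data are illustrative only.

```python
from collections import defaultdict


def count_distinct_single_reducer(rows):
    """Plan before the rewrite: one reducer keeps a single global set.
    Like COUNT(DISTINCT id), NULLs (None here) are ignored."""
    seen = set()
    for id_ in rows:
        if id_ is not None:
            seen.add(id_)
    return len(seen)


def count_distinct_two_stage(rows, num_reducers=4):
    """Plan after the rewrite (GROUP BY id, then an outer count).

    Stage 1: hash-partition ids across reducers; each reducer deduplicates
    only its own partition. Equal ids always go to the same reducer.
    Stage 2: sum the per-reducer counts (a tiny amount of data).
    """
    partitions = defaultdict(set)
    for id_ in rows:
        if id_ is not None:  # mimic COUNT(DISTINCT id), which skips NULLs
            partitions[hash(id_) % num_reducers].add(id_)
    per_reducer_counts = [len(ids) for ids in partitions.values()]
    return sum(per_reducer_counts)


tablea = [1, 2, 3, 3, None]
tableb = [3, 4, 5, 5]
combined = tablea + tableb  # UNION ALL keeps duplicates

print(count_distinct_single_reducer(combined))  # 5
print(count_distinct_two_stage(combined))       # 5
```

One detail to watch in the SQL itself: if id can be NULL, GROUP BY id produces a NULL group that the outer count(*) counts, whereas count(distinct id) ignores NULLs. Using count(id) in the outer query (or adding where id is not null) keeps the two results identical; the sketch drops None to match count(distinct).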

Impala does not support multiple count(distinct ...) expressions in the same query, while Presto and Hive do:
select count(distinct id),count(distinct name)
from
jt_dw_a.black_test
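For an engine that accepts only one count(distinct) per query, a common workaround is to compute each distinct count in its own subquery and CROSS JOIN the two one-row results. The sketch below runs both forms against an in-memory SQLite table purely to check that they return the same numbers. SQLite is not Impala and does not reproduce its restriction; the table contents are made-up sample rows standing in for jt_dw_a.black_test.

```python
import sqlite3

# Hypothetical sample rows standing in for jt_dw_a.black_test
rows = [(1, "a"), (2, "a"), (2, "b"), (3, None), (None, "c"), (4, "a")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE black_test (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO black_test VALUES (?, ?)", rows)

# Original form: two COUNT(DISTINCT ...) in one SELECT (accepted by Hive/Presto).
multi = conn.execute(
    "SELECT COUNT(DISTINCT id), COUNT(DISTINCT name) FROM black_test"
).fetchone()

# Rewrite: each subquery holds exactly one COUNT(DISTINCT ...); cross-joining
# two single-row results yields one row carrying both counts.
rewritten = conn.execute(
    """
    SELECT v1.c, v2.c
    FROM (SELECT COUNT(DISTINCT id) AS c FROM black_test) v1
    CROSS JOIN (SELECT COUNT(DISTINCT name) AS c FROM black_test) v2
    """
).fetchone()

print(multi, rewritten)  # (4, 3) (4, 3)
conn.close()
```

Each subquery scans the table separately, so the rewrite trades one pass for two; that is the usual cost of this workaround.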
