Hive: notes from day-to-day use

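When Hive joins two tables, the ON clause can effectively be treated as a WHERE condition (this holds for inner joins; with outer joins, a filter in ON and a filter in WHERE behave differently).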

Take the following SQL as an example:

select devicename, companyid, groupid, did, loginname, userid, detailtime, hour, intervel,
       case when hour in (8,9,10,11,12,13,14,15,16,17) and intervel = 0 then 2
            when hour in (0,1,2,3,4,5,6,7,18,19,20,21,22,23) and intervel > 0 then 2
            else 1 end as devicestatus
from (
  select b.devicename as devicename, b.companyid as companyid, b.groupid as groupid, b.did as did,
         b.loginname, b.userid,
         -- starttime is divided by 1000 (presumably milliseconds to seconds);
         -- HH is the 24-hour clock, which the hour ranges above need (hh is 12-hour)
         from_unixtime(int(b.starttime/1000), 'yyyy-MM-dd HH:mm:ss') as detailtime,
         hour(from_unixtime(int(b.starttime/1000), 'yyyy-MM-dd HH:mm:ss')) as hour,
         (b.startnum - a.startnum) as intervel
  from day_app_unusual_use_internal a
  join day_app_unusual_use_internal b
    on (a.companyid = b.companyid and a.groupid = b.groupid and a.userid = b.userid)
   and a.row_num + 1 = b.row_num                      -- pair each row with the next row
   and a.dt = '2017-04-04' and b.dt = '2017-04-04'    -- date filters placed in ON
) as c

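At the end of the `a join b on` condition, I put the dt filters for both tables inside ON. Previously I always filtered each table with WHERE first and then joined. Putting the filters in ON keeps the SQL simpler, but unless the optimizer pushes them down ahead of the join, it increases the amount of data being joined, which goes against the usual SQL optimization rule (filter first, then join).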
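For comparison, here is a minimal sketch of the filter-first form. The table, aliases, and column names come from the query in this post. The shape of the subqueries and the trimmed select list are my own illustration, not the version I actually used before.

```sql
-- Filter-first variant (sketch): each side is restricted to one dt value
-- inside a subquery, so the join only receives that day's rows.
select b.devicename, b.companyid, b.groupid, b.did, b.loginname, b.userid,
       (b.startnum - a.startnum) as intervel
from (
  select companyid, groupid, userid, row_num, startnum
  from day_app_unusual_use_internal
  where dt = '2017-04-04'
) a
join (
  select devicename, companyid, groupid, did, loginname, userid,
         row_num, startnum
  from day_app_unusual_use_internal
  where dt = '2017-04-04'
) b
  on a.companyid = b.companyid
 and a.groupid = b.groupid
 and a.userid = b.userid
 and a.row_num + 1 = b.row_num;  -- pair each row with the next row
```

Running EXPLAIN on both forms is one way to check whether Hive already applies the dt filters before the join, in which case the two forms should cost about the same.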
