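Database (a folder)
Table (an Excel workbook name)
Field (the first row in Excel; holds the field name, data type, and comment)
Partition column (like a sheet; usually a date; filtering on it speeds up queries) (queries must restrict the partition, otherwise Hive reports an error)
Data Map (数据地图; used to look up the tables you need)
KwaiBI (query platform)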
select [all | distinct] select_expr, ...
from
[where]
[group by]
[having]
[order by]
[limit [offset,] rows]
select a + b as cnt
from
where
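The clause order above can be tried end to end with a small sketch. It uses Python's built-in sqlite3 as a stand-in for Hive, and the table `demo` and its rows are hypothetical, made up only for illustration.

```python
import sqlite3

# Hypothetical table and data, used only to exercise the clause order.
conn = sqlite3.connect(":memory:")
conn.execute("create table demo (a integer, b integer)")
conn.executemany(
    "insert into demo values (?, ?)",
    [(1, 2), (3, 4), (5, 6), (7, 8)],
)

# select ... from ... where ... order by ... limit offset, rows
query = """
select a + b as cnt
from demo
where a > 1
order by cnt desc
limit 1, 2
"""
rows = [r[0] for r in conn.execute(query)]
print(rows)
```

Here `limit 1, 2` skips the first sorted row and returns the next two.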
With group by, the select list may contain only the group by columns; everything else must be an aggregate computed per group
select pic, count(1) as cnt
from
where p_date =
group by pic
having count(1) > 1000
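A runnable sketch of the same group by / having pattern, assuming a hypothetical table `pic_view` with made-up rows, sqlite3 in place of Hive, and a threshold lowered to 2 so the toy data produces a result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per view of a picture on a given day.
conn.execute("create table pic_view (pic text, p_date text)")
conn.executemany(
    "insert into pic_view values (?, ?)",
    [("p1", "20200701")] * 3
    + [("p2", "20200701")] * 1
    + [("p1", "20200702")] * 5,
)

# where filters rows before grouping; having filters groups after aggregation.
query = """
select pic, count(1) as cnt
from pic_view
where p_date = '20200701'
group by pic
having count(1) > 2
"""
result = conn.execute(query).fetchall()
print(result)
```

Only `p1` survives: it has three rows on 20200701, while `p2` has one and the 20200702 rows are removed by the where clause.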
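count(*): counts rows, including NULLs
count(expr): excludes NULLs
count(DISTINCT expr): number of distinct values, excluding NULLs
sum(col)
sum(DISTINCT col): sum of distinct values
avg(col), avg(DISTINCT col): average; the DISTINCT form averages the distinct values
collect_set(col): collects values into an array with duplicates removed
Find the users whose first login falls on a given day, in Hive: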
select a.id
from (select id,collect_set(time) as t from t_action_login where time<='20150906' group by id) as a where size(a.t)=1 and a.t[0]='20150906';
| [email protected] | ["20150620","20150619"] |
| [email protected] | ["20150816"] |
| [email protected] | ["20150606","20150608","20150607","20150609","20150613","20150610","20150616","20150615"] |
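The first-login logic can be checked on toy data. SQLite has no collect_set, so this sketch swaps in `count(distinct time) = 1 and min(time) = '20150906'`, which expresses the same condition (exactly one distinct login date up to the target day, and that date is the target day). The user ids and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t_action_login (id text, time text)")
conn.executemany(
    "insert into t_action_login values (?, ?)",
    [
        ("u1", "20150620"), ("u1", "20150906"),  # logged in before the target day
        ("u2", "20150906"), ("u2", "20150906"),  # first login on the target day, twice
        ("u3", "20150907"),                      # first login after the target day
    ],
)

# Stand-in for size(collect_set(time)) = 1 and t[0] = '20150906'.
query = """
select id
from t_action_login
where time <= '20150906'
group by id
having count(distinct time) = 1 and min(time) = '20150906'
"""
first_login_ids = [r[0] for r in conn.execute(query)]
print(first_login_ids)
```

Repeated logins on the same day do not disqualify a user, because only distinct dates are counted.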
collect_list(col): collects values into an array, keeping duplicates
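The NULL and DISTINCT rules of the aggregates listed above can be verified on a four-row table. This is a sketch using sqlite3 rather than Hive; the table `t` is hypothetical, and `group_concat` is only a rough stand-in for collect_list / collect_set, since it returns a comma-separated string instead of an array.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (col integer)")
# Values 1, 1, 2 and one NULL.
conn.executemany("insert into t values (?)", [(1,), (1,), (2,), (None,)])

row = conn.execute("""
select count(*), count(col), count(distinct col),
       sum(col), sum(distinct col),
       avg(col), avg(distinct col),
       group_concat(col), group_concat(distinct col)
from t
""").fetchone()

counts = row[0:3]        # rows incl. NULL, non-NULL values, distinct non-NULL values
sums = row[3:5]          # plain sum, sum of distinct values
averages = row[5:7]      # plain average, average of distinct values
# Element order is not guaranteed, so sort before comparing.
as_list = sorted(row[7].split(","))   # analogue of collect_list
as_set = sorted(row[8].split(","))    # analogue of collect_set
print(counts, sums, averages, as_list, as_set)
```

The NULL row is counted only by `count(*)`; every other aggregate ignores it.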
Count active users in Beijing and Chengdu on 20200701
Method 1: group by + count(1)
select
city,p_date,count(1) as cnt
from biao
where
p_date = '20200701' and city in ('北京','成都')
group by city,p_date
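The query above can be run against toy data as a sanity check. This sketch uses sqlite3 instead of Hive; the `user_id` column and all rows are assumptions, and it presumes one row per active user per day, so that count(1) equals the number of users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per active user per day.
conn.execute("create table biao (user_id integer, city text, p_date text)")
conn.executemany(
    "insert into biao values (?, ?, ?)",
    [
        (1, "北京", "20200701"), (2, "北京", "20200701"),
        (3, "成都", "20200701"), (4, "上海", "20200701"),
        (5, "北京", "20200630"),
    ],
)

query = """
select city, p_date, count(1) as cnt
from biao
where p_date = '20200701' and city in ('北京', '成都')
group by city, p_date
"""
counts = {city: cnt for city, _, cnt in conn.execute(query)}
print(counts)
```

Each city comes back as its own row; the 上海 row and the 20200630 row are dropped by the where clause.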
Method 2 (more convenient and more common): conditional aggregation with case when / if
select
p_date,
sum(if(city = '北京',1,0)) as beijing_user_cnt,
sum(if(city = '成都',1,0)) as chengdu_user_cnt
from biao
<