Data Analysis Summary

Data Analyst Skills 1: SQL

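Common beginner mistakes include forgetting to filter on the partition column, many-to-many joins that silently duplicate rows, and logic errors. The statements below are a selection of the SQL most often used at work, split into a basic part and an advanced part.
1. Basics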
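The many-to-many pitfall is easiest to see on a toy example. The following is a minimal sketch that runs on Python's built-in `sqlite3` module instead of Hive, which is an assumption made for portability; the join semantics are the same. The tables `users` and `orders` and their rows are hypothetical. Because user 1 has two orders, the join produces two rows for that user, and a plain `COUNT` counts them both.

```python
import sqlite3

# Hypothetical tables: one row per user, possibly many orders per user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (userid INTEGER, city TEXT);
    CREATE TABLE orders (userid INTEGER, money REAL);
    INSERT INTO users  VALUES (1, 'Beijing'), (2, 'Beijing'), (3, 'Shanghai');
    -- user 1 has two orders, so the join duplicates user 1's row
    INSERT INTO orders VALUES (1, 10.0), (1, 20.0), (2, 5.0);
""")

# Naive count after the join: user 1 is counted twice.
naive = dict(conn.execute("""
    SELECT u.city, COUNT(u.userid)
    FROM users u LEFT JOIN orders o ON u.userid = o.userid
    GROUP BY u.city
    ORDER BY u.city
""").fetchall())

# Counting distinct keys removes the fan-out introduced by the join.
fixed = dict(conn.execute("""
    SELECT u.city, COUNT(DISTINCT u.userid)
    FROM users u LEFT JOIN orders o ON u.userid = o.userid
    GROUP BY u.city
    ORDER BY u.city
""").fetchall())

print(naive)  # {'Beijing': 3, 'Shanghai': 1}
print(fixed)  # {'Beijing': 2, 'Shanghai': 1}
```

`COUNT(DISTINCT ...)` only corrects counts. When you sum a measure such as `money`, the usual fix is to aggregate the many-side table per key before joining.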

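| Statement | Meaning | Example |
| --- | --- | --- |
| select | Query | `select * from table1 limit 10` |
| distinct | Remove duplicates | `select distinct city from table1` |
| where | Filter condition | `select distinct userid from table1 where city = 'Beijing'` |
| if | Conditional: if(condition, value if true, value if false) | `select if(userid is not null,1,0) is_real_user from table1 where city = 'Beijing'` |
| case when | Multi-branch conditional | `select case when city = 'Beijing' then 'Beijing drifter' when city = 'Shanghai' then 'Shanghai drifter' else 'other drifter' end as piao_type from table1` |
| group by | Grouping, used together with aggregate functions such as sum, max, min | `select city,count(*) from table1 group by city` |
| datediff | Number of days between two dates (first argument minus second) | `select datediff('2022-10-24','2022-10-01')` |
| from_unixtime | Convert a Unix timestamp in seconds to a datetime string (here `time` is in milliseconds, hence the division by 1000) | `select from_unixtime(int(time/1000),'yyyy-MM-dd HH:mm:ss') time from table1` |
| concat | Concatenate strings | `select concat('keep going','-','data analysis')` |
| and | Logical AND: both conditions must hold | `select distinct userid from table1 where city = 'Beijing' and piao_type = 'native'` |
| or | Logical OR: at least one condition holds | `select distinct userid from table1 where city='Beijing' or mark='overseas Chinese'` |
| left join | Left join: keeps every row of the left table; right-side columns are null where there is no match (cnt_fenmu = denominator, cnt_fenzi = numerator) | `select a.city,count(distinct a.userid) cnt_fenmu, count(distinct b.userid) cnt_fenzi from table1 a left join table2 b on a.userid = b.userid group by a.city` |
| right join | Right join: keeps every row of the right table; the example is the left join above with the table order swapped | `select b.city,count(distinct a.userid) cnt_fenzi, count(distinct b.userid) cnt_fenmu from table2 a right join table1 b on a.userid = b.userid group by b.city` |
| inner join | Inner join: keeps only rows that match in both tables | `select a.city,count(distinct a.userid) cnt from table1 a inner join table2 b on a.userid = b.userid group by a.city` |
| sum | Sum | `select sum(money) sum_money from table1` |
| count | Count; count(distinct ...) counts unique values | `select count(distinct userid) usernum from table1` |
| max | Maximum | `select max(rank) max_rank from table1` |
| min | Minimum | `select min(rank) min_rank from table1` |
| avg | Average | `select avg(rank) avg_rank from table1` |
| nvl | Null replacement: returns the first argument if it is not null, otherwise the second | `select nvl(city,'other') city_level_new from table1` |
| coalesce | Returns the first non-null argument (takes any number of arguments) | `select coalesce(city,city_level,'other') city_level_new from table1` |
| percentile | Exact percentile of an integer column; 0.5 gives the median (zhongwei = median) | `select percentile(rank,0.5) zhongwei from table1` |
| array_contains | Checks whether an array contains an element; combined with collect_set it can pick the highest-priority value within each group | `select userid, case when array_contains(collect_set(city),'tier-1 city') then 'tier-1 city' when array_contains(collect_set(city),'tier-2 city') then 'tier-2 city' else 'other' end as city_level from table1 group by userid` |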
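The join, `case when` and null-handling rows above can be tried together. The following is a minimal sketch on Python's `sqlite3`, used as a stand-in for Hive; the rows in `table1` and `table2` are hypothetical. It uses `CASE WHEN` and `COALESCE` because both are standard SQL and behave the same way in Hive. It computes a per-city numerator, denominator and conversion ratio with a left join.

```python
import sqlite3

# Hypothetical data: table1 holds all users (the denominator),
# table2 holds the users who converted (the numerator).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (userid INTEGER, city TEXT);
    CREATE TABLE table2 (userid INTEGER);
    INSERT INTO table1 VALUES (1, 'Beijing'), (2, 'Beijing'), (3, 'Shanghai'), (4, NULL);
    INSERT INTO table2 VALUES (1), (3);
""")

# LEFT JOIN keeps every user; b.userid is NULL when there is no match,
# and COUNT skips NULLs, so COUNT(DISTINCT b.userid) is the numerator.
funnel = conn.execute("""
    SELECT a.city,
           COUNT(DISTINCT a.userid) AS cnt_fenmu,
           COUNT(DISTINCT b.userid) AS cnt_fenzi,
           ROUND(COUNT(DISTINCT b.userid) * 1.0 / COUNT(DISTINCT a.userid), 2) AS ratio
    FROM table1 a
    LEFT JOIN table2 b ON a.userid = b.userid
    GROUP BY a.city
    ORDER BY a.city
""").fetchall()

# CASE WHEN assigns labels; COALESCE supplies a default when city is NULL.
labels = conn.execute("""
    SELECT userid,
           COALESCE(city, 'other') AS city_new,
           CASE WHEN city = 'Beijing'  THEN 'Beijing drifter'
                WHEN city = 'Shanghai' THEN 'Shanghai drifter'
                ELSE 'other drifter' END AS piao_type
    FROM table1
    ORDER BY userid
""").fetchall()

for row in funnel:
    print(row)
for row in labels:
    print(row)
```

The user whose city is NULL forms a separate NULL group in `funnel`. This is one reason to apply `nvl`/`coalesce` to a grouping column before aggregating.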

2. Advanced

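| Statement | Meaning | Example |
| --- | --- | --- |
| row_number() over (partition by ... order by ...) | Ranking: numbers rows 1, 2, 3, ... within each partition | `select city_level,city,row_number() over(partition by city_level order by usernum desc) usernum_rk from table1` |
| lag(col, n) over (partition by ... order by ...) | Value from the row n rows before the current row | `select city,lag(date,1) over(partition by city_level order by date) lag_1 from table1` |
| lead(col, n) over (partition by ... order by ...) | Value from the row n rows after the current row | `select city,lead(date,1) over(partition by city_level order by date) lead_1 from table1` |
| regexp_replace(column, pattern, replacement) | Replace text matching a regular expression | `select regexp_replace(channel,'新东方-东方甄选-董宇辉` |
| collect_set (removes duplicates) | Collapse rows into one value (rows to column) | `select concat_ws(',',collect_set(city)) city_set from table1` |
| collect_list (keeps duplicates) | Collapse rows into one value (rows to column) | `select concat_ws(',',collect_list(city)) city_list from table1` |
| explode | Expand an array column into multiple rows (column to rows); city_list must be an array | `select city_explode from table1 lateral view explode(city_list) city_tmp as city_explode` |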
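The window functions above can be checked on a small dataset. The following is a minimal sketch on Python's `sqlite3`, which supports `ROW_NUMBER`, `LAG` and `LEAD` from SQLite 3.25 onward. The rows are hypothetical, and the column `dt` stands in for the `date` column used in the table. SQLite has no `collect_list`, so `group_concat(city, ',')` approximates `concat_ws(',', collect_list(city))`. Likewise, `group_concat(DISTINCT city)` would approximate the `collect_set` version. `explode` is not covered here.

```python
import sqlite3

# Hypothetical rows; dt stands in for the date column. Needs SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (city_level TEXT, city TEXT, usernum INTEGER, dt TEXT);
    INSERT INTO table1 VALUES
        ('tier-1', 'Beijing',  300, '2022-10-01'),
        ('tier-1', 'Shanghai', 500, '2022-10-02'),
        ('tier-2', 'Hangzhou', 200, '2022-10-01'),
        ('tier-2', 'Chengdu',  250, '2022-10-03');
""")

# Rank by usernum within each city_level, and look one row back/forward by dt.
windowed = conn.execute("""
    SELECT city_level, city,
           ROW_NUMBER() OVER (PARTITION BY city_level ORDER BY usernum DESC) AS usernum_rk,
           LAG(dt, 1)   OVER (PARTITION BY city_level ORDER BY dt)           AS lag_1,
           LEAD(dt, 1)  OVER (PARTITION BY city_level ORDER BY dt)           AS lead_1
    FROM table1
    ORDER BY city_level, dt
""").fetchall()

# group_concat(col, ',') approximates concat_ws(',', collect_list(col));
# the order of elements inside each string is not guaranteed.
collapsed = dict(conn.execute("""
    SELECT city_level, group_concat(city, ',')
    FROM table1
    GROUP BY city_level
""").fetchall())

for row in windowed:
    print(row)
print(collapsed)
```

Here `LAG` returns NULL (`None` in Python) for the first row of each partition, and `LEAD` returns NULL for the last.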
