Data Analysis Written Tests and Interviews: SQL

Some SQL questions I ran into along my data analysis journey

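1. A written-test SQL question from Xiaohongshu:
Question:
There are 100 shops, and the purchase table stores their sales records. For May and for June separately, list the names of the shops that make up the top 50% of that month's total GMV.
Fields of the purchase table: id, dt, seller_id, seller_name, item_id, gmv
Interpretation 1: select the shops whose monthly GMV ranks in the top 50% in May and in June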
select concat('2019M',mon) as month_label,seller_name
from
(select mon,seller_name,
ntile(2) over(partition by mon order by gmv desc) r
from
(select month(dt) mon,seller_name,sum(gmv) gmv
from purchase
where month(dt) in (5,6)
group by month(dt),seller_name) t) u
where u.r = 1;
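To sanity-check interpretation 1, here is a minimal sketch that runs the same ntile(2) logic through Python's built-in sqlite3 module. The sample rows are made up, SQLite's strftime('%m', dt) stands in for month(dt), and a seller_name tie-breaker is added to the window ordering so the result is deterministic. It assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Hypothetical sample data (dt, seller_name, gmv); not from the original question.
rows = [
    ("2019-05-01", "A", 100), ("2019-05-02", "B", 60),
    ("2019-05-03", "C", 30), ("2019-05-04", "D", 10),
    ("2019-06-01", "A", 20), ("2019-06-02", "B", 80),
    ("2019-06-03", "C", 50), ("2019-06-04", "D", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (dt TEXT, seller_name TEXT, gmv REAL)")
conn.executemany("INSERT INTO purchase VALUES (?, ?, ?)", rows)

# ntile(2) splits each month's shops into two equal-sized buckets by GMV rank;
# bucket 1 is the top half by shop count, regardless of how much GMV it holds.
query = """
SELECT mon, seller_name
FROM (
    SELECT mon, seller_name,
           ntile(2) OVER (PARTITION BY mon
                          ORDER BY month_gmv DESC, seller_name) AS r
    FROM (
        SELECT CAST(strftime('%m', dt) AS INTEGER) AS mon,
               seller_name,
               SUM(gmv) AS month_gmv
        FROM purchase
        WHERE CAST(strftime('%m', dt) AS INTEGER) IN (5, 6)
        GROUP BY mon, seller_name
    ) t
) u
WHERE r = 1
ORDER BY mon, seller_name
"""
top_half = conn.execute(query).fetchall()
print(top_half)
```

With these rows, shops C and D tie in June, so which of them lands in bucket 1 depends entirely on the tie-breaker. That is one reason to state the ordering explicitly.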

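Interpretation 2: select the shops that cumulatively account for the first 50% of each month's total GMV. The running total is taken in descending order of monthly GMV, since any other order would mean little: if we care about which shops contribute most to total GMV, those shops should also rank near the top by their own GMV.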
select concat('2019M',mon) as month_label,seller_name
from
(select mon,seller_name,
sum(gmv) over(partition by mon) sum1,
sum(gmv) over(partition by mon order by gmv desc) sum2
from
(select month(dt) mon,seller_name,sum(gmv) gmv
from purchase
where month(dt) in (5,6)
group by month(dt),seller_name) t) u
where u.sum2 <= 0.5 * u.sum1;
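The cumulative-share reading can be tried out the same way. The sketch below uses Python's sqlite3 with made-up rows and strftime('%m', dt) in place of month(dt); it assumes SQLite 3.25 or later for window functions. The comparison is written as running <= 0.5 * total to avoid any integer-division surprises.

```python
import sqlite3

# Hypothetical sample data (dt, seller_name, gmv); not from the original question.
rows = [
    ("2019-05-01", "A", 100), ("2019-05-02", "B", 60),
    ("2019-05-03", "C", 30), ("2019-05-04", "D", 10),
    ("2019-06-01", "A", 20), ("2019-06-02", "B", 80),
    ("2019-06-03", "C", 50), ("2019-06-04", "D", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (dt TEXT, seller_name TEXT, gmv REAL)")
conn.executemany("INSERT INTO purchase VALUES (?, ?, ?)", rows)

# total   = the month's GMV over all shops
# running = GMV accumulated from the largest shop downwards
#           (default window frame: tied shops share the same running value)
query = """
SELECT mon, seller_name
FROM (
    SELECT mon, seller_name,
           SUM(month_gmv) OVER (PARTITION BY mon) AS total,
           SUM(month_gmv) OVER (PARTITION BY mon
                                ORDER BY month_gmv DESC) AS running
    FROM (
        SELECT CAST(strftime('%m', dt) AS INTEGER) AS mon,
               seller_name,
               SUM(gmv) AS month_gmv
        FROM purchase
        WHERE CAST(strftime('%m', dt) AS INTEGER) IN (5, 6)
        GROUP BY mon, seller_name
    ) t
) u
WHERE running <= 0.5 * total
ORDER BY mon, seller_name
"""
cumulative_half = conn.execute(query).fetchall()
print(cumulative_half)
```

Note that the <= 0.5 cut-off drops the shop that crosses the 50% line. If the largest shop alone holds more than half of a month's GMV, that month returns no rows, so the boundary rule is worth confirming with the interviewer.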

2. Randomly select 50 percent of the rows (SQL Server syntax)
select top 50 percent * from table2
order by newid();

Take the top 50 percent within each group
select *
from
(select CONVERT(varchar(10),year,120) as [month],amt,
count(1) over(partition by CONVERT(varchar(10),year,120)) ct,
row_number() over(partition by CONVERT(varchar(10),year,120) order by amt desc) rn
from table3) a
where rn <= round(ct*0.5,0,0)
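The row_number plus count pattern above carries over to other engines. Here is a small sketch in Python's sqlite3, assuming a simplified table3 with a hypothetical grp column as the grouping key and made-up amounts; SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical version of table3: a group key and an amount.
conn.execute("CREATE TABLE table3 (grp TEXT, amt INTEGER)")
conn.executemany(
    "INSERT INTO table3 VALUES (?, ?)",
    [("x", 5), ("x", 4), ("x", 3), ("x", 2), ("x", 1), ("y", 10), ("y", 20)],
)

# ct = rows in the group, rn = rank of the row inside the group by amt;
# keep a row when its rank falls within the rounded half of the group size.
query = """
SELECT grp, amt
FROM (
    SELECT grp, amt,
           COUNT(1) OVER (PARTITION BY grp) AS ct,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY amt DESC) AS rn
    FROM table3
) a
WHERE rn <= ROUND(ct * 0.5)
ORDER BY grp, rn
"""
top_half_per_group = conn.execute(query).fetchall()
print(top_half_per_group)
```

Group x has five rows, so half of it is 2.5 and the rounding decides whether two or three rows survive; swap ROUND for a floor or ceiling if the requirement says otherwise.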

3. Three consecutive days all greater than 1.1
select a.*
from table3 a
join table3 b
on a.year = b.year-1
join table3 c
on b.year = c.year-1
where a.amt>1.1 and b.amt>1.1 and c.amt>1.1;
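The double self-join can be checked on a few rows. The sketch below uses Python's sqlite3 with a hypothetical day column (stored as text) in place of the year column and made-up amounts; date(day, '+1 day') plays the role of year-1 in the join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical daily series; the column is called day here instead of year.
conn.execute("CREATE TABLE table3 (day TEXT, amt REAL)")
conn.executemany(
    "INSERT INTO table3 VALUES (?, ?)",
    [
        ("2020-01-01", 1.2), ("2020-01-02", 1.3), ("2020-01-03", 1.5),
        ("2020-01-04", 1.0), ("2020-01-05", 1.4), ("2020-01-06", 1.2),
    ],
)

# a is day 1, b is the following day, c is the day after that;
# a row of a survives only when all three days exceed 1.1.
query = """
SELECT a.day, a.amt
FROM table3 a
JOIN table3 b ON b.day = date(a.day, '+1 day')
JOIN table3 c ON c.day = date(b.day, '+1 day')
WHERE a.amt > 1.1 AND b.amt > 1.1 AND c.amt > 1.1
ORDER BY a.day
"""
run_starts = conn.execute(query).fetchall()
print(run_starts)
```

The query returns only the first day of each qualifying three-day run, which mirrors select a.* in the original; the other two days would need to be selected from b and c as well.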

Tencent first-round interview SQL question:
CREATE TABLE employee_tbl (
id int(11) NOT NULL AUTO_INCREMENT COMMENT 'auto-increment ID',
name char(10) NOT NULL DEFAULT '',
date datetime NOT NULL,
singin tinyint(4) NOT NULL DEFAULT '0' COMMENT 'number of logins',
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO employee_tbl VALUES
('1', '小明', '2016-04-22', '1'),
('2', '小王', '2016-04-20', '3'),
('3', '小丽', '2016-04-19', '2'),
('4', '小王', '2016-04-07', '4'),
('5', '小明', '2016-04-11', '4'),
('6', '小明', '2016-04-04', '2');

mysql> SELECT * FROM employee_tbl;
+----+--------+------------+--------+
| id | name   | date       | singin |
+----+--------+------------+--------+
|  1 | 小明   | 2016-04-22 |      1 |
|  2 | 小王   | 2016-04-20 |      3 |
|  3 | 小丽   | 2016-04-19 |      2 |
|  4 | 小王   | 2016-04-07 |      4 |
|  5 | 小明   | 2016-04-11 |      4 |
|  6 | 小明   | 2016-04-04 |      2 |
+----+--------+------------+--------+

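Question 1: Count how many records each person has
Question 2: Compute each person's total number of logins, along with the total across everyone
Question 3: Find the people whose total number of logins is greater than 5
Question 4: Find the person with the most total logins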

Question 1.
select name,count(1) as num
from employee_tbl
group by name;
Question 2.
select t.*,u.total_num
from
(select name,sum(singin) num
from employee_tbl
group by name) t,
(select sum(singin) total_num
from employee_tbl) u;
Question 3.
select name,sum(singin) num
from employee_tbl
group by name
having sum(singin) > 5;
Question 4.
select name
from
(select name,sum(singin) total_num
from employee_tbl
group by name) t
where total_num = (select max(total_num)
from (select name,sum(singin) total_num
from employee_tbl
group by name) u);
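The four answers can be run against the sample rows. The sketch below loads them into an in-memory SQLite database through Python's sqlite3; the column types are simplified for SQLite, and the Python variable names are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite version of employee_tbl (types simplified); rows match the sample data.
conn.execute(
    "CREATE TABLE employee_tbl "
    "(id INTEGER PRIMARY KEY, name TEXT, date TEXT, singin INTEGER)"
)
conn.executemany(
    "INSERT INTO employee_tbl VALUES (?, ?, ?, ?)",
    [
        (1, "小明", "2016-04-22", 1), (2, "小王", "2016-04-20", 3),
        (3, "小丽", "2016-04-19", 2), (4, "小王", "2016-04-07", 4),
        (5, "小明", "2016-04-11", 4), (6, "小明", "2016-04-04", 2),
    ],
)

# Question 1: number of records per person.
record_counts = dict(
    conn.execute("SELECT name, COUNT(1) FROM employee_tbl GROUP BY name")
)

# Question 2: per-person total next to the overall total (cross join of two aggregates).
totals = conn.execute("""
SELECT t.name, t.num, u.total_num
FROM (SELECT name, SUM(singin) AS num FROM employee_tbl GROUP BY name) t,
     (SELECT SUM(singin) AS total_num FROM employee_tbl) u
""").fetchall()

# Question 3: people whose total is greater than 5.
over_five = {row[0] for row in conn.execute("""
SELECT name, SUM(singin) AS num
FROM employee_tbl
GROUP BY name
HAVING SUM(singin) > 5
""")}

# Question 4: everyone whose total equals the maximum total.
top_people = {row[0] for row in conn.execute("""
SELECT name
FROM (SELECT name, SUM(singin) AS total_num FROM employee_tbl GROUP BY name) t
WHERE total_num = (SELECT MAX(total_num)
                   FROM (SELECT SUM(singin) AS total_num
                         FROM employee_tbl GROUP BY name) u)
""")}

print(record_counts, totals, over_five, top_people)
```

On this data 小明 and 小王 both total 7 logins, so Question 4 returns two names. Comparing against the maximum keeps both, whereas ordering by the total and taking one row would silently drop one of them.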

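Kugou SQL
1. Ways to deduplicate
a. distinct
With multiple columns, it removes rows that are duplicated across all of those columns (handy when two records are completely identical)
b. group by
With multiple columns, it likewise removes rows that are duplicated across all of those columns
c. When the table has a primary key, distinct cannot filter out the duplicates, because every row differs in its key; instead use
select * from table where id in (select max(id) from table group by [columns to deduplicate on])
d. When there is no primary key, create a temporary table with an added key and then proceed as in c (SQL Server syntax)
select identity(int,1,1) as id,* into newtable from table
select * from newtable where id in (select max(id) from newtable group by [columns to deduplicate on])
drop table newtable
e. row_number
Generate a row number within each group of duplicates and keep only one row
Note: row_number() never gives equal values the same number; the numbers keep increasing
rank() gives equal values the same number and leaves gaps, e.g. 1 1 3
dense_rank() gives equal values the same number without gaps, e.g. 1 1 2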
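The difference between the three ranking functions, and the row_number style of deduplication, can be seen on a tiny example. This is a sketch in Python's sqlite3 with hypothetical scores and users tables; it assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables, used only for illustration.
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("a", 90), ("b", 90), ("c", 80)])

# Same ordering column, three numbering schemes side by side.
ranking = conn.execute("""
SELECT name, score,
       ROW_NUMBER() OVER (ORDER BY score DESC, name) AS rn,
       RANK()       OVER (ORDER BY score DESC)       AS rk,
       DENSE_RANK() OVER (ORDER BY score DESC)       AS drk
FROM scores
ORDER BY rn
""").fetchall()

conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "x@e.com"), (2, "x@e.com"), (3, "y@e.com")])

# Deduplicate on email: number the rows inside each email group,
# newest id first, and keep only the row numbered 1.
deduped = conn.execute("""
SELECT id, email
FROM (
    SELECT id, email,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id DESC) AS rn
    FROM users
) t
WHERE rn = 1
ORDER BY id
""").fetchall()

print(ranking)
print(deduped)
```

Because row_number() always produces distinct numbers inside a partition, filtering on rn = 1 keeps exactly one row per key, which rank() or dense_rank() would not guarantee when the ordering column has ties.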

Pinduoduo (PDD) first-round interview SQL
Table: tracking_log
Fields: log_id, log_time (second precision), user_id, opr_type (a/b/c/d/e)
1) Number of users per day and per opr_type
select CONVERT(varchar(10),log_time,120) dt,opr_type,count(distinct user_id) usr_num
from tracking_log
group by CONVERT(varchar(10),log_time,120),opr_type;
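Since the question asks for a number of users rather than a number of log rows, it is worth comparing a plain row count with a distinct user count. The sketch below does that in Python's sqlite3 on made-up log rows, with date(log_time) standing in for the CONVERT(varchar(10), ...) expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracking_log "
    "(log_id INTEGER PRIMARY KEY, log_time TEXT, user_id INTEGER, opr_type TEXT)"
)
# Hypothetical rows: user 1 performs 'a' twice on the same day.
conn.executemany(
    "INSERT INTO tracking_log VALUES (?, ?, ?, ?)",
    [
        (1, "2019-06-01 08:00:00", 1, "a"),
        (2, "2019-06-01 09:00:00", 1, "a"),
        (3, "2019-06-01 10:00:00", 2, "a"),
        (4, "2019-06-02 08:00:00", 1, "b"),
    ],
)

# row_num counts log lines; usr_num counts each user once per day and type.
per_day_type = conn.execute("""
SELECT date(log_time) AS dt,
       opr_type,
       COUNT(1) AS row_num,
       COUNT(DISTINCT user_id) AS usr_num
FROM tracking_log
GROUP BY date(log_time), opr_type
ORDER BY dt, opr_type
""").fetchall()
print(per_day_type)
```

On 2019-06-01 the two figures differ (3 rows, 2 users), so the distinct count is the one that answers "how many users".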
2) Number of users who follow the a→b pattern
Method 1:
select count(distinct t.user_id) a_b_num
from
(select log_id,log_time,user_id,opr_type,row_number() over(partition by user_id order by log_time) rn
from tracking_log) t
join
(select log_id,log_time,user_id,opr_type,row_number() over(partition by user_id order by log_time) rn
from tracking_log) b
on t.user_id = b.user_id and t.rn = b.rn-1
where t.opr_type = 'a' and b.opr_type = 'b'
Method 2: concatenate each user's operations into one string, in time order, and use like to pick out 'ab'
select count(1) a_b_num
from
(select user_id,
(select '' + opr_type from tracking_log where user_id = a.user_id order by log_time for xml path('')) opr_type
from tracking_log a group by user_id) u
where replace(opr_type,' ','') like '%ab%'
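A third way to find the a→b pattern, not in the original notes, is the LAG window function, which exposes each row's previous operation without a self-join. The sketch below uses Python's sqlite3 on made-up log rows and assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracking_log "
    "(log_id INTEGER PRIMARY KEY, log_time TEXT, user_id INTEGER, opr_type TEXT)"
)

# Hypothetical operation sequences per user, in time order.
sequences = {1: "abc", 2: "acb", 3: "babab"}
log_rows = []
log_id = 0
for user_id, ops in sequences.items():
    for second, op in enumerate(ops):
        log_id += 1
        log_rows.append((log_id, f"2019-06-01 08:00:{second:02d}", user_id, op))
conn.executemany("INSERT INTO tracking_log VALUES (?, ?, ?, ?)", log_rows)

# prev_type is the same user's immediately preceding operation (NULL for the first).
# A row with prev_type = 'a' and opr_type = 'b' is one a→b transition.
query = """
SELECT COUNT(*) AS transitions,
       COUNT(DISTINCT user_id) AS a_b_users
FROM (
    SELECT user_id, opr_type,
           LAG(opr_type) OVER (PARTITION BY user_id ORDER BY log_time) AS prev_type
    FROM tracking_log
) t
WHERE prev_type = 'a' AND opr_type = 'b'
"""
transitions, a_b_users = conn.execute(query).fetchone()
print(transitions, a_b_users)
```

User 3 makes the transition twice, so counting rows gives 3 while counting distinct users gives 2; only the latter is a number of users.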

Pinduoduo (PDD) second-round interview SQL
[Image 1: Data Analysis Written Tests and Interviews: SQL]
1.
select a.date,b.cate,count(1),sum(gmv),avg(gmv)
from mall_gmv_1d a
left join mall_cate_1d b
on a.date = b.date and a.mall_id = b.mall_id
where gmv > 0
group by a.date,b.cate

2.
select cate,mall_id,total_gmv
from
(select cate,mall_id,total_gmv,ntile(10) over(partition by cate order by total_gmv desc) rn
from
(select cate,a.mall_id,sum(gmv) total_gmv
from mall_gmv_1d a
left join mall_cate_1d b
on a.date = b.date and a.mall_id = b.mall_id
where a.date = '2019-06-18'
group by cate,a.mall_id) t) u
where rn = 1

Some explanations of IV and WOE
https://blog.csdn.net/kevin7658/article/details/50780391

GBDT: algorithm principles and a worked example
https://blog.csdn.net/zpalyq110/article/details/79527653

Evaluation metrics for classification models
https://www.cnblogs.com/kamekin/p/9788730.html
