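Some takeaways and caveats from using Hive SQL during an internship at an internet company
SQL: computing next-day, 3-day, and 7-day retention rates
Questions:
Determine whether a user is a next-day retained user
Compute the daily next-day retention rate
For the answers, follow the link above
Source table: user_login_table
Columns: user, login date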
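A minimal sketch of one way to approach both questions, wrapped in Python's built-in sqlite3 so it runs on its own (it is not the linked answer). The column names user_id and login_date and the sample rows are assumptions, since the notes only say the table holds a user and a login date. In Hive the join condition would be written with datediff(b.login_date, a.login_date) = 1 instead of SQLite's date(..., '+1 day').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per user per login day, dates stored as ISO strings.
conn.execute("CREATE TABLE user_login_table (user_id TEXT, login_date TEXT)")
conn.executemany(
    "INSERT INTO user_login_table VALUES (?, ?)",
    [
        ("u1", "2024-01-31"), ("u1", "2024-02-01"),
        ("u2", "2024-01-31"),
        ("u3", "2024-02-01"), ("u3", "2024-02-02"),
        ("u4", "2024-02-01"),
    ],
)

# The LEFT JOIN answers question 1: b.user_id is non-NULL exactly when the user
# logged in again on the following day. Grouping by day answers question 2.
NEXT_DAY_RETENTION_SQL = """
SELECT a.login_date,
       COUNT(DISTINCT a.user_id) AS active_users,
       COUNT(DISTINCT b.user_id) AS retained_users,
       ROUND(COUNT(DISTINCT b.user_id) * 1.0 / COUNT(DISTINCT a.user_id), 3) AS next_day_retention
FROM user_login_table a
LEFT JOIN user_login_table b
       ON a.user_id = b.user_id
      AND b.login_date = date(a.login_date, '+1 day')
GROUP BY a.login_date
ORDER BY a.login_date
"""

retention_rows = conn.execute(NEXT_DAY_RETENTION_SQL).fetchall()
for row in retention_rows:
    print(row)
```

For 3-day or 7-day retention, the same query should work with the offset changed to '+3 day' or '+7 day'.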
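Given an orders table and a merchants table, compute for each user the average score of the merchants they ordered from in the past 30 days. The follow-up question: how would you split all users into 100 buckets (that is, 100 classes) by the scores of the merchants they ordered from?
SELECT pin, NTILE(100) OVER (ORDER BY avg_score DESC) AS zu  -- zu: bucket number
FROM (SELECT a.pin, AVG(b.score) AS avg_score
      FROM (SELECT pin, merchant_id
            FROM orders
            WHERE dt >= sysdate(-30)) a  -- sysdate(-30) kept as written; stock Hive would use date_sub(current_date, 30)
      JOIN merchants b ON a.merchant_id = b.merchant_id
      GROUP BY a.pin) t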
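To see how the bucketing behaves, here is a small runnable sketch using Python's sqlite3 (assuming SQLite 3.25+ for window functions). The table names orders and merchants, the column names, the fixed "today" of 2024-03-31, and the sample rows are all assumptions for illustration, and it uses 2 buckets instead of 100 so the result is easy to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (pin TEXT, merchant_id INTEGER, dt TEXT);
CREATE TABLE merchants (merchant_id INTEGER, score REAL);
INSERT INTO merchants VALUES (1, 5.0), (2, 4.0), (3, 2.0), (4, 1.0);
INSERT INTO orders VALUES
    ('a', 1, '2024-03-20'), ('a', 2, '2024-03-21'),
    ('b', 2, '2024-03-22'),
    ('c', 3, '2024-03-23'), ('c', 4, '2024-03-24'),
    ('d', 4, '2024-03-25'),
    ('d', 1, '2024-01-01');  -- older than 30 days, so it is filtered out
""")

# Average merchant score per user over the last 30 days, then NTILE to bucket.
BUCKET_SQL = """
SELECT pin, avg_score, NTILE(2) OVER (ORDER BY avg_score DESC) AS zu
FROM (SELECT a.pin, AVG(b.score) AS avg_score
      FROM (SELECT pin, merchant_id
            FROM orders
            WHERE dt >= date(:today, '-30 day')) a
      JOIN merchants b ON a.merchant_id = b.merchant_id
      GROUP BY a.pin) t
ORDER BY avg_score DESC
"""

bucket_rows = conn.execute(BUCKET_SQL, {"today": "2024-03-31"}).fetchall()
for row in bucket_rows:
    print(row)
```

NTILE splits the ordered rows into groups of as equal a size as possible, so here the two highest averages land in bucket 1 and the two lowest in bucket 2.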
Get the emp_no and current salary of the employee with the second-highest current salary
-- Method 1
select s.emp_no, s.salary, e.last_name, e.first_name
from salaries s join employees e
on s.emp_no = e.emp_no
where s.salary =            -- Step 3: use the second-highest salary as the filter
(
    select max(salary)      -- Step 2: the highest salary below the overall maximum (the second-highest)
    from salaries
    where salary <
    (
        select max(salary)  -- Step 1: the highest current salary in the table
        from salaries
        where to_date = '9999-01-01'
    )
    and to_date = '9999-01-01'
)
and s.to_date = '9999-01-01'
-- Method 2
select s.emp_no, s.salary, e.last_name, e.first_name
from salaries s join employees e
on s.emp_no = e.emp_no
where s.salary =
(
    select s1.salary
    from salaries s1 join salaries s2      -- self-join
    on s1.salary <= s2.salary
    where s1.to_date = '9999-01-01'        -- keep only current salaries, filtered before grouping
    and s2.to_date = '9999-01-01'
    group by s1.salary                     -- joined on s1 <= s2 and grouped by s1.salary, each s1 matches several s2 rows
    having count(distinct s2.salary) = 2   -- the number of distinct salaries >= s1.salary is its rank
)
and s.to_date = '9999-01-01'
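Method 1 only suits the second-highest case. For the third or fourth highest it gets bloated, with subqueries nested inside each other like Russian dolls. Method 2 extends to any rank by changing the constant in the HAVING clause.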
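A third option, not one of the two methods above, is the DENSE_RANK window function, which also handles any rank without extra nesting. The sketch below runs it through Python's sqlite3 (assuming SQLite 3.25+); the salaries table is cut down to three columns and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (emp_no INTEGER, salary INTEGER, to_date TEXT)")
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?)",
    [
        (1, 100, "9999-01-01"),
        (2, 90, "9999-01-01"),
        (3, 90, "9999-01-01"),   # ties with emp_no 2 for second place
        (4, 80, "9999-01-01"),
        (5, 200, "2001-01-01"),  # a past salary record, not a current one
    ],
)

# DENSE_RANK gives tied salaries the same rank and leaves no gaps after ties.
NTH_SALARY_SQL = """
SELECT emp_no, salary
FROM (SELECT emp_no, salary,
             DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
      FROM salaries
      WHERE to_date = '9999-01-01') ranked
WHERE rnk = ?
ORDER BY emp_no
"""

def nth_highest(n):
    """Return (emp_no, salary) rows whose current salary is the n-th highest."""
    return conn.execute(NTH_SALARY_SQL, (n,)).fetchall()

print(nth_highest(2))
print(nth_highest(3))
```

Changing the argument is all it takes to ask for the third or fourth highest, and tied employees are all returned.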
<4> Worst possible rank (II)
SELECT grade
FROM (SELECT grade,
(SELECT SUM(number) FROM class_grade) AS total,
SUM(number) OVER(ORDER BY grade) a,
SUM(number) OVER(ORDER BY grade DESC) b
FROM class_grade) t
WHERE a >= total / 2 AND b>= total/2
ORDER BY grade
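This is one way to find the median: compute a running total of number ordered by grade ascending and another ordered by grade descending, then keep the grades where both running totals are at least half of the overall total. See the link above for details.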
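A small worked check of the idea, run through Python's sqlite3 (assuming SQLite 3.25+). The sample counts are made up. With counts A=2, B=2, C=2, D=1 the total is 7, the ascending running totals are 2, 4, 6, 7 and the descending ones (listed A to D) are 7, 5, 3, 1, so only B has both totals at or above 3.5.

```python
import sqlite3

# Same shape as the query above. "/ 2.0" forces real division because SQLite's
# "/" truncates when both operands are integers (MySQL's and Hive's "/" do not).
MEDIAN_GRADE_SQL = """
SELECT grade
FROM (SELECT grade,
             (SELECT SUM(number) FROM class_grade) AS total,
             SUM(number) OVER (ORDER BY grade) AS a,
             SUM(number) OVER (ORDER BY grade DESC) AS b
      FROM class_grade) t
WHERE a >= total / 2.0 AND b >= total / 2.0
ORDER BY grade
"""

def median_grades(rows):
    """Load (grade, number) rows into a fresh database and return the median grade(s)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE class_grade (grade TEXT, number INTEGER)")
    conn.executemany("INSERT INTO class_grade VALUES (?, ?)", rows)
    result = [r[0] for r in conn.execute(MEDIAN_GRADE_SQL)]
    conn.close()
    return result

odd_total = median_grades([("A", 2), ("B", 2), ("C", 2), ("D", 1)])   # 7 students
even_total = median_grades([("A", 2), ("B", 2), ("C", 2), ("D", 2)])  # 8 students
print(odd_total, even_total)
```

With an even total the two middle students can fall in different grades, which is why the second call returns two grades.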
SQL75 Exam scores (IV)
SELECT job,
floor((count(*) + 1) / 2) AS "start",
ceil((count(*) + 1) / 2) AS "end"
FROM grade
GROUP BY job
ORDER BY job
<6> Exam scores (V): median
胖里的日常 (WeChat account): SQL written-test and interview question: how do you compute the median?
SELECT g.id,g.job,g.score,g.rk1 AS t_rank
FROM
(SELECT id,job,score,
COUNT(*) OVER(PARTITION BY job) AS total,
row_number() over(partition by job order by score) AS rk,
row_number() over(partition by job order by score desc ) AS rk1
FROM grade) AS g
WHERE g.rk >= floor((g.total+1)/2) AND g.rk <= ceil((g.total+1)/2)
ORDER BY g.id
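This problem took me a long time, for three reasons:
1 Mixed Chinese (full-width) and English punctuation caused a bug that took ages to find
2 I misremembered the formula and left out the +1 inside the floor above
3 Rows with equal scores can end up with different row positions
Swapping rk and rk1 in the outer SELECT list fixed it
Quick summary: the 胖里的日常 WeChat article above summarizes the median approaches very well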
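To convince myself of the +1, here is a small runnable check in Python's sqlite3 (assuming SQLite 3.25+). The sample rows are made up. Because floor and ceil are not available in every SQLite build, the sketch swaps in integer division: (total + 1) / 2 equals floor((total + 1) / 2) and (total + 2) / 2 equals ceil((total + 1) / 2) for positive integers.

```python
import sqlite3

# Median positions for n rows: one position when n is odd, two when n is even.
positions = {n: ((n + 1) // 2, (n + 2) // 2) for n in range(1, 6)}
print(positions)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grade (id INTEGER, job TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO grade VALUES (?, ?, ?)",
    [
        (1, "C++", 11001), (2, "C++", 10000), (3, "C++", 9000),
        (4, "Java", 12000), (5, "Java", 13000),
        (6, "B", 12000), (7, "B", 11000), (8, "B", 9999),
    ],
)

# rk (ascending position) picks the median rows; rk1 (descending position)
# is what gets reported as the rank.
MEDIAN_ROWS_SQL = """
SELECT g.id, g.job, g.score, g.rk1 AS t_rank
FROM (SELECT id, job, score,
             COUNT(*) OVER (PARTITION BY job) AS total,
             ROW_NUMBER() OVER (PARTITION BY job ORDER BY score) AS rk,
             ROW_NUMBER() OVER (PARTITION BY job ORDER BY score DESC) AS rk1
      FROM grade) AS g
WHERE g.rk >= (g.total + 1) / 2   -- integer division, same as floor((total + 1) / 2)
  AND g.rk <= (g.total + 2) / 2   -- integer division, same as ceil((total + 1) / 2)
ORDER BY g.id
"""

median_rows = conn.execute(MEDIAN_ROWS_SQL).fetchall()
for row in median_rows:
    print(row)
```

Jobs with an odd number of rows return one median row and jobs with an even number return two.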
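<7> SQL70 Nowcoder: each user's most recent login date (V)
SELECT ll.date, IFNULL(ROUND(tem.zi_count / ll.mu_count, 3), 0) AS p
FROM
    (SELECT l.date,
            SUM(IF(l.user_id NOT IN (SELECT DISTINCT user_id FROM login
                                     WHERE date < l.date), 1, 0)) AS mu_count  -- counts new users; I was stuck here for a long time, I'm such a noob
     FROM login l
     GROUP BY l.date) ll  -- denominator: new users on each date
LEFT JOIN
    (SELECT a.date, COUNT(*) AS zi_count
     FROM login a
     JOIN login b ON a.user_id = b.user_id AND DATEDIFF(b.date, a.date) = 1  -- DATEDIFF, because DAY(b.date) - DAY(a.date) breaks across month boundaries
     WHERE a.date = (SELECT MIN(date) FROM login WHERE user_id = a.user_id)  -- count only users whose first login is on a.date
     GROUP BY a.date) tem  -- numerator: new users who log in again the next day
ON tem.date = ll.date
ORDER BY ll.date
This is a next-day retention rate problem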
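One thing worth checking in this kind of join: a condition of the form DAY(b.date) - DAY(a.date) = 1 compares day-of-month numbers, not elapsed days, so it can both miss real next-day logins and match unrelated ones. The sketch below shows the difference using Python's sqlite3, with strftime('%d', ...) standing in for MySQL's DAY() and a julianday difference standing in for DATEDIFF; the dates are made up and the helper name compare is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def compare(earlier, later):
    """Return (day-of-month difference, true difference in days) for two ISO dates."""
    return conn.execute(
        """
        SELECT CAST(strftime('%d', :later) AS INTEGER)
                 - CAST(strftime('%d', :earlier) AS INTEGER),
               CAST(julianday(:later) - julianday(:earlier) AS INTEGER)
        """,
        {"earlier": earlier, "later": later},
    ).fetchone()

same_month = compare("2024-03-10", "2024-03-11")    # both differences agree
month_end = compare("2024-01-31", "2024-02-01")     # a real next-day login
false_match = compare("2024-01-15", "2024-02-16")   # a month apart, not next-day
print(same_month, month_end, false_match)
```

The month-end pair is a genuine next-day login that the day-of-month test would miss, and the last pair is 32 days apart yet would pass it.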
<8> Find the salary growth of current employees since they were hired
# current salary
SELECT s1.emp_no, (s1.salary - e1.salary) AS growth
FROM
    (SELECT emp_no, salary
     FROM salaries
     WHERE to_date = '9999-01-01') AS s1
# inner join the two query results
JOIN
# salary at hire time
    (SELECT s.emp_no, s.salary
     FROM salaries AS s
     JOIN employees AS e
     ON s.emp_no = e.emp_no AND s.from_date = e.hire_date) AS e1
ON s1.emp_no = e1.emp_no
ORDER BY growth
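SQL approach: how to tell whether a user has logged in 7 days in a row
SELECT DISTINCT b.user_id
FROM
    (SELECT user_id,
            COUNT(*) OVER (PARTITION BY user_id, date_rank) AS continue_day  -- length of each consecutive run
     FROM
        (SELECT user_id, login_date,
                login_date - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS date_rank  -- assumes date minus integer is day arithmetic; in Hive use date_sub(login_date, n)
         FROM login_table) a
    ) b
WHERE continue_day >= 7
The login_date - ROW_NUMBER() trick was new to me: consecutive dates minus consecutive row numbers give the same value, so every row in a streak shares one date_rank. I had first planned to do the subtraction outside, wrapped in yet another SELECT ... FROM layer.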
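Here is a runnable sketch of the date-minus-row-number idea using Python's sqlite3 (assuming SQLite 3.25+). SQLite has no date minus integer, so the subtraction is written as date(login_date, '-N day'); the sample users, the dates, and the helper names are made up for illustration, and each user is assumed to have at most one row per day.

```python
import sqlite3
from datetime import date, timedelta

def days_from(start, count):
    """Return `count` consecutive ISO date strings beginning at `start`."""
    return [(start + timedelta(days=i)).isoformat() for i in range(count)]

logins = (
    [("u1", d) for d in days_from(date(2024, 1, 29), 7)]   # 7 in a row, across a month boundary
    + [("u2", d) for d in days_from(date(2024, 1, 1), 3)]  # 3 in a row
    + [("u2", d) for d in days_from(date(2024, 1, 5), 4)]  # then 4 in a row after a gap
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login_table (user_id TEXT, login_date TEXT)")
conn.executemany("INSERT INTO login_table VALUES (?, ?)", logins)

# Within one streak, login_date minus the row number is a constant date,
# so grouping on (user_id, date_rank) isolates each streak.
STREAK_SQL = """
SELECT DISTINCT user_id
FROM (SELECT user_id,
             COUNT(*) OVER (PARTITION BY user_id, date_rank) AS continue_day
      FROM (SELECT user_id, login_date,
                   date(login_date,
                        '-' || ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) || ' day'
                   ) AS date_rank
            FROM login_table) a
     ) b
WHERE continue_day >= ?
ORDER BY user_id
"""

def users_with_streak(min_days):
    """Return users who have at least one run of `min_days` consecutive login days."""
    return [r[0] for r in conn.execute(STREAK_SQL, (min_days,))]

print(users_with_streak(7))
print(users_with_streak(4))
```

u2 has seven login days in total but never more than four in a row, which is exactly the case a plain count per user would get wrong.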
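Practice
Sourced from summaries of data-analysis interview write-ups on Nowcoder. If anything here infringes on your rights, please let me know. Thanks!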