来几道面试遇到的sql笔试题测测你的水平吧 |大数据分析师面试必备

面试遇到的sql笔试题目,共享给大家,如果是数据分析或者数据开发岗位,对sql一般都需要掌握,以下问题,最好能再hive,mysql或者oracle场景下都 实现下,熟悉巩固各个数据库的语法。

1. A 和B 两张表,A中有B表的外键,B表中有A表中的外键,求A表中,B表中没有的数据行,B表中,A表没有的数据行。

select * from a where not exists (select 1 from b where a.主键=b.外键)
select * from b where not exists (select 1 from a where b.主键=a.外键)

当然,还可以用left outer join 来实现。如果是hive的话,没有not exist语法,可以考虑这种方式。

2. 实现前n的逻辑

员工表Employees,里面有字段 EmployeeID,EmployeeName,
销售表Sales 里面有字段EmployeeID,productID,salesamount。
例如:EmployeeID,productID,salesamount
1 1 100
1 2 80
2 1 39
2 2 50
EmployeeID,EmployeeName
1 Jack
2 Mary
问:查出把所有负责的产品销售50以上的销售员工姓名。

select a.EmployeeName from Employees as a, Sales as b
where a.EmployeeID=b.EmployeeID and not exixts
(select 1 from b where EmployeeID=b.EmployeeID and salesmount<50

3. 求同比,环比,累计

首先,理解三者概念:

  • 环比就是今年第n月与第n-1月或第n+1月比;
  • 同比就是今年第n月与去年第n月比。
  • 累计就是累计同累加,指前几个与后几个的结合。

然后,创建测试数据

CREATE TABLE salaryByMonth
(
 employeeNo varchar2(20),
 yearMonth varchar2(6),
 salary number
) ;
insert into SALARYBYMONTH (EMPLOYEENO, YEARMONTH, SALARY)
values (1, '200805', 500);
insert into SALARYBYMONTH (EMPLOYEENO, YEARMONTH, SALARY)
values (1, '200802', 150);
insert into SALARYBYMONTH (EMPLOYEENO, YEARMONTH, SALARY)
values (1, '200803', 200);
insert into SALARYBYMONTH (EMPLOYEENO, YEARMONTH, SALARY)
values (1, '200804', 300);
insert into SALARYBYMONTH (EMPLOYEENO, YEARMONTH, SALARY)
values (1, '200708', 100);
commit;

求同比,环比等

SELECT EMPLOYEENO
      ,YEARMONTH
      ,SALARY
      ,MIN(SALARY) KEEP(DENSE_RANK FIRST ORDER BY YEARMONTH) OVER(PARTITION BY EMPLOYEENO) FIRST_SALARY -- 基比分析 salary/first_salary 
      ,LAG(SALARY, 1, 0) OVER(PARTITION BY EMPLOYEENO ORDER BY YEARMONTH) AS PREV_SAL          -- 环比分析,与上个月份进行比较 
      ,LAG(SALARY, 12, 0) OVER(PARTITION BY EMPLOYEENO ORDER BY YEARMONTH) AS PREV_12_SAL -- 同比分析,与上个年度相同月份进行比较    
      ,SUM(SALARY) OVER(PARTITION BY EMPLOYEENO, SUBSTR(YEARMONTH, 1, 4) ORDER BY YEARMONTH RANGE UNBOUNDED PRECEDING) LJ --累计值
  FROM SALARYBYMONTH
 ORDER BY EMPLOYEENO
         ,YEARMONTH

在网上找到的一个案例,一个不错的实现方式,主要是用到数字虚拟辅助表的方式,
建测试表,假设是个销售数量表:

create table t (dt date,cnt number);

随便插入些测试数据,最后表的内容如下:
DT CNT


2012-02-09 15
2012-02-21 2
2012-03-23 1
2012-03-23 5
2012-04-23 2
2012-05-12 20
2012-07-01 20
2012-07-21 20

显示环比,如果某个月没有销售,就有点儿麻烦。
  先想个办法,显示本年度的1至12月,有个很有技巧的技术来实现:

SQL> set pagesize 20
SQL> with m as (select '2012-'||lpad(rownum,2,'0') v from dual connect by level<=12) select * from m;

得到如下结果:


---------
2012-01
2012-02
2012-03
2012-04
2012-05
2012-06
2012-07
2012-08
2012-09
2012-10
2012-11
2012-12

接下来就容易了,两个表外连,使用lag统计函数就可以了。



下面展示一些 内联代码片

with m as (
select '2012-'||lpad(rownum,2,'0') v 
from dual 
connect by level<=12
)

select 	m.v "月份"
		,nvl(sum(t.cnt),0) "销售数量"
		,lag(nvl(sum(t.cnt),0),1) over (order by m.v) "上月销量数量"
		,nvl(sum(t.cnt),0)-lag(nvl(sum(t.cnt),0),1) over (order by m.v) "环比增加量"
		,round((case when nvl(sum(t.cnt),0)=0 then null else (nvl(sum(t.cnt),0)-lag(nvl(sum(t.cnt),0),1) over (order by m.v))/nvl(sum(t.cnt),0) end)*100,1) "环比增加比例(%)" 
from m 
left outer join t
on m.v=to_char(t.dt,'yyyy-mm') 
group by m.v 
order by 1

月份 销售数量 上月销量数量 环比增加量 环比增加比例(%)


2012-01 0
2012-02 17 0 17 100
2012-03 6 17 -11 -183.3
2012-04 2 6 -4 -200
2012-05 20 2 18 90
2012-06 0 20 -20
2012-07 40 0 40 100
2012-08 0 40 -40
2012-09 0 0 0
2012-10 0 0 0
2012-11 0 0 0
2012-12 0 0 0

4. 统计连续日期的库存量create view test_v as

一样的,创建测试数据

select date '2015-07-29' as 日期,'AAA' as 物料,'S' as 仓库,100 as 库存金额,50 as 库存数量 from dual union all
select date '2015-07-31' as 日期,'BBB' as 物料,'B' as 仓库,70 as 库存金额,20 as 库存数量 from dual union all
select date '2015-08-01' as 日期,'BBB' as 物料,'B' as 仓库,120 as 库存金额,60 as 库存数量 from dual union all
select date '2015-08-02' as 日期,'BBB' as 物料,'B' as 仓库,90 as 库存金额,40 as 库存数量 from dual union all
select date '2015-08-06' as 日期,'AAA' as 物料,'S' as 仓库,40 as 库存金额,10 as 库存数量 from dual union all
select date '2015-08-07' as 日期,'CCC' as 物料,'W' as 仓库,70 as 库存金额,20 as 库存数量 from dual union all
select date '2015-08-10' as 日期,'BBB' as 物料,'S' as 仓库,88 as 库存金额,22 as 库存数量 from dual
;
WITH x0 AS (
	SELECT DATE '2015-07-29' AS d FROM dual),
x1 AS (
	SELECT d+LEVEL-1 AS d1 FROM x0 CONNECT BY LEVEL<31
)
SELECT x.d1,
      CASE WHEN v.物料 IS NULL THEN LAG(v.物料 IGNORE NULLS) OVER(ORDER BY x.d1) ELSE v.物料 END AS 物料,
      case WHEN v.仓库 IS NULL THEN LAG(v.仓库 IGNORE NULLS) OVER(ORDER BY x.d1) ELSE v.仓库 END AS 仓库,
      case WHEN v.库存金额 IS NULL THEN LAG(v.库存金额 IGNORE NULLS) OVER(ORDER BY x.d1) ELSE v.库存金额 END AS 库存金额,
      CASE WHEN v.库存数量 IS NULL THEN LAG(v.库存数量 IGNORE NULLS) OVER(ORDER BY x.d1) ELSE v.库存数量 END AS 库存数量
 FROM x1 x,test_v v
where x.d1 = v.日期(+)
ORDER BY x.d1 ASC;
drop view test_v;
select sysdate from dual;
select trunc(sysdate)+rownum-1 from dual connect by rownum <=to_date('2018-12-31','yyyy-mm-dd')-trunc(sysdate);

5. 统计一个cookie,pv数最多的前1/3的天

SELECT   
cookieid,  
createtime,  
pv,  
NTILE(3) OVER(PARTITION BY cookieid ORDER BY pv DESC) AS rn   
FROM lxw1234;  
   
--rn = 1 的记录,就是我们想要的结果  

取分组内排序后最后一个值

SELECT cookieid,  
createtime,  
url,  
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,  
LAST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime) AS last1,  
FIRST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime DESC) AS last2   
FROM lxw1234   
ORDER BY cookieid,createtime;  

结果显示:

cookieid  createtime            url     rn     last1    last2  
-------------------------------------------------------------  
cookie1 2015-04-10 10:00:00     url1    1       url1    url7  
cookie1 2015-04-10 10:00:02     url2    2       url2    url7  
cookie1 2015-04-10 10:03:04     1url3   3       1url3   url7  
cookie1 2015-04-10 10:10:00     url4    4       url4    url7  
cookie1 2015-04-10 10:50:01     url5    5       url5    url7  
cookie1 2015-04-10 10:50:05     url6    6       url6    url7  
cookie1 2015-04-10 11:00:00     url7    7       url7    url7  
cookie2 2015-04-10 10:00:00     url11   1       url11   url77  
cookie2 2015-04-10 10:00:02     url22   2       url22   url77  
cookie2 2015-04-10 10:03:04     1url33  3       1url33  url77  
cookie2 2015-04-10 10:10:00     url44   4       url44   url77  
cookie2 2015-04-10 10:50:01     url55   5       url55   url77  
cookie2 2015-04-10 10:50:05     url66   6       url66   url77  
cookie2 2015-04-10 11:00:00     url77   7       url77   url77  

select nvl(t.first_name,'total'),sum(t.salary) 
from t_test1 t
group by rollup(t.first_name)

来几道面试遇到的sql笔试题测测你的水平吧 |大数据分析师面试必备_第1张图片

毕业4年,从应届生到BI数据分析师老油条,不定期将过去自己求职积累经验和数据分析学习相关的一些笔记分享给大家,对互联网数据分析、机器学习有兴趣的朋友也可以关注我的工重号:python数据分析和机器学习,专注BI、数据分析和机器学习的学习和实践 .

在这里插入图片描述

你可能感兴趣的:(SQL,数据仓库,数据分析与机器学习)