SQL | Alibaba Tianchi SQL Comprehensive Exercises: 10 Classic Problems

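Problem 1

Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=1074

Using the A-share listed company quarterly revenue forecast dataset (Income Statement.xls, Company Operating.xlsx and Market Data.xlsx), take Market Data as the primary table and merge the rows from all three tables whose TICKER_SYMBOL is 600383 or 600048. Only the following fields need to be displayed.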

Table name: Column name
Income Statement: TICKER_SYMBOL
Income Statement: END_DATE
Income Statement: T_REVENUE
Income Statement: T_COGS
Income Statement: N_INCOME
Market Data: TICKER_SYMBOL
Market Data: END_DATE
Market Data: CLOSE_PRICE
Company Operating: TICKER_SYMBOL
Company Operating: INDIC_NAME_EN
Company Operating: END_DATE
Company Operating: VALUE
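
select i.TICKER_SYMBOL i_TICKER_SYMBOL
			,i.END_DATE i_END_DATE
			,i.T_REVENUE
			,i.T_COGS
			,i.N_INCOME
			,m.TICKER_SYMBOL m_TICKER_SYMBOL
			,m.END_DATE m_END_DATE
			,m.CLOSE_PRICE
			,c.TICKER_SYMBOL c_TICKER_SYMBOL
			,c.INDIC_NAME_EN
			,c.END_DATE c_END_DATE
			,c.VALUE
-- Market Data is the primary table, so the other two are left-joined to it.
-- Matching on END_DATE as well as TICKER_SYMBOL avoids pairing every market row with every
-- income/operating row of the same stock (assumes END_DATE is comparable across the three tables).
from `market data` m
left join `income statement` i on m.TICKER_SYMBOL=i.TICKER_SYMBOL and m.END_DATE=i.END_DATE
left join `company operating` c on m.TICKER_SYMBOL=c.TICKER_SYMBOL and m.END_DATE=c.END_DATE
where m.TICKER_SYMBOL in (600383,600048);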

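Problem 2

Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=44

Using winequality-red.csv from the Wine Quality Data dataset, find all red wines with pH = 3.03, then rank them by citric acid using Chinese-style (dense) ranking: tied values share a rank, and there are no gaps between consecutive ranks.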

-- dense_rank() gives tied rows the same rank and leaves no gaps
select *,dense_rank() over(order by `citric acid`) as rk
from `winequality-red`
where pH=3.03;
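
To see how dense ranking differs from ordinary ranking, here is a minimal sketch on toy data. It runs the SQL through Python's standard-library sqlite3 module as a stand-in for MySQL (window functions need SQLite 3.25 or later); the table `wine` and its values are made up for illustration and are not taken from the dataset.

```python
import sqlite3

# In-memory toy table; ids 1 and 2 tie on citric_acid.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wine (id INTEGER, citric_acid REAL)")
conn.executemany(
    "INSERT INTO wine VALUES (?, ?)",
    [(1, 0.10), (2, 0.10), (3, 0.25), (4, 0.40)],
)

rows = conn.execute(
    """
    SELECT id,
           DENSE_RANK() OVER (ORDER BY citric_acid) AS dense_rk,  -- no gaps after ties
           RANK()       OVER (ORDER BY citric_acid) AS rk         -- gaps after ties
    FROM wine
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

After the tie at rank 1, DENSE_RANK continues with 2 while RANK jumps to 3.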

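Problem 3

Data source: https://tianchi.aliyun.com/competition/entrance/231593/information

Using ccf_offline_stage1_test_revised.csv from the Coupon Usage Data for O2O dataset, find the merchant that issued the largest total coupon amount and the merchant that issued the most coupons during July 2016.

Only the reduction amount of "spend x, get y off" coupons is counted here; percentage-discount coupons are ignored.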

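select Merchant_id,sum_rk,cnt_rk
from 
	(select *,rank() over(order by total_amount desc) sum_rk,rank() over(order by cnt desc) cnt_rk
	 from
			(select Merchant_id
					-- the reduction amount is the part after ':' in 'x:y'
					,sum(CONVERT(SUBSTR(Discount_rate,LOCATE(':',Discount_rate)+1,20),signed)) total_amount
					,count(*) cnt
			 from ccf_offline_stage1_test_revised
			 -- keep only 'x:y' (spend x, get y off) coupons
			 where Discount_rate like '%:%'
			 and DATE_FORMAT(Date_received,'%Y%m')=201607
			 group by Merchant_id
			 )t1 
	)t2 
-- sum_rk=1: largest total amount; cnt_rk=1: most coupons issued
where sum_rk =1 or cnt_rk=1;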

Official answer:

-- Merchant that issued the largest total coupon amount
SELECT Merchant_id,
-- SUM(SUBSTRING_INDEX(`Discount_rate`,':', 1)) AS sale_amount,
SUM(SUBSTRING_INDEX(`Discount_rate`,':',-1)) AS discount_amount
FROM ccf_offline_stage1_test_revised
WHERE Date_received BETWEEN '2016-07-01' AND '2016-07-31'
GROUP BY Merchant_id
ORDER BY discount_amount DESC
LIMIT 1;

-- Merchant that issued the most coupons
SELECT Merchant_id,COUNT(1) AS cnt
FROM ccf_offline_stage1_test_revised
WHERE Date_received BETWEEN '2016-07-01' AND '2016-07-31'
GROUP BY Merchant_id
ORDER BY cnt DESC
LIMIT 1;

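Problem 4

Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=1074

Using the INDIC_DATA sheet of Macro&Industry.xlsx from the A-share listed company quarterly revenue forecast dataset, determine in which month of 2015 the monthly value of total electricity consumption for the primary industry peaked, and by what percentage it rose or fell compared with the same month of the previous year.

Start by looking at the data, which is structured as follows:

Column Meaning
indic_id indicator ID
name_cn indicator name
FREQUENCY_CD frequency
PERIOD_DATE date
DATA_VALUE data value

Indicator name in the data: Total Electricity Consumption: Primary Industry

Filter conditions: name_cn = 'Total Electricity Consumption: Primary Industry', year 2015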

-- Peak month of 2015 and its change versus the same month of 2014
select t2.month,CONCAT(round((t2.value-t.DATA_VALUE)/t.DATA_VALUE*100,2),'%') as yoy
from `macro industry` t inner join 
-- year, month and value of the 2015 peak; indic_id is kept for the join
(select indic_id,year(PERIOD_DATE) as year,month(PERIOD_DATE) as month,DATA_VALUE as value 
from	(select *,ROW_NUMBER() OVER(order by DATA_VALUE desc) rn 
		from `macro industry`
		where year(PERIOD_DATE)=2015 and name_cn='Total Electricity Consumption: Primary Industry') t1
where rn=1) t2
-- same indic_id, same month, one year earlier (equivalent to year(t.PERIOD_DATE)=2014)
on t.indic_id=t2.indic_id and year(t.PERIOD_DATE)=t2.year-1 and month(t.PERIOD_DATE)=t2.month;

If the indicator ID is already known to be '2020101522':

select CONCAT(round((t1.max_value-t.DATA_VALUE)/t.DATA_VALUE*100,2),'%')
from `macro industry` t
		inner join 
		-- the 2015 month with the highest value
		(select PERIOD_DATE,max(DATA_VALUE) as max_value
		from `macro industry`
		where year(PERIOD_DATE)=2015 and indic_id='2020101522'
		group by PERIOD_DATE
		order by 2 desc
		limit 1) t1
-- same month, one year earlier
on  year(t.PERIOD_DATE)=year(t1.PERIOD_DATE)-1 and month(t.PERIOD_DATE)=month(t1.PERIOD_DATE)
where t.indic_id='2020101522';
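
The peak-then-compare pattern used above can be checked on toy data. The following is a minimal sketch that runs through Python's standard-library sqlite3 module instead of MySQL, so `strftime` replaces `year()` and `month()`; the table `power_usage` and its four values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE power_usage (period_date TEXT, data_value REAL)")
conn.executemany(
    "INSERT INTO power_usage VALUES (?, ?)",
    [
        ("2014-07-31", 100.0),
        ("2014-08-31", 120.0),
        ("2015-07-31", 110.0),
        ("2015-08-31", 150.0),  # 2015 peak
    ],
)

row = conn.execute(
    """
    WITH peak AS (
        -- the 2015 row with the highest value
        SELECT period_date, data_value
        FROM power_usage
        WHERE strftime('%Y', period_date) = '2015'
        ORDER BY data_value DESC
        LIMIT 1
    )
    SELECT strftime('%m', peak.period_date) AS peak_month,
           ROUND((peak.data_value - prev.data_value) / prev.data_value * 100, 2) AS yoy_pct
    FROM peak
    JOIN power_usage AS prev
      ON strftime('%Y', prev.period_date) = '2014'
     AND strftime('%m', prev.period_date) = strftime('%m', peak.period_date)
    """
).fetchone()

print(row)
```

With these toy numbers the peak falls in August, and 150 against 120 a year earlier is a 25% increase.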

Official answer:

-- In which month of 2015 did electricity consumption peak?
SELECT PERIOD_DATE,
MAX(DATA_VALUE) FianlValue
FROM `macro industry`
WHERE INDIC_ID = '2020101522'
AND YEAR(PERIOD_DATE) = 2015
GROUP BY PERIOD_DATE
ORDER BY FianlValue DESC
LIMIT 1;

-- By what percentage did it rise or fall compared with the same month of the previous year?
SELECT BaseData.*,
(BaseData.FianlValue - YoY.FianlValue) / YoY.FianlValue YoY
FROM (SELECT PERIOD_DATE,
MAX(DATA_VALUE) FianlValue
FROM `macro industry`
WHERE INDIC_ID = '2020101522'
AND YEAR(PERIOD_DATE) = 2015
GROUP BY PERIOD_DATE
ORDER BY FianlValue DESC
LIMIT 1) BaseData
LEFT JOIN -- YOY
(SELECT PERIOD_DATE,
MAX(DATA_VALUE) FianlValue
FROM `macro industry`
WHERE INDIC_ID = '2020101522'
AND YEAR(PERIOD_DATE) = 2014
GROUP BY PERIOD_DATE ) YoY
ON YEAR(BaseData.PERIOD_DATE) = YEAR(YoY.PERIOD_DATE) + 1
AND MONTH(BaseData.PERIOD_DATE) = MONTH(YoY.PERIOD_DATE);

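Problem 5

Data source: https://tianchi.aliyun.com/competition/entrance/231593/information

Using ccf_online_stage1_train.csv from the Coupon Usage Data for O2O dataset, compute the overall online coupon abandonment rate for June 2016, and find the merchant with the highest abandonment rate.

Abandonment rate = number of coupons received but not used / total number of coupons received

Understanding the data:

Column Meaning
User_id user ID
Merchant_id merchant ID
Action 0 = click, 1 = purchase, 2 = receive coupon
Coupon_id coupon ID: null means a purchase without a coupon, in which case Discount_rate and Date_received are meaningless; "fixed" means the transaction was a limited-time low-price promotion.
Discount_rate discount rate: x in [0,1] is a discount ratio; x:y means "spend x, get y off". Amounts are in yuan.
Date_received date the coupon was received
Date purchase date: if Date = null and Coupon_id != null, the coupon was received but not used (a negative sample); if Date != null and Coupon_id = null, it is the date of an ordinary purchase; if Date != null and Coupon_id != null, it is the date of a purchase made with the coupon (a positive sample).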

-- Abandonment rate = coupons received but not used / total coupons received
select sum(case when Date is null and Coupon_id is not null then 1
				else 0 end)
				/
				sum(case when  Coupon_id is not null then 1
				else 0 end) as  discard_rate
from ccf_online_stage1_train
where date_format(Date_received,'%Y%m')=201606;

-- Merchant with the highest abandonment rate
select Merchant_id,
			sum(case when Date is null and Coupon_id is not null then 1
				else 0 end)
				/
				sum(case when  Coupon_id is not null then 1
				else 0 end) as  discard_rate
from ccf_online_stage1_train
where date_format(Date_received,'%Y%m')=201606
group by Merchant_id
order by discard_rate desc
limit 1;

-- A more rigorous version: rank() keeps every merchant tied for first place
select Merchant_id,discard_rate
from
	(select *,	rank() over( order by discard_rate desc ) as ro
	from 
		(select Merchant_id,
			sum(case when Date is null and Coupon_id is not null then 1
				else 0 end)
				/
				sum(case when  Coupon_id is not null then 1
				else 0 end) as  discard_rate
		from ccf_online_stage1_train
		where date_format(Date_received,'%Y%m')=201606
		group by Merchant_id)t)t2
where ro=1;
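
The conditional-aggregation idea behind the abandonment rate can be tried on a handful of rows. This is a minimal sketch using Python's standard-library sqlite3 module as a stand-in for MySQL; the table `coupons`, its column `date_used` (playing the role of `Date`) and the sample rows are assumptions made for illustration. SQLite divides integers as integers, so the numerator is multiplied by 1.0, which MySQL's `/` does not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE coupons ("
    "merchant_id INTEGER, coupon_id TEXT, date_received TEXT, date_used TEXT)"
)
conn.executemany(
    "INSERT INTO coupons VALUES (?, ?, ?, ?)",
    [
        (1, "c1", "2016-06-01", "2016-06-03"),  # used
        (1, "c2", "2016-06-02", None),          # received, never used
        (2, "c3", "2016-06-05", None),          # received, never used
        (2, "c4", "2016-06-06", None),          # received, never used
        (2, "c5", "2016-07-01", "2016-07-02"),  # received in July, filtered out
    ],
)

rates = conn.execute(
    """
    SELECT merchant_id,
           1.0 * SUM(CASE WHEN date_used IS NULL THEN 1 ELSE 0 END) / COUNT(*) AS discard_rate
    FROM coupons
    WHERE coupon_id IS NOT NULL
      AND strftime('%Y%m', date_received) = '201606'
    GROUP BY merchant_id
    ORDER BY discard_rate DESC
    """
).fetchall()

print(rates)
```

Merchant 2 abandons both of its June coupons (rate 1.0), while merchant 1 abandons one of two (rate 0.5).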

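Problem 6

Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=44

Using winequality-white.csv from the Wine Quality Data dataset, find all white wines with pH = 3.63, then rank them by residual sugar using British-style ranking: tied values share a rank, and the ranks that follow a tie are skipped.

-- rank() gives tied rows the same rank and leaves gaps after them
select *,rank() over(order by `residual sugar`) as rk
from `winequality-white`
where pH=3.63;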

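Problem 7

Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=1074

Using the DATA sheet of Market Data.xlsx from the A-share listed company quarterly revenue forecast dataset:

As of the end of 2018, which three industries have the largest market capitalization, and which three companies have the largest market capitalization within each of them? (Find the top three companies in each industry, nine in total.)

First, understand the data:

Column Meaning
SECURITY_ID security ID
TICKER_SYMBOL ticker symbol (stock code)
END_DATE end date
CLOSE_PRICE closing price
TURNOVER_VOL trading volume
TURNOVER_VALUE trading value
MARKET_VALUE market capitalization
TYPE_ID industry type ID
TYPE_NAME_EN industry name (English)
TYPE_NAME_CN industry name (Chinese)

The question asks for values as of the end of 2018, but the data only runs to 2018-05-31, so the snapshot on 2018-05-31 is used.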

-- Top three companies (ticker symbols) in each of the top three industries
select TYPE_NAME_CN,TICKER_SYMBOL
from 	-- join the two result sets, partition by industry, order by company market value
		(select M.TYPE_NAME_CN,M.TICKER_SYMBOL,row_number() over(partition by M.TYPE_NAME_CN order by M.MARKET_VALUE desc) rn
		from `market data` M
		inner join 
				-- the three industries with the largest total market value on 2018-05-31
				(select TYPE_NAME_CN ,sum(MARKET_VALUE)
				from `market data`
				where END_DATE='2018-05-31'
				group by TYPE_NAME_CN
				order by sum(MARKET_VALUE) desc
				limit 3) TypeTop3
		on M.TYPE_NAME_CN=TypeTop3.TYPE_NAME_CN
		where M.TYPE_NAME_CN is not null
		and M.END_DATE='2018-05-31') t
where rn<=3;
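
The query above is a "top N per group" pattern: pick the top groups first, then number the rows inside each group. Here is a minimal sketch of that pattern on toy data, run through Python's standard-library sqlite3 module in place of MySQL (SQLite 3.25 or later for window functions). The table `market`, the industry names and the values are invented, and the sketch keeps the top 2 of the top 2 to stay small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE market (industry TEXT, ticker TEXT, market_value REAL)")
conn.executemany(
    "INSERT INTO market VALUES (?, ?, ?)",
    [
        ("Bank", "B1", 900), ("Bank", "B2", 700), ("Bank", "B3", 100),
        ("Tech", "T1", 800), ("Tech", "T2", 300),
        ("Food", "F1", 50),
    ],
)

top = conn.execute(
    """
    WITH top_industries AS (
        -- industries ordered by total market value
        SELECT industry
        FROM market
        GROUP BY industry
        ORDER BY SUM(market_value) DESC
        LIMIT 2
    ),
    ranked AS (
        -- number companies inside each surviving industry, largest first
        SELECT m.industry, m.ticker,
               ROW_NUMBER() OVER (PARTITION BY m.industry ORDER BY m.market_value DESC) AS rn
        FROM market AS m
        JOIN top_industries AS t ON t.industry = m.industry
    )
    SELECT industry, ticker
    FROM ranked
    WHERE rn <= 2
    ORDER BY industry, rn
    """
).fetchall()

print(top)
```

Food is dropped at the industry step, and B3 is dropped at the company step.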

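Problem 8

Data source: https://tianchi.aliyun.com/competition/entrance/231593/information

Using ccf_online_stage1_train.csv and ccf_offline_stage1_train.csv from the Coupon Usage Data for O2O dataset, find the customer who used the most coupons, online and offline combined, during June 2016.

Understanding the data:

Table ccf_online_stage1_train

Column Meaning
User_id user ID
Merchant_id merchant ID
Action 0 = click, 1 = purchase, 2 = receive coupon
Coupon_id coupon ID: null means a purchase without a coupon, in which case Discount_rate and Date_received are meaningless; "fixed" means the transaction was a limited-time low-price promotion.
Discount_rate discount rate: x in [0,1] is a discount ratio; x:y means "spend x, get y off". Amounts are in yuan.
Date_received date the coupon was received
Date purchase date: if Date = null and Coupon_id != null, the coupon was received but not used (a negative sample); if Date != null and Coupon_id = null, it is the date of an ordinary purchase; if Date != null and Coupon_id != null, it is the date of a purchase made with the coupon (a positive sample).

Table ccf_offline_stage1_train

Column Meaning
User_id user ID
Merchant_id merchant ID
Coupon_id coupon ID: null means a purchase without a coupon, in which case Discount_rate and Date_received are meaningless.
Discount_rate discount rate: x in [0,1] is a discount ratio; x:y means "spend x, get y off". Amounts are in yuan.
Distance the distance from the user's most frequented location to the merchant's nearest store is x*500 meters (for a chain, the nearest store is used), with x in [0,10]; null means no information, 0 means under 500 meters, 10 means over 5 km.
Date_received date the coupon was received
Date purchase date: if Date = null and Coupon_id != null, the coupon was received but not used (a negative sample); if Date != null and Coupon_id = null, it is the date of an ordinary purchase; if Date != null and Coupon_id != null, it is the date of a purchase made with the coupon (a positive sample).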

-- Count each user's coupon uses, online and offline combined
select user_id,count(user_id) as cnt
from
	(-- online coupon uses in June 2016
	select user_id,Date
	from ccf_online_stage1_train
	where date_format(Date,'%Y%m')=201606
	and  Coupon_id is not null
	union all -- must be union all: union would collapse identical (user_id, Date) rows
	-- offline coupon uses in June 2016
	select user_id,Date
	from ccf_offline_stage1_train
	where date_format(Date,'%Y%m')=201606
	and  Coupon_id is not null)t
group by user_id
order by cnt desc
limit 1;
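
Why UNION ALL and not UNION matters here can be shown with a few rows. The sketch below uses Python's standard-library sqlite3 module as a stand-in for MySQL; the tables `online` and `offline`, the column `date_used` and the sample rows are hypothetical. User 1 redeems two coupons online and one offline on the same day, which gives three identical (user_id, date) rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE online (user_id INTEGER, date_used TEXT)")
conn.execute("CREATE TABLE offline (user_id INTEGER, date_used TEXT)")
conn.executemany(
    "INSERT INTO online VALUES (?, ?)",
    [(1, "2016-06-01"), (1, "2016-06-01"), (2, "2016-06-02")],
)
conn.executemany(
    "INSERT INTO offline VALUES (?, ?)",
    [(1, "2016-06-01"), (2, "2016-06-03")],
)

# Same counting query, with the set operator swapped in.
query = """
    SELECT user_id, COUNT(*) AS uses
    FROM (SELECT user_id, date_used FROM online
          {op}
          SELECT user_id, date_used FROM offline) AS t
    GROUP BY user_id
    ORDER BY user_id
"""
with_all = conn.execute(query.format(op="UNION ALL")).fetchall()
with_distinct = conn.execute(query.format(op="UNION")).fetchall()

print(with_all)       # every redemption counted
print(with_distinct)  # identical rows collapsed before counting
```

With UNION, user 1's three redemptions shrink to one, so the "most coupon uses" answer would change.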

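Problem 9

Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=1074

Using the General Business sheet of Income Statement.xls and the EN sheet of Company Operating.xlsx from the A-share listed company quarterly revenue forecast dataset:

Across all years in the dataset, aggregated by quarter, what was the net profit for the quarter in which Baiyun Airport's passenger throughput was highest? (Note: this means the net profit of that single quarter, not the cumulative net profit.)

Indicator name in the data: Baiyun Airport:Passenger throughput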

select t2.amount
from
	-- quarter with the highest Baiyun Airport passenger throughput (it turns out to be a first quarter)
	(select TICKER_SYMBOL,
					YEAR(END_DATE) year,
					QUARTER(END_DATE) Q,
					SUM(VALUE) as tuntu
	from `company operating`
	where INDIC_NAME_EN='Baiyun Airport:Passenger throughput'
	group by TICKER_SYMBOL,year,Q
	order by tuntu desc
	limit 1) t1
inner join	
	-- net profit per company, year and quarter
	(select TICKER_SYMBOL,
					YEAR(END_DATE) year,
					QUARTER(END_DATE) Q,
					SUM(N_INCOME) as amount
	from `income statement`
	group by TICKER_SYMBOL,year,Q) t2
on t1.TICKER_SYMBOL=t2.TICKER_SYMBOL and t1.year=t2.year and t1.Q=t2.Q;

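Problem 10

Data source: https://tianchi.aliyun.com/competition/entrance/231593/information

Using ccf_online_stage1_train.csv and ccf_offline_stage1_train.csv from the Coupon Usage Data for O2O dataset, find the top 3 merchants by total reduction amount of "spend x, get y off" coupons redeemed, online and offline combined, during June 2016.

For example, if at merchant A consumer A used a "spend 200, get 50 off" coupon and consumer B used a "spend 30, get 1 off" coupon, then merchant A's cumulative redeemed reduction is 51 yuan.

In the Discount_rate column, a coupon has the form A:B, meaning "spend A, get B off". To sum the reduction amounts, B first has to be split out of Discount_rate, which is what SUBSTRING_INDEX(s, delimiter, n) is for: with n = -1 it returns everything after the last delimiter.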
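
The split-then-sum step can be tried on the example from the problem statement. This is a minimal sketch run through Python's standard-library sqlite3 module rather than MySQL. SQLite has no SUBSTRING_INDEX, so `substr` plus `instr` is used as a stand-in for taking the part after ':'; the table `coupons` and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coupons (merchant_id INTEGER, discount_rate TEXT)")
conn.executemany(
    "INSERT INTO coupons VALUES (?, ?)",
    [
        (1, "200:50"),  # spend 200, get 50 off
        (1, "30:1"),    # spend 30, get 1 off
        (2, "0.9"),     # percentage discount, no ':' so it is skipped
        (2, "100:10"),
    ],
)

totals = conn.execute(
    """
    SELECT merchant_id,
           SUM(CAST(substr(discount_rate, instr(discount_rate, ':') + 1) AS INTEGER))
               AS discount_amount
    FROM coupons
    WHERE discount_rate LIKE '%:%'   -- keep only 'x:y' coupons
    GROUP BY merchant_id
    ORDER BY discount_amount DESC
    """
).fetchall()

print(totals)
```

Merchant 1 totals 50 + 1 = 51, matching the worked example in the problem statement.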

-- Total reduction amount per merchant
select Merchant_id,SUM(discount) as discount_amount
from 
		(-- online: extract the reduction amount from each redeemed coupon
		select Merchant_id,SUBSTRING_INDEX(Discount_rate,':',-1) as discount
		from ccf_online_stage1_train 
		where date_format(Date,'%Y%m')=201606
		and Date is not null 
		and Coupon_id is not null
		and Discount_rate like '%:%' -- skip percentage-discount coupons, which have no ':'
	union all
		-- offline: extract the reduction amount from each redeemed coupon
		select Merchant_id,SUBSTRING_INDEX(Discount_rate,':',-1) as discount
		from ccf_offline_stage1_train 
		where date_format(Date,'%Y%m')=201606
		and Date is not null 
		and Coupon_id is not null
		and Discount_rate like '%:%')t
group by Merchant_id
order by discount_amount desc
limit 3;
