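Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=1074
Using the files "Income Statement.xls", "Company Operating.xlsx" and "Market Data.xlsx" from the A-share listed company quarterly revenue forecast dataset, take Market Data as the main table and merge the rows from all three tables whose TICKER_SYMBOL is 600383 or 600048. Only the following fields need to be shown.
Table | Field |
---|---|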
Income Statement | TICKER_SYMBOL |
Income Statement | END_DATE |
Income Statement | T_REVENUE |
Income Statement | T_COGS |
Income Statement | N_INCOME |
Market Data | TICKER_SYMBOL |
Market Data | END_DATE |
Market Data | CLOSE_PRICE |
Company Operating | TICKER_SYMBOL |
Company Operating | INDIC_NAME_EN |
Company Operating | END_DATE |
Company Operating | VALUE |
select i.TICKER_SYMBOL i_TICKER_SYMBOL
,i.END_DATE i_END_DATE
,i.T_REVENUE
,i.T_COGS
,i.N_INCOME
,m.TICKER_SYMBOL m_TICKER_SYMBOL
,m.END_DATE m_END_DATE
,m.CLOSE_PRICE
,c.TICKER_SYMBOL c_TICKER_SYMBOL
,c.INDIC_NAME_EN
,c.END_DATE c_END_DATE
,c.VALUE
from `market data` m
left join `income statement` i on m.TICKER_SYMBOL=i.TICKER_SYMBOL
left join `company operating` c on m.TICKER_SYMBOL=c.TICKER_SYMBOL
where m.TICKER_SYMBOL in (600383,600048);
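Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=44
Using "winequality-red.csv" from the Wine Quality Data dataset, find all red wines with pH=3.03, then rank them by citric acid using Chinese-style (dense) ranking: tied values share a rank, and there are no gaps between consecutive ranks.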
select *,dense_rank() over(order by `citric acid`)
from `winequality-red`
where pH=3.03;
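To make the "no gaps" rule concrete, here is a minimal Python sketch of dense ranking. The function name `dense_rank` and the sample readings are illustrative assumptions, not values from the dataset; it ranks in ascending order, like the query above.

```python
def dense_rank(values):
    """Return the dense rank of each value, ranking in ascending order."""
    # Map every distinct value to its 1-based position among the sorted distinct values.
    position = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [position[v] for v in values]


# Hypothetical citric acid readings, not taken from the dataset.
citric_acid = [0.49, 0.10, 0.49, 0.32, 0.10]
ranks = dense_rank(citric_acid)
print(ranks)  # [3, 1, 3, 2, 1]
```

The two rows with 0.10 share rank 1 and the next distinct value gets rank 2, so no rank number is skipped.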
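Data source: https://tianchi.aliyun.com/competition/entrance/231593/information
Using "ccf_offline_stage1_test_revised.csv" from the Coupon Usage Data for O2O dataset, find the merchant that issued the largest total coupon amount and the merchant that issued the most coupons during July 2016.
Only the reduction amount of "spend x, save y" coupons is counted here; percentage-discount coupons are ignored.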
select Merchant_id
from
(select *,rank() over(order by total_amount desc) sum_rk,rank() over(order by cnt desc) cnt_rk
from
(select Merchant_id
-- the part after ':' in 'x:y' is the reduction amount y
,sum(CONVERT(SUBSTR(Discount_rate,LOCATE(':',Discount_rate)+1,20),signed)) total_amount
,count(*) cnt
from ccf_offline_stage1_test_revised
-- keep only 'x:y' coupons; fractional discount rates contain no ':'
where Discount_rate like '%:%'
and DATE_FORMAT(Date_received,'%Y%m')=201607
group by Merchant_id
)t1
)t2
where sum_rk =1 or cnt_rk=1;
Official answer:
-- merchant that issued the largest total coupon amount
SELECT Merchant_id,
-- SUM(SUBSTRING_INDEX(`Discount_rate`,':', 1)) AS sale_amount,
SUM(SUBSTRING_INDEX(`Discount_rate`,':',-1)) AS discount_amount
FROM ccf_offline_stage1_test_revised
WHERE Date_received BETWEEN '2016-07-01' AND '2016-07-31'
GROUP BY Merchant_id
ORDER BY discount_amount DESC
LIMIT 1;
-- merchant that issued the most coupons
SELECT Merchant_id,COUNT(1) AS cnt
FROM ccf_offline_stage1_test_revised
WHERE Date_received BETWEEN '2016-07-01' AND '2016-07-31'
GROUP BY Merchant_id
ORDER BY cnt DESC
LIMIT 1;
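The aggregation logic can be sketched outside the database as well. The rows below are made up for illustration, and the variable names are assumptions of this sketch. It sums the reduction amount per merchant, skips rates that contain no colon, and then picks the two winners separately.

```python
from collections import defaultdict

# Hypothetical (merchant_id, discount_rate) rows; the values are made up.
rows = [
    (101, "150:20"),
    (101, "0.9"),      # percentage discount: ignored
    (102, "30:5"),
    (102, "30:5"),
    (102, "20:1"),
    (103, "200:50"),
]

total_amount = defaultdict(int)
coupon_count = defaultdict(int)
for merchant_id, discount_rate in rows:
    threshold, sep, reduction = discount_rate.partition(":")
    if not sep:
        continue  # no ':' means this is not a "spend x, save y" coupon
    total_amount[merchant_id] += int(reduction)
    coupon_count[merchant_id] += 1

top_by_amount = max(total_amount, key=total_amount.get)
top_by_count = max(coupon_count, key=coupon_count.get)
print(top_by_amount, top_by_count)  # 103 102
```

Note that `max` returns a single key, so, like `LIMIT 1`, this sketch reports only one merchant when several are tied.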
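Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=1074
Using the sheet INDIC_DATA of "Macro&Industry.xlsx" from the A-share listed company quarterly revenue forecast dataset, find the month in 2015 in which the monthly value of total electricity consumption for the primary industry peaked, and compute by what percentage it rose or fell compared with the same month of the previous year.
First, look at the data. Its structure is as follows:
Column | Meaning |
---|---|
indic_id | Indicator id |
name_cn | Indicator name |
FREQUENCY_CD | Frequency |
PERIOD_DATE | Date |
DATA_VALUE | Value |
The indicator is named "Total Electricity Consumption: Primary Industry".
Filter conditions: name_cn = 'Total Electricity Consumption: Primary Industry' and the year is 2015.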
select t2.month,CONCAT(round((t2.value-t.DATA_VALUE)/t.DATA_VALUE*100,2),'%') as yoy_change
from `macro industry` t inner join
-- year, month and value of the 2015 peak; indic_id is kept for the join
(select indic_id,year(PERIOD_DATE) as year,month(PERIOD_DATE) as month,DATA_VALUE as value
from (select *,ROW_NUMBER() OVER(order by DATA_VALUE desc) rn
from `macro industry`
where year(PERIOD_DATE)=2015 and name_cn='Total Electricity Consumption: Primary Industry') t1
where rn=1) t2
-- same indicator, same month, one year earlier (equivalent to year(t.PERIOD_DATE)=2014)
on t.indic_id=t2.indic_id and year(t.PERIOD_DATE)=t2.year-1 and month(t.PERIOD_DATE)=t2.month;
If the indicator id is already known to be '2020101522':
select CONCAT(round((t1.max_value-t.DATA_VALUE)/t.DATA_VALUE*100,2),'%')
from `macro industry` t
inner join
(select PERIOD_DATE,max(DATA_VALUE) as max_value
from `macro industry`
where year(PERIOD_DATE)=2015 and indic_id='2020101522'
group by PERIOD_DATE
order by 2 desc
limit 1) t1
on year(t.PERIOD_DATE)=year(t1.PERIOD_DATE)-1 and month(t.PERIOD_DATE)=month(t1.PERIOD_DATE)
where t.indic_id='2020101522'
Official answer
-- in which month of 2015 did electricity consumption peak?
SELECT PERIOD_DATE,
MAX(DATA_VALUE) FianlValue
FROM `macro industry`
WHERE INDIC_ID = '2020101522'
AND YEAR(PERIOD_DATE) = 2015
GROUP BY PERIOD_DATE
ORDER BY FianlValue DESC
LIMIT 1;
-- and by what percentage did it rise or fall compared with the same month of the previous year?
SELECT BaseData.*,
(BaseData.FianlValue - YoY.FianlValue) / YoY.FianlValue YoY
FROM (SELECT PERIOD_DATE,
MAX(DATA_VALUE) FianlValue
FROM `macro industry`
WHERE INDIC_ID = '2020101522'
AND YEAR(PERIOD_DATE) = 2015
GROUP BY PERIOD_DATE
ORDER BY FianlValue DESC
LIMIT 1) BaseData
LEFT JOIN -- YOY
(SELECT PERIOD_DATE,
MAX(DATA_VALUE) FianlValue
FROM `macro industry`
WHERE INDIC_ID = '2020101522'
AND YEAR(PERIOD_DATE) = 2014
GROUP BY PERIOD_DATE ) YoY
ON YEAR(BaseData.PERIOD_DATE) = YEAR(YoY.PERIOD_DATE) + 1
AND MONTH(BaseData.PERIOD_DATE) = MONTH(YoY.PERIOD_DATE);
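The year-over-year arithmetic behind these queries is (current - previous) / previous * 100. A small Python sketch with made-up monthly values (the dictionary layout and names are assumptions for illustration) finds the 2015 peak month and compares it with the same month of 2014:

```python
# Hypothetical monthly values keyed by (year, month); the numbers are made up.
monthly_value = {
    (2014, 7): 100.0,
    (2014, 8): 120.0,
    (2015, 7): 110.0,
    (2015, 8): 126.0,
}

# Peak month within 2015.
peak_year, peak_month = max(
    (key for key in monthly_value if key[0] == 2015),
    key=monthly_value.get,
)

current = monthly_value[(peak_year, peak_month)]
previous = monthly_value[(peak_year - 1, peak_month)]
yoy_percent = round((current - previous) / previous * 100, 2)
print(peak_month, f"{yoy_percent}%")  # 8 5.0%
```

A positive result means growth over the previous year and a negative one means a decline.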
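Data source: https://tianchi.aliyun.com/competition/entrance/231593/information
Using "ccf_online_stage1_train.csv" from the Coupon Usage Data for O2O dataset, compute the overall online coupon discard rate during June 2016, and find the merchant with the highest coupon discard rate.
Discard rate = number of coupons received but not used / total number of coupons received
Understanding the data:
Column | Meaning |
---|---|
User_id | User ID |
Merchant_id | Merchant ID |
Action | 0 = click, 1 = purchase, 2 = receive coupon |
Coupon_id | Coupon ID: null means a purchase without a coupon, in which case Discount_rate and Date_received are meaningless. "fixed" means the transaction was a limited-time low-price promotion. |
Discount_rate | Discount rate: x \in [0,1] is a discount ratio; x:y means spend x, save y. The unit is yuan. |
Date_received | Date the coupon was received |
Date | Purchase date: if Date=null & Coupon_id != null, the coupon was received but not used (a negative sample); if Date!=null & Coupon_id = null, it is an ordinary purchase date; if Date!=null & Coupon_id != null, it is the date of a purchase made with a coupon (a positive sample). |
-- discard rate = coupons received but not used / total coupons received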
select sum(case when Date is null and Coupon_id is not null then 1
else 0 end)
/
sum(case when Coupon_id is not null then 1
else 0 end) as discard_rate
from ccf_online_stage1_train
where date_format(Date_received,'%Y%m')=201606;
-- merchant with the highest discard rate
select Merchant_id,
sum(case when Date is null and Coupon_id is not null then 1
else 0 end)
/
sum(case when Coupon_id is not null then 1
else 0 end) as discard_rate
from ccf_online_stage1_train
where date_format(Date_received,'%Y%m')=201606
group by Merchant_id
order by discard_rate desc
limit 1;
-- stricter version: return every merchant tied for the highest discard rate
select Merchant_id,discard_rate
from
(select *, rank() over( order by discard_rate desc ) as ro
from
(select Merchant_id,
sum(case when Date is null and Coupon_id is not null then 1
else 0 end)
/
sum(case when Coupon_id is not null then 1
else 0 end) as discard_rate
from ccf_online_stage1_train
where date_format(Date_received,'%Y%m')=201606
group by Merchant_id)t)t2
where ro=1;
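The conditional counting in these queries can be checked on a handful of rows. The sketch below uses made-up records, with `None` standing in for SQL NULL, and computes both the overall discard rate and the per-merchant rates. All names are assumptions of this sketch.

```python
from collections import defaultdict

# Hypothetical (merchant_id, coupon_id, date) rows; None stands for SQL NULL.
rows = [
    (1, "c1", None),        # received, never used
    (1, "c2", "20160610"),  # received and used
    (2, "c3", None),
    (2, "c4", None),
    (2, None, "20160611"),  # ordinary purchase without a coupon: not counted
]

received = defaultdict(int)
discarded = defaultdict(int)
for merchant_id, coupon_id, date in rows:
    if coupon_id is None:
        continue
    received[merchant_id] += 1
    if date is None:
        discarded[merchant_id] += 1

overall_rate = sum(discarded.values()) / sum(received.values())
rate_by_merchant = {m: discarded[m] / received[m] for m in received}
worst_merchant = max(rate_by_merchant, key=rate_by_merchant.get)
print(overall_rate, worst_merchant)  # 0.75 2
```

The overall rate is computed from the pooled counts rather than by averaging the per-merchant rates, which matches the first query above.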
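Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=44
Using "winequality-white.csv" from the Wine Quality Data dataset, find all white wines with pH=3.63, then rank them by residual sugar using British-style ranking (standard competition ranking: tied values share a rank, and the ranks that follow a tie are skipped).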
select *,rank() over(order by `residual sugar`) as rk
from `winequality-white`
where PH=3.63;
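For comparison with dense ranking, competition ranking gives a value the rank "1 + number of strictly smaller values". A minimal Python sketch follows; the function name and the sample readings are illustrative assumptions.

```python
def competition_rank(values):
    """Rank in ascending order; ties share a rank and later ranks are skipped."""
    ordered = sorted(values)
    # list.index returns the first position of a value, so equal values get the same rank.
    return [ordered.index(v) + 1 for v in values]


# Hypothetical residual sugar readings, not taken from the dataset.
residual_sugar = [1.2, 5.0, 1.2, 8.1, 5.0]
sugar_ranks = competition_rank(residual_sugar)
print(sugar_ranks)  # [1, 3, 1, 5, 3]
```

Ranks 2 and 4 never appear because each tie consumes two positions.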
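Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=1074
Using the sheet DATA of "Market Data.xlsx" from the A-share listed company quarterly revenue forecast dataset,
find the three industries with the largest market value as of the end of 2018, and the three companies with the largest market value within each of them (the top three companies per industry, nine in total).
First, understand the data:
Column | Meaning |
---|---|
SECURITY_ID | Security ID |
TICKER_SYMBOL | Ticker symbol |
END_DATE | End date |
CLOSE_PRICE | Closing price |
TURNOVER_VOL | Turnover volume |
TURNOVER_VALUE | Turnover value |
MARKET_VALUE | Market value |
TYPE_ID | Industry type id |
TYPE_NAME_EN | Industry type name (English) |
TYPE_NAME_CN | Industry type name (Chinese) |
The question asks for the end of 2018, but the data only runs to 2018-05-31, so the snapshot on 2018-05-31 is used.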
-- top three companies (ticker symbols) in each of the top three industries
select TYPE_NAME_CN,TICKER_SYMBOL
from -- join the two result sets, partition by industry, order by company market value
(select M.TYPE_NAME_CN,M.TICKER_SYMBOL,row_number() over(partition by M.TYPE_NAME_CN order by M.MARKET_VALUE desc) rn
from `market data` M
inner join
-- the three industries with the largest total market value on 2018-05-31
(select TYPE_NAME_CN ,sum(MARKET_VALUE)
from `market data`
where END_DATE='2018-05-31'
group by TYPE_NAME_CN
order by sum(MARKET_VALUE) desc
limit 3) TypeTop3
on M.TYPE_NAME_CN=TypeTop3.TYPE_NAME_CN
where M.TYPE_NAME_CN is not null
and M.END_DATE='2018-05-31') t
where rn<=3;
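The query is a two-step "top N per group" pattern: first pick the top groups by their totals, then pick the top members inside each kept group. A Python sketch under made-up data shows the same two steps. Industry names, tickers and values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical (industry, ticker, market_value) rows for a single snapshot date.
rows = [
    ("Banking", "B1", 90), ("Banking", "B2", 70),
    ("Banking", "B3", 60), ("Banking", "B4", 10),
    ("Energy", "E1", 80), ("Energy", "E2", 20),
    ("Retail", "R1", 15),
    ("Media", "M1", 5),
]
TOP_N = 3

by_industry = defaultdict(list)
for industry, ticker, value in rows:
    by_industry[industry].append((value, ticker))

# Step 1: industries ordered by total market value, keep the top N.
top_industries = sorted(
    by_industry,
    key=lambda name: sum(v for v, _ in by_industry[name]),
    reverse=True,
)[:TOP_N]

# Step 2: inside each kept industry, keep the N largest companies.
result = {
    name: [ticker for _, ticker in sorted(by_industry[name], reverse=True)[:TOP_N]]
    for name in top_industries
}
print(result)
```

An industry with fewer than three companies simply contributes fewer rows, which is also how `rn<=3` behaves in the SQL.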
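Data source: https://tianchi.aliyun.com/competition/entrance/231593/information
Using "ccf_online_stage1_train.csv" and "ccf_offline_stage1_train.csv" from the Coupon Usage Data for O2O dataset, find the customer who used the most coupons online and offline combined during June 2016.
Understanding the data:
Table ccf_online_stage1_train
Column | Meaning |
---|---|
User_id | User ID |
Merchant_id | Merchant ID |
Action | 0 = click, 1 = purchase, 2 = receive coupon |
Coupon_id | Coupon ID: null means a purchase without a coupon, in which case Discount_rate and Date_received are meaningless. "fixed" means the transaction was a limited-time low-price promotion. |
Discount_rate | Discount rate: x \in [0,1] is a discount ratio; x:y means spend x, save y. The unit is yuan. |
Date_received | Date the coupon was received |
Date | Purchase date: if Date=null & Coupon_id != null, the coupon was received but not used (a negative sample); if Date!=null & Coupon_id = null, it is an ordinary purchase date; if Date!=null & Coupon_id != null, it is the date of a purchase made with a coupon (a positive sample). |
Table ccf_offline_stage1_train
Column | Meaning |
---|---|
User_id | User ID |
Merchant_id | Merchant ID |
Coupon_id | Coupon ID: null means a purchase without a coupon, in which case Discount_rate and Date_received are meaningless. |
Discount_rate | Discount rate: x \in [0,1] is a discount ratio; x:y means spend x, save y. The unit is yuan. |
Distance | The distance from the places the user frequents to the merchant's nearest store is x*500 meters (for a chain, the nearest store is used), x \in [0,10]; null means no information, 0 means under 500 meters, 10 means over 5 km. |
Date_received | Date the coupon was received |
Date | Purchase date: if Date=null & Coupon_id != null, the coupon was received but not used (a negative sample); if Date!=null & Coupon_id = null, it is an ordinary purchase date; if Date!=null & Coupon_id != null, it is the date of a purchase made with a coupon (a positive sample). |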
-- count each user's coupon uses, online and offline combined
select user_id,count(user_id)
from
(-- coupons used online in June 2016
select user_id,Date
from ccf_online_stage1_train
where date_format(Date,'%Y%m')=201606
and Coupon_id is not null
union all -- must be UNION ALL: a plain UNION would drop duplicate rows and undercount
-- coupons used offline in June 2016
select user_id,Date
from ccf_offline_stage1_train
where date_format(Date,'%Y%m')=201606
and Coupon_id is not null)t
group by user_id
order by count(user_id) desc
limit 1;
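Why duplicates must be kept when combining the two sources can be seen in a few lines of Python. The user ids below are made up. Concatenating the lists plays the role of UNION ALL, whereas a set union would collapse repeated uses into one.

```python
from collections import Counter

# Hypothetical user ids, one entry per coupon redemption in the target month.
online_uses = [7, 7, 9]
offline_uses = [7, 9, 9, 9]

# List concatenation keeps duplicates, like UNION ALL.
uses_per_user = Counter(online_uses + offline_uses)
top_user, top_count = uses_per_user.most_common(1)[0]
print(top_user, top_count)  # 9 4

# A set union loses the repeat counts, which is the undercounting risk of plain UNION.
distinct_users = set(online_uses) | set(offline_uses)
print(len(distinct_users))  # 2
```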
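Data source: https://tianchi.aliyun.com/dataset/dataDetail?dataId=1074
Using the sheet General Business of "Income Statement.xls" and the sheet EN of "Company Operating.xlsx" from the A-share listed company quarterly revenue forecast dataset:
Across all years in the dataset, aggregating by quarter, what is the net profit for the quarter in which Baiyun Airport's passenger throughput was highest? (Note: this means the net profit of that single quarter, not the cumulative net profit.)
The relevant indicator is "Baiyun Airport:Passenger throughput".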
select t2.amount
from
-- quarter with the highest Baiyun Airport passenger throughput (it turns out to be a first quarter)
(select TICKER_SYMBOL,
YEAR(END_DATE) year,
QUARTER(END_DATE) Q,
SUM(VALUE) as tuntu
from `company operating`
where INDIC_NAME_EN='Baiyun Airport:Passenger throughput'
group by TICKER_SYMBOL,year,Q
order by tuntu desc
limit 1) t1
inner join
-- net profit per quarter
(select TICKER_SYMBOL,
YEAR(END_DATE) year,
QUARTER(END_DATE) Q,
SUM(N_INCOME) as amount
from `income statement`
group by TICKER_SYMBOL,year,Q) t2
on t1.TICKER_SYMBOL=t2.TICKER_SYMBOL and t1.year=t2.year and t1.Q=t2.Q
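The question stresses single-quarter rather than cumulative net profit. Whether N_INCOME in this dataset is reported year-to-date is not confirmed here and is worth checking against the data. If it is, a single-quarter figure is the difference between consecutive year-to-date totals within the same year. A minimal sketch under that assumption, with made-up numbers and a hypothetical helper name:

```python
def single_quarter_values(cumulative_by_quarter):
    """Convert year-to-date totals for Q1..Q4 of one year into per-quarter values."""
    values = []
    previous = 0
    for ytd in cumulative_by_quarter:
        values.append(ytd - previous)  # this quarter = YTD now minus YTD last quarter
        previous = ytd
    return values


# Hypothetical year-to-date net income for Q1..Q4 of one fiscal year.
ytd_net_income = [100, 250, 370, 520]
quarterly_net_income = single_quarter_values(ytd_net_income)
print(quarterly_net_income)  # [100, 150, 120, 150]
```

For a first quarter the year-to-date figure and the single-quarter figure coincide, so the distinction only matters for Q2 to Q4.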
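Data source: https://tianchi.aliyun.com/competition/entrance/231593/information
Using "ccf_online_stage1_train.csv" and "ccf_offline_stage1_train.csv" from the Coupon Usage Data for O2O dataset, find the top 3 merchants by total reduction amount of "spend x, save y" coupons redeemed online and offline combined during June 2016.
For example, if at merchant A consumer A redeemed a "spend 200, save 50" coupon and consumer B redeemed a "spend 30, save 1" coupon, then merchant A's total redeemed reduction is 51 yuan.
In the Discount_rate column, such coupons have the form A:B (spend A, save B). To sum the reduction amounts, B first has to be separated from Discount_rate, which is what the SUBSTRING_INDEX(s, delimiter, n) function is for.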
-- total reduction amount per merchant
select Merchant_id,SUM(discount) as discount_amount
from
(-- online: extract the reduction amount from each redeemed coupon
select Merchant_id,SUBSTRING_INDEX(Discount_rate,':',-1) as discount
from ccf_online_stage1_train
where date_format(Date,'%Y%m')=201606
and Date is not null
and Coupon_id is not null
and Discount_rate like '%:%' -- only "spend x, save y" coupons
union all
-- offline: extract the reduction amount from each redeemed coupon
select Merchant_id,SUBSTRING_INDEX(Discount_rate,':',-1) as discount
from ccf_offline_stage1_train
where date_format(Date,'%Y%m')=201606
and Date is not null
and Coupon_id is not null
and Discount_rate like '%:%')t
group by Merchant_id
order by discount_amount desc
limit 3;
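To see what SUBSTRING_INDEX returns for the values in Discount_rate, here is a rough Python approximation of its behavior for a non-empty delimiter. The helper `substring_index` is written for this illustration and is not part of any library.

```python
def substring_index(s, delim, count):
    """Approximate MySQL SUBSTRING_INDEX(s, delim, count) for a non-empty delimiter."""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter, counting from the left.
        return delim.join(parts[:count])
    if count < 0:
        # Everything to the right of the count-th delimiter, counting from the right.
        return delim.join(parts[count:])
    return ""


print(substring_index("200:50", ":", 1))   # 200
print(substring_index("200:50", ":", -1))  # 50
print(substring_index("0.9", ":", -1))     # 0.9
```

The last call shows that a value without the delimiter comes back whole, so a fractional discount rate would be added to the sum unless such rows are filtered out first.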