Below I'll lay out my whole approach and source code. Honestly there isn't much to it, so don't get your hopes up~
Official site: https://tianchi.shuju.aliyun.com/competition/introduction.htm?spm=5176.100066.333.8.VU8e4s&raceId=231530
Competition Background
Alibaba's e-commerce platforms hold massive amounts of data from buyer and seller transactions. With data mining techniques, we can accurately forecast future product demand and help merchants automate many supply chain decisions. Such big-data-driven supply chains can substantially lower merchants' operating costs, improve the user experience, and raise the efficiency of the entire e-commerce industry. It is a hard but very important problem, and the organizers hope this competition produces some novel solutions, taking another step toward an intelligent supply chain platform.
Competition Description
High-quality demand forecasting is the foundation and core function of supply chain management. Based on one year of historical data from a massive number of buyers and sellers, the competition asks participants to predict each product's nationwide and regional demand over the coming two weeks. Contestants need data mining techniques to precisely capture how demand fluctuates, forecast the future nationwide and regional demand, and at the same time account for how future uncertainty affects logistics cost, achieving a global optimum. More accurate demand forecasts can greatly reduce operating costs, shorten delivery times, and improve the efficiency of supply chain logistics across society.
Here, the over-replenishment (补多) cost mainly consists of the capital and warehousing costs of overstocked goods; the under-replenishment (补少) cost mainly consists of stock-out costs such as missed sales opportunities caused by insufficient goods.
Too much supply drives up inventory costs, while too little misses sales opportunities, which is why demand forecasting is such a critical link in supply chain management.
Simply put: the objective function is to minimize nationwide logistics cost, the decision variables are the stocking quantities at each warehouse, and the constraints are the over- and under-replenishment costs.
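To make this concrete, here is a minimal sketch of the cost structure in LaTeX. The notation is mine, not the official scoring formula: $q_{i,w}$ is the quantity of item $i$ stocked at warehouse $w$, $d_{i,w}$ its realized two-week demand, and $a_i$, $b_i$ the per-unit under- and over-replenishment costs.

\[
\min_{q}\ \sum_{i}\sum_{w}\Big[\,a_i\,\max(d_{i,w}-q_{i,w},\,0)\;+\;b_i\,\max(q_{i,w}-d_{i,w},\,0)\,\Big]
\]

The first term is the stock-out (under-replenishment) cost and the second the holding (over-replenishment) cost; forecasting $d_{i,w}$ accurately is what drives both terms down.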
1.1 Preprocess the data with R

# Item-level features: read, name the columns, keep only the demand signal we need
item <- read.csv("item_feature2.csv")
colnames(item) <- c("date","item_id","cate_id","cate_level_id","brand_id","supplier_id","pv_ipv","pv_uv","cart_ipv","cart_uv","collect_uv","num_gmv","amt_gmv","qty_gmv","unum_gmv","amt_alipay","num_alipay","qty_alipay","unum_alipay","ztc_pv_ipv","tbk_pv_ipv","ss_pv_ipv","jhs_pv_ipv","ztc_pv_uv","tbk_pv_uv","ss_pv_uv","jhs_pv_uv","num_alipay_njhs","amt_alipay_njhs","qty_alipay_njhs","unum_alipay_njhs")
cainiaoq <- item[, c("date","item_id","qty_alipay_njhs")]
write.table(cainiaoq, file = "cainiaoq.csv", sep = ",", row.names = FALSE, col.names = FALSE, quote = FALSE)

# Item-by-warehouse features: same treatment, plus the store_code column
itemsf <- read.csv("item_store_feature2.csv")
colnames(itemsf) <- c("date","item_id","store_code","cate_id","cate_level_id","brand_id","supplier_id","pv_ipv","pv_uv","cart_ipv","cart_uv","collect_uv","num_gmv","amt_gmv","qty_gmv","unum_gmv","amt_alipay","num_alipay","qty_alipay","unum_alipay","ztc_pv_ipv","tbk_pv_ipv","ss_pv_ipv","jhs_pv_ipv","ztc_pv_uv","tbk_pv_uv","ss_pv_uv","jhs_pv_uv","num_alipay_njhs","amt_alipay_njhs","qty_alipay_njhs","unum_alipay_njhs")
cainiao <- itemsf[, c("date","item_id","store_code","qty_alipay_njhs")]
write.table(cainiao, file = "cainiao.csv", sep = ",", row.names = FALSE, col.names = FALSE, quote = FALSE)

1.2 Upload to HDFS (omitted)
guo@drguo:/opt/spark-1.6.1-bin-hadoop2.6/bin$ spark-sql
16/05/15 21:20:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
...
spark-sql> source /home/guo/1.sql

1.sql:
drop table if exists cainiao;
create external table cainiao(dater bigint, item_id bigint, store_code bigint, qty_alipay_njhs bigint)
row format delimited fields terminated by ','
location '/cainiao';

-- Per (item, warehouse) demand over the two-week window 2014-12-28 to 2015-01-10
create table predict as
select item_id, store_code, sum(qty_alipay_njhs) as target
from cainiao
where dater >= 20141228 and dater <= 20150110
group by item_id, store_code;

drop table if exists cainiaoq;
create external table cainiaoq(dater bigint, item_id bigint, qty_alipay_njhs bigint)
row format delimited fields terminated by ','
location '/cainiaoq';

-- Nationwide demand per item over the same window, with store_code fixed to "all"
create table predictq as
select item_id, "all" as store_code, sum(qty_alipay_njhs) as target
from cainiaoq
where dater >= 20141228 and dater <= 20150110
group by item_id;

Because Spark SQL does not support Hive-style syntax for exporting a table, I fell back to spark-shell.
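Incidentally, the same aggregation can also be written in spark-shell with the DataFrame API instead of a SQL script. A minimal sketch, assuming sqlContext is the HiveContext that spark-shell provides; the table name predict_df is hypothetical, chosen so it doesn't clobber the predict table above:

import org.apache.spark.sql.functions.sum

// Total qty_alipay_njhs per (item_id, store_code) over the same two-week window
val predict = sqlContext.table("cainiao")
  .filter("dater >= 20141228 and dater <= 20150110")
  .groupBy("item_id", "store_code")
  .agg(sum("qty_alipay_njhs").as("target"))

predict.write.saveAsTable("predict_df")  // hypothetical name, for illustration only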
Today (2016.5.26) I just thought of another trick:
If Hive is run with nothing configured, it also uses the bundled Derby store, which lives in whichever directory you started it from. So as long as you launch Hive from that same directory, the tables created in spark-sql naturally show up in Hive too! Clever me!!!
guo@drguo:/opt/spark-1.6.1-bin-hadoop2.6/bin$ hive
hive> show tables;
OK
cainiao
cainiaoq
ijcai
ijcaitest
ijpredict
predict
predictq
Time taken: 2.136 seconds, Fetched: 7 row(s)
guo@drguo:/opt/spark-1.6.1-bin-hadoop2.6/bin$ spark-shell
16/05/15 20:30:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_73)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

1.5 Export the data. By default it is written to HDFS (under /user/guo/). It seems only a few formats are supported; I tried txt and csv and neither worked. If you know of other formats that can be saved, please tell me.
scala> sqlContext.sql("select * from predict").write.format("json").save("predict")
scala> sqlContext.sql("select * from predictq").write.format("json").save("predictq")
scala> sqlContext.sql("select * from predict").write.format("parquet").save("predictp")
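For what it's worth, one workaround I know of for plain-text/CSV output in Spark 1.6 is to drop down to the RDD API and join each row's fields yourself; a minimal sketch (the output path predictcsv is just an example):

// Turn each Row into a comma-joined line and write it as plain text to HDFS.
// Caveat: this naive mkString does not quote fields that themselves contain commas.
sqlContext.sql("select * from predict")
  .rdd
  .map(_.mkString(","))
  .saveAsTextFile("predictcsv")

The third-party spark-csv package (format "com.databricks.spark.csv") also adds real CSV support to Spark 1.x, if pulling in an extra dependency is acceptable.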