zillow数据科学应用探索

zillow http://www.zillow.com/

data science at zillow

zillow 美国一家房地产租赁和销售企业。

Zillow serves the full lifecycle of owning and living in a home: buying, selling, renting, financing, remodeling and more.

Zillow described their 20TB dataset and the technology they use to estimate house values for more than 110 million homes in the US.

数据科学技术

zillow数据科学应用探索_第1张图片

基于R,python语言构建原型和生产环境,还会用到graphlab create构建模型

大数据

homes on zillow 110 million

home attributes 103

double precision 8byte

time series 220months

total 20T工具

R

python

R is used for prototyping work, such as proof-of-concept experiments on subsets of the dataset and also in production, mainly as an interface programming layer. The production computations avail of C++ technology. Zillow referenced proprietary R packages which they have developed in-house. One such package is ZPL(实现R并行计算), which provides a function similar to MapReduce. Both SQLserver and SQLite are used in Zillow.

应用

rent zestimate

zillow数据科学应用探索_第2张图片

计算租赁指数 zillow rent index

calculate raw median rent zestimates

应用平滑过滤

考虑季节性因素

质量控制

计算 房屋价值指数 zillow home value index

zillow地理信息技术

大多数开发在windows上完成

sql server 数据库

75%python,15% R,5% sql server,5% bash 和shell

linux-only database used for blazingly fast in memory and http look up

crawl->walk->run

数据挖掘模型

数据建模,寻找异常点,寻找脏数据,数据库清洗,缺失值插入

python在数据科学中的角色,科学计算中应用不断增长,机器学习算法实现更加容易

zillow常用python包,numpy,pandas,scikit-learn,textmining,

pymssql、pyodbc,sqlite3, graphlab create

使用sklearn构建欺诈检测模型,gbrt算法


总结:

基于大数据,数据科学技术,实现房产业务数据化,房产数据业务化,开发数据产品,进行精准营销。国内的安居客,搜房网等,需要接轨。​​

from:http://workinganalytics.com/zillow-opens-the-kimono-reveals-r-python-and-graphlab-create-underneath/​


 

你可能感兴趣的:(数据挖掘)