Notes on the sklearn Boston Housing Dataset

Notes

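The Boston housing dataset can no longer be obtained through sklearn (load_boston was deprecated in scikit-learn 1.0 and removed in 1.2), so this post uses a copy of the file found online, placed in the same directory as the project.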
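Since the data comes from a local file, a small loader can stand in for the sklearn call. The following is only a minimal sketch: it assumes the file is a comma-separated table with one header row and the 14 columns in the order listed in this post. The function name load_boston_csv and the file name boston_housing.csv are hypothetical, and the actual file you download may use a different layout.

```python
import numpy as np

# The 13 predictors in the order given by the dataset description;
# the 14th column, MEDV, is the target.
FEATURE_NAMES = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT",
]


def load_boston_csv(path):
    """Read a local copy of the data and split it into (X, y).

    Assumes a comma-separated file with one header row and 14 numeric
    columns, MEDV last. This layout is an assumption, not a standard.
    """
    # ndmin=2 keeps the result two-dimensional even for a single data row.
    table = np.loadtxt(path, delimiter=",", skiprows=1, ndmin=2)
    expected = len(FEATURE_NAMES) + 1
    if table.shape[1] != expected:
        raise ValueError(
            f"expected {expected} columns, found {table.shape[1]}"
        )
    X = table[:, :-1]  # all columns except the last: the predictors
    y = table[:, -1]   # last column: MEDV
    return X, y


# Example call (hypothetical file name, placed next to the script):
# X, y = load_boston_csv("boston_housing.csv")
```

If the downloaded file is whitespace-separated or has no header row, the delimiter and skiprows arguments would need to be adjusted accordingly.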

Overview

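The dataset contains information collected by the U.S. Census Service about housing prices in the Boston, Massachusetts area. It is small, with only 506 cases.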

Each record has the following 14 attributes:

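| Attribute | Description | Remarks |
|-----------|-------------|---------|
| CRIM | Per capita crime rate by town | |
| ZN | Proportion of residential land zoned for lots over 25,000 sq. ft. | Share of residential land |
| INDUS | Proportion of non-retail business acres per town | Share of non-retail business land in the town |
| CHAS | Charles River dummy variable (1 if the tract bounds the river, 0 otherwise) | Dummy variable for regression analysis |
| NOX | Nitric oxides concentration (parts per 10 million) | Environmental indicator |
| RM | Average number of rooms per dwelling | Rooms per dwelling |
| AGE | Proportion of owner-occupied units built before 1940 | |
| DIS | Weighted distances to five Boston employment centres | |
| RAD | Index of accessibility to radial highways | Highway accessibility index |
| TAX | Full-value property-tax rate per $10,000 | Property-tax rate per $10,000 |
| PTRATIO | Pupil-teacher ratio by town | |
| B | 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town | Derived from the proportion of Black residents in the town |
| LSTAT | Percentage of the population of lower status | Share of lower-income residents |
| MEDV | Median value of owner-occupied homes, in units of $1,000 | Median price of owner-occupied homes |
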
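# Requires scikit-learn < 1.2; load_boston was removed in version 1.2.
from sklearn.datasets import load_boston

boston = load_boston()
# DESCR holds the plain-text description of the dataset.
print(boston.DESCR)

Output:
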
Boston House Prices dataset
===========================
 
Notes
------
Data Set Characteristics:  
 
    :Number of Instances: 506 
 
    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target
 
    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's
    :Missing Attribute Values: None
    :Creator: Harrison, D. and Rubinfeld, D.L.
This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.
The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980.   N.B. Various transformations are used in the table on
pages 244-261 of the latter.
The Boston house-price data has been used in many machine learning papers that address regression
problems.   
     
**References**
   - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
   - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
   - many more! (see http://archive.ics.uci.edu/ml/datasets/Housing)

Usage

X = boston.data    # feature matrix, shape (506, 13)
y = boston.target  # target vector (MEDV), shape (506,)
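Once X and y are available, a typical next step is to hold out part of the data and fit a regression model. The sketch below is one possible baseline and not part of the original post: the helper name fit_baseline, the 80/20 split and the choice of ordinary linear regression are all assumptions made for illustration.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_baseline(X, y, test_size=0.2, seed=0):
    """Fit ordinary least squares on a train split; return (model, test R^2)."""
    # Hold out a fraction of the rows so the score reflects unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    # For regressors, score() returns the coefficient of determination (R^2).
    return model, model.score(X_test, y_test)


# Example call, given X and y from above:
# model, r2 = fit_baseline(X, y)
# print(r2)
```

Fixing random_state makes the split reproducible, which keeps scores comparable between runs.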
