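深度之眼 (Deepshare) Kaggle Competition Project Notes, Part 3: House Price Prediction (2) [Kaggle: House Price Prediction] Week 1, Section 2: Approach to the Problem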

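Task

Study period: 12/31

Task name: Approach to the competition problem

Task summary: Understand the problem the competition asks you to solve, the description of the data, and the algorithms to apply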

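Details: The first thing you see on the competition page is the Overview, which states the problem to solve and the evaluation metric, RMSE (for this competition it is computed on the logarithms of the predicted and actual sale prices). Next, open the Data tab: read File Descriptions to see which files are provided, then Data Fields to see which features the data contains. Finally, click Download All to download all of the data.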
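As a rough illustration of the metric, here is a minimal sketch of RMSE using only the standard library. The function names rmse and log_rmse and the predicted values are made up for illustration; the log variant follows the competition's evaluation page, which scores the RMSE between the logarithms of predicted and actual prices.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def log_rmse(y_true, y_pred):
    """RMSE on log prices, so errors are measured in relative terms."""
    return rmse([math.log(t) for t in y_true], [math.log(p) for p in y_pred])

# The actual values are the first three SalePrice entries from train.csv.
actual = [208500, 181500, 223500]
predicted = [200000, 190000, 223500]  # hypothetical predictions

print(rmse(actual, predicted))
print(log_rmse(actual, predicted))
```

Working on the log scale means that being off by 10% on a cheap house costs about as much as being off by 10% on an expensive one.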

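Looking at the data, the first thing to notice is that SalePrice is the house price and therefore the label, so this is a regression task. At this point, think about which algorithms can be used for regression. While going through the data, also note which columns are numeric and which are non-numeric; that gives a rough idea of what needs preprocessing and which methods to use for it.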
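One way to separate numeric from non-numeric columns is pandas' select_dtypes. The sketch below assumes a tiny hand-built frame (the name sample is hypothetical) holding a few columns and values from train.csv, so it runs without the downloaded files.

```python
import pandas as pd

# A small stand-in for train.csv with a handful of its columns.
sample = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "LotFrontage": [65.0, 80.0, 68.0],
    "MSZoning": ["RL", "RL", "RL"],
    "LotShape": ["Reg", "Reg", "IR1"],
    "SalePrice": [208500, 181500, 223500],
})

# Keep the label apart so it is not treated as an input feature.
features = sample.drop(columns=["SalePrice"])

numeric_cols = features.select_dtypes(include="number").columns.tolist()
non_numeric_cols = features.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("non-numeric:", non_numeric_cols)
```

A numeric dtype does not always mean a numeric quantity: MSSubClass, for example, is stored as integers but encodes a dwelling type, so it may be better treated as categorical.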

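Code download

Link: https://pan.baidu.com/s/15CVlreLNaTdtJKZvryFH2Q

Extraction code: l3tv

Assignment (details): Learn how to view and download the data for any Kaggle competition, and share your own observations on the downloaded data

Submission format: a screenshot of the data downloaded to your local machine, plus your comments on this competition's data

Check-in content: at least 1 image and a comment of at least 100 characters

Check-in deadline: 12/31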

Approach to the problem

[Figure 1: approach to the problem]

Loading the dataset:

① Import the NumPy and Pandas libraries
② Read the CSV files with Pandas' pd.read_csv

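import numpy as np
import pandas as pd

# Both files are assumed to be in the current working directory.
train = pd.read_csv('train.csv')  # training set, includes SalePrice
test = pd.read_csv('test.csv')    # test set, no SalePrice column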
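A quick sanity check after loading is to confirm that the two files differ only by the label column. The sketch below is an assumption-laden illustration: it reads two tiny in-memory CSVs (train_csv and test_csv are hypothetical stand-ins with made-up test rows) so that it runs without the downloaded files.

```python
import io
import pandas as pd

# Stand-ins for train.csv and test.csv; the test rows are made up.
train_csv = io.StringIO("Id,LotArea,SalePrice\n1,8450,208500\n2,9600,181500\n")
test_csv = io.StringIO("Id,LotArea\n1461,10000\n1462,12000\n")

train_df = pd.read_csv(train_csv)
test_df = pd.read_csv(test_csv)

# Columns present in train but absent from test; expected to be just the label.
label_only = [c for c in train_df.columns if c not in test_df.columns]

print(train_df.shape, test_df.shape, label_only)
```

With the real files, the same comparison would catch the mistake of reading the same file twice, since label_only would then come back empty.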

Inspecting the dataset:

  • Show the first few rows
    head() returns the first n rows of the data; n defaults to 5
train.head()

 	Id 	MSSubClass 	MSZoning 	LotFrontage 	LotArea 	Street 	Alley 	LotShape 	LandContour 	Utilities 	... 	PoolArea 	PoolQC 	Fence 	MiscFeature 	MiscVal 	MoSold 	YrSold 	SaleType 	SaleCondition 	SalePrice
0 	1 	60 	RL 	65.0 	8450 	Pave 	NaN 	Reg 	Lvl 	AllPub 	... 	0 	NaN 	NaN 	NaN 	0 	2 	2008 	WD 	Normal 	208500
1 	2 	20 	RL 	80.0 	9600 	Pave 	NaN 	Reg 	Lvl 	AllPub 	... 	0 	NaN 	NaN 	NaN 	0 	5 	2007 	WD 	Normal 	181500
2 	3 	60 	RL 	68.0 	11250 	Pave 	NaN 	IR1 	Lvl 	AllPub 	... 	0 	NaN 	NaN 	NaN 	0 	9 	2008 	WD 	Normal 	223500
3 	4 	70 	RL 	60.0 	9550 	Pave 	NaN 	IR1 	Lvl 	AllPub 	... 	0 	NaN 	NaN 	NaN 	0 	2 	2006 	WD 	Abnorml 	140000
4 	5 	60 	RL 	84.0 	14260 	Pave 	NaN 	IR1 	Lvl
  • View an overall summary of the data
    info() shows the number of rows and columns, plus per-column details such as the non-null count and the data type
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
Id               1460 non-null int64
MSSubClass       1460 non-null int64
MSZoning         1460 non-null object
LotFrontage      1201 non-null float64
LotArea          1460 non-null int64
Street           1460 non-null object
Alley            91 non-null object
LotShape         1460 non-null object
LandContour      1460 non-null object
Utilities        1460 non-null object
LotConfig        1460 non-null object
LandSlope        1460 non-null object
Neighborhood     1460 non-null object
Condition1       1460 non-null object
Condition2       1460 non-null object
BldgType         1460 non-null object
HouseStyle       1460 non-null object
OverallQual      1460 non-null int64
OverallCond      1460 non-null int64
YearBuilt        1460 non-null int64
YearRemodAdd     1460 non-null int64
RoofStyle        1460 non-null object
RoofMatl         1460 non-null object
Exterior1st      1460 non-null object
Exterior2nd      1460 non-null object
MasVnrType       1452 non-null object
MasVnrArea       1452 non-null float64
ExterQual        1460 non-null object
ExterCond        1460 non-null object
Foundation       1460 non-null object
BsmtQual         1423 non-null object
BsmtCond         1423 non-null object
BsmtExposure     1422 non-null object
BsmtFinType1     1423 non-null object
BsmtFinSF1       1460 non-null int64
BsmtFinType2     1422 non-null object
BsmtFinSF2       1460 non-null int64
BsmtUnfSF        1460 non-null int64
TotalBsmtSF      1460 non-null int64
Heating          1460 non-null object
HeatingQC        1460 non-null object
CentralAir       1460 non-null object
Electrical       1459 non-null object
1stFlrSF         1460 non-null int64
2ndFlrSF         1460 non-null int64
LowQualFinSF     1460 non-null int64
GrLivArea        1460 non-null int64
BsmtFullBath     1460 non-null int64
BsmtHalfBath     1460 non-null int64
FullBath         1460 non-null int64
HalfBath         1460 non-null int64
BedroomAbvGr     1460 non-null int64
KitchenAbvGr     1460 non-null int64
KitchenQual      1460 non-null object
TotRmsAbvGrd     1460 non-null int64
Functional       1460 non-null object
Fireplaces       1460 non-null int64
FireplaceQu      770 non-null object
GarageType       1379 non-null object
GarageYrBlt      1379 non-null float64
GarageFinish     1379 non-null object
GarageCars       1460 non-null int64
GarageArea       1460 non-null int64
GarageQual       1379 non-null object
GarageCond       1379 non-null object
PavedDrive       1460 non-null object
WoodDeckSF       1460 non-null int64
OpenPorchSF      1460 non-null int64
EnclosedPorch    1460 non-null int64
3SsnPorch        1460 non-null int64
ScreenPorch      1460 non-null int64
PoolArea         1460 non-null int64
PoolQC           7 non-null object
Fence            281 non-null object
MiscFeature      54 non-null object
MiscVal          1460 non-null int64
MoSold           1460 non-null int64
YrSold           1460 non-null int64
SaleType         1460 non-null object
SaleCondition    1460 non-null object
SalePrice        1460 non-null int64
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB
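The info() output shows that several columns are mostly empty (PoolQC has only 7 non-null values out of 1460, for instance). A minimal sketch of ranking columns by missing values follows; the frame below is a hypothetical stand-in with NaN values placed by hand for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv with a few hand-placed gaps.
frame = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "Alley": [np.nan, np.nan, np.nan, "Grvl"],
})

# Count missing values per column and keep only columns that have any.
missing = frame.isnull().sum()
missing = missing[missing > 0].sort_values(ascending=False)

# Express the counts as a fraction of all rows.
missing_ratio = missing / len(frame)

print(pd.DataFrame({"missing": missing, "ratio": missing_ratio}))
```

Columns near the top of this ranking are the ones to decide on first: drop them, fill them, or treat "missing" as its own category.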
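  • View summary statistics
    describe() reports statistics for the numeric columns: count, mean, std (standard deviation), min, max, and the quartiles, including the median (50%)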
train.describe()

 	Id 	MSSubClass 	LotFrontage 	LotArea 	OverallQual 	OverallCond 	YearBuilt 	YearRemodAdd 	MasVnrArea 	BsmtFinSF1 	... 	WoodDeckSF 	OpenPorchSF 	EnclosedPorch 	3SsnPorch 	ScreenPorch 	PoolArea 	MiscVal 	MoSold 	YrSold 	SalePrice
count 	1460.000000 	1460.000000 	1201.000000 	1460.000000 	1460.000000 	1460.000000 	1460.000000 	1460.000000 	1452.000000 	1460.000000 	... 	1460.000000 	1460.000000 	1460.000000 	1460.000000 	1460.000000 	1460.000000 	1460.000000 	1460.000000 	1460.000000 	1460.000000
mean 	730.500000 	56.897260 	70.049958 	10516.828082 	6.099315 	5.575342 	1971.267808 	1984.865753 	103.685262 	443.639726 	... 	94.244521 	46.660274 	21.954110 	3.409589 	15.060959 	2.758904 	43.489041 	6.321918 	2007.815753 	180921.195890
std 	421.610009 	42.300571 	24.284752 	9981.264932 	1.382997 	1.112799 	30.202904 	20.645407 	181.066207 	456.098091 	... 	125.338794 	66.256028 	61.119149 	29.317331 	55.757415 	40.177307 	496.123024 	2.703626 	1.328095 	79442.502883
min 	1.000000 	20.000000 	21.000000 	1300.000000 	1.000000 	1.000000 	1872.000000 	1950.000000 	0.000000 	0.000000 	... 	0.000000 	0.000000 	0.000000 	0.000000 	0.000000 	0.000000 	0.000000 	1.000000 	2006.000000 	34900.000000
25% 	365.750000 	20.000000 	59.000000 	7553.500000 	5.000000 	5.000000 	1954.000000 	1967.000000 	0.000000 	0.000000 	... 	0.000000 	0.000000 	0.000000 	0.000000 	0.000000 	0.000000 	0.000000 	5.000000 	2007.000000 	129975.000000
50% 	730.500000 	50.000000 	69.000000 	9478.500000 	6.000000 	5.000000 	1973.000000 	1994.000000 	0.000000 	383.500000 	... 	0.000000 	25.000000 	0.000000 	0.000000 	0.000000 	0.000000 	0.000000 	6.000000 	2008.000000 	163000.000000
75% 	1095.250000 	70.000000 	80.000000 	11601.500000 	7.000000 	6.000000 	2000.000000 	2004.000000 	166.000000 	712.250000 	... 	168.000000 	68.000000 	0.000000 	0.000000 	0.000000 	0.000000 	0.000000 	8.000000 	2009.000000 	214000.000000
max 	1460.000000 	190.000000 	313.000000 	215245.000000 	10.000000 	9.000000 	2010.000000 	2010.000000 	1600.000000 	5644.000000 	... 	857.000000 	547.000000 	552.000000 	508.000000 	480.000000 	738.000000 	15500.000000 	12.000000 	2010.000000 	755000.000000

8 rows × 38 columns
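By default describe() covers only the numeric columns, which is why the output above has 38 of the 81 columns. A short sketch of summarizing the remaining columns follows, assuming a small stand-in frame (the name frame is hypothetical) with values from the first rows of train.csv.

```python
import numpy as np
import pandas as pd

# Small stand-in for train.csv with two text columns and the label.
frame = pd.DataFrame({
    "MSZoning": ["RL", "RL", "RL", "RL"],
    "LotShape": ["Reg", "Reg", "IR1", "IR1"],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# Excluding numeric dtypes leaves the non-numeric columns, which are
# summarized by count, number of unique values, most frequent value (top),
# and that value's frequency (freq).
categorical_summary = frame.describe(exclude=[np.number])

print(categorical_summary)
```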
  • Other inspection tips
    ProfileReport from the pandas_profiling tool can generate an exploratory data analysis (EDA) report in one step

Install it with pip install pandas_profiling
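A minimal sketch of generating the report, under a few assumptions: the helper name build_profile_report, the report title, and the output file name are made up for illustration, and the import fallback reflects that newer releases of the package are published as ydata-profiling.

```python
def build_profile_report(df, out_path="train_profile.html"):
    """Write an HTML EDA report for df and return the output path."""
    try:
        # Newer releases of the package use this module name.
        from ydata_profiling import ProfileReport
    except ImportError:
        # Older releases use the name given in the text above.
        from pandas_profiling import ProfileReport

    report = ProfileReport(df, title="House Prices: train.csv")
    report.to_file(out_path)
    return out_path
```

With the data loaded as above, calling build_profile_report(train) would write the report next to the notebook; with 81 columns it can take a while to run.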
