Kaggle Study: Learn Machine Learning 4. Your First Scikit-Learn Model


This article is part of a Kaggle self-study series.


Choosing the Prediction Target

You have the code to load your data, and you know how to index it. You are ready to choose which column you want to predict. This column is called the prediction target (the column that holds the result). There is a convention that the prediction target is referred to as y. Here is an example doing that with the example data.

 

y = melbourne_data.Price
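The line above relies on melbourne_data already being loaded. As a minimal sketch, assuming a tiny made-up DataFrame in place of the real data (the values below are invented for illustration), the target can be selected either with attribute access or with bracket access:

```python
import pandas as pd

# Hypothetical stand-in for melbourne_data; the real lesson loads it from a file.
melbourne_data = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Price": [850000.0, 1035000.0, 1465000.0],
})

# Attribute access and bracket access select the same column.
y = melbourne_data.Price
y_bracket = melbourne_data["Price"]

print(type(y).__name__)     # Series
print(y.equals(y_bracket))  # True
```

Bracket access is the safer habit when a column name contains spaces or clashes with a DataFrame method name.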

 

Choosing Predictors

Next we select the predictors. Sometimes, you will want to use all of the variables except the target.
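One way to keep every variable except the target is to drop the target column. This is only a sketch on made-up data; the lesson itself does not show this step.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the example data.
melbourne_data = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Landsize": [202.0, 156.0, 134.0],
    "Price": [850000.0, 1035000.0, 1465000.0],
})

# drop returns a new DataFrame; melbourne_data itself is left unchanged.
all_but_target = melbourne_data.drop(columns=["Price"])

print(list(all_but_target.columns))  # ['Rooms', 'Landsize']
```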

It's possible to model with non-numeric variables, but we'll start with a narrower set of numeric variables. In the example data, the predictors will be chosen as:

melbourne_predictors = ['Rooms', 'Bathroom', 'Landsize', 'BuildingArea', 'YearBuilt', 'Lattitude', 'Longtitude']

 

By convention, this data is called X.

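Doesn't this look like a variable X being turned into Y by some algorithm? Finding that algorithm is what we call machine learning, because we do not supply the algorithm ourselves; the machine learns it from the data.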

X = melbourne_data[melbourne_predictors]
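Indexing a DataFrame with a list of column names returns a new DataFrame that contains only those columns. The sketch below uses a shortened predictor list and made-up rows (both are assumptions for illustration) and checks that every chosen predictor is numeric:

```python
import pandas as pd

# Hypothetical toy data; "Suburb" is a non-numeric column that we leave out.
melbourne_data = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Bathroom": [1.0, 2.0, 1.0],
    "Landsize": [202.0, 156.0, 134.0],
    "Suburb": ["Abbotsford", "Abbotsford", "Richmond"],
    "Price": [850000.0, 1035000.0, 1465000.0],
})

melbourne_predictors = ["Rooms", "Bathroom", "Landsize"]
X = melbourne_data[melbourne_predictors]

print(X.shape)  # (3, 3)

# True when every selected column has a numeric dtype.
all_numeric = X.select_dtypes(include="number").shape[1] == X.shape[1]
print(all_numeric)  # True
```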

 

Building Your Model

You will use the scikit-learn library to create your models. When coding, this library is written as sklearn, as you will see in the sample code. Scikit-learn is easily the most popular library for modeling the types of data typically stored in DataFrames. (I have not used sklearn before; I will come back and add notes once I have.)

The steps to building and using a model are:

· Define: What type of model will it be? A decision tree? Some other type of model? Some other parameters of the model type are specified too.

· Fit: Capture patterns from provided data. This is the heart of modeling.

· Predict: Just what it sounds like.

· Evaluate: Determine how accurate the model's predictions are.

Here is the example for defining and fitting the model.

from sklearn.tree import DecisionTreeRegressor
 
# Define model
melbourne_model = DecisionTreeRegressor()
 
# Fit model
melbourne_model.fit(X, y)


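In short, the code does three things: 1. imports the package; 2. defines a decision tree regressor; 3. fits (trains) the model, with X as the inputs and y as the result. (Do not pass Id in as a predictor; the Id column carries no real meaning.)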
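The three steps above, together with the advice about Id, can be sketched on made-up data as follows. The houses DataFrame and its values are hypothetical, and random_state=0 is an addition here so that repeated runs build the same tree; the lesson's own code does not set it.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data that includes an identifier column.
houses = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Rooms": [2, 3, 4, 3],
    "Landsize": [202.0, 156.0, 134.0, 94.0],
    "Price": [850000.0, 1035000.0, 1465000.0, 1600000.0],
})

y = houses["Price"]

# Leave out the target and the identifier, which says nothing about a house.
predictor_names = [c for c in houses.columns if c not in ("Id", "Price")]
X = houses[predictor_names]

# Define the model, then fit it; fit returns the fitted estimator itself.
model = DecisionTreeRegressor(random_state=0)
fitted = model.fit(X, y)

print(predictor_names)  # ['Rooms', 'Landsize']
print(fitted is model)  # True
```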


The output describes some parameters about the type of model you've built. Don't worry about it for now.

In practice, you'll want to make predictions for new houses coming on the market rather than the houses we already have prices for. But we'll make predictions for the first rows of the training data to see how the predict function works.

print("Making predictions for the following 5 houses:")
print(X.head())
print("The predictions are")
     
print(melbourne_model.predict(X.head()))
 
  

As you can see, the predictions are exactly the same as the actual prices.
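This match can be reproduced on a small made-up dataset. The sketch below assumes every row has a distinct combination of feature values and uses the default tree settings, so the tree can grow until each training row sits in its own leaf; the data is invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data; every row has a distinct feature combination.
X = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 5],
    "Landsize": [202.0, 156.0, 134.0, 94.0, 120.0],
})
y = pd.Series([1480000.0, 1035000.0, 1465000.0, 850000.0, 1600000.0])

model = DecisionTreeRegressor(random_state=0)
model.fit(X, y)

# Predict on rows the model was trained on.
predictions = model.predict(X.head())

print(np.allclose(predictions, y.head()))  # True
```

Reproducing the training prices says nothing about how the model behaves on houses it has not seen.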

 

Your Turn

Now it's time for you to define and fit a model for your data (in your notebook).

1. Select the target variable, which corresponds to the sales price (that is, find the column that holds the result). Looking at previous commands may help you remember what this column is called. Save this to a new variable called y.

2. Create a list of the names of the predictors we will use in the initial model. Use just the following columns in the list (you can copy and paste the whole list to save some typing, though you'll still need to add quotes):

· LotArea

· YearBuilt

· 1stFlrSF

· 2ndFlrSF

· FullBath

· BedroomAbvGr

· TotRmsAbvGrd

3. Using the list of variable names you just created, select a new DataFrame of the predictors data. Save this with the variable name X (we would usually say "assign it to X").

4. Create a DecisionTreeRegressor model and save it to a variable (with a name like my_model or iowa_model). Ensure you've done the relevant import so you can run this command (remember the import: from sklearn.tree import DecisionTreeRegressor).

5. Fit the model you have created using the data in X and the target data you saved above.

6. Make a few predictions with the model's predict command and print out the predictions.
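The six steps can be sketched as below. The rows in iowa_data are made up so the example runs on its own, and the target column name SalePrice is taken from the question in the Q&A section; in your notebook you would use the data you have already loaded.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical rows standing in for the Iowa housing data.
iowa_data = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "YearBuilt": [2003, 1976, 2001, 1915, 2000],
    "1stFlrSF": [856, 1262, 920, 961, 1145],
    "2ndFlrSF": [854, 0, 866, 756, 1053],
    "FullBath": [2, 2, 2, 1, 2],
    "BedroomAbvGr": [3, 3, 3, 3, 4],
    "TotRmsAbvGrd": [8, 6, 6, 7, 9],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

# Step 1: select the target.
y = iowa_data["SalePrice"]

# Step 2: list the predictor names (quotes added).
iowa_predictors = ["LotArea", "YearBuilt", "1stFlrSF", "2ndFlrSF",
                   "FullBath", "BedroomAbvGr", "TotRmsAbvGrd"]

# Step 3: select the predictor columns.
X = iowa_data[iowa_predictors]

# Step 4: define the model (random_state is an addition for repeatable runs).
iowa_model = DecisionTreeRegressor(random_state=0)

# Step 5: fit the model.
iowa_model.fit(X, y)

# Step 6: make a few predictions and print them.
predictions = iowa_model.predict(X.head())
print(predictions)
```

Column names such as 1stFlrSF start with a digit, so they can only be selected with bracket access, not attribute access.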

 

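One thing I noticed: all of the code above was published as screenshots, and it may differ from the code the course provides (the course uses a different dataset and applies some operations to it, and its screenshots were all produced with that dataset). So after working through the steps, I suggest writing out the assigned tasks once more from memory.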

 

 

Continue

 

You've built a decision tree model that can predict the prices of houses based on their characteristics. It's natural to ask how accurate the model's predictions will be, and measuring accuracy is necessary for us to see whether or not other approaches improve our model.

 

 

Move on to the next page to see how we measure model accuracy.

 

Q&A

Q: So I'm very confused. In my model, my predictions and the housing price head are the same amount. So to me it doesn't seem like it predicted anything and instead just copied over the SalePrice. I used iowa_predictors = ['Enclosed.Porch', 'Pool.Area', 'Fireplaces', 'TotRms.AbvGrd', 'Full.Bath', 'Year.Built', 'Overall.Cond', 'Lot.Area']

Any help or clarity would be appreciated.

(Our predictions matched as well.)

A: That's normal here. What's happened is you've trained your model on the entire data set and then you're having it predict values on that same set. So effectively it has "learned" perfectly what the sales prices should be because it has already seen them. (It is like homework set by the teacher that turns out to be identical to the exam questions: you are bound to score well.) Typically what happens is you would keep some data back and use it to test your model. The next lesson has you do that. So nothing weird here!
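Keeping some data back could look like the sketch below. It is not the code from the next lesson: the data is randomly generated, the price formula is invented, and train_test_split and mean_absolute_error are simply one common way in scikit-learn to split data and measure error.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data: price depends on the features plus random noise.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Rooms": rng.integers(1, 6, size=n),
    "Landsize": rng.uniform(50, 800, size=n),
})
y = 200000 + 150000 * X["Rooms"] + 500 * X["Landsize"] + rng.normal(0, 50000, size=n)

# Hold back part of the data; the model never sees val_X or val_y during fit.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

model = DecisionTreeRegressor(random_state=0)
model.fit(train_X, train_y)

# Mean absolute error on rows the model has seen versus rows it has not.
train_mae = mean_absolute_error(train_y, model.predict(train_X))
val_mae = mean_absolute_error(val_y, model.predict(val_X))

print(f"MAE on training rows: {train_mae:.0f}")
print(f"MAE on held-out rows: {val_mae:.0f}")
```

The error on the training rows comes out at essentially zero, as in the question above, while the error on the held-out rows is the more honest estimate of how the model would do on new houses.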



