How to build your first Neural Network to predict house prices with Keras

by Joseph Lee Wei En

A step-by-step complete beginner’s guide to building your first Neural Network in a couple lines of code like a Deep Learning pro!

Writing your first Neural Network can be done with merely a couple lines of code! In this post, we will be exploring how to use a package called Keras to build our first neural network to predict if house prices are above or below median value. In particular, we will go through the full Deep Learning pipeline, from:

  • Exploring and Processing the Data
  • Building and Training our Neural Network
  • Visualizing Loss and Accuracy
  • Adding Regularization to our Neural Network

In just 20 to 30 minutes, you will have coded your own neural network just as a Deep Learning practitioner would have!

Pre-requisites:

This post assumes you’ve got Jupyter notebook set up with an environment that has the packages keras, tensorflow, pandas, scikit-learn and matplotlib installed. If you have not done so, please follow the instructions in the tutorial below:

  • Getting Started with Python for Deep Learning and Data Science

This is a Coding Companion to Intuitive Deep Learning Part 1. As such, we assume that you have some intuitive understanding of neural networks and how they work, including some of the nitty-gritty details, such as what overfitting is and the strategies to address it. If you need a refresher, please read these intuitive introductions:

  • Intuitive Deep Learning Part 1a: Introduction to Neural Networks
  • Intuitive Deep Learning Part 1b: Introduction to Neural Networks

Resources you need:

The dataset we will use today is adapted from Zillow’s Home Value Prediction Kaggle competition data. We’ve reduced the number of input features and changed the task into predicting whether the house price is above or below median value. Please visit the link below to download the modified dataset and place it in the same directory as your notebook. The download icon should be on the top right.

Download Dataset

Optionally, you may also download an annotated Jupyter notebook which has all the code covered in this post: Jupyter Notebook.

Note that to download this notebook from Github, you have to go to the front page and download ZIP to download all the files:

And now, let’s begin!

Exploring and Processing the Data

Before we code any ML algorithm, the first thing we need to do is to put our data in a format that the algorithm will want. In particular, we need to:

  • Read in the CSV (comma separated values) file and convert them to arrays. Arrays are a data format that our algorithm can process.
  • Split our dataset into the input features (which we call x) and the label (which we call y).
  • Scale the data (we call this normalization) so that the input features have similar orders of magnitude.
  • Split our dataset into the training set, the validation set and the test set. If you need a refresher on why we need these three datasets, please refer to Intuitive Deep Learning Part 1b.

So let’s begin! From the Getting Started with Python for Deep Learning and Data Science tutorial, you should have downloaded the package pandas to your environment. We will need to tell our notebook that we will use that package by importing it. Type the following code and press Alt-Enter on your keyboard:

import pandas as pd

This just means that if I want to refer to code in the package ‘pandas’, I’ll refer to it with the name pd. We then read in the CSV file by running this line of code:

df = pd.read_csv('housepricedata.csv')

This line of code means that we will read the csv file ‘housepricedata.csv’ (which should be in the same directory as your notebook) and store it in the variable ‘df’. If we want to find out what is in df, simply type df into the grey box and press Alt-Enter:

df

Your notebook should look something like this:

Here, you can explore the data a little. We have our input features in the first ten columns:

  • Lot Area (in sq ft)
  • Overall Quality (scale from 1 to 10)
  • Overall Condition (scale from 1 to 10)
  • Total Basement Area (in sq ft)
  • Number of Full Bathrooms
  • Number of Half Bathrooms
  • Number of Bedrooms above ground
  • Total Number of Rooms above ground
  • Number of Fireplaces
  • Garage Area (in sq ft)

In our last column, we have the feature that we would like to predict:

  • Is the house price above the median or not? (1 for yes and 0 for no)

Now that we’ve seen what our data looks like, we want to convert it into arrays for our machine to process:

dataset = df.values

To convert our dataframe into an array, we just store the values of df (by accessing df.values) into the variable ‘dataset’. To see what is inside this variable ‘dataset’, simply type ‘dataset’ into a grey box on your notebook and run the cell (Alt-Enter):

dataset

As you can see, it is all stored in an array now:

We now split our dataset into input features (X) and the feature we wish to predict (Y). To do that split, we simply assign the first 10 columns of our array to a variable called X and the last column of our array to a variable called Y. The code to do the first assignment is this:

X = dataset[:,0:10]

This might look a bit weird, but let me explain what’s inside the square brackets. Everything before the comma refers to the rows of the array and everything after the comma refers to the columns of the arrays.

Since we’re not splitting up the rows, we put ‘:’ before the comma. This means to take all the rows in dataset and put them in X.

We want to extract out the first 10 columns, and so the ‘0:10’ after the comma means take columns 0 to 9 and put it in X (we don’t include column 10). Our columns start from index 0, so the first 10 columns are really columns 0 to 9.

We then assign the last column of our array to Y:

Y = dataset[:,10]

Ok, now we’ve split our dataset into input features (X) and the label of what we want to predict (Y).
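
As a quick sanity check (optional, and assuming you are using the dataset from this post, which has 1,460 rows), you can confirm the shapes of X and Y:

print(X.shape)  # expect (1460, 10): 1460 rows, 10 input features
print(Y.shape)  # expect (1460,): one label per row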

The next step in our processing is to make sure that the scale of the input features is similar. Right now, features such as lot area are in the order of the thousands, the score for overall quality ranges from 1 to 10, and the number of fireplaces tends to be 0, 1 or 2.

This makes it difficult for the initialization of the neural network, which causes some practical problems. One way to scale the data is to use an existing package from scikit-learn (that we’ve installed in the Getting Started post).

We first have to import the code that we want to use:

from sklearn import preprocessing

This says I want to use the code in ‘preprocessing’ within the sklearn package. Then, we use a function called the min-max scaler, which scales the dataset so that all the input features lie between 0 and 1 inclusive:

min_max_scaler = preprocessing.MinMaxScaler()
X_scale = min_max_scaler.fit_transform(X)
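
For intuition, the min-max scaler simply rescales each column to the 0 to 1 range using (value - column minimum) / (column maximum - column minimum). A rough manual equivalent (for illustration only, not part of our pipeline) would be:

# Illustration only: column-wise min-max scaling, equivalent to MinMaxScaler's default (0, 1) range
X_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))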

Note that we chose 0 and 1 intentionally to aid the training of our neural network. We won’t go through the theory behind this. Now, our scaled dataset is stored in the array ‘X_scale’. If you wish to see what ‘X_scale’ looks like, simply run the cell:

X_scale

Your Jupyter notebook should now look a bit like this:

Now, we are down to our last step in processing the data, which is to split our dataset into a training set, a validation set and a test set.

We will use the code from scikit-learn called ‘train_test_split’, which as the name suggests, split our dataset into a training set and a test set. We first import the code we need:

from sklearn.model_selection import train_test_split

Then, split your dataset like this:

X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(X_scale, Y, test_size=0.3)

This tells scikit-learn that your val_and_test size will be 30% of the overall dataset. The code will store the split data into the first four variables on the left of the equal sign as the variable names suggest.

Unfortunately, this function only helps us split our dataset into two. Since we want a separate validation set and test set, we can use the same function to do the split again on val_and_test:

X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)

The code above will split the val_and_test size equally to the validation set and the test set.
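
One thing to note: train_test_split shuffles the data randomly, so each run of your notebook will produce a slightly different split. If you want reproducible splits, you can optionally pass scikit-learn’s random_state parameter (the seed value 42 below is just an arbitrary choice):

X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(X_scale, Y, test_size=0.3, random_state=42)
X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5, random_state=42)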

In summary, we now have a total of six variables for our datasets we will use:

  • X_train (10 input features, 70% of full dataset)
  • X_val (10 input features, 15% of full dataset)
  • X_test (10 input features, 15% of full dataset)
  • Y_train (1 label, 70% of full dataset)
  • Y_val (1 label, 15% of full dataset)
  • Y_test (1 label, 15% of full dataset)

If you want to see the shape of each of these arrays (i.e. what dimensions they are), simply run

print(X_train.shape, X_val.shape, X_test.shape, Y_train.shape, Y_val.shape, Y_test.shape)

This is what your Jupyter notebook should look like:

As you can see, the training set has 1022 data points while the validation and test sets have 219 data points each. The X variables have 10 input features, while the Y variables only have one feature to predict.

And now, our data is finally ready! Phew!

Summary: In processing the data, we’ve:

  • Read in the CSV (comma separated values) file and convert them to arrays.
  • Split our dataset into the input features and the label.
  • Scale the data so that the input features have similar orders of magnitude.
  • Split our dataset into the training set, the validation set and the test set.

Building and Training our First Neural Network

In Intuitive Deep Learning Part 1a, we said that Machine Learning consists of two steps. The first step is to specify a template (an architecture) and the second step is to find the best numbers from the data to fill in that template. Our code from here on will also follow these two steps.

First Step: Setting up the Architecture

The first thing we have to do is to set up the architecture. Let’s first think about what kind of neural network architecture we want. Suppose we want this neural network:

In words, we want to have these layers:

  • Hidden layer 1: 32 neurons, ReLU activation
  • Hidden layer 2: 32 neurons, ReLU activation
  • Output Layer: 1 neuron, Sigmoid activation

Now, we need to describe this architecture to Keras. We will be using the Sequential model, which means that we merely need to describe the layers above in sequence.

First, let’s import the necessary code from Keras:

from keras.models import Sequential
from keras.layers import Dense

Then, we specify that in our Keras sequential model like this:

model = Sequential([
    Dense(32, activation='relu', input_shape=(10,)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid'),
])

And just like that, the code snippet above has defined our architecture! The code above can be interpreted like this:

model = Sequential([ ... ])

This says that we will store our model in the variable ‘model’, and we’ll describe it sequentially (layer by layer) in between the square brackets.

Dense(32, activation='relu', input_shape=(10,)),

We have our first layer as a dense layer with 32 neurons, ReLU activation and the input shape is 10 since we have 10 input features. Note that ‘Dense’ refers to a fully-connected layer, which is what we will be using.

Dense(32, activation='relu'),

Our second layer is also a dense layer with 32 neurons and ReLU activation. Note that we do not have to describe the input shape since Keras can infer it from the output of our first layer.

Dense(1, activation='sigmoid'),

Our third layer is a dense layer with 1 neuron, sigmoid activation.

And just like that, we have written our model architecture (template) in code!
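
If you’d like to double-check what you’ve just built, Keras can print a layer-by-layer overview of the model. As a rough sanity check on the numbers it reports: the first Dense layer alone has 10 × 32 weights plus 32 biases, i.e. 352 parameters.

model.summary()  # prints each layer, its output shape and its number of parameters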

Second Step: Filling in the best numbers

Now that we’ve got our architecture specified, we need to find the best numbers for it. Before we start our training, we have to configure the model by

  • Telling it which algorithm you want to use to do the optimization
  • Telling it what loss function to use
  • Telling it what other metrics you want to track apart from the loss function

Configuring the model with these settings requires us to call the function model.compile, like this:

model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=['accuracy'])

We put the following settings inside the brackets after model.compile:

optimizer='sgd'

‘sgd’ refers to stochastic gradient descent (over here, it refers to mini-batch gradient descent), which we’ve seen in Intuitive Deep Learning Part 1b.
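
Passing the string ‘sgd’ uses Keras’ default settings for this optimizer. If you later want control over, say, the learning rate, you can pass an optimizer object instead; a small optional sketch (using the standalone Keras API assumed throughout this post) looks like this:

from keras import optimizers

# Sketch: SGD with an explicit learning rate instead of the default 'sgd' string
sgd = optimizers.SGD(lr=0.01)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])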

loss='binary_crossentropy'

The loss function for outputs that take the values 1 or 0 is called binary cross entropy.
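
Concretely, for a true label y (0 or 1) and a predicted probability p, the binary cross entropy of a single example is -(y*log(p) + (1-y)*log(1-p)), and Keras averages this over the examples in each batch.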

metrics=['accuracy']

Lastly, we want to track accuracy on top of the loss function. Now once we’ve run that cell, we are ready to train!

Training on the data is pretty straightforward and requires us to write one line of code:

hist = model.fit(X_train, Y_train,
                 batch_size=32, epochs=100,
                 validation_data=(X_val, Y_val))

The function is called ‘fit’ as we are fitting the parameters to the data. We have to specify what data we are training on, which is X_train and Y_train. Then, we specify the size of our mini-batch and how long we want to train it for (epochs). Lastly, we specify what our validation data is so that the model will tell us how we are doing on the validation data at each point. This function will output a history, which we save under the variable hist. We’ll use this variable a little later when we get to visualization.
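
To give a sense of scale: with 1022 training examples and a batch size of 32, each epoch consists of ceil(1022 / 32) = 32 mini-batch updates, so training for 100 epochs performs roughly 3,200 parameter updates in total.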

Now, run the cell and watch it train! Your Jupyter notebook should look like this:

You can now see that the model is training! By looking at the numbers, you should be able to see the loss decrease and the accuracy increase over time. At this point, you can experiment with the hyper-parameters and neural network architecture. Run the cells again to see how your training has changed when you’ve tweaked your hyperparameters.

Once you’re happy with your final model, we can evaluate it on the test set. To find the accuracy on our test set, we run this code snippet:

model.evaluate(X_test, Y_test)[1]

The reason we have the index 1 after the model.evaluate function is that the function returns the loss as the first element and the accuracy as the second element. To output only the accuracy, simply access the second element (which is indexed by 1, since indexing starts from 0).
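
Equivalently, you can unpack both returned values in one call (an optional variation on the line above):

test_loss, test_acc = model.evaluate(X_test, Y_test)
print('Test loss:', test_loss)
print('Test accuracy:', test_acc)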

Due to the randomness in how we have split the dataset as well as the initialization of the weights, the numbers and graph will differ slightly each time we run our notebook. Nevertheless, you should get a test accuracy anywhere between 80% to 95% if you’ve followed the architecture I specified above!

And there you have it, you’ve coded up your very first neural network and trained it! Congratulations!

Summary: Coding up our first neural network required only a few lines of code:

  • We specify the architecture with the Keras Sequential model.
  • We specify some of our settings (optimizer, loss function, metrics to track) with model.compile
  • We train our model (find the best parameters for our architecture) with the training data with model.fit
  • We evaluate our model on the test set with model.evaluate

Visualizing Loss and Accuracy

In Intuitive Deep Learning Part 1b, we talked about overfitting and some regularization techniques. How do we know if our model is currently overfitting?

What we might want to do is to plot the training loss and the val loss over the number of epochs passed. To display some nice graphs, we will use the package matplotlib. As usual, we have to import the code we wish to use:

import matplotlib.pyplot as plt

Then, we want to visualize the training loss and the validation loss. To do so, run this snippet of code:

plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()

We’ll explain each line of the above code snippet. The first two lines say that we want to plot the loss and the val_loss. The third line specifies the title of this graph, “Model Loss”. The fourth and fifth lines tell us what the y and x axes should be labelled respectively. The sixth line includes a legend for our graph, and the location of the legend will be in the upper right. And the seventh line tells Jupyter notebook to display the graph.

Your Jupyter notebook should look something like this:

We can do the same to plot our training accuracy and validation accuracy with the code below:

plt.plot(hist.history['acc'])
plt.plot(hist.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='lower right')
plt.show()

You should get a graph that looks a bit like this:
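
One caveat: depending on your Keras / TensorFlow version, the history dictionary may record these metrics under the keys ‘accuracy’ and ‘val_accuracy’ rather than ‘acc’ and ‘val_acc’. If you hit a KeyError, you can check which names your version uses with:

print(hist.history.keys())  # shows the exact metric names recorded during training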

Since the improvements in our model on the training set look somewhat matched with the improvements on the validation set, overfitting doesn’t seem to be a huge problem in our model.

Summary: We use matplotlib to visualize the training and validation loss / accuracy over time to see if there’s overfitting in our model.

Adding Regularization to our Neural Network

For the sake of introducing regularization to our neural network, let’s build a neural network that will badly overfit on our training set. We’ll call this Model 2.

model_2 = Sequential([
    Dense(1000, activation='relu', input_shape=(10,)),
    Dense(1000, activation='relu'),
    Dense(1000, activation='relu'),
    Dense(1000, activation='relu'),
    Dense(1, activation='sigmoid'),
])

model_2.compile(optimizer='adam',
                loss='binary_crossentropy',
                metrics=['accuracy'])

hist_2 = model_2.fit(X_train, Y_train,
                     batch_size=32, epochs=100,
                     validation_data=(X_val, Y_val))

Here, we’ve made a much larger model and we’ve used the Adam optimizer. Adam is one of the most common optimizers in use; it adds some tweaks to stochastic gradient descent so that it reaches a lower loss faster. If we run this code and plot the loss graphs for hist_2 using the code below (note that the code is the same except that we use ‘hist_2’ instead of ‘hist’):

plt.plot(hist_2.history['loss'])
plt.plot(hist_2.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()

We get a plot like this:

This is a clear sign of over-fitting. The training loss is decreasing, but the validation loss is way above the training loss and increasing (past the inflection point around Epoch 20). If we plot accuracy using the code below:

plt.plot(hist_2.history['acc'])
plt.plot(hist_2.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='lower right')
plt.show()

We can see a clearer divergence between train and validation accuracy as well:

Now, let’s try out some of our strategies to reduce over-fitting (apart from changing our architecture back to our first model). Remember from Intuitive Deep Learning Part 1b that we introduced three strategies to reduce over-fitting.

Of the three, we’ll incorporate L2 regularization and dropout here. The reason we don’t add early stopping here is because after we’ve used the first two strategies, the validation loss doesn’t take the U-shape we see above and so early stopping will not be as effective.
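
For completeness, if you did want to experiment with early stopping, Keras exposes it as a callback passed into model.fit. A rough sketch (not used anywhere else in this post; the patience value of 5 is an arbitrary choice) would be:

from keras.callbacks import EarlyStopping

# Sketch: stop training once the validation loss has not improved for 5 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5)
hist_es = model_2.fit(X_train, Y_train,
                      batch_size=32, epochs=100,
                      validation_data=(X_val, Y_val),
                      callbacks=[early_stop])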

First, let’s import the code that we need for L2 regularization and dropout:

from keras.layers import Dropout
from keras import regularizers

We then specify our third model like this:

model_3 = Sequential([
    Dense(1000, activation='relu', kernel_regularizer=regularizers.l2(0.01), input_shape=(10,)),
    Dropout(0.3),
    Dense(1000, activation='relu', kernel_regularizer=regularizers.l2(0.01)),
    Dropout(0.3),
    Dense(1000, activation='relu', kernel_regularizer=regularizers.l2(0.01)),
    Dropout(0.3),
    Dense(1000, activation='relu', kernel_regularizer=regularizers.l2(0.01)),
    Dropout(0.3),
    Dense(1, activation='sigmoid', kernel_regularizer=regularizers.l2(0.01)),
])

Can you spot the differences between Model 3 and Model 2? There are two main differences:

Difference 1: To add L2 regularization, notice that we’ve added a bit of extra code in each of our dense layers like this:

kernel_regularizer=regularizers.l2(0.01)

This tells Keras to include the squared values of those parameters in our overall loss function, and weight them by 0.01 in the loss function.
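
In other words, for each of these layers the quantity being minimized becomes roughly: original loss + 0.01 × (sum of the squared weights of that layer), which penalizes very large weights and nudges the network towards simpler fits.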

Difference 2: To add Dropout, we added a new layer like this:

Dropout(0.3),

This means that the neurons in the previous layer have a probability of 0.3 of dropping out during training. Let’s compile it and run it with the same parameters as our Model 2 (the overfitting one):

model_3.compile(optimizer='adam',
                loss='binary_crossentropy',
                metrics=['accuracy'])

hist_3 = model_3.fit(X_train, Y_train,
                     batch_size=32, epochs=100,
                     validation_data=(X_val, Y_val))

And now, let’s plot the loss and accuracy graphs. You’ll notice that the loss is a lot higher at the start, and that’s because we’ve changed our loss function. To plot such that the window is zoomed in between 0 and 1.2 for the loss, we add an additional line of code (plt.ylim) when plotting:

plt.plot(hist_3.history['loss'])
plt.plot(hist_3.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.ylim(top=1.2, bottom=0)
plt.show()

We’ll get a loss graph that looks like this:

You can see that the validation loss much more closely matches our training loss. Let’s plot the accuracy with similar code snippet:

plt.plot(hist_3.history['acc'])
plt.plot(hist_3.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='lower right')
plt.show()

And we will get a plot like this:

Compared to our model in Model 2, we’ve reduced overfitting substantially! And that’s how we apply our regularization techniques to reduce overfitting to the training set.

Summary: To deal with overfitting, we can code in the following strategies into our model each with about one line of code:

  • L2 Regularization
  • Dropout

If we visualize the training / validation loss and accuracy, we can see that these additions have helped deal with overfitting!

Consolidated Summary:

In this post, we’ve written Python code to:

  • Explore and Process the Data
  • Build and Train our Neural Network
  • Visualize Loss and Accuracy
  • Add Regularization to our Neural Network

We’ve been through a lot, but we haven’t written too many lines of code! Building and Training our Neural Network has only taken about 4 to 5 lines of code, and experimenting with different model architectures is just a simple matter of swapping in different layers or changing different hyperparameters. Keras has indeed made it a lot easier to build our neural networks, and we’ll continue to use it for more advanced applications in Computer Vision and Natural Language Processing.

What’s Next: In our next Coding Companion Part 2, we will explore how to code up our own Convolutional Neural Networks (CNNs) to do image recognition!

Build your first Convolutional Neural Network to recognize images: A step-by-step guide to building your own image recognition software with Convolutional Neural Networks using Keras, on medium.com.

Be sure to first get an intuitive understanding of CNNs here: Intuitive Deep Learning Part 2: CNNs for Computer Vision

About the author:

Hi there, I’m Joseph! I recently graduated from Stanford University, where I worked with Andrew Ng in the Stanford Machine Learning Group. I want to make Deep Learning concepts as intuitive and as easily understandable as possible by everyone, which has motivated my publication: Intuitive Deep Learning.

Originally published at: https://www.freecodecamp.org/news/how-to-build-your-first-neural-network-to-predict-house-prices-with-keras-f8db83049159/
