Neural networks are used as a method of deep learning, one of the many subfields of artificial intelligence. They were first proposed around 70 years ago as an attempt at simulating the way the human brain works, though in a much more simplified form. Individual ‘neurons’ are connected in layers, with weights assigned to determine how the neuron responds when signals are propagated through the network. Previously, neural networks were limited in the number of neurons they were able to simulate, and therefore the complexity of learning they could achieve. But in recent years, due to advancements in hardware development, we have been able to build very deep networks, and train them on enormous datasets to achieve breakthroughs in machine intelligence.
These breakthroughs have allowed machines to match and exceed the capabilities of humans at performing certain tasks. One such task is object recognition. Though machines have historically been unable to match human vision, recent advances in deep learning have made it possible to build neural networks which can recognize objects, faces, text, and even emotions.
In this tutorial, you will implement a small subsection of object recognition—digit recognition. Using TensorFlow, an open-source Python library developed by the Google Brain labs for deep learning research, you will take hand-drawn images of the numbers 0-9 and build and train a neural network to recognize and predict the correct label for the digit displayed.
While you won’t need prior experience in practical deep learning or TensorFlow to follow along with this tutorial, we’ll assume some familiarity with machine learning terms and concepts such as training and testing, features and labels, optimization, and evaluation. You can learn more about these concepts in An Introduction to Machine Learning.
To complete this tutorial, you’ll need:
A local Python 3 development environment, including pip, a tool for installing Python packages, and venv, for creating virtual environments.
Before you can develop the recognition program, you’ll need to install a few dependencies and create a workspace to hold your files.
We’ll use a Python 3 virtual environment to manage our project’s dependencies. Create a new directory for your project and navigate to the new directory:
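The directory name tensorflow-demo below is only an example; use whatever name you prefer:

mkdir tensorflow-demo    # example project directory name
cd tensorflow-demo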
Execute the following commands to set up the virtual environment for this tutorial:
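A minimal sketch of the setup using the built-in venv module, again with the hypothetical tensorflow-demo name for the environment:

python3 -m venv tensorflow-demo    # create the virtual environment (name is a placeholder)
source tensorflow-demo/bin/activate    # activate it so installed packages stay local to this project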
Next, install the libraries you’ll use in this tutorial. We’ll use specific versions of these libraries by creating a requirements.txt file in the project directory which specifies the requirement and the version we need. Create the requirements.txt file:
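For example, create the empty file from the terminal (or directly in your editor):

touch requirements.txt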
Open the file in your text editor and add the following lines to specify the Image, NumPy, and TensorFlow libraries and their versions:
image==1.5.20
numpy==1.14.3
tensorflow==1.4.0
Save the file and exit the editor. Then install these libraries with the following command:
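Assuming the virtual environment is active, a single pip command installs everything listed in the file:

pip install -r requirements.txt    # installs image, numpy, and tensorflow at the pinned versions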
With the dependencies installed, we can start working on our project.
The dataset we will be using in this tutorial is called the MNIST dataset, and it is a classic in the machine learning community. This dataset is made up of images of handwritten digits, 28x28 pixels in size. Here are some examples of the digits included in the dataset:
Let’s create a Python program to work with this dataset. We will use one file for all of our work in this tutorial. Create a new file called main.py:
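For example, from the project directory:

touch main.py    # all of the tutorial’s code will live in this file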
Now open this file in your text editor of choice and add this line of code to the file to import the TensorFlow library:
import tensorflow as tf
Add the following lines of code to your file to import the MNIST dataset and store the image data in the variable mnist:
...
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)  # y labels are one-hot encoded
When reading in the data, we are using one-hot-encoding to represent the labels (the actual digit drawn, e.g. “3”) of the images. One-hot-encoding uses a vector of binary values to represent numeric or categorical values. As our labels are for the digits 0-9, the vector contains ten values, one for each possible digit. One of these values is set to 1, to represent the digit at that index of the vector, and the rest are set to 0. For example, the digit 3 is represented using the vector [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]. As the value at index 3 is stored as 1, the vector therefore represents the digit 3.
To represent the actual images themselves, the 28x28 pixels are flattened into a 1D vector which is 784 pixels in size. Each of the 784 pixels making up the image is stored as a value between 0 and 255. This determines the grayscale of the pixel, as our images are presented in black and white only. So a black pixel is represented by 255, and a white pixel by 0, with the various shades of gray somewhere in between.
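As a quick illustration of this layout (a standalone NumPy sketch, not part of the tutorial’s main.py), a 28x28 grayscale array can be flattened into a 784-value vector like this:

import numpy as np

image_2d = np.zeros((28, 28), dtype=np.uint8)  # blank image; 0 means white in this representation
image_2d[10:18, 10:18] = 255                   # a small block of black pixels, just for illustration
image_1d = image_2d.ravel()                    # flatten to a 784-element vector
print(image_1d.shape)                          # prints (784,)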
We can use the mnist variable to find out the size of the dataset we have just imported. Looking at the num_examples for each of the three subsets, we can determine that the dataset has been split into 55,000 images for training, 5000 for validation, and 10,000 for testing. Add the following lines to your file:
...
n_train = mnist.train.num_examples # 55,000
n_validation = mnist.validation.num_examples # 5000
n_test = mnist.test.num_examples # 10,000
Now that we have our data imported, it’s time to think about the neural network.
The architecture of the neural network refers to elements such as the number of layers in the network, the number of units in each layer, and how the units are connected between layers. As neural networks are loosely inspired by the workings of the human brain, here the term unit is used to represent what we would biologically think of as a neuron. Like neurons passing signals around the brain, units take some values from previous units as input, perform a computation, and then pass on the new value as output to other units. These units are layered to form the network, starting at a minimum with one layer for inputting values, and one layer to output values. The term hidden layer is used for all of the layers in between the input and output layers, i.e. those “hidden” from the real world.
Different architectures can yield dramatically different results, as performance can be thought of as a function of the architecture along with other factors, such as the parameters, the data, and the duration of training.
Add the following lines of code to your file to store the number of units per layer in global variables. This allows us to alter the network architecture in one place, and at the end of the tutorial you can test for yourself how different numbers of layers and units will impact the results of our model:
...
n_input = 784 # input layer (28x28 pixels)
n_hidden1 = 512 # 1st hidden layer
n_hidden2 = 256 # 2nd hidden layer
n_hidden3 = 128 # 3rd hidden layer
n_output = 10 # output layer (0-9 digits)
The following diagram shows a visualization of the architecture we’ve designed, with each layer fully connected to the surrounding layers:
The term “deep neural network” relates to the number of hidden layers, with “shallow” usually meaning just one hidden layer, and “deep” referring to multiple hidden layers. Given enough training data, a shallow neural network with a sufficient number of units should theoretically be able to represent any function that a deep neural network can. But it is often more computationally efficient to use a smaller deep neural network to achieve the same task that would require a shallow network with exponentially more hidden units. Shallow neural networks also often encounter overfitting, where the network essentially memorizes the training data that it has seen, and is not able to generalize the knowledge to new data. This is why deep neural networks are more commonly used: the multiple layers between the raw input data and the output label allow the network to learn features at various levels of abstraction, making the network itself better able to generalize.
Other elements of the neural network that need to be defined here are the hyperparameters. Unlike the parameters that will get updated during training, these values are set initially and remain constant throughout the process. In your file, set the following variables and values:
...
learning_rate = 1e-4
n_iterations = 1000
batch_size = 128
dropout = 0.5
The learning rate represents how much the parameters will adjust at each step of the learning process. These adjustments are a key component of training: after each pass through the network we tune the weights slightly to try and reduce the loss. Larger learning rates can converge faster, but also have the potential to overshoot the optimal values as they are updated. The number of iterations refers to how many times we go through the training step, and the batch size refers to how many training examples we are using at each step. The dropout variable represents a threshold at which we eliminate some units at random. We will be using dropout in our final hidden layer to give each unit a 50% chance of being eliminated at every training step. This helps prevent overfitting.
We have now defined the architecture of our neural network, and the hyperparameters that impact the learning process. The next step is to build the network as a TensorFlow graph.
To build our network, we will set up the network as a computational graph for TensorFlow to execute. The core concept of TensorFlow is the tensor, a data structure similar to an array or list. Tensors are initialized, manipulated as they are passed through the graph, and updated through the learning process.
We’ll start by defining three tensors as placeholders, which are tensors that we’ll feed values into later. Add the following to your file:
...
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_output])
keep_prob = tf.placeholder(tf.float32)
The only parameter that needs to be specified at its declaration is the size of the data we will be feeding in. For X we use a shape of [None, 784], where None represents any amount, as we will be feeding in an undefined number of 784-pixel images. The shape of Y is [None, 10] as we will be using it for an undefined number of label outputs, with 10 possible classes. The keep_prob tensor is used to control the dropout rate, and we initialize it as a placeholder rather than an immutable variable because we want to use the same tensor both for training (when dropout is set to 0.5) and testing (when dropout is set to 1.0).
The parameters that the network will update in the training process are the weight and bias values, so for these we need to set an initial value rather than an empty placeholder. These values are essentially where the network does its learning, as they are used in the activation functions of the neurons, representing the strength of the connections between units.
Since the values are optimized during training, we could set them to zero for now. But the initial value actually has a significant impact on the final accuracy of the model. We’ll use random values from a truncated normal distribution for the weights. We want them to be close to zero, so they can adjust in either a positive or negative direction, and slightly different, so they generate different errors. This will ensure that the model learns something useful. Add these lines:
...
weights = {
'w1': tf.Variable(tf.truncated_normal([n_input, n_hidden1], stddev=0.1)),
'w2': tf.Variable(tf.truncated_normal([n_hidden1, n_hidden2], stddev=0.1)),
'w3': tf.Variable(tf.truncated_normal([n_hidden2, n_hidden3], stddev=0.1)),
'out': tf.Variable(tf.truncated_normal([n_hidden3, n_output], stddev=0.1)),
}
For the bias, we use a small constant value to ensure that the tensors activate in the initial stages and therefore contribute to the propagation. The weights and bias tensors are stored in dictionary objects for ease of access. Add this code to your file to define the biases:
...
biases = {
'b1': tf.Variable(tf.constant(0.1, shape=[n_hidden1])),
'b2': tf.Variable(tf.constant(0.1, shape=[n_hidden2])),
'b3': tf.Variable(tf.constant(0.1, shape=[n_hidden3])),
'out': tf.Variable(tf.constant(0.1, shape=[n_output]))
}
Next, set up the layers of the network by defining the operations that will manipulate the tensors. Add these lines to your file:
...
layer_1 = tf.add(tf.matmul(X, weights['w1']), biases['b1'])
layer_2 = tf.add(tf.matmul(layer_1, weights['w2']), biases['b2'])
layer_3 = tf.add(tf.matmul(layer_2, weights['w3']), biases['b3'])
layer_drop = tf.nn.dropout(layer_3, keep_prob)
output_layer = tf.matmul(layer_drop, weights['out']) + biases['out']
Each hidden layer will execute matrix multiplication on the previous layer’s outputs and the current layer’s weights, and add the bias to these values. At the last hidden layer, we will apply a dropout operation using our keep_prob value of 0.5.
The final step in building the graph is to define the loss function that we want to optimize. A popular choice of loss function in TensorFlow programs is cross-entropy, also known as log-loss, which quantifies the difference between two probability distributions (the predictions and the labels). A perfect classification would result in a cross-entropy of 0, with the loss completely minimized.
We also need to choose the optimization algorithm which will be used to minimize the loss function. A process named gradient descent optimization is a common method for finding the (local) minimum of a function by taking iterative steps along the gradient in a negative (descending) direction. There are several choices of gradient descent optimization algorithms already implemented in TensorFlow, and in this tutorial we will be using the Adam optimizer. This extends upon gradient descent optimization by using momentum to speed up the process through computing an exponentially weighted average of the gradients and using that in the adjustments. Add the following code to your file:
...
cross_entropy = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(
labels=Y, logits=output_layer
))
train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)
We’ve now defined the network and built it out with TensorFlow. The next step is to feed data through the graph to train it, and then test that it has actually learnt something.
The training process involves feeding the training dataset through the graph and optimizing the loss function. Every time the network iterates through a batch of training images, it updates the parameters to reduce the loss in order to more accurately predict the digits shown. The testing process involves running our testing dataset through the trained graph, and keeping track of the number of images that are correctly predicted, so that we can calculate the accuracy.
Before starting the training process, we will define our method of evaluating the accuracy so we can print it out on mini-batches of data while we train. These printed statements will allow us to check that from the first iteration to the last, loss decreases and accuracy increases; they will also allow us to track whether or not we have run enough iterations to reach a consistent and optimal result:
...
correct_pred = tf.equal(tf.argmax(output_layer, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
In correct_pred, we use the arg_max function to compare which images are being predicted correctly by looking at the output_layer (predictions) and Y (labels), and we use the equal function to return this as a list of Booleans. We can then cast this list to floats and calculate the mean to get a total accuracy score.
We are now ready to initialize a session for running the graph. In this session we will feed the network with our training examples, and once trained, we feed the same graph with new test examples to determine the accuracy of the model. Add the following lines of code to your file:
...
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
The essence of the training process in deep learning is to optimize the loss function. Here we are aiming to minimize the difference between the predicted labels of the images, and the true labels of the images. The process involves four steps which are repeated for a set number of iterations:

1. Propagate values forward through the network
2. Compute the loss
3. Propagate values backward through the network
4. Update the parameters
At each training step, the parameters are adjusted slightly to try and reduce the loss for the next step. As the learning progresses, we should see a reduction in loss, and eventually we can stop training and use the network as a model for testing our new data.
Add this code to the file:
...
# train on mini batches
for i in range(n_iterations):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    sess.run(train_step, feed_dict={
        X: batch_x, Y: batch_y, keep_prob: dropout
    })
    # print loss and accuracy (per minibatch)
    if i % 100 == 0:
        minibatch_loss, minibatch_accuracy = sess.run(
            [cross_entropy, accuracy],
            feed_dict={X: batch_x, Y: batch_y, keep_prob: 1.0}
        )
        print(
            "Iteration",
            str(i),
            "\t| Loss =",
            str(minibatch_loss),
            "\t| Accuracy =",
            str(minibatch_accuracy)
        )
Every 100 iterations of the training step, in which we feed a mini-batch of images through the network, we print out the loss and accuracy of that batch. Note that we should not expect a steadily decreasing loss and increasing accuracy here, as the values are per batch, not for the entire model. We use mini-batches of images rather than feeding them through individually to speed up the training process and allow the network to see a number of different examples before updating the parameters.
Once the training is complete, we can run the session on the test images. This time we are using a keep_prob dropout rate of 1.0 to ensure all units are active in the testing process.
Add this code to the file:
...
test_accuracy = sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels, keep_prob: 1.0})
print("\nAccuracy on test set:", test_accuracy)
It’s now time to run our program and see how accurately our neural network can recognize these handwritten digits. Save the main.py file and execute the following command in the terminal to run the script:
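Assuming your Python 3 interpreter is available as python3:

python3 main.py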
You’ll see an output similar to the following, although individual loss and accuracy results may vary slightly:
Output
Iteration 0 | Loss = 3.67079 | Accuracy = 0.140625
Iteration 100 | Loss = 0.492122 | Accuracy = 0.84375
Iteration 200 | Loss = 0.421595 | Accuracy = 0.882812
Iteration 300 | Loss = 0.307726 | Accuracy = 0.921875
Iteration 400 | Loss = 0.392948 | Accuracy = 0.882812
Iteration 500 | Loss = 0.371461 | Accuracy = 0.90625
Iteration 600 | Loss = 0.378425 | Accuracy = 0.882812
Iteration 700 | Loss = 0.338605 | Accuracy = 0.914062
Iteration 800 | Loss = 0.379697 | Accuracy = 0.875
Iteration 900 | Loss = 0.444303 | Accuracy = 0.90625
Accuracy on test set: 0.9206
To try and improve the accuracy of our model, or to learn more about the impact of tuning hyperparameters, we can test the effect of changing the learning rate, the dropout threshold, the batch size, and the number of iterations. We can also change the number of units in our hidden layers, and change the amount of hidden layers themselves, to see how different architectures increase or decrease the model accuracy.
To demonstrate that the network is actually recognizing the hand-drawn images, let’s test it on a single image of our own.
If you are on a local machine and you would like to use your own hand-drawn number, you can use a graphics editor to create your own 28x28 pixel image of a digit. Otherwise, you can use curl to download the following sample test image to your server or computer:
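A sketch of that download step; the URL below is only a placeholder for the tutorial’s sample image link, so substitute the real address:

curl -O https://example.com/test_img.png    # placeholder URL; replace with the sample image’s actual address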
Open the main.py file in your editor and add the following lines of code to the top of the file to import two libraries necessary for image manipulation.
import numpy as np
from PIL import Image
...
Then at the end of the file, add the following line of code to load the test image of the handwritten digit:
...
img = np.invert(Image.open("test_img.png").convert('L')).ravel()
The open function of the Image library loads the test image as a 4D array containing the three RGB color channels and the Alpha transparency. This is not the same representation we used previously when reading in the dataset with TensorFlow, so we’ll need to do some extra work to match the format.
First, we use the convert function with the L parameter to reduce the 4D RGBA representation to one grayscale color channel. We store this as a numpy array and invert it using np.invert, because the current matrix represents black as 0 and white as 255, whereas we need the opposite. Finally, we call ravel to flatten the array.
Now that the image data is structured correctly, we can run a session in the same way as previously, but this time only feeding in the single image for testing.
Add the following code to your file to test the image and print the outputted label.
...
prediction = sess.run(tf.argmax(output_layer, 1), feed_dict={X: [img]})
print ("Prediction for test image:", np.squeeze(prediction))
The np.squeeze function is called on the prediction to return the single integer from the array (i.e. to go from [2] to 2). The resulting output demonstrates that the network has recognized this image as the digit 2.
Output
Prediction for test image: 2
You can try testing the network with more complex images, for example digits that look like other digits, or digits that have been drawn poorly or incorrectly, to see how well it fares.
In this tutorial you successfully trained a neural network to classify the MNIST dataset with around 92% accuracy and tested it on an image of your own. Current state-of-the-art research achieves around 99% on this same problem, using more complex network architectures involving convolutional layers. These use the 2D structure of the image to better represent the contents, unlike our method which flattened all the pixels into one vector of 784 units. You can read more about this topic on the TensorFlow website, and see the research papers detailing the most accurate results on the MNIST website.
Now that you know how to build and train a neural network, you can try and use this implementation on your own data, or test it on other popular datasets such as the Google StreetView House Numbers, or the CIFAR-10 dataset for more general image recognition.
Translated from: https://www.digitalocean.com/community/tutorials/how-to-build-a-neural-network-to-recognize-handwritten-digits-with-tensorflow