Step-by-step handwritten digit recognition on the MNIST dataset with TensorFlow: neural networks and logistic regression

MNIST from scratch



This notebook walks through an example of training a TensorFlow model to do digit classification using the MNIST dataset. MNIST is a labeled set of images of handwritten digits.


An example follows.


We're going to be building a model that recognizes these digits as 5, 0, and 4.


Imports and input data


We'll proceed in steps, beginning with importing and inspecting the MNIST data. This doesn't have anything to do with TensorFlow in particular -- we're just downloading the data archive.



import os
from six.moves.urllib.request import urlretrieve

SOURCE_URL = 'https://storage.googleapis.com/cvdf-datasets/mnist/'
WORK_DIRECTORY = "/tmp/mnist-data"

def maybe_download(filename):
    """A helper to download the data files if not present."""
    # Create the working directory on first use.
    if not os.path.exists(WORK_DIRECTORY):
        os.makedirs(WORK_DIRECTORY)
    filepath = os.path.join(WORK_DIRECTORY, filename)
    if not os.path.exists(filepath):
        filepath, _ = urlretrieve(SOURCE_URL + filename, filepath)
        statinfo = os.stat(filepath)
        print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
    else:
        print('Already downloaded', filename)
    return filepath

# maybe_download takes a file name and checks whether it already exists locally.
# If the directory or the file is missing, it creates the directory, fetches the
# file with urlretrieve, and stores it there.

train_data_filename = maybe_download('train-images-idx3-ubyte.gz')
train_labels_filename = maybe_download('train-labels-idx1-ubyte.gz')
test_data_filename = maybe_download('t10k-images-idx3-ubyte.gz')
test_labels_filename = maybe_download('t10k-labels-idx1-ubyte.gz')



Working with the images


Now we have the files, but the format requires a bit of pre-processing before we can work with it. The data is gzipped, requiring us to decompress it. Also, each image is grayscale-encoded with values in [0, 255]; we'll normalize these to [-0.5, 0.5].

Let's try to unpack the data using the documented format:


[offset] [type]          [value]           [description]
0000     32 bit integer  0x00000803(2051)  magic number
0004     32 bit integer  60000             number of images
0008     32 bit integer  28                number of rows
0012     32 bit integer  28                number of columns
0016     unsigned byte   ??                pixel
0017     unsigned byte   ??                pixel
........
xxxx     unsigned byte   ??                pixel



Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).
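To make the header layout concrete without touching the real download, here is a minimal sketch that builds a tiny synthetic file in the same layout and reads its 16-byte header back. The helper names build_fake_idx_images and read_idx_image_header are made up for this illustration, and the pixel values are arbitrary filler rather than real MNIST data.

```python
import gzip
import io
import struct

def build_fake_idx_images(num_images, rows, cols):
    """Build gzip-compressed bytes laid out like an MNIST image file (synthetic data)."""
    # Four big-endian 32-bit integers: magic number, image count, rows, columns.
    header = struct.pack('>iiii', 2051, num_images, rows, cols)
    # One unsigned byte per pixel, stored row by row; the values are filler.
    pixels = bytes(i % 256 for i in range(num_images * rows * cols))
    return gzip.compress(header + pixels)

def read_idx_image_header(raw_gz_bytes):
    """Return (magic, count, rows, cols) from the 16-byte big-endian header."""
    with gzip.open(io.BytesIO(raw_gz_bytes)) as f:
        return struct.unpack('>iiii', f.read(16))

fake_file = build_fake_idx_images(num_images=3, rows=28, cols=28)
magic, count, rows, cols = read_idx_image_header(fake_file)
print(magic, count, rows, cols)
```

Reading all four header fields in one struct.unpack call is only a convenience; reading them one at a time with '>i' is equivalent.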



We'll start by reading the first image from the test data as a sanity check.




import gzip, binascii, struct, numpy
import matplotlib.pyplot as plt

with gzip.open(test_data_filename) as f:
    # Print the header fields.
    for field in ['magic number', 'image count', 'rows', 'columns']:
        # struct.unpack reads the binary data provided by f.read.
        # The format string '>i' decodes a big-endian integer, which
        # is the encoding of the data.
        print(field, struct.unpack('>i', f.read(4))[0])

    # Read the first 28x28 set of pixel values.
    # Each pixel is one byte, [0, 255], a uint8, so one 28x28 image
    # occupies 28 * 28 bytes.
    buf = f.read(28 * 28)
    image = numpy.frombuffer(buf, dtype=numpy.uint8)

    # Print the first few values of image.
    print('First 10 pixels:', image[:10])


The first 10 pixels are all 0 values. Not very interesting, but also unsurprising. We'd expect most of the pixel values to be the background color, 0.

We could print all 28 * 28 values, but what we really need to do to make sure we're reading our data properly is look at an image.






# We'll show the image and its pixel value histogram side-by-side.
_, (ax1, ax2) = plt.subplots(1, 2)

# To interpret the values as a 28x28 image, we need to reshape
# the numpy array, which is one dimensional.
# cmap selects the colormap used to draw scalar data; it is ignored
# for RGB(A) images.
ax1.imshow(image.reshape(28, 28), cmap=plt.cm.Greys);

ax2.hist(image, bins=20, range=[0, 255]);



The large number of 0 values corresponds to the background of the image, another large mass at value 255 is black, and there is a mix of grayscale transition values in between.

Both the image and histogram look sensible. But it's good practice when training image models to normalize values to be centered around 0.

We'll do that next. The normalization code is fairly short, and it may be tempting to assume we haven't made mistakes, but we'll double-check by looking at the rendered input and histogram again. Malformed inputs are a surprisingly common source of errors when developing new models.



# Let's convert the uint8 image to 32 bit floats and rescale
# the values to be centered around 0, between [-0.5, 0.5].
#
# We again plot the image and histogram to check that we
# haven't mangled the data.
scaled = image.astype(numpy.float32)
scaled = (scaled - (255 / 2.0)) / 255
_, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(scaled.reshape(28, 28), cmap=plt.cm.Greys);
ax2.hist(scaled, bins=20, range=[-0.5, 0.5]);

The code above first converts the uint8 values to 32-bit floats and then rescales them to [-0.5, 0.5], so that the values are centered on 0.


Great -- we've retained the correct image data while properly rescaling to the range [-0.5, 0.5].
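The rescaling arithmetic is easy to check by hand at the endpoints. The sketch below applies the same formula to single pixel values; the function name rescale is made up for this illustration, and PIXEL_DEPTH is simply the maximum pixel value, 255.

```python
PIXEL_DEPTH = 255  # maximum value of an unsigned 8-bit pixel

def rescale(pixel):
    """Map a pixel value in [0, 255] to [-0.5, 0.5], centered on 0."""
    return (pixel - PIXEL_DEPTH / 2.0) / PIXEL_DEPTH

# Background (0) maps to the lower bound, foreground (255) to the upper bound,
# and mid-gray values land close to 0.
print(rescale(0), rescale(128), rescale(255))
```

Subtracting half the depth before dividing is what centers the range; dividing by 255 alone would give [0, 1] instead.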



Reading the labels

Let's next unpack the test label data. The format here is similar: a magic number followed by a count followed by the labels as uint8 values. In more detail:



[offset] [type]          [value]           [description]
0000     32 bit integer  0x00000801(2049)  magic number (MSB first)
0004     32 bit integer  10000             number of items
0008     unsigned byte   ??                label
0009     unsigned byte   ??                label
........
xxxx     unsigned byte   ??                label


As with the image data, let's read the first test set value to sanity check our input path. We'll expect a 7.
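The label layout can be exercised the same way on synthetic bytes. This is a minimal sketch under the assumption that the file holds an 8-byte header followed by one byte per label, as in the table above; build_fake_idx_labels and read_idx_labels are hypothetical helper names, and the three labels are made up.

```python
import gzip
import io
import struct

def build_fake_idx_labels(labels):
    """Build gzip-compressed bytes laid out like an MNIST label file (synthetic data)."""
    # Two big-endian 32-bit integers: magic number and item count.
    header = struct.pack('>ii', 2049, len(labels))
    return gzip.compress(header + bytes(labels))

def read_idx_labels(raw_gz_bytes):
    """Return (magic, count, labels) parsed from a label file held in memory."""
    with gzip.open(io.BytesIO(raw_gz_bytes)) as f:
        magic, count = struct.unpack('>ii', f.read(8))
        # Each label is a single unsigned byte, so iterating the bytes yields ints.
        labels = list(f.read(count))
    return magic, count, labels

parsed = read_idx_labels(build_fake_idx_labels([7, 2, 1]))
print(parsed)
```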



with gzip.open(test_labels_filename) as f:
    # Print the header fields.
    for field in ['magic number', 'label count']:
        print(field, struct.unpack('>i', f.read(4))[0])

    print('First label:', struct.unpack('B', f.read(1))[0])


magic number 2049
label count 10000
First label: 7

Indeed, the first label of the test set is 7.

Forming the training, testing, and validation data sets

Now that we understand how to read a single element, we can read a much larger set that we'll use for training, testing, and validation.


Image data


The code below is a generalization of our prototyping above that reads the entire test and training data sets. It wraps everything covered so far into a single function, which you can try out in Python.





IMAGE_SIZE = 28
PIXEL_DEPTH = 255

def extract_data(filename, num_images):
    """Extract the images into a 4D tensor [image index, y, x, channels].

    For MNIST data, the number of channels is always 1.

    Values are rescaled from [0, 255] down to [-0.5, 0.5].
    """
    print('Extracting', filename)
    with gzip.open(filename) as bytestream:
        # Skip the magic number and dimensions; we know these values.
        # The header is four 32-bit integers, i.e. 16 bytes.
        bytestream.read(16)

        buf = bytestream.read(IMAGE_SIZE * IMAGE_SIZE * num_images)
        data = numpy.frombuffer(buf, dtype=numpy.uint8).astype(numpy.float32)
        data = (data - (PIXEL_DEPTH / 2.0)) / PIXEL_DEPTH
        data = data.reshape(num_images, IMAGE_SIZE, IMAGE_SIZE, 1)
        return data

train_data = extract_data(train_data_filename, 60000)
test_data = extract_data(test_data_filename, 10000)






A crucial difference here is how we reshape the array of pixel values. Instead of one image that's 28x28, we now have a set of 60,000 images, each one being 28x28. We also include a number of channels, which for grayscale images as we have here is 1.

Let's make sure we've got the reshaping parameters right by inspecting the dimensions and the first two images. (Again, mangled input is a very common source of errors.)
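To see what the 4D reshape does without loading the real data, the following sketch uses a tiny synthetic array. The sizes (3 images of 4x4 pixels) are arbitrary stand-ins chosen so the result is easy to inspect; only the reshape pattern mirrors the text.

```python
import numpy

num_images, image_size = 3, 4  # toy sizes, not the MNIST values

# A flat buffer holding all pixels of all images back to back.
flat = numpy.arange(num_images * image_size * image_size, dtype=numpy.float32)

# [image index, y, x, channels], with a single grayscale channel.
batch = flat.reshape(num_images, image_size, image_size, 1)
print(batch.shape)

# Dropping the channel axis recovers one image as a 2D array for plotting.
first_image = batch[0].reshape(image_size, image_size)
print(first_image)
```

Because the pixels are stored row-wise and image after image, the second image starts exactly image_size * image_size values into the flat buffer.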


print('Training data shape', train_data.shape)
_, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(train_data[0].reshape(28, 28), cmap=plt.cm.Greys);
ax2.imshow(train_data[1].reshape(28, 28), cmap=plt.cm.Greys);

Training data shape (60000, 28, 28, 1)

Looks good. Now we know how to index our full set of training and test images.



Label data



Let's move on to loading the full set of labels. As is typical in classification problems, we'll convert our input labels into a 1-hot encoding over a length 10 vector corresponding to 10 digits: exactly one entry is 1 and all the others are 0. The vector [0, 1, 0, 0, 0, 0, 0, 0, 0, 0], for example, has its 1 in the second position and corresponds to the digit 1, while [1, 0, 0, 0, 0, 0, 0, 0, 0, 0] has its 1 in the first position and corresponds to the digit 0.
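One compact way to build such vectors is NumPy broadcasting: comparing a row of class indices against a column of labels yields one row per label with a single True entry. The sketch below applies this to three made-up labels; NUM_LABELS is assumed to be 10, one class per digit.

```python
import numpy

NUM_LABELS = 10  # one class per digit
labels = numpy.array([1, 0, 5], dtype=numpy.uint8)  # made-up example labels

# numpy.arange(NUM_LABELS) has shape (10,), labels[:, None] has shape (3, 1);
# broadcasting the comparison gives a (3, 10) boolean matrix.
one_hot = (numpy.arange(NUM_LABELS) == labels[:, None]).astype(numpy.float32)
print(one_hot)

# argmax along each row inverts the encoding.
decoded = numpy.argmax(one_hot, 1)
print(decoded)
```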




NUM_LABELS = 10

def extract_labels(filename, num_images):
    """Extract the labels into a 1-hot matrix [image index, label index]."""
    print('Extracting', filename)
    with gzip.open(filename) as bytestream:
        # Skip the magic number and count; we know these values.
        # The header is two 32-bit integers, i.e. 8 bytes.
        bytestream.read(8)
        buf = bytestream.read(1 * num_images)
        labels = numpy.frombuffer(buf, dtype=numpy.uint8)
    # Convert to dense 1-hot representation.
    return (numpy.arange(NUM_LABELS) == labels[:, None]).astype(numpy.float32)

train_labels = extract_labels(train_labels_filename, 60000)
test_labels = extract_labels(test_labels_filename, 10000)





As with our image data, we'll double-check that our 1-hot encoding of the first few values matches our expectations.


print('Training labels shape', train_labels.shape)
print('First label vector', train_labels[0])
print('Second label vector', train_labels[1])

Training labels shape (60000, 10)
First label vector [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
Second label vector [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]

The 1-hot encoding looks reasonable.


Segmenting data into training, test, and validation

The final step in preparing our data is to split it into three sets: training, test, and validation. This isn't the format of the original data set, so we'll take a small slice of the training data and treat that as our validation set.






# Hold out the first 5000 training examples for validation.
VALIDATION_SIZE = 5000

validation_data = train_data[:VALIDATION_SIZE, :, :, :]
validation_labels = train_labels[:VALIDATION_SIZE]
train_data = train_data[VALIDATION_SIZE:, :, :, :]
train_labels = train_labels[VALIDATION_SIZE:]

train_size = train_labels.shape[0]

print('Validation shape', validation_data.shape)
print('Train size', train_size)

Validation shape (5000, 28, 28, 1)
Train size 55000



Defining the model

Now that we've prepared our data, we're ready to define our model. This is where we actually start building the neural network with TensorFlow.

The comments describe the architecture, which is fairly typical of models that process image data. The raw input passes through several convolution and max pooling layers with rectified linear activations before several fully connected layers and a softmax loss for predicting the output class. During training, we use dropout to guard against overfitting.
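It can help to trace how the spatial size of the feature maps shrinks through such a stack. The sketch below does that arithmetic for a 28x28 input, under these assumptions: two convolutions with stride 1, each followed by a max pool with stride 2, all using 'SAME' padding (so each output length is the input length divided by the stride, rounded up), and 64 feature maps after the second convolution. The helper same_padding_output is a made-up name.

```python
import math

def same_padding_output(size, stride):
    """Output length along one axis for a 'SAME'-padded conv or pool."""
    return math.ceil(size / stride)

size = 28                               # input height and width
size = same_padding_output(size, 1)     # first convolution, stride 1
size = same_padding_output(size, 2)     # first max pool, stride 2
size = same_padding_output(size, 1)     # second convolution, stride 1
size = same_padding_output(size, 2)     # second max pool, stride 2

depth = 64                              # assumed number of feature maps
flattened = size * size * depth         # inputs to the first fully connected layer
print(size, flattened)
```

Under those assumptions the two pooling steps reduce each side by a factor of 4 overall, which is where an expression of the form IMAGE_SIZE // 4 in a fully connected weight shape would come from.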



We'll separate our model definition into three steps:

    1. Defining the variables that will hold the trainable weights.
    2. Defining the basic model graph structure described above. And,
    3. Stamping out several copies of the model graph for training, testing, and validation.

We'll start with the variables.


import tensorflow as tf

# We'll bundle groups of examples during training for efficiency.
# This defines the size of the batch.
BATCH_SIZE = 60
# We have only one channel in our grayscale images.
NUM_CHANNELS = 1
# The random seed that defines initialization.
SEED = 42

# This is where training samples and labels are fed to the graph.
# These placeholder nodes will be fed a batch of training data at each
# training step, which we'll write once we define the graph structure.
train_data_node = tf.placeholder(
  tf.float32,
  shape=(BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))
train_labels_node = tf.placeholder(tf.float32,
                                   shape=(BATCH_SIZE, NUM_LABELS))

# For the validation and test data, we'll just hold the entire dataset in
# one constant node.
validation_data_node = tf.constant(validation_data)
test_data_node = tf.constant(test_data)

# The variables below hold all the trainable weights. For each, the
# parameter defines how the variables will be initialized.
conv1_weights = tf.Variable(
  tf.truncated_normal([5, 5, NUM_CHANNELS, 32],  # 5x5 filter, depth 32.
                      stddev=0.1, seed=SEED))
conv1_biases = tf.Variable(tf.zeros([32]))
conv2_weights = tf.Variable(
  tf.truncated_normal([5, 5, 32, 64], stddev=0.1, seed=SEED))
conv2_biases = tf.Variable(tf.constant(0.1, shape=[64]))
fc1_weights = tf.Variable(  # fully connected, depth 512.
  tf.truncated_normal([(IMAGE_SIZE // 4) * (IMAGE_SIZE // 4) * 64, 512],
                      stddev=0.1, seed=SEED))
fc1_biases = tf.Variable(tf.constant(0.1, shape=[512]))
fc2_weights = tf.Variable(
  tf.truncated_normal([512, NUM_LABELS], stddev=0.1, seed=SEED))
fc2_biases = tf.Variable(tf.constant(0.1, shape=[NUM_LABELS]))






Now that we've defined the variables to be trained, we're ready to wire them together into a TensorFlow graph.

We'll define a helper to do this, model, which will return copies of the graph suitable for training and testing. Note the train argument, which controls whether or not dropout is used in the hidden layer. (We want to use dropout only during training, so dropout is applied only when train is True.)
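Why dropout can simply be switched off outside training is easier to see with a small experiment. The sketch below is a plain-Python illustration of the general "inverted dropout" idea, not TensorFlow's implementation: each activation is kept with probability keep_prob and the survivors are scaled by 1 / keep_prob, so the expected value of every unit is unchanged. The function name dropout and the toy activations are made up for this example.

```python
import random

def dropout(activations, keep_prob, rng):
    """Zero each activation with probability 1 - keep_prob; scale survivors up."""
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(42)
hidden = [1.0] * 100000           # toy activations, all equal to 1
dropped = dropout(hidden, 0.5, rng)

# Roughly half the units are zeroed and the rest doubled, so the mean stays near 1.
mean_after = sum(dropped) / len(dropped)
print(mean_after)
```

Because the scaling happens at training time, the same weights can be used unchanged when the train flag is off.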

def model(data, train=False):
    """The Model definition."""
    # 2D convolution, with 'SAME' padding (i.e. the output feature map has
    # the same size as the input). Note that {strides} is a 4D array whose
    # shape matches the data layout: [image index, y, x, depth].
    conv = tf.nn.conv2d(data,
                        conv1_weights,
                        strides=[1, 1, 1, 1],
                        padding='SAME')

    # Bias and rectified linear non-linearity.
    # ReLU sets values below 0 to 0 and leaves values above 0 unchanged.
    relu = tf.nn.relu(tf.nn.bias_add(conv, conv1_biases))

    # Max pooling. The kernel size spec ksize also follows the layout of
    # the data. Here we have a pooling window of 2, and a stride of 2.
    pool = tf.nn.max_pool(relu,
                          ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1],
                          padding='SAME')
    conv = tf.nn.conv2d(pool,
                        conv2_weights,
                        strides=[1, 1, 1, 1],
                        padding='SAME')
    relu = tf.nn.relu(tf.nn.bias_add(conv, conv2_biases))
    pool = tf.nn.max_pool(relu,
                          ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1],
                          padding='SAME')

    # Reshape the feature map cuboid into a 2D matrix to feed it to the
    # fully connected layers.
    pool_shape = pool.get_shape().as_list()
    reshape = tf.reshape(
        pool,
        [pool_shape[0], pool_shape[1] * pool_shape[2] * pool_shape[3]])

    # Fully connected layer. Note that the '+' operation automatically
    # broadcasts the biases.
    hidden = tf.nn.relu(tf.matmul(reshape, fc1_weights) + fc1_biases)

    # Add a 50% dropout during training only. Dropout also scales
    # activations such that no rescaling is needed at evaluation time.
    if train:
        hidden = tf.nn.dropout(hidden, 0.5, seed=SEED)
    return tf.matmul(hidden, fc2_weights) + fc2_biases





Having defined the basic structure of the graph, we're ready to stamp out multiple copies for training, testing, and validation.

Here, we'll do some customizations depending on which graph we're constructing. train_prediction holds the training graph, for which we use cross-entropy loss and weight regularization. We'll adjust the learning rate during training -- that's handled by the exponential_decay operation, which is itself an argument to the MomentumOptimizer that performs the actual training.
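The learning-rate schedule is simple enough to reproduce by hand. The sketch below is a plain-Python rendering of the exponential decay formula, base_rate * decay_rate ** (global_step / decay_steps), with an optional staircase mode that uses integer division so the rate drops in discrete steps. The function name mirrors the TensorFlow operation but this is an illustration, and the numbers plugged in (base rate 0.01, decay rate 0.95, 55000 training examples, batches of 60) are assumptions taken from the values that appear elsewhere in this walkthrough.

```python
def exponential_decay(base_rate, global_step, decay_steps, decay_rate,
                      staircase=True):
    """Decayed rate: base_rate * decay_rate ** (global_step / decay_steps)."""
    if staircase:
        # Integer division: the rate only changes once per decay_steps.
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return base_rate * decay_rate ** exponent

train_size, batch_size = 55000, 60   # assumed values for illustration

# global_step counts examples seen so far: batch index times batch size.
rate_at_start = exponential_decay(0.01, 0 * batch_size, train_size, 0.95)
rate_late_in_epoch = exponential_decay(0.01, 900 * batch_size, train_size, 0.95)
rate_second_epoch = exponential_decay(0.01, 917 * batch_size, train_size, 0.95)
print(rate_at_start, rate_late_in_epoch, rate_second_epoch)
```

With the staircase behavior the rate stays flat within an epoch, which matches a constant learning rate being reported throughout a single pass over the data.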



The validation and prediction graphs are much simpler to generate -- we need only create copies of the model with the validation and test inputs and a softmax classifier as the output.
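For readers who want to see what the softmax output and the cross-entropy loss compute, here is a minimal plain-Python sketch on a single made-up logit vector. It illustrates the math only; the function names softmax and cross_entropy are written for this example and are not the TensorFlow operations.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum first avoids overflow in exp without changing the result.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot))

logits = [1.0, 2.0, 3.0]            # made-up scores for three classes
probs = softmax(logits)
loss_if_class_2 = cross_entropy(probs, [0, 0, 1])
loss_if_class_0 = cross_entropy(probs, [1, 0, 0])
print(probs, loss_if_class_2, loss_if_class_0)
```

The loss is small when the true class receives high probability and grows as that probability shrinks, which is what drives training toward correct predictions.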


# Training computation: logits + cross-entropy loss.
logits = model(train_data_node, True)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
  labels=train_labels_node, logits=logits))

# L2 regularization for the fully connected parameters.
regularizers = (tf.nn.l2_loss(fc1_weights) + tf.nn.l2_loss(fc1_biases) +
                tf.nn.l2_loss(fc2_weights) + tf.nn.l2_loss(fc2_biases))
# Add the regularization term to the loss.
loss += 5e-4 * regularizers

# Optimizer: set up a variable that's incremented once per batch and
# controls the learning rate decay.
batch = tf.Variable(0)
# Decay once per epoch, using an exponential schedule starting at 0.01.
learning_rate = tf.train.exponential_decay(
  0.01,                # Base learning rate.
  batch * BATCH_SIZE,  # Current index into the dataset.
  train_size,          # Decay step.
  0.95,                # Decay rate.
  staircase=True)
# Use simple momentum for the optimization.
optimizer = tf.train.MomentumOptimizer(learning_rate,
                                       0.9).minimize(loss, global_step=batch)

# Predictions for the minibatch, validation set and test set.
train_prediction = tf.nn.softmax(logits)
# We'll compute them only once in a while by calling their {eval()} method.
validation_prediction = tf.nn.softmax(model(validation_data_node))
test_prediction = tf.nn.softmax(model(test_data_node))





Training and visualizing results

Now that we have the training, test, and validation graphs, we're ready to actually go through the training loop and periodically evaluate loss and error.

All of these operations take place in the context of a session. In Python, we'd write something like:



with tf.Session() as s:
  ...training / test / evaluation loop...


But here we'll want to keep the session open so we can poke at values as we work out the details of training. The TensorFlow API includes a function for this, InteractiveSession.

We'll start by creating a session and initializing the variables we defined above.


# Create a new interactive session that we'll use in
# subsequent code cells.
s = tf.InteractiveSession()

# tf.InteractiveSession installs itself as the default session
# on construction, so subsequent operations use it automatically.

# Initialize all the variables we defined above.
tf.global_variables_initializer().run()



Now we're ready to perform operations on the graph. Let's start with one round of training. We're going to organize our training steps into batches for efficiency; i.e., training using a small set of examples at each step rather than a single example.





# Grab the first BATCH_SIZE examples and labels.
batch_data = train_data[:BATCH_SIZE, :, :, :]
batch_labels = train_labels[:BATCH_SIZE]

# This dictionary maps the batch data (as a numpy array) to the
# node in the graph it should be fed to.
feed_dict = {train_data_node: batch_data,
             train_labels_node: batch_labels}

# Run the graph and fetch some of the nodes.
_, l, lr, predictions = s.run(
  [optimizer, loss, learning_rate, train_prediction],
  feed_dict=feed_dict)




Let's take a look at the predictions. How did we do? Recall that the output will be probabilities over the possible classes, so let's look at those probabilities.


[  2.25393116e-04   4.76219611e-05   1.66867452e-03   5.67827519e-05

   6.03432178e-01   4.34969068e-02   2.19316553e-05   1.41286102e-04

   1.54903100e-05   3.50893795e-01]

As expected without training, the predictions are all noise. Let's write a scoring function that picks the class with the maximum probability and compares with the example's label. We'll start by converting the probability vectors returned by the softmax into predictions we can match against the labels.
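Picking the class with the maximum probability is just an argmax over the vector. The sketch below applies a plain-Python argmax to the probability vector printed above; the helper name argmax is written for this illustration.

```python
# The ten class probabilities printed above for one example.
first_prediction = [
    2.25393116e-04, 4.76219611e-05, 1.66867452e-03, 5.67827519e-05,
    6.03432178e-01, 4.34969068e-02, 2.19316553e-05, 1.41286102e-04,
    1.54903100e-05, 3.50893795e-01,
]

def argmax(values):
    """Index of the largest entry, i.e. the predicted digit class."""
    return max(range(len(values)), key=lambda i: values[i])

predicted_class = argmax(first_prediction)
print(predicted_class)
```

The largest entry, about 0.60, sits at index 4, so this untrained network would guess the digit 4 for that example.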


# The highest probability in the first entry.
print('First prediction', numpy.argmax(predictions[0]))

# But, predictions is actually a list of BATCH_SIZE probability vectors.
print(predictions.shape)

# So, we'll take the highest probability for each vector.
print('All predictions', numpy.argmax(predictions, 1))

First prediction 4
(60, 10)
All predictions [4 4 2 7 7 7 7 7 7 7 7 7 0 8 9 0 7 7 0 7 4 0 5 0 9 9 7 0 7 4 7 7 7 0 7 7 9
 7 9 9 0 7 7 7 2 7 0 7 2 9 9 9 9 9 0 7 9 4 8 7]


Next, we can do the same thing for our labels -- using argmax to convert our 1-hot encoding into a digit class.

print('Batch labels', numpy.argmax(batch_labels, 1))

Batch labels [7 3 4 6 1 8 1 0 9 8 0 3 1 2 7 0 2 9 6 0 1 6 7 1 9 7 6 5 5 8 8 3 4 4 8 7 3
 6 4 6 6 3 8 8 9 9 4 4 0 7 8 1 0 0 1 8 5 7 1 7]

Now we can compare the predicted and label classes to compute the error rate and confusion matrix for this batch.

correct = numpy.sum(numpy.argmax(predictions, 1) == numpy.argmax(batch_labels, 1))
total = predictions.shape[0]

# Fraction of the batch classified correctly; the error rate is 1 minus this.
print(float(correct) / float(total))

confusions = numpy.zeros([10, 10], numpy.float32)
bundled = zip(numpy.argmax(predictions, 1), numpy.argmax(batch_labels, 1))
for predicted, actual in bundled:
    confusions[predicted, actual] += 1

plt.imshow(confusions, cmap=plt.cm.jet, interpolation='nearest');



Now let's wrap this up into our scoring function.

def error_rate(predictions, labels):
    """Return the error rate and confusions."""
    correct = numpy.sum(numpy.argmax(predictions, 1) == numpy.argmax(labels, 1))
    total = predictions.shape[0]

    error = 100.0 - (100 * float(correct) / float(total))

    confusions = numpy.zeros([10, 10], numpy.float32)
    bundled = zip(numpy.argmax(predictions, 1), numpy.argmax(labels, 1))
    for predicted, actual in bundled:
        confusions[predicted, actual] += 1

    # Return only after every (predicted, actual) pair has been counted.
    return error, confusions




We'll need to train for some time to actually see useful predicted values. Let's define a loop that will go through our data. We'll print the loss and error periodically.

Here, we want to iterate over the entire data set rather than just the first batch, so we'll need to slice the data to that end.

(One pass through our training set will take some time on a CPU, so be patient if you are executing this notebook.)
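One common way to slice successive minibatches is to advance an offset by the batch size each step and wrap it with a modulo so that a full batch always fits. The sketch below shows that offset arithmetic on toy sizes; the helper name minibatch_offsets and the numbers (10 examples, batches of 4) are made up for illustration.

```python
def minibatch_offsets(train_size, batch_size, num_steps):
    """Start index of each minibatch, wrapping so a full batch always fits."""
    return [(step * batch_size) % (train_size - batch_size)
            for step in range(num_steps)]

train_size, batch_size = 10, 4     # toy sizes
offsets = minibatch_offsets(train_size, batch_size, num_steps=4)
print(offsets)

# Every slice [offset, offset + batch_size) stays inside the data.
slices = [list(range(o, o + batch_size)) for o in offsets]
print(slices)
```

This scheme is deterministic, so every epoch visits the data in the same order; shuffling between epochs would be a natural refinement.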



# Train for one pass over our training set.
steps = train_size // BATCH_SIZE
for step in range(steps):
    # Compute the offset of the current minibatch in the data.
    # Note that we could use better randomization across epochs.
    offset = (step * BATCH_SIZE) % (train_size - BATCH_SIZE)
    batch_data = train_data[offset:(offset + BATCH_SIZE), :, :, :]
    batch_labels = train_labels[offset:(offset + BATCH_SIZE)]
    # This dictionary maps the batch data (as a numpy array) to the
    # node in the graph it should be fed to.
    feed_dict = {train_data_node: batch_data,
                 train_labels_node: batch_labels}
    # Run the graph and fetch some of the nodes.
    _, l, lr, predictions = s.run(
        [optimizer, loss, learning_rate, train_prediction],
        feed_dict=feed_dict)

    # Print out the loss periodically.
    if step % 100 == 0:
        error, _ = error_rate(predictions, batch_labels)
        print('Step %d of %d' % (step, steps))
        print('Mini-batch loss: %.5f Error: %.5f Learning rate: %.5f' % (l, error, lr))
        print('Validation error: %.1f%%' % error_rate(
            validation_prediction.eval(), validation_labels)[0])


Step 0 of 916
Mini-batch loss: 7.71249 Error: 91.66667 Learning rate: 0.01000
Validation error: 88.9%
Step 100 of 916
Mini-batch loss: 3.28715 Error: 8.33333 Learning rate: 0.01000
Validation error: 5.8%
Step 200 of 916



The error seems to have gone down. Let's evaluate the results using the test set.

To help identify rare mispredictions, we'll include the raw count of each (prediction, label) pair in the confusion matrix.


test_error, confusions = error_rate(test_prediction.eval(), test_labels)
print('Test error: %.1f%%' % test_error)

plt.imshow(confusions, cmap=plt.cm.jet, interpolation='nearest');

for i, cas in enumerate(confusions):
    for j, count in enumerate(cas):
        if count > 0:
            xoff = .07 * len(str(count))
            plt.text(j - xoff, i + .2, int(count), fontsize=9, color='white')

Test error: 2.0%


We can see here that we're mostly accurate, with some errors you might expect, e.g., '9' is often confused with '4'.
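Spotting the most frequent confusion by eye works for a 10x10 plot, but it can also be read off programmatically as the largest off-diagonal entry. The sketch below does this on a small made-up 3x3 matrix, using the same indexing convention as the scoring function, confusions[predicted][actual]; the helper name most_confused_pair is hypothetical.

```python
def most_confused_pair(confusions):
    """Return (predicted, actual, count) for the largest off-diagonal entry."""
    best = None
    for predicted, row in enumerate(confusions):
        for actual, count in enumerate(row):
            # Diagonal entries are correct predictions, so skip them.
            if predicted != actual and (best is None or count > best[2]):
                best = (predicted, actual, count)
    return best

# Made-up counts for a three-class problem.
toy_confusions = [
    [50, 2, 0],
    [1, 47, 3],
    [0, 6, 41],
]
worst = most_confused_pair(toy_confusions)
print(worst)
```

In this toy matrix the most common mistake is predicting class 2 when the true class is 1.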



Let's do another sanity check to make sure this matches roughly the distribution of our test set, e.g., it seems like we have fewer '5' values.
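One way to run such a check is to count how often each digit occurs among the 1-hot labels. The sketch below is an assumption about how this could be done, not the notebook's own cell: it decodes the labels with argmax and tallies them with numpy.bincount. The helper name label_counts and the ten toy labels are made up for illustration.

```python
import numpy

def label_counts(one_hot_labels, num_labels=10):
    """Count how many examples carry each digit label."""
    digits = numpy.argmax(one_hot_labels, 1)
    # minlength keeps a slot for digits that never occur.
    return numpy.bincount(digits, minlength=num_labels)

# Ten made-up labels, encoded 1-hot the same way as the real label arrays.
toy_digits = numpy.array([7, 2, 1, 0, 4, 1, 4, 9, 5, 9])
toy_labels = (numpy.arange(10) == toy_digits[:, None]).astype(numpy.float32)

counts = label_counts(toy_labels)
print(counts)
```

Applied to the real test labels, the resulting counts could be drawn as a bar chart or histogram to compare the frequency of each digit.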






Indeed, we appear to have fewer 5 labels in the test set. So, on the whole, it seems like our model is learning and our early results are sensible.

But we've only done one round of training. We can greatly improve accuracy by training for longer. To try this out, just re-execute the training cell above.

