This is a gentle introduction to neural networks and deep learning. In this article, I will use a Jupyter notebook with Python and Keras.
Click here for an idea of how to install and import Keras in Anaconda/Jupyter Notebooks.
There are also other Python packages for neural networks and deep learning.
In this section, you will learn how to build and use a fully connected or dense neural network to perform simple classification tasks on images of hand-written digits.
To do this, we will use the MNIST data set that is included with the Keras package.
We will do this in four steps:
Loading data - Here we download the images into numpy arrays.
Constructing the network - In this step we specify the architecture of the network that we plan to use.
Training the network - In this step we use an optimization algorithm known as stochastic gradient descent to minimize a loss function of our choosing.
Testing the network - In this step, we run our test data through the network to determine the accuracy of our classifications.
import keras
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()  # Download the MNIST images and labels as numpy arrays
print(type(train_images))         # Class of the image array (numpy.ndarray)
print(type(train_images[0,0,0]))  # Data type of a single pixel (numpy.uint8)
The MNIST data set contains two image arrays, “train_images” and “test_images”. The training data will be used to train our algorithms. However, in order to assess the accuracy of our resulting classifier, we will need a separate set of testing images. The training data has 60,000 grayscale images of size 28x28 while the testing data has 10,000; both are stored as numpy ND-arrays of uint8 integers.
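As a quick check, we can also print the shapes and data types of these arrays directly; the expected values shown in the comments follow from the description above.

print(train_images.shape)   # (60000, 28, 28) - training images
print(test_images.shape)    # (10000, 28, 28) - testing images
print(train_images.dtype)   # uint8 pixel values in the range 0-255
print(train_labels.shape)   # (60000,) - one integer label per training image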
The following example shows how to visualize the first image in the training set using the “imshow” function in the matplotlib package.
import matplotlib.pyplot as plt          # Plotting library

image = train_images[0, :, :]            # Select the first training image
plt.imshow(image, cmap='gray')           # Display it as a gray scale image
ax = plt.gca()                           # Get handle to the current axes
ax.grid(False)                           # Turn off the grid
plt.show()                               # Show the image
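If you want to see more than one digit at a time, a small optional variation of the same idea plots the first few training images together with their labels, using the same matplotlib functions.

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 5, figsize=(10, 2))       # One row of five images
for i, ax in enumerate(axes):
    ax.imshow(train_images[i, :, :], cmap='gray')     # i-th training image
    ax.set_title(f'Label: {train_labels[i]}')         # Show its class label
    ax.axis('off')                                     # Hide ticks and grid
plt.show()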
Deep neural networks typically consist of successive processing layers, with each layer formed by a linear transformation followed by a non-linear operation.
In this section, each layer of our network will be formed by a fully connected layer (Dense).
This approach is simple, but it tends to result in networks that require a great deal of computation and many parameters to optimize.
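To make "a linear transformation followed by a non-linear operation" concrete, here is a minimal NumPy sketch of what a single dense layer with a ReLU computes. The weight matrix W and bias b below are random placeholders standing in for the parameters the network will learn.

import numpy as np

x = train_images[0].reshape(784).astype('float32') / 255   # Flattened, rescaled input image
W = np.random.randn(512, 784).astype('float32') * 0.01     # Placeholder weights (512 x 784)
b = np.zeros(512, dtype='float32')                          # Placeholder biases

y = np.maximum(0, W @ x + b)     # Linear transformation followed by the ReLU non-linearity
print(y.shape)                   # (512,)

Because every one of the 512 outputs depends on all 784 inputs, the weight matrix alone already has over 400,000 entries, which is why dense networks accumulate parameters quickly.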
Here is how we define a sequential model from the keras.models package and how we add dense layers from the keras.layers package.
from keras import models
from keras import layers
network = models.Sequential()
network.add(layers.Flatten(input_shape=(28, 28, 1)))
network.add(layers.Dense(512, activation='relu'))
network.add(layers.Dense(10, activation='softmax'))
network.summary()
The Sequential function creates a placeholder named “network” for the newly created network.

network.add(layers.Flatten(input_shape=(28, 28, 1)))
network.add() - This command adds a layer to the model.
input_shape=(28, 28, 1) - This specifies that the input images will have size 28x28x1. This parameter only needs to be specified for the first layer; subsequent layers infer the proper input size automatically.
Flatten - Since we are using a fully connected layer first, this reshapes the input tensor into a 1-D vector.

network.add(layers.Dense(512, activation='relu'))
layers.Dense() - This command specifies that the layer will be a fully connected or “dense” linear layer.
512 - The first argument specifies that there will be 512 outputs from this first dense layer of the network.
activation='relu' - This specifies that the non-linear activation will be a rectified linear unit, commonly known as a ReLU. This is a simple and effective choice.

The following command adds a second layer to the network:
network.add(layers.Dense(10, activation='softmax'))
The output dimension is 10, and the activation function for this layer is the “softmax” function, a non-linear operator that converts the output into a pseudo probability distribution.
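As a small illustration of what softmax does, here is a hand-rolled version applied to an arbitrary vector of layer outputs (the values are made up just for this example).

import numpy as np

z = np.array([2.0, 1.0, 0.1])             # Example raw layer outputs
probs = np.exp(z) / np.sum(np.exp(z))     # Softmax: exponentiate, then normalize
print(probs)                               # Roughly [0.66, 0.24, 0.10]
print(probs.sum())                         # 1.0, so it behaves like a probability distribution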
Finally, we can print a summary of the neural network we have constructed using the command network.summary() as shown below. The dimensions that are of size None correspond to the batch size. The summary also shows how many trainable parameters are in the network and where they are located.
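The parameter counts in the summary can be checked by hand: a dense layer has one weight for each input/output pair plus one bias per output.

layer1_params = 784 * 512 + 512        # 401,920 parameters in the first dense layer
layer2_params = 512 * 10 + 10          # 5,130 parameters in the output layer
print(layer1_params + layer2_params)   # 407,050 trainable parameters in total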
Optimization:
In order to train the network, we will run an optimization engine that minimizes a loss function that we choose.
The key to optimizing neural networks is an algorithm called back propagation, which computes the gradient of the loss function with respect to the network parameters, or weights.
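The details of back propagation are beyond the scope of this article, but the update it enables is simple: each weight is nudged in the direction that reduces the loss. Here is a tiny sketch of that idea for a one-parameter loss; it is only an illustration of gradient descent, not the actual Keras machinery.

# Minimize L(w) = (w - 3)^2 with plain gradient descent
w = 0.0                            # Initial weight
learning_rate = 0.1
for step in range(25):
    grad = 2 * (w - 3)             # Gradient of the loss with respect to w
    w = w - learning_rate * grad   # Gradient descent update
print(w)                           # Close to 3, the minimizer of the loss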
The following command sets up the optimizer and sets its parameters.
network.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
Data preprocessing:
Since the weights in the neural network are initialized near zero, we also rescale the data into the range from 0 to 1 (i.e., roughly the same order of magnitude as the weights).
More information about initialization of weights can be found here.
train_images_nor = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255  # Add a channel dimension and rescale to [0, 1]
test_images_nor = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255
from keras.utils import to_categorical
train_labels_cat = to_categorical(train_labels)
test_labels_cat = to_categorical(test_labels)
print(f'Training Labels Shape: {train_labels_cat.shape}')
print(f'Testing Labels Shape: {test_labels_cat.shape}')
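The to_categorical function converts each integer class label into a one-hot vector of length 10. You can check this on a single label; for example, if the first training label is a 5, its encoded version has a 1 in position 5 and 0 everywhere else.

print(train_labels[0])        # Integer label, e.g. 5
print(train_labels_cat[0])    # One-hot vector, e.g. [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]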
Training:
The fit
command can next be used to run the optimization algorithm and train the network.
Key parameters are defined below:
train_images_nor - This is the input data array used for training the network.
train_labels_cat - These are the labels used for training.
epochs=5 - This specifies that the algorithm should run 5 epochs, where a single epoch is defined as one full pass through all the training data.
batch_size=128 - This specifies that data is processed in batches of 128 training images. With each batch, a gradient update is performed for the network parameters.

hist = network.fit(train_images_nor, train_labels_cat, epochs=5, batch_size=128)
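The hist object returned by fit records the loss and accuracy at the end of each epoch, so the training curves can be plotted afterwards. Here is a minimal sketch; note that in some older Keras versions the accuracy key is 'acc' rather than 'accuracy'.

import matplotlib.pyplot as plt

plt.plot(hist.history['loss'], label='loss')          # Training loss per epoch
plt.plot(hist.history['accuracy'], label='accuracy')  # Training accuracy per epoch ('acc' in older Keras)
plt.xlabel('Epoch')
plt.legend()
plt.show()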
test_loss, test_acc = network.evaluate(test_images_nor, test_labels_cat)
print('test_accuracy:', test_acc)
The optimization algorithm prints out the value of the loss as well as the accuracy of the network immediately after each epoch during training.
Notice that the final accuracy is about 99% after 5 epochs of training, which seems quite good.
However, it isn’t clear if this result can be trusted since the network was provided the correct classes in the training process.
In order to better assess the network's performance, we next run it on the new testing data to see its accuracy.
This step, which is often called “inference”, is run with the network parameters that we learned from the training step.
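As a final check, the trained network can also be used to classify individual images with predict. The short sketch below runs the first five test images through the network and compares the predicted digit (the class with the highest softmax output) to the true label.

import numpy as np

predictions = network.predict(test_images_nor[:5])   # Softmax outputs, shape (5, 10)
predicted_digits = np.argmax(predictions, axis=1)    # Most probable class for each image
print(predicted_digits)                              # Predicted digits
print(test_labels[:5])                               # True digits for comparison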