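Image Recognition with Convolutional Neural Networks

Introduction

This is a machine learning and computer vision project. The framework is very well designed, and you fill in the core code yourself. The project works through a single image recognition task in increasing depth: KNN and PCA-KNN reach roughly 30%-40% accuracy, a multilayer perceptron reaches about 50%, and finally a convolutional neural network reaches about 60%. It is a fun exercise.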


Machine Learning and Computer Vision

Problem 1: Install Tensorflow

Follow the directions on https://www.tensorflow.org/install/ to install Tensorflow on your computer.

Note: You will not need GPU support for this assignment, so don’t worry if you don’t have a GPU. Installing with GPU support is also often more difficult to configure, so it is suggested that you install the CPU-only version. However, if you have a GPU and would like to install GPU support, feel free to do so at your own risk.

Note: On Windows, Tensorflow is only supported in Python 3, so you will need to install Python 3 for this assignment.

Run the following cell to verify your installation.

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
b'Hello, TensorFlow!'

Problem 2: Downloading CIFAR10

Download the CIFAR10 dataset. You will need the python version.
Extract the data to ./data
Once extracted run the following cell to view a few example images.

import numpy as np

# unpickles raw data files
def unpickle(file):
    import pickle
    import sys
    with open(file, 'rb') as fo:
        if sys.version_info[0] < 3:
            data = pickle.load(fo)
        else:
            # Python 3 needs encoding='bytes' to read these pickles, so keys are bytes (e.g. b'data')
            data = pickle.load(fo, encoding='bytes')
    return data

# loads data from a single file
def getBatch(file):
    dict = unpickle(file)
    data = dict[b'data'].reshape(-1,3,32,32).transpose(0,2,3,1)
    labels = np.asarray(dict[b'labels'], dtype=np.int64)
    return data,labels

# loads all training and testing data
def getData(path='./data'):
    classes = [s.decode('UTF-8') for s in unpickle(path+'/batches.meta')[b'label_names']]
    
    trainData, trainLabels = [], []
    for i in range(5):
        data, labels = getBatch(path+'/data_batch_%d'%(i+1))
        trainData.append(data)
        trainLabels.append(labels)
    trainData = np.concatenate(trainData)
    trainLabels = np.concatenate(trainLabels)
    
    testData, testLabels = getBatch(path+'/test_batch')
    return classes, trainData, trainLabels, testData, testLabels

# training and testing data that will be used in the following problems
classes, trainData, trainLabels, testData, testLabels = getData()

# display some example images
import matplotlib.pyplot as plt
%matplotlib inline

plt.figure(figsize=(14, 6))
for i in range(14):
    plt.subplot(2,7,i+1)
    plt.imshow(trainData[i])
    plt.title(classes[trainLabels[i]])
plt.show()

print ('train shape: ' + str(trainData.shape) + ', ' + str(trainLabels.shape))
print ('test shape : ' + str(testData.shape) + ', ' + str(testLabels.shape))

[Figure 1: the first 14 CIFAR10 training images, each titled with its class name]

train shape: (50000, 32, 32, 3), (50000,)
test shape : (10000, 32, 32, 3), (10000,)
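
The reshape and transpose in getBatch are easier to follow on a toy example. The sketch below assumes, as the CIFAR10 python files do, that each row stores all red values first, then all green, then all blue, with each plane in row-major order. The 2x2 image size and the variable names are made up for illustration.

```python
import numpy as np

# a fake "data file" with one 2x2 RGB image stored channel-first:
# 4 red values, then 4 green values, then 4 blue values
h, w = 2, 2
row = np.arange(3 * h * w).reshape(1, -1)      # shape (1, 12)

# (N, 3*H*W) -> (N, 3, H, W) -> (N, H, W, 3)
images = row.reshape(-1, 3, h, w).transpose(0, 2, 3, 1)

print(images.shape)        # (1, 2, 2, 3)
print(images[0, 0, 0])     # RGB of the top-left pixel: [0 4 8]
print(images[0, 1, 1])     # RGB of the bottom-right pixel: [ 3  7 11]
```

After the transpose, the last axis holds the three color values of one pixel, which is the layout plt.imshow expects.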

Below are some helper functions that will be used in the following problems.

# a generator for batches of data
# yields data (batchsize, 32, 32, 3) and labels (batchsize)
# if shuffle, it will load batches in a random order
def DataBatch(data, label, batchsize, shuffle=True):
    n = data.shape[0]
    if shuffle:
        index = np.random.permutation(n)
    else:
        index = np.arange(n)
    for i in range(int(np.ceil(n/batchsize))):
        inds = index[i*batchsize : min(n,(i+1)*batchsize)]
        yield data[inds], label[inds]

# tests the accuracy of a classifier
def test(testData, testLabels, classifier):
    batchsize=50
    correct=0.
    for data,label in DataBatch(testData,testLabels,batchsize):
        prediction = classifier(data)
        #print (prediction)
        correct += np.sum(prediction==label)
    return correct/testData.shape[0]*100

# a sample classifier
# given an input it outputs a random class
class RandomClassifier():
    def __init__(self, classes=10):
        self.classes=classes
    def __call__(self, x):
        return np.random.randint(self.classes, size=x.shape[0])

randomClassifier = RandomClassifier()
print ('Random classifier accuracy: %f'%test(testData, testLabels, randomClassifier))
Random classifier accuracy: 9.530000
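
One detail of DataBatch worth checking is the last batch: when the number of examples is not a multiple of batchsize, the final batch is simply shorter. The sketch below reproduces only the index arithmetic; batch_slices is a hypothetical helper, not part of the assignment code.

```python
import numpy as np

def batch_slices(n, batchsize):
    """Yield (start, stop) index pairs that cover range(n) in order."""
    for i in range(int(np.ceil(n / batchsize))):
        yield i * batchsize, min(n, (i + 1) * batchsize)

sizes = [stop - start for start, stop in batch_slices(10, 4)]
print(sizes)   # [4, 4, 2]
```

Because test() sums the number of correct predictions and divides by the total number of test examples, a short final batch does not distort the accuracy.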

Problem 3: Confusion Matrix

Here you will implement a test script that computes the confusion matrix for a classifier.
The matrix should be nxn where n is the number of classes.
Entry M[i,j] should contain the number of times an image of class i was classified as class j.
M should be normalized such that each row sums to 1.

Hint: see the function test() above for reference.

def confusion(testData, testLabels, classifier):
    
    """your code here"""

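    n = len(np.unique(testLabels))  # number of classes
    prediction = classifier(testData)
    M = np.zeros((n,n))
    # M[i,j] counts the images of class i that were classified as class j
    for i,j in zip(testLabels,prediction):
        M[i,j]+=1
    # normalize so that each row sums to 1
    M = M/M.sum(axis=1, keepdims=True)
    return M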

def VisualizeConfussion(M):
    plt.figure(figsize=(14, 6))
    plt.imshow(M)#, vmin=0, vmax=1)
    plt.xticks(np.arange(len(classes)), classes, rotation='vertical')
    plt.yticks(np.arange(len(classes)), classes)
    plt.show()

M = confusion(testData, testLabels, randomClassifier)
VisualizeConfussion(M)

[Figure 2: confusion matrix of the random classifier]
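
To make the row normalization concrete, here is a small worked example on six hand-written labels. It is only a sketch: confusion_from_labels is a hypothetical helper that takes label arrays directly instead of a classifier, and it guards against empty rows, which the assignment does not ask for.

```python
import numpy as np

def confusion_from_labels(true_labels, predicted_labels, n_classes):
    """Row-normalized confusion matrix from two integer label arrays."""
    M = np.zeros((n_classes, n_classes))
    # np.add.at accumulates correctly even when an (i, j) pair repeats
    np.add.at(M, (true_labels, predicted_labels), 1)
    row_sums = M.sum(axis=1, keepdims=True)
    # avoid dividing by zero for classes that never appear in true_labels
    return M / np.maximum(row_sums, 1)

true = np.array([0, 0, 0, 1, 1, 2])
pred = np.array([0, 0, 1, 1, 2, 2])
M = confusion_from_labels(true, pred, n_classes=3)
print(M)
```

Row 0 becomes [2/3, 1/3, 0]: two of the three class-0 examples were classified correctly and one was mistaken for class 1.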

Problem 4: K-Nearest Neighbors (KNN)

Here you will implement a simple KNN classifier. The distance metric is Euclidean distance in pixel space. k refers to the number of neighbors involved in voting on the class.

Hint: you may want to use: sklearn.neighbors.KNeighborsClassifier

from sklearn.neighbors import KNeighborsClassifier
class KNNClassifer():
    def __init__(self, k=3):
        """your code here"""
        # k is the number of neighbors involved in voting
        self.k = k

    def train(self, trainData, trainLabels):
        """your code here"""
        # pass k explicitly; KNeighborsClassifier() alone would use its default of 5 neighbors
        self.model = KNeighborsClassifier(n_neighbors=self.k)
        # flatten each image into a single row of pixel values
        trainData = trainData.reshape((trainData.shape[0], -1))
        self.model.fit(trainData, trainLabels)

    def __call__(self, x):
        """your code here"""
        # this method should take a batch of images (batchsize, 32, 32, 3) and return a batch of predictions (batchsize)
        # predictions should be int64 values in the range [0,9] corresponding to the class that the image belongs to
        x = x.reshape((x.shape[0], -1))
        y = self.model.predict(x)
        return y

    
# test your classifier with only the first 100 training examples (use this while debugging)
# note you should get around 10-20% accuracy
knnClassiferX = KNNClassifer()
knnClassiferX.train(trainData[:100], trainLabels[:100])
print ('KNN classifier accuracy: %f'%test(testData, testLabels, knnClassiferX))
KNN classifier accuracy: 17.410000
# test your classifier with all the training examples (This may take a while)
# note you should get around 30% accuracy
knnClassifer = KNNClassifer()
knnClassifer.train(trainData, trainLabels)
print ('KNN classifier accuracy: %f'%test(testData, testLabels, knnClassifer))

# display confusion matrix for your KNN classifier with all the training examples
M = confusion(testData, testLabels, knnClassifer)
VisualizeConfussion(M)
KNN classifier accuracy: 33.980000
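
For readers who want to see what the library call does, the following is a minimal numpy sketch of KNN with Euclidean distance and majority voting. It is not how sklearn implements it; knn_predict and the toy 2-D points are made up for illustration, and real image data would first need to be flattened and cast to float so that the uint8 pixel values do not overflow.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=3):
    """Predict labels for test_x by majority vote among the k nearest training rows."""
    # squared Euclidean distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (
        np.sum(test_x ** 2, axis=1)[:, None]
        - 2.0 * test_x @ train_x.T
        + np.sum(train_x ** 2, axis=1)[None, :]
    )
    # indices of the k smallest distances in each row (their order is irrelevant)
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    votes = train_y[nearest]                       # shape (n_test, k)
    # majority vote; ties go to the smallest class index
    return np.array([np.bincount(v).argmax() for v in votes])

train_x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
train_y = np.array([0, 0, 0, 1, 1, 1])
test_x = np.array([[0.2, 0.3], [5.5, 5.5]])
predictions = knn_predict(train_x, train_y, test_x, k=3)
print(predictions)   # [0 1]
```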

Problem 5: Principal Component Analysis (PCA) K-Nearest Neighbors (KNN)

Here you will implement a simple KNN classifier in PCA space.
You should implement PCA yourself using SVD (you may not use sklearn.decomposition.PCA
or any other package that directly implements PCA transformations).

Hint: Don’t forget to apply the same normalization at test time.

Note: you should get similar accuracy to above, but it should run faster.

class PCAKNNClassifer():
    def __init__(self, components=25, k=3):
        """your code here"""
        self.components = components  # number of principal components to keep
        self.k = k                    # number of neighbors involved in voting

    def train(self, trainData, trainLabels):
        """your code here"""
        self.model = KNeighborsClassifier(n_neighbors=self.k)
        # flatten each image into a single row of pixel values
        Mat = trainData.reshape((trainData.shape[0], -1)).astype('float64')

        # compute the principal components with PCA:
        # center the data and keep the training mean for use at test time
        self.mean = np.mean(Mat, 0)
        Mat = Mat - self.mean

        # SVD of the covariance matrix; the columns of u are the principal directions
        cov_Mat = np.dot(Mat.T, Mat)/(Mat.shape[0]-1)
        u,d,v = np.linalg.svd(cov_Mat)
        self.u = u[:,:self.components]

        # project onto the leading components and fit KNN in PCA space
        self.model.fit(np.dot(Mat, self.u), trainLabels)

    def __call__(self, x):
        """your code here"""
        Mat = x.reshape((x.shape[0], -1)).astype('float64')
        # apply the same normalization as in training (the training mean, not the batch mean)
        Mat = Mat - self.mean
        y = self.model.predict(np.dot(Mat, self.u))
        return y

        
    
# test your classifier with only the first 100 training examples (use this while debugging)
pcaknnClassiferX = PCAKNNClassifer()
pcaknnClassiferX.train(trainData[:100], trainLabels[:100])
print ('PCA-KNN classifier accuracy: %f'%test(testData, testLabels, pcaknnClassiferX))
PCA-KNN classifier accuracy: 16.530000
# test your classifier with all the training examples (This may take a few minutes)
pcaknnClassifer = PCAKNNClassifer()
pcaknnClassifer.train(trainData, trainLabels)
print ('KNN classifier accuracy: %f'%test(testData, testLabels, pcaknnClassifer))

# display the confusion matrix
M = confusion(testData, testLabels, pcaknnClassifer)
VisualizeConfussion(M)
KNN classifier accuracy: 39.800000

[Figure 3: confusion matrix of the PCA-KNN classifier]
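
An equivalent way to get the principal directions is to take the SVD of the centered data matrix itself rather than of its covariance matrix. The sketch below uses small random data (an assumption for illustration, not CIFAR10) to show the fit and transform steps, the reuse of the training mean at test time, and the relation between the two formulations. Which route is cheaper depends on the shape of the data, so treat this as an illustration rather than a recommendation.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 10))
test = rng.normal(size=(5, 10))
components = 3

# fit: center with the training mean, then take the SVD of the centered data
mean = train.mean(axis=0)
centered = train - mean
_, s, vt = np.linalg.svd(centered, full_matrices=False)
basis = vt[:components].T                 # (features, components)

# transform: reuse the training mean and basis for any new data
train_proj = centered @ basis
test_proj = (test - mean) @ basis

# the singular values relate to the covariance eigenvalues by s**2 / (n - 1)
cov_eigs = np.linalg.eigvalsh(np.cov(train, rowvar=False))[::-1]
same = np.allclose(s ** 2 / (train.shape[0] - 1), cov_eigs)
print(same)                               # True
print(train_proj.shape, test_proj.shape)  # (200, 3) (5, 3)
```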

Deep learning

Below is some helper code to train your deep networks

Hint: see https://www.tensorflow.org/get_started/mnist/pros or https://www.tensorflow.org/get_started/mnist/beginners for reference

# base class for your Tensorflow networks. It implements the training loop (train) and prediction(__call__)  for you.
# You will need to implement the __init__ function to define the networks structures in the following problems
class TFClassifier():
    def __init__(self):
        pass
    
    def train(self, trainData, trainLabels, epochs=1, batchsize=50):
        self.prediction = tf.argmax(self.y,1)
        self.cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=self.y_, logits=self.y))
        self.train_step = tf.train.AdamOptimizer(1e-4).minimize(self.cross_entropy)
        self.correct_prediction = tf.equal(self.prediction, self.y_)
        self.accuracy = tf.reduce_mean(tf.cast(self.correct_prediction, tf.float32))
        
        self.sess.run(tf.global_variables_initializer())
        
        for epoch in range(epochs):
            for i, (data,label) in enumerate(DataBatch(trainData, trainLabels, batchsize, shuffle=True)):
                _, acc = self.sess.run([self.train_step, self.accuracy], feed_dict={self.x: data, self.y_: label})
                #if i%100==99:
                #    print ('%d/%d %d %f'%(epoch, epochs, i, acc))
                    
            print ('testing epoch:%d accuracy: %f'%(epoch+1, test(testData, testLabels, self)))
        
    def __call__(self, x):
        return self.sess.run(self.prediction, feed_dict={self.x: x})

# helper function to get weight variable
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.01)
    return tf.Variable(initial)

# helper function to get bias variable
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# example linear classifier
class LinearClassifer(TFClassifier):
    def __init__(self, classes=10):
        self.sess = tf.Session()

        self.x = tf.placeholder(tf.float32, shape=[None,32,32,3]) # input batch of images
        self.y_ = tf.placeholder(tf.int64, shape=[None]) # input labels

        # model variables
        self.W = weight_variable([32*32*3,classes])
        self.b = bias_variable([classes])

        # linear operation
        self.y = tf.matmul(tf.reshape(self.x,(-1,32*32*3)),self.W) + self.b
        
# test the example linear classifier (note you should get around 20-30% accuracy)
linearClassifer = LinearClassifer()
linearClassifer.train(trainData, trainLabels, epochs=20)

# display confusion matrix
M = confusion(testData, testLabels, linearClassifer)
VisualizeConfussion(M)
testing epoch:1 accuracy: 23.910000
testing epoch:2 accuracy: 27.150000
testing epoch:3 accuracy: 28.420000
testing epoch:4 accuracy: 26.790000
testing epoch:5 accuracy: 29.410000
testing epoch:6 accuracy: 28.210000
testing epoch:7 accuracy: 28.040000
testing epoch:8 accuracy: 29.030000
testing epoch:9 accuracy: 25.070000
testing epoch:10 accuracy: 25.520000
testing epoch:11 accuracy: 32.700000
testing epoch:12 accuracy: 26.960000
testing epoch:13 accuracy: 27.370000
testing epoch:14 accuracy: 29.700000
testing epoch:15 accuracy: 24.490000
testing epoch:16 accuracy: 27.470000
testing epoch:17 accuracy: 28.980000
testing epoch:18 accuracy: 29.040000
testing epoch:19 accuracy: 25.450000
testing epoch:20 accuracy: 28.610000

[Figure 4: confusion matrix of the linear classifier]
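
The loss minimized in TFClassifier.train is the mean of tf.nn.sparse_softmax_cross_entropy_with_logits over the batch. The numpy sketch below is meant to show the quantity being computed, not TensorFlow's implementation; the function name and the toy inputs are assumptions for illustration.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer class labels."""
    # subtract the row maximum for numerical stability; softmax is unchanged
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # pick the log-probability of the correct class for each row
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.zeros((4, 10))            # a model that is equally unsure about all classes
labels = np.array([3, 1, 4, 1])
loss = sparse_softmax_cross_entropy(logits, labels)
print(loss, np.log(10))               # both are about 2.3026
```

A loss near log(10) therefore corresponds to chance-level predictions on ten classes, which is a useful reference point when watching training.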

Problem 6: Multi Layer Perceptron (MLP)

Here you will implement an MLP. The MLP should consist of 3 linear layers (matrix multiplication and bias offset) that map to the following feature dimensions:

32x32x3 -> hidden

hidden -> hidden

hidden -> classes

The first two linear layers should be followed with a ReLU nonlinearity. The final layer should not have a nonlinearity applied as we desire the raw logits output (see: the documentation for tf.nn.sparse_softmax_cross_entropy_with_logits used in the training)

The final output of the computation graph should be stored in self.y as that will be used in the training.

Hint: see the example linear classifier

Note: you should get around 50% accuracy

class MLPClassifer(TFClassifier):
    def __init__(self, classes=10, hidden=100):
        self.sess = tf.Session()

        self.x = tf.placeholder(tf.float32, shape=[None,32,32,3]) # input batch of images
        self.y_ = tf.placeholder(tf.int64, shape=[None]) # input labels

        """your code here"""
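        # input layer weights, shape [32*32*3, 1000] (hidden*10 with the default hidden=100)
        self.W1 = weight_variable([32*32*3,hidden*10])
        self.b1 = bias_variable([hidden*10])
        
        # weights between the first and second hidden layers, shape [1000, 100]
        self.W2 = weight_variable([hidden*10,hidden])
        self.b2 = bias_variable([hidden])
        
        # weights between the second hidden layer and the output layer, shape [100, 10]
        self.W3 = weight_variable([hidden,classes])
        self.b3 = bias_variable([classes])
        
        # first hidden layer with ReLU activation
        self.hidden1 = tf.nn.relu(tf.matmul(tf.reshape(self.x,(-1,32*32*3)), self.W1) + self.b1)

        # second hidden layer with ReLU activation
        self.hidden2 = tf.nn.relu(tf.matmul(self.hidden1, self.W2) + self.b2)

        # output layer: linear transform only, producing raw logits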
        self.y = tf.matmul(self.hidden2, self.W3) + self.b3

# test your MLP classifier (note you should get around 50% accuracy)
mlpClassifer = MLPClassifer()
mlpClassifer.train(trainData, trainLabels, epochs=20)

# display confusion matrix
M = confusion(testData, testLabels, mlpClassifer)
VisualizeConfussion(M)
testing epoch:1 accuracy: 37.790000
testing epoch:2 accuracy: 42.510000
testing epoch:3 accuracy: 45.190000
testing epoch:4 accuracy: 45.060000
testing epoch:5 accuracy: 47.180000
testing epoch:6 accuracy: 48.400000
testing epoch:7 accuracy: 46.450000
testing epoch:8 accuracy: 48.190000
testing epoch:9 accuracy: 49.500000
testing epoch:10 accuracy: 50.230000
testing epoch:11 accuracy: 50.970000
testing epoch:12 accuracy: 49.440000
testing epoch:13 accuracy: 49.480000
testing epoch:14 accuracy: 49.980000
testing epoch:15 accuracy: 50.620000
testing epoch:16 accuracy: 51.400000
testing epoch:17 accuracy: 52.450000
testing epoch:18 accuracy: 50.270000
testing epoch:19 accuracy: 50.670000
testing epoch:20 accuracy: 51.900000

[Figure 5: confusion matrix of the MLP classifier]
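
The shapes in the MLP can be checked without TensorFlow. The numpy sketch below runs only a forward pass with randomly initialized weights, using the layer sizes from the problem statement (32x32x3 -> hidden -> hidden -> classes) rather than the wider first layer in the solution above. The helper names are made up for illustration, and no training happens here.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def mlp_forward(x, params):
    """Forward pass: two ReLU hidden layers followed by a linear layer (raw logits)."""
    (W1, b1), (W2, b2), (W3, b3) = params
    flat = x.reshape(x.shape[0], -1)      # (batch, 32*32*3)
    h1 = relu(flat @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return h2 @ W3 + b3                   # no nonlinearity on the output

hidden, classes = 100, 10
sizes = [32 * 32 * 3, hidden, hidden, classes]
# small random weights and constant 0.1 biases, mirroring the helper functions above
params = [(rng.normal(scale=0.01, size=(m, n)), np.full(n, 0.1))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.random((5, 32, 32, 3))
logits = mlp_forward(x, params)
print(logits.shape)                       # (5, 10)
```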

Problem 7: Convolutional Neural Network (CNN)

Here you will implement a CNN with the following architecture:

ReLU( Conv(kernel_size=4x4 stride=2, output_features=n) )

ReLU( Conv(kernel_size=4x4 stride=2, output_features=n*2) )

ReLU( Conv(kernel_size=4x4 stride=2, output_features=n*4) )

Linear(output_features=classes)

def conv2d(x, W, stride=2):
    return tf.nn.conv2d(x, W, strides=[1, stride, stride, 1], padding='SAME')

class CNNClassifer(TFClassifier):
    def __init__(self, classes=10, n=16):
        self.sess = tf.Session()

        self.x = tf.placeholder(tf.float32, shape=[None,32,32,3]) # input batch of images
        self.y_ = tf.placeholder(tf.int64, shape=[None]) # input labels
        """your code here"""
        # initialize the weights and biases of the convolutional layers
        conv1_weight = tf.Variable(tf.truncated_normal([4,4,3,n],stddev=0.05,dtype=tf.float32))
        conv1_bias = tf.Variable(tf.truncated_normal([n],stddev=0.05,dtype=tf.float32))

        conv2_weight = tf.Variable(tf.truncated_normal([4,4,n,n*2],stddev=0.05,dtype=tf.float32))
        conv2_bias = tf.Variable(tf.truncated_normal([n*2],stddev=0.05,dtype=tf.float32))

        conv3_weight = tf.Variable(tf.truncated_normal([4,4,n*2,n*4],stddev=0.05,dtype=tf.float32))
        conv3_bias = tf.Variable(tf.truncated_normal([n*4],stddev=0.05,dtype=tf.float32))

        # build the CNN: every convolution uses stride 2 (see conv2d above);
        # the two max-pooling layers are an addition to the specified architecture
        # and shrink the final feature map to 1x1, so it flattens to n*4 features
        conv1 = conv2d(self.x,conv1_weight)
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1,conv1_bias))
        max_pool1 = tf.nn.max_pool(relu1,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

        conv2 = conv2d(max_pool1,conv2_weight)
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2,conv2_bias))
        max_pool2 = tf.nn.max_pool(relu2,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

        conv3 = conv2d(max_pool2,conv3_weight)
        relu3 = tf.nn.relu(tf.nn.bias_add(conv3,conv3_bias))

        # linear output layer
        self.W = weight_variable([n*4,classes])
        self.b = bias_variable([classes])

        # linear operation
        self.y = tf.matmul(tf.reshape(relu3,(-1,n*4)),self.W) + self.b


# test your CNN classifier (note you should get around 65% accuracy)
cnnClassifer = CNNClassifer()
cnnClassifer.train(trainData, trainLabels, epochs=20)

# display confusion matrix
M = confusion(testData, testLabels, cnnClassifer)
VisualizeConfussion(M)
testing epoch:1 accuracy: 40.500000
testing epoch:2 accuracy: 43.380000
testing epoch:3 accuracy: 46.930000
testing epoch:4 accuracy: 48.190000
testing epoch:5 accuracy: 50.180000
testing epoch:6 accuracy: 51.990000
testing epoch:7 accuracy: 53.040000
testing epoch:8 accuracy: 53.060000
testing epoch:9 accuracy: 54.430000
testing epoch:10 accuracy: 55.080000
testing epoch:11 accuracy: 55.130000
testing epoch:12 accuracy: 55.900000
testing epoch:13 accuracy: 56.700000
testing epoch:14 accuracy: 56.630000
testing epoch:15 accuracy: 56.710000
testing epoch:16 accuracy: 57.580000
testing epoch:17 accuracy: 57.630000
testing epoch:18 accuracy: 58.490000
testing epoch:19 accuracy: 59.120000
testing epoch:20 accuracy: 59.420000

[Figure 6: confusion matrix of the CNN classifier]
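
The feature-map sizes in the CNN can be worked out by hand. With padding='SAME', the spatial output size of a convolution or pooling layer is ceil(input / stride), independent of the kernel size. The sketch below applies that rule to the architecture as specified and to the network as coded above, which inserts two max-pooling layers; same_out is a hypothetical helper.

```python
import math

def same_out(size, stride):
    """Spatial output size of a conv or pool layer with padding='SAME'."""
    return math.ceil(size / stride)

# architecture as specified: three stride-2 convolutions
size = 32
spec = []
for _ in range(3):
    size = same_out(size, 2)
    spec.append(size)
print(spec)            # [16, 8, 4]

# network as coded above: conv, pool, conv, pool, conv (all stride 2)
size = 32
coded = []
for _ in range(5):
    size = same_out(size, 2)
    coded.append(size)
print(coded)           # [16, 8, 4, 2, 1]
```

This is why reshaping relu3 to (-1, n*4) works in the coded version: the last feature map is 1x1 with n*4 channels. Without the pooling layers, the last feature map would be 4x4, and the linear layer would need 4*4*n*4 inputs instead.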