The requirements for this experiment follow the deep learning tutorial, Exercise: Convolution and Pooling.
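This experiment classifies RGB color images with a convolutional neural network: the CNN first learns a 3600-dimensional feature vector from each image, and a four-class softmax classifier is then trained on those features.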
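The whole network has four parts: a linear decoder, convolution, pooling, and softmax regression. The linear decoder has an input layer of 8*8*3 = 192 neurons, a hidden layer of 400 neurons (neither count includes the bias node), and an output layer of 8*8*3 = 192 neurons; the features are learned with this linear decoder.
The convolution uses 8*8 patches (one layer), and the pooling is 19*19 mean pooling (one layer).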
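With this structure, given a 64*64*3 RGB image, the convolution produces a 400*57*57*3 array (64-8+1 = 57; 400 is the number of hidden units, each of which is one feature). For convenience, the experiment sums over the three RGB channels, which leaves 400*57*57 values. Pooling then yields a 400*3*3 (57/19 = 3) three-dimensional array, which is flattened into a vector of length 3600 that represents the image. After passing through this network, every 64*64*3 RGB image becomes a 3600-dimensional vector, which softmax regression then classifies.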
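The size arithmetic above can be checked with a small Python sketch. The function name and parameter names are hypothetical and chosen only for illustration; the post's own code is MATLAB.

```python
def feature_shapes(image_dim=64, patch_dim=8, pool_dim=19, num_features=400):
    """Return (convolved_dim, pooled_dim, feature_length) for one image."""
    # A 'valid' convolution keeps only positions where the patch fits fully.
    convolved_dim = image_dim - patch_dim + 1
    # Non-overlapping pooling regions of pool_dim x pool_dim.
    pooled_dim = convolved_dim // pool_dim
    # One pooled_dim x pooled_dim map per feature, flattened into one vector.
    feature_length = num_features * pooled_dim * pooled_dim
    return convolved_dim, pooled_dim, feature_length


print(feature_shapes())  # (57, 3, 3600)
```

With the default values this reproduces the 57, 3, and 3600 quoted in the text.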
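The dataset is the STL-10 image set. Each sample is a labeled 96x96 color image belonging to one of ten classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck (the copies used in this experiment are 64*64). To reduce computation time, only the four classes airplane, car, cat, and dog are used. The training set has 2000 images and the test set has 3200.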
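The data has already been arranged as a four-dimensional array, images(r, c, channel, image number): the first dimension is the row, the second is the column, the third is the channel (RGB), and the fourth is the image index. In this layout the training set has size 64*64*3*2000.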
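Before the full run, the convolution and pooling implementations were checked. As the figure below shows, convolution and pooling are carried out in 8 batches (400/50 = 8), each handling 50 features, to avoid running out of memory.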
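The batching described above can be sketched in Python as follows. The helper name feature_batches and the parameter step_size are hypothetical; the sketch only shows how 400 features split into 8 ranges of 50, and does not reproduce the exercise's driver script.

```python
def feature_batches(num_features=400, step_size=50):
    """Yield (first, last) 1-based inclusive feature ranges, one per batch."""
    for start in range(0, num_features, step_size):
        yield start + 1, min(start + step_size, num_features)


batches = list(feature_batches())
print(len(batches))             # 8
print(batches[0], batches[-1])  # (1, 50) (351, 400)
```

Only the convolved output for one batch of 50 features has to be held in memory at a time; after pooling, each batch shrinks to a 3*3 map per feature and per image, which is cheap to keep.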
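Finally, a four-class softmax classifier is trained on the 3600-dimensional features; its accuracy on the test set is 80.406% (mean pooling). I also replaced mean pooling with max pooling, which gave an accuracy of 78.563%, so the choice of pooling method has a noticeable effect on the final result.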
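To make the difference between the two pooling methods concrete, here is a minimal pure-Python sketch on a made-up 4*4 feature map with 2*2 pooling regions. The function names and the sample values are illustrative assumptions, not data from the experiment.

```python
def pool(feature_map, pool_dim, reduce_fn):
    """Pool a square 2-D list over non-overlapping pool_dim x pool_dim regions."""
    result_dim = len(feature_map) // pool_dim
    pooled = []
    for pool_row in range(result_dim):
        row = []
        for pool_col in range(result_dim):
            # Collect every value inside the current pooling region.
            patch = [
                feature_map[r][c]
                for r in range(pool_row * pool_dim, (pool_row + 1) * pool_dim)
                for c in range(pool_col * pool_dim, (pool_col + 1) * pool_dim)
            ]
            row.append(reduce_fn(patch))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 2, 0, 0],
    [3, 4, 0, 8],
    [0, 0, 1, 1],
    [0, 4, 1, 1],
]
print(pool(feature_map, 2, mean))  # [[2.5, 2.0], [1.0, 1.0]]
print(pool(feature_map, 2, max))   # [[4, 8], [4, 1]]
```

Mean pooling summarizes the whole region, while max pooling keeps only the strongest response, so a single large activation (the 8 above) dominates its region under max pooling.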
Experimental result 1
Accuracy (mean pooling)
Accuracy (max pooling)
Source code
cnnConvolve.m
function convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch)
%cnnConvolve Returns the convolution of the features given by W and b with
%the given images
%
% Parameters:
%  patchDim - patch (feature) dimension
%  numFeatures - number of features
%  images - large images to convolve with, matrix in the form
%           images(r, c, channel, image number)
%  W, b - W, b for features from the sparse autoencoder
%  ZCAWhite, meanPatch - ZCAWhitening and meanPatch matrices used for
%                        preprocessing
%
% Returns:
%  convolvedFeatures - matrix of convolved features in the form
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)

numImages = size(images, 4);
imageDim = size(images, 1);
imageChannels = size(images, 3);

convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);

% Instructions:
%   Convolve every feature with every large image here to produce the
%   numFeatures x numImages x (imageDim - patchDim + 1) x (imageDim - patchDim + 1)
%   matrix convolvedFeatures, such that
%   convolvedFeatures(featureNum, imageNum, imageRow, imageCol) is the
%   value of the convolved featureNum feature for the imageNum image over
%   the region (imageRow, imageCol) to (imageRow + patchDim - 1, imageCol + patchDim - 1)
%
% Expected running times:
%   Convolving with 100 images should take less than 3 minutes
%   Convolving with 5000 images should take around an hour
%   (So to save time when testing, you should convolve with less images, as
%   described earlier)

% -------------------- YOUR CODE HERE --------------------
% Precompute the matrices that will be used during the convolution. Recall
% that you need to take into account the whitening and mean subtraction
% steps

WT = W * ZCAWhite;            % equivalent network weights acting on raw patches
b_mean = b - WT * meanPatch;  % bias correction, needed because the input is not mean-subtracted

% --------------------------------------------------------

patchSize = patchDim * patchDim;  % number of weights per channel (64 for 8x8 patches)

for imageNum = 1:numImages
  for featureNum = 1:numFeatures

    % convolution of image with feature matrix for each channel
    convolvedImage = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1);
    for channel = 1:imageChannels

      % Obtain the feature (patchDim x patchDim) needed during the convolution
      % ---- YOUR CODE HERE ----
      offset = (channel - 1) * patchSize;
      feature = reshape(WT(featureNum, offset+1 : offset+patchSize), patchDim, patchDim);
      % ------------------------

      % Flip the feature matrix because of the definition of convolution
      feature = flipud(fliplr(squeeze(feature)));

      % Obtain the image
      im = squeeze(images(:, :, channel, imageNum));

      % Convolve "feature" with "im", adding the result to convolvedImage
      % be sure to do a 'valid' convolution
      % ---- YOUR CODE HERE ----
      convolvedOneChannel = conv2(im, feature, 'valid');
      convolvedImage = convolvedImage + convolvedOneChannel;
      % ------------------------

    end

    % Add the bias unit (corrected for the mean subtraction),
    % then apply the sigmoid function to get the hidden activation
    % ---- YOUR CODE HERE ----
    convolvedImage = sigmoid(convolvedImage + b_mean(featureNum));
    % ------------------------

    % The convolved feature is the sum of the convolved values for all channels
    convolvedFeatures(featureNum, imageNum, :, :) = convolvedImage;
  end
end

end

function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
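The precomputation of WT and b_mean in cnnConvolve.m folds the preprocessing (mean subtraction followed by ZCA whitening) into the weights and bias, so raw image patches can be convolved directly. The following Python sketch checks that identity on tiny made-up 2*2 matrices; the helper functions and all numeric values are illustrative assumptions, whereas in the experiment W is 400*192 and ZCAWhite is 192*192.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


# Made-up values for illustration only.
W = [[1.0, 2.0], [0.5, -1.0]]
b = [0.25, -0.5]
zca_white = [[2.0, 0.0], [1.0, 1.0]]
mean_patch = [0.5, 1.5]
x = [3.0, 4.0]  # one raw (unpreprocessed) patch

# Two-step form: whiten the mean-subtracted patch, then apply W and b.
whitened = matvec(zca_white, [xi - mi for xi, mi in zip(x, mean_patch)])
two_step = [z + bi for z, bi in zip(matvec(W, whitened), b)]

# Folded form, as in cnnConvolve.m: WT = W*ZCAWhite, b_mean = b - WT*meanPatch.
WT = matmul(W, zca_white)
b_mean = [bi - t for bi, t in zip(b, matvec(WT, mean_patch))]
folded = [z + bi for z, bi in zip(matvec(WT, x), b_mean)]

print(two_step)  # [15.25, -3.0]
print(folded)    # [15.25, -3.0]
```

Both forms give the same pre-activation, which is why the sigmoid in cnnConvolve.m can be applied to the convolution of the raw image with WT plus b_mean.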
cnnPool.m
function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
%cnnPool Pools the given convolved features
%
% Parameters:
%  poolDim - dimension of pooling region
%  convolvedFeatures - convolved features to pool (as given by cnnConvolve)
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)
%
% Returns:
%  pooledFeatures - matrix of pooled features in the form
%                   pooledFeatures(featureNum, imageNum, poolRow, poolCol)
%

numImages = size(convolvedFeatures, 2);
numFeatures = size(convolvedFeatures, 1);
convolvedDim = size(convolvedFeatures, 3);

pooledFeatures = zeros(numFeatures, numImages, floor(convolvedDim / poolDim), floor(convolvedDim / poolDim));

% -------------------- YOUR CODE HERE --------------------
% Instructions:
%   Now pool the convolved features in regions of poolDim x poolDim,
%   to obtain the
%   numFeatures x numImages x (convolvedDim/poolDim) x (convolvedDim/poolDim)
%   matrix pooledFeatures, such that
%   pooledFeatures(featureNum, imageNum, poolRow, poolCol) is the
%   value of the featureNum feature for the imageNum image pooled over the
%   corresponding (poolRow, poolCol) pooling region
%   (see http://ufldl/wiki/index.php/Pooling )
%
%   Use mean pooling here.
% -------------------- YOUR CODE HERE --------------------

resultDim = floor(convolvedDim / poolDim);
for imageNum = 1:numImages
    for featureNum = 1:numFeatures
        for poolRow = 1:resultDim
            offsetRow = 1 + (poolRow - 1) * poolDim;
            for poolCol = 1:resultDim
                offsetCol = 1 + (poolCol - 1) * poolDim;
                patch = convolvedFeatures(featureNum, imageNum, offsetRow:offsetRow+poolDim-1, ...
                                          offsetCol:offsetCol+poolDim-1);
                pooledFeatures(featureNum, imageNum, poolRow, poolCol) = mean(patch(:));  % mean pooling
                %pooledFeatures(featureNum, imageNum, poolRow, poolCol) = max(patch(:));  % max pooling
            end
        end
    end
end

end