Convolution and Pooling

      博文参考standford UFLDL教程working with large images小节。


1、卷积特征提取

之前做过的练习如sparse autoencoders、softmax regression、stacked autoencoders等处理的都是比较小的图像,如8x8啊,28x28啊,那时用的是全联通网络(full connected networks),就是隐含层的每个单元都是与输入层的全部单元连接的,如果图像很大的话,比如96*96,那么每个隐含层的单元都要有96*96个权重,如果要学习100个特征的话,就有接近100w个权重了,这么多权重参数学习速度会很慢,而且容易导致过拟合。一个解决办法是使用部分联通网络(locally connected networks)。

     

     部分联通网络的优点:

  1. 由于隐含单元与输入单元的连接有了限制,每个隐含单元仅仅连接输入层的一部分,要学习的参数大大减少;对于图像而言,每个隐含单元仅仅连接图像中的某一块小区域;
  2. 这种部分联通的网络结构符合生物学里面的视觉神经系统,视觉皮层的神经元是局部接受信息的(神经元只响应部分区域的刺激)。
     
      用一个特征矩阵从图片中学习到卷积特征,这个特征矩阵只对图片中大小相等的矩阵作卷积,所以特征矩阵会从图片的不同区域作卷积得到一个响应度矩阵即卷积特征矩阵。如果图片大小是rxc的话,有k个特征矩阵,每个特征矩阵大小为axb,那么每张图片就可以学习到k个大小为(r-a+1)x(c-b+1)的卷积特征。
    
2、池化
      我们卷积之后的特征向量维数还是很大,用这样的特征去训练分类器还是会过拟合,一种办法是进行池化(pooling),池化是对卷积后的特征进行聚合,可以有平均池化(mean pooling)和最大池化(max pooling),池化的作用是具有平移不变性,一个图像某个区域平移到另一块区域后,通过卷积特征后池化还是具有一样的效果的。


3、Exercise: Convolution and Pooling
          该实验是用之前Linear Decoders训练出来的特征来对数据进行卷积、池化的得到训练集和测试集,然后用softmax训练分类器,因为在Linear Decoders中用的也是STL-10数据集,是8x8的RGB patches的特征,这里也是用STL-10数据集,是64x64的GRB图片,可以进行8x8的卷积。

    matlab基础知识:
  1. randi([m n])返回[m,n]中的一个随机整数;
  2. squeeze(a)是去掉a中维度大小为1的维,比如a=rand(2,1,3),squeeze(a)后变为2x3,但是元素还是一样的;
    实验重要函数说明:
  1. convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch);这个函数是求卷积特征,卷积矩阵大小为patchDim*patchDim,有numFeatures个特征(隐含层单元数),images是要被卷积的图片数据,W是权重矩阵,ZCAWhite是白化矩阵,b、meanPatch分别是长度为patchDim*patchDim的偏置项、图片patch平均值;
  2. pooledFeatures = cnnPool(poolDim, convolvedFeatures);对poolDim*poolDim大小的矩阵进行池化。

      要注意的是由于之前学特征时用了白化处理,所以在进行卷积特征提取时也要进行一样的处理。

     实验结果:Accuracy: 80.281%

    matlab代码:
    
    cnnExercise.m
%% CS294A/CS294W Convolutional Neural Networks Exercise

%  Instructions
%  ------------
% 
%  This file contains code that helps you get started on the
%  convolutional neural networks exercise. In this exercise, you will only
%  need to modify cnnConvolve.m and cnnPool.m. You will not need to modify
%  this file.

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

imageDim = 64;         % image dimension
imageChannels = 3;     % number of channels (rgb, so 3)

patchDim = 8;          % patch dimension
numPatches = 50000;    % number of patches

visibleSize = patchDim * patchDim * imageChannels;  % number of input units 
outputSize = visibleSize;   % number of output units
hiddenSize = 400;           % number of hidden units 

epsilon = 0.1;	       % epsilon for ZCA whitening

poolDim = 19;          % dimension of pooling region

%%======================================================================
%% STEP 1: Train a sparse autoencoder (with a linear decoder) to learn 
%  features from color patches. If you have completed the linear decoder
%  execise, use the features that you have obtained from that exercise, 
%  loading them into optTheta. Recall that we have to keep around the 
%  parameters used in whitening (i.e., the ZCA whitening matrix and the
%  meanPatch)

% --------------------------- YOUR CODE HERE --------------------------
% Train the sparse autoencoder and fill the following variables with 
% the optimal parameters:

optTheta =  zeros(2*hiddenSize*visibleSize+hiddenSize+visibleSize, 1);
ZCAWhite =  zeros(visibleSize, visibleSize);
meanPatch = zeros(visibleSize, 1);

%载入之前linear decoder练习中学到的参数
load 'STL10Features.mat'

% --------------------------------------------------------------------

% Display and check to see that the features look good
W = reshape(optTheta(1:visibleSize * hiddenSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);

displayColorNetwork( (W*ZCAWhite)');

%%======================================================================
%% STEP 2: Implement and test convolution and pooling
%  In this step, you will implement convolution and pooling, and test them
%  on a small part of the data set to ensure that you have implemented
%  these two functions correctly. In the next step, you will actually
%  convolve and pool the features with the STL10 images.

%% STEP 2a: Implement convolution
%  Implement convolution in the function cnnConvolve in cnnConvolve.m

% Note that we have to preprocess the images in the exact same way 
% we preprocessed the patches before we can obtain the feature activations.

load stlTrainSubset.mat % loads numTrainImages, trainImages, trainLabels

%% Use only the first 8 images for testing
convImages = trainImages(:, :, :, 1:8); 

% NOTE: Implement cnnConvolve in cnnConvolve.m first!
convolvedFeatures = cnnConvolve(patchDim, hiddenSize, convImages, W, b, ZCAWhite, meanPatch);

%% STEP 2b: Checking your convolution
%  To ensure that you have convolved the features correctly, we have
%  provided some code to compare the results of your convolution with
%  activations from the sparse autoencoder

% For 1000 random points
for i = 1:1000    %随机挑选1000个patch进行验证
    featureNum = randi([1, hiddenSize]); %随机选一个feature
    imageNum = randi([1, 8]); %随机选张图
    imageRow = randi([1, imageDim - patchDim + 1]); %随机选valid的一行
    imageCol = randi([1, imageDim - patchDim + 1]); %随机选valid的一列   
   
    patch = convImages(imageRow:imageRow + patchDim - 1, imageCol:imageCol + patchDim - 1, :, imageNum); %取出那张图RGB通道的那个patch
    patch = patch(:);  %组合成长向量           
    patch = patch - meanPatch; %预处理
    patch = ZCAWhite * patch;
    
    features = feedForwardAutoencoder(optTheta, hiddenSize, visibleSize, patch); %算出激活值
    %与convoledFeatures比较是否相等
    if abs(features(featureNum, 1) - convolvedFeatures(featureNum, imageNum, imageRow, imageCol)) > 1e-9
        fprintf('Convolved feature does not match activation from autoencoder\n');
        fprintf('Feature Number    : %d\n', featureNum);
        fprintf('Image Number      : %d\n', imageNum);
        fprintf('Image Row         : %d\n', imageRow);
        fprintf('Image Column      : %d\n', imageCol);
        fprintf('Convolved feature : %0.5f\n', convolvedFeatures(featureNum, imageNum, imageRow, imageCol));
        fprintf('Sparse AE feature : %0.5f\n', features(featureNum, 1));       
        error('Convolved feature does not match activation from autoencoder');
    end 
end

disp('Congratulations! Your convolution code passed the test.');

%% STEP 2c: Implement pooling
%  Implement pooling in the function cnnPool in cnnPool.m

% NOTE: Implement cnnPool in cnnPool.m first!
pooledFeatures = cnnPool(poolDim, convolvedFeatures);

%% STEP 2d: Checking your pooling
%  To ensure that you have implemented pooling, we will use your pooling
%  function to pool over a test matrix and check the results.

testMatrix = reshape(1:64, 8, 8);
expectedMatrix = [mean(mean(testMatrix(1:4, 1:4))) mean(mean(testMatrix(1:4, 5:8))); ...
                  mean(mean(testMatrix(5:8, 1:4))) mean(mean(testMatrix(5:8, 5:8))); ];
            
testMatrix = reshape(testMatrix, 1, 1, 8, 8);
        
pooledFeatures = squeeze(cnnPool(4, testMatrix)); %pool 4*4的矩阵

if ~isequal(pooledFeatures, expectedMatrix)
    disp('Pooling incorrect');
    disp('Expected');
    disp(expectedMatrix);
    disp('Got');
    disp(pooledFeatures);
    error('Pooled feature does not match expection.');
else
    disp('Congratulations! Your pooling code passed the test.');
end

%%======================================================================
%% STEP 3: Convolve and pool with the dataset
%  In this step, you will convolve each of the features you learned with
%  the full large images to obtain the convolved features. You will then
%  pool the convolved features to obtain the pooled features for
%  classification.
%
%  Because the convolved features matrix is very large, we will do the
%  convolution and pooling 50 features at a time to avoid running out of
%  memory. Reduce this number if necessary

stepSize = 50;
assert(mod(hiddenSize, stepSize) == 0, 'stepSize should divide hiddenSize');

load stlTrainSubset.mat % loads numTrainImages, trainImages, trainLabels
load stlTestSubset.mat  % loads numTestImages,  testImages,  testLabels

pooledFeaturesTrain = zeros(hiddenSize, numTrainImages, ...
    floor((imageDim - patchDim + 1) / poolDim), ...
    floor((imageDim - patchDim + 1) / poolDim) );
pooledFeaturesTest = zeros(hiddenSize, numTestImages, ...
    floor((imageDim - patchDim + 1) / poolDim), ...
    floor((imageDim - patchDim + 1) / poolDim) );

tic();

%每次仅计算stepSize个特征,之所以这样是因为卷积特征矩阵太大了,为了避免out of memory
for convPart = 1:(hiddenSize / stepSize)
    
    featureStart = (convPart - 1) * stepSize + 1; %特征起点
    featureEnd = convPart * stepSize; %特征终点
    
    fprintf('Step %d: features %d to %d\n', convPart, featureStart, featureEnd);  
    Wt = W(featureStart:featureEnd, :); %取出特征矩阵
    bt = b(featureStart:featureEnd);    
    
    fprintf('Convolving and pooling train images\n');
    %计算卷积特征
    convolvedFeaturesThis = cnnConvolve(patchDim, stepSize, ...
        trainImages, Wt, bt, ZCAWhite, meanPatch);
    pooledFeaturesThis = cnnPool(poolDim, convolvedFeaturesThis); %pooling
    pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;   %计算好的特征放进去
    toc();
    clear convolvedFeaturesThis pooledFeaturesThis;
    
    fprintf('Convolving and pooling test images\n');
    convolvedFeaturesThis = cnnConvolve(patchDim, stepSize, ...
        testImages, Wt, bt, ZCAWhite, meanPatch);
    pooledFeaturesThis = cnnPool(poolDim, convolvedFeaturesThis);
    pooledFeaturesTest(featureStart:featureEnd, :, :, :) = pooledFeaturesThis;   
    toc();

    clear convolvedFeaturesThis pooledFeaturesThis;

end


% You might want to save the pooled features since convolution and pooling takes a long time
save('cnnPooledFeatures.mat', 'pooledFeaturesTrain', 'pooledFeaturesTest');
%load 'cnnPooledFeatures.mat';
toc();

%%======================================================================
%% STEP 4: Use pooled features for classification
%  Now, you will use your pooled features to train a softmax classifier,
%  using softmaxTrain from the softmax exercise.
%  Training the softmax classifer for 1000 iterations should take less than
%  10 minutes.

% Add the path to your softmax solution, if necessary
% addpath /path/to/solution/

% Setup parameters for softmax
softmaxLambda = 1e-4;
numClasses = 4;
% Reshape the pooledFeatures to form an input vector for softmax
softmaxX = permute(pooledFeaturesTrain, [1 3 4 2]);
softmaxX = reshape(softmaxX, numel(pooledFeaturesTrain) / numTrainImages,...
    numTrainImages);
softmaxY = trainLabels;

options = struct;
options.maxIter = 200;
softmaxModel = softmaxTrain(numel(pooledFeaturesTrain) / numTrainImages,...
    numClasses, softmaxLambda, softmaxX, softmaxY, options);

%%======================================================================
%% STEP 5: Test classifer
%  Now you will test your trained classifer against the test images

softmaxX = permute(pooledFeaturesTest, [1 3 4 2]);
softmaxX = reshape(softmaxX, numel(pooledFeaturesTest) / numTestImages, numTestImages);
softmaxY = testLabels;

[pred] = softmaxPredict(softmaxModel, softmaxX);
acc = (pred(:) == softmaxY(:));
acc = sum(acc) / size(acc, 1);
fprintf('Accuracy: %2.3f%%\n', acc * 100);

% You should expect to get an accuracy of around 80% on the test images.

cnnConvolve.m
function convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch)
%用稀疏自编码学习到的W和b去卷积images
%W有400(numFeatures)行,每行是一个变换,可以对image进行卷积
%这样每张图片都能卷积到numFeatures个feature矩阵,叫feature map吧
%cnnConvolve Returns the convolution of the features given by W and b with
%the given images
%
% Parameters:
%  patchDim - patch (feature) dimension
%  numFeatures - number of features
%  images - large images to convolve with, matrix in the form
%           images(r, c, channel, image number)
%  W, b - W, b for features from the sparse autoencoder
%  ZCAWhite, meanPatch - ZCAWhitening and meanPatch matrices used for
%                        preprocessing
%
% Returns:
%  convolvedFeatures - matrix of convolved features in the form
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)

numImages = size(images, 4); %样本图片
imageDim = size(images, 1); %图片大小
imageChannels = size(images, 3); %图片颜色通道
% 初始化images的卷积特征矩阵,numImages张图片,每张图片numFeataures个卷积矩阵,每个矩阵size为imageDim-patchDim+1
convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);
patchSize = patchDim * patchDim;

% Instructions:
%   Convolve every feature with every large image here to produce the 
%   numFeatures x numImages x (imageDim - patchDim + 1) x (imageDim - patchDim + 1) 
%   matrix convolvedFeatures, such that 
%   convolvedFeatures(featureNum, imageNum, imageRow, imageCol) is the
%   value of the convolved featureNum feature for the imageNum image over
%   the region (imageRow, imageCol) to (imageRow + patchDim - 1, imageCol + patchDim - 1)
%
% Expected running times: 
%   Convolving with 100 images should take less than 3 minutes 
%   Convolving with 5000 images should take around an hour
%   (So to save time when testing, you should convolve with less images, as
%   described earlier)

% -------------------- YOUR CODE HERE --------------------
% Precompute the matrices that will be used during the convolution. Recall
% that you need to take into account the whitening and mean subtraction
% steps
%W是特征矩阵,每行有patchDim*patchDim*3个元素
W = W*ZCAWhite;
b = b - W*meanPatch;



% --------------------------------------------------------

convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);
for imageNum = 1:numImages
  for featureNum = 1:numFeatures

    % convolution of image with feature matrix for each channel
    convolvedImage = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1);
    for channel = 1:3

      % Obtain the feature (patchDim x patchDim) needed during the convolution
      % ---- YOUR CODE HERE ----
      feature = zeros(8,8); % You should replace this
      offset = (channel - 1) * patchSize; %patchSize = patchDim * patchDim
      feature = reshape(W(featureNum, offset+1:offset+patchSize), patchDim, patchDim);  %从W特征矩阵中取出第featureNum个特征的第channel个通道对应的特征
      
      
      
      % ------------------------

      % Flip the feature matrix because of the definition of convolution, as explained later
      feature = flipud(fliplr(squeeze(feature)));
      
      % Obtain the image
      im = squeeze(images(:, :, channel, imageNum));

      % Convolve "feature" with "im", adding the result to convolvedImage
      % be sure to do a 'valid' convolution
      % ---- YOUR CODE HERE ----

      convolvedImage = convolvedImage + conv2(im, feature, 'valid'); %把RGB通道的特征响应加起来
      
      
      % ------------------------

    end
    
    % Subtract the bias unit (correcting for the mean subtraction as well)
    % Then, apply the sigmoid function to get the hidden activation
    % ---- YOUR CODE HERE ----
    convolvedImage = sigmoid(convolvedImage + b(featureNum));
    
    
    
    % ------------------------
    
    % The convolved feature is the sum of the convolved values for all channels
    convolvedFeatures(featureNum, imageNum, :, :) = convolvedImage;
  end
end


end

function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end

cnnPool.m
function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
%cnnPool Pools the given convolved features
%
% Parameters:
%  poolDim - dimension of pooling region
%  convolvedFeatures - convolved features to pool (as given by cnnConvolve)
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)
%
% Returns:
%  pooledFeatures - matrix of pooled features in the form
%                   pooledFeatures(featureNum, imageNum, poolRow, poolCol)
%     

numImages = size(convolvedFeatures, 2);
numFeatures = size(convolvedFeatures, 1);
convolvedDim = size(convolvedFeatures, 3);

pooledFeatures = zeros(numFeatures, numImages, floor(convolvedDim / poolDim), floor(convolvedDim / poolDim));

% -------------------- YOUR CODE HERE --------------------
% Instructions:
%   Now pool the convolved features in regions of poolDim x poolDim,
%   to obtain the 
%   numFeatures x numImages x (convolvedDim/poolDim) x (convolvedDim/poolDim) 
%   matrix pooledFeatures, such that
%   pooledFeatures(featureNum, imageNum, poolRow, poolCol) is the 
%   value of the featureNum feature for the imageNum image pooled over the
%   corresponding (poolRow, poolCol) pooling region 
%   (see http://ufldl/wiki/index.php/Pooling )
%   
%   Use mean pooling here.
% -------------------- YOUR CODE HERE --------------------
%对poolDim*poolDim的patch进行平均池化
numRows = convolvedDim / poolDim; %池化后总行数
numCols = convolvedDim / poolDim; %池化后总列数

for imageNum = 1:numImages
    for featureNum = 1:numFeatures
        for poolRow = 1:numRows
            for poolCol = 1:numCols
                pooledFeatures(featureNum, imageNum, poolRow, poolCol) = ...
                    mean(mean(convolvedFeatures(featureNum, imageNum, (poolRow-1)*poolDim+1:poolRow*poolDim, (poolCol-1)*poolDim+1:poolCol*poolDim)));
            end
        end
    end
end
        
end



     
    

你可能感兴趣的:(Convolution and Pooling)