The main goal of this program is to build a deeper understanding of the fundamentals of neural networks by implementing the AlexNet network from scratch: convolution, pooling, local response normalization, fully connected layers, momentum-based stochastic gradient descent, and the parameter-update algorithm of a convolutional neural network.
Writing this post helps me organize my own thinking and keeps me on schedule, and I hope it is also useful to others who want to understand the inner workings of convolutional neural networks.
Thanks to the many thorough analyses of AlexNet that already exist, the whole network structure can essentially be reconstructed. One question still came up: the output of the first convolutional layer has a depth of 96, so why do the second-layer kernels have a depth of only 48?
Because of limited GPU memory, AlexNet splits the kernels of each convolutional layer into two groups that are computed on two GPUs. The 96 is the total number of kernels across both GPUs; each GPU holds 48 kernels, so the layer outputs two tensors of depth 48. This leads to the following conclusion:
Grouping has a large effect on the number of kernel parameters. Take the transition from the first layer to the second: the first layer has 96 kernels, split into two groups of 48 each. After the pooling layer the output is two tensors of size 27*27*48, so the second-layer kernels have a depth of 48 and there are 128+128 of them, giving (5*5*48*128+128)*2 = 307456 parameters for the second layer. Without grouping, the first layer would output a single 27*27*96 tensor, the second-layer kernels would have a depth of 96 with 256 of them, and the parameter count would be 5*5*96*256+256 = 614656, almost twice as many.
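As a quick sanity check, the two counts can be reproduced with a couple of lines of MATLAB:
grouped = (5*5*48*128 + 128) * 2;   % two groups of 128 kernels, each of depth 48
ungrouped = 5*5*96*256 + 256;       % one group of 256 kernels, each of depth 96
fprintf('grouped: %d, ungrouped: %d\n', grouped, ungrouped);   % 307456 vs 614656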
The AlexNet structure code is given below; kernalcell holds the kernel parameters and bias holds the biases.
function [ output,kernalcell,bias,Edeltak,Edeltab ] = myalexnet( X,label,kernalcell,bias,Edeltak,Edeltab )
%1st Layer: Conv (w ReLu) -> Lrn -> Pool
conv1_1 = conv(X, kernalcell{1}{1},bias{1}{1},0,4, 4);
conv1_2 = conv(X, kernalcell{1}{2}, bias{1}{2},0,4, 4);%55*55*48
norm1_1 = local_response_norm(conv1_1, 2,1, 2e-05, 0.75);
norm1_2 = local_response_norm(conv1_2, 2,1, 2e-05, 0.75);
pool1_1 = max_pool(norm1_1, 3, 3, 2, 2);
pool1_2 = max_pool(norm1_2, 3, 3, 2, 2);%27*27*48
%2nd Layer: Conv (w ReLu) -> Lrn -> Pool
conv2_1 = conv(pool1_1, kernalcell{2}{1}, bias{2}{1}, 2, 1, 1);
conv2_2 = conv(pool1_2, kernalcell{2}{2}, bias{2}{2}, 2, 1, 1);%27*27*128
conv2 = cat(3,conv2_1,conv2_2);
norm2 = local_response_norm(conv2, 2,1, 2e-05, 0.75);
pool2 = max_pool(norm2, 3, 3, 2, 2);%13*13*256*b
%3rd Layer: Conv (w ReLu)
conv3_1 = conv(pool2, kernalcell{3}{1}, bias{3}{1},1,1, 1);
conv3_2 = conv(pool2, kernalcell{3}{2},bias{3}{2},1,1, 1);%13*13*192*b
%4th Layer: Conv (w ReLu) splitted into two groups
conv4_1 = conv(conv3_1, kernalcell{4}{1},bias{4}{1},1, 1, 1);
conv4_2 = conv(conv3_2, kernalcell{4}{2},bias{4}{2},1, 1, 1);%13*13*192*b
%5th Layer: Conv (w ReLu) -> Pool splitted into two groups
conv5_1 = conv(conv4_1, kernalcell{5}{1},bias{5}{1}, 1,1, 1);
conv5_2 = conv(conv4_2, kernalcell{5}{2},bias{5}{2}, 1,1, 1); %output: 13*13*128*batch
conv5 = cat(3,conv5_1,conv5_2);%13*13*256*b
pool5 = max_pool(conv5, 3, 3, 2, 2);%6*6*256*batch
% 6th Layer: Flatten -> FC (w ReLu) -> Dropout
batch = size(pool5,4);
size_pool5.a=size(pool5,1);
size_pool5.b=size(pool5,2);
size_pool5.c=size(pool5,3);
pool5=reshape(pool5,size_pool5.a*size_pool5.b*size_pool5.c,batch);%pool5: 9216*batch
fc6 = fc(pool5, kernalcell{6},bias{6});%4096*batch
%dropout6=dropout(conv6);
%7th Layer: FC (w ReLu) -> Dropout
%conv6=rand(1,1,4096);
fc7 = fc(fc6,kernalcell{7},bias{7}); %4096*batch
% dropout7 = dropout(fc7);
%8th Layer: FC and return unscaled activations
fc8 = fc(fc7, kernalcell{8},bias{8});
%softmax Layer
output=zeros(size(fc8,1),batch);%numclass*batch
for b=1:batch
output(:,b)= exp(fc8(:,b))/sum(exp(fc8(:,b)));
end
end
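To make the forward pass runnable end to end, here is a hedged sketch of how kernalcell and bias might be initialized and the function called, once the helper functions below (conv, local_response_norm, max_pool, fc, and a ReLU helper) are defined. The kernel shapes follow the standard AlexNet configuration discussed above; the input size 227*227*3, the random initialization, and the empty momentum accumulators Edeltak/Edeltab (only needed by the backward pass) are my own assumptions, not part of the original code.
% Hedged sketch: random initialization and a single forward pass.
batch = 2; numclass = 1000;                    % assumed values
X = rand(227,227,3,batch);                     % assumed input size 227*227*3
label = zeros(numclass,batch);                 % unused in the forward pass
ksize = {{[11 11 3 48],[11 11 3 48]}, {[5 5 48 128],[5 5 48 128]}, ...
         {[3 3 256 192],[3 3 256 192]}, {[3 3 192 192],[3 3 192 192]}, ...
         {[3 3 192 128],[3 3 192 128]}};
kernalcell = cell(1,8); bias = cell(1,8);
for L = 1:5                                    % convolutional layers, two groups each
    kernalcell{L} = cell(1,2); bias{L} = cell(1,2);
    for g = 1:2
        kernalcell{L}{g} = 0.01*randn(ksize{L}{g});
        bias{L}{g} = zeros(1,ksize{L}{g}(4));
    end
end
kernalcell{6} = 0.01*randn(9216,4096);     bias{6} = zeros(4096,1);   % fully connected layers
kernalcell{7} = 0.01*randn(4096,4096);     bias{7} = zeros(4096,1);
kernalcell{8} = 0.01*randn(4096,numclass); bias{8} = zeros(numclass,1);
Edeltak = []; Edeltab = [];                    % momentum accumulators (backward pass only)
output = myalexnet(X,label,kernalcell,bias,Edeltak,Edeltab);   % numclass*batch of softmax scores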
The forward pass uses four functions: convolution, max-pooling downsampling, local response normalization, and the ReLU activation. They are introduced one by one below.
Convolution function:
The tricky part is understanding 2-D multi-channel convolution; see reference 3.
Here is the convolution code (the convolution layer is directly followed by the ReLU function):
function [ output_args ] = conv( input_args, kernal,bias,padding, stridew, strideh )
%CONV 2-D multi-channel convolution followed by ReLU
%dimensions are given in parentheses
%input_args input data (height*width*channel*batch)
%kernal convolution kernels (kernalheight*kernalwidth*channel*num)
%bias biases (1*num)
%padding number of zero-padding rings around the input
%stridew horizontal stride of the convolution
%strideh vertical stride of the convolution
%
heightk = size(kernal,1);
widthk = size(kernal,2);
channelk = size(kernal,3);
numk = size(kernal,4);
widthin=size(input_args,2);
heightin=size(input_args,1);
channel = size(input_args,3);
batch = size(input_args,4);
widthout = (widthin+2*padding-widthk)/stridew+1;
heightout = (heightin+2*padding-heightk)/strideh+1;
if channelk~=channel
fprintf('kernalchannel~=channel');
end
%zero padding
inputz = zeros(heightin+2*padding,widthin+2*padding,channel,batch);
inputz(padding+1:padding+heightin,padding+1:padding+widthin,:,:)=input_args;
output_args = zeros(heightout,widthout,numk,batch);
for b = 1:batch
for d = 1:numk
for i=1:heightout
for j=1:widthout
for n = 1:channel
output_args(i,j,d,b) = output_args(i,j,d,b)+conv2(rot90(inputz( (i-1)*strideh+1 : (i-1)*strideh+heightk , (j-1)*stridew+1 : (j-1)*stridew+widthk ,n,b),2),kernal(:,:,n,d),'valid');
end%rot90(...,2) rotates the patch by 180 degrees; since conv2 also flips its kernel, the result is the element-wise dot product (correlation) of patch and kernel, see https://www.cnblogs.com/zf-blog/p/8638664.html
end
end
output_args(:,:,d,b) = output_args(:,:,d,b)+bias(d);%add the bias
end
end
output_args = ReLU(output_args);
end
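Both conv and fc end with a call to ReLU, which is not listed in this post; a minimal version (my own assumption about how it is implemented, saved as ReLU.m) and a small hand-checked call of conv look like this:
function [ output_args ] = ReLU( input_args )
%RELU element-wise rectified linear unit, max(x,0)
output_args = max(input_args,0);
end

% Hedged check of conv: one 2*2 kernel, one channel, no padding, stride 1.
X = [1 2 3; 4 5 6; 7 8 9];            % 3*3*1*1 input
K = zeros(2,2,1,1); K(:,:,1,1) = [1 0; 0 1];
conv(X, K, 0, 0, 1, 1)
% each output is the dot product of a 2*2 patch with the kernel, e.g.
% top-left = 1*1 + 5*1 = 6; the full result is [6 8; 12 14]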
Local response normalization (LRN):
This method is reportedly rarely used nowadays, but I implemented it anyway in order to reproduce AlexNet faithfully.
function [ output_args ] = local_response_norm( input_args,depth_radius,bias,alpha,beta )
%LOCAL_RESPONSE_NORM local response normalization
%each value is normalized by the squared activations of the neighboring
%channels at the same position, within depth_radius/2 channels on each side
%input_args input data (height*width*channel*batch)
widthin=size(input_args,2);
heightin=size(input_args,1);
channel = size(input_args,3);
batch = size(input_args,4);
output_args = zeros(heightin,widthin,channel,batch);
for n = 1:channel
sumbegin = max(1,n-depth_radius/2);
sumend = min(channel,n+depth_radius/2);
for b = 1:batch
for i=1:heightin
for j=1:widthin
sqr_sum=sum(input_args(i,j,sumbegin:sumend,b).^2);
output_args(i,j,n,b)=input_args(i,j,n,b)/(bias+alpha*sqr_sum)^beta;
end
end
end
end
end
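In other words, output(i,j,n) = input(i,j,n) / (bias + alpha*sqr_sum)^beta, where sqr_sum is the sum of squared activations in the channel window. A quick hedged check with made-up data, using the same parameter values as in the network code above:
A = rand(27,27,256,2);                           % assumed test input
N = local_response_norm(A, 2, 1, 2e-05, 0.75);
size(N)                                          % same size as the input
max(abs(N(:)-A(:)))                              % tiny: alpha is very small, so N stays close to A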
Max-pooling function:
The principle is simple, so here is the code directly:
function [ output_args ] = max_pool( input_args,poolsizewidth,poolsizeheight,stridew,strideh )
%MAX_POOL max pooling over spatial windows
%input_args input data (height*width*channel*batch)
%poolsizewidth/poolsizeheight pooling window size; (inputsize-poolsize) is
%assumed to be divisible by the corresponding stride
widthin = size(input_args,2);
heightin = size(input_args,1);
deepin = size(input_args,3);
batchin = size(input_args,4);
widthout=(widthin-poolsizewidth)/stridew+1;
heightout = (heightin-poolsizeheight)/strideh+1;
output_args = zeros(heightout,widthout,deepin,batchin);
for b = 1:batchin
for d = 1:deepin
for i = 1:heightout
for j = 1:widthout
output_args(i,j,d,b)=max(max(input_args( (i-1)*strideh+1 : (i-1)*strideh+poolsizeheight , (j-1)*stridew+1 : (j-1)*stridew+poolsizewidth , d,b )));
end
end
end
end
end
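A small hedged usage check on a 4*4 single-channel map with 2*2 windows and stride 2 (the values are just an illustration):
A = reshape(1:16,4,4);        % columns are [1..4], [5..8], [9..12], [13..16]
max_pool(A, 2, 2, 2, 2)
% expected result: [6 14; 8 16], the maximum of each 2*2 block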
Fully connected layer:
function [ output_args ] = fc( input_args,kernal,bias )
%FC fully connected layer followed by ReLU
% kernal fully connected weights, size(kernal)=inputsize*outputsize, where outputsize is the number of neurons
% bias biases, a column vector of length outputsize
% input_args input data (inputsize*batch)
batch = size(input_args,2);
output_args = zeros(size(kernal,2),batch);
for b=1:batch
output_args(:,b)=kernal'*input_args(:,b)+bias;
end
output_args=ReLU(output_args);
end
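A small hedged check of fc with made-up numbers (it relies on the ReLU helper sketched earlier): 3 inputs, 2 neurons, a batch of 2.
W = [1 0; 0 1; 1 1];          % inputsize*outputsize = 3*2
b = [0.5; -10];               % one bias per neuron
x = [1 2; 3 4; 5 6];          % inputsize*batch
fc(x, W, b)
% column 1: ReLU([1+5+0.5; 3+5-10]) = [6.5; 0]
% column 2: ReLU([2+6+0.5; 4+6-10]) = [8.5; 0]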
The whole exercise is really about understanding the low-level algorithms behind convolutional neural networks, and I encourage you to start from the principles and write the MATLAB code yourself.
All of the code above was written by me after working through material found online, so there may well be misunderstandings or mistakes; corrections are very welcome!
I would also be happy to discuss how to make the code run more efficiently!
That is it for the forward pass of AlexNet. The next post will give the code for the backward pass.
Finally, many thanks to the experts who have shared their analyses of the AlexNet network.
References:
1. How each AlexNet layer works, its computation, and the construction of the kernel and pooling sizes: https://blog.csdn.net/chaipp0607/article/details/72847422
2. Parameter counts of each AlexNet layer: https://vimsky.com/article/3664.html
3. Multi-channel convolution: https://blog.csdn.net/yudiemiaomiao/article/details/72466402
4. Local response normalization (LRN): https://blog.csdn.net/yangdashi888/article/details/77918311