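This example from the benchmark measures the time each layer's transform takes and prints the output of each layer. Running it and studying the result helps in understanding the structure of each layer of the network.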
python plot_overfeat_benchmark.py
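The output is shown below. (The network itself actually has only 12 layers, 0-11.)
Looking at the program's output: the test uses 5 images, so the first dimension of output.shape is always 5.
The asirra image set has two classes: cat and dog. Image sizes vary, roughly around 500×300.
The asirra images are already resized when they are loaded, X[count] = np.array(im.resize((231, 231))), so each input to the CNN has 231×231×3 = 160083 values.
Input shape: (5, 231, 231, 3)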
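The relation between the 4-D input shape and the flattened sizes printed in the log can be checked with a few lines. This is only a sketch: `flattened_shape` is a hypothetical helper written for this note, not part of sklearn-theano.

```python
from math import prod


def flattened_shape(shape):
    """Collapse every axis after the first (the sample axis) into one."""
    n_samples, *rest = shape
    return (n_samples, prod(rest))


input_shape = (5, 231, 231, 3)  # 5 images, 231x231 pixels, 3 colour channels
print(flattened_shape(input_shape))
```

The same helper applies to any layer output: once the channel count and the spatial size are known, the second number in the log is their product.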
Shape of layer 0 output  # layer 0 is the normalization (Standardize) layer
(5, 160083)  # 5 samples, each image has 231×231×3 = 160083 values
('Time for layer 0', 0.004625082015991211)  # takes almost no time
()
Shape of layer 1 output  # convolution with stride (4, 4), filter shape (96, 3, 11, 11)
(5, 301056)  # int((231-11)/4)+1 = 56, and 301056 = 96×56×56
('Time for layer 1', 0.5500462055206299)
()
Shape of layer 2 output  # max-pooling layer, MaxPool((2, 2))
(5, 75264)  # 75264 = 301056/4 = 96×28×28
('Time for layer 2', 0.5582330226898193)
()
Shape of layer 3 output  # convolution with stride (1, 1), filter shape (256, 96, 5, 5)
(5, 147456)  # (28-5)+1 = 24, and 147456 = 256×24×24
('Time for layer 3', 2.358441114425659)
()
Shape of layer 4 output  # max-pooling layer, MaxPool((2, 2))
(5, 36864)  # 36864 = 147456/4 = 256×12×12
('Time for layer 4', 2.3493311405181885)
()
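Shape of layer 5 output  # filter shape (512, 256, 3, 3), border_mode='full', then the outermost ring is cropped off
(5, 73728)  # 'full' gives 12+3-1 = 14, cropping one pixel per side gives 12, and 73728 = 512×12×12
('Time for layer 5', 6.5379478931427)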
()
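Shape of layer 6 output  # filter shape (1024, 512, 3, 3), same 'full' + cropping scheme as layer 5
(5, 147456)  # 147456 = 1024×12×12
('Time for layer 6', 23.23018503189087)
()
Shape of layer 7 output  # filter shape (1024, 1024, 3, 3), same 'full' + cropping scheme as layer 5
(5, 147456)  # 147456 = 1024×12×12
('Time for layer 7', 56.522364139556885)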
()
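Shape of layer 8 output  # max-pooling layer, MaxPool((2, 2))
(5, 36864)  # 36864 = 1024×6×6; this is already the smallest size, since the next layer's filter is 6×6
('Time for layer 8', 56.03724789619446)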
()
Shape of layer 9 output  # filter shape (3072, 1024, 6, 6)
(5, 3072)  # a 6×6 filter on a 6×6 input leaves 1×1, so 3072 = 3072×1×1
('Time for layer 9', 57.496111154556274)
()
Shape of layer 10 output
(5, 4096)
('Time for layer 10', 58.83445715904236)
()
Shape of layer 11 output
(5, 1000)
('Time for layer 11', 59.27335500717163)
()
Shape of layer 12 output  # the entries for layers 12-14 show the same shapes as layers 0-2
(5, 160083)
('Time for layer 12', 0.0028297901153564453)
()
Shape of layer 13 output
(5, 301056)
('Time for layer 13', 0.5609118938446045)
()
Shape of layer 14 output
(5, 75264)
('Time for layer 14', 0.5506980419158936)
()
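The shape arithmetic in the log above can be reproduced without running the network. The following is a minimal sketch, assuming square inputs and filters and pooling sizes that divide evenly. `conv_out`, `LAYERS` and `flattened_sizes` are hypothetical names introduced here, and the tuple encoding of the layers is only a convenience for this note, not the sklearn-theano API.

```python
def conv_out(size, kernel, stride=1, border_mode="valid", crop=0):
    """Output length along one spatial axis of a convolution.

    'valid' keeps only complete overlaps, 'full' keeps every partial
    overlap. `crop` is the number of pixels removed from each side
    afterwards (cropping=[(1, -1), (1, -1)] corresponds to crop=1).
    """
    if border_mode == "full":
        length = size + kernel - 1
    else:
        length = size - kernel + 1
    # A stride keeps every `stride`-th position, starting with the first.
    length = (length - 1) // stride + 1
    return length - 2 * crop


# Assumed encoding: ("conv", out_channels, kernel, stride, border_mode, crop)
# or ("pool", pool_size) or ("standardize",).
LAYERS = [
    ("standardize",),                 # layer 0
    ("conv", 96, 11, 4, "valid", 0),  # layer 1
    ("pool", 2),                      # layer 2
    ("conv", 256, 5, 1, "valid", 0),  # layer 3
    ("pool", 2),                      # layer 4
    ("conv", 512, 3, 1, "full", 1),   # layer 5
    ("conv", 1024, 3, 1, "full", 1),  # layer 6
    ("conv", 1024, 3, 1, "full", 1),  # layer 7
    ("pool", 2),                      # layer 8
    ("conv", 3072, 6, 1, "valid", 0), # layer 9
    ("conv", 4096, 1, 1, "valid", 0), # layer 10
    ("conv", 1000, 1, 1, "valid", 0), # layer 11
]


def flattened_sizes(size=231, channels=3):
    """Return channels * height * width after each layer."""
    sizes = []
    for layer in LAYERS:
        if layer[0] == "conv":
            _, channels, kernel, stride, border_mode, crop = layer
            size = conv_out(size, kernel, stride, border_mode, crop)
        elif layer[0] == "pool":
            size //= layer[1]
        # "standardize" leaves the shape unchanged.
        sizes.append(channels * size * size)
    return sizes


for index, n_features in enumerate(flattened_sizes()):
    print("layer %2d: %d features per image" % (index, n_features))
```

The printed numbers can be compared one by one with the second dimension of each shape in the log for layers 0-11.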
subsample is what is called strides elsewhere: a convolution with a stride (for example stride = 2) evaluates the filter only at every second position along each axis.
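To make the effect of a stride concrete, here is a small 1-D sketch in plain Python. `correlate1d` is a hypothetical helper written for this note. It computes a cross-correlation (the kernel is not flipped, unlike a true convolution), which is enough to show how the stride changes the output length.

```python
def correlate1d(signal, kernel, stride=1):
    """'valid' 1-D cross-correlation, evaluated every `stride` positions."""
    k = len(kernel)
    return [sum(s * w for s, w in zip(signal[i:i + k], kernel))
            for i in range(0, len(signal) - k + 1, stride)]


signal = list(range(10))  # 0, 1, ..., 9
kernel = [1, 1, 1]        # sums each window of three values

dense = correlate1d(signal, kernel, stride=1)
strided = correlate1d(signal, kernel, stride=2)

print(len(dense), len(strided))
print(strided == dense[::2])  # the strided result is the dense one, subsampled
```

With 10 inputs and a kernel of length 3, stride 1 gives (10-3)+1 = 8 outputs and stride 2 gives int((10-3)/2)+1 = 4, the same formula used for layer 1 above.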
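What is cropping the border?
In sklearn-theano, however, cropping means slicing a part out of the convolution result. For example, the code below removes the outermost ring of the whole image (the convolution output).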
    c = [(1, -1), (1, -1)]  # cropping: drop one pixel on every side
    self.expression_ = T.nnet.conv2d(self.input_,
                                     self.convolution_filter_,
                                     border_mode=self.border_mode,
                                     subsample=self.subsample_)[:, :, c[0][0]:c[0][1],
                                                                c[1][0]:c[1][1]]
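The slicing above can be tried on a toy example without Theano. This is a sketch in plain Python on nested lists. `conv2d_full` and `crop` are hypothetical helpers written for this note, operating on a single 2-D map rather than the 4-D tensors that T.nnet.conv2d handles.

```python
def conv2d_full(image, kernel):
    """'full' 2-D convolution of two lists of lists."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * (iw + kw - 1) for _ in range(ih + kh - 1)]
    # Scatter form: every input pixel adds a scaled copy of the kernel.
    for i in range(ih):
        for j in range(iw):
            for a in range(kh):
                for b in range(kw):
                    out[i + a][j + b] += image[i][j] * kernel[a][b]
    return out


def crop(feature_map, c):
    """Apply cropping given as [(row_start, row_stop), (col_start, col_stop)]."""
    (r0, r1), (c0, c1) = c
    return [row[c0:c1] for row in feature_map[r0:r1]]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]  # 3x3 kernel that copies the centre pixel

full = conv2d_full(image, identity)        # (4+3-1) x (4+3-1) = 6 x 6
cropped = crop(full, [(1, -1), (1, -1)])   # back to 4 x 4

print(len(full), len(full[0]))
print(len(cropped), len(cropped[0]))
```

For a 3×3 filter, border_mode 'full' followed by this cropping returns a map of the same spatial size as the input, which is why layers 5-7 keep the 12×12 size.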
    [Standardize(118.380948, 61.896913),            # layer 0
     Convolution(ws[0], bs[0], subsample=(4, 4),    # layer 1
                 activation='relu'),
     MaxPool((2, 2)),                               # layer 2
     Convolution(ws[1], bs[1], activation='relu'),  # layer 3
     MaxPool((2, 2)),                               # layer 4
     Convolution(ws[2], bs[2],                      # layer 5
                 activation='relu',
                 cropping=[(1, -1), (1, -1)],
                 border_mode='full'),
     Convolution(ws[3], bs[3],                      # layer 6
                 activation='relu',
                 cropping=[(1, -1), (1, -1)],
                 border_mode='full'),
     Convolution(ws[4], bs[4],                      # layer 7
                 activation='relu',
                 cropping=[(1, -1), (1, -1)],
                 border_mode='full'),
     MaxPool((2, 2)),                               # layer 8
     Convolution(ws[5], bs[5],                      # layer 9
                 activation='relu'),
     Convolution(ws[6], bs[6],                      # layer 10
                 activation='relu'),
     Convolution(ws[7], bs[7],                      # layer 11
                 activation='identity')]
    SMALL_NETWORK_FILTER_SHAPES = np.array([(96, 3, 11, 11),
                                            (256, 96, 5, 5),
                                            (512, 256, 3, 3),
                                            (1024, 512, 3, 3),
                                            (1024, 1024, 3, 3),
                                            (3072, 1024, 6, 6),
                                            (4096, 3072, 1, 1),
                                            (1000, 4096, 1, 1)])
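Each filter shape reads as (output channels, input channels, height, width), so the list can be sanity-checked: the input channels of every filter bank should equal the output channels of the one before it. The sketch below does that check and counts the weights per convolution. It uses a plain list of tuples instead of np.array to stay dependency-free, and the variable names are chosen for this note only.

```python
from math import prod

filter_shapes = [(96, 3, 11, 11),
                 (256, 96, 5, 5),
                 (512, 256, 3, 3),
                 (1024, 512, 3, 3),
                 (1024, 1024, 3, 3),
                 (3072, 1024, 6, 6),
                 (4096, 3072, 1, 1),
                 (1000, 4096, 1, 1)]

# Input channels of each filter bank must match the previous output channels.
chained = all(cur[1] == prev[0]
              for prev, cur in zip(filter_shapes, filter_shapes[1:]))
print("channels chain correctly:", chained)

# Number of weights per convolution (biases not included).
weights = [prod(shape) for shape in filter_shapes]
for shape, count in zip(filter_shapes, weights):
    print(shape, count)
```

The first entry has 3 input channels, matching the RGB input, and the last has 1000 output channels, matching the (5, 1000) output of layer 11.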