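I. Before reading
1. The paper "Densely Connected Convolutional Networks" describes what was, when these notes were written, the best-performing CNN architecture, outperforming even ResNet, so it is worth studying why it works so well.
2. Code: https://github.com/liuzhuang13/DenseNet
3. The paper builds mainly on Highway Networks, Residual Networks (ResNets), and GoogLeNet, so it is worth reading those papers first. "Very Deep Learning with Highway Networks" is also worth a look.
4. References: "ResNet && DenseNet(原理篇)" (principles) and "DenseNet模型" (the DenseNet model)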
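II. Reading notes
Abstract
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and more efficient to train if the connections between layers close to the input and layers close to the output are shorter. Building on this observation, the paper proposes the Dense Convolutional Network (DenseNet), in which every layer is connected to every other layer in a feed-forward fashion. A traditional convolutional network with L layers has only L connections, one between each layer and the next. In DenseNet, each layer is connected directly not only to the adjacent layer but to all subsequent layers, so the network has L(L+1)/2 direct connections. The input to any layer is the feature maps of all preceding layers, and that layer's own feature maps are in turn inputs to all subsequent layers. DenseNet has several compelling advantages: 1. it alleviates the vanishing-gradient problem; 2. it strengthens feature propagation; 3. it substantially reduces the number of parameters. The architecture is evaluated on four highly competitive object recognition benchmarks: CIFAR-10, CIFAR-100, SVHN, and ImageNet. On most of them DenseNet obtains significant improvements, reaching the highest recognition accuracy reported at the time.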
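The L(L+1)/2 figure can be checked by counting inputs per layer. The sketch below is only an illustration; the function name `dense_connections` is hypothetical and not from the paper or its code. It assumes layer l (counting from 1) receives the network input plus the outputs of layers 1 to l-1, that is, l incoming connections.

```python
def dense_connections(num_layers):
    """Count direct connections in a densely connected stack of layers.

    Layer l (1-based) takes the network input and the outputs of
    layers 1..l-1 as inputs, so it has l incoming connections.
    """
    return sum(l for l in range(1, num_layers + 1))


if __name__ == "__main__":
    for L in (1, 2, 5, 40):
        # The closed form L(L+1)/2 should agree with the explicit count.
        print(L, dense_connections(L), L * (L + 1) // 2)
```

A plain chain of L layers has one incoming connection per layer, which gives the L connections quoted for traditional networks.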
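1. Introduction
CNNs are a powerful machine learning approach for visual recognition. Although they were introduced more than 20 years ago, only in recent years have improvements in computer hardware and network architecture made it possible to train truly deep CNNs. The original LeNet5 had 5 layers and VGG had 19; only Highway Networks and ResNets, published the year before this paper, passed the 100-layer mark.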
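III. Impressions:
Halfway through translating, I realized there was no need to translate at all: the English original is easy to follow as it is. The paper is written very accessibly, unlike papers by big names such as Hinton, Bengio, or Yann LeCun, which I can read straight through and still only half understand. Reading it was like eating honey: once finished, I wanted to read it again. Even the analysis of the experimental results and the discussion at the end are very well written, especially the figure in the discussion. The idea is excellent and simple. Above all, it comes from the kind of reasoning I like best: summarize why many earlier papers worked well, find what they have in common, and strengthen that common factor to get better results.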
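IV. DenseNet structure:
1. The DenseNet-BC structure used for training on CIFAR-10:
With depth=40, growth_rate=12, bottleneck=True, and reduction=0.5 (reduction=1-compression), the number of layers in each dense block is n_layers=((40-4)/3)//2=6, where //2 means dividing by 2 and rounding down.
Note: conv denotes a plain 2D convolution; CONV denotes BN-ReLU-conv.
The structure is as follows: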
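The n_layers formula can be written as a small helper. This is a sketch only: `layers_per_block` is a hypothetical name, and the non-bottleneck branch (no halving) is an assumption based on the usual DenseNet convention rather than something stated in these notes.

```python
def layers_per_block(depth, bottleneck=True, n_blocks=3):
    """Number of layers in each dense block for a given total depth.

    Four layers sit outside the dense blocks (initial conv, two
    transition layers, classifier), hence depth - 4. With bottleneck
    layers each dense layer holds two convolutions (1x1 then 3x3),
    so the count is halved, rounding down.
    """
    n = (depth - 4) // n_blocks
    return n // 2 if bottleneck else n


if __name__ == "__main__":
    print(layers_per_block(40))                    # bottleneck variant
    print(layers_per_block(40, bottleneck=False))  # assumed plain variant
```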
input:(32,32,3)
conv(24,3,3), % where conv(24,3,3)=conv(filters=2*growth_rate=24,kernel_size=3,3)
# Dense block 1
CONV(48,1,1)-CONV(12,3,3)-merge(36)- % where CONV(48,1,1)=CONV(filters=inter_channel=nb_filter*4=48,1,1), and nb_filter here is the growth_rate=12 passed to conv_block; after the merge, nb_filter=24+12=36
CONV(48,1,1)-CONV(12,3,3)-merge(48)- % as above; after the merge, nb_filter=36+12=48
CONV(48,1,1)-CONV(12,3,3)-merge(60)-
CONV(48,1,1)-CONV(12,3,3)-merge(72)-
CONV(48,1,1)-CONV(12,3,3)-merge(84)-
CONV(48,1,1)-CONV(12,3,3)-merge(96)- % nb_filter grows by growth_rate=12 with each layer; a dense block has 6 layers, adding 72 in total, so nb_filter=24+72=96
# Transition layer 1
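CONV(96,1,1) % with compression applied this would be nb_filter*compression=96*0.5=48, but the Keras summary below shows this conv (convolution2d_13) keeping all 96 channels, which is why the next block starts from 96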
AveragePool(2,2,(2,2)) % pool_size=(2,2), strides=(2,2)
# Dense block 2
CONV(48,1,1)-CONV(12,3,3)-merge(108)- % where CONV(48,1,1)=CONV(filters=inter_channel=4*growth_rate=48,1,1); after the merge, nb_filter=96+12=108
CONV(48,1,1)-CONV(12,3,3)-merge(120)-
CONV(48,1,1)-CONV(12,3,3)-merge(132)-
CONV(48,1,1)-CONV(12,3,3)-merge(144)-
CONV(48,1,1)-CONV(12,3,3)-merge(156)-
CONV(48,1,1)-CONV(12,3,3)-merge(168)- % nb_filter grows by growth_rate=12 with each layer; a dense block has 6 layers, adding 72 in total, so nb_filter=96+72=168
# Transition layer 2
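CONV(168,1,1) % with compression applied this would be nb_filter*compression=168*0.5=84, but the Keras summary below shows this conv (convolution2d_26) keeping all 168 channels, which is why the next block starts from 168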
AveragePool(2,2,(2,2)) % pool_size=(2,2), strides=(2,2)
# Dense block 3
CONV(48,1,1)-CONV(12,3,3)-merge(180)- % where CONV(48,1,1)=CONV(filters=inter_channel=4*growth_rate=48,1,1); after the merge, nb_filter=168+12=180
CONV(48,1,1)-CONV(12,3,3)-merge(192)-
CONV(48,1,1)-CONV(12,3,3)-merge(204)-
CONV(48,1,1)-CONV(12,3,3)-merge(216)-
CONV(48,1,1)-CONV(12,3,3)-merge(228)-
CONV(48,1,1)-CONV(12,3,3)-merge(240)- % nb_filter grows by growth_rate=12 with each layer; a dense block has 6 layers, adding 72 in total, so nb_filter=168+72=240
Relu-GlobalAveragePool-softmax
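The channel bookkeeping in the listing above can be replayed with a few lines of Python. This is a sketch under stated assumptions, not the author's code: `trace_channels` is a hypothetical helper, and `compression` is the fraction of channels a transition layer keeps. With compression=1.0 (transitions keep every channel) it reproduces the merge widths 36...96, 108...168, 180...240 listed above. With compression=0.5 the second and third blocks would instead start from 48 and 60 channels.

```python
def trace_channels(depth=40, growth_rate=12, compression=1.0):
    """Return (merge widths per dense block, transition output widths)."""
    n_layers = ((depth - 4) // 3) // 2   # bottleneck layers per block
    channels = 2 * growth_rate           # filters of the initial conv
    blocks, transitions = [], []
    for block in range(3):
        widths = []
        for _ in range(n_layers):
            channels += growth_rate      # each merge appends growth_rate maps
            widths.append(channels)
        blocks.append(widths)
        if block < 2:                    # no transition after the last block
            channels = int(channels * compression)
            transitions.append(channels)
    return blocks, transitions


if __name__ == "__main__":
    for c in (1.0, 0.5):
        blocks, transitions = trace_channels(compression=c)
        print("compression", c, "transitions", transitions)
        for widths in blocks:
            print("  ", widths)
```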
To check the analysis above, here is the model summary printed by keras==1.2.0:
Model created
Layer (type)                       Output Shape          Param #   Connected to
====================================================================================================
input_1 (InputLayer)               (None, 32, 32, 3)     0
initial_conv2D (Convolution2D)     (None, 32, 32, 24)    648       input_1[0][0]
batchnormalization_1 (BatchNorma   (None, 32, 32, 24)    96        initial_conv2D[0][0]
activation_1 (Activation)          (None, 32, 32, 24)    0         batchnormalization_1[0][0]
convolution2d_1 (Convolution2D)    (None, 32, 32, 48)    1152      activation_1[0][0]
batchnormalization_2 (BatchNorma   (None, 32, 32, 48)    192       convolution2d_1[0][0]
activation_2 (Activation)          (None, 32, 32, 48)    0         batchnormalization_2[0][0]
convolution2d_2 (Convolution2D)    (None, 32, 32, 12)    5184      activation_2[0][0]
merge_1 (Merge)                    (None, 32, 32, 36)    0         initial_conv2D[0][0]
                                                                   convolution2d_2[0][0]
batchnormalization_3 (BatchNorma   (None, 32, 32, 36)    144       merge_1[0][0]
activation_3 (Activation)          (None, 32, 32, 36)    0         batchnormalization_3[0][0]
convolution2d_3 (Convolution2D)    (None, 32, 32, 48)    1728      activation_3[0][0]
batchnormalization_4 (BatchNorma   (None, 32, 32, 48)    192       convolution2d_3[0][0]
activation_4 (Activation)          (None, 32, 32, 48)    0         batchnormalization_4[0][0]
convolution2d_4 (Convolution2D)    (None, 32, 32, 12)    5184      activation_4[0][0]
merge_2 (Merge)                    (None, 32, 32, 48)    0         initial_conv2D[0][0]
                                                                   convolution2d_2[0][0]
                                                                   convolution2d_4[0][0]
batchnormalization_5 (BatchNorma   (None, 32, 32, 48)    192       merge_2[0][0]
activation_5 (Activation)          (None, 32, 32, 48)    0         batchnormalization_5[0][0]
convolution2d_5 (Convolution2D)    (None, 32, 32, 48)    2304      activation_5[0][0]
batchnormalization_6 (BatchNorma   (None, 32, 32, 48)    192       convolution2d_5[0][0]
activation_6 (Activation)          (None, 32, 32, 48)    0         batchnormalization_6[0][0]
convolution2d_6 (Convolution2D)    (None, 32, 32, 12)    5184      activation_6[0][0]
merge_3 (Merge)                    (None, 32, 32, 60)    0         initial_conv2D[0][0]
                                                                   convolution2d_2[0][0]
                                                                   convolution2d_4[0][0]
                                                                   convolution2d_6[0][0]
batchnormalization_7 (BatchNorma   (None, 32, 32, 60)    240       merge_3[0][0]
activation_7 (Activation)          (None, 32, 32, 60)    0         batchnormalization_7[0][0]
convolution2d_7 (Convolution2D)    (None, 32, 32, 48)    2880      activation_7[0][0]
batchnormalization_8 (BatchNorma   (None, 32, 32, 48)    192       convolution2d_7[0][0]
activation_8 (Activation)          (None, 32, 32, 48)    0         batchnormalization_8[0][0]
convolution2d_8 (Convolution2D)    (None, 32, 32, 12)    5184      activation_8[0][0]
merge_4 (Merge)                    (None, 32, 32, 72)    0         initial_conv2D[0][0]
                                                                   convolution2d_2[0][0]
                                                                   convolution2d_4[0][0]
                                                                   convolution2d_6[0][0]
                                                                   convolution2d_8[0][0]
batchnormalization_9 (BatchNorma   (None, 32, 32, 72)    288       merge_4[0][0]
activation_9 (Activation)          (None, 32, 32, 72)    0         batchnormalization_9[0][0]
convolution2d_9 (Convolution2D)    (None, 32, 32, 48)    3456      activation_9[0][0]
batchnormalization_10 (BatchNorm   (None, 32, 32, 48)    192       convolution2d_9[0][0]
activation_10 (Activation)         (None, 32, 32, 48)    0         batchnormalization_10[0][0]
convolution2d_10 (Convolution2D)   (None, 32, 32, 12)    5184      activation_10[0][0]
merge_5 (Merge)                    (None, 32, 32, 84)    0         initial_conv2D[0][0]
                                                                   convolution2d_2[0][0]
                                                                   convolution2d_4[0][0]
                                                                   convolution2d_6[0][0]
                                                                   convolution2d_8[0][0]
                                                                   convolution2d_10[0][0]
batchnormalization_11 (BatchNorm   (None, 32, 32, 84)    336       merge_5[0][0]
activation_11 (Activation)         (None, 32, 32, 84)    0         batchnormalization_11[0][0]
convolution2d_11 (Convolution2D)   (None, 32, 32, 48)    4032      activation_11[0][0]
batchnormalization_12 (BatchNorm   (None, 32, 32, 48)    192       convolution2d_11[0][0]
activation_12 (Activation)         (None, 32, 32, 48)    0         batchnormalization_12[0][0]
convolution2d_12 (Convolution2D)   (None, 32, 32, 12)    5184      activation_12[0][0]
merge_6 (Merge)                    (None, 32, 32, 96)    0         initial_conv2D[0][0]
                                                                   convolution2d_2[0][0]
                                                                   convolution2d_4[0][0]
                                                                   convolution2d_6[0][0]
                                                                   convolution2d_8[0][0]
                                                                   convolution2d_10[0][0]
                                                                   convolution2d_12[0][0]
batchnormalization_13 (BatchNorm   (None, 32, 32, 96)    384       merge_6[0][0]
activation_13 (Activation)         (None, 32, 32, 96)    0         batchnormalization_13[0][0]
convolution2d_13 (Convolution2D)   (None, 32, 32, 96)    9216      activation_13[0][0]
averagepooling2d_1 (AveragePooli   (None, 16, 16, 96)    0         convolution2d_13[0][0]
batchnormalization_14 (BatchNorm   (None, 16, 16, 96)    384       averagepooling2d_1[0][0]
activation_14 (Activation)         (None, 16, 16, 96)    0         batchnormalization_14[0][0]
convolution2d_14 (Convolution2D)   (None, 16, 16, 48)    4608      activation_14[0][0]
batchnormalization_15 (BatchNorm   (None, 16, 16, 48)    192       convolution2d_14[0][0]
activation_15 (Activation)         (None, 16, 16, 48)    0         batchnormalization_15[0][0]
convolution2d_15 (Convolution2D)   (None, 16, 16, 12)    5184      activation_15[0][0]
merge_7 (Merge)                    (None, 16, 16, 108)   0         averagepooling2d_1[0][0]
                                                                   convolution2d_15[0][0]
batchnormalization_16 (BatchNorm   (None, 16, 16, 108)   432       merge_7[0][0]
activation_16 (Activation)         (None, 16, 16, 108)   0         batchnormalization_16[0][0]
convolution2d_16 (Convolution2D)   (None, 16, 16, 48)    5184      activation_16[0][0]
batchnormalization_17 (BatchNorm   (None, 16, 16, 48)    192       convolution2d_16[0][0]
activation_17 (Activation)         (None, 16, 16, 48)    0         batchnormalization_17[0][0]
convolution2d_17 (Convolution2D)   (None, 16, 16, 12)    5184      activation_17[0][0]
merge_8 (Merge)                    (None, 16, 16, 120)   0         averagepooling2d_1[0][0]
                                                                   convolution2d_15[0][0]
                                                                   convolution2d_17[0][0]
batchnormalization_18 (BatchNorm   (None, 16, 16, 120)   480       merge_8[0][0]
activation_18 (Activation)         (None, 16, 16, 120)   0         batchnormalization_18[0][0]
convolution2d_18 (Convolution2D)   (None, 16, 16, 48)    5760      activation_18[0][0]
batchnormalization_19 (BatchNorm   (None, 16, 16, 48)    192       convolution2d_18[0][0]
activation_19 (Activation)         (None, 16, 16, 48)    0         batchnormalization_19[0][0]
convolution2d_19 (Convolution2D)   (None, 16, 16, 12)    5184      activation_19[0][0]
merge_9 (Merge)                    (None, 16, 16, 132)   0         averagepooling2d_1[0][0]
                                                                   convolution2d_15[0][0]
                                                                   convolution2d_17[0][0]
                                                                   convolution2d_19[0][0]
batchnormalization_20 (BatchNorm   (None, 16, 16, 132)   528       merge_9[0][0]
activation_20 (Activation)         (None, 16, 16, 132)   0         batchnormalization_20[0][0]
convolution2d_20 (Convolution2D)   (None, 16, 16, 48)    6336      activation_20[0][0]
batchnormalization_21 (BatchNorm   (None, 16, 16, 48)    192       convolution2d_20[0][0]
activation_21 (Activation)         (None, 16, 16, 48)    0         batchnormalization_21[0][0]
convolution2d_21 (Convolution2D)   (None, 16, 16, 12)    5184      activation_21[0][0]
merge_10 (Merge)                   (None, 16, 16, 144)   0         averagepooling2d_1[0][0]
                                                                   convolution2d_15[0][0]
                                                                   convolution2d_17[0][0]
                                                                   convolution2d_19[0][0]
                                                                   convolution2d_21[0][0]
batchnormalization_22 (BatchNorm   (None, 16, 16, 144)   576       merge_10[0][0]
activation_22 (Activation)         (None, 16, 16, 144)   0         batchnormalization_22[0][0]
convolution2d_22 (Convolution2D)   (None, 16, 16, 48)    6912      activation_22[0][0]
batchnormalization_23 (BatchNorm   (None, 16, 16, 48)    192       convolution2d_22[0][0]
activation_23 (Activation)         (None, 16, 16, 48)    0         batchnormalization_23[0][0]
convolution2d_23 (Convolution2D)   (None, 16, 16, 12)    5184      activation_23[0][0]
merge_11 (Merge)                   (None, 16, 16, 156)   0         averagepooling2d_1[0][0]
                                                                   convolution2d_15[0][0]
                                                                   convolution2d_17[0][0]
                                                                   convolution2d_19[0][0]
                                                                   convolution2d_21[0][0]
                                                                   convolution2d_23[0][0]
batchnormalization_24 (BatchNorm   (None, 16, 16, 156)   624       merge_11[0][0]
activation_24 (Activation)         (None, 16, 16, 156)   0         batchnormalization_24[0][0]
convolution2d_24 (Convolution2D)   (None, 16, 16, 48)    7488      activation_24[0][0]
batchnormalization_25 (BatchNorm   (None, 16, 16, 48)    192       convolution2d_24[0][0]
activation_25 (Activation)         (None, 16, 16, 48)    0         batchnormalization_25[0][0]
convolution2d_25 (Convolution2D)   (None, 16, 16, 12)    5184      activation_25[0][0]
merge_12 (Merge)                   (None, 16, 16, 168)   0         averagepooling2d_1[0][0]
                                                                   convolution2d_15[0][0]
                                                                   convolution2d_17[0][0]
                                                                   convolution2d_19[0][0]
                                                                   convolution2d_21[0][0]
                                                                   convolution2d_23[0][0]
                                                                   convolution2d_25[0][0]
batchnormalization_26 (BatchNorm   (None, 16, 16, 168)   672       merge_12[0][0]
activation_26 (Activation)         (None, 16, 16, 168)   0         batchnormalization_26[0][0]
convolution2d_26 (Convolution2D)   (None, 16, 16, 168)   28224     activation_26[0][0]
averagepooling2d_2 (AveragePooli   (None, 8, 8, 168)     0         convolution2d_26[0][0]
batchnormalization_27 (BatchNorm   (None, 8, 8, 168)     672       averagepooling2d_2[0][0]
activation_27 (Activation)         (None, 8, 8, 168)     0         batchnormalization_27[0][0]
convolution2d_27 (Convolution2D)   (None, 8, 8, 48)      8064      activation_27[0][0]
batchnormalization_28 (BatchNorm   (None, 8, 8, 48)      192       convolution2d_27[0][0]
activation_28 (Activation)         (None, 8, 8, 48)      0         batchnormalization_28[0][0]
convolution2d_28 (Convolution2D)   (None, 8, 8, 12)      5184      activation_28[0][0]
merge_13 (Merge)                   (None, 8, 8, 180)     0         averagepooling2d_2[0][0]
                                                                   convolution2d_28[0][0]
batchnormalization_29 (BatchNorm   (None, 8, 8, 180)     720       merge_13[0][0]
activation_29 (Activation)         (None, 8, 8, 180)     0         batchnormalization_29[0][0]
convolution2d_29 (Convolution2D)   (None, 8, 8, 48)      8640      activation_29[0][0]
batchnormalization_30 (BatchNorm   (None, 8, 8, 48)      192       convolution2d_29[0][0]
activation_30 (Activation)         (None, 8, 8, 48)      0         batchnormalization_30[0][0]
convolution2d_30 (Convolution2D)   (None, 8, 8, 12)      5184      activation_30[0][0]
merge_14 (Merge)                   (None, 8, 8, 192)     0         averagepooling2d_2[0][0]
                                                                   convolution2d_28[0][0]
                                                                   convolution2d_30[0][0]
batchnormalization_31 (BatchNorm   (None, 8, 8, 192)     768       merge_14[0][0]
activation_31 (Activation)         (None, 8, 8, 192)     0         batchnormalization_31[0][0]
convolution2d_31 (Convolution2D)   (None, 8, 8, 48)      9216      activation_31[0][0]
batchnormalization_32 (BatchNorm   (None, 8, 8, 48)      192       convolution2d_31[0][0]
activation_32 (Activation)         (None, 8, 8, 48)      0         batchnormalization_32[0][0]
convolution2d_32 (Convolution2D)   (None, 8, 8, 12)      5184      activation_32[0][0]
merge_15 (Merge)                   (None, 8, 8, 204)     0         averagepooling2d_2[0][0]
                                                                   convolution2d_28[0][0]
                                                                   convolution2d_30[0][0]
                                                                   convolution2d_32[0][0]
batchnormalization_33 (BatchNorm   (None, 8, 8, 204)     816       merge_15[0][0]
activation_33 (Activation)         (None, 8, 8, 204)     0         batchnormalization_33[0][0]
convolution2d_33 (Convolution2D)   (None, 8, 8, 48)      9792      activation_33[0][0]
batchnormalization_34 (BatchNorm   (None, 8, 8, 48)      192       convolution2d_33[0][0]
activation_34 (Activation)         (None, 8, 8, 48)      0         batchnormalization_34[0][0]
convolution2d_34 (Convolution2D)   (None, 8, 8, 12)      5184      activation_34[0][0]
merge_16 (Merge)                   (None, 8, 8, 216)     0         averagepooling2d_2[0][0]
                                                                   convolution2d_28[0][0]
                                                                   convolution2d_30[0][0]
                                                                   convolution2d_32[0][0]
                                                                   convolution2d_34[0][0]
batchnormalization_35 (BatchNorm   (None, 8, 8, 216)     864       merge_16[0][0]
activation_35 (Activation)         (None, 8, 8, 216)     0         batchnormalization_35[0][0]
convolution2d_35 (Convolution2D)   (None, 8, 8, 48)      10368     activation_35[0][0]
batchnormalization_36 (BatchNorm   (None, 8, 8, 48)      192       convolution2d_35[0][0]
activation_36 (Activation)         (None, 8, 8, 48)      0         batchnormalization_36[0][0]
convolution2d_36 (Convolution2D)   (None, 8, 8, 12)      5184      activation_36[0][0]
merge_17 (Merge)                   (None, 8, 8, 228)     0         averagepooling2d_2[0][0]
                                                                   convolution2d_28[0][0]
                                                                   convolution2d_30[0][0]
                                                                   convolution2d_32[0][0]
                                                                   convolution2d_34[0][0]
                                                                   convolution2d_36[0][0]
batchnormalization_37 (BatchNorm   (None, 8, 8, 228)     912       merge_17[0][0]
activation_37 (Activation)         (None, 8, 8, 228)     0         batchnormalization_37[0][0]
convolution2d_37 (Convolution2D)   (None, 8, 8, 48)      10944     activation_37[0][0]
batchnormalization_38 (BatchNorm   (None, 8, 8, 48)      192       convolution2d_37[0][0]
activation_38 (Activation)         (None, 8, 8, 48)      0         batchnormalization_38[0][0]
convolution2d_38 (Convolution2D)   (None, 8, 8, 12)      5184      activation_38[0][0]
merge_18 (Merge)                   (None, 8, 8, 240)     0         averagepooling2d_2[0][0]
                                                                   convolution2d_28[0][0]
                                                                   convolution2d_30[0][0]
                                                                   convolution2d_32[0][0]
                                                                   convolution2d_34[0][0]
                                                                   convolution2d_36[0][0]
                                                                   convolution2d_38[0][0]
batchnormalization_39 (BatchNorm   (None, 8, 8, 240)     960       merge_18[0][0]
activation_39 (Activation)         (None, 8, 8, 240)     0         batchnormalization_39[0][0]
globalaveragepooling2d_1 (Global   (None, 240)           0         activation_39[0][0]
dense_1 (Dense)                    (None, 10)            2410      globalaveragepooling2d_1[0][0]
====================================================================================================
Total params: 257,218
Trainable params: 249,946
Non-trainable params: 7,272
Finished compiling
Building model...
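The parameter totals in the summary can be recomputed by hand. The sketch below is an illustration only; `count_params` is a hypothetical helper. It relies on three assumptions, each consistent with the numbers printed above: convolutions carry no bias (648 = 3*3*3*24), every BatchNormalization layer has 4 parameters per channel of which 2 are non-trainable (96 for 24 channels), and the transition convs keep all channels (compression=1.0, as in convolution2d_13 and convolution2d_26).

```python
def count_params(depth=40, growth_rate=12, compression=1.0, n_classes=10):
    """Return (total params, non-trainable params) for the summarized model."""
    n_layers = ((depth - 4) // 3) // 2
    inter = 4 * growth_rate              # width of each 1x1 bottleneck conv
    c = 2 * growth_rate                  # channels after the initial conv
    conv_params = 3 * 3 * 3 * c          # 3x3 conv on RGB input, no bias
    bn_channels = 0                      # channels seen by all BN layers
    for block in range(3):
        for _ in range(n_layers):
            bn_channels += c + inter                  # BN before each conv
            conv_params += c * inter                  # 1x1 conv
            conv_params += 3 * 3 * inter * growth_rate  # 3x3 conv
            c += growth_rate                          # concat adds new maps
        if block < 2:                                 # transition layer
            out = int(c * compression)
            bn_channels += c
            conv_params += c * out                    # 1x1 conv
            c = out
    bn_channels += c                                  # final BN before pooling
    dense_params = c * n_classes + n_classes          # weights plus bias
    total = conv_params + 4 * bn_channels + dense_params
    non_trainable = 2 * bn_channels                   # moving mean and variance
    return total, non_trainable


if __name__ == "__main__":
    total, frozen = count_params()
    print("total:", total, "trainable:", total - frozen, "non-trainable:", frozen)
```

With the defaults this matches the totals in the summary, which supports the reading that the transition layers in this run did not reduce the channel count.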
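V. Questions:
1. After running the Keras experiment I noticed a Merge after every CONV(48,1,1)-CONV(12,3,3)- pair, yet I had not seen one in the code. Where does it come from? I must have overlooked something.
Answer: I had missed this loop in the definition of dense_block: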
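    for i in range(nb_layers):
        # one BN-ReLU-conv module producing growth_rate new feature maps
        x = conv_block(x, growth_rate, bottleneck, dropout_rate, weight_decay)
        feature_list.append(x)
        # concatenate everything collected so far along the channel axis
        x = merge(feature_list, mode='concat', concat_axis=concat_axis)
        nb_filter += growth_rate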
That is, a Merge follows every such module: the outputs of all layers so far are concatenated into a single new tensor.
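To see how this loop produces the "Connected to" column of the Keras summary, the feature list can be mimicked with layer names instead of tensors. This is only a sketch: `merge_inputs` is a hypothetical helper, and it assumes `feature_list` starts out holding the block's input, which the excerpt above does not show but which matches the summary. It also uses the fact that the 3x3 conv of each module is every second `convolution2d_*` layer.

```python
def merge_inputs(block_input, first_conv_index, n_layers=6):
    """List the inputs of each Merge layer in one dense block."""
    feature_list = [block_input]          # assumed initial content
    merges = []
    for i in range(n_layers):
        # name of the 3x3 conv that ends the i-th BN-ReLU-conv module
        feature_list.append("convolution2d_%d" % (first_conv_index + 2 * i))
        merges.append(list(feature_list))  # concat takes the whole list
    return merges


if __name__ == "__main__":
    for inputs in merge_inputs("initial_conv2D", 2):
        print(len(inputs), inputs)
```

Calling `merge_inputs("averagepooling2d_1", 15)` and `merge_inputs("averagepooling2d_2", 28)` gives the input lists for the second and third blocks in the same way.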
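2. Why is the number of layers per dense block n_layers=((40-4)/3)//2=6 (where //2 means dividing by 2 and rounding down)? In other words, why subtract 4?
Answer: Besides the many layers inside the dense blocks, the network also has 1 initial convolution layer, 2 transition layers, and 1 final classification layer. Note that the depth L used in the paper does not include the input layer.
So depth (or L) in this paper is defined as follows:
a. the initial convolution conv counts as 1 layer;
b. each transition layer counts as 1 layer;
c. each CONV(48,1,1)-CONV(12,3,3) module in a dense block counts as 2 layers, i.e. each CONV counts as 1 layer;
d. the final output module Relu-GlobalAveragePool-softmax counts as 1 layer.
Put another way: the depth is the number of convolution layers plus 1 softmax layer.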
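The counting rules a to d can be tallied directly. The helper below is a hypothetical sketch, not part of the original code. It also agrees with the Keras summary, which lists initial_conv2D plus convolution2d_1 to convolution2d_38 (39 convolutions) and one Dense layer.

```python
def count_depth(n_blocks=3, layers_per_block=6, convs_per_layer=2):
    """Depth L as defined above: all conv layers plus the classifier."""
    initial_conv = 1                      # rule a
    transitions = n_blocks - 1            # rule b: one between adjacent blocks
    dense_convs = n_blocks * layers_per_block * convs_per_layer  # rule c
    classifier = 1                        # rule d
    return initial_conv + transitions + dense_convs + classifier


if __name__ == "__main__":
    print(count_depth())
```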