While running ShuffleNet, I hit the following error:
I1018 19:26:19.104892 3548 net.cpp:84] Creating Layer resx13_concat
I1018 19:26:19.104895 3548 net.cpp:406] resx13_concat <- resx13_match_conv
I1018 19:26:19.104898 3548 net.cpp:406] resx13_concat <- resx13_conv3
I1018 19:26:19.104902 3548 net.cpp:380] resx13_concat -> resx13_concat
F1018 19:26:19.104913 3548 concat_layer.cpp:42] Check failed: top_shape[j] == bottom[i]->shape(j) (1 vs. 2) All inputs must have the same shape, except at concat_axis.
*** Check failure stack trace: ***
@ 0x7f2beb8fcdaa (unknown)
@ 0x7f2beb8fcce4 (unknown)
@ 0x7f2beb8fc6e6 (unknown)
@ 0x7f2beb8ff687 (unknown)
@ 0x7f2bebfc6227 caffe::ConcatLayer<>::Reshape()
@ 0x7f2bec05e365 caffe::Net<>::Init()
@ 0x7f2bec060262 caffe::Net<>::Net()
@ 0x7f2bec01b9a0 caffe::Solver<>::InitTrainNet()
@ 0x7f2bec01c8f3 caffe::Solver<>::Init()
@ 0x7f2bec01cbcf caffe::Solver<>::Solver()
@ 0x7f2bec079b01 caffe::Creator_SGDSolver<>()
@ 0x40ee6e caffe::SolverRegistry<>::CreateSolver()
@ 0x407efd train()
@ 0x40590c main
@ 0x7f2bea908f45 (unknown)
@ 0x40617b (unknown)
@ (nil) (unknown)
As the check failure shows, the error is caused by input blobs with mismatched shapes. Searching the training log, the failure occurs right after these lines:
I1018 19:26:19.104895 3548 net.cpp:406] resx13_concat <- resx13_match_conv
I1018 19:26:19.104898 3548 net.cpp:406] resx13_concat <- resx13_conv3
I1018 19:26:19.104902 3548 net.cpp:380] resx13_concat -> resx13_concat
so the problem is in the data fed into this Concat layer. The network contains similar Concat connections earlier on, and at first I had no clue; carefully comparing the logs of those earlier concat connections revealed the cause.
I1018 19:26:19.052892 3548 net.cpp:84] Creating Layer resx1_conv3
I1018 19:26:19.052904 3548 net.cpp:406] resx1_conv3 <- resx1_conv2
I1018 19:26:19.052908 3548 net.cpp:380] resx1_conv3 -> resx1_conv3
I1018 19:26:19.053154 3548 net.cpp:122] Setting up resx1_conv3
I1018 19:26:19.053160 3548 net.cpp:129] Top shape: 90 216 6 6 (699840)
Above is the setup of the resx1_conv3 layer; its output shape is [90 216 6 6].
I1018 19:26:19.051407 3548 net.cpp:84] Creating Layer resx1_match_conv
I1018 19:26:19.051409 3548 net.cpp:406] resx1_match_conv <- pool1_pool1_0_split_0
I1018 19:26:19.051414 3548 net.cpp:380] resx1_match_conv -> resx1_match_conv
I1018 19:26:19.051427 3548 net.cpp:122] Setting up resx1_match_conv
I1018 19:26:19.051434 3548 net.cpp:129] Top shape: 90 24 6 6 (77760)
Above is the setup of the resx1_match_conv layer; its output shape is [90 24 6 6].
Then:
I1018 19:26:19.053496 3548 net.cpp:84] Creating Layer resx1_concat
I1018 19:26:19.053500 3548 net.cpp:406] resx1_concat <- resx1_match_conv
I1018 19:26:19.053503 3548 net.cpp:406] resx1_concat <- resx1_conv3
I1018 19:26:19.053508 3548 net.cpp:380] resx1_concat -> resx1_concat
I1018 19:26:19.053527 3548 net.cpp:122] Setting up resx1_concat
I1018 19:26:19.053532 3548 net.cpp:129] Top shape: 90 240 6 6 (777600)
I1018 19:26:19.053534 3548 net.cpp:137] Memory required for data: 51244200
The input shapes match, so the concat succeeds and the network continues to build normally.
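The arithmetic behind this successful concat can be reproduced directly from the shapes in the log: concatenating along axis 1 (the channel axis) adds the channel counts, 24 + 216 = 240, while every other dimension must agree. A quick sketch (variable names are my own):

```python
# Shapes from the log, in Caffe's (N, C, H, W) order
match_conv = (90, 24, 6, 6)   # resx1_match_conv top
conv3 = (90, 216, 6, 6)       # resx1_conv3 top

# Concat along axis 1: channels add up, all other dims must match
n, c1, h, w = match_conv
_, c2, _, _ = conv3
top = (n, c1 + c2, h, w)
print(top)    # (90, 240, 6, 6), as the log reports

# Element count, matching the (777600) in the log line
count = n * (c1 + c2) * h * w
print(count)  # 777600
```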
---------------------------------- divider ----------------------------------
I1018 19:26:19.102349 3548 net.cpp:84] Creating Layer resx13_match_conv
I1018 19:26:19.102351 3548 net.cpp:406] resx13_match_conv <- resx12_elewise_resx12_elewise_relu_0_split_0
I1018 19:26:19.102355 3548 net.cpp:380] resx13_match_conv -> resx13_match_conv
I1018 19:26:19.102373 3548 net.cpp:122] Setting up resx13_match_conv
I1018 19:26:19.102377 3548 net.cpp:129] Top shape: 90 480 1 1 (43200)
Here the output shape of resx13_match_conv is [90 480 1 1],
I1018 19:26:19.103997 3548 net.cpp:84] Creating Layer resx13_conv3
I1018 19:26:19.103999 3548 net.cpp:406] resx13_conv3 <- resx13_conv2
I1018 19:26:19.104003 3548 net.cpp:380] resx13_conv3 -> resx13_conv3
I1018 19:26:19.104576 3548 net.cpp:122] Setting up resx13_conv3
I1018 19:26:19.104599 3548 net.cpp:129] Top shape: 90 480 2 2 (172800)
while the output shape of resx13_conv3 is [90 480 2 2].
The spatial dimensions differ (1 vs. 2), which is exactly why the check top_shape[j] == bottom[i]->shape(j) fails with (1 vs. 2).
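The check that concat_layer.cpp performs can be sketched in a few lines of Python (a simplified stand-in for Caffe's ConcatLayer::Reshape; the function name is my own): every dimension except the concat axis must match across all bottom blobs.

```python
def concat_shape(bottom_shapes, axis=1):
    """Simplified sketch of Caffe ConcatLayer::Reshape: all bottoms
    must have the same shape except along concat_axis."""
    top = list(bottom_shapes[0])
    for shape in bottom_shapes[1:]:
        for j in range(len(top)):
            if j != axis and top[j] != shape[j]:
                raise ValueError(
                    f"All inputs must have the same shape, except at "
                    f"concat_axis ({top[j]} vs. {shape[j]})")
        top[axis] += shape[axis]
    return top

# resx13_concat's bottoms from the log: spatial dims 1 vs. 2 -> fails,
# just like the CHECK in concat_layer.cpp:42
try:
    concat_shape([[90, 480, 1, 1], [90, 480, 2, 2]])
except ValueError as e:
    print(e)
```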
We need to adjust the network parameters in train.prototxt so that resx13_match_conv also outputs [90 480 2 2]. Locate the definition of the resx13_match_conv layer:
layer {
  name: "resx13_match_conv"
  type: "Pooling"
  bottom: "resx12_elewise"
  top: "resx13_match_conv"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
The log below shows that the data arriving from the previous layer has shape [90 480 3 3]:
I1018 19:26:19.102326 3548 net.cpp:122] Setting up resx12_elewise_resx12_elewise_relu_0_split
I1018 19:26:19.102330 3548 net.cpp:129] Top shape: 90 480 3 3 (388800)
I1018 19:26:19.102334 3548 net.cpp:129] Top shape: 90 480 3 3 (388800)
I1018 19:26:19.102335 3548 net.cpp:137] Memory required for data: 256336200
I1018 19:26:19.102337 3548 layer_factory.hpp:77] Creating layer resx13_match_conv
I1018 19:26:19.102349 3548 net.cpp:84] Creating Layer resx13_match_conv
I1018 19:26:19.102351 3548 net.cpp:406] resx13_match_conv <- resx12_elewise_resx12_elewise_relu_0_split_0
I1018 19:26:19.102355 3548 net.cpp:380] resx13_match_conv -> resx13_match_conv
I1018 19:26:19.102373 3548 net.cpp:122] Setting up resx13_match_conv
I1018 19:26:19.102377 3548 net.cpp:129] Top shape: 90 480 1 1 (43200)
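Why does a 3×3 input come out as 1×1 here? Caffe's pooling layer computes its output size with a ceiling formula; a minimal sketch, assuming the standard ceil((in + 2*pad - kernel) / stride) + 1 rule from Caffe's PoolingLayer::Reshape:

```python
import math

def pool_out(size, kernel, stride, pad=0):
    # Caffe pooling rounds the output size up (ceil); note that
    # convolution layers round down instead, which is why the two
    # branches of this block can disagree on spatial size.
    return int(math.ceil((size + 2 * pad - kernel) / stride)) + 1

# resx13_match_conv as defined: 3x3 input, kernel 3, stride 2 -> 1x1
print(pool_out(3, kernel=3, stride=2))  # 1

# with kernel 2, the same 3x3 input yields 2x2, matching resx13_conv3
print(pool_out(3, kernel=2, stride=2))  # 2
```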
With a 3×3 input, a 3×3 average pool at stride 2 covers the whole input in a single window and outputs 1×1, so we change kernel_size: 3 to kernel_size: 2, i.e.:
layer {
  name: "resx13_match_conv"
  type: "Pooling"
  bottom: "resx12_elewise"
  top: "resx13_match_conv"
  pooling_param {
    pool: AVE
    kernel_size: 2
    stride: 2
  }
}
This changes the output shape to [90 480 2 2], the two Concat inputs now match, and the network trains successfully!