Check failed: top_shape[j] == bottom[i]->shape(j) (1 vs. 2) All inputs must have the same shape, except at concat_axis.

While training ShuffleNet I hit the following error:

I1018 19:26:19.104892  3548 net.cpp:84] Creating Layer resx13_concat
I1018 19:26:19.104895  3548 net.cpp:406] resx13_concat <- resx13_match_conv
I1018 19:26:19.104898  3548 net.cpp:406] resx13_concat <- resx13_conv3
I1018 19:26:19.104902  3548 net.cpp:380] resx13_concat -> resx13_concat
F1018 19:26:19.104913  3548 concat_layer.cpp:42] Check failed: top_shape[j] == bottom[i]->shape(j) (1 vs. 2) All inputs must have the same shape, except at concat_axis.
*** Check failure stack trace: ***
    @     0x7f2beb8fcdaa  (unknown)
    @     0x7f2beb8fcce4  (unknown)
    @     0x7f2beb8fc6e6  (unknown)
    @     0x7f2beb8ff687  (unknown)
    @     0x7f2bebfc6227  caffe::ConcatLayer<>::Reshape()
    @     0x7f2bec05e365  caffe::Net<>::Init()
    @     0x7f2bec060262  caffe::Net<>::Net()
    @     0x7f2bec01b9a0  caffe::Solver<>::InitTrainNet()
    @     0x7f2bec01c8f3  caffe::Solver<>::Init()
    @     0x7f2bec01cbcf  caffe::Solver<>::Solver()
    @     0x7f2bec079b01  caffe::Creator_SGDSolver<>()
    @           0x40ee6e  caffe::SolverRegistry<>::CreateSolver()
    @           0x407efd  train()
    @           0x40590c  main
    @     0x7f2bea908f45  (unknown)
    @           0x40617b  (unknown)
    @              (nil)  (unknown)

As the message indicates, the error is caused by input blobs whose shapes do not match. Searching the training log, the failure occurs right at these lines:

I1018 19:26:19.104895  3548 net.cpp:406] resx13_concat <- resx13_match_conv
I1018 19:26:19.104898  3548 net.cpp:406] resx13_concat <- resx13_conv3
I1018 19:26:19.104902  3548 net.cpp:380] resx13_concat -> resx13_concat

That is, the data fed into the Concat layer is the problem. Similar Concat connections appear earlier in the network; at first I had no clue, but carefully comparing the logs of those earlier concat layers revealed the cause.
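The check that fired can be mimicked with a short pure-Python sketch. Here `concat_shape` is a hypothetical helper written for illustration, not Caffe code; it reproduces the rule `ConcatLayer::Reshape` enforces: all inputs must agree on every axis except the concat axis, whose sizes are summed.

```python
def concat_shape(shapes, axis=1):
    # Mimic Caffe's ConcatLayer::Reshape: all dims except `axis`
    # must match across inputs; the `axis` dims are summed.
    ref = list(shapes[0])
    for s in shapes[1:]:
        for j, (a, b) in enumerate(zip(ref, s)):
            if j != axis and a != b:
                raise ValueError(
                    f"Check failed: {a} vs. {b} at axis {j}; "
                    "all inputs must have the same shape, except at concat_axis")
        ref[axis] += s[axis]
    return ref

# The healthy resx1_concat case from the log (24 + 216 = 240 channels):
print(concat_shape([(90, 24, 6, 6), (90, 216, 6, 6)]))  # [90, 240, 6, 6]

# The failing resx13_concat case: spatial dims 1 vs. 2 disagree.
try:
    concat_shape([(90, 480, 1, 1), (90, 480, 2, 2)])
except ValueError as e:
    print(e)
```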

I1018 19:26:19.052892  3548 net.cpp:84] Creating Layer resx1_conv3
I1018 19:26:19.052904  3548 net.cpp:406] resx1_conv3 <- resx1_conv2
I1018 19:26:19.052908  3548 net.cpp:380] resx1_conv3 -> resx1_conv3
I1018 19:26:19.053154  3548 net.cpp:122] Setting up resx1_conv3
I1018 19:26:19.053160  3548 net.cpp:129] Top shape: 90 216 6 6 (699840)

Above is the setup of the resx1_conv3 layer; its output shape is [90 216 6 6].

I1018 19:26:19.051407  3548 net.cpp:84] Creating Layer resx1_match_conv
I1018 19:26:19.051409  3548 net.cpp:406] resx1_match_conv <- pool1_pool1_0_split_0
I1018 19:26:19.051414  3548 net.cpp:380] resx1_match_conv -> resx1_match_conv
I1018 19:26:19.051427  3548 net.cpp:122] Setting up resx1_match_conv
I1018 19:26:19.051434  3548 net.cpp:129] Top shape: 90 24 6 6 (77760)

Above is the setup of the resx1_match_conv layer; its output shape is [90 24 6 6].
Then:

I1018 19:26:19.053496  3548 net.cpp:84] Creating Layer resx1_concat
I1018 19:26:19.053500  3548 net.cpp:406] resx1_concat <- resx1_match_conv
I1018 19:26:19.053503  3548 net.cpp:406] resx1_concat <- resx1_conv3
I1018 19:26:19.053508  3548 net.cpp:380] resx1_concat -> resx1_concat
I1018 19:26:19.053527  3548 net.cpp:122] Setting up resx1_concat
I1018 19:26:19.053532  3548 net.cpp:129] Top shape: 90 240 6 6 (777600)
I1018 19:26:19.053534  3548 net.cpp:137] Memory required for data: 51244200

The inputs are compatible (both are 6×6 spatially, and the channels concatenate: 216 + 24 = 240), so network construction continues normally.

----------------------------------------

I1018 19:26:19.102349  3548 net.cpp:84] Creating Layer resx13_match_conv
I1018 19:26:19.102351  3548 net.cpp:406] resx13_match_conv <- resx12_elewise_resx12_elewise_relu_0_split_0
I1018 19:26:19.102355  3548 net.cpp:380] resx13_match_conv -> resx13_match_conv
I1018 19:26:19.102373  3548 net.cpp:122] Setting up resx13_match_conv
I1018 19:26:19.102377  3548 net.cpp:129] Top shape: 90 480 1 1 (43200)

Here resx13_match_conv's output shape is [90 480 1 1].

I1018 19:26:19.103997  3548 net.cpp:84] Creating Layer resx13_conv3
I1018 19:26:19.103999  3548 net.cpp:406] resx13_conv3 <- resx13_conv2
I1018 19:26:19.104003  3548 net.cpp:380] resx13_conv3 -> resx13_conv3
I1018 19:26:19.104576  3548 net.cpp:122] Setting up resx13_conv3
I1018 19:26:19.104599  3548 net.cpp:129] Top shape: 90 480 2 2 (172800)

Here resx13_conv3's output shape is [90 480 2 2].

The spatial dimensions disagree, which is exactly why the check top_shape[j] == bottom[i]->shape(j) fails with (1 vs. 2).

Solution

We need to adjust the network parameters in train.prototxt so that resx13_match_conv also outputs [90 480 2 2]. Locate the definition of the resx13_match_conv layer:

layer {
  name: "resx13_match_conv"
  type: "Pooling"
  bottom: "resx12_elewise"
  top: "resx13_match_conv"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}

The log below shows that the blob coming from the previous layer has shape [90 480 3 3]:

I1018 19:26:19.102326  3548 net.cpp:122] Setting up resx12_elewise_resx12_elewise_relu_0_split
I1018 19:26:19.102330  3548 net.cpp:129] Top shape: 90 480 3 3 (388800)
I1018 19:26:19.102334  3548 net.cpp:129] Top shape: 90 480 3 3 (388800)
I1018 19:26:19.102335  3548 net.cpp:137] Memory required for data: 256336200
I1018 19:26:19.102337  3548 layer_factory.hpp:77] Creating layer resx13_match_conv
I1018 19:26:19.102349  3548 net.cpp:84] Creating Layer resx13_match_conv
I1018 19:26:19.102351  3548 net.cpp:406] resx13_match_conv <- resx12_elewise_resx12_elewise_relu_0_split_0
I1018 19:26:19.102355  3548 net.cpp:380] resx13_match_conv -> resx13_match_conv
I1018 19:26:19.102373  3548 net.cpp:122] Setting up resx13_match_conv
I1018 19:26:19.102377  3548 net.cpp:129] Top shape: 90 480 1 1 (43200)

With a 3×3 input, kernel_size: 3 and stride: 2 pool down to a 1×1 output, while resx13_conv3 produces 2×2. So change kernel_size: 3 to kernel_size: 2:

layer {
  name: "resx13_match_conv"
  type: "Pooling"
  bottom: "resx12_elewise"
  top: "resx13_match_conv"
  pooling_param {
    pool: AVE
    kernel_size: 2
    stride: 2
  }
}

This changes the pooled output shape to [90 480 2 2], matching resx13_conv3, and the network now trains successfully.
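The fix follows from the pooling output-size arithmetic. A minimal sketch (`pooled_dim` is a hypothetical helper; it assumes Caffe's PoolingLayer, which uses ceil rounding in its Reshape) shows why kernel 3 yields 1×1 but kernel 2 yields 2×2 on a 3×3 input:

```python
import math

def pooled_dim(in_size, kernel, stride, pad=0):
    # Caffe's PoolingLayer computes (with ceil rounding):
    #   pooled = ceil((in + 2*pad - kernel) / stride) + 1
    return int(math.ceil((in_size + 2 * pad - kernel) / stride)) + 1

print(pooled_dim(3, kernel=3, stride=2))  # 1 -> the mismatching 1x1 output
print(pooled_dim(3, kernel=2, stride=2))  # 2 -> matches resx13_conv3's 2x2
```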
