A note on an issue when converting YOLOv3 to a caffemodel

First, I used the yolov3_darknet2caffe.py script provided at https://blog.csdn.net/Chen_yingpeng/article/details/80692018 to convert a darknet YOLOv3 model to Caffe, obtaining the prototxt and caffemodel files.

However, when building the caffe-yolov3 code that Chen provides, I had no sudo rights on the server and could not install OpenCV 3, so the build failed. Since I had previously used MobileNet-YOLOv3 under Caffe, I instead ran my tests with the ssd_detect.cpp / yolo_detect.cpp provided in https://github.com/eric612/MobileNet-YOLO (I had already built MobileNet-YOLO completely with cmake), after changing some of the code to fit my needs.

Note that to use the ssd_detect.cpp from MobileNet-YOLO, the yolov3.prototxt produced by Chen's converter needs some modification: the corresponding yolo layers and a final detection_out layer must be added. The prototxt files shipped with that framework can serve as a reference for this change.

With that, the converted caffemodel could be tested normally, and the demo results looked fine. But when we measured concrete metrics on our own dataset, the recall for traffic signs was much worse than under darknet. Visualizing the images showed that the caffemodel detected blue traffic signs while ignoring red ones. Adding more training data helped a little, but the problem remained severe.

So I debugged Chen's conversion script. The part that converts the weight values (the core of the whole script) is as follows:

def darknet2caffe(cfgfile, weightfile, protofile, caffemodel):
    net_info = cfg2prototxt(cfgfile)
    save_prototxt(net_info , protofile, region=False)

    net = caffe.Net(protofile, caffe.TEST)
    params = net.params

    blocks = parse_cfg(cfgfile)

    #Open the weights file
    fp = open(weightfile, "rb")

    #The first five int32 values are header information:
    # 1. Major version number
    # 2. Minor version number
    # 3. Subversion number
    # 4-5. Images seen (a 64-bit counter, i.e. two int32 slots)
    header = np.fromfile(fp, dtype = np.int32, count = 5)

    #fp = open(weightfile, 'rb')
    #header = np.fromfile(fp, count=5, dtype=np.int32)
    #header = np.ndarray(shape=(5,),dtype='int32',buffer=fp.read(20))
    #print(header)
    buf = np.fromfile(fp, dtype = np.float32)
    #print(buf)
    fp.close()

    layers = []
    layer_id = 1
    start = 0
    for block in blocks:
        if start >= buf.size:
            break

        if block['type'] == 'net':
            continue
        elif block['type'] == 'convolutional':
            batch_normalize = int(block['batch_normalize'])
            if 'name' in block:
                conv_layer_name = block['name']
                bn_layer_name = '%s-bn' % block['name']
                scale_layer_name = '%s-scale' % block['name']
            else:
                conv_layer_name = 'layer%d-conv' % layer_id
                bn_layer_name = 'layer%d-bn' % layer_id
                scale_layer_name = 'layer%d-scale' % layer_id

            if batch_normalize:
                start = load_conv_bn2caffe(buf, start, params[conv_layer_name], params[bn_layer_name], params[scale_layer_name])
            else:
                start = load_conv2caffe(buf, start, params[conv_layer_name])
            layer_id = layer_id+1
        elif block['type'] == 'depthwise_convolutional':
            batch_normalize = int(block['batch_normalize'])
            if 'name' in block:
                conv_layer_name = block['name']
                bn_layer_name = '%s-bn' % block['name']
                scale_layer_name = '%s-scale' % block['name']
            else:
                conv_layer_name = 'layer%d-dwconv' % layer_id
                bn_layer_name = 'layer%d-bn' % layer_id
                scale_layer_name = 'layer%d-scale' % layer_id

            if batch_normalize:
                start = load_conv_bn2caffe(buf, start, params[conv_layer_name], params[bn_layer_name], params[scale_layer_name])
            else:
                start = load_conv2caffe(buf, start, params[conv_layer_name])
            layer_id = layer_id+1
        elif block['type'] == 'connected':
            if 'name' in block:
                fc_layer_name = block['name']
            else:
                fc_layer_name = 'layer%d-fc' % layer_id
            start = load_fc2caffe(buf, start, params[fc_layer_name])
            layer_id = layer_id+1
        elif block['type'] == 'maxpool':
            layer_id = layer_id+1
        elif block['type'] == 'avgpool':
            layer_id = layer_id+1
        elif block['type'] == 'region':
            layer_id = layer_id + 1
        elif block['type'] == 'route':
            layer_id = layer_id + 1
        elif block['type'] == 'shortcut':
            layer_id = layer_id + 1
        elif block['type'] == 'softmax':
            layer_id = layer_id + 1
        elif block['type'] == 'cost':
            layer_id = layer_id + 1
        elif block['type'] == 'upsample':
            layer_id = layer_id + 1
        else:
            print('unknown layer type %s ' % block['type'])
            layer_id = layer_id + 1
    print('save prototxt to %s' % protofile)
    save_prototxt(net_info , protofile, region=True)
    print('save caffemodel to %s' % caffemodel)
    net.save(caffemodel)

......


def load_conv_bn2caffe(buf, start, conv_param, bn_param, scale_param):
    conv_weight = conv_param[0].data
    running_mean = bn_param[0].data
    running_var = bn_param[1].data
    scale_weight = scale_param[0].data
    scale_bias = scale_param[1].data

    # darknet serializes a conv+bn layer as: biases, scales, rolling mean,
    # rolling variance, conv weights -- read them back in that order
    scale_param[1].data[...] = np.reshape(buf[start:start+scale_bias.size], scale_bias.shape); start = start + scale_bias.size
    #print scale_bias.size
    #print scale_bias

    scale_param[0].data[...] = np.reshape(buf[start:start+scale_weight.size], scale_weight.shape); start = start + scale_weight.size
    #print scale_weight.size

    bn_param[0].data[...] = np.reshape(buf[start:start+running_mean.size], running_mean.shape); start = start + running_mean.size
    #print running_mean.size

    bn_param[1].data[...] = np.reshape(buf[start:start+running_var.size], running_var.shape); start = start + running_var.size
    #print running_var.size

    bn_param[2].data[...] = np.array([1.0])
    conv_param[0].data[...] = np.reshape(buf[start:start+conv_weight.size], conv_weight.shape); start = start + conv_weight.size
    #print conv_weight.size

    return start

Here buf holds the weight values read from yolov3.weights. Following darknet's storage scheme, buf is a one-dimensional vector of size 61592497 x 1, the accumulated weights of every layer in YOLOv3, and start records where each layer's weights begin. After conversion to the caffemodel (load_conv_bn2caffe() in the code writes the values taken from buf into the caffemodel), each blob becomes a four-dimensional array, e.g. 64x32x3x3 (in Caffe's blob layout: 64 output channels, 32 input channels, 3x3 kernel). This step is carried out by numpy's reshape() inside load_conv_bn2caffe. So my question was: is the accuracy drop caused by the 1-D-to-4-D reshape ordering not matching darknet's, producing misaligned RGB weights?
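The layout question can be checked with a small numpy sketch (the layer sizes here are made up, not taken from yolov3.weights): darknet writes conv weights flattened in (out_channels, in_channels, kh, kw) C order, which is exactly the order np.reshape assumes by default, so a plain reshape recovers the blob element for element.

```python
import numpy as np

# Hypothetical tiny conv layer: 2 output channels, 3 input channels, 2x2 kernel.
out_c, in_c, k = 2, 3, 2
buf = np.arange(out_c * in_c * k * k, dtype=np.float32)  # stand-in for a slice of buf

# Same call the converter makes: C-order reshape into Caffe's (N, C, H, W) blob.
blob = np.reshape(buf, (out_c, in_c, k, k))

# Element [o, i, y, x] sits at flat index ((o*in_c + i)*k + y)*k + x,
# which is exactly where darknet wrote it.
o, i, y, x = 1, 2, 0, 1
print(blob[o, i, y, x] == buf[((o * in_c + i) * k + y) * k + x])  # True
```

This is why, as it turned out later, the reshape itself was not the culprit.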

I then examined the ssd_detect.cpp provided by MobileNet-YOLO, starting from the image-input part of the network. A code excerpt:

  cv::Mat img = cv::imread(fn[k]);
  if (img.empty()) continue; // only proceed if successful
                             // you probably want to do some preprocessing
  CHECK(!img.empty()) << "Unable to decode image " << file;
  Timer batch_timer;
  batch_timer.Start();
  std::vector<vector<float> > detections = detector.Detect(img);
  LOG(INFO) << "Computing time: " << batch_timer.MilliSeconds() << " ms.";

This code reads the image with OpenCV's imread() and passes the resulting img to the Detector for testing. Detector is defined as a class with the following member functions:

class Detector {
 public:
  Detector(const string& model_file,
           const string& weights_file,
           const string& mean_file,
           const string& mean_value,
           const float confidence_threshold,
           const float normalize_value);

  std::vector<vector<float> > Detect(const cv::Mat& img);

 private:
  void SetMean(const string& mean_file, const string& mean_value);

  void WrapInputLayer(std::vector<cv::Mat>* input_channels);

  void Preprocess(const cv::Mat& img,
                  std::vector<cv::Mat>* input_channels);
  void Preprocess(const cv::Mat& img,
                  std::vector<cv::Mat>* input_channels, double normalize_value);

 private:
  shared_ptr<Net<float> > net_;
  cv::Size input_geometry_;
  int num_channels_;
  cv::Mat mean_;
  float nor_val = 1.0;
};

None of these member functions converts the channel order of the input img; the image is only resized and then tested.

This gave me a hint: OpenCV's imread reads a color image in BGR order. Could that correspond exactly to the pattern of blue traffic signs being detected while red ones were missed? I swapped channel(2) and channel(0) of the img returned by imread, fed the swapped image into the Detector, and drew the results (box coordinates and scores) with rectangle() on the original, unswapped img. The detections were now correct, essentially identical to darknet's. This proved that the reshape in Chen's conversion code is not wrong; the problem was that I had fed the output of cv::imread() straight into the detector.
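The channel swap I did in C++ can be sketched in Python as well (a minimal numpy example, not the actual test code): for an H x W x 3 array in BGR order, reversing the last axis swaps channels 0 and 2.

```python
import numpy as np

# Hypothetical 1x1 "image" holding a pure-red pixel in OpenCV's BGR order.
img_bgr = np.array([[[0, 0, 255]]], dtype=np.uint8)  # B=0, G=0, R=255

# Reversing the last axis swaps channel 0 and channel 2: BGR -> RGB.
img_rgb = img_bgr[:, :, ::-1]
print(img_rgb[0, 0].tolist())  # [255, 0, 0]
```

The same one-liner works on any cv2.imread result before it is handed to an RGB-trained network.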

I then wondered why this problem never shows up in the original darknet, or in other Caffe models (SSD, RefineDet, etc.), and kept digging. Reading the image-input code in darknet, specifically load_image_color() in src/image.c:

image load_image_color(char *filename, int w, int h)
{
    return load_image(filename, w, h, 3);
}

image load_image(char *filename, int w, int h, int c)
{
#ifdef OPENCV
    image out = load_image_cv(filename, c);
#else
    image out = load_image_stb(filename, c);
#endif

    if((h && w) && (h != out.h || w != out.w)){
        // resize to the (w,h) the network expects, provided w and h are non-zero
        image resized = resize_image(out, w, h);
        free_image(out);
        out = resized;
    }
    return out;
}

load_image() calls load_image_cv(), so darknet also uses OpenCV to read images. Since both common OpenCV loaders, imread() (the C++ API) and cvLoadImage() (the legacy C API), read pixels in BGR order, this puzzled me even more. Looking at load_image_cv():

image load_image_cv(char *filename, int channels)
{
    IplImage* src = 0;
    int flag = -1;
    if (channels == 0) flag = -1;
    else if (channels == 1) flag = 0;  //grayscale image
    else if (channels == 3) flag = 1;  //3-channel color image
    else {
        fprintf(stderr, "OpenCV can't force load with %d channels\n", channels);
    }
 
	//opencv api load image
    if( (src = cvLoadImage(filename, flag)) == 0 )
    {
        fprintf(stderr, "Cannot load image \"%s\"\n", filename);
        char buff[256];
        sprintf(buff, "echo %s >> bad.list", filename);
        system(buff);
        return make_image(10,10,3);
        //exit(0);
    }

	// copy the picture from the IplImage container into darknet's image struct
    image out = ipl_to_image(src);
    cvReleaseImage(&src);
    rgbgr_image(out); //convert BGR to RGB
    
    return out;
}

This code reads the image with the legacy C API's cvLoadImage(), then calls rgbgr_image() to convert the loaded BGR data to RGB. The conversion code (also in image.c):

void rgbgr_image(image im)
{
    // darknet stores the image as planar CHW floats, so swapping the first
    // and third w*h planes converts BGR to RGB (and back)
    int i;
    for(i = 0; i < im.w*im.h; ++i){
        float swap = im.data[i];
        im.data[i] = im.data[i+im.w*im.h*2];
        im.data[i+im.w*im.h*2] = swap;
    }
}
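For darknet's planar float layout, this loop amounts to exchanging the first and third w*h planes. The same operation in numpy (a sketch with made-up dimensions):

```python
import numpy as np

w, h = 4, 3
data = np.arange(3 * h * w, dtype=np.float32)  # flat CHW buffer, like image.data

# rgbgr_image: swap plane 0 and plane 2, leave plane 1 (green) untouched.
planes = data.reshape(3, h, w)
swapped = planes[::-1].copy()

print(np.array_equal(swapped[0], planes[2]))  # True
print(np.array_equal(swapped[1], planes[1]))  # True
```

Note this works on darknet's channel-first layout; OpenCV's interleaved HWC images need the last axis reversed instead.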

I also checked, in passing, how images are read when testing under Caffe: the Python scripts use caffe.io.load_image() from the pycaffe interface, defined as load_image in caffe/python/caffe/io.py:

def load_image(filename, color=True):
    """
    Load an image converting from grayscale or alpha as needed.
    Parameters
    ----------
    filename : string
    color : boolean
        flag for color format. True (default) loads as RGB while False
        loads as intensity (if image is already grayscale).
    Returns
    -------
    image : an image with type np.float32 in range [0, 1]
        of size (H x W x 3) in RGB or
        of size (H x W x 1) in grayscale.
    """
    img = skimage.img_as_float(skimage.io.imread(filename, as_grey=not color)).astype(np.float32)
    if img.ndim == 2:
        img = img[:, :, np.newaxis]
        if color:
            img = np.tile(img, (1, 1, 3))
    elif img.shape[2] == 4:
        img = img[:, :, :3]
    return img

It uses the Python package skimage to read images, and skimage clearly loads them in RGB format.

However, skimage also normalizes the image into the range 0~1, so under the pycaffe interface simply swapping in cv2.imread() (which returns values in 0~255) goes wrong. To replace caffe.io.load_image() with cv2.imread(), the image must also be normalized; another post of mine shows code for substituting cv2 for load_image(): https://blog.csdn.net/xunan003/article/details/94740569
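A sketch of that substitution (the helper name is hypothetical, and cv2.imread itself only appears in the comment): the cv2 array needs a BGR-to-RGB flip plus scaling into [0, 1] to match caffe.io.load_image's convention.

```python
import numpy as np

def cv2_to_caffe_input(img_bgr):
    """Turn a cv2.imread-style array (H x W x 3, uint8, BGR, 0-255) into
    the caffe.io.load_image convention (float32, RGB, 0-1)."""
    img_rgb = img_bgr[:, :, ::-1]              # BGR -> RGB
    return img_rgb.astype(np.float32) / 255.0  # 0-255 -> 0-1

# Usage would be: img = cv2_to_caffe_input(cv2.imread(path))
demo = np.array([[[0, 128, 255]]], dtype=np.uint8)
out = cv2_to_caffe_input(demo)
print(out.dtype, out[0, 0].tolist())  # float32, approximately [1.0, 0.502, 0.0]
```

Grayscale and alpha handling from the original load_image would still need to be added for full parity.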

With that, I had found the cause of the accuracy drop after converting YOLOv3 to a caffemodel, and my initial suspicion of Chen's code was shown to be unfounded. I am grateful for this debugging session; it taught me more about the differences between the two frameworks.
