[学习笔记] Faster R-CNN源码解析

0. 引言

0.1 如何找到PyTorch官方的模型源码

from torchvision.models.detection import faster_rcnn

但是只有模型代码,并没有和训练相关的代码。

0.2 如何找到PyTorch官方的训练相关代码

链接:https://github.com/pytorch/vision/tree/main/references


0.3 Faster R-CNN的环境配置

  • Python 3.6以上
  • PyTorch 1.5以上
  • pycocotools

    pip install pycocotools

  • Ubuntu或Centos(不建议Windows)
  • GPU
  • lxml
  • matplotlib
  • numpy
  • tqdm
  • Pillow

0.4 文件结构

  ├── backbone: 特征提取网络,可以根据自己的要求选择:①MobileNet v2;②ResNet-50 + FPN(Feature Pyramid Network,特征金字塔网络)
  ├── network_files: Faster R-CNN网络(包括Fast R-CNN以及RPN等模块)
  ├── train_utils: 训练验证相关模块(包括cocotools):https://github.com/pytorch/vision/tree/main/references/detection
  ├── my_dataset.py: 自定义dataset用于读取VOC数据集
  ├── train_mobilenet.py: 以MobileNetV2作为backbone进行训练(主讲,准确率没有下面的好)
  ├── train_resnet50_fpn.py: 以resnet50+FPN作为backbone进行训练(效果最好)
  ├── train_multi_GPU.py: 针对使用多GPU的用户使用(并行训练)
  ├── predict.py: 简易的预测脚本,使用训练好的权重进行预测
  ├── validation.py: 利用训练好的权重验证/测试数据的COCO指标,并生成record_mAP.txt文件
  └── pascal_voc_classes.json: PASCAL VOC标签文件

0.5 pascal_voc_classes.json

内容如下:

{
    "aeroplane": 1,
    "bicycle": 2,
    "bird": 3,
    "boat": 4,
    "bottle": 5,
    "bus": 6,
    "car": 7,
    "cat": 8,
    "chair": 9,
    "cow": 10,
    "diningtable": 11,
    "dog": 12,
    "horse": 13,
    "motorbike": 14,
    "person": 15,
    "pottedplant": 16,
    "sheep": 17,
    "sofa": 18,
    "train": 19,
    "tvmonitor": 20
}

Q:类别为什么不从0开始?
A:在目标检测中,一般0是留给背景(负样本)的

虽然PASCAL VOC 2012有20个类别,但实际训练时给了21个类别(为背景专门设置了一个类别)。
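下面用一段简单的代码示意一下(假设pascal_voc_classes.json位于当前目录),训练时的类别数是如何由20个前景类别加上背景得到21的:

import json

# 读取PASCAL VOC的类别字典(20个前景类别,索引从1开始)
with open("pascal_voc_classes.json", "r") as f:
    class_dict = json.load(f)

# 索引0留给背景,因此模型的num_classes = 前景类别数 + 1 = 21
num_classes = len(class_dict) + 1
print(num_classes)  # 21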

0.6 预训练权重下载地址(下载后放入backbone文件夹中):

  • MobileNetV2 backbone: https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
  • ResNet50+FPN backbone: https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
  • 注意,下载的预训练权重记得要重命名,比如在train_resnet50_fpn.py中读取的是fasterrcnn_resnet50_fpn_coco.pth文件,
    不是fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
  • 对于MobileNet v2来说,预训练权重只覆盖backbone部分,而Faster R-CNN除了backbone还有RPN以及Fast R-CNN检测头(全连接层),所以这个预训练权重是不完整的。
  • 而fasterrcnn_resnet50_fpn_coco.pth是在COCO上训练好的完整Faster R-CNN权重,包含backbone(ResNet50+FPN)、RPN以及Fast R-CNN检测头。

0.7 数据集

  • Pascal VOC2012 train/val数据集下载地址:http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar

0.8 训练方法

  • 确保提前准备好数据集
  • 确保提前下载好对应预训练模型权重
  • 若要训练mobilenetv2+fasterrcnn,直接使用train_mobilenet.py训练脚本
  • 若要训练resnet50+fpn+fasterrcnn,直接使用train_resnet50_fpn.py训练脚本
  • 若要使用多GPU训练,使用python -m torch.distributed.launch --nproc_per_node=8 --use_env train_multi_GPU.py指令,nproc_per_node参数为使用GPU数量
  • 如果想指定使用哪些GPU设备可在指令前加上CUDA_VISIBLE_DEVICES=0,3(例如我只要使用设备中的第1块和第4块GPU设备)
  • CUDA_VISIBLE_DEVICES=0,3 python -m torch.distributed.launch --nproc_per_node=2 --use_env train_multi_GPU.py
  • 学习/研究时可使用mobilenetv2+fasterrcnn(训练更快,但准确率较低)
  • 实际应用建议使用resnet50+fpn+fasterrcnn(准确率更高)

0.9 注意事项

  • 在使用训练脚本时,注意要将--data-path(VOC_root)设置为自己存放VOCdevkit文件夹所在的根目录
  • 由于带有FPN结构的Faster R-CNN很吃显存,如果GPU显存不够(batch_size小于8),建议在create_model函数中使用默认的norm_layer,即不传递norm_layer参数,此时默认使用FrozenBatchNorm2d(即不会更新参数的BN层),实际使用中发现效果也很好。
  • 在使用预测脚本时,要将train_weights设置为你自己生成的权重路径。
  • 使用validation文件时,注意确保你的验证集或者测试集中必须包含每个类别的目标,并且使用时只需要修改--num-classes、--data-path、--weights-path即可,其他代码尽量不要改动

0.10 Faster R-CNN框架图

(图:Faster R-CNN框架图)

1. 自定义Dataset

https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

Defining the Dataset,定义你的Dataset
The reference scripts for training object detection, instance segmentation and person keypoint detection allows for easily supporting adding new custom datasets. The dataset should inherit from the standard torch.utils.data.Dataset class, and implement __len__ and __getitem__.
用于训练目标检测、实例分割和人体关键点检测的参考脚本可以很方便地支持添加新的自定义数据集。数据集应继承自标准的torch.utils.data.Dataset类,并实现__len__和__getitem__方法。

  • __len__:返回数据集的样本(图片)个数
  • __getitem__:根据索引返回对应的图片及其标注信息(target)

The only specificity that we require is that the dataset __getitem__ should return:

  • image: a PIL Image of size (H, W)
  • target: a dict containing the following fields
    • boxes (FloatTensor[N, 4]): the coordinates of the N bounding boxes in [x0, y0, x1, y1] format, ranging from 0 to W and 0 to H
    • labels (Int64Tensor[N]): the label for each bounding box. 0 represents always the background class.
    • image_id (Int64Tensor[1]): an image identifier. It should be unique between all the images in the dataset, and is used during evaluation
    • area (Tensor[N]): The area of the bounding box. This is used during evaluation with the COCO metric, to separate the metric scores between small, medium and large boxes.
    • iscrowd (UInt8Tensor[N]): instances with iscrowd=True will be ignored during evaluation.
    • (optionally) masks (UInt8Tensor[N, H, W]): The segmentation masks for each one of the objects
    • (optionally) keypoints (FloatTensor[N, K, 3]): For each one of the N objects, it contains the K keypoints in [x, y, visibility] format, defining the object. visibility=0 means that the keypoint is not visible. Note that for data augmentation, the notion of flipping a keypoint is dependent on the data representation, and you should probably adapt references/detection/transforms.py for your new keypoint representation

If your model returns the above methods, they will make it work for both training and evaluation, and will use the evaluation scripts from pycocotools which can be installed with pip install pycocotools.
如果您的模型返回上述方法,它们将使其适用于训练和评估,并将使用来自 pycocotools 的评估脚本,可以使用 pip install pycocotools 安装。

NOTE
One note on the labels. The model considers class 0 as background. If your dataset does not contain the background class, you should not have 0 in your labels. For example, assuming you have just two classes, cat and dog, you can define 1 (not 0) to represent cats and 2 to represent dogs. So, for instance, if one of the images has both classes, your labels tensor should look like [1,2].

Additionally, if you want to use aspect ratio grouping during training (so that each batch only contains images with similar aspect ratios), then it is recommended to also implement a get_height_and_width method, which returns the height and the width of the image. If this method is not provided, we query all elements of the dataset via __getitem__ , which loads the image in memory and is slower than if a custom method is provided.
另外,如果你想在训练时使用长宽比分组(这样每批只包含长宽比相似的图像),那么建议也实现一个 get_height_and_width 方法,它返回图像的高度和宽度。 如果没有提供这个方法,我们会通过__getitem__查询数据集的所有元素,这会将图像加载到内存中,并且比提供自定义方法要慢。

1.1 自定义dataset代码

import numpy as np
from torch.utils.data import Dataset
import os
import torch
import json
from PIL import Image
from lxml import etree


class VOCDataSet(Dataset):
    """
        读取解析PASCAL VOC2007/2012数据集

        需要实现两个方法:
            1. __len__
            2. __getitem__
            [可选] 3. get_height_and_width
    """

    def __init__(self, voc_root, year="2012", transforms=None, txt_name: str = "train.txt"):
        assert year in ["2007", "2012"], "year must be in ['2007', '2012']"
        # 增加容错能力
        if "VOCdevkit" in voc_root:
            self.root = os.path.join(voc_root, f"VOC{year}")
        else:
            self.root = os.path.join(voc_root, "VOCdevkit", f"VOC{year}")
        self.img_root = os.path.join(self.root, "JPEGImages")
        self.annotations_root = os.path.join(self.root, "Annotations")

        # read train.txt or val.txt file
        txt_path = os.path.join(self.root, "ImageSets", "Main", txt_name)
        assert os.path.exists(txt_path), "not found {} file.".format(txt_name)

        # 打开txt文件并读取每一行
        """
            因为每一行后面都有一个换行符,所以使用line.strip()方法将换行符去掉
        """
        with open(txt_path) as read:
            # xml_list用list存储每一个图片信息文件的名称 xxx.xml
            xml_list = [os.path.join(self.annotations_root, line.strip() + ".xml")
                        for line in read.readlines() if len(line.strip()) > 0]

        self.xml_list = []
        # check file
        for xml_path in xml_list:
            if os.path.exists(xml_path) is False:  # 如果xxx.xml并不存在
                print(f"Warning: not found '{xml_path}', skip this annotation file.")
                continue

            # check for targets
            with open(xml_path) as fid:
                xml_str = fid.read()  # 一次性读取整个标注文件的内容
            """
                etree.fromstring()
                    该方法是将xml格式转化为Element对象,Element 对象代表 XML 文档中的一个元素。
                    元素可以包含属性、其他元素或文本。如果一个元素包含文本,则在文本节点中表示该文本。
                    传入的为xml格式的字符串,经过该方法后变成一个Element对象
            """
            xml = etree.fromstring(xml_str)
            data = self.parse_xml_to_dict(xml)["annotation"]  # 将xml文件解析成字典形式
            if "object" not in data:  # 标注信息中如果没有object(即没有标注目标)
                print(f"INFO: no objects in {xml_path}, skip this annotation file.")
                continue

            self.xml_list.append(xml_path)

        assert len(self.xml_list) > 0, "in '{}' file does not find any information.".format(txt_path)

        # read class_indict
        json_file = './pascal_voc_classes.json'
        assert os.path.exists(json_file), "{} file not exist.".format(json_file)
        with open(json_file, 'r') as f:
            self.class_dict = json.load(f)  # 这里是把json转换为dict

        self.transforms = transforms

    def __len__(self):
        # 获取所有数据文件(标签个数=图片)的个数
        return len(self.xml_list)

    def __getitem__(self, idx):
        # read xml
        xml_path = self.xml_list[idx]  # 获取xml文件路径
        with open(xml_path) as fid:
            xml_str = fid.read()
        xml = etree.fromstring(xml_str)  # 将xml字符串解析为Element对象

        """
            self.parse_xml_to_dict(xml)可以得到字典,但最外层是annotation这一层父节点的壳,
            所以通过["annotation"]取出其内部内容,这样就可以得到我们想要的字典了:
                {"folder": "VOC2012",
                 "filename": "2007_000063.jpg",
                 "source": {"database": "The VOC2007 Database", "annotation": "PASCAL VOC2007", "image": "flickr"},
                 "size": {"width": '500', "height": '375', "depth": '3'},
                 "segmented": '1',
                 "object": [{"name": "dog", "pose": "Unspecified", "truncated": '0', "difficult": '0',
                             "bndbox": {"xmin": '123', "ymin": '115', "xmax": '379', "ymax": '275'}},
                            {"name": "chair", "pose": "Frontal", "truncated": '1', "difficult": '0',
                             "bndbox": {"xmin": '75', "ymin": '1', "xmax": '428', "ymax": '375'}}]
                 }
             Note: 这样解析后,所有的值的数据类型均为str
        """
        data = self.parse_xml_to_dict(xml)["annotation"]  # 获取下一层字典的信息
        img_path = os.path.join(self.img_root, data["filename"])  # 获取图片完整的路径
        image = Image.open(img_path)  # 使用Pillow打开图片(这里只是在内存中打开并没有通过GUI显示)
        if image.format != "JPEG":
            raise ValueError("Image '{}' format not JPEG".format(img_path))

        boxes = []
        labels = []
        iscrowd = []  # COCO数据集特有的字段,表示目标是否为密集/重叠目标。这里简单理解为“是否难以检测”,
        #              并直接借用VOC标注中的difficult字段:0表示容易检测,1表示困难
        assert "object" in data, "{} lack of object information.".format(xml_path)

        # 遍历每一个信息
        for obj in data["object"]:
            # 需将String转换为float
            xmin = float(obj["bndbox"]["xmin"])
            xmax = float(obj["bndbox"]["xmax"])
            ymin = float(obj["bndbox"]["ymin"])
            ymax = float(obj["bndbox"]["ymax"])

            # 进一步检查数据,有的标注信息中可能有w或h为0的情况,这样的数据会导致计算回归loss为nan
            if xmax <= xmin or ymax <= ymin:
                print("Warning: in '{}' xml, there are some bbox w/h <=0".format(xml_path))
                # 跳过本次迭代
                continue
            
            boxes.append([xmin, ymin, xmax, ymax])
            labels.append(self.class_dict[obj["name"]])  # 根据目标的名字获取对应的标签索引值
            if "difficult" in obj:
                iscrowd.append(int(obj["difficult"]))
            else:
                iscrowd.append(0)  # 表示容易检测

        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        labels = torch.as_tensor(labels, dtype=torch.int64)
        iscrowd = torch.as_tensor(iscrowd, dtype=torch.int64)
        image_id = torch.tensor([idx])  # 当前数据对应的索引值
        # area = (y_max - y_min) * (x_max - x_min)
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            image, target = self.transforms(image, target)

        return image, target

    def get_height_and_width(self, idx):
        # read xml
        xml_path = self.xml_list[idx]
        with open(xml_path) as fid:
            xml_str = fid.read()
        xml = etree.fromstring(xml_str)
        data = self.parse_xml_to_dict(xml)["annotation"]  # 获取字典并剥掉最外层
        data_height = int(data["size"]["height"])  # 获取图片的高度(str -> int)
        data_width = int(data["size"]["width"])  # 获取图片的宽度(str -> int)
        return data_height, data_width

    def parse_xml_to_dict(self, xml):
        """
        将xml文件解析成字典形式,参考tensorflow的recursive_parse_xml_to_dict
        Args:
            xml: xml tree obtained by parsing XML file contents using lxml.etree

        Returns:
            Python dictionary holding XML contents.
        """

        # 这里的xml就是父节点
        if len(xml) == 0:  # 遍历到底层,直接返回tag对应的信息
            return {xml.tag: xml.text}

        result = {}
        for child in xml:  # 遍历父节点返回子节点
            child_result = self.parse_xml_to_dict(child)  # 递归遍历标签信息
            if child.tag != 'object':  # 如果子节点不是object节点
                result[child.tag] = child_result[child.tag]
            else:  # 如果子节点是object节点
                if child.tag not in result:  # 因为object可能有多个,所以需要放入列表里
                    result[child.tag] = []
                result[child.tag].append(child_result[child.tag])
        return {xml.tag: result}

    def coco_index(self, idx):
        """
        该方法是专门为pycocotools统计标签信息准备,不对图像和标签作任何处理
        由于不用去读取图片,可大幅缩减统计时间

        Args:
            idx: 输入需要获取图像的索引
        """
        # read xml
        xml_path = self.xml_list[idx]
        with open(xml_path) as fid:
            xml_str = fid.read()
        xml = etree.fromstring(xml_str)
        data = self.parse_xml_to_dict(xml)["annotation"]
        data_height = int(data["size"]["height"])
        data_width = int(data["size"]["width"])
        # img_path = os.path.join(self.img_root, data["filename"])
        # image = Image.open(img_path)
        # if image.format != "JPEG":
        #     raise ValueError("Image format not JPEG")
        boxes = []
        labels = []
        iscrowd = []
        for obj in data["object"]:
            xmin = float(obj["bndbox"]["xmin"])
            xmax = float(obj["bndbox"]["xmax"])
            ymin = float(obj["bndbox"]["ymin"])
            ymax = float(obj["bndbox"]["ymax"])
            boxes.append([xmin, ymin, xmax, ymax])
            labels.append(self.class_dict[obj["name"]])
            iscrowd.append(int(obj["difficult"]))

        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        labels = torch.as_tensor(labels, dtype=torch.int64)
        iscrowd = torch.as_tensor(iscrowd, dtype=torch.int64)
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        return (data_height, data_width), target

    @staticmethod
    def collate_fn(batch):
        return tuple(zip(*batch))


if __name__ == '__main__':
    import transforms
    from draw_box_utils import draw_objs
    from PIL import Image
    import json
    import matplotlib.pyplot as plt
    import torchvision.transforms as ts
    import random
    import torchvision

    # read class_indict
    category_index = {}
    try:
        json_file = open('./pascal_voc_classes.json', 'r')
        class_dict = json.load(json_file)
        # 颠倒key和value的位置 "aeroplane": 1 -> 1: "aeroplane"
        category_index = {str(v): str(k) for k, v in class_dict.items()}
    except Exception as e:
        print(e)
        exit(-1)

    data_transform = {
        "train": transforms.Compose([transforms.ToTensor(),
                                     transforms.RandomHorizontalFlip(0.5)]),
        "val": transforms.Compose([transforms.ToTensor()])
    }

    # load train data set
    train_data_set = VOCDataSet(voc_root=os.getcwd(), year="2012",
                                transforms=data_transform["train"],
                                txt_name="train.txt")
    print(len(train_data_set))
    for index in random.sample(range(0, len(train_data_set)), k=5):  # 从训练集中随机采样5张图片
        img, target = train_data_set[index]  # 传入图像索引返回图像和target信息
        img = torchvision.transforms.ToPILImage()(img)  # 将tensor格式的图片转化为PIL的图片格式
        plot_img = draw_objs(img,
                             target["boxes"].numpy(),
                             target["labels"].numpy(),
                             np.ones(target["labels"].shape[0]),  # 因为读的是GT,所以概率是1
                             category_index=category_index,
                             box_thresh=0.5,
                             line_thickness=3,  # 矩形框线的宽度
                             font='arial.ttf',
                             font_size=20)
        plt.imshow(plot_img)
        plt.show()
 
  

1.2 自定义Dataset的重点

  1. 首先明白自定义Dataset需要:

    1. 写一个类并继承自torch.utils.data.Dataset
    2. 实现__len__方法(标注文件个数=训练文件个数)
    3. 实现__getitem__方法(返回值①image;②target)
      1. image:就是根据路径使用PIL加载图片并返回
      2. target:对于VOC 2012数据集,包含下面内容:
        1. bbox(GT)的坐标 -> 转换为tensor
        2. 目标类别的索引值(1, 2, 3,…) -> 转换为tensor
        3. 图片的id
        4. bbox(GT)的面积
        5. 目标是否难以检测(0表示容易;1表示困难) -> 转换为tensor
    4. [可选] 实现get_height_and_width方法(对于VOC 2012数据集可根据annotation中的信息直接获取)
  2. 使用line.strip()方法将换行符去掉

  3. 列表推导式和字典推导式的使用

  4. 读取.xml文件的步骤

    1. 先将.xml文件读入内存
    with open(xml_path) as fid:
        xml_str = fid.read()  # 一次性读取整个标注文件的内容

    2. 使用etree.fromstring()将内存中的xml字符串转换为Element对象
    xml = etree.fromstring(xml_str)

    3. 将xml解析为字典形式并褪去最外面的字典壳
    data = self.parse_xml_to_dict(xml)["annotation"]
    

    这样就可以得到我们想要的字典了:

    
    {"folder": "VOC2012",
    "filename": "2007_000063.jpg",
    "source": {"database": "The VOC2007 Database", "annotation": "PASCAL VOC2007", "image": "flickr"},
    "size": {"width": '500', "height": '375', "depth": '3'},
    "segmented": '1',
    "object": [{"name": "dog", "pose": "Unspecified", "truncated": '0', "difficult": '0',
                "bndbox": {"xmin": '123', "ymin": '115', "xmax": '379', "ymax": '275'}},
               {"name": "chair", "pose": "Frontal", "truncated": '1', "difficult": '0',
                "bndbox": {"xmin": '75', "ymin": '1', "xmax": '428', "ymax": '375'}}]
    }
    
     Note: 这样解析后,所有的值的数据类型均为str
    
    
  5. 对于VOC来说,根据标注信息中的坐标计算面积公式如下:
    $\mathrm{area} = (y_{\mathrm{max}} - y_{\mathrm{min}}) \times (x_{\mathrm{max}} - x_{\mathrm{min}})$

  6. random.sample(range(0, len(train_data_set)), k=5):从数据集索引范围内不重复地随机抽取5个索引

  7. .xml父子节点的递归解析(parse_xml_to_dict)

  8. torchvision.transforms.ToPILImage()(img) 将tensor格式的图片转化为PIL的图片格式
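补充:由于每张图片的尺寸不同,DataLoader默认的collate方式无法把它们堆叠成一个tensor,所以VOCDataSet中提供了collate_fn静态方法(返回tuple(zip(*batch)))。下面是一个把它传给DataLoader的使用示意(batch_size、num_workers等参数仅作演示):

from torch.utils.data import DataLoader

# train_data_set为前面实例化好的VOCDataSet
train_data_loader = DataLoader(train_data_set,
                               batch_size=8,
                               shuffle=True,
                               num_workers=4,
                               collate_fn=train_data_set.collate_fn)

# 每个batch为(images, targets)两个tuple,长度均为batch_size
for images, targets in train_data_loader:
    print(len(images), len(targets))  # 8 8
    break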

1.3 自定义transform

在训练时只使用了随机水平翻转,但是要注意,图片翻转了,标注框(bbox)的坐标也需要相应地翻转

代码如下:

import random
from torchvision.transforms import functional as F


class Compose(object):
    """组合多个transform函数"""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target


class ToTensor(object):
    """将PIL图像转为Tensor"""
    def __call__(self, image, target):
        image = F.to_tensor(image)
        return image, target


class RandomHorizontalFlip(object):
    """随机水平翻转图像以及bboxes"""
    def __init__(self, prob=0.5):
        self.prob = prob

    def __call__(self, image, target):
        if random.random() < self.prob:
            height, width = image.shape[-2:]
            image = image.flip(-1)  # 水平翻转图片
            bbox = target["boxes"]
            # bbox: xmin, ymin, xmax, ymax
            bbox[:, [0, 2]] = width - bbox[:, [2, 0]]  # 翻转对应bbox坐标信息
            target["boxes"] = bbox
        return image, target

2. Faster R-CNN框架图 —— faster_rcnn_framework.py

(图:Faster R-CNN详细框架结构图)

其中黄色的框表示在训练中才会有的,预测时没有。

import warnings
from collections import OrderedDict
from typing import Tuple, List, Dict, Optional, Union

import torch
from torch import nn, Tensor
import torch.nn.functional as F
from torchvision.ops import MultiScaleRoIAlign

from .roi_head import RoIHeads
from .transform import GeneralizedRCNNTransform
from .rpn_function import AnchorsGenerator, RPNHead, RegionProposalNetwork


class FasterRCNNBase(nn.Module):
    """
    Main class for Generalized R-CNN.

    Arguments:
        backbone (nn.Module): 特征提取网络部分
        rpn (nn.Module): 候选框生成部分
        roi_heads (nn.Module): takes the features + the proposals from the RPN and computes
            detections / masks from it.
        transform (nn.Module): performs the data transformation from the inputs to feed into
            the model
    """

    def __init__(self, backbone, rpn, roi_heads, transform):
        super(FasterRCNNBase, self).__init__()
        self.transform = transform
        self.backbone = backbone
        self.rpn = rpn
        self.roi_heads = roi_heads
        # used only on torchscript mode
        self._has_warned = False

    @torch.jit.unused
    def eager_outputs(self, losses, detections):
        # type: (Dict[str, Tensor], List[Dict[str, Tensor]]) -> Union[Dict[str, Tensor], List[Dict[str, Tensor]]]
        if self.training:
            return losses

        return detections

    def forward(self, images, targets=None):
        # type: (List[Tensor], Optional[List[Dict[str, Tensor]]]) -> Tuple[Dict[str, Tensor], List[Dict[str, Tensor]]]
        """
        Arguments:
            images (list[Tensor]): images to be processed
            targets (list[Dict[Tensor]]): ground-truth boxes present in the image (optional)

        Returns:
            result (list[BoxList] or dict[Tensor]): the output from the model.
                During training, it returns a dict[Tensor] which contains the losses.
                During testing, it returns list[BoxList] contains additional fields
                like `scores`, `labels` and `mask` (for Mask R-CNN models).

        """
        if self.training and targets is None:
            raise ValueError("In training mode, targets should be passed")

        if self.training:
            assert targets is not None
            for target in targets:         # 进一步判断传入的target的boxes参数是否符合规定
                boxes = target["boxes"]
                if isinstance(boxes, torch.Tensor):
                    if len(boxes.shape) != 2 or boxes.shape[-1] != 4:
                        raise ValueError("Expected target boxes to be a tensor "
                                         "of shape [N, 4], got {:}.".format(
                                          boxes.shape))
                else:
                    raise ValueError("Expected target boxes to be of type "
                                     "Tensor, got {:}.".format(type(boxes)))

        # original_image_sizes:存储每张图片原始的尺寸,为了之后可以映射到原图中
        original_image_sizes = torch.jit.annotate(List[Tuple[int, int]], [])
        for img in images:
            val = img.shape[-2:]  # [H, W]
            assert len(val) == 2  # 防止输入的是个一维向量
            original_image_sizes.append((val[0], val[1]))
        # original_image_sizes = [img.shape[-2:] for img in images]

        images, targets = self.transform(images, targets)  # 对图像进行预处理
        # 预处理后images是一个ImageList对象(包含打包成batch的tensor以及每张图resize后的尺寸),targets为对应调整后的标注信息

        # print(images.tensors.shape)
        features = self.backbone(images.tensors)  # 将图像输入backbone得到特征图
        if isinstance(features, torch.Tensor):  # 若只在一层特征层上预测,将feature放入有序字典中,并编号为‘0’
            features = OrderedDict([('0', features)])  # 若在多层特征层上预测,传入的就是一个有序字典

        # 将特征层以及标注target信息传入rpn中
        # proposals: List[Tensor], Tensor_shape: [num_proposals, 4],
        # 每个proposals是绝对坐标,且为(x1, y1, x2, y2)格式
        proposals, proposal_losses = self.rpn(images, features, targets)

        # 将rpn生成的数据以及标注target信息传入fast rcnn后半部分
        detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)

        # 对网络的预测结果进行后处理(主要将bboxes还原到原图像尺度上)
        detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)

        losses = {}
        losses.update(detector_losses)
        losses.update(proposal_losses)

        if torch.jit.is_scripting():
            if not self._has_warned:
                warnings.warn("RCNN always returns a (Losses, Detections) tuple in scripting")
                self._has_warned = True
            return losses, detections
        else:
            return self.eager_outputs(losses, detections)

        # if self.training:
        #     return losses
        #
        # return detections


class TwoMLPHead(nn.Module):
    """
    Standard heads for FPN-based models

    Arguments:
        in_channels (int): number of input channels
        representation_size (int): size of the intermediate representation
    """

    def __init__(self, in_channels, representation_size):
        super(TwoMLPHead, self).__init__()

        self.fc6 = nn.Linear(in_channels, representation_size)
        self.fc7 = nn.Linear(representation_size, representation_size)

    def forward(self, x):
        x = x.flatten(start_dim=1)

        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x))

        return x


class FastRCNNPredictor(nn.Module):
    """
    Standard classification + bounding box regression layers
    for Fast R-CNN.

    Arguments:
        in_channels (int): number of input channels
        num_classes (int): number of output classes (including background)
    """

    def __init__(self, in_channels, num_classes):
        super(FastRCNNPredictor, self).__init__()
        self.cls_score = nn.Linear(in_channels, num_classes)
        self.bbox_pred = nn.Linear(in_channels, num_classes * 4)

    def forward(self, x):
        if x.dim() == 4:
            assert list(x.shape[2:]) == [1, 1]
        x = x.flatten(start_dim=1)
        scores = self.cls_score(x)
        bbox_deltas = self.bbox_pred(x)

        return scores, bbox_deltas
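# 以ResNet50+FPN为例,roi pooling之后TwoMLPHead与FastRCNNPredictor的shape变化大致如下(数值仅作示意):
#   box_roi_pool输出: [num_proposals, 256, 7, 7]
#   flatten:          [num_proposals, 256 * 7 * 7 = 12544]
#   fc6 -> fc7:       [num_proposals, 1024]
#   cls_score:        [num_proposals, num_classes]      例如21
#   bbox_pred:        [num_proposals, num_classes * 4]  例如84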


class FasterRCNN(FasterRCNNBase):
    """
    Implements Faster R-CNN.

    The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each
    image, and should be in 0-1 range. Different images can have different sizes.

    The behavior of the model changes depending if it is in training or evaluation mode.

    During training, the model expects both the input tensors, as well as a targets (list of dictionary),
    containing:
        - boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with values
          between 0 and H and 0 and W
        - labels (Int64Tensor[N]): the class label for each ground-truth box

    The model returns a Dict[Tensor] during training, containing the classification and regression
    losses for both the RPN and the R-CNN.

    During inference, the model requires only the input tensors, and returns the post-processed
    predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as
    follows:
        - boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with values between
          0 and H and 0 and W
        - labels (Int64Tensor[N]): the predicted labels for each image
        - scores (Tensor[N]): the scores of each prediction

    Arguments:
        backbone (nn.Module): the network used to compute the features for the model.
            It should contain a out_channels attribute, which indicates the number of output
            channels that each feature map has (and it should be the same for all feature maps).
            The backbone should return a single Tensor or an OrderedDict[Tensor].
        num_classes (int): number of output classes of the model (including the background).
            If box_predictor is specified, num_classes should be None.
        min_size (int): minimum size of the image to be rescaled before feeding it to the backbone
        max_size (int): maximum size of the image to be rescaled before feeding it to the backbone
        image_mean (Tuple[float, float, float]): mean values used for input normalization.
            They are generally the mean values of the dataset on which the backbone has been trained
            on
        image_std (Tuple[float, float, float]): std values used for input normalization.
            They are generally the std values of the dataset on which the backbone has been trained on
        rpn_anchor_generator (AnchorGenerator): module that generates the anchors for a set of feature
            maps.
        rpn_head (nn.Module): module that computes the objectness and regression deltas from the RPN
        rpn_pre_nms_top_n_train (int): number of proposals to keep before applying NMS during training
        rpn_pre_nms_top_n_test (int): number of proposals to keep before applying NMS during testing
        rpn_post_nms_top_n_train (int): number of proposals to keep after applying NMS during training
        rpn_post_nms_top_n_test (int): number of proposals to keep after applying NMS during testing
        rpn_nms_thresh (float): NMS threshold used for postprocessing the RPN proposals
        rpn_fg_iou_thresh (float): minimum IoU between the anchor and the GT box so that they can be
            considered as positive during training of the RPN.
        rpn_bg_iou_thresh (float): maximum IoU between the anchor and the GT box so that they can be
            considered as negative during training of the RPN.
        rpn_batch_size_per_image (int): number of anchors that are sampled during training of the RPN
            for computing the loss
        rpn_positive_fraction (float): proportion of positive anchors in a mini-batch during training
            of the RPN
        rpn_score_thresh (float): during inference, only return proposals with a classification score
            greater than rpn_score_thresh
        box_roi_pool (MultiScaleRoIAlign): the module which crops and resizes the feature maps in
            the locations indicated by the bounding boxes
        box_head (nn.Module): module that takes the cropped feature maps as input
        box_predictor (nn.Module): module that takes the output of box_head and returns the
            classification logits and box regression deltas.
        box_score_thresh (float): during inference, only return proposals with a classification score
            greater than box_score_thresh
        box_nms_thresh (float): NMS threshold for the prediction head. Used during inference
        box_detections_per_img (int): maximum number of detections per image, for all classes.
        box_fg_iou_thresh (float): minimum IoU between the proposals and the GT box so that they can be
            considered as positive during training of the classification head
        box_bg_iou_thresh (float): maximum IoU between the proposals and the GT box so that they can be
            considered as negative during training of the classification head
        box_batch_size_per_image (int): number of proposals that are sampled during training of the
            classification head
        box_positive_fraction (float): proportion of positive proposals in a mini-batch during training
            of the classification head
        bbox_reg_weights (Tuple[float, float, float, float]): weights for the encoding/decoding of the
            bounding boxes

    """

    def __init__(self, backbone, num_classes=None,  # num_classes是需要加上背景的
                 # transform parameter
                 min_size=800, max_size=1333,      # 预处理resize时限制的最小尺寸与最大尺寸
                 image_mean=None, image_std=None,  # 预处理normalize时使用的均值和方差
                 # RPN parameters
                 rpn_anchor_generator=None, rpn_head=None,
                 rpn_pre_nms_top_n_train=2000, rpn_pre_nms_top_n_test=1000,    # rpn中在nms处理前保留的proposal数(根据score)
                 rpn_post_nms_top_n_train=2000, rpn_post_nms_top_n_test=1000,  # rpn中在nms处理后保留的proposal数
                 rpn_nms_thresh=0.7,  # rpn中进行nms处理时使用的iou阈值
                 rpn_fg_iou_thresh=0.7, rpn_bg_iou_thresh=0.3,  # rpn计算损失时,采集正负样本设置的阈值
                 rpn_batch_size_per_image=256, rpn_positive_fraction=0.5,  # rpn计算损失时采样的样本数,以及正样本占总样本的比例
                 rpn_score_thresh=0.0,
                 # Box parameters
                 box_roi_pool=None, box_head=None, box_predictor=None,
                 # 移除低目标概率      fast rcnn中进行nms处理的阈值   对预测结果根据score排序取前100个目标
                 box_score_thresh=0.05, box_nms_thresh=0.5, box_detections_per_img=100,
                 box_fg_iou_thresh=0.5, box_bg_iou_thresh=0.5,   # fast rcnn计算误差时,采集正负样本设置的阈值
                 box_batch_size_per_image=512, box_positive_fraction=0.25,  # fast rcnn计算误差时采样的样本数,以及正样本占所有样本的比例
                 bbox_reg_weights=None):
        if not hasattr(backbone, "out_channels"):
            raise ValueError(
                "backbone should contain an attribute out_channels "
                "specifying the number of output channels (assumed to be the "
                "same for all the levels)"
            )

        assert isinstance(rpn_anchor_generator, (AnchorsGenerator, type(None)))  # 传入None也可以,传入None后面会生成一个
        assert isinstance(box_roi_pool, (MultiScaleRoIAlign, type(None)))

        if num_classes is not None:
            if box_predictor is not None:  # 如果box_predictor不为None,报错
                raise ValueError("num_classes should be None when box_predictor "
                                 "is specified")
        else:
            if box_predictor is None:
                raise ValueError("num_classes should not be None when box_predictor "
                                 "is not specified")

        # 预测特征层的channels
        out_channels = backbone.out_channels

        # 若anchor生成器为空,则自动生成针对resnet50_fpn的anchor生成器
        """
            >>> ((0.5, 1.0, 2.0),) * 5
                ((0.5, 1.0, 2.0), (0.5, 1.0, 2.0), (0.5, 1.0, 2.0), (0.5, 1.0, 2.0), (0.5, 1.0, 2.0))
        """
        if rpn_anchor_generator is None:
            anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
            aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
            rpn_anchor_generator = AnchorsGenerator(
                anchor_sizes, aspect_ratios
            )

        # 生成RPN通过滑动窗口预测网络部分
        if rpn_head is None:
            rpn_head = RPNHead(
                out_channels, rpn_anchor_generator.num_anchors_per_location()[0]
            )

        # 默认rpn_pre_nms_top_n_train = 2000, rpn_pre_nms_top_n_test = 1000,
        # 默认rpn_post_nms_top_n_train = 2000, rpn_post_nms_top_n_test = 1000,
        rpn_pre_nms_top_n = dict(training=rpn_pre_nms_top_n_train, testing=rpn_pre_nms_top_n_test)
        rpn_post_nms_top_n = dict(training=rpn_post_nms_top_n_train, testing=rpn_post_nms_top_n_test)

        # 定义整个RPN框架
        rpn = RegionProposalNetwork(
            rpn_anchor_generator, rpn_head,
            rpn_fg_iou_thresh, rpn_bg_iou_thresh,
            rpn_batch_size_per_image, rpn_positive_fraction,
            rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_nms_thresh,
            score_thresh=rpn_score_thresh)

        #  Multi-scale RoIAlign pooling
        if box_roi_pool is None:
            box_roi_pool = MultiScaleRoIAlign(
                featmap_names=['0', '1', '2', '3'],  # 在哪些特征层进行roi pooling
                output_size=[7, 7],
                sampling_ratio=2)

        # fast RCNN中roi pooling后的展平处理两个全连接层部分
        if box_head is None:
            resolution = box_roi_pool.output_size[0]  # 默认等于7
            representation_size = 1024
            box_head = TwoMLPHead(
                out_channels * resolution ** 2,
                representation_size
            )

        # 在box_head的输出上预测部分
        if box_predictor is None:
            representation_size = 1024
            box_predictor = FastRCNNPredictor(
                representation_size,
                num_classes)

        # 将roi pooling, box_head以及box_predictor结合在一起
        roi_heads = RoIHeads(
            # box
            box_roi_pool, box_head, box_predictor,
            box_fg_iou_thresh, box_bg_iou_thresh,  # 0.5  0.5
            box_batch_size_per_image, box_positive_fraction,  # 512  0.25
            bbox_reg_weights,
            box_score_thresh, box_nms_thresh, box_detections_per_img)  # 0.05  0.5  100

        if image_mean is None:
            image_mean = [0.485, 0.456, 0.406]
        if image_std is None:
            image_std = [0.229, 0.224, 0.225]

        # 对数据进行标准化,缩放,打包成batch等处理部分
        transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)

        super(FasterRCNN, self).__init__(backbone, rpn, roi_heads, transform)
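作为参考,torchvision官方教程中给出了一个用自定义backbone构建Faster R-CNN的最小示例,思路与本仓库中的create_model一致(下面直接使用torchvision自带的FasterRCNN、AnchorGenerator、MultiScaleRoIAlign,num_classes=21仅作示意):

import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# 只取MobileNetV2的特征提取部分作为backbone,并手动指定out_channels
backbone = torchvision.models.mobilenet_v2(pretrained=True).features
backbone.out_channels = 1280

# 在单一特征层上生成5种尺度、3种比例的anchor
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# 在名为'0'的特征层上做RoIAlign,输出7x7
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=7,
                                                sampling_ratio=2)

model = FasterRCNN(backbone,
                   num_classes=21,  # 20个前景类别 + 背景
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)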

3. Transform —— transform.py

import math
from typing import List, Tuple, Dict, Optional

import torch
from torch import nn, Tensor
import torchvision

from .image_list import ImageList


@torch.jit.unused
def _resize_image_onnx(image, self_min_size, self_max_size):
    # type: (Tensor, float, float) -> Tensor
    from torch.onnx import operators
    im_shape = operators.shape_as_tensor(image)[-2:]
    min_size = torch.min(im_shape).to(dtype=torch.float32)
    max_size = torch.max(im_shape).to(dtype=torch.float32)
    scale_factor = torch.min(self_min_size / min_size, self_max_size / max_size)

    image = torch.nn.functional.interpolate(
        image[None], scale_factor=scale_factor, mode="bilinear", recompute_scale_factor=True,
        align_corners=False)[0]

    return image


def _resize_image(image, self_min_size, self_max_size):
    # type: (Tensor, float, float) -> Tensor
    im_shape = torch.tensor(image.shape[-2:])
    min_size = float(torch.min(im_shape))    # 获取高宽中的最小值
    max_size = float(torch.max(im_shape))    # 获取高宽中的最大值
    scale_factor = self_min_size / min_size  # 根据指定最小边长和图片最小边长计算缩放比例

    # 如果使用该缩放比例计算的图片最大边长大于指定的最大边长
    if max_size * scale_factor > self_max_size:
        scale_factor = self_max_size / max_size  # 将缩放比例设为指定最大边长和图片最大边长之比

    # interpolate利用插值的方法缩放图片
    # image[None]操作是在最前面添加batch维度[C, H, W] -> [1, C, H, W]
    # bilinear只支持4D Tensor
    image = torch.nn.functional.interpolate(
        image[None], scale_factor=scale_factor, mode="bilinear", recompute_scale_factor=True,
        align_corners=False)[0]

    return image
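# 举个数值例子(假设self_min_size=800, self_max_size=1333):
#   输入图片shape为[3, 600, 1200] -> min_size=600, max_size=1200
#   scale_factor = 800 / 600 ≈ 1.333,此时1200 * 1.333 = 1600 > 1333
#   因此改用scale_factor = 1333 / 1200 ≈ 1.111,缩放后约为[3, 666, 1333]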


class GeneralizedRCNNTransform(nn.Module):
    """
    Performs input / target transformation before feeding the data to a GeneralizedRCNN
    model.

    The transformations it perform are:
        - input normalization (mean subtraction and std division)
        - input / target resizing to match min_size / max_size

    It returns a ImageList for the inputs, and a List[Dict[Tensor]] for the targets
    """

    def __init__(self, min_size, max_size, image_mean, image_std):
        super(GeneralizedRCNNTransform, self).__init__()
        if not isinstance(min_size, (list, tuple)):
            min_size = (min_size,)  # 转换为tuple类型
        self.min_size = min_size      # 指定图像的最小边长范围
        self.max_size = max_size      # 指定图像的最大边长范围
        self.image_mean = image_mean  # 指定图像在标准化处理中的均值
        self.image_std = image_std    # 指定图像在标准化处理中的方差

    def normalize(self, image):
        """标准化处理"""
        dtype, device = image.dtype, image.device

        # 将传入的均值和方差转换为与输入图片相同的类型和设备
        mean = torch.as_tensor(self.image_mean, dtype=dtype, device=device)
        std = torch.as_tensor(self.image_std, dtype=dtype, device=device)

        # [:, None, None]: shape [3] -> [3, 1, 1]
        """
            因为将mean和std转换为torch.tensor后,它俩的shape为[3],而mean[:, None, None]看似是在切片,但实际上给mean和std添加了
            两个维度,即shape从[3]变为了[3, 1, 1]
            
            >>> x = torch.randint(1, (3, 112, 112)).cuda()
            >>> x.shape
            torch.Size([3, 112, 112])
            >>> mean = [0.1, 0.2, 0.3]
            >>> mean = torch.as_tensor(mean, dtype=x.dtype, device=x.device)
            >>> mean.shape
            torch.Size([3])
            >>> mean[:, None, None].shape
            torch.Size([3, 1, 1])
        """
        return (image - mean[:, None, None]) / std[:, None, None]

    def torch_choice(self, k):
        # type: (List[int]) -> int
        """
        Implements `random.choice` via torch ops so it can be compiled with
        TorchScript. Remove if https://github.com/pytorch/pytorch/issues/25803
        is fixed.
        """
        index = int(torch.empty(1).uniform_(0., float(len(k))).item())
        return k[index]

    def resize(self, image, target):
        # type: (Tensor, Optional[Dict[str, Tensor]]) -> Tuple[Tensor, Optional[Dict[str, Tensor]]]
        """
        将图片缩放到指定的大小范围内,并对应缩放bboxes信息
        Args:
            image: 输入的图片
            target: 输入图片的相关信息(包括bboxes信息)

        Returns:
            image: 缩放后的图片
            target: 缩放bboxes后的图片相关信息
        """
        # image shape is [channel, height, width]
        h, w = image.shape[-2:]

        if self.training:
            size = float(self.torch_choice(self.min_size))  # 指定输入图片的最小边长,注意是self.min_size不是min_size
        else:
            # FIXME assume for now that testing uses the largest scale
            size = float(self.min_size[-1])    # 指定输入图片的最小边长,注意是self.min_size不是min_size

        if torchvision._is_tracing():
            image = _resize_image_onnx(image, size, float(self.max_size))
        else:
            image = _resize_image(image, size, float(self.max_size))

        if target is None:
            return image, target

        bbox = target["boxes"]
        # 根据图像的缩放比例来缩放bbox
        bbox = resize_boxes(boxes=bbox, original_size=[h, w], new_size=image.shape[-2:])
        target["boxes"] = bbox

        return image, target

    # _onnx_batch_images() is an implementation of
    # batch_images() that is supported by ONNX tracing.
    @torch.jit.unused
    def _onnx_batch_images(self, images, size_divisible=32):
        # type: (List[Tensor], int) -> Tensor
        max_size = []
        for i in range(images[0].dim()):
            max_size_i = torch.max(torch.stack([img.shape[i] for img in images]).to(torch.float32)).to(torch.int64)
            max_size.append(max_size_i)
        stride = size_divisible
        max_size[1] = (torch.ceil((max_size[1].to(torch.float32)) / stride) * stride).to(torch.int64)
        max_size[2] = (torch.ceil((max_size[2].to(torch.float32)) / stride) * stride).to(torch.int64)
        max_size = tuple(max_size)

        # work around for
        # pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
        # which is not yet supported in onnx
        padded_imgs = []
        for img in images:
            padding = [(s1 - s2) for s1, s2 in zip(max_size, tuple(img.shape))]
            padded_img = torch.nn.functional.pad(img, [0, padding[2], 0, padding[1], 0, padding[0]])
            padded_imgs.append(padded_img)

        return torch.stack(padded_imgs)

    def max_by_axis(self, the_list):
        # type: (List[List[int]]) -> List[int]
        # the_list: 每张图片的shape组成的列表,即[[C, H, W], [C, H, W], ...]
        maxes = the_list[0]  # 先以第一张图片的shape作为初始最大值 [C, H, W]
        for sublist in the_list[1:]:  # 遍历其余图片的shape
            for index, item in enumerate(sublist):  # 逐维度取最大值
                maxes[index] = max(maxes[index], item)
        return maxes

    def batch_images(self, images, size_divisible=32):
        # type: (List[Tensor], int) -> Tensor
        """
        将一批图像打包成一个batch返回(注意batch中每个tensor的shape是相同的)
        Args:
            images: 输入的一批图片
            size_divisible: 将图像高和宽调整到该数的整数倍

        Returns:
            batched_imgs: 打包成一个batch后的tensor数据
        """

        if torchvision._is_tracing():
            # batch_images() does not export well to ONNX
            # call _onnx_batch_images() instead
            return self._onnx_batch_images(images, size_divisible)

        # 分别计算一个batch中所有图片中的最大channel, height, width
        """
            >>> y.shape
            torch.Size([2, 3, 112, 112])
            >>> list(y.shape)
            [2, 3, 112, 112]
        """
        max_size = self.max_by_axis([list(img.shape) for img in images])

        stride = float(size_divisible)
        # max_size = list(max_size)
        # 将height向上调整到stride的整数倍(最靠近stride整数倍的数值)
        # math.ceil:向上取整
        max_size[1] = int(math.ceil(float(max_size[1]) / stride) * stride)
        # 将width向上调整到stride的整数倍
        max_size[2] = int(math.ceil(float(max_size[2]) / stride) * stride)

        # [batch, channel, height, width]
        batch_shape = [len(images)] + max_size

        # 创建shape为batch_shape且值全部为0的tensor
        batched_imgs = images[0].new_full(batch_shape, 0)
        for img, pad_img in zip(images, batched_imgs):
            # 将输入images中的每张图片复制到新的batched_imgs的每张图片中,对齐左上角,保证bboxes的坐标不变
            # 这样保证输入到网络中一个batch的每张图片的shape相同
            # copy_: Copies the elements from src into self tensor and returns self
            pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)  # pad_img[C, H, W]

        return batched_imgs
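        # 举个数值例子(size_divisible=32):
        #   batch中两张图片的shape分别为[3, 600, 800]和[3, 500, 1000]
        #   逐维度取最大值 -> [3, 600, 1000]
        #   高宽向上取整到32的整数倍 -> [3, 608, 1024](ceil(600/32)*32=608, ceil(1000/32)*32=1024)
        #   最终batched_imgs的shape为[2, 3, 608, 1024],每张图片左上角对齐,其余位置补0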

    def postprocess(self,
                    result,                # type: List[Dict[str, Tensor]]
                    image_shapes,          # type: List[Tuple[int, int]]
                    original_image_sizes   # type: List[Tuple[int, int]]
                    ):
        # type: (...) -> List[Dict[str, Tensor]]
        """
        对网络的预测结果进行后处理(主要将bboxes还原到原图像尺度上)
        Args:
            result: list(dict), 网络的预测结果, len(result) == batch_size
            image_shapes: list(torch.Size), 图像预处理缩放后的尺寸, len(image_shapes) == batch_size
            original_image_sizes: list(torch.Size), 图像的原始尺寸, len(original_image_sizes) == batch_size

        Returns:

        """
        if self.training:
            return result

        # 遍历每张图片的预测信息,将boxes信息还原回原尺度
        for i, (pred, im_s, o_im_s) in enumerate(zip(result, image_shapes, original_image_sizes)):
            boxes = pred["boxes"]
            boxes = resize_boxes(boxes, im_s, o_im_s)  # 将bboxes缩放回原图像尺度上
            result[i]["boxes"] = boxes
        return result

    def __repr__(self):
        """自定义输出实例化对象的信息,可通过print打印实例信息"""
        format_string = self.__class__.__name__ + '('
        _indent = '\n    '
        format_string += "{0}Normalize(mean={1}, std={2})".format(_indent, self.image_mean, self.image_std)
        format_string += "{0}Resize(min_size={1}, max_size={2}, mode='bilinear')".format(_indent, self.min_size,
                                                                                         self.max_size)
        format_string += '\n)'
        return format_string

    def forward(self,
                images,       # type: List[Tensor]
                targets=None  # type: Optional[List[Dict[str, Tensor]]]
                ):
        # type: (...) -> Tuple[ImageList, Optional[List[Dict[str, Tensor]]]]
        images = [img for img in images]
        for i in range(len(images)):
            image = images[i]
            target_index = targets[i] if targets is not None else None

            if image.dim() != 3:
                raise ValueError("images is expected to be a list of 3d tensors "
                                 "of shape [C, H, W], got {}".format(image.shape))
            image = self.normalize(image)                # 对图像进行标准化处理
            image, target_index = self.resize(image, target_index)   # 对图像和对应的bboxes缩放到指定范围
            images[i] = image
            if targets is not None and target_index is not None:
                targets[i] = target_index

        # 记录resize后的图像尺寸
        image_sizes = [img.shape[-2:] for img in images]
        images = self.batch_images(images)  # 将images打包成一个batch
        image_sizes_list = torch.jit.annotate(List[Tuple[int, int]], [])

        for image_size in image_sizes:
            assert len(image_size) == 2
            image_sizes_list.append((image_size[0], image_size[1]))

        image_list = ImageList(images, image_sizes_list)
        return image_list, targets


def resize_boxes(boxes, original_size, new_size):
    # type: (Tensor, List[int], List[int]) -> Tensor
    """
    将boxes参数根据图像的缩放情况进行相应缩放

    Arguments:
        original_size: 图像缩放前的尺寸
        new_size: 图像缩放后的尺寸
    """
    ratios = [
        torch.tensor(s, dtype=torch.float32, device=boxes.device) /
        torch.tensor(s_orig, dtype=torch.float32, device=boxes.device)
        for s, s_orig in zip(new_size, original_size)
    ]
    ratios_height, ratios_width = ratios
    # Removes a tensor dimension, boxes [minibatch, 4]
    # Returns a tuple of all slices along a given dimension, already without it.
    xmin, ymin, xmax, ymax = boxes.unbind(1)
    xmin = xmin * ratios_width
    xmax = xmax * ratios_width
    ymin = ymin * ratios_height
    ymax = ymax * ratios_height
    return torch.stack((xmin, ymin, xmax, ymax), dim=1)
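# 举个数值例子:原图尺寸(600, 800)被缩放到(800, 1066)时
#   ratios_height = 800 / 600 ≈ 1.333,ratios_width = 1066 / 800 = 1.3325
#   box [100, 150, 300, 450] -> [133.25, 200, 399.75, 600](xmin/xmax乘宽度比例,ymin/ymax乘高度比例)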

4. RPN源码讲解 —— rpn_function.py

(图:RPN部分结构示意图)

4.1 RPN Head

class RPNHead(nn.Module):
    """
    add a RPN head with classification and regression
    通过滑动窗口计算预测目标概率与bbox regression参数

    Arguments:
        in_channels: number of channels of the input feature
        num_anchors: number of anchors to be predicted
    """

    def __init__(self, in_channels, num_anchors):
        super(RPNHead, self).__init__()
        # 3x3 滑动窗口  [BS, C, H, W] -> [BS, C, H, W]
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1)
        # 计算预测的目标分数(这里的目标只是指前景或者背景) [BS, C, H, W] -> [BS, num_anchor, H, W]
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
        # 计算预测的目标bbox regression参数  [BS, C, H, W] -> [BS, num_anchor*4, H, W]
        self.bbox_pred = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1, stride=1)

        # 参数初始化
        for layer in self.children():
            if isinstance(layer, nn.Conv2d):
                torch.nn.init.normal_(layer.weight, std=0.01)
                torch.nn.init.constant_(layer.bias, 0)

    def forward(self, x):
        # type: (List[Tensor]) -> Tuple[List[Tensor], List[Tensor]]
        # x为图像经过backbone生成的预测特征层
        logits = []
        bbox_reg = []
        for i, feature in enumerate(x):  # 迭代预测特征层
            t = F.relu(self.conv(feature))
            logits.append(self.cls_logits(t))  # 预测分类结果
            bbox_reg.append(self.bbox_pred(t))  # 预测bbox结果
        return logits, bbox_reg
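以MobileNetV2为backbone时只有一个预测特征层,下面用随机构造的特征图简单演示一下RPNHead的输出shape(in_channels=1280、特征图大小25x38等数值仅作示意):

import torch

# 每个位置生成15个anchor(5种尺度 x 3种比例)
head = RPNHead(in_channels=1280, num_anchors=15)
feature = torch.randn(2, 1280, 25, 38)  # [BS, C, H, W]
logits, bbox_reg = head([feature])      # 注意传入的是预测特征层组成的list
print(logits[0].shape)    # torch.Size([2, 15, 25, 38])
print(bbox_reg[0].shape)  # torch.Size([2, 60, 25, 38]),其中60 = 15 * 4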

4.2 Anchors Generator

class AnchorsGenerator(nn.Module):
    __annotations__ = {
        "cell_anchors": Optional[List[torch.Tensor]],
        "_cache": Dict[str, List[torch.Tensor]]
    }

    """
    anchors生成器
    Module that generates anchors for a set of feature maps and
    image sizes.

    The module support computing anchors at multiple sizes and aspect ratios
    per feature map.

    sizes and aspect_ratios should have the same number of elements, and it should
    correspond to the number of feature maps.

    sizes[i] and aspect_ratios[i] can have an arbitrary number of elements,
    and AnchorGenerator will output a set of sizes[i] * aspect_ratios[i] anchors
    per spatial location for feature map i.

    Arguments:
        sizes (Tuple[Tuple[int]]): anchor的scale
        aspect_ratios (Tuple[Tuple[float]]): 每一个anchor所采用的比例
    """

    def __init__(self, sizes=(128, 256, 512), aspect_ratios=(0.5, 1.0, 2.0)):
        super(AnchorsGenerator, self).__init__()

        # 检查sizes和aspect_ratios里面第一个元素是不是list或tuple数据类型,不是就对其进行修改
        if not isinstance(sizes[0], (list, tuple)):
            # TODO change this
            sizes = tuple((s,) for s in sizes)
        if not isinstance(aspect_ratios[0], (list, tuple)):
            aspect_ratios = (aspect_ratios,) * len(sizes)

        assert len(sizes) == len(aspect_ratios)  # 二者的元素个数是否相同

        self.sizes = sizes
        self.aspect_ratios = aspect_ratios
        self.cell_anchors = None
        self._cache = {}

    def generate_anchors(self, scales, aspect_ratios, dtype=torch.float32, device=torch.device("cpu")):
        # type: (List[int], List[float], torch.dtype, torch.device) -> Tensor
        """
        compute anchor sizes
        Arguments:
            scales: sqrt(anchor_area)
            aspect_ratios: h/w ratios
            dtype: float32
            device: cpu/gpu
        """
        scales = torch.as_tensor(scales, dtype=dtype, device=device)
        aspect_ratios = torch.as_tensor(aspect_ratios, dtype=dtype, device=device)
        h_ratios = torch.sqrt(aspect_ratios)
        w_ratios = 1.0 / h_ratios

        # [r1, r2, r3]' * [s1, s2, s3]
        # number of elements is len(ratios)*len(scales)
        ws = (w_ratios[:, None] * scales[None, :]).view(-1)
        hs = (h_ratios[:, None] * scales[None, :]).view(-1)

        # left-top, right-bottom coordinate relative to anchor center(0, 0)
        # 生成的anchors模板都是以(0, 0)为中心的, shape [len(ratios)*len(scales), 4]
        base_anchors = torch.stack([-ws, -hs, ws, hs], dim=1) / 2

        return base_anchors.round()  # round 四舍五入
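        # 举个例子:scales=(32,), aspect_ratios=(0.5, 1.0, 2.0)时
        #   h_ratios ≈ [0.707, 1.000, 1.414],w_ratios ≈ [1.414, 1.000, 0.707]
        #   ws ≈ [45.3, 32.0, 22.6],hs ≈ [22.6, 32.0, 45.3]
        #   round后得到3个以(0, 0)为中心的anchor模板:
        #   [[-23., -11., 23., 11.], [-16., -16., 16., 16.], [-11., -23., 11., 23.]]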

    def set_cell_anchors(self, dtype, device):
        # type: (torch.dtype, torch.device) -> None
        if self.cell_anchors is not None:
            cell_anchors = self.cell_anchors
            assert cell_anchors is not None
            # suppose that all anchors have the same device
            # which is a valid assumption in the current state of the codebase
            if cell_anchors[0].device == device:
                return

        # 根据提供的sizes和aspect_ratios生成anchors模板
        # anchors模板都是以(0, 0)为中心的anchor
        cell_anchors = [
            self.generate_anchors(sizes, aspect_ratios, dtype, device)
            for sizes, aspect_ratios in zip(self.sizes, self.aspect_ratios)
        ]
        self.cell_anchors = cell_anchors

    def num_anchors_per_location(self):
        # 计算每个预测特征层上每个滑动窗口位置生成的anchor个数
        return [len(s) * len(a) for s, a in zip(self.sizes, self.aspect_ratios)]

    # For every combination of (a, (g, s), i) in (self.cell_anchors, zip(grid_sizes, strides), 0:2),
    # output g[i] anchors that are s[i] distance apart in direction i, with the same dimensions as a.
    def grid_anchors(self, grid_sizes, strides):
        # type: (List[List[int]], List[List[Tensor]]) -> List[Tensor]
        """
        anchors position in grid coordinate axis map into origin image
        计算预测特征图对应原始图像上的所有anchors的坐标
        Args:
            grid_sizes: 预测特征矩阵的height和width
            strides: 预测特征矩阵上一步对应原始图像上的步距
        """
        anchors = []
        cell_anchors = self.cell_anchors  # 预测特征图上anchors的模板
        assert cell_anchors is not None

        # 遍历每个预测特征层的grid_size,strides和cell_anchors
        for size, stride, base_anchors in zip(grid_sizes, strides, cell_anchors):
            grid_height, grid_width = size  # 每一个预测特征图的高度和宽度
            stride_height, stride_width = stride  # 每一个预测特征图一个cell对应原图的高度和宽度的尺度信息
            device = base_anchors.device  # 设备信息

            """
                >>> torch.arange(0, 3, dtype=torch.float32, device="cuda")
                tensor([0., 1., 2.], device='cuda:0')

            """
            # For output anchor, compute [x_center, y_center, x_center, y_center]
            # shifts_x: shape->[grid_width] 对应原图上的x坐标(列)
            shifts_x = torch.arange(0, grid_width, dtype=torch.float32, device=device) * stride_width
            # shifts_y: shape->[grid_height] 对应原图上的y坐标(行)
            shifts_y = torch.arange(0, grid_height, dtype=torch.float32, device=device) * stride_height

            # 计算预测特征矩阵上每个点对应原图上的坐标(anchors模板的坐标偏移量)
            # torch.meshgrid函数分别传入行坐标和列坐标,生成网格行坐标矩阵和网格列坐标矩阵
            """
                >>> x = torch.arange(0, 5, dtype=torch.float32, device="cuda")
                >>> y = torch.arange(5, 10, dtype=torch.float32, device="cuda")
                >>> grid = torch.meshgrid(x, y)
                >>> grid
                (tensor([[0., 0., 0., 0., 0.],
                        [1., 1., 1., 1., 1.],
                        [2., 2., 2., 2., 2.],
                        [3., 3., 3., 3., 3.],
                        [4., 4., 4., 4., 4.]], device='cuda:0'), tensor([[5., 6., 7., 8., 9.],
                        [5., 6., 7., 8., 9.],
                        [5., 6., 7., 8., 9.],
                        [5., 6., 7., 8., 9.],
                        [5., 6., 7., 8., 9.]], device='cuda:0'))

            """
            # shape: [grid_height, grid_width]
            shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
            shift_x = shift_x.reshape(-1)  # [850]
            shift_y = shift_y.reshape(-1)

            # 计算anchors坐标(xmin, ymin, xmax, ymax)在原图上的坐标偏移量
            # shape: [grid_width*grid_height, 4]
            shifts = torch.stack([shift_x, shift_y, shift_x, shift_y], dim=1)  # [850, 4]

            # For every (base anchor, output anchor) pair,
            # offset each zero-centered base anchor by the center of the output anchor.
            # 将anchors模板与原图上的坐标偏移量相加得到原图上所有anchors的坐标信息(shape不同时会使用广播机制)
            shifts_anchor = shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)  # [850, 15, 4]
            # 其中850为feature map的cell个数;15为每个cell生成的anchor个数;4为每个anchor的坐标
            anchors.append(shifts_anchor.reshape(-1, 4))  # [12750, 4]:每个anchors的坐标信息

        return anchors  # List[Tensor(all_num_anchors, 4)]

    def cached_grid_anchors(self, grid_sizes, strides):
        # type: (List[List[int]], List[List[Tensor]]) -> List[Tensor]
        """将计算得到的所有anchors信息进行缓存"""
        key = str(grid_sizes) + str(strides)
        # self._cache是字典类型
        if key in self._cache:
            return self._cache[key]
        anchors = self.grid_anchors(grid_sizes, strides)  # [12750, 4]得到每一个anchor的坐标信息
        self._cache[key] = anchors
        return anchors

    def forward(self, image_list, feature_maps):
        # type: (ImageList, List[Tensor]) -> List[Tensor]
        # image_list: ① batch;② 图像尺寸信息
        # feature_maps:预测特征层的信息,数据类型为List[Tensor]

        # 获取每个预测特征层的尺寸(height, width)
        grid_sizes = list([feature_map.shape[-2:] for feature_map in feature_maps])

        # 获取输入图像的height和width
        image_size = image_list.tensors.shape[-2:]

        # 获取变量类型和设备类型
        dtype, device = feature_maps[0].dtype, feature_maps[0].device

        # one step in feature map equate n pixel stride in origin image
        # 计算特征层上的一步等于原始图像上的步长
        strides = [[torch.tensor(image_size[0] // g[0], dtype=torch.int64, device=device),
                    torch.tensor(image_size[1] // g[1], dtype=torch.int64, device=device)] for g in grid_sizes]

        # 根据提供的sizes和aspect_ratios生成anchors模板
        self.set_cell_anchors(dtype, device)

        # 计算/读取所有anchors的坐标信息(这里的anchors信息是映射到原图上的所有anchors信息,不是anchors模板)
        # 得到的是一个list列表,对应每张预测特征图映射回原图的anchors坐标信息
        anchors_over_all_feature_maps = self.cached_grid_anchors(grid_sizes, strides)

        anchors = torch.jit.annotate(List[List[torch.Tensor]], [])
        # 遍历一个batch中的每张图像
        for i, (image_height, image_width) in enumerate(image_list.image_sizes):
            anchors_in_image = []
            # 遍历每张预测特征图映射回原图的anchors坐标信息
            for anchors_per_feature_map in anchors_over_all_feature_maps:
                anchors_in_image.append(anchors_per_feature_map)
            anchors.append(anchors_in_image)
        # 将每一张图像的所有预测特征层的anchors坐标信息拼接在一起
        # anchors是个list,每个元素为一张图像的所有anchors信息
        anchors = [torch.cat(anchors_per_image) for anchors_per_image in anchors]
        # Clear the cache in case that memory leaks.
        self._cache.clear()
        return anchors
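
下面用一个脱离类的最小示例演示anchors模板的生成逻辑(非项目源码,scales与aspect_ratios为假设值):每个(scale, ratio)组合都会得到一个以(0, 0)为中心的anchor,与上面generate_anchors的实现思路一致。

import torch

# 最小示例(非项目源码):生成以(0, 0)为中心的anchors模板
scales = torch.as_tensor([32., 64., 128.], dtype=torch.float32)        # 假设3种尺度
aspect_ratios = torch.as_tensor([0.5, 1.0, 2.0], dtype=torch.float32)  # 假设3种高宽比

h_ratios = torch.sqrt(aspect_ratios)  # 高度方向的缩放因子
w_ratios = 1.0 / h_ratios             # 宽度方向的缩放因子(保证面积近似不变)

# [len(ratios), len(scales)] -> 展平为 [len(ratios)*len(scales)]
ws = (w_ratios[:, None] * scales[None, :]).view(-1)
hs = (h_ratios[:, None] * scales[None, :]).view(-1)

# 以(0, 0)为中心: [xmin, ymin, xmax, ymax]
base_anchors = (torch.stack([-ws, -hs, ws, hs], dim=1) / 2).round()
print(base_anchors.shape)  # torch.Size([9, 4]),即3种高宽比 x 3种尺度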

4.3 Region Proposal Network

class RegionProposalNetwork(torch.nn.Module):
    """
    Implements Region Proposal Network (RPN).

    Arguments:
        anchor_generator (AnchorGenerator): module that generates the anchors for a set of feature
            maps.
        head (nn.Module): module that computes the objectness and regression deltas
        fg_iou_thresh (float): minimum IoU between the anchor and the GT box so that they can be
            considered as positive during training of the RPN.
        bg_iou_thresh (float): maximum IoU between the anchor and the GT box so that they can be
            considered as negative during training of the RPN.
        batch_size_per_image (int): number of anchors that are sampled during training of the RPN
            for computing the loss
        positive_fraction (float): proportion of positive anchors in a mini-batch during training
            of the RPN
        pre_nms_top_n (Dict[str]): number of proposals to keep before applying NMS. It should
            contain two fields: training and testing, to allow for different values depending
            on training or evaluation
        post_nms_top_n (Dict[str]): number of proposals to keep after applying NMS. It should
            contain two fields: training and testing, to allow for different values depending
            on training or evaluation
        nms_thresh (float): NMS threshold used for postprocessing the RPN proposals

    """
    __annotations__ = {  # 对__init__函数中使用的变量进行注释,该部分不是必须的,只是为了方便理解每个变量的含义
        'box_coder': det_utils.BoxCoder,
        'proposal_matcher': det_utils.Matcher,
        'fg_bg_sampler': det_utils.BalancedPositiveNegativeSampler,
        'pre_nms_top_n': Dict[str, int],
        'post_nms_top_n': Dict[str, int],
    }

    def __init__(self, anchor_generator, head,
                 fg_iou_thresh, bg_iou_thresh,
                 batch_size_per_image, positive_fraction,
                 pre_nms_top_n, post_nms_top_n, nms_thresh, score_thresh=0.0):
        super(RegionProposalNetwork, self).__init__()
        self.anchor_generator = anchor_generator
        self.head = head
        self.box_coder = det_utils.BoxCoder(weights=(1.0, 1.0, 1.0, 1.0))

        # use during training
        # 计算anchors与真实bbox的iou
        self.box_similarity = box_ops.box_iou

        self.proposal_matcher = det_utils.Matcher(  # 实例化Matcher类
            fg_iou_thresh,  # 当iou大于fg_iou_thresh(0.7)时视为正样本
            bg_iou_thresh,  # 当iou小于bg_iou_thresh(0.3)时视为负样本
            allow_low_quality_matches=True
        )

        self.fg_bg_sampler = det_utils.BalancedPositiveNegativeSampler(
            batch_size_per_image, positive_fraction  # 256, 0.5
        )

        # use during testing
        self._pre_nms_top_n = pre_nms_top_n
        self._post_nms_top_n = post_nms_top_n
        self.nms_thresh = nms_thresh
        self.score_thresh = score_thresh
        self.min_size = 1.

    def pre_nms_top_n(self):
        if self.training:
            return self._pre_nms_top_n['training']
        return self._pre_nms_top_n['testing']

    def post_nms_top_n(self):
        if self.training:
            return self._post_nms_top_n['training']
        return self._post_nms_top_n['testing']

    def assign_targets_to_anchors(self, anchors, targets):
        # type: (List[Tensor], List[Dict[str, Tensor]]) -> Tuple[List[Tensor], List[Tensor]]
        """
        计算每个anchors最匹配的gt,并划分为正样本,背景以及废弃的样本
        Args:
            anchors: (List[Tensor]): 每张图像上预测的anchors
            targets: (List[Dict[Tensor])  # 包含了GT等信息
        Returns:
            labels: 标记anchors归属类别(1, 0, -1分别对应正样本,背景,废弃的样本)
                    注意,在RPN中只有前景和背景,所有正样本的类别都是1,0代表背景
            matched_gt_boxes:与anchors匹配的gt
        """
        labels = []  # 存储anchors匹配的标签
        matched_gt_boxes = []  # 存储anchors匹配的GT
        # 遍历每张图像的anchors和targets
        for anchors_per_image, targets_per_image in zip(anchors, targets):
            gt_boxes = targets_per_image["boxes"]  # 只提取GT信息(里面原本包含boxes, labels, image_id, area, iscrowd)
            """
                torch.numel(input) → int
                    返回tensor所有元素的个数(包括0元素)
                
                例子:   
                    >>> a = torch.randint(100, (3, 112, 112))
                    >>> a.numel()
                    37632
                    >>> a = torch.zeros(3, 112, 112)
                    >>> a.numel()
                    37632
            """
            if gt_boxes.numel() == 0:  # 当前图片中没有GT
                device = anchors_per_image.device
                matched_gt_boxes_per_image = torch.zeros(anchors_per_image.shape, dtype=torch.float32, device=device)
                labels_per_image = torch.zeros((anchors_per_image.shape[0],), dtype=torch.float32, device=device)
            else:  # 当前图片中有GT
                # 计算anchors与真实bbox的iou信息
                # set to self.box_similarity when https://github.com/pytorch/pytorch/issues/27495 lands
                match_quality_matrix = box_ops.box_iou(gt_boxes, anchors_per_image)  # [当前图片GT的个数,该图像生成anchors的总个数]
                # 计算每个anchors与gt匹配iou最大的索引(iou < 0.3的索引置为-1,0.3 <= iou < 0.7的索引置为-2)
                matched_idxs = self.proposal_matcher(match_quality_matrix)
                # get the targets corresponding GT for each proposal
                # NB: need to clamp the indices because we can have a single
                # GT in the image, and matched_idxs can be -2, which goes
                # out of bounds
                # 这里使用clamp设置下限0是为了方便取每个anchors对应的gt_boxes信息
                # 负样本和舍弃的样本都是负值,所以为了防止越界直接置为0
                # 因为后面是通过labels_per_image变量来记录正样本位置的,
                # 所以负样本和舍弃的样本对应的gt_boxes信息并没有什么意义,
                # 反正计算目标边界框回归损失时只会用到正样本。
                """
                    torch.clamp(input, min=None, max=None) → Tensor
                        参数:
                            input: 输入tensor
                            min:元素大小的下限
                            max:元素大小的上限
                        返回值:
                            经过裁剪后的tensor
                    
                    例子:
                        >>> a = torch.linspace(-1, 1, 4)
                        >>> a
                        tensor([-1.0000, -0.3333,  0.3333,  1.0000])
                        >>> torch.clamp(a, min=-0.5, max=0.5)
                        tensor([-0.5000, -0.3333,  0.3333,  0.5000])
                """
                matched_gt_boxes_per_image = gt_boxes[matched_idxs.clamp(min=0)]  # 将元素<0的全部设置为0

                # 记录所有anchors匹配后的标签(正样本处标记为1,负样本处标记为0,丢弃样本处标记为-1)
                labels_per_image = matched_idxs >= 0
                labels_per_image = labels_per_image.to(dtype=torch.float32)  # True -> 1; False -> 0 -> 正样本的位置值为1.0

                # background (negative examples)
                bg_indices = matched_idxs == self.proposal_matcher.BELOW_LOW_THRESHOLD  # -1
                labels_per_image[bg_indices] = 0.0  # 负样本的位置值为0.0

                # discard indices that are between thresholds
                inds_to_discard = matched_idxs == self.proposal_matcher.BETWEEN_THRESHOLDS  # -2
                labels_per_image[inds_to_discard] = -1.0  # 丢弃样本的位置值为-1.0

            labels.append(labels_per_image)
            matched_gt_boxes.append(matched_gt_boxes_per_image)
        return labels, matched_gt_boxes

    def _get_top_n_idx(self, objectness, num_anchors_per_level):
        # type: (Tensor, List[int]) -> Tensor
        """
        获取每张预测特征图上预测概率排前pre_nms_top_n的anchors索引值
        Args:
            objectness: Tensor(每张图像的预测目标概率信息)
            num_anchors_per_level: List(每个预测特征层上的预测的anchors个数)
        Returns:

        """
        r = []  # 记录每个预测特征层上预测目标概率前pre_nms_top_n的索引信息
        offset = 0
        # 遍历每个预测特征层上的预测目标概率信息
        """
            tensor.split(split_size_or_sections, dim):按给定的长度(或长度列表)在dim维度上切分tensor
            
                >>> objectness = torch.randint(1, (2, 217413))
                >>> num_anchors_per_level = [163200, 40800, 10200, 2550, 663]
                >>> for ob in objectness.split(num_anchors_per_level, 1):
                ...     print(ob.shape)
                ... 
                torch.Size([2, 163200])
                torch.Size([2, 40800])
                torch.Size([2, 10200])
                torch.Size([2, 2550])
                torch.Size([2, 663])
                
                >>> a, b, c, d, e = objectness.split(num_anchors_per_level, 1)
                >>> print(f"{a.shape}\n{b.shape}\n{c.shape}\n{d.shape}\n{e.shape}")
                torch.Size([2, 163200])
                torch.Size([2, 40800])
                torch.Size([2, 10200])
                torch.Size([2, 2550])
                torch.Size([2, 663])
        """
        for ob in objectness.split(num_anchors_per_level, 1):
            if torchvision._is_tracing():
                num_anchors, pre_nms_top_n = _onnx_get_num_anchors_and_pre_nms_top_n(ob, self.pre_nms_top_n())
            else:
                num_anchors = ob.shape[1]  # 预测特征层上的预测的anchors个数
                # self.pre_nms_top_n()训练时为2000,测试时为1000
                pre_nms_top_n = min(self.pre_nms_top_n(), num_anchors)

            # Returns the k largest elements of the given input tensor along a given dimension
            """
                tensor.topk(k, 维度, 是否从大到小排序(默认True), 是否排序)
                    torch.topk(input, k, dim=None, largest=True, sorted=True, *, out=None)
                    参数:
                        ① k -> top-k
                        ② dim
                        ③ largest=True: 是否按照从大到小的顺序排序
                        ④ sorted:控制是否按排序顺序返回元素
                        
                    返回值有两个:
                            ① 返回top-k排序后的数值
                            ② 返回top-k排序后的索引
                    
                    Example:
                        >>> x = torch.arange(1, 10)
                        >>> x
                        tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
                        >>> a, b = x.topk(k=5, dim=0)
                        >>> a
                        tensor([9, 8, 7, 6, 5])
                        >>> b
                        tensor([8, 7, 6, 5, 4])
                        >>> x.topk(k=5, dim=0)
                        torch.return_types.topk(
                        values=tensor([9, 8, 7, 6, 5]),
                        indices=tensor([8, 7, 6, 5, 4]))
            """
            _, top_n_idx = ob.topk(pre_nms_top_n, dim=1)  # 只要排序后的索引,不要值
            # 这里是将每一个预测特征层的anchors进行遍历,这里对于每一层r.append(top_n_idx),对于第一层是对的,但对于后面层来说,里面存储
            # 的数就不对了,因为里面存储的idx是根据topk得到的,而topk返回的是这一层的idx。
            # 简单说,后面层的索引起点位置不应该是0。第二层idx的起点位置应该是第一层最后一个anchors的idx+1。
            # 为了达到这个目的,这里使用了offset(偏移量),让这一层结束后让下一层的idx的起点处于正确的位置(而不是从0开始的)
            r.append(top_n_idx + offset)
            offset += num_anchors
        return torch.cat(r, dim=1)

    def filter_proposals(self, proposals, objectness, image_shapes, num_anchors_per_level):
        # type: (Tensor, Tensor, List[Tuple[int, int]], List[int]) -> Tuple[List[Tensor], List[Tensor]]
        """
        筛除小boxes框,nms处理,根据预测概率获取前post_nms_top_n个目标
        Args:
            proposals: 预测的bbox坐标
            objectness: 预测的目标概率
            image_shapes: batch中每张图片的size信息
            num_anchors_per_level: 每个预测特征层上预测anchors的数目

        Returns:

        """
        num_images = proposals.shape[0]  # BS
        device = proposals.device

        # do not backprop through objectness
        objectness = objectness.detach()  # 丢弃objectness原有的梯度信息(只获取它的数值信息)
        objectness = objectness.reshape(num_images, -1)  # [BS*anchor总数, 1] -> [BS, anchor总数]

        # Returns a tensor of size size filled with fill_value
        # levels负责记录分隔不同预测特征层上的anchors索引信息
        # idx:预测特征层的索引
        # n:该预测特征层anchors的个数
        """
            torch.full(生成tensor的shape, 填充值, 数据类型, 设备)
                >>> a = torch.full((19380, ), 0, dtype=torch.int64, device="cuda")
                >>> a.shape
                torch.Size([19380])
                >>> a
                tensor([0, 0, 0,  ..., 0, 0, 0], device='cuda:0')
        """
        levels = [torch.full((n, ), idx, dtype=torch.int64, device=device)
                  for idx, n in enumerate(num_anchors_per_level)]
        """
            >>> levels = [torch.full((n, ), idx, dtype=torch.int64, device="cuda") for idx, n in enumerate([1000, 600, 300])]
            >>> levels = torch.cat(levels, dim=0)
            >>> levels.shape
            torch.Size([1900])
            >>> levels
            tensor([0, 0, 0,  ..., 2, 2, 2], device='cuda:0')
            
            这样我们就可以用不同的数值(0,1,2,3...)来区分不同的proposals是属于哪一个特征提取层了!
        """
        levels = torch.cat(levels, 0)

        # Expand this tensor to the same size as objectness
        # [所有特征提取层anchor的总个数] -> [1, anchor总数] -> [BS, anchor总数]
        levels = levels.reshape(1, -1).expand_as(objectness)

        # select top_n boxes independently per level before applying nms
        # 获取每张预测特征图上预测概率排前pre_nms_top_n的anchors索引值
        top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level)

        image_range = torch.arange(num_images, device=device)
        batch_idx = image_range[:, None]  # [batch_size, 1]

        # 根据每个预测特征层预测概率排前pre_nms_top_n的anchors索引值获取相应概率信息
        objectness = objectness[batch_idx, top_n_idx]
        levels = levels[batch_idx, top_n_idx]
        # 根据预测概率排前pre_nms_top_n的anchors索引值获取相应bbox坐标信息
        proposals = proposals[batch_idx, top_n_idx]

        objectness_prob = torch.sigmoid(objectness)

        final_boxes = []
        final_scores = []
        # 遍历每张图像的相关预测信息
        for boxes, scores, lvl, img_shape in zip(proposals, objectness_prob, levels, image_shapes):
            # 调整预测的boxes信息,将越界的坐标调整到图片边界上
            boxes = box_ops.clip_boxes_to_image(boxes, img_shape)

            # 返回boxes满足宽,高都大于min_size的索引
            keep = box_ops.remove_small_boxes(boxes, self.min_size)
            # 获取滤除小目标后的proposal
            boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep]

            # 移除小概率boxes,参考下面这个链接
            # https://github.com/pytorch/vision/pull/3205
            keep = torch.where(torch.ge(scores, self.score_thresh))[0]  # ge: >=
            boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep]

            # non-maximum suppression, independently done per level
            # keep是执行nms处理后且按照目标类别分数进行排序后输出的idx
            keep = box_ops.batched_nms(boxes, scores, lvl, self.nms_thresh)

            # keep only topk scoring predictions
            # 获取前post_nms_top_n个索引
            keep = keep[: self.post_nms_top_n()]
            # 得到最终的proposal和scores
            boxes, scores = boxes[keep], scores[keep]

            # 添加到列表中
            final_boxes.append(boxes)
            final_scores.append(scores)
        return final_boxes, final_scores

    def compute_loss(self, objectness, pred_bbox_deltas, labels, regression_targets):
        # type: (Tensor, Tensor, List[Tensor], List[Tensor]) -> Tuple[Tensor, Tensor]
        """
        计算RPN损失,包括类别损失(前景与背景),bbox regression损失
        Arguments:
            objectness (Tensor):预测的前景概率
            pred_bbox_deltas (Tensor):预测的bbox regression
            labels (List[Tensor]):真实的标签 1, 0, -1(batch中每一张图片的labels对应List的一个元素中)
            regression_targets (List[Tensor]):真实的bbox regression

        Returns:
            objectness_loss (Tensor) : 类别损失
            box_loss (Tensor):边界框回归损失
        """
        # 按照给定的batch_size_per_image, positive_fraction选择正负样本
        sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
        # 将一个batch中的所有正负样本List(Tensor)分别拼接在一起,并获取非零位置的索引
        # sampled_pos_inds = torch.nonzero(torch.cat(sampled_pos_inds, dim=0)).squeeze(1)
        sampled_pos_inds = torch.where(torch.cat(sampled_pos_inds, dim=0))[0]
        # sampled_neg_inds = torch.nonzero(torch.cat(sampled_neg_inds, dim=0)).squeeze(1)
        sampled_neg_inds = torch.where(torch.cat(sampled_neg_inds, dim=0))[0]

        # 将所有正负样本索引拼接在一起
        sampled_inds = torch.cat([sampled_pos_inds, sampled_neg_inds], dim=0)
        objectness = objectness.flatten()

        labels = torch.cat(labels, dim=0)
        regression_targets = torch.cat(regression_targets, dim=0)

        # 计算边界框回归损失 -> 只需计算正样本的损失
        box_loss = det_utils.smooth_l1_loss(
            pred_bbox_deltas[sampled_pos_inds],
            regression_targets[sampled_pos_inds],
            beta=1 / 9,
            size_average=False,
        ) / (sampled_inds.numel())

        # 计算目标预测概率损失,损失函数为BCE;with_logits表示传入的是未经sigmoid处理的原始分数,函数内部会先做sigmoid再计算交叉熵
        objectness_loss = F.binary_cross_entropy_with_logits(
            objectness[sampled_inds], labels[sampled_inds]
        )

        return objectness_loss, box_loss

    def forward(self,
                images,        # type: ImageList
                features,      # type: Dict[str, Tensor]
                targets=None   # type: Optional[List[Dict[str, Tensor]]]
                ):
        # type: (...) -> Tuple[List[Tensor], Dict[str, Tensor]]
        """
        Arguments:
            images (ImageList): images for which we want to compute the predictions
            features (Dict[Tensor]): features computed from the images that are
                used for computing the predictions. Each tensor in the list
                correspond to different feature levels
            targets (List[Dict[Tensor]): ground-truth boxes present in the image (optional).
                If provided, each element in the dict should contain a field `boxes`,
                with the locations of the ground-truth boxes.

        Returns:
            boxes (List[Tensor]): the predicted boxes from the RPN, one Tensor per
                image.
            losses (Dict[Tensor]): the losses for the model during training. During
                testing, it is an empty dict.
        """
        # RPN uses all feature maps that are available
        # features是所有预测特征层组成的OrderedDict
        features = list(features.values())  # 其中每一个预测特征图层中元素的大小为:[BS, C, H, W]

        # 计算每个预测特征层上的预测目标概率objectness和bboxes regression参数pred_bbox_deltas
        # objectness和pred_bbox_deltas都是list
        # objectness: [BS, 15, H, W]
        # pred_bbox_deltas: [BS, 15*4, H, W] = [BS, 60, H, W]
        objectness, pred_bbox_deltas = self.head(features)

        # 生成一个batch图像的所有anchors信息,list(tensor)元素个数等于batch_size
        # images: 两部分 -> ① image_sizes:这1个batch中每张图片的H和W; ② tensors:[BS, C, H, W]
        anchors = self.anchor_generator(images, features)

        # batch_size
        # anchors:是一个list,list中每一个元素代表了每一个图片对应的anchors信息,因为batch=8,所以有8个元素(每个元素的shape为:[14625, 4])
        num_images = len(anchors)  # 计算一个batch中有多少张图片(这里为8)

        # numel() Returns the total number of elements in the input tensor.
        # 计算每个预测特征层上的对应的anchors数量
        num_anchors_per_level_shape_tensors = [o[0].shape for o in objectness]  # [15, H, W]
        # 每个特征矩阵的每个cell会生成15个anchors,而特征矩阵的长度和宽度分别为H和W,所以一共会生成15*H*W个anchors
        num_anchors_per_level = [s[0] * s[1] * s[2] for s in num_anchors_per_level_shape_tensors]  # [14625]

        # 调整内部tensor格式以及shape
        objectness, pred_bbox_deltas = concat_box_prediction_layers(objectness,
                                                                    pred_bbox_deltas)

        # apply pred_bbox_deltas to anchors to obtain the decoded proposals
        # note that we detach the deltas because Faster R-CNN do not backprop through
        # the proposals
        # 将预测的bbox regression参数应用到anchors上得到最终预测bbox坐标
        proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
        proposals = proposals.view(num_images, -1, 4)  # [BS, anchor数量,4]

        # 筛除小boxes框,nms处理,根据预测概率获取前post_nms_top_n个目标
        boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)

        """
            如果是训练模式,则计算损失
        """
        losses = {}
        if self.training:
            assert targets is not None
            # 计算每个anchors最匹配的gt,并将anchors进行分类,前景,背景以及废弃的anchors
            labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
            # 结合anchors以及对应的gt,计算regression参数
            # matched_gt_boxes:每个anchor所匹配的GT
            # anchors:每个anchors的坐标
            # 根据这两个参数计算回归损失
            regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
            loss_objectness, loss_rpn_box_reg = self.compute_loss(
                objectness, pred_bbox_deltas, labels, regression_targets
            )
            losses = {
                "loss_objectness": loss_objectness,
                "loss_rpn_box_reg": loss_rpn_box_reg
            }
        return boxes, losses
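
为了直观理解上面__init__中各个超参数的含义,下面给出一个实例化RPN的简单示意(非项目源码,这里借用torchvision自带的实现来演示;不同torchvision版本中AnchorGenerator的导入路径可能略有差异,in_channels=1280对应MobileNetV2的输出通道数,属于假设值):

import torch
from torchvision.models.detection.rpn import AnchorGenerator, RPNHead, RegionProposalNetwork

# 简单示意(非项目源码):RPN常见的超参数取值
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
rpn_head = RPNHead(in_channels=1280,  # 假设backbone(MobileNetV2)输出1280个channel
                   num_anchors=anchor_generator.num_anchors_per_location()[0])  # 5*3=15

rpn = RegionProposalNetwork(
    anchor_generator, rpn_head,
    fg_iou_thresh=0.7,                                  # IoU >= 0.7 视为正样本
    bg_iou_thresh=0.3,                                  # IoU < 0.3 视为负样本
    batch_size_per_image=256,                           # 每张图采样256个anchors计算损失
    positive_fraction=0.5,                              # 其中正样本占50%
    pre_nms_top_n=dict(training=2000, testing=1000),    # NMS前保留的proposal数
    post_nms_top_n=dict(training=2000, testing=1000),   # NMS后保留的proposal数
    nms_thresh=0.7)                                     # RPN中NMS使用的IoU阈值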

4.4 Detection utils

import torch
import math
from typing import List, Tuple
from torch import Tensor


class BalancedPositiveNegativeSampler(object):
    """
    This class samples batches, ensuring that they contain a fixed proportion of positives
    """

    def __init__(self, batch_size_per_image, positive_fraction):
        # type: (int, float) -> None
        """
        Arguments:
            batch_size_per_image (int): number of elements to be selected per image
            positive_fraction (float): percentage of positive elements per batch
        """
        self.batch_size_per_image = batch_size_per_image
        self.positive_fraction = positive_fraction

    def __call__(self, matched_idxs):
        # type: (List[Tensor]) -> Tuple[List[Tensor], List[Tensor]]
        """
        Arguments:
            matched idxs: list of tensors containing -1, 0 or positive values.
                Each tensor corresponds to a specific image.
                -1 values are ignored, 0 are considered as negatives and > 0 as
                positives.

        Returns:
            pos_idx (list[tensor])
            neg_idx (list[tensor])

        Returns two lists of binary masks for each image.
        The first list contains the positive elements that were selected,
        and the second list the negative example.
        """
        pos_idx = []
        neg_idx = []
        # 遍历每张图像的matched_idxs
        for matched_idxs_per_image in matched_idxs:
            # >= 1的为正样本, nonzero返回非零元素索引
            # positive = torch.nonzero(matched_idxs_per_image >= 1).squeeze(1)
            positive = torch.where(torch.ge(matched_idxs_per_image, 1))[0]
            # = 0的为负样本
            # negative = torch.nonzero(matched_idxs_per_image == 0).squeeze(1)
            negative = torch.where(torch.eq(matched_idxs_per_image, 0))[0]

            # 指定正样本的数量
            num_pos = int(self.batch_size_per_image * self.positive_fraction)
            # protect against not enough positive examples
            # 如果正样本数量不够就直接采用所有正样本
            num_pos = min(positive.numel(), num_pos)
            # 指定负样本数量
            num_neg = self.batch_size_per_image - num_pos
            # protect against not enough negative examples
            # 如果负样本数量不够就直接采用所有负样本
            num_neg = min(negative.numel(), num_neg)

            # randomly select positive and negative examples
            # Returns a random permutation of integers from 0 to n - 1.
            # 随机选择指定数量的正负样本
            """
                >>> perm1 = torch.randperm(200)[:7]
                >>> perm2 = torch.randperm(500)[:249]
                >>> perm1
                tensor([185, 148, 155,   2,  37, 160,  56])
                >>> perm2.numel()
                249
            """
            perm1 = torch.randperm(positive.numel(), device=positive.device)[:num_pos]
            perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]

            pos_idx_per_image = positive[perm1]
            neg_idx_per_image = negative[perm2]

            # create binary mask from indices
            pos_idx_per_image_mask = torch.zeros_like(
                matched_idxs_per_image, dtype=torch.uint8
            )
            neg_idx_per_image_mask = torch.zeros_like(
                matched_idxs_per_image, dtype=torch.uint8
            )

            pos_idx_per_image_mask[pos_idx_per_image] = 1
            neg_idx_per_image_mask[neg_idx_per_image] = 1

            pos_idx.append(pos_idx_per_image_mask)
            neg_idx.append(neg_idx_per_image_mask)

        return pos_idx, neg_idx
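
# 小示例(非源码,数值均为假设):演示BalancedPositiveNegativeSampler的用法
#   batch_size_per_image=4, positive_fraction=0.5 -> 每张图最多采2个正样本,其余名额给负样本
#   >>> sampler = BalancedPositiveNegativeSampler(batch_size_per_image=4, positive_fraction=0.5)
#   >>> matched_idxs = [torch.tensor([1, 0, 0, 2, -1, 0, 1, 0])]  # -1忽略, 0负样本, >=1正样本
#   >>> pos_mask, neg_mask = sampler(matched_idxs)
#   >>> int(pos_mask[0].sum()), int(neg_mask[0].sum())
#   (2, 2)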


@torch.jit._script_if_tracing
def encode_boxes(reference_boxes, proposals, weights):
    # type: (torch.Tensor, torch.Tensor, torch.Tensor) -> torch.Tensor
    """
    Encode a set of proposals with respect to some
    reference boxes

    Arguments:
        reference_boxes (Tensor): reference boxes(gt)
        proposals (Tensor): boxes to be encoded(anchors)
        weights:
    """

    # perform some unpacking to make it JIT-fusion friendly
    wx = weights[0]
    wy = weights[1]
    ww = weights[2]
    wh = weights[3]

    # unsqueeze()
    # Returns a new tensor with a dimension of size one inserted at the specified position.
    proposals_x1 = proposals[:, 0].unsqueeze(1)
    proposals_y1 = proposals[:, 1].unsqueeze(1)
    proposals_x2 = proposals[:, 2].unsqueeze(1)
    proposals_y2 = proposals[:, 3].unsqueeze(1)

    reference_boxes_x1 = reference_boxes[:, 0].unsqueeze(1)
    reference_boxes_y1 = reference_boxes[:, 1].unsqueeze(1)
    reference_boxes_x2 = reference_boxes[:, 2].unsqueeze(1)
    reference_boxes_y2 = reference_boxes[:, 3].unsqueeze(1)

    # implementation starts here
    # parse widths and heights
    ex_widths = proposals_x2 - proposals_x1
    ex_heights = proposals_y2 - proposals_y1
    # parse coordinate of center point
    ex_ctr_x = proposals_x1 + 0.5 * ex_widths
    ex_ctr_y = proposals_y1 + 0.5 * ex_heights

    gt_widths = reference_boxes_x2 - reference_boxes_x1
    gt_heights = reference_boxes_y2 - reference_boxes_y1
    gt_ctr_x = reference_boxes_x1 + 0.5 * gt_widths
    gt_ctr_y = reference_boxes_y1 + 0.5 * gt_heights

    targets_dx = wx * (gt_ctr_x - ex_ctr_x) / ex_widths
    targets_dy = wy * (gt_ctr_y - ex_ctr_y) / ex_heights
    targets_dw = ww * torch.log(gt_widths / ex_widths)
    targets_dh = wh * torch.log(gt_heights / ex_heights)

    targets = torch.cat((targets_dx, targets_dy, targets_dw, targets_dh), dim=1)
    return targets


class BoxCoder(object):
    """
    This class encodes and decodes a set of bounding boxes into
    the representation used for training the regressors.
    """

    def __init__(self, weights, bbox_xform_clip=math.log(1000. / 16)):
        # type: (Tuple[float, float, float, float], float) -> None
        """
        Arguments:
            weights (4-element tuple)
            bbox_xform_clip (float)
        """
        self.weights = weights
        self.bbox_xform_clip = bbox_xform_clip

    def encode(self, reference_boxes, proposals):
        # type: (List[Tensor], List[Tensor]) -> List[Tensor]
        """
        结合anchors和与之对应的gt计算regression参数
        Args:
            reference_boxes: List[Tensor] 每个proposal/anchor对应的gt_boxes
            proposals: List[Tensor] anchors/proposals

        Returns: regression parameters

        """
        # 统计每张图像的anchors个数,方便后面拼接在一起处理后再分开
        # reference_boxes和proposal数据结构相同
        boxes_per_image = [len(b) for b in reference_boxes]
        reference_boxes = torch.cat(reference_boxes, dim=0)
        proposals = torch.cat(proposals, dim=0)

        # targets_dx, targets_dy, targets_dw, targets_dh
        targets = self.encode_single(reference_boxes, proposals)
        return targets.split(boxes_per_image, 0)

    def encode_single(self, reference_boxes, proposals):
        """
        Encode a set of proposals with respect to some
        reference boxes

        Arguments:
            reference_boxes (Tensor): reference boxes
            proposals (Tensor): boxes to be encoded
        """
        dtype = reference_boxes.dtype
        device = reference_boxes.device
        weights = torch.as_tensor(self.weights, dtype=dtype, device=device)
        targets = encode_boxes(reference_boxes, proposals, weights)

        return targets

    def decode(self, rel_codes, boxes):
        # type: (Tensor, List[Tensor]) -> Tensor
        """

        Args:
            rel_codes: bbox regression parameters
            boxes: anchors/proposals

        Returns:

        """
        assert isinstance(boxes, (list, tuple))
        assert isinstance(rel_codes, torch.Tensor)
        boxes_per_image = [b.size(0) for b in boxes]
        concat_boxes = torch.cat(boxes, dim=0)

        box_sum = 0
        for val in boxes_per_image:
            box_sum += val

        # 将预测的bbox回归参数应用到对应anchors上得到预测bbox的坐标
        pred_boxes = self.decode_single(
            rel_codes, concat_boxes
        )

        # 防止pred_boxes为空时导致reshape报错
        if box_sum > 0:
            pred_boxes = pred_boxes.reshape(box_sum, -1, 4)

        return pred_boxes

    def decode_single(self, rel_codes, boxes):
        """
        From a set of original boxes and encoded relative box offsets,
        get the decoded boxes.

        Arguments:
            rel_codes (Tensor): encoded boxes (bbox regression parameters)
            boxes (Tensor): reference boxes (anchors/proposals)
        """
        boxes = boxes.to(rel_codes.dtype)

        # xmin, ymin, xmax, ymax
        widths = boxes[:, 2] - boxes[:, 0]   # anchor/proposal宽度
        heights = boxes[:, 3] - boxes[:, 1]  # anchor/proposal高度
        ctr_x = boxes[:, 0] + 0.5 * widths   # anchor/proposal中心x坐标
        ctr_y = boxes[:, 1] + 0.5 * heights  # anchor/proposal中心y坐标

        wx, wy, ww, wh = self.weights  # RPN中为[1,1,1,1], fastrcnn中为[10,10,5,5]
        dx = rel_codes[:, 0::4] / wx   # 预测anchors/proposals的中心坐标x回归参数
        dy = rel_codes[:, 1::4] / wy   # 预测anchors/proposals的中心坐标y回归参数
        dw = rel_codes[:, 2::4] / ww   # 预测anchors/proposals的宽度回归参数
        dh = rel_codes[:, 3::4] / wh   # 预测anchors/proposals的高度回归参数

        # limit max value, prevent sending too large values into torch.exp()
        # self.bbox_xform_clip=math.log(1000. / 16)   4.135
        dw = torch.clamp(dw, max=self.bbox_xform_clip)
        dh = torch.clamp(dh, max=self.bbox_xform_clip)

        pred_ctr_x = dx * widths[:, None] + ctr_x[:, None]
        pred_ctr_y = dy * heights[:, None] + ctr_y[:, None]
        pred_w = torch.exp(dw) * widths[:, None]
        pred_h = torch.exp(dh) * heights[:, None]

        # xmin
        pred_boxes1 = pred_ctr_x - torch.tensor(0.5, dtype=pred_ctr_x.dtype, device=pred_w.device) * pred_w
        # ymin
        pred_boxes2 = pred_ctr_y - torch.tensor(0.5, dtype=pred_ctr_y.dtype, device=pred_h.device) * pred_h
        # xmax
        pred_boxes3 = pred_ctr_x + torch.tensor(0.5, dtype=pred_ctr_x.dtype, device=pred_w.device) * pred_w
        # ymax
        pred_boxes4 = pred_ctr_y + torch.tensor(0.5, dtype=pred_ctr_y.dtype, device=pred_h.device) * pred_h

        pred_boxes = torch.stack((pred_boxes1, pred_boxes2, pred_boxes3, pred_boxes4), dim=2).flatten(1)
        return pred_boxes


class Matcher(object):
    BELOW_LOW_THRESHOLD = -1
    BETWEEN_THRESHOLDS = -2

    __annotations__ = {
        'BELOW_LOW_THRESHOLD': int,
        'BETWEEN_THRESHOLDS': int,
    }

    def __init__(self, high_threshold, low_threshold, allow_low_quality_matches=False):
        # type: (float, float, bool) -> None
        """
        Args:
            high_threshold (float): quality values greater than or equal to
                this value are candidate matches.
            low_threshold (float): a lower quality threshold used to stratify
                matches into three levels:
                1) matches >= high_threshold
                2) BETWEEN_THRESHOLDS matches in [low_threshold, high_threshold)
                3) BELOW_LOW_THRESHOLD matches in [0, low_threshold)
            allow_low_quality_matches (bool): if True, produce additional matches
                for predictions that have only low-quality match candidates. See
                set_low_quality_matches_ for more details.
        """
        self.BELOW_LOW_THRESHOLD = -1
        self.BETWEEN_THRESHOLDS = -2
        assert low_threshold <= high_threshold
        self.high_threshold = high_threshold  # 0.7
        self.low_threshold = low_threshold    # 0.3
        self.allow_low_quality_matches = allow_low_quality_matches

    def __call__(self, match_quality_matrix):
        """
        计算anchors与每个gt boxes匹配的iou最大值,并记录索引:
        iou < low_threshold的索引值为-1,low_threshold <= iou < high_threshold的索引值为-2
        """
        if match_quality_matrix.numel() == 0:
            # empty targets or proposals not supported during training
            if match_quality_matrix.shape[0] == 0:
                raise ValueError(
                    "No ground-truth boxes available for one of the images "
                    "during training")
            else:
                raise ValueError(
                    "No proposal boxes available for one of the images "
                    "during training")

        # match_quality_matrix is M (gt) x N (predicted)
        # Max over gt elements (dim 0) to find best gt candidate for each prediction
        # M x N 的每一列代表一个anchors与所有gt的匹配iou值
        # matched_vals代表每列的最大值,即每个anchors与所有gt匹配的最大iou值
        # matches对应最大值所在的索引
        matched_vals, matches = match_quality_matrix.max(dim=0)  # the dimension to reduce.
        if self.allow_low_quality_matches:
            all_matches = matches.clone()  # 这里没有用=而是clone,如果用=则是引用,list引用修改的话,list本身也会修改,所以用的是clone
        else:
            all_matches = None

        # Assign candidate matches with low quality to negative (unassigned) values
        # 计算iou小于low_threshold的索引
        below_low_threshold = matched_vals < self.low_threshold  # 得到一个boolean蒙版
        # 计算iou在low_threshold与high_threshold之间的索引值
        between_thresholds = (matched_vals >= self.low_threshold) & (
            matched_vals < self.high_threshold
        )
        # iou小于low_threshold的matches索引置为-1
        matches[below_low_threshold] = self.BELOW_LOW_THRESHOLD  # -1

        # iou在[low_threshold, high_threshold)之间的matches索引置为-2
        matches[between_thresholds] = self.BETWEEN_THRESHOLDS    # -2

        if self.allow_low_quality_matches:
            assert all_matches is not None
            self.set_low_quality_matches_(matches, all_matches, match_quality_matrix)

        return matches

    def set_low_quality_matches_(self, matches, all_matches, match_quality_matrix):
        """
        Produce additional matches for predictions that have only low-quality matches.
        Specifically, for each ground-truth find the set of predictions that have
        maximum overlap with it (including ties); for each prediction in that set, if
        it is unmatched, then match it to the ground-truth with which it has the highest
        quality value.
        为只有低质量匹配的预测生成额外的匹配。
        具体来说,对于每个ground-truth,找到与其有最大重叠(包括关系)的一组预测;
        对于该集合中的每个预测,如果不匹配,则将其与具有最高质量值的gt实况进行匹配。
        """
        # For each gt, find the prediction with which it has highest quality
        # 对于每个gt boxes寻找与其iou最大的anchor,
        # highest_quality_foreach_gt为匹配到的最大iou值 -> [0.8, 0.85, 0.9, 0.65]
        highest_quality_foreach_gt, _ = match_quality_matrix.max(dim=1)  # the dimension to reduce.

        # Find highest quality match available, even if it is low, including ties
        # 寻找每个gt boxes与其iou最大的anchor索引,一个gt匹配到的最大iou可能有多个anchor
        # gt_pred_pairs_of_highest_quality = torch.nonzero(
        #     match_quality_matrix == highest_quality_foreach_gt[:, None]
        # )
        """
            torch.where(condition,a,b)
            其中:
                输入参数condition:条件限制,如果满足条件,则选择a,否则选择b作为输出。
            注意:
                a和b是tensor
                torch.where(condition) is identical to torch.nonzero(condition, as_tuple=True).
                
            这里的torch.where等价于torch.nonzero
            
            torch.nonzero(input_tensor):返回输入tensor非零元素的坐标
            
            例子:
                >>> torch.nonzero(torch.Tensor([[0.6, 0.0, 0.0, 0.0],
                ...                             [0.0, 0.4, 0.0, 0.0],
                ...                             [0.0, 0.0, 1.2, 0.0],
                ...                             [0.0, 0.0, 0.0,-0.4]]))
                ==输出==
                 0  0
                 1  1
                 2  2
                 3  3
                对于输出需要一行一行的解读:
                    第一行:0 0     输入tensor的[0, 0]是一个非零元素
                    第二行:1 1     输入tensor的[1, 1]是一个非零元素
                    ...

        """
        gt_pred_pairs_of_highest_quality = torch.where(
            torch.eq(match_quality_matrix, highest_quality_foreach_gt[:, None])
        )  # torch.where返回一个tuple: (gt索引tensor, 与之iou最大的anchor索引tensor)
        # Example gt_pred_pairs_of_highest_quality:
        #   tensor([[    0, 39796],
        #           [    1, 32055],
        #           [    1, 32070],
        #           [    2, 39190],
        #           [    2, 40255],
        #           [    3, 40390],
        #           [    3, 41455],
        #           [    4, 45470],
        #           [    5, 45325],
        #           [    5, 46390]])
        # Each row is a (gt index, prediction index)
        # Note how gt items 1, 2, 3, and 5 each have two ties

        # torch.where返回的是tuple:第0个元素是gt index(不需要),第1个元素才是对应的anchor/prediction index
        # (若使用上面注释掉的torch.nonzero写法,则应取gt_pred_pairs_of_highest_quality[:, 1])
        pre_inds_to_update = gt_pred_pairs_of_highest_quality[1]
        # 保留该anchor匹配gt最大iou的索引,即使iou低于设定的阈值
        matches[pre_inds_to_update] = all_matches[pre_inds_to_update]


def smooth_l1_loss(input, target, beta: float = 1. / 9, size_average: bool = True):
    """
    very similar to the smooth_l1_loss from pytorch, but with
    the extra beta parameter
    """
    n = torch.abs(input - target)
    # cond = n < beta
    cond = torch.lt(n, beta)
    loss = torch.where(cond, 0.5 * n ** 2 / beta, n - 0.5 * beta)
    if size_average:
        return loss.mean()
    return loss.sum()
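
BoxCoder的encode与decode互为逆运算:encode根据(anchor, gt)计算回归参数(dx, dy, dw, dh),decode再把回归参数应用回anchor得到边界框坐标。下面用上面定义的BoxCoder做一个最小的往返验证(非项目源码,坐标数值为随意假设):

import torch

# 最小示例(非项目源码):验证encode与decode互为逆运算
box_coder = BoxCoder(weights=(1., 1., 1., 1.))  # 与RPN中使用的weights一致

anchors = [torch.tensor([[ 10.,  10.,  60.,  80.],
                         [100., 120., 180., 200.]])]
gt_boxes = [torch.tensor([[ 12.,  15.,  58.,  90.],
                          [ 90., 110., 190., 210.]])]

# encode: 根据anchor与其匹配的gt计算回归参数
deltas = box_coder.encode(gt_boxes, anchors)             # 每个元素shape: [2, 4]
# decode: 将回归参数应用回anchor,应还原出gt_boxes
decoded = box_coder.decode(torch.cat(deltas, dim=0), anchors)
print(decoded.reshape(-1, 4))  # 与gt_boxes基本一致(只有浮点误差)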

4.5 ROI Align、Two MLP Head、 Faster R-CNN Predictor

[学习笔记] Faster R-CNN源码解析_第6张图片

4.5.1 ROI Align

ROI Align对应的图中的ROI Pooling,代码如下:

from torchvision.ops import MultiScaleRoIAlign
#  Multi-scale RoIAlign pooling
if box_roi_pool is None:
    box_roi_pool = MultiScaleRoIAlign(  # 这里就是ROI Pooling
        featmap_names=['0', '1', '2', '3'],  # 在哪些特征层进行roi pooling
        output_size=[7, 7],  # 这里给出了ROI Pooling后输出的shape
        sampling_ratio=2)

该方法PyTorch官方已经实现,且已被封装,直接用就可以
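
下面是一个MultiScaleRoIAlign的最小调用示例(非项目源码,特征图与proposal均为随机/假设数据),可以直观看到输出shape就是[proposal总数, C, 7, 7]:

import torch
from collections import OrderedDict
from torchvision.ops import MultiScaleRoIAlign

# 最小示例(非项目源码):只有一个名为'0'的特征层,两个proposal
roi_pool = MultiScaleRoIAlign(featmap_names=['0'],
                              output_size=[7, 7],
                              sampling_ratio=2)

features = OrderedDict([('0', torch.rand(1, 256, 64, 64))])  # [BS, C, H, W]
boxes = [torch.tensor([[10., 10., 100., 120.],
                       [30., 40., 200., 180.]])]             # 每张图的proposal坐标(原图尺度)
image_sizes = [(512, 512)]                                   # 每张图缩放后的(H, W)

output = roi_pool(features, boxes, image_sizes)
print(output.shape)  # torch.Size([2, 256, 7, 7])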

4.5.2 Two MLP Head

这里是将经过ROI Pooling后的proposal feature map送入这两个全连接层,代码如下:

from typing import Optional, List, Dict, Tuple

import torch
from torch import Tensor
import torch.nn.functional as F

from . import det_utils
from . import boxes as box_ops


def fastrcnn_loss(class_logits, box_regression, labels, regression_targets):
    # type: (Tensor, Tensor, List[Tensor], List[Tensor]) -> Tuple[Tensor, Tensor]
    """
    Computes the loss for Faster R-CNN.

    Arguments:
        class_logits : 预测类别概率信息,shape=[num_proposals, num_classes]
        box_regression : 预测目标边界框回归信息
        labels : 真实类别信息
        regression_targets : 真实目标边界框信息

    Returns:
        classification_loss (Tensor)
        box_loss (Tensor)
    """

    labels = torch.cat(labels, dim=0)
    regression_targets = torch.cat(regression_targets, dim=0)

    # 计算类别损失信息
    classification_loss = F.cross_entropy(class_logits, labels)

    # get indices that correspond to the regression targets for
    # the corresponding ground truth labels, to be used with
    # advanced indexing
    # 返回标签类别大于0的索引
    # sampled_pos_inds_subset = torch.nonzero(torch.gt(labels, 0)).squeeze(1)
    sampled_pos_inds_subset = torch.where(torch.gt(labels, 0))[0]

    # 返回标签类别大于0位置的类别信息
    labels_pos = labels[sampled_pos_inds_subset]

    # shape=[num_proposal, num_classes]
    N, num_classes = class_logits.shape
    box_regression = box_regression.reshape(N, -1, 4)

    # 计算边界框损失信息
    box_loss = det_utils.smooth_l1_loss(
        # 获取指定索引proposal的指定类别box信息
        box_regression[sampled_pos_inds_subset, labels_pos],
        regression_targets[sampled_pos_inds_subset],
        beta=1 / 9,
        size_average=False,
    ) / labels.numel()

    return classification_loss, box_loss


class RoIHeads(torch.nn.Module):
    __annotations__ = {
        'box_coder': det_utils.BoxCoder,
        'proposal_matcher': det_utils.Matcher,
        'fg_bg_sampler': det_utils.BalancedPositiveNegativeSampler,
    }

    def __init__(self,
                 box_roi_pool,   # Multi-scale RoIAlign pooling
                 box_head,       # TwoMLPHead
                 box_predictor,  # FastRCNNPredictor
                 # Faster R-CNN training
                 fg_iou_thresh, bg_iou_thresh,  # default: 0.5, 0.5
                 batch_size_per_image, positive_fraction,  # default: 512, 0.25
                 bbox_reg_weights,  # None
                 # Faster R-CNN inference
                 score_thresh,        # default: 0.05
                 nms_thresh,          # default: 0.5
                 detection_per_img):  # default: 100
        super(RoIHeads, self).__init__()

        """
            .box_iou是一个方法,用来计算IoU值
        """
        self.box_similarity = box_ops.box_iou  # 将计算IoU的方法赋值给self.box_similarity
        # assign ground-truth boxes for each proposal

        """
            Matcher在RPN中也使用到了,作用是将proposal划分到正负样本当中
        """
        self.proposal_matcher = det_utils.Matcher(  # 将Matcher类赋值给self.proposal_matcher
            fg_iou_thresh,  # default: 0.5
            bg_iou_thresh,  # default: 0.5
            allow_low_quality_matches=False)

        """
            BalancedPositiveNegativeSampler在RPN时也用到了,作用是将划分好的正负样本进行采样
                参数:
                    batch_size_per_image: 总共采样512个样本
                    positive_fraction:正样本占25%
        """
        self.fg_bg_sampler = det_utils.BalancedPositiveNegativeSampler(
            batch_size_per_image,  # default: 512
            positive_fraction)     # default: 0.25

        if bbox_reg_weights is None:
            bbox_reg_weights = (10., 10., 5., 5.)  # 超参数
        self.box_coder = det_utils.BoxCoder(bbox_reg_weights)

        self.box_roi_pool = box_roi_pool    # Multi-scale RoIAlign pooling
        self.box_head = box_head            # TwoMLPHead
        self.box_predictor = box_predictor  # FastRCNNPredictor

        self.score_thresh = score_thresh  # default: 0.05
        self.nms_thresh = nms_thresh      # default: 0.5
        self.detection_per_img = detection_per_img  # default: 100

    def assign_targets_to_proposals(self, proposals, gt_boxes, gt_labels):
        # type: (List[Tensor], List[Tensor], List[Tensor]) -> Tuple[List[Tensor], List[Tensor]]
        """
        为每个proposal匹配对应的gt_box,并划分到正负样本中
        Args:
            proposals:
            gt_boxes:
            gt_labels:

        Returns:

        """
        matched_idxs = []
        labels = []
        # 遍历每张图像的proposals, gt_boxes, gt_labels信息
        for proposals_in_image, gt_boxes_in_image, gt_labels_in_image in zip(proposals, gt_boxes, gt_labels):
            if gt_boxes_in_image.numel() == 0:  # 该张图像中没有gt框,为背景
                # background image
                device = proposals_in_image.device
                clamped_matched_idxs_in_image = torch.zeros(
                    (proposals_in_image.shape[0],), dtype=torch.int64, device=device
                )
                labels_in_image = torch.zeros(
                    (proposals_in_image.shape[0],), dtype=torch.int64, device=device
                )
            else:
                #  set to self.box_similarity when https://github.com/pytorch/pytorch/issues/27495 lands
                # 计算proposal与每个gt_box的iou重合度
                match_quality_matrix = box_ops.box_iou(gt_boxes_in_image, proposals_in_image)

                # 计算proposal与每个gt_box匹配的iou最大值,并记录索引,
                # iou < low_threshold索引值为 -1, low_threshold <= iou < high_threshold索引值为 -2
                matched_idxs_in_image = self.proposal_matcher(match_quality_matrix)

                # 限制最小值,防止匹配标签时出现越界的情况
                # 注意-1, -2对应的gt索引会调整到0,获取的标签类别为第0个gt的类别(实际上并不是),后续会进一步处理
                clamped_matched_idxs_in_image = matched_idxs_in_image.clamp(min=0)
                # 获取proposal匹配到的gt对应标签
                labels_in_image = gt_labels_in_image[clamped_matched_idxs_in_image]
                labels_in_image = labels_in_image.to(dtype=torch.int64)

                # label background (below the low threshold)
                # 将gt索引为-1的类别设置为0,即背景,负样本
                bg_inds = matched_idxs_in_image == self.proposal_matcher.BELOW_LOW_THRESHOLD  # -1
                labels_in_image[bg_inds] = 0

                # label ignore proposals (between low and high threshold)
                # 将gt索引为-2的类别设置为-1, 即废弃样本
                ignore_inds = matched_idxs_in_image == self.proposal_matcher.BETWEEN_THRESHOLDS  # -2
                labels_in_image[ignore_inds] = -1  # -1 is ignored by sampler

            matched_idxs.append(clamped_matched_idxs_in_image)
            labels.append(labels_in_image)
        return matched_idxs, labels

    def subsample(self, labels):
        # type: (List[Tensor]) -> List[Tensor]
        # BalancedPositiveNegativeSampler
        sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
        sampled_inds = []
        # 遍历每张图片的正负样本索引
        for img_idx, (pos_inds_img, neg_inds_img) in enumerate(zip(sampled_pos_inds, sampled_neg_inds)):
            # 记录所有采集样本索引(包括正样本和负样本)
            # img_sampled_inds = torch.nonzero(pos_inds_img | neg_inds_img).squeeze(1)
            img_sampled_inds = torch.where(pos_inds_img | neg_inds_img)[0]
            sampled_inds.append(img_sampled_inds)
        return sampled_inds

    def add_gt_proposals(self, proposals, gt_boxes):
        # type: (List[Tensor], List[Tensor]) -> List[Tensor]
        """
        将gt_boxes拼接到proposal后面
        Args:
            proposals: 一个batch中每张图像rpn预测的boxes
            gt_boxes:  一个batch中每张图像对应的真实目标边界框

        Returns:

        """
        proposals = [
            torch.cat((proposal, gt_box))
            for proposal, gt_box in zip(proposals, gt_boxes)
        ]
        return proposals

    def check_targets(self, targets):
        # type: (Optional[List[Dict[str, Tensor]]]) -> None
        assert targets is not None
        assert all(["boxes" in t for t in targets])
        assert all(["labels" in t for t in targets])

    def select_training_samples(self,
                                proposals,  # type: List[Tensor]
                                targets     # type: Optional[List[Dict[str, Tensor]]]
                                ):
        # type: (...) -> Tuple[List[Tensor], List[Tensor], List[Tensor]]
        """
        划分正负样本,统计对应gt的标签以及边界框回归信息
        list元素个数为batch_size
        Args:
            proposals: rpn预测的boxes
            targets:

        Returns:

        """

        # 检查target数据是否为空
        self.check_targets(targets)
        # 如果不加这句assert,jit.script会编译不通过(assert把targets的Optional类型细化为非None,供TorchScript做类型检查)
        assert targets is not None

        dtype = proposals[0].dtype
        device = proposals[0].device

        # 获取标注好的boxes以及labels信息
        gt_boxes = [t["boxes"].to(dtype) for t in targets]
        gt_labels = [t["labels"] for t in targets]

        # append ground-truth bboxes to proposal
        # 将gt_boxes拼接到proposal后面
        proposals = self.add_gt_proposals(proposals, gt_boxes)

        # get matching gt indices for each proposal
        # 为每个proposal匹配对应的gt_box,并划分到正负样本中
        matched_idxs, labels = self.assign_targets_to_proposals(proposals, gt_boxes, gt_labels)
        # sample a fixed proportion of positive-negative proposals
        # 按给定数量和比例采样正负样本
        sampled_inds = self.subsample(labels)
        matched_gt_boxes = []
        num_images = len(proposals)

        # 遍历每张图像
        for img_id in range(num_images):
            # 获取每张图像的正负样本索引
            img_sampled_inds = sampled_inds[img_id]
            # 获取对应正负样本的proposals信息
            proposals[img_id] = proposals[img_id][img_sampled_inds]
            # 获取对应正负样本的真实类别信息
            labels[img_id] = labels[img_id][img_sampled_inds]
            # 获取对应正负样本的gt索引信息
            matched_idxs[img_id] = matched_idxs[img_id][img_sampled_inds]

            gt_boxes_in_image = gt_boxes[img_id]
            if gt_boxes_in_image.numel() == 0:
                gt_boxes_in_image = torch.zeros((1, 4), dtype=dtype, device=device)
            # 获取对应正负样本的gt box信息
            matched_gt_boxes.append(gt_boxes_in_image[matched_idxs[img_id]])

        # 根据gt和proposal计算边框回归参数(针对gt的)
        regression_targets = self.box_coder.encode(matched_gt_boxes, proposals)
        return proposals, labels, regression_targets

    def postprocess_detections(self,
                               class_logits,    # type: Tensor
                               box_regression,  # type: Tensor
                               proposals,       # type: List[Tensor]
                               image_shapes     # type: List[Tuple[int, int]]
                               ):
        # type: (...) -> Tuple[List[Tensor], List[Tensor], List[Tensor]]
        """
        对网络的预测数据进行后处理,包括
        (1)根据proposal以及预测的回归参数计算出最终bbox坐标
        (2)对预测类别结果进行softmax处理
        (3)裁剪预测的boxes信息,将越界的坐标调整到图片边界上
        (4)移除所有背景信息
        (5)移除低概率目标
        (6)移除小尺寸目标
        (7)执行nms处理,并按scores进行排序
        (8)根据scores排序返回前topk个目标
        Args:
            class_logits: 网络预测类别概率信息
            box_regression: 网络预测的边界框回归参数
            proposals: rpn输出的proposal
            image_shapes: 打包成batch前每张图像的宽高

        Returns:

        """
        device = class_logits.device
        # 预测目标类别数
        num_classes = class_logits.shape[-1]

        # 获取每张图像的预测bbox数量
        boxes_per_image = [boxes_in_image.shape[0] for boxes_in_image in proposals]
        # 根据proposal以及预测的回归参数计算出最终bbox坐标
        pred_boxes = self.box_coder.decode(box_regression, proposals)

        # 对预测类别结果进行softmax处理
        pred_scores = F.softmax(class_logits, -1)

        # split boxes and scores per image
        # 根据每张图像的预测bbox数量分割结果
        pred_boxes_list = pred_boxes.split(boxes_per_image, 0)
        pred_scores_list = pred_scores.split(boxes_per_image, 0)

        all_boxes = []
        all_scores = []
        all_labels = []
        # 遍历每张图像预测信息
        for boxes, scores, image_shape in zip(pred_boxes_list, pred_scores_list, image_shapes):
            # 裁剪预测的boxes信息,将越界的坐标调整到图片边界上
            boxes = box_ops.clip_boxes_to_image(boxes, image_shape)

            # create labels for each prediction
            labels = torch.arange(num_classes, device=device)
            labels = labels.view(1, -1).expand_as(scores)

            # remove prediction with the background label
            # 移除索引为0的所有信息(0代表背景)
            boxes = boxes[:, 1:]
            scores = scores[:, 1:]
            labels = labels[:, 1:]

            # batch everything, by making every class prediction be a separate instance
            boxes = boxes.reshape(-1, 4)
            scores = scores.reshape(-1)
            labels = labels.reshape(-1)

            # remove low scoring boxes
            # 移除低概率目标,self.scores_thresh=0.05
            # gt: Computes input > other element-wise.
            # inds = torch.nonzero(torch.gt(scores, self.score_thresh)).squeeze(1)
            inds = torch.where(torch.gt(scores, self.score_thresh))[0]
            boxes, scores, labels = boxes[inds], scores[inds], labels[inds]

            # remove empty boxes
            # 移除小目标
            keep = box_ops.remove_small_boxes(boxes, min_size=1.)
            boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

            # non-maximum suppression, independently done per class
            # 执行nms处理,执行后的结果会按照scores从大到小进行排序返回
            keep = box_ops.batched_nms(boxes, scores, labels, self.nms_thresh)

            # keep only topk scoring predictions
            # 获取scores排在前topk个预测目标
            keep = keep[:self.detection_per_img]
            boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

            all_boxes.append(boxes)
            all_scores.append(scores)
            all_labels.append(labels)

        return all_boxes, all_scores, all_labels

    def forward(self,
                features,       # type: Dict[str, Tensor]
                proposals,      # type: List[Tensor]
                image_shapes,   # type: List[Tuple[int, int]]
                targets=None    # type: Optional[List[Dict[str, Tensor]]]
                ):
        # type: (...) -> Tuple[List[Dict[str, Tensor]], Dict[str, Tensor]]
        """
        Arguments:
            features (List[Tensor]): 输入图片经过backbone生成的特征图
            proposals (List[Tensor[N, 4]]): RPN生成的proposals
            image_shapes (List[Tuple[H, W]]): 图像预处理之后的大小
            targets (List[Dict]): GT的annotation信息
        """

        # 检查targets的数据类型是否正确
        if targets is not None:
            for t in targets:
                floating_point_types = (torch.float, torch.double, torch.half)
                assert t["boxes"].dtype in floating_point_types, "target boxes must be of float type"
                assert t["labels"].dtype == torch.int64, "target labels must be of int64 type"

        """
            如果是训练模式,则需选取使用的样本。这是因为RPN在训练模式下会保留2000个proposal,但在训练时只需从中采样512个即可;
            如果是验证模式,则RPN仅会保留1000个proposal
        """
        if self.training:
            # 划分正负样本,统计对应gt的标签以及边界框回归信息
            proposals, labels, regression_targets = self.select_training_samples(proposals, targets)
        else:  # eval模式下没有GT
            labels = None
            regression_targets = None

        # 将采集样本通过Multi-scale RoIAlign pooling层
        # box_features_shape: [num_proposals, channel, height, width]
        """
            这里的box_roi_pool就是前面的Multi-scale RoIAlign(对应框架图中的ROI Pooling),经过它之后每个proposal对应的特征图尺寸变为(7, 7)
            
            参数:
                1. features:输入预测层(如果是MobileNet v2那么仅有一个,如果是ResNet50+FPN则有4个) 
                2. proposals:每张图保留的512个proposal
                3. image_shapes:每张图缩放之后得到的尺寸
                    这里图片缩放的并不是简单的Resize,而是将图片与想要尺寸在左上角进行对齐,不足的地方用0填充
                    这里图片的缩放后的尺寸是我们在dataloader里定义的尺寸,程序里是(224, 224)
                (7,7)是输入预测特征层的proposal经过ROI Pooling后的尺寸
            返回值:
                box_features: 经过ROI Pooling返回的每个proposal对应的feature maps,shape为torch.Size([1024, 256, 7, 7])
                1024是batch内全部proposal的累加(这里batch size为2,每张图采样512个,512 + 512 = 1024)
            
        """
        box_features = self.box_roi_pool(features, proposals, image_shapes)

        # 通过roi_pooling后的两层全连接层
        # box_features_shape: [num_proposals, representation_size]
        """
            box_head: 对应着图片上的Two MLP Head
            返回值:
                经过两个FC返回的二维tensor,shape为[1024, 1024]:第一维是batch内全部proposal的个数,第二维是FC的输出维度(representation_size)
        """
        box_features = self.box_head(box_features)

        # 接着分别预测目标类别和边界框回归参数(并行结构)
        """
            再将Two MLP Head的值分别经过两个并行的FC得到:
                1. 类别分数     torch.Size([1024, 21])
                    21 = 20(NC) + 1(负样本)
                2. 预测回归参数      torch.Size([1024, 84])
                    84 = 21 * 4
        """
        class_logits, box_regression = self.box_predictor(box_features)

        result = torch.jit.annotate(List[Dict[str, torch.Tensor]], [])  # 定义一个空的list列表
        losses = {}  # 定义一个空的字典
        """
            如果是训练模式,则会计算Fast R-CNN的损失,存入到losses这个dict中
            如果是eval模式,则不需要计算损失,直接对结果进行后处理即可
                将低概率的目标剔除、NMS处理等等
            Note:
                训练模式下不需要对预测框做后处理:只需要计算loss并据此优化网络即可;
                查看预测框的效果是eval模式下才做的事
        """
        if self.training:
            assert labels is not None and regression_targets is not None
            loss_classifier, loss_box_reg = fastrcnn_loss(
                class_logits, box_regression, labels, regression_targets)
            losses = {
                "loss_classifier": loss_classifier,
                "loss_box_reg": loss_box_reg
            }
        else:
            """
                boxes: 最终预测的目标边界框
                scores:每个预测目标的类别置信度分数
                labels: 对应标签
            """
            boxes, scores, labels = self.postprocess_detections(class_logits, box_regression, proposals, image_shapes)
            num_images = len(boxes)
            for i in range(num_images):
                result.append(
                    {
                        "boxes": boxes[i],
                        "labels": labels[i],
                        "scores": scores[i],
                    }
                )

        return result, losses
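
postprocess_detections 中用到的 remove_small_boxes 和 batched_nms 都来自 torchvision.ops。下面是一个独立的小例子(仅作演示,与仓库代码无关),重点是 batched_nms 会根据 labels 对不同类别分别做NMS:

import torch
from torchvision.ops import batched_nms, remove_small_boxes

boxes = torch.tensor([[0., 0., 100., 100.],
                      [2., 2., 98., 98.],      # 与第1个框高度重叠,且类别相同
                      [0., 0., 100., 100.],    # 与第1个框完全重叠,但类别不同
                      [0., 0., 0.5, 0.5]])     # 宽高过小的框
scores = torch.tensor([0.9, 0.8, 0.7, 0.95])
labels = torch.tensor([1, 1, 2, 1])

# 过滤掉宽或高小于1的框 -> tensor([0, 1, 2])
keep = remove_small_boxes(boxes, min_size=1.)
boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

# 按类别独立做NMS:类别1中与第1个框重叠的框被抑制,类别2的框保留
keep = batched_nms(boxes, scores, labels, iou_threshold=0.5)
print(keep)  # 按score从大到小排序的保留索引,即 tensor([0, 2])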

4.5.3 Faster R-CNN Predictor

这部分就是将Two MLP Head的输出结果送入这个预测头,得到:

  • 训练模式
    • 类别损失和预测框回归损失
  • eval模式
    • 类别分数和预测框

代码如下:

class FastRCNNPredictor(nn.Module):
    """
    Standard classification + bounding box regression layers
    for Fast R-CNN.

    Arguments:
        in_channels (int): number of input channels
        num_classes (int): number of output classes (including background)
    """

    def __init__(self, in_channels, num_classes):
        super(FastRCNNPredictor, self).__init__()
        self.cls_score = nn.Linear(in_channels, num_classes)  # 1024 -> 21(VOC)
        self.bbox_pred = nn.Linear(in_channels, num_classes * 4)  # 1024 -> 21*4=84(VOC)

    def forward(self, x):  # x.shape: torch.Size([1024, 1024])
        """
            >>> x = torch.randint(1, (3, 112, 112))
            >>> x.shape
            torch.Size([3, 112, 112])
            >>> x.dim()
            3
        """
        if x.dim() == 4:
            assert list(x.shape[2:]) == [1, 1]
        x = x.flatten(start_dim=1)  # 这里的flatten其实没有什么必要
        scores = self.cls_score(x)  # 预测目标概率分数  torch.Size([1024, 21])
        bbox_deltas = self.bbox_pred(x)  # 预测目标回归参数 torch.Size([1024, 84])

        return scores, bbox_deltas
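
一个小的shape验证(假设上面定义的 FastRCNNPredictor 类已在当前作用域中):它既接受 [N, 1024] 的二维输入,也接受 [N, 1024, 1, 1] 的四维输入(后者会先被flatten成二维):

import torch

predictor = FastRCNNPredictor(in_channels=1024, num_classes=21)

x2d = torch.randn(1024, 1024)
x4d = torch.randn(1024, 1024, 1, 1)
for x in (x2d, x4d):
    scores, bbox_deltas = predictor(x)
    print(scores.shape, bbox_deltas.shape)  # torch.Size([1024, 21]) torch.Size([1024, 84])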

4.6 Fast R-CNN正负样本划分及采样

在训练模式中,并不是使用RPN网络生成的所有proposal,而是从中选取一部分proposal用于Fast R-CNN的损失计算。如图黄色的部分:
[学习笔记] Faster R-CNN源码解析_第7张图片
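
按照默认参数,每张图会从RPN给出的proposal中采样 box_batch_size_per_image=512 个用于计算损失,其中正样本比例不超过 box_positive_fraction=0.25。下面是采样逻辑的一个最小示意(假设的简化实现,思路与torchvision的BalancedPositiveNegativeSampler一致,实际以源码中的subsample为准):

import torch

def sample_proposals(labels, batch_size_per_image=512, positive_fraction=0.25):
    positive = torch.where(labels >= 1)[0]  # 正样本索引(label > 0)
    negative = torch.where(labels == 0)[0]  # 负样本(背景)索引

    num_pos = int(batch_size_per_image * positive_fraction)
    num_pos = min(positive.numel(), num_pos)                         # 正样本不足时全取
    num_neg = min(negative.numel(), batch_size_per_image - num_pos)  # 其余名额给负样本

    # 随机打乱后各取前 num_pos / num_neg 个
    pos_idx = positive[torch.randperm(positive.numel())[:num_pos]]
    neg_idx = negative[torch.randperm(negative.numel())[:num_neg]]
    return torch.cat([pos_idx, neg_idx])

# 模拟:一张图上2003个proposal,约10%匹配到正样本,其余为背景(0)
labels = (torch.rand(2003) < 0.1).long() * torch.randint(1, 21, (2003,))
inds = sample_proposals(labels)
print(inds.shape)  # 通常为 torch.Size([512])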

4.6.1 select_training_samples

        if self.training:
            # 划分正负样本,统计对应gt的标签以及边界框回归信息
            """
                proposals:RPN生成的proposal进行了正负样本的划分
                labels:对应的标签
                regression_targets:对应GT box的回归参数
            """
            proposals, labels, regression_targets = self.select_training_samples(proposals, targets)
        else:  # eval模式下没有GT
            labels = None
            regression_targets = None
    def select_training_samples(self,
                                proposals,  # type: List[Tensor]
                                targets     # type: Optional[List[Dict[str, Tensor]]]
                                ):
        # type: (...) -> Tuple[List[Tensor], List[Tensor], List[Tensor]]
        """
        划分正负样本,统计对应gt的标签以及边界框回归信息
        list元素个数为batch_size
        Args:
            proposals: rpn预测的boxes
            targets: GT信息

        Returns:

        """

        # 检查target数据是否为空
        self.check_targets(targets)
        # 如果不加这句,jit.script会不通过:assert用于帮助torchscript把Optional[...]的类型收窄为非None
        assert targets is not None

        dtype = proposals[0].dtype
        device = proposals[0].device

        # 获取标注好的boxes以及labels信息
        """
            gt_boxes = [t["boxes"].to(dtype) for t in targets]
            gt_labels = [t["labels"] for t in targets]
                分别提取targets中的"boxes"和"labels"信息,并分别用[]包裹
        """
        gt_boxes = [t["boxes"].to(dtype) for t in targets]
        gt_labels = [t["labels"] for t in targets]

        # append ground-truth bboxes to proposal
        # 将gt_boxes拼接到proposal后面
        proposals = self.add_gt_proposals(proposals, gt_boxes)

        # get matching gt indices for each proposal
        # 为每个proposal匹配对应的gt_box,并划分到正负样本中
        """
            通过assign_targets_to_proposals这个方法为刚刚添加gt_boxes的proposals划分到正负样本中
                参数:
                    1. proposals:self.add_gt_proposals(proposals, gt_boxes)后的proposals
                    2. gt_boxes: GT的boxes
                    3. gt_labels: GT的labels
                返回值:
                    matched_idxs:每个proposal所匹配到的gt boxes的索引
                    labels:每个proposal所匹配到的gt boxes的标签
        """
        matched_idxs, labels = self.assign_targets_to_proposals(proposals, gt_boxes, gt_labels)
        # sample a fixed proportion of positive-negative proposals
        # 按给定数量和比例采样正负样本
        """
            在训练时并不是将RPN生成的proposal都用来计算损失,还需经过采样(一般为512)
        """
        sampled_inds = self.subsample(labels)
        matched_gt_boxes = []
        num_images = len(proposals)

        # 遍历每张图像
        for img_id in range(num_images):
            # 获取每张图像的正负样本索引
            img_sampled_inds = sampled_inds[img_id]
            # 获取对应正负样本的proposals信息
            proposals[img_id] = proposals[img_id][img_sampled_inds]  # [2003, 4] -> [512, 4]
            # 获取对应正负样本的真实类别信息
            labels[img_id] = labels[img_id][img_sampled_inds]  # [2003,] -> [512,]
            # 获取对应正负样本的gt索引信息
            matched_idxs[img_id] = matched_idxs[img_id][img_sampled_inds]

            # 提取gt boxes的信息; img_id为图片的id
            gt_boxes_in_image = gt_boxes[img_id]  # torch.Size([3, 4]), 其中3表示该图片有3个gt boxes,4为对应的坐标
            if gt_boxes_in_image.numel() == 0:
                gt_boxes_in_image = torch.zeros((1, 4), dtype=dtype, device=device)
            # 获取对应正负样本的gt box信息
            """
                matched_gt_boxes: torch.Size([512, 4])
                    其中:
                        512为采样的proposal对应的gt box
                        4为对应gt box的坐标
                    Note:
                        是proposal对应的gt box而不是proposal
            """
            matched_gt_boxes.append(gt_boxes_in_image[matched_idxs[img_id]])

        # 根据gt和proposal计算边框回归参数
        # Note: 这里的回归参数是proposal相对于其匹配到的gt box而言的
        """
            1. matched_gt_boxes: proposal匹配到的gt box
            2. proposals: 采样后的proposal, shape: [512, 4]
            
            regression_targets: 根据gt box和proposal得到的关于x,y,h,w的回归参数
                用list保存的,每一张图片为一个元素
        """
        regression_targets = self.box_coder.encode(matched_gt_boxes, proposals)
        """
            Return
                proposals: 添加了gt的proposals
                labels: 对应的标签
                regression_targets: 根据gt box和proposal得到的关于x,y,h,w的回归参数
        """
        return proposals, labels, regression_targets
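
box_coder.encode 使用的是标准的R-CNN边框编码方式。下面给出一个最小示意(这里权重取(1, 1, 1, 1),实际权重以源码中BoxCoder的bbox_reg_weights为准):

import torch

def encode_boxes(gt, proposals, weights=(1.0, 1.0, 1.0, 1.0)):
    """gt与proposals均为(x1, y1, x2, y2)形式,shape: [N, 4]"""
    wx, wy, ww, wh = weights
    pw = proposals[:, 2] - proposals[:, 0]
    ph = proposals[:, 3] - proposals[:, 1]
    px = proposals[:, 0] + 0.5 * pw
    py = proposals[:, 1] + 0.5 * ph

    gw = gt[:, 2] - gt[:, 0]
    gh = gt[:, 3] - gt[:, 1]
    gx = gt[:, 0] + 0.5 * gw
    gy = gt[:, 1] + 0.5 * gh

    tx = wx * (gx - px) / pw
    ty = wy * (gy - py) / ph
    tw = ww * torch.log(gw / pw)
    th = wh * torch.log(gh / ph)
    return torch.stack([tx, ty, tw, th], dim=1)  # shape: [N, 4]

proposals = torch.tensor([[10., 10., 50., 50.]])
gt = torch.tensor([[12., 8., 52., 46.]])
print(encode_boxes(gt, proposals))  # 每个proposal相对其匹配gt的(tx, ty, tw, th)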

4.7 Fast R-CNN的损失计算
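
fastrcnn_loss 的定义并没有出现在下面贴出的框架代码中(它在roi_head模块里)。这里先给出一个与torchvision实现思路一致的最小示意:类别损失对全部采样样本计算交叉熵;回归损失只对正样本、且只取其真实类别对应的那4个回归参数计算smooth L1(beta等细节以实际源码为准):

import torch
import torch.nn.functional as F

def fastrcnn_loss_sketch(class_logits, box_regression, labels, regression_targets):
    # labels / regression_targets 是按图片划分的list,先拼接成一个batch
    labels = torch.cat(labels, dim=0)                          # [1024]
    regression_targets = torch.cat(regression_targets, dim=0)  # [1024, 4]

    # 类别损失:对全部采样样本(含背景)计算交叉熵
    classification_loss = F.cross_entropy(class_logits, labels)

    # 回归损失:只取正样本,且只取其真实类别对应的4个回归参数
    pos_inds = torch.where(labels > 0)[0]
    labels_pos = labels[pos_inds]
    N, num_classes = class_logits.shape
    box_regression = box_regression.reshape(N, -1, 4)          # [1024, 21, 4]
    box_loss = F.smooth_l1_loss(
        box_regression[pos_inds, labels_pos],
        regression_targets[pos_inds],
        reduction="sum",
    ) / labels.numel()

    return classification_loss, box_loss

下面是完整的faster_rcnn_framework.py(包括FasterRCNNBase、TwoMLPHead、FastRCNNPredictor以及FasterRCNN的定义):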

import warnings
from collections import OrderedDict
from typing import Tuple, List, Dict, Optional, Union

import torch
from torch import nn, Tensor
import torch.nn.functional as F
from torchvision.ops import MultiScaleRoIAlign

from .roi_head import RoIHeads
from .transform import GeneralizedRCNNTransform
from .rpn_function import AnchorsGenerator, RPNHead, RegionProposalNetwork


class FasterRCNNBase(nn.Module):
    """
    Main class for Generalized R-CNN.

    Arguments:
        backbone (nn.Module): 特征提取网络部分
        rpn (nn.Module): 候选框生成部分
        roi_heads (nn.Module): takes the features + the proposals from the RPN and computes
            detections / masks from it.
        transform (nn.Module): performs the data transformation from the inputs to feed into
            the model
    """

    def __init__(self, backbone, rpn, roi_heads, transform):
        super(FasterRCNNBase, self).__init__()
        self.transform = transform
        self.backbone = backbone
        self.rpn = rpn
        self.roi_heads = roi_heads
        # used only on torchscript mode
        self._has_warned = False

    @torch.jit.unused
    def eager_outputs(self, losses, detections):
        # type: (Dict[str, Tensor], List[Dict[str, Tensor]]) -> Union[Dict[str, Tensor], List[Dict[str, Tensor]]]
        if self.training:
            return losses

        return detections

    def forward(self, images, targets=None):
        # type: (List[Tensor], Optional[List[Dict[str, Tensor]]]) -> Tuple[Dict[str, Tensor], List[Dict[str, Tensor]]]
        """
        Arguments:
            images (list[Tensor]): images to be processed
            targets (list[Dict[Tensor]]): ground-truth boxes present in the image (optional)

        Returns:
            result (list[BoxList] or dict[Tensor]): the output from the model.
                During training, it returns a dict[Tensor] which contains the losses.
                During testing, it returns list[BoxList] contains additional fields
                like `scores`, `labels` and `mask` (for Mask R-CNN models).

        """
        if self.training and targets is None:
            raise ValueError("In training mode, targets should be passed")

        if self.training:
            assert targets is not None
            for target in targets:         # 进一步判断传入的target的boxes参数是否符合规定
                boxes = target["boxes"]
                if isinstance(boxes, torch.Tensor):
                    if len(boxes.shape) != 2 or boxes.shape[-1] != 4:
                        raise ValueError("Expected target boxes to be a tensor"
                                         "of shape [N, 4], got {:}.".format(
                                          boxes.shape))
                else:
                    raise ValueError("Expected target boxes to be of type "
                                     "Tensor, got {:}.".format(type(boxes)))

        # original_image_sizes:存储每张图片原始的尺寸,为了之后可以映射到原图中
        original_image_sizes = torch.jit.annotate(List[Tuple[int, int]], [])
        for img in images:
            val = img.shape[-2:]  # [H, W]
            assert len(val) == 2  # 防止输入的是个一维向量
            original_image_sizes.append((val[0], val[1]))
        # original_image_sizes = [img.shape[-2:] for img in images]

        images, targets = self.transform(images, targets)  # 对图像进行预处理
        # transform会将图片缩放并打包成一个batch(ImageList),(images, targets)经过它处理后才送入网络

        # print(images.tensors.shape)
        features = self.backbone(images.tensors)  # 将图像输入backbone得到特征图
        if isinstance(features, torch.Tensor):  # 若只在一层特征层上预测,将feature放入有序字典中,并编号为‘0’
            features = OrderedDict([('0', features)])  # 若在多层特征层上预测,传入的就是一个有序字典

        # 将特征层以及标注target信息传入rpn中
        """
            Input:
                1. images:输入图片(shape -> torch.Size([2, 3, 800, 1088]))
                2. features: 预测特征层(对于MobileNet v2只有一个;对于ResNet50+FPN有5个,即4个FPN层加一个pool层 -> "0", "1", "2", "3", "pool")
                    "0": torch.Size([2, 256, 200, 272])
                    "1": torch.Size([2, 256, 100, 136])
                    "2": torch.Size([2, 256, 50, 68])
                    "3": torch.Size([2, 256, 25, 34])
                    "pool": torch.Size([2, 256, 13, 17])
                3. targets:annotation的一些信息
                    这里batch size设置为2,故该list由两个元素组成,每个元素均有下面的字段
                    "boxes": torch.Size([1, 4]): 坐标信息
                    "labels": 4
                    "image_id": 4
                    "area": 4
                    "iscrowd": 4
            Output: 
                1. proposal:预测框,是一个list,最外层元素数量等于batch_size。
                    每一个元素的shape为:[2000, 4] -> RPN生成2000个预测框(proposal)和其4个坐标
                2. proposal_losses:预测框与GT的损失,也是分为两个部分
                    1. loss_objectness:类别损失(是一个标量)
                    2. loss_rpn_box_reg:回归损失(是一个标量)
        """
        # proposals: List[Tensor], Tensor_shape: [num_proposals, 4],
        # 每个proposals是绝对坐标,且为(x1, y1, x2, y2)格式
        proposals, proposal_losses = self.rpn(images, features, targets)

        # 将rpn生成的数据以及标注target信息传入fast rcnn后半部分
        """
            detections: 最终的框
            detector_losses: 最终框的损失
        """
        detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)

        # 对网络的预测结果进行后处理(主要将bboxes还原到原图像尺度上)
        detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)

        losses = {}
        losses.update(detector_losses)
        losses.update(proposal_losses)

        if torch.jit.is_scripting():
            if not self._has_warned:
                warnings.warn("RCNN always returns a (Losses, Detections) tuple in scripting")
                self._has_warned = True
            return losses, detections
        else:
            return self.eager_outputs(losses, detections)

        # if self.training:
        #     return losses
        #
        # return detections


class TwoMLPHead(nn.Module):
    """
    Standard heads for FPN-based models

    Arguments:
        in_channels (int): number of input channels
        representation_size (int): size of the intermediate representation
    """

    def __init__(self, in_channels, representation_size):
        super(TwoMLPHead, self).__init__()

        self.fc6 = nn.Linear(in_channels, representation_size)
        self.fc7 = nn.Linear(representation_size, representation_size)  # 输入等于输出,representation_size为固定值1024

    def forward(self, x):
        """
            这里的x就是通过ROI Pooling的输出,即每个proposal对应的特征矩阵
            x.shape: torch.Size([1024, 256, 7, 7])
                1024: 整个batch(这里bs=2)代表的总的proposal的个数(512 + 512)
                256: channel
                7,7: height and width
        """
        x = x.flatten(start_dim=1)  # [BS, C, H, W] -> [BS, C*H*W]  torch.Size([1024, 12544])

        x = F.relu(self.fc6(x))  # 图中FC_1
        x = F.relu(self.fc7(x))  # 图中FC_2

        return x


class FastRCNNPredictor(nn.Module):
    """
    Standard classification + bounding box regression layers
    for Fast R-CNN.

    Arguments:
        in_channels (int): number of input channels
        num_classes (int): number of output classes (including background)
    """

    def __init__(self, in_channels, num_classes):
        super(FastRCNNPredictor, self).__init__()
        self.cls_score = nn.Linear(in_channels, num_classes)  # 1024 -> 21(VOC)
        self.bbox_pred = nn.Linear(in_channels, num_classes * 4)  # 1024 -> 21*4=84(VOC)

    def forward(self, x):  # x.shape: torch.Size([1024, 1024])
        """
            >>> x = torch.randint(1, (3, 112, 112))
            >>> x.shape
            torch.Size([3, 112, 112])
            >>> x.dim()
            3
        """
        if x.dim() == 4:
            assert list(x.shape[2:]) == [1, 1]
        x = x.flatten(start_dim=1)  # 这里的flatten其实没有什么必要
        scores = self.cls_score(x)  # 预测目标概率分数  torch.Size([1024, 21])
        bbox_deltas = self.bbox_pred(x)  # 预测目标回归参数 torch.Size([1024, 84])

        return scores, bbox_deltas


class FasterRCNN(FasterRCNNBase):
    """
    Implements Faster R-CNN.

    The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each
    image, and should be in 0-1 range. Different images can have different sizes.

    The behavior of the model changes depending if it is in training or evaluation mode.

    During training, the model expects both the input tensors, as well as a targets (list of dictionary),
    containing:
        - boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with values
          between 0 and H and 0 and W
        - labels (Int64Tensor[N]): the class label for each ground-truth box

    The model returns a Dict[Tensor] during training, containing the classification and regression
    losses for both the RPN and the R-CNN.

    During inference, the model requires only the input tensors, and returns the post-processed
    predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as
    follows:
        - boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with values between
          0 and H and 0 and W
        - labels (Int64Tensor[N]): the predicted labels for each image
        - scores (Tensor[N]): the scores of each prediction

    Arguments:
        backbone (nn.Module): the network used to compute the features for the model.
            It should contain a out_channels attribute, which indicates the number of output
            channels that each feature map has (and it should be the same for all feature maps).
            The backbone should return a single Tensor or an OrderedDict[Tensor].
        num_classes (int): number of output classes of the model (including the background).
            If box_predictor is specified, num_classes should be None.
        min_size (int): minimum size of the image to be rescaled before feeding it to the backbone
        max_size (int): maximum size of the image to be rescaled before feeding it to the backbone
        image_mean (Tuple[float, float, float]): mean values used for input normalization.
            They are generally the mean values of the dataset on which the backbone has been trained
            on
        image_std (Tuple[float, float, float]): std values used for input normalization.
            They are generally the std values of the dataset on which the backbone has been trained on
        rpn_anchor_generator (AnchorGenerator): module that generates the anchors for a set of feature
            maps.
        rpn_head (nn.Module): module that computes the objectness and regression deltas from the RPN
        rpn_pre_nms_top_n_train (int): number of proposals to keep before applying NMS during training
        rpn_pre_nms_top_n_test (int): number of proposals to keep before applying NMS during testing
        rpn_post_nms_top_n_train (int): number of proposals to keep after applying NMS during training
        rpn_post_nms_top_n_test (int): number of proposals to keep after applying NMS during testing
        rpn_nms_thresh (float): NMS threshold used for postprocessing the RPN proposals
        rpn_fg_iou_thresh (float): minimum IoU between the anchor and the GT box so that they can be
            considered as positive during training of the RPN.
        rpn_bg_iou_thresh (float): maximum IoU between the anchor and the GT box so that they can be
            considered as negative during training of the RPN.
        rpn_batch_size_per_image (int): number of anchors that are sampled during training of the RPN
            for computing the loss
        rpn_positive_fraction (float): proportion of positive anchors in a mini-batch during training
            of the RPN
        rpn_score_thresh (float): during inference, only return proposals with a classification score
            greater than rpn_score_thresh
        box_roi_pool (MultiScaleRoIAlign): the module which crops and resizes the feature maps in
            the locations indicated by the bounding boxes
        box_head (nn.Module): module that takes the cropped feature maps as input
        box_predictor (nn.Module): module that takes the output of box_head and returns the
            classification logits and box regression deltas.
        box_score_thresh (float): during inference, only return proposals with a classification score
            greater than box_score_thresh
        box_nms_thresh (float): NMS threshold for the prediction head. Used during inference
        box_detections_per_img (int): maximum number of detections per image, for all classes.
        box_fg_iou_thresh (float): minimum IoU between the proposals and the GT box so that they can be
            considered as positive during training of the classification head
        box_bg_iou_thresh (float): maximum IoU between the proposals and the GT box so that they can be
            considered as negative during training of the classification head
        box_batch_size_per_image (int): number of proposals that are sampled during training of the
            classification head
        box_positive_fraction (float): proportion of positive proposals in a mini-batch during training
            of the classification head
        bbox_reg_weights (Tuple[float, float, float, float]): weights for the encoding/decoding of the
            bounding boxes

    """

    def __init__(self, backbone, num_classes=None,  # num_classes是需要加上背景的
                 # transform parameter
                 min_size=800, max_size=1333,      # 预处理resize时限制的最小尺寸与最大尺寸
                 image_mean=None, image_std=None,  # 预处理normalize时使用的均值和方差
                 # RPN parameters
                 rpn_anchor_generator=None, rpn_head=None,
                 rpn_pre_nms_top_n_train=2000, rpn_pre_nms_top_n_test=1000,    # rpn中在nms处理前保留的proposal数(根据score)
                 rpn_post_nms_top_n_train=2000, rpn_post_nms_top_n_test=1000,  # rpn中在nms处理后保留的proposal数
                 rpn_nms_thresh=0.7,  # rpn中进行nms处理时使用的iou阈值
                 rpn_fg_iou_thresh=0.7, rpn_bg_iou_thresh=0.3,  # rpn计算损失时,采集正负样本设置的阈值
                 rpn_batch_size_per_image=256, rpn_positive_fraction=0.5,  # rpn计算损失时采样的样本数,以及正样本占总样本的比例
                 rpn_score_thresh=0.0,
                 # Box parameters
                 box_roi_pool=None, box_head=None, box_predictor=None,
                 # 移除低目标概率      fast rcnn中进行nms处理的阈值   对预测结果根据score排序取前100个目标
                 box_score_thresh=0.05, box_nms_thresh=0.5, box_detections_per_img=100,
                 box_fg_iou_thresh=0.5, box_bg_iou_thresh=0.5,   # fast rcnn计算误差时,采集正负样本设置的阈值
                 box_batch_size_per_image=512, box_positive_fraction=0.25,  # fast rcnn计算误差时采样的样本数,以及正样本占所有样本的比例
                 bbox_reg_weights=None):
        if not hasattr(backbone, "out_channels"):
            raise ValueError(
                "backbone should contain an attribute out_channels "
                "specifying the number of output channels (assumed to be the "
                "same for all the levels)"
            )

        assert isinstance(rpn_anchor_generator, (AnchorsGenerator, type(None)))  # 传入None也可以,传入None后面会生成一个
        assert isinstance(box_roi_pool, (MultiScaleRoIAlign, type(None)))

        if num_classes is not None:
            if box_predictor is not None:  # 如果box_predictor不为None,报错
                raise ValueError("num_classes should be None when box_predictor "
                                 "is specified")
        else:
            if box_predictor is None:
                raise ValueError("num_classes should not be None when box_predictor "
                                 "is not specified")

        # 预测特征层的channels
        out_channels = backbone.out_channels

        # 若anchor生成器为空,则自动生成针对resnet50_fpn的anchor生成器
        """
            >>> ((0.5, 1.0, 2.0),) * 5
                ((0.5, 1.0, 2.0), (0.5, 1.0, 2.0), (0.5, 1.0, 2.0), (0.5, 1.0, 2.0), (0.5, 1.0, 2.0))
        """
        if rpn_anchor_generator is None:
            anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
            aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
            rpn_anchor_generator = AnchorsGenerator(
                anchor_sizes, aspect_ratios
            )

        # 生成RPN通过滑动窗口预测网络部分
        if rpn_head is None:
            rpn_head = RPNHead(
                out_channels, rpn_anchor_generator.num_anchors_per_location()[0]
            )

        # 默认rpn_pre_nms_top_n_train = 2000, rpn_pre_nms_top_n_test = 1000,
        # 默认rpn_post_nms_top_n_train = 2000, rpn_post_nms_top_n_test = 1000,
        rpn_pre_nms_top_n = dict(training=rpn_pre_nms_top_n_train, testing=rpn_pre_nms_top_n_test)
        rpn_post_nms_top_n = dict(training=rpn_post_nms_top_n_train, testing=rpn_post_nms_top_n_test)

        # 定义整个RPN框架
        # rpn_batch_size_per_image: RPN计算损失时采样的正负样本总个数
        # rpn_positive_fraction:正样本占所有用于计算损失所有样本的比例
        # rpn_pre_nms_top_n: 在进行NMS处理之前,针对每一个预测特征层所保留的目标个数
        # rpn_post_nms_top_n: 在进行NMS处理之后,针对每一个预测特征层所剩余的目标个数(即RPN输出候选框的数目)
        # rpn_nms_thresh:NMS处理时所指定的阈值
        rpn = RegionProposalNetwork(
            rpn_anchor_generator, rpn_head,
            rpn_fg_iou_thresh, rpn_bg_iou_thresh,
            rpn_batch_size_per_image, rpn_positive_fraction,
            rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_nms_thresh,
            score_thresh=rpn_score_thresh)

        #  Multi-scale RoIAlign pooling
        if box_roi_pool is None:
            box_roi_pool = MultiScaleRoIAlign(  # 这里就是ROI Pooling
                featmap_names=['0', '1', '2', '3'],  # 在哪些特征层进行roi pooling
                output_size=[7, 7],  # 这里给出了ROI Pooling后输出的shape
                sampling_ratio=2)

        # fast RCNN中roi pooling后的展平处理两个全连接层部分
        if box_head is None:
            resolution = box_roi_pool.output_size[0]  # 默认等于7
            representation_size = 1024
            box_head = TwoMLPHead(
                out_channels * resolution ** 2,  # flatten层的输出,即tensor展平后的元素个数,也是FC_1的输入
                representation_size  # FC_1的输出,也是FC_2的输入和输出
            )  # flatten -> in_channels -> representation_size -> representation_size

        # 在box_head的输出上预测部分
        if box_predictor is None:
            representation_size = 1024  # Two MLP Head的输出,为固定值1024
            box_predictor = FastRCNNPredictor(
                representation_size,  # Two MLP Head的输出,为固定值1024
                num_classes)  # num_classes为NC(加了背景的,VOC为20+1=21)

        # 将roi pooling, box_head以及box_predictor结合在一起
        """
            1. box_roi_pool: ROI Pooling
            2. box_head: Two MLP Head
            3. box_predictor: Fast R-CNN Predictor
                function:
                    将Two MLP Head的结果并行通过两个FC
                out: 
                    1. cls_logits:每一个proposal的类别分数  [1024, 21](bs=2)
                    2. box_pred:每一个proposal对应的目标框的回归参数  [1024, 21*4=84](bs=2)
            4. box_fg_iou_thresh(0.5)
            5. box_bg_iou_thresh(0.5)
                4和5:在前面匹配正负样本
                    如果proposal与GT的IoU>阈值 -> 正样本
                    如果proposal与GT的IoU<阈值 -> 负样本
            6. box_batch_size_per_image(512):每张图片会选取box_batch_size_per_image个proposal用于计算fast r-cnn的损失
            7. box_positive_fraction(0.25):指定样本中(即512个样本),正样本所占的比例
                在训练过程中,并不是直接使用RPN生成的2000个proposal,而是从中采样、选取512个proposal
            8. bbox_reg_weights:超参数
            9. box_score_thresh, box_nms_thresh, box_detections_per_img:对最终预测的结果进行post-processing(后处理)的时候使用
               到的一些阈值
        """
        roi_heads = RoIHeads(
            # box
            box_roi_pool, box_head, box_predictor,
            box_fg_iou_thresh, box_bg_iou_thresh,  # 0.5  0.5
            box_batch_size_per_image, box_positive_fraction,  # 512  0.25
            bbox_reg_weights,
            box_score_thresh, box_nms_thresh, box_detections_per_img)  # 0.05  0.5  100

        if image_mean is None:
            image_mean = [0.485, 0.456, 0.406]
        if image_std is None:
            image_std = [0.229, 0.224, 0.225]

        # 对数据进行标准化,缩放,打包成batch等处理部分
        transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)

        super(FasterRCNN, self).__init__(backbone, rpn, roi_heads, transform)
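
下面用一个小例子把上面的TwoMLPHead和FastRCNNPredictor串起来,验证注释中提到的shape变化(假设这两个类已在当前作用域中,输入用随机tensor模拟ROI Pooling的输出):

import torch

# 模拟bs=2、每张图采样512个proposal时ROI Pooling的输出
box_features = torch.randn(1024, 256, 7, 7)

box_head = TwoMLPHead(in_channels=256 * 7 * 7, representation_size=1024)
box_predictor = FastRCNNPredictor(in_channels=1024, num_classes=21)

x = box_head(box_features)                       # torch.Size([1024, 1024])
class_logits, box_regression = box_predictor(x)
print(class_logits.shape)                        # torch.Size([1024, 21])
print(box_regression.shape)                      # torch.Size([1024, 84])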

5. 换backbone

5.1 不带FPN

import os
import datetime

import torch

import transforms
from network_files import FasterRCNN, AnchorsGenerator
from my_dataset import VOCDataSet
from train_utils import GroupedBatchSampler, create_aspect_ratio_groups
from train_utils import train_eval_utils as utils


def create_model(num_classes):
    import torchvision
    from torchvision.models.feature_extraction import create_feature_extractor

    """
        torchvision中的模型一般是针对imagenet进行训练的
        
        不建议使用32倍下采样的特征层:下采样倍数太大时,小目标的检测效果会很差,一般用到下采样16倍的特征层即可
    """
    # vgg16
    backbone = torchvision.models.vgg16_bn(pretrained=True)
    # print(backbone)
    """
        使用torchvision提供的create_feature_extractor提取指定中间层的输出
            原理:通过create_feature_extractor重构backbone,通过重构的backbone就能获取中间层的输出
        
        create_feature_extractor: 
            创建一个新的图模块(graph module),该模块把给定模型中的中间节点以字典形式返回:键是用户指定的字符串,值是对应节点的输出。
            这是通过 FX 重写模型的计算图以返回所需节点作为输出来实现的。 
            删除所有未使用的节点及其相应的参数。
            -------------
            这个方法会根据我们传入的return_nodes参数找到对应输出的节点,再反向寻找该节点利用到之前哪些节点。之后会将没有使用到的节点全部删除。
            return_nodes是一个字典:
                "features.42": 对应节点的输出层
                    Q: 如何获取"features.42"?
                    A: 简单的方法,直接print(backbone),然后再找到我们想要终止层的名称
                "0": 重构的backbone的返回值,这个"0"是重构的backbone的key(不是value)
    """
    backbone = create_feature_extractor(backbone, return_nodes={"features.42": "0"})
    """
        Q: 怎么知道out_channels是512的?
        A: 让tensor走一遍就知道了
            out = backbone(torch.rand(1, 3, 224, 224))
            print(out["0"].shape)
    """
    backbone.out_channels = 512

    # resnet50 backbone
    # backbone = torchvision.models.resnet50(pretrained=True)
    # # print(backbone)
    # backbone = create_feature_extractor(backbone, return_nodes={"layer3": "0"})
    # # out = backbone(torch.rand(1, 3, 224, 224))
    # # print(out["0"].shape)
    # backbone.out_channels = 1024

    # EfficientNetB0
    # backbone = torchvision.models.efficientnet_b0(pretrained=True)
    # # print(backbone)
    # backbone = create_feature_extractor(backbone, return_nodes={"features.5": "0"})
    # # out = backbone(torch.rand(1, 3, 224, 224))
    # # print(out["0"].shape)
    # backbone.out_channels = 112

    anchor_generator = AnchorsGenerator(sizes=((32, 64, 128, 256, 512),),
                                        aspect_ratios=((0.5, 1.0, 2.0),))

    """
        ROI Pooling用的是mask rcnn中的ROIAlign
    """
    roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],  # 在哪些特征层上进行RoIAlign pooling
                                                    output_size=[7, 7],  # RoIAlign pooling输出特征矩阵尺寸
                                                    sampling_ratio=2)  # 采样率

    model = FasterRCNN(backbone=backbone,
                       num_classes=num_classes,
                       rpn_anchor_generator=anchor_generator,
                       box_roi_pool=roi_pooler)

    return model


def main(args):
    device = torch.device(args.device if torch.cuda.is_available() else "cpu")
    print("Using {} device training.".format(device.type))

    # 用来保存coco_info的文件
    results_file = "results{}.txt".format(datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))

    data_transform = {
        "train": transforms.Compose([transforms.ToTensor(),
                                     transforms.RandomHorizontalFlip(0.5)]),
        "val": transforms.Compose([transforms.ToTensor()])
    }

    VOC_root = args.data_path
    # check voc root
    if os.path.exists(os.path.join(VOC_root, "VOCdevkit")) is False:
        raise FileNotFoundError("VOCdevkit dose not in path:'{}'.".format(VOC_root))

    # load train data set
    # VOCdevkit -> VOC2012 -> ImageSets -> Main -> train.txt
    train_dataset = VOCDataSet(VOC_root, "2012", data_transform["train"], "train.txt")
    train_sampler = None

    # 是否按图片相似高宽比采样图片组成batch
    # 使用的话能够减小训练时所需GPU显存,默认使用
    if args.aspect_ratio_group_factor >= 0:
        train_sampler = torch.utils.data.RandomSampler(train_dataset)
        # 统计所有图像高宽比例在bins区间中的位置索引
        group_ids = create_aspect_ratio_groups(train_dataset, k=args.aspect_ratio_group_factor)
        # 每个batch图片从同一高宽比例区间中取
        train_batch_sampler = GroupedBatchSampler(train_sampler, group_ids, args.batch_size)

    # 注意这里的collate_fn是自定义的,因为读取的数据包括image和targets,不能直接使用默认的方法合成batch
    batch_size = args.batch_size
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using %g dataloader workers' % nw)
    if train_sampler:
        # 如果按照图片高宽比采样图片,dataloader中需要使用batch_sampler
        train_data_loader = torch.utils.data.DataLoader(train_dataset,
                                                        batch_sampler=train_batch_sampler,
                                                        pin_memory=True,
                                                        num_workers=nw,
                                                        collate_fn=train_dataset.collate_fn)
    else:
        train_data_loader = torch.utils.data.DataLoader(train_dataset,
                                                        batch_size=batch_size,
                                                        shuffle=True,
                                                        pin_memory=True,
                                                        num_workers=nw,
                                                        collate_fn=train_dataset.collate_fn)

    # load validation data set
    # VOCdevkit -> VOC2012 -> ImageSets -> Main -> val.txt
    val_dataset = VOCDataSet(VOC_root, "2012", data_transform["val"], "val.txt")
    val_data_set_loader = torch.utils.data.DataLoader(val_dataset,
                                                      batch_size=1,
                                                      shuffle=False,
                                                      pin_memory=True,
                                                      num_workers=nw,
                                                      collate_fn=val_dataset.collate_fn)

    # create model num_classes equal background + 20 classes
    model = create_model(num_classes=args.num_classes + 1)
    # print(model)

    model.to(device)

    # define optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params,
                                lr=args.lr,
                                momentum=args.momentum,
                                weight_decay=args.weight_decay)

    scaler = torch.cuda.amp.GradScaler() if args.amp else None

    # learning rate scheduler
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=3,
                                                   gamma=0.33)

    # 如果指定了上次训练保存的权重文件地址,则接着上次结果接着训练
    if args.resume != "":
        checkpoint = torch.load(args.resume, map_location='cpu')
        model.load_state_dict(checkpoint['model'])
        optimizer.load_state_dict(checkpoint['optimizer'])
        lr_scheduler.load_state_dict(checkpoint['lr_scheduler'])
        args.start_epoch = checkpoint['epoch'] + 1
        if args.amp and "scaler" in checkpoint:
            scaler.load_state_dict(checkpoint["scaler"])
        print("the training process from epoch{}...".format(args.start_epoch))

    train_loss = []
    learning_rate = []
    val_map = []

    for epoch in range(args.start_epoch, args.epochs):
        # train for one epoch, printing every 10 iterations
        mean_loss, lr = utils.train_one_epoch(model, optimizer, train_data_loader,
                                              device=device, epoch=epoch,
                                              print_freq=50, warmup=True,
                                              scaler=scaler)
        train_loss.append(mean_loss.item())
        learning_rate.append(lr)

        # update the learning rate
        lr_scheduler.step()

        # evaluate on the test dataset
        coco_info = utils.evaluate(model, val_data_set_loader, device=device)

        # write into txt
        with open(results_file, "a") as f:
            # 写入的数据包括coco指标还有loss和learning rate
            result_info = [f"{i:.4f}" for i in coco_info + [mean_loss.item()]] + [f"{lr:.6f}"]
            txt = "epoch:{} {}".format(epoch, '  '.join(result_info))
            f.write(txt + "\n")

        val_map.append(coco_info[1])  # pascal mAP

        # save weights
        save_files = {
            'model': model.state_dict(),
            'optimizer': optimizer.state_dict(),
            'lr_scheduler': lr_scheduler.state_dict(),
            'epoch': epoch}
        if args.amp:
            save_files["scaler"] = scaler.state_dict()
        torch.save(save_files, "./save_weights/resNetFpn-model-{}.pth".format(epoch))

    # plot loss and lr curve
    if len(train_loss) != 0 and len(learning_rate) != 0:
        from plot_curve import plot_loss_and_lr
        plot_loss_and_lr(train_loss, learning_rate)

    # plot mAP curve
    if len(val_map) != 0:
        from plot_curve import plot_map
        plot_map(val_map)


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(
        description=__doc__)

    # 训练设备类型
    parser.add_argument('--device', default='cuda:0', help='device')
    # 训练数据集的根目录(VOCdevkit)
    parser.add_argument('--data-path', default='./', help='dataset')
    # 检测目标类别数(不包含背景)
    parser.add_argument('--num-classes', default=20, type=int, help='num_classes')
    # 文件保存地址
    parser.add_argument('--output-dir', default='./save_weights', help='path where to save')
    # 若需要接着上次训练,则指定上次训练保存权重文件地址
    parser.add_argument('--resume', default='', type=str, help='resume from checkpoint')
    # 指定接着从哪个epoch数开始训练
    parser.add_argument('--start_epoch', default=0, type=int, help='start epoch')
    # 训练的总epoch数
    parser.add_argument('--epochs', default=15, type=int, metavar='N',
                        help='number of total epochs to run')
    # 学习率
    parser.add_argument('--lr', default=0.005, type=float,
                        help='initial learning rate, 0.02 is the default value for training '
                             'on 8 gpus and 2 images_per_gpu')
    # SGD的momentum参数
    parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
                        help='momentum')
    # SGD的weight_decay参数
    parser.add_argument('--wd', '--weight-decay', default=1e-4, type=float,
                        metavar='W', help='weight decay (default: 1e-4)',
                        dest='weight_decay')
    # 训练的batch size
    parser.add_argument('--batch_size', default=4, type=int, metavar='N',
                        help='batch size when training.')
    parser.add_argument('--aspect-ratio-group-factor', default=3, type=int)
    # 是否使用混合精度训练(需要GPU支持混合精度)
    parser.add_argument("--amp", default=False, help="Use torch.cuda.amp for mixed precision training")

    args = parser.parse_args()
    print(args)

    # 检查保存权重文件夹是否存在,不存在则创建
    if not os.path.exists(args.output_dir):
        os.makedirs(args.output_dir)

    main(args)
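
上面确定"features.42"与out_channels=512的方法,可以用一个独立的小脚本验证(这里pretrained=False只为查看结构;层名以print(backbone)的实际输出为准):

import torch
import torchvision
from torchvision.models.feature_extraction import create_feature_extractor

backbone = torchvision.models.vgg16_bn(pretrained=False)
backbone = create_feature_extractor(backbone, return_nodes={"features.42": "0"})
out = backbone(torch.rand(1, 3, 224, 224))
print(out["0"].shape)  # torch.Size([1, 512, 14, 14]) -> out_channels=512,下采样16倍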

5.2 带FPN

import os
import datetime

import torch

import transforms
from network_files import FasterRCNN, AnchorsGenerator
from my_dataset import VOCDataSet
from train_utils import GroupedBatchSampler, create_aspect_ratio_groups
from train_utils import train_eval_utils as utils
from backbone import BackboneWithFPN, LastLevelMaxPool


def create_model(num_classes):
    import torchvision
    from torchvision.models.feature_extraction import create_feature_extractor

    # --- mobilenet_v3_large fpn backbone --- #
    backbone = torchvision.models.mobilenet_v3_large(pretrained=True)
    # print(backbone)
    """
        指定backbone中需要使用FPN的层
            key: 对应的层
            value: 返回的key
    """
    return_layers = {"features.6": "0",   # stride 8
                     "features.12": "1",  # stride 16
                     "features.16": "2"}  # stride 32
    # 提供给fpn的每个特征层channel
    in_channels_list = [40, 112, 960]
    new_backbone = create_feature_extractor(backbone, return_layers)
    # img = torch.randn(1, 3, 224, 224)
    # outputs = new_backbone(img)
    # [print(f"{k} shape: {v.shape}") for k, v in outputs.items()]

    # --- efficientnet_b0 fpn backbone --- #
    # backbone = torchvision.models.efficientnet_b0(pretrained=True)
    # # print(backbone)
    # return_layers = {"features.3": "0",  # stride 8
    #                  "features.4": "1",  # stride 16
    #                  "features.8": "2"}  # stride 32
    # # 提供给fpn的每个特征层channel
    # in_channels_list = [40, 80, 1280]
    # new_backbone = create_feature_extractor(backbone, return_layers)
    # # img = torch.randn(1, 3, 224, 224)
    # # outputs = new_backbone(img)
    # # [print(f"{k} shape: {v.shape}") for k, v in outputs.items()]

    backbone_with_fpn = BackboneWithFPN(new_backbone,
                                        return_layers=return_layers,
                                        in_channels_list=in_channels_list,
                                        out_channels=256,
                                        extra_blocks=LastLevelMaxPool(),
                                        re_getter=False)

    anchor_sizes = ((64,), (128,), (256,), (512,))
    aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
    anchor_generator = AnchorsGenerator(sizes=anchor_sizes,
                                        aspect_ratios=aspect_ratios)

    roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0', '1', '2'],  # 在哪些特征层上进行RoIAlign pooling
                                                    output_size=[7, 7],  # RoIAlign pooling输出特征矩阵尺寸
                                                    sampling_ratio=2)  # 采样率

    model = FasterRCNN(backbone=backbone_with_fpn,
                       num_classes=num_classes,
                       rpn_anchor_generator=anchor_generator,
                       box_roi_pool=roi_pooler)

    return model


def main(args):
    device = torch.device(args.device if torch.cuda.is_available() else "cpu")
    print("Using {} device training.".format(device.type))

    # 用来保存coco_info的文件
    results_file = "results{}.txt".format(datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))

    data_transform = {
        "train": transforms.Compose([transforms.ToTensor(),
                                     transforms.RandomHorizontalFlip(0.5)]),
        "val": transforms.Compose([transforms.ToTensor()])
    }

    VOC_root = args.data_path
    # check voc root
    if os.path.exists(os.path.join(VOC_root, "VOCdevkit")) is False:
        raise FileNotFoundError("VOCdevkit dose not in path:'{}'.".format(VOC_root))

    # load train data set
    # VOCdevkit -> VOC2012 -> ImageSets -> Main -> train.txt
    train_dataset = VOCDataSet(VOC_root, "2012", data_transform["train"], "train.txt")
    train_sampler = None

    # 是否按图片相似高宽比采样图片组成batch
    # 使用的话能够减小训练时所需GPU显存,默认使用
    if args.aspect_ratio_group_factor >= 0:
        train_sampler = torch.utils.data.RandomSampler(train_dataset)
        # 统计所有图像高宽比例在bins区间中的位置索引
        group_ids = create_aspect_ratio_groups(train_dataset, k=args.aspect_ratio_group_factor)
        # 每个batch图片从同一高宽比例区间中取
        train_batch_sampler = GroupedBatchSampler(train_sampler, group_ids, args.batch_size)

    # 注意这里的collate_fn是自定义的,因为读取的数据包括image和targets,不能直接使用默认的方法合成batch
    batch_size = args.batch_size
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using %g dataloader workers' % nw)
    if train_sampler:
        # 如果按照图片高宽比采样图片,dataloader中需要使用batch_sampler
        train_data_loader = torch.utils.data.DataLoader(train_dataset,
                                                        batch_sampler=train_batch_sampler,
                                                        pin_memory=True,
                                                        num_workers=nw,
                                                        collate_fn=train_dataset.collate_fn)
    else:
        train_data_loader = torch.utils.data.DataLoader(train_dataset,
                                                        batch_size=batch_size,
                                                        shuffle=True,
                                                        pin_memory=True,
                                                        num_workers=nw,
                                                        collate_fn=train_dataset.collate_fn)

    # load validation data set
    # VOCdevkit -> VOC2012 -> ImageSets -> Main -> val.txt
    val_dataset = VOCDataSet(VOC_root, "2012", data_transform["val"], "val.txt")
    val_data_set_loader = torch.utils.data.DataLoader(val_dataset,
                                                      batch_size=1,
                                                      shuffle=False,
                                                      pin_memory=True,
                                                      num_workers=nw,
                                                      collate_fn=val_dataset.collate_fn)

    # create model num_classes equal background + 20 classes
    model = create_model(num_classes=args.num_classes + 1)
    # print(model)

    model.to(device)

    # define optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params,
                                lr=args.lr,
                                momentum=args.momentum,
                                weight_decay=args.weight_decay)

    scaler = torch.cuda.amp.GradScaler() if args.amp else None

    # learning rate scheduler
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=3,
                                                   gamma=0.33)

    # 如果指定了上次训练保存的权重文件地址,则接着上次结果接着训练
    if args.resume != "":
        checkpoint = torch.load(args.resume, map_location='cpu')
        model.load_state_dict(checkpoint['model'])
        optimizer.load_state_dict(checkpoint['optimizer'])
        lr_scheduler.load_state_dict(checkpoint['lr_scheduler'])
        args.start_epoch = checkpoint['epoch'] + 1
        if args.amp and "scaler" in checkpoint:
            scaler.load_state_dict(checkpoint["scaler"])
        print("the training process from epoch{}...".format(args.start_epoch))

    train_loss = []
    learning_rate = []
    val_map = []

    for epoch in range(args.start_epoch, args.epochs):
        # train for one epoch, printing every 10 iterations
        mean_loss, lr = utils.train_one_epoch(model, optimizer, train_data_loader,
                                              device=device, epoch=epoch,
                                              print_freq=50, warmup=True,
                                              scaler=scaler)
        train_loss.append(mean_loss.item())
        learning_rate.append(lr)

        # update the learning rate
        lr_scheduler.step()

        # evaluate on the test dataset
        coco_info = utils.evaluate(model, val_data_set_loader, device=device)

        # write into txt
        with open(results_file, "a") as f:
            # 写入的数据包括coco指标还有loss和learning rate
            result_info = [f"{i:.4f}" for i in coco_info + [mean_loss.item()]] + [f"{lr:.6f}"]
            txt = "epoch:{} {}".format(epoch, '  '.join(result_info))
            f.write(txt + "\n")

        val_map.append(coco_info[1])  # pascal mAP

        # save weights
        save_files = {
            'model': model.state_dict(),
            'optimizer': optimizer.state_dict(),
            'lr_scheduler': lr_scheduler.state_dict(),
            'epoch': epoch}
        if args.amp:
            save_files["scaler"] = scaler.state_dict()
        torch.save(save_files, "./save_weights/resNetFpn-model-{}.pth".format(epoch))

    # plot loss and lr curve
    if len(train_loss) != 0 and len(learning_rate) != 0:
        from plot_curve import plot_loss_and_lr
        plot_loss_and_lr(train_loss, learning_rate)

    # plot mAP curve
    if len(val_map) != 0:
        from plot_curve import plot_map
        plot_map(val_map)


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(
        description=__doc__)

    # 训练设备类型
    parser.add_argument('--device', default='cuda:0', help='device')
    # 训练数据集的根目录(VOCdevkit)
    parser.add_argument('--data-path', default='./', help='dataset')
    # 检测目标类别数(不包含背景)
    parser.add_argument('--num-classes', default=20, type=int, help='num_classes')
    # 文件保存地址
    parser.add_argument('--output-dir', default='./save_weights', help='path where to save')
    # 若需要接着上次训练,则指定上次训练保存权重文件地址
    parser.add_argument('--resume', default='', type=str, help='resume from checkpoint')
    # 指定接着从哪个epoch数开始训练
    parser.add_argument('--start_epoch', default=0, type=int, help='start epoch')
    # 训练的总epoch数
    parser.add_argument('--epochs', default=15, type=int, metavar='N',
                        help='number of total epochs to run')
    # 学习率
    parser.add_argument('--lr', default=0.005, type=float,
                        help='initial learning rate, 0.02 is the default value for training '
                             'on 8 gpus and 2 images_per_gpu')
    # SGD的momentum参数
    parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
                        help='momentum')
    # SGD的weight_decay参数
    parser.add_argument('--wd', '--weight-decay', default=1e-4, type=float,
                        metavar='W', help='weight decay (default: 1e-4)',
                        dest='weight_decay')
    # 训练的batch size
    parser.add_argument('--batch_size', default=4, type=int, metavar='N',
                        help='batch size when training.')
    parser.add_argument('--aspect-ratio-group-factor', default=3, type=int)
    # 是否使用混合精度训练(需要GPU支持混合精度)
    parser.add_argument("--amp", default=False, help="Use torch.cuda.amp for mixed precision training")

    args = parser.parse_args()
    print(args)

    # 检查保存权重文件夹是否存在,不存在则创建
    if not os.path.exists(args.output_dir):
        os.makedirs(args.output_dir)

    main(args)
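
同样可以用一个独立的小脚本验证return_layers对应的输出channel是否与in_channels_list=[40, 112, 960]一致(实际以打印结果为准):

import torch
import torchvision
from torchvision.models.feature_extraction import create_feature_extractor

backbone = torchvision.models.mobilenet_v3_large(pretrained=False)
return_layers = {"features.6": "0",   # stride 8
                 "features.12": "1",  # stride 16
                 "features.16": "2"}  # stride 32
new_backbone = create_feature_extractor(backbone, return_layers)
outputs = new_backbone(torch.randn(1, 3, 224, 224))
for k, v in outputs.items():
    print(k, v.shape)
# 预期输出:
# 0 torch.Size([1, 40, 28, 28])
# 1 torch.Size([1, 112, 14, 14])
# 2 torch.Size([1, 960, 7, 7])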

参考

  1. https://www.bilibili.com/video/BV1of4y1m7nj?spm_id_from=333.999.0.0
