目标检测之小试牛刀

1. 目标检测基础知识

1.1 目标检测概念

根据对比图像分类,来明晰目标检测:
图像分类:
只需要判断输入的图像中是否包含感兴趣物体。
目标检测:
需要在识别出图片中目标类别的基础上,还要精确定位到目标的具体位置,并用外接矩形框标出。

1.2 目标检测思路

总体思路:先确立众多候选框,再对候选框进行分类和微调。
目标检测之小试牛刀_第1张图片
图1 结合分类来看目标检测

1.3 目标框定义方式

在图像分类中,标签信息是类别。目标检测的标签信息除了类别label以外,需要同时包含目标的位置信息,也就是目标的外接矩形框bounding box。
用来表达bbox的格式通常有两种,(x1, y1, x2, y2) 和 (c_x, c_y, w, h) 目标检测之小试牛刀_第2张图片

1.4 交并比(IoU)

IoU的全称是交并比Intersection over Union),表示两个目标框的交集占其并集的比例。
目标检测之小试牛刀_第3张图片

具体计算方法:
1.首先获取两个框的坐标,红框坐标: 左上(red_x1, red_y1), 右下(red_x2, red_y2),绿框坐标: 左上(green_x1, green_y1),右下(green_x2, green_y2)
2.计算两个框左上点的坐标最大值:(max(red_x1, green_x1), max(red_y1, green_y1)), 和右下点坐标最小值:(min(red_x2, green_x2), min(red_y2, green_y2))
3.利用2算出的信息计算黄框面积:yellow_area
4.计算红绿框的面积:red_area 和 green_area
5.iou = yellow_area / (red_area + green_area - yellow_area)

2. 目标检测数据集VOC

2.1 VOC数据集介绍

2.1.1 数据集类别:
目标检测之小试牛刀_第4张图片
2.1.2 数据集量级
VOC数量集图像和目标数量的基本信息如下图所示:
目标检测之小试牛刀_第5张图片
其中,Images表示图片数量,Objects表示目标数量

2. VOC数据集的dataloader构建

2.1 数据集准备

  1. 准备create_data_lists.py
"""python
    create_data_lists
"""
from utils import create_data_lists

if __name__ == '__main__':
    # voc07_path,voc12_path为我们训练测试所需要用到的数据集,output_folder为我们生成构建dataloader所需文件的路径
    # 参数中涉及的路径以个人实际路径为准,建议将数据集放到dataset目录下,和教程保持一致
    create_data_lists(voc07_path='Desktop/Computer Science/Datawhale/CV/dataset/VOCdevkit/VOC2007',
                      voc12_path='Desktop/Computer Science/Datawhale/CV/dataset/VOCdevkit/VOC2012',
                      output_folder='Desktop/Computer Science/Datawhale/CV/dataset/VOCdevkit')
                    

在设置好对应路径后,我们可以在jupyter notebook上进行运行

2.2 构建dataloader
我们直接上代码:

"""python
    PascalVOCDataset具体实现过程
"""
import torch
from torch.utils.data import Dataset
import json
import os
from PIL import Image
from utils import transform


class PascalVOCDataset(Dataset):
    """
    A PyTorch Dataset class to be used in a PyTorch DataLoader to create batches.
    """

    #初始化相关变量
    #读取images和objects标注信息
    def __init__(self, data_folder, split, keep_difficult=False):
        """
        :param data_folder: folder where data files are stored
        :param split: split, one of 'TRAIN' or 'TEST'
        :param keep_difficult: keep or discard objects that are considered difficult to detect?
        """
        self.split = split.upper()    #保证输入为纯大写字母,便于匹配{'TRAIN', 'TEST'}

        assert self.split in {'TRAIN', 'TEST'}

        self.data_folder = data_folder
        self.keep_difficult = keep_difficult

        # Read data files
        with open(os.path.join(data_folder, self.split + '_images.json'), 'r') as j:
            self.images = json.load(j)
        with open(os.path.join(data_folder, self.split + '_objects.json'), 'r') as j:
            self.objects = json.load(j)

        assert len(self.images) == len(self.objects)

    #循环读取image及对应objects
    #对读取的image及objects进行tranform操作(数据增广)
    #返回PIL格式图像,标注框,标注框对应的类别索引,对应的difficult标志(True or False)
    def __getitem__(self, i):
        # Read image
        #*需要注意,在pytorch中,图像的读取要使用Image.open()读取成PIL格式,不能使用opencv
        #*由于Image.open()读取的图片是四通道的(RGBA),因此需要.convert('RGB')转换为RGB通道
        image = Image.open(self.images[i], mode='r')
        image = image.convert('RGB')

        # Read objects in this image (bounding boxes, labels, difficulties)
        objects = self.objects[i]
        boxes = torch.FloatTensor(objects['boxes'])  # (n_objects, 4)
        labels = torch.LongTensor(objects['labels'])  # (n_objects)
        difficulties = torch.ByteTensor(objects['difficulties'])  # (n_objects)

        # Discard difficult objects, if desired
        #如果self.keep_difficult为False,即不保留difficult标志为True的目标
        #那么这里将对应的目标删去
        if not self.keep_difficult:
            boxes = boxes[1 - difficulties]
            labels = labels[1 - difficulties]
            difficulties = difficulties[1 - difficulties]

        # Apply transformations
        #对读取的图片应用transform
        image, boxes, labels, difficulties = transform(image, boxes, labels, difficulties, split=self.split)

        return image, boxes, labels, difficulties

    #获取图片的总数,用于计算batch数
    def __len__(self):
        return len(self.images)

    #我们知道,我们输入到网络中训练的数据通常是一个batch一起输入,而通过__getitem__我们只读取了一张图片及其objects信息
    #如何将读取的一张张图片及其object信息整合成batch的形式呢?
    #collate_fn就是做这个事情,
    #对于一个batch的images,collate_fn通过torch.stack()将其整合成4维tensor,对应的objects信息分别用一个list存储
    def collate_fn(self, batch):
        """
        Since each image may have a different number of objects, we need a collate function (to be passed to the DataLoader).
        This describes how to combine these tensors of different sizes. We use lists.
        Note: this need not be defined in this Class, can be standalone.
        :param batch: an iterable of N sets from __getitem__()
        :return: a tensor of images, lists of varying-size tensors of bounding boxes, labels, and difficulties
        """

        images = list()
        boxes = list()
        labels = list()
        difficulties = list()

        for b in batch:
            images.append(b[0])
            boxes.append(b[1])
            labels.append(b[2])
            difficulties.append(b[3])

        #(3,224,224) -> (N,3,224,224)
        images = torch.stack(images, dim=0)

        return images, boxes, labels, difficulties  # tensor (N, 3, 224, 224), 3 lists of N tensors each

最后,我们可以向pytorch传入dataset直接构建DataLoader了。

"""python
    DataLoader
"""
#参数说明:
#在train时一般设置shufle=True打乱数据顺序,增强模型的鲁棒性
#num_worker表示读取数据时的线程数,一般根据自己设备配置确定(如果是windows系统,建议设默认值0,防止出错)
#pin_memory,在计算机内存充足的时候设置为True可以加快内存中的tensor转换到GPU的速度,具体原因可以百度哈~
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
                                           collate_fn=train_dataset.collate_fn, num_workers=workers,
                                           pin_memory=True)  # note that we're passing the collate function here

文献资料参考DataWhale社区

你可能感兴趣的:(计算机视觉,python,深度学习,计算机视觉)