matterport_MaskRCNN (5)——Code walkthrough of model.py

model.py——def load_image_gt()

Loads the annotation data for an image: given an image_id, it loads the original image, the masks, and the box information from the dataset.
Inputs:
a dataset instance, a config instance, an image_id, and optional arguments
Outputs:
the image ([h,w,3]), the original image shape, class_ids (class id of each object in the image), bbox (coordinates of each object), mask (mask of each object). A hedged usage sketch follows the full function below.

def load_image_gt(dataset, config, image_id, augment=False, augmentation=None,
                  use_mini_mask=False):
    """Load and return ground truth data for an image (image, mask, bounding boxes).
    载入并返回一张图片的真实数据(图片,掩膜,边界框)

    augment: (deprecated. Use augmentation instead). If true, apply random
        image augmentation. Currently, only horizontal flipping is offered.
        
    augmentation: Optional. An imgaug (https://github.com/aleju/imgaug) augmentation.
        For example, passing imgaug.augmenters.Fliplr(0.5) flips images
        right/left 50% of the time.
    use_mini_mask:
        If False, returns full-size masks that are the same height
        and width as the original image. These can be big, for example
        1024x1024x100 (for 100 instances). Mini masks are smaller, typically,
        224x224 and are generated by extracting the bounding box of the
        object and resizing it to MINI_MASK_SHAPE.

    Returns:
    image: [height, width, 3]
    shape: the original shape of the image before resizing and cropping.
    class_ids: [instance_count] Integer class IDs (a 1D array, length = number of instances in the image)
    bbox: [instance_count, (y1, x1, y2, x2)]
    mask: [height, width, instance_count]. The height and width are those
        of the image unless use_mini_mask is True, in which case they are
        defined in MINI_MASK_SHAPE.
    """
    # Load the image and masks. load_image() and load_mask() are methods of the Dataset
    # class in utils.py, or the overridden versions defined in your Dataset subclass.
    image = dataset.load_image(image_id)          # load image by id; grayscale images are converted to 3-channel [h, w, 3]
    mask, class_ids = dataset.load_mask(image_id) # get the [h, w, n] masks and the matching 1D class_ids array [n,]
    original_shape = image.shape                  # original image shape (h, w, c)
    
    # Resize image and mask according to the config parameters
    image, window, scale, padding, crop = utils.resize_image(
        image,
        min_dim=config.IMAGE_MIN_DIM,
        min_scale=config.IMAGE_MIN_SCALE,
        max_dim=config.IMAGE_MAX_DIM,
        mode=config.IMAGE_RESIZE_MODE)
    mask = utils.resize_mask(mask, scale, padding, crop)

    # Data augmentation (details skipped here)
    # Random horizontal flips.
    # TODO: will be removed in a future update in favor of augmentation
    if augment:
        logging.warning("'augment' is deprecated. Use 'augmentation' instead.")
        if random.randint(0, 1):
            image = np.fliplr(image)
            mask = np.fliplr(mask)

    # Augmentation
    # This requires the imgaug lib (https://github.com/aleju/imgaug)
    if augmentation:
        import imgaug

        # Augmenters that are safe to apply to masks
        # Some, such as Affine, have settings that make them unsafe, so always
        # test your augmentation on masks
        MASK_AUGMENTERS = ["Sequential", "SomeOf", "OneOf", "Sometimes",
                           "Fliplr", "Flipud", "CropAndPad",
                           "Affine", "PiecewiseAffine"]

        def hook(images, augmenter, parents, default):
            """Determines which augmenters to apply to masks."""
            return augmenter.__class__.__name__ in MASK_AUGMENTERS

        # Store shapes before augmentation to compare
        image_shape = image.shape
        mask_shape = mask.shape
        # Make augmenters deterministic to apply similarly to images and masks
        det = augmentation.to_deterministic()
        image = det.augment_image(image)
        # Change mask to np.uint8 because imgaug doesn't support np.bool
        mask = det.augment_image(mask.astype(np.uint8),
                                 hooks=imgaug.HooksImages(activator=hook))
        # Verify that shapes didn't change
        assert image.shape == image_shape, "Augmentation shouldn't change image size"
        assert mask.shape == mask_shape, "Augmentation shouldn't change mask size"
        # Change mask back to bool
        mask = mask.astype(np.bool)

    # If cropping was used during resize, some masks may have been cut away entirely;
    # filter out the all-zero masks here.
    # Sum each mask over axes 0 and 1 and check whether the sum is > 0.
    # _idx is a boolean array; False means mask[:, :, index] is all zeros and should be dropped.
    _idx = np.sum(mask, axis=(0, 1)) > 0
    mask = mask[:, :, _idx]                 # slices where _idx is False are removed
    class_ids = class_ids[_idx]             # remove the corresponding class_ids as well
    # Compute bounding-box corner coordinates from the mask pixels
    bbox = utils.extract_bboxes(mask)

    # Active classes: the classes supported by the dataset this image comes from;
    # when training on a single dataset this has little practical effect.
    # Different datasets have different classes, so track the
    # classes supported in the dataset of this image.
    active_class_ids = np.zeros([dataset.num_classes], dtype=np.int32)
    source_class_ids = dataset.source_class_ids[dataset.image_info[image_id]["source"]]
    active_class_ids[source_class_ids] = 1

    # If use_mini_mask is enabled, resize the masks again to save memory
    if use_mini_mask:
        mask = utils.minimize_mask(bbox, mask, config.MINI_MASK_SHAPE)

    # Pack the image metadata (original size, scaling factor, etc.) into a 1D array
    image_meta = compose_image_meta(image_id, original_shape, image.shape,
                                    window, scale, active_class_ids)

    return image, image_meta, class_ids, bbox, mask
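
A minimal usage sketch (hedged: it assumes a dataset subclass has already been built and prepared, and that the default "square" resize mode with IMAGE_MAX_DIM = 1024 is in effect):

# Hypothetical example: `dataset` is a prepared Dataset subclass, `config` a Config subclass
image, image_meta, class_ids, bbox, mask = load_image_gt(
    dataset, config, image_id=0, use_mini_mask=config.USE_MINI_MASK)

print(image.shape)      # (1024, 1024, 3) with the default "square" mode
print(class_ids.shape)  # (num_instances,)
print(bbox.shape)       # (num_instances, 4) -> one (y1, x1, y2, x2) per instance
print(mask.shape)       # (56, 56, num_instances) with mini masks, else (1024, 1024, num_instances)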

utils.py——class Dataset(object)

The parent class of all datasets. To use the model on your own data, create a subclass and override some of its methods.
Inside the class, data is organized in the image_info list: each element is a dict of key-value pairs describing one image, and loading an image mostly means pulling the needed fields out of image_info.
Building a dataset: define a custom dataset subclass, register the class and image information by hand (calling add_class and add_image), and let dataset.prepare() handle everything that follows (a small subclass sketch follows the class below).

class Dataset(object):
    # Initialization
    def __init__(self, class_map=None):
        self._image_ids = []  # list of image ids
        self.image_info = []  # per-image info dicts (the most frequently used list)
        self.class_info = [{"source": "", "id": 0, "name": "BG"}]
        # class_info holds the name and id of every class in the dataset;
        # "source" is the dataset name, e.g. 'COCO' or 'VOC'
        self.source_class_ids = {}  # class ids grouped by source
	
    # Add a new class
    def add_class(self, source, class_id, class_name):
        assert "." not in source, "Source name cannot contain a dot"
        # If this class was already added, return directly
        for info in self.class_info:
            if info['source'] == source and info["id"] == class_id:
                return
        # Otherwise append the class (as a dict) to class_info
        self.class_info.append({
            "source": source,
            "id": class_id,
            "name": class_name,
        })
        
    # Add an image: (source, image id, image path, other keyword arguments)
    def add_image(self, source, image_id, path, **kwargs):
        image_info = {
            "id": image_id,
            "source": source,
            "path": path,
        }
        image_info.update(kwargs)           # merge any extra keyword arguments into image_info
        self.image_info.append(image_info)  # append the image info dict to the image_info list
        
    # Used to look up / reference an image
    def image_reference(self, image_id):
        """Return a link or reference string for the image given its id,
        to make it easier to locate (left empty in the base class).
        """
        return ""
    
    # Dataset preparation: builds the name->id mappings for classes and images, and
    # organizes the data when it comes from more than one source.
    # A source is a dataset name such as 'COCO' or 'VOC', but can also be custom, e.g. '2018' or '2019'.
    def prepare(self, class_map=None):
        """
        数据集的准备工作

        TODO: class map is not supported yet. When done, it should handle mapping
              classes from different datasets to the same class ID.
              尚不支持类映射。 完成后,它应处理从不同数据集到相同类ID的映射类。
        """
        # 返回一个简短的对象名用于简洁显示(没看懂意义)
        # 可能是作者所使用的数据名称有特定格式,作用就是取name中第一个','之前的字符串
        def clean_name(name):
            return ",".join(name.split(",")[:1])
        
        # Build (or rebuild) everything else from the info dicts
        self.num_classes = len(self.class_info)       # number of classes (length of class_info)
        self.class_ids = np.arange(self.num_classes)  # class ids (used to map class_name to class_id)
        self.class_names = [clean_name(c["name"]) for c in self.class_info]   # list of class names
        self.num_images = len(self.image_info)        # number of images
        self._image_ids = np.arange(self.num_images)  # image ids (also used for mapping)
        
        # Mapping: zip the info lists with the id arrays and build dicts;
        # each key-value pair is one mapping.
        # Class mapping first:
        # zip class_info and class_ids to build entries of the form 'COCO.x': y
        self.class_from_source_map = {"{}.{}".format(info['source'], info['id']): id
                                      for info, id in zip(self.class_info, self.class_ids)}
        # Image mapping, same idea
        self.image_from_source_map = {"{}.{}".format(info['source'], info['id']): id
                                      for info, id in zip(self.image_info, self.image_ids)}
        
        # List of source names, de-duplicated with a set
        self.sources = list(set([i['source'] for i in self.class_info]))
        
        self.source_class_ids = {}
        # Loop over the dataset sources
        for source in self.sources:
            # For every source, create an entry in source_class_ids: key = source name, value = empty list
            self.source_class_ids[source] = []
            # Walk class_info and collect the classes that belong to this source.
            # Note: the BG class belongs to every source, so it is always included.
            for i, info in enumerate(self.class_info):
                # Include BG class in all datasets
                if i == 0 or source == info['source']:
                    self.source_class_ids[source].append(i)
                    
    # Map a source class ID (e.g. 'coco.12') to the internal integer class ID; only relevant when using more than one source
    def map_source_class_id(self, source_class_id):
        """Takes a source class ID and returns the int class ID assigned to it.
        For example:
        dataset.map_source_class_id("coco.12") -> 23
        """
        return self.class_from_source_map[source_class_id]
    # Map an internal class ID back to the class ID used by the source dataset; also unused with a single source
    def get_source_class_id(self, class_id, source):
        """Map an internal class ID to the corresponding class ID in the source dataset.
        将内部类ID映射到源数据集中的相应类ID
        输入分配给该类的id,看该id对应的source与输入source"""
        info = self.class_info[class_id]
        assert info['source'] == source
        return info['id']
        
    @property  # decorator that turns the method into an attribute, so you can simply write ids = dataset.image_ids
    # Returns the list of internal image ids
    def image_ids(self):
        return self._image_ids
    # Return the path of an image given its image_id
    def source_image_link(self, image_id):
        return self.image_info[image_id]["path"]
        
    # Load an image; usually overridden to match your own dataset format.
    # Just make sure the return value is [h, w, 3] so it matches the network input.
    def load_image(self, image_id):
        """Load the specified image and return a [H,W,3] Numpy array.
        """
        image = skimage.io.imread(self.image_info[image_id]['path'])
        # If grayscale, convert to RGB for consistency.
        if image.ndim != 3:
            image = skimage.color.gray2rgb(image)
        # If has an alpha channel, remove it for consistency
        if image.shape[-1] == 4:
            image = image[..., :3]
        return image 
        
    def load_mask(self, image_id):
        """Load the instance masks for the given image; override to match your dataset format.
        Must return masks of shape [height, width, instances] with dtype bool, plus
        class_ids: a 1D array holding the class id of each instance,
        i.e. mask[:, :, i] is the mask of one instance and its class id is class_ids[i].
        """
        # Meant to be overridden; the base implementation only returns an empty mask.
        logging.warning("You are using the default load_mask(), maybe you need to define your own one.")
        mask = np.empty([0, 0, 0])
        class_ids = np.empty([0], np.int32)
        return mask, class_ids
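
A hedged sketch of how a custom dataset might be wired up (the "shapes" source name, class names, and directory layout are made up for illustration; load_mask still has to be filled in):

import os

class ShapesDataset(Dataset):
    def load_shapes(self, image_dir):
        # Register classes: (source, class_id, class_name)
        self.add_class("shapes", 1, "square")
        self.add_class("shapes", 2, "circle")
        # Register every image with its source, an id, and a path
        for i, fname in enumerate(sorted(os.listdir(image_dir))):
            self.add_image("shapes", image_id=i,
                           path=os.path.join(image_dir, fname))

    def load_mask(self, image_id):
        # Override to return ([H, W, N] bool masks, [N] class_ids)
        ...

dataset = ShapesDataset()
dataset.load_shapes("/path/to/images")
dataset.prepare()   # builds class_ids, image_ids and the source maps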

utils.py——def resize_image

Resizes an image according to the config parameters; the aspect ratio is preserved.
min_dim: target length of the image's shorter side after resizing
max_dim: maximum allowed length of the longer side
min_scale: minimum scaling factor; if the scale computed from min_dim is smaller, min_scale is used instead
mode: resizing mode, default 'square'
Returns:
image: the resized image
window: (y1, x1, y2, x2); when padding is used, this gives the coordinates of the original image inside the resized image
scale: the scaling factor used for resizing
padding: [(top, bottom), (left, right), (0, 0)], the padding applied during resizing. A hedged usage sketch follows the code below.

def resize_image(image, min_dim=None, max_dim=None, min_scale=None, mode="square"):
    # Keep the output dtype the same as the input dtype
    image_dtype = image.dtype
    # Default values for window, scale, padding, crop
    h, w = image.shape[:2]
    window = (0, 0, h, w)
    scale = 1
    padding = [(0, 0), (0, 0), (0, 0)]
    crop = None

    # Compute the scale to use from the parameters (mode "none" returns the image unchanged)
    if mode == "none":
        return image, window, scale, padding, crop
    if min_dim:
        # Scale up so that the shorter side reaches min_dim (never scale down here)
        scale = max(1, min_dim / min(h, w))
    if min_scale and scale < min_scale:
        scale = min_scale
    # If max_dim is given and the mode is 'square', check whether scaling with the current
    # factor would make the longer side exceed max_dim; if so, recompute the scale.
    if max_dim and mode == "square":
        image_max = max(h, w)
        if round(image_max * scale) > max_dim:
            scale = max_dim / image_max

    # Resize using bilinear interpolation
    if scale != 1:
        image = resize(image, (round(h * scale), round(w * scale)),
                       preserve_range=True)

    # Padding (depends on the mode)
    if mode == "square":
        h, w = image.shape[:2]  
        top_pad = (max_dim - h) // 2  
        bottom_pad = max_dim - h - top_pad  
        left_pad = (max_dim - w) // 2   
        right_pad = max_dim - w - left_pad  
        padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]
        image = np.pad(image, padding, mode='constant', constant_values=0)
        window = (top_pad, left_pad, h + top_pad, w + left_pad)
    elif mode == "pad64":
        h, w = image.shape[:2]
        # Both sides must be divisible by 64
        assert min_dim % 64 == 0, "Minimum dimension must be a multiple of 64"
        # Height
        if h % 64 > 0:
            max_h = h - (h % 64) + 64
            top_pad = (max_h - h) // 2
            bottom_pad = max_h - h - top_pad
        else:
            top_pad = bottom_pad = 0
        # Width
        if w % 64 > 0:
            max_w = w - (w % 64) + 64
            left_pad = (max_w - w) // 2
            right_pad = max_w - w - left_pad
        else:
            left_pad = right_pad = 0
        padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]
        image = np.pad(image, padding, mode='constant', constant_values=0)
        window = (top_pad, left_pad, h + top_pad, w + left_pad)
    elif mode == "crop":
        # Pick a random crop
        h, w = image.shape[:2]
        y = random.randint(0, (h - min_dim))
        x = random.randint(0, (w - min_dim))
        crop = (y, x, min_dim, min_dim)
        image = image[y:y + min_dim, x:x + min_dim]
        window = (0, 0, min_dim, min_dim)
    else:
        raise Exception("Mode {} not supported".format(mode))
    return image.astype(image_dtype), window, scale, padding, crop
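
A hedged usage sketch with a dummy 600x800 image and the default COCO-style settings (min_dim=800, max_dim=1024, mode="square"); the printed values follow directly from the code above:

import numpy as np

dummy = np.zeros((600, 800, 3), dtype=np.uint8)
image, window, scale, padding, crop = resize_image(
    dummy, min_dim=800, max_dim=1024, mode="square")

print(image.shape)  # (1024, 1024, 3)
print(scale)        # 1.28: min_dim asks for 800/600 = 1.33, but the long side caps it at 1024/800
print(window)       # (128, 0, 896, 1024): where the 768x1024 resized image sits
print(padding)      # [(128, 128), (0, 0), (0, 0)]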

utils.py——def resize_mask(mask, scale, padding, crop=None)

Resizes a mask using the scale and padding information returned by resize_image above; it is applied right after resize_image.

def resize_mask(mask, scale, padding, crop=None):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        # Resize the array directly with scipy.ndimage.zoom (order=0 keeps it nearest-neighbor)
        mask = scipy.ndimage.zoom(mask, zoom=[scale, scale, 1], order=0)
    # Crop or pad
    if crop is not None:
        y, x, h, w = crop
        mask = mask[y:y + h, x:x + w]
    else:
        mask = np.pad(mask, padding, mode='constant', constant_values=0)
    return mask

utils.py——def extract_bboxes(mask):

Extracts bounding boxes from masks. Uses np.where and np.any to find the rows and columns that are not all zero, and derives the corner coordinates from them.
Input:
mask: [h, w, num_instances], with values of 0 or 1
Returns:
bbox_array: [num_instances, (y1, x1, y2, x2)]. A small numpy example follows the code below.

def extract_bboxes(mask):
    # Allocate boxes with shape (num_instances, 4)
    boxes = np.zeros([mask.shape[-1], 4], dtype=np.int32)
    # Take each instance's mask in turn
    for i in range(mask.shape[-1]):
        m = mask[:, :, i]
        # Bounding box.
        # Column indices that contain mask pixels (np.where returns the indices where the condition holds)
        horizontal_indicies = np.where(np.any(m, axis=0))[0]
        # Same for the rows
        vertical_indicies = np.where(np.any(m, axis=1))[0]
        # If any indices exist (horizontal_indicies is non-empty),
        # the four box coordinates can be read off directly
        if horizontal_indicies.shape[0]:
            x1, x2 = horizontal_indicies[[0, -1]]
            y1, y2 = vertical_indicies[[0, -1]]
            # x2 and y2 should not be part of the box. Increment by 1.
            x2 += 1
            y2 += 1
        # Otherwise set the box to zeros
        else:
            # No mask for this instance. Might happen due to
            # resizing or cropping. Set bbox to zeros
            x1, x2, y1, y2 = 0, 0, 0, 0
        # Store the box of the i-th mask
        boxes[i] = np.array([y1, x1, y2, x2])
    # Return the boxes
    return boxes.astype(np.int32)
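
A tiny numpy example (a hedged sketch) showing how np.any / np.where turn a mask into box corners:

import numpy as np

m = np.zeros((5, 5, 1), dtype=bool)   # one instance on a 5x5 image
m[1:3, 2:5, 0] = True                 # a 2x3 blob: rows 1-2, columns 2-4

print(extract_bboxes(m))              # [[1 2 3 5]] -> (y1, x1, y2, x2); y2/x2 are exclusive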

utils.py——def minimize_mask(bbox, mask, mini_shape):

How it works: first crop each mask with its bounding box, removing the useless background pixels, then resize the cropped mask to mini_shape. A small example follows the code below.

def minimize_mask(bbox, mask, mini_shape):
    """Resize masks to a smaller version to reduce memory load.
    Mini-masks can be resized back to image scale using expand_masks()
    See inspect_data.ipynb notebook for more details.
    """
    # Tuples can be concatenated with + to add a dimension, e.g. np.zeros((56, 56) + (10,)) creates a (56, 56, 10) array
    mini_mask = np.zeros(mini_shape + (mask.shape[-1],), dtype=bool)
    for i in range(mask.shape[-1]):
        # Pick slice and cast to bool in case load_mask() returned wrong dtype
        m = mask[:, :, i].astype(bool)
        y1, x1, y2, x2 = bbox[i][:4]
        m = m[y1:y2, x1:x2]
        if m.size == 0:
            raise Exception("Invalid bounding box with area of zero")
        # Resize with bilinear interpolation
        m = resize(m, mini_shape)
        mini_mask[:, :, i] = np.around(m).astype(np.bool)
    return mini_mask
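
A hedged mini example shrinking one full-size mask down to the default 56x56 mini-mask shape:

import numpy as np

mask = np.zeros((1024, 1024, 1), dtype=bool)
mask[100:300, 200:500, 0] = True                       # one 200x300 instance

bbox = extract_bboxes(mask)                            # [[100 200 300 500]]
mini = minimize_mask(bbox, mask, mini_shape=(56, 56))
print(mini.shape)                                      # (56, 56, 1)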

config.py

The configuration file. For custom training, create a subclass of Config and override the attributes you need (a subclass sketch follows the class below).

"""
Mask R-CNN
Base Configurations class.
Copyright (c) 2017 Matterport, Inc.
Licensed under the MIT License (see LICENSE for details)
Written by Waleed Abdulla
"""
import numpy as np

class Config(object):
    """
    基础的config类,对于特定的config,创建一个子类并继承于此,然后重写需要改变的属性即可
    """
    
    
    NAME = None          # Name of the configuration subclass, for easy identification
    GPU_COUNT = 1        # Number of GPUs to use
    IMAGES_PER_GPU = 2   # Number of images to train with on each GPU; adjust to GPU memory and image size

    # Number of training steps per epoch
    # This doesn't need to match the size of the training set. Tensorboard
    # updates are saved at the end of each epoch, so setting this to a
    # smaller number means getting more frequent TensorBoard updates.
    # Validation stats are also calculated at each epoch end and they
    # might take a while, so don't set this too small to avoid spending
    # a lot of time on validation stats.
    STEPS_PER_EPOCH = 1000

    # Number of validation steps to run at the end of every training epoch.
    # A bigger number improves accuracy of validation stats, but slows
    # down the training.
    VALIDATION_STEPS = 50

    # Backbone network architecture
    # Supported values are: resnet50, resnet101.
    # You can also provide a callable that should have the signature
    # of model.resnet_graph. If you do so, you need to supply a callable
    # to COMPUTE_BACKBONE_SHAPE as well
    BACKBONE = "resnet101"

    # Only useful if you supply a callable to BACKBONE. Should compute
    # the shape of each layer of the FPN Pyramid.
    # See model.compute_backbone_shapes
    COMPUTE_BACKBONE_SHAPE = None

    # The strides of each layer of the FPN Pyramid. These values
    # are based on a Resnet101 backbone.
    BACKBONE_STRIDES = [4, 8, 16, 32, 64]

    # Size of the fully-connected layers in the classification graph
    FPN_CLASSIF_FC_LAYERS_SIZE = 1024

    # Size of the top-down layers used to build the feature pyramid
    TOP_DOWN_PYRAMID_SIZE = 256

    # Number of classification classes (including background)
    NUM_CLASSES = 1  # Override in subclasses

    # Length of square anchor side in pixels
    RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)

    # Ratios of anchors at each cell (width/height)
    # A value of 1 represents a square anchor, and 0.5 is a wide anchor
    RPN_ANCHOR_RATIOS = [0.5, 1, 2]

    # Anchor stride
    # If 1 then anchors are created for each cell in the backbone feature map.
    # If 2, then anchors are created for every other cell, and so on.
    RPN_ANCHOR_STRIDE = 1

    # Non-max suppression threshold to filter RPN proposals.
    # You can increase this during training to generate more propsals.
    RPN_NMS_THRESHOLD = 0.7

    # How many anchors per image to use for RPN training
    RPN_TRAIN_ANCHORS_PER_IMAGE = 256
    
    # ROIs kept after tf.nn.top_k and before non-maximum suppression
    PRE_NMS_LIMIT = 6000

    # ROIs kept after non-maximum suppression (training and inference)
    POST_NMS_ROIS_TRAINING = 2000
    POST_NMS_ROIS_INFERENCE = 1000

    # If enabled, resizes instance masks to a smaller size to reduce
    # memory load. Recommended when using high-resolution images.
    USE_MINI_MASK = True
    MINI_MASK_SHAPE = (56, 56)  # (height, width) of the mini-mask

    # Input image resizing
    # Generally, use the "square" resizing mode for training and predicting
    # and it should work well in most cases. In this mode, images are scaled
    # up such that the small side is = IMAGE_MIN_DIM, but ensuring that the
    # scaling doesn't make the long side > IMAGE_MAX_DIM. Then the image is
    # padded with zeros to make it a square so multiple images can be put
    # in one batch.
    # Available resizing modes:
    # none:   No resizing or padding. Return the image unchanged.
    # square: Resize and pad with zeros to get a square image
    #         of size [max_dim, max_dim].
    # pad64:  Pads width and height with zeros to make them multiples of 64.
    #         If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales
    #         up before padding. IMAGE_MAX_DIM is ignored in this mode.
    #         The multiple of 64 is needed to ensure smooth scaling of feature
    #         maps up and down the 6 levels of the FPN pyramid (2**6=64).
    # crop:   Picks random crops from the image. First, scales the image based
    #         on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of
    #         size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only.
    #         IMAGE_MAX_DIM is not used in this mode.
    IMAGE_RESIZE_MODE = "square"
    IMAGE_MIN_DIM = 800
    IMAGE_MAX_DIM = 1024
    # Minimum scaling ratio. Checked after MIN_IMAGE_DIM and can force further
    # up scaling. For example, if set to 2 then images are scaled up to double
    # the width and height, or more, even if MIN_IMAGE_DIM doesn't require it.
    # However, in 'square' mode, it can be overruled by IMAGE_MAX_DIM.
    IMAGE_MIN_SCALE = 0
    # Number of color channels per image. RGB = 3, grayscale = 1, RGB-D = 4
    # Changing this requires other changes in the code. See the WIKI for more
    # details: https://github.com/matterport/Mask_RCNN/wiki
    IMAGE_CHANNEL_COUNT = 3

    # Image mean (RGB)
    MEAN_PIXEL = np.array([123.7, 116.8, 103.9])

    # Number of ROIs per image to feed to classifier/mask heads
    # The Mask RCNN paper uses 512 but often the RPN doesn't generate
    # enough positive proposals to fill this and keep a positive:negative
    # ratio of 1:3. You can increase the number of proposals by adjusting
    # the RPN NMS threshold.
    TRAIN_ROIS_PER_IMAGE = 200

    # Percent of positive ROIs used to train classifier/mask heads
    ROI_POSITIVE_RATIO = 0.33

    # Pooled ROIs
    # Output size of ROI pooling (the pooling layer takes regions of arbitrary size and outputs a fixed size)
    POOL_SIZE = 7
    MASK_POOL_SIZE = 14

    # Shape of output mask
    # To change this you also need to change the neural network mask branch
    MASK_SHAPE = [28, 28]

    # Maximum number of ground truth instances to use in one image
    MAX_GT_INSTANCES = 100

    # Bounding box refinement standard deviation for RPN and final detections.
    RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])
    BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])

    # Max number of final detections
    DETECTION_MAX_INSTANCES = 100

    # Minimum probability value to accept a detected instance
    # ROIs below this threshold are skipped
    DETECTION_MIN_CONFIDENCE = 0.7

    # Non-maximum suppression threshold for detection
    DETECTION_NMS_THRESHOLD = 0.3

    # Learning rate and momentum
    # The Mask RCNN paper uses lr=0.02, but on TensorFlow it causes
    # weights to explode. Likely due to differences in optimizer
    # implementation.
    LEARNING_RATE = 0.001
    LEARNING_MOMENTUM = 0.9

    # Weight decay regularization
    WEIGHT_DECAY = 0.0001

    # Loss weights for more precise optimization.
    # Can be used for R-CNN training setup.
    LOSS_WEIGHTS = {
        "rpn_class_loss": 1.,
        "rpn_bbox_loss": 1.,
        "mrcnn_class_loss": 1.,
        "mrcnn_bbox_loss": 1.,
        "mrcnn_mask_loss": 1.
    }

    # Use RPN ROIs or externally generated ROIs for training
    # Keep this True for most situations. Set to False if you want to train
    # the head branches on ROI generated by code rather than the ROIs from
    # the RPN. For example, to debug the classifier head without having to
    # train the RPN.
    USE_RPN_ROIS = True

    # Train or freeze batch normalization layers
    #     None: Train BN layers. This is the normal mode
    #     False: Freeze BN layers. Good when using a small batch size
    #     True: (don't use). Set layer in training mode even when predicting
    TRAIN_BN = False  # Defaulting to False since batch size is often small

    # Gradient norm clipping (helps keep gradients from exploding)
    GRADIENT_CLIP_NORM = 5.0

    def __init__(self):
        """Set values of computed attributes.
        设置计算的属性的值"""
        # Effective batch size = number of GPUs * images per GPU
        self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT

        # Input image size
        if self.IMAGE_RESIZE_MODE == "crop":
            self.IMAGE_SHAPE = np.array([self.IMAGE_MIN_DIM, self.IMAGE_MIN_DIM,
                self.IMAGE_CHANNEL_COUNT])
        else:
            self.IMAGE_SHAPE = np.array([self.IMAGE_MAX_DIM, self.IMAGE_MAX_DIM,
                self.IMAGE_CHANNEL_COUNT])

        # Image meta data length
        # See compose_image_meta() for details
        self.IMAGE_META_SIZE = 1 + 3 + 3 + 4 + 1 + self.NUM_CLASSES

    def display(self):
        """Display Configuration values.
        显示配置值"""
        print("\nConfigurations:")
        # dir() without arguments returns the names in the current scope;
        # with an argument it returns the attributes and methods of that object.
        for a in dir(self):
            if not a.startswith("__") and not callable(getattr(self, a)):
                # if the name does not start with "__" and the attribute is not callable,
                print("{:30} {}".format(a, getattr(self, a)))
                # print the attribute name and its value
        print("\n")

model.py——class MaskRCNN():

The Mask R-CNN model wrapped in a class.

    """Encapsulates the Mask RCNN model functionality. The actual Keras model is in the keras_model property.
    """
    # 初始化操作
    def __init__(self, mode, config, model_dir):
        """
        mode: Either "training" or "inference"
        config: A Sub-class of the Config class
        model_dir: Directory to save training logs and trained weights
        """
        assert mode in ['training', 'inference']
        self.mode = mode
        self.config = config
        self.model_dir = model_dir
        self.set_log_dir()
        # Build the Keras model
        self.keras_model = self.build(mode=mode, config=config)

    def build(self, mode, config):
        """Build Mask R-CNN architecture.
            input_shape: The shape of the input image.
            mode: Either "training" or "inference". The inputs and
                outputs of the model differ accordingly.
        """
        assert mode in ['training', 'inference']
		
        # The training image height and width must be divisible by 2^6 = 64 to avoid fractional sizes when down/upsampling
        h, w = config.IMAGE_SHAPE[:2]
        if h / 2**6 != int(h / 2**6) or w / 2**6 != int(w / 2**6):
            raise Exception("Image size must be dividable by 2 at least 6 times "
                            "to avoid fractions when downscaling and upscaling."
                            "For example, use 256, 320, 384, 448, 512, ... etc. ")
        # Input layer for the training image
        input_image = KL.Input(shape=[None, None, config.IMAGE_SHAPE[2]], name="input_image")
        input_image_meta = KL.Input(shape=[config.IMAGE_META_SIZE], name="input_image_meta")
        # In training mode, the RPN training targets and the ground-truth data are also needed as inputs
        if mode == "training":
            # Input layers for the RPN targets: an n*1 tensor of anchor labels (-1, 0, 1) and an n*4 tensor with the 4 box values
            input_rpn_match = KL.Input(shape=[None, 1], name="input_rpn_match", dtype=tf.int32)
            input_rpn_bbox = KL.Input(shape=[None, 4], name="input_rpn_bbox", dtype=tf.float32)

            # Input layer for the GT class ids: a 1D vector with the class of each box
            input_gt_class_ids = KL.Input(shape=[None], name="input_gt_class_ids", dtype=tf.int32)
            # Input layer for the GT boxes, holding the coordinates of each box:
            # [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates
            input_gt_boxes = KL.Input(shape=[None, 4], name="input_gt_boxes", dtype=tf.float32)
            
            # Normalize the coordinates of input_gt_boxes; the output gt_boxes is in normalized coordinates
            gt_boxes = KL.Lambda(lambda x: norm_boxes_graph(
                x, K.shape(input_image)[1:3]))(input_gt_boxes)
            # Input layer for the GT masks
            if config.USE_MINI_MASK:
                input_gt_masks = KL.Input(shape=[config.MINI_MASK_SHAPE[0],
                           config.MINI_MASK_SHAPE[1], None], name="input_gt_masks", dtype=bool)
            else:
                input_gt_masks = KL.Input(shape=[config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1], None],name="input_gt_masks",dtype=bool)
        # In inference mode, the anchors are the only additional input needed
        elif mode == "inference":
            input_anchors = KL.Input(shape=[None, 4], name="input_anchors")

        # Build the shared convolutional backbone
        if callable(config.BACKBONE):
            _, C2, C3, C4, C5 = config.BACKBONE(input_image, stage5=True,
                                                train_bn=config.TRAIN_BN)
        else:
            _, C2, C3, C4, C5 = resnet_graph(input_image, config.BACKBONE,
                                             stage5=True, train_bn=config.TRAIN_BN)
        # With the multi-stage feature maps (the FPN feature pyramid) in hand, upsample and add them
        # top-down to obtain feature maps P2-P5 that combine semantic and spatial information.
        # P5: a 1x1 convolution on C5 reduces its channels to TOP_DOWN_PYRAMID_SIZE (the FPN feature depth)
        P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)
        # P4: upsample P5, apply a 1x1 convolution to C4, then add the two
        P4 = KL.Add(name="fpn_p4add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
        # P3: same procedure as P4
        P3 = KL.Add(name="fpn_p3add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
        # P2: same as above
        P2 = KL.Add(name="fpn_p2add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])
        # Apply a 3x3 convolution to every P level to get the final feature maps.
        # Note: padding="SAME" is used, so the spatial size does not change.
        P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
        P3 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
        P4 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
        P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
        # P6 is used for the 5th anchor scale in RPN. Generated by subsampling from P5 with stride of 2.
        # P6 provides the fifth anchor scale; it is obtained by subsampling P5 with stride 2
        P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)                                   

model.py——def resnet_graph(input_image, architecture, stage5=False, train_bn=True):

Builds the ResNet computation graph; the input is an image and the outputs are the feature maps of the five stages, C1-C5.

def resnet_graph(input_image, architecture, stage5=False, train_bn=True):
    """
    stage5: bool, whether to build the fifth convolution stage
    train_bn: bool, whether to train the BatchNorm layers
    """
    assert architecture in ["resnet50", "resnet101"]

    # Stage 1 (one conv + pooling)
    x = KL.ZeroPadding2D((3, 3))(input_image)
    x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
    x = BatchNorm(name='bn_conv1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    C1 = x = KL.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)

    # Stage 2 (one conv_block, two identity_blocks)
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_bn=train_bn)
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_bn=train_bn)
    C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_bn=train_bn)
	
    # Stage 3 (one conv_block, three identity_blocks)
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_bn=train_bn)
    C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_bn=train_bn)
    
    # Stage 4 (one conv_block, n identity_blocks)
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_bn=train_bn)
    # The number of identity_blocks depends on whether the backbone is resnet50 or resnet101
    block_count = {"resnet50": 5, "resnet101": 22}[architecture]
    for i in range(block_count):
        x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_bn=train_bn)
    C4 = x
    
    # Stage 5 (only built if stage5=True: one conv_block, two identity_blocks)
    if stage5:
        x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_bn=train_bn)
        x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_bn=train_bn)
        C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_bn=train_bn)
    else:
        C5 = None
    return [C1, C2, C3, C4, C5]

model.py——def conv_block(input_tensor, kernel_size, filters, stage, block,strides=(2, 2), use_bias=True, train_bn=True):

The residual block of ResNet. It contains four convolution layers and one add layer, plus the matching BN and activation layers; the first three convolutions extract features, while the fourth only projects the shortcut so the shapes match.
Viewed as a whole, what the conv_block learns is the residual between its input and output (the output is the input passed through three convolutions, added to a projected copy of the input itself); learning this residual instead of a direct mapping is the key idea of ResNet.
Parameters:
kernel_size: kernel size of the middle convolution layer, typically 3
filters: number of channels (filters) of each convolution layer
stage: which stage the block belongs to, used for naming
block: which block within the stage ('a', 'b', 'c', ...), used for naming
use_bias: whether the convolution layers use a bias term

def conv_block(input_tensor, kernel_size, filters, stage, block,
               strides=(2, 2), use_bias=True, train_bn=True):
    """Note that from stage 3, the first conv layer at main path is with subsample=(2,2)
    And the shortcut should have subsample=(2,2) as well
    (这句没有看懂,意思是应该在第一个conv前边加上subsample??,源码中并没有找到对应部分)
    """
    nb_filter1, nb_filter2, nb_filter3 = filters                # filter counts of the three conv layers
    conv_name_base = 'res' + str(stage) + block + '_branch'     # base names for the layers
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    x = KL.Conv2D(nb_filter1, (1, 1), strides=strides,          # 1x1 convolution
                  name=conv_name_base + '2a', use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',   # second conv (kernel_size x kernel_size)
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base +      # 1x1 convolution
                  '2c', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_bn)
    shortcut = KL.Conv2D(nb_filter3, (1, 1), strides=strides,    # 1x1 conv so the shortcut matches the shape of x
                         name=conv_name_base + '1', use_bias=use_bias)(input_tensor)
    shortcut = BatchNorm(name=bn_name_base + '1')(shortcut, training=train_bn)
    x = KL.Add()([x, shortcut])      # add the shortcut to the convolution output x
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x

model.py——def identity_block(input_tensor, kernel_size, filters, stage, block,use_bias=True, train_bn=True):

Another ResNet block. Unlike conv_block, the x obtained by passing input_tensor through the three convolutions can be added to input_tensor directly, so no 1x1 projection (the shortcut conv layer) is needed.

def identity_block(input_tensor, kernel_size, filters, stage, block, use_bias=True, train_bn=True):
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a',
                  use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
                  use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_bn)

    x = KL.Add()([x, input_tensor])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x

model.py——def get_anchors(self, image_shape):

Returns the set of anchors for the given image shape.

model.py——def compute_backbone_shapes(config, image_shape):

Computes the (h, w) shape of each stage's feature map from the model's backbone and the input image shape.
Inputs: config and image_shape
Output: [N, (h, w)], where N indexes the stage and (h, w) is that stage's feature map shape. A worked example follows the code below.

def compute_backbone_shapes(config, image_shape):

    # If the backbone is a callable (a user-supplied network rather than one defined in this project),
    # delegate to COMPUTE_BACKBONE_SHAPE directly. In this project the default backbone is ResNet101.
    if callable(config.BACKBONE):
        return config.COMPUTE_BACKBONE_SHAPE(image_shape)
    # Currently only ResNet is supported.
    # config.BACKBONE_STRIDES holds the five FPN strides of the ResNet backbone;
    # from these strides the feature map size after each stage can be computed.
    assert config.BACKBONE in ["resnet50", "resnet101"]
    return np.array(
        [[int(math.ceil(image_shape[0] / stride)),
            int(math.ceil(image_shape[1] / stride))]
            for stride in config.BACKBONE_STRIDES])
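
A hedged worked example for the default 1024x1024 input and BACKBONE_STRIDES = [4, 8, 16, 32, 64]:

import math
import numpy as np

strides = [4, 8, 16, 32, 64]
shapes = np.array([[math.ceil(1024 / s), math.ceil(1024 / s)] for s in strides])
print(shapes)   # -> 256x256, 128x128, 64x64, 32x32, 16x16 for the five levels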

To be continued.

utils.py——def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides, anchor_stride):

Generates the anchors for the different feature maps of the feature pyramid. Each scale is associated with one level of the pyramid, while every ratio is used at every level.
(In other words, each level's feature map uses a single scale combined with all the ratios.)
In the original Faster R-CNN setup, the single feature map used three scales and three ratios; with an FPN the scales are instead spread across the pyramid levels, so one scale with three ratios per level is the intended design rather than a discrepancy.
Returns:
[N, (y1, x1, y2, x2)], where N is the total number of anchors and the rest are corner coordinates.
Anchors for all scales are included, sorted in the order of the given scales: scale[0] first, then scale[1], and so on. A quick anchor-count example follows the code below.

def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
                             anchor_stride):
    """Generate anchors at different levels of a feature pyramid. Each scale
    is associated with a level of the pyramid, but each ratio is used in
    all levels of the pyramid.
    Returns:
    anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
        with the same order of the given scales. So, anchors of scale[0] come
        first, then anchors of scale[1], and so on.
    """
    # Anchors
    # [anchor_count, (y1, x1, y2, x2)]
    anchors = []
    # Use generate_anchors to build the anchors for each scale (the return value holds the corner
    # coordinates of every anchor in image space) and append the resulting (n, 4) numpy array to
    # the anchors list; every element of the list is therefore an (n, 4) array.
    for i in range(len(scales)):
        anchors.append(generate_anchors(scales[i], ratios, feature_shapes[i],
                                        feature_strides[i], anchor_stride))
    # Concatenate all the (n, 4) arrays along axis 0 into a single (N, 4) array
    # describing all N anchors.
    return np.concatenate(anchors, axis=0)
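
A hedged back-of-the-envelope count for the default configuration (five scales, three ratios, anchor_stride=1, 1024x1024 input):

feature_shapes = [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]  # P2..P6
num_ratios = 3                                   # RPN_ANCHOR_RATIOS = [0.5, 1, 2]
total = sum(h * w * num_ratios for h, w in feature_shapes)
print(total)                                     # 261888 anchors in total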

To be continued.

utils.py——def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):

Generates the anchors for a feature map of the given shape: first enumerate all possible heights, widths, and center positions, then compute the width/height and center of every box, and finally convert them to corner coordinates.
Inputs:
anchor scales, ratios, and anchor stride
feature map shape and feature stride. A small usage example follows the code below.

# Function that generates the anchors
def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
    """
    scales: 1D array of anchor sizes in pixels. Example: [32, 64, 128]
    ratios: 1D array of anchor ratios of width/height. Example: [0.5, 1, 2]
    shape: [height, width] spatial shape of the feature map over which
            to generate anchors.
    feature_stride: Stride of the feature map relative to the image in pixels.
    anchor_stride: Stride of anchors on the feature map. For example, if the
        value is 2 then generate anchors for every other feature map pixel.
    """
    # All combinations of scales and ratios (meshgrid enumerates every pairing of the two 1D arrays)
    scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
    scales = scales.flatten()
    ratios = ratios.flatten()

    # All possible heights and widths (each (h, w) pair appears only once here)
    heights = scales / np.sqrt(ratios)
    widths = scales * np.sqrt(ratios)

    # All anchor center coordinates in image space
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)

    # Widths and heights of every anchor (box_widths and box_heights now cover all anchors)
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)
    # Center coordinates and sizes of all anchors
    box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])
    
    # Convert (center, size) pairs to corner coordinates (y1, x1, y2, x2)
    boxes = np.concatenate([box_centers - 0.5 * box_sizes,
                            box_centers + 0.5 * box_sizes], axis=1)   
    return boxes
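
A hedged mini example: anchors for a single scale on a 2x2 feature map (values rounded):

boxes = generate_anchors(scales=[32], ratios=[0.5, 1, 2], shape=[2, 2],
                         feature_stride=4, anchor_stride=1)
print(boxes.shape)   # (12, 4): 2*2 positions * 3 ratios
print(boxes[0])      # ratio-0.5 anchor centered at (0, 0): roughly [-22.6, -11.3, 22.6, 11.3]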
