The code I read is this repo: https://github.com/darolt/mask_rcnn, which uses PyTorch 1.0.
The goal of Mask R-CNN is to take an input image and segment out the instances in it, i.e. instance segmentation.
The content below is split into two parts: dataloader and network.
This blog post explains the structure of ResNet in detail along with its code.
This blog post visualizes the feature maps of each ResNet layer.
This article gives a decent introduction to FPN.
import numpy as np
import torch as th

def generate_anchors(scale, ratios, shape, feature_stride, anchor_stride):
    """
    scale: anchor size in pixels for one pyramid level. Example: 32
    ratios: 1D array of anchor ratios of width/height. Example: [0.5, 1, 2]
    shape: [height, width] spatial shape of the feature map over which to generate anchors.
    feature_stride: Stride of the feature map relative to the image, in pixels.
    anchor_stride: Stride of anchors on the feature map. For example, if the value is 2,
        generate anchors for every other feature map pixel.
    """
    # Get all combinations of scale and ratios
    scales, ratios = np.meshgrid(np.array(scale), np.array(ratios))  # array([[32], [32], [32]]), array([[0.5], [1], [2]])
    scales = scales.flatten()  # [32, 32, 32]
    ratios = np.sqrt(ratios.flatten())  # array([0.70710678, 1., 1.41421356])
    # Enumerate heights and widths from scales and ratios
    heights = scales / ratios  # (3,)
    widths = scales * ratios  # (3,)
    # Enumerate shifts in feature space
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride  # (128,)
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride  # (128,)
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)  # [128, 128], [128, 128]
    # Meshgrid of shifts, widths, and heights
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)  # [128*128, 3]
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)  # [128*128, 3]
    # Reshape to get a list of (y, x) and a list of (h, w)
    box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])  # [128*128*3, 2]
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])  # [128*128*3, 2]
    # Convert to corner coordinates (y1, x1, y2, x2)
    boxes = np.concatenate([box_centers - 0.5 * box_sizes, box_centers + 0.5 * box_sizes], axis=1)
    # print(boxes.shape)  # [128*128*3, 4]
    return boxes
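As a small sanity check, here is a usage sketch with values consistent with the in-line comments above (scale 32 on a 128x128 feature map, feature_stride 4, anchor_stride 1); the concrete numbers are illustrative only:

anchors_p2 = generate_anchors(scale=32, ratios=[0.5, 1, 2],
                              shape=[128, 128], feature_stride=4, anchor_stride=1)
print(anchors_p2.shape)  # (49152, 4) = (128*128*3, 4): one (y1, x1, y2, x2) box per pixel and ratio
print(anchors_p2[0])     # the first anchor, centered on image pixel (0, 0)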
def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides, anchor_stride, batch_size):
    """Generate anchors at different levels of a feature pyramid. Each scale is associated
    with one level of the pyramid, but each ratio is used at all levels.
    scales: anchor sizes in pixels, e.g. SCALES = [32, 64, 128, 256, 512]
    ratios: anchor width/height ratios, e.g. RATIOS = [0.5, 1, 2]
    feature_shapes: [height, width] of each FPN level; set at runtime
    feature_strides: stride of each FPN level relative to the image; for a ResNet101 backbone, STRIDES = [4, 8, 16, 32, 64]
    anchor_stride: stride of anchors on the feature map, here 1

    Returns:
    anchors: [batch_size, N, (y1, x1, y2, x2)]. All generated anchors in one array,
    sorted in the same order as the given scales, so anchors of scale[0] come first,
    then anchors of scale[1], and so on.
    """
    anchors = []
    for i, scale in enumerate(scales):
        anchors.append(generate_anchors(scale, ratios, feature_shapes[i], feature_strides[i], anchor_stride))
    anchors = np.concatenate(anchors, axis=0)
    new_anchors_shape = (batch_size,) + anchors.shape
    anchors = np.broadcast_to(anchors, new_anchors_shape)
    return th.from_numpy(anchors).float()
For the anchors at every pixel of each feature level, we then obtain a proposal classification (softmax) and the bbox refinement values [(dy, dx, log(dh), log(dw))].
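To turn those outputs into proposal boxes, the (dy, dx, log(dh), log(dw)) refinements are decoded against the anchors. Below is a minimal NumPy sketch of that decoding step; the helper name apply_box_deltas and the plain-NumPy form are mine for illustration, while the repo does the equivalent on torch tensors:

import numpy as np

def apply_box_deltas(boxes, deltas):
    """boxes: [N, (y1, x1, y2, x2)] anchors. deltas: [N, (dy, dx, log(dh), log(dw))]."""
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Shift the center and rescale the size by the predicted refinements
    center_y = center_y + deltas[:, 0] * height
    center_x = center_x + deltas[:, 1] * width
    height = height * np.exp(deltas[:, 2])
    width = width * np.exp(deltas[:, 3])
    # Convert back to corner coordinates
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    return np.stack([y1, x1, y1 + height, x1 + width], axis=1)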
The dataloader part mainly defines two objects. One is dataset_handler