Mask R-CNN code framework structure

The code I studied is this repo: https://github.com/darolt/mask_rcnn, which uses PyTorch 1.0.

1. MaskRCNN

The goal of Mask R-CNN is to take an image as input and segment out the instances in it, i.e. instance segmentation.
The content below is split into two parts: dataloader and network.

2. Network

2.2 Backbone: ResNet + FPN

2.2.1 ResNet

This blog post explains the ResNet architecture in detail alongside the code.
This blog post visualizes the feature maps of each ResNet layer.
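As a refresher on what the backbone is built from, the core unit of ResNet-50/101 is the bottleneck residual block: a 1x1 channel-reducing conv, a 3x3 conv, and a 1x1 channel-expanding conv, plus a skip connection. The sketch below is a generic PyTorch version for illustration, not the repo's exact class:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    # 1x1 reduce -> 3x3 -> 1x1 expand; expansion factor 4 as in ResNet-50/101
    expansion = 4

    def __init__(self, in_ch, mid_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, mid_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_ch)
        self.conv3 = nn.Conv2d(mid_ch, mid_ch * self.expansion, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(mid_ch * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        # Projection shortcut when the spatial size or channel count changes
        self.downsample = None
        if stride != 1 or in_ch != mid_ch * self.expansion:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_ch, mid_ch * self.expansion, 1, stride=stride, bias=False),
                nn.BatchNorm2d(mid_ch * self.expansion))

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)
```

Stacking these blocks in stages (with a stride-2 block at each stage boundary) yields the C2..C5 feature maps that the FPN consumes.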

2.2.2 FPN

This article gives a decent introduction to FPN.
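The essence of FPN is a top-down pathway: each backbone stage is projected to a common channel width by a 1x1 lateral conv, the coarser map is upsampled and added in, and a 3x3 conv smooths the result. A minimal sketch, assuming C2..C5 channel counts from a ResNet-50/101 backbone (class and argument names here are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPN(nn.Module):
    """Minimal top-down FPN over backbone stages C2..C5."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_ch=256):
        super().__init__()
        # 1x1 lateral convs project every stage to the same channel width
        self.laterals = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_channels)
        # 3x3 convs smooth the merged maps to reduce upsampling aliasing
        self.smooth = nn.ModuleList(nn.Conv2d(out_ch, out_ch, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):  # feats: [C2, C3, C4, C5], highest resolution first
        laterals = [l(f) for l, f in zip(self.laterals, feats)]
        # Top-down: upsample the coarser map and add it to the lateral below
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], scale_factor=2, mode="nearest")
        return [s(p) for s, p in zip(self.smooth, laterals)]  # [P2, P3, P4, P5]
```

Each output level Pk keeps the spatial size of its Ck input, which is why the anchor strides below come in the fixed sequence [4, 8, 16, 32, 64].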

2.3 Generate_pyramid_anchors

    import numpy as np
    import torch as th

    def generate_anchors(scale, ratios, shape, feature_stride, anchor_stride):
        """
        scale: anchor size(s) in pixels. Example: 32 or [32, 64, 128]
        ratios: 1D array of anchor ratios of width/height. Example: [0.5, 1, 2]
        shape: [height, width] spatial shape of the feature map over which
            to generate anchors.
        feature_stride: Stride of the feature map relative to the image in pixels.
        anchor_stride: Stride of anchors on the feature map. For example, if the
            value is 2 then generate anchors for every other feature map pixel.
        """
        # Get all combinations of scale and ratios
        scales, ratios = np.meshgrid(np.array(scale), np.array(ratios))
        # e.g. scales -> array([[32], [32], [32]]), ratios -> array([[0.5], [1], [2]])
        scales = scales.flatten()           # [32, 32, 32]
        ratios = np.sqrt(ratios.flatten())  # array([0.70710678, 1., 1.41421356])
        # Enumerate heights and widths from scales and ratios
        heights = scales / ratios   # (3,)
        widths = scales * ratios    # (3,)
        # Enumerate shifts in feature space
        shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride  # (128,)
        shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride  # (128,)
        shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)  # [128,128], [128,128]
        # Meshgrid of shifts, widths, and heights
        box_widths, box_centers_x = np.meshgrid(widths, shifts_x)    # [128*128, 3]
        box_heights, box_centers_y = np.meshgrid(heights, shifts_y)  # [128*128, 3]
        # Reshape to get a list of (y, x) and a list of (h, w)
        box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])  # [128*128*3, 2]
        box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])         # [128*128*3, 2]
        # Convert to corner coordinates (y1, x1, y2, x2)
        boxes = np.concatenate([box_centers - 0.5 * box_sizes,
                                box_centers + 0.5 * box_sizes], axis=1)
        # print(boxes.shape)  # [128*128*3, 4]
        return boxes

    def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
                                 anchor_stride, batch_size):
        """Generate anchors at different levels of a feature pyramid. Each scale
        is associated with a level of the pyramid, but each ratio is used in all
        levels of the pyramid.
        SCALES: [32, 64, 128, 256, 512]
        RATIOS: [0.5, 1, 2]
        SHAPES: ~  # [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0]], set at runtime
        # The strides of each layer of the FPN pyramid, based on a ResNet-101 backbone.
        STRIDES: [4, 8, 16, 32, 64]
        STRIDE: 1
        Returns:
        anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array,
        sorted in the same order as the given scales. So anchors of scales[0]
        come first, then anchors of scales[1], and so on.
        """
        anchors = []
        for i, scale in enumerate(scales):
            anchors.append(generate_anchors(scale, ratios, feature_shapes[i],
                                            feature_strides[i], anchor_stride))
        # Concatenate and broadcast only after all pyramid levels are generated
        anchors = np.concatenate(anchors, axis=0)
        new_anchors_shape = (batch_size,) + anchors.shape
        anchors = np.broadcast_to(anchors, new_anchors_shape)
        return th.from_numpy(anchors).float()
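As a quick sanity check of the meshgrid construction above, the per-level logic can be exercised standalone. The helper name `anchors_for_level` below is illustrative; the body mirrors `generate_anchors` for a tiny 4x4 feature map:

```python
import numpy as np

def anchors_for_level(scale, ratios, shape, feature_stride, anchor_stride=1):
    # Same meshgrid construction as generate_anchors, trimmed for illustration
    scales, ratios = np.meshgrid(np.array([scale]), np.array(ratios))
    scales, ratios = scales.flatten(), np.sqrt(ratios.flatten())
    heights, widths = scales / ratios, scales * ratios
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)
    centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape(-1, 2)
    sizes = np.stack([box_heights, box_widths], axis=2).reshape(-1, 2)
    return np.concatenate([centers - 0.5 * sizes, centers + 0.5 * sizes], axis=1)

# One anchor per feature-map pixel per ratio: 4*4 positions * 3 ratios = 48 boxes
boxes = anchors_for_level(32, [0.5, 1, 2], (4, 4), feature_stride=4)
print(boxes.shape)  # (48, 4)
# The ratio-1 anchor at position (0, 0) is a 32x32 box centered on the origin
print(boxes[1])     # [-16. -16.  16.  16.]
```

Note that all anchors are in image-pixel coordinates (the feature-map shifts are multiplied by `feature_stride`), which is why the same function works for every pyramid level.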

2.4 RPN

For every anchor at each feature-level pixel, the RPN produces the proposal classification (a softmax over foreground/background) and the bbox regression values [(dy, dx, log(dh), log(dw))].
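The RPN head that produces these two outputs is small: a shared 3x3 conv followed by two 1x1 convs, one for the per-anchor objectness logits and one for the per-anchor box deltas. A hedged sketch (names and the 512 hidden width are illustrative, not necessarily the repo's exact values):

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    # Shared conv + per-anchor classification and regression heads.
    # anchors_per_location = len(RATIOS), e.g. 3; applied to each FPN level in turn.
    def __init__(self, in_ch=256, anchors_per_location=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 512, 3, padding=1)
        self.cls = nn.Conv2d(512, anchors_per_location * 2, 1)   # fg/bg logits
        self.bbox = nn.Conv2d(512, anchors_per_location * 4, 1)  # (dy, dx, log(dh), log(dw))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        shared = self.relu(self.conv(x))
        b = x.shape[0]
        # Flatten to [batch, H*W*anchors, 2] and [batch, H*W*anchors, 4],
        # matching the flat anchor ordering from generate_pyramid_anchors
        logits = self.cls(shared).permute(0, 2, 3, 1).reshape(b, -1, 2)
        probs = torch.softmax(logits, dim=-1)
        deltas = self.bbox(shared).permute(0, 2, 3, 1).reshape(b, -1, 4)
        return probs, deltas
```

Because the head weights are shared across pyramid levels, the same module is simply run once per level and the flattened outputs are concatenated in the same order as the anchors.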

3. Dataloader

The dataloader mainly defines two objects. One of them is the dataset_handler
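In PyTorch, such a handler is typically built on `torch.utils.data.Dataset`: `__getitem__` returns an image together with its per-instance masks and class IDs. The sketch below is a generic toy version for illustration only (it fabricates square "instances" instead of loading annotations, and is not the repo's dataset_handler):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class InstanceSegDataset(Dataset):
    """Toy instance-segmentation dataset: each sample is
    (image, masks, class_ids). A real handler would load images and
    annotations from disk instead of synthesizing them."""
    def __init__(self, num_samples=4, image_size=64, max_instances=2):
        self.num_samples = num_samples
        self.image_size = image_size
        self.max_instances = max_instances

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        s = self.image_size
        image = torch.zeros(3, s, s)
        masks = torch.zeros(self.max_instances, s, s)
        # Paint one square "instance" per mask slot at a deterministic offset
        for i in range(self.max_instances):
            y0 = (idx * 7 + i * 16) % (s - 16)
            masks[i, y0:y0 + 16, y0:y0 + 16] = 1.0
            image[:, y0:y0 + 16, y0:y0 + 16] = 0.5
        class_ids = torch.arange(1, self.max_instances + 1)
        return image, masks, class_ids

loader = DataLoader(InstanceSegDataset(), batch_size=2)
images, masks, class_ids = next(iter(loader))
print(images.shape, masks.shape)  # torch.Size([2, 3, 64, 64]) torch.Size([2, 2, 64, 64])
```

Fixing `max_instances` per sample (padding unused mask slots with zeros) keeps tensor shapes uniform so the default `DataLoader` collation works without a custom collate_fn.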
