Region Proposal Network

        • Region Proposal Network
        • Model configuration file overview
        • Network analysis
        • RPNHead
          • 1. Configuration input
          • 2. Code architecture overview (a macro-level look at the Head's network structure)
          • 3. Code analysis (a detailed walkthrough of the Head's initialization, starting from the statement self.rpn_head = build_head(rpn_head))
            • 1. Execute RPNHead's __init__ function
            • 2. Jump to AnchorHead's __init__ function
            • 3. Jump to BaseDenseHead's __init__ function
            • 4. Continue executing AnchorHead's initializer
            • 5. The bbox coder initializer referenced in step 4
            • 6. The classification loss initializer referenced in step 4
            • 7. The regression loss initializer referenced in step 4
            • 8. The assigner initializer referenced in step 4
            • 9. The sampler initializer referenced in step 4
            • 10. The anchor generator initializer referenced in step 4
          • 4. Head forward pass
            • 1. Forward pass of the RPN network
            • 2. Linked from 1, method 「一」: extract features with the backbone and neck
            • 3. Linked from 1, method 「二」: compute losses through the head
            • 4. Linked from 3, method 「三」: feed the FPN outputs into the head for forward inference
            • 5. Linked from 4, method 「四」: multi_apply uses a partial object to pass each of the FPN's (5,) feature maps into self.forward_single, then regroups the outputs, so the final return value is a tuple of (classification scores, bbox regressions)
            • 6. Linked from 5, method 「五」: rpn_conv first projects a single FPN feature map to the unified channel count, after which rpn_cls predicts classification scores of shape (N, num_anchors * num_classes, H, W) and rpn_reg predicts bbox regressions of shape (N, num_anchors * 4, H, W)
            • 7. Linked from 3, method 「六」: compute the loss from the gt boxes and outs
            • 8. Linked from 7, method 「七」: call the parent class's loss function
            • 9. Linked from 8, method 「八」: get anchors according to feature map sizes
            • 10. Linked from 9, method 「九」: generate anchors
            • 11. Linked from 10, method 「十」: generate anchors for a single feature map
            • 12. Linked from 9, method 「十一」: compute the valid flags
            • 13. Linked from 8, method 「十二」: get the positive and negative samples
            • 14. Linked from 13, method 「十三」: compute targets for multiple images at once (when batch_size > 1)
            • 15. Linked from 14, method 「十四」:
            • 16. Linked from 14, method 「十五」:
            • 17. Linked from 16, method 「十六」:
            • 18. Linked from 16, method 「十七」:
            • 19. Linked from 18, method 「十八」:
            • 20. Linked from 14, method 「十九」:
            • 21. Linked from 20, method 「二十」:
            • 22. Linked from 20, method 「二十一」:
            • 23. Linked from 20, method 「二十二」:
            • 24. Linked from 14, method 「二十三」:
            • 25. Linked from 24, method 「二十四」:
        • Full pipeline diagram of the RPN network
          • 1. RPN network initialization
        • Reference

Region Proposal Network

In the RPN network we first generate region proposal boxes. Earlier algorithms such as R-CNN relied on selective search to generate them, which is computationally expensive and needs manual tuning. RPN, the subject of this post, instead generates region proposals quickly from anchors. Once the proposals exist, each one is assigned a positive or negative label (Assign), then sampled to balance the positives and negatives (Sample), and finally classified; this classification branch is one of the two main branches of the detection head, the other being the regression branch that predicts the width, height and center offsets of the bounding boxes (as in SSD).
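
The whole training-time pipeline can be summarized in a few lines of pseudo-Python. This is only an illustrative sketch of the steps named above; every function name below is a hypothetical placeholder, not the mmdet API:

# Illustrative sketch of one RPN training step; each call stands in for a
# component that the rest of this post analyzes in detail.
def rpn_train_step(images, gt_bboxes):
    feats = backbone_and_neck(images)                   # multi-scale feature maps
    cls_scores, bbox_deltas = rpn_head(feats)           # dense per-anchor predictions
    anchors = generate_anchors(feats)                   # anchors for every level
    assign_result = assign_pos_neg(anchors, gt_bboxes)  # Assign: label anchors by IoU
    sample_result = balance_samples(assign_result)      # Sample: balance pos/neg
    loss_cls = classification_loss(cls_scores, sample_result)  # classification branch
    loss_bbox = regression_loss(bbox_deltas, sample_result)    # regression branch
    return loss_cls + loss_bbox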

Model configuration file overview

model = dict(
    type='RPN',
    pretrained='torchvision://resnet50',  # use an ImageNet-pretrained ResNet-50
    backbone=dict(
        type='ResNet',
        depth=50,  # 50-layer variant
        num_stages=4,  # the backbone has 4 stages
        out_indices=(0, 1, 2, 3),  # stage indices whose feature maps are output; here, all four stages
        frozen_stages=1,  # freeze the stem and the first stage of the pretrained backbone
        norm_cfg=dict(type='BN', requires_grad=True),  # normalization config; use SyncBN for multi-GPU training
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],  # channels of the backbone's multi-scale feature maps
        out_channels=256,  # channels of each enhanced output map
        num_outs=5),  # number of multi-scale output feature maps
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,  # matches the FPN out_channels
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)))
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        debug=False))
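
With this config, every feature-map location carries len(scales) × len(ratios) = 1 × 3 = 3 anchors. As a quick hand-rolled sanity check of the anchor budget (assuming an 800×800 input and floor division; the real feature sizes depend on padding):

scales, ratios = [8], [0.5, 1.0, 2.0]
strides = [4, 8, 16, 32, 64]
num_base = len(scales) * len(ratios)  # 3 anchors per location

total = 0
for s in strides:
    h = w = 800 // s          # feature map size at this level
    total += h * w * num_base
print(num_base, total)        # 3, 159807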

Network analysis

  • A brief description of the network, based on the config:

Component      Name
Backbone       ResNet
Neck           FPN
Head           RPNHead
BBox Assigner  MaxIoUAssigner
BBox Sampler   RandomSampler
BBox Encoder   DeltaXYWHBBoxCoder
Loss           loss_cls (CrossEntropyLoss), loss_bbox (L1Loss)
  • What problem does RPN solve?

RPN first appeared in Faster R-CNN, as a module dedicated to extracting proposals. In earlier detection architectures such as R-CNN and Fast R-CNN, proposals were usually extracted with Selective Search, a traditional method that is slow (roughly 2 s per image on a CPU). The authors therefore proposed RPN: it is fast, and it can easily be merged with Fast R-CNN into a single network.

  • What innovations does it introduce?
  • What shortcomings remain?
  • What are its application scenarios?

RPNHead

1. Configuration input
rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0))
2. Code architecture overview (a macro-level look at the Head's network structure)

1. rpn_head.py

class RPNHead(RPNTestMixin, AnchorHead):
    """RPN head.

    Args:
        in_channels (int): Number of channels in the input feature map.
    """  # noqa: W605
	
    # Initialize the RPN head; inherits from RPNTestMixin and AnchorHead
    def __init__(self, in_channels, **kwargs):
        super(RPNHead, self).__init__(1, in_channels, **kwargs)

    def _init_layers(self):
        """Initialize layers of the head."""
        ...

    def init_weights(self):
        """Initialize weights of the head."""
        ...

    def forward_single(self, x):
        """Forward feature map of a single scale level."""
        ...
        return rpn_cls_score, rpn_bbox_pred

    def loss(self,
             cls_scores,
             bbox_preds,
             gt_bboxes,
             img_metas,
             gt_bboxes_ignore=None):
        """Compute losses of the head.

        Args:
            cls_scores (list[Tensor]): Box scores for each scale level
                Has shape (N, num_anchors * num_classes, H, W)
            bbox_preds (list[Tensor]): Box energies / deltas for each scale
                level with shape (N, num_anchors * 4, H, W)
            gt_bboxes (list[Tensor]): Ground truth bboxes for each image with
                shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
            img_metas (list[dict]): Meta information of each image, e.g.,
                image size, scaling factor, etc.
            gt_bboxes_ignore (None | list[Tensor]): specify which bounding
                boxes can be ignored when computing the loss.

        Returns:
            dict[str, Tensor]: A dictionary of loss components.
        """
        ...
        return dict(
            loss_rpn_cls=losses['loss_cls'], loss_rpn_bbox=losses['loss_bbox'])

    def _get_bboxes_single(self,
                           cls_scores,
                           bbox_preds,
                           mlvl_anchors,
                           img_shape,
                           scale_factor,
                           cfg,
                           rescale=False):
        """Transform outputs for a single batch item into bbox predictions.

        Args:
            cls_scores (list[Tensor]): Box scores for each scale level
                Has shape (num_anchors * num_classes, H, W).
            bbox_preds (list[Tensor]): Box energies / deltas for each scale
                level with shape (num_anchors * 4, H, W).
            mlvl_anchors (list[Tensor]): Box reference for each scale level
                with shape (num_total_anchors, 4).
            img_shape (tuple[int]): Shape of the input image,
                (height, width, 3).
            scale_factor (ndarray): Scale factor of the image arange as
                (w_scale, h_scale, w_scale, h_scale).
            cfg (mmcv.Config): Test / postprocessing configuration,
                if None, test_cfg would be used.
            rescale (bool): If True, return boxes in original image space.

        Returns:
            Tensor: Labeled boxes in shape (n, 5), where the first 4 columns
                are bounding box positions (tl_x, tl_y, br_x, br_y) and the
                5-th column is a score between 0 and 1.
        """
        ...
        return dets[:cfg.nms_post]

2. anchor_head.py

@HEADS.register_module()
class AnchorHead(BaseDenseHead, BBoxTestMixin):
    """Anchor-based head (RPN, RetinaNet, SSD, etc.).

    Args:
        num_classes (int): Number of categories excluding the background
            category.
        in_channels (int): Number of channels in the input feature map.
        feat_channels (int): Number of hidden channels. Used in child classes.
        anchor_generator (dict): Config dict for anchor generator
        bbox_coder (dict): Config of bounding box coder.
        reg_decoded_bbox (bool): If true, the regression loss would be
            applied on decoded bounding boxes. Default: False
        loss_cls (dict): Config of classification loss.
        loss_bbox (dict): Config of localization loss.
        train_cfg (dict): Training config of anchor head.
        test_cfg (dict): Testing config of anchor head.
    """  # noqa: W605
    # Inherits from BaseDenseHead and BBoxTestMixin
    def __init__(self,
                 num_classes,
                 in_channels,
                 feat_channels=256,
                 anchor_generator=dict(
                     type='AnchorGenerator',
                     scales=[8, 16, 32],
                     ratios=[0.5, 1.0, 2.0],
                     strides=[4, 8, 16, 32, 64]),
                 bbox_coder=dict(
                     type='DeltaXYWHBBoxCoder',
                     clip_border=True,
                     target_means=(.0, .0, .0, .0),
                     target_stds=(1.0, 1.0, 1.0, 1.0)),
                 reg_decoded_bbox=False,
                 loss_cls=dict(
                     type='CrossEntropyLoss',
                     use_sigmoid=True,
                     loss_weight=1.0),
                 loss_bbox=dict(
                     type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
                 train_cfg=None,
                 test_cfg=None):

    def _init_layers(self):
        """Initialize layers of the head."""

    def init_weights(self):
        """Initialize weights of the head."""

    def forward_single(self, x):
        """Forward feature of a single scale level.

        Args:
            x (Tensor): Features of a single scale level.

        Returns:
            tuple:
                cls_score (Tensor): Cls scores for a single scale level \
                    the channels number is num_anchors * num_classes.
                bbox_pred (Tensor): Box energies / deltas for a single scale \
                    level, the channels number is num_anchors * 4.
        """
        ...
        return cls_score, bbox_pred

    def forward(self, feats):
        """Forward features from the upstream network.

        Args:
            feats (tuple[Tensor]): Features from the upstream network, each is
                a 4D-tensor.

        Returns:
            tuple: A tuple of classification scores and bbox prediction.

                - cls_scores (list[Tensor]): Classification scores for all \
                    scale levels, each is a 4D-tensor, the channels number \
                    is num_anchors * num_classes.
                - bbox_preds (list[Tensor]): Box energies / deltas for all \
                    scale levels, each is a 4D-tensor, the channels number \
                    is num_anchors * 4.
        """
        return multi_apply(self.forward_single, feats)

    def get_anchors(self, featmap_sizes, img_metas, device='cuda'):
        """Get anchors according to feature map sizes.

        Args:
            featmap_sizes (list[tuple]): Multi-level feature map sizes.
            img_metas (list[dict]): Image meta info.
            device (torch.device | str): Device for returned tensors

        Returns:
            tuple:
                anchor_list (list[Tensor]): Anchors of each image.
                valid_flag_list (list[Tensor]): Valid flags of each image.
        """
        ...

        return anchor_list, valid_flag_list  # valid_flags marks the anchors that lie inside the padded image

    def _get_targets_single(self,
                            flat_anchors,
                            valid_flags,
                            gt_bboxes,
                            gt_bboxes_ignore,
                            gt_labels,
                            img_meta,
                            label_channels=1,
                            unmap_outputs=True):
        """Compute regression and classification targets for anchors in a
        single image.

        Args:
            flat_anchors (Tensor): Multi-level anchors of the image, which are
                concatenated into a single tensor of shape (num_anchors ,4)
            valid_flags (Tensor): Multi level valid flags of the image,
                which are concatenated into a single tensor of
                    shape (num_anchors,).
            gt_bboxes (Tensor): Ground truth bboxes of the image,
                shape (num_gts, 4).
            img_meta (dict): Meta info of the image.
            gt_bboxes_ignore (Tensor): Ground truth bboxes to be
                ignored, shape (num_ignored_gts, 4).
            img_meta (dict): Meta info of the image.
            gt_labels (Tensor): Ground truth labels of each box,
                shape (num_gts,).
            label_channels (int): Channel of label.
            unmap_outputs (bool): Whether to map outputs back to the original
                set of anchors.

        Returns:
            tuple:
                labels_list (list[Tensor]): Labels of each level
                label_weights_list (list[Tensor]): Label weights of each level
                bbox_targets_list (list[Tensor]): BBox targets of each level
                bbox_weights_list (list[Tensor]): BBox weights of each level
                num_total_pos (int): Number of positive samples in all images
                num_total_neg (int): Number of negative samples in all images
        """
        ...

        return (labels, label_weights, bbox_targets, bbox_weights, pos_inds,
                neg_inds, sampling_result)

    def get_targets(self,
                    anchor_list,
                    valid_flag_list,
                    gt_bboxes_list,
                    img_metas,
                    gt_bboxes_ignore_list=None,
                    gt_labels_list=None,
                    label_channels=1,
                    unmap_outputs=True,
                    return_sampling_results=False):
        """Compute regression and classification targets for anchors in
        multiple images.

        Args:
            anchor_list (list[list[Tensor]]): Multi level anchors of each
                image. The outer list indicates images, and the inner list
                corresponds to feature levels of the image. Each element of
                the inner list is a tensor of shape (num_anchors, 4).
            valid_flag_list (list[list[Tensor]]): Multi level valid flags of
                each image. The outer list indicates images, and the inner list
                corresponds to feature levels of the image. Each element of
                the inner list is a tensor of shape (num_anchors, )
            gt_bboxes_list (list[Tensor]): Ground truth bboxes of each image.
            img_metas (list[dict]): Meta info of each image.
            gt_bboxes_ignore_list (list[Tensor]): Ground truth bboxes to be
                ignored.
            gt_labels_list (list[Tensor]): Ground truth labels of each box.
            label_channels (int): Channel of label.
            unmap_outputs (bool): Whether to map outputs back to the original
                set of anchors.

        Returns:
            tuple: Usually returns a tuple containing learning targets.

                - labels_list (list[Tensor]): Labels of each level.
                - label_weights_list (list[Tensor]): Label weights of each \
                    level.
                - bbox_targets_list (list[Tensor]): BBox targets of each level.
                - bbox_weights_list (list[Tensor]): BBox weights of each level.
                - num_total_pos (int): Number of positive samples in all \
                    images.
                - num_total_neg (int): Number of negative samples in all \
                    images.
            additional_returns: This function enables user-defined returns from
                `self._get_targets_single`. These returns are currently refined
                to properties at each feature map (i.e. having HxW dimension).
                The results will be concatenated after the end
        """
        ...
        return res + tuple(rest_results)

    def loss_single(self, cls_score, bbox_pred, anchors, labels, label_weights,
                    bbox_targets, bbox_weights, num_total_samples):
        """Compute loss of a single scale level.

        Args:
            cls_score (Tensor): Box scores for each scale level
                Has shape (N, num_anchors * num_classes, H, W).
            bbox_pred (Tensor): Box energies / deltas for each scale
                level with shape (N, num_anchors * 4, H, W).
            anchors (Tensor): Box reference for each scale level with shape
                (N, num_total_anchors, 4).
            labels (Tensor): Labels of each anchors with shape
                (N, num_total_anchors).
            label_weights (Tensor): Label weights of each anchor with shape
                (N, num_total_anchors)
            bbox_targets (Tensor): BBox regression targets of each anchor wight
                shape (N, num_total_anchors, 4).
            bbox_weights (Tensor): BBox regression loss weights of each anchor
                with shape (N, num_total_anchors, 4).
            num_total_samples (int): If sampling, num total samples equal to
                the number of total anchors; Otherwise, it is the number of
                positive anchors.

        Returns:
            dict[str, Tensor]: A dictionary of loss components.
        """
        ...
        return loss_cls, loss_bbox

    @force_fp32(apply_to=('cls_scores', 'bbox_preds'))
    def loss(self,
             cls_scores,
             bbox_preds,
             gt_bboxes,
             gt_labels,
             img_metas,
             gt_bboxes_ignore=None):
        """Compute losses of the head.

        Args:
            cls_scores (list[Tensor]): Box scores for each scale level
                Has shape (N, num_anchors * num_classes, H, W)
            bbox_preds (list[Tensor]): Box energies / deltas for each scale
                level with shape (N, num_anchors * 4, H, W)
            gt_bboxes (list[Tensor]): Ground truth bboxes for each image with
                shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
            gt_labels (list[Tensor]): class indices corresponding to each box
            img_metas (list[dict]): Meta information of each image, e.g.,
                image size, scaling factor, etc.
            gt_bboxes_ignore (None | list[Tensor]): specify which bounding
                boxes can be ignored when computing the loss. Default: None

        Returns:
            dict[str, Tensor]: A dictionary of loss components.
        """
        ...
        return dict(loss_cls=losses_cls, loss_bbox=losses_bbox)

    @force_fp32(apply_to=('cls_scores', 'bbox_preds'))
    def get_bboxes(self,
                   cls_scores,
                   bbox_preds,
                   img_metas,
                   cfg=None,
                   rescale=False,
                   with_nms=True):
        """Transform network output for a batch into bbox predictions.

        Args:
            cls_scores (list[Tensor]): Box scores for each scale level
                Has shape (N, num_anchors * num_classes, H, W)
            bbox_preds (list[Tensor]): Box energies / deltas for each scale
                level with shape (N, num_anchors * 4, H, W)
            img_metas (list[dict]): Meta information of each image, e.g.,
                image size, scaling factor, etc.
            cfg (mmcv.Config | None): Test / postprocessing configuration,
                if None, test_cfg would be used
            rescale (bool): If True, return boxes in original image space.
                Default: False.
            with_nms (bool): If True, do nms before return boxes.
                Default: True.

        Returns:
            list[tuple[Tensor, Tensor]]: Each item in result_list is 2-tuple.
                The first item is an (n, 5) tensor, where the first 4 columns
                are bounding box positions (tl_x, tl_y, br_x, br_y) and the
                5-th column is a score between 0 and 1. The second item is a
                (n,) tensor where each item is the predicted class label of the
                corresponding box.

        Example:
            >>> import mmcv
            >>> self = AnchorHead(
            >>>     num_classes=9,
            >>>     in_channels=1,
            >>>     anchor_generator=dict(
            >>>         type='AnchorGenerator',
            >>>         scales=[8],
            >>>         ratios=[0.5, 1.0, 2.0],
            >>>         strides=[4,]))
            >>> img_metas = [{'img_shape': (32, 32, 3), 'scale_factor': 1}]
            >>> cfg = mmcv.Config(dict(
            >>>     score_thr=0.00,
            >>>     nms=dict(type='nms', iou_thr=1.0),
            >>>     max_per_img=10))
            >>> feat = torch.rand(1, 1, 3, 3)
            >>> cls_score, bbox_pred = self.forward_single(feat)
            >>> # note the input lists are over different levels, not images
            >>> cls_scores, bbox_preds = [cls_score], [bbox_pred]
            >>> result_list = self.get_bboxes(cls_scores, bbox_preds,
            >>>                               img_metas, cfg)
            >>> det_bboxes, det_labels = result_list[0]
            >>> assert len(result_list) == 1
            >>> assert det_bboxes.shape[1] == 5
            >>> assert len(det_bboxes) == len(det_labels) == cfg.max_per_img
        """
        ...
        return result_list

    def _get_bboxes_single(self,
                           cls_score_list,
                           bbox_pred_list,
                           mlvl_anchors,
                           img_shape,
                           scale_factor,
                           cfg,
                           rescale=False,
                           with_nms=True):
        """Transform outputs for a single batch item into bbox predictions.

        Args:
            cls_score_list (list[Tensor]): Box scores for a single scale level
                Has shape (num_anchors * num_classes, H, W).
            bbox_pred_list (list[Tensor]): Box energies / deltas for a single
                scale level with shape (num_anchors * 4, H, W).
            mlvl_anchors (list[Tensor]): Box reference for a single scale level
                with shape (num_total_anchors, 4).
            img_shape (tuple[int]): Shape of the input image,
                (height, width, 3).
            scale_factor (ndarray): Scale factor of the image arange as
                (w_scale, h_scale, w_scale, h_scale).
            cfg (mmcv.Config): Test / postprocessing configuration,
                if None, test_cfg would be used.
            rescale (bool): If True, return boxes in original image space.
                Default: False.
            with_nms (bool): If True, do nms before return boxes.
                Default: True.

        Returns:
            Tensor: Labeled boxes in shape (n, 5), where the first 4 columns
                are bounding box positions (tl_x, tl_y, br_x, br_y) and the
                5-th column is a score between 0 and 1.
        """
        ...
        if with_nms:
            det_bboxes, det_labels = multiclass_nms(mlvl_bboxes, mlvl_scores,
                                                    cfg.score_thr, cfg.nms,
                                                    cfg.max_per_img)
            return det_bboxes, det_labels
        else:
            return mlvl_bboxes, mlvl_scores

    def aug_test(self, feats, img_metas, rescale=False):
        """Test function with test time augmentation.

        Args:
            feats (list[Tensor]): the outer list indicates test-time
                augmentations and inner Tensor should have a shape NxCxHxW,
                which contains features for all images in the batch.
            img_metas (list[list[dict]]): the outer list indicates test-time
                augs (multiscale, flip, etc.) and the inner list indicates
                images in a batch. each dict has image information.
            rescale (bool, optional): Whether to rescale the results.
                Defaults to False.

        Returns:
            list[ndarray]: bbox results of each class
        """
        return self.aug_test_bboxes(feats, img_metas, rescale=rescale)

3. base_dense_head.py

class BaseDenseHead(nn.Module, metaclass=ABCMeta):
    """Base class for DenseHeads."""

    def __init__(self):
        super(BaseDenseHead, self).__init__()

    @abstractmethod
    def loss(self, **kwargs):
        """Compute losses of the head."""
        pass

    @abstractmethod
    def get_bboxes(self, **kwargs):
        """Transform network output for a batch into bbox predictions."""
        pass

    def forward_train(self,
                      x,
                      img_metas,
                      gt_bboxes,
                      gt_labels=None,
                      gt_bboxes_ignore=None,
                      proposal_cfg=None,
                      **kwargs):
        """
        Args:
            x (list[Tensor]): Features from FPN.
            img_metas (list[dict]): Meta information of each image, e.g.,
                image size, scaling factor, etc.
            gt_bboxes (Tensor): Ground truth bboxes of the image,
                shape (num_gts, 4).
            gt_labels (Tensor): Ground truth labels of each box,
                shape (num_gts,).
            gt_bboxes_ignore (Tensor): Ground truth bboxes to be
                ignored, shape (num_ignored_gts, 4).
            proposal_cfg (mmcv.Config): Test / postprocessing configuration,
                if None, test_cfg would be used

        Returns:
            tuple:
                losses: (dict[str, Tensor]): A dictionary of loss components.
                proposal_list (list[Tensor]): Proposals of each image.
        """
        outs = self(x)  # 得到卷积后的输出
        if gt_labels is None:
            loss_inputs = outs + (gt_bboxes, img_metas)
        else:
            loss_inputs = outs + (gt_bboxes, gt_labels, img_metas)
        losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
        if proposal_cfg is None:
            return losses
        else:
            proposal_list = self.get_bboxes(*outs, img_metas, cfg=proposal_cfg)
            return losses, proposal_list
3. Code analysis (a detailed walkthrough of the Head's initialization, starting from the statement self.rpn_head = build_head(rpn_head))

Initialization of RPNHead

1. Execute RPNHead's __init__ function
class RPNHead(RPNTestMixin, AnchorHead):
    # in_channels (int) is the channel count of the input feature map;
    # num_classes is fixed to 1, then AnchorHead's initializer takes over
    def __init__(self, in_channels, **kwargs):
        super(RPNHead, self).__init__(1, in_channels, **kwargs)
2. Jump to AnchorHead's __init__ function
class AnchorHead(BaseDenseHead, BBoxTestMixin):
    def __init__(self,
                 num_classes,  # number of classes
                 in_channels,  # channels of the input feature map
                 feat_channels=256,
                 anchor_generator=dict(  # anchor generator config
                     type='AnchorGenerator',
                     scales=[8, 16, 32],
                     ratios=[0.5, 1.0, 2.0],
                     strides=[4, 8, 16, 32, 64]),
                 bbox_coder=dict(  # bbox coder config
                     type='DeltaXYWHBBoxCoder',
                     clip_border=True,
                     target_means=(.0, .0, .0, .0),
                     target_stds=(1.0, 1.0, 1.0, 1.0)),
                 reg_decoded_bbox=False,
                 loss_cls=dict(  # classification loss
                     type='CrossEntropyLoss',
                     use_sigmoid=True,
                     loss_weight=1.0),
                 loss_bbox=dict(  # regression loss
                     type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
                 train_cfg=None,  # carries the Assigner and Sampler configs (training mode)
                 test_cfg=None):  # carries the BBox decoding and post-processing configs (test mode)
        super(AnchorHead, self).__init__()
3. Jump to BaseDenseHead's __init__ function
class BaseDenseHead(nn.Module, metaclass=ABCMeta):
    """Base class for DenseHeads."""
    # BaseDenseHead is an abstract class that mainly declares the loss,
    # get_bboxes and forward_train methods
    def __init__(self):
        super(BaseDenseHead, self).__init__()
4. Continue executing AnchorHead's initializer
class AnchorHead(BaseDenseHead, BBoxTestMixin):
    # plain attribute assignments
    self.in_channels = in_channels
    self.num_classes = num_classes
    self.feat_channels = feat_channels
    self.use_sigmoid_cls = loss_cls.get('use_sigmoid', False)  # whether classification uses sigmoid
    # TODO better way to determine whether sample or not
    # Losses such as FocalLoss weight all anchors internally, so no explicit
    # positive/negative sampling is needed for them
    self.sampling = loss_cls['type'] not in [
        'FocalLoss', 'GHMC', 'QualityFocalLoss'
    ]
    if self.use_sigmoid_cls:
        self.cls_out_channels = num_classes
    else:
        self.cls_out_channels = num_classes + 1

    if self.cls_out_channels <= 0:
        raise ValueError(f'num_classes={num_classes} is too small')
    self.reg_decoded_bbox = reg_decoded_bbox

    # Initialize the bbox coder (see 5 below): encodes the targets that the loss is computed on
    self.bbox_coder = build_bbox_coder(bbox_coder)
    self.loss_cls = build_loss(loss_cls)  # classification loss (see 6 below)
    self.loss_bbox = build_loss(loss_bbox)  # bbox regression loss (see 7 below)
    self.train_cfg = train_cfg  # training config, holding the Assigner and Sampler settings
    self.test_cfg = test_cfg  # test config, holding the BBox decoding and post-processing settings
    if self.train_cfg:
        # Classification samples are heavily imbalanced: anchors are first
        # assigned as positive/negative, then the imbalance is mitigated,
        # e.g. by sampling
        self.assigner = build_assigner(self.train_cfg.assigner)  # positive/negative assigner (see 8 below)
        # use PseudoSampler when sampling is False
        # fall back to PseudoSampler when no sampler is configured
        if self.sampling and hasattr(self.train_cfg, 'sampler'):
            sampler_cfg = self.train_cfg.sampler
        else:
            sampler_cfg = dict(type='PseudoSampler')
        self.sampler = build_sampler(sampler_cfg, context=self)  # sampler (see 9 below)
    self.fp16_enabled = False

    self.anchor_generator = build_anchor_generator(anchor_generator)  # anchor generator (see 10 below)
    # usually the numbers of anchors for each level are the same
    # except SSD detectors
    # num_base_anchors returns one count per level (a list of len(base_sizes)
    # entries); all levels share the same count here, so the first entry suffices
    self.num_anchors = self.anchor_generator.num_base_anchors[0]
    self._init_layers()  # builds the RPN's three conv layers: shared conv, classification conv, regression conv

    # Initialize the network layers
    def _init_layers(self):
        """Initialize layers of the head."""
        # rpn_conv projects feature maps with differing channel counts to a
        # unified number of channels, self.feat_channels
        self.rpn_conv = nn.Conv2d(
            self.in_channels, self.feat_channels, 3, padding=1)
        # rpn_cls performs classification on the rpn_conv output of the RPNHead
        # forward pass; its output has num_anchors * cls_out_channels channels
        self.rpn_cls = nn.Conv2d(self.feat_channels,
                                 self.num_anchors * self.cls_out_channels, 1)
        # rpn_reg predicts the bbox regression deltas on the rpn_conv output;
        # its output has num_anchors * 4 channels (dx, dy, dw, dh)
        self.rpn_reg = nn.Conv2d(self.feat_channels, self.num_anchors * 4, 1)
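
Plugging in the numbers from the RPN config above (num_anchors = len(scales) × len(ratios) = 3, and cls_out_channels = 1 because use_sigmoid=True), the three layers come out as in this sketch:

import torch.nn as nn

in_channels = feat_channels = 256
num_anchors, cls_out_channels = 3, 1  # 3 = 1 scale x 3 ratios; 1 sigmoid class

rpn_conv = nn.Conv2d(in_channels, feat_channels, 3, padding=1)  # shared 3x3 conv
rpn_cls = nn.Conv2d(feat_channels, num_anchors * cls_out_channels, 1)  # 3 output channels
rpn_reg = nn.Conv2d(feat_channels, num_anchors * 4, 1)  # 12 output channels
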
5. The bbox coder initializer referenced in step 4
class DeltaXYWHBBoxCoder(BaseBBoxCoder):
    def __init__(self,
                 target_means=(0., 0., 0., 0.),
                 target_stds=(1., 1., 1., 1.),
                 clip_border=True):
        super(BaseBBoxCoder, self).__init__()  # inherits from BaseBBoxCoder (which declares the encode and decode interfaces)
        self.means = target_means  # denormalizing means for the delta coordinate targets
        self.stds = target_stds  # denormalizing standard deviations for the delta coordinate targets
        self.clip_border = clip_border  # whether to clip boxes that extend beyond the image border; defaults to True
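
For reference, this coder implements the standard Faster R-CNN delta parameterization. A minimal sketch of the encode direction (boxes in [x1, y1, x2, y2] format; the helper below is written here for illustration, it is not the mmdet function itself):

import torch

def delta_encode(proposals, gts, means=(0., 0., 0., 0.), stds=(1., 1., 1., 1.)):
    """Sketch of what DeltaXYWHBBoxCoder.encode computes."""
    pw, ph = proposals[:, 2] - proposals[:, 0], proposals[:, 3] - proposals[:, 1]
    px, py = proposals[:, 0] + 0.5 * pw, proposals[:, 1] + 0.5 * ph
    gw, gh = gts[:, 2] - gts[:, 0], gts[:, 3] - gts[:, 1]
    gx, gy = gts[:, 0] + 0.5 * gw, gts[:, 1] + 0.5 * gh

    dx, dy = (gx - px) / pw, (gy - py) / ph        # center offsets, relative to proposal size
    dw, dh = torch.log(gw / pw), torch.log(gh / ph)  # log-scale size ratios
    deltas = torch.stack([dx, dy, dw, dh], dim=-1)
    # normalize with the configured means / stds (the identity in this config)
    return (deltas - deltas.new_tensor(means)) / deltas.new_tensor(stds)
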
6. The classification loss initializer referenced in step 4
class CrossEntropyLoss(nn.Module):

    def __init__(self,
                 use_sigmoid=False,
                 use_mask=False,
                 reduction='mean',
                 class_weight=None,
                 loss_weight=1.0):
        """CrossEntropyLoss.

        Args:
            use_sigmoid (bool, optional): Whether the prediction uses sigmoid
                or softmax. Defaults to False.
            use_mask (bool, optional): Whether to use mask cross entropy loss.
                Defaults to False.
            reduction (str, optional): . Defaults to 'mean'.
                Options are "none", "mean" and "sum".
            class_weight (list[float], optional): Weight of each class.
                Defaults to None.
            loss_weight (float, optional): Weight of the loss. Defaults to 1.0.
        """
        super(CrossEntropyLoss, self).__init__()
        assert (use_sigmoid is False) or (use_mask is False)
        self.use_sigmoid = use_sigmoid
        self.use_mask = use_mask
        self.reduction = reduction
        self.loss_weight = loss_weight
        self.class_weight = class_weight

        if self.use_sigmoid:
            self.cls_criterion = binary_cross_entropy  # binary cross-entropy
        elif self.use_mask:
            self.cls_criterion = mask_cross_entropy  # mask cross-entropy (binary cross-entropy on masks)
        else:
            self.cls_criterion = cross_entropy  # multi-class cross-entropy
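
Because the RPN config sets use_sigmoid=True, each anchor is scored with a single objectness logit and trained with binary cross-entropy. A minimal sketch of that criterion on a batch of sampled anchors (this abstracts away mmdet's internal label conventions):

import torch
import torch.nn.functional as F

logits = torch.randn(256)                      # one objectness logit per sampled anchor
targets = torch.randint(0, 2, (256,)).float()  # 1 = object, 0 = background (illustrative)
loss = F.binary_cross_entropy_with_logits(logits, targets, reduction='mean')
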
7. The regression loss initializer referenced in step 4
class L1Loss(nn.Module):
    """L1 loss.

    Args:
        reduction (str, optional): The method to reduce the loss.
            Options are "none", "mean" and "sum".
        loss_weight (float, optional): The weight of loss.
    """

    def __init__(self, reduction='mean', loss_weight=1.0):
        super(L1Loss, self).__init__()
        self.reduction = reduction
        self.loss_weight = loss_weight
8. The assigner initializer referenced in step 4
# MaxIoUAssigner inherits from BaseAssigner (which declares the abstract assign method)
class MaxIoUAssigner(BaseAssigner):
    """Assign a corresponding gt bbox or background to each bbox.
    Each proposals will be assigned with `-1`, or a semi-positive integer
    indicating the ground truth index.

    - -1: negative sample, no assigned gt 
    - semi-positive integer: positive sample, index (0-based) of assigned gt 

    Args:
        pos_iou_thr (float): IoU threshold for positive bboxes.
        neg_iou_thr (float or tuple): IoU threshold for negative bboxes.
        min_pos_iou (float): Minimum iou for a bbox to be considered as a
            positive bbox. Positive samples can have smaller IoU than
            pos_iou_thr due to the 4th step (assign max IoU sample to each gt).
        gt_max_assign_all (bool): Whether to assign all bboxes with the same
            highest overlap with some gt to that gt.
        ignore_iof_thr (float): IoF threshold for ignoring bboxes (if
            `gt_bboxes_ignore` is specified). Negative values mean not
            ignoring any bboxes.
        ignore_wrt_candidates (bool): Whether to compute the iof between
            `bboxes` and `gt_bboxes_ignore`, or the contrary.
        match_low_quality (bool): Whether to allow low quality matches. This is
            usually allowed for RPN and single stage detectors, but not allowed
            in the second stage. Details are demonstrated in Step 4.
        gpu_assign_thr (int): The upper bound of the number of GT for GPU
            assign. When the number of gt is above this threshold, will assign
            on CPU device. Negative values mean not assign on CPU.
    """

    def __init__(self,
                 pos_iou_thr,
                 neg_iou_thr,
                 min_pos_iou=.0,
                 gt_max_assign_all=True,
                 ignore_iof_thr=-1,
                 ignore_wrt_candidates=True,
                 match_low_quality=True,
                 gpu_assign_thr=-1,
                 iou_calculator=dict(type='BboxOverlaps2D')):
        self.pos_iou_thr = pos_iou_thr
        self.neg_iou_thr = neg_iou_thr
        self.min_pos_iou = min_pos_iou
        self.gt_max_assign_all = gt_max_assign_all
        self.ignore_iof_thr = ignore_iof_thr
        self.ignore_wrt_candidates = ignore_wrt_candidates
        self.gpu_assign_thr = gpu_assign_thr
        self.match_low_quality = match_low_quality
        self.iou_calculator = build_iou_calculator(iou_calculator)  # build the IoU calculator; the class implements __call__, so an instance is invoked like a function
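
With the RPN thresholds from train_cfg (pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3), assignment follows the four steps the docstring mentions. A simplified sketch of MaxIoUAssigner.assign that ignores the ignore-region and CPU-offload logic:

import torch

def max_iou_assign(ious, pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3):
    """Simplified sketch. ious: (num_gts, num_anchors) IoU matrix.
    Returns per-anchor indices: -1 = ignore, 0 = negative, >0 = gt index + 1."""
    num_gts, num_anchors = ious.shape
    assigned = ious.new_full((num_anchors,), -1, dtype=torch.long)  # 1. start as ignore
    max_iou, argmax = ious.max(dim=0)          # best gt for each anchor
    assigned[max_iou < neg_iou_thr] = 0        # 2. below neg_iou_thr -> negative
    pos = max_iou >= pos_iou_thr
    assigned[pos] = argmax[pos] + 1            # 3. above pos_iou_thr -> positive
    gt_max_iou, gt_argmax = ious.max(dim=1)    # best anchor for each gt
    for i in range(num_gts):                   # 4. low-quality matches: each gt
        if gt_max_iou[i] >= min_pos_iou:       #    keeps its own best anchor
            assigned[gt_argmax[i]] = i + 1
    return assigned
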
9. The sampler initializer referenced in step 4
# Inherits from BaseSampler (which declares the abstract _sample_pos and _sample_neg methods and provides sample)
class RandomSampler(BaseSampler):
    """Random sampler.

    Args:
        num (int): Number of samples
        pos_fraction (float): Fraction of positive samples
        neg_pos_ub (int, optional): Upper bound number of negative and
            positive samples. Defaults to -1.
        add_gt_as_proposals (bool, optional): Whether to add ground truth
            boxes as proposals. Defaults to True.
    """

    def __init__(self,
                 num,  # total number of samples
                 pos_fraction,  # fraction of positive samples
                 neg_pos_ub=-1,  # upper bound on negatives relative to positives; -1 means no bound
                 add_gt_as_proposals=True,
                 **kwargs):
        from mmdet.core.bbox import demodata
        super(RandomSampler, self).__init__(num, pos_fraction, neg_pos_ub,
                                            add_gt_as_proposals)
        self.rng = demodata.ensure_rng(kwargs.get('rng', None)) # a numpy random number generator


class BaseSampler(metaclass=ABCMeta):
    """Base class of samplers."""

    def __init__(self,
                 num,
                 pos_fraction,
                 neg_pos_ub=-1,
                 add_gt_as_proposals=True,
                 **kwargs):
        # plain attribute assignments
        self.num = num
        self.pos_fraction = pos_fraction
        self.neg_pos_ub = neg_pos_ub
        self.add_gt_as_proposals = add_gt_as_proposals
        self.pos_sampler = self
        self.neg_sampler = self
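
With num=256 and pos_fraction=0.5 from train_cfg, at most 128 positives are kept and the rest of the 256 slots are filled with negatives. A simplified sketch of what RandomSampler.sample does with the assigner's output:

import torch

def random_sample(assigned, num=256, pos_fraction=0.5):
    """Simplified sketch; assigned uses -1 = ignore, 0 = negative, >0 = gt index + 1."""
    pos_inds = torch.nonzero(assigned > 0).flatten()
    neg_inds = torch.nonzero(assigned == 0).flatten()
    num_pos = min(int(num * pos_fraction), pos_inds.numel())  # at most 128 positives
    num_neg = num - num_pos                                   # fill the rest with negatives
    pos_inds = pos_inds[torch.randperm(pos_inds.numel())[:num_pos]]
    neg_inds = neg_inds[torch.randperm(neg_inds.numel())[:num_neg]]
    return pos_inds, neg_inds
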
10. The anchor generator initializer referenced in step 4
# Anchor generator for 2D detectors; inherits from object
class AnchorGenerator(object):
    """Standard anchor generator for 2D anchor-based detectors.

    Args:
        strides (list[int] | list[tuple[int, int]]): Strides of anchors
            in multiple feature levels in order (w, h).
        ratios (list[float]): The list of ratios between the height and width
            of anchors in a single level.
        scales (list[int] | None): Anchor scales for anchors in a single level.
            It cannot be set at the same time if `octave_base_scale` and
            `scales_per_octave` are set.
        base_sizes (list[int] | None): The basic sizes
            of anchors in multiple levels.
            If None is given, strides will be used as base_sizes.
            (If strides are non square, the shortest stride is taken.)
        scale_major (bool): Whether to multiply scales first when generating
            base anchors. If true, the anchors in the same row will have the
            same scales. By default it is True in V2.0
        octave_base_scale (int): The base scale of octave.
        scales_per_octave (int): Number of scales for each octave.
            `octave_base_scale` and `scales_per_octave` are usually used in
            retinanet and the `scales` should be None when they are set.
        centers (list[tuple[float, float]] | None): The centers of the anchor
            relative to the feature grid center in multiple feature levels.
            By default it is set to be None and not used. If a list of tuple of
            float is given, they will be used to shift the centers of anchors.
        center_offset (float): The offset of center in proportion to anchors'
            width and height. By default it is 0 in V2.0.

    Examples:
        >>> from mmdet.core import AnchorGenerator
        >>> self = AnchorGenerator([16], [1.], [1.], [9])
        >>> all_anchors = self.grid_anchors([(2, 2)], device='cpu')
        >>> print(all_anchors)
        [tensor([[-4.5000, -4.5000,  4.5000,  4.5000],
                [11.5000, -4.5000, 20.5000,  4.5000],
                [-4.5000, 11.5000,  4.5000, 20.5000],
                [11.5000, 11.5000, 20.5000, 20.5000]])]
        >>> self = AnchorGenerator([16, 32], [1.], [1.], [9, 18])
        >>> all_anchors = self.grid_anchors([(2, 2), (1, 1)], device='cpu')
        >>> print(all_anchors)
        [tensor([[-4.5000, -4.5000,  4.5000,  4.5000],
                [11.5000, -4.5000, 20.5000,  4.5000],
                [-4.5000, 11.5000,  4.5000, 20.5000],
                [11.5000, 11.5000, 20.5000, 20.5000]]), \
        tensor([[-9., -9., 9., 9.]])]
    """

    def __init__(self,
                 strides,  # anchor strides, one per feature level (each level's downsampling factor)
                 ratios,  # height/width ratios of the anchors
                 scales=None,  # anchor scales
                 base_sizes=None,  # basic anchor sizes
                 scale_major=True,
                 octave_base_scale=None,
                 scales_per_octave=None,
                 centers=None,  # explicit anchor centers (defaults to None; anchors are then placed relative to the top-left grid point)
                 center_offset=0.):
        # check center and center_offset
        if center_offset != 0:
            assert centers is None, 'center cannot be set when center_offset' \
                f'!=0, {centers} is given.'
        if not (0 <= center_offset <= 1):
            raise ValueError('center_offset should be in range [0, 1], '
                             f'{center_offset} is given.')
        if centers is not None:
            assert len(centers) == len(strides), \
                'The number of strides should be the same as centers, got ' \
                f'{strides} and {centers}'

        # calculate base sizes of anchors
        self.strides = [_pair(stride) for stride in strides]
        self.base_sizes = [min(stride) for stride in self.strides
                           ] if base_sizes is None else base_sizes
        assert len(self.base_sizes) == len(self.strides), \
            'The number of strides should be the same as base sizes, got ' \
            f'{self.strides} and {self.base_sizes}'

        # calculate scales of anchors
        assert ((octave_base_scale is not None
                and scales_per_octave is not None) ^ (scales is not None)), \
            'scales and octave_base_scale with scales_per_octave cannot' \
            ' be set at the same time'
        if scales is not None:
            self.scales = torch.Tensor(scales)
        elif octave_base_scale is not None and scales_per_octave is not None:
            octave_scales = np.array(
                [2**(i / scales_per_octave) for i in range(scales_per_octave)])
            scales = octave_scales * octave_base_scale
            self.scales = torch.Tensor(scales)
        else:
            raise ValueError('Either scales or octave_base_scale with '
                             'scales_per_octave should be set')

        self.octave_base_scale = octave_base_scale
        self.scales_per_octave = scales_per_octave
        self.ratios = torch.Tensor(ratios)
        self.scale_major = scale_major
        self.centers = centers
        self.center_offset = center_offset
        self.base_anchors = self.gen_base_anchors()  # gen_base_anchors is analyzed just below
        # Generates the base anchors around the center point: with, say, 3 base
        # sizes, 2 ratios and 2 scales, 3 x 2 x 2 = 12 base anchors are created;
        # all remaining anchors are then obtained by shifting these base anchors
        # across the grid according to the strides.

    # This function generates the base anchors
    def gen_base_anchors(self):
        """Generate base anchors.

        Returns:
            list(torch.Tensor): Base anchors of a feature grid in multiple \
                feature levels.
        """
        multi_level_base_anchors = []  # list collecting the base anchors of every level
        for i, base_size in enumerate(self.base_sizes):  # iterate over each base_size
            center = None
            if self.centers is not None:
                center = self.centers[i]
            multi_level_base_anchors.append(
                # generate the base anchors of every (ratio, scale) combination
                # under the current base_size
                self.gen_single_level_base_anchors(
                    base_size,
                    scales=self.scales,
                    ratios=self.ratios,
                    center=center))  # pass in the base_size
        return multi_level_base_anchors  # all base anchors: len(base_sizes) x len(ratios) x len(scales) in total
    # For one base_size, generate the base anchors of every (ratio, scale)
    # combination: 1 x len(ratios) x len(scales) anchors
    def gen_single_level_base_anchors(self,
                                      base_size,
                                      scales,
                                      ratios,
                                      center=None):
        """Generate base anchors of a single level.

        Args:
            base_size (int | float): Basic size of an anchor.
            scales (torch.Tensor): Scales of the anchor.
            ratios (torch.Tensor): The ratio between between the height
                and width of anchors in a single level.
            center (tuple[float], optional): The center of the base anchor
                related to a single feature grid. Defaults to None.

        Returns:
            torch.Tensor: Anchors in a single-level feature maps.
        """
        # plain assignments
        w = base_size
        h = base_size
        if center is None:
            x_center = self.center_offset * w
            y_center = self.center_offset * h
        else:
            x_center, y_center = center

        # h_ratios * w_ratios stays 1, preserving the anchor area while
        # realizing the requested aspect ratios
        h_ratios = torch.sqrt(ratios)
        w_ratios = 1 / h_ratios
        if self.scale_major:
            # Read this in steps:
            # w_ratios[:, None] and scales[None, :] add a dimension, e.g. (3,) -> (3, 1) or (1, 3)
            # multiplying the base size w by w_ratios and scales yields every anchor
            # width under the current base_size; hs is obtained analogously
            # view(-1) flattens the result
            ws = (w * w_ratios[:, None] * scales[None, :]).view(-1)
            hs = (h * h_ratios[:, None] * scales[None, :]).view(-1)
        else:
            ws = (w * scales[:, None] * w_ratios[None, :]).view(-1)
            hs = (h * scales[:, None] * h_ratios[None, :]).view(-1)

        # use float anchor and the anchor's center is aligned with the
        # pixel center
        # center is usually (0, 0), so these coordinates describe boxes around the top-left grid point
        base_anchors = [
            x_center - 0.5 * ws, y_center - 0.5 * hs, x_center + 0.5 * ws,
            y_center + 0.5 * hs
        ]
        # stack the four coordinate vectors along the last dimension
        base_anchors = torch.stack(base_anchors, dim=-1)

        return base_anchors
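
Plugging the RPN config into these two functions: at the stride-16 level, base_size = 16, scales = [8] and ratios = [0.5, 1.0, 2.0], so for ratio 0.5 we get h_ratio = sqrt(0.5) ≈ 0.707 and w_ratio ≈ 1.414, hence ws ≈ 16 * 1.414 * 8 ≈ 181 and hs ≈ 16 * 0.707 * 8 ≈ 91. Checking against the generator itself, in the style of the docstring example above (the expected output is hand-derived):

from mmdet.core import AnchorGenerator

self = AnchorGenerator(strides=[16], ratios=[0.5, 1.0, 2.0], scales=[8])
print(self.base_anchors[0])
# tensor([[-90.5097, -45.2548,  90.5097,  45.2548],   # ratio 0.5: ~181 x ~91 (w x h)
#         [-64.0000, -64.0000,  64.0000,  64.0000],   # ratio 1.0: 128 x 128
#         [-45.2548, -90.5097,  45.2548,  90.5097]])  # ratio 2.0: ~91 x ~181
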
4. Head forward pass
1. Forward pass of the RPN network
class RPN(BaseDetector):
    def forward_train(self,
                      img,
                      img_metas,
                      gt_bboxes=None,
                      gt_bboxes_ignore=None):
        """
        Args:
            img (Tensor): Input images of shape (N, C, H, W).
                Typically these should be mean centered and std scaled.
            img_metas (list[dict]): A List of image info dict where each dict
                has: 'img_shape', 'scale_factor', 'flip', and may also contain
                'filename', 'ori_shape', 'pad_shape', and 'img_norm_cfg'.
                For details on the values of these keys see
                :class:`mmdet.datasets.pipelines.Collect`.
            gt_bboxes (list[Tensor]): Each item are the truth boxes for each
                image in [tl_x, tl_y, br_x, br_y] format.
            gt_bboxes_ignore (None | list[Tensor]): Specify which bounding
                boxes can be ignored when computing the loss.

        Returns:
            dict[str, Tensor]: A dictionary of loss components.
        """
        # check whether debug mode is enabled
        if self.train_cfg.rpn.get('debug', False):
            self.rpn_head.debug_imgs = tensor2imgs(img)

        # first run the backbone + neck to extract the features
        x = self.extract_feat(img)  「一」
        # then run forward_train on the head, which returns the losses
        losses = self.rpn_head.forward_train(x, img_metas, gt_bboxes, None,
                                             gt_bboxes_ignore)  「二」
        return losses
2. Linked from 1, method 「一」: extract features with the backbone and neck
    # Linked to 「一」: feature extraction through the backbone and the neck.
    # We focus on the head's forward pass here, so the backbone and neck
    # forward passes are not traced further.
    def extract_feat(self, img):
        """Extract features.

        Args:
            img (torch.Tensor): Image tensor with shape (n, c, h ,w).

        Returns:
            list[torch.Tensor]: Multi-level features that may have
                different resolutions.
        """
        # backbone inference; ResNet typically yields 4 feature maps
        x = self.backbone(img)
        # neck inference; FPN typically takes 4 feature maps and outputs 5
        if self.with_neck:
            x = self.neck(x)
        return x
3. Linked from 1, method 「二」: compute losses through the head
# Linked to 「二」: loss computation in the head network
class BaseDenseHead(nn.Module, metaclass=ABCMeta):
    def forward_train(self,
                      x,  # outputs of the FPN neck
                      img_metas,  # image meta info, e.g. image size, scale factor
                      gt_bboxes,  # ground-truth boxes, shape (num_gts, 4)
                      gt_labels=None,  # ground-truth labels, shape (num_gts,)
                      gt_bboxes_ignore=None,  # GT boxes to ignore, shape (num_ignored_gts, 4)
                      proposal_cfg=None,  # test / postprocessing config; test_cfg is used when None
                      **kwargs):
        """
        Args:
            x (list[Tensor]): Features from FPN.
            img_metas (list[dict]): Meta information of each image, e.g.,
                image size, scaling factor, etc.
            gt_bboxes (Tensor): Ground truth bboxes of the image,
                shape (num_gts, 4).
            gt_labels (Tensor): Ground truth labels of each box,
                shape (num_gts,).
            gt_bboxes_ignore (Tensor): Ground truth bboxes to be
                ignored, shape (num_ignored_gts, 4).
            proposal_cfg (mmcv.Config): Test / postprocessing configuration,
                if None, test_cfg would be used

        Returns:
            tuple:
                losses: (dict[str, Tensor]): A dictionary of loss components.
                proposal_list (list[Tensor]): Proposals of each image.
        """
        # Feed the FPN outputs into the head (RPNHead) for forward inference.
        # The output is a 2-tuple: per-level classification scores and per-level
        # bbox regression results.
        outs = self(x)  「三」
        # merge outs with gt_bboxes (plus gt_labels, when given) and img_metas into one tuple
        if gt_labels is None:
            loss_inputs = outs + (gt_bboxes, img_metas)
        else:
            loss_inputs = outs + (gt_bboxes, gt_labels, img_metas)
        # compute the loss from the gt boxes and outs
        losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)  「六」
        if proposal_cfg is None:
            return losses
        else:
            proposal_list = self.get_bboxes(*outs, img_metas, cfg=proposal_cfg)
            return losses, proposal_list
4. Linked from 3, method 「三」: feed the FPN outputs into the head for forward inference
# Linked to 「三」: the head's forward pass, dispatched up to the parent class AnchorHead
class AnchorHead(BaseDenseHead, BBoxTestMixin):
    # AnchorHead's forward pass
    def forward(self, feats):
        """Forward features from the upstream network.

        Args:
            feats (tuple[Tensor]): Features from the upstream network, each is
                a 4D-tensor.
            传入4维特征图

        Returns:
            tuple: A tuple of classification scores and bbox prediction.
            返回一个包含分类得分和边界框回归的结果元组

                - cls_scores (list[Tensor]): Classification scores for all \
                    scale levels, each is a 4D-tensor, the channels number \
                    is num_anchors * num_classes.
                  对于所有尺度的分类得分,结果是一个4维的张量,深度是num_anchors * num_classes
                - bbox_preds (list[Tensor]): Box energies / deltas for all \
                    scale levels, each is a 4D-tensor, the channels number \
                    is num_anchors * 4.
                  对于所有尺度的边框回归,结果是一个4维的张量,深度是num_anchors * 4
        """
        return multi_apply(self.forward_single, feats) 「四」# 注意这里传入的函数是self.forward_single函数
5. Linked from 4, method 「四」: multi_apply uses a partial object to pass each of the FPN's (5,) feature maps into self.forward_single, then regroups the outputs, so the final return value is a tuple of (classification scores, bbox regressions)
# Iterate over the multi-scale feature maps, running forward_single on each
def multi_apply(func, *args, **kwargs):
    """Apply function to a list of arguments.

    Note:
        This function applies the ``func`` to multiple inputs and
        map the multiple outputs of the ``func`` into different
        list. Each list contains the same type of outputs corresponding
        to different inputs.

    Args:
        func (Function): A function that will be applied to a list of
            arguments

    Returns:
        tuple(list): A tuple containing multiple list, each list contains \
            a kind of returned results by the function
    """
    # bind the keyword arguments into a partial object so every call shares them
    pfunc = partial(func, **kwargs) if kwargs else func
    # apply the function to each input
    map_results = map(pfunc, *args)
    # regroup the per-input results by output kind and return them as a tuple
    # (here func is forward_single, and each of the 5 FPN feature maps is passed through it)
    return tuple(map(list, zip(*map_results)))  「五」
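
A toy run makes the regrouping concrete; this sketch is self-contained (it re-declares multi_apply exactly as above) and uses strings as stand-ins for feature maps:

from functools import partial

def multi_apply(func, *args, **kwargs):
    pfunc = partial(func, **kwargs) if kwargs else func
    map_results = map(pfunc, *args)
    return tuple(map(list, zip(*map_results)))

def fake_forward_single(feat):
    return f'cls({feat})', f'reg({feat})'  # stand-in for (cls_score, bbox_pred)

feats = ['P2', 'P3', 'P4', 'P5', 'P6']     # the 5 FPN levels
cls_scores, bbox_preds = multi_apply(fake_forward_single, feats)
print(cls_scores)  # ['cls(P2)', ..., 'cls(P6)']  -- outputs regrouped per kind
print(bbox_preds)  # ['reg(P2)', ..., 'reg(P6)']
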
6. Linked from 5, method 「五」: rpn_conv first projects a single FPN feature map to the unified channel count, after which rpn_cls predicts classification scores of shape (N, num_anchors * num_classes, H, W) and rpn_reg predicts bbox regressions of shape (N, num_anchors * 4, H, W)
    def forward_single(self, x):
        """Forward feature map of a single scale level."""
        x = self.rpn_conv(x)
        x = F.relu(x, inplace=True)
        rpn_cls_score = self.rpn_cls(x)
        rpn_bbox_pred = self.rpn_reg(x)
        return rpn_cls_score, rpn_bbox_pred
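
A quick shape check with a dummy feature map, assuming an RPNHead built with the anchor settings from the config above (so num_anchors = 3 and one sigmoid class):

import torch
from mmdet.models.dense_heads import RPNHead

self = RPNHead(
    in_channels=256,
    anchor_generator=dict(
        type='AnchorGenerator', scales=[8], ratios=[0.5, 1.0, 2.0],
        strides=[4, 8, 16, 32, 64]))
x = torch.rand(2, 256, 32, 32)  # one dummy FPN level: (N, C, H, W)
rpn_cls_score, rpn_bbox_pred = self.forward_single(x)
print(rpn_cls_score.shape)  # torch.Size([2, 3, 32, 32])  = (N, num_anchors * 1, H, W)
print(rpn_bbox_pred.shape)  # torch.Size([2, 12, 32, 32]) = (N, num_anchors * 4, H, W)
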
7. Linked from 3, method 「六」: compute the loss from the gt boxes and outs
class RPNHead(RPNTestMixin, AnchorHead):
    def loss(self,
             cls_scores,  # per-level box scores, (N, num_anchors * num_classes, H, W)
             bbox_preds,  # per-level box regressions, (N, num_anchors * 4, H, W)
             gt_bboxes,  # per-image annotations, (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format
             img_metas,  # image meta information
             gt_bboxes_ignore=None):
        """Compute losses of the head.

        Args:
            cls_scores (list[Tensor]): Box scores for each scale level
                Has shape (N, num_anchors * num_classes, H, W)
            bbox_preds (list[Tensor]): Box energies / deltas for each scale
                level with shape (N, num_anchors * 4, H, W)
            gt_bboxes (list[Tensor]): Ground truth bboxes for each image with
                shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
            img_metas (list[dict]): Meta information of each image, e.g.,
                image size, scaling factor, etc.
            gt_bboxes_ignore (None | list[Tensor]): specify which bounding
                boxes can be ignored when computing the loss.

        Returns:
            dict[str, Tensor]: A dictionary of loss components.
        """
        losses = super(RPNHead, self).loss(
            cls_scores,
            bbox_preds,
            gt_bboxes,
            None,
            img_metas,
            gt_bboxes_ignore=gt_bboxes_ignore)  「七」  # call the parent class's loss function
        return dict(
            loss_rpn_cls=losses['loss_cls'], loss_rpn_bbox=losses['loss_bbox'])
8. Linked from 7, method 「七」: call the parent class's loss function
class AnchorHead(BaseDenseHead, BBoxTestMixin):
    @force_fp32(apply_to=('cls_scores', 'bbox_preds'))
    def loss(self,
             cls_scores,
             bbox_preds,
             gt_bboxes,
             gt_labels,
             img_metas,
             gt_bboxes_ignore=None):
        """Compute losses of the head.

        Args:
            cls_scores (list[Tensor]): Box scores for each scale level
                Has shape (N, num_anchors * num_classes, H, W)
            bbox_preds (list[Tensor]): Box energies / deltas for each scale
                level with shape (N, num_anchors * 4, H, W)
            gt_bboxes (list[Tensor]): Ground truth bboxes for each image with
                shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
            gt_labels (list[Tensor]): class indices corresponding to each box
            img_metas (list[dict]): Meta information of each image, e.g.,
                image size, scaling factor, etc.
            gt_bboxes_ignore (None | list[Tensor]): specify which bounding
                boxes can be ignored when computing the loss. Default: None

        Returns:
            dict[str, Tensor]: A dictionary of loss components.
        """
        # get the spatial size of every feature map
        featmap_sizes = [featmap.size()[-2:] for featmap in cls_scores]
        assert len(featmap_sizes) == self.anchor_generator.num_levels
        device = cls_scores[0].device
        anchor_list, valid_flag_list = self.get_anchors(
            featmap_sizes, img_metas, device=device)  「八」  # get anchors according to the feature map sizes
        label_channels = self.cls_out_channels if self.use_sigmoid_cls else 1
        cls_reg_targets = self.get_targets(
            anchor_list,
            valid_flag_list,
            gt_bboxes,
            img_metas,
            gt_bboxes_ignore_list=gt_bboxes_ignore,
            gt_labels_list=gt_labels,
            label_channels=label_channels)  「十二」  # assign and sample the positive/negative targets
        if cls_reg_targets is None:
            return None
        # unpack the targets
        (labels_list, label_weights_list, bbox_targets_list, bbox_weights_list,
         num_total_pos, num_total_neg) = cls_reg_targets
        # total number of samples
        num_total_samples = (
            num_total_pos + num_total_neg if self.sampling else num_total_pos)

        # anchor number of multi levels
        num_level_anchors = [anchors.size(0) for anchors in anchor_list[0]]
        # concat all level anchors and flags to a single tensor
        concat_anchor_list = []
        for i in range(len(anchor_list)):
            concat_anchor_list.append(torch.cat(anchor_list[i]))
        all_anchor_list = images_to_levels(concat_anchor_list,
                                           num_level_anchors)
        # compute the per-level losses
        losses_cls, losses_bbox = multi_apply(
            self.loss_single,
            cls_scores,
            bbox_preds,
            all_anchor_list,
            labels_list,
            label_weights_list,
            bbox_targets_list,
            bbox_weights_list,
            num_total_samples=num_total_samples)
        return dict(loss_cls=losses_cls, loss_bbox=losses_bbox)
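
Both this function and get_targets below lean on multi_apply to run a per-level (or per-image) function over lists of inputs. A minimal sketch of its semantics (a re-implementation for illustration, assuming the usual mmdet utility behavior, not the library source):

from functools import partial

def multi_apply(func, *args, **kwargs):
    # bind the shared keyword arguments, call func once per element of the
    # zipped positional lists, then transpose the per-call result tuples
    # into per-field lists
    pfunc = partial(func, **kwargs) if kwargs else func
    map_results = map(pfunc, *args)
    return tuple(map(list, zip(*map_results)))

# toy usage: one call per pair, results regrouped by field
def sum_and_prod(a, b):
    return a + b, a * b

sums, prods = multi_apply(sum_and_prod, [1, 2, 3], [4, 5, 6])
assert sums == [5, 7, 9] and prods == [4, 10, 18]

So losses_cls and losses_bbox above come back as one list each, with one entry per FPN level.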
9. Method 「八」 linked from step 8: get anchors according to the feature-map sizes
    # given the feature-map sizes, produce anchors of the matching resolution
    def get_anchors(self, featmap_sizes, img_metas, device='cuda'):
        """Get anchors according to feature map sizes.

        Args:
            featmap_sizes (list[tuple]): Multi-level feature map sizes.
            img_metas (list[dict]): Image meta info.
            device (torch.device | str): Device for returned tensors

        Returns:
            tuple:
                anchor_list (list[Tensor]): Anchors of each image.
                valid_flag_list (list[Tensor]): Valid flags of each image.
        """
        # number of images in the batch
        num_imgs = len(img_metas)

        # since feature map sizes of all images are the same, we only compute
        # anchors for one time
        multi_level_anchors = self.anchor_generator.grid_anchors(
            featmap_sizes, device) 「九」  # generate anchors
        # replicate the multi-level anchors into a per-image list: every image
        # in the batch has the same shape, so the anchors computed from the
        # feature maps can be shared -- one copy per image
        anchor_list = [multi_level_anchors for _ in range(num_imgs)]

        # for each image, we compute valid flags of multi level anchors
        # compute the anchor validity flags for each image
        valid_flag_list = []
        for img_id, img_meta in enumerate(img_metas):
            multi_level_flags = self.anchor_generator.valid_flags(
                featmap_sizes, img_meta['pad_shape'], device) 「十一」
            valid_flag_list.append(multi_level_flags)

        return anchor_list, valid_flag_list  # valid_flags marks the anchors inside the padded image
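
For a feel of the numbers: with the config at the top (strides [4, 8, 16, 32, 64] and 3 base anchors per location, from ratios [0.5, 1.0, 2.0] x scales [8]), a padded input of 640x640 (an assumed size, for illustration only) yields the following per-level counts:

# each level contributes feat_h * feat_w * num_base_anchors anchors
strides = [4, 8, 16, 32, 64]
num_base_anchors = 3
img_h = img_w = 640  # assumed padded input size

total = 0
for s in strides:
    feat_h, feat_w = img_h // s, img_w // s
    n = feat_h * feat_w * num_base_anchors
    print(f'stride {s:>2}: {feat_h}x{feat_w} feature map -> {n} anchors')
    total += n
print('total anchors per image:', total)  # 102300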
10. Method 「九」 linked from step 9: generate anchors
    def grid_anchors(self, featmap_sizes, device='cuda'):
        """Generate grid anchors in multiple feature levels.

        Args:
            featmap_sizes (list[tuple]): List of feature map sizes in
                multiple feature levels.
            device (str): Device where the anchors will be put on.

        Return:
            list[torch.Tensor]: Anchors in multiple feature levels. \
                The sizes of each tensor should be [N, 4], where \
                N = width * height * num_base_anchors, width and height \
                are the sizes of the corresponding feature level, \
                num_base_anchors is the number of anchors for that level.
        """
        # the stride list used for anchor generation must match the number of feature levels
        assert self.num_levels == len(featmap_sizes)
        # list holding the anchors of all levels
        multi_level_anchors = []
        # iterate over the levels
        for i in range(self.num_levels):
            # generate anchors for a single level
            anchors = self.single_level_grid_anchors(
                self.base_anchors[i].to(device),
                featmap_sizes[i],
                self.strides[i],
                device=device) 「十」  # single-level anchor generation
            multi_level_anchors.append(anchors)
        return multi_level_anchors
11. Method 「十」 linked from step 10: generate anchors for a single feature map
    def single_level_grid_anchors(self,
                                  base_anchors,
                                  featmap_size,
                                  stride=(16, 16),
                                  device='cuda'):
        """Generate grid anchors of a single level.
        # anchor generation for a single-level feature map

        Note:
            This function is usually called by method ``self.grid_anchors``.
            # usually called by self.grid_anchors

        Args:
            base_anchors (torch.Tensor): The base anchors of a feature grid.
            # the base anchors
            featmap_size (tuple[int]): Size of the feature maps.
            # size of the feature map
            stride (tuple[int], optional): Stride of the feature map in order
                (w, h). Defaults to (16, 16).
            # stride on the feature map
            device (str, optional): Device the tensor will be put on.
                Defaults to 'cuda'.

        Returns:
            torch.Tensor: Anchors in the overall feature maps.
            # all anchors of a single feature map
        """
        # unpack
        feat_h, feat_w = featmap_size
        # convert Tensor to int, so that we can convert to ONNX correctly
        feat_h = int(feat_h)
        feat_w = int(feat_w)
        # torch.arange generates 0, 1, 2, ..., x-1 with step 1
        # multiplying by the stride gives the offsets along w and h
        shift_x = torch.arange(0, feat_w, device=device) * stride[0]
        shift_y = torch.arange(0, feat_h, device=device) * stride[1]
        # the lines above only give the offsets of the first row (column)
        # we need the offsets over the whole feature map, matching its size
        shift_xx, shift_yy = self._meshgrid(shift_x, shift_y)
        # stack along the last dim into a single tensor
        # e.g. shift_xx = tensor([0., 1., 2., 0., 1., 2.])
        # shift_yy = tensor([0., 0., 0., 1., 1., 1.])
        # shifts = tensor([
        #               [0., 0., 0., 0.],
        #               [1., 0., 1., 0.],
        #               [2., 0., 2., 0.],
        #               [0., 1., 0., 1.],
        #               [1., 1., 1., 1.],
        #               [2., 1., 2., 1.]])
        # at this point the anchor positions on the current feature map are
        # essentially determined; note that in the first three rows,
        # shifts[0~2][1] and shifts[0~2][3] stay constant: the y of the
        # base_anchor is held fixed while anchors are laid out row by row
        # (see the figures below)
        shifts = torch.stack([shift_xx, shift_yy, shift_xx, shift_yy], dim=-1)
        # match the dtype of base_anchors
        shifts = shifts.type_as(base_anchors)
        # first feat_w elements correspond to the first row of shifts
        # add A anchors (1, A, 4) to K shifts (K, 1, 4) to get
        # shifted anchors (K, A, 4), reshape to (K*A, 4)

        # shifts[:, None, :] inserts a new axis into shifts
        # e.g. shifts[0] was [0., 0., 0., 0.]; it becomes [[0., 0., 0., 0.]]
        # base_anchors becomes 3-D, e.g. (2, 4) -> (1, 2, 4)
        # shifts goes (6, 4) -> (6, 1, 4)
        # broadcasting makes all_anchors (6, 2, 4)
        # the goal is to add every base_anchor to every shift, producing all
        # shifted copies of each base anchor:
        # with base_anchors (2, 4) -> (1, 2, 4) and the shift grid (6, 4) -> (6, 1, 4),
        # broadcasting expands (6, 1, 4) -> (6, 2, 4), the 1 -> 2 copying the
        # shifts to match the number of base anchors; base_anchors broadcasts
        # (1, 2, 4) -> (6, 2, 4), where 6 is the number of grid positions
        all_anchors = base_anchors[None, :, :] + shifts[:, None, :]
        # flatten to rows of 4: (6, 2, 4) -> (12, 4)
        all_anchors = all_anchors.view(-1, 4)
        # first A rows correspond to A anchors of (0, 0) in feature map,
        # then (0, 1), (0, 2), ...
        return all_anchors
    def _meshgrid(self, x, y, row_major=True):
        """Generate mesh grid of x and y.

        Args:
            x (torch.Tensor): Grids of x dimension.
            y (torch.Tensor): Grids of y dimension.
            row_major (bool, optional): Whether to return y grids first.
                Defaults to True.

        Returns:
            tuple[torch.Tensor]: The mesh grids of x and y.
        """
        # suppose the inputs are
        # x, y = [0, 1, 2], [0, 1]
        # x holds the offsets of the first row
        # y holds the offsets of the first column
        # repeating x len(y) times gives xx: (3,) -> (3*len(y),)
        # xx = [0, 1, 2, 0, 1, 2]
        xx = x.repeat(len(y))
        # y.view(-1, 1) reshapes y to (N, 1), e.g. (3,) -> (3, 1), row to column
        # [0, 1] -> [[0], [1]]
        # y.repeat(1, len(x)) repeats along the x direction
        # [[0, 0, 0], [1, 1, 1]]
        # y.view(-1) flattens back into one row
        # yy = [0, 0, 0, 1, 1, 1]
        yy = y.view(-1, 1).repeat(1, len(x)).view(-1)
        if row_major:
            return xx, yy
        else:
            return yy, xx
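
The two helpers above are easiest to understand on a toy input. The following runnable snippet reproduces the shifts example from the comments with a 2x3 feature map, stride 1, and two made-up base anchors (all values are assumptions for illustration):

import torch

feat_h, feat_w, stride = 2, 3, (1, 1)
shift_x = torch.arange(0, feat_w) * stride[0]               # tensor([0, 1, 2])
shift_y = torch.arange(0, feat_h) * stride[1]               # tensor([0, 1])
shift_xx = shift_x.repeat(feat_h)                           # [0, 1, 2, 0, 1, 2]
shift_yy = shift_y.view(-1, 1).repeat(1, feat_w).view(-1)   # [0, 0, 0, 1, 1, 1]
shifts = torch.stack([shift_xx, shift_yy, shift_xx, shift_yy], dim=-1).float()

base_anchors = torch.tensor([[-1., -1., 1., 1.],            # 2 hypothetical base anchors
                             [-2., -2., 2., 2.]])
# broadcast-add: (1, 2, 4) + (6, 1, 4) -> (6, 2, 4), then flatten
all_anchors = (base_anchors[None, :, :] + shifts[:, None, :]).view(-1, 4)
print(all_anchors.shape)  # torch.Size([12, 4]) == (feat_h * feat_w * 2, 4)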

[Figure 1: anchor grid generation]
[Figure 2: anchor grid generation]

12. Method 「十一」 linked from step 9: compute the valid flags
    def valid_flags(self, featmap_sizes, pad_shape, device='cuda'):
        """Generate valid flags of anchors in multiple feature levels.

        Args:
            featmap_sizes (list(tuple)): List of feature map sizes in
                multiple feature levels.
            pad_shape (tuple): The padded shape of the image.
            device (str): Device where the anchors will be put on.

        Return:
            list(torch.Tensor): Valid flags of anchors in multiple levels.
        """
        assert self.num_levels == len(featmap_sizes)
        multi_level_flags = []
        for i in range(self.num_levels):
            # stride of this feature level; equivalently the downsampling factor of the padded input (two 2x downsamplings give stride 4, three give stride 8)
            anchor_stride = self.strides[i]
            # size of the feature map
            feat_h, feat_w = featmap_sizes[i]
            # padded image size taken from img_meta, i.e. the actual network input size
            h, w = pad_shape[:2]
            valid_feat_h = min(int(np.ceil(h / anchor_stride[1])), feat_h)
            valid_feat_w = min(int(np.ceil(w / anchor_stride[0])), feat_w)
            flags = self.single_level_valid_flags((feat_h, feat_w),
                                                  (valid_feat_h, valid_feat_w),
                                                  self.num_base_anchors[i],
                                                  device=device)
            multi_level_flags.append(flags)
        return multi_level_flags
    def single_level_valid_flags(self,
                                 featmap_size,
                                 valid_size,
                                 num_base_anchors,
                                 device='cuda'):
        """Generate the valid flags of anchor in a single feature map.

        Args:
            featmap_size (tuple[int]): The size of feature maps.
            valid_size (tuple[int]): The valid size of the feature maps.
            num_base_anchors (int): The number of base anchors.
            device (str, optional): Device where the flags will be put on.
                Defaults to 'cuda'.

        Returns:
            torch.Tensor: The valid flags of each anchor in a single level \
                feature map.
        """
        feat_h, feat_w = featmap_size
        valid_h, valid_w = valid_size
        assert valid_h <= feat_h and valid_w <= feat_w
        # create two all-False tensors
        valid_x = torch.zeros(feat_w, dtype=torch.bool, device=device)
        valid_y = torch.zeros(feat_h, dtype=torch.bool, device=device)
        # set the indices inside the valid range to True
        valid_x[:valid_w] = 1
        valid_y[:valid_h] = 1
        # expand the valid flags over the grid
        valid_xx, valid_yy = self._meshgrid(valid_x, valid_y)
        # here the h-direction values act as flags: within valid_y[:valid_h] = 1,
        # y is always True; y is first expanded to the same length as x and then
        # ANDed with it, row by row, just like the anchor computation above
        # the figure below summarizes the effect
        valid = valid_xx & valid_yy
        # .contiguous() re-lays the tensor out in memory so that the storage
        # matches the expanded shape
        # expand merely replicates the validity result num_base_anchors times
        # valid[:, None] first reshapes to (N, 1), which is expanded along the
        # last dim and finally flattened
        valid = valid[:, None].expand(valid.size(0),
                                      num_base_anchors).contiguous().view(-1)
        return valid

[Figure 3: valid-flag mask over the padded feature map]
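
A runnable toy version of the two methods above: a 4x4 feature map whose last column of positions falls on padding, with 3 base anchors per position (the sizes are made up for illustration):

import torch

feat_h, feat_w = 4, 4
valid_h, valid_w = 4, 3       # the 4th column of positions lies on padding
valid_x = torch.zeros(feat_w, dtype=torch.bool)
valid_y = torch.zeros(feat_h, dtype=torch.bool)
valid_x[:valid_w] = 1
valid_y[:valid_h] = 1
valid_xx = valid_x.repeat(feat_h)                          # _meshgrid, as above
valid_yy = valid_y.view(-1, 1).repeat(1, feat_w).view(-1)
valid = valid_xx & valid_yy                                # (16,) position flags
num_base_anchors = 3
valid = valid[:, None].expand(valid.size(0), num_base_anchors).reshape(-1)
print(valid.sum().item(), 'of', valid.numel())             # 36 of 48 anchors valid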

13. Method 「十二」 linked from step 8: get the positive and negative samples
    def get_targets(self,
                    anchor_list,
                    valid_flag_list,
                    gt_bboxes_list,
                    img_metas,
                    gt_bboxes_ignore_list=None,
                    gt_labels_list=None,
                    label_channels=1,
                    unmap_outputs=True,
                    return_sampling_results=False):
        """Compute regression and classification targets for anchors in
        multiple images.

        Args:
            anchor_list (list[list[Tensor]]): Multi level anchors of each
                image. The outer list indicates images, and the inner list
                corresponds to feature levels of the image. Each element of
                the inner list is a tensor of shape (num_anchors, 4).
            valid_flag_list (list[list[Tensor]]): Multi level valid flags of
                each image. The outer list indicates images, and the inner list
                corresponds to feature levels of the image. Each element of
                the inner list is a tensor of shape (num_anchors, )
            gt_bboxes_list (list[Tensor]): Ground truth bboxes of each image.
            img_metas (list[dict]): Meta info of each image.
            gt_bboxes_ignore_list (list[Tensor]): Ground truth bboxes to be
                ignored.
            gt_labels_list (list[Tensor]): Ground truth labels of each box.
            label_channels (int): Channel of label.
            unmap_outputs (bool): Whether to map outputs back to the original
                set of anchors.

        Returns:
            tuple: Usually returns a tuple containing learning targets.

                - labels_list (list[Tensor]): Labels of each level.
                - label_weights_list (list[Tensor]): Label weights of each \
                    level.
                - bbox_targets_list (list[Tensor]): BBox targets of each level.
                - bbox_weights_list (list[Tensor]): BBox weights of each level.
                - num_total_pos (int): Number of positive samples in all \
                    images.
                - num_total_neg (int): Number of negative samples in all \
                    images.
            additional_returns: This function enables user-defined returns from
                `self._get_targets_single`. These returns are currently refined
                to properties at each feature map (i.e. having HxW dimension).
                The results will be concatenated after the end
        """
        # sanity checks
        num_imgs = len(img_metas)
        assert len(anchor_list) == len(valid_flag_list) == num_imgs

        # anchor number of multi levels
        # number of anchors at each level
        num_level_anchors = [anchors.size(0) for anchors in anchor_list[0]]
        # concat all level anchors to a single tensor
        # concatenate all levels' anchors into a single tensor per image
        concat_anchor_list = []
        concat_valid_flag_list = []
        for i in range(num_imgs):
            assert len(anchor_list[i]) == len(valid_flag_list[i])
            concat_anchor_list.append(torch.cat(anchor_list[i]))
            concat_valid_flag_list.append(torch.cat(valid_flag_list[i]))

        # compute targets for each image
        # compute the targets of each image
        if gt_bboxes_ignore_list is None:
            gt_bboxes_ignore_list = [None for _ in range(num_imgs)]
        if gt_labels_list is None:
            gt_labels_list = [None for _ in range(num_imgs)]
        results = multi_apply(
            self._get_targets_single,
            concat_anchor_list,
            concat_valid_flag_list,
            gt_bboxes_list,
            gt_bboxes_ignore_list,
            gt_labels_list,
            img_metas,
            label_channels=label_channels,
            unmap_outputs=unmap_outputs) 「十三」  # multi_apply computes the targets of several images at once (when batch_size > 1)
        (all_labels, all_label_weights, all_bbox_targets, all_bbox_weights,
         pos_inds_list, neg_inds_list, sampling_results_list) = results[:7]
        rest_results = list(results[7:])  # user-added return values
        # no valid anchors
        if any([labels is None for labels in all_labels]):
            return None
        # sampled anchors of all images
        num_total_pos = sum([max(inds.numel(), 1) for inds in pos_inds_list])
        num_total_neg = sum([max(inds.numel(), 1) for inds in neg_inds_list])
        # split targets to a list w.r.t. multiple levels
        # split back into per-level lists
        labels_list = images_to_levels(all_labels, num_level_anchors)
        label_weights_list = images_to_levels(all_label_weights,
                                              num_level_anchors)
        bbox_targets_list = images_to_levels(all_bbox_targets,
                                             num_level_anchors)
        bbox_weights_list = images_to_levels(all_bbox_weights,
                                             num_level_anchors)
        res = (labels_list, label_weights_list, bbox_targets_list,
               bbox_weights_list, num_total_pos, num_total_neg)
        if return_sampling_results:
            res = res + (sampling_results_list,)
        for i, r in enumerate(rest_results):  # user-added return values
            rest_results[i] = images_to_levels(r, num_level_anchors)

        return res + tuple(rest_results)
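
images_to_levels, used here and in loss above, regroups per-image tensors into per-level tensors. A hedged sketch of its behavior (re-implemented for illustration, not the mmdet source):

import torch

def images_to_levels(target, num_level_anchors):
    # stack the per-image tensors into (num_imgs, total_anchors, ...), then
    # slice the anchor dimension back into per-level chunks
    target = torch.stack(target, 0)
    level_targets, start = [], 0
    for n in num_level_anchors:
        level_targets.append(target[:, start:start + n])
        start += n
    return level_targets

# toy usage: 2 images, two levels holding 6 and 2 anchors respectively
per_image = [torch.arange(8.), torch.arange(8., 16.)]
levels = images_to_levels(per_image, [6, 2])
print([lv.shape for lv in levels])  # [torch.Size([2, 6]), torch.Size([2, 2])]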
14. Method 「十三」 linked from step 13: compute targets for multiple images at once (when batch_size > 1)
    def _get_targets_single(self,
                            flat_anchors,
                            valid_flags,
                            gt_bboxes,
                            gt_bboxes_ignore,
                            gt_labels,
                            img_meta,
                            label_channels=1,
                            unmap_outputs=True):
        """Compute regression and classification targets for anchors in a
        single image.

        Args:
            flat_anchors (Tensor): Multi-level anchors of the image, which are
                concatenated into a single tensor of shape (num_anchors ,4)
            valid_flags (Tensor): Multi level valid flags of the image,
                which are concatenated into a single tensor of
                    shape (num_anchors,).
            gt_bboxes (Tensor): Ground truth bboxes of the image,
                shape (num_gts, 4).
            img_meta (dict): Meta info of the image.
            gt_bboxes_ignore (Tensor): Ground truth bboxes to be
                ignored, shape (num_ignored_gts, 4).
            gt_labels (Tensor): Ground truth labels of each box,
                shape (num_gts,).
            label_channels (int): Channel of label.
            unmap_outputs (bool): Whether to map outputs back to the original
                set of anchors.

        Returns:
            tuple:
                labels_list (list[Tensor]): Labels of each level
                label_weights_list (list[Tensor]): Label weights of each level
                bbox_targets_list (list[Tensor]): BBox targets of each level
                bbox_weights_list (list[Tensor]): BBox weights of each level
                num_total_pos (int): Number of positive samples in all images
                num_total_neg (int): Number of negative samples in all images
        """
        inside_flags = anchor_inside_flags(flat_anchors, valid_flags,
                                           img_meta['img_shape'][:2],
                                           self.train_cfg.allowed_border) 「十四」
        # if there is no valid anchor at all, return a tuple of (None,) * 7
        if not inside_flags.any():
            return (None,) * 7
        # assign gt and sample anchors
        # keep only the valid anchors
        anchors = flat_anchors[inside_flags, :]
        # assign positive/negative samples
        assign_result = self.assigner.assign(
            anchors, gt_bboxes, gt_bboxes_ignore,
            None if self.sampling else gt_labels) 「十五」  # positive/negative assignment
        sampling_result = self.sampler.sample(assign_result, anchors,
                                              gt_bboxes) 「十九」  # balanced sampling

        # number of valid anchors
        num_valid_anchors = anchors.shape[0]
        # create target/weight tensors sized to num_valid_anchors
        bbox_targets = torch.zeros_like(anchors)
        bbox_weights = torch.zeros_like(anchors)
        labels = anchors.new_full((num_valid_anchors,),
                                  self.num_classes,
                                  dtype=torch.long)
        label_weights = anchors.new_zeros(num_valid_anchors, dtype=torch.float)

        # indices of the sampled positives and negatives
        pos_inds = sampling_result.pos_inds
        neg_inds = sampling_result.neg_inds
        if len(pos_inds) > 0:
            # reg_decoded_bbox (bool): if True, the regression loss is applied to the decoded boxes; default: False
            if not self.reg_decoded_bbox:
                pos_bbox_targets = self.bbox_coder.encode(
                    sampling_result.pos_bboxes, sampling_result.pos_gt_bboxes) 「二十三」
            else:
                pos_bbox_targets = sampling_result.pos_gt_bboxes
            # overwrite the positive entries of bbox_targets and bbox_weights
            bbox_targets[pos_inds, :] = pos_bbox_targets
            bbox_weights[pos_inds, :] = 1.0
            if gt_labels is None:
                # Only rpn gives gt_labels as None
                # Foreground is the first class since v2.5.0
                # when training the RPN, gt_labels is None
                labels[pos_inds] = 0
            else:
                labels[pos_inds] = gt_labels[
                    sampling_result.pos_assigned_gt_inds]
            # positive-sample weight set in the config
            if self.train_cfg.pos_weight <= 0:
                label_weights[pos_inds] = 1.0
            else:
                label_weights[pos_inds] = self.train_cfg.pos_weight
        # if there are negative samples, set their weights as well
        if len(neg_inds) > 0:
            label_weights[neg_inds] = 1.0

        # map up to original set of anchors
        # map the outputs back onto the full anchor set
        if unmap_outputs:
            num_total_anchors = flat_anchors.size(0)
            labels = unmap(
                labels, num_total_anchors, inside_flags,
                fill=self.num_classes)  # fill bg label
            label_weights = unmap(label_weights, num_total_anchors,
                                  inside_flags)
            bbox_targets = unmap(bbox_targets, num_total_anchors, inside_flags)
            bbox_weights = unmap(bbox_weights, num_total_anchors, inside_flags)

        return (labels, label_weights, bbox_targets, bbox_weights, pos_inds,
                neg_inds, sampling_result)
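
The unmap helper used just above scatters the per-valid-anchor results back over the full anchor set. A minimal sketch of what it does (an illustrative re-implementation under assumed semantics, not the mmdet source):

import torch

def unmap(data, count, inds, fill=0):
    # build a tensor covering all `count` anchors, filled with `fill`,
    # and write `data` into the positions where `inds` is True
    if data.dim() == 1:
        ret = data.new_full((count,), fill)
        ret[inds] = data
    else:
        ret = data.new_full((count,) + data.size()[1:], fill)
        ret[inds, :] = data
    return ret

# toy usage: 3 of 5 anchors were inside the image; the other two get the fill value
inside = torch.tensor([True, False, True, True, False])
labels_inside = torch.tensor([0, 1, 0])
print(unmap(labels_inside, 5, inside, fill=-1))  # tensor([ 0, -1,  1,  0, -1])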
15. Method 「十四」 linked from step 14: check whether anchors lie inside the image border
# check whether the anchors are inside the border
def anchor_inside_flags(flat_anchors,
                        valid_flags,
                        img_shape,
                        allowed_border=0):
    """Check whether the anchors are inside the border.

    Args:
        flat_anchors (torch.Tensor): Flatten anchors, shape (n, 4).
        valid_flags (torch.Tensor): An existing valid flags of anchors.
        img_shape (tuple(int)): Shape of current image.
        allowed_border (int, optional): The border to allow the valid anchor.
            Defaults to 0.

    Returns:
        torch.Tensor: Flags indicating whether the anchors are inside a \
            valid range.
    """
    img_h, img_w = img_shape[:2]
    # allowed border margin
    if allowed_border >= 0:
        inside_flags = (
            valid_flags  # start from the validity flags computed earlier
            # is the top-left x within the (extended) left border?
            & (flat_anchors[:, 0] >= -allowed_border)
            # is the top-left y within the (extended) top border?
            & (flat_anchors[:, 1] >= -allowed_border)
            # is the bottom-right x within the image width?
            & (flat_anchors[:, 2] < img_w + allowed_border)
            # is the bottom-right y within the image height?
            & (flat_anchors[:, 3] < img_h + allowed_border))
    else:
        inside_flags = valid_flags
    return inside_flags
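
A quick check of the border test on a hypothetical 10x10 image, where the second anchor sticks out past the right edge (all values are made up for illustration):

import torch

flat_anchors = torch.tensor([[1., 1., 5., 5.],
                             [7., 2., 12., 6.]])   # second box exceeds img_w
valid_flags = torch.tensor([True, True])
img_h, img_w, allowed_border = 10, 10, 0
inside_flags = (valid_flags
                & (flat_anchors[:, 0] >= -allowed_border)
                & (flat_anchors[:, 1] >= -allowed_border)
                & (flat_anchors[:, 2] < img_w + allowed_border)
                & (flat_anchors[:, 3] < img_h + allowed_border))
print(inside_flags)  # tensor([ True, False])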
16. Method 「十五」 linked from step 14: assign gt boxes to anchors
    def assign(self, bboxes, gt_bboxes, gt_bboxes_ignore=None, gt_labels=None):
        """Assign gt to bboxes.

        This method assign a gt bbox to every bbox (proposal/anchor), each bbox
        will be assigned with -1, or a semi-positive number. -1 means negative
        sample, semi-positive number is the index (0-based) of assigned gt.
        The assignment is done in following steps, the order matters.

        1. assign every bbox to the background
        2. assign proposals whose iou with all gts < neg_iou_thr to 0
        3. for each bbox, if the iou with its nearest gt >= pos_iou_thr,
           assign it to that bbox
        4. for each gt bbox, assign its nearest proposals (may be more than
           one) to itself

        Args:
            bboxes (Tensor): Bounding boxes to be assigned, shape(n, 4).
            gt_bboxes (Tensor): Groundtruth boxes, shape (k, 4).
            gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are
                labelled as `ignored`, e.g., crowd boxes in COCO.
            gt_labels (Tensor, optional): Label of gt_bboxes, shape (k, ).

        Returns:
            :obj:`AssignResult`: The assign result.

        Example:
            >>> self = MaxIoUAssigner(0.5, 0.5)
            >>> bboxes = torch.Tensor([[0, 0, 10, 10], [10, 10, 20, 20]])
            >>> gt_bboxes = torch.Tensor([[0, 0, 10, 9]])
            >>> assign_result = self.assign(bboxes, gt_bboxes)
            >>> expected_gt_inds = torch.LongTensor([1, 0])
            >>> assert torch.all(assign_result.gt_inds == expected_gt_inds)
        """
        # if the number of gt boxes exceeds the GPU-assignment threshold, do the assignment on the CPU instead
        assign_on_cpu = True if (self.gpu_assign_thr > 0) and (
                gt_bboxes.shape[0] > self.gpu_assign_thr) else False
        # compute overlap and assign gt on CPU when number of GT is large
        if assign_on_cpu:
            device = bboxes.device
            bboxes = bboxes.cpu()
            gt_bboxes = gt_bboxes.cpu()
            if gt_bboxes_ignore is not None:
                gt_bboxes_ignore = gt_bboxes_ignore.cpu()
            if gt_labels is not None:
                gt_labels = gt_labels.cpu()

        overlaps = self.iou_calculator(gt_bboxes, bboxes) 「十六」  # compute the IoUs
        # the block below computes iof against the ignored gt boxes
        if (self.ignore_iof_thr > 0 and gt_bboxes_ignore is not None
                and gt_bboxes_ignore.numel() > 0 and bboxes.numel() > 0):
            if self.ignore_wrt_candidates:
                ignore_overlaps = self.iou_calculator(
                    bboxes, gt_bboxes_ignore, mode='iof')
                ignore_max_overlaps, _ = ignore_overlaps.max(dim=1)
            else:
                ignore_overlaps = self.iou_calculator(
                    gt_bboxes_ignore, bboxes, mode='iof')
                ignore_max_overlaps, _ = ignore_overlaps.max(dim=0)
            overlaps[:, ignore_max_overlaps > self.ignore_iof_thr] = -1
        assign_result = self.assign_wrt_overlaps(overlaps, gt_labels) 「十七」
        if assign_on_cpu:
            assign_result.gt_inds = assign_result.gt_inds.to(device)
            assign_result.max_overlaps = assign_result.max_overlaps.to(device)
            if assign_result.labels is not None:
                assign_result.labels = assign_result.labels.to(device)
        return assign_result
17. Method 「十六」 linked from step 16: compute the IoU values
class BboxOverlaps2D(object):
    """2D Overlaps (e.g. IoUs, GIoUs) Calculator."""

    def __call__(self, bboxes1, bboxes2, mode='iou', is_aligned=False):
        """Calculate IoU between 2D bboxes.

        Args:
            bboxes1 (Tensor): bboxes have shape (m, 4) in <x1, y1, x2, y2>
                format, or shape (m, 5) in <x1, y1, x2, y2, score> format.
            bboxes2 (Tensor): bboxes have shape (n, 4) in <x1, y1, x2, y2>
                format, shape (n, 5) in <x1, y1, x2, y2, score> format, or be
                empty. If ``is_aligned`` is ``True``, then m and n must be
                equal.
            mode (str): "iou" (intersection over union), "iof" (intersection
                over foreground), or "giou" (generalized intersection over
                union).
            is_aligned (bool, optional): If True, then m and n must be equal.
                Default False.

        Returns:
            Tensor: shape (m, n) if ``is_aligned `` is False else shape (m,)
        """
        assert bboxes1.size(-1) in [0, 4, 5]
        assert bboxes2.size(-1) in [0, 4, 5]
        if bboxes2.size(-1) == 5:
            bboxes2 = bboxes2[..., :4]
        if bboxes1.size(-1) == 5:
            bboxes1 = bboxes1[..., :4]
        return bbox_overlaps(bboxes1, bboxes2, mode, is_aligned)
def bbox_overlaps(bboxes1, bboxes2, mode='iou', is_aligned=False, eps=1e-6):
    """Calculate overlap between two set of bboxes.
    # compute the overlap between two sets of bboxes

    If ``is_aligned `` is ``False``, then calculate the overlaps between each
    bbox of bboxes1 and bboxes2, otherwise the overlaps between each aligned
    pair of bboxes1 and bboxes2.
    # if is_aligned is True, overlaps are computed between corresponding pairs of bboxes1 and bboxes2; otherwise between every element of bboxes1 and every element of bboxes2

    Args:
        bboxes1 (Tensor): shape (B, m, 4) in <x1, y1, x2, y2> format or empty.
        bboxes2 (Tensor): shape (B, n, 4) in <x1, y1, x2, y2> format or empty.
            B indicates the batch dim, in shape (B1, B2, ..., Bn).
            If ``is_aligned `` is ``True``, then m and n must be equal.
        mode (str): "iou" (intersection over union) or "iof" (intersection over
            foreground).
        is_aligned (bool, optional): If True, then m and n must be equal.
            Default False.
        eps (float, optional): A value added to the denominator for numerical
            stability. Default 1e-6.

    Returns:
        Tensor: shape (m, n) if ``is_aligned `` is False else shape (m,)

    Example:
        >>> bboxes1 = torch.FloatTensor([
        >>>     [0, 0, 10, 10],
        >>>     [10, 10, 20, 20],
        >>>     [32, 32, 38, 42],
        >>> ])
        >>> bboxes2 = torch.FloatTensor([
        >>>     [0, 0, 10, 20],
        >>>     [0, 10, 10, 19],
        >>>     [10, 10, 20, 20],
        >>> ])
        >>> overlaps = bbox_overlaps(bboxes1, bboxes2)
        >>> assert overlaps.shape == (3, 3)
        >>> overlaps = bbox_overlaps(bboxes1, bboxes2, is_aligned=True)
        >>> assert overlaps.shape == (3, )

    Example:
        >>> empty = torch.empty(0, 4)
        >>> nonempty = torch.FloatTensor([[0, 0, 10, 9]])
        >>> assert tuple(bbox_overlaps(empty, nonempty).shape) == (0, 1)
        >>> assert tuple(bbox_overlaps(nonempty, empty).shape) == (1, 0)
        >>> assert tuple(bbox_overlaps(empty, empty).shape) == (0, 0)
    """

    assert mode in ['iou', 'iof', 'giou'], f'Unsupported mode {mode}'
    # Either the boxes are empty or the length of the boxes' last dimension is 4
    assert (bboxes1.size(-1) == 4 or bboxes1.size(0) == 0)
    assert (bboxes2.size(-1) == 4 or bboxes2.size(0) == 0)

    # Batch dim must be the same
    # Batch dim: (B1, B2, ... Bn)
    # batch dims must be identical
    assert bboxes1.shape[:-2] == bboxes2.shape[:-2]
    batch_shape = bboxes1.shape[:-2]
    # numbers of boxes
    rows = bboxes1.size(-2)
    cols = bboxes2.size(-2)
    # aligned case
    if is_aligned:
        assert rows == cols
    # handle empty inputs
    if rows * cols == 0:
        if is_aligned:
            return bboxes1.new(batch_shape + (rows, ))
        else:
            return bboxes1.new(batch_shape + (rows, cols))
	
    # area of bboxes1
    area1 = (bboxes1[..., 2] - bboxes1[..., 0]) * (
        bboxes1[..., 3] - bboxes1[..., 1])
    # area of bboxes2
    area2 = (bboxes2[..., 2] - bboxes2[..., 0]) * (
        bboxes2[..., 3] - bboxes2[..., 1])

    if is_aligned:
        lt = torch.max(bboxes1[..., :2], bboxes2[..., :2])  # [B, rows, 2]
        rb = torch.min(bboxes1[..., 2:], bboxes2[..., 2:])  # [B, rows, 2]

        wh = (rb - lt).clamp(min=0)  # [B, rows, 2]
        overlap = wh[..., 0] * wh[..., 1]

        if mode in ['iou', 'giou']:
            union = area1 + area2 - overlap
        else:
            union = area1
        if mode == 'giou':
            enclosed_lt = torch.min(bboxes1[..., :2], bboxes2[..., :2])
            enclosed_rb = torch.max(bboxes1[..., 2:], bboxes2[..., 2:])
    else:
        # compute the corners of every pairwise intersection
        # suppose bboxes1 has 2 boxes and bboxes2 has 3
        # then lt and rb have shape (2, 3, 2): for each pair, lt is the
        # element-wise max of the two top-left corners and rb is the
        # element-wise min of the two bottom-right corners, which together
        # bound the intersection box
        lt = torch.max(bboxes1[..., :, None, :2],
                       bboxes2[..., None, :, :2])  # [B, rows, cols, 2]
        rb = torch.min(bboxes1[..., :, None, 2:],
                       bboxes2[..., None, :, 2:])  # [B, rows, cols, 2]
		
        # subtracting gives the intersection width/height, clamped at a minimum of 0
        wh = (rb - lt).clamp(min=0)  # [B, rows, cols, 2]
        # multiplying gives the intersection area
        overlap = wh[..., 0] * wh[..., 1]

        if mode in ['iou', 'giou']:
            # broadcast area1 and area2 to the pairwise grid, add, then subtract the overlap
            union = area1[..., None] + area2[..., None, :] - overlap
        else:
            union = area1[..., None]
        if mode == 'giou':
            enclosed_lt = torch.min(bboxes1[..., :, None, :2],
                                    bboxes2[..., None, :, :2])
            enclosed_rb = torch.max(bboxes1[..., :, None, 2:],
                                    bboxes2[..., None, :, 2:])
    # small value added to the denominator for numerical stability (avoids division by zero)
    eps = union.new_tensor([eps])
    union = torch.max(union, eps)
    ious = overlap / union
    if mode in ['iou', 'iof']:
        # return the computed IoU
        return ious
    # calculate gious
    enclose_wh = (enclosed_rb - enclosed_lt).clamp(min=0)
    enclose_area = enclose_wh[..., 0] * enclose_wh[..., 1]
    enclose_area = torch.max(enclose_area, eps)
    gious = ious - (enclose_area - union) / enclose_area
    return gious
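
Hand-checking one entry of the docstring example confirms the formula: boxes [0, 0, 10, 10] and [0, 0, 10, 20] intersect in a 10x10 region, so IoU = 100 / (100 + 200 - 100) = 0.5:

import torch

b1 = torch.tensor([0., 0., 10., 10.])
b2 = torch.tensor([0., 0., 10., 20.])
lt = torch.max(b1[:2], b2[:2])                 # intersection top-left
rb = torch.min(b1[2:], b2[2:])                 # intersection bottom-right
wh = (rb - lt).clamp(min=0)
overlap = wh[0] * wh[1]                        # 100
area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])      # 100
area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])      # 200
print((overlap / (area1 + area2 - overlap)).item())  # 0.5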
18. Method 「十七」 linked from step 16: assign w.r.t. the overlaps
    def assign_wrt_overlaps(self, overlaps, gt_labels=None):
        """Assign w.r.t. the overlaps of bboxes with gts.
        # overlaps holds the IoUs between the k gt boxes and the n anchors, shape (k, n)

        Args:
            overlaps (Tensor): Overlaps between k gt_bboxes and n bboxes,
                shape(k, n).
            gt_labels (Tensor, optional): Labels of k gt_bboxes, shape (k, ).

        Returns:
            :obj:`AssignResult`: The assign result.
        """
        num_gts, num_bboxes = overlaps.size(0), overlaps.size(1)

        # 1. assign -1 by default
        # create a tensor of shape (n,) filled with -1
        assigned_gt_inds = overlaps.new_full((num_bboxes,),
                                             -1,
                                             dtype=torch.long)
        # handle the cases of no gt boxes or no valid anchors
        if num_gts == 0 or num_bboxes == 0:
            # No ground truth or boxes, return empty assignment
            max_overlaps = overlaps.new_zeros((num_bboxes,))
            if num_gts == 0:
                # No truth, assign everything to background
                assigned_gt_inds[:] = 0
            if gt_labels is None:
                assigned_labels = None
            else:
                assigned_labels = overlaps.new_full((num_bboxes,),
                                                    -1,
                                                    dtype=torch.long)
            return AssignResult(
                num_gts,
                assigned_gt_inds,
                max_overlaps,
                labels=assigned_labels)

        # for each anchor, which gt best overlaps with it
        # for each anchor, the max iou of all gts
        # overlaps has shape k x n
        # max_overlaps, shape (n,): for each of the n anchors, its max IoU over all gt boxes
        # argmax_overlaps, shape (n,): the index k of that best-matching gt box
        # i.e. which gt box each anchor overlaps most, and by how much
        max_overlaps, argmax_overlaps = overlaps.max(dim=0)
        # for each gt, which anchor best overlaps with it
        # for each gt, the max iou of all proposals
        # which anchor each gt box overlaps most, and by how much
        gt_max_overlaps, gt_argmax_overlaps = overlaps.max(dim=1)

        # 2. assign negative: below
        # the negative inds are set to be 0
        # assign negatives: max_overlaps tells us each anchor's best-matching gt and its IoU
        # any anchor whose max IoU is >= 0 and below the configured neg_iou_thr becomes a negative
        # negatives get the value 0
        if isinstance(self.neg_iou_thr, float):
            assigned_gt_inds[(max_overlaps >= 0)
                             & (max_overlaps < self.neg_iou_thr)] = 0
        elif isinstance(self.neg_iou_thr, tuple):
            assert len(self.neg_iou_thr) == 2
            assigned_gt_inds[(max_overlaps >= self.neg_iou_thr[0])
                             & (max_overlaps < self.neg_iou_thr[1])] = 0

        # 3. assign positive: above positive IoU threshold
        # assign positives: anchors whose max IoU reaches pos_iou_thr
        pos_inds = max_overlaps >= self.pos_iou_thr
        # for the entries above the positive threshold, take the value from
        # argmax_overlaps, add 1, and write it into assigned_gt_inds
        # now assigned_gt_inds holds 0 for negatives, values in [1, k] for
        # positives, and the default -1 for ignored/background anchors
        assigned_gt_inds[pos_inds] = argmax_overlaps[pos_inds] + 1

        # this step promotes additional, lower-quality matches to positives
        # (their IoU may be below pos_iou_thr) and may overwrite some of the
        # positive assignments made in step 3
        if self.match_low_quality:
            # Low-quality matching will overwrite the assigned_gt_inds assigned
            # in Step 3. Thus, the assigned gt might not be the best one for
            # prediction.
            # For example, if bbox A has 0.9 and 0.8 iou with GT bbox 1 & 2,
            # gt bbox 1 will be assigned as the best target for bbox A in step 3.
            # However, if GT bbox 2's gt_argmax_overlaps = A, bbox A's
            # assigned_gt_inds will be overwritten to be gt bbox 2.

            # Key point: "This might be the reason that it is not used in ROI
            # Heads" -- overwriting step 3's assignment is likely why
            # match_low_quality is disabled in the ROI head
            # assign positives from the gt side, by each gt box's best match
            for i in range(num_gts):
                # take the i-th gt's max IoU and check it against min_pos_iou
                if gt_max_overlaps[i] >= self.min_pos_iou:
                    # gt_max_assign_all: whether to match every anchor sharing the max IoU; e.g. if a gt's max IoU is 0.5 and several anchors tie at 0.5, all of them become positives
                    if self.gt_max_assign_all:
                        max_iou_inds = overlaps[i, :] == gt_max_overlaps[i]
                        assigned_gt_inds[max_iou_inds] = i + 1
                    else:
                        assigned_gt_inds[gt_argmax_overlaps[i]] = i + 1

        if gt_labels is not None:
            assigned_labels = assigned_gt_inds.new_full((num_bboxes,), -1)
            pos_inds = torch.nonzero(
                assigned_gt_inds > 0, as_tuple=False).squeeze()
            if pos_inds.numel() > 0:
                assigned_labels[pos_inds] = gt_labels[
                    assigned_gt_inds[pos_inds] - 1]
        else:
            assigned_labels = None

        return AssignResult(
            num_gts, assigned_gt_inds, max_overlaps, labels=assigned_labels) 「十八」  # store the result
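
A toy walk-through of steps 1-4 with the RPN thresholds from the config (pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3), hand-rolled rather than calling MaxIoUAssigner; the overlap values are made up:

import torch

overlaps = torch.tensor([[0.80, 0.25, 0.10],    # gt 0 vs anchors 0..2
                         [0.40, 0.28, 0.65]])   # gt 1 vs anchors 0..2
assigned = overlaps.new_full((3,), -1, dtype=torch.long)   # step 1: all ignored

max_ov, argmax_ov = overlaps.max(dim=0)         # per-anchor best gt
gt_max_ov, gt_argmax_ov = overlaps.max(dim=1)   # per-gt best anchor

assigned[(max_ov >= 0) & (max_ov < 0.3)] = 0    # step 2: negatives
pos = max_ov >= 0.7                             # step 3: positives
assigned[pos] = argmax_ov[pos] + 1
for i in range(2):                              # step 4: low-quality matches
    if gt_max_ov[i] >= 0.3:
        assigned[gt_argmax_ov[i]] = i + 1
print(assigned)  # tensor([1, 0, 2])

Anchor 0 becomes a positive for gt 0 in step 3, anchor 1 is a negative, and anchor 2 (IoU 0.65 < 0.7) is rescued as a positive for gt 1 by the low-quality matching of step 4.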
19. Class 「十八」 linked from step 18: AssignResult stores the assignment
class AssignResult(util_mixins.NiceRepr):
    """Stores assignments between predicted and truth boxes.
    Stores the assignment between the predicted boxes and the gt boxes.

    Attributes:
        num_gts (int): the number of truth boxes considered when computing this
            assignment
        # the number of gt boxes

        gt_inds (LongTensor): for each predicted box indicates the 1-based
            index of the assigned truth box. 0 means unassigned and -1 means
            ignore.
        # the assignment of each anchor: -1 means ignored (the default), 0 a negative sample, and values in [1, num_gts] positives

        max_overlaps (FloatTensor): the iou between the predicted box and its
            assigned truth box.
        # each anchor's max IoU

        labels (None | LongTensor): If specified, for each predicted box
            indicates the category label of the assigned truth box.

    Example:
        >>> # An assign result between 4 predicted boxes and 9 true boxes
        >>> # where only two boxes were assigned.
        >>> num_gts = 9
        >>> max_overlaps = torch.LongTensor([0, .5, .9, 0])
        >>> gt_inds = torch.LongTensor([-1, 1, 2, 0])
        >>> labels = torch.LongTensor([0, 3, 4, 0])
        >>> self = AssignResult(num_gts, gt_inds, max_overlaps, labels)
        >>> print(str(self))  # xdoctest: +IGNORE_WANT
        <AssignResult(num_gts=9, gt_inds.shape=(4,), max_overlaps.shape=(4,), labels.shape=(4,))>
        >>> # Force addition of gt labels (when adding gt as proposals)
        >>> new_labels = torch.LongTensor([3, 4, 5])
        >>> self.add_gt_(new_labels)
        >>> print(str(self))  # xdoctest: +IGNORE_WANT
        <AssignResult(num_gts=9, gt_inds.shape=(7,), max_overlaps.shape=(7,), labels.shape=(7,))>
    """

    def __init__(self, num_gts, gt_inds, max_overlaps, labels=None):
        self.num_gts = num_gts
        self.gt_inds = gt_inds
        self.max_overlaps = max_overlaps
        self.labels = labels
        # Interface for possible user-defined properties
        self._extra_properties = {}
20. Method 「十九」 linked from step 14: sample positive and negative boxes
    def sample(self,
               assign_result,
               bboxes,
               gt_bboxes,
               gt_labels=None,
               **kwargs):
        """Sample positive and negative bboxes.
        # sample positives and negatives

        This is a simple implementation of bbox sampling given candidates,
        assigning results and ground truth bboxes.

        Args:
            assign_result (:obj:`AssignResult`): Bbox assigning results.
            bboxes (Tensor): Boxes to be sampled from.
            gt_bboxes (Tensor): Ground truth bboxes.
            gt_labels (Tensor, optional): Class labels of ground truth bboxes.

        Returns:
            :obj:`SamplingResult`: Sampling result.

        Example:
            >>> from mmdet.core.bbox import RandomSampler
            >>> from mmdet.core.bbox import AssignResult
            >>> from mmdet.core.bbox.demodata import ensure_rng, random_boxes
            >>> rng = ensure_rng(None)
            >>> assign_result = AssignResult.random(rng=rng)
            >>> bboxes = random_boxes(assign_result.num_preds, rng=rng)
            >>> gt_bboxes = random_boxes(assign_result.num_gts, rng=rng)
            >>> gt_labels = None
            >>> self = RandomSampler(num=32, pos_fraction=0.5, neg_pos_ub=-1,
            >>>                      add_gt_as_proposals=False)
            >>> self = self.sample(assign_result, bboxes, gt_bboxes, gt_labels)
        """
        # dimension fix-up
        if len(bboxes.shape) < 2:
            bboxes = bboxes[None, :]
		
        # boxes may have a 5th column; keep only the 4 coordinates
        bboxes = bboxes[:, :4]
		
        # a zero tensor with one entry per box
        gt_flags = bboxes.new_zeros((bboxes.shape[0], ), dtype=torch.uint8)
        if self.add_gt_as_proposals and len(gt_bboxes) > 0:
            if gt_labels is None:
                raise ValueError(
                    'gt_labels must be given when add_gt_as_proposals is True')
            bboxes = torch.cat([gt_bboxes, bboxes], dim=0)
            assign_result.add_gt_(gt_labels)
            gt_ones = bboxes.new_ones(gt_bboxes.shape[0], dtype=torch.uint8)
            gt_flags = torch.cat([gt_ones, gt_flags])
		
        # num_expected_pos is the desired number of positive samples
        num_expected_pos = int(self.num * self.pos_fraction)
        pos_inds = self.pos_sampler._sample_pos(
            assign_result, num_expected_pos, bboxes=bboxes, **kwargs) 「二十」  # sample the positives
        # We found that sampled indices have duplicated items occasionally.
        # (may be a bug of PyTorch)
        # deduplicate
        pos_inds = pos_inds.unique()
        # count the sampled positives
        num_sampled_pos = pos_inds.numel()
        # desired number of negatives
        num_expected_neg = self.num - num_sampled_pos
        # neg_pos_ub (if >= 0) caps the negatives at neg_pos_ub x the number of positives
        if self.neg_pos_ub >= 0:
            _pos = max(1, num_sampled_pos)
            neg_upper_bound = int(self.neg_pos_ub * _pos)
            if num_expected_neg > neg_upper_bound:
                num_expected_neg = neg_upper_bound
        neg_inds = self.neg_sampler._sample_neg(
            assign_result, num_expected_neg, bboxes=bboxes, **kwargs) 「二十一」  # sample the negatives
        neg_inds = neg_inds.unique()

        sampling_result = SamplingResult(pos_inds, neg_inds, bboxes, gt_bboxes,
                                         assign_result, gt_flags) 「二十二」
        return sampling_result
21. Method 「二十」 linked from step 20: sample the positives
    def _sample_pos(self, assign_result, num_expected, **kwargs):
        """Randomly sample some positive samples."""
        # torch.nonzero returns the indices of the non-zero elements
        # assign_result.gt_inds > 0 marks the elements assigned as positives
        pos_inds = torch.nonzero(assign_result.gt_inds > 0, as_tuple=False)
        # pos_inds.numel() counts the elements
        if pos_inds.numel() != 0:
            # squeeze down to 1-D
            pos_inds = pos_inds.squeeze(1)
        # does the number of positives exceed the expected number num_expected?
        if pos_inds.numel() <= num_expected:
            # not exceeded: return them all
            return pos_inds
        else:
            # exceeded: randomly pick num_expected of them
            return self.random_choice(pos_inds, num_expected)
22. Method 「二十一」 linked from step 20: sample the negatives
    def _sample_neg(self, assign_result, num_expected, **kwargs):
        """Randomly sample some negative samples."""
        # torch.nonzero returns the indices where the condition holds (note that
        # assign_result.gt_inds == 0 is a bool tensor), i.e. the indices of the True entries
        # assign_result.gt_inds == 0 marks the elements assigned as negatives
        neg_inds = torch.nonzero(assign_result.gt_inds == 0, as_tuple=False)
        if neg_inds.numel() != 0:
            neg_inds = neg_inds.squeeze(1)
        if len(neg_inds) <= num_expected:
            return neg_inds
        else:
            return self.random_choice(neg_inds, num_expected)
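
Both samplers delegate to random_choice when there are more candidates than needed. A hedged sketch of its behavior (re-implemented for illustration; the mmdet version also handles non-tensor galleries):

import torch

def random_choice(gallery, num):
    # take a random permutation of the candidate indices and keep the first num
    assert len(gallery) >= num
    perm = torch.randperm(gallery.numel(), device=gallery.device)[:num]
    return gallery[perm]

# toy usage: keep 128 of 300 negative candidates
neg_inds = torch.arange(300)
print(random_choice(neg_inds, 128).shape)  # torch.Size([128])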
23. Class 「二十二」 linked from step 20: SamplingResult
class SamplingResult(util_mixins.NiceRepr):
    """Bbox sampling result.

    Example:
        >>> # xdoctest: +IGNORE_WANT
        >>> from mmdet.core.bbox.samplers.sampling_result import *  # NOQA
        >>> self = SamplingResult.random(rng=10)
        >>> print(f'self = {self}')
        self = <SamplingResult(...)>
    """

    def __init__(self, pos_inds, neg_inds, bboxes, gt_bboxes, assign_result,
                 gt_flags):
        # positive indices
        self.pos_inds = pos_inds
        # negative indices
        self.neg_inds = neg_inds
        # positive boxes
        self.pos_bboxes = bboxes[pos_inds]
        # negative boxes
        self.neg_bboxes = bboxes[neg_inds]
        # flags which positives are gt boxes that were added as proposals
        self.pos_is_gt = gt_flags[pos_inds]
        # number of gt boxes
        self.num_gts = gt_bboxes.shape[0]
        # 0-based index of the gt box assigned to each positive anchor
        self.pos_assigned_gt_inds = assign_result.gt_inds[pos_inds] - 1

        if gt_bboxes.numel() == 0:
            # hack for index error case
            assert self.pos_assigned_gt_inds.numel() == 0
            self.pos_gt_bboxes = torch.empty_like(gt_bboxes).view(-1, 4)
        else:
            if len(gt_bboxes.shape) < 2:
                gt_bboxes = gt_bboxes.view(-1, 4)
            # coordinates of the gt box assigned to each positive anchor
            self.pos_gt_bboxes = gt_bboxes[self.pos_assigned_gt_inds, :]

        if assign_result.labels is not None:
            self.pos_gt_labels = assign_result.labels[pos_inds]
        else:
            self.pos_gt_labels = None
24. Method 「二十三」 linked from step 14: encode the boxes
    def encode(self, bboxes, gt_bboxes):
        """Get box regression transformation deltas that can be used to
        transform the ``bboxes`` into the ``gt_bboxes``.

        Args:
            bboxes (torch.Tensor): Source boxes, e.g., object proposals.
            gt_bboxes (torch.Tensor): Target of the transformation, e.g.,
                ground-truth boxes.

        Returns:
            torch.Tensor: Box transformation deltas
        """

        assert bboxes.size(0) == gt_bboxes.size(0)
        assert bboxes.size(-1) == gt_bboxes.size(-1) == 4
        encoded_bboxes = bbox2delta(bboxes, gt_bboxes, self.means, self.stds) 「二十四」  # encode the anchors against the gt boxes
        return encoded_bboxes
25. Method 「二十四」 linked from step 24: compute deltas (bbox2delta)
def bbox2delta(proposals, gt, means=(0., 0., 0., 0.), stds=(1., 1., 1., 1.)):
    """Compute deltas of proposals w.r.t. gt.

    We usually compute the deltas of x, y, w, h of proposals w.r.t ground
    truth bboxes to get regression target.
    This is the inverse function of :func:`delta2bbox`.

    Args:
        proposals (Tensor): Boxes to be transformed, shape (N, ..., 4)
        gt (Tensor): Gt bboxes to be used as base, shape (N, ..., 4)
        means (Sequence[float]): Denormalizing means for delta coordinates
        stds (Sequence[float]): Denormalizing standard deviation for delta
            coordinates

    Returns:
        Tensor: deltas with shape (N, 4), where columns represent dx, dy,
            dw, dh.
    """
    # shapes must match
    assert proposals.size() == gt.size()
	
    # cast to float
    proposals = proposals.float()
    gt = gt.float()
    
    # centers and sizes of the proposals and gts
    px = (proposals[..., 0] + proposals[..., 2]) * 0.5
    py = (proposals[..., 1] + proposals[..., 3]) * 0.5
    pw = proposals[..., 2] - proposals[..., 0]
    ph = proposals[..., 3] - proposals[..., 1]
    gx = (gt[..., 0] + gt[..., 2]) * 0.5
    gy = (gt[..., 1] + gt[..., 3]) * 0.5
    gw = gt[..., 2] - gt[..., 0]
    gh = gt[..., 3] - gt[..., 1]

    # compute the deltas
    dx = (gx - px) / pw
    dy = (gy - py) / ph
    dw = torch.log(gw / pw)
    dh = torch.log(gh / ph)
    deltas = torch.stack([dx, dy, dw, dh], dim=-1)

    # means and stds
    means = deltas.new_tensor(means).unsqueeze(0)
    stds = deltas.new_tensor(stds).unsqueeze(0)
    # normalize: subtract the mean, divide by the std
    deltas = deltas.sub_(means).div_(stds)

    return deltas
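
A quick usage check (assuming the bbox2delta defined above is in scope): the proposal and the gt box below share the same 10x10 size and the gt center is shifted by (+2, +2), so the target is (dx, dy, dw, dh) = (0.2, 0.2, 0, 0):

import torch

proposals = torch.tensor([[0., 0., 10., 10.]])
gt = torch.tensor([[2., 2., 12., 12.]])
print(bbox2delta(proposals, gt))  # tensor([[0.2000, 0.2000, 0.0000, 0.0000]])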

Full pipeline illustration of the RPN

1. RPN network initialization

Reference
