DETR paper notes and related code reading
Abstract: We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components such as non-maximum suppression (NMS) or anchor generation, which explicitly encode prior knowledge about the task. The main ingredients of the new framework, called DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. Unlike many other detectors, the new model is conceptually simple and does not require a specialized library. On the challenging COCO object detection dataset, DETR demonstrates accuracy and run-time performance on par with the well-established and highly optimized Faster R-CNN baseline. Moreover, DETR can easily be generalized to produce panoptic segmentation in a unified manner, where we show it significantly outperforms competitive baselines.
The goal of object detection is to predict a set of bounding boxes and category labels for each object of interest. Modern detectors address this set prediction task in an indirect way, by defining surrogate regression and classification problems on a large set of proposals [37,5], anchors [23], or window centers [53,46]. Their performance is significantly influenced by the postprocessing steps that collapse near-duplicate predictions, by the design of the anchor sets, and by the heuristics that assign target boxes to anchors [52]. To simplify these pipelines, we propose a direct set prediction approach that bypasses the surrogate tasks. This end-to-end philosophy has led to significant advances in complex structured prediction tasks such as machine translation or speech recognition, but not yet in object detection: previous attempts [43,16,4,39] either add other forms of prior knowledge, or have not proven competitive with strong baselines on challenging benchmarks. This paper aims to bridge that gap.
We streamline training by viewing object detection as a direct set prediction problem. We adopt an encoder-decoder architecture based on transformers [47], a popular architecture for sequence prediction. The self-attention mechanism of transformers explicitly models all pairwise interactions between elements in a sequence, which makes these architectures particularly suitable for the specific constraints of set prediction, such as removing duplicate predictions.
Our DEtection TRansformer (DETR, see Figure 1) predicts all objects at once, and is trained end-to-end with a set loss function that performs bipartite matching between predicted and ground-truth objects. DETR simplifies the detection pipeline by dropping multiple hand-designed components that encode prior knowledge, such as anchors or non-maximum suppression (NMS). Unlike most existing detection methods, DETR does not require any customized layers, and can therefore be reproduced easily in any framework that contains standard CNN and transformer classes.
Compared with most previous work on direct set prediction, the main feature of DETR is the conjunction of the bipartite matching loss and transformers with (non-autoregressive) parallel decoding [29,12,10,8]. In contrast, previous work focused on autoregressive decoding with RNNs [43,41,30,36,42]. Our matching loss function uniquely assigns a prediction to each ground-truth object and is invariant to permutations of the predicted objects, so we can emit them in parallel.
We evaluate DETR on one of the most popular object detection datasets, COCO [24], against a very competitive Faster R-CNN baseline [37]. Faster R-CNN has undergone many design iterations and its performance has improved greatly since the original publication. Our experiments show that our new model achieves comparable performance. More precisely, DETR demonstrates significantly better performance on large objects, a result likely enabled by the non-local computations of the transformer. It obtains, however, lower performance on small objects. We expect future work to improve this aspect, in the same way the development of FPN [22] did for Faster R-CNN.
The design ethos of DETR easily extends to more complex tasks. In our experiments, we show that a simple segmentation head trained on top of a pre-trained DETR outperforms competitive baselines on panoptic segmentation [19], a challenging pixel-level recognition task that has recently gained popularity.
The model first passes the image through a CNN to extract features, feeds those features into a transformer, and finally converts the transformer outputs into class and box predictions, producing a fixed number of predictions. The resulting detections are then matched against the ground truth with bipartite matching to compute the loss.
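A schematic sketch of that pipeline (not the authors' code; all names and shapes below are illustrative placeholders) might look like this:
# Illustrative pipeline sketch: CNN features -> transformer -> N (class, box) predictions.
def detr_pipeline(image, backbone, transformer, class_head, box_head, object_queries):
    features = backbone(image)                  # e.g. (2048, H/32, W/32) feature map
    hs = transformer(features, object_queries)  # N decoded embeddings, one per object query
    logits = class_head(hs)                     # N class predictions (incl. a "no object" class)
    boxes = box_head(hs)                        # N normalized (cx, cy, w, h) boxes
    return logits, boxes                        # matched to GT via bipartite matching to compute the loss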
Our work builds on prior work in several domains: bipartite matching losses for set prediction, encoder-decoder architectures based on the transformer, parallel decoding, and object detection methods.
There is no canonical deep learning model to directly predict sets. The basic set prediction task is multilabel classification (see e.g. [40,33] for references in the context of computer vision), for which the baseline one-vs-rest approach does not apply to problems such as detection where there is an underlying structure between elements (i.e. near-identical boxes). The first difficulty in these tasks is to avoid near-duplicates. Most current detectors use postprocessing such as non-maximum suppression to address this issue, whereas direct set prediction is postprocessing-free. It requires a global inference scheme that models interactions between all predicted elements to avoid redundancy. For constant-size set prediction, dense fully connected networks [9] are sufficient but costly. A general approach is to use autoregressive sequence models such as recurrent neural networks [48]. In all cases, the loss function should be invariant to a permutation of the predictions. The usual solution is to design a loss based on the Hungarian algorithm [20], which finds a bipartite matching between ground truth and predictions. This enforces permutation invariance and guarantees that each target element has a unique match. We follow the bipartite matching loss approach. In contrast to most prior work, however, we step away from autoregressive models and use transformers with parallel decoding, which we describe below.
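To make the bipartite matching concrete, the sketch below builds a small illustrative cost matrix between predictions and ground-truth objects and solves the assignment with scipy.optimize.linear_sum_assignment, the same solver DETR's matcher uses later in this post; the numbers are made up for illustration.
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows: 4 predictions, columns: 2 ground-truth objects.
# Each entry is a matching cost (lower = better fit); values are illustrative only.
cost = np.array([
    [0.9, 0.2],
    [0.1, 0.8],
    [0.7, 0.6],
    [0.5, 0.4],
])
pred_idx, gt_idx = linear_sum_assignment(cost)
# pred_idx -> gt_idx is the minimum-cost one-to-one assignment;
# the unmatched predictions are treated as "no object".
print(list(zip(pred_idx, gt_idx)))  # [(0, 1), (1, 0)]
In DETR the cost entries combine a classification term and box terms (L1 and generalized IoU), as the matcher code later shows.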
Vaswani et al. [47] introduced transformers as a new attention-based building block for machine translation. Attention mechanisms [2] are neural network layers that aggregate information from the entire input sequence. Transformers introduced self-attention layers, which, similarly to Non-Local Neural Networks [49], scan through each element of a sequence and update it by aggregating information from the whole sequence. One of the main advantages of attention-based models is their global computation and perfect memory, which makes them better suited than RNNs to long sequences. Transformers are now replacing RNNs in many problems in natural language processing, speech processing, and computer vision [8,27,45,34,31].
Transformers were first used in autoregressive models, following early sequence-to-sequence models [44], generating output tokens one by one. However, the prohibitive inference cost (proportional to the output length and hard to batch) led to the development of parallel sequence generation, in the domains of audio [29], machine translation [12,10], word representation learning [8], and more recently speech recognition [6]. We also combine transformers and parallel decoding for their suitable trade-off between computational cost and the ability to perform the global computations required for set prediction.
Set-based loss. Several object detectors [9,25,35] used the bipartite matching loss. However, in these early deep learning models, the relations between different predictions were modeled with convolutional or fully connected layers only, and a hand-designed NMS postprocessing could improve their performance. More recent detectors [37,23,53] use non-unique assignment rules between ground truth and predictions, together with an NMS.
Learnable NMS methods [16,4] and relation networks [17] explicitly model relations between different predictions. Using direct set losses, they do not require any postprocessing step. However, these methods employ additional hand-crafted context features, such as proposal box coordinates, to model relations between detections efficiently, whereas we look for solutions that reduce the prior knowledge encoded in the model.
Recurrent detectors. Closest to our approach are end-to-end set prediction for object detection [43] and instance segmentation [41,30,36,42]. Similarly to us, they use bipartite matching losses with encoder-decoder architectures based on CNN activations to directly produce a set of bounding boxes. These approaches, however, were only evaluated on small datasets and not against modern baselines. In particular, they are based on autoregressive models (more precisely RNNs), so they do not leverage the recent transformers with parallel decoding.
Two ingredients are essential for direct set prediction in detection: (1) a set prediction loss that forces unique matching between predicted and ground-truth boxes; (2) an architecture that predicts (in a single pass) a set of objects and models their relations. We describe our architecture in detail in Figure 2.
DETR infers a fixed-size set of N predictions in a single pass through the decoder, where N is set to be significantly larger than the typical number of objects in an image. One of the main difficulties of training is to score the predicted objects (class, position, size) with respect to the ground truth. Our loss produces an optimal bipartite matching between predicted and ground-truth objects, and then optimizes object-specific (bounding box) losses.
In practice, the set of GT boxes is first padded with ∅ (no object) to length N so that it can be matched against the network's N outputs; among the possible permutations, the one with the lowest matching cost against the N predictions is chosen and used for optimization. This yields a one-to-one assignment, so no NMS postprocessing is needed.
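For reference, the matching and the training loss from the paper can be written as follows: the optimal permutation $\hat{\sigma}$ minimizes the pairwise matching cost, and the Hungarian loss is then computed over the matched pairs,
$$\hat{\sigma} = \operatorname*{arg\,min}_{\sigma \in \mathfrak{S}_N} \sum_{i=1}^{N} \mathcal{L}_{\mathrm{match}}\big(y_i, \hat{y}_{\sigma(i)}\big), \qquad \mathcal{L}_{\mathrm{Hungarian}}(y, \hat{y}) = \sum_{i=1}^{N} \Big[ -\log \hat{p}_{\hat{\sigma}(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}}\, \mathcal{L}_{\mathrm{box}}\big(b_i, \hat{b}_{\hat{\sigma}(i)}\big) \Big],$$
where $c_i$ and $b_i$ are the class and box of ground-truth object $i$ (possibly ∅), and $\mathcal{L}_{\mathrm{box}}$ combines the L1 and generalized IoU box losses.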
It contains three main components, which we describe below: a CNN backbone that extracts a compact feature representation, an encoder-decoder transformer, and a simple feed-forward network (FFN) that makes the final detection predictions.
The backbone is a conventional CNN used to extract a 2D representation of the image. The network first runs the image through the backbone (e.g. ResNet) to obtain features, which are then reduced in channel dimension and flattened to d×HW; the backbone maps [3, H, W] to [2048, H/32, W/32]. The prediction heads turn the decoder outputs into class and box predictions.
Transformer encoder. First, a 1x1 convolution reduces the channel dimension of the high-level activation map f from C to a smaller dimension d, creating a new feature map z0 ∈ R^{d×H×W}. The encoder expects a sequence as input, so we collapse the spatial dimensions of z0 into one dimension, resulting in a d×HW feature map. Each encoder layer has a standard architecture and consists of a multi-head self-attention module and a feed-forward network (FFN). Since the transformer architecture is permutation-invariant, we supplement it with fixed positional encodings [31,3] that are added to the input of each attention layer. We defer the detailed definition of the architecture, which follows the one described in [47], to the supplementary material.
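As a minimal illustration of this input shaping (a sketch with made-up sizes, not the repository's code), the feature map can be projected and flattened like this:
import torch
from torch import nn

C, d, H, W = 2048, 256, 25, 34          # example sizes; H, W correspond to H0/32, W0/32
f = torch.randn(1, C, H, W)             # backbone activation map f
proj = nn.Conv2d(C, d, kernel_size=1)   # 1x1 convolution: C -> d channels
z0 = proj(f)                            # new feature map z0 of shape (1, d, H, W)
seq = z0.flatten(2).permute(2, 0, 1)    # collapse the spatial dims: (H*W, 1, d) token sequence for the encoder
print(seq.shape)                        # torch.Size([850, 1, 256])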
Transformer decoder. The decoder follows the standard architecture of the transformer, transforming N embeddings of size d using multi-headed self- and encoder-decoder attention mechanisms. The difference with the original transformer is that our model decodes the N objects in parallel at each decoder layer, while Vaswani et al. [47] use an autoregressive model that predicts the output sequence one element at a time. We refer readers unfamiliar with these concepts to the supplementary material. Since the decoder is also permutation-invariant, the N input embeddings must be different to produce different results. These input embeddings are learned positional encodings that we refer to as object queries, and similarly to the encoder, we add them to the input of each attention layer. The N object queries are transformed by the decoder into output embeddings. They are then independently decoded into box coordinates and class labels by a feed-forward network (described in the next subsection), resulting in N final predictions. Using self- and encoder-decoder attention over these embeddings, the model globally reasons about all objects together via pairwise relations between them, while being able to use the whole image as context.
The PyTorch inference code for DETR is only about 50 lines. Given a fixed set of learned object queries, DETR reasons about the relations between objects and the global image context to output the final set of predictions directly and in parallel. Because of this parallel nature, DETR is fast and efficient.
The transformer demo in PyTorch
This is a demo of DETR (Detection Transformer), with slight differences from the baseline model in the paper. We show how to define the model, load pretrained weights, and visualize bounding box and class predictions. The simplified DETR is built from pretrained weights and tested on a single image.
Thanks to the representational power of the transformer, the DETR architecture is very simple. There are two main components:
Backbone: ResNet-50
Transformer: the default PyTorch nn.Transformer
The model code is as follows:
'''
Simple DETR demo: object detection inference on a single image.
'''
from PIL import Image
import requests
import matplotlib.pyplot as plt
import torch
from torch import nn
from torchvision.models import resnet50
import torchvision.transforms as T
torch.set_grad_enabled(False) # inference only, no gradient backpropagation needed
# a simplified DETR implementation
class DETRdemo(nn.Module):
"""
Demo DETR implementation.
Demo implementation of DETR in minimal number of lines, with the
following differences wrt DETR in the paper:
* learned positional encoding (instead of sine)
* positional encoding is passed at input (instead of attention)
* fc bbox predictor (instead of MLP)
The model achieves ~40 AP on COCO val5k and runs at ~28 FPS on Tesla V100.
Only batch size 1 supported.
"""
def __init__(self, num_classes, hidden_dim=256, nheads=8, num_encoder_layers=6, num_decoder_layers=6):
# 6 encoder / 6 decoder layers; nheads is the number of attention heads
super().__init__()
# create ResNet-50 backbone
self.backbone = resnet50()
del self.backbone.fc
# create conversion layer (reduce the number of feature channels before the transformer)
self.conv = nn.Conv2d(2048, hidden_dim, 1) # 1x1 convolution: 2048 channels -> hidden_dim
# create a default PyTorch transformer
self.transformer = nn.Transformer(
hidden_dim, nheads, num_encoder_layers, num_decoder_layers)
# PyTorch splits the transformer into several modules: nn.TransformerEncoderLayer, nn.TransformerDecoderLayer, nn.LayerNorm, etc.
'''
Relevant PyTorch classes include:
1. torch.nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6, dim_feedforward=2048,
dropout=0.1, activation='relu', custom_encoder=None, custom_decoder=None)
The transformer model, based on the paper Attention Is All You Need.
d_model - size of the encoder/decoder inputs (default 512)
nhead - number of heads in the multi-head attention models (default 8)
num_encoder_layers - number of sub-encoder layers in the encoder (default 6)
num_decoder_layers - number of sub-decoder layers in the decoder (default 6)
dim_feedforward - dimension of the feedforward network model (default 2048)
activation - activation function of the encoder/decoder intermediate layer, relu or gelu (default relu)
custom_encoder - custom encoder (default None)
custom_decoder - custom decoder (default None)
2. torch.nn.TransformerEncoder(encoder_layer, num_layers, norm=None)
TransformerEncoder is a stack of N encoder layers.
encoder_layer - an instance of TransformerEncoderLayer() (required)
num_layers - number of sub-encoder layers in the encoder (required)
3. torch.nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation='relu')
TransformerEncoderLayer is made up of self-attention and a feedforward network.
d_model - expected feature size of the encoder/decoder inputs
nhead - number of heads in the multi-head attention models
dim_feedforward - dimension of the feedforward network model (default 2048)
activation - activation function of the intermediate layer, relu or gelu (default relu)
'''
# prediction heads, one extra class for predicting non-empty slots
# note that in baseline DETR linear_bbox layer is 3-layer MLP
# each bbox consists of the box center coordinates plus width and height, i.e. 4 values (x, y, w, h)
self.linear_class = nn.Linear(hidden_dim, num_classes + 1)
self.linear_bbox = nn.Linear(hidden_dim, 4)
# output positional encodings (object queries)
# at most 100 objects are detected per image
self.query_pos = nn.Parameter(torch.rand(100, hidden_dim))
# spatial positional encodings
# note that in baseline DETR we use sine positional encodings
# positional encodings for the encoder: rows and columns are encoded separately and concatenated later, so each gets half the dimension
# the feature map is assumed to be at most 50x50 (at most 50 rows and 50 columns)
self.row_embed = nn.Parameter(torch.rand(50, hidden_dim // 2)) # rows
self.col_embed = nn.Parameter(torch.rand(50, hidden_dim // 2)) # columns
# nn.Parameter makes these tensors learnable parameters
def forward(self, inputs):
# propagate inputs through ResNet-50 up to avg-pool layer
# inputs: [3, H, W]
x = self.backbone.conv1(inputs) # ResNet-50 conv1: 7x7 kernel, stride 2, padding 3; 3 input channels -> 64 output channels, so the feature map becomes [64, H/2, W/2]
x = self.backbone.bn1(x)
x = self.backbone.relu(x)
x = self.backbone.maxpool(x) # the feature map is now 1/4 of the input size: [64, H/4, W/4]
x = self.backbone.layer1(x) # [256,H/4,W/4]
x = self.backbone.layer2(x) # [512,H/8,W/8]
x = self.backbone.layer3(x) # [1024,H/16,W/16]
x = self.backbone.layer4(x) # [2048,H/32,W/32]
# convert from 2048 to 256 feature planes for the transformer
h = self.conv(x) # [hidden_dim, H/32, W/32], the [d, H0/32, W0/32] of the paper
# construct positional encodings
H, W = h.shape[-2:] # spatial size of the feature map; tensors are laid out as (N, C, H, W)
pos = torch.cat([
self.col_embed[:W].unsqueeze(0).repeat(H, 1, 1), # unsqueeze(0) adds a leading dim; repeat(H, 1, 1) tiles it -> [H, W, hidden_dim//2]
self.row_embed[:H].unsqueeze(1).repeat(1, W, 1), # [H, W, hidden_dim//2]
], dim=-1).flatten(0, 1).unsqueeze(1) # concatenate along the channel dim and flatten to (H*W, 1, hidden_dim), matching the d×HW map in the paper
# torch.cat concatenates tensors along the given dim; flatten(0, 1) merges dims 0 and 1; the 1 is the batch size
# propagate through the transformer
h = self.transformer(pos + 0.1 * h.flatten(2).permute(2, 0, 1),
self.query_pos.unsqueeze(1)).transpose(0, 1)
# query_pos is (100, 1, hidden_dim); the resulting h is (batch, 100, hidden_dim)
# finally project transformer outputs to class labels and bounding boxes
return {
'pred_logits': self.linear_class(h),
'pred_boxes': self.linear_bbox(h).sigmoid()}
Let's construct the model with the 80 COCO output classes + 1 "no object" class, and load the pretrained weights. The weights are saved in half precision to save bandwidth without hurting model accuracy.
Instantiate a model and load the pretrained weights:
detr = DETRdemo(num_classes=91)
state_dict = torch.hub.load_state_dict_from_url(
url='https://dl.fbaipublicfiles.com/detr/detr_demo-da2a99e9.pth',
map_location='cpu', check_hash=True) # 加载预训练模型
detr.load_state_dict(state_dict) # 模型加载到网络中
detr.eval() # 测试模型
The pretrained DETR model we just loaded was trained on the 80 COCO classes, with class indices ranging from 1 to 90 (which is why we allowed for 91 classes in the model construction). In the following cell, we define the mapping from class indices to names.
# COCO classes
CLASSES = [
'N/A', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A',
'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack',
'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass',
'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',
'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A',
'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A',
'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
'toothbrush'
]
# colors for visualization
COLORS = [[0.000, 0.447, 0.741], [0.850, 0.325, 0.098], [0.929, 0.694, 0.125],
[0.494, 0.184, 0.556], [0.466, 0.674, 0.188], [0.301, 0.745, 0.933]]
DETR uses standard ImageNet normalization, and outputs boxes in relative image coordinates in [xcenter, ycenter, w, h] format, where [xcenter, ycenter] is the predicted center of the bounding box and w, h are its width and height. Because the coordinates are relative to the image dimensions and lie in [0, 1], we convert the predictions to absolute image coordinates and the [x0, y0, x1, y1] format for visualization purposes.
# standard PyTorch mean-std input image normalization 标准PyTorch均值输入图像归一化
transform = T.Compose([
T.Resize(800),
T.ToTensor(),
T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) # 前面的[0.485, 0.456, 0.406]是RGB三个通道上的均值mean, 后面[0.229, 0.224, 0.225]是三个通道的标准差std
]) # Compose的主要作用是将多个变换组合在一起。
# for output bounding box post-processing
# 将bbox的中心x、中心y坐标,高度宽度转换为左上角右下角坐标
def box_cxcywh_to_xyxy(x):
x_c, y_c, w, h = x.unbind(1) # torch.unbind(input, dim=0) → seq,此方法就是将我们的input从dim进行切片,并返回切片的结果,返回的结果里面没有dim这个维度。
b = [(x_c - 0.5 * w), (y_c - 0.5 * h),
(x_c + 0.5 * w), (y_c + 0.5 * h)]
return torch.stack(b, dim=1) # shape为[batch,4]
def rescale_bboxes(out_bbox, size):
img_w, img_h = size
b = box_cxcywh_to_xyxy(out_bbox)
b = b * torch.tensor([img_w, img_h, img_w, img_h], dtype=torch.float32) # bbox归一化坐标转为基于图像尺寸的绝对坐标
return b
Wrap the whole inference pipeline, from input image to predictions, in a single function:
# Let's put everything together in a detect function:
def detect(im, model, transform):
# mean-std normalize the input image (batch-size: 1)
img = transform(im).unsqueeze(0)
# demo model only support by default images with aspect ratio between 0.5 and 2
# if you want to use images with an aspect ratio outside this range
# rescale your image so that the maximum size is at most 1333 for best results
assert img.shape[-2] <= 1600 and img.shape[-1] <= 1600, 'demo model only supports images up to 1600 pixels on each side'
# propagate through the model
outputs = model(img)
# keep only predictions with 0.7+ confidence
probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
keep = probas.max(-1).values > 0.7 # 只取置信度大于0.7的
# convert boxes from [0; 1] to image scales
bboxes_scaled = rescale_bboxes(outputs['pred_boxes'][0, keep], im.size)
return probas[keep], bboxes_scaled
Test on a single image:
# To try the DETRdemo model on your own image, just change the URL below.
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
im = Image.open(requests.get(url, stream=True).raw)
scores, boxes = detect(im, detr, transform)
Visualize the results:
# Let's now visualize the model predictions
def plot_results(pil_img, prob, boxes):
plt.figure(figsize=(16, 10))
plt.imshow(pil_img)
ax = plt.gca()
for p, (xmin, ymin, xmax, ymax), c in zip(prob, boxes.tolist(), COLORS * 100):
ax.add_patch(plt.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
fill=False, color=c, linewidth=3))
cl = p.argmax()
text = f'{CLASSES[cl]}: {p[cl]:0.2f}'
ax.text(xmin, ymin, text, fontsize=15,
bbox=dict(facecolor='yellow', alpha=0.5))
plt.axis('off')
plt.show()
plot_results(im, scores, boxes)
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
"""
Backbone modules.
"""
from collections import OrderedDict
import torch
import torch.nn.functional as F
import torchvision
from torch import nn
from torchvision.models._utils import IntermediateLayerGetter
from typing import Dict, List
from util.misc import NestedTensor, is_main_process
from .position_encoding import build_position_encoding
class FrozenBatchNorm2d(torch.nn.Module):
"""
BatchNorm2d where the batch statistics and the affine parameters are fixed.
Copy-paste from torchvision.misc.ops with added eps before rqsrt,
without which any other models than torchvision.models.resnet[18,34,50,101]
produce nans.
"""
def __init__(self, n):
super(FrozenBatchNorm2d, self).__init__()
self.register_buffer("weight", torch.ones(n))
self.register_buffer("bias", torch.zeros(n))
self.register_buffer("running_mean", torch.zeros(n))
self.register_buffer("running_var", torch.ones(n))
# Register these four tensors as buffers so that backprop never updates them. Buffers are meant for state that should not be treated as model parameters (e.g. running mean/variance statistics), yet they are still written to and read from the state_dict when saving and loading.
def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
missing_keys, unexpected_keys, error_msgs):
num_batches_tracked_key = prefix + 'num_batches_tracked'
if num_batches_tracked_key in state_dict:
del state_dict[num_batches_tracked_key]
super(FrozenBatchNorm2d, self)._load_from_state_dict(
state_dict, prefix, local_metadata, strict,
missing_keys, unexpected_keys, error_msgs)
def forward(self, x):
# move reshapes to the beginning
# to make it fuser-friendly
w = self.weight.reshape(1, -1, 1, 1)
b = self.bias.reshape(1, -1, 1, 1)
rv = self.running_var.reshape(1, -1, 1, 1)
rm = self.running_mean.reshape(1, -1, 1, 1)
eps = 1e-5
scale = w * (rv + eps).rsqrt()
bias = b - rm * scale
return x * scale + bias
class BackboneBase(nn.Module):
def __init__(self, backbone: nn.Module, train_backbone: bool, num_channels: int, return_interm_layers: bool):
super().__init__()
for name, parameter in backbone.named_parameters():
if not train_backbone or 'layer2' not in name and 'layer3' not in name and 'layer4' not in name:
parameter.requires_grad_(False)
if return_interm_layers: # True的时候,记录每一层(Resnet的layer)的输出
return_layers = {
"layer1": "0", "layer2": "1", "layer3": "2", "layer4": "3"}
else:
return_layers = {
'layer4': "0"}
self.body = IntermediateLayerGetter(backbone, return_layers=return_layers)
# IntermediateLayerGetter inherits from nn.ModuleDict and takes an nn.Module plus a dict: the dict keys name submodules of the nn.Module, and the values are user-defined names for the corresponding outputs
# its output is a dict containing the output of every layer listed in return_layers
self.num_channels = num_channels
def forward(self, tensor_list: NestedTensor):
xs = self.body(tensor_list.tensors)
out: Dict[str, NestedTensor] = {
}
for name, x in xs.items():
m = tensor_list.mask
assert m is not None
# interpolate the mask down to the spatial size of this feature map
mask = F.interpolate(m[None].float(), size=x.shape[-2:]).to(torch.bool)[0]
out[name] = NestedTensor(x, mask) # bundle the feature map and its mask together
return out
class Backbone(BackboneBase):
"""ResNet backbone with frozen BatchNorm."""
def __init__(self, name: str, # backbone采用的模型名字如resnet50
train_backbone: bool,
return_interm_layers: bool,
dilation: bool):
backbone = getattr(torchvision.models, name)(
replace_stride_with_dilation=[False, False, dilation],
pretrained=is_main_process(), norm_layer=FrozenBatchNorm2d)
num_channels = 512 if name in ('resnet18', 'resnet34') else 2048
super().__init__(backbone, train_backbone, num_channels, return_interm_layers)
class Joiner(nn.Sequential): # 把backbone和position_embedding集合到一起
def __init__(self, backbone, position_embedding):
super().__init__(backbone, position_embedding)
# self[0]是backbone,self[1]是position_embedding
def forward(self, tensor_list: NestedTensor):
xs = self[0](tensor_list) # backbone的输出
out: List[NestedTensor] = []
pos = []
for name, x in xs.items():
out.append(x)
# position encoding
pos.append(self[1](x).to(x.tensors.dtype))
return out, pos
def build_backbone(args):
position_embedding = build_position_encoding(args)
train_backbone = args.lr_backbone > 0
return_interm_layers = args.masks
backbone = Backbone(args.backbone, train_backbone, return_interm_layers, args.dilation)
model = Joiner(backbone, position_embedding)
model.num_channels = backbone.num_channels
return model
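As a quick illustration of how torchvision's IntermediateLayerGetter behaves (a sketch, not part of the repository; it only assumes torchvision's resnet50 is available), the snippet below collects the intermediate ResNet outputs under custom names, exactly as BackboneBase does above:
import torch
import torchvision
from torchvision.models._utils import IntermediateLayerGetter

# Wrap a ResNet-50 and collect the outputs of layer1..layer4 under custom names.
resnet = torchvision.models.resnet50()
body = IntermediateLayerGetter(resnet, return_layers={"layer1": "0", "layer2": "1",
                                                      "layer3": "2", "layer4": "3"})
feats = body(torch.randn(1, 3, 224, 224))   # OrderedDict of name -> feature map
for name, f in feats.items():
    print(name, tuple(f.shape))
# expected shapes: (1, 256, 56, 56), (1, 512, 28, 28), (1, 1024, 14, 14), (1, 2048, 7, 7)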
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
"""
Various positional encodings for the transformer.
"""
import math
import torch
from torch import nn
from util.misc import NestedTensor
class PositionEmbeddingSine(nn.Module):
"""
This is a more standard version of the position embedding, very similar to the one
used by the Attention is all you need paper, generalized to work on images.
"""
def __init__(self, num_pos_feats=64, temperature=10000, normalize=False, scale=None):
super().__init__()
self.num_pos_feats = num_pos_feats
self.temperature = temperature
self.normalize = normalize
if scale is not None and normalize is False:
raise ValueError("normalize should be True if scale is passed")
# 角度范围在0-2π
if scale is None:
scale = 2 * math.pi
self.scale = scale
def forward(self, tensor_list: NestedTensor):
x = tensor_list.tensors # (b, c, h, w)
mask = tensor_list.mask # (b, h, w); indicates which positions come from padding: the region covered by the (resized) original image, anchored at the top-left corner, is False, everything else is True
assert mask is not None
not_mask = ~mask # True where the image is real (not padding)
y_embed = not_mask.cumsum(1, dtype=torch.float32) # cumulative sum along dim 1 (down the columns, the y direction)
x_embed = not_mask.cumsum(2, dtype=torch.float32) # cumulative sum along dim 2 (across the rows, the x direction)
if self.normalize:
eps = 1e-6
y_embed = y_embed / (y_embed[:, -1:, :] + eps) * self.scale
x_embed = x_embed / (x_embed[:, :, -1:] + eps) * self.scale
dim_t = torch.arange(self.num_pos_feats, dtype=torch.float32, device=x.device)
dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats)
pos_x = x_embed[:, :, :, None] / dim_t
pos_y = y_embed[:, :, :, None] / dim_t
pos_x = torch.stack((pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4).flatten(3) # 对应论文中的位置编码公式
pos_y = torch.stack((pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4).flatten(3)
pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2)
return pos
class PositionEmbeddingLearned(nn.Module):
"""
Absolute pos embedding, learned.
"""
def __init__(self, num_pos_feats=256):
super().__init__()
self.row_embed = nn.Embedding(50, num_pos_feats)
self.col_embed = nn.Embedding(50, num_pos_feats)
self.reset_parameters()
def reset_parameters(self):
nn.init.uniform_(self.row_embed.weight)
nn.init.uniform_(self.col_embed.weight)
def forward(self, tensor_list: NestedTensor):
x = tensor_list.tensors
h, w = x.shape[-2:]
i = torch.arange(w, device=x.device)
j = torch.arange(h, device=x.device)
x_emb = self.col_embed(i)
y_emb = self.row_embed(j)
pos = torch.cat([
x_emb.unsqueeze(0).repeat(h, 1, 1),
y_emb.unsqueeze(1).repeat(1, w, 1),
], dim=-1).permute(2, 0, 1).unsqueeze(0).repeat(x.shape[0], 1, 1, 1)
return pos
def build_position_encoding(args):
N_steps = args.hidden_dim // 2
if args.position_embedding in ('v2', 'sine'):
# TODO find a better way of exposing other arguments
position_embedding = PositionEmbeddingSine(N_steps, normalize=True)
elif args.position_embedding in ('v3', 'learned'):
position_embedding = PositionEmbeddingLearned(N_steps)
else:
raise ValueError(f"not supported {args.position_embedding}")
return position_embedding
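For reference, PositionEmbeddingSine generalizes the 1D sine encoding of [47] to the two image axes. The 1D formula is
$$PE_{(pos,\,2i)} = \sin\!\big(pos / 10000^{2i/d_{\text{model}}}\big), \qquad PE_{(pos,\,2i+1)} = \cos\!\big(pos / 10000^{2i/d_{\text{model}}}\big),$$
and the code above applies it separately to the (optionally normalized) cumulative x and y positions, then concatenates the two halves along the channel dimension to obtain the full d-dimensional encoding.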
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
"""
DETR Transformer class.
Copy-paste from torch.nn.Transformer with modifications: 从torch.nn.Transformer复制粘贴并进行以下修改:
* positional encodings are passed in MHattention
* extra LN at the end of encoder is removed
* decoder returns a stack of activations from all decoding layers
"""
import copy
from typing import Optional, List
import torch
import torch.nn.functional as F
from torch import nn, Tensor
class Transformer(nn.Module):
def __init__(self, d_model=512, nhead=8, num_encoder_layers=6,
num_decoder_layers=6, dim_feedforward=2048, dropout=0.1,
activation="relu", normalize_before=False,
return_intermediate_dec=False):
super().__init__()
# a single encoder layer
# with post-normalization, each encoder layer already normalizes its output, so the final encoder output needs no extra normalization and encoder_norm is set to None
encoder_layer = TransformerEncoderLayer(d_model, nhead, dim_feedforward,
dropout, activation, normalize_before)
encoder_norm = nn.LayerNorm(d_model) if normalize_before else None
# the full encoder is a stack of num_encoder_layers (6) encoder layers
self.encoder = TransformerEncoder(encoder_layer, num_encoder_layers, encoder_norm)
# a single decoder layer
decoder_layer = TransformerDecoderLayer(d_model, nhead, dim_feedforward,
dropout, activation, normalize_before)
decoder_norm = nn.LayerNorm(d_model)
# the full decoder is a stack of num_decoder_layers (6) decoder layers
self.decoder = TransformerDecoder(decoder_layer, num_decoder_layers, decoder_norm,
return_intermediate=return_intermediate_dec)
self._reset_parameters()
self.d_model = d_model
self.nhead = nhead
def _reset_parameters(self):
for p in self.parameters():
if p.dim() > 1:
nn.init.xavier_uniform_(p)
def forward(self, src, mask, query_embed, pos_embed):
# flatten NxCxHxW to HWxNxC
bs, c, h, w = src.shape # batch size, channels, height, width of the feature map fed into the transformer after the backbone
src = src.flatten(2).permute(2, 0, 1) # reshape to [H*W, N, C]
pos_embed = pos_embed.flatten(2).permute(2, 0, 1) # pos_embed also becomes [H*W, N, C]
query_embed = query_embed.unsqueeze(1).repeat(1, bs, 1)
mask = mask.flatten(1) # mask becomes [N, H*W]
tgt = torch.zeros_like(query_embed) # decoder input for the object queries, initialized to zero
memory = self.encoder(src, src_key_padding_mask=mask, pos=pos_embed)
hs = self.decoder(tgt, memory, memory_key_padding_mask=mask,
pos=pos_embed, query_pos=query_embed)
return hs.transpose(1, 2), memory.permute(1, 2, 0).view(bs, c, h, w)
class TransformerEncoder(nn.Module):
def __init__(self, encoder_layer, num_layers, norm=None):
super().__init__()
self.layers = _get_clones(encoder_layer, num_layers) # _get_clones对相同结构的模块进行复制,返回一个nn.ModuleList实例
self.num_layers = num_layers
self.norm = norm
def forward(self, src,
mask: Optional[Tensor] = None,
src_key_padding_mask: Optional[Tensor] = None,
pos: Optional[Tensor] = None):
# src is the backbone feature map projected to hidden_dim, shape [H*W, b, hidden_dim]
# pos is the positional encoding of the last backbone feature map, shape [H*W, b, c]
# src_key_padding_mask is the mask of the last backbone feature map, shape [b, H*W]
output = src
for layer in self.layers:
output = layer(output, src_mask=mask,
src_key_padding_mask=src_key_padding_mask, pos=pos)
if self.norm is not None:
output = self.norm(output)
return output
class TransformerDecoder(nn.Module):
def __init__(self, decoder_layer, num_layers, norm=None, return_intermediate=False):
super().__init__()
self.layers = _get_clones(decoder_layer, num_layers)
self.num_layers = num_layers
self.norm = norm
# 是否需要记录中间层的结果
self.return_intermediate = return_intermediate
def forward(self, tgt, memory,
tgt_mask: Optional[Tensor] = None,
memory_mask: Optional[Tensor] = None,
tgt_key_padding_mask: Optional[Tensor] = None,
memory_key_padding_mask: Optional[Tensor] = None,
pos: Optional[Tensor] = None,
query_pos: Optional[Tensor] = None):
# tgt is the query embedding, shape (num_queries, b, hidden_dim); num_queries is the maximum number of objects per image, 100 by default
# query_pos is the positional encoding of tgt
# memory is the encoder output, shape (h*w, b, hidden_dim)
# memory_key_padding_mask corresponds to the encoder's src_key_padding_mask
# pos is the positional encoding fed to the encoder, i.e. the encoding of memory
output = tgt
intermediate = []
for layer in self.layers:
output = layer(output, memory, tgt_mask=tgt_mask,
memory_mask=memory_mask,
tgt_key_padding_mask=tgt_key_padding_mask,
memory_key_padding_mask=memory_key_padding_mask,
pos=pos, query_pos=query_pos)
if self.return_intermediate:
intermediate.append(self.norm(output))
if self.norm is not None:
output = self.norm(output)
if self.return_intermediate:
intermediate.pop()
intermediate.append(output)
if self.return_intermediate:
return torch.stack(intermediate)
return output.unsqueeze(0)
class TransformerEncoderLayer(nn.Module):
# 由多头注意力机制、前向反馈FFN构成,包含归一化层、激活层、Dropout层、残差连接层
def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1,
activation="relu", normalize_before=False):
super().__init__()
# 多头自注意力层
self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout) # d_model每一个单词本来的词向量长度
# Implementation of Feedforward model
# 前向反馈层FFN
self.linear1 = nn.Linear(d_model, dim_feedforward)
self.dropout = nn.Dropout(dropout)
self.linear2 = nn.Linear(dim_feedforward, d_model)
# 分别用于多头自注意力层和前向反馈层
self.norm1 = nn.LayerNorm(d_model)
self.norm2 = nn.LayerNorm(d_model)
self.dropout1 = nn.Dropout(dropout)
self.dropout2 = nn.Dropout(dropout)
self.activation = _get_activation_fn(activation)
# _get_activation_fn根据输入参数指定的激活方式返回激活层
# 是否在输入多头自注意力层/前向反馈层前归一化
self.normalize_before = normalize_before
def with_pos_embed(self, tensor, pos: Optional[Tensor]):
return tensor if pos is None else tensor + pos
def forward_post(self,
src,
src_mask: Optional[Tensor] = None,
src_key_padding_mask: Optional[Tensor] = None,
pos: Optional[Tensor] = None):
# 后进行归一化
q = k = self.with_pos_embed(src, pos)
src2 = self.self_attn(q, k, value=src, attn_mask=src_mask,
key_padding_mask=src_key_padding_mask)[0]
src = src + self.dropout1(src2)
src = self.norm1(src)
src2 = self.linear2(self.dropout(self.activation(self.linear1(src))))
src = src + self.dropout2(src2)
src = self.norm2(src)
return src
def forward_pre(self, src,
src_mask: Optional[Tensor] = None, # attention mask that hides later positions so no attention weight is computed for them (not used in DETR's encoder)
src_key_padding_mask: Optional[Tensor] = None, # mask of the last backbone feature map; True marks padded pixels of the original image, which are filled with -inf when computing attention
pos: Optional[Tensor] = None):
src2 = self.norm1(src) # 归一化
q = k = self.with_pos_embed(src2, pos) # 位置编码
# 多头自注意力层
src2 = self.self_attn(q, k, value=src2, attn_mask=src_mask,
key_padding_mask=src_key_padding_mask)[0] # 索引[0]为自注意力层的输出,[1]为自注意力权重
src = src + self.dropout1(src2)
src2 = self.norm2(src)
# FCN
src2 = self.linear2(self.dropout(self.activation(self.linear1(src2))))
src = src + self.dropout2(src2)
return src
def forward(self, src,
src_mask: Optional[Tensor] = None,
src_key_padding_mask: Optional[Tensor] = None,
pos: Optional[Tensor] = None):
if self.normalize_before: # 在输入多头自注意力层和前向反馈层前进行归一化
return self.forward_pre(src, src_mask, src_key_padding_mask, pos)
return self.forward_post(src, src_mask, src_key_padding_mask, pos) # 在输入多头自注意力层和前向反馈层之后再进行归一化
class TransformerDecoderLayer(nn.Module):
def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1,
activation="relu", normalize_before=False):
super().__init__()
# 多头自注意力层
self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
# Encoder-Decoder层
self.multihead_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
# Implementation of Feedforward model
# FFN
self.linear1 = nn.Linear(d_model, dim_feedforward)
self.dropout = nn.Dropout(dropout)
self.linear2 = nn.Linear(dim_feedforward, d_model)
# 分别用于以上3层
self.norm1 = nn.LayerNorm(d_model)
self.norm2 = nn.LayerNorm(d_model)
self.norm3 = nn.LayerNorm(d_model)
self.dropout1 = nn.Dropout(dropout)
self.dropout2 = nn.Dropout(dropout)
self.dropout3 = nn.Dropout(dropout)
self.activation = _get_activation_fn(activation)
self.normalize_before = normalize_before
def with_pos_embed(self, tensor, pos: Optional[Tensor]):
return tensor if pos is None else tensor + pos
def forward_post(self, tgt, memory,
tgt_mask: Optional[Tensor] = None,
memory_mask: Optional[Tensor] = None,
tgt_key_padding_mask: Optional[Tensor] = None,
memory_key_padding_mask: Optional[Tensor] = None,
pos: Optional[Tensor] = None,
query_pos: Optional[Tensor] = None):
# 位置嵌入
q = k = self.with_pos_embed(tgt, query_pos)
# 多头自注意力层,输入与encoder无关
tgt2 = self.self_attn(q, k, value=tgt, attn_mask=tgt_mask,
key_padding_mask=tgt_key_padding_mask)[0]
tgt = tgt + self.dropout1(tgt2)
tgt = self.norm1(tgt)
# Encoder-Decoder层,key和value来自encoder,query来自上一层
tgt2 = self.multihead_attn(query=self.with_pos_embed(tgt, query_pos),
key=self.with_pos_embed(memory, pos),
value=memory, attn_mask=memory_mask,
key_padding_mask=memory_key_padding_mask)[0]
tgt = tgt + self.dropout2(tgt2)
tgt = self.norm2(tgt)
# FFN
tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
tgt = tgt + self.dropout3(tgt2)
tgt = self.norm3(tgt)
return tgt
def forward_pre(self, tgt, memory,
tgt_mask: Optional[Tensor] = None,
memory_mask: Optional[Tensor] = None,
tgt_key_padding_mask: Optional[Tensor] = None,
memory_key_padding_mask: Optional[Tensor] = None,
pos: Optional[Tensor] = None,
query_pos: Optional[Tensor] = None):
# 位置嵌入
tgt2 = self.norm1(tgt)
q = k = self.with_pos_embed(tgt2, query_pos)
# 多头自注意力层,输入与encoder无关
tgt2 = self.self_attn(q, k, value=tgt2, attn_mask=tgt_mask,
key_padding_mask=tgt_key_padding_mask)[0]
tgt = tgt + self.dropout1(tgt2)
# Encoder-Decoder层,key和value来自encoder,query来自上一层
tgt2 = self.norm2(tgt)
tgt2 = self.multihead_attn(query=self.with_pos_embed(tgt2, query_pos),
key=self.with_pos_embed(memory, pos),
value=memory, attn_mask=memory_mask,
key_padding_mask=memory_key_padding_mask)[0]
tgt = tgt + self.dropout2(tgt2)
# FFN
tgt2 = self.norm3(tgt)
tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt2))))
tgt = tgt + self.dropout3(tgt2)
return tgt
def forward(self, tgt, memory,
tgt_mask: Optional[Tensor] = None,
memory_mask: Optional[Tensor] = None,
tgt_key_padding_mask: Optional[Tensor] = None,
memory_key_padding_mask: Optional[Tensor] = None,
pos: Optional[Tensor] = None,
query_pos: Optional[Tensor] = None):
if self.normalize_before:
return self.forward_pre(tgt, memory, tgt_mask, memory_mask,
tgt_key_padding_mask, memory_key_padding_mask, pos, query_pos)
return self.forward_post(tgt, memory, tgt_mask, memory_mask,
tgt_key_padding_mask, memory_key_padding_mask, pos, query_pos)
def _get_clones(module, N):
return nn.ModuleList([copy.deepcopy(module) for i in range(N)])
def build_transformer(args):
return Transformer(
d_model=args.hidden_dim,
dropout=args.dropout,
nhead=args.nheads,
dim_feedforward=args.dim_feedforward,
num_encoder_layers=args.enc_layers,
num_decoder_layers=args.dec_layers,
normalize_before=args.pre_norm,
return_intermediate_dec=True,
)
def _get_activation_fn(activation):
"""Return an activation function given a string"""
if activation == "relu":
return F.relu
if activation == "gelu":
return F.gelu
if activation == "glu":
return F.glu
raise RuntimeError(F"activation should be relu/gelu, not {activation}.")
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
"""
Modules to compute the matching cost and solve the corresponding LSAP.
将预测结果和GT进行匹配,匹配采用匈牙利算法,参见https://zhuanlan.zhihu.com/p/96229700
"""
import torch
from scipy.optimize import linear_sum_assignment
from torch import nn
from util.box_ops import box_cxcywh_to_xyxy, generalized_box_iou
class HungarianMatcher(nn.Module):
"""This class computes an assignment between the targets and the predictions of the network
For efficiency reasons, the targets don't include the no_object. Because of this, in general,
there are more predictions than targets. In this case, we do a 1-to-1 matching of the best predictions,
while the others are un-matched (and thus treated as non-objects).
"""
def __init__(self, cost_class: float = 1, cost_bbox: float = 1, cost_giou: float = 1):
"""Creates the matcher
Params:
cost_class: This is the relative weight of the classification error in the matching cost
cost_bbox: This is the relative weight of the L1 error of the bounding box coordinates in the matching cost
cost_giou: This is the relative weight of the giou loss of the bounding box in the matching cost
"""
super().__init__()
self.cost_class = cost_class # 分类损失的权重
self.cost_bbox = cost_bbox # bbox损失的权重
self.cost_giou = cost_giou
assert cost_class != 0 or cost_bbox != 0 or cost_giou != 0, "all costs cant be 0"
@torch.no_grad()
def forward(self, outputs, targets):
""" Performs the matching
Params:
outputs: This is a dict that contains at least these entries:
"pred_logits": Tensor of dim [batch_size, num_queries, num_classes] with the classification logits 预测的分类
"pred_boxes": Tensor of dim [batch_size, num_queries, 4] with the predicted box coordinates 预测的bbox的x,y,h,w
targets: GT
This is a list of targets (len(targets) = batch_size), where each target is a dict containing:
"labels": Tensor of dim [num_target_boxes] (where num_target_boxes is the number of ground-truth
objects in the target) containing the class labels
"boxes": Tensor of dim [num_target_boxes, 4] containing the target box coordinates
Returns:
A list of size batch_size, containing tuples of (index_i, index_j) where:
- index_i is the indices of the selected predictions (in order)
- index_j is the indices of the corresponding selected targets (in order)
For each batch element, it holds:
len(index_i) = len(index_j) = min(num_queries, num_target_boxes)
"""
bs, num_queries = outputs["pred_logits"].shape[:2]
# We flatten to compute the cost matrices in a batch
out_prob = outputs["pred_logits"].flatten(0, 1).softmax(-1) # [batch_size * num_queries, num_classes]
out_bbox = outputs["pred_boxes"].flatten(0, 1) # [batch_size * num_queries, 4]
# Also concat the target labels and boxes
tgt_ids = torch.cat([v["labels"] for v in targets])
tgt_bbox = torch.cat([v["boxes"] for v in targets])
# Compute the classification cost. Contrary to the loss, we don't use the NLL,
# but approximate it in 1 - proba[target class].
# The 1 is a constant that doesn't change the matching, it can be ommitted.
cost_class = -out_prob[:, tgt_ids]
# Compute the L1 cost between boxes
cost_bbox = torch.cdist(out_bbox, tgt_bbox, p=1)
# Compute the giou cost betwen boxes
cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
# Final cost matrix
C = self.cost_bbox * cost_bbox + self.cost_class * cost_class + self.cost_giou * cost_giou
C = C.view(bs, num_queries, -1).cpu()
sizes = [len(v["boxes"]) for v in targets]
indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))] # linear_sum_assignment匹配方法
return [(torch.as_tensor(i, dtype=torch.int64), torch.as_tensor(j, dtype=torch.int64)) for i, j in indices]
def build_matcher(args):
return HungarianMatcher(cost_class=args.set_cost_class, cost_bbox=args.set_cost_bbox, cost_giou=args.set_cost_giou)
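A toy illustration of the matcher follows (a sketch; it assumes this module's HungarianMatcher and the util.box_ops helpers it imports are available; all tensors are random, and the cost weights simply mirror typical settings):
import torch

matcher = HungarianMatcher(cost_class=1, cost_bbox=5, cost_giou=2)
outputs = {
    "pred_logits": torch.randn(2, 100, 92),   # (batch, num_queries, num_classes + 1)
    "pred_boxes": torch.rand(2, 100, 4),      # normalized (cx, cy, w, h)
}
targets = [
    {"labels": torch.tensor([1, 7]), "boxes": torch.rand(2, 4)},  # image 0: 2 GT boxes
    {"labels": torch.tensor([3]),    "boxes": torch.rand(1, 4)},  # image 1: 1 GT box
]
indices = matcher(outputs, targets)
# One (pred_idx, gt_idx) pair of index tensors per image;
# len(pred_idx) equals the number of GT boxes in that image.
print(indices)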
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
"""
DETR model and criterion classes.
"""
import torch
import torch.nn.functional as F
from torch import nn
from util import box_ops
from util.misc import (NestedTensor, nested_tensor_from_tensor_list,
accuracy, get_world_size, interpolate,
is_dist_avail_and_initialized)
from .backbone import build_backbone
from .matcher import build_matcher
from .segmentation import (DETRsegm, PostProcessPanoptic, PostProcessSegm,
dice_loss, sigmoid_focal_loss)
from .transformer import build_transformer
class DETR(nn.Module):
""" This is the DETR module that performs object detection """
def __init__(self, backbone, transformer, num_classes, num_queries, aux_loss=False):
""" Initializes the model.
Parameters:
backbone: torch module of the backbone to be used. See backbone.py
transformer: torch module of the transformer architecture. See transformer.py
num_classes: number of object classes (not counting the extra background / no-object class)
num_queries: number of object queries, ie detection slot. This is the maximal number of objects
DETR can detect in a single image. For COCO, we recommend 100 queries.
aux_loss: True if auxiliary decoding losses (a loss at each decoder layer) are to be used.
"""
super().__init__()
self.num_queries = num_queries
self.transformer = transformer
hidden_dim = transformer.d_model
self.class_embed = nn.Linear(hidden_dim, num_classes + 1) # 生成分类的预测结果,1是背景
# 生成回归的预测结果。MLP多层感知机由多层nn.Linear()组成,中间层维度映射到hidden_dim,第3层映射到4,一共3层
self.bbox_embed = MLP(hidden_dim, hidden_dim, 4, 3)
self.query_embed = nn.Embedding(num_queries, hidden_dim) # Transformer中初始化query并对其编码生成嵌入
self.input_proj = nn.Conv2d(backbone.num_channels, hidden_dim, kernel_size=1) # CNN提取的特征维度映射到Transformer隐藏层的维度,转化为序列
self.backbone = backbone
self.aux_loss = aux_loss
def forward(self, samples: NestedTensor):
""" The forward expects a NestedTensor, which consists of:将图像张量和对应的mask封装到一起
- samples.tensor: batched images, of shape [batch_size x 3 x H x W]
- samples.mask: a binary mask of shape [batch_size x H x W], containing 1 on padded pixels
每个batch大小填充到一样大,填充的位置mask为1,后面不算这些位置的自注意力
It returns a dict with the following elements:
- "pred_logits": the classification logits (including no-object) for all queries.分类结果
Shape= [batch_size x num_queries x (num_classes + 1)]
- "pred_boxes": The normalized boxes coordinates for all queries, represented as
(center_x, center_y, height, width). These values are normalized in [0, 1],
relative to the size of each individual image (disregarding possible padding).
See PostProcess for information on how to retrieve the unnormalized bounding box.
- "aux_outputs": Optional, only returned when auxilary losses are activated. It is a list of
dictionnaries containing the two above keys for each decoder layer.
"""
# 将样本转为NestedTensor。isinstance() 函数来判断一个对象是否是一个已知的类型
if isinstance(samples, (list, torch.Tensor)):
samples = nested_tensor_from_tensor_list(samples)
# run the CNN backbone to extract features; pos is the positional encoding
features, pos = self.backbone(samples)
# take the last-layer feature map and its mask
src, mask = features[-1].decompose()
assert mask is not None
# the transformer returns a tuple (decoder output, encoder output); take the first element, the decoder output
hs = self.transformer(self.input_proj(src), mask, self.query_embed.weight, pos[-1])[0]
outputs_class = self.class_embed(hs) # classification
outputs_coord = self.bbox_embed(hs).sigmoid() # box regression
# hs contains the output of every decoder layer; index -1 is the last layer
out = {
'pred_logits': outputs_class[-1], 'pred_boxes': outputs_coord[-1]}
if self.aux_loss:
out['aux_outputs'] = self._set_aux_loss(outputs_class, outputs_coord)
return out
@torch.jit.unused
def _set_aux_loss(self, outputs_class, outputs_coord):
# this is a workaround to make torchscript happy, as torchscript
# doesn't support dictionary with non-homogeneous values, such
# as a dict having both a Tensor and a list.
return [{
'pred_logits': a, 'pred_boxes': b}
for a, b in zip(outputs_class[:-1], outputs_coord[:-1])]
class SetCriterion(nn.Module):
""" This class computes the loss for DETR.计算损失
The process happens in two steps:
1) we compute hungarian assignment between ground truth boxes and the outputs of the model匈牙利匹配
2) we supervise each pair of matched ground-truth / prediction (supervise class and box)
"""
def __init__(self, num_classes, matcher, weight_dict, eos_coef, losses):
""" Create the criterion.
Parameters:
num_classes: number of object categories, omitting the special no-object category
matcher: module able to compute a matching between targets and proposals
weight_dict: dict containing as key the names of the losses and as values their relative weight
eos_coef: relative classification weight applied to the no-object category
losses: list of all the losses to be applied. See get_loss for list of available losses.
"""
super().__init__()
self.num_classes = num_classes
self.matcher = matcher
self.weight_dict = weight_dict # 各种损失的权重
self.eos_coef = eos_coef # 针对背景分类的损失权重
self.losses = losses
empty_weight = torch.ones(self.num_classes + 1)
empty_weight[-1] = self.eos_coef
self.register_buffer('empty_weight', empty_weight) # registered as a buffer: saved in the state_dict but never updated by gradients
def loss_labels(self, outputs, targets, indices, num_boxes, log=True):
"""Classification loss (NLL)
targets dicts must contain the key "labels" containing a tensor of dim [nb_target_boxes]
"""
assert 'pred_logits' in outputs
# (b,num_queries = 100,num_classes+1)
src_logits = outputs['pred_logits']
# indices holds the matched prediction (query) indices and the matched GT indices
# _get_src_permutation_idx returns a tuple of the batch index (which image in the batch) and the query index (which query in that image) of every matched prediction
idx = self._get_src_permutation_idx(indices)
# labels of all matched GT objects
target_classes_o = torch.cat([t["labels"][J] for t, (_, J) in zip(targets, indices)])
# initialize every slot to the background class
target_classes = torch.full(src_logits.shape[:2], self.num_classes,
dtype=torch.int64, device=src_logits.device)
# overwrite the matched prediction slots with the matched GT labels
target_classes[idx] = target_classes_o
loss_ce = F.cross_entropy(src_logits.transpose(1, 2), target_classes, self.empty_weight)
losses = {
'loss_ce': loss_ce}
if log:
# TODO this should probably be a separate loss, not hacked in this one here
losses['class_error'] = 100 - accuracy(src_logits[idx], target_classes_o)[0]
return losses
@torch.no_grad()
def loss_cardinality(self, outputs, targets, indices, num_boxes):
""" Compute the cardinality error, ie the absolute error in the number of predicted non-empty boxes
This is not really a loss, it is intended for logging purposes only. It doesn't propagate gradients
"""
pred_logits = outputs['pred_logits']
device = pred_logits.device
tgt_lengths = torch.as_tensor([len(v["labels"]) for v in targets], device=device)
# Count the number of predictions that are NOT "no-object" (which is the last class)
card_pred = (pred_logits.argmax(-1) != pred_logits.shape[-1] - 1).sum(1)
card_err = F.l1_loss(card_pred.float(), tgt_lengths.float())
losses = {
'cardinality_error': card_err}
return losses
def loss_boxes(self, outputs, targets, indices, num_boxes):
"""Compute the losses related to the bounding boxes, the L1 regression loss and the GIoU loss
targets dicts must contain the key "boxes" containing a tensor of dim [nb_target_boxes, 4]
The target boxes are expected in format (center_x, center_y, w, h), normalized by the image size.
回归loss
"""
assert 'pred_boxes' in outputs
idx = self._get_src_permutation_idx(indices)
src_boxes = outputs['pred_boxes'][idx]
target_boxes = torch.cat([t['boxes'][i] for t, (_, i) in zip(targets, indices)], dim=0)
loss_bbox = F.l1_loss(src_boxes, target_boxes, reduction='none')
losses = {
}
losses['loss_bbox'] = loss_bbox.sum() / num_boxes
loss_giou = 1 - torch.diag(box_ops.generalized_box_iou(
box_ops.box_cxcywh_to_xyxy(src_boxes),
box_ops.box_cxcywh_to_xyxy(target_boxes)))
losses['loss_giou'] = loss_giou.sum() / num_boxes
return losses
def loss_masks(self, outputs, targets, indices, num_boxes):
"""Compute the losses related to the masks: the focal loss and the dice loss.
targets dicts must contain the key "masks" containing a tensor of dim [nb_target_boxes, h, w]
"""
assert "pred_masks" in outputs
src_idx = self._get_src_permutation_idx(indices)
tgt_idx = self._get_tgt_permutation_idx(indices)
src_masks = outputs["pred_masks"]
src_masks = src_masks[src_idx]
masks = [t["masks"] for t in targets]
# TODO use valid to mask invalid areas due to padding in loss
target_masks, valid = nested_tensor_from_tensor_list(masks).decompose()
target_masks = target_masks.to(src_masks)
target_masks = target_masks[tgt_idx]
# upsample predictions to the target size
src_masks = interpolate(src_masks[:, None], size=target_masks.shape[-2:],
mode="bilinear", align_corners=False)
src_masks = src_masks[:, 0].flatten(1)
target_masks = target_masks.flatten(1)
target_masks = target_masks.view(src_masks.shape)
losses = {
"loss_mask": sigmoid_focal_loss(src_masks, target_masks, num_boxes),
"loss_dice": dice_loss(src_masks, target_masks, num_boxes),
}
return losses
def _get_src_permutation_idx(self, indices):
# permute predictions following indices
batch_idx = torch.cat([torch.full_like(src, i) for i, (src, _) in enumerate(indices)])
src_idx = torch.cat([src for (src, _) in indices])
return batch_idx, src_idx
def _get_tgt_permutation_idx(self, indices):
# permute targets following indices
batch_idx = torch.cat([torch.full_like(tgt, i) for i, (_, tgt) in enumerate(indices)])
tgt_idx = torch.cat([tgt for (_, tgt) in indices])
return batch_idx, tgt_idx
def get_loss(self, loss, outputs, targets, indices, num_boxes, **kwargs):
loss_map = {
'labels': self.loss_labels,
'cardinality': self.loss_cardinality,
'boxes': self.loss_boxes,
'masks': self.loss_masks
}
assert loss in loss_map, f'do you really want to compute {loss} loss?'
return loss_map[loss](outputs, targets, indices, num_boxes, **kwargs)
def forward(self, outputs, targets):
""" This performs the loss computation.
Parameters:
outputs: dict of tensors, see the output specification of the model for the format
targets: list of dicts, such that len(targets) == batch_size.
The expected keys in each dict depends on the losses applied, see each loss' doc
"""
# outputs is the DETR model output, a dict of the form:
# {'pred_logits': (b, num_queries=100, num_classes),
#  'pred_boxes': (b, num_queries=100, 4),
#  'aux_outputs': [...]}
outputs_without_aux = {
k: v for k, v in outputs.items() if k != 'aux_outputs'} # drop the intermediate-layer outputs, keeping only the last layer's predictions
# Retrieve the matching between the outputs of the last layer and the targets
# match the model's predictions against the GT
indices = self.matcher(outputs_without_aux, targets)
# a list of tuples, one per batch element; each tuple (index_i, index_j) gives the prediction indices and the corresponding GT indices
# Compute the average number of target boxes accross all nodes, for normalization purposes
num_boxes = sum(len(t["labels"]) for t in targets)
num_boxes = torch.as_tensor([num_boxes], dtype=torch.float, device=next(iter(outputs.values())).device)
if is_dist_avail_and_initialized():
torch.distributed.all_reduce(num_boxes)
num_boxes = torch.clamp(num_boxes / get_world_size(), min=1).item()
# Compute all the requested losses
losses = {
}
for loss in self.losses:
losses.update(self.get_loss(loss, outputs, targets, indices, num_boxes))
# In case of auxiliary losses, we repeat this process with the output of each intermediate layer.
if 'aux_outputs' in outputs:
for i, aux_outputs in enumerate(outputs['aux_outputs']):
indices = self.matcher(aux_outputs, targets)
for loss in self.losses:
if loss == 'masks':
# Intermediate masks losses are too costly to compute, we ignore them.
continue
kwargs = {
}
if loss == 'labels':
# Logging is enabled only for the last layer
kwargs = {
'log': False}
l_dict = self.get_loss(loss, aux_outputs, targets, indices, num_boxes, **kwargs)
l_dict = {
k + f'_{i}': v for k, v in l_dict.items()}
losses.update(l_dict)
return losses
class PostProcess(nn.Module):
""" This module converts the model's output into the format expected by the coco api"""
# 此模块将模型的输出转换为coco api期望的格式
@torch.no_grad()
def forward(self, outputs, target_sizes):
""" Perform the computation
Parameters:
outputs: raw outputs of the model
target_sizes: tensor of dimension [batch_size x 2] containing the size of each images of the batch
For evaluation, this must be the original image size (before any data augmentation)
For visualization, this should be the image size after data augment, but before padding
"""
out_logits, out_bbox = outputs['pred_logits'], outputs['pred_boxes']
assert len(out_logits) == len(target_sizes)
assert target_sizes.shape[1] == 2
# (b,num_queries = 100,num_classes +1)
prob = F.softmax(out_logits, -1)
# (b, num_queries = 100),(b, num_queries = 100) # coco api 测评不包括背景类所以prob[..., :-1]是到-1
scores, labels = prob[..., :-1].max(-1)
# convert to [x0, y0, x1, y1] format
boxes = box_ops.box_cxcywh_to_xyxy(out_bbox)
# and from relative [0, 1] to absolute [0, height] coordinates
img_h, img_w = target_sizes.unbind(1)
scale_fct = torch.stack([img_w, img_h, img_w, img_h], dim=1)
# (b,num_queries = 100,4)*(b,1,4)
boxes = boxes * scale_fct[:, None, :]
results = [{
'scores': s, 'labels': l, 'boxes': b} for s, l, b in zip(scores, labels, boxes)]
return results
class MLP(nn.Module):
""" Very simple multi-layer perceptron (also called FFN)"""
def __init__(self, input_dim, hidden_dim, output_dim, num_layers):
super().__init__()
self.num_layers = num_layers
h = [hidden_dim] * (num_layers - 1)
self.layers = nn.ModuleList(nn.Linear(n, k) for n, k in zip([input_dim] + h, h + [output_dim]))
def forward(self, x):
for i, layer in enumerate(self.layers):
# 最后一层不使用relu
x = F.relu(layer(x)) if i < self.num_layers - 1 else layer(x)
return x
def build(args):
# the `num_classes` naming here is somewhat misleading.
# it indeed corresponds to `max_obj_id + 1`, where max_obj_id
# is the maximum id for a class in your dataset. For example,
# COCO has a max_obj_id of 90, so we pass `num_classes` to be 91.
# As another example, for a dataset that has a single class with id 1,
# you should pass `num_classes` to be 2 (max_obj_id + 1).
# For more details on this, check the following discussion
# https://github.com/facebookresearch/detr/issues/108#issuecomment-650269223
num_classes = 20 if args.dataset_file != 'coco' else 91
if args.dataset_file == "coco_panoptic":
# for panoptic全景, we just add a num_classes that is large enough to hold
# max_obj_id + 1, but the exact value doesn't really matter
num_classes = 250
device = torch.device(args.device)
backbone = build_backbone(args)
transformer = build_transformer(args)
model = DETR(
backbone,
transformer,
num_classes=num_classes,
num_queries=args.num_queries,
aux_loss=args.aux_loss,
)
if args.masks: # 设置了mask,代表分割任务
model = DETRsegm(model, freeze_detr=(args.frozen_weights is not None))
matcher = build_matcher(args)
weight_dict = {
'loss_ce': 1, 'loss_bbox': args.bbox_loss_coef}
weight_dict['loss_giou'] = args.giou_loss_coef
if args.masks:
weight_dict["loss_mask"] = args.mask_loss_coef
weight_dict["loss_dice"] = args.dice_loss_coef
# TODO this is a hack
if args.aux_loss:
aux_weight_dict = {
}
for i in range(args.dec_layers - 1):
# 为中间层的loss也加上权重
aux_weight_dict.update({
k + f'_{i}': v for k, v in weight_dict.items()})
weight_dict.update(aux_weight_dict)
losses = ['labels', 'boxes', 'cardinality']
if args.masks:
losses += ["masks"]
criterion = SetCriterion(num_classes, matcher=matcher, weight_dict=weight_dict,
eos_coef=args.eos_coef, losses=losses)
criterion.to(device)
postprocessors = {
'bbox': PostProcess()}
if args.masks:
postprocessors['segm'] = PostProcessSegm()
if args.dataset_file == "coco_panoptic":
is_thing_map = {
i: i <= 90 for i in range(201)}
postprocessors["panoptic"] = PostProcessPanoptic(is_thing_map, threshold=0.85)
return model, criterion, postprocessors
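At training time, the criterion's output dict is reduced to a single scalar using weight_dict. A sketch of that step (mirroring what the repository's training loop does; the function name and arguments here are illustrative):
def compute_total_loss(model, criterion, samples, targets):
    """Sketch of the training-step loss reduction, following the modules built above."""
    outputs = model(samples)
    loss_dict = criterion(outputs, targets)
    weight_dict = criterion.weight_dict
    # Weighted sum over every loss term that has a weight (this includes the per-decoder-layer aux copies).
    return sum(loss_dict[k] * weight_dict[k] for k in loss_dict if k in weight_dict)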
After preprocessing, different batches may have different sizes, but all images within one batch must share the same size. When the images in a batch differ in size, they are made uniform by zero-padding each image up to the largest size in the batch; every image gets a mask matrix in which the padded positions are set to 1 (True).
class NestedTensor(object):
def __init__(self, tensors, mask: Optional[Tensor]):
self.tensors = tensors
self.mask = mask
def to(self, device):
# type: (Device) -> NestedTensor # noqa
cast_tensor = self.tensors.to(device)
mask = self.mask
if mask is not None:
assert mask is not None
cast_mask = mask.to(device)
else:
cast_mask = None
return NestedTensor(cast_tensor, cast_mask)
def decompose(self):
return self.tensors, self.mask
def __repr__(self):
return str(self.tensors)
def nested_tensor_from_tensor_list(tensor_list: List[Tensor]):
# TODO make this more general
if tensor_list[0].ndim == 3:
if torchvision._is_tracing():
# nested_tensor_from_tensor_list() does not export well to ONNX
# call _onnx_nested_tensor_from_tensor_list() instead
return _onnx_nested_tensor_from_tensor_list(tensor_list)
# TODO make it support different-sized images
max_size = _max_by_axis([list(img.shape) for img in tensor_list])
# min_size = tuple(min(s) for s in zip(*[img.shape for img in tensor_list]))
batch_shape = [len(tensor_list)] + max_size
b, c, h, w = batch_shape
dtype = tensor_list[0].dtype
device = tensor_list[0].device
tensor = torch.zeros(batch_shape, dtype=dtype, device=device)
mask = torch.ones((b, h, w), dtype=torch.bool, device=device)
for img, pad_img, m in zip(tensor_list, tensor, mask):
pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
m[: img.shape[1], :img.shape[2]] = False
else:
raise ValueError('not supported')
return NestedTensor(tensor, mask)
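A small usage illustration (a sketch; it assumes the two definitions above and the helpers they call, such as _max_by_axis, are importable): two images of different sizes are padded to a common shape, and the mask marks the padded pixels.
import torch

imgs = [torch.randn(3, 200, 300), torch.randn(3, 180, 320)]
batch = nested_tensor_from_tensor_list(imgs)
print(batch.tensors.shape)  # torch.Size([2, 3, 200, 320]) -- max H and max W in the batch
print(batch.mask.shape)     # torch.Size([2, 200, 320]); True where a pixel is padding
print(batch.mask[0, :, 300:].all().item())  # True: columns 300..319 of image 0 are padding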
(The remaining utility code is omitted.)