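If you have any questions, feel free to leave a comment below.
All code in this article was run in PyCharm.
The source code accompanying this article has been uploaded.
Click here to download the source code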
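DETR Algorithm Walkthrough
DETR Source Code Walkthrough 1 (Project Configuration / CocoDetection class)
DETR Source Code Walkthrough 2 (ConvertCocoPolysToMask class)
DETR Source Code Walkthrough 3 (DETR class)
DETR Source Code Walkthrough 4 (Joiner class / PositionEmbeddingSine class / positional encoding / backbone)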
Location: models/detr.py, DETR class
class DETR(nn.Module):
    def __init__(self, backbone, transformer, num_classes, num_queries, aux_loss=False):
        super().__init__()
        self.num_queries = num_queries
        self.transformer = transformer
        hidden_dim = transformer.d_model
        # Classification head: num_classes plus one extra "no object" class
        self.class_embed = nn.Linear(hidden_dim, num_classes + 1)
        # Box head: 3-layer MLP that outputs 4 box coordinates
        self.bbox_embed = MLP(hidden_dim, hidden_dim, 4, 3)
        # Learned object queries, one embedding per query
        self.query_embed = nn.Embedding(num_queries, hidden_dim)
        # 1x1 convolution that projects backbone channels down to hidden_dim
        self.input_proj = nn.Conv2d(backbone.num_channels, hidden_dim, kernel_size=1)
        self.backbone = backbone
        self.aux_loss = aux_loss
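backbone: the CNN backbone network, used for feature extraction
transformer: the Transformer model, which processes the sequence built from the feature map
num_classes: the number of object classes
num_queries: the number of query vectors created up front for the decoder; num_queries=100 here
aux_loss: a boolean indicating whether auxiliary losses are used to help training
Several custom functions and classes are involved here:
nested_tensor_from_tensor_list function: pads image Tensors of different sizes to a common size and packs them, together with a padding mask, into one NestedTensor
MLP class: regresses the four bounding-box coordinates
Transformer class: builds the Transformer architecture
backbone: the CNN that extracts image features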
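The padding idea behind nested_tensor_from_tensor_list can be illustrated without PyTorch. The following is only a minimal sketch under stated assumptions: `pad_shapes_and_masks` is a hypothetical helper name, it works on (C, H, W) shape tuples rather than real tensors, and it assumes each image is placed in the top-left corner of the padded canvas with True marking padded pixels.

```python
from typing import List, Tuple


def pad_shapes_and_masks(shapes: List[Tuple[int, int, int]]):
    """Hypothetical helper: given (C, H, W) image shapes, return the padded
    batch shape and one boolean mask per image (True = padding)."""
    # The batch canvas is as large as the largest image along each axis.
    max_c = max(s[0] for s in shapes)
    max_h = max(s[1] for s in shapes)
    max_w = max(s[2] for s in shapes)
    batch_shape = (len(shapes), max_c, max_h, max_w)

    masks = []
    for _, h, w in shapes:
        # A pixel is real only if it lies inside the original image,
        # which is assumed to sit in the top-left corner of the canvas.
        mask = [[not (y < h and x < w) for x in range(max_w)]
                for y in range(max_h)]
        masks.append(mask)
    return batch_shape, masks


# Two tiny "images" of different sizes: 2x3 and 3x2 pixels, 3 channels each.
batch_shape, masks = pad_shapes_and_masks([(3, 2, 3), (3, 3, 2)])
print(batch_shape)
for m in masks:
    print(m)
```

The mask is what lets the Transformer ignore padded positions, which is why forward asserts that it is not None.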
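    def forward(self, samples: NestedTensor):
        # Accept a plain list of images or a batched Tensor and wrap it as a NestedTensor
        if isinstance(samples, (list, torch.Tensor)):
            samples = nested_tensor_from_tensor_list(samples)
        # The backbone returns feature maps and their positional encodings
        features, pos = self.backbone(samples)
        # Use the last feature map; mask marks the padded pixels
        src, mask = features[-1].decompose()
        assert mask is not None
        # [0] keeps the decoder output, stacked over the decoder layers
        hs = self.transformer(self.input_proj(src), mask, self.query_embed.weight, pos[-1])[0]
        outputs_class = self.class_embed(hs)
        outputs_coord = self.bbox_embed(hs).sigmoid()  # squashed into [0, 1]
        # The final prediction comes from the last decoder layer
        out = {'pred_logits': outputs_class[-1], 'pred_boxes': outputs_coord[-1]}
        if self.aux_loss:
            out['aux_outputs'] = self._set_aux_loss(outputs_class, outputs_coord)
        return out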
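The tensor shapes passing through forward can be traced with plain arithmetic. This is a rough sketch, not the model itself: `detr_forward_shapes` is a hypothetical helper, and every default except num_queries=100 is an assumption that the article does not state (a ResNet-50-style backbone with 2048 output channels and stride 32, hidden_dim=256, 6 decoder layers, 91 classes).

```python
import math


def detr_forward_shapes(batch, height, width, backbone_channels=2048, stride=32,
                        hidden_dim=256, num_queries=100, num_classes=91,
                        num_decoder_layers=6):
    """Hypothetical helper: return the expected shape of each intermediate
    tensor in DETR.forward, given the (padded) input image size."""
    # Assumed: the backbone shrinks each spatial side by `stride`, rounding up.
    h, w = math.ceil(height / stride), math.ceil(width / stride)
    shapes = {
        "src": (batch, backbone_channels, h, w),
        "mask": (batch, h, w),
        "input_proj(src)": (batch, hidden_dim, h, w),
        # One slice per decoder layer, assuming intermediate outputs are returned.
        "hs": (num_decoder_layers, batch, num_queries, hidden_dim),
        "outputs_class": (num_decoder_layers, batch, num_queries, num_classes + 1),
        "outputs_coord": (num_decoder_layers, batch, num_queries, 4),
    }
    # Indexing with [-1] drops the leading decoder-layer axis.
    shapes["pred_logits"] = shapes["outputs_class"][1:]
    shapes["pred_boxes"] = shapes["outputs_coord"][1:]
    return shapes


shapes = detr_forward_shapes(batch=2, height=800, width=1216)
for name, shape in shapes.items():
    print(f"{name:16s} {shape}")
```

Under these assumptions, each image ends up with num_queries predictions, each holding num_classes + 1 logits and 4 box values.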
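samples: checked for whether it is a list or a Tensor; if so, it is first converted to a NestedTensor
_set_aux_loss: packages the outputs of the intermediate decoder layers so that auxiliary losses can be computed on them
    @torch.jit.unused
    def _set_aux_loss(self, outputs_class, outputs_coord):
        # Pair class and box predictions of every decoder layer except the last
        return [{'pred_logits': a, 'pred_boxes': b}
                for a, b in zip(outputs_class[:-1], outputs_coord[:-1])]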
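@torch.jit.unused: a decorator telling the TorchScript compiler to ignore this method instead of compiling it (in a scripted model, calling the method raises an exception). A comment in the DETR source describes it as a workaround, because TorchScript does not support dictionaries with non-homogeneous values, such as a dict holding both a Tensor and a list.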
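To see what _set_aux_loss returns, the slicing and pairing can be reproduced with plain Python lists. This is only a sketch: the string labels stand in for per-layer tensors, `set_aux_outputs` is a hypothetical free-function copy of the method, and 6 decoder layers is an assumption.

```python
def set_aux_outputs(outputs_class, outputs_coord):
    """Hypothetical stand-alone copy of the pairing logic in _set_aux_loss."""
    # [:-1] skips the last decoder layer, which is already the main output.
    return [{'pred_logits': a, 'pred_boxes': b}
            for a, b in zip(outputs_class[:-1], outputs_coord[:-1])]


num_layers = 6  # assumed number of decoder layers
# Placeholder labels instead of real tensors, indexed by decoder layer.
outputs_class = [f"logits_layer{i}" for i in range(num_layers)]
outputs_coord = [f"boxes_layer{i}" for i in range(num_layers)]

out = {'pred_logits': outputs_class[-1], 'pred_boxes': outputs_coord[-1]}
out['aux_outputs'] = set_aux_outputs(outputs_class, outputs_coord)

print(out['pred_logits'], out['pred_boxes'])
for aux in out['aux_outputs']:
    print(aux)
```

The main prediction and each auxiliary entry share the same two keys, so the same loss routine can be applied to every decoder layer's output.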