[AAAI2022] Liu X, Xue N, Wu T. Learning auxiliary monocular contexts helps monocular 3D object detection[C]. In Proceedings of the AAAI Conference on Artificial Intelligence. 2022: 1810-1818.
Paper
Code
MonoCon (AAAI 2022): a monocular 3D object detection framework that uses auxiliary learning
This paper proposes MonoCon, a method that needs no extra input information; instead, it learns monocular contexts to assist the training process. The key idea is that an object's annotated 3D bounding box yields a rich set of well-posed projected 2D supervision signals (e.g., the projected corner keypoints and their offset vectors relative to the 2D bounding-box center), which should be exploited as auxiliary tasks during training. MonoCon consists of three components: a deep neural network (DNN) feature backbone, a set of regression-head branches that learn the essential parameters used for 3D bounding-box prediction, and a set of regression-head branches that learn the auxiliary contexts. After training, the auxiliary-context branches are discarded for better inference efficiency.
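The train-only auxiliary-head pattern described above can be sketched as follows. This is a hypothetical toy model, not MonoCon's actual network: the backbone is a single conv standing in for DLA34, and only a few representative heads are shown. The point is that auxiliary outputs are produced only in training mode, so they add supervision without any inference cost.

```python
import torch
import torch.nn as nn


class MonoConStyleDetector(nn.Module):
    """Toy sketch of the auxiliary-head pattern: primary 3D heads are kept
    at inference, auxiliary 2D-context heads only feed the training loss."""

    def __init__(self, feat_ch=64):
        super().__init__()
        self.backbone = nn.Conv2d(3, feat_ch, 3, padding=1)  # stand-in for DLA34
        # primary heads (kept at inference)
        self.depth_head = nn.Conv2d(feat_ch, 2, 1)  # depth + uncertainty
        self.dim_head = nn.Conv2d(feat_ch, 3, 1)    # 3D dimensions
        # auxiliary heads (training only, discarded afterwards)
        self.kpt_heatmap_head = nn.Conv2d(feat_ch, 9, 1)  # 9 projected keypoints
        self.wh_head = nn.Conv2d(feat_ch, 2, 1)           # 2D box size

    def forward(self, x):
        f = self.backbone(x)
        out = {'depth': self.depth_head(f), 'dim': self.dim_head(f)}
        if self.training:  # auxiliary outputs only exist during training
            out.update(kpt_heatmap=self.kpt_heatmap_head(f),
                       wh=self.wh_head(f))
        return out
```

In `eval()` mode the auxiliary branches are simply skipped, which is the "discard after training" behavior the paper describes (in the real code the branches can be dropped from the checkpoint entirely).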
Overall, MonoCon builds on MonoDLE, adding the projected-2D auxiliary learning module and applying AN (Attentive Normalization) to every head. Judging from the experimental results, these tricks work very well, as shown in the figure below.
```python
AttnBatchNorm2d(
  64, eps=0.001, momentum=0.03, affine=False, track_running_stats=True
  (attn_weights): AttnWeights(
    (attention): Sequential(
      (0): Conv2d(64, 10, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): HSigmoidv2()
    )
  )
)
```
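Reading the module printout above, AN appears to replace BatchNorm's single affine transform (`affine=False` on the main BN) with an attention-weighted mixture of K=10 learned affine (gamma, beta) pairs. Below is a minimal sketch assumed from that printout, not the actual implementation: the pooling placement is a guess, and `HSigmoidv2` is approximated with a plain sigmoid.

```python
import torch
import torch.nn as nn


class AttnBatchNorm2dSketch(nn.Module):
    """Hypothetical sketch of Attentive Normalization: BN without affine,
    followed by an instance-dependent mixture of K affine components."""

    def __init__(self, channels=64, k=10):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, affine=False)
        # K learned (gamma, beta) pairs to be mixed per instance
        self.gamma = nn.Parameter(torch.ones(k, channels))
        self.beta = nn.Parameter(torch.zeros(k, channels))
        self.attn = nn.Sequential(
            nn.Conv2d(channels, k, kernel_size=1, bias=False),
            nn.BatchNorm2d(k),
        )

    def forward(self, x):
        # per-instance mixture weights over the K components
        # (plain sigmoid used here in place of HSigmoidv2)
        w = torch.sigmoid(self.attn(x).mean(dim=(2, 3)))  # [B, K]
        gamma = w @ self.gamma                            # [B, C]
        beta = w @ self.beta                              # [B, C]
        return self.bn(x) * gamma[:, :, None, None] + beta[:, :, None, None]
```

The design makes the normalization's affine parameters a function of the input itself, which is what MonoCon adds on top of every head.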
Backbone: DLA34 (as in CenterNet)
Neck: DLAUp
Head: two parts (the essential 3D-box regression heads and the auxiliary-context heads)
```python
loss_center_heatmap=dict(type='CenterNetGaussianFocalLoss', loss_weight=1.0),
loss_wh=dict(type='L1Loss', loss_weight=0.1),
loss_offset=dict(type='L1Loss', loss_weight=1.0),
loss_center2kpt_offset=dict(type='L1Loss', loss_weight=1.0),
loss_kpt_heatmap=dict(type='CenterNetGaussianFocalLoss', loss_weight=1.0),
loss_kpt_heatmap_offset=dict(type='L1Loss', loss_weight=1.0),
loss_dim=dict(type='DimAwareL1Loss', loss_weight=1.0),
loss_depth=dict(type='LaplacianAleatoricUncertaintyLoss', loss_weight=1.0),
loss_alpha_cls=dict(
    type='CrossEntropyLoss',
    use_sigmoid=True,
    loss_weight=1.0),
loss_alpha_reg=dict(type='L1Loss', loss_weight=1.0),
```
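To illustrate how the `loss_weight` fields above enter the total loss: mmdet-style heads scale each raw loss term by its configured weight and sum the results. The numbers below are made up purely for the arithmetic.

```python
# Hypothetical illustration of weighted loss aggregation; only the 2D size
# loss (loss_wh) is down-weighted to 0.1, all other terms use 1.0.
weights = {'loss_wh': 0.1, 'loss_offset': 1.0, 'loss_depth': 1.0}
raw = {'loss_wh': 2.0, 'loss_offset': 0.5, 'loss_depth': 1.5}
total = sum(weights[name] * value for name, value in raw.items())
# 0.1*2.0 + 1.0*0.5 + 1.0*1.5 = 2.2
```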
mmdetection3d-0.14.0/mmdet3d/models/losses/uncertainty_loss.py:

```python
import torch
import torch.nn as nn


class LaplacianAleatoricUncertaintyLoss(nn.Module):
    """Laplacian aleatoric uncertainty loss.

    Args:
        loss_weight (float, optional): Weight of the loss. Defaults to 1.0.
    """

    def __init__(self, loss_weight=1.0):
        super(LaplacianAleatoricUncertaintyLoss, self).__init__()
        self.loss_weight = loss_weight

    def forward(self, input, target, log_variance):
        log_variance = log_variance.flatten()
        input = input.flatten()
        target = target.flatten()
        loss = 1.4142 * torch.exp(-log_variance) * torch.abs(input - target) \
            + log_variance
        return loss.mean() * self.loss_weight
```
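A quick numerical check of this loss, computed inline with made-up tensors: when `log_variance = 0` it reduces to sqrt(2) times the mean L1 error; predicting a larger variance down-weights the residual but is penalized by the `+ log_variance` term, which is what lets the network express depth uncertainty.

```python
import torch

# log_variance = 0, so the loss is just 1.4142 * mean |pred - target|
pred = torch.tensor([1.0, 2.0])
target = torch.tensor([1.5, 2.0])
log_var = torch.zeros(2)
loss = (1.4142 * torch.exp(-log_var) * torch.abs(pred - target) + log_var).mean()
# mean(1.4142 * [0.5, 0.0]) = 0.35355
```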
loss_weight: for simplicity, all loss terms use a weight of 1.0, except the 2D-size L1 loss, which uses 0.1.

Auxiliary learning learns four things:
This part is singled out to discuss the following two issues:
1. The quantization-residual formula given in the paper is inconsistent with the corresponding implementation in the source code.
2. When modeling the quantization error of the 9 projected keypoints $(x_k, y_k)$, MonoCon adopts a keypoint-agnostic approach, i.e., the modeling is shared across keypoints rather than done per keypoint.
```python
# kpt heatmap offset
# [8, 2, 96, 312] --> [8, 270, 2]
kpt_heatmap_offset_pred = transpose_and_gather_feat(kpt_heatmap_offset_pred, indices_kpt)
# [8, 270, 2] --> [8, 30, 18]
kpt_heatmap_offset_pred = kpt_heatmap_offset_pred.reshape(batch_size, self.max_objs, self.num_kpt * 2)
```
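`transpose_and_gather_feat` is the CenterNet-style helper that picks per-keypoint feature vectors at flattened spatial indices. A minimal re-implementation (hypothetical, but matching the shape comments above: `[B, C, H, W]` plus indices `[B, N]` gives `[B, N, C]`) could look like this:

```python
import torch


def transpose_and_gather_feat(feat, ind):
    """Gather the C-dim feature vector at each flattened spatial index.

    feat: [B, C, H, W], ind: [B, N] (indices into the H*W grid)
    returns: [B, N, C]
    """
    b, c, h, w = feat.shape
    feat = feat.permute(0, 2, 3, 1).reshape(b, h * w, c)  # [B, H*W, C]
    ind = ind.unsqueeze(2).expand(-1, -1, c)              # [B, N, C]
    return feat.gather(1, ind)


# tiny worked example: B=2, C=2, H=2, W=3
feat = torch.arange(24, dtype=torch.float32).reshape(2, 2, 2, 3)
ind = torch.tensor([[0, 5], [1, 4]])
out = transpose_and_gather_feat(feat, ind)
```

For batch 0, channel 0 holds values 0..5 and channel 1 holds 6..11 over the 2x3 grid, so index 5 gathers the pair (5, 11).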
Projected keypoints quantization residual (2×h×w)
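The role of this 2×h×w residual map can be shown with a small worked example (illustrative numbers only): a sub-pixel keypoint position is stored as its integer heatmap cell plus a fractional residual in [0, 1), and decoding adds the regressed residual back to recover the precision lost to downsampling.

```python
import torch

p = torch.tensor([12.7, 30.2])  # true sub-pixel keypoint position on the feature map
cell = p.floor()                # integer location found at the heatmap peak
residual = p - cell             # target for the quantization-residual (offset) head
decoded = cell + residual       # decoding restores the sub-pixel position
```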
Some environments leave the timm version unpinned. Pin it explicitly, otherwise pip will also upgrade torch to 1.10.0:

```shell
# pin the timm version, otherwise torch gets upgraded to 1.10.0
pip install timm==0.4.5
```
```
numba.cuda.cudadrv.error.NvvmError: Failed to compile
```

Note: to solve this issue, install numba==0.53.0 using conda, don't use pip:

```shell
conda install numba==0.53.0
```