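This article introduces two classic algorithms in monocular 3D object detection: SMOKE (2020) and MonoFlex (2021).
Before getting to the main part, it is worth getting to know the yacs library first: the parameter configuration files in the source code of both algorithms are built on yacs, and without it you will not be able to read them!
yacs is a library for defining and managing parameter configurations (for example, hyperparameters for training a model, or configurable model hyperparameters). It uses YAML files to configure parameters, and it grew out of the experiment configuration systems used in py-faster-rcnn and Detectron.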
pip install yacs
Create a defaults.py file and import the package: from yacs.config import CfgNode as CN
Use a CN() container to hold the parameters, then add the parameters you need:
from yacs.config import CfgNode as CN
__C = CN()
__C.name = 'test'
__C.model = CN() # 嵌套使用
__C.model.backbone = 'resnet'
__C.model.depth = 18
print(__C)
'''
model:
  backbone: resnet
  depth: 18
name: test
'''
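Use merge_from_file(): it overrides the default values with the values from the given YAML file. Only the keys that appear in the file change; all other keys keep their defaults.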
__C.merge_from_file("./test_config.yaml")
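The override rule can be made concrete with a stdlib-only sketch. It uses plain nested dicts in place of CfgNode, and merge_override is a hypothetical helper, not part of yacs. It mirrors the behaviour described above: keys present in the YAML file replace the defaults, all other keys keep their default values, and a key that does not exist in the defaults is rejected (yacs raises a KeyError in that case). Real yacs additionally checks that each new value has the same type as the default, which this sketch skips.

```python
import copy


def merge_override(defaults, overrides, prefix=""):
    """Return a copy of `defaults` with the values in `overrides` applied recursively.

    Hypothetical helper imitating the override rule of yacs' merge_from_file();
    it does not perform yacs' type checks.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        full_key = prefix + key
        if key not in merged:
            # yacs also refuses keys that were never declared in the defaults
            raise KeyError(f"Non-existent config key: {full_key}")
        if isinstance(merged[key], dict) and isinstance(value, dict):
            # descend into nested nodes such as `model`
            merged[key] = merge_override(merged[key], value, full_key + ".")
        else:
            merged[key] = value
    return merged


defaults = {"name": "test", "model": {"backbone": "resnet", "depth": 18}}
overrides = {"model": {"depth": 34}}  # e.g. what test_config.yaml might contain
cfg = merge_override(defaults, overrides)
print(cfg)  # {'name': 'test', 'model': {'backbone': 'resnet', 'depth': 34}}
```

Because the defaults are deep-copied, the same default config can be reused as the base for several experiments.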
defaults.py example (default parameters):
import os
from yacs.config import CfgNode as CN
# -----------------------------------------------------------------------------
# Config definition
# -----------------------------------------------------------------------------
_C = CN()
_C.MODEL = CN()
_C.MODEL.SMOKE_ON = True
_C.MODEL.DEVICE = "cuda"
_C.MODEL.WEIGHT = ""
# -----------------------------------------------------------------------------
# INPUT
# -----------------------------------------------------------------------------
_C.INPUT = CN()
# Size of the smallest side of the image during training
_C.INPUT.HEIGHT_TRAIN = 384
# Maximum size of the side of the image during training
_C.INPUT.WIDTH_TRAIN = 1280
# Size of the smallest side of the image during testing
_C.INPUT.HEIGHT_TEST = 384
# Maximum size of the side of the image during testing
_C.INPUT.WIDTH_TEST = 1280
# Values to be used for image normalization
_C.INPUT.PIXEL_MEAN = [0.485, 0.456, 0.406] # kitti
# Values to be used for image normalization
_C.INPUT.PIXEL_STD = [0.229, 0.224, 0.225] # kitti
# Convert image to BGR format
_C.INPUT.TO_BGR = True
# Flip probability
_C.INPUT.FLIP_PROB_TRAIN = 0.5
# Shift and scale probability
_C.INPUT.SHIFT_SCALE_PROB_TRAIN = 0.3
_C.INPUT.SHIFT_SCALE_TRAIN = (0.2, 0.4)
# -----------------------------------------------------------------------------
# Dataset
# -----------------------------------------------------------------------------
_C.DATASETS = CN()
# List of the dataset names for training, as present in paths_catalog.py
_C.DATASETS.TRAIN = ()
# List of the dataset names for testing, as present in paths_catalog.py
_C.DATASETS.TEST = ()
# train split for dataset
_C.DATASETS.TRAIN_SPLIT = ""
# test split for dataset
_C.DATASETS.TEST_SPLIT = ""
_C.DATASETS.DETECT_CLASSES = ("Car",)
_C.DATASETS.MAX_OBJECTS = 30
# -----------------------------------------------------------------------------
# DataLoader
# -----------------------------------------------------------------------------
_C.DATALOADER = CN()
# Number of data loading threads
_C.DATALOADER.NUM_WORKERS = 4
# If > 0, this enforces that each collated batch should have a size divisible
# by SIZE_DIVISIBILITY
_C.DATALOADER.SIZE_DIVISIBILITY = 0
# If True, each batch should contain only images for which the aspect ratio
# is compatible. This groups portrait images together, and landscape images
# are not batched with portrait images.
_C.DATALOADER.ASPECT_RATIO_GROUPING = False
# ---------------------------------------------------------------------------- #
# Backbone options
# ---------------------------------------------------------------------------- #
_C.MODEL.BACKBONE = CN()
# The backbone conv body to use
# The string must match a function that is imported in modeling.model_builder
_C.MODEL.BACKBONE.CONV_BODY = "DLA-34-DCN"
# Add StopGrad at a specified stage so the bottom layers are frozen
_C.MODEL.BACKBONE.FREEZE_CONV_BODY_AT = 0
# Normalization for backbone
_C.MODEL.BACKBONE.USE_NORMALIZATION = "GN"
_C.MODEL.BACKBONE.DOWN_RATIO = 4
_C.MODEL.BACKBONE.BACKBONE_OUT_CHANNELS = 64
# ---------------------------------------------------------------------------- #
# Group Norm options
# ---------------------------------------------------------------------------- #
_C.MODEL.GROUP_NORM = CN()
# Number of dimensions per group in GroupNorm (-1 if using NUM_GROUPS)
_C.MODEL.GROUP_NORM.DIM_PER_GP = -1
# Number of groups in GroupNorm (-1 if using DIM_PER_GP)
_C.MODEL.GROUP_NORM.NUM_GROUPS = 32
# GroupNorm's small constant in the denominator
_C.MODEL.GROUP_NORM.EPSILON = 1e-5
# ---------------------------------------------------------------------------- #
# Heatmap Head options
# ---------------------------------------------------------------------------- #
# --------------------------SMOKE Head--------------------------------
_C.MODEL.SMOKE_HEAD = CN()
_C.MODEL.SMOKE_HEAD.PREDICTOR = "SMOKEPredictor"
_C.MODEL.SMOKE_HEAD.LOSS_TYPE = ("FocalLoss", "DisL1")
_C.MODEL.SMOKE_HEAD.LOSS_ALPHA = 2
_C.MODEL.SMOKE_HEAD.LOSS_BETA = 4
# Channels for regression
_C.MODEL.SMOKE_HEAD.REGRESSION_HEADS = 8
# Specific channel for (depth_offset, keypoint_offset, dimension_offset, orientation)
_C.MODEL.SMOKE_HEAD.REGRESSION_CHANNEL = (1, 2, 3, 2)
_C.MODEL.SMOKE_HEAD.USE_NORMALIZATION = "GN"
_C.MODEL.SMOKE_HEAD.NUM_CHANNEL = 256
# Loss weight for hm and reg loss
_C.MODEL.SMOKE_HEAD.LOSS_WEIGHT = (1., 10.)
# Reference car size in (length, height, width)
# for (car, cyclist, pedestrian)
_C.MODEL.SMOKE_HEAD.DIMENSION_REFERENCE = ((3.88, 1.63, 1.53),
                                           (1.78, 1.70, 0.58),
                                           (0.88, 1.73, 0.67))
# Reference depth
_C.MODEL.SMOKE_HEAD.DEPTH_REFERENCE = (28.01, 16.32)
_C.MODEL.SMOKE_HEAD.USE_NMS = False
# ---------------------------------------------------------------------------- #
# Solver
# ---------------------------------------------------------------------------- #
_C.SOLVER = CN()
_C.SOLVER.OPTIMIZER = "Adam"
_C.SOLVER.MAX_ITERATION = 14500
_C.SOLVER.STEPS = (5850, 9350)
_C.SOLVER.BASE_LR = 0.00025
_C.SOLVER.BIAS_LR_FACTOR = 2
_C.SOLVER.LOAD_OPTIMIZER_SCHEDULER = True
_C.SOLVER.CHECKPOINT_PERIOD = 20
_C.SOLVER.EVALUATE_PERIOD = 20
# Number of images per batch
# This is global, so if we have 8 GPUs and IMS_PER_BATCH = 16, each GPU will
# see 2 images per batch
_C.SOLVER.IMS_PER_BATCH = 32
_C.SOLVER.MASTER_BATCH = -1
# ---------------------------------------------------------------------------- #
# Test
# ---------------------------------------------------------------------------- #
_C.TEST = CN()
# Number of images per batch
# This is global, so if we have 8 GPUs and IMS_PER_BATCH = 16, each GPU will
# see 2 images per batch
_C.TEST.SINGLE_GPU_TEST = True
_C.TEST.IMS_PER_BATCH = 1
_C.TEST.PRED_2D = True
# Number of detections per image
_C.TEST.DETECTIONS_PER_IMG = 50
_C.TEST.DETECTIONS_THRESHOLD = 0.25
# ---------------------------------------------------------------------------- #
# Misc options
# ---------------------------------------------------------------------------- #
# Directory where output files are written
_C.OUTPUT_DIR = "./output/exp"
# Set seed to negative to fully randomize everything.
# Set seed to positive to use a fixed seed. Note that a fixed seed does not
# guarantee fully deterministic behavior.
_C.SEED = -1
# Benchmark different cudnn algorithms.
# If input images have very different sizes, this option will have large overhead
# for about 10k iterations. It usually hurts total time, but can benefit for certain models.
# If input images have the same or similar sizes, benchmark is often helpful.
_C.CUDNN_BENCHMARK = True
_C.PATHS_CATALOG = os.path.join(os.path.dirname(__file__), "paths_catalog.py")
smoke_gn_vector.yaml example (specific parameters):
MODEL:
  WEIGHT: "catalog://ImageNetPretrained/DLA34"
INPUT:
  FLIP_PROB_TRAIN: 0.5
  SHIFT_SCALE_PROB_TRAIN: 0.3
DATASETS:
  DETECT_CLASSES: ("Car", "Cyclist", "Pedestrian")
  TRAIN: ("kitti_train",)
  TEST: ("kitti_test",)
  TRAIN_SPLIT: "trainval"
  TEST_SPLIT: "test"
SOLVER:
  BASE_LR: 2.5e-4
  STEPS: (10000, 18000)
  MAX_ITERATION: 25000
  IMS_PER_BATCH: 32
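Both files set IMS_PER_BATCH = 32, and the comment in defaults.py says this is the global batch size shared by all GPUs. A quick sketch of that arithmetic follows; images_per_gpu is a hypothetical helper, and treating an uneven split as an error is an assumption rather than something taken from the SMOKE code.

```python
def images_per_gpu(ims_per_batch, num_gpus):
    """Split the global IMS_PER_BATCH evenly across GPUs (hypothetical helper)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if ims_per_batch % num_gpus != 0:
        # assumption: an uneven split is treated as a configuration error
        raise ValueError(f"IMS_PER_BATCH={ims_per_batch} is not divisible by {num_gpus} GPUs")
    return ims_per_batch // num_gpus


print(images_per_gpu(16, 8))  # 2, the example from the defaults.py comment
print(images_per_gpu(32, 4))  # 8 images per GPU for IMS_PER_BATCH = 32 on 4 GPUs
```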
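Title: SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
Paper: https://arxiv.org/pdf/2002.10111.pdf
Official source code maintained by the author: https://github.com/lzccccc/SMOKE
MMDetection3D re-implementation by OpenMMLab: https://github.com/open-mmlab/mmdetection3d
SMOKE is a one-stage monocular 3D detector. It argues that 2D detection is redundant for monocular 3D detection and introduces noise that degrades 3D performance, so it drops the 2D detection step and works with keypoint prediction and 3D box regression directly: each object is paired with a single keypoint, and its 3D bounding box is predicted by combining the estimated keypoint with regressed 3D variables.
The input image goes through a DLA-34 network for feature extraction and is then fed into two detection branches: a keypoint prediction branch and a 3D bounding box regression branch.
The backbone is a DLA-34 network with deformable convolutions (DCN, Deformable Convolutional Networks) and GroupNorm (GN) normalization, similar to CenterNet; its output resolution is one quarter of the input resolution. The paper uses DLA-34 as the backbone so that features from different layers can be aggregated. Two main changes are made to the network: (1) the hierarchical aggregation connections are replaced with DCN, and (2) BatchNorm is replaced with GroupNorm.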
SMOKE's detection network mainly consists of a keypoint detection branch and a 3D bounding box regression branch.
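To make the regression branch concrete, here is a minimal sketch of how the 8 regression channels could be decoded into a 3D box, following the SMOKE paper's formulation: depth z = μ_z + δ_z·σ_z with (μ_z, σ_z) = DEPTH_REFERENCE, dimensions = per-class reference size · exp(δ), and location obtained by back-projecting the refined keypoint with the camera intrinsics. The channel order follows REGRESSION_CHANNEL = (1, 2, 3, 2) from defaults.py. Assumptions: the keypoint and its offset are already in input-image pixels (the real code works on the 1/4-resolution feature map and maps back with a transform matrix), the two orientation channels are ignored, and decode_smoke is a hypothetical helper, not a function from either repository.

```python
import math

# Reference statistics copied from defaults.py above
DEPTH_REFERENCE = (28.01, 16.32)             # (mu_z, sigma_z)
DIMENSION_REFERENCE = ((3.88, 1.63, 1.53),   # Car
                       (1.78, 1.70, 0.58),   # Cyclist
                       (0.88, 1.73, 0.67))   # Pedestrian


def decode_smoke(keypoint, reg, cls_id, K):
    """Decode one detection (hypothetical helper; orientation is ignored).

    keypoint: (u, v) heatmap peak, assumed to be in input-image pixels
    reg:      8 values laid out as REGRESSION_CHANNEL = (1, 2, 3, 2):
              depth offset | keypoint offset (2) | dimension offset (3) | orientation (2)
    cls_id:   index into DIMENSION_REFERENCE
    K:        3x3 camera intrinsics as nested lists
    """
    d_depth, du, dv = reg[0], reg[1], reg[2]
    d_dims = reg[3:6]

    mu_z, sigma_z = DEPTH_REFERENCE
    z = mu_z + d_depth * sigma_z                  # z = mu_z + delta_z * sigma_z

    u, v = keypoint[0] + du, keypoint[1] + dv     # refined projected 3D center
    fx, cx = K[0][0], K[0][2]
    fy, cy = K[1][1], K[1][2]
    x = (u - cx) * z / fx                         # pinhole back-projection
    y = (v - cy) * z / fy

    dims = tuple(ref * math.exp(d)                # reference size * exp(offset)
                 for ref, d in zip(DIMENSION_REFERENCE[cls_id], d_dims))
    return (x, y, z), dims


K = [[700.0, 0.0, 600.0],   # illustrative intrinsics, not a real KITTI calibration
     [0.0, 700.0, 180.0],
     [0.0, 0.0, 1.0]]
location, dimensions = decode_smoke((600.0, 180.0), [0.0] * 8, 0, K)
print(location, dimensions)  # (0.0, 0.0, 28.01) (3.88, 1.63, 1.53)
```

With all offsets at zero the sketch returns the reference depth and the reference car size, which is exactly why SMOKE regresses offsets relative to these statistics instead of raw values.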
SMOKE's loss function is the sum of a keypoint classification loss and a 3D bounding box regression loss.
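For the keypoint part, defaults.py sets LOSS_TYPE = ("FocalLoss", "DisL1") with LOSS_ALPHA = 2 and LOSS_BETA = 4, which matches the CenterNet-style penalty-reduced focal loss used by SMOKE for its heatmap. Below is a plain-Python sketch of that loss over a flattened heatmap, written for clarity rather than speed. It is not the SMOKE implementation, and the 3D box regression (disentangled L1) part is not shown.

```python
import math


def penalty_reduced_focal_loss(pred, target, alpha=2, beta=4, eps=1e-6):
    """CenterNet-style focal loss on a flattened heatmap.

    pred:   predicted probabilities in (0, 1)
    target: Gaussian-splatted ground truth in [0, 1], exactly 1 at object keypoints
    alpha, beta: default to LOSS_ALPHA / LOSS_BETA from defaults.py
    """
    pos_loss, neg_loss, num_pos = 0.0, 0.0, 0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)          # keep log() finite
        if y == 1:
            pos_loss += (1.0 - p) ** alpha * math.log(p)
            num_pos += 1
        else:
            # pixels close to a keypoint (y near 1) are penalized less via (1 - y)^beta
            neg_loss += (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    return -(pos_loss + neg_loss) / max(num_pos, 1)   # normalize by the number of objects


good = penalty_reduced_focal_loss([0.95, 0.60, 0.02], [1.0, 0.8, 0.0])
bad = penalty_reduced_focal_loss([0.10, 0.60, 0.90], [1.0, 0.8, 0.0])
print(f"{good:.4f} < {bad:.4f}")
```

The (1 − y)^β factor is what makes the Gaussian-splatted target useful: a confident prediction one pixel away from the true keypoint costs far less than the same confidence on empty background.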
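There are two main versions of the SMOKE source code: the official implementation and OpenMMLab's MMDetection3D re-implementation.
In my own experience, you can start directly with the MMDetection3D version (it really is easy to use). The official version can currently only do training and simple testing (and needs extra libraries), and many features are still incomplete. If you are interested, give it a try anyway; it is good practice for reading code.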
https://github.com/open-mmlab/mmdetection3d
1. Create the environment
# create a new virtual environment in Anaconda
conda create -n mmdet3d python=3.7 -y
conda activate mmdet3d
# install the latest PyTorch version
conda install -c pytorch pytorch torchvision -y
# install mmcv
pip install mmcv-full
# install mmdetection
pip install git+https://github.com/open-mmlab/mmdetection.git
# install mmsegmentation
pip install git+https://github.com/open-mmlab/mmsegmentation.git
# install mmdetection3d
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
pip install -v -e . # or "python setup.py develop"
# -v: verbose, i.e. more output
# -e: editable; after you edit the local files, imports use the latest version of the files
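2. Prepare the KITTI dataset
Follow the official tutorial: 3D Object Detection KITTI Dataset
3. Modify the parameters
Open /mmdetection3d/configs/_base_/datasets/kitti-mono3d.py and set data_root = '/your_datasets_root'.
Open /mmdetection3d/configs/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d.py and adjust the parameters as needed (for example max_epochs or the interval for saving checkpoints).
4. Training
Once the environment, dataset and parameters are configured, you can start training directly (multi-GPU training as an example):
CUDA_VISIBLE_DEVICES=0,1,2,3 tools/dist_train.sh configs/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d.py 4
No output path is specified here, so the results are saved by default in the /mmdetection3d/work_dirs/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d/ folder.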
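The default folder follows a simple rule: ./work_dirs/ plus the config file name without its .py extension. The sketch below reproduces that rule with os.path; default_work_dir is a hypothetical helper, not an mmdet3d function. To save somewhere else, tools/train.py also accepts a --work-dir argument.

```python
import os.path as osp


def default_work_dir(config_path):
    """./work_dirs/<config file name without extension> (hypothetical helper)."""
    return osp.join("./work_dirs", osp.splitext(osp.basename(config_path))[0])


cfg = "configs/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d.py"
print(default_work_dir(cfg))
# ./work_dirs/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d
```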
5. Testing and visualization
Simply run the following command:
python tools/test.py configs/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d.py work_dirs/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d/latest.pth --show --show-dir ./outputs/smoke/smoke_kitti_72e
For SMOKE, the number of 3D boxes drawn in the visualization currently cannot be controlled through the score_thr parameter. The reason is that SMOKE's detector SMOKEMono3D inherits from SingleStageMono3DDetector, and the SingleStageMono3DDetector class does not implement score_thr yet (this bug took me ages to track down o(╥﹏╥)o).
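Until score_thr is supported for this detector, one workaround is to filter the predictions yourself before they are drawn. The helper below is a hypothetical sketch on plain Python lists; in mmdet3d the scores are tensors, so the same idea becomes a boolean mask such as scores_3d >= score_thr.

```python
def filter_by_score(boxes, scores, labels, score_thr=0.3):
    """Keep only detections whose score is at least score_thr (hypothetical helper).

    boxes, scores and labels are parallel sequences describing the same detections.
    """
    keep = [i for i, s in enumerate(scores) if s >= score_thr]
    return ([boxes[i] for i in keep],
            [scores[i] for i in keep],
            [labels[i] for i in keep])


boxes = ["box_a", "box_b", "box_c"]      # placeholders for 3D boxes
scores = [0.92, 0.15, 0.41]
labels = [0, 0, 2]
kept_boxes, kept_scores, kept_labels = filter_by_score(boxes, scores, labels, score_thr=0.3)
print(kept_boxes, kept_scores, kept_labels)  # ['box_a', 'box_c'] [0.92, 0.41] [0, 2]
```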
https://github.com/lzccccc/SMOKE
1. Create the environment
conda create -n smoke python=3.7 -y
conda activate smoke
pip install torch==1.4.0 torchvision==0.5.0
git clone https://github.com/lzccccc/SMOKE smoke  # clone into "smoke" so the cd below and the /smoke/... paths used later match
cd smoke
python setup.py build develop
2. Add the dependencies: in the smoke root directory, create a requirements.txt file containing the following packages:
shapely
tqdm
tensorboard
tensorboardX
scikit-image
matplotlib
yacs
pyyaml
fire
pycocotools
fvcore
opencv-python
numba
inplace_abn
Then run pip install -r requirements.txt to install them.
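After installing, a quick way to confirm that everything can be imported is a small stdlib script like the one below (missing_packages is a hypothetical helper, not part of SMOKE). Several pip names differ from the module names used in `import` (scikit-image is imported as skimage, pyyaml as yaml, opencv-python as cv2); the mapping reflects those packages' usual import names.

```python
import importlib.util

# pip package name -> module name used in `import` (they differ for a few entries)
IMPORT_NAMES = {
    "shapely": "shapely", "tqdm": "tqdm", "tensorboard": "tensorboard",
    "tensorboardX": "tensorboardX", "scikit-image": "skimage",
    "matplotlib": "matplotlib", "yacs": "yacs", "pyyaml": "yaml",
    "fire": "fire", "pycocotools": "pycocotools", "fvcore": "fvcore",
    "opencv-python": "cv2", "numba": "numba", "inplace_abn": "inplace_abn",
}


def missing_packages(mapping=IMPORT_NAMES):
    """Return the pip names whose module cannot be found in the current environment."""
    return [pkg for pkg, module in mapping.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    print("missing:", missing_packages() or "none")
```

find_spec only locates the modules without importing them, so the check is fast even for heavy packages.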
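3. Download and configure the KITTI dataset
For the download steps, see this blog post: [MMDetection3D] Environment setup, training & testing & visualizing the KITTI dataset with PointPillars. After downloading, organize the dataset with the following structure: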
kitti
├──training
│ ├──calib
│ ├──label_2
│ ├──image_2
│ └──ImageSets
└──testing
├──calib
├──image_2
└──ImageSets
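Before pointing the code at the dataset, it can save time to verify this layout. The script below is a stdlib-only sketch; check_kitti_layout is a hypothetical helper, and the demo builds a temporary directory instead of touching real data. In practice you would call check_kitti_layout("/your_datasets_root/kitti"). os.path.isdir follows symlinks, so a symlinked dataset also passes.

```python
import os
import tempfile

# Sub-folders expected by the layout above
EXPECTED = {
    "training": ["calib", "label_2", "image_2", "ImageSets"],
    "testing": ["calib", "image_2", "ImageSets"],
}


def check_kitti_layout(root):
    """Return the expected sub-folders that are missing under `root`."""
    missing = []
    for split, subdirs in EXPECTED.items():
        for sub in subdirs:
            path = os.path.join(root, split, sub)
            if not os.path.isdir(path):
                missing.append(path)
    return missing


# Demo on a throw-away directory: build the full layout, then drop label_2
demo_root = tempfile.mkdtemp()
for split, subdirs in EXPECTED.items():
    for sub in subdirs:
        os.makedirs(os.path.join(demo_root, split, sub))
complete = check_kitti_layout(demo_root)
os.rmdir(os.path.join(demo_root, "training", "label_2"))
incomplete = check_kitti_layout(demo_root)
print(complete, [os.path.relpath(p, demo_root) for p in incomplete])
```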
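4. Modify the dataset path
Option 1: symlink the downloaded KITTI dataset into a datasets folder and you are done, since the default path is datasets/kitti/. However, with this approach the test stage later fails to find files.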
mkdir datasets
ln -s /path_to_kitti_dataset datasets/kitti
Option 2 (recommended): open /smoke/smoke/config/paths_catalog.py and change the dataset path directly (preferably to an absolute path):
class DatasetCatalog():
    DATA_DIR = "/your_datasets_root/"
    DATASETS = {
        "kitti_train": {
            "root": "kitti/training/",
        },
        "kitti_test": {
            "root": "kitti/testing/",
        },
    }
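Conceptually, the catalog combines DATA_DIR with each entry's root to get the dataset folder. The sketch below shows that join; get_root is a hypothetical helper, not the method in SMOKE's paths_catalog.py. It also shows why an absolute DATA_DIR is the safer choice: a relative value such as "your_datasets_root/" is resolved against whatever the current working directory happens to be, and the evaluation code changes directory before running the evaluator.

```python
import os


class DatasetCatalog:
    DATA_DIR = "/your_datasets_root/"
    DATASETS = {
        "kitti_train": {"root": "kitti/training/"},
        "kitti_test": {"root": "kitti/testing/"},
    }

    @staticmethod
    def get_root(name):
        """Full dataset folder for `name` (hypothetical helper)."""
        root = os.path.join(DatasetCatalog.DATA_DIR, DatasetCatalog.DATASETS[name]["root"])
        # a relative DATA_DIR would be resolved against the current working directory here
        return os.path.abspath(root)


print(DatasetCatalog.get_root("kitti_train"))  # /your_datasets_root/kitti/training
```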
5. Modify the training settings (optional)
Open /smoke/configs/smoke_gn_vector.yaml to change training parameters such as the number of iterations and the batch size:
# model settings
MODEL:
  WEIGHT: "catalog://ImageNetPretrained/DLA34"
# dataset settings
INPUT:
  FLIP_PROB_TRAIN: 0.5
  SHIFT_SCALE_PROB_TRAIN: 0.3
DATASETS:
  DETECT_CLASSES: ("Car", "Cyclist", "Pedestrian")
  TRAIN: ("kitti_train",)
  TEST: ("kitti_test",)
  TRAIN_SPLIT: "trainval"
  TEST_SPLIT: "test"
# training settings
SOLVER:
  BASE_LR: 2.5e-4
  STEPS: (10000, 15000)
  MAX_ITERATION: 20000  # number of iterations
  IMS_PER_BATCH: 8  # batch size summed over all GPUs
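Because SMOKE's schedule is counted in iterations, changing IMS_PER_BATCH also changes how many times the model sees each image. A rough conversion is epochs ≈ MAX_ITERATION × IMS_PER_BATCH / number of training images. The sketch uses the usual KITTI split sizes (7481 images in trainval, 3712 in train, 3769 in val); iterations_to_epochs is a hypothetical helper.

```python
# Standard KITTI 3D object detection split sizes
KITTI_SPLIT_SIZES = {"train": 3712, "val": 3769, "trainval": 7481}


def iterations_to_epochs(max_iteration, ims_per_batch, split="trainval"):
    """Approximate passes over `split` for an iteration-based schedule (hypothetical helper)."""
    return max_iteration * ims_per_batch / KITTI_SPLIT_SIZES[split]


print(round(iterations_to_epochs(20000, 8), 1))   # 21.4  (settings above)
print(round(iterations_to_epochs(25000, 32), 1))  # 106.9 (smoke_gn_vector.yaml shown earlier)
```

So the settings above train for roughly a fifth of the epochs of the 25000 × 32 schedule; if you lower the batch size and want a comparable amount of training, raise MAX_ITERATION and STEPS accordingly.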
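6. All parameter settings
Open /smoke/smoke/config/defaults.py to change every configuration parameter, including data input, preprocessing, model structure, training and testing. It is best not to touch this file; to change a parameter, edit smoke_gn_vector.yaml from the previous step instead. For example, to change where training and test results are saved, just add OUTPUT_DIR at the end:
# model settings
MODEL:
  WEIGHT: "catalog://ImageNetPretrained/DLA34"
# dataset settings
INPUT:
  FLIP_PROB_TRAIN: 0.5
  SHIFT_SCALE_PROB_TRAIN: 0.3
DATASETS:
  DETECT_CLASSES: ("Car", "Cyclist", "Pedestrian")
  TRAIN: ("kitti_train",)
  TEST: ("kitti_test",)
  TRAIN_SPLIT: "trainval"
  TEST_SPLIT: "test"
# training settings
SOLVER:
  BASE_LR: 2.5e-4
  STEPS: (10000, 15000)
  MAX_ITERATION: 20000  # number of iterations
  IMS_PER_BATCH: 8  # batch size summed over all GPUs
# output path
OUTPUT_DIR: "./output/exp"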
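7. Start training
Single-GPU training:
python tools/plain_train_net.py --config-file "configs/smoke_gn_vector.yaml"
Multi-GPU training (4 GPUs here):
python tools/plain_train_net.py --num-gpus 4 --config-file "configs/smoke_gn_vector.yaml"
If the pretrained DLA-34 weights cannot be downloaded automatically, place the weight file at the following path:
/root/.torch/models/dla34-ba72cf86.pth
8. Testing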
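The official SMOKE code has quite a few problems at test time; the author gives a solution in this issue (https://github.com/lzccccc/SMOKE/issues/4):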
You need to put offline kitti eval code under the folder “/smoke/data/datasets/evaluation/kitti/kitti_eval”
if you are using the train/val split. It will compile it automatically and evaluate the performance.
The eval code can be found here:
https://github.com/prclibo/kitti_eval (for 11 recall points)
https://github.com/lzccccc/kitti_eval_offline (for 40 recall points)
However, if you are using the trainval (namely the whole training set), there is no need to evaluate it offline. You need to log in to the kitti webset and submit your result.
The concrete test steps are as follows:
1. Put the offline evaluation code (kitti_eval, from one of the links above) into the /smoke/smoke/data/datasets/evaluation/kitti/ folder.
2. Open /smoke/configs/smoke_gn_vector.yaml and change the DATASETS section to:
DATASETS:
  DETECT_CLASSES: ("Car", "Cyclist", "Pedestrian")
  TRAIN: ("kitti_train",)
  TEST: ("kitti_train",)
  TRAIN_SPLIT: "train"
  TEST_SPLIT: "val"
3. Modify the do_kitti_detection_evaluation function in /smoke/smoke/data/datasets/evaluation/kitti/kitti_eval.py:
def do_kitti_detection_evaluation(dataset,
                                  predictions,
                                  output_folder,
                                  logger
                                  ):
    # os, subprocess, mkdir and generate_kitti_3d_detection are already available in kitti_eval.py
    predict_folder = os.path.join(output_folder, 'data')  # the evaluator only reads the data/ sub-folder
    mkdir(predict_folder)
    for image_id, prediction in predictions.items():
        predict_txt = image_id + '.txt'
        predict_txt = os.path.join(predict_folder, predict_txt)
        generate_kitti_3d_detection(prediction, predict_txt)
    logger.info("Evaluate on KITTI dataset")
    output_dir = os.path.abspath(output_folder)
    # resolve the label path before changing directory, so a relative path keeps working
    label_dir = os.path.abspath(getattr(dataset, 'label_dir'))
    cwd = os.getcwd()
    os.chdir('./smoke/data/datasets/evaluation/kitti/kitti_eval')
    # os.chdir('../smoke/data/datasets/evaluation/kitti/kitti_eval')  # if launched from tools/
    try:
        if not os.path.isfile('evaluate_object_3d_offline'):
            # check_call waits for g++ to finish; Popen returned immediately, so the binary could still be missing
            subprocess.check_call('g++ -O3 -DNDEBUG -o evaluate_object_3d_offline evaluate_object_3d_offline.cpp', shell=True)
        command = "./evaluate_object_3d_offline {} {}".format(label_dir, output_dir)
        output = subprocess.check_output(command, shell=True, universal_newlines=True).strip()
        logger.info(output)
    finally:
        os.chdir(cwd)  # return to the starting directory; os.chdir('./') would stay in kitti_eval
4. Run the evaluation:
python tools/plain_train_net.py --eval-only --ckpt YOUR_CKPT --config-file "configs/smoke_gn_vector.yaml"
The evaluation logic is:
1. In the kitti_eval folder, run g++ -O3 -DNDEBUG -o evaluate_object_3d_offline evaluate_object_3d_offline.cpp to compile the evaluate_object_3d_offline binary;
2. In the kitti_eval folder, run ./evaluate_object_3d_offline /your_root_dir/kitti/training/label_2/ /your_root_dir/smoke/output/exp4/inference/kitti_train to compute the metrics.
Note!! This step is full of pitfalls:
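1. If you hit the following error, open Python's subprocess.py, go to line 412 (the position may differ between Python versions) and change check to False:
subprocess.CalledProcessError: Command './evaluate_object_3d_offline datasets/kitti/training/label_2 /home/rrl/det3d/smoke/output/exp4/inference/kitti_train' returned non-zero exit status 127.
Be aware that this only silences the exception. In a shell, exit status 127 means "command not found", so it usually indicates that evaluate_object_3d_offline had not been compiled (yet) when it was called.
2. Pass the label_2 folder as an absolute path rather than through a symlink (I used a symlink at first and kept getting the error below o(╥﹏╥)o). The evaluator runs after os.chdir into kitti_eval, so a relative path such as datasets/kitti/training/label_2 most likely no longer points at the labels from there:
Thank you for participating in our evaluation!
Loading detections...
number of files for evaluation: 3769
ERROR: Couldn't read: 006071.txt of ground truth. Please write me an email!
An error occured while processing your results.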
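Both pitfalls come down to what the evaluator can see at the moment it runs. The stdlib sketch below checks this before launching it: that the compiled binary exists and is executable, that the label path is absolute, and that every prediction file in <result_dir>/data has a matching ground-truth label. check_eval_inputs is a hypothetical helper; the data/ sub-folder convention is taken from the predict_folder line in do_kitti_detection_evaluation, and the demo uses a temporary directory.

```python
import os
import tempfile


def check_eval_inputs(eval_dir, label_dir, result_dir):
    """Return a list of human-readable problems (empty list = looks fine)."""
    problems = []
    binary = os.path.join(eval_dir, "evaluate_object_3d_offline")
    if not os.access(binary, os.X_OK):
        problems.append(f"evaluator not compiled or not executable: {binary}")
    if not os.path.isabs(label_dir):
        problems.append(f"label_dir is relative and breaks after os.chdir: {label_dir}")
    pred_dir = os.path.join(result_dir, "data")
    if not os.path.isdir(pred_dir):
        problems.append(f"prediction folder missing: {pred_dir}")
        return problems
    for name in sorted(os.listdir(pred_dir)):
        if name.endswith(".txt") and not os.path.isfile(os.path.join(label_dir, name)):
            problems.append(f"no ground truth for {name}")
    return problems


# Demo: two predictions, but only one matching label and no compiled evaluator
demo = tempfile.mkdtemp()
labels = os.path.join(demo, "label_2")
preds = os.path.join(demo, "out", "data")
os.makedirs(labels)
os.makedirs(preds)
for name in ("000001.txt", "000002.txt"):
    with open(os.path.join(preds, name), "w"):
        pass
with open(os.path.join(labels, "000001.txt"), "w"):
    pass
report = check_eval_inputs(os.path.join(demo, "kitti_eval"), labels, os.path.join(demo, "out"))
print("\n".join(report))
```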
9. Visualize the prediction results
Coming soon…
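Title: Objects are Different: Flexible Monocular 3D Object Detection
Paper: https://arxiv.org/pdf/2104.02323.pdf
Source code: https://github.com/zhangyp15/MonoFlex
Most existing monocular 3D detectors ignore the differences between objects. Treating all objects equally and jointly makes heavily truncated objects hard to detect, and these hard samples add to the learning burden and hurt the predictions for ordinary objects, degrading detection performance. A unified approach may therefore fail to find every object or to predict accurate 3D locations. To address this, the authors propose a flexible detector that takes the differences between objects into account and estimates their 3D locations adaptively.
The main contributions can be summarized in the following two points:
The MonoFlex framework and its detection idea extend CenterNet. CenterNet's core idea is to treat an object as a point, namely the center of its bounding box: the detector uses keypoint estimation to find the center point and regresses the other object attributes, such as the 2D bounding box, dimensions, orientation, keypoints and depth. The final depth estimate is an uncertainty-weighted combination of the regressed depth and the depths computed from the estimated keypoints and dimensions: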
3D detection of an object involves estimating its 3D location $(x, y, z)$, dimensions $(h, w, l)$ and orientation $\theta$. Dimensions and orientation can be inferred directly from appearance-based cues, while the 3D location is converted into the projected 3D center $\mathbf{x}_c = (u_c, v_c)$ and the object depth $z$:
$$
\begin{aligned}
x &= \frac{(u_c - c_u)\, z}{f} \\
y &= \frac{(v_c - c_v)\, z}{f}
\end{aligned}
$$
where $(c_u, c_v)$ is the principal point and $f$ is the focal length. The conversion from 3D location to projected center and object depth is illustrated below:
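The two equations are simply the inverse of the pinhole projection. The small sketch below projects a 3D center into the image and then recovers it from $(u_c, v_c)$ and $z$. The intrinsics are illustrative round numbers, not a real KITTI calibration, and like the equations the sketch assumes a single focal length $f$ for both axes.

```python
def project(x, y, z, f, cu, cv):
    """Pinhole projection of a 3D center in camera coordinates to pixels (u_c, v_c)."""
    return f * x / z + cu, f * y / z + cv


def back_project(uc, vc, z, f, cu, cv):
    """Recover (x, y, z) from the projected center and the depth (equations above)."""
    return (uc - cu) * z / f, (vc - cv) * z / f, z


f, cu, cv = 700.0, 600.0, 180.0                 # illustrative intrinsics
uc, vc = project(2.0, 1.5, 20.0, f, cu, cv)
print(uc, vc)                                   # 670.0 232.5
print(back_project(uc, vc, 20.0, f, cu, cv))    # (2.0, 1.5, 20.0)
```

This is why MonoFlex only needs to predict the projected center and the depth: given the calibration, $x$ and $y$ follow directly.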
Existing monocular 3D detectors use a unified representation $\mathbf{x}_r$ for every object, namely the center $\mathbf{x}_b$ of its 2D bounding box, and regress the offset $\boldsymbol{\delta}_c = \mathbf{x}_c - \mathbf{x}_b$ to derive the projected 3D center $\mathbf{x}_c$. Depending on whether an object's projected 3D center lies inside or outside the image, objects fall into two groups, inside objects and outside objects, whose offsets $\boldsymbol{\delta}_c$ from the 2D center to the projected 3D center follow completely different distributions:
The authors therefore decouple the representation and the offset learning of inside and outside objects:
The authors use an L1 loss to regress $\boldsymbol{\delta}_{in}$ and a log-scale L1 loss to regress $\boldsymbol{\delta}_{out}$, because the latter is more robust to extreme outliers. The offset loss is:
$$
L_{off} = \begin{cases}
\left|\boldsymbol{\delta}_{in} - \boldsymbol{\delta}_{in}^*\right| & \text{if inside} \\
\log\left(1 + \left|\boldsymbol{\delta}_{out} - \boldsymbol{\delta}_{out}^*\right|\right) & \text{otherwise}
\end{cases}
$$
where $\boldsymbol{\delta}_{in}$ and $\boldsymbol{\delta}_{out}$ are the predictions and $\boldsymbol{\delta}_{in}^*$ and $\boldsymbol{\delta}_{out}^*$ the ground truth.
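Below is a minimal sketch of the decoupled targets and the offset loss. It follows my reading of the MonoFlex paper: an inside object is represented by the integer pixel $\lfloor \mathbf{x}_c \rfloor$, so $\boldsymbol{\delta}_{in}$ is only the quantization error, while an outside object is represented by the point $\mathbf{x}_I$ where the line from the 2D-box center $\mathbf{x}_b$ to $\mathbf{x}_c$ crosses the image border, with $\boldsymbol{\delta}_{out} = \mathbf{x}_c - \mathbf{x}_I$. Treat it as an assumption-laden illustration: the function names are hypothetical, it works in image pixels rather than on the downsampled feature map, and the official code may differ in details such as rounding $\mathbf{x}_I$. The loss sums the per-coordinate terms over $u$ and $v$.

```python
import math


def edge_intersection(xb, xc, width, height):
    """Point where the segment from the 2D-box center xb towards an outside
    projected 3D center xc crosses the border [0, width-1] x [0, height-1]."""
    t_hit = 1.0
    for axis, limit in ((0, width - 1), (1, height - 1)):
        d = xc[axis] - xb[axis]
        if xc[axis] > limit and d > 0:
            t_hit = min(t_hit, (limit - xb[axis]) / d)
        elif xc[axis] < 0 and d < 0:
            t_hit = min(t_hit, (0.0 - xb[axis]) / d)
    return (xb[0] + t_hit * (xc[0] - xb[0]), xb[1] + t_hit * (xc[1] - xb[1]))


def offset_target(xb, xc, width, height):
    """Return (representation point, ground-truth offset, is_inside)."""
    inside = 0 <= xc[0] <= width - 1 and 0 <= xc[1] <= height - 1
    if inside:
        rep = (math.floor(xc[0]), math.floor(xc[1]))     # delta_in = quantization error
    else:
        rep = edge_intersection(xb, xc, width, height)   # delta_out = x_c - x_I
    return rep, (xc[0] - rep[0], xc[1] - rep[1]), inside


def offset_loss(pred, target, inside):
    """L1 for inside objects, log(1 + |error|) for outside ones, summed over (u, v)."""
    errors = [abs(p - t) for p, t in zip(pred, target)]
    return sum(errors) if inside else sum(math.log1p(e) for e in errors)


W, H = 1242, 375                     # a typical KITTI image size
rep_in, off_in, is_in = offset_target((500.0, 200.0), (510.25, 190.75), W, H)
rep_out, off_out, is_out = offset_target((1200.0, 180.0), (1300.0, 200.0), W, H)
print(rep_in, off_in, is_in)
print(rep_out, off_out, is_out)
```

The log-scale branch grows much more slowly than L1 for large errors, which is the robustness to outliers that the paragraph above refers to.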
Regression of visual properties, including the object's 2D bounding box, dimensions, orientation and keypoints
Coming Soon…
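1. Create the environment
# create a conda virtual environment: python==3.7, pytorch==1.4.0 and cuda==10.1
conda create -n monoflex python=3.7 -y
conda activate monoflex
pip install torch==1.4.0 torchvision==0.5.0
# clone the code (into a folder named monoflex, so the cd below and the /monoflex/... paths match)
git clone https://github.com/zhangyp15/MonoFlex monoflex
cd monoflex
# install the dependencies
pip install -r requirements.txt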
# Build DCNv2 and the project
cd model/backbone/DCNv2
. make.sh
cd ../../..
python setup.py build develop
2. Prepare the dataset and set its path
Download and configure the dataset as in the SMOKE steps. Then open /monoflex/config/paths_catalog.py and change the dataset path:
class DatasetCatalog():
    DATA_DIR = "/your_datasets_root/"
    DATASETS = {
        "kitti_train": {
            "root": "kitti/training/",
        },
        "kitti_test": {
            "root": "kitti/testing/",
        },
    }
3. Modify the training and test parameters
Open /home/rrl/det3d/monoflex/runs/monoflex.yaml and modify it as needed:
SOLVER:
  OPTIMIZER: 'adamw'
  BASE_LR: 3e-4
  WEIGHT_DECAY: 1e-5
  LR_WARMUP: False
  WARMUP_STEPS: 2000
  # for 1 GPU
  LR_DECAY: 0.1
  # count the training length in epochs rather than iterations
  EVAL_AND_SAVE_EPOCH: True
  EVAL_EPOCH_INTERVAL: 1
  SAVE_CHECKPOINT_EPOCH_INTERVAL: 2
  # number of training epochs
  MAX_EPOCHS: 100
  DECAY_EPOCH_STEPS: [80, 90]
  # batch size
  IMS_PER_BATCH: 8
  EVAL_INTERVAL: 1000
TEST:
  UNCERTAINTY_AS_CONFIDENCE: True
  # the higher the detection threshold, the fewer boxes are output
  DETECTIONS_THRESHOLD: 0.9
  METRIC: ['R40']
# output path
OUTPUT_DIR: "./output/exp1"
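With LR_WARMUP set to False, the learning rate in this config should follow a plain step schedule: BASE_LR until the first entry of DECAY_EPOCH_STEPS, then multiplied by LR_DECAY at each milestone. The sketch below computes that schedule; lr_at_epoch is a hypothetical helper, and whether the decay takes effect at the start or the end of a milestone epoch is an assumption (here: from epoch 80 onward, counting from 0).

```python
def lr_at_epoch(epoch, base_lr=3e-4, lr_decay=0.1, decay_epochs=(80, 90)):
    """Multi-step schedule: multiply base_lr by lr_decay once per milestone reached."""
    num_decays = sum(1 for milestone in decay_epochs if epoch >= milestone)
    return base_lr * lr_decay ** num_decays


for epoch in (0, 79, 80, 90, 99):
    print(epoch, f"{lr_at_epoch(epoch):.0e}")
```

If you shorten MAX_EPOCHS, remember to move DECAY_EPOCH_STEPS as well; otherwise the learning rate never decays.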
4. Start training
CUDA_VISIBLE_DEVICES=0 python tools/plain_train_net.py --batch_size 8 --config runs/monoflex.yaml --output output/exp
If the pretrained DLA-34 weights cannot be downloaded automatically, place the weight file at the following path:
/root/.cache/torch/checkpoints/dla34-ba72cf86.pth
5. Testing and visualization
CUDA_VISIBLE_DEVICES=0 python tools/plain_train_net.py --config runs/monoflex.yaml --ckpt YOUR_CKPT --eval --vis
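6. Save the visualization images (optional)
To save the visualization images as they are produced, modify the source code as follows:
1. In /monoflex/engine/inference.py, where the inference function calls compute_on_dataset, add the new argument output_dir = output_folder. This passes the save path on to the later visualization function, so the visualization results are saved in the directory we specify.
2. In /monoflex/engine/inference.py, add a new parameter output_dir = None to compute_on_dataset, create a new sub-folder save_jpg, and pass it as an argument to the show_image_with_boxes function.
3. In /monoflex/engine/visualize_infer.py, add a new parameter save_dir = None to show_image_with_boxes, and at the end of show_image_with_boxes add the code that saves the images. It saves both the complete image composed with plt.figure() (including the heatmap, the detection result, and the BEV views of correct and incorrect inferences) and the detection-result image on its own (i.e. img3).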
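As an illustration of step 3, here is a sketch of what the final lines of show_image_with_boxes could call. save_visualization is a hypothetical helper; it assumes fig is the plt.figure() object holding all panels and img3 is an H x W x 3 uint8 RGB array. It writes PNG files; change the extension to .jpg if you prefer (matplotlib then relies on Pillow).

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")                    # render off-screen, e.g. on a remote server
import matplotlib.pyplot as plt
import numpy as np


def save_visualization(fig, img3, save_dir, image_id):
    """Save the combined figure and the detection-only image (hypothetical helper)."""
    if save_dir is None:                 # keep the original behaviour when no folder is given
        return None
    os.makedirs(save_dir, exist_ok=True)
    full_path = os.path.join(save_dir, f"{image_id}_full.png")
    det_path = os.path.join(save_dir, f"{image_id}_det.png")
    fig.savefig(full_path, bbox_inches="tight")
    plt.imsave(det_path, img3)
    plt.close(fig)                       # avoid piling up open figures over the dataset
    return full_path, det_path


# Demo with a dummy figure and a black image
demo_fig = plt.figure()
demo_img = np.zeros((8, 16, 3), dtype=np.uint8)
paths = save_visualization(demo_fig, demo_img, os.path.join(tempfile.mkdtemp(), "save_jpg"), "000001")
print(paths)
```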
References:
Notes on using yacs
https://github.com/lzccccc/SMOKE/issues/4
[CVPRW 2020] SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation: paper reading
Apollo 7.0 obstacle perception model prototype! SMOKE monocular 3D object detection, code open-sourced!
[Monocular 3D detection] MonoFlex paper reading
Paper reading: (CVPR 2021) Objects are Different: Flexible Monocular 3D Object Detection