A paper from Zhejiang University. You can go straight to the original author's Chinese introduction:
https://zhuanlan.zhihu.com/p/157530787
Official source code:
https://github.com/cfzd/Ultra-Fast-Lane-Detection
These notes record my own study of the paper, together with a walkthrough of the official source code. The main text follows:
Mainstream methods today treat lane detection as a semantic segmentation problem, which suffers from poor accuracy in challenging scenarios and from slow speed. Inspired by human perception, recognizing lanes under severe occlusion and extreme lighting relies mainly on contextual and global information. Concretely, the paper formulates lane detection as a row-based selection problem using global features.
Contributions: a simple row-based selection formulation of lane detection using global features, which is both extremely fast and robust to the no-visual-clue problem; a structural loss that explicitly models the shape prior of lanes; and state-of-the-art speed/accuracy trade-offs on CULane and Tusimple.
Related work in the paper is grouped into traditional methods and deep learning methods.
Speed and the no-visual-clue problem are both crucial for lane detection. This section shows how the paradigm is derived by tackling these two problems; for better illustration, Table 1 of the paper lists the notations used below.
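To make the row-based selection concrete, here is a minimal decoding sketch of my own (in the spirit of the official demo code, not copied from it): for every lane and every row anchor the network outputs griding_num + 1 scores, and the predicted x-coordinate is recovered as a soft expectation over the grid cells. The function name and the simple grid-to-pixel mapping are my own illustration.

import numpy as np

def decode_row_anchors(cls_out, griding_num=100, img_w=800):
    # cls_out: (griding_num + 1, num_rows, num_lanes) logits for one image;
    # the extra last class means "no lane passes this row anchor".
    logits = cls_out[:-1] - cls_out[:-1].max(axis=0, keepdims=True)   # numerically stable softmax over grid cells
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    idx = np.arange(1, griding_num + 1).reshape(-1, 1, 1)
    loc = (prob * idx).sum(axis=0)                                    # expectation -> finer-than-grid localization
    loc[np.argmax(cls_out, axis=0) == griding_num] = -1               # background class wins: no lane on this row
    col_w = img_w / (griding_num - 1)                                 # simple grid-to-pixel mapping
    return np.where(loc >= 0, (loc - 1) * col_w, -1)                  # (num_rows, num_lanes); -1 = absent

Because prediction is made on h row anchors with w + 1 grid cells each, instead of densely over every pixel of the image, the computation in the head shrinks by orders of magnitude; this is where the speed comes from.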
git clone https://github.com/cfzd/Ultra-Fast-Lane-Detection
cd Ultra-Fast-Lane-Detection
conda create -n lane-det python=3.7 -y
conda activate lane-det
# If you don't have pytorch
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
pip install -r requirements.txt
Download CULane and Tusimple, then extract them to $CULANEROOT and $TUSIMPLEROOT. The directory layout of Tusimple should look like this:
$TUSIMPLEROOT
|──clips
|──label_data_0313.json
|──label_data_0531.json
|──label_data_0601.json
|──test_tasks_0627.json
|──test_label.json
|──readme.md
The directory layout of CULane should look like this:
$CULANEROOT
|──driver_100_30frame
|──driver_161_90frame
|──driver_182_30frame
|──driver_193_90frame
|──driver_23_30frame
|──driver_37_30frame
|──laneseg_label_w16
|──list
The Tusimple dataset does not provide semantic segmentation labels, so we need to generate them from the json annotations.
python scripts/convert_tusimple.py --root $TUSIMPLEROOT
#this will generate segmentations and two list files: train_gt.txt and test.txt
import os
import cv2
import tqdm
import numpy as np
import json, argparse


def calc_k(line):
    '''
    Calculate the direction (slope angle) of a lane
    '''
    line_x = line[::2]
    line_y = line[1::2]
    length = np.sqrt((line_x[0] - line_x[-1]) ** 2 + (line_y[0] - line_y[-1]) ** 2)
    if length < 90:
        return -10                                   # if the lane is too short, it will be skipped

    p = np.polyfit(line_x, line_y, deg=1)
    rad = np.arctan(p[0])
    return rad


def draw(im, line, idx, show=False):
    '''
    Generate the segmentation label according to the json annotation
    '''
    line_x = line[::2]
    line_y = line[1::2]
    pt0 = (int(line_x[0]), int(line_y[0]))
    if show:
        cv2.putText(im, str(idx), (int(line_x[len(line_x) // 2]), int(line_y[len(line_x) // 2]) - 20),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), lineType=cv2.LINE_AA)
        idx = idx * 60                               # spread the grayscale values so lanes are visible

    for i in range(len(line_x) - 1):
        cv2.line(im, pt0, (int(line_x[i + 1]), int(line_y[i + 1])), (idx,), thickness=16)
        pt0 = (int(line_x[i + 1]), int(line_y[i + 1]))


def get_tusimple_list(root, label_list):
    '''
    Get all the files' names from the json annotations
    '''
    label_json_all = []
    for l in label_list:
        l = os.path.join(root, l)
        label_json = [json.loads(line) for line in open(l).readlines()]
        label_json_all += label_json
    names = [l['raw_file'] for l in label_json_all]
    h_samples = [np.array(l['h_samples']) for l in label_json_all]
    lanes = [np.array(l['lanes']) for l in label_json_all]

    line_txt = []
    for i in range(len(lanes)):
        line_txt_i = []
        for j in range(len(lanes[i])):
            if np.all(lanes[i][j] == -2):
                continue
            valid = lanes[i][j] != -2
            line_txt_tmp = [None] * (len(h_samples[i][valid]) + len(lanes[i][j][valid]))
            line_txt_tmp[::2] = list(map(str, lanes[i][j][valid]))
            line_txt_tmp[1::2] = list(map(str, h_samples[i][valid]))
            line_txt_i.append(line_txt_tmp)
        line_txt.append(line_txt_i)

    return names, line_txt


def generate_segmentation_and_train_list(root, line_txt, names):
    """
    The lane annotations of the Tusimple dataset are not strictly ordered, so we need to find out the correct lane order for segmentation.
    We use the same definition as CULane, in which the four lanes from left to right are represented as 1, 2, 3, 4 in the segmentation label respectively.
    """
    train_gt_fp = open(os.path.join(root, 'train_gt.txt'), 'w')

    for i in tqdm.tqdm(range(len(line_txt))):
        tmp_line = line_txt[i]
        lines = []
        for j in range(len(tmp_line)):
            lines.append(list(map(float, tmp_line[j])))

        ks = np.array([calc_k(line) for line in lines])   # get the direction of each lane

        k_neg = ks[ks < 0].copy()
        k_pos = ks[ks > 0].copy()
        k_neg = k_neg[k_neg != -10]                       # -10 means the lane is too short and is discarded
        k_pos = k_pos[k_pos != -10]
        k_neg.sort()
        k_pos.sort()

        label_path = names[i][:-3] + 'png'
        label = np.zeros((720, 1280), dtype=np.uint8)
        bin_label = [0, 0, 0, 0]
        if len(k_neg) == 1:                               # for only one lane on the left
            which_lane = np.where(ks == k_neg[0])[0][0]
            draw(label, lines[which_lane], 2)
            bin_label[1] = 1
        elif len(k_neg) == 2:                             # for two lanes on the left
            which_lane = np.where(ks == k_neg[1])[0][0]
            draw(label, lines[which_lane], 1)
            which_lane = np.where(ks == k_neg[0])[0][0]
            draw(label, lines[which_lane], 2)
            bin_label[0] = 1
            bin_label[1] = 1
        elif len(k_neg) > 2:                              # for more than two lanes on the left,
            which_lane = np.where(ks == k_neg[1])[0][0]   # we only choose the two lanes closest to the center
            draw(label, lines[which_lane], 1)
            which_lane = np.where(ks == k_neg[0])[0][0]
            draw(label, lines[which_lane], 2)
            bin_label[0] = 1
            bin_label[1] = 1

        if len(k_pos) == 1:                               # for the lanes on the right, the same logic is adopted
            which_lane = np.where(ks == k_pos[0])[0][0]
            draw(label, lines[which_lane], 3)
            bin_label[2] = 1
        elif len(k_pos) == 2:
            which_lane = np.where(ks == k_pos[1])[0][0]
            draw(label, lines[which_lane], 3)
            which_lane = np.where(ks == k_pos[0])[0][0]
            draw(label, lines[which_lane], 4)
            bin_label[2] = 1
            bin_label[3] = 1
        elif len(k_pos) > 2:
            which_lane = np.where(ks == k_pos[-1])[0][0]
            draw(label, lines[which_lane], 3)
            which_lane = np.where(ks == k_pos[-2])[0][0]
            draw(label, lines[which_lane], 4)
            bin_label[2] = 1
            bin_label[3] = 1

        cv2.imwrite(os.path.join(root, label_path), label)

        train_gt_fp.write(names[i] + ' ' + label_path + ' ' + ' '.join(list(map(str, bin_label))) + '\n')
    train_gt_fp.close()


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--root', required=True, help='The root of the Tusimple dataset')
    return parser


if __name__ == "__main__":
    args = get_args().parse_args()

    # training set
    names, line_txt = get_tusimple_list(args.root, ['label_data_0601.json', 'label_data_0531.json', 'label_data_0313.json'])
    # generate segmentation and training list for training
    generate_segmentation_and_train_list(args.root, line_txt, names)  # 3268+358=3626

    # testing set
    names, line_txt = get_tusimple_list(args.root, ['test_tasks_0627.json'])
    # generate testing list for testing
    with open(os.path.join(args.root, 'test.txt'), 'w') as fp:  # 2782
        for name in names:
            fp.write(name + '\n')
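For reference, each line that generate_segmentation_and_train_list writes to train_gt.txt has the form image_path label_path b1 b2 b3 b4, where the four binary flags mark the existence of the two left and two right lanes. A line would look like this (the clip path here is made up for illustration):

clips/0313-1/6040/20.jpg clips/0313-1/6040/20.png 1 1 1 1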
If you just want to train a model or run the demo, this tool is not necessary and you can skip this step. If you want to obtain evaluation results on CULane, you should install this tool.
This tool requires OpenCV C++. Please follow the instructions here to install OpenCV C++. When you build OpenCV, remove anaconda's paths from PATH, otherwise the build will fail.
#First you need to install OpenCV C++.
#After installation, make a soft link of OpenCV include path.
ln -s /usr/local/include/opencv4/opencv2 /usr/local/include/opencv2
We provide three pipelines to compile the evaluation tool of CULane.
Option 1:
cd evaluation/culane
make
Option 2:
cd evaluation/culane
mkdir build && cd build
cmake ..
make
mv culane_evaluator ../evaluate
Option 3: (for Windows users)
mkdir build-vs2017
cd build-vs2017
cmake .. -G "Visual Studio 15 2017 Win64"
cmake --build . --config Release
# or, open the "xxx.sln" file with Visual Studio and click the build button
move culane_evaluator ../evaluate
Note: following the practice in the RESA paper, it seems the OpenCV C++ installation step above can be replaced by the following command (to be verified):
sudo apt-get install libopencv-dev
First, modify data_root and log_path in configs/culane.py or configs/tusimple.py according to your environment.
data_root is the path of your CULane or Tusimple dataset. log_path is where tensorboard logs, trained models, and code backups are stored; it should be placed outside this project. (A minimal sketch of the config fields follows the training command below.) For a single GPU:
python train.py configs/path_to_your_config
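A config file is just a plain Python module. The sketch below lists the fields that the training script reads; the values are illustrative only, so check the shipped configs/tusimple.py for the authoritative defaults (optimizer and learning_rate are my assumption of what utils/factory.py consumes):

# illustrative config sketch, not the shipped configs/tusimple.py
dataset = 'Tusimple'
data_root = '/path/to/TUSIMPLEROOT'       # path of your dataset
log_path = '/path/outside/the/project'    # tensorboard logs, models and code backups go here

epoch = 100
batch_size = 32
optimizer = 'Adam'                        # assumption: consumed by get_optimizer
learning_rate = 4e-4                      # assumption: consumed by get_optimizer

backbone = '18'                           # must be one of the backbones asserted in train.py
griding_num = 100                         # number of grid cells per row anchor
num_lanes = 4
use_aux = True                            # auxiliary segmentation branch, training only

finetune = None                           # path to a model whose backbone weights will be loaded
resume = None                             # path to a checkpoint to resume from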
The train.py file is shown below:
import torch, os, datetime
import numpy as np

from model.model import parsingNet
from data.dataloader import get_train_loader

from utils.dist_utils import dist_print, dist_tqdm, is_main_process, DistSummaryWriter
from utils.factory import get_metric_dict, get_loss_dict, get_optimizer, get_scheduler
from utils.metrics import MultiLabelAcc, AccTopk, Metric_mIoU, update_metrics, reset_metrics
from utils.common import merge_config, save_model, cp_projects
from utils.common import get_work_dir, get_logger

import time


def inference(net, data_label, use_aux):
    if use_aux:
        img, cls_label, seg_label = data_label
        img, cls_label, seg_label = img.cuda(), cls_label.long().cuda(), seg_label.long().cuda()
        cls_out, seg_out = net(img)
        return {'cls_out': cls_out, 'cls_label': cls_label, 'seg_out': seg_out, 'seg_label': seg_label}
    else:
        img, cls_label = data_label
        img, cls_label = img.cuda(), cls_label.long().cuda()
        cls_out = net(img)
        return {'cls_out': cls_out, 'cls_label': cls_label}


def resolve_val_data(results, use_aux):
    results['cls_out'] = torch.argmax(results['cls_out'], dim=1)
    if use_aux:
        results['seg_out'] = torch.argmax(results['seg_out'], dim=1)
    return results


def calc_loss(loss_dict, results, logger, global_step):
    loss = 0

    for i in range(len(loss_dict['name'])):
        data_src = loss_dict['data_src'][i]
        datas = [results[src] for src in data_src]
        loss_cur = loss_dict['op'][i](*datas)

        if global_step % 20 == 0:
            logger.add_scalar('loss/' + loss_dict['name'][i], loss_cur, global_step)

        loss += loss_cur * loss_dict['weight'][i]
    return loss


def train(net, data_loader, loss_dict, optimizer, scheduler, logger, epoch, metric_dict, use_aux):
    net.train()
    progress_bar = dist_tqdm(data_loader)  # iterate over the loader passed in (the original used the global train_loader)
    t_data_0 = time.time()
    for b_idx, data_label in enumerate(progress_bar):
        t_data_1 = time.time()
        reset_metrics(metric_dict)
        global_step = epoch * len(data_loader) + b_idx

        t_net_0 = time.time()
        results = inference(net, data_label, use_aux)

        loss = calc_loss(loss_dict, results, logger, global_step)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step(global_step)
        t_net_1 = time.time()

        results = resolve_val_data(results, use_aux)

        update_metrics(metric_dict, results)
        if global_step % 20 == 0:
            for me_name, me_op in zip(metric_dict['name'], metric_dict['op']):
                logger.add_scalar('metric/' + me_name, me_op.get(), global_step=global_step)
        logger.add_scalar('meta/lr', optimizer.param_groups[0]['lr'], global_step=global_step)

        if hasattr(progress_bar, 'set_postfix'):
            kwargs = {me_name: '%.3f' % me_op.get() for me_name, me_op in zip(metric_dict['name'], metric_dict['op'])}
            progress_bar.set_postfix(loss='%.3f' % float(loss),
                                     data_time='%.3f' % float(t_data_1 - t_data_0),
                                     net_time='%.3f' % float(t_net_1 - t_net_0),
                                     **kwargs)
        t_data_0 = time.time()


if __name__ == "__main__":
    torch.backends.cudnn.benchmark = True

    args, cfg = merge_config()

    work_dir = get_work_dir(cfg)

    distributed = False
    if 'WORLD_SIZE' in os.environ:
        distributed = int(os.environ['WORLD_SIZE']) > 1

    if distributed:
        torch.cuda.set_device(args.local_rank)
        torch.distributed.init_process_group(backend='nccl', init_method='env://')
    dist_print(datetime.datetime.now().strftime('[%Y/%m/%d %H:%M:%S]') + ' start training...')
    dist_print(cfg)
    assert cfg.backbone in ['18', '34', '50', '101', '152', '50next', '101next', '50wide', '101wide']

    train_loader, cls_num_per_lane = get_train_loader(cfg.batch_size, cfg.data_root, cfg.griding_num, cfg.dataset, cfg.use_aux, distributed, cfg.num_lanes)

    net = parsingNet(pretrained=True, backbone=cfg.backbone, cls_dim=(cfg.griding_num + 1, cls_num_per_lane, cfg.num_lanes), use_aux=cfg.use_aux).cuda()

    if distributed:
        net = torch.nn.parallel.DistributedDataParallel(net, device_ids=[args.local_rank])
    optimizer = get_optimizer(net, cfg)

    if cfg.finetune is not None:
        dist_print('finetune from ', cfg.finetune)
        state_all = torch.load(cfg.finetune)['model']
        state_clip = {}  # only use backbone parameters
        for k, v in state_all.items():
            if 'model' in k:
                state_clip[k] = v
        net.load_state_dict(state_clip, strict=False)

    if cfg.resume is not None:
        dist_print('==> Resume model from ' + cfg.resume)
        resume_dict = torch.load(cfg.resume, map_location='cpu')
        net.load_state_dict(resume_dict['model'])
        if 'optimizer' in resume_dict.keys():
            optimizer.load_state_dict(resume_dict['optimizer'])
        resume_epoch = int(os.path.split(cfg.resume)[1][2:5]) + 1
    else:
        resume_epoch = 0

    scheduler = get_scheduler(optimizer, cfg, len(train_loader))
    dist_print(len(train_loader))
    metric_dict = get_metric_dict(cfg)
    loss_dict = get_loss_dict(cfg)
    logger = get_logger(work_dir, cfg)
    cp_projects(args.auto_backup, work_dir)

    for epoch in range(resume_epoch, cfg.epoch):
        train(net, train_loader, loss_dict, optimizer, scheduler, logger, epoch, metric_dict, cfg.use_aux)
        save_model(net, optimizer, epoch, work_dir, distributed)
    logger.close()
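Note how calc_loss is driven entirely by loss_dict: a dictionary of parallel lists. Schematically it looks like the snippet below (inferred from the code above; the real dictionary is assembled by get_loss_dict in utils/factory.py and, per the paper, also contains the structural loss terms):

import torch

# schematic structure inferred from calc_loss(); the real dict is built by get_loss_dict()
loss_dict = {
    'name':     ['cls_loss'],                    # used as the tensorboard tag
    'op':       [torch.nn.CrossEntropyLoss()],   # callable loss modules
    'weight':   [1.0],                           # per-term weights
    'data_src': [('cls_out', 'cls_label')],      # keys looked up in the results dict
}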
The model/model.py module called by the code above is shown below:
import torch
from model.backbone import resnet
import numpy as np


class conv_bn_relu(torch.nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, bias=False):
        super(conv_bn_relu, self).__init__()
        self.conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size,
                                    stride=stride, padding=padding, dilation=dilation, bias=bias)
        self.bn = torch.nn.BatchNorm2d(out_channels)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x


class parsingNet(torch.nn.Module):
    def __init__(self, size=(288, 800), pretrained=True, backbone='50', cls_dim=(37, 10, 4), use_aux=False):
        super(parsingNet, self).__init__()

        self.size = size  # (288, 800), i.e. (input height, input width)
        self.w = size[0]
        self.h = size[1]
        self.cls_dim = cls_dim  # (num_gridding, num_cls_per_lane, num_of_lanes)
        # num_cls_per_lane is the number of row anchors
        self.use_aux = use_aux
        self.total_dim = np.prod(cls_dim)

        # input : nchw,
        # output: (w+1) * sample_rows * 4
        self.model = resnet(backbone, pretrained=pretrained)

        if self.use_aux:
            self.aux_header2 = torch.nn.Sequential(
                conv_bn_relu(128, 128, kernel_size=3, stride=1, padding=1) if backbone in ['34', '18'] else conv_bn_relu(512, 128, kernel_size=3, stride=1, padding=1),
                conv_bn_relu(128, 128, 3, padding=1),
                conv_bn_relu(128, 128, 3, padding=1),
                conv_bn_relu(128, 128, 3, padding=1),
            )
            self.aux_header3 = torch.nn.Sequential(
                conv_bn_relu(256, 128, kernel_size=3, stride=1, padding=1) if backbone in ['34', '18'] else conv_bn_relu(1024, 128, kernel_size=3, stride=1, padding=1),
                conv_bn_relu(128, 128, 3, padding=1),
                conv_bn_relu(128, 128, 3, padding=1),
            )
            self.aux_header4 = torch.nn.Sequential(
                conv_bn_relu(512, 128, kernel_size=3, stride=1, padding=1) if backbone in ['34', '18'] else conv_bn_relu(2048, 128, kernel_size=3, stride=1, padding=1),
                conv_bn_relu(128, 128, 3, padding=1),
            )
            self.aux_combine = torch.nn.Sequential(
                conv_bn_relu(384, 256, 3, padding=2, dilation=2),
                conv_bn_relu(256, 128, 3, padding=2, dilation=2),
                conv_bn_relu(128, 128, 3, padding=2, dilation=2),
                conv_bn_relu(128, 128, 3, padding=4, dilation=4),
                torch.nn.Conv2d(128, cls_dim[-1] + 1, 1)
                # output : n, num_of_lanes+1, h, w
            )
            initialize_weights(self.aux_header2, self.aux_header3, self.aux_header4, self.aux_combine)

        self.cls = torch.nn.Sequential(
            torch.nn.Linear(1800, 2048),
            torch.nn.ReLU(),
            torch.nn.Linear(2048, self.total_dim),
        )

        self.pool = torch.nn.Conv2d(512, 8, 1) if backbone in ['34', '18'] else torch.nn.Conv2d(2048, 8, 1)
        # 1/32 scale feature: 288,800 -> 9,25, with 512 (ResNet-18/34) or 2048 channels
        # output of cls: (w+1) * sample_rows * num_lanes, e.g. 37 * 10 * 4 for the default cls_dim
        initialize_weights(self.cls)

    def forward(self, x):
        # n c h w -> n 2048 sh sw
        # -> n 2048
        x2, x3, fea = self.model(x)  # x2: (32,128,36,100)  x3: (32,256,18,50)  fea: (32,512,9,25)
        if self.use_aux:
            x2 = self.aux_header2(x2)  # (32,128,36,100)
            x3 = self.aux_header3(x3)  # (32,128,18,50)
            x3 = torch.nn.functional.interpolate(x3, scale_factor=2, mode='bilinear')  # (32,128,36,100)
            x4 = self.aux_header4(fea)  # (32,128,9,25)
            x4 = torch.nn.functional.interpolate(x4, scale_factor=4, mode='bilinear')  # (32,128,36,100)
            aux_seg = torch.cat([x2, x3, x4], dim=1)  # (32,384,36,100)
            aux_seg = self.aux_combine(aux_seg)  # (32, num_lanes+1, 36, 100)
        else:
            aux_seg = None

        fea = self.pool(fea).view(-1, 1800)  # input: (32,512,9,25); after the 1x1 conv: (32,8,9,25); after view: (32,1800)

        group_cls = self.cls(fea).view(-1, *self.cls_dim)  # (32,101,56,4)

        if self.use_aux:
            return group_cls, aux_seg  # group_cls: (32,101,56,4)  aux_seg: (32, num_lanes+1, 36, 100)

        return group_cls


def initialize_weights(*models):
    for model in models:
        real_init_weights(model)


def real_init_weights(m):
    if isinstance(m, list):
        for mini_m in m:
            real_init_weights(mini_m)
    else:
        if isinstance(m, torch.nn.Conv2d):
            torch.nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
            if m.bias is not None:
                torch.nn.init.constant_(m.bias, 0)
        elif isinstance(m, torch.nn.Linear):
            m.weight.data.normal_(0.0, std=0.01)
        elif isinstance(m, torch.nn.BatchNorm2d):
            torch.nn.init.constant_(m.weight, 1)
            torch.nn.init.constant_(m.bias, 0)
        elif isinstance(m, torch.nn.Module):
            for mini_m in m.children():
                real_init_weights(mini_m)
        else:
            print('unknown module', m)
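As a quick sanity check of the shape comments above, one can run a dummy forward pass; this is a sketch assuming the Tusimple setting (griding_num=100, 56 row anchors, 4 lanes), with pretrained=False so no weights get downloaded:

import torch
from model.model import parsingNet

net = parsingNet(pretrained=False, backbone='18', cls_dim=(101, 56, 4), use_aux=True).eval()
dummy = torch.randn(2, 3, 288, 800)       # n, c, h, w
with torch.no_grad():
    group_cls, aux_seg = net(dummy)
print(group_cls.shape)   # torch.Size([2, 101, 56, 4])
print(aux_seg.shape)     # torch.Size([2, 5, 36, 100]): num_lanes+1 channels at 1/8 resolution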
For multiple GPUs:
sh launch_training.sh
or
python -m torch.distributed.launch --nproc_per_node=$NGPUS train.py configs/path_to_your_config
If you don't have the pretrained torchvision models, multi-GPU training may trigger multiple simultaneous downloads. You can manually download the corresponding model first and then restart the multi-GPU training.
Since our code has an automatic backup feature that copies all code to log_path according to the gitignore, temporary files that are not filtered out by the gitignore may also be copied, and if they are large they can block execution. Therefore, you should keep your working directory clean.
Besides the config-file style of settings, we also support command-line style, so you can override a setting like this:
python train.py configs/path_to_your_config --batch_size 8
batch_size will then be set to 8 during training.
To visualize the log with tensorboard, run:
tensorboard --logdir log_path --bind_all
We provide two trained Res-18 models, on CULane and Tusimple.
Tusimple: Google Drive / Baidu Drive
CULane: Google Drive / Baidu Drive
For evaluation, run:
mkdir tmp
# This is a bad example; you should put the temp files outside the project.
python test.py configs/culane.py --test_model path_to_culane_18.pth --test_work_dir ./tmp
python test.py configs/tusimple.py --test_model path_to_tusimple_18.pth --test_work_dir ./tmp
Multi-GPU testing is also supported.
We provide a script for visualizing the detection results. Run the following commands to visualize on the test sets of CULane and Tusimple:
python demo.py configs/culane.py --test_model path_to_culane_18.pth
# or
python demo.py configs/tusimple.py --test_model path_to_tusimple_18.pth
Since the test set of Tusimple is not sorted, the visualized video may look bad; we do not recommend doing this.
To test the runtime, run:
python speed_simple.py
# this will test the speed with a simple protocol and requires no additional dependencies
python speed_real.py
# this will test the speed with real video or camera input
It will loop 100 times and compute the average runtime and fps in your environment.
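The measurement itself is easy to reproduce. Here is a minimal sketch of such a protocol (my own, not the shipped speed_simple.py), with GPU warm-up and synchronization so the timing is meaningful; it assumes a CUDA device and the Tusimple model setting:

import time
import torch
from model.model import parsingNet

net = parsingNet(pretrained=False, backbone='18', cls_dim=(101, 56, 4), use_aux=False).cuda().eval()
x = torch.randn(1, 3, 288, 800).cuda()
with torch.no_grad():
    for _ in range(10):        # warm-up
        net(x)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(100):       # loop 100 times, matching the description above
        net(x)
    torch.cuda.synchronize()
    t1 = time.time()
avg = (t1 - t0) / 100
print('average runtime: %.2f ms, fps: %.1f' % (avg * 1000, 1.0 / avg))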
Thanks to zchrissirhcz for contributing to the compile tool of CULane, to KopiSoftware for contributing to the speed test, and to ustclbh for testing on the Windows platform.
Citation:
@InProceedings{qin2020ultra,
  author    = {Qin, Zequn and Wang, Huanyu and Li, Xi},
  title     = {Ultra Fast Structure-aware Deep Lane Detection},
  booktitle = {The European Conference on Computer Vision (ECCV)},
  year      = {2020}
}