3D-LaneNet+: Anchor Free Lane Detection using a Semi-Local Representation


An improvement on 3D-LaneNet. Key feature: a semi-local tile representation that breaks lanes down into simple lane segments whose parameters can be learnt.


3D-LaneNet introduced two ideas. The first is a CNN architecture with an integrated Inverse Perspective Mapping (IPM) that projects feature maps to Bird's Eye View (BEV);

the second is an anchor-based representation, which allows casting lane detection as a single-stage object detection problem.

Lanes are not compact objects with easily defined centers. Therefore, instead of predicting the entire lane as a whole, we detect the small lane segments that lie within each cell, together with their attributes.

In addition, the network learns for each cell a global embedding that allows clustering the small lane segments together into full 3D lanes.


Towards End-to-End Lane Detection: an Instance Segmentation Approach

A Mixed Classification-Regression Framework for 3D Pose Estimation from 2D Images
【CC】Reference for the mixed classification-regression framework.


each tile holds a line segment parameterized by an offset from the tile center, an orientation and a height offset from the BEV plane

Learning 3D lane segments with Semi-local tile representation

An Inverse Perspective Mapping (IPM) module projects feature maps to BEV. The projection applies a homography, defined by the camera pitch angle ϕ_cam and height h_cam, that maps the image plane to the road plane.
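
As a rough illustration of how such a homography can be built from pitch and height alone, here is a minimal NumPy sketch. The coordinate conventions (road frame: x right, y forward, z up; camera frame: x right, y down, z along the optical axis), the intrinsic matrix K, and both function names are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def road_to_image_homography(K, pitch, height):
    """Homography from road-plane points (x right, y forward, z = 0) to pixels.

    Assumed setup: camera `height` metres above the road, pitched down by
    `pitch` radians, no roll or yaw.
    """
    s, c = np.sin(pitch), np.cos(pitch)
    # Rows are the camera axes (x right, y down, z optical) in road coordinates.
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, -s, -c],
                  [0.0, c, -s]])
    cam_center = np.array([0.0, 0.0, height])
    t = -R @ cam_center
    # For points with z = 0 the third column of R drops out of the projection.
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

def apply_homography(H, pts):
    """Apply a 3x3 homography to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.column_stack((pts, np.ones(len(pts))))
    out = homog @ H.T
    return out[:, :2] / out[:, 2:3]

# Hypothetical intrinsics and mounting parameters, for illustration only.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
H = road_to_image_homography(K, pitch=np.deg2rad(5.0), height=1.5)

road_pts = np.array([[0.0, 10.0], [-1.8, 20.0], [1.8, 40.0]])
pixels = apply_homography(H, road_pts)
# The IPM direction (image -> road plane) is the inverse homography.
back_on_road = apply_homography(np.linalg.inv(H), pixels)
```

In the network the same mapping would be used to resample feature maps rather than individual points; the sketch only shows the geometry.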

We assume that a single lane segment can pass through each tile g_ij ∈ G_{W×H}, and that this segment can be approximated by a straight line.

The network also predicts a binary classification score c_ij indicating the probability that a lane intersects a particular tile.

The network regresses, for each tile g_ij, three parameters: the lateral offset distance relative to the tile center r_ij, the line angle φ_ij (see "Local tiles" in Fig. 1), and the height offset ∆z_ij.

Position and z offsets are trained using an L1 loss:
【CC】The lateral offset and the height offset are both trained with an L1 loss.
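
The L1 loss equation itself is not copied into these notes. Assuming hats denote network predictions and plain symbols the ground truth (a notation choice made here), a form consistent with the sentence above would be:

```latex
\mathcal{L}^{ij}_{\text{offsets}}
  = \bigl\lVert \hat{r}_{ij} - r_{ij} \bigr\rVert_1
  + \bigl\lVert \Delta\hat{z}_{ij} - \Delta z_{ij} \bigr\rVert_1
```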

Predicting the line angle φij:
We classify the angle φ (omitting tile indexing for brevity) into one of N_α bins, centered at α = {2π/N_α · i}. In addition, we regress a vector ∆α, corresponding to the residual offset relative to each bin center.

Angle bin estimation is optimized using a soft multi-label objective, and the GT probabilities are calculated as:
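
The GT probability formula is missing from these notes. The sketch below shows one plausible soft-label scheme, assuming each bin's probability decays linearly with the angular distance between its center and φ and reaches zero one bin width away, so that only the two neighbouring bins are active. The circular wrap-around, the bin indexing from 0, and the function names are assumptions of this example.

```python
import numpy as np

def soft_bin_targets(phi, n_bins):
    """Soft GT bin probabilities and per-bin residuals for an angle phi (radians)."""
    width = 2.0 * np.pi / n_bins
    centers = width * np.arange(n_bins)          # bin centers 2*pi/N * i
    # Signed circular difference phi - center, wrapped into [-pi, pi).
    residuals = (phi - centers + np.pi) % (2.0 * np.pi) - np.pi
    # Linear decay with distance, clipped at zero (assumed soft-label form).
    probs = np.clip(1.0 - np.abs(residuals) / width, 0.0, None)
    return centers, probs, residuals

def decode_angle(centers, probs, residuals):
    """Recover the angle from the most likely bin plus its regressed residual."""
    k = int(np.argmax(probs))
    return (centers[k] + residuals[k]) % (2.0 * np.pi)

centers, probs, residuals = soft_bin_targets(phi=np.deg2rad(100.0), n_bins=8)
phi_decoded = decode_angle(centers, probs, residuals)
```

With 8 bins (45° wide), an angle of 100° activates only the 90° and 135° bins, and decoding from the 90° bin with its +10° residual returns the original angle.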

The angle loss is the sum of the classification and offset regression losses:
where δ^α_ij is the indicator function masking the relevant bins for the offset learning.
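
The angle loss equation is also missing here. A form consistent with the description (a per-bin binary cross-entropy for the soft labels plus a masked L1 term for the residuals) might look as follows; the hat notation for predictions and the bracket notation for the bin-α component of ∆α are assumptions of this sketch.

```latex
\mathcal{L}^{ij}_{\text{angle}}
  = \sum_{\alpha} \Bigl[
      -\,p^{\alpha}_{ij} \log \hat{p}^{\alpha}_{ij}
      - \bigl(1 - p^{\alpha}_{ij}\bigr) \log\bigl(1 - \hat{p}^{\alpha}_{ij}\bigr)
      + \delta^{\alpha}_{ij}\,
        \bigl\lvert [\Delta\hat{\alpha}_{ij}]_{\alpha} - [\Delta\alpha_{ij}]_{\alpha} \bigr\rvert
    \Bigr]
```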

The lane tile probability c_ij is trained using a binary cross-entropy loss:
The overall tile loss is the sum over all the tiles in the BEV grid.
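
Neither equation survived in these notes. The score loss is the standard binary cross-entropy; for the total, the sketch below assumes that the angle loss and the L1 offset loss described above are only counted on tiles that actually contain a lane (masked by the GT c_ij). That masking is an assumption of this note.

```latex
\mathcal{L}^{ij}_{\text{score}}
  = -\bigl( c_{ij} \log \hat{c}_{ij} + (1 - c_{ij}) \log(1 - \hat{c}_{ij}) \bigr)

\mathcal{L}_{\text{tiles}}
  = \sum_{i,j} \bigl(
      \mathcal{L}^{ij}_{\text{score}}
      + c_{ij}\,\mathcal{L}^{ij}_{\text{angle}}
      + c_{ij}\,\mathcal{L}^{ij}_{\text{offsets}}
    \bigr)
```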
【CC】The two loss functions above are standard and need no further comment.

Global embedding for lane curve clustering

We learn an embedding vector f_ij for each tile such that vectors representing tiles belonging to the same lane reside close together in the embedded space, while vectors representing tiles of different lanes reside far apart.

The discriminative push-pull loss is a combination of two losses:
A pull loss aimed at pulling the embeddings of the same lane tiles closer together:
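
The pull loss equation is missing from these notes. Using the symbols defined further down (C, N_c, δ^c_ij, µ_c, ∆pull) and the usual hinged form of the discriminative loss, it would presumably read:

```latex
\mathcal{L}_{\text{pull}}
  = \frac{1}{C} \sum_{c=1}^{C} \frac{1}{N_c} \sum_{i,j}
    \delta^{c}_{ij}\,
    \bigl[ \lVert \mu_c - f_{ij} \rVert - \Delta_{\text{pull}} \bigr]_{+}^{2}
```

Here [x]_+ = max(x, 0), so tiles already within ∆pull of their lane mean contribute nothing.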

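【CC】This is the loss for tiles belonging to the same lane: the L2 norm of each embedding's deviation from the lane mean, offset by the maximal intra-cluster distance ∆pull. Open questions: how is a variable C handled, and how is ∆pull chosen?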

a push loss aimed at pushing the embedding of tiles belonging to different lanes farther apart:
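
The push loss equation is missing from these notes as well. With the same symbols and the same hinge convention [x]_+ = max(x, 0), a form consistent with the description would be:

```latex
\mathcal{L}_{\text{push}}
  = \frac{1}{C(C-1)}
    \sum_{c_A = 1}^{C} \sum_{\substack{c_B = 1 \\ c_B \neq c_A}}^{C}
    \bigl[ \Delta_{\text{push}} - \lVert \mu_{c_A} - \mu_{c_B} \rVert \bigr]_{+}^{2}
```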
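【CC】This is the loss across different lanes: a pairwise term averaged over all pairs of lanes, where each term is the gap between the minimum required distance ∆push and the L2 distance between the mean embeddings of lane A and lane B. The design mirrors the pull loss. Again, how is ∆push chosen?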
In both losses, C is the number of lanes (it can vary),
N_c is the number of tiles belonging to lane c,
δ^c_ij indicates whether tile (i, j) belongs to lane c,
µ_c is the average of the f_ij belonging to lane c,
∆pull constrains the maximal intra-cluster distance, and
∆push is the minimal required inter-cluster distance.
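
To make the roles of these symbols concrete, here is a minimal NumPy sketch of a hinged push-pull loss. It assumes the usual discriminative-loss form (squared hinge on the distance to the lane mean for pull, squared hinge on the distance between lane means for push); the function name, the label encoding (0 = no lane, 1..C = lane index) and the example values are hypothetical.

```python
import numpy as np

def push_pull_loss(f, lane_id, delta_pull, delta_push):
    """f: (H, W, D) tile embeddings; lane_id: (H, W) ints, 0 = no lane."""
    lanes = [c for c in np.unique(lane_id) if c > 0]   # C may differ per image
    means, pull = [], 0.0
    for c in lanes:
        f_c = f[lane_id == c]                          # (N_c, D) tiles of lane c
        mu_c = f_c.mean(axis=0)
        dist = np.linalg.norm(f_c - mu_c, axis=1)
        # Only tiles farther than delta_pull from the mean are penalised.
        pull += np.mean(np.maximum(dist - delta_pull, 0.0) ** 2)
        means.append(mu_c)
    pull /= max(len(lanes), 1)

    push, n_pairs = 0.0, 0
    for a in range(len(means)):
        for b in range(len(means)):
            if a != b:
                gap = np.linalg.norm(means[a] - means[b])
                # Only lane pairs closer than delta_push are penalised.
                push += max(delta_push - gap, 0.0) ** 2
                n_pairs += 1
    push /= max(n_pairs, 1)
    return pull, push

lane_id = np.array([[1, 1], [2, 2]])
f_good = np.array([[[0.0, 0.0], [0.2, 0.0]],
                   [[5.0, 0.0], [5.0, 0.2]]])
pull_good, push_good = push_pull_loss(f_good, lane_id, delta_pull=0.5, delta_push=3.0)

# Collapsed embeddings: every tile maps to the same vector.
pull_bad, push_bad = push_pull_loss(np.zeros((2, 2, 2)), lane_id, 0.5, 3.0)
```

Well-separated, tight clusters give zero for both terms, while a collapsed embedding is penalised only by the push term. Because the lanes are read from the labels of each image, a varying C needs no special handling in this sketch.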

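【CC】On this loss: minimizing the total loss means, first, minimizing the pull loss, so tile embeddings f that lie farther than the threshold ∆pull from µ_c are drawn closer to µ_c. In other words, we want the network to extract tile features whose spread around the mean of their own lane is as small as possible. Second, minimizing the push loss, after flipping its sign, amounts to making the distance between the means of lanes A and B, minus the threshold, as large as possible. In other words, we want tile features whose lane means are separated by an L2 distance beyond the threshold ∆push.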

We adopted the clustering methodology of Neven et al. [16], which uses mean-shift to find the cluster centers and sets a threshold around each center to collect the cluster members. We set the threshold to ∆push/2.
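
The procedure can be sketched as follows. This is a simplified, self-contained variant (flat-kernel mean-shift started from the first unassigned embedding, with the kernel radius set equal to the membership threshold); it is not the implementation of Neven et al., and the function name and example values are hypothetical.

```python
import numpy as np

def cluster_embeddings(emb, threshold, n_iter=20):
    """emb: (N, D) tile embeddings. Returns (N,) integer cluster labels from 0."""
    labels = np.full(len(emb), -1)
    next_label = 0
    while np.any(labels < 0):
        free = emb[labels < 0]
        center = free[0]                     # seed from an unassigned embedding
        for _ in range(n_iter):
            # Mean-shift step: move to the mean of the neighbours in range.
            near = free[np.linalg.norm(free - center, axis=1) < threshold]
            new_center = near.mean(axis=0)
            if np.allclose(new_center, center):
                break
            center = new_center
        # Every unassigned embedding within the threshold joins this cluster.
        member = (labels < 0) & (np.linalg.norm(emb - center, axis=1) < threshold)
        labels[member] = next_label
        next_label += 1
    return labels

delta_push = 3.0
emb = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 0.0], [5.0, 0.2], [0.1, 0.1]])
labels = cluster_embeddings(emb, threshold=delta_push / 2)
```

Choosing ∆push/2 as the radius matches the training objective: lane means are pushed at least ∆push apart, so balls of radius ∆push/2 around two converged centers do not overlap.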
