Semi-Local 3D Lane Detection and Uncertainty Estimation

Over the past few years, autonomous driving has drawn considerable attention from both academia and industry. To drive safely, one of the fundamental problems is to perceive the lane structure accurately in real time. Robust detection of the current lane and nearby lanes is not only crucial for lateral vehicle control and accurate localization, but also a powerful tool for building and validating high-definition maps.

3D-LaneNet [1] is a 3D lane detection method with uncertainty estimation. It is based on a semi-local BEV (bird's-eye view) grid representation, which decomposes lane lines into simple lane segments. The method combines a parametric model learned per segment with a learned deep feature embedding that clusters the segments into entire lanes. This combination allows the method to generalize to complex lane topologies, curvatures, and surface geometries. In addition, it is the first learning-based approach to uncertainty estimation for the lane detection task.

The input to the network is a monocular image. The method uses the dual-pathway backbone previously proposed in [2], with an encoder and an inverse perspective mapping (IPM) module that projects the feature maps into the bird's-eye view (BEV).

3D-LaneNet network architecture. Source: [2]

The projection uses the homography defined by the camera pitch angle ϕ and height h to map the image plane to the road plane, as shown in the figure. The final BEV feature map is spatially divided into a grid G composed of W×H non-overlapping tiles. As in prior methods, the projection ensures that each pixel in the BEV feature map corresponds to a predefined road position, independent of the camera's intrinsic parameters and extrinsic pose.
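The planar homography above can be built directly from the pitch angle and mounting height. The sketch below is a minimal illustration, not the paper's code; the coordinate conventions (road plane Z = 0, X lateral, Y forward, camera above the origin) are assumptions:

```python
import numpy as np

def ground_to_image_homography(K, pitch, height):
    """Homography mapping road-plane coordinates (X lateral, Y forward,
    Z = 0) to image pixels, for a camera mounted `height` meters above
    the road and tilted down by `pitch` radians. The IPM module applies
    the inverse mapping to warp image-view features into the BEV grid."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    # World (X right, Y forward, Z up) -> camera (x right, y down, z forward).
    R0 = np.array([[1.0, 0.0,  0.0],
                   [0.0, 0.0, -1.0],
                   [0.0, 1.0,  0.0]])
    # Tilt the camera down by `pitch` about its x-axis.
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0,  cp, -sp],
                   [0.0,  sp,  cp]])
    R = Rx @ R0
    t = -R @ np.array([0.0, 0.0, height])  # camera center at (0, 0, height)
    # For the ground plane Z = 0, columns r1, r2 and t define the homography.
    return K @ np.column_stack((R[:, 0], R[:, 1], t))
```

Because the homography depends only on ϕ, h, and K, every BEV pixel corresponds to a fixed road position, which is exactly the invariance property the text describes.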

It is assumed that the lane line passing through each tile can be fitted as a line segment. Specifically, the network regresses three parameters for each tile:

  • The lateral offset distance from the tile center.
  • The line segment angle and the height offset.
  • In addition to these parameters, the network also predicts a binary classification score, which indicates the probability of a lane crossing that specific tile.
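Putting the three regressed parameters and the score together, decoding a batch of tile predictions into 3D lane points could look like the following sketch; the function name and array layout are hypothetical, chosen only to mirror the description above:

```python
import numpy as np

def decode_tiles(centers_xy, offsets, angles, z_offsets, scores, thr=0.5):
    """Decode per-tile regressions into 3D lane points.

    centers_xy : (N, 2) BEV tile centers (x lateral, y longitudinal)
    offsets    : (N,) lateral offset of the segment from the tile center
    angles     : (N,) segment angle w.r.t. the driving direction
    z_offsets  : (N,) height offset from the BEV plane
    scores     : (N,) probability that a lane crosses the tile

    Tiles with score below `thr` are discarded; the rest yield one
    3D point (and a local direction) each.
    """
    keep = scores > thr
    x = centers_xy[keep, 0] + offsets[keep]  # lateral position of the segment
    y = centers_xy[keep, 1]                  # tile-row center (longitudinal)
    z = z_offsets[keep]                      # height above the road plane
    return np.stack([x, y, z], axis=1), angles[keep]
```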

After projecting the lane lines onto the road plane, the lane segment crossing each tile is approximated as a straight line using the GT lane points; the offset and angle of that line can then be computed, and these form the GT regression targets.
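The target-generation step described above amounts to a least-squares line fit per tile. The helper below is an illustration under assumed conventions (the line is parameterized as x = a·y + b in BEV coordinates), not the paper's exact code:

```python
import numpy as np

def tile_gt_targets(pts_xy, center_xy):
    """Fit a straight line x = a*y + b to the GT lane points that fall
    inside one tile, then read off the two regression targets described
    in the text: the lateral offset at the tile-center row and the
    segment angle relative to the driving direction."""
    y, x = pts_xy[:, 1], pts_xy[:, 0]
    a, b = np.polyfit(y, x, 1)           # least-squares slope and intercept
    x_at_center = a * center_xy[1] + b   # fitted line at the tile-center row
    offset = x_at_center - center_xy[0]  # lateral offset from the tile center
    angle = np.arctan(a)                 # segment angle
    return offset, angle
```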

The road projection plane is defined according to the camera mounting pitch angle ϕ_{cam} and height h_{cam}, hence our representation is invariant to the camera extrinsics. We represent the GT lanes in full 3D relative to that plane. Source: [1]

Assume that the lane segments passing through each tile are simple and can be represented by a low-dimensional parametric model. Specifically, each tile contains one line segment whose parameters are the offset from the tile center, the direction, and the height offset from the bird's-eye-view plane. This semi-local grid representation sits on a continuum between a global representation (the entire lane) and a local one (the pixel level). In this segment-based solution, each tile's output is more informative than a single pixel, since it infers the local lane structure, yet it is not as constrained as a global solution, which must capture the topology, curvature, and surface geometry of the entire lane.
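The division of the BEV feature map into non-overlapping tiles is a plain reshape. The sketch below assumes a (C, H, W) feature map and tile sizes that divide the map evenly:

```python
import numpy as np

def split_into_tiles(bev, tile_h, tile_w):
    """Split a BEV feature map of shape (C, H, W) into non-overlapping
    tiles, returning shape (H//tile_h, W//tile_w, C, tile_h, tile_w).
    Each tile is the unit on which the segment parameters are regressed."""
    C, H, W = bev.shape
    gh, gw = H // tile_h, W // tile_w
    t = bev[:, :gh * tile_h, :gw * tile_w]        # crop any remainder
    t = t.reshape(C, gh, tile_h, gw, tile_w)      # factor H and W into tiles
    return t.transpose(1, 3, 0, 2, 4)             # (row, col, C, tile_h, tile_w)
```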

This representation subdivides a lane curve into multiple lane segments, but does not explicitly capture any relationship between them. Adjacent tiles have overlapping receptive fields and therefore produce correlated outputs, but the fact that multiple tiles represent the same lane is not modeled. To generate a complete lane curve, the paper learns an embedding for each tile that is globally consistent along the lane; in this way, the small segments can be clustered into a complete curve.
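To make the clustering step concrete, the sketch below greedily groups tile embeddings whose distance to a cluster mean is small. This is a simplified stand-in (instance-embedding methods typically use mean-shift), not the paper's procedure; tiles assigned the same label are treated as one lane:

```python
import numpy as np

def cluster_embeddings(emb, dist_thr=0.5):
    """Greedy clustering of per-tile embedding vectors: assign each
    embedding to the nearest existing cluster mean if it is within
    `dist_thr`, otherwise start a new cluster. Returns integer labels."""
    labels = -np.ones(len(emb), dtype=int)
    means = []
    for i, e in enumerate(emb):
        if means:
            d = np.linalg.norm(np.array(means) - e, axis=1)
            j = int(np.argmin(d))
            if d[j] < dist_thr:
                labels[i] = j
                n = np.sum(labels == j)
                means[j] = means[j] + (e - means[j]) / n  # running mean update
                continue
        labels[i] = len(means)          # open a new cluster for this tile
        means.append(e.astype(float))
    return labels
```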

Method overview. The network comprises two processing pipelines: image view (top) and bird's-eye view (bottom). The image-view encoder is composed of ResNet blocks, each one multiplying the number of channels. The BEV backbone is comprised of projected image-view feature maps, each concatenated with the convolved projected feature map from the previous block. The final decimated BEV feature map is the input to the lane prediction head, which outputs local lane segments, a global embedding for clustering the segments into entire lanes, and lane-point position uncertainty, which relies both on the local tiles and on the entire lane curves. Source: [1]

In addition, uncertainty estimation is achieved by modeling the network output as a Gaussian distribution and estimating its mean and variance. The method operates on the parameters of each lane segment and combines them to produce the final covariance matrix for the points of each lane. Unlike the segment parameters, which are learned locally per tile, the empirical error needed to train the uncertainty depends on all the tiles that make up an entire lane, and thus performs global inference.
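Training a variance head of this kind is commonly done with a Gaussian negative log-likelihood; the formulation below (predicting log-variance for numerical stability) is the generic one and is an assumption, not the paper's exact loss:

```python
import numpy as np

def gaussian_nll(pred_mean, pred_log_var, target):
    """Per-point Gaussian negative log-likelihood (constant term dropped).
    Minimizing it pushes the predicted variance toward the empirical
    squared error of the lane-point predictions, so confident-but-wrong
    outputs are penalized more than uncertain ones."""
    var = np.exp(pred_log_var)
    return np.mean(0.5 * (pred_log_var + (target - pred_mean) ** 2 / var))
```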

Conclusion

The efficacy of the method is demonstrated in extensive experiments, achieving state-of-the-art results for camera-based 3D lane detection while also showing the ability to generalize to complex topologies, curvatures, and road geometries, as well as to different cameras.

Translated from: https://medium.com/@nabil.madali/semi-local-3d-lane-detection-and-uncertainty-estimation-aec4e6768afa
