MMW Object Detection Paper | Radar Transformer: An Object Classification Network Based on 4D MMW Imaging Radar
Jie Bai, Lianqing Zheng, Sen Li, Bin Tan, Sihan Chen and Libo Huang
Tongji University
Sensors
Original paper: https://www.mdpi.com/1424-8220/21/11/3854
This post is a reading note on the MMW object detection paper Radar Transformer: An Object Classification Network Based on 4D MMW Imaging Radar,
originally published on R.X. NLOS's blog.
The note may contain errors; corrections are welcome at [email protected].
The content is updated synchronously on CSDN, Zhihu, and the WeChat Official Account.
Abstract
- Importance of millimeter-wave (MMW) 4D radar
    - Essential in autonomous vehicles
    - Because of its robustness in all weather conditions
- But traditional automotive radar has low resolution
    - Hard to use for object classification tasks
- Hence, 4D imaging radar
    - High azimuth and elevation resolution + includes Doppler information
    - Can produce high-quality 3D point clouds + speed
- Work in this paper
    - Proposes the Radar Transformer
    - Core: attention mechanism
    - Includes vector attention and scalar attention
    - To make full use of the spatial, Doppler, and intensity information of the point cloud and achieve deep fusion
- Experimental results:
    - Collected a dataset and completed its annotation
    - Recognition accuracy of 94.9%
    - The proposed method is well suited to radar point-cloud recognition tasks
Introduction
p1: Why study object detection on MMW point clouds: autonomous driving matters + the 4D MMW sensor has advantages + existing research is scarce
- In recent years, autonomous driving has developed rapidly
- Autonomous vehicles comprise several modules:
    - Perception (significant)
    - (Prediction), path planning
    - Decision and control
- 4D MMW radar matters for perception:
    - Cameras and LiDAR are not robust to varying weather and lighting
    - Traditional MMW radar has low resolution and lacks elevation information, so it only serves as a final warning
    - 4D MMW radar can form point clouds, contains Doppler information, and is robust to weather, but the related algorithms are still in the initial stage
Articles introducing 4D MMW radar [5-7]:
[5]. Brisken, S.; Ruf, F.; Höhne, F. Recent evolution of automotive imaging radar and its information content. IET Radar Sonar Navig. 2018.
[6]. Li, G.; Sit, Y.L.; Manchala, S.; Kettner, T.; Ossowska, A.; Krupinski, K.; Sturm, C.; Lubbert, U. Novel 4D 79 GHz Radar Concept for Object Detection and Active Safety Applications. In Proceedings of the 2019 12th German Microwave Conference (GeMiC), Stuttgart, Germany, 2019.
[7]. Stolz, M.; Wolf, M.; Meinl, F.; Kunert, M.; Menzel, W. A New Antenna Array and Signal Processing Concept for an Automotive 4D Radar. In Proceedings of the 2018 15th European Radar Conference (EuRAD), Madrid, Spain, 2018.
p2: Related works on 4D imaging radar (hardware + corresponding detection algorithms)
- Imaging radar hardware:
    - [6]: a 4D radar operating at 79 GHz, using FMCW with a bandwidth of 1.6 GHz
        - Uses MIMO and BPSK transmit signals to obtain elevation information
        - Can handle simple detection tasks (e.g., road-edge height estimation)
    - [7]: used a new antenna array that can measure angles in azimuth and elevation
        - Combines them to estimate the direction of arrival
    - [8]: exploited high-resolution MMW radar to obtain a radar point-cloud representation
        - Then used a GMM for point-cloud segmentation (traffic scenes)
- Imaging radar hardware --> algorithms
    - [9]: used a planar phased FMCW radar to generate 3D point clouds for detecting human motions
        - Obtained the 3D point cloud by calculating the direction of arrival
        - CNN for classification: accuracy 80%
    - [10]: built a dataset including radar, LiDAR, and camera
        - Radar: Astyx 6455 HiRes [5], a high-resolution imaging radar
        - Most of the objects are cars --> hard to apply
p3: Related works on point-cloud object detection
- Deep learning has made impressive achievements
    - Including on data structures like point clouds
- Point-cloud properties: permutation and orientation invariance
    - Traditional CNNs are not suited to such irregularly structured data
- MVCNN: from different views
- 3DMV: integrates RGB + geometric features
- Voxel-based methods: VoxNet; 3DCNN
- GCN: DGCNN (EdgeConv)
- Point-wise networks: the PointNet series
    - Often hierarchically extract and combine features
p4: Related works on Transformers and their applications to point-cloud object detection
- Transformers have dominated NLP:
    - BERT, Transformer-XL, BioBERT, etc.
- Transformers have also been extended to CV
- The core of the Transformer is the self-attention module
    - This mechanism is well suited to dealing with data like point clouds
- PCT [31] applies the Transformer to point clouds and achieves good results
p5: Introduces the network proposed in this paper
- Transformer architecture
- Uses attention mechanisms to fuse local and global features at multiple levels
- Performs object classification on MMW radar point clouds
- Achieved the highest accuracy
p6: Point-by-point summary of contributions
- Generated an MMW imaging radar classification dataset
    - Collected dynamic and static road participants
        - Persons, cyclists, motorcyclists, cars and buses
    - Manually annotated them
    - A total of 10,000 frames of data
        - Each frame containing spatial (XYZ) and Doppler velocity (V) information
- Proposed a radar point-cloud classification network based on the Transformer
    - The input is 5D point-cloud information -> deep features are obtained after embedding, hierarchical feature extraction, multilevel deep fusion, and scalar + vector attention
- Experiments show that the proposed network exhibits SOTA performance
p7: Organization of the remainder
- Section 2: describes the network
- Section 3: experiments
- Section 4: discussion
- Section 5: conclusion
Methodology
- This note does not go into the detailed network structure
- The overall network structure is roughly as shown in the figure on the right (from the paper)
- Input: radar point cloud
- Output: classification result (a sketch of the two attention flavors used by the network follows below)
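Since the note skips the detailed layers, here is a minimal PyTorch sketch contrasting the two attention flavors the paper names (scalar vs. vector attention). The class names, shapes, and the subtraction-based relation are assumptions in the style of Point Transformer, not the paper's actual layer definitions.

```python
import torch
import torch.nn as nn

class ScalarAttention(nn.Module):
    """Standard dot-product self-attention: one scalar weight per point pair."""
    def __init__(self, c):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(c, c), nn.Linear(c, c), nn.Linear(c, c)

    def forward(self, x):  # x: (B, N, C)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v  # (B, N, C)

class VectorAttention(nn.Module):
    """Vector attention: a per-channel weight for each point pair, computed
    from the subtraction relation q_i - k_j (Point Transformer style)."""
    def __init__(self, c):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(c, c), nn.Linear(c, c), nn.Linear(c, c)
        self.weight_mlp = nn.Sequential(nn.Linear(c, c), nn.ReLU(), nn.Linear(c, c))

    def forward(self, x):  # x: (B, N, C)
        q, k, v = self.q(x), self.k(x), self.v(x)
        rel = q.unsqueeze(2) - k.unsqueeze(1)              # (B, N, N, C) pairwise relation
        attn = torch.softmax(self.weight_mlp(rel), dim=2)  # per-channel weights over keys
        return (attn * v.unsqueeze(1)).sum(dim=2)          # (B, N, C)

# Quick shape check on a toy frame of 128 points embedded to C = 32 channels
x = torch.randn(2, 128, 32)
print(ScalarAttention(32)(x).shape, VectorAttention(32)(x).shape)
```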
Results
Dataset
- Existing public datasets containing radar information:
    - Either include only 2D radar data
        - nuScenes
        - CRUW
        - Oxford Radar RobotCar Dataset
    - Or are of poor quality (too few frames + unbalanced classes)
        - Astyx Dataset
            - Only 500 frames + most of the objects are cars
- Hence this paper collected and created its own dataset:
    - Contains 10,000 frames
    - Five classes
        - Persons, cyclists, motorcyclists, cars, buses
Acquisition Setup
- Radar: TI imaging radar TIDEP-01012
    - Composed of four cascaded AWR2243 radar boards
    - The MIMO antenna built across the cascaded AWR2243 chips maximizes the number of active antennas
        - Enabling substantially improved angular resolution
Photos and radar parameters are as follows (figures omitted)
Radar Signal Processing
- The imaging radar development board was designed in a cascade of four devices.
The radar signal processing pipeline comprises the following steps:
- Step 0: Preprocessing (not shown in the figure on the previous page)
    - Antenna calibration
        - Prevents frequency, phase, and amplitude mismatch between the master device and the other three slave devices caused by differences in chip and antenna coupling
        - Calibration method:
    - Chirp configuration parameters
        - Set to those in MIMO mode
- Step 1: Read and parse the ADC data
- Step 2: Perform frequency and phase calibration
- Step 3: Pass the calibrated data through a range FFT and a Doppler FFT (see the sketch after this list)
- Step 4: Perform non-coherent integration
    - Since there are multiple channels
- Step 5: Run the constant false-alarm rate (CFAR) algorithm
    - To filter out noise and interference
- Step 6: Perform maximum-velocity extension and phase compensation
- Step 7: Estimate azimuth and elevation angles
    - Finally obtaining the point cloud
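A minimal NumPy sketch of Steps 3-5 above (range FFT, Doppler FFT, non-coherent integration, and a 1D cell-averaging CFAR), assuming a synthetic ADC cube; the array shapes, Hanning window, and CFAR parameters (`guard`, `train`, `scale`) are illustrative assumptions, not the actual TI processing chain.

```python
import numpy as np

# Synthetic ADC cube: (channels, chirps, samples per chirp) -- shapes are assumed
adc = np.random.randn(16, 64, 256) + 1j * np.random.randn(16, 64, 256)

# Step 3: range FFT over fast time (samples), Doppler FFT over slow time (chirps)
range_fft = np.fft.fft(adc * np.hanning(256), axis=2)
rd_cube = np.fft.fftshift(np.fft.fft(range_fft, axis=1), axes=1)

# Step 4: non-coherent integration across the receive channels
rd_map = np.sum(np.abs(rd_cube) ** 2, axis=0)  # (chirps, samples)

# Step 5: 1D cell-averaging CFAR along the range axis
def ca_cfar(row, guard=2, train=8, scale=4.0):
    """Return indices whose power exceeds scale * the local noise estimate."""
    detections = []
    for i in range(train + guard, len(row) - train - guard):
        noise_cells = np.r_[row[i - train - guard:i - guard],
                            row[i + guard + 1:i + guard + train + 1]]
        if row[i] > scale * noise_cells.mean():
            detections.append(i)
    return detections

hits = [(d, r) for d in range(rd_map.shape[0]) for r in ca_cfar(rd_map[d])]
print(f"{len(hits)} range-Doppler cells passed CFAR")
```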
Data Acquisition and Production
The collected data include static scenes and dynamic scenes
- Static scenes:
    - Collected data at a distance interval of 1 m and an angle interval of 45°
        - To fully represent the distribution of the object point cloud
    - Collected different types of samples for each class of objects
        - To make the object classes more representative
- Dynamic scenes:
    - Collected on campus roads and experimental sites
    - Different objects moved at different speeds and angles
Per-frame format and coordinate transformation
- The format of the reflected points:
    - $p_{i}=\left\{r_{i}, \theta_{i}, \varphi_{i}, v_{i}, s_{i}\right\}$
    - $r_{i}$: range; $\theta_{i}$: azimuth angle; $\varphi_{i}$: elevation angle; $v_{i}$: radial velocity; $s_{i}$: signal-to-noise ratio
- Coordinate transformation for subsequent analysis, visualization, and labeling:
    - From spherical coordinates to the Cartesian coordinate system:

$$\left[\begin{array}{c}x_{i} \\ y_{i} \\ z_{i}\end{array}\right]=r_{i}\left[\begin{array}{c}\cos \left(\theta_{i}\right) \cos \left(\varphi_{i}\right) \\ \sin \left(\theta_{i}\right) \cos \left(\varphi_{i}\right) \\ \sin \left(\varphi_{i}\right)\end{array}\right]$$
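A one-function sketch of the conversion above, following the note's angle convention (azimuth $\theta$, elevation $\varphi$); the function name is hypothetical.

```python
import numpy as np

def spherical_to_cartesian(r, theta, phi):
    """r: range (m); theta: azimuth (rad); phi: elevation (rad).
    Works on scalars or equal-length NumPy arrays; returns (..., 3) XYZ."""
    x = r * np.cos(theta) * np.cos(phi)
    y = r * np.sin(theta) * np.cos(phi)
    z = r * np.sin(phi)
    return np.stack([x, y, z], axis=-1)

# Example: a point 10 m away at 30° azimuth and 5° elevation
print(spherical_to_cartesian(10.0, np.deg2rad(30), np.deg2rad(5)))
```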
Data labeling method
- 1: Clustered the obtained point cloud of each frame
    - To get an approximate 3D bounding box (see the clustering sketch after this list)
- 2: Labeled it using the information recorded by the camera
- 3: The final dataset contained 10,000 frames (static + dynamic)
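The note does not name the clustering algorithm used in step 1, so this sketch assumes DBSCAN (via scikit-learn) with illustrative parameters, returning an axis-aligned box per cluster.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_boxes(points_xyz, eps=0.5, min_samples=5):
    """Cluster one frame of (N, 3) XYZ points and return an axis-aligned
    (min corner, max corner) bounding box for each cluster found."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_xyz)
    boxes = []
    for lbl in set(labels) - {-1}:  # label -1 marks noise points
        cluster = points_xyz[labels == lbl]
        boxes.append((cluster.min(axis=0), cluster.max(axis=0)))
    return boxes

# Example on a random frame (frames in this dataset hold ~128 points)
frame = np.random.rand(128, 3) * 10
print(cluster_boxes(frame))
```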
- Visualization of some experimental data
Experimental Details
Details related to dataset processing (split + normalization)
- A total of 10,000 frames of data
    - Including 5 classes
    - Classes are proportionally balanced
        - Each class has 2,000 frames of data
- Train : Test = 7 : 3
- The information in each point:
    - XYZ spatial information + Doppler velocity V + intensity information S
- Normalization:
    - For each point $p_{i}=\left\{x_{i}, y_{i}, z_{i}, v_{i}, s_{i}\right\}$ in one frame:

$$\left(x_{i}, y_{i}, z_{i}, v_{i}, s_{i}\right)=\frac{\left(x_{i}, y_{i}, z_{i}, v_{i}, s_{i}\right)}{\max \left(\sqrt{x_{i}^{2}+y_{i}^{2}+z_{i}^{2}+v_{i}^{2}+s_{i}^{2}}\right)}, \quad i=1,2, \ldots, N$$
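A direct sketch of the normalization above: every 5D point in a frame is divided by the largest point norm in that frame (the function name is hypothetical).

```python
import numpy as np

def normalize_frame(points):
    """points: (N, 5) array of (x, y, z, v, s) for one frame."""
    norms = np.linalg.norm(points, axis=1)  # sqrt(x^2 + y^2 + z^2 + v^2 + s^2)
    return points / norms.max()

frame = np.random.randn(128, 5)
print(np.linalg.norm(normalize_frame(frame), axis=1).max())  # largest norm is now 1.0
```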
Details related to network training
- 128 input points
- Implemented in PyTorch; SGD optimizer with momentum + weight decay
- Learning rate: 0.001
    - Decayed by 30% every 20 epochs
- Loss function:
- Training:
    - With data augmentation
    - No data augmentation at test time
- 200 epochs, batch size = 24; one 1080 Ti GPU (see the sketch after this list)
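A sketch of this training configuration, assuming "decayed by 30%" means multiplying the learning rate by 0.7; the stand-in model, momentum, and weight-decay values are placeholders since the note does not record them.

```python
import torch

model = torch.nn.Linear(5, 5)  # placeholder for the Radar Transformer network
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=1e-4)  # assumed values
# lr *= 0.7 every 20 epochs ("decayed by 30% every 20 epochs")
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.7)

for epoch in range(200):  # 200 epochs, batch size 24 per the note
    # ... iterate over training batches and optimize here ...
    scheduler.step()
```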
Experimental Results
OA: overall accuracy