The nuScenes dataset is a large-scale autonomous-driving dataset built by the self-driving company nuTonomy. It contains not only camera and lidar data but also radar data. Before running a 3D object detection algorithm on it, it is worth looking at how the dataset is structured.
Download page: https://www.nuscenes.org
The nuScenes sensor rig comprises 6 cameras, 5 radars, 1 lidar, plus GPS and an IMU.
Lidar: 20 Hz capture frequency, 32 channels, 360° horizontal FOV, +10° to -30° vertical FOV, 80-100 m range, usable returns up to 70 m, ±2 cm accuracy, up to ~1.39 million points per second.
Cameras: 12 Hz capture frequency, 1/1.8'' CMOS sensor of 1600x1200 resolution, Bayer8 format for 1-byte-per-pixel encoding. A 1600x900 ROI is cropped from the original resolution to reduce processing and transmission bandwidth. Auto exposure with exposure time limited to a maximum of 20 ms. Images are unpacked to BGR format and compressed to JPEG.
Radars: 77 GHz, 13 Hz capture frequency. Independently measures distance and velocity in one cycle using Frequency Modulated Continuous Wave, up to 250 m range. Velocity accuracy of ±0.1 km/h.
Lidar extrinsics: We use a laser liner to accurately measure the relative location of the LIDAR to the ego frame.
Camera extrinsics: We place a cube-shaped calibration target in front of the camera and LIDAR sensors. The calibration target consists of three orthogonal planes with known patterns. After detecting the patterns we compute the transformation matrix from camera to LIDAR by aligning the planes of the calibration target. Given the LIDAR to ego frame transformation computed above, we can then compute the camera to ego frame transformation and the resulting extrinsic parameters.
Camera intrinsics: We use a calibration target board with a known set of patterns to infer the intrinsic and distortion parameters of the camera.
Radar extrinsics: We mount the radar in a horizontal position. Then we collect radar measurements by driving in an urban environment. After filtering radar returns for moving objects, we calibrate the yaw angle using a brute force approach to minimize the compensated range rates for static objects.
In order to achieve good cross-modality data alignment between the LIDAR and the cameras, the exposure of a camera is triggered when the top LIDAR sweeps across the center of the camera’s FOV. The timestamp of the image is the exposure trigger time; and the timestamp of the LIDAR scan is the time when the full rotation of the current LIDAR frame is achieved. Given that the camera’s exposure time is nearly instantaneous, this method generally yields good data alignment. Note that the cameras run at 12Hz while the LIDAR runs at 20Hz. The 12 camera exposures are spread as evenly as possible across the 20 LIDAR scans, so not all LIDAR scans have a corresponding camera frame. Reducing the frame rate of the cameras to 12Hz helps to reduce the compute, bandwidth and storage requirement of the perception system.
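The 12 Hz camera vs. 20 Hz lidar pairing described above can be sketched as a nearest-timestamp match. The timestamps below are synthetic (microseconds over one second), purely to illustrate the frequency ratio; real nuScenes timestamps are full Unix times.

```python
def nearest_lidar(cam_ts, lidar_timestamps):
    """Return the lidar sweep timestamp closest to a camera timestamp."""
    return min(lidar_timestamps, key=lambda t: abs(t - cam_ts))

lidar_ts = [i * 50_000 for i in range(20)]             # 20 Hz -> 50 ms spacing
cam_ts = [int(i * 1_000_000 / 12) for i in range(12)]  # 12 Hz -> ~83 ms spacing

pairs = {c: nearest_lidar(c, lidar_ts) for c in cam_ts}
# Every camera frame gets a lidar sweep, but 8 of the 20 sweeps go unmatched,
# consistent with "not all LIDAR scans have a corresponding camera frame".
```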
There are 13 downloadable archives in total:
v1.0-trainval01_blobs.tgz v1.0-trainval02_blobs.tgz v1.0-trainval03_blobs.tgz
v1.0-trainval04_blobs.tgz v1.0-trainval05_blobs.tgz v1.0-trainval06_blobs.tgz
v1.0-trainval07_blobs.tgz v1.0-trainval08_blobs.tgz v1.0-trainval09_blobs.tgz
v1.0-trainval10_blobs.tgz v1.0-test_blobs.tgz v1.0-trainval_meta.tgz
v1.0-test_meta.tgz
The first eleven archives (ten trainval blobs plus the test blob) hold the raw sensor data; the last two hold the metadata and annotations. After extraction, the top-level folders are samples, sweeps, maps and v1.0-trainval.
We focus on the samples folder, which holds the annotated sensor key frames. (The sweeps folder contains intermediate frames between key frames; it is optional supplementary data.) samples contains 12 subfolders, of which we mainly use CAM_FRONT and LIDAR_TOP.
Open the CAM_FRONT folder: it contains 34,149 images, each 1600x900. The filename format is
vehicle ID and capture time__camera channel__timestamp.jpg
For example:
n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915243012465.jpg
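The three fields are separated by double underscores, so a filename like the one above can be split with an illustrative parser such as this (the function name is ours, not part of any official API):

```python
def parse_nuscenes_filename(fname):
    """Split a nuScenes sample filename into its three fields."""
    stem = fname.split(".")[0]  # drop the .jpg / .pcd.bin extension
    vehicle_and_time, channel, timestamp = stem.split("__")
    return vehicle_and_time, channel, int(timestamp)

parts = parse_nuscenes_filename(
    "n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915243012465.jpg")
```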
The LIDAR_TOP folder holds the 32-beam lidar sweeps; the filename format mirrors the camera one:
vehicle ID and capture time__lidar channel__timestamp.pcd.bin
For example:
n015-2018-11-21-19-58-31+0800__LIDAR_TOP__1542801725949047.pcd.bin
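Each point in a LIDAR_TOP .pcd.bin file is stored as five consecutive float32 values (x, y, z, intensity, ring index), so the whole sweep can be read with NumPy. The demo below round-trips a synthetic 3-point cloud through a temp file rather than a real dataset file:

```python
import os
import tempfile

import numpy as np

def load_lidar_bin(path):
    """Load a nuScenes .pcd.bin sweep as an (N, 5) float32 array."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 5)

# Synthetic stand-in for a real sweep file (the filename is illustrative):
path = os.path.join(tempfile.mkdtemp(), "fake__LIDAR_TOP__0.pcd.bin")
np.arange(15, dtype=np.float32).tofile(path)
pts = load_lidar_bin(path)
```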
Now let us look at another folder, v1.0-trainval.
v1.0-trainval is a set of JSON tables holding the annotations and metadata for the train/val splits (v1.0-test carries the same tables for the test split, without annotations).
1. attribute.json: An attribute is a property of an instance that can change while the category stays the same. For example, a vehicle may be parked or moving, and a bicycle may or may not have a rider.
attribute {
"token": -- Unique record identifier.
"name": -- Attribute name.
"description": -- Attribute description.
}
Example:
{
"token": "cb5118da1ab342aa947717dc53544259",
"name": "vehicle.moving",
"description": "Vehicle is moving."
},
2. calibrated_sensor.json: Calibration data for a particular sensor (lidar/radar/camera) as mounted on a particular vehicle. All extrinsic parameters are given with respect to the ego vehicle body frame. All camera images are provided undistorted and rectified.
calibrated_sensor {
"token": -- Unique record identifier.
"sensor_token": -- Foreign key pointing to the sensor type.
"translation": [3] -- Coordinate system origin in meters: x, y, z.
"rotation": [4] -- Coordinate system orientation as quaternion: w, x, y, z.
"camera_intrinsic": [3, 3] -- Intrinsic camera calibration. Empty for sensors that are not cameras.
}
Example:
{
"token": "7781065816974801afc4dcdaf6acf92c",
"sensor_token": "47fcd48f71d75e0da5c8c1704a9bfe0a",
"translation": [
3.412,
0.0,
0.5
],
"rotation": [
0.9999984769132877,
0.0,
0.0,
0.0017453283658983088
],
"camera_intrinsic": []
}
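The translation and rotation fields above define the sensor-to-ego transform. A common way to use them is to assemble a 4x4 homogeneous matrix from the (w, x, y, z) quaternion; the sketch below does this with plain NumPy (nuscenes-devkit provides equivalent helpers):

```python
import numpy as np

def quat_to_rotmat(w, x, y, z):
    """Rotation matrix from a unit quaternion in (w, x, y, z) order."""
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def sensor_to_ego(record):
    """4x4 homogeneous transform from a calibrated_sensor record."""
    T = np.eye(4)
    T[:3, :3] = quat_to_rotmat(*record["rotation"])
    T[:3, 3] = record["translation"]
    return T

# Using the example record above (a radar mounted 3.412 m ahead of the
# ego origin, rotated ~0.2° about z):
T = sensor_to_ego({"translation": [3.412, 0.0, 0.5],
                   "rotation": [0.9999984769132877, 0.0, 0.0,
                                0.0017453283658983088]})
```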
3. category.json: Taxonomy of object categories (e.g. vehicle, human). Subcategories are delimited by a period (e.g. human.pedestrian.adult).
category {
"token": -- Unique record identifier.
"name": -- Category name. Subcategories indicated by period.
"description": -- Category description.
"index": -- The index of the label used for efficiency reasons in the .bin label files of nuScenes-lidarseg. This field did not exist previously.
}
Example:
{
"token": "1fa93b757fc74fb197cdd60001ad8abf",
"name": "human.pedestrian.adult",
"description": "Adult subcategory."
}
4. ego_pose.json: The pose of the ego vehicle at a particular timestamp, given with respect to the global coordinate system of the log's map. The ego_pose is the output of a lidar-map-based localization algorithm; the localization is 2-dimensional in the x-y plane.
ego_pose {
"token": -- Unique record identifier.
"translation": [3] -- Coordinate system origin in meters: x, y, z. Note that z is always 0.
"rotation": [4] -- Coordinate system orientation as quaternion: w, x, y, z.
"timestamp": -- Unix time stamp.
}
Example:
{
"token": "bddd80ae33ec4e32b27fdb3c1160a30e",
"timestamp": 1531883530440378,
"rotation": [
-0.7504501527141022,
-0.0076295847961364415,
0.00847103369020136,
-0.6608287216181199
],
"translation": [
1010.1273947164545,
610.7727090350685,
0.0
]
}
5. instance.json: An object instance, e.g. one particular vehicle. This table enumerates all object instances observed. Note that instances are not tracked across scenes.
instance {
"token": -- Unique record identifier.
"category_token": -- Foreign key pointing to the object category.
"nbr_annotations": -- Number of annotations of this instance.
"first_annotation_token": -- Foreign key. Points to the first annotation of this instance.
"last_annotation_token": -- Foreign key. Points to the last annotation of this instance.
}
Example:
{
"token": "5e2b6fd1fab74d04a79eefebbec357bb",
"category_token": "85abebdccd4d46c7be428af5a6173947",
"nbr_annotations": 13,
"first_annotation_token": "173a50411564442ab195e132472fde71",
"last_annotation_token": "2cd832644d09479389ed0785e5de85c9"
}
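The first/last/next tokens form a linked list, so all annotations of an instance can be collected by walking from first_annotation_token. The tables below are toy stand-ins for the parsed instance.json / sample_annotation.json contents:

```python
def instance_annotations(instance, annotations_by_token):
    """Collect an instance's annotations by following the `next` pointers."""
    anns, token = [], instance["first_annotation_token"]
    while token:  # empty string marks the end of the chain
        ann = annotations_by_token[token]
        anns.append(ann)
        token = ann["next"]
    return anns

# Toy data: three annotations of one instance, chained in time.
annotations_by_token = {
    "a1": {"token": "a1", "prev": "",   "next": "a2"},
    "a2": {"token": "a2", "prev": "a1", "next": "a3"},
    "a3": {"token": "a3", "prev": "a2", "next": ""},
}
inst = {"first_annotation_token": "a1", "last_annotation_token": "a3",
        "nbr_annotations": 3}
anns = instance_annotations(inst, annotations_by_token)
```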
6. log.json: Information about the log from which the data was extracted.
log {
"token": -- Unique record identifier.
"logfile": -- Log file name.
"vehicle": -- Vehicle name.
"date_captured": -- Date (YYYY-MM-DD).
"location": -- Area where log was captured, e.g. singapore-onenorth.
}
Example:
{
"token": "6b6513e6c8384cec88775cae30b78c0e",
"logfile": "n015-2018-07-18-11-07-57+0800",
"vehicle": "n015",
"date_captured": "2018-07-18",
"location": "singapore-onenorth"
}
7. map.json: Map data stored as binary semantic masks rendered from a top-down view.
map {
"token": -- Unique record identifier.
"log_tokens": [n] -- Foreign keys.
"category": -- Map category, currently only semantic_prior for drivable surface and sidewalk.
"filename": -- Relative path to the file with the map mask.
}
Example:
{
"category": "semantic_prior",
"token": "53992ee3023e5494b90c316c183be829",
"filename": "maps/53992ee3023e5494b90c316c183be829.png",
"log_tokens": [
"0986cb758b1d43fdaa051ab23d45582b",
"1c9b302455ff44a9a290c372b31aa3ce",
"e60234ec7c324789ac7c8441a5e49731",
"46123a03f41e4657adc82ed9ddbe0ba2",
"a5bb7f9dd1884f1ea0de299caefe7ef4",
"bc41a49366734ebf978d6a71981537dc",
"f8699afb7a2247e38549e4d250b4581b",
"d0450edaed4a46f898403f45fa9e5f0d",
"f38ef5a1e9c941aabb2155768670b92a",
"7e25a2c8ea1f41c5b0da1e69ecfa71a2",
"ddc03471df3e4c9bb9663629a4097743",
"31e9939f05c1485b88a8f68ad2cf9fa4",
"783683d957054175bda1b326453a13f4",
"343d984344e440c7952d1e403b572b2a",
"92af2609d31445e5a71b2d895376fed6",
"47620afea3c443f6a761e885273cb531",
"d31dc715d1c34b99bd5afb0e3aea26ed",
"34d0574ea8f340179c82162c6ac069bc",
"d7fd2bb9696d43af901326664e42340b",
"b5622d4dcb0d4549b813b3ffb96fbdc9",
"da04ae0b72024818a6219d8dd138ea4b",
"6b6513e6c8384cec88775cae30b78c0e",
"eda311bda86f4e54857b0554639d6426",
"cfe71bf0b5c54aed8f56d4feca9a7f59",
"ee155e99938a4c2698fed50fc5b5d16a",
"700b800c787842ba83493d9b2775234a"
]
}
8. sample.json: A sample is an annotated keyframe at 2 Hz. The data is collected at (approximately) the same timestamp as part of a single lidar sweep.
sample {
"token": -- Unique record identifier.
"timestamp": -- Unix time stamp.
"scene_token": -- Foreign key pointing to the scene.
"next": -- Foreign key. Sample that follows this in time. Empty if end of scene.
"prev": -- Foreign key. Sample that precedes this in time. Empty if start of scene.
}
Example:
{
"token": "e93e98b63d3b40209056d129dc53ceee",
"timestamp": 1531883530449377,
"prev": "",
"next": "14d5adfe50bb4445bc3aa5fe607691a8",
"scene_token": "73030fb67d3c46cfb5e590168088ae39"
}
9. sample_annotation.json: A bounding box defining the position of an object seen in a sample. All location data is given with respect to the global coordinate system.
sample_annotation {
"token": -- Unique record identifier.
"sample_token": -- Foreign key. NOTE: this points to a sample NOT a sample_data since annotations are done on the sample level taking all relevant sample_data into account.
"instance_token": -- Foreign key. Which object instance is this annotating. An instance can have multiple annotations over time.
"attribute_tokens": [n] -- Foreign keys. List of attributes for this annotation. Attributes can change over time, so they belong here, not in the instance table.
"visibility_token": -- Foreign key. Visibility may also change over time. If no visibility is annotated, the token is an empty string.
"translation": [3] -- Bounding box location in meters as center_x, center_y, center_z.
"size": [3] -- Bounding box size in meters as width, length, height.
"rotation": [4] -- Bounding box orientation as quaternion: w, x, y, z.
"num_lidar_pts": -- Number of lidar points in this box. Points are counted during the lidar sweep identified with this sample.
"num_radar_pts": -- Number of radar points in this box. Points are counted during the radar sweep identified with this sample. This number is summed across all radar sensors without any invalid point filtering.
"next": -- Foreign key. Sample annotation from the same object instance that follows this in time. Empty if this is the last annotation for this object.
"prev": -- Foreign key. Sample annotation from the same object instance that precedes this in time. Empty if this is the first annotation for this object.
}
Example:
{
"token": "2cd832644d09479389ed0785e5de85c9",
"sample_token": "c36eb85918a84a788e236f5c9eef2b05",
"instance_token": "5e2b6fd1fab74d04a79eefebbec357bb",
"visibility_token": "3",
"attribute_tokens": [],
"translation": [
993.884,
612.441,
0.675
],
"size": [
0.3,
0.291,
0.734
],
"rotation": [
-0.04208490861058176,
0.0,
0.0,
0.9991140377690821
],
"prev": "5cd018cb2448415ab8aff4dc7256999a",
"next": "",
"num_lidar_pts": 2,
"num_radar_pts": 0
}
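From translation, size and rotation we can recover the eight box corners in global coordinates. The sketch below assumes the usual nuScenes convention that the box x-axis (forward) spans the length and y spans the width; verify against the devkit before relying on it:

```python
import numpy as np

def box_corners(translation, size, rotation):
    """Eight corners (3x8) of an annotation box in global coordinates.

    size is (width, length, height); rotation is a quaternion (w, x, y, z).
    """
    w_, l_, h_ = size
    # Corners in the box frame, centered on the origin.
    x = l_ / 2 * np.array([ 1,  1,  1,  1, -1, -1, -1, -1])
    y = w_ / 2 * np.array([ 1, -1, -1,  1,  1, -1, -1,  1])
    z = h_ / 2 * np.array([ 1,  1, -1, -1,  1,  1, -1, -1])
    corners = np.vstack([x, y, z])
    qw, qx, qy, qz = rotation
    R = np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qw*qz),     2*(qx*qz + qw*qy)],
        [2*(qx*qy + qw*qz),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qw*qx)],
        [2*(qx*qz - qw*qy),     2*(qy*qz + qw*qx),     1 - 2*(qx*qx + qy*qy)],
    ])
    return R @ corners + np.array(translation).reshape(3, 1)

# Values from the example annotation above (a small object, ~2 lidar points):
c = box_corners([993.884, 612.441, 0.675],
                [0.3, 0.291, 0.734],
                [-0.04208490861058176, 0.0, 0.0, 0.9991140377690821])
```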
10. sample_data.json: A sensor data record, i.e. an image, point cloud or radar return. For sample_data with is_key_frame=True, the timestamp should be very close to that of the sample it points to; for non-key frames the sample_data points to the sample that is closest in time.
sample_data {
"token": -- Unique record identifier.
"sample_token": -- Foreign key. Sample to which this sample_data is associated.
"ego_pose_token": -- Foreign key.
"calibrated_sensor_token": -- Foreign key.
"filename": -- Relative path to data-blob on disk.
"fileformat": -- Data file format.
"width": -- If the sample data is an image, this is the image width in pixels.
"height": -- If the sample data is an image, this is the image height in pixels.
"timestamp": -- Unix time stamp.
"is_key_frame": -- True if sample_data is part of key_frame, else False.
"next": -- Foreign key. Sample data from the same sensor that follows this in time. Empty if end of scene.
"prev": -- Foreign key. Sample data from the same sensor that precedes this in time. Empty if start of scene.
}
Example:
{
"token": "bddd80ae33ec4e32b27fdb3c1160a30e",
"sample_token": "e93e98b63d3b40209056d129dc53ceee",
"ego_pose_token": "bddd80ae33ec4e32b27fdb3c1160a30e",
"calibrated_sensor_token": "7781065816974801afc4dcdaf6acf92c",
"timestamp": 1531883530440378,
"fileformat": "pcd",
"is_key_frame": true,
"height": 0,
"width": 0,
"filename": "samples/RADAR_FRONT/n015-2018-07-18-11-07-57+0800__RADAR_FRONT__1531883530440378.pcd",
"prev": "",
"next": "90df03ad4710427aabb5f88fe049df2e"
}
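The is_key_frame flag is what separates the samples/ entries from the sweeps/ entries. A minimal sketch, using toy records in place of the parsed sample_data.json list:

```python
def key_frames_by_sample(sample_data_records):
    """Group key-frame sample_data records by their sample_token."""
    out = {}
    for sd in sample_data_records:
        if sd["is_key_frame"]:
            out.setdefault(sd["sample_token"], []).append(sd)
    return out

# Toy records: key frames live under samples/, intermediates under sweeps/.
records = [
    {"sample_token": "s1", "is_key_frame": True,  "filename": "samples/a.pcd"},
    {"sample_token": "s1", "is_key_frame": False, "filename": "sweeps/b.pcd"},
    {"sample_token": "s2", "is_key_frame": True,  "filename": "samples/c.pcd"},
]
kf = key_frames_by_sample(records)
```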
11. scene.json: A scene is a 20-second-long sequence of consecutive frames extracted from a log. Multiple scenes can come from the same log. Note that object identities (instance tokens) are not preserved across scenes.
scene {
"token": -- Unique record identifier.
"name": -- Short string identifier.
"description": -- Longer description of the scene.
"log_token": -- Foreign key. Points to log from where the data was extracted.
"nbr_samples": -- Number of samples in this scene.
"first_sample_token": -- Foreign key. Points to the first sample in scene.
"last_sample_token": -- Foreign key. Points to the last sample in scene.
}
Example:
{
"token": "73030fb67d3c46cfb5e590168088ae39",
"log_token": "6b6513e6c8384cec88775cae30b78c0e",
"nbr_samples": 40,
"first_sample_token": "e93e98b63d3b40209056d129dc53ceee",
"last_sample_token": "40e413c922184255a94f08d3c10037e0",
"name": "scene-0001",
"description": "Construction, maneuver between several trucks"
}
12. sensor.json: A description of a specific sensor type.
sensor {
"token": -- Unique record identifier.
"channel": -- Sensor channel name.
"modality": {camera, lidar, radar} -- Sensor modality. Supports category(ies) in brackets.
}
Example:
{
"token": "725903f5b62f56118f4094b46a4470d8",
"channel": "CAM_FRONT",
"modality": "camera"
}
13. visibility.json: The visibility of an instance is the fraction of the annotation that is visible across all 6 camera images, binned into 4 levels: 0-40%, 40-60%, 60-80% and 80-100%.
visibility {
"token": -- Unique record identifier.
"level": -- Visibility level.
"description": -- Description of visibility level.
}
Example:
{
"description": "visibility of whole object is between 0 and 40%",
"token": "1",
"level": "v0-40"
}
That completes our tour of the nuScenes dataset. Is every folder and file useful for 3D object detection? No. The folders we need are samples/CAM_FRONT, samples/LIDAR_TOP and v1.0-trainval: the image data, the lidar data, and the annotation/metadata tables respectively. The sweeps folder is also needed, since the unannotated intermediate frames improve training quality as supplementary data. The second.pytorch source code puts it bluntly: "you must download all trainval data, key-frame only dataset performs far worse than sweeps." Its expected dataset layout is therefore:
└── NUSCENES_TRAINVAL_DATASET_ROOT
├── samples <-- key frames
├── sweeps <-- frames without annotation
├── maps <-- unused
└── v1.0-trainval <-- metadata and annotations
└── NUSCENES_TEST_DATASET_ROOT
├── samples <-- key frames
├── sweeps <-- frames without annotation
├── maps <-- unused
└── v1.0-test <-- metadata
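With that layout in place, each metadata table can be read straight from the v1.0-trainval folder and indexed by token for fast foreign-key lookups (this is essentially what the nuscenes-devkit does internally). The demo below runs against a throwaway dataroot containing one tiny sensor.json rather than a real download:

```python
import json
import os
import tempfile

def load_table(dataroot, version, name):
    """Load one metadata table (e.g. 'sensor') and index it by token."""
    with open(os.path.join(dataroot, version, name + ".json")) as f:
        return {rec["token"]: rec for rec in json.load(f)}

# Throwaway dataroot mimicking NUSCENES_TRAINVAL_DATASET_ROOT above:
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "v1.0-trainval"))
with open(os.path.join(root, "v1.0-trainval", "sensor.json"), "w") as f:
    json.dump([{"token": "t1", "channel": "CAM_FRONT",
                "modality": "camera"}], f)

sensors = load_table(root, "v1.0-trainval", "sensor")
```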