Continuing from the previous article, "Generating MotionNet Training Data from the Nuscenes Dataset (Part 1)".
1. Function prototype:
gen_2d_grid_gt(data_dict: dict, grid_size: np.array, extents: np.array = None,
frame_skip: int = 0, reordered: bool = False, proportion_thresh: float = 0.5,
category_num: int = 5, one_hot_thresh: float = 0.8, h_flip: bool = False,
min_point_num_per_voxel: int = -1,
return_past_2d_disp_gt: bool = False,
return_instance_map: bool = False)
Parameters:
data_dict: dictionary of data information
grid_size: grid cell size
extents: ROI extents
frame_skip: number of frames to skip
reordered: whether to reorder the results so that the first element corresponds to the past records; this option only affects the return_past_2d_disp_gt output
proportion_thresh: within a given pixel, the displacement vector is computed only when the proportion of foreground points exceeds this threshold
category_num: number of point cloud categories
one_hot_thresh: when the proportion of points of one category within a cell exceeds this threshold, a hard (one-hot) category vector is computed for the cell; otherwise a soft category vector is used
h_flip: whether to horizontally flip the point cloud (negate the x coordinates)
min_point_num_per_voxel: minimum number of points per voxel
return_past_2d_disp_gt: whether to compute the displacement ground truth of past frames
return_instance_map: whether to return the instance id map
Return values:
all_disp_field_gt_list: displacement ground truth of the next 20 frames, ndarray = (20, 256, 256, 2)
all_valid_pixel_maps_list: valid pixel maps of the next 20 frames, ndarray = (20, 256, 256)
non_empty_map: non-empty map of the current frame (the first frame in data_dict), ndarray = (256, 256)
pixel_cat_map: pixel category map, ndarray = (256, 256, 5)
pixel_indices: pixel indices, ndarray = (4475, 2)
pixel_instance_map: pixel instance map of the current frame, ndarray = (256, 256)
Notes:
The displacement field ground truth has the format (num_frames, image height, image width, 2); the z direction is ignored.
"Current frame" refers to the first frame in the data_dict dictionary.
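For orientation, here is a minimal call sketch. The grid_size and extents values are assumptions chosen so that (32 - (-32)) / 0.25 = 256 matches the 256x256 maps listed above; the input path and the unpacking order of the return values are likewise assumed, not confirmed by the source.
import numpy as np

# Assumed configuration (not from the source): 0.25 m cells over a +/-32 m ROI
# give a 256 x 256 pseudo-image.
grid_size = np.array([0.25, 0.25])
extents = np.array([[-32.0, 32.0], [-32.0, 32.0], [-3.0, 2.0]])  # x, y, z ranges

# data_dict is assumed to be one of the dictionaries produced in part one of this series.
data_dict = np.load('0_0/0.npy', allow_pickle=True).item()

outputs = gen_2d_grid_gt(data_dict, grid_size=grid_size, extents=extents,
                         proportion_thresh=0.5, category_num=5,
                         one_hot_thresh=0.8, return_instance_map=True)
# Unpacking order assumed to follow the return-value list above.
(all_disp_field_gt_list, all_valid_pixel_maps_list, non_empty_map,
 pixel_cat_map, pixel_indices, pixel_instance_map) = outputs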
2. Function flow:
1) As in step 1) of 3.1, retrieve the boxes and categories and store them in instance_box_list and instance_cat_list.
num_instances = data_dict['num_instances']
instance_box_list = list()
instance_cat_list = list()  # for instance categories
for i in range(num_instances):
    instance = data_dict['instance_boxes_' + str(i)]
    category = data_dict['category_' + str(i)]
    instance_box_list.append(instance)
    instance_cat_list.append(category)
2) Crop to the region of interest, partition the point cloud into a 2D grid, compute the grid coordinates of every point, sort them, and remove duplicates to obtain the 2D cell coordinates (pixel_coords).
# refer_pc is the current frame; pc_list = [cur, cur-1, cur-2, ..., cur-m, cur+1, cur+2, ..., cur+n]
refer_pc = pc_list[0]
refer_pc = refer_pc[:, 0:3]
# Crop to the ROI extents
if extents is not None:
    if extents.shape != (3, 2):
        raise ValueError("Extents are the wrong shape {}".format(extents.shape))
    filter_idx = np.where((extents[0, 0] < refer_pc[:, 0]) & (refer_pc[:, 0] < extents[0, 1]) &
                          (extents[1, 0] < refer_pc[:, 1]) & (refer_pc[:, 1] < extents[1, 1]) &
                          (extents[2, 0] < refer_pc[:, 2]) & (refer_pc[:, 2] < extents[2, 1]))[0]
    refer_pc = refer_pc[filter_idx]
# Compute the 2D grid coordinates of all points and sort them
discrete_pts = np.floor(refer_pc[:, 0:2] / grid_size).astype(np.int32)
x_col = discrete_pts[:, 0]
y_col = discrete_pts[:, 1]
sorted_order = np.lexsort((y_col, x_col))
refer_pc = refer_pc[sorted_order]
discrete_pts = discrete_pts[sorted_order]  # 2D coordinates of all points; duplicates exist, now sorted
# View the rows as contiguous bytes and deduplicate, yielding the sorted 2D coordinates, pixel_coords [ndarray: (4750, 2)]
contiguous_array = np.ascontiguousarray(discrete_pts).view(
    np.dtype((np.void, discrete_pts.dtype.itemsize * discrete_pts.shape[1])))
_, unique_indices = np.unique(contiguous_array, return_index=True)
unique_indices.sort()  # index of the first point within each grid cell
pixel_coords = discrete_pts[unique_indices]
3) Take differences of the unique indices to obtain the number of points in each pixel; compute the extrema of the 2D coordinates; and shift pixel_coords so that the minimum lands at the origin, which maps the coordinates exactly into the 256x256 pseudo-image.
# Number of points in each pixel, via differences of consecutive unique indices
num_points_in_pixel = np.diff(unique_indices)
num_points_in_pixel = np.append(num_points_in_pixel, discrete_pts.shape[0] - unique_indices[-1])  # account for the last cell
# 2D coordinate extrema
if extents is not None:
    min_pixel_coord = np.floor(extents.T[0, 0:2] / grid_size)
    max_pixel_coord = np.ceil(extents.T[1, 0:2] / grid_size) - 1
else:
    min_pixel_coord = np.amin(pixel_coords, axis=0)
    max_pixel_coord = np.amax(pixel_coords, axis=0)
num_divisions = ((max_pixel_coord - min_pixel_coord) + 1).astype(np.int32)
# Shift so the minimum coordinate lands at the origin
pixel_indices = (pixel_coords - min_pixel_coord).astype(int)
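A toy example (numbers invented) makes the lexsort / unique / diff pattern above concrete: five points falling into three distinct cells.
import numpy as np

discrete_pts = np.array([[1, 2], [0, 0], [1, 2], [0, 0], [3, 1]], dtype=np.int32)
order = np.lexsort((discrete_pts[:, 1], discrete_pts[:, 0]))
discrete_pts = discrete_pts[order]  # [[0,0],[0,0],[1,2],[1,2],[3,1]]
row_view = np.ascontiguousarray(discrete_pts).view(
    np.dtype((np.void, discrete_pts.dtype.itemsize * discrete_pts.shape[1])))
_, unique_indices = np.unique(row_view, return_index=True)
unique_indices.sort()  # [0, 2, 4]: index of the first point of each occupied cell
num_points_in_pixel = np.diff(unique_indices)
num_points_in_pixel = np.append(num_points_in_pixel,
                                discrete_pts.shape[0] - unique_indices[-1])
print(num_points_in_pixel)  # [2 2 1] points per cell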
4) Iterate over the instances and, for the first frame, extract the point indices inside each instance box and the indices outside all boxes.
# Box list, and the list of point indices inside each box
refer_box_list = list()
refer_pc_idx_per_bbox = list()
points_category = np.zeros(refer_pc.shape[0], dtype=int)  # store the point categories
pixel_instance_id = np.zeros(pixel_indices.shape[0], dtype=np.uint8)
points_instance_id = np.zeros(refer_pc.shape[0], dtype=int)
# Iterate over the instances, collecting indices of points inside and outside the boxes in the keyframe
for i in range(num_instances):
    instance_cat = instance_cat_list[i]
    instance_box = instance_box_list[i]
    instance_box_data = instance_box[0]  # first frame
    assert not np.isnan(instance_box_data).any(), "In the keyframe, there should not be NaN box annotation!"
    # Horizontal flip if requested
    if h_flip:
        tmp_quad = instance_box_data[6:].copy()
        tmp_quad[2] *= -1  # y
        tmp_quad[3] *= -1  # z
        tmp_quad = Quaternion(tmp_quad)
        tmp_center = instance_box_data[0:3].copy()
        tmp_center[0] = -tmp_center[0]
        tmp_box = Box(center=tmp_center, size=instance_box_data[3:6], orientation=Quaternion(tmp_quad))
    else:
        tmp_box = Box(center=instance_box_data[:3], size=instance_box_data[3:6],
                      orientation=Quaternion(instance_box_data[6:]))
    idx = point_in_hull_fast(refer_pc[:, 0:3], tmp_box)  # indices of points inside the box
    refer_pc_idx_per_bbox.append(idx)
    refer_box_list.append(tmp_box)
    points_category[idx] = instance_cat
    points_instance_id[idx] = i + 1  # instance ids start at 1; background is 0
assert np.max(points_instance_id) <= 255, "The instance id exceeds uint8 max."
if len(refer_pc_idx_per_bbox) > 0:
    refer_pc_idx_inside_box = np.concatenate(refer_pc_idx_per_bbox).tolist()
else:
    refer_pc_idx_inside_box = []
refer_pc_idx_outside_box = set(range(refer_pc.shape[0])) - set(refer_pc_idx_inside_box)
refer_pc_idx_outside_box = list(refer_pc_idx_outside_box)  # indices of points outside all boxes
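A side note on the h_flip branch above: mirroring the scene about the y-z plane (x -> -x) conjugates every rotation by that reflection, which for a quaternion [w, x, y, z] amounts to negating the y and z components. A quick numerical check (illustrative, not part of the original code):
import numpy as np
from pyquaternion import Quaternion

q = Quaternion(axis=[0.3, 0.5, 0.8], angle=0.7)  # an arbitrary rotation
M = np.diag([-1.0, 1.0, 1.0])                    # reflection x -> -x
R_reflected = M @ q.rotation_matrix @ M          # conjugate the rotation by the reflection
q_flip = Quaternion(q.w, q.x, -q.y, -q.z)        # negate the quaternion's y and z components
assert np.allclose(R_reflected, q_flip.rotation_matrix)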
5) Iterate over the 2D grid cells, computing for each cell the most frequent category (most_freq_cat), its frequency (most_freq), and the instance id (most_freq_instance_id). The results are the pixel category map (pixel_cat_map), of size [256, 256, 5], and the pixel instance map (pixel_instance_map), of size [256, 256].
# Pixel categories in the 2D grid (pseudo-image)
pixel_cat = np.zeros([unique_indices.shape[0], category_num], dtype=np.float32)
most_freq_info = []
for h, v in enumerate(zip(unique_indices, num_points_in_pixel)):
    pixel_elements_categories = points_category[v[0]:v[0] + v[1]]
    elements_freq = np.bincount(pixel_elements_categories, minlength=category_num)  # per-category counts, padded to length category_num
    assert np.sum(elements_freq) == v[1], "The frequency count is incorrect."
    # Most frequent category of this cell and its frequency
    elements_freq = elements_freq / float(v[1])
    most_freq_cat, most_freq = np.argmax(elements_freq), np.max(elements_freq)
    most_freq_info.append([most_freq_cat, most_freq])
    # Assign the pixel's instance id from the most frequent category
    most_freq_elements_idx = np.where(pixel_elements_categories == most_freq_cat)[0]
    pixel_elements_instance_ids = points_instance_id[v[0]:v[0] + v[1]]
    most_freq_instance_id = pixel_elements_instance_ids[most_freq_elements_idx[0]]
    # Essentially a single category: hard (one-hot) label
    if most_freq >= one_hot_thresh:
        one_hot_cat = np.zeros(category_num, dtype=np.float32)
        one_hot_cat[most_freq_cat] = 1.0
        pixel_cat[h] = one_hot_cat  # category of this pixel in the 2D pseudo-image, ndarray: (4750, 5)
        pixel_instance_id[h] = most_freq_instance_id  # instance id of this pixel in the 2D pseudo-image, ndarray: (4750,)
    else:
        pixel_cat[h] = elements_freq  # soft label: use the category frequencies
pixel_cat_map = np.zeros((num_divisions[0], num_divisions[1], category_num), dtype=np.float32)
pixel_cat_map[pixel_indices[:, 0], pixel_indices[:, 1]] = pixel_cat[:]  # rows correspond one-to-one
pixel_instance_map = np.zeros((num_divisions[0], num_divisions[1]), dtype=np.uint8)
pixel_instance_map[pixel_indices[:, 0], pixel_indices[:, 1]] = pixel_instance_id[:]
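To make the one_hot_thresh branch concrete, consider a toy cell (numbers invented) with five points, four of category 1 and one of category 2:
import numpy as np

cats = np.array([1, 1, 1, 1, 2])
elements_freq = np.bincount(cats, minlength=5) / float(cats.size)  # [0. 0.8 0.2 0. 0.]
most_freq_cat, most_freq = np.argmax(elements_freq), np.max(elements_freq)
# With one_hot_thresh = 0.8, most_freq (0.8) reaches the threshold and the cell
# gets the hard label [0, 1, 0, 0, 0]; with one_hot_thresh = 0.9 it would keep
# the soft vector [0, 0.8, 0.2, 0, 0].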
6) Build a 0-1 non-empty map (non_empty_map) to simplify the loss computation. For foreground cells with too few points, compute an ignore mask: positions that are True are kept, positions that are False are ignored.
# foreground (occupied) -- 1, background (empty) -- 0
non_empty_map = np.zeros((num_divisions[0], num_divisions[1]), dtype=np.float32)
non_empty_map[pixel_indices[:, 0], pixel_indices[:, 1]] = 1.0  # positions that contain points
# Ignore foreground cells whose point count is <= min_point_num_per_voxel
cell_pts_num = np.zeros((num_divisions[0], num_divisions[1]), dtype=np.float32)
cell_pts_num[pixel_indices[:, 0], pixel_indices[:, 1]] = num_points_in_pixel[:]
tmp_pixel_cat_map = np.argmax(pixel_cat_map, axis=2)
ignore_mask = np.logical_and(cell_pts_num <= min_point_num_per_voxel, tmp_pixel_cat_map != 0)
ignore_mask = np.logical_not(ignore_mask)
ignore_mask = np.expand_dims(ignore_mask, axis=2)
7) Compute the displacement vectors. For each future frame, set the displacement of all points outside the boxes to zero; then, for each instance, compute the displacement of its points. Finally compute the mean displacement of each non-empty pixel, disp_field = ndarray: (4475, 2), and the valid pixels, valid_pixels = ndarray: (4475,).
# Process the next 20 frames
for i in frame_considered:
    curr_disp_vectors = np.zeros_like(refer_pc, dtype=np.float32)
    curr_disp_vectors.fill(np.nan)
    curr_disp_vectors[refer_pc_idx_outside_box] = 0.0  # scene flow outside the boxes is set to 0; curr_disp_vectors = ndarray: (20967, 3)
    # First, for each instance, compute the displacement of its points.
    for j in range(num_instances):
        instance_box = instance_box_list[j]
        instance_box_data = instance_box[i]  # box data of frame i
        if np.isnan(instance_box_data).any():  # the annotation may be missing; skip
            continue
        if h_flip:
            tmp_quad = instance_box_data[6:].copy()
            tmp_quad[2] *= -1  # y
            tmp_quad[3] *= -1  # z
            tmp_quad = Quaternion(tmp_quad)
            tmp_center = instance_box_data[0:3].copy()
            tmp_center[0] = -tmp_center[0]
            tmp_box = Box(center=tmp_center, size=instance_box_data[3:6], orientation=Quaternion(tmp_quad))
        else:
            tmp_box = Box(center=instance_box_data[:3], size=instance_box_data[3:6],
                          orientation=Quaternion(instance_box_data[6:]))
        pc_in_bbox_idx = refer_pc_idx_per_bbox[j]  # indices of the points inside the box
        disp_vectors = calc_displace_vector(refer_pc[pc_in_bbox_idx], refer_box_list[j], tmp_box)  # displacement from the current-frame box refer_box_list[j] to the next-frame box tmp_box
        curr_disp_vectors[pc_in_bbox_idx] = disp_vectors[:]  # per-point displacement vectors, ndarray: (20907, 3)
    # Second, compute the mean displacement and validity of each non-empty pixel
    disp_field = np.zeros([unique_indices.shape[0], 2], dtype=np.float32)  # 2D field
    # Pixels are valid only when the box is annotated in both frames
    valid_pixels = np.zeros(unique_indices.shape[0], dtype=bool)
    for h, v in enumerate(zip(unique_indices, num_points_in_pixel)):
        # Assign a displacement vector only when the proportion of the dominant category
        # reaches the threshold; otherwise the cell is probably background (e.g., the ground plane)
        pixel_elements_categories = points_category[v[0]:v[0] + v[1]]
        most_freq_cat, most_freq = most_freq_info[h]
        if most_freq >= proportion_thresh:
            most_freq_cat_idx = np.where(pixel_elements_categories == most_freq_cat)[0]
            most_freq_cat_disp_vectors = curr_disp_vectors[v[0]:v[0] + v[1], :3]
            most_freq_cat_disp_vectors = most_freq_cat_disp_vectors[most_freq_cat_idx]  # displacement vectors of the dominant-category points
            if np.isnan(most_freq_cat_disp_vectors).any():  # drop invalid values
                valid_pixels[h] = False
            else:
                mean_disp_vector = np.mean(most_freq_cat_disp_vectors, axis=0)  # mean displacement
                disp_field[h] = mean_disp_vector[0:2]  # ignore the z direction
                valid_pixels[h] = True  # mark the pixel as valid
8) Finally, scatter the results onto the 2D image (still inside the per-frame loop): disp_field_sparse = ndarray: (256, 256, 2); valid_pixel_map = ndarray: (256, 256).
    # 2D displacement field; the z direction is ignored
    disp_field_sparse = np.zeros((num_divisions[0], num_divisions[1], 2), dtype=np.float32)
    disp_field_sparse[pixel_indices[:, 0], pixel_indices[:, 1]] = disp_field[:]
    disp_field_sparse = disp_field_sparse * ignore_mask
    valid_pixel_map = np.zeros((num_divisions[0], num_divisions[1]), dtype=np.float32)
    valid_pixel_map[pixel_indices[:, 0], pixel_indices[:, 1]] = valid_pixels[:]
    all_disp_field_gt_list.append(disp_field_sparse)
    all_valid_pixel_maps_list.append(valid_pixel_map)
1. Function prototype: point_in_hull_fast(points: np.array, bounding_box: Box)
Parameters:
points: point cloud (N×d)
bounding_box: the Box object
2. Approach:
First rotate the bounding box until it is axis-aligned, rotating the whole point cloud along with it. The points can then be checked against the axis-aligned bounds, after which the point cloud is restored.
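A minimal sketch of that idea (an illustrative re-implementation under assumed conventions, not the original code): transform the points into the box frame with the inverse orientation, then run an axis-aligned bounds test. nuScenes boxes store their size as wlh = (width, length, height), with the box x axis along the length.
import numpy as np
from nuscenes.utils.data_classes import Box

def point_in_box_sketch(points: np.ndarray, box: Box) -> np.ndarray:
    # Express the points in the box frame: translate to the box center, then
    # apply the inverse rotation so that the box becomes axis-aligned.
    local = (box.orientation.inverse.rotation_matrix @ (points[:, :3] - box.center).T).T
    half = np.array([box.wlh[1], box.wlh[0], box.wlh[2]]) / 2.0  # half extents along x, y, z
    mask = np.all(np.abs(local) <= half, axis=1)
    return np.where(mask)[0]  # indices of the points inside the box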
1. Function prototype: calc_displace_vector(points: np.array, curr_box: Box, next_box: Box)
Parameters:
points: point cloud (N×d)
curr_box: the current bounding box
next_box: the object's bounding box in the next frame
Return value:
pc_displace_vectors: the displacement vector of each point
2. Function flow:
Normalize the quaternions; rotate the current-frame points, then translate them, and finally take the position-wise difference of the point coordinates.
curr_box.orientation = curr_box.orientation.normalised
next_box.orientation = next_box.orientation.normalised
# Rotate the current-frame points, then translate, then take the position-wise difference
delta_rotation = curr_box.orientation.inverse * next_box.orientation
rotated_pc = (delta_rotation.rotation_matrix @ points.T).T  # @ is matrix multiplication
rotated_curr_center = np.dot(delta_rotation.rotation_matrix, curr_box.center)
delta_center = next_box.center - rotated_curr_center
rotated_translated_pc = rotated_pc + delta_center
pc_displace_vectors = rotated_translated_pc - points
Note the order of the relative rotation: since the rotation is applied about the relative (moving) axes, the right-multiplication rule applies. Readers who find this confusing can refer to: 坐标变换总结, 从零开始做自动驾驶定位(四): 前端里程计之初试.
To expand on this briefly: e.g., if the current frame is frame k, then the poses of frames k-2 and k-1 give a pose delta. Since a vehicle's motion is relatively smooth, accumulating this delta on top of frame k-1's pose yields essentially a prediction of frame k's pose.
step_pose = last_pose.inverse() * current_frame_.pose;  // pose delta: last->cur = last->world->cur; a relative-axis transform, hence right-multiplication (a fixed-axis transform would be left-multiplied)
predict_pose = current_frame_.pose * step_pose;  // world->pred = world->cur->pred; a relative-axis transform, hence right-multiplication
last_pose = current_frame_.pose;
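Rendered in numpy with 4x4 homogeneous transforms, the same constant-velocity prediction looks as follows (an illustrative translation of the snippet above, not code from the original project):
import numpy as np

def predict_next_pose(last_pose: np.ndarray, current_pose: np.ndarray) -> np.ndarray:
    step_pose = np.linalg.inv(last_pose) @ current_pose  # relative motion: last -> current
    return current_pose @ step_pose                      # right-multiply to replay it once more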
1. Function prototype:
convert_to_sparse_bev(dense_bev_data)
Parameters:
dense_bev_data: see the data structure in 2.3) for details
Return value:
sparse_bev_data: see the data structure in 2.4) for details
2. Approach:
Sparsify dense_bev_data to reduce its dimensionality, as sketched below.
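A minimal sketch of the sparsification idea (the field names and tuple layout are assumptions, not the actual implementation): instead of storing full 256x256 maps, keep only the rows at the non-empty pixel_indices.
import numpy as np

def convert_to_sparse_bev_sketch(dense_bev_data):
    # Assumed layout: per-frame dense maps plus the shared non-empty pixel indices.
    disp_field_list, valid_pixel_maps, pixel_indices = dense_bev_data
    sparse_disp = [disp[pixel_indices[:, 0], pixel_indices[:, 1]]   # (num_pixels, 2) per frame
                   for disp in disp_field_list]
    sparse_valid = [vmap[pixel_indices[:, 0], pixel_indices[:, 1]]  # (num_pixels,) per frame
                    for vmap in valid_pixel_maps]
    return sparse_disp, sparse_valid, pixel_indices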
cui@cui-pc:~/data/nuscenes/preprocess_data $ tree
.
├── 0_0
│ ├── 0.npy
│ └── 1.npy
├── 0_1
│ ├── 0.npy
│ └── 1.npy
├── 0_2
│ ├── 0.npy
│ └── 1.npy
├── 0_3
│ ├── 0.npy
│ └── 1.npy
├── 0_4
│ ├── 0.npy
│ └── 1.npy
├── 0_5
│ ├── 0.npy
│ └── 1.npy
├── 0_6
│ ├── 0.npy
│ └── 1.npy
├── 0_7
│ ├── 0.npy
│ └── 1.npy
├── 0_8
│ ├── 0.npy
│ └── 1.npy