数据集包含一些室内的序列,在不同的纹理下,不同的光照和不同的结构条件,从RGB-D传感器采集到的数据中分别去评估物体的重建和SLAM/视觉里程计的性能。TUM提供很多数据集,从各个有利位置捕获对象,每个条目中包含图像序列、相应的轮廓和完整的校准参数。
#!/usr/bin/python
# Software License Agreement (BSD License)
#
# Copyright (c) 2013, Juergen Sturm, TUM
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above
# copyright notice, this list of conditions and the following
# disclaimer in the documentation and/or other materials provided
# with the distribution.
# * Neither the name of TUM nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
# FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
# COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
# BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#
# Requirements:
# sudo apt-get install python-argparse
"""
The Kinect provides the color and depth images in an un-synchronized way. This means that the set of time stamps from the color images do not intersect with those of the depth images. Therefore, we need some way of associating color images to depth images.
For this purpose, you can use the ''associate.py'' script. It reads the time stamps from the rgb.txt file and the depth.txt file, and joins them by finding the best matches.
"""
import argparse
import sys
import os
import numpy
import shutil
def read_file_list(filename):
"""
Reads a trajectory from a text file.
File format:
The file format is "stamp d1 d2 d3 ...", where stamp denotes the time stamp (to be matched)
and "d1 d2 d3.." is arbitary data (e.g., a 3D position and 3D orientation) associated to this timestamp.
Input:
filename -- File name
Output:
dict -- dictionary of (stamp,data) tuples
"""
file = open(filename)
data = file.read()
lines = data.replace(","," ").replace("\t"," ").split("\n")
list = [[v.strip() for v in line.split(" ") if v.strip()!=""] for line in lines if len(line)>0 and line[0]!="#"]
list = [(float(l[0]),l[1:]) for l in list if len(l)>1]
return dict(list)
def associate(first_list, second_list,offset,max_difference):
"""
Associate two dictionaries of (stamp,data). As the time stamps never match exactly, we aim
to find the closest match for every input tuple.
Input:
first_list -- first dictionary of (stamp,data) tuples
second_list -- second dictionary of (stamp,data) tuples
offset -- time offset between both dictionaries (e.g., to model the delay between the sensors)
max_difference -- search radius for candidate generation
Output:
matches -- list of matched tuples ((stamp1,data1),(stamp2,data2))
"""
first_keys = first_list.keys()
second_keys = second_list.keys()
potential_matches = [(abs(a - (b + offset)), a, b)
for a in first_keys
for b in second_keys
if abs(a - (b + offset)) < max_difference]
potential_matches.sort()
matches = []
for diff, a, b in potential_matches:
if a in first_keys and b in second_keys:
first_keys.remove(a)
second_keys.remove(b)
matches.append((a, b))
matches.sort()
return matches
if __name__ == '__main__':
# parse command line
parser = argparse.ArgumentParser(description='''
This script takes two data files with timestamps and associates them
''')
parser.add_argument('first_file', help='first text file (format: timestamp data)')
parser.add_argument('second_file', help='second text file (format: timestamp data)')
parser.add_argument('--first_only', help='only output associated lines from first file', action='store_true')
parser.add_argument('--offset', help='time offset added to the timestamps of the second file (default: 0.0)',default=0.0)
parser.add_argument('--max_difference', help='maximally allowed time difference for matching entries (default: 0.02)',default=0.02)
args = parser.parse_args()
first_list = read_file_list(args.first_file)
second_list = read_file_list(args.second_file)
matches = associate(first_list, second_list,float(args.offset),float(args.max_difference))
if not os.path.exists("rgb_sync"):
os.makedirs("rgb_sync")
if not os.path.exists("depth_sync"):
os.makedirs("depth_sync")
if args.first_only:
for a,b in matches:
print("%f %s"%(a," ".join(first_list[a])))
else:
for a,b in matches:
print("%f %s %f %s"%(a," ".join(first_list[a]),b-float(args.offset)," ".join(second_list[b])))
print("%s"%(" ".join(first_list[a])))
print(" ".join(first_list[a]).split("/")[1])
print("%s"%("rgb_sync/" + " ".join(first_list[a]).split("/")[1]))
print("%s"%(" ".join(first_list[a]).split("/")[1]))
shutil.move(" ".join(first_list[a]), "rgb_sync/" + " ".join(first_list[a]).split("/")[1])
shutil.move(" ".join(second_list[b]), "depth_sync/" + " ".join(second_list[b]).split("/")[1])
KITTI数据集由德国卡尔斯鲁厄理工学院和丰田美国技术研究院联合创办,是目前国际上最大的自动驾驶场景下的计算机视觉算法评测数据集。该数据集用于评测立体图像(stereo),光流(optical flow),视觉测距(visual odometry),3D物体检测(object detection)和3D跟踪(tracking)等计算机视觉技术在车载环境下的性能。KITTI包含市区、乡村和高速公路等场景采集的真实图像数据,每张图像中最多达15辆车和30个行人,还有各种程度的遮挡与截断。整个数据集由389对立体图像和光流图,39.2 km视觉测距序列以及超过200k 3D标注物体的图像组成,以10Hz的频率采样及同步。
用于室内MAV的双目+IMU数据集
对牛津的一部分连续的道路进行了上百次数据采集,收集到了多种天气、行人和交通情况下的数据,也有建筑和道路施工时的数据。1000小时以上。
该集旨在对RGB-D、视觉测距和SLAM算法进行基准测试。两个不同的场景(起居室和办公室场景)提供了基本事实。客厅具有3D表面地面实况以及深度图和相机姿势,因此完美地适用于不仅用于标记相机轨迹而且还用于重建。办公室场景仅带有轨迹数据,并且没有任何明确的3D模型。
RGB-D对象数据集是300个常见家庭对象的大型数据集。这些对象被分为51个类别。该数据集使用Kinect风格的3D相机来记录,该相机以30Hz记录同步和对准的640x480 RGB和深度图像。拍摄时将每个物体放置在转盘上旋转一整圈并捕获视频序列。对于每个对象,有3个视频序列,每个视频序列都安装在不同高度的摄像机上,以便从与地平线不同的角度观察对象。