SLAM系列——机器人顶刊T-RO!用于关联、建图和高级任务的物体级SLAM框架

系列文章目录

SLAM系列——第一讲 预备知识[2023.1]
SLAM系列——第二讲 初识SLAM[2023.1]
SLAM系列——第三讲 三维空间刚体运动[2023.1]
SLAM系列——第四讲 李群李代数[2023.1]
SLAM系列——第五讲 相机与图像[2023.1]
SLAM系列——第六讲 非线性优化[2023.1]
SLAM系列——第七讲 视觉里程计1[2023.1]
SLAM系列——第八讲 视觉里程计2[2023.1]
SLAM系列——第九讲 后端1[2023.1]
SLAM系列——第十讲 后端2[2023.1]
SLAM系列——第十一讲 回环检测[2023.1]
SLAM系列——第十二讲 建图[2023.1]
SLAM系列——第十三讲 实践:设计SLAM系统[2023.1]
SLAM系列——第十四讲 SLAM:现在与未来[2023.1]
SLAM系列——g2o
SLAM系列——ORB_SLAM2
SLAM系列——ORB_SLAM3
SLAM系列——机器人顶刊T-RO!用于关联、建图和高级任务的物体级SLAM框架


文章目录

  • 系列文章目录
  • Abstract
    • 关键词:视觉SLAM、数据关联、语义建图、增强现实、机器人
  • Introduction
  • II. RELATED WORK
    • A. 数据关联
    • B. 对象表示
    • C. 语义场景匹配
    • D. 主动感知与基于地图的对象抓取
  • III. SYSTEM OVERVIEW
  • IV. OBJECT-LEVEL DATA ASSOCIATION
    • A. Intersection over Union (IoU) Model
    • B. Nonparametric Test Model
    • C. Single-sample and Double-sample T-test Model
  • V. OBJECT PARAMETERIZATION
    • C. Estimation of orientation(θy)
    • D. Object pose optimization
  • VI. OBJECT DESCRIPTOR ON THE TOPOLOGICAL MAP
    • A. Semantic Topological Map
    • B. Semantic Descriptor
  • VII. OBJECT-DRIVEN ACTIVE EXPLORATION
    • A. Observation Completeness Measurement
    • B. Object-Driven Exploration
  • VIII. EXPERIMENT
    • A. Distributions of Different Statistics
    • B. Object-level Data Association Experiments
    • C. Qualitative Assessment of Object Parameterization
    • D. Object-Oriented Map Building
    • E. Augmented Reality Experiment
    • F. Object-based Scene Matching and Relocalization
    • G. Evaluation of Active Mapping
    • H. Object Grasping and Placement
    • IX. DISCUSSION AND ANALYSIS
  • X. CONCLUSION
  • REFERENCES


Abstract

Object SLAM 被认为对机器人的高层次感知与决策越来越重要。本文提出一个综合的 object SLAM 框架,专注于基于对象的感知和面向对象的机器人任务。首先,我们提出一种集成的数据关联方法,通过结合参数与非参数统计检验,在复杂条件下关联对象。此外,我们提出基于 iForest 和线对齐的对象建模方法,实现对离群点鲁棒的质心和尺度估计,并由估计出的通用对象模型构成轻量级、面向对象的地图。考虑到对象的语义不变性,我们将对象地图转换为拓扑图,并在其上构建语义描述符以实现多地图匹配。最后,我们提出一种对象驱动的主动探索策略,在抓取场景中实现自主建图。我们在一系列公共数据集以及建图、增强现实、场景匹配、重定位和机器人操作的真实实验中评估了所提出 object SLAM 框架的性能。

关键词:视觉SLAM、数据关联、语义建图、增强现实、机器人

这项工作由中国国家自然科学基金(No. 61973066)、辽宁省重大科技专项(No.2021JH1/10400049)、航天系统仿真重点实验室基金会(No.6142002200301)、可靠性装备重点实验室基金会(No.WD2C20205500306)、中央大学基础研究基金会(N2004022)支持。
Yanmin Wu 和 Xin Chen 来自东北大学机器人科学与工程学院,沈阳 110819 (Email: [email protected]
Yunzhou Zhang, Zhiqiang Deng, and Wenkai Sun,东北大学信息科学与工程学院,沈阳 110819。 (通讯作者:Yunzhou Zhang,Email:[email protected])。
Delong Zhu 就职于中国香港特别行政区新界沙田香港中文大学电子工程系。
Jian Zhang (北京大学深圳研究生院电子与计算机工程学院, 深圳 518055)

Introduction

视觉 SLAM 的准确性和效率方面的基本问题在过去二十年中得到了极大的改善,这使得视觉 SLAM 在机器人、自动驾驶和增强现实中得到广泛应用。 下一代 SLAM 将需要以更好的能力支持更智能的任务,我们称之为“几何和语义空间 AI 感知”[1]。 这将极大地扩展传统几何定位和建图的范围。

在几何感知方面(例如,基于点的外观建模和基于手工特征的定位)更多的视觉地标,例如线 [2]、边缘 [3] 和平面 [4],被用来克服环境 和运动挑战。 全向几何感知是通过视觉、热、惯性、LiDAR、GNSS 和 UWB 的多传感器融合实现的 [5]-[7]。 在车载应用中,这些通用且稳健的算法被广泛采用。 然而,由于缺乏语义线索,仅靠几何线索不足以实现智能机器人交互和主动决策,如语义映射、目标导航和目标搜索。 本文重点关注下一代SLAM的另一个方面:语义感知,旨在从语义层面表示和理解环境信息,超越基本的几何外观和位置感知。

在语义 SLAM 中,深度学习技术提供的语义线索在各种子组件中起着至关重要的作用,例如定位、建图、闭环和优化。 在这项工作中,我们专注于语义辅助映射并探索基于语义映射的多个高级应用程序。 流行的语义映射管道 [8]-[10] 将几何 SLAM 工作流和基于学习的语义分割并行化,然后用 2D 图像分割标签标注 3D 点云(或体积、网格)。 最后,将多帧分割结果与概率方法融合以构建全局语义图。 尽管这些基于点云的语义图在视觉上很吸引人,但它们并不详细,并且缺乏足够的实例特定信息来帮助机器人执行细粒度的任务。 因此,本文的第一个见解是,对机器人操作有用的语义映射应该是面向实例和面向对象的。

对象 SLAM 是语义 SLAM 的一个面向对象的分支,专注于以对象为中心实体构建地图,通常以实例级分割或对象检测作为语义网络。 大多数关于稀疏 SLAM [11]、[12] 的研究将点云与物体地标相关联,并将点云的质心作为物体的位置。 其他关于密集 SLAM 的研究 [13]-[16] 通过更密集的点云和更精确的分割/检测来改进映射结果,从而实现对象级重建和对象的密集语义表示。 尽管如此,这些研究侧重于物体位置的准确性,而没有研究物体的方向和大小,这对于操纵和导航等机器人任务来说确实是不可或缺的。 本文提出的第二个观点是,物体在地图中的位置、方向、大小都应该参数化。

对象参数化或表示是对象 SLAM 的主要任务之一。为了解决这个问题,典型的研究 [17]-[19] 通常引入对象模型作为先验,即目标对象的点云或形状是已知的,然后通过模型检索和匹配来实现对象的姿态估计;先验模型也被集成到地图中并参与对象级光束法平差。研究 [20]-[23] 是专注于类别级对象模型的示例,它们仅采用对象的部分知识(例如结构和形状)作为先验,并且每个类别只需一个模型来表示。尽管对象参数在先验实例模型或类别模型中得到了很好的编码,但获取先验知识既困难又昂贵,而且这些模型的泛化能力有限。这项工作中的第三个观察是,对象应该由具有高度通用性和低成本先验的通用模型表示,例如立方体、圆柱体和二次曲面。

总而言之,这项工作旨在提出一个对象 SLAM 框架,该框架使用通用模型生成面向对象的地图,该模型可以参数化地图中对象的位置、方向和大小。 此外,我们进一步探索基于面向对象地图的高级应用。 之前的一些研究 [24]、[25] 追求类似的目标,但遇到了以下挑战。 1)数据关联算法在处理涉及各种类别和数量的对象的复杂设置时不够稳健和准确。 2)对象参数化草率,通常依赖于严格的假设或仅实现不完整的建模,这两者在实践中都难以实现。 3) 大多数研究都集中在创建对象或语义图上,但没有探索在下游任务中的应用,也没有展示地图的实用性。 相反,我们不仅讨论对象映射的基本技术,而且还讨论高级和面向对象映射的应用程序。

在本文中,我们提出了一个对象 SLAM 框架来实现预期目标,同时克服上述挑战。 首先,我们将参数和非参数统计检验与传统的基于 IoU 的方法相结合,对数据关联进行模型集成。 与传统方法相比,我们的方法充分利用了不同统计数据的性质,例如高斯、非高斯、2D 和 3D 测量,因此在关联稳健性方面表现出显着优势。 然后,对于对象参数化,我们提供了一种用于质心、大小和方向估计的算法,以及一种基于 iForest(隔离森林)和线对齐的对象姿态初始化方法。 所提出的方法对异常值具有鲁棒性并且表现出高精度,这极大地促进了联合姿势优化过程。 最后,利用以立方体和二次曲面为代表的通用模型构建面向对象的地图。 基于地图,我们开发了一个增强现实系统以实现虚实融合和交互,为机器人手臂移植了一个框架来实现常见物体的建模和抓取,并提出了一种新的用于子场景匹配和重定位的物体描述符。

本文扩展了我们之前的工作 [26]、[27]。 扩展包括基于语义描述符的场景匹配/重定位(Sec.VI 和 Sec.VIII-F)以及扩展和分析(Sec.IX)。 贡献总结如下:

  • 我们提出了一种集成数据关联策略,可以有效地聚合对象的不同测量值以提高关联精度
  • 我们提出了一种基于 iForest 和线对齐的物体姿态估计框架,该框架对异常值具有鲁棒性,可以准确估计物体的姿态和大小。
  • 我们使用通用模型构建了一个轻量级和面向对象的地图,在此基础上我们开发了一个能够感知遮挡和碰撞的增强现实应用程序。
  • 我们将对象图扩展为拓扑图,并根据参数化对象信息设计语义描述符,以实现多场景匹配和基于对象的重定位。
  • 我们将对象 SLAM 与机器人抓取任务相结合,提出了一种对象驱动的主动探索策略,该策略考虑了对象观察的完整性和姿态估计的不确定性,实现了精确的对象映射和复杂的机器人抓取。
  • 我们提出了一个全面的对象 SLAM 框架,探索关键挑战并有力地展示了它在各种场景和任务中的实用性。

II. RELATED WORK

A. 数据关联

数据关联建立了图像帧中对象与全局地图之间的 2D-3D 关系,以及连续帧之间对象的 2D-2D 对应关系。最流行的策略将其视为对象跟踪问题 [11]、[28]、[29]。李等人 [30] 将 3D 对象投影到图像平面,然后利用投影得到的 2D 边界框,通过匈牙利算法进行对象跟踪以完成关联。一些方法 [16]、[31]-[33] 使用交并比 (IoU) 算法来跟踪帧间的对象。然而,基于跟踪的方法在复杂场景中很容易产生错误的先验,从而导致错误的关联结果。

一些研究增加了共享信息的利用率。 刘等人[34] 创建一个表示对象之间拓扑关系的描述符,具有最多共享描述符的实例被认为是相同的。 相反,杨等人[24] 建议使用检测到的对象上匹配的地图点的数量作为关联标准。 格林瓦尔德等[15] 预设了语义标签相似性的测量,而 Ok 等人。 [35] 建议利用色调饱和度直方图相关性。 Sunderhauf等人[14] 更直接地比较不同实例之间的距离。 通常,设计的标准不够普遍、详尽或稳健,导致不正确的关联。

在基于学习的研究方面,Xiang 等人 [36] 提出利用递归神经网络来实现连续图像之间的语义标签数据关联,然而他们只关注像素级关联。同样,李等人 [37] 使用基于注意力的 GNN 来维护检测到的 2D 和 3D 属性。美林等人 [38] 提出了一种基于关键点的对象级 SLAM 系统,该系统将 3D 关键点投影到图像上,作为下一帧中对象的先验。但是,此方法未在 SLAM 数据集上进行验证,也无法推广到以前未见过的对象。Xing 等人 [39] 使用深度图卷积网络提取对象特征并进行特征匹配,然而这种方法只适用于已经构建完善的地图,难以处理实时 SLAM 中增量构建的地图。

另一个可行的选择是基于概率的解决方案。 鲍曼等人[20] 使用概率方法对数据关联过程进行建模,并利用 EM 算法来识别观察到的地标之间的对应关系。 随后的研究 [40]、[41] 将概念扩展到关联动态对象或执行密集语义重建。 然而,它们的效率受到 EM 优化器的高成本的限制。 翁等[13] 提出了一个用于语义数据关联的非参数 Dirichlet 过程,它可以解决当统计不遵循高斯分布时出现的挑战。 后来,张等人[42] 和 Ran 等人[43] 介绍了用于降低关联不确定性的分层 Dirichlet 方法的两种变体。 伊克巴尔等人[12] 也展示了非参数数据关联的效率。 然而,这种策略无法正确处理高斯分布的统计数据,因此无法充分利用 SLAM 中的多样化数据。 我们结合参数和非参数方法来执行模型集成,在具有众多对象类别的复杂场景中表现出卓越的关联性能。

B. 对象表示

对象 SLAM 中的对象表示可分为基于形状重建的方法和基于模型的方法。 对于前一类,Sucar 等人[23] 使用变分自动编码器从图像中推断物体体积,然后联合优化物体形状和姿势。 王等[32] 采用 DeepSDF [44] 作为形状嵌入,最小化观察到的点云的表面一致性和深度渲染损失。 同样,Xu 等人[33] 基于预训练的 DeepSDF 训练形状完成网络,以实现部分可见物体的完整形状重建。 然而,这些方法是数据驱动的,并且显着依赖于大规模的形状先验。

基于模型的对象表示大致分为三种类型:先验实例级模型 [17]-[19]、[45]、特定类别模型和通用模型。 先前的实例级模型依赖于完善的或训练有素的数据库,例如详细的点云或 CAD 模型。 由于此类模型必须事先知道,因此其应用场景受到限制。 此外,关于特定类别模型的研究[21]-[23]侧重于识别类别级别的特征。 帕克希亚等人。 [21] 和乔希等人。 [22]通过线段的组合来表示不同的类别,但类别特定的特征不够普遍,无法描述过多的类别。

一般的对象模型由简单的几何元素表示,例如立方体、二次曲面和圆柱体,它们是最有效的模型。 有两种典型的建模方法。 第一种类型从 2D 检测结果推断 3D 姿态。 杨等。 [24] 利用消失点从单个视图中对 3D 立方体提案进行采样,然后使用几何测量优化对象姿势。尼科尔森等人[25] 结合多视图观察将对象地标参数化为受约束的对偶二次曲面。 随后的研究 [35]、[46] 通过结合形状和语义先验以及平面约束来改进二次表示。 然而,这种从 2D 对象推断出来的精度很差,并且有很大的错误。 李等[37] 应用超二次曲面自适应地在 3D 框和二次曲面之间进行调整。 但是,它们依赖于额外的 3D 对象检测。 另一种类型的方法通过 3D 点云测量来解析 3D 对象姿态。 一些研究 [11]-[13] 使用点云中心描绘物体位置,这是表达物体属性的一种不精确的方式。Runz 等人[47] 使用更准确的实例和基于几何的分割获得密集对象重建结果。 虽然对象的位置和大小是可行的,但方向被忽略了。 其他一些涉及方向估计的研究[24]、[48]、[49]利用图像或点云的几何特征进行方向采样和分析。 然而,它们面临着鲁棒性不足的问题。 相比之下,研究[50]、[51]使用基于学习的方法从图像方向回归,但在准确性和泛化方面存在问题。 在这项工作中,基于一般对象模型,我们提出了一种使用 iForest 和线对齐方法的离群鲁棒对象姿态估计算法,以更好地参数化对象大小和方向。

C. 语义场景匹配

场景匹配对于机器人重新定位、闭环和多代理协作至关重要。 传统研究 [52]、[53] 依赖于关键帧和几何特征,当面对视点、光照和外观的变化时,它们很容易失败。 相反,由于语义信息(例如,标签和大小)的时间和空间不变性,基于语义的场景匹配更有效。

加维尔等人[54] 专注于多视图机器人的全局场景匹配,他们提出了一种基于随机游走的语义描述符,以通过语义图匹配实现全局定位。 郭等[55] 研究了大规模场景匹配问题,并提出了一种基于语义直方图的快速图匹配算法,从而实现更准确、更快速的定位和地图合并。 然而,这些方法只考虑了大场景的全局匹配,忽略了局部信息。 此外,语义信息不在对象级别。刘等人[34] 对环境外观变化时的本地化问题感兴趣。 他们建议使用密集的语义拓扑图来表征场景,并通过匹配对象描述符来执行 6-DOF 对象定位。 同样,李等人[30] 关注视角变化的重新定位。 他们使用目标地标建立不同视图之间的对应关系,并通过基于匈牙利算法的图形匹配进行重定位。 然而,考虑到对象数量有限以及它们没有很好地参数化这一事实,他们的方法在具有多个重复对象的复杂设置中是值得怀疑的。为了解决多目标场景中的闭环问题,Qin 等人[56] 建议使用对象的语义标签生成语义子图,然后利用 Kuhn-Munkres 对齐子图以估计转换。 然而,语义线索仅用于确定场景的相似度,而它们之间的翻译仍然是通过几何度量而不是语义度量来计算的。 在这项工作中,我们专注于多个对象的场景匹配和场景转换。 与之前的研究类似,我们创建了一个拓扑图并设计了一个对象描述符。 在地图中,对象被完全参数化,基于对象描述符的匹配策略也得到了改进。

D. 主动感知与基于地图的对象抓取

主动感知是通过分析现有数据主动调整传感器状态以收集更有价值的信息以执行特定任务的过程,这是机器人自主性的关键特征。 张等。 [57] 利用 Fisher 信息来预测最佳传感器位置以减少定位不确定性。 曾等。 [58]利用对象之间的先验知识为主动对象搜索建立语义链接图。 更具体地说,主动建图是一种与自主地图构建有关的特定类型的主动感知任务。 查罗等人。 [59]利用二次互信息来指导 3D 密集映射。王等[60] 还利用互信息在稀疏路线图上执行下一个最佳视图 (NBV) 选择,随后作为语义地标来帮助映射过程。 克里格尔等人[61] 提出了一种针对单个未知物体的表面重建方法。 除了信息增益,他们还将重建质量的度量集成到目标函数中,实现了高精度和完整性。 主动映射的关键是定义引导代理自主移动的度量和策略。 我们提出了基于信息熵的不确定性量化和对象驱动的主动探索策略。 与其他方法的另一个显着区别是我们建议的方法的输出是与复杂机器人操作任务兼容的对象图。

用对象姿势编码的对象图可用于机器人对象操作任务,例如对象放置和排列。 和田等。 [62] 提出通过增量对象级体素映射重建对象。 体素点初始化物体姿态,然后使用ICP算法将初始化物体与CAD模型对齐以进一步优化姿态,这在很大程度上取决于CAD模型的配准精度。 在 NodeSLAM [23] 中,对象被视为地标并参与联合优化以帮助生成准确的对象图。 这种方法的主要缺陷是模型需要对每个对象进行繁琐的类别级训练过程。 Labbe’ 等人。 [19] 提出了一种单视图 6-DoF 对象姿态估计方法,并利用 SLAM 框架中的对象级束调整来优化对象图。但是,此方法仅关注已知对象。 阿尔梅达等。 [63] 利用 SLAM 框架密集映射未知物体以进行准确的抓取点检测,但未估计物体姿态。 在这项工作中,我们使用所提出的 SLAM 框架主动生成对象地图,使全局感知能够帮助机器人自主执行更智能的任务。 此外,与之前的研究不同,我们专注于未知物体的姿态估计。

III. SYSTEM OVERVIEW

Fig. 1: The proposed object SLAM framework.

所提出的对象 SLAM 框架如图 1 所示,包括四个部分。 跟踪模块建立在 ORB-SLAM2 [52] 的基础上,它生成增量稀疏点云并通过提取和匹配多视图特征来估计相机位姿。 我们的主要贡献在于剩下的三个部分。 语义模块使用 YOLO [64] 作为对象检测器来提供语义标签和边界框,然后将它们与点云测量相结合,将 2D 检测到的对象与 3D 全局对象相关联。 之后,应用 iForest 和线对齐算法对跟踪模块生成的点云和 2D 线进行细化。 基于关联和细化结果,使用立方体和二次模型对对象进行参数化。

对象地图由多个参数化对象组成,实现了环境的轻量级表示,是应用模块的重要基础。对于增强现实应用,虚拟模型的 3D 配准基于真实世界的物体位姿,而不是传统的基于点的方法。此外,我们将对象地图转换为拓扑图,即对象及其相对位姿的图表示。基于该地图,我们设计了语义描述符,以实现多场景匹配和重定位任务。

准确的物体姿态被编码在物体图中,它为机器人抓取应用提供了基本线索(例如,抓取点)。 值得注意的是,对象图是主动创建的,如探索模块中所述。 在这里,我们提出了一个不确定性测量模型来预测下一个最佳探索视图。 机械手然后主动移动以最佳视图扫描桌子,直到建立完整且准确的对象图。

简而言之,所提出的对象 SLAM 利用几何和语义测量来同时实现相机定位和对象地图构建,从而形成一个综合系统来应对该领域的各种挑战,并促进许多智能和迷人的应用。 我们论文的其余部分组织如下:IV 和 V 介绍了数据关联和对象参数化的主要理论。 语义描述符和场景匹配方法在第 VI 节中定义。 第七节介绍了主动勘探策略。 第八节通过综合实验展示了我们系统的性能。 第九节提供讨论和分析,第十节提供结论。

IV. OBJECT-LEVEL DATA ASSOCIATION

图 2 展示了所提出的数据关联策略的流程。局部对象是在当前单个视图 (t) 中观察到的 3D 实例,其点云对应于位于 2D 边界框内的 ORB 特征,质心为这些点的均值。全局对象是(t 之前)多帧观察到的、已经存在于地图中的实体,其点云和质心同样来自多视图的增量测量。数据关联的目的是确定地图中的哪个全局对象与当前视图中的局部对象相关联。如流程图所示,运动 IoU (M-IoU)、非参数 (NP) 检验、单样本 t (S-t) 检验和投影 IoU (P-IoU) 将用于判断关联是否成功。在实验中,关联成功需同时满足第四项以及前三项中的任意一项。如果关联成功,现有的全局对象将被更新;否则,将创建一个新的全局对象。最后,使用双样本 t (D-t) 检验来检查是否存在重复对象。

在本节中,使用以下符号:


A. Intersection over Union (IoU) Model

如果在前两帧(t-1 和 t-2)中观察到某个全局对象,则基于匀速运动假设预测其在当前帧(t)中的边界框,并计算该预测框与局部对象检测框之间的 IoU,我们将其定义为 Motion-IoU(见图 2 M-IoU 部分)。如果 IoU 值足够大,则两个对象之间可能存在潜在关联。在 NP 和 S-t 检验之后(参见第 IV-B 和 IV-C 节),Project-IoU 将进一步验证这种关联:把全局对象的 3D 点云投影为当前帧上的 2D 点,并为这些点拟合一个边界框,之后计算该投影框与检测框之间的 IoU(见图 2 P-IoU 部分)。
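To make the two IoU checks above concrete, here is a minimal Python sketch (not the authors' implementation): the constant-velocity box prediction, the pinhole projection step, and all function and variable names are illustrative assumptions.

```python
import numpy as np

def iou(a, b):
    # a, b: axis-aligned boxes as numpy arrays (x1, y1, x2, y2); returns intersection-over-union.
    x1, y1 = np.maximum(a[:2], b[:2])
    x2, y2 = np.minimum(a[2:], b[2:])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def motion_iou(box_t2, box_t1, det_box):
    # M-IoU: predict the global object's box at frame t with a constant-velocity
    # assumption from frames t-2 and t-1, then compare it with the detected box.
    predicted = box_t1 + (box_t1 - box_t2)
    return iou(predicted, det_box)

def project_iou(points_w, K, T_cw, det_box):
    # P-IoU: project the global object's 3D points into the current frame,
    # fit an axis-aligned box to the projections, and compare it with the detected box.
    pts_c = T_cw[:3, :3] @ points_w.T + T_cw[:3, 3:4]   # world -> camera
    uv = (K @ pts_c)[:2] / pts_c[2]                      # pinhole projection
    proj_box = np.array([uv[0].min(), uv[1].min(), uv[0].max(), uv[1].max()])
    return iou(proj_box, det_box)
```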

Fig. 2: The pipeline of object-level data association.

B. Nonparametric Test Model

M-IoU 模型提供了一种直接而有效的方法来处理连续帧的情况。但是,当 1) 物体被检测器漏检,2) 物体被遮挡,或 3) 物体离开相机视野时,它将失效。

非参数检验模型不需要对物体进行连续观测,可以直接用于处理两组点云 P 和 Q(见图 2 NP 部分),其基于点云服从非高斯分布的假设(这将在第 VIII-A 节中得到验证)。理论上,如果 P 和 Q 代表同一个对象,它们应服从相同的分布,即 $f_P = f_Q$。我们使用 Wilcoxon 秩和检验 [65] 来验证该原假设是否成立。

我们首先混合两个点云 $X=[P|Q]=[x_1,x_2,...,x_{|X|}] \in \R^{3\times(|P|+|Q|)}$,然后将 X 沿三个维度分别排序。定义 $W_P\in \R^{3\times1}$ 如下,

$$W_P=\Big\{\sum_{k=1}^{|X|}\mathcal{R}(x_k)\,1\{x_k\in P\}\Big\}-\frac{|P|(|P|+1)}{2},$$

$W_Q$ 按同样的方式计算。Mann-Whitney 统计量为 $W=\min(W_P, W_Q)$,可以证明其渐近服从高斯分布 [66]、[67]。在这里,我们本质上是用非高斯的点云构建出服从高斯分布的统计量。W 的均值和方差计算如下:

$$m(W)=\frac{|P||Q|}{2},$$

$$\sigma(W)=\frac{|P||Q|\Delta^+}{12} - \frac{|P||Q|\left(\Sigma_i\tau_i^3-\Sigma_i\tau_i\right)}{12(|P|+|Q|)\Delta^-},$$

其中 $\Delta^+ = |P| + |Q| + 1$,$\Delta^- = |P| + |Q| - 1$,且 $\tau \in P \cap Q$,$\tau$ 表示两个对象之间共享点的数量;由于其值很小,式 (3) 中这一复杂且贡献很低的第二项在我们的实现中被忽略。

为使原假设成立,W 应满足以下约束条件:

$$f(W) \ge f(r_r) = f(r_l) = \alpha/2,$$

其中 $\alpha$ 是显著性水平,$1-\alpha$ 是置信水平,$[r_l, r_r] \approx [m - s\sqrt{\sigma}, m + s\sqrt{\sigma}]$ 定义置信区间。标量 $s > 0$ 由归一化高斯分布 $\mathcal N(s|0, 1)=\alpha$ 确定。总之,如果两个点云 P 和 Q 的 Mann-Whitney 统计量 W 满足式 (4),我们暂时假设它们来自同一个对象。
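As a concrete illustration of this test, the sketch below applies scipy's Mann-Whitney U (rank-sum) test to each coordinate axis of two point clouds and accepts the association only when the null hypothesis $f_P=f_Q$ is not rejected on all three axes; the per-axis combination rule and the significance level are assumptions made here for illustration, not the authors' code.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def np_test_same_object(P, Q, alpha=0.05):
    # P: (N, 3) local object point cloud; Q: (M, 3) global object point cloud.
    # Returns True when the rank-sum test fails to reject f_P = f_Q on every axis.
    for d in range(3):
        result = mannwhitneyu(P[:, d], Q[:, d], alternative="two-sided")
        if result.pvalue < alpha:   # null hypothesis rejected on this axis
            return False
    return True

# Toy check: two samples drawn from the same non-Gaussian distribution.
rng = np.random.default_rng(0)
P = rng.exponential(scale=1.0, size=(200, 3))
Q = rng.exponential(scale=1.0, size=(150, 3))
print(np_test_same_object(P, Q))   # usually True
```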

C. Single-sample and Double-sample T-test Model

单样本 t 检验用于处理在不同视图中观察到的对象质心(参见图 2 S-t 部分),质心误差通常服从高斯分布(参见第 VIII-A 节)。设 C 为全局对象的历史质心集合,c 为当前帧中局部对象的质心。假设原假设为 C 和 c 来自同一个对象,定义 t 统计量如下,

$$t=\frac{m(C)-c}{\sigma(C)/\sqrt{|C|}}\sim t(|C|-1).$$

为了使零假设成立,t 应该满足:

$$f(t)\ge f(t_{\alpha/2,v})=\alpha/2,$$

其中 $t_{\alpha/2,v}$ 是自由度为 v 的 t 分布的上 $\alpha/2$ 分位数,$v = |C|-1$。如果 t 统计量满足式 (6),我们暂时假设 c 和 C 来自同一个对象。

由于上述严格的数据关联策略、不良的观察视图或错误的对象检测,某些现有对象可能会被错误识别为新对象,从而导致重复。 因此,利用双样本 t 检验通过分析两个对象的历史质心来确定是否合并两个对象(见图 2 D-t 部分)。

如下构造 $C_1$ 和 $C_2$ 的 t 统计量,

$$t=\frac{m(C_1)-m(C_2)}{\sigma_d}\sim t(|C_1|+|C_2|-2),$$

$$\sigma_d=\sqrt{\frac{(|C_1|-1)\sigma_1^2+(|C_2|-1)\sigma_2^2}{|C_1|+|C_2|-2}\left(\frac{1}{|C_1|}+\frac{1}{|C_2|}\right)},$$

其中 $\sigma_d$ 是两个对象的合并标准差。同样,若取 $v = |C_1| + |C_2| - 2$ 时 t 统计量满足式 (6),则表示 $C_1$ 和 $C_2$ 属于同一个对象,将二者合并。
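The two t-tests can be reproduced directly with scipy; the sketch below is an illustrative reading of Eq. (5)-(8) (not the authors' code), applying the tests per coordinate axis with an assumed significance level.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_ind

ALPHA = 0.05   # significance level, an illustrative choice

def single_sample_t_ok(C, c):
    # C: (K, 3) history of a global object's centroids; c: (3,) current local centroid.
    # The association is kept if the null hypothesis is not rejected on any axis (Eq. 5-6).
    return all(ttest_1samp(C[:, d], c[d]).pvalue >= ALPHA for d in range(3))

def double_sample_t_merge(C1, C2):
    # C1, C2: centroid histories of two map objects; True means they should be merged
    # (Eq. 7-8); pooled (equal) variances are assumed, as in the text.
    return all(ttest_ind(C1[:, d], C2[:, d], equal_var=True).pvalue >= ALPHA for d in range(3))
```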

Fig. 3: (a-c) Demonstration of object parameterization. (d-e) Demonstration of iForest.

V. OBJECT PARAMETERIZATION

数据关联为全局对象提供了多视图测量,使我们能够利用更多的观测对物体进行有效的参数化建模。在本节中,使用以下符号:

  • $t=[t_x,t_y,t_z]^T$ - 对象坐标系在世界坐标系中的平移(位置)。
  • $\theta=[\theta_r,\theta_y,\theta_p]^T$ - 对象坐标系相对于世界坐标系的旋转,$R(\theta)$ 为其矩阵表示。
  • $T=\{R(\theta),t\}$ - 对象坐标系到世界坐标系的变换矩阵。
  • $s=[s_l,s_w,s_h]^T$ - 3D 框的半边长,即对象的尺度。
  • $P_o,P_w\in\R^{3\times8}$ - 立方体八个顶点分别在对象坐标系和世界坐标系中的坐标。
  • $Q_o,Q_w\in\R^{4\times4}$ - 分别在对象坐标系和世界坐标系中由半轴参数化的二次曲面,其中 $Q_o=diag\{s_l^2,s_w^2,s_h^2,-1\}$。
  • $\alpha(*)$ - 计算图像中线段角度的算子。
  • $K,T_c$ - 相机内参和外参。
  • $p\in\R^{3\times1}$ - 世界坐标系中一个点的坐标。

A. 对象表示

在这项工作中,我们利用立方体和二次曲面/圆柱体来表示对象,而不是复杂的实例级或类别级模型。对于书本、键盘、椅子等具有规则形状的对象,我们使用立方体(由其顶点 $P_o$ 编码)来表示;对于球、瓶、杯等没有明确方向的物体,采用二次曲面/圆柱体(由其半轴 $Q_o$ 编码)表示,并忽略其方向参数。这里,$P_o$ 和 $Q_o$ 在对象坐标系中表示,仅取决于尺度 s。为了将这些元素注册到全局地图,我们还需要估计它们相对于全局坐标系的平移 t 和方向 θ。全局坐标系中的立方体和二次曲面表示如下:

$$P_w=R(\theta)P_o+t,$$

$$Q_w=TQ_oT^T.$$

这两种模型可以方便地相互转换,如图 3(c) 所示。与其他工作 [68]、[69] 一样,假设物体平行于地面放置,即 $\theta_r=\theta_p=0$,这样对于立方体我们只需估计 $[\theta_y, t, s]$,对于二次曲面只需估计 $[t, s]$。
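The following sketch instantiates Eq. (9) and (10) under the ground-parallel assumption above; the choice of the vertical axis for the yaw rotation and all helper names are assumptions for illustration, not the authors' code.

```python
import numpy as np

def yaw_rotation(theta_y):
    # Rotation about the z axis (assumed to be the up axis here).
    c, s = np.cos(theta_y), np.sin(theta_y)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def cube_vertices_world(s, theta_y, t):
    # s = (s_l, s_w, s_h) half extents; returns the 8 cube vertices in the world frame (Eq. 9).
    sl, sw, sh = s
    signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    P_o = signs * np.array([sl, sw, sh])            # vertices in the object frame, shape (8, 3)
    return (yaw_rotation(theta_y) @ P_o.T).T + np.asarray(t)

def quadric_world(s, theta_y, t):
    # Q_w = T Q_o T^T with Q_o = diag(s_l^2, s_w^2, s_h^2, -1) (Eq. 10).
    Q_o = np.diag([s[0] ** 2, s[1] ** 2, s[2] ** 2, -1.0])
    T = np.eye(4)
    T[:3, :3] = yaw_rotation(theta_y)
    T[:3, 3] = t
    return T @ Q_o @ T.T
```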

Fig. 4: Line alignment to initialize object orientation. (a) Object and line detection in 2D image. (b-d) Angle sampling in 3D space; (e-g) Projection of angle sampling process in 2D images.

B. Estimation of translation(t) and scale(s)

Assuming that the object point clouds X are in the global frame, we follow conventions and denote its mean by t, based on which the scale can be calculated by s = (max(X) − min(X))/2, as shown in Fig.3(a). The main challenge here is that X is typical with many outliers, which will introduce a substantial bias to t and s. One of our major contributions in this paper is the development of an outlier-robust centroid and scale estimation algorithm based on the iForest [70] to improve the estimation accuracy. The detailed procedure of our algorithm is presented in Alg. 1.

假设目标点云 X 在全局坐标系中,我们按照惯例将其均值记为 t,据此可以计算尺度 s = (max(X) − min(X))/2,如图 3(a) 所示。这里的主要挑战在于 X 中通常包含许多离群点,这会给 t 和 s 带来很大的偏差。我们在本文中的主要贡献之一,是开发了一种基于 iForest [70] 的、对离群点鲁棒的质心和尺度估计算法,以提高估计精度。我们算法的详细过程在算法 1 中给出。

The key idea of the algorithm is to recursively separate the data space into a series of isolated data points, and then take the easily isolated ones as outliers. The philosophy is that, normal points are typically located more closely and thus need more steps to isolate, while the outliers usually scatter sparsely and can be easily isolated with fewer steps. As indicated by the algorithm, we first create t isolated trees (the iForest) using the point cloud of an object (lines 2 and 14-33), and then identify the outliers by counting the path length of each point x ∈ X (lines 3-9), in which the score function is defined as follows:

该算法的核心思想是递归地分割数据空间,直到每个数据点都被孤立,然后将容易被孤立的数据点视为离群点。其原理是:正常点通常分布得更紧密,因此需要更多的分割步骤才能被孤立;而离群点通常分布稀疏,用较少的步骤就能被孤立。如算法所示,我们首先使用对象的点云创建 t 棵孤立树(即 iForest,第 2 行和第 14-33 行),然后通过计算每个点 x ∈ X 的路径长度来识别离群点(第 3-9 行),其中评分函数定义如下:

$$s(x)=2^{-\frac{E(h(x))}{C}},$$

$$C=2H(|X|-1)-\frac{2(|X|-1)}{|X|},$$

where C is a normalization parameter, H is a harmonic number H(i) = ln (i) + 0.5772156649, h(x) is the height of point x in the isolated tree, and E is the operation to calculate the average height. As demonstrated in Fig. 3(d)-(e), the yellow point is isolated after four steps; hence its path length is 4, whereas the green point has a path length of 8. Therefore, the yellow point is more likely to be an outlier. In our implementation, points with a score greater than 0.6 are removed and the remaining are used to calculate t and s (lines 10-12). Based on s, we can initially construct the cubics and quadratics in the object frame, as shown in Fig. 3(a)-©.

其中 C 为归一化参数,H 为调和数 H(i) = ln(i) + 0.5772156649,h(x) 为点 x 在孤立树中的高度,E 为计算平均高度的操作。如图 3(d)-(e) 所示,黄色点经过四步即被孤立,因此其路径长度为 4,而绿色点的路径长度为 8,因此黄色点更可能是离群点。在我们的实现中,分数大于 0.6 的点被删除,剩余的点用于计算 t 和 s(第 10-12 行)。基于 s,我们可以初步构造对象坐标系中的立方体和二次曲面,如图 3(a)-(c) 所示。
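For readers who want to try the idea quickly, the sketch below uses scikit-learn's IsolationForest as a stand-in for Alg. 1; note that sklearn's anomaly-score convention differs from Eq. (11), so a contamination ratio replaces the fixed 0.6 threshold, and all parameter values are illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def robust_centroid_and_scale(X, contamination=0.1, n_trees=100):
    # X: (N, 3) point cloud of one object. Returns (t, s) after removing outliers.
    # IsolationForest plays the role of the iForest in Alg. 1.
    forest = IsolationForest(n_estimators=n_trees, contamination=contamination, random_state=0)
    inliers = X[forest.fit_predict(X) == 1]          # +1 = inlier, -1 = outlier
    t = inliers.mean(axis=0)                         # origin of the object frame
    s = (inliers.max(axis=0) - inliers.min(axis=0)) / 2.0   # half extents
    return t, s

# Toy usage: a box-shaped cloud contaminated by a few gross outliers.
rng = np.random.default_rng(1)
pts = np.vstack([rng.uniform(-0.2, 0.2, (300, 3)), rng.uniform(2.0, 3.0, (10, 3))])
print(robust_centroid_and_scale(pts))
```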

C. Estimation of orientation(θy)

The estimation of θy is divided into two steps, namely, to find a good initial value for θy first and then conduct numerical optimization based on the initial value. Since pose estimation is a non-linear process, a good initialization is very important to help improve the optimality of the estimation result. Conventional methods [30], [47] usually neglect the initialization process, which typically yields inaccurate results.

θy的估计分为两步,即先为θy找到一个好的初值,然后根据初值进行数值优化。 由于姿态估计是一个非线性过程,因此良好的初始化对于帮助提高估计结果的最优性非常重要。 传统方法 [30]、[47] 通常忽略初始化过程,这通常会产生不准确的结果。

Algorithm 1 Centroid and Scale Estimation Based on iForest

Input: X - The point cloud of an object, t - The number of iTrees in iForest, ψ - The subsampling size for an iTree.
Output: F - The iForest, a set of iTrees, t - The origin of the local frame, s - The initial scale of the object.
1: procedure PARAOBJECT(X, t, ψ)
2:     F ← BUILDFOREST(X, t, ψ)
3:     for point x in X do
4:         E(h) ← averageDepth(x, F)
5:         s ← score(E(h), C)                ▷ Eq. (11) and (12)
6:         if s > 0.6 then                   ▷ an empirical value
7:             remove(x)                     ▷ remove x from X
8:         end if
9:     end for
10:    t ← meanValue(X)
11:    s ← (max(X) - min(X)) / 2
12:    return F, t, s
13: end procedure
14: procedure BUILDFOREST(X, t, ψ)
15:    F ← φ
16:    l ← ceiling(log2 ψ)                   ▷ maximum number of iterations
17:    for i = 1 to t do
18:        X(i) ← randomSample(X, ψ)
19:        F ← F ∪ BUILDTREE(X(i), 0, l)
20:    end for
21:    return F
22: end procedure
23: procedure BUILDTREE(X, e, l)
24:    if e ≥ l or |X| ≤ 1 then
25:        return exNode{X}
26:    end if
27:    i ← randomDim(1, 3)                   ▷ pick one dimension
28:    q ← randomSplitPoint(X[i])
29:    Xl, Xr ← split(X[i], q)
30:    L ← BUILDTREE(Xl, e + 1, l)           ▷ get child pointer
31:    R ← BUILDTREE(Xr, e + 1, l)
32:    return inNode{L, R, i, q}
33: end procedure

The detail of orientation initialization algorithm is presented in Alg. 2. The inputs are obtained as follows: 1) LSD (Line Segment Detector [71]) segments are extracted from t consecutive images, and those falling in the bounding boxes are assigned to the corresponding objects (see Fig. 4(a)); 2) The initial pose of an object is assumed to be consistent with the global frame, i.e., θ0=0 (see Fig. 4b). In the algorithm, we first uniformly sample thirty angles within [−π/2, π/2] (line 2). For each sample, we then evaluate its score by calculating the accumulated angle errors between LSD segments Zlsd and the projected 2D edges of 3D edges Z of the cube (lines 3-12). The error is defined as follows:

方向初始化算法的细节在算法 2 中给出。其输入如下:1)从 t 个连续图像中提取 LSD(Line Segment Detector [71])线段,并将落在边界框内的线段分配给相应的对象(见图 4(a));2)假设物体的初始姿态与全局坐标系一致,即 θ0=0(见图 4(b))。在该算法中,我们首先在 [−π/2, π/2] 内均匀采样三十个角度(第 2 行)。然后对于每个采样,通过计算 LSD 线段 Zlsd 与立方体 3D 边 Z 投影得到的 2D 边之间的累积角度误差来评估其得分(第 3-12 行)。误差定义如下:

$$e(\theta)=\big\|\alpha(\hat Z(\theta))-\alpha(Z_{lsd})\big\|^2,\qquad \hat Z(\theta)=KT_c(R(\theta)Z+t).$$

Algorithm 2 Initialization for Object Pose Estimation
Input: Z1, Z2, . . . , Zt - Line segments detected by LSD in t consecutive images, θ0 - The initial guess of the yaw angle.
Output: θ* - The estimation result of the yaw angle, e - The estimation errors.
1: S, E ← φ
2: Θ ← sampleAngles(θ0, 30)                  ▷ see Fig. 4 (b)-(d)
3: for sample θ in Θ do
4:     sθ, eθ ← 0
5:     for Z in {Z1, Z2, . . . , Zt} do
6:         s, e ← score(θ, Z)                ▷ Eq. (13) and (14)
7:         sθ ← sθ + s
8:         eθ ← eθ + e
9:     end for
10:    S ← S ∪ {sθ}
11:    E ← E ∪ {eθ}
12: end for
13: θ* ← argmax(S)
14: return θ*, eθ*

The demonstration of the calculation of e(θ) is visualized in Fig. 4(e)-(g). The score function is defined as follows:

e(θ) 的计算演示如图 4(e)-(g) 所示。 得分函数定义如下:

$$Score=\frac{N_p}{N_a}\big(1+0.1(\xi-E(e))\big),$$

where Na is the total number of line segments of the object in the current frame, Np is the number of line segments that satisfy e < ξ, ξ is a manually defined error threshold (five degrees here), and E(e) is the average error of these line segments with e < ξ. After evaluating all the samples, we choose the one that achieves the highest score as the initial yaw angle for optimization (line 13).

其中 Na 为当前帧中该物体的线段总数,Np 为满足 e < ξ 的线段数,ξ 为手动设定的误差阈值(此处为五度),E(e) 为这些满足 e < ξ 的线段的平均误差。在评估完所有采样后,我们选择得分最高的采样作为初始偏航角,用于后续优化(第 13 行)。
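A compact sketch of the sampling-and-scoring loop of Alg. 2 and Eq. (13)-(14) is given below; the projection of the cube edges is assumed to be provided by an external `project_edges(theta)` helper, and the angle-difference wrapping and parameter values are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def segment_angle(seg):
    # seg = (x1, y1, x2, y2); angle of a 2D line segment in degrees, folded into [0, 180).
    return np.degrees(np.arctan2(seg[3] - seg[1], seg[2] - seg[0])) % 180.0

def score_sample(cube_edges_2d, lsd_segments, xi=5.0):
    # cube_edges_2d: projected 2D edges of the cube hypothesis at the sampled yaw (Eq. 13);
    # lsd_segments: LSD detections assigned to this object. Returns the score of Eq. (14).
    errors = []
    for seg in lsd_segments:
        e = min(abs(segment_angle(seg) - segment_angle(edge)) for edge in cube_edges_2d)
        errors.append(min(e, 180.0 - e))             # wrap the angular difference
    errors = np.asarray(errors)
    good = errors < xi
    if not np.any(good):
        return 0.0
    return good.sum() / len(errors) * (1.0 + 0.1 * (xi - errors[good].mean()))

def initialize_yaw(project_edges, lsd_per_frame, n_samples=30):
    # project_edges(theta) -> list of 2D cube edges; lsd_per_frame: list of LSD segment lists.
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_samples)
    scores = [sum(score_sample(project_edges(th), segs) for segs in lsd_per_frame)
              for th in thetas]
    return thetas[int(np.argmax(scores))]
```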

D. Object pose optimization

After obtaining the initial s and θy, we then jointly optimize object and camera poses:

在获得初始 s 和 θy 之后,我们然后联合优化对象和相机位姿:

$$\{O,T_c\}^*=\mathop{\arg\min}_{\{\theta_y,s\}}\sum\big(e(\theta)+e(s)\big)+\mathop{\arg\min}_{\{T_c\}}\sum e(p),$$

where the first term is the object pose error defined in Eq. (13) and the scale error e(s) is defined as the distance between the projected edges of a cube and their nearest parallel LSD segments. The second term e(p) is the commonly-used reprojection error in the traditional SLAM framework.

其中第一项是式 (13) 中定义的物体位姿误差,尺度误差 e(s) 定义为立方体的投影边缘与其最近的平行 LSD 线段之间的距离;第二项 e(p) 是传统 SLAM 框架中常用的重投影误差。

VI. OBJECT DESCRIPTOR ON THE TOPOLOGICAL MAP

After the step of object parameterization, we obtain the label, size, and pose information of a single object. To present the relationship between objects and that between objects and the scene, we create a topological map. The map is then used to generate an object descriptor for scene matching.

在对象参数化步骤之后,我们获得了单个对象的标签、大小和位姿信息。 为了呈现对象之间以及对象与场景之间的关系,我们创建了一个拓扑图。 该地图随后用于生成用于场景匹配的对象描述符。

A. Semantic Topological Map

The topological map is an abstract representation of the scene. In this work, to construct the semantic topological map, the 3D object centroid is used to represent the node N that encodes the semantic label l and the object parameters t, θ, s. Then, under the distance and number constraints, we generate the undirected edge E between objects, which includes the distance d and angle α of two objects:

拓扑图是场景的抽象表示。 在这项工作中,为了构建语义拓扑图,使用 3D 对象质心表示编码语义标签 l 和对象参数 t、θ、s 的节点 N。 然后,在距离和数量约束下,我们生成对象之间的无向边 E,它包括两个对象的距离 d 和角度 α:

$$N=\langle l,t,\theta,s\rangle,\qquad E=\langle d,\alpha\rangle.$$

Fig. 5(a) presents a real-world scene with multiple objects. Fig. 5(b) shows the object modeling result by the method of Section V, which is then used to create a semantic topological map (Fig. 5©) that expresses the scene in an abstract way and shows the connection relationship between objects as symbolized in Eq. (16).

图 5(a) 展示了一个包含多个物体的真实场景。图 5(b) 展示了通过第 V 节方法得到的对象建模结果,随后用其创建语义拓扑图(图 5(c)),以抽象的方式表达场景,并按式 (16) 的形式表示对象之间的连接关系。
Fig. 5: (a) Real-world scene. (b) Object-level map. © Semantic topological map. (d) Random walk descriptor. (e) 3D matrix visualization of a single descriptor.
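A minimal construction of such a topological map could look like the sketch below; the specific distance and degree limits are illustrative stand-ins for the "distance and number constraints" mentioned above, and the angle is reduced to a horizontal bearing for simplicity. This is a sketch of the idea, not the authors' code.

```python
import numpy as np
from itertools import combinations

def build_topological_map(objects, max_dist=3.0, max_degree=5):
    # objects: list of dicts with keys 'label', 't' (3,), 'theta', 's'.
    # Returns (nodes, edges); each edge stores the distance d and angle alpha of Eq. (16).
    nodes = [dict(label=o["label"], t=np.asarray(o["t"]), theta=o["theta"], s=o["s"])
             for o in objects]
    edges = {}
    for i, j in combinations(range(len(nodes)), 2):
        diff = nodes[j]["t"] - nodes[i]["t"]
        d = float(np.linalg.norm(diff))
        if d > max_dist:                      # distance constraint
            continue
        alpha = float(np.arctan2(diff[1], diff[0]))   # bearing in the horizontal plane
        edges[(i, j)] = dict(d=d, alpha=alpha)
    # number constraint: keep at most max_degree shortest edges per node
    keep = set()
    for i in range(len(nodes)):
        adj = sorted([k for k in edges if i in k], key=lambda k: edges[k]["d"])[:max_degree]
        keep.update(adj)
    edges = {k: v for k, v in edges.items() if k in keep}
    return nodes, edges
```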

B. Semantic Descriptor

Since the object information, including semantic label, position, and scale, is not unique, the computation for undirected graph matching, an NP problem [72], is extremely high. To reduce the computational complexity and enhance the matching accuracy, we introduce a random-walk descriptor that weights multi-neighborhood measurements to describe an object, improving object uniqueness and the relationship with the scene.

由于对象信息(包括语义标签、位置和比例)不是唯一的,因此无向图匹配(NP 问题 [72])的计算量非常大。 为了降低计算复杂度并提高匹配精度,我们引入了一种随机游走描述符,该描述符对多邻域测量进行加权以描述对象,从而提高对象的唯一性和与场景的关系。

The random-walk descriptor is represented by a 2D matrix, as shown in Fig. 5(d), with each row storing a walking route that starts at the described object, and randomly points to the next object. It is worth noting that each object only appears once in a route and the process ends when reaching a certain depth i or time j limit.

随机游走描述符由一个二维矩阵表示,如图 5(d) 所示,每一行存储一条步行路线,该路线从所描述的对象开始,并随机指向下一个对象。 值得注意的是,每个对象在一条路线中只出现一次,当达到一定的深度 i 或时间 j 限制时,该过程结束。

The previous work [54] only considers the semantic label l = (l1, l2, · · · , li) as the descriptor. Benefiting from the above accurate object parameterization, we add three additional measurements, object size s = (s1, s2, · · · , si), distance d = (d11, d12, · · · , d1i), and angle α = (α11, α12, · · · , α1i), to improve the robustness of the descriptor. As shown in Fig. 5(e), thus we transfer the random-walk descriptor to a 3D matrix form:

之前的工作[54]只考虑语义标签l = (l1, l2,···,li)作为描述符。 受益于上述精确的对象参数化,我们添加了三个额外的测量值,对象大小 s = (s1, s2, · · · , si),距离 d = (d11, d12, · · · , d1i),角度 α = ( α11,α12,····,α1i),提高描述符的鲁棒性。 如图 5(e) 所示,我们将随机游走描述符转换为 3D 矩阵形式:

$$v=(r_1,r_2,\cdots,r_j)^T,\qquad r_j=(l,s,d,\alpha)^T.$$

In our implementation, the additional measurement does not increase the computation. Instead, it accelerates the matching process by eliminating irrelevant candidates with more clues, such as label and size.

在我们的实现中,额外的测量不会增加计算量。 相反,它通过消除具有更多线索(例如标签和大小)的不相关候选者来加速匹配过程。
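Continuing with the nodes/edges structure sketched above, the descriptor of Eq. (17) can be generated roughly as follows; object labels are assumed here to be integer class ids, the size entry is summarized by a single scalar, and the walk depth and count are illustrative choices rather than the paper's settings.

```python
import numpy as np
import random

def random_walk_descriptor(nodes, edges, start, n_walks=20, depth=4, seed=0):
    # Returns an (n_walks, depth, 4) array; each step stores (label, size, distance, angle),
    # i.e. one row of the 3D descriptor matrix in Eq. (17).
    rng = random.Random(seed)
    adj = {i: [] for i in range(len(nodes))}
    for (i, j), e in edges.items():
        adj[i].append((j, e))
        adj[j].append((i, e))
    desc = np.zeros((n_walks, depth, 4))
    for w in range(n_walks):
        current, visited = start, {start}
        for step in range(depth):
            candidates = [(j, e) for j, e in adj[current] if j not in visited]
            if not candidates:                 # each object appears at most once per route
                break
            nxt, e = rng.choice(candidates)
            size = float(np.prod(nodes[nxt]["s"]))   # scalar summary of the object size
            desc[w, step] = (nodes[nxt]["label"], size, e["d"], e["alpha"])
            visited.add(nxt)
            current = nxt
    return desc
```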

Alg. 3 describes the procedure for scene matching. Firstly, each object’s semantic descriptor is generated in two independent sub-topological maps (lines 3-4, 10-17). Then find the best matching object-pair by scoring the similarity of each element (l, s, d, α) (lines 5-7, 18-21). Finally, the transformation between two scenes is solved by singular value decomposition (SVD) according to the multiple object pairs (line 8).

算法 3 描述了场景匹配的过程。首先,在两个独立的子拓扑图中为每个对象生成语义描述符(第 3-4、10-17 行)。然后,通过对每个元素 (l, s, d, α) 的相似性进行评分,找到最佳匹配的对象对(第 5-7、18-21 行)。最后,根据多个匹配的对象对,通过奇异值分解 (SVD) 求解两个场景之间的变换(第 8 行)。

There are some points worth mentioning: 1) Scale ambiguity: Two maps are initialized with different depths resulting in distinct scales. While object size, like Li et al. [50], provides a scale by length, width, and height, it is insufficiently robust. Instead, we find the matched object pair between two maps, then calculate the scale factor by averaging the ratio of the distance d. 2) Anomalous object: The mismatch resulting from the error object or novel object may cause a considerable inaccuracy in the resolution of the translation; therefore, the RANSAC algorithm is used to eliminate the disturbance caused by anomalous objects.

有几点值得一提:1)尺度模糊:两张地图以不同的深度初始化,导致尺度不同。虽然像 Li 等人 [50] 那样,对象的长、宽、高可以提供尺度信息,但其鲁棒性不足。因此,我们先找到两张地图之间匹配的对象对,再通过对距离 d 的比值取平均来计算尺度因子。2)异常对象:由错误对象或新出现的对象引起的误匹配,可能使平移的求解产生相当大的误差;因此,采用 RANSAC 算法来消除异常对象带来的干扰。
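The SVD step of Alg. 3, together with the distance-ratio scale estimate described above, can be written in closed form as below (an Umeyama-style sketch under the stated assumptions, not the authors' code); the RANSAC loop for rejecting anomalous objects is omitted for brevity.

```python
import numpy as np

def align_matched_objects(A, B):
    # A, B: (N, 3) matched object centroids from two maps (A in map 1, B in map 2).
    # Returns (scale, R, t) such that B ≈ scale * R @ A + t.
    mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
    Ac, Bc = A - mu_a, B - mu_b
    # scale from the average ratio of centroid-to-center distances, as described above
    da = np.linalg.norm(Ac, axis=1)
    db = np.linalg.norm(Bc, axis=1)
    scale = float(np.mean(db[da > 1e-6] / da[da > 1e-6]))
    # orthogonal Procrustes via SVD, keeping a proper rotation
    U, _, Vt = np.linalg.svd(Bc.T @ (Ac * scale))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt
    t = mu_b - scale * R @ mu_a
    return scale, R, t
```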

VII. OBJECT-DRIVEN ACTIVE EXPLORATION

Object parameterization is good for quantifying the incompleteness of the object or map, and the incompleteness provides a driving force for active exploration. We consider the robotic grasping scene as an example. As shown in Fig. 6, the robot arm is fitted with a camera, the motion module controls the robot to execute observation commands. The perception module parametrized the object map by Section IV and Section V. The analysis module measures object uncertainty and predicts different camera views’ information gains. The view with the greatest information gain is selected as the Next Best View (NBV) and passed to the motion module to enable active exploration. We aim to incrementally build a global object map with the minimum effort and the maximum accuracy for robotic grasping.

对象参数化有利于量化对象或地图的不完整性,不完整性为主动探索提供动力。 我们以机器人抓取场景为例。 如图6所示,机器人手臂上装有摄像头,运动模块控制机器人执行观察指令。 感知模块通过第四节和第五节对对象图进行参数化。分析模块测量对象的不确定性并预测不同相机视图的信息增益。 具有最大信息增益的视图被选为下一个最佳视图 (NBV),并传递给运动模块以启用主动探索。 我们的目标是以最小的努力和最大的机器人抓取精度逐步构建全局对象图。
Fig. 7: Illustration of observation completeness measurement. Left: Raw image. Center: Objects with point cloud. Right: Objects with surface grids.

A. Observation Completeness Measurement

We focus on active map building and regard the incompleteness of the map as a motivating factor for active exploration. Existing studies usually take the entire environment as the exploration target [59], [73] or focus on reconstructing a single object [61], [74], neither of which is ideal for building the object map required by robotic grasping. The reasons are as follows: 1) The insignificant environmental regions will interfere with the decisions made for exploration and misguide the robot into the non-object area; 2) it will significantly increase the computational cost and thus reduce the efficiency of the whole system. We propose an object-driven active exploration strategy for building the object map incrementally. The strategy is designed based on the observation completeness of the object, which is defined as follows.

我们专注于主动地图构建,并将地图的不完整性视为主动探索的激励因素。 现有研究通常将整个环境作为探索目标 [59]、[73] 或专注于重建单个对象 [61]、[74],这两种方法都不是构建机器人抓取所需对象图的理想选择。 原因如下:1)无关紧要的环境区域会干扰探索决策,误导机器人进入非目标区域; 2) 会显着增加计算成本,从而降低整个系统的效率。 我们提出了一种对象驱动的主动探索策略,用于逐步构建对象图。 该策略是基于对象的观察完整性设计的,其定义如下。

As demonstrated in Fig. 7, the point clouds of an object are translated from the world frame to the object frame and then projected onto the five surfaces of the estimated 3D cube. Here, the bottom face is not considered. Each of the five surfaces is discretized into a surface occupancy grid map [75] with cell size m ∗ m (m = 1cm in our implementation). Each grid cell can be in one of three states:

如图 7 所示,对象的点云从世界框架转换到对象框架,然后投影到估计的 3D 立方体的五个表面上。 这里不考虑底面。 五个表面中的每一个都被离散化为表面占用网格图 [75],单元格大小为 m * m(在我们的实现中 m = 1cm)。 每个网格单元可以处于三种状态之一:

  • unknown: the grid is not observed by the camera;

  • occupied: the grid is occupied by the point clouds;

  • free: the grid can be seen by the camera but is not occupied by the point clouds.

  • unknown:相机没有观察到网格;

  • occupied:网格被点云占据;

  • free:网格可以被相机看到但不被点云占据

We use information entropy [76] to determine the completeness of observations based on the occupancy grid map, as information entropy has the property of symptomatizing uncertainty. The entropy of each grid cell is defined by a binary entropy function:

我们使用信息熵[76]来确定基于占用网格图的观察的完整性,因为信息熵具有表征不确定性的特性。 每个网格单元的熵由二元熵函数定义:

$$H_{grid}(p)=-p\log(p)-(1-p)\log(1-p),$$

where p is the probability of a grid cell being occupied and its initial value before exploration is set to 0.5. The total entropy is therefore defined as

其中 p 是网格单元被占用的概率,其探索前的初始值设置为 0.5。 因此,总熵定义为

$$H_{obj}=\sum_{o\in \mathbb O}H_o+\sum_{f \in \mathbb F}H_f+\sum_{u \in \mathbb U}H_u,$$

and the normalized total entropy is

$$\overline H_{obj}=H_{obj}/(|\mathbb O|+|\mathbb F|+|\mathbb U|),$$

where $H_o$, $H_f$, $H_u$ are the entropy of occupied, free, and unknown grids, $\mathbb O$, $\mathbb F$, $\mathbb U$ are sets of the occupied, free, and unknown grid cells, respectively, and $|\mathbb X|$ represents the size of $\mathbb X$. As objects continue to be explored, the number of unknown grid cells is gradually reduced, making the normalized entropy $\overline H_{grid}$ a smaller value. The lower the $\overline H_{grid}$ is, the higher the observation completeness is. The exploration objective is to minimize $\overline H_{grid}$.

其中 $H_o$、$H_f$、$H_u$ 分别是占用、空闲和未知网格的熵,$\mathbb O$、$\mathbb F$、$\mathbb U$ 分别是占用、空闲和未知网格单元的集合,$|\mathbb X|$ 表示集合 $\mathbb X$ 的大小。随着对物体的不断探索,未知网格单元的数量逐渐减少,使归一化熵 $\overline H_{grid}$ 变得更小。$\overline H_{grid}$ 越低,观测完整性越高。探索的目标是最小化 $\overline H_{grid}$。
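A small numerical sketch of Eq. (18)-(20) on a surface occupancy grid is given below, with per-cell occupancy probabilities and unknown cells left at the prior of 0.5; grid sizes and values are illustrative.

```python
import numpy as np

def cell_entropy(p):
    # Binary entropy of Eq. (18); clipped so it is defined (0) at p = 0 or 1.
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def object_entropy(prob_grids):
    # prob_grids: list of 2D arrays, one occupancy-probability grid per cube face
    # (unknown cells keep their prior of 0.5). Returns (H_obj, normalized H_obj),
    # i.e. Eq. (19) and Eq. (20).
    cells = np.concatenate([g.ravel() for g in prob_grids])
    H = cell_entropy(cells).sum()
    return H, H / cells.size

# Toy usage: one fully unknown face and one partially observed face.
unknown = np.full((10, 10), 0.5)
observed = np.where(np.random.default_rng(2).random((10, 10)) > 0.5, 0.95, 0.05)
print(object_entropy([unknown, observed]))
```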

B. Object-Driven Exploration

Information Gain Definition: As illustrated in Fig. 8(b), object-driven exploration aims to predict the information gain of different candidate camera views and then select the one to explore that maximizes the information gain, i.e., the NBV. The information in this work is defined as the uncertainty of the map, as mentioned in Section VII-A. The information gain is thus defined as the measurement of uncertainty reduction and accuracy improvement after the camera is placed at a specific pose. Conventionally, information gain is defined based on the area of unknown regions of the environment, e.g., the black holes in the medium subfigure of Fig. 8(a), which may mislead the object map building. Compared with the conventional one, our proposed information gain is built on the observation completeness measurement of the object, shown in the right subfigure of Fig. 8(a), and incorporates the influence on object pose estimation.

信息增益定义:如图 8(b) 所示,对象驱动的探索旨在预测不同候选相机视图的信息增益,然后选择使信息增益最大化的视图进行探索,即 NBV。本工作中的信息被定义为地图的不确定性,如第 VII-A 节所述。因此,信息增益被定义为相机放置在特定位姿后,不确定性降低和精度提升的度量。传统上,信息增益是根据环境中未知区域的面积来定义的,例如图 8(a) 中间子图中的黑洞,这可能会误导对象地图的构建。与传统方法相比,我们提出的信息增益建立在对象的观测完整性度量之上,如图 8(a) 右侧子图所示,并且考虑了对物体位姿估计的影响。

Information Gain Modeling: As indicated by the definition, information gain is contingent on many factors; thus, we create a utility function to model the information gain by manually designing a feature vector to parameterize those factors. The following is the design of the feature vector used to characterize the object x,

信息增益建模:如定义所示,信息增益取决于许多因素; 因此,我们创建了一个效用函数,通过手动设计一个特征向量来参数化这些因素,从而对信息增益进行建模。 下面是用来表征对象x的特征向量的设计,

$$x=(H_{obj},\overline H_{obj},R_o,R_{IoU},\overline V_{obj},s),$$

where $H_{obj}$ and $\overline H_{obj}$ are defined by Eq. (18)-(20), $R_o$ is the ratio of occupied grids to the total grids of the object, which indicates the richness of its surface texture, $R_{IoU}$ is the 2D mean IoU with adjacent objects used for modeling occlusion under a specific camera view, $\overline V_{obj}$ is the current volume of the object, and s is a binary value used for indicating whether the object is fully explored.

其中 $H_{obj}$ 和 $\overline H_{obj}$ 由式 (18)-(20) 定义,$R_o$ 是被占用网格占该物体全部网格的比例,表示其表面纹理的丰富程度,$R_{IoU}$ 是在特定相机视图下与相邻物体的二维平均 IoU,用于建模遮挡,$\overline V_{obj}$ 是对象的当前体积,s 是用于指示对象是否已被充分探索的二值变量。

The utility function for NBV selection then is defined as:

NBV 选择的效用函数定义为:

$$f=\sum_{x\in I}\big((1-R_o)H_{obj}+\lambda(H_{IoU}+H_V)\big)s(x),$$

where I is the predicted camera view, λ is a weight coefficient (λ = 0.2 in our implementation), and $H_{IoU}$, $H_V$ share the same formula,

其中 I 是预测的相机视图,λ 是权重系数(在我们的实现中 λ = 0.2),$H_{IoU}$ 和 $H_V$ 共享相同的公式,

$$H=-p\log(p).$$

The first item $\sum_{x\in I}(1-R_o)H_{obj}$ in Eq. (22) is used to model the total weighted uncertainty of the object map under the predicted camera view. Here we give more weight to the unknown grids and the free ones by using $1-R_o$. The reason is to encourage more explorations in free regions to find more image features that are neglected by previous sensing.

式 (22) 中的第一项 $\sum_{x\in I}(1-R_o)H_{obj}$ 用于建模预测相机视图下对象地图的总加权不确定性。这里我们通过 $1-R_o$ 给未知网格和空闲网格更多的权重,其目的是鼓励对空闲区域进行更多探索,以发现先前感知中被忽略的更多图像特征。

The second item $\sum_{x\in I}H_{IoU}$ in Eq. (22) defines the uncertainty of object detection, which is one of the critical factors affecting object pose estimation. The uncertainty is essentially caused by occlusions between objects. We use this item to encourage a complete observation of the object. The variable in Eq. (23) is the rescaled 2D IoU, i.e., $p = R_{IoU}/2$.

式 (22) 中的第二项 $\sum_{x\in I}H_{IoU}$ 定义了物体检测的不确定性,这是影响物体位姿估计的关键因素之一。这种不确定性本质上由对象之间的遮挡引起,我们用这一项来鼓励对对象的完整观察。式 (23) 中的变量 p 为重新缩放的 2D IoU,即 $p = R_{IoU}/2$。

The third item $\sum_{x\in I} H_V$ in Eq. (22) models the uncertainty of object pose estimation. Under different camera views, the estimated object poses are usually different and induce changes in object volume. Here, we first fit a standard normal distribution using the normalized history volumes $\{\overline V_{obj}^{(0)}, \overline V_{obj}^{(1)}, ..., \overline V_{obj}^{(t)}\}$ of each object, and then take the probability density of $\overline V_{obj}^{(t)}$ as the value p in Eq. (23). This item essentially encourages the camera view that can converge the pose estimation process.

式 (22) 中的第三项 $\sum_{x\in I} H_V$ 建模物体位姿估计的不确定性。在不同的相机视图下,估计的物体位姿通常不同,并引起物体体积的变化。这里,我们首先利用每个物体的归一化历史体积 $\{\overline V_{obj}^{(0)}, \overline V_{obj}^{(1)}, ..., \overline V_{obj}^{(t)}\}$ 拟合标准正态分布,然后取 $\overline V_{obj}^{(t)}$ 的概率密度作为式 (23) 中的 p 值。该项本质上鼓励选择能够使位姿估计过程收敛的相机视图。

The $s(x)$ in Eq. (22) indicates whether the object should be considered during the calculation of the utility function. Set $s(x)=0$ if the following condition is satisfied: $(\overline H_{grid} < 0.5 \vee R_o > 0.5) \wedge p(\overline V_{obj}^{(t)}) > 0.8$. If this condition holds for all the objects, or the maximum tries are achieved (10 in this work), the exploration will be finished.

式 (22) 中的 s(x) 表示在计算效用函数时是否应考虑该对象。若满足以下条件,则令 s(x)=0:$(\overline H_{grid} < 0.5 \vee R_o > 0.5) \wedge p(\overline V_{obj}^{(t)}) > 0.8$。如果所有对象都满足该条件,或者达到最大尝试次数(本文为 10 次),则探索结束。
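Putting Eq. (21)-(23) and the termination test together, the candidate-view scoring could be sketched as follows; each object's features (entropy, occupancy ratio, predicted 2D IoU, volume density) are assumed to be precomputed for the candidate view, and the dictionary keys and data layout are illustrative assumptions, not the authors' code.

```python
import numpy as np

def h(p):
    # H = -p log(p), Eq. (23), clipped for numerical safety.
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -p * np.log(p)

def active(obj):
    # s(x) of Eq. (22): drop an object once it is well observed and its volume is stable.
    done = (obj["H_grid_norm"] < 0.5 or obj["R_o"] > 0.5) and obj["p_vol"] > 0.8
    return 0.0 if done else 1.0

def view_utility(objects_in_view, lam=0.2):
    # objects_in_view: per-object dicts with H_obj, H_grid_norm, R_o, R_iou, p_vol,
    # all predicted for one candidate camera view; implements Eq. (22).
    f = 0.0
    for o in objects_in_view:
        f += ((1.0 - o["R_o"]) * o["H_obj"]
              + lam * (h(o["R_iou"] / 2.0) + h(o["p_vol"]))) * active(o)
    return f

def next_best_view(candidate_views):
    # candidate_views: list of (view_id, objects_in_view); pick the view maximizing f.
    return max(candidate_views, key=lambda v: view_utility(v[1]))[0]
```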

Based on the utility function, the NBV that maximizes f is continuously selected and leveraged to guide the exploration process, during which the global object map is also incrementally constructed, as depicted in Fig. 6

基于效用函数,不断选择和利用使 f 最大化的 NBV 来指导探索过程,在此过程中也逐步构建全局对象图,如图 6 所示

VIII. EXPERIMENT

The experiment will demonstrate the performance of essential techniques such as data association, object parameterization, and active exploration. In addition, the proposed object SLAM framework will be evaluated by various applications, such as object mapping, augmented reality, scene matching, relocalization, and robotic grasping.

该实验将演示数据关联、对象参数化和主动探索等基本技术的性能。 此外,拟议的对象 SLAM 框架将通过各种应用程序进行评估,例如对象映射、增强现实、场景匹配、重定位和机器人抓取。

A. Distributions of Different Statistics

For data association, the adopted 3D statistics for statistical testing include the point clouds and their centroids of an object. To verify our hypothesis about the distributions of different statistics, we analyze a large amount of data and visualize their distributions in Fig. 9.

对于数据关联,统计测试采用的 3D 统计包括对象的点云及其质心。 为了验证我们关于不同统计数据分布的假设,我们分析了大量数据并在图 9 中可视化了它们的分布。

Fig. 9 (a) shows the distributions of the point clouds from 13 objects during the data association in the TUM RGB-D fr3 long office sequence [77]. Obviously, these statistics do not follow a Gaussian distribution. The distributions are related to specific characteristics of the objects, and do not show consistent behaviors. Fig. 9 (b) shows the error distribution of object centroids, which typically follow the Gaussian distribution. This error is computed between the centroids of objects detected in each frame and the object centroid in the final, well-constructed map. This result verifies the reasonability of applying the nonparametric Wilcoxon Rank-Sum test for point clouds and the t-test for object centroids.

图 9(a)显示了 TUM RGB-D fr3 长办公室序列中数据关联期间来自 13 个对象的点云分布 [77]。 显然,这些统计数据不服从高斯分布。 分布与对象的特定特征有关,并没有表现出一致的行为。 图 9(b)显示了对象质心的误差分布,通常服从高斯分布。 该误差是在每个帧中检测到的对象的质心与最终构建良好的地图中的对象质心之间计算的。 该结果验证了对点云应用非参数 Wilcoxon Rank-Sum 检验和对对象质心应用 t 检验的合理性。

B. Object-level Data Association Experiments

We compare our method with the commonly-used Intersection over Union (IoU) method, nonparametric test (NP), and t-test. Fig. 10 shows the association results of these methods in the TUM RGB-D fr3 long office sequence. It can be seen that some objects are not correctly associated in (a)-©. Due to the lack of association information, existing objects are often misrecognized as new ones by these methods once the objects are occluded or disappear in some frames, resulting in many unassociated objects in the map. In contrast, our method is much more robust and can effectively address this problem (see Fig. 10(d)). The results of other sequences are shown in Table I, and we use the same evaluation metric as [12], [78], which measures the number of objects that are finally present in the map. The GT represents the ground-truth object number. As we can see, our method achieves a high success rate of association, and the number of objects in the map goes closer to GT, which significantly demonstrates the effectiveness of the proposed method.

我们将我们的方法与常用的交并比 (IoU) 方法、非参数检验 (NP) 和 t 检验进行了比较。图 10 显示了这些方法在 TUM RGB-D fr3 长办公室序列中的关联结果。可以看出,(a)-(c) 中有些对象没有被正确关联。由于缺乏关联信息,一旦对象在某些帧中被遮挡或消失,这些方法往往会将现有对象误识别为新对象,从而导致地图中出现许多未关联的对象。相比之下,我们的方法更加稳健,可以有效地解决这个问题(见图 10(d))。其他序列的结果如表 I 所示,我们使用与 [12]、[78] 相同的评估指标,即最终出现在地图中的对象数量,其中 GT 表示对象数量的真值。可以看到,我们的方法实现了很高的关联成功率,地图中的对象数量也更接近 GT,这显著证明了所提出方法的有效性。

The results of our comparison with [12], [78], which is based on the nonparametric test, are reported in II. As indicated, our method can significantly outperform [12], [78]. Especially in the TUM dataset, the number of successfully associated objects by our method is almost twice that by [12], [78]. The advantage in Microsoft RGBD [79] and Scenes V2 [80] is not apparent since the number of objects is limited. Reasons for the inaccurate association of [12], [78] lie in two folds: 1) The method does not exploit different statistics and only uses non-parametric statistics, thus resulting in many unassociated objects; 2) A clustering algorithm is leveraged to tackle the abovementioned problem, but it removes most of the candidate objects.

我们与基于非参数检验的 [12]、[78] 的比较结果在表 II 中报告。如表所示,我们的方法可以显著优于 [12]、[78]。特别是在 TUM 数据集中,我们的方法成功关联对象的数量几乎是 [12]、[78] 的两倍。在 Microsoft RGBD [79] 和 Scenes V2 [80] 上的优势并不明显,因为其中的对象数量有限。[12]、[78] 关联不准确的原因有两点:1)该方法没有利用不同类型的统计量,仅使用非参数统计,从而导致许多未关联的对象;2)其利用聚类算法来解决上述问题,但这会删除大部分候选对象。

C. Qualitative Assessment of Object Parameterization

To demonstrate the accuracy of object parameterization, We superimpose the cubes and quadrics of objects on semi-dense maps for qualitative evaluation. Fig. 11 is the 3D top view of a keyboard (Fig. 4(a)) where the cube characterizes its pose. Fig. 11(a) is the initial pose with large-scale error; Fig. 11(b) is the result after using iForest; Fig. 11© is the final pose after our joint pose estimation. Fig. 12 presents the pose estimation results of the objects in 14 sequences of the three datasets, in which the objects are placed randomly and in different directions. As is shown, the proposed method achieves promising results with a monocular camera, which demonstrates the effectiveness of our pose estimation algorithm.

为了证明对象参数化的准确性,我们将对象的立方体和二次曲面叠加在半稠密地图上以进行定性评估。图 11 是键盘(图 4(a))的 3D 俯视图,其中立方体表征了它的位姿。图 11(a) 是具有较大尺度误差的初始位姿;图 11(b) 是使用 iForest 后的结果;图 11(c) 是我们联合位姿估计后的最终位姿。图 12 给出了三个数据集的 14 个序列中物体的位姿估计结果,其中物体以不同方向随机摆放。如图所示,所提出的方法仅使用单目相机就取得了可喜的结果,这证明了我们位姿估计算法的有效性。
Fig. 11: Visualization of the pose estimation. (a): Initial object pose and size. (b): Object pose and size after iForest. (c): Object pose and size after iForest and line alignment.

Fig. 12: Results of object pose estimation. Odd columns: original RGB images. Even column: estimated object poses.

D. Object-Oriented Map Building

Then, we build the object-oriented semantic maps based on the robust data association algorithm, the accurate object pose estimation algorithm and a semi-dense mapping system [81]. Fig. 13 shows three examples of TUM fr3 long office and fr2 desk, where (d) and (e) show semi-dense semantic and object-oriented maps built by our object SLAM. Compared with the sparse map of ORB-SLAM2, our maps can express the environment much better. Moreover, the object-oriented map shows superior performance in environment understanding than the semi-dense map.

然后,我们基于稳健的数据关联算法、准确的对象姿态估计算法和半密集映射系统构建面向对象的语义图[81]。 图 13 显示了 TUM fr3 长办公室和 fr2 办公桌的三个示例,其中(d)和(e)显示了由我们的对象 SLAM 构建的半密集语义和面向对象的地图。 与 ORB-SLAM2 的稀疏地图相比,我们的地图可以更好地表达环境。 此外,面向对象地图在环境理解方面表现出优于半稠密地图的性能。

The mapping results of other sequences in TUM, Microsoft RGB-D, and Scenes V2 datasets are shown in Fig. 14. It can be seen that the system can process multiple classes of objects with different scales and orientations in complex environments. Inevitably, there are some inaccurate estimations. For instance, in the fire sequence, the chair is too large to be well observed by the fast-moving camera, thus yielding an inaccurate estimation. We also conduct the experiment in a real scenario (Fig. 15). It can be seen that even if the objects are occluded, they can be accurately estimated, which further verifies the robustness and accuracy of our system.

TUM、Microsoft RGB-D 和 Scenes V2 数据集中其他序列的映射结果如图 14 所示。可以看出,该系统可以处理复杂环境中具有不同尺度和方向的多类对象。 难免会有一些不准确的估计。 例如,在火灾序列中,椅子太大而无法被快速移动的摄像机很好地观察到,从而产生不准确的估计。 我们还在真实场景中进行了实验(图 15)。 可以看出,即使物体被遮挡,也能准确估计,进一步验证了我们系统的鲁棒性和准确性。

E. Augmented Reality Experiment

Early augmented reality used QR codes, 2D handcrafted features, or image templates to register virtual 3D models, resulting in a restricted range of motion and poor tracking. The sparse point cloud map created by SLAM enables large-scale tracking and highly robust registration for AR. Geometric SLAM-based AR, however, is only concerned with accuracy and robustness, not authenticity. Conversely, our object SLAM-based AR provides complete environment information, thus a more realistic immersive experience can be achieved.

早期的增强现实使用二维码、2D 手工特征或图像模板来注册虚拟 3D 模型,导致运动范围受限且跟踪效果差。SLAM 创建的稀疏点云地图使 AR 能够实现大范围跟踪和高鲁棒性配准。然而,基于几何 SLAM 的 AR 只关心准确性和鲁棒性,而不关心真实感。相反,我们基于对象 SLAM 的 AR 提供了完整的环境信息,从而可以实现更逼真的沉浸式体验。

In the case of the desk scene in Fig. 16(a), we use the method described above to construct an object map, as shown in Fig. 16(b), in which we model objects such as the book, keyboard, and bottles.

对于图 16(a) 中的桌面场景,我们使用上述方法构建对象地图,如图 16(b) 所示,其中我们对书本、键盘和瓶子等对象进行了建模。

3D Registration: We present an object-triggered virtual model registration method, instead of 3D registration triggered by a plane or position humanly specified. As shown in Fig. 18(a), the top row represents three raw frames from the video stream, while the bottom row represents the corresponding real-virtual integration scene. Virtual models can be seen registered on the desk to replace real objects based on the object semantics, pose, and size encoded in the object map.

3D 配准:我们提出了一种对象触发的虚拟模型配准方法,而不是人为指定的平面或位置触发的 3D 配准。 如图 18(a) 所示,顶行代表来自视频流的三个原始帧,而底行代表相应的实-虚融合场景。 可以看到虚拟模型已注册在桌面上,以根据对象映射中编码的对象语义、姿势和大小来替换真实对象。

Occlusion and collision: Physical occlusion and collision between the actual scene and virtual models is the crucial reflection of augmented reality. The top row, as seen in Fig. 18(b), is the result of common augmented reality, where virtual models are registered on the top layer of the image, resulting in an unrealistic separation of real and virtual scenes. The bottom row of Fig. 18(b) shows the outcome of our object SLAM-based augmented reality, in which the foreground and background are distinguished, and the real object obscures a portion of the virtual model, where the virtual and physical worlds are fused together. Similarly, Fig. 18© depicts the collision effect. The virtual model in the top row falls on the desk without colliding with the bottle. Contrarily, the bottom row shows the outcome of our object SLAM-based augmented reality, in which the virtual model falls and collides with the real bottle, with the dropping propensity changing.

遮挡与碰撞:真实场景与虚拟模型之间的物理遮挡与碰撞是增强现实真实感的关键体现。如图 18(b) 所示,第一行是普通增强现实的结果,其中虚拟模型被直接叠加在图像的最上层,导致真实场景和虚拟场景割裂、不够真实。图 18(b) 的底行显示了我们基于对象 SLAM 的增强现实结果,其中前景和背景被区分开,真实对象遮挡了虚拟模型的一部分,虚拟世界和物理世界融合在一起。类似地,图 18(c) 展示了碰撞效果:顶行的虚拟模型落在桌子上,没有与瓶子发生碰撞;相反,底行显示了我们基于对象 SLAM 的增强现实结果,其中虚拟模型在下落时与真实瓶子发生碰撞,下落趋势随之改变。

Semantic interaction: Interaction, cascading user command with the real scene and virtual models, plays a crucial role in augmented reality applications. As shown in Fig. 17, clicking different real-world objects produces different virtual interactive effects.

语义交互:交互,将用户命令与真实场景和虚拟模型进行级联,在增强现实应用中起着至关重要的作用。 如图17所示,点击不同的现实世界对象会产生不同的虚拟交互效果。

The above functions, object-triggered 3D registration, occlusion, collision, and interaction, rely on accurate object perception of the proposed object SLAM framework. The experimental results demonstrate that object SLAM-based augmented reality has a fascinating benefit in areas such as gaming, military training, and virtual decorating.

上述功能,对象触发的 3D 注册、遮挡、碰撞和交互,依赖于所提出的对象 SLAM 框架的准确对象感知。 实验结果表明,基于对象 SLAM 的增强现实在游戏、军事训练和虚拟装饰等领域具有令人着迷的优势。
Fig. 13: Different map representations. (a) the RGB images. (b) the sparse map. (c) the semi-dense map. (d) our semi-dense semantic map. (e) our lightweight and object-oriented map. (d) and (e) are built by the proposed method.

Fig. 14: Mapping results on the three datasets. Top: raw images. Bottom: semi-dense object-oriented map.

Fig. 15: Mapping results in a real scenario. Top: raw images. Middle: semi-dense object-oriented map. Bottom: lightweight and object-oriented map.

Fig. 16: The raw image and the corresponding object map.

Fig. 17: The demonstration of interaction. Reactions are visualized as a series of augmented reality events.

F. Object-based Scene Matching and Relocalization

Scene Matching. In this experiment, we evaluate the performance of the proposed object descriptor-based scene matching, which is crucial for multi-agent collaboration, scene reidentification, and multi-maps merging at different periods. We acquire two separate trajectories and their associated object maps in the same scene, then utilize the suggested method to figure out their relationship. Fig. 19 illustrates the map-matching results in three settings.

场景匹配。 在这个实验中,我们评估了所提出的基于对象描述符的场景匹配的性能,这对于多代理协作、场景重新识别和不同时期的多地图合并至关重要。 我们在同一场景中获取两个独立的轨迹及其关联的对象图,然后利用建议的方法找出它们之间的关系。 图 19 说明了三种设置下的地图匹配结果。

The results of the TUM and Microsoft sequences are shown in Fig. 19(a) and Fig. 19(b). The two maps with different scales and numbers of objects match accurately, and the translation between them is also resolved. The match is not based on point clouds or BoW (Bag of Words) of keyframes, but the semantic object descriptor constructed by the object topological map. Additionally, the scale inconsistency of the two maps is also eliminated. Fig. 19© shows a real-world example of the matched result. Apart from the previous features, what is worth noting is that the two maps were recorded under different illumination. With this scenario, the traditional appearance-based method is trends to fail, demonstrating the robustness of the proposed object descriptor with the semantic level invariance property.

TUM 和 Microsoft 序列的结果如图 19(a) 和图 19(b) 所示。两张具有不同尺度和物体数量的地图得到了准确匹配,它们之间的平移也得到了求解。该匹配不是基于点云或关键帧的 BoW(Bag of Words),而是基于由对象拓扑图构建的语义对象描述符;此外,两张地图之间的尺度不一致也被消除。图 19(c) 显示了匹配结果的一个真实示例。除上述特点外,值得注意的是这两张地图是在不同光照下记录的。在这种情况下,传统的基于外观的方法往往会失败,这证明了所提出的、具有语义级不变性的对象描述符的鲁棒性。

Table III analyzes the performance time of the algorithm. The matching duration is found to be the primary cost, and the time is positively related to the number of objects. The average total duration is approximately 1.23ms, which is both practical and economical in various robot applications.

Relocalization. We perform relocalization experiments with parallax to demonstrate the robustness of the proposed matching method to viewpoint changes. As illustrated in the figure in Tab. IV, we first construct a prior map with a set of prior keyframes and then utilize query keyframes for relocalization, which do not overlap the trajectories of prior keyframes and have parallax. We conduct several repeated experiments under different parallax conditions and compare the success rate of relocalization with ORB-SLAM3 [82]. When the parallax is less than 20%, as shown in Tab. IV, ORB-SLAM3 achieves a relocalization success rate of 32.5%; however, the rate drops sharply to 0 when the parallax is greater than 20%, which demonstrates that the appearance-based descriptor represented by ORB-SLAM3 is extremely sensitive to parallax. Conversely, our method is robust to parallax and achieves a success rate of over 12% even under challenging large parallax.

重定位。我们进行带视差的重定位实验,以证明所提出的匹配方法对视点变化的鲁棒性。如表 IV 中的插图所示,我们首先用一组先验关键帧构建先验地图,然后利用查询关键帧进行重定位,这些查询关键帧不与先验关键帧的轨迹重叠,并且存在视差。我们在不同的视差条件下进行了多次重复实验,并与 ORB-SLAM3 [82] 比较了重定位成功率。如表 IV 所示,当视差小于 20% 时,ORB-SLAM3 的重定位成功率为 32.5%;然而,当视差大于 20% 时,该成功率急剧下降至 0,这表明以 ORB-SLAM3 为代表的基于外观的描述符对视差极其敏感。相反,我们的方法对视差具有鲁棒性,即使在具有挑战性的大视差下也能达到超过 12% 的成功率。

However, the accuracy of 14.9% is still unsatisfactory. We found that the primary reason is that the observations of the two sets of keyframes are incomplete, thus resulting in inaccurate object modeling. To prove our hypothesis, we manually generate a scene with objects and divide it into a prior map and query map, assuming that prior and query keyframes generate them, respectively, and that the poses of the objects are obtained from the ground truth. As depicted in the figure in Tab. V, we adjust the proportion of shared objects across the two maps and measure the success rate of relocalization. As demonstrated in Tab. V, we obtain a 100% success rate with a public object ratio of over 50% and retain over 80% accuracy with a ratio of 33%. The result demonstrates the effectiveness of our proposed object descriptor and matching algorithm. It also illustrates its sensitivity to object pose and suggests that more accurate object modeling methods can improve its performance.

但是,14.9% 的成功率仍不能令人满意。我们发现主要原因是两组关键帧的观测不完整,从而导致对象建模不准确。为了验证这一假设,我们手动生成一个包含多个对象的场景,并将其划分为先验地图和查询地图(假设它们分别由先验关键帧和查询关键帧生成),其中对象的位姿取自真值。如表 V 中的插图所示,我们调整两张地图之间共享对象的比例,并测量重定位成功率。如表 V 所示,当共享对象比例超过 50% 时,成功率达到 100%;当比例为 33% 时,成功率仍保持在 80% 以上。该结果证明了我们提出的对象描述符和匹配算法的有效性,同时也说明其对物体位姿较为敏感,更精确的物体建模方法可以进一步提升其性能。

G. Evaluation of Active Mapping

To validate the effectiveness of the active map building and the viability of robot manipulation led by the map, we conduct extensive evaluations in both simulation and real-world environments. The simulated robotic manipulation scene is set in Sapien [83], shown in Fig. 20, where the number of objects and the scene complexities vary in different scenes.

为了验证主动地图构建的有效性和地图引导的机器人操作的可行性,我们在模拟和现实环境中进行了广泛的评估。 Sapien [83] 中设置了模拟的机器人操作场景,如图 20 所示,在不同的场景中,物体的数量和场景的复杂度各不相同。

The accurate position estimate is critical for successful robotic manipulation operations such as grasping, placing, arranging, and planning. However, precision is difficult to ensure when the robot estimates autonomously. To quantify the effect of active exploration on object pose estimation, like previous studies [74], [84], we compare our object-driven method with two typically used baseline strategies, i.e., randomized exploration (Random.) and coverage exploration (Cover.). As indicated in Fig. 20, for randomized exploration, the camera pose is randomly sampled from the reachable set relative to the manipulator, while for coverage exploration, a coverage trajectory based on Boustrophedon decomposition [85] is leveraged to scan the scene. At the beginning of all the explorations, an initialization step (Init.), in which the camera is sequentially placed over the four desk corners from a top view, is applied to start the object mapping process. The simulator provides the ground truth of object position, orientation, and size. Correspondingly, the accuracy of pose estimation is evaluated by the Center Distance Error (CDE, cm), the Yaw Angle Error (YAE, degree), and the IoU (including 2D IoU from the top view and 3D IoU) between the ground truth and our estimated results.

Fig. 18: 3D registration, occlusion, and collision in object SLAM-based augmented reality. (a) Object-triggered 3D registration. Top: raw images of the scene. Bottom: augmented reality scene with registered virtual models in place of the original objects. (b) The demonstration of occlusion. Top: the standard AR without occlusion. Bottom: our object SLAM-based AR. (c) The demonstration of collision. Top: the standard augmented reality. Bottom: our object SLAM-based augmented reality with the awareness of collision.

Fig. 19: The quantitative analysis of scene matching.

Table VI shows the evaluation results in seven scenes (Fig. 20). Our proposed object-driven exploration strategy achieves a 3D IoU of 45.3%, which is 15.53%, 8.85%, and 13.3% higher than that of the randomized exploration, the coverage exploration, and the initialization, respectively. For the 2D IoU, our method achieves an accuracy of 64.83%, which is 15.14%, 10.64%, and 13.12% higher than the baseline methods, respectively. In terms of CDE, our method reaches 1.62 cm, significantly lower than the other methods. For YAE, all exploration strategies achieve an error of approximately 3°, which verifies the robustness of our line-alignment-based yaw angle optimization. This level of precision is sufficient for robotic manipulation [86]. Moreover, we also find that randomized exploration sometimes performs worse than the initialization result (rows 2, 5, and 7), which indicates that adding observations does not necessarily yield more accurate pose estimation and that purposeful exploration is necessary.


Fig. 20: Comparison of mapping results. The first column in the sub-picture: the scene image; the second column: the result of our object-driven exploration; the third column: the result of the coverage exploration; the fourth column: the result of the randomized exploration.

The mapping results are shown in Fig. 20. Cubes and cylinders are used to model the objects, including poses and scales (analyzed above), according to their semantic categories. The following characteristics can be observed: 1) The system accurately models various objects as the number of objects increases, as shown in Fig. 20(a)-(e), demonstrating its robustness. 2) Among objects of various sizes, our method focuses more on large objects with lower observation completeness (see Fig. 20(f)). 3) When objects are distributed unevenly, the proposed strategy can swiftly concentrate the camera on object regions, avoiding unnecessary and time-consuming exploration (see Fig. 20(g)). 4) For scenes with objects close to each other, our method focuses more on regions with fewer occlusions (see Fig. 20(h)). These behaviors verify the effectiveness of our exploration strategy. Additionally, our method follows a shorter exploration path yet produces more precise object poses.


H. Object Grasping and Placement

This experiment uses the incrementally generated object map to perform object grasping. Fig. 21(a) and Fig. 21(b) illustrate the grasping process in simulated and real-world environments, with the object map included. After extensive testing, we obtained a grasping success rate of approximately 86% in the simulator and 81% in the real world, where performance may be affected by environmental or manipulator noise. We found that the center and orientation of the objects have a significant influence on grasping performance. The proposed method performs well on these two metrics, thus ensuring high-quality grasping. Overall, our object SLAM-based pose estimation results satisfy the requirements of grasping.


We argue that the proposed object-map-level perception outperforms pose-only object perception and provides information for more intelligent robotic decision-making tasks beyond grasping. These include avoiding collisions with other objects, updating the map after grasping, arranging and placing objects based on their properties, and delivering objects requested by the user. We design object placement experiments to verify the global perception capability introduced by object mapping. As shown in Fig. 22, the robot is required to rearrange the original scene (see Fig. 22(a)) into the target scene (see Fig. 22(c)) according to the object sizes and classes encoded in the object map.


The global object map is shown in Fig. 22(b), which contains the semantic labels, sizes, and poses of the objects. The two small blocks are picked up and placed in the large cup (Fig. 22(d)), while the cups are ordered by volume (Fig. 22(e)) and the bottles by height (Fig. 22(f)). This task is challenging for conventional grasping approaches, which lack global perception such as an object's height in the map, its surroundings, and which objects it may interact with.
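As a concrete illustration, the following sketch (the map structure and field names are assumptions, not the framework's actual API) shows how such placement decisions can be derived directly from an object map that stores class labels, sizes, and poses.

```python
from dataclasses import dataclass

@dataclass
class MapObject:
    label: str          # semantic class, e.g. "cup", "bottle", "block"
    size: tuple         # (w, l, h) in meters
    position: tuple     # (x, y, z) object center in the world frame

def plan_placement(object_map):
    """Derive a simple pick-and-place plan from the global object map."""
    cups    = [o for o in object_map if o.label == "cup"]
    bottles = [o for o in object_map if o.label == "bottle"]
    blocks  = [o for o in object_map if o.label == "block"]

    volume = lambda o: o.size[0] * o.size[1] * o.size[2]

    # Small blocks go into the cup with the largest volume, as in Fig. 22(d).
    largest_cup = max(cups, key=volume)
    plan = [("place_into", b, largest_cup) for b in blocks]

    # Cups are ordered by volume and bottles by height, as in Fig. 22(e)/(f).
    plan.append(("arrange_by_volume", sorted(cups, key=volume)))
    plan.append(("arrange_by_height", sorted(bottles, key=lambda o: o.size[2])))
    return plan
```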


IX. DISCUSSION AND ANALYSIS

In this section, we analyze the limitations and implementation details of our method and provide potential solutions for the object SLAM community.


1) Data association. Experiments revealed that the two primary causes of data association failure can be summarized as follows: i) Long-tailed distributions. In some cases, object centroids lie in the tail of the distribution, which violates the Gaussian assumption and causes the association to fail, although our combined association strategy alleviates this to some extent. ii) Semantic label errors from the detector. Even if the IoU-based or distribution-based method establishes an association between two objects, the association fails if the labels are inconsistent. Label recognition error is one of the most common issues with detectors; more general and accurate detectors, or fine-tuning on specific datasets, are potential remedies.
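For context, the sketch below (the function name, the per-axis testing scheme, and the significance level are assumptions, not the paper's exact formulation) illustrates how a semantic label check can be combined with the nonparametric rank-sum test and the t-test discussed in Section IV before two point samples are associated with the same object.

```python
import numpy as np
from scipy import stats

def may_associate(label_a, pts_a, label_b, pts_b, alpha=0.05):
    """pts_a, pts_b: Nx3 / Mx3 point samples of two candidate observations."""
    if label_a != label_b:
        return False                        # inconsistent labels reject the association outright
    for axis in range(3):                   # test each coordinate independently
        _, p_rank = stats.ranksums(pts_a[:, axis], pts_b[:, axis])                 # Wilcoxon rank-sum
        _, p_t = stats.ttest_ind(pts_a[:, axis], pts_b[:, axis], equal_var=False)  # Welch t-test
        if p_rank < alpha and p_t < alpha:  # both tests reject the "same distribution" hypothesis
            return False
    return True
```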


The running time of data association is shown in the first two columns of Tab. VII. The distribution-based methods comprise the nonparametric test and the t-test, while the IoU-based methods comprise the motion IoU and the project IoU, where projecting 3D points onto 2D images takes most of the time. Note that the total duration of data association, approximately 8 ms per frame, is less than the sum of the entries in the table because some strategies can sometimes be skipped. We perform data association on every frame; performing it only on keyframes, as CubeSLAM does, would be more time-efficient.
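Since projecting 3D points into the image dominates the cost of the project-IoU strategy, the following minimal sketch (a standard pinhole projection with assumed variable names, not the system's implementation) shows the operation being timed: the object's points are projected and their 2D bounding box can then be compared with the detection box via IoU.

```python
import numpy as np

def project_object_bbox(points_w, T_cw, K):
    """Project an object's Nx3 world points into the image and return their 2D bounding box.

    T_cw: 4x4 world-to-camera transform, K: 3x3 camera intrinsics.
    """
    pts_c = T_cw[:3, :3] @ points_w.T + T_cw[:3, 3:4]   # world -> camera frame, 3xN
    valid = pts_c[2] > 0                                 # keep points in front of the camera
    if not np.any(valid):
        return None
    uv = (K @ pts_c[:, valid]).T
    uv = uv[:, :2] / uv[:, 2:3]                          # perspective division
    u_min, v_min = uv.min(axis=0)
    u_max, v_max = uv.max(axis=0)
    return (u_min, v_min, u_max, v_max)                  # compare with the detection box via IoU
```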


Fig. 21: The demonstration of grasping process.

Fig. 22: Object placement according to the global object map.


2) Object Parameterization. Two factors typically cause object pose estimation to fail: i) Object surfaces lack texture, or objects are only partially visible due to occlusion or the camera viewpoint. In this case, few object points are collected, which significantly degrades pose estimation. ii) Too many outliers cause the object to be estimated as overly large, or the modeled object becomes extremely small because the iForest algorithm falls into a local optimum. Among the alternatives, 3D detectors [87] based on complete point clouds may not be optimal given the incremental nature of SLAM, and image-based 6-DoF pose estimation [19], [88] is limited by the scale of the training data, resulting in poor generalization [51]. In contrast, incremental detection/segmentation [89] and joint point cloud and image multimodal RGB-D 3D object detection [90], [91] are potentially feasible.
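As a point of reference, a minimal sketch of the iForest-style outlier filtering step is given below (using scikit-learn's IsolationForest as a stand-in for the paper's implementation; the contamination value is an assumption), showing how the centroid and scale are taken from the inlier points only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def robust_centroid_and_scale(points, contamination=0.1):
    """points: Nx3 accumulated point cloud of one object (world frame)."""
    labels = IsolationForest(contamination=contamination,
                             random_state=0).fit_predict(points)
    inliers = points[labels == 1]                        # +1 = inlier, -1 = outlier
    centroid = inliers.mean(axis=0)
    scale = inliers.max(axis=0) - inliers.min(axis=0)    # axis-aligned extent (w, l, h)
    return centroid, scale, inliers
```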


The runtime of object parameterization is shown in the last two columns of Tab. VII and takes around 6.5 ms per frame on average. The full SLAM system (camera tracking and semantic mapping) runs at about 10 fps.


3) Augmented Reality. Augmented reality performance depends on object modeling, camera localization accuracy, and the rendering effect. Here we describe the engineering implementation so that SLAM developers can migrate their algorithms to augmented reality applications. The AR system comprises three modules: i) The Localization and Semantic Mapping (LSM) module provides the camera poses, point clouds, and object parameters; the underlying techniques are introduced in Sections IV and V. ii) The ROS [92] data-transfer module sends images captured by the camera to the LSM module and then publishes the estimated camera pose and map elements. iii) The virtual-real rendering module uses the Unity3D engine to subscribe to the topics published via ROS, construct a virtual 3D scene, and render it onto the 2D image plane. Tab. VIII details the duration of each module; the modules are executed in parallel.
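For developers reproducing the data-transfer module, a minimal rospy sketch is given below (topic names and the `lsm` interface are assumptions; the actual system may use different message types), showing the image-in / pose-out pattern described above.

```python
import rospy
from sensor_msgs.msg import Image
from geometry_msgs.msg import PoseStamped

def run_bridge(lsm):
    """lsm: assumed handle to the Localization and Semantic Mapping module."""
    rospy.init_node('lsm_bridge')
    pose_pub = rospy.Publisher('/lsm/camera_pose', PoseStamped, queue_size=10)

    def on_image(msg):
        pose = lsm.track(msg)                 # assumed interface returning a PoseStamped or None
        if pose is not None:
            pose.header.stamp = msg.header.stamp
            pose_pub.publish(pose)            # consumed by the Unity3D rendering side

    rospy.Subscriber('/camera/rgb/image_raw', Image, on_image, queue_size=1)
    rospy.spin()                              # hand control to the ROS event loop
```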

4) Scene Matching. The principal causes of scene matching and relocalization failure are: i) There are few common objects between the two maps, so the descriptors of the same object differ significantly across the maps and matching fails. ii) The parallax between the trajectories of the two maps is excessively large and the observations are insufficient, which degrades the accuracy of object modeling and descriptor construction. Regarding the first issue, non-object-level landmarks, such as planes and structural components, can additionally be considered for descriptor construction. For the second, more accurate object modeling techniques can improve matching and relocalization performance, as demonstrated by our experiments.


5) Object Grasping. There are two limitations in the object grasping task: i) Textured objects and tabletops are required for point-based SLAM tracking to succeed. ii) The objects in our experiments are all regular cube and cylinder shapes; complex irregular objects may require more detailed shape reconstruction and grasp point detection. Nonetheless, we demonstrate the potential of object SLAM for grasping tasks without object priors, and model-free grasping of unseen objects will be the future trend. In terms of running time, the speed even exceeds 10 fps because, in this setting, the data association is more straightforward, and more time is spent on the active mapping analysis process.


X. CONCLUSION

We presented an object mapping framework that creates an object-oriented map using general models that parameterize each object's position, orientation, and size. First, we investigated the fundamental techniques for object mapping, including multi-view data association and object pose estimation. We then centered on the object map and validated its potential in multiple high-level tasks such as augmented reality, scene matching, and object grasping. Finally, we analyzed the limitations and failure cases of our method and suggested possible alternatives to inspire the development of related fields. The following directions will receive significant attention in future work: 1) data association, tracking, and trajectory prediction for dynamic objects; 2) modeling of irregular and unseen objects and tightly coupled optimization with SLAM; 3) object-level relocalization and loop closure; 4) omnidirectional perception with multiple sensors and multiple semantic networks to realize spatial AI.


REFERENCES

[1] A. J. Davison, “Futuremapping: The computational structure of spatial ai systems,” arXiv preprint arXiv:1803.11288, 2018.
[2] Q. Wang, Z. Yan, J. Wang, F. Xue, W. Ma, and H. Zha, “Line flow based simultaneous localization and mapping,” IEEE Transactions on Robotics, vol. 37, no. 5, pp. 1416–1432, 2021.
[3] Y. Zhou, H. Li, and L. Kneip, “Canny-vo: Visual odometry with rgb-d cameras based on geometric 3-d–2-d edge alignment,” IEEE Transactions on Robotics, vol. 35, no. 1, pp. 184–199, 2018.
[4] R. Yunus, Y. Li, and F. Tombari, “Manhattanslam: Robust planar tracking and mapping leveraging mixture of manhattan frames,” in Proceedings of 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 6687–6693.
[5] S. Zhao, P. Wang, H. Zhang, Z. Fang, and S. Scherer, “Tp-tio: A robust thermal-inertial odometry with deep thermalpoint,” in Proceedings of 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 4505–4512.
[6] S. Cao, X. Lu, and S. Shen, “Gvins: Tightly coupled gnss–visual–inertial fusion for smooth and consistent state estimation,” IEEE Transactions on Robotics, vol. 38, no. 4, pp. 2004–2021, 2022.
[7] T.-M. Nguyen, S. Yuan, M. Cao, T. H. Nguyen, and L. Xie, “Viral slam: Tightly coupled camera-imu-uwb-lidar slam,” arXiv preprint arXiv:2105.03296, 2021.
[8] J. McCormac, A. Handa, A. Davison, and S. Leutenegger, “Semanticfusion: Dense 3d semantic mapping with convolutional neural networks,” in Proceedings of 2017 IEEE International Conference on Robotics and automation (ICRA). IEEE, 2017, pp. 4628–4635.
[9] S. Yang, Y. Huang, and S. Scherer, “Semantic 3d occupancy mapping through efficient high order crfs,” in Proceedings of 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 590–597.
[10] A. Rosinol, M. Abate, Y. Chang, and L. Carlone, “Kimera: an open-source library for real-time metric-semantic localization and mapping,” in Proceedings of 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 1689–1696.
[11] D. Frost, V. Prisacariu, and D. Murray, “Recovering stable scale in monocular slam using object-supplemented bundle adjustment,” IEEE Transactions on Robotics, vol. 34, no. 3, pp. 736–747, 2018.
[12] A. Iqbal and N. R. Gans, “Localization of classified objects in slam using nonparametric statistics and clustering,” in Proceedings of 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 161–168.
[13] B. Mu, S.-Y. Liu, L. Paull, J. Leonard, and J. P. How, “Slam with objects using a nonparametric pose graph,” in Proceedings of 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 4602–4609.
[14] N. Sunderhauf, T. T. Pham, Y. Latif, M. Milford, and I. Reid, “Meaningful maps with object-oriented semantic mapping,” in Proceedings of 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 5079–5085.
[15] M. Grinvald, F. Furrer, T. Novkovic, J. J. Chung, C. Cadena, R. Siegwart, and J. Nieto, “Volumetric instance-aware semantic mapping and 3d object discovery,” IEEE Robotics and Automation Letters, vol. 4, no. 3, pp. 3037–3044, 2019.
[16] A. Sharma, W. Dong, and M. Kaess, “Compositional and scalable object slam,” in Proceedings of 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 11 626–11 632.
[17] R. F. Salas-Moreno, R. A. Newcombe, H. Strasdat, P. H. Kelly, and A. J. Davison, “Slam++: Simultaneous localisation and mapping at the level of objects,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2013, pp. 1352–1359.
[18] S. Choudhary, L. Carlone, C. Nieto, J. Rogers, Z. Liu, H. I. Christensen, and F. Dellaert, “Multi robot object-based slam,” in Proceedings of International Symposium on Experimental Robotics. Springer, 2016, pp. 729–741.
[19] Y. Labbe, J. Carpentier, M. Aubry, and J. Sivic, “Cosypose: Consistent multi-view multi-object 6d pose estimation,” in Proceedings of European Conference on Computer Vision. Springer, 2020, pp. 574–591.
[20] S. L. Bowman, N. Atanasov, K. Daniilidis, and G. J. Pappas, “Probabilistic data association for semantic slam,” in Proceedings of 2017 IEEE international conference on robotics and automation (ICRA). IEEE, 2017, pp. 1722–1729.
[21] P. Parkhiya, R. Khawad, J. K. Murthy, B. Bhowmick, and K. M. Krishna, “Constructing category-specific models for monocular object-slam,” in Proceedings of 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 4517–4524.
[22] N. Joshi, Y. Sharma, P. Parkhiya, R. Khawad, K. M. Krishna, and B. Bhowmick, “Integrating objects into monocular slam: Line based category specific models,” in Proceedings of the 11th Indian Conference on Computer Vision, Graphics and Image Processing, 2018, pp. 1–9.
[23] E. Sucar, K. Wada, and A. Davison, “Nodeslam: Neural object descriptors for multi-view shape reconstruction,” in 2020 International Conference on 3D Vision (3DV). IEEE, 2020, pp. 949–958.
[24] S. Yang and S. Scherer, “Cubeslam: Monocular 3-d object slam,” IEEE Transactions on Robotics, vol. 35, no. 4, pp. 925–938, 2019.
[25] L. Nicholson, M. Milford, and N. Sunderhauf, “Quadricslam: Dual quadrics from object detections as landmarks in object-oriented slam,” IEEE Robotics and Automation Letters, vol. 4, no. 1, pp. 1–8, 2018.
[26] Y. Wu, Y. Zhang, D. Zhu, Y. Feng, S. Coleman, and D. Kerr, “Eao-slam: Monocular semi-dense object slam based on ensemble data association,” in Proceedings of 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 4966–4973.
[27] Y. Wu, Y. Zhang, D. Zhu, X. Chen, S. Coleman, W. Sun, X. Hu, and Z. Deng, “Object slam-based active mapping and robotic grasping,” in Proceedings of 2021 International Conference on 3D Vision (3DV). IEEE, 2021, pp. 1372–1381.
[28] K. Chen, J. Liu, Q. Chen, Z. Wang, and J. Zhang, “Accurate object association and pose updating for semantic slam,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 12, pp. 25 169–25 179, 2022.
[29] Y. Zhang, C. Wang, X. Wang, W. Zeng, and W. Liu, “Fairmot: On the fairness of detection and re-identification in multiple object tracking,” International Journal of Computer Vision, vol. 129, no. 11, pp. 3069–3087, 2021.
[30] J. Li, D. Meger, and G. Dudek, “Semantic mapping for view-invariant relocalization,” in Proceedings of 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 7108–7115.
[31] J. McCormac, R. Clark, M. Bloesch, A. Davison, and S. Leutenegger, “Fusion++: Volumetric object-level slam,” in Proceedings of 2018 international conference on 3D vision (3DV). IEEE, 2018, pp. 32–41.
[32] J. Wang, M. Runz, and L. Agapito, “Dsp-slam: Object oriented slam with deep shape priors,” in Proceedings of 2021 International Conference on 3D Vision (3DV). IEEE, 2021, pp. 1362–1371.
[33] B. Xu, A. J. Davison, and S. Leutenegger, “Learning to complete object shapes for object-level mapping in dynamic scenes,” in Proceedings of 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022, pp. 2257–2264.
[34] Y. Liu, Y. Petillot, D. Lane, and S. Wang, “Global localization with object-level semantics and topology,” in Proceedings of 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 4909–4915.
[35] K. Ok, K. Liu, K. Frey, J. P. How, and N. Roy, “Robust object-based slam for high-speed autonomous navigation,” in Proceedings of 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 669–675.
[36] Y. Xiang and D. Fox, “Da-rnn: Semantic mapping with data associated recurrent neural networks,” in Proceedings of Robotics: Science and Systems (RSS), 2017.
[37] K. Li, D. DeTone, Y. F. S. Chen, M. Vo, I. Reid, H. Rezatofighi, C. Sweeney, J. Straub, and R. Newcombe, “Odam: Object detection, association, and mapping using posed rgb video,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 5998–6008.
[38] N. Merrill, Y. Guo, X. Zuo, X. Huang, S. Leutenegger, X. Peng, L. Ren, and G. Huang, “Symmetry and uncertainty-aware object slam for 6dof object pose estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 14 901–14 910.
[39] C. Xing, X. Sun, A. Cramariuc, S. Gull, J. J. Chung, C. Cadena, R. Siegwart, and F. Tschopp, “Descriptellation: Deep learned constellation descriptors for slam,” arXiv preprint arXiv:2203.00567, 2022.
[40] M. Strecke and J. Stuckler, “Em-fusion: Dynamic object-level slam with probabilistic data association,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 5865–5874.
[41] S. Yang, Z.-F. Kuang, Y.-P. Cao, Y.-K. Lai, and S.-M. Hu, “Probabilistic projective association and semantic guided relocalization for dense reconstruction,” in Proceedings of 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 7130–7136.
[42] J. Zhang, M. Gui, Q. Wang, R. Liu, J. Xu, and S. Chen, “Hierarchical topic model based object association for semantic slam,” IEEE transactions on visualization and computer graphics, vol. 25, no. 11, pp. 3052–3062, 2019.
[43] T. Ran, L. Yuan, J. Zhang, L. He, R. Huang, and J. Mei, “Not only look but infer: Multiple hypothesis clustering of data association inference for semantic slam,” IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1–9, 2021.
[44] J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove, “Deepsdf: Learning continuous signed distance functions for shape representation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 165–174.
[45] M. Han, Z. Zhang, Z. Jiao, X. Xie, Y. Zhu, S.-C. Zhu, and H. Liu, “Reconstructing interactive 3d scenes by panoptic mapping and cad model alignments,” in Proceedings of 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 12 199–12 206.
[46] Z. Cao, Y. Zhang, R. Tian, R. Ma, X. Hu, S. Coleman, and D. Kerr, “Object-aware slam based on efficient quadric initialization and joint data association,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9802–9809, 2022.
[47] M. Runz, M. Buffier, and L. Agapito, “Maskfusion: Real-time recognition, tracking and reconstruction of multiple moving objects,” in Proceedings of 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, 2018, pp. 10–20.
[48] S. Lin, J. Wang, M. Xu, H. Zhao, and Z. Chen, “Topology aware object-level semantic mapping towards more robust loop closure,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 7041–7048, 2021.
[49] J. Lu, B. Tian, H. Shen, and X. Zhang, “Real-time instance-aware segmentation and semantic mapping on edge devices,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–9, 2022.
[50] J. Li, K. Koreitem, D. Meger, and G. Dudek, “View-invariant loop closure with oriented semantic landmarks,” in Proceedings of 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 7943–7949.
[51] Y. Ming, X. Yang, and A. Calway, “Object-augmented rgb-d slam for wide-disparity relocalisation,” in Proceedings of 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021, pp. 2203–2209.
[52] R. Mur-Artal and J. D. Tardos, “Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras,” IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255–1262, 2017.
[53] P. Schmuck and M. Chli, “Ccm-slam: Robust and efficient centralized collaborative monocular simultaneous localization and mapping for robotic teams,” Journal of Field Robotics, vol. 36, no. 4, pp. 763–781, 2019.
[54] A. Gawel, C. Del Don, R. Siegwart, J. Nieto, and C. Cadena, “Xview: Graph-based semantic multi-view localization,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1687–1694, 2018.
[55] X. Guo, J. Hu, J. Chen, F. Deng, and T. L. Lam, “Semantic histogram based graph matching for real-time multi-robot global localization in large scale environment,” IEEE Robotics and Automation Letters, 2021.
[56] C. Qin, Y. Zhang, Y. Liu, and G. Lv, “Semantic loop closure detection based on graph matching in multi-objects scenes,” Journal of Visual Communication and Image Representation, vol. 76, p. 103072, 2021.
[57] Z. Zhang and D. Scaramuzza, “Beyond point clouds: Fisher information field for active visual localization,” in Proceedings of 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 5986–5992.
[58] Z. Zeng, A. Rofer, and O. C. Jenkins, “Semantic linking maps for active visual object search,” in Proceedings of 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 1984–1990.
[59] B. Charrow, G. Kahn, S. Patil, S. Liu, K. Goldberg, P. Abbeel, N. Michael, and V. Kumar, “Information-theoretic planning with trajectory optimization for dense 3d mapping.” in Proceedings of Robotics: Science and Systems, vol. 11. Rome, 2015, pp. 3–12.
[60] C. Wang, D. Zhu, T. Li, M. Q.-H. Meng, and C. W. de Silva, “Efficient autonomous robotic exploration with semantic road map in indoor environments,” IEEE Robotics and Automation Letters, vol. 4, no. 3, pp. 2989–2996, July 2019.
[61] S. Kriegel, C. Rink, T. Bodenmuller, and M. Suppa, “Efficient next-best-scan planning for autonomous 3d surface reconstruction of unknown objects,” Journal of Real-Time Image Processing, vol. 10, no. 4, pp. 611–631, 2015.
[62] K. Wada, E. Sucar, S. James, D. Lenton, and A. J. Davison, “Morefusion: Multi-object reasoning for 6d pose estimation from volumetric fusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 14 540–14 549.
[63] D. Almeida, E. Ataer-Cansizoglu, and R. Corcodel, “Detection, tracking and 3d modeling of objects with sparse rgb-d slam and interactive perception,” in Proceedings of 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids). IEEE, 2019, pp. 1–8.
[64] J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
[65] F. Wilcoxon, “Individual comparisons by ranking methods,” in Breakthroughs in statistics. Springer, 1992, pp. 196–202.
[66] S. Sidney, “Nonparametric statistics for the behavioral sciences,” The Journal of Nervous and Mental Disease, vol. 125, no. 3, p. 497, 1957.
[67] E. L. Lehmann and H. J. D’Abrera, Nonparametrics: statistical methods based on ranks. Holden-day, 1975.
[68] S. Yang and S. Scherer, “Monocular object and plane slam in structured environments,” IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 3145–3152, 2019.
[69] T. Pire, J. Corti, and G. Grinblat, “Online object detection and localization on stereo visual slam system,” Journal of Intelligent & Robotic Systems, vol. 98, no. 2, pp. 377–386, 2020.
[70] F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation-based anomaly detection,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 6, no. 1, pp. 1–39, 2012.
[71] R. G. Von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall, “Lsd: A line segment detector,” Image Processing On Line, vol. 2, pp. 35–55, 2012.
[72] S. A. Cook, “The complexity of theorem-proving procedures,” in Proceedings of the third annual ACM symposium on Theory of computing, 1971, pp. 151–158.
[73] G. Kahn, P. Sujan, S. Patil, S. Bopardikar, J. Ryde, K. Goldberg, and P. Abbeel, “Active exploration using trajectory optimization for robotic grasping in the presence of occlusions,” in Proceedings of 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2015, pp. 4783–4790.
[74] E. Arruda, J. Wyatt, and M. Kopicki, “Active vision for dexterous grasping of novel objects,” in Proceedings of 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 2881–2888.
[75] A. Elfes, “Using occupancy grids for mobile robot perception and navigation,” Computer, vol. 22, no. 6, pp. 46–57, 1989.
[76] C. E. Shannon, “A mathematical theory of communication,” The Bell system technical journal, vol. 27, no. 3, pp. 379–423, 1948.
[77] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of rgb-d slam systems,” in Proceedings of 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 573–580.
[78] A. Iqbal and N. R. Gans, “Data association and localization of classified objects in visual slam,” Journal of Intelligent & Robotic Systems, vol. 100, pp. 113–130, 2020.
[79] J. Shotton, B. Glocker, C. Zach, S. Izadi, A. Criminisi, and A. Fitzgibbon, “Scene coordinate regression forests for camera relocalization in rgb-d images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2930–2937.
[80] K. Lai, L. Bo, and D. Fox, “Unsupervised feature learning for 3d scene labeling,” in Proceedings of 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 3050–3057.
[81] S. He, X. Qin, Z. Zhang, and M. Jagersand, “Incremental 3d line segment extraction from semi-dense slam,” in Proceedings of 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 2018, pp. 1658–1663.
[82] C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. Montiel, and J. D. Tardos, “Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam,” IEEE Transactions on Robotics, vol. 37, no. 6, pp. 1874–1890, 2021.
[83] F. Xiang, Y. Qin, K. Mo, Y. Xia, H. Zhu, F. Liu, M. Liu, H. Jiang, Y. Yuan, H. Wang et al., “Sapien: A simulated part-based interactive environment,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11 097–11 107.
[84] D. Morrison, P. Corke, and J. Leitner, “Multi-view picking: Next-best view reaching for improved grasping in clutter,” in Proceedings of 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 8762–8768.
[85] D. Kaljaca, B. Vroegindeweij, and E. van Henten, “Coverage trajectory planning for a bush trimming robot arm,” Journal of Field Robotics, vol. 37, no. 2, pp. 283–308, 2020.
[86] C. Wang, D. Xu, Y. Zhu, R. Martín-Martín, C. Lu, L. Fei-Fei, and S. Savarese, “Densefusion: 6d object pose estimation by iterative dense fusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3343–3352.
[87] Z. Liu, Z. Zhang, Y. Cao, H. Hu, and X. Tong, “Group-free 3d object detection via transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 2949–2958.
[88] H. Wang, S. Sridhar, J. Huang, J. Valentin, S. Song, and L. J. Guibas, “Normalized object coordinate space for category-level 6d object pose and size estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2642–2651.
[89] S.-C. Wu, J. Wald, K. Tateno, N. Navab, and F. Tombari, “Scenegraph-fusion: Incremental 3d scene graph prediction from rgb-d sequences,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 7515–7525.
[90] Y. Wang, X. Chen, L. Cao, W. Huang, F. Sun, and Y. Wang, “Multimodal token fusion for vision transformers,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12 186–12 195.
[91] H. Yang, C. Shi, Y. Chen, and L. Wang, “Boosting 3d object detection via object-focused image fusion,” arXiv preprint arXiv:2207.10589, 2022.
[92] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, A. Y. Ng et al., “Ros: an open-source robot operating system,” in Proceedings of ICRA workshop on open source software. Kobe, Japan, 2009.
