Paper notes: Learning task-oriented grasping for tool manipulation from simulated self-supervision

1. Overview

1.1 What the paper does

The paper proposes a Task-Oriented Grasping Network. Much existing robotic grasping work treats grasping as an end in itself; this paper targets task-oriented grasping: the robot grasps a tool and then uses it to perform an operation. Two tasks are studied: sweeping and hammering.
Four key aspects of learning task-oriented tool usage:

  • understanding the desired effect (the intended outcome)
  • identifying properties of an object that make it a suitable tool (properties of the object to be grasped)
  • determining the correct orientation of the tool prior to usage (grasping)
  • manipulating the tool (manipulation)

1.2 How it is done

A two-stage procedure:

  • the robot picks up a tool
  • the robot manipulates the grasped tool to complete a task

Dataset: a self-supervised learning paradigm is used; training labels are collected by having the robot perform grasping and manipulation attempts in a trial-and-error fashion (a minimal sketch of this collection loop is given below).
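A minimal sketch of what such a trial-and-error collection loop might look like; the environment and sampler interfaces (`env`, `grasp_sampler`, `policy`) are illustrative assumptions, not the authors' actual code:

```python
def collect_self_supervised_labels(env, grasp_sampler, policy, num_episodes=1000):
    """Collect (observation, grasp, action, S_G, S_T) tuples by trial and error."""
    dataset = []
    for _ in range(num_episodes):
        o = env.reset()                          # point-cloud observation of the tool
        g = grasp_sampler(o)                     # candidate grasp (g_x, g_y, g_z, g_phi)
        s_grasp = env.execute_grasp(g)           # S_G in {0, 1}: did the grasp succeed?
        s_task, a = 0, None
        if s_grasp:
            a = policy(o, g)                     # manipulation action for the grasped tool
            s_task = env.execute_manipulation(a) # S_T in {0, 1}: did the task succeed?
        dataset.append((o, g, a, s_grasp, s_task))
    return dataset
```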

1.3 Contributions

  • a learning-based model
  • a mechanism for generating large-scale simulated self-supervision
  • generalizes well in both simulation and the real world

2. Related work

2.1. Task-agnostic grasping

Learning only from rendered depth images, as opposed to rendered RGB images, enables the trained models to transfer to execution on a real robot without further fine-tuning, because physical depth cameras produce images that are largely similar to rendered depth images.

2.2. Task-oriented grasping

Prior approaches incorporate semantic constraints, but they:

  • 1. rely on annotated datasets
  • 2. do not entail the success of the downstream manipulation tasks
    In contrast, the authors' grasps are optimized directly for the downstream task.

2.3. Affordance learning

Affordance learning describes the functional properties of objects.

3. Problem statement

Goal: use an object to accomplish a functional operation. The task is split into two parts: 1. grasping the object; 2. manipulating the object.

3.1. Notation of grasping

Observation space $O$: the camera observation is a point cloud.
Space of possible grasps $G$: grasps are perpendicular to the table plane, $g = (g_x, g_y, g_z, g_\phi)$.
Given $o \in O$ and $g \in G$, let $S_G(o, g) \in \{0, 1\}$ denote a binary-valued grasp success metric.
The probability of grasp success is $Q_G(o, g) = \Pr(S_G = 1 \mid o, g)$; note that $S_G$ is task-agnostic.
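As a concrete reading of this notation, a small sketch of the grasp parameterization (the names are illustrative, not from the paper's code):

```python
from dataclasses import dataclass

@dataclass
class Grasp:
    """Top-down grasp perpendicular to the table plane: g = (g_x, g_y, g_z, g_phi)."""
    x: float
    y: float
    z: float
    phi: float  # gripper rotation about the vertical axis

# S_G(o, g) in {0, 1} is the binary grasp-success label observed after executing g,
# and Q_G(o, g) = Pr(S_G = 1 | o, g) is what a grasp-quality network is trained to predict.
```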

3.2. Problem setup

  • grasp stage
  • manipulation stage: a policy $\pi$ produces actions to interact with the environment once the object is grasped

$S_T(o, g) \in \{0, 1\}$: a binary-valued task-specific success metric.
$Q_T^\pi$: the probability of task success under policy $\pi$, $Q_T^\pi(o, g) = \Pr(S_T = 1 \mid o, g)$. The overall learning objective is to train both stages simultaneously such that
$$g^*, \pi^* = \arg\max_{g, \pi} Q_T^\pi(o, g)$$
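In code terms, the objective amounts to searching for the grasp (and policy) with the highest estimated task-success probability. A naive sketch over a discrete set of grasp candidates, assuming some estimator `q_task` of $Q_T^\pi(o, g)$:

```python
def select_grasp_for_task(o, candidate_grasps, q_task):
    """Return the grasp maximizing the estimated task-success probability Q_T^pi(o, g).

    q_task(o, g) is an assumed callable returning a scalar in [0, 1];
    Section 4.1 predicts this quantity in factorized form rather than directly.
    """
    return max(candidate_grasps, key=lambda g: q_task(o, g))
```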

4. Task-oriented grasping for tool manipulation

[Figure 1]

4.1. Task-oriented grasp prediction

Task-agnostic grasping corresponds to finding the grasp $g$ that maximizes the grasp quality $Q_G(o, g) = \Pr(S_G = 1 \mid o, g)$.
$Q_{T|G}^\pi = \Pr_\pi(S_T = 1 \mid S_G = 1, o, g)$: the task success probability conditioned on a successful grasp, which should satisfy $Q_{T|G}^\pi \ge \delta$.
Since task success requires a successful grasp, the task success probability factorizes as
$$Q_T^\pi(o, g) = \Pr_\pi(S_T = 1 \mid o, g) = \Pr(S_T = 1, S_G = 1 \mid o, g) = \Pr_\pi(S_T = 1 \mid S_G = 1, o, g) \times \Pr(S_G = 1 \mid o, g) = Q_{T|G}^\pi(o, g) \times Q_G(o, g)$$

The predicted values are denoted $\hat Q_G(o, g; \theta_1)$ and $\hat Q_{T|G}^\pi(o, g; \theta_2)$, where $\theta_1$ and $\theta_2$ are the neural network parameters.
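A hypothetical scoring function following this factorization, with `q_grasp` and `q_task_given_grasp` standing in for the two network heads that predict $\hat Q_G$ and $\hat Q_{T|G}^\pi$ (using $\delta$ as a hard rejection threshold is my assumption):

```python
def task_oriented_grasp_score(o, g, q_grasp, q_task_given_grasp, delta=0.5):
    """Score a grasp by Q_T(o, g) = Q_{T|G}(o, g) * Q_G(o, g), per Section 4.1."""
    q_g = q_grasp(o, g)              # \hat{Q}_G(o, g; theta_1): grasp success probability
    q_tg = q_task_given_grasp(o, g)  # \hat{Q}_{T|G}(o, g; theta_2): task success given a stable grasp
    if q_tg < delta:
        return 0.0                   # discard grasps unlikely to permit task success
    return q_tg * q_g
```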

4.2. Manipulation policy

Use a Gaussian policy:
$\pi(a \mid o, g; \theta_3) = \mathcal{N}(f(o, g; \theta_3), \Sigma)$, where $f(o, g; \theta_3)$ predicts the mean and $\Sigma$ is a diagonal covariance matrix.
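A minimal sampling sketch for this policy, assuming the network `f` outputs the action mean and that $\Sigma$ is a fixed diagonal covariance (represented here by its per-dimension standard deviations, an assumption):

```python
import numpy as np

def sample_manipulation_action(o, g, f, sigma_std, rng=None):
    """Sample a ~ N(f(o, g; theta_3), Sigma) with diagonal Sigma."""
    rng = rng or np.random.default_rng()
    mean = np.asarray(f(o, g))                       # predicted action mean
    return mean + sigma_std * rng.standard_normal(mean.shape)
```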

4.3. Neural network architecture

[Figure 2]

4.4. Learning objectives and optimization

$$\theta^* = \arg\min_\theta \sum_{i=1}^N \Big[ L\big(S_G, \hat Q_G(o, g; \theta_1)\big) + \mathbb{1}[S_G = 1]\, L\big(S_T, \hat Q_{T|G}^\pi(o, g; \theta_2)\big) + \mathbb{1}[S_T = 1]\, \tfrac{1}{2} \big\| f(o, g; \theta_3) - a \big\|_\Sigma^2 \Big]$$
where $\mathbb{1}[\cdot]$ is the indicator function.
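A sketch of this combined objective for a batch, written with PyTorch; treating $L$ as binary cross-entropy and implementing the indicator functions as multiplicative masks are assumptions on my part:

```python
import torch.nn.functional as F

def tog_loss(q_g_pred, q_tg_pred, action_pred, s_g, s_t, action, sigma_diag):
    """Combined loss: grasp-success BCE + masked task-success BCE + masked action regression.

    q_g_pred, q_tg_pred: predicted probabilities \hat{Q}_G and \hat{Q}_{T|G}, shape (B,)
    s_g, s_t: binary labels S_G and S_T as float tensors, shape (B,)
    action_pred, action: predicted mean f(o, g) and executed action a, shape (B, A)
    sigma_diag: diagonal of Sigma (per-dimension variances), shape (A,)
    """
    loss_grasp = F.binary_cross_entropy(q_g_pred, s_g)
    loss_task = (s_g * F.binary_cross_entropy(q_tg_pred, s_t, reduction="none")).mean()
    sq_mahalanobis = ((action_pred - action) ** 2 / sigma_diag).sum(dim=-1)  # ||f - a||_Sigma^2
    loss_action = (s_t * 0.5 * sq_mahalanobis).mean()
    return loss_grasp + loss_task + loss_action
```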
