Paper reading: SCIENCE ROBOTICS, "Learning agile and dynamic motor skills for legged robots"

Legged robots learn agile and dynamic motor skills

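  • 1. Pros and cons of hydraulic actuation
  • 2. SpotMini
  • 3. ANYmal
    • 3.1 Challenges
    • 3.2 Limitations of conventional control
    • 3.3 Modular control
    • 3.4 Trajectory optimization
  • 4. Data-driven methods
    • 4.1 Advantages
    • 4.2 Disadvantages
    • 4.3 Disadvantage: training on physical hardware
    • 4.4 Progress of reinforcement learning in robotics
  • 5. Simulation-to-reality transfer
    • 5.1 Approach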

1. Pros and cons of hydraulic actuation

Hydraulic actuators: these have advantages in operation because they are powered by conventional fuel with high energy density. However, systems of this type cannot be scaled down (usually >40 kg) and generate smoke and noise, limiting them to outdoor environments.

2. SpotMini

[Figure 1 of the original post; image not available]

3. ANYmal

The platform used in this work, ANYmal (5), is another promising quadrupedal robot powered by electric actuators. Its bioinspired actuator design makes it robust against impact while allowing accurate torque measurement at the joints. However, the complicated actuator design increases cost and compromises the power output of the robot.
In short, ANYmal is an electrically actuated quadruped: its bioinspired actuators withstand impacts and allow accurate joint torque measurement, but their complexity raises cost and limits power output.

3.1 Challenges

1. From the control perspective, these robots are high-dimensional and nonsmooth systems with many physical constraints. The contact points change over the course of time and depending on the maneuver being executed and, therefore, cannot be prespecified.
The contact points vary with time and with the maneuver being executed, so they cannot be specified in advance.
2. Analytical models of the robots are often inaccurate and cause uncertainties in the dynamics.
The analytical model is imprecise, which introduces uncertainty into the dynamics.
3. A complex sensor suite and multiple layers of software bring noise and delays to information transfer.
The complex sensor suite and the layered software stack add noise and latency to the information flow.

3.2 Limitations of conventional control

Conventional control theories are often insufficient to deal with these problems effectively. Specialized control methods developed to tackle this complex problem typically require a lengthy design process and arduous parameter tuning.

3.3 Modular control

For example, some popular approaches (7–10) use a template-dynamics-based control module that approximates the robot as a point mass with a massless limb to compute the next foothold position. Given the foothold positions, the next module computes a parameterized trajectory for the foot to follow. The last module tracks the trajectory with a simple proportional-integral-derivative (PID) controller.
Because the outputs of these modules are physical quantities, such as body height or foot trajectory, each module can be individually hand-tuned.
In short, the pipeline has three stages: a point-mass template with a massless limb picks the next foothold, a second module computes a parameterized foot trajectory to that foothold, and a PID controller tracks the trajectory.
Second, the design of modular controllers is extremely laborious. Highly trained engineers spend months to develop a controller and to arduously hand-tune the control parameters per module for every new robot or even for every new maneuver. For example, running and climbing controllers of this kind can have markedly different architectures and are designed and tuned separately.
That is, designing the control modules takes a great deal of manual effort.
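
The three-module pipeline described in this section can be sketched roughly as follows. This is only an illustration under stated assumptions, not the controllers from references (7–10): the foothold rule is a Raibert-style heuristic chosen here for simplicity, the swing shape is a parabola, and all names and gains (`next_foothold`, `swing_trajectory`, `PID`, `k_vel`) are hypothetical.

```python
from dataclasses import dataclass


def next_foothold(hip_x, body_vel, target_vel, stance_time, k_vel=0.03):
    """Module 1: choose the next foothold from a point-mass template.

    Assumption of this sketch: a Raibert-style rule that steps to where the
    hip will be at mid-stance, plus a correction for the velocity error.
    """
    return hip_x + 0.5 * body_vel * stance_time + k_vel * (body_vel - target_vel)


def swing_trajectory(start_x, end_x, clearance, phase):
    """Module 2: parameterized foot trajectory, with phase in [0, 1].

    Horizontal motion is linear in phase; height is a parabola that is zero
    at lift-off and touchdown and reaches `clearance` at mid-swing.
    """
    x = start_x + (end_x - start_x) * phase
    z = 4.0 * clearance * phase * (1.0 - phase)
    return x, z


@dataclass
class PID:
    """Module 3: PID tracking controller for a single coordinate."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target, measured, dt):
        error = target - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Chain the modules: foothold -> swing target at mid-swing -> tracking command.
foothold = next_foothold(hip_x=0.0, body_vel=0.5, target_vel=0.5, stance_time=0.4)
target_x, target_z = swing_trajectory(0.0, foothold, clearance=0.05, phase=0.5)
command_x = PID(kp=2.0, ki=0.0, kd=0.0).step(target_x, measured=0.0, dt=0.01)
```

Every intermediate value here (foothold position, swing height, tracking error) is a physical quantity, which is what makes per-module hand-tuning possible, and also why each new maneuver needs its own round of tuning.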

3.4 Trajectory optimization

Trajectory optimization: the controller is separated into two modules, planning and tracking.
Trajectory optimization for a complex rigid-body model with many unspecified contact points is beyond the capabilities of current optimization techniques.

The main drawback is the high computational cost.
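
To make the planning/tracking split concrete, here is a toy sketch on a deliberately trivial system: a 1D point mass with no contacts, which is exactly the kind of simplification that does not carry over to a full rigid-body model. The planner solves a minimum-effort problem in closed form, and the tracker adds PD feedback around the plan. The function names, gains, and the 20% actuator mismatch are assumptions made for illustration.

```python
def plan(target, steps, dt):
    """Planning: minimum-effort controls for a 1D double integrator.

    Minimizes sum(u_k^2) subject to ending at `target` with zero velocity,
    starting from rest at the origin. Because the dynamics are linear, this
    is an equality-constrained least-squares problem with a closed form.
    """
    # The final state is linear in the controls:
    #   p_N = sum(a_k * u_k),  v_N = sum(b_k * u_k)
    a = [dt * dt * (steps - 1 - k) for k in range(steps)]
    b = [dt] * steps
    # Minimum-norm solution u = A^T lam, where (A A^T) lam = [target, 0].
    g11 = sum(x * x for x in a)
    g12 = sum(x * y for x, y in zip(a, b))
    g22 = sum(y * y for y in b)
    det = g11 * g22 - g12 * g12
    lam1 = g22 * target / det
    lam2 = -g12 * target / det
    return [lam1 * x + lam2 * y for x, y in zip(a, b)]


def rollout(controls, dt, accel_scale=1.0, reference=None, kp=0.0, kd=0.0):
    """Simulate p' = v, v' = accel_scale * u with explicit Euler steps.

    If `reference` (a list of planned (p, v) states) is given, a PD term
    pulls the state back toward the plan: this is the tracking module.
    """
    p, v = 0.0, 0.0
    states = [(p, v)]
    for k, u in enumerate(controls):
        if reference is not None:
            p_ref, v_ref = reference[k]
            u = u + kp * (p_ref - p) + kd * (v_ref - v)
        p, v = p + dt * v, v + dt * accel_scale * u
        states.append((p, v))
    return states


controls = plan(target=1.0, steps=50, dt=0.02)
planned = rollout(controls, dt=0.02)
# Hypothetical model error: the real actuator is 20% weaker than modeled.
open_loop = rollout(controls, dt=0.02, accel_scale=0.8)
tracked = rollout(controls, dt=0.02, accel_scale=0.8,
                  reference=planned, kp=200.0, kd=20.0)
```

With the mismatched actuator, replaying the plan open loop falls short of the target, while the tracked rollout stays close to the planned states. Even this toy planner relies on linear, contact-free dynamics; with many unspecified contact points the problem loses that structure.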

4. Data-driven methods

Data-driven methods, such as reinforcement learning (RL), promise to overcome the limitations of prior model-based approaches by learning effective controllers directly from experience.

4.1 Advantages

This process is fully automated and can optimize the controller end to end, from sensor readings to low-level control signals, thereby allowing for highly agile and efficient controllers.
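
As a rough illustration of "end to end", the sketch below optimizes a policy that maps sensor readings straight to a low-level command, using nothing but the reward collected in simulated episodes. It is a minimal stand-in and not the method used in the paper: the learner is plain random-search hill climbing, the system is a 1D point mass, and the reward weights, actuator limit, and function names are all assumptions.

```python
import random


def episode_return(params, steps=200, dt=0.02):
    """Run one simulated episode and return the total reward.

    Toy system (an assumption of this sketch): a 1D unit point mass that
    should reach position 1.0. The policy maps the sensor readings
    (position error, velocity) directly to the low-level command (force).
    """
    pos, vel, total = 0.0, 0.0, 0.0
    for _ in range(steps):
        error = 1.0 - pos
        force = params[0] * error + params[1] * vel   # linear policy
        force = max(-10.0, min(10.0, force))          # actuator limit
        pos, vel = pos + dt * vel, vel + dt * force
        # Penalize distance to the target and, lightly, the effort used.
        total -= dt * (error * error + 0.001 * force * force)
    return total


def train(iterations=300, noise=0.5, seed=0):
    """Hill climbing: perturb the policy, keep the change only if it helps."""
    rng = random.Random(seed)
    params = [0.0, 0.0]
    best = episode_return(params)
    for _ in range(iterations):
        candidate = [p + rng.gauss(0.0, noise) for p in params]
        score = episode_return(candidate)
        if score > best:
            params, best = candidate, score
    return params, best


params, best = train()
```

No module boundaries or hand-tuned intermediate quantities appear anywhere: the only design inputs are the reward and the policy's input and output. The price is the number of episodes consumed, 300 even for this two-parameter policy.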

4.2 Disadvantages

RL typically requires prohibitively long interaction with the system to learn complex skills, typically weeks or months of real-time execution.
The controller may exhibit sudden and chaotic behavior, leading to logistical complications and safety concerns.
Such erratic behavior makes the process harder to manage and raises safety problems.

4.3 Disadvantage: training on physical hardware

Direct application of learning methods to physical legged systems is therefore complicated and has only been demonstrated on relatively simple and stable platforms (19) or in a limited context (20).

Because of the difficulties of training on physical systems, most advanced applications of RL to legged locomotion are restricted to simulation.

4.4 Progress of reinforcement learning in robotics

Levine and Koltun (21) combined learning and trajectory optimization to train a locomotion controller for a simulated 2D walker. Schulman et al. (22) trained a locomotion policy for a similar 2D walker with an actor-critic method. More recent work obtained full 3D locomotion policies (23–26). In these papers, animated characters achieve remarkable motor skills in simulation.

The numbered citations above are papers to read next.

5. Simulation-to-reality transfer

5.1 Approach

There are two general approaches to bridging the reality gap. The first is to improve simulation fidelity either analytically or in a data-driven way; the latter is also known as system identification.
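
A minimal sketch of the data-driven route (system identification) is shown below: fit the parameters of a simulated component to logged measurements so that the simulator reproduces what the hardware does. The linear actuator model, the parameter names, and the synthetic "measurements" are all assumptions for illustration; this excerpt does not describe how the paper itself models its actuators.

```python
import random


def fit_actuator(commands, velocities, torques):
    """Least-squares fit of torque = gain * command - damping * velocity.

    Solves the 2x2 normal equations for the coefficients of `command` and
    `velocity`, then flips the sign of the second one to get the damping.
    """
    s_cc = sum(c * c for c in commands)
    s_cv = sum(c * v for c, v in zip(commands, velocities))
    s_vv = sum(v * v for v in velocities)
    s_ct = sum(c * t for c, t in zip(commands, torques))
    s_vt = sum(v * t for v, t in zip(velocities, torques))
    det = s_cc * s_vv - s_cv * s_cv
    gain = (s_vv * s_ct - s_cv * s_vt) / det
    vel_coeff = (s_cc * s_vt - s_cv * s_ct) / det
    return gain, -vel_coeff


# Synthetic data standing in for logged robot measurements (hypothetical values).
rng = random.Random(0)
true_gain, true_damping = 12.0, 0.4
commands = [rng.uniform(-1.0, 1.0) for _ in range(500)]
velocities = [rng.uniform(-5.0, 5.0) for _ in range(500)]
torques = [true_gain * c - true_damping * v + rng.gauss(0.0, 0.05)
           for c, v in zip(commands, velocities)]

gain, damping = fit_actuator(commands, velocities, torques)
```

The fitted `gain` and `damping` would then replace the nominal values in the simulator, narrowing the gap between simulated and measured torques for this one component.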
