ML-Agents Toolkit v0.3 Beta Released: Imitation Learning, Feedback-Driven Features, and More

We are happy to announce that the ML-Agents team is releasing the latest version of our toolkit, v0.3.

This is our biggest release yet, and many of the major features we have included are the direct results of community feedback. The new features focus on expanding what is possible with ML-Agents toolkit with the introduction of Imitation Learning, Multi-Brain Training, On-Demand Decision-Making, and Memory-Enhanced Agents. We’re also making setup and usage simpler and more intuitive with the addition of a Docker-Image, changes to API Semantics and a major revamp of our documentation. Read on to learn more about the major changes, check our GitHub page to download the new version, and learn all the details on the release page.

Imitation Learning via Behavioral Cloning

In Unity ML-Agents Toolkit v0.3, we include the first algorithm in a new class of methods for training agents called Imitation Learning. Unlike Reinforcement Learning, which operates primarily using a reward signal, Imitation Learning only requires demonstrations of the desired behavior in order to provide a learning signal to the agents.

We believe that, in some scenarios, simply playing as the agent in order to teach it can be more intuitive than defining a reward, and a powerful new way of creating behavior in games.

There are a variety of possible Imitation Learning algorithms that can be used, and for v0.3 we are starting with the simplest one: Behavioral Cloning. This works by collecting training data from a teacher agent, and then simply using it to directly learn a behavior, in the same way that Supervised Learning works for image classification or other traditional Machine Learning tasks.
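
To make the idea concrete, here is a minimal sketch of behavioral cloning outside of the toolkit: record (observation, action) pairs from a teacher policy, then fit a student policy to them with ordinary supervised learning. The linear policies, dimensions, and least-squares fit below are illustrative assumptions, not the toolkit's trainer code.

```python
# A minimal, self-contained sketch of Behavioral Cloning (illustration only,
# not the toolkit's trainer): gather (observation, action) demonstrations from
# a teacher policy, then fit a student policy with plain supervised learning.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical teacher: a fixed linear mapping from 3-dim observations
# to 2-dim continuous actions.
W_teacher = np.array([[ 0.5, -1.0],
                      [ 1.5,  0.3],
                      [-0.7,  0.9]])

def teacher_policy(obs):
    return obs @ W_teacher

# 1. Record demonstrations by "playing" as the teacher agent.
observations = rng.normal(size=(1000, 3))
actions = teacher_policy(observations)

# 2. Fit the student to the demonstrations with ordinary least squares.
W_student, *_ = np.linalg.lstsq(observations, actions, rcond=None)

# 3. The student now imitates the teacher without ever seeing a reward signal.
test_obs = rng.normal(size=(5, 3))
print(np.abs(test_obs @ W_student - teacher_policy(test_obs)).max())
```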

As we develop this feature and collect feedback, we plan to offer methods that are both more robust, and provide a more intuitive training interface. Since applying Imitation Learning to game development is new, we’d like to develop this feature with as much community feedback as possible in order to determine how to best integrate it into the developer’s workflow. If you try out our Imitation Learning, or have ideas about how you’d like to see it more integrated into the Editor, please share your feedback with us ([email protected], or on our GitHub Issues page).

Multi-Brain Training

One of the requests we received early on was for the ability to train more than one brain at a time. Say, for example, you have a soccer game where different players, such as offensive and defensive ones, need to be controlled differently. Using Multi-Brain Training, you can give each “position” on the field a separate brain, with its own observation and action space, and train it alongside other brains.
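
As a rough sketch of what “one brain per position” means structurally, the snippet below keeps an independent policy per brain name, each with its own observation and action sizes. The brain names, sizes, and random linear policies are made up for illustration; the toolkit itself trains a proper neural network per brain.

```python
# Illustrative sketch only: two brains with different observation and action
# spaces, each controlling its own set of agents within one training loop.
import numpy as np

rng = np.random.default_rng(0)

brains = {
    "StrikerBrain": {"obs_size": 6, "act_size": 3, "agents": 2},
    "GoalieBrain":  {"obs_size": 4, "act_size": 2, "agents": 2},
}
# One independent (random linear) policy per brain.
policies = {name: rng.normal(scale=0.1, size=(cfg["obs_size"], cfg["act_size"]))
            for name, cfg in brains.items()}

for step in range(3):
    for name, cfg in brains.items():
        obs = rng.normal(size=(cfg["agents"], cfg["obs_size"]))  # fake observations
        actions = np.argmax(obs @ policies[name], axis=1)        # per-brain decisions
        print(step, name, actions)
```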

At the end of training, you will receive one binary (.bytes) file, which contains one neural network model per brain. This allows for mixing and matching different hyperparameters, as well as using our Curriculum Learning feature to progressively change how different sets of brains and agents interact within the environment over time.

On-Demand Decision-Making

Another request we received from the developer community was the ability to have agents ask for decisions in an on-demand fashion, rather than forcing them to make decisions every step or every few steps of the engine.

There are multiple genres of games, such as card games, real-time strategy games, role-playing games, and board games, all of which rely on agents being able to make decisions after variable amounts of time. We are happy to support this in the ML-Agents toolkit. You can now enable and disable On-Demand Decision-Making for each agent independently with the click of a button! Simply enable it on your agent, then make a simple function call on the agent to ask for a decision from its brain.
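
Conceptually, on-demand decision-making replaces “decide every N steps” with “decide when the agent asks.” The sketch below illustrates only that control flow in plain Python; it is not the Unity C# agent API (where you enable the option on the agent and call its decision-request method as described above).

```python
# Control-flow sketch of on-demand decisions (illustration only, not the Unity
# C# API): the agent asks its brain for a new decision only when it lands,
# instead of on every engine step.
import random

random.seed(0)

def brain_decide(observation):
    # Stand-in policy: pick a random bounce direction.
    return random.choice(["left", "right", "forward"])

landed = True
current_action = None
for step in range(20):
    if landed:
        # The agent explicitly requests a decision from its brain here.
        current_action = brain_decide({"step": step})
        print(f"step {step:2d}: new decision -> {current_action}")
        landed = False
    # ... physics would apply current_action and move the agent here ...
    landed = random.random() < 0.3   # the agent lands again after a variable delay
```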

Changes to ML-Agents semantics

In order to help future-proof ML-Agents, we have made a series of changes to the semantics of the toolkit. These changes are designed to bring the terms and concepts we use within the system more in-line with the literature on Reinforcement Learning.

The biggest of these changes is that there is no longer a concept of “state.” Instead, agents receive observations of various kinds (vector, image, or text) from the environment, send these observations to their respective brain to have a decision calculated (either as a vector or as text), and then receive this decision from the brain and use it to take an action. See the table below for an overview of the changes. These changes require changes to the API. To understand how they affect environments built using Unity ML-Agents Toolkit v0.2 and earlier, see here.

Old            New
State          Vector Observation
Observation    Visual Observation
(New)          Text Observation
Action         Vector Action
(New)          Text Action
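
For readers using the external Python API, the renaming shows up roughly as in the sketch below. The package name (unityagents), the BrainInfo attribute names (vector_observations, visual_observations), and the vector_action argument reflect our reading of the v0.3 changes and should be treated as assumptions; consult the migration notes for the exact signatures.

```python
# Hedged sketch of the renamed Python-side concepts (names assumed from the
# table above; verify against the v0.3 docs before relying on them).
import numpy as np
from unityagents import UnityEnvironment   # v0.3 Python package (assumed name)

env = UnityEnvironment(file_name="MyBuiltEnvironment")  # hypothetical build name
brain_name = env.brain_names[0]
action_size = 2                                         # assumed for this sketch

info = env.reset(train_mode=True)[brain_name]
for _ in range(100):
    vector_obs = info.vector_observations    # formerly "states"
    visual_obs = info.visual_observations    # formerly "observations"
    # One random continuous vector action per agent using this brain.
    actions = np.random.randn(len(info.agents), action_size)
    info = env.step(vector_action=actions)[brain_name]   # argument formerly "action"
env.close()
```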

Learning under partial observability

Part of the motivation for changing semantics from state to observation is that, in most environments, the agents are never actually exposed to the full state of the environment. Instead, they receive partial observations which often consist of local or incomplete information. It is often too expensive to provide the agent with the full state, or it is unclear how to even represent that state. In order to overcome this, we are including two methods for dealing with partial observability within learning environments through Memory-Enhanced Agents.

The first memory enhancement is Observation-Stacking. This allows an agent to keep track of up to the ten most recent observations within an episode, and to feed them all to the brain for decision-making. The second form of memory is the inclusion of an optional recurrent layer in the neural network being trained. These Recurrent Neural Networks (RNNs) can learn to keep track of important information over time in a hidden state. You can think of this as the memory of the agent.
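
Here is a small illustration of both mechanisms, independent of the toolkit: observation stacking keeps a sliding window of recent observations, while a recurrent layer folds each new observation into a hidden state that persists across steps. The sizes and the hand-rolled RNN step below are illustrative assumptions, not the trained model itself.

```python
# Illustrative sketch (not toolkit code) of the two memory mechanisms described
# above: stacking recent observations, and carrying a recurrent hidden state.
from collections import deque
import numpy as np

rng = np.random.default_rng(0)
obs_size, hidden_size, stack_size = 4, 8, 10

# 1. Observation-Stacking: keep up to the last `stack_size` observations and
#    concatenate them into one input vector for the brain.
stack = deque(maxlen=stack_size)

def stacked_input(new_obs):
    stack.append(new_obs)
    padded = list(stack) + [np.zeros(obs_size)] * (stack_size - len(stack))
    return np.concatenate(padded)

# 2. Recurrent memory: a single (randomly initialised) RNN step that folds the
#    current observation into a hidden state carried across steps.
W_x = rng.normal(scale=0.1, size=(obs_size, hidden_size))
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
hidden = np.zeros(hidden_size)

for t in range(5):
    obs = rng.normal(size=obs_size)
    flat = stacked_input(obs)                    # memory via stacking
    hidden = np.tanh(obs @ W_x + hidden @ W_h)   # memory via recurrence
    print(t, flat.shape, np.round(hidden[:3], 3))
```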

Easier setup with Docker Image (Preview)

One of the more frequent issues developers faced when using ML-Agents toolkit had little to do with the toolkit itself, and more to do with the difficulties of installing all the right prerequisites, such as Python and TensorFlow.

We want to make it as simple as possible for developers who want to stay within (and only think about) the world of Unity and C# to do so. As a first step toward this goal, we are enabling the creation of a Docker image, which contains all of the requirements necessary for training using an ML-Agents environment.

To train a brain (or brains), simply install Docker (easier than installing python and other dependencies, we promise), build your Unity environment with a Linux target, and launch the Docker image with the name of your environment.

New and revamped environments

We are happy to be including four completely new example environments with the release of v0.3: Banana Collectors, Soccer Twos, Bouncer, and Hallway. The first two of these are multi-agent environments, where the agents within the environment interact with one another either cooperatively, competitively, or both.

Banana Collectors

In Banana Collectors, multiple agents move around an area attempting to collect as many rewarding bananas (yellow) as possible, while avoiding negatively rewarding bananas (purple). The catch is that the agents can fire lasers at one another, freezing them in place. Inspired by research from DeepMind last year, the agents can learn to either fight over the bananas, or peacefully share them, depending on the number of rewarding bananas within the scene.

Soccer Twos

The second environment, Soccer Twos, contains a 2v2 environment, where each team contains both a striker and a goalie, each trained using separate reward functions and brains.

Bouncer

The third environment, Bouncer, provides an example of our new “On-Demand Decision-Making” feature. In this environment, an agent can apply force to itself in order to bounce around a platform, and attempt to collide with floating bananas. What makes the environment unique is that the agent only makes a new decision about where to bounce next once it has landed on the ground, which takes place after various time intervals.

Hallway

The fourth environment, Hallway (inspired by this paper), provides a test of the agent's memory abilities, and of our new support for Recurrent Neural Networks as an optional model type to be trained. In it, an agent must use local perception to explore a hallway, discover the color of the block, and use that information to go to the rewarding goal.

In addition to these four new environments, we have also significantly revamped the Push Block and Wall Jump environments, and provided a new unified look and feel across all of the example environments. We hope that these changes will make it easier for the community to train their own models in these environments, and to use the environments as inspiration for their own work.

Try it out

We encourage you to try out these new features, and let us know what you think. As always, since this is a beta product that is in flux, there may be bugs or issues. If you run into one, feel free to let us know about it on our GitHub issues page. If you’d just like to provide general feedback, or discuss ML-Agents toolkit with other developers, check out our Unity Connect channel. Additionally feel free to email us directly at [email protected] with feedback, questions, or to show us what you are working on! Happy Training.

Source: https://blogs.unity3d.com/2018/03/15/ml-agents-v0-3-beta-released-imitation-learning-feedback-driven-features-and-more/
