Our two previous blog entries implied that games have a role to play in driving the development of Reinforcement Learning algorithms. As the world’s most popular creation engine, Unity is at the crossroads between machine learning and gaming. It is critical to our mission to give machine learning researchers the most powerful training scenarios, and to give back to the gaming community by enabling it to use the latest machine learning technologies. As the first step in this endeavor, we are excited to introduce the Unity Machine Learning Agents Toolkit.
Machine Learning is changing the way we expect to get intelligent behavior out of autonomous agents. Whereas in the past the behavior was coded by hand, it is increasingly taught to the agent (either a robot or virtual avatar) through interaction in a training environment. This method is used to learn behavior for everything from industrial robots, drones, and autonomous vehicles, to game characters and opponents. The quality of this training environment is critical to the kinds of behaviors that can be learned, and there are often trade-offs of one kind or another that need to be made. The typical scenario for training agents in virtual environments is to have a single environment and agent which are tightly coupled. The actions of the agent change the state of the environment, and provide the agent with rewards.
The typical Reinforcement Learning training cycle.
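The cycle described above can be sketched in a few lines of Python. This is a minimal illustration only, not the toolkit's API: `CorridorEnv`, `run_episode`, and `always_right` are hypothetical names invented for this example, and the reward values are arbitrary.

```python
class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode at the left end of the corridor.
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor.
        self.position = max(0, min(self.length, self.position + action))
        done = self.position == self.length
        # Arbitrary reward scheme: small penalty per step, bonus on reaching the goal.
        reward = 1.0 if done else -0.01
        return self.position, reward, done


def run_episode(env, policy, max_steps=1000):
    """One episode of the cycle: observe the state, act, receive reward and next state."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # the agent chooses an action
        state, reward, done = env.step(action)  # the environment changes and responds
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    """Placeholder policy; a learning algorithm would improve this from rewards."""
    return 1


episode_reward = run_episode(CorridorEnv(length=5), always_right)
```

In a real training setup the policy would be updated from the collected rewards; here it is fixed so that the loop itself stays in focus.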
At Unity, we wanted to design a system that provides greater flexibility and ease of use to the growing groups interested in applying machine learning to developing intelligent agents. Moreover, we wanted to do this while taking advantage of the high-quality physics and graphics, and the simple yet powerful developer control, provided by the Unity Engine and Editor. We think that this combination can benefit the following groups in ways that other solutions might not:
We call our solution the Unity Machine Learning Agents Toolkit (ML-Agents toolkit for short), and are happy to be releasing an open beta version of our SDK today! The ML-Agents SDK allows researchers and developers to transform games and simulations created using the Unity Editor into environments where intelligent agents can be trained using Deep Reinforcement Learning, Evolutionary Strategies, or other machine learning methods through a simple-to-use Python API. We are releasing this beta version of the Unity ML-Agents toolkit as open-source software, with a set of example projects and baseline algorithms to get you started. As this is an initial beta release, we are actively looking for feedback, and encourage anyone interested to contribute on our GitHub page. For more information on the Unity ML-Agents toolkit, continue reading below! For more detailed documentation, see our GitHub Wiki.
A visual depiction of how a Learning Environment might be configured within Unity ML-Agents Toolkit.
The three main kinds of objects within any Learning Environment are:
Agent – Each Agent can have a unique set of states and observations, take unique actions within the environment, and receive unique rewards for events within the environment. An agent’s actions are decided by the brain it is linked to.
Brain – Each Brain defines a specific state and action space, and is responsible for deciding which actions each of its linked agents will take. The current release supports Brains being set to one of four modes:
Internal (Experimental) – Action decisions are made using a trained model embedded into the project via TensorFlowSharp.
Academy – The Academy object within a scene also contains as children all Brains within the environment. Each environment contains a single Academy which defines the scope of the environment, in terms of:
The states and observations of all agents with brains set to External are collected by the External Communicator, and communicated to our Python API for processing using your ML library of choice. By setting multiple agents to a single brain, actions can be decided in a batch fashion, opening the possibility of getting the advantages of parallel computation, when supported. For more information on how these objects work together within a scene, see our wiki page.
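To make the relationship between the three objects concrete, here is a rough Python sketch of how several agents linked to one brain can have their actions decided in a single batch. It models the idea only: the `Agent`, `Brain`, and `Academy` classes below are hypothetical stand-ins written for this example and are not the toolkit's actual classes or Python API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = List[float]
Action = int


@dataclass
class Agent:
    """Hypothetical agent holding its own current state."""
    name: str
    state: State


@dataclass
class Brain:
    """Hypothetical brain: decides actions for all of its linked agents at once."""
    decide: Callable[[List[State]], List[Action]]
    agents: List[Agent] = field(default_factory=list)

    def step(self) -> Dict[str, Action]:
        # Collect the states of every linked agent into one batch.
        states = [agent.state for agent in self.agents]
        # A single call decides all actions, which is what enables batching.
        actions = self.decide(states)
        return {agent.name: action for agent, action in zip(self.agents, actions)}


@dataclass
class Academy:
    """Hypothetical academy: owns the brains and steps them together."""
    brains: List[Brain] = field(default_factory=list)

    def step(self) -> Dict[str, Action]:
        actions: Dict[str, Action] = {}
        for brain in self.brains:
            actions.update(brain.step())
        return actions


def sign_policy(states: List[State]) -> List[Action]:
    # Placeholder decision rule: action 1 if the first feature is positive, else 0.
    return [1 if state[0] > 0 else 0 for state in states]


brain = Brain(decide=sign_policy, agents=[Agent("a", [0.5]), Agent("b", [-0.2])])
academy = Academy(brains=[brain])
actions = academy.step()
```

Swapping `sign_policy` for a function that runs a neural network on the whole batch is where parallel computation would pay off.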
With the Unity ML-Agents toolkit, a variety of training scenarios are possible, depending on how agents, brains, and rewards are connected. We are excited to see what kinds of novel and fun environments the community creates. For those new to training intelligent agents, below are a few examples that can serve as inspiration. Each is a prototypical environment configuration with a description of how it can be created using the ML-Agents SDK.
Single-Agent – A single agent linked to a single brain. The traditional way of training an agent. An example is any single-player game, such as Chicken. (Demo project included – “GridWorld”)
Simultaneous Single-Agent – Multiple independent agents with independent reward functions linked to a single brain. A parallelized version of the traditional training scenario, which can speed up and stabilize the training process. An example might be training a dozen robot arms to each open a door simultaneously. (Demo project included – “3DBall”)
Adversarial Self-Play – Two interacting agents with inverse reward functions linked to a single brain. In two-player games, adversarial self-play can allow an agent to become increasingly skilled, while always having the perfectly matched opponent: itself. This was the strategy employed when training AlphaGo, and more recently used by OpenAI to train a human-beating 1v1 Dota 2 agent. (Demo project included – “Tennis”)
Cooperative Multi-Agent – Multiple interacting agents with a shared reward function linked to either a single or multiple different brains. In this scenario, all agents must work together to accomplish a task that couldn’t be done alone. Examples include environments where each agent only has access to partial information, which needs to be shared in order to accomplish the task or collaboratively solve a puzzle. (Demo project coming soon)
Competitive Multi-Agent – Multiple interacting agents with inverse reward functions linked to either a single or multiple different brains. In this scenario, agents must compete with one another to either win a competition, or obtain some limited set of resources. All team sports would fall into this scenario. (Demo project coming soon)
Ecosystem – Multiple interacting agents with independent reward functions linked to either a single or multiple different brains. This scenario can be thought of as creating a small world in which animals with different goals all interact, such as a savanna in which there might be zebras, elephants, and giraffes, or an autonomous driving simulation within an urban environment. (Demo project coming soon)
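The scenarios above differ mainly in how rewards relate across agents: independent, inverse, or shared. The short Python sketch below shows one way those three relationships could be expressed. The function names and reward magnitudes are assumptions made for illustration and do not come from the ML-Agents SDK.

```python
from typing import Dict, List


def independent_rewards(scores: Dict[str, float]) -> Dict[str, float]:
    """Single-agent, simultaneous, and ecosystem cases: each agent is rewarded
    only for its own events."""
    return dict(scores)


def inverse_rewards(winner: str, loser: str, magnitude: float = 1.0) -> Dict[str, float]:
    """Adversarial self-play, two-player case: one agent's gain is the other's loss."""
    return {winner: magnitude, loser: -magnitude}


def shared_rewards(agents: List[str], team_score: float) -> Dict[str, float]:
    """Cooperative case: every agent receives the same reward for the joint outcome."""
    return {name: team_score for name in agents}


tennis = inverse_rewards(winner="left", loser="right")
puzzle = shared_rewards(["a", "b", "c"], team_score=0.5)
savanna = independent_rewards({"zebra": 0.2, "giraffe": -0.1})
```

The inverse case sums to zero across the two agents, which is why a single brain can play both sides during self-play.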
Beyond the flexible training scenarios made possible by the Academy/Brain/Agent system, the Unity ML-Agents toolkit also includes other features which improve the flexibility and interpretability of the training process.
Monitoring Agent’s Decision Making – Since communication in Unity ML-Agents toolkit is a two-way street, we provide an Agent Monitor class in Unity which can display aspects of the trained agent, such as policy and value output within the Unity environment itself. By providing these outputs in real-time, researchers and developers can more easily debug an agent’s behavior.
Above each agent is a value estimate, corresponding to how much future reward the agent expects. When the right agent misses the ball, the value estimate drops to zero, since it expects the episode to end soon, resulting in no additional reward.
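A value estimate approximates the reward an agent expects to collect from the current step onward. One common way to define that quantity is the discounted return, sketched below. The discount factor `gamma` is an assumption of this example; the post does not say how the displayed value is computed.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return, for each time step, the discounted sum of rewards from that step on."""
    returns = []
    running = 0.0
    # Work backwards so each step reuses the return of the step after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Two rewarded steps followed by a miss that earns nothing further.
example = discounted_returns([1.0, 1.0, 0.0], gamma=0.5)
```

The final entry is zero because no reward follows it, which mirrors the value estimate dropping to zero once the ball is missed.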
Curriculum Learning – It is often difficult for agents to learn a complex task at the beginning of the training process. Curriculum learning is the process of gradually increasing the difficulty of a task to allow more efficient learning. The Unity ML-Agents toolkit supports setting custom environment parameters every time the environment is reset. This allows elements of the environment related to difficulty or complexity to be dynamically adjusted based on training progress.
Different possible configurations of the GridWorld environment with increasing complexity.
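As a rough illustration of curriculum learning, the sketch below picks the parameters for the next environment reset from the agent's recent mean reward. Everything in it is hypothetical: the parameter names (`grid_size`, `num_obstacles`), the lesson table, and the thresholds are invented for this example and are not the toolkit's configuration format.

```python
# Hypothetical lessons, ordered from easiest to hardest.
LESSONS = [
    {"grid_size": 3, "num_obstacles": 0},
    {"grid_size": 5, "num_obstacles": 1},
    {"grid_size": 7, "num_obstacles": 3},
]

# Mean reward needed to advance past lesson 0 and lesson 1, respectively.
THRESHOLDS = [0.5, 0.8]


def select_lesson(mean_reward, thresholds):
    """Return the lesson index: the number of consecutive thresholds already met."""
    lesson = 0
    for threshold in thresholds:
        if mean_reward >= threshold:
            lesson += 1
        else:
            break
    return lesson


def reset_parameters(mean_reward):
    """Choose the environment parameters to apply at the next reset."""
    return LESSONS[select_lesson(mean_reward, THRESHOLDS)]
```

For instance, `reset_parameters(0.6)` selects the middle lesson, so the environment grows harder only after the agent has mastered the easier setting.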
Complex Visual Observations – Unlike other platforms, where the agent’s observation might be limited to a single vector or image, the Unity ML-Agents toolkit allows multiple cameras to be used for observations per agent. This enables agents to learn to integrate information from multiple visual streams, as would be the case when training a self-driving car that requires multiple cameras with different viewpoints, a navigational agent that might need to integrate aerial and first-person visuals, or an agent that takes both a raw visual input and a depth map or object-segmented image.
Two different camera views on the same environment. When both are provided to an agent, it can learn to utilize both first-person and map-like information about the task to defeat the opponent.
Imitation Learning (Coming Soon) – It is often more intuitive to simply demonstrate the behavior we want an agent to perform, rather than attempting to have it learn via trial-and-error methods. In a future release, the Unity ML-Agents toolkit will provide the ability to record all state/action/reward information for use in supervised learning scenarios, such as imitation learning. By utilizing imitation learning, a player can provide demonstrations of how an agent should behave in an environment, and then utilize those demonstrations to train an agent either in a standalone fashion or as a first step in a reinforcement learning process.
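Since this feature is not yet released, the following is only a sketch of the general idea and not the toolkit's design. It shows the simplest form of imitation learning, tabular behavioral cloning: recorded (state, action) pairs are turned into a policy that repeats the most frequently demonstrated action for each state. All names and the sample demonstrations are hypothetical.

```python
from collections import Counter, defaultdict


def fit_cloned_policy(demonstrations, default_action=0):
    """Build a policy that imitates the most frequently demonstrated action per state."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    # For each seen state, keep the action the demonstrator chose most often.
    table = {state: c.most_common(1)[0][0] for state, c in counts.items()}

    def policy(state):
        # States never demonstrated fall back to a default action.
        return table.get(state, default_action)

    return policy


# Hypothetical recorded demonstrations as (state, action) pairs.
demonstrations = [
    ("ball_left", 1),
    ("ball_left", 1),
    ("ball_left", 0),
    ("ball_right", 2),
]
cloned_policy = fit_cloned_policy(demonstrations)
```

With continuous or visual states, the lookup table would be replaced by a supervised model trained on the same pairs, and the result could then serve as a starting point for reinforcement learning.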
As mentioned above, we are excited to be releasing this open beta version of the Unity Machine Learning Agents Toolkit today, which can be downloaded from our GitHub page. This release is only the beginning, and we plan to iterate quickly and provide additional features both for those of you who are interested in Unity as a platform for Machine Learning research and for those of you who are focused on the potential of Machine Learning in game development. While this beta release is more focused on the former group, we will increasingly provide support for the latter use case. We are especially interested in hearing about use cases and features you would like to see included in future releases of the Unity ML-Agents Toolkit, and we welcome Pull Requests made to the GitHub repository. Please feel free to reach out to us at [email protected] to share feedback and thoughts. If the project sparks your interest, come join the Unity Machine Learning team!
Happy training!
Translated from: https://blogs.unity3d.com/2017/09/19/introducing-unity-machine-learning-agents/