[Unity and Reinforcement Learning] ML-Agents Python API: Environment Setup and Development

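The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that lets games and simulations serve as environments for training intelligent agents. Unity provides PyTorch-based implementations of reinforcement learning algorithms so that game developers and hobbyists can easily train intelligent agents for 2D, 3D, and VR/AR games. Researchers can also use the provided easy-to-use Python API to train agents with reinforcement learning, imitation learning, neuroevolution, or any other method.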
This article follows the official Getting Started documentation to cover environment setup and use of the API, and adds some of the code I actually use.

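Installing the environment

The first document is a getting-started setup guide for Windows: Installing ML-Agents Toolkit for Windows (Deprecated). This tutorial is no longer maintained, so from Step 2 onward there is no guarantee that the installation will succeed; use it only as a partial reference.

Installation is the document that actually works, and I recommend setting up the environment with the minimum versions it lists. At the end, that document gives a Next Step section on running the official pretrained models. Our focus, however, is on using the Python API to develop our own network models, so that section is useful for verifying that the environment works, but this article does not go deeper into mlagents-learn. It has quite a few restrictions and suits developers who need to build a model quickly, not researchers or users who want to use their own models.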
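Before opening Unity, it can help to confirm that the Python packages are visible in the active interpreter. The following is a minimal sketch, not part of the official documentation: the helper names installed_version and check_packages are hypothetical, and the package names listed (mlagents, mlagents-envs, torch) are assumed to be the distribution names you installed.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_packages(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Assumed distribution names; adjust them to match your installation.
    for name, version in check_packages(["mlagents", "mlagents-envs", "torch"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Compare the printed versions with the ones listed in the Installation document.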

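Importing the project

The official Getting-Started document describes the import procedure in detail.

Opening the project (the Project folder under the repository root) in the Unity Editor is enough, although importing the sample package as described in the official tutorial is the recommended way.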

Writing the Python script

Official documentation: Python-API

The documentation is fairly detailed. Here is a piece of my demo code:

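        # Imports needed by this demo
        import numpy as np
        from mlagents_envs.base_env import ActionTuple
        from mlagents_envs.environment import UnityEnvironment
        from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel

        """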
        file_name is the name of the environment binary (located in the root directory of the python project).
        worker_id indicates which port to use for communication with the environment.
            For use in parallel training regimes such as A3C.
        seed indicates the seed to use when generating random numbers during the training process.
            In environments which are deterministic, setting the seed enables reproducible experimentation by ensuring
            that the environment and trainers utilize the same random seed.
        side_channels provides a way to exchange data with the Unity simulation
            that is not related to the reinforcement learning loop.
            For example: configurations or properties. More on them in the Modifying the environment from Python section.
        If you want to directly interact with the Editor, you need to use file_name=None,
            then press the Play button in the Editor when the message
            "Start training by pressing the Play button in the Unity Editor" is displayed on the screen
        """
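        # The time scale is set through an EngineConfigurationChannel
        # (from mlagents_envs.side_channel.engine_configuration_channel),
        # which has to be registered as a side channel when the environment is created.
        config_channel = EngineConfigurationChannel()
        env = UnityEnvironment(file_name=None, seed=1, side_channels=[config_channel, UnityStaticLogChannel()])
        # Set the time scale
        config_channel.set_configuration_parameters(time_scale=1.0)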
        # Start interacting with the environment.
        env.reset()
        """
        Returns a Mapping of BehaviorName to BehaviorSpec objects (read only).
            A BehaviorSpec contains the observation shapes and the ActionSpec (which defines the action shape).
            Note that the BehaviorSpec for a specific group is fixed throughout the simulation.
            The number of entries in the Mapping can change over time in the simulation if new Agent behaviors
                are created in the simulation.
        An Agent "Behavior" is a group of Agents identified by a BehaviorName that share the same observations
            and action types (described in their BehaviorSpec).
        """

        behavior_names = list(env.behavior_specs.keys())

        for name in behavior_names:
            # Print the name unchanged; it is the exact key used by get_steps and set_actions
            print("[Info] Behavior Name:", name)

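        count = 0
        while True:
            if count > 5000:
                break
            for name in behavior_names:
                # get_steps returns a (DecisionSteps, TerminalSteps) pair
                decision_steps, terminal_steps = env.get_steps(name)
                # Add your algorithm here
                # ...
                actions = ActionTuple()
                # For testing, every agent requesting a decision (16 in my scene) moves randomly
                # in four directions; in real use, replace this with the actions from your algorithm
                ac = np.random.randint(0, 5, size=len(decision_steps)).reshape(-1, 1)
                actions.add_discrete(ac)

                env.set_actions(name, actions)
            # Advance the simulation once every behavior has received its actions
            env.step()
            count += 1
        env.close()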
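The random test actions above must have one row per agent and one column per discrete action branch. As a rough sketch of that shape requirement (the helper name random_discrete_actions is hypothetical, and it only assumes NumPy, not a running Unity environment):

```python
import numpy as np


def random_discrete_actions(n_agents, branch_sizes, rng=None):
    """Sample one random action per agent for each discrete branch.

    Returns an int32 array of shape (n_agents, len(branch_sizes)),
    where column i holds values in [0, branch_sizes[i]).
    """
    if rng is None:
        rng = np.random.default_rng()
    actions = np.zeros((n_agents, len(branch_sizes)), dtype=np.int32)
    for col, size in enumerate(branch_sizes):
        # integers() samples from the half-open interval [0, size)
        actions[:, col] = rng.integers(0, size, size=n_agents)
    return actions


if __name__ == "__main__":
    # 16 agents, a single branch with 5 possible actions, as in the demo above
    sample = random_discrete_actions(16, (5,))
    print(sample.shape)
```

In the demo loop, n_agents would be len(decision_steps), and the branch sizes can, I believe, be read from env.behavior_specs[name].action_spec.discrete_branches. The resulting array is what gets passed to actions.add_discrete.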

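UnityStaticLogChannel is a class for side-channel communication between the Unity front end and the Python algorithm side. It is documented in Custom-SideChannels. Here is an example: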

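import uuid

from mlagents_envs.side_channel.side_channel import (
    IncomingMessage,
    OutgoingMessage,
    SideChannel,
)


class UnityStaticLogChannel(SideChannel):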

    def __init__(self) -> None:
        super().__init__(uuid.UUID("a1d8f7b7-cec8-50f9-b78b-d3e165a78520"))

    def on_message_received(self, msg: IncomingMessage) -> None:
        """
        Note: We must implement this method of the SideChannel interface to
        receive messages from Unity
        """
        # We simply read a string from the message and print it.
        print(msg.read_string())

    def send_string(self, data: str) -> None:
        # Add the string to an OutgoingMessage
        msg = OutgoingMessage()
        msg.write_string(data)
        # We call this method to queue the data we want to send
        super().queue_message_to_send(msg)
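A side channel is identified by its UUID, and the same value has to be used by the matching channel on the Unity (C#) side. One possible way to obtain a stable ID is to derive it from a readable name with uuid.uuid5. This is only a sketch: the helper channel_id_from_name and the example name are hypothetical, and nothing here implies that the UUID in the class above was produced this way.

```python
import uuid


def channel_id_from_name(name: str) -> uuid.UUID:
    """Derive a deterministic (version 5) UUID from a human-readable channel name."""
    # uuid5 hashes the namespace and name, so the same name always yields the same UUID
    return uuid.uuid5(uuid.NAMESPACE_DNS, name)


if __name__ == "__main__":
    # Hypothetical channel name, used for illustration only
    channel_id = channel_id_from_name("static-log-channel.example")
    print(channel_id)
```

The printed string would then be passed to uuid.UUID(...) in the Python channel's __init__ and copied into the C# side channel as its channel ID.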

For the Unity-side code, refer to the official samples in Learning-Environment-Examples; I will not go into detail here.

Running

Running is simple: start the Python script first, then press Play in the Editor on the corresponding game scene.
