Reinforcement Learning Environments: Gym from Installation to First Use

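Gym is a toolkit for testing and comparing reinforcement learning algorithms. It makes no assumptions about the structure of your agent and is compatible with any numerical computation library, such as TensorFlow or Theano.

The gym library is a collection of test problems, called environments, that you can use to work out your reinforcement learning algorithms. These environments share a common interface, which lets you write general algorithms.

Installation

Before you begin, make sure Python 3.5+ is installed. For a basic installation, pip is all you need: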

pip install gym

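Once that finishes, you are ready to start working with gym.

Installing from the GitHub source

If you prefer, you can also clone the gym GitHub repository and install from it directly. This is particularly useful when you want to add environments yourself or modify existing ones. Download and install with: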

git clone https://github.com/openai/gym
cd gym
pip install -e .

Afterwards, you can run the following command to perform a full installation that includes all environments:

pip install -e .[all]

This full installation requires several additional dependencies, including cmake and a recent version of pip.
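
To confirm which interpreter actually received the package, a quick check from Python can help. The sketch below is only an illustration: the helper name installed_version is made up here, and it relies on importlib.metadata, which assumes Python 3.8 or newer rather than the 3.5+ minimum mentioned above.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("gym")
    if version is None:
        print("gym is not installed in this interpreter")
    else:
        print("gym", version, "is installed")
```

If this reports that gym is missing right after a successful pip install, pip and python most likely point at different environments.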

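Environments

Here is a minimal example of getting one of gym's environments running. It runs the CartPole-v0 environment for 1000 timesteps and renders the environment at each step. You should see a window pop up showing the classic cart-pole problem.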

import gym
env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample()) # take a random action
env.close() # release the rendering window

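If you would like to see other environments in action, try replacing CartPole-v0 with something like MountainCar-v0, MsPacman-v0 (requires the Atari dependency), or Hopper-v1 (requires MuJoCo). All of these environments derive from the Env base class.

If you are missing a dependency, you should get a helpful error message that tells you which one is missing and how to fix it. Installing a missing dependency is generally straightforward. To run Hopper-v1, you will also need a MuJoCo license.

Observations

If we ever want to do better than taking random actions at each step, it helps to know what our actions actually do to the environment.

The environment's step function returns exactly what we need. In fact, step returns four values:

observation (object): an environment-specific object representing your observation of the environment, for example pixel data from a camera, the joint angles and joint velocities of a robot, or the board state in a board game.

reward (float): the amount of reward achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward.

done (boolean): whether it is time to reset the environment. Most (but not all) tasks are divided into well-defined episodes, and done being True indicates that the episode has terminated. (For example, the pole may have tipped too far, or the cart may have moved too far from the center.)

info (dict): diagnostic information useful for debugging. It can sometimes be useful for learning (for example, it might contain the raw probabilities behind the environment's last state change). However, official evaluations of your agent are not allowed to use it for learning.

This is just an implementation of the classic agent-environment loop. At each timestep, the agent chooses an action, and the environment returns an observation (the next state) and a reward.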
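
The loop just described can be sketched without gym at all. The toy class below, LineWalkEnv, is hypothetical and not part of gym; it only mimics the reset() and four-value step() interface so the flow of observation, reward, done, and info is visible in a few lines.

```python
import random


class LineWalkEnv:
    """Hypothetical toy environment (not part of gym) with a gym-style interface.

    The agent starts at position 0 on a line and wins by reaching +3.
    Action 0 moves one step left, action 1 moves one step right.
    """

    GOAL = 3
    MAX_STEPS = 50

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position  # initial observation

    def step(self, action):
        self.position += 1 if action == 1 else -1
        self.steps += 1
        reached_goal = self.position == self.GOAL
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.MAX_STEPS
        info = {"steps": self.steps}
        return self.position, reward, done, info


def run_episode(env, rng):
    """Run one episode with uniformly random actions; return (total reward, steps)."""
    env.reset()
    total_reward, done, steps = 0.0, False, 0
    while not done:
        action = rng.choice([0, 1])
        _, reward, done, _ = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


if __name__ == "__main__":
    total, steps = run_episode(LineWalkEnv(), random.Random(0))
    print("return:", total, "steps:", steps)
```

The step cap in the toy plays the role of a time limit: an episode ends either because the task itself finished or because it ran out of steps.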

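The process starts by calling reset(), which returns an initial observation. A more proper way to write the previous code is to respect the done flag: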

import gym
env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
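
Once the observation is available, the random action can be swapped for something that reacts to it. The function below is a deliberately naive heuristic, not a learned policy, and its name lean_policy is made up for this sketch. It assumes CartPole's observation layout (cart position, cart velocity, pole angle, pole angular velocity) and action encoding (0 pushes left, 1 pushes right).

```python
def lean_policy(observation):
    """Push the cart toward the side the pole is leaning to.

    Assumes CartPole's observation layout
    [cart position, cart velocity, pole angle, pole angular velocity]
    and its action encoding (0 = push left, 1 = push right).
    """
    pole_angle = observation[2]
    return 1 if pole_angle > 0 else 0
```

To try it, replace action = env.action_space.sample() in the loop above with action = lean_policy(observation).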

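Spaces

In the examples above, we have been sampling random actions from the environment's action space. But what actually are those actions? Every environment comes with an action_space and an observation_space. These attributes are of type Space, and they describe the format of valid actions and observations: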

import gym
env = gym.make('CartPole-v0')
print(env.action_space)
#> Discrete(2)
print(env.observation_space)
#> Box(4,)

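The Discrete space allows a fixed range of non-negative numbers, so in this case valid actions are either 0 or 1. The Box space represents an n-dimensional box, so valid observations will be an array of 4 numbers. We can also check the bounds of the Box, that is, the limits of the observation space: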

print(env.observation_space.high)
#> array([ 2.4       ,         inf,  0.20943951,         inf])
print(env.observation_space.low)
#> array([-2.4       ,        -inf, -0.20943951,        -inf])
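
The bounds are easier to read in familiar units. The snippet below copies the printed upper bounds by hand and converts the angle limit to degrees; it assumes, as in CartPole, that index 0 is the cart position and index 2 is the pole angle in radians.

```python
import math

# Upper bounds copied from the printed observation_space.high above.
high = [2.4, math.inf, 0.20943951, math.inf]

# Assumed CartPole layout: index 0 is cart position, index 2 is pole angle in radians.
cart_position_limit = high[0]
pole_angle_limit_deg = math.degrees(high[2])

print(cart_position_limit)             # 2.4
print(round(pole_angle_limit_deg, 2))  # 12.0
```

The two infinite entries are the velocities, which this space leaves unbounded.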

This introspection can be helpful to write generic code that works for many different environments. Box and Discrete are the most common Spaces. You can sample from a Space or check that something belongs to it:

from gym import spaces
space = spaces.Discrete(8) # Set with 8 elements {0, 1, 2, ..., 7}
x = space.sample()
assert space.contains(x)
assert space.n == 8
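
To make the idea of generic code concrete, here is a sketch with two simplified stand-ins, ToyDiscrete and ToyBox. They are not gym's classes and only imitate the sample() and contains() methods used above; the helper random_valid_element is likewise made up for illustration.

```python
import random


class ToyDiscrete:
    """Simplified stand-in for a discrete space: the integers 0 .. n-1."""

    def __init__(self, n):
        self.n = n

    def sample(self, rng=random):
        return rng.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class ToyBox:
    """Simplified stand-in for a box space with finite [low, high] bounds per dimension."""

    def __init__(self, low, high):
        self.low, self.high = list(low), list(high)

    def sample(self, rng=random):
        return [rng.uniform(lo, hi) for lo, hi in zip(self.low, self.high)]

    def contains(self, x):
        return len(x) == len(self.low) and all(
            lo <= value <= hi for lo, value, hi in zip(self.low, x, self.high)
        )


def random_valid_element(space):
    """Generic helper: works with any object exposing sample() and contains()."""
    x = space.sample()
    assert space.contains(x)
    return x
```

Because random_valid_element only relies on the two shared methods, the same call works for either kind of space, which is the property that lets one agent run against many environments.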

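Available environments

Gym comes with a diverse suite of environments that range from easy to difficult and involve many different kinds of data. For an overview, see the full list of environments.

I have also put together my own summary: "Experiment Environments in Deep Reinforcement Learning: A Roundup of Open-Source Platforms and Frameworks".

Classic control and toy text: complete small-scale tasks, mostly from the RL literature.

Algorithmic: perform computations such as adding multi-digit numbers and reversing sequences. These tasks are generally considered easy for a computer; the challenge is to learn the algorithms purely from examples.

Atari: play classic Atari games. Gym integrates the Arcade Learning Environment, which has had a big impact on reinforcement learning research.

On Windows, you can install Atari support with the following command: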

pip install --no-index -f https://github.com/Kojoley/atari-py/releases atari_py

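2D and 3D robots: control a robot in simulation. These tasks use the MuJoCo physics engine, which was designed for fast and accurate robot simulation. Included are some environments from a recent benchmark by UC Berkeley researchers (who incidentally will be joining us this summer). MuJoCo is proprietary software, but offers free trial licenses.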

The registry

gym’s main purpose is to provide a large collection of environments that expose a common interface and are versioned to allow for comparisons. To list the environments available in your installation, just ask gym.envs.registry:

from gym import envs
print(envs.registry.all())
#> [EnvSpec(DoubleDunk-v0), EnvSpec(InvertedDoublePendulum-v0), EnvSpec(BeamRider-v0), EnvSpec(Phoenix-ram-v0), EnvSpec(Asterix-v0), EnvSpec(TimePilot-v0), EnvSpec(Alien-v0), EnvSpec(Robotank-ram-v0), EnvSpec(CartPole-v0), EnvSpec(Berzerk-v0), EnvSpec(Berzerk-ram-v0), EnvSpec(Gopher-ram-v0), ...

This will give you a list of EnvSpec objects. These define parameters for a particular task, including the number of trials to run and the maximum number of steps. For example, EnvSpec(Hopper-v1) defines an environment where the goal is to get a 2D simulated robot to hop; EnvSpec(Go9x9-v0) defines a Go game on a 9x9 board.

These environment IDs are treated as opaque strings. In order to ensure valid comparisons for the future, environments will never be changed in a fashion that affects performance, only replaced by newer versions. We currently suffix each environment with a v0 so that future replacements can naturally be called v1, v2, etc.

It’s very easy to add your own environments to the registry, and thus make them available for gym.make(): just register() them at load time.

Background: Why Gym? (2016)

Reinforcement learning (RL) is the subfield of machine learning concerned with decision making and motor control. It studies how an agent can learn how to achieve goals in a complex, uncertain environment. It’s exciting for two reasons:

  • RL is very general, encompassing all problems that involve making a sequence of decisions: for example, controlling a robot’s motors so that it’s able to run and jump, making business decisions like pricing and inventory management, or playing video games and board games. RL can even be applied to supervised learning problems with sequential or structured outputs.
  • RL algorithms have started to achieve good results in many difficult environments. RL has a long history, but until recent advances in deep learning, it required lots of problem-specific engineering. DeepMind’s Atari results, BRETT from Pieter Abbeel’s group, and AlphaGo all used deep RL algorithms which did not make too many assumptions about their environment, and thus can be applied in other settings.

However, RL research is also slowed down by two factors:

  • The need for better benchmarks. In supervised learning, progress has been driven by large labeled datasets like ImageNet. In RL, the closest equivalent would be a large and diverse collection of environments. However, the existing open-source collections of RL environments don’t have enough variety, and they are often difficult to even set up and use.
  • Lack of standardization of environments used in publications. Subtle differences in the problem definition, such as the reward function or the set of actions, can drastically alter a task’s difficulty. This issue makes it difficult to reproduce published research and compare results from different papers.

Gym is an attempt to fix both problems.

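My WeChat official account: 深度学习与先进智能决策 (Deep Learning and Advanced Intelligent Decision-Making)
WeChat official account ID: MultiAgent1024
About the account: it focuses on reinforcement learning, computer vision, deep learning, machine learning, and related topics, and shares study notes and takeaways from the learning process. Follows are welcome, and so is learning and exchanging ideas together!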
