Gym Notes (4)

Gym Box2D

        Gym provides all kinds of environments. The most useful one for me is MuJoCo, but it requires a paid license...

        So I'll use the Box2D environments instead; they're free, after all~ Here are some brief notes on the Gym Box2D environments.

        First, what is Box2D? Box2D is a powerful open-source physics engine for games, used to simulate the motion and collision of 2D rigid bodies. Two points worth remembering: 1. 2D; 2. rigid bodies. Box2D bundles a large amount of mechanics and kinematics computation, wraps the simulation process into objects, and exposes operations on those bodies through a simple, friendly interface. By calling the engine's existing objects and functions, we can simulate all kinds of real-world physical motion: acceleration, deceleration, projectile motion, gravity, collision and rebound, and so on.
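        To make this concrete, here is a minimal pybox2d sketch, adapted from the standard pybox2d "hello world" (the shapes and constants are just illustrative): a dynamic box falls under gravity onto a static ground body.

from Box2D import b2World, b2PolygonShape

# A world with gravity pointing down
world = b2World(gravity=(0, -10))

# A static ground body and a dynamic box above it
ground = world.CreateStaticBody(position=(0, 0),
                                shapes=b2PolygonShape(box=(50, 1)))
body = world.CreateDynamicBody(position=(0, 10))
body.CreatePolygonFixture(box=(1, 1), density=1, friction=0.3)

# Step the simulation at 60 Hz and watch the box fall
for _ in range(60):
    world.Step(1.0 / 60, 6, 2)
    print(body.position)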

        So how do we use Box2D in Gym? Can we simply call env = gym.make("Env") as before?

        The answer is that we need to install it first...

        The steps to install Box2D are as follows:

        1)git clone https://github.com/pybox2d/pybox2d.git

        2)cd pybox2d

        3)python setup.py clean

        4)python setup.py install

        Note: if you previously tried to install Box2D following some tutorial and it failed, and there are Box2D-related files left in /usr/local/lib/python2.7/dist-packages, delete them first, then install using the commands above. Steps 3) and 4) may require sudo, depending on your setup.

        Once installed, run import Box2D; if no error is raised, the installation succeeded. The following example demonstrates Gym Box2D:

import gym
env = gym.make('LunarLander-v2')

print(env.observation_space)  # Box(8,)
print(env.action_space)       # Discrete(4)

for i_episode in range(100):
	observation = env.reset()
	for t in range(100):
		env.render()
		print(observation)
		action = env.action_space.sample()
		observation, reward, done, info = env.step(action)
		if done:
			print("Episode finished after {} timesteps".format(t+1))
			break
        Sample run:

        (Screenshot: the LunarLander-v2 render window during the random-agent run.)

        The above demonstrates the LunarLander-v2 environment from Gym Box2D. It is defined in the same .py file as another environment, LunarLanderContinuous-v2: lunar_lander.py. The difference between them is that the actions in LunarLander-v2 are discrete (do nothing, fire left orientation engine, fire main engine, fire right orientation engine), while the actions in LunarLanderContinuous-v2 are continuous, represented by a vector of two real values. The first value controls the main engine: -1..0 means off, and 0..+1 throttles it between 50% and 100% power. The second value controls the left/right engines: -1.0..-0.5 fires the left engine, +0.5..+1.0 fires the right engine, and -0.5..0.5 means off. Our goal is to use these three engines (main/left/right) to control the lander's descent, and the agent should bring the lander to rest at the designated spot (between the two flags).
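        As a quick sanity check (a minimal sketch; the printed spaces are what Gym reports for these two environments), we can compare the two action spaces directly:

import gym

# Discrete vs. continuous action spaces for the two lander variants
discrete_env = gym.make('LunarLander-v2')
continuous_env = gym.make('LunarLanderContinuous-v2')

print(discrete_env.action_space)    # Discrete(4)
print(continuous_env.action_space)  # Box(2,)

        The detailed descriptions of the two environments (quoted from the docstrings in lunar_lander.py) are as follows: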

        1)LunarLander-v2

        "Landing pad is always at coordinates (0,0). Coordinates are the first two numbers in state vector.Reward for moving from the top of the screen to landing pad and zero speed is about 100..140 points.If lander moves away from landing pad it loses reward back. Episode finishes if the lander crashes or comes to rest, receiving additional -100 or +100 points. Each leg ground contact is +10. Firing main engine is -0.3 points each frame. Solved is 200 points.Landing outside landing pad is possible. Fuel is infinite, so an agent can learn to fly and then land on its first attempt.Four discrete actions available: do nothing, fire left orientation engine, fire main engine, fire right orientation engine."

        2)LunarLanderContinuous-v2

        "Landing pad is always at coordinates (0,0). Coordinates are the first two numbers in state vector.Reward for moving from the top of the screen to landing pad and zero speed is about 100..140 points.If lander moves away from landing pad it loses reward back. Episode finishes if the lander crashes or comes to rest, receiving additional -100 or +100 points. Each leg ground contact is +10. Firing main engine is -0.3 points each frame. Solved is 200 points.Landing outside landing pad is possible. Fuel is infinite, so an agent can learn to fly and then land on its first attempt.Action is two real values vector from -1 to +1. First controls main engine, -1..0 off, 0..+1 throttle from 50% to 100% power. Engine can't work with less than 50% power. Second value -1.0..-0.5 fire left engine, +0.5..+1.0 fire right engine, -0.5..0.5 off."

        We can watch a fairly good demonstration with the following commands:

cd ~/gym
python gym/envs/box2d/lunar_lander.py
        Or we can play the game ourselves (keyboard_agent only supports environments with a discrete action space):
cd ~/gym
python examples/agents/keyboard_agent.py LunarLander-v2

        I haven't quite figured out how to play it, though...

        Oh, and you can also read about the original game on Wikipedia: Lunar Lander.

        Since I'm looking for an env that takes raw images as input, I'm not paying much attention to this LunarLander environment. Its state is as follows:

state = [
    (pos.x - VIEWPORT_W/SCALE/2) / (VIEWPORT_W/SCALE/2),               # horizontal position (0 at the pad)
    (pos.y - (self.helipad_y+LEG_DOWN/SCALE)) / (VIEWPORT_W/SCALE/2),  # vertical position
    vel.x*(VIEWPORT_W/SCALE/2)/FPS,                                    # horizontal velocity
    vel.y*(VIEWPORT_H/SCALE/2)/FPS,                                    # vertical velocity
    self.lander.angle,                                                 # lander angle
    20.0*self.lander.angularVelocity/FPS,                              # angular velocity
    1.0 if self.legs[0].ground_contact else 0.0,                       # left leg ground contact
    1.0 if self.legs[1].ground_contact else 0.0                        # right leg ground contact
]
        and the final return is:
return np.array(state), reward, done, {}

        In other words, the observation equals the state: the length-8 vector above, a low-dimensional observation input, so I won't examine it in detail here.
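        For reference, here is a minimal sketch that unpacks that 8-dimensional observation (the variable names are my own labels, following the order of the state list above):

import gym

env = gym.make('LunarLander-v2')
obs = env.reset()
print(obs.shape)  # (8,)

# Labels follow the order of the state list above (names are mine)
x, y, vx, vy, angle, angular_velocity, left_contact, right_contact = obs
print(x, y)  # position relative to the landing pad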

       

        Next time we will try to find an environment whose observation input is a raw image. See you then~



