Solving a Simple Gym Game with Q-learning

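Gym is a collection of environments/tasks designed for testing and developing RL algorithms, so users do not have to build complex environments themselves. Gym is written in Python and ships many environments, such as robot simulations and Atari games. This post uses the basic Taxi game as an example to show how to use Gym and how to implement basic Q-learning.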

1. Create the environment

import gym
import numpy as np

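# This post uses the classic Gym API (gym < 0.26): reset() returns the state
# and step() returns four values.
env = gym.make("Taxi-v3")  # create the Taxi environment
state = env.reset()  # reset the environment and get the initial state
envspace = env.observation_space.n  # size of the state space
actspace = env.action_space.n  # size of the action space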
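To make the state space above less abstract: Taxi-v3 has 500 discrete states (25 taxi positions × 5 passenger locations × 4 destinations) and 6 actions, and each state is a single integer. The sketch below shows how that integer can be packed and unpacked. The helper names `encode` and `decode` are our own for illustration and the code does not need Gym to run.

```python
# Sketch of how a Taxi-v3 state is packed into one integer in [0, 500).
# The helper names are illustrative, not part of the Gym API used in this post.

def encode(taxi_row, taxi_col, pass_loc, dest_idx):
    """Pack (row, col, passenger location, destination) into a state index."""
    return ((taxi_row * 5 + taxi_col) * 5 + pass_loc) * 4 + dest_idx


def decode(state):
    """Unpack a state index into (row, col, passenger location, destination)."""
    state, dest_idx = divmod(state, 4)      # 4 possible destinations
    state, pass_loc = divmod(state, 5)      # 4 landmarks, or 4 = inside the taxi
    taxi_row, taxi_col = divmod(state, 5)   # 5 x 5 grid
    return taxi_row, taxi_col, pass_loc, dest_idx


s = encode(3, 1, 2, 0)
print(s)          # 328
print(decode(s))  # (3, 1, 2, 0)
```

Because every state maps to one row index, a plain 500 × 6 array is enough to hold one value per state-action pair.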

2. Without Q-learning, random trial and error can still eventually deliver the passenger, but it usually takes a great many steps to succeed even once

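# Random actions
counter = 0
reward = None
while reward != 20:  # +20 is the reward for a successful drop-off
    state, reward, done, info = env.step(env.action_space.sample())
    counter = counter + 1
    print(reward)
    print(done)
    if done and reward != 20:
        state = env.reset()  # the 200-step time limit ended the episode; start a new one
print(counter)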

3. Build a Q-table and store the learned experience in it

# Q-learning
Q = np.zeros([envspace, actspace])  # create the Q-table, one row per state

alpha = 0.5  # learning rate
for episode in range(1, 2000):
    done = False
    reward = 0  # immediate reward
    R_cum = 0  # cumulative reward for this episode
    state = env.reset()  # reset the state
    while not done:
        action = np.argmax(Q[state])  # greedy action for the current state
        state2, reward, done, info = env.step(action)
        # Q-learning update; no discount factor is applied (equivalent to gamma = 1)
        Q[state, action] += alpha * (reward + np.max(Q[state2]) - Q[state, action])
        R_cum += reward
        state = state2
        # env.render()
    if episode % 50 == 0:
        print('episode:{};total reward:{}'.format(episode, R_cum))

print('The Q table is:{}'.format(Q))
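To see what a single step of the training loop does to the table, here is a minimal sketch of the update on its own. The function name `q_update` and the `gamma` parameter are our additions for illustration; with `gamma=1.0` it matches the undiscounted update used in this post.

```python
import numpy as np


def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=1.0):
    """Apply one tabular Q-learning update in place and return the TD error."""
    td_target = reward + gamma * np.max(Q[next_state])  # best value reachable next
    td_error = td_target - Q[state, action]             # how wrong the old estimate was
    Q[state, action] += alpha * td_error                # move part of the way to the target
    return td_error


# Toy table with 3 states and 2 actions (sizes chosen only for the example).
Q = np.zeros((3, 2))

q_update(Q, state=0, action=1, reward=-1, next_state=2)
print(Q[0, 1])  # -0.5

q_update(Q, state=0, action=1, reward=-1, next_state=2)
print(Q[0, 1])  # -0.75
```

This also hints at why a purely greedy `np.argmax` can work in Taxi without explicit exploration: every ordinary step costs -1, so an action that has been tried drops below the untried ones that are still at 0, and the agent is pushed toward actions it has not tried yet. In environments without that reward structure, an exploration scheme such as epsilon-greedy is usually needed.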

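# Test phase
counter = 0
reward = None
state = env.reset()  # start a fresh episode; the last training episode has already ended
done = False
while not done and counter < 200:
    action = np.argmax(Q[state])
    state, reward, done, info = env.step(action)
    counter = counter + 1
    # env.render()
    print(reward)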
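The test loop can also be wrapped in a reusable function that reports the total reward and the number of steps. The sketch below is one way to do that. `ChainEnv` is a hypothetical stand-in environment written only so the example runs without Gym; it mimics the classic `reset()`/`step()` signatures used in this post, and `run_greedy_episode` is likewise our own name.

```python
import numpy as np


class ChainEnv:
    """Hypothetical 4-state corridor with classic Gym-style reset/step.

    Action 1 moves right, action 0 moves left; reaching state 3 ends the episode.
    """

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == 3
        reward = 20 if done else -1
        return self.state, reward, done, {}


def run_greedy_episode(env, Q, max_steps=200):
    """Follow the greedy policy of Q for one episode; return (total reward, steps, done)."""
    state = env.reset()
    total, steps, done = 0, 0, False
    while not done and steps < max_steps:
        action = int(np.argmax(Q[state]))
        state, reward, done, _ = env.step(action)
        total += reward
        steps += 1
    return total, steps, done


Q = np.zeros((4, 2))
Q[:, 1] = 1.0  # hand-made table that always prefers "move right"
print(run_greedy_episode(ChainEnv(), Q))  # (18, 3, True)
```

With the classic Gym API, the same function should accept the Taxi environment and the trained table directly, for example `run_greedy_episode(env, Q)`.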

Source: https://www.sohu.com/a/197847451_633700
