强化学习案例复现(2)--- MountainCar基于DQN

1.搭建环境

 

import gym
 
# Create environment
env = gym.make("MountainCar-v0")
 
eposides = 10
for eq in range(eposides):
    obs = env.reset()
    done = False
    rewards = 0
    while not done:
        action = env.action_space.sample()
        obs, reward, done, action, info = env.step(action)
        env.render()
        rewards += reward
    print(rewards)

 

2.训练模型

 

      本文搭建的环境返回值为4个,可直接用该DQN模型训练,但在上一期基于Q-learning的文章中,自我搭建的环境中step有5个返回值,无法在该DQN的模型中运行,后面如果想用5个返回值,该代码要重写。

import gym
from stable_baselines3 import DQN


# Create environment
env = gym.make("MountainCar-v0")

model = DQN(
    "MlpPolicy",
    env,
    verbose=1,
    learning_rate=5e-4)

# Train the agent and display a progress bar
model.learn(
    total_timesteps=int(1.5e6),
    progress_bar=True)

# Save the agent
model.save("DQN_MountainCar")

 

3.测试模型

import gym
from stable_baselines3 import DQN
from stable_baselines3.common.evaluation import evaluate_policy

def test_something(): #帮助pytest找到测试案例,无该两行代码,程序无法运行
    assert 1 == 1
# Create environment
env = gym.make("MountainCar-v0", render_mode="human") # render_mode="human"在显示屏幕上显示

# load model
model = DQN.load("DQN_MountainCar", env=env)

mean_reward, std_reward = evaluate_policy(
    model,
    model.get_env(),

    deterministic=True,
    render=True,
    n_eval_episodes=10)
print(mean_reward)

 

4.运行过程中遇到的问题

 

1.直接用 pip install stable-baselines3[extra],安装失败,换成以下命令安装

pip install stable-baselines3[extra] -i https://pypi.tuna.tsinghua.edu.cn/simple some-package 

 

2.源代码为了测试方便,测试过程并不显示,代码为

 env = gym.make("MountainCar-v0")

笔者希望可以显示测试过程,代码改为

env = gym.make("MountainCar-v0", render_mode="human")

 

往期文献

1.强化学习案例复现(1)--- MountainCar基于Q-learning_笑傲江湖2023的博客-CSDN博客

 

参考文献

1.Stable-Baselines3安装_下载stable-baselines3==1.2.0_cclbl的博客-CSDN博客

2.

基于强化学习的gym Mountain Car稳定控制_mountaincar环境-CSDN博客

 

你可能感兴趣的:(python,开发语言)