Highway Decision Control with DQN Reinforcement Learning

Dependencies

gym == 0.21.0
stable-baselines3 == 1.6.2
highway-env == 1.5
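
Because the gym API changed in later releases, it can help to confirm that the installed packages match the pins above before running anything. The following is a minimal sketch using only the standard library; `PINNED` and `check_versions` are hypothetical names introduced for illustration, not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the dependency list above.
PINNED = {"gym": "0.21.0", "stable-baselines3": "1.6.2", "highway-env": "1.5"}


def check_versions(pinned):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # The package is not installed in this interpreter.
            installed = None
        report[name] = (installed, installed == wanted)
    return report


for name, (installed, ok) in check_versions(PINNED).items():
    status = "ok" if ok else "mismatch or missing"
    print(f"{name}: pinned {PINNED[name]}, installed {installed} ({status})")
```

The comparison is an exact string match, so a differently formatted version string for the same release would be reported as a mismatch.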

Environment test

Introduction to the highway-env environment: highway-env

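import gym
import highway_env  # importing registers the highway-* environments with gym


# Create environment
env = gym.make('highway-fast-v0')

episodes = 10
total_reward = 0.0
for ep in range(episodes):
    obs = env.reset()
    done = False
    while not done:
        # Sample a random action from the action space
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        env.render()
        total_reward += reward
# Average return per episode under the random policy
print(total_reward / episodes)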

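Here the controlled vehicle picks its actions at random. The test video is highway_fast_test; as it shows, the average reward per episode for randomly chosen actions over the 10 episodes is 9.800666098251863.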
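
The loop above reports only a single average. A variant that keeps each episode's return makes the spread visible as well, which matters with only 10 episodes. The sketch below assumes the 4-tuple `step` API of gym 0.21; `random_policy_returns` and `ToyEnv` are hypothetical names, and `ToyEnv` is a stand-in used only so the snippet runs without highway-env installed.

```python
import random
import statistics


def random_policy_returns(env, episodes=10):
    """Roll out a uniformly random policy and return one total reward per episode."""
    returns = []
    for _ in range(episodes):
        env.reset()
        done = False
        total = 0.0
        while not done:
            action = env.action_space.sample()
            # gym 0.21 style: step returns (obs, reward, done, info).
            _, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    return returns


class _ToyActionSpace:
    """Hypothetical action space with 5 discrete actions."""

    def sample(self):
        return random.randrange(5)


class ToyEnv:
    """Hypothetical stand-in: reward 1.0 per step, episodes end after 3 steps."""

    action_space = _ToyActionSpace()

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return 0, 1.0, self.t >= 3, {}


returns = random_policy_returns(ToyEnv(), episodes=4)
print(statistics.mean(returns), statistics.pstdev(returns))
```

To apply this to the real task, pass `gym.make('highway-fast-v0')` in place of `ToyEnv()`.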

DQN decision control

Model training

The DQN algorithm is used for the controlled vehicle's decision making. The training code is as follows:

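import gym
import highway_env  # importing registers the highway-* environments with gym
from stable_baselines3 import DQN


# Create environment
env = gym.make("highway-fast-v0")

model = DQN('MlpPolicy',
            env,
            policy_kwargs=dict(net_arch=[256, 256]),  # two hidden layers of 256 units
            learning_rate=5e-4,
            buffer_size=15000,          # replay buffer capacity (transitions)
            learning_starts=200,        # steps collected before learning begins
            batch_size=32,
            gamma=0.8,                  # discount factor
            train_freq=1,               # train after every environment step
            gradient_steps=1,
            target_update_interval=50,  # steps between target network updates
            verbose=1,
            tensorboard_log="./logs")

# Train for 20,000 timesteps, then save the model to disk
model.learn(int(2e4))
model.save("highway_dqn_model")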
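
To make the roles of `gamma` and `target_update_interval` more concrete, here is a simplified sketch of the one-step target that DQN regresses its Q-values toward. It is an illustration of the standard Q-learning target in plain Python, not the stable-baselines3 implementation; `td_target` and `should_sync_target` are hypothetical helpers.

```python
GAMMA = 0.8  # discount factor, matching gamma=0.8 in the training call


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """One-step Q-learning target: r + gamma * max over a' of Q_target(s', a').

    `next_q_values` holds the target network's Q-values for the next state,
    one per action. Terminal transitions do not bootstrap from the next state.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def should_sync_target(step, interval=50):
    """Simplified rule: copy online weights to the target network every `interval` steps."""
    return step % interval == 0


# Non-terminal transition: bootstraps from the best next-state Q-value.
print(td_target(1.0, [0.2, 0.5, 0.1], done=False))
# Terminal transition: the target is just the immediate reward.
print(td_target(1.0, [0.2, 0.5, 0.1], done=True))
```

With gamma = 0.8 the agent is fairly short-sighted: a reward 10 steps ahead is weighted by 0.8^10, roughly 0.11.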

Model testing

import gym
import highway_env
from stable_baselines3 import DQN
from stable_baselines3.common.evaluation import evaluate_policy


# Create environment
env = gym.make("highway-fast-v0")

# Load the model saved by the training script
model = DQN.load("highway_dqn_model", env=env)

mean_reward, std_reward = evaluate_policy(
    model,
    model.get_env(),
    deterministic=True,
    render=True,
    n_eval_episodes=10)

print(mean_reward)

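The evaluation video is highway_fast_valid; after training, the mean reward over the 10 evaluation episodes is 18.4022157.

Postscript

stable-baselines3: documentation
gym: documentation
highway-env: documentation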
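
For a quick comparison of the two reported figures, the arithmetic can be sketched as below. Both means come from only 10 episodes each, and `std_reward` is computed by the test script but not printed, so the comparison should be read as a rough indication rather than a precise measurement.

```python
random_mean = 9.800666098251863  # random policy, 10 episodes (reported above)
dqn_mean = 18.4022157            # trained DQN, 10 evaluation episodes

gain = dqn_mean - random_mean    # absolute improvement in mean reward
ratio = dqn_mean / random_mean   # relative improvement
print(f"absolute gain: {gain:.2f}, ratio: {ratio:.2f}x")
```

Printing `std_reward` alongside `mean_reward` in the test script would show how much the trained policy's return varies between episodes.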
