RL | DQN

Contents

  • DQN Framework
  • Application
    • 1.1 CartPole Introduction
    • 1.2 Code
    • 1.3 Result
  • Reference

DQN Framework

(figure: DQN framework diagram)

  1. The agent interacts with the environment, producing the next state, the reward, and a termination flag; these transitions are stored in a replay buffer.

  2. Sample a batch of transitions from the buffer, compute the loss, and optimize the model, as sketched in the code below.
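Together these two steps form the standard DQN training loop. The sketch below is a minimal PyTorch illustration of them, assuming a small Q-network, a FIFO replay buffer, and a separately maintained target network; the class names and hyper-parameters are illustrative assumptions, not the exact code from the linked repository.

```python
# Minimal DQN sketch: store transitions in a replay buffer, then sample a
# batch, compute the TD loss and optimize the Q-network.
# QNet, ReplayBuffer and all hyper-parameters are illustrative assumptions.
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Small MLP mapping a state to one Q-value per action."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, x):
        return self.net(x)


class ReplayBuffer:
    """Fixed-size FIFO buffer of (s, a, r, s', done) transitions."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        batch = random.sample(list(self.buffer), batch_size)
        s, a, r, s_next, done = zip(*batch)
        return (torch.tensor(np.array(s), dtype=torch.float32),
                torch.tensor(a, dtype=torch.int64).unsqueeze(1),
                torch.tensor(r, dtype=torch.float32).unsqueeze(1),
                torch.tensor(np.array(s_next), dtype=torch.float32),
                torch.tensor(done, dtype=torch.float32).unsqueeze(1))

    def __len__(self):
        return len(self.buffer)


def dqn_update(q_net, target_net, optimizer, buffer, batch_size=64, gamma=0.99):
    """One optimization step: sample a batch and minimize the TD error."""
    if len(buffer) < batch_size:
        return
    s, a, r, s_next, done = buffer.sample(batch_size)
    # Q(s, a) for the actions actually taken
    q_sa = q_net(s).gather(1, a)
    # TD target uses the (periodically synced) target network
    with torch.no_grad():
        max_q_next = target_net(s_next).max(dim=1, keepdim=True).values
        target = r + gamma * (1.0 - done) * max_q_next
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```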

Application

1.1 CartPole Introduction

(figure: the CartPole environment)

  • action space: push the cart to the left or to the right
  • state space:
    • position of the cart on the track
    • angle of the pole with the vertical
    • cart velocity
    • rate of change of the angle
  • tips
    • the maximum episode reward is 200 for CartPole-v0 and 500 for CartPole-v1 (see the snippet after this list).
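These spaces and limits can be checked directly by querying the environment. The short snippet below uses the classic OpenAI Gym API (older reset/step signatures); the random-action loop is purely illustrative.

```python
# Quick inspection of the CartPole spaces and episode cap.
# Uses the classic OpenAI Gym API (reset() returns only the observation,
# step() returns 4 values); adapt the calls if you use gymnasium instead.
import gym

env = gym.make("CartPole-v0")
print(env.action_space)            # Discrete(2): 0 = push left, 1 = push right
print(env.observation_space)       # Box(4,): cart position, cart velocity,
                                   #          pole angle, pole angular velocity
print(env.spec.max_episode_steps)  # 200 for CartPole-v0, 500 for CartPole-v1

state = env.reset()
for _ in range(10):
    # Random actions, only to show the interaction loop.
    state, reward, done, info = env.step(env.action_space.sample())
    if done:
        state = env.reset()
env.close()
```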

1.2 Code

  • Github

1.3 Result

  • episode reward (figure)
  • mean reward (figure)

Reference

  • 150行代码实现DQN算法玩CartPole
  • Introduction to Reinforcement Learning
  • [动手学强化学习] 2.DQN解决CartPole-v0问题
  • OpenAI Gym 经典控制环境介绍——CartPole(倒立摆)
