On- and off-policy reinforcement learning


Reinforcement learning, on-policy vs. off-policy: Q-learning (off-policy) or SARSA (on-policy)

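One learning step:

  • Start: following a policy such as ε-greedy, in state S choose action A.
  • Take A; get the reward R and the next state S' from the environment.
  • On-policy or off-policy?
      - Off-policy (Q-learning): Q_target = R + gamma * max_a Q(S', a)
      - On-policy (SARSA): Q_target = R + gamma * Q(S', A'), where A' is the next action chosen in S' by the same policy (e.g. ε-greedy)
  • Update: Q_table[S, A] += alpha * (Q_target - Q_table[S, A])
  • End of the step (in practice, continue from S' until the episode terminates).

The two methods differ only in the target. Q-learning bootstraps from the greedy action in S', whatever action the behavior policy will actually take next. SARSA bootstraps from the action the policy actually takes.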
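The step above can be sketched in Python with a tabular Q_table. This is a minimal, illustrative sketch under stated assumptions. The corridor environment (`corridor_step`), the action set `ACTIONS`, all function names, and the values of alpha, gamma and epsilon are hypothetical choices made only for this example. Both methods share the same ε-greedy behavior policy and the same update rule. Only the Q_target differs: `max` for Q-learning, the actually chosen A' for SARSA.

```python
import random
from collections import defaultdict

ACTIONS = [-1, 1]  # hypothetical toy actions: step left / step right


def epsilon_greedy(Q, state, epsilon):
    """Behavior policy: random action with prob. epsilon, else greedy (ties broken randomly)."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])


def q_learning_update(Q, s, a, r, s_next, done, alpha, gamma):
    """Off-policy: Q_target = R + gamma * max_a Q(S', a), independent of the next action taken."""
    next_value = 0.0 if done else max(Q[(s_next, a2)] for a2 in ACTIONS)
    q_target = r + gamma * next_value
    Q[(s, a)] += alpha * (q_target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha, gamma):
    """On-policy: Q_target = R + gamma * Q(S', A'), where A' is the action the policy actually picked."""
    next_value = 0.0 if done else Q[(s_next, a_next)]
    q_target = r + gamma * next_value
    Q[(s, a)] += alpha * (q_target - Q[(s, a)])


def corridor_step(state, action, n_states=5):
    """Hypothetical environment: a 1-D corridor; reaching the right end gives R = 1 and ends the episode."""
    s_next = min(max(state + action, 0), n_states - 1)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done


def train(method, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    Q = defaultdict(float)  # Q_table: every (state, action) entry starts at 0
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, epsilon)
        done = False
        while not done:
            s_next, r, done = corridor_step(s, a)
            if method == "q_learning":
                q_learning_update(Q, s, a, r, s_next, done, alpha, gamma)
                # Behavior action for S' is picked with the freshly updated table.
                a_next = epsilon_greedy(Q, s_next, epsilon)
            else:
                # SARSA needs A' before the update, because A' appears in Q_target.
                a_next = epsilon_greedy(Q, s_next, epsilon)
                sarsa_update(Q, s, a, r, s_next, a_next, done, alpha, gamma)
            s, a = s_next, a_next
    return Q


if __name__ == "__main__":
    for method in ("q_learning", "sarsa"):
        Q = train(method)
        greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(4)}
        print(method, greedy)
```

In this toy corridor, both methods should learn to prefer stepping right in every non-terminal state. Their Q values differ, though. Q-learning estimates the values of the greedy policy. SARSA estimates the values of the ε-greedy policy it actually follows, so its values come out slightly lower. Random tie-breaking in `epsilon_greedy` is a deliberate choice. Without it, the all-zero initial table would always pick the first action and make early episodes very long.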
