Reinforcement Learning Study Notes | Q-learning Algorithm

Q-learning algorithm: learning the Action Value Function

The Action Value Function takes two inputs, a state and an action, and returns the expected future reward of taking that action in that state.
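The note states this informally. The usual formal definition is below. It is an assumption here, since the note gives no formula. Q^π(s, a) is the expected discounted return from taking action a in state s and then following policy π, with discount factor γ in [0, 1):

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1} \,\middle|\, s_t = s,\; a_t = a \right]
```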

Before any exploration, every entry in the Q-table holds the same arbitrary fixed value (usually 0). As we explore the environment, the Q-table becomes a better and better approximation, because Q(s, a) is updated iteratively using the Bellman equation.
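The note does not write out the update rule. The standard Q-learning form is assumed here. It uses a learning rate α and a discount factor γ. Here r is the reward received and s' is the state reached after taking a in s:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

As a worked example, start from an all-zero table with α = 0.1, γ = 0.9 and a reward r = 1. The first update gives Q(s, a) = 0 + 0.1 × (1 + 0.9 × 0 − 0) = 0.1. Repeated updates pull each entry toward the reward plus the discounted value of the best next action.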

Step 1: Initialize Q-values

We build a Q-table with m columns (m = number of actions) and n rows (n = number of states), and initialize every value to 0.
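Here is a minimal sketch of Step 1 in plain Python. It assumes hypothetical sizes of 5 states and 2 actions. Rows are states and columns are actions, matching the layout described above.

```python
# Hypothetical sizes: n states (rows) and m actions (columns)
n_states = 5
n_actions = 2

# q_table[s][a] is the current estimate for action a in state s; all entries start at 0
q_table = [[0.0] * n_actions for _ in range(n_states)]

print(len(q_table), len(q_table[0]))  # 5 2
```

The list comprehension matters. [[0.0] * m] * n would make every row the same list object, so updating one row would silently change all of them.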

Step 2: For life (or until learning is stopped)

Steps 3 to 5 are repeated until the iteration limit is reached or learning is stopped manually.
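The loop in Step 2 can be sketched end to end on a toy problem. Everything environment-specific here is a hypothetical stand-in: a 5-state corridor where action 0 moves left and action 1 moves right. Reaching state 4 ends the episode with reward 1. The hyperparameters (alpha, gamma, the epsilon schedule, max_steps) are illustrative assumptions. Action selection uses epsilon-greedy, one common choice for Step 3, and the update is the standard Q-learning rule.

```python
import random

# Hypothetical toy environment: a corridor of states 0..4.
# Action 0 moves left (state 0 is a wall), action 1 moves right;
# reaching state 4 ends the episode with reward 1, every other step gives 0.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def train(episodes=200, alpha=0.1, gamma=0.9, epsilon=1.0,
          eps_decay=0.99, eps_min=0.05, max_steps=100, seed=0):
    rng = random.Random(seed)
    # Step 1: one row per state, one column per action, all zeros
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    # Step 2: repeat for a fixed number of episodes (stand-in for "for life")
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Step 3: epsilon-greedy choice, ties broken at random
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(N_ACTIONS) if q[state][a] == best])
            # Step 4: take the action, observe the new state and reward
            next_state, reward, done = step(state, action)
            # Step 5: Q-learning update (no future value after a terminal state)
            future = 0.0 if done else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
            if done:
                break
        # Explore less as the table improves
        epsilon = max(eps_min, epsilon * eps_decay)
    return q


q = train()
# Greedy action per non-terminal state (0 = left, 1 = right)
greedy = [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(GOAL)]
print(greedy)
```

Printing greedy shows which action the learned table prefers in each non-terminal state. Next to the goal, moving right (action 1) wins, since that is the only transition that collects the reward.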

Step 3: Choose an action


At the start all Q-values are 0, so we choose actions by balancing exploration and exploitation.
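The note does not name a specific strategy. Epsilon-greedy is one common way to make this trade-off, and the sketch below assumes it. With probability epsilon the agent explores by picking a random action; otherwise it exploits the best-known action. The function name choose_action and its parameters are illustrative only.

```python
import random


def choose_action(q_table, state, epsilon, rng=random):
    """Epsilon-greedy action selection for one state of a list-of-lists Q-table."""
    n_actions = len(q_table[state])
    if rng.random() < epsilon:
        # Explore: any action, chosen uniformly at random
        return rng.randrange(n_actions)
    # Exploit: the highest-valued action; ties are broken at random
    best = max(q_table[state])
    candidates = [a for a in range(n_actions) if q_table[state][a] == best]
    return rng.choice(candidates)


# Usage: with epsilon = 1.0 the agent always explores
q_table = [[0.0, 0.0], [0.0, 0.0]]
action = choose_action(q_table, state=0, epsilon=1.0)
```

Ties are broken at random because every Q-value starts at 0, and a plain max would then always return action 0. A common (assumed) schedule starts epsilon near 1 and shrinks it after each episode, for example epsilon = max(0.05, epsilon * 0.99). The agent then explores heavily early on and exploits more as the Q-table improves.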




Steps 4–5: Evaluate!

Take the action a chosen in the previous step, then observe the outcome: the new state s' and the reward r. Use the Bellman equation to update the Q-table.
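Steps 4 and 5 reduce to one in-place update per transition. The sketch assumes the standard Q-learning rule with learning rate alpha and discount gamma; the default values here are arbitrary. A hypothetical done flag marks terminal states, whose future value is taken as 0. The name q_update is illustrative.

```python
def q_update(q_table, state, action, reward, next_state,
             alpha=0.1, gamma=0.9, done=False):
    """Apply one Q-learning update to q_table[state][action] in place and return it."""
    # Value of the best action in the next state; a terminal state has no future
    future = 0.0 if done else max(q_table[next_state])
    # Bellman target: immediate reward plus discounted best future value
    target = reward + gamma * future
    # Move the current estimate a fraction alpha of the way toward the target
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


# Usage: first update on an all-zero table
q_table = [[0.0, 0.0], [0.0, 0.0]]
new_value = q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
print(new_value)  # 0.1
```

Because alpha is less than 1, each update moves the estimate only part of the way toward the target. This averages out noise from individual transitions.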
