Classic Q-Learning Explained

This is a repost of a classic article:

Diving deeper into Reinforcement Learning with Q-Learning

1. Q-learning

Step 1: We initialize our Q-table

[Figure: The initialized Q-table]
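Initializing the Q-table just means creating a states × actions grid of zeros. A minimal sketch in Python with NumPy; the state and action counts here are illustrative assumptions, not the article's exact grid:

```python
import numpy as np

# Hypothetical dimensions: 6 grid cells (states), 4 moves (actions).
n_states, n_actions = 6, 4

# Every Q-value starts at 0: the agent knows nothing yet.
q_table = np.zeros((n_states, n_actions))
```

Starting from all zeros is why the agent must explore at first: every action looks equally (un)promising until rewards propagate into the table.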

Step 2: Choose an action
From the starting position, you can choose between going right or down. Because epsilon is still high (we don't know anything about the environment yet), we choose randomly. For example… move right.

[Figure: We move at random (for instance, right)]
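The choice just described is the epsilon-greedy strategy: with probability epsilon we explore (pick a random action), otherwise we exploit the best-known action from the Q-table. A minimal sketch; the function and parameter names are my own:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values: the row of the Q-table for the current state.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                 # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

Typically epsilon is decayed over time, so the agent explores early (when the Q-table is all zeros) and exploits more as its estimates improve.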

We found a piece of cheese (+1), and we can now update the Q-value of being at start and going right. We do this by using the Bellman equation.

Steps 4–5: Update the Q-function


  • First, we calculate the change in Q-value, ΔQ(start, right).
  • Then we add ΔQ(start, right), multiplied by a learning rate, to the initial Q-value.

Think of the learning rate as a measure of how quickly the agent abandons its former value for the new one. If the learning rate is 1, the new estimate simply replaces the old Q-value.
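Written out, the two bullet points above are the standard Q-learning update, where α is the learning rate and γ the discount factor:

```latex
\Delta Q(s, a) = R(s, a) + \gamma \max_{a'} Q(s', a') - Q(s, a)

Q^{\text{new}}(s, a) = Q(s, a) + \alpha \, \Delta Q(s, a)
```

For our example: Q(start, right) moves toward the reward of +1 we just received, at a speed controlled by α.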

[Figure: The updated Q-table]

Good! We’ve just updated our first Q-value. Now we need to repeat this again and again until learning stops.
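The full loop described above can be sketched end to end. Everything here (a 1-D corridor environment with cheese at the far right, the reward values, and the hyperparameters) is an illustrative assumption, not the article's exact maze:

```python
import random
import numpy as np

# Hypothetical 1-D corridor: states 0..4, cheese (+1, terminal) at state 4.
N_STATES = 5
ACTIONS = (-1, +1)                      # move left / move right
alpha, gamma, epsilon = 0.1, 0.9, 0.3   # learning rate, discount, exploration

q_table = np.zeros((N_STATES, len(ACTIONS)))
rng = random.Random(0)

for episode in range(500):
    state = 0
    while state != N_STATES - 1:        # episode ends when we reach the cheese
        # Step 2: epsilon-greedy action choice.
        if rng.random() < epsilon:
            action = rng.randrange(len(ACTIONS))
        else:
            action = int(np.argmax(q_table[state]))

        # Step 3: take the action, observe reward and next state.
        next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0

        # Steps 4-5: Bellman update, Q <- Q + alpha * (R + gamma * max Q' - Q).
        delta = reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
        q_table[state, action] += alpha * delta

        state = next_state
```

After enough episodes, the greedy action from the start state should be "move right", since the discounted reward propagates back from the cheese one update at a time.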

2. Explaining the Bellman Equation

The Bellman Equation in Markov Decision Processes – Zhihu

