Value-based vs Policy-based Reinforcement Learning

1. Policy-based Reinforcement Learning

Suppose we have a good policy \pi(a|s).

Upon observing the state s_{t},

sample an action at random from the policy: a_{t} \sim \pi(\cdot|s_{t}).
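A minimal sketch of this sampling step, assuming a discrete action space where the policy's output for the current state is given as a list of probabilities. The names `sample_action` and `pi_given_s` are hypothetical and used only for illustration.

```python
import random

def sample_action(action_probs, rng=random):
    """Draw one action index from the distribution pi(.|s_t).

    action_probs: non-negative probabilities over the actions, one per action
    (assumed to sum to 1). rng is any object with a `choices` method, such as
    the `random` module or a seeded `random.Random` instance.
    """
    actions = list(range(len(action_probs)))
    return rng.choices(actions, weights=action_probs, k=1)[0]

# Hypothetical policy output pi(.|s_t) for one state with three actions.
pi_given_s = [0.2, 0.1, 0.7]
a_t = sample_action(pi_given_s)
```

Because the action is sampled rather than chosen deterministically, repeated calls in the same state can return different actions, with action 2 being the most likely here.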

2. Value-based Reinforcement Learning

Suppose we know the optimal action-value function Q^{*}(s,a).

Upon observing the state s_{t},

choose the action that maximizes the value:  a_{t} = argmax_{a}Q^{*}(s_{t},a).
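The greedy choice can be sketched as follows, assuming a discrete action space where the values Q^{*}(s_{t},a) for the current state are already available as a list. The names `greedy_action` and `q_star_given_s` are hypothetical, and the numbers are made up for illustration.

```python
def greedy_action(q_values):
    """Return the action index a that maximizes Q*(s_t, a).

    q_values: one value per action for the current state.
    Ties are broken in favor of the lowest action index.
    """
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical values Q*(s_t, a) for one state with three actions.
q_star_given_s = [1.5, 3.2, 0.4]
a_t = greedy_action(q_star_given_s)  # selects action 1, the largest value
```

Unlike sampling from a policy, this choice is deterministic: the same state always yields the same action.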

3. Summary

The objective of reinforcement learning is to learn a good policy \pi(a|s) and/or the optimal action-value function Q^{*}(s,a).
