The Four Elements of Reinforcement Learning

Reinforcement learning has four elements: a policy, a reward signal, a value function, and a model of the environment.

1. Policy

A policy defines the learning agent's way of behaving at a given time: it determines which action the agent takes in the situation (state) it currently faces.
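For a small problem with discrete states and actions, a policy can be sketched as a lookup table from each state to a probability distribution over actions. The states `s0`/`s1`, the actions `left`/`right`, and the helper names below are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical stochastic policy: state -> {action: probability}.
policy = {
    "s0": {"left": 0.2, "right": 0.8},
    "s1": {"left": 0.5, "right": 0.5},
}

def select_action(policy, state, rng=random):
    """Sample an action in `state` according to the policy's probabilities."""
    actions = list(policy[state].keys())
    weights = list(policy[state].values())
    return rng.choices(actions, weights=weights, k=1)[0]

def greedy_action(policy, state):
    """Return the most probable action in `state` (a deterministic view of the policy)."""
    return max(policy[state], key=policy[state].get)

print(greedy_action(policy, "s0"))  # right
print(select_action(policy, "s1"))  # "left" or "right", 50/50
```

A deterministic policy is the special case where one action per state has probability 1.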

2. Reward Signal

A reward signal defines the goal in a reinforcement learning problem. The agent's objective is to maximize the total reward it receives.
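One common way to make "total reward" precise, assumed here rather than stated in the text above, is the discounted return G = r1 + γ·r2 + γ²·r3 + ..., where the discount factor γ lies in [0, 1] and weights later rewards less. A minimal sketch (the function name is hypothetical):

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards r1, r2, ... weighted by gamma**0, gamma**1, gamma**2, ..."""
    g = 0.0
    # Iterate backwards using the recursion G_t = r_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1, 1, 1], gamma=0.5))  # 1.75
```

With gamma=1.0 this reduces to the plain (undiscounted) sum of rewards.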

3. Value Function

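A value function specifies what is good in the long run. For example, an agent may choose an action with a low immediate reward because it leads to situations that yield a larger total reward over time. A value function can be viewed as an estimate of future reward, and estimating it is a core part of most reinforcement learning algorithms.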
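To illustrate "an estimate of future reward", here is a sketch of first-visit Monte Carlo value estimation, one standard technique among several: V(s) is the average discounted return observed after first visiting s. The episode format, the states `A`/`B`/`C`, and the reward numbers are hypothetical. State `A` gives an immediate reward of 0, lower than `C`'s 1, yet its estimated value is much higher because it is followed by a large reward.

```python
from collections import defaultdict

def mc_value_estimate(episodes, gamma=0.9):
    """First-visit Monte Carlo estimate of V(s).

    Each episode is a list of (state, reward) pairs, where `reward` is the
    reward received when leaving that state.
    """
    returns = defaultdict(list)
    for episode in episodes:
        g = 0.0
        first_return = {}
        # Walk backwards accumulating the return; later overwrites keep the
        # return from the earliest (first) visit to each state.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            first_return[state] = g
        for state, g_s in first_return.items():
            returns[state].append(g_s)
    # Value estimate = mean of the sampled returns for each state.
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

# Hypothetical episodes: A (reward 0) -> B (reward 10); C (reward 1) alone.
episodes = [[("A", 0), ("B", 10)], [("C", 1)]]
V = mc_value_estimate(episodes, gamma=0.9)
print(V["A"], V["C"])  # 9.0 1.0
```

With more sampled episodes the averages converge toward the true expected returns under the policy that generated them.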

4. Model of the environment

A model of the environment describes how the environment changes in response to the agent's actions: it is something that mimics the behavior of the environment, or more generally, that allows inferences to be made about how the environment will behave.
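As a minimal sketch, assuming a small deterministic environment, a model can be a table mapping (state, action) to the predicted (next state, reward). The agent can then query it to "imagine" outcomes without acting, for instance for a one-step lookahead that scores each action by predicted reward plus the discounted value of the predicted next state. All states, actions, values, and function names below are hypothetical.

```python
# Hypothetical deterministic model: (state, action) -> (next_state, reward).
model = {
    ("s0", "left"): ("s1", 0.0),
    ("s0", "right"): ("s2", 1.0),
    ("s1", "left"): ("goal", 10.0),
    ("s2", "left"): ("s0", 0.0),
}

def predict(model, state, action):
    """Infer what the environment would do, without actually acting in it."""
    return model[(state, action)]

def plan_one_step(model, state, values, gamma=0.9):
    """Pick the action maximising predicted reward + gamma * V(next_state)."""
    best_action, best_score = None, float("-inf")
    for (s, a), (next_state, reward) in model.items():
        if s != state:
            continue
        score = reward + gamma * values.get(next_state, 0.0)
        if score > best_score:
            best_action, best_score = a, score
    return best_action

# Hypothetical value estimates for the successor states.
values = {"s1": 10.0, "s2": 0.0, "goal": 0.0}
print(plan_one_step(model, "s0", values))  # left
```

Here "left" wins (0 + 0.9 * 10 = 9 versus 1 + 0.9 * 0 = 1) even though "right" has the higher immediate reward. A stochastic environment would need a model that returns a distribution over next states instead of a single outcome.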
