Reinforcement Learning Notes: model-free vs. model-based; on-policy vs. off-policy
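Model-free: does not require knowing the transition probabilities between states; it relies only on real-time interaction between the agent and the environment. It does not necessarily use samples generated by the current policy. A model-free method attempts to learn the optimal policy in ONE step; Q-learning, for example, learns the optimal policy in the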