[GAE] Reading and Translation Notes on "High-Dimensional Continuous Control Using Generalized Advantage Estimation"
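High-Dimensional Continuous Control Using Generalized Advantage Estimation

Abstract

Policy gradient methods are an attractive approach in reinforcement learning because they directly optimize the cumulative reward and can be used straightforwardly with nonlinear function approximators such as neural networks. Their two main challenges are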