LEARNING GOAL-CONDITIONED VALUE FUNCTIONS WITH ONE-STEP PATH REWARDS RATHER THAN GOAL-REWARDS
ABSTRACT

Multi-goal reinforcement learning (MGRL) addresses tasks where the desired goal state can change for every trial. State-of-the-art algorithms model these problems such that the reward formul