Reinforcement Learning, Shiyu Zhao (1): Basic Concepts [state, action, state transition, policy, reward, return, trajectories, episode, Markov]
1.1 A grid world example

Consider an example as shown in Figure 1.2, where a robot moves in a grid world. The robot, called agent, can move across adjacent cells in the grid. At each time step, it can only occupy a single cell. The white cells are