Machine Learning 11: MDP variations

Lesson 18.

1. state-action rewards
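
With state-action rewards, the reward depends on the action as well as the state, so the reward term moves inside the maximization in the Bellman equation. A sketch in assumed notation (the symbols R(s, a), P_{sa}, and the discount factor γ are not given in these notes):

```latex
V^*(s) = \max_{a} \Big[ R(s, a) + \gamma \sum_{s'} P_{sa}(s') \, V^*(s') \Big],
\qquad
\pi^*(s) = \arg\max_{a} \Big[ R(s, a) + \gamma \sum_{s'} P_{sa}(s') \, V^*(s') \Big]
```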

2. finite horizon MDP

DP algorithm:
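
The notes do not spell the algorithm out, so the following is only a minimal sketch of backward dynamic programming for a finite horizon MDP. It assumes a tabular MDP with time-invariant transitions and rewards, no discounting, and a reward collected at the final step T as well. The function name and the array layout (`P[a, s, s']`, `R[s, a]`) are hypothetical choices made for this example.

```python
import numpy as np

def finite_horizon_dp(P, R, T):
    """Backward DP for a tabular finite-horizon MDP (illustrative sketch).

    P: array of shape (A, S, S), P[a, s, s2] = Prob(s2 | s, a)  (assumed layout)
    R: array of shape (S, A), state-action rewards               (assumed layout)
    T: horizon; time steps run over t = 0, ..., T
    Returns V with shape (T + 1, S) and a time-dependent policy pi of the same shape.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros((T + 1, n_states))
    pi = np.zeros((T + 1, n_states), dtype=int)

    # Last step: nothing follows, so take the best immediate reward.
    V[T] = R.max(axis=1)
    pi[T] = R.argmax(axis=1)

    # Sweep backward in time: V_t(s) = max_a [ R(s, a) + sum_s2 P[a, s, s2] * V_{t+1}(s2) ]
    for t in range(T - 1, -1, -1):
        Q = R + (P @ V[t + 1]).T  # shape (S, A)
        V[t] = Q.max(axis=1)
        pi[t] = Q.argmax(axis=1)
    return V, pi

# Toy example: action 0 stays in place, action 1 switches state;
# being in state 1 pays reward 1 regardless of the action.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])
V, pi = finite_horizon_dp(P, R, T=2)
print(V[0], pi[0])
```

Unlike the infinite horizon case, the optimal policy here depends on the time step, which is why `pi` is indexed by `t` as well as by state.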

LQR: linear quadratic regulation
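
A minimal sketch of the finite horizon LQR solution by backward Riccati recursion, written in the common cost-minimization form rather than the reward form. Everything here is an assumption for illustration: dynamics s_{t+1} = A s_t + B a_t, stage cost s^T Q s + a^T R a, terminal cost s^T Q s, and the names `lqr_backward`, `A`, `B`, `Q`, `R` are not taken from the notes.

```python
import numpy as np

def lqr_backward(A, B, Q, R, T):
    """Finite-horizon discrete-time LQR via backward Riccati recursion (sketch).

    Assumed model: s_{t+1} = A s_t + B a_t,
    cost = sum_{t=0}^{T-1} (s_t' Q s_t + a_t' R a_t) + s_T' Q s_T.
    Returns the gains K_0..K_{T-1} (optimal action a_t = -K_t s_t)
    and the cost-to-go matrix P_0.
    """
    P = Q.copy()          # P_T = Q (terminal cost)
    gains = [None] * T
    for t in range(T - 1, -1, -1):
        # K_t = (R + B' P_{t+1} B)^{-1} B' P_{t+1} A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        gains[t] = K
        # P_t = Q + A' P_{t+1} (A - B K_t)
        P = Q + A.T @ P @ (A - B @ K)
    return gains, P

# Scalar toy problem with a one-step horizon.
A = np.array([[1.0]])
B = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])
gains, P0 = lqr_backward(A, B, Q, R, T=1)
print(gains[0], P0)
```

The value function stays quadratic in the state at every step, so the recursion only has to track one matrix per time step, and the optimal action is linear in the state.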
