Deep RL Bootcamp Lecture 5: Natural Policy Gradients, TRPO, PPO
https://statweb.stanford.edu/~owen/mc/Ch-var-is.pdf
https://zhuanlan.zhihu.com/p/29934206
blue curve is the lower bounded one
conjugate gradient to solve the optimization problem.
Fisher information matrix, natural polic
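
The note on conjugate gradient can be made concrete with a small sketch. It assumes the note refers to the usual TRPO-style step of approximately solving the linear system F x = g, where F is the Fisher information matrix and g is the policy gradient, using only matrix-vector products with F. The names `conjugate_gradient`, `fvp`, `F`, and `g` are hypothetical and chosen for illustration. The 2x2 matrix is a toy stand-in, not a real Fisher matrix.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(
    fvp: Callable[[Vector], Vector],  # hypothetical: returns F @ v for a vector v
    g: Vector,
    iters: int = 10,
    tol: float = 1e-10,
) -> Vector:
    """Approximately solve F x = g for a symmetric positive-definite F.

    Only products F @ v are needed, so F never has to be formed or inverted.
    """
    x = [0.0] * len(g)
    r = list(g)  # residual g - F x, which equals g at x = 0
    p = list(g)  # current search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:  # residual is small enough, stop early
            break
        Fp = fvp(p)
        alpha = rr / dot(p, Fp)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * fi for ri, fi in zip(r, Fp)]
        rr_new = dot(r, r)
        # New direction: residual plus a multiple of the previous direction.
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


# Toy stand-in for a Fisher matrix (assumption: symmetric positive definite).
F = [[4.0, 1.0], [1.0, 3.0]]
g = [1.0, 2.0]


def fvp(v: Vector) -> Vector:
    """Matrix-vector product with the toy matrix F."""
    return [dot(row, v) for row in F]


x = conjugate_gradient(fvp, g)
```

In this sketch `x` plays the role of the natural gradient direction F^{-1} g. In practice the product F v is typically computed from the policy's KL divergence by automatic differentiation rather than from an explicit matrix.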