[Chapter 6] Reinforcement Learning (4) Policy Search
In the previous sections, we tried to learn the utility function or, more commonly, the action-value function Q(s, a), and then greedily selected the action with the highest Q-value:

$$\pi(s) = \arg\max_{a} Q(s, a)$$

This means that once we have learnt the Q-function well, we can get an opti