Study Materials for Reinforcement Learning Theory

Recommended Books

machine learning and learning theory books

1. Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of machine learning. MIT Press, 2018.
2. Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.

reinforcement learning books

1. Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT Press, 2018.
2. Dimitri P Bertsekas and John N Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996. (very important)

approximate dynamic programming

1. Rémi Munos. Introduction to Reinforcement Learning and multi-armed bandits. NETADIS Summer School, 2013.

Papers

  • Richard Bellman. Dynamic Programming. Princeton University Press, 1957.
  • Dimitri P Bertsekas and John N Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
  • Ronald A Howard. Dynamic Programming and Markov Processes. MIT Press, 1960.
  • Alessandro Lazaric, Mohammad Ghavamzadeh, and Rémi Munos. “Finite-sample analysis of least-squares policy iteration”. In: The Journal of Machine Learning Research 13 (2012), pp. 3041–3074.
  • Odalric-Ambrym Maillard et al. “Finite-sample analysis of Bellman residual minimization”. In: Asian Conference on Machine Learning (ACML). 2010, pp. 299–314.
  • Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of machine learning. MIT Press, 2018.
  • Rémi Munos and Csaba Szepesvári. “Finite-time bounds for fitted value iteration”. In: Journal of Machine Learning Research 9 (2008), pp. 815–857.
  • Rémi Munos. Introduction to Reinforcement Learning and multi-armed bandits. NETADIS Summer School, 2013.
  • Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.
  • Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT Press, 2018.
  • Richard Sutton and Andrew Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
  • Richard S Sutton et al. “Policy gradient methods for reinforcement learning with function approximation”. In: Advances in Neural Information Processing Systems (NeurIPS). 1999, pp. 1057–1063.
  • Leslie G Valiant. “A theory of the learnable”. In: Communications of the ACM 27.11 (1984), pp. 1134–1142.
  • Christopher John Cornish Hellaby Watkins. “Learning From Delayed Rewards”. PhD Thesis. University of Cambridge, 1989.
  • Ronald J. Williams and Leemon C. Baird III. Tight performance bounds on greedy policies based on imperfect value functions. Tech. rep. NU-CCS-93-14, College of Computer Science, Northeastern University. 1993.
  • Ronald J Williams. “Simple statistical gradient-following algorithms for connectionist reinforcement learning”. In: Machine learning 8.3-4 (1992).
  • Shuang Wu and Jun Wang. Decision making and AI: a white paper. 2020.
  • Pan Xu and Quanquan Gu. “A finite-time analysis of Q-learning with neural network function approximation”. In: arXiv preprint arXiv:1912.04511 (2019).
  • Zhuoran Yang, Yuchen Xie, and Zhaoran Wang. “A theoretical analysis of deep Q-learning”. In: arXiv preprint arXiv:1901.00137 (2019).
