A Summary of Key Reinforcement Learning Literature

Theory

Title | Citation | Notes
Reinforcement learning: An introduction | Sutton R S, Barto A G. Reinforcement learning: An introduction[M]. MIT press, 2018. | Introductory textbook
Reinforcement Learning | Wiering M A, Van Otterlo M. Reinforcement learning[J]. Adaptation, learning, and optimization, 2012, 12(3): 729. | Introductory textbook
Q-learning | Watkins C J C H, Dayan P. Q-learning[J]. Machine learning, 1992, 8(3): 279-292. | Convergence of the Q-learning algorithm
Convergence of Q-learning: A simple proof | Melo F S. Convergence of Q-learning: A simple proof[J]. Institute Of Systems and Robotics, Tech. Rep, 2001: 1-4. | Convergence of the Q-learning algorithm
Human-level control through deep reinforcement learning | Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. | Introduces the DQN algorithm
Policy gradient methods for reinforcement learning with function approximation | Sutton R S, McAllester D A, Singh S P, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Advances in neural information processing systems. 2000: 1057-1063. | Introduces the policy gradient algorithm
Deterministic Policy Gradient Algorithms | Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms[C]//International conference on machine learning. PMLR, 2014: 387-395. | Introduces the DPG algorithm
Continuous control with deep reinforcement learning | Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015. | Introduces the DDPG algorithm
Independent reinforcement learners in cooperative markov games: a survey regarding coordination problems | Matignon L, Laurent G J, Le Fort-Piat N. Independent reinforcement learners in cooperative markov games: a survey regarding coordination problems[J]. The Knowledge Engineering Review, 2012, 27(1): 1-31. | Surveys the difficulties of multi-agent RL compared with single-agent RL
Multi-agent actor-critic for mixed cooperative-competitive environments | Lowe R, Wu Y I, Tamar A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[J]. Advances in neural information processing systems, 2017, 30. | Introduces the MADDPG algorithm
Trust region policy optimization | Schulman J, Levine S, Abbeel P, et al. Trust region policy optimization[C]//International conference on machine learning. PMLR, 2015: 1889-1897. | Introduces the TRPO algorithm
Proximal policy optimization algorithms | Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017. | Introduces the PPO algorithm

Applications

Title | Citation | Notes
3D UAV Trajectory Design and Frequency Band Allocation for Energy-Efficient and Fair Communication: A Deep Reinforcement Learning Approach | R. Ding, F. Gao and X. S. Shen, “3D UAV Trajectory Design and Frequency Band Allocation for Energy-Efficient and Fair Communication: A Deep Reinforcement Learning Approach,” in IEEE Transactions on Wireless Communications, vol. 19, no. 12, pp. 7796-7809, Dec. 2020, doi: 10.1109/TWC.2020.3016024. | Applies the DDPG algorithm to UAV communication resource allocation and trajectory planning
