LyzBlog

The Most Complete Collection of Deep Reinforcement Learning Resources

Reposted from J.Q.Wang2011, Deep Reinforcement Learning Series: The Most Complete Collection of Deep Reinforcement Learning Resources
Original post: https://blog.csdn.net/gsww404/article/details/103074046

About this work:

This work is a project initiated by the Deep Reinforcement Learning Laboratory (DeepRL-Lab).
The article is mirrored in the GitHub repository:
https://github.com/NeuronDance/DeepRL/tree/master/A-Guide-Resource-For-DeepRL (click to open on GitHub)
Stars, forks, and contributions are welcome.

1. Books
2. Courses
3. Survey-and-Frontier
4. Environment-and-Framework
5. Baselines-and-Benchmarks
6. Algorithms
7. Applications
8. Advanced-Topics
9. Related-Courses
10. Multi-Agents
11. Paper-Resources

#1. Books

Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto (2017), Chinese-Edition, Code
Algorithms for Reinforcement Learning by Csaba Szepesvari (updated 2019)
Deep Reinforcement Learning Hands-On by Maxim Lapan (2018), Code
Reinforcement Learning: State-of-the-Art by Marco Wiering, Martijn van Otterlo
Deep Reinforcement Learning in Action by Alexander Zai and Brandon Brown (in progress)
Grokking Deep Reinforcement Learning by Miguel Morales (in progress)
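Multi-Agent Machine Learning: A Reinforcement Approach [Baidu Cloud link] by Howard M. Schwartz (2017)
Reinforcement Learning at Alibaba: Technical Evolution and Business Innovation (强化学习在阿里的技术演进与业务创新) by Alibaba Group
Hands-On Reinforcement Learning with Python (Baidu Cloud link)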
Reinforcement Learning And Optimal Control by Dimitri P. Bertsekas, 2019

#2. Courses

UCL Course on RL (★★★) by David Silver, Video-en, Video-zh
OpenAI’s Spinning Up in Deep RL by OpenAI(2018)
Udacity-Deep Reinforcement learning, 2019-10-31
Stanford CS-234: Reinforcement Learning (2019), Videos
DeepMind Advanced Deep Learning & Reinforcement Learning (2018), Videos
GeorgiaTech CS-8803 Deep Reinforcement Learning (2018?)
UC Berkeley CS294-112 Deep Reinforcement Learning (2018 Fall), Video-zh
Deep RL Bootcamp by Berkeley CA(2017)
Thomas Simonini’s Deep Reinforcement Learning Course
CS-6101 Deep Reinforcement Learning, NUS SoC, 2018/2019, Semester II
Course on Reinforcement Learning by Alessandro Lazaric, 2018
Learn Deep Reinforcement Learning in 60 days

#3. Survey-and-Frontier

Deep Reinforcement Learning by Yuxi Li
Algorithms for Reinforcement Learning by Csaba Szepesvari (Morgan & Claypool), 2009
Modern Deep Reinforcement Learning Algorithms by Sergey Ivanov(54-Page)
Deep Reinforcement Learning: An Overview (2018)
A Brief Survey of Deep Reinforcement Learning (2017)
Deep Reinforcement Learning Doesn’t Work Yet (★) by Alex Irpan (2018), Chinese version
Deep Reinforcement Learning that Matters (★) by Peter Henderson, Riashat Islam
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress
Applications of Deep Reinforcement Learning in Communications and Networking: A Survey
An Introduction to Deep Reinforcement Learning
Challenges of Real-World Reinforcement Learning
Topics in Reinforcement Learning
Reinforcement Learning: A Survey, 1996.
A Tutorial Survey of Reinforcement Learning, Sadhana, 1994.
Reinforcement Learning in Robotics: A Survey, 2013
A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation, 2018
Universal Reinforcement Learning Algorithms: Survey and Experiments, 2017
Bayesian Reinforcement Learning: A Survey, 2016
Benchmarking Reinforcement Learning Algorithms on Real-World Robots

#4. Environment-and-Framework

OpenAI Gym (GitHub) (docs)
rllab (GitHub) (readthedocs)
Ray (Doc)
Dopamine: https://github.com/google/dopamine (uses some tensorflow)
trfl: https://github.com/deepmind/trfl (uses tensorflow)
ChainerRL (GitHub) (API: Python)
Surreal (GitHub) (API: Python) (support: Stanford Vision and Learning Lab), Paper
PyMARL (GitHub) (support: http://whirl.cs.ox.ac.uk/)
TF-Agents: https://github.com/tensorflow/agents (uses tensorflow)
TensorForce (GitHub) (uses tensorflow)
RL-Glue (Google Code Archive) (API: C/C++, Java, Matlab, Python, Lisp) (support: Alberta)
MAgent https://github.com/geek-ai/MAgent (uses tensorflow)
RLlib http://ray.readthedocs.io/en/latest/rllib.html (API: Python)
BURLAP http://burlap.cs.brown.edu/ (API: Java)
rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch
robotics-rl-srl - S-RL Toolbox: Reinforcement Learning (RL) and State Representation Learning (SRL) for Robotics
pysc2: StarCraft II Learning Environment
Arcade-Learning-Environment
OpenAI universe - A software platform for measuring and training an AI’s general intelligence across the world’s supply of games, websites and other applications
DeepMind Lab - A customisable 3D platform for agent-based AI research
Project Malmo - A platform for Artificial Intelligence experimentation and research built on top of Minecraft by Microsoft
Retro Learning Environment - An AI platform for reinforcement learning based on video game emulators. Currently supports SNES and Sega Genesis. Compatible with OpenAI gym.
torch-twrl - A package that enables reinforcement learning in Torch by Twitter
UETorch - A Torch plugin for Unreal Engine 4 by Facebook
TorchCraft - Connecting Torch to StarCraft
rllab - A framework for developing and evaluating reinforcement learning algorithms, fully compatible with OpenAI Gym
TensorForce - Practical deep reinforcement learning on TensorFlow with Gitter support and OpenAI Gym/Universe/DeepMind Lab integration.
OpenAI lab - An experimentation system for Reinforcement Learning using OpenAI Gym, Tensorflow, and Keras.
keras-rl - State-of-the-art deep reinforcement learning algorithms in Keras designed for compatibility with OpenAI.
BURLAP - Brown-UMBC Reinforcement Learning and Planning, a library written in Java
MAgent - A Platform for Many-agent Reinforcement Learning.
Ray RLlib - Ray RLlib is a reinforcement learning library that aims to provide both performance and composability.
SLM Lab - A research framework for Deep Reinforcement Learning using Unity, OpenAI Gym, PyTorch, Tensorflow.
Unity ML Agents - Create reinforcement learning environments using the Unity Editor
Intel Coach - Coach is a Python reinforcement learning research framework containing implementations of many state-of-the-art algorithms.
ELF - An End-To-End, Lightweight and Flexible Platform for Game Research
Unity ML-Agents Toolkit
rlkit
https://gym.openai.com/envs/#classic_control
https://github.com/erlerobot/gym-gazebo
https://github.com/robotology/gym-ignition
https://github.com/dartsim/gym-dart
https://github.com/Roboy/gym-roboy
https://github.com/openai/retro
https://github.com/openai/gym-soccer
https://github.com/duckietown/gym-duckietown
https://github.com/Unity-Technologies/ml-agents (Unity, multiagent)
https://github.com/koulanurag/ma-gym (multiagent)
https://github.com/ucuapps/modelicagym
https://github.com/mwydmuch/ViZDoom
https://github.com/benelot/pybullet-gym
https://github.com/Healthcare-Robotics/assistive-gym
https://github.com/Microsoft/malmo
https://github.com/nadavbh12/Retro-Learning-Environment
https://github.com/twitter/torch-twrl
https://github.com/arex18/rocket-lander
https://github.com/ppaquette/gym-doom
https://github.com/thedimlebowski/Trading-Gym
https://github.com/Phylliade/awesome-openai-gym-environments
https://github.com/deepmind/pysc2 (by DeepMind) (Blizzard StarCraft II Learning Environment (SC2LE) component)

#5. Baselines-and-Benchmarks

https://github.com/openai/baselines [stable-baselines]
rl-baselines-zoo
ROBEL (google-research/robel)
RLBench (stepjam/RLBench)
https://martin-thoma.com/sota/#reinforcment-learning
https://github.com/rlworkgroup/garage
Atari Environments Scores

#6. Algorithms

1. DQN series

Playing Atari with Deep Reinforcement Learning [arxiv] [code]
Deep Reinforcement Learning with Double Q-learning [arxiv] [code]
Dueling Network Architectures for Deep Reinforcement Learning [arxiv] [code]
Prioritized Experience Replay [arxiv] [code]
Noisy Networks for Exploration [arxiv] [code]
A Distributional Perspective on Reinforcement Learning [arxiv] [code]
Rainbow: Combining Improvements in Deep Reinforcement Learning [arxiv] [code]

2. Others

Algorithm Coding

Deep-Reinforcement-Learning-Algorithms-with-PyTorch

#7. Applications

7.1 Basic

Reinforcement Learning Applications
IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light Control by Hua Wei, Guanjie Zheng (2018)
Deep Reinforcement Learning by Yuxi Li, 2018
Deep Reinforcement Learning in Robotics

7.2 Robotics

Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion (Kohl, ICRA 2004) [Paper]
Robot Motor Skill Coordination with EM-based Reinforcement Learning (Kormushev, IROS 2010) [Paper] [Video]
Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (Hester, ICRA 2010) [Paper] [Video]
Autonomous Skill Acquisition on a Mobile Manipulator (Konidaris, AAAI 2011) [Paper] [Video]
PILCO: A Model-Based and Data-Efficient Approach to Policy Search (Deisenroth, ICML 2011) [Paper]
Incremental Semantically Grounded Learning from Demonstration (Niekum, RSS 2013) [Paper]
Efficient Reinforcement Learning for Robots using Informative Simulated Priors (Cutler, ICRA 2015) [Paper] [Video]
Robots that can adapt like animals (Cully, Nature 2015) [Paper] [Video] [Code]
Black-Box Data-efficient Policy Search for Robotics (Chatzilygeroudis, IROS 2017) [Paper] [Video] [Code]

#8. Advanced-Topics

8.1. Model-free RL

Playing Atari with Deep Reinforcement Learning, NIPS Deep Learning Workshop 2013. paper
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller

Human-level control through deep reinforcement learning, Nature 2015. paper
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg & Demis Hassabis

Deep Reinforcement Learning with Double Q-learning, AAAI 2016. paper
Hado van Hasselt, Arthur Guez, David Silver

Dueling Network Architectures for Deep Reinforcement Learning, ICML 2016. paper
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas

Deep Recurrent Q-Learning for Partially Observable MDPs, AAAI 2015. paper
Matthew Hausknecht, Peter Stone

Prioritized Experience Replay, ICLR 2016. paper
Tom Schaul, John Quan, Ioannis Antonoglou, David Silver

Asynchronous Methods for Deep Reinforcement Learning, ICML 2016. paper
Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu

A Distributional Perspective on Reinforcement Learning, ICML 2017. paper
Marc G. Bellemare, Will Dabney, Rémi Munos

Noisy Networks for Exploration, ICLR 2018. paper
Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg

Rainbow: Combining Improvements in Deep Reinforcement Learning, AAAI 2018. paper
Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver

8.2. Model-based RL

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion, NIPS 2018. paper
Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee

Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning, ICML 2018. paper
Vladimir Feinberg, Alvin Wan, Ion Stoica, Michael I. Jordan, Joseph E. Gonzalez, Sergey Levine

Value Prediction Network, NIPS 2017. paper
Junhyuk Oh, Satinder Singh, Honglak Lee

Imagination-Augmented Agents for Deep Reinforcement Learning, NIPS 2017. paper
Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adria Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, Demis Hassabis, David Silver, Daan Wierstra

Continuous Deep Q-Learning with Model-based Acceleration, ICML 2016. paper
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine

Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning, CoRL 2017. paper
Gabriel Kalweit, Joschka Boedecker

Model-Ensemble Trust-Region Policy Optimization, ICLR 2018. paper
Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models, NIPS 2018. paper
Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine

Dyna, an integrated architecture for learning, planning, and reacting, ACM 1991. paper
Sutton, Richard S

Learning Continuous Control Policies by Stochastic Value Gradients, NIPS 2015. paper
Nicolas Heess, Greg Wayne, David Silver, Timothy Lillicrap, Yuval Tassa, Tom Erez

Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks, ICLR 2017. paper
Stefan Depeweg, José Miguel Hernández-Lobato, Finale Doshi-Velez, Steffen Udluft

8.3 Function Approximation methods (Least-Squares Temporal Difference, Least-Squares Policy Iteration)

Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, 1996. [Paper]
Model-Free Least Squares Policy Iteration, NIPS, 2001. [Paper] [Code]

8.4 Policy Search/Policy Gradient

Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, 1999. [Paper]
Natural Actor-Critic, ECML, 2005. [Paper]
Policy Search for Motor Primitives in Robotics, NIPS, 2009. [Paper]
Relative Entropy Policy Search, AAAI, 2010. [Paper]
Path Integral Policy Improvement with Covariance Matrix Adaptation, ICML, 2012. [Paper]
Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004. [Paper]
PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]
Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]
Black-Box Data-efficient Policy Search for Robotics, IROS, 2017. [Paper]

8.5 Hierarchical RL

Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, Artificial Intelligence, 1999. [Paper]
Building Portable Options: Skill Transfer in Reinforcement Learning, IJCAI, 2007. [Paper]

8.6 Inverse RL

updating…

8.7 Meta RL

updating…

8.8. Rewards

Deep Reinforcement Learning Models: Tips & Tricks for Writing Reward Functions
Meta Reward Learning

8.9. Policy Gradient

Policy Gradient

8.10. Distributed Reinforcement Learning

Asynchronous Methods for Deep Reinforcement Learning, ICML 2016. paper
GA3C: GPU-based A3C for Deep Reinforcement Learning by Iuri Frosio, Stephen Tyree, NIPS 2016
Distributed Prioritized Experience Replay by Dan Horgan, John Quan, David Budden, ICLR 2018
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures by Lasse Espeholt, Hubert Soyer, Remi Munos, ICML 2018
Distributed Distributional Deterministic Policy Gradients by Gabriel Barth-Maron, Matthew W. Hoffman, ICLR 2018.
Emergence of Locomotion Behaviours in Rich Environments by Nicolas Heess, Dhruva TB, Srinivasan Sriram, 2017
GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning by Jacky Liang, Viktor Makoviychuk, 2018
Recurrent Experience Replay in Distributed Reinforcement Learning by Steven Kapturowski, Georg Ostrovski, ICLR 2019.

#9. Related-Courses

9.1. Game Theory

Game Theory Course, Yale University
Game Theory - The Full Course, Stanford University
Algorithmic Game Theory (CS364A, Fall 2013), Stanford University

9.2. Other

…

#10. Multi-Agents

10.1 Tutorial and Books

Deep Multi-Agent Reinforcement Learning by Jakob N Foerster, 2018. PhD Thesis.
Multi-Agent Machine Learning: A Reinforcement Approach by H. M. Schwartz, 2014.
Multiagent Reinforcement Learning by Daan Bloembergen, Daniel Hennes, Michael Kaisers, Peter Vrancx. ECML, 2013.
Multiagent systems: Algorithmic, game-theoretic, and logical foundations by Shoham Y, Leyton-Brown K. Cambridge University Press, 2008.

10.2 Review Papers

A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems by Silva, Felipe Leno da; Costa, Anna Helena Reali. JAIR, 2019.
Autonomously Reusing Knowledge in Multiagent Reinforcement Learning by Silva, Felipe Leno da; Taylor, Matthew E.; Costa, Anna Helena Reali. IJCAI, 2018.
Deep Reinforcement Learning Variants of Multi-Agent Learning Algorithms by Castaneda A O. 2016.
Evolutionary Dynamics of Multi-Agent Learning: A Survey by Bloembergen, Daan, et al. JAIR, 2015.
Game theory and multi-agent reinforcement learning by Nowé A, Vrancx P, De Hauwere Y M. Reinforcement Learning. Springer Berlin Heidelberg, 2012.
Multi-agent reinforcement learning: An overview by Buşoniu L, Babuška R, De Schutter B. Innovations in multi-agent systems and applications-1. Springer Berlin Heidelberg, 2010
A comprehensive survey of multi-agent reinforcement learning by Busoniu L, Babuska R, De Schutter B. IEEE Transactions on Systems Man and Cybernetics Part C Applications and Reviews, 2008
If multi-agent learning is the answer, what is the question? by Shoham Y, Powers R, Grenager T. Artificial Intelligence, 2007.
From single-agent to multi-agent reinforcement learning: Foundational concepts and methods by Neto G. Learning theory course, 2005.
Evolutionary game theory and multi-agent reinforcement learning by Tuyls K, Nowé A. The Knowledge Engineering Review, 2005.
An Overview of Cooperative and Competitive Multiagent Learning by Pieter Jan ’t Hoen, Karl Tuyls, Liviu Panait, Sean Luke, J. A. La Poutré. AAMAS’s workshop LAMAS, 2005.
Cooperative multi-agent learning: the state of the art by Liviu Panait and Sean Luke, 2005.

10.3 Framework papers

Mean Field Multi-Agent Reinforcement Learning by Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, and Jun Wang. ICML 2018.
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments by Lowe R, Wu Y, Tamar A, et al. arXiv, 2017.
Deep Decentralized Multi-task Multi-Agent RL under Partial Observability by Omidshafiei S, Pazis J, Amato C, et al. arXiv, 2017.
Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games by Peng P, Yuan Q, Wen Y, et al. arXiv, 2017.
Robust Adversarial Reinforcement Learning by Lerrel Pinto, James Davidson, Rahul Sukthankar, Abhinav Gupta. arXiv, 2017.
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning by Foerster J, Nardelli N, Farquhar G, et al. arXiv, 2017.
Multiagent reinforcement learning with sparse interactions by negotiation and knowledge transfer by Zhou L, Yang P, Chen C, et al. IEEE transactions on cybernetics, 2016.
Decentralised multi-agent reinforcement learning for dynamic and uncertain environments by Marinescu A, Dusparic I, Taylor A, et al. arXiv, 2014.
CLEANing the reward: counterfactual actions to remove exploratory action noise in multiagent learning by HolmesParker C, Taylor M E, Agogino A, et al. AAMAS, 2014.
Bayesian reinforcement learning for multiagent systems with state uncertainty by Amato C, Oliehoek F A. MSDM Workshop, 2013.
Multiagent learning: Basics, challenges, and prospects by Tuyls, Karl, and Gerhard Weiss. AI Magazine, 2012.
Classes of multiagent q-learning dynamics with epsilon-greedy exploration by Wunder M, Littman M L, Babes M. ICML, 2010.
Conditional random fields for multi-agent reinforcement learning by Zhang X, Aberdeen D, Vishwanathan S V N. ICML, 2007.
Multi-agent reinforcement learning using strategies and voting by Partalas, Ioannis, Ioannis Feneris, and Ioannis Vlahavas. ICTAI, 2007.
A reinforcement learning scheme for a partially-observable multi-agent game by Ishii S, Fujita H, Mitsutake M, et al. Machine Learning, 2005.
Asymmetric multiagent reinforcement learning by Könönen V. Web Intelligence and Agent Systems, 2004.
Adaptive policy gradient in multiagent learning by Banerjee B, Peng J. AAMAS, 2003.
Reinforcement learning to play an optimal Nash equilibrium in team Markov games by Wang X, Sandholm T. NIPS, 2002.
Multiagent learning using a variable learning rate by Michael Bowling and Manuela Veloso, 2002.
Value-function reinforcement learning in Markov game by Littman M L. Cognitive Systems Research, 2001.
Hierarchical multi-agent reinforcement learning by Makar, Rajbala, Sridhar Mahadevan, and Mohammad Ghavamzadeh. The fifth international conference on Autonomous agents, 2001.
An analysis of stochastic game theory for multiagent reinforcement learning by Michael Bowling and Manuela Veloso, 2000.

10.4 Joint action learning

AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents by Conitzer V, Sandholm T. Machine Learning, 2007.
Extending Q-Learning to General Adaptive Multi-Agent Systems by Tesauro, Gerald. NIPS, 2003.
Multiagent reinforcement learning: theoretical framework and an algorithm. by Hu, Junling, and Michael P. Wellman. ICML, 1998.
The dynamics of reinforcement learning in cooperative multiagent systems by Claus C, Boutilier C. AAAI, 1998.
Markov games as a framework for multi-agent reinforcement learning by Littman, Michael L. ICML, 1994.

10.5 Cooperation and competition

Emergent complexity through multi-agent competition by Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, Igor Mordatch, 2018.
Learning with opponent learning awareness by Jakob Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch, 2018.
Multi-agent Reinforcement Learning in Sequential Social Dilemmas by Leibo J Z, Zambaldi V, Lanctot M, et al. arXiv, 2017. [Post]
Reinforcement Learning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies with PAC Bounds by Roi Ceren, Prashant Doshi, and Bikramjit Banerjee, pp. 530-538, AAMAS 2016.
Opponent Modeling in Deep Reinforcement Learning by He H, Boyd-Graber J, Kwok K, et al. ICML, 2016.
Multiagent cooperation and competition with deep reinforcement learning by Tampuu A, Matiisen T, Kodelja D, et al. arXiv, 2015.
Emotional multiagent reinforcement learning in social dilemmas by Yu C, Zhang M, Ren F. International Conference on Principles and Practice of Multi-Agent Systems, 2013.
Multi-agent reinforcement learning in common interest and fixed sum stochastic games: An experimental study by Bab, Avraham, and Ronen I. Brafman. Journal of Machine Learning Research, 2008.
Combining policy search with planning in multi-agent cooperation by Ma J, Cameron S. Robot Soccer World Cup, 2008.
Collaborative multiagent reinforcement learning by payoff propagation by Kok J R, Vlassis N. JMLR, 2006.
Learning to cooperate in multi-agent social dilemmas by de Cote E M, Lazaric A, Restelli M. AAMAS, 2006.
Learning to compete, compromise, and cooperate in repeated general-sum games by Crandall J W, Goodrich M A. ICML, 2005.
Sparse cooperative Q-learning by Kok J R, Vlassis N. ICML, 2004.

10.6 Coordination

Coordinated Multi-Agent Imitation Learning by Le H M, Yue Y, Carr P. arXiv, 2017.
Reinforcement social learning of coordination in networked cooperative multiagent systems by Hao J, Huang D, Cai Y, et al. AAAI Workshop, 2014.
Coordinating multi-agent reinforcement learning with limited communication by Zhang, Chongjie, and Victor Lesser. AAMAS, 2013.
Coordination guided reinforcement learning by Lau Q P, Lee M L, Hsu W. AAMAS, 2012.
Coordination in multiagent reinforcement learning: a Bayesian approach by Chalkiadakis G, Boutilier C. AAMAS, 2003.
Coordinated reinforcement learning by Guestrin C, Lagoudakis M, Parr R. ICML, 2002.
Reinforcement learning of coordination in cooperative multi-agent systems by Kapetanakis S, Kudenko D. AAAI/IAAI, 2002.

10.7 Security

Markov Security Games: Learning in Spatial Security Problems by Klima R, Tuyls K, Oliehoek F. The Learning, Inference and Control of Multi-Agent Systems at NIPS, 2016.
Cooperative Capture by Multi-Agent using Reinforcement Learning, Application for Security Patrol Systems by Yasuyuki S, Hirofumi O, Tadashi M, et al. Control Conference (ASCC), 2015
Improving learning and adaptation in security games by exploiting information asymmetry by He X, Dai H, Ning P. INFOCOM, 2015.

10.8 Self-Play

A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning by Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Perolat, David Silver, Thore Graepel. NIPS 2017.
Deep reinforcement learning from self-play in imperfect-information games by Heinrich, Johannes, and David Silver. arXiv, 2016.
Fictitious Self-Play in Extensive-Form Games by Heinrich, Johannes, Marc Lanctot, and David Silver. ICML, 2015.

10.9 Learning To Communicate

Emergent Communication through Negotiation by Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, Stephen Clark, 2018.
Emergence of Linguistic Communication From Referential Games with Symbolic and Pixel Input by Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, Stephen Clark
Emergence of Language with Multi-Agent Games: Learning to Communicate with Sequences of Symbols by Serhii Havrylov, Ivan Titov. ICLR Workshop, 2017.
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning by Abhishek Das, Satwik Kottur, et al. arXiv, 2017.
Emergence of Grounded Compositional Language in Multi-Agent Populations by Igor Mordatch, Pieter Abbeel. arXiv, 2017. [Post]
Cooperation and communication in multiagent deep reinforcement learning by Hausknecht M J. 2017.
Multi-agent cooperation and the emergence of (natural) language by Lazaridou A, Peysakhovich A, Baroni M. arXiv, 2016.
Learning to communicate to solve riddles with deep distributed recurrent q-networks by Foerster J N, Assael Y M, de Freitas N, et al. arXiv, 2016.
Learning to communicate with deep multi-agent reinforcement learning by Foerster J, Assael Y M, de Freitas N, et al. NIPS, 2016.
Learning multiagent communication with backpropagation by Sukhbaatar S, Fergus R. NIPS, 2016.
Efficient distributed reinforcement learning through agreement by Varshavskaya P, Kaelbling L P, Rus D. Distributed Autonomous Robotic Systems, 2009.

10.10 Transfer Learning

Simultaneously Learning and Advising in Multiagent Reinforcement Learning by Silva, Felipe Leno da; Glatt, Ruben; and Costa, Anna Helena Reali. AAMAS, 2017.
Accelerating Multiagent Reinforcement Learning through Transfer Learning by Silva, Felipe Leno da; and Costa, Anna Helena Reali. AAAI, 2017.
Accelerating multi-agent reinforcement learning with dynamic co-learning by Garant D, da Silva B C, Lesser V, et al. Technical report, 2015
Transfer learning in multi-agent systems through parallel transfer by Taylor, Adam, et al. ICML, 2013.
Transfer learning in multi-agent reinforcement learning domains by Boutsioukis, Georgios, Ioannis Partalas, and Ioannis Vlahavas. European Workshop on Reinforcement Learning, 2011.
Transfer Learning for Multi-agent Coordination by Vrancx, Peter, Yann-Michaël De Hauwere, and Ann Nowé. ICAART, 2011.

10.11 Imitation and Inverse Reinforcement Learning

Multi-Agent Adversarial Inverse Reinforcement Learning by Lantao Yu, Jiaming Song, Stefano Ermon. ICML 2019.
Multi-Agent Generative Adversarial Imitation Learning by Jiaming Song, Hongyu Ren, Dorsa Sadigh, Stefano Ermon. NeurIPS 2018.
Cooperative inverse reinforcement learning by Hadfield-Menell D, Russell S J, Abbeel P, et al. NIPS, 2016.
Comparison of Multi-agent and Single-agent Inverse Learning on a Simulated Soccer Example by Lin X, Beling P A, Cogill R. arXiv, 2014.
Multi-agent inverse reinforcement learning for zero-sum games by Lin X, Beling P A, Cogill R. arXiv, 2014.
Multi-robot inverse reinforcement learning under occlusion with interactions by Bogert K, Doshi P. AAMAS, 2014.
Multi-agent inverse reinforcement learning by Natarajan S, Kunapuli G, Judah K, et al. ICMLA, 2010.

10.12 Meta Learning

Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments by Al-Shedivat, M. 2018.

10.13 Application

MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence by Zheng L et al. NIPS 2017 & AAAI 2018 Demo. (Github Page)
Collaborative Deep Reinforcement Learning for Joint Object Search by Kong X, Xin B, Wang Y, et al. arXiv, 2017.
Multi-Agent Stochastic Simulation of Occupants for Building Simulation by Chapman J, Siebers P, Darren R. Building Simulation, 2017.
Extending No-MASS: Multi-Agent Stochastic Simulation for Demand Response of residential appliances by Sancho-Tomás A, Chapman J, Sumner M, Darren R. Building Simulation, 2017.
Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving by Shalev-Shwartz S, Shammah S, Shashua A. arXiv, 2016.
Applying multi-agent reinforcement learning to watershed management by Mason, Karl, et al. Proceedings of the Adaptive and Learning Agents workshop at AAMAS, 2016.
Crowd Simulation Via Multi-Agent Reinforcement Learning by Torrey L. AAAI, 2010.
Traffic light control by multiagent reinforcement learning systems by Bakker, Bram, et al. Interactive Collaborative Information Systems, 2010.
Multiagent reinforcement learning for urban traffic control using coordination graphs by Kuyer, Lior, et al. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2008.
A multi-agent Q-learning framework for optimizing stock trading systems by Lee J W, Jangmin O. DEXA, 2002.
Multi-agent reinforcement learning for traffic light control by Wiering, Marco. ICML, 2000.

#11. Paper-Resources

2019-07

Benchmarking Model-Based Reinforcement Learning
Learning World Graphs to Accelerate Hierarchical Reinforcement Learning
Perspective Taking in Deep Reinforcement Learning Agents
On the Weaknesses of Reinforcement Learning for Neural Machine Translation
Dynamic Face Video Segmentation via Reinforcement Learning
Striving for Simplicity in Off-policy Deep Reinforcement Learning
Intrinsic Motivation Driven Intuitive Physics Learning using Deep Reinforcement Learning with Intrinsic Reward Normalization
A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning
Attentive Multi-Task Deep Reinforcement Learning
Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning
Google Research Football: A Novel Reinforcement Learning Environment
Deep Reinforcement Learning in Financial Markets
Dynamic Input for Deep Reinforcement Learning in Autonomous Driving
Characterizing Attacks on Deep Reinforcement Learning
Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey
VRLS: A Unified Reinforcement Learning Scheduler for Vehicle-to-Vehicle Communications
Deep Reinforcement Learning for Autonomous Internet of Things: Model, Applications and Challenges
Arena: a toolkit for Multi-Agent Reinforcement Learning
GPU-Accelerated Atari Emulation for Reinforcement Learning
Photonic architecture for reinforcement learning

June 2019

Towards Empathic Deep Q-Learning
Ranking Policy Gradient
Hyp-RL : Hyperparameter Optimization by Reinforcement Learning
Modern Deep Reinforcement Learning Algorithms
A Framework for Automatic Question Generation from Text using Deep Reinforcement Learning
Deep Reinforcement Learning for Unmanned Aerial Vehicle-Assisted Vehicular Networks
Is multiagent deep reinforcement learning the answer or the question? A brief survey
Finding Needles in a Moving Haystack: Prioritizing Alerts with Adversarial Reinforcement Learning
Cooperative Lane Changing via Deep Reinforcement Learning
A Hierarchical Architecture for Sequential Decision-Making in Autonomous Driving using Deep Reinforcement Learning
Explaining Reinforcement Learning to Mere Mortals: An Empirical Study
Language as an Abstraction for Hierarchical Deep Reinforcement Learning
Autonomous Airline Revenue Management: A Deep Reinforcement Learning Approach to Seat Inventory Control and Overbooking
A Survey of Reinforcement Learning Informed by Natural Language
Load Balancing for Ultra-Dense Networks: A Deep Reinforcement Learning Based Approach
Deep Reinforcement Learning Architecture for Continuous Power Allocation in High Throughput Satellites
Harnessing Reinforcement Learning for Neural Motion Planning

April-May 2019

Reinforcement Learning with Probabilistic Guarantees for Autonomous Driving
An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents
On the Generalization Gap in Reparameterizable Reinforcement Learning
Targeted Attacks on Deep Reinforcement Learning Agents through Adversarial Observations
Inverse Reinforcement Learning in Contextual MDPs
Teaching on a Budget in Multi-Agent Deep Reinforcement Learning
Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning
Generation of Policy-Level Explanations for Reinforcement Learning
A Control-Model-Based Approach for Reinforcement Learning
Interactive Teaching Algorithms for Inverse Reinforcement Learning
Snooping Attacks on Deep Reinforcement Learning

March 2019

IRLAS: Inverse Reinforcement Learning for Architecture Search
Learning Hierarchical Teaching in Cooperative Multiagent Reinforcement Learning
M3RL: Mind-aware Multi-agent Management Reinforcement Learning
Concurrent Meta Reinforcement Learning
Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform
Using Natural Language for Reward Shaping in Reinforcement Learning
Model-Based Reinforcement Learning for Atari
RLOC: Neurobiologically Inspired Hierarchical Reinforcement Learning Algorithm for Continuous Control of Nonlinear Dynamical Systems
Hacking Google reCAPTCHA v3 using Reinforcement Learning
Reinforcement Learning and Inverse Reinforcement Learning with System 1 and System 2
Deep Reinforcement Learning with Feedback-based Exploration
Deep Reinforcement Learning for Autonomous Driving
Improving Safety in Reinforcement Learning Using Model-Based Architectures and Human Intervention
Deep Hierarchical Reinforcement Learning Based Recommendations via Multi-goals Abstraction
Explaining Reinforcement Learning to Mere Mortals: An Empirical Study
Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems
On the use of Deep Autoencoders for Efficient Embedded Reinforcement Learning
Autoregressive Policies for Continuous Control Deep Reinforcement Learning

Feb 2019

Distributional reinforcement learning with linear function approximation
Novelty Search for Deep Reinforcement Learning Policy Network Weights by Action Sequence Edit Metric Distance
Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning
Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications
Reinforcement Learning for Optimal Load Distribution Sequencing in Resource-Sharing System
Learning to Schedule Communication in Multi-agent Reinforcement Learning
On Reinforcement Learning for Full-length Game of StarCraft
Implicit Policy for Reinforcement Learning
A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
Visual Rationalizations in Deep Reinforcement Learning for Atari Games
Statistics and Samples in Distributional Reinforcement Learning
A Comparative Analysis of Expected and Distributional Reinforcement Learning
Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning
SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following
Investigating Generalisation in Continuous Deep Reinforcement Learning
Model-Free Adaptive Optimal Control of Episodic Fixed-Horizon Manufacturing Processes using Reinforcement Learning
Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning
Towards the Next Generation Airline Revenue Management: A Deep Reinforcement Learning Approach to Seat Inventory Control and Overbooking
Parenting: Safe Reinforcement Learning from Human Input
Reinforcement Learning Without Backpropagation or a Clock
Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning
A new Potential-Based Reward Shaping for Reinforcement Learning Agent
How to Combine Tree-Search Methods in Reinforcement Learning
Unsupervised Basis Function Adaptation for Reinforcement Learning
Communication Topologies Between Learning Agents in Deep Reinforcement Learning
Logically-Constrained Reinforcement Learning
Hyperbolic Embeddings for Learning Options in Hierarchical Reinforcement Learning
ProLoNets: Neural-encoding Human Experts’ Domain Knowledge to Warm Start Reinforcement Learning
A Framework for Automated Cellular Network Tuning with Reinforcement Learning
Deep Reinforcement Learning for Search, Recommendation, and Online Advertising: A Survey
The Value Function Polytope in Reinforcement Learning
Robust Reinforcement Learning in POMDPs with Incomplete and Noisy Observations
Deep Reinforcement Learning Based High-level Driving Behavior Decision-making Model in Heterogeneous Traffic
Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning
Verifiably Safe Off-Model Reinforcement Learning
Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning
Optimal Tap Setting of Voltage Regulation Transformers Using Batch Reinforcement Learning
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
Exploration versus exploitation in reinforcement learning: a stochastic control approach
ACTRCE: Augmenting Experience via Teacher’s Advice For Multi-Goal Reinforcement Learning
End-to-end Active Object Tracking and Its Real-world Deployment via Reinforcement Learning
WiseMove: A Framework for Safe Deep Reinforcement Learning for Autonomous Driving
Emergence of Hierarchy via Reinforcement Learning Using a Multiple Timescale Stochastic RNN

Jan 2019

Federated Reinforcement Learning
Verifiable Reinforcement Learning via Policy Extraction
QFlow: A Reinforcement Learning Approach to High QoE Video Streaming over Wireless Networks
Complementary reinforcement learning towards explainable agents
The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) Competition
Hierarchical Reinforcement Learning for Multi-agent MOBA Game
Reinforcement Learning of Markov Decision Processes with Peak Constraints
Robust Recovery Controller for a Quadrupedal Robot using Deep Reinforcement Learning
Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target
Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation
Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees
A Short Survey on Probabilistic Reinforcement Learning
Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos
Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems
Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation
Recurrent Control Nets for Deep Reinforcement Learning
Amplifying the Imitation Effect for Reinforcement Learning of UCAV’s Mission Execution
Multi-agent Reinforcement Learning Embedded Game for the Optimization of Building Energy Control and Power System Planning
Representation Learning on Graphs: A Reinforcement Learning Application
Evolutionarily-Curated Curriculum Learning for Deep Reinforcement Learning Agents
Exploring applications of deep reinforcement learning for real-world autonomous driving systems
AlphaSeq: Sequence Discovery with Deep Reinforcement Learning
Exploration versus exploitation in reinforcement learning: a stochastic control approach
Multi-Agent Deep Reinforcement Learning for Dynamic Power Allocation in Wireless Networks
Energy-Efficient Thermal Comfort Control in Smart Buildings via Deep Reinforcement Learning
Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning
AutoPhase: Compiler Phase-Ordering for High Level Synthesis with Deep Reinforcement Learning
Improving Coordination in Multi-Agent Deep Reinforcement Learning through Memory-driven Communication
Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning
Accelerated Methods for Deep Reinforcement Learning
Motion Perception in Reinforcement Learning with Dynamic Objects
A New Tensioning Method using Deep Reinforcement Learning for Surgical Pattern Cutting
Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications
Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
Deterministic Implementations for Reproducibility in Deep Reinforcement Learning
Uncertainty-Based Out-of-Distribution Detection in Deep Reinforcement Learning
Risk-Aware Active Inverse Reinforcement Learning
A dual mode adaptive basal-bolus advisor based on reinforcement learning
What Should I Do Now? Marrying Reinforcement Learning and Symbolic Planning
Deep Reinforcement Learning for Imbalanced Classification
Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization
Finite-Sample Analyses for Fully Decentralized Multi-Agent Reinforcement Learning
Optimal Decision-Making in Mixed-Agent Partially Observable Stochastic Environments via Reinforcement Learning
Floyd-Warshall Reinforcement Learning: Learning from Past Experiences to Reach New Goals
A Critical Investigation of Deep Reinforcement Learning for Navigation
Accelerating Goal-Directed Reinforcement Learning by Model Characterization
Machine Teaching in Hierarchical Genetic Reinforcement Learning: Curriculum Design of Reward Functions for Swarm Shepherding
Reinforcement Learning Using Quantum Boltzmann Machines
Communication-Efficient Distributed Reinforcement Learning
DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation
Human-Like Autonomous Car-Following Model with Deep Reinforcement Learning
Adversarial Text Generation Without Reinforcement Learning
End-to-End Video Captioning with Multitask Reinforcement Learning

2018

Accelerated Methods for Deep Reinforcement Learning. arxiv
A Deep Reinforcement Learning Chatbot (Short Version). arxiv
AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search. arxiv ⭐️
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress. arxiv
Composable Deep Reinforcement Learning for Robotic Manipulation. arxiv
Cooperative Multi-Agent Reinforcement Learning for Low-Level Wireless Communication. arxiv
Deep Reinforcement Fuzzing. arxiv
Deep Reinforcement Learning of Cell Movement in the Early Stage of C. elegans Embryogenesis. arxiv
Deep Reinforcement Learning For Sequence to Sequence Models. arxiv code
Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods. arxiv
Deep Reinforcement Learning in Portfolio Management. arxiv code
Deep Reinforcement Learning using Capsules in Advanced Game Environments. arxiv
Deep Reinforcement Learning with Model Learning and Monte Carlo Tree Search in Minecraft. arxiv
Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes. arxiv code
Diversity is All You Need: Learning Skills without a Reward Function. arxiv
Faster Deep Q-learning using Neural Episodic Control. arxiv
Feedback-Based Tree Search for Reinforcement Learning. arxiv
Feudal Reinforcement Learning for Dialogue Management in Large Domains. arxiv
Forward-Backward Reinforcement Learning. arxiv
Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies. arxiv
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. arxiv
Kickstarting Deep Reinforcement Learning. arxiv
Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. arxiv
Meta Reinforcement Learning with Latent Variable Gaussian Processes. arxiv
Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches. arxiv
Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations. arxiv
Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents. arxiv
Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. arxiv
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review. arxiv
Reinforcement Learning from Imperfect Demonstrations. arxiv
Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application. arxiv
RUDDER: Return Decomposition for Delayed Rewards. arxiv code
Semi-parametric Topological Memory for Navigation. arxiv tensorflow
Shared Autonomy via Deep Reinforcement Learning. arxiv
Setting up a Reinforcement Learning Task with a Real-World Robot. arxiv
Simple random search provides a competitive approach to reinforcement learning. arxiv code
Unsupervised Meta-Learning for Reinforcement Learning. arxiv
Using reinforcement learning to learn how to play text-based games. arxiv

2017

A Deep Reinforcement Learning Chatbot. arxiv
A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem. arxiv code
A Deep Reinforced Model for Abstractive Summarization. arxiv
A Distributional Perspective on Reinforcement Learning. arxiv
A Laplacian Framework for Option Discovery in Reinforcement Learning. arxiv ⭐️
Boosting the Actor with Dual Critic. arxiv
Bridging the Gap Between Value and Policy Based Reinforcement Learning. arxiv
Car Racing using Reinforcement Learning. pdf
Cold-Start Reinforcement Learning with Softmax Policy Gradients. arxiv
Curiosity-driven Exploration by Self-supervised Prediction. arxiv tensorflow
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning. arxiv code
DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning. arxiv code
Deep Reinforcement Learning: An Overview. arxiv
Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward. arxiv code
Deep reinforcement learning from human preferences. arxiv
Deep Reinforcement Learning that Matters. arxiv code
Device Placement Optimization with Reinforcement Learning. arxiv
Distributional Reinforcement Learning with Quantile Regression. arxiv
End-to-End Optimization of Task-Oriented Dialogue Model with Deep Reinforcement Learning. arxiv
Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arxiv
Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning. arxiv
Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations. arxiv
Learning how to Active Learn: A Deep Reinforcement Learning Approach. arxiv tensorflow
Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning. arxiv tensorflow
MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence. arxiv code ⭐️
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arxiv
Micro-Objective Learning: Accelerating Deep Reinforcement Learning through the Discovery of Continuous Subgoals. arxiv
Neural Architecture Search with Reinforcement Learning. arxiv tensorflow
Neural Map: Structured Memory for Deep Reinforcement Learning. arxiv
Observational Learning by Reinforcement Learning. arxiv
Overcoming Exploration in Reinforcement Learning with Demonstrations. arxiv
Practical Network Blocks Design with Q-Learning. arxiv
Rainbow: Combining Improvements in Deep Reinforcement Learning. arxiv
Reinforcement Learning for Architecture Search by Network Transformation. arxiv code
Reinforcement Learning via Recurrent Convolutional Neural Networks. arxiv code
Reinforcement Learning with a Corrupted Reward Channel. arxiv ⭐️
Reinforcement Learning with Deep Energy-Based Policies. arxiv code
Reinforcement Learning with External Knowledge and Two-Stage Q-functions for Predicting Popular Reddit Threads. arxiv
Robust Deep Reinforcement Learning with Adversarial Attacks. arxiv
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. arxiv
Shallow Updates for Deep Reinforcement Learning. arxiv code
Stochastic Neural Networks for Hierarchical Reinforcement Learning. pdf code
Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing. arxiv code
Task-Oriented Query Reformulation with Reinforcement Learning. arxiv code
Teaching a Machine to Read Maps with Deep Reinforcement Learning. arxiv code
TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning. arxiv code
Value Prediction Network. arxiv
Variational Deep Q Network. arxiv
Virtual-to-real Deep Reinforcement Learning: Continuous Control of Mobile Robots for Mapless Navigation. arxiv
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning. arxiv

2016

Asynchronous Methods for Deep Reinforcement Learning. [arxiv] ⭐️
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning, E. Parisotto, et al., ICLR. [arxiv]
A New Softmax Operator for Reinforcement Learning. [url]
Benchmarking Deep Reinforcement Learning for Continuous Control, Y. Duan et al., ICML. [arxiv]
Better Computer Go Player with Neural Network and Long-term Prediction, Y. Tian et al., ICLR. [arxiv]
Deep Reinforcement Learning in Parameterized Action Space, M. Hausknecht et al., ICLR. [arxiv]
Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks, R. Houthooft et al., arXiv. [url]
Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML. [arxiv]
Continuous Deep Q-Learning with Model-based Acceleration, S. Gu et al., ICML. [arxiv]
Continuous control with deep reinforcement learning. [arxiv] ⭐️
Deep Successor Reinforcement Learning. [arxiv]
Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop. [arxiv]
Deep Exploration via Bootstrapped DQN. [arxiv] ⭐️
Deep Reinforcement Learning for Dialogue Generation. [arxiv] tensorflow
Deep Reinforcement Learning in Parameterized Action Space. [arxiv] ⭐️
Deep Reinforcement Learning with Successor Features for Navigation across Similar Environments. [url]
Designing Neural Network Architectures using Reinforcement Learning. arxiv code
Dialogue manager domain adaptation using Gaussian process reinforcement learning. [arxiv]
End-to-End Reinforcement Learning of Dialogue Agents for Information Access. [arxiv]
Generating Text with Deep Reinforcement Learning. [arxiv]
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, C. Finn et al., arXiv. [arxiv]
Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks, R. Krishnamurthy et al., arXiv. [arxiv]
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv. [arxiv]
Hierarchical Object Detection with Deep Reinforcement Learning. [arxiv]
High-Dimensional Continuous Control Using Generalized Advantage Estimation, J. Schulman et al., ICLR. [arxiv]
Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI. [arxiv]
Interactive Spoken Content Retrieval by Deep Reinforcement Learning. [arxiv]
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection, S. Levine et al., arXiv. [url]
Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks, J. N. Foerster et al., arXiv. [url]
Learning to compose words into sentences with reinforcement learning. [url]
Loss is its own Reward: Self-Supervision for Reinforcement Learning. [arxiv]
Model-Free Episodic Control. [arxiv]
Mastering the game of Go with deep neural networks and tree search. [nature] ⭐️
MazeBase: A Sandbox for Learning from Games. [arxiv]
Neural Architecture Search with Reinforcement Learning. [pdf]
Neural Combinatorial Optimization with Reinforcement Learning. [arxiv]
Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning. [url]
Online Sequence-to-Sequence Active Learning for Open-Domain Dialogue Generation. arXiv. [arxiv]
Policy Distillation, A. A. Rusu et al., ICLR. [arxiv]
Prioritized Experience Replay. [arxiv] ⭐️
Reinforcement Learning Using Quantum Boltzmann Machines. [arxiv]
Safe and Efficient Off-Policy Reinforcement Learning, R. Munos et al. [arxiv]
Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving. [arxiv]
Sample-efficient Deep Reinforcement Learning for Dialog Control. [url]
Self-Correcting Models for Model-Based Reinforcement Learning. [url]
Unifying Count-Based Exploration and Intrinsic Motivation. [arxiv]
Value Iteration Networks. [arxiv]

2015

ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources. arxiv
Action-Conditional Video Prediction using Deep Networks in Atari Games. arxiv ⭐️
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning. arxiv ⭐️
[DDPG] Continuous control with deep reinforcement learning. arxiv ⭐️
[NAF] Continuous Deep Q-Learning with Model-based Acceleration. arxiv ⭐️
Dueling Network Architectures for Deep Reinforcement Learning. arxiv ⭐️
Deep Reinforcement Learning with an Action Space Defined by Natural Language. arxiv
Deep Reinforcement Learning with Double Q-learning. arxiv ⭐️
Deep Recurrent Q-Learning for Partially Observable MDPs. arxiv ⭐️
DeepMPC: Learning Deep Latent Features for Model Predictive Control. pdf
Deterministic Policy Gradient Algorithms. pdf ⭐️
End-to-End Training of Deep Visuomotor Policies. arxiv ⭐️
Giraffe: Using Deep Reinforcement Learning to Play Chess. arxiv
Generating Text with Deep Reinforcement Learning. arxiv
How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies. arxiv
Human-level control through deep reinforcement learning. nature ⭐️
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models. arxiv ⭐️
Learning Simple Algorithms from Examples. arxiv
Language Understanding for Text-based Games Using Deep Reinforcement Learning. pdf ⭐️
Learning Continuous Control Policies by Stochastic Value Gradients. pdf ⭐️
Multiagent Cooperation and Competition with Deep Reinforcement Learning. arxiv
Maximum Entropy Deep Inverse Reinforcement Learning. arxiv
Massively Parallel Methods for Deep Reinforcement Learning. pdf ⭐️
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models. arxiv
Playing Atari with Deep Reinforcement Learning. arxiv
Recurrent Reinforcement Learning: A Hybrid Approach. arxiv
Strategic Dialogue Management via Deep Reinforcement Learning. arxiv
Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control. arxiv
Trust Region Policy Optimization. pdf ⭐️
Universal Value Function Approximators. pdf
Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning. arxiv

2014

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning. [url]

2013

Evolving large-scale neural networks for vision-based reinforcement learning. [idsia] ⭐️
Playing Atari with Deep Reinforcement Learning. [toronto] ⭐️

More About

These documents will be updated in sync with my personal blog and knowledge column

CSDN-Blog: A Guide Resource for Deep Reinforcement Learning
ZhiHu-Blog: A Guide Resource for Deep Reinforcement Learning
WeChat (add the account “NeuronDance” with the remark “Name-University/Company”)

Cite

This summary of deep reinforcement learning materials is based on the sources listed below, and we would like to express our heartfelt thanks to their authors.

[1] https://github.com/brianspiering/awesome-deep-rl
[2] https://github.com/jgvictores/awesome-deep-reinforcement-learning
[3] https://github.com/PaddlePaddle/PARL/blob/develop/papers/archive.md#distributed-training
[4] https://github.com/LantaoYu/MARL-Papers
[5] https://github.com/gopala-kr/DRL-Agents
[6] https://github.com/junhyukoh/deep-reinforcement-learning-papers
[7] https://www.eff.org/ai/metrics#Source-Code
[8] https://agi.university/the-landscape-of-deep-reinforcement-learning
[9] https://github.com/tigerneil/awesome-deep-rl
[10] https://planspace.org/20170830-berkeley_deep_rl_bootcamp/
[11] https://aikorea.org/awesome-rl/

你可能感兴趣的:(强化学习,深度学习)

PyTorch & TensorFlow速成复习：从基础语法到模型部署实战（附FPGA移植衔接）阿牛的药铺算法移植部署 pytorch tensorflow fpga开发
PyTorch&TensorFlow速成复习：从基础语法到模型部署实战（附FPGA移植衔接）引言：为什么算法移植工程师必须掌握框架基础？针对光学类产品算法FPGA移植岗位需求（如可见光/红外图像处理），深度学习框架是算法落地的"桥梁"——既要用PyTorch/TensorFlow验证算法可行性，又要将训练好的模型（如CNN、目标检测）转换为FPGA可部署的格式（ONNX、TFLite）。本文采用"
深度学习模型表征提取全解析 ZhangJiQun&MXP 教学 2024大模型以及算力 2021 AI python 深度学习人工智能 python embedding 语言模型
模型内部进行表征提取的方法在自然语言处理（NLP）中，“表征（Representation）”指将文本（词、短语、句子、文档等）转化为计算机可理解的数值形式（如向量、矩阵），核心目标是捕捉语言的语义、语法、上下文依赖等信息。自然语言表征技术可按“静态/动态”“有无上下文”“是否融入知识”等维度划分一、传统静态表征（无上下文，词级为主）这类方法为每个词分配固定向量，不考虑其在具体语境中的含义（无法解
【Qualcomm】高通SNPE框架简介、下载与使用 Jackilina_Stone 人工智能 Qualcomm SNPE
目录一高通SNPE框架1SNPE简介2QNN与SNPE3Capabilities4工作流程二SNPE的安装与使用1下载2Setup3SNPE的使用概述一高通SNPE框架1SNPE简介SNPE（SnapdragonNeuralProcessingEngine），是高通公司推出的面向移动端和物联网设备的深度学习推理框架。SNPE提供了一套完整的深度学习推理框架，能够支持多种深度学习模型，包括Pytor
深度学习篇---昇腾NPU&CANN 工具包 Atticus-Orion 上位机知识篇图像处理篇深度学习篇深度学习人工智能 NPU 昇腾 CANN
介绍昇腾NPU是华为推出的神经网络处理器，具有强大的AI计算能力，而CANN工具包则是面向AI场景的异构计算架构，用于发挥昇腾NPU的性能优势。以下是详细介绍：昇腾NPU架构设计：采用达芬奇架构，是一个片上系统，主要由特制的计算单元、大容量的存储单元和相应的控制单元组成。集成了多个CPU核心，包括控制CPU和AICPU，前者用于控制处理器整体运行，后者承担非矩阵类复杂计算。此外，还拥有AICore
深度学习图像分类数据集—桃子识别分类 AI街潜水的八角深度学习图像数据集深度学习分类人工智能
该数据集为图像分类数据集，适用于ResNet、VGG等卷积神经网络，SENet、CBAM等注意力机制相关算法，VisionTransformer等Transformer相关算法。数据集信息介绍：桃子识别分类：['B1','M2','R0','S3']训练数据集总共有6637张图片，每个文件夹单独放一种数据各子文件夹图片统计:·B1:1601张图片·M2:1800张图片·R0:1601张图片·S3:
NumPy-@运算符详解 GG不是gg numpy numpy
NumPy-@运算符详解一、@运算符的起源与设计目标1.从数学到代码：符号的统一2.设计目标二、@运算符的核心语法与运算规则1.基础用法：二维矩阵乘法2.一维向量的矩阵语义3.高维数组：批次矩阵运算4.广播机制：灵活的形状匹配三、@运算符与其他乘法方式的核心区别1.对比`np.dot()`2.对比元素级乘法`*`3.对比`np.matrix`的`*`运算符四、典型应用场景：从基础到高阶1.深度学习
NLP_知识图谱_大模型——个人学习记录 macken9999 自然语言处理知识图谱大模型自然语言处理知识图谱学习
1.自然语言处理、知识图谱、对话系统三大技术研究与应用https://github.com/lihanghang/NLP-Knowledge-Graph深度学习-自然语言处理(NLP)-知识图谱：知识图谱构建流程【本体构建、知识抽取（实体抽取、关系抽取、属性抽取）、知识表示、知识融合、知识存储】-元気森林-博客园https://www.cnblogs.com/-402/p/16529422.htm
解决 Python 包安装失败问题：以 accelerate 为例
在使用Python开发项目时，我们经常会遇到依赖包安装失败的问题。今天，我们就以accelerate包为例，详细探讨一下可能的原因以及解决方法。通过这篇文章，你将了解到Python包安装失败的常见原因、如何切换镜像源、如何手动安装包，以及一些实用的注意事项。一、问题背景在开发一个深度学习项目时，我需要安装accelerate包来优化模型的训练过程。然而，当我运行以下命令时：bash复制pipins
从RNN循环神经网络到Transformer注意力机制：解析神经网络架构的华丽蜕变熊猫钓鱼>_> 神经网络 rnn transformer
1.引言在自然语言处理和序列建模领域，神经网络架构经历了显著的演变。从早期的循环神经网络（RNN）到现代的Transformer架构，这一演变代表了深度学习方法在处理序列数据方面的重大进步。本文将深入比较这两种架构，分析它们的工作原理、优缺点，并通过实验结果展示它们在实际应用中的性能差异。2.循环神经网络（RNN）2.1基本原理循环神经网络是专门为处理序列数据而设计的神经网络架构。RNN的核心思想
如何使用Python实现交通工具识别
如何使用Python实现交通工具识别文章目录技术架构功能流程识别逻辑用户界面增强特性依赖项主要类别内容展示该系统是一个基于深度学习的交通工具识别工具，具备以下核心功能与特点：技术架构使用预训练的ResNet50卷积神经网络模型（来自ImageNet数据集）集成图像增强预处理技术（随机裁剪、旋转、翻转等）采用多数投票机制提升预测稳定性基于置信度评分的结果筛选策略功能流程用户通过GUI界面选择待识别图
Python OpenCV教程从入门到精通的全面指南【文末送书】一键难忘 python opencv 开发语言
文章目录PythonOpenCV从入门到精通1.安装OpenCV2.基本操作2.1读取和显示图像2.2图像基本操作3.图像处理3.1图像转换3.2图像阈值处理3.3图像平滑4.边缘检测和轮廓4.1Canny边缘检测4.2轮廓检测5.高级操作5.1特征检测5.2目标跟踪5.3深度学习与OpenCVPythonOpenCV从入门到精通【文末送书】PythonOpenCV从入门到精通OpenCV(Ope
第八周 tensorflow实现猫狗识别降花绘 365天深度学习 tensorflow系列 tensorflow 深度学习人工智能
本文为365天深度学习训练营内部限免文章（版权归K同学啊所有）**参考文章地址：[TensorFlow入门实战｜365天深度学习训练营-第8周：猫狗识别（训练营内部成员可读）]**作者：K同学啊文章目录一、本周学习内容:1、自己搭建VGG16网络2、了解model.train_on_batch（）3、了解tqdm，并使用tqdm实现可视化进度条二、前言三、电脑环境四、前期准备1、导入相关依赖项2、
深度学习实战-使用TensorFlow与Keras构建智能模型程序员Gloria Python超入门 TensorFlow python
深度学习实战-使用TensorFlow与Keras构建智能模型深度学习已经成为现代人工智能的重要组成部分，而Python则是实现深度学习的主要编程语言之一。本文将探讨如何使用TensorFlow和Keras构建深度学习模型，包括必要的代码实例和详细的解析。1.深度学习简介深度学习是机器学习的一个分支，使用多层神经网络来学习和表示数据中的复杂模式。其广泛应用于图像识别、自然语言处理、推荐系统等领域。
AI在垂直领域的深度应用：医疗、金融与自动驾驶的革新之路
AI在垂直领域的深度应用：医疗、金融与自动驾驶的革新之路一、医疗领域：AI驱动的精准诊疗与效率提升1.医学影像诊断AI算法通过深度学习技术，已实现对X光、CT、MRI等影像的快速分析，辅助医生检测癌症、骨折等疾病。例如，GoogleDeepMind的AI系统在乳腺癌筛查中，误检率比人类专家低9.4%；中国的推想医疗AI系统可在20秒内完成肺部CT扫描分析，为急诊救治争取黄金时间。2.药物研发传统药
专题：2025云计算与AI技术研究趋势报告|附200+份报告PDF、原数据表汇总下载
原文链接：https://tecdat.cn/?p=42935关键词：2025,云计算，AI技术，市场趋势，深度学习，公有云，研究报告云计算和AI技术正以肉眼可见的速度重塑商业世界。过去十年，全球云服务收入激增8倍，中国云计算市场规模突破6000亿元，而深度学习算法的应用量更是暴涨400倍。这些数字背后，是企业从“自建机房”到“云原生开发”的转型，是AI从“实验室”走向“产业级应用”的跨越。本报告
【深度学习解惑】在实践中如何发现和修正RNN训练过程中的数值不稳定？云博士的AI课堂大模型技术开发与实践哈佛博后带你玩转机器学习深度学习深度学习 rnn 人工智能 tensorflow pytorch 神经网络机器学习
在实践中发现和修正RNN训练过程中的数值不稳定目录引言与背景介绍原理解释代码说明与实现应用场景与案例分析实验设计与结果分析性能分析与技术对比常见问题与解决方案创新性与差异性说明局限性与挑战未来建议和进一步研究扩展阅读与资源推荐图示与交互性内容语言风格与通俗化表达互动交流1.引言与背景介绍循环神经网络(RNN)在处理序列数据时表现出色，但训练过程中常面临梯度消失和梯度爆炸问题，导致数值不稳定。当网络
【深度学习实战】当前三个最佳图像分类模型的代码详解云博士的AI课堂大模型技术开发与实践哈佛博后带你玩转机器学习深度学习深度学习人工智能分类模型机器学习 Transformer EfficientNet ConvNeXt
下面给出三个在当前图像分类任务中精度表现突出的模型示例，分别基于SwinTransformer、EfficientNet与ConvNeXt。每个模型均包含：训练代码（使用PyTorch）从预训练权重开始微调（也可注释掉预训练选项，从头训练）数据集目录结构：└──dataset_root├──buy#第一类图像└──nobuy#第二类图像随机拆分：80%训练，20%验证每个Epoch输出一次loss
第35周—————糖尿病预测模型优化探索
目录目录前言1.检查GPU2.查看数据编辑3.划分数据集4.创建模型与编译训练5.编译及训练模型6.结果可视化7.总结前言本文为365天深度学习训练营中的学习记录博客原作者：K同学啊1.检查GPUimporttorch.nnasnnimporttorch.nn.functionalasFimporttorchvision,torch#设置硬件设备，如果有GPU则使用，没有则使用cpudevice=
强化学习之 DQN、Double DQN、PPO JNU freshman 强化学习强化学习
文章目录通俗理解DQNDoubleDQNPPO结合公式理解通俗理解DQN一个简单的比喻和分步解释来理解DQN（DeepQ-Network，深度Q网络），就像教小朋友学打游戏一样：先理解基础概念：Q学习（Q-Learning）想象你在教一只小狗玩电子游戏（比如打砖块）。小狗每做一个动作（比如“向左移动”或“发射球”），游戏会给出一个奖励（比如得分增加）或惩罚（比如球掉了）。小狗的目标是通过不断尝试，
深度学习预备知识 AmazingMQ 深度学习人工智能
1.Tensor张量定义：张量（tensor）表示一个由数值组成的数组，这个数组可能有多个维度（轴）。具有一个轴的张量对应数学上的向量，具有两个轴的张量对应数学上的矩阵，具有两个以上轴的张量目前没有特定的数学名称。importtorch#arange创建一个行向量x，这个行向量包含以0开始的前12个整数。x=torch.arange(12)print("x=",x)#x=tensor([0,1,2