This article is an original work by Zhihu column author Alex-zhai, reposted on CSDN with permission.
Editor: Wang Yi
Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.
Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.
Dueling Network Architectures for Deep Reinforcement Learning, Z. Wang et al., arXiv, 2015.
Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.
Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.
Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI, 2016.
Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.
Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.
How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, V. François-Lavet et al., NIPS Workshop, 2015.
Learning functions across many orders of magnitudes, H. van Hasselt, A. Guez, M. Hessel, D. Silver
Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.
State of the Art Control of Atari Games using shallow reinforcement learning
Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening (updated 11.13)
Deep Reinforcement Learning with Averaged Target DQN (updated 11.14)
Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.
Deep Attention Recurrent Q-Network
Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.
Progressive Neural Networks
Language Understanding for Text-based Games Using Deep Reinforcement Learning
Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Recurrent Reinforcement Learning: A Hybrid Approach
End-to-End Training of Deep Visuomotor Policies
Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search
Trust Region Policy Optimization
Deterministic Policy Gradient Algorithms
Continuous control with deep reinforcement learning
High-Dimensional Continuous Control Using Generalized Advantage Estimation
Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies
Deep Reinforcement Learning in Parameterized Action Space
Memory-based control with recurrent neural networks
Terrain-adaptive locomotion skills using deep reinforcement learning
Sample Efficient Actor-Critic with Experience Replay (updated 11.13)
Interactive Control of Diverse Complex Characters with Neural Networks
Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic (updated 11.13)
PGQ: Combining Policy Gradient and Q-learning (updated 11.13)
Gradient Estimation Using Stochastic Computation Graphs
Continuous Deep Q-Learning with Model-based Acceleration
Benchmarking Deep Reinforcement Learning for Continuous Control
Learning Continuous Control Policies by Stochastic Value Gradients
Deep Successor Reinforcement Learning
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks
Stochastic Neural Networks for Hierarchical Reinforcement Learning, C. Florensa, Y. Duan, P. Abbeel (updated 11.14)
ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources
A Deep Hierarchical Approach to Lifelong Learning in Minecraft
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
Policy Distillation
Progressive Neural Networks
Universal Value Function Approximators
Multi-task learning with deep model based reinforcement learning (updated 11.14)
Modular Multitask Reinforcement Learning with Policy Sketches (updated 11.14)
Control of Memory, Active Perception, and Action in Minecraft
Model-Free Episodic Control
Action-Conditional Video Prediction using Deep Networks in Atari Games
Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks
Deep Exploration via Bootstrapped DQN
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
Unifying Count-Based Exploration and Intrinsic Motivation
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning (updated 11.14)
Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning (updated 11.14)
Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks
Multiagent Cooperation and Competition with Deep Reinforcement Learning
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
Maximum Entropy Deep Inverse Reinforcement Learning
Generalizing Skills with Semi-Supervised Reinforcement Learning (updated 11.14)
Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning
Better Computer Go Player with Neural Network and Long-term Prediction
Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.
Asynchronous Methods for Deep Reinforcement Learning
Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU (updated 11.14)
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.
Strategic Attentive Writer for Learning Macro-Actions
Unifying Count-Based Exploration and Intrinsic Motivation
Policy Distillation
Universal Value Function Approximators
Learning values across many orders of magnitude
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Fictitious Self-Play in Extensive-Form Games
Smooth UCT search in computer poker
ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning
Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning
Playing FPS Games with Deep Reinforcement Learning
Learning to Act by Predicting the Future (updated 11.13)
Deep Reinforcement Learning From Raw Pixels in Doom (updated 11.14)
Learning Visual Predictive Models of Physics for Playing Billiards
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models, J. Schmidhuber, arXiv, 2015.
Learning Continuous Control Policies by Stochastic Value Gradients
Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models
Action-Conditional Video Prediction using Deep Networks in Atari Games
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
Trust Region Policy Optimization
Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control
Path Integral Guided Policy Search
Memory-based control with recurrent neural networks
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection
Learning Deep Neural Network Policies with Continuous Memory States
High-Dimensional Continuous Control Using Generalized Advantage Estimation
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
End-to-End Training of Deep Visuomotor Policies
DeepMPC: Learning Deep Latent Features for Model Predictive Control
Deep Visual Foresight for Planning Robot Motion
Deep Reinforcement Learning for Robotic Manipulation
Continuous Deep Q-Learning with Model-based Acceleration
Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
Asynchronous Methods for Deep Reinforcement Learning
Learning Continuous Control Policies by Stochastic Value Gradients
Deep Reinforcement Learning for Dialogue Generation
SimpleDS: A Simple Deep Reinforcement Learning Dialogue System
Strategic Dialogue Management via Deep Reinforcement Learning
Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning
Designing Neural Network Architectures using Reinforcement Learning (updated 11.14)
Tuning Recurrent Neural Networks with Reinforcement Learning (updated 11.14)
Neural Architecture Search with Reinforcement Learning (updated 11.14)