Reading lists for new MILA students


 

Research in General

 

How to write a great research paper

Basic concepts on information theory in visual terms

 

Blog post from Christopher Olah on visualizing the representations of neural networks

http://colah.github.io/posts/2015-09-Visual-Information/

 

 

Basics of machine learning

 

●  DL book chapter on linear algebra: http://www.deeplearningbook.org/contents/linear_algebra.html (also at http://www.iro.umontreal.ca/~bengioy/DLbook/linear_algebra.html)

●  DL book chapter on probability: http://www.iro.umontreal.ca/~bengioy/dlbook/prob.html

●  DL book chapter on numerical computation: http://www.iro.umontreal.ca/~bengioy/dlbook/numerical.html

●  DL book chapter on machine learning: http://www.iro.umontreal.ca/~bengioy/DLbook/ml.html

 

Basics of deep learning

●  Intro to deep learning: http://www.iro.umontreal.ca/~bengioy/DLbook/intro.html

●  Feedforward multi-layer nets: http://www.iro.umontreal.ca/~bengioy/DLbook/mlp.html


●  Learning deep architectures for AI

●  Practical recommendations for gradient-based training of deep architectures

●  Quick’n’dirty introduction to deep learning: Advances in Deep Learning

●  A fast learning algorithm for deep belief nets

●  Greedy Layer-Wise Training of Deep Networks

●  Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion

●  Contractive auto-encoders: Explicit invariance during feature extraction

●  Why does unsupervised pre-training help deep learning?

●  An Analysis of Single Layer Networks in Unsupervised Feature Learning

●  The importance of Encoding Versus Training With Sparse Coding and Vector Quantization

●  Representation Learning: A Review and New Perspectives

●  Deep Learning of Representations: Looking Forward

●  Measuring Invariances in Deep Networks

●  Neural networks course at USherbrooke [YouTube]

Feedforward nets

●  DL book chapter on feedforward multi-layer nets: http://www.iro.umontreal.ca/~bengioy/DLbook/mlp.html

●  “Improving Neural Nets with Dropout” by Nitish Srivastava (see the dropout sketch after this list)

●  Batch Normalization

●  “Fast Dropout”

●  “Deep Sparse Rectifier Neural Networks”

●  “What is the best multi-stage architecture for object recognition?”

●  “Maxout Networks”
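As a companion to the dropout item above, here is a minimal NumPy sketch of inverted dropout, the common variant that rescales at training time so test-time code needs no change; the shapes and drop rate are illustrative assumptions, not values from the paper:

    import numpy as np

    rng = np.random.RandomState(0)

    def dropout(h, p=0.5, train=True):
        # Zero each unit with probability p at train time, scaling the
        # survivors by 1/(1 - p) so no rescaling is needed at test time.
        if not train or p == 0.0:
            return h
        mask = rng.binomial(n=1, p=1.0 - p, size=h.shape)
        return h * mask / (1.0 - p)

    h = rng.randn(4, 3)              # a hypothetical batch of hidden activations
    print(dropout(h, p=0.5))

(The original dropout paper instead scales the weights by 1 − p at test time; the two conventions are equivalent in expectation.)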

MCMC

●  Iain Murray’s MLSS slides

●  Radford Neal’s Review Paper (old but still very comprehensive)

●  Better Mixing via Deep Representations

●  Bayesian Learning via Stochastic Gradient Langevin Dynamics (sketched below)
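To make the last item concrete, here is a minimal sketch of a Langevin-style update on a toy standard-normal target. In actual SGLD the gradient is a minibatch estimate of the log-posterior gradient and the step size decays over time; this toy uses the exact gradient and a fixed step, so it is really unadjusted Langevin dynamics:

    import numpy as np

    rng = np.random.RandomState(0)

    def grad_log_p(theta):
        # Score of a standard-normal target: d/dtheta log N(theta; 0, 1).
        return -theta

    eps = 0.01                       # step size, held fixed for simplicity
    theta, samples = 5.0, []
    for t in range(20000):
        # Half a gradient step plus Gaussian noise with variance eps.
        theta += 0.5 * eps * grad_log_p(theta) + np.sqrt(eps) * rng.randn()
        samples.append(theta)

    print(np.mean(samples[5000:]), np.var(samples[5000:]))  # roughly 0 and 1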

Restricted Boltzmann Machines

●  Unsupervised learning of distributions of binary vectors using 2-layer networks

●  A practical guide to training restricted Boltzmann machines

●  Training restricted Boltzmann machines using approximations to the likelihood gradient

●  Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines

●  How to Center Binary Restricted Boltzmann Machines

●  Enhanced Gradient for Training Restricted Boltzmann Machines

●  Using fast weights to improve persistent contrastive divergence

●  Training Products of Experts by Minimizing Contrastive Divergence (see the CD-1 sketch below)
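Since several items above revolve around contrastive divergence, a minimal NumPy sketch of one CD-1 update for a binary RBM may help fix ideas; layer sizes and the learning rate are illustrative assumptions:

    import numpy as np

    rng = np.random.RandomState(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(W, b, c, v0, lr=0.1):
        # One CD-1 step for a binary RBM with weights W, visible bias b,
        # hidden bias c, and a minibatch of visible vectors v0.
        h0_prob = sigmoid(v0 @ W + c)                    # P(h = 1 | v0)
        h0 = (rng.rand(*h0_prob.shape) < h0_prob) * 1.0  # sample hidden units
        v1_prob = sigmoid(h0 @ W.T + b)                  # P(v = 1 | h0)
        v1 = (rng.rand(*v1_prob.shape) < v1_prob) * 1.0  # one Gibbs step
        h1_prob = sigmoid(v1 @ W + c)                    # P(h = 1 | v1)
        n = v0.shape[0]
        # Positive phase minus negative phase, averaged over the minibatch.
        W += lr * (v0.T @ h0_prob - v1.T @ h1_prob) / n
        b += lr * (v0 - v1).mean(axis=0)
        c += lr * (h0_prob - h1_prob).mean(axis=0)
        return W, b, c

    # Hypothetical sizes: 6 visible units, 4 hidden units, minibatch of 8.
    W, b, c = 0.01 * rng.randn(6, 4), np.zeros(6), np.zeros(4)
    v0 = (rng.rand(8, 6) < 0.5) * 1.0
    W, b, c = cd1_update(W, b, c, v0)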

   

Boltzmann Machines

●  Deep Boltzmann Machines (Salakhutdinov & Hinton)

●  Multimodal Learning with Deep Boltzmann Machines

●  Multi-Prediction Deep Boltzmann Machines

●  A Two-stage Pretraining Algorithm for Deep Boltzmann Machines

 

Regularized Auto-Encoders

●  The Manifold Tangent Classifier

●  DL book chapter on unsupervised learning: http://www.iro.umontreal.ca/~bengioy/dlbook/unsupervised.html

●  DL book chapter on manifolds: http://www.iro.umontreal.ca/~bengioy/dlbook/manifolds.html

●  Representation Learning: A Review and New Perspectives, in particular section 7.

 

Regularization

 

Stochastic Nets & GSNs

●  Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

●  Learning Stochastic Feedforward Neural Networks

●  Generalized Denoising Auto-Encoders as Generative Models

●  Deep Generative Stochastic Networks Trainable by Backprop

 

Others

●  Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

●  What Regularized Auto-Encoders Learn from the Data Generating Distribution

●  Generalized Denoising Auto-Encoders as Generative Models

●  Why the logistic function? (see the short check below)
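For the last question above, one reason often cited is that the logistic function has the closed-form derivative σ(x)(1 − σ(x)), which is cheap to compute during backprop; a quick numerical check:

    import numpy as np

    def sigmoid(x):
        # Logistic function: sigma(x) = 1 / (1 + exp(-x)).
        return 1.0 / (1.0 + np.exp(-x))

    x = np.linspace(-4.0, 4.0, 9)
    s = sigmoid(x)
    # Compare the closed form sigma(x) * (1 - sigma(x)) against finite differences.
    eps = 1e-6
    numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps)
    print(np.allclose(numeric, s * (1.0 - s), atol=1e-6))  # True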

 

Recurrent Nets

●  DL book chapter on recurrent nets

●  Learning long-term dependencies with gradient descent is difficult

●  Advances in Optimizing Recurrent Networks

●  Learning recurrent neural networks with Hessian-free optimization

●  On the importance of initialization and momentum in deep learning

●  Long short-term memory (Hochreiter & Schmidhuber)

●  Generating Sequences With Recurrent Neural Networks

●  Long Short-Term Memory in Echo State Networks: Details of a Simulation Study

●  The "echo state" approach to analysing andtraining recurrent neural networks

●  Backpropagation-Decorrelation: online recurrent learning with O(N) complexity

●  New results on recurrent network training: Unifying the algorithms and accelerating convergence

●  Audio Chord Recognition with Recurrent Neural Networks

●  Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription

 

Memory networks

●  Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).

●  Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural Turing Machines." arXiv preprint arXiv:1410.5401 (2014).

●  Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer networks." arXiv preprint arXiv:1506.03134 (2015).

●  Kurach, Karol, Marcin Andrychowicz, and Ilya Sutskever. "Neural Random-Access Machines." arXiv preprint arXiv:1511.06392 (2015).

●  Cho, Kyunghyun, Aaron Courville, and Yoshua Bengio. "Describing Multimedia Content using Attention-based Encoder-Decoder Networks." arXiv preprint arXiv:1507.01053 (2015).

●  Salakhutdinov, Ruslan, and Geoffrey Hinton. "Semantic hashing." International Journal of Approximate Reasoning 50.7 (2009): 969-978.

●  Hinton, Geoffrey E. "Distributed representations." (1984)

Convolutional Nets

●  DL book chapter on convolutional nets: http://www.iro.umontreal.ca/~bengioy/DLbook/convnets.html

●  Generalization and Network Design Strategies (LeCun)

●  ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, NIPS 2012.

●  On Random Weights and Unsupervised Feature Learning

Optimization issues with DL

●  Curriculum Learning 

●  Evolving Culture vs Local Minima

●  Knowledge Matters: Importance of Prior Information for Optimization

●  Efficient Backprop

●  Practical recommendations for gradient-based training of deep architectures

●  Batch Normalization

●  Natural Gradient Works Efficiently (Amari 1998)

●  Hessian Free

●  Natural Gradient (TONGA)

●  Revisiting Natural Gradient

 

NLP + DL

●  The first journal paper on neural language models (there was a NIPS’2000 paper before): A Neural Probabilistic Language Model

●  Natural Language Processing (Almost) from Scratch

●  DeViSE: A Deep Visual-Semantic Embedding Model

●  Distributed Representations of Words and Phrases and their Compositionality

●  Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection

 

CV + RBM

●  Fields of Experts

●  What makes a good model of natural images?

●  Phone Recognition with the mean-covariance restricted Boltzmann machine

●  Unsupervised Models of Images by Spike-and-Slab RBMs

 

CV + DL

●  ImageNet classification with deep convolutional neural networks

●  Learning to relate images

 

Scaling Up

●  Large Scale Distributed Deep Networks

●  Random search for hyper-parameter optimization

●  Practical Bayesian Optimization of Machine Learning Algorithms

 

DL + Reinforcement learning

●  Playing Atari with Deep Reinforcement Learning

●  True Online TD(λ)

 

 

Graphical Models Background

●  An Introduction to Graphical Models (Mike Jordan, brief course notes)

●  A View of the EM Algorithm that Justifies Incremental, Sparse and Other Variants (Neal & Hinton; an important paper for the modern understanding of Expectation-Maximization)

●  A Unifying Review of Linear Gaussian Models (Roweis & Ghahramani; ties together PCA, factor analysis, hidden Markov models, Gaussian mixtures, k-means, and linear dynamical systems)

●  An Introduction to Variational Methods for Graphical Models (Jordan et al.; mean-field, etc.)

 

Writing

●  Writing a great research paper (video of the presentation)

 

Software documentation

●  Python, Theano, Pylearn2, Linux (bash) (at least the first 5 sections), git (first 5 sections), GitHub / contributing to it (Theano doc), vim tutorial or emacs tutorial
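As a taste of what the Theano documentation covers, here is a minimal sketch of its central idea, compiling symbolic expressions with automatic differentiation; the variable names are arbitrary:

    import theano
    import theano.tensor as T

    # Build a symbolic expression and derive its gradient symbolically.
    x = T.dvector('x')
    y = T.sum(x ** 2)
    g = T.grad(y, x)                 # gradient of y with respect to x

    f = theano.function([x], [y, g])
    print(f([1.0, 2.0, 3.0]))        # y = 14.0, g = [2.0, 4.0, 6.0]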

 

Software lists of built-in commands/functions

●  Bash commands

●  List of Built-in Python Functions

●  vim commands

 

Other Software stuff to know about:

●  screen/tmux

●  ssh

●  ipython & ipython notebook (now Jupyter)

●  matplotlib (see the plotting sketch after this list)

●  Caffe - caffe.berkeleyvision.org

●  DIGITS - https://developer.nvidia.com/digits
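For the matplotlib item above, a minimal plotting sketch; the function plotted and the output filename are arbitrary:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0.0, 2.0 * np.pi, 200)
    plt.plot(x, np.sin(x), label='sin(x)')
    plt.xlabel('x')
    plt.legend()
    plt.savefig('sine.png')          # or plt.show() in an interactive session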

 

 

 
