Blog post from Christopher Olah on visualizing the representations of neural networks
http://colah.github.io/posts/2015-09-Visual-Information/
Basics of machine learning
● DL book chapter on linear algebra: http://www.deeplearningbook.org/contents/linear_algebra.html (also at http://www.iro.umontreal.ca/~bengioy/DLbook/linear_algebra.html)
● DL book chapter on probability: http://www.iro.umontreal.ca/~bengioy/dlbook/prob.html
● DL book chapter on numerical computation: http://www.iro.umontreal.ca/~bengioy/dlbook/numerical.html
● DL book chapter on machine learning: http://www.iro.umontreal.ca/~bengioy/DLbook/ml.html
● Intro to deep learning: http://www.iro.umontreal.ca/~bengioy/DLbook/intro.html
● Feedforward multi-layer nets: http://www.iro.umontreal.ca/~bengioy/DLbook/mlp.html
● Learning deep architectures for AI
● Practical recommendations for gradient-based training of deep architectures
● Quick’n’dirty introduction to deep learning: Advances in Deep Learning
● A fast learning algorithm for deep belief nets
● Greedy Layer-Wise Training of Deep Networks
● Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion
● Contractive auto-encoders: Explicit invariance during feature extraction
● Why does unsupervised pre-training help deep learning?
● An Analysis of Single Layer Networks in Unsupervised Feature Learning
● The importance of Encoding Versus Training With Sparse Coding and Vector Quantization
● Representation Learning: A Review and New Perspectives
● Deep Learning of Representations: Looking Forward
● Measuring Invariances in Deep Networks
● Neural networks course at USherbrooke [youtube]
● “Improving Neural Nets with Dropout” by Nitish Srivastava
● Batch Normalization
● “Fast Drop Out”
● “Deep Sparse Rectifier Neural Networks”
● “What is the best multi-stage architecture for object recognition?”
● “Maxout Networks”
● Iain Murray’s MLSS slides
● Radford Neal’s Review Paper (old but still very comprehensive)
● Better Mixing via Deep Representations
● Bayesian Learning via Stochastic Gradient Langevin Dynamics
● Unsupervised learning of distributions of binary vectors using 2-layer networks
● A practical guide to training restricted Boltzmann machines
● Training restricted Boltzmann machines using approximations to the likelihood gradient
● Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines
● How to Center Binary Restricted Boltzmann Machines
● Enhanced Gradient for Training Restricted Boltzmann Machines
● Using fast weights to improve persistent contrastive divergence
● Training Products of Experts by Minimizing Contrastive Divergence
● Deep Boltzmann Machines (Salakhutdinov & Hinton)
● Multimodal Learning with Deep Boltzmann Machines
● Multi-Prediction Deep Boltzmann Machines
● A Two-stage Pretraining Algorithm for Deep Boltzmann Machines
● The Manifold Tangent Classifier
● DL book chapter on unsupervised learning: http://www.iro.umontreal.ca/~bengioy/dlbook/unsupervised.html
● DL book chapter on manifolds: http://www.iro.umontreal.ca/~bengioy/dlbook/manifolds.html
● Representation Learning: A Review and New Perspectives, in particular section 7.
● Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
● Learning Stochastic Feedforward Neural Networks
● Generalized Denoising Auto-Encoders as Generative Models
● Deep Generative Stochastic Networks Trainable by Backprop
● Slow, Decorrelated Features for Pretraining Complex Cell-like Networks
● What Regularized Auto-Encoders Learn from the Data Generating Distribution
● Why the logistic function? (a brief reminder appears at the end of this list)
● DL book chapter on recurrent nets
● Learning long-term dependencies with gradient descent is difficult
● Advances in Optimizing Recurrent Networks
● Learning recurrent neural networks with Hessian-free optimization
● On the importance of momentum and initialization in deep learning
● Long short-term memory (Hochreiter & Schmidhuber)
● Generating Sequences With Recurrent Neural Networks
● Long Short-Term Memory in Echo State Networks: Details of a Simulation Study
● The "echo state" approach to analysing and training recurrent neural networks
● Backpropagation-Decorrelation: online recurrent learning with O(N) complexity
● New results on recurrent network training: Unifying the algorithms and accelerating convergence
● Audio Chord Recognition with Recurrent Neural Networks
● Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription
● Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).
● Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural Turing Machines." arXiv preprint arXiv:1410.5401 (2014).
● Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer networks." arXiv preprint arXiv:1506.03134 (2015).
● Kurach, Karol, Marcin Andrychowicz, and Ilya Sutskever. "Neural Random-Access Machines." arXiv preprint arXiv:1511.06392 (2015).
● Cho, Kyunghyun, Aaron Courville, and Yoshua Bengio. "Describing Multimedia Content using Attention-based Encoder-Decoder Networks." arXiv preprint arXiv:1507.01053 (2015).
● Salakhutdinov, Ruslan, and Geoffrey Hinton. "Semantic hashing." International Journal of Approximate Reasoning 50.7 (2009): 969-978.
● Hinton, Geoffrey E. "Distributed representations." (1984)
● DL book chapter on convolutional nets: http://www.iro.umontreal.ca/~bengioy/DLbook/convnets.html
● Generalization and Network Design Strategies (LeCun)
● ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, NIPS 2012.
● On Random Weights and Unsupervised Feature Learning
● Curriculum Learning
● Evolving Culture vs Local Minima
● Knowledge Matters: Importance of Prior Information for Optimization
● Efficient Backprop
● Practical recommendations for gradient-based training of deep architectures
● Batch Normalization
● Natural Gradient Works Efficiently (Amari 1998)
● Hessian Free
● Natural Gradient (TONGA)
● Revisiting Natural Gradient
● The first journal paper on neural language models (there was a NIPS’2000 paper before): A Neural Probabilistic Language Model
● Natural Language Processing (Almost) from Scratch
● DeViSE: A Deep Visual-Semantic Embedding Model
● Distributed Representations of Words and Phrases and their Compositionality
● Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
● Fields of Experts
● What makes a good model of natural images?
● Phone Recognition with the mean-covariance restricted Boltzmann machine
● Unsupervised Models of Images by Spike-and-Slab RBMs
● Imagenet classification with deep convolutional neural networks
● Learning to relate images
● Large Scale Distributed Deep Networks
● Random search for hyper-parameter optimization
● Practical Bayesian Optimization of Machine Learning Algorithms
● Playing Atari with Deep Reinforcement Learning
● True Online TD(λ)
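A brief reminder tied to the “Why the logistic function?” item above (standard identities only, included for orientation rather than as a substitute for the reference): the logistic (sigmoid) function is the inverse of the log-odds (logit) transform, which is why it is the natural way to turn a real-valued score into a probability, and its derivative has a particularly convenient form for backpropagation.

```latex
% Logistic (sigmoid) function, its logit inverse, and its derivative
\sigma(a) = \frac{1}{1 + e^{-a}}, \qquad
\sigma^{-1}(p) = \log\frac{p}{1 - p} \quad \text{(log-odds)}, \qquad
\sigma'(a) = \sigma(a)\,\bigl(1 - \sigma(a)\bigr)
```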
Graphical Models Background
● An Introduction to Graphical Models (Mike Jordan, brief course notes)
● A View of the EM Algorithm that Justifies Incremental, Sparse and Other Variants (Neal & Hinton, an important paper for the modern understanding of Expectation-Maximization; the free-energy identity it builds on is sketched after this list)
● A Unifying Review of Linear Gaussian Models (Roweis & Ghahramani, ties together PCA, factor analysis, hidden Markov models, Gaussian mixtures, k-means, linear dynamical systems)
● An Introduction to Variational Methods for Graphical Models (Jordan et al., mean-field, etc.)
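A one-identity reminder tied to the Neal & Hinton EM paper above (standard notation, stated here only for orientation): both EM steps can be read as coordinate ascent on the same free energy F, where q is any distribution over the latent variables z, which is exactly what justifies the incremental and sparse variants.

```latex
% Free energy (variational lower bound) maximized by both EM steps
F(q, \theta) = \mathbb{E}_{q(z)}\!\bigl[\log p(x, z \mid \theta)\bigr] + H(q)
             = \log p(x \mid \theta) - \mathrm{KL}\!\bigl(q(z) \,\|\, p(z \mid x, \theta)\bigr)
% E-step: q(z) \leftarrow p(z \mid x, \theta)                                    (maximizes F over q)
% M-step: \theta \leftarrow \arg\max_\theta \mathbb{E}_{q(z)}[\log p(x, z \mid \theta)]   (maximizes F over \theta)
```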
● Writing a great research paper (video of the presentation)
● Python, Theano, Pylearn2, Linux (bash) (at least the first 5 sections), git (first 5 sections), github/contributing to it (Theano doc), vim tutorial or emacs tutorial
● Bash commands
● List of Built-in Python Functions (a few of the most common ones are illustrated in the sketch after this list)
● vim commands
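For the “List of Built-in Python Functions” item above, a small illustrative taste (the values are made up; this only shows a handful of built-ins that come up constantly in experiment scripts):

```python
# A few Python built-ins worth knowing before reading the full list.
losses = [0.92, 0.51, 0.48, 0.55]  # hypothetical per-epoch losses

for epoch, loss in enumerate(losses, start=1):  # enumerate: pair items with indices
    print(epoch, loss)

best = min(losses)                                            # min/max over any iterable
order = sorted(range(len(losses)), key=lambda i: losses[i])   # "argsort" via sorted + key
pairs = list(zip(order, sorted(losses)))                      # zip: walk two sequences together
print(best, pairs, any(l < 0.5 for l in losses))              # any/all with a generator expression
```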
Other software stuff to know about:
● screen/tmux
● ssh
● ipython & ipython notebook (now Jupyter)
● matplotlib (see the short plotting sketch at the end of this list)
● Caffe - caffe.berkeleyvision.org
● DIGITS - https://developer.nvidia.com/digits
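Since matplotlib is the tool most people reach for when inspecting training runs, here is a minimal, hypothetical sketch (the curves and the output file name are invented for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

# Fake learning curves; replace with values logged from your own experiments.
epochs = np.arange(1, 51)
train_loss = np.exp(-epochs / 10.0)
valid_loss = np.exp(-epochs / 12.0) + 0.1

plt.plot(epochs, train_loss, label="train")
plt.plot(epochs, valid_loss, label="valid")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.savefig("learning_curve.png")  # or plt.show() interactively / inline in a Jupyter notebook
```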