
A Rigorous & Readable Review on RNNs

02 JUNE 2015
on Machine Learning

This post introduces a new Critical Review on Recurrent Neural Networks for Sequence Learning.
Twelve nights back, while up late preparing pretty pictures for a review on Recurrent Neural Networks for Sequence Learning, I figured I should share my Google art with the world. After all, RNNs are poorly understood by most outside the subfield, even within the machine learning community.


Since hitting post around 2AM, we've had great feedback and the article quickly became one of this blog's most popular pieces. However, when the article hit Reddit, the response was far from universal praise. Some readers were upset that the portion devoted to the LSTM itself was too short. Others noted in their comments that it had the misfortune of appearing on the same day as Andrej Karpathy's terrific, more thorough tutorial.
Some courteous Redditors even graciously noted that there was a typo in the title - the word Demistifying should have been Demystifying - that somehow made it through my bleary-eyed typing. It seems these Redditors have never lived in the Bay Area or San Diego, where mist, and not myst, is the true obstacle to perceiving the world clearly.

Jokes aside, since the community appears to demand a more thorough review on Recurrent Nets, I've posted my full 33-page review on my personal site and on the arXiv as well.
The review covers high-level arguments, exact update equations, backpropagation, LSTMs, BRNNs, and the history dating from Hopfield nets, Elman nets, and Jordan nets. It explores evaluation methodology on sequence learning tasks, and the slew of recent papers on novel applications from leading researchers in the field, including the work of Jürgen Schmidhuber, Yoshua Bengio, Geoff Hinton, Ilya Sutskever, Alex Graves, Andrej Karpathy, Trevor Darrell, Quoc Le, Oriol Vinyals, Wojciech Zaremba, and many more. It also offers a high-level introduction to the growing body of work characterizing the associated optimization problems.
Why I've written this review
I started off trying to understand the deep literature on LSTMs. This work can be difficult to access at first. Different papers present slightly different calculations, and notation varies tremendously across papers. This variability is compounded by the fact that so many papers seem to require familiarity with the rest of the literature. The best reviews I found were Alex Graves' excellent book on sequence labeling and Felix Gers' thorough thesis. However, the first is very specific in scope and the latter is over a decade old.
My goal in composing this document is to provide a critical but gentle overview of the following important elements: motivation, historical perspective, modern models, optimization problem, training algorithms, and important applications.
For the moment, this is an evolving document and the highest priority is clarity. To that end, any feedback about omitted details, glaring typos, or papers that anyone might find valuable is welcome.

Looking Back at "Finding Structure in Time"

22 MAY 2015
on Machine Learning

Keeping up with the break-neck pace of research in computer science can be daunting. Even in my comfortable position as a graduate researcher, with no students to advise and no current teaching responsibilities, there are more interesting papers published each month than I could reasonably read. For an engineer with full-time responsibilities in industry it might seem impossible.
But occasionally, we ought to pull back from the bleeding edge. As I learned recently when studying operating systems for the first time, surprisingly many of our shiny new ideas were developed 20-30 years ago but weren't yet practicable. Virtualization and resource sharing over large networks were obvious next steps to the creators of Sprite and Plan 9 from Bell Labs. As I learned while researching a review on recurrent neural networks, deep learning has its share of prophetic classics as well.
In this post I'll follow up on my last post, Demistifying LSTM Neural Networks, by reflecting on a classic paper, Finding Structure in Time by Jeffrey Elman. Elman is a Distinguished Professor of Cognitive Science at the University of California, San Diego.


The paper is exceptionally clearly written and anticipates several lines of current research, such as Sutskever et al.'s recurrent nets, which generate strings of text one letter at a time. Before launching into experiments, Elman first lays out clear qualitative arguments explaining why time ought to be explicitly modeled, rather than handled by a sliding window of fixed-size context.
He also motivates our interest in time both from a practical perspective and from a cognitive modeling perspective. Interestingly, he never really suggests that these should be conflicting aims. The models described aim to replicate something resembling human behavior, so it's a given that they should be connectionist, i.e., generally inspired by biological brains. However, the emphasis on biological plausibility doesn't go so far as to preclude backpropagation as a learning algorithm.
Elman's models contain hidden nodes with self-connected recurrent edges, in a way foreshadowing Hochreiter and Schmidhuber's subsequent work on LSTMs. He experiments with a temporal version of the XOR problem (the standard XOR problem famously justifies neural networks and kernel methods, as it is unsolvable with a simple linear model). He then considers text, touching upon distributed representations of words and introducing a model for predicting strings of text one character at a time. He even goes so far as to suggest that some latent notion of what words are could be learned by the network. This is borne out in Sutskever's more recent paper.
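For readers who haven't seen the architecture spelled out, here is a minimal sketch (in NumPy, with made-up dimensions and parameter names, not taken from the paper) of the forward pass of an Elman-style network: at each step the hidden state is computed from the current input together with a copy of the previous hidden state (Elman's "context units"), and a prediction is read off the hidden state.

import numpy as np

# Illustrative sketch of an Elman-style recurrent network forward pass.
# All sizes and weight names are assumptions for the example, not Elman's.
rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 8, 4

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # context (previous hidden) -> hidden
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(inputs):
    """Run the network over a sequence of input vectors, one step at a time."""
    h = np.zeros(hidden_size)              # context units start at zero
    outputs = []
    for x in inputs:
        # New hidden state depends on the current input and the previous hidden state.
        h = sigmoid(W_xh @ x + W_hh @ h)
        outputs.append(sigmoid(W_hy @ h))  # prediction for the next symbol
    return outputs

# Example: a short sequence of 4-dimensional one-hot "characters".
sequence = [np.eye(input_size)[i] for i in [0, 1, 2, 3]]
predictions = forward(sequence)

The only thing that distinguishes this from a plain feedforward network is the W_hh term: the previous hidden state is fed back in as extra input, which is how the model carries information across time steps.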
While great new papers abound, finding the most meaningful ones requires a fair amount of digging. In contrast, past gems are relatively easy to identify (Finding Structure in Time has nearly 7000 citations) and often contain insights that are relevant today. One thing that stands out in many of these older papers is the quality of the prose, which expresses critical, qualitative arguments. The field of machine learning has become increasingly theoretical and mathematically rigorous, and this is largely a good thing. Still, it's worth recalling that, especially in deep learning, many of the biggest ideas are heuristics backed by a very different sort of theory, and many of them have been floating around the literature for decades, waiting for new life.
