Papperoni

2015

Visualizing and Understanding Recurrent Networks

Andrej Karpathy, Justin Johnson, Li Fei-Fei

citations

Cite Score

47

AI summary

This paper introduces an analysis of LSTMs using character-level language models as an interpretable testbed. The analysis reveals interpretable cells tracking long-range dependencies, quantifies LSTM predictions with comparisons to n-gram models, and provides an error analysis to identify areas for further study.

Main Contributions

Reveals the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets.
Quantifies LSTM predictions with comprehensive comparison to n-gram models, showing LSTMs perform significantly better on characters that require long-range reasoning.
Conducts an error analysis using a sequence of oracles to quantify the extent of remaining errors and suggest specific areas for further study.
Demonstrates that LSTMs can effectively utilize information beyond 20 characters through error analysis and comparisons with n-gram models.
Shows that the LSTM "grows" its competence over increasingly longer dependencies during training.
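
To make the "finite horizon" of the n-gram baselines concrete, here is a minimal sketch of a character-level n-gram predictor. It is an illustration only, not the paper's setup: the function names `train_char_ngram` and `predict_next` are hypothetical, and the counts are unsmoothed, whereas a practical baseline would use smoothing and back-off. The point it shows is that the prediction depends only on the last n-1 characters, so anything earlier (an opening quote or bracket far back) is invisible to the model.

```python
from collections import Counter, defaultdict


def train_char_ngram(text, n):
    """Count next-character frequencies for each (n-1)-character context."""
    counts = defaultdict(Counter)
    for i in range(n - 1, len(text)):
        context = text[i - (n - 1):i]
        counts[context][text[i]] += 1
    return counts


def predict_next(counts, history, n):
    """Return the most frequent next character, using only the last n-1 characters."""
    context = history[-(n - 1):] if n > 1 else ""
    if context not in counts:
        return None  # unseen context; a real model would back off or smooth
    return counts[context].most_common(1)[0][0]


# Usage: a 3-gram model only ever looks at the last two characters.
model = train_char_ngram("abcabcabc", n=3)
same = predict_next(model, "ab", 3) == predict_next(model, "zzzzab", 3)
```

Here `same` is true because the extra history `zzzz` lies outside the two-character window, which is the limitation the LSTM comparison in the list above is probing.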

Abstract

Recurrent Neural Networks (RNNs), and specifically a variant with Long Short-Term Memory (LSTM), are enjoying renewed interest as a result of successful applications in a wide range of machine learning problems that involve sequential data. However, while LSTMs provide exceptional results in practice, the source of their performance and their limitations remain rather poorly understood. Using character-level language models as an interpretable testbed, we aim to bridge this gap by providing an analysis of their representations, predictions and error types. In particular, our experiments reveal the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets. Moreover, our comparative analysis with finite horizon n-gram models traces the source of the LSTM improvements to long-range structural dependencies. Finally, we provide analysis of the remaining errors and suggest areas for further study.
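
As a rough picture of what a cell "keeping track of" quotes, brackets, or line length would have to encode, the sketch below computes those three quantities by hand for every character of a string. This is a hand-written analogue, not anything learned and not code from the paper; the function name `trace_states` and the exact set of tracked characters are assumptions made for illustration.

```python
def trace_states(text):
    """Return one (inside_quote, bracket_depth, line_position) tuple per character."""
    inside_quote = False
    depth = 0
    line_pos = 0
    trace = []
    for ch in text:
        if ch == '"':
            inside_quote = not inside_quote  # toggles on every double quote
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(depth - 1, 0)  # never go below zero on stray closers
        if ch == "\n":
            line_pos = 0  # reset the position counter at a line break
        else:
            line_pos += 1
        trace.append((inside_quote, depth, line_pos))
    return trace


# Usage: inspect the state after each character of a short string.
states = trace_states('say "hi" (now)\n')
```

Each of these variables can stay unchanged over arbitrarily many characters, which is why tracking them requires memory beyond a fixed window.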

References [35]

[1]Long Short-Term Memory

Sepp Hochreiter, Jürgen Schmidhuber - 1997

94 papers in library cite

LSTMs FTW!

[2]Visualizing Data Using t-SNE

Laurens van der Maaten, Geoffrey Hinton - 2008

7 papers in library cite

Amazing. Simple. Impactful. Easy to understand. Masterpiece.

[3]Neural Machine Translation by Jointly Learning to Align and Translate

D. Bahdanau, Kyunghyun Cho, Yoshua Bengio - 2014

59 papers in library cite

Introduces the attention mechanism - amazing overall

[4]Learning Internal Representations by Error Propagation

D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986

46 papers in library cite

I expected very little of this, but it was so good at explaining concepts! Very good read. It gets a bit boring when it starts explaining things by the end of the chapter, but good nonetheless.

[5]Sequence to Sequence Learning With Neural Networks

Ilya Sutskever, Oriol Vinyals, Quoc V. Le - 2014

58 papers in library cite

Good paper, but I think it only got famous because they set a new good baseline for NNs in MT. Their main contribution was reversing the source sentence TBH.

[6]Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

J. Chung, C. G. Gulcehre, Kyunghyun Cho, Yoshua Bengio - 2014

11 papers in library cite

It's a good paper but results are veeeery underwhelming.

[7]Learning Long-Term Dependencies With Gradient Descent Is Difficult

Yoshua Bengio, Patrice Simard, Paolo Frasconi - 1994

31 papers in library cite

The first ones to notice that there is a problem with gradient descent, but way too mathy for me.

[8]Speech Recognition With Deep Recurrent Neural Networks

Alex Graves, Abdel-rahman Mohamed, Geoffrey Hinton - 2013

13 papers in library cite

Small, simple, nice results. First usage of deep BiLSTM?

[9]Building a Large Annotated Corpus of English: The Penn Treebank

M. P. Marcus, B. Santorini, Mary Ann Marcinkiewicz - 1993

22 papers in library cite

Well, not really interesting but very cool to see how the Penn Treebank was made.

[10]On the Properties of Neural Machine Translation: Encoder-Decoder Approaches

Kyunghyun Cho, B. V. Merrienboer, D. Bahdanau, Yoshua Bengio - 2014

9 papers in library cite

This just builds upon other work, comparing and analyzing previous architectures. This faces a specific problem of the time: long sentences. Not too relevant today.

[11]Show and Tell: A Neural Image Caption Generator

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan - 2015

11 papers in library cite

It's nice and they beat a ton of SotA. However, I read the one that uses attention first so this is a bit less surprising.

[12]LSTM: A Search Space Odyssey

K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, Jürgen Schmidhuber - 2015

4 papers in library cite

Very good review of different architectural choices for LSTMs - actually brings some nice insights (as opposed to the other paper that compares GRUs and LSTMs).

[13]On the Difficulty of Training Recurrent Neural Networks

Razvan Pascanu, Tomas Mikolov, Yoshua Bengio - 2013

21 papers in library cite

It starts very mathy but in the end there are some very nice contributions! You don't actually need to understand the math to know what's going on in the end.

[14]Recurrent Neural Network Based Language Model

Tomas Mikolov, M. Karafiat, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur - 2010

36 papers in library cite

The comeback of RNNs for language modeling. Not too exciting but impactful and a short read.

[15]Generating Sequences With Recurrent Neural Networks

Alex Graves - 2013

27 papers in library cite

Very cool and is the first to actually propose the Attention mechanism! It gets a bit mathy but nothing too crazy. Also has the first examples of good machine generated writing I've seen in these papers, so very nice results.

[16]Neural Turing Machines

Alex Graves, G. Wayne, Ivo Danihelka - 2014

18 papers in library cite

This paper is amazing. If someone told me that NNs could use and address memory by position I wouldn't believe it worked. Very nice, but it's a shame that it's just a toy example.

[17]Memory Networks

Jason Weston, S. Chopra, Antoine Bordes - 2015

18 papers in library cite

The first half of the paper (when they discuss the concept in a very abstract way) is amazing. However, the actual methodology was very convoluted - I did not like it. I thought that Neural Turing Machines were inspired by this, but actually they are contemporaries... So anyway, the concept is nice, execution is not.

[18]Generating Text With Recurrent Neural Networks

Ilya Sutskever, James Martens, Geoffrey E. Hinton - 2011

13 papers in library cite

Pleasant paper but results are underwhelming. They use RNNs for character-level modeling, which is different. They also use the hessian-free method proposed by Martens, but don't go too deep into how it works, which is nice because otherwise it would be very mathy. Other papers cite this more as an example of usage rather than an actual milestone.

[19]Semi-Supervised Sequence Learning

A. M. Dai, Quoc V. Le - 2015

27 papers in library cite

Very good paper that was probably the first to introduce pre-training in NLP!

[20]Generalization of Backpropagation With Application to a Recurrent Gas Market Model

Paul J. Werbos - 1988

11 papers in library cite

The first two sections are amazing. Very nice concepts, well explained. From section 3 on it gets more abstract and too formalized, and it's mostly for Hopfield recurrent nets (so not very relevant now). Section 4 is just the application to the gas market, which I don't care for at all.

[21]How to Construct Deep Recurrent Neural Networks

Razvan Pascanu, C. G. Gulcehre, Kyunghyun Cho, Yoshua Bengio - 2013

7 papers in library cite

Very interesting despite not being too relevant. Good read and a new way of thinking about RNNs.

[22]Inferring Algorithmic Patterns With Stack-Augmented Recurrent Nets

Armand Joulin, Tomas Mikolov - 2015

9 papers in library cite

Very underwhelming TBH. I expected more after reading the Neural Turing Machine paper. This reads like "yeah, we lost the race, here's what we were doing before they did something better"

[23]Deep Learning in Neural Networks: An Overview

Jürgen Schmidhuber - 2015

2 papers in library cite

34 pages, but seems like a good overview

[24]Long-Term Recurrent Convolutional Networks for Visual Recognition and Description

J. Donahue, L. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, Trevor Darrell - 2014

4 papers in library cite

Early multimodality

[25]Deep Visual-Semantic Alignments for Generating Image Descriptions

A. Karpathy, Li Fei Fei - 2014

6 papers in library cite

Early multimodality

[26]Statistical Language Models Based on Neural Networks

Tomas Mikolov - 2012

17 papers in library cite

Mikolov's Thesis

[27]An Empirical Study of Smoothing Techniques for Language Modeling

S. F. Chen, J. Goodman - 1998

13 papers in library cite

[28]An Empirical Exploration of Recurrent Network Architectures

R. Jozefowicz, Wojciech Zaremba, Ilya Sutskever - 2015

4 papers in library cite

[29]Diagnosing Error in Object Detectors

D. Hoiem, Y. Chodpathumwan, Q. Dai - 2012

4 papers in library cite

[30]The Human Knowledge Compression Contest

M. Hutter - 2012

4 papers in library cite

[31]A Dynamic Language Model for Speech Recognition

Frederick Jelinek, B. Merialdo, S. Roukos, M. Strauss - 1991

3 papers in library cite

[32]Rmsprop and Equilibrated Adaptive Learning Rates for Non-Convex Optimization

Yann N. Dauphin, H. D. Vries, J. Chung, Yoshua Bengio - 2015

2 papers in library cite

[33]Scalable Modified Kneser-Ney Language Model Estimation

K. Heafield, I. Pouzyrevsky, J. H. Clark, P. Koehn - 2013

2 papers in library cite

[34]Training and Analysing Deep Recurrent Neural Networks

M. Hermans, B. Schrauwen - 2013

2 papers in library cite

[35]Spoken Language Processing: A Guide to Theory, Algorithm, and System Development

X. Huang, Alex Acero, H. W. Hon - 2001

1 paper in library cites

Cited by 3 papers in your library

Cites 25 papers in your library

Read on October 30, 2025

Very nice! The results were a bit underwhelming though. Nice visualizations but few explanations. Good nonetheless.
