AI summary
This paper introduces Long Short-Term Memory (LSTM), a novel, efficient gradient-based method for recurrent networks. LSTM uses multiplicative gate units to control access to a constant error flow, which lets it learn to bridge minimal time lags in excess of 1000 discrete time steps. Experiments on artificial data show that LSTM leads to many more successful runs and learns much faster than RTRL, BPTT, Recurrent Cascade-Correlation, Elman nets, and Neural Sequence Chunking.
Abstract
Learning to store information over extended time intervals via recurrent backpropagation takes a very long time, mostly due to insufficient, decaying error back-flow. We briefly review Hochreiter's 1991 analysis of this problem, then address it by introducing a novel, efficient gradient-based method called Long Short-Term Memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carrousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with RTRL, BPTT, Recurrent Cascade-Correlation, Elman nets, and Neural Sequence Chunking, LSTM leads to many more successful runs and learns much faster. LSTM also solves complex artificial long time lag tasks that have never been solved by previous recurrent network algorithms.
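The mechanism described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes a single scalar memory cell, takes the net inputs to the cell and its gates as given numbers instead of computing them from learned weights, and uses tanh as the squashing function for simplicity. The names memory_cell_step, backflow_factor, and run_demo are hypothetical. The sketch shows two things: an internal state with a self-connection of fixed weight 1.0 keeps a stored value while the input gate is closed, and an error scaled by 1.0 per step does not decay over 1000 steps, whereas a factor of 0.25 per step (the largest derivative of the logistic function, with a recurrent weight of 1.0) vanishes.

```python
import math


def sigmoid(x):
    """Logistic function used for the gate activations."""
    return 1.0 / (1.0 + math.exp(-x))


def memory_cell_step(state, net_cell, net_in, net_out):
    """Advance one scalar memory cell by a single time step.

    net_cell, net_in, and net_out are the (assumed, externally supplied)
    net inputs to the cell, the input gate, and the output gate.
    Returns the new internal state and the cell output.
    """
    in_gate = sigmoid(net_in)      # near 1: write access open, near 0: closed
    out_gate = sigmoid(net_out)    # near 1: read access open, near 0: closed
    candidate = math.tanh(net_cell)
    # Constant error carrousel: the old state is carried over with weight 1.0.
    new_state = state + in_gate * candidate
    output = out_gate * math.tanh(new_state)
    return new_state, output


def backflow_factor(per_step_factor, steps):
    """Scaling applied to an error signal flowing back through `steps` steps."""
    return per_step_factor ** steps


def run_demo(lag=1000):
    """Store a value once, then keep the gates closed for `lag` steps."""
    state, _ = memory_cell_step(0.0, net_cell=2.0, net_in=20.0, net_out=20.0)
    stored = state
    for _ in range(lag):
        # Distracting input arrives, but the closed input gate shields the state.
        state, _ = memory_cell_step(state, net_cell=1.0, net_in=-20.0, net_out=-20.0)
    return stored, state


if __name__ == "__main__":
    stored, final = run_demo()
    print("stored state:", stored)
    print("state after lag:", final)
    print("error scaling, self-weight 1.0:", backflow_factor(1.0, 1000))
    print("error scaling, factor 0.25:", backflow_factor(0.25, 1000))
```

Each step performs a fixed number of arithmetic operations regardless of how long the sequence is, which is in the spirit of the locality in time claimed in the abstract. A trainable version would compute the three net inputs from weighted connections and update those weights with the truncated gradient.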