1997

Long Short-Term Memory

Sepp Hochreiter, Jürgen Schmidhuber


AI summary

This paper introduces Long Short-Term Memory (LSTM), a novel, efficient gradient-based method for recurrent networks. LSTM uses multiplicative gate units and constant error flow to learn to bridge minimal time lags in excess of 1000 discrete time steps. In experiments on artificial data, LSTM leads to many more successful runs and learns much faster than RTRL, BPTT, Recurrent Cascade-Correlation, Elman nets, and Neural Sequence Chunking.

Main Contributions

  • Introduces Long Short-Term Memory (LSTM), a novel recurrent network architecture and learning algorithm.
  • LSTM overcomes error back-flow problems by enforcing constant error flow through constant error carrousels within special units.
  • Multiplicative gate units learn to open and close access to the constant error flow.
  • LSTM is local in space and time with O(1) computational complexity per time step and weight.
  • LSTM solves complex artificial long time lag tasks that have never been solved by previous recurrent network algorithms.
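
The gating idea in the list above can be illustrated with a minimal sketch of one memory cell in the spirit of the 1997 design (input gate and output gate, no forget gate). Several things here are assumptions made for illustration and are not taken from this page: the function names, the use of tanh as the squashing function, the scalar (single-cell) setting, and the fact that the gate net inputs are passed in directly rather than computed from learned weights.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, used here for both gates (range 0..1)."""
    return 1.0 / (1.0 + math.exp(-x))


def memory_cell_step(state, net_in, net_out, net_cell):
    """One time step of a single memory cell (hypothetical helper).

    state    : internal state from the previous step; it is carried forward
               by a self-connection with fixed weight 1.0
    net_in   : net input to the input gate
    net_out  : net input to the output gate
    net_cell : net input to the cell itself
    """
    y_in = sigmoid(net_in)    # input gate: how much new input is written
    y_out = sigmoid(net_out)  # output gate: how much of the state is exposed
    # Additive update: the old state passes through unchanged (weight 1.0).
    new_state = state + y_in * math.tanh(net_cell)
    output = y_out * math.tanh(new_state)
    return new_state, output


def hold_and_read(lag: int):
    """Write once, keep both gates closed for `lag` steps, then read."""
    OPEN, CLOSED = 20.0, -20.0  # assumed gate net inputs (nearly 1 and 0)
    state, _ = memory_cell_step(0.0, OPEN, CLOSED, 1.0)  # write step
    stored = state
    closed_output = 0.0
    for _ in range(lag):
        # Distractor input arrives, but the closed input gate blocks it.
        state, closed_output = memory_cell_step(state, CLOSED, CLOSED, 0.5)
    state, read_output = memory_cell_step(state, CLOSED, OPEN, 0.5)  # read step
    return stored, state, closed_output, read_output
```

Calling `hold_and_read(1000)` shows the intended behavior under these assumptions: the internal state after the lag stays very close to the value written at the first step, the output is near zero while the output gate is closed, and the stored value becomes visible once the output gate opens. In a trained network the gate activations would be learned rather than set by hand.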

Abstract

Learning to store information over extended time intervals via recurrent backpropagation takes a very long time, mostly due to insufficient, decaying error back-flow. We briefly review Hochreiter's analysis of this problem, then address it by introducing a novel, efficient gradient-based method called Long Short-Term Memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carrousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with RTRL, BPTT, Recurrent Cascade-Correlation, Elman nets, and Neural Sequence Chunking, LSTM leads to many more successful runs and learns much faster. LSTM also solves complex artificial long time lag tasks that have never been solved by previous recurrent network algorithms.
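
To make the contrast between decaying error back-flow and constant error flow concrete, here is a small numeric sketch. It is a deliberately simplified illustration, not the analysis from the paper: it assumes a single self-connected unit whose activation derivative stays constant over time, so that the error flowing back q steps is scaled by (derivative × weight)^q. The function names are hypothetical, and the result is reported as a base-10 logarithm to avoid floating-point underflow.

```python
import math


def sigmoid_prime(x: float) -> float:
    """Derivative of the logistic sigmoid; its maximum is 0.25 at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def log10_backflow_scale(steps: int, weight: float, derivative: float) -> float:
    """log10 of the factor applied to an error signal after it flows back
    `steps` time steps through one self-connected unit, assuming the
    per-step factor |derivative * weight| is constant (a simplification)."""
    return steps * math.log10(abs(derivative * weight))


# Sigmoid unit at its steepest point with self-weight 1.0: factor 0.25 per step.
sigmoid_unit = log10_backflow_scale(1000, weight=1.0, derivative=sigmoid_prime(0.0))

# Linear unit with self-weight 1.0 (the constant-error-flow case): factor 1.0 per step.
linear_unit = log10_backflow_scale(1000, weight=1.0, derivative=1.0)
```

Under these assumptions the sigmoid unit scales the error by roughly 10^-602 over 1000 steps, which is effectively zero, while the linear unit with a self-connection of weight 1.0 leaves the error unchanged (log10 factor of 0). This is the intuition behind enforcing constant error flow within special units.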


