1994

Learning Long-Term Dependencies With Gradient Descent Is Difficult

Yoshua Bengio, Patrice Simard, Paolo Frasconi

Cite Score

88

AI summary

This paper investigates why recurrent neural networks are hard to train with gradient descent on tasks with long-term dependencies. It introduces a minimal task built around a single recurrent neuron, shows that gradient descent fails on it as the dependencies grow longer, and identifies a trade-off between efficient gradient-based learning and latching information over long periods.

Main Contributions

  • Demonstrates the difficulty of training recurrent neural networks to learn long-term dependencies using gradient descent.
  • Introduces the concept of information latching as a requirement for storing state information over arbitrary durations.
  • Presents experimental results on a minimal task, showing that gradient descent fails even in simple cases.
  • Provides theoretical analysis linking the problem to vanishing gradients when using hyperbolic attractors for robust information storage.
  • Suggests alternative approaches to address the vanishing gradient problem, including simulated annealing, multi-grid random search, and time-weighted pseudo-Newton optimization.
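
The link between robust latching and vanishing gradients can be illustrated numerically. The sketch below is not the paper's experiment. It assumes a single tanh neuron with a self-connection weight `w` and no external input, iterated as x_t = tanh(w * x_{t-1}). The names `latch_gradient`, `w`, `x0`, and `steps` are hypothetical and chosen only for this illustration. With w > 1 the state settles near a stable fixed point (it "latches"), and the chain-rule product that gives the sensitivity of the final state to the initial state shrinks as the number of steps grows.

```python
import math
from typing import Tuple


def latch_gradient(w: float, x0: float, steps: int) -> Tuple[float, float]:
    """Iterate x_t = tanh(w * x_{t-1}) and return (x_T, d x_T / d x_0).

    Illustrative assumption: one neuron, self-weight w, no external input.
    """
    x = x0
    grad = 1.0
    for _ in range(steps):
        x = math.tanh(w * x)
        # Chain rule: d tanh(w * x_prev) / d x_prev = w * (1 - tanh(w * x_prev)^2)
        grad *= w * (1.0 - x * x)
    return x, grad


if __name__ == "__main__":
    # w = 2.0 gives two stable fixed points (one positive, one negative).
    for steps in (1, 5, 10, 20, 50):
        x_final, sensitivity = latch_gradient(w=2.0, x0=0.5, steps=steps)
        print(f"T={steps:3d}  x_T={x_final:.4f}  dx_T/dx_0={sensitivity:.3e}")
```

Running the script prints the latched state alongside the sensitivity for several horizons. The state stays near the positive fixed point while the sensitivity to the initial state falls off rapidly with the horizon, which is the trade-off the contributions above describe.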

Abstract

Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. We show why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a trade-off between efficient learning by gradient descent and latching on information for long periods. Based on an understanding of this problem, alternatives to standard gradient descent are considered.
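
The claim that the problem grows with the duration of the dependencies can be made concrete with a short worked bound. The notation here is an assumption for illustration: a_t is the recurrent state at time t, W the trainable weights, C_T the cost at time T, and the last factor in the sum is the immediate (single-step) derivative of the state with respect to the weights.

```latex
\frac{\partial C_T}{\partial W}
  = \sum_{\tau \le T}
    \frac{\partial C_T}{\partial a_T}\,
    \frac{\partial a_T}{\partial a_\tau}\,
    \frac{\partial a_\tau}{\partial W},
\qquad
\frac{\partial a_T}{\partial a_\tau}
  = \prod_{t=\tau+1}^{T} \frac{\partial a_t}{\partial a_{t-1}}.

% If every one-step Jacobian is contractive, i.e.
% \left\| \partial a_t / \partial a_{t-1} \right\| \le \lambda < 1 for all t, then
\left\| \frac{\partial a_T}{\partial a_\tau} \right\|
  \le \lambda^{\,T-\tau} \longrightarrow 0
  \quad \text{as } T-\tau \to \infty .
```

Under that contractivity assumption, the terms of the sum that come from distant time steps (small τ) contribute exponentially less to the gradient than recent ones, so the long-term dependencies have little influence on the weight update.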
