2013

On the Difficulty of Training Recurrent Neural Networks

Razvan Pascanu, Tomas Mikolov, Yoshua Bengio

Cite Score: 81

AI summary

This paper examines the exploding and vanishing gradient problems in RNNs. It introduces a gradient norm clipping strategy to handle exploding gradients and a soft constraint to address vanishing gradients, then validates both on pathological synthetic datasets and on polyphonic music prediction and language modeling tasks.

Main Contributions

  • The paper analyzes the exploding and vanishing gradient problems in RNNs from an analytical, a geometric and a dynamical systems perspective.
  • It proposes a gradient norm clipping strategy to deal with exploding gradients.
  • It proposes a soft constraint for the vanishing gradients problem.
  • It empirically validates the hypothesis and the proposed solutions in the experimental section.
  • The paper shows that the proposed solutions improve performance on the pathological synthetic datasets considered, as well as on polyphonic music prediction and language modeling.
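The exploding and vanishing behaviour named in the contributions above can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes a purely linear recurrence, so that one step of backpropagation through time is a multiplication by the transposed recurrent matrix, and it uses a hypothetical helper `scaled_rotation` to build a 2x2 matrix whose singular values all equal `scale`. Under those assumptions the gradient norm after k steps is simply `scale ** k`.

```python
import math

def backprop_norms(w_rec, grad, steps):
    """Return the L2 norm of grad after each multiplication by w_rec^T."""
    n = len(w_rec)
    norms = []
    for _ in range(steps):
        # One backward step through a linear recurrence: g <- W^T g
        grad = [sum(w_rec[i][j] * grad[i] for i in range(n)) for j in range(n)]
        norms.append(math.sqrt(sum(g * g for g in grad)))
    return norms

def scaled_rotation(scale, angle=0.3):
    """Hypothetical test matrix: a rotation times scale (all singular values = scale)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[scale * c, -scale * s], [scale * s, scale * c]]

# Singular values below 1: the norm shrinks geometrically (vanishing).
vanishing = backprop_norms(scaled_rotation(0.9), [1.0, 0.0], steps=50)
# Singular values above 1: the norm grows geometrically (exploding).
exploding = backprop_norms(scaled_rotation(1.1), [1.0, 0.0], steps=50)

print(f"after 50 steps: vanishing={vanishing[-1]:.3e}, exploding={exploding[-1]:.3e}")
```

A real RNN also multiplies by the derivative of its nonlinearity at every step, so this linear case only shows the basic mechanism, not the conditions analyzed in the paper.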

Abstract

There are two widely known issues with properly training Recurrent Neural Networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective solution. We propose a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem. We validate empirically our hypothesis and proposed solutions in the experimental section.
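The gradient norm clipping strategy mentioned in the abstract can be sketched in a few lines. This is a minimal illustration, assuming the gradient is a flat list of floats and the threshold is positive; the function name `clip_by_norm` and the example values are hypothetical, and the paper should be consulted for the exact algorithm and for how the threshold is chosen.

```python
import math

def clip_by_norm(grad, threshold):
    """Rescale grad so that its L2 norm does not exceed threshold.

    The direction of the gradient is preserved; only its length changes.
    Assumes threshold > 0.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grad]
    # Gradients already within the threshold are returned unchanged.
    return list(grad)

# A gradient of norm 5 is rescaled to norm 1, keeping its direction.
clipped = clip_by_norm([3.0, 4.0], threshold=1.0)
print(clipped)
```

Because the whole vector is rescaled by a single factor, the update still points along the original gradient direction, which is the difference from clipping each component independently.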

References [25]

Sepp Hochreiter, Jürgen Schmidhuber - 1997

94 papers in library cite

D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986

34 papers in library cite

Jeffrey L. Elman - 1990

23 papers in library cite

John Duchi, Elad Hazan, Yoram Singer - 2011

19 papers in library cite

Yoshua Bengio, Patrice Simard, Paolo Frasconi - 1994

31 papers in library cite

Ilya Sutskever, James Martens, Geoffrey E. Hinton - 2011

13 papers in library cite

F. Bastien, P. Lamblin, Razvan Pascanu, James Bergstra, I. Goodfellow, A. Bergeron, A. Bouchard, N. Nicolas, Yoshua Bengio - 2012

13 papers in library cite

Paul J. Werbos - 1988

11 papers in library cite

James Martens, Ilya Sutskever - 2011

13 papers in library cite

Yoshua Bengio, N. B. Lewandowski, Razvan Pascanu - 2013

4 papers in library cite

Tomas Mikolov, A. Deoras, S. Kombrink, Lukas Burget, Jan Cernocky - 2011

13 papers in library cite

Tomas Mikolov, Ilya Sutskever, A. Deoras, H. S. Le, S. Kombrink, Jan Cernocky - 2012

7 papers in library cite

James Bergstra, O. Breuleux, F. Bastien, P. Lamblin, Razvan Pascanu, G. Desjardins, J. Turian, D. W. Farley, Yoshua Bengio - 2010

22 papers in library cite

Tomas Mikolov - 2012

17 papers in library cite

Alex Graves, M. Liwicki, Santiago Fernandez, R. Bertolami, H. Bunke, Jürgen Schmidhuber - 2009

5 papers in library cite

Herbert Jaeger, M. Lukosevicius, D. Popovici, U. Siewert - 2007

3 papers in library cite

M. Lukosevicius, Herbert Jaeger - 2009

2 papers in library cite

Yoshua Bengio, Paolo Frasconi, Patrice Simard - 1993

2 papers in library cite

Razvan Pascanu, Herbert Jaeger - 2011

1 paper in library cites

K. Doya, S. Yoshizawa - 1991

1 paper in library cites

K. Doya - 1993

1 paper in library cites

Herbert Jaeger - 2012

1 paper in library cites

M. Moreira, E. Fiesler - 1995

1 paper in library cites

Cited by: 21 papers in your library

Cites: 14 papers in your library

Read on June 20, 2025
