Year: 2013
Cite Score: 81
AI summary
This paper examines the exploding and vanishing gradient problems in RNNs. It introduces a gradient norm clipping strategy for exploding gradients and a soft constraint for vanishing gradients, and validates both on pathological synthetic datasets, polyphonic music prediction, and language modeling tasks.
Abstract
There are two widely known issues with properly training Recurrent Neural Networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective solution. We propose a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem. We validate empirically our hypothesis and proposed solutions in the experimental section.
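The gradient norm clipping idea named in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which a gradient whose L2 norm exceeds a threshold is rescaled to have exactly that norm; the function name `clip_gradient_norm`, the list-based gradient, and the threshold value are illustrative assumptions and are not taken from this page.

```python
import math


def clip_gradient_norm(grad, threshold):
    """Return a copy of grad whose L2 norm is at most threshold.

    Hypothetical helper for illustration: grad is a flat list of floats,
    threshold is a positive float chosen by the user.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        # Rescale so the direction is kept and the norm equals threshold.
        scale = threshold / norm
        return [g * scale for g in grad]
    # Gradients already within the threshold are left unchanged.
    return list(grad)


# Example: a gradient of norm 5 is shrunk to norm 1, direction preserved.
clipped = clip_gradient_norm([3.0, 4.0], threshold=1.0)
```

Rescaling the whole vector, rather than clipping each component separately, keeps the update direction unchanged and only limits the step size.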
References [25]
Sepp Hochreiter, Jürgen Schmidhuber - 1997
94 papers in library cite
D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986
34 papers in library cite
Jeffrey L. Elman - 1990
23 papers in library cite
John Duchi, Elad Hazan, Yoram Singer - 2011
19 papers in library cite
Yoshua Bengio, Patrice Simard, Paolo Frasconi - 1994
31 papers in library cite
Ilya Sutskever, James Martens, Geoffrey E. Hinton - 2011
13 papers in library cite
F. Bastien, P. Lamblin, Razvan Pascanu, James Bergstra, I. Goodfellow, A. Bergeron, A. Bouchard, N. Nicolas, Yoshua Bengio - 2012
13 papers in library cite
Paul J. Werbos - 1988
11 papers in library cite
Pascal Vincent - 2012
8 papers in library cite
James Martens, Ilya Sutskever - 2011
13 papers in library cite
Yoshua Bengio, N. B. Lewandowski, Razvan Pascanu - 2013
4 papers in library cite
Tomas Mikolov, A. Deoras, S. Kombrink, Lukas Burget, Jan Cernocky - 2011
13 papers in library cite
Tomas Mikolov, Ilya Sutskever, A. Deoras, H. S. Le, S. Kombrink, Jan Cernocky - 2012
7 papers in library cite
James Bergstra, O. Breuleux, F. Bastien, P. Lamblin, Razvan Pascanu, G. Desjardins, J. Turian, D. W. Farley, Yoshua Bengio - 2010
22 papers in library cite
Tomas Mikolov - 2012
17 papers in library cite
Alex Graves, M. Liwicki, Santiago Fernandez, R. Bertolami, H. Bunke, Jürgen Schmidhuber - 2009
5 papers in library cite
Herbert Jaeger, M. Lukosevicius, D. Popovici, U. Siewert - 2007
3 papers in library cite
M. Lukosevicius, Herbert Jaeger - 2009
2 papers in library cite
Yoshua Bengio, Paolo Frasconi, Patrice Simard - 1993
2 papers in library cite
Razvan Pascanu, Herbert Jaeger - 2011
1 paper in library cites
K. Doya, S. Yoshizawa - 1991
1 paper in library cites
K. Doya - 1993
1 paper in library cites
Herbert Jaeger - 2012
1 paper in library cites
M. Moreira, E. Fiesler - 1995
1 paper in library cites
S. Strogatz - 1994
1 paper in library cites
Cited by: 21 papers in your library
Cites: 14 papers in your library
Read on June 20, 2025