2013

Advances in Optimizing Recurrent Networks

Yoshua Bengio, Nicolas Boulanger-Lewandowski, Razvan Pascanu

AI summary

This paper reviews and evaluates several techniques for improving the training and performance of recurrent neural networks (RNNs): gradient clipping, leaky integration, advanced momentum, more powerful output probability models, and sparser gradients. Evaluated on text and music datasets, the combined techniques generally improve both training and test error.
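
Gradient clipping is the simplest of these techniques to illustrate. The following is a minimal sketch, assuming the norm-rescaling variant (the summary does not say which variant the paper uses). The function name `clip_by_norm`, the flat-list gradient representation, and the threshold value are all hypothetical choices made for illustration.

```python
import math


def clip_by_norm(grad, threshold):
    """Return a copy of grad whose L2 norm is at most threshold.

    grad is a flat list of floats (an assumption for this sketch).
    The direction of the gradient is preserved; only its length shrinks.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grad]
    return list(grad)


# A gradient of norm 5 is rescaled to norm 1; a small gradient is left alone.
large = clip_by_norm([3.0, 4.0], threshold=1.0)
small = clip_by_norm([0.3, 0.4], threshold=1.0)
```

Rescaling the whole vector, rather than clipping each component separately, keeps the update direction unchanged, which is one reason this variant is often preferred.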

Main Contributions

  • Evaluates gradient clipping as a remedy for the exploding gradient problem in RNNs.
  • Explores the use of leaky integration to span longer time ranges by introducing shorter paths between time steps.
  • Combines RNNs with powerful output probability models like Restricted Boltzmann Machines (RBM) and Neural Autoregressive Distribution Estimator (NADE) to reduce underfitting.
  • Proposes sparse output regularization and rectified outputs to encourage sparser gradients and improve specialization of hidden units.
  • Derives a new formulation of Nesterov momentum with improved stability and convergence for RNN training.
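
The last contribution can be made concrete with a small sketch. It assumes the standard look-ahead form of Nesterov momentum and a change of variables that tracks the look-ahead point directly, so that the gradient is evaluated at the stored parameters. This page does not give the paper's exact equations, so the reformulation below, the toy objective, and all names (`grad`, `nesterov_lookahead`, `nesterov_reformulated`) are assumptions for illustration only.

```python
def grad(x):
    """Gradient of the toy objective f(x) = 0.5 * x**2 (hypothetical)."""
    return x


def nesterov_lookahead(theta, mu, lr, steps):
    """Standard form: evaluate the gradient at theta + mu * v."""
    v = 0.0
    for _ in range(steps):
        lookahead = theta + mu * v
        v = mu * v - lr * grad(lookahead)
        theta = theta + v
    return theta, v


def nesterov_reformulated(big_theta, mu, lr, steps):
    """Same update in terms of Theta = theta + mu * v.

    The gradient is taken at the stored parameters, and the update is a
    linear combination of the previous and the new velocity.
    """
    v = 0.0
    for _ in range(steps):
        v_new = mu * v - lr * grad(big_theta)
        big_theta = big_theta - mu * v + (1.0 + mu) * v_new
        v = v_new
    return big_theta, v


theta, v = nesterov_lookahead(1.0, mu=0.9, lr=0.1, steps=20)
big_theta, v2 = nesterov_reformulated(1.0, mu=0.9, lr=0.1, steps=20)
```

Under these assumptions the two loops trace the same trajectory: after any number of steps, `big_theta` equals `theta + mu * v` up to floating-point error.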

Abstract

After a more than decade-long period of relatively little research activity in the area of recurrent neural networks, several new developments will be reviewed here that have allowed substantial progress both in understanding and in technical solutions towards more efficient training of recurrent networks. These advances have been motivated by and related to the optimization issues surrounding deep learning. Although recurrent networks are extremely powerful in what they can in principle represent in terms of modeling sequences, their training is plagued by two aspects of the same issue regarding the learning of long-term dependencies. Experiments reported here evaluate the use of clipping gradients, spanning longer time ranges with leaky integration, advanced momentum techniques, using more powerful output probability models, and encouraging sparser gradients to help symmetry breaking and credit assignment. The experiments are performed on text and music data and show off the combined effects of these techniques in generally improving both training and test error.
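
To illustrate what "spanning longer time ranges with leaky integration" can mean, here is a minimal scalar sketch. It assumes a leaky unit of the form h_t = alpha * h_{t-1} + (1 - alpha) * tanh(w_rec * h_{t-1} + w_in * x_t); the abstract does not state the update rule, so this form, the weights, and the names `leaky_step` and `run` are hypothetical.

```python
import math


def leaky_step(h_prev, x, alpha, w_rec=0.5, w_in=1.0):
    """One step of a scalar leaky-integration unit (assumed form).

    alpha = 0 gives an ordinary tanh recurrent unit; alpha close to 1
    makes the state change slowly, so information persists longer.
    """
    candidate = math.tanh(w_rec * h_prev + w_in * x)
    return alpha * h_prev + (1.0 - alpha) * candidate


def run(xs, alpha):
    """Run the unit over a sequence, starting from a zero state."""
    h = 0.0
    states = []
    for x in xs:
        h = leaky_step(h, x, alpha)
        states.append(h)
    return states


# A single input pulse followed by silence.
pulse = [1.0] + [0.0] * 9
fast = run(pulse, alpha=0.0)   # ordinary unit: the trace fades quickly
slow = run(pulse, alpha=0.9)   # leaky unit: the trace fades slowly
```

In this toy setting the leaky unit responds more weakly to the pulse at first, but it retains more of it at the end of the sequence than the ordinary unit does, which is the intuition behind using it for longer-range dependencies.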
