2011

Learning Recurrent Neural Networks With Hessian-Free Optimization

James Martens, Ilya Sutskever

Cite Score

33

AI summary

This paper introduces a Hessian-free (HF) optimization approach with a structural damping scheme for training RNNs. The method trains RNNs successfully on pathological synthetic datasets with extremely long-term dependencies, and it outperforms LSTMs on three real-world sequence tasks: motion video prediction, music modeling, and speech modeling.

Main Contributions

  • Introduces a Hessian-Free optimization approach for training RNNs.
  • Develops a novel structural damping scheme to improve the robustness of the HF optimizer.
  • Offers a new interpretation of the generalized Gauss-Newton matrix.
  • Demonstrates that HF-trained RNNs can be trained successfully on pathological synthetic problems with extremely long-term dependencies, which are known to be impossible for standard optimization approaches.
  • Shows that HF-trained RNNs outperform LSTMs on real-world sequence modeling problems (motion video prediction, music modeling, and speech modeling).

Abstract

In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. Utilizing recent advances in the Hessian-free optimization approach (Martens, 2010), together with a novel damping scheme, we successfully train RNNs on two sets of challenging problems. First, a collection of pathological synthetic datasets which are known to be impossible for standard optimization approaches (due to their extremely long-term dependencies), and second, on three natural and highly complex real-world sequence datasets where we find that our method significantly outperforms the previous state-of-the-art method for training neural sequence models: the Long Short-term Memory approach of Hochreiter and Schmidhuber (1997). Additionally, we offer a new interpretation of the generalized Gauss-Newton matrix of Schraudolph (2002) which is used within the HF approach of Martens.
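
To make the idea of Hessian-free optimization with the Gauss-Newton matrix concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy model (a single tanh layer with squared-error loss) rather than an RNN, and it uses plain Tikhonov damping (adding lam times the identity) in place of the paper's structural damping. All function names and the parameter lam are hypothetical and chosen for illustration. The key point it shows is that conjugate gradient only needs products of the Gauss-Newton matrix with a vector, so the matrix itself is never formed.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict(w, X):
    # Toy model (an assumption, not an RNN): f(w) = tanh(X w)
    return [math.tanh(dot(row, w)) for row in X]

def loss(w, X, y):
    return 0.5 * sum((p - t) ** 2 for p, t in zip(predict(w, X), y))

def jvp(w, X, v):
    # J v, where J = diag(1 - tanh^2(X w)) X is the Jacobian of the outputs
    return [(1.0 - math.tanh(dot(row, w)) ** 2) * dot(row, v) for row in X]

def vjp(w, X, u):
    # J^T u, accumulated row by row
    out = [0.0] * len(w)
    for row, ui in zip(X, u):
        s = (1.0 - math.tanh(dot(row, w)) ** 2) * ui
        for j, xj in enumerate(row):
            out[j] += s * xj
    return out

def gradient(w, X, y):
    residual = [p - t for p, t in zip(predict(w, X), y)]
    return vjp(w, X, residual)

def damped_gn_product(w, X, v, lam):
    # (J^T J + lam * I) v without ever building J^T J.
    # For squared error the loss Hessian w.r.t. the outputs is the identity,
    # so the generalized Gauss-Newton matrix reduces to J^T J.
    gv = vjp(w, X, jvp(w, X, v))
    return [g + lam * vi for g, vi in zip(gv, v)]

def conjugate_gradient(matvec, b, iters=50, tol=1e-10):
    # Solves A x = b for symmetric positive definite A given only v -> A v
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def hf_step(w, X, y, lam=1e-2):
    # One outer step: approximately minimize the damped quadratic model
    # by solving (G + lam * I) d = -g with conjugate gradient.
    neg_g = [-gi for gi in gradient(w, X, y)]
    d = conjugate_gradient(lambda v: damped_gn_product(w, X, v, lam), neg_g)
    return [wi + di for wi, di in zip(w, d)]
```

Calling hf_step repeatedly on a small dataset, starting from zero weights, is enough to see the loss fall. A practical HF optimizer would also adapt lam between steps and limit the number of conjugate gradient iterations, which this sketch omits.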

References [17]

Sepp Hochreiter, Jürgen Schmidhuber - 1997

94 papers in library cite

D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986

34 papers in library cite

Geoffrey Hinton, Ruslan Salakhutdinov - 2006

37 papers in library cite

Yoshua Bengio, Patrice Simard, Paolo Frasconi - 1994

31 papers in library cite

Felix A. Gers, Jürgen Schmidhuber, Fred Cummins - 2000

13 papers in library cite

Herbert Jaeger, Harald Haas - 2004

4 papers in library cite

Alex Graves, Jürgen Schmidhuber - 2009

5 papers in library cite

James Martens - 2010

12 papers in library cite

Alex Graves, Jürgen Schmidhuber - 2005

14 papers in library cite

Sepp Hochreiter - 1991

18 papers in library cite

N. N. Schraudolph - 2002

4 papers in library cite

B. Pearlmutter - 1994

4 papers in library cite

Sepp Hochreiter, Jürgen Schmidhuber - 1996

3 papers in library cite

Kevin P. Murphy - 2002

2 papers in library cite

J. Nocedal, S. Wright - 1999

2 papers in library cite

Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, Jürgen Schmidhuber - 2001

1 paper in library cites

H. Mayer, Faustino Gomez, Daan Wierstra, I. Nagy, A. Knoll, Jürgen Schmidhuber - 2007

1 paper in library cites

Cited by 13 papers in your library

Cites 9 papers in your library

Read on July 11, 2025
