2015

LSTM: A Search Space Odyssey

K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, Jürgen Schmidhuber

AI summary

This paper presents a large-scale analysis of eight LSTM variants on speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the fANOVA framework.

Main Contributions

  • The paper presents the first large-scale analysis of eight LSTM variants on three representative tasks.
  • The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful fANOVA framework.
  • The results show that none of the variants can improve upon the standard LSTM architecture significantly.
  • The results demonstrate the forget gate and the output activation function to be the most critical components of the LSTM architecture.
  • The paper observes that the studied hyperparameters are virtually independent and derives guidelines for their efficient adjustment.
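To make the two components named above concrete, here is a minimal sketch of one time step of a single-unit LSTM cell in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `lstm_step`, the weight layout `w`, and the two boolean switches are hypothetical, peephole connections are left out for brevity, and "removing" a component is modeled as fixing the forget gate to 1 or replacing the output activation with the identity.

```python
import math


def sigmoid(x):
    """Logistic sigmoid used for all gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w,
              use_forget_gate=True, use_output_activation=True):
    """One step of a scalar LSTM cell (hypothetical helper, no peepholes).

    w maps each of "z", "i", "f", "o" to a tuple
    (input weight, recurrent weight, bias).
    """
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    z = math.tanh(pre("z"))   # block input
    i = sigmoid(pre("i"))     # input gate
    # Assumption: dropping the forget gate means the old state is always kept.
    f = sigmoid(pre("f")) if use_forget_gate else 1.0
    c = f * c_prev + i * z    # new cell state
    o = sigmoid(pre("o"))     # output gate
    # Assumption: dropping the output activation means passing c through as-is.
    h = o * (math.tanh(c) if use_output_activation else c)
    return h, c
```

With the forget gate disabled the cell state can only be added to, never decayed, and with the output activation disabled the block output is no longer bounded by tanh, which gives some intuition for why these two components could matter most.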

Abstract

Several variants of the Long Short-Term Memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful fANOVA framework. In total, we summarize the results of 5400 experimental runs (≈ 15 years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
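The random-search procedure mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's experimental code: the hyperparameter names, their ranges, the helper names `sample_config` and `random_search`, and the `toy_objective` standing in for a real validation error are all assumptions made for the example.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration from a hypothetical search space."""
    return {
        # Log-uniform sampling: every order of magnitude is equally likely.
        "learning_rate": 10 ** rng.uniform(-6, -2),
        "hidden_size": rng.randint(20, 200),
        "momentum": 1 - 10 ** rng.uniform(-2, 0),
    }


def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials independent random configurations.

    Returns (best loss, best config, list of all (loss, config) trials).
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        cfg = sample_config(rng)
        trials.append((objective(cfg), cfg))
    best_loss, best_cfg = min(trials, key=lambda t: t[0])
    return best_loss, best_cfg, trials


def toy_objective(cfg):
    """Stand-in for a validation error; smallest near learning_rate = 1e-3."""
    return (math.log10(cfg["learning_rate"]) + 3.0) ** 2


if __name__ == "__main__":
    loss, cfg, _ = random_search(toy_objective, n_trials=50, seed=1)
    print(loss, cfg)
```

Because every trial is drawn independently, the full list of trials can afterwards be reused to estimate how much each hyperparameter contributes to the variation in performance, which is the kind of analysis the abstract attributes to fANOVA.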
