2013
Cite Score
78
AI summary
This paper demonstrates that stochastic gradient descent with momentum, combined with a well-designed random initialization and a specific schedule for the momentum parameter, can train DNNs and RNNs to performance levels previously only achievable with Hessian-Free optimization, achieving state-of-the-art results.
Abstract
Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were previously achievable only with Hessian-Free optimization. We find that both the initialization and the momentum are crucial since poorly initialized networks cannot be trained with momentum and well-initialized networks perform markedly worse when the momentum is absent or poorly tuned. Our success training these models suggests that previous attempts to train deep and recurrent neural networks from random initializations have likely failed due to poor initialization schemes. Furthermore, carefully tuned momentum methods suffice for dealing with the curvature issues in deep and recurrent network training objectives without the need for sophisticated second-order methods.
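To make the idea of a "slowly increasing schedule for the momentum parameter" concrete, here is a minimal sketch of classical momentum on a toy ill-conditioned quadratic. The abstract does not give the schedule itself, so the linear ramp below, the function names `momentum_schedule`, `sgd_momentum`, and `quadratic_grad`, and all parameter values are illustrative assumptions, not the authors' method. The gradient is deterministic here; in real training it would come from minibatches.

```python
def momentum_schedule(step, mu_start=0.5, mu_max=0.9, ramp_steps=500):
    """Hypothetical schedule: raise momentum linearly from mu_start to mu_max.

    This linear ramp is an assumption for illustration only; it is not the
    schedule used in the paper.
    """
    frac = min(step / ramp_steps, 1.0)
    return min(mu_start + frac * (mu_max - mu_start), mu_max)


def quadratic_grad(w):
    """Gradient of the toy objective f(w) = 0.5 * (w[0]**2 + 100 * w[1]**2)."""
    return [w[0], 100.0 * w[1]]


def sgd_momentum(grad_fn, w, lr=0.005, steps=2000):
    """Classical momentum: v <- mu * v - lr * g, then w <- w + v."""
    v = [0.0] * len(w)
    for t in range(steps):
        mu = momentum_schedule(t)  # momentum grows as training proceeds
        g = grad_fn(w)
        v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
        w = [wi + vi for wi, vi in zip(w, v)]
    return w


if __name__ == "__main__":
    w_final = sgd_momentum(quadratic_grad, [1.0, 1.0])
    print(w_final)  # both coordinates should be very close to 0
```

The two curvatures (1 and 100) stand in for the curvature issues the abstract mentions: a low early momentum keeps the steep direction stable, while the higher later momentum speeds up progress along the shallow direction.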