2014
AI summary
This paper develops an exact analytical theory of learning in deep linear neural networks. It derives the coupled nonlinear differential equations that govern gradient descent on the weights and finds exact time-dependent solutions, which describe how learning proceeds through long plateaus followed by rapid transitions. It also analyzes greedy unsupervised pretraining and random orthogonal initial conditions, both of which can yield depth-independent learning times. Experiments use the MNIST dataset.
Abstract
Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output map, such networks have nonlinear gradient descent dynamics on weights that change with the addition of each new hidden layer. We show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions. We provide an analytical description of these phenomena by finding new exact solutions to the nonlinear dynamics of deep learning. Our theoretical analysis also reveals the surprising finding that as the depth of a network approaches infinity, learning speed can nevertheless remain finite: for a special class of initial conditions on the weights, very deep networks incur only a finite, depth-independent delay in learning speed relative to shallow networks. We show that, under certain conditions on the training data, unsupervised pretraining can find this special class of initial conditions, while scaled random Gaussian initializations cannot. We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pretraining, enjoys depth-independent learning times. We further show that these initial conditions also lead to faithful propagation of gradients even in deep nonlinear networks, as long as they operate in a special regime known as the edge of chaos.
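The contrast between random orthogonal and scaled random Gaussian initial conditions can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes NumPy, square layers of equal width, and a Gaussian scaling of 1/sqrt(n), and the helper names (`random_orthogonal`, `scaled_gaussian`, `product_singular_values`) are hypothetical. It multiplies the layer weight matrices of a deep linear network at initialization and inspects the singular values of the end-to-end map, which determine how faithfully signals and gradients pass through the stack.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Draw a random n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Fix column signs so the result does not depend on the QR sign convention.
    return q * np.sign(np.diag(r))


def scaled_gaussian(n, rng):
    """Draw an n x n Gaussian matrix with entry variance 1/n (assumed scaling)."""
    return rng.standard_normal((n, n)) / np.sqrt(n)


def product_singular_values(init, depth, n, seed=0):
    """Singular values of the product of `depth` independently drawn layers."""
    rng = np.random.default_rng(seed)
    total = np.eye(n)
    for _ in range(depth):
        total = init(n, rng) @ total
    return np.linalg.svd(total, compute_uv=False)


sv_orth = product_singular_values(random_orthogonal, depth=50, n=64)
sv_gauss = product_singular_values(scaled_gaussian, depth=50, n=64)

print("orthogonal: min/max singular value:", sv_orth.min(), sv_orth.max())
print("gaussian:   min/max singular value:", sv_gauss.min(), sv_gauss.max())
```

A product of orthogonal matrices is itself orthogonal, so every singular value of the end-to-end map stays at 1 regardless of depth. The scaled Gaussian product preserves norms only on average, and its singular values spread out as layers are added, which is one way to picture why the abstract singles out the orthogonal class.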