2006

Greedy Layer-Wise Training of Deep Networks

Yoshua Bengio, P. Lamblin, D. Popovici, Hugo Larochelle

Cite Score: 79

AI summary

This paper empirically studies the greedy layer-wise unsupervised learning algorithm that Hinton et al. introduced for Deep Belief Networks (DBNs), which are built from stacked Restricted Boltzmann Machines (RBMs). It extends the approach to continuous-valued inputs and shows that unsupervised pre-training improves performance on supervised tasks through better weight initialization and high-level feature extraction, with results reported on the Abalone and Cotton datasets.

Main Contributions

  • Extended Deep Belief Networks (DBNs) and Restricted Boltzmann Machines (RBMs) to handle continuous input values.
  • Empirically validated the advantage of greedy layer-wise unsupervised learning for DBNs, showing it helps optimization by initializing weights near good local minima.
  • Demonstrated that the greedy layer-wise strategy initializes upper layers with better representations of relevant high-level abstractions, improving generalization.
  • Showed that initializing each layer as an auto-encoder yields results similar to initializing it as an RBM.
  • Introduced a mixed training criterion combining unsupervised and supervised objectives to handle input distributions whose structure reveals little about the target variable, significantly improving performance on financial datasets.
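
The greedy layer-wise strategy named in the contributions above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses the auto-encoder variant with tied weights, full-batch gradient descent on cross-entropy reconstruction error, and inputs scaled to [0, 1]. The function names (`train_autoencoder_layer`, `greedy_pretrain`), layer sizes, learning rate, and random toy data are all hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder_layer(data, n_hidden, rng, epochs=50, lr=0.5):
    """Fit one tied-weight sigmoid auto-encoder with batch gradient descent
    on cross-entropy reconstruction error. Returns (W, b_hidden)."""
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        h = sigmoid(data @ W + b_h)             # encode
        recon = sigmoid(h @ W.T + b_v)          # decode with the same weights
        d_recon = (recon - data) / len(data)    # gradient at decoder pre-activation
        d_h = (d_recon @ W) * h * (1.0 - h)     # gradient at encoder pre-activation
        grad_W = data.T @ d_h + d_recon.T @ h   # encoder + decoder contributions
        W -= lr * grad_W
        b_h -= lr * d_h.sum(axis=0)
        b_v -= lr * d_recon.sum(axis=0)
    return W, b_h

def greedy_pretrain(data, layer_sizes, seed=0):
    """Train layers one at a time; each new layer sees only the output of
    the already-trained layers below it."""
    rng = np.random.default_rng(seed)
    params, rep = [], data
    for n_hidden in layer_sizes:
        W, b = train_autoencoder_layer(rep, n_hidden, rng)
        params.append((W, b))
        rep = sigmoid(rep @ W + b)              # representation fed to the next layer
    return params, rep

# Toy usage: 64 random examples with 8 features in [0, 1), two hidden layers.
X = np.random.default_rng(1).random((64, 8))
params, top = greedy_pretrain(X, [6, 4])
```

The returned `params` would then serve as the initial weights of a supervised network that is fine-tuned by ordinary gradient descent; that fine-tuning step is omitted here.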

Abstract

Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
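
Each layer of a DBN is an RBM, commonly trained with contrastive divergence. The sketch below shows a single CD-1 update for an RBM with binary visible and hidden units, as one hedged illustration of the per-layer unsupervised step the abstract refers to. The function name `cd1_update`, the learning rate, and the toy data are assumptions made for the example; the continuous-input extension discussed in the paper is not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_v, b_h, rng, lr=0.1):
    """One CD-1 step on a batch v0 of binary visible vectors.
    Updates W, b_v, b_h in place and returns the mean squared
    reconstruction error (for monitoring only)."""
    p_h0 = sigmoid(v0 @ W + b_h)                         # P(h=1 | v0)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)   # sample hidden units
    p_v1 = sigmoid(h0 @ W.T + b_v)                       # P(v=1 | h0)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)   # sample reconstruction
    p_h1 = sigmoid(v1 @ W + b_h)                         # P(h=1 | v1)
    n = len(v0)
    # Positive-phase statistics minus negative-phase statistics.
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return float(np.mean((v0 - p_v1) ** 2))

# Toy usage: two binary patterns repeated, 6 visible and 3 hidden units.
rng = np.random.default_rng(0)
patterns = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
data = np.repeat(patterns, 10, axis=0)
W = rng.normal(0.0, 0.01, size=(6, 3))
b_v, b_h = np.zeros(6), np.zeros(3)
for _ in range(200):
    err = cd1_update(data, W, b_v, b_h, rng)
```

In a stacked setting, the hidden activations `sigmoid(data @ W + b_h)` of a trained RBM would become the training data for the next RBM.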

References [16]

  • Geoffrey Hinton, Ruslan Salakhutdinov, 2006 (cited by 37 papers in library)
  • Geoffrey E. Hinton, S. Osindero, Y. Teh, 2006 (cited by 43 papers in library)
  • Geoffrey Hinton, 2002 (cited by 23 papers in library)
  • Yoshua Bengio, Yann LeCun, 2007 (cited by 15 papers in library)
  • Geoffrey Hinton, Peter Dayan, B. Frey, R. Neal, 1995 (cited by 9 papers in library)
  • Gerald Tesauro, 1992 (cited by 3 papers in library)
  • Thierry Denoeux, 1996 (cited by 2 papers in library)
  • M. Welling, M. Rosen-Zvi, Geoffrey Hinton, 2005 (cited by 8 papers in library)
  • Yoshua Bengio, O. Delalleau, N. Le Roux, 2006 (cited by 7 papers in library)
  • S. E. Fahlman, C. Lebiere, 1989 (cited by 6 papers in library)
  • P. Utgoff, D. Stracuzzi, 2002 (cited by 5 papers in library)
  • J. Hastad, 1987 (cited by 3 papers in library)
  • E. Allender, 1996 (cited by 2 papers in library)
  • Yoshua Bengio, N. Le Roux, Pascal Vincent, O. Delalleau, P. Marcotte, 2006 (cited by 2 papers in library)
  • H. Chen, A. Murray, 2003 (cited by 1 paper in library)
  • J. Movellan, P. Mineiro, R. Williams, 2002 (cited by 1 paper in library)

Cited by: 33 papers in your library

Cites: 7 papers in your library

Read on June 27, 2025
