2009

The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training

Pascal Vincent

Cite Score

26

AI summary

This paper studies why deep architectures are hard to train and how unsupervised pre-training helps, using denoising auto-encoders on two datasets, Shapeset and MNIST. The main result is that pre-training acts as a regularizer: it improves generalization and makes training more robust to poor local minima, especially in deeper models.
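The greedy layer-wise pre-training mentioned in the summary can be illustrated with a minimal NumPy sketch. It is not the paper's code. The tied weights, masking noise, cross-entropy reconstruction loss, the hyperparameter values, and the function names `pretrain_dae_layer` and `pretrain_stack` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_dae_layer(X, n_hidden, rng, corruption=0.25, lr=0.1,
                       epochs=5, batch=20):
    """Train one tied-weight denoising auto-encoder on X (values in [0, 1]).

    All hyperparameter defaults are illustrative assumptions.
    """
    n_in = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_in, n_hidden)) / np.sqrt(n_in)
    b_h = np.zeros(n_hidden)   # hidden (encoder) bias
    b_v = np.zeros(n_in)       # visible (decoder) bias
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch):
            x = X[order[start:start + batch]]
            # Masking noise: zero out a random subset of the inputs.
            mask = rng.random(x.shape) >= corruption
            x_tilde = x * mask
            h = sigmoid(x_tilde @ W + b_h)   # encode the corrupted input
            r = sigmoid(h @ W.T + b_v)       # reconstruct with tied weights
            # Cross-entropy loss against the CLEAN input x:
            # gradient w.r.t. the decoder pre-activation is (r - x).
            d_r = (r - x) / len(x)
            d_h = (d_r @ W) * h * (1.0 - h)
            # W appears in both encoder and decoder, so both terms add up.
            grad_W = x_tilde.T @ d_h + d_r.T @ h
            W -= lr * grad_W
            b_h -= lr * d_h.sum(axis=0)
            b_v -= lr * d_r.sum(axis=0)
    return W, b_h

def pretrain_stack(X, layer_sizes, seed=0):
    """Greedily pre-train one layer at a time, bottom to top."""
    rng = np.random.default_rng(seed)
    params, rep = [], X
    for n_hidden in layer_sizes:
        W, b = pretrain_dae_layer(rep, n_hidden, rng)
        params.append((W, b))
        # The clean (uncorrupted) code becomes the next layer's input.
        rep = sigmoid(rep @ W + b)
    return params

# Toy usage on random binary data (a stand-in, not Shapeset or MNIST).
X = (np.random.default_rng(1).random((100, 32)) > 0.5).astype(float)
params = pretrain_stack(X, [16, 8])
```

In this setup, the returned weights and biases would initialize the hidden layers of a supervised network before fine-tuning. The baseline the summary contrasts it with would start those same layers from random values.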

Main Contributions

  • Demonstrates pre-training adds robustness to deep architectures and improves generalization.
  • Shows that increasing depth without pre-training increases the probability of finding poor local minima.
  • Finds that pre-training acts as a regularizer.
  • Indicates that pre-training is more effective for lower layers than for higher layers.
  • Visualizes the error landscape and approximates, in function space, the solutions learned by deep architectures, confirming that the two initialization strategies (random initialization and unsupervised pre-training) lead to qualitatively different solutions.

Abstract

Whereas theoretical work suggests that deep architectures might be more efficient at representing highly-varying functions, training deep architectures was unsuccessful until the recent advent of algorithms based on unsupervised pre-training. Even though these new algorithms have enabled training deep models, many questions remain as to the nature of this difficult learning problem. Answering these questions is important if learning in deep architectures is to be further improved. We attempt to shed some light on these questions through extensive simulations. The experiments confirm and clarify the advantage of unsupervised pre-training. They demonstrate the robustness of the training procedure with respect to the random initialization, the positive effect of pre-training in terms of optimization and its role as a regularizer. We empirically show the influence of pre-training with respect to architecture depth, model capacity, and number of training examples.



