2010

Why Does Unsupervised Pre-Training Help Deep Learning?

Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, Samy Bengio

AI summary

This paper investigates why unsupervised pre-training helps deep learning. It argues that pre-training acts as a regularizer: it steers optimization toward regions of parameter space that generalize better and reduces the variance of the results across random initializations. The claims are supported by experiments on MNIST, InfiniteMNIST, and Shapeset.

Main Contributions

  • Demonstrates that unsupervised pre-training acts as a regularizer in deep learning.
  • Shows that pre-training guides learning towards basins of attraction with better generalization.
  • Empirically validates the influence of pre-training on architecture depth, model capacity, and number of training examples.
  • Reports experiments on the MNIST, InfiniteMNIST, and Shapeset data sets.
  • Finds that pre-training improves generalization and robustness to initialization, but can hurt performance with smaller layers.

Abstract

Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants, with impressive results obtained in several areas, mostly on vision and language data sets. The best results obtained on supervised learning tasks involve an unsupervised learning component, usually in an unsupervised pre-training phase. Even though these new algorithms have enabled training deep models, many questions remain as to the nature of this difficult learning problem. The main question investigated here is the following: how does unsupervised pre-training work? Answering this question is important if learning in deep architectures is to be further improved. We propose several explanatory hypotheses and test them through extensive simulations. We empirically show the influence of pre-training with respect to architecture depth, model capacity, and number of training examples. The experiments confirm and clarify the advantage of unsupervised pre-training. The results suggest that unsupervised pre-training guides the learning towards basins of attraction of minima that support better generalization from the training data set; the evidence from these results supports a regularization explanation for the effect of pre-training.
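
To make the "unsupervised pre-training phase" concrete, here is a minimal sketch of greedy layer-wise pre-training with a stack of tied-weight denoising auto-encoders, one of the auto-encoder variants the abstract alludes to. It is an illustration under stated assumptions, not the authors' code: the function names (pretrain_layer, pretrain_stack), the toy data, and all hyperparameters (learning rate, corruption level, layer sizes) are hypothetical choices.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def encode(x, W, b):
    """Hidden code h = sigmoid(W x + b)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def pretrain_layer(data, n_hidden, epochs=50, lr=0.5, corruption=0.25, seed=0):
    """Train one tied-weight denoising auto-encoder with SGD; return (W, b)."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden  # hidden-layer bias
    c = [0.0] * n_in      # reconstruction bias
    for _ in range(epochs):
        for x in data:
            # Corrupt the input by zeroing a random subset of its components.
            x_tilde = [0.0 if rng.random() < corruption else xi for xi in x]
            h = encode(x_tilde, W, b)
            # Decode with the transposed encoder weights (tied weights).
            r = [sigmoid(sum(W[j][i] * h[j] for j in range(n_hidden)) + c[i])
                 for i in range(n_in)]
            # Cross-entropy against the clean input: gradient w.r.t. the
            # output pre-activation is (reconstruction - target).
            d_out = [ri - xi for ri, xi in zip(r, x)]
            d_hid = [h[j] * (1.0 - h[j])
                     * sum(W[j][i] * d_out[i] for i in range(n_in))
                     for j in range(n_hidden)]
            for j in range(n_hidden):
                for i in range(n_in):
                    # Decoder term plus encoder term, since W is shared.
                    W[j][i] -= lr * (d_out[i] * h[j] + d_hid[j] * x_tilde[i])
                b[j] -= lr * d_hid[j]
            for i in range(n_in):
                c[i] -= lr * d_out[i]
    return W, b


def pretrain_stack(data, layer_sizes, **kwargs):
    """Greedy layer-wise pre-training: each layer learns from the codes below it."""
    params, inputs = [], data
    for n_hidden in layer_sizes:
        W, b = pretrain_layer(inputs, n_hidden, **kwargs)
        params.append((W, b))
        inputs = [encode(x, W, b) for x in inputs]
    return params


if __name__ == "__main__":
    toy = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]] * 2  # hypothetical data
    stack = pretrain_stack(toy, layer_sizes=[3, 2])
    print([(len(W), len(W[0])) for W, _ in stack])  # [(3, 4), (2, 3)]
```

In a full pipeline, the returned (W, b) pairs would serve as the initial weights of a supervised network that is then fine-tuned on labeled data. The comparison studied in the paper is between this kind of initialization and a purely random one.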
