Papperoni

2009

The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training

Pascal Vincent

citations

Cite Score: 26

AI summary

This paper explores deep learning architectures and the advantages of unsupervised pre-training, using denoising auto-encoders and two datasets: Shapeset and MNIST. The main result is that pre-training acts as a regularizer, improving generalization and robustness to poor local minima, especially for deeper models.

Main Contributions

Demonstrates pre-training adds robustness to deep architectures and improves generalization.
Shows that increasing depth without pre-training increases the probability of finding poor local minima.
Finds that pre-training acts as a regularizer.
Indicates that pre-training is more effective for lower layers than for higher layers.
Visualizes the error landscape and provides a function space approximation to the solutions learned by deep architectures, confirming that the solutions corresponding to the two initialization strategies are qualitatively different.
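
The pre-training summarized above can be illustrated with a minimal NumPy sketch of greedy layer-wise training with denoising auto-encoders. This is not the authors' code: the tied weights, masking noise, cross-entropy reconstruction loss, full-batch updates, and every hyperparameter and function name (`pretrain_dae_layer`, `greedy_pretrain`) are assumptions chosen for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_dae_layer(x, n_hidden, noise=0.25, lr=0.1, epochs=50, rng=None):
    """Train one denoising auto-encoder (tied weights, assumed setup).

    Returns the encoder weights, the hidden bias, and the loss per epoch.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, n_in = x.shape
    W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b = np.zeros(n_hidden)  # hidden (encoder) bias
    c = np.zeros(n_in)      # reconstruction (decoder) bias
    losses = []
    for _ in range(epochs):
        # Masking noise: zero out a random subset of the inputs.
        x_tilde = x * (rng.random(x.shape) > noise)
        h = sigmoid(x_tilde @ W + b)
        r = sigmoid(h @ W.T + c)
        # Cross-entropy between the reconstruction and the CLEAN input.
        eps = 1e-9
        loss = -np.mean(np.sum(x * np.log(r + eps)
                               + (1 - x) * np.log(1 - r + eps), axis=1))
        losses.append(loss)
        # Backpropagation through decoder and encoder.
        d_r = (r - x) / n                 # gradient at decoder pre-activation
        d_h = (d_r @ W) * h * (1 - h)     # gradient at encoder pre-activation
        grad_W = x_tilde.T @ d_h + d_r.T @ h  # tied weights: both paths add up
        W -= lr * grad_W
        b -= lr * d_h.sum(axis=0)
        c -= lr * d_r.sum(axis=0)
    return W, b, losses

def greedy_pretrain(x, layer_sizes, **kwargs):
    """Pre-train layers one at a time, bottom-up."""
    params, h = [], x
    for n_hidden in layer_sizes:
        W, b, _ = pretrain_dae_layer(h, n_hidden, **kwargs)
        params.append((W, b))
        h = sigmoid(h @ W + b)  # clean encoding becomes the next layer's input
    return params
```

In this sketch the returned `(W, b)` pairs would serve as the initialization for a supervised network that is then fine-tuned with labels, in place of a purely random initialization.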

Abstract

Whereas theoretical work suggests that deep architectures might be more efficient at representing highly-varying functions, training deep architectures was unsuccessful until the recent advent of algorithms based on unsupervised pre-training. Even though these new algorithms have enabled training deep models, many questions remain as to the nature of this difficult learning problem. Answering these questions is important if learning in deep architectures is to be further improved. We attempt to shed some light on these questions through extensive simulations. The experiments confirm and clarify the advantage of unsupervised pre-training. They demonstrate the robustness of the training procedure with respect to the random initialization, the positive effect of pre-training in terms of optimization and its role as a regularizer. We empirically show the influence of pre-training with respect to architecture depth, model capacity, and number of training examples.

References [17]

[1]Reducing the Dimensionality of Data With Neural Networks

Geoffrey Hinton, Ruslan Salakhutdinov - 2006

37 papers in library cite

I didn't like the way this is written, very hard to understand without a ton of background knowledge. But hey, it's the first deep learning model!

[2]A Fast Learning Algorithm for Deep Belief Nets

Geoffrey E. Hinton, S. Osindero, Y. Teh - 2006

43 papers in library cite

The paper does not explain anything. It just throws the idea and a bunch of math, but doesn't really care to explain the concepts.

[3]Learning Deep Architectures for AI

Yoshua Bengio - 2009

25 papers in library cite

It's a nice overview. Some sections get very theoretical, but the first half is very good and I feel that it does a waaaay better job of explaining RBMs and DBNs than other papers. This feels like Bengio is taking your hand and saying "if you don't know what's going on, here you go, everything you need to know to jump into the deep nets train"

[4]Extracting and Composing Robust Features With Denoising Autoencoders

Pascal Vincent, Hugo Larochelle, Yoshua Bengio, Pierre-Antoine Manzagol - 2008

25 papers in library cite

I am *so* glad we found an alternative to DBNs. Also, introduced the idea of denoising which is nice.

[5]A Unified Architecture for Natural Language Processing: Deep Neural Networks With Multitask Learning

Ronan Collobert, Jason Weston - 2008

32 papers in library cite

Really did not add much to the game. I think this was more of a small perf. improvement over other existing things and set a few methodological standards. Maybe main contribution is Multitask Learning + Deep learning

[6]Greedy Layer-Wise Training of Deep Networks

Yoshua Bengio, P. Lamblin, D. Popovici, Hugo Larochelle - 2006

33 papers in library cite

Bengio is perfect. This is everything that Hinton's paper hoped to be. Very well explained, and also tying back to real use cases (not just "hey, the math works and it reduced the score")

[7]Efficient Learning of Sparse Representations With an Energy-Based Model

Marc'Aurelio Ranzato, C. Poultney, S. Chopra, Yann LeCun - 2006

20 papers in library cite

It's ok. Not really good, but alright.

[8]An Empirical Evaluation of Deep Architectures on Problems With Many Factors of Variation

Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, Yoshua Bengio - 2007

13 papers in library cite

Good paper showing promising results for Deep Learning. Nothing amazing but good nonetheless

[9]Deep Learning via Semi-Supervised Embedding

Jason Weston, F. Ratle, Ronan Collobert - 2008

10 papers in library cite

It's a good paper and a nice idea, but it seems overly complicated and I don't think it's widely used... (PS: this was republished in 2012)

[10]Sparse Feature Learning for Deep Belief Networks

Marc'Aurelio Ranzato, Y. Boureau, Yann LeCun - 2008

12 papers in library cite

[11]Sparse Deep Belief Net Model for Visual Area V2

Honglak Lee, C. Ekanadham, A. Ng - 2008

10 papers in library cite

[12]Unsupervised Learning of Distributions on Binary Vectors Using Two Layer Networks

Y. Freund, D. Haussler - 1992

8 papers in library cite

[13]On the Power of Small-Depth Threshold Circuits

J. Hastad, M. Goldmann - 1991

7 papers in library cite

[14]Justifying and Generalizing Contrastive Divergence

Yoshua Bengio, O. Delalleau - 2007

5 papers in library cite

[15]Restricted Boltzmann Machines for Collaborative Filtering

Ruslan Salakhutdinov, A. Mnih, Geoffrey E. Hinton - 2007

5 papers in library cite

[16]Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes

Ruslan Salakhutdinov, Geoffrey E. Hinton - 2008

4 papers in library cite

[17]Visualizing High-Dimensional Data Using t-SNE

Laurens van der Maaten, Geoffrey E. Hinton - 2008

2 papers in library cite

Cited by 5 papers in your library

Cites 10 papers in your library

Read on June 28, 2025

Very nice analysis of why unsupervised pre-training works!
