2010

Deep Learning via Hessian-Free Optimization

James Martens

AI summary

This paper introduces a second-order, Hessian-free optimization method and applies it to training deep auto-encoders. Without pre-training, it obtains results superior to those reported by Hinton & Salakhutdinov (2006) on the same tasks, and it scales well to very large datasets. The paper also discusses pathological curvature as a possible explanation for the difficulty of deep learning and argues that second-order methods, this one in particular, handle it effectively.

Main Contributions

Introduces a 2nd-order optimization method based on the Hessian-free approach.
Without pre-training, achieves results superior to those reported by Hinton & Salakhutdinov (2006) on the same deep auto-encoder tasks.
Scales effectively to very large datasets.
Discusses pathological curvature as a possible explanation for the difficulty of deep learning.
Provides a practical, easy-to-use optimization method that is not limited to auto-encoders or any specific model class.

Abstract

We develop a 2nd-order optimization method based on the "Hessian-free" approach, and apply it to training deep auto-encoders. Without using pre-training, we obtain results superior to those reported by Hinton & Salakhutdinov (2006) on the same tasks they considered. Our method is practical, easy to use, scales nicely to very large datasets, and isn't limited in applicability to autoencoders, or any specific model class. We also discuss the issue of "pathological curvature" as a possible explanation for the difficulty of deep-learning and how 2nd-order optimization, and our method in particular, effectively deals with it.
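
To make the two ideas in the abstract concrete, here is a minimal sketch, not the paper's algorithm. It assumes a toy 2-D quadratic loss with very different curvature along its two axes (a stand-in for "pathological curvature"), approximates Hessian-vector products by finite differences of the gradient (Pearlmutter's exact method from reference [6] would replace this in practice), and runs plain conjugate gradient with no damping or other safeguards. All function names and the toy problem are hypothetical and chosen only for illustration.

```python
import numpy as np

def hessian_vector_product(grad_fn, x, v, eps=1e-4):
    """Approximate H @ v by a central difference of the gradient.
    The Hessian H itself is never formed, hence 'Hessian-free'."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2 * eps)

def conjugate_gradient(hvp, g, iters=50, tol=1e-10):
    """Approximately solve H d = -g using only Hessian-vector products."""
    d = np.zeros_like(g)
    r = -g.copy()          # residual of H d = -g at d = 0
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        if rs < tol:
            break
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        d = d + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

# Toy loss f(x) = 0.5 x^T A x - b^T x with curvature 1000 along one axis
# and 1 along the other (assumed example, not taken from the paper).
A = np.diag([1000.0, 1.0])
b = np.array([1.0, 1.0])
grad_fn = lambda x: A @ x - b
x0 = np.zeros(2)

# Gradient descent: the step size is capped by the high-curvature axis,
# so progress along the low-curvature axis is very slow.
x_gd = x0.copy()
for _ in range(100):
    x_gd = x_gd - 1e-3 * grad_fn(x_gd)

# One Hessian-free step: CG on the local quadratic model.
hvp = lambda v: hessian_vector_product(grad_fn, x0, v)
x_hf = x0 + conjugate_gradient(hvp, grad_fn(x0))
```

On this toy problem the minimizer is `[0.001, 1.0]`. After 100 gradient steps, `x_gd` is still far from the optimum along the low-curvature axis, while the single CG-based step lands on it, because the step is rescaled by curvature in each direction. A real network loss is not quadratic, so a practical method needs more machinery than this sketch shows.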

References [9]

[1] Reducing the Dimensionality of Data With Neural Networks

Geoffrey Hinton, Ruslan Salakhutdinov - 2006

37 papers in library cite

I didn't like the way this is written; it's very hard to understand without a ton of background knowledge. But hey, it's one of the first deep learning models!

[2] Efficient BackProp

Yann LeCun, Léon Bottou, G. B. Orr, Klaus-Robert Müller - 1998

20 papers in library cite

The first half is very, very good. The remainder is very boring.

[3] Greedy Layer-Wise Training of Deep Networks

Yoshua Bengio, P. Lamblin, D. Popovici, Hugo Larochelle - 2006

33 papers in library cite

Bengio is perfect. This is everything that Hinton's paper hoped to be: very well explained, and it ties back to real use cases (not just "hey, the math works and it reduced the score").

[4] Why Does Unsupervised Pre-Training Help Deep Learning?

Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, Samy Bengio - 2010

12 papers in library cite

Good paper, easy to follow, and it sheds some light on the pre-training stuff (layer-by-layer). I just wish it weren't so long. It's a chore.

[5] Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent

N. N. Schraudolph - 2002

4 papers in library cite

[6] Fast Exact Multiplication by the Hessian

B. Pearlmutter - 1994

4 papers in library cite

[7] Numerical Optimization

J. Nocedal, S. Wright - 1999

2 papers in library cite

[8] Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons

S. Amari, H. Park, K. Fukumizu - 2000

1 paper in library cites

[9] Second-Order Stagewise Backpropagation for Hessian-Matrix Analyses and Investigation of Negative Curvature

E. Mizutani, S. Dreyfus - 2008

1 paper in library cites

Read on July 31, 2025

This paper is surprisingly good! When I first read the Hessian-Free optimization part, I thought "ugh, this is going to be full of math", but in the end it was very, very enjoyable. The only reason I wouldn't give it a 5 is that it doesn't seem to have had that much impact.