2012

Deep Learning Made Easier by Linear Transformations in Perceptrons

Tapani Raiko, Harri Valpola, Yann LeCun

citations

Cite Score

46

AI summary

This paper transforms the output of each hidden neuron in a multi-layer perceptron to have zero mean and zero slope, and models the linear dependencies with separate shortcut connections. With these transformations, basic stochastic gradient learning converges faster and generalizes better on MNIST classification and autoencoder tasks.

Main Contributions

  • Proposed a novel transformation for MLP hidden neuron outputs to achieve zero mean and zero slope.
  • Introduced separate shortcut connections to model linear dependencies, aiming to decouple linear and nonlinear learning.
  • Argued theoretically that these transformations make the Fisher information matrix closer to diagonal, which brings the standard gradient closer to the natural gradient.
  • Demonstrated that basic stochastic gradient learning with transformations becomes competitive with state-of-the-art algorithms in speed and generalization.
  • Experimentally validated the method on handwritten digit classification with a 3-layer network and on learning a low-dimensional image representation with a 6-layer auto-encoder, with and without regularization.
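
The shortcut idea in the contributions above can be written out as a short sketch. The notation here ($A$, $B$, $C$, $c$, $\alpha_i$, $\beta_i$) and the choice of $\tanh$ as the base nonlinearity are assumptions made for illustration; this page does not state them.

```latex
% Assumed notation: B = input-to-hidden weights, A = hidden-to-output weights,
% C = shortcut (input-to-output) weights, c = output bias.
\begin{align}
y &= A\, f(Bx) + Cx + c, \\
f_i(z_i) &= \tanh(z_i) + \alpha_i z_i + \beta_i, \qquad z = Bx .
\end{align}
% Changing (alpha, beta) to (alpha', beta') leaves y unchanged for every x if
\begin{align}
C' &= C - A\,\mathrm{diag}(\alpha' - \alpha)\,B, \\
c' &= c - A\,(\beta' - \beta).
\end{align}
```

Under these assumptions, any linear term $\alpha_i z_i$ added to a hidden unit is absorbed by the shortcut weights, so adjusting $\alpha$ and $\beta$ changes the parameterization but not the overall input-output mapping.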

Abstract

We transform the outputs of each hidden neuron in a multi-layer perceptron network to be zero mean and zero slope, and use separate shortcut connections to model the linear dependencies instead. This transformation aims at separating the problems of learning the linear and nonlinear parts of the whole input-output mapping, which has many benefits. We study the theoretical properties of the transformations by noting that they make the Fisher information matrix closer to a diagonal matrix, and thus standard gradient closer to the natural gradient. We experimentally confirm the usefulness of the transformations by noting that they make basic stochastic gradient learning competitive with state-of-the-art learning algorithms in speed, and that they seem also to help find solutions that generalize better. The experiments include both classification of handwritten digits with a 3-layer network and learning a low-dimensional representation for images by using a 6-layer auto-encoder network. The transformations were beneficial in all cases, with and without regularization.
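
A minimal sketch of what "zero mean and zero slope" can look like for a single hidden unit, assuming a tanh nonlinearity with an added linear term and offset. The function names, the parameters `alpha` and `beta`, and the sample values are hypothetical and chosen for illustration only; the abstract does not specify them.

```python
import math


def zero_mean_zero_slope_params(pre_activations):
    """Return (alpha, beta) such that f(z) = tanh(z) + alpha * z + beta has
    zero average output and zero average slope over the given samples.

    Assumption: tanh is the base nonlinearity (not stated in the abstract).
    """
    n = len(pre_activations)
    # The slope of tanh at z is 1 - tanh(z)^2; alpha cancels its average.
    alpha = -sum(1.0 - math.tanh(z) ** 2 for z in pre_activations) / n
    # beta cancels whatever average output remains after adding alpha * z.
    beta = -sum(math.tanh(z) + alpha * z for z in pre_activations) / n
    return alpha, beta


def transformed(z, alpha, beta):
    """Transformed hidden-unit output."""
    return math.tanh(z) + alpha * z + beta


def transformed_slope(z, alpha):
    """Derivative of the transformed output with respect to z."""
    return 1.0 - math.tanh(z) ** 2 + alpha


# Hypothetical pre-activations of one hidden unit over a small batch.
zs = [-1.5, -0.2, 0.3, 0.9, 2.0]
alpha, beta = zero_mean_zero_slope_params(zs)
mean_output = sum(transformed(z, alpha, beta) for z in zs) / len(zs)
mean_slope = sum(transformed_slope(z, alpha) for z in zs) / len(zs)
```

Here `mean_output` and `mean_slope` both come out at zero up to floating-point error. The linear part that `alpha` removes from the hidden unit is what the separate shortcut connections are left to model.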
