1988

Improving the Convergence of Back-Propagation Learning With Second-Order Methods

S. Becker, Yann Lecun

citations

Cite Score

26

AI summary

This paper introduces a computationally efficient algorithm that incorporates approximate curvature information in back-propagation learning, using an extension of the back-propagation algorithm. The method achieves faster learning compared to standard back-propagation on a random four-way classification problem.

Main Contributions

  • Introduces a computationally efficient algorithm for incorporating approximate curvature information in back-propagation learning.
  • Explores the application of Newton's and other optimization methods to improve the convergence of back-propagation.
  • Describes a locally computable algorithm for approximating curvature in back-propagation learning.
  • Presents simulation results on a random four-way classification problem, demonstrating faster learning compared to standard back-propagation.
  • Analyzes the norms and eigenvalues of the Hessian for some simple problems.

Abstract

Back-propagation has proven to be a robust algorithm for difficult connectionist learning problems. However, as with many gradient based optimization methods, it converges slowly. We describe an extension of the back-propagation algorithm which uses a simple approximation to the second derivative terms. This method is shown to reduce the required number of iterations to learn a random classification problem, with only a small increase in the complexity of each iteration. The back-propagation learning algorithm for multilayer connectionist networks performs a gradient descent search in weight space for a minimum of some cost function C (which is frequently the mean squared error between actual and desired outputs). A general drawback of gradient-based numerical optimization methods is their slow convergence. In connectionist learning problems in particular, one typically starts a long way from the solution, and spends most of the time oscillating around "ravines" in weight space, because the gradient is sharp in some directions but shallow in others. Consequently, the learning parameters tend to be selected in an ad-hoc manner, according to the particular problem and the current performance of the network. Many numerical optimization methods, e.g. Newton's method, use the second derivative in addition to the gradient to determine the next step direction and step size, and converge quadratically when close to a solution of a convex function. We explore the application of Newton's and other optimization methods towards improving the convergence of back-propagation. We then describe a computationally efficient, locally computable algorithm for incorporating approximate curvature information in back-propagation learning. Finally, we present results of simulations on a random four-way classification problem where this method is shown to learn somewhat faster than standard back-propagation.

Citation Graph

Loading graph...

References [11]

Sort:
Filter:

D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986

46 papers in library cite

S. Becker, Yann Lecun - 1988

9 papers in library cite

D. C. Plaut, S. J. Nowlan, Geoffrey E. Hinton - 1986

5 papers in library cite

Yann Lecun - 1987

9 papers in library cite

L. Y. Bottou, Yann Lecun - 1988

5 papers in library cite

J. Dennis, R. Schnabel - 1983

3 papers in library cite

P. E. Gill, W. Murray, M. H. Wright - 1981

3 papers in library cite

R. Watrous - 1988

1 paper in library cites

R. Watrous, Lokendra Shastri, A. Waibel - 1987

1 paper in library cites

B. T. Smith, J. M. Boyle, B. S. Garbow, Y. Ikebe, V. C. Klema, C. B. Moler - 1974

1 paper in library cites

Cited by

9

papers in your library

Cites

2

papers in your library

Read

on June 24, 2025

Your review

Tags

Paper Aliases

No aliases