Papperoni

1988

Improving the Convergence of Back-Propagation Learning With Second-Order Methods

S. Becker, Yann Lecun

citations

Cite Score

AI summary

This paper introduces a computationally efficient algorithm that incorporates approximate curvature information in back-propagation learning, using an extension of the back-propagation algorithm. The method achieves faster learning compared to standard back-propagation on a random four-way classification problem.

Main Contributions

Introduces a computationally efficient algorithm for incorporating approximate curvature information in back-propagation learning.
Explores the application of Newton's and other optimization methods to improve the convergence of back-propagation.
Describes a locally computable algorithm for approximating curvature in back-propagation learning.
Presents simulation results on a random four-way classification problem, demonstrating faster learning compared to standard back-propagation.
Analyzes the norms and eigenvalues of the Hessian for some simple problems.

Abstract

Back-propagation has proven to be a robust algorithm for difficult connectionist learning problems. However, as with many gradient based optimization methods, it converges slowly. We describe an extension of the back-propagation algorithm which uses a simple approximation to the second derivative terms. This method is shown to reduce the required number of iterations to learn a random classification problem, with only a small increase in the complexity of each iteration. The back-propagation learning algorithm for multilayer connectionist networks performs a gradient descent search in weight space for a minimum of some cost function C (which is frequently the mean squared error between actual and desired outputs). A general drawback of gradient-based numerical optimization methods is their slow convergence. In connectionist learning problems in particular, one typically starts a long way from the solution, and spends most of the time oscillating around "ravines" in weight space, because the gradient is sharp in some directions but shallow in others. Consequently, the learning parameters tend to be selected in an ad-hoc manner, according to the particular problem and the current performance of the network. Many numerical optimization methods, e.g. Newton's method, use the second derivative in addition to the gradient to determine the next step direction and step size, and converge quadratically when close to a solution of a convex function. We explore the application of Newton's and other optimization methods towards improving the convergence of back-propagation. We then describe a computationally efficient, locally computable algorithm for incorporating approximate curvature information in back-propagation learning. Finally, we present results of simulations on a random four-way classification problem where this method is shown to learn somewhat faster than standard back-propagation.

Citation Graph

Loading graph...

References [11]

Sort:

Filter:

[1]Learning Internal Representations by Error Propagation

D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986

46 papers in library cite

Google Scholar

I expected very little of this, but was so good in explaining concepts! Very good read. It gets a bit boring when it starts explaining things by the end of the chapter, but good nonetheless.

[2]Improving the Convergence of Back-Propagation Learning With Second-Order Methods

S. Becker, Yann Lecun - 1988

9 papers in library cite

Google Scholar

Surprisingly good. I thought this would be very math-heavy but in the end it's very easy to understand. Sadly results are underwhelming

[3]Experiments on Learning by Back-Propagation

D. C. Plaut, S. J. Nowlan, Geoffrey E. Hinton - 1986

5 papers in library cite

Google Scholar

45 Pages. Seems alright, but not sure

[4]Modèles Connexionnistes De l'Apprentissage

Yann Lecun - 1987

9 papers in library cite

Google Scholar

[5]Sn: A Simulator for Connectionist Models

L. Y. Bottou, Yann Lecun - 1988

5 papers in library cite

Google Scholar

[6]Numerical Methods for Unconstrained Optimization and Nonlinear Equations

J. Dennis, R. Schnabel - 1983

3 papers in library cite

Google Scholar

[7]Practical Optimization

P. E. Gill, W. Murray, M. H. Wright - 1981

3 papers in library cite

Google Scholar

[8]Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Non-Linear Optimization

R. Watrous - 1987

2 papers in library cite

Google Scholar

[9]Connectionist Speech Recognition Using the Temporal Flow Model

R. Watrous - 1988

1 paper in library cites

Google Scholar

[10]Learned Phonetic Discrimination Using Connectionist Networks

R. Watrous, Lokendra Shastri, A. Waibel - 1987

1 paper in library cites

Google Scholar

[11]Matrix Eigensystem Routines - EISPACK Guide

B. T. Smith, J. M. Boyle, B. S. Garbow, Y. Ikebe, V. C. Klema, C. B. Moler - 1974

1 paper in library cites

Google Scholar

Cited by

papers in your library

Cites

papers in your library

Read

on June 24, 2025

Surprisingly good. I thought this would be very math-heavy but in the end it's very easy to understand. Sadly results are underwhelming