1988
Cite Score
26
AI summary
This paper introduces a computationally efficient algorithm that incorporates approximate curvature information in back-propagation learning, using an extension of the back-propagation algorithm. The method achieves faster learning compared to standard back-propagation on a random four-way classification problem.
Main Contributions
Abstract
Back-propagation has proven to be a robust algorithm for difficult connectionist learning problems. However, as with many gradient based optimization methods, it converges slowly. We describe an extension of the back-propagation algorithm which uses a simple approximation to the second derivative terms. This method is shown to reduce the required number of iterations to learn a random classification problem, with only a small increase in the complexity of each iteration. The back-propagation learning algorithm for multilayer connectionist networks performs a gradient descent search in weight space for a minimum of some cost function C (which is frequently the mean squared error between actual and desired outputs). A general drawback of gradient-based numerical optimization methods is their slow convergence. In connectionist learning problems in particular, one typically starts a long way from the solution, and spends most of the time oscillating around "ravines" in weight space, because the gradient is sharp in some directions but shallow in others. Consequently, the learning parameters tend to be selected in an ad-hoc manner, according to the particular problem and the current performance of the network. Many numerical optimization methods, e.g. Newton's method, use the second derivative in addition to the gradient to determine the next step direction and step size, and converge quadratically when close to a solution of a convex function. We explore the application of Newton's and other optimization methods towards improving the convergence of back-propagation. We then describe a computationally efficient, locally computable algorithm for incorporating approximate curvature information in back-propagation learning. Finally, we present results of simulations on a random four-way classification problem where this method is shown to learn somewhat faster than standard back-propagation.
Citation Graph
References [11]
D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986
46 papers in library cite
S. Becker, Yann Lecun - 1988
9 papers in library cite
D. C. Plaut, S. J. Nowlan, Geoffrey E. Hinton - 1986
5 papers in library cite
Yann Lecun - 1987
9 papers in library cite
L. Y. Bottou, Yann Lecun - 1988
5 papers in library cite
J. Dennis, R. Schnabel - 1983
3 papers in library cite
P. E. Gill, W. Murray, M. H. Wright - 1981
3 papers in library cite
R. Watrous - 1987
2 papers in library cite
R. Watrous - 1988
1 paper in library cites
R. Watrous, Lokendra Shastri, A. Waibel - 1987
1 paper in library cites
B. T. Smith, J. M. Boyle, B. S. Garbow, Y. Ikebe, V. C. Klema, C. B. Moler - 1974
1 paper in library cites
Cited by
9
papers in your library
Cites
2
papers in your library
Read
on June 24, 2025
Your review
Tags
Paper Aliases
No aliases