1986

Two Problems With Backpropagation and Other Steepest-Descent Learning Procedures for Networks

Richard S. Sutton

citations

Cite Score

23

AI summary

This paper identifies two problems with steepest-descent learning procedures such as backpropagation: slow progress on performance surfaces that contain ravines, and high interference between learning with different patterns. It then briefly considers alternative descent procedures that might improve on them.

Main Contributions

  • Identifies and analyzes two problems (ravines and cross-pattern interference) that slow down learning in backpropagation and other steepest-descent network learning procedures.
  • Explains how ravines, which are common in network performance surfaces, significantly impede steepest-descent methods.
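  • Details how steepest descent leads to high interference between learning with different patterns: the units found most useful so far are the ones most likely to be changed to handle new patterns, so existing features are altered rather than new ones created.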
  • Discusses alternative descent procedures to mitigate these problems, such as methods that distort the surface or adjust individual learning rates.
  • Suggests that knowledge from general optimization can be carried over to improve network learning algorithms significantly.
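
The ravine problem and the per-weight learning-rate remedy mentioned above can be illustrated with a minimal sketch. It is not taken from the paper: the two-weight quadratic surface, the curvature values, and the names `steepest_descent`, `ravine_error`, and `ravine_grad` are all assumptions chosen for illustration. One direction curves 100 times more sharply than the other, so a single global rate has to stay small to remain stable in the steep direction and therefore makes slow progress along the ravine floor.

```python
# Hypothetical ravine: E(w) = 0.5 * (1 * w0^2 + 100 * w1^2).
# The curvatures are illustrative assumptions, not values from the paper.
CURVATURES = [1.0, 100.0]


def ravine_error(w):
    """Error on the assumed quadratic surface."""
    return 0.5 * sum(c * wi * wi for c, wi in zip(CURVATURES, w))


def ravine_grad(w):
    """Gradient of ravine_error with respect to each weight."""
    return [c * wi for c, wi in zip(CURVATURES, w)]


def steepest_descent(grad, w, rates, steps):
    """Step each weight downhill by its own rate times its gradient component."""
    for _ in range(steps):
        g = grad(w)
        w = [wi - r * gi for wi, r, gi in zip(w, rates, g)]
    return w


start = [1.0, 1.0]

# One global rate: it must stay below 2/100 to be stable in the steep
# direction, which makes progress along the shallow direction slow.
single = steepest_descent(ravine_grad, start, [0.015, 0.015], steps=100)

# Individual rates, here simply scaled inversely to the known curvature.
# A real procedure would have to estimate or adapt these rates.
per_weight = steepest_descent(
    ravine_grad, start, [0.9 / c for c in CURVATURES], steps=100
)

print("single rate error:    ", ravine_error(single))
print("per-weight rate error:", ravine_error(per_weight))
```

In this sketch the per-weight rates are set from curvatures that are known in advance, which is only possible because the surface is a toy; it shows why individual rates help, not how to obtain them.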

Abstract

This article contributes to the theory of network learning procedures by identifying and analyzing two problems with the backpropagation procedure of Rumelhart, Hinton, and Williams (1985) that may slow its learning. Both problems are due to backpropagation's being a gradient- or steepest-descent method in the weight space of the network. The first problem is that steepest descent is a particularly poor descent procedure for surfaces containing ravines—places which curve more sharply in some directions than others—and such ravines are common and pronounced in performance surfaces arising from networks. The second problem is that steepest descent results in a high level of interference between learning with different patterns, because those units that have so far been found most useful are also those most likely to be changed to handle new patterns. The same problems probably also arise with the Boltzmann machine learning procedure (Ackley, Hinton and Sejnowski, 1985) and with reinforcement learning procedures (Barto and Anderson, 1985), as these are also steepest-descent procedures. Finally, some directions in which to look for improvements to backpropagation based on alternative descent procedures are briefly considered.
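
The interference described in the abstract can be seen in its simplest form in a single linear unit trained with the Widrow-Hoff (LMS) rule, itself a steepest-descent procedure. The sketch below is an illustration under that simplifying assumption, not the paper's analysis of multi-layer networks; the patterns, weights, and function names are hypothetical. An update made for one pattern shifts the output for another pattern in proportion to the overlap (dot product) of their inputs.

```python
def output(weights, x):
    """Output of a single linear unit."""
    return sum(w * xi for w, xi in zip(weights, x))


def lms_step(weights, x, target, rate):
    """One Widrow-Hoff (LMS) update: a steepest-descent step on squared error."""
    error = target - output(weights, x)
    return [w + rate * error * xi for w, xi in zip(weights, x)]


def interference(weights, trained, probe, target, rate):
    """Change in the output for `probe` caused by one update on `trained`."""
    before = output(weights, probe)
    after = output(lms_step(weights, trained, target, rate), probe)
    return after - before


# Illustrative values only: the unit already responds correctly to pattern_a.
weights = [0.5, 0.5, 0.0]
pattern_a = [1.0, 1.0, 0.0]
overlapping = [1.0, 0.0, 1.0]  # shares its first input line with pattern_a
orthogonal = [0.0, 0.0, 1.0]   # shares no active input with pattern_a

shift_overlap = interference(weights, overlapping, pattern_a, target=1.0, rate=0.1)
shift_orthogonal = interference(weights, orthogonal, pattern_a, target=1.0, rate=0.1)

print("shift after training on overlapping pattern:", shift_overlap)
print("shift after training on orthogonal pattern: ", shift_orthogonal)
```

Training on the overlapping pattern disturbs the response to `pattern_a`, while training on the orthogonal pattern leaves it unchanged, because the weight change is confined to input lines that `pattern_a` does not use.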

References [15]

D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986

46 papers in library cite

R. O. Duda, P. E. Hart - 1973

9 papers in library cite

F. Rosenblatt - 1962

7 papers in library cite

D. H. Ackley, Geoffrey E. Hinton, T. J. Sejnowski - 1985

6 papers in library cite

T. J. Sejnowski, C. R. Rosenberg - 1986

6 papers in library cite

B. Widrow, H. E. Hoff - 1960

5 papers in library cite

P. E. Gill, W. Murray, M. H. Wright - 1981

3 papers in library cite

Ronald J. Williams - 1986

3 papers in library cite

H. Kesten - 1958

2 papers in library cite

Y. Tsypkin - 1971

2 papers in library cite

A. G. Barto, Richard S. Sutton - 1981

2 papers in library cite

A. G. Barto, C. W. Anderson - 1985

2 papers in library cite

M. Derthick - 1984

2 papers in library cite

C. W. Anderson - year missing

1 paper in library cites

K. S. Fu - 1968

1 paper in library cites

Cited by 2 papers in your library

Cites 1 paper in your library

Read on January 17, 2026
