1986

Two Problems With Backpropagation and Other Steepest-Descent Learning Procedures for Networks

Richard S. Sutton

AI summary

This paper identifies two key problems with steepest-descent learning procedures such as backpropagation: inefficiency on performance surfaces that contain "ravines", and high interference between learning with different patterns. It also briefly considers alternative descent procedures that might do better.
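Why ravines slow steepest descent can be seen in the standard textbook analysis of a quadratic surface. This derivation is not taken from the paper; it assumes a quadratic error E with curvatures lambda_1 >= ... >= lambda_n > 0 along its principal axes, with those axes taken as the coordinates (steepest descent behaves the same under a rotation of coordinates), and a single learning rate alpha shared by all weights:

```latex
E(\mathbf{w}) = \tfrac{1}{2}\sum_{i=1}^{n} \lambda_i w_i^2,
\qquad \lambda_1 \ge \dots \ge \lambda_n > 0

w_i^{(t+1)} = w_i^{(t)} - \alpha\,\frac{\partial E}{\partial w_i}
            = (1 - \alpha\lambda_i)\, w_i^{(t)}
\quad\Longrightarrow\quad
w_i^{(t)} = (1 - \alpha\lambda_i)^{t}\, w_i^{(0)}

\text{convergence along every axis: } |1 - \alpha\lambda_i| < 1 \;\;\forall i
\iff 0 < \alpha < \frac{2}{\lambda_1}

\text{slowest axis: } |1 - \alpha\lambda_n| \;\ge\; 1 - \alpha\lambda_n \;>\; 1 - \frac{2\lambda_n}{\lambda_1}
```

The sharply curved direction caps the usable step size, so when lambda_1 / lambda_n is large (a pronounced ravine) the per-step contraction along the gently sloping ravine floor is close to 1 and progress there is very slow.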

Main Contributions

Identifies and analyzes two problems (ravines and cross-pattern interference) that can slow learning in backpropagation and other steepest-descent network learning procedures.
Explains how ravines (places where the surface curves more sharply in some directions than in others), which are common and pronounced in network performance surfaces, significantly impede steepest-descent methods.
Details how steepest descent causes high interference between learning with different patterns: the units already found most useful are the ones most likely to be changed to handle new patterns, rather than new features being developed.
Discusses alternative descent procedures to mitigate these problems, such as methods that distort the surface or adjust individual learning rates.
Suggests that knowledge from general optimization could be carried over to significantly improve network learning algorithms.
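The ravine problem, and the idea of adjusting individual learning rates, can be illustrated on a toy surface. The sketch below is a hypothetical illustration, not the paper's method: it uses a two-weight quadratic "ravine" whose curvature is 100 times larger along one axis, and the names `energy`, `grad`, and `descend` are invented here. The per-weight rates are set to 1/curvature, which assumes the curvatures are known and aligned with the weight axes; a real network learner would have to estimate them.

```python
# Toy ravine: E(w) = 0.5 * sum(lam_i * w_i**2), much more curved along w[0].
def energy(w, lams):
    return 0.5 * sum(lam * wi * wi for lam, wi in zip(lams, w))


def grad(w, lams):
    # dE/dw_i = lam_i * w_i for this quadratic surface
    return [lam * wi for lam, wi in zip(lams, w)]


def descend(w0, lams, rates, tol=1e-6, max_steps=100_000):
    """Descend with one learning rate per weight; return the number of
    updates taken before the energy drops below tol."""
    w = list(w0)
    for step in range(max_steps):
        if energy(w, lams) < tol:
            return step
        g = grad(w, lams)
        w = [wi - r * gi for wi, r, gi in zip(w, rates, g)]
    return max_steps


lams = [100.0, 1.0]   # curvature along each weight (hypothetical values)
w0 = [1.0, 1.0]

# Steepest descent: one shared rate; it must stay below 2 / lams[0] = 0.02
# or the steep axis diverges, so progress along the shallow axis is slow.
alpha = 1.0 / lams[0]
sd_steps = descend(w0, lams, [alpha, alpha])

# Idealized per-weight rates scaled by 1 / curvature (assumes curvature known).
pw_steps = descend(w0, lams, [1.0 / lam for lam in lams])

print("steepest descent:", sd_steps, "steps; per-weight rates:", pw_steps, "steps")
```

On this toy surface the shared-rate run needs several hundred updates, while the curvature-scaled run reaches the minimum in one. When a ravine is not aligned with the weight axes, per-weight rates alone cannot remove it, which is one reason a method that distorts (rescales) the surface more generally can matter.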

Abstract

This article contributes to the theory of network learning procedures by identifying and analyzing two problems with the backpropagation procedure of Rumelhart, Hinton, and Williams (1985) that may slow its learning. Both problems are due to backpropagation's being a gradient- or steepest-descent method in the weight space of the network. The first problem is that steepest descent is a particularly poor descent procedure for surfaces containing ravines—places which curve more sharply in some directions than others—and such ravines are common and pronounced in performance surfaces arising from networks. The second problem is that steepest descent results in a high level of interference between learning with different patterns, because those units that have so far been found most useful are also those most likely to be changed to handle new patterns. The same problems probably also arise with the Boltzmann machine learning procedure (Ackley, Hinton and Sejnowski, 1985) and with reinforcement learning procedures (Barto and Anderson, 1985), as these are also steepest-descent procedures. Finally, some directions in which to look for improvements to backpropagation based on alternative descent procedures are briefly considered.
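One mechanism behind the second problem can be seen directly in the backpropagation gradient: the error signal reaching a hidden unit is scaled by that unit's outgoing weight, so the units contributing most to the output receive the largest changes when a new pattern is learned. The sketch below assumes a hypothetical single-output network with tanh hidden units and a linear output, uses a large outgoing weight as a stand-in for "most useful so far", and invents the helper names `hidden_weight_gradients` and `norm`; it is an illustration of the mechanism, not the paper's analysis.

```python
import math


def hidden_weight_gradients(W, v, x, target):
    """Backprop gradient of E = 0.5 * (y - target)**2 with respect to each
    hidden unit's input weights, for h_j = tanh(w_j . x), y = sum_j v_j h_j."""
    h = [math.tanh(sum(wjk * xk for wjk, xk in zip(wj, x))) for wj in W]
    y = sum(vj * hj for vj, hj in zip(v, h))
    err = y - target
    # Chain rule: dE/dw_jk = err * v_j * (1 - h_j**2) * x_k
    return [[err * vj * (1.0 - hj * hj) * xk for xk in x] for vj, hj in zip(v, h)]


def norm(vec):
    return math.sqrt(sum(c * c for c in vec))


# Three hidden units with identical input weights (hypothetical values);
# unit 0 already carries most of the output, unit 2 is currently unused.
W = [[0.5, -0.3], [0.5, -0.3], [0.5, -0.3]]
v = [2.0, 0.5, 0.0]
x_new = [1.0, 1.0]  # a new pattern to be learned

grads = hidden_weight_gradients(W, v, x_new, target=-1.0)
sizes = [norm(g) for g in grads]
print("gradient size per hidden unit:", sizes)
```

Because the three units differ only in their outgoing weights, the gradient sizes are in the ratio 2 : 0.5 : 0. The most heavily used unit is changed four times as much as the second, and the unused unit, the obvious candidate for a new feature, is not changed at all.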

References [15]

[1] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning Internal Representations by Error Propagation.
