1988

Increased Rates of Convergence Through Learning Rate Adaptation

Robert A. Jacobs

Cite Score

63

AI summary

This paper analyzes why steepest descent can be slow to converge in connectionist networks and proposes four heuristics for adapting learning rates. It studies two implementations of these heuristics, momentum and a new algorithm called the delta-bar-delta rule, and reports faster convergence in simulations on quadratic surfaces and several network tasks.

Main Contributions

  • Analyzed why steepest descent can be slow to converge in connectionist networks.
  • Proposed four heuristics for achieving faster convergence rates, emphasizing individual and time-varying learning rates for each weight.
  • Introduced the delta-bar-delta rule, an adaptive learning rate algorithm that adjusts each weight's rate according to whether the current derivative and an exponential average of past derivatives have the same sign.
  • Evaluated momentum, delta-bar-delta, and a hybrid on quadratic surfaces and on network tasks such as exclusive-or, multiplexer, and binary-to-local.
  • Demonstrated that adaptive learning rate algorithms generally achieve faster convergence than steepest descent.
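
The per-weight adaptation described in the contributions above can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `delta_bar_delta`, the parameter names `kappa`, `phi`, and `theta`, their default values, and the choice of an additive increase with a multiplicative decrease are assumptions based on how the rule is commonly stated.

```python
import numpy as np

def delta_bar_delta(grad_fn, w0, lr0=0.01, kappa=0.001, phi=0.1, theta=0.7, steps=500):
    """Sketch of a delta-bar-delta style update (all defaults are illustrative)."""
    w = np.asarray(w0, dtype=float).copy()
    lr = np.full_like(w, lr0)   # one learning rate per weight
    bar = np.zeros_like(w)      # exponential average of past derivatives
    for _ in range(steps):
        g = grad_fn(w)          # current derivative for every weight
        agree = bar * g         # > 0: same sign as the average, < 0: opposite sign
        # Assumed form: grow the rate additively on agreement,
        # shrink it multiplicatively on disagreement, else leave it alone.
        lr = np.where(agree > 0, lr + kappa,
                      np.where(agree < 0, lr * (1.0 - phi), lr))
        w = w - lr * g          # descent step with the per-weight rates
        bar = (1.0 - theta) * g + theta * bar
    return w, lr

# Hypothetical test surface: J(w) = 0.5 * (w1**2 + 20 * w2**2)
curvatures = np.array([1.0, 20.0])
w_final, lr_final = delta_bar_delta(lambda w: curvatures * w, [1.0, 1.0])
```

On this surface the rate for the shallow direction keeps growing because its derivative never changes sign, while the rate for the steep direction is cut back whenever the derivative starts to oscillate.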

Abstract

While there exist many techniques for finding the parameters that minimize an error function, only those methods that solely perform local computations are used in connectionist networks. The most popular learning algorithm for connectionist networks is the back-propagation procedure (13), which can be used to update the weights by the method of steepest descent. In this paper, we examine steepest descent and analyze why it can be slow to converge. We then propose four heuristics for achieving faster rates of convergence while adhering to the locality constraint. These heuristics suggest that every weight of a network should be given its own learning rate and that these rates should be allowed to vary over time. Additionally, the heuristics suggest how the learning rates should be adjusted. Two implementations of these heuristics, namely momentum and an algorithm called the delta-bar-delta rule, are studied and simulation results are presented.
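
One common way to see the slow convergence the abstract refers to is an elongated quadratic surface, where a single learning rate has to stay small enough for the steepest direction and therefore makes little progress along the shallow one. The sketch below compares plain steepest descent with a momentum update under that assumption; the function name `descend`, the curvatures, the learning rate, the momentum coefficient, and the step count are all illustrative choices, not values from the paper.

```python
import numpy as np

def descend(curvatures, w0, lr, momentum=0.0, steps=50):
    """Minimize J(w) = 0.5 * sum(a_i * w_i**2) with one shared learning rate."""
    a = np.asarray(curvatures, dtype=float)
    w = np.asarray(w0, dtype=float).copy()
    v = np.zeros_like(w)              # previous weight change
    for _ in range(steps):
        g = a * w                     # gradient of the quadratic
        v = momentum * v - lr * g     # momentum=0.0 gives plain steepest descent
        w = w + v
    return w

# The steep direction (curvature 20) requires lr < 2 / 20 = 0.1 for stability,
# which leaves the shallow direction (curvature 1) shrinking by only 9% per step.
plain = descend([1.0, 20.0], [1.0, 1.0], lr=0.09)
heavy = descend([1.0, 20.0], [1.0, 1.0], lr=0.09, momentum=0.5)
```

With these settings the momentum run ends several orders of magnitude closer to the minimum than the plain run after the same number of steps.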

References [19]

D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986

46 papers in library cite

Richard S. Sutton - 1986

2 papers in library cite

D. C. Plaut, S. J. Nowlan, Geoffrey E. Hinton - 1986

5 papers in library cite

B. Widrow, H. E. Hoff - 1960

5 papers in library cite

B. Widrow, S. D. Stearns - 1985

4 papers in library cite

P. E. Gill, W. Murray, M. H. Wright - 1981

3 papers in library cite

H. Kesten - 1958

2 papers in library cite

A. G. Barto, Richard S. Sutton - 1981

2 papers in library cite

N. Littlestone - 1988

2 papers in library cite

M. Derthick - 1984

2 papers in library cite

D. B. Parker - 1986

1 paper in library cites

R. Scalettar, A. Zee - year missing

1 paper in library cites

S. Haykin - 1986

1 paper in library cites

M. L. Honig, D. G. Messerschmitt - 1984

1 paper in library cites

E. Mjolsness - 1987

1 paper in library cites

C. W. Anderson - 1986

1 paper in library cites

G. N. Saridis - 1970

1 paper in library cites

S. E. Hampson, D. J. Volper - 1986

1 paper in library cites
