1992
Cite Score
88
AI summary
This paper introduces a general class of REINFORCE algorithms for connectionist networks with stochastic units. It shows that they adjust weights along the gradient of expected reinforcement in immediate-reinforcement tasks and in certain limited forms of delayed-reinforcement tasks, without explicitly computing or storing gradient estimates, and that they can be integrated with backpropagation.
Abstract
This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
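The abstract does not spell out the update rule. As a rough illustration, the sketch below assumes the commonly cited REINFORCE form, a weight change equal to a learning rate times (reinforcement minus baseline) times the characteristic eligibility, applied to a single Bernoulli-logistic unit. The toy task (output 1 earns reward 1, output 0 earns reward 0), the running-average baseline, and all names and parameter values are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def train_bernoulli_unit(steps=2000, alpha=0.1, seed=0):
    """Train one Bernoulli-logistic unit with a REINFORCE-style update.

    Hypothetical task: emitting y = 1 earns reward 1, y = 0 earns reward 0.
    Returns the final probability of emitting y = 1.
    """
    rng = random.Random(seed)
    w = 0.0          # single bias weight (the input is fixed at 1)
    baseline = 0.0   # running average of reward (an assumed baseline choice)
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-w))     # probability of emitting y = 1
        y = 1 if rng.random() < p else 0   # sample the stochastic output
        r = float(y)                       # immediate reinforcement
        eligibility = y - p                # d ln g / dw for this unit, input = 1
        w += alpha * (r - baseline) * eligibility
        baseline += 0.1 * (r - baseline)   # update the running average
    return 1.0 / (1.0 + math.exp(-w))


if __name__ == "__main__":
    print(train_bernoulli_unit())
```

No gradient of expected reward is computed or stored anywhere in the loop: each step uses only the sampled output, the received reinforcement, and the unit's own firing probability, which is the property the abstract emphasizes.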