Papperoni

1992

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

R. Williams

Cite Score: 88

AI summary

This paper introduces a general class of REINFORCE algorithms for connectionist networks with stochastic units, demonstrating their ability to perform gradient-following reinforcement learning in both immediate and delayed reinforcement tasks without explicit gradient computation, and showing their integration with backpropagation.

Main Contributions

Introduces REINFORCE algorithms for connectionist networks with stochastic units.
Demonstrates that REINFORCE algorithms perform gradient-following for expected reinforcement in immediate and limited delayed reinforcement tasks without explicit gradient computation.
Shows how REINFORCE algorithms can be integrated with backpropagation for networks with deterministic hidden units.
Explores the application of REINFORCE with multiparameter distributions, such as Gaussian units, allowing control over exploratory behavior.
Provides analytical results on the relationship between the average weight update and the gradient of the performance measure for REINFORCE and episodic REINFORCE algorithms.
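The Gaussian-unit item above can be illustrated with a minimal sketch. It assumes a single unit that emits y sampled from a normal distribution with mean mu and standard deviation sigma, and applies an update of the form alpha * (r - baseline) * (d ln g / d parameter) to each of the two parameters. The function name, argument names, learning rate, and the toy reward are all hypothetical choices for illustration, not taken from the paper's text.

```python
import random


def gaussian_reinforce_step(mu, sigma, y, r, baseline=0.0, alpha=0.1):
    """One REINFORCE-style update of a Gaussian unit's mean and std. deviation.

    y is the value the unit actually emitted (a sample from N(mu, sigma^2)),
    r is the reinforcement received for it. All names here are illustrative.
    """
    # Characteristic eligibilities: partial derivatives of ln N(y; mu, sigma^2).
    elig_mu = (y - mu) / sigma ** 2
    elig_sigma = ((y - mu) ** 2 - sigma ** 2) / sigma ** 3
    factor = alpha * (r - baseline)
    return mu + factor * elig_mu, sigma + factor * elig_sigma


# Usage on a hypothetical task whose reward peaks at y = 2.0.
rng = random.Random(0)
mu, sigma = 0.0, 1.0
y = rng.gauss(mu, sigma)          # stochastic output of the unit
r = -(y - 2.0) ** 2               # assumed reward function
mu, sigma = gaussian_reinforce_step(mu, sigma, y, r, baseline=-4.0)
```

When a sample scores better than the baseline, the mean moves toward that sample, and sigma grows only if the sample lay more than one standard deviation from the mean. This is one way the unit's exploratory spread can be adjusted by the same rule that adjusts its mean.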

Abstract

This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
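The central claim, that the average weight adjustment lies along the gradient of expected reinforcement, can be checked numerically in the simplest case. The sketch below assumes a single Bernoulli-logistic unit with output y in {0, 1}, an update of the form alpha * (r - b) * (y - p) * x_i, and a hypothetical task given as a table of rewards per output. It computes the expected update exactly by enumerating both outputs and compares it with the analytic gradient of expected reward. All function and variable names are illustrative.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def firing_probability(w, x):
    # Probability that the Bernoulli-logistic unit outputs y = 1.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def expected_reinforce_update(w, x, reward, alpha=1.0, baseline=0.0):
    """Exact expectation of the update over the unit's two possible outputs.

    reward maps each output y in {0, 1} to its reinforcement (assumed task).
    """
    p = firing_probability(w, x)
    update = [0.0] * len(w)
    for y, prob in ((1, p), (0, 1.0 - p)):
        for i, xi in enumerate(x):
            # (y - p) * x_i is the derivative of ln Pr(y) with respect to w_i.
            update[i] += prob * alpha * (reward[y] - baseline) * (y - p) * xi
    return update


def grad_expected_reward(w, x, reward):
    # E[r] = p * r(1) + (1 - p) * r(0), so dE[r]/dw_i = (r(1) - r(0)) * p * (1 - p) * x_i.
    p = firing_probability(w, x)
    return [(reward[1] - reward[0]) * p * (1.0 - p) * xi for xi in x]


w, x = [0.3, -0.8], [1.0, 0.5]
reward = {0: 0.2, 1: 1.0}
avg_update = expected_reinforce_update(w, x, reward, baseline=0.7)
gradient = grad_expected_reward(w, x, reward)
```

In this small case the two vectors agree for any baseline value, because the baseline term cancels in expectation. The unit itself never computes the gradient; it only uses its own output, its firing probability, and the scalar reward.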

References [33]

[1]Learning Internal Representations by Error Propagation

D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986

46 papers in library cite

I expected very little of this, but it was so good at explaining concepts! Very good read. It gets a bit boring when it starts explaining things toward the end of the chapter, but good nonetheless.

[2]Une Procédure d'Apprentissage pour Réseau à Seuil Asymétrique (A Learning Scheme for Asymmetric Threshold Networks)

Yann LeCun - 1985

4 papers in library cite

Meh. Really did not bring much to the table. Describes back-prop but mentions that it comes from Hinton.

[3]Learning and Relearning in Boltzmann Machines

Geoffrey E. Hinton, T. J. Sejnowski - 1986

9 papers in library cite

37 pages; Introduced Boltzmann machines

[4]Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences

P. Werbos - 1974

14 papers in library cite

Werbos's PhD thesis; introduced backprop

[5]Learning-Logic

D. B. Parker - 1985

8 papers in library cite

[6]Learning to Predict by the Methods of Temporal Differences

Richard S. Sutton - 1988

3 papers in library cite

[7]Neuronlike Elements That Can Solve Difficult Learning Control Problems

A. G. Barto, Richard S. Sutton, C. W. Anderson - 1983

3 papers in library cite

[8]Reinforcement Learning in Connectionist Networks: A Mathematical Analysis

Ronald J. Williams - 1986

3 papers in library cite

[9]A Dual Back-Propagation Scheme for Scalar Reward Learning

P. Munro - 1987

2 papers in library cite

[10]Learning by Statistical Cooperation of Self-Interested Neuron-Like Computing Elements

A. G. Barto - 1985

2 papers in library cite

[11]Learning From Delayed Rewards

C. J. C. H. Watkins - 1989

2 papers in library cite

[12]Pattern-Recognizing Stochastic Learning Automata

A. G. Barto, P. Anandan - 1985

2 papers in library cite

[13]Structural Learning in Connectionist Systems

A. G. Barto, C. W. Anderson - 1985

2 papers in library cite

[14]Temporal Credit Assignment in Reinforcement Learning

Richard S. Sutton - 1984

2 papers in library cite

[15]A Class of Gradient-Estimating Algorithms for Reinforcement Learning in Neural Networks

Ronald J. Williams - 1987

1 paper in library cites

[16]A New Approach to the Design of Reinforcement Schemes for Learning Automata

M. A. L. Thathachar, P. S. Sastry - 1985

1 paper in library cites

[17]A Stochastic Reinforcement Learning Algorithm for Learning Real-Valued Functions

V. Gullapalli - 1990

1 paper in library cites

[18]Adaptive Filtering Prediction and Control

G. C. Goodwin, K. S. Sin - 1984

1 paper in library cites

[19]An Introduction to Probability Theory and Mathematical Statistics

V. K. Rohatgi - 1976

1 paper in library cites

[20]An N-Player Sequential Stochastic Game With Identical Payoffs

K. S. Narendra, R. M. Wheeler Jr. - 1983

1 paper in library cites

[21]Associative Search Network: A Reinforcement Learning Associative Memory

A. G. Barto, Richard S. Sutton, P. S. Brouwer - 1981

1 paper in library cites

[22]Decentralized Learning in Finite Markov Chains

R. M. Wheeler Jr., K. S. Narendra - 1986

1 paper in library cites

[23]Forward Models: Supervised Learning With a Distal Teacher

Michael I. Jordan, D. E. Rumelhart - 1990

1 paper in library cites

[24]Function Optimization Using Connectionist Reinforcement Learning Algorithms

Ronald J. Williams, J. Peng - 1991

1 paper in library cites

[25]Gradient Following Without Back-Propagation in Layered Networks

A. G. Barto, Michael I. Jordan - 1987

1 paper in library cites

[26]Learning and Sequential Decision Making

A. G. Barto, Richard S. Sutton, C. J. C. H. Watkins - 1990

1 paper in library cites

[27]Learning Automata: An Introduction

K. S. Narendra, M. A. L. Thathachar - 1989

1 paper in library cites

[28]Learning to Generate Focus Trajectories for Attentive Vision

J. H. Schmidhuber, R. Huber - 1990

1 paper in library cites

[29]On the Use of Backpropagation in Associative Reinforcement Learning

Ronald J. Williams - 1988

1 paper in library cites

[30]Principles of Artificial Intelligence

N. J. Nilsson - 1980

1 paper in library cites

[31]Reinforcement Comparison

Peter Dayan - 1990

1 paper in library cites

[32]Reinforcement-Learning Connectionist Systems

Ronald J. Williams - 1987

1 paper in library cites

[33]Toward a Theory of Reinforcement-Learning Connectionist Systems

Ronald J. Williams - 1988

1 paper in library cites

Cited by 11 papers in your library

Cites 4 papers in your library

Read on January 20, 2026

It's alright for formalizing the concept, but it's a bit boring and doesn't add much from the middle on. It focuses too much on reviewing existing techniques and on stochastic units.

Tags

RLHF

Paper Aliases

No aliases