Papperoni

1992

Practical Issues in Temporal Difference Learning

Gerald Tesauro

AI summary

This paper explores the application of Sutton's TD(λ) algorithm to learn backgammon through self-play, demonstrating that a connectionist network can learn to play at an intermediate to near-expert level, outperforming traditional commercial programs and networks trained on human expert data.

Main Contributions

Examines practical issues of applying TD(λ) methods to complex real-world problems beyond theoretical assumptions.
Applies TD(λ) to learn the game of backgammon from self-play, a complex nontrivial task.
Demonstrates that a connectionist network, with zero built-in knowledge, can learn to play backgammon at a strong intermediate level.
Shows that TD nets can surpass networks trained on massive human expert data sets in backgammon performance.
Achieves near-expert level performance in backgammon by adding hand-crafted features to the input representation, competing with world-class human players.

Abstract

This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(λ) algorithm, can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. These practical issues are then examined in the context of a case study in which TD(λ) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex nontrivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set. The hidden units in these networks have apparently discovered useful features, a longstanding goal of computer games research. Furthermore, when a set of hand-crafted features is added to the input representation, the resulting networks reach a near-expert level of performance, and have achieved good results against world-class human play.
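As a rough illustration of the kind of update TD(λ) performs, here is a minimal tabular sketch on a toy bounded random walk. It is not Tesauro's backgammon network or training setup: the task, the function name `td_lambda_random_walk`, and all parameter values are assumptions chosen for illustration. Each step moves the current prediction toward the next prediction (or toward the final outcome at the end of an episode), and eligibility traces decayed by λ spread that correction back over recently visited states.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=5000, alpha=0.02, lam=0.7, seed=0):
    """Tabular TD(lambda) on a bounded random walk (hypothetical toy task).

    States 0..n_states-1 are nonterminal. Stepping off the left end gives
    outcome 0, stepping off the right end gives outcome 1. No discounting.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states              # current predictions V(s)
    for _ in range(episodes):
        traces = [0.0] * n_states          # eligibility traces, reset per episode
        state = n_states // 2              # every episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            terminal = next_state < 0 or next_state >= n_states
            if next_state < 0:
                target = 0.0               # final outcome on the left
            elif next_state >= n_states:
                target = 1.0               # final outcome on the right
            else:
                target = values[next_state]  # bootstrap from the next prediction
            td_error = target - values[state]
            # Decay all traces by lambda, then credit the state just visited.
            traces = [lam * e for e in traces]
            traces[state] += 1.0
            # Move every prediction in proportion to its trace.
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
            if terminal:
                break
            state = next_state
    return values


if __name__ == "__main__":
    for i, v in enumerate(td_lambda_random_walk()):
        print(f"state {i}: learned {v:.3f}, exact {(i + 1) / 6:.3f}")
```

In this toy task the exact value of state i is (i + 1) / 6, so the learned predictions can be checked directly. In the paper's setting the lookup table is replaced by a connectionist network and the outcome comes from self-play, which this sketch does not attempt to reproduce.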

References [9]

[1]Practical Issues in Temporal Difference Learning

Gerald Tesauro - 1992

3 papers in library cite

(There's a larger paper with the same title.) It's a fun read, and I learned a bit about early reinforcement learning with NNs. However, I read this because Bengio cited it for "problems related to learning deep archs", and that's not what this paper covers.

[2]Learning to Predict by the Methods of Temporal Differences

Richard S. Sutton - 1988

3 papers in library cite

[3]Some Studies in Machine Learning Using the Game of Checkers

A. Samuel - 1959

2 papers in library cite

[4]A Parallel Network That Learns to Play Backgammon

Gerald Tesauro, T. J. Sejnowski - 1989

1 paper in library cites

[5]Algorithmic Strategies for Improving the Performance of Game Playing Programs

P. W. Frey - 1986

1 paper in library cites

[6]Computer Backgammon

H. Berliner - 1980

1 paper in library cites

[7]Connectionist Learning of Expert Preferences by Comparison Training

Gerald Tesauro - 1989

1 paper in library cites

[8]Neurogammon: A Neural Network Backgammon Program

Gerald Tesauro - 1990

1 paper in library cites

[9]Temporal Differences: TD(λ) for General λ

Peter Dayan - 1992

1 paper in library cites

Read

on February 1, 2026