1992

Practical Issues in Temporal Difference Learning

Gerald Tesauro

Cite Score

47

AI summary

This paper explores the application of Sutton's TD(λ) algorithm to learn backgammon through self-play, demonstrating that a connectionist network can learn to play at an intermediate to near-expert level, outperforming traditional commercial programs and networks trained on human expert data.

Main Contributions

  • Examines practical issues of applying TD(λ) methods to complex real-world problems beyond theoretical assumptions.
  • Applies TD(λ) to learn the game of backgammon from self-play, a complex nontrivial task.
  • Demonstrates that a connectionist network, with zero built-in knowledge, can learn to play backgammon at a strong intermediate level.
  • Shows that TD nets can surpass networks trained on massive human expert data sets in backgammon performance.
  • Achieves near-expert level performance in backgammon by adding hand-crafted features to the input representation, competing with world-class human players.

Abstract

This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(λ) algorithm, can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. These practical issues are then examined in the context of a case study in which TD(λ) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex nontrivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set. The hidden units in these networks have apparently discovered useful features, a longstanding goal of computer games research. Furthermore, when a set of hand-crafted features is added to the input representation, the resulting networks reach a near-expert level of performance, and have achieved good results against world-class human play.
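To make the idea of learning from the final outcome concrete, here is a minimal sketch of a TD(λ) update. It is not the paper's implementation: it assumes a linear predictor with one-hot state features on a toy five-state random walk, whereas the paper trains a multilayer network on backgammon positions. The names `td_lambda_episode`, `random_walk`, and `train`, and the values of `alpha` and `lam`, are illustrative assumptions.

```python
import random


def td_lambda_episode(weights, episode, outcome, alpha=0.1, lam=0.7):
    """Run one episode of TD(lambda) updates on a linear predictor.

    weights: list of floats (assumed linear model, not a neural network)
    episode: list of feature vectors, one per visited state
    outcome: final signal observed after the last state (e.g. win = 1.0)
    """
    trace = [0.0] * len(weights)  # eligibility trace of past gradients
    for t, x in enumerate(episode):
        v_t = sum(w * xi for w, xi in zip(weights, x))
        if t + 1 < len(episode):
            # Intermediate step: the target is the next prediction.
            v_next = sum(w * xi for w, xi in zip(weights, episode[t + 1]))
        else:
            # Final step: the target is the actual outcome.
            v_next = outcome
        # For a linear predictor the gradient w.r.t. the weights is x.
        trace = [lam * e + xi for e, xi in zip(trace, x)]
        delta = v_next - v_t
        weights = [w + alpha * delta * e for w, e in zip(weights, trace)]
    return weights


def random_walk(rng, n_states=5):
    """Toy task: start in the middle, step left or right until falling off.

    Returns the visited states and an outcome of 1.0 (right exit) or 0.0.
    """
    s = n_states // 2
    states = []
    while 0 <= s < n_states:
        states.append(s)
        s += rng.choice((-1, 1))
    return states, (1.0 if s >= n_states else 0.0)


def train(n_episodes=5000, n_states=5, alpha=0.05, lam=0.7, seed=0):
    """Learn state values from outcomes only, starting from zero weights."""
    rng = random.Random(seed)
    weights = [0.0] * n_states
    for _ in range(n_episodes):
        states, outcome = random_walk(rng, n_states)
        episode = [[1.0 if i == s else 0.0 for i in range(n_states)]
                   for s in states]
        weights = td_lambda_episode(weights, episode, outcome, alpha, lam)
    return weights


if __name__ == "__main__":
    # The exact values for this walk are 1/6, 2/6, ..., 5/6.
    print([round(w, 2) for w in train()])
```

In this toy setting the learned weights should drift toward the probability of exiting on the right from each state. The self-play setup described in the abstract differs in that the same network also chooses the moves that generate each training game.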

References [9]


Gerald Tesauro - 1992

3 papers in library cite

Richard S. Sutton - 1988

3 papers in library cite

A. Samuel - 1959

2 papers in library cite

Gerald Tesauro, T. J. Sejnowski - 1989

1 paper in library cites

P. W. Frey - 1986

1 paper in library cites

H. Berliner - 1980

1 paper in library cites

Gerald Tesauro - 1989

1 paper in library cites

Gerald Tesauro - 1990

1 paper in library cites

Peter Dayan - 1992

1 paper in library cites
