2002

Training Products of Experts by Minimizing Contrastive Divergence

Geoffrey Hinton

Cite Score: 78

AI summary

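This paper introduces a method for training Products of Experts (PoE) by minimizing contrastive divergence. Maximizing the log likelihood of the data directly is difficult: besides raising the probability each expert assigns to the data, training must make the experts as different as possible, and the log-likelihood gradient involves derivatives of the partition function (the renormalization term). The approach instead optimizes a different objective function, contrastive divergence, whose derivatives can be approximated efficiently when the individual experts are tractable.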
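For reference, the two objectives can be written out as follows. The notation follows the paper as we understand it (f_m is the m-th expert with parameters theta_m, Q^0 is the data distribution, Q^1 is the distribution of one-step reconstructions obtained by one full step of Gibbs sampling started at the data, and Q^inf is the model's equilibrium distribution); treat this as a summary sketch, not a transcription of the paper's equations.

```latex
% Product of n experts, renormalized over all possible vectors c
p(\mathbf{d}\mid\theta_1,\dots,\theta_n)
  = \frac{\prod_m f_m(\mathbf{d}\mid\theta_m)}{\sum_{\mathbf{c}} \prod_m f_m(\mathbf{c}\mid\theta_m)}

% Log-likelihood gradient: the second term needs samples from the model itself
\frac{\partial \log p(\mathbf{d}\mid\theta_1,\dots,\theta_n)}{\partial \theta_m}
  = \frac{\partial \log f_m(\mathbf{d}\mid\theta_m)}{\partial \theta_m}
  - \sum_{\mathbf{c}} p(\mathbf{c}\mid\theta_1,\dots,\theta_n)\,
    \frac{\partial \log f_m(\mathbf{c}\mid\theta_m)}{\partial \theta_m}

% Contrastive divergence
\mathrm{CD} = \mathrm{KL}\left(Q^0 \,\|\, Q^\infty\right) - \mathrm{KL}\left(Q^1 \,\|\, Q^\infty\right)

% Approximate update (a term arising from the dependence of Q^1 on theta_m is ignored)
\Delta\theta_m \propto
  \left\langle \frac{\partial \log f_m}{\partial \theta_m} \right\rangle_{Q^0}
  - \left\langle \frac{\partial \log f_m}{\partial \theta_m} \right\rangle_{Q^1}
```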

Main Contributions

  • Introduces a novel approach for training Products of Experts (PoE) by minimizing contrastive divergence.
  • Presents an alternative objective function to avoid computing derivatives of the partition function.
  • Demonstrates the effectiveness of the approach on synthetic data and handwritten digit recognition.
  • Shows that the method learns localized features and performs well in discrimination tasks.
  • Discusses the relationship between PoEs and Boltzmann machines, highlighting the advantages of the contrastive divergence learning algorithm.
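The relationship to Boltzmann machines is clearest for a restricted Boltzmann machine (RBM), which the paper treats as a product of experts with one expert per hidden unit. Below is a minimal, stdlib-only sketch of one-step contrastive divergence (CD-1) for a binary RBM, written for illustration rather than taken from the paper. The class and function names, learning rate, number of hidden units, toy data, and the use of hidden-unit probabilities instead of sampled states when collecting statistics are all assumptions.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Hypothetical binary RBM; each hidden unit acts as one 'expert'."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.nv, self.nh = n_visible, n_hidden
        # Small random weights break the symmetry between hidden units.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v): hidden units are conditionally independent given v.
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(self.nv)))
                for j in range(self.nh)]

    def visible_probs(self, h):
        # p(v_i = 1 | h): visible units are conditionally independent given h.
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(self.nh)))
                for i in range(self.nv)]

    def sample(self, probs):
        # Draw binary states from independent Bernoulli probabilities.
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 step: data statistics minus one-step reconstruction statistics."""
        ph0 = self.hidden_probs(v0)
        h0 = self.sample(ph0)
        v1 = self.sample(self.visible_probs(h0))  # reconstruction after one full Gibbs step
        ph1 = self.hidden_probs(v1)
        for i in range(self.nv):
            for j in range(self.nh):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(self.nh):
            self.c[j] += lr * (ph0[j] - ph1[j])


def mean_field_error(rbm, data):
    """Average squared error of deterministic (mean-field) reconstructions."""
    total = 0.0
    for v in data:
        recon = rbm.visible_probs(rbm.hidden_probs(v))
        total += sum((a - r) ** 2 for a, r in zip(v, recon))
    return total / len(data)


# Toy usage: two complementary binary patterns.
data = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]
rbm = RBM(n_visible=6, n_hidden=4, seed=0)
before = mean_field_error(rbm, data)
for epoch in range(3000):
    for v in data:
        rbm.cd1_update(v, lr=0.1)
after = mean_field_error(rbm, data)
print(f"reconstruction error: {before:.3f} -> {after:.3f}")
```

Reconstruction error is printed only as a rough progress check; it is not the quantity that contrastive divergence minimizes.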

Abstract

It is possible to combine multiple probabilistic models of the same data by multiplying their probability distributions together and then renormalizing. This is a very efficient way to model high-dimensional data which simultaneously satisfies many different low-dimensional constraints because each individual expert model can focus on giving high probability to data vectors that satisfy just one of the constraints. Data vectors that satisfy this one constraint but violate other constraints will be ruled out by their low probability under the other experts. Training a product of experts appears difficult because, in addition to maximizing the probability that each individual expert assigns to the observed data, it is necessary to make the experts be as different as possible. This ensures that the product of their distributions is small which allows the renormalization to magnify the probability of the data under the product of experts model. Fortunately, if the individual experts are tractable there is an efficient way to train a product of experts.
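The combination rule described in the abstract can be illustrated on a tiny discrete space. The sketch below is a hypothetical toy, not an example from the paper: two made-up experts over 4-bit vectors each favour one constraint (make_expert, product_of_experts, and the strength of 9 are our own illustrative choices), and exact fractions are used so the renormalized probabilities can be checked by hand.

```python
from fractions import Fraction
from itertools import product

# All 16 binary vectors of length 4 (a toy "data space").
STATES = list(product((0, 1), repeat=4))


def make_expert(constraint, strength=9):
    """Normalized distribution giving `strength` times more weight to states satisfying `constraint`."""
    weights = {s: Fraction(strength if constraint(s) else 1) for s in STATES}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def product_of_experts(experts):
    """Multiply the experts' distributions pointwise, then renormalize."""
    unnorm = {s: Fraction(1) for s in STATES}
    for p in experts:
        for s in STATES:
            unnorm[s] *= p[s]
    z = sum(unnorm.values())  # sum of the unnormalized product (the normalizer)
    return {s: u / z for s, u in unnorm.items()}, z


# Each hypothetical expert enforces one low-dimensional constraint.
expert_a = make_expert(lambda s: s[0] == s[1])  # bits 0 and 1 agree
expert_b = make_expert(lambda s: s[2] == s[3])  # bits 2 and 3 agree
poe, z = product_of_experts([expert_a, expert_b])
poe_same, z_same = product_of_experts([expert_a, expert_a])

print(float(expert_a[(0, 0, 1, 1)]))  # 0.1125: a vector satisfying both, scored by expert A alone
print(float(poe[(0, 0, 1, 1)]))       # 0.2025: the same vector under the product
print(float(poe[(0, 0, 0, 1)]))       # 0.0225: violates expert B's constraint
print(float(z))                        # 0.0625: different experts, small normalizer
print(float(z_same))                   # 0.1025: two identical experts overlap more
```

Under the product, the four vectors that satisfy both constraints carry 81% of the probability mass, while each expert alone puts only 45% of its mass on them. Making the experts different shrinks the sum of the unnormalized product, so renormalization magnifies the probability of vectors that satisfy every constraint.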


