2003
Cite Score
11
AI summary
This paper introduces a sampling-based approach to training the probabilistic neural networks used in statistical language modeling to address the curse of dimensionality. Importance sampling speeds up training by reducing the number of network passes needed per example, while achieving performance comparable to full-gradient training on the Brown corpus.
Abstract
Our previous work on statistical language modeling introduced the use of probabilistic feedforward neural networks to help deal with the curse of dimensionality. Training this model by maximum likelihood, however, requires performing, for each example, as many network passes as there are words in the vocabulary. Inspired by the contrastive divergence model, we propose and evaluate sampling-based methods that require network passes only for the observed "positive example" and a few sampled negative example words. A very significant speed-up is obtained with adaptive importance sampling.
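The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a softmax output over the vocabulary and, for simplicity, takes the gradient with respect to the output scores rather than the network parameters. The names `exact_score_gradient`, `sampled_score_gradient`, `score_fn`, and `proposal` are hypothetical, and the proposal distribution is fixed here, whereas the abstract describes an adaptive one. The exact gradient needs a score for every word; the sampled version needs scores only for the observed word and a few negatives drawn from the proposal, reweighted by self-normalized importance weights.

```python
import numpy as np


def exact_score_gradient(scores, target):
    """Gradient of log softmax(scores)[target] w.r.t. the scores.

    Needs the score of every word in the vocabulary.
    """
    shifted = scores - scores.max()          # stabilize the exponentials
    probs = np.exp(shifted) / np.exp(shifted).sum()
    grad = -probs                            # negative phase: -P(v | context)
    grad[target] += 1.0                      # positive phase: observed word
    return grad


def sampled_score_gradient(score_fn, target, proposal, num_samples, rng):
    """Self-normalized importance-sampling estimate of the same gradient.

    score_fn(word_ids) is a hypothetical callable that returns scores only
    for the requested words, so only the sampled negatives are evaluated
    in the negative phase.
    """
    vocab_size = proposal.shape[0]
    negatives = rng.choice(vocab_size, size=num_samples, p=proposal)
    neg_scores = score_fn(negatives)

    # Importance weights exp(score) / Q(word), normalized to sum to one.
    log_w = neg_scores - np.log(proposal[negatives])
    weights = np.exp(log_w - log_w.max())
    weights /= weights.sum()

    grad = np.zeros(vocab_size)
    grad[target] += 1.0                      # positive phase
    np.add.at(grad, negatives, -weights)     # accumulates repeated samples
    return grad
```

With a small `num_samples` the estimate is noisy and slightly biased, because the weights are normalized by their own sum; it approaches the exact gradient as the number of samples grows or as the proposal gets closer to the model distribution.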