2012
Cite Score
33
AI summary
This paper introduces a fast and simple algorithm for training neural probabilistic language models (NPLMs) based on noise-contrastive estimation. On the Penn Treebank corpus, the algorithm reduces training times by more than an order of magnitude without affecting model quality, and models trained with it on a larger corpus achieve state-of-the-art results on the Microsoft Research Sentence Completion Challenge dataset.
Abstract
In spite of their superior performance, neural probabilistic language models (NPLMs) remain far less widely used than n-gram models due to their notoriously long training times, which are measured in weeks even for moderately sized datasets. Training NPLMs is computationally expensive because they are explicitly normalized, which leads to having to consider all words in the vocabulary when computing the log-likelihood gradients. We propose a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions. We investigate the behaviour of the algorithm on the Penn Treebank corpus and show that it reduces the training times by more than an order of magnitude without affecting the quality of the resulting models. The algorithm is also more efficient and much more stable than importance sampling because it requires far fewer noise samples to perform well. We demonstrate the scalability of the proposed approach by training several neural language models on a 47M-word corpus with an 80K-word vocabulary, obtaining state-of-the-art results on the Microsoft Research Sentence Completion Challenge dataset.
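The abstract does not spell out the training objective. As a rough illustration only, the sketch below assumes the standard noise-contrastive estimation setup: each observed word is paired with k words drawn from a noise distribution, and the model is trained as a binary classifier that tells data words from noise words. The names nce_loss, score_fn, and noise_prob, along with the toy vocabulary and scores, are hypothetical and not taken from the paper. The point it tries to show is that the loss touches only k + 1 words per context instead of the whole vocabulary.

```python
import math
import random


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def nce_loss(score_fn, target, noise_words, noise_prob, k):
    """NCE loss for one (context, target) pair (illustrative sketch).

    score_fn(w):   unnormalized log-score of word w in the current context
    noise_prob(w): probability of w under the noise distribution
    k:             number of noise samples per observed word
    """
    def delta(w):
        # Log-odds that w came from the data rather than from the noise.
        return score_fn(w) - math.log(k * noise_prob(w))

    # The observed word should be classified as "data" ...
    loss = -log_sigmoid(delta(target))
    # ... and each sampled word should be classified as "noise".
    for w in noise_words:
        loss -= log_sigmoid(-delta(w))
    return loss


# Toy setup (all values are made up for illustration).
vocab = ["the", "cat", "sat", "on", "mat"]
unigram = {w: 1.0 / len(vocab) for w in vocab}  # assumed noise distribution
scores = {"the": 0.1, "cat": 2.0, "sat": -0.5, "on": 0.0, "mat": -1.0}

rng = random.Random(0)
k = 3
noise = rng.choices(vocab, weights=[unigram[w] for w in vocab], k=k)
loss = nce_loss(scores.get, "cat", noise, unigram.get, k)
print(f"noise sample: {noise}, loss: {loss:.4f}")
```

In this sketch the scores are never normalized over the vocabulary, which is where the savings described in the abstract would come from under these assumptions; a full implementation would also backpropagate the loss into the model parameters.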