2012

A Fast and Simple Algorithm for Training Neural Probabilistic Language Models

A. Mnih, Yee Whye Teh


AI summary

This paper introduces a fast and simple algorithm for training Neural Probabilistic Language Models (NPLMs) based on noise-contrastive estimation. On the Penn Treebank corpus, it reduces training times by more than an order of magnitude without affecting model quality. Models trained with it on a 47M-word corpus achieve state-of-the-art results on the Microsoft Research Sentence Completion Challenge dataset.

Main Contributions

  • Proposes a fast and simple algorithm for training NPLMs based on noise-contrastive estimation.
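  • Demonstrates the algorithm's efficiency on the Penn Treebank corpus, reducing training times by more than an order of magnitude without affecting model quality.
  • Shows that the algorithm is more efficient and much more stable than importance sampling because it requires far fewer noise samples to perform well.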
  • Trains neural language models on a 47M-word corpus with an 80K-word vocabulary.
  • Achieves state-of-the-art results on the Microsoft Research Sentence Completion Challenge dataset.

Abstract

In spite of their superior performance, neural probabilistic language models (NPLMs) remain far less widely used than n-gram models due to their notoriously long training times, which are measured in weeks even for moderately-sized datasets. Training NPLMs is computationally expensive because they are explicitly normalized, which leads to having to consider all words in the vocabulary when computing the log-likelihood gradients. We propose a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions. We investigate the behaviour of the algorithm on the Penn Treebank corpus and show that it reduces the training times by more than an order of magnitude without affecting the quality of the resulting models. The algorithm is also more efficient and much more stable than importance sampling because it requires far fewer noise samples to perform well. We demonstrate the scalability of the proposed approach by training several neural language models on a 47M-word corpus with an 80K-word vocabulary, obtaining state-of-the-art results on the Microsoft Research Sentence Completion Challenge dataset.
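To make the idea in the abstract concrete, here is a minimal sketch of a per-example noise-contrastive estimation loss. It is an illustration under stated assumptions, not the authors' implementation: the function names (`log_sigmoid`, `nce_loss`), the uniform noise distribution, the random stand-in scores, and the toy sizes are all hypothetical choices made for this example. The point it shows is that the loss touches only the observed word and k noise samples, so no sum over the whole vocabulary is needed.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def nce_loss(score_target, scores_noise, logq_target, logq_noise):
    """Negative NCE objective for one (context, target) pair.

    score_target: unnormalized model log-score of the observed word
    scores_noise: array of k scores for words drawn from the noise distribution
    logq_target, logq_noise: log noise probabilities of those same words
    """
    k = len(scores_noise)
    # Log-odds that a word came from the data rather than from the noise.
    delta_target = score_target - (np.log(k) + logq_target)
    delta_noise = scores_noise - (np.log(k) + logq_noise)
    # Label the observed word as "data" and the k samples as "noise".
    return -(log_sigmoid(delta_target) + np.sum(log_sigmoid(-delta_noise)))

# Toy setup (all values below are assumptions for illustration only).
rng = np.random.default_rng(0)
vocab_size, k = 1000, 25
noise = np.full(vocab_size, 1.0 / vocab_size)  # uniform noise, a simplification
scores = rng.normal(size=vocab_size)           # stand-in for model scores
target = 3
samples = rng.choice(vocab_size, size=k, p=noise)

loss = nce_loss(scores[target], scores[samples],
                np.log(noise[target]), np.log(noise[samples]))
```

In this sketch the cost per training example scales with k rather than with the vocabulary size, which is the source of the speed-up the abstract describes. A real model would compute the scores from the context and differentiate this loss with respect to its parameters.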


