2001

A Neural Probabilistic Language Model

Yoshua Bengio, R. Ducharme, Pascal Vincent

AI summary

This paper introduces a neural probabilistic language model using distributed word representations and neural networks to overcome the curse of dimensionality in language modeling, achieving improved perplexity on the Brown corpus and AP News data compared to n-gram models.

Main Contributions

  • Introduces a neural probabilistic language model that learns distributed representations for words.
  • The model learns word feature vectors and the probability function simultaneously.
  • Demonstrates improved generalization by leveraging semantic similarity between words.
  • Achieves significantly better perplexity on the Brown corpus compared to state-of-the-art n-gram models.
  • Shows that the model can effectively utilize longer contexts for language modeling.
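
Perplexity, the measure behind the comparisons above, is the exponential of the average negative log-probability a model assigns to each word of held-out text. Lower is better, and a uniform guess over a vocabulary of V words scores exactly V. As a minimal sketch of how the number is computed, the Python below scores a toy add-one (Laplace) smoothed bigram model. The corpus, the smoothing scheme and the helper names (`train_bigram`, `bigram_prob`, `perplexity`) are hypothetical choices for illustration. This is not one of the tuned n-gram baselines the paper reports against, and it assumes every held-out word also appears in the training vocabulary.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Hypothetical helper: count bigrams and how often each word is a history."""
    history = Counter(tokens[:-1])              # c(w_{t-1}) as a predecessor
    bigrams = Counter(zip(tokens, tokens[1:]))  # c(w_{t-1}, w_t)
    return history, bigrams


def bigram_prob(prev, word, history, bigrams, vocab_size):
    """Add-one (Laplace) smoothed estimate of P(word | prev).

    Every bigram gets a pseudo-count of 1, so unseen pairs still receive
    non-zero probability, and the estimates sum to 1 over the vocabulary.
    """
    return (bigrams[(prev, word)] + 1) / (history[prev] + vocab_size)


def perplexity(test_tokens, history, bigrams, vocab_size):
    """exp of the average negative log-probability per predicted word."""
    pairs = list(zip(test_tokens, test_tokens[1:]))
    nll = -sum(math.log(bigram_prob(p, w, history, bigrams, vocab_size))
               for p, w in pairs)
    return math.exp(nll / len(pairs))


# Toy usage (hypothetical data, not from the paper's experiments).
train_text = "the cat sat on the mat the dog sat on the rug".split()
held_out = "the cat sat on the rug".split()
vocab = set(train_text)  # assumes every held-out word occurs in training
history, bigrams = train_bigram(train_text)
print(perplexity(held_out, history, bigrams, len(vocab)))  # roughly 4.05
```

On this toy data the model beats the uniform baseline (perplexity 7 for these 7 word types) only because the held-out sentence reuses bigrams seen in training; a sentence built from unseen pairs gets no such benefit, which is the weakness the distributed representation is meant to address.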

Abstract

A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that it can take advantage of longer contexts.
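
The abstract's central idea, a learned feature vector per word feeding a neural network that outputs the probability of the next word, can be sketched as a forward pass. The version below assumes a fixed window of previous words, a shared lookup table `C` of word feature vectors, one tanh hidden layer and a softmax output over the vocabulary. That follows the general shape of the network used in this line of work, but it omits training and any extras such as direct input-to-output connections. The sizes, the random initialisation and the names `make_model` and `next_word_distribution` are hypothetical, and the parameters are untrained, so the probabilities themselves carry no meaning.

```python
import math
import random


def make_model(vocab_size, dim, context, hidden, seed=0):
    """Hypothetical helper: randomly initialised, untrained parameters."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "C": mat(vocab_size, dim),        # one feature vector per word, shared by all positions
        "H": mat(hidden, context * dim),  # input -> hidden weights
        "d": [0.0] * hidden,              # hidden bias
        "U": mat(vocab_size, hidden),     # hidden -> output weights
        "b": [0.0] * vocab_size,          # output bias
    }


def next_word_distribution(model, context_ids):
    """P(w_t = i | previous words) for every word id i in the vocabulary."""
    # x: concatenation of the feature vectors of the context words
    x = [v for w in context_ids for v in model["C"][w]]
    # h = tanh(d + H x)
    h = [math.tanh(dj + sum(hjk * xk for hjk, xk in zip(row, x)))
         for dj, row in zip(model["d"], model["H"])]
    # y = b + U h: one unnormalised score per vocabulary word
    y = [bi + sum(uij * hj for uij, hj in zip(row, h))
         for bi, row in zip(model["b"], model["U"])]
    # softmax, shifted by max(y) for numerical stability
    m = max(y)
    exps = [math.exp(yi - m) for yi in y]
    total = sum(exps)
    return [e / total for e in exps]


model = make_model(vocab_size=10, dim=4, context=2, hidden=8)
probs = next_word_distribution(model, [3, 7])  # ids of w_{t-2}, w_{t-1}
print(len(probs), round(sum(probs), 6))  # 10 1.0

# Words with identical feature vectors are interchangeable to the network.
model["C"][5] = list(model["C"][3])
print(next_word_distribution(model, [5, 7]) == probs)  # True
```

The last two lines illustrate the generalization argument in miniature: because the network only sees feature vectors, a context built from words whose vectors are close to those of a seen context receives a similar distribution (here, with identical vectors, the same one). Training would adjust `C`, `H`, `d`, `U` and `b` together, for example by gradient descent on the negative log-likelihood of the training text, which is one way to realise learning the representation and the probability function simultaneously.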


