2001
AI summary
This paper introduces a neural probabilistic language model using distributed word representations and neural networks to overcome the curse of dimensionality in language modeling, achieving improved perplexity on the Brown corpus and AP News data compared to n-gram models.
Abstract
A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that it makes it possible to take advantage of longer contexts.
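The two components named in the abstract, (1) a feature vector per word and (2) a probability function over the next word computed from those vectors, can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the names `make_model`, `next_word_probs`, `C`, `H`, and `U`, the layer sizes, the random initialization, and the omission of bias terms and training are all choices made here for brevity.

```python
import math
import random


def make_model(vocab_size, embed_dim, context_size, hidden_dim, seed=0):
    """Build untrained parameters (hypothetical layout, random values)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "C": mat(vocab_size, embed_dim),                 # one feature vector per word
        "H": mat(hidden_dim, context_size * embed_dim),  # input-to-hidden weights
        "U": mat(vocab_size, hidden_dim),                # hidden-to-output weights
    }


def next_word_probs(model, context_ids):
    """Return a probability for every vocabulary word given the context words."""
    # (1) Look up and concatenate the distributed representations of the context.
    x = [value for word in context_ids for value in model["C"][word]]
    # (2) Probability function: a tanh hidden layer, then a softmax over the vocabulary.
    h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in model["H"]]
    scores = [sum(w * v for w, v in zip(row, h)) for row in model["U"]]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    model = make_model(vocab_size=10, embed_dim=4, context_size=3, hidden_dim=8)
    # Extreme case of "nearby representation": give word 3 the same vector as word 2.
    model["C"][3] = list(model["C"][2])
    seen = next_word_probs(model, [0, 2, 1])
    unseen = next_word_probs(model, [0, 3, 1])
    print(seen == unseen)
```

Because the prediction depends on a context word only through its feature vector, swapping a word for one with the same vector leaves the predicted distribution unchanged. With merely similar vectors the distributions would be close rather than identical, which is the generalization mechanism the abstract describes.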