2005

Hierarchical Probabilistic Neural Network Language Model

F. Morin, Yoshua Bengio


AI summary

This paper introduces a hierarchical decomposition of the conditional probabilities in neural network language models, which yields a speed-up of about 200-fold during both training and recognition. The decomposition is a binary hierarchical clustering of words, constrained by prior knowledge extracted from the WordNet semantic hierarchy.

Main Contributions

  • Introduces a hierarchical decomposition of the conditional probabilities that yields a speed-up of about 200-fold during both training and recognition.
  • Builds the decomposition as a binary hierarchical clustering constrained by prior knowledge extracted from the WordNet semantic hierarchy.
  • Reports an implementation and experiments in which the roughly 200-fold speed-up comes with only a small degradation in generalization performance.
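
The decomposition named in these points can be written out as a short worked equation. The notation here is an assumption chosen for illustration: each word v is given a binary code b_1(v), ..., b_m(v) describing its path from the root of the word tree to its leaf, and h stands for the context (the preceding words).

```latex
% Flat output layer: one normalization over the whole vocabulary V.
%   cost per prediction: O(|V|)
%
% Hierarchical output: a chain of m binary decisions along the path of v.
\[
  P(v \mid h) \;=\; \prod_{j=1}^{m} P\bigl(b_j(v) \,\bigm|\, b_1(v), \dots, b_{j-1}(v),\, h\bigr)
\]
% For a balanced binary tree, m \approx \log_2 |V|, so the cost per
% prediction drops from O(|V|) to O(\log |V|).
%
% Illustrative arithmetic (hypothetical vocabulary size, not taken from this page):
\[
  |V| = 10{,}000 \;\Rightarrow\; \log_2 |V| \approx 13.3,
  \qquad \frac{|V|}{\log_2 |V|} \approx 750
\]
```

The ratio of roughly 750 is an idealized figure for the output computation alone under the assumed vocabulary size; it is not a prediction of the measured speed-up of about 200 reported above.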

Abstract

In recent years, variants of a neural network architecture for statistical language modeling have been proposed and successfully applied, e.g. in the language modeling component of speech recognizers. The main advantage of these architectures is that they learn an embedding for words (or other symbols) in a continuous space that helps to smooth the language model and provide good generalization even when the number of training examples is insufficient. However, these models are extremely slow in comparison to the more commonly used n-gram models, both for training and recognition. As an alternative to an importance sampling method proposed to speed up training, we introduce a hierarchical decomposition of the conditional probabilities that yields a speed-up of about 200 both during training and recognition. The hierarchical decomposition is a binary hierarchical clustering constrained by the prior knowledge extracted from the WordNet semantic hierarchy.
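
To make the hierarchical decomposition in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the tree is a plain balanced split of word indices rather than a WordNet-constrained clustering, each internal node gets its own hypothetical logistic unit (`node_weights`, `node_bias`), and the context is a random vector standing in for the network's hidden representation. All function and variable names are invented for this example.

```python
import numpy as np


def build_paths(words):
    """Build a balanced binary tree over `words`.

    Returns ({word: [(node_id, bit), ...]}, number_of_internal_nodes).
    Assumption: a simple balanced split, NOT a WordNet-based clustering.
    """
    paths = {w: [] for w in words}
    next_id = 0
    stack = [list(words)]
    while stack:
        group = stack.pop()
        if len(group) < 2:
            continue  # a single word is a leaf, no decision needed
        node_id = next_id
        next_id += 1
        mid = len(group) // 2
        left, right = group[:mid], group[mid:]
        for w in left:
            paths[w].append((node_id, 0))
        for w in right:
            paths[w].append((node_id, 1))
        stack.append(left)
        stack.append(right)
    return paths, next_id


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def word_prob(word, context_vec, node_weights, node_bias, paths):
    """P(word | context) as a product of binary decisions along the word's path."""
    p = 1.0
    for node_id, bit in paths[word]:
        p_one = sigmoid(node_weights[node_id] @ context_vec + node_bias[node_id])
        p *= p_one if bit == 1 else 1.0 - p_one
    return p


# Toy setup: 10 words, 4-dimensional context (both sizes are arbitrary).
rng = np.random.default_rng(0)
vocab = list(range(10))
paths, n_nodes = build_paths(vocab)
dim = 4
node_weights = rng.normal(size=(n_nodes, dim))
node_bias = rng.normal(size=n_nodes)
context_vec = rng.normal(size=dim)

probs = [word_prob(w, context_vec, node_weights, node_bias, paths) for w in vocab]
total = sum(probs)                                 # sums to 1 up to rounding
max_depth = max(len(p) for p in paths.values())    # binary decisions per word

print(f"sum of probabilities: {total:.6f}")
print(f"internal nodes: {n_nodes}, longest path: {max_depth}")
```

Scoring one word touches only the nodes on its path, so the work per word grows with the tree depth rather than with the vocabulary size. The probabilities still form a valid distribution because every internal node splits its mass between exactly two children.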


