Papperoni

2005

Training Neural Network Language Models on Very Large Corpora

Holger Schwenk, Jean Luc Gauvain

citations

Cite Score

10

AI summary

This paper introduces new algorithms to train neural network language models on large text corpora, enabling their use in domains with hundreds of millions of words. Evaluated on French Broadcast News, the models achieve a 0.5% absolute word error reduction using minimal additional processing time.

Main Contributions

Introduces new algorithms for training neural network language models on very large text corpora.
Demonstrates the applicability of neural network language models in domains with extensive text data.
Achieves a significant word error reduction of 0.5% absolute in a state-of-the-art real-time continuous speech recognizer for French Broadcast News.
Incorporates the neural network LM into the speech recognizer by rescoring lattices in less than 0.05xRT.
Presents an algorithm for training the neural network on arbitrarily large training corpora by using a different small random subset at each epoch.
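
The last contribution, training on a different small random subset at each epoch, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the uniform sampling without replacement, and the `update_fn` callback are all assumptions made for the example. The summary above does not say how the subsets are drawn.

```python
import random

def epoch_subsets(corpus_size, subset_size, num_epochs, seed=0):
    """Yield one freshly drawn list of example indices per epoch (hypothetical helper)."""
    rng = random.Random(seed)
    for _ in range(num_epochs):
        # Assumption: uniform sampling without replacement within an epoch.
        yield rng.sample(range(corpus_size), subset_size)

def train(corpus_size, subset_size, num_epochs, update_fn, seed=0):
    """Call update_fn on each sampled example index; return the set of indices seen."""
    seen = set()
    for indices in epoch_subsets(corpus_size, subset_size, num_epochs, seed):
        for i in indices:
            update_fn(i)  # stand-in for one stochastic gradient step on example i
        seen.update(indices)
    return seen

if __name__ == "__main__":
    visited = []
    seen = train(corpus_size=1000, subset_size=50, num_epochs=4,
                 update_fn=visited.append, seed=1)
    print(len(visited), "updates over", len(seen), "distinct examples")
```

The cost per epoch depends only on the subset size, not on the corpus size, while successive epochs still expose the model to different parts of the corpus.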

Abstract

During the last years there has been growing interest in using neural networks for language modeling. In contrast to the well known back-off n-gram language models, the neural network approach attempts to overcome the data sparseness problem by performing the estimation in a continuous space. This type of language model was mostly used for tasks for which only a very limited amount of in-domain training data is available. In this paper we present new algorithms to train a neural network language model on very large text corpora. This makes possible the use of the approach in domains where several hundreds of millions words of texts are available. The neural network language model is evaluated in a state-of-the-art real-time continuous speech recognizer for French Broadcast News. Word error reductions of 0.5% absolute are reported using only a very limited amount of additional processing time.
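
To make "estimation in a continuous space" concrete, here is a small sketch of a feed-forward neural language model in the general style of reference [1]. It is an assumption-laden illustration, not the architecture from this paper: the layer sizes, the omission of bias terms, and the names `make_model` and `next_word_probs` are all hypothetical. Each context word is mapped to a continuous vector, the vectors are concatenated, and a softmax output gives a probability to every word in the vocabulary.

```python
import math
import random

def make_model(vocab_size, embed_dim, context_len, hidden_dim, seed=0):
    """Build randomly initialised weights (hypothetical sizes, no bias terms)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": mat(vocab_size, embed_dim),                # word index -> continuous vector
        "hidden": mat(hidden_dim, context_len * embed_dim),
        "output": mat(vocab_size, hidden_dim),
    }

def next_word_probs(model, context):
    """Return P(w | context) for every word index w in the vocabulary."""
    # Project each context word into the continuous space and concatenate.
    x = [v for w in context for v in model["embed"][w]]
    h = [math.tanh(sum(a * b for a, b in zip(row, x))) for row in model["hidden"]]
    scores = [sum(a * b for a, b in zip(row, h)) for row in model["output"]]
    # Numerically stable softmax over the whole vocabulary.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    model = make_model(vocab_size=20, embed_dim=4, context_len=3, hidden_dim=8)
    probs = next_word_probs(model, [1, 5, 7])
    print(round(sum(probs), 6), min(probs) > 0.0)
```

Because the output is a softmax, every word receives a nonzero probability for any context, including contexts never seen in training. A back-off n-gram model has to handle those cases by falling back to shorter histories.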

References [18]

[1]A Neural Probabilistic Language Model

Yoshua Bengio, R. Ducharme, Pascal Vincent - 2001

62 papers in library cite

What started it all. Very simple and elegant.

[2]SRILM - An Extensible Language Modeling Toolkit

Andreas Stolcke - 2002

13 papers in library cite

Toolkit for n-grams. Not too relevant and sounds veeeery simple (sorry to those who implemented it). It's nice to see an early implementation of OOP though. The paper is boring and doesn't really say much about the framework; it's more a description of how to use the commands and n-gram models.

[3]Connectionist Language Modeling for Large Vocabulary Continuous Speech Recognition

Holger Schwenk, Jean Luc Gauvain - 2002

14 papers in library cite

Only real relevance is being early. Otherwise not much to see.

[4]Efficient Training of Large Neural Networks for Language Modeling

Holger Schwenk - 2004

6 papers in library cite

Very early paper, maybe that's why it's relevant. But very uninteresting despite the nice name ("large NNs" had a very different meaning back then).

[5]Boosting a Weak Learning Algorithm by Majority

Y. Freund - 1995

2 papers in library cite

50 pages; Boosting

[6]Statistical Learning Theory

V. N. Vapnik - 1998

10 papers in library cite

Book, but there's a 12 page overview

[7]An Empirical Study of Smoothing Techniques for Language Modeling

S. F. Chen, J. Goodman - 1998

13 papers in library cite

[8]Class-Based N-Gram Models of Natural Language

P. F. Brown, P. V. deSouza, R. L. Mercer, Vincent J. Della Pietra, J. C. Lai - 1992

12 papers in library cite

[9]Estimation of Probabilities From Sparse Data for the Language Model Component of a Speech Recognizer

S. Katz - 1987

11 papers in library cite

[10]A Maximum Entropy Approach to Adaptive Statistical Language Modeling

R. Rosenfeld - 1996

6 papers in library cite

[11]Structured Language Modeling

C. Chelba, Frederick Jelinek - 2000

6 papers in library cite

[12]Exact Training of a Neural Syntactic Language Model

A. Emami, Frederick Jelinek - 2004

4 papers in library cite

[13]Random Clusterings for Language Modeling

A. Emami, Frederick Jelinek - 2005

4 papers in library cite

[14]Building Continuous Space Language Models for Transcribing European Languages

Holger Schwenk, Jean Luc Gauvain - 2005

3 papers in library cite

[15]Using PHiPAC to Speed Error Back-Propagation Learning

J. Bilmes, K. Asanovic, C. Chin, J. Demmel - 1997

3 papers in library cite

[16]Neural Network Language Models for Conversational Speech Recognition

Holger Schwenk, Jean Luc Gauvain - 2004

2 papers in library cite

[17]Random Forest in Language Modeling

P. Xu, Frederick Jelinek - 2004

2 papers in library cite

[18]Where Are We in Transcribing BN French?

Jean Luc Gauvain, G. Adda, M. Adda-Decker, A. Allauzen, V. Gendner, L. Lamel, Holger Schwenk - 2005

2 papers in library cite

Cited by

7

papers in your library

Cites

6

papers in your library

Read

on March 21, 2025

Seems very derivative of Schwenk's early work. It's also very focused on speech recognition, and "very large corpora" seems very relative.
