2004

Efficient Training of Large Neural Networks for Language Modeling

Holger Schwenk

AI summary

This paper introduces techniques for fast training and recognition of neural network language models (NNLM) in large vocabulary speech recognition. It achieves significant word error reductions on a conversational speech recognizer for the DARPA rich transcriptions evaluations, using corpora of over 10 million words.

Main Contributions

Presents algorithms for fast training and recognition (lattice rescoring) of neural network language models (NNLM) and discusses their convergence properties.
Evaluates the approach within a state-of-the-art recognizer for conversational telephone speech (CTS), achieving word error reductions over a carefully tuned 4-gram backoff language model in the DARPA rich transcriptions evaluations.
Explores the impact of different network sizes on perplexity and word error rate.

Abstract

Recently there has been increasing interest in using neural networks for language modeling. In contrast to the well known backoff n-gram language models, the neural network approach tries to limit the data sparseness problem by performing the estimation in a continuous space, allowing by this means smooth interpolations. The complexity to train such a model and to calculate one n-gram probability is however several orders of magnitude higher than for the backoff models, making the new approach difficult to use in real applications. In this paper several techniques are presented that allow the use of a neural network language model in a large vocabulary speech recognition system, in particular very fast lattice rescoring and efficient training of large neural networks on training corpora of over 10 million words. The described approach achieves significant word error reductions with respect to a carefully tuned 4-gram backoff language model in a state of the art conversational speech recognizer for the DARPA rich transcriptions evaluations.
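To make the cost argument in the abstract concrete, here is a minimal sketch of a feedforward neural network language model in pure Python. It is an illustration under stated assumptions, not the paper's implementation: the class name `TinyNNLM`, the layer sizes, the random initialization, and the omission of bias terms are all hypothetical choices made for brevity. Each context word is mapped to a continuous vector, the vectors are concatenated and passed through a hidden layer, and a softmax over the whole vocabulary gives the next-word probabilities.

```python
import math
import random

def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class TinyNNLM:
    """Hypothetical feedforward n-gram LM: projection, tanh hidden layer, softmax."""

    def __init__(self, vocab_size, context_size=3, embed_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.context_size = context_size
        # Shared projection: one continuous vector per vocabulary word.
        self.embeddings = make_matrix(vocab_size, embed_dim, rng)
        self.hidden_weights = make_matrix(hidden_dim, context_size * embed_dim, rng)
        self.output_weights = make_matrix(vocab_size, hidden_dim, rng)

    def next_word_distribution(self, context):
        """Return P(w | context) for every word w in the vocabulary."""
        assert len(context) == self.context_size
        # Concatenate the continuous representations of the context words.
        projected = [x for word_id in context for x in self.embeddings[word_id]]
        hidden = [math.tanh(a) for a in matvec(self.hidden_weights, projected)]
        # The output layer costs O(vocab_size * hidden_dim) per context.
        scores = matvec(self.output_weights, hidden)
        max_score = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - max_score) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

# Three context words correspond to a 4-gram model.
model = TinyNNLM(vocab_size=10)
probs = model.next_word_distribution([1, 4, 7])
```

In this sketch a single forward pass yields the probabilities of all candidate next words for one context, and the output layer dominates the cost because it grows linearly with the vocabulary size. A backoff n-gram model, by contrast, answers the same query with a table lookup.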

References [11]

[1]A Neural Probabilistic Language Model

Yoshua Bengio, R. Ducharme, Pascal Vincent - 2001

62 papers in library cite

What started it all. Very simple and elegant.

[2]Srilm - An Extensible Language Modeling Toolkit

Andreas Stolcke - 2002

13 papers in library cite

Toolkit for N-grams. Not too relevant and sounds veeeery simple (sorry for those who implemented it). It's nice to see early implementation of OOP though. The paper is boring and doesn't really say much about the framework, more of a description of how to use the commands and n-gram models.

[3]Quick Training of Probabilistic Neural Nets by Importance Sampling

Yoshua Bengio, Jean Sebastien Senecal - 2003

11 papers in library cite

Good idea to overcome softmax computation cost. Not sure if too relevant today, but definitely better than the 2008 paper that is the same stuff.

[4]Connectionist Language Modeling for Large Vocabulary Continuous Speech Recognition

Holger Schwenk, Jean Luc Gauvain - 2002

14 papers in library cite

Only real relevance is being early. Otherwise not much to see.

[5]Using a Connectionist Model in a Syntactical Based Language Model

Frederick Jelinek - 2003

6 papers in library cite

A copy of the other paper. I still don't get what the "Structured Language Model" is and this seems like trying to fit a cube in a circular hole. I don't like it.

[6]An Empirical Study of Smoothing Techniques for Language Modeling

S. F. Chen, J. Goodman - 1998

13 papers in library cite

[7]Using Phipac to Speed Error Back-Propagation Learning

J. Bilmes, K. Asanovic, C. Chin, J. Demmel - 1997

3 papers in library cite

[8]Conversational Telephone Speech Recognition

Jean Luc Gauvain, L. Lamel, Holger Schwenk, G. Adda, L. C. Chen, F. Lefèvre - 2003

2 papers in library cite

[9]Relevance Weighting for Combining Multi-Domain Data for N-Gram Language Modeling

R. Iyer, M. Ostendorf - 1999

2 papers in library cite

[10]Using Continuous Space Language Models for Conversational Speech Recognition

Holger Schwenk, Jean Luc Gauvain - 2003

2 papers in library cite

[11]Spring Speech-to-Text Transcription Evaluation Results

A. Lee, J. Fiscus, J. Garofolo, M. Przybocki, A. Martin, G. Sanders, D. Pallett - 2003

1 paper in library cites

Read on March 18, 2025

Very early paper, which may be why it's relevant. Very uninteresting, though, despite the nice name ("large NNs" had a very different meaning back then).