2011

Strategies for Training Large Scale Neural Network Language Models

Tomas Mikolov, A. Deoras, D. Povey, Lukas Burget, Jan Cernocky


AI summary

This paper describes how to train neural network language models effectively on large data sets: the training data are sorted by relevance, and a hash-based maximum entropy model is trained jointly with the neural network. On the English Broadcast News speech recognition task, this gives around a 10% relative reduction in word error rate against a large 4-gram model trained on 400M tokens.

Main Contributions

  • Introduces a method to sort training data by relevance for faster convergence and better performance.
  • Presents a hash-based implementation of a maximum entropy model that can be trained as part of the neural network model, reducing computational complexity.
  • Achieves around a 10% relative reduction in word error rate on the English Broadcast News speech recognition task.
  • Performs the experiments with a recurrent neural network language model (RNN LM).
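The hash-based maximum entropy idea in the list above can be illustrated with a toy sketch. It is not the paper's implementation: the class name `HashedMaxEnt`, the table size, the learning rate, and the use of Python's built-in `hash` are all assumptions made for illustration, and the model here stands alone rather than being trained inside a neural network. Each (context, candidate word) n-gram feature is mapped to a slot in one fixed-size weight array, so memory stays bounded regardless of how many distinct n-grams occur.

```python
import math


class HashedMaxEnt:
    """Toy hashed n-gram maximum entropy model (illustrative assumption,
    not the authors' code). Words are integer ids in [0, vocab_size)."""

    def __init__(self, vocab_size, table_size=1 << 16, order=3, lr=0.1):
        self.vocab_size = vocab_size
        self.table_size = table_size
        self.order = order          # highest n-gram order used as a feature
        self.lr = lr
        self.weights = [0.0] * table_size

    def _slots(self, history, word):
        # One feature per n-gram order: the last k history words plus the
        # candidate word, hashed into the shared weight table.
        slots = []
        for k in range(self.order):
            if k > len(history):
                break
            context = tuple(history[len(history) - k:])
            slots.append(hash((context, word)) % self.table_size)
        return slots

    def probs(self, history):
        # Softmax over the summed feature weights of every candidate word.
        scores = [sum(self.weights[s] for s in self._slots(history, w))
                  for w in range(self.vocab_size)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def update(self, history, target):
        # One stochastic gradient step on the log-likelihood of `target`.
        p = self.probs(history)
        for w in range(self.vocab_size):
            grad = (1.0 if w == target else 0.0) - p[w]
            for s in self._slots(history, w):
                self.weights[s] += self.lr * grad


def train_on_sequence(model, sequence, epochs=20):
    # Present each position with up to (order - 1) preceding words as history.
    for _ in range(epochs):
        for i, target in enumerate(sequence):
            start = max(0, i - (model.order - 1))
            model.update(sequence[start:i], target)
```

Hash collisions make distinct features share a weight; a larger table reduces how often that happens, at the cost of memory.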

Abstract

We describe how to effectively train neural network based language models on large data sets. Faster convergence during training and better overall performance are observed when the training data are sorted by their relevance. We introduce a hash-based implementation of a maximum entropy model that can be trained as a part of the neural network model. This leads to a significant reduction of computational complexity. We achieved around a 10% relative reduction of word error rate on the English Broadcast News speech recognition task, against a large 4-gram model trained on 400M tokens.
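Sorting the training data by relevance can be sketched as follows. The abstract does not say how relevance is measured or in which direction the data are ordered, so both choices here are assumptions: `relevance` is a hypothetical stand-in that scores a chunk by the fraction of its tokens found in an in-domain vocabulary, and `order_for_training` places the least relevant chunks first so that the most relevant ones are seen last.

```python
def relevance(chunk, in_domain_vocab):
    """Hypothetical relevance score: share of tokens that are in-domain."""
    tokens = chunk.split()
    if not tokens:
        return 0.0
    return sum(token in in_domain_vocab for token in tokens) / len(tokens)


def order_for_training(chunks, in_domain_vocab):
    """Return chunks sorted from least to most relevant (assumed order)."""
    return sorted(chunks, key=lambda chunk: relevance(chunk, in_domain_vocab))


# Usage with made-up example data.
in_domain_vocab = {"news", "anchor", "read", "headlines"}
chunks = [
    "news anchor read headlines",
    "stocks fell sharply",
    "the anchor read the news",
]
ordered = order_for_training(chunks, in_domain_vocab)
```

Any other scoring function with the same signature could be swapped in without changing the ordering step.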


