2012

Subword Language Modeling With Neural Networks

Tomas Mikolov, Ilya Sutskever, A. Deoras, H. S. Le, S. Kombrink, Jan Cernocky

Cite Score: 12

AI summary

This paper introduces a subword language model based on neural networks that combines the advantages of character-level and word-level models. It demonstrates that neural network models can be significantly smaller than compressed n-gram models while maintaining performance on the Broadcast News RT04 task, and that further size reductions are possible through subword units and quantization.

Main Contributions

  • Proposed a simple technique for learning subword-level units from data, combining the advantages of character- and word-level models.
  • Showed that neural network based language models can be an order of magnitude smaller than compressed n-gram models.
  • Demonstrated that quantization can reduce memory requirements by around 90% while maintaining the word error rate.
  • Explored reducing the size of the neural network language model further by decomposing infrequent words into subwords.
  • Achieved performance comparable to or better than n-gram models on speech recognition tasks with significantly smaller neural network models.
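The quantization result can be illustrated with a minimal sketch. It assumes simple uniform (linear) quantization, where every weight in a matrix is mapped to one of 2**bits evenly spaced levels sharing a single minimum and step size. The summary does not say which quantization scheme or bit width the paper used, so the helpers `quantize_uniform`, `dequantize`, and `memory_reduction` and the 3-bit setting are hypothetical; 3 bits is picked only because it happens to give a reduction close to the 90% figure.

```python
# Hypothetical sketch: uniform quantization of a weight vector.
# Not the paper's documented scheme; for illustration only.

def quantize_uniform(weights, bits):
    """Map each float weight to an integer code in [0, 2**bits - 1]."""
    levels = 2 ** bits
    w_min, w_max = min(weights), max(weights)
    # Distance between adjacent levels; guard against a constant vector.
    step = (w_max - w_min) / (levels - 1) if w_max > w_min else 1.0
    codes = [round((w - w_min) / step) for w in weights]
    return codes, w_min, step


def dequantize(codes, w_min, step):
    """Reconstruct approximate weights from integer codes."""
    return [w_min + c * step for c in codes]


def memory_reduction(bits, original_bits=32):
    """Ideal fraction of memory saved versus original_bits-wide floats.

    Ignores the small fixed cost of storing w_min and step, and assumes
    the codes are bit-packed (a Python list of ints is not).
    """
    return 1.0 - bits / original_bits


if __name__ == "__main__":
    weights = [-0.42, -0.10, 0.0, 0.07, 0.31, 0.55]
    codes, w_min, step = quantize_uniform(weights, bits=3)
    restored = dequantize(codes, w_min, step)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print("codes:", codes)
    print(f"max abs error: {max_err:.4f}")
    print(f"memory saved vs 32-bit floats: {memory_reduction(3):.1%}")
```

Each reconstructed weight is within half a step of the original, so the error shrinks as the bit width grows while the memory saving shrinks with it. Whether such an error leaves the word error rate unchanged is an empirical question that the sketch does not address.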

Abstract

We explore the performance of several types of language models on the word-level and the character-level language modeling tasks. This includes two recently proposed recurrent neural network architectures, a feedforward neural network model, a maximum entropy model and the usual smoothed n-gram models. We then propose a simple technique for learning sub-word level units from the data, and show that it combines advantages of both character and word-level models. Finally, we show that neural network based language models can be an order of magnitude smaller than compressed n-gram models, at the same level of performance when applied to a Broadcast News RT04 speech recognition task. By using sub-word units, the size can be reduced even more.
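The abstract does not spell out how the sub-word units are learned, so the following is only a minimal sketch of the general idea described above: keep the most frequent words whole and break the remaining words into frequent character sequences counted from the training data. The helpers `build_vocab` and `segment`, the parameters `max_words`, `max_subwords`, and `max_len`, and the greedy longest-match splitting are all hypothetical choices, not the paper's algorithm.

```python
from collections import Counter

# Hypothetical sketch of word/subword vocabulary construction.
# Not the paper's documented procedure; for illustration only.


def build_vocab(corpus_words, max_words, max_subwords, max_len=3):
    """Keep frequent words whole; learn subword units from the rest."""
    counts = Counter(corpus_words)
    kept = {w for w, _ in counts.most_common(max_words)}
    # Count character n-grams (length 2..max_len) inside infrequent words only,
    # weighted by how often each infrequent word occurs.
    gram_counts = Counter()
    for word, c in counts.items():
        if word in kept:
            continue
        for n in range(2, max_len + 1):
            for i in range(len(word) - n + 1):
                gram_counts[word[i:i + n]] += c
    subwords = {g for g, _ in gram_counts.most_common(max_subwords)}
    return kept, subwords


def segment(word, kept, subwords, max_len=3):
    """Return [word] if it is frequent, else a greedy longest-match split."""
    if word in kept:
        return [word]
    pieces, i = [], 0
    while i < len(word):
        # Try the longest learned unit first; single characters always match.
        for n in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if n == 1 or piece in subwords:
                pieces.append(piece)
                i += n
                break
    return pieces


if __name__ == "__main__":
    corpus = "the news the news the report reporting reported reporter".split()
    kept, subwords = build_vocab(corpus, max_words=2, max_subwords=5)
    for w in ["news", "reporting", "reports"]:
        print(w, "->", segment(w, kept, subwords))
```

Because single characters are always allowed as a fallback, every word, including one never seen in training, can be segmented, which is one way a model over mixed word and subword tokens can keep a small output vocabulary. A real system would also need some way to mark word boundaries so that the original words can be recovered from the token stream.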
