AI summary
The paper introduces a distributed infrastructure for training n-gram language models on up to 2 trillion tokens, together with a new smoothing method called Stupid Backoff, and reports improvements in machine translation quality as measured by the BLEU score.
Abstract
This paper reports on the benefits of large-scale statistical language modeling in machine translation. A distributed infrastructure is proposed which we use to train on up to 2 trillion tokens, resulting in language models having up to 300 billion n-grams. It is capable of providing smoothed probabilities for fast, single-pass decoding. We introduce a new smoothing method, dubbed Stupid Backoff, that is inexpensive to train on large data sets and approaches the quality of Kneser-Ney Smoothing as the amount of training data increases.
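The smoothing idea named in the abstract can be illustrated with a minimal sketch. It assumes the commonly described form of Stupid Backoff: score a word by its relative frequency in the longest context that was seen in training, and otherwise back off to a shorter context while multiplying by a fixed factor. The function names, the toy corpus, and the default factor of 0.4 are illustrative assumptions rather than details given in the abstract, and the resulting scores are not normalized probabilities.

```python
from collections import Counter

ALPHA = 0.4  # assumed fixed backoff factor; not stated in the abstract


def count_ngrams(tokens, max_order):
    """Count every n-gram of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(word, context, counts, total_tokens, alpha=ALPHA):
    """Score `word` given `context` (at most max_order - 1 preceding words).

    Returns an unnormalized score, not a probability.
    """
    context = tuple(context)
    penalty = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Seen n-gram: relative frequency, scaled by accumulated penalty.
            return penalty * counts[ngram] / counts[context]
        # Unseen n-gram: drop the oldest context word and apply the factor.
        context = context[1:]
        penalty *= alpha
    # Base case: unigram relative frequency over the corpus size.
    return penalty * counts[(word,)] / total_tokens


tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens, max_order=3)
seen = stupid_backoff("cat", ["the"], counts, len(tokens))
backed_off = stupid_backoff("mat", ["cat", "the"], counts, len(tokens))
```

Because no discounting or normalization pass is needed, training reduces to counting n-grams, which is what makes a scheme of this kind inexpensive on very large corpora.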