2014
AI summary
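This paper introduces an approximate training algorithm based on importance sampling that allows NMT models to be trained with a much larger target vocabulary without increasing training complexity. Decoding stays efficient by restricting each step to a small subset of the target vocabulary. The resulting models match, and in some cases outperform, small-vocabulary baselines, and an ensemble reaches performance comparable to the state of the art (measured by BLEU) on the WMT'14 English→German and English→French translation tasks.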
Abstract
Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to the existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation has its limitation in handling a larger vocabulary, as training complexity as well as decoding complexity increase proportionally to the number of target words. In this paper, we propose a method based on importance sampling that allows us to use a very large target vocabulary without increasing training complexity. We show that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to match, and in some cases outperform, the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models. Furthermore, when we use an ensemble of a few models with very large target vocabularies, we achieve performance comparable to the state of the art (measured by BLEU) on both the English→German and English→French translation tasks of WMT'14.
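The two ideas in the abstract, approximating the expensive softmax normalization with importance sampling during training and restricting the softmax to a small candidate subset during decoding, can be illustrated with a toy example. The following is a minimal sketch of the general technique (self-normalized importance sampling), not the authors' implementation; the function names, the toy scores, and the uniform proposal distribution are all assumptions made for illustration.

```python
import math
import random


def softmax_over(scores, candidates):
    """Softmax restricted to `candidates` (indices into `scores`).

    With candidates = the full vocabulary this is the exact softmax;
    with a small subset it mimics decoding over a candidate list.
    """
    m = max(scores[i] for i in candidates)  # subtract max for stability
    exps = {i: math.exp(scores[i] - m) for i in candidates}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def importance_weights(scores, proposal, samples):
    """Self-normalized importance weights for sampled word indices.

    The unnormalized target is exp(score); `proposal` is the sampling
    distribution Q (hypothetical here), so each raw weight is
    exp(score_k) / Q(k).
    """
    log_w = [scores[k] - math.log(proposal[k]) for k in samples]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def estimate_expectation(scores, proposal, values, num_samples, rng):
    """Estimate E_p[values] under p = softmax(scores) without ever
    summing over the whole vocabulary: draw indices from `proposal`
    and reweight them."""
    samples = rng.choices(range(len(scores)), weights=proposal, k=num_samples)
    weights = importance_weights(scores, proposal, samples)
    return sum(w * values[k] for w, k in zip(weights, samples))


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.5, -1.0, 0.0]   # toy scores for 5 target words
    uniform = [1.0 / len(scores)] * len(scores)
    full = softmax_over(scores, range(len(scores)))
    exact = sum(full[i] * scores[i] for i in range(len(scores)))
    approx = estimate_expectation(scores, uniform, scores, 20000, random.Random(0))
    print("exact expectation:", exact)
    print("sampled estimate: ", approx)
    print("decoding over subset {0, 2}:", softmax_over(scores, [0, 2]))
```

In a real model the sampled term would stand in for the expected gradient of the scores under the output distribution, which is the part of the softmax gradient whose cost grows with the vocabulary size. Restricting the softmax to a subset keeps the relative probabilities of the retained words unchanged, which is why a well-chosen candidate list can make decoding cheaper without reordering the candidates it contains.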