2014

On Using Very Large Target Vocabulary for Neural Machine Translation

Yoshua Bengio

AI summary

This paper introduces an approximate training algorithm, based on importance sampling, that allows NMT models to be trained with a much larger target vocabulary. The resulting models improve translation performance without sacrificing training or decoding speed, and an ensemble of such models reaches performance comparable to the state of the art on the WMT'14 English→German and English→French translation tasks.

Main Contributions

  • Introduces an approximate training algorithm based on importance sampling for NMT models.
  • The approach allows training NMT models with a much larger target vocabulary.
  • The proposed algorithm effectively keeps the computational complexity during training at the level of using only a small subset of the full vocabulary.
  • Demonstrates that larger target vocabularies can yield better translation performance without sacrificing training or decoding speed.
  • Achieves state-of-the-art translation performance with single NMT models on the WMT'14 English→French translation task.
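
The importance-sampling idea in the contributions above can be illustrated with a small sketch. It is not the paper's implementation: the function names, the toy scores, and the choice of a uniform proposal distribution are all assumptions made for illustration. Under that assumption, the self-normalized importance weights over a sampled subset coincide with a softmax restricted to that subset, so the cost depends on the subset size rather than on the full vocabulary size.

```python
import math

def importance_weights(scores, sampled, log_q):
    """Self-normalized importance weights over the sampled word ids.

    Each weight is proportional to exp(score_k - log q_k), where q is the
    proposal distribution (hypothetical here, not taken from the paper).
    """
    logits = [scores[k] - log_q[k] for k in sampled]
    shift = max(logits)  # subtract the max for numerical stability
    unnorm = [math.exp(v - shift) for v in logits]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def softmax_over(scores, ids):
    """Exact softmax restricted to the given word ids."""
    shift = max(scores[k] for k in ids)
    unnorm = [math.exp(scores[k] - shift) for k in ids]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Toy unnormalized scores for a six-word target vocabulary (made-up values).
scores = [2.0, 0.5, -1.0, 1.5, 0.0, -0.5]
sampled = [0, 3, 4]  # small subset actually touched in this update

# Assumption: the proposal is uniform over the sampled subset.
uniform_log_q = [math.log(1.0 / len(sampled))] * len(scores)

weights = importance_weights(scores, sampled, uniform_log_q)
subset_probs = softmax_over(scores, sampled)
```

Only the three sampled scores enter the normalizer here; the remaining vocabulary entries are never read, which is the sense in which the training cost stays at the level of a small subset.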

Abstract

Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to the existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation has its limitation in handling a larger vocabulary, as training complexity as well as decoding complexity increase proportionally to the number of target words. In this paper, we propose a method based on importance sampling that allows us to use a very large target vocabulary without increasing training complexity. We show that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to match, and in some cases outperform, the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models. Furthermore, when we use an ensemble of a few models with very large target vocabularies, we achieve performance comparable to the state of the art (measured by BLEU) on both the English→German and English→French translation tasks of WMT'14.
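
The decoding step described in the abstract, scoring only a small subset of the target vocabulary, can be sketched as follows. How the subset is built is an assumption in this example: a hypothetical list of common words plus a hypothetical source-to-target lexicon. The helper names and toy scores are likewise illustrative and not taken from the paper.

```python
def build_candidates(source_words, common_words, lexicon):
    """Collect a candidate target subset for one source sentence.

    Assumption: candidates are the common words plus the lexicon entries
    of each source word; `lexicon` is a hypothetical lookup table.
    """
    candidates = set(common_words)
    for word in source_words:
        candidates.update(lexicon.get(word, ()))
    return candidates

def pick_best(scores, candidates):
    """Return the highest-scoring word among the candidates only."""
    return max(candidates, key=lambda w: scores[w])

# Toy model scores for one decoding step (made-up values).
scores = {"le": 1.2, "chat": 2.5, "chien": 3.0, "la": 0.3, "maison": 0.1}
common_words = ["le", "la"]
lexicon = {"the": ["le", "la"], "cat": ["chat"]}

candidates = build_candidates(["the", "cat"], common_words, lexicon)
best_restricted = pick_best(scores, candidates)
best_full = pick_best(scores, scores.keys())
```

In this toy case the restricted search returns "chat" while the full-vocabulary search returns "chien", which shows that the subset both bounds the per-step cost and changes which words the decoder can emit.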
