2017

Massive Exploration of Neural Machine Translation Architectures

D. Britz, Anna Goldie, M. Luong, Quoc Le


AI summary

This paper presents a large-scale analysis of NMT architecture hyperparameters, drawing on more than 250,000 GPU hours of experiments on the WMT English-to-German translation task. It offers practical insights into optimization and releases an open-source NMT framework.

Main Contributions

  • Provides immediately applicable insights into the optimization of Neural Machine Translation models, as well as promising directions for future research.
  • Establishes the extent to which metrics such as BLEU are influenced by random initialization and slight hyperparameter variation, helping researchers to distinguish statistically significant results from random noise.
  • Releases an open-source package based on TensorFlow, specifically designed for implementing reproducible state-of-the-art sequence-to-sequence models.
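
The second contribution above, separating real improvements from run-to-run noise, can be illustrated with a minimal sketch. It assumes you have a metric such as BLEU from several runs of each configuration that differ only in random seed. The function names, the two-standard-deviation threshold, and the scores are all hypothetical placeholders chosen for illustration. They are not taken from the paper, and this is a crude heuristic rather than a formal significance test.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of a metric across repeated runs."""
    return statistics.mean(scores), statistics.stdev(scores)


def gap_exceeds_noise(scores_a, scores_b, num_std=2.0):
    """Crude heuristic (an assumption, not the paper's method): report True
    when the gap between the two means is larger than num_std times the
    larger of the two per-configuration standard deviations."""
    mean_a, std_a = summarize_runs(scores_a)
    mean_b, std_b = summarize_runs(scores_b)
    return abs(mean_a - mean_b) > num_std * max(std_a, std_b)


# Made-up placeholder scores for two configurations, NOT results from the paper.
baseline = [20.0, 20.4, 19.8, 20.2]
variant = [20.3, 20.1, 20.5, 19.9]

if __name__ == "__main__":
    print(summarize_runs(baseline))
    print(summarize_runs(variant))
    print(gap_exceeds_noise(baseline, variant))
```

With these placeholder numbers the gap between means is smaller than the spread across seeds, so the check does not flag a difference. A proper comparison would use more runs and a real statistical test.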

Abstract

Neural Machine Translation (NMT) has shown remarkable progress over the past few years, with production systems now being deployed to end-users. One major drawback of current architectures is that they are expensive to train, typically requiring days to weeks of GPU time to converge. This makes exhaustive hyperparameter search, as is commonly done with other neural network architectures, prohibitively expensive. In this work, we present the first large-scale analysis of NMT architecture hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours on the standard WMT English-to-German translation task. Our experiments lead to novel insights and practical advice for building and extending NMT architectures. As part of this contribution, we release an open-source NMT framework that enables researchers to easily experiment with novel techniques and reproduce state-of-the-art results.


