2017
Cite Score: 29
AI summary
This paper presents the first large-scale analysis of NMT architecture hyperparameters, based on several hundred experimental runs totaling over 250,000 GPU hours on the WMT English-to-German translation task. It offers practical advice for building and extending NMT architectures and releases an open-source NMT framework.
Abstract
Neural Machine Translation (NMT) has shown remarkable progress over the past few years, with production systems now being deployed to end-users. One major drawback of current architectures is that they are expensive to train, typically requiring days to weeks of GPU time to converge. This makes exhaustive hyperparameter search, as is commonly done with other neural network architectures, prohibitively expensive. In this work, we present the first large-scale analysis of NMT architecture hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours on the standard WMT English to German translation task. Our experiments lead to novel insights and practical advice for building and extending NMT architectures. As part of this contribution, we release an open-source NMT framework that enables researchers to easily experiment with novel techniques and reproduce state-of-the-art results.