2016

Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation

Yonghui Wu, M. Schuster, Ziru Chen, Quoc V. Le, M. Norouzi, W. Macherey, M. Krikun, Yue Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. J. Johnson, Xiaodong Liu, Lukasz Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, Wenyi Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, Oriol Vinyals, G. S. Corrado, M. Hughes, Jeffrey Dean

AI summary

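This paper introduces GNMT, Google's Neural Machine Translation system, which uses a deep LSTM network with residual and attention connections and splits words into wordpieces to handle rare words. The model achieves results competitive with the state of the art on the WMT'14 English-to-French and English-to-German benchmarks. In a human side-by-side evaluation on isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.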

Main Contributions

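  • Introduces GNMT, Google's Neural Machine Translation system, built to address the training cost, inference cost, and rare-word robustness problems that have hindered NMT in practical deployments.
  • Employs a deep LSTM network with 8 encoder and 8 decoder layers and residual connections; attention connects the bottom decoder layer to the top encoder layer to improve parallelism.
  • Splits words into a limited set of common sub-word units (wordpieces) on both input and output, balancing the flexibility of character-delimited models against the efficiency of word-delimited models.
  • Achieves results competitive with the state of the art on the WMT'14 English-to-French and English-to-German benchmarks.
  • Reduces translation errors by an average of 60% compared to Google’s phrase-based production system, based on a human side-by-side evaluation of isolated simple sentences.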
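The wordpiece idea in the list above can be illustrated with a minimal sketch. It assumes a greedy longest-match-first segmenter, a "_" marker at the start of each word, and a tiny hand-written vocabulary; all three are illustrative assumptions, not the learned wordpiece model or the exact segmentation procedure used in GNMT.

```python
def segment_word(word, vocab, unk="<unk>"):
    """Split one word into pieces by greedy longest-match-first lookup.

    Assumption: this greedy rule is a common simplification, not
    necessarily the procedure GNMT uses.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing matched at this position, so give up on the word.
            return [unk]
        pieces.append(word[start:end])
        start = end
    return pieces


def segment_sentence(sentence, vocab, boundary="_"):
    """Segment a whitespace-separated sentence, marking word starts."""
    output = []
    for word in sentence.split():
        output.extend(segment_word(boundary + word, vocab))
    return output


def join_pieces(pieces, boundary="_"):
    """Invert the segmentation: concatenate pieces and restore spaces."""
    return "".join(pieces).replace(boundary, " ").strip()


# Hypothetical toy vocabulary; a real one is learned from data.
TOY_VOCAB = {"_J", "et", "_makers", "_fe", "ud", "_over", "_seat", "_width"}

pieces = segment_sentence("Jet makers feud over seat width", TOY_VOCAB)
restored = join_pieces(pieces)
```

Because every piece keeps its word-boundary marker, the original sentence can be recovered from the piece sequence without ambiguity, which is what lets the same scheme serve both the input and the output side.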

Abstract

Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference – sometimes prohibitively so in the case of very large data sets and large models. Several authors have also charged that NMT systems lack robustness, particularly when input sentences contain rare words. These issues have hindered NMT’s use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google’s Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using residual connections as well as attention connections from the decoder network to the encoder. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units (“wordpieces”) for both input and output. This method provides a good balance between the flexibility of “character”-delimited models and the efficiency of “word”-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. 
To directly optimize the translation BLEU scores, we consider refining the models by using reinforcement learning, but we found that the improvement in the BLEU scores did not reflect in the human evaluation. On the WMT’14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google’s phrase-based production system.
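The abstract names length normalization and a coverage penalty for beam search but does not give their form. The sketch below assumes the form recalled from the body of the paper: the hypothesis log-probability is divided by lp(Y) = ((5 + |Y|) / 6)^alpha, and beta times the sum over source positions of log(min(total attention, 1)) is added. The default values of alpha and beta, the epsilon guard, and all function names are illustrative assumptions.

```python
import math


def length_penalty(length, alpha=0.2):
    """Assumed form: ((5 + |Y|) / (5 + 1)) ** alpha. alpha=0 disables it."""
    return ((5.0 + length) / 6.0) ** alpha


def coverage_penalty(attention, beta=0.2, eps=1e-12):
    """Penalize source positions that received little total attention.

    attention[j][i] is the attention weight of target step j on source
    position i. eps is an added guard against log(0), not part of the
    assumed formula.
    """
    if not attention:
        return 0.0
    num_source = len(attention[0])
    total = 0.0
    for i in range(num_source):
        mass = sum(step[i] for step in attention)
        total += math.log(max(min(mass, 1.0), eps))
    return beta * total


def beam_score(log_prob, attention, alpha=0.2, beta=0.2):
    """Rescored hypothesis: normalized log-probability plus coverage term."""
    target_length = len(attention)
    normalized = log_prob / length_penalty(target_length, alpha)
    return normalized + coverage_penalty(attention, beta)


# Two hypothetical two-token hypotheses over a two-word source sentence.
covers_all = [[0.5, 0.5], [0.5, 0.5]]    # each source word gets total mass 1.0
ignores_one = [[0.9, 0.1], [0.9, 0.1]]   # second source word gets only 0.2

score_full = beam_score(-2.0, covers_all)
score_partial = beam_score(-2.0, ignores_one)
```

With equal log-probabilities, the hypothesis that attends to every source word scores higher, which is the behavior the abstract describes as encouraging output that covers all the words in the source sentence.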
