2017
Cite Score
100
AI summary
This paper introduces the Transformer, a sequence transduction model that relies solely on attention mechanisms. It achieves state-of-the-art results on the WMT 2014 English-to-German and English-to-French translation tasks (28.4 and 41.8 BLEU, respectively) while being more parallelizable and requiring less training time than earlier recurrent or convolutional models.
Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
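The abstract names attention as the model's only building block but does not spell out the computation. As a minimal sketch, assuming the scaled dot-product form softmax(QK^T / sqrt(d_k))V that the paper defines in its body, a single attention step might look like the following. The function names and the list-of-lists layout are illustrative choices, not the authors' implementation, and masking, multiple heads, and learned projections are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Illustrative attention over plain Python lists (hypothetical helper).

    queries: list of vectors of length d_k
    keys:    list of vectors of length d_k
    values:  list of vectors, one per key
    Returns one output vector per query.
    """
    d_k = len(keys[0])
    scale = 1.0 / math.sqrt(d_k)
    n_out = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(n_out)]
        )
    return outputs


result = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
```

Because both keys in this toy input are identical, the query weights them equally and the result is the plain average of the two value vectors. Every query is processed independently of the others, which is one way to see why the abstract describes the model as more parallelizable than recurrent alternatives.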