2017
Cite Score: 17
AI summary
This paper introduces a simple and effective technique for using unsupervised pretraining to improve seq2seq models: the encoder and decoder are initialized with the weights of two pretrained language models and then fine-tuned on labeled data. The approach achieves state-of-the-art results on WMT English→German.
Abstract
This work presents a general unsupervised learning method to improve the accuracy of sequence-to-sequence (seq2seq) models. In our method, the weights of the encoder and decoder of a seq2seq model are initialized with the pretrained weights of two language models and then fine-tuned with labeled data. We apply this method to challenging benchmarks in machine translation and abstractive summarization and find that it significantly improves the subsequent supervised models. Our main result is that pretraining improves the generalization of seq2seq models. We achieve state-of-the-art results on the WMT English→German task, surpassing a range of methods using both phrase-based machine translation and neural machine translation. Our method achieves a significant improvement of 1.3 BLEU over the previous best models on both WMT'14 and WMT'15 English→German. We also conduct human evaluations on abstractive summarization and find that our method outperforms a purely supervised learning baseline in a statistically significant manner.
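A minimal sketch of the initialization step described in the abstract, in plain Python. Everything concrete here is an illustrative assumption rather than the paper's code: the parameter names (`embedding`, `rnn`, `softmax`, `attention`), the hypothetical functions `init_seq2seq_from_lms` and `sgd_step`, and the exact split of which weights come from which language model. Weights with no language-model counterpart (here, attention) get a small random initialization, and a single SGD step with a made-up gradient stands in for fine-tuning on labeled data.

```python
import copy
import random

# Illustrative parameter layout: name -> 2-D weight matrix (list of rows).
Params = dict[str, list[list[float]]]


def init_seq2seq_from_lms(src_lm: Params, tgt_lm: Params,
                          attention_shape: tuple[int, int],
                          seed: int = 0) -> Params:
    """Initialize seq2seq weights from a source-side and a target-side LM.

    Assumed mapping (hypothetical, for illustration): the encoder takes the
    source LM's embedding and recurrent weights; the decoder takes the target
    LM's embedding, recurrent and softmax weights; attention has no LM
    counterpart and is drawn uniformly from [-0.1, 0.1].
    """
    rng = random.Random(seed)
    rows, cols = attention_shape
    return {
        # Deep copies, so later fine-tuning never modifies the pretrained LMs.
        "encoder.embedding": copy.deepcopy(src_lm["embedding"]),
        "encoder.rnn": copy.deepcopy(src_lm["rnn"]),
        "decoder.embedding": copy.deepcopy(tgt_lm["embedding"]),
        "decoder.rnn": copy.deepcopy(tgt_lm["rnn"]),
        "decoder.softmax": copy.deepcopy(tgt_lm["softmax"]),
        "attention": [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(rows)],
    }


def sgd_step(params: Params, grads: Params, lr: float) -> None:
    """In-place SGD update; a stand-in for one fine-tuning step on labeled data."""
    for name, grad in grads.items():
        for row, grad_row in zip(params[name], grad):
            for j, g in enumerate(grad_row):
                row[j] -= lr * g


if __name__ == "__main__":
    # Toy "pretrained" weights; real LMs would be trained on unlabeled text.
    src_lm = {"embedding": [[0.25, 0.5]], "rnn": [[0.75, 1.0]], "softmax": [[0.0, 0.0]]}
    tgt_lm = {"embedding": [[1.25, 1.5]], "rnn": [[1.75, 2.0]], "softmax": [[1.5, 2.0]]}
    model = init_seq2seq_from_lms(src_lm, tgt_lm, attention_shape=(2, 3))
    # Made-up gradient standing in for the supervised loss on labeled pairs.
    sgd_step(model, {"decoder.softmax": [[1.0, 2.0]]}, lr=0.5)
    print(model["decoder.softmax"])  # [[1.0, 1.0]]
    print(tgt_lm["softmax"])         # [[1.5, 2.0]]
```

Deep-copying the language-model weights keeps the pretrained models intact after fine-tuning, so in this sketch the same pair of language models could initialize several independent seq2seq runs.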
Cited by 9 papers in your library
Cites 24 papers in your library
Read on October 30, 2025