2017

Unsupervised Pretraining for Sequence to Sequence Learning

P. Ramachandran, P. J. Liu, Quoc V. Le

Cite Score

17

AI summary

This paper introduces a simple and effective technique for using unsupervised pretraining to improve seq2seq models, initializing encoder and decoder networks with pretrained language model weights and fine-tuning with labeled data, achieving state-of-the-art results on WMT English→German.

Main Contributions

  • Proposes a simple and effective technique for using unsupervised pretraining to improve seq2seq models.
  • Initializes both encoder and decoder networks with pretrained weights of two language models.
  • Jointly trains the seq2seq objective with the language modeling objectives to prevent overfitting.
  • Achieves state-of-the-art results on the WMT English→German task.
  • Finds that pretraining improves the generalization of seq2seq models.
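The joint training mentioned in the list above can be sketched as a sum of three cross-entropy terms: the supervised seq2seq loss plus one language modeling loss on each side. This is only an illustrative sketch, not the authors' implementation. The function names, the `lm_weight` parameter, and the toy uniform distributions are assumptions made for the example; the summary does not state how the terms are weighted.

```python
import math


def cross_entropy(probs, targets):
    """Mean negative log-likelihood of the target ids.

    probs:   one probability distribution (list of floats) per time step
    targets: one target token id per time step
    """
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


def joint_loss(seq2seq_probs, tgt_ids,
               src_lm_probs, src_next_ids,
               tgt_lm_probs, tgt_next_ids,
               lm_weight=1.0):
    """Seq2seq loss plus weighted source and target language modeling losses.

    lm_weight is a hypothetical knob; 1.0 weights all three terms equally.
    """
    translation = cross_entropy(seq2seq_probs, tgt_ids)
    source_lm = cross_entropy(src_lm_probs, src_next_ids)
    target_lm = cross_entropy(tgt_lm_probs, tgt_next_ids)
    return translation + lm_weight * (source_lm + target_lm)


# Toy check: uniform predictions over a 4-token vocabulary, 3 time steps.
uniform = [[0.25, 0.25, 0.25, 0.25] for _ in range(3)]
ids = [0, 1, 2]
loss = joint_loss(uniform, ids, uniform, ids, uniform, ids)
```

With uniform predictions each term equals log 4, so the toy loss is three times that value. Keeping the language modeling terms active during fine-tuning is what lets them act as a regularizer on the pretrained weights.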

Abstract

This work presents a general unsupervised learning method to improve the accuracy of sequence to sequence (seq2seq) models. In our method, the weights of the encoder and decoder of a seq2seq model are initialized with the pretrained weights of two language models and then fine-tuned with labeled data. We apply this method to challenging benchmarks in machine translation and abstractive summarization and find that it significantly improves the subsequent supervised models. Our main result is that pretraining improves the generalization of seq2seq models. We achieve state-of-the-art results on the WMT English→German task, surpassing a range of methods using both phrase-based machine translation and neural machine translation. Our method achieves a significant improvement of 1.3 BLEU from the previous best models on both WMT'14 and WMT'15 English→German. We also conduct human evaluations on abstractive summarization and find that our method outperforms a purely supervised learning baseline in a statistically significant manner.
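The initialization step described in the abstract might look roughly like the following. This is a minimal sketch under stated assumptions: parameters are plain nested lists in dictionaries, the key names (`embedding`, `lstm`, `softmax`, `attention`) are hypothetical, and the choice of which components are copied from each language model versus freshly initialized is an assumption of this example rather than something the abstract specifies.

```python
import copy


def init_seq2seq_from_lms(src_lm, tgt_lm, fresh_init):
    """Build seq2seq parameters from two pretrained language models.

    src_lm, tgt_lm: dicts of pretrained parameters (hypothetical layout)
    fresh_init:     callable returning newly initialized parameters for
                    components that have no pretrained counterpart
    """
    return {
        # Encoder starts from the source-side language model.
        "encoder": {
            "embedding": copy.deepcopy(src_lm["embedding"]),
            "lstm": copy.deepcopy(src_lm["lstm"]),
        },
        # Decoder starts from the target-side language model.
        "decoder": {
            "embedding": copy.deepcopy(tgt_lm["embedding"]),
            "lstm": copy.deepcopy(tgt_lm["lstm"]),
            "softmax": copy.deepcopy(tgt_lm["softmax"]),
        },
        # Assumed here: attention has no pretrained weights to reuse.
        "attention": fresh_init(),
    }


# Toy parameters standing in for two pretrained language models.
src_lm = {"embedding": [[0.1, 0.2]], "lstm": [[0.3, 0.4]], "softmax": [[0.5]]}
tgt_lm = {"embedding": [[0.6, 0.7]], "lstm": [[0.8, 0.9]], "softmax": [[1.0]]}

params = init_seq2seq_from_lms(src_lm, tgt_lm, fresh_init=lambda: [[0.0]])

# Simulated fine-tuning update: changes the seq2seq copy only.
params["encoder"]["lstm"][0][0] = 9.9
```

Deep copies keep the pretrained language models untouched while the seq2seq copy is fine-tuned on labeled data, so the same pretrained weights can seed several downstream models.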

Cited by 9 papers in your library

Cites 24 papers in your library

Read on October 30, 2025