2018

Improving Language Understanding by Generative Pre-Training

Alec Radford, K. Narasimhan, T. Salimans, Ilya Sutskever

AI summary

This paper introduces a semi-supervised approach to language understanding: a Transformer language model is first pre-trained without supervision on the BooksCorpus dataset and then discriminatively fine-tuned on each target task, improving the state of the art on 9 of the 12 tasks studied.

Main Contributions

  • Introduces a semi-supervised approach for language understanding tasks, leveraging unsupervised pre-training and supervised fine-tuning.
  • Utilizes a Transformer-based language model for pre-training on the BooksCorpus dataset.
  • Employs task-specific input adaptations during fine-tuning to achieve effective transfer with minimal architectural changes.
  • Improves the state of the art on 9 of the 12 language understanding tasks studied, with absolute gains of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).
  • Analyzes zero-shot behaviors of the pre-trained model, showcasing its acquisition of useful linguistic knowledge for downstream tasks.
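
The task-specific input adaptations listed above can be illustrated with a small sketch. It assumes the delimiter-based scheme in which a structured input (a sentence pair, or a context with several candidate answers) is flattened into one token sequence so the pre-trained model can process it without architectural changes. The special-token strings and function names are hypothetical placeholders, not taken from the paper's code.

```python
from typing import List

# Hypothetical special-token strings (assumption): the real vocabulary
# entries for start, delimiter, and extract tokens are not given here.
START, DELIM, EXTRACT = "<s>", "<$>", "<e>"


def entailment_input(premise: List[str], hypothesis: List[str]) -> List[str]:
    """Join premise and hypothesis into one sequence around a delimiter."""
    return [START, *premise, DELIM, *hypothesis, EXTRACT]


def similarity_inputs(a: List[str], b: List[str]) -> List[List[str]]:
    """Build both orderings, since sentence similarity has no inherent order."""
    return [entailment_input(a, b), entailment_input(b, a)]


def multiple_choice_inputs(
    context: List[str], answers: List[List[str]]
) -> List[List[str]]:
    """Build one sequence per candidate answer; each would be scored separately."""
    return [[START, *context, DELIM, *answer, EXTRACT] for answer in answers]


if __name__ == "__main__":
    print(entailment_input(["it", "rains"], ["it", "is", "wet"]))
```

In this sketch only the input layout changes between tasks. The model that consumes these sequences, and the small output layer applied to the final token's representation, are left out.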

Abstract

Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied. For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).
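
The two training stages named in the abstract can be written as objectives. This is a sketch following the formulation in the paper. The symbols are not defined in this summary and are introduced here for illustration: $\mathcal{U}$ is the unlabeled token corpus, $k$ the context window, $\Theta$ the model parameters, $\mathcal{C}$ the labeled dataset, and $\lambda$ a weighting coefficient.

```latex
% Stage 1: generative pre-training (language modeling on unlabeled text)
L_1(\mathcal{U}) = \sum_i \log P\left(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta\right)

% Stage 2: discriminative fine-tuning on labeled examples (x^1, \ldots, x^m, y)
L_2(\mathcal{C}) = \sum_{(x, y)} \log P\left(y \mid x^1, \ldots, x^m\right)

% Fine-tuning with language modeling kept as an auxiliary objective
L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})
```

Both stages maximize a log-likelihood. The first predicts the next token from unlabeled text, while the second predicts the task label and can keep the language-modeling term as a regularizer.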
