2019

Language Models Are Unsupervised Multitask Learners

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever

AI summary

This paper introduces GPT-2, a 1.5B parameter Transformer language model trained on WebText, a new dataset of millions of webpages. In a zero-shot setting, it achieves state-of-the-art results on 7 out of 8 tested language modeling datasets and begins to perform tasks such as question answering (55 F1 on CoQA), machine translation, reading comprehension, and summarization without any explicit supervision.

Main Contributions

  • Introduces GPT-2, a 1.5B parameter Transformer language model.
  • Introduces WebText, a new dataset of millions of webpages.
  • Demonstrates that language models can perform downstream tasks in a zero-shot setting, without any parameter or architecture modification.
  • Achieves state-of-the-art results on 7 out of 8 tested language modeling datasets in a zero-shot setting.
  • Shows strong zero-shot results on question answering (55 F1 on CoQA).
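
To make the "55 F1" figure more concrete, the following is a minimal sketch of a token-overlap F1 score between a generated answer and a reference answer. It is a simplified illustration, not the official CoQA scorer: the function name `token_f1` and the whitespace-and-lowercase tokenization are assumptions made here, and official evaluation scripts typically apply further answer normalization and compare against multiple references.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Simplified token-overlap F1 (hypothetical helper, not the official scorer)."""
    # Assumption: lowercase + whitespace split stands in for real answer normalization.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often as it
    # appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score such as the one reported above would then be an average of per-question scores, expressed as a percentage.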

Abstract

Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. When conditioned on a document plus questions, the answers generated by the language model reach 55 F1 on the CoQA dataset - matching or exceeding the performance of 3 out of 4 baseline systems without using the 127,000+ training examples. The capacity of the language model is essential to the success of zero-shot task transfer and increasing it improves performance in a log-linear fashion across tasks. Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state of the art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. Samples from the model reflect these improvements and contain coherent paragraphs of text. These findings suggest a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
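
The zero-shot setup described in the abstract, conditioning the model on a document plus questions and letting it generate the answer, can be sketched as plain prompt construction. This is only an illustration under stated assumptions: the function name `build_zero_shot_prompt`, the `Q:`/`A:` line layout, and the example text are hypothetical choices made here, and the abstract itself does not specify the exact prompt format.

```python
from typing import List, Tuple


def build_zero_shot_prompt(
    document: str,
    history: List[Tuple[str, str]],
    question: str,
) -> str:
    """Assemble a text prompt: document, earlier Q/A turns, new question, then 'A:'.

    Hypothetical helper; the Q:/A: layout is an assumption, not the paper's
    documented format.
    """
    lines = [document.strip(), ""]
    for past_question, past_answer in history:
        lines.append(f"Q: {past_question}")
        lines.append(f"A: {past_answer}")
    lines.append(f"Q: {question}")
    # The trailing "A:" cues the language model to continue with an answer.
    lines.append("A:")
    return "\n".join(lines)


# Illustrative input (made-up text, not from any dataset).
prompt = build_zero_shot_prompt(
    "The race was held in 2008 in Beijing.",
    [("Where was the race held?", "Beijing")],
    "In what year?",
)
```

The resulting string would be passed to a language model as-is, and whatever text the model generates after the final `A:` is taken as its answer. No parameters are updated and no task-specific head is added, which is what makes the setting zero-shot.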
