2018
Cite Score: 89
AI summary
This paper introduces a semi-supervised approach to language understanding: a Transformer language model is first pre-trained without supervision on the BooksCorpus dataset and then discriminatively fine-tuned on each target task, achieving state-of-the-art results on 9 of the 12 tasks studied.
Abstract
Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied. For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).
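The abstract mentions task-aware input transformations without spelling them out. As a rough illustration only, the idea of turning structured inputs (sentence pairs, a context with several candidate answers) into single token sequences that one pre-trained model can consume might be sketched as below. The token strings `<s>`, `$`, and `<e>`, the whitespace tokenizer, and all function names are assumptions made for this sketch, not the authors' code.

```python
from typing import List

# Hypothetical marker strings for this sketch; a real model would use
# learned special-token embeddings and a subword tokenizer instead.
START, DELIM, EXTRACT = "<s>", "$", "<e>"


def tokenize(text: str) -> List[str]:
    """Stand-in tokenizer: plain whitespace splitting (an assumption)."""
    return text.split()


def single_input(text: str) -> List[str]:
    """Classification: wrap one text in start and extract markers."""
    return [START, *tokenize(text), EXTRACT]


def pair_input(first: str, second: str) -> List[str]:
    """Entailment-style pair: join both texts with a delimiter token."""
    return [START, *tokenize(first), DELIM, *tokenize(second), EXTRACT]


def similarity_inputs(a: str, b: str) -> List[List[str]]:
    """Similarity has no inherent order, so build both orderings."""
    return [pair_input(a, b), pair_input(b, a)]


def multiple_choice_inputs(context: str, answers: List[str]) -> List[List[str]]:
    """Multiple choice: one sequence per candidate answer."""
    return [pair_input(context, answer) for answer in answers]


if __name__ == "__main__":
    print(pair_input("A man is sleeping", "A person rests"))
    for seq in multiple_choice_inputs("She opened the door", ["It rained", "She left"]):
        print(seq)
```

Because every task is reduced to one or more flat sequences, the same model body can be reused across tasks, with only a small output layer added on top. That is the sense in which the approach requires minimal changes to the model architecture.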