2019

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang, Z. Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le

Cite Score: 87

AI summary

The paper introduces XLNet, a generalized autoregressive pretraining method. It learns bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order, avoids BERT's mask-related limitations through its autoregressive formulation, and integrates ideas from Transformer-XL. Under comparable experiment settings, it outperforms BERT on 20 tasks.

Main Contributions

  • Introduces a generalized autoregressive pretraining method called XLNet.
  • Enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order.
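  • Avoids two limitations of BERT through its autoregressive formulation: neglected dependency between masked positions and the pretrain-finetune discrepancy caused by mask corruption.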
  • Integrates ideas from Transformer-XL into pretraining.
  • Outperforms BERT on 20 tasks under comparable experiment settings, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.
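
The permutation idea in the second contribution can be illustrated with a small toy sketch. This is not the authors' implementation: the function names (`context_positions`, `contexts_over_all_orders`, `permutation_mask`) are hypothetical, and no model is trained. It only shows that, across all factorization orders, a token ends up conditioned on context from both its left and its right, and how one order could translate into an attention mask.

```python
from itertools import permutations


def context_positions(order, target):
    """Positions visible when predicting `target` under factorization `order`.

    `order` is a tuple of positions; everything that comes before `target`
    in that order is treated as its conditioning context.
    """
    step = order.index(target)
    return frozenset(order[:step])


def contexts_over_all_orders(seq_len, target):
    """Every distinct context set `target` is conditioned on across all orders."""
    return {
        context_positions(order, target)
        for order in permutations(range(seq_len))
    }


def permutation_mask(order):
    """Boolean mask: mask[i][j] is True when position i may attend to position j.

    Position i sees position j only if j precedes i in the factorization order.
    """
    rank = {pos: step for step, pos in enumerate(order)}
    n = len(order)
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # The middle token of a 3-token sequence sees left-only, right-only,
    # both-sided, and empty contexts, depending on the sampled order.
    for ctx in sorted(contexts_over_all_orders(3, target=1), key=sorted):
        print(sorted(ctx))
    # The usual left-to-right order yields a strictly lower-triangular mask.
    print(permutation_mask((0, 1, 2)))
```

With the identity order `(0, 1, 2)` the mask reduces to ordinary left-to-right language modeling; any other order exposes some right-hand context to earlier positions, which is the sense in which the objective captures bidirectional context while remaining autoregressive.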

Abstract

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, under comparable experiment settings, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.
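
The abstract states the objectives only in words. As a sketch, with notation that is an assumption here rather than part of this page (a sequence $\mathbf{x}$ of length $T$, parameters $\theta$, $\mathcal{Z}_T$ the set of all permutations of $1,\dots,T$, $z_t$ the $t$-th element of a permutation $\mathbf{z}$, $\hat{\mathbf{x}}$ the mask-corrupted input, and $m_t = 1$ when position $t$ is masked), the three objectives being contrasted can be written as:

```latex
% Conventional autoregressive language modeling: one fixed left-to-right order.
\max_{\theta} \; \sum_{t=1}^{T} \log p_{\theta}\left(x_t \mid \mathbf{x}_{<t}\right)

% BERT-style denoising: masked tokens are reconstructed independently of
% each other given the corrupted input, hence the neglected dependency.
\max_{\theta} \; \sum_{t=1}^{T} m_t \, \log p_{\theta}\left(x_t \mid \hat{\mathbf{x}}\right)

% Permutation language modeling: expected log-likelihood over all
% factorization orders, with parameters shared across orders.
\max_{\theta} \; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}
  \left[ \sum_{t=1}^{T} \log p_{\theta}\left(x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}}\right) \right]
```

The third form keeps the product-rule factorization of an autoregressive model, so no independence assumption between predicted tokens is needed and no mask symbols appear in the input, while the expectation over orders lets each position be conditioned on tokens from both sides.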
