2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu, M. Ott, N. Goyal, J. Du, M. Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov

Cite Score

92

AI summary

This paper introduces RoBERTa, a robustly optimized BERT pretraining approach that combines dynamic masking, FULL-SENTENCES input without the next sentence prediction (NSP) loss, large mini-batches, and a larger byte-level BPE vocabulary. It achieves state-of-the-art results on GLUE, RACE and SQuAD without multi-task finetuning.
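Of the choices listed in the summary, dynamic masking is the easiest to illustrate: rather than fixing the masked positions once during preprocessing, a new mask pattern is sampled each time a sequence is fed to the model. The following is a minimal sketch, not the authors' implementation. The names `dynamic_mask` and `MASK`, the toy vocabulary, and the whitespace tokenization are assumptions made for illustration; the 15% selection rate and the 80/10/10 replacement split follow the original BERT recipe.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, chosen for this sketch


def dynamic_mask(tokens, vocab, rng, mask_prob=0.15):
    """Return (masked_tokens, labels) using a freshly sampled mask pattern.

    labels[i] holds the original token at positions the model must predict
    and None everywhere else.
    """
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            masked[i] = MASK               # 80%: replace with the mask symbol
        elif r < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the original token in place
    return masked, labels


if __name__ == "__main__":
    tokens = "the quick brown fox jumps over the lazy dog".split()
    vocab = sorted(set(tokens))
    rng = random.Random(0)
    # Calling the function once per epoch resamples the pattern each time,
    # which is the difference from masking once at preprocessing time.
    for epoch in range(3):
        masked, _ = dynamic_mask(tokens, vocab, rng)
        print(epoch, " ".join(masked))
```

Static masking would correspond to calling `dynamic_mask` once and reusing the result in every epoch.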

Main Contributions

  • Introduces a set of important BERT design choices and training strategies.
  • Introduces a novel dataset, CC-NEWS.
  • Confirms that using more data for pre-training further improves performance on downstream tasks.
  • Shows that masked language model pretraining, under the right design choices, is competitive with all other recently published methods.
  • Releases the model, pretraining and fine-tuning code implemented in PyTorch.

Abstract

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.

References [50]

D. P. Kingma, Jimmy Lei Ba - 2014

49 papers in library cite

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin - 2017

47 papers in library cite

Jacob Devlin, M. W. Chang, K. Lee, Kristina Toutanova - 2018

39 papers in library cite

Alec Radford, Jeffrey Wu, Rewon Child, D. Luan, Dario Amodei, Ilya Sutskever - 2019

27 papers in library cite

M. E. Peters, M. Neumann, M. Iyyer, Matt Gardner, C. Clark, K. Lee, L. S. Zettlemoyer - 2018

27 papers in library cite

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Zeming Lin, A. Desmaison, L. Antiga, Adam Lerer - 2017

3 papers in library cite

Zhilin Yang, Z. Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le - 2019

11 papers in library cite

Richard Socher, A. Perelygin, Jean Wu, J. Chuang, C. Manning, A. Ng, Christopher Potts - 2013

24 papers in library cite

P. Rajpurkar, J. Zhang, K. Lopyrev, Percy Liang - 2016

37 papers in library cite

R. Sennrich, B. Haddow, Alexandra Birch - 2016

22 papers in library cite

Dan Hendrycks, Kevin Gimpel - 2016

5 papers in library cite

A. Wang, A. Singh, J. Michael, F. Hill, Omer Levy, Samuel R. Bowman - 2018

26 papers in library cite

J. Howard, Sebastian Ruder - 2018

14 papers in library cite

Samuel R. Bowman, G. Angeli, Christopher Potts, Christopher D. Manning - 2015

25 papers in library cite

A. Williams, Nikita Nangia, S. Bowman - 2018

19 papers in library cite

P. Rajpurkar, R. Jia, Percy Liang - 2018

14 papers in library cite

Yukun Zhu, R. Kiros, R. Zemel, Ruslan Salakhutdinov, R. Urtasun, Antonio Torralba, Sanja Fidler - 2015

18 papers in library cite

A. Wang, Y. Pruksachatkun, Nikita Nangia, A. Singh, J. Michael, F. Hill, Omer Levy, Samuel R. Bowman - 2019

15 papers in library cite

Ido Dagan, O. Glickman, Bernardo Magnini - 2005

19 papers in library cite

M. Joshi, Danqi Chen, Yinhan Liu, D. Weld, Luke Zettlemoyer, Omer Levy - 2019

5 papers in library cite

W. Dolan, Chris Brockett - 2005

9 papers in library cite

Hector J. Levesque, E. Davis, Leora Morgenstern - 2011

13 papers in library cite

G. Lample, Alexis Conneau - 2019

5 papers in library cite

A. M. Dai, Quoc V. Le - 2015

27 papers in library cite

Guokun Lai, Q. Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy - 2017

11 papers in library cite

B. McCann, J. Bradbury, Caiming Xiong, Richard Socher - 2017

14 papers in library cite

Y. You, Jing Li, J. Hseu, X. Song, J. Demmel, Cho-Jui Hsieh - 2019

2 papers in library cite

M. Ott, S. Edunov, A. Baevski, A. Fan, S. Gross, N. Ng, D. Grangier, Michael Auli - 2019

4 papers in library cite

L. Dong, N. Yang, Wenhui Wang, F. Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, M. Zhou, H. W. Hon - 2019

4 papers in library cite

Alex Warstadt, A. Singh, S. Bowman - 2018

8 papers in library cite

Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao - 2019

6 papers in library cite

Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, F. Roesner, Yejin Choi - 2019

5 papers in library cite

K. Song, X. Tan, T. Qin, J. Lu, T. Y. Liu - 2019

5 papers in library cite

T. H. Trinh, Quoc V. Le - 2018

4 papers in library cite

L. Bentivogli, Peter Clark, Ido Dagan, D. Giampiccolo - 2009

7 papers in library cite

D. Giampiccolo, Bernardo Magnini, Ido Dagan, B. Dolan - 2007

7 papers in library cite

R. Bar-Haim, Ido Dagan, B. Dolan, L. Ferro, D. Giampiccolo, Bernardo Magnini, I. Szpektor - 2006

6 papers in library cite

V. Kocijan, A. M. Cretu, O. M. Camburu, Y. Yordanov, T. Lukasiewicz - 2019

4 papers in library cite

Alec Radford, K. Narasimhan, T. Salimans, Ilya Sutskever - 2018

4 papers in library cite

Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao - 2019

3 papers in library cite

M. Ott, S. Edunov, D. Grangier, Michael Auli - 2018

3 papers in library cite

Yu Sun, Shuohuan Wang, Yukun Li, S. Feng, X. Chen, Han Zhang, X. Tian, D. Zhu, H. Tian, H. Wu - 2019

2 papers in library cite

S. Iyer, N. Dandekar, K. Csernai - 2017

2 papers in library cite

W. Chan, N. Kitaev, K. Guu, M. Stern, Jakob Uszkoreit - 2019

2 papers in library cite

A. Gokaslan, V. Cohen - 2019

2 papers in library cite

E. Agirre, L. Màrquez, R. Wicentowski - 2007

2 papers in library cite

S. Nagel - 2016

1 paper in library cites

A. Baevski, S. Edunov, Yinhan Liu, Luke Zettlemoyer, Michael Auli - 2019

1 paper in library cites

F. Hamborg, N. Meuschke, C. Breitinger, B. Gipp - 2017

1 paper in library cites

Cited by

17

papers in your library

Cites

34

papers in your library

Read

on November 18, 2025
