2019
Cite Score
92
AI summary
This paper introduces RoBERTa, a robustly optimized BERT pretraining approach that uses dynamic masking, FULL-SENTENCES inputs without the next sentence prediction (NSP) loss, large mini-batches, and a larger byte-level BPE vocabulary. It achieves state-of-the-art results on GLUE, RACE, and SQuAD without multi-task finetuning.
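Of the changes listed in the summary, dynamic masking is the simplest to illustrate. The sketch below is not the authors' code; it only shows the general idea of sampling a new masking pattern every time a sequence is fed to the model, instead of fixing the pattern once during preprocessing. The function name `dynamic_mask`, the constants `MASK_ID`, `VOCAB_SIZE`, and `IGNORE`, and the toy token ids are all hypothetical. The 15% selection rate and the 80/10/10 replacement split follow the original BERT recipe and are assumed here.

```python
import random

MASK_ID = 0        # hypothetical id of the mask token
VOCAB_SIZE = 100   # hypothetical vocabulary size
IGNORE = -100      # label for positions that do not contribute to the loss


def dynamic_mask(token_ids, rng, mask_prob=0.15):
    """Return (inputs, labels) using a freshly sampled masking pattern."""
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    if not token_ids:
        return inputs, labels
    # Select roughly mask_prob of the positions, and at least one.
    n_select = min(len(token_ids), max(1, round(len(token_ids) * mask_prob)))
    for pos in rng.sample(range(len(token_ids)), n_select):
        labels[pos] = token_ids[pos]  # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[pos] = MASK_ID  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[pos] = rng.randrange(1, VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token in the input
    return inputs, labels


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = list(range(1, 41))  # 40 hypothetical token ids
    for epoch in range(3):
        inputs, labels = dynamic_mask(sentence, rng)
        selected = [i for i, y in enumerate(labels) if y != IGNORE]
        print(f"epoch {epoch}: selected positions {sorted(selected)}")
```

Calling `dynamic_mask` inside the data-loading loop, as in the usage lines at the bottom, means the same sentence is generally masked at different positions on each pass. Static masking would instead call it once and reuse the result.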
Abstract
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.
Citation Graph
References [50]
D. P. Kingma, Jimmy Lei Ba - 2014
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin - 2017
Jacob Devlin, M. W. Chang, K. Lee, Kristina Toutanova - 2018
Alec Radford, Jeffrey Wu, Rewon Child, D. Luan, Dario Amodei, Ilya Sutskever - 2019
M. E. Peters, M. Neumann, M. Iyyer, Matt Gardner, C. Clark, K. Lee, L. S. Zettlemoyer - 2018
A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Zeming Lin, A. Desmaison, L. Antiga, Adam Lerer - 2017
Zhilin Yang, Z. Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le - 2019
Richard Socher, A. Perelygin, Jean Wu, J. Chuang, C. Manning, A. Ng, Christopher Potts - 2013
P. Rajpurkar, J. Zhang, K. Lopyrev, Percy Liang - 2016
R. Sennrich, B. Haddow, Alexandra Birch - 2016
Dan Hendrycks, Kevin Gimpel - 2016
A. Wang, A. Singh, J. Michael, F. Hill, Omer Levy, Samuel R. Bowman - 2018
J. Howard, Sebastian Ruder - 2018
Samuel R. Bowman, G. Angeli, Christopher Potts, Christopher D. Manning - 2015
A. Williams, Nikita Nangia, S. Bowman - 2018
P. Rajpurkar, R. Jia, Percy Liang - 2018
Yukun Zhu, R. Kiros, R. Zemel, Ruslan Salakhutdinov, R. Urtasun, Antonio Torralba, Sanja Fidler - 2015
A. Wang, Y. Pruksachatkun, Nikita Nangia, A. Singh, J. Michael, F. Hill, Omer Levy, Samuel R. Bowman - 2019
Ido Dagan, O. Glickman, Bernardo Magnini - 2005
M. Joshi, Danqi Chen, Yinhan Liu, D. Weld, Luke Zettlemoyer, Omer Levy - 2019
W. Dolan, Chris Brockett - 2005
Hector J. Levesque, E. Davis, Leora Morgenstern - 2011
G. Lample, Alexis Conneau - 2019
A. M. Dai, Quoc V. Le - 2015
Guokun Lai, Q. Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy - 2017
B. McCann, J. Bradbury, Caiming Xiong, Richard Socher - 2017
Y. You, Jing Li, J. Hseu, X. Song, J. Demmel, Cho-Jui Hsieh - 2019
M. Ott, S. Edunov, A. Baevski, A. Fan, S. Gross, N. Ng, D. Grangier, Michael Auli - 2019
L. Dong, N. Yang, Wenhui Wang, F. Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, M. Zhou, H. W. Hon - 2019
Alex Warstadt, A. Singh, S. Bowman - 2018
Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao - 2019
Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, F. Roesner, Yejin Choi - 2019
K. Song, X. Tan, T. Qin, J. Lu, T. Y. Liu - 2019
T. H. Trinh, Quoc V. Le - 2018
L. Bentivogli, Peter Clark, Ido Dagan, D. Giampiccolo - 2009
D. Giampiccolo, Bernardo Magnini, Ido Dagan, B. Dolan - 2007
R. Bar-Haim, Ido Dagan, B. Dolan, L. Ferro, D. Giampiccolo, Bernardo Magnini, I. Szpektor - 2006
V. Kocijan, A. M. Cretu, O. M. Camburu, Y. Yordanov, T. Lukasiewicz - 2019
Alec Radford, K. Narasimhan, T. Salimans, Ilya Sutskever - 2018
Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao - 2019
M. Ott, S. Edunov, D. Grangier, Michael Auli - 2018
M. Honnibal, I. Montani - 2017
Yu Sun, Shuohuan Wang, Yukun Li, S. Feng, X. Chen, Han Zhang, X. Tian, D. Zhu, H. Tian, H. Wu - 2019
S. Iyer, N. Dandekar, K. Csernai - 2017
W. Chan, N. Kitaev, K. Guu, M. Stern, Jakob Uszkoreit - 2019
A. Gokaslan, V. Cohen - 2019
E. Agirre, L. Màrquez, R. Wicentowski - 2007
A. Baevski, S. Edunov, Yinhan Liu, Luke Zettlemoyer, Michael Auli - 2019
F. Hamborg, N. Meuschke, C. Breitinger, B. Gipp - 2017