2019
Cite Score: 84
AI summary
This paper introduces ALBERT (A Lite BERT), an architecture that uses factorized embedding parameterization and cross-layer parameter sharing to reduce memory consumption and speed up training. It achieves state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks with fewer parameters than BERT-large.
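The two parameter-reduction techniques named in the summary can be illustrated with simple parameter arithmetic. The sketch below is not the authors' code. It assumes that factorization replaces one V x H embedding matrix with a V x E matrix followed by an E x H projection, and that cross-layer sharing reuses a single set of layer weights at every depth. The sizes (V = 30,000, H = 768, E = 128, 12 layers) and the 12·H² per-layer estimate (attention plus feed-forward weights, ignoring biases and layer norm) are illustrative assumptions.

```python
from typing import Optional


def embedding_params(vocab_size: int, hidden_size: int,
                     embedding_size: Optional[int] = None) -> int:
    """Count embedding parameters, optionally factorized as V x E plus E x H."""
    if embedding_size is None:
        # Unfactorized: one V x H matrix.
        return vocab_size * hidden_size
    # Factorized: a V x E lookup table followed by an E x H projection.
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(per_layer_params: int, num_layers: int,
                   share_across_layers: bool) -> int:
    """Count encoder parameters with or without cross-layer sharing."""
    if share_across_layers:
        # One set of layer weights is reused at every depth.
        return per_layer_params
    return per_layer_params * num_layers


# Illustrative sizes (assumptions, not figures taken from this page).
V, H, E, LAYERS = 30_000, 768, 128, 12
PER_LAYER = 12 * H * H  # rough estimate: 4*H^2 attention + 8*H^2 feed-forward

unfactorized = embedding_params(V, H)      # 23,040,000
factorized = embedding_params(V, H, E)     # 3,938,304
unshared = encoder_params(PER_LAYER, LAYERS, share_across_layers=False)
shared = encoder_params(PER_LAYER, LAYERS, share_across_layers=True)

print(f"embeddings: {unfactorized:,} -> {factorized:,}")
print(f"encoder:    {unshared:,} -> {shared:,}")
```

Under these assumptions, sharing reduces the number of stored parameters but not the number of layers executed per forward pass, so the memory saving is larger than any per-step compute saving.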
Abstract
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT (Devlin et al., 2019). Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large. The code and the pretrained models are available at https://github.com/google-research/ALBERT.
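The inter-sentence coherence loss mentioned in the abstract is described in the paper as sentence-order prediction (SOP): a binary task over two consecutive text segments, where the negative example is the same pair with its order swapped. The following is a minimal sketch of how such training pairs might be built, not the released implementation. The function name `make_sop_examples`, the label convention (1 = original order, 0 = swapped), and the 50/50 swap probability are assumptions made for illustration.

```python
import random
from typing import List, Tuple


def make_sop_examples(segments: List[str],
                      rng: random.Random) -> List[Tuple[str, str, int]]:
    """Build (first, second, label) triples from consecutive segments.

    label 1: the segments appear in their original order.
    label 0: the same two segments with their order swapped.
    """
    examples = []
    for a, b in zip(segments, segments[1:]):
        if rng.random() < 0.5:  # assumed 50/50 split between classes
            examples.append((a, b, 1))
        else:
            examples.append((b, a, 0))
    return examples


# Hypothetical document split into four consecutive segments.
segments = ["seg-0", "seg-1", "seg-2", "seg-3"]
examples = make_sop_examples(segments, random.Random(0))
for first, second, label in examples:
    print(label, first, "|", second)
```

Because both classes use segments from the same document, topic cues alone cannot separate them, so a classifier trained on these pairs has to rely on ordering and coherence.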
Citation Graph
References [62]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin - 2017
47 papers in library cite
Jacob Devlin, M. W. Chang, K. Lee, Kristina Toutanova - 2018
39 papers in library cite
Tomas Mikolov, Ilya Sutskever, K. Chen, G. S. Corrado, Jeffrey Dean - 2013
32 papers in library cite
Jeffrey Pennington, Richard Socher, Christopher D. Manning - 2014
31 papers in library cite
Yinhan Liu, M. Ott, N. Goyal, J. Du, M. Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov - 2019
17 papers in library cite
Alec Radford, Jeffrey Wu, Rewon Child, D. Luan, Dario Amodei, Ilya Sutskever - 2019
27 papers in library cite
M. E. Peters, M. Neumann, M. Iyyer, Matt Gardner, C. Clark, K. Lee, L. S. Zettlemoyer - 2018
27 papers in library cite
Alec Radford, K. Narasimhan, T. Salimans, Ilya Sutskever - 2018
23 papers in library cite
Quoc Le, Tomas Mikolov - 2014
13 papers in library cite
Zhilin Yang, Z. Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le - 2019
11 papers in library cite
Richard Socher, A. Perelygin, Jeffrey Wu, J. Chuang, C. Manning, A. Ng, Christopher Potts - 2013
24 papers in library cite
P. Rajpurkar, J. Zhang, K. Lopyrev, Percy Liang - 2016
37 papers in library cite
Dan Hendrycks, Kevin Gimpel - 2016
5 papers in library cite
A. Wang, A. Singh, J. Michael, F. Hill, Omer Levy, Samuel R. Bowman - 2018
26 papers in library cite
J. Howard, Sebastian Ruder - 2018
14 papers in library cite
A. Williams, Nikita Nangia, S. Bowman - 2018
19 papers in library cite
Z. Dai, Zhilin Yang, Yiming Yang, W. Cohen, J. Carbonell, Quoc Le, Ruslan Salakhutdinov - 2019
9 papers in library cite
Taku Kudo, John Richardson - 2018
3 papers in library cite
P. Rajpurkar, R. Jia, Percy Liang - 2018
14 papers in library cite
Yukun Zhu, R. Kiros, R. Zemel, Ruslan Salakhutdinov, R. Urtasun, Antonio Torralba, Sanja Fidler - 2015
18 papers in library cite
R. Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, R. Urtasun, Antonio Torralba, Sanja Fidler - 2015
23 papers in library cite
Ido Dagan, O. Glickman, Bernardo Magnini - 2005
19 papers in library cite
M. Shoeybi, M. Patwary, Raul Puri, P. Legresley, J. Casper, Bryan Catanzaro - 2019
3 papers in library cite
M. Joshi, Danqi Chen, Yinhan Liu, D. Weld, Luke Zettlemoyer, Omer Levy - 2019
5 papers in library cite
W. Dolan, Chris Brockett - 2005
9 papers in library cite
Hector J. Levesque, E. Davis, Leora Morgenstern - 2011
13 papers in library cite
A. M. Dai, Quoc V. Le - 2015
27 papers in library cite
Guokun Lai, Q. Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy - 2017
11 papers in library cite
B. Mccann, J. Bradbury, Caiming Xiong, Richard Socher - 2017
14 papers in library cite
F. Hill, Kyunghyun Cho, Anna Korhonen - 2016
12 papers in library cite
A. Baevski, Michael Auli - 2018
3 papers in library cite
E. Grave, Armand Joulin, M. Cisse, D. Grangier, Hervé Jégou - 2017
4 papers in library cite
Y. You, Jing Li, J. Hseu, X. Song, J. Demmel, Cho-Jui Hsieh - 2019
2 papers in library cite
Colin Raffel, Noam Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, Wei Li, P. J. Liu - 2019
17 papers in library cite
Christian Szegedy, S. Ioffe, Vincent Vanhoucke, A. A. Alemi - 2017
3 papers in library cite
Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever - 2019
5 papers in library cite
Alex Warstadt, A. Singh, S. Bowman - 2018
8 papers in library cite
T. Chen, B. Xu, Chiyuan Zhang, C. Guestrin - 2016
2 papers in library cite
I. Turc, M. Chang, K. Lee, Kristina Toutanova - 2019
2 papers in library cite
Noam Shazeer, Y. Cheng, Niki Parmar, D. Tran, Ashish Vaswani, P. Koanantakool, P. Hawkins, Honglak Lee, M. Hong, C. Young - 2018
4 papers in library cite
Xiang Li, S. Chen, X. Hu, Jian Yang - 2019
1 paper in library cites
Mostafa Dehghani, S. Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser - 2018
6 papers in library cite
L. Bentivogli, Peter Clark, Ido Dagan, D. Giampiccolo - 2009
7 papers in library cite
D. Giampiccolo, Bernardo Magnini, Ido Dagan, B. Dolan - 2007
7 papers in library cite
D. Cer, M. Diab, E. Agirre, I. L. Gazpio, L. Specia - 2017
6 papers in library cite
R. B. Haim, Ido Dagan, B. Dolan, L. Ferro, D. Giampiccolo, Bernardo Magnini, I. Szpektor - 2006
6 papers in library cite
Yacine Jernite, S. Bowman, D. Sontag - 2017
4 papers in library cite
Allen Nie, E. Bennett, N. Goodman - 2017
4 papers in library cite
K. Clark, M. Luong, U. Khandelwal, C. Manning, Quoc Le - 2019
3 papers in library cite
S. Zhang, H. Zhao, Yuwei Wu, Zhuosheng Zhang, Xi Zhou, Xiang Zhou - 2019
2 papers in library cite
Shaojie Bai, J. Zico Kolter, V. Koltun - 2019
2 papers in library cite
S. Iyer, N. Dandekar, K. Csernai - 2017
2 papers in library cite
T. Shen, T. Zhou, G. Long, J. Jiang, Chengqi Zhang - 2018
1 paper in library cites
B. Grosz, A. Joshi, S. Weinstein - 1995
1 paper in library cites
J. Hobbs - 1979
1 paper in library cites
M. Halliday, R. Hasan - 1976
1 paper in library cites
L. Gong, D. He, Zhuohan Li, T. Qin, Liwei Wang, T. Liu - 2019
1 paper in library cites
Z. Gan, Y. Pu, R. Henao, Chun-Liang Li, X. He, L. Carin - 2017
1 paper in library cites
J. Hao, Xing Wang, B. Yang, Longyue Wang, J. Zhang, Zhaopeng Tu - 2019
1 paper in library cites
S. Sun, Y. Cheng, Z. Gan, Jingjing Liu - 2019
1 paper in library cites
Wei Wang, B. Bi, Ming Yan, Chen Wu, Z. Bao, L. Peng, L. Si - 2019
1 paper in library cites
A. Gomez, M. Ren, R. Urtasun, R. Grosse - 2017
1 paper in library cites
Cited by 8 papers in your library
Cites 42 papers in your library
Read on November 19, 2025