2019

ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

Z. Lan, Mingda Chen, S. Goodman, Kevin Gimpel, P. Sharma, Radu Soricut

Cite Score: 84

AI summary

This paper introduces ALBERT, a lite BERT architecture that uses factorized embedding parameterization and cross-layer parameter sharing to reduce memory consumption and speed up training. Its best model achieves state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks with fewer parameters than BERT-large.

Main Contributions

  • Introduces factorized embedding parameterization to separate the hidden layer size from vocabulary embedding size.
  • Presents cross-layer parameter sharing to prevent parameter growth with network depth.
  • Achieves significant parameter reduction: an ALBERT configuration similar to BERT-large has 18x fewer parameters.
  • Introduces a self-supervised loss for sentence-order prediction (SOP) that focuses on inter-sentence coherence.
  • Establishes new state-of-the-art results on the GLUE, SQuAD, and RACE benchmarks.
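The two parameter-reduction ideas in the list above can be made concrete with a little arithmetic. The sketch below is illustrative only: the sizes `V`, `H`, `E`, the layer count, and the rough `12 * H * H` per-layer estimate (attention plus feed-forward weights, ignoring biases and layer norm) are assumptions chosen for the example, not figures taken from this page.

```python
from typing import Optional


def embedding_params(vocab_size: int, hidden_size: int,
                     embedding_size: Optional[int] = None) -> int:
    """Count embedding parameters.

    Unfactorized: a single V x H matrix.
    Factorized:   a V x E lookup table plus an E x H projection.
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(per_layer_params: int, num_layers: int,
                   share_across_layers: bool) -> int:
    """Count encoder parameters.

    With cross-layer sharing, one set of layer weights is reused at every
    depth, so the count no longer grows with num_layers.
    """
    if share_across_layers:
        return per_layer_params
    return per_layer_params * num_layers


# Hypothetical sizes, chosen only for illustration.
V, H, E, LAYERS = 30_000, 1_024, 128, 24
PER_LAYER = 12 * H * H  # rough estimate, an assumption of this sketch

print(embedding_params(V, H))                # 30720000
print(embedding_params(V, H, E))             # 3971072
print(encoder_params(PER_LAYER, LAYERS, False))  # 301989888
print(encoder_params(PER_LAYER, LAYERS, True))   # 12582912
```

Because E is much smaller than H, the factorized embedding costs V·E + E·H instead of V·H, and the hidden size can grow without inflating the vocabulary table.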

Abstract

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT (Devlin et al., 2019). Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large. The code and the pretrained models are available at https://github.com/google-research/ALBERT.
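The inter-sentence coherence loss mentioned in the abstract is the sentence-order prediction (SOP) objective. A minimal sketch of how SOP training pairs could be built is shown below, assuming that a positive example is two consecutive segments in their original order and a negative example is the same two segments swapped. The function name `make_sop_examples`, the 50/50 swap probability, and the label convention (1 = original order) are assumptions of this sketch, not details stated on this page.

```python
import random
from typing import List, Tuple


def make_sop_examples(segments: List[str],
                      rng: random.Random) -> List[Tuple[str, str, int]]:
    """Build (first, second, label) triples from consecutive segments.

    label 1: the pair appears in its original document order.
    label 0: the same two segments, with their order swapped.
    """
    examples = []
    for first, second in zip(segments, segments[1:]):
        if rng.random() < 0.5:
            examples.append((first, second, 1))  # keep original order
        else:
            examples.append((second, first, 0))  # swap to make a negative
    return examples


# Hypothetical toy document, split into segments.
doc = ["He opened the door.", "The room was dark.", "He found the switch."]
for first, second, label in make_sop_examples(doc, random.Random(0)):
    print(label, "|", first, "|", second)
```

Both segments always come from the same document, so the classifier cannot rely on topic differences and has to model the order of the text instead.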

References [62]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin - 2017

47 papers in library cite

Jacob Devlin, M. W. Chang, K. Lee, Kristina Toutanova - 2018

39 papers in library cite

Tomas Mikolov, Ilya Sutskever, K. Chen, G. S. Corrado, Jeffrey Dean - 2013

32 papers in library cite

Jeffrey Pennington, Richard Socher, Christopher D. Manning - 2014

31 papers in library cite

Yinhan Liu, M. Ott, N. Goyal, J. Du, M. Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov - 2019

17 papers in library cite

Alec Radford, Jeffrey Wu, Rewon Child, D. Luan, Dario Amodei, Ilya Sutskever - 2019

27 papers in library cite

M. E. Peters, M. Neumann, M. Iyyer, Matt Gardner, C. Clark, K. Lee, L. S. Zettlemoyer - 2018

27 papers in library cite

Alec Radford, K. Narasimhan, T. Salimans, Ilya Sutskever - 2018

23 papers in library cite

Quoc Le, Tomas Mikolov - 2014

13 papers in library cite

Zhilin Yang, Z. Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le - 2019

11 papers in library cite

Richard Socher, A. Perelygin, Jean Wu, J. Chuang, C. Manning, A. Ng, Christopher Potts - 2013

24 papers in library cite

P. Rajpurkar, J. Zhang, K. Lopyrev, Percy Liang - 2016

37 papers in library cite

Dan Hendrycks, Kevin Gimpel - 2016

5 papers in library cite

A. Wang, A. Singh, J. Michael, F. Hill, Omer Levy, Samuel R. Bowman - 2018

26 papers in library cite

J. Howard, Sebastian Ruder - 2018

14 papers in library cite

A. Williams, Nikita Nangia, S. Bowman - 2018

19 papers in library cite

Z. Dai, Zhilin Yang, Yiming Yang, W. Cohen, J. Carbonell, Quoc Le, Ruslan Salakhutdinov - 2019

9 papers in library cite

P. Rajpurkar, R. Jia, Percy Liang - 2018

14 papers in library cite

Yukun Zhu, R. Kiros, R. Zemel, Ruslan Salakhutdinov, R. Urtasun, Antonio Torralba, Sanja Fidler - 2015

18 papers in library cite

R. Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, R. Urtasun, Antonio Torralba, Sanja Fidler - 2015

23 papers in library cite

Ido Dagan, O. Glickman, Bernardo Magnini - 2005

19 papers in library cite

M. Shoeybi, M. Patwary, Raul Puri, P. Legresley, J. Casper, Bryan Catanzaro - 2019

3 papers in library cite

M. Joshi, Danqi Chen, Yinhan Liu, D. Weld, Luke Zettlemoyer, Omer Levy - 2019

5 papers in library cite

W. Dolan, Chris Brockett - 2005

9 papers in library cite

Hector J. Levesque, E. Davis, Leora Morgenstern - 2011

13 papers in library cite

A. M. Dai, Quoc V. Le - 2015

27 papers in library cite

Guokun Lai, Q. Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy - 2017

11 papers in library cite

B. McCann, J. Bradbury, Caiming Xiong, Richard Socher - 2017

14 papers in library cite

F. Hill, Kyunghyun Cho, Anna Korhonen - 2016

12 papers in library cite

A. Baevski, Michael Auli - 2018

3 papers in library cite

E. Grave, Armand Joulin, M. Cisse, D. Grangier, Hervé Jégou - 2017

4 papers in library cite

Y. You, Jing Li, J. Hseu, X. Song, J. Demmel, Cho-Jui Hsieh - 2019

2 papers in library cite

Colin Raffel, Noam Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, Wei Li, P. J. Liu - 2019

17 papers in library cite

Christian Szegedy, S. Ioffe, Vincent Vanhoucke, A. A. Alemi - 2017

3 papers in library cite

Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever - 2019

5 papers in library cite

Alex Warstadt, A. Singh, S. Bowman - 2018

8 papers in library cite

T. Chen, B. Xu, Chiyuan Zhang, C. Guestrin - 2016

2 papers in library cite

I. Turc, M. Chang, K. Lee, Kristina Toutanova - 2019

2 papers in library cite

Noam Shazeer, Y. Cheng, Niki Parmar, D. Tran, Ashish Vaswani, P. Koanantakool, P. Hawkins, HyoukJoong Lee, M. Hong, C. Young - 2018

4 papers in library cite

Xiang Li, S. Chen, X. Hu, Jian Yang - 2019

1 paper in library cites

Mostafa Dehghani, S. Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser - 2018

6 papers in library cite

L. Bentivogli, Peter Clark, Ido Dagan, D. Giampiccolo - 2009

7 papers in library cite

D. Giampiccolo, Bernardo Magnini, Ido Dagan, B. Dolan - 2007

7 papers in library cite

D. Cer, M. Diab, E. Agirre, I. Lopez-Gazpio, L. Specia - 2017

6 papers in library cite

R. Bar-Haim, Ido Dagan, B. Dolan, L. Ferro, D. Giampiccolo, Bernardo Magnini, I. Szpektor - 2006

6 papers in library cite

Yacine Jernite, S. Bowman, D. Sontag - 2017

4 papers in library cite

Allen Nie, E. Bennett, N. Goodman - 2017

4 papers in library cite

K. Clark, M. Luong, U. Khandelwal, C. Manning, Quoc Le - 2019

3 papers in library cite

S. Zhang, H. Zhao, Yuwei Wu, Zhuosheng Zhang, Xi Zhou, Xiang Zhou - 2019

2 papers in library cite

Shaojie Bai, J. Zico Kolter, V. Koltun - 2019

2 papers in library cite

S. Iyer, N. Dandekar, K. Csernai - 2017

2 papers in library cite

T. Shen, T. Zhou, G. Long, J. Jiang, Chengqi Zhang - 2018

1 paper in library cites

B. Grosz, A. Joshi, S. Weinstein - 1995

1 paper in library cites

J. Hobbs - 1979

1 paper in library cites

M. Halliday, R. Hasan - 1976

1 paper in library cites

L. Gong, D. He, Zhuohan Li, T. Qin, Liwei Wang, T. Liu - 2019

1 paper in library cites

Z. Gan, Y. Pu, R. Henao, Chunyuan Li, X. He, L. Carin - 2017

1 paper in library cites

J. Hao, Xing Wang, B. Yang, Longyue Wang, J. Zhang, Zhaopeng Tu - 2019

1 paper in library cites

S. Sun, Y. Cheng, Z. Gan, Jingjing Liu - 2019

1 paper in library cites

Wei Wang, B. Bi, Ming Yan, Chen Wu, Z. Bao, L. Peng, L. Si - 2019

1 paper in library cites

A. Gomez, M. Ren, R. Urtasun, R. Grosse - 2017

1 paper in library cites

Cited by 8 papers in your library

Cites 42 papers in your library

Read on November 19, 2025
