2020
Cite Score
81
AI summary
This paper empirically studies scaling laws for language model performance. It finds that cross-entropy loss follows power-law relationships with model size, dataset size, and training compute, and that larger models are more sample-efficient, so compute-optimal training uses very large models on a relatively modest amount of data and stops well before convergence.
Abstract
We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
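To make the power-law claim concrete, here is a minimal sketch of how such a trend could be checked: a relation of the form loss = c * N^(-alpha) becomes a straight line in log-log space, so the exponent can be estimated with ordinary least squares on the logarithms. The function name `fit_power_law`, the synthetic data, and the constants `TRUE_ALPHA` and `TRUE_C` are illustrative assumptions, not values or code from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**(-alpha) by least squares on (log x, log y).

    Returns (alpha, c). Hypothetical helper for illustration only.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope and intercept of the best-fit line in log-log space.
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log y = log c - alpha * log x, so alpha is the negated slope.
    return -slope, math.exp(intercept)


# Synthetic, noise-free data; these constants are assumed, not measured.
TRUE_ALPHA = 0.08
TRUE_C = 20.0
sizes = [10.0 ** k for k in range(5, 10)]  # hypothetical model sizes
losses = [TRUE_C * n ** (-TRUE_ALPHA) for n in sizes]

alpha, c = fit_power_law(sizes, losses)
print(f"alpha = {alpha:.4f}, c = {c:.2f}")
```

With real measurements the points would scatter around the line, and a fit like this is only meaningful over the range where the trend is actually observed to hold.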