Scaling Laws for Neural Language Models

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei

2020

AI summary

This paper empirically studies scaling laws for language model performance, revealing power-law relationships between loss and model size, dataset size, and compute. It shows that larger models are more sample-efficient, so compute-optimal training uses very large models on modest amounts of data.
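The power laws summarized here have the form L(X) = (X_c / X)^α, which is a straight line in log-log coordinates: log L = α log X_c − α log X. The exponent can therefore be estimated with ordinary least squares on the logged data. The following is a minimal Python sketch of that idea on noiseless synthetic data; the constants true_alpha and true_xc and the helper fit_power_law are hypothetical, and this is not the paper's own fitting procedure or data.

```python
import math


def fit_power_law(xs, losses):
    """Fit L = (x_c / x) ** alpha by least squares on log L versus log x.

    Returns (alpha, x_c). Assumes every x and loss is positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(l) for l in losses]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    slope = cov / var                      # slope = -alpha
    intercept = mean_y - slope * mean_x    # intercept = alpha * log(x_c)
    alpha = -slope
    x_c = math.exp(intercept / alpha)
    return alpha, x_c


# Hypothetical constants for illustration only (not measured values).
true_alpha, true_xc = 0.08, 1e13
sizes = [10 ** k for k in range(3, 10)]    # model sizes from 1e3 to 1e9
losses = [(true_xc / n) ** true_alpha for n in sizes]

alpha, x_c = fit_power_law(sizes, losses)
print(f"alpha ~ {alpha:.3f}, x_c ~ {x_c:.3g}")
```

With real, noisy measurements the same log-log regression gives an estimate rather than an exact recovery, and points far from the power-law regime (for example very small models) would distort the fit.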

Main Contributions

  • Identified empirical power-law scaling of language model loss with model size, dataset size, and compute, with some trends spanning more than seven orders of magnitude.
  • Showed that architectural details (network width/depth) have minimal impact on performance within a wide range.
  • Derived simple equations for how overfitting depends on model and dataset size, and how training speed depends on model size.
  • Showed that, because larger models are more sample-efficient, compute-optimal training uses very large models on relatively modest data and stops significantly before convergence.
  • Proposed a predictive framework for language model performance based on these scaling laws.
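The overfitting result can be made concrete with a two-variable loss of the form L(N, D) = [(N_c/N)^(α_N/α_D) + D_c/D]^α_D, which we understand to be the functional form the paper fits; verify it against the paper before relying on it. The Python sketch below uses hypothetical constants, not the paper's fitted values, and measures overfitting as the relative excess of L(N, D) over the data-unlimited loss L(N) = (N_c/N)^α_N. The function names loss_nd and loss_n are illustrative.

```python
def loss_nd(n_params, n_tokens, alpha_n, alpha_d, n_c, d_c):
    """Combined loss L(N, D) = [(N_c/N)^(alpha_N/alpha_D) + D_c/D]^alpha_D."""
    return ((n_c / n_params) ** (alpha_n / alpha_d) + d_c / n_tokens) ** alpha_d


def loss_n(n_params, alpha_n, n_c):
    """Data-unlimited limit L(N) = (N_c/N)^alpha_N."""
    return (n_c / n_params) ** alpha_n


# Hypothetical constants chosen only for illustration.
ALPHA_N, ALPHA_D, N_C, D_C = 0.08, 0.10, 1e13, 1e13

tokens = 1e9  # fixed dataset size
for n in (1e6, 1e7, 1e8, 1e9):
    gap = loss_nd(n, tokens, ALPHA_N, ALPHA_D, N_C, D_C) / loss_n(n, ALPHA_N, N_C) - 1
    print(f"N={n:.0e}: relative overfitting gap {gap:.4f}")
```

Algebraically the gap equals [1 + (D_c/D)(N/N_c)^(α_N/α_D)]^α_D − 1, so for a fixed dataset it grows with model size, and it vanishes as D grows without bound.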

Abstract

We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
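The compute-allocation result can be illustrated with the approximation the paper uses for training compute, C ≈ 6ND (about 6 FLOPs per non-embedding parameter per training token, counting forward and backward passes). The Python sketch below grows model size as a power of compute and spends the rest of the budget on tokens. The exponent 0.73 reflects our reading of the paper's reported N ∝ C^0.73 and should be checked against the paper; the reference run (ref_c, ref_n) and the function allocate are hypothetical.

```python
# Forward + backward approximation: C ~ 6 * N * D FLOPs.
FLOPS_PER_PARAM_TOKEN = 6.0


def allocate(compute, ref_compute, ref_params, exponent=0.73):
    """Scale model size as a power of compute from a hypothetical reference run,
    then spend the remaining budget on tokens via D = C / (6 N)."""
    n_params = ref_params * (compute / ref_compute) ** exponent
    n_tokens = compute / (FLOPS_PER_PARAM_TOKEN * n_params)
    return n_params, n_tokens


# Hypothetical reference point: 1e18 FLOPs with a 10M-parameter model.
ref_c, ref_n = 1e18, 1e7
for scale in (1, 10, 100, 1000):
    n, d = allocate(ref_c * scale, ref_c, ref_n)
    print(f"{scale:>5}x compute -> N={n:.2e}, D={d:.2e}")
```

In this sketch a 1000x larger budget grows the model by 1000^0.73 ≈ 155x but the token count by only 1000^0.27 ≈ 6.5x, which is the sense in which compute-optimal training uses very large models on relatively modest data.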
