2025

Pre-Training Under Infinite Compute

Konwoo Kim, Suhas Kotha, Percy Liang, Tatsunori Hashimoto

AI summary

This paper investigates language model pre-training with fixed data and unlimited compute. It proposes regularization and ensembling techniques that reach significantly lower loss asymptotes and improve data efficiency; distillation lets a student model 8x smaller retain most (83%) of the ensembling benefit, and the gains carry over to downstream benchmarks.
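Ensembling here means combining the next-token predictions of several independently trained models. A minimal sketch, assuming the ensemble's distribution is the arithmetic mean of the members' softmax probabilities (a common choice; the exact combination rule is not stated on this page). The helper names are hypothetical and the logits are toy values, not model outputs from the paper.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    # Subtract the max before exponentiating to avoid overflow.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_softmax(logits: Sequence[float]) -> List[float]:
    lse = logsumexp(logits)
    return [x - lse for x in logits]


def ensemble_log_probs(member_logits: Sequence[Sequence[float]]) -> List[float]:
    """Log of the mean of the members' next-token distributions (hypothetical helper)."""
    member_lps = [log_softmax(logits) for logits in member_logits]
    k = len(member_lps)
    vocab_size = len(member_lps[0])
    # log(mean_k p_k(v)) = logsumexp_k(log p_k(v)) - log K, for each vocabulary entry v.
    return [
        logsumexp([lp[v] for lp in member_lps]) - math.log(k)
        for v in range(vocab_size)
    ]


def nll(log_probs: Sequence[float], target: int) -> float:
    # Per-token cross-entropy loss for the observed next token.
    return -log_probs[target]


# Two toy members that disagree about the next token (true token index 0).
members = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
member_losses = [nll(log_softmax(m), 0) for m in members]
ens_loss = nll(ensemble_log_probs(members), 0)
print(f"mean member loss={sum(member_losses) / len(member_losses):.4f}, "
      f"ensemble loss={ens_loss:.4f}")
```

Because -log is convex, the loss of the probability-averaged ensemble is never higher than the members' average loss, which is one intuition for why adding members keeps lowering loss.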

Main Contributions

  • Demonstrates that existing data-constrained approaches (increasing epoch count and parameter count) eventually overfit.
  • Introduces properly tuned regularization, finding that the optimal weight decay is 30x larger than standard practice; the resulting loss decreases monotonically with parameter count, following a power law.
  • Shows that ensembling independently trained models achieves a significantly lower loss asymptote than regularized recipes.
  • Develops an intervention combining epoching, regularization, parameter scaling, and ensemble scaling, achieving a 5.17x data efficiency improvement over the baseline at 200M tokens.
  • Shows that an ensemble can be distilled into a student model 8x smaller that retains 83% of the ensembling benefit.
  • Achieves a 9% improvement on pre-training evaluations and a 17.5x data efficiency improvement over continued pre-training on math mid-training data.
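The power-law and asymptote claims in the contributions above can be illustrated by fitting L(N) = E + A * N^(-alpha) to losses measured at several parameter counts N and reading off E as the estimated best achievable loss. A minimal sketch, assuming this three-parameter form and a simple fitting procedure (grid search over alpha with closed-form least squares for E and A); the paper's actual fitting method may differ, and the data below are synthetic. The `data_efficiency` helper is a hypothetical definition: it inverts an assumed baseline data scaling law L(D) = E_D + B * D^(-beta) to ask how many tokens the baseline would need to reach a given loss.

```python
from typing import Sequence, Tuple


def fit_power_law_with_asymptote(
    xs: Sequence[float], ys: Sequence[float]
) -> Tuple[float, float, float]:
    """Fit y = E + A * x**(-alpha); returns (E, A, alpha).

    For each candidate alpha on a grid, y is linear in z = x**(-alpha), so
    E (intercept) and A (slope) have a closed-form least-squares solution.
    """
    n = len(xs)
    y_mean = sum(ys) / n
    best = None
    for i in range(1, 2001):
        alpha = i / 1000  # grid over alpha in (0, 2]
        zs = [x ** (-alpha) for x in xs]
        z_mean = sum(zs) / n
        szz = sum((z - z_mean) ** 2 for z in zs)
        if szz == 0:
            continue
        a = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys)) / szz
        e = y_mean - a * z_mean
        sse = sum((e + a * z - y) ** 2 for z, y in zip(zs, ys))
        if best is None or sse < best[0]:
            best = (sse, e, a, alpha)
    _, e, a, alpha = best
    return e, a, alpha


def data_efficiency(target_loss: float, tokens_used: float,
                    e_d: float, b: float, beta: float) -> float:
    """Hypothetical: tokens a baseline with L(D) = e_d + b * D**(-beta) needs to
    reach target_loss, divided by the tokens the intervention actually used."""
    if target_loss <= e_d:
        raise ValueError("the baseline never reaches this loss")
    d_needed = (b / (target_loss - e_d)) ** (1.0 / beta)
    return d_needed / tokens_used


# Synthetic losses generated from E=2.0, A=10.0, alpha=0.3 (not paper data).
params = [150e6, 300e6, 600e6, 1.2e9]
losses = [2.0 + 10.0 * n ** (-0.3) for n in params]
E, A, alpha = fit_power_law_with_asymptote(params, losses)
print(f"asymptote E={E:.4f}, A={A:.3f}, alpha={alpha:.3f}")
print(data_efficiency(2.1, 2e5, e_d=2.0, b=100.0, beta=0.5))
```

On real, noisy losses E and alpha trade off strongly, so estimates of the asymptote from a handful of model sizes should be treated as approximate.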

Abstract

Since compute grows much faster than web text available for language model pre-training, we ask how one should approach pre-training under fixed data and no compute constraints. We first show that existing data-constrained approaches of increasing epoch count and parameter count eventually overfit, and we significantly improve upon such recipes by properly tuning regularization, finding that the optimal weight decay is 30x larger than standard practice. Since our regularized recipe monotonically decreases loss following a simple power law in parameter count, we estimate its best possible performance via the asymptote of its scaling law rather than the performance at a fixed compute budget. We then identify that ensembling independently trained models achieves a significantly lower loss asymptote than the regularized recipe. Our best intervention combining epoching, regularization, parameter scaling, and ensemble scaling achieves an asymptote at 200M tokens using 5.17x less data than our baseline, and our data scaling laws predict that this improvement persists at higher token budgets. We find that our data efficiency gains can be realized at much smaller parameter counts as we can distill an ensemble into a student model that is 8x smaller and retains 83% of the ensembling benefit. Finally, our interventions designed for validation loss generalize to downstream benchmarks, achieving a 9% improvement for pre-training evals and a 17.5x data efficiency improvement over continued pre-training on math mid-training data. Our results show that simple algorithmic improvements can enable significantly more data-efficient pre-training in a compute-rich future.
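The distillation result can be read as a ratio: with L_single the loss of a single (non-ensembled) model, L_ens the ensemble's loss, and L_student the distilled student's loss, the share of the ensembling benefit retained is (L_single - L_student) / (L_single - L_ens). A minimal sketch, assuming that definition of "benefit" and using standard soft-label distillation (student cross-entropy against the teacher ensemble's averaged distribution) as a stand-in objective; the paper's exact distillation procedure is not described on this page, and all numbers are illustrative.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Max-shifted log-sum-exp for numerical stability.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def soft_label_loss(teacher_probs: Sequence[float],
                    student_logits: Sequence[float]) -> float:
    """Cross-entropy of the student's distribution against the teacher's soft labels.

    Equals KL(teacher || student) plus the teacher's entropy, so minimizing it
    over the student pulls the student's distribution toward the teacher's.
    """
    student_lp = log_softmax(student_logits)
    return -sum(p * lp for p, lp in zip(teacher_probs, student_lp) if p > 0)


def retained_benefit(loss_single: float, loss_ensemble: float,
                     loss_student: float) -> float:
    """Fraction of the single-model-to-ensemble loss reduction kept by the student.

    Assumes the reference point is a single non-ensembled model (hypothetical).
    """
    return (loss_single - loss_student) / (loss_single - loss_ensemble)


teacher = [0.6, 0.3, 0.1]                 # e.g. an ensemble's averaged next-token distribution
matched = [math.log(p) for p in teacher]  # student logits that reproduce the teacher
uniform = [0.0, 0.0, 0.0]                 # a student that ignores the teacher
print(soft_label_loss(teacher, matched), soft_label_loss(teacher, uniform))
print(retained_benefit(3.0, 2.5, 2.6))    # illustrative losses only
```

Under this reading, "retains 83% of the ensembling benefit" means the student closes 83% of the loss gap between a single model and the full ensemble.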

Tags

ICLR2026