2025

DataComp-LM: In Search of the Next Generation of Training Sets for Language Models

Jeffrey Li, A. Fang, G. Smyrnis, M. Ivgi, M. Jordan, S. Gadre, H. Bansal, Etash Guha, S. Keh, K. Arora, S. Garg, R. Xin, Niklas Muennighoff, R. Heckel, J. Mercat, Mark Chen, Suchin Gururangan, M. Wortsman, A. Albalak, Y. Bitton, Marianna Nezhurina, A. Abbas, C. Y. Hsieh, D. Ghosh, J. Gardner, M. Kilian, Haowei Zhang, R. Shao, S. Pratt, S. Sanyal, G. Ilharco, G. Daras, K. Marathe, A. Gokaslan, J. Zhang, K. Chandu, T. N. Nguyen, I. Vasiljevic, S. Kakade, S. Song, S. Sanghavi, F. Faghri, S. Oh, Luke Zettlemoyer, K. Lo, Alaaeldin El-Nouby, H. Pouransari, A. Toshev, Shijie Wang, D. Groeneveld, L. Soldaini, P. W. Koh, Jenia Jitsev, T. Kollar, Alexandros G. Dimakis, Y. Carmon, A. Dave, Ludwig Schmidt, Vaishaal Shankar


