2025
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
M. Marek, S. Lotfi, A. Somasundaram, A. G. Wilson, M. Goldblum
Citation Graph
References [0]
No references match the current filters.
Cited by
1
papers in your library
Cites
0
Add to reading list
Notes
Tags
Paper Aliases
No aliases