2025

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful

M. Marek, S. Lotfi, A. Somasundaram, A. G. Wilson, M. Goldblum

citations

Citation Graph

Loading graph...

References [0]

Sort:
Filter:

No references match the current filters.

Cited by

1

papers in your library

Cites

0

papers in your library

Notes

Tags

Paper Aliases

No aliases