2016

Understanding Deep Learning Requires Rethinking Generalization

Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals

Cite Score: 76

AI summary

This paper challenges traditional views of generalization by showing that large deep neural networks, such as Inception V3 and AlexNet trained on CIFAR-10 and ImageNet, can perfectly fit random labels and even random noise inputs. This suggests that current complexity measures and explicit regularization do not adequately explain why these networks generalize well.
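The randomization test behind these findings is simple to set up: keep the inputs and replace the labels with random ones, or keep the labels and replace the inputs with shuffled pixels or Gaussian noise, then check whether the model still drives training error to zero. The sketch below is illustrative only and rests on assumptions not taken from the paper: synthetic Gaussian data instead of CIFAR-10 or ImageNet, a hypothetical stand-in model (`fit_random_relu_features`, random ReLU features with a minimum-norm least-squares readout) instead of a trained convolutional network, and Gaussian noise matched to the data's overall mean and standard deviation. All helper names are hypothetical.

```python
import numpy as np

def corrupt_labels(y, num_classes, prob, rng):
    """Replace each label, independently with probability `prob`, by a uniformly random class."""
    y = y.copy()
    mask = rng.random(len(y)) < prob
    y[mask] = rng.integers(0, num_classes, size=int(mask.sum()))
    return y

def random_pixels(X, rng):
    """Apply an independent random permutation of the features (pixels) to every example."""
    return np.stack([x[rng.permutation(x.size)] for x in X])

def gaussian_inputs(X, rng):
    """Replace the inputs by Gaussian noise matching the data's overall mean and std (an assumption)."""
    return rng.normal(X.mean(), X.std(), size=X.shape)

def fit_random_relu_features(X, y, num_classes, width, rng):
    """Hypothetical stand-in for a large network: fixed random ReLU features
    plus a minimum-norm least-squares readout onto one-hot targets."""
    W = rng.normal(size=(X.shape[1], width)) / np.sqrt(X.shape[1])
    features = np.maximum(X @ W, 0.0)
    targets = np.eye(num_classes)[y]
    coef, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return lambda Z: np.argmax(np.maximum(Z @ W, 0.0) @ coef, axis=1)

def accuracy(predict, X, y):
    """Fraction of examples whose predicted class matches the label."""
    return float(np.mean(predict(X) == y))

rng = np.random.default_rng(0)
n, d, k, width = 200, 64, 10, 1000                      # width > n: more features than points
X = rng.normal(size=(n, d))
y_true = np.argmax(X @ rng.normal(size=(d, k)), axis=1)  # synthetic "true" labels
y_rand = corrupt_labels(y_true, k, prob=1.0, rng=rng)    # every label redrawn at random

for name, inputs, labels in [
    ("true labels", X, y_true),
    ("random labels", X, y_rand),
    ("random pixels", random_pixels(X, rng), y_true),
    ("gaussian inputs", gaussian_inputs(X, rng), y_true),
]:
    model = fit_random_relu_features(inputs, labels, k, width, rng)
    print(name, "train accuracy:", accuracy(model, inputs, labels))
```

Because the stand-in has more features (1000) than training points (200), it should fit every variant to 100% training accuracy. For the random-label variant, accuracy on fresh examples with independently drawn labels is at chance (1/k) by construction, so the train/test gap is as large as it can be.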

Main Contributions

  • Deep neural networks can reach zero training error on completely random labels, and even on random pixels, with or without explicit regularization.
  • Traditional generalization theories (VC-dimension, Rademacher complexity, uniform stability) fail to explain why neural networks generalize well in practice.
  • Explicit regularization (weight decay, dropout, data augmentation) can improve generalization but is neither necessary nor, by itself, sufficient for controlling generalization error.
  • A theoretical construction shows that a simple depth-two ReLU network has perfect finite-sample expressivity once its number of parameters exceeds the number of data points: 2n + d weights suffice to fit any labeling of n points in d dimensions.
  • Implicit regularization, such as that provided by SGD, is suggested as a possible factor in generalization; for linear models, SGD initialized at zero converges to the minimum ℓ2-norm solution.
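The minimum-norm claim for linear models follows from a short argument: each SGD update on the squared loss adds a multiple of a data point x_i to w, so starting from w = 0 the iterates stay in the row span of X, and the only interpolating solution in that span is w = X^T (X X^T)^{-1} y, the minimum ℓ2-norm solution. The sketch below checks this numerically under assumptions of its own (squared loss, synthetic Gaussian data with more parameters than points, a hand-picked step size, and a hypothetical helper `sgd_least_squares`); it is not the paper's experimental code.

```python
import numpy as np

def sgd_least_squares(X, y, lr, epochs, rng):
    """Plain SGD on 0.5 * (x_i . w - y_i)^2, one example at a time, starting from w = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(X.shape[0]):
            residual = X[i] @ w - y[i]
            w -= lr * residual * X[i]  # each update is a multiple of a data point
    return w

rng = np.random.default_rng(0)
n, d = 20, 100                         # more parameters than data points
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w_sgd = sgd_least_squares(X, y, lr=0.005, epochs=500, rng=rng)
w_min_norm = np.linalg.pinv(X) @ y     # equals X^T (X X^T)^{-1} y when X has full row rank

print("max training residual:", np.max(np.abs(X @ w_sgd - y)))
print("distance to min-norm solution:", np.linalg.norm(w_sgd - w_min_norm))
```

The same argument explains why the zero initialization matters: from a starting point w0 outside the row span of X, SGD would instead converge to the interpolating solution closest to w0.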

Abstract

Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth-two neural networks already have perfect finite-sample expressivity as soon as the number of parameters exceeds the number of data points, as it usually does in practice. We interpret our experimental findings by comparison with traditional models.
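The expressivity result is constructive. The standard argument builds c(x) = Σ_j w_j max(⟨a, x⟩ - b_j, 0): pick a direction a whose projections z_i = ⟨a, x_i⟩ are distinct, sort them, interleave biases b_1 < z_(1) < b_2 < z_(2) < … < b_n < z_(n), and note that the n × n matrix A_ij = max(z_(i) - b_j, 0) is lower triangular with a positive diagonal, so A w = y has a solution for any targets y. That uses d + n + n = 2n + d parameters. The sketch below follows this argument under its own choices (a random Gaussian a, midpoint biases, `np.linalg.solve`); the helper names `fit_depth_two_relu` and `predict` are hypothetical, not the paper's code.

```python
import numpy as np

def fit_depth_two_relu(X, y, rng):
    """Build c(x) = sum_j w_j * relu(a . x - b_j) that interpolates (X, y) exactly.
    Parameters: a (d values), b (n values), w (n values), i.e. 2n + d in total."""
    n, d = X.shape
    a = rng.normal(size=d)                 # random direction: projections distinct almost surely
    z = X @ a
    order = np.argsort(z)
    z_sorted = z[order]
    gaps = np.diff(z_sorted)
    if np.any(gaps <= 0):
        raise ValueError("projections are not distinct; resample a")
    # Interleave biases: b[0] < z_sorted[0] < b[1] < z_sorted[1] < ... (midpoints between neighbours)
    b = np.empty(n)
    b[0] = z_sorted[0] - 1.0
    b[1:] = z_sorted[:-1] + gaps / 2
    # A[i, j] = relu(z_sorted[i] - b[j]) is lower triangular with positive diagonal, hence invertible
    A = np.maximum(z_sorted[:, None] - b[None, :], 0.0)
    w = np.linalg.solve(A, y[order])
    return a, b, w

def predict(X, a, b, w):
    """Evaluate the depth-two ReLU network on every row of X."""
    return np.maximum((X @ a)[:, None] - b[None, :], 0.0) @ w

rng = np.random.default_rng(0)
n, d = 20, 10
X = rng.normal(size=(n, d))
y = rng.normal(size=n)                     # arbitrary targets, e.g. random labels
a, b, w = fit_depth_two_relu(X, y, rng)
print("max fitting error:", np.max(np.abs(predict(X, a, b, w) - y)))
print("parameter count:", a.size + b.size + w.size)  # 2n + d = 50
```

Because y is arbitrary, the same network fits random labels just as easily as true ones.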

References [32]

K. He, X. Zhang, S. Ren, J. Sun - 2016

A. Krizhevsky, I. Sutskever, G. E. Hinton - 2012

S. Ioffe, C. Szegedy - 2015

N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, R. R. Salakhutdinov - 2014

Z. Wojna - 2015

A. Krizhevsky - 2009

M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng - 2015

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein - 2014

V. N. Vapnik - 1998

A. Choromanska, M. Henaff, M. Mathieu, G. Ben Arous, Y. LeCun - 2015

G. Cybenko - 1988

O. Delalleau, Y. Bengio - 2011

B. Schölkopf, R. Herbrich, A. J. Smola - 2001

H. N. Mhaskar - 1993

M. Telgarsky - 2016

N. Cohen, A. Shashua - 2016

H. Mhaskar, T. A. Poggio - 2016

T. Poggio, R. Rifkin, S. Mukherjee, P. Niyogi - 2004

J. Lin, R. Camoriano, L. Rosasco - 2016

B. Neyshabur, R. Tomioka, N. Srebro - 2014

S. Shalev-Shwartz, O. Shamir, N. Srebro, K. Sridharan - 2010

A. Coates, A. Y. Ng - 2012

B. Neyshabur, R. Tomioka, N. Srebro - 2015

Y. Yao, L. Rosasco, A. Caponnetto - 2007

R. Livni, S. Shalev-Shwartz, O. Shamir - 2014

P. L. Bartlett, S. Mendelson - 2003

E. Edgington, P. Onghena - 2007

O. Bousquet, A. Elisseeff - 2002

R. Eldan, O. Shamir - 2016

M. Hardt, B. Recht, Y. Singer - 2016
