2015

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

AI summary

This paper introduces batch normalization, a technique that accelerates the training of deep neural networks by reducing internal covariate shift. An ensemble of batch-normalized networks reaches 4.82% top-5 test error on ImageNet classification, improving on the best previously published result.

Main Contributions

  • Introduces batch normalization, a technique that normalizes layer inputs over each training mini-batch and makes that normalization part of the model architecture.
  • Demonstrates that batch normalization allows for the use of higher learning rates and reduces the need for careful initialization.
  • Shows that batch normalization can eliminate the need for Dropout in some cases.
  • Improves on the best published ImageNet classification result, reaching a top-5 test error of 4.82% with an ensemble of batch-normalized networks.
  • Provides an algorithm for constructing, training, and performing inference with batch-normalized networks.
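
The difference between training and inference mentioned in the last point can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name BatchNorm1D, the eps and momentum defaults, and the use of an exponential moving average for the inference statistics are all choices made here for illustration. During training the layer normalizes with the statistics of the current mini-batch; at inference it uses fixed statistics accumulated during training, so the output for one example no longer depends on the other examples in the batch.

```python
import numpy as np


class BatchNorm1D:
    """Illustrative batch normalization for inputs of shape (batch, features)."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        # gamma (scale) and beta (shift) are the learnable parameters.
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        self.eps = eps            # small constant for numerical stability
        self.momentum = momentum  # assumption: moving-average update rate
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)

    def forward(self, x, training):
        if training:
            # Statistics of the current mini-batch, one value per feature.
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            # Track population statistics for later use at inference.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Inference: fixed statistics, independent of the batch contents.
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta


# Usage: normalize a mini-batch whose features are far from zero mean.
rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(64, 4))
layer = BatchNorm1D(num_features=4)
train_out = layer.forward(batch, training=True)
eval_out = layer.forward(batch, training=False)
```

The sketch covers only the forward pass. In a real network gamma and beta would be updated by backpropagation along with the other parameters.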

Abstract

Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
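
For reference, the per-mini-batch normalization described in the abstract is commonly written as follows. The symbols are not defined on this page and are introduced here as assumptions: x_1, ..., x_m are the values of one activation over a mini-batch of size m, \epsilon is a small constant for numerical stability, and \gamma and \beta are learned scale and shift parameters.

```latex
\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i
\qquad
\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2

\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}
\qquad
y_i = \gamma \hat{x}_i + \beta
```

Because \gamma and \beta are learned, the network can undo the normalization where that helps (for example, \gamma = \sqrt{\sigma_B^2 + \epsilon} and \beta = \mu_B recover the original activations).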
