Year: 2015
Cite Score: 98
AI summary
This paper introduces Batch Normalization, a technique that accelerates the training of deep neural networks by reducing internal covariate shift. Applied to a state-of-the-art image classification model, it matches the original accuracy with 14 times fewer training steps, and an ensemble of batch-normalized networks reaches 4.82% top-5 test error on ImageNet classification.
Abstract
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
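The per-mini-batch normalization described in the abstract can be sketched as follows. This is a minimal, training-time illustration in plain Python, not the authors' implementation. The learned per-feature scale and shift (`gamma`, `beta`) and the stability constant `eps` are not mentioned in the abstract; they are included here as assumptions about how the normalized values are typically parameterized, and the default `eps` value is arbitrary. The function name `batch_norm_train` is hypothetical, and inference-time statistics are omitted.

```python
import math


def batch_norm_train(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over a mini-batch, then scale and shift.

    batch: list of examples, each a list of feature activations.
    gamma, beta: per-feature scale and shift (assumed learned parameters).
    eps: small constant added to the variance for numerical stability
         (value chosen for illustration only).
    """
    m = len(batch)
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in range(m)]
    for k in range(n_features):
        column = [example[k] for example in batch]
        mean = sum(column) / m
        # Biased variance, computed over this mini-batch only.
        var = sum((v - mean) ** 2 for v in column) / m
        inv_std = 1.0 / math.sqrt(var + eps)
        for i, v in enumerate(column):
            out[i][k] = gamma[k] * (v - mean) * inv_std + beta[k]
    return out


# Two features on very different scales end up with comparable statistics.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm_train(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` set to 1 and `beta` set to 0, each feature column of `normalized` has approximately zero mean and unit variance across the mini-batch, regardless of the scale of the raw inputs. That is the property the abstract credits with permitting higher learning rates and less careful initialization.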
Citation Graph
Cited by: 18 papers in your library
Cites: 15 papers in your library
Read: July 19, 2025