2016

Layer Normalization

Jimmy Lei Ba, R. Kiros, Geoffrey E. Hinton

Cite Score

89

AI summary

This paper introduces layer normalization, a normalization method for neural networks that computes its statistics from the summed inputs to all neurons in a layer on a single training case. The method improves training speed and generalization performance for RNN models.
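The statistics described in the summary can be written out for a layer l with H hidden units, summed inputs a_i^l, a per-neuron gain vector g and bias vector b, and a non-linearity f. This is our rendering of the computation the abstract describes: the symbol names are assumptions, and the small constant usually added inside the square root for numerical stability is omitted. Unlike batch normalization, the sums run over the H units of the layer rather than over training cases, so all units in the layer share one mean and one standard deviation for a given case.

```latex
\mu^{l} = \frac{1}{H}\sum_{i=1}^{H} a_i^{l},
\qquad
\sigma^{l} = \sqrt{\frac{1}{H}\sum_{i=1}^{H}\bigl(a_i^{l} - \mu^{l}\bigr)^{2}},
\qquad
\mathbf{h}^{l} = f\!\left(\frac{\mathbf{g}}{\sigma^{l}} \odot \bigl(\mathbf{a}^{l} - \mu^{l}\bigr) + \mathbf{b}\right)
```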

Main Contributions

  • Introduces Layer Normalization, a novel normalization technique.
  • Layer Normalization computes normalization statistics from summed inputs within a layer on a single training case.
  • Layer Normalization is effective for stabilizing hidden state dynamics in RNNs.
  • Layer Normalization substantially reduces training time compared with previously published techniques.
  • Demonstrates improved generalization performance of Layer Normalization on RNN models.
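The list above credits layer normalization with stabilizing hidden-state dynamics in RNNs. Below is a minimal pure-Python sketch of a layer-normalized tanh RNN, assuming a vanilla (non-gated) recurrence whose summed inputs at step t are W_hh·h_{t-1} + W_xh·x_t. The names `ln_rnn`, `W_xh`, `W_hh`, `scaled`, and the `eps` stabilizer are hypothetical, and this is an illustration rather than the paper's implementation. Statistics are recomputed at every time step, while the gain and bias are shared across steps.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per hidden unit


def matvec(W: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def layer_norm(a: Vector, gain: Vector, bias: Vector, eps: float) -> Vector:
    """Normalize one time step's summed inputs across the H hidden units."""
    H = len(a)
    mu = sum(a) / H
    sigma = math.sqrt(sum((x - mu) ** 2 for x in a) / H + eps)  # eps: assumed stabilizer
    return [g * (x - mu) / sigma + b for x, g, b in zip(a, gain, bias)]


def ln_rnn(xs: List[Vector], h0: Vector, W_xh: Matrix, W_hh: Matrix,
           gain: Vector, bias: Vector, eps: float = 1e-5) -> List[Vector]:
    """Hypothetical tanh RNN whose summed inputs are layer-normalized at every step."""
    h, hs = h0, []
    for x in xs:
        # Summed inputs at step t: recurrent contribution plus input contribution.
        a = [r + i for r, i in zip(matvec(W_hh, h), matvec(W_xh, x))]
        # Fresh mean/std for this step; gain and bias are shared over time.
        h = [math.tanh(z) for z in layer_norm(a, gain, bias, eps)]
        hs.append(h)
    return hs


def scaled(W: Matrix, c: float) -> Matrix:
    return [[c * w for w in row] for row in W]


# Illustrative (made-up) weights and inputs: 2 input features, 3 hidden units.
W_xh = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
W_hh = [[0.2, -0.1, 0.0], [0.0, 0.3, 0.1], [0.1, 0.0, -0.2]]
xs = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
h0, g, b = [0.0] * 3, [1.0] * 3, [0.0] * 3

hs = ln_rnn(xs, h0, W_xh, W_hh, g, b)
hs_big = ln_rnn(xs, h0, scaled(W_xh, 10.0), scaled(W_hh, 10.0), g, b)
# Largest difference between the two hidden-state trajectories (small, due to eps).
print(max(abs(p - q) for h, k in zip(hs, hs_big) for p, q in zip(h, k)))
```

Scaling both weight matrices by a positive constant leaves the hidden states essentially unchanged (exactly so when `eps` is 0), because the constant cancels in (a - mean) / std. This invariance to re-scaling of the summed inputs is one way to see why the normalized recurrence is less prone to drifting summed inputs over many steps.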

Abstract

Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to that neuron on each training case. This significantly reduces the training time in feed-forward neural networks. However, the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent neural networks. In this paper, we transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. Like batch normalization, we also give each neuron its own adaptive bias and gain which are applied after the normalization but before the non-linearity. Unlike batch normalization, layer normalization performs exactly the same computation at training and test times. It is also straightforward to apply to recurrent neural networks by computing the normalization statistics separately at each time step. Layer normalization is very effective at stabilizing the hidden state dynamics in recurrent networks. Empirically, we show that layer normalization can substantially reduce the training time compared with previously published techniques.
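The abstract contrasts batch normalization (statistics for each neuron over a mini-batch) with layer normalization (statistics over all neurons of a layer for one training case). The following is a minimal pure-Python sketch of that contrast, not the authors' code. The function names `layer_norm` and `batch_norm_train`, the list-of-lists layout (one row per training case), and the `eps` stabilizer are assumptions. Only the training-time form of batch normalization is shown; the running averages it relies on at test time are omitted.

```python
import math
from typing import List

Vector = List[float]  # summed inputs a_1..a_H for one training case
Batch = List[Vector]  # one row per training case in a mini-batch


def layer_norm(a: Vector, gain: Vector, bias: Vector, eps: float = 1e-5) -> Vector:
    """Layer norm: mean/variance over all H summed inputs of ONE training case."""
    H = len(a)
    mu = sum(a) / H
    var = sum((x - mu) ** 2 for x in a) / H
    sigma = math.sqrt(var + eps)  # eps is an assumed stabilizer, not from the abstract
    # Per-neuron adaptive gain and bias, applied after normalization
    # (a non-linearity would then be applied to the result).
    return [g * (x - mu) / sigma + b for x, g, b in zip(a, gain, bias)]


def batch_norm_train(batch: Batch, gain: Vector, bias: Vector, eps: float = 1e-5) -> Batch:
    """Training-time batch norm: mean/variance of each neuron over the mini-batch."""
    n, H = len(batch), len(batch[0])
    mus = [sum(row[j] for row in batch) / n for j in range(H)]
    sigmas = [
        math.sqrt(sum((row[j] - mus[j]) ** 2 for row in batch) / n + eps)
        for j in range(H)
    ]
    return [
        [gain[j] * (row[j] - mus[j]) / sigmas[j] + bias[j] for j in range(H)]
        for row in batch
    ]


ones, zeros = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
case = [1.0, 2.0, 3.0]

# Layer norm depends only on this case, whatever else is in the mini-batch,
# so the same call can be used at training and test time.
print(layer_norm(case, ones, zeros))

# With a mini-batch of size 1, each neuron's input equals its batch mean,
# so batch norm maps every summed input to its bias.
print(batch_norm_train([case], ones, zeros))  # [[0.0, 0.0, 0.0]]
```

The size-1 case makes the mini-batch dependence mentioned in the abstract concrete: `batch_norm_train` erases all information about the input, while `layer_norm` still returns a zero-mean, unit-variance vector for the same case.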



Paper Aliases

Layernorm