2016

Bridging Nonlinearities and Stochastic Regularizers With Gaussian Error Linear Units

Dan Hendrycks, Kevin Gimpel

Cite Score: 81

AI summary

This paper introduces the Gaussian Error Linear Unit (GELU), a novel neural network activation function inspired by stochastic regularizers like dropout and zoneout. GELU is evaluated on MNIST and CIFAR-10, achieving performance improvements over ReLU and ELU activations.

Main Contributions

  • Introduces the Gaussian Error Linear Unit (GELU) activation function, bridging nonlinearities and stochastic regularizers.
  • Proposes a new probabilistic understanding of nonlinearities based on the connection between GELU and stochastic regularizers.
  • Empirically evaluates GELU against ReLU and ELU on MNIST and CIFAR-10 datasets.
  • Demonstrates that GELU achieves performance improvements across all tasks compared to ReLU and ELU.
  • Shows that GELU surpasses ELU and ReLU for both shallow and deep convolutional neural networks.
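
For reference, here is a minimal sketch of the three activations compared above. It assumes the standard definitions: ReLU(x) = max(0, x); ELU(x) = x for x > 0 and α(eˣ − 1) otherwise, with α = 1 as an assumed default; and GELU(x) = x·Φ(x), where Φ is the standard normal CDF. The function names are illustrative and are not taken from any library.

```python
import math

def std_normal_cdf(x: float) -> float:
    """Phi(x), the standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x: float) -> float:
    # Gates the input by its sign: identity for x > 0, zero otherwise
    return max(0.0, x)

def elu(x: float, alpha: float = 1.0) -> float:
    # alpha = 1.0 is an assumed default; alpha is a tunable hyperparameter
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def gelu(x: float) -> float:
    # Weights the input by Phi(x) instead of gating it by its sign
    return x * std_normal_cdf(x)

if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):+.4f}  elu={elu(x):+.4f}  gelu={gelu(x):+.4f}")
```

GELU is smooth and closely tracks ReLU for large |x|. Unlike ReLU, it takes small negative values for moderately negative inputs rather than clipping them to exactly zero.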

Abstract

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU nonlinearity is the expected transformation of a stochastic regularizer which randomly applies the identity or zero map, combining the intuitions of dropout and zoneout while respecting neuron values. This connection suggests a new probabilistic understanding of nonlinearities. We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all tasks.
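
The abstract's central claim, that GELU is the expected transformation of a stochastic zero-or-identity map, can be checked numerically. In the paper's construction, the input x is multiplied by a mask m ~ Bernoulli(Φ(x)). The identity map is therefore applied with probability Φ(x) and the zero map otherwise, so the expectation is Φ(x)·x + (1 − Φ(x))·0 = x·Φ(x). The Monte Carlo sketch below compares that expectation with the closed form. The sample count, the seed, and the function names are arbitrary, illustrative choices.

```python
import math
import random

def std_normal_cdf(x: float) -> float:
    # Phi(x): probability that a standard normal variable is <= x
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu(x: float) -> float:
    # Deterministic GELU: the expected value of the stochastic map below
    return x * std_normal_cdf(x)

def stochastic_zero_or_identity(x: float, rng: random.Random) -> float:
    # Keep x with probability Phi(x) (identity map), otherwise output 0 (zero map)
    keep = rng.random() < std_normal_cdf(x)
    return x if keep else 0.0

def monte_carlo_mean(x: float, samples: int = 100_000, seed: int = 0) -> float:
    # Average many draws of the stochastic map to estimate its expectation
    rng = random.Random(seed)
    total = sum(stochastic_zero_or_identity(x, rng) for _ in range(samples))
    return total / samples

if __name__ == "__main__":
    for x in (-1.5, -0.5, 0.5, 1.5):
        print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  mc_mean={monte_carlo_mean(x):+.4f}")
```

The keep probability Φ(x) depends on the input itself, so larger inputs are more likely to pass through. This is the sense in which the regularizer "respects neuron values", in contrast to dropout's fixed keep probability.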


References [14]


D. P. Kingma, J. L. Ba (2014)

K. Simonyan, A. Zisserman (2014)

N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, R. R. Salakhutdinov (2014)

V. Nair, G. E. Hinton (2010)

A. L. Maas, A. Y. Hannun, A. Y. Ng (2013)

D. A. Clevert, T. Unterthiner, S. Hochreiter (2016)

S. Ganguli (2014)

G. Huang, Y. S. Sun, Z. Liu, D. Sedra, K. Q. Weinberger (2016)

A. Veit, M. J. Wilber, S. Belongie (2016)

D. Mishkin, J. Matas (2016)

P. Bachman, O. Alsharif, D. Precup (2014)

D. Krueger, T. Maharaj, J. Kramar, M. Pezeshki, N. Ballas, N. R. Ke, A. G. A. P. Goyal, Y. Bengio, H. Larochelle, A. Courville (2016)

D. Hendrycks, K. Gimpel (year missing)

A. Choudhury (2014)



Paper Aliases

Gaussian Error Linear Units (GELUs)