Papperoni

2011

Deep Sparse Rectifier Neural Networks

Xavier Glorot, Antoine Bordes, Yoshua Bengio

Cite Score: 87

AI summary

This paper introduces deep rectifier neural networks, which use the piecewise linear activation function max(0,x). They achieve comparable or superior performance to hyperbolic tangent networks and create sparse representations suitable for naturally sparse data. Experiments on image and text data show improved training and performance without unsupervised pre-training.

Main Contributions

Introduces deep rectifier networks with the max(0,x) activation function.
Demonstrates that rectifier networks achieve comparable or superior performance to hyperbolic tangent networks.
Shows that rectifier networks create sparse representations with true zeros, suitable for sparse data.
Finds that rectifier networks can achieve best performance without unsupervised pre-training on purely supervised tasks with large labeled datasets.
Presents empirical results on image recognition and sentiment analysis tasks, highlighting the effectiveness of rectifier networks.
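The sparsity claim in the contributions above can be illustrated with a small sketch. This is not the paper's experimental setup: the layer sizes, the Gaussian weight initialization, and the helper names (`rectifier`, `tanh_act`, `sparsity`, `layer_preactivations`) are all assumptions chosen for illustration. It only shows that max(0,x) yields exact zeros on a random layer while tanh does not.

```python
import math
import random


def rectifier(values):
    """Apply the rectifier max(0, x) elementwise to a list of floats."""
    return [max(0.0, v) for v in values]


def tanh_act(values):
    """Apply the hyperbolic tangent elementwise to a list of floats."""
    return [math.tanh(v) for v in values]


def sparsity(activations):
    """Return the fraction of activations that are exactly zero."""
    return sum(1 for a in activations if a == 0.0) / len(activations)


def layer_preactivations(inputs, weights, biases):
    """Compute W x + b for one fully connected layer (lists of floats)."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Hypothetical toy layer: sizes and initialization are illustrative assumptions.
rng = random.Random(0)
n_in, n_hidden = 20, 1000
inputs = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
weights = [
    [rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
    for _ in range(n_hidden)
]
biases = [0.0] * n_hidden

pre = layer_preactivations(inputs, weights, biases)
relu_sparsity = sparsity(rectifier(pre))   # roughly half the units are exactly 0
tanh_sparsity = sparsity(tanh_act(pre))    # tanh outputs are small but nonzero

print(f"rectifier sparsity: {relu_sparsity:.2f}")
print(f"tanh sparsity:      {tanh_sparsity:.2f}")
```

With symmetric random weights and zero biases, about half of the pre-activations are negative, so about half of the rectifier outputs are true zeros. In a trained network the actual sparsity level depends on the learned weights and biases.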

Abstract

While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability at zero, creating sparse representations with true zeros, which seem remarkably suitable for naturally sparse data. Even though they can take advantage of semi-supervised setups with extra-unlabeled data, deep rectifier networks can reach their best performance without requiring any unsupervised pre-training on purely supervised tasks with large labeled datasets. Hence, these results can be seen as a new milestone in the attempts at understanding the difficulty in training deep but purely supervised neural networks, and closing the performance gap between neural networks learnt with and without unsupervised pre-training.
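The "hard non-linearity and non-differentiability at zero" mentioned in the abstract can be written out explicitly. The following is a sketch in standard notation rather than a quotation from the paper; in particular, the choice of value for the derivative at x = 0 is a common convention assumed here.

```latex
\[
\operatorname{rectifier}(x) = \max(0, x), \qquad
\frac{d}{dx}\operatorname{rectifier}(x) =
\begin{cases}
1 & \text{if } x > 0,\\
0 & \text{if } x < 0.
\end{cases}
\]
% The derivative is undefined at x = 0. A common convention (assumed here)
% is to use 0 at that point, which is a valid subgradient.
```

For x < 0 the unit outputs an exact zero and passes no gradient, which is where the true zeros in the representation come from. For x > 0 the gradient passes through unchanged.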

References [36]

[1]Gradient-Based Learning Applied to Document Recognition

Yann Lecun, Leon Bottou, Yoshua Bengio, Patrick Haffner - 1998

62 papers in library cite

I absolutely hated this paper. It has ~50 pages but feels like 200. It takes too long to explain some things and often just repeats itself. It also doesn't seem to add much on top of LeNet-5, and it focuses a lot on GTNs, which really didn't stick.

[2]Learning Multiple Layers of Features From Tiny Images

Alex Krizhevsky - 2009

27 papers in library cite

It's alright. It mainly focuses on RBMs and their features and the actual part that describes the dataset is like 1 page. However, it's maybe the best intuitive description of an RBM I have seen. Other than that, it reads very much like an undergraduate thesis.

[3]Understanding the Difficulty of Training Deep Feedforward Neural Networks

Xavier Glorot, Yoshua Bengio - 2010

20 papers in library cite

Nice but underwhelming results (they still underperform vs. pretraining). I also didn't really like the way it's written. It's not bad, it's just a bit clunky. Worth the read though.

[4]Rectified Linear Units Improve Restricted Boltzmann Machines

V. Nair, Geoffrey E. Hinton - 2010

18 papers in library cite

I hate when people introduce a new idea but don't care to explain it! This is terrible compared to Bengio's paper.

[5]A Fast Learning Algorithm for Deep Belief Nets

Geoffrey E. Hinton, S. Osindero, Y. Teh - 2006

43 papers in library cite

The paper does not explain anything. It just throws out the idea and a bunch of math, but doesn't really care to explain the concepts.

[6]Learning Deep Architectures for AI

Yoshua Bengio - 2009

25 papers in library cite

It's a nice overview. Some sections get very theoretical, but the first half is very good and I feel that it does a waaaay better job of explaining RBMs and DBNs than other papers. This feels like Bengio is taking your hand and saying "if you don't know what's going on, here you go, everything you need to know to jump into the deep nets train"

[7]Extracting and Composing Robust Features With Denoising Autoencoders

Pascal Vincent, Hugo Larochelle, Yoshua Bengio, Pierre Antoine Manzagol - 2008

25 papers in library cite

I am *so* glad we found an alternative to DBNs. Also, introduced the idea of denoising which is nice.

[8]Efficient Backprop

Yann Lecun, Leon Bottou, G. B. Orr, Klaus Robert Muller - 1998

20 papers in library cite

The first half is very very good. The remainder is very boring.

[9]Greedy Layer-Wise Training of Deep Networks

Yoshua Bengio, P. Lamblin, D. Popovici, Hugo Larochelle - 2006

33 papers in library cite

Bengio is perfect. This is everything that Hinton's paper hoped to be. Very well explained, and also tying back to real use cases (not just "hey, the math works and it reduced the score")

[10]Why Does Unsupervised Pre-Training Help Deep Learning?

Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre Antoine Manzagol, Pascal Vincent, Samy Bengio - 2010

12 papers in library cite

Good paper, easy to follow, and brings some light to the pre-training stuff (layer-by-layer). I just wish it wasn't so long. It's a chore.

[11]What Is the Best Multi-Stage Architecture for Object Recognition?

K. Jarrett, Koray Kavukcuoglu, Marc'aurelio Ranzato, Yann Lecun - 2009

20 papers in library cite

So boring and convoluted... They try to make things more abstract by saying "filter banks" and "2 stages", but in the end it's CNNs. It seems like they are trying to tie NNs to the rest of ML, but it doesn't work.

[12]Biographies, Bollywood, Boom-Boxes and Blenders: Domain Adaptation for Sentiment Classification

John Blitzer, Mark Dredze, Fernando Pereira - 2007

4 papers in library cite

It's alright. I think they focus a lot on methodology of prediction but most citations are actually because of the dataset.

[13]Learning Methods for Generic Object Recognition With Invariance to Pose and Lighting

Yann Lecun, Fu Jie Huang, Leon Bottou - 2004

18 papers in library cite

Good paper, nice methodology for creating different images. However, I think that this was not too impactful... I don't see this being used a lot.

[14]Efficient Learning of Sparse Representations With an Energy-Based Model

Marc'aurelio Ranzato, C. Poultney, S. Chopra, Yann Lecun - 2006

20 papers in library cite

It's ok. Not really good, but alright.

[15]Measuring Invariances in Deep Networks

I. Goodfellow, Quoc Le, A. Saxe, A. Ng - 2009

7 papers in library cite

Very nice concept and methodology, but the results in the end are underwhelming.

[16]Sparse Feature Learning for Deep Belief Networks

Marc'aurelio Ranzato, Y. Boureau, Yann Lecun - 2008

12 papers in library cite

[17]A Theoretical Analysis of Robust Coding Over Noisy Overcomplete Channels

E. Doi, D. C. Balcan, M. S. Lewicki - 2006

5 papers in library cite

only 33 citations

[18]Sparse Coding With an Overcomplete Basis Set: A Strategy Employed by V1?

Bruno A. Olshausen, David J. Field - 1997

10 papers in library cite

[19]Sparse Deep Belief Net Model for Visual Area V2

Honglak Lee, C. Ekanadham, A. Ng - 2008

10 papers in library cite

[20]Efficient Sparse Coding Algorithms

Honglak Lee, Alexis Battle, Rajat Raina, A. Ng - 2007

6 papers in library cite

[21]Multiple Aspect Ranking Using the Good Grief Algorithm

B. Snyder, R. Barzilay - 2007

5 papers in library cite

[22]Opinion Mining and Sentiment Analysis

Bo Pang, L. Lee - 2008

4 papers in library cite

[23]A Model of Multiplicative Neural Responses in Parietal Cortex

E. Salinas, L. F. Abbott - 1996

2 papers in library cite

[24]A Quantitative Theory of Immediate Visual Recognition

T. Serre, G. Kreiman, M. Kouh, C. Cadieu, U. Knoblich, T. Poggio - 2007

2 papers in library cite

[25]On the Piecewise Analysis of Networks of Linear Threshold Neurons

R. H. R. Hahnloser - 1998

2 papers in library cite

[26]Supervised Dictionary Learning

Julien Mairal, F. Bach, Jean Ponce, G. Sapiro, Andrew Zisserman - 2009

2 papers in library cite

[27]The Cost of Cortical Computation

P. Lennie - 2003

2 papers in library cite

[28]Active Deep Networks for Semi-Supervised Sentiment Classification

Shusen Zhou, Qingcai Chen, Xiaolong Wang - 2010

1 paper in library cites

[29]An Energy Budget for Signaling in the Grey Matter of the Brain

D. Attwell, S. Laughlin - 2001

1 paper in library cites

[30]Decoding by Linear Programming

Emmanuel Candès, T. Tao - 2005

1 paper in library cites

[31]Deep Self-Taught Learning for Handwritten Character Recognition

Yoshua Bengio, Others - 2010

1 paper in library cites

[32]Handprinted Forms and Character Database, NIST Special Database 19

P. Grother - 1995

1 paper in library cites

[33]Incorporating Second-Order Functional Knowledge for Better Option Pricing

C. Dugas, Yoshua Bengio, F. Belisle, C. Nadeau, R. Garcia - 2001

1 paper in library cites

[34]Recurrent Excitation in Neocortical Circuits

R. Douglas, Others - 2003

1 paper in library cites

[35]The Cortical Neuron

P. C. Bush, T. J. Sejnowski - 1995

1 paper in library cites

[36]Theoretical Neuroscience

Peter Dayan, L. F. Abbott - 2001

1 paper in library cites

Cited by 17 papers in your library

Cites 16 papers in your library

Read on July 1, 2025

How can you not love Bengio? This paper is everything that Hinton's is not. Really well explained, surprising results, good comparisons... Amazing impact!
