Papperoni

2014

Dropout: A Simple Way to Prevent Neural Networks From Overfitting

N. Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov


Cite Score: 98

AI summary

This paper introduces dropout, a regularization technique for neural networks that randomly drops units during training to prevent co-adaptation. This reduces overfitting and improves generalization performance, achieving state-of-the-art results on various benchmark datasets for vision, speech recognition, document classification, and computational biology.

Main Contributions

Introduces dropout, a novel regularization technique for neural networks that prevents overfitting by randomly dropping units during training.
Demonstrates that dropout can be interpreted as a form of model averaging over an exponential number of thinned networks.
Shows that dropout improves the performance of neural networks on a variety of supervised learning tasks, including vision, speech recognition, document classification, and computational biology.
Achieves state-of-the-art results on many benchmark datasets using dropout.
Introduces the dropout Restricted Boltzmann Machine (RBM) model and demonstrates its improved performance compared to standard RBMs.
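The model-averaging interpretation can be checked by brute force in the simplest case. This is a minimal sketch, not code from the paper, and it rests on a few assumptions:

- There is a single linear unit with no bias.
- p is the probability of retaining an input.
- The helper names `thinned_average` and `scaled_network` and the example values are hypothetical.

The code enumerates all 2^n thinned networks and weights each by its probability. It then compares that average with one unthinned network whose weights are multiplied by p.

```python
import itertools


def thinned_average(weights, inputs, p):
    # Average the output of one linear unit over all 2**n thinned networks.
    # Each mask keeps input i with probability p (p = retention probability),
    # so a mask that keeps k of n inputs has probability p**k * (1 - p)**(n - k).
    n = len(weights)
    total = 0.0
    for mask in itertools.product((0, 1), repeat=n):
        kept = sum(mask)
        prob = p ** kept * (1 - p) ** (n - kept)
        output = sum(m * w * x for m, w, x in zip(mask, weights, inputs))
        total += prob * output
    return total


def scaled_network(weights, inputs, p):
    # A single unthinned network whose weights are multiplied by p.
    return sum(p * w * x for w, x in zip(weights, inputs))


# Hypothetical example values.
weights = [0.5, -1.0, 2.0, 0.25]
inputs = [1.0, 2.0, 3.0, 4.0]
p = 0.5

print(thinned_average(weights, inputs, p))  # 2.75
print(scaled_network(weights, inputs, p))   # 2.75
```

For a linear unit the two results agree exactly, by linearity of expectation. With nonlinear hidden layers, the weight-scaled network only approximates the averaged prediction of the thinned networks.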

Abstract

Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
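The training and test procedures described in the abstract can be sketched in plain Python with no framework. This is a minimal, hypothetical illustration built on these assumptions:

- A single fully connected layer has its inputs dropped.
- p is the retention probability.
- The weights, biases and function names are made up for the example.

During training, each call samples one thinned network. At test time, the full network is used with its weights multiplied by p. Averaging many training passes should land close to the test-time output.

```python
import random


def linear(x, weights, bias):
    # y_j = sum_i W[j][i] * x_i + b_j
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward_train(x, weights, bias, p, rng):
    # Sample one thinned network: keep each input unit with probability p,
    # otherwise drop it (set it to zero) along with its outgoing connections.
    mask = [1.0 if rng.random() < p else 0.0 for _ in x]
    return linear([m * xi for m, xi in zip(mask, x)], weights, bias)


def forward_test(x, weights, bias, p):
    # No units are dropped; the outgoing weights are scaled by p instead,
    # giving a single unthinned network with smaller weights.
    scaled = [[p * w for w in row] for row in weights]
    return linear(x, scaled, bias)


# Hypothetical example values.
x = [1.0, 2.0, 3.0]
weights = [[0.5, -1.0, 2.0],
           [1.0, 1.0, 1.0]]
bias = [0.1, -0.2]
p = 0.8

rng = random.Random(0)
trials = 20000
totals = [0.0, 0.0]
for _ in range(trials):
    for j, y in enumerate(forward_train(x, weights, bias, p, rng)):
        totals[j] += y
mc_average = [t / trials for t in totals]

test_output = forward_test(x, weights, bias, p)
print(mc_average)   # Monte Carlo estimate; varies with the seed
print(test_output)  # approximately [3.7, 4.6]
```

In this sketch the bias is left unscaled because only input units are dropped, never the bias terms.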


References [36]


[1]ImageNet Classification With Deep Convolutional Neural Networks

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton - 2012

71 papers in library cite

I'm giving this a 5 just because of the impact, but this is VEEERY derivative of earlier work. Kudos to them for putting it all together, but really, there's nothing revolutionary here.

[2]Learning Multiple Layers of Features From Tiny Images

Alex Krizhevsky - 2009

27 papers in library cite

It's alright. It mainly focuses on RBMs and their features, and the part that actually describes the dataset is about one page. However, it's maybe the best intuitive description of an RBM I have seen. Other than that, it reads very much like an undergraduate thesis.

[3]Reducing the Dimensionality of Data With Neural Networks

Geoffrey Hinton, Ruslan Salakhutdinov - 2006

37 papers in library cite

I didn't like the way this is written; it's very hard to understand without a ton of background knowledge. But hey, it's the first deep learning model!

[4]A Fast Learning Algorithm for Deep Belief Nets

Geoffrey E. Hinton, S. Osindero, Y. Teh - 2006

43 papers in library cite

The paper does not explain anything. It just throws the idea and a bunch of math, but doesn't really care to explain the concepts.

[5]Backpropagation Applied to Handwritten Zip-Code Recognition

Yann LeCun, B. Boser, John S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel - 1989

24 papers in library cite

The first convolutional NN! Very simple concept, very simply explained. Very good results and overall a good read.

[6]Extracting and Composing Robust Features With Denoising Autoencoders

Pascal Vincent, Hugo Larochelle, Yoshua Bengio, Pierre-Antoine Manzagol - 2008

25 papers in library cite

I am *so* glad we found an alternative to DBNs. It also introduced the idea of denoising, which is nice.

[7]Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network With a Local Denoising Criterion

Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol - 2010

6 papers in library cite

This is basically a summary of everything that happened from 2006 to 2010, and it also points out some interesting things about DBNs! Very well explained as well.

[8]Reading Digits in Natural Images With Unsupervised Feature Learning

Y. Netzer, T. Wang, A. Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng - 2011

8 papers in library cite

It's a bit meh. It's also from 2011, so maybe it was impressive at the time, but I think the main contribution is the dataset.

[9]Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis

P. Y. Simard, D. Steinkraus, John C. Platt - 2003

12 papers in library cite

Very good paper. Simple and pragmatic.

[10]What Is the Best Multi-Stage Architecture for Object Recognition?

K. Jarrett, Koray Kavukcuoglu, Marc'Aurelio Ranzato, Yann LeCun - 2009

20 papers in library cite

So boring and convoluted... They try to make things more abstract by saying "filter banks" and "2 stages", but in the end it's CNNs. It seems like they are trying to tie NNs to the rest of ML, but it doesn't work.

[11]Maxout Networks

Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio - 2013

17 papers in library cite

A bit hard to understand, but very nice idea.

[12]Convolutional Neural Networks Applied to House Numbers Digit Classification

P. Sermanet, S. Chintala, Yann LeCun - 2012

6 papers in library cite

This reads like an undergrad's research paper or a recent master's paper. It only applies existing stuff to a new dataset.

[13]Deep Boltzmann Machines

Ruslan Salakhutdinov, Geoffrey E. Hinton - 2009

9 papers in library cite

Not sure what the difference from DBNs is.

[14]Acoustic Modeling Using Deep Belief Networks

A. Mohamed, G. Dahl, Geoffrey Hinton - 2012

12 papers in library cite

[15]Practical Bayesian Optimization of Machine Learning Algorithms

J. Snoek, Hugo Larochelle, R. P. Adams - 2012

9 papers in library cite

[16]Improving Neural Networks With Dropout

N. Srivastava - 2013

6 papers in library cite

[17]Phone Recognition With the Mean-Covariance Restricted Boltzmann Machine

George E. Dahl, Marc'Aurelio Ranzato, A. Mohamed, Geoffrey E. Hinton - 2010

6 papers in library cite

[18]CUDAMat: A CUDA-based Matrix Class for Python

V. Mnih - 2009

5 papers in library cite

[19]Stochastic Pooling for Regularization of Deep Convolutional Neural Networks

Matthew D. Zeiler, Rob Fergus - 2013

5 papers in library cite

[20]Fast Dropout Training

S. Wang, C. Manning - 2013

4 papers in library cite

[21]High-Dimensional Signature Compression for Large-Scale Image Classification

J. Sánchez, F. Perronnin - 2011

4 papers in library cite

[22]Regression Shrinkage and Selection via the Lasso

R. Tibshirani - 1996

4 papers in library cite

[23]The Kaldi Speech Recognition Toolkit

D. Povey, A. Ghoshal - 2011

4 papers in library cite

[24]Bayesian Learning for Neural Networks

R. Neal - 1995

3 papers in library cite

[25]Rank, Trace-Norm and Max-Norm

N. Srebro, A. Shraibman - 2005

3 papers in library cite

[26]Marginalized Denoising Autoencoders for Domain Adaptation

M. Chen, Z. Xu, K. Weinberger, F. Sha - 2012

2 papers in library cite

[27]Nightmare at Test Time: Robust Learning by Feature Deletion

A. Globerson, S. Roweis - 2006

2 papers in library cite

[28]Bayesian Prediction of Tissue-Regulated Splicing Using RNA Sequence and Cellular Context

H. Xiong, Y. Barash, B. Frey - 2011

1 paper in library cites

[29]Bayesian Probabilistic Matrix Factorization Using Markov Chain Monte Carlo

Ruslan Salakhutdinov, A. Mnih - 2008

1 paper in library cites

[30]Dropout Training as Adaptive Regularization

S. Wager, S. Wang, Percy Liang - 2013

1 paper in library cites

[31]Imagenet Classification: Fast Descriptor Coding and Large-Scale SVM Training

Y. Lin, F. Lv, S. Zhu, M. Yang, T. Cour, K. Yu, L. Cao, Z. Li, M. Tsai, X. Zhou, T. Huang, Tong Zhang - 2010

1 paper in library cites

[32]Learning to Classify With Missing and Corrupted Features

O. Dekel, O. Shamir, L. Xiao - 2010

1 paper in library cites

[33]Learning With Marginalized Corrupted Features

Laurens van der Maaten, M. Chen, S. Tyree, K. Weinberger - 2013

1 paper in library cites

[34]On the Stability of Inverse Problems

A. Tikhonov - 1943

1 paper in library cites

[35]Sex, Mixability, and Modularity

A. Livnat, C. Papadimitriou, N. Pippenger, M. Feldman - 2010

1 paper in library cites

[36]Simplifying Neural Networks by Soft Weight-Sharing

S. Nowlan, Geoffrey Hinton - 1992

1 paper in library cites

Cited by 20 papers in your library

Cites 14 papers in your library

Read on October 13, 2025

Good paper, but it's mostly a review of the method described in the other paper, with more results. It's also longer, so I would suggest just reading the other one.
