Papperoni

2014

Deeply-Supervised Nets

Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu

Cite Score: 65

AI summary

This paper introduces Deeply-Supervised Nets (DSN), which minimize classification error and make learning in deep networks more transparent by adding a "companion objective" to hidden layers, achieving state-of-the-art results on the MNIST, CIFAR-10, CIFAR-100, and SVHN datasets.

Main Contributions

Introduced Deeply-Supervised Nets (DSN) for direct and early supervision of hidden layers and the output layer.
Proposed a "companion objective" for individual hidden layers as an additional constraint/regularization.
Formulation significantly enhances the performance of existing supervised deep learning methods.
Provided justification for the formulation using stochastic gradient techniques and demonstrated improved convergence rates.
Achieved state-of-the-art classification error on benchmark datasets including MNIST, CIFAR-10, CIFAR-100, and SVHN.

Abstract

Our proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent. We make an attempt to boost the classification performance by studying a new formulation in deep networks. Three aspects in convolutional neural networks (CNN) style architectures are being looked at: (1) transparency of the intermediate layers to the overall classification; (2) discriminativeness and robustness of learned features, especially in the early layers; (3) effectiveness in training due to the presence of the exploding and vanishing gradients. We introduce “companion objective” to the individual hidden layers, in addition to the overall objective at the output layer (a different strategy to layer-wise pre-training). We extend techniques from stochastic gradient methods to analyze our algorithm. The advantage of our method is evident and our experimental result on benchmark datasets shows significant performance gain over existing methods (e.g. all state-of-the-art results on MNIST, CIFAR-10, CIFAR-100, and SVHN).
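To make the "companion objective" idea concrete, here is a minimal sketch of how an overall loss could combine the output-layer objective with per-hidden-layer objectives. It is an illustration under assumptions, not the paper's implementation: the function names, the use of softmax cross-entropy as the per-layer loss, and the `alphas` weights and `gamma` threshold are all illustrative choices, and the hidden-layer class scores are assumed to come from small classifiers attached to each supervised layer.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example given raw class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def deeply_supervised_loss(output_logits, hidden_logits, label, alphas, gamma=0.0):
    """Output-layer loss plus weighted companion losses (illustrative only).

    hidden_logits: one list of class scores per supervised hidden layer,
        assumed to come from a small classifier attached to that layer.
    alphas: weight of each companion loss (assumed hyperparameters).
    gamma: companion losses below this threshold contribute nothing (assumed).
    """
    if len(hidden_logits) != len(alphas):
        raise ValueError("need one alpha per supervised hidden layer")
    total = softmax_cross_entropy(output_logits, label)
    for logits, alpha in zip(hidden_logits, alphas):
        companion = softmax_cross_entropy(logits, label)
        total += alpha * max(companion - gamma, 0.0)
    return total


# Made-up scores for a 3-class problem with two supervised hidden layers.
output_logits = [2.0, 0.5, -1.0]
hidden_logits = [[0.3, 0.2, 0.1], [1.0, 0.0, -0.5]]
plain = softmax_cross_entropy(output_logits, 0)
loss = deeply_supervised_loss(output_logits, hidden_logits, 0, [0.3, 0.3], gamma=0.1)
print(loss >= plain)
```

Setting every entry of `alphas` to zero recovers the ordinary output-only loss, which is one way to read the companion terms as an added constraint on the hidden layers rather than a replacement for the final objective.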

References [31]

[1]ImageNet Classification With Deep Convolutional Neural Networks

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton - 2012

71 papers in library cite

I'm giving this a 5 just because of the impact, but this is VEEERY derivative of earlier work. Kudos to them for putting it all together, but really there's nothing revolutionary here.

[2]Gradient-Based Learning Applied to Document Recognition

Yann Lecun, Leon Bottou, Yoshua Bengio, Patrick Haffner - 1998

62 papers in library cite

I absolutely hated this paper. It has ~50 pages but feels like 200. It takes too long to explain some things and often just repeats itself. It also doesn't seem to add much on top of LeNet-5, and it focuses a lot on GTNs, which really didn't stick.

[3]Understanding the Difficulty of Training Deep Feedforward Neural Networks

Xavier Glorot, Yoshua Bengio - 2010

20 papers in library cite

Nice but underwhelming results (they still underperform vs. pretraining). I also didn't really like the way it's written. It's not bad, it's just a bit clunky. Worth the read though.

[4]Visualizing and Understanding Convolutional Networks

Matthew D. Zeiler, Rob Fergus - 2014

15 papers in library cite

Very good explanation and visualization of CNNs, and also nice that they use their findings to improve the performance. The ablation study is also nice.

[5]A Fast Learning Algorithm for Deep Belief Nets

Geoffrey E. Hinton, S. Osindero, Y. Teh - 2006

43 papers in library cite

The paper does not explain anything. It just throws the idea and a bunch of math, but doesn't really care to explain the concepts.

[6]Caffe: Convolutional Architecture for Fast Feature Embedding

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, Ross Girshick, S. Guadarrama, Trevor Darrell - 2014

12 papers in library cite

Nothing new really, but worth the read. It's nice because it's the precursor to current AI frameworks and has a Python interface. It's also good that the model representation is separate from the implementation.

[7]Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors

Geoffrey E. Hinton, N. Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov - 2012

25 papers in library cite

Dropout, super impactful. The idea that you are training many estimators at once is also very nice.

[8]Network in Network

M. Lin, Qinlang Chen, Shuicheng Yan - 2013

11 papers in library cite

I think this was badly written and explained. The idea is nice but I didn't like the paper at all.

[9]On the Difficulty of Training Recurrent Neural Networks

Razvan Pascanu, Tomas Mikolov, Yoshua Bengio - 2013

21 papers in library cite

It starts very mathy but in the end there are some very nice contributions! You don't actually need to understand the math to know what's going on in the end.

[10]Greedy Layer-Wise Training of Deep Networks

Yoshua Bengio, P. Lamblin, D. Popovici, Hugo Larochelle - 2006

33 papers in library cite

Bengio is perfect. This is everything that Hinton's paper hoped to be. Very well explained, and it also ties back to real use cases (not just "hey, the math works and it reduced the score").

[11]Multi-Column Deep Neural Networks for Image Classification

Dan C. Ciresan, Ueli Meier, Jürgen Schmidhuber - 2012

11 papers in library cite

Very nice paper! And I am impressed they used CNNs before Hinton's paper. It's a shame there are so few citations. They also propose max-pooling and actually give a good explanation about it.

[12]DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

J. Donahue, Y. Jia, Oriol Vinyals, J. Hoffman, N. Zhang, E. Tzeng, Trevor Darrell - 2014

15 papers in library cite

Very nice paper. The first I've seen (and, based on the text, the first ever) about feature extraction for images. It's very nice to see embeddings reach SotA.

[13]Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

G. Dahl, D. Yu, L. Deng, Alex Acero - 2012

19 papers in library cite

Good paper, very well written and probably the best explanation of RBMs and DBNs I've seen. However, I don't see a lot of impact, and it seems very derivative of other work.

[14]Regularization of Neural Networks Using Dropconnect

L. Wan, M. Zeiler, S. Zhang, Rob Fergus - 2013

8 papers in library cite

I feel that the method is very complex and does not improve much on top of regular dropout.

[15]What Is the Best Multi-Stage Architecture for Object Recognition?

K. Jarrett, Koray Kavukcuoglu, Marc'aurelio Ranzato, Yann Lecun - 2009

20 papers in library cite

So boring and convoluted... They try to make things more abstract by saying "filter banks" and "2 stages", but in the end it's CNNs. It seems like they are trying to tie NNs to the rest of ML, but it doesn't work.

[16]Maxout Networks

Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio - 2013

17 papers in library cite

A bit hard to understand, but very nice idea.

[17]Deep Learning via Semi-Supervised Embedding

Jason Weston, F. Ratle, Ronan Collobert - 2008

10 papers in library cite

It's a good paper and a nice idea, but it seems overly complicated and I don't think it's widely used... (PS: this was republished in 2012)

[18]Theano: A CPU and GPU Math Expression Compiler

James Bergstra, O. Breuleux, F. Bastien, P. Lamblin, Razvan Pascanu, G. Desjardins, J. Turian, D. W. Farley, Yoshua Bengio - 2010

22 papers in library cite

Very nice framework. Symbolic programming is very nice. However, I think that this had very little impact and was mostly used by Bengio's lab.

[19]Understanding Deep Architectures Using a Recursive Convolutional Network

D. Eigen, J. Rolfe, Rob Fergus, Yann Lecun - 2013

2 papers in library cite

"recent research on small datasets suggests that the accuracy should improve from the increased number of parameters in conv layers"

[20]Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations

Honglak Lee, R. Grosse, R. Ranganath, Andrew Y. Ng - 2009

12 papers in library cite

[21]The Nature of Statistical Learning Theory

V. Vapnik - 1995

9 papers in library cite

Book, 300+ pages

[22]Distributed Representations, Simple Recurrent Networks, and Grammatical Structure

Jeffrey L. Elman - 1991

5 papers in library cite

[23]Large-Scale Learning With SVM and Convolutional Nets for Generic Object Categorization

F. Huang, Yann Lecun - 2006

5 papers in library cite

[24]Stochastic Pooling for Regularization of Deep Convolutional Neural Networks

Matthew D. Zeiler, Rob Fergus - 2013

5 papers in library cite

[25]Online Algorithms and Stochastic Approximations

Leon Bottou - 1998

4 papers in library cite

[26]Tiled Convolutional Neural Networks

Quoc V. Le, J. Ngiam, Ziru Chen, D. Chia, P. W. Koh, Andrew Y. Ng - 2010

4 papers in library cite

[27]Deep Learning Using Linear Support Vector Machines

Y. Tang - 2013

2 papers in library cite

[28]Discriminative Transfer Learning With Tree-Based Priors

N. Srivastava, Ruslan Salakhutdinov - 2013

2 papers in library cite

[29]Nonparametric Guidance of Autoencoder Representations Using Label Information

J. Snoek, R. P. Adams, Hugo Larochelle - 2012

1 paper in library cites

[30]Regularized m-estimators With Nonconvexity: Statistical and Algorithmic Theory for Local Optima

P. L. Loh, M. J. Wainwright - 2013

1 paper in library cites

[31]Stochastic Gradient Descent for Non-Smooth Optimization: Convergence Results and Optimal Averaging Schemes

A. Rakhlin, O. Shamir, K. Sridharan - 2012

1 paper in library cites

Cited by 8 papers in your library

Cites 20 papers in your library

Read on February 18, 2026

Amazing potential, very nice idea, but bad writing.
