Papperoni

2016

Deep Residual Learning for Image Recognition

K. He, X. Zhang, S. Ren, Jian Sun

Cite Score

100

AI summary

This paper introduces a deep residual learning framework that enables the training of significantly deeper networks by having layers learn residual functions, with shortcut connections carrying the layer inputs forward. An ensemble of these networks achieved 3.57% error on the ImageNet test set, winning 1st place in the ILSVRC 2015 classification task, and the deeper representations yielded a 28% relative improvement on the COCO object detection dataset.

Main Contributions

Introduces a deep residual learning framework to address the degradation problem in very deep networks, enabling easier optimization and accuracy gains from increased depth.
Presents residual networks (ResNets) with identity shortcut connections that allow training of networks with up to 152 layers.
Achieves state-of-the-art results on the ImageNet dataset: an ensemble of residual nets wins 1st place in the ILSVRC 2015 classification task with a 3.57% top-5 error rate on the test set.
Demonstrates the effectiveness of residual learning on other datasets such as CIFAR-10, and shows improved performance on object detection tasks on the COCO dataset.
Provides analysis of layer responses, showing that ResNets have generally smaller responses than their plain counterparts, suggesting that identity mappings provide reasonable preconditioning.
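The contributions above can be made concrete with a small sketch of a residual block next to its plain counterpart. This is an illustrative NumPy toy, not the authors' implementation: it uses fully connected layers instead of convolutions, omits batch normalization, and the names `residual_fn`, `plain_block`, and `residual_block` are made up for this example. It shows why the identity shortcut is a convenient starting point: with all weights at zero, the residual block passes a non-negative input through unchanged, while the plain block outputs zeros.

```python
import numpy as np


def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)


def residual_fn(x, w1, w2):
    """F(x): two weight layers with a ReLU in between (toy, fully connected)."""
    return relu(x @ w1) @ w2


def plain_block(x, w1, w2):
    """Plain counterpart: the stacked layers must fit the full mapping."""
    return relu(residual_fn(x, w1, w2))


def residual_block(x, w1, w2):
    """Residual block: the layers fit F(x); the identity shortcut adds x back."""
    return relu(residual_fn(x, w1, w2) + x)


rng = np.random.default_rng(0)
# Non-negative input, as it would be after a preceding ReLU.
x = np.abs(rng.standard_normal((4, 8)))
zeros = np.zeros((8, 8))

plain_out = plain_block(x, zeros, zeros)
res_out = residual_block(x, zeros, zeros)

print("plain block outputs all zeros:", np.allclose(plain_out, 0.0))
print("residual block reproduces x:  ", np.allclose(res_out, x))
```

In this toy setting, the residual block only has to learn a perturbation around the identity, which is one way to read the "preconditioning" remark in the last contribution.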

Abstract

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [41] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
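The reformulation mentioned in the abstract can be written compactly. The symbols below are not defined in the abstract itself; they follow the notation commonly used for this paper, with H as the mapping a few stacked layers should fit, F as the residual they are asked to learn instead, and W_i as the weights of those layers.

```latex
% Desired underlying mapping H, learned indirectly through the residual F:
\mathcal{F}(\mathbf{x}) := \mathcal{H}(\mathbf{x}) - \mathbf{x}
\quad\Longrightarrow\quad
\mathcal{H}(\mathbf{x}) = \mathcal{F}(\mathbf{x}) + \mathbf{x}

% A building block with an identity shortcut (input x, output y):
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x}
```

If the identity mapping is close to optimal, the layers only need to push F toward zero, which is presumably easier than fitting an identity with a stack of nonlinear layers.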

References [50]

[1]Very Deep Convolutional Networks for Large-Scale Image Recognition

K. Simonyan, Andrew Zisserman - 2014

20 papers in library cite

This is very good! The great thing here is small filters and depth analysis, but truly they do some other stuff as well: SotA, generalization for other tasks, and open source their models. Very nice.

[2]ImageNet Classification With Deep Convolutional Neural Networks

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton - 2012

71 papers in library cite

I'm giving this a 5 just because of the impact, but this is VEEERY derivative of earlier work. Kudos to them for putting it all together, but really there's nothing revolutionary here.

[3]Long Short-Term Memory

Sepp Hochreiter, Jürgen Schmidhuber - 1997

94 papers in library cite

LSTMs FTW!

[4]Going Deeper With Convolutions

Christian Szegedy, Wei Liu, Y. Jia, P. Sermanet, S. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich - 2015

20 papers in library cite

Introduced the Inception architecture, which is nice. The paper is quite good, but I had to google some stuff to understand it fully. Nice contribution and SotA, but TBH I felt that it wasn't toooo good of a read.

[5]Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

S. Ioffe, Christian Szegedy - 2015

18 papers in library cite

Very good paper! Similar feel as ResNets: simple idea, elegant. Not too mathy

[6]Microsoft COCO: Common Objects in Context

T. Y. Lin, M. Maire, S. Belongie, James Hays, Pietro Perona, D. Ramanan, Piotr Dollar, C. L. Zitnick - 2014

14 papers in library cite

I liked this paper a lot. It's a bit long and I was already a bit tired, but it was nice overall.

[7]Fully Convolutional Networks for Semantic Segmentation

J. Long, E. Shelhamer, Trevor Darrell - 2015

7 papers in library cite

I didn't really like the way the paper is written and the results seem a bit underwhelming. However, it's nice that they do it in a fully convolutional way and that they increase the speed.

[8]Faster R-CNN: Towards Real-Time Object Detection With Region Proposal Networks

Jian Sun - 2016

2 papers in library cite

Now, this is the pinnacle of object detection so far! Very nice to see the progress from Deep CNNs -> R-CNN -> Fast R-CNN -> Faster R-CNN. Very nice improvements on detection and speed, and the paper is incredibly well written. The abstract is probably the best I've read in a long time.

[9]Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Ross Girshick, J. Donahue, Trevor Darrell, Jitendra Malik - 2014

18 papers in library cite

Good results, beat OverFeat, used pretraining to improve performance. The only issue is that the paper is overly long...

[10]Fast R-CNN

Ross Girshick - 2015

2 papers in library cite

Very nice improvement over R-CNN! I like that they start pushing towards an end-to-end framework, but it still falls short because it relies on a separate region proposal module.

[11]Learning Multiple Layers of Features From Tiny Images

Alex Krizhevsky - 2009

27 papers in library cite

It's alright. It mainly focuses on RBMs and their features and the actual part that describes the dataset is like 1 page. However, it's maybe the best intuitive description of an RBM I have seen. Other than that, it reads very much like an undergraduate thesis.

[12]Understanding the Difficulty of Training Deep Feedforward Neural Networks

Yoshua Bengio - 2010

20 papers in library cite

Nice but underwhelming results (they still underperform vs. pretraining). I also didn't really like the way it's written. It's not bad, it's just a bit clunky. Worth the read though.

[13]Rectified Linear Units Improve Restricted Boltzmann Machines

V. Nair, Geoffrey E. Hinton - 2010

18 papers in library cite

I hate it when people introduce a new idea but don't bother to explain it! This is terrible compared to Bengio's paper.

[14]Delving Deep Into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

K. He, X. Zhang, S. Ren, Jian Sun - 2015

10 papers in library cite

I think the PReLU idea didn't catch on, but the initialization is very nice! Good read.

[15]The PASCAL Visual Object Classes (VOC) Challenge

Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, Andrew Zisserman - 2010

7 papers in library cite

Good overview of the VOC challenge and a good read, but too long, and the part where they describe methods and results is very boring. Also, not related to NNs at the time!

[16]Visualizing and Understanding Convolutional Networks

Matthew D. Zeiler, Rob Fergus - 2014

15 papers in library cite

Very good explanation and visualization of CNNs, and also nice that they use their findings to improve the performance. The ablation study is also nice.

[17]Backpropagation Applied to Handwritten Zip-Code Recognition

Yann LeCun, B. Boser, John S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel - 1989

24 papers in library cite

The first convolutional NN! Very simple concept and very simply explained. Very good results and overall a good read.

[18]Caffe: Convolutional Architecture for Fast Feature Embedding

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, Ross Girshick, S. Guadarrama, Trevor Darrell - 2014

12 papers in library cite

Nothing new really, but worth the read. It's nice because it's the precursor to current AI frameworks and has a Python interface. Also good that the model representation is separate from the implementation.

[19]Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

K. He, X. Zhang, S. Ren, Jian Sun - 2014

6 papers in library cite

Very simple, general and effective method. The paper ends at page ~4 TBH, the rest is just results and gets boring. Good contribution though.

[20]Learning Long-Term Dependencies With Gradient Descent Is Difficult

Yoshua Bengio, Patrice Simard, Paolo Frasconi - 1994

31 papers in library cite

The first to notice that gradient descent struggles to learn long-term dependencies, but way too mathy for me.

[21]Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors

Geoffrey E. Hinton, N. Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov - 2012

25 papers in library cite

Dropout, super impactful. The idea that you are training many estimators at once is also very nice.

[22]Network in Network

M. Lin, Qiang Chen, Shuicheng Yan - 2013

11 papers in library cite

I think this was badly written and explained. The idea is nice but I didn't like the paper at all.

[23]Efficient Backprop

Yann LeCun, Léon Bottou, G. B. Orr, Klaus-Robert Müller - 1998

20 papers in library cite

The first half is very very good. The remainder is very boring.

[24]Training Very Deep Networks

R. K. Srivastava, K. Greff, Jürgen Schmidhuber - 2015

6 papers in library cite

The idea of applying LSTM concepts to feedforward nets is so simple, yet it took so long! Very cool idea, but the results are underwhelming (otherwise it would be a 5!).

[25]Deeply-Supervised Nets

Chen Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu - 2014

8 papers in library cite

Amazing potential, very nice idea, but bad writing.

[26]Maxout Networks

Yoshua Bengio - 2013

17 papers in library cite

A bit hard to understand, but very nice idea.

[27]Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Neural Networks

Surya Ganguli - 2014

9 papers in library cite

TBH it's been almost 2 months since I read this paper (shame on me for forgetting to add it)... Anyway, as I recall, I liked it, but it's a bit underwhelming because it only solves the linear-network case.

[28]Deep Learning Made Easier by Linear Transformations in Perceptrons

Tapani Raiko, Harri Valpola, Yann LeCun - 2012

7 papers in library cite

Kudos for introducing shortcut connections (which would become important in the future), but to me it seems a bit mid.

[29]OverFeat: Integrated Recognition, Localization and Detection Using Convolutional Networks

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, Rob Fergus, Yann LeCun - 2014

16 papers in library cite

Very convoluted method, was SotA for only a bit of time, and the paper is very boring.

[30]Imagenet Large Scale Visual Recognition Challenge

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Zhiheng Huang, A. Karpathy, A. Khosla, M. Bernstein - 2014

18 papers in library cite

The ImageNet dataset challenge paper.

[31]Neural Networks for Pattern Recognition

C. M. Bishop - 1995

12 papers in library cite

Book, 38k citations

[32]Fitnets: Hints for Thin Deep Nets

A. Romero, Nicolas Ballas, S. E. Kahou, A. Chassang, C. Gatta, Yoshua Bengio - 2015

5 papers in library cite

FitNets, referenced by ResNets + Bengio.

[33]Convolutional Neural Networks at Constrained Time Cost

K. He, Jian Sun - 2014

2 papers in library cite

Main architecture for the "Delving Deep Into Rectifiers" paper (SotA on the ILSVRC challenge).

[34]On the Number of Linear Regions of Deep Neural Networks

G. F. Montufar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio - 2014

3 papers in library cite

Sounds very mathy, but it's Bengio, and it seems to challenge the idea that multilayer networks can approximate complex functions.

[35]Highway Networks

R. K. Srivastava, K. Greff, Jürgen Schmidhuber - 2015

6 papers in library cite

Introduced highway networks, which seem like a precursor to ResNets.

[36]Fisher Kernels on Visual Vocabularies for Image Categorization

F. Perronnin, C. Dance - 2007

3 papers in library cite

[37]Aggregating Local Image Descriptors Into Compact Codes

Hervé Jégou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, Cordelia Schmid - 2012

2 papers in library cite

[38]Centering Neural Network Gradient Factors

N. N. Schraudolph - 1998

2 papers in library cite

[39]Object Detection Networks on Convolutional Feature Maps

S. Ren, K. He, Ross Girshick, X. Zhang, Jian Sun - 2015

2 papers in library cite

[40]The Devil Is in the Details: An Evaluation of Recent Feature Encoding Methods

K. Chatfield, Victor Lempitsky, A. Vedaldi, Andrew Zisserman - 2011

2 papers in library cite

[41]A Multigrid Tutorial

W. L. Briggs, S. F. McCormick - 2000

1 paper in library cites

[42]Accelerated Gradient Descent by Factor-Centering Decomposition

N. N. Schraudolph - 1998

1 paper in library cites

[43]Fast Surface Interpolation Using Hierarchical Basis Functions

R. Szeliski - 1990

1 paper in library cites

[44]Locally Adapted Hierarchical Basis Preconditioning

R. Szeliski - 2006

1 paper in library cites

[45]Modern Applied Statistics With S-Plus

W. Venables, B. Ripley - 1999

1 paper in library cites

[46]Object Detection via a Multi-Region & Semantic Segmentation-Aware CNN model

S. Gidaris, N. Komodakis - 2015

1 paper in library cites

[47]Pattern Recognition and Neural Networks

B. D. Ripley - 1996

1 paper in library cites

[48]Product Quantization for Nearest Neighbor Search

Hervé Jégou, M. Douze, Cordelia Schmid - 2011

1 paper in library cites

[49]Pushing Stochastic Gradient Towards Second-Order Methods–backpropagation Learning With Transformations in Nonlinearities

T. Vatanen, Tapani Raiko, Harri Valpola, Yann LeCun - 2013

1 paper in library cites

[50]VLFeat: An Open and Portable Library of Computer Vision Algorithms

A. Vedaldi, B. Fulkerson - 2008

1 paper in library cites

Cited by: 20 papers in your library

Cites: 35 papers in your library

Read: July 14, 2025

This is simply amazing. Very very simple idea, totally revolutionary. No maths, just "it works!". Amazing.
