Papperoni

2016

Faster R-CNN: Towards Real-Time Object Detection With Region Proposal Networks

S. Ren, K. He, Ross Girshick, Jian Sun


Cite Score: 97

AI summary

This paper introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals. The RPN is trained end-to-end and achieves state-of-the-art object detection accuracy on PASCAL VOC and MS COCO datasets with only 300 proposals per image.

Main Contributions

Introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals.
RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position.
RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection.
Achieves state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image.
Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks of ILSVRC and COCO 2015 competitions.
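
The "object bounds and objectness scores at each position" above are predicted relative to a fixed set of reference boxes (anchors) laid out at every feature-map location. The summary on this page does not give the anchor settings; the stride of 16 and the 3 scales x 3 aspect ratios used below are recalled from the full paper and should be treated as assumptions, and `make_anchors` is a hypothetical helper name. A minimal sketch of that layout:

```python
from itertools import product


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors, len(scales) * len(ratios) per position.

    stride, scales and ratios are assumed defaults, not values stated
    in the summary above.
    """
    anchors = []
    for row, col in product(range(feat_h), range(feat_w)):
        # Centre of this feature-map cell, in input-image pixels.
        cx = col * stride + stride / 2
        cy = row * stride + stride / 2
        for scale, ratio in product(scales, ratios):
            # ratio is height / width; the box area stays at scale ** 2.
            w = scale / ratio ** 0.5
            h = scale * ratio ** 0.5
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

For each anchor, the RPN would then output one objectness score and four box-refinement offsets; those network outputs are not modelled here.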

Abstract

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features. Using the recently popular terminology of neural networks with "attention" mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
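
The abstract's "only 300 proposals per image" refers to the small set of scored boxes handed from the RPN to Fast R-CNN. One common way to obtain such a set is to rank boxes by objectness, suppress near-duplicates with greedy non-maximum suppression (NMS), and keep the top N. The abstract does not describe this step, so the sketch below is an illustration under assumptions: the IoU threshold of 0.7 is recalled from the full paper, and `iou` and `select_proposals` are hypothetical helper names.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_proposals(boxes, scores, top_n=300, iou_threshold=0.7):
    """Greedy NMS; returns indices of at most top_n surviving boxes.

    top_n matches the 300 proposals mentioned in the abstract;
    iou_threshold is an assumed value.
    """
    # Visit boxes from highest to lowest objectness score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if len(keep) == top_n:
                break
    return keep
```

The quadratic loop is acceptable for a sketch; a practical pipeline would more likely use a vectorised or GPU implementation.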

References [39]

[1]Deep Residual Learning for Image Recognition

K. He, X. Zhang, S. Ren, Jian Sun - 2016

20 papers in library cite

This is simply amazing. Very very simple idea, totally revolutionary. No maths, just "it works!". Amazing.

[2]Very Deep Convolutional Networks for Large-Scale Image Recognition

K. Simonyan, Andrew Zisserman - 2014

20 papers in library cite

This is very good! The great thing here is small filters and depth analysis, but truly they do some other stuff as well: SotA, generalization for other tasks, and open source their models. Very nice.

[3]ImageNet Classification With Deep Convolutional Neural Networks

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton - 2012

71 papers in library cite

I'm giving this a 5 just because of the impact, but this is VEEERY derivative of earlier work. Kudos to them for putting it all together, but really there's nothing revolutionary here.

[4]Going Deeper With Convolutions

Christian Szegedy, Wei Liu, Y. Jia, P. Sermanet, S. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich - 2015

20 papers in library cite

Introduced the inception algorithm, which is nice. The paper is quite good, but I had to google some stuff to understand it fully. Nice contribution and SotA, but TBH I felt that it wasn't toooo good of a read.

[5]Microsoft COCO: Common Objects in Context

T. Y. Lin, M. Maire, S. Belongie, James Hays, Pietro Perona, D. Ramanan, Piotr Dollar, C. L. Zitnick - 2014

14 papers in library cite

I liked this paper a lot. It's a bit long and I was already a bit tired, but it was nice overall.

[6]Fully Convolutional Networks for Semantic Segmentation

J. Long, E. Shelhamer, Trevor Darrell - 2015

7 papers in library cite

I didn't really like the way the paper is written and the results seem a bit underwhelming. However, it's nice that they do it in a fully convolutional way and that they increase the speed.

[7]Faster R-CNN: Towards Real-Time Object Detection With Region Proposal Networks

S. Ren, K. He, Ross Girshick, Jian Sun - 2016

2 papers in library cite

Now, this is the pinnacle of object detection so far! Very nice to see the progress from Deep CNNs -> R-CNN -> Fast R-CNN -> Faster R-CNN. Very nice improvements on detection and speed, and the paper is incredibly well written. The abstract is probably the best I've read in a long time.

[8]Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Ross Girshick, J. Donahue, Trevor Darrell, Jitendra Malik - 2014

18 papers in library cite

Good results, beat OverFeat, used pretraining for improving performance. Only issue is that the paper is overly long...

[9]Fast R-CNN

Ross Girshick - 2015

2 papers in library cite

Very nice improvement over R-CNN! I like that they start pushing towards having an end-to-end framework, but this still falls short for using a separate region proposal module.

[10]Rectified Linear Units Improve Restricted Boltzmann Machines

V. Nair, Geoffrey E. Hinton - 2010

18 papers in library cite

I hate it when people introduce a new idea but don't care to explain it! This is terrible compared to Bengio's paper.

[11]Visualizing and Understanding Convolutional Networks

Matthew D. Zeiler, Rob Fergus - 2014

15 papers in library cite

Very good explanation and visualization of CNNs, and also nice that they use their findings to improve the performance. The ablation study is also nice.

[12]Backpropagation Applied to Handwritten Zip-Code Recognition

Yann Lecun, B. Boser, John S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel - 1989

24 papers in library cite

The first convolutional NN! Very simple concept and very simply explained. Very good results and overall a good read.

[13]Caffe: Convolutional Architecture for Fast Feature Embedding

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, Ross Girshick, S. Guadarrama, Trevor Darrell - 2014

12 papers in library cite

Nothing new really, but worth the read. It's nice because it's the precursor to current AI frameworks + has a Python interface. Also good that model representation is separate from implementation.

[14]Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

K. He, X. Zhang, S. Ren, Jian Sun - 2014

6 papers in library cite

Very simple, general and effective method. The paper ends at page ~4 TBH, the rest is just results and gets boring. Good contribution though.

[15]OverFeat: Integrated Recognition, Localization and Detection Using Convolutional Networks

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, Rob Fergus, Yann Lecun - 2014

16 papers in library cite

Very convoluted method, was SotA for only a bit of time, and the paper is very boring.

[16]ImageNet Large Scale Visual Recognition Challenge

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Zhiheng Huang, A. Karpathy, A. Khosla, M. Bernstein - 2014

18 papers in library cite

ImageNet dataset challenge paper

[17]Edge Boxes: Locating Object Proposals From Edges

C. L. Zitnick, Piotr Dollar - 2014

2 papers in library cite

A way of detecting objects that brings speedups

[18]Object Detection With Discriminatively Trained Part-Based Models

P. F. Felzenszwalb, Ross Girshick, D. Mcallester, D. Ramanan - 2010

8 papers in library cite

[19]The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results

Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, Andrew Zisserman - 2007

7 papers in library cite

[20]Selective Search for Object Recognition

J. Uijlings, K. V. D. Sande, T. Gevers, A. Smeulders - 2013

6 papers in library cite

[21]Deep Neural Networks for Object Detection

Christian Szegedy, A. Toshev, Dumitru Erhan - 2013

4 papers in library cite

[22]Scalable Object Detection Using Deep Neural Networks

Dumitru Erhan, Christian Szegedy, A. Toshev, Dragomir Anguelov - 2014

4 papers in library cite

[23]Attention-Based Models for Speech Recognition

J. Chorowski, D. Bahdanau, D. Serdyuk, Kyunghyun Cho, Yoshua Bengio - 2015

3 papers in library cite

[24]Convolutional Feature Masking for Joint Object and Stuff Segmentation

Jifeng Dai, K. He, Jian Sun - 2015

2 papers in library cite

[25]CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts

J. Carreira, C. Sminchisescu - 2012

2 papers in library cite

[26]Measuring the Objectness of Image Windows

B. Alexe, T. Deselaers, V. Ferrari - 2012

2 papers in library cite

[27]Multiscale Combinatorial Grouping

P. Arbelaez, J. P. Tuset, J. Barron, F. Marques, Jitendra Malik - 2014

2 papers in library cite

[28]Object Detection Networks on Convolutional Feature Maps

S. Ren, K. He, Ross Girshick, X. Zhang, Jian Sun - 2015

2 papers in library cite

[29]What Makes for Effective Detection Proposals?

J. H. Hosang, R. Benenson, Piotr Dollar, B. Schiele - 2015

2 papers in library cite

[30]Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images

S. Song, Jianxiong Xiao - 2015

1 paper in library cites

[31]DeePM: A Deep Part-Based Model for Object Detection and Semantic Part Localization

Jun Zhu, X. Chen, A. Yuille - 2015

1 paper in library cites

[32]DenseCap: Fully Convolutional Localization Networks for Dense Captioning

J. Johnson, A. Karpathy, Li Fei-Fei - 2015

1 paper in library cites

[33]How Good Are Detection Proposals, Really?

J. H. Hosang, R. Benenson, B. Schiele - 2014

1 paper in library cites

[34]Human Curation and Convnets: Powering Item-to-Item Recommendations on Pinterest

D. Kislyuk, Yibo Liu, D. Liu, E. Tzeng, Y. Jing - 2015

1 paper in library cites

[35]Instance-Aware Semantic Segmentation via Multi-Task Network Cascades

Jifeng Dai, K. He, Jian Sun - 2015

1 paper in library cites

[36]Learning to Segment Object Candidates

P. Pinheiro, Ronan Collobert, Piotr Dollar - 2015

1 paper in library cites

[37]Object-Proposal Evaluation Protocol Is 'Gameable'

N. Chavali, H. Agrawal, A. Mahendru, D. Batra - 2015

1 paper in library cites

[38]R-CNN Minus R

K. Lenc, A. Vedaldi - 2015

1 paper in library cites

[39]Scalable, High-Quality Object Detection

Christian Szegedy, S. Reed, Dumitru Erhan, Dragomir Anguelov - 2015

1 paper in library cites

Cited by 2 papers in your library

Cites 17 papers in your library

Read on October 20, 2025

Now, this is the pinnacle of object detection so far! Very nice to see the progress from Deep CNNs -> R-CNN -> Fast R-CNN -> Faster R-CNN. Very nice improvements on detection and speed, and the paper is incredibly well written. The abstract is probably the best I've read in a long time.
