2014

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

K. He, X. Zhang, S. Ren, Jian Sun

Cite Score: 90

AI summary

This paper introduces spatial pyramid pooling (SPP) and the resulting network structure, SPP-net, which removes the fixed-size input constraint of CNNs. SPP-net improves the accuracy of several CNN architectures on ImageNet 2012, achieves state-of-the-art classification results on Pascal VOC 2007 and Caltech101, and makes object detection substantially faster than R-CNN.

Main Contributions

  • Introduces a spatial pyramid pooling (SPP) layer to remove the fixed-size constraint of CNNs.
  • Demonstrates that SPP-net can generate a fixed-length representation regardless of image size/scale.
  • Shows that SPP-net boosts the accuracy of various CNN architectures on ImageNet 2012.
  • Achieves state-of-the-art classification results on Pascal VOC 2007 and Caltech101 using a single full-image representation and no fine-tuning.
  • Presents an object detection method that computes convolutional feature maps once per image and pools features from arbitrary windows; at test time it is 24-102x faster than R-CNN while achieving better or comparable accuracy on Pascal VOC 2007.
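The fixed-length property in the contributions above comes from pooling over a fixed number of bins per pyramid level instead of a fixed window size. The following is a minimal pure-Python sketch, not the authors' implementation. It assumes a feature map stored as nested lists indexed [channel][row][col], max pooling inside each bin, and pyramid levels (1, 2, 4) chosen only for illustration. The bin edges use the adaptive floor/ceil rule known from adaptive max pooling, which may differ from the window/stride scheme used in the paper. The function name `spatial_pyramid_max_pool` is hypothetical.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def spatial_pyramid_max_pool(fmap: FeatureMap, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool every channel over an n x n grid of bins for each level n and concatenate.

    The output length is channels * sum(n * n for n in levels), independent of the
    spatial size of fmap. Levels (1, 2, 4) are an illustrative assumption.
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    out: List[float] = []
    for n in levels:
        for i in range(n):
            # Row range of bin i: adaptive edges, so bins are never empty and cover the map.
            r0, r1 = (i * height) // n, _ceil_div((i + 1) * height, n)
            for j in range(n):
                c0, c1 = (j * width) // n, _ceil_div((j + 1) * width, n)
                for grid in fmap:  # one 2-D grid per channel
                    out.append(max(grid[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out


# Two hypothetical 3-channel feature maps with different spatial sizes (4 x 5 and 7 x 9).
small = [[[float(r * 5 + c) for c in range(5)] for r in range(4)] for _ in range(3)]
large = [[[float(r * 9 + c) for c in range(9)] for r in range(7)] for _ in range(3)]
print(len(spatial_pyramid_max_pool(small)), len(spatial_pyramid_max_pool(large)))  # 63 63
```

Both maps yield 3 x (1 + 4 + 16) = 63 values, so a fully connected layer placed after this step always sees the same input size.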

Abstract

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224x224) input image. This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, "spatial pyramid pooling", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102x faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition.
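The detection speedup described in the abstract comes from running the convolutional layers once per image and then pooling each candidate window from the shared feature map. The sketch below illustrates that idea under simplifying assumptions: windows are (x_min, y_min, x_max, y_max) in image pixels, the feature map has a hypothetical total stride of 16, and the image-to-feature-map projection is a plain floor/ceil division rather than the paper's exact corner-mapping rule. The names `project_window` and `spp_region`, the pyramid levels (1, 2, 4), and the toy feature values are all hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def project_window(window: Box, stride: int, fh: int, fw: int) -> Box:
    """Map an image window (x_min, y_min, x_max, y_max) to feature-map cells (y0, x0, y1, x1).

    Assumption: simple floor/ceil division by the total stride, clamped so the
    region stays inside the map and covers at least one cell.
    """
    x_min, y_min, x_max, y_max = window
    x0 = min(x_min // stride, fw - 1)
    y0 = min(y_min // stride, fh - 1)
    x1 = max(min(_ceil_div(x_max, stride), fw), x0 + 1)
    y1 = max(min(_ceil_div(y_max, stride), fh), y0 + 1)
    return y0, x0, y1, x1


def spp_region(fmap: List[List[List[float]]], region: Box,
               levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool the cells [y0, y1) x [x0, x1) of every channel into n x n bins per level."""
    y0, x0, y1, x1 = region
    h, w = y1 - y0, x1 - x0
    out: List[float] = []
    for n in levels:
        for i in range(n):
            r0, r1 = y0 + (i * h) // n, y0 + _ceil_div((i + 1) * h, n)
            for j in range(n):
                c0, c1 = x0 + (j * w) // n, x0 + _ceil_div((j + 1) * w, n)
                for grid in fmap:  # one grid per channel
                    out.append(max(grid[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out


# Stand-in for conv features computed ONCE for a 320 x 224 image:
# 2 channels, 14 rows x 20 columns at the assumed stride of 16.
fmap = [[[float((k + 1) * (r + c)) for c in range(20)] for r in range(14)] for k in range(2)]
windows = [(0, 0, 320, 224), (32, 48, 100, 130), (300, 200, 310, 220)]
features = [spp_region(fmap, project_window(win, 16, 14, 20)) for win in windows]
print([len(f) for f in features])  # [42, 42, 42]
```

Each window costs only the projection and pooling steps because the convolutional features are shared; R-CNN, by contrast, warps every region and runs the network on it separately.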

References [40]

K. Simonyan, Andrew Zisserman - 2014

20 papers in library cite

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton - 2012

71 papers in library cite

J. Deng, W. Dong, Richard Socher, L.-J. Li, K. Li, Li Fei-Fei - 2009

28 papers in library cite

Christian Szegedy, Wei Liu, Y. Jia, P. Sermanet, S. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich - 2015

20 papers in library cite

Ross Girshick, J. Donahue, Trevor Darrell, Jitendra Malik - 2014

18 papers in library cite

Matthew D. Zeiler, Rob Fergus - 2014

15 papers in library cite

Yann LeCun, B. Boser, John S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel - 1989

24 papers in library cite

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, Ross Girshick, S. Guadarrama, Trevor Darrell - 2014

12 papers in library cite

Svetlana Lazebnik, Cordelia Schmid, Jean Ponce - 2006

14 papers in library cite

M. Lin, Qiang Chen, Shuicheng Yan - 2013

11 papers in library cite

Y. Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf - 2014

5 papers in library cite

Josef Sivic, Andrew Zisserman - 2003

5 papers in library cite

Li Fei-Fei, Rob Fergus, Pietro Perona - 2004

15 papers in library cite

J. Donahue, Y. Jia, Oriol Vinyals, J. Hoffman, N. Zhang, E. Tzeng, Trevor Darrell - 2014

15 papers in library cite

Kristen Grauman, Trevor Darrell - 2005

4 papers in library cite

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, Rob Fergus, Yann Lecun - 2014

16 papers in library cite

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Zhiheng Huang, A. Karpathy, A. Khosla, M. Bernstein - 2014

18 papers in library cite

A. Razavian, H. Azizpour, J. Sullivan, S. Carlsson - 2014

6 papers in library cite

K. Chatfield, K. Simonyan, A. Vedaldi, Andrew Zisserman - 2014

5 papers in library cite

C. L. Zitnick, Piotr Dollár - 2014

2 papers in library cite

A. G. Howard - 2013

4 papers in library cite

N. Dalal, B. Triggs - 2005

12 papers in library cite

D. Lowe - 2004

9 papers in library cite

Jianchao Yang, K. Yu, Y. Gong, T. Huang - 2009

8 papers in library cite

P. F. Felzenszwalb, Ross Girshick, D. Mcallester, D. Ramanan - 2010

8 papers in library cite

Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, Andrew Zisserman - 2007

7 papers in library cite

A. Coates, Andrew Y. Ng - 2011

5 papers in library cite

Christian Szegedy, A. Toshev, Dumitru Erhan - 2013

4 papers in library cite

Maxime Oquab, Léon Bottou, I. Laptev, Josef Sivic - 2014

4 papers in library cite

C. C. Chang, C. J. Lin - 2001

4 papers in library cite

K. E. A. van de Sande, J. R. R. Uijlings, T. Gevers, A. W. M. Smeulders - 2011

3 papers in library cite

Hervé Jégou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, Cordelia Schmid - 2012

2 papers in library cite

F. Perronnin, J. Sanchez, T. Mensink - 2010

2 papers in library cite

J. Wang, Jianchao Yang, K. Yu, F. Lv, T. Huang, Y. Gong - 2010

2 papers in library cite

N. Zhang, M. Paluri, Marc'aurelio Ranzato, Trevor Darrell, L. Bourdev - 2014

2 papers in library cite

Xiaoyu Wang, Ming Yang, S. Zhu, Yuanqing Lin - 2013

2 papers in library cite

K. Chatfield, Victor Lempitsky, A. Vedaldi, Andrew Zisserman - 2011

2 papers in library cite

W. Y. Zou, Xiaoyu Wang, Miao Sun, Yuanqing Lin - 2014

1 paper in library cites

J. C. van Gemert, J. M. Geusebroek, C. J. Veenman, A. W. M. Smeulders - 2008

1 paper in library cites

Y. Gong, Liwei Wang, R. Guo, Svetlana Lazebnik - 2014

1 paper in library cites

Cited by: 6 papers in your library

Cites: 21 papers in your library

Read on August 16, 2025
