Papperoni

2014

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

J. Donahue, Y. Jia, Oriol Vinyals, J. Hoffman, N. Zhang, E. Tzeng, Trevor Darrell

Cite Score: 77

AI summary

This paper introduces DeCAF, a deep convolutional activation feature, achieving state-of-the-art results on several important vision challenges including scene recognition, domain adaptation, and fine-grained recognition. The method uses a convolutional network trained on ImageNet and releases an open-source implementation.

Main Contributions

Introduces DeCAF: a generic visual feature based on a convolutional network trained on ImageNet.
Demonstrates that convolutional features cluster semantic topics more readily than conventional features.
Achieves state-of-the-art results on Caltech-101, the Office domain adaptation dataset, the Caltech-UCSD Birds fine-grained recognition dataset, and the SUN-397 scene recognition database.
Releases an open-source implementation of DeCAF, along with all associated network parameters.

Abstract

We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. Our generic tasks may differ significantly from the originally trained tasks and there may be insufficient labeled or unlabeled data to conventionally train or adapt a deep architecture to the new tasks. We investigate and visualize the semantic clustering of deep convolutional features with respect to a variety of such tasks, including scene recognition, domain adaptation, and fine-grained recognition challenges. We compare the efficacy of relying on various network levels to define a fixed feature, and report novel results that significantly outperform the state-of-the-art on several important vision challenges. We are releasing DeCAF, an open-source implementation of these deep convolutional activation features, along with all associated network parameters to enable vision researchers to be able to conduct experimentation with deep representations across a range of visual concept learning paradigms.
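
The fixed-feature idea described in the abstract can be sketched as follows. This is a toy illustration under loud assumptions, not the DeCAF release: `frozen_features` is a hypothetical stand-in (a random linear map plus ReLU) for activations taken from a layer of the ImageNet-trained network, the data are synthetic, and the ridge-regression classifier is just one simple choice of classifier to fit on top of fixed features.

```python
import numpy as np


def frozen_features(images, weights):
    """Hypothetical stand-in for one frozen network layer: linear map + ReLU.

    In the paper's setting these would be activations of a pretrained
    convolutional network; here the weights are random and never updated.
    """
    flat = images.reshape(len(images), -1)
    return np.maximum(flat @ weights, 0.0)


def add_bias(features):
    """Append a constant column so the linear classifier has an intercept."""
    return np.hstack([features, np.ones((len(features), 1))])


def train_linear_classifier(features, labels, num_classes, reg=1e-2):
    """Ridge regression onto one-hot targets, using the fixed features only."""
    x = add_bias(features)
    targets = np.eye(num_classes)[labels]
    gram = x.T @ x + reg * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ targets)


def predict(features, classifier):
    """Pick the class with the largest linear score."""
    return np.argmax(add_bias(features) @ classifier, axis=1)


rng = np.random.default_rng(0)

# Synthetic "new task": two classes of 8x8 images with different mean intensity.
num_per_class = 50
images = np.concatenate([
    rng.normal(0.0, 1.0, size=(num_per_class, 8, 8)),
    rng.normal(1.0, 1.0, size=(num_per_class, 8, 8)),
])
labels = np.array([0] * num_per_class + [1] * num_per_class)

# Assumed frozen weights (random here, pretrained in the real setting).
weights = rng.normal(size=(64, 32)) / 8.0

feats = frozen_features(images, weights)
clf = train_linear_classifier(feats, labels, num_classes=2)
train_acc = float(np.mean(predict(feats, clf) == labels))
print(f"training accuracy on fixed features: {train_acc:.2f}")
```

Only the small classifier is fit to the new task; the feature extractor stays untouched, which is what makes the approach usable when labeled data for the new task is scarce.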

References [46]

[1]ImageNet Classification With Deep Convolutional Neural Networks

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton - 2012

71 papers in library cite

I'm giving this a 5 just because of the impact, but this is VEEERY derivative of earlier work. Kudos to them for putting it all together, but really there's nothing revolutionary here.

[2]ImageNet: A Large-Scale Hierarchical Image Database

J. Deng, W. Dong, Richard Socher, L.-J. Li, K. Li, Li Fei-Fei - 2009

28 papers in library cite

Very nice idea and huge impact!

[3]Gradient-Based Learning Applied to Document Recognition

Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner - 1998

62 papers in library cite

I absolutely hated this paper. It has ~50 pages but feels like 200. It takes too long to explain some things and often just repeats itself. It also doesn't seem to add much on top of LeNet-5, and it focuses a lot on GTNs, which really didn't stick.

[4]Visualizing Data Using t-SNE

Laurens van der Maaten, Geoffrey Hinton - 2008

7 papers in library cite

Amazing. Simple. Impactful. Easy to understand. Masterpiece.

[5]Reducing the Dimensionality of Data With Neural Networks

Geoffrey Hinton, Ruslan Salakhutdinov - 2006

37 papers in library cite

I didn't like the way this is written, very hard to understand without a ton of background knowledge. But hey, it's the first deep learning model!

[6]Backpropagation Applied to Handwritten Zip-Code Recognition

Yann LeCun, B. Boser, John S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel - 1989

24 papers in library cite

The first convolutional NN! Very simple concept and very simply explained. Very good results and overall a good read.

[7]Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors

Geoffrey E. Hinton, N. Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov - 2012

25 papers in library cite

Dropout, super impactful. The idea that you are training many estimators at once is also very nice.

[8]Multitask Learning

Rich Caruana - 1997

13 papers in library cite

I expected waaaaaay more from this paper. The idea is sooooo simple and the results are underwhelming. Also, 30 pages for something that could be said in 10. The writing style is a bit boring. TBH it seems like it's just a re-writing of Caruana's PhD thesis.

[9]Learning Generative Visual Models From Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories

Li Fei-Fei, Rob Fergus, Pietro Perona - 2004

15 papers in library cite

I think most people cite this thinking this is where the Caltech 101 dataset comes from (it's not). Anyway, it's just an extension of the other dataset and it's very mathy, not NNs, and uninteresting.

[10]SUN Database: Large-Scale Scene Recognition From Abbey to Zoo

Jianxiong Xiao, James Hays, K. Ehinger, Aude Oliva, Antonio Torralba - 2010

2 papers in library cite

I like the dataset and the approach. Solid methodology also. However, I don't really think "scene within a scene" is good for practical use. Also not too sure about impact.

[11]What Is the Best Multi-Stage Architecture for Object Recognition?

K. Jarrett, Koray Kavukcuoglu, Marc'Aurelio Ranzato, Yann LeCun - 2009

20 papers in library cite

So boring and convoluted... They try to make things more abstract by saying "filter banks" and "2 stages", but in the end it's CNNs. It seems like they are trying to tie NNs to the rest of ML, but it doesn't work.

[12]Building High-Level Features Using Large Scale Unsupervised Learning

Quoc V. Le, M. A. Ranzato, R. Monga, M. Devin, K. Chen, G. S. Corrado, Jeffrey Dean, Andrew Y. Ng - 2012

10 papers in library cite

Very nice and very early work - seems very simple but very insightful to use an autoencoder to detect objects. Also, very similar to the neocognitron :)

[13]Self-Taught Learning: Transfer Learning From Unlabeled Data

Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, A. Ng - 2007

7 papers in library cite

It's nice to make the distinction of self-taught learning vs. semi-supervised learning. However, it doesn't talk about NNs. Overall, LeCun's CNN unsupervised learning was better.

[14]A Framework for Learning Predictive Structures From Multiple Tasks and Unlabeled Data

Rie Kubota Ando, Tong Zhang - 2005

10 papers in library cite

Very nice and clever way of solving the problem of semi-supervised learning, and makes a lot of sense. I give them more credit for formalizing the concept. The methodology is a bit boring.

[15]Is Learning the N-Th Thing Any Easier Than Learning the First?

Sebastian Thrun - 1996

3 papers in library cite

It's one of those early NN papers that only have toy examples and some very non-standard terminology. TBH doesn't add much.

[16]ImageNet Large Scale Visual Recognition Challenge 2012

A. Berg, J. Deng, Li Fei-Fei - 2012

1 paper in library cites

ImageNet dataset challenge paper.

[17]Unsupervised and Transfer Learning Challenge: A Deep Learning Approach

G. Mesnil, Yann Dauphin, Xavier Glorot, S. Rifai, Yoshua Bengio, I. Goodfellow, E. Lavoie, X. Muller, G. Desjardins, D. Warde-Farley, Pascal Vincent, Aaron Courville, J. Bergstra - 2012

2 papers in library cite

Transfer learning with DNN

[18]Histograms of Oriented Gradients for Human Detection

N. Dalal, B. Triggs - 2005

12 papers in library cite

[19]Object Detection With Discriminatively Trained Part-Based Models

P. F. Felzenszwalb, Ross Girshick, D. McAllester, D. Ramanan - 2010

8 papers in library cite

[20]Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope

Aude Oliva, Antonio Torralba - 2001

7 papers in library cite

[21]Unbiased Look at Dataset Bias

Antonio Torralba, A. Efros - 2011

5 papers in library cite

[22]Learning Hierarchical Invariant Spatio-Temporal Features for Action Recognition With Independent Subspace Analysis

Quoc Le, W. Zou, S. Y. Yeung, A. Ng - 2011

4 papers in library cite

[23]Multi-Task Feature Learning

A. Argyriou, T. Evgeniou, M. Pontil - 2006

3 papers in library cite

[24]Adapting Visual Category Models to New Domains

K. Saenko, B. Kulis, M. Fritz, Trevor Darrell - 2010

2 papers in library cite

[25]Caltech-UCSD Birds 200

Peter Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, Pietro Perona - 2010

2 papers in library cite

[26]Frustratingly Easy Domain Adaptation

Hal Daumé III - 2007

2 papers in library cite

[27]Histograms of Sparse Codes for Object Detection

Xiaofeng Ren, D. Ramanan - 2013

2 papers in library cite

[28]Locality-Constrained Linear Coding for Image Classification

J. Wang, Jianchao Yang, K. Yu, F. Lv, T. Huang, Y. Gong - 2010

2 papers in library cite

[29]Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts

Sanja Fidler, A. Leonardis - 2007

2 papers in library cite

[30]Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction

N. Zhang, R. Farrell, F. Iandola, Trevor Darrell - 2013

1 paper in library cites

[31]Describing People: A Poselet-Based Approach to Attribute Classification

L. Bourdev, S. Maji, Jitendra Malik - 2011

1 paper in library cites

[32]DLID: Deep Learning for Domain Adaptation by Interpolating Between Domains

S. Chopra, S. Balakrishnan, R. Gopalan - 2013

1 paper in library cites

[33]Efficient Learning of Domain-Invariant Image Representations

J. Hoffman, E. Rodner, J. Donahue, K. Saenko, Trevor Darrell - 2013

1 paper in library cites

[34]Efficient Object Category Recognition Using Classemes

L. Torresani, M. Szummer, A. Fitzgibbon - 2010

1 paper in library cites

[35]Geodesic Flow Kernel for Unsupervised Domain Adaptation

B. Gong, Yuan Shi, F. Sha, Kristen Grauman - 2012

1 paper in library cites

[36]Group-Sensitive Multiple Kernel Learning for Object Categorization

Jingjing Yang, Yuanning Li, Yonghong Tian, L. Duan, W. Gao - 2009

1 paper in library cites

[37]Kernel Descriptors for Visual Recognition

L. Bo, Xiaofeng Ren, D. Fox - 2010

1 paper in library cites

[38]LSCOM Lexicon Definitions and Annotations (Version 1.0)

L. Kennedy, A. Hauptmann - 2006

1 paper in library cites

[39]Multi-Label Prediction via Compressed Sensing

D. Hsu, S. Kakade, John Langford, Tong Zhang - 2009

1 paper in library cites

[40]Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification

Li-Jia Li, H. Su, Li Fei-Fei, E. Xing - 2010

1 paper in library cites

[41]POOF: Part-Based One-vs-One Features for Fine-Grained Categorization, Face Verification, and Attribute Estimation

T. Berg, P. Belhumeur - 2013

1 paper in library cites

[42]SURF: Speeded Up Robust Features

H. Bay, T. Tuytelaars, Luc Van Gool - 2006

1 paper in library cites

[43]Transfer Learning for Image Classification With Sparse Prototype Representations

A. Quattoni, Michael Collins, Trevor Darrell - 2008

1 paper in library cites

[44]Unsupervised Discovery of Mid-Level Discriminative Patches

Saurabh Singh, Abhinav Gupta, A. Efros - 2012

1 paper in library cites

[45]Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing

L. Zhu, Yuanhao Chen, A. Yuille - 2007

1 paper in library cites

[46]What You Saw Is Not What You Get: Domain Adaptation Using Asymmetric Kernel Transforms

B. Kulis, K. Saenko, Trevor Darrell - 2011

1 paper in library cites

Cited by 15 papers in your library

Cites 17 papers in your library

Read on August 2, 2025

Very nice paper. First one I've seen (and, based on the text, the first ever) about feature extraction for images. It's very nice to see embeddings achieving SotA results.
