Papperoni

2014

How Transferable Are Features in Deep Neural Networks?

Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson


citations

Cite Score: 87

AI summary

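This paper quantifies the generality versus specificity of neurons in each layer of a deep convolutional neural network, using an example network trained on ImageNet. It reports that transferability is hurt by two issues: the specialization of higher-layer neurons to their original task, and optimization difficulties caused by splitting the network between co-adapted neurons. It also finds that initializing a network with transferred features can improve generalization, even after fine-tuning to the target dataset.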

Main Contributions

Introduces a method to quantify the degree to which a particular layer is general or specific.
Experimentally shows two separate issues that cause performance degradation when using transferred features without fine-tuning.
Quantifies how the performance benefit of transferring features decreases as the base task and target task become more dissimilar.
Finds that initializing a network with transferred features can produce a boost to generalization performance after fine-tuning to a new dataset.
Demonstrates these effects on an example network trained on ImageNet.

Abstract

Many deep neural networks trained on natural images exhibit a curious phenomenon in common: on the first layer they learn features similar to Gabor filters and color blobs. Such first-layer features appear not to be specific to a particular dataset or task, but general in that they are applicable to many datasets and tasks. Features must eventually transition from general to specific by the last layer of the network, but this transition has not been studied extensively. In this paper we experimentally quantify the generality versus specificity of neurons in each layer of a deep convolutional neural network and report a few surprising results. Transferability is negatively affected by two distinct issues: (1) the specialization of higher layer neurons to their original task at the expense of performance on the target task, which was expected, and (2) optimization difficulties related to splitting networks between co-adapted neurons, which was not expected. In an example network trained on ImageNet, we demonstrate that either of these two issues may dominate, depending on whether features are transferred from the bottom, middle, or top of the network. We also document that the transferability of features decreases as the distance between the base task and target task increases, but that transferring features even from distant tasks can be better than using random features. A final surprising result is that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset.
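The transfer setup the abstract describes (copy the bottom layers of a base network, re-initialize the rest, then either freeze or fine-tune the copied layers) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `Layer` class, the `transfer_first_n` helper, the flat weight lists, and the choice of 8 layers with 4 weights each are all assumptions made for the example, and no training is performed.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """Hypothetical stand-in for one network layer: flat weights plus a trainable flag."""
    weights: List[float]
    trainable: bool = True


def random_layer(size: int, rng: random.Random) -> Layer:
    """Create a randomly initialized, trainable layer."""
    return Layer([rng.gauss(0.0, 0.01) for _ in range(size)])


def transfer_first_n(base: List[Layer], n: int, fine_tune: bool,
                     rng: random.Random) -> List[Layer]:
    """Build a target network from a base network.

    The first n layers are copied from `base`; they stay frozen unless
    `fine_tune` is True. The remaining layers are re-initialized at random
    and are always trainable.
    """
    if not 0 <= n <= len(base):
        raise ValueError("n must be between 0 and the number of layers")
    target = []
    for i, layer in enumerate(base):
        if i < n:
            # Copy the weights so later training cannot modify the base network.
            target.append(Layer(list(layer.weights), trainable=fine_tune))
        else:
            target.append(random_layer(len(layer.weights), rng))
    return target


rng = random.Random(0)
base_net = [random_layer(4, rng) for _ in range(8)]  # illustrative sizes only

# Transfer the bottom three layers, once frozen and once fine-tuned.
frozen = transfer_first_n(base_net, 3, fine_tune=False, rng=rng)
tuned = transfer_first_n(base_net, 3, fine_tune=True, rng=rng)

print([layer.trainable for layer in frozen])
print([layer.trainable for layer in tuned])
```

Sweeping `n` from the bottom to the top of the network, and comparing the frozen and fine-tuned variants after training on the target task, is the kind of comparison that would separate the two issues the abstract names: specialization of higher layers and optimization difficulties from splitting co-adapted layers.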

References [15]

[1]ImageNet Classification With Deep Convolutional Neural Networks

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton - 2012

71 papers in library cite

I'm giving this a 5 just because of the impact, but this is VEEERY derivative of earlier work. Kudos to them for putting it all together, but really there's nothing revolutionary here.

[2]ImageNet: A Large-Scale Hierarchical Image Database

J. Deng, W. Dong, Richard Socher, L. J. Li, K. Li, Li Fei Fei - 2009

28 papers in library cite

Very nice idea and huge impact!

[3]Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Ross Girshick, J. Donahue, Trevor Darrell, Jitendra Malik - 2014

18 papers in library cite

Good results, beat OverFeat, and used pretraining to improve performance. Only issue is that the paper is overly long...

[4]Visualizing and Understanding Convolutional Networks

Matthew D. Zeiler, Rob Fergus - 2014

15 papers in library cite

Very good explanation and visualization of CNNs, and also nice that they use their findings to improve the performance. The ablation study is also nice.

[5]Caffe: Convolutional Architecture for Fast Feature Embedding

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, Ross Girshick, S. Guadarrama, Trevor Darrell - 2014

12 papers in library cite

Nothing new really, but worth the read. It's nice because it's the precursor to current AI frameworks and has a Python interface. Also good that model representation is separate from implementation.

[6]Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors

Geoffrey E. Hinton, N. Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov - 2012

25 papers in library cite

Dropout, super impactful. The idea that you are training many estimators at once is also very nice.

[7]Learning Generative Visual Models From Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories

Li Fei Fei, Rob Fergus, Pietro Perona - 2004

15 papers in library cite

I think most people cite this thinking this is where the Caltech 101 dataset comes from (it's not). Anyway, it's just an extension of the other dataset and it's very mathy, not NNs, and uninteresting.

[8]DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

J. Donahue, Y. Jia, Oriol Vinyals, J. Hoffman, N. Zhang, E. Tzeng, Trevor Darrell - 2014

15 papers in library cite

Very nice paper. First paper I've seen (and, based on the text, the first ever) on feature extraction for images. It's very nice to see embeddings reaching SotA.

[9]What Is the Best Multi-Stage Architecture for Object Recognition?

K. Jarrett, Koray Kavukcuoglu, Marc'aurelio Ranzato, Yann Lecun - 2009

20 papers in library cite

So boring and convoluted... They try to make things more abstract by saying "filter banks" and "2 stages", but in the end it's CNNs. It seems like they are trying to tie NNs to the rest of ML, but it doesn't work.

[10]Learning Many Related Tasks at the Same Time With Backpropagation

Rich Caruana - 1995

3 papers in library cite

Very nice read, fast and simple. Very good and intuitive explanation as well. However, it doesn't seem too impactful overall (the other multitask learning paper seems more relevant, but also more boring).

[11]OverFeat: Integrated Recognition, Localization and Detection Using Convolutional Networks

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, Rob Fergus, Yann Lecun - 2014

16 papers in library cite

Very convoluted method, was SotA for only a bit of time, and the paper is very boring.

[12]Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations

Honglak Lee, R. Grosse, R. Ranganath, Andrew Y. Ng - 2009

12 papers in library cite

[13]ICA With Reconstruction Cost for Efficient Over-Complete Feature Learning

Quoc Le, A. Karpenko, J. Ngiam, A. Ng - 2011

4 papers in library cite

[14]Deep Learning of Representations for Unsupervised and Transfer Learning

Yoshua Bengio - 2011

2 papers in library cite

[15]Deep Learners Benefit More From Out-of-Distribution Examples

Yoshua Bengio, F. Bastien, A. Bergeron, N. B. Lewandowski, T. Breuel, Y. Chherawala, M. Cisse, M. Cote, Dumitru Erhan, J. Eustache, Xavier Glorot, X. Muller, S. P. Lebeuf, Razvan Pascanu, S. Rifai, F. Savard, G. Sicard - 2011

1 paper in library cites

Cited by 2 papers in your library

Cites 12 papers in your library

Read on October 24, 2025

Such a good paper to analyze pretraining behavior: simple, intuitive, nice to read
