Papperoni

2012

Building High-Level Features Using Large Scale Unsupervised Learning

Quoc V. Le, M. A. Ranzato, R. Monga, M. Devin, K. Chen, G. S. Corrado, Jeffrey Dean, Andrew Y. Ng

Cite Score: 64

AI summary

This paper trains a 9-layer locally connected sparse autoencoder on 10 million unlabeled images to learn high-level features. It shows that class-specific feature detectors, such as a face detector, can be learned without labeled data. Starting from these features, a network trained on ImageNet reaches 15.8% accuracy over 22,000 categories, a 70% relative improvement over the previous state of the art.
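The "70% relative improvement" is measured against the old accuracy, not in percentage points. Below is a minimal sketch that works backwards from the two quoted numbers to the baseline they imply. The roughly 9.3% result is derived arithmetic, not a figure quoted in this entry, and `relative_improvement` and `implied_baseline` are illustrative helper names.

```python
def relative_improvement(new_acc: float, old_acc: float) -> float:
    """Relative gain of new_acc over old_acc: (new - old) / old."""
    return (new_acc - old_acc) / old_acc


def implied_baseline(new_acc: float, rel_gain: float) -> float:
    """Invert the formula above: old = new / (1 + rel_gain)."""
    return new_acc / (1.0 + rel_gain)


old = implied_baseline(15.8, 0.70)
print(f"implied baseline: {old:.1f}%")            # implied baseline: 9.3%
print(f"absolute gain: {15.8 - old:.1f} points")  # absolute gain: 6.5 points
```

A 70% relative gain therefore corresponds to roughly 6.5 percentage points of absolute accuracy.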

Main Contributions

Demonstrates that it is possible to train a face detector without having to label images as containing a face or not.
Introduces a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization.
Uses a large dataset of 10 million 200x200 pixel images downloaded from the Internet.
Achieves 15.8% accuracy on 22,000 ImageNet object categories when trained starting from the learned features, a 70% relative improvement over the previous state of the art.
Shows that the learned feature detector is robust not only to translation but also to scaling and out-of-plane rotation.
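The architecture in the second bullet can be pictured as a stack of filter, pool, and normalize steps. Below is a minimal NumPy sketch of one such trio (forward pass only). It assumes a single-channel image and untied weights, with one filter per output location, which is what separates "locally connected" from convolutional. It also assumes L2 pooling and a simplified local contrast normalization with a uniform window. Stacking three copies of the trio would give nine layers. All sizes and helper names are illustrative assumptions rather than the paper's settings, and the sparse-autoencoder training that learns the filters is left out.

```python
import numpy as np


def locally_connected(x, weights, rf):
    """Untied filtering: each output location (i, j) has its own rf x rf filter.

    x: (H, W) image; weights: (H - rf + 1, W - rf + 1, rf, rf).
    """
    out_h, out_w = weights.shape[:2]
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + rf, j:j + rf] * weights[i, j])
    return out


def l2_pool(h, size):
    """L2 pooling: sqrt of the sum of squares over each overlapping size x size window."""
    out = np.empty((h.shape[0] - size + 1, h.shape[1] - size + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sqrt(np.sum(h[i:i + size, j:j + size] ** 2))
    return out


def local_contrast_norm(p, size, eps=1e-8):
    """Simplified LCN: subtract the local mean, divide by the local std (odd-sized uniform window)."""
    pad = size // 2
    padded = np.pad(p, pad, mode="reflect")
    out = np.empty_like(p, dtype=float)
    for i in range(p.shape[0]):
        for j in range(p.shape[1]):
            window = padded[i:i + size, j:j + size]
            out[i, j] = (p[i, j] - window.mean()) / max(window.std(), eps)
    return out


def stage(x, weights, rf, pool, lcn):
    """One filter -> pool -> normalize trio."""
    return local_contrast_norm(l2_pool(locally_connected(x, weights, rf), pool), lcn)


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))   # toy 16x16 single-channel "image"
rf, pool, lcn = 4, 3, 3             # illustrative sizes only
w = 0.1 * rng.standard_normal((16 - rf + 1, 16 - rf + 1, rf, rf))
y = stage(x, w, rf, pool, lcn)
print(y.shape)  # (11, 11)
```

Untying the weights multiplies the parameter count by the number of output locations. This is one reason a model of this kind can reach a billion connections.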

Abstract

We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (the model has 1 billion connections, the dataset has 10 million 200x200 pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting with these learned features, we trained our network to obtain 15.8% accuracy in recognizing 22,000 object categories from ImageNet, a leap of 70% relative improvement over the previous state-of-the-art.
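The "asynchronous SGD" in the abstract means that workers update shared parameters without waiting for each other, so some gradients are computed from slightly stale parameters. Below is a minimal single-process sketch of that idea. It assumes a toy linear-regression problem, uses threads to stand in for machines, and relies on a hypothetical `ParameterServer` class. It covers only the asynchronous data-parallel part. The model parallelism the abstract also mentions, where one model is split across machines, is not shown.

```python
import random
import threading


class ParameterServer:
    """Hypothetical shared store for the parameters [w, b]."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr
        self.lock = threading.Lock()

    def fetch(self):
        with self.lock:
            return list(self.params)

    def push(self, grads):
        # Apply this worker's gradient right away; no waiting for other workers.
        with self.lock:
            self.params = [p - self.lr * g for p, g in zip(self.params, grads)]


def worker(server, shard, steps, fetch_every, seed):
    rng = random.Random(seed)
    for step in range(steps):
        if step % fetch_every == 0:
            w, b = server.fetch()  # refresh the local copy; it goes stale in between
        x, y = rng.choice(shard)
        err = (w * x + b) - y      # loss is 0.5 * err**2
        server.push([err * x, err])  # gradient of that loss w.r.t. w and b


def train(num_workers=4, steps=3000, fetch_every=3, lr=0.02):
    rng = random.Random(0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    data = [(x, 3.0 * x + 2.0) for x in xs]                      # true w = 3, b = 2
    shards = [data[i::num_workers] for i in range(num_workers)]  # one shard per worker
    server = ParameterServer([0.0, 0.0], lr)
    threads = [
        threading.Thread(target=worker, args=(server, shard, steps, fetch_every, seed))
        for seed, shard in enumerate(shards)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.fetch()


w, b = train()
print(f"w = {w:.3f}, b = {b:.3f}")  # should land close to w = 3, b = 2
```

The lock only makes each individual update atomic. No barrier synchronizes the workers between steps, and each worker keeps using its local copy for `fetch_every` steps, so the staleness that asynchronous training has to tolerate shows up even in this toy version.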

References [40]

[1] ImageNet: A Large-Scale Hierarchical Image Database

J. Deng, W. Dong, Richard Socher, L. J. Li, K. Li, Li Fei-Fei - 2009

28 papers in library cite

Very nice idea and huge impact!

[2] Gradient-Based Learning Applied to Document Recognition

Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner - 1998

62 papers in library cite

I absolutely hated this paper. It has ~50 pages but feels like 200. It takes too long to explain some things and keeps repeating itself. It also doesn't seem to add much on top of LeNet-5. And it focuses a lot on graph transformer networks (GTNs), which never really caught on.

[3] Learning Multiple Layers of Features From Tiny Images

Alex Krizhevsky - 2009

27 papers in library cite

It's alright. It mainly focuses on RBMs and the features they learn, and the part that actually describes the dataset is about one page. However, it may be the best intuitive description of an RBM I have seen. Other than that, it reads very much like an undergraduate thesis.

[4] Reducing the Dimensionality of Data With Neural Networks

Geoffrey Hinton, Ruslan Salakhutdinov - 2006

37 papers in library cite

I didn't like the way this is written; it's very hard to understand without a ton of background knowledge. But hey, it's one of the papers that kicked off deep learning!

[5] A Fast Learning Algorithm for Deep Belief Nets

Geoffrey E. Hinton, S. Osindero, Y. Teh - 2006

43 papers in library cite

The paper explains very little. It just throws out the idea and a bunch of math without really caring to explain the concepts.

[6] Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images

B. Olshausen, D. Field - 1996

5 papers in library cite

I can see the contribution, but overall the paper was uninteresting.

[7] Greedy Layer-Wise Training of Deep Networks

Yoshua Bengio, P. Lamblin, D. Popovici, Hugo Larochelle - 2006

33 papers in library cite

Bengio is perfect. This is everything Hinton's paper hoped to be: very well explained, and it ties back to real use cases (not just "hey, the math works and it reduced the score").

[8] What Is the Best Multi-Stage Architecture for Object Recognition?

K. Jarrett, Koray Kavukcuoglu, Marc'Aurelio Ranzato, Yann LeCun - 2009

20 papers in library cite

So boring and convoluted... They try to make things more abstract by saying "filter banks" and "2 stages", but in the end it's CNNs. It seems like they are trying to tie NNs to the rest of ML, but it doesn't work.

[9] Self-Taught Learning: Transfer Learning From Unlabeled Data

Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, A. Ng - 2007

7 papers in library cite

It's nice that it distinguishes self-taught learning from semi-supervised learning. However, it doesn't discuss neural networks. Overall, LeCun's unsupervised CNN work was better.

[10] Scaling Learning Algorithms Towards AI

Yoshua Bengio, Yann LeCun - 2007

15 papers in library cite

I should have read this sooner! Such a good explanation of why deep architectures beat shallow ones! Also better than Bengio's 2009 "Learning Deep Architectures for AI".

[11] Visualizing Higher-Layer Features of a Deep Network

Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pascal Vincent - 2009

4 papers in library cite

Very nice how they tackled it as an optimization problem, using gradient ascent on the input to maximize a unit's activation. I think it's a similar approach to adversarial examples (not sure whether this work inspired them; I don't remember).
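The optimization view described in this note, often called activation maximization, can be sketched on a toy unit. Below is a minimal NumPy sketch. It assumes a single sigmoid unit `sigmoid(w @ x + b)` standing in for a deep network's hidden unit, and it constrains the input to a fixed norm. For this unit the best input is parallel to `w`, so the result is easy to check. The function name, step size, and step count are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def maximize_activation(w, b, norm=1.0, lr=1.0, steps=500, seed=0):
    """Projected gradient ascent on x to maximize sigmoid(w @ x + b) subject to ||x|| = norm."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(w.shape)
    x *= norm / np.linalg.norm(x)
    for _ in range(steps):
        s = sigmoid(w @ x + b)
        grad = s * (1.0 - s) * w       # gradient of the activation w.r.t. x
        x = x + lr * grad              # ascent step: increase the activation
        x *= norm / np.linalg.norm(x)  # project back onto the sphere ||x|| = norm
    return x


rng = np.random.default_rng(1)
w = 0.1 * rng.standard_normal(64)  # small weights keep the toy sigmoid unsaturated
x_star = maximize_activation(w, b=0.0)
cos = x_star @ w / (np.linalg.norm(x_star) * np.linalg.norm(w))
print(f"cosine(x*, w) = {cos:.4f}")  # close to 1.0
```

Without the norm constraint the activation could be increased simply by scaling the input up, so the projection step is what makes the optimal input meaningful.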

[12] Unsupervised Learning of Invariant Feature Hierarchies With Applications to Object Recognition

Marc'Aurelio Ranzato, F. Huang, Y. Boureau, Yann LeCun - 2007

8 papers in library cite

I thought I had already learned pretty much everything from before 2015, but this paper proved me wrong. Amazing read and very nice methodology. Very simple as well!

[13] Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition

Dan C. Ciresan, Ueli Meier, Luca M. Gambardella, Jürgen Schmidhuber - 2010

10 papers in library cite

It's short, simple and straight to the point (like the network in the paper). It's refreshing to read something less "academic" and more "in your face".

[14] Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations

Honglak Lee, R. Grosse, R. Ranganath, Andrew Y. Ng - 2009

12 papers in library cite

[15] Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments

G. B. Huang, M. Ramesh, T. Berg, E. L. Miller - 2007

5 papers in library cite

Seems relevant but I am tired of reading papers about datasets, especially the early ones.

[16] Sparse Deep Belief Net Model for Visual Area V2

Honglak Lee, C. Ekanadham, A. Ng - 2008

10 papers in library cite

[17] Hierarchical Models of Object Recognition in Cortex

M. Riesenhuber, T. Poggio - 1999

8 papers in library cite

[18] An Analysis of Single-Layer Networks in Unsupervised Feature Learning

A. Coates, A. Ng, Honglak Lee - 2011

7 papers in library cite

[19] Neocognitron: A New Algorithm for Pattern Recognition Tolerant of Deformations and Shifts in Position

Kunihiko Fukushima, S. Miyake - 1982

7 papers in library cite

[20] Efficient Sparse Coding Algorithms

Honglak Lee, Alexis Battle, Rajat Raina, A. Ng - 2007

6 papers in library cite

[21] Receptive Fields of Single Neurons in the Cat's Striate Cortex

D. H. Hubel, T. N. Wiesel - 1959

6 papers in library cite

[22] Why Is Real-World Visual Object Recognition Hard?

N. Pinto, D. D. Cox, J. J. DiCarlo - 2008

5 papers in library cite

[23] High-Dimensional Signature Compression for Large-Scale Image Classification

J. Sánchez, F. Perronnin - 2011

4 papers in library cite

[24] ICA With Reconstruction Cost for Efficient Over-Complete Feature Learning

Quoc Le, A. Karpenko, J. Ngiam, A. Ng - 2011

4 papers in library cite

[25] Large-Scale Deep Unsupervised Learning Using Graphics Processors

Rajat Raina, A. Madhavan, Andrew Y. Ng - 2009

4 papers in library cite

[26] On Optimization Methods for Deep Learning

Quoc V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, Andrew Y. Ng - 2011

4 papers in library cite

[27] Tiled Convolutional Neural Networks

Quoc V. Le, J. Ngiam, Ziru Chen, D. Chia, P. W. Koh, Andrew Y. Ng - 2010

4 papers in library cite

[28] Traffic Sign Recognition With Multi-Scale Convolutional Networks

P. Sermanet, Yann LeCun - 2011

4 papers in library cite

[29] Emergence of Complex-Like Cells in a Temporal Product Network With Local Receptive Fields

K. Gregor, Yann LeCun - 2010

3 papers in library cite

[30] Nonlinear Image Representation Using Divisive Normalization

S. Lyu, E. Simoncelli - 2008

3 papers in library cite

[31] Slow Feature Analysis Yields a Rich Repertoire of Complex Cell Properties

P. Berkes, L. Wiskott - 2005

3 papers in library cite

[32] Wsabie: Scaling Up to Large Vocabulary Image Annotation

Jason Weston, Samy Bengio, Nicolas Usunier - 2011

3 papers in library cite

[33] How Does the Brain Solve Visual Object Recognition?

J. J. DiCarlo, D. Zoccolan, N. C. Rust - 2012

2 papers in library cite

[34] Invariant Visual Representation by Single Neurons in the Human Brain

R. Q. Quiroga, L. Reddy, G. Kreiman, C. Koch, I. Fried - 2005

2 papers in library cite

[35] A New Benchmark for Stereo-Based Pedestrian Detection

C. Keller, M. Enzweiler, D. M. Gavrila - 2009

1 paper in library cites

[36] Aging and the Human Neocortex

B. Pakkenberg, D. Pelvig, L. Marner, M. J. Bundgaard, H. J. G. Gundersen, J. R. Nyengaard, L. Regeur - 2003

1 paper in library cites

[37] Cat Head Detection - How to Effectively Exploit Shape and Texture Features

Wenxuan Zhang, Jian Sun, X. Tang - 2008

1 paper in library cites

[38] Natural Image Statistics

A. Hyvärinen, J. Hurri, P. O. Hoyer - 2009

1 paper in library cites

[39] Stimulus-Selective Properties of Inferior Temporal Neurons in the Macaque

R. Desimone, T. Albright, C. Gross, C. Bruce - 1984

1 paper in library cites

[40] What Does Classifying More Than 10,000 Image Categories Tell Us?

J. Deng, A. Berg, K. Li, Li Fei-Fei - 2010

1 paper in library cites

Cited by 10 papers in your library

Cites 14 papers in your library

Read on October 19, 2025

Very nice, very early work. Using an autoencoder to detect objects seems very simple but is very insightful. Also very similar to the neocognitron :)
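The "use an autoencoder" idea comes down to learning features by reconstructing unlabeled inputs while keeping the codes sparse. Below is a minimal NumPy sketch of a generic linear sparse autoencoder. It assumes a smoothed L1 penalty `sqrt(eps + h**2)` on the codes and plain full-batch gradient descent on random toy data. It leaves out the locally connected structure, pooling, and local contrast normalization of the paper's network. All names and hyperparameters are illustrative.

```python
import numpy as np


def loss_and_grads(W1, W2, X, lam, eps=1e-2):
    """Reconstruction error plus a smoothed L1 sparsity penalty on the codes.

    X: (n, d) inputs, W1: (d, k) encoder, W2: (d, k) decoder.
    Codes H = X @ W1 and reconstructions R = H @ W2.T.
    """
    n = X.shape[0]
    H = X @ W1
    E = H @ W2.T - X                      # reconstruction error, (n, d)
    penalty = np.sqrt(eps + H ** 2)       # smooth stand-in for |H|
    loss = (np.sum(E ** 2) + lam * np.sum(penalty)) / n
    dR = 2.0 * E / n                      # dLoss/dR
    dW2 = dR.T @ H                        # (d, k)
    dH = dR @ W2 + lam * H / penalty / n  # (n, k)
    dW1 = X.T @ dH                        # (d, k)
    return loss, dW1, dW2


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))  # toy "unlabeled" data
W1 = 0.1 * rng.standard_normal((16, 8))
W2 = 0.1 * rng.standard_normal((16, 8))
first, _, _ = loss_and_grads(W1, W2, X, lam=0.1)
for _ in range(500):
    _, dW1, dW2 = loss_and_grads(W1, W2, X, lam=0.1)
    W1 -= 0.01 * dW1
    W2 -= 0.01 * dW2
final, _, _ = loss_and_grads(W1, W2, X, lam=0.1)
print(f"loss: {first:.3f} -> {final:.3f}")
```

Smoothing the L1 penalty with `eps` keeps the objective differentiable at zero, so ordinary gradient descent applies without any special handling of the sparsity term.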
