2004

Learning Methods for Generic Object Recognition With Invariance to Pose and Lighting

Yann LeCun, Fu Jie Huang, Léon Bottou


AI summary

This paper studies generic object recognition with invariance to pose and lighting, using NORB, a large dataset of stereo image pairs. Nearest Neighbor methods, Support Vector Machines (SVM), and Convolutional Networks were evaluated; on unseen object instances placed on uniform backgrounds, Convolutional Networks reached a test error rate of around 7%, compared with around 13% for SVM.

Main Contributions

  • Introduces the NORB dataset, a large dataset for generic object recognition comprising stereo image pairs of 50 uniform-colored toys (10 instances of each of 5 categories) under varying pose and lighting, for a total of 194,400 individual images.
  • Evaluates Nearest Neighbor methods, Support Vector Machines, and Convolutional Networks for generic shape recognition.
  • Achieves a test error rate of around 7% for unseen object instances on uniform backgrounds using Convolutional Networks, compared with around 13% for SVM.
  • Shows that Convolutional Networks can handle a combined segmentation/recognition task on highly cluttered images, yielding a 14% error rate, where SVM proved impractical.
  • Implements a real-time version of the system capable of detecting and classifying objects in natural scenes at around 10 frames per second.

Abstract

We assess the applicability of several popular learning methods for the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 angles, 9 azimuths, and 6 lighting conditions was collected (for a total of 194,400 individual images). The objects were 10 instances of 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. Five instances of each category were used for training, and the other five for testing. Low-resolution grayscale images of the objects with various amounts of variability and surrounding clutter were used for training and testing. Nearest Neighbor methods, Support Vector Machines, and Convolutional Networks, operating on raw pixels or on PCA-derived features were tested. Test error rates for unseen object instances placed on uniform backgrounds were around 13% for SVM and 7% for Convolutional Nets. On a segmentation/recognition task with highly cluttered images, SVM proved impractical, while Convolutional nets yielded 14% error. A real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second.
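The image total quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only a sanity check, not anything taken from the paper: the variable names are illustrative, and it assumes that each stereo pair counts as two individual images and that every object was captured under every combination of conditions. Under those assumptions the stated factors reproduce the 194,400 figure.

```python
# Factors as worded in the abstract.
num_categories = 5            # animals, human figures, airplanes, trucks, cars
instances_per_category = 10   # 5 for training, 5 for testing
num_angles = 36               # "36 angles"
num_azimuths = 9              # "9 azimuths"
num_lightings = 6             # "6 lighting conditions"

# Assumption (not stated explicitly): a stereo pair is counted as 2 images.
images_per_stereo_pair = 2

num_objects = num_categories * instances_per_category
stereo_pairs = num_objects * num_angles * num_azimuths * num_lightings
total_images = stereo_pairs * images_per_stereo_pair

# Assumption: the 5/5 instance split divides the images evenly.
train_images = total_images // 2
test_images = total_images - train_images

print(f"objects: {num_objects}")
print(f"stereo pairs: {stereo_pairs}")
print(f"individual images: {total_images}")
print(f"train / test images: {train_images} / {test_images}")
```

That the product only matches the abstract's total once the factor of two is included suggests the count refers to left and right images separately rather than to pairs.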


