2009

Large-Scale Object Recognition With CUDA-accelerated Hierarchical Neural Networks

R. Uetz, Sven Behnke

AI summary

This paper introduces a hierarchical, locally-connected neural network model (LCNP), accelerated with NVIDIA CUDA and suited to large-scale object recognition. The authors also created a new, realistic dataset from LabelMe, and the model achieved testing error rates of 0.76% and 2.87% on the MNIST and NORB datasets, respectively.
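
For scale, assuming the standard 10,000-image MNIST test set, the 0.76% MNIST figure corresponds to about 76 misclassified test digits:

```latex
0.0076 \times 10\,000 = 76 \quad \text{misclassified test images}
```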

Main Contributions

Introduced a hierarchical, locally-connected neural network model (LCNP) optimized for large-scale, high-performance object recognition.
Implemented the model using the NVIDIA CUDA framework, allowing for massively parallel execution on a state-of-the-art graphics card.
Created a new realistic dataset by extracting a large number of objects from the LabelMe dataset of natural images.
Achieved a testing error rate of 0.76% on the MNIST dataset and 2.87% on the NORB dataset.
Demonstrated a speedup of up to 82× over a single-core CPU version of the system.
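
The summary only says the CUDA implementation is "massively parallel". As a hedged illustration of what that usually means, here is a plain-Python emulation of the standard CUDA 1-D indexing pattern (global id = blockIdx.x * blockDim.x + threadIdx.x, plus a bounds guard), assuming one thread per output element of a feature map. The function names, the 256-thread block size and the 100 × 100 map are hypothetical, not details from the paper.

```python
from typing import Optional, Tuple


def num_blocks(n_outputs: int, block_dim: int) -> int:
    """Blocks needed so every output element gets one thread (ceiling division)."""
    return (n_outputs + block_dim - 1) // block_dim


def thread_to_output(block_idx: int, thread_idx: int, block_dim: int,
                     out_w: int, n_outputs: int) -> Optional[Tuple[int, int]]:
    """Emulate the usual CUDA global-index computation for a 1-D grid.

    Returns the (row, col) output element this thread would compute, or None for
    surplus threads in the last block (the `if (id < n)` guard in a real kernel).
    """
    gid = block_idx * block_dim + thread_idx  # blockIdx.x * blockDim.x + threadIdx.x
    if gid >= n_outputs:
        return None
    return divmod(gid, out_w)  # flat index -> (row, col)


# Hypothetical example: a 100 x 100 output map with 256-thread blocks.
BLOCK_DIM = 256
out_h, out_w = 100, 100
n = out_h * out_w
blocks = num_blocks(n, BLOCK_DIM)  # 40 blocks, i.e. 10,240 threads for 10,000 outputs
covered = {thread_to_output(b, t, BLOCK_DIM, out_w, n)
           for b in range(blocks) for t in range(BLOCK_DIM)}
covered.discard(None)  # drop the idle threads in the last block
assert len(covered) == n  # every output element is covered
```

Because each thread writes a distinct output element, no synchronization between threads is needed in this forward pass, which is one plausible reason layers of this kind map well onto GPUs.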

Abstract

Robust recognition of arbitrary object classes in natural visual scenes is an aspiring goal with numerous practical applications, for instance, in the area of autonomous robotics and autonomous vehicles. One obstacle on the way towards human-like recognition performance is the limitation of computational power, restricting the size of the training and testing dataset as well as the complexity of the object recognition system. In this work, we present a hierarchical, locally-connected neural network model that is well-suited for large-scale, high-performance object recognition. By using the NVIDIA CUDA framework, we create a massively parallel implementation of the model which is executed on a state-of-the-art graphics card. This implementation is up to 82 times faster than a single-core CPU version of the system. This significant gain in computational performance allows us to evaluate the model on a very large, realistic, and challenging set of natural images which we extracted from the LabelMe dataset. To compare our model to other approaches, we also evaluate the recognition performance using the well-known MNIST and NORB datasets, achieving a testing error rate of 0.76% and 2.87%, respectively.
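
The abstract calls the model "hierarchical, locally-connected" without spelling out what a single layer computes. Assuming "locally-connected" has its usual meaning (each output unit sees a small input window, and the weights are not shared across positions, unlike a convolution), one layer's forward pass can be sketched in plain Python. Everything here is illustrative: the function names, the single channel, stride 1, and the missing bias and activation are assumptions, not details taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def locally_connected_2d(image: Matrix, weights: List[List[Matrix]], k: int) -> Matrix:
    """Forward pass of a hypothetical single-channel locally-connected layer.

    Unlike a convolution, each output position (i, j) has its own k x k kernel
    weights[i][j]; nothing is shared between positions. Stride 1, no padding,
    no bias and no nonlinearity (a real network would add those).
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h - k + 1, w - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            kernel = weights[i][j]  # position-specific weights
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    acc += kernel[di][dj] * image[i + di][j + dj]
            out[i][j] = acc
    return out


def num_weights(out_h: int, out_w: int, k: int) -> int:
    """Weight count grows with the output size: one k x k kernel per position."""
    return out_h * out_w * k * k


# Usage: 4 x 4 input, 2 x 2 kernels -> 3 x 3 output.
img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
# Every position copies the top-left pixel of its window ...
w = [[[[1.0, 0.0], [0.0, 0.0]] for _ in range(3)] for _ in range(3)]
# ... except (0, 0), which sums its whole window: 0 + 1 + 4 + 5 = 10.
w[0][0] = [[1.0, 1.0], [1.0, 1.0]]
out = locally_connected_2d(img, w, k=2)
print(out[0][0], out[1][1], num_weights(3, 3, 2))  # 10.0 5.0 36
```

The trade-off shows up in `num_weights`: a 3 × 3 output with 2 × 2 kernels needs 36 weights, whereas a single-channel convolution with the same kernel size would need only 4.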

References [13]

[1] Gradient-Based Learning Applied to Document Recognition

Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner - 1998

I absolutely hated this paper. It's ~50 pages long but feels like 200. It takes too long to explain some things, and much of it just repeats itself. Beyond presenting LeNet-5, it doesn't seem to add much. It also focuses a lot on GTNs (graph transformer networks), which really didn't stick.

[2] The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain

F. Rosenblatt - 1958

Very nice intro to NNs!

[3] Efficient BackProp

Yann LeCun, Léon Bottou, G. B. Orr, Klaus-Robert Müller - 1998

The first half is very, very good; the rest is very boring.

[4] Learning Generative Visual Models From Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories

Li Fei-Fei, Rob Fergus, Pietro Perona - 2004

I think most people cite this believing it is where the Caltech 101 dataset comes from (it's not). Anyway, it's just an extension of the other dataset, and it's very math-heavy, not about NNs, and uninteresting.

[5] LabelMe: A Database and Web-Based Tool for Image Annotation

Bryan C. Russell, Antonio Torralba, Kevin P. Murphy, William T. Freeman - 2008

It's a good paper overall, but not really worth reading. They just describe the platform (which, at the time, may have been a paradigm shift away from in-house datasets). Maybe the basis for Amazon Mechanical Turk?

[6] Learning Methods for Generic Object Recognition With Invariance to Pose and Lighting

Yann LeCun, Fu Jie Huang, Léon Bottou - 2004

Good paper, with a nice methodology for capturing images of objects under varying pose and lighting. However, I don't think it was very impactful; I don't see it used much.

[7] The MNIST Database of Handwritten Digits

Yann LeCun - 1998

Not a paper - it's actually a dataset

[8] Large-Scale Learning With SVM and Convolutional Nets for Generic Object Categorization

F. Huang, Yann LeCun - 2006

[9] Why Is Real-World Visual Object Recognition Hard?

N. Pinto, D. D. Cox, J. J. DiCarlo - 2008

[10] Hierarchical Neural Networks for Image Interpretation

Sven Behnke - 2003

[11] NVIDIA CUDA Programming Guide

NVIDIA - 2009

[12] Principles of Neural Science

E. R. Kandel, J. H. Schwartz, T. M. Jessell - 2000

[13] The LabelMe-12-50k Dataset

R. Uetz

Read on August 2, 2025

It's not a bad paper, but it seems like they tried to fit too much into it: the CNN part is very forgettable, and the CUDA part is not as well explained as in other papers.