Papperoni

2006

High Performance Convolutional Neural Networks for Document Processing

K. Chellapilla, S. Puri, Patrice Y. Simard

AI summary

This paper introduces three novel approaches to speeding up CNNs: unrolling convolution, using BLAS, and using GPUs. The results on character recognition problems indicate that unrolled convolution with BLAS produces a dramatic 2.4X–3.0X speedup and the GPU implementation produces a 3.1X–4.1X speedup.

Main Contributions

Introduces a novel unrolled-convolution method for speeding up CNNs.
Presents a BLAS-based approach to efficiently compute matrix products on the CPU.
Presents a pixel-shader-based GPU implementation of CNNs.
Achieves a dramatic 2.4X–3.0X speedup using unrolled convolution with BLAS.
Achieves a 3.1X–4.1X speedup using the GPU implementation.

Abstract

Convolutional neural networks (CNNs) are well known for producing state-of-the-art recognizers for document processing [1]. However, they can be difficult to implement and are usually slower than traditional multi-layer perceptrons (MLPs). We present three novel approaches to speeding up CNNs: a) unrolling convolution, b) using BLAS (basic linear algebra subroutines), and c) using GPUs (graphics processing units). Unrolled convolution converts the processing in each convolutional layer (both forward-propagation and back-propagation) into a matrix-matrix product. The matrix-matrix product representation of CNNs makes their implementation as easy as MLPs. BLAS is used to efficiently compute matrix products on the CPU. We also present a pixel-shader-based GPU implementation of CNNs. Results on character recognition problems indicate that unrolled convolution with BLAS produces a dramatic 2.4X–3.0X speedup. The GPU implementation is even faster and produces a 3.1X–4.1X speedup.
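To make the "unrolling" idea in the abstract concrete, here is a minimal sketch of how a convolutional layer's forward pass can be rewritten as a single matrix-matrix product. It is not the paper's implementation: it assumes a single input channel, stride 1, no padding, and cross-correlation (no kernel flip), and the names `unroll_patches`, `matmul`, `conv_unrolled`, and `conv_direct` are hypothetical helpers made up for this note. The pure-Python `matmul` stands in for the BLAS matrix product the abstract refers to.

```python
# Sketch of unrolled convolution (assumptions: 1 channel, stride 1, no padding).
# All function names are illustrative, not taken from the paper.

def unroll_patches(image, k):
    """Return one row per output position, each a flattened k x k input patch."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows

def matmul(a, b):
    """Plain matrix product; an optimized BLAS routine would replace this."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def conv_unrolled(image, kernels):
    """Apply several k x k kernels to one image via a single matrix product."""
    k = len(kernels[0])
    patches = unroll_patches(image, k)            # (positions, k*k)
    # Each kernel is flattened into one column: (k*k, num_kernels).
    weights = [[kern[di][dj] for kern in kernels]
               for di in range(k) for dj in range(k)]
    return matmul(patches, weights)               # (positions, num_kernels)

def conv_direct(image, kernels):
    """Reference nested-loop version, used only to check the unrolled result."""
    k = len(kernels[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kern[di][dj]
                 for di in range(k) for dj in range(k))
             for kern in kernels]
            for i in range(h - k + 1) for j in range(w - k + 1)]

image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2]]
kernels = [[[1, 0], [0, 1]],    # sums the main diagonal of each patch
           [[0, 1], [1, 0]]]    # sums the anti-diagonal of each patch

result = conv_unrolled(image, kernels)
assert result == conv_direct(image, kernels)
```

The trade-off in this sketch is that overlapping patches are copied, so the unrolled matrix uses more memory than the input, in exchange for turning the nested loops into one large matrix product that a tuned library can run efficiently.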

References [9]

[1] Gradient-Based Learning Applied to Document Recognition

Yann LeCun, Leon Bottou, Yoshua Bengio, Patrick Haffner - 1998

62 papers in library cite

I absolutely hated this paper. It has ~50 pages but feels like 200. It takes too long to explain some things and often just repeats itself. It also doesn't seem to add much on top of LeNet-5, and it focuses a lot on GTNs, which really didn't stick.

[2] Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis

Patrice Y. Simard, Dave Steinkraus, John C. Platt - 2003

12 papers in library cite

Very good paper. Simple and pragmatic.

[3] Using GPUs for Machine Learning Algorithms

D. Steinkraus, I. Buck, Patrice Simard - 2005

3 papers in library cite

It's not bad, it's just not worth the read.

[4] Multi-Digit Recognition Using a Space Displacement Neural Network

O. Matan, C. J. C. Burges, Yann LeCun, John S. Denker - 1992

3 papers in library cite

Meh. They just use a CNN to read multiple digits, but it doesn't seem to have had a ton of impact.

[5] The MNIST Database of Handwritten Digits

Yann LeCun - 1998

8 papers in library cite

Not a paper; it's actually a dataset.

[6] Intel Math Kernel Library (MKL)

Intel - 2009

2 papers in library cite

[7] Linear Operators for GPU Implementation of Numerical Algorithms

J. Krüger, R. Westermann - 2003

2 papers in library cite

Missing author list. Missing year.

[8] Automatically Tuned Linear Algebra Software (ATLAS)

1 paper in library cites

Missing author list. Missing year.

[9] The AMD Core Math Library (ACML)

1 paper in library cites

Read on August 3, 2025

It's ok. It's not bad, just a bit boring, and I don't think it brings anything new. People had already implemented unrolled convolution and CNNs on GPUs; this paper just brought it all together.