2006
Cite Score: 37
AI summary
This paper introduces three novel approaches to speeding up CNNs: unrolling convolution, using BLAS, and using GPUs. The results on character recognition problems indicate that unrolled convolution with BLAS produces a 2.4X–3.0X speedup and the GPU implementation produces a 3.1X–4.1X speedup.
Abstract
Convolutional neural networks (CNNs) are well known for producing state-of-the-art recognizers for document processing [1]. However, they can be difficult to implement and are usually slower than traditional multi-layer perceptrons (MLPs). We present three novel approaches to speeding up CNNs: a) unrolling convolution, b) using BLAS (Basic Linear Algebra Subprograms), and c) using GPUs (graphics processing units). Unrolled convolution converts the processing in each convolutional layer (both forward-propagation and back-propagation) into a matrix-matrix product. This matrix-matrix product representation makes CNNs as easy to implement as MLPs. BLAS is used to compute the matrix products efficiently on the CPU. We also present a pixel-shader-based GPU implementation of CNNs. Results on character recognition problems indicate that unrolled convolution with BLAS produces a 2.4X–3.0X speedup. The GPU implementation is even faster and produces a 3.1X–4.1X speedup.
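The unrolling idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single input feature map, square kernels, no padding or stride, forward propagation only, and cross-correlation (no kernel flip). The names `unroll_input`, `matmul`, `conv_unrolled`, and `conv_direct` are hypothetical, and `matmul` is a plain-Python stand-in for the matrix-matrix product that a BLAS routine would compute.

```python
def unroll_input(image, k):
    """Unroll a 2-D input into a matrix whose rows are flattened k x k patches."""
    h, w = len(image), len(image[0])
    rows = []
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            # One row per output position, row-major over the patch.
            rows.append([image[y + dy][x + dx]
                         for dy in range(k) for dx in range(k)])
    return rows  # shape: (out_h * out_w, k * k)


def matmul(a, b):
    """Plain matrix-matrix product (assumed stand-in for a BLAS call)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]


def conv_unrolled(image, kernels):
    """Apply several k x k kernels to one input map with a single matrix product."""
    k = len(kernels[0])
    patches = unroll_input(image, k)
    # Weight matrix: one column per kernel, flattened in the same order as the patches.
    weights = [[kern[dy][dx] for kern in kernels]
               for dy in range(k) for dx in range(k)]
    return matmul(patches, weights)  # shape: (out_h * out_w, num_kernels)


def conv_direct(image, kernel):
    """Reference sliding-window computation, flattened row-major."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [sum(image[y + dy][x + dx] * kernel[dy][dx]
                for dy in range(k) for dx in range(k))
            for y in range(h - k + 1) for x in range(w - k + 1)]
```

Each column of the result holds one output feature map in row-major order, so comparing a column with `conv_direct` for the same kernel is a quick consistency check. The unrolled matrix duplicates overlapping input pixels, trading extra memory for the ability to hand the whole layer to one optimized matrix product.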
References [9]
Yann Lecun, Leon Bottou, Yoshua Bengio, Patrick Haffner - 1998
62 papers in library cite
John C. Platt - 2003
12 papers in library cite
D. Steinkraus, I. Buck, Patrice Simard - 2005
3 papers in library cite
O. Matan, C. J. C. Burges, Yann Lecun, John S. Denker - 1992
3 papers in library cite
Yann Lecun - 1998
8 papers in library cite
Intel - 2009
2 papers in library cite
J. Kruger, R. Westermann - 2003
2 papers in library cite
1 paper in library cites
Cited by 3 papers in your library
Cites 5 papers in your library
Read on August 3, 2025