2003

Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis

Patrice Y. Simard, Dave Steinkraus, John C. Platt

AI summary

This paper introduces best practices for document analysis using convolutional neural networks. The authors expand the MNIST dataset by adding elastic distortions, achieving state-of-the-art results. They propose a simple "do-it-yourself" implementation of convolution with a flexible architecture suitable for many visual document problems.

Main Contributions

  • Propose expanding the training set by adding a new form of distorted data.
  • Show that convolutional neural networks are better suited for visual document tasks than fully connected networks.
  • Propose a simple implementation of convolution with a flexible architecture suitable for many visual document problems.
  • Achieve 0.4% error on the MNIST database, reported as the best result at the time of publication.
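
The elastic distortion mentioned in the summary can be illustrated with a short sketch. It assumes the usual formulation: a random displacement field is smoothed with a Gaussian, scaled, and then used to resample the image with bilinear interpolation. The function names, the zero-padding at the borders, and the default values of `alpha` (displacement scale) and `sigma` (smoothing width) are illustrative assumptions, not values taken from this page, and this is not the authors' code.

```python
import math
import random

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(field, kernel):
    """Separable Gaussian blur of a 2-D list; values outside the grid count as 0."""
    h, w = len(field), len(field[0])
    r = len(kernel) // 2
    rows = [[sum(kernel[k + r] * field[y][x + k]
                 for k in range(-r, r + 1) if 0 <= x + k < w)
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[k + r] * rows[y + k][x]
                 for k in range(-r, r + 1) if 0 <= y + k < h)
             for x in range(w)] for y in range(h)]

def bilinear(image, y, x):
    """Sample the image at a real-valued (y, x); pixels outside the image count as 0."""
    h, w = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    fy, fx = y - y0, x - x0

    def px(r, c):
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    top = (1 - fx) * px(y0, x0) + fx * px(y0, x0 + 1)
    bottom = (1 - fx) * px(y0 + 1, x0) + fx * px(y0 + 1, x0 + 1)
    return (1 - fy) * top + fy * bottom

def elastic_distort(image, alpha=8.0, sigma=2.0, rng=None):
    """Return a distorted copy of a 2-D grayscale image (list of lists).

    alpha and sigma are hypothetical defaults chosen for illustration.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    kernel = gaussian_kernel(sigma)
    # Independent random displacements in [-1, 1] for each axis, then smoothed.
    dx = smooth([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)], kernel)
    dy = smooth([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)], kernel)
    # Each output pixel is read from its displaced source location.
    return [[bilinear(image, y + alpha * dy[y][x], x + alpha * dx[y][x])
             for x in range(w)] for y in range(h)]
```

Calling `elastic_distort(image, rng=random.Random(seed))` with different seeds yields different distorted copies of the same digit, which is how a training set could be expanded under these assumptions.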

Abstract

Neural networks are a powerful technology for classification of visual inputs arising from documents. However, there is a confusing plethora of different neural network methods that are used in the literature and in industry. This paper describes a set of concrete best practices that document analysis researchers can use to get good results with neural networks. The most important practice is getting a training set as large as possible: we expand the training set by adding a new form of distorted data. The next most important practice is that convolutional neural networks are better suited for visual document tasks than fully connected networks. We propose that a simple “do-it-yourself” implementation of convolution with a flexible architecture is suitable for many visual document problems. This simple convolutional neural network does not require complex methods, such as momentum, weight decay, structure-dependent learning rates, averaging layers, tangent prop, or even finely-tuning the architecture. The end result is a very simple yet general architecture which can yield state-of-the-art performance for document analysis. We illustrate our claims on the MNIST set of English digit images.
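
To make the "do-it-yourself" convolution idea concrete, here is a minimal sketch of the forward pass of a single 2-D convolution written with plain loops. It is not the authors' implementation: the function name `conv2d_valid`, the "valid" border handling, and the use of a stride to subsample the output are assumptions made for illustration.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` (both lists of lists) without padding.

    Computes the sliding dot product used by convolutional layers
    (cross-correlation, i.e. the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for top in range(0, h - kh + 1, stride):
        row = []
        for left in range(0, w - kw + 1, stride):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += kernel[i][j] * image[top + i][left + j]
            row.append(acc)
        out.append(row)
    return out

# Usage: a 2x2 summing kernel with stride 2 on a 4x4 image.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 1],
          [1, 1]]
result = conv2d_valid(image, kernel, stride=2)
# result == [[14.0, 22.0], [46.0, 54.0]]
```

A stride greater than 1 shrinks the output map, which is one simple way to combine convolution and subsampling in a single step. A full network would stack several such maps with learned kernels and a nonlinearity.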
