Papperoni

2003

Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis

Patrice Y. Simard, Dave Steinkraus, John C. Platt

AI summary

This paper introduces best practices for document analysis using convolutional neural networks. The authors expand the MNIST dataset by adding elastic distortions, achieving state-of-the-art results. They propose a simple "do-it-yourself" implementation of convolution with a flexible architecture suitable for many visual document problems.

Main Contributions

Propose expanding the training set by adding a new form of distorted data.
Show that convolutional neural networks are better suited for visual document tasks than fully connected networks.
Propose a simple implementation of convolution with a flexible architecture suitable for many visual document problems.
Achieve 0.4% error on the MNIST database, reported as the best result at the time of publication (2003).
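
The "new form of distorted data" in the first contribution is the elastic distortion named in the summary above. As a rough illustration only, the sketch below builds a random displacement field, smooths it with a Gaussian, scales it, and resamples the image with bilinear interpolation. The function names, the default `alpha` and `sigma` values, and the zero-padding at the borders are assumptions made for this example, not details taken from this page.

```python
import math
import random

def gaussian_kernel(sigma):
    """1-D Gaussian truncated at about 3 sigma and normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(field, kernel):
    """Convolve every row with the kernel; samples outside the row count as 0."""
    r = len(kernel) // 2
    width = len(field[0])
    return [[sum(kernel[k + r] * row[x + k]
                 for k in range(-r, r + 1) if 0 <= x + k < width)
             for x in range(width)]
            for row in field]

def transpose(field):
    return [list(col) for col in zip(*field)]

def smooth(field, kernel):
    """Separable 2-D Gaussian blur: blur rows, then blur columns."""
    return transpose(blur_rows(transpose(blur_rows(field, kernel)), kernel))

def bilinear(image, y, x):
    """Sample the image at a real-valued (y, x); outside pixels read as 0."""
    h, w = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    fy, fx = y - y0, x - x0

    def px(r, c):
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    top = (1 - fx) * px(y0, x0) + fx * px(y0, x0 + 1)
    bottom = (1 - fx) * px(y0 + 1, x0) + fx * px(y0 + 1, x0 + 1)
    return (1 - fy) * top + fy * bottom

def elastic_distort(image, alpha=8.0, sigma=4.0, rng=None):
    """Return a distorted copy of a 2-D list of pixel intensities.

    alpha scales the displacement (in pixels) and sigma controls how smooth
    the field is; both defaults are illustrative placeholders.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    kernel = gaussian_kernel(sigma)
    dx = smooth([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)], kernel)
    dy = smooth([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)], kernel)
    return [[bilinear(image, y + alpha * dy[y][x], x + alpha * dx[y][x])
             for x in range(w)]
            for y in range(h)]
```

Calling `elastic_distort` repeatedly on each training image, with a fresh random field every time, is one way to enlarge a training set. With `alpha=0.0` the displacement vanishes and the image is returned unchanged.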

Abstract

Neural networks are a powerful technology for classification of visual inputs arising from documents. However, there is a confusing plethora of different neural network methods that are used in the literature and in industry. This paper describes a set of concrete best practices that document analysis researchers can use to get good results with neural networks. The most important practice is getting a training set as large as possible: we expand the training set by adding a new form of distorted data. The next most important practice is that convolutional neural networks are better suited for visual document tasks than fully connected networks. We propose that a simple “do-it-yourself” implementation of convolution with a flexible architecture is suitable for many visual document problems. This simple convolutional neural network does not require complex methods, such as momentum, weight decay, structure-dependent learning rates, averaging layers, tangent prop, or even finely-tuning the architecture. The end result is a very simple yet general architecture which can yield state-of-the-art performance for document analysis. We illustrate our claims on the MNIST set of English digit images.
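
To give a sense of what a "do-it-yourself" convolution can look like, here is a minimal sketch of a single strided 2-D convolution over one feature map, written in plain Python. The function name `conv2d_valid`, the no-padding ("valid") behaviour, and the example sizes are assumptions chosen for illustration; the abstract does not specify these details.

```python
def conv2d_valid(image, kernel, stride=1):
    """Strided 'valid' 2-D convolution (cross-correlation form, no kernel flip).

    image and kernel are 2-D lists of numbers. Only positions where the
    kernel fits entirely inside the image are computed, so no padding is used.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    return [[sum(kernel[i][j] * image[y * stride + i][x * stride + j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


# Hypothetical sizes: a 29x29 input with a 5x5 kernel and stride 2
# produces a (29 - 5) // 2 + 1 = 13 by 13 feature map.
feature_map = conv2d_valid([[1.0] * 29 for _ in range(29)],
                           [[1.0] * 5 for _ in range(5)],
                           stride=2)
```

Using a stride greater than 1 folds subsampling into the convolution itself, which is one way to do without separate averaging layers of the kind the abstract says are not required.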

References [9]

[1]Gradient-Based Learning Applied to Document Recognition

Yann Lecun, Leon Bottou, Yoshua Bengio, Patrick Haffner - 1998

62 papers in library cite

I absolutely hated this paper. It has ~50 pages but feels like 200. It takes too long to explain some things and often just repeats itself. It also doesn't seem to add much on top of LeNet-5, and it focuses a lot on GTNs, which never really stuck.

[2]The MNIST Database of Handwritten Digits

Yann Lecun - 1998

8 papers in library cite
