1987

Large Automatic Learning, Rule Extraction and Generalization

John Denker, Daniel Schwartz, Ben Wittner, Sara A. Solla, Richard Howard, Lawrence Jackel, John Hopfield

citations

Cite Score

24

AI summary

This paper explores automatic learning, rule extraction, and generalization in layered neural networks, proposing methods to quantify learning task complexity, demonstrating efficient representations for the "two-or-more clumps" predicate, and discussing challenges in escaping local minima and the role of preprocessors in achieving robust generalization.

Main Contributions

  • Introduced definitions for memorization, generalization, and rule extraction for layered networks.
  • Proposed a method to measure the information content of a learning task and the efficiency of information extraction by a network.
  • Demonstrated that a two-layer network can efficiently represent the "clump-counting problem" and any Boolean function.
  • Investigated the stability of the "human, geometric solution" in network weight space and found it to be neutrally unstable.
  • Emphasized the critical role of preprocessors and architectural constraints in achieving effective generalization and rule extraction, arguing against purely 'tabula rasa' networks for practical problems.

Abstract

Since antiquity, man has dreamed of building a device that would "learn from examples", "form generalizations", and "discover the rules" behind patterns in the data. Recent work has shown that a highly connected, layered network of simple analog processing elements can be astonishingly successful at this, in some cases. In order to be precise about what has been observed, we give definitions of memorization, generalization, and rule extraction. The most important part of this paper proposes a way to measure the entropy or information content of a learning task and the efficiency with which a network extracts information from the data. We also argue that the way in which the networks can compactly represent a wide class of Boolean (and other) functions is analogous to the way in which polynomials or other families of functions can be "curve fit" to general data; specifically, they extend the domain, and average noisy data. Alas, finding a suitable representation is generally an ill-posed and ill-conditioned problem. Even when the problem has been "regularized", what remains is a difficult combinatorial optimization problem. When a network is given more resources than the minimum needed to solve a given task, the symmetric, low-order, local solutions that humans seem to prefer are not the ones that the network chooses from the vast number of solutions available; indeed, the generalized delta method and similar learning procedures do not usually hold the "human" solutions stable against perturbations. Fortunately, there are ways of "programming" into the network a preference for appropriately chosen symmetries.

Citation Graph

Loading graph...

References [39]

Sort:
Filter:

D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986

46 papers in library cite

Andrei N. Kolmogorov - 1968

1 paper in library cites

D. Rumelhart, J. Mcclelland, T. P. R. Group - 1986

5 papers in library cite

M. Minsky, S. Papert - 1969

12 papers in library cite

Geoffrey E. Hinton, T. J. Sejnowski - 1986

9 papers in library cite

T. J. Sejnowski, C. R. Rosenberg - 1986

6 papers in library cite

S. Kirkpatrick, C. D. Gelatt, M. P. Vecchi - 1983

6 papers in library cite

B. Widrow, S. D. Stearns - 1985

4 papers in library cite

T. Kohonen - 1988

3 papers in library cite

C. V. D. Malsburg - 1973

3 papers in library cite

W. H. Press, B. P. Flannery, S. A. Teukolsky, W. T. Vetterling - 1986

2 papers in library cite

D. J. Burr - 1981

1 paper in library cites

D. J. Burr - 1986

1 paper in library cites

L. D. Jackel, H. P. Graf, R. E. Howard - 1987

1 paper in library cites

W. Jeffrey, R. Rosner - 1986

1 paper in library cites

J. J. Hopfield, D. W. Tank - 1985

1 paper in library cites

Y. A. Mostafa - 1986

1 paper in library cites

R. W. Prager, T. D. Harrison, F. Fallside - 1986

1 paper in library cites

M. R. Garey, D. S. Johnson - 1979

1 paper in library cites

D. Angluin, C. Smith - 1983

1 paper in library cites

Missing year

H. P. Graf, W. Hubbard, L. D. Jackel

1 paper in library cites

D. J. Burr - 1987

1 paper in library cites

Timothy Maxwell, C. L. Giles, Y. C. Lee - 1987

1 paper in library cites

R. P. Lippmann - 1987

1 paper in library cites

J. H. Holland, K. J. Holyoak, R. E. Nisbett, P. R. Thagard - 1986

1 paper in library cites

E. W. Packel, J. F. Traub - 1987

1 paper in library cites

C. Mead, L. Conway - 1980

1 paper in library cites

R. Scalettar, A. Zee - 1987

1 paper in library cites

P. Gorman, T. H. Sejnowski - 1987

1 paper in library cites

T. J. Sejnowski, P. K. Kienker, Geoffrey E. Hinton - 1986

1 paper in library cites

J. Hadamard - 1923

1 paper in library cites

D. J. Burr - 1982

1 paper in library cites

H. H. Chen, Y. C. Lee, G. Z. Sun, H. Y. Lee, Timothy Maxwell, C. L. Giles - 1986

1 paper in library cites

Missing year

Yann Lecun

1 paper in library cites

R. E. Howard - 1987

1 paper in library cites

Missing year

D. Parker

1 paper in library cites

J. J. Hopfield, D. W. Tank - 1987

1 paper in library cites

A. N. Tikhonov, V. Y. Arsenin - 1977

1 paper in library cites

Missing year

S. Ryckebusch, Sara A. Solla, John S. Denker, B. S. Wittner, D. B. Schwartz

1 paper in library cites

Cited by

4

papers in your library

Cites

5

papers in your library

Read

on January 17, 2026

Your review

Tags

Paper Aliases

No aliases